In [46]:
!pip install -qU \
  langchain-core \
  langchain-google-genai \
  langchain-community \
  scikit-image


In [2]:
import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
  os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")



Enter API key for Google Gemini: ········


In [3]:
from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai", temperature=0.0)
creative_llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai", temperature=0.9)

In [4]:
article = """

Introduction
The future of automation is with “Agents”. Whereas, in the business automations, there isn't an easy solution to get the functionality done. In the present day and age of intelligent automation, it is highly crucial to develop powerful platforms and tools. Such a vast combination is Bright Data, LangChain, and Google Gemini. Bright Data facilitates web-scale data extraction, LangChain facilitates developing advanced language models and chains, and Google Gemini provides premium summarization capabilities.

This blog post will take you through a real-world use case integrating these technologies to create an intelligent agent that can execute Google search queries via Bright Data, scrape Airbnb listings, and return summarized insights from the results. You will be demonstrated with the usage of LangChain to organize your workflow and Google Gemini for summarization so that the output is actionable, meaningful, and concise.

New users of Bright Data, please make sure to sign-up here - Bright Data

Who is this for?

This solution is ideal for:

• Data Engineers and Data Scientists: Wishing to create an intelligent agent that can gather, process, and abstract data from diverse sources.

• Developers: Wanting to incorporate APIs and create sophisticated applications with the help of LangChain and other third-party platforms such as Bright Data and Google Gemini.

• Business Analysts and Product Managers: Who wish to find means of deriving insights from two platforms (Google and Airbnb) and summarizing the data for faster decision-making.

What problem is this workflow solving?

Actionable information quickly is always critical. Traditional data extraction methods are often slow, error-prone, and require significant manual effort.

This workflow solves the following problems:

Web scraping complexity: Automating the extraction of data from websites like Google and Airbnb in a structured and scalable manner.
Search optimization: Refining Google search results and presenting them in a meaningful way, specifically for business applications like competitive analysis or market research.
Summarization: Aggregating data and providing concise summaries using advanced AI techniques to ensure that key insights are easily consumable.
What this workflow does

The core of this workflow is a LangChain agent that:

Performs a Google search using the Bright Data SERP API.
Scrapes Airbnb listings from a specific location using the Bright Data Web Unlocker.
Summarizes the results using Google Gemini.
Detailed Breakdown of the Process

Scraping Airbnb Listings: With a location input, the agent scrapes Airbnb listings using Bright Data Web Unlocker, which provides access to dynamic content such as property details (price, location, amenities).
Google Search via Bright Data SERP API: The workflow first sends a search query to Google using Bright Data’s SERP API. This API allows us to bypass search engine restrictions and retrieve organic search results (titles, snippets, URLs) for a given query.
Summarization with Google Gemini: Once the data is retrieved, Google Gemini is used to summarize the results. The model condenses the large set of information into a few concise points, allowing the user to quickly understand the key insights without having to read through every detail.
Setup

To set up this workflow, follow the steps below:

1. Install Required Libraries

Before getting started, ensure that you have all necessary libraries installed. You can use the following requirements.txt file to manage dependencies:

2. Set Up API Keys

Bright Data: You will need an API key for the Bright Data SERP API and Web Unlocker. Sign up for an account on Bright Data and retrieve your API keys from the dashboard.
Google Gemini: You need an API key to access the Google Gemini model. Set up the necessary authentication and obtain an API key from the Google Cloud Console.
3. Environment Variables

To securely store your API keys, use a .env file. This ensures that your credentials are not exposed in your codebase.

BRIGHTDATA_SERP_API_KEY=your_brightdata_api_key

BRIGHTDATA_BEARER_TOKEN=your_brightdata_bearer_token

GOOGLE_API_KEY=your_google_api_key

GOOGLE_GEMINI_MODEL_NAME=your_google_gemini_model_name

4. Write the LangChain Agent

Now that all dependencies are in place, let's write the LangChain agent. This agent will interact with Bright Data, scrape the necessary information, and pass it through Google Gemini for summarization.

Code for the LangChain Agent:

Source Code LangChain-BrightData-Agent

Here’s the crucial agent implementation which utilizes the Bright Data, Airbnb and Google Gemini providers.

from dotenv import load_dotenv
from langchain.agents import initialize_agent, Tool

from tools.google_search import GoogleSearchTool
from tools.airbnb import AirbnbTool
from gemini_summary import summarize_with_gemini
from llm import GeminiLLM

load_dotenv()

llm = GeminiLLM()

tools = [
    Tool.from_function(func=GoogleSearchTool(), name="Google Search", description="Search Google for answers"),
    Tool.from_function(func=AirbnbTool(), name="Airbnb Search", description="Search Airbnb for listings")
]

agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

if __name__ == "__main__":
    query = "Find Airbnb listings in New York and summarize Google reviews about staying there."
    try:
        result = agent.run(query)
        summary = summarize_with_gemini(result)
        print("\n===== RAW RESULT =====")
        print(result)
        print("\n===== SUMMARY =====")
        print(summary)
    except Exception as e:
        print(f"[Agent Error] {e}")
How to Customize this Workflow to Your Needs

Modify Search Queries: You can change the search query by modifying the query variable. This allows the agent to search for different topics, products, or services.
Summarization Settings: If you want a different style or level of detail for the summaries, update the gemini_summary.py with the prompt template for summarization.
Add More Tools: LangChain allows for easily adding new tools. You can integrate other web scraping tools, APIs, or services to extend this agent's capabilities.
Source Code

Here's the Source Code LangChain-BrightData-Agent

Conclusion

By integrating Bright Data, LangChain, and Google Gemini, you can build an intelligent agent that efficiently scrapes data, processes it, and gives insightful answers in the form of summaries. This process can be further tailored to adapt to various applications, including competitive research, market analysis, or vacation planning.

Content Credits - This blog-post contents were formatted with ChatGPT to make it more professional and produce a polished content for the targeted audience.
"""

In [5]:
from langchain.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)

In [6]:
system_prompt = SystemMessagePromptTemplate.from_template(
    "You are an AI assistant that helps generate article titles"
)

user_prompt = HumanMessagePromptTemplate.from_template(
    """
    You are tasked with creating a name for an article.
    The article here is for you to examine 
    
    ---
    
    {article}
    
    ---
    The name should be based on the context of the article.
    Be creative, but make sure names are clear, catchy,
    and relevant to the theme of the article.
    
    Only output the article name, no other explanation or
    text can be provided.
    
    """,
    input_variables = ["article"]
)

In [7]:
user_prompt.format(article = "TEST String").content

'\n    You are tasked with creating a name for an article.\n    The article here is for you to examine \n    \n    ---\n    \n    TEST String\n    \n    ---\n    The name should be based on the context of the article.\n    Be creative, but make sure names are clear, catchy,\n    and relevant to the theme of the article.\n    \n    Only output the article name, no other explanation or\n    text can be provided.\n    \n    '

In [8]:
print(user_prompt.format(article = "TEST String").content)


    You are tasked with creating a name for an article.
    The article here is for you to examine 
    
    ---
    
    TEST String
    
    ---
    The name should be based on the context of the article.
    Be creative, but make sure names are clear, catchy,
    and relevant to the theme of the article.
    
    Only output the article name, no other explanation or
    text can be provided.
    
    


In [9]:
from langchain.prompts import ChatPromptTemplate

In [10]:
first_prompt = ChatPromptTemplate.from_messages([system_prompt, user_prompt])

In [11]:
print(first_prompt.format(article = "TEST STRING"))

System: You are an AI assistant that helps generate article titles
Human: 
    You are tasked with creating a name for an article.
    The article here is for you to examine 
    
    ---
    
    TEST STRING
    
    ---
    The name should be based on the context of the article.
    Be creative, but make sure names are clear, catchy,
    and relevant to the theme of the article.
    
    Only output the article name, no other explanation or
    text can be provided.
    
    


In [12]:
chain_one = (
    {"article": lambda x: x["article"]}
    | first_prompt
    | creative_llm
    | { "article_title": lambda x: x.content}
)

In [13]:
article_title_msg = chain_one.invoke({
    "article": article
})
article_title_msg

{'article_title': 'Building an Intelligent Agent: Bright Data, LangChain, and Google Gemini for Data Extraction and Summarization'}
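
The chain above mixes plain dicts, prompt templates, and a model with the `|` operator. LangChain coerces each dict of callables into a parallel step whose values all receive the same input. The cell below is a minimal, dependency-free sketch of that idea, not LangChain's actual implementation: `Step`, `coerce`, `fake_prompt`, and `fake_llm` are hypothetical names invented for illustration.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # step_a | step_b -> a new Step that runs step_a, then feeds its result to step_b
        nxt = coerce(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # handles `{...} | step`, where the left operand is a plain dict
        return coerce(other) | self


def coerce(obj):
    """Turn a dict of callables or a bare callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # every value is called with the same input; results are collected under the same keys
        return Step(lambda value: {key: coerce(fn).invoke(value) for key, fn in obj.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot use {type(obj).__name__} as a step")


# Assumed stand-ins for the prompt template and the model call (no API key needed).
fake_prompt = Step(lambda inputs: f"Title this article: {inputs['article']}")
fake_llm = Step(lambda prompt_text: prompt_text.upper())

chain = (
    {"article": lambda x: x["article"]}
    | fake_prompt
    | fake_llm
    | {"article_title": lambda text: text}
)

result = chain.invoke({"article": "agents"})
print(result)
```

The final dict step is why `chain_one` returns a dict with an `article_title` key rather than a bare string, which matters when the result is passed on to a later chain.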

In [14]:
second_user_prompt = HumanMessagePromptTemplate.from_template(
    """
    You are tasked with creating a description for
    the article. The article is here for you to examine.
    
    ---
    {article}
    ---
    Here is the article title {article_title}
    
    Output an SEO-friendly article description. Make sure it does not
    exceed 200 characters. Do not output
    anything other than the description.
    
    """,
    input_variables = ["article", "article_title"]
)

second_prompt = ChatPromptTemplate.from_messages([
    system_prompt,
    second_user_prompt
])

In [15]:
print(second_prompt.format(article = "TEST STRING", article_title = "TEST Title"))

System: You are an AI assistant that helps generate article titles
Human: 
    You are tasked with creating a description for
    the article. The article is here for you to examine.
    
    ---
    TEST STRING
    ---
    Here is the article title TEST Title
    
    Output an SEO-friendly article description. Make sure it does not
    exceed 200 characters. Do not output
    anything other than the description.
    
    


In [16]:
chain_two = (
    {"article": lambda x: x["article"],
     "article_title": lambda x: x["article_title"]}
    | second_prompt
    | creative_llm
    | { "summary": lambda x: x.content}
)

In [17]:
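article_description_msg = chain_two.invoke({
    "article": article,
    # pass the title string, not the {"article_title": ...} dict returned by chain_one
    "article_title": article_title_msg["article_title"]
})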
article_description_msg

{'summary': 'Automate data extraction & summarization with Bright Data, LangChain, & Google Gemini. Build an intelligent agent for insights from web data!'}
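
The prompt asks the model to stay within 200 characters, but a model does not reliably count characters, so it can help to check the result in code. The cell below is a small sketch under that assumption; `check_description` and `truncate_at_word` are hypothetical helpers, not part of LangChain.

```python
def check_description(description, max_chars=200):
    """Report the length of a description and whether it fits the limit."""
    text = description.strip()
    return {"length": len(text), "within_limit": len(text) <= max_chars}


def truncate_at_word(description, max_chars=200):
    """Shorten an over-long description at a word boundary and append an ellipsis."""
    text = description.strip()
    if len(text) <= max_chars:
        return text
    # Reserve one character for the ellipsis, then drop the trailing (possibly partial) word.
    clipped = text[: max_chars - 1]
    if " " in clipped:
        clipped = clipped.rsplit(" ", 1)[0]
    return clipped.rstrip() + "…"


# The description returned by chain_two above, copied in as a literal.
summary = (
    "Automate data extraction & summarization with Bright Data, LangChain, "
    "& Google Gemini. Build an intelligent agent for insights from web data!"
)

report = check_description(summary)
print(report["within_limit"])
```

If the check fails, one option is to re-invoke the chain; `truncate_at_word` is a cheaper fallback that trades a clean ending for a guaranteed length.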

In [18]:
chain_two_llm = (
    {"article": lambda x: x["article"],
     "article_title": lambda x: x["article_title"]}
    | second_prompt
    | llm
    | { "summary": lambda x: x.content}
)

In [19]:
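# use the temperature=0.0 chain defined in the previous cell rather than chain_two
article_description_msg_llm = chain_two_llm.invoke({
    "article": article,
    "article_title": article_title_msg["article_title"]
})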
article_description_msg_llm

{'summary': 'Automate data extraction & summarization with Bright Data, LangChain, & Google Gemini. Build an intelligent agent for insights from web data!'}

In [20]:
third_user_prompt = HumanMessagePromptTemplate.from_template(
"""
    You are tasked with editing a paragraph from the article. The
    article is here for you to examine:
    ---
    {article}
    ---
    
    Choose one paragraph to review and edit. During your edit ensure you provide
    constructive feedback to the user, so they can learn where to improve their writing.
""", input_variables = ["article"]
)

In [21]:
third_prompt = ChatPromptTemplate.from_messages([
    system_prompt,
    third_user_prompt
])

In [31]:
from pydantic import BaseModel, Field

class Paragraph(BaseModel):
    original_paragraph: str = Field(description = "The original paragraph")
    edited_paragraph: str = Field(description = "The improved edited paragraph")
    feedback: str = Field(description = "The constructive feedback")
    feedback_score: int = Field(description = "The constructive feedback score")

structured_llm = creative_llm.with_structured_output(Paragraph)

In [32]:
chain_three = (
    { "article": lambda x: x["article"]}
    | third_prompt
    | structured_llm
    | {
        "original_paragraph": lambda x: x.original_paragraph,
        "edited_paragraph": lambda x: x.edited_paragraph,
        "feedback": lambda x: x.feedback,
        "feedback_score": lambda x: x.feedback_score,
    }
    
)

In [33]:
out = chain_three.invoke({"article": article})
print(out)

{'original_paragraph': "Whereas, in the business automations, there isn't an easy solution to get the functionality done. In the present day and age of intelligent automation, it is highly crucial to develop powerful platforms and tools. Such a vast combination is Bright Data, LangChain, and Google Gemini. Bright Data facilitates web-scale data extraction, LangChain facilitates developing advanced language models and chains, and Google Gemini provides premium summarization capabilities.", 'edited_paragraph': "In the realm of business automation, a straightforward solution for achieving desired functionality is often elusive. However, the convergence of Bright Data, LangChain, and Google Gemini offers a powerful combination in today's era of intelligent automation. Bright Data excels in facilitating web-scale data extraction, LangChain enables the development of advanced language models and chains, and Google Gemini provides premium summarization capabilities.", 'feedback': 'The revised

In [42]:
from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper
from langchain_core.prompts import PromptTemplate

image_prompt = PromptTemplate(
    input_variables = ["article"],
    template = """
    Generate a prompt of fewer than 500 characters for generating an image
    based on the following article
    ---
    {article}
    ---
    """
)



In [47]:
from skimage import io
import matplotlib.pyplot as plt

from langchain_core.runnables import RunnableLambda

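def generate_and_display_image(image_prompt):
    # image_prompt is the prompt text produced by the LLM in the previous chain step
    print(image_prompt)
    
    # DallEAPIWrapper requires the `openai` package and an OPENAI_API_KEY
    image_url = DallEAPIWrapper().run(image_prompt)
    image_data = io.imread(image_url)
    
    plt.imshow(image_data)
    plt.axis("off")
    plt.show()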
    

img_gen_runnable = RunnableLambda(generate_and_display_image)



In [49]:
chain_four = (
    {"article": lambda x: x["article"]}
    | image_prompt
    | llm
    | (lambda x: x.content)
    | img_gen_runnable
)

In [50]:
chain_four.invoke({"article": article})

AI agent scraping Airbnb listings and summarizing Google reviews, powered by Bright Data, LangChain, and Google Gemini.


ImportError: Could not import openai python package. Please install it with `pip install openai`.
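
The error above appears only after the Gemini call has already run, because the `openai` import happens inside `DallEAPIWrapper`. A pre-flight check before invoking the chain can surface the problem earlier. The cell below is a minimal sketch using only the standard library; `missing_modules` and `missing_env_vars` are hypothetical helper names, and the `OPENAI_API_KEY` requirement is an assumption about the image step.

```python
import importlib.util
import os


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_env_vars(names):
    """Return the environment variable names that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


# Assumed requirements for the image-generation step.
modules_needed = ["openai"]
env_needed = ["OPENAI_API_KEY"]

not_installed = missing_modules(modules_needed)
not_set = missing_env_vars(env_needed)

if not_installed:
    print("Install before running chain_four:", ", ".join(not_installed))
if not_set:
    print("Set before running chain_four:", ", ".join(not_set))
```

Running this cell before `chain_four.invoke(...)` avoids spending a model call on a chain that cannot finish.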