# LLM-Powered Search Application with Ollama & LangChain

## Keyword-Based Searching & Summarization of Medium Articles Using LangChain and Ollama

### Installation of required packages

In [None]:
%pip install beautifulsoup4
%pip install langchain
%pip install langchain-google-community
%pip install langchain-ollama
%pip install python-dotenv
%pip install tqdm
%pip install pandas

### Import Required Packages

In [None]:
import requests
from bs4 import BeautifulSoup

from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

from langchain_core.tools import Tool
from langchain_google_community import GoogleSearchAPIWrapper

from tqdm import tqdm


### Load the environment variables needed in the notebook
Required environment variables: `GOOGLE_CSE_ID` and `GOOGLE_API_KEY`, which are used to connect to the Google API for searching Medium articles. `load_dotenv()` reads them from a `.env` file.

In [None]:
from dotenv import load_dotenv
load_dotenv()
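
A missing key otherwise only shows up later as a failed search request, so it may be worth checking the environment right after loading it. The sketch below assumes the two variable names listed above; `missing_env_vars` is a hypothetical helper, not part of python-dotenv or LangChain.

```python
import os

# Variable names taken from the text above; the helper itself is illustrative.
REQUIRED_VARS = ("GOOGLE_CSE_ID", "GOOGLE_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```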

The function below extracts metadata such as **author details**, **photo**, **title of the article**, **followers**, and **read time of the article** from the scraped Medium article.

In [None]:
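def extract_metadata(soup):
    """Collect article and author metadata from a parsed Medium page."""
    # These selectors depend on Medium's page markup. If an element is missing,
    # select_one/find returns None and the attribute access below raises an error.
    metadata = {
        "read_time": soup.select_one("span[data-testid=storyReadTime]").text,
        "published_date": soup.select_one("span[data-testid=storyPublishDate]").text,
        "author": soup.select_one("a[data-testid=authorName]").text,
        "title": soup.find('meta', {"property": "og:title"})['content'],
        "author_url": soup.find('meta', {"property": "article:author"})['content'],
        # Matched by the full class string, which is the most brittle selector here
        "followers": soup.select_one("span[class='pw-follower-count bf b bg z bk']").text,
        "author_image": soup.select_one("img[data-testid=authorPhoto]")['src']
    }
    return metadata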
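
Because `select_one` returns `None` when nothing matches, a single missing element makes the whole extraction fail. A more defensive lookup could look like the sketch below. `safe_text` and `safe_attr` are hypothetical helpers, and `soup` is assumed to be a `BeautifulSoup` object (any object with a compatible `select_one` method works).

```python
def safe_text(soup, selector, default=None):
    """Return the stripped text of the first match, or `default` if none."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


def safe_attr(soup, selector, attr, default=None):
    """Return an attribute of the first match, or `default` if missing."""
    node = soup.select_one(selector)
    if node is None:
        return default
    return node.get(attr, default)
```

For example, `safe_text(soup, "span[data-testid=storyReadTime]")` yields `None` instead of raising when the read time is absent, so the remaining fields can still be collected.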


The next cell creates a prompt template with LangChain's **ChatPromptTemplate** for summarizing the Medium articles scraped from the internet.<br>
It also uses **LangChain's chaining technique** (the `|` operator) to combine the prompt template with the LLM.<br><br>
Ensure that Ollama is running on your machine and that the `llama3.2` model has been pulled (`ollama pull llama3.2`). If Ollama is not running, you can start it by running `ollama serve` in a terminal.

In [None]:
prompt='''
You are an expert in summarizing an article from the web.
Your job is to precisely summarize the given article in an easy and readable way.
Remove any coding examples, code blocks, or links in the article.
Do not add any extra comments or try to explain anything.

article:
{input}

output: 

'''
promptTemplate = ChatPromptTemplate.from_template(prompt)
model = OllamaLLM(model="llama3.2")
chain = promptTemplate | model
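
Before invoking the chain, it can help to confirm that the local Ollama server is reachable. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`; `is_ollama_running` is a hypothetical helper and only checks that the server answers, not that the `llama3.2` model is available.

```python
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with 200 at `base_url`."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError.
        return False


if not is_ollama_running():
    print("Ollama does not appear to be running on localhost:11434.")
```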


The cell below uses **BeautifulSoup** to parse the pages behind the URLs obtained from the search. <br><br> The code fetches each Medium URL returned by the Google search and extracts metadata such as author details, title, follower count, and read time. It also generates a summary of the article using a locally running **LLaMA** model via **Ollama**. <br><br>Finally, the extracted data and the summary are compiled into a list of dictionaries, one per article, which maps directly onto a structured JSON response.

In [None]:
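def extract_data(urls):
    extracted_response = []
    # The headers never change, so build them once outside the loop
    headers = {"User-Agent": "Guest"}
    with tqdm(total=len(urls)) as pbar:
        for url in urls:
            # The timeout keeps one unresponsive page from stalling the whole run
            response = requests.get(url, headers=headers, timeout=30)

            if response.status_code == 200:    # request succeeded
                soup = BeautifulSoup(response.content, 'html.parser')
                metadata = extract_metadata(soup)
                metadata["summary"] = chain.invoke({"input": soup.get_text()})
                extracted_response.append(metadata)
            else:
                # Skip the page rather than reuse an undefined or stale soup
                print(f"Skipping {url}: HTTP {response.status_code}")
            pbar.update(1)
    return extracted_response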
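
`soup.get_text()` returns every piece of text on the page, including navigation labels and long runs of whitespace, and a long article may exceed the model's context window. One possible pre-processing step is sketched below. `prepare_article_text` is a hypothetical helper, and the `max_chars=8000` limit is an arbitrary illustrative value, not a figure tuned for `llama3.2`.

```python
import re


def prepare_article_text(raw_text, max_chars=8000):
    """Collapse runs of whitespace and cap the text at `max_chars` characters."""
    cleaned = re.sub(r"\s+", " ", raw_text).strip()
    return cleaned[:max_chars]
```

It would be used as `chain.invoke({"input": prepare_article_text(soup.get_text())})`.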


The next cell uses **GoogleSearchAPIWrapper**, a LangChain wrapper for performing Google searches. Pass it the query you want to search, and it returns the snippet, title, and link for each result.

In [None]:
def get_search_results(query, count=5):
    search = GoogleSearchAPIWrapper()

    def search_results(query):
        return search.results(query,count)

    tool = Tool(
        name="google_search",
        description="Search Google for recent results.",
        func=search_results
    )
    results = tool.run(query)
    for result in results:
        print(result)

    # Keep only entries that carry a link (a "no results" entry has none)
    urls = [article["link"] for article in results if "link" in article]
    return urls
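
Since each result is a dictionary with a `link` key, the list can be filtered before any page is fetched. The sketch below keeps only unique links hosted on `medium.com` or one of its subdomains. `keep_medium_links` is a hypothetical helper, and it would drop Medium publications served from custom domains.

```python
from urllib.parse import urlparse


def keep_medium_links(results, domain="medium.com"):
    """Return unique links whose host is `domain` or one of its subdomains."""
    urls = []
    for item in results:
        link = item.get("link")
        if not link:
            continue  # entry without a URL, e.g. a "no results" placeholder
        host = (urlparse(link).hostname or "").lower()
        on_domain = host == domain or host.endswith("." + domain)
        if on_domain and link not in urls:
            urls.append(link)
    return urls
```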

### Finally, it's time to run the application and review the results!

Make sure to put "site:medium.com" before the actual search keyword. This restricts the search to articles on the Medium site. Otherwise, it will search the entire internet.

In [None]:
query = "site:medium.com GenAI"


print("\n\nSearching URLs ......")
urls = get_search_results(query)

print("\n\nExtracting & summarizing the URLs ......")
response = extract_data(urls)
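
To avoid forgetting the `site:` prefix, the query string can be built by a small function. This is a minimal sketch; `build_site_query` is a hypothetical helper and not part of LangChain.

```python
def build_site_query(keyword, site="medium.com"):
    """Prefix the keyword with a `site:` filter so results stay on one site."""
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("keyword must not be empty")
    return f"site:{site} {keyword}"
```

With it, the query in the cell above could be written as `query = build_site_query("GenAI")`.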


### Wrapping the results in a pandas DataFrame to view them in a structured tabular format


In [None]:
import pandas as pd
pd.DataFrame(response)
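
Because `response` is a plain list of dictionaries, it can also be serialized as JSON, for instance to keep the summaries for later use. The sketch below uses only the standard library; `to_json` and its optional `path` argument are hypothetical.

```python
import json


def to_json(records, path=None):
    """Serialize the records to a JSON string and optionally write it to `path`."""
    text = json.dumps(records, indent=2, ensure_ascii=False)
    if path is not None:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
    return text
```

Calling `to_json(response, "articles.json")` would write the results to a file named `articles.json` (an illustrative file name).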

### Example of a summary generated by the local Llama 3.2 model

In [None]:
print(response[0]['summary'])

# Hurray! We have successfully built a Medium search application