
# Google Trends and Search API Application

This Jupyter notebook demonstrates the use of the Google Search API and Google Trends analysis.
It includes functionalities such as:

Section 1: Google Trends Analysis
- Visualizing search trends over different timeframes
- Comparing trends for multiple keywords
- Displaying trending regions for specific keywords

Section 2: Google Search with Filters
- Performing Google searches with various filters
- Summarizing text from search results

**Note:** Summarization uses a [**facebook/bart-large-cnn**](https://huggingface.co/facebook/bart-large-cnn) model from Hugging Face and may take a few seconds per request.

Each feature will be shown in individual sections with examples for easy understanding.



## 1. Google Trends Analysis

This section demonstrates how to display Google Trends data for a given keyword over multiple timeframes.


In [1]:
from pytrends.request import TrendReq
import plotly.graph_objects as go
import plotly.express as px

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

def show_trends(keywords, timeframes=['today 1-m', 'today 3-m', 'today 12-m', 'today 5-y']):
    """
    Display Google Trends data for given keywords across multiple timeframes.

    Args:
        keywords (list): List of up to three keywords to display trends for (e.g., ["bitcoin", "ethereum", "dogecoin"]).
        timeframes (list): List of timeframes to display trends for (e.g., 'today 1-m', 'today 5-y'). Default is 
                          ['today 1-m', 'today 3-m', 'today 12-m', 'today 5-y'].
                          - 'today 1-m': Data for the past month.
                          - 'today 3-m': Data for the past three months.
                          - 'today 12-m': Data for the past year.
                          - 'today 5-y': Data for the past five years.

    Returns:
        None: Displays the trend visualization using Plotly.
    """
    if len(keywords) == 0:
        print("Please provide at least one keyword.")
        return

    # Initialize pytrends request
    pytrends = TrendReq(hl='en-US', tz=360)

    # Store trends data
    trends_data = {}

    # Loop through each timeframe and get interest over time
    for timeframe in timeframes:
        pytrends.build_payload(keywords, timeframe=timeframe)
        data = pytrends.interest_over_time()
        if not data.empty:
            trends_data[timeframe] = data[keywords]

    # Plot data using Plotly for interactive visualization
    for timeframe, data in trends_data.items():
        print(f"Timeframe: {timeframe}")
        fig = go.Figure()
        for keyword in keywords:
            fig.add_trace(go.Scatter(x=data.index, y=data[keyword], mode='lines', name=f'{keyword}'))

        fig.update_layout(
            title=f'Google Trends for Keywords: {", ".join(keywords)} - Timeframe: {timeframe}',
            xaxis_title='Date',
            yaxis_title='Interest over time',
            legend_title='Keywords',
            template='plotly_white'
        )
        fig.show()

show trends by running the following code:

In [2]:

# Example: Show trends for the keyword "Python programming" and compare it with "Machine Learning" and "Data Science"
show_trends(["LLM", "Machine Learning", "Deep learning"])


Timeframe: today 1-m


Timeframe: today 3-m


Timeframe: today 12-m


Timeframe: today 5-y



### Regional Interest Map

This section demonstrates how to visualize the regions where a keyword is trending on a geographic map.


In [3]:

def show_trending_regions(keyword, timeframe='today 12-m'):
    """
    Visualize the regions where the keyword is trending on a geographic map.

    Args:
        keyword (str): The keyword to display regional interest for (e.g., "bitcoin").
        timeframe (str): Timeframe to display trends for (e.g., 'today 1-m', 'today 5-y'). Default is 'today 12-m'.

    Returns:
        None: Displays a geographical visualization using Plotly.
    """
    # Initialize pytrends request
    pytrends = TrendReq(hl='en-US', tz=360)
    
    # Get regional interest data
    pytrends.build_payload([keyword], timeframe=timeframe)
    data = pytrends.interest_by_region(resolution='COUNTRY', inc_low_vol=True, inc_geo_code=False)
    
    if not data.empty:
        data.reset_index(inplace=True)
        fig = px.choropleth(data, locations='geoName', locationmode='country names', color=keyword,
                            title=f'Regional Interest for "{keyword}"',
                            labels={keyword: 'Interest'},
                            template='plotly_white')
        fig.update_layout(
            geo=dict(showframe=False, showcoastlines=False, projection_type='equirectangular')
        )
        fig.show()
    else:
        print("No regional data available for the given keyword and timeframe.")


In [4]:

# Example: Show trending regions for the keyword "Python programming"
show_trending_regions("LLM", "today 12-m")



## 2. Google Search with Filters, and Summarization Pipeline

This section demonstrates how to perform a Google search with various filters like date range, content type, and domain filtering.


In [5]:
import os
import requests
from datetime import datetime
from bs4 import BeautifulSoup
import pandas as pd
from transformers import pipeline


# Initialize summarization pipeline
# Using the Facebook BART large CNN model for text summarization.
# More details about the model: https://huggingface.co/facebook/bart-large-cnn
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def google_search(query, api_key, cse_id, num=10, date_range="All time", content_type=None, domain_filter=None):
    """
    Perform a Google Custom Search with optional date range, content type, and domain filtering.

    Args:
        query (str): Search query string.
        api_key (str): Google JSON API key (must be obtained from Google Cloud Console).
        cse_id (str): Custom Search Engine (CSE) ID (created via Google Custom Search). 
        num (int): Number of results per request (max 10).
        date_range (str): Date range for the search (e.g., "Today", "This week", "This month", "Last three months", "This year", "Last 5 years", "All time").
        content_type (str): Type of content to filter (e.g., "image" to retrieve images).
        domain_filter (str): Domain to include or exclude (e.g., "site:wikipedia.org" to include Wikipedia or "-site:example.com" to exclude specific domains).

    Returns:
        dict: JSON response from the Google Custom Search API.
    """
    # Base URL for Google Custom Search API
    url = "https://www.googleapis.com/customsearch/v1"
    
    # Parameters for the search request
    params = {
        "key": api_key,
        "cx": cse_id,
        "q": query,
        "num": num  # Number of results per request (max 10)
    }

    # Define dateRestrict based on user-specified date range
    date_ranges = {
        "Today": "d1",             # Last day
        "This week": "w1",         # Last week
        "This month": "m1",        # Last month
        "Last three months": "m3", # Last three months
        "This year": "y1",         # Last year
        "Last 5 years": "y5",
        "All time": None            # No restriction
    }

    # Set date restriction if specified
    date_restrict = date_ranges.get(date_range)
    if date_restrict:
        params["dateRestrict"] = date_restrict
    
    # If sorting by date, add "sort" parameter to ensure recent results are prioritized
    if date_range != "All time":
        params["sort"] = "date:r"  # Sort by most recent if restricted

    # Add content type filtering based on user input (e.g., "image")
    if content_type == "image":
        params["searchType"] = "image"

    # Add domain filtering if specified (e.g., restrict to or exclude specific domains)
    if domain_filter:
        params["q"] += f" {domain_filter}"

    # Make the request to Google Custom Search API
    response = requests.get(url, params=params)
    if response.status_code == 200:
        results = response.json()
        
        # Summarize the texts if the search is not for images
        if content_type != "image":
            summarize_results(results)
        
        return results
    else:
        raise Exception(f"Error: {response.status_code}, {response.text}")

def summarize_results(results):
    """
    Summarize the text content of the output pages using primary and fallback mechanisms.

    Args:
        results (dict): The search results from Google Custom Search API. The "items" field contains the search results.
            Each item contains fields like "link", "title", "snippet", etc.

    Returns:
        None: Summaries are added directly to the "items" of the search results dictionary.
    
    The function fetches the webpage content for each result, extracts relevant text content, and uses the BART model for summarization.
    If the <p> tags are empty, it falls back to <div> tags to collect content.
    """
    summaries = []
    for item in results.get("items", []):
        link = item.get("link")
        if link:
            try:
                # print(f"Fetching content for link: {link}")  # Debug print
                # Fetch page content
                page_response = requests.get(link)
                if page_response.status_code == 200:
                    soup = BeautifulSoup(page_response.content, "html.parser")
                    paragraphs = soup.find_all('p')
                    # Extract text content from <p> tags
                    text_content = " ".join([p.get_text() for p in paragraphs])
                    # Limit the text content to avoid overwhelming the summarizer
                    text_content = text_content[:4000]
                    if not text_content:
                        # Fallback to <div> tags if <p> tags are empty
                        divs = soup.find_all('div')
                        text_content = " ".join([div.get_text() for div in divs])
                        text_content = text_content[:4000]
                    if text_content:
                        # Summarize the text content using the BART model
                        try:
                            summary_result = summarizer(text_content, max_length=130, min_length=30, do_sample=False)
                            if summary_result and len(summary_result) > 0:
                                summary = summary_result[0].get('summary_text', "No summary available")
                                item['summary'] = summary
                                summaries.append(summary)
                                # print(f"Summary for {link}: {summary}")  # Debug print to verify each summary
                            else:
                                print(f"Error summarizing content from {link}: No summary generated.")
                                item['summary'] = "No summary available"
                                summaries.append("No summary available")
                        except Exception as e:
                            print(f"Error during summarization for {link}: {e}")
                            item['summary'] = "No summary available"
                            summaries.append("No summary available")
            except Exception as e:
                print(f"Error summarizing content from {link}: {e}")
                item['summary'] = "No summary available"
                summaries.append("No summary available")
    
    # Debug print to check if summaries were collected
    print(f"Collected Summaries: {summaries}")





Test the google_search function:

In [6]:
# Example Usage of Google Search with Filters
api_key = "AIzaSyCHgmCrqOQO7sbYPC-b2qQ6fwj74EJd6f4"  # Replace with your Google JSON API Key
cse_id = "123e1726f2db04f43"    # Replace with Custom Search Engine (CSE) ID obtained from google
query = "LLM"

results = google_search(query, api_key, cse_id, num=3, date_range="Last three months", content_type="news", domain_filter="wikipedia.org")
print('''****''')
for item in results.get("items", []):
    title = item.get("title", "No title available")
    link = item.get("link", "No link available")
    snippet = item.get("snippet", "No snippet available")
    display_link = item.get("displayLink", "No domain available")
    summary = item.get("summary", "No summary available")
    
    print(f"Title: {title}")
    print(f"Link: {link}")
    print(f"Snippet: {snippet}")
    print(f"Domain: {display_link}")
    print(f"Summary: {summary}")
    print("-" * 80)

Collected Summaries: ['A large language model (LLM) is a type of computational model designed for natural language processing tasks such as language generation. The largest and most capable LLMs are artificial neural networks built with a decoder-only transformer-based architecture. Modern models can be fine-tuned for specific tasks, or be guided by prompt engineering.', 'Creatives are clamoring for a simple way to opt-out of their publicly published content from being used to train GenAI. If you can write a license that forbids “commercial use”, then you should be able to write a licence that forbids use in “training models’', 'A transformer is a deep learning architecture developed by researchers at Google and based on the multi-head attention mechanism. Text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. Each token is then contextualized within the scope of the context window with other (unma