!['google-header.png'](attachment:google-header.png)

## **Google Search Workbook**

A notebook showing how to pull Google Search results into Python in an easy-to-use JSON format. Use it for exploration, programmatic search, or downstream data analysis. The choice is yours!

## **Getting Started**

#### **1. Get API Key**
First, get your own Google [API key](https://developers.google.com/custom-search/v1/overview) so that you can make requests to the search engine service.

#### **2. Create a Search Engine**
Once you have an API key, create a [custom search engine](https://developers.google.com/custom-search/docs/tutorial/creatingcse). Make sure **"Search the entire web"** is enabled in the control panel. If it is not, your search engine will not return any results.

**Search Engine ID:** Copy this value from the control panel. You will need it in the next step.

#### **3. Set Env Variables**
```bash
export GOOGLE_API_KEY=YOUR_API_KEY
export GOOGLE_SEARCH_ENGINE=SEARCH_ENGINE_ID
```

#### **4. Search Google**
```python
# Create Google Client
gc = GoogleSearch(api_key=GOOGLE_API_KEY, 
                  search_engine=GOOGLE_SEARCH_ENGINE)

# Get Results
results = gc.get_results(q='coffee near me')

```


In [125]:
import pandas as pd
import numpy as np
import requests
from pathlib import Path
import os
import json
import pprint as pp
import uuid
import dotenv
from pprint import pprint
from itertools import chain

#### **Configuration**

The **[Google API Reference](https://developers.google.com/custom-search/v1/introduction)** documents the full API, including dozens of parameters for further customizing searches.
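
As a rough illustration of passing extra parameters, the sketch below merges optional keyword arguments into the base request parameters, much as the wrapper class in this notebook does. `num`, `dateRestrict`, `siteSearch`, and `siteSearchFilter` are parameters listed in the API reference; the `build_params` helper and the placeholder key and engine values are assumptions made for this example.

```python
from urllib.parse import urlencode

API_URL = 'https://www.googleapis.com/customsearch/v1'


def build_params(api_key, search_engine, **kwargs):
    """Hypothetical helper: merge extra search options into the base parameters."""
    params = {'key': api_key, 'cx': search_engine, 'start': 1}
    params.update(kwargs)
    return params


# Placeholder credentials, not real values
params = build_params('YOUR_API_KEY', 'SEARCH_ENGINE_ID',
                      q='coffee near me',
                      num=5,                   # results per page (the API allows 1 to 10)
                      dateRestrict='m1',       # only results from the past month
                      siteSearch='yelp.com',   # site to include or exclude
                      siteSearchFilter='i')    # 'i' includes the site, 'e' excludes it

# Full request URL that would be sent; no network call is made here
print(f'{API_URL}?{urlencode(params)}')
```

The same keyword arguments could be passed straight to `gc.get_results(...)`, since it forwards any extra keys to the API.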


In [60]:
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
GOOGLE_SEARCH_ENGINE = os.getenv('GOOGLE_SEARCH_ENGINE')
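
`os.getenv` returns `None` when a variable is not set, so a missing key would only surface later as a failed request. A minimal sketch of failing early instead is shown below; `require_env` is a hypothetical helper name, not part of this notebook or of any library.

```python
import os


def require_env(name):
    """Hypothetical helper: return an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f'Environment variable {name} is not set')
    return value


# Demo values so the sketch runs on its own; real values come from the shell exports above
os.environ.setdefault('GOOGLE_API_KEY', 'YOUR_API_KEY')
os.environ.setdefault('GOOGLE_SEARCH_ENGINE', 'SEARCH_ENGINE_ID')

GOOGLE_API_KEY = require_env('GOOGLE_API_KEY')
GOOGLE_SEARCH_ENGINE = require_env('GOOGLE_SEARCH_ENGINE')
```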

#### **Google Search Wrapper**
A thin wrapper around the Google Custom Search API that provides easy-to-use access to its core functionality.

In [128]:
class GoogleSearch:
    '''
    Creates wrapper class around Google Custom API
    that returns JSON response of search result items
    
    Params:
    api_key - String of API key acquired from Google API 
    search_engine - String of custom engine (cx) identifier from console
    '''
    def __init__(self, api_key, search_engine):
        self.api_key = api_key
        self.engine = search_engine
        self.api_url = 'https://www.googleapis.com/customsearch/v1?'
        self.has_next = True
        self.search_stats = None
        self.search_terms = None
        
    def get_results(self, **kwargs):
        '''
        Gets the JSON search results meeting
        the search criteria
        
        Params:
        kwargs - Any valid key/value combination from Google API
                 https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
        
        Returns:
        results - Flat list of Google Search items (empty if none were returned).
                  Search metadata is stored on self.search_stats and self.search_terms
        
        '''
        
        params = {'key': self.api_key,
                  'cx': self.engine,
                  'start': 1}
        params.update(kwargs)
        
        results = []
        
        # Reset paging state so the same client can run more than one search
        self.has_next = True
        
        while self.has_next:
            res = requests.get(self.api_url, params=params)
            res = res.json()
            
            # Get results from search ('items' is missing when a page has no results)
            results.append(res.get('items', []))
            
            # Keep the search metadata reported with the first page
            if params['start'] == 1:
                self.search_stats = res['searchInformation']
                self.search_terms = res['queries']['request'][0]
            
            # Move to the next page, or stop when the API lists none
            next_page = res.get('queries', {}).get('nextPage')
            if next_page:
                params['start'] = next_page[0]['startIndex']
            else:
                self.has_next = False
        
        return list(chain(*results))
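
The paging logic in `get_results` follows the `startIndex` that the API reports under `queries.nextPage` until no next page is listed. The sketch below isolates that loop so it can be tried without an API key. `iter_pages` and `fake_fetch` are hypothetical names, and the canned responses only mimic the response fields used above.

```python
def iter_pages(fetch, params):
    """Yield each page of items, following queries.nextPage until it is absent."""
    params = dict(params)  # copy so the caller's dict is not modified
    while True:
        res = fetch(params)
        yield res.get('items', [])
        next_page = res.get('queries', {}).get('nextPage')
        if not next_page:
            break
        params['start'] = next_page[0]['startIndex']


# Canned two-page response keyed by start index, standing in for the real API
FAKE_PAGES = {
    1: {'items': [{'title': 'a'}, {'title': 'b'}],
        'queries': {'nextPage': [{'startIndex': 3}]}},
    3: {'items': [{'title': 'c'}],
        'queries': {}},
}


def fake_fetch(params):
    """Stand-in for requests.get(...).json() that needs no network access."""
    return FAKE_PAGES[params['start']]


pages = list(iter_pages(fake_fetch, {'q': 'demo', 'start': 1}))
titles = [item['title'] for page in pages for item in page]
print(titles)  # prints ['a', 'b', 'c']
```

Passing the fetch function in as an argument keeps the loop independent of `requests`, which makes the stopping condition easy to check in isolation.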

#### **Create Google Client**

In [163]:
# Create Google Client
gc = GoogleSearch(api_key=GOOGLE_API_KEY, search_engine=GOOGLE_SEARCH_ENGINE)

#### **Search Results**

In [164]:
search_results = gc.get_results(q='Covid-19 Cures')

In [165]:
print(f'{len(search_results)} Google Results Returned...')

100 Google Results Returned...


In [166]:
search_stats = gc.search_stats
search_terms = gc.search_terms
print(search_stats)
print(search_terms)

{'searchTime': 0.561918, 'formattedSearchTime': '0.56', 'totalResults': '251000000', 'formattedTotalResults': '251,000,000'}
{'title': 'Google Custom Search - Covid-19 Cures', 'totalResults': '251000000', 'searchTerms': 'Covid-19 Cures', 'count': 10, 'startIndex': 1, 'inputEncoding': 'utf8', 'outputEncoding': 'utf8', 'safe': 'off', 'cx': '007553850390062762624:iio37zxu5e3'}
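
Note that `totalResults` comes back as a string in the output above. A small sketch of converting it for numeric work follows, using the values printed above; the variable names `total_results` and `returned` are illustrative.

```python
# Values copied from the printed search_stats above
search_stats = {'searchTime': 0.561918,
                'formattedSearchTime': '0.56',
                'totalResults': '251000000',
                'formattedTotalResults': '251,000,000'}

# totalResults is a string, so convert it before doing arithmetic
total_results = int(search_stats['totalResults'])
returned = 100  # number of results retrieved in the run above

print(f'Retrieved {returned} of an estimated {total_results:,} results')
```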


#### **Data Cleanup**
##### **Search Results Dataframe**

In [167]:
search_df = pd.DataFrame(search_results)
search_df.drop(columns=['kind', 'htmlTitle', 'htmlSnippet', 'htmlFormattedUrl'], inplace=True)
search_df.head()

| | title | link | displayLink | snippet | cacheId | formattedUrl | pagemap | mime | fileFormat |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Where We're at with Vaccines and Treatments fo... | https://www.healthline.com/health-news/heres-e... | www.healthline.com | 2 days ago ... They say the drug could potenti... | 3ipdFSvg4JEJ | https://www.healthline.com/.../heres-exactly-w... | {'cse_thumbnail': [{'src': 'https://encrypted-... | | |
| 1 | COVID-19: New drug candidates, treatments offe... | https://www.medicalnewstoday.com/articles/covi... | www.medicalnewstoday.com | May 15, 2020 ... We review the latest evidence... | BAPywsS2JukJ | https://www.medicalnewstoday.com/.../covid-19-... | {'cse_thumbnail': [{'src': 'https://encrypted-... | | |
| 2 | Treatments for COVID-19 - Harvard Health | https://www.health.harvard.edu/diseases-and-co... | www.health.harvard.edu | Mar 24, 2020 ... Currently there is no specifi... | RTid9N4wCZ8J | https://www.health.harvard.edu/diseases-and...... | {'cse_thumbnail': [{'src': 'https://encrypted-... | | |
| 3 | Coronavirus (COVID-19) Update: FDA Issues Emer... | https://www.fda.gov/news-events/press-announce... | www.fda.gov | May 1, 2020 ... FDA has issued emergency use a... | xd0ZL6Cuef8J | https://www.fda.gov/.../coronavirus-covid-19-u... | {'cse_thumbnail': [{'src': 'https://encrypted-... | | |
| 4 | Gilead data suggests coronavirus patients are ... | https://www.statnews.com/2020/04/16/early-peek... | www.statnews.com | Apr 16, 2020 ... A Chicago hospital treating s... | 4Pnz6T5fVJYJ | https://www.statnews.com/.../early-peek-at-dat... | {'hcard': [{'fn': 'Kathleen', 'nickname': 'Kat... | | |
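
The `pagemap` column holds nested dictionaries rather than flat values. As a sketch of pulling one field out of it, the helper below returns the first `cse_thumbnail` source URL when one exists. `first_thumbnail` is a hypothetical name, and the sample rows use made-up URLs in the shape shown in the table above.

```python
def first_thumbnail(pagemap):
    """Return the first cse_thumbnail 'src' URL, or None if it is missing."""
    if not isinstance(pagemap, dict):  # e.g. NaN for rows without a pagemap
        return None
    thumbnails = pagemap.get('cse_thumbnail') or []
    return thumbnails[0].get('src') if thumbnails else None


# Made-up rows mirroring the structure of the pagemap column
sample_pagemaps = [
    {'cse_thumbnail': [{'src': 'https://example.com/thumb.png'}]},
    {'hcard': [{'fn': 'Kathleen'}]},   # no thumbnail present
    float('nan'),                      # missing pagemap
]

print([first_thumbnail(pm) for pm in sample_pagemaps])
```

With the DataFrame above, this could be applied as `search_df['pagemap'].apply(first_thumbnail)`.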


##### **Search Terms Dataframe**

In [168]:
search_terms_df = pd.DataFrame(search_terms, index=[0])
search_terms_df.drop(columns=['count', 'startIndex', 'inputEncoding', 'outputEncoding','safe', 'cx'], inplace=True)
search_terms_df.head()

| | title | totalResults | searchTerms |
|---|---|---|---|
| 0 | Google Custom Search - Covid-19 Cures | 251000000 | Covid-19 Cures |
