### Google Names Batch Search

By: Shirsho Dasgupta (2021) 

The Miami Herald often works on investigations based on corporate records, some public and some leaked. These records often contain lists of companies, their owners, directors and other officers; where they do not, reporters can compile such lists themselves.

This project automates a first-pass search to find out who these people are.

The code imports a spreadsheet with a list of names, then searches for each name on Google. It then extracts the first few lines about that person that come up as a flashcard in a regular Google search.

A short example is attached.

The imported file is names.csv.

The resulting file is search_results.csv.

##### Notes:

1. This search is only to be used as a starting point. The results are not fully confirmed. One way to confirm an identity is to match dates of birth or photos.

2. Sending Google too many queries in quick succession can cause it to flag the script as a bot and block access. Break the searches into smaller batches and pause between iterations.
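
One way to follow the second note is to split the list into small batches and pause for a randomized interval between requests. The sketch below is a minimal illustration and not part of the original notebook; the helper names `batched` and `polite_pause`, the batch size, and the delay values are assumptions chosen for the example.

```python
import random
import time


def batched(items, batch_size):
    """Yield successive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def polite_pause(base_seconds, jitter_seconds, sleep=time.sleep):
    """Sleep for base_seconds plus a random extra of up to jitter_seconds.

    Returns the delay used; `sleep` can be swapped out when testing.
    """
    delay = base_seconds + random.uniform(0, jitter_seconds)
    sleep(delay)
    return delay


# hypothetical batch size of 2, using the names from the attached example
names = ["Donald Trump", "Mark Zuckerberg", "Tony Blair", "Joe Biden", "Steve Jobs"]
batches = list(batched(names, 2))
```

In a real run, one might call `polite_pause` after every request and use a longer pause between batches. Suitable values depend on how Google responds and have to be found by trial.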

### Importing libraries

In [1]:
import requests
import bs4
import pandas as pd
import time

### Importing spreadsheet for batch of names to be searched

In [2]:
searchlist = pd.read_csv("names.csv")  
searchlist.head(5)

Unnamed: 0,Names
0,Donald Trump
1,Mark Zuckerberg
2,Tony Blair
3,Joe Biden
4,Steve Jobs


### Preparing dataframe and running search

In [3]:
## adding columns to be filled in from google
searchlist["Googled_Names"] = " "
searchlist["Descriptor_1"] = " "
searchlist["Descriptor_2"] = " "
searchlist["Descriptor_3"] = " "
searchlist["Descriptor_4"] = " "
searchlist["Descriptor_5"] = " "

In [4]:
## storing number of rows in the spreadsheet
rows = searchlist.shape[0] 

## setting up loop to run through each row
for i in range(0, rows):
    
    ## joining the words in a cell with "+" to match the google search url pattern
    txt = searchlist["Names"][i]
    terms = "+".join(txt.split())
    
    ## storing url
    url = "https://google.com/search?q=" + terms
    
    ## getting url and converting for scrape
    request_result = requests.get(url)
    soup = bs4.BeautifulSoup(request_result.text, "html.parser")
    
    ## setting up exception handling: if there is a result the search details are stored, if not, the loop moves on to the next row
    try:
        
        ## finds "div" tag and the class that stores the names and descriptors; note: this sometimes changes and should be checked and modified accordingly
        names = soup.find_all("div", class_="BNeawe")
        
        ## relevant information is generally stored in the first seven cells (the cell at index 4 is not used)
        ## writes results into the relevant results column; .loc avoids pandas chained-assignment problems
        searchlist.loc[i, "Googled_Names"] = names[0].text
        searchlist.loc[i, "Descriptor_1"] = names[1].text
        searchlist.loc[i, "Descriptor_2"] = names[2].text
        searchlist.loc[i, "Descriptor_3"] = names[3].text
        searchlist.loc[i, "Descriptor_4"] = names[5].text
        searchlist.loc[i, "Descriptor_5"] = names[6].text
    except IndexError:
        ## fewer cells than expected: keeps whatever was written and moves on to the next row
        pass
        
    ## pausing between requests reduces the chance that google flags the script as a bot and blocks access
    time.sleep(0.2)
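
Joining words with "+" works for plain names, but names containing characters such as "&" or accented letters need percent-encoding to survive in a query string. The sketch below shows one possible approach using the standard library's `urllib.parse.quote_plus`; the helper name `build_search_url` is hypothetical and not part of the original notebook.

```python
from urllib.parse import quote_plus


def build_search_url(name):
    """Return a Google search URL for `name`, with the query percent-encoded."""
    # collapse runs of whitespace to single spaces, as str.split() does in the loop
    query = " ".join(name.split())
    # quote_plus turns spaces into "+" and escapes reserved characters such as "&"
    return "https://google.com/search?q=" + quote_plus(query)


plain_url = build_search_url("Tony  Blair")
special_url = build_search_url("Johnson & Johnson")
```

For plain names this produces the same URL as the loop above, so it could stand in for the manual join if company names with punctuation appear in the list.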

In [5]:
## displaying results
searchlist.head(5)

Unnamed: 0,Names,Googled_Names,Descriptor_1,Descriptor_2,Descriptor_3,Descriptor_4,Descriptor_5
0,Donald Trump,Donald Trump,45th U.S. President · donaldjtrump.com,"Born: June 14, 1946 (age 75 years), Jamaica Ho...",Height: 6′ 3″,Party: Republican Party,"Spouse: Melania Trump (m. 2005), Marla Maples ..."
1,Mark Zuckerberg,Mark Zuckerberg,Chief Executive Officer of Facebook,View all,Mark Elliot Zuckerberg is an American media ma...,Net worth: 122.7 billion USD (2021),"Born: May 14, 1984 (age 37 years), White Plain..."
2,Tony Blair,Tony Blair,Former Prime Minister of the United Kingdom,Anthony Charles Lynton Blair is a British poli...,Anthony Charles Lynton Blair is a British poli...,Height: 6′ 0″,Spouse: Cherie Blair (m. 1980)
3,Joe Biden,Joe Biden,46th U.S. President · whitehouse.gov,Joseph Robinette Biden Jr. is an American poli...,Joseph Robinette Biden Jr. is an American poli...,"Born: November 20, 1942 (age 78 years), Scrant...",Height: 6′ 0″
4,Steve Jobs,Steve Jobs,American business magnate,View all,Steven Paul Jobs was an American business magn...,"Born: February 24, 1955, San Francisco, CA","Died: October 5, 2011, Palo Alto, CA"
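
The sample output shows two kinds of noise: the literal text "View all" and the same summary repeated in adjacent descriptor columns. A small post-processing step could filter these out before export. The sketch below is only an assumption about how one might do that; the helper name `clean_descriptors` and the tuple of noise strings are hypothetical, and the input row is shortened, illustrative text rather than real search output.

```python
def clean_descriptors(texts, noise=("View all",)):
    """Drop noise strings, blanks, and repeated entries, keeping first-seen order."""
    cleaned = []
    for text in texts:
        text = text.strip()
        # skip empty cells, known noise, and anything already kept
        if not text or text in noise or text in cleaned:
            continue
        cleaned.append(text)
    return cleaned


# illustrative values modeled on the shape of the rows above
row = [
    "Former Prime Minister",
    "Anthony Blair is a politician",
    "Anthony Blair is a politician",
    "View all",
    "Height: 6 ft",
]
result = clean_descriptors(row)
```

Because the cleaned list can be shorter than five entries, any code that writes it back to the fixed descriptor columns would need to pad the remaining cells with blanks.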


### Exporting spreadsheet

In [6]:
searchlist.to_csv("search_results.csv", index = False)