# <font  color = "blue"> Google Trends Coordinates Extraction

## <font  color = "black"> 1. Defining the Question

### a) Specifying the Data Analytic Question
Google Trends is a search trends feature that shows how frequently a given search term is entered into Google’s search engine relative to the site’s total search volume over a given period of time. Google Trends can be used for comparative keyword research and to discover event-triggered spikes in keyword search volume.

Google Trends provides keyword-related data including search volume index and geographical information about search engine users.

### b) Defining the Metric for Success
Success in this challenge means extracting the latitude and longitude of each region in the Google Trends data so that the results can then be mapped.

### c) Understanding the context 
You can enter a search term into the search box at the top of the tool to see how search volume has varied for that term over time and in different locations. Change the location, time frame, category or industry, and type of search (web, news, shopping, or YouTube) for more fine-grained data.

Google Trends uses only a sample of Google searches, which is sufficient because Google handles billions of searches per day. The entire data set would be too large to process quickly. Sampling yields a dataset that is representative of all Google searches and can surface insights within minutes of an event happening in the real world.

Google Trends normalizes search data to make comparisons between terms easier. Search results are normalized to the time and location of a query by the following process:

Each data point is divided by the total searches of the geography and time range it represents to compare relative popularity. Otherwise, places with the most search volume would always be ranked highest.

The resulting numbers are then scaled on a range of 0 to 100 based on a topic’s proportion to all searches on all topics.

Different regions that show the same search interest for a term don't always have the same total search volumes.
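The two normalization steps above can be illustrated with a toy calculation. This is a minimal sketch and not Google's actual implementation: the region names and search counts are invented, the helper names `relative_popularity` and `scale_to_100` are hypothetical, and the scaling step assumes the largest share is mapped to 100.

```python
# Toy illustration of the two normalization steps described above.
# All numbers are invented; this is not Google's actual implementation.

def relative_popularity(term_searches, total_searches):
    """Step 1: share of a region's searches that were for the term."""
    return term_searches / total_searches


def scale_to_100(shares):
    """Step 2 (assumed): rescale so that the largest share becomes 100."""
    peak = max(shares.values())
    return {region: round(100 * share / peak) for region, share in shares.items()}


# Hypothetical raw counts: (searches for the term, all searches in the region)
raw = {
    "Region A": (50_000, 10_000_000),
    "Region B": (2_000, 200_000),
}

shares = {region: relative_popularity(t, n) for region, (t, n) in raw.items()}
scores = scale_to_100(shares)
print(scores)
```

Region A has far more searches for the term in absolute numbers, yet Region B receives the higher score because the term makes up a larger share of its searches. This is why raw volume alone does not decide the ranking.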

## <font  color = "black"> 2. Installing Libraries

In [None]:
# Installing chrome driver and library updates

!apt-get update 
!apt install chromium-chromedriver

# Installing webdriver manager

!pip install webdriver-manager

#  Installing selenium and folium
!pip install selenium
!pip install folium


In [None]:
# Importing folium and selenium 

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import folium


# Importing tqdm, a progress-bar package for Python.
# It is useful for estimating how long the web scraping part of the code will take

from tqdm.notebook import tqdm as tqdmn

# Importing pandas
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)



# Importing os, StringIO and requests to extract data from source
import os
from io import StringIO
import requests

## <font  color = "black"> 3. Reading Data from Source (Drive)

In [None]:
# Data url
google_trends = 'https://drive.google.com/file/d/1AHrXRp4x4MxKRALN5ImArKRATaaRUQ6U/view?usp=sharing'


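# Function to read CSV data from a Google Drive sharing URL

def read_csv(url):
  # The file ID is the second-to-last path segment of the sharing URL
  url = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
  response = requests.get(url)
  response.raise_for_status()  # fail early if the download did not succeed
  return StringIO(response.text)

# Loading the CSV data into a dataframe

df = pd.read_csv(read_csv(google_trends), skiprows=2)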


## <font  color = "black">4. Checking the Data

In [None]:
# Previewing data

df.head(3)

Unnamed: 0,Region,Youth: (6/10/19 - 6/10/20),Love: (6/10/19 - 6/10/20),Education: (6/10/19 - 6/10/20),Mental health: (6/10/19 - 6/10/20),Social media: (6/10/19 - 6/10/20)
0,Hawaii,6%,60%,23%,6%,5%
1,Mississippi,7%,55%,27%,7%,4%
2,New Jersey,6%,56%,27%,7%,4%


## <font  color = "black">5. Creating URLs for Each Location

In [None]:
# Creating column containing all the URLs we want to crawl

df['Url'] = ['https://www.google.com/maps/search/' + i for i in df['Region'] ]

In [None]:
# Previewing the data

df.head()

Unnamed: 0,Region,Youth: (6/10/19 - 6/10/20),Love: (6/10/19 - 6/10/20),Education: (6/10/19 - 6/10/20),Mental health: (6/10/19 - 6/10/20),Social media: (6/10/19 - 6/10/20),Url
0,Hawaii,6%,60%,23%,6%,5%,https://www.google.com/maps/search/Hawaii
1,Mississippi,7%,55%,27%,7%,4%,https://www.google.com/maps/search/Mississippi
2,New Jersey,6%,56%,27%,7%,4%,https://www.google.com/maps/search/New Jersey
3,California,7%,59%,21%,8%,5%,https://www.google.com/maps/search/California
4,Georgia,7%,58%,25%,6%,4%,https://www.google.com/maps/search/Georgia
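Region names such as New Jersey contain spaces, which end up unencoded in the search URL. Browsers usually tolerate this, but percent-encoding the name is safer. The following is a minimal sketch using the standard library's `urllib.parse.quote`; the helper name `maps_search_url` and the constant `BASE_URL` are hypothetical and not part of the original notebook.

```python
from urllib.parse import quote

BASE_URL = 'https://www.google.com/maps/search/'


def maps_search_url(region):
    """Build a Google Maps search URL with the region name percent-encoded."""
    return BASE_URL + quote(region)


print(maps_search_url('New Jersey'))
# https://www.google.com/maps/search/New%20Jersey
```

Assuming the dataframe from the cells above, the column could then be built with `df['Url'] = [maps_search_url(i) for i in df['Region']]`.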


## <font  color = "black">6. Iterating URLs Through Google Maps to Get Coordinates

In [None]:
# Here we create an empty list to which we shall append the URLs with coordinates

from selenium.webdriver.common.by import By

Url_With_Coordinates = []


# Options to run Chrome headless (no visible window) inside the notebook environment;
# --no-sandbox and --disable-dev-shm-usage avoid common crashes in containers

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

# A single driver is enough; chromedriver is assumed to be on the PATH
driver = webdriver.Chrome(options=chrome_options)

# Storing the url with coordinates in a list

for url in tqdmn(df.Url, leave=False):
    driver.get(url)
    # The page's image meta tag holds a static-map URL whose `center` parameter carries the coordinates
    Url_With_Coordinates.append(driver.find_element(By.CSS_SELECTOR, 'meta[itemprop=image]').get_attribute('content'))

# quit() closes the browser and ends the chromedriver process
driver.quit()


In [None]:
# Preview of url with coordinates

Url_With_Coordinates[:5]

['https://maps.google.com/maps/api/staticmap?center=20.46%2C-157.505&zoom=7&size=256x256&language=en&sensor=false&client=google-maps-frontend&signature=zeU0_pnAVWi-jnyJNaZFoJAhP2A',
 'https://maps.google.com/maps/api/staticmap?center=32.57110255%2C-89.87644845&zoom=7&size=256x256&language=en&sensor=false&client=google-maps-frontend&signature=mlb2fpXUwvafZj2A7zmGTyf_JRU',
 'https://maps.google.com/maps/api/staticmap?center=40.07304%2C-74.72432305&zoom=8&size=256x256&language=en&sensor=false&client=google-maps-frontend&signature=AGc2oMLX_-rrFEup0uyOvEjiSxs',
 'https://maps.google.com/maps/api/staticmap?center=37.26917445%2C-119.306607&zoom=6&size=256x256&language=en&sensor=false&client=google-maps-frontend&signature=TT0fHIZyKC0Fpc-vt6sR05B6Ylg',
 'https://maps.google.com/maps/api/staticmap?center=32.67812485%2C-83.17829695&zoom=7&size=256x256&language=en&sensor=false&client=google-maps-frontend&signature=JAbdIU8s8WQK64I4_PXhLD_6oF4']

In [None]:
# Adding the URLs with coordinates to our dataframe as a new column

df['Url_With_Coordinates'] = Url_With_Coordinates

## <font  color = "black">7. Separating Longitudes from Latitudes

In [None]:
# Extracting Latitude from URL with coordinates
df['lat'] = [ url.split('?center=')[1].split('&zoom=')[0].split('%2C')[0] for url in df['Url_With_Coordinates'] ]

# Extracting Longitude from URL with coordinates
df['long'] = [url.split('?center=')[1].split('&zoom=')[0].split('%2C')[1] for url in df['Url_With_Coordinates'] ]

In [None]:
# Previewing Output

df.head()

Unnamed: 0,Region,Youth: (6/10/19 - 6/10/20),Love: (6/10/19 - 6/10/20),Education: (6/10/19 - 6/10/20),Mental health: (6/10/19 - 6/10/20),Social media: (6/10/19 - 6/10/20),Url,Url_With_Coordinates,lat,long
0,Hawaii,6%,60%,23%,6%,5%,https://www.google.com/maps/search/Hawaii,https://maps.google.com/maps/api/staticmap?cen...,20.46,-157.505
1,Mississippi,7%,55%,27%,7%,4%,https://www.google.com/maps/search/Mississippi,https://maps.google.com/maps/api/staticmap?cen...,32.57110255,-89.87644845
2,New Jersey,6%,56%,27%,7%,4%,https://www.google.com/maps/search/New Jersey,https://maps.google.com/maps/api/staticmap?cen...,40.07304,-74.72432305
3,California,7%,59%,21%,8%,5%,https://www.google.com/maps/search/California,https://maps.google.com/maps/api/staticmap?cen...,37.26917445,-119.306607
4,Georgia,7%,58%,25%,6%,4%,https://www.google.com/maps/search/Georgia,https://maps.google.com/maps/api/staticmap?cen...,32.67812485,-83.17829695
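The chained `split` calls above depend on the exact order of the query parameters and leave the coordinates as strings. As an alternative, here is a minimal sketch that parses the `center` parameter with the standard library and returns floats. The helper name `center_from_static_map` is hypothetical, and the sample URL is a shortened form of the first URL in the output above.

```python
from urllib.parse import urlsplit, parse_qs


def center_from_static_map(url):
    """Return (latitude, longitude) as floats from a static-map URL."""
    # parse_qs decodes percent-escapes, so '%2C' becomes ','
    query = parse_qs(urlsplit(url).query)
    lat, lon = query['center'][0].split(',')
    return float(lat), float(lon)


# Shortened sample URL (signature and other parameters omitted)
sample = 'https://maps.google.com/maps/api/staticmap?center=20.46%2C-157.505&zoom=7&size=256x256'
print(center_from_static_map(sample))
# (20.46, -157.505)
```

Numeric values are also what a mapping library would need later, whereas the string columns above would have to be converted first.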


In [None]:
# Saving Output as CSV

df.to_csv('google_trends_with_coordinates.csv')