# Google Trends Data

This notebook retrieves Google Trends data using [pytrends](https://github.com/GeneralMills/pytrends), an unofficial API. Thank you to the creators!

In [2]:
# import all necessary packages
import pandas as pd
import numpy as np
import lxml
import requests
import requests_cache
import pytrends
from pytrends.request import TrendReq
from bs4 import BeautifulSoup
from collections import Counter
from matplotlib import pyplot as plt
plt.style.use('ggplot')
requests_cache.install_cache("cache")

# required info
google_username = "***@gmail.com"
google_password = "***"

ImportError: No module named 'pytrends'

In [3]:
# Log in to Google. This only needs to run once; the remaining requests reuse the same session.
pytrend = TrendReq(google_username, google_password, custom_useragent=None)

For our project, we looked at the search terms for five companies from different sectors: Walmart, Goldman Sachs, Exxon Mobil, Facebook, and Nike. We restricted the data to searches from the US over the last three months.

We first tried to request all the search terms at once in a single payload, but that gave us different results than requesting each term separately. The reason is that when all the terms are sent together, the returned trends data compares the terms' popularity against each other. This is not what we wanted, so we decided to request each term separately.
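The difference between the two approaches can be illustrated with a small sketch. It assumes a simplified model in which values are scaled to 0-100 relative to the highest point in the request; the raw volumes and the names `term_a` and `term_b` are made up for illustration and are not real Google data.

```python
import pandas as pd

# Made-up raw search volumes, purely for illustration (not real Google data).
raw = pd.DataFrame(
    {"term_a": [200, 400, 100], "term_b": [8, 20, 40]},
    index=pd.to_datetime(["2016-12-01", "2016-12-02", "2016-12-03"]),
)

# One payload: every value is scaled against the single largest value overall.
joint = (raw / raw.to_numpy().max() * 100).round().astype(int)

# Separate payloads: each term is scaled against its own maximum.
separate = (raw / raw.max() * 100).round().astype(int)

print(joint)
print(separate)
```

Under this model, the less popular term is squeezed toward zero when scaled jointly, while its own shape over time is preserved when scaled on its own.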

**A side note:** According to the [Google Trends Search Tips](https://support.google.com/trends/answer/4359582?hl=en) page, searching a term like *tennis shoes* returns "searches containing both tennis and shoes in any order. Results can also include searches like red tennis shoes, funny shoes for tennis, or tennis without shoes." We therefore chose terms like *walmart stock price*, so that each word is considered along with the many combinations of these words.
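As a rough illustration of the matching rule quoted above, here is a toy model in which a query matches when it contains every word of the search term, in any order. The function `matches_all_words` is hypothetical; Google's actual matching logic is not public and is likely more involved.

```python
def matches_all_words(search_term, query):
    """Return True if every word of search_term appears in query, in any order."""
    query_words = set(query.lower().split())
    return all(word in query_words for word in search_term.lower().split())


# Queries taken from the Search Tips example, plus one that should not match.
for query in ["red tennis shoes", "funny shoes for tennis",
              "tennis without shoes", "tennis racket"]:
    print(query, "->", matches_all_words("tennis shoes", query))
```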

In [27]:
# list of search terms
search_list = ['walmart stock price', 'goldman sachs stock price', 'exxon mobile stock price', 'facebook stock price', 'nike stock price']

# get the comparison search data
pytrend.build_payload(kw_list = search_list, geo = 'US', timeframe = 'today 3-m')
compare_df = pytrend.interest_over_time()
compare_df.head()

| date | walmart stock price | goldman sachs stock price | exxon mobile stock price | facebook stock price | nike stock price |
|---|---|---|---|---|---|
| 2016-12-01 | 32 | 3 | 0 | 42 | 15 |
| 2016-12-02 | 33 | 5 | 0 | 50 | 22 |
| 2016-12-03 | 7 | 7 | 0 | 19 | 3 |
| 2016-12-04 | 14 | 5 | 0 | 22 | 3 |
| 2016-12-05 | 27 | 5 | 0 | 43 | 20 |


Because of the way the API is structured, we had to build five separate payloads, get an individual data frame from each, and then combine them into a single data frame. Notice that the numbers are different from the data frame above!

Before that, we quickly needed to change the search term list into a list of lists with each term in its own list. This is because the API requires the keywords to be in a list format.

In [42]:
# wrap a single search term in its own list
def listit(t):
    return [t]

In [47]:
search_list2 = [listit(t) for t in search_list]
search_list2

[['walmart stock price'],
 ['goldman sachs stock price'],
 ['exxon mobile stock price'],
 ['facebook stock price'],
 ['nike stock price']]

In [36]:
# function to get each company's data frame separately
def get_comp_df(term):
    pytrend.build_payload(kw_list = term, geo = 'US', timeframe = 'today 3-m')
    new_df = pytrend.interest_over_time()
    return new_df

In [1]:
# combine all data frames by column, since they share the same date index
all_df = pd.concat([get_comp_df(search_term) for search_term in search_list2], axis = 1)
print('Relative frequency of stock related search terms for different companies over time.')
print('(Note: results are normalized independently within each column, as a percent of the maximum value found in the given time period for that search term.)')
all_df.head()

NameError: name 'pd' is not defined
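The column-wise combination above can be sketched without network access using small stand-in frames. The values are made up, and the helper `combine_term_frames` is hypothetical. Dropping an `isPartial` column is an assumption about what some pytrends versions return from `interest_over_time()`; it is a no-op when that column is absent.

```python
import pandas as pd


def combine_term_frames(frames):
    """Join per-term data frames side by side on their shared date index."""
    # Assumption: some pytrends versions add an 'isPartial' flag column;
    # errors="ignore" leaves frames without that column untouched.
    cleaned = [df.drop(columns="isPartial", errors="ignore") for df in frames]
    return pd.concat(cleaned, axis=1)


# Stand-in frames with made-up values (not real Google Trends data).
dates = pd.DatetimeIndex(["2016-12-01", "2016-12-02"], name="date")
walmart = pd.DataFrame(
    {"walmart stock price": [64, 66], "isPartial": [False, False]}, index=dates
)
nike = pd.DataFrame({"nike stock price": [30, 44]}, index=dates)

combined = combine_term_frames([walmart, nike])
print(combined)
```

Because every frame shares the same date index, `pd.concat(..., axis=1)` lines the columns up row by row without any explicit merge key.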