# Research Question

Can Google Trends search data be used to predict stock price movements and trading volume using machine learning techniques? Google Trends reports the relative search volume of a particular term in a given period (for example, a day or a week) as an integer from 0 to 100: a value of 100 marks the period with the highest search interest in the requested time range, and a value close to 0 means the search volume was relatively low in that period. One possible hypothesis is that a high search volume for a particular stock or its price on a particular day indicates that a large number of people are looking to buy or sell that stock in the near future. This hypothesis is plausible because Google has become the search engine of choice for obtaining information on just about any topic.
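To make the relative scale concrete, here is a minimal sketch of how an index of this kind can be built by rescaling against the peak period. The function name `scale_to_trends_index` and the input shares are hypothetical, and the sketch only approximates the idea of a peak-relative index; it is not Google's actual computation.

```python
def scale_to_trends_index(search_shares):
    """Rescale per-period search shares so that the peak period equals 100."""
    peak = max(search_shares)
    if peak == 0:
        # No search interest at all in the window.
        return [0 for _ in search_shares]
    return [round(100 * share / peak) for share in search_shares]


# Hypothetical share of all searches that mention the term, one value per period.
hypothetical_shares = [0.0020, 0.0050, 0.0010, 0.0025]

index = scale_to_trends_index(hypothetical_shares)
print(index)
```

One consequence of this kind of scaling is that index values are only comparable within a single requested time range, since the peak that defines 100 changes when the range changes.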

In the finance industry, managing risk has been a major concern, especially since the recession of 2008. If Google Trends data can help predict the volatility of a stock, firms that manage mutual funds and other financial products designed primarily to minimize risk could use this information to reduce risk for their customers, which would in turn improve the overall welfare of society.

# Objectives

The first step will be to develop a structural model that defines the interaction between price and volume, with Google search trend data as the exogenous variable. The structural model will be used to derive the reduced-form model and to perform simulations based on empirical relations. To test the hypotheses, I intend to use Google search data for the top 20 stocks (and possibly more) in the Standard and Poor's (S&P) 500 index as the independent variable of interest. The dependent variables will be the price, trading volume, and volatility of the stock on the given day. To test the relationship, I will use fixed-effects estimation, controlling for overall market movement and/or sector fixed effects for the stock. I also intend to use OLS and Lasso regression to estimate the relationship and to forecast future outcomes.
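As a rough illustration of the fixed-effects idea, the sketch below estimates a single slope after demeaning the search index and the outcome within each sector (the "within" transformation). The panel is made up and noise-free, and the names `within_slope`, `sector_effect`, and `search_by_sector` are hypothetical; a real analysis would use the downloaded data and more regressors.

```python
from collections import defaultdict


def within_slope(rows):
    """Estimate beta in y = alpha_group + beta * x by demeaning within each group."""
    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    numerator = 0.0
    denominator = 0.0
    for observations in by_group.values():
        x_mean = sum(x for x, _ in observations) / len(observations)
        y_mean = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            numerator += (x - x_mean) * (y - y_mean)
            denominator += (x - x_mean) ** 2
    return numerator / denominator


# Made-up panel: each sector has its own intercept, the true slope is 0.5.
true_beta = 0.5
sector_effect = {"tech": 10.0, "energy": -3.0}
search_by_sector = {"tech": [60, 70, 80, 90], "energy": [10, 20, 30, 40]}

rows = [
    (sector, x, sector_effect[sector] + true_beta * x)
    for sector, xs in search_by_sector.items()
    for x in xs
]

beta_fixed_effects = within_slope(rows)

# Pooled OLS ignores the sector intercepts: treat every row as one group.
beta_pooled = within_slope([("all", x, y) for _, x, y in rows])

print(beta_fixed_effects, beta_pooled)
```

In this constructed example the sector with the higher intercept also has the higher search index, so the pooled slope overstates the effect while the within estimate recovers the true slope.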

# Testable hypotheses

1. Is there a correlation between Google search trends and the price/volume of a stock?
2. Is there a lag effect? If so, over how many periods?
3. Does Google Trends data only help explain price movement/volume for technology stocks?
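The first two questions can be explored with a lagged correlation. The sketch below correlates the search index at time `t - lag` with the outcome at time `t` for several lags. Both series are made up, with the outcome built to follow search interest by two periods, and the helper names `pearson` and `lagged_correlations` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread = sqrt(sum((x - x_mean) ** 2 for x in xs))
    y_spread = sqrt(sum((y - y_mean) ** 2 for y in ys))
    return covariance / (x_spread * y_spread)


def lagged_correlations(search, outcome, max_lag):
    """Correlate search[t - lag] with outcome[t] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        xs = search[: len(search) - lag]  # drop the last `lag` search values
        ys = outcome[lag:]                # drop the first `lag` outcomes
        result[lag] = pearson(xs, ys)
    return result


# Made-up search index and a volume series that follows it two periods later.
search = [30, 45, 80, 40, 35, 60, 95, 50, 30, 70]
volume = [5.0, 5.0] + [0.1 * s + 2 for s in search[:-2]]

corr = lagged_correlations(search, volume, max_lag=3)
best_lag = max(corr, key=lambda lag: abs(corr[lag]))
print(corr, best_lag)
```

With real data the peak will be far less clean, so the lag with the largest absolute correlation is a starting point for the regression specification rather than a conclusion.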

# Data 
Below is sample code to retrieve the data.

# Stock Price/Volume Data

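In [None]:
import matplotlib.pyplot as plt  # plotting library, not used in this cell
import fix_yahoo_finance as yf   # note: this package has since been renamed to yfinance

# Download daily open/high/low/close prices and volume for Apple (AAPL)
data = yf.download('AAPL', '2016-01-01', '2018-09-19')
data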

[*********************100%***********************]  1 of 1 downloaded


                  Open        High         Low       Close   Adj Close    Volume
Date                                                                            
2016-01-04  102.610001  105.370003  102.000000  105.349998  100.274513  67649400
2016-01-05  105.750000  105.849998  102.410004  102.709999   97.761681  55791000
2016-01-06  100.559998  102.370003   99.870003  100.699997   95.848511  68457400
2016-01-07   98.680000  100.129997   96.430000   96.449997   91.803276  81094400
2016-01-08   98.550003   99.110001   96.760002   96.959999   92.288696  70798000
2016-01-11   98.970001   99.059998   97.339996   98.529999   93.783073  49739400
2016-01-12  100.550003  100.690002   98.839996   99.959999   95.144165  49154200
2016-01-13  100.320000  101.190002   97.300003   97.389999   92.697990  62439600
2016-01-14   97.959999  100.480003   95.739998   99.519997   94.725372  63170100
2016-01-15   96.199997   97.709999   95.360001   97.129997   92.450516  79010000
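
The proposal uses volatility as a dependent variable, but the downloaded table has no volatility column. The sketch below shows one possible way to derive it from the first five AAPL rows above. The choice of log returns, the sample standard deviation, and the high-low range as a single-day proxy are assumptions for illustration, not a method stated in the proposal.

```python
from math import log, sqrt

# Close prices for 2016-01-04 through 2016-01-08, copied from the output above.
closes = [105.349998, 102.709999, 100.699997, 96.449997, 96.959999]

# Daily log returns: ln(P_t / P_{t-1}).
log_returns = [log(today / yesterday) for yesterday, today in zip(closes, closes[1:])]

# Sample standard deviation of the returns as a simple volatility estimate.
mean_return = sum(log_returns) / len(log_returns)
sample_variance = sum((r - mean_return) ** 2 for r in log_returns) / (len(log_returns) - 1)
daily_volatility = sqrt(sample_variance)

# Single-day proxy for 2016-01-04: (High - Low) / Close.
intraday_range = (105.370003 - 102.000000) / 105.349998

print(log_returns, daily_volatility, intraday_range)
```

The standard deviation needs a window of several days, so a per-day measure such as the high-low range may fit the "on the given day" framing more directly.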


# Google Trends Data

In [12]:
from pytrends.request import TrendReq

pytrend = TrendReq()

# Create payload and capture API tokens. Only needed for interest_over_time(), interest_by_region() & related_queries()
kw_list = ["NVDA Stock"]
pytrend.build_payload(kw_list, cat=0, timeframe='2011-01-01 2015-01-01', geo='', gprop='')

# Interest Over Time
interest_over_time_df = pytrend.interest_over_time()
print(interest_over_time_df)
len(interest_over_time_df)

            NVDA Stock  isPartial
date                             
2011-01-02          87      False
2011-01-09          95      False
2011-01-16          48      False
2011-01-23          36      False
2011-01-30          43      False
2011-02-06          47      False
2011-02-13          79      False
2011-02-20          53      False
2011-02-27          38      False
2011-03-06          52      False
2011-03-13          32      False
2011-03-20          17      False
2011-03-27          37      False
2011-04-03          35      False
2011-04-10          35      False
2011-04-17          28      False
2011-04-24          33      False
2011-05-01          30      False
2011-05-08          55      False
2011-05-15          38      False
2011-05-22          26      False
2011-05-29          21      False
2011-06-05          14      False
2011-06-12          20      False
2011-06-19          12      False
2011-06-26           9      False
2011-07-03          29      False
2011-07-10    

209
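
The Google Trends rows above are seven days apart and each starts on a Sunday, while the stock data is daily. Before the two sources can be compared they need a common frequency. The sketch below shows one way to do that, summing daily volume into weeks that begin on Sunday, using the ten AAPL rows shown earlier. The helper name `week_start_sunday` is hypothetical, and summing (rather than averaging) volume is an assumption.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start_sunday(day):
    """Return the Sunday on or before `day` (Monday is weekday 0, Sunday is 6)."""
    return day - timedelta(days=(day.weekday() + 1) % 7)


# Daily AAPL volume copied from the stock output above.
daily_volume = {
    date(2016, 1, 4): 67649400,
    date(2016, 1, 5): 55791000,
    date(2016, 1, 6): 68457400,
    date(2016, 1, 7): 81094400,
    date(2016, 1, 8): 70798000,
    date(2016, 1, 11): 49739400,
    date(2016, 1, 12): 49154200,
    date(2016, 1, 13): 62439600,
    date(2016, 1, 14): 63170100,
    date(2016, 1, 15): 79010000,
}

# Sum the daily volume into Sunday-based weeks.
weekly_volume = defaultdict(int)
for day, volume in daily_volume.items():
    weekly_volume[week_start_sunday(day)] += volume

for week, volume in sorted(weekly_volume.items()):
    print(week, volume)
```

The weekly totals can then be joined to the Trends rows on the week-start date. Requesting shorter Trends windows to obtain daily values is an alternative, but each window is scaled to its own peak, so values from different windows are not directly comparable.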