# Research Question

Can Google Trends search data be used to predict stock price movements and trading volume using machine learning techniques? Google Trends reports the relative search volume of a term in a given period (a day or a week, depending on the time range requested) as a number between 0 and 100: a value close to 100 means search volume was high relative to the term's peak in the selected window, and a value close to 0 means it was relatively low. One possible hypothesis is that a high search volume for a particular stock or its price on a given day indicates that a large number of people are looking to buy or sell that stock in the near future. This story is plausible because Google has increasingly become the search engine of choice for obtaining information on just about any topic.
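To make the relative scale concrete, the sketch below rescales a series of raw search counts so that the busiest day maps to 100. The counts are made-up numbers, and the rescaling is a simplified illustration of a peak-relative index, not Google's actual procedure.

```python
# Hypothetical raw daily search counts for one term (made-up numbers).
raw_counts = [120, 300, 150, 600, 90]

# Scale every day relative to the busiest day, which becomes 100.
peak = max(raw_counts)
relative_interest = [round(100 * count / peak) for count in raw_counts]

print(relative_interest)
```

Under this scaling, a value only says how busy a day was relative to the peak in the same window; it carries no information about absolute search counts.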

In the finance industry, managing risk has been a major concern, especially since the 2008 recession. If Google Trends data can help predict the volatility of a stock, firms that manage mutual funds and other financial products designed primarily to minimize risk could use this information to reduce risk for their customers and thereby improve overall social welfare.

# Objectives

The first step will be to develop a structural model that defines the interaction between price and volume, with Google search trend data as the exogenous variable. The structural model will be used to derive the reduced-form model and to perform simulations based on empirical relations. To test the hypotheses, I intend to use Google search data for the top 20 stocks (and possibly more) in the Standard and Poor's (S&P) 500 index as the independent variable of interest. The dependent variables will be the price, trading volume, and volatility of the stock on the given day. To test the relation, I will use a fixed-effects specification that controls for overall market movement and/or sector fixed effects for the stock. I also intend to use OLS and Lasso regression to estimate the model and forecast future outcomes.
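As a rough illustration of the fixed-effects idea, the sketch below estimates the effect of search interest on an outcome while controlling for a market-wide factor and sector fixed effects. Everything in it is an assumption made for illustration: the tickers, sectors, coefficient values, and data are synthetic, and the within (demeaning) estimator solved with ordinary least squares is only one possible way to implement the specification.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic panel: six hypothetical stocks in three sectors, 200 periods each.
sectors = {"AAA": "tech", "BBB": "tech", "CCC": "energy",
           "DDD": "energy", "EEE": "health", "FFF": "health"}
sector_effect = {"tech": 2.0, "energy": -1.0, "health": 0.5}
periods = 200
true_beta, true_gamma = 0.5, 1.2  # assumed effects of search interest and market return

market = rng.normal(size=periods)  # one market return per period, shared by all stocks
frames = []
for ticker, sector in sectors.items():
    # Search interest is shifted by sector, so ignoring sector effects would bias the estimate.
    search = rng.normal(size=periods) + sector_effect[sector]
    noise = rng.normal(scale=0.1, size=periods)
    y = sector_effect[sector] + true_beta * search + true_gamma * market + noise
    frames.append(pd.DataFrame({"ticker": ticker, "sector": sector,
                                "search": search, "market": market, "y": y}))
panel = pd.concat(frames, ignore_index=True)

# Within transformation: subtracting sector means removes the sector fixed effects.
cols = ["y", "search", "market"]
demeaned = panel[cols] - panel.groupby("sector")[cols].transform("mean")

# OLS on the demeaned data.
X = demeaned[["search", "market"]].to_numpy()
coef, *_ = np.linalg.lstsq(X, demeaned["y"].to_numpy(), rcond=None)
beta_hat, gamma_hat = coef

print(f"search coefficient: {beta_hat:.3f}, market coefficient: {gamma_hat:.3f}")
```

Because the data are generated with known coefficients, the estimates can be compared against `true_beta` and `true_gamma` to check that the procedure recovers them.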

# Testable Hypotheses

1. Is there a correlation between Google search trends and the price/volume of a stock?
2. Is there a lag effect? If so, over how many periods?
3. Does Google Trends data help explain price movements and volume only for technology stocks?
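The first two questions can be explored with lagged correlations. The sketch below is a minimal example on synthetic data in which volume is constructed to respond to search interest two periods earlier; the series, the lag of 2, and the helper `lagged_correlations` are all hypothetical and only illustrate the mechanics.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
true_lag = 2  # assumption baked into the synthetic data

search = pd.Series(rng.normal(size=n), name="search_interest")

# Synthetic volume that responds to search interest two periods earlier, plus noise.
volume = 0.8 * search.shift(true_lag) + 0.2 * pd.Series(rng.normal(size=n))
volume.name = "volume"


def lagged_correlations(x, y, max_lag):
    """Correlation between y at time t and x at time t - k, for k = 0..max_lag."""
    return pd.Series({k: y.corr(x.shift(k)) for k in range(max_lag + 1)})


corrs = lagged_correlations(search, volume, max_lag=5)
best_lag = corrs.abs().idxmax()

print(corrs.round(3))
print("lag with the strongest correlation:", best_lag)
```

With real data, the same table of correlations by lag would indicate whether search interest leads price or volume, and by how many periods.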

# Data

Below is sample code for retrieving the data.

# Stock Price/Volume Data

In [1]:
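import pandas as pd
import quandl  # the module name is lowercase in version 3 and later of the package

# Download 100 rows of daily price and volume data for GOOGL from the WIKI dataset
mydata = quandl.get("WIKI/GOOGL", rows=100)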

mydata.head()

| Date | Open | High | Low | Close | Volume | Ex-Dividend | Split Ratio | Adj. Open | Adj. High | Adj. Low | Adj. Close | Adj. Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-10-31 | 1033.0 | 1041.0 | 1026.3 | 1033.04 | 1490660.0 | 0.0 | 1.0 | 1033.0 | 1041.0 | 1026.3 | 1033.04 | 1490660.0 |
| 2017-11-01 | 1036.32 | 1047.86 | 1034.0 | 1042.6 | 2105729.0 | 0.0 | 1.0 | 1036.32 | 1047.86 | 1034.0 | 1042.6 | 2105729.0 |
| 2017-11-02 | 1039.99 | 1045.52 | 1028.66 | 1042.97 | 1233333.0 | 0.0 | 1.0 | 1039.99 | 1045.52 | 1028.66 | 1042.97 | 1233333.0 |
| 2017-11-03 | 1042.75 | 1050.66 | 1037.65 | 1049.99 | 1370874.0 | 0.0 | 1.0 | 1042.75 | 1050.66 | 1037.65 | 1049.99 | 1370874.0 |
| 2017-11-06 | 1049.1 | 1052.59 | 1042.0 | 1042.68 | 897897.0 | 0.0 | 1.0 | 1049.1 | 1052.59 | 1042.0 | 1042.68 | 897897.0 |
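Volatility is one of the intended dependent variables but is not a column in the price data, so it has to be derived. The sketch below shows one common approach, a rolling standard deviation of daily log returns, applied to the adjusted closing prices from the rows shown above. The 3-day window is an illustrative assumption chosen only because the sample is short.

```python
import numpy as np
import pandas as pd

# Adjusted closing prices copied from the sample rows shown above.
adj_close = pd.Series(
    [1033.04, 1042.60, 1042.97, 1049.99, 1042.68],
    index=pd.to_datetime(
        ["2017-10-31", "2017-11-01", "2017-11-02", "2017-11-03", "2017-11-06"]
    ),
    name="Adj. Close",
)

# Daily log return: r_t = ln(P_t / P_{t-1}); the first day has no previous price.
log_returns = np.log(adj_close / adj_close.shift(1))

# Rolling volatility: sample standard deviation of returns over a short window.
window = 3  # illustrative assumption, not a recommendation
rolling_vol = log_returns.rolling(window).std()

print(pd.DataFrame({"log_return": log_returns, "rolling_vol": rolling_vol}))
```

The first `window` entries of `rolling_vol` are missing because a full window of returns is not yet available.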


# Google Trends Data

In [2]:
from pytrends.request import TrendReq

pytrend = TrendReq()

# Create payload and capture API tokens. Only needed for interest_over_time(), interest_by_region() & related_queries()
pytrend.build_payload(kw_list=['Google Stock'])

# Interest Over Time
interest_over_time_df = pytrend.interest_over_time()
print(interest_over_time_df.head())

            Google Stock isPartial
date                              
2013-09-15            46     False
2013-09-22            48     False
2013-09-29            45     False
2013-10-06            42     False
2013-10-13            71     False
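The Trends sample above is weekly, while the stock data is daily, so the two series need a common frequency before they can be compared. The sketch below aggregates daily data to weeks and joins it to the weekly search interest. It assumes each Trends date marks the Sunday that starts the week, as the dates in the output suggest; the daily prices and volumes are hypothetical, and the column names `close`, `volume`, and `search_interest` are chosen for illustration.

```python
import pandas as pd

# Weekly search interest copied from the sample output above (dates are Sundays).
trends = pd.DataFrame(
    {"search_interest": [46, 48, 45]},
    index=pd.to_datetime(["2013-09-15", "2013-09-22", "2013-09-29"]),
)

# Hypothetical daily closes and volumes for the business days in the same weeks.
days = pd.bdate_range("2013-09-16", "2013-10-04")
daily = pd.DataFrame(
    {
        "close": [100.0 + i for i in range(len(days))],
        "volume": [1000 + 10 * i for i in range(len(days))],
    },
    index=days,
)

# Map each trading day to the Sunday that starts its week (Monday=0 ... Sunday=6).
days_since_sunday = (daily.index.dayofweek + 1) % 7
week_start = daily.index - pd.to_timedelta(days_since_sunday, unit="D")

# Weekly close is the last close of the week; weekly volume is the sum over the week.
weekly = daily.groupby(week_start).agg(
    close=("close", "last"), volume=("volume", "sum")
)

merged = trends.join(weekly, how="inner")
print(merged)
```

The alternative is to request Trends data over a short enough range to obtain daily values, in which case the join can be done directly on the date index.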
