# <center>Testing News Asymmetry Between Small and Large Companies</center>


### <center>A sentiment analysis study by Robert Grote and Ryan Fairhurst</center>

This is an interactive Python notebook, so the results appear only after you run the cells. Run them in order from the top of the page: click on the first cell, then press Shift+Enter repeatedly to run each cell and move on to the next one.

Install the modules needed for reading in all of the data by running the following cell.    
The first time you run this notebook, uncomment the two lines below by deleting the leading hash symbol so that the modules are installed.

In [2]:
#!pip install pandas-datareader
#!pip install yahoo-finance

Import the following modules:

In [9]:
import csv
import datetime

import pandas as pd
import pandas_datareader.data as data  #DataReader and get_quote_yahoo
import yahoo_finance as yfin
from yahoo_finance import Share
from pandas_datareader.yahoo.quotes import _yahoo_codes  #mapping of quote field names to Yahoo codes

The basic idea is that, as companies grow in size, bad news makes up a larger share of their coverage relative to good news. We are interested in testing whether this pattern appears in the data, whether it differs by industry, whether there is a ceiling on how much bad news small companies experience, and how the proportions change with company size. In the following prototype, we sketch how jumps in Google Trends data and jumps in stock price data could be used to detect newsworthy events. Each detected event becomes an observation that links a sentiment to a company and records the company's size along with several other key variables that we think will have interesting relationships with company size and sentiment.
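Once jump observations exist, the central comparison could be set up roughly as follows. This is only a sketch: the function name `negative_share_by_size`, the market-cap cutoffs, and the observations are all made up to exercise the logic, and none of them come from our data.

```python
def negative_share_by_size(observations, cutoffs):
    """Share of negative (bad-news) observations in each size bucket.

    observations: list of (market_cap, direction) pairs, direction is +1 or -1.
    cutoffs: ascending market-cap boundaries that split companies into buckets.
    Returns one share per bucket, or None for a bucket with no observations.
    """
    # One [negative_count, total_count] pair per bucket
    counts = [[0, 0] for _ in range(len(cutoffs) + 1)]
    for market_cap, direction in observations:
        # The bucket index is the number of cutoffs the company has reached
        bucket = sum(1 for c in cutoffs if market_cap >= c)
        counts[bucket][1] += 1
        if direction < 0:
            counts[bucket][0] += 1
    return [neg / float(total) if total else None for neg, total in counts]


# Hypothetical small / mid / large boundaries in dollars (an assumption)
cutoffs = [2e9, 10e9]

# Made-up observations, purely illustrative
observations = [
    (0.5e9, 1), (1.0e9, 1), (1.5e9, -1),    # small companies
    (5e9, 1), (6e9, -1),                    # mid-size companies
    (50e9, -1), (80e9, -1), (120e9, 1),     # large companies
]

shares = negative_share_by_size(observations, cutoffs)
```

If the hypothesis holds, the shares computed from real data would rise from the first bucket to the last. Testing the industry question would mean repeating the same calculation within each industry.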

In [10]:
#'Jumps' in data--a place for our ideas
"""
Defining a Jump
When an event happens
Positive/Negative Event
Positive/Negative interest
"""
#trends-->prices
#prices-->trends
#volume-->prices


#Jump Program: Percentage change--not dollar change
#Different times (Stamp when, magnitude, duration, direction)
#--make a program that measures relatively, not absolutely (percentages, not dollars)
#Sustained vs transient growth

#Jumps:
#1: Identify Jumps
#2:Categorize comparable categories
##Compare news profiles (trends)
##Compare

#Var Types:
#C-Continuous
#S-Scalar
#B-Binary (Dummy)
#P-Percentage change
#R-Rate of change
#D-Discrete--qualitative
#Z-Z Score
#T-Time

#Create a new dataset of jump observations
###Categories###
#*0: Company Name/Ticker Symbol
#1:S Size of company (Market cap pre- and post-jump)
#*2:B Direction (+/-, good/bad news)
#*3:S Time: Does this relationship between news and jumps change in different market conditions?
#4:B Period (Does the relationship between news and jumps change in different market conditions (recession, recovery))
###Possibly just use S&P 500 or define broad market categories
#*5:P Magnitude (percent change in price from before jump until after jump)
#6:R Rate of change (magnitude/time)
#7:D Industry (do different industries behave differently)
#8:B Sustainability-do prices return to pre-jump level (separate test)
#9: Accompanying Google Trends pattern
###Increased interest before/after/during
###sustained interest sustained increased interest?
###Magnitude
###Rate of change
#*10:Z News volatility (maybe find a relationship between this and company size)
#11:Z Stock Price Volatility
#12:T Duration: when do prices stabilize?

#Defining jump: like Robot Broker Program, but with variable x time and variable percentage p
#get rid of redundancy and characterize different types of movement
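
The jump definition in the notes above (a variable time span x and a variable percentage p, measured in percentages rather than dollars) could be sketched as below. The function `find_jumps` and its parameters `window` and `threshold` are hypothetical names for x and p, and the price series is made up for illustration. The sketch uses plain Python so that the logic stays visible.

```python
def find_jumps(dates, prices, window=1, threshold=0.10):
    """Flag jumps in one price series.

    A jump is recorded whenever the percentage change over `window`
    observations is at least `threshold` in absolute value.
    """
    jumps = []
    for i in range(window, len(prices)):
        before, after = prices[i - window], prices[i]
        # Skip missing prices and prices that cannot serve as a base
        if before is None or after is None or before <= 0:
            continue
        pct = (after - before) / float(before)
        if abs(pct) >= threshold:
            jumps.append({
                'start': dates[i - window],       # stamp when the move began
                'end': dates[i],                  # stamp when it was measured
                'magnitude': pct,                 # percentage, not dollars
                'direction': 1 if pct > 0 else -1,
                'rate': pct / window,             # magnitude per observation
            })
    return jumps


# Made-up prices, purely illustrative
dates = ['2004-01-02', '2004-01-05', '2004-01-06', '2004-01-07', '2004-01-08']
prices = [10.0, 10.2, 12.5, 12.4, 9.8]

jumps = find_jumps(dates, prices, window=1, threshold=0.10)
```

With a window longer than one observation, a single large move is flagged several times by overlapping windows. Merging those detections into one event is the redundancy problem mentioned in the notes and is not handled here.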


## Companies we had in mind:

Dryships
<img src='DRYS.png'>
(Source: Yahoo Finance)

Semi-LED Corp
<img src='DRYS.png'>
(Source: Yahoo Finance)

Chipotle
<img src='CMG.png'>
(Source: Yahoo Finance)

In [11]:
#Read in the CSVs with the Russell 3000 and Wilshire 5000 stock data (just dataframes of symbols for now)
dfRussell=pd.read_csv('russell_3000_2011-06-27.csv')
dfWilshire=pd.read_csv('wilshire5000.csv')

In [12]:
#Taking the values from the dataframes and making them into arrays
arrayRussell=dfRussell.values.flatten()
arrayWilshire=dfWilshire.values.flatten()

In [16]:
#Make the dataframe with the adjusted close prices for all of the Wilshire 5000 Companies
ls_key = 'Adj Close'
start = datetime.datetime(2004,1,1)
end = datetime.datetime(2016,12,31)
f = data.DataReader(arrayWilshire[:], 'yahoo',start,end)

#Select the adjusted close prices for every symbol
cleanData = f[ls_key]
dataFrame = pd.DataFrame(cleanData)

print(dataFrame)



                   AA    AACC  AAI  AAII      AAME        AAN       AAON  \
Date                                                                       
2004-01-01        NaN     NaN  NaN   NaN       NaN        NaN        NaN   
2004-01-02        NaN     NaN  NaN   NaN  2.967206   7.933725   3.263127   
2004-01-05        NaN     NaN  NaN   NaN  2.957786   8.036587   3.278643   
2004-01-06        NaN     NaN  NaN   NaN  3.004885   8.036587   3.325183   
2004-01-07        NaN     NaN  NaN   NaN  2.863589   7.929433   3.307948   
2004-01-08        NaN     NaN  NaN   NaN  2.901268   8.208036   3.311394   
2004-01-09        NaN     NaN  NaN   NaN  2.882428   8.229465   3.383791   
2004-01-12        NaN     NaN  NaN   NaN  2.929527   8.400914   3.361381   
2004-01-13        NaN     NaN  NaN   NaN  2.920107   8.465209   3.361381   
2004-01-14        NaN     NaN  NaN   NaN  2.920107   8.572364   3.518251   
2004-01-15        NaN     NaN  NaN   NaN  2.920107   8.572364   3.714759   
2004-01-16  

In [11]:
#Make the dataframe into a csv for turning into more meaningful data on 'jumps'
dataFrame.to_csv('WilshireHist')
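
As a first step toward jump data, the saved file could be turned into percentage changes per ticker. The sketch below is an assumption about how that step might look, not the method we settled on. It expects the layout shown in the output above (a `Date` column followed by one column per ticker) and treats empty cells as missing prices, which is how `to_csv` writes `NaN` by default. The function name `pct_changes_from_csv` is hypothetical, and the sample rows reuse three dates from the output above.

```python
import csv
import io


def pct_changes_from_csv(handle):
    """Return (ticker, date, pct_change) tuples from a wide price CSV.

    The first column holds the date; every other column is one ticker.
    Each change is measured against that ticker's last available price.
    """
    reader = csv.reader(handle)
    header = next(reader)
    tickers = header[1:]
    last_price = {}
    rows = []
    for row in reader:
        date = row[0]
        for ticker, cell in zip(tickers, row[1:]):
            if cell == '':
                continue  # missing price: nothing to compare on this date
            price = float(cell)
            previous = last_price.get(ticker)
            if previous is not None and previous > 0:
                rows.append((ticker, date, (price - previous) / previous))
            last_price[ticker] = price
    return rows


# Three rows in the same layout as the saved file
sample = io.StringIO(
    u"Date,AA,AAME,AAN\n"
    u"2004-01-01,,,\n"
    u"2004-01-02,,2.967206,7.933725\n"
    u"2004-01-05,,2.957786,8.036587\n"
)

changes = pct_changes_from_csv(sample)
```

To run this on the real data, pass an open file handle for `WilshireHist` instead of the in-memory sample.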

# Land of Forgotten Code

In [None]:
#start=datetime.datetime(2013,1,1)
#end=datetime.datetime(2017,1,1)
#df = data.DataReader(arrayWilshire[0:4], 'yahoo', start, end) 
#print df
#dates =[]
#for x in range(len(df)):
#    newdate = str(df.index[x])
#    newdate = newdate[0:10]
#    dates.append(newdate)

#df['dates'] = dates

#print df.head()
#print df.tail()

In [None]:
#yahoo=Share('YHOO')
#print yahoo
#yahoo.get_price()

stocklist = ['aapl','goog','fb','amzn','COP']

#http://www.jarloo.com/yahoo_finance/
#https://greenido.wordpress.com/2009/12/22/yahoo-finance-hidden-api/
_yahoo_codes.update({'Market Cap': 'j1'})
_yahoo_codes.update({'Div Yield': 'y'})
_yahoo_codes.update({'Bid': 'b'})
_yahoo_codes.update({'Ask': 'a'})
_yahoo_codes.update({'Prev Close': 'p'})
_yahoo_codes.update({'Open': 'o'})
_yahoo_codes.update({'1 yr Target Price': 't8'})
_yahoo_codes.update({'Earnings/Share': 'e'})
_yahoo_codes.update({"Day's Range": 'm'})
_yahoo_codes.update({'52-week Range': 'w'})
_yahoo_codes.update({'Volume': 'v'})
_yahoo_codes.update({'Avg Daily Volume': 'a2'})
_yahoo_codes.update({'EPS Est Current Year': 'e7'})
_yahoo_codes.update({'EPS Est Next Quarter': 'e9'})

data.get_quote_yahoo(stocklist).to_csv('test.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)

data.get_quote_yahoo(stocklist).transpose()
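
To use the `Market Cap` field as the company size variable, the quoted values would need to be numeric. The sketch below assumes the values arrive as abbreviated strings such as `'1.5B'` or `'350M'`. That format is an assumption on our part and should be checked against `test.csv` before relying on it. The helper name `parse_market_cap` is hypothetical.

```python
# Multipliers for the assumed suffixes
_SUFFIXES = {'K': 1e3, 'M': 1e6, 'B': 1e9, 'T': 1e12}


def parse_market_cap(text):
    """Convert a string such as '1.5B' to a float, or None if unavailable."""
    text = text.strip().upper()
    if not text or text in ('N/A', 'NAN'):
        return None
    if text[-1] in _SUFFIXES:
        return float(text[:-1]) * _SUFFIXES[text[-1]]
    # No suffix: treat the value as a plain number
    return float(text)


large = parse_market_cap('1.5B')
small = parse_market_cap('350M')
missing = parse_market_cap('N/A')
```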