# Combined_News_DJIA - Data Preprocessing and Sentiment Analysis

Link to Kaggle dataset: 
https://www.kaggle.com/aaron7sun/stocknews

Useful links:
https://www.analyticsvidhya.com/blog/2018/02/the-different-methods-deal-text-data-predictive-python/

## Import packages

In [25]:
import nltk

#If you are running this for the first time, uncomment and download the VADER list of words / lexicon
# nltk.download('vader_lexicon')

In [26]:
import pandas as pd
import numpy as np

from nltk.stem import PorterStemmer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


## Load data sets

In [27]:
DJIA_news = pd.read_csv("Combined_News_DJIA.csv")
DJIA = pd.read_csv("DJIA_table.csv")
reddit = pd.read_csv("RedditNews.csv")

# Data pre-processing

### Data Inspection

In [28]:
DJIA_news.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1989 entries, 0 to 1988
Data columns (total 27 columns):
Date     1989 non-null object
Label    1989 non-null int64
Top1     1989 non-null object
Top2     1989 non-null object
Top3     1989 non-null object
Top4     1989 non-null object
Top5     1989 non-null object
Top6     1989 non-null object
Top7     1989 non-null object
Top8     1989 non-null object
Top9     1989 non-null object
Top10    1989 non-null object
Top11    1989 non-null object
Top12    1989 non-null object
Top13    1989 non-null object
Top14    1989 non-null object
Top15    1989 non-null object
Top16    1989 non-null object
Top17    1989 non-null object
Top18    1989 non-null object
Top19    1989 non-null object
Top20    1989 non-null object
Top21    1989 non-null object
Top22    1989 non-null object
Top23    1988 non-null object
Top24    1986 non-null object
Top25    1986 non-null object
dtypes: int64(1), object(26)
memory usage: 419.6+ KB


#### Observations:

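1. The `Date` column has dtype `object` (strings) rather than a datetime type.

2. Not every day has 25 headlines: `Top23` is missing on 1 day, and `Top24` and `Top25` are each missing on 3 days.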
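
The second observation can be checked by counting missing values per column. The following is a minimal sketch on a toy frame; the `toy` data is made up for illustration and is not taken from the dataset.

```python
import pandas as pd

# Hypothetical three-day frame with one missing headline, for illustration only.
toy = pd.DataFrame({
    "Top23": ["a", "b", "c"],
    "Top24": ["d", None, "f"],
})

# Count missing entries per column and keep only the columns that have any.
missing = toy.isna().sum()
missing = missing[missing > 0]
print(missing)
```

Applying the same `isna().sum()` call to `DJIA_news` should single out `Top23`, `Top24` and `Top25`, in line with the non-null counts reported by `info()`.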

In [29]:
#Changing Date column's type to Datetime
DJIA_news.Date = pd.to_datetime(DJIA_news.Date)

DJIA_news.dtypes

Date     datetime64[ns]
Label             int64
Top1             object
Top2             object
Top3             object
Top4             object
Top5             object
Top6             object
Top7             object
Top8             object
Top9             object
Top10            object
Top11            object
Top12            object
Top13            object
Top14            object
Top15            object
Top16            object
Top17            object
Top18            object
Top19            object
Top20            object
Top21            object
Top22            object
Top23            object
Top24            object
Top25            object
dtype: object

In [30]:
DJIA_news.head()

Unnamed: 0,Date,Label,Top1,Top2,Top3,Top4,Top5,Top6,Top7,Top8,...,Top16,Top17,Top18,Top19,Top20,Top21,Top22,Top23,Top24,Top25
0,2008-08-08,0,"b""Georgia 'downs two Russian warplanes' as cou...",b'BREAKING: Musharraf to be impeached.',b'Russia Today: Columns of troops roll into So...,b'Russian tanks are moving towards the capital...,"b""Afghan children raped with 'impunity,' U.N. ...",b'150 Russian tanks have entered South Ossetia...,"b""Breaking: Georgia invades South Ossetia, Rus...","b""The 'enemy combatent' trials are nothing but...",...,b'Georgia Invades South Ossetia - if Russia ge...,b'Al-Qaeda Faces Islamist Backlash',"b'Condoleezza Rice: ""The US would not act to p...",b'This is a busy day: The European Union has ...,"b""Georgia will withdraw 1,000 soldiers from Ir...",b'Why the Pentagon Thinks Attacking Iran is a ...,b'Caucasus in crisis: Georgia invades South Os...,b'Indian shoe manufactory - And again in a se...,b'Visitors Suffering from Mental Illnesses Ban...,"b""No Help for Mexico's Kidnapping Surge"""
1,2008-08-11,1,b'Why wont America and Nato help us? If they w...,b'Bush puts foot down on Georgian conflict',"b""Jewish Georgian minister: Thanks to Israeli ...",b'Georgian army flees in disarray as Russians ...,"b""Olympic opening ceremony fireworks 'faked'""",b'What were the Mossad with fraudulent New Zea...,b'Russia angered by Israeli military sale to G...,b'An American citizen living in S.Ossetia blam...,...,b'Israel and the US behind the Georgian aggres...,"b'""Do not believe TV, neither Russian nor Geor...",b'Riots are still going on in Montreal (Canada...,b'China to overtake US as largest manufacturer',b'War in South Ossetia [PICS]',b'Israeli Physicians Group Condemns State Tort...,b' Russia has just beaten the United States ov...,b'Perhaps *the* question about the Georgia - R...,b'Russia is so much better at war',"b""So this is what it's come to: trading sex fo..."
2,2008-08-12,0,b'Remember that adorable 9-year-old who sang a...,"b""Russia 'ends Georgia operation'""","b'""If we had no sexual harassment we would hav...","b""Al-Qa'eda is losing support in Iraq because ...",b'Ceasefire in Georgia: Putin Outmaneuvers the...,b'Why Microsoft and Intel tried to kill the XO...,b'Stratfor: The Russo-Georgian War and the Bal...,"b""I'm Trying to Get a Sense of This Whole Geor...",...,b'U.S. troops still in Georgia (did you know t...,b'Why Russias response to Georgia was right',"b'Gorbachev accuses U.S. of making a ""serious ...","b'Russia, Georgia, and NATO: Cold War Two'",b'Remember that adorable 62-year-old who led y...,b'War in Georgia: The Israeli connection',b'All signs point to the US encouraging Georgi...,b'Christopher King argues that the US and NATO...,b'America: The New Mexico?',"b""BBC NEWS | Asia-Pacific | Extinction 'by man..."
3,2008-08-13,0,b' U.S. refuses Israel weapons to attack Iran:...,"b""When the president ordered to attack Tskhinv...",b' Israel clears troops who killed Reuters cam...,b'Britain\'s policy of being tough on drugs is...,b'Body of 14 year old found in trunk; Latest (...,b'China has moved 10 *million* quake survivors...,"b""Bush announces Operation Get All Up In Russi...",b'Russian forces sink Georgian ships ',...,b'Elephants extinct by 2020?',b'US humanitarian missions soon in Georgia - i...,"b""Georgia's DDOS came from US sources""","b'Russian convoy heads into Georgia, violating...",b'Israeli defence minister: US against strike ...,b'Gorbachev: We Had No Choice',b'Witness: Russian forces head towards Tbilisi...,b' Quarter of Russians blame U.S. for conflict...,b'Georgian president says US military will ta...,b'2006: Nobel laureate Aleksander Solzhenitsyn...
4,2008-08-14,1,b'All the experts admit that we should legalis...,b'War in South Osetia - 89 pictures made by a ...,b'Swedish wrestler Ara Abrahamian throws away ...,b'Russia exaggerated the death toll in South O...,b'Missile That Killed 9 Inside Pakistan May Ha...,"b""Rushdie Condemns Random House's Refusal to P...",b'Poland and US agree to missle defense deal. ...,"b'Will the Russians conquer Tblisi? Bet on it,...",...,b'Bank analyst forecast Georgian crisis 2 days...,"b""Georgia confict could set back Russia's US r...",b'War in the Caucasus is as much the product o...,"b'""Non-media"" photos of South Ossetia/Georgia ...",b'Georgian TV reporter shot by Russian sniper ...,b'Saudi Arabia: Mother moves to block child ma...,b'Taliban wages war on humanitarian aid workers',"b'Russia: World ""can forget about"" Georgia\'s...",b'Darfur rebels accuse Sudan of mounting major...,b'Philippines : Peace Advocate say Muslims nee...


### Text Cleaning

1. Remove the leading `b` left over from the byte-string representation (`b'...'`)
2. Convert headlines to lower case
3. Remove punctuation

Because the VADER sentiment analysis used later takes punctuation and capitalization into account, we apply only step 1 to the original dataset.

Steps 2 and 3 are applied to a separate copy of the dataset, which is used to calculate relevance.
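
The three steps above can be sketched on a single headline. The helper names `strip_bytes_prefix` and `normalise` are hypothetical and are not part of the notebook. The sketch assumes that the `b` should be dropped only when a quote follows it.

```python
import re

def strip_bytes_prefix(text):
    # Remove the leading b only when a quote follows it (b'...' or b"..."),
    # so ordinary headlines that start with the letter b are left alone.
    if text[:2] in ("b'", 'b"'):
        return text[1:]
    return text

def normalise(text):
    # Lower-case, then delete every character that is neither a word
    # character nor whitespace.
    return re.sub(r"[^\w\s]", "", text.lower())

raw = "b'BREAKING: Musharraf to be impeached.'"
step1 = strip_bytes_prefix(raw)   # step 1 only: input for sentiment analysis
step23 = normalise(step1)         # steps 2 and 3: input for relevance
print(step1)
print(step23)
```

Keeping the two outputs separate mirrors the split described above: the punctuated, cased text goes to VADER, and the normalised text goes to the relevance calculation.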

In [31]:
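DJIA_news_col = DJIA_news.columns

# Headline columns start at position 2 (after Date and Label)
for col in DJIA_news_col[2:]:
    # astype(str) also turns missing headlines (NaN) into the string "nan"
    DJIA_news[col] = DJIA_news[col].astype(str)
    # Strip the b only when it is a bytes prefix (b'...' or b"..."), not for any headline starting with "b"
    DJIA_news[col] = DJIA_news[col].apply(lambda x: x[1:] if x[:2] in ("b'", 'b"') else x)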

In [32]:
DJIA_news.head()

Unnamed: 0,Date,Label,Top1,Top2,Top3,Top4,Top5,Top6,Top7,Top8,...,Top16,Top17,Top18,Top19,Top20,Top21,Top22,Top23,Top24,Top25
0,2008-08-08,0,"""Georgia 'downs two Russian warplanes' as coun...",'BREAKING: Musharraf to be impeached.','Russia Today: Columns of troops roll into Sou...,'Russian tanks are moving towards the capital ...,"""Afghan children raped with 'impunity,' U.N. o...",'150 Russian tanks have entered South Ossetia ...,"""Breaking: Georgia invades South Ossetia, Russ...","""The 'enemy combatent' trials are nothing but ...",...,'Georgia Invades South Ossetia - if Russia get...,'Al-Qaeda Faces Islamist Backlash',"'Condoleezza Rice: ""The US would not act to pr...",'This is a busy day: The European Union has a...,"""Georgia will withdraw 1,000 soldiers from Ira...",'Why the Pentagon Thinks Attacking Iran is a B...,'Caucasus in crisis: Georgia invades South Oss...,'Indian shoe manufactory - And again in a ser...,'Visitors Suffering from Mental Illnesses Bann...,"""No Help for Mexico's Kidnapping Surge"""
1,2008-08-11,1,'Why wont America and Nato help us? If they wo...,'Bush puts foot down on Georgian conflict',"""Jewish Georgian minister: Thanks to Israeli t...",'Georgian army flees in disarray as Russians a...,"""Olympic opening ceremony fireworks 'faked'""",'What were the Mossad with fraudulent New Zeal...,'Russia angered by Israeli military sale to Ge...,'An American citizen living in S.Ossetia blame...,...,'Israel and the US behind the Georgian aggress...,"'""Do not believe TV, neither Russian nor Georg...",'Riots are still going on in Montreal (Canada)...,'China to overtake US as largest manufacturer','War in South Ossetia [PICS]','Israeli Physicians Group Condemns State Torture',' Russia has just beaten the United States ove...,'Perhaps *the* question about the Georgia - Ru...,'Russia is so much better at war',"""So this is what it's come to: trading sex for..."
2,2008-08-12,0,'Remember that adorable 9-year-old who sang at...,"""Russia 'ends Georgia operation'""","'""If we had no sexual harassment we would have...","""Al-Qa'eda is losing support in Iraq because o...",'Ceasefire in Georgia: Putin Outmaneuvers the ...,'Why Microsoft and Intel tried to kill the XO ...,'Stratfor: The Russo-Georgian War and the Bala...,"""I'm Trying to Get a Sense of This Whole Georg...",...,'U.S. troops still in Georgia (did you know th...,'Why Russias response to Georgia was right',"'Gorbachev accuses U.S. of making a ""serious b...","'Russia, Georgia, and NATO: Cold War Two'",'Remember that adorable 62-year-old who led yo...,'War in Georgia: The Israeli connection','All signs point to the US encouraging Georgia...,'Christopher King argues that the US and NATO ...,'America: The New Mexico?',"""BBC NEWS | Asia-Pacific | Extinction 'by man ..."
3,2008-08-13,0,' U.S. refuses Israel weapons to attack Iran: ...,"""When the president ordered to attack Tskhinva...",' Israel clears troops who killed Reuters came...,'Britain\'s policy of being tough on drugs is ...,'Body of 14 year old found in trunk; Latest (r...,'China has moved 10 *million* quake survivors ...,"""Bush announces Operation Get All Up In Russia...",'Russian forces sink Georgian ships ',...,'Elephants extinct by 2020?','US humanitarian missions soon in Georgia - if...,"""Georgia's DDOS came from US sources""","'Russian convoy heads into Georgia, violating ...",'Israeli defence minister: US against strike o...,'Gorbachev: We Had No Choice','Witness: Russian forces head towards Tbilisi ...,' Quarter of Russians blame U.S. for conflict:...,'Georgian president says US military will tak...,'2006: Nobel laureate Aleksander Solzhenitsyn ...
4,2008-08-14,1,'All the experts admit that we should legalise...,'War in South Osetia - 89 pictures made by a R...,'Swedish wrestler Ara Abrahamian throws away m...,'Russia exaggerated the death toll in South Os...,'Missile That Killed 9 Inside Pakistan May Hav...,"""Rushdie Condemns Random House's Refusal to Pu...",'Poland and US agree to missle defense deal. I...,"'Will the Russians conquer Tblisi? Bet on it, ...",...,'Bank analyst forecast Georgian crisis 2 days ...,"""Georgia confict could set back Russia's US re...",'War in the Caucasus is as much the product of...,"'""Non-media"" photos of South Ossetia/Georgia c...",'Georgian TV reporter shot by Russian sniper d...,'Saudi Arabia: Mother moves to block child mar...,'Taliban wages war on humanitarian aid workers',"'Russia: World ""can forget about"" Georgia\'s ...",'Darfur rebels accuse Sudan of mounting major ...,'Philippines : Peace Advocate say Muslims need...


#### Stemming

Stemming rewrites words (for example, `impeached` becomes `impeach`), which may affect our sentiment analysis, so we create a copy of the dataset before stemming.

The stemmed copy is used only to calculate the relevance between headlines.

In [33]:
DJIA_news2 = DJIA_news.copy()
DJIA_news2_col = DJIA_news2.columns

# Lower-case and strip punctuation on the copy only (steps 2 and 3 above)
for col in DJIA_news2_col[2:]:
    DJIA_news2[col] = DJIA_news2[col].str.lower()
    # regex=True is required in recent pandas, where str.replace defaults to a literal match
    DJIA_news2[col] = DJIA_news2[col].str.replace(r'[^\w\s]', '', regex=True)

In [34]:
st = PorterStemmer()

DJIA_news2_col = DJIA_news2.columns

for i in range(2, len(DJIA_news2_col)):
    DJIA_news2[DJIA_news2_col[i]] = DJIA_news2[DJIA_news2_col[i]].apply(lambda x: " ".join([st.stem(word) for word in x.split()]))

In [35]:
DJIA_news2.head()

Unnamed: 0,Date,Label,Top1,Top2,Top3,Top4,Top5,Top6,Top7,Top8,...,Top16,Top17,Top18,Top19,Top20,Top21,Top22,Top23,Top24,Top25
0,2008-08-08,0,georgia down two russian warplan as countri mo...,break musharraf to be impeach,russia today column of troop roll into south o...,russian tank are move toward the capit of sout...,afghan children rape with impun un offici say ...,150 russian tank have enter south ossetia whil...,break georgia invad south ossetia russia warn ...,the enemi combat trial are noth but a sham sal...,...,georgia invad south ossetia if russia get invo...,alqaeda face islamist backlash,condoleezza rice the us would not act to preve...,thi is a busi day the european union ha approv...,georgia will withdraw 1000 soldier from iraq t...,whi the pentagon think attack iran is a bad id...,caucasu in crisi georgia invad south ossetia,indian shoe manufactori and again in a seri of...,visitor suffer from mental ill ban from olymp,no help for mexico kidnap surg
1,2008-08-11,1,whi wont america and nato help us if they wont...,bush put foot down on georgian conflict,jewish georgian minist thank to isra train wer...,georgian armi flee in disarray as russian adva...,olymp open ceremoni firework fake,what were the mossad with fraudul new zealand ...,russia anger by isra militari sale to georgia,an american citizen live in sossetia blame us ...,...,israel and the us behind the georgian aggress,do not believ tv neither russian nor georgian ...,riot are still go on in montreal canada becaus...,china to overtak us as largest manufactur,war in south ossetia pic,isra physician group condemn state tortur,russia ha just beaten the unit state over the ...,perhap the question about the georgia russia c...,russia is so much better at war,so thi is what it come to trade sex for food
2,2008-08-12,0,rememb that ador 9yearold who sang at the open...,russia end georgia oper,if we had no sexual harass we would have no ch...,alqaeda is lose support in iraq becaus of a br...,ceasefir in georgia putin outmaneuv the west,whi microsoft and intel tri to kill the xo 100...,stratfor the russogeorgian war and the balanc ...,im tri to get a sens of thi whole georgiarussi...,...,us troop still in georgia did you know they we...,whi russia respons to georgia wa right,gorbachev accus us of make a seriou blunder in...,russia georgia and nato cold war two,rememb that ador 62yearold who led your countr...,war in georgia the isra connect,all sign point to the us encourag georgia to i...,christoph king argu that the us and nato are b...,america the new mexico,bbc news asiapacif extinct by man not climat
3,2008-08-13,0,us refus israel weapon to attack iran report,when the presid order to attack tskhinvali the...,israel clear troop who kill reuter cameraman,britain polici of be tough on drug is pointles...,bodi of 14 year old found in trunk latest rans...,china ha move 10 million quak survivor into pr...,bush announc oper get all up in russia grill y...,russian forc sink georgian ship,...,eleph extinct by 2020,us humanitarian mission soon in georgia if rus...,georgia ddo came from us sourc,russian convoy head into georgia violat truce,isra defenc minist us against strike on iran,gorbachev we had no choic,wit russian forc head toward tbilisi in breach...,quarter of russian blame us for conflict poll,georgian presid say us militari will take cont...,2006 nobel laureat aleksand solzhenitsyn accus...
4,2008-08-14,1,all the expert admit that we should legalis drug,war in south osetia 89 pictur made by a russia...,swedish wrestler ara abrahamian throw away med...,russia exagger the death toll in south ossetia...,missil that kill 9 insid pakistan may have bee...,rushdi condemn random hous refus to publish no...,poland and us agre to missl defens deal intere...,will the russian conquer tblisi bet on it no s...,...,bank analyst forecast georgian crisi 2 day earli,georgia confict could set back russia us relat...,war in the caucasu is as much the product of a...,nonmedia photo of south ossetiageorgia conflict,georgian tv report shot by russian sniper dure...,saudi arabia mother move to block child marriag,taliban wage war on humanitarian aid worker,russia world can forget about georgia territor...,darfur rebel accus sudan of mount major attack,philippin peac advoc say muslim need assur chr...


# -- End of Data Preprocessing --

# Calculating Relevance score

We use the Jaccard similarity, `|A ∩ B| / |A ∪ B|` for two sets `A` and `B`, to measure how similar each headline is to the other headlines of the same day. A headline's relevance score is the mean of its similarities to the other 24 headlines.

The higher the score, the more the headline overlaps with the rest of that day's news. We take this as a sign that the event it describes is a larger one.

Since stock markets are sensitive to current affairs, larger events are likely to affect the market more.

Hence, we use the relevance score to weight the sentiment score of each headline.
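
As a small worked example of the idea, the sketch below scores three stemmed headlines from one day. It assumes whitespace-separated word tokens. Calling `set()` directly on a string would give a set of characters instead, so the sketch splits first. The names `jaccard` and `relevance` are hypothetical.

```python
def jaccard(a, b):
    # |A ∩ B| / |A ∪ B| over word sets; defined as 0.0 when both are empty.
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def relevance(headlines):
    # Mean similarity of each headline to every other headline of the same day.
    scores = []
    for i, h in enumerate(headlines):
        others = [jaccard(h, o) for j, o in enumerate(headlines) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores

day = [
    "russia end georgia oper",
    "war in georgia the isra connect",
    "america the new mexico",
]
scores = relevance(day)
print(scores)
```

The middle headline shares one word with each of the other two, so it receives the highest relevance of the three.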

In [157]:
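DJIA_news2_row = DJIA_news2.shape[0]
DJIA_news_jaccard = []

def calculate_jaccard_score(d1, d2):
    # Note: set() on a string yields its characters, so the scores below measure
    # character overlap; pass d1.split() and d2.split() to compare word sets instead.
    set_a, set_b = set(d1), set(d2)
    union = set_a | set_b
    # Guard against two empty headlines
    return len(set_a & set_b) / len(union) if union else 0.0

for i in range(DJIA_news2_row):
    jaccard_score_list = []
    for j in range(2, len(DJIA_news2_col)):
        jaccard_scores = []
        for k in range(2, len(DJIA_news2_col)):
            if j != k:
                # iloc[i, j] avoids chained positional indexing
                jaccard_scores.append(calculate_jaccard_score(DJIA_news2.iloc[i, j], DJIA_news2.iloc[i, k]))
        # Relevance = mean similarity to the other headlines of the same day
        jaccard_score_list.append(np.mean(jaccard_scores))
    DJIA_news_jaccard.append(jaccard_score_list)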
    


In [141]:
#Converting into Dataframe
DJIA_news_jaccard_df = pd.DataFrame(DJIA_news_jaccard)

#Adding column name
DJIA_news_jaccard_df.columns = (DJIA_news2.columns[2:])

DJIA_news_jaccard_df.head()

Unnamed: 0,Top1,Top2,Top3,Top4,Top5,Top6,Top7,Top8,Top9,Top10,...,Top16,Top17,Top18,Top19,Top20,Top21,Top22,Top23,Top24,Top25
0,0.743682,0.637267,0.750373,0.782153,0.769567,0.581044,0.680257,0.660511,0.750307,0.620418,...,0.720483,0.529736,0.722943,0.740832,0.705644,0.752041,0.62246,0.7487,0.631304,0.676312
1,0.695923,0.717916,0.637781,0.719973,0.520379,0.652183,0.616578,0.704732,0.637517,0.602522,...,0.653665,0.701607,0.682673,0.669827,0.623605,0.683917,0.622511,0.664112,0.601276,0.576546
2,0.697748,0.542614,0.627454,0.697886,0.693806,0.64104,0.740321,0.718132,0.73329,0.684228,...,0.729916,0.622667,0.738715,0.650858,0.657616,0.595659,0.717359,0.70943,0.5332,0.580802
3,0.658834,0.676661,0.669712,0.708683,0.589353,0.615724,0.678822,0.625214,0.679337,0.555979,...,0.373997,0.630092,0.560927,0.631035,0.653917,0.512024,0.712802,0.6373,0.698961,0.576185
4,0.660675,0.676434,0.672871,0.638886,0.609757,0.739593,0.695038,0.613225,0.69434,0.654393,...,0.606681,0.670824,0.729257,0.756877,0.720952,0.679731,0.653483,0.682259,0.605093,0.717553


# Sentiment Analysis 

We will split the sentiment analysis into 3 parts. 

For part 1, we will calculate a sentiment score for each headline.

For part 2, we will bring in the relevance score and calculate a weighted sentiment score for each headline. 

For part 3, we will calculate an overall sentiment score for each day.


## Part 1

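To calculate the sentiment score, we use the sentiment analyzer from the `nltk` library. It implements VADER (**Valence Aware Dictionary and sEntiment Reasoner**), a lexicon (vocabulary) of words with associated sentiment strengths, combined with rules for punctuation and capitalization. We keep the `compound` score, which is normalized to the range -1 (most negative) to +1 (most positive).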

In [148]:
sid = SentimentIntensityAnalyzer()

DJIA_news_row = DJIA_news.shape[0]
DJIA_news_sentiment = []

# Score every headline of every day; the un-stemmed text keeps its punctuation and case
for i in range(DJIA_news_row):
    sentiment_score_list = []
    for j in range(2, len(DJIA_news_col)):
        # iloc[i, j] avoids chained positional indexing
        ss = sid.polarity_scores(DJIA_news.iloc[i, j])
        # Keep only the normalized compound score
        sentiment_score = ss['compound']
        sentiment_score_list.append(sentiment_score)
    DJIA_news_sentiment.append(sentiment_score_list)

In [150]:
#Converting into Dataframe
DJIA_news_sentiment_df = pd.DataFrame(DJIA_news_sentiment)

#Adding column name
DJIA_news_sentiment_df.columns = (DJIA_news.columns[2:])

DJIA_news_sentiment_df.head()

Unnamed: 0,Top1,Top2,Top3,Top4,Top5,Top6,Top7,Top8,Top9,Top10,...,Top16,Top17,Top18,Top19,Top20,Top21,Top22,Top23,Top24,Top25
0,-0.5994,0.0,-0.3612,-0.7089,-0.926,0.0,-0.2732,0.2144,-0.5719,-0.5994,...,-0.5994,0.0,-0.3125,0.2023,0.0258,-0.7579,-0.6249,-0.2755,-0.8519,0.128
1,0.7964,-0.3182,0.4404,-0.1965,0.0,-0.4939,-0.5106,-0.0772,-0.2263,-0.34,...,-0.296,-0.3804,-0.8271,0.0,-0.5994,-0.802,0.0,-0.3182,-0.1832,0.0
2,0.0258,0.0,-0.7845,-0.6124,0.0,-0.6908,-0.5994,-0.5994,0.34,-0.765,...,0.0,0.0,0.0772,-0.5994,-0.5859,-0.5994,0.5267,0.3818,0.0,0.0
3,-0.7184,-0.8074,-0.6369,-0.128,-0.5106,0.0,0.5106,0.0,0.3612,0.4019,...,0.0,0.0,0.0,-0.5423,-0.0258,-0.296,0.4939,-0.5719,-0.4215,-0.34
4,0.2023,-0.5994,0.6808,-0.8689,-0.6124,-0.6369,0.6908,-0.4404,-0.6808,-0.34,...,-0.6249,0.0,-0.7579,-0.3182,0.0,-0.4404,-0.5994,0.1779,-0.6908,0.7096


## Part 2

To calculate the weighted sentiment score, we will multiply the sentiment score by its relevance score. 
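
The multiplication is element-wise: pandas aligns the two frames on row index and column labels before multiplying. Here is a minimal sketch with made-up values; the frames `relevance` and `sentiment` are hypothetical stand-ins for the two score tables.

```python
import pandas as pd

# Hypothetical scores for two days and two headline slots (illustrative values).
relevance = pd.DataFrame({"Top1": [0.5, 0.25], "Top2": [1.0, 0.5]})
sentiment = pd.DataFrame({"Top1": [-0.5, 0.5], "Top2": [0.0, -1.0]})

# `*` multiplies element by element after aligning on index and column labels.
weighted = relevance * sentiment
print(weighted)
```

Because alignment is by label, both frames need the same column names. Positions whose labels do not match would come out as `NaN`.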

In [160]:
DJIA_news_weighted_df = DJIA_news_jaccard_df * DJIA_news_sentiment_df
DJIA_news_weighted_df.head()

Unnamed: 0,Top1,Top2,Top3,Top4,Top5,Top6,Top7,Top8,Top9,Top10,...,Top16,Top17,Top18,Top19,Top20,Top21,Top22,Top23,Top24,Top25
0,-0.445763,0.0,-0.271035,-0.554468,-0.712619,0.0,-0.185846,0.141614,-0.429101,-0.371879,...,-0.431857,0.0,-0.22592,0.14987,0.018206,-0.569972,-0.388975,-0.206267,-0.537808,0.086568
1,0.554233,-0.228441,0.280879,-0.141475,0.0,-0.322113,-0.314825,-0.054405,-0.14427,-0.204858,...,-0.193485,-0.266891,-0.564639,0.0,-0.373789,-0.548501,0.0,-0.211321,-0.110154,0.0
2,0.018002,0.0,-0.492237,-0.427385,0.0,-0.44283,-0.443749,-0.430448,0.249319,-0.523435,...,0.0,0.0,0.057029,-0.390124,-0.385297,-0.357038,0.377833,0.27086,0.0,0.0
3,-0.473306,-0.546336,-0.42654,-0.090711,-0.300924,0.0,0.346607,0.0,0.245376,0.223448,...,0.0,0.0,0.0,-0.34221,-0.016871,-0.151559,0.352053,-0.364472,-0.294612,-0.195903
4,0.133654,-0.405455,0.45809,-0.555128,-0.373415,-0.471047,0.480132,-0.270064,-0.472707,-0.222494,...,-0.379115,0.0,-0.552704,-0.240838,0.0,-0.299354,-0.391698,0.121374,-0.417998,0.509176


## Part 3

To calculate the overall sentiment for a particular day, we will find the average of all the weighted sentiment scores of that day.
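
Putting the three parts together, the daily score can be written as a formula. The symbols are introduced here for illustration and do not appear in the notebook: $h_{d,i}$ is headline $i$ on day $d$, $J$ is the Jaccard similarity, $s_{d,i}$ is the headline's compound sentiment score, $r_{d,i}$ is its relevance score, and $n = 25$ is the number of headline columns.

$$
r_{d,i} = \frac{1}{n-1} \sum_{k \ne i} J(h_{d,i}, h_{d,k}), \qquad
S_d = \frac{1}{n} \sum_{i=1}^{n} r_{d,i}\, s_{d,i}
$$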

In [180]:
# .copy() makes an independent frame, so adding a column does not raise SettingWithCopyWarning
DJIA_news_overall_sentiment = DJIA_news[['Date', 'Label']].copy()

# Daily score: mean of the weighted scores across the 25 headline columns
DJIA_news_overall_sentiment['DJIA_news_sentiment'] = DJIA_news_weighted_df.mean(axis=1)

DJIA_news_overall_sentiment.head()


| | Date | Label | DJIA_news_sentiment |
|---|---|---|---|
| 0 | 2008-08-08 | 0 | -0.256694 |
| 1 | 2008-08-11 | 1 | -0.103339 |
| 2 | 2008-08-12 | 0 | -0.163408 |
| 3 | 2008-08-13 | 0 | -0.091632 |
| 4 | 2008-08-14 | 1 | -0.128635 |
