## Stock Sentiment Analysis using News Headlines
***Using News Paper Headlines to predict weather the Next day's news would increase the stock price of a news channel or not.***

### Dataset
***The data set in consideration is a combination of the world news and stock price shifts.
There are 25 columns of top news headlines for each day in the data frame.***
- Class 1- the stock price increased.
- Class 0- the stock price stayed the same or decreased.

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('Stock Sentiment Analysis using News Headlines/Data.csv', encoding='ISO-8859-1')

In [3]:
df.head(3)

Unnamed: 0,Date,Label,Top1,Top2,Top3,Top4,Top5,Top6,Top7,Top8,...,Top16,Top17,Top18,Top19,Top20,Top21,Top22,Top23,Top24,Top25
0,2000-01-03,0,A 'hindrance to operations': extracts from the...,Scorecard,Hughes' instant hit buoys Blues,Jack gets his skates on at ice-cold Alex,Chaos as Maracana builds up for United,Depleted Leicester prevail as Elliott spoils E...,Hungry Spurs sense rich pickings,Gunners so wide of an easy target,...,Flintoff injury piles on woe for England,Hunters threaten Jospin with new battle of the...,Kohl's successor drawn into scandal,The difference between men and women,"Sara Denver, nurse turned solicitor",Diana's landmine crusade put Tories in a panic,Yeltsin's resignation caught opposition flat-f...,Russian roulette,Sold out,Recovering a title
1,2000-01-04,0,Scorecard,The best lake scene,Leader: German sleaze inquiry,"Cheerio, boyo",The main recommendations,Has Cubie killed fees?,Has Cubie killed fees?,Has Cubie killed fees?,...,On the critical list,The timing of their lives,Dear doctor,Irish court halts IRA man's extradition to Nor...,Burundi peace initiative fades after rebels re...,PE points the way forward to the ECB,Campaigners keep up pressure on Nazi war crime...,Jane Ratcliffe,Yet more things you wouldn't know without the ...,Millennium bug fails to bite
2,2000-01-05,0,Coventry caught on counter by Flo,United's rivals on the road to Rio,Thatcher issues defence before trial by video,Police help Smith lay down the law at Everton,Tale of Trautmann bears two more retellings,England on the rack,Pakistan retaliate with call for video of Walsh,Cullinan continues his Cape monopoly,...,South Melbourne (Australia),Necaxa (Mexico),Real Madrid (Spain),Raja Casablanca (Morocco),Corinthians (Brazil),Tony's pet project,Al Nassr (Saudi Arabia),Ideal Holmes show,Pinochet leaves hospital after tests,Useful links


In [4]:
train = df[df['Date'] < '20150101']
test = df[df['Date'] > '20141231']

In [5]:
data = train.iloc[:, 2:27]

## removing the punctuations from the data
data.replace("[^a-zA-Z]", " ", regex=True, inplace=True)

## renaming the columns 
lst = list(range(25))
col_index = [str(i) for i in lst]
data.columns = col_index
data.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
0,A hindrance to operations extracts from the...,Scorecard,Hughes instant hit buoys Blues,Jack gets his skates on at ice cold Alex,Chaos as Maracana builds up for United,Depleted Leicester prevail as Elliott spoils E...,Hungry Spurs sense rich pickings,Gunners so wide of an easy target,Derby raise a glass to Strupar s debut double,Southgate strikes Leeds pay the penalty,...,Flintoff injury piles on woe for England,Hunters threaten Jospin with new battle of the...,Kohl s successor drawn into scandal,The difference between men and women,Sara Denver nurse turned solicitor,Diana s landmine crusade put Tories in a panic,Yeltsin s resignation caught opposition flat f...,Russian roulette,Sold out,Recovering a title
1,Scorecard,The best lake scene,Leader German sleaze inquiry,Cheerio boyo,The main recommendations,Has Cubie killed fees,Has Cubie killed fees,Has Cubie killed fees,Hopkins furious at Foster s lack of Hannibal...,Has Cubie killed fees,...,On the critical list,The timing of their lives,Dear doctor,Irish court halts IRA man s extradition to Nor...,Burundi peace initiative fades after rebels re...,PE points the way forward to the ECB,Campaigners keep up pressure on Nazi war crime...,Jane Ratcliffe,Yet more things you wouldn t know without the ...,Millennium bug fails to bite
2,Coventry caught on counter by Flo,United s rivals on the road to Rio,Thatcher issues defence before trial by video,Police help Smith lay down the law at Everton,Tale of Trautmann bears two more retellings,England on the rack,Pakistan retaliate with call for video of Walsh,Cullinan continues his Cape monopoly,McGrath puts India out of their misery,Blair Witch bandwagon rolls on,...,South Melbourne Australia,Necaxa Mexico,Real Madrid Spain,Raja Casablanca Morocco,Corinthians Brazil,Tony s pet project,Al Nassr Saudi Arabia,Ideal Holmes show,Pinochet leaves hospital after tests,Useful links


In [6]:
## making all the data in the lower casing
for col in col_index:
    data[col] = data[col].str.lower()

data.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
0,a hindrance to operations extracts from the...,scorecard,hughes instant hit buoys blues,jack gets his skates on at ice cold alex,chaos as maracana builds up for united,depleted leicester prevail as elliott spoils e...,hungry spurs sense rich pickings,gunners so wide of an easy target,derby raise a glass to strupar s debut double,southgate strikes leeds pay the penalty,...,flintoff injury piles on woe for england,hunters threaten jospin with new battle of the...,kohl s successor drawn into scandal,the difference between men and women,sara denver nurse turned solicitor,diana s landmine crusade put tories in a panic,yeltsin s resignation caught opposition flat f...,russian roulette,sold out,recovering a title
1,scorecard,the best lake scene,leader german sleaze inquiry,cheerio boyo,the main recommendations,has cubie killed fees,has cubie killed fees,has cubie killed fees,hopkins furious at foster s lack of hannibal...,has cubie killed fees,...,on the critical list,the timing of their lives,dear doctor,irish court halts ira man s extradition to nor...,burundi peace initiative fades after rebels re...,pe points the way forward to the ecb,campaigners keep up pressure on nazi war crime...,jane ratcliffe,yet more things you wouldn t know without the ...,millennium bug fails to bite
2,coventry caught on counter by flo,united s rivals on the road to rio,thatcher issues defence before trial by video,police help smith lay down the law at everton,tale of trautmann bears two more retellings,england on the rack,pakistan retaliate with call for video of walsh,cullinan continues his cape monopoly,mcgrath puts india out of their misery,blair witch bandwagon rolls on,...,south melbourne australia,necaxa mexico,real madrid spain,raja casablanca morocco,corinthians brazil,tony s pet project,al nassr saudi arabia,ideal holmes show,pinochet leaves hospital after tests,useful links


In [7]:
## joining all the text from all the columns and creating one combined paragraph
## This will create  a whole paragraph from all the row text
headlines = []
for row in range(0, data.shape[0]):
    headlines.append(' '.join(str(x) for x in data.iloc[row,:]))

In [8]:
headlines[0]

'a  hindrance to operations   extracts from the leaked reports scorecard hughes  instant hit buoys blues jack gets his skates on at ice cold alex chaos as maracana builds up for united depleted leicester prevail as elliott spoils everton s party hungry spurs sense rich pickings gunners so wide of an easy target derby raise a glass to strupar s debut double southgate strikes  leeds pay the penalty hammers hand robson a youthful lesson saints party like it s      wear wolves have turned into lambs stump mike catches testy gough s taunt langer escapes to hit     flintoff injury piles on woe for england hunters threaten jospin with new battle of the somme kohl s successor drawn into scandal the difference between men and women sara denver  nurse turned solicitor diana s landmine crusade put tories in a panic yeltsin s resignation caught opposition flat footed russian roulette sold out recovering a title'

In [9]:
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [10]:
# import nltk
# corpus = []

# for paragraph in headlines:
#     temp = []
#     sentences = nltk.sent_tokenize(paragraph)
#     for sentence in sentences:
#         words = [lemmatizer.lemmatize(word) for word in nltk.word_tokenize(sentence) if word not in set(stopwords.words('english'))]
#         words = ' '.join(words)
#         temp.append(words)
#     corpus.append(temp)

In [11]:
# corpus

In [12]:
## creating bag of words model
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

In [13]:
cv = CountVectorizer(ngram_range=(2,2))
train_ds = cv.fit_transform(headlines)

In [14]:
from sklearn.ensemble import RandomForestClassifier

spam_detector = RandomForestClassifier(random_state=23)
spam_detector.fit(train_ds,train['Label'])

In [15]:
test_data = test.iloc[:, 2:27]

## removing the punctuations from the data
test_data.replace("[^a-zA-Z]", " ", regex=True, inplace=True)

## renaming the columns 
lst = list(range(25))
col_index = [str(i) for i in lst]
test_data.columns = col_index
test_data.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
3723,Most cases of cancer are the result of sheer b...,Iran dismissed United States efforts to fight ...,Poll One in Germans would join anti Muslim ...,UK royal family s Prince Andrew named in US la...,Some asylum seekers refused to leave the bu...,Pakistani boat blows self up after India navy ...,Sweden hit by third mosque arson attack in a week,cars set alight during French New Year,Salaries for top CEOs rose twice as fast as av...,Norway violated equal pay law judge says Jud...,...,Ukrainian minister threatens TV channel with c...,Palestinian President Mahmoud Abbas has entere...,Israeli security center publishes names of ...,The year was the deadliest year yet in Sy...,A Secret underground complex built by the Nazi...,Restrictions on Web Freedom a Major Global Iss...,Austrian journalist Erich Mchel delivered a pr...,Thousands of Ukraine nationalists march in Kiev,Chinas New Years Resolution No More Harvestin...,Authorities Pull Plug on Russia s Last Politic...
3724,Moscow gt Beijing high speed train will reduc...,Two ancient tombs were discovered in Egypt on ...,China complains to Pyongyang after N Korean so...,Scotland Headed Towards Being Fossil Fuel Free...,Prime Minister Shinzo Abe said Monday he will ...,Sex slave at centre of Prince Andrew scandal f...,Gay relative of Hamas founder faces deportatio...,The number of female drug addicts in Iran has ...,After Decades of Searching the Causeway for t...,India lost tigers in,...,The Islamic State has approved a budget o...,Iceland To Withdraw EU Application Lift Capit...,Blackfield Capital Founder Goes Missing The v...,Rocket stage crashes back to Earth in rural Ch...,Dead as Aircraft Bombs Greek Tanker in Libya...,Belgian murderer Frank Van Den Bleeken to die ...,Czech President criticizes Ukrainian PM says ...,Vietnamese jets join search for missing F...,France seeks end to Russia sanctions over Ukraine,China scraps rare earths caps
3725,US oil falls below a barrel,Toyota gives away fuel cell patents to b...,Young Indian couple who had been granted polic...,A senior figure in Islamic States self declare...,Fukushima rice passes radiation tests for st ...,Nearly all Spanish parties guilty of financial...,King Abdullah to abdicate Saudi Throne,Taliban Commander Caught Networking On LinkedIn,Mexican missing students mayor s wife charged...,New York Times reporter James Risen refused on...,...,Thousands of Indians have fled from their home...,Turkey sacks judges who oversaw Erdogan corrup...,SpaceX Falcon launch and recovery has been a...,CNN Americans charged in botched Gambia coup,Islamic State Police Official Beheaded,Libya bans Palestinians from country to preven...,A judicial inquiry was opened in France on Mon...,Video has captured the moment a cameraman was ...,Syria has complained to the United Nations tha...,Tests over India set to make the iris of bigg...


In [16]:
## making all the data in the lower casing
for col in col_index:
    test_data[col] = test_data[col].str.lower()

test_data.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
3723,most cases of cancer are the result of sheer b...,iran dismissed united states efforts to fight ...,poll one in germans would join anti muslim ...,uk royal family s prince andrew named in us la...,some asylum seekers refused to leave the bu...,pakistani boat blows self up after india navy ...,sweden hit by third mosque arson attack in a week,cars set alight during french new year,salaries for top ceos rose twice as fast as av...,norway violated equal pay law judge says jud...,...,ukrainian minister threatens tv channel with c...,palestinian president mahmoud abbas has entere...,israeli security center publishes names of ...,the year was the deadliest year yet in sy...,a secret underground complex built by the nazi...,restrictions on web freedom a major global iss...,austrian journalist erich mchel delivered a pr...,thousands of ukraine nationalists march in kiev,chinas new years resolution no more harvestin...,authorities pull plug on russia s last politic...
3724,moscow gt beijing high speed train will reduc...,two ancient tombs were discovered in egypt on ...,china complains to pyongyang after n korean so...,scotland headed towards being fossil fuel free...,prime minister shinzo abe said monday he will ...,sex slave at centre of prince andrew scandal f...,gay relative of hamas founder faces deportatio...,the number of female drug addicts in iran has ...,after decades of searching the causeway for t...,india lost tigers in,...,the islamic state has approved a budget o...,iceland to withdraw eu application lift capit...,blackfield capital founder goes missing the v...,rocket stage crashes back to earth in rural ch...,dead as aircraft bombs greek tanker in libya...,belgian murderer frank van den bleeken to die ...,czech president criticizes ukrainian pm says ...,vietnamese jets join search for missing f...,france seeks end to russia sanctions over ukraine,china scraps rare earths caps
3725,us oil falls below a barrel,toyota gives away fuel cell patents to b...,young indian couple who had been granted polic...,a senior figure in islamic states self declare...,fukushima rice passes radiation tests for st ...,nearly all spanish parties guilty of financial...,king abdullah to abdicate saudi throne,taliban commander caught networking on linkedin,mexican missing students mayor s wife charged...,new york times reporter james risen refused on...,...,thousands of indians have fled from their home...,turkey sacks judges who oversaw erdogan corrup...,spacex falcon launch and recovery has been a...,cnn americans charged in botched gambia coup,islamic state police official beheaded,libya bans palestinians from country to preven...,a judicial inquiry was opened in france on mon...,video has captured the moment a cameraman was ...,syria has complained to the united nations tha...,tests over india set to make the iris of bigg...


In [17]:
## joining all the text from all the columns and creating one combined paragraph
## This will create  a whole paragraph from all the row text
test_headlines = []
for row in range(0, test_data.shape[0]):
    test_headlines.append(' '.join(str(x) for x in test_data.iloc[row,:]))

In [18]:
# test_corpus = []

# for paragraph in headlines:
#     temp = []
#     sentences = nltk.sent_tokenize(paragraph)
#     for sentence in sentences:
#         words = [lemmatizer.lemmatize(word) for word in nltk.word_tokenize(sentence) if word not in set(stopwords.words('english'))]
#         words = ' '.join(words)
#         temp.append(words)
#     test_corpus.append(temp)

In [19]:
X_test = cv.transform(test_headlines).toarray()

In [20]:
predictions = spam_detector.predict(X_test)

In [21]:
## Import library to check accuracy
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

In [22]:
matrix=confusion_matrix(test['Label'],predictions)
print(matrix)
score=accuracy_score(test['Label'],predictions)
print(score)
report=classification_report(test['Label'],predictions)
print(report)

[[140  46]
 [  7 185]]
0.8597883597883598
              precision    recall  f1-score   support

           0       0.95      0.75      0.84       186
           1       0.80      0.96      0.87       192

    accuracy                           0.86       378
   macro avg       0.88      0.86      0.86       378
weighted avg       0.88      0.86      0.86       378

