Sentiment Analysis on headlines of financial news articles

Introduction

The goal of this project was to introduce SVM and kernel methods; that introduction is in report.pdf. This README covers the application part.

The application has two parts:

Converting the headlines to vectors
Classifying the vectors as positive, negative or neutral

Converting the headlines to vectors

This was done using a pretrained Word2Vec model with the Gensim and NLTK libraries. The model was trained on roughly 100 billion words from a Google News dataset and contains 300-dimensional vectors for 3 million words and phrases. The model can be downloaded from here.
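
The README does not say how the word vectors of a headline are combined into one vector. A common choice is to average the vectors of the words the model knows, and the sketch below assumes that approach. The names `tokenize` and `headline_to_vector` are hypothetical, a plain regex stands in for the NLTK tokenizer, and a toy dictionary stands in for the pretrained model.

```python
import re


def tokenize(headline: str) -> list[str]:
    """Split a headline into alphanumeric tokens (a stand-in for NLTK tokenization)."""
    return re.findall(r"[A-Za-z0-9]+", headline)


def headline_to_vector(headline: str, vectors, dim: int) -> list[float]:
    """Average the vectors of all tokens found in `vectors`.

    `vectors` is anything that supports `token in vectors` and `vectors[token]`,
    such as a dict or a Gensim KeyedVectors object.
    Returns a zero vector when no token is known (an assumption of this sketch).
    """
    total = [0.0] * dim
    count = 0
    for token in tokenize(headline):
        if token in vectors:
            total = [t + float(v) for t, v in zip(total, vectors[token])]
            count += 1
    if count == 0:
        return total
    return [t / count for t in total]


# Toy 2-dimensional "model"; the real model has 300 dimensions.
toy_vectors = {"stock": [1.0, 0.0], "surges": [0.0, 1.0]}
vector = headline_to_vector("Amazon stock surges", toy_vectors, dim=2)
print(vector)
```

With Gensim, the toy dictionary could be replaced by the object returned from `KeyedVectors.load_word2vec_format(path, binary=True)` and `dim` set to 300; the path to the downloaded model file is left to the reader.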

Classifying the vectors as positive, negative or neutral

The data for classification was taken from Kaggle. The data contains 4846 headlines from financial news articles, which have been labeled as positive, negative or neutral (more info from authors).

The classification was done using an SVM (the sklearn SVC implementation). A grid search with 5-fold cross-validation was used to find good SVM parameters, along with class weights/SMOTE settings for the imbalanced data. The best parameters were those with the highest macro F1 score.
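
A grid search of this kind could look roughly like the sketch below. It is not the project's actual search: the parameter grid is an assumption, synthetic imbalanced data from `make_classification` stands in for the headline vectors, and only class weights are searched (SMOTE, which lives in the separate imbalanced-learn package, is left out).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Synthetic 3-class, imbalanced data standing in for the headline vectors.
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=8,
    n_classes=3,
    weights=[0.6, 0.3, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid; the values searched in the project are not listed in this README.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "class_weight": [None, "balanced"],
}

# 5-fold cross-validation, selecting the combination with the best macro F1.
search = GridSearchCV(
    SVC(),
    param_grid,
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("cross-validated macro F1:", search.best_score_)
print("test accuracy:", search.score(X_test, y_test) if False else search.best_estimator_.score(X_test, y_test))
```

Using `scoring="f1_macro"` makes the search weigh all three classes equally, which matters when one class dominates the data.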

The model achieved a macro-F1 score of 0.72 on the test set, with an accuracy of 76%.
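
Macro F1 and accuracy can differ because macro F1 averages the per-class F1 scores with equal weight, so a small class counts as much as a large one. The sketch below computes both from scratch on a made-up set of six labels; the function names and the labels are illustrative only and are not taken from the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration.
y_true = ["positive", "positive", "negative", "neutral", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "neutral", "neutral", "positive"]

print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

In this toy case the per-class F1 scores are 1.0 for negative, 2/3 for neutral and 0.5 for positive, so the macro F1 is their plain average.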

We also tried the model on some made-up headlines:

US banks prepare for losses in rush for commercial property exit.: negative
Oil prices pop after Saudi Arabia pledges more voluntary production cuts.: negative
Nvidia short-sellers bleed $3.6bn in May as AI boom continues.: negative
Silicon Valley's tech giants are in trouble. Here's why.: negative
Now is a good time to buy stocks, history shows.: neutral
Federal Reserve's Jerome Powell says economic recovery could stretch through end of 2023.: neutral
Putin moves to extend his rule until 2036 after Russians vote to back constitutional changes.: neutral
U.S. stock futures show S&P 500 consolidating at 2023 highs after Friday's powerful session.: positive
Amazon beats expectations with $88.9bn in sales, stock surges.: positive
Google issues positive earnings surprise as ad revenue rebounds.: positive

Confusion matrix of the best model on test data

ilmari99/M4ML-SVM
