Skip to content

A project, where SVM and kernel methods are introduced, and applied to financial news sentiment classification.

Notifications You must be signed in to change notification settings

ilmari99/M4ML-SVM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sentiment Analysis on headlines of financial news articles

Introduction

The goal of this project was to introduce SVM and Kernel methods, which is visible on the report.pdf. This readme is about the application part.

The application has two parts:

  1. Converting the headlines to vectors
  2. Classifying the vectors as positive, negative or neutra

Converting the headlines to vectors

This was done using a pretrained Word2Vec model with Gensim and NLTK libraries. The model was trained on the Google News dataset, which contains 3 million words and phrases and was trained on roughly 100 billion words from a Google News dataset. The model contains 300-dimensional vectors for 3 million words and phrases. The model can be downloaded from here.

Classifying the vectors as positive, negative or neutral

The data for classification was taken from Kaggle. The data contains 4846 headlines from financial news articles, which have been labeled as positive, negative or neutral (more info from authors).

The classification was done using SVM (sklearn SVC implementation). To find good parameters for SVM, and class weights/SMOTE for imbalanced data, a grid search was done. The grid search was done using 5-fold cross validation. The best parameters were selected based on the best macro F1 score.

The model achieved a macro-F1 score of 0.72 on the test set, with a accuracy of 76%.

We also tried the model on some made-up headlines:

US banks prepare for losses in rush for commercial property exit.: negative
Oil prices pop after Saudi Arabia pledges more voluntary production cuts.: negative
Nvidia short-sellers bleed $3.6bn in May as AI boom continues.: negative
Silicon Valley's tech giants are in trouble. Here's why.: negative
Now is a good time to buy stocks, history shows.: neutral
Federal Reserve's Jerome Powell says economic recovery could stretch through end of 2023.: neutral
Putin moves to extend his rule until 2036 after Russians vote to back constitutional changes.: neutral
U.S. stock futures show S&P 500 consolidating at 2023 highs after Friday's powerful session.: positive
Amazon beats expectations with $88.9bn in sales, stock surges.: positive
Google issues positive earnings surprise as ad revenue rebounds.: positive

confusion_matrix

Confusion matrix of the best model on test data

About

A project, where SVM and kernel methods are introduced, and applied to financial news sentiment classification.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages