# FINANCIAL MARKET SENTIMENT ANALYSIS

### OBJECTIVE

The objectives of this project are:
1. To develop a sentiment analysis model for financial markets that analyzes news, social media data, and other textual sources.
2. To provide real-time sentiment scores and actionable insights that help investors and traders make informed decisions and optimize their strategies.
3. To gauge the sentiment of market participants towards a particular asset, company, or the market in general. This sentiment is valuable to investors, financial analysts, and traders because it offers insight into market perceptions and potential future price movements.


### DATA SOURCE

https://raw.githubusercontent.com/YBI-Foundation/Dataset/main/Financial%20Market%20News.csv

## PROJECT

### Import Library

In [1]:
import pandas as pd
import numpy as np

### Import Data

In [2]:
df = pd.read_csv(r'https://raw.githubusercontent.com/YBI-Foundation/Dataset/main/Financial%20Market%20News.csv', encoding="ISO-8859-1")

### Describe Data

In [3]:
df.head()

Unnamed: 0,Date,Label,News 1,News 2,News 3,News 4,News 5,News 6,News 7,News 8,...,News 16,News 17,News 18,News 19,News 20,News 21,News 22,News 23,News 24,News 25
0,01-01-2010,0,McIlroy's men catch cold from Gudjonsson,Obituary: Brian Walsh,Workplace blues leave employers in the red,Classical review: Rattle,Dance review: Merce Cunningham,Genetic tests to be used in setting premiums,Opera review: La Bohème,Pop review: Britney Spears,...,Finland 0 - 0 England,Healy a marked man,Happy birthday Harpers & Queen,Win unlimited access to the Raindance film fes...,Labour pledges £800m to bridge north-south divide,Wales: Lib-Lab pact firm despite resignation,Donald Dewar,Regenerating homes regenerates well-being in ...,Win £100 worth of underwear,TV guide: Random views
1,02-01-2010,0,Warning from history points to crash,Investors flee to dollar haven,Banks and tobacco in favour,Review: Llama Farmers,War jitters lead to sell-off,Your not-so-secret history,Review: The Northern Sinfonia,Review: Hysteria,...,Why Wenger will stick to his Gunners,Out of luck England hit rock bottom,Wilkinson out of his depth,Kinsella sparks Irish power play,Brown banished as Scots rebound,Battling Wales cling to lifeline,Ehiogu close to sealing Boro move,Man-to-man marking,Match stats,French referee at centre of storm is no strang...
2,03-01-2010,0,Comment: Why Israel's peaceniks feel betrayed,Court deals blow to seizure of drug assets,An ideal target for spooks,World steps between two sides intent on war,What the region's papers say,Comment: Fear and rage in Palestine,Poverty and resentment fuels Palestinian fury,Republican feud fear as dissident is killed,...,FTSE goes upwardly mobile,At this price? BP Amoco,Go fish,Bosnian Serb blows himself up to evade law,Orange float delayed to 2001,"Angry factory workers root out fear, favours a...",Smith defied advice on dome payout,Xerox takes the axe to jobs,Comment: Refugees in Britain,Maverick who sparked the new intifada
3,04-01-2010,1,"£750,000-a-goal Weah aims parting shot",Newcastle pay for Fletcher years,Brown sent to the stands for Scotland qualifier,Tourists wary of breaking new ground,Canary Wharf climbs into the FTSE 100,Review: Bill Bailey,Review: Classical,Review: New Contemporaries 2000,...,More cash on way for counties,Cairns carries Kiwis to victory,Year of Blanchflower's flourish when Spurs sto...,New direct approach brings only pay-per-blues,Third Division round-up,Second Division round-up,First Division round-up,McLean ends his career with a punch,Heskey grabs triple crown,Weah on his way as City march on
4,05-01-2010,1,Leeds arrive in Turkey to the silence of the fans,One woman's vision offers loan lifeline,Working Lives: How world leaders worked,Working Lives: Tricks of the trade,"Working Lives: six-hour days, long lunches and...",Pop review: We Love UK,World music review: Marisa Monte,Art review: Hollingsworth/Heyer,...,Duisenberg in double trouble,Pru to cut pension charges,Art review: Paul Graham,Shearer shot sparks Boro humiliation,Ridsdale's lingering fears as Leeds revisit Tu...,Champions League: Rangers v Galatasaray,Champions League: Lazio v Arsenal,Lazio 1 - 1 Arsenal,England in Pakistan,England given olive-branch reception


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4101 entries, 0 to 4100
Data columns (total 27 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Date     4101 non-null   object
 1   Label    4101 non-null   int64 
 2   News 1   4101 non-null   object
 3   News 2   4101 non-null   object
 4   News 3   4101 non-null   object
 5   News 4   4101 non-null   object
 6   News 5   4101 non-null   object
 7   News 6   4101 non-null   object
 8   News 7   4101 non-null   object
 9   News 8   4101 non-null   object
 10  News 9   4101 non-null   object
 11  News 10  4101 non-null   object
 12  News 11  4101 non-null   object
 13  News 12  4101 non-null   object
 14  News 13  4101 non-null   object
 15  News 14  4101 non-null   object
 16  News 15  4101 non-null   object
 17  News 16  4101 non-null   object
 18  News 17  4101 non-null   object
 19  News 18  4101 non-null   object
 20  News 19  4101 non-null   object
 21  News 20  4101 non-null   object
 22  

In [5]:
df.shape

(4101, 27)

In [6]:
df.columns

Index(['Date', 'Label', 'News 1', 'News 2', 'News 3', 'News 4', 'News 5',
       'News 6', 'News 7', 'News 8', 'News 9', 'News 10', 'News 11', 'News 12',
       'News 13', 'News 14', 'News 15', 'News 16', 'News 17', 'News 18',
       'News 19', 'News 20', 'News 21', 'News 22', 'News 23', 'News 24',
       'News 25'],
      dtype='object')

### Data Visualization


In [7]:
import matplotlib.pyplot as plt

### Data Preprocessing

In [30]:
' '.join(str(x) for x in df.iloc[1,2:27])



In [31]:
df.index


RangeIndex(start=0, stop=4101, step=1)

In [32]:
len(df.index)

4101

In [33]:
news=[]
for row in range(0, len(df.index)):
  news.append(' '.join(str(x) for x in df.iloc[row,2:27]))

In [34]:
type(news)

list

In [35]:
news[0]

"McIlroy's men catch cold from Gudjonsson Obituary: Brian Walsh Workplace blues leave employers in the red Classical review: Rattle Dance review: Merce Cunningham Genetic tests to be used in setting premiums Opera review: La Bohème Pop review: Britney Spears Theatre review: The Circle Wales face a fraught night Under-21  round-up Smith off to blot his copybook Finns taking the mickey Praise wasted as Brown studies injury options Ireland wary of minnows Finland 0 - 0 England Healy a marked man Happy birthday Harpers & Queen Win unlimited access to the Raindance film festival Labour pledges £800m to bridge north-south divide Wales: Lib-Lab pact firm despite resignation Donald Dewar Regenerating homes  regenerates well-being in people Win £100 worth of underwear TV guide: Random views"

### Define Target Variable (y) and Feature Variables (X)

In [43]:
X = news

In [44]:
type(X)

list

In [45]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(lowercase=True, ngram_range=(1, 1))

In [46]:
X = cv.fit_transform(X)

In [47]:
X.shape

(4101, 48527)

In [48]:
y = df["Label"]

In [49]:
y.shape

(4101,)

### Train Test Split


In [21]:
from sklearn.model_selection import train_test_split

In [22]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=2529)

### Modeling

In [23]:
from sklearn.ensemble import RandomForestClassifier

In [24]:
rf = RandomForestClassifier(n_estimators=200)

In [25]:
rf.fit(X_train,y_train)

### Prediction

In [26]:
y_pred = rf.predict(X_test)

### Model Evaluation

In [27]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

In [28]:
confusion_matrix(y_test,y_pred)

array([[159, 422],
       [170, 480]])

In [29]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.48      0.27      0.35       581
           1       0.53      0.74      0.62       650

    accuracy                           0.52      1231
   macro avg       0.51      0.51      0.48      1231
weighted avg       0.51      0.52      0.49      1231



### Explanation


This notebook builds a machine learning model for financial market sentiment analysis. The goal is to predict the sentiment, or emotional tone, of market participants such as investors and traders towards financial assets like stocks, cryptocurrencies, or commodities. In general, sentiment can be positive, negative, or neutral; the dataset used here carries a binary `Label` (0 or 1). Sentiment is an important factor in driving market movements.

**THE STEPS TO BUILD THIS MODEL ARE AS FOLLOWS:**

**Data Collection**:
To build an effective sentiment analysis model, you need labeled data: textual information (e.g., news articles, social media posts, financial reports) paired with sentiment labels (positive, negative, or neutral). The data can be obtained from sources such as financial news websites, Twitter feeds, Reddit discussions, or specialized financial sentiment datasets.
Here, we use a pre-existing dataset from the YBI Foundation. It has 4,101 rows, each with a date, a binary label, and 25 news headlines.

**Data Preprocessing:**
The collected textual data needs to be preprocessed before training the ML model. The preprocessing steps may include:

1. Tokenization: Splitting the text into individual words or tokens.
2. Removing Noise: Removing irrelevant information like special characters, numbers, or punctuation.
3. Lowercasing: Converting all text to lowercase for consistency.
4. Stopword Removal: Eliminating common words like "the," "is," "and" that do not contribute much to sentiment analysis.
5. Lemmatization or Stemming: Reducing words to their base or root form to group similar words together.

In this notebook, the 25 headlines of each row are joined into one string, and `CountVectorizer` handles tokenization and lowercasing. Stopword removal and lemmatization are not applied.

**Feature Extraction:**
After preprocessing, the textual data needs to be converted into numerical features that ML algorithms can work with. Two popular representations are Bag-of-Words (BoW) and TF-IDF (Term Frequency-Inverse Document Frequency). BoW represents each document as a fixed-length vector in which each element is the count of one vocabulary word in that document. TF-IDF scales those counts by how rare each word is across documents, so that words appearing everywhere carry less weight. Here, unigram BoW is used, which yields 48,527 features.
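
The difference between the two representations can be seen on a few headlines. The following is a minimal sketch, not the notebook's own preprocessing: the three headlines are taken from the `df.head()` output above, and the `stop_words="english"` setting is an assumption added for illustration (the notebook's `CountVectorizer` does not remove stopwords).

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Three headlines copied from the df.head() output.
headlines = [
    "Investors flee to dollar haven",
    "War jitters lead to sell-off",
    "Canary Wharf climbs into the FTSE 100",
]

# Bag-of-Words: lowercase, drop English stopwords (assumption), count tokens.
bow = CountVectorizer(lowercase=True, stop_words="english")
X_bow = bow.fit_transform(headlines)

# TF-IDF: same tokenization, but counts are reweighted by rarity across
# documents and each row is scaled to unit length (the default L2 norm).
tfidf = TfidfVectorizer(lowercase=True, stop_words="english")
X_tfidf = tfidf.fit_transform(headlines)

vocabulary = sorted(bow.vocabulary_)  # learned tokens, alphabetical
row_norms = np.linalg.norm(X_tfidf.toarray(), axis=1)

print(vocabulary)
print(X_bow.toarray())
print(X_tfidf.toarray().round(2))
```

Both matrices have one row per headline and one column per vocabulary word; only the cell values differ.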

**Model Selection:**
For sentiment analysis, several ML algorithms can be used, such as:

1. Naive Bayes: A probabilistic algorithm that works well for text classification tasks like sentiment analysis.
2. Support Vector Machines (SVM): A powerful algorithm for binary classification tasks, suitable for sentiment polarity classification.
3. Deep Learning Models: Recurrent Neural Networks (RNNs) or Transformer-based models like BERT have shown excellent performance in sentiment analysis due to their ability to capture contextual information.

Here, we use a Random Forest classifier (`RandomForestClassifier` with 200 trees).
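
Two of the alternatives listed above can be tried with very little code. The sketch below is illustrative only: the `texts` and `labels` are hypothetical toy data invented for this example, not rows from the YBI dataset, and the default hyperparameters are an assumption rather than a recommendation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data, invented for illustration (1 = positive, 0 = negative).
texts = [
    "Profits surge as markets rally",
    "Shares climb on strong earnings",
    "Record gains lift investor confidence",
    "Stocks slump after profit warning",
    "Investors flee as crash fears grow",
    "Job cuts deepen as sales fall",
]
labels = [1, 1, 1, 0, 0, 0]

# Each candidate pairs the same Bag-of-Words features with a different classifier.
candidates = {
    "naive_bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
    "linear_svm": make_pipeline(CountVectorizer(), LinearSVC()),
}

predictions = {}
for name, model in candidates.items():
    model.fit(texts, labels)
    predictions[name] = model.predict(["Profits surge on strong earnings"])
    print(name, predictions[name])
```

Swapping the final step of the pipeline is enough to compare classifiers on identical features.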

**Model Training and Evaluation:**
The labeled data is split into training and testing sets, and the ML model is trained on the training data. After training, the model's performance is evaluated on the testing data using metrics such as accuracy, precision, recall, F1-score, and the confusion matrix. Here, 30% of the rows (1,231) are held out for testing, and the model reaches an accuracy of 0.52 on them, which is close to chance for a binary label.
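
The reported metrics can be recomputed by hand from the confusion matrix printed in the Model Evaluation section. The sketch below uses only those four numbers; the variable names are illustrative.

```python
# Confusion matrix from the notebook output:
# rows are true labels (0, 1), columns are predicted labels (0, 1).
cm = [[159, 422],
      [170, 480]]

tn, fp = cm[0]  # true label 0: predicted 0, predicted 1
fn, tp = cm[1]  # true label 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
precision_1 = tp / (tp + fp)  # of all rows predicted 1, how many are truly 1
recall_1 = tp / (tp + fn)     # of all rows truly 1, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Accuracy of always predicting the more frequent class in the test set.
majority_baseline = max(tn + fp, fn + tp) / total

print(f"accuracy          {accuracy:.2f}")
print(f"precision (1)     {precision_1:.2f}")
print(f"recall (1)        {recall_1:.2f}")
print(f"f1 (1)            {f1_1:.2f}")
print(f"majority baseline {majority_baseline:.2f}")
```

These values match the classification report (0.52, 0.53, 0.74, 0.62). The baseline of always predicting class 1 comes out at 650/1231, about 0.53, so the model as trained does not beat it.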

**Deployment and Real-Time Analysis:**
Once the model is trained and evaluated, it can be deployed to analyze real-time data, such as live news feeds, social media streams, or financial reports. The model can provide continuous sentiment analysis, helping traders and investors make informed decisions based on market sentiment.
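
Scoring new text requires reusing the vocabulary learned during training. One way to guarantee that, sketched below, is to bundle the vectorizer and the classifier into a single pipeline. The training texts, the incoming headlines, and the helper `score_headlines` are all hypothetical and exist only for this example; a real deployment would also need a data feed and model persistence, which are not shown.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical training data, invented for illustration (binary labels as in `Label`).
train_texts = [
    "Shares rally as profits beat forecasts",
    "Markets climb on strong earnings",
    "Investors flee as crash fears grow",
    "Stocks slump after profit warning",
]
train_labels = [1, 1, 0, 0]

# The pipeline applies the fitted vectorizer to new text with transform(),
# so incoming headlines are never used to re-fit the vocabulary.
model = make_pipeline(
    CountVectorizer(lowercase=True),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(train_texts, train_labels)


def score_headlines(fitted_model, headlines):
    """Return the predicted probability of class 1 for each headline."""
    class_index = list(fitted_model.classes_).index(1)
    return fitted_model.predict_proba(headlines)[:, class_index]


# Hypothetical incoming headlines.
incoming = ["Profits rally on strong earnings", "Crash fears hit stocks"]
scores = score_headlines(model, incoming)
for text, score in zip(incoming, scores):
    print(f"{score:.2f}  {text}")
```

In the notebook's own variables, the equivalent step would be `rf.predict(cv.transform(new_texts))`, using `transform` rather than `fit_transform`.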

**Improvement and Iteration:**
Financial market sentiment analysis is an ongoing process. The model's performance can be further improved by fine-tuning hyperparameters, incorporating more relevant features, and using more extensive and up-to-date datasets.

Regular monitoring and updates are necessary to maintain the model's effectiveness in analyzing ever-changing market sentiments.
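
The hyperparameter fine-tuning mentioned above can be sketched with a cross-validated grid search. This is an illustration under stated assumptions: the toy `texts` and `labels` are hypothetical, the grid values are arbitrary, and `cv=2` is chosen only because the toy set is tiny. On the real dataset a larger grid and more folds would be appropriate.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy data, invented for illustration (1 = positive, 0 = negative).
texts = [
    "Profits surge as markets rally",
    "Shares climb on strong earnings",
    "Record gains lift investor confidence",
    "Markets rally on strong profits",
    "Stocks slump after profit warning",
    "Investors flee as crash fears grow",
    "Job cuts deepen as sales fall",
    "Crash fears deepen as stocks fall",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vectorizer", TfidfVectorizer(lowercase=True)),
    ("classifier", RandomForestClassifier(random_state=0)),
])

# Step names prefix the parameter names: "<step>__<parameter>".
param_grid = {
    "vectorizer__ngram_range": [(1, 1), (1, 2)],
    "classifier__n_estimators": [10, 50],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(round(search.best_score_, 2))
```

Because the vectorizer sits inside the pipeline, it is re-fitted on each training fold, which keeps vocabulary from the validation fold out of training.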