# Assessing NLTK's `vader` sentiment analysis model on financial news
The purpose of this notebook is to assess how an out-of-the-box sentiment analysis model, NLTK's `vader`, performs on financial news headlines and how reliable its scores are for this kind of text.

## Prepare The Data
The data has two columns: `label` and `headline`.
`label` holds the sentiment of the headline as one of three values: `neutral`, `negative`, or `positive`. These labels will later be converted to numeric values so they can be compared with the scores from the `vader` model.

In [None]:
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [None]:
df = pd.read_csv('../input/sentiment-analysis-for-financial-news/all-data.csv', names=['label', 'headline'], encoding='latin-1')
df.head()

The label values will be converted to numeric scores to make it easier to calculate the error between the actual labels and the scores from the `vader` model.

In [None]:
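# Map the text labels onto the same -1..1 scale as vader's compound score.
# Mapping only the label column avoids touching the headline text.
label_to_score = {'negative': -1, 'neutral': 0, 'positive': 1}
df['label'] = df['label'].map(label_to_score)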

In [None]:
df.head()

## Sentiment Analysis with `vader`
`vader` (Valence Aware Dictionary and sEntiment Reasoner) is an out-of-the-box, lexicon- and rule-based sentiment analysis model included in NLTK.

In [None]:
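import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# The vader lexicon is a separate NLTK data package; download it if it is missing
nltk.download('vader_lexicon')
vader = SentimentIntensityAnalyzer()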

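`vader` assigns each headline a set of polarity scores: `neg`, `neu`, and `pos` give the proportions of the text that read as negative, neutral, and positive, and `compound` summarizes the overall sentiment in a single number.

We are only concerned with the `compound` score, which ranges from `-1` (most negative) to `1` (most positive). This puts it on the same scale as the numeric labels, so the two can be compared directly.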
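The `compound` score stays within `-1` to `1` because VADER first sums the rule-adjusted valences of the words in the text, which is unbounded, and then normalizes that sum as `x / sqrt(x**2 + alpha)`. Below is a minimal sketch of that step, assuming the formula and the default `alpha=15` used in NLTK's `nltk/sentiment/vader.py`; it is re-implemented here as a standalone function for illustration rather than imported, and the raw sums are made-up values.

```python
import math


def normalize(score, alpha=15):
    """Map an unbounded sum of word valences into the range [-1, 1].

    Mirrors the normalization VADER uses for its compound score;
    alpha=15 is assumed to match the default in NLTK's implementation.
    """
    norm_score = score / math.sqrt(score * score + alpha)
    # Defensive clamp: mathematically the ratio is already strictly inside (-1, 1)
    return max(-1.0, min(1.0, norm_score))


# Hypothetical raw valence sums, showing how the output saturates toward +/-1
for raw_sum in [-10, -2, 0, 2, 10]:
    print(raw_sum, round(normalize(raw_sum), 4))
```

The output approaches but never reaches `±1`, so strongly worded headlines cluster near the ends of the range while headlines with few sentiment-bearing words stay close to `0`.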

In [None]:
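# polarity_scores returns a dict with 'neg', 'neu', 'pos', and 'compound' keys per headline
scores = df['headline'].apply(vader.polarity_scores)

# Expand the dicts into columns, keeping df's index so the later join lines up row by row
vader_scores = pd.DataFrame(scores.tolist(), index=df.index)
vader_scores.head()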

## Join the `vader` scores with the actual labels

In [None]:
# Keep only the compound score and give it a descriptive name
vader_scores = vader_scores[['compound']].rename(columns={'compound': 'vader_score'})
vader_scores.head()

In [None]:
df = df.join(vader_scores)
df.head()

## Calculate the error of `vader` outputs
We will calculate both the mean absolute error (MAE) and the mean squared error (MSE) between the `vader` scores and the numeric labels to get a sense of how accurate the model is.
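For reference, MAE averages the absolute gaps between label and score, while MSE averages their squares, so large misses dominate it: scoring a `-1` headline as `1` adds 2 to the MAE sum but 4 to the MSE sum. Below is a minimal sketch of both metrics in plain Python on made-up toy values; `mae_by_hand` and `mse_by_hand` are hypothetical names chosen so they don't shadow the scikit-learn imports.

```python
def mae_by_hand(y_true, y_pred):
    """Mean absolute error: the average of |y - y_hat| over all pairs."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse_by_hand(y_true, y_pred):
    """Mean squared error: the average of (y - y_hat)**2 over all pairs."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels and compound scores for illustration only, not from the dataset
labels = [1, 0, -1, 0]
compound_scores = [0.5, 0.2, -0.8, -0.4]

print(round(mae_by_hand(labels, compound_scores), 4))  # 0.325
print(round(mse_by_hand(labels, compound_scores), 4))  # 0.1225
```

On the same inputs these should agree with scikit-learn's `mean_absolute_error` and `mean_squared_error`.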

In [None]:
mae = mean_absolute_error(df['label'], df['vader_score'])
mse = mean_squared_error(df['label'], df['vader_score'])

print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
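MAE and MSE treat the continuous `compound` score as an estimate of discrete labels, so even a correct-sign score such as `0.3` for a positive headline counts as error. Another way to read the same outputs, assuming the thresholds recommended in the VADER project's README (compound `>= 0.05` positive, `<= -0.05` negative, neutral in between), is to bucket each score into a class and measure accuracy. The sketch below uses made-up toy values; `compound_to_label` and `accuracy` are hypothetical helpers, not part of NLTK.

```python
def compound_to_label(compound, threshold=0.05):
    """Bucket a compound score into -1, 0, or 1 using symmetric thresholds.

    threshold=0.05 is an assumption taken from the VADER README's guidance.
    """
    if compound >= threshold:
        return 1
    if compound <= -threshold:
        return -1
    return 0


def accuracy(labels, predictions):
    """Fraction of positions where the predicted class equals the true class."""
    matches = sum(1 for y, p in zip(labels, predictions) if y == p)
    return matches / len(labels)


# Toy labels and compound scores for illustration only, not from the dataset
labels = [1, 0, -1, 0]
compound_scores = [0.5, 0.02, -0.8, -0.4]

predicted = [compound_to_label(s) for s in compound_scores]
print(predicted)                    # [1, 0, -1, -1]
print(accuracy(labels, predicted))  # 0.75
```

On the notebook's data, the equivalent would be comparing `df['vader_score'].apply(compound_to_label)` with `df['label']`; scikit-learn's `accuracy_score` computes the same fraction.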

## Conclusion
Judging by the mean absolute and mean squared errors of its scores against the actual labels, a general-purpose sentiment analysis model like `vader` is not reliable enough for analyzing financial news. This type of text needs models trained specifically on it to be predicted accurately.