# Financial News Headline Sentiment Analysis with BERT
This short notebook loads news headlines from three sources (CNBC, The Guardian, and Reuters) and then performs sentiment analysis on them with BERT.

## Dependencies

In [None]:
!pip install transformers

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

## Instantiate Model

In [None]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

## Test out the model

In [None]:
# Encode a sample sentence and run it through the model
tokens = tokenizer.encode('It was good but couldve been better. Great', return_tensors='pt')
result = model(tokens)

In [None]:
# Sentiment: print the raw logits, then convert the top class index (0-4) to a 1-5 star score
print(result.logits)
int(torch.argmax(result.logits)) + 1
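
The `+1` shifts the zero-based class index onto the model's 1 to 5 star scale. To see how confident a prediction is, the logits can also be turned into probabilities with a softmax. A minimal sketch in plain Python, assuming five made-up logit values (not real model output) and a hypothetical helper name `stars_from_logits`:

```python
import math

def stars_from_logits(logits):
    """Return (star_rating, probabilities) for a list of class logits."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Class index 0 corresponds to 1 star, so add 1 to the argmax.
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return best_index + 1, probs

# Hypothetical logits, made up for illustration only.
example_logits = [-2.1, -0.4, 1.3, 2.0, 0.7]
stars, probs = stars_from_logits(example_logits)
print(stars, [round(p, 3) for p in probs])
```

A flat probability spread would suggest the model is unsure, which may be worth checking for text that differs from what the model was trained on.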

## Let's get the news data

Imports

In [None]:
import numpy as np
import pandas as pd
import datetime

In [None]:
cnbc_df = pd.read_csv("cnbc_headlines.csv")
guardian_df = pd.read_csv("guardian_headlines.csv")
reuters_df = pd.read_csv("reuters_headlines.csv")

### CNBC Data

In [None]:
cnbc_df.columns

In [None]:
cnbc_df

In [None]:
# The Description column is dropped because the Guardian data doesn't have one
cnbc_df = cnbc_df.drop(columns=['Description'])

In [None]:
cnbc_df.info()

In [None]:
cnbc_df.isna().sum()

In [None]:
cnbc_df.dropna(inplace=True)

### Guardian Data

In [None]:
guardian_df.head()

In [None]:
guardian_df.iloc[0]

In [None]:
guardian_df.isnull().sum()

### Reuters Data

In [None]:
reuters_df.head()

In [None]:
reuters_df.iloc[0]

In [None]:
reuters_df.isnull().sum()

In [None]:
reuters_df = reuters_df.drop(columns=['Description'])
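
Sentiment scoring expects every headline to be a string, so any row with a missing `Headlines` value has to be removed before the model is applied. A minimal sketch on a toy frame, where the rows and the `Time` column are invented for illustration and only the `Headlines` column name comes from the data above:

```python
import pandas as pd

# Toy data invented for illustration; the real frames come from the CSV files.
toy_df = pd.DataFrame({
    "Time": ["Jul 17 2020", "Jul 17 2020", "Jul 16 2020"],
    "Headlines": ["Stocks rally on earnings", None, "  Oil prices slip  "],
})

# Drop rows without a headline, then trim stray whitespace.
clean_df = toy_df.dropna(subset=["Headlines"]).copy()
clean_df["Headlines"] = clean_df["Headlines"].str.strip()
clean_df = clean_df.reset_index(drop=True)

print(clean_df)
```

The same `dropna(subset=["Headlines"])` call could be applied to each of the three frames if their null counts turn out to be non-zero.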

## Run Data through BERT

In [None]:
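# Take a headline and return its sentiment as a 1-5 star score
def sentiment_score(news):
    # Truncate to BERT's 512-token limit so unusually long inputs don't raise an error
    tokens = tokenizer.encode(news, return_tensors='pt', truncation=True, max_length=512)
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        result = model(tokens)
    return int(torch.argmax(result.logits)) + 1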

In [None]:
# Send each news outlet's headlines through the BERT model and store the sentiment output. This may take a while...
cnbc_df['Sentiment'] = cnbc_df['Headlines'].apply(sentiment_score)
guardian_df['Sentiment'] = guardian_df['Headlines'].apply(sentiment_score)
reuters_df['Sentiment'] = reuters_df['Headlines'].apply(sentiment_score)
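
Because every headline triggers a full forward pass, scoring a repeated headline more than once is wasted work. One possible speed-up is to score each distinct headline once and reuse the result. A minimal sketch, where `score_unique` is a hypothetical helper and `fake_score` is a stand-in for `sentiment_score` so the example runs without the model:

```python
def score_unique(headlines, score_fn):
    """Score each distinct headline once and return scores in the original order."""
    cache = {}
    scores = []
    for text in headlines:
        if text not in cache:
            cache[text] = score_fn(text)
        scores.append(cache[text])
    return scores

# Stand-in scorer (NOT the BERT model); it records calls so the saving is visible.
calls = []

def fake_score(text):
    calls.append(text)
    return 1 + len(text) % 5  # arbitrary value in the 1 to 5 range

headlines = ["Markets fall", "Oil rises", "Markets fall"]
scores = score_unique(headlines, fake_score)
print(scores, len(calls))
```

With the real model, the call would look like `cnbc_df['Sentiment'] = score_unique(cnbc_df['Headlines'], sentiment_score)`. How much time this saves depends on how many duplicate headlines the data actually contains.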

In [None]:
cnbc_df

In [None]:
guardian_df

In [None]:
reuters_df

## Visualize Results

In [None]:
import matplotlib.pyplot as plt

In [None]:
def plot_pie_sentiment(df, name):
    sentiment_counts = df['Sentiment'].value_counts()
    
    # Plot the pie chart
    plt.figure(figsize=(8, 8))
    plt.pie(sentiment_counts, labels=sentiment_counts.index, autopct='%1.1f%%', startangle=140)
    
    # Add a legend
    plt.legend(title='Sentiments', labels=sentiment_counts.index, loc='upper left')
    
    plt.title('Distribution of Sentiments in ' + name + ' Dataset')
    plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
    
    plt.show()

In [None]:
plot_pie_sentiment(cnbc_df, 'CNBC')
plot_pie_sentiment(guardian_df, 'Guardian')
plot_pie_sentiment(reuters_df, 'Reuters')
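
`value_counts()` orders its result by frequency, so the same score can end up in a different slice position and color in each chart. One option is to reindex the counts onto the full 1 to 5 scale before plotting. A minimal sketch on invented scores:

```python
import pandas as pd

# Invented scores for illustration; real values come from the Sentiment column.
toy_scores = pd.Series([1, 1, 2, 4, 1, 5, 2])

# Count each score, then force a fixed 1..5 order; absent scores get a count of 0.
ordered_counts = (
    toy_scores.value_counts()
    .reindex(range(1, 6), fill_value=0)
)

print(ordered_counts)
```

Passing counts ordered this way to `plt.pie` keeps the three charts directly comparable.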

Most of the financial news headlines from these outlets appear to score on the negative side of the model's 1 to 5 scale.
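
To put a number on that impression, the 1 to 5 scores can be grouped into coarse labels and counted. A minimal sketch with invented scores, assuming that 1 to 2 counts as negative, 3 as neutral, and 4 to 5 as positive (these cut-offs are an assumption, not something the model defines):

```python
from collections import Counter

def label_for(score):
    """Map a 1-5 star score to a coarse label (cut-offs are an assumption)."""
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"

def label_shares(scores):
    """Return the fraction of scores that fall under each label."""
    counts = Counter(label_for(s) for s in scores)
    total = len(scores)
    return {
        label: counts[label] / total
        for label in ("negative", "neutral", "positive")
    }

# Invented scores for illustration only.
shares = label_shares([1, 2, 2, 3, 5, 1, 4, 2])
print(shares)
```

With the real data, the input would be something like `cnbc_df['Sentiment'].tolist()`.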

## Let's Export the Dataframes

In [None]:
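# index=False keeps the pandas row index out of the exported files
cnbc_df.to_csv('cnbc_sentiment.csv', index=False)
guardian_df.to_csv('guardian_sentiment.csv', index=False)
reuters_df.to_csv('reuters_sentiment.csv', index=False)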