## Sentiment Analysis of Stock Market News using FinBERT

* Importing required libraries

In [None]:
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
from urllib.request import urlopen, Request

*   We scrape the data from a finance news website that lists its stocks alphabetically and uses a tick (a short keyword) in its URL to navigate to a particular stock.
*   The get_ticks function extracts all the ticks in one pass from tickertape.in/stocks. Using BeautifulSoup and urllib, we extract both the tick and the name of each stock and store them in a pandas DataFrame, in the alphabetical order used by the page.

In [None]:
def get_ticks(url):
    # Request the page with an explicit User-Agent header
    req = Request(url=url, headers={"User-Agent": "Chrome"})
    response = urlopen(req)
    html = BeautifulSoup(response, "html.parser")
    # Container element that holds the stock links
    ticks_table = html.find(class_='page')

    ticks = list()
    stocks = list()
    for name_box in ticks_table.find_all('a', href=True):
        # The link text is the stock name
        stocks.append(name_box.text.strip())
        # The tick is the last hyphen-separated part of the href
        ticks.append(name_box['href'].split('-')[-1].strip())
    d = {'stock': stocks, 'tick': ticks}
    df = pd.DataFrame(data=d)
    return df
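
To make the `split('-')[-1]` step concrete, here is a minimal sketch that needs no network access. The href layout `/stocks/<slug>-<TICK>` is an assumption inferred from how get_ticks parses the links, the sample hrefs are invented for illustration, and the helper name `tick_from_href` is hypothetical.

```python
def tick_from_href(href):
    """Return the text after the last hyphen of a link target.

    Mirrors the expression used in get_ticks; the href layout is assumed.
    """
    return href.split('-')[-1].strip()


# Hypothetical hrefs in the assumed '/stocks/<slug>-<TICK>' layout
print(tick_from_href('/stocks/abb-india-ABB'))
# A link without any hyphen is returned unchanged
print(tick_from_href('/stocks'))
```

Because a link without a hyphen comes back unchanged, any non-stock link inside the same container would end up in the tick column, so it may be worth inspecting the resulting DataFrame for such rows.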

* Calling the function and displaying the first few rows of the output

In [None]:
url = "https://www.tickertape.in/stocks"
tick_df = get_ticks(url)
tick_df.head()

|   | stock | tick |
|---|---|---|
| 0 | ABB India Ltd | ABB |
| 1 | Adani Enterprises Ltd | ADEL |
| 2 | Adani Green Energy Ltd | ADNA |
| 3 | Adani Ports and Special Economic Zone Ltd | APSE |
| 4 | Adani Power Ltd | ADAN |


* Using a similar method to extract the news headlines related to each stock

In [None]:
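def get_data(url):
    # Request the news page with an explicit User-Agent header
    req = Request(url=url, headers={"User-Agent": "Chrome"})
    response = urlopen(req).read()
    html = BeautifulSoup(response, "html.parser")
    # Container element that holds the latest news items
    news_table = html.find(class_='latest-news-holder')

    news = list()
    # find() returns None when the container is missing; treat that as "no news"
    if news_table is None:
        return news

    # Each headline sits in a <p class="shave-root"> element
    for name_box in news_table.find_all('p', class_='shave-root'):
        news.append(name_box.text.strip())

    return news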

* We run a for loop that replaces the stock-tick part of the URL with each element of the tick column of our DataFrame. This extracts the news of each stock, which we store in a list called news.

In [None]:
news = list()
for tick in tick_df['tick']:
    url = "https://www.tickertape.in/stocks/" + tick + "/news?checklist=basic&ref=stock-overview_overview-sections&type=news"
    headlines = get_data(url)
    news.append(headlines)
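
The URL built in the loop above can also be assembled with `urllib.parse.urlencode`, which keeps the query parameters readable. This is only a sketch of the URL construction and does not contact the website; the helper name `build_news_url` is hypothetical, and the parameter values are copied from the loop above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.tickertape.in/stocks/"


def build_news_url(tick):
    """Build the news URL for one tick (hypothetical helper)."""
    # Query parameters copied from the hand-written URL in the loop
    query = urlencode({
        "checklist": "basic",
        "ref": "stock-overview_overview-sections",
        "type": "news",
    })
    # quote() escapes any character in the tick that is not URL-safe
    return BASE_URL + quote(tick) + "/news?" + query


print(build_news_url("ABB"))
```

Each URL produced this way can be passed to get_data exactly as before.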

* Installing transformers package for sentiment analysis

In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1


* Initializing the pretrained FinBERT model for our analysis

In [None]:
from transformers import BertTokenizer, BertForSequenceClassification

finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone',num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')


* Mapping the numeric sentiment classes returned by the FinBERT model to human-readable labels using a Python dictionary called labels.

In [None]:
labels = {0:'neutral', 1:'positive',2:'negative'}
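
As an illustration of how this dictionary is used, the sketch below turns one row of raw model scores (logits) into a label and a softmax confidence. The logit values are made up for the example, and `label_from_logits` is a hypothetical helper, not part of the transformers API.

```python
import math

labels = {0: 'neutral', 1: 'positive', 2: 'negative'}


def label_from_logits(logits):
    """Return (label, probability) for one row of class scores."""
    # Numerically stable softmax: subtract the maximum before exponentiating
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the index with the highest probability
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits in the order (neutral, positive, negative)
label, confidence = label_from_logits([0.2, 3.1, -1.4])
print(label, round(confidence, 3))
```

The notebook itself only needs the argmax; the probability is shown here because it can help to tell confident predictions from borderline ones.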

* Preprocessing the text: we tokenize the headlines and pass the tokenizer output to the finbert object for sentiment analysis.
* We do this for every stock in our list. The sentiment of each headline is stored in a list so that we can later combine them into an aggregate sentiment for each stock.
* We store the output of the function in a list of lists called tot_val. Some lists are empty because the website has no news for that stock. For those stocks, we simply return neutral as the sentiment.

In [None]:
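def detect(news):
    tot_val = list()
    for n in news:
        if len(n) == 0:
            # No headlines for this stock: fall back to neutral
            tot_val.append(['neutral'])
        else:
            # Tokenize all headlines of one stock as a single padded batch
            inputs = tokenizer(n, return_tensors="pt", padding=True)
            # The first element of the model output holds the logits, one row per headline
            outputs = finbert(**inputs)[0]
            # The index of the highest logit in each row is the predicted class
            preds = np.argmax(outputs.detach().numpy(), axis=1)
            val = list()
            for idx, sent in enumerate(n):
                val.append(labels[preds[idx]])
                print(sent, '----', val[idx])
            print('#######################################################')
            tot_val.append(val)
    return tot_val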

tot_val = detect(news)

Accumulate ABB India; target of Rs 3291: Prabhudas Lilladher ---- positive
#######################################################
Reliance hit most as top 5 out of 10 firms lose ₹1.67 lakh crore in m-cap ---- negative
Stocks in news: Paytm, HUL, Adani Enterprises, Lupin and more ---- neutral
HUL, Adani Enterprises, One 97 Communications in spotlight ---- neutral
#######################################################
F&O Strategy: Buy Adani Ports 880-put ---- neutral
Equities muted as oil slide offsets concerns over Fed’s rate stance ---- negative
Stocks that will see action on December 7, 2022 ---- neutral
#######################################################
Adani Transmission incorporates wholly-owned subsidiary Adani Cooling Sol ---- neutral
Reliance Is Biggest And Adani Transmission Fastest Wealth Creator Of 2017-2022 ---- positive
Reliance continues to remain big wealth creator: Motilal Oswal study ---- neutral
#######################################################
Geoclean t

* Next, we aggregate the sentiments of all the headlines of each stock. We add 1 to the agg variable for every positive headline and subtract 1 for every negative headline; neutral headlines leave it unchanged. Based on the sign of the final value of agg, we assign positive, negative, or neutral to the stock.

In [None]:
def get_sent(val):
    agg = 0
    for i in val:
        if i == 'positive':
            agg += 1
        elif i == 'negative':
            agg -= 1

    if agg > 0:
        return 'positive'
    elif agg < 0:
        return 'negative'
    else:
        return 'neutral'
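
The rule can be checked on small inputs. The sketch below is an equivalent compact formulation, not the notebook's own function: the names `aggregate_sentiment` and `SCORES` are hypothetical. The first two label lists are taken from the printed output above, and the third is invented to show a tie.

```python
# Score per label; neutral (and any unknown label) counts as 0
SCORES = {'positive': 1, 'negative': -1}


def aggregate_sentiment(val):
    """Sum the per-headline scores and map the sign to a label."""
    agg = sum(SCORES.get(label, 0) for label in val)
    if agg > 0:
        return 'positive'
    if agg < 0:
        return 'negative'
    return 'neutral'


print(aggregate_sentiment(['positive']))                        # one positive headline
print(aggregate_sentiment(['negative', 'neutral', 'neutral']))  # one negative outweighs neutrals
print(aggregate_sentiment(['positive', 'negative']))            # a tie cancels out
```

Note that a neutral result can mean three different things under this rule: no news at all, only neutral headlines, or positive and negative headlines that cancel each other out.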

* We pass each list of sentiments through get_sent() to obtain the aggregate sentiments. We store these in a list, which we then assign to a new sentiment column of our original tick_df DataFrame.
* Looking at the first few rows of the output

In [None]:
sent = list()
for i in tot_val:
    sent.append(get_sent(i))
    
tick_df['sentiment'] = sent 
tick_df.head(20)

|   | stock | tick | sentiment |
|---|---|---|---|
| 0 | ABB India Ltd | ABB | positive |
| 1 | Adani Enterprises Ltd | ADEL | negative |
| 2 | Adani Green Energy Ltd | ADNA | neutral |
| 3 | Adani Ports and Special Economic Zone Ltd | APSE | negative |
| 4 | Adani Power Ltd | ADAN | neutral |
| 5 | Adani Total Gas Ltd | ADAG | neutral |
| 6 | Adani Transmission Ltd | ADAI | positive |
| 7 | Adani Wilmar Ltd | AWL | neutral |
| 8 | Ambuja Cements Ltd | ABUJ | neutral |
| 9 | Apollo Hospitals Enterprise Ltd | APLH | neutral |


* Looking at the last few rows of the output

In [None]:
tick_df.tail(20)

|   | stock | tick | sentiment |
|---|---|---|---|
| 80 | Sun Pharmaceutical Industries Ltd | SUN | negative |
| 81 | Tata Consultancy Services Ltd | TCS | neutral |
| 82 | Tata Consumer Products Ltd | TACN | positive |
| 83 | Tata Motors Ltd | TAMdv | positive |
| 84 | Tata Motors Ltd | TAMO | positive |
| 85 | Tata Power Company Ltd | TTPW | neutral |
| 86 | Tata Steel Ltd | TISC | positive |
| 87 | Tech Mahindra Ltd | TEML | neutral |
| 88 | Titan Company Ltd | TITN | positive |
| 89 | Torrent Pharmaceuticals Ltd | TORP | neutral |
