<div class="alert alert-block alert-info">
<h1> Text Mining Project: Stock Sentiment <br>
Predicting Market Behavior from Tweets</h1><br>
 Text Mining 2025<br>
NOVA IMS MDSAA

<div class="alert alert-block alert-warning"> 
[NOTE] tm_tests consists of 3 notebooks: <br>
- tm_tests_01_12.ipynb: Pipeline 1 for EDA, ML models, LSTM, and DistilBERT  <br> 
- tm_tests_02_12.ipynb: Pipeline 2 for GPT-2<br>
- tm_tests_03_12.ipynb: Pipeline 3 for FinBERT<br>

**This is Pipeline 3: "tm_tests_03_12.ipynb"**

# Group 12

|   | Student Name          |  Student ID |
|---|-----------------------|    ---      |
| 1 | Hassan Bhatti       |  20241023 |
| 2 | Moeko Mitani          |   20240670  |
| 3 | Oumayma Ben Hfaiedh   |   20240699  |
| 4 | Rute D'Alva Teixeira      |  20240667  |
| 5 | Sarah Leuthner    |   20240581  |  

# Table of Contents

* [<font color='#52b69a'>1. Data Integration</font>](#1.) <br>
    - [1.1. Import Libraries ](#1.1.)<br>
    - [1.2. Import Data ](#1.2.)<br>  

* [<font color='#52b69a'>2. Data Preparation</font>](#2.) <br>

* [<font color='#52b69a'>3. FinBERT Model </font>](#3.) <br>
    - [3.1. Initializing the Model ](#3.1.)<br>
    - [3.2. Tokenizer Configuration](#3.2.)<br>  
    - [3.3. Batch Inference Pipeline](#3.3.)<br>
    - [3.4. Predictions](#3.4.)<br>  

* [<font color='#52b69a'>4. Model Evaluation </font>](#4.) <br>



In this notebook, we implement an **Encoder-Only Transformer Architecture** using **FinBERT**.<br>

FinBERT is a pre-trained language model designed for financial sentiment analysis. It is a variant of BERT, fine-tuned on a large corpus of financial text so that it better understands the language of the financial domain. This makes it well suited to classifying the sentiment (positive, negative or neutral) of financial news articles, reports and social-media posts about the markets. Here we use the Hugging Face Hub checkpoint `ahmedrachid/FinancialBERT-Sentiment-Analysis` as-is, without further training on our tweets.


<a class="anchor" id="1.">

# 1. Data Integration
<a>

<a class="anchor" id="1.1.">

## 1.1. Import Libraries
<a>

In [1]:
import pandas as pd
import scipy
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
from tqdm import tqdm

from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             ConfusionMatrixDisplay, roc_auc_score)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

<a class="anchor" id="1.2.">

## 1.2. Import Data
<a>

In [3]:
data = pd.read_csv("train.csv",
                   encoding='unicode_escape',
                   header=0,
                   names=['Text', 'Sentiment'])
data.head()

|   | Text | Sentiment |
|---|------|-----------|
| 0 | $BYND - JPMorgan reels in expectations on Beyo... | 0 |
| 1 | $CCL $RCL - Nomura points to bookings weakness... | 0 |
| 2 | $CX - Cemex cut at Credit Suisse, J.P. Morgan ... | 0 |
| 3 | $ESS: BTIG Research cuts to Neutral https://t.... | 0 |
| 4 | $FNKO - Funko slides after Piper Jaffray PT cu... | 0 |


In [4]:
data.shape

(9543, 2)

In [5]:
data['Sentiment'].value_counts()

| Sentiment | count |
|-----------|-------|
| 2 | 6178 |
| 1 | 1923 |
| 0 | 1442 |
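The label distribution is imbalanced: Neutral (2) accounts for about 65% of the tweets. The quick sketch below uses only the counts printed above to compute the accuracy of a trivial baseline that always predicts the most frequent class, a useful reference point for the accuracy reported in section 4. The label meanings (0 = Bearish, 1 = Bullish, 2 = Neutral) follow the mapping used later in this notebook.

```python
# Counts copied from data['Sentiment'].value_counts() (0 = Bearish, 1 = Bullish, 2 = Neutral)
counts = {2: 6178, 1: 1923, 0: 1442}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a baseline that always predicts the most frequent class
majority_label = max(counts, key=counts.get)
majority_baseline = shares[majority_label]

print(total, majority_label, f"{majority_baseline:.3f}")  # 9543 2 0.647
```

Any model worth using should clearly beat this majority-class accuracy, not just a 50% or 33% guess.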


<a class="anchor" id="2.">

# 2. Data Preparation
<a>

`Step 1` Ensuring `Sentiment` holds integer labels, which the label conversion and evaluation below rely on

In [6]:
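# Keep only rows whose Sentiment is all digits (guards against stray
# non-label rows, e.g. a header line read as data), then cast to int
data = data[data['Sentiment'].astype(str).str.isdigit()].copy()
data['Sentiment'] = data['Sentiment'].astype(int)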

In [7]:
data.dtypes

| Column | dtype |
|--------|-------|
| Text | object |
| Sentiment | int64 |


`Step 2` Separating the input texts (`X`) from the target labels (`y`)

In [8]:
# Split text and labels
X = data['Text'].tolist()
y = data['Sentiment'].tolist()

`Step 3` Confirming everything is in place for modelling:

In [9]:
len(y)==len(X)

True

In [None]:
print(data.head())
print(data.columns)
print(data.shape)
print(data['Sentiment'].isnull().sum())

                                                Text  Sentiment
0  $BYND - JPMorgan reels in expectations on Beyo...          0
1  $CCL $RCL - Nomura points to bookings weakness...          0
2  $CX - Cemex cut at Credit Suisse, J.P. Morgan ...          0
3  $ESS: BTIG Research cuts to Neutral https://t....          0
4  $FNKO - Funko slides after Piper Jaffray PT cu...          0
Index(['Text', 'Sentiment'], dtype='object')
(9543, 2)
0


In [None]:
# Check the types inside the list
set(type(label) for label in y)

{int}

<a class="anchor" id="3.">

# 3. FinBERT model
<a>

<a class="anchor" id="3.1.">

## 3.1. Initializing the model
<a>

The pre-trained transformer model and its matching tokenizer are loaded from the Hugging Face Hub using `AutoModelForSequenceClassification` and `AutoTokenizer`. This loads both the architecture and the learned weights, including a classification head already trained for sentiment analysis. The model is then set to evaluation mode (`model.eval()`), which disables its dropout layers so that inference results are deterministic.

In [None]:
# Batch prediction with memory management
batch_size = 4  # Conservative for 8GB RAM
preds = []  # Will store predicted label names ('negative' / 'neutral' / 'positive')

In [13]:
tokenizer = AutoTokenizer.from_pretrained("ahmedrachid/FinancialBERT-Sentiment-Analysis")
model = AutoModelForSequenceClassification.from_pretrained("ahmedrachid/FinancialBERT-Sentiment-Analysis")
model.eval()


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30873, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e
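`model.eval()` matters because the printed architecture contains `Dropout(p=0.1)` layers. PyTorch's `nn.Dropout` uses inverted dropout: in training mode it zeroes each activation with probability `p` and rescales the survivors by `1/(1-p)`, while in eval mode it passes activations through unchanged. The pure-Python `dropout` function below is a hypothetical illustration of that behaviour, not PyTorch's implementation.

```python
import random

def dropout(values, p, training, rng=None):
    """Hypothetical illustration of inverted dropout on a list of floats."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        # Eval mode (what model.eval() selects): identity, fully deterministic
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # Training mode: drop each value with probability p, rescale the rest
    return [0.0 if rng.random() < p else v * scale for v in values]

x = [1.0, 2.0, 3.0, 4.0]
print(dropout(x, p=0.1, training=False))  # [1.0, 2.0, 3.0, 4.0] on every call
print(dropout(x, p=0.5, training=True, rng=random.Random(0)))  # random zeros, survivors doubled
```

Without `model.eval()`, the same tweet could receive slightly different logits on each forward pass.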

**Testing**

Spot-checking the model on a few sample tweets, including some taken from the dataset, to confirm that its output labels line up with ours before running it on the full dataset.

In [14]:
test_texts = [
    # Bearish examples
    ('$CCL $RCL - Nomura points to bookings weakness at Carnival and Royal Caribbean', 0),
    ('$BYND - JPMorgan reels in expectations on Beyond Meat https://t.co/bd0xbFGjkT', 0),

    # Bullish examples
    ('$BYND - JPMorgan raises price target on Beyond Meat', 1),

    # Neutral examples
    ('$CX - Cemex reports stable quarterly results', 2)
]

print("{:<80} {:<12} {:<12}".format("Text", "Predicted", "Expected"))
print("-" * 100)

for text, true_label in test_texts:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
        pred_label = model.config.id2label[logits.argmax().item()]
        confidence = torch.softmax(logits, dim=1).max().item()

    # Convert to your label names
    label_map = {'negative': 'Bearish (0)', 'positive': 'Bullish (1)', 'neutral': 'Neutral (2)'}

    print("{:<80} {:<12} {:<12} (Confidence: {:.2f})".format(
        text[:75] + "..." if len(text) > 75 else text,
        label_map[pred_label],
        ['Bearish (0)', 'Bullish (1)', 'Neutral (2)'][true_label],
        confidence
    ))

Text                                                                             Predicted    Expected    
----------------------------------------------------------------------------------------------------
$CCL $RCL - Nomura points to bookings weakness at Carnival and Royal Caribb...   Bearish (0)  Bearish (0)  (Confidence: 1.00)
$BYND - JPMorgan reels in expectations on Beyond Meat https://t.co/bd0xbFGj...   Neutral (2)  Bearish (0)  (Confidence: 1.00)
$BYND - JPMorgan raises price target on Beyond Meat                              Bullish (1)  Bullish (1)  (Confidence: 1.00)
$CX - Cemex reports stable quarterly results                                     Bullish (1)  Neutral (2)  (Confidence: 1.00)


<a class="anchor" id="3.2.">

## 3.2. Tokenizer Configuration
<a>


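The tokenizer is configured with four arguments: `return_tensors="pt"` returns PyTorch tensors, `padding='longest'` pads every sequence in a batch to the length of the longest one, and `truncation=True` with `max_length=256` cuts off any text longer than 256 tokens. Together, these settings turn each batch of texts into rectangular tensors the model can process.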

In [15]:
tokenizer_kwargs = {"return_tensors":"pt", "padding":'longest', "truncation": True, "max_length": 256}

<a class="anchor" id="3.3.">

## 3.3. Batch Inference Pipeline
<a>


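In this section, the texts are processed in batches of `batch_size` for efficiency. Each batch is tokenized with the settings above, passed through the model under `torch.no_grad()` (gradients are not needed for inference) to obtain logits, and the highest-scoring class index of each row is mapped to its label name via `model.config.id2label`.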

In [16]:
preds = []
for i in tqdm(range(0, len(X), batch_size)):
    batch = X[i:i+batch_size]
    inputs = tokenizer(batch, **tokenizer_kwargs)

    with torch.no_grad():
        logits = model(**inputs).logits
        batch_preds = [model.config.id2label[p.item()] for p in logits.argmax(dim=1)]
        preds.extend(batch_preds)

    # Verify alignment
    assert len(preds) == i + len(batch), "Prediction count mismatch!"

assert len(preds) == len(X), f"Missing predictions! Expected {len(X)}, got {len(preds)}"
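The loop walks through `X` in steps of `batch_size`. A pure-Python sketch of the same slicing pattern, with a hypothetical `iter_batches` helper and placeholder texts, shows that 9,543 tweets with `batch_size = 4` give 2,386 batches, the last one holding only the 3 leftover texts.

```python
import math

def iter_batches(items, batch_size):
    """Hypothetical helper: yield consecutive slices of `items`, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        # Slicing past the end is safe in Python, so the last batch is simply shorter
        yield items[start:start + batch_size]

texts = [f"tweet {i}" for i in range(9543)]  # placeholders, same count as train.csv
batches = list(iter_batches(texts, batch_size=4))

print(len(batches), len(batches[-1]))  # 2386 3
assert len(batches) == math.ceil(len(texts) / 4)
```

Because the final short batch needs no special handling, the alignment assertions in the loop above hold for every batch.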



<a class="anchor" id="3.4.">

## 3.4. Predictions
<a>

The model outputs one logit per sentiment class for each tweet. The predicted label is the class with the highest logit, which is also the class with the highest softmax probability, i.e. the model's most confident judgment.
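The batch loop takes `argmax` directly on the logits, while the spot-check cell applies `torch.softmax` to report a confidence. The pure-Python sketch below (hypothetical `softmax`/`argmax` helpers, made-up logits, and the `id2label` mapping printed by `model.config.id2label`) shows that both routes pick the same class, because softmax preserves the ordering of the logits.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

id2label = {0: 'negative', 1: 'neutral', 2: 'positive'}  # as printed by model.config.id2label
logits = [-1.2, 0.3, 2.5]  # made-up logits for a single tweet

probs = softmax(logits)
# softmax preserves the ordering of the logits, so the top class is unchanged
assert argmax(probs) == argmax(logits)
print(id2label[argmax(logits)], f"{max(probs):.2f}")  # positive 0.88
```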

In [None]:
# Verification
print("Sample predictions:", preds[:5])
print("Class distribution:", Counter(preds))

Sample predictions: ['neutral', 'negative', 'negative', 'neutral', 'neutral']
Class distribution: Counter({'neutral': 7163, 'positive': 1593, 'negative': 787})


### Converting labels to match FinBERT's format

In [18]:
print(model.config.id2label)

{0: 'negative', 1: 'neutral', 2: 'positive'}


In [19]:
# Ensure y is the same length as X before conversion
assert len(y) == len(X), "Label length mismatch!"

y_true_finbert = [
    'negative' if label == 0 else
    'positive' if label == 1 else
    'neutral'
    for label in y
]
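The conditional expression above maps any value other than `0` or `1` to `'neutral'`, so a stray label (for example the string `'0'`, which is not equal to the integer `0`) would silently be counted as Neutral. A dictionary lookup is a slightly stricter alternative; the hypothetical `to_finbert_labels` helper below raises `KeyError` on anything outside the three known labels.

```python
# Dataset ids (0 = Bearish, 1 = Bullish, 2 = Neutral) -> FinBERT label names
DATASET_TO_FINBERT = {0: 'negative', 1: 'positive', 2: 'neutral'}

def to_finbert_labels(labels):
    """Hypothetical helper: translate dataset ids, failing loudly on unknown ids."""
    return [DATASET_TO_FINBERT[label] for label in labels]

print(to_finbert_labels([0, 1, 2]))  # ['negative', 'positive', 'neutral']

try:
    to_finbert_labels(['0'])  # a string label, e.g. from an unconverted CSV column
except KeyError as err:
    print("Unknown label:", err)
```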

<a class="anchor" id="4.">

# 4. Model Evaluation
<a>

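Comparing the predictions against the true labels with per-class precision, recall and F1 scores. Since the model is used off the shelf, without any fine-tuning on `train.csv`, these scores measure how well its pre-trained sentiment labels transfer to our tweets and show where improvements are most needed.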

In [20]:
print(classification_report(
    y_true_finbert,
    preds,
    # Without `labels`, scikit-learn sorts string labels alphabetically
    # (negative, neutral, positive), which would misassign target_names
    labels=['negative', 'positive', 'neutral'],
    target_names=['Bearish (negative)', 'Bullish (positive)', 'Neutral'],
    zero_division=0
))

                    precision    recall  f1-score   support

Bearish (negative)       0.69      0.37      0.48      1442
Bullish (positive)       0.64      0.53      0.58      1923
           Neutral       0.77      0.89      0.82      6178

          accuracy                           0.74      9543
         macro avg       0.70      0.60      0.63      9543
      weighted avg       0.73      0.74      0.72      9543
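The two summary rows follow directly from the per-class rows: the macro average is the unweighted mean over the three classes, while the weighted average weights each class by its support. The sketch below recomputes the F1 averages from the rounded values printed in the report (so agreement is only to two decimals), pairing each class's F1 with its support.

```python
# (f1-score, support) per class row, copied from the report above (rounded values)
rows = [(0.48, 1442), (0.58, 1923), (0.82, 6178)]

# Macro average: every class counts equally
macro_f1 = sum(f1 for f1, _ in rows) / len(rows)

# Weighted average: each class weighted by its number of true examples
total_support = sum(support for _, support in rows)
weighted_f1 = sum(f1 * support for f1, support in rows) / total_support

print(f"macro F1 = {macro_f1:.2f}, weighted F1 = {weighted_f1:.2f}")  # macro F1 = 0.63, weighted F1 = 0.72
```

The gap between the two (0.63 vs 0.72) reflects that the model scores best on the largest class (6,178 tweets) and worst on the smallest one (1,442 tweets), so the weighted figure flatters it relative to the macro figure.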



### Further analysis

In [None]:
print(f"First 3 labels: {y[:3]}") 
print(f"First 3 preds: {preds[:3]}") 

First 3 labels: [0, 0, 0]
First 3 preds: ['neutral', 'negative', 'negative']


In [22]:
# Check alignment between true and predicted labels
print("\nSample Alignment:")
for true, pred in zip(y[:5], preds[:5]):
    print(f"True: {true} ({'Bearish' if true==0 else 'Bullish' if true==1 else 'Neutral'}) | Pred: {pred}")

# Verify no data was skipped
assert len(preds) == len(X), f"Missing predictions! Expected {len(X)}, got {len(preds)}"


Sample Alignment:
True: 0 (Bearish) | Pred: neutral
True: 0 (Bearish) | Pred: negative
True: 0 (Bearish) | Pred: negative
True: 0 (Bearish) | Pred: neutral
True: 0 (Bearish) | Pred: neutral
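The sample above already shows Bearish tweets predicted as `neutral`. A confusion matrix summarises this for every pair of classes; `confusion_matrix`, imported in section 1.1, computes it from `y_true_finbert` and `preds` (rows = true label, columns = predicted label, in the order given by its `labels` argument). The pure-Python sketch below does the same counting with a hypothetical `confusion_counts` helper, applied only to the five sample pairs printed above.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, labels):
    """Hypothetical helper: rows = true label, columns = predicted label, both in `labels` order."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]

labels = ['negative', 'positive', 'neutral']
# The five sample pairs printed above: all five tweets are Bearish (label 0 -> 'negative')
y_true_sample = ['negative'] * 5
y_pred_sample = ['neutral', 'negative', 'negative', 'neutral', 'neutral']

for label, row in zip(labels, confusion_counts(y_true_sample, y_pred_sample, labels)):
    print(f"{label:>8}: {row}")
# negative: [2, 0, 3]
# positive: [0, 0, 0]
#  neutral: [0, 0, 0]
```

On the full data, the off-diagonal cells of the Bearish row would show where the low Bearish recall comes from.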
