# FinBERT Notebook

This notebook shows how the FinBERT pre-trained language model is fine-tuned, evaluated, and used for sentiment prediction.

## Modules 

In [1]:
%matplotlib inline

In [2]:
from pathlib import Path
import shutil
import os
import logging
import sys
sys.path.append('..')

from textblob import TextBlob
from pprint import pprint
from sklearn.metrics import classification_report

from transformers import AutoModelForSequenceClassification

from finbert.finbert import *
import finbert.utils as tools

%load_ext autoreload
%autoreload 2

project_dir = Path.cwd().parent
pd.set_option('display.max_colwidth', None)  # show full cell contents; -1 is deprecated


In [3]:
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt = '%m/%d/%Y %H:%M:%S',
                    level = logging.ERROR)

## Prepare the model

### Setting path variables:
1. `lm_path`: the path to the further pre-trained language model (not needed if vanilla BERT is used).
2. `cl_path`: the path where the classification model is saved.
3. `cl_data_path`: the path of the directory that contains the data files `train.csv`, `validation.csv`, and `test.csv`.
---

When initializing `bertmodel`, we can use either the original pre-trained weights from Google, by passing `bm = 'bert-base-uncased'`, or our further pre-trained language model, by passing `bm = lm_path`.
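A minimal sketch of how that choice could be automated, assuming a hypothetical helper `choose_base_model` (not part of FinBERT). It returns the local directory when it looks like a saved Hugging Face checkpoint (a `config.json` is present) and falls back to the `'bert-base-uncased'` identifier otherwise.

```python
from pathlib import Path


def choose_base_model(lm_path, fallback="bert-base-uncased"):
    """Return a local checkpoint directory if usable, else the fallback model name.

    Hypothetical helper for illustration; it only checks that config.json exists.
    """
    lm_path = Path(lm_path)
    if (lm_path / "config.json").is_file():
        return str(lm_path)
    return fallback
```

The returned value can be passed as the first argument to `AutoModelForSequenceClassification.from_pretrained`.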


---
All model configuration is controlled through the `config` variable.

In [4]:
lm_path = project_dir/'models'/'language_model'/'finbertTRC2'
cl_path = project_dir/'models'/'classifier_model'/'finbert-sentiment'
cl_data_path = project_dir/'data'/'sentiment_data'


###  Configuring training parameters

You can find the explanations of the training parameters in the class docstrings.
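As an illustration of what `warm_up_proportion` controls, the sketch below computes how many optimization steps fall inside the warm-up phase and the learning-rate multiplier at a given step. It assumes a linear warm-up followed by a constant rate, which may not match FinBert's actual scheduler. `warmup_steps` and `lr_multiplier` are hypothetical helpers, and the example numbers (2485 examples, batch size 32, 4 epochs, proportion 0.2) are taken from this notebook's configuration and training log.

```python
import math


def warmup_steps(num_examples, batch_size, epochs, warm_up_proportion):
    """Return (total optimization steps, steps spent warming up)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total = steps_per_epoch * epochs
    return total, int(total * warm_up_proportion)


def lr_multiplier(step, n_warmup):
    """Linear ramp from 0 to 1 during warm-up, then constant (assumed schedule)."""
    if n_warmup == 0:
        return 1.0
    return min(1.0, step / n_warmup)


total, n_warmup = warmup_steps(2485, 32, 4, 0.2)
print(total, n_warmup)
print(lr_multiplier(31, n_warmup))
```

Under these assumptions, roughly the first fifth of all steps is spent ramping the learning rate up to its configured value.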

In [5]:
# Remove any previous classifier output so training starts from a clean directory
shutil.rmtree(cl_path, ignore_errors=True)

bertmodel = AutoModelForSequenceClassification.from_pretrained(lm_path,cache_dir=None, num_labels=3)


config = Config(   data_dir=cl_data_path,
                   bert_model=bertmodel,
                   num_train_epochs=4,
                   model_dir=cl_path,
                   max_seq_length = 48,
                   train_batch_size = 32,
                   learning_rate = 2e-5,
                   output_mode='classification',
                   warm_up_proportion=0.2,
                   local_rank=-1,
                   discriminate=True,
                   gradual_unfreeze=True)

Some weights of the model checkpoint at /Users/macbook/Desktop/Year3/Dissertation/Coding/Exp2/finBERT-master/models/language_model/finbertTRC2 were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

`FinBert` is the main class that encapsulates all the functionality. Pass the list of class labels to the `prepare_model` method through its `label_list` parameter.

In [6]:
finbert = FinBert(config)
finbert.base_model = 'bert-base-uncased'
finbert.config.discriminate=True
finbert.config.gradual_unfreeze=True
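`discriminate=True` turns on discriminative fine-tuning, in which lower encoder layers are trained with smaller learning rates than higher ones. A minimal sketch of one such scheme follows, assuming the rate is divided by a constant factor for each layer below the top. The factor `1.2`, the 12-layer depth, and the helper `layer_learning_rates` are illustrative assumptions rather than FinBert's exact implementation.

```python
def layer_learning_rates(base_lr, num_layers=12, factor=1.2):
    """Map encoder layer index -> learning rate; the top layer keeps base_lr."""
    return {
        layer: base_lr / (factor ** (num_layers - 1 - layer))
        for layer in range(num_layers)
    }


rates = layer_learning_rates(2e-5)
for layer in (0, 6, 11):
    print(f"layer {layer:2d}: lr = {rates[layer]:.2e}")
```

The idea is that lower layers hold more general language features and need smaller updates than the task-specific upper layers.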

In [7]:
finbert.prepare_model(label_list=['positive','negative','neutral'])

03/08/2024 17:31:18 - INFO - finbert.finbert -   device: cpu n_gpu: 0, distributed training: False, 16-bits training: False


## Fine-tune the model

In [8]:
# Get the training examples
train_data = finbert.get_data('train')

In [9]:
model = finbert.create_the_model()

### [Optional] Fine-tune only a subset of the model
The variable `freeze` sets the number of encoder layers (out of 12) to freeze, counting from the bottom; the embeddings are frozen as well. You can skip this part if you want to fine-tune the whole model.

<span style="color:red">Important: </span>
Execute this step if you want a shorter training time at the expense of accuracy.

In [None]:
# This is for fine-tuning a subset of the model.

freeze = 6

for param in model.bert.embeddings.parameters():
    param.requires_grad = False
    
for i in range(freeze):
    for param in model.bert.encoder.layer[i].parameters():
        param.requires_grad = False
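
To check the effect of the freezing loop above, one can count how many parameters still require gradients. The sketch below assumes only that each parameter exposes `requires_grad` and `numel()`, as PyTorch parameters do. `count_parameters` is a hypothetical helper, and `FakeParam` is a stand-in so the example runs without loading a model.

```python
def count_parameters(parameters):
    """Return (trainable, total) element counts for an iterable of parameters."""
    trainable = total = 0
    for p in parameters:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total


class FakeParam:
    """Minimal stand-in for a torch parameter (illustration only)."""

    def __init__(self, size, requires_grad=True):
        self.size = size
        self.requires_grad = requires_grad

    def numel(self):
        return self.size


params = [FakeParam(100, requires_grad=False), FakeParam(50), FakeParam(25)]
trainable, total = count_parameters(params)
print(f"{trainable} of {total} parameters are trainable")
```

With the real model, the call would be `count_parameters(model.parameters())`.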

### Training

In [10]:
trained_model = finbert.train(train_examples = train_data, model = model)

03/08/2024 17:31:19 - INFO - finbert.utils -   *** Example ***
03/08/2024 17:31:19 - INFO - finbert.utils -   guid: train-1
03/08/2024 17:31:19 - INFO - finbert.utils -   tokens: [CLS] per ##tti er ##vi is independent from the company and its major shareholders [SEP]
03/08/2024 17:31:19 - INFO - finbert.utils -   input_ids: 101 2566 6916 9413 5737 2003 2981 2013 1996 2194 1998 2049 2350 15337 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 17:31:19 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 17:31:19 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 17:31:19 - INFO - finbert.utils -   label: neutral (id = 2)
03/08/2024 17:31:19 - INFO - finbert.finbert -   ***** Loading data *****
03/08/2024 17:31:19 - INFO - finbert.finbert -     Num examples = 2485


Iteration:   0%|          | 0/78 [00:00<?, ?it/s]

03/08/2024 17:35:37 - INFO - finbert.utils -   *** Example ***
03/08/2024 17:35:37 - INFO - finbert.utils -   guid: validation-1
03/08/2024 17:35:37 - INFO - finbert.utils -   tokens: [CLS] su ##ome ##n pa ##ika ##llis ##san ##oma ##t o ##y is part of alma media group and it currently publishes 15 local newspapers across finland [SEP]
03/08/2024 17:35:37 - INFO - finbert.utils -   input_ids: 101 10514 8462 2078 6643 7556 21711 8791 9626 2102 1051 2100 2003 2112 1997 11346 2865 2177 1998 2009 2747 12466 2321 2334 6399 2408 6435 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 17:35:37 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 17:35:37 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 17:35:37 - INFO - finbert.utils -   label: neutral (id = 2)
03/08/2024 17:35:37 - INFO - finbe

Validating:   0%|          | 0/9 [00:00<?, ?it/s]

Validation losses: [0.7935961153772142]
No best model found


Epoch:  25%|██▌       | 1/4 [04:43<14:09, 283.07s/it]

Iteration:   0%|          | 0/78 [00:00<?, ?it/s]


Validating:   0%|          | 0/9 [00:00<?, ?it/s]

Validation losses: [0.7935961153772142, 0.20759580118788612]


Epoch:  50%|█████     | 2/4 [12:01<12:29, 374.65s/it]

Iteration:   0%|          | 0/78 [00:00<?, ?it/s]


Validating:   0%|          | 0/9 [00:00<?, ?it/s]

Validation losses: [0.7935961153772142, 0.20759580118788612, 0.14650340502460799]


Epoch:  75%|███████▌  | 3/4 [21:14<07:35, 455.98s/it]

Iteration:   0%|          | 0/78 [00:00<?, ?it/s]


Validating:   0%|          | 0/9 [00:00<?, ?it/s]

Epoch: 100%|██████████| 4/4 [33:07<00:00, 496.79s/it]

Validation losses: [0.7935961153772142, 0.20759580118788612, 0.14650340502460799, 0.15782912820577621]
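
The validation losses above reach their minimum in the third epoch and rise slightly in the fourth. The sketch below selects the best epoch from such a list. `best_epoch` is a hypothetical helper, and FinBert's own checkpoint selection may work differently.

```python
def best_epoch(validation_losses):
    """Return (1-based epoch number, loss) of the lowest validation loss."""
    index = min(range(len(validation_losses)), key=validation_losses.__getitem__)
    return index + 1, validation_losses[index]


losses = [0.7935961153772142, 0.20759580118788612,
          0.14650340502460799, 0.15782912820577621]
epoch, loss = best_epoch(losses)
print(f"best epoch: {epoch}, validation loss: {loss:.4f}")
```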





## Test the model

`finbert.evaluate` returns a DataFrame that contains the true label and the logit values for each example.

In [11]:
test_data = finbert.get_data('test')

In [12]:
results = finbert.evaluate(examples=test_data, model=trained_model)

03/08/2024 18:04:33 - INFO - finbert.utils -   *** Example ***
03/08/2024 18:04:33 - INFO - finbert.utils -   guid: test-1
03/08/2024 18:04:33 - INFO - finbert.utils -   tokens: [CLS] it is expected to be completed by the end of 2007 [SEP]
03/08/2024 18:04:33 - INFO - finbert.utils -   input_ids: 101 2009 2003 3517 2000 2022 2949 2011 1996 2203 1997 2289 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 18:04:33 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 18:04:33 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/08/2024 18:04:33 - INFO - finbert.utils -   label: neutral (id = 2)
03/08/2024 18:04:33 - INFO - finbert.finbert -   ***** Loading data *****
03/08/2024 18:04:33 - INFO - finbert.finbert -     Num examples = 691
03/08/2024 18:04:33 - INFO - finbert

Testing:   0%|          | 0/22 [00:00<?, ?it/s]

### Prepare the classification report

In [1]:
def report(df, cols=('label', 'prediction', 'logits')):
    """Print the loss, accuracy and per-class report for an evaluation DataFrame.

    cols = (true-label column, predicted-label column, logits column).
    """
    # CrossEntropyLoss and torch are expected to be in scope via the star import above
    cs = CrossEntropyLoss(weight=finbert.class_weights)
    loss = cs(torch.tensor(list(df[cols[2]])), torch.tensor(list(df[cols[0]])))
    print("Loss:{0:.2f}".format(loss))
    print("Accuracy:{0:.2f}".format((df[cols[0]] == df[cols[1]]).sum() / df.shape[0]))
    print("\nClassification Report:")
    print(classification_report(df[cols[0]], df[cols[1]], digits=3))

In [14]:
results['prediction'] = results.predictions.apply(lambda x: np.argmax(x,axis=0))

In [15]:
report(results,cols=['labels','prediction','predictions'])

Loss:0.24
Accuracy:0.92

Classification Report:
              precision    recall  f1-score   support

           0      0.904     0.890     0.897       200
           1      0.752     0.916     0.826        83
           2      0.969     0.934     0.951       408

    accuracy                          0.919       691
   macro avg      0.875     0.913     0.891       691
weighted avg      0.924     0.919     0.920       691
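
The two summary rows differ in how they average the per-class scores: `macro avg` is the unweighted mean over classes, while `weighted avg` weights each class by its support. The sketch below recomputes both for the precision column from the rounded values in the report. Since `label_list` was given as `['positive','negative','neutral']`, class ids 0, 1 and 2 are taken to correspond to those labels in that order.

```python
# Per-class precision and support, copied from the report above
precision = {0: 0.904, 1: 0.752, 2: 0.969}
support = {0: 200, 1: 83, 2: 408}

# Macro average: every class counts equally
macro = sum(precision.values()) / len(precision)

# Weighted average: classes count in proportion to their support
total = sum(support.values())
weighted = sum(precision[c] * support[c] for c in precision) / total

print(f"macro avg precision:    {macro:.3f}")
print(f"weighted avg precision: {weighted:.3f}")
```

Both values agree with the 0.875 and 0.924 shown in the report. The weighted figure is higher because the largest class (id 2) also has the highest precision.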





In [6]:
import numpy as np
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

In [7]:
cm = confusion_matrix(results['labels'], results['prediction'])
cm_percentage = cm / np.sum(cm, axis=1, keepdims=True)
labels = (np.asarray(["{}\n({:.2f})".format(count, percentage)
                      for count, percentage in zip(cm.flatten(), cm_percentage.flatten())])
          ).reshape(cm.shape)


In [4]:
plt.figure(figsize=(4.5,3.6))
sns.heatmap(cm, annot=labels, fmt='', cmap='Blues')
plt.xlabel('Predicted Label', fontsize=8, fontname='Arial')
plt.ylabel('True Label', fontsize=8, fontname='Arial')
plt.show()


### Get predictions

The `predict` function splits a piece of text into sentences and predicts the sentiment of each one. The output is written into a DataFrame, with each prediction described by three columns:

1) `logit`: probabilities for each class

2) `prediction`: predicted label

3) `sentiment_score`: sentiment score calculated as: probability of positive - probability of negative
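
The relationship between the three columns can be sketched as follows, assuming the class probabilities come from a softmax over the model's raw outputs and that the class order is positive, negative, neutral (the order of `label_list` used earlier). `softmax` and `score_sentence` are hypothetical helpers, not FinBERT's `predict` itself, and the raw scores are example values.

```python
import math

LABELS = ["positive", "negative", "neutral"]  # assumed class order


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_sentence(raw_scores):
    """Return (probabilities, predicted label, sentiment score)."""
    probs = softmax(raw_scores)
    prediction = LABELS[probs.index(max(probs))]
    sentiment_score = probs[0] - probs[1]  # P(positive) - P(negative)
    return probs, prediction, sentiment_score


probs, prediction, sentiment_score = score_sentence([2.2969, -1.1690, -1.4681])
print(prediction, round(sentiment_score, 2))
```

The score lies between -1 (confidently negative) and 1 (confidently positive); a mostly neutral sentence scores close to 0.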

Below we analyze a paragraph taken from [this](https://www.economist.com/finance-and-economics/2019/01/03/a-profit-warning-from-apple-jolts-markets) article from The Economist. For comparison, we also include the sentiment predicted by TextBlob.
> Later that day Apple said it was revising down its earnings expectations in the fourth quarter of 2018, largely because of lower sales and signs of economic weakness in China. The news rapidly infected financial markets. Apple’s share price fell by around 7% in after-hours trading and the decline was extended to more than 10% when the market opened. The dollar fell by 3.7% against the yen in a matter of minutes after the announcement, before rapidly recovering some ground. Asian stockmarkets closed down on January 3rd and European ones opened lower. Yields on government bonds fell as investors fled to the traditional haven in a market storm.

In [2]:
text = "Later that day Apple said it was revising down its earnings expectations in \
the fourth quarter of 2018, largely because of lower sales and signs of economic weakness in China. \
The news rapidly infected financial markets. Apple’s share price fell by around 7% in after-hours \
trading and the decline was extended to more than 10% when the market opened. The dollar fell \
by 3.7% against the yen in a matter of minutes after the announcement, before rapidly recovering \
some ground. Asian stockmarkets closed down on January 3rd and European ones opened lower. \
Yields on government bonds fell as investors fled to the traditional haven in a market storm."

In [11]:
cl_path = project_dir/'models'/'classifier_model'/'finbert-sentiment'
model = AutoModelForSequenceClassification.from_pretrained(cl_path, cache_dir=None, num_labels=3)

In [7]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /Users/macbook/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [8]:
result = predict(text,model)

In [9]:
blob = TextBlob(text)
result['textblob_prediction'] = [sentence.sentiment.polarity for sentence in blob.sentences]
result


In [13]:
print(f'Average sentiment is {result.sentiment_score.mean():.2f}.')

Average sentiment is -0.88.


Here is another example:

In [None]:
text2 = "Shares in the spin-off of South African e-commerce group Naspers surged more than 25% \
in the first minutes of their market debut in Amsterdam on Wednesday. Bob van Dijk, CEO of \
Naspers and Prosus Group poses at Amsterdam's stock exchange, as Prosus begins trading on the \
Euronext stock exchange in Amsterdam, Netherlands, September 11, 2019. REUTERS/Piroschka van de Wouw \
Prosus comprises Naspers’ global empire of consumer internet assets, with the jewel in the crown a \
31% stake in Chinese tech titan Tencent. There is 'way more demand than is even available, so that’s \
good,' said the CEO of Euronext Amsterdam, Maurice van Tilburg. 'It’s going to be an interesting \
hour of trade after opening this morning.' Euronext had given an indicative price of 58.70 euros \
per share for Prosus, implying a market value of 95.3 billion euros ($105 billion). The shares \
jumped to 76 euros on opening and were trading at 75 euros at 0719 GMT."

In [None]:
result2 = predict(text2,model)
blob = TextBlob(text2)
result2['textblob_prediction'] = [sentence.sentiment.polarity for sentence in blob.sentences]

In [None]:
result2

In [None]:
print(f'Average sentiment is {result2.sentiment_score.mean():.2f}.')

### Load your dataset

In [None]:
source_directory = '/Users/macbook/Desktop/Year3/Dissertation/Coding/Exp1/Results/Bloomberg & Reuters'
destination_directory = '/Users/macbook/Desktop/Year3/Dissertation/Coding/Exp2/Dataset'

In [None]:
# List all CSV files in the directory and sort them
csv_files = sorted([file for file in os.listdir(source_directory) if file.endswith('.csv')])

In [None]:
for file in csv_files:
    # Construct the full file path
    file_path = os.path.join(source_directory, file)

    # Load only the columns needed for sentiment analysis
    df = pd.read_csv(file_path, usecols=['timestamp', 'headline'])

    # Randomly select 107 headlines (reproducible via random_state);
    # df.sample raises ValueError if the file has fewer than 107 rows
    selected_headlines = df.sample(n=107, random_state=1)

    # Normalize timestamps to minute resolution and sort chronologically
    selected_headlines['timestamp'] = pd.to_datetime(selected_headlines['timestamp']).dt.strftime('%Y-%m-%d %H:%M')
    selected_headlines = selected_headlines.sort_values(by='timestamp').reset_index(drop=True)

    # Write the sample under the same file name in the destination directory
    destination_path = os.path.join(destination_directory, file)
    selected_headlines.to_csv(destination_path, index=True)

### Sentiment analysis

In [None]:
import pandas as pd
from datetime import datetime, timedelta
import os

In [None]:
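def sentiment_analysis(file_path):
    """Return the mean sentiment score over all headlines in a CSV file."""
    df = pd.read_csv(file_path)
    # Score each headline separately; only the first sentence of a headline is used
    df['sentiment_score'] = df['headline'].apply(lambda headline: predict(headline, model).iloc[0]['sentiment_score'])

    return df['sentiment_score'].mean()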

In [None]:
source_directory = '/Users/macbook/Desktop/Year3/Dissertation/Coding/Exp2/Dataset'
destination_directory = '/Users/macbook/Desktop/Year3/Dissertation/Coding/Exp2/Results'

In [None]:
final_df = pd.DataFrame(columns=['Date', 'Sentiment_score'])

In [None]:
start_date = datetime(2007, 1, 1)
end_date = datetime(2010, 12, 30)
current_date = start_date
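
The day-by-day loop that follows looks for one file per calendar day, named `YYYY-MM-DD.csv`. A compact way to enumerate those names is sketched below. `daily_file_names` is a hypothetical helper, and the date range mirrors the one set above.

```python
from datetime import datetime, timedelta


def daily_file_names(start, end):
    """Yield 'YYYY-MM-DD.csv' for every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current.strftime("%Y-%m-%d") + ".csv"
        current += timedelta(days=1)


names = list(daily_file_names(datetime(2007, 1, 1), datetime(2010, 12, 30)))
print(len(names), names[0], names[-1])
```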

In [14]:
while current_date <= end_date:
    file_name = current_date.strftime("%Y-%m-%d") + '.csv'
    file_path = os.path.join(source_directory, file_name)

    # Days without a headline file are skipped
    if os.path.exists(file_path):
        avg_sentiment = sentiment_analysis(file_path)
        # Add one row per day to final_df
        final_df.loc[len(final_df)] = [current_date.strftime("%Y-%m-%d"), avg_sentiment]

    current_date += timedelta(days=1)

02/05/2024 14:21:24 - INFO - root -   tensor([[ 2.2969, -1.1690, -1.4681]])
02/05/2024 14:21:24 - INFO - root -   Using device: cpu 
02/05/2024 14:21:24 - INFO - finbert.utils -   *** Example ***
02/05/2024 14:21:24 - INFO - finbert.utils -   guid: 0
02/05/2024 14:21:24 - INFO - finbert.utils -   tokens: [CLS] avalon ##bay to join s & p ; 500 , replacing symbol [SEP]
02/05/2024 14:21:24 - INFO - finbert.utils -   input_ids: 101 18973 15907 2000 3693 1055 1004 1052 1025 3156 1010 6419 6454 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/05/2024 14:21:24 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/05/2024 14:21:24 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
02/05/2024 14:21:24 