# Baseline Experiment - Sentiment Analysis
## Model: FinBERT PT-BR
- Current Issue: How should exchange rate forward returns be computed when news is released on a weekend and USD/BRL is not trading? No matching exchange rate exists unless Friday's close and Monday's open are used to measure the change.
- Hypothesis (h1a): Sentiment outputs, mapped to direction, can predict short-term exchange rate movements.
- Objective: Establish a baseline for comparison before testing the capabilities of hyped-up LLMs.
- Value Proposition: To my knowledge, this is the first study to apply language models to trading in emerging currency markets, particularly in a multilingual context.
- Why sentiment analysis as baseline: 
    - FinBERT is widely cited in the financial NLP literature.
    - It outperforms general BERT and lexicon-based models on tasks such as financial sentiment classification.
    - Traditional ML methods rely on sparse inputs or static word embeddings (like Word2Vec), which don't capture context.
    - Sentiment analysis is commonly used to generate trading signals. However, I believe the market does not move on
    whether a piece of text is happy or sad, so I expect the following experiments to outperform this baseline.
    The goal here is to rule sentiment analysis out of the picture. "Predicting directional movement" is a better approach.
    - It was used as a benchmark in a very similar paper: https://doi.org/10.1016/j.mlwa.2023.100508

- Independent Variable (Predictor):
    - Text: headline / article content
    - Category: FinBERT sentiment output
    - Binary Label: heuristic mapping (positive -> 1, negative -> -1), i.e. bullish or bearish in commercial terms
    - (POSSIBLY CONSIDER as a control variable/experiment?): Multi-class Label: add a neutral (0) label defined by a threshold on the minimum exchange rate % change

- Dependent Variable (Ground Truth):
    - Directional Movement: binary direction of exchange rate following news timestamp (time frame TBD)
    - (POSSIBLY CONSIDER?): percent change in exchange rate over defined window? Measures profitability...
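
The binary and thresholded labels described above could be derived from percent changes roughly as follows. This is a minimal sketch, not the notebook's actual labeling code: the function name `label_direction`, the `neutral_threshold` parameter, and the sample values are all hypothetical.

```python
import numpy as np
import pandas as pd

def label_direction(pct_change: pd.Series, neutral_threshold: float = 0.0) -> pd.Series:
    """Map percent changes to 1 (up), -1 (down) or 0 (inside the neutral band).

    Hypothetical helper: moves with |change| <= neutral_threshold are labeled 0.
    Missing values (NaN) also end up as 0 because both comparisons are False.
    """
    labels = np.where(
        pct_change > neutral_threshold, 1,
        np.where(pct_change < -neutral_threshold, -1, 0),
    )
    return pd.Series(labels, index=pct_change.index, name="direction")

# Made-up percent changes, for illustration only.
changes = pd.Series([0.12, -0.03, 0.0, -0.40])

binary = label_direction(changes)                          # only exact zeros are neutral
banded = label_direction(changes, neutral_threshold=0.05)  # small moves become neutral
print(binary.tolist(), banded.tolist())
```

With a threshold of 0, only unchanged rates receive the 0 label, which matches the binary setup; a positive threshold turns the same function into the multi-class variant under consideration.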

- Dataset Creation Process:
    - News Data: The dataset contains 4,630 headlines in total, but only 3,519 fall before **Timestamp[2024-12-30 17:38:00]**, the latest timestamp that allows a t+20 analysis (since exchange rate data ends at 17:58:00). The remaining headlines cannot be used until minute-level exchange rates are available for December 30, 2024, to January 15, 2025.
        - Dataset Creation Process: Bom Dia Mercado (BDM) → Eli formatted news data into an Excel file → preprocess.ipynb → export to repo → final dataset
    - FX Rate Data: Minute-level time series of USD/BRL exchange rates, synchronized with news timestamps and stored as pandas datetime objects (ISO format).
        - Dataset Creation Process: Bloomberg → retrieve USD/BRL exchange rates as an Excel file → preprocess.ipynb → export to repo → final dataset
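
One way to synchronize headlines with minute-level quotes, including the weekend case raised under Current Issue, is an as-of join. The sketch below is only an illustration under assumptions: the column names (`Timestamp`, `Rate`, `Headline`) and all quote values are made up, and `preprocess.ipynb` may do this differently.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level quotes around a weekend gap (values are made up).
fx = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2024-12-27 17:57:00", "2024-12-27 17:58:00",  # Friday, last quotes
        "2024-12-30 09:00:00", "2024-12-30 09:01:00",  # Monday, first quotes
    ]),
    "Rate": [6.190, 6.195, 6.180, 6.185],
})
news = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2024-12-27 17:57:30", "2024-12-28 11:15:00"]),
    "Headline": ["intraday headline", "weekend headline"],
})

# Reference rate: last quote at or before the news timestamp
# (Friday's final quote for a weekend headline).
aligned = pd.merge_asof(
    news,
    fx.rename(columns={"Timestamp": "Ref Time", "Rate": "Ref Rate"}),
    left_on="Timestamp", right_on="Ref Time", direction="backward",
)
# Next rate: first quote strictly after the news timestamp
# (Monday's first quote for a weekend headline).
aligned = pd.merge_asof(
    aligned,
    fx.rename(columns={"Timestamp": "Next Time", "Rate": "Next Rate"}),
    left_on="Timestamp", right_on="Next Time",
    direction="forward", allow_exact_matches=False,
)
aligned["Direction"] = np.sign(aligned["Next Rate"] - aligned["Ref Rate"]).astype(int)
print(aligned[["Headline", "Ref Time", "Next Time", "Direction"]])
```

Both frames must be sorted by their time keys for `pd.merge_asof` to work. Whether a Friday-close to Monday-open change is comparable to an intraday one-minute change remains an open design question.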

- Model:
    - HuggingFace Transformers Model: lucas-leme/FinBERT-PT-BR

- RESULTS:
    - Ran on a Colab T4 GPU (in batches) and on a local CPU, with exactly the same results. The exact model configuration is saved in checkpoints/exp001.
    - t+7 horizon: best OVERALL accuracy, 48.4% (0.483924), from binary classification with the fine-tuned financial PT-BR sentiment model

Notes:
    - Dataset used in this experiment: experimental_dataset.csv, with 3,519 news headlines
    - Uses HEADLINES ONLY, not ARTICLE CONTENT or COMMENTS: past research has shown that these are noisy and not useful for prediction purposes.
    - Straightforward t+1 to t+20 prediction horizons: for each increment k, directional movement is the sign of the exchange rate at t+k minus the exchange rate at t.
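
The horizon computation in the last note can be sketched as below. It assumes a rate series indexed by minute and treats each increment as one minute, which is consistent with the 17:38 / 17:58 cutoff mentioned earlier but is still an assumption. The helper name `forward_directions` and the sample rates are hypothetical; only the `Forward Return t+k` column naming comes from this notebook.

```python
import numpy as np
import pandas as pd

def forward_directions(rates: pd.Series, news_times, horizons=range(1, 21)) -> pd.DataFrame:
    """Sign of rate[t+k] - rate[t] for each news time t and horizon k (minutes).

    Hypothetical helper. A missing quote at t or t+k yields NaN for that cell.
    """
    out = pd.DataFrame(index=pd.DatetimeIndex(news_times, name="Timestamp"))
    base = rates.reindex(out.index).to_numpy()
    for k in horizons:
        future = rates.reindex(out.index + pd.Timedelta(minutes=k)).to_numpy()
        out[f"Forward Return t+{k}"] = np.sign(future - base)
    return out

# Made-up minute-level rates, for illustration only.
idx = pd.date_range("2024-12-02 10:00", periods=5, freq="min")
rates = pd.Series([5.00, 5.02, 5.01, 4.99, 5.00], index=idx)

directions = forward_directions(
    rates,
    news_times=["2024-12-02 10:00", "2024-12-02 10:03"],
    horizons=(1, 2, 3),
)
print(directions)
```

Headlines too close to the end of the rate data get NaN at longer horizons, which is why only headlines before the t+20 cutoff can be evaluated on every horizon.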

In [None]:
'''
Doc 1 Methodology: Load lucas-leme/FinBERT-PT-BR / tokenizer → tokenize headlines → FinBERT → sentiment output

HuggingFace notes
- BERT is an architecture while lucas-leme/FinBERT-PT-BR is a checkpoint
- import the model specific class from the transformers library
- call from_pretrained() from the above class to download the model's weights (pytorch_model.bin) and configuration settings (config.json)
- tokenizer is a class from the transformers library that finds the tokenizer specified in the checkpoint and fully preprocesses input text
'''


## Load Model from HuggingFace

In [None]:
from transformers import AutoTokenizer, BertForSequenceClassification

# Load from HuggingFace
tokenizer = AutoTokenizer.from_pretrained("lucas-leme/FinBERT-PT-BR")
model = BertForSequenceClassification.from_pretrained("lucas-leme/FinBERT-PT-BR")

# Save locally
model.save_pretrained("../checkpoints/exp001")
tokenizer.save_pretrained("../checkpoints/exp001")

## Load Model from Local Storage

In [2]:
from transformers import AutoTokenizer, BertForSequenceClassification, pipeline
import pandas as pd

local_path = "../../checkpoints/exp001"

df = pd.read_csv("../../data/processed/experimental_dataset.csv")  # relative paths resolve against the current working directory

model = BertForSequenceClassification.from_pretrained(local_path)
tokenizer = AutoTokenizer.from_pretrained(local_path)

In [3]:
model = BertForSequenceClassification.from_pretrained(
    local_path,
    trust_remote_code=True,
    local_files_only=True
)
tokenizer = AutoTokenizer.from_pretrained(
    local_path,
    trust_remote_code=True,
    local_files_only=True
)
finbert_pipeline = pipeline(
    task='text-classification',
    model=model,
    tokenizer=tokenizer
)
# mapping predictions
pred_mapper = {
    0: "POSITIVE",
    1: "NEGATIVE", 
    2: "NEUTRAL"
}

Device set to use cpu


In [4]:
# Sentiment Analysis
# Map FinBERT labels to trading signals: positive -> 1, negative -> -1, neutral -> 0
label_to_signal = {
    pred_mapper[0]: 1,   # POSITIVE
    pred_mapper[1]: -1,  # NEGATIVE
    pred_mapper[2]: 0,   # NEUTRAL
}

results = []
for headline in df['Headline']:
    result = finbert_pipeline(headline)[0]
    # A dict lookup raises KeyError on an unexpected label instead of
    # silently reusing the previous headline's sentiment
    results.append(label_to_signal[result['label']])

# save predictions to the dataframe
df['Prediction'] = results
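
Since the Colab run processed headlines in batches, the same mapping can be applied to a whole list at once. This is a rough sketch, not the code used for the reported results: `predict_signals` and `fake_classifier` are hypothetical names, and the stub only stands in for `finbert_pipeline` so the example runs without downloading the model.

```python
from typing import Callable, Iterable, List

LABEL_TO_SIGNAL = {"POSITIVE": 1, "NEGATIVE": -1, "NEUTRAL": 0}

def predict_signals(classifier: Callable, headlines: Iterable[str], batch_size: int = 32) -> List[int]:
    """Classify all headlines in one call and map each label to 1 / -1 / 0.

    `classifier` is assumed to behave like a text-classification pipeline:
    given a list of strings it returns one {"label": ..., "score": ...} dict per string.
    """
    texts = [str(h) for h in headlines]
    outputs = classifier(texts, batch_size=batch_size)
    return [LABEL_TO_SIGNAL[o["label"]] for o in outputs]

def fake_classifier(texts, batch_size=1):
    """Stand-in for the real pipeline; the keyword rules are purely illustrative."""
    def label(text):
        if "sobe" in text:
            return "POSITIVE"
        if "cai" in text:
            return "NEGATIVE"
        return "NEUTRAL"
    return [{"label": label(t), "score": 1.0} for t in texts]

signals = predict_signals(fake_classifier, ["dólar sobe", "dólar cai", "mercado estável"])
print(signals)
```

In the notebook, the call would presumably look like `predict_signals(finbert_pipeline, df['Headline'])`; on a GPU, a larger `batch_size` usually reduces total inference time.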

In [None]:
df.to_csv("../../results/exp001/exp001.csv", index=False)

In [18]:
import pandas as pd
df = pd.read_csv("../../results/exp001/exp001_colab.csv")
preds = df['Prediction']
print(preds.value_counts(), f'\ntotal vals (it checks out - good): {preds.value_counts().sum()}')

Prediction
-1    1542
 0    1069
 1     908
Name: count, dtype: int64 
total vals (it checks out - good): 3519


In [27]:
import pandas as pd

def compare_csvs(file1, file2):
    df1 = pd.read_csv(file1)
    df2 = pd.read_csv(file2)

    # sort columns and rows for a strict, order-insensitive comparison
    df1_sorted = df1.sort_index(axis=1).sort_values(by=df1.columns.tolist()).reset_index(drop=True)
    df2_sorted = df2.sort_index(axis=1).sort_values(by=df2.columns.tolist()).reset_index(drop=True)

    return df1_sorted.equals(df2_sorted)

are_equal = compare_csvs("../../results/exp001/exp001_colab.csv", "../../results/exp001/exp001.csv")
print("files equal" if are_equal else "files differ")

files equal


In [None]:
from sklearn.metrics import confusion_matrix

filtered_df = df[df["Prediction"] != 0].copy() # no DA for neutral (0), just binary classification. Rid of all neutral predictions 

forward_return_cols = [col for col in df.columns if col.startswith("Forward Return t+")]

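# Each matrix is laid out with rows = true label and columns = predicted label:
# [[TN, FP],
#  [FN, TP]]
# where -1 is the negative class and 1 is the positive class
conf_matrices = {}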

for col in forward_return_cols:
    y_true = filtered_df[col]
    y_pred = filtered_df["Prediction"]

    # keep only -1 and 1 (drop rows whose ground truth is 0, if present)
    mask = y_true != 0
    y_true_filtered = y_true[mask]
    y_pred_filtered = y_pred[mask]

    #  confusion matrix with labels fixed to [-1, 1]
    cm = confusion_matrix(y_true_filtered, y_pred_filtered, labels=[-1, 1])
    conf_matrices[col] = cm

# Display the confusion matrix for every horizon
for k, v in conf_matrices.items():
    print(f"Confusion matrix for {k}:\n{v}\n")


Confusion matrix for Forward Return t+1:
[[449 259]
 [543 256]]

Confusion matrix for Forward Return t+2:
[[620 401]
 [703 411]]

Confusion matrix for Forward Return t+3:
[[661 436]
 [797 439]]

Confusion matrix for Forward Return t+4:
[[684 426]
 [826 463]]

Confusion matrix for Forward Return t+5:
[[695 438]
 [832 461]]

Confusion matrix for Forward Return t+6:
[[684 440]
 [841 462]]

Confusion matrix for Forward Return t+7:
[[698 421]
 [831 476]]

Confusion matrix for Forward Return t+8:
[[686 428]
 [840 473]]

Confusion matrix for Forward Return t+9:
[[661 428]
 [865 473]]

Confusion matrix for Forward Return t+10:
[[655 421]
 [878 480]]

Confusion matrix for Forward Return t+11:
[[645 424]
 [886 477]]

Confusion matrix for Forward Return t+12:
[[652 428]
 [880 473]]

Confusion matrix for Forward Return t+13:
[[648 415]
 [883 485]]

Confusion matrix for Forward Return t+14:
[[635 414]
 [892 486]]

Confusion matrix for Forward Return t+15:
[[643 403]
 [889 499]]

Confusion matrix fo

In [31]:
#accuracies

filtered_df = df[df["Prediction"] != 0].copy()

forward_return_cols = [col for col in df.columns if col.startswith("Forward Return t+")]
accuracies = {}

for col in forward_return_cols:
    y_true = filtered_df[col]
    y_pred = filtered_df["Prediction"]
    mask = y_true != 0
    accuracy = (y_true[mask] == y_pred[mask]).mean()
    accuracies[col] = accuracy

accuracy_df = pd.DataFrame.from_dict(accuracies, orient='index', columns=['Accuracy'])
accuracy_df.index.name = 'Horizon'
accuracy_df.reset_index(inplace=True)

display(accuracy_df)


               Horizon  Accuracy
0   Forward Return t+1  0.467817
1   Forward Return t+2  0.482904
2   Forward Return t+3  0.471496
3   Forward Return t+4  0.478116
4   Forward Return t+5  0.476505
5   Forward Return t+6  0.472188
6   Forward Return t+7  0.483924
7   Forward Return t+8  0.477544
8   Forward Return t+9  0.467244
9  Forward Return t+10  0.466311
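
As a sanity check, the t+7 accuracy can be re-derived from the t+7 confusion matrix printed earlier. The sketch below uses only those four printed counts; the variable names are illustrative, and the per-class recall and majority-class share are extra context, not metrics computed elsewhere in this notebook.

```python
import numpy as np

# t+7 confusion matrix as printed earlier.
# scikit-learn convention: rows = true [-1, 1], columns = predicted [-1, 1].
cm = np.array([[698, 421],
               [831, 476]])

total = cm.sum()
accuracy = np.trace(cm) / total                         # (TN + TP) / all samples
recall_down, recall_up = np.diag(cm) / cm.sum(axis=1)   # per true class
majority_share = cm.sum(axis=1).max() / total           # always predicting the larger class

print(f"samples={total}, accuracy={accuracy:.6f}")
print(f"recall(-1)={recall_down:.3f}, recall(+1)={recall_up:.3f}")
print(f"majority-class share={majority_share:.3f}")
```

The accuracy agrees with the 0.483924 reported for t+7. The majority-class share gives a reference point for how a constant prediction would score on the same samples.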
