# exp001 - Baseline Experiment
- https://drive.google.com/file/d/1sV2DSHaIFE-BvPs0POqH3y4qMH883j6N/view?usp=drive_link
- Hypothesis (h1a): Sentiment outputs, mapped to a direction, can predict short-term exchange rate movements. Expected outcome: poor performance, because the model lacks domain-specific context.
- Objective: Establish a baseline for comparison before testing the capabilities of the much-hyped LLMs.
- Value Proposition: To my knowledge, this is the first study to apply language models to trading in emerging currency markets, especially in a multilingual context.
- Why sentiment analysis as baseline: 
    - FinBERT is widely cited in financial NLP literature
    - Outperforms general BERT and lexicon-based models on tasks like financial sentiment classification
    - Traditional ML methods rely on sparse inputs or static word embeddings (like Word2Vec) which don't capture context
    - Sentiment analysis is commonly used to generate trading signals. However, I believe the market does not move
    on whether a piece of text is happy or sad, so I expect the following experiments to outperform this baseline.
    I just want to rule sentiment analysis out. Predicting directional movement directly is a better approach.
    - It was used as a benchmark in a very similar paper: https://doi.org/10.1016/j.mlwa.2023.100508

- Model:
    - HuggingFace Transformers Model: lucas-leme/FinBERT-PT-BR

- Independent Variable (Predictor):
    - Text: headline
    - Category: FinBERT sentiment output
    - Binary Label: heuristic mapping (positive -> 1, negative -> -1), i.e. bullish or bearish in trading terms

- Dependent Variable (Ground Truth):
    - Directional Movement: binary direction of the exchange rate following the news timestamp t (change in the exchange rate 5 minutes after t)

- Dataset Creation Process:
    - Stage 4 test dataset

- Experimentation:
    - Encoder-only (representation) model: BERT. FinBERT-PT-BR is a Brazilian Portuguese counterpart of FinBERT, which is itself a finance-specific BERT model
    - Methodology: load the lucas-leme/FinBERT-PT-BR model and tokenizer → tokenize headlines → FinBERT → sentiment output

- RESULTS:
    - Poor, as expected: accuracy 0.494 and macro F1 0.482 on 500 headlines, which is about chance level. See the outputs below.
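The predictor and ground-truth variables described above can be sketched roughly as follows. This is only an illustration, not the code used to build the Stage 4 dataset: the function names (`sentiment_to_signal`, `direction_after`), the quote format, and the sample rates are all hypothetical. It also assumes that neutral sentiment maps to 0 and that "the rate at time t" means the last quote at or before t.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def sentiment_to_signal(label: str) -> int:
    """Heuristic mapping from a sentiment label to a signed signal."""
    # positive -> 1 and negative -> -1 follow the notes above;
    # mapping anything else (e.g. neutral) to 0 is an assumption.
    return {"positive": 1, "negative": -1}.get(label.strip().lower(), 0)


def direction_after(news_time, quotes, horizon=timedelta(minutes=5)):
    """Label the rate move between news_time and news_time + horizon.

    quotes: list of (timestamp, rate) tuples sorted by timestamp
    (a hypothetical format). Returns None when no quote exists at or
    before the news timestamp.
    """
    times = [t for t, _ in quotes]

    def rate_as_of(when):
        # Index of the last quote at or before `when`, or -1 if none.
        i = bisect_right(times, when) - 1
        return quotes[i][1] if i >= 0 else None

    start = rate_as_of(news_time)
    if start is None:
        return None
    end = rate_as_of(news_time + horizon)
    if end > start:
        return "Increase"
    if end < start:
        return "Decrease"
    return "Stable"


# Made-up quotes purely for illustration.
quotes = [
    (datetime(2024, 1, 2, 10, 0), 4.90),
    (datetime(2024, 1, 2, 10, 3), 4.91),
    (datetime(2024, 1, 2, 10, 6), 4.93),
]
news_time = datetime(2024, 1, 2, 10, 1)
print(sentiment_to_signal("POSITIVE"), direction_after(news_time, quotes))
```

An as-of lookup is used here so that a headline published between two quotes is compared against the most recent known rate rather than a future one.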

Notes:
    - Using HEADLINES ONLY, not ARTICLE CONTENT or COMMENTS: past research has shown these are noisy and not useful for prediction. Timestamps are not used as input either, because they carry no sentiment.

HuggingFace notes
- BERT is an architecture while lucas-leme/FinBERT-PT-BR is a checkpoint
- Import the model-specific class from the transformers library
- To run locally, call from_pretrained() on that class to download the model's weights (pytorch_model.bin) and configuration (config.json)
- The tokenizer class from the transformers library loads the tokenizer specified in the checkpoint and handles all preprocessing of the input text
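The loading steps in these notes might look roughly like the sketch below. It is not the experiment's actual script: the helper names `classify_headlines` and `logits_to_labels` are hypothetical, and the code assumes `torch` and `transformers` are installed. Label names are read from the checkpoint's `config.id2label` rather than hard-coded, since the exact label strings are not stated in these notes.

```python
def logits_to_labels(logits, id2label):
    """Pick the highest-scoring class in each row and look up its name."""
    labels = []
    for row in logits:
        best = max(range(len(row)), key=row.__getitem__)  # argmax without numpy
        labels.append(id2label[best])
    return labels


def classify_headlines(headlines, checkpoint="lucas-leme/FinBERT-PT-BR"):
    """Return one sentiment label per headline (downloads weights on first use)."""
    # Imported lazily so logits_to_labels can be used without these packages.
    import torch
    from transformers import AutoTokenizer, BertForSequenceClassification

    # from_pretrained() fetches the tokenizer files, config.json and the weights.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = BertForSequenceClassification.from_pretrained(checkpoint)
    model.eval()

    # Pad and truncate so headlines of different lengths fit in one batch.
    batch = tokenizer(list(headlines), padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits

    return logits_to_labels(logits.tolist(), model.config.id2label)
```

Calling `classify_headlines([...])` with a list of headline strings would return labels in the checkpoint's own vocabulary, which can then be passed through the heuristic positive/negative mapping.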

In [1]:
import pandas as pd

# Load the saved predictions and check how the predicted classes are distributed
df = pd.read_csv("../../results/exp001/results.csv")
preds = df['Prediction']
print(preds.value_counts())
print(f"Total values: {preds.value_counts().sum()}")

Prediction
-1    310
 1    190
Name: count, dtype: int64 
Total values: 500


In [None]:
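# Convert numeric predictions to direction labels.
# Note the inversion: positive sentiment (1) maps to 'Decrease' and negative (-1) to 'Increase'.
mapping = {1: 'Decrease', -1: 'Increase', 0: 'Stable'}
df['Prediction'] = df['Prediction'].map(mapping)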

In [3]:
from sklearn.metrics import confusion_matrix
import pandas as pd

labels = ["Increase", "Decrease"]

cm = confusion_matrix(df["Direction"], df["Prediction"], labels=labels)
cm_df = pd.DataFrame(
    cm,
    index=[f"Actual {label}" for label in labels],
    columns=[f"Predicted {label}" for label in labels]
)

print("\nConfusion Matrix:\n")
display(cm_df)


Confusion Matrix:



                 Predicted Increase  Predicted Decrease
Actual Increase                 161                 104
Actual Decrease                 149                  86
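As a sanity check, the per-class metrics in the classification report can be recomputed by hand from these four counts. The sketch below uses only the numbers printed above; the variable names are illustrative.

```python
# Counts copied from the confusion matrix above (outer key = actual, inner key = predicted).
cm = {
    "Increase": {"Increase": 161, "Decrease": 104},
    "Decrease": {"Increase": 149, "Decrease": 86},
}
labels = list(cm)
total = sum(sum(row.values()) for row in cm.values())
accuracy = sum(cm[label][label] for label in labels) / total

metrics = {}
for label in labels:
    predicted = sum(cm[actual][label] for actual in labels)  # column sum
    support = sum(cm[label].values())                        # row sum
    precision = cm[label][label] / predicted
    recall = cm[label][label] / support
    f1 = 2 * precision * recall / (precision + recall)
    metrics[label] = {"precision": precision, "recall": recall,
                      "f1": f1, "support": support}
    print(f"{label}: precision={precision:.3f} recall={recall:.3f} "
          f"f1={f1:.3f} support={support}")

print(f"accuracy={accuracy:.3f} (n={total})")
```

For example, precision for Increase is 161 / (161 + 149) = 0.519 and recall is 161 / (161 + 104) = 0.608.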


In [4]:
from sklearn.metrics import classification_report

# Normalize label strings (strip whitespace, fix capitalization) so both columns use the same spelling
df["Prediction"] = df["Prediction"].str.strip().str.capitalize()
df["Direction"] = df["Direction"].str.strip().str.capitalize()

# Report
report = classification_report(
    df["Direction"], df["Prediction"],
    labels=["Increase", "Decrease"],
    target_names=["Increase", "Decrease"],
    digits=3
)

print("\nClassification Report:\n")
print(report)


Classification Report:

              precision    recall  f1-score   support

    Increase      0.519     0.608     0.560       265
    Decrease      0.453     0.366     0.405       235

    accuracy                          0.494       500
   macro avg      0.486     0.487     0.482       500
weighted avg      0.488     0.494     0.487       500
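To put the 0.494 accuracy in context, it helps to compare it with two trivial references: always predicting the majority class, and a coin flip. The sketch below uses only the counts reported above. The normal approximation to the binomial is my own choice of a quick check, not part of the original experiment.

```python
import math

n = 500
correct = 161 + 86                       # diagonal of the confusion matrix
support = {"Increase": 265, "Decrease": 235}

accuracy = correct / n
# Accuracy obtained by always predicting the most common class.
majority_baseline = max(support.values()) / n

# z-score of the number of correct predictions under coin-flip guessing (p = 0.5),
# using the normal approximation to the binomial distribution.
z = (correct - 0.5 * n) / math.sqrt(n * 0.5 * 0.5)

print(f"accuracy={accuracy:.3f}  majority baseline={majority_baseline:.3f}  "
      f"z vs coin flip={z:.2f}")
```

The model falls below the 0.530 majority-class baseline, and a z-score of about -0.27 means its accuracy is statistically indistinguishable from random guessing on this sample.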

