### Sentiment Analysis - homework

- https://sites.google.com/view/fiqa/home
- https://dl.acm.org/doi/fullHtml/10.1145/3184558.3192301
- https://huggingface.co/datasets/pauri32/fiqa-2018?row=0

The homework is to complete Task 1 of the two-task FiQA 2018 challenge:
- Task 1: Aspect-based financial sentiment analysis
- Task 2: Opinion-based QA over Financial Data

Participants should find, train, or fine-tune a model that performs sentiment analysis on a given phrase.

The model can be trained or fine-tuned on the "pauri32/fiqa-2018" dataset from Hugging Face.
Alternatively, participants can use a ready-made "off-the-shelf" model.

The quality of the results should be evaluated using:
- Mean Squared Error (MSE)
- R-squared (R²) and cosine similarity
- classification measures: accuracy, precision, recall, and F1 score
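
The regression-style measures in this list can be sketched without any library, assuming the gold and predicted sentiment scores are plain lists of floats. The function names and the sample values below are illustrative only and are not part of the challenge material; for real use, `sklearn.metrics` provides `mean_squared_error` and `r2_score`.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between gold and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when predictions are worse than the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cosine_similarity(y_true, y_pred):
    """Cosine of the angle between the two score vectors."""
    dot = sum(t * p for t, p in zip(y_true, y_pred))
    norm_true = math.sqrt(sum(t * t for t in y_true))
    norm_pred = math.sqrt(sum(p * p for p in y_pred))
    return dot / (norm_true * norm_pred)

# Made-up scores for illustration only
gold = [-0.5, 0.2, 0.8, -0.1]
pred = [-0.4, 0.1, 0.6, 0.1]

print(f"MSE:    {mse(gold, pred):.4f}")
print(f"R^2:    {r_squared(gold, pred):.4f}")
print(f"Cosine: {cosine_similarity(gold, pred):.4f}")
```

Note that R² can drop below zero, which simply means the predictions fit worse than always predicting the mean gold score.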

This link provides examples of input-output tasks: https://sites.google.com/view/fiqa/home

In [None]:
!pip install transformers datasets torch pandas numpy scikit-learn tqdm

In [None]:
!pip install transformers[torch]

In [None]:
#!pip install 'accelerate>={ACCELERATE_MIN_VERSION}'

In [25]:
# Financial Sentiment Analysis using FIQA Dataset
# Task 1: Aspect-based Financial Sentiment Analysis

import pandas as pd
import numpy as np
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, precision_recall_fscore_support
from scipy.spatial.distance import cosine
import torch
from tqdm import tqdm

# Set up device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# ## 1. Data Loading and Preprocessing

print("Loading dataset...")
dataset = load_dataset("pauri32/fiqa-2018")

def preprocess_dataset(dataset):
    df = pd.DataFrame(dataset)
    df['sentence'] = df['sentence'].str.strip()
    df['sentiment_score'] = df['sentiment_score'].astype(float)
    
    df['labels'] = pd.cut(df['sentiment_score'], 
                         bins=[-float('inf'), -0.3, 0.3, float('inf')], 
                         labels=[0, 1, 2])
    df['labels'] = df['labels'].astype(int)
    
    return df[['sentence', 'labels']]

# Process train and test datasets
train_df = preprocess_dataset(dataset['train'])
test_df = preprocess_dataset(dataset['test'])

print("Training set shape:", train_df.shape)
print("Test set shape:", test_df.shape)

# ## 2. Model Setup

MODEL_NAME = "ProsusAI/finbert"
print(f"\nLoading {MODEL_NAME} model and tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)
model = model.to(device)  # Move model to appropriate device

# Prepare datasets
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)

def tokenize_function(examples):
    return tokenizer(
        examples['sentence'],
        padding='max_length',
        truncation=True,
        max_length=128
    )

print("Tokenizing datasets...")
train_tokenized = train_dataset.map(tokenize_function, batched=True, remove_columns=train_dataset.column_names)
test_tokenized = test_dataset.map(tokenize_function, batched=True, remove_columns=test_dataset.column_names)

train_tokenized = train_tokenized.add_column('labels', train_dataset['labels'])
test_tokenized = test_tokenized.add_column('labels', test_dataset['labels'])

# Set format for pytorch
train_tokenized.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])
test_tokenized.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])

# ## 3. Model Training

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=100,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    # Add device specific settings
    no_cuda=device == "cpu",  # Disable CUDA when not available
)

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)

    # Convert to NumPy first so every metric below receives the same array type
    if isinstance(labels, torch.Tensor):
        labels = labels.cpu().numpy()
    if isinstance(preds, torch.Tensor):
        preds = preds.cpu().numpy()

    # Classification measures (weighted by class support)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted')
    acc = accuracy_score(labels, preds)

    # Regression-style measures
    mse = mean_squared_error(labels, preds)
    r2 = r2_score(labels, preds)
    cosine_sim = 1 - cosine(labels, preds)
    
    return {
        'accuracy': acc,
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'mse': mse,
        'r2': r2,
        'cosine_similarity': cosine_sim
    }

print("\nInitializing trainer...")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized,
    eval_dataset=test_tokenized,
    compute_metrics=compute_metrics,
)

print("\nStarting training...")
trainer.train()

# ## 4. Evaluation

print("\nEvaluating model...")
test_results = trainer.evaluate()

print("\nTest Results:")
print("-------------")
print(f"Accuracy: {test_results['eval_accuracy']:.4f}")
print(f"Precision: {test_results['eval_precision']:.4f}")
print(f"Recall: {test_results['eval_recall']:.4f}")
print(f"F1 Score: {test_results['eval_f1']:.4f}")
print(f"MSE: {test_results['eval_mse']:.4f}")
print(f"R² Score: {test_results['eval_r2']:.4f}")
print(f"Cosine Similarity: {test_results['eval_cosine_similarity']:.4f}")

# ## 5. Example Predictions

def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Move inputs to the same device as model
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
        
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1)
    
    # Move results back to CPU for processing
    prediction = prediction.cpu()
    probabilities = probabilities.cpu()
    
    sentiment_map = {0: "negative", 1: "neutral", 2: "positive"}
    predicted_sentiment = sentiment_map[prediction.item()]
    
    return predicted_sentiment, probabilities[0].tolist()

example_sentences = [
    "The company reported strong earnings growth.",
    "The stock price dropped significantly after the announcement.",
    "Investors are cautiously optimistic about the market outlook."
]

print("\nExample Predictions:")
print("-------------------")
for sentence in example_sentences:
    sentiment, probs = predict_sentiment(sentence)
    print(f"\nText: {sentence}")
    print(f"Predicted sentiment: {sentiment}")
    print(f"Confidence scores (negative/neutral/positive): {[f'{p:.4f}' for p in probs]}")

# ## 6. Save Model

output_dir = "./fiqa_sentiment_model"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print("\nModel saved to:", output_dir)

Using device: cpu
Loading dataset...
Training set shape: (961, 2)
Test set shape: (150, 2)
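
The `pd.cut` call in `preprocess_dataset` maps each continuous score to one of three classes using right-closed intervals, so a score of exactly -0.3 counts as negative and a score of exactly 0.3 counts as neutral. A dependency-free sketch of the same rule follows; `score_to_class` is a hypothetical helper name, not something the notebook defines.

```python
from bisect import bisect_left

THRESHOLDS = [-0.3, 0.3]  # same cut points as the pd.cut call

def score_to_class(score):
    """Return 0 (negative), 1 (neutral) or 2 (positive) using right-closed bins."""
    # bisect_left counts how many thresholds are strictly below the score
    return bisect_left(THRESHOLDS, score)

for s in (-0.543, -0.3, 0.214, 0.3, 0.31):
    print(s, "->", score_to_class(s))
```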

Loading ProsusAI/finbert model and tokenizer...
Tokenizing datasets...




Initializing trainer...

Starting training...




Epoch  Training Loss  Validation Loss  Accuracy  Precision  Recall    F1        MSE       R2         Cosine Similarity
1      No log         1.258994         0.366667  0.326311   0.366667  0.302908  0.953333  -0.986847  0.840091
2      1.443200       1.030048         0.5       0.562295   0.5       0.476822  0.86      -0.79233   0.816741
3      1.443200       0.911709         0.566667  0.594359   0.566667  0.566822  0.633333  -0.319933  0.807143
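
With 961 training examples and a batch size of 16, three epochs amount to far fewer optimizer steps than the 500 warmup steps configured in `TrainingArguments`. Under the default linear schedule the warmup therefore never finishes and the learning rate stays below its peak for the whole run, which may contribute to the weak scores. A quick check of the arithmetic, assuming a single device and no gradient accumulation:

```python
import math

train_examples = 961
batch_size = 16
epochs = 3
warmup_steps = 500

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

# Fraction of the peak learning rate reached by the last step of linear warmup
peak_fraction = min(1.0, total_steps / warmup_steps)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"share of peak learning rate reached: {peak_fraction:.3f}")
```

A warmup of roughly 10% of the total steps is a common starting point, though the best value for this dataset is not established here.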





Evaluating model...



Test Results:
-------------
Accuracy: 0.5667
Precision: 0.5944
Recall: 0.5667
F1 Score: 0.5668
MSE: 0.6333
R² Score: -0.3199
Cosine Similarity: 0.8071

Example Predictions:
-------------------

Text: The company reported strong earnings growth.
Predicted sentiment: positive
Confidence scores (negative/neutral/positive): ['0.1610', '0.0202', '0.8188']

Text: The stock price dropped significantly after the announcement.
Predicted sentiment: neutral
Confidence scores (negative/neutral/positive): ['0.1858', '0.7376', '0.0766']

Text: Investors are cautiously optimistic about the market outlook.
Predicted sentiment: positive
Confidence scores (negative/neutral/positive): ['0.0891', '0.0630', '0.8479']

Model saved to: ./fiqa_sentiment_model
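
The MSE and R² reported in this run are computed on class indices (0, 1, 2), not on the continuous sentiment scores that the FiQA task defines. One possible way to obtain a continuous prediction from a three-class model is the expected polarity, P(positive) - P(negative), which lies in [-1, 1]. This is an assumption made for illustration, not the method prescribed by the challenge; `probs_to_score` is a hypothetical helper that expects probabilities in the (negative, neutral, positive) order returned by `predict_sentiment`.

```python
def probs_to_score(probs):
    """Collapse (negative, neutral, positive) probabilities into one score in [-1, 1]."""
    negative, _neutral, positive = probs
    return positive - negative

# Probabilities copied from the first example prediction in the output
example_probs = [0.1610, 0.0202, 0.8188]
score = probs_to_score(example_probs)
print(f"Continuous sentiment score: {score:.4f}")
```

Scores produced this way could then be compared against `sentiment_score` with MSE and R², instead of comparing class indices.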


In [23]:
# Let's examine the data first
import pandas as pd
import numpy as np
from datasets import load_dataset, Dataset

# Load dataset
dataset = load_dataset("pauri32/fiqa-2018")

# Convert to DataFrame
train_df = pd.DataFrame(dataset['train'])
test_df = pd.DataFrame(dataset['test'])

# Print some examples of the aspects column
print("Sample aspects from training data:")
print("\nFirst 5 aspects:")
for i, aspect in enumerate(train_df['aspects'].head(), 1):
    print(f"{i}. {aspect}")
    print(f"   Type: {type(aspect)}")

# Print unique aspects
print("\nUnique aspects (first 10):")
unique_aspects = train_df['aspects'].unique()[:10]
for i, aspect in enumerate(unique_aspects, 1):
    print(f"{i}. {aspect}")
    print(f"   Type: {type(aspect)}")

# Print sample complete rows
print("\nSample complete rows:")
for i, row in train_df.head().iterrows():
    print(f"\nRow {i}:")
    for col in row.index:
        print(f"{col}: {row[col]} (Type: {type(row[col])})")

Sample aspects from training data:

First 5 aspects:
1. ['Stock/Price Action/Volatility/Short Selling']
   Type: <class 'str'>
2. ['Stock/Price Action/Bearish']
   Type: <class 'str'>
3. ['Corporate/M&A/M&A']
   Type: <class 'str'>
4. ['Market/Volatility/Volatility']
   Type: <class 'str'>
5. ['Stock/Price Action/Bullish/Bullish Behavior']
   Type: <class 'str'>

Unique aspects (first 10):
1. ['Stock/Price Action/Volatility/Short Selling']
   Type: <class 'str'>
2. ['Stock/Price Action/Bearish']
   Type: <class 'str'>
3. ['Corporate/M&A/M&A']
   Type: <class 'str'>
4. ['Market/Volatility/Volatility']
   Type: <class 'str'>
5. ['Stock/Price Action/Bullish/Bullish Behavior']
   Type: <class 'str'>
6. ['Corporate/Dividend Policy']
   Type: <class 'str'>
7. ['Corporate/Sales/Deal']
   Type: <class 'str'>
8. ['Corporate/Dividend Policy/Dividend']
   Type: <class 'str'>
9. ['Stock/Price Action/Bearish/Bearish Behavior']
   Type: <class 'str'>
10. ['Stock/Technical Analysis/MACD']
   Type: <c
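
The output shows that each `aspects` value is a string holding a Python list literal, not an actual list. Stripping brackets and quotes is enough for single-aspect rows; the sketch below also copes with rows that list several aspects, assuming they all follow the same list-literal format. `parse_aspects` is a hypothetical helper name, and the two-aspect input is a constructed example.

```python
import ast

def parse_aspects(aspect_str):
    """Turn "['A/B', 'C/D']" into ['A/B', 'C/D'], falling back to plain stripping."""
    try:
        parsed = ast.literal_eval(aspect_str)
    except (ValueError, SyntaxError):
        # Not a valid Python literal: treat the cleaned string as a single aspect
        return [aspect_str.strip("[]'\" ")]
    if isinstance(parsed, (list, tuple)):
        return list(parsed)
    return [str(parsed)]

print(parse_aspects("['Stock/Price Action/Bearish']"))
print(parse_aspects("['Corporate/M&A/M&A', 'Corporate/Sales/Deal']"))
```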

In [24]:
# Aspect-based Financial Sentiment Analysis using FIQA Dataset
# Task 1: Aspect-based Financial Sentiment Analysis

import pandas as pd
import numpy as np
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, precision_recall_fscore_support
from scipy.spatial.distance import cosine
import torch
from tqdm import tqdm

# Set up device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# ## 1. Data Loading and Preprocessing

print("Loading dataset...")
dataset = load_dataset("pauri32/fiqa-2018")

def clean_aspect_string(aspect_str):
    """Clean the aspect string and extract the actual aspect."""
    # Remove [''] and extract the aspect
    aspect = aspect_str.strip("[]'")
    return aspect

def get_aspect_category(aspect):
    """Extract main category from aspect."""
    return aspect.split('/')[0]

def preprocess_dataset(dataset):
    df = pd.DataFrame(dataset)
    
    # Clean text
    df['sentence'] = df['sentence'].str.strip()
    
    # Clean and extract aspects
    df['aspect'] = df['aspects'].apply(clean_aspect_string)
    
    # Extract aspect categories
    df['aspect_category'] = df['aspect'].apply(get_aspect_category)
    
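    # Create combined text with aspect.
    # Note: '[ASP]' is not in the BERT vocabulary, so the tokenizer splits it into
    # ordinary word pieces; it acts as a plain-text marker, not a special token.
    df['aspect_text'] = '[ASP] ' + df['aspect'] + ' [SEP] ' + df['sentence']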
    
    # Ensure sentiment_score is float
    df['sentiment_score'] = df['sentiment_score'].astype(float)
    
    # Keep relevant columns
    return df[['sentence', 'aspect', 'aspect_category', 'aspect_text', 'sentiment_score', 'target', 'label']]

# Process datasets
train_df = preprocess_dataset(dataset['train'])
test_df = preprocess_dataset(dataset['test'])

print("\nDataset shapes:")
print("Training set shape:", train_df.shape)
print("Test set shape:", test_df.shape)

print("\nAspect Categories Distribution:")
print(train_df['aspect_category'].value_counts())

print("\nSample processed data:")
print(train_df[['sentence', 'aspect', 'aspect_category', 'sentiment_score']].head())

# ## 2. Model Setup

MODEL_NAME = "ProsusAI/finbert"
print(f"\nLoading {MODEL_NAME} model and tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)
model = model.to(device)

# Prepare datasets
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)

def tokenize_function(examples):
    return tokenizer(
        examples['aspect_text'],
        padding='max_length',
        truncation=True,
        max_length=128
    )

print("Tokenizing datasets...")
train_tokenized = train_dataset.map(tokenize_function, batched=True, remove_columns=train_dataset.column_names)
test_tokenized = test_dataset.map(tokenize_function, batched=True, remove_columns=test_dataset.column_names)

train_tokenized = train_tokenized.add_column('labels', train_dataset['label'])
test_tokenized = test_tokenized.add_column('labels', test_dataset['label'])

train_tokenized.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])
test_tokenized.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])

# ## 3. Model Training

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=100,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    no_cuda=device == "cpu",
)

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    
    if isinstance(labels, torch.Tensor):
        labels = labels.cpu().numpy()
    if isinstance(preds, torch.Tensor):
        preds = preds.cpu().numpy()
    
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted')
    acc = accuracy_score(labels, preds)
    mse = mean_squared_error(labels, preds)
    r2 = r2_score(labels, preds)
    cosine_sim = 1 - cosine(labels, preds)
    
    return {
        'accuracy': acc,
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'mse': mse,
        'r2': r2,
        'cosine_similarity': cosine_sim
    }

print("\nInitializing trainer...")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized,
    eval_dataset=test_tokenized,
    compute_metrics=compute_metrics,
)

print("\nStarting training...")
trainer.train()

# ## 4. Evaluation

print("\nEvaluating model...")
test_results = trainer.evaluate()

print("\nOverall Test Results:")
print("-------------")
for metric, value in test_results.items():
    if isinstance(value, float):
        print(f"{metric}: {value:.4f}")

# ## 5. Aspect-wise Analysis and Examples

# Evaluate performance by aspect category
def evaluate_by_aspect_category(df, trainer):
    results = {}
    for category in df['aspect_category'].unique():
        category_df = df[df['aspect_category'] == category]
        if len(category_df) < 10:  # Skip categories with too few samples
            continue
            
        category_dataset = Dataset.from_pandas(category_df)
        category_tokenized = category_dataset.map(
            tokenize_function, 
            batched=True, 
            remove_columns=category_dataset.column_names
        )
        category_tokenized = category_tokenized.add_column('labels', category_dataset['label'])
        category_tokenized.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])
        
        category_results = trainer.evaluate(eval_dataset=category_tokenized)
        results[category] = {
            'accuracy': category_results['eval_accuracy'],
            'f1': category_results['eval_f1'],
            'samples': len(category_df)
        }
    return pd.DataFrame(results).T

print("\nCalculating performance by aspect category...")
aspect_performance = evaluate_by_aspect_category(test_df, trainer)
print("\nAspect Category Performance:")
print(aspect_performance.sort_values('samples', ascending=False))

# Function for aspect-based prediction
def predict_aspect_sentiment(text, aspect):
    aspect_text = f"[ASP] {aspect} [SEP] {text}"
    inputs = tokenizer(aspect_text, return_tensors="pt", padding=True, truncation=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    with torch.no_grad():
        outputs = model(**inputs)
    
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1)
    
    prediction = prediction.cpu()
    probabilities = probabilities.cpu()
    
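    # Assumes the dataset's `label` column encodes 0=negative, 1=neutral, 2=positive;
    # verify this against the dataset card before trusting the names printed below.
    sentiment_map = {0: "negative", 1: "neutral", 2: "positive"}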
    predicted_sentiment = sentiment_map[prediction.item()]
    
    return predicted_sentiment, probabilities[0].tolist()

# Example predictions
print("\nExample Predictions:")
example_texts = [
    {
        "text": "Company reports strong revenue growth but increasing operational costs impact margins",
        "aspects": [
            "Corporate/Revenue/Growth",
            "Corporate/Costs/Operating Costs",
            "Corporate/Financial/Margins"
        ]
    },
    {
        "text": "Stock shows high volatility amid market uncertainty and low trading volume",
        "aspects": [
            "Stock/Price Action/Volatility",
            "Market/Uncertainty",
            "Stock/Trading Volume"
        ]
    }
]

for example in example_texts:
    print(f"\nText: {example['text']}")
    for aspect in example['aspects']:
        sentiment, probs = predict_aspect_sentiment(example['text'], aspect)
        print(f"\nAspect: {aspect}")
        print(f"Predicted sentiment: {sentiment}")
        print(f"Confidence scores (negative/neutral/positive): {[f'{p:.4f}' for p in probs]}")

# Save model and results
output_dir = "./fiqa_aspect_sentiment_model"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
aspect_performance.to_csv("aspect_performance.csv")

print(f"\nModel and performance analysis saved to: {output_dir}")

Using device: cpu
Loading dataset...

Dataset shapes:
Training set shape: (961, 7)
Test set shape: (150, 7)

Aspect Categories Distribution:
aspect_category
Stock        562
Corporate    367
Market        28
Economy        4
Name: count, dtype: int64

Sample processed data:
                                            sentence                                       aspect aspect_category  sentiment_score
0  Still short $LNG from $11.70 area...next stop ...  Stock/Price Action/Volatility/Short Selling           Stock           -0.543
1                                    $PLUG bear raid                   Stock/Price Action/Bearish           Stock           -0.480
2  How Kraft-Heinz Merger Came Together in Speedy...                            Corporate/M&A/M&A       Corporate            0.214
3     Slump in Weir leads FTSE down from record high                 Market/Volatility/Volatility          Market           -0.827
4                $AAPL bounces off support, it seems  Stock/Price Acti



Initializing trainer...

Starting training...




Epoch  Training Loss  Validation Loss  Accuracy  Precision  Recall    F1        MSE       R2     Cosine Similarity
1      No log         1.817033         0.42      0.271958   0.42      0.317148  1.32      -0.98  0.546829
2      0.882800       1.417513         0.593333  0.404918   0.593333  0.478844  0.626667  0.06   0.852493
3      0.882800       1.471902         0.593333  0.41426    0.593333  0.480084  0.626667  0.06   0.802897





Evaluating model...



Overall Test Results:
-------------
eval_loss: 1.4175
eval_accuracy: 0.5933
eval_precision: 0.4049
eval_recall: 0.5933
eval_f1: 0.4788
eval_mse: 0.6267
eval_r2: 0.0600
eval_cosine_similarity: 0.8525
eval_runtime: 3.4021
eval_samples_per_second: 44.0900
eval_steps_per_second: 2.9390
epoch: 3.0000

Calculating performance by aspect category...





Aspect Category Performance:
           accuracy        f1  samples
Stock      0.716216  0.631336     74.0
Corporate  0.468750  0.334896     64.0
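
Only Stock and Corporate appear in this table, presumably because `evaluate_by_aspect_category` skips categories with fewer than 10 test rows. The function also re-runs `trainer.evaluate` once per category. An alternative sketch groups predictions that were already computed in a single pass; the triples below are made-up placeholders rather than results from this run, and `accuracy_by_category` is a hypothetical helper.

```python
from collections import defaultdict

def accuracy_by_category(rows, min_samples=10):
    """rows: iterable of (category, gold_label, predicted_label) triples."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for category, gold, pred in rows:
        counts[category] += 1
        hits[category] += int(gold == pred)
    # Report only categories with enough samples to be meaningful
    return {
        category: {"accuracy": hits[category] / counts[category],
                   "samples": counts[category]}
        for category in counts
        if counts[category] >= min_samples
    }

# Placeholder triples for illustration only
rows = [("Stock", 2, 2), ("Stock", 0, 2), ("Corporate", 1, 1), ("Market", 0, 0)]
result = accuracy_by_category(rows, min_samples=2)
print(result)
```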

Example Predictions:

Text: Company reports strong revenue growth but increasing operational costs impact margins

Aspect: Corporate/Revenue/Growth
Predicted sentiment: negative
Confidence scores (negative/neutral/positive): ['0.9491', '0.0127', '0.0382']

Aspect: Corporate/Costs/Operating Costs
Predicted sentiment: negative
Confidence scores (negative/neutral/positive): ['0.6635', '0.0366', '0.2999']

Aspect: Corporate/Financial/Margins
Predicted sentiment: negative
Confidence scores (negative/neutral/positive): ['0.9248', '0.0131', '0.0621']

Text: Stock shows high volatility amid market uncertainty and low trading volume

Aspect: Stock/Price Action/Volatility
Predicted sentiment: positive
Confidence scores (negative/neutral/positive): ['0.0838', '0.0325', '0.8837']

Aspect: Market/Uncertainty
Predicted sentiment: positive
Confidence scor