In [3]:
"""
Sentiment Analysis with TF-IDF and BERT: Fine-Tuning for different types of text sources (news / tweets / posts)

Overview:
This script demonstrates two approaches for sentiment classification of cryptocurrency-related text:
1. **TF-IDF with Logistic Regression**: A traditional machine learning approach using TF-IDF features to represent the text and training a logistic regression model for classification.
2. **BERT (DistilBERT) Fine-Tuning**: A deep learning approach that fine-tunes a pre-trained DistilBERT model for sentiment classification.

Both models classify the sentiment of into three categories: 'negative', 'neutral', and 'positive'.
Note that the first model is more geared towards a smaller training dataset. 
However, the second one is more suitable for a larger dataset.
Ideally, we would also be using the second approach but with the (ProsusAI/finbert) model.

Workflow:
1. **TF-IDF Model**:
    - The dataset is loaded from a pandas DataFrame and preprocessed.
    - **TF-IDF Vectorization**: The text is converted into numerical features using the TF-IDF vectorizer. This vectorizer captures both unigrams and bigrams to represent the text.
    - **Logistic Regression**: A logistic regression model is trained using the TF-IDF features to classify the sentiments of the tweets. Sample weights are computed to handle imbalanced classes.

2. **BERT Model (Fine-Tuning)**:
    - **Data Preprocessing**: The dataset is tokenized using the Hugging Face `DistilBertTokenizer`, ensuring that tweet titles are padded and truncated to a fixed length of 128 tokens.
    - **Label Mapping**: Sentiment labels are converted from string representations (e.g., `{'class': 'positive'}`) into integer values (2 for positive, 1 for neutral, 0 for negative).
    - **Model Setup**: A pre-trained **DistilBERT** model is loaded for sequence classification with 3 labels (negative, neutral, positive). The model is then fine-tuned on the sentiment classification task.
    - **Training**: The model is trained using the Hugging Face `Trainer` class, with evaluation metrics including accuracy and weighted F1 score. The training process uses gradient accumulation, mixed precision (FP16), and model saving after each epoch.

3. **Evaluation**:
    - Both models (TF-IDF + Logistic Regression and BERT) are evaluated using the test dataset. Performance metrics such as accuracy and weighted F1 score are computed to assess how well the models predict sentiment.
"""


"\nSentiment Analysis with TF-IDF and BERT: Fine-Tuning for different types of text sources (news / tweets / posts)\n\nOverview:\nThis script demonstrates two approaches for sentiment classification of cryptocurrency-related text:\n1. **TF-IDF with Logistic Regression**: A traditional machine learning approach using TF-IDF features to represent the text and training a logistic regression model for classification.\n2. **BERT (DistilBERT) Fine-Tuning**: A deep learning approach that fine-tunes a pre-trained DistilBERT model for sentiment classification.\n\nBoth models classify the sentiment of into three categories: 'negative', 'neutral', and 'positive'.\nNote that the first model is more geared towards a smaller training dataset. \nHowever, the second one is more suitable for a larger dataset.\nIdeally, we would also be using the second approach but with the (ProsusAI/finbert) model.\n\nWorkflow:\n1. **TF-IDF Model**:\n    - The dataset is loaded from a pandas DataFrame and preprocessed

In [None]:
# Importing all necessary libraries for the execution of the code
from datasets import Dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score, f1_score
from sklearn.utils import class_weight
import ast
import re
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer, pipeline, TrainingArguments, Trainer
import numpy as np
import torch
import joblib
from typing import Dict, List, Any

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
# Displaying csv (for training and testing) 
# This dataset is from https://www.kaggle.com/datasets/oliviervha/crypto-news/data
# Note that for this dataset, Crypto news is scraped from the web. The sentiment analysis is performed using textblob.
# Hence, sentiment analysis is not entirely correct but it's a good starting point
dataset = pd.read_csv('cryptonews.csv')
dataset

Unnamed: 0,date,sentiment,source,subject,text,title,url
0,2023-04-05 06:52:09,"{'class': 'negative', 'polarity': -0.03, 'subj...",CoinTelegraph,defi,The compensation process is expected to start ...,Allbridge to first begin repaying stuck bridge...,https://cointelegraph.com/news/allbridge-to-fi...
1,2023-04-05 06:19:00,"{'class': 'neutral', 'polarity': 0.0, 'subject...",CryptoPotato,bitcoin,On-chain analytics revealed a sentiment shift ...,Bitcoin Hodl Patterns Indicate Cycle Shift to ...,https://cryptopotato.com/bitcoin-hodl-patterns...
2,2023-04-05 05:09:44,"{'class': 'negative', 'polarity': -0.04, 'subj...",CoinTelegraph,bitcoin,"Ether has broken the $1,900 resistance level f...",ETH hits 7-month high ahead of Shanghai and Ca...,https://cointelegraph.com/news/eth-hits-7-mont...
3,2023-04-05 01:09:52,"{'class': 'positive', 'polarity': 0.07, 'subje...",CoinTelegraph,bitcoin,"With a new quarterly production record, Marath...","Marathon Digital posts quarterly record of 2,1...",https://cointelegraph.com/news/marathon-digita...
4,2023-04-04 23:49:00,"{'class': 'positive', 'polarity': 0.4, 'subjec...",CryptoPotato,altcoin,The stablecoin BTG Dol will supposedly become ...,Brazilian Finance Giant BTG Pactual to Issue a...,https://cryptopotato.com/brazilian-finance-gia...
...,...,...,...,...,...,...,...
18538,2021-10-27 15:17:00,"{'class': 'neutral', 'polarity': 0.0, 'subject...",CryptoNews,defi,Cream Finance (CREAM) suffered another flash l...,Cream Finance Suffers Another Exploit as Attac...,https://cryptonews.com/news/cream-finance-suff...
18539,2021-10-19 13:39:00,"{'class': 'positive', 'polarity': 0.1, 'subjec...",CryptoNews,blockchain,Banque de France disclosed the results of its ...,French Central Bank's Blockchain Bond Trial Br...,https://cryptonews.com/news/french-central-ban...
18540,2021-10-18 13:58:00,"{'class': 'positive', 'polarity': 0.14, 'subje...",CryptoNews,blockchain,Advancing its project to become \x9caÂ\xa0meta...,"Facebook To Add 10,000 Jobs In EU For Metavers...",https://cryptonews.com/news/facebook-to-add-10...
18541,2021-10-15 00:00:00,"{'class': 'neutral', 'polarity': 0.0, 'subject...",CryptoNews,blockchain,Chinese companies are still topping the blockc...,Tech Crackdown Hasn't Halted Chinese Firms' Bl...,https://cryptonews.com/news/tech-crackdown-has...


In [None]:
# Only load 3 relevant columns
# Note that the date is essential for training the model, especially when we are combining the models to boost signal generation
dataset = dataset[['date','title', 'sentiment']]
dataset

Unnamed: 0,date,title,sentiment
0,2023-04-05 06:52:09,Allbridge to first begin repaying stuck bridge...,"{'class': 'negative', 'polarity': -0.03, 'subj..."
1,2023-04-05 06:19:00,Bitcoin Hodl Patterns Indicate Cycle Shift to ...,"{'class': 'neutral', 'polarity': 0.0, 'subject..."
2,2023-04-05 05:09:44,ETH hits 7-month high ahead of Shanghai and Ca...,"{'class': 'negative', 'polarity': -0.04, 'subj..."
3,2023-04-05 01:09:52,"Marathon Digital posts quarterly record of 2,1...","{'class': 'positive', 'polarity': 0.07, 'subje..."
4,2023-04-04 23:49:00,Brazilian Finance Giant BTG Pactual to Issue a...,"{'class': 'positive', 'polarity': 0.4, 'subjec..."
...,...,...,...
18538,2021-10-27 15:17:00,Cream Finance Suffers Another Exploit as Attac...,"{'class': 'neutral', 'polarity': 0.0, 'subject..."
18539,2021-10-19 13:39:00,French Central Bank's Blockchain Bond Trial Br...,"{'class': 'positive', 'polarity': 0.1, 'subjec..."
18540,2021-10-18 13:58:00,"Facebook To Add 10,000 Jobs In EU For Metavers...","{'class': 'positive', 'polarity': 0.14, 'subje..."
18541,2021-10-15 00:00:00,Tech Crackdown Hasn't Halted Chinese Firms' Bl...,"{'class': 'neutral', 'polarity': 0.0, 'subject..."


In [None]:
# Copy the dataset so as to not tamper with the original csv file
df = dataset.copy()

def map_sentiment(sentiment_str: str) -> int:
    """
    Parses and maps a sentiment string to an integer label.

    Args:
        sentiment_str (str): A string representation of a dictionary containing
                             a 'class' key (e.g., "{'class': 'Positive'}").

    Returns:
        int: The mapped sentiment value:
             - 0 for 'negative'
             - 1 for 'neutral'
             - 2 for 'positive'
             - -1 for unknown classes or parsing errors
    """
    try:
        # Convert string to a Python dictionary
        sentiment_dict = ast.literal_eval(sentiment_str)
        # Extract the 'class' value and convert it to lowercase
        sentiment_class = sentiment_dict['class'].lower()
        
        # Map the sentiment class to an integer
        if sentiment_class == 'negative':
            return 0
        elif sentiment_class == 'neutral':
            return 1
        elif sentiment_class == 'positive':
            return 2
        else:
            return -1  # for unknown classes
    except:
        return -1  # in case of parsing errors

# Apply the mapping function to the sentiment column
df['sentiment'] = df['sentiment'].apply(map_sentiment)
df

Unnamed: 0,date,title,sentiment
0,2023-04-05 06:52:09,Allbridge to first begin repaying stuck bridge...,0
1,2023-04-05 06:19:00,Bitcoin Hodl Patterns Indicate Cycle Shift to ...,1
2,2023-04-05 05:09:44,ETH hits 7-month high ahead of Shanghai and Ca...,0
3,2023-04-05 01:09:52,"Marathon Digital posts quarterly record of 2,1...",2
4,2023-04-04 23:49:00,Brazilian Finance Giant BTG Pactual to Issue a...,2
...,...,...,...
18538,2021-10-27 15:17:00,Cream Finance Suffers Another Exploit as Attac...,1
18539,2021-10-19 13:39:00,French Central Bank's Blockchain Bond Trial Br...,2
18540,2021-10-18 13:58:00,"Facebook To Add 10,000 Jobs In EU For Metavers...",2
18541,2021-10-15 00:00:00,Tech Crackdown Hasn't Halted Chinese Firms' Bl...,1


In [None]:
# Count number of sentiments for each sentiment value
df['sentiment'].value_counts()

sentiment
2    8296
1    6417
0    3830
Name: count, dtype: int64

This shows that the sentiment is not exactly balanced in the dataset. This means that when training the model, it might learn to favor positive predictions

In [None]:
def clean_text(text: str) -> str:
    """
    Cleans a given text string by removing special characters and converting it to lowercase.

    Args:
        text (str): The input text to clean.

    Returns:
        str: A cleaned version of the text with:
             - All characters in lowercase
             - Punctuation removed
             - Leading and trailing whitespace stripped
    """
    text = str(text).lower()  # Ensure it's a string and lowercase
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    return text.strip()

# Apply the clean text format onto the dataset
df['title'] = df['title'].apply(clean_text)

In [None]:
# Words commonly used in market discussions that don't contribute much to sentiment
# Ideally, we would start with standard stopword lists from libraries like to handle common English words like "the" and "in." 
# Then, we would automatically build a domain-specific stopwords list by analyzing term frequency 
# in the dataset, identifying high-frequency terms that carry little sentiment. 
# However, due to time constraints and a lack of a proper dataset/API, we have opted for simpler approach
general_stopwords = [
    'price', 'market', 'today', 'time', 'day', 'week', 'year', 'bitcoin', 'ethereum', 
    'crypto', 'cryptocurrency', 'token', 'coin', 'btc', 'eth', 'xrp', 'altcoin', 
    'blockchain', 'exchange', 'investor', 'trading', 'trade', 'buy', 'sell', 
    'bullish', 'bearish', 'pump', 'dump', 'volume', 'liquidity', 'whale', 'whales',
    'tokenomics', 'supply', 'demand', 'chart', 'technical', 'analysis', 'ta', 'fa'
]
neutral_crypto_terms = [
    'defi', 'nft', 'dao', 'dapp', 'smart', 'contract', 'layer', 'l1', 'l2', 
    'staking', 'yield', 'farming', 'governance', 'validator', 'node', 'gas', 'fee',
    'wallet', 'address', 'transaction', 'block', 'mining', 'miner', 'pow', 'pos',
    'cex', 'dex', 'kyc', 'aml', 'airdop', 'ido', 'ieo', 'stablecoin', 'usdt', 'usdc'
]
financial_noise = [
    'dollar', 'usd', 'fiat', 'capital', 'investment', 'return', 'profit', 'loss', 
    'portfolio', 'hedge', 'risk', 'volatility', 'leverage', 'margin', 'short', 'long',
    'entry', 'exit', 'ath', 'atl', 'resistance', 'support', 'fibonacci', 'rsi', 'macd'
]
custom_stopwords = general_stopwords + neutral_crypto_terms + financial_noise

TF-IDF Vectorization Method

In [None]:
# Initialize TF-IDF Vectorizer to extract unigrams and bigrams
# - max_features: limits vocabulary size to top 100,000 terms
# - stop_words: use custom-defined stopwords to filter noise
# - ngram_range: include unigrams and bigrams
# - min_df: ignore terms that appear in fewer than 2 documents
# - max_df: ignore terms that appear in more than 95% of documents (too common)
tfidf = TfidfVectorizer(
    max_features=100000,               
    stop_words= custom_stopwords,            
    ngram_range=(1, 2),             
    min_df=2,                       
    max_df=0.95                     
)

# Transform the 'title' column into TF-IDF features
X_tfidf = tfidf.fit_transform(df['title'])

# Set target labels for model training
y = df['sentiment'] 

In [None]:
# Train-test split using TF-IDF features
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y, test_size=0.2, random_state=42
)

# Initialize Logistic Regression with increased max_iter to ensure convergence
model = LogisticRegression(max_iter=10000) 

# Compute sample weights to handle imbalanced classes as we've observed previously that the dataset is highly imbalanced
weights = class_weight.compute_sample_weight('balanced', y_train)

# Train the model using the computed class weights
model.fit(X_train, y_train, sample_weight = weights)
# Predict sentiment labels on the test set
y_pred = model.predict(X_test)

# Evaluate the model using classification metrics
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.28      0.35      0.31       745
           1       0.43      0.43      0.43      1298
           2       0.53      0.47      0.50      1666

    accuracy                           0.43      3709
   macro avg       0.41      0.42      0.41      3709
weighted avg       0.45      0.43      0.44      3709

Accuracy: 0.4327311943920194


Precision: How many of the predicted class labels were actually correct<br>
Recall: How many actual class labels were correctly predicted<br>
F1-score: The balance between precision and recall (good for imbalanced data)<br>
Support: The number of actual examples for each class in the test set<br>

<div>From the summary table above, we can see that the TF-IDF model performs relatively poorly. There are several reasons to this:</div>
<div>- High Dimensionality -> Sparse Vectors. The vocalubary grows really large on a really big dataset. This makes learning difficult for this model because it's trying to find a pattern in a giant mesh of words</div>
<div>- TF-IDF doesn't learn. It only works on word frequency and no analysis of words is done</div>
<div>- Struggles with imbalanced data. As evident in the table analysis above, some classes have more samples than others. Even if I've specificed for the model to pay more attention to underrepresented classes, it still struggles.</div>
    
<br>Hence, we can conclude that this is not that great of a model when it comes to sentiment analysis for a big dataset<br>

In [None]:
# Save both model and vectorizer so we don't have to train the model again next time and we can reuse it to predict
# The reason why we need to save both the model and vectorizer is because TF-IDF vectorizer transforms raw text into numerical vectors
# Meanwhile, the model doesn’t work on raw text — it only understands numbers.
joblib.dump(model, 'tfidf_sentiment_model.pkl')
joblib.dump(tfidf, 'tfidf_vectorizer.pkl')

['tfidf_vectorizer.pkl']

In [11]:
# When using the model, load the saved model and TF-IDF vectorizer
model = joblib.load('tfidf_sentiment_model.pkl')
vectorizer = joblib.load('tfidf_vectorizer.pkl')

texts = df['title']

# Transform the texts using the vectorizer
X_vectorized = vectorizer.transform(texts)

# Predict using the trained model
df['predicted_sentiment'] = model.predict(X_vectorized)

# Save the predictions to a new CSV
df.to_csv("tfidf_predicted_sentiment_output.csv", index=False)

Transformers (BERT model) Approach

In [None]:
df2 = dataset.copy()

# Apply the mapping function to the sentiment column
df2['sentiment'] = df2['sentiment'].apply(map_sentiment)
df2

Unnamed: 0,date,title,sentiment
0,2023-04-05 06:52:09,Allbridge to first begin repaying stuck bridge...,0
1,2023-04-05 06:19:00,Bitcoin Hodl Patterns Indicate Cycle Shift to ...,1
2,2023-04-05 05:09:44,ETH hits 7-month high ahead of Shanghai and Ca...,0
3,2023-04-05 01:09:52,"Marathon Digital posts quarterly record of 2,1...",2
4,2023-04-04 23:49:00,Brazilian Finance Giant BTG Pactual to Issue a...,2
...,...,...,...
18538,2021-10-27 15:17:00,Cream Finance Suffers Another Exploit as Attac...,1
18539,2021-10-19 13:39:00,French Central Bank's Blockchain Bond Trial Br...,2
18540,2021-10-18 13:58:00,"Facebook To Add 10,000 Jobs In EU For Metavers...",2
18541,2021-10-15 00:00:00,Tech Crackdown Hasn't Halted Chinese Firms' Bl...,1


In [None]:
# Change to hugging face dataset to apply to the BERT model
df2 = Dataset.from_pandas(df2)
# Split the dataset into training and testing sets
df_BERT = df2.train_test_split(test_size = 0.2)

# Load the DistilBERT tokenizer (uncased version for lowercase text handling)
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize the dataset
def tokenize_function(examples: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """
    Tokenizes a batch of examples using the DistilBERT tokenizer.

    This function is intended to be used with Hugging Face Datasets' `.map()` method
    with `batched=True`. It processes a batch of text samples and prepares them
    for input into a transformer model.

    Args:
        examples (Dict[str, List[Any]]): A dictionary where each key corresponds
                                         to a field in the dataset. Must contain:
                                         - "title": a list of tweet texts (strings)
                                         - "sentiment": a list of integer labels

    Returns:
        Dict[str, List[Any]]: A dictionary containing tokenized BERT inputs:
            - "input_ids": list of token ID sequences (integers)
            - "attention_mask": list of masks (1 for real token, 0 for padding)
            - "labels": original sentiment labels copied from the input, required
                        for supervised training (classification)
    """
    tokenized_inputs = tokenizer(
        examples["title"], 
        padding="max_length", 
        truncation=True,
        max_length=128,
    )
    # Add labels to the tokenized inputs
    tokenized_inputs["labels"] = examples["sentiment"] 
    return tokenized_inputs

tokenized_dataset = df_BERT.map(tokenize_function, batched = True)

Map: 100%|██████████| 14834/14834 [00:03<00:00, 4027.84 examples/s]
Map: 100%|██████████| 3709/3709 [00:00<00:00, 4311.32 examples/s]


Ideally we would use ProsusAI/finbert model as it is more suitable for crypto market

In [None]:
# Load a pretrained DistilBERT model for sequence classification
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=3,                                           # 3 classes: negative, neutral, positive
    ignore_mismatched_sizes=True ,                          # Allows loading even if the classifier head doesn't match
    id2label={0: "negative", 1: "neutral", 2: "positive"}
)

# Set device to GPU if available, else fallback to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Move the model to the appropriate device for training/inference
model = model.to(device)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
# Define training hyperparameters for Hugging Face Trainer
training_args = TrainingArguments(
    output_dir="./distilbert_finetuned",  # Directory to save model checkpoints
    eval_strategy="epoch",                # Evaluate at the end of each epoch
    learning_rate=2e-5,                   # Learning rate for AdamW optimizer (more consistent regularization and better generalization)
    per_device_train_batch_size=16,       # Batch size per device during training
    per_device_eval_batch_size=16,        # Batch size per device during evaluation
    num_train_epochs=3,                   # Number of training epochs
    weight_decay=0.01,                    # L2 regularization strength (reduce overfitting in a machine learning model)
    save_strategy="epoch",                # Save model checkpoint at each epoch
    load_best_model_at_end=True,          # Load best model based on eval loss
    fp16=True,                            # Use mixed precision if NVIDIA GPU is available
    gradient_accumulation_steps=2,        # Accumulate gradients to simulate larger batch size
    dataloader_num_workers=4,             # Number of subprocesses for data loading (speed boost)
)

In [None]:
def compute_metrics(eval_pred: tuple) -> dict:
    """
    Computes evaluation metrics: accuracy and weighted F1 score for model predictions.

    Args:
        eval_pred (tuple): A tuple containing two elements:
            - logits (numpy.ndarray): The raw output from the model (predicted scores).
            - labels (numpy.ndarray): The true labels (ground truth).

    Returns:
        dict: A dictionary with the following keys:
            - 'accuracy': Accuracy score (correct predictions / total predictions)
            - 'f1': Weighted F1 score (harmonic mean of precision and recall, weighted by class support)
    """
    # Unpack the tuple into logits (model outputs) and labels (true values)
    logits, labels = eval_pred

    # Convert the logits (model outputs) to predicted class labels (0, 1, 2 for negative, neutral, positive)
    predictions = np.argmax(logits, axis=-1)
    
    # Return the metrics in a dictionary
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1": f1_score(labels, predictions, average="weighted"),
    }

In [None]:
# Trainer setup
trainer = Trainer(
    model=model,                                # The model to be trained (DistilBERT in this case)
    args=training_args,                         # Hyperparameters and training configuration (learning rate, batch size, etc.)
    train_dataset=tokenized_dataset["train"],   # The training dataset (tokenized data)
    eval_dataset=tokenized_dataset["test"],     # The evaluation dataset (tokenized data)
    compute_metrics=compute_metrics,            # The function to compute evaluation metrics (accuracy and F1 score)
)

# Start training
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,F1
1,No log,1.018392,0.495282,0.427845
2,1.029900,1.008655,0.497169,0.439211
3,0.976200,1.018938,0.489889,0.474548


TrainOutput(global_step=1392, training_loss=0.9812447997345322, metrics={'train_runtime': 319.9338, 'train_samples_per_second': 139.098, 'train_steps_per_second': 4.351, 'total_flos': 1473792326272512.0, 'train_loss': 0.9812447997345322, 'epoch': 3.0})

The reason why this model is also not providing good results, is likely due to:
<div>- Texts too short/vague (headlines without clear sentiment cues)</div>
<div>- DistilBERT may be too small for this task</div>
<div>- Too much noise in the dataset</div>

<br>  When training NLP models, having too much noise in the data—such as irrelevant content, typos, sarcasm, or inconsistent labeling can negatively affect the model’s ability to learn meaningful patterns. This leads to poor generalization and lower accuracy as shown above, especially in real-world applications. Denoising helps clean the data to ensure the model focuses on relevant features and learns more effectively. However, due to time constraints, we are currently unable to perform a proper denoising process. However, if we were to have a proper dataset and with proper denoising techniques, we are confident to be able to improve our signal generation.</br>
<br> Besides, looking at Training Loss (0.97) and Validation Loss (~1.03), which are almost identical. This means the model isn't improving on the training data, and it's not generalizing to new data (validation set).</br>

<br> However, on the bright side, this means that there are more room for improvement for the current NLP model</br>

In [11]:
# Save the final model
trainer.save_model("bert-sentiment-model")
tokenizer.save_pretrained("bert-tokenizer")

('bert-tokenizer\\tokenizer_config.json',
 'bert-tokenizer\\special_tokens_map.json',
 'bert-tokenizer\\vocab.txt',
 'bert-tokenizer\\added_tokens.json')

In [13]:
# When using the model, load the saved model and tokenizer
model = DistilBertForSequenceClassification.from_pretrained("bert-sentiment-model")
tokenizer = DistilBertTokenizer.from_pretrained("bert-tokenizer")
model.eval() # This prevents the model to keep training (provides different output)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
