This code imports the Python libraries needed to fine-tune and evaluate the RoBERTa model. These are datasets for loading data, transformers for the tokenizer, model, and training loop, PyTorch for device handling, and NumPy and scikit-learn for computing accuracy and F1 score.

In [1]:
# ==============================================
# 1. IMPORT LIBRARIES
# ==============================================
import torch
from datasets import load_dataset
from transformers import RobertaForSequenceClassification, RobertaTokenizerFast
from transformers import TrainingArguments, Trainer
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

This code selects the computing device. It uses a GPU when one is available (a Tesla T4 in the run below) and otherwise falls back to the CPU. It then mounts Google Drive and loads a CSV file of mental health-related Twitter posts. The CSV provides only a single 'train' split, so that split is divided into training and evaluation sets with a stratified 80/20 split, which keeps the class proportions the same in both sets. Finally, both pandas DataFrames are converted back into Hugging Face Dataset objects for use with the RoBERTa training pipeline.
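The idea behind a stratified split can be shown without scikit-learn. The sketch below is a simplified stand-in for train_test_split(..., stratify=...), not its actual implementation. It groups row indices by label, shuffles each group with a fixed seed, and sends 20% of every group to the evaluation set, so both parts keep the original class balance. The function name stratified_split and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, eval_idx) so each class keeps the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, eval_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_eval = round(len(idxs) * test_size)  # per-class number of evaluation rows
        eval_idx.extend(idxs[:n_eval])
        train_idx.extend(idxs[n_eval:])
    return sorted(train_idx), sorted(eval_idx)


labels = [0] * 60 + [1] * 40   # toy dataset with a 60/40 class balance
train_idx, eval_idx = stratified_split(labels)
eval_share_pos = sum(labels[i] for i in eval_idx) / len(eval_idx)
print(len(train_idx), len(eval_idx), eval_share_pos)  # 80 20 0.4
```

The evaluation set keeps the 40% share of class 1 exactly, which is what stratify=df["label"] is meant to guarantee on the real data.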

In [3]:
# ==============================================
# 2. DEVICE SETUP (GPU / CPU)
# ==============================================
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("GPU not available, using CPU.")

print("\n--- Loading and Preprocessing Data ---")

from google.colab import drive
drive.mount('/content/drive')

dataset = load_dataset("csv", data_files="/content/drive/MyDrive/OSM-Final-Project/Mental-Health-Twitter.csv")

# The CSV loads as a single 'train' split, so the full dataset is divided
# below into training and evaluation sets.
import pandas as pd
from sklearn.model_selection import train_test_split
from datasets import Dataset

# Convert the dataset to a pandas DataFrame
df = dataset["train"].to_pandas()

# Stratified 80/20 split to keep class balance
train_df, eval_df = train_test_split(
    df,
    test_size=0.2,
    # Stratify by the 'label' column, assuming it exists and contains the class labels
    stratify=df["label"],
    random_state=42
)

# Convert back to Hugging Face Dataset objects
train_data = Dataset.from_pandas(train_df.reset_index(drop=True))
eval_data  = Dataset.from_pandas(eval_df.reset_index(drop=True))

Using GPU: Tesla T4

--- Loading and Preprocessing Data ---
Mounted at /content/drive



In this section, the tokenizer of the pre-trained model "margotwagner/roberta-psychotherapy-eval" is loaded. It converts raw post text into the numerical token IDs the model expects. The tokenize_function truncates each post to the tokenizer's maximum input length and pads it to the longest post in the same mapping batch, returning input IDs and attention masks. Both datasets are tokenized with batched=True for speed. Sequence lengths can therefore still differ between mapping batches. This is harmless because the Trainer, when given the tokenizer, pads every training batch dynamically with its default data collator. The label column is then renamed to "labels", the name the Hugging Face Trainer expects. Both datasets are also formatted as PyTorch tensors containing input_ids, attention_mask, and labels, ready for fine-tuning on the depression detection task.
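The sketch below shows in plain Python what "pad to the longest sequence in the batch" produces. It is an illustration of the idea behind padding=True, not the tokenizer's own code. Shorter sequences are filled with a padding ID, and the attention mask marks real tokens with 1 and padding with 0 so the model ignores the padded positions. The helper pad_batch and the toy token IDs are hypothetical. pad_id=1 matches RoBERTa's <pad> token.

```python
def pad_batch(batch_ids, pad_id=1):
    """Pad token-ID lists to the longest sequence in the batch and build attention masks.

    pad_id=1 is RoBERTa's <pad> token; in the mask, 1 marks a real token and 0 marks padding.
    """
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = [[0, 713, 2], [0, 38, 657, 24, 2]]  # toy token IDs for two posts
padded = pad_batch(batch)
print(padded["input_ids"])       # [[0, 713, 2, 1, 1], [0, 38, 657, 24, 2]]
print(padded["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Padding per batch rather than to one global length keeps batches of short tweets small, which saves memory and compute during training.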

In [4]:
# ==============================================
# 3. TOKENIZER SETUP
# ==============================================
MODEL_NAME = "margotwagner/roberta-psychotherapy-eval"
tokenizer = RobertaTokenizerFast.from_pretrained(MODEL_NAME)

def tokenize_function(examples):
    # Truncate to the tokenizer's max length and pad to the longest post in each mapping batch
    return tokenizer(examples["post_text"], truncation=True, padding=True)

# Tokenize the data
tokenized_train = train_data.map(tokenize_function, batched=True)
tokenized_eval = eval_data.map(tokenize_function, batched=True)

# Rename label column for Hugging Face Trainer
# Assumes the CSV's label column is named 'label'; change it here if the column name differs
tokenized_train = tokenized_train.rename_column("label", "labels")
tokenized_eval = tokenized_eval.rename_column("label", "labels")

# Convert to PyTorch tensors
tokenized_train.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
tokenized_eval.set_format("torch", columns=["input_ids", "attention_mask", "labels"])


Map:   0%|          | 0/16000 [00:00<?, ? examples/s]

Map:   0%|          | 0/4000 [00:00<?, ? examples/s]

This code loads the pre-trained RoBERTa checkpoint with a sequence-classification head configured for two labels, matching the binary depression detection task. It then moves the model to the selected device (GPU or CPU). Starting from pre-trained weights lets the model reuse its existing language representations, so fine-tuning only has to adapt them to the depression detection dataset.
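With num_labels=2, the model ends in a small classification head. This head maps the encoder's representation of the first token (<s>) to two logits, one per class. The toy sketch below mirrors the structure of a RoBERTa-style head in plain Python: a dense layer, a tanh activation, and an output projection to num_labels values. Dropout is omitted because it is inactive at inference. The vectors are made-up 3-dimensional values rather than the model's much larger hidden states, and the helper names are hypothetical. It illustrates the computation only and is not the library's code.

```python
import math


def matvec(weights, bias, x):
    """Compute weights @ x + bias for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def classification_head(cls_vector, dense_w, dense_b, out_w, out_b):
    """Toy RoBERTa-style head: dense -> tanh -> projection to one logit per label.

    Dropout is left out because it does nothing at inference time.
    """
    hidden = [math.tanh(v) for v in matvec(dense_w, dense_b, cls_vector)]
    return matvec(out_w, out_b, hidden)


# Toy <s> representation and hand-picked weights (not values from the real model)
cls_vector = [0.5, -1.0, 2.0]
dense_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity, for readability
dense_b = [0.0, 0.0, 0.0]
out_w = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # two rows -> num_labels=2
out_b = [0.0, 0.0]

logits = classification_head(cls_vector, dense_w, dense_b, out_w, out_b)
print(len(logits))  # 2
```

During fine-tuning, the head's weights and the encoder's weights are updated together so that these two logits separate the classes.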

In [5]:
# ==============================================
# 4. MODEL DEFINITION
# ==============================================
model = RobertaForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2).to(device)
print(f"Model loaded: {MODEL_NAME}")

Model loaded: margotwagner/roberta-psychotherapy-eval


This code defines the evaluation metrics and configures training for the RoBERTa model. The compute_metrics function converts the model's logits into predicted classes with argmax. It then reports accuracy and binary F1 score (with class 1 as the positive class), giving a clear measure of performance on the binary depression classification task.
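The sketch below reproduces in plain Python what compute_metrics does with NumPy and scikit-learn. It takes the argmax over each row of logits, then computes accuracy and binary F1 with class 1 as the positive class. The helper names and toy logits are hypothetical. It is meant to make the formulas explicit, not to replace the library calls.

```python
def argmax(row):
    """Index of the largest logit in one prediction row (first index wins ties)."""
    return max(range(len(row)), key=row.__getitem__)


def compute_metrics_sketch(logits, labels, positive=1):
    """Accuracy and binary F1, mirroring argmax + accuracy_score + f1_score(average="binary")."""
    preds = [argmax(row) for row in logits]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Toy logits for 4 posts: predictions are [1, 0, 1, 0], true labels are [1, 1, 1, 0]
logits = [[0.1, 2.0], [1.5, -0.3], [-1.0, 0.5], [0.9, 0.2]]
labels = [1, 1, 1, 0]
metrics = compute_metrics_sketch(logits, labels)
print({k: round(v, 4) for k, v in metrics.items()})  # {'accuracy': 0.75, 'f1': 0.8}
```

Here precision is 1.0 and recall is 2/3. F1 is their harmonic mean (0.8), so missed depressive posts lower F1 even when accuracy looks high.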

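The TrainingArguments object sets a learning rate of 3e-5, an evaluation batch size of 16, and 3,000 warmup steps. The training batch size and number of epochs are not set, so the library defaults apply (8 examples per device and 3 epochs). This is consistent with the 6,000 optimizer steps reported after training, so warmup covers the first half of training. The model is evaluated and saved at the end of each epoch. The best checkpoint is reloaded when training finishes (load_best_model_at_end=True, which by default means the checkpoint with the lowest validation loss). 16-bit mixed precision is used when a GPU is available, and external logging integrations such as Weights & Biases are disabled. The Trainer class combines the model, training arguments, tokenized datasets, metrics function, and tokenizer into a single training pipeline that trains on the training set and evaluates on the validation set after every epoch. The hyperparameters were chosen based on prior experimentation to balance training stability, accuracy, and computational efficiency.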
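To see what warmup_steps=3000 means in practice, the sketch below computes the learning rate at a given step. It assumes the Trainer's default linear schedule: the rate rises linearly from 0 to the peak during warmup, then decays linearly to 0 at the last step. The total of 6,000 steps comes from the TrainOutput of this run. The function name is hypothetical.

```python
def linear_warmup_decay_lr(step, base_lr=3e-5, warmup_steps=3000, total_steps=6000):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 1500, 3000, 4500, 6000):
    print(step, linear_warmup_decay_lr(step))
```

With these settings the full learning rate of 3e-5 is reached only at step 3,000, halfway through the second epoch, and every later step decays it. A shorter warmup would spend more of training near the peak rate.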

In [10]:
# ==============================================
# 5. METRICS AND TRAINING SETUP
# ==============================================
def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    acc = accuracy_score(p.label_ids, preds)
    f1 = f1_score(p.label_ids, preds, average="binary")
    return {"accuracy": acc, "f1": f1}

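training_args = TrainingArguments(
    output_dir="./results",

    learning_rate=3e-5,
    per_device_eval_batch_size=16,
    warmup_steps=3000,              # warmup length in optimizer steps
    # per_device_train_batch_size and num_train_epochs keep their defaults (8 and 3)

    eval_strategy="epoch",          # evaluate at the end of every epoch
    logging_dir="./logs",
    save_strategy="epoch",          # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,    # selects by eval_loss unless metric_for_best_model is set
    fp16=torch.cuda.is_available(),  # Use 16-bit precision if GPU is available
    report_to=[]  # Disable external logging integrations such as W&B
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,  # newer transformers releases prefer processing_class=tokenizer
)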


This code starts fine-tuning with the training setup defined above. trainer.train() updates the model's weights over the training set. Because eval_strategy is "epoch", it also evaluates on the validation set at the end of each epoch. The time estimate in the printed message (1 to 4 hours on a GPU) is conservative: the run recorded below completed 3 epochs (6,000 steps) in about 11 minutes (666 seconds) on a Tesla T4. Actual time depends on dataset size, sequence length, and hardware. This step adapts the pre-trained RoBERTa model to detect depressive cues in social media text.
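The reported step count and throughput can be checked with a little arithmetic. This assumes the library defaults of 8 examples per training batch and 3 epochs (neither is set explicitly in TrainingArguments), a single GPU, and 16,000 training rows (80% of the 20,000-row CSV). The runtime value is copied from the TrainOutput of this run.

```python
import math

n_train = 16_000     # training rows after the 80/20 split
batch_size = 8       # assumed default per_device_train_batch_size (not set explicitly)
epochs = 3           # assumed default num_train_epochs (not set explicitly)

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 2000 6000

runtime_s = 666.4312  # train_runtime reported in the TrainOutput
print(round(runtime_s / 60, 1))                  # 11.1 (minutes)
print(round(n_train * epochs / runtime_s, 3))    # 72.025 (samples per second)
```

Both results match the TrainOutput (global_step=6000, train_samples_per_second=72.025), which supports the assumed defaults.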

In [11]:
# ==============================================
# 6. EXECUTION - TRAINING
# ==============================================
print("\n--- Starting Fine-Tuning (Expected Time: 1–4 hours on GPU) ---")
trainer.train()


--- Starting Fine-Tuning (Expected Time: 1–4 hours on GPU) ---


Epoch  Training Loss  Validation Loss  Accuracy  F1
1      0.128          0.482203         0.9125    0.910486
2      0.1504         0.481297         0.904     0.900415
3      0.093          0.502302         0.9205    0.92046


TrainOutput(global_step=6000, training_loss=0.1441794007619222, metrics={'train_runtime': 666.4312, 'train_samples_per_second': 72.025, 'train_steps_per_second': 9.003, 'total_flos': 3855839071867680.0, 'train_loss': 0.1441794007619222, 'epoch': 3.0})

After training, the model is evaluated on the held-out validation set. trainer.evaluate() reports the validation loss, accuracy, and F1 score of the model currently in memory. Because load_best_model_at_end=True, this is the checkpoint with the lowest validation loss rather than the last one. That is why the final results (accuracy 0.904, F1 0.9004) match epoch 2 in the training log, even though epoch 3 reached a higher accuracy (0.9205) and F1 (0.9205). The same validation set was also used to choose the checkpoint, so these scores are a slightly optimistic estimate of performance on truly unseen data. The printed results make it possible to compare different hyperparameter configurations. save_model() then writes the selected checkpoint, together with its tokenizer, to ./results/best_model for later inference.
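The selection rule can be illustrated by applying it to the per-epoch validation metrics in the table above. The sketch picks the epoch a checkpoint selector would keep for a given metric and direction. The history list and best_epoch helper are hypothetical; only the numbers come from this run. Selecting by F1 in the real run would mean passing metric_for_best_model="f1" to TrainingArguments, where greater_is_better then defaults to True.

```python
# Per-epoch validation metrics copied from the training log of this run
history = [
    {"epoch": 1, "eval_loss": 0.482203, "eval_accuracy": 0.9125, "eval_f1": 0.910486},
    {"epoch": 2, "eval_loss": 0.481297, "eval_accuracy": 0.9040, "eval_f1": 0.900415},
    {"epoch": 3, "eval_loss": 0.502302, "eval_accuracy": 0.9205, "eval_f1": 0.920460},
]


def best_epoch(history, metric="eval_loss", greater_is_better=False):
    """Return the epoch a checkpoint selector would keep for this metric and direction."""
    pick = max if greater_is_better else min
    return pick(history, key=lambda row: row[metric])["epoch"]


print(best_epoch(history))                                     # 2 (default: lowest eval_loss)
print(best_epoch(history, "eval_f1", greater_is_better=True))  # 3
```

Loss and F1 disagree here: epoch 3 classifies more posts correctly but is less confident on the ones it gets wrong, which raises its cross-entropy loss. Which criterion to use depends on whether calibrated probabilities or classification quality matters more for the application.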

In [12]:
# ==============================================
# 7. FINAL EVALUATION
# ==============================================
print("\n--- Final Evaluation Results ---")
eval_results = trainer.evaluate()
print(eval_results)

# Save best model checkpoint for future inference
trainer.save_model(training_args.output_dir + "/best_model")

print("\nFine-tuning process complete. The resulting model can now be used for inference.")


--- Final Evaluation Results ---


{'eval_loss': 0.48129695653915405, 'eval_accuracy': 0.904, 'eval_f1': 0.9004149377593361, 'eval_runtime': 5.558, 'eval_samples_per_second': 719.686, 'eval_steps_per_second': 44.98, 'epoch': 3.0}

Fine-tuning process complete. The resulting model can now be used for inference.


The inference pipeline applies the fine-tuned RoBERTa model to new, unlabeled text through the Hugging Face pipeline API. The model and tokenizer already in memory are passed to the pipeline, which handles tokenization, encoding, and prediction. For each input it returns the predicted label (LABEL_0 or LABEL_1) with its probability as a confidence score. The code reports LABEL_1 as "Positive" and LABEL_0 as "Negative". Since the model was trained for depression detection, "Positive" means the post was classified as showing depressive cues (assuming label 1 marks depressive posts in the training data), not that its sentiment is positive. The three example sentences are project-review style text rather than social media posts about mental health. All three are classified as Negative with scores above 0.99, including the clearly favorable second sentence. This is the expected behavior for a depression classifier and shows that it should not be read as a general sentiment analyzer. The same pipeline can be applied to larger collections of posts to screen them for depressive cues.
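The sketch below shows how a pair of logits becomes the label and score that the pipeline prints. A softmax turns the logits into probabilities, the most probable class gives the label, and its probability is the score. It assumes the default LABEL_0/LABEL_1 names that the notebook's code checks for, and that label 1 is the depression class. The helper names and the logits are hypothetical, not real model output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Return a pipeline-style dict with the top label name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": f"LABEL_{best}", "score": probs[best]}


def describe(pred):
    """Map LABEL_1 (assumed depression class) to a readable verdict."""
    return "Depressive cues" if pred["label"] == "LABEL_1" else "No depressive cues"


pred = to_prediction([3.0, -3.0])  # toy logits, not real model output
print(pred["label"], round(pred["score"], 4), describe(pred))  # LABEL_0 0.9975 No depressive cues
```

A score near 1.0 only says the model strongly prefers one class. It does not measure whether the input resembles the training data, which is why off-topic sentences can still receive very confident predictions.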

In [13]:
# ==============================================
# 8. INFERENCE PIPELINE (TESTING ON NEW DATA)
# ==============================================
from transformers import pipeline

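# If the Colab session was restarted, reload the checkpoint saved in step 7 instead:
# MODEL_PATH = "./results/best_model"
# model = RobertaForSequenceClassification.from_pretrained(MODEL_PATH).to(device)
# tokenizer = RobertaTokenizerFast.from_pretrained(MODEL_PATH)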

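# "sentiment-analysis" is an alias for the "text-classification" task; the model
# used here is the fine-tuned depression classifier, not a sentiment model.
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model=model,            # fine-tuned model still in memory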
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1
)

# Example text data (mimicking real-world project reviews)
new_data = [
    "This system is incredibly slow and completely useless for disaster management.",
    "The accuracy is amazing and the new dashboard makes resource allocation simple.",
    "The committee was very critical of the project's limited scope."
]

print("\n--- Running Inference on Unlabeled Data ---")
results = sentiment_analyzer(new_data)

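# Print results. LABEL_1 is class 1 of the training labels (assumed to be the
# depression class), so "Positive" means depressive cues were detected.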
for text, result in zip(new_data, results):
    sentiment = "Positive" if result["label"] == "LABEL_1" else "Negative"
    print(f"\nText: {text}")
    print(f"Prediction: {sentiment} (Score: {result['score']:.4f})")

print("\n--- Next Steps ---")
print("You may now apply this analyzer to your larger dataset for structured sentiment analysis.")

Device set to use cuda:0



--- Running Inference on Unlabeled Data ---

Text: This system is incredibly slow and completely useless for disaster management.
Prediction: Negative (Score: 0.9974)

Text: The accuracy is amazing and the new dashboard makes resource allocation simple.
Prediction: Negative (Score: 0.9989)

Text: The committee was very critical of the project's limited scope.
Prediction: Negative (Score: 0.9993)

--- Next Steps ---
You may now apply this analyzer to your larger dataset for structured sentiment analysis.
