# Sentiment Analysis Training with Qwen2-1.5B

This notebook trains a sentiment analysis model using:
- **Model**: Qwen2-1.5B with LoRA fine-tuning
- **Datasets**: Amazon Reviews + Twitter Sentiment140
- **Task**: Binary sentiment classification (positive/negative)

- **Platforms**: Google Colab, Kaggle
- **GPU**: Recommended for faster training


## Setup and Installation


In [None]:
# Install required packages
!pip install transformers==4.44.0
!pip install peft==0.12.0
!pip install datasets==2.20.0
!pip install accelerate==0.33.0
# Quote the specifier: unquoted, the shell treats ">" as output redirection
!pip install "torch>=2.0.0"
!pip install scikit-learn
!pip install pandas
!pip install numpy
!pip install wandb
!pip install regex
!pip install nltk


In [None]:
import torch
import pandas as pd
import numpy as np
import json
import os
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig, get_peft_model, TaskType
import logging

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")


## Upload and Load Data

**Important**: Upload your `sentiment_analysis_data.zip` file first, then run the cell below.
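
Before extracting, it can help to confirm that the upload succeeded and that the archive holds the files the loading cell reads. The following is a minimal sketch, not part of the original notebook: `missing_from_archive` is a hypothetical helper, and it assumes `train.jsonl` and `validation.jsonl` sit at the top level of the archive, since the loading cell opens them from the working directory.

```python
import zipfile
from pathlib import Path

# File names taken from the loading cell; their top-level location is an assumption.
EXPECTED_FILES = ("train.jsonl", "validation.jsonl")

def missing_from_archive(zip_path, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from the archive."""
    path = Path(zip_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; upload it before extracting.")
    with zipfile.ZipFile(path) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]
```

Calling `missing_from_archive("sentiment_analysis_data.zip")` returns an empty list when both files are present, and raises `FileNotFoundError` if the archive has not been uploaded yet.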


In [None]:
# Extract uploaded data
!unzip -o sentiment_analysis_data.zip

# Load data function
def load_jsonl_data(file_path):
    """Load data from JSONL file."""
    data = []
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing newline at end of file
                data.append(json.loads(line))
    return data

# Load datasets
train_data = load_jsonl_data("train.jsonl")
val_data = load_jsonl_data("validation.jsonl")

print(f"✅ Training examples: {len(train_data):,}")
print(f"✅ Validation examples: {len(val_data):,}")

# Show sample
print("\n📝 Sample training example:")
sample = train_data[0]
print(f"Text: {sample['text'][:150]}...")
print(f"Label: {sample['label']} ({sample.get('sentiment', 'N/A')})")
print(f"Source: {sample.get('dataset_source', 'N/A')}")
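
Since the task is binary classification, a quick look at the label balance and at records missing required fields can catch data problems before training. The sketch below is illustrative only: `summarize_labels` and `records_missing_fields` are hypothetical helpers, the toy records are made up, and the 0/1 label encoding is an assumption. Only the `text` and `label` field names come from the sample printed above.

```python
from collections import Counter

def summarize_labels(records, label_key="label"):
    """Count records per label value; records without the key fall under None."""
    return Counter(record.get(label_key) for record in records)

def records_missing_fields(records, required=("text", "label")):
    """Return the indices of records that lack any required field."""
    return [
        index
        for index, record in enumerate(records)
        if any(key not in record for key in required)
    ]

# Toy records for illustration only; the 0/1 encoding is assumed, not confirmed.
toy_records = [
    {"text": "Great product", "label": 1},
    {"text": "Terrible service", "label": 0},
    {"text": "Loved it", "label": 1},
    {"text": "No label here"},
]

print(summarize_labels(toy_records))
print(records_missing_fields(toy_records))
```

In the notebook, the same helpers could be applied to `train_data` and `val_data` once they are loaded.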
