# 🎭 Emotion Detection from Text
### Beyond Positive/Negative: Detecting 6 Human Emotions using DistilBERT
**Emotions:** 😢 Sadness · 🤩 Joy · ❤️ Love · 😠 Anger · 😨 Fear · 😲 Surprise

> Make sure you have **GPU enabled**: Runtime > Change runtime type > T4 GPU

## 📦 Cell 1: Install Libraries

In [1]:
!pip install datasets transformers torch scikit-learn plotly wordcloud streamlit pyngrok -q




## 📥 Cell 2: Load Dataset from Hugging Face

In [3]:
from datasets import load_dataset
import pandas as pd

# Load the emotion dataset (6 emotions, ~20k samples)
dataset = load_dataset('dair-ai/emotion')

# Convert to pandas
train_df = pd.DataFrame(dataset['train'])
test_df  = pd.DataFrame(dataset['test'])

# Emotion label mapping
label_map = {0: 'Sadness 😢', 1: 'Joy 🤩', 2: 'Love ❤️', 3: 'Anger 😠', 4: 'Fear 😨', 5: 'Surprise 😲'}
train_df['emotion'] = train_df['label'].map(label_map)

print(f'Training samples: {len(train_df)}')
print(f'Test samples: {len(test_df)}')
print('\nEmotion Distribution:')
print(train_df['emotion'].value_counts())

Training samples: 16000
Test samples: 2000

Emotion Distribution:
emotion
Joy 🤩         5362
Sadness 😢     4666
Anger 😠       2159
Fear 😨        1937
Love ❤️        1304
Surprise 😲     572
Name: count, dtype: int64
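
The counts above are noticeably imbalanced: Joy has roughly nine times as many training samples as Surprise. One common way to compensate, which this notebook does not use, is inverse-frequency class weighting. The sketch below is only an illustration built from the counts printed above; the `class_weights` name and the "balanced" formula `total / (n_classes * count)` are assumptions, not part of the original pipeline.

```python
# Training-set counts copied from the distribution printed above.
counts = {
    'Joy': 5362, 'Sadness': 4666, 'Anger': 2159,
    'Fear': 1937, 'Love': 1304, 'Surprise': 572,
}

total = sum(counts.values())
n_classes = len(counts)

# "Balanced" inverse-frequency weights: rarer classes get larger weights.
# (Illustrative assumption; the notebook trains with unweighted loss.)
class_weights = {name: total / (n_classes * n) for name, n in counts.items()}

# Print from the smallest weight (most common class) to the largest.
for name, w in sorted(class_weights.items(), key=lambda kv: kv[1]):
    print(f'{name:<9} {w:.2f}')
```

Weights like these could be fed into a weighted cross-entropy loss; whether that actually helps would need to be checked against the per-class scores in Cell 5.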


## 📊 Cell 3: Explore & Visualize the Dataset

In [4]:
import plotly.express as px
import plotly.graph_objects as go

# Emotion distribution bar chart
emotion_counts = train_df['emotion'].value_counts().reset_index()
emotion_counts.columns = ['Emotion', 'Count']

fig = px.bar(
    emotion_counts,
    x='Emotion', y='Count',
    color='Emotion',
    title='📊 Emotion Distribution in Training Data',
    color_discrete_sequence=px.colors.qualitative.Bold
)
fig.update_layout(showlegend=False, plot_bgcolor='white')
fig.show()

# Show sample texts
print('\n🔍 Sample texts:')
for emotion in train_df['emotion'].unique():
    sample = train_df[train_df['emotion'] == emotion]['text'].iloc[0]
    print(f'{emotion}: "{sample[:80]}..."')


🔍 Sample texts:
Sadness 😢: "i didnt feel humiliated..."
Anger 😠: "im grabbing a minute to post i feel greedy wrong..."
Love ❤️: "i am ever feeling nostalgic about the fireplace i will know that it is still on ..."
Surprise 😲: "ive been taking or milligrams or times recommended amount and ive fallen asleep ..."
Fear 😨: "i feel as confused about life as a teenager or as jaded as a year old man..."
Joy 🤩: "i have been with petronas for years i feel that petronas has performed well and ..."


## 🤖 Cell 4: Fine-tune DistilBERT Model

In [5]:
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification, Trainer, TrainingArguments
import torch
from torch.utils.data import Dataset

# Check GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using device: {device} 🚀')

# Load tokenizer
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

# Custom Dataset class
class EmotionDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.encodings = tokenizer(texts, truncation=True, padding=True, max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

# Prepare data
train_texts = list(dataset['train']['text'])
train_labels = list(dataset['train']['label'])
test_texts  = list(dataset['test']['text'])
test_labels  = list(dataset['test']['label'])

train_dataset = EmotionDataset(train_texts, train_labels, tokenizer)
test_dataset  = EmotionDataset(test_texts, test_labels, tokenizer)

print(f'Train dataset size: {len(train_dataset)}')
print(f'Test dataset size: {len(test_dataset)}')

RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
Failed to import transformers.modeling_tf_utils because of the following error (look up to see its traceback):
Your currently installed version of Keras is Keras 3, but this is not yet supported in Transformers. Please install the backwards-compatible tf-keras package with `pip install tf-keras`.
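
The traceback above comes from Transformers trying to import its TensorFlow utilities while Keras 3 is installed. Besides installing `tf-keras` as the message suggests, a possible workaround for a PyTorch-only notebook is to tell Transformers to skip TensorFlow before it is imported. This is a minimal sketch, assuming the installed Transformers version honors the `USE_TF` and `USE_TORCH` environment variables; it has not been verified against this exact environment.

```python
import os

# Assumption: these variables are read when transformers is first imported,
# so they must be set before any `from transformers import ...` line.
# If transformers was already imported, restart the kernel first.
os.environ['USE_TF'] = '0'      # ask transformers not to load TensorFlow
os.environ['USE_TORCH'] = '1'   # use the PyTorch backend only

print({k: os.environ[k] for k in ('USE_TF', 'USE_TORCH')})
```

Run this in a cell above the Cell 4 imports, then re-run Cell 4.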

In [None]:
from sklearn.metrics import accuracy_score, f1_score
import numpy as np

# Load DistilBERT for classification (6 labels)
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=6)

# Metrics function
def compute_metrics(pred):
    labels = pred.label_ids
    preds  = pred.predictions.argmax(-1)
    acc = accuracy_score(labels, preds)
    f1  = f1_score(labels, preds, average='weighted')
    return {'accuracy': acc, 'f1': f1}

# Training arguments
training_args = TrainingArguments(
    output_dir='./emotion_model',
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    warmup_steps=200,
    weight_decay=0.01,
    logging_dir='./logs',
    evaluation_strategy='epoch',  # newer transformers releases name this eval_strategy
    save_strategy='epoch',
    load_best_model_at_end=True,
    logging_steps=50,
    report_to='none'
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)

# Train! ⏳ (~15-20 mins on T4 GPU)
print('🚀 Starting training...')
trainer.train()
print('✅ Training complete!')
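
`compute_metrics` above reports a weighted F1 score. As a rough illustration of what `average='weighted'` means, the sketch below computes per-class F1 by hand and averages it by class support. The toy labels and the `weighted_f1` helper are hypothetical; in the notebook itself scikit-learn does this work.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / len(y_true)
    return score

# Toy example (hypothetical labels, not results from the model).
y_true = [0, 0, 0, 1, 1, 5]
y_pred = [0, 0, 1, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 3))
```

Because the weights follow class support, a rare class such as Surprise contributes little to this number, so the per-class report in Cell 5 is worth reading alongside it.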

## 📈 Cell 5: Evaluate & Visualize Results

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
import plotly.figure_factory as ff

# Get predictions
predictions = trainer.predict(test_dataset)
preds = predictions.predictions.argmax(-1)
true_labels = predictions.label_ids

emotion_names = ['Sadness 😢', 'Joy 🤩', 'Love ❤️', 'Anger 😠', 'Fear 😨', 'Surprise 😲']

# Classification report
print('📊 Classification Report:')
print(classification_report(true_labels, preds, target_names=emotion_names))

# Confusion Matrix Heatmap
cm = confusion_matrix(true_labels, preds)
fig = ff.create_annotated_heatmap(
    z=cm,
    x=emotion_names,
    y=emotion_names,
    colorscale='Blues'
)
fig.update_layout(title='🎯 Confusion Matrix', xaxis_title='Predicted', yaxis_title='Actual')
fig.show()
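
In the matrix plotted above, rows are actual labels and columns are predicted labels, so each diagonal cell counts the correct predictions for one emotion. Here is a minimal sketch of how such a matrix is built and how per-class recall falls out of it; the toy labels and both helper functions are hypothetical.

```python
def confusion(y_true, y_pred, n_classes):
    """Rows = actual class, columns = predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_recall(cm):
    """Diagonal count divided by the row total (0.0 for empty rows)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

# Toy example with 3 classes (hypothetical labels, not model output).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion(y_true, y_pred, n_classes=3)
for row in cm:
    print(row)
print(per_class_recall(cm))
```

Large off-diagonal cells point at pairs of emotions the model tends to confuse.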

## 💾 Cell 6: Save the Model

In [None]:
# Save model and tokenizer
model.save_pretrained('./saved_emotion_model')
tokenizer.save_pretrained('./saved_emotion_model')
print('✅ Model saved to ./saved_emotion_model')

## 🌐 Cell 7: Build & Launch Streamlit Dashboard

In [None]:
# Write the Streamlit app to a file
app_code = '''
import streamlit as st
import torch
import plotly.graph_objects as go
import plotly.express as px
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch.nn.functional as F
import pandas as pd

# Page config
st.set_page_config(page_title="🎭 Emotion Detector", layout="wide")

# Emotion config (list order must match the model's label ids 0-5)
EMOTIONS = [
    {"name": "Sadness",  "emoji": "😢", "color": "#6495ED"},
    {"name": "Joy",      "emoji": "🤩", "color": "#FFD700"},
    {"name": "Love",     "emoji": "❤️",  "color": "#FF69B4"},
    {"name": "Anger",    "emoji": "😠", "color": "#FF4500"},
    {"name": "Fear",     "emoji": "😨", "color": "#9370DB"},
    {"name": "Surprise", "emoji": "😲", "color": "#32CD32"},
]

@st.cache_resource
def load_model():
    tokenizer = DistilBertTokenizerFast.from_pretrained("./saved_emotion_model")
    model = DistilBertForSequenceClassification.from_pretrained("./saved_emotion_model")
    model.eval()
    return tokenizer, model

def predict_emotion(text, tokenizer, model):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=-1).squeeze().tolist()
    pred_idx = probs.index(max(probs))
    return pred_idx, probs

# Load model
tokenizer, model = load_model()

# Header
st.title("🎭 Emotion Detection from Text")
st.markdown("**Beyond Positive/Negative: Detect the real human emotion behind any text!**")
st.divider()

# Input
col1, col2 = st.columns([2, 1])
with col1:
    user_input = st.text_area("✍️ Enter any text here:", height=150,
        placeholder="e.g. I just got my first job offer! I can't believe it!")
    detect_btn = st.button("🔍 Detect Emotion", use_container_width=True, type="primary")

# History tracker
if "history" not in st.session_state:
    st.session_state.history = []

if detect_btn and user_input.strip():
    pred_idx, probs = predict_emotion(user_input, tokenizer, model)
    emotion = EMOTIONS[pred_idx]

    # Store history
    st.session_state.history.append({
        "text": user_input[:60] + "...",
        "emotion": f"{emotion['emoji']} {emotion['name']}",
        "confidence": f"{max(probs)*100:.1f}%"
    })

    # Result card
    st.markdown(f"""
    <div style="background:{emotion['color']}22; border-left: 5px solid {emotion['color']};
    padding:20px; border-radius:10px; margin:10px 0">
        <h2>{emotion['emoji']} Detected Emotion: <b>{emotion['name']}</b></h2>
        <h4>Confidence: {max(probs)*100:.1f}%</h4>
    </div>
    """, unsafe_allow_html=True)

    st.divider()

    # Charts
    c1, c2 = st.columns(2)
    with c1:
        # Bar chart - all emotion probabilities
        labels = [f"{e['emoji']} {e['name']}" for e in EMOTIONS]
        colors = [e['color'] for e in EMOTIONS]
        fig_bar = go.Figure(go.Bar(
            x=labels, y=[p*100 for p in probs],
            marker_color=colors, text=[f"{p*100:.1f}%" for p in probs],
            textposition="outside"
        ))
        fig_bar.update_layout(title="📊 Emotion Probability Distribution",
            yaxis_title="Probability (%)", plot_bgcolor="white", showlegend=False)
        st.plotly_chart(fig_bar, use_container_width=True)

    with c2:
        # Gauge meter for top emotion
        fig_gauge = go.Figure(go.Indicator(
            mode="gauge+number+delta",
            value=max(probs)*100,
            title={"text": f"{emotion['emoji']} {emotion['name']} Confidence"},
            gauge={
                "axis": {"range": [0, 100]},
                "bar": {"color": emotion['color']},
                "steps": [
                    {"range": [0, 40], "color": "#f0f0f0"},
                    {"range": [40, 70], "color": "#e0e0e0"},
                    {"range": [70, 100], "color": "#d0d0d0"}
                ]
            }
        ))
        fig_gauge.update_layout(title="🎯 Confidence Meter")
        st.plotly_chart(fig_gauge, use_container_width=True)

# History table
if st.session_state.history:
    st.divider()
    st.subheader("🕐 Emotion History")
    history_df = pd.DataFrame(st.session_state.history)
    st.dataframe(history_df, use_container_width=True)

    # History emotion frequency
    freq = history_df["emotion"].value_counts().reset_index()
    freq.columns = ["Emotion", "Count"]
    fig_pie = px.pie(freq, names="Emotion", values="Count",
        title="🥧 Your Emotion History Breakdown",
        color_discrete_sequence=px.colors.qualitative.Bold)
    st.plotly_chart(fig_pie, use_container_width=True)
'''

# Write as UTF-8 so the emoji in the app source survive on any platform
with open('app.py', 'w', encoding='utf-8') as f:
    f.write(app_code)

print('✅ Streamlit app written to app.py')
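
Inside `app.py`, `predict_emotion` turns the model's raw logits into probabilities with a softmax and then picks the largest one. Here is a minimal pure-Python sketch of that step. The logit values are made up for illustration, and the label order is assumed to follow `label_map` from Cell 2.

```python
import math

# Label order assumed to match label_map in Cell 2 (ids 0-5).
LABELS = ['Sadness', 'Joy', 'Love', 'Anger', 'Fear', 'Surprise']

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input text (not real model output).
logits = [-1.2, 3.4, 0.8, -0.5, -2.0, 0.1]
probs = softmax(logits)
pred_idx = probs.index(max(probs))
print(LABELS[pred_idx], f'{probs[pred_idx] * 100:.1f}%')
```

The six probabilities sum to 1, which is why the dashboard can show them together as a percentage bar chart.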

In [None]:
# Launch the Streamlit app via pyngrok
from pyngrok import ngrok
import subprocess, time

# Kill any existing streamlit
!pkill -f streamlit 2>/dev/null
time.sleep(2)

# Start streamlit in background
process = subprocess.Popen(['streamlit', 'run', 'app.py', '--server.port=8501', '--server.headless=true'])
time.sleep(5)

# Create public tunnel
# Note: ngrok may require an auth token first, e.g. ngrok.set_auth_token('<YOUR_TOKEN>')
public_url = ngrok.connect(8501)
print(f'🌐 Your Dashboard is LIVE at: {public_url}')
print('👆 Click the link above to open your Emotion Detection Dashboard!')

## 🎉 You're Done!
### What you built today:
- ✅ Loaded a real NLP dataset from Hugging Face
- ✅ Fine-tuned DistilBERT for 6-class emotion detection
- ✅ Built a real-time interactive dashboard with Streamlit
- ✅ Visualized results with Plotly (bar chart, gauge meter, pie chart, history tracker)

### 📸 Next Steps:
1. **Record your dashboard** using Loom or OBS (free)
2. **Screenshot** the emotion charts
3. **Post on LinkedIn** and tag it with #NLP #DeepLearning #HuggingFace #AIML 🚀