# Bahar - Multilingual Emotion Classification Demo

This notebook demonstrates emotion classification with the taxonomy of the **GoEmotions** dataset.

**Supported Languages:** English, Dutch, Persian (and others)

**28 Emotion Categories:**
- **Positive (12):** admiration, amusement, approval, caring, desire, excitement, gratitude, joy, love, optimism, pride, relief
- **Negative (11):** anger, annoyance, disappointment, disapproval, disgust, embarrassment, fear, grief, nervousness, remorse, sadness
- **Ambiguous (4):** confusion, curiosity, realization, surprise
- **Neutral (1):** neutral
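
The grouping above can be written as a plain mapping and sanity-checked. This is a minimal sketch transcribed from the lists in this notebook. The names `TAXONOMY` and `GROUP_OF` are hypothetical and are not the package's `EMOTION_GROUPS` object, whose exact structure is not shown here.

```python
# Hypothetical stand-in for the taxonomy, transcribed from the lists above.
TAXONOMY: dict[str, list[str]] = {
    "positive": [
        "admiration", "amusement", "approval", "caring", "desire", "excitement",
        "gratitude", "joy", "love", "optimism", "pride", "relief",
    ],
    "negative": [
        "anger", "annoyance", "disappointment", "disapproval", "disgust",
        "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
    ],
    "ambiguous": ["confusion", "curiosity", "realization", "surprise"],
    "neutral": ["neutral"],
}

# Reverse lookup: emotion label -> sentiment group.
GROUP_OF = {
    emotion: group
    for group, emotions in TAXONOMY.items()
    for emotion in emotions
}

# Count the labels per group and overall.
group_sizes = {group: len(emotions) for group, emotions in TAXONOMY.items()}
total = sum(group_sizes.values())
print(group_sizes, total)
```

The reverse lookup is useful later when a single predicted label has to be mapped back to its sentiment group.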

**Reference:** [GoEmotions Research Blog](https://research.google/blog/goemotions-a-dataset-for-fine-grained-emotion-classification/)


## 1. Setup and Installation

First, import the required modules and confirm that the taxonomy loaded. The classifier itself is initialized in the next section.


In [1]:
# ============================================================================
# ALL IMPORTS - Consolidated and sorted
# ============================================================================

# Standard library imports
from collections import Counter

# Bahar package imports - Analyzers
from bahar.analyzers import EmotionAnalyzer, EnhancedAnalyzer
from bahar.analyzers.enhanced_analyzer import (
    export_to_academic_format,
    format_enhanced_output,
)

# Bahar package imports - GoEmotions dataset
from bahar.datasets.goemotions import (
    EMOTION_GROUPS,
    GOEMOTIONS_EMOTIONS,
    SAMPLE_TEXTS,
)
from bahar.datasets.goemotions.result import format_emotion_output
from bahar.datasets.goemotions.samples import get_samples_by_language

# Bahar package imports - Rich output utilities
from bahar.utils.rich_output import (
    console,
    print_header,
    print_info,
    print_section,
    print_success,
)

# Rich library imports
from rich.console import Group
from rich.panel import Panel
from rich.table import Table
from rich.text import Text

# ============================================================================
# Display initial information
# ============================================================================

print_success(f"Total emotions in GoEmotions taxonomy: {len(GOEMOTIONS_EMOTIONS)}")
print_info(f"Emotion groups: {', '.join(EMOTION_GROUPS)}")




## 2. Initialize the Classifier

Load the pre-trained GoEmotions model. The first run downloads roughly 400 MB of model weights.


In [2]:
# Initialize and load the classifier using EmotionAnalyzer
# Using English with GoEmotions model (default)
classifier = EmotionAnalyzer(language="english", model_key="goemotions")

print_info("Loading model... (this may take a minute on first run)")
classifier.load_model()
print_success("Model loaded successfully!")
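
GoEmotions is a multi-label task, so a common way to score it is an independent sigmoid per label followed by a top-k cut. The sketch below illustrates that idea in plain Python. It is an assumption that `EmotionAnalyzer` works this way internally; the helper names and the logits are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_labels(logits: dict[str, float]) -> dict[str, float]:
    """Score every label independently (multi-label, not softmax)."""
    return {label: sigmoid(value) for label, value in logits.items()}


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:k]


# Made-up logits, not real model output.
example_logits = {"excitement": 2.0, "joy": 0.0, "anger": -3.0}
scores = score_labels(example_logits)
print(top_k(scores, 2))
```

Because each label is scored on its own, the scores do not need to sum to 1, which is why several emotions can be reported with high confidence for the same text.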


## 3. View the Complete Emotion Taxonomy


In [3]:
# Display all emotions grouped by sentiment using Rich
table = Table(title="GoEmotions Taxonomy", show_header=True, header_style="bold cyan")
table.add_column("Group", style="yellow", width=15)
table.add_column("Count", style="magenta", justify="right", width=8)
table.add_column("Emotions", style="white")

for group, emotions in EMOTION_GROUPS.items():
    emotion_list = ', '.join(emotions)
    # Color code by sentiment
    if group == "positive":
        style = "green"
    elif group == "negative":
        style = "red"
    elif group == "ambiguous":
        style = "yellow"
    else:
        style = "white"

    table.add_row(
        f"[{style}]{group.upper()}[/{style}]",
        str(len(emotions)),
        emotion_list
    )

console.print(table)


## 4. Single Text Classification

Let's classify a single text and see the results.


In [4]:
# Classify a single text
text = "I'm so excited about this amazing opportunity! This is going to be great!"

result = classifier.analyze(text, top_k=3)

# Use Rich formatting (use_rich=True by default)
format_emotion_output(result, use_rich=True)



## 5. Access Raw Prediction Data


In [5]:
# Get top emotions and sentiment
top_emotions = result.get_top_emotions()
sentiment_group = result.get_sentiment_group()

# Display using Rich tables
# Top 3 emotions table
table1 = Table(title="Top 3 Emotions", show_header=True, header_style="bold green")
table1.add_column("Emotion", style="cyan")
table1.add_column("Score", style="magenta", justify="right")
table1.add_column("Percentage", style="yellow", justify="right")
table1.add_column("Confidence Bar", style="green")

for emotion, score in top_emotions:
    bar_length = int(score * 30)
    bar = "█" * bar_length + "░" * (30 - bar_length)
    table1.add_row(emotion, f"{score:.4f}", f"{score*100:.2f}%", bar)

console.print(table1)

# Sentiment group
sentiment_colors = {"positive": "green", "negative": "red", "ambiguous": "yellow", "neutral": "white"}
color = sentiment_colors.get(sentiment_group, "white")
console.print(f"\n[bold]Sentiment Group:[/bold] [{color}]{sentiment_group.upper()}[/{color}]\n")

# All emotion scores (top 10)
table2 = Table(title="All Emotion Scores (Top 10)", show_header=True, header_style="bold blue")
table2.add_column("Rank", style="dim", width=6)
table2.add_column("Emotion", style="cyan")
table2.add_column("Score", style="magenta", justify="right")

sorted_emotions = sorted(result.emotions.items(), key=lambda x: x[1], reverse=True)
for rank, (emotion, score) in enumerate(sorted_emotions[:10], 1):
    table2.add_row(str(rank), emotion, f"{score:.4f}")

console.print(table2)
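
One plausible way to derive a sentiment group from raw scores is to take the group of the single highest-scoring emotion. The sketch below shows that rule. It is an assumption, not a description of how `get_sentiment_group()` is implemented, and `sentiment_group_of` plus the excerpted mapping are hypothetical.

```python
# Excerpt of an emotion -> group mapping, following the taxonomy in this notebook.
GROUP_OF = {
    "excitement": "positive",
    "joy": "positive",
    "anger": "negative",
    "surprise": "ambiguous",
    "neutral": "neutral",
}


def sentiment_group_of(scores: dict[str, float]) -> str:
    """Return the group of the highest-scoring emotion.

    Labels missing from GROUP_OF fall back to "neutral" (an assumption).
    """
    top_emotion = max(scores, key=scores.get)
    return GROUP_OF.get(top_emotion, "neutral")


# Made-up scores, not real model output.
made_up_scores = {"excitement": 0.81, "joy": 0.42, "surprise": 0.10}
print(sentiment_group_of(made_up_scores))
```

An alternative design would sum the scores per group and pick the largest total, which is less sensitive to a single dominant label.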


## 6. English Text Examples


In [6]:
# Process English samples
english_samples = get_samples_by_language("english")

print_header("ENGLISH SAMPLES", f"Analyzing {len(english_samples)} English texts")

for idx, sample in enumerate(english_samples, 1):
    console.print(f"\n[bold cyan][Sample {idx}][/bold cyan]")
    result = classifier.analyze(sample["text"], top_k=3)
    format_emotion_output(result, use_rich=True)
    console.print(f"[dim]Expected emotion: {sample['expected_emotion']}[/dim]")


## 7. Dutch Text Examples

Test the classifier with Dutch language samples.


In [7]:
# Process Dutch samples
dutch_samples = get_samples_by_language("dutch")

print_header("DUTCH SAMPLES", f"Analyzing {len(dutch_samples)} Dutch texts")

for idx, sample in enumerate(dutch_samples, 1):
    console.print(f"\n[bold cyan][Sample {idx}][/bold cyan]")
    result = classifier.analyze(sample["text"], top_k=3)
    format_emotion_output(result, use_rich=True)
    console.print(f"[dim]Translation: {sample['translation']}[/dim]")
    console.print(f"[dim]Expected emotion: {sample['expected_emotion']}[/dim]")


## 8. Persian Text Examples

Test the classifier with Persian (Farsi) language samples.


In [8]:
# Process Persian samples
persian_samples = get_samples_by_language("persian")

print_header("PERSIAN SAMPLES", f"Analyzing {len(persian_samples)} Persian texts")

for idx, sample in enumerate(persian_samples, 1):
    console.print(f"\n[bold cyan][Sample {idx}][/bold cyan]")
    result = classifier.analyze(sample["text"], top_k=3)
    format_emotion_output(result, use_rich=True)
    console.print(f"[dim]Translation: {sample['translation']}[/dim]")
    console.print(f"[dim]Expected emotion: {sample['expected_emotion']}[/dim]")


## 9. Interactive: Classify Your Own Text

Try classifying your own text in any language!


In [9]:
# Enter your own text here
custom_text = "I can't believe this happened! What a surprise!"

# Classify with top 5 emotions
result = classifier.analyze(custom_text, top_k=5)
format_emotion_output(result, use_rich=True)



## 10. Batch Classification

Classify multiple texts at once.


In [10]:
# Batch classification example
texts = [
    "Thank you so much for your help!",
    "This is absolutely disgusting.",
    "I'm confused about what happened.",
    "Wow, I didn't expect that!",
]

results = classifier.analyze_batch(texts, top_k=3)

print_header("Batch Classification Results", f"Analyzing {len(texts)} texts")

for idx, result in enumerate(results, 1):
    console.print(f"\n[bold cyan][Text {idx}][/bold cyan]")
    format_emotion_output(result, use_rich=True)
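
For larger corpora it can help to feed texts in fixed-size chunks so that memory use stays bounded. Whether `analyze_batch` already does this internally is not shown here, so the sketch below only covers the chunking step. `chunked` is a hypothetical helper.

```python
from typing import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder texts for illustration.
example_texts = ["a", "b", "c", "d", "e"]
batches = list(chunked(example_texts, 2))
print(batches)
```

Each chunk could then be passed to `classifier.analyze_batch(...)` in turn and the results concatenated.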


## 11. Visualize Emotion Distribution

Create a simple visualization of emotion scores.


In [11]:
# Visualize emotion distribution for a text
text = "I'm so proud of what we accomplished together!"
result = classifier.analyze(text, top_k=10)

# Display with Rich
console.print(Panel(text, title="Text", border_style="cyan"))

# Create Rich table for top 10 emotions
table = Table(title="Top 10 Emotion Scores", show_header=True, header_style="bold green")
table.add_column("Emotion", style="cyan", width=15)
table.add_column("Confidence Bar", style="green", width=60)
table.add_column("Score", style="magenta", justify="right")

for emotion, score in result.get_top_emotions()[:10]:
    # Create a visual bar chart
    bar_length = int(score * 60)
    bar = "█" * bar_length + "░" * (60 - bar_length)
    percentage = score * 100
    table.add_row(emotion, bar, f"{percentage:5.2f}%")

console.print(table)
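
The bar-drawing logic recurs in several cells with different widths. A small helper keeps it in one place and guards against scores outside the 0 to 1 range. `render_bar` is a hypothetical name, not part of the package.

```python
def render_bar(score: float, width: int = 30) -> str:
    """Render a score in [0, 1] as a fixed-width block bar."""
    clamped = min(max(score, 0.0), 1.0)  # keep the bar inside its width
    filled = int(clamped * width)
    return "█" * filled + "░" * (width - filled)


for value in (0.0, 0.5, 1.0):
    print(f"{value:.1f} {render_bar(value, 10)}")
```

The same function could replace the inline `bar_length` arithmetic in the table-building cells, with `width` set to 25, 30, or 60 as needed.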


## 12. Compare Emotions Across Languages

Compare how similar texts in different languages are classified.


In [12]:
# Similar texts in different languages
multilingual_texts = [
    {"lang": "English", "text": "I'm so happy about this great news!"},
    {"lang": "Dutch", "text": "Ik ben zo blij met dit geweldige nieuws!"},
    {"lang": "Persian", "text": "من از این خبر عالی خیلی خوشحالم!"},
]

print_header("Comparing Similar Texts Across Languages", "Same meaning, different languages")

# Analyze all texts first
results = []
for item in multilingual_texts:
    result = classifier.analyze(item["text"], top_k=3)
    results.append((item, result))

# Create comparison table
comparison_table = Table(
    title="Multilingual Emotion Analysis Comparison",
    show_header=True,
    header_style="bold cyan",
    border_style="blue"
)
comparison_table.add_column("Language", style="yellow", width=12)
comparison_table.add_column("Text", style="white", width=40)
comparison_table.add_column("Sentiment", style="magenta", width=12, justify="center")
comparison_table.add_column("Top Emotion", style="green", width=15)
comparison_table.add_column("Score", style="cyan", justify="right", width=8)

for item, result in results:
    sentiment = result.get_sentiment_group()
    top_emotion, top_score = result.get_top_emotions()[0]

    # Color-coded sentiment
    sentiment_colors = {"positive": "green", "negative": "red", "ambiguous": "yellow", "neutral": "white"}
    color = sentiment_colors.get(sentiment, "white")
    sentiment_display = f"[{color}]{sentiment.upper()}[/{color}]"

    comparison_table.add_row(
        item["lang"],
        item["text"],
        sentiment_display,
        top_emotion,
        f"{top_score:.3f}"
    )

console.print(comparison_table)

# Detailed breakdown for each language
console.print()
for item, result in results:
    # Create panel for each language
    sentiment = result.get_sentiment_group()
    sentiment_colors = {"positive": "green", "negative": "red", "ambiguous": "yellow", "neutral": "white"}
    color = sentiment_colors.get(sentiment, "white")

    # Create emotion breakdown table
    emotion_table = Table(show_header=False, box=None, padding=(0, 1))
    emotion_table.add_column("Emotion", style="cyan", width=15)
    emotion_table.add_column("Bar", style="green", width=25)
    emotion_table.add_column("Score", style="magenta", justify="right", width=8)

    for emotion, score in result.get_top_emotions():
        bar_length = int(score * 25)
        bar = "█" * bar_length + "░" * (25 - bar_length)
        emotion_table.add_row(emotion, bar, f"{score:.3f}")

    # Create panel with all info
    panel_content = Text()
    panel_content.append("Text: ", style="bold")
    panel_content.append(f"{item['text']}\n\n", style="italic")
    panel_content.append("Sentiment: ", style="bold")
    panel_content.append(f"{sentiment.upper()}\n\n", style=color)

    panel_group = Group(panel_content, emotion_table)

    panel = Panel(
        panel_group,
        title=f"[bold cyan]{item['lang']}[/bold cyan]",
        border_style=color,
        expand=False
    )
    console.print(panel)
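
To put a number on how closely the languages agree, one option is the Jaccard overlap between the top-k label sets of each language pair. The sketch below uses made-up labels rather than real model output, and `jaccard` is a hypothetical helper.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Made-up top-3 labels per language, for illustration only.
top_labels = {
    "English": ["joy", "excitement", "gratitude"],
    "Dutch": ["joy", "excitement", "neutral"],
    "Persian": ["neutral", "joy", "approval"],
}

# Compare every pair of languages once.
overlaps = {
    (left, right): jaccard(set(top_labels[left]), set(top_labels[right]))
    for left, right in combinations(top_labels, 2)
}

for (left, right), value in overlaps.items():
    print(f"{left} vs {right}: {value:.2f}")
```

A value of 1.0 means both languages produced the same label set, and 0.0 means they share no labels.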


## 13. Emotion Statistics

Analyze emotion distribution across multiple texts.


In [13]:
# Collect all sample texts
all_texts = []
for lang in ["english", "dutch", "persian"]:
    samples = get_samples_by_language(lang)
    all_texts.extend([s["text"] for s in samples])

# Classify all texts and count top emotions
top_emotion_counts = Counter()
sentiment_counts = Counter()

for text in all_texts:
    result = classifier.analyze(text, top_k=1)
    top_emotion = result.get_top_emotions()[0][0]
    sentiment = result.get_sentiment_group()

    top_emotion_counts[top_emotion] += 1
    sentiment_counts[sentiment] += 1

print_header("Emotion Statistics Across All Sample Texts", f"Analyzed {len(all_texts)} texts")

print_info(f"Total texts analyzed: {len(all_texts)}")

# Top Emotions table
table1 = Table(title="Top Emotions Detected", show_header=True, header_style="bold cyan")
table1.add_column("Emotion", style="yellow")
table1.add_column("Count", style="magenta", justify="right")
table1.add_column("Percentage", style="green", justify="right")
table1.add_column("Bar", style="cyan")

for emotion, count in top_emotion_counts.most_common():
    percentage = (count / len(all_texts)) * 100
    bar_length = int(percentage / 100 * 30)
    bar = "█" * bar_length + "░" * (30 - bar_length)
    table1.add_row(emotion, str(count), f"{percentage:5.1f}%", bar)

console.print(table1)

# Sentiment Distribution table
table2 = Table(title="Sentiment Distribution", show_header=True, header_style="bold cyan")
table2.add_column("Sentiment", style="yellow")
table2.add_column("Count", style="magenta", justify="right")
table2.add_column("Percentage", style="green", justify="right")
table2.add_column("Bar", style="cyan")

for sentiment, count in sentiment_counts.most_common():
    percentage = (count / len(all_texts)) * 100
    bar_length = int(percentage / 100 * 30)
    bar = "█" * bar_length + "░" * (30 - bar_length)
    # Color code by sentiment
    if sentiment == "positive":
        sentiment_display = f"[green]{sentiment.upper()}[/green]"
    elif sentiment == "negative":
        sentiment_display = f"[red]{sentiment.upper()}[/red]"
    elif sentiment == "ambiguous":
        sentiment_display = f"[yellow]{sentiment.upper()}[/yellow]"
    else:
        sentiment_display = sentiment.upper()
    table2.add_row(sentiment_display, str(count), f"{percentage:5.1f}%", bar)

console.print(table2)
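
Since each sample carries an `expected_emotion` field, the counts above could be complemented by a top-1 agreement rate. The sketch below assumes the expected and predicted labels have already been collected into pairs. The pairs shown are made up, and `top1_accuracy` is a hypothetical helper.

```python
from collections import Counter


def top1_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (expected, predicted) pairs whose labels match exactly."""
    if not pairs:
        return 0.0
    hits = sum(1 for expected, predicted in pairs if expected == predicted)
    return hits / len(pairs)


# Made-up (expected, predicted) pairs, not real results.
pairs = [
    ("joy", "joy"),
    ("anger", "annoyance"),
    ("gratitude", "gratitude"),
    ("fear", "fear"),
]

accuracy = top1_accuracy(pairs)
# Count which mismatches occur, keyed by (expected, predicted).
confusions = Counter((e, p) for e, p in pairs if e != p)

print(f"Top-1 agreement: {accuracy:.0%}")
print(confusions.most_common())
```

Exact matching is strict for a fine-grained taxonomy. Near misses such as anger versus annoyance count as errors, so the mismatch counts are often more informative than the single rate.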


## 14. Custom Examples - Try Your Own!

Use this cell to experiment with your own texts.


In [14]:
# Add your own texts here
my_texts = [
    "Your text here",
    # Add more texts...
]

for text in my_texts:
    if text != "Your text here":  # Skip placeholder
        result = classifier.analyze(text, top_k=3)
        format_emotion_output(result, use_rich=True)
        console.print()


## 15. Enhanced Analysis: Emotion + Linguistics

Now let's explore the enhanced classifier that combines emotion detection with linguistic analysis for academic research.


In [15]:
# Initialize enhanced analyzer
# Using English with GoEmotions model (default)
enhanced_classifier = EnhancedAnalyzer(language="english", model_key="goemotions")
enhanced_classifier.load_model()

print_success("Enhanced analyzer loaded!")


## 16. Linguistic Dimensions

The enhanced classifier analyzes four key linguistic dimensions:

1. **Formality**: formal, colloquial, neutral
2. **Tone**: friendly, rough, serious, kind, neutral
3. **Intensity**: high, medium, low emotional intensity
4. **Communication Style**: direct, indirect, assertive, passive
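
The four dimensions and their values can be captured in a small validated record. This is a sketch built from the list above. `LinguisticProfile` and `ALLOWED` are hypothetical names, and the package's own result type may be structured differently.

```python
from dataclasses import dataclass

# Allowed values per dimension, taken from the list above.
ALLOWED = {
    "formality": {"formal", "colloquial", "neutral"},
    "tone": {"friendly", "rough", "serious", "kind", "neutral"},
    "intensity": {"high", "medium", "low"},
    "communication_style": {"direct", "indirect", "assertive", "passive"},
}


@dataclass(frozen=True)
class LinguisticProfile:
    formality: str
    tone: str
    intensity: str
    communication_style: str

    def __post_init__(self) -> None:
        # Reject any value that is not listed for its dimension.
        for field_name, allowed in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(
                    f"{field_name}={value!r} is not one of {sorted(allowed)}"
                )


profile = LinguisticProfile("formal", "serious", "low", "direct")
print(profile)
```

Validating at construction time makes typos in hand-labelled data fail early instead of silently creating a new category.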


In [16]:
# Example 1: Formal, serious text
text1 = "I hereby formally request your assistance with this matter. Your prompt attention would be greatly appreciated."

result1 = enhanced_classifier.analyze(text1, top_k=3)
format_enhanced_output(result1, use_rich=True)



In [17]:
# Example 2: Colloquial, friendly text
text2 = "Hey! Thanks so much for helping me out, you're awesome! Really appreciate it!"

result2 = enhanced_classifier.analyze(text2, top_k=3)
format_enhanced_output(result2, use_rich=True)



In [18]:
# Example 3: High intensity, rough tone
text3 = "This is absolutely unacceptable! I demand an explanation immediately!"

result3 = enhanced_classifier.analyze(text3, top_k=3)
format_enhanced_output(result3, use_rich=True)



In [19]:
# Example 4: Passive, kind tone
text4 = "I'm terribly sorry to bother you, but if possible, could you perhaps help me?"

result4 = enhanced_classifier.analyze(text4, top_k=3)
format_enhanced_output(result4, use_rich=True)



## 17. Academic Export Format

Export analysis results in a structured format suitable for research and data analysis.


In [20]:
# Export analysis to academic format
academic_data = export_to_academic_format(result1)

print_header("Academic Export Format", "Structured data for research")

# Display as Rich table
table = Table(show_header=True, header_style="bold cyan")
table.add_column("Field", style="yellow", width=30)
table.add_column("Value", style="white")

for key, value in academic_data.items():
    if isinstance(value, float):
        table.add_row(key, f"{value:.4f}")
    else:
        table.add_row(key, str(value))

console.print(table)
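
If the exported record is a flat dictionary, as the loop above suggests, several records can be written to CSV with the standard library. The field names below are made up for illustration and are not the actual keys returned by `export_to_academic_format`. `rows_to_csv` is a hypothetical helper.

```python
import csv
import io


def rows_to_csv(rows: list[dict[str, object]]) -> str:
    """Serialize flat dictionaries to CSV text, using the first row's keys as header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records; real keys depend on the export function.
rows = [
    {"text": "Thanks a lot!", "primary_emotion": "gratitude", "primary_score": 0.93},
    {"text": "This is unacceptable.", "primary_emotion": "anger", "primary_score": 0.71},
]

csv_text = rows_to_csv(rows)
print(csv_text)
```

Writing to an in-memory buffer keeps the example free of file paths. For real use, the same writer can be pointed at a file opened with `newline=""`.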


## 18. Comparative Analysis: Formal vs. Colloquial

Compare how formality affects emotion detection.


In [21]:
# Same sentiment, different formality
formal_text = "I am extremely grateful for your assistance in this matter."
colloquial_text = "Thanks so much! You're awesome, really appreciate it!"

formal_result = enhanced_classifier.analyze(formal_text, top_k=3)
colloquial_result = enhanced_classifier.analyze(colloquial_text, top_k=3)

print_section("FORMAL VERSION")
format_enhanced_output(formal_result, use_rich=True)

console.print()
print_section("COLLOQUIAL VERSION")
format_enhanced_output(colloquial_result, use_rich=True)
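
Beyond reading the two outputs side by side, the per-label score differences can be computed directly. The sketch below works on plain score dictionaries with made-up values. How scores are exposed on the enhanced result object is not shown here, so `score_deltas` and its inputs are hypothetical.

```python
def score_deltas(
    baseline: dict[str, float], other: dict[str, float]
) -> dict[str, float]:
    """Return other minus baseline for every label seen in either input.

    Labels missing from one side are treated as 0.0 (an assumption).
    """
    labels = sorted(set(baseline) | set(other))
    return {
        label: round(other.get(label, 0.0) - baseline.get(label, 0.0), 4)
        for label in labels
    }


# Made-up scores, not real model output.
formal_scores = {"gratitude": 0.90, "admiration": 0.20}
colloquial_scores = {"gratitude": 0.85, "admiration": 0.40, "excitement": 0.30}

deltas = score_deltas(formal_scores, colloquial_scores)
for label, delta in deltas.items():
    print(f"{label:<12} {delta:+.2f}")
```

Positive values mean the colloquial phrasing scored higher for that label, and negative values mean the formal phrasing did.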



## 19. Summary and Next Steps

### What We've Learned:

1. **GoEmotions Taxonomy**: 28 fine-grained emotion categories
2. **Multilingual Support**: Works with English, Dutch, Persian, and more
3. **Linguistic Dimensions**: Formality, tone, intensity, communication style
4. **Academic Applications**: Export-ready format for research
5. **Comprehensive Analysis**: Emotion + linguistics in one tool

### Academic Research Applications:

- Sentiment analysis with linguistic context
- Formality detection in multilingual corpora
- Tone and style analysis for discourse studies
- Emotion intensity measurement
- Communication pattern identification

### Next Steps:

- Fine-tune a multilingual model for better non-English performance
- Integrate with your own applications
- Analyze emotion patterns in your text data
- Build emotion-aware chatbots or content moderation systems
- Conduct linguistic research with structured data export

### Resources:

- [GoEmotions Research Blog](https://research.google/blog/goemotions-a-dataset-for-fine-grained-emotion-classification/)
- [GoEmotions GitHub](https://github.com/google-research/google-research/tree/master/goemotions)
- [HuggingFace Model](https://huggingface.co/monologg/bert-base-cased-goemotions-original)

### Project Files:

- `emotion_classifier.py` - Core emotion classification implementation
- `linguistic_analyzer.py` - Linguistic dimension analyzer
- `enhanced_classifier.py` - Combined emotion + linguistics
- `sample_texts.py` - Sample texts in 3 languages
- `main.py` - Basic demo
- `demo_enhanced.py` - Enhanced demo
- `classify_text.py` - Basic CLI utility
- `classify_enhanced.py` - Enhanced CLI utility
- `README.md` - Full documentation
