# Teacher Annotation Results - Analysis

Explore and analyze reasoning traces generated by the **Qwen2.5-72B-Instruct** teacher model.

**Dataset**: Conversational COVID-19 tweets (news headlines already filtered out)

**Five-Task Reasoning Chain**:
- **R1**: Syntactic Parsing (dependency analysis)
- **R2**: Aspect Extraction (COVID-19 related aspects)
- **R3**: Opinion Extraction (opinion expressions)
- **R4**: Sentiment Classification (positive/negative/neutral)
- **R5**: Emotion Classification (fear, anger, joy, etc.)

## Setup

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown
from pathlib import Path
import glob

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
pd.set_option('display.max_colwidth', 150)

## Load Annotated Data

In [None]:
# Find conversational annotation files (the largest one is loaded below)
annotation_files = glob.glob("data/COVIDSenti/COVIDSenti_conversational_annotated_*.csv")

if not annotation_files:
    print("[ERROR] No annotation files found!")
    print("\nRun annotation first:")
    print("  sbatch scripts/run_annotation.sh")
else:
    # Sort by file size, largest first; the largest file (most samples) is used as a proxy for the latest run
    annotation_files.sort(key=lambda x: Path(x).stat().st_size, reverse=True)
    file_path = annotation_files[0]
    
    df = pd.read_csv(file_path)
    print(f"Loaded: {file_path}")
    print(f"\nConversational tweets annotated: {len(df):,}")
    print(f"Columns: {list(df.columns)}")
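
The cell above treats the largest file as the latest one, which only holds if each run produces more samples than the previous one. As a minimal sketch of an alternative, the helper below picks the most recently modified match instead. It assumes file modification times reflect when each annotation run finished; `newest_file` is a hypothetical helper, not part of the project scripts.

```python
from pathlib import Path


def newest_file(pattern, root="."):
    """Return the most recently modified file matching pattern, or None."""
    # Path.glob accepts relative patterns that include directory parts
    candidates = [p for p in Path(root).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # st_mtime is the last-modification timestamp in seconds
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, `newest_file("data/COVIDSenti/COVIDSenti_conversational_annotated_*.csv")` could replace the size-based sort, with a `None` result handled the same way as the empty-list case.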

## Dataset Overview

In [None]:
print("=" * 60)
print("DATASET SUMMARY")
print("=" * 60)
print(f"Total samples: {len(df):,}")
print(f"\nSentiment distribution:")
print(df['label'].value_counts())
print(f"\nSample tweet:")
print(f"  {df.iloc[0]['tweet'][:100]}...")
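
The later cells rely on `tweet`, `label`, and the five reasoning columns being present. A small sketch of an early check follows; the column names are taken from the cells in this notebook, while `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names introduced only for illustration.

```python
# Column names used by the cells in this notebook
REQUIRED_COLUMNS = [
    "tweet", "label",
    "r1_syntactic", "r2_aspects", "r3_opinion", "r4_sentiment", "r5_emotion",
]


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names absent from columns, in order."""
    present = set(columns)
    return [name for name in required if name not in present]
```

Calling `missing_columns(df.columns)` should return an empty list for a complete annotation file; any names it returns point to reasoning steps that were not written out.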

## Sentiment Distribution

In [None]:
# Prepare visualization
label_counts = df['label'].value_counts()
label_names = {'neu': 'Neutral', 'neg': 'Negative', 'pos': 'Positive'}
label_colors = {'neu': '#95a5a6', 'neg': '#e74c3c', 'pos': '#2ecc71'}

display_labels = [label_names.get(lbl, lbl) for lbl in label_counts.index]
colors = [label_colors.get(lbl, '#3498db') for lbl in label_counts.index]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart
label_counts.plot(kind='bar', ax=ax1, color=colors)
ax1.set_title('Sentiment Distribution (Conversational Tweets)', fontsize=14, fontweight='bold')
ax1.set_xlabel('Sentiment', fontsize=12)
ax1.set_ylabel('Count', fontsize=12)
ax1.set_xticklabels(display_labels, rotation=0)
ax1.grid(axis='y', alpha=0.3)

# Pie chart
label_counts.plot(kind='pie', ax=ax2, autopct='%1.1f%%', colors=colors, 
                   labels=display_labels, startangle=90)
ax2.set_ylabel('')
ax2.set_title('Proportion', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

## Browse Annotations

Explore individual reasoning traces for each tweet.

In [None]:
def display_annotation(idx):
    """Display a formatted annotation for a given index."""
    if not 0 <= idx < len(df):
        print(f"Index {idx} out of range (max: {len(df)-1})")
        return
    
    row = df.iloc[idx]
    
    output = f"""
# Example {idx + 1} / {len(df)}

## Tweet
**"{row['tweet']}"**

**Ground Truth Label:** `{row['label'].upper()}`

---

## R1: Syntactic Parsing
{row['r1_syntactic']}

---

## R2: Aspect Extraction
{row['r2_aspects']}

---

## R3: Opinion Extraction
{row['r3_opinion']}

---

## R4: Sentiment Classification
{row['r4_sentiment']}

---

## R5: Emotion Classification
{row['r5_emotion']}
"""
    display(Markdown(output))

# Display first example
display_annotation(0)

## Random Example

In [None]:
import random
random_idx = random.randint(0, len(df)-1)
print(f"Random example (index {random_idx}):")
display_annotation(random_idx)

## Filter by Sentiment

Browse examples by specific sentiment.

In [None]:
# Select sentiment to explore
sentiment = 'neg'  # Change to 'pos', 'neg', or 'neu'

sentiment_df = df[df['label'] == sentiment]
print(f"Found {len(sentiment_df):,} tweets with '{sentiment.upper()}' sentiment")

if len(sentiment_df) > 0:
    print("Showing first example:\n")
    # display_annotation() indexes by position (iloc), so convert the index label to a position
    original_idx = df.index.get_loc(sentiment_df.index[0])
    display_annotation(original_idx)
else:
    print(f"No examples found with sentiment '{sentiment}'")

## Reasoning Trace Statistics

Analyze the length and characteristics of reasoning traces.

In [None]:
reasoning_cols = ['r1_syntactic', 'r2_aspects', 'r3_opinion', 'r4_sentiment', 'r5_emotion']
task_names = ['R1: Syntactic', 'R2: Aspects', 'R3: Opinion', 'R4: Sentiment', 'R5: Emotion']

# Calculate trace lengths
lengths = []
for col in reasoning_cols:
    df[f'{col}_len'] = df[col].str.len()
    lengths.append(df[f'{col}_len'].mean())

print("Average reasoning trace lengths:")
print("=" * 50)
for name, length in zip(task_names, lengths):
    print(f"  {name:20s}: {length:6.0f} characters")

# Visualize
fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(task_names, lengths, color=['#3498db', '#e74c3c', '#2ecc71', '#f39c12', '#9b59b6'])
ax.set_title('Average Reasoning Trace Length by Task', fontsize=14, fontweight='bold')
ax.set_xlabel('Task', fontsize=12)
ax.set_ylabel('Characters', fontsize=12)
ax.grid(axis='y', alpha=0.3)
plt.xticks(rotation=15, ha='right')
plt.tight_layout()
plt.show()
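
Note that `.str.len()` yields `NaN` for missing values and `.mean()` skips them, so the averages above can hide empty traces. A minimal sketch for counting them is shown below; `is_blank` and `count_blank` are hypothetical helpers, and treating whitespace-only strings as blank is an assumption about what counts as a failed annotation.

```python
import math


def is_blank(value):
    """True for None, NaN, or a string containing only whitespace."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def count_blank(values):
    """Count blank entries in an iterable of trace values."""
    return sum(1 for v in values if is_blank(v))
```

With the notebook's variables, `{col: count_blank(df[col]) for col in reasoning_cols}` gives a per-task count of missing or empty traces.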

## Export Samples

Export a subset for manual inspection or presentation.

In [None]:
# Export 10 random examples for inspection
sample_df = df.sample(n=min(10, len(df)), random_state=42)
output_path = "data/COVIDSenti/sample_annotations_for_review.csv"
sample_df.to_csv(output_path, index=False)
print(f"Exported {len(sample_df)} sample annotations to:")
print(f"  {output_path}")

## Next Steps

### Generate More Annotations

To annotate more conversational tweets:

1. **Edit configuration** in `scripts/annotate.py`:
   ```python
   N_SAMPLES = 5000  # Your desired batch size
   ```

2. **Run pre-flight check**:
   ```bash
   python3 scripts/preflight_check.py
   ```

3. **Submit job**:
   ```bash
   sbatch scripts/run_annotation.sh
   ```

4. **Monitor**:
   ```bash
   squeue -u $USER
   tail -f annotation_conversational_*.log
   ```

### Next Phase: Student Model Training

Once you have sufficient annotations (recommended: 5,000+), proceed to:
- Train a LLaMA-3-8B student model on these reasoning traces
- Use LoRA for parameter-efficient fine-tuning
- Evaluate on a held-out test set

See the `modeling/` directory for training scripts.
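
As a rough illustration of the held-out evaluation step, the sketch below splits row positions into train and test sets while keeping the sentiment proportions of `label`. It is only an assumption about how a split could be done: `stratified_split`, the 20% test fraction, and the seed are illustrative, and the actual split used by the training scripts may differ.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split positions into train/test lists, keeping label proportions."""
    # Group row positions by their label
    by_label = defaultdict(list)
    for position, label in enumerate(labels):
        by_label[label].append(position)

    rng = random.Random(seed)  # local generator for a reproducible split
    train, test = [], []
    for label in sorted(by_label):
        positions = by_label[label]
        rng.shuffle(positions)
        n_test = round(len(positions) * test_fraction)
        test.extend(positions[:n_test])
        train.extend(positions[n_test:])
    return sorted(train), sorted(test)
```

With the loaded data this could be used as `train_idx, test_idx = stratified_split(df['label'].tolist())`, followed by `df.iloc[train_idx]` and `df.iloc[test_idx]`.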