# QWEN-3 Multi-Task Embeddings Tutorial

This notebook is a **tutorial** demonstrating how task-specific instructions affect embedding models.

## What This Tutorial Shows

We explore how the same 800 documents from the 20 Newsgroups dataset are embedded using QWEN-3-Embedding-0.6B with **four different task instructions**:

1. **Default** - No instruction (general-purpose embeddings)
2. **Sentiment** - Optimized for sentiment classification
3. **Topic** - Optimized for topic identification  
4. **Toxicity** - Optimized for toxicity detection

The data has been pre-generated using `scripts/generate_data.py` and is ready to visualize.

## Live Demo

👉 **[View the interactive web demo](https://yourusername.github.io/newsgroups-qwen-embed/)**

The web app lets you switch between tasks and see how the embedding space changes.

In [None]:
import json
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## 1. Understanding the Data

The data was generated using the following process:

1. **Load Data**: 800 documents from 10 newsgroup categories (80 each)
2. **Embed with Tasks**: Each document embedded 4 times with different instructions
3. **Reduce Dimensions**: UMAP applied to each embedding set (1024D → 2D)
4. **Save as JSON**: All coordinates saved to `docs/data.json`
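The last step can be pictured as flattening the per-task coordinates into one record per document. The sketch below is an assumption about the layout, not the contents of `scripts/generate_data.py`: the field names are taken from the columns this notebook reads later (`category`, `text_preview`, `{task}_x`, `{task}_y`), while `build_records` and the toy coordinates are hypothetical.

```python
import json

TASKS = ["default", "sentiment", "topic", "toxicity"]

def build_records(categories, previews, coords_by_task):
    """Flatten per-task 2D coordinates into one dict per document.

    coords_by_task maps a task name to a list of (x, y) pairs,
    one pair per document, in the same order as categories/previews.
    """
    records = []
    for i, (category, preview) in enumerate(zip(categories, previews)):
        record = {"category": category, "text_preview": preview}
        for task in TASKS:
            x, y = coords_by_task[task][i]
            record[f"{task}_x"] = float(x)
            record[f"{task}_y"] = float(y)
        records.append(record)
    return records

# Toy input: two documents with made-up coordinates (not real UMAP output)
toy_coords = {task: [(0.0, 1.0), (2.0, 3.0)] for task in TASKS}
records = build_records(
    ["sci.space", "rec.autos"],
    ["preview one", "preview two"],
    toy_coords,
)
print(json.dumps(records[0], indent=2))
```

A list of flat records like this loads directly into `pd.DataFrame(...)`, which is how the notebook consumes `docs/data.json`.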

### The Four Task Instructions

| Task | Instruction |
|------|------------|
| **Default** | *(none)* - General-purpose embeddings |
| **Sentiment** | "Classify the sentiment of the given text as positive, negative, or neutral" |
| **Topic** | "Identify the topic or theme of the given text" |
| **Toxicity** | "Classify the given text as either toxic or not toxic" |

### The Categories

The 10 newsgroup categories cover diverse topics:

- **Religious/Political**: alt.atheism, talk.religion.misc, talk.politics.guns, talk.politics.mideast
- **Scientific/Technical**: sci.med, sci.space, sci.crypt, comp.graphics  
- **Recreation**: rec.sport.baseball, rec.autos
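Drawing the same number of documents from each category keeps the plots balanced. The following is a minimal sketch of that kind of per-category sampling using only the standard library. The function name, the `(category, text)` input shape, and the seed are illustrative assumptions; the actual sampling code in the generation script is not shown here.

```python
import random
from collections import defaultdict

def sample_per_category(docs, per_category, seed=42):
    """Pick up to `per_category` documents from each category.

    docs is a list of (category, text) pairs. The seed value is an
    illustrative choice that makes the sample reproducible.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for category, text in docs:
        by_category[category].append(text)

    sample = []
    for category in sorted(by_category):
        texts = by_category[category]
        chosen = rng.sample(texts, min(per_category, len(texts)))
        sample.extend((category, text) for text in chosen)
    return sample

# Toy corpus: 5 posts in one category, 3 in another
toy_docs = [("sci.space", f"space post {i}") for i in range(5)]
toy_docs += [("rec.autos", f"autos post {i}") for i in range(3)]

picked = sample_per_category(toy_docs, per_category=2)
print(len(picked))  # 2 per category
```

With the real dataset, `per_category=80` across the 10 categories listed above would give the 800 documents used in this tutorial.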

In [None]:
# Load the pre-generated data
with open('docs/data.json', 'r') as f:
    data = json.load(f)

# Convert to DataFrame for easier manipulation
df = pd.DataFrame(data)

print(f"Loaded {len(df)} documents")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nCategories: {df['category'].unique()}")
print(f"\nCategory distribution:")
print(df['category'].value_counts())

In [None]:
# Preview a sample document
sample = df.iloc[0]

print("Sample Document:")
print(f"Category: {sample['category']}")
print(f"\nText preview:\n{sample['text_preview']}")
print(f"\nCoordinates:")
print(f"  Default: ({sample['default_x']:.3f}, {sample['default_y']:.3f})")
print(f"  Sentiment: ({sample['sentiment_x']:.3f}, {sample['sentiment_y']:.3f})")
print(f"  Topic: ({sample['topic_x']:.3f}, {sample['topic_y']:.3f})")
print(f"  Toxicity: ({sample['toxicity_x']:.3f}, {sample['toxicity_y']:.3f})")

## 2. Visualizing Task-Specific Embeddings

Now let's visualize how the different task instructions affect the embedding space.

Each point represents a document, colored by its newsgroup category. The positions show how QWEN-3 organizes documents based on the task instruction.

In [None]:
# Create 4-panel comparison visualization

# Task titles for display
task_titles = {
    'default': 'Default (No Instruction)',
    'sentiment': 'Sentiment Task',
    'topic': 'Topic Classification Task',
    'toxicity': 'Toxicity Detection Task'
}

fig = make_subplots(
    rows=1, cols=4,
    subplot_titles=[task_titles[task] for task in ['default', 'sentiment', 'topic', 'toxicity']],
    horizontal_spacing=0.05
)

# Create a plot for each task
for col_idx, task_name in enumerate(['default', 'sentiment', 'topic', 'toxicity'], start=1):
    for category in df['category'].unique():
        mask = df['category'] == category
        fig.add_trace(
            go.Scatter(
                x=df[mask][f'{task_name}_x'],
                y=df[mask][f'{task_name}_y'],
                mode='markers',
                name=category,
                marker=dict(
                    size=6,
                    opacity=0.7,
                    line=dict(width=0.5, color='white')
                ),
                text=df[mask]['text_preview'],
                hovertemplate='<b>%{fullData.name}</b><br><br>%{text}<br><extra></extra>',
                legendgroup=category,
                showlegend=(col_idx == 1)  # Only show legend for first plot
            ),
            row=1, col=col_idx
        )

# Update layout
fig.update_layout(
    title_text='QWEN-3 Multi-Task Embedding Comparison (UMAP Projection)',
    title_x=0.5,
    height=600,
    width=2400,
    hovermode='closest',
    legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="left",
        x=1.01,
        title="Newsgroup Category"
    )
)

# Update axes labels
for col_idx in range(1, 5):
    fig.update_xaxes(title_text="UMAP Dimension 1", row=1, col=col_idx)
    fig.update_yaxes(title_text="UMAP Dimension 2", row=1, col=col_idx)

fig.show()

## 3. Understanding the Results

### What to Look For

When comparing the four plots above, observe:

1. **Cluster Quality**: Which task creates the clearest separation between newsgroup categories?

2. **Task Alignment**: The **Topic task** should produce the best category separation since identifying topics aligns naturally with classifying newsgroups.

3. **Semantic Reorganization**: The same documents land in different places in each panel. Because UMAP was fitted separately for each task, absolute coordinates are not comparable across panels; compare which documents end up near each other instead.

4. **Cross-Category Patterns**: 
   - **Sentiment**: May group texts by emotional tone rather than subject matter
   - **Toxicity**: Political/religious groups might show different patterns than technical/hobby groups
   - **Default**: Provides balanced, general-purpose organization
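Judging cluster quality by eye is subjective, so it can help to attach a rough number to each panel. The sketch below computes a simple k-nearest-neighbor purity: for each point, the fraction of its `k` nearest neighbors that share its category. This metric and the `knn_purity` helper are illustrative choices, not part of the original pipeline, and a score computed on a 2D UMAP projection is only a proxy for separation in the full 1024-dimensional space.

```python
import math

def knn_purity(points, labels, k=5):
    """Average fraction of each point's k nearest neighbors sharing its label."""
    total = 0.0
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        same = sum(1 for j in neighbors if labels[j] == labels[i])
        total += same / len(neighbors)
    return total / len(points)

# Toy data: two tight groups that are far apart
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
separated = knn_purity(toy_points, ["a", "a", "a", "b", "b", "b"], k=2)
mixed = knn_purity(toy_points, ["a", "b", "a", "b", "a", "b"], k=2)
print(separated, mixed)
```

On the notebook's data, one way to call it would be `knn_purity(list(zip(df['topic_x'], df['topic_y'])), df['category'].tolist())`, repeated for each task prefix. The brute-force distance loop is quadratic, which is fine for 800 documents.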

### Key Insight

**Task instructions matter.** With the same model and the same documents, changing only the instruction changes which documents end up close to one another. Choose the instruction that matches your downstream task.

### Why This Matters

When building applications with embeddings:
- **Search/Retrieval**: Use task-specific instructions matching your search intent
- **Classification**: Align the instruction with your classification goal
- **Clustering**: Consider what dimension you want documents grouped by
- **Similarity**: Define "similar" via the task instruction

### Technical Details

- **Model**: QWEN-3-Embedding-0.6B (1024 dimensions)
- **Reduction**: UMAP (n_neighbors=15, min_dist=0.1, random_state=42)
- **Documents**: 800 (80 per category, stratified sampling)
- **API**: SiliconFlow
- **Format**: `"Instruct: {instruction}\nQuery: {text}"` (for non-default tasks)
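The input format in the last bullet can be sketched as a small helper. The instruction strings come from the table in section 1; the function name `build_input` and the dictionary name are hypothetical, and the real logic in the project's embedder may differ.

```python
from typing import Optional

# Instructions copied from the task table; "default" has no instruction
TASK_INSTRUCTIONS = {
    "default": None,
    "sentiment": "Classify the sentiment of the given text as positive, negative, or neutral",
    "topic": "Identify the topic or theme of the given text",
    "toxicity": "Classify the given text as either toxic or not toxic",
}

def build_input(text: str, instruction: Optional[str] = None) -> str:
    """Return the string sent to the embedding model for one document."""
    if not instruction:
        # Default task: embed the raw text with no prefix
        return text
    return f"Instruct: {instruction}\nQuery: {text}"

for task, instruction in TASK_INSTRUCTIONS.items():
    print(task, "->", repr(build_input("The shuttle launch was delayed.", instruction)))
```

Only the prefix differs between tasks, so every difference between the four plots traces back to this one string.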

## 4. Try It Yourself

### Option A: Explore the Web Demo

The interactive web app (`docs/index.html`) lets you:
- Click buttons to switch between tasks
- Hover over points to read document previews
- See task instructions and explanations

### Option B: Generate New Data

To regenerate the embeddings or try different tasks:

```bash
# Edit scripts/generate_data.py to modify tasks or parameters
poetry run python scripts/generate_data.py
```

You can experiment with:
- Different task instructions (see `task_prompts.json` for examples)
- More/fewer documents
- Different UMAP parameters
- Additional newsgroup categories

### Option C: Analyze Specific Tasks

Create individual plots to dive deeper into one task:

In [None]:
# Example: Focus on the Topic task
fig_topic = px.scatter(
    df,
    x='topic_x',
    y='topic_y',
    color='category',
    hover_data={'text_preview': True, 'topic_x': False, 'topic_y': False, 'category': False},
    title='Topic Classification Task - UMAP Projection',
    width=1000,
    height=700
)

fig_topic.update_traces(
    marker=dict(size=8, opacity=0.7, line=dict(width=0.5, color='white')),
    hovertemplate='<b>%{fullData.name}</b><br><br>%{customdata[0]}<br><extra></extra>'
)

fig_topic.update_layout(
    hovermode='closest',
    xaxis_title="UMAP Dimension 1",
    yaxis_title="UMAP Dimension 2"
)

fig_topic.show()

## 5. Further Exploration

### Questions to Investigate

1. **Which categories are most similar?**
   - Look for overlapping clusters in the topic task
   - Are religious and political groups close together?
   - Do scientific categories form a distinct region?

2. **How does sentiment vary by category?**
   - Switch to the sentiment plot
   - Do political newsgroups show more extreme sentiment than technical ones?
   - Are hobby groups (baseball, autos) more positive?

3. **Toxicity patterns across topics**
   - Political discussions (guns, mideast) vs technical discussions (graphics, crypt)
   - Does toxicity correlate with sentiment, or are they independent?

4. **Default vs Task-Specific**
   - How much does the topic instruction help compared to default?
   - When would you prefer default embeddings over task-specific ones?
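Several of these questions come down to how much two tasks agree about which documents belong together. Since each task's UMAP projection was computed on its own, raw coordinates cannot be compared across tasks, but neighbor sets can. The sketch below measures the mean Jaccard overlap between each document's `k` nearest neighbors in two projections. The helper names and the choice of Jaccard overlap are assumptions made for illustration.

```python
import math

def nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return {j for _, j in dists[:k]}

def neighbor_overlap(points_a, points_b, k=5):
    """Mean Jaccard overlap of each document's k nearest neighbors in two projections."""
    scores = []
    for i in range(len(points_a)):
        a = nearest_neighbors(points_a, i, k)
        b = nearest_neighbors(points_b, i, k)
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)

# Toy projections of the same four documents
layout = [(0, 0), (1, 0), (5, 5), (6, 5)]
shifted = [(x + 100, y + 100) for x, y in layout]   # same structure, moved
rearranged = [(0, 0), (5, 5), (1, 0), (6, 5)]       # different neighbors

same_structure = neighbor_overlap(layout, shifted, k=1)
different_structure = neighbor_overlap(layout, rearranged, k=1)
print(same_structure, different_structure)
```

A value near 1 means two tasks group documents the same way even if the plots look shifted or rotated; a value near 0 means the instruction changed which documents sit together. For example, comparing the `sentiment` and `toxicity` columns this way gives one rough answer to whether those two tasks are independent.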

### Next Steps

- **Try other QWEN tasks**: emotion classification, semantic similarity, query-document matching
- **Compare models**: How does QWEN-3-Embedding-0.6B compare to the larger 4B and 8B models in the same series?
- **Build applications**: Use task instructions in your own search, recommendation, or classification systems
- **Explore the code**: Check `qwen_embedder.py` for the async batching implementation

### Resources

- [QWEN-3 Embedding Documentation](./QWEN-EMBED.md)
- [Task Prompt Examples](./task_prompts.json)
- [Official QWEN Repo](https://github.com/QwenLM/Qwen3-Embedding)
- [SiliconFlow API](https://siliconflow.com)
- [20 Newsgroups Dataset](https://scikit-learn.org/stable/datasets/real_world.html#newsgroups-dataset)

## 6. Individual Task Plots for Detailed Exploration

The 4-panel comparison is built in section 2. The cells below create a standalone plot for the sentiment and toxicity tasks, then save all figures as interactive HTML files.

In [None]:
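# Fixed category-to-color mapping so every individual plot uses the same colors.
# Categories are taken in order of first appearance, matching Plotly's default assignment.
palette = px.colors.qualitative.Plotly
color_map = {
    category: palette[i % len(palette)]
    for i, category in enumerate(df['category'].unique())
}

# Sentiment task embeddings plot
fig_sentiment = px.scatter(
    df,
    x='sentiment_x',
    y='sentiment_y',
    color='category',
    hover_data={'text_preview': True, 'sentiment_x': False, 'sentiment_y': False, 'category': False},
    title='Sentiment Task Embeddings - UMAP Projection',
    width=900,
    height=700,
    color_discrete_map=color_map
)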

fig_sentiment.update_traces(
    marker=dict(size=8, opacity=0.7, line=dict(width=0.5, color='white')),
    hovertemplate='<b>%{fullData.name}</b><br><br>%{customdata[0]}<br><extra></extra>'
)

fig_sentiment.update_layout(hovermode='closest')
fig_sentiment.show()

In [None]:
# Toxicity task embeddings plot
fig_toxicity = px.scatter(
    df,
    x='toxicity_x',
    y='toxicity_y',
    color='category',
    hover_data={'text_preview': True, 'toxicity_x': False, 'toxicity_y': False, 'category': False},
    title='Toxicity Detection Task Embeddings - UMAP Projection',
    width=900,
    height=700,
    color_discrete_map=color_map
)

fig_toxicity.update_traces(
    marker=dict(size=8, opacity=0.7, line=dict(width=0.5, color='white')),
    hovertemplate='<b>%{fullData.name}</b><br><br>%{customdata[0]}<br><extra></extra>'
)

fig_toxicity.update_layout(hovermode='closest')
fig_toxicity.show()

In [None]:
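# Save interactive HTML files
from pathlib import Path

viz_dir = Path('visualizations')
viz_dir.mkdir(exist_ok=True)

# fig_default is not created in an earlier cell, so build it here
fig_default = px.scatter(
    df,
    x='default_x',
    y='default_y',
    color='category',
    title='Default Embeddings - UMAP Projection',
    width=900,
    height=700
)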

# Save comparison and individual plots
fig.write_html(viz_dir / 'multitask_comparison.html')
fig_default.write_html(viz_dir / 'default_embeddings.html')
fig_sentiment.write_html(viz_dir / 'sentiment_embeddings.html')
fig_topic.write_html(viz_dir / 'topic_embeddings.html')
fig_toxicity.write_html(viz_dir / 'toxicity_embeddings.html')

print("Saved visualizations to:", viz_dir.absolute())
print("\nFiles created:")
print("  - multitask_comparison.html (4-panel comparison)")
print("  - default_embeddings.html")
print("  - sentiment_embeddings.html")
print("  - topic_embeddings.html")
print("  - toxicity_embeddings.html")