# Interactive Token Contribution Explainer

This notebook provides a simple, interactive way to analyze why a language model makes a certain classification decision. It uses the `token_explainer.py` tool to visualize the contribution of each word, phrase, or subword in a given text.

### How to Use:
1.  **Install Dependencies:** Run the first cell to ensure all required packages are installed.
2.  **Configure Inputs:** Modify the variables in the "Settings" cell below. You can change the text, the expected label, the prompt, and the masking strategy.
3.  **Run All Cells:** Click `Run All` to execute the analysis from top to bottom.
4.  **Analyze Results:**
    *   The **DataFrame** shows the detailed metrics for each masked token, sorted by `net_effect` so the most helpful tokens come first.
    *   The **Visual Report** at the bottom will display the text with highlights. Green tokens helped the model get closer to the correct answer, while red tokens pushed it toward an incorrect one.
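
The `word` and `phrase` masking strategies can be pictured with the minimal sketch below. It is not the code in `token_explainer.py`: the helper name `masked_variants`, the `[MASK]` placeholder, and the choice of non-overlapping phrase chunks are all assumptions made for illustration.

```python
from typing import List

MASK = "[MASK]"  # hypothetical placeholder; the real tool may use a different one


def masked_variants(text: str, strategy: str = "word", phrase_size: int = 2) -> List[str]:
    """Return one copy of `text` per unit, with that unit replaced by MASK.

    Assumes whitespace word splitting and non-overlapping phrase chunks.
    """
    words = text.split()
    if strategy == "word":
        size = 1
    elif strategy == "phrase":
        size = phrase_size
    else:
        raise ValueError(f"unsupported strategy in this sketch: {strategy!r}")

    variants = []
    for start in range(0, len(words), size):
        kept = words[:start] + [MASK] + words[start + size:]
        variants.append(" ".join(kept))
    return variants


word_variants = masked_variants("os chefes de defesa", strategy="word")
phrase_variants = masked_variants("os chefes de defesa", strategy="phrase", phrase_size=2)
print(word_variants)
print(phrase_variants)
```

Each variant would then be placed into the prompt template and scored by the model, which is why longer texts and smaller units mean more model calls.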

In [2]:
import pandas as pd
from IPython.display import display, HTML
from pathlib import Path

# Try to import the explainer module with better error handling
try:
    # Try direct import first
    from token_explainer import (
        get_logprobs_cached,
        token_masking_analysis,
        generate_html_report,
        Tokenizer
    )
    print("✓ Successfully imported token_explainer module")
except ImportError:
    try:
        # Try importing from part2 subdirectory
        from part2.token_explainer import (
            get_logprobs_cached,
            token_masking_analysis,
            generate_html_report,
            Tokenizer
        )
        print("✓ Successfully imported token_explainer module from part2/")
    except ImportError as e:
        print("ERROR: Could not import token_explainer module.")
        print(f"Error details: {e}")
        print("\nPlease ensure that 'token_explainer.py' is in one of these locations:")
        print("  1. Same directory as this notebook")
        print("  2. In a 'part2' subdirectory")
        print("\nCurrent working directory:", Path.cwd())
        raise

# Widen column display for the DataFrame
pd.set_option('display.max_colwidth', 200)
pd.set_option('display.width', None)

✓ Successfully imported token_explainer module from part2/


## 1. Settings
Modify the variables in this cell to configure your experiment.

In [3]:
# --- User Inputs ---
text_to_analyze = "os chefes de defesa da estónia, letónia, lituânia, alemanha, itália, espanha e eslováquia assinarão"
true_label = "pt"
prompt_template = """Given the text below, determine the single most appropriate ISO 639-1 language code.
Answer with the code only.

Text: {text}

Language code:"""

# --- Strategy Settings ---
# Options: 'word', 'phrase', or 'subword'
strategy = 'word'
phrase_size = 2  # Only used for 'phrase' strategy

# --- Model Parameters ---
logprob_kwargs = {
    "provider": 'ollama',
    "model_id": 'llama3:8b',
    "top_logprobs": 5,
    "temperature": 0.0,
    "invert_log": False
}

print("✓ Settings configured")
print(f"  - Text length: {len(text_to_analyze)} characters")
print(f"  - True label: '{true_label}'")
print(f"  - Strategy: '{strategy}'")
print(f"  - Model: {logprob_kwargs['model_id']}")

✓ Settings configured
  - Text length: 99 characters
  - True label: 'pt'
  - Strategy: 'word'
  - Model: llama3:8b


## 2. Run Analysis
This cell executes the main analysis logic based on the settings above.
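
As a rough mental model of a masking analysis, the sketch below scores the original text, then scores it again with each word masked, and reports the difference. The scorer `toy_score` is a made-up stand-in for a model log-probability, and `leave_one_out` is a hypothetical helper; neither reflects how `token_masking_analysis` is actually implemented.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"  # hypothetical placeholder token


def leave_one_out(text: str, score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (word, effect) pairs, where effect = score(original) - score(masked)."""
    words = text.split()
    base = score(text)
    effects = []
    for i, word in enumerate(words):
        masked = " ".join(words[:i] + [MASK] + words[i + 1:])
        # A positive effect means the score dropped once the word was hidden.
        effects.append((word, base - score(masked)))
    return effects


def toy_score(text: str) -> float:
    """Toy scorer (assumption): counts a few Portuguese function words."""
    markers = {"os", "da", "de"}
    return float(sum(w in markers for w in text.split()))


effects = leave_one_out("os chefes de defesa", toy_score)
for word, effect in effects:
    print(f"{word}: {effect:+.1f}")
```

Under this convention a positive effect marks a word that supported the label, which matches the sign convention printed with the results table.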

In [4]:
print("="*60)
print("Starting Analysis")
print("="*60)

try:
    # a. Get initial prediction
    print("\n[Step 1/4] Getting initial prediction...")
    initial_prompt = prompt_template.format(text=text_to_analyze)
    initial_res = get_logprobs_cached(prompt=initial_prompt, **logprob_kwargs)
    unmasked_pred = initial_res.response_text.strip()
    
    print(f"  ✓ Initial Prediction: '{unmasked_pred}'")
    print(f"  ✓ True Label: '{true_label}'")
    
    if unmasked_pred == true_label:
        print("  ✓ Model prediction is CORRECT")
    else:
        print("  ✗ Model prediction is INCORRECT")
    
    # b. Prepare a DataFrame for the analysis function
    print("\n[Step 2/4] Preparing data...")
    input_df = pd.DataFrame([{
        "text": text_to_analyze,
        "label": true_label,
        "preds": unmasked_pred
    }])
    print("  ✓ Data prepared")
    
    # c. Load tokenizer if needed for 'subword' strategy
    tokenizer = None
    if strategy == 'subword':
        print("\n[Step 3/4] Loading 'subword' tokenizer...")
        try:
            tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
            print("  ✓ Tokenizer loaded successfully")
        except Exception as e:
            print(f"  ✗ ERROR: Could not load tokenizer: {e}")
            print("  Falling back to 'word' strategy")
            strategy = 'word'
            tokenizer = None
    else:
        print("\n[Step 3/4] Tokenizer not needed for this strategy")
    
    # d. Run the main masking analysis
    print(f"\n[Step 4/4] Running '{strategy}' masking analysis...")
    if strategy == 'phrase':
        print(f"  - Phrase size: {phrase_size}")
    
    analysis_df = token_masking_analysis(
        df=input_df,
        text_col='text',
        label_col='label',
        pred_col='preds',
        prompt_template=prompt_template,
        logprob_kwargs=logprob_kwargs,
        strategy=strategy,
        tokenizer=tokenizer,
        phrase_size=phrase_size
    )
    
    if analysis_df.empty:
        print("  ✗ WARNING: Analysis returned no results")
    else:
        print(f"  ✓ Analysis complete! Analyzed {len(analysis_df)} tokens/phrases")
    
    print("\n" + "="*60)
    print("Analysis Complete")
    print("="*60)
    
except Exception as e:
    print(f"\n✗ ERROR during analysis: {e}")
    import traceback
    print("\nFull traceback:")
    traceback.print_exc()
    analysis_df = pd.DataFrame()  # Create empty DataFrame for later cells

Starting Analysis

[Step 1/4] Getting initial prediction...
  ✓ Initial Prediction: 'pt'
  ✓ True Label: 'pt'
  ✓ Model prediction is CORRECT

[Step 2/4] Preparing data...
  ✓ Data prepared

[Step 3/4] Tokenizer not needed for this strategy

[Step 4/4] Running 'word' masking analysis...


Analyzing with 'word' strategy: 100%|██████████| 1/1 [00:00<00:00, 65.48it/s]

  ✓ Analysis complete! Analyzed 14 tokens/phrases

Analysis Complete





## 3. Results DataFrame

The table below shows the detailed metrics for each token/phrase, sorted by `net_effect`. A high positive `net_effect` means the token strongly supported the correct `true_label`: masking it moved the model away from that label. A negative `net_effect` means the token pushed the model toward a competing label.

In [5]:
if not analysis_df.empty:
    # Define key columns to display for clarity
    display_cols = [
        'masked_word',
        'net_effect',
        'support_label',
        'suppress_competitor',
        'entropy_change',
        'top_choice'
    ]
    
    # Filter to only columns that exist
    available_cols = [col for col in display_cols if col in analysis_df.columns]
    
    if not available_cols:
        print("WARNING: Expected columns not found in results.")
        print("Available columns:", list(analysis_df.columns))
        display(analysis_df)
    else:
        # Sort by net_effect to see the most influential tokens first
        if 'net_effect' in analysis_df.columns:
            sorted_df = analysis_df.sort_values('net_effect', ascending=False)
        else:
            sorted_df = analysis_df
        
        print(f"\nShowing {len(sorted_df)} tokens/phrases:")
        print("\nInterpretation:")
        print("  • Positive net_effect = token supports correct label")
        print("  • Negative net_effect = token misleads the model")
        print()
        display(sorted_df[available_cols])
else:
    print("⚠ Analysis DataFrame is empty. Cannot display results.")
    print("Please check the analysis step above for errors.")


Showing 14 tokens/phrases:

Interpretation:
  • Positive net_effect = token supports correct label
  • Negative net_effect = token misleads the model



| | masked_word | net_effect | support_label | suppress_competitor | entropy_change | top_choice |
|---|---|---|---|---|---|---|
| 4 | da | 4.342431 | 0.021843 | 4.320588 | -0.085192 | pt |
| 2 | de | 0.033732 | 0.033732 | | -0.146303 | pt |
| 1 | chefes | 0.031507 | 0.031507 | | -0.133245 | pt |
| 3 | defesa | 0.025892 | 0.025892 | | -0.105188 | pt |
| 5 | estónia, | 0.000875 | 0.000875 | | 0.006581 | pt |
| 0 | os | -1.105919 | -0.061222 | -1.044697 | 0.183672 | pt |
| 6 | letónia, | -1.65424 | -0.112124 | -1.542116 | 0.284353 | pt |
| 12 | eslováquia | -1.86335 | -0.134743 | -1.728607 | 0.311726 | pt |
| 11 | e | -2.195141 | -0.208837 | -1.986304 | 0.434047 | pt |
| 13 | assinarão | -2.208427 | -0.198319 | -2.010108 | 0.406141 | pt |
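
In the rows shown, `net_effect` matches `support_label` plus `suppress_competitor`, with a missing `suppress_competitor` counted as zero. This is an observation about the printed values, not a statement of how `token_explainer.py` defines the metric. A quick check on four of the rows:

```python
import pandas as pd

# Values copied from the results table; None marks a missing suppress_competitor.
rows = pd.DataFrame(
    {
        "masked_word": ["da", "de", "os", "assinarão"],
        "net_effect": [4.342431, 0.033732, -1.105919, -2.208427],
        "support_label": [0.021843, 0.033732, -0.061222, -0.198319],
        "suppress_competitor": [4.320588, None, -1.044697, -2.010108],
    }
)

# Recompute net_effect under the assumed additive relationship.
recomputed = rows["support_label"] + rows["suppress_competitor"].fillna(0.0)
max_abs_diff = (recomputed - rows["net_effect"]).abs().max()
print(f"largest absolute difference: {max_abs_diff:.1e}")
```

For `da`, almost all of the effect comes from `suppress_competitor` (4.320588 of 4.342431), while for `de`, `chefes`, and `defesa` the whole effect comes from `support_label`.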


## 4. Visual Report

This visualization applies the results from the table above directly to the text.
- **<span style='background-color: #c8ffc8;'>Green</span>** tokens contributed positively towards the correct label.
- **<span style='background-color: #ffc8c8;'>Red</span>** tokens pushed the model towards an incorrect label.
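
The colouring rule can be sketched as follows. The helper `highlight_tokens` is hypothetical and only mirrors the legend above (green for positive `net_effect`, red for negative); the output of `generate_html_report` may be structured differently, for example by scaling colour intensity.

```python
import html
from typing import Iterable, Tuple

GREEN = "#c8ffc8"  # same colours as the legend above
RED = "#ffc8c8"


def highlight_tokens(pairs: Iterable[Tuple[str, float]]) -> str:
    """Wrap each token in a coloured <span> based on the sign of its effect."""
    spans = []
    for token, effect in pairs:
        safe = html.escape(token)  # avoid injecting raw markup from the text
        if effect > 0:
            spans.append(f"<span style='background-color: {GREEN};'>{safe}</span>")
        elif effect < 0:
            spans.append(f"<span style='background-color: {RED};'>{safe}</span>")
        else:
            spans.append(safe)  # neutral tokens stay unstyled
    return " ".join(spans)


snippet = highlight_tokens([("os", -1.105919), ("chefes", 0.031507), ("de", 0.033732)])
print(snippet)
```

In a notebook, the resulting string could be rendered with `display(HTML(snippet))`.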

In [6]:
if not analysis_df.empty:
    try:
        # Generate the HTML report using the results
        html_report = generate_html_report(
            analysis_df, 
            strategy=strategy, 
            phrase_size=phrase_size
        )
        
        # Display the report directly in the notebook output
        print("\nVisual Token Contribution Report:")
        print("=" * 60)
        display(HTML(html_report))
    except Exception as e:
        print(f"✗ ERROR generating visual report: {e}")
        import traceback
        traceback.print_exc()
else:
    print("⚠ Analysis DataFrame is empty. Cannot display report.")
    print("Please check the analysis step above for errors.")


Visual Token Contribution Report:


## 5. Summary Statistics (Optional)
Additional insights from the analysis.

In [7]:
if not analysis_df.empty and 'net_effect' in analysis_df.columns:
    print("\nSummary Statistics:")
    print("=" * 60)
    
    # Count positive vs negative contributors
    positive = (analysis_df['net_effect'] > 0).sum()
    negative = (analysis_df['net_effect'] < 0).sum()
    neutral = (analysis_df['net_effect'] == 0).sum()
    
    print(f"Total tokens analyzed: {len(analysis_df)}")
    print(f"  • Positive contributors (green): {positive}")
    print(f"  • Negative contributors (red): {negative}")
    print(f"  • Neutral: {neutral}")
    
    print(f"\nNet effect range: [{analysis_df['net_effect'].min():.4f}, {analysis_df['net_effect'].max():.4f}]")
    print(f"Mean net effect: {analysis_df['net_effect'].mean():.4f}")

    # Show the strongest contributors in each direction
    print("\nTop 3 most helpful tokens:")
    top_3 = analysis_df.nlargest(3, 'net_effect')[['masked_word', 'net_effect']]
    for _, row in top_3.iterrows():
        print(f"  • '{row['masked_word']}': {row['net_effect']:.4f}")

    print("\nTop 3 most misleading tokens:")
    bottom_3 = analysis_df.nsmallest(3, 'net_effect')[['masked_word', 'net_effect']]
    for _, row in bottom_3.iterrows():
        print(f"  • '{row['masked_word']}': {row['net_effect']:.4f}")
else:
    print("⚠ Cannot generate summary statistics - analysis data unavailable.")


Summary Statistics:
Total tokens analyzed: 14
  • Positive contributors (green): 5
  • Negative contributors (red): 9
  • Neutral: 0

Net effect range: [-4.9457, 4.3424]
Mean net effect: -1.4795

Top 3 most helpful tokens:
  • 'da': 4.3424
  • 'de': 0.0337
  • 'chefes': 0.0315

Top 3 most misleading tokens:
  • 'alemanha,': -4.9457
  • 'espanha': -4.2423
  • 'itália,': -3.6173
