# Cross-Lingual Meme Understanding with CLIP

**Research Question:** Does CLIP's multimodal understanding transfer cross-lingually to Japanese memes?

**Author:** Maliha Binte Mamun  

---

## Overview

This notebook investigates whether vision-language models (specifically CLIP) can understand memes across languages. We compare CLIP's performance on:
- English memes (in-distribution for CLIP)
- Japanese memes (cross-lingual transfer)

**Key Contributions:**
1. A systematic evaluation of CLIP on Japanese memes (to our knowledge, the first)
2. Error analysis revealing cross-lingual failure patterns
3. Insights for building multilingual multimodal systems

---

## Day 1: Environment Setup & Data Loading

In [None]:
# ============================================================
# SECTION 1.1: Install Dependencies
# ============================================================
!pip install -q transformers torch torchvision pillow pandas matplotlib seaborn
!pip install -q datasets  # For Hugging Face datasets
!pip install -q japanize-matplotlib  # For Japanese text in plots
!pip install -q fugashi unidic-lite  # Japanese tokenizer (optional, for analysis)

print("✅ All dependencies installed!")

In [None]:
# ============================================================
# SECTION 1.2: Import Libraries
# ============================================================
import torch
import torch.nn.functional as F
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import requests
from io import BytesIO
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datasets import load_dataset
import json
import os
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# For Japanese text in matplotlib
import japanize_matplotlib

# Check GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")

In [None]:
# ============================================================
# SECTION 1.3: Load CLIP Model
# ============================================================
# We start with the English-only OpenAI base model as the baseline
# Options:
#   - "openai/clip-vit-base-patch32" (English-only, faster)
#   - "openai/clip-vit-large-patch14" (English-only, better)
#   - "laion/CLIP-ViT-H-14-laion2B-s32B-b79K" (trained on LAION-2B; larger, slower)

MODEL_NAME = "openai/clip-vit-base-patch32"  # Start with base model

print(f" Loading CLIP model: {MODEL_NAME}")
model = CLIPModel.from_pretrained(MODEL_NAME).to(device)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)
model.eval()
print("CLIP model loaded successfully!")

In [None]:
# ============================================================
# SECTION 1.4: Helper Functions
# ============================================================

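def load_image_from_url(url):
    """Load image from URL; return None on network or decoding errors."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Fail on HTTP errors instead of parsing an error page
        image = Image.open(BytesIO(response.content)).convert('RGB')
        return image
    except Exception as e:
        print(f"Error loading image: {e}")
        return None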

def load_image_from_path(path):
    """Load image from local path."""
    try:
        image = Image.open(path).convert('RGB')
        return image
    except Exception as e:
        print(f"Error loading image: {e}")
        return None

def get_clip_embeddings(image, texts):
    """Get CLIP embeddings for image and texts."""
    inputs = processor(
        text=texts,
        images=image,
        return_tensors="pt",
        padding=True,
        truncation=True
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs)
        image_embeds = outputs.image_embeds
        text_embeds = outputs.text_embeds

    return image_embeds, text_embeds

def compute_similarity(image, texts):
    """Compute cosine similarity between image and texts."""
    image_embeds, text_embeds = get_clip_embeddings(image, texts)

    # Normalize embeddings
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Compute cosine similarity
    similarity = (image_embeds @ text_embeds.T).squeeze(0)

    return similarity.cpu().numpy()

def classify_meme(image, candidate_labels):
    """Zero-shot classification of meme."""
    similarities = compute_similarity(image, candidate_labels)
    # Scale by 100 (roughly CLIP's learned logit scale) before the softmax
    probs = F.softmax(torch.tensor(similarities) * 100, dim=0).numpy()

    results = list(zip(candidate_labels, similarities, probs))
    results.sort(key=lambda x: x[1], reverse=True)

    return results

print("Helper functions defined!")

## Section 1.5: Dataset Preparation

We'll use two datasets:
1. **English Memes:** Hateful Memes dataset (Facebook AI) or MemeCap
2. **Japanese Memes:** We'll curate a small evaluation set

### Dataset Sources:
- Hateful Memes: https://hatefulmemeschallenge.com/
- For Japanese: We'll use publicly available meme images with Japanese text
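
The Japanese evaluation later reads `id`, `filename`, `sentiment`, and `category` from each annotation row and merges results on `id`, so a quick schema check before running CLIP may save a failed run. The following is a minimal sketch, assuming the annotations are available as a list of dicts; `validate_annotations`, `REQUIRED_FIELDS`, and the sample rows are hypothetical names and data introduced for illustration, not part of the notebook.

```python
# Hypothetical helper: checks annotation rows before evaluation.
REQUIRED_FIELDS = ("id", "filename", "sentiment", "category")

def validate_annotations(rows, required=REQUIRED_FIELDS):
    """Return (valid_rows, problems) for a list of annotation dicts."""
    valid, problems = [], []
    seen_ids = set()
    for index, row in enumerate(rows):
        # A field counts as missing if it is absent, None, or blank.
        missing = [f for f in required if not str(row.get(f) or "").strip()]
        if missing:
            problems.append((index, f"missing fields: {', '.join(missing)}"))
            continue
        # Duplicate ids would multiply rows when results are merged on 'id'.
        if row["id"] in seen_ids:
            problems.append((index, f"duplicate id: {row['id']}"))
            continue
        seen_ids.add(row["id"])
        valid.append(row)
    return valid, problems

# Made-up rows for illustration only.
sample_rows = [
    {"id": "jp_001", "filename": "jp_001.jpg", "sentiment": "sarcastic", "category": "tv_reference"},
    {"id": "jp_002", "filename": "", "sentiment": "funny", "category": "reaction"},
    {"id": "jp_001", "filename": "jp_003.jpg", "sentiment": "funny", "category": "reaction"},
]
valid_rows, problems = validate_annotations(sample_rows)
print(len(valid_rows), problems)
```

With a pandas DataFrame, the rows could be obtained with something like `jp_annotations.fillna("").to_dict("records")`, so that empty cells are treated as missing.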

In [None]:
# ============================================================
# SECTION 1.5a: Create Project Directory Structure
# ============================================================
import os

# Create directories
os.makedirs('data/english_memes', exist_ok=True)
os.makedirs('data/japanese_memes', exist_ok=True)
os.makedirs('results', exist_ok=True)
os.makedirs('figures', exist_ok=True)

print("Directory structure created:")
print("   data/english_memes/")
print("   data/japanese_memes/")
print("   results/")
print("   figures/")

In [None]:
# ============================================================
# SECTION 1.5b: Load English Meme Dataset
# ============================================================
# Option 1: Use Hugging Face's meme datasets
# Option 2: Use sample memes for quick testing

# For this project, we'll create a curated evaluation set
# This ensures we have controlled examples for analysis

# English meme samples with labels
english_memes = [
    {
        "id": "en_001",
        "url": "https://i.imgflip.com/1bij.jpg",  # "One Does Not Simply" template
        "text_in_image": "ONE DOES NOT SIMPLY WALK INTO MORDOR",
        "sentiment": "humorous",
        "category": "movie_reference",
        "description": "Lord of the Rings Boromir meme"
    },
    {
        "id": "en_002",
        "url": "https://i.imgflip.com/9ehk.jpg",  # "Y U No" template
        "text_in_image": "Y U NO EXAMPLE TEXT",
        "sentiment": "frustrated",
        "category": "reaction",
        "description": "Y U No guy meme"
    },
    {
        "id": "en_003",
        "url": "https://i.imgflip.com/1otk96.jpg",  # "Distracted Boyfriend"
        "text_in_image": "",
        "sentiment": "humorous",
        "category": "relationship",
        "description": "Distracted boyfriend meme template"
    },
]

print(f"  English meme samples prepared: {len(english_memes)} memes")
print("\n NOTE: For full experiment, you should:")
print("   1. Download Hateful Memes dataset from Facebook AI")
print("   2. Or use: datasets.load_dataset('limjiayi/hateful_memes_expanded')")
print("   3. Minimum recommended: 100-500 memes per language")

In [None]:
# ============================================================
# SECTION 1.5c: Japanese Meme Dataset
# ============================================================
# Japanese memes are harder to find in public datasets
# We'll create a small curated set with common Japanese meme formats

# Japanese meme samples (you'll need to add actual images)
japanese_memes = [
    {
        "id": "jp_001",
        "url": "",  # Add URL or local path
        "text_in_image": "なぜベストを尽くさないのか",  # "Why don't you do your best?"
        "text_english": "Why don't you do your best?",
        "sentiment": "motivational_sarcastic",
        "category": "tv_reference",
        "description": "Famous Japanese TV phrase often used sarcastically"
    },
    {
        "id": "jp_002",
        "url": "",
        "text_in_image": "それな",  # "That's it" / "Exactly"
        "text_english": "Exactly / That's right",
        "sentiment": "agreement",
        "category": "reaction",
        "description": "Japanese internet slang for agreement"
    },
    {
        "id": "jp_003",
        "url": "",
        "text_in_image": "草",  # "Grass" = LOL in Japanese internet
        "text_english": "LOL (literally 'grass')",
        "sentiment": "humorous",
        "category": "reaction",
        "description": "Japanese internet slang for laughing"
    },
]

print(f" Japanese meme templates prepared: {len(japanese_memes)} memes")
print("\n ACTION REQUIRED:")
print("   Collect Japanese meme images. Options:")
print("   1. Search Twitter/X with #日本ミーム or #ネタ画像")
print("   2. Use Japanese image boards (careful with copyright)")
print("   3. Create synthetic examples with Japanese text on meme templates")
print("\n   Target: 50-100 manually collected Japanese memes")

In [None]:
# ============================================================
# SECTION 1.5d: Alternative - Use Existing Multilingual Dataset
# ============================================================
# If you want to use an existing dataset with Japanese content

print(" Searching for available multilingual meme datasets...")

# No dataset is loaded in this cell; it only lists candidate sources.
# Japanese memes are collected manually (see Sections 1.7-1.9).
print("\n Available datasets you can use:")
print("   1. limjiayi/hateful_memes_expanded (English, ~10K memes)")
print("   2. neuroailab/memotion (English, sentiment analysis)")
print("   3. For Japanese: manual collection (no public dataset used here)")

In [None]:
# ============================================================
# SECTION 1.6: Load Real Dataset - Hateful Memes Sample
# ============================================================

print("Loading English meme dataset from Hugging Face...")

try:
    # Load a subset for quick experimentation
    dataset = load_dataset("limjiayi/hateful_memes_expanded", split="train[:200]")
    print(f"   Loaded {len(dataset)} English memes")
    print(f"   Columns: {dataset.column_names}")

    # Preview
    print("\n Sample entry:")
    print(dataset[0])

except Exception as e:
    print(f"  Could not load dataset: {e}")
    print("   Using manual sample data instead.")
    dataset = None

In [None]:
# 1.7 Mount Google Drive (for Japanese memes)
from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted!")

In [None]:
# 1.8 Load Japanese Meme Annotations from Google Sheet
import pandas as pd

SHEET_ID = "YOUR_GOOGLE_SHEET_ID"
sheet_url = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/export?format=csv"

jp_annotations = pd.read_csv(sheet_url)
print(f"Loaded {len(jp_annotations)} Japanese meme annotations!")
print("\nColumns:", jp_annotations.columns.tolist())
print("\nFirst 5 rows:")
jp_annotations.head()

In [None]:
# 1.9 Copy Japanese Meme Images from Google Drive
import shutil
import os

# Create directory
os.makedirs("data/japanese_memes", exist_ok=True)

# Set this to the folder in your mounted Drive that holds the images
drive_folder = "YOUR_DRIVE_PATH/japanese_memes"

# Check if folder exists
if not os.path.exists(drive_folder):
    print(f"Folder not found: {drive_folder}")
else:
    # List all images in Drive folder
    drive_files = os.listdir(drive_folder)
    print(f"Found {len(drive_files)} files in Google Drive folder")

    # Copy all images (not just those in annotations)
    count = 0
    for filename in drive_files:
        if filename.lower().endswith(('.jpeg', '.jpg', '.png')):
            src = os.path.join(drive_folder, filename)
            dst = os.path.join("data/japanese_memes", filename)
            shutil.copy(src, dst)
            count += 1

    print(f"✅ Copied {count} Japanese meme images!")

# 1.10 Verify Japanese Dataset is Ready
image_files = os.listdir("data/japanese_memes")
print(f"   Japanese Meme Dataset Summary:")
print(f"   Annotations: {len(jp_annotations)}")
print(f"   Images: {len(image_files)}")
print(f"   Ready for Section 3!")

---
## English Meme Baseline Evaluation
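
Zero-shot classification here scores each candidate label by its cosine similarity to the image and then applies a softmax over the scaled similarities. The sketch below shows, in plain Python, how much the scale factor shapes the resulting "confidence". The similarity values are made up, and `softmax_with_scale` is a hypothetical helper; the factor of 100 mirrors the one used in `classify_meme`.

```python
import math

def softmax_with_scale(similarities, scale=100.0):
    """Turn cosine similarities into probabilities; `scale` sharpens the distribution."""
    scaled = [s * scale for s in similarities]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cosine similarities for one image against three labels.
toy_similarities = [0.25, 0.24, 0.20]

sharp = softmax_with_scale(toy_similarities, scale=100.0)
flat = softmax_with_scale(toy_similarities, scale=1.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

A similarity gap of only 0.01 becomes a large probability gap once scaled, so average confidence reflects the scaling as well as how well a label fits. This is worth keeping in mind when comparing confidence across conditions.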

In [None]:
# ============================================================
# SECTION 2.1: Define Classification Labels
# ============================================================

# Zero-shot classification labels (English)
SENTIMENT_LABELS_EN = [
    "a funny meme",
    "an offensive meme",
    "a sarcastic meme",
    "a wholesome meme",
    "a political meme",
    "a neutral meme"
]

CATEGORY_LABELS_EN = [
    "a meme about relationships",
    "a meme about work or school",
    "a meme about current events",
    "a meme referencing movies or TV",
    "a meme about internet culture",
    "a meme about daily life"
]

# Japanese equivalents for cross-lingual testing
SENTIMENT_LABELS_JP = [
    "面白いミーム",      # funny meme
    "攻撃的なミーム",    # offensive meme
    "皮肉なミーム",      # sarcastic meme
    "心温まるミーム",    # wholesome meme
    "政治的なミーム",    # political meme
    "普通のミーム"       # neutral meme
]

print("  Classification labels defined:")
print(f"   Sentiment labels (EN): {len(SENTIMENT_LABELS_EN)}")
print(f"   Sentiment labels (JP): {len(SENTIMENT_LABELS_JP)}")

In [None]:
# ============================================================
# SECTION 2.2: Test CLIP on Single Image
# ============================================================

# Test with a sample meme
test_url = "https://i.imgflip.com/1bij.jpg"  # "One Does Not Simply" meme

print("Testing CLIP on sample meme...")
test_image = load_image_from_url(test_url)

if test_image:
    # Display image
    plt.figure(figsize=(6, 6))
    plt.imshow(test_image)
    plt.axis('off')
    plt.title("Test Meme: One Does Not Simply")
    plt.show()

    # Classify with English labels
    print("\n Zero-shot Classification Results (English labels):")
    results = classify_meme(test_image, SENTIMENT_LABELS_EN)
    for label, sim, prob in results:
        print(f"   {label}: {prob:.2%} (sim: {sim:.3f})")
else:
    print("Could not load test image")

In [None]:
# ============================================================
# SECTION 2.3: Batch Evaluation Function
# ============================================================

def evaluate_memes(meme_data, labels, label_language="en"):
    """
    Evaluate CLIP on a batch of memes.

    Args:
        meme_data: List of dicts with 'url' or 'image' key
        labels: List of classification labels
        label_language: 'en' or 'jp'

    Returns:
        DataFrame with results
    """
    results = []

    for item in tqdm(meme_data, desc=f"Evaluating ({label_language})"):
        # Load image
        if 'image' in item and item['image'] is not None:
            image = item['image']
            if isinstance(image, str):
                image = load_image_from_path(image)
        elif 'url' in item and item['url']:
            image = load_image_from_url(item['url'])
        else:
            continue

        if image is None:
            continue

        # Classify
        try:
            classification = classify_meme(image, labels)
            top_label, top_sim, top_prob = classification[0]

            results.append({
                'id': item.get('id', 'unknown'),
                'predicted_label': top_label,
                'confidence': top_prob,
                'similarity': top_sim,
                'all_scores': {l: p for l, s, p in classification},
                'ground_truth': item.get('sentiment', item.get('label', 'unknown')),
                'label_language': label_language
            })
        except Exception as e:
            print(f"Error processing {item.get('id', 'unknown')}: {e}")

    return pd.DataFrame(results)

print("Batch evaluation function defined!")

In [None]:
# =====================================
# SECTION 2.4: Evaluate English Memes
# =====================================

from datasets import load_dataset
from PIL import Image
import pandas as pd
import os
from tqdm import tqdm
import requests
from io import BytesIO
import torch

# Create results directory
os.makedirs('results', exist_ok=True)

# Base URL for the Hateful Memes images
BASE_URL = "https://huggingface.co/datasets/limjiayi/hateful_memes_expanded/resolve/main/"

print("Loading Hateful Memes dataset...")
hateful_memes = load_dataset("limjiayi/hateful_memes_expanded", split="train")

# Take a 200-meme subset
subset = hateful_memes.select(range(200))

print(f"Running CLIP evaluation on {len(subset)} memes...")

results_list = []
errors = 0

for i, item in enumerate(tqdm(subset, desc="Evaluating")):
    try:
        # Get image path and construct full URL
        img_path = item['img']
        img_url = BASE_URL + img_path

        # Download image
        response = requests.get(img_url, timeout=15)
        if response.status_code != 200:
            errors += 1
            continue

        image = Image.open(BytesIO(response.content)).convert("RGB")

        # Process with CLIP. SENTIMENT_LABELS_EN entries are already full
        # prompts ("a funny meme"), so they are passed through unchanged.
        text_prompts = SENTIMENT_LABELS_EN

        inputs = processor(
            text=text_prompts,
            images=image,
            return_tensors="pt",
            padding=True
        ).to(device)

        with torch.no_grad():
            outputs = model(**inputs)

            # Get logits directly (image-text similarity)
            logits_per_image = outputs.logits_per_image  # Shape: [1, num_labels]
            probs = logits_per_image.softmax(dim=-1).squeeze()

            pred_idx = probs.argmax().item()
            confidence = probs[pred_idx].item()

        results_list.append({
            'id': f'en_{i:04d}',
            'predicted': SENTIMENT_LABELS_EN[pred_idx],
            'confidence': confidence,
            'text': item.get('text', '')[:50]
        })

    except Exception as e:
        errors += 1
        if errors <= 3:
            print(f"Error on item {i}: {type(e).__name__}: {e}")
        continue

# Convert to DataFrame
english_results = pd.DataFrame(results_list)

print(f"\n Evaluation complete!")
print(f"   Processed: {len(english_results)} memes")
print(f"   Errors: {errors}")

if len(english_results) > 0:
    print(f"   Average confidence: {english_results['confidence'].mean():.2%}")
    english_results.to_csv('results/english_meme_results.csv', index=False)
    print("   Saved to: results/english_meme_results.csv")
    print(f"\n Sample predictions:")
    print(english_results.head(10))

In [None]:
# Check english_results columns
print("Columns in english_results:")
print(english_results.columns.tolist())
print("\nFirst row:")
print(english_results.head(1))

In [None]:
# ================================================
# SECTION 2.5: Visualize English Baseline Results
# ================================================

import matplotlib.pyplot as plt
import seaborn as sns
import os

print(" Visualizing English Baseline Results...")

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# 1. Distribution of predictions
ax1 = axes[0]
pred_counts = english_results['predicted'].value_counts()
pred_counts.plot(kind='barh', ax=ax1, color='steelblue')
ax1.set_xlabel('Count')
ax1.set_ylabel('Predicted Sentiment')
ax1.set_title('English Meme Predictions Distribution')

# 2. Confidence distribution
ax2 = axes[1]
english_results['confidence'].hist(bins=20, ax=ax2, color='coral', edgecolor='black')
ax2.set_xlabel('Confidence Score')
ax2.set_ylabel('Frequency')
ax2.set_title(f'Confidence Distribution (Mean: {english_results["confidence"].mean():.2%})')
ax2.axvline(english_results['confidence'].mean(), color='red', linestyle='--', label='Mean')
ax2.legend()

plt.tight_layout()

# Save figure
os.makedirs('figures', exist_ok=True)
plt.savefig('figures/english_baseline_results.png', dpi=150, bbox_inches='tight')
print(" Figure saved to: figures/english_baseline_results.png")

plt.show()

# Print summary statistics
print("\n English Baseline Summary:")
print(f"   Total memes evaluated: {len(english_results)}")
print(f"   Average confidence: {english_results['confidence'].mean():.2%}")
print(f"   Std deviation: {english_results['confidence'].std():.2%}")
print(f"\n   Prediction distribution:")
print(english_results['predicted'].value_counts())

---
## Japanese Meme Evaluation & Cross-Lingual Comparison
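
One reason Japanese label prompts may behave differently is tokenization. We assume here that CLIP's text encoder uses a byte-level BPE vocabulary learned mostly from English text, in which case Japanese characters (three bytes each in UTF-8) are likely to be split into byte fragments. The sketch below only measures UTF-8 size, which is a rough upper bound on the token count per word and not the tokenizer's actual output; `utf8_profile` is a hypothetical helper.

```python
def utf8_profile(label):
    """Return character count, UTF-8 byte count, and bytes per character for a label."""
    n_chars = len(label)
    n_bytes = len(label.encode("utf-8"))
    return {
        "label": label,
        "chars": n_chars,
        "bytes": n_bytes,
        "bytes_per_char": n_bytes / n_chars if n_chars else 0.0,
    }

# One English prompt, one Japanese prompt, and one slang term from the sample set.
labels = ["a funny meme", "面白いミーム", "草"]
for profile in map(utf8_profile, labels):
    print(profile)
```

To inspect the real token split, `processor.tokenizer.tokenize(label)` on the processor loaded in Section 1.3 should show the actual pieces.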

In [None]:
# ===========================================
# SECTION 3: JAPANESE MEME EVALUATION
# ===========================================

from PIL import Image
import pandas as pd
import os
from tqdm import tqdm
import torch

print("🇯🇵 JAPANESE MEME EVALUATION")
print("="*50)

# Check data is loaded
jp_image_dir = "data/japanese_memes"
print(f"   Annotations loaded: {len(jp_annotations)}")
print(f"   Images available: {len(os.listdir(jp_image_dir))}")

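# Japanese and English labels, index-aligned (same sentiment at the same position).
# NOTE: this overrides SENTIMENT_LABELS_JP from Section 2.1 (different order and wording).
# The English list holds bare adjectives; they are wrapped as "a {label} meme" below.
SENTIMENT_LABELS_JP = ["面白いミーム", "皮肉なミーム", "攻撃的なミーム", "ほのぼのミーム", "中立的なミーム", "政治的なミーム"]
SENTIMENT_LABELS_EN_FOR_JP = ["funny", "sarcastic", "offensive", "wholesome", "neutral", "political"]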

# ============================================================
# TEST 1: Japanese memes with ENGLISH labels
# ============================================================
print(f"\n Test 1: Japanese memes → ENGLISH labels...")

jp_results_en = []
errors_en = 0

for i, row in tqdm(jp_annotations.iterrows(), total=len(jp_annotations), desc="JP memes (EN labels)"):
    try:
        img_path = os.path.join(jp_image_dir, row['filename'])
        if not os.path.exists(img_path):
            errors_en += 1
            continue

        image = Image.open(img_path).convert("RGB")
        text_prompts = [f"a {label} meme" for label in SENTIMENT_LABELS_EN_FOR_JP]

        inputs = processor(text=text_prompts, images=image, return_tensors="pt", padding=True).to(device)

        with torch.no_grad():
            outputs = model(**inputs)
            probs = outputs.logits_per_image.softmax(dim=-1).squeeze()
            pred_idx = probs.argmax().item()
            confidence = probs[pred_idx].item()

        jp_results_en.append({
            'id': row['id'],
            'filename': row['filename'],
            'predicted_en': SENTIMENT_LABELS_EN_FOR_JP[pred_idx],
            'confidence_en': confidence,
            'actual_sentiment': row.get('sentiment', ''),
            'category': row.get('category', '')
        })
    except Exception as e:
        errors_en += 1
        if errors_en <= 3:
            print(f"Error on row {i}: {type(e).__name__}: {e}")
        continue

jp_results_en_df = pd.DataFrame(jp_results_en)
print(f"   Processed: {len(jp_results_en_df)}, Errors: {errors_en}")
print(f"   Avg confidence (EN labels): {jp_results_en_df['confidence_en'].mean():.2%}")

# ============================================================
# TEST 2: Japanese memes with JAPANESE labels
# ============================================================
print(f"\n Test 2: Japanese memes → JAPANESE labels...")

jp_results_jp = []
errors_jp = 0

for i, row in tqdm(jp_annotations.iterrows(), total=len(jp_annotations), desc="JP memes (JP labels)"):
    try:
        img_path = os.path.join(jp_image_dir, row['filename'])
        if not os.path.exists(img_path):
            errors_jp += 1
            continue

        image = Image.open(img_path).convert("RGB")

        inputs = processor(text=SENTIMENT_LABELS_JP, images=image, return_tensors="pt", padding=True).to(device)

        with torch.no_grad():
            outputs = model(**inputs)
            probs = outputs.logits_per_image.softmax(dim=-1).squeeze()
            pred_idx = probs.argmax().item()
            confidence = probs[pred_idx].item()

        jp_results_jp.append({
            'id': row['id'],
            'predicted_jp': SENTIMENT_LABELS_JP[pred_idx],
            'confidence_jp': confidence
        })
    except Exception as e:
        errors_jp += 1
        if errors_jp <= 3:
            print(f"Error on row {i}: {type(e).__name__}: {e}")
        continue

jp_results_jp_df = pd.DataFrame(jp_results_jp)
print(f"   Processed: {len(jp_results_jp_df)}, Errors: {errors_jp}")
print(f"   Avg confidence (JP labels): {jp_results_jp_df['confidence_jp'].mean():.2%}")

# ============================================================
# COMBINE RESULTS
# ============================================================
japanese_results = jp_results_en_df.merge(jp_results_jp_df, on='id', how='inner')

# Save results
os.makedirs('results', exist_ok=True)
japanese_results.to_csv('results/japanese_meme_results.csv', index=False)
print(f"\n Saved to: results/japanese_meme_results.csv")

# ============================================================
# COMPARISON SUMMARY
# ============================================================
print("\n" + "="*50)
print("CROSS-LINGUAL COMPARISON SUMMARY")
print("="*50)

en_baseline_conf = english_results['confidence'].mean()
jp_with_en_labels = japanese_results['confidence_en'].mean()
jp_with_jp_labels = japanese_results['confidence_jp'].mean()

print(f"\n{'Dataset':<30} {'Avg Confidence':>15}")
print("-"*50)
print(f"{'English memes + EN labels':<30} {en_baseline_conf:>14.2%}")
print(f"{'Japanese memes + EN labels':<30} {jp_with_en_labels:>14.2%}")
print(f"{'Japanese memes + JP labels':<30} {jp_with_jp_labels:>14.2%}")
print("-"*50)
print(f"{'Gap (EN baseline vs JP+EN)':<30} {en_baseline_conf - jp_with_en_labels:>14.2%}")
print(f"{'Gap (JP+EN vs JP+JP)':<30} {jp_with_en_labels - jp_with_jp_labels:>14.2%}")

# Show sample Japanese results
print(f"\n Sample Japanese meme predictions:")
print(japanese_results[['filename', 'predicted_en', 'confidence_en', 'actual_sentiment']].head(10))

---
## Error Analysis & Visualizations
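
Accuracy depends on the annotation vocabulary lining up with the prediction labels: an annotation such as "humorous" never string-matches the predicted label "funny". One option is to normalize annotations into the label vocabulary before scoring. This is a sketch under stated assumptions: `SENTIMENT_SYNONYMS` is a hypothetical mapping that would need to be agreed with the annotators, and `normalize_sentiment` and `exact_match_accuracy` are illustrative helpers, not the notebook's method.

```python
# Hypothetical synonym map: annotation vocabulary -> label vocabulary used for prediction.
SENTIMENT_SYNONYMS = {
    "humorous": "funny",
    "motivational_sarcastic": "sarcastic",
    "agreement": "neutral",
}
VALID_LABELS = {"funny", "sarcastic", "offensive", "wholesome", "neutral", "political"}

def normalize_sentiment(raw):
    """Map a raw annotation to a label in VALID_LABELS, or None if it cannot be mapped."""
    key = str(raw or "").strip().lower()
    key = SENTIMENT_SYNONYMS.get(key, key)
    return key if key in VALID_LABELS else None

def exact_match_accuracy(pairs):
    """Accuracy over (predicted, annotated) pairs, skipping unmappable annotations."""
    scored = [(pred, normalize_sentiment(gold)) for pred, gold in pairs]
    scored = [(pred, gold) for pred, gold in scored if gold is not None]
    if not scored:
        return float("nan"), 0
    correct = sum(pred == gold for pred, gold in scored)
    return correct / len(scored), len(scored)

# Made-up (predicted, annotated) pairs for illustration only.
pairs = [("funny", "humorous"), ("sarcastic", "Funny"), ("neutral", ""), ("funny", "agreement")]
accuracy, n_scored = exact_match_accuracy(pairs)
print(accuracy, n_scored)
```

Reporting the number of scored rows alongside the accuracy makes it visible when many annotations fall outside the label set.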

In [None]:
# ============================================================
# SECTION 4: VISUALIZATIONS & ERROR ANALYSIS
# ============================================================

import matplotlib.pyplot as plt
import seaborn as sns
import os

os.makedirs('figures', exist_ok=True)

# ============================================================
# 4.1: Cross-Lingual Comparison Bar Chart
# ============================================================

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Chart 1: Confidence Comparison
ax1 = axes[0]
conditions = ['English memes\n+ EN labels', 'Japanese memes\n+ EN labels', 'Japanese memes\n+ JP labels']
confidences = [
    english_results['confidence'].mean(),
    japanese_results['confidence_en'].mean(),
    japanese_results['confidence_jp'].mean()
]
colors = ['#2ecc71', '#3498db', '#e74c3c']

bars = ax1.bar(conditions, confidences, color=colors, edgecolor='black')
ax1.set_ylabel('Average Confidence', fontsize=12)
ax1.set_title('CLIP Confidence Across Languages', fontsize=14, fontweight='bold')
# Leave headroom for the value labels even if a mean exceeds 0.6
ax1.set_ylim(0, max(0.6, max(confidences) + 0.1))

# Add value labels on bars
for bar, conf in zip(bars, confidences):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
             f'{conf:.1%}', ha='center', fontsize=11, fontweight='bold')

# Chart 2: Prediction Distribution (Japanese memes)
ax2 = axes[1]
jp_pred_counts = japanese_results['predicted_en'].value_counts()
jp_pred_counts.plot(kind='bar', ax=ax2, color='steelblue', edgecolor='black')
ax2.set_xlabel('Predicted Sentiment')
ax2.set_ylabel('Count')
ax2.set_title('Japanese Meme Predictions\n(using English labels)', fontsize=12)
ax2.tick_params(axis='x', rotation=45)

# Chart 3: Confidence Distribution Comparison
ax3 = axes[2]
ax3.hist(english_results['confidence'], bins=15, alpha=0.6, label='English memes', color='green', edgecolor='black')
ax3.hist(japanese_results['confidence_en'], bins=15, alpha=0.6, label='Japanese memes (EN labels)', color='blue', edgecolor='black')
ax3.hist(japanese_results['confidence_jp'], bins=15, alpha=0.6, label='Japanese memes (JP labels)', color='red', edgecolor='black')
ax3.set_xlabel('Confidence Score')
ax3.set_ylabel('Frequency')
ax3.set_title('Confidence Distribution Comparison', fontsize=12)
ax3.legend()

plt.tight_layout()
plt.savefig('figures/cross_lingual_comparison.png', dpi=150, bbox_inches='tight')
print(" Saved: figures/cross_lingual_comparison.png")
plt.show()

In [None]:
# ============================================================
# 4.2: Accuracy Analysis
# ============================================================

print("\n ACCURACY ANALYSIS")
print("="*50)

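# Calculate accuracy for Japanese memes.
# Substring matching is lenient (e.g. "funny" matches "very funny"). Empty or
# missing annotations count as incorrect; otherwise '' would match every prediction.
def check_match(row):
    predicted = str(row['predicted_en']).strip().lower()
    actual = str(row['actual_sentiment']).strip().lower()
    if not actual or actual == 'nan':
        return False
    return actual in predicted or predicted in actual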

japanese_results['correct'] = japanese_results.apply(check_match, axis=1)
accuracy = japanese_results['correct'].mean()

print(f"Japanese meme accuracy (predicted vs annotated): {accuracy:.1%}")

# Confusion-style breakdown
print("\n📋 Prediction vs Actual Sentiment:")
confusion = pd.crosstab(
    japanese_results['actual_sentiment'],
    japanese_results['predicted_en'],
    margins=True
)
print(confusion)

In [None]:
# ============================================================
# 4.3: Category Analysis
# ============================================================

print("\nPERFORMANCE BY CATEGORY")
print("="*50)

category_stats = japanese_results.groupby('category').agg({
    'confidence_en': ['mean', 'count'],
    'confidence_jp': 'mean'
}).round(3)

category_stats.columns = ['EN_label_conf', 'count', 'JP_label_conf']
category_stats = category_stats.sort_values('count', ascending=False)
print(category_stats.head(15))

In [None]:
# ============================================================
# 4.4: Summary Statistics for Paper
# ============================================================

print("\n" + "="*60)
print(" SUMMARY FOR PAPER")
print("="*60)

summary_stats = {
    'english_memes': {
        'n': len(english_results),
        'avg_confidence': english_results['confidence'].mean(),
        'std_confidence': english_results['confidence'].std()
    },
    'japanese_memes_en_labels': {
        'n': len(japanese_results),
        'avg_confidence': japanese_results['confidence_en'].mean(),
        'std_confidence': japanese_results['confidence_en'].std()
    },
    'japanese_memes_jp_labels': {
        'n': len(japanese_results),
        'avg_confidence': japanese_results['confidence_jp'].mean(),
        'std_confidence': japanese_results['confidence_jp'].std()
    },
    'cross_lingual_gap': english_results['confidence'].mean() - japanese_results['confidence_en'].mean(),
    'label_language_gap': japanese_results['confidence_en'].mean() - japanese_results['confidence_jp'].mean()
}

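import json

# json's `default` hook is never called for floats, so round explicitly instead.
def round_floats(obj, ndigits=4):
    """Recursively round floats so the JSON output stays readable."""
    if isinstance(obj, dict):
        return {k: round_floats(v, ndigits) for k, v in obj.items()}
    if isinstance(obj, float):
        return round(float(obj), ndigits)
    return obj

summary_rounded = round_floats(summary_stats)
print(json.dumps(summary_rounded, indent=2))

# Save summary
with open('results/summary_statistics.json', 'w') as f:
    json.dump(summary_rounded, f, indent=2)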
print("\n Summary saved to: results/summary_statistics.json")

---
## Results Compilation & Paper Preparation

In [None]:
# ============================================================
# SECTION 5: EXPORT ALL RESULTS
# ============================================================

import shutil
import os

# Create paper materials folder
os.makedirs('paper_materials', exist_ok=True)

# Copy all results
files_to_copy = [
    ('figures/cross_lingual_comparison.png', 'paper_materials/'),
    ('figures/english_baseline_results.png', 'paper_materials/'),
    ('results/summary_statistics.json', 'paper_materials/'),
    ('results/english_meme_results.csv', 'paper_materials/'),
    ('results/japanese_meme_results.csv', 'paper_materials/')
]

print("Exporting paper materials...")
for src, dst in files_to_copy:
    if os.path.exists(src):
        shutil.copy(src, dst)
        print(f"   ✅ {src}")
    else:
        print(f"   ⚠️ Not found: {src}")

# Create zip file
!zip -r cross_lingual_clip_results.zip results/ figures/ paper_materials/ 2>/dev/null

print("\n" + "="*60)
print(" PROJECT COMPLETE!")
print("="*60)

In [None]:
from google.colab import files
files.download('cross_lingual_clip_results.zip')

In [None]:
import json
from google.colab import files

# Load notebook
notebook_path = "YOUR_DRIVE_PATH/CrossLingual_CLIP_Meme_Understanding.ipynb"

with open(notebook_path, 'r', encoding='utf-8') as f:
    notebook = json.load(f)

# Clean each cell
for cell in notebook.get('cells', []):
    if cell.get('cell_type') == 'code':
        # Clear outputs
        cell['outputs'] = []
        cell['execution_count'] = None

        # Clean source - replace personal paths with generic
        source = ''.join(cell.get('source', []))

        # Replace personal paths
        source = source.replace('YOUR_DRIVE_PATH/', 'YOUR_DRIVE_PATH/')
        source = source.replace('YOUR_GOOGLE_SHEET_ID', 'YOUR_GOOGLE_SHEET_ID')

        cell['source'] = source.split('\n')
        cell['source'] = [line + '\n' if i < len(cell['source'])-1 else line
                          for i, line in enumerate(cell['source'])]

# Save clean version
clean_path = "/content/CrossLingual_CLIP_Clean.ipynb"
with open(clean_path, 'w', encoding='utf-8') as f:
    json.dump(notebook, f, indent=1, ensure_ascii=False)

print("✅ Cleaned notebook created!")
files.download(clean_path)

---
## 🎯 Next Steps

### To Complete This Study:

1. **Collect Japanese Memes** (50-100 minimum)
   - Use Twitter/X with Japanese hashtags
   - Annotate with sentiment/category labels
   - Save to `data/japanese_memes/`

2. **Run Full Evaluation**
   - Increase the English meme sample beyond the 200 used here
   - Evaluate all Japanese memes
   - Run cross-lingual comparisons

3. **Statistical Analysis**
   - Compute significance tests (t-test, chi-square)
   - Calculate effect sizes
   - Generate confidence intervals

4. **Write Paper**
   - Use the results and figures generated
   - Follow workshop paper format (4-6 pages)
   - Target: ACL workshops, EMNLP workshops
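
For the statistical analysis step, effect sizes and confidence intervals can be computed with the standard library alone. The sketch below uses Cohen's d and a percentile bootstrap for the difference in mean confidence. Note that the bootstrap is swapped in here as an alternative to the t-test named above, since softmax confidences are bounded and may not be normally distributed. The scores are made up, and `cohens_d` and `bootstrap_mean_diff_ci` are hypothetical helpers.

```python
import random
import statistics

def cohens_d(a, b):
    """Effect size for two independent samples using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

def bootstrap_mean_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(a, k=len(a))  # sample with replacement
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Made-up confidence scores, for illustration only.
english_conf = [0.42, 0.38, 0.51, 0.47, 0.36, 0.44, 0.40, 0.49]
japanese_conf = [0.33, 0.29, 0.41, 0.35, 0.30, 0.37, 0.31, 0.34]

d = cohens_d(english_conf, japanese_conf)
low, high = bootstrap_mean_diff_ci(english_conf, japanese_conf)
print(f"Cohen's d: {d:.2f}, 95% CI for mean difference: [{low:.3f}, {high:.3f}]")
```

In the notebook, the two lists would be replaced by `english_results['confidence'].tolist()` and `japanese_results['confidence_en'].tolist()`.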

### Potential Extensions:
- Try multilingual CLIP variants (XLM-R based)
- Fine-tune on small Japanese meme dataset
- Add OCR to explicitly extract Japanese text
- Compare with Japanese-specific models
---