# Sequential Two-Stage Contextual Bridge Distillation — Sports-in-the-Wild

This notebook implements a **sequential two-stage knowledge bridge** instead of training all three models jointly. The idea:

```
Teacher (Base)             ──►  Stage 1: Train Small      ──►  Stage 2: Small (frozen) teaches Tiny
(global rich knowledge)         (domain-adapted bridge)        (final efficient student)
```

Why a contextual bridge?
- The direct gap Teacher(Base) → Student(Tiny) can be large (capacity + representation mismatch).
- First adapting an intermediate (Small) yields a *domain-specialized assistant*.
- Then the Tiny model learns from both: retained high-level signal (Teacher) + distilled, compressed domain signal (Assistant).

Expected outcome: better training stability and higher Tiny accuracy than single-hop distillation.


## Training Flow Overview

We run **two independent training jobs** inside one notebook:

### Stage 1: Teacher → Assistant (Train ViT-Small)
Goal: Produce a strong, domain-adapted SMALL model that will act as a frozen assistant in Stage 2.

Configuration principles:
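- Student model = `videomae-small`, initialized from the pretrained Small checkpoint (it becomes the assistant in Stage 2)
- Assistant influence is optional: set `assistant_logits_weight = 0` for pure Teacher → Small KD. The configuration below loads the same pretrained Small checkpoint as a frozen assistant with `assistant_logits_weight=0.5`
- Logits KD only; feature and attention distillation are disabled to save memory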
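For reference, here is a minimal sketch of the temperature-scaled logits KD term that both stages rely on, assuming the standard Hinton-style formulation (KL divergence between the softened mentor and student distributions, scaled by T²). The function name `logits_kd_loss` is hypothetical, and `tri_model_distillation` may compute or reduce this term differently.

```python
import torch
import torch.nn.functional as F


def logits_kd_loss(student_logits: torch.Tensor,
                   mentor_logits: torch.Tensor,
                   temperature: float = 4.0) -> torch.Tensor:
    """KL(mentor || student) on temperature-softened distributions, scaled by T^2.

    The mentor (teacher or frozen assistant) is detached so no gradient flows into it.
    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    mentor_probs = F.softmax(mentor_logits.detach() / t, dim=-1)
    # 'batchmean' sums the KL over classes and averages over the batch
    return F.kl_div(student_log_probs, mentor_probs, reduction="batchmean") * t * t


# Example: a batch of 2 clips over the 30 sports classes, random logits
torch.manual_seed(0)
student = torch.randn(2, 30)
mentor = torch.randn(2, 30)
print(logits_kd_loss(student, mentor))  # positive when the distributions differ
print(logits_kd_loss(mentor, mentor))   # ~0 when they match
```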

### Stage 2: Assistant → Student (Train ViT-Tiny)
Goal: Train the TINY model using BOTH the frozen Small (assistant) and the original Teacher.

Configuration principles:
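- Assistant model path = checkpoint directory produced in Stage 1
- The Teacher can still provide a small stabilizing signal through `teacher_logits_weight` (set to `0.0` in the configuration below, so only the Assistant is distilled)
- The Assistant has the higher logits weight (primary mentor)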
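One plausible way the supervised loss and the two mentors combine into a single objective is sketched below. This is an assumption, not the `tri_model_distillation` implementation: `bridge_loss` and `kd_term` are hypothetical names, and the library may normalize or weight the terms differently. The keyword arguments reuse the `TriModelConfig` field names, with defaults taken from the Stage 2 configuration.

```python
import torch
import torch.nn.functional as F


def kd_term(student_logits: torch.Tensor, mentor_logits: torch.Tensor, t: float) -> torch.Tensor:
    """Temperature-softened KL(mentor || student), scaled by T^2; the mentor is detached."""
    return F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                    F.softmax(mentor_logits.detach() / t, dim=-1),
                    reduction="batchmean") * t * t


def bridge_loss(student_logits, teacher_logits, assistant_logits, labels,
                classification_loss_weight=0.70, logits_distillation_weight=0.15,
                teacher_logits_weight=0.0, assistant_logits_weight=1.0,
                temperature=4.0):
    """Hypothetical objective: weighted CE plus weighted two-mentor logits KD.

    Defaults mirror the Stage 2 config; Stage 1 would pass
    classification_loss_weight=1.0, logits_distillation_weight=0.35,
    teacher_logits_weight=1.0, assistant_logits_weight=0.5.
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = (teacher_logits_weight * kd_term(student_logits, teacher_logits, temperature)
          + assistant_logits_weight * kd_term(student_logits, assistant_logits, temperature))
    return classification_loss_weight * ce + logits_distillation_weight * kd


# Example: batch of 4 clips, 30 classes, random logits and labels
torch.manual_seed(0)
student, teacher, assistant = torch.randn(4, 30), torch.randn(4, 30), torch.randn(4, 30)
labels = torch.tensor([0, 5, 9, 29])
print(bridge_loss(student, teacher, assistant, labels))
```

Under this formulation, `teacher_logits_weight=0.0` means the teacher logits have no effect on the loss at all, which is the setting the Stage 2 configuration in this notebook uses.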

### Advantages of This Design
- Reduces representational jump distance
- Lets the Assistant internalize domain specifics before mentoring Tiny
- Often yields higher accuracy than a direct Teacher→Tiny pipeline

Proceed through sections in order. Skip Stage 1 only if you already have a trained Small checkpoint you want to reuse.


In [1]:
# Environment / Common Imports
import os, json, torch
import torchvision, pytorchvideo, transformers
from datetime import datetime
from transformers import TrainingArguments

from tri_model_distillation.config import TriModelConfig
from tri_model_distillation.models import TriModelDistillationFramework
from tri_model_distillation.trainer import TriModelDistillationTrainer, compute_video_classification_metrics
from tri_model_distillation.utils import (
    setup_logging, load_label_mappings, create_data_loaders,
)

print(torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("torchvision:", torchvision.__version__)
print("pytorchvideo:", pytorchvideo.__version__)
print("has functional_tensor:", hasattr(__import__('torchvision.transforms', fromlist=['']), 'functional_tensor'))



2.1.0+cu118
CUDA available: True
torch: 2.1.0+cu118 cuda: 11.8
torchvision: 0.16.0+cu118
pytorchvideo: 0.1.5
has functional_tensor: True


In [2]:
# Dataset + Label Mapping
DATASET_ROOT = 'processed_dataset'
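# A fresh timestamped directory is created every time this cell runs.
# To reuse a previous Stage 1 run (and let the skip logic below find it),
# hard-code BASE_RUN_DIR to that run's directory instead.
BASE_RUN_DIR = f"./contextual_bridge_runs_{datetime.now().strftime('%Y%m%d_%H%M%S')}"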
STAGE1_OUTPUT_DIR = f"{BASE_RUN_DIR}/stage1_teacher_to_small"
STAGE2_OUTPUT_DIR = f"{BASE_RUN_DIR}/stage2_small_to_tiny"
os.makedirs(STAGE1_OUTPUT_DIR, exist_ok=True)
os.makedirs(STAGE2_OUTPUT_DIR, exist_ok=True)

label2id, id2label = load_label_mappings(dataset_root=DATASET_ROOT, train_csv='train.csv', classification_type='multiclass')
num_labels = len(label2id)
print(f"Detected {num_labels} classes")
print(list(label2id.keys())[:30], '...')

Detected 30 classes
['archery', 'baseball', 'basketball', 'bmx', 'bowling', 'boxing', 'cheerleading', 'discusthrow', 'diving', 'football', 'golf', 'gymnastics', 'hammerthrow', 'highjump', 'hockey', 'hurdling', 'javelin', 'longjump', 'polevault', 'rowing', 'running', 'shotput', 'skating', 'skiing', 'soccer', 'swimming', 'tennis', 'volleyball', 'weight', 'wrestling'] ...


In [3]:
# Stage 1 Configuration (Teacher + Pretrained Assistant → Train Small Student)
# Dual supervision with ONLY logits KD (features/attentions disabled to save memory)
pretrained_small_ckpt = 'mitegvg/videomae-small-finetuned-ssv2-finetuned-sports-videos-in-the-wild'

stage1_config = TriModelConfig(
    classification_type='multiclass',
    num_labels=num_labels,
    teacher_model_name='mitegvg/videomae-base-finetuned-kinetics-finetuned-sports-videos-in-the-wild',
    assistant_model_name=pretrained_small_ckpt,
    student_model_name=pretrained_small_ckpt,
    temperature=4.0,
    logits_temperature=4.0,
    teacher_logits_weight=1.0,
    assistant_logits_weight=0.5,
    classification_loss_weight=1.0,
    logits_distillation_weight=0.35,
    hidden_layers_to_align=[],
    feature_distillation_weight=0.0,
    attention_distillation_weight=0.0,
    use_pretrained_student=True,
    num_frames=16,
    apply_defaults=False,  # keep auto-defaults from re-enabling hidden-state/attention outputs
)

# Memory safety knobs
TOTAL_TRAIN_SAMPLES = 3364  # size of the training split (hard-coded)
per_device_train_batch_size = 2  # reduced to fit 16-frame clips in GPU memory
gradient_accumulation_steps = 16 if torch.cuda.is_available() else 8  # effective batch: 32 on GPU, 16 on CPU
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
stage1_epochs = 12
steps_per_epoch = TOTAL_TRAIN_SAMPLES // effective_batch
stage1_total_steps = steps_per_epoch * stage1_epochs
stage1_warmup = min(500, int(0.1 * stage1_total_steps))

print('Stage 1 config ready (logits-only KD; features/attn disabled)')

Stage 1 config ready (logits-only KD; features/attn disabled)
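The schedule arithmetic in this cell can be pulled into a small helper so the numbers for both stages and for the CPU fallback are easy to check at a glance. `schedule` is a hypothetical name; it counts optimizer updates (one per `per_device_train_batch_size × gradient_accumulation_steps` samples) and assumes a single device.

```python
def schedule(total_samples, per_device_batch, grad_accum, epochs,
             warmup_cap=500, warmup_frac=0.1):
    """Return (effective_batch, steps_per_epoch, total_steps, warmup_steps)."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = total_samples // effective_batch  # optimizer updates per epoch
    total_steps = steps_per_epoch * epochs
    warmup = min(warmup_cap, int(warmup_frac * total_steps))  # 10% warmup, capped at 500
    return effective_batch, steps_per_epoch, total_steps, warmup


print(schedule(3364, 2, 16, 12))  # Stage 1 on GPU → (32, 105, 1260, 126)
print(schedule(3364, 2, 16, 20))  # Stage 2 on GPU → (32, 105, 2100, 210)
print(schedule(3364, 2, 8, 12))   # Stage 1 on CPU → (16, 210, 2520, 252)
```

With these settings the 500-step warmup cap never binds; warmup is always the 10% term.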


In [4]:
# Stage 1 Skip Logic — Detect existing checkpoint to optionally skip Stage 1 training
import os, glob
# Only weight files count: a config.json on its own does not mean a model was saved.
stage1_checkpoint_exists = any(
    os.path.exists(os.path.join(STAGE1_OUTPUT_DIR, fname))
    for fname in ['pytorch_model.bin', 'model.safetensors']
)
if not stage1_checkpoint_exists:
    for sub in glob.glob(os.path.join(STAGE1_OUTPUT_DIR, 'checkpoint-*')):
        if os.path.exists(os.path.join(sub, 'pytorch_model.bin')) or os.path.exists(os.path.join(sub, 'model.safetensors')):
            stage1_checkpoint_exists = True
            break

SKIP_STAGE1 = stage1_checkpoint_exists
if SKIP_STAGE1:
    print(f"Existing Stage 1 checkpoint detected in {STAGE1_OUTPUT_DIR}. Stage 1 training will be skipped.")
    print("(Delete or rename the directory to force retraining.)")
else:
    print("No existing Stage 1 checkpoint found. Stage 1 training will run.")

Existing Stage 1 checkpoint detected in ./contextual_bridge_runs_20250824_214420/stage1_teacher_to_small. Stage 1 training will be skipped.
(Delete or rename the directory to force retraining.)
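The skip logic also accepts weights found only in `checkpoint-*` subdirectories, but Stage 2 loads the assistant from the root of `STAGE1_OUTPUT_DIR`, so a run that left only subdirectory checkpoints would trip the assertion in the Stage 2 configuration cell. A helper like the hypothetical `find_stage1_checkpoint` below returns a directory that could be passed as `assistant_model_name` instead. It prefers the root model (written by `save_model` after `load_best_model_at_end=True`) and otherwise falls back to the highest-numbered step, sorted numerically so `checkpoint-1000` beats `checkpoint-900`. Note that the latest checkpoint is not necessarily the best one.

```python
import glob
import os
import re

WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def has_weights(path: str) -> bool:
    """True if the directory contains a single-file model weight file."""
    return any(os.path.isfile(os.path.join(path, f)) for f in WEIGHT_FILES)


def find_stage1_checkpoint(output_dir: str):
    """Return a directory loadable by from_pretrained, or None if nothing is found."""
    if has_weights(output_dir):
        return output_dir  # final model saved at the root
    candidates = []
    for sub in glob.glob(os.path.join(glob.escape(output_dir), "checkpoint-*")):
        match = re.fullmatch(r"checkpoint-(\d+)", os.path.basename(sub))
        if match and has_weights(sub):
            candidates.append((int(match.group(1)), sub))  # numeric step for sorting
    return max(candidates)[1] if candidates else None
```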


In [5]:
# Stage 1: Initialize Framework + Dataloaders
if SKIP_STAGE1:
    print('Stage 1 skipped: loading assistant from existing checkpoint for Stage 2.')
else:
    setup_logging()

    stage1_framework = TriModelDistillationFramework(
        config=stage1_config,
        num_labels=num_labels,
        label2id=label2id,
        id2label=id2label,
    )

    def _count_params(m):
        return sum(p.numel() for p in m.parameters() if p.requires_grad), sum(p.numel() for p in m.parameters() if not p.requires_grad)

    tr_s, fr_s = _count_params(stage1_framework.student_model)
    tr_t, fr_t = _count_params(stage1_framework.teacher_model)
    tr_a, fr_a = _count_params(stage1_framework.assistant_model)
    print(f"Teacher trainable {tr_t:,} frozen {fr_t:,}")
    print(f"Assistant trainable {tr_a:,} frozen {fr_a:,}")
    print(f"Student trainable {tr_s:,} frozen {fr_s:,}")
    print('Need hidden states:', getattr(stage1_framework,'_need_hidden',True), 'Need attentions:', getattr(stage1_framework,'_need_attn',True))

    train_loader, val_loader, test_loader = create_data_loaders(
        dataset_root=DATASET_ROOT,
        image_processor=stage1_framework.image_processor,
        label2id=label2id,
        batch_size=per_device_train_batch_size,
        num_frames=stage1_config.num_frames,
        num_workers=2,
    )
    print('Data loaders ready')

    # Dry-run memory probe: one forward/backward pass on a single batch
    if torch.cuda.is_available():
        import gc
        batch = next(iter(train_loader))
        batch = {k: v.to('cuda') if hasattr(v, 'to') else v for k, v in batch.items()}
        gc.collect(); torch.cuda.empty_cache(); torch.cuda.reset_peak_memory_stats()
        out = stage1_framework(pixel_values=batch['pixel_values'], labels=batch['labels'], output_hidden_states=False, output_attentions=False)
        loss_probe = out['student'].logits.mean()
        loss_probe.backward()
        peak = torch.cuda.max_memory_allocated() / 1024 / 1024
        print(f"Dry-run peak MB: {peak:.1f}")
        # Drop the probe gradients so they neither hold memory nor leak into training
        stage1_framework.student_model.zero_grad(set_to_none=True)
        del out, batch, loss_probe
        gc.collect(); torch.cuda.empty_cache()

Stage 1 skipped: loading assistant from existing checkpoint for Stage 2.


In [6]:
# Stage 1: Train Small (Assistant-to-be)
if SKIP_STAGE1:
    print('Skipping Stage 1 training; will use existing checkpoint as assistant in Stage 2.')
    stage1_trainer = None
    stage1_framework = None
else:
    # Enable gradient checkpointing for student to reduce activation memory
    if hasattr(stage1_framework.student_model, 'gradient_checkpointing_enable'):
        stage1_framework.student_model.gradient_checkpointing_enable()

    stage1_args = stage1_config.to_training_args(
        per_device_train_batch_size=per_device_train_batch_size,
        per_device_eval_batch_size=per_device_train_batch_size,
        num_train_epochs=stage1_epochs,
        warmup_steps=stage1_warmup,
        eval_strategy='epoch',  # same spelling as Stage 2; the config helper falls back for older transformers
        logging_strategy='epoch',
        save_strategy='epoch',
        save_total_limit=20,
        output_dir=STAGE1_OUTPUT_DIR,
        overwrite_output_dir=True,
        gradient_accumulation_steps=gradient_accumulation_steps,
        fp16=torch.cuda.is_available(),
        remove_unused_columns=False,
        dataloader_pin_memory=True,
        dataloader_num_workers=0,
        metric_for_best_model='eval_accuracy',
        greater_is_better=True,
        load_best_model_at_end=True,
        report_to=['tensorboard'],
        logging_dir=f'{STAGE1_OUTPUT_DIR}/logs',
    )

    stage1_trainer = TriModelDistillationTrainer(
        framework=stage1_framework,
        distillation_config=stage1_config,
        args=stage1_args,
        train_dataset=train_loader.dataset,
        eval_dataset=val_loader.dataset,
        compute_metrics=lambda eval_pred, **kw: compute_video_classification_metrics(eval_pred, classification_type='multiclass'),
    )

    print('Starting Stage 1 training (logits-only KD, memory optimized)...')
    stage1_train_result = stage1_trainer.train()
    print('Stage 1 training complete')

    stage1_trainer.save_model(STAGE1_OUTPUT_DIR)
    stage1_val_metrics = stage1_trainer.evaluate(eval_dataset=val_loader.dataset)
    print('Stage 1 validation:', stage1_val_metrics)

Skipping Stage 1 training; will use existing checkpoint as assistant in Stage 2.


In [7]:
# Stage 2 Configuration (Assistant → Tiny)
# If Stage 1 skipped, we still expect STAGE1_OUTPUT_DIR to already contain a trained small model
if SKIP_STAGE1:
    assert os.path.exists(os.path.join(STAGE1_OUTPUT_DIR, 'config.json')), 'Expected existing Stage 1 checkpoint missing.'

# Stage 2 configuration: logits KD from the assistant only (teacher, feature, and attention weights are 0)
stage2_config = TriModelConfig(
    classification_type='multiclass',
    num_labels=num_labels,
    teacher_model_name='mitegvg/videomae-base-finetuned-kinetics-finetuned-sports-videos-in-the-wild',
    assistant_model_name=STAGE1_OUTPUT_DIR,  # trained small (assistant)
    student_model_name='mitegvg/videomae-tiny-finetuned-kinetics-finetuned-sports-videos-in-the-wild',
    # Distillation temperatures
    temperature=4.0,
    logits_temperature=4.0,
    # Logits KD weighting
    teacher_logits_weight=0.0,
    assistant_logits_weight=1.0,
    logits_distillation_weight=0.15,
    # Supervised classification loss
    classification_loss_weight=0.70,
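    # Representation alignment: the flags are on, but with no layers listed and
    # zero weights below, feature/attention KD is effectively disabled in this run
    align_hidden_states=True,
    align_attention_maps=True,
    hidden_layers_to_align=[],  # empty: no hidden layers are aligned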
    feature_distillation_weight=0.0,
    attention_distillation_weight=0.0,
    # (If your loss uses these per-source feature weights)
    teacher_feature_weight=0.0,
    assistant_feature_weight=0.0,
    # Student init
    use_pretrained_student=True,
    num_frames=16,
)

stage2_epochs = 20
stage2_total_steps = steps_per_epoch * stage2_epochs
stage2_warmup = min(500, int(0.1 * stage2_total_steps))
print('Stage 2 config ready (logits KD from assistant; features/attn disabled)')

Stage 2 config ready (logits KD from assistant; features/attn disabled)
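The log line `[TriModelConfig] Active components => logits:True features:False attn:False`, printed when the Stage 2 framework is built, confirms that only logits KD contributes in this run. The hypothetical `active_components` helper below reproduces that check from the config values, under the assumption that a term is active only when its weight is positive (and, for features, when at least one layer is listed). The library's actual rule may differ.

```python
def active_components(cfg: dict) -> dict:
    """Assumed rule for which KD terms can contribute to the loss."""
    logits = cfg.get("logits_distillation_weight", 0.0) > 0 and (
        cfg.get("teacher_logits_weight", 0.0) > 0
        or cfg.get("assistant_logits_weight", 0.0) > 0
    )
    features = (cfg.get("feature_distillation_weight", 0.0) > 0
                and bool(cfg.get("hidden_layers_to_align")))
    attention = cfg.get("attention_distillation_weight", 0.0) > 0
    return {"logits": logits, "features": features, "attn": attention}


# Values copied from the Stage 2 TriModelConfig above
stage2_values = dict(
    teacher_logits_weight=0.0, assistant_logits_weight=1.0,
    logits_distillation_weight=0.15, align_hidden_states=True,
    align_attention_maps=True, hidden_layers_to_align=[],
    feature_distillation_weight=0.0, attention_distillation_weight=0.0,
)
print(active_components(stage2_values))  # {'logits': True, 'features': False, 'attn': False}
```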


In [8]:
# Stage 2: Initialize + Train Tiny with Frozen Assistant
# Clean up Stage 1 objects to free GPU memory before constructing the Stage 2 framework
import gc
if not SKIP_STAGE1:
    del stage1_trainer, stage1_framework
gc.collect()  # release unreferenced tensors first, then return cached blocks to the driver
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print('CUDA memory cache cleared before Stage 2 initialization.')

stage2_framework = TriModelDistillationFramework(
    config=stage2_config,
    num_labels=num_labels,
    label2id=label2id,
    id2label=id2label,
)

train_loader2, val_loader2, test_loader2 = create_data_loaders(
    dataset_root=DATASET_ROOT,
    image_processor=stage2_framework.image_processor,
    label2id=label2id,
    batch_size=per_device_train_batch_size,
    num_frames=stage2_config.num_frames,
    num_workers=2,
)

stage2_args = stage2_config.to_training_args(
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_train_batch_size,
    num_train_epochs=stage2_epochs,
    warmup_steps=stage2_warmup,
    eval_strategy='epoch',
    logging_strategy='epoch',
    save_strategy='epoch',
    save_total_limit=20,
    output_dir=STAGE2_OUTPUT_DIR,
    overwrite_output_dir=True,
    gradient_accumulation_steps=gradient_accumulation_steps,
    fp16=torch.cuda.is_available(),
    remove_unused_columns=False,
    dataloader_pin_memory=True,
    dataloader_num_workers=0,
    metric_for_best_model='eval_accuracy',
    greater_is_better=True,
    load_best_model_at_end=True,
    report_to=['tensorboard', 'wandb'],
    logging_dir=f'{STAGE2_OUTPUT_DIR}/logs',
)

stage2_trainer = TriModelDistillationTrainer(
    framework=stage2_framework,
    distillation_config=stage2_config,
    args=stage2_args,
    train_dataset=train_loader2.dataset,
    eval_dataset=val_loader2.dataset,
    compute_metrics=lambda eval_pred, **kw: compute_video_classification_metrics(eval_pred, classification_type='multiclass'),
)

print('Starting Stage 2 training...')
stage2_train_result = stage2_trainer.train()
print('Stage 2 training complete')

stage2_trainer.save_model(STAGE2_OUTPUT_DIR)
stage2_val_metrics = stage2_trainer.evaluate(eval_dataset=val_loader2.dataset)
print('Stage 2 validation:', stage2_val_metrics)

# For downstream evaluation cells
OUTPUT_DIR = STAGE2_OUTPUT_DIR
framework = stage2_framework
val_loader = val_loader2
test_loader = test_loader2
print('OUTPUT_DIR set to final student:', OUTPUT_DIR)

CUDA memory cache cleared before Stage 2 initialization.
[TriModelConfig] Active components => logits:True features:False attn:False
[TriModelConfig] Using legacy eval_strategy fallback.
Starting Stage 2 training...


wandb: Currently logged in as: mite_gvg (mitegvg) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin




Stage 2 training complete
Stage 2 validation: {'eval_loss': 3.6225275925227574, 'eval_classification_loss': 1.8903797278535508, 'eval_feature_distillation_loss': 0.0, 'eval_attention_distillation_loss': 0.0, 'eval_logits_distillation_loss': 11.547651935759045, 'eval_accuracy': 0.4880952380952381, 'eval_precision_macro': 0.47579232615223327, 'eval_precision_micro': 0.4880952380952381, 'eval_precision_weighted': 0.4893833281175718, 'eval_recall_macro': 0.4707976831771982, 'eval_recall_micro': 0.4880952380952381, 'eval_recall_weighted': 0.4880952380952381, 'eval_f1_macro': 0.4611007316882032, 'eval_f1_micro': 0.4880952380952381, 'eval_f1_weighted': 0.476842110187112, 'eval_roc_auc_ovr': 0.9208079792034629, 'eval_roc_auc_ovo': 0.9200895195307596, 'eval_cohen_kappa': 0.4683638204800622, 'eval_balanced_accuracy': 0.4707976831771982, 'eval_top1_accuracy': 0.4880952380952381, 'eval_precision': 0.47579232615223327, 'eval_recall': 0.4707976831771982, 'eval_f1_score': 0.4611007316882032, 'eval_cl

### Stage 2 Rationale
The trained Small model now serves as a **domain-adapted bridge**. In Stage 2 we:
- Freeze the Small checkpoint (loaded via its output directory)
- Optionally keep a light stabilizing signal from the original Base teacher via `teacher_logits_weight` (set to 0.0 in this run, so the teacher contributes no logits KD)
- Emphasize logits distillation from the Assistant (higher weight)

Tuning tips:
- If Tiny underfits early: increase `assistant_logits_weight` or `logits_temperature`
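- If overfitting: reduce `classification_loss_weight` slightly, or add light feature distillation (this requires listing layers in `hidden_layers_to_align` and setting `feature_distillation_weight` > 0)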
- If training unstable: raise `teacher_logits_weight` to 0.4–0.5 for extra regularization
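To see why raising `logits_temperature` gives the Tiny student a richer target, compare a mentor distribution at T = 1 and T = 4. The logits below are made up for illustration: a higher temperature keeps the ranking of classes but spreads probability mass onto the non-target classes, which raises the entropy of the soft targets.

```python
import torch
import torch.nn.functional as F


def soft_targets(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Temperature-softened class probabilities."""
    return F.softmax(logits / temperature, dim=-1)


def entropy(probs: torch.Tensor) -> torch.Tensor:
    """Shannon entropy in nats; the clamp avoids log(0)."""
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)


# Made-up mentor logits over four classes (class 0 is the confident prediction)
mentor_logits = torch.tensor([4.0, 2.0, 1.0, 0.0])

for t in (1.0, 4.0):
    p = soft_targets(mentor_logits, t)
    print(f"T={t}: probs={[round(x, 3) for x in p.tolist()]}, entropy={entropy(p).item():.3f} nats")
```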


In [9]:
# Evaluate the final Tiny student on the full test set

import os
import torch
import time
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from transformers import VideoMAEForVideoClassification, VideoMAEImageProcessor
import cv2
from tqdm import tqdm

print("Starting evaluation on the full test set...")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load model directly (not using pipeline)
local_model_directory = OUTPUT_DIR
print(OUTPUT_DIR)
student_model = VideoMAEForVideoClassification.from_pretrained(local_model_directory)
processor = VideoMAEImageProcessor.from_pretrained(local_model_directory)
student_model.to(device)
student_model.eval()

def process_video_for_inference(video_path, processor, num_frames=16):
    """Process video exactly like the training pipeline"""
    try:
        cap = cv2.VideoCapture(video_path)
        if not cap.isOpened():
            return None
        
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        if frame_count < num_frames:
            cap.release()
            return None
            
        # Sample num_frames indices uniformly across the whole clip
        frame_indices = np.linspace(0, frame_count - 1, num_frames, dtype=int)
        frames = []
        
        for idx in frame_indices:
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ret, frame = cap.read()
            if not ret:
                break
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frames.append(frame)
        
        cap.release()
        
        if len(frames) != num_frames:
            return None
            
        # Process frames using the same processor
        inputs = processor(frames, return_tensors="pt")
        return inputs
        
    except Exception as e:
        print(f"Error processing video {video_path}: {e}")
        return None

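# Load (video_path, label) pairs from the test split.
# Each row is "<relative_video_path> <label>", whitespace-separated despite the .csv extension.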
def load_test_data_from_csv(csv_file_path, data_root_path):
    test_samples = []
    if not os.path.exists(csv_file_path):
        print(f"ERROR: Test CSV file not found at {csv_file_path}")
        return test_samples

    with open(csv_file_path, "r") as f:
        for line_num, line in enumerate(f.readlines(), 1):
            parts = line.strip().split()
            if len(parts) >= 2:
                relative_video_path = parts[0]
                true_label_str = parts[1]
                full_video_path = os.path.normpath(os.path.join(data_root_path, relative_video_path))
                test_samples.append((full_video_path, true_label_str))
            elif line.strip():
                print(f"Warning: Malformed line {line_num} in {csv_file_path}: '{line.strip()}'")
                
    print(f"Loaded {len(test_samples)} samples from {csv_file_path}")
    return test_samples

# Load test data
dataset_root_path = "processed_dataset"
test_csv_path = os.path.join(dataset_root_path, "test.csv")
test_data = load_test_data_from_csv(test_csv_path, dataset_root_path)

if test_data:
    total_videos_processed = 0
    videos_skipped = 0
    true_labels = []
    predicted_labels = []
    top1_correct_predictions = 0
    top5_correct_predictions = 0
    inference_times = []

    print(f"\nStarting inference on {len(test_data)} test videos...")
    
    with torch.no_grad():
        for i, (video_path, true_label) in enumerate(tqdm(test_data, desc="Processing videos")):
            if not os.path.exists(video_path):
                videos_skipped += 1
                continue

            # Process video using the same pipeline as training
            inputs = process_video_for_inference(video_path, processor)
            if inputs is None:
                videos_skipped += 1
                continue

            try:
                start_time = time.perf_counter()

                # Move inputs to device
                inputs = {k: v.to(device) for k, v in inputs.items()}

                # Get model predictions
                outputs = student_model(**inputs)
                logits = outputs.logits[0]  # Remove batch dimension
                probs = torch.softmax(logits, dim=0)

                # Get top-5 predictions
                top5_probs, top5_indices = torch.topk(probs, 5)

                # CUDA kernels run asynchronously; wait for them before stopping the clock
                if device.type == "cuda":
                    torch.cuda.synchronize()
                end_time = time.perf_counter()
                inference_times.append(end_time - start_time)
                total_videos_processed += 1

                # Convert indices to labels using id2label
                predicted_labels_top5 = [id2label[idx.item()] for idx in top5_indices]
                predicted_label_top1 = predicted_labels_top5[0]

                # Store for metrics calculation
                predicted_labels.append(predicted_label_top1)
                true_labels.append(true_label)

                # Calculate top-k accuracy
                if predicted_label_top1 == true_label:
                    top1_correct_predictions += 1
                if true_label in predicted_labels_top5:
                    top5_correct_predictions += 1

            except Exception as e:
                print(f"Error during inference for {video_path}: {e}")
                videos_skipped += 1

    # Print results
    if total_videos_processed > 0:
        top1_accuracy = (top1_correct_predictions / total_videos_processed) * 100
        top5_accuracy = (top5_correct_predictions / total_videos_processed) * 100
        avg_inference_time = sum(inference_times) / len(inference_times)
        fps = 1.0 / avg_inference_time if avg_inference_time > 0 else float('inf')

        print("\n--- Evaluation Complete ---")
        print(f"Total videos in test set: {len(test_data)}")
        print(f"Videos successfully processed: {total_videos_processed}")
        print(f"Videos skipped (missing/corrupt): {videos_skipped}")
        print(f"Top-1 Correct Predictions: {top1_correct_predictions}")
        print(f"Top-5 Correct Predictions: {top5_correct_predictions}")
        print(f"Top-1 Accuracy: {top1_accuracy:.2f}%")
        print(f"Top-5 Accuracy: {top5_accuracy:.2f}%")
        print(f"Average inference time per video: {avg_inference_time:.3f} seconds ({fps:.2f} videos/sec)")

        # Classification report
        if len(predicted_labels) == len(true_labels) and len(true_labels) > 0:
            print("\nDetailed Classification Report:")
            print(classification_report(true_labels, predicted_labels, labels=list(label2id.keys()), zero_division=0))
    else:
        print("No videos were processed successfully.")
else:
    print("No test data loaded.")

Starting evaluation on the full test set...
./contextual_bridge_runs_20250824_214420/stage2_small_to_tiny
Loaded 422 samples from processed_dataset\test.csv

Starting inference on 422 test videos...


Processing videos: 100%|████████████████████████████| 422/422 [01:28<00:00,  4.78it/s]



--- Evaluation Complete ---
Total videos in test set: 422
Videos successfully processed: 422
Videos skipped (missing/corrupt): 0
Top-1 Correct Predictions: 205
Top-5 Correct Predictions: 344
Top-1 Accuracy: 48.58%
Top-5 Accuracy: 81.52%
Average inference time per video: 0.012 seconds (86.89 videos/sec)

Detailed Classification Report:
              precision    recall  f1-score   support

     archery       0.53      0.69      0.60        13
    baseball       0.56      0.50      0.53        18
  basketball       0.75      0.25      0.38        12
         bmx       0.60      0.27      0.38        11
     bowling       0.47      0.80      0.59        10
      boxing       0.00      0.00      0.00        11
cheerleading       0.60      0.32      0.41        19
 discusthrow       0.20      0.50      0.29         4
      diving       0.70      0.64      0.67        11
    football       0.48      0.55      0.51        20
        golf       0.50      0.64      0.56        11
  gymnastics 