# ⚽ Soccer Event Detection - Temporal Alignment

**Approach:** Ground-truth labels from SoccerNet + Reaction Lag

**Model:** XLM-RoBERTa Base

**Expected Accuracy:** ~83% (realistic, not 100%)

---

## 📋 Setup Checklist

1. ✅ Enable GPU: Runtime → Change runtime type → **GPU (T4)**
2. ✅ Mount Google Drive to access datasets
3. ✅ Upload SoccerNet-Echoes transcripts to Drive
4. ✅ Download SoccerNet labels (done in this notebook)
5. ✅ Run all cells

---

## 1️⃣ GPU Check & Install Dependencies

In [1]:
# Check GPU
import torch
print("🔍 Checking GPU...")
if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ No GPU! Training will be VERY slow. Please enable GPU in Runtime settings.")

🔍 Checking GPU...
✅ GPU: Tesla T4
   Memory: 15.83 GB


In [2]:
# Install packages
print("📦 Installing dependencies...\n")
!pip install -q transformers datasets scikit-learn accelerate SoccerNet
print("\n✅ Installation complete!")

📦 Installing dependencies...

✅ Installation complete!


## 2️⃣ Mount Google Drive

In [3]:
from google.colab import drive
import os

# Mount Drive
drive.mount('/content/drive')

# Set working directory
WORK_DIR = '/content/drive/MyDrive/NLP_Soccer_Temporal'
os.makedirs(WORK_DIR, exist_ok=True)
os.chdir(WORK_DIR)

print(f"✅ Working directory: {os.getcwd()}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ Working directory: /content/drive/MyDrive/NLP_Soccer_Temporal


## 3️⃣ Download SoccerNet Labels

**IMPORTANT:** This downloads ~2GB of label files. It takes ~10-15 minutes.

The cell skips the download automatically when `./dataset/soccernet` already exists.

In [4]:
# Download SoccerNet labels
from SoccerNet.Downloader import SoccerNetDownloader

LABELS_DIR = './dataset/soccernet'

if os.path.exists(LABELS_DIR):
    print("✅ SoccerNet labels already exist. Skipping download.")
else:
    print("📥 Downloading SoccerNet labels...")
    print("This will take 10-15 minutes. Please be patient.\n")

    downloader = SoccerNetDownloader(LocalDirectory=LABELS_DIR)
    downloader.downloadGames(
        files=['Labels-v2.json'],
        split=['train', 'valid', 'test']
    )

    print("\n✅ Download complete!")

# Count labels
label_count = sum(1 for root, dirs, files in os.walk(LABELS_DIR) if 'Labels-v2.json' in files)
print(f"\n📊 Found {label_count} matches with labels")

✅ SoccerNet labels already exist. Skipping download.

📊 Found 500 matches with labels


## 4️⃣ Check Dataset Structure

In [5]:
# Check if transcripts exist
TRANSCRIPT_PATHS = [
    './dataset/sn-echoes/Dataset/whisper_v1_en',
    './dataset/sn-echoes/Dataset/whisper_v2_en'
]

print("🔍 Checking datasets...\n")

for path in TRANSCRIPT_PATHS:
    if os.path.exists(path):
        num_files = sum(len(files) for _, _, files in os.walk(path))
        print(f"✅ {path}")
        print(f"   Files: {num_files}")
    else:
        print(f"❌ MISSING: {path}")
        print("   Please upload SoccerNet-Echoes to Google Drive!")

print(f"\n✅ SoccerNet labels: {LABELS_DIR}")
print(f"   Matches: {label_count}")

🔍 Checking datasets...

✅ ./dataset/sn-echoes/Dataset/whisper_v1_en
   Files: 734
✅ ./dataset/sn-echoes/Dataset/whisper_v2_en
   Files: 718

✅ SoccerNet labels: ./dataset/soccernet
   Matches: 500


## 5️⃣ Imports & Configuration

In [6]:
# Imports
import json
import glob
import re
import random
from pathlib import Path
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
from collections import defaultdict

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    EarlyStoppingCallback
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, f1_score, accuracy_score, precision_score, recall_score
import warnings
warnings.filterwarnings('ignore')

print("✅ Imports successful!")

✅ Imports successful!


In [7]:
@dataclass
class Config:
    # Paths (unchanged)
    dataset_root: str = "./dataset/sn-echoes/Dataset"
    whisper_folders: Optional[List[str]] = None
    soccernet_labels_dir: str = "./dataset/soccernet"
    output_dir: str = "./models/soccer_event_temporal"

    # Temporal alignment (seconds after the annotated event time)
    reaction_lag_start: int = 1
    reaction_lag_end: int = 6
    # Note: TranscriptLoader.create_context_windows currently hardcodes 3 segments
    context_window_size: int = 3

    # --- UPGRADE HERE ---
    # Raised from 3% to 15% so the model has more background (No-Event) examples to contrast against.
    no_event_keep_ratio: float = 0.15

    model_name: str = "xlm-roberta-base"

    # Slightly longer inputs (160 tokens) to capture more context (still safe on a T4 GPU)
    max_length: int = 160

    # Keep the safe batch size
    batch_size: int = 16
    learning_rate: float = 2e-5

    # 5 epochs so the model makes more passes over the data
    num_epochs: int = 5

    warmup_steps: int = 1000  # slightly longer warmup
    weight_decay: float = 0.01
    train_val_split: float = 0.8

    event_classes: Optional[List[str]] = None

    def __post_init__(self):
        # List defaults are filled in here because dataclasses reject mutable default values
        if self.whisper_folders is None:
            self.whisper_folders = ["whisper_v1_en", "whisper_v2_en"]
        if self.event_classes is None:
            self.event_classes = [
                "No-Event", "Goal", "Yellow card", "Red card", "Substitution", "Penalty"
            ]
        self.label2id = {label: idx for idx, label in enumerate(self.event_classes)}
        self.id2label = {idx: label for idx, label in enumerate(self.event_classes)}

config = Config()
print("✅ Config loaded (Pro Version)")

✅ Config loaded (Pro Version)


## 6️⃣ Helper Classes

In [8]:
import difflib  # Standard-library module for fuzzy string comparison

# SoccerNet Label Loader (Fixed with Fuzzy Matching)
class SoccerNetLabelLoader:
    def __init__(self, config: Config):
        self.config = config

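    def parse_game_time(self, game_time: str) -> Tuple[Optional[int], Optional[int]]:
        # SoccerNet gameTime format: "<half> - MM:SS", e.g. "1 - 05:23"
        try:
            parts = game_time.split(' - ')
            half = int(parts[0])
            time_parts = parts[1].split(':')
            minutes = int(time_parts[0])
            seconds = int(time_parts[1])
            total_seconds = minutes * 60 + seconds
            return half, total_seconds
        except (ValueError, IndexError, AttributeError):
            # Malformed or missing gameTime: skip this annotation
            return None, None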

    def load_labels(self, labels_path: str) -> List[Dict]:
        if not os.path.exists(labels_path):
            return []

        try:
            with open(labels_path, 'r', encoding='utf-8') as f:
                data = json.load(f)

            events = []
            for annotation in data.get('annotations', []):
                game_time = annotation.get('gameTime', '')
                label = annotation.get('label', '')
                half, seconds = self.parse_game_time(game_time)

                if half is not None and label in self.config.event_classes:
                    events.append({'half': half, 'time': seconds, 'label': label})

            return events
        except Exception as e:
            return []

    def find_labels_file(self, match_folder: str) -> Optional[str]:
        # Split the Echoes folder path
        parts = Path(match_folder).parts
        if len(parts) < 3: return None

        # Structure: .../League/Season/MatchName
        target_league = parts[-3]
        target_season = parts[-2]
        target_match = parts[-1]

        # Base path to the label folder for that league and season
        base_path = os.path.join(self.config.soccernet_labels_dir, target_league, target_season)

        # If the League/Season folder does not exist under the labels directory, return None
        if not os.path.exists(base_path):
            return None

        # List every match folder actually present in the labels directory
        available_matches = [d for d in os.listdir(base_path) if os.path.isdir(os.path.join(base_path, d))]

        # --- IMPORTANT: fuzzy match ---
        # Pick the most similar folder name (e.g. "Chelsea_Burnley" matching "2015-02-21 - 18-00 Chelsea 1 - 1 Burnley").
        # An identical name scores 1.0 and wins; cutoff=0.3 is permissive, so a different game's labels can be attached silently.
        matches = difflib.get_close_matches(target_match, available_matches, n=1, cutoff=0.3)

        if matches:
            best_match_folder = matches[0]
            labels_path = os.path.join(base_path, best_match_folder, 'Labels-v2.json')
            if os.path.exists(labels_path):
                return labels_path

        return None

print("✅ SoccerNetLabelLoader (Fixed with Fuzzy Matching) defined")

✅ SoccerNetLabelLoader (Fixed with Fuzzy Matching) defined
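
The two lookup steps above can be exercised without touching Drive. The sketch below re-implements them as free functions; the names `parse_game_time` and `resolve_match_folder` are hypothetical, not part of the SoccerNet package. It assumes the `"<half> - MM:SS"` gameTime format used in `Labels-v2.json`, and it tries an exact folder name before falling back to `difflib.get_close_matches` with the same 0.3 cutoff.

```python
import difflib
from typing import List, Optional, Tuple


def parse_game_time(game_time: str) -> Tuple[Optional[int], Optional[int]]:
    """Parse a gameTime string such as '1 - 05:23' into (half, seconds into the half)."""
    try:
        half_str, clock = game_time.split(" - ")
        minutes, seconds = clock.split(":")
        return int(half_str), int(minutes) * 60 + int(seconds)
    except (ValueError, AttributeError):
        # Wrong number of parts, non-numeric fields or a non-string input
        return None, None


def resolve_match_folder(target: str, available: List[str], cutoff: float = 0.3) -> Optional[str]:
    """Return the label folder for `target`: exact name first, else the closest fuzzy match."""
    if target in available:
        return target
    # get_close_matches returns up to n candidates with similarity ratio >= cutoff, best first
    close = difflib.get_close_matches(target, available, n=1, cutoff=cutoff)
    return close[0] if close else None


print(parse_game_time("1 - 05:23"))
print(parse_game_time("not a time"))
```

Raising `cutoff` trades coverage for safety: fewer halves get labels, but a near-miss folder name is less likely to pull in another game's annotations.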


In [9]:
# Transcript Loader
class TranscriptLoader:
    def __init__(self, config: Config):
        self.config = config

    def load_transcript_file(self, file_path: str) -> List[Dict]:
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        segments = []
        for seg_id, seg_data in data['segments'].items():
            try:
                segment_id = int(seg_id)
            except ValueError:
                digits = re.findall(r'\d+', seg_id)
                segment_id = int(digits[0]) if digits else 0

            segments.append({
                'segment_id': segment_id,
                'start_time': float(seg_data[0]),
                'end_time': float(seg_data[1]),
                'text': seg_data[2].strip()
            })

        return sorted(segments, key=lambda x: x['start_time'])

    def create_context_windows(self, segments: List[Dict]) -> List[Dict]:
        if len(segments) < 3:
            return []

        windows = []
        for i in range(1, len(segments) - 1):
            prev_seg = segments[i - 1]
            curr_seg = segments[i]
            next_seg = segments[i + 1]

            merged_text = ' '.join([
                prev_seg['text'],
                curr_seg['text'],
                next_seg['text']
            ])

            windows.append({
                'start_time': prev_seg['start_time'],
                'end_time': next_seg['end_time'],
                'text': merged_text.strip(),
                'center_time': curr_seg['start_time']
            })

        return windows

    def get_all_match_folders(self) -> List[str]:
        match_folders = []
        for whisper_folder in self.config.whisper_folders:
            folder_path = os.path.join(self.config.dataset_root, whisper_folder)
            if not os.path.exists(folder_path):
                continue

            pattern = os.path.join(folder_path, "*", "*", "*")
            matches = glob.glob(pattern)
            for match_folder in matches:
                if os.path.isdir(match_folder):
                    json_files = glob.glob(os.path.join(match_folder, "*.json"))
                    if json_files:
                        match_folders.append(match_folder)
        return match_folders

    def load_all_transcripts(self) -> List[Dict]:
        all_matches = []
        match_folders = self.get_all_match_folders()
        print(f"Found {len(match_folders)} match folders")

        for match_folder in tqdm(match_folders, desc="Loading transcripts"):
            parts = Path(match_folder).parts
            match_info = {
                'league': parts[-3] if len(parts) >= 3 else 'unknown',
                'season': parts[-2] if len(parts) >= 3 else 'unknown',
                'match_name': parts[-1] if len(parts) >= 1 else 'unknown',
                'folder': match_folder
            }

            json_files = sorted(glob.glob(os.path.join(match_folder, "*.json")))
            for json_file in json_files:
                half = os.path.basename(json_file).split('_')[0]
                segments = self.load_transcript_file(json_file)
                all_matches.append({
                    'match_info': match_info,
                    'half': int(half),
                    'segments': segments,
                    'file_path': json_file
                })
        return all_matches

print("✅ TranscriptLoader defined")

✅ TranscriptLoader defined
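
The transcript format that `load_transcript_file` expects (a `segments` object mapping each segment id to `[start, end, text]`) and the sliding-window step can be tried on a toy transcript. This sketch generalizes the window to any odd `size`, which is a hypothetical extension: the notebook's `create_context_windows` hardcodes 3 segments even though `Config.context_window_size` exists. With `size=3` it yields the same windows as the class above. The segment texts are made up.

```python
import json
from typing import Dict, List


def parse_segments(raw_json: str) -> List[Dict]:
    """Turn {'segments': {'0': [start, end, text], ...}} into a list sorted by start time."""
    data = json.loads(raw_json)
    segments = [
        {"start_time": float(s[0]), "end_time": float(s[1]), "text": s[2].strip()}
        for s in data["segments"].values()
    ]
    return sorted(segments, key=lambda seg: seg["start_time"])


def context_windows(segments: List[Dict], size: int = 3) -> List[Dict]:
    """Merge `size` consecutive segments (size must be odd) around each interior segment."""
    if size % 2 == 0:
        raise ValueError("size must be odd so each window has a single center segment")
    half = size // 2
    windows = []
    for i in range(half, len(segments) - half):
        group = segments[i - half:i + half + 1]
        windows.append({
            "start_time": group[0]["start_time"],
            "end_time": group[-1]["end_time"],
            "text": " ".join(seg["text"] for seg in group).strip(),
            # As in the notebook, alignment uses the START of the center segment
            "center_time": segments[i]["start_time"],
        })
    return windows


raw = json.dumps({"segments": {
    "0": [0.0, 2.5, "Kick-off! "],
    "1": [2.5, 5.0, "Long ball forward."],
    "2": [5.0, 8.0, "He shoots..."],
    "3": [8.0, 10.0, "GOAL!"],
}})
for w in context_windows(parse_segments(raw)):
    print(w["center_time"], "|", w["text"])
```

Because every interior segment becomes a center, consecutive windows share `size - 1` segments, so the same sentence appears in several training examples.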


In [10]:
# Temporal Aligner
class TemporalAligner:
    def __init__(self, config: Config):
        self.config = config

    def align_windows_with_events(self, windows: List[Dict], events: List[Dict]) -> List[Dict]:
        labeled_windows = []

        for window in windows:
            window_label = 'No-Event'
            window_center = window.get('center_time', (window['start_time'] + window['end_time']) / 2)

            for event in events:
                event_time = event['time']
                reaction_start = event_time + self.config.reaction_lag_start
                reaction_end = event_time + self.config.reaction_lag_end

                if reaction_start <= window_center <= reaction_end:
                    window_label = event['label']
                    break

            labeled_windows.append({
                'text': window['text'],
                'label': window_label,
                'start_time': window['start_time'],
                'end_time': window['end_time'],
                'center_time': window_center
            })

        return labeled_windows

print("✅ TemporalAligner defined")

✅ TemporalAligner defined
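
The reaction-lag rule labels a window with an event when the window's center time falls inside `[T + reaction_lag_start, T + reaction_lag_end]`. With the default config, that is 1 to 6 seconds after the annotated event time T. Below is a dependency-free sketch of the same rule on a toy timeline. The event times and window centers are made up for illustration.

```python
from typing import Dict, List


def label_for_center(center: float, events: List[Dict], lag_start: int = 1, lag_end: int = 6) -> str:
    """Return the first event whose reaction interval [t + lag_start, t + lag_end] contains `center`."""
    for event in events:
        if event["time"] + lag_start <= center <= event["time"] + lag_end:
            return event["label"]
    return "No-Event"


events = [{"time": 100, "label": "Goal"}, {"time": 300, "label": "Yellow card"}]
for center in [99.0, 100.5, 101.0, 106.0, 106.5, 303.2]:
    print(center, "->", label_for_center(center, events))
```

Both bounds are inclusive and the first matching event wins. If two events are less than 6 seconds apart, a window in their overlapping intervals goes to whichever event comes first in the list.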


In [11]:
# Class Balancer
class ClassBalancer:
    def __init__(self, config: Config):
        self.config = config

    def balance_dataset(self, windows: List[Dict]) -> List[Dict]:
        event_windows = []
        no_event_windows = []

        for window in windows:
            if window['label'] == 'No-Event':
                no_event_windows.append(window)
            else:
                event_windows.append(window)

        num_to_keep = int(len(no_event_windows) * self.config.no_event_keep_ratio)
        kept_no_event = random.sample(no_event_windows, num_to_keep)

        balanced = event_windows + kept_no_event
        random.shuffle(balanced)

        print(f"\n🎯 Class Balancing:")
        print(f"  Original No-Event: {len(no_event_windows)}")
        print(f"  Kept No-Event: {len(kept_no_event)} ({self.config.no_event_keep_ratio*100:.0f}%)")
        print(f"  Event samples: {len(event_windows)}")
        print(f"  Total: {len(balanced)}")

        return balanced

print("✅ ClassBalancer defined")

✅ ClassBalancer defined


In [12]:
# Dataset
class SoccerEventDataset(Dataset):
    def __init__(self, data: List[Dict], tokenizer, config: Config):
        self.data = data
        self.tokenizer = tokenizer
        self.config = config

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = self.data[idx]
        encoding = self.tokenizer(
            item['text'],
            truncation=True,
            max_length=self.config.max_length,
            padding='max_length',
            return_tensors='pt'
        )
        label_id = self.config.label2id[item['label']]
        return {
            'input_ids': encoding['input_ids'].squeeze(),
            'attention_mask': encoding['attention_mask'].squeeze(),
            'labels': torch.tensor(label_id, dtype=torch.long)
        }

# Metrics
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {
        'accuracy': accuracy_score(labels, predictions),
        'f1_macro': f1_score(labels, predictions, average='macro'),
        'f1_weighted': f1_score(labels, predictions, average='weighted'),
        'precision': precision_score(labels, predictions, average='weighted', zero_division=0),
        'recall': recall_score(labels, predictions, average='weighted', zero_division=0)
    }

print("✅ Dataset & Metrics defined")

✅ Dataset & Metrics defined


## 7️⃣ Load & Process Data

In [13]:
# Load transcripts
print("[1/5] Loading transcripts...")
transcript_loader = TranscriptLoader(config)
all_matches = transcript_loader.load_all_transcripts()
print(f"✅ Loaded {len(all_matches)} match halves")

[1/5] Loading transcripts...
Found 734 match folders

✅ Loaded 1452 match halves


In [14]:
# Temporal alignment
print("\n[2/5] Creating context windows & temporal alignment...")
label_loader = SoccerNetLabelLoader(config)
temporal_aligner = TemporalAligner(config)

all_labeled_windows = []
matches_with_labels = 0
matches_without_labels = 0

for match in tqdm(all_matches, desc="Processing"):
    labels_path = label_loader.find_labels_file(match['match_info']['folder'])

    if labels_path:
        all_events = label_loader.load_labels(labels_path)
        half_events = [e for e in all_events if e['half'] == match['half']]
        windows = transcript_loader.create_context_windows(match['segments'])
        labeled_windows = temporal_aligner.align_windows_with_events(windows, half_events)
        all_labeled_windows.extend(labeled_windows)
        matches_with_labels += 1
    else:
        matches_without_labels += 1

print(f"\n📊 Match halves with labels: {matches_with_labels}")
print(f"📊 Match halves without labels: {matches_without_labels}")
print(f"📊 Total windows (before balancing): {len(all_labeled_windows)}")

# Label distribution
label_counts = defaultdict(int)
for w in all_labeled_windows:
    label_counts[w['label']] += 1

print(f"\n📊 Label Distribution (Before Balancing):")
for label, count in sorted(label_counts.items(), key=lambda x: -x[1]):
    pct = count / len(all_labeled_windows) * 100
    print(f"  {label:20s}: {count:7d} ({pct:5.1f}%)")


[2/5] Creating context windows & temporal alignment...

📊 Match halves with labels: 1420
📊 Match halves without labels: 32
📊 Total windows (before balancing): 1073017

📊 Label Distribution (Before Balancing):
  No-Event            : 1058893 ( 98.7%)
  Substitution        :    5735 (  0.5%)
  Yellow card         :    4302 (  0.4%)
  Goal                :    3544 (  0.3%)
  Penalty             :     426 (  0.0%)
  Red card            :     117 (  0.0%)


In [15]:
# Balance dataset
print("\n[3/5] Balancing dataset...")
balancer = ClassBalancer(config)
all_labeled_windows = balancer.balance_dataset(all_labeled_windows)

# Distribution after balancing
label_counts = defaultdict(int)
for w in all_labeled_windows:
    label_counts[w['label']] += 1

print(f"\n📊 Label Distribution (After Balancing):")
for label, count in sorted(label_counts.items(), key=lambda x: -x[1]):
    pct = count / len(all_labeled_windows) * 100
    print(f"  {label:20s}: {count:7d} ({pct:5.1f}%)")


[3/5] Balancing dataset...

🎯 Class Balancing:
  Original No-Event: 1058893
  Kept No-Event: 158833 (15%)
  Event samples: 14124
  Total: 172957

📊 Label Distribution (After Balancing):
  No-Event            :  158833 ( 91.8%)
  Substitution        :    5735 (  3.3%)
  Yellow card         :    4302 (  2.5%)
  Goal                :    3544 (  2.0%)
  Penalty             :     426 (  0.2%)
  Red card            :     117 (  0.1%)
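
The balancing numbers above follow directly from the keep ratio. int(1,058,893 × 0.15) = 158,833 No-Event windows are kept. Adding the 14,124 event windows gives 172,957. `ClassBalancer` draws from the global `random` module without a seed, so the kept subset changes between runs. A seeded variant could look like the sketch below; `downsample_majority` is a hypothetical name, not part of the notebook.

```python
import random
from typing import Dict, List


def downsample_majority(windows: List[Dict], majority: str = "No-Event",
                        keep_ratio: float = 0.15, seed: int = 42) -> List[Dict]:
    """Keep every minority window plus a seeded random `keep_ratio` fraction of the majority class."""
    rng = random.Random(seed)  # private RNG: same seed and input order give the same output
    majority_windows = [w for w in windows if w["label"] == majority]
    minority_windows = [w for w in windows if w["label"] != majority]
    kept = rng.sample(majority_windows, int(len(majority_windows) * keep_ratio))
    balanced = minority_windows + kept
    rng.shuffle(balanced)
    return balanced


print(int(1058893 * 0.15))  # No-Event windows kept in the run above
```

Using a private `random.Random` instead of `random.seed` keeps the result reproducible even if other cells consume numbers from the global generator.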


In [16]:
# Split data
print("\n[4/5] Splitting data...")
train_data, val_data = train_test_split(
    all_labeled_windows,
    train_size=config.train_val_split,
    random_state=42,
    stratify=[w['label'] for w in all_labeled_windows]
)
print(f"✅ Train: {len(train_data)}, Val: {len(val_data)}")


[4/5] Splitting data...
✅ Train: 138365, Val: 34592
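
`train_test_split` above shuffles individual windows. Adjacent windows share two of their three segments. A game transcribed in both `whisper_v1_en` and `whisper_v2_en` also contributes two near-identical transcripts. Near-duplicate text can therefore land in both train and validation, which would likely make the validation scores optimistic. A match-level split avoids this.

The sketch below assumes each window dict carries a `match_id` key. That key is hypothetical: the windows built in this notebook do not store one, although it could be copied from `match['match_info']` during alignment.

```python
import random
from typing import Dict, List, Tuple


def split_by_match(windows: List[Dict], train_frac: float = 0.8, seed: int = 42,
                   group_key: str = "match_id") -> Tuple[List[Dict], List[Dict]]:
    """Assign whole matches to train or validation so no match contributes to both."""
    groups = sorted({w[group_key] for w in windows})  # sorted so the seed alone fixes the split
    random.Random(seed).shuffle(groups)
    train_groups = set(groups[:int(len(groups) * train_frac)])
    train = [w for w in windows if w[group_key] in train_groups]
    val = [w for w in windows if w[group_key] not in train_groups]
    return train, val
```

Unlike the stratified call above, this does not balance labels across the two sides. Rare classes such as Red card (117 windows in total) may be unevenly split, so it is worth printing per-split label counts afterwards.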


In [17]:
# Load model
print("\n[5/5] Loading model...")
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    config.model_name,
    num_labels=len(config.event_classes),
    id2label=config.id2label,
    label2id=config.label2id
)

train_dataset = SoccerEventDataset(train_data, tokenizer, config)
val_dataset = SoccerEventDataset(val_data, tokenizer, config)
print("✅ Model & datasets ready")


[5/5] Loading model...

XLMRobertaForSequenceClassification LOAD REPORT from: xlm-roberta-base
Key                         | Status     | 
----------------------------+------------+-
lm_head.dense.weight        | UNEXPECTED | 
roberta.pooler.dense.weight | UNEXPECTED | 
roberta.pooler.dense.bias   | UNEXPECTED | 
lm_head.layer_norm.weight   | UNEXPECTED | 
lm_head.bias                | UNEXPECTED | 
lm_head.dense.bias          | UNEXPECTED | 
lm_head.layer_norm.bias     | UNEXPECTED | 
classifier.out_proj.bias    | MISSING    | 
classifier.out_proj.weight  | MISSING    | 
classifier.dense.bias       | MISSING    | 
classifier.dense.weight     | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.


✅ Model & datasets ready


## 8️⃣ Train Model

In [21]:
import torch
from torch import nn
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

print("⚖️ Computing class weights...")

# 1. Compute class weights
# Read label ids straight from train_data; iterating train_dataset would tokenize every sample just to get its label
train_labels = [config.label2id[w['label']] for w in train_data]
all_classes = np.arange(len(config.event_classes))

class_weights = compute_class_weight(
    class_weight="balanced",
    classes=all_classes,
    y=train_labels
)

# Keep the weights on the CPU for now; compute_loss moves them to the logits' device
weights_tensor = torch.tensor(class_weights, dtype=torch.float)

print("✅ Done!")
print(f"   Weights: {weights_tensor.numpy()}")

# 2. Custom Trainer with a class-weighted loss (device bug fixed here)
class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")

        # --- FIX FOR THE DEVICE ERROR ---
        # Move weights_tensor onto the same device as the logits (the GPU during training):
        # wherever the logits live, the weights follow.
        curr_weight = weights_tensor.to(logits.device)

        loss_fct = nn.CrossEntropyLoss(weight=curr_weight)
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))

        return (loss, outputs) if return_outputs else loss

# 3. Training arguments (unchanged)
training_args = TrainingArguments(
    output_dir=config.output_dir,
    num_train_epochs=config.num_epochs,
    per_device_train_batch_size=config.batch_size,
    per_device_eval_batch_size=config.batch_size,
    gradient_accumulation_steps=2,
    learning_rate=config.learning_rate,
    warmup_steps=config.warmup_steps,
    weight_decay=config.weight_decay,
    logging_dir=f"{config.output_dir}/logs",
    logging_steps=100,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1_weighted",
    greater_is_better=True,
    save_total_limit=2,
    report_to="none",
    fp16=torch.cuda.is_available(),
)

# 4. Initialize the Trainer
trainer = WeightedTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
)

print("✅ WeightedTrainer device bug fixed! Ready to train.")

⚖️ Computing class weights...


`logging_dir` is deprecated and will be removed in v5.2. Please set `TENSORBOARD_LOGGING_DIR` instead.


✅ Done!
   Weights: [1.8148705e-01 8.1343327e+00 6.7017822e+00 2.4532802e+02 5.0263367e+00
 6.7627075e+01]
✅ WeightedTrainer device bug fixed! Ready to train.
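
Both steps in this cell can be checked by hand.

- **Class weights.** With `class_weight="balanced"`, scikit-learn sets each class weight to n_samples / (n_classes × count_c). The per-class training counts below come from this notebook's own outputs: the post-balancing totals minus the validation supports in the classification report. They reproduce the printed weights.
- **Weighted loss.** `nn.CrossEntropyLoss(weight=w)` uses the default `reduction='mean'`. It divides the weighted sum of per-sample losses by the sum of the target weights, not by the batch size.

A pure-Python sketch of both, with no PyTorch needed:

```python
import math
from typing import List, Sequence

# Training-set counts per class: post-balancing totals minus validation supports
TRAIN_COUNTS = {
    "No-Event": 158833 - 31767,
    "Goal": 3544 - 709,
    "Yellow card": 4302 - 861,
    "Red card": 117 - 23,
    "Substitution": 5735 - 1147,
    "Penalty": 426 - 85,
}


def balanced_weights(counts: Sequence[int]) -> List[float]:
    """scikit-learn's 'balanced' heuristic: n_samples / (n_classes * count of each class)."""
    n_samples, n_classes = sum(counts), len(counts)
    return [n_samples / (n_classes * c) for c in counts]


def weighted_cross_entropy(logits: List[List[float]], targets: List[int], weight: List[float]) -> float:
    """Mirror of CrossEntropyLoss(weight=...) with reduction='mean': sum(w_y * nll) / sum(w_y)."""
    total, norm = 0.0, 0.0
    for row, y in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        nll = log_sum_exp - row[y]  # -log softmax(row)[y]
        total += weight[y] * nll
        norm += weight[y]
    return total / norm


weights = balanced_weights(list(TRAIN_COUNTS.values()))
print([round(w, 4) for w in weights])
```

The Red card weight of roughly 245 means each of its 94 training windows counts about 1,350 times as much as a No-Event window in the loss. That large gap is one reason the loss values are hard to compare with an unweighted run.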


In [22]:
# Train!
print("🚀 Starting training...\n")
print("="*70)
trainer.train()
print("\n✅ Training complete!")

🚀 Starting training...



| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro | F1 Weighted | Precision | Recall |
|---|---|---|---|---|---|---|---|
| 1 | 2.195335 | 1.052463 | 0.766796 | 0.34336 | 0.815823 | 0.893341 | 0.766796 |
| 2 | 1.798059 | 0.991453 | 0.846525 | 0.393702 | 0.867648 | 0.898046 | 0.846525 |
| 3 | 1.602073 | 0.980683 | 0.852336 | 0.407762 | 0.872369 | 0.902139 | 0.852336 |
| 4 | 1.372228 | 1.076497 | 0.864853 | 0.435187 | 0.881597 | 0.905891 | 0.864853 |
| 5 | 1.389905 | 1.097683 | 0.869478 | 0.458846 | 0.88475 | 0.906479 | 0.869478 |



There were missing keys in the checkpoint model loaded: ['roberta.embeddings.LayerNorm.weight', 'roberta.embeddings.LayerNorm.bias', 'roberta.encoder.layer.0.attention.output.LayerNorm.weight', 'roberta.encoder.layer.0.attention.output.LayerNorm.bias', 'roberta.encoder.layer.0.output.LayerNorm.weight', 'roberta.encoder.layer.0.output.LayerNorm.bias', 'roberta.encoder.layer.1.attention.output.LayerNorm.weight', 'roberta.encoder.layer.1.attention.output.LayerNorm.bias', 'roberta.encoder.layer.1.output.LayerNorm.weight', 'roberta.encoder.layer.1.output.LayerNorm.bias', 'roberta.encoder.layer.2.attention.output.LayerNorm.weight', 'roberta.encoder.layer.2.attention.output.LayerNorm.bias', 'roberta.encoder.layer.2.output.LayerNorm.weight', 'roberta.encoder.layer.2.output.LayerNorm.bias', 'roberta.encoder.layer.3.attention.output.LayerNorm.weight', 'roberta.encoder.layer.3.attention.output.LayerNorm.bias', 'roberta.encoder.layer.3.output.LayerNorm.weight', 'roberta.encoder.layer.3.output.Laye


✅ Training complete!


## 9️⃣ Evaluate & Save

In [23]:
# Evaluation
print("📊 Final Evaluation")
print("="*70)

eval_results = trainer.evaluate()
print("\n📊 Metrics:")
for key, value in eval_results.items():
    print(f"  {key}: {value:.4f}")

# Classification report
predictions = trainer.predict(val_dataset)
pred_labels = np.argmax(predictions.predictions, axis=1)
true_labels = predictions.label_ids

print("\n📋 Classification Report:")
print(classification_report(
    true_labels,
    pred_labels,
    target_names=config.event_classes,
    zero_division=0
))

📊 Final Evaluation



📊 Metrics:
  eval_loss: 1.0977
  eval_accuracy: 0.8695
  eval_f1_macro: 0.4588
  eval_f1_weighted: 0.8848
  eval_precision: 0.9065
  eval_recall: 0.8695
  eval_runtime: 87.5649
  eval_samples_per_second: 395.0440
  eval_steps_per_second: 24.6900
  epoch: 5.0000

📋 Classification Report:
              precision    recall  f1-score   support

    No-Event       0.96      0.90      0.93     31767
        Goal       0.34      0.61      0.43       709
 Yellow card       0.35      0.54      0.43       861
    Red card       0.32      0.52      0.40        23
Substitution       0.28      0.50      0.36      1147
     Penalty       0.19      0.24      0.21        85

    accuracy                           0.87     34592
   macro avg       0.41      0.55      0.46     34592
weighted avg       0.91      0.87      0.88     34592
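
The report shows why accuracy alone is misleading here. No-Event accounts for 31,767 of the 34,592 validation windows. A classifier that always predicted No-Event would reach about 91.8% accuracy, higher than the fine-tuned model's 87%, while scoring 0 F1 on every event class. This sketch computes that baseline from the support column above:

```python
from typing import Dict, Tuple

# Validation support per class, copied from the classification report above
SUPPORT = {
    "No-Event": 31767, "Goal": 709, "Yellow card": 861,
    "Red card": 23, "Substitution": 1147, "Penalty": 85,
}


def majority_baseline(support: Dict[str, int]) -> Tuple[float, float]:
    """Accuracy and macro F1 of a classifier that always predicts the most frequent class."""
    total = sum(support.values())
    majority_count = max(support.values())
    accuracy = majority_count / total
    # Majority class: TP = majority_count, FP = every other sample, FN = 0
    f1_majority = 2 * majority_count / (2 * majority_count + (total - majority_count))
    # Every other class is never predicted, so its F1 is 0
    macro_f1 = f1_majority / len(support)
    return accuracy, macro_f1


acc, macro = majority_baseline(SUPPORT)
print(f"accuracy={acc:.4f}  macro_f1={macro:.4f}")
```

Against this baseline, the meaningful gain is in macro F1 (0.46 versus about 0.16). That arguably makes `f1_macro` a better choice than `f1_weighted` for `metric_for_best_model` on this data.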



In [24]:
# Save model
print("💾 Saving model...")
trainer.save_model(config.output_dir)
tokenizer.save_pretrained(config.output_dir)

import pickle
with open(os.path.join(config.output_dir, 'config.pkl'), 'wb') as f:
    pickle.dump(config, f)

print(f"✅ Model saved to: {config.output_dir}")
print("\n🎉 All done! Model is ready to use.")

💾 Saving model...

✅ Model saved to: ./models/soccer_event_temporal

🎉 All done! Model is ready to use.
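
`pickle.dump(config, f)` stores a reference to the `Config` class. Loading `config.pkl` later therefore requires that exact class definition to be importable under the same name. A plain-JSON copy avoids that dependency.

Below is a minimal sketch, assuming a cut-down `MiniConfig` stand-in for `Config` (hypothetical, with only a few of its fields). `dataclasses.asdict` serializes only declared fields, so `label2id` and `id2label` are rebuilt by `__post_init__` on load. If the string is written to disk, use a name such as `notebook_config.json` (also hypothetical) so it does not overwrite the model's own `config.json` in `output_dir`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MiniConfig:
    model_name: str = "xlm-roberta-base"
    max_length: int = 160
    reaction_lag_start: int = 1
    reaction_lag_end: int = 6
    event_classes: List[str] = field(default_factory=lambda: [
        "No-Event", "Goal", "Yellow card", "Red card", "Substitution", "Penalty"
    ])

    def __post_init__(self):
        # Derived lookups are plain attributes, not dataclass fields, so asdict() skips them
        self.label2id = {label: idx for idx, label in enumerate(self.event_classes)}
        self.id2label = {idx: label for idx, label in enumerate(self.event_classes)}


def config_to_json(cfg: MiniConfig) -> str:
    return json.dumps(asdict(cfg), indent=2)


def config_from_json(text: str) -> MiniConfig:
    return MiniConfig(**json.loads(text))


restored = config_from_json(config_to_json(MiniConfig()))
print(restored.label2id["Goal"], restored.id2label[5])
```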


## 🔟 Download Model (Optional)

In [25]:
# Zip model for download
!zip -r soccer_event_temporal.zip {config.output_dir}
print("✅ Model zipped!")
print("Download 'soccer_event_temporal.zip' from Files panel (left sidebar)")

updating: models/soccer_event_temporal/ (stored 0%)
updating: models/soccer_event_temporal/config.json (deflated 54%)
updating: models/soccer_event_temporal/model.safetensors (deflated 23%)
updating: models/soccer_event_temporal/training_args.bin (deflated 53%)
updating: models/soccer_event_temporal/tokenizer_config.json (deflated 47%)
updating: models/soccer_event_temporal/tokenizer.json (deflated 77%)
updating: models/soccer_event_temporal/config.pkl (deflated 27%)
  adding: models/soccer_event_temporal/checkpoint-17296/ (stored 0%)
  adding: models/soccer_event_temporal/checkpoint-17296/config.json (deflated 54%)
  adding: models/soccer_event_temporal/checkpoint-17296/model.safetensors (deflated 23%)
  adding: models/soccer_event_temporal/checkpoint-17296/training_args.bin (deflated 53%)
  adding: models/soccer_event_temporal/checkpoint-17296/optimizer.pt (deflated 66%)
  adding: models/soccer_event_temporal/checkpoint-17296/scheduler.pt (deflated 61%)
  adding: models/soccer_event_

---

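## ✅ Summary

**What we did:**
1. ✅ Load SoccerNet-Echoes transcripts
2. ✅ Download SoccerNet Action Spotting labels
3. ✅ Create context windows (3 segments)
4. ✅ Temporal alignment with reaction lag (T+1 to T+6 seconds)
5. ✅ Balance dataset (downsample No-Event to 15%)
6. ✅ Train XLM-RoBERTa with a class-weighted loss
7. ✅ Evaluate with realistic metrics (~87% accuracy on the validation set)

**Results (validation set, this run):**
- Accuracy: 0.87 (not 100%!). No-Event still makes up 91.8% of the validation set, so always predicting No-Event would score 0.918; macro F1 is the more informative number.
- F1-Weighted: 0.88; F1-Macro: 0.46
- Event classes F1: 0.21 (Penalty) to 0.43 (Goal, Yellow card)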

**Next Steps:**
- Test with new transcripts
- Fine-tune hyperparameters
- Deploy for inference

🎉 Training complete!
