# Zero Shot Classification Results

This document presents the results of a zero-shot classification approach using cosine similarity for two classes: **Goal** and **Non-Goal**.

---

## Results 

### Building Prototypes
Prototypes for the classes were successfully built.

### Classifying Test Clips (Delta 0.0)
| File         | True Label   | Predicted Label |
|--------------|--------------|-----------------|
| g31.npy      | Goal         | Goal            |
| g32.npy      | Goal         | Goal            |
| g33.npy      | Goal         | Goal            |
| g34.npy      | Goal         | Goal            |
| g35.npy      | Goal         | Goal            |
| g36.npy      | Goal         | Goal            |
| g37.npy      | Goal         | Goal            |
| ng31.npy     | Non-Goal     | Non-Goal        |
| ng32.npy     | Non-Goal     | Non-Goal        |
| ng33.npy     | Non-Goal     | Non-Goal        |
| ng34.npy     | Non-Goal     | Goal            |
| ng35.npy     | Non-Goal     | Non-Goal        |
| ng36.npy     | Non-Goal     | Goal            |
| ng37.npy     | Non-Goal     | Non-Goal        |

---

### Testing Goal Clips
All goal clips were tested successfully.

### Testing Non-Goal Clips
All non-goal clips were tested successfully.

---

## Confusion Matrix
|                   | Predicted Non-Goal | Predicted Goal |
|-------------------|--------------------|----------------|
| **True Non-Goal** | 5                  | 2              |
| **True Goal**     | 0                  | 7              |

---

## Performance Metrics
- **Precision**: 0.7778
- **Recall**: 1.0000
- **F₁ Score**: 0.8750
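
The metrics above follow from the confusion matrix when **Goal** is treated as the positive class. The following is a minimal sketch of that arithmetic; the function name `binary_metrics` is illustrative and is not taken from the original evaluation script.

```python
def binary_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Counts read from the confusion matrix above:
# 7 Goal clips predicted as Goal, 2 Non-Goal clips predicted as Goal, 0 Goal clips missed.
precision, recall, f1 = binary_metrics(tp=7, fp=2, fn=0)
print(f"{precision:.4f} {recall:.4f} {f1:.4f}")  # prints 0.7778 1.0000 0.8750
```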

# Zero Shot Classification Results with R(2+1)D Embeddings

## Step 1: Generating R(2+1)D Embeddings
Embeddings were successfully generated.

---

## Step 2: Building Prototypes from Training Data
- **Goal Prototype Shape**: (512,)
- **Non-Goal Prototype Shape**: (512,)
- Built prototypes from 30 goal and 30 non-goal samples.
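
The report does not say how a prototype is computed from the training embeddings. A common choice, assumed here, is the element-wise mean of a class's embeddings, compared to a clip by cosine similarity. The sketch below uses 3-dimensional toy vectors in place of the 512-dimensional embeddings, and the names `build_prototype` and `cosine_similarity` are hypothetical.

```python
import math

def build_prototype(embeddings):
    """Element-wise mean of equal-length vectors (assumed prototype definition)."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / n for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for the goal training embeddings (illustrative values only).
goal_train = [[1.0, 0.0, 1.0], [3.0, 0.0, 1.0]]
goal_prototype = build_prototype(goal_train)
print(goal_prototype)  # prints [2.0, 0.0, 1.0]
```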

---

## Step 3: Evaluating Classification Performance

### Testing with Threshold: 0.0

#### Testing Goal Clips
| File         | True Label | Predicted Label | Sim Goal | Sim Non-Goal | Difference | Correct |
|--------------|------------|-----------------|----------|--------------|------------|---------|
| g31.npy      | Goal       | Goal            | 0.9084   | 0.8992       | 0.0092     | True    |
| g32.npy      | Goal       | Goal            | 0.9081   | 0.8810       | 0.0271     | True    |
| g33.npy      | Goal       | Goal            | 0.8963   | 0.8623       | 0.0340     | True    |
| g34.npy      | Goal       | Goal            | 0.9176   | 0.8940       | 0.0235     | True    |
| g35.npy      | Goal       | Goal            | 0.9260   | 0.9181       | 0.0079     | True    |
| g36.npy      | Goal       | Goal            | 0.9090   | 0.8988       | 0.0102     | True    |
| g37.npy      | Goal       | Goal            | 0.8989   | 0.8799       | 0.0190     | True    |

#### Testing Non-Goal Clips
| File         | True Label   | Predicted Label | Sim Goal | Sim Non-Goal | Difference | Correct |
|--------------|--------------|-----------------|----------|--------------|------------|---------|
| ng31.npy     | Non-Goal     | Non-Goal        | 0.8693   | 0.9094       | -0.0401    | True    |
| ng32.npy     | Non-Goal     | Goal            | 0.8785   | 0.8498       | 0.0287     | False   |
| ng33.npy     | Non-Goal     | Non-Goal        | 0.9054   | 0.9225       | -0.0171    | True    |
| ng34.npy     | Non-Goal     | Non-Goal        | 0.8937   | 0.9218       | -0.0281    | True    |
| ng35.npy     | Non-Goal     | Goal            | 0.8742   | 0.8726       | 0.0015     | False   |
| ng36.npy     | Non-Goal     | Goal            | 0.7916   | 0.7884       | 0.0032     | False   |
| ng37.npy     | Non-Goal     | Non-Goal        | 0.8728   | 0.8749       | -0.0021    | True    |
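
The predicted labels in both tables are consistent with a simple rule: predict **Goal** when the difference `Sim Goal - Sim Non-Goal` exceeds the threshold. The sketch below assumes that rule, with a strict inequality (the report does not say how ties are handled). The function name `classify` is hypothetical, and the similarity values are copied from three rows of the tables.

```python
def classify(sim_goal, sim_non_goal, threshold=0.0):
    """Predict 'Goal' if the similarity margin exceeds the threshold (assumed rule)."""
    return "Goal" if (sim_goal - sim_non_goal) > threshold else "Non-Goal"

# (file, Sim Goal, Sim Non-Goal) taken from the tables above.
rows = [
    ("g35.npy", 0.9260, 0.9181),   # small positive margin, true Goal
    ("ng35.npy", 0.8742, 0.8726),  # small positive margin, true Non-Goal
    ("ng37.npy", 0.8728, 0.8749),  # small negative margin, true Non-Goal
]
predictions = {name: classify(sg, sn) for name, sg, sn in rows}
print(predictions)
```

With a threshold of 0.0, `ng35.npy` is labelled Goal because its margin is slightly positive, which matches the false positive reported in the table.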

---

## Evaluation Results
- **Overall Accuracy**: 0.7857 (11/14)
- **Goal Accuracy**: 1.0000 (7/7)
- **Non-Goal Accuracy**: 0.5714 (4/7)

---

## Step 4: Detailed Evaluation with Best Threshold (0.0)

### Confusion Matrix
|               | Predicted Non-Goal | Predicted Goal |
|---------------|--------------------|----------------|
| **True Non-Goal** | 4              | 3              |
| **True Goal**     | 0              | 7              |

### Performance Metrics
- **Precision**: 0.7000
- **Recall**: 1.0000
- **F₁ Score**: 0.8235

---

## Final Results
- **Best Threshold**: 0.0
- **Best Accuracy**: 0.7857
- **Precision**: 0.7000
- **Recall**: 1.0000
- **F₁ Score**: 0.8235

# Multiclass Few-shot Learning Results Comparison
## *Equal samples for each event*

### Test Dataset
- **Samples per class**: 27 (20 for building prototypes, 7 for testing), giving 28 test samples in total
- **Classes**: 4 (balanced dataset with 7 samples per class)
- **Learning paradigm**: Few-shot learning with equal representation

## Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score |
|-------|------------------|-------------------|-----------------|-------------------|
| **ResNet-50** | 0.6071 | 0.6051 | 0.6071 | 0.5797 |
| **R(2+1)D** | 0.6071 | 0.6087 | 0.6071 | 0.5997 |

## Per-Class Performance

### ResNet-50
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score |
|-------|----------|---------------|-----------|---------|----------|
| **Goal** | 0.7143 | 5/7 | 0.62 | 0.71 | 0.67 |
| **Direct_free-kick** | 0.4286 | 3/7 | 0.43 | 0.43 | 0.43 |
| **Yellow_card** | 1.0000 | 7/7 | 0.70 | 1.00 | 0.82 |
| **no_event** | 0.2857 | 2/7 | 0.67 | 0.29 | 0.40 |

### R(2+1)D
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score |
|-------|----------|---------------|-----------|---------|----------|
| **Goal** | 0.5714 | 4/7 | 0.80 | 0.57 | 0.67 |
| **Direct_free-kick** | 0.5714 | 4/7 | 0.57 | 0.57 | 0.57 |
| **Yellow_card** | 1.0000 | 7/7 | 0.78 | 1.00 | 0.88 |
| **no_event** | 0.2857 | 2/7 | 0.29 | 0.29 | 0.29 |

## Confusion Matrices

### ResNet-50 Confusion Matrix
```
Predicted →  Goal  Free-kick  Yellow  no_event
True ↓
Goal           5      2         0       0
Direct_free    3      3         0       1
Yellow_card    0      0         7       0
no_event       0      2         3       2
```

### R(2+1)D Confusion Matrix
```
Predicted →  Goal  Free-kick  Yellow  no_event
True ↓
Goal           4      1         0       2
Direct_free    0      4         0       3
Yellow_card    0      0         7       0
no_event       1      2         2       2
```
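
The per-class and weighted figures reported above can be recomputed from a confusion matrix. The following is a minimal sketch, assuming "weighted" means a support-weighted average of the per-class values and that rows are true classes and columns are predictions. The function names are illustrative, not taken from the evaluation code.

```python
def per_class_metrics(cm):
    """Return one (precision, recall, f1, support) tuple per class.

    cm[t][p] is the number of clips with true class t predicted as class p.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                         # row sum: clips whose true class is k
        predicted = sum(cm[r][k] for r in range(n))  # column sum: clips predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1, support))
    return results

def weighted_average(results, index):
    """Support-weighted mean of one metric (0 = precision, 1 = recall, 2 = f1)."""
    total = sum(r[3] for r in results)
    return sum(r[index] * r[3] for r in results) / total

# ResNet-50 matrix from above; order: Goal, Direct_free-kick, Yellow_card, no_event.
cm_resnet = [
    [5, 2, 0, 0],
    [3, 3, 0, 1],
    [0, 0, 7, 0],
    [0, 2, 3, 2],
]
results = per_class_metrics(cm_resnet)
accuracy = sum(cm_resnet[k][k] for k in range(4)) / sum(map(sum, cm_resnet))
print(round(accuracy, 4),
      round(weighted_average(results, 0), 4),
      round(weighted_average(results, 2), 4))  # prints 0.6071 0.6051 0.5797
```

The printed values match the ResNet-50 row of the overall summary table. Weighted recall equals overall accuracy by construction.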

## Most Common Classification Errors

### ResNet-50
- **Goal → Direct_free-kick**: 2 
- **Direct_free-kick → Goal**: 3 
- **no_event → Yellow_card**: 3 
- **no_event → Direct_free-kick**: 2 

### R(2+1)D
- **Direct_free-kick → no_event**: 3 
- **Goal → no_event**: 2 
- **no_event → Direct_free-kick**: 2 
- **no_event → Yellow_card**: 2 
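
Error lists like these can be read off the off-diagonal cells of a confusion matrix. The sketch below is one way to do that, shown on the R(2+1)D matrix. The helper name `top_confusions` and the cut-off of two errors are assumptions made for illustration.

```python
def top_confusions(cm, labels, min_count=2):
    """List (true, predicted, count) for off-diagonal cells, most frequent first."""
    errors = [
        (labels[t], labels[p], cm[t][p])
        for t in range(len(labels))
        for p in range(len(labels))
        if t != p and cm[t][p] >= min_count
    ]
    # sorted() is stable, so ties keep their row-major order.
    return sorted(errors, key=lambda e: -e[2])

labels = ["Goal", "Direct_free-kick", "Yellow_card", "no_event"]
cm_r2plus1d = [
    [4, 1, 0, 2],
    [0, 4, 0, 3],
    [0, 0, 7, 0],
    [1, 2, 2, 2],
]
for true_label, predicted_label, count in top_confusions(cm_r2plus1d, labels):
    print(f"{true_label} -> {predicted_label}: {count}")
```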

## Key Findings

### Model Strengths
- **Both models** achieved perfect accuracy (100%) on **Yellow_card** classification
- **ResNet-50** performed better on **Goal** detection (71.4% vs 57.1%)
- **R(2+1)D** showed higher precision for **Goal** class (0.80 vs 0.62)
- **R(2+1)D** performed better on **Direct_free-kick** classification (57.1% vs 42.9%)

### Common Challenges
- **Both models** struggled most with **no_event** classification (28.6% accuracy)
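- **Goal** and **Direct_free-kick** were confused with each other mainly by ResNet-50 (2 Goal → Direct_free-kick, 3 Direct_free-kick → Goal); R(2+1)D made only one such error (Goal → Direct_free-kick)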
- **no_event** samples were frequently misclassified as other event types


## Unequal Samples Dataset Results

*Imbalanced dataset, 6 classes total*

### Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score |
|-------|------------------|-------------------|-----------------|-------------------|
| **ResNet-50** | 0.4872 | 0.4593 | 0.4872 | 0.4560 |
| **R(2+1)D** | 0.5128 | 0.4998 | 0.5128 | 0.4859 |

### Per-Class Performance (Unequal Dataset)

#### ResNet-50
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.2500 | 2/8 | 0.40 | 0.25 | 0.31 | 8 |
| **Red_card** | 0.0000 | 0/1 | 0.00 | 0.00 | 0.00 | 1 |
| **Yellow_card** | 0.8182 | 9/11 | 0.60 | 0.82 | 0.69 | 11 |
| **Direct_free-kick** | 0.6250 | 5/8 | 0.56 | 0.62 | 0.59 | 8 |
| **Penalty** | 1.0000 | 1/1 | 0.33 | 1.00 | 0.50 | 1 |
| **no_event** | 0.2000 | 2/10 | 0.33 | 0.20 | 0.25 | 10 |

#### R(2+1)D
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.5000 | 4/8 | 0.44 | 0.50 | 0.47 | 8 |
| **Red_card** | 1.0000 | 1/1 | 0.50 | 1.00 | 0.67 | 1 |
| **Yellow_card** | 0.7273 | 8/11 | 0.62 | 0.73 | 0.67 | 11 |
| **Direct_free-kick** | 0.7500 | 6/8 | 0.67 | 0.75 | 0.71 | 8 |
| **Penalty** | 0.0000 | 0/1 | 0.00 | 0.00 | 0.00 | 1 |
| **no_event** | 0.1000 | 1/10 | 0.33 | 0.10 | 0.15 | 10 |

### Confusion Matrices (Unequal Dataset)

#### ResNet-50 Confusion Matrix
```
Predicted →    Goal  Red  Yellow  Free  Penalty  no_event
True ↓
Goal             2    0     0      2      2        2
Red_card         0    0     1      0      0        0
Yellow_card      0    1     9      0      0        1
Direct_free      2    0     0      5      0        1
Penalty          0    0     0      0      1        0
no_event         1    0     5      2      0        2
```

#### R(2+1)D Confusion Matrix
```
Predicted →    Goal  Red  Yellow  Free  Penalty  no_event
True ↓
Goal             4    0     1      1      2        0
Red_card         0    1     0      0      0        0
Yellow_card      1    1     8      0      0        1
Direct_free      1    0     0      6      0        1
Penalty          1    0     0      0      0        0
no_event         2    0     4      2      1        1
```

### Most Common Classification Errors (Unequal Dataset)

#### ResNet-50
- **no_event → Yellow_card**: 5 misclassifications
- **Goal → Direct_free-kick**: 2 misclassifications
- **Goal → Penalty**: 2 misclassifications
- **Goal → no_event**: 2 misclassifications
- **Direct_free-kick → Goal**: 2 misclassifications

#### R(2+1)D
- **no_event → Yellow_card**: 4 misclassifications
- **Goal → Penalty**: 2 misclassifications
- **no_event → Goal**: 2 misclassifications
- **no_event → Direct_free-kick**: 2 misclassifications

### Dataset Characteristics (Unequal Dataset)
- **Total samples**: 39
- **Classes**: 6 (imbalanced distribution)
- **Class distribution**: 
  - Yellow_card: 11 samples
  - Goal: 8 samples  
  - Direct_free-kick: 8 samples
  - no_event: 10 samples
  - Red_card: 1 sample
  - Penalty: 1 sample

### Key Findings (Unequal Dataset)

#### Model Performance Comparison
- **R(2+1)D** outperformed ResNet-50 overall (51.3% vs 48.7% accuracy)
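- Per-class accuracy was split evenly: **R(2+1)D** was higher on **Goal**, **Red_card**, and **Direct_free-kick**, while ResNet-50 was higher on **Yellow_card**, **Penalty**, and **no_event**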
- Both models struggled with rare classes (**Red_card**, **Penalty**) and **no_event**

#### Class-Specific Insights
- **Yellow_card** had the highest F₁ score for ResNet-50 (0.69); for R(2+1)D, **Direct_free-kick** was highest (0.71), with **Yellow_card** and **Red_card** at 0.67
- **Direct_free-kick** classification improved in R(2+1)D (75% vs 62.5%)
- **Goal** detection was better in R(2+1)D (50% vs 25%)
- **no_event** classification was poor in both models, especially R(2+1)D (10% vs 20%)

#### Impact of Class Imbalance
- Rare classes (**Red_card**, **Penalty**) with only 1 sample each showed inconsistent results
- **no_event** misclassifications primarily went to **Yellow_card** in both models
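- Overall accuracy was lower than on the balanced dataset (48.7% and 51.3% vs 60.7%); because the unequal dataset also adds two classes, the drop cannot be attributed to class imbalance alone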

## Overall Comparison: Equal vs Unequal Datasets

| Dataset Type | ResNet-50 Accuracy | R(2+1)D Accuracy | Performance Drop |
|--------------|-------------------|------------------|------------------|
| **Equal Samples** | 60.71% | 60.71% | - |
| **Unequal Samples** | 48.72% | 51.28% | 12.0 (ResNet-50) and 9.4 (R(2+1)D) percentage points |

The unequal dataset posed significantly greater challenges, with both models showing substantial performance degradation due to class imbalance and the introduction of rare event classes.
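
The size of the drop can be checked from the counts of correct predictions, taken from the diagonals of the confusion matrices (17 of 28 for both models on the equal dataset; 19 and 20 of 39 on the unequal dataset). The short sketch below does that arithmetic; the helper name `accuracy_pct` is illustrative.

```python
def accuracy_pct(correct, total):
    """Accuracy as a percentage."""
    return 100.0 * correct / total

equal = accuracy_pct(17, 28)  # same count for both models
unequal = {"ResNet-50": accuracy_pct(19, 39), "R(2+1)D": accuracy_pct(20, 39)}

# Drop in percentage points, rounded to one decimal place.
drops = {model: round(equal - acc, 1) for model, acc in unequal.items()}
print(drops)  # prints {'ResNet-50': 12.0, 'R(2+1)D': 9.4}
```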

---

## Sliding Window Augmentation Dataset Results

*±2 seconds sliding window for Red_card and Penalty classes*

### Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score |
|-------|------------------|-------------------|-----------------|-------------------|
| **ResNet-50** | 0.5370 | 0.5161 | 0.5370 | 0.5024 |
| **R(2+1)D** | 0.5926 | 0.5707 | 0.5926 | 0.5539 |

### Per-Class Performance (Sliding Window Dataset)

#### ResNet-50
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.5000 | 4/8 | 0.57 | 0.50 | 0.53 | 8 |
| **Red_card** | 0.0000 | 0/3 | 0.00 | 0.00 | 0.00 | 3 |
| **Yellow_card** | 0.9231 | 12/13 | 0.60 | 0.92 | 0.73 | 13 |
| **Direct_free-kick** | 0.5385 | 7/13 | 0.50 | 0.54 | 0.52 | 13 |
| **Penalty** | 0.6667 | 2/3 | 0.33 | 0.67 | 0.44 | 3 |
| **no_event** | 0.2857 | 4/14 | 0.57 | 0.29 | 0.38 | 14 |

#### R(2+1)D
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.7500 | 6/8 | 0.67 | 0.75 | 0.71 | 8 |
| **Red_card** | 0.6667 | 2/3 | 0.67 | 0.67 | 0.67 | 3 |
| **Yellow_card** | 0.9231 | 12/13 | 0.71 | 0.92 | 0.80 | 13 |
| **Direct_free-kick** | 0.5385 | 7/13 | 0.58 | 0.54 | 0.56 | 13 |
| **Penalty** | 1.0000 | 3/3 | 0.38 | 1.00 | 0.55 | 3 |
| **no_event** | 0.1429 | 2/14 | 0.40 | 0.14 | 0.21 | 14 |

### Confusion Matrices (Sliding Window Dataset)

#### ResNet-50 Confusion Matrix
```
Predicted →    Goal  Red  Yellow  Free  Penalty  no_event
True ↓
Goal             4    0     0      2      1        1
Red_card         0    0     3      0      0        0
Yellow_card      0    0    12      0      0        1
Direct_free      2    0     0      7      3        1
Penalty          0    0     0      1      2        0
no_event         1    0     5      4      0        4
```

#### R(2+1)D Confusion Matrix
```
Predicted →    Goal  Red  Yellow  Free  Penalty  no_event
True ↓
Goal             6    0     0      0      2        0
Red_card         0    2     1      0      0        0
Yellow_card      0    0    12      1      0        0
Direct_free      2    0     0      7      1        3
Penalty          0    0     0      0      3        0
no_event         1    1     4      4      2        2
```

### Most Common Classification Errors (Sliding Window Dataset)

#### ResNet-50
- **no_event → Yellow_card**: 5 misclassifications
- **no_event → Direct_free-kick**: 4 misclassifications
- **Red_card → Yellow_card**: 3 misclassifications
- **Direct_free-kick → Penalty**: 3 misclassifications
- **Goal → Direct_free-kick**: 2 misclassifications

#### R(2+1)D
- **no_event → Yellow_card**: 4 misclassifications
- **no_event → Direct_free-kick**: 4 misclassifications
- **Direct_free-kick → no_event**: 3 misclassifications
- **Goal → Penalty**: 2 misclassifications
- **no_event → Penalty**: 2 misclassifications

### Key Findings (Sliding Window Dataset)

#### Impact of Sliding Window Augmentation
- **Red_card**: R(2+1)D classified 2 of 3 clips correctly (66.7%, vs 1/1 in the original unequal dataset); ResNet-50 still classified none correctly (0/3, vs 0/1)
- **Penalty**: R(2+1)D improved from 0/1 to 3/3 (100%), while ResNet-50 went from 1/1 to 2/3 (66.7%)
- Overall dataset size increased from 39 to 54 samples (+38.5%)
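
The report does not describe how the ±2 second windows were cut. One plausible reading, sketched below, is that each annotated event yields three clips: one centred on the event time and one shifted 2 seconds to each side. That would be consistent with **Red_card** and **Penalty** growing from 1 to 3 test samples. The function name `sliding_windows`, the 8-second clip length, and the centring convention are all assumptions for illustration.

```python
def sliding_windows(event_time, clip_length, offsets=(-2.0, 0.0, 2.0)):
    """Return (start, end) times in seconds, one window per offset.

    Each window has length clip_length and is centred on event_time + offset;
    the start is clamped so that it never falls before 0.
    """
    windows = []
    for offset in offsets:
        center = event_time + offset
        start = max(0.0, center - clip_length / 2)
        windows.append((start, start + clip_length))
    return windows

# Hypothetical event annotated at 60 s, cut into 8-second clips.
windows = sliding_windows(event_time=60.0, clip_length=8.0)
print(windows)  # prints [(54.0, 62.0), (56.0, 64.0), (58.0, 66.0)]
```

Clips produced this way overlap heavily, so augmented samples of the same event are not independent of one another.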

#### Model Performance Comparison
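- **R(2+1)D** outperformed ResNet-50 overall (59.3% vs 53.7% accuracy, i.e. 32 vs 29 of 54 clips)
- **R(2+1)D** had higher accuracy on **Goal**, **Red_card**, and **Penalty**, tied with ResNet-50 on **Yellow_card** and **Direct_free-kick**, and was lower on **no_event** (14.3% vs 28.6%)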
- Both models maintained strong **Yellow_card** detection (>92% accuracy)

#### Persistent Challenges
- **no_event** classification remained problematic for both models
- **Direct_free-kick** confusion with other event types persisted
- Class imbalance still affected overall performance despite augmentation

## Performance Comparison Across All Datasets

| Dataset Configuration | ResNet-50 Accuracy | R(2+1)D Accuracy | Best Performing Model |
|----------------------|-------------------|------------------|----------------------|
| **Equal Samples (4 classes)** | 60.71% | 60.71% | Tie |
| **Unequal Samples (6 classes)** | 48.72% | 51.28% | R(2+1)D (+2.56 percentage points) |
| **Sliding Window (6 classes)** | 53.70% | 59.26% | R(2+1)D (+5.56 percentage points) |

### Overall Insights
- **Sliding window augmentation** improved performance over the original unequal dataset
- **R(2+1)D** consistently outperformed ResNet-50 on imbalanced datasets
- **Class balance** remains the most critical factor affecting model performance
- **Temporal augmentation** (sliding window) shows promise for rare event classes but requires careful implementation