# Few-Shot Classification Pipeline Overview

## 1. Clipping
### Process:
- Parse `Labels-v2.json` to extract timestamps for second-half events (e.g., goals, penalties, cards).
- Cut 6-second clips around goal events (±3 seconds) and save them to `Clips/goal/`.
- Generate an equal number of random 6-second non-goal clips, avoiding intervals around all key events (e.g., goals, penalties, cards).
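
The clipping logic above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function names, the example timestamps, and the 2700-second half length are assumptions, and the actual video cutting (for example with ffmpeg) is left out.

```python
import random

def clip_window(t, half=3.0):
    """Return the (start, end) of a clip centred on timestamp t, in seconds."""
    return (max(0.0, t - half), t + half)

def overlaps(a, b):
    """True when two (start, end) intervals share any time."""
    return a[0] < b[1] and b[0] < a[1]

def sample_negative_windows(event_times, n, match_len, length=6.0, seed=0, max_tries=10_000):
    """Draw up to n random windows that avoid every event window and each other."""
    rng = random.Random(seed)
    blocked = [clip_window(t, length / 2) for t in event_times]
    picked = []
    for _ in range(max_tries):
        if len(picked) == n:
            break
        start = rng.uniform(0.0, match_len - length)
        cand = (start, start + length)
        if not any(overlaps(cand, w) for w in blocked + picked):
            picked.append(cand)
    return picked

goal_times = [754.0, 2310.5]      # illustrative goal timestamps (seconds)
other_events = [120.0, 1800.0]    # illustrative cards or penalties
goal_clips = [clip_window(t) for t in goal_times]
negatives = sample_negative_windows(goal_times + other_events,
                                    n=len(goal_clips), match_len=2700.0)
```

Rejection sampling keeps the negatives simple to reason about: a candidate window is accepted only if it misses every blocked interval.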
  
### Output:
- Folders containing raw 6-second video clips:
  - `Clips/goal/`: Clips containing goals.
  - `Clips/No_event/`: Negative clips (non-goal).

---

## 2. Frame-Level Preprocessing
### Process:
- Read each clip frame-by-frame and convert frames to grayscale.
- Detect and mask grass using HSV thresholding (green range).
- Detect and mask audience using another HSV thresholding range.
- Detect the ball by combining:
  - HSV threshold for bright (white) regions.
  - High-intensity pixels in the grayscale frame.
- Combine masks so that only the ball remains unmasked, while grass and crowd are blacked out.
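
A rough sketch of the mask combination, using NumPy only. It assumes the frame has already been converted to HSV in the OpenCV convention (H in 0 to 179) and that a grayscale copy is available. All threshold values are hypothetical placeholders, and combining the two ball cues with a logical AND is an assumption, since the text does not say how they are combined.

```python
import numpy as np

def in_range(img, lo, hi):
    """Boolean mask of pixels whose channels all lie within [lo, hi]."""
    lo, hi = np.asarray(lo), np.asarray(hi)
    return np.all((img >= lo) & (img <= hi), axis=-1)

def ball_only(hsv, gray,
              grass=((35, 40, 40), (85, 255, 255)),   # hypothetical green range
              crowd=((0, 0, 0), (179, 255, 60)),      # hypothetical dark range
              white=((0, 0, 200), (179, 40, 255)),    # hypothetical bright, low-saturation range
              bright=220):                            # hypothetical grayscale cut-off
    """Black out grass and crowd, keeping only pixels that look like the ball."""
    grass_mask = in_range(hsv, *grass)
    crowd_mask = in_range(hsv, *crowd)
    ball_mask = in_range(hsv, *white) & (gray >= bright)
    keep = ball_mask & ~grass_mask & ~crowd_mask
    return np.where(keep, gray, 0).astype(gray.dtype)

# Tiny 2x2 frame: grass, ball, crowd, and an unrelated colour.
hsv = np.array([[[60, 200, 120], [0, 10, 250]],
                [[10, 100, 30], [120, 200, 150]]], dtype=np.uint8)
gray = np.array([[100, 250], [30, 150]], dtype=np.uint8)
masked = ball_only(hsv, gray)
```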

---

## 3. Frame Sampling & Embedding
### Input:
- 6-second clips (for each event).
- Prototype: 20 clips per class.
- Testing: 7 clips per class.

### Process:
- Uniformly sample `N` frames from each clip:
  - ResNet-50: 16 frames.
  - R(2+1)D: 32 frames.
- Pass sampled frames through a pretrained backbone:
  - ResNet-50 → 2048-dimensional feature embeddings.
  - R(2+1)D-18 → 512-dimensional feature embeddings.
- Mean-pool per-frame feature tensors into a single embedding per clip.
- Save embeddings as `.npy` files.
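
The sampling and pooling steps might look like the sketch below. The backbone forward pass is replaced by random stand-in features, and the 25 fps frame rate is an assumption; only the index selection and the mean pooling are illustrated.

```python
import numpy as np

def uniform_indices(num_frames, n):
    """Return n frame indices spread evenly over [0, num_frames - 1]."""
    return np.linspace(0, num_frames - 1, num=n).round().astype(int)

def clip_embedding(frame_features):
    """Mean-pool (n_frames, dim) per-frame features into one (dim,) vector."""
    return np.asarray(frame_features).mean(axis=0)

# A 6-second clip at an assumed 25 fps has 150 frames.
idx = uniform_indices(150, 16)

# Stand-in for ResNet-50 features of the 16 sampled frames (2048-d each).
feats = np.random.default_rng(0).normal(size=(16, 2048))
emb = clip_embedding(feats)
# np.save("clip_0001.npy", emb) would then store the embedding (hypothetical file name).
```

If a clip has fewer frames than `n`, the rounded indices repeat, so short clips would need separate handling.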

### Output:
- Per-clip embeddings:
  - ResNet-50: 2048-dimensional embeddings.
  - R(2+1)D: 512-dimensional embeddings.

---

# Sampling and Clipping Strategies Across Different Files

## **1. Multiclass Equal Sampling**
- **Classes**: Goal, No_event, Yellow card, Direct free-kick.
- **Clipping Strategy**:
  - Goal was the least frequent of the four classes, occurring 27 times across 23 matches.
  - To ensure balance, 27 samples were taken for each class.
  - Equal representation was maintained across all classes.
- **Splitting**:
  - 80% of the samples were used for prototype building.
  - 20% of the samples were used for evaluation.
  - The process was repeated with the split reversed (20% for prototype building and 80% for evaluation).
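
A minimal sketch of the split and its reversal for one class. The helper name and file names are hypothetical, and the rounding rule is an assumption: a strict 80/20 split of 27 clips gives 22/5, whereas the results report 20 prototype and 7 evaluation clips per class, so the experiments probably used fixed counts.

```python
import random

def split_clips(clips, prototype_frac=0.8, seed=0):
    """Shuffle clips and split them into (prototype, evaluation) lists."""
    rng = random.Random(seed)
    shuffled = list(clips)
    rng.shuffle(shuffled)
    k = round(len(shuffled) * prototype_frac)
    return shuffled[:k], shuffled[k:]

clips = [f"goal_{i:02d}.mp4" for i in range(27)]               # hypothetical file names
proto, evaluation = split_clips(clips, prototype_frac=0.8)     # prototype-heavy split
rev_proto, rev_eval = split_clips(clips, prototype_frac=0.2)   # reversed split
```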

---

## **2. Multiclass Unequal Sampling**
- **Classes**: Goal, No_event, Yellow card, Direct free-kick, Penalty, Red card.
- **Clipping Strategy**:
  - Added Penalty and Red card classes to the dataset.
  - Samples were imbalanced due to the varying frequency of events:
    - Penalty and Red card were rare events (3 samples for Red card, 4 samples for Penalty across 23 matches).
    - Other classes had more samples, leading to an unequal distribution.
- **Splitting**:
  - 80% of the samples were used for prototype building.
  - 20% of the samples were used for evaluation.
  - The split was reversed (20% for prototype building and 80% for evaluation) for additional testing.

---

## **3. Multiclass Sliding Window Sampling**
- **Classes**: Red card and Penalty are augmented; the evaluation still covers all six classes.
- **Clipping Strategy**:
  - Focused on rare events (Red card and Penalty) due to their limited occurrence in 23 matches (3 samples for Red card, 4 samples for Penalty).
  - Applied a sliding window approach to augment the dataset:
    - Generated additional clips by sliding ±2 seconds around the event timestamps.
    - This increased the number of samples for these rare classes.
- **Splitting**:
  - 80% of the augmented samples were used for prototype building.
  - 20% of the augmented samples were used for evaluation.
  - The split was reversed (20% for prototype building and 80% for evaluation) for further testing.
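
One way to generate the shifted windows is sketched below. The step between windows is not stated in the text, so the offsets `(-2, 0, +2)` seconds are an assumption, as is the function name.

```python
def sliding_windows(event_time, offsets=(-2.0, 0.0, 2.0), half=3.0):
    """Return one 6-second (start, end) window per offset around an event."""
    return [(max(0.0, event_time + o - half), event_time + o + half)
            for o in offsets]

windows = sliding_windows(100.0)
# Each rare event now yields three overlapping clips instead of one.
```

Because the shifted clips overlap heavily, clips derived from the same event are near-duplicates; keeping them on the same side of the prototype/evaluation split avoids leakage.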

---

## **Summary**
- **Equal Sampling** ensured balanced representation across classes but was limited to four classes.
- **Unequal Sampling** introduced more classes but faced challenges due to class imbalance.
- **Sliding Window Sampling** augmented rare classes (Red card and Penalty) to improve representation and performance.

## 4. Prototype Construction
### Input:
- All training clip embeddings for each class.

### Process:
- Compute class prototypes:
  - Average all embeddings for each class.
  - Apply L2 normalization to the averaged embeddings.

### Output:
- Class prototypes representing each class as a mean feature vector.
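
As a sketch of the averaging and normalization steps, assuming each class maps to a list of equally sized clip embeddings (the dictionary layout and function names are illustrative):

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm; eps guards against division by zero."""
    return v / max(np.linalg.norm(v), eps)

def build_prototypes(embeddings_by_class):
    """Average the embeddings of each class, then L2-normalize the mean."""
    return {label: l2_normalize(np.mean(np.stack(embs), axis=0))
            for label, embs in embeddings_by_class.items()}

prototypes = build_prototypes({
    "Goal": [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    "no_event": [np.array([0.0, 2.0])],
})
```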

---

## 5. Few-Shot Classification
### Input:
- Test clip embedding.
- Class prototypes.

### Process:
- L2-normalize the test embedding.
- Compute cosine similarity between the test embedding and each class prototype.
- Compare similarity differences to a threshold (Δ).

### Output:
- Predicted label for each test clip (e.g., "Goal" vs "Non-Goal").
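
The classification step can be sketched as below. How the threshold Δ is applied is not spelled out in the text; this sketch assumes it is a margin between the two highest similarities, with a hypothetical `fallback` label returned when the margin is too small. Prototypes are assumed to be unit-norm already.

```python
import numpy as np

def classify(embedding, prototypes, delta=0.0, fallback=None):
    """Return (label, similarities) using cosine similarity to unit-norm prototypes."""
    x = embedding / np.linalg.norm(embedding)
    sims = {label: float(x @ p) for label, p in prototypes.items()}
    ranked = sorted(sims, key=sims.get, reverse=True)
    best = ranked[0]
    margin = sims[best] - sims[ranked[1]] if len(ranked) > 1 else float("inf")
    # Assumed use of the threshold: fall back when the top two classes are too close.
    if fallback is not None and margin < delta:
        return fallback, sims
    return best, sims

protos = {"Goal": np.array([1.0, 0.0]), "Non-Goal": np.array([0.0, 1.0])}
label_clear, _ = classify(np.array([2.0, 0.1]), protos)
label_close, _ = classify(np.array([1.0, 0.95]), protos, delta=0.1, fallback="Non-Goal")
```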

---

## 6. Evaluation
### Few-Shot Approach:
- Prototype-based few-shot classification:
  - Each class is represented by a mean feature vector (prototype).
  - New clips are classified based on cosine similarity to these prototypes.

### Metrics:
- Evaluate classification performance using:
  - Accuracy (overall and per class).
  - Precision.
  - Recall.
  - F₁ score (precision, recall, and F₁ are also reported as support-weighted averages).
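
For reference, these metrics can all be derived from a confusion matrix. The sketch below is a plain-Python illustration (the function name is hypothetical); the example matrix is the ResNet-50 confusion matrix reported for the balanced four-class dataset.

```python
def metrics_from_confusion(cm):
    """cm[i][j] counts clips of true class i predicted as class j."""
    n = len(cm)
    total = sum(map(sum, cm))
    support = [sum(row) for row in cm]
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        p = tp / predicted[k] if predicted[k] else 0.0
        r = tp / support[k] if support[k] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_class.append((p, r, f1))
    # Support-weighted averages of precision, recall, and F1.
    weighted = [sum(m[i] * s for m, s in zip(per_class, support)) / total
                for i in range(3)]
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return accuracy, per_class, weighted

# Rows and columns: Goal, Direct_free-kick, Yellow_card, no_event.
cm = [[5, 2, 0, 0],
      [3, 3, 0, 1],
      [0, 0, 7, 0],
      [0, 2, 3, 2]]
accuracy, per_class, weighted = metrics_from_confusion(cm)
```

Note that the per-class "accuracy" in the result tables equals per-class recall (correct predictions divided by the class support).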

# Multiclass Few-shot Learning Results Comparison
## *Equal samples for each event*

### Test Dataset
- **Total samples**: 27 per class (20 for prototype building and 7 for evaluation)
- **Classes**: 4 (balanced, with 7 evaluation samples per class)
- **Learning paradigm**: Few-shot learning with equal representation

## Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score |
|-------|------------------|-------------------|-----------------|-------------------|
| **ResNet-50** | 0.6071 | 0.6051 | 0.6071 | 0.5797 |
| **R(2+1)D** | 0.6071 | 0.6087 | 0.6071 | 0.5997 |

## Per-Class Performance

### ResNet-50
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score |
|-------|----------|---------------|-----------|---------|----------|
| **Goal** | 0.7143 | 5/7 | 0.62 | 0.71 | 0.67 |
| **Direct_free-kick** | 0.4286 | 3/7 | 0.43 | 0.43 | 0.43 |
| **Yellow_card** | 1.0000 | 7/7 | 0.70 | 1.00 | 0.82 |
| **no_event** | 0.2857 | 2/7 | 0.67 | 0.29 | 0.40 |

### R(2+1)D
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score |
|-------|----------|---------------|-----------|---------|----------|
| **Goal** | 0.5714 | 4/7 | 0.80 | 0.57 | 0.67 |
| **Direct_free-kick** | 0.5714 | 4/7 | 0.57 | 0.57 | 0.57 |
| **Yellow_card** | 1.0000 | 7/7 | 0.78 | 1.00 | 0.88 |
| **no_event** | 0.2857 | 2/7 | 0.29 | 0.29 | 0.29 |

## Confusion Matrices

### ResNet-50 Confusion Matrix
```
Predicted →  Goal  Free-kick  Yellow  no_event
True ↓
Goal           5      2         0       0
Direct_free    3      3         0       1
Yellow_card    0      0         7       0
no_event       0      2         3       2
```

### R(2+1)D Confusion Matrix
```
Predicted →  Goal  Free-kick  Yellow  no_event
True ↓
Goal           4      1         0       2
Direct_free    0      4         0       3
Yellow_card    0      0         7       0
no_event       1      2         2       2
```

## Most Common Classification Errors

### ResNet-50
- **Goal → Direct_free-kick**: 2 misclassifications
- **Direct_free-kick → Goal**: 3 misclassifications
- **no_event → Yellow_card**: 3 misclassifications
- **no_event → Direct_free-kick**: 2 misclassifications

### R(2+1)D
- **Direct_free-kick → no_event**: 3 misclassifications
- **Goal → no_event**: 2 misclassifications
- **no_event → Direct_free-kick**: 2 misclassifications
- **no_event → Yellow_card**: 2 misclassifications
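
Error lists like these can be read off the confusion matrix mechanically. A small sketch (the helper name is hypothetical), applied to the R(2+1)D matrix above:

```python
def top_confusions(cm, labels, k=4):
    """Return the k largest off-diagonal cells as (count, true, predicted)."""
    errors = [(cm[i][j], labels[i], labels[j])
              for i in range(len(cm)) for j in range(len(cm))
              if i != j and cm[i][j] > 0]
    errors.sort(key=lambda e: -e[0])  # stable sort keeps row order for ties
    return errors[:k]

labels = ["Goal", "Direct_free-kick", "Yellow_card", "no_event"]
r2plus1d_cm = [[4, 1, 0, 2],
               [0, 4, 0, 3],
               [0, 0, 7, 0],
               [1, 2, 2, 2]]
worst = top_confusions(r2plus1d_cm, labels)
```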

## Key Findings

### Model Strengths
- **Both models** achieved perfect accuracy (100%) on **Yellow_card** classification
- **ResNet-50** performed better on **Goal** detection (71.4% vs 57.1%)
- **R(2+1)D** showed higher precision for **Goal** class (0.80 vs 0.62)
- **R(2+1)D** performed better on **Direct_free-kick** classification (57.1% vs 42.9%)

### Common Challenges
- **Both models** struggled most with **no_event** classification (28.6% accuracy)
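- **Goal** and **Direct_free-kick** were confused with each other mainly by ResNet-50 (2 and 3 errors); R(2+1)D labeled only one Goal clip as Direct_free-kick and no Direct_free-kick clip as Goal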
- **no_event** samples were frequently misclassified as other event types


## Unequal Samples Dataset Results

*Imbalanced dataset, 6 classes total*

### Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score |
|-------|------------------|-------------------|-----------------|-------------------|
| **ResNet-50** | 0.4872 | 0.4593 | 0.4872 | 0.4560 |
| **R(2+1)D** | 0.5128 | 0.4998 | 0.5128 | 0.4859 |

### Per-Class Performance (Unequal Dataset)

#### ResNet-50
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.2500 | 2/8 | 0.40 | 0.25 | 0.31 | 8 |
| **Red_card** | 0.0000 | 0/1 | 0.00 | 0.00 | 0.00 | 1 |
| **Yellow_card** | 0.8182 | 9/11 | 0.60 | 0.82 | 0.69 | 11 |
| **Direct_free-kick** | 0.6250 | 5/8 | 0.56 | 0.62 | 0.59 | 8 |
| **Penalty** | 1.0000 | 1/1 | 0.33 | 1.00 | 0.50 | 1 |
| **no_event** | 0.2000 | 2/10 | 0.33 | 0.20 | 0.25 | 10 |

#### R(2+1)D
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.5000 | 4/8 | 0.44 | 0.50 | 0.47 | 8 |
| **Red_card** | 1.0000 | 1/1 | 0.50 | 1.00 | 0.67 | 1 |
| **Yellow_card** | 0.7273 | 8/11 | 0.62 | 0.73 | 0.67 | 11 |
| **Direct_free-kick** | 0.7500 | 6/8 | 0.67 | 0.75 | 0.71 | 8 |
| **Penalty** | 0.0000 | 0/1 | 0.00 | 0.00 | 0.00 | 1 |
| **no_event** | 0.1000 | 1/10 | 0.33 | 0.10 | 0.15 | 10 |

### Confusion Matrices (Unequal Dataset)

#### ResNet-50 Confusion Matrix
```
Predicted →    Goal  Red  Yellow  Free  Penalty  no_event
True ↓
Goal             2    0     0      2      2        2
Red_card         0    0     1      0      0        0
Yellow_card      0    1     9      0      0        1
Direct_free      2    0     0      5      0        1
Penalty          0    0     0      0      1        0
no_event         1    0     5      2      0        2
```

#### R(2+1)D Confusion Matrix
```
Predicted →    Goal  Red  Yellow  Free  Penalty  no_event
True ↓
Goal             4    0     1      1      2        0
Red_card         0    1     0      0      0        0
Yellow_card      1    1     8      0      0        1
Direct_free      1    0     0      6      0        1
Penalty          1    0     0      0      0        0
no_event         2    0     4      2      1        1
```

### Most Common Classification Errors (Unequal Dataset)

#### ResNet-50
- **no_event → Yellow_card**: 5 misclassifications
- **Goal → Direct_free-kick**: 2 misclassifications
- **Goal → Penalty**: 2 misclassifications
- **Goal → no_event**: 2 misclassifications
- **Direct_free-kick → Goal**: 2 misclassifications

#### R(2+1)D
- **no_event → Yellow_card**: 4 misclassifications
- **Goal → Penalty**: 2 misclassifications
- **no_event → Goal**: 2 misclassifications
- **no_event → Direct_free-kick**: 2 misclassifications

### Dataset Characteristics (Unequal Dataset)
- **Total samples**: 39
- **Classes**: 6 (imbalanced distribution)
- **Class distribution**: 
  - Yellow_card: 11 samples
  - Goal: 8 samples  
  - Direct_free-kick: 8 samples
  - no_event: 10 samples
  - Red_card: 1 sample
  - Penalty: 1 sample

### Key Findings (Unequal Dataset)

#### Model Performance Comparison
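- **R(2+1)D** outperformed ResNet-50 overall (51.3% vs 48.7% accuracy, i.e. 20 vs 19 correct of 39)
- **R(2+1)D** was better on three of the six classes (**Goal**, **Red_card**, **Direct_free-kick**) and worse on the other three (**Yellow_card**, **Penalty**, **no_event**)
- Both models struggled with rare classes (**Red_card**, **Penalty**) and **no_event**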

#### Class-Specific Insights
- **Yellow_card** remained one of the strongest classes for both models (81.8% for ResNet-50, 72.7% for R(2+1)D)
- **Direct_free-kick** classification improved in R(2+1)D (75% vs 62.5%)
- **Goal** detection was better in R(2+1)D (50% vs 25%)
- **no_event** classification was poor in both models, especially R(2+1)D (10% vs 20%)

#### Impact of Class Imbalance
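- Rare classes (**Red_card**, **Penalty**) had only 1 evaluation sample each, so their scores are all-or-nothing (0% or 100%) and not informative on their own
- **no_event** misclassifications primarily went to **Yellow_card** in both models
- Overall accuracy was roughly 9 to 12 percentage points lower than on the balanced dataset; because two classes were added at the same time, the drop cannot be attributed to class imbalance alone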

## Overall Comparison: Equal vs Unequal Datasets

| Dataset Type | ResNet-50 Accuracy | R(2+1)D Accuracy | Drop vs Equal Samples (ResNet-50 / R(2+1)D) |
|--------------|-------------------|------------------|------------------|
| **Equal Samples** | 60.71% | 60.71% | - |
| **Unequal Samples** | 48.72% | 51.28% | 11.99 pp / 9.43 pp |

The unequal dataset was harder for both models: accuracy fell by about 12 percentage points for ResNet-50 and about 9 for R(2+1)D. The rare event classes and the class imbalance were introduced together, so their individual contributions cannot be separated from these numbers.

---

## Sliding Window Augmentation Dataset Results

*±2 seconds sliding window for Red_card and Penalty classes*

### Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score |
|-------|------------------|-------------------|-----------------|-------------------|
| **ResNet-50** | 0.5370 | 0.5161 | 0.5370 | 0.5024 |
| **R(2+1)D** | 0.5926 | 0.5707 | 0.5926 | 0.5539 |

### Per-Class Performance (Sliding Window Dataset)

#### ResNet-50
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.5000 | 4/8 | 0.57 | 0.50 | 0.53 | 8 |
| **Red_card** | 0.0000 | 0/3 | 0.00 | 0.00 | 0.00 | 3 |
| **Yellow_card** | 0.9231 | 12/13 | 0.60 | 0.92 | 0.73 | 13 |
| **Direct_free-kick** | 0.5385 | 7/13 | 0.50 | 0.54 | 0.52 | 13 |
| **Penalty** | 0.6667 | 2/3 | 0.33 | 0.67 | 0.44 | 3 |
| **no_event** | 0.2857 | 4/14 | 0.57 | 0.29 | 0.38 | 14 |

#### R(2+1)D
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.7500 | 6/8 | 0.67 | 0.75 | 0.71 | 8 |
| **Red_card** | 0.6667 | 2/3 | 0.67 | 0.67 | 0.67 | 3 |
| **Yellow_card** | 0.9231 | 12/13 | 0.71 | 0.92 | 0.80 | 13 |
| **Direct_free-kick** | 0.5385 | 7/13 | 0.58 | 0.54 | 0.56 | 13 |
| **Penalty** | 1.0000 | 3/3 | 0.38 | 1.00 | 0.55 | 3 |
| **no_event** | 0.1429 | 2/14 | 0.40 | 0.14 | 0.21 | 14 |

### Confusion Matrices (Sliding Window Dataset)

#### ResNet-50 Confusion Matrix
```
Predicted →    Goal  Red  Yellow  Free  Penalty  no_event
True ↓
Goal             4    0     0      2      1        1
Red_card         0    0     3      0      0        0
Yellow_card      0    0    12      0      0        1
Direct_free      2    0     0      7      3        1
Penalty          0    0     0      1      2        0
no_event         1    0     5      4      0        4
```

#### R(2+1)D Confusion Matrix
```
Predicted →    Goal  Red  Yellow  Free  Penalty  no_event
True ↓
Goal             6    0     0      0      2        0
Red_card         0    2     1      0      0        0
Yellow_card      0    0    12      1      0        0
Direct_free      2    0     0      7      1        3
Penalty          0    0     0      0      3        0
no_event         1    1     4      4      2        2
```

### Most Common Classification Errors (Sliding Window Dataset)

#### ResNet-50
- **no_event → Yellow_card**: 5 misclassifications
- **no_event → Direct_free-kick**: 4 misclassifications
- **Red_card → Yellow_card**: 3 misclassifications
- **Direct_free-kick → Penalty**: 3 misclassifications
- **Goal → Direct_free-kick**: 2 misclassifications

#### R(2+1)D
- **no_event → Yellow_card**: 4 misclassifications
- **no_event → Direct_free-kick**: 4 misclassifications
- **Direct_free-kick → no_event**: 3 misclassifications
- **Goal → Penalty**: 2 misclassifications
- **no_event → Penalty**: 2 misclassifications

### Key Findings (Sliding Window Dataset)

#### Impact of Sliding Window Augmentation
- **Red_card**: R(2+1)D recognized 2 of 3 clips (66.7%), compared with 1 of 1 in the original unequal dataset; ResNet-50 remained at 0% (0/3, previously 0/1)
- **Penalty**: R(2+1)D improved from 0/1 to 3/3 (100%), while ResNet-50 went from 1/1 to 2/3 (66.7%)
- Evaluation set size increased from 39 to 54 samples (+38.5%)

#### Model Performance Comparison
- **R(2+1)D** outperformed ResNet-50 by 5.56 percentage points (59.3% vs 53.7% accuracy, i.e. 32 vs 29 correct of 54)
- **R(2+1)D** matched or exceeded ResNet-50 on every class except **no_event** (14.3% vs 28.6%)
- Both models maintained strong **Yellow_card** detection (>92% accuracy)

#### Persistent Challenges
- **no_event** classification remained problematic for both models
- **Direct_free-kick** confusion with other event types persisted
- Class imbalance still affected overall performance despite augmentation

## Performance Comparison Across All Datasets

| Dataset Configuration | ResNet-50 Accuracy | R(2+1)D Accuracy | Best Performing Model |
|----------------------|-------------------|------------------|----------------------|
| **Equal Samples (4 classes)** | 60.71% | 60.71% | Tie |
| **Unequal Samples (6 classes)** | 48.72% | 51.28% | R(2+1)D (+2.56 pp) |
| **Sliding Window (6 classes)** | 53.70% | 59.26% | R(2+1)D (+5.56 pp) |

### Overall Insights
- **Sliding window augmentation** improved performance over the original unequal dataset
- **R(2+1)D** outperformed ResNet-50 on both imbalanced datasets in this split (by 2.56 and 5.56 percentage points)
- **Class balance** appears to be an important factor: the balanced four-class setup gave the highest accuracy for both models
- **Temporal augmentation** (sliding window) shows promise for rare event classes but requires careful implementation

# Multiclass Few-shot Learning Results Comparison
*Prototypical Networks with Reversed Sample Strategy*

## Methodology Overview

### Training Strategy
- **Prototype Building**: 5 random samples per class selected from available training data
- **Evaluation**: Remaining samples used for testing (varies by dataset configuration)
- **Sample Selection**: Reversed sampling approach - building prototypes from limited samples

### Dataset Configurations

#### Equal Samples Dataset
- **Prototype samples**: 5 per class (4 classes)
- **Evaluation samples**: 12 per class (48 total)
- **Classes**: Goal, Direct_free-kick, Yellow_card, no_event

#### Unequal Samples Dataset
- **Prototype samples**: Maximum 7 per class (varies by availability)
- **Evaluation samples**: Maximum 12 per class (varies by availability)
- **Classes**: Goal, Direct_free-kick, Yellow_card, no_event, Red_card, Penalty

#### Sliding Window Dataset
- **Prototype samples**: Maximum 7 per class (with ±2sec augmentation for rare events)
- **Evaluation samples**: Maximum 12 per class
- **Classes**: Goal, Direct_free-kick, Yellow_card, no_event, Red_card, Penalty
- **Augmentation**: ±2 seconds temporal sliding window for Red_card and Penalty
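
A sketch of how prototype and evaluation clips could be drawn under this strategy. The function name, the default counts, and in particular the rule of always leaving at least one clip for evaluation are assumptions; the text does not say how classes with very few clips were divided.

```python
import random

def reversed_split(clips_by_class, n_prototype=5, max_eval=12, seed=0):
    """Pick a few random prototype clips per class; evaluate on the rest (capped)."""
    rng = random.Random(seed)
    prototypes, evaluation = {}, {}
    for label, clips in clips_by_class.items():
        pool = list(clips)
        rng.shuffle(pool)
        # Assumed rule: keep at least one clip for evaluation in small classes.
        k = min(n_prototype, max(len(pool) - 1, 0))
        prototypes[label] = pool[:k]
        evaluation[label] = pool[k:k + max_eval]
    return prototypes, evaluation

clips_by_class = {
    "Goal": [f"goal_{i}" for i in range(27)],   # hypothetical clip names
    "Red_card": ["red_0", "red_1", "red_2"],
}
protos, evals = reversed_split(clips_by_class)
```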

---

## Equal Samples Dataset Results
*4 classes, balanced distribution*

### Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score |
|-------|------------------|-------------------|-----------------|-------------------|
| **ResNet-50** | 0.5625 | 0.5694 | 0.5625 | 0.5500 |
| **R(2+1)D** | 0.5000 | 0.5219 | 0.5000 | 0.4733 |

### Per-Class Performance

#### ResNet-50
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score |
|-------|----------|---------------|-----------|---------|----------|
| **Goal** | 0.5833 | 7/12 | 0.58 | 0.58 | 0.58 |
| **Direct_free-kick** | 0.6667 | 8/12 | 0.44 | 0.67 | 0.53 |
| **Yellow_card** | 0.7500 | 9/12 | 0.75 | 0.75 | 0.75 |
| **no_event** | 0.2500 | 3/12 | 0.50 | 0.25 | 0.33 |

#### R(2+1)D
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score |
|-------|----------|---------------|-----------|---------|----------|
| **Goal** | 0.8333 | 10/12 | 0.43 | 0.83 | 0.57 |
| **Direct_free-kick** | 0.5833 | 7/12 | 0.54 | 0.58 | 0.56 |
| **Yellow_card** | 0.4167 | 5/12 | 0.71 | 0.42 | 0.53 |
| **no_event** | 0.1667 | 2/12 | 0.40 | 0.17 | 0.24 |

### Confusion Matrices (Equal Samples)

#### ResNet-50
```
Predicted →     Goal  Free  Yellow  no_event
True ↓
Goal              7     5      0        0
Direct_free       4     8      0        0
Yellow_card       0     0      9        3
no_event          1     5      3        3
```

#### R(2+1)D
```
Predicted →     Goal  Free  Yellow  no_event
True ↓
Goal             10     2      0        0
Direct_free       4     7      0        1
Yellow_card       4     1      5        2
no_event          5     3      2        2
```

---

## Unequal Samples Dataset Results
*6 classes, imbalanced distribution*

### Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score | Test Samples |
|-------|------------------|-------------------|-----------------|-------------------|--------------|
| **ResNet-50** | 0.4412 | 0.3863 | 0.4412 | 0.3975 | 34 |
| **R(2+1)D** | 0.2778 | 0.2616 | 0.2778 | 0.2666 | 54 |

### Per-Class Performance

#### ResNet-50 (34 samples)
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.7143 | 5/7 | 0.33 | 0.71 | 0.45 | 7 |
| **Direct_free-kick** | 0.4286 | 3/7 | 0.43 | 0.43 | 0.43 | 7 |
| **Yellow_card** | 0.7143 | 5/7 | 0.71 | 0.71 | 0.71 | 7 |
| **no_event** | 0.2857 | 2/7 | 0.40 | 0.29 | 0.33 | 7 |
| **Red_card** | 0.0000 | 0/2 | 0.00 | 0.00 | 0.00 | 2 |
| **Penalty** | 0.0000 | 0/4 | 0.00 | 0.00 | 0.00 | 4 |

#### R(2+1)D (54 samples)
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.2500 | 3/12 | 0.30 | 0.25 | 0.27 | 12 |
| **Direct_free-kick** | 0.3333 | 4/12 | 0.22 | 0.33 | 0.27 | 12 |
| **Yellow_card** | 0.5833 | 7/12 | 0.58 | 0.58 | 0.58 | 12 |
| **no_event** | 0.0833 | 1/12 | 0.07 | 0.08 | 0.08 | 12 |
| **Red_card** | 0.0000 | 0/2 | 0.00 | 0.00 | 0.00 | 2 |
| **Penalty** | 0.0000 | 0/4 | 0.00 | 0.00 | 0.00 | 4 |

### Confusion Matrices (Unequal Samples)

#### ResNet-50
```
Predicted →   Goal  Free  Yellow  no_event  Red  Penalty
True ↓
Goal            5     2      0        0      0      0
Direct_free     3     3      0        1      0      0
Yellow_card     0     0      5        2      0      0
no_event        3     2      0        2      0      0
Red_card        0     0      2        0      0      0
Penalty         4     0      0        0      0      0
```

#### R(2+1)D
```
Predicted →   Goal  Free  Yellow  no_event  Red  Penalty
True ↓
Goal            3     5      0        4      0      0
Direct_free     1     4      0        7      0      0
Yellow_card     3     1      7        1      0      0
no_event        1     8      2        1      0      0
Red_card        0     0      2        0      0      0
Penalty         2     0      1        1      0      0
```

---

## Sliding Window Dataset Results
*6 classes with temporal augmentation*

### Overall Performance Summary

| Model | Overall Accuracy | Weighted Precision | Weighted Recall | Weighted F₁ Score | Test Samples |
|-------|------------------|-------------------|-----------------|-------------------|--------------|
| **ResNet-50** | 0.2879 | 0.1788 | 0.2879 | 0.2128 | 66 |
| **R(2+1)D** | 0.3939 | 0.2752 | 0.3939 | 0.3160 | 66 |

### Per-Class Performance

#### ResNet-50
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.8333 | 10/12 | 0.33 | 0.83 | 0.48 | 12 |
| **Direct_free-kick** | 0.2500 | 3/12 | 0.25 | 0.25 | 0.25 | 12 |
| **Yellow_card** | 0.5000 | 6/12 | 0.40 | 0.50 | 0.44 | 12 |
| **no_event** | 0.0000 | 0/12 | 0.00 | 0.00 | 0.00 | 12 |
| **Red_card** | 0.0000 | 0/6 | 0.00 | 0.00 | 0.00 | 6 |
| **Penalty** | 0.0000 | 0/12 | 0.00 | 0.00 | 0.00 | 12 |

#### R(2+1)D
| Class | Accuracy | Correct/Total | Precision | Recall | F₁-Score | Support |
|-------|----------|---------------|-----------|---------|----------|---------|
| **Goal** | 0.6667 | 8/12 | 0.53 | 0.67 | 0.59 | 12 |
| **Direct_free-kick** | 0.9167 | 11/12 | 0.42 | 0.92 | 0.58 | 12 |
| **Yellow_card** | 0.4167 | 5/12 | 0.36 | 0.42 | 0.38 | 12 |
| **no_event** | 0.1667 | 2/12 | 0.20 | 0.17 | 0.18 | 12 |
| **Red_card** | 0.0000 | 0/6 | 0.00 | 0.00 | 0.00 | 6 |
| **Penalty** | 0.0000 | 0/12 | 0.00 | 0.00 | 0.00 | 12 |

### Confusion Matrices (Sliding Window)

#### ResNet-50
```
Predicted →   Goal  Free  Yellow  no_event  Red  Penalty
True ↓
Goal           10     1      0        1      0      0
Direct_free     9     3      0        0      0      0
Yellow_card     1     1      6        3      1      0
no_event        2     7      3        0      0      0
Red_card        0     0      6        0      0      0
Penalty         8     0      0        4      0      0
```

#### R(2+1)D
```
Predicted →   Goal  Free  Yellow  no_event  Red  Penalty
True ↓
Goal            8     4      0        0      0      0
Direct_free     0    11      0        1      0      0
Yellow_card     0     2      5        4      1      0
no_event        2     7      1        2      0      0
Red_card        0     0      6        0      0      0
Penalty         5     2      2        3      0      0
```

---

## Critical Analysis and Key Findings

### Performance Degradation Pattern

| Dataset Configuration | ResNet-50 Accuracy | R(2+1)D Accuracy | Performance Gap |
|----------------------|-------------------|------------------|-----------------|
| **Equal Samples** | 56.25% | 50.00% | ResNet-50 +6.25 pp |
| **Unequal Samples** | 44.12% | 27.78% | ResNet-50 +16.34 pp |
| **Sliding Window** | 28.79% | 39.39% | R(2+1)D +10.60 pp |

### Major Observations

#### 1. Model Behavior Reversal
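- **Equal samples**: ResNet-50 led R(2+1)D by 6.25 percentage points (27 vs 24 correct of 48)
- **Unequal samples**: ResNet-50 led by a larger margin (44.12% vs 27.78%), although the two models were evaluated on test sets of different size (34 vs 54 clips), so the gap is not directly comparable
- **Sliding window**: R(2+1)D recovered and outperformed ResNet-50 (39.39% vs 28.79%)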

#### 2. Class-Specific Patterns
- **Yellow_card**: Best or tied-best class in three of the six model/dataset combinations; Goal or Direct_free-kick led in the others
- **Goal detection**: Varied widely between models and datasets (25% to 83.3%)
- **Rare events** (Red_card, Penalty): 0% accuracy for both models in both six-class configurations
- **no_event**: Persistent challenge across all configurations

#### 3. Impact of Sample Strategy
- **Reversed sampling** (5 prototypes → 12 test) showed different behavior than traditional few-shot learning
- **Larger test sets** revealed model limitations more clearly
- **Class imbalance** severely affected both models' generalization capability

#### 4. Temporal Augmentation Effects
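- **Sliding window augmentation** coincided with higher accuracy for R(2+1)D (27.78% → 39.39%) but lower accuracy for ResNet-50 (44.12% → 28.79%); the evaluation sets differ between the two configurations
- **Direct_free-kick** detection improved sharply in R(2+1)D (33.33% → 91.67% accuracy)
- **Penalty and Red_card** augmentation did not translate into classification success: no clip of either class was classified correctly (0/12 and 0/6)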

### Most Problematic Class Confusions

#### Across All Configurations:
1. **no_event → other classes**: Consistent misclassification pattern
2. **Goal ↔ Direct_free-kick**: Mutual confusion in most scenarios
3. **Rare events → common classes**: Red_card and Penalty often misclassified as Goal or Yellow_card

### Methodological Insights

#### Strengths of Reversed Sample Strategy:
- **Larger evaluation sets** provide more reliable performance estimates
- **Better understanding** of model generalization capabilities
- **Real-world applicability** where test data exceeds training data

#### Limitations Revealed:
- **Prototype quality** becomes critical with limited training samples
- **Class imbalance** effects are amplified in few-shot scenarios
- **Temporal augmentation** shows mixed results and requires careful implementation

### Recommendations

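1. **For balanced datasets**: ResNet-50 was ahead with 5 prototype samples per class (56.25% vs 50.00%), but the two models tied in the 80/20 split, so neither backbone is clearly better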
2. **For imbalanced datasets**: Consider ensemble approaches or specialized loss functions
3. **For temporal data**: R(2+1)D benefits more from sliding window augmentation
4. **For rare event detection**: Alternative strategies beyond simple augmentation needed

The reversed sampling strategy reveals that traditional few-shot learning assumptions may not hold when dealing with imbalanced sports event detection, particularly for rare events like red cards and penalties.