# Sports Image Classification
Reference notebook: https://www.kaggle.com/code/littlebughenrylee/100-sports-classification-resnet-93-yolo-98

<p>A collection of sports images covering 100 different sports. Images are 224 x 224 x 3 in JPG format. Data is separated into train, test and valid directories. Additionally, a CSV file is included for those who wish to use it to create their own train, test and validation datasets.</p>

Images were gathered from internet searches. The images were scanned with a duplicate image detector program I wrote. Any duplicate images were removed to prevent bleed-through of images between the train, test and valid datasets. All images were then resized to 224 x 224 x 3 and converted to JPG format. A CSV file is included that lists, for each image file, the relative path to the image file, its class label and the dataset (train, test or valid) it resides in. This is a clean dataset.
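The duplicate detector itself is not included in this notebook. As a rough illustration of the idea only, the sketch below flags byte-identical files by hashing their contents. `find_exact_duplicates` is a hypothetical helper, not the program used to build the dataset, and it would miss resized or re-encoded copies of the same picture.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root):
    """Group .jpg files under `root` that have identical bytes.

    Hypothetical helper for illustration; returns a list of groups,
    where each group is a list of paths sharing the same SHA-256 digest.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.jpg")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path)
    # Keep only digests seen more than once, i.e. real duplicates
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("sports-classification")` after extraction would list any exact copies shared between the train, test and valid folders.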

<img src="https://t3.ftcdn.net/jpg/02/78/42/76/360_F_278427683_zeS9ihPAO61QhHqdU1fOaPk2UClfgPcW.jpg" class="center">
<style>.center {
  display: block;
  margin-left: auto;
  margin-right: auto;
  width: 50%;
}
</style>

## Importing Necessary Libraries

In [1]:
import os 
import numpy as np
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import torch
from torch import nn as nn
from torch.utils.data import Dataset, DataLoader
import time
import torchvision
from torchvision import transforms

from torchinfo import summary
import kaggle
from plotly.subplots import make_subplots
import glob
import plotly.graph_objects as go
import plotly.express as px


## Download Dataset from Kaggle
link: https://www.kaggle.com/datasets/gpiosenka/sports-classification

In [2]:
if "sports-classification" in os.listdir():
    print('Data Already Exists')
else:
    print("Downloading Data From Kaggle")
    !kaggle datasets download -d gpiosenka/sports-classification

Downloading Data From Kaggle
Downloading sports-classification.zip to d:\
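The check above looks only for the extracted `sports-classification` folder, so a ZIP that was downloaded but not yet extracted would be fetched again. A minimal sketch of a stricter check, assuming the archive is saved as `sports-classification.zip` in the working directory (as the output suggests), might look like this. `needs_download` is a hypothetical helper name.

```python
import os


def needs_download(base_dir=".", name="sports-classification"):
    """Return True when neither the extracted folder nor the ZIP exists.

    Hypothetical helper; `name` is assumed to match both the folder
    and the archive (name + ".zip").
    """
    folder = os.path.join(base_dir, name)
    archive = os.path.join(base_dir, name + ".zip")
    return not (os.path.isdir(folder) or os.path.isfile(archive))
```

In the notebook, the `!kaggle datasets download` line would then run only when `needs_download()` returns `True`.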





In [3]:
# Use the current working directory as the base path
directory_path = os.getcwd()

## Extract Downloaded ZIP File

In [4]:
# Find ZIP archives in the working directory and strip their extensions
data_folder_name = glob.glob(os.path.join(directory_path, '*.zip'))
filenames_without_extension = [os.path.splitext(os.path.basename(file))[0] for file in data_folder_name]
data_folder_name

['d:\\sports-classification.zip']

In [5]:
data_folder_name = filenames_without_extension

In [6]:
data_folder_name = data_folder_name[0]
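Cells 4 to 6 reduce the ZIP path to the bare name `sports-classification` in three steps. As an alternative sketch, `pathlib` can do the same in one expression. `archive_stems` is a hypothetical helper, and it assumes the archives sit directly in the given directory.

```python
from pathlib import Path


def archive_stems(directory):
    """Return the names of all .zip files in `directory`, without extension."""
    return sorted(path.stem for path in Path(directory).glob("*.zip"))
```

With a single archive present, `archive_stems(directory_path)[0]` would give the same value as `data_folder_name` above.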

## Check Whether the Dataset Is Already Unzipped

In [8]:
if not os.path.exists(data_folder_name):
    os.makedirs(data_folder_name)
    print(f"Directory '{data_folder_name}' created.")
    !unzip sports-classification.zip -d 'sports-classification/'
else:
    print(f"Directory '{data_folder_name}' already exists.")

Directory 'sports-classification' created.
Archive:  sports-classification.zip
  inflating: sports-classification/EfficientNetB0-100-(224 X 224)- 98.40.h5  
  inflating: sports-classification/sports.csv  
  inflating: sports-classification/test/air hockey/1.jpg  
  inflating: sports-classification/test/air hockey/2.jpg  
  inflating: sports-classification/test/air hockey/3.jpg  
  inflating: sports-classification/test/air hockey/4.jpg  
  inflating: sports-classification/test/air hockey/5.jpg  
  inflating: sports-classification/test/ampute football/1.jpg  
  inflating: sports-classification/test/ampute football/2.jpg  
  inflating: sports-classification/test/ampute football/3.jpg  
  inflating: sports-classification/test/ampute football/4.jpg  
  inflating: sports-classification/test/ampute football/5.jpg  
  inflating: sports-classification/test/archery/1.jpg  
  inflating: sports-classification/test/archery/2.jpg  
  inflating: sports-classification/test/archery/3.jpg  
  inflating:
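The `!unzip` call relies on an `unzip` executable being available on the system. Where it is not, the standard-library `zipfile` module can perform the same extraction. The sketch below mirrors the create-then-extract logic of the cell above; `extract_once` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_once(archive_path, target_dir):
    """Extract `archive_path` into `target_dir` unless the folder exists.

    Returns True if extraction ran, False if the folder was already there.
    """
    target = Path(target_dir)
    if target.exists():
        return False
    target.mkdir(parents=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return True
```

For this notebook the call would be `extract_once("sports-classification.zip", "sports-classification")`.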

## Define Model Results Directory

In [9]:
MODEL_RESULTS_DIR = 'models_results'

In [10]:
if not os.path.exists(MODEL_RESULTS_DIR):
    os.makedirs(MODEL_RESULTS_DIR)
    print(f"Directory '{MODEL_RESULTS_DIR}' created.")
else:
    print(f"Directory '{MODEL_RESULTS_DIR}' already exists.")

Directory 'models_results' created.
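If the status message is not needed, the existence check can be folded into the call itself with `exist_ok=True`. A small sketch, with `ensure_dir` as a hypothetical helper name:

```python
import os


def ensure_dir(path):
    """Create `path` if it is missing and return its absolute path."""
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return os.path.abspath(path)
```

`ensure_dir(MODEL_RESULTS_DIR)` can be called repeatedly without raising an error.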


## Import Dataset and Get Total Number of Classes

In [11]:
data = pd.read_csv('sports-classification/sports.csv')
NUM_CLASSES = data['class id'].nunique()
print(NUM_CLASSES)

100
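Beyond the total count, it can be useful to confirm that every split covers the same classes. The sketch below does this on a toy frame. The `class id` column comes from the cell above, while the split column name `data set` is an assumption about `sports.csv` and should be checked against `data.columns` before use.

```python
import pandas as pd


def summarize_classes(df, class_col="class id", split_col="data set"):
    """Return (total number of classes, {split: classes in that split}).

    `split_col` is an assumed column name, not confirmed by the notebook.
    """
    n_classes = df[class_col].nunique()
    per_split = df.groupby(split_col)[class_col].nunique().to_dict()
    return n_classes, per_split


# Toy data standing in for sports.csv
toy = pd.DataFrame({
    "class id": [0, 0, 1, 1, 2],
    "data set": ["train", "test", "train", "valid", "train"],
})
total, per_split = summarize_classes(toy)
```

On the real file, a split whose count is lower than `NUM_CLASSES` would indicate classes missing from that split.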


## Define Device: GPU or CPU

In [12]:
# Select the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
# Hardcoded training settings:
# loss_fn -> CrossEntropyLoss
# optimizer -> Adam(lr = 0.0005)
device

'cuda'
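To show how the device string and the hardcoded settings noted in the comments fit together, here is a minimal sketch of a single training step. The tiny linear model and the random 8 x 8 batch are placeholders chosen for illustration, not the architecture or data used in this notebook.

```python
import torch
from torch import nn

NUM_CLASSES = 100
device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model (assumption): a single linear layer on flattened input
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, NUM_CLASSES)).to(device)

# Settings taken from the comments in the cell above
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)

# Random stand-in batch; tensors must live on the same device as the model
images = torch.randn(4, 3, 8, 8, device=device)
labels = torch.randint(0, NUM_CLASSES, (4,), device=device)

logits = model(images)          # shape: (batch, NUM_CLASSES)
loss = loss_fn(logits, labels)  # expects raw logits and integer class ids
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The key point is that both the model and every batch are moved to `device`; mixing CPU and GPU tensors raises a runtime error.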