VGGR (Video Game Genre Recognition)

Have you ever seen gameplay footage and wondered what kind of video game it is from? No? Well, do not not wonder anymore.

VGGR is a Deep-Learning Image Classification project, answering questions nobody is asking.

Requirements

  1. Install Python 3.10 or newer.

  2. Clone the repository with

    git clone https://github.com/m4cit/VGGR.git
    

    or download the latest source code.

  3. Download the latest train, test, and validation image zip files from the releases page.

  4. Unzip the train, test, and validation archives into their respective folders located in ./data/.

  5. Install PyTorch.

    5.1 Either with CUDA

    • Windows:
      pip3 install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cu121
      
    • Linux:
      pip3 install torch==2.2.2 torchvision==0.17.2
      

    5.2 Or without CUDA

    • Windows:
      pip3 install torch==2.2.2 torchvision==0.17.2
      
    • Linux:
      pip3 install torch==2.2.2 torchvision==0.17.2 --index-url https://download.pytorch.org/whl/cpu
      
  6. Navigate to the VGGR main directory.

    cd VGGR
    
  7. Install dependencies.

    pip install -r requirements.txt
    

Note: The provided train dataset does not contain augmentations.

Genres

Available

  • Football / Soccer
  • First Person Shooter (FPS)
  • 2D Platformer
  • Racing

Games

Train Set

  • FIFA 06
  • Call of Duty Black Ops
  • Call of Duty Modern Warfare 3
  • DuckTales Remastered
  • Project CARS

Test Set

  • PES 2012
  • FIFA 10
  • Counter Strike 1.6
  • Counter Strike 2
  • Ori and the Blind Forest
  • Dirt 3

Validation Set

  • Left 4 Dead 2
  • Oddworld Abe's Oddysee
  • FlatOut 2

Usage

Commands

--demo | Demo predictions with the test set

--augment | Data Augmentation

--train | Train mode

--predict | Predict / inference mode

--input (-i) | File input for predict mode (URL or local image path)

--model (-m) | Model selection

  • cnn_v1 (default)
  • cnn_v2
  • cnn_v3

--device (-d) | Device selection

  • cpu (default)
  • cuda
  • ipu
  • xpu
  • mkldnn
  • opengl
  • opencl
  • ideep
  • hip
  • ve
  • fpga
  • ort
  • xla
  • lazy
  • vulkan
  • mps
  • meta
  • hpu
  • mtia

Examples

Demo with Test Set

python VGGR.py --demo

or

python VGGR.py --demo -m cnn_v1 -d cpu

or

python VGGR.py --demo --model cnn_v1 --device cpu

Predict with Custom Input

python VGGR.py --predict -i path/to/img.png

or

python VGGR.py --predict -i https://website/img.png

or

python VGGR.py --predict -i path/to/img.png -m cnn_v1 -d cpu
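Since --input accepts either a URL or a local path, the program has to distinguish the two before loading the image. VGGR's actual loader isn't shown here; the sketch below (the helper name `read_image_bytes` is hypothetical) illustrates one common way to branch on the URL scheme using only the standard library:

```python
from urllib.parse import urlparse
from urllib.request import urlopen

def read_image_bytes(source: str) -> bytes:
    """Hypothetical helper: return raw image bytes from a URL or a local path."""
    if urlparse(source).scheme in ("http", "https"):
        # remote image: download it
        with urlopen(source) as resp:
            return resp.read()
    # local file: read it directly
    with open(source, "rb") as f:
        return f.read()
```

The bytes can then be decoded into an image and fed to the selected model.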

Training

python VGGR.py --train -m cnn_v1 -d cpu

Delete the existing model to train from scratch.

Results

The --demo mode creates HTML files with the predictions and the corresponding images inside the results folder.
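The exact markup VGGR writes isn't documented here; as a rough illustration (function name and file layout are assumptions, not the project's actual code), such a results page can be generated like this:

```python
from pathlib import Path

def write_results_html(predictions, out_dir):
    """Write one simple results page; `predictions` is a list of
    (image_path, predicted_genre) pairs. Markup is illustrative only."""
    rows = "\n".join(
        f'<p><img src="{img}" width="320"> predicted: <b>{genre}</b></p>'
        for img, genre in predictions
    )
    out = Path(out_dir) / "demo.html"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(f"<html><body>\n{rows}\n</body></html>", encoding="utf-8")
    return out
```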

Performance

There are three Convolutional Neural Network (CNN) models available:

  1. cnn_v1 | F-score of 75 %
  2. cnn_v2 | F-score of 58.33 %
  3. cnn_v3 | F-score of 64.58 %
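How VGGR aggregates its scores isn't stated, but a common choice for a multi-class task like this is the macro-averaged F-score: compute F1 per genre, then take the unweighted mean. A minimal sketch (the counts passed in are illustrative, not VGGR's actual confusion statistics):

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return (2 * precision * recall / (precision + recall)
            if (precision + recall) else 0.0)

def macro_f_score(per_class_counts) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = [f_score(*counts) for counts in per_class_counts]
    return sum(scores) / len(scores)
```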

cnn_v1 --demo result examples

Data

Most of the images are from my own gameplay footage. The PES 2012 and FIFA 10 images are from videos by No Commentary Gameplays, and the FIFA 95 images are from a video by 10min Gameplay (YouTube).

The train dataset also contained augmentations (not in the provided zip-file).

Augmentation

To augment the train data with jittering, inversion, and five-part cropping, copy the metadata of the images to augment into the augment.csv file located in ./data/train/metadata/.

Then run

python VGGR.py --augment

The metadata of the resulting images is subsequently added to the metadata.csv file.
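VGGR's own augmentation code isn't reproduced here; the sketch below approximates the three listed transforms with Pillow (function names, the brightness factor, and the crop size are all assumptions):

```python
from PIL import Image, ImageEnhance, ImageOps

def five_crops(img: Image.Image, size: int):
    """Four corner crops plus one center crop, each size x size."""
    w, h = img.size
    cx, cy = (w - size) // 2, (h - size) // 2
    boxes = [
        (0, 0, size, size),              # top-left
        (w - size, 0, w, size),          # top-right
        (0, h - size, size, h),          # bottom-left
        (w - size, h - size, w, h),      # bottom-right
        (cx, cy, cx + size, cy + size),  # center
    ]
    return [img.crop(box) for box in boxes]

def augment(img: Image.Image):
    """Return jittered, inverted, and five cropped variants of one frame."""
    jittered = ImageEnhance.Brightness(img).enhance(1.3)  # crude brightness jitter
    inverted = ImageOps.invert(img.convert("RGB"))
    return [jittered, inverted] + five_crops(img, min(img.size) // 2)
```

Each source frame therefore yields seven augmented variants in this sketch.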

Preprocessing

All images are originally 2560x1440, and are resized to 1280x720 before training, validation, and inference. 4:3 images are stretched to 16:9 to avoid black bars.
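In practice this amounts to resizing straight to the target resolution without preserving aspect ratio, so 4:3 frames are distorted rather than letterboxed. A minimal sketch with Pillow (the function name is an assumption; VGGR's actual preprocessing code may differ):

```python
from PIL import Image

TARGET = (1280, 720)  # 16:9 resolution used for training, validation, and inference

def preprocess(img: Image.Image) -> Image.Image:
    """Resize directly to 1280x720, deliberately stretching 4:3 frames
    to 16:9 instead of letterboxing, so no black bars appear."""
    return img.resize(TARGET)
```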

Libraries