Eric Phann  
DSBA 6165

# Introduction

We will be looking at the _Super Resolution Gaming Dataset (SRGD)_, one of three potential datasets for our DSBA 6165 group project.  
It can be accessed via [GitHub](https://github.com/epishchik/SRGD) or [Hugging Face](https://huggingface.co/datasets/epishchik/SRGD/tree/main/data).

## Background

This dataset was developed by Evgenii Pishchik ([@epishchik](https://github.com/epishchik)) to provide an easily usable and accessible video game image dataset for the specific task of super resolution. It helps address the current lack of video game domain-specific images for this task and is intended to help indie developers get started with experimenting and researching in this area.

# Dataset

Let's dive into the details of the dataset below.

## Overview

SRGD consists of 2 independent datasets. Each has images from various games/projects in all 4 resolutions: 270p, 360p, 540p, and 1080p.
*   __GameEngineData__: 14431 train and 3600 test images across 17 games/projects
*   __DownscaleData__: 29726 train and 7421 test images across 20 games/projects

The creator does not specify the difference between the two, so we will need to take a look ourselves. The two datasets overlap in some games/projects, while others, such as Defense of the Ancients 2, appear in only one of them.

## Directory

Each dataset has images grouped by __game name__ or __project name__ and then split into training and validation sets for _each_ resolution.

Example: If I wanted to look at images for the game Defense of the Ancients 2 (aka Dota 2), I would use the following directory:

` ~/SRGD/data/DownscaleData/Dota2 `

It is worth noting that this game does not have images in the GameEngineData folder.

I would then need to drill deeper into the folder based on __resolution__ and whether I want the __train or test set__.  
Example 2: If I wanted to look at high-resolution images (1080p) for Dota 2, I would use the following paths:  
```
~/SRGD/data/DownscaleData/Dota2/train-1080p.tar.gz # train set
~/SRGD/data/DownscaleData/Dota2/val-1080p.tar.gz # test/val set
```
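
The layout described above can be captured in a small helper. This is a minimal sketch, assuming every archive follows the `{split}-{resolution}.tar.gz` naming seen in the Dota 2 example; the function name `srgd_archive_path` and the validation sets are our own and are not part of SRGD. It returns the path relative to the repository root (starting at `data/`).

```python
# Values taken from the Overview and Directory sections of this report.
VALID_DATASETS = {"GameEngineData", "DownscaleData"}
VALID_SPLITS = {"train", "val"}
VALID_RESOLUTIONS = {"270p", "360p", "540p", "1080p"}


def srgd_archive_path(dataset: str, game: str, split: str, resolution: str) -> str:
    """Build the repo-relative path of one SRGD archive (hypothetical helper)."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if resolution not in VALID_RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    # Assumed naming pattern: data/<dataset>/<game>/<split>-<resolution>.tar.gz
    return f"data/{dataset}/{game}/{split}-{resolution}.tar.gz"


print(srgd_archive_path("DownscaleData", "Dota2", "train", "1080p"))
# data/DownscaleData/Dota2/train-1080p.tar.gz
```

The helper only builds strings; it does not check that a given game actually exists in a given dataset (for example, Dota 2 is not in GameEngineData).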

## File Size & Storage

Because the data consists of images rather than tabular rows and columns, the files and folders get really large, especially as image resolution increases. Take a look at the folder sizes for varying resolutions of the same 7013 Dota 2 images:


*   train-270p = 395 MB
*   train-360p = 602 MB
*   train-540p = 1075 MB ≈ 1.08 GB
*   train-1080p = 8930 MB = 8.93 GB  

Because of this, we need to be careful and intentional before loading the entirety of SRGD, which is __over 50 GB (50,000 MB)__!
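
One way to stay intentional is to inspect an archive before unpacking it. The sketch below uses Python's standard `tarfile` module to list the first few files in an already-downloaded `.tar.gz` without extracting anything to disk. The function name `peek_archive` is our own, and we have not verified the internal layout of the SRGD archives.

```python
import tarfile


def peek_archive(path: str, limit: int = 5) -> list[tuple[str, int]]:
    """Return (name, size in bytes) for the first `limit` regular files in a tar archive."""
    entries = []
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(path, "r:*") as archive:
        for member in archive:  # members are read one at a time, in archive order
            if member.isfile():
                entries.append((member.name, member.size))
            if len(entries) >= limit:
                break
    return entries


# Example usage (the local path is an assumption):
# for name, size in peek_archive("train-270p.tar.gz"):
#     print(name, size)
```

Stopping after `limit` entries avoids walking the whole archive, which matters for the larger 1080p files.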


# Examples

Let's take a look at a few examples of low-resolution (270p) vs. high-resolution (1080p) images from Dota 2.

In [1]:
!pip install datasets


In [3]:
from datasets import load_dataset

In [9]:
# dota2_270p = load_dataset("epishchik/SRGD", data_files="data/DownscaleData/Dota2/train-270p.tar.gz")

AttributeError: 'BuilderConfig' object has no attribute 'features'

Uh-oh! Looks like the dataset's config file is buggy. Let's just take a look at an example of low-resolution vs. high-resolution images instead. We will revisit the error later, potentially by downloading the datasets we would like to use and re-uploading them to Hugging Face ourselves.  
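
In the meantime, one possible workaround is to skip `load_dataset` and fetch a single archive directly with `hf_hub_download` from `huggingface_hub` (installed as a dependency of `datasets`), then read the images straight from the tar file. This is a sketch under assumptions, not a tested fix: the helper names `download_archive` and `read_images` are ours, and we assume the archives contain image files with common extensions such as `.png` or `.jpg`.

```python
import tarfile


def download_archive(filename: str) -> str:
    """Download one SRGD archive from the Hugging Face Hub and return its local path."""
    # Imported here so read_images works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="epishchik/SRGD",
        filename=filename,
        repo_type="dataset",
    )


def read_images(tar_path: str, limit: int = 3) -> dict[str, bytes]:
    """Read the raw bytes of the first `limit` image files in a tar archive."""
    images = {}
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            # Assumed extensions; adjust after inspecting the real archives.
            if not member.name.lower().endswith((".png", ".jpg", ".jpeg")):
                continue
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            images[member.name] = extracted.read()
            if len(images) >= limit:
                break
    return images


# Example usage (downloads the 270p Dota 2 train archive, roughly 395 MB):
# path = download_archive("data/DownscaleData/Dota2/train-270p.tar.gz")
# for name, data in read_images(path).items():
#     print(name, len(data), "bytes")
```

The returned bytes can then be decoded with an image library, for example Pillow's `Image.open(io.BytesIO(data))`.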

Let's look at the example given on the SRGD GitHub repo.

![lr](https://github.com/epishchik/SRGD/blob/main/images/readme/lr.png?raw=true)  
_Low-resolution image (270p)_  

![hr](https://github.com/epishchik/SRGD/blob/main/images/readme/hr.png?raw=true)  
_High-resolution image (1080p)_

# Conclusion

This dataset is well cleaned and sourced. We should need little (if any) preprocessing, and from glancing through the images, they all appear to be at the appropriate resolutions. It covers a variety of scenes, ranging from real games to prototype projects (good for unseen test data). Additionally, each image has an equivalent at every resolution (270p, 360p, 540p, 1080p), allowing us to compare low-resolution and high-resolution versions directly. One possible limitation is that all of these games/projects are from the Unity game engine. A model fine-tuned on this dataset may perform poorly on a game with a different engine or style, e.g., Candy Crush.