# 0. Data gathering

Description of a version 0 dataset:
* 100 images total, 75 training and 25 testing
* Images of the circular Catan number tokens, taken at various angles and under varied lighting
* There are 10 distinct numbers (2 to 6 and 8 to 12), so each number needs 10 images
* This dataset will be without data augmentation
* Various backgrounds, including a Catan board and close-ups of the number
* The goal is to train a model on this dataset that will eventually take in cropped images of the number tokens on a Catan board and identify which number each one shows

# 1. Create a zip file of the images for importing

After the images are collected, I'll create a zip file and put it in the GitHub repository: https://github.com/mattmanb/catanomics/tree/main
* This zip file will have the correct directory structure already set up (the data will be divided into train and test sets, and within each set by the number shown)
* e.g. "data/CATANIST/train/four" will contain all training images of the number four
* Each training class should have at least 7 images of its number (with a completely random split, a number's images could land disproportionately in the test directory, leaving too few for the model to learn that number's patterns)
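
One way to guarantee the per-class minimum is to split each class on its own instead of shuffling the whole dataset at once. The sketch below is only an illustration of that idea: the function name `split_per_class`, the 0.75 ratio, the seed, and the file names are assumptions, not the code used to build this dataset.

```python
import random

def split_per_class(files_by_class, train_fraction=0.75, min_train=7, seed=42):
    """Split each class's files into train/test lists, one class at a time."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        # Round down, but never drop below the per-class training minimum
        n_train = max(min_train, int(len(shuffled) * train_fraction))
        n_train = min(n_train, len(shuffled))
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:]
    return train, test

# Hypothetical file names: 10 images for each of two classes
example = {name: [f"{name}_{i}.jpg" for i in range(10)] for name in ("four", "eight")}
train, test = split_per_class(example)
print({k: len(v) for k, v in train.items()})  # {'four': 7, 'eight': 7}
print({k: len(v) for k, v in test.items()})   # {'four': 3, 'eight': 3}
```

With 10 images per class this gives 7 training and 3 testing images for every number, so no class can end up under-represented in training.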

## 1.1 This script was used to create the condensed zip file that holds the dataset
* The original zip file was 233MB since each image was 3024x3024 (thanks Apple!)
* I resized all the images to 224x224, saved them to a 'condensed' directory, then made a zip file from there. There's no need to keep running this code, so I've commented it out.

In [5]:
# # Resize all the images in the dataset so a zip file < 25MB can be made
# import cv2
# import os

# def resize_images(root_dir, target_dir, new_width, new_height):
#     for subdir, dirs, files in os.walk(root_dir):
#         for file in files:
#             # Construct the full file path
#             file_path = os.path.join(subdir, file)
#             # Read the image
#             image = cv2.imread(file_path)
#             if image is not None:
#                 # Resize the image
#                 resized_image = cv2.resize(image, (new_width, new_height))

#                 # Construct relative path to maintain the directory structure
#                 relative_path = os.path.relpath(subdir, root_dir)
#                 target_subdir = os.path.join(target_dir, relative_path)

#                 # Ensure the target subdirectory exists
#                 if not os.path.exists(target_subdir):
#                     os.makedirs(target_subdir)

#                 # Construct the target file path to save the image
#                 target_file_path = os.path.join(target_subdir, file)
#                 cv2.imwrite(target_file_path, resized_image)
#                 print(f"Resized image saved to {target_file_path}")
#             else:
#                 print(f"Failed to read {file_path}")

In [6]:
# # Create new folder
# root_dir = './data/CATANIST'
# target_dir = './condensed/CATANIST'
# new_width = 224
# new_height = 224

# resize_images(root_dir, target_dir, new_width, new_height)

Resized image saved to ./condensed/CATANIST\test\eight\Photo Feb 28 2024, 09 54 02.jpg
Resized image saved to ./condensed/CATANIST\test\eight\Photo Feb 28 2024, 09 54 09.jpg
Resized image saved to ./condensed/CATANIST\test\eight\Photo Feb 28 2024, 09 54 16.jpg
Resized image saved to ./condensed/CATANIST\test\eleven\Photo Feb 28 2024, 09 58 03.jpg
Resized image saved to ./condensed/CATANIST\test\eleven\Photo Feb 28 2024, 09 58 22.jpg
Resized image saved to ./condensed/CATANIST\test\eleven\Photo Feb 28 2024, 09 58 29.jpg
Resized image saved to ./condensed/CATANIST\test\five\Photo Feb 28 2024, 09 50 22.jpg
Resized image saved to ./condensed/CATANIST\test\five\Photo Feb 28 2024, 09 50 29.jpg
Resized image saved to ./condensed/CATANIST\test\five\Photo Feb 28 2024, 09 50 36.jpg
Resized image saved to ./condensed/CATANIST\test\four\Photo Feb 28 2024, 09 48 56.jpg
Resized image saved to ./condensed/CATANIST\test\four\Photo Feb 28 2024, 09 49 02.jpg
Resized image saved to ./condensed/CATANIST\t
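
The zipping step itself is not shown in the cells above. A minimal sketch of how it could be done with the standard library follows; the helper name `zip_directory` and the output name in the usage comment are assumptions, and the zip may just as well have been created by hand.

```python
import shutil
from pathlib import Path

def zip_directory(source_dir, archive_stem):
    """Zip source_dir (keeping its folder layout) and return the archive path."""
    source = Path(source_dir)
    # make_archive appends ".zip" to archive_stem on its own
    archive = shutil.make_archive(
        str(archive_stem),
        "zip",
        root_dir=str(source.parent),  # paths inside the zip are relative to this folder
        base_dir=source.name,         # only this folder is added to the archive
    )
    return Path(archive)

# Example usage (the output name is an assumption):
# zip_path = zip_directory("./condensed/CATANIST", "./CATANIST")
# print(zip_path, round(zip_path.stat().st_size / 1e6, 1), "MB")
```

Because `base_dir` is the dataset folder itself, the archive unpacks to `CATANIST/train/...` and `CATANIST/test/...`, preserving the class subdirectories.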

# 2. Set up train/test directories and get a list of all the file paths (Data preparation)

* Create training/testing directories
* Visualize some of the images at random from the training directory
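
A small sketch of the path-listing and random-sampling part of this step, assuming the `<split>/<class>/<image>.jpg` layout described in section 1. The helper names `list_image_paths` and `sample_images` are hypothetical.

```python
import random
from pathlib import Path

def list_image_paths(split_dir, pattern="*/*.jpg"):
    """Return all image paths under split_dir, expecting <split_dir>/<class>/<image>.jpg."""
    return sorted(Path(split_dir).glob(pattern))

def sample_images(paths, k=3, seed=None):
    """Pick up to k random paths and pair each with its class (the parent folder name)."""
    rng = random.Random(seed)
    paths = list(paths)
    chosen = rng.sample(paths, min(k, len(paths)))
    return [(p, p.parent.name) for p in chosen]

# Example usage:
# train_dir = Path("data/CATANIST/train")
# for path, label in sample_images(list_image_paths(train_dir)):
#     print(label, path)
```

Each sampled path can then be opened with PIL and shown with matplotlib, using the folder name as the title.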

# 3. Transforming the data for the model

In order to use these images with a PyTorch model, they need to be transformed:
* The resolution can be much lower than what an iPhone camera produces by default
* No data augmentation in the first iteration of the dataset
* The images must then be converted to tensors that PyTorch can work with
* Visualize images using matplotlib (`.permute()` needs to be used so that the color channels come last!)

# 4. Load image data using `ImageFolder` from `torchvision.datasets`, and create DataLoaders

This applies the transforms created in the previous step and loads the images into `ImageFolder` datasets, which infer each image's label from the name of its class directory.

Then create DataLoaders from the train/test `ImageFolder` datasets

# 5. Perform any other transforms

This is the step where any other transformations, such as data augmentation, would take place. 

Possible data augmentations for this dataset:
* TrivialAugment - https://pytorch.org/vision/main/generated/torchvision.transforms.TrivialAugmentWide.html
* Vertical Flip - https://pytorch.org/vision/stable/generated/torchvision.transforms.RandomVerticalFlip.html

# 6. Create a model

Find a model architecture that works well on MNIST and try that, then experiment from there.
* This step will take many attempts in order to maximize accuracy, prevent underfitting/overfitting, and uncover other issues with the dataset
* Use `torchinfo` to summarize the model's layers, output shapes, and parameter counts
* Wrap training, testing, and evaluation in functions so they can be reused for future models

# 7. Plot loss curves

This step is crucial for improving the results from step 6
* Plot the performance of different models against one another
* Visualize model performance
* Write functions for plotting a single model and for plotting two models against one another
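
A sketch of such plotting functions follows. It assumes each model's history is stored as a dict with the keys `train_loss`, `test_loss`, `train_acc`, and `test_acc`, each a list with one value per epoch; that format and the function names are assumptions.

```python
import matplotlib.pyplot as plt

def plot_loss_curves(results, label=None, axes=None):
    """Plot loss and accuracy per epoch for one model; return the two axes."""
    if axes is None:
        _, axes = plt.subplots(1, 2, figsize=(12, 5))
    epochs = range(1, len(results["train_loss"]) + 1)
    prefix = f"{label} " if label else ""
    for ax, metric in zip(axes, ("loss", "acc")):
        ax.plot(epochs, results[f"train_{metric}"], label=f"{prefix}train")
        ax.plot(epochs, results[f"test_{metric}"], linestyle="--", label=f"{prefix}test")
        ax.set_title(metric)
        ax.set_xlabel("epoch")
        ax.legend()
    return axes

def compare_models(results_a, results_b, names=("model A", "model B")):
    """Draw two models' curves on the same pair of axes."""
    axes = plot_loss_curves(results_a, label=names[0])
    return plot_loss_curves(results_b, label=names[1], axes=axes)

# Example usage with made-up numbers:
# history = {"train_loss": [2.1, 1.5], "test_loss": [2.2, 1.8],
#            "train_acc": [0.2, 0.5], "test_acc": [0.2, 0.4]}
# plot_loss_curves(history)
# plt.show()
```

A training loss that keeps falling while the test loss rises suggests overfitting, and two curves that both stay high suggest underfitting.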

# 8. Test model with custom images

Take new images that are in neither the training nor the testing set and send them through the model
* Write a function that makes a prediction and plots it along with the image
* Use the results to see how well the model handles images it has never seen
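
A minimal sketch of a prediction function for a single custom image. The name `predict_image` is hypothetical, and it assumes the same `transform` used for the test data and a `class_names` list in the order the model was trained on.

```python
import torch
from PIL import Image

def predict_image(model, image, class_names, transform, device="cpu"):
    """Return (predicted class name, probability) for one PIL image or image path."""
    if not isinstance(image, Image.Image):
        image = Image.open(image)
    image = image.convert("RGB")
    # The model expects a batch, so add a leading batch dimension
    batch = transform(image).unsqueeze(0).to(device)
    model.eval()
    with torch.inference_mode():
        probs = torch.softmax(model(batch), dim=1)
    idx = int(probs.argmax(dim=1))
    return class_names[idx], float(probs[0, idx])

# Example usage (the file name is hypothetical):
# import matplotlib.pyplot as plt
# label, prob = predict_image(model, "my_photo.jpg", class_names, transform)
# plt.imshow(Image.open("my_photo.jpg"))
# plt.title(f"Pred: {label} | Prob: {prob:.2f}")
# plt.axis("off")
```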

In [1]:
import cv2
import numpy as np
import torch

In [2]:
# Show image function for later use
def showImage(img, name=None):
    if not name:
        cv2.imshow("Image display", img)
    else:
        cv2.imshow(name, img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    
# Save image function
def saveImage(filename, img, dir):
    # Get full path
    full_path = f"{dir}/{filename}"
    cv2.imwrite(full_path, img)
    print(f"Image saved to {full_path}")

In [3]:
import os
from PIL import Image

images = []
for filename in os.listdir("./data/CATANIST dataset"):
    file_path = os.path.join("./data/CATANIST dataset", filename)
    try:
        img = Image.open(file_path)
        images.append(img)
    except IOError:
        print(f"Error opening or reading image {filename}")
        
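for img in images:
    print(img)
    # cv2.imshow needs a BGR NumPy array, so convert the PIL image first
    showImage(cv2.cvtColor(np.array(img.convert("RGB")), cv2.COLOR_RGB2BGR))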

FileNotFoundError: [Errno 2] No such file or directory: './images/CATANIST dataset'