# Pre-process the dataset for FelineFlow
The original Cats vs. Dogs dataset contains more than 12,000 images of cats alone, and they come in many different sizes. We need to adapt the dataset to suit our needs.

In this notebook, we will apply the following transformations:
- Select the first 4096 images for training and testing.
- Crop the images to a 1:1 aspect ratio.
- Downscale the images to a fixed square resolution (the run at the end of this notebook uses 256x256).

Let's get started by importing the necessary modules:

In [70]:
import os
import cv2

Let's create a function to crop an image to a 1:1 aspect ratio, keeping the central square:

In [71]:
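def crop_square(img_path):
    # cv2.imread returns None (it does not raise) when a file cannot be read,
    # so img.shape below raises AttributeError for unreadable files
    img = cv2.imread(img_path)
    height, width = img.shape[:2]
    if height == width:
        return img
    if height > width:
        # Portrait: keep every column and the central `width` rows
        return img[(height - width) // 2:(height + width) // 2, :]
    # Landscape: keep every row and the central `height` columns
    return img[:, (width - height) // 2:(width + height) // 2]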
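
To see why these slice bounds always produce a square, here is a small worked sketch of the same arithmetic on plain integers. The helper name `center_crop_bounds` is hypothetical and not part of the notebook; it only mirrors the index math used in `crop_square`.

```python
def center_crop_bounds(height, width):
    """Return (row_start, row_end, col_start, col_end) for a centered square crop.

    Hypothetical helper: it mirrors the slice arithmetic used in crop_square.
    """
    if height > width:
        # Portrait: trim rows, keep every column
        return (height - width) // 2, (height + width) // 2, 0, width
    # Landscape or already square: trim columns, keep every row
    return 0, height, (width - height) // 2, (width + height) // 2


# A 375x500 (height x width) landscape image keeps columns 62 to 437
print(center_crop_bounds(375, 500))  # (0, 375, 62, 437)
# An odd size difference still gives a square: 150 - 49 == 101
print(center_crop_bounds(101, 200))  # (0, 101, 49, 150)
```

Because both bounds are floored the same way, the end index minus the start index always equals the shorter side.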

Next, a function to resize the image to a specified resolution:

In [72]:
def resize(img, res):
    # res is (width, height); INTER_AREA is a good choice when shrinking images
    return cv2.resize(img, res, interpolation=cv2.INTER_AREA)
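
For some intuition about what area-based downscaling does, the sketch below averages non-overlapping pixel blocks with NumPy. It is an illustration, not OpenCV's implementation: `block_mean_downscale` is a hypothetical helper, and it assumes a single-channel image whose sides are exact multiples of an integer factor. For such integer factors, `cv2.INTER_AREA` should behave much like this kind of block averaging.

```python
import numpy as np


def block_mean_downscale(img, factor):
    """Downscale a 2-D array by averaging factor x factor pixel blocks."""
    height, width = img.shape
    # Split each axis into (number of blocks, block size), then average
    # over the two block-size axes.
    blocks = img.reshape(height // factor, factor, width // factor, factor)
    return blocks.mean(axis=(1, 3))


# A 4x4 gradient image shrunk to 2x2: each output pixel is a 2x2 block average
img = np.arange(16, dtype=np.float64).reshape(4, 4)
small = block_mean_downscale(img, 2)
print(small.shape)  # (2, 2)
print(small)
```

Averaging whole blocks is what keeps fine detail from turning into aliasing artifacts when an image is shrunk.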

And finally, a function to generate the dataset:

In [73]:
def generate_dataset(input_dir, output_dir, res, num_images):
    os.makedirs(output_dir, exist_ok=True)

    # os.listdir returns entries in arbitrary order; sort so that
    # "the first num_images" is reproducible
    img_paths = sorted(os.listdir(input_dir))
    if len(img_paths) < num_images:
        raise Exception("Not enough images in source directory")

    saved = 0
    for name in img_paths:
        if saved == num_images:
            break
        try:
            cropped_image = crop_square(os.path.join(input_dir, name))
        except AttributeError:
            # Unreadable file (cv2.imread returned None): skip it without
            # using up an output index
            continue
        destination_path = os.path.join(output_dir, str(saved) + '.jpg')
        resized_image = resize(cropped_image, res)
        # Write first, then report, so the log only lists files that exist
        cv2.imwrite(destination_path, resized_image)
        print(destination_path, 'saved')
        saved += 1
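
Because `os.listdir` returns entries in arbitrary order, selecting "the first" N files is only reproducible if the names are sorted first. The selection step can also be sketched on its own, here with an extra filter on file extension. The helper `select_image_names` and its extension list are illustrative assumptions and not part of the notebook.

```python
import os
import tempfile


def select_image_names(input_dir, num_images, extensions=('.jpg', '.jpeg', '.png')):
    """Return the first num_images image file names in sorted order."""
    names = sorted(
        name for name in os.listdir(input_dir)
        if name.lower().endswith(extensions)
    )
    if len(names) < num_images:
        raise ValueError("Not enough images in source directory")
    return names[:num_images]


# Demo on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ('b.jpg', 'a.jpg', 'notes.txt', 'c.JPG'):
        open(os.path.join(tmp, name), 'w').close()
    selected = select_image_names(tmp, 2)
    print(selected)  # ['a.jpg', 'b.jpg']
```

Note that a plain sort is lexicographic, so `10.jpg` comes before `2.jpg`; that is fine for reproducibility, but it is not numeric order.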


Let's call the function with the source directory, output directory, target resolution, and number of images, and watch the script work its magic!

In [74]:
generate_dataset('./cats_source', './cats_processed', (256,256), 4096)

./cats_processed\0.jpg saved
./cats_processed\1.jpg saved
./cats_processed\2.jpg saved
./cats_processed\3.jpg saved
./cats_processed\4.jpg saved
./cats_processed\5.jpg saved
./cats_processed\6.jpg saved
./cats_processed\7.jpg saved
./cats_processed\8.jpg saved
./cats_processed\9.jpg saved
./cats_processed\10.jpg saved
./cats_processed\11.jpg saved
./cats_processed\12.jpg saved
./cats_processed\13.jpg saved
./cats_processed\14.jpg saved
./cats_processed\15.jpg saved
./cats_processed\16.jpg saved
./cats_processed\17.jpg saved
./cats_processed\18.jpg saved
./cats_processed\19.jpg saved
./cats_processed\20.jpg saved
./cats_processed\21.jpg saved
./cats_processed\22.jpg saved
./cats_processed\23.jpg saved
./cats_processed\24.jpg saved
./cats_processed\25.jpg saved
./cats_processed\26.jpg saved
./cats_processed\27.jpg saved
./cats_processed\28.jpg saved
./cats_processed\29.jpg saved
./cats_processed\30.jpg saved
./cats_processed\31.jpg saved
./cats_processed\32.jpg saved
./cats_processed\33.jpg saved
./cats_processed\34.jpg saved
./cats_processed\35.jpg saved
./cats_processed\36.jpg saved
./cats_processed\37.jpg saved
./cats_processed\38.jpg saved
./cats_processed\39.jpg saved
./cats_processed\40.jpg saved
./cats_processed\41.jpg saved
./cats_processed\42.jpg saved
./cats_processed\43.jpg saved
./cats_processed\44.jpg saved
./cats_processed\45.jpg saved
./cats_processed\46.jpg saved
./cats_processed\47.jpg saved
./cats_processed\48.jpg saved
./cats_processed\49.jpg saved
./cats_processed\50.jpg saved
./cats_processed\51.jpg saved
./cats_processed\52.jpg saved
./cats_processed\53.jpg saved
./cats_processed\54.jpg saved
./cats_processed\55.jpg saved
./cats_processed\56.jpg saved
./cats_processed\57.jpg saved
./cats_processed\58.jpg saved
./cats_pro