# Training a Model with Your Own Data

**This lesson is adapted from [sentdex](https://www.youtube.com/watch?v=j-3vuBynnOE) on YouTube. I have collated this for my own learning experience, as well as for the benefit of others who would like to learn as well. :)**

First, you will have to find your own dataset to work with. There are many public datasets out there for you to choose from. 

For this notebook, we will be using photos of cats and dogs from the [Kaggle Cats and Dogs dataset](https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765).

You might see a few new modules in the imports below, namely os and cv2.

The os module lets us work with the file system (building paths and listing the files in a folder). The cv2 module (OpenCV) is used here to read the images, convert them to grayscale and resize them.

To install cv2, try

*pip install opencv-python*

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2

First, we will set the directory that holds the data, as well as the categories the images fall into.
*Note that this will depend on how your pet_images directory is laid out, but if you did not make any modifications to the downloaded files, it should be okay.*
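If you are unsure whether your folder layout matches, a quick check like the one below can help. This is only a sketch: the helper name count_images is made up for illustration, and the demo runs on a throwaway temporary directory rather than the real dataset.

```python
import os
import tempfile


def count_images(datadir, categories):
    """Return {category: number of files} for each category sub-folder."""
    counts = {}
    for category in categories:
        path = os.path.join(datadir, category)
        if not os.path.isdir(path):
            raise FileNotFoundError(f"Expected a folder at {path}")
        counts[category] = len(os.listdir(path))
    return counts


# Demo on a temporary directory laid out like DATADIR/Dog and DATADIR/Cat
with tempfile.TemporaryDirectory() as demo_dir:
    for category, n_files in (("Dog", 2), ("Cat", 3)):
        os.makedirs(os.path.join(demo_dir, category))
        for i in range(n_files):
            # Empty placeholder files stand in for real photos
            open(os.path.join(demo_dir, category, f"{i}.jpg"), "wb").close()
    demo_counts = count_images(demo_dir, ["Dog", "Cat"])
    print(demo_counts)
```

On your own machine you would call count_images(DATADIR, CATEGORIES) once those two names are defined.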

Then, we will use cv2 to load the images as grayscale. This reduces each image from three colour channels to one, so the model has less data to process. Moreover, for this case, colour is not essential for telling cats from dogs. However, do note that other comparisons might need colour as a feature for training.

In [None]:
DATADIR = "Cat_Dog_data"
CATEGORIES = ["Dog", "Cat"]

for category in CATEGORIES:
    path = os.path.join(DATADIR, category)
    for img in os.listdir(path):
        #input images into an array and convert images to gray-scale
        img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
        plt.imshow(img_array, cmap = "gray")
        break
    break

In [None]:
#take note of how the array changes with/without the grayscale
print(img_array)

print(img_array.shape)

If you look at the images in your pet_images folder, you will notice that the photos all have different dimensions. A model expects every input to have the same shape, so we will decide on one image size for all your photos and resize them to it.

In [None]:
IMG_SIZE = 50

new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
plt.imshow(new_array, cmap = 'gray')
plt.show()