# Reading and saving data for a classification model
In this notebook, we will first read images and their corresponding labels, and then bundle and save them in a single file on disk (or on Google Drive). Later, when we work with our models, we can load this one file instead of reading and resizing every image again.
We will use Python's `pickle` module to bundle the images into a single file. Most Python objects (lists, dicts, NumPy arrays, etc.) can be pickled and then saved to disk.
Pickling converts a Python object into a byte stream. This byte stream contains all the information needed to reconstruct the object in another Python script.
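As a minimal sketch of the round trip described above, the snippet below pickles a small dictionary into bytes with `pickle.dumps` and rebuilds it with `pickle.loads`. The dictionary contents are made-up example data.

```python
import pickle

# A small Python object standing in for a dataset (hypothetical example data)
record = {"labels": [0, 1, 1], "classes": ["cats", "dogs"]}

# Serialize the object into a byte stream ...
stream = pickle.dumps(record)
print(type(stream))  # <class 'bytes'>

# ... and reconstruct an equal object from those bytes
restored = pickle.loads(stream)
print(restored == record)  # True
```

`pickle.dump` and `pickle.load` do the same with an open file instead of a bytes object. Only unpickle files you trust, because loading a pickle can execute arbitrary code.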

# **Connect Colab to the Google Drive where your data are saved**

In [0]:
from google.colab import drive
drive.mount('/content/gdrive/')

# **Import the necessary libraries**

In [0]:
import numpy as np   # Package for scientific computing
import matplotlib.pyplot as plt # 2D plotting library
import os     # Interacting with the operating system (paths, directory listings)
# NOTE: this notebook uses cv2 (OpenCV) to read and resize images, as in the tutorials.
# If OpenCV causes dependency issues, imageio.imread and skimage.transform.resize
# (commented out below) can be used instead; skimage's resize takes (height, width)
# and returns float images.
#from imageio import imread
#from skimage.transform import resize
import cv2    # Computer vision library (image reading and resizing)
from tqdm import tqdm   # Progress bar library
import random  # Generating random numbers (used for shuffling)
import pickle # Serializing and de-serializing Python objects

# **Reading and saving data** 
What we need is a training data directory (and/or a validation data directory) containing one subdirectory per image class, filled with images. For example:

```
Animals/
    train/
        dogs/
            dog001.jpg
            dog002.jpg
            ...
        cats/
            cat001.jpg
            cat002.jpg
            ...
```
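To illustrate how a layout like this turns into labels, here is a sketch that builds a throwaway copy of the tree in a temporary folder (the file names are placeholders) and maps each class folder to an integer label. It assumes labels are assigned in sorted folder order, which is one reasonable convention rather than something the layout itself requires.

```python
import os
import tempfile

# Build a throwaway tree that mimics the Animals/train layout
# (class and file names are hypothetical placeholders)
root = tempfile.mkdtemp()
layout = {"dogs": ["dog001.jpg", "dog002.jpg"], "cats": ["cat001.jpg"]}
for class_name, files in layout.items():
    os.makedirs(os.path.join(root, class_name))
    for file_name in files:
        # create an empty placeholder file
        open(os.path.join(root, class_name, file_name), "wb").close()

# One subdirectory per class. os.listdir returns entries in arbitrary order,
# so sorting makes the class -> label mapping the same on every run.
categories = sorted(d for d in os.listdir(root)
                    if os.path.isdir(os.path.join(root, d)))
label_of = {name: idx for idx, name in enumerate(categories)}
print(label_of)  # {'cats': 0, 'dogs': 1}

# Pair every image path with the integer label of its class folder
samples = [(os.path.join(root, c, f), label_of[c])
           for c in categories
           for f in sorted(os.listdir(os.path.join(root, c)))]
print(len(samples))  # 3
```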



In [0]:
DATADIR = "/content/gdrive/My Drive/PathToYourDirectory"  # Replace "PathToYourDirectory" with the path to your image folder,
                                                          # e.g. "/content/gdrive/My Drive/Animals/train"

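# Class names = subdirectory names. Sorting keeps the class -> label mapping stable,
# and the isdir check skips stray files such as .DS_Store.
CATEGORIES = sorted(d for d in os.listdir(DATADIR) if os.path.isdir(os.path.join(DATADIR, d)))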
print(CATEGORIES)

In [0]:
training_data = []
IMG_SIZE_H = 128  # Desired image height in pixels (example value; set your own)
IMG_SIZE_W = 128  # Desired image width in pixels (example value; set your own)

def create_training_data():
    for category in CATEGORIES:  # loop over every class subdirectory

        path = os.path.join(DATADIR, category)  # path to this class's images
        class_num = CATEGORIES.index(category)  # integer label = position of the class in CATEGORIES

        for img in tqdm(os.listdir(path)):  # iterate over every image of this class
            img_array = cv2.imread(os.path.join(path, img))  # read the image as a BGR NumPy array
            if img_array is None:  # cv2.imread returns None for files it cannot read
                continue
            # cv2.resize expects the target size as (width, height)
            new_array = cv2.resize(img_array, (IMG_SIZE_W, IMG_SIZE_H))
            training_data.append([new_array, class_num])  # add the (image, label) pair to training_data

In [0]:
create_training_data()  # Calling the function for reading images and labels
print(len(training_data)) # Number of (image, label) pairs read
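The total count hides class imbalance. A small helper, hypothetically named `class_counts`, can report how many images each class contributed. It is shown here on toy labels and assumes the data are `[image, label]` pairs as built above.

```python
from collections import Counter

def class_counts(data, categories):
    """Return {class name: number of samples} for a list of [image, label] pairs."""
    counts = Counter(label for _, label in data)
    # Counter returns 0 for labels that never occur
    return {name: counts[idx] for idx, name in enumerate(categories)}

# Toy stand-in for training_data (images replaced by None; hypothetical labels)
toy_data = [[None, 0], [None, 1], [None, 1], [None, 0], [None, 1]]
print(class_counts(toy_data, ["cats", "dogs"]))  # {'cats': 2, 'dogs': 3}
```

In the notebook, `class_counts(training_data, CATEGORIES)` would give the per-class numbers for the real data.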

# **Preparing the data for deep learning**

In [0]:
random.shuffle(training_data)   # Shuffle the (image, label) pairs so the classes are mixed
X = []  # List of images
y = []  # List of labels

for features, label in training_data:   # Separate images and labels
    X.append(features)
    y.append(label)
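Shuffling the pairs before separating them is what keeps each image aligned with its label. The sketch below uses toy strings in place of images and, as an optional assumption not in the original, a seeded `random.Random` so the shuffle is reproducible; `zip(*pairs)` is an alternative to the explicit loop for splitting features and labels.

```python
import random

# Toy [features, label] pairs; strings stand in for image arrays (hypothetical data)
pairs = [["img_a", 0], ["img_b", 0], ["img_c", 1], ["img_d", 1]]

rng = random.Random(42)  # dedicated, seeded generator: same shuffle on every run
rng.shuffle(pairs)       # moves whole pairs, so each image keeps its own label

# zip(*pairs) transposes the list of pairs into a features tuple and a labels tuple
features_list, labels_list = map(list, zip(*pairs))
print(features_list, labels_list)
```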

In [0]:
X = np.array(X).reshape(-1, IMG_SIZE_H, IMG_SIZE_W, 3)  # Shape (samples, height, width, channels), the layout Keras expects
y = np.array(y)  # Labels as a NumPy array as well
print(X.shape) # Print the shape of the image array
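A quick way to see what the reshape does: when every image has the same height and width, `np.array` already stacks the list into a `(samples, height, width, channels)` array, so the reshape leaves the data unchanged. The sizes below are toy values.

```python
import numpy as np

IMG_H, IMG_W = 4, 6  # toy image size (hypothetical values)

# Three fake colour images, each (height, width, channels) like cv2.imread output;
# image i is filled with the value i so they can be told apart
images = [np.full((IMG_H, IMG_W, 3), i, dtype=np.uint8) for i in range(3)]

X_demo = np.array(images)  # stacking gives (samples, height, width, channels)
print(X_demo.shape)  # (3, 4, 6, 3)

# With equal-sized images the reshape changes neither the data nor the shape
reshaped = X_demo.reshape(-1, IMG_H, IMG_W, 3)
print(np.array_equal(reshaped, X_demo))  # True
```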

# **Visualisation and Saving**

In [0]:
# Plot 3 images in grayscale, showing only channel 0
# (cv2 loads images in BGR order, so channel 0 is the blue channel)
plt.subplot(131)
plt.imshow(X[0,:,:,0], cmap=plt.get_cmap('gray'))
plt.subplot(132)
plt.imshow(X[1,:,:,0], cmap=plt.get_cmap('gray'))
plt.subplot(133)
plt.imshow(X[2,:,:,0], cmap=plt.get_cmap('gray'))
# show the plot
plt.show()
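Because OpenCV loads images in BGR order, displaying `X[i]` directly with `plt.imshow` would swap the red and blue channels. Assuming you want colour previews instead of single-channel grayscale, reversing the channel axis converts BGR to RGB, as this small NumPy sketch shows on a one-pixel image.

```python
import numpy as np

# A 1x1 "image" in OpenCV's BGR order: pure blue (B=255, G=0, R=0)
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last (channel) axis turns BGR into RGB, the order plt.imshow expects
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # [0, 0, 255]
```

With the real data, `plt.imshow(X[0][..., ::-1])` would show the first image in its true colours; `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` does the same conversion for a single image.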

In [0]:
# Replace "PathToSaveTheFile" with the folder where the files should be saved,
# e.g. "/content/gdrive/My Drive/MyFolder"
with open("PathToSaveTheFile/X.pickle", "wb") as pickle_out:  # save the images ("wb" = write bytes)
    pickle.dump(X, pickle_out)

with open("PathToSaveTheFile/y.pickle", "wb") as pickle_out:  # save the labels
    pickle.dump(y, pickle_out)
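The saved files can be read back with `pickle.load` in a later notebook. This sketch writes toy versions of `X` and `y` to a temporary folder (standing in for `PathToSaveTheFile`) and loads them again to check the round trip; the data are random placeholders.

```python
import os
import pickle
import tempfile

import numpy as np

# Toy arrays standing in for X and y (hypothetical data)
X_small = np.random.randint(0, 256, size=(2, 4, 4, 3), dtype=np.uint8)
y_small = [0, 1]

folder = tempfile.mkdtemp()  # stands in for PathToSaveTheFile
for name, obj in (("X.pickle", X_small), ("y.pickle", y_small)):
    with open(os.path.join(folder, name), "wb") as f:  # "wb": pickle writes bytes
        pickle.dump(obj, f)

# Later, e.g. in the training notebook: read both files back ("rb" = read bytes)
with open(os.path.join(folder, "X.pickle"), "rb") as f:
    X_loaded = pickle.load(f)
with open(os.path.join(folder, "y.pickle"), "rb") as f:
    y_loaded = pickle.load(f)

print(np.array_equal(X_loaded, X_small), y_loaded == y_small)  # True True
```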