## Real dataset

So far we have only used datasets that were downloaded for us.
It is also useful to know how to collect a real dataset and load it with Keras.


In this notebook you will train a Keras model on a small dataset that you collected yourself!

Option 1: collect 10 examples of each class for the problem of your choice.

Option 2: team up, and have each team member upload 10 examples per class to a shared Google Drive.


Create one folder per class. 

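Example

If you want to classify cats and dogs, you need one folder of dog pictures and one folder of cat pictures, both inside the parent folder that you pass to Keras (`classification` in the loading code of this notebook):

dog/

cat/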
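Before loading the data, it can help to check that the folder layout is what you expect. The sketch below is only an illustration: the helper `count_images_per_class` and the placeholder files are hypothetical, and the parent folder name `classification` is taken from the loading code used later in this notebook.

```python
import tempfile
from pathlib import Path

# File extensions for the image formats Keras can read from a directory.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images_per_class(root):
    """Return {class_name: image_count} for each sub-folder of `root`."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts

# Demo on a throwaway folder filled with empty placeholder files (not real images).
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("cat", "dog"):
        class_dir = Path(tmp) / "classification" / class_name
        class_dir.mkdir(parents=True)
        for i in range(10):
            (class_dir / f"{class_name}_{i}.jpg").touch()
    demo_counts = count_images_per_class(Path(tmp) / "classification")

print(demo_counts)  # {'cat': 10, 'dog': 10}
```

On your own data, you would call `count_images_per_class('classification')` and check that every class has roughly the same number of images.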





Using the **image_dataset_from_directory** function of Keras, load your dataset into a variable named **train**.

Use the following parameters:
- labels='inferred'
- label_mode='categorical'
- image_size=(64,64)
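To make these parameters concrete: with `labels='inferred'` the class of each image is its folder name, class indices follow the alphanumeric order of the folder names, and `label_mode='categorical'` turns each index into a one-hot vector. The plain-Python sketch below mimics that mapping; `one_hot_labels` is a hypothetical helper, not a Keras function.

```python
def one_hot_labels(class_names, image_classes):
    """Map each image's folder name to a one-hot vector.

    Class indices follow the alphanumeric order of the folder names.
    """
    ordered = sorted(class_names)
    index_of = {name: i for i, name in enumerate(ordered)}
    vectors = []
    for name in image_classes:
        vector = [0.0] * len(ordered)
        vector[index_of[name]] = 1.0
        vectors.append(vector)
    return ordered, vectors

# Three images: one from dog/, two from cat/.
ordered, vectors = one_hot_labels(["dog", "cat"], ["dog", "cat", "cat"])
print(ordered)  # ['cat', 'dog']
print(vectors)  # [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
```

So for the cats and dogs example, a label of `[1., 0.]` means "cat" and `[0., 1.]` means "dog".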




In [None]:
import tensorflow

In [None]:

from tensorflow.keras.preprocessing import image_dataset_from_directory

# label_mode='categorical' one-hot encodes the labels, so the model uses the categorical_crossentropy loss (not sparse_categorical_crossentropy)
train = image_dataset_from_directory('classification', label_mode='categorical', labels='inferred', image_size=(64,64))
train

Found 20 files belonging to 2 classes.


<BatchDataset shapes: ((None, 64, 64, 3), (None, 2)), types: (tf.float32, tf.float32)>

In [None]:
X_train, y_train = next(iter(train))
X_train.shape, y_train.shape

(TensorShape([20, 64, 64, 3]), TensorShape([20, 2]))

Loop over the dataset and display the shape of each batch it yields.
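Iterating over `train` yields one `(images, labels)` pair per batch, so a loop such as `for images, labels in train: print(images.shape, labels.shape)` prints one line per batch. Since `image_dataset_from_directory` uses `batch_size=32` by default, the 20 images here fit in a single batch. The sketch below mimics that batching in plain Python; the `batches` helper is hypothetical and stands in for what the dataset does internally.

```python
def batches(items, batch_size=32):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

images = [f"image_{i}" for i in range(20)]

# Default batch size: all 20 images land in one batch.
sizes_default = [len(batch) for batch in batches(images)]
# Smaller batch size: the last batch is incomplete.
sizes_small = [len(batch) for batch in batches(images, batch_size=8)]

print(sizes_default)  # [20]
print(sizes_small)    # [8, 8, 4]
```

With a larger dataset, `next(iter(train))` would therefore return only the first 32 images, not the whole dataset.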

In [None]:
# The labels are one-hot encoded, so the number of classes is the width of the label matrix.
# (np.unique on one-hot labels always returns [0., 1.], whatever the number of classes.)
class_count = y_train.shape[-1]
class_count

2

In [None]:
image_height = X_train.shape[1]
image_width = X_train.shape[2]
color_count = X_train.shape[3]
image_height, image_width, color_count

(64, 64, 3)

Create a small convolutional model and train it on the dataset.

In [None]:
X_train = X_train / 255

In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D

In [None]:
model = Sequential()
model.add(Conv2D(32, activation='relu', kernel_size=[3,3], input_shape=[image_height, image_width, color_count]))
model.add(Flatten())
model.add(Dense(units=300, activation='relu'))
model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they form a probability distribution with sum(output_i) = 1
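It is worth checking how large this "small" model is. The sketch below works out the layer output sizes and parameter counts by hand, assuming the Keras defaults for `Conv2D` (stride 1, no padding) and the 64x64x3 inputs and 2 classes used in this notebook. The helper name is hypothetical.

```python
def conv2d_valid_output(size, kernel):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1

height = width = 64
channels, filters, kernel = 3, 32, 3
dense_units, class_count = 300, 2

out_h = conv2d_valid_output(height, kernel)   # 62
out_w = conv2d_valid_output(width, kernel)    # 62
flat = out_h * out_w * filters                # size of the Flatten output

# Each layer has one weight per input-output connection plus one bias per output.
conv_params = kernel * kernel * channels * filters + filters
dense_params = flat * dense_units + dense_units
output_params = dense_units * class_count + class_count
total_params = conv_params + dense_params + output_params

print(flat)          # 123008
print(total_params)  # 36904198
```

Almost all of the roughly 37 million parameters sit in the first `Dense` layer, which is a lot for 20 training images. `model.summary()` should report the same counts.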

In [None]:
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded
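The two losses compute the same quantity and differ only in how the label is given. The plain-Python sketch below illustrates this for a single example; it is a simplified version for intuition, not the Keras implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(z - highest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Loss when the label is a one-hot vector, e.g. [1.0, 0.0]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(class_index, probs):
    """Loss when the label is an integer class index, e.g. 0."""
    return -math.log(probs[class_index])

probs = softmax([2.0, 0.5])
loss_one_hot = categorical_crossentropy([1.0, 0.0], probs)
loss_sparse = sparse_categorical_crossentropy(0, probs)

print(sum(probs))                 # approximately 1.0
print(loss_one_hot, loss_sparse)  # the two values match
```

Because `label_mode='categorical'` produces one-hot labels, `categorical_crossentropy` is the matching choice here.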

In [None]:
X_train.shape, y_train.shape

(TensorShape([20, 64, 64, 3]), TensorShape([20, 2]))

In [None]:
model_history = model.fit(X_train, y_train, validation_split=0.3, epochs=10)
model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f42641bee50>
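With `validation_split=0.3`, Keras sets aside the last 30% of the arrays passed to `fit` (before any shuffling) and does not train on them. The sketch below shows what that means for 20 images. `split_last_fraction` is a hypothetical helper, and its rounding is an assumption that may differ from Keras by one sample in edge cases.

```python
def split_last_fraction(samples, validation_split):
    """Hold out the last `validation_split` fraction of `samples` for validation."""
    n_val = round(len(samples) * validation_split)
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]

train_part, val_part = split_last_fraction(list(range(20)), 0.3)

print(len(train_part), len(val_part))  # 14 6
print(val_part)                        # [14, 15, 16, 17, 18, 19]
```

With only 6 validation images, each image is worth about 17 points of validation accuracy, so the metric will jump around a lot from one epoch to the next.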

## Data augmentation 

We now want to apply data augmentation.

Data augmentation is a technique for artificially enlarging a dataset.

The idea behind image data augmentation is simple: apply transformations to each image to generate additional synthetic examples. For example, you can apply rotation, cropping, brightness changes, zooming, etc.
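As a toy illustration of the idea, the sketch below applies two simple transformations to a tiny grayscale "image" stored as a nested list. The helper functions are hypothetical and only meant to show how one example becomes several; in practice you would use the Keras layers.

```python
def change_brightness(image, delta):
    """Add `delta` to every pixel, keeping values in the 0..255 range."""
    return [[min(255, max(0, pixel + delta)) for pixel in row] for row in image]

def translate_right(image, shift, fill=0):
    """Shift every row `shift` pixels to the right, padding the left with `fill`."""
    width = len(image[0])
    shift = min(shift, width)
    return [[fill] * shift + row[:width - shift] for row in image]

image = [
    [10, 200, 30],
    [40, 50, 250],
]

brighter = change_brightness(image, 20)  # [[30, 220, 50], [60, 70, 255]]
shifted = translate_right(image, 1)      # [[0, 10, 200], [0, 40, 50]]

# One original example becomes three training examples with the same label.
augmented_examples = [image, brighter, shifted]
print(len(augmented_examples))  # 3
```

The label is unchanged by these transformations: a shifted or brighter cat is still a cat, which is what makes the synthetic examples usable for training.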

Keras offers several data augmentation techniques.

For images, they are listed here: https://keras.io/api/layers/preprocessing_layers/image_augmentation/








Build a model that includes the RandomRotation and RandomTranslation layers.


In [None]:
from tensorflow.keras.layers.experimental.preprocessing import RandomRotation, RandomTranslation

In [None]:
model = Sequential()
model.add(RandomRotation(factor=(-0.2, 0.3), input_shape=[image_height, image_width, color_count]))
model.add(RandomTranslation(height_factor=(-0.2, 0.3), width_factor=(0.2, 0.3)))
model.add(Conv2D(32, activation='relu', kernel_size=[3,3])) # input_shape is only needed on the first layer
model.add(Flatten())
model.add(Dense(units=300, activation='relu'))
model.add(Dense(units=class_count, activation='softmax')) # as many neurons as classes
# softmax normalizes the model's outputs so that they form a probability distribution with sum(output_i) = 1
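The `factor` arguments are fractions, not degrees or pixels. For `RandomRotation`, the factor is a fraction of a full turn (2π); for `RandomTranslation`, it is a fraction of the image height or width. The sketch below converts the values used above into degrees and pixels for 64x64 images; the two helper functions are hypothetical.

```python
def rotation_range_degrees(factor):
    """Convert a RandomRotation factor range (fractions of a full turn) to degrees."""
    low, high = factor
    return low * 360, high * 360

def translation_range_pixels(factor, size):
    """Convert a RandomTranslation factor range (fractions of the image size) to pixels."""
    low, high = factor
    return low * size, high * size

rotation = rotation_range_degrees((-0.2, 0.3))            # about -72 to 108 degrees
vertical = translation_range_pixels((-0.2, 0.3), 64)      # about -12.8 to 19.2 pixels
horizontal = translation_range_pixels((0.2, 0.3), 64)     # about 12.8 to 19.2 pixels

print(rotation, vertical, horizontal)
```

Note that `width_factor=(0.2, 0.3)` contains only positive values, so every horizontal shift goes in the same direction; use a range such as `(-0.2, 0.3)` if you want shifts both ways. These layers are only active during training and leave images unchanged at inference time.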

In [None]:
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy']) 
# categorical_crossentropy : used when class values are already one-hot-encoded
# sparse_categorical_crossentropy : used when class values are not already one-hot-encoded

Train the model again.

In [None]:
model_history = model.fit(X_train, y_train, validation_split=0.3, epochs=10)
model_history

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f42535725d0>
