# 1. Install Dependencies and Setup

**1.1 Let's set up Kaggle**

1. Visit https://www.kaggle.com/
2. Create an account
3. Go to 'Settings'
4. Under the 'Account' tab, select 'Create New Token'
5. Kaggle will download a `kaggle.json` file.
6. Move this file from 'Downloads' to `~/.kaggle` (Kaggle looks for it at `~/.kaggle/kaggle.json`)

The `kaggle` command-line tool (from the `kaggle` Python package) will now be able to authenticate with your account.
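
If you prefer to do step 6 from Python, here is a minimal sketch. `install_kaggle_token` is a hypothetical helper, and the `~/Downloads/kaggle.json` location in the usage comment is an assumption about where your browser saved the token. It also limits the file to owner read/write, since the Kaggle CLI warns when the key is readable by other users.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src, kaggle_dir=Path.home() / ".kaggle"):
    """Move a downloaded kaggle.json into kaggle_dir and restrict its permissions."""
    kaggle_dir = Path(kaggle_dir)
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # create ~/.kaggle if missing
    dest = kaggle_dir / "kaggle.json"
    shutil.move(str(src), str(dest))
    # Owner read/write only (same as `chmod 600`); on Windows this only
    # toggles the read-only flag, so the file simply stays writable
    os.chmod(dest, 0o600)
    return dest


# Usage, assuming the browser saved the token to ~/Downloads:
# install_kaggle_token(Path.home() / "Downloads" / "kaggle.json")
```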

**1.2 Now, set up the Python environment**

1. Install Python 3 if you don't already have it
2. Run: `python3 -m venv env`
3. Run: `source env/bin/activate` (on Windows: `env\Scripts\activate`)

You should now be running inside a Python virtual environment.

**1.3 Once in the virtual environment, install the dependencies and fetch the dataset!**

1. Run: `pip install -r requirements.txt`
2. Run: `kaggle datasets download -d moltean/fruits`
3. Run: `unzip fruits.zip`

**You should now have both the resized 100x100 dataset and the original-size dataset as directories in your working directory.**

**Note: it makes sense to use only the 100x100 dataset, to reduce training time.**

**It also has many more classes than the full-size set.**

If you have all the requirements installed, you should be able to run the following cells.

Note: The first time you run a cell in Visual Studio Code, it may ask you which Python environment (kernel) to use. Select the `env` environment you created in step 1.2.

In [1]:
import tensorflow as tf
import os

In [2]:
# Avoid out-of-memory (OOM) errors: by default TensorFlow reserves nearly all
# GPU memory as soon as it uses a GPU. Memory growth makes it allocate only what it needs.
# If you have no GPU, the loop below simply does nothing.

## IMPORTANT: If you are on Windows, uncomment this line (it hides all GPUs, ##
## so TensorFlow runs on the CPU): ##
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

In [3]:
# Shows us which GPUs our system has access to. It's okay if you don't have any.
tf.config.list_physical_devices('GPU')

[]

# 2. Check out our training set

In [4]:
import cv2
from PIL import Image

In [9]:
# TODO: Change data_dir to the path of your training set directory
# You should see a list of classes (fruit names)
data_dir = 'fruits-360_dataset_100x100/fruits-360/Training'
os.listdir(data_dir)

['Tomato 4',
 'Corn Husk 1',
 'Huckleberry 1',
 'Tomato 3',
 'Strawberry Wedge 1',
 'Physalis 1',
 'Pineapple 1',
 'Avocado 1',
 'Pear Kaiser 1',
 'Grape Blue 1',
 'Cactus fruit 1',
 'Apple Granny Smith 1',
 'Cherry 1',
 'Tomato 2',
 'Grapefruit Pink 1',
 'Melon Piel de Sapo 1',
 'Eggplant long 1',
 'Grape White 1',
 'Carrot 1',
 'Redcurrant 1',
 'Pear Stone 1',
 'Maracuja 1',
 'Nut Pecan 1',
 'Quince 1',
 'Nut Forest 1',
 'Cocos 1',
 'Grapefruit White 1',
 'Raspberry 1',
 'Apple Braeburn 1',
 'Tamarillo 1',
 'Banana Lady Finger 1',
 'Hazelnut 1',
 'Cabbage white 1',
 'Mandarine 1',
 'Kumquats 1',
 'Apricot 1',
 'Banana Red 1',
 'Papaya 1',
 'Mangostan 1',
 'Apple Golden 2',
 'Carambula 1',
 'Peach Flat 1',
 'Apple Red 1',
 'Peach 1',
 'Cherry Wax Yellow 1',
 'Granadilla 1',
 'Avocado ripe 1',
 'Pomegranate 1',
 'Mulberry 1',
 'Mango 1',
 'Apple Golden 3',
 'Cucumber 3',
 'Zucchini 1',
 'Walnut 1',
 'Tomato Heart 1',
 'Cherry Rainier 1',
 'Apple Red Yellow 1',
 'Plum 1',
 'Nectarine Fl... (output truncated)
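
The listing above shows only class names. A small standard-library sketch that also counts files per class can confirm the download and show whether the classes are balanced, which matters when we carve out a validation set. `count_images_per_class` is a hypothetical helper, and it assumes every file inside a class folder is an image.

```python
import os


def count_images_per_class(data_dir):
    """Map each class subdirectory of data_dir to the number of files it contains."""
    counts = {}
    for class_name in sorted(os.listdir(data_dir)):
        class_path = os.path.join(data_dir, class_name)
        if os.path.isdir(class_path):  # skip stray files such as .DS_Store
            counts[class_name] = len(os.listdir(class_path))
    return counts


# Usage (same path as data_dir above):
# counts = count_images_per_class('fruits-360_dataset_100x100/fruits-360/Training')
# print(len(counts), "classes,", sum(counts.values()), "images")
```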

# 3. Load Data

In [10]:
import numpy as np
from matplotlib import pyplot as plt

In [None]:
# Build a dataset of (image, label) batches; each subdirectory is one class
# TODO: Keep image_size at (100, 100); the default is (256, 256)
data = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    image_size=(100, 100))

Found 70491 files belonging to 141 classes.


In [12]:
# Create an iterator over the dataset; each call to .next() returns the next batch
data_iterator = data.as_numpy_iterator()

In [13]:
# Each batch is (images, labels): 32 images of 100x100 pixels with 3 channels (R, G, B)
batch = data_iterator.next()
batch[0].shape

(32, 100, 100, 3)

# 4. Scale Data

1. Our TensorFlow model trains best on small input values, ideally between 0 and 1.
2. Our images give us R, G, B pixel values from 0 to 255.

Thus, we need to scale our input data down by dividing every pixel value by 255.
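
Here is the scaling arithmetic on a made-up NumPy array rather than on the dataset itself, so the TODO below is still yours to complete. `image_dataset_from_directory` yields `float32` images, so the toy array uses `float32` too.

```python
import numpy as np

# A tiny 1x2 "image" with 3 channels, using the same 0-255 range as real pixels
pixels = np.array([[[0, 128, 255], [64, 32, 16]]], dtype=np.float32)

# Dividing by 255 maps 0 -> 0.0 and 255 -> 1.0; every other value lands in between
scaled = pixels / 255.0

print(scaled.shape)                # (1, 2, 3): the shape is unchanged
print(scaled.min(), scaled.max())  # 0.0 1.0
```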

In [None]:
# Hint: Pixel values range from 0-255. We want to scale x to the range 0-1.
# x is a batch of images and y is the matching batch of class labels, so leave y unchanged

# TODO: Uncomment + complete the following statement:
# data = data.map(lambda x,y: (x, y))

In [None]:
# This will now give us an iterator with our SCALED data!
scaled_iterator = data.as_numpy_iterator()
batch = scaled_iterator.next()


In [None]:
# Once the previous TODOs are complete, you should see four 100x100 images here (of fruits, hopefully),
# each titled with its integer class label
fig, ax = plt.subplots(ncols=4, figsize=(20,20))
for idx, img in enumerate(batch[0][:4]):
    ax[idx].imshow(img)
    ax[idx].title.set_text(batch[1][idx])

# 5. Include Test Data

1. Now it's your turn: repeat the loading and scaling steps above for the testing directory, and store the result in a variable such as `test_data` (the splitting code below uses that name).
**Note: you only need to copy and adapt some of the lines above.**

**5.1 Create Validation Set**

Now we're going to do something a little unusual.

Usually, we would have a training set, a validation set, and a testing set. The 100x100 fruits dataset, however, comes without a validation set.

To fix this, we will create our validation set by taking some images from our training set.

However, this can cause problems: depending on how the images are ordered, the training split may leave out some of our classes (fruits) entirely. `image_dataset_from_directory` shuffles the files by default (`shuffle=True`), which makes this unlikely, but nothing guarantees that every class ends up in both splits.

As an exercise, think about what you could do to solve this.
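
One possible approach, if you want to check your thinking, is a *stratified* split: split each class separately, so every class contributes images to both sets. The sketch below works on plain `(file_path, label)` pairs; `stratified_split` is a hypothetical helper, and turning its output into TensorFlow datasets is not shown. As a simpler alternative, `image_dataset_from_directory` also accepts `validation_split`, `subset`, and `seed` arguments, though that splits the shuffled file list as a whole rather than class by class.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.3, seed=42):
    """Split (file_path, label) pairs per class so each class appears in both sets.

    Assumes every class has at least 2 samples.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed -> the same split every run
    train_files, val_files = [], []
    for label, paths in by_class.items():
        paths = list(paths)  # copy before shuffling
        rng.shuffle(paths)
        # Keep at least one image of this class in each split
        n_val = min(max(1, round(len(paths) * val_fraction)), len(paths) - 1)
        val_files.extend((p, label) for p in paths[:n_val])
        train_files.extend((p, label) for p in paths[n_val:])
    return train_files, val_files


# Hypothetical usage: 10 fake file names for each of 3 classes
samples = [(f"{fruit}_{i}.jpg", fruit) for fruit in ("Apple", "Banana", "Cherry") for i in range(10)]
train_files, val_files = stratified_split(samples)
print(len(train_files), len(val_files))  # 21 9
```

Each resulting list could then be loaded into a dataset (for example with `tf.data.Dataset.from_tensor_slices`), keeping the per-class guarantee.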

In [None]:
# len(data) counts batches of 32 images, not individual images
train_size = int(len(data) * 0.7)
val_size = int(len(data) * 0.3)
# Leave our test data alone

# TODO: Replace test_data with the name of your test dataset
test_size = int(len(test_data))

# TODO: Make sure train_size + val_size adds up to len(data);
# the test set is counted separately
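
Because `int()` rounds down, the two sizes can fall short of `len(data)`. The arithmetic below assumes the numbers from this notebook: 70,491 training images and the default batch size of 32, which gives 2,203 batches (the partial last batch still counts). The names `n_train` and `n_val` are chosen so they don't overwrite the variables in the cell above.

```python
import math

num_images = 70491  # from "Found 70491 files belonging to 141 classes."
batch_size = 32     # default batch_size of image_dataset_from_directory
num_batches = math.ceil(num_images / batch_size)  # the value len(data) reports

n_train = int(num_batches * 0.7)
n_val = int(num_batches * 0.3)
print(num_batches, n_train, n_val, n_train + n_val)  # 2203 1542 660 2202

# One fix: give the validation set every batch the training set did not take
n_val = num_batches - n_train
print(n_train + n_val)  # 2203
```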

In [None]:
# take() keeps the first train_size batches and skip() drops them,
# so the training and validation sets never share a batch

train = data.take(train_size)
val = data.skip(train_size).take(val_size)
test = test_data.take(test_size)

# 6. Build Deep Learning Model
(We will go into this in more detail in the second week.)

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, GlobalAveragePooling2D, BatchNormalization
from tensorflow.keras.regularizers import l2

In [None]:
model = Sequential()

In [None]:
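# TODO: Add your layers here. Begin with tf.keras.Input(shape=(100, 100, 3)) and end with
# Dense(141, activation='softmax'): one output per class, as the loss below expects.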
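
When choosing layers, it helps to track how each one shrinks the 100x100 input. For `Conv2D` and `MaxPooling2D` with the default `padding='valid'`, the output size along each axis is floor((size - kernel) / stride) + 1; with `padding='same'` it is ceil(size / stride). The stack below (two 3x3 convolutions, each followed by 2x2 max pooling) is a hypothetical example, not the intended solution.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output size along one spatial axis for Conv2D / MaxPooling2D."""
    if padding == "same":
        return math.ceil(size / stride)  # "same": only the stride shrinks the output
    return (size - kernel) // stride + 1  # "valid": no padding


# Hypothetical layer stack: (name, kernel size, stride)
example_stack = [("Conv2D 3x3", 3, 1), ("MaxPooling2D 2x2", 2, 2),
                 ("Conv2D 3x3", 3, 1), ("MaxPooling2D 2x2", 2, 2)]

size = 100  # our images are 100x100
for name, kernel, stride in example_stack:
    size = conv_output_size(size, kernel, stride)
    print(f"after {name}: {size}x{size}")
# prints 98x98, 49x49, 47x47, then 23x23
```

A `GlobalAveragePooling2D` layer placed after this stack would average each 23x23 feature map down to one number per channel before the final `Dense` layer.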

In [None]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
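
`sparse_categorical_crossentropy` expects each label to be an integer class index, which is what `image_dataset_from_directory` produces by default (`label_mode='int'`). For one image, the loss is the negative log of the probability the model assigns to the true class, and Keras then averages over the batch. The plain-Python sketch below shows that per-example calculation, ignoring the small clipping Keras applies for numerical stability; `per_example_sparse_ce` is a hypothetical name and the probabilities are made up.

```python
import math


def per_example_sparse_ce(label, probs):
    """Loss for one example: -log(probability assigned to the true class index)."""
    return -math.log(probs[label])


probs = [0.7, 0.2, 0.1]  # made-up softmax output over 3 classes
print(per_example_sparse_ce(0, probs))  # ~0.357: confident and correct
print(per_example_sparse_ce(2, probs))  # ~2.303: the true class got only 0.1
```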


In [None]:
model.summary()

# 7. Train

In [None]:
# TensorBoard writes its logs here; view them with: tensorboard --logdir logs
logdir='logs'

In [None]:
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)

In [None]:
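# TODO: Complete the arguments for model.fit(): the training data, the number of epochs,
# validation_data=val, and callbacks=[tensorboard_callback]
hist = model.fit()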
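
`model.fit()` returns a `History` object whose `history` attribute is a dict mapping each metric name to a list with one value per epoch. With `metrics=['accuracy']` and `validation_data` set, the keys include `loss`, `accuracy`, `val_loss`, and `val_accuracy`. The hypothetical helper below picks the epoch with the highest validation accuracy; the example values are invented for illustration.

```python
def best_epoch(history, metric="val_accuracy"):
    """Return (1-based epoch, value) for the epoch with the highest `metric`."""
    values = history[metric]
    best_index = max(range(len(values)), key=lambda i: values[i])
    return best_index + 1, values[best_index]


# Made-up numbers standing in for hist.history after 3 epochs
example_history = {"loss": [1.9, 0.8, 0.6], "val_accuracy": [0.55, 0.81, 0.78]}
print(best_epoch(example_history))  # (2, 0.81)

# After training: best_epoch(hist.history)
```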