### TFData Loader Installation

Hello and welcome. Below is a short guide to installing and using my module for loading image data for image classification problems.

Run the cells below to install the module:

In [0]:
!git clone https://github.com/sebastian-sz/tfdata-image-loader.git

Cloning into 'tfdata-image-loader'...
remote: Enumerating objects: 32, done.
remote: Counting objects: 100% (32/32), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 32 (delta 0), reused 27 (delta 0), pack-reused 0
Unpacking objects: 100% (32/32), done.


In [0]:
!cd tfdata-image-loader; pip install -e .

Obtaining file:///content/tfdata-image-loader
Installing collected packages: tfdata-image-loader
  Running setup.py develop for tfdata-image-loader
Successfully installed tfdata-image-loader


After installing the module, please restart your runtime.
Alternatively, you can run the cell below, which kills the current process so that Colab starts a fresh runtime:

In [0]:
import os

def restart_runtime():
    # Send SIGKILL (signal 9) to the current process.
    os.kill(os.getpid(), 9)

restart_runtime()

Proceed with the standard Python imports:

In [0]:
%tensorflow_version 2.x

import os

import matplotlib.pyplot as plt
import tensorflow as tf

from tfdata_image_loader import TFDataImageLoader

print(tf.__version__)

2.2.0-rc2


#### (Optional) Run the tests
I used pytest to test the loader. The tests can be run by executing `pytest` in a terminal from the `tests` directory.

First, install the test dependencies:

In [0]:
!cd tfdata-image-loader/tests; pip install -r requirements.txt

Collecting pytest==5.4.1
  Downloading https://files.pythonhosted.org/packages/c7/e2/c19c667f42f72716a7d03e8dd4d6f63f47d39feadd44cc1ee7ca3089862c/pytest-5.4.1-py3-none-any.whl (246kB)
     |████████████████████████████████| 256kB 2.8MB/s
Collecting pluggy<1.0,>=0.12
  Downloading https://files.pythonhosted.org/packages/a0/28/85c7aa31b80d150b772fbe4a229487bc6644da9ccb7e427dd8cc60cb8a62/pluggy-0.13.1-py2.py3-none-any.whl
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
Installing collected packages: pluggy, pytest
  Found existing installation: pluggy 0.7.1
    Uninstalling pluggy-0.7.1:
      Successfully uninstalled pluggy-0.7.1
  Found existing installation: pytest 3.6.4
    Uninstalling pytest-3.6.4:
      Successfully uninstalled pytest-3.6.4
Successfully installed pluggy-0.13.1 pytest-5.4.1


In [0]:
!cd tfdata-image-loader/tests; pytest

platform linux -- Python 3.6.9, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /content/tfdata-image-loader
plugins: typeguard-2.7.1
collecting ... collected 15 items

test_image_data_loader.py ...............                                [100%]

test_image_data_loader.py:124
    assert (

test_image_data_loader.py:90
    assert (



### Download an example dataset

In this section we download an example dataset: the flower photos archive hosted by TensorFlow.

In [0]:
!curl https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz | tar xz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  218M  100  218M    0     0  78.9M      0  0:00:02  0:00:02 --:--:-- 78.9M


Remove the license file so that only the class directories remain in the dataset root:

In [0]:
!rm flower_photos/LICENSE.txt

Preview the class names:

In [0]:
!ls flower_photos

daisy  dandelion  roses  sunflowers  tulips
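
Judging by the output further down, the loader appears to treat every subdirectory of the dataset root as one class, which is why the stray `LICENSE.txt` had to go. Below is a minimal sketch for checking a dataset root before loading it. `list_class_dirs` is a hypothetical helper written for this guide, not part of `tfdata-image-loader`.

```python
from pathlib import Path


def list_class_dirs(data_path):
    """Return (class_names, stray_files) found directly under data_path."""
    root = Path(data_path)
    entries = sorted(root.iterdir())
    # Subdirectories are candidate classes; plain files are likely leftovers.
    class_names = [entry.name for entry in entries if entry.is_dir()]
    stray_files = [entry.name for entry in entries if entry.is_file()]
    return class_names, stray_files


# Only runs if the example dataset has already been downloaded.
if Path("./flower_photos").is_dir():
    classes, stray = list_class_dirs("./flower_photos")
    print("classes:", classes)
    print("stray files:", stray)
```

If `stray files` is not empty, remove or move those files before creating the loader.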


### Load the data using our loader

In [0]:
DATA_PATH = "./flower_photos"
BATCH_SIZE = 32
TARGET_SIZE = (224, 224)


def preprocess_data(image, label):
    return (image / 127.5) - 1, label


def augment_data(image, label):
    return tf.image.random_flip_left_right(image), label
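
Assuming the loader yields pixel values in the [0, 255] range, `preprocess_data` rescales them to [-1, 1], which is the input range MobileNetV2 expects. The arithmetic can be checked on plain Python numbers, without TensorFlow. `rescale` below is an illustrative stand-in for the image part of `preprocess_data`, not a function from the module.

```python
def rescale(pixel):
    # Same arithmetic as preprocess_data, applied to a single value.
    return (pixel / 127.5) - 1


for value in (0, 127.5, 255):
    print(value, "->", rescale(value))
```

The darkest pixel maps to -1.0, the midpoint to 0.0, and the brightest pixel to 1.0.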

In [0]:
data_loader = TFDataImageLoader(
    data_path=DATA_PATH,
    target_size=TARGET_SIZE,
    batch_size=BATCH_SIZE,
    pre_process_function=preprocess_data,
    augmentation_function=augment_data,
)

Found 3670 images, belonging to 5 classes

Class names mapping: 
{'daisy': array([1, 0, 0, 0, 0], dtype=int32), 'dandelion': array([0, 1, 0, 0, 0], dtype=int32), 'roses': array([0, 0, 1, 0, 0], dtype=int32), 'sunflowers': array([0, 0, 0, 1, 0], dtype=int32), 'tulips': array([0, 0, 0, 0, 1], dtype=int32)}
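
The mapping above is a one-hot encoding over the alphabetically sorted class names. The sketch below shows how such a mapping can be built with plain Python lists. It only illustrates the idea and is not necessarily how `TFDataImageLoader` builds it internally; `one_hot_mapping` is a hypothetical name.

```python
def one_hot_mapping(class_names):
    """Map each class name to a one-hot list, in sorted name order."""
    names = sorted(class_names)
    mapping = {}
    for index, name in enumerate(names):
        vector = [0] * len(names)
        vector[index] = 1  # a single 1 at this class's position
        mapping[name] = vector
    return mapping


mapping = one_hot_mapping(["tulips", "daisy", "roses", "dandelion", "sunflowers"])
for name, vector in mapping.items():
    print(name, vector)
```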



In [0]:
dataset = data_loader.load_dataset()

In [0]:
for image_batch, label_batch in dataset.take(1):
    print(image_batch.shape)
    print(label_batch.shape)

(32, 224, 224, 3)
(32, 5)
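
With 3670 images and a batch size of 32, the number of batches per epoch follows from simple arithmetic. The sketch assumes that the loader goes over every image once per epoch and keeps the final, smaller batch; I have not verified either assumption against the loader's code.

```python
import math

NUM_IMAGES = 3670  # from the "Found 3670 images" message above
BATCH_SIZE = 32

steps_per_epoch = math.ceil(NUM_IMAGES / BATCH_SIZE)
last_batch_size = NUM_IMAGES - (steps_per_epoch - 1) * BATCH_SIZE

print("batches per epoch:", steps_per_epoch)
print("size of the last batch:", last_batch_size)
```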


### Train a custom model
We can use the loaded data to train a model:

In [0]:
def make_model(num_classes):
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=(TARGET_SIZE[0], TARGET_SIZE[1], 3),
        include_top=False,
        pooling="avg",
    )

    # Freeze the pretrained backbone so that only the new classification head is trained.
    base_model.trainable = False

    return tf.keras.Sequential([
        base_model,
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(num_classes, activation="softmax")
    ])

In [0]:
num_classes = len(os.listdir(DATA_PATH))

model = make_model(num_classes=num_classes)
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
    metrics=['accuracy']
)

model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
mobilenetv2_1.00_224 (Model) (None, 1280)              2257984   
_________________________________________________________________
dropout (Dropout)            (None, 1280)              0         
_________________________________________________________________
dense (Dense)                (None, 5)                 6405      
Total params: 2,264,389
Trainable params: 6,405
Non-trainable params: 2,257,984
_________________________________________________________________
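
The parameter counts in the summary can be checked by hand. The dense layer has one weight per (feature, class) pair plus one bias per class, and because the base model is frozen, those are the only trainable parameters. The constants below are copied from the summary above.

```python
FEATURES = 1280          # output size of the pooled MobileNetV2 backbone
NUM_CLASSES = 5
BASE_PARAMS = 2_257_984  # frozen backbone parameters

dense_params = FEATURES * NUM_CLASSES + NUM_CLASSES  # weights + biases
total_params = BASE_PARAMS + dense_params

print("trainable (dense) params:", dense_params)
print("total params:", total_params)
```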


In [0]:
model.fit(
    dataset,
    epochs=1,
)



<tensorflow.python.keras.callbacks.History at 0x7fea82d4f4e0>

### Using your own data

To use your own data, you can either:
1. Install `tfdata-image-loader` locally.
2. Connect your Google Drive to the Colab notebook and pass a path on Google Drive as the data path. For example:
```
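from google.colab import drive
from tfdata_image_loader import TFDataImageLoader

# Mount Google Drive at the conventional Colab mount point (must be an empty or new directory).
drive.mount('/content/drive')
data_path = "/content/drive/My Drive/data/train/..."
train_loader = TFDataImageLoader(
    data_path=data_path,
    # (...) remaining arguments, as in the example above
)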
```
You can also temporarily copy the data from Google Drive to the Colab runtime's local disk.
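
A minimal sketch of such a copy, using only the standard library. `copy_dataset` is a hypothetical helper, and the paths in the commented usage line are placeholders that you need to adapt to your own Drive layout.

```python
import shutil
from pathlib import Path


def copy_dataset(source_dir, target_dir):
    """Copy a dataset directory tree once; skip the copy if the target exists."""
    source = Path(source_dir)
    target = Path(target_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Source directory not found: {source}")
    if target.exists():
        # Already copied during this session, nothing to do.
        return target
    shutil.copytree(source, target)
    return target


# Hypothetical usage in Colab, after mounting Drive:
# local_path = copy_dataset("/content/drive/My Drive/data/train", "/content/train")
```

The returned path can then be passed to `TFDataImageLoader` as the data path.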