# Introduction
<hr style = "border:2px solid black" ></hr>


**What?** The ETL (Extract, Transform, Load) process for managing data in TensorFlow



# What is an ETL pipeline?
<hr style = "border:2px solid black" ></hr>


- The **Extract phase** loads the raw data from wherever it is stored and prepares it so that it can be transformed.
- The **Transform phase** manipulates the data to make it suitable, or better suited, for training. Batching, image augmentation, mapping to feature columns, and similar logic applied to the data all belong to this phase.
- The **Load phase** feeds the data into the neural network for training.


- Tasks such as downloading data, unzipping it, and processing it record by record are generally done on the CPU.
- Training, however, benefits greatly from a GPU or TPU, so it makes sense to use one for the Load phase if possible.
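
The three phases can be sketched without TensorFlow as a chain of plain Python generators. This is only an illustration of the pattern: the function names `extract`, `transform`, and `load`, the fake 2x2 "images", and the batch size are all assumptions made for this sketch, not part of the notebook's pipeline.

```python
import random

def extract(num_records=8, seed=0):
    """Extract: yield raw records one at a time (here, fake 2x2 'images')."""
    rng = random.Random(seed)
    for i in range(num_records):
        pixels = [rng.randint(0, 255) for _ in range(4)]  # raw 0-255 values
        yield pixels, i % 2  # (image, label)

def transform(records, batch_size=4):
    """Transform: scale pixels to [0, 1] and group records into batches."""
    batch = []
    for pixels, label in records:
        batch.append(([p / 255 for p in pixels], label))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

def load(batches):
    """Load: hand each batch to the consumer (a stand-in for a training step)."""
    seen = 0
    for batch in batches:
        seen += len(batch)  # a real pipeline would run a training step here
    return seen

total = load(transform(extract()))
print(total)  # number of records that reached the Load phase
```

Because each stage is a generator, records flow through one at a time, which loosely mirrors how a streaming input pipeline avoids holding the whole dataset in memory.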
    


# Imports
<hr style = "border:2px solid black" ></hr>

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_addons as tfa

# Model definition
<hr style = "border:2px solid black" ></hr>

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu',
                           input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='Adam', loss='binary_crossentropy',
              metrics=['accuracy'])
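
To see what the `Flatten` layer receives, the feature-map size can be traced through the five convolution/pooling blocks by hand. The sketch below assumes the Keras defaults that the model above relies on (`padding='valid'` and stride 1 for `Conv2D`, and a pooling stride equal to the pool size for `MaxPooling2D`); the helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel=3):
    """Output width of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of max pooling whose stride equals the pool size."""
    return size // pool

size, channels = 300, 3          # input_shape=(300, 300, 3)
filters = [16, 32, 64, 64, 64]   # Conv2D filter counts, in order
total_params = 0

for i, f in enumerate(filters, start=1):
    # Each filter has 3*3*channels weights plus one bias.
    total_params += (3 * 3 * channels + 1) * f
    size = pool_out(conv_out(size))
    channels = f
    print(f"after block {i}: {size}x{size}x{channels}")

flat = size * size * channels          # length of the Flatten output
total_params += (flat + 1) * 512       # Dense(512)
total_params += (512 + 1) * 1          # Dense(1)
print(flat, total_params)
```

Under these assumptions the spatial size shrinks from 300 to 7, so `Flatten` produces 7 x 7 x 64 = 3136 values, and most of the model's parameters sit in the first `Dense` layer. Calling `model.summary()` in the notebook is the authoritative check.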

# Extract phase
<hr style = "border:2px solid black" ></hr>

In [None]:
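# Load the training and test splits as (image, label) pairs;
# TensorFlow Datasets downloads and caches the data on first use.
data = tfds.load('horses_or_humans', split='train', as_supervised=True)
val_data = tfds.load('horses_or_humans', split='test', as_supervised=True)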

# Transform phase
<hr style = "border:2px solid black" ></hr>

In [None]:
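import math

def augmentimages(image, label):
    # Convert uint8 pixels to floats in [0, 1].
    image = tf.cast(image, tf.float32)
    image = image / 255
    image = tf.image.random_flip_left_right(image)
    # tfa.image.rotate expects the angle in radians, so convert 40 degrees.
    image = tfa.image.rotate(image, math.radians(40), interpolation='NEAREST')
    return image, label

def normalize(image, label):
    # Validation images get the same scaling as training images, but no augmentation.
    return tf.cast(image, tf.float32) / 255, label

train = data.map(augmentimages)
train_batches = train.shuffle(100).batch(32)
validation_batches = val_data.map(normalize).batch(32)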
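
The `shuffle(100).batch(32)` chain above shuffles through a fixed-size buffer rather than over the whole dataset. The pure-Python sketch below imitates that documented behaviour on a small range of integers. It is not TensorFlow's implementation, and the names `buffered_shuffle` and `batch` are hypothetical helpers written for this illustration.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a randomised order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer

def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:  # final partial batch
        yield current

shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
batches = list(batch(shuffled, batch_size=3))
print(batches)
```

With a buffer of 4, the element emitted at position `k` can only come from the first `k + 4` inputs, so a small buffer gives only local shuffling. That is why a buffer size close to the dataset size is often suggested when memory allows.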

# Load phase
<hr style = "border:2px solid black" ></hr>

In [None]:
history = model.fit(train_batches, epochs=10,
                    validation_data=validation_batches, validation_steps=1)
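
Note that `validation_steps=1` means only one batch of validation data is evaluated at the end of each epoch. The arithmetic can be sketched as follows; the example counts (1000 training and 250 validation images) are placeholders chosen for illustration, not the actual dataset sizes, and both helper functions are hypothetical.

```python
import math

def batches_per_epoch(num_examples, batch_size):
    """Number of batches needed to cover the dataset once (last one may be partial)."""
    return math.ceil(num_examples / batch_size)

def examples_validated(num_examples, batch_size, validation_steps=None):
    """How many validation examples are seen per epoch.

    validation_steps=None means the validation dataset is run to exhaustion.
    """
    if validation_steps is None:
        return num_examples
    return min(validation_steps * batch_size, num_examples)

# Placeholder sizes, for illustration only.
print(batches_per_epoch(1000, 32))
print(examples_validated(250, 32, validation_steps=1))
print(examples_validated(250, 32))
```

Dropping `validation_steps` from the `model.fit` call would make Keras evaluate the full validation set each epoch, at the cost of a slightly longer epoch.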

# References
<hr style = "border:2px solid black" ></hr>


- [AI and Machine Learning for Coders, by Laurence Moroney](https://books.google.co.uk/books?hl=en&lr=&id=gw4CEAAAQBAJ&oi=fnd&pg=PR4&dq=AI+and+machine+learning+for+coders&ots=4NCFFOjg7s&sig=OTqLXnnKjFd4ZY_emAFZ3sTX2LE&redir_esc=y#v=onepage&q=AI%20and%20machine%20learning%20for%20coders&f=false)

