# Importing the necessary libraries

**TensorFlow** is an end-to-end open source platform for machine learning.<br>
To install TensorFlow, open the Anaconda Prompt, activate the correct environment, then run `conda install tensorflow`.

**Keras** is a deep learning API written in Python, running on top of the machine learning platform TensorFlow.

**NumPy** is an open source project aiming to enable numerical computing with Python.

**Matplotlib** is a comprehensive library for creating static, animated, and interactive visualizations in Python.

**pathlib** is a Python standard-library module for working with file system paths.

**tensorflow_datasets** is a collection of ready-to-use datasets.

**pandas** is a library that provides easy-to-use data structures and data analysis tools for Python.

In [None]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import pathlib
import tensorflow_datasets as tfds
import pandas as pd

# The Data

### Images

In [None]:
data_dir = pathlib.Path("flower_photos")

if not data_dir.is_dir():
    dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
    data_dir = tf.keras.utils.get_file(origin=dataset_url,
                                   fname='flower_photos',
                                   untar=True)
    data_dir = pathlib.Path(data_dir)

print(data_dir.resolve())

batch_size = 32
img_height = 180
img_width = 180
validation_split = 0.2
seed = 404 # Where did it go?

In [None]:
# returns a tf.data.Dataset
train_dataset = tf.keras.utils.image_dataset_from_directory(
    directory=data_dir, # no default value - the directory containing the data
    labels='inferred', # default value - uses names of folders containing the images
    label_mode='int', # default value - other options are 'categorical' and 'binary'
    class_names=None, # default value - changes the order of the classes when specified
    color_mode='rgb', # default value - changes the amount of channels used from images ('grayscale', 'rgb', or 'rgba')
    batch_size=batch_size, # default is 32 - size of the batches of data
    image_size=(img_height, img_width), # default is 256, 256 - resize all images to same dimensions
    shuffle=True, # default value - shuffles data, otherwise keeps data in alphanumeric order
    seed=seed, # default is None - the shuffle seed
    validation_split=validation_split, # default is None - fraction of the data (between 0 and 1) reserved for validation
    subset="training", # default is None - when splitting the data, which subset to return ("training" or "validation")
    interpolation='bilinear', # default value - The method used to resize images. Also supports "nearest", "bicubic", "area", "lanczos3", "lanczos5", "gaussian", "mitchellcubic".
    follow_links=False, # default value - Whether it visits subdirectories pointed to by symlinks (aka Symbolic Links).
    crop_to_aspect_ratio=False # default value - When true, resizes the images without aspect ratio distortion
)
# see https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory
# for more details on the preceding function call

In [None]:
validation_dataset = tf.keras.utils.image_dataset_from_directory(
    directory=data_dir,
    batch_size=batch_size,
    image_size=(img_height, img_width),
    seed=seed,
    validation_split=validation_split,
    subset="validation"
)
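
Both calls pass the same `seed` and `validation_split`, which is what keeps the training and validation subsets from overlapping. The sketch below illustrates that idea in plain Python. It is a conceptual model only, not Keras's actual implementation, and `split_files` and the file names are hypothetical.

```python
import random


def split_files(file_names, validation_split, seed):
    """Shuffle deterministically, then reserve the tail for validation."""
    shuffled = sorted(file_names)          # start from a stable order
    random.Random(seed).shuffle(shuffled)  # same seed -> same shuffle
    cut = len(shuffled) - int(validation_split * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


files = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names
train_files, val_files = split_files(files, validation_split=0.2, seed=404)

print(len(train_files), len(val_files))
```

With a different seed in the second call, the two shuffles would disagree and some images could end up in both subsets.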

In [None]:
class_names = train_dataset.class_names
print(class_names)
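
With `labels='inferred'`, the class names come from the sub-folder names, and each integer label is the position of its folder in the sorted list. A minimal sketch of that mapping, assuming a hypothetical helper `infer_class_names` and made-up folder names (this is an illustration, not the library's code):

```python
import pathlib
import tempfile


def infer_class_names(root):
    """Return sub-folder names in sorted order; the position is the label."""
    return sorted(p.name for p in pathlib.Path(root).iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    for folder in ["tulips", "daisy", "roses"]:  # hypothetical class folders
        (pathlib.Path(tmp) / folder).mkdir()
    demo_class_names = infer_class_names(tmp)

# integer label assigned to each class name
label_of = {name: index for index, name in enumerate(demo_class_names)}
print(demo_class_names, label_of)
```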

In [None]:
plt.figure(figsize=(10, 10))
for images, labels in train_dataset.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")

### The following cell of code is for performance

Straight from the [tutorial](https://www.tensorflow.org/tutorials/load_data/images) because I have no idea how I would better explain it myself:

>Let's make sure to use buffered prefetching so you can yield data from disk without having I/O become blocking. These are two important methods you should use when loading data:<br>
>* Dataset.cache keeps the images in memory after they're loaded off disk during the first epoch. This will ensure the dataset does not become a bottleneck while training your model. If your dataset is too large to fit into memory, you can also use this method to create a performant on-disk cache.<br>
>* Dataset.prefetch overlaps data preprocessing and model execution while training.
>
>Interested readers can learn more about both methods, as well as how to cache data to disk in the Prefetching section of the [Better performance with the tf.data API](https://github.com/tensorflow/docs/blob/1054aa3c4bc8379799fe84b32f5b8ef31ddfa61f/site/en/guide/data_performance.ipynb) guide.

In [None]:
AUTOTUNE = tf.data.AUTOTUNE

train_dataset = train_dataset.cache().prefetch(buffer_size=AUTOTUNE)
validation_dataset = validation_dataset.cache().prefetch(buffer_size=AUTOTUNE)
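
To get a feel for what prefetching means, here is a plain-Python analogy: a background thread keeps loading the next items into a small buffer while the consumer works on the current one. This is only a sketch of the idea, not how `tf.data` is implemented, and the `prefetch` function below is hypothetical.

```python
import queue
import threading


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable` while a background thread loads the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the data

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()   # blocks until the producer has an item ready
        if item is sentinel:
            return
        yield item


batches = list(prefetch(range(5)))
print(batches)
```

`tf.data.AUTOTUNE` plays the role of `buffer_size` here, except that TensorFlow picks the value for you at runtime.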

# The Model

### Create the model

Want a list of types of layers?<br>
Or maybe a list of different activation functions?<br>
See https://keras.io/api/layers/ or https://www.tensorflow.org/api_docs/python/tf/keras/layers for the official documentation on these cool concepts

#### What is data augmentation?
Data augmentation is
>a technique to increase the diversity of your training set by applying random (but realistic) transformations, such as image rotation. 
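
As a concrete example, a horizontal flip simply reverses the width axis of the image array. The NumPy sketch below uses a tiny made-up "image". A layer such as `RandomFlip` applies this kind of transformation at random during training.

```python
import numpy as np

# A hypothetical 2x3 RGB "image" whose pixel values encode their position.
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)

flipped = image[:, ::-1, :]  # reverse the width axis: a horizontal flip

# the leftmost pixel of the original is now the rightmost pixel
print(image[0, 0], flipped[0, -1])
```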

In [None]:
data_augmentation = keras.Sequential(
    [
        tf.keras.layers.RandomFlip(
                            "horizontal",
                            input_shape=(img_height, img_width, 3),
                            seed=42
                         ),
        tf.keras.layers.RandomRotation(0.1, seed=314159265),
        tf.keras.layers.RandomZoom(0.1, seed=271828)
    ]    
)

In [None]:
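for images, labels in train_dataset.take(1):
    # show the original image
    plt.figure(figsize=(10, 10))
    plt.imshow(images[0].numpy().astype("uint8"))
    plt.axis("off")
    # show nine randomly augmented versions of that image
    plt.figure(figsize=(10, 10))
    image_batch = tf.expand_dims(images[0], 0) # the augmentation model expects a batch dimension
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        augmented = data_augmentation(image_batch, training=True) # random layers only act in training mode
        plt.imshow(augmented[0].numpy().astype("uint8"))
        plt.axis("off")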


In [None]:
num_classes = len(class_names)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0/255.0, offset=0), # standardize the values to 0-1
    data_augmentation,
    tf.keras.layers.Conv2D(32, 3, activation='relu'), # use the string name for the activation
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation=tf.nn.relu), # or use the function for the activation
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes)
])
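
To see how large the tensor entering `Flatten` is, you can follow the image size through the layers by hand. The sketch below assumes the Keras defaults that the cell above does not override (stride 1 and `'valid'` padding for `Conv2D`, a 2x2 window for `MaxPooling2D`). The helper functions are hypothetical, and `model.summary()` should report the same shapes once the model is built.

```python
def conv_valid(size, kernel=3):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Output size of non-overlapping max pooling with 'valid' padding."""
    return size // pool


side = 180  # img_height and img_width
for _ in range(3):  # three Conv2D + MaxPooling2D pairs
    side = max_pool(conv_valid(side))

flattened = side * side * 32  # 32 filters in the last Conv2D
print(side, flattened)
```

Rescaling, augmentation, and dropout leave the shape unchanged, so they are left out of the calculation.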

### Compile the Model
The `compile` function configures the model for training.<br>
Click [here](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) for the optimizer algorithms supported by TensorFlow.<br>
Click [here](https://www.tensorflow.org/api_docs/python/tf/keras/losses) for the loss functions supported by TensorFlow.<br>



In [None]:
model.compile(
    optimizer='adam',
    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
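
`from_logits=True` tells the loss that the last `Dense` layer outputs raw scores rather than probabilities, and "sparse" means the labels are plain integers. A minimal sketch of the per-example calculation in plain Python follows. The function name is reused here for illustration only, and Keras additionally averages the result over the batch by default.

```python
import math


def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy of one example, computed from raw logits."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]  # equals -log(softmax(logits)[label])


# no preference between 5 classes vs. a strong preference for the right one
uniform = sparse_categorical_crossentropy([0.0] * 5, label=2)
confident = sparse_categorical_crossentropy([0.0, 0.0, 8.0, 0.0, 0.0], label=2)
print(uniform, confident)
```

A model that cannot tell the five classes apart scores a loss of about `log(5)`, which is a useful baseline when reading the training output.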


### Train the Model
Click [here](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch) to learn how to set up your own training loop from scratch.<br>


In [None]:
epochs = 15
print(type(train_dataset))
history = model.fit(
    x=train_dataset, # the input data
    y=None, # the target data. Is not specified here since the input is a dataset
    batch_size=None, # The batch size. Is not specified here since the input is a dataset
    epochs=epochs, # the number of iterations over the entire data provided (https://youtu.be/eJ9-d9jheMI)
    verbose='auto', # default value - what stuff to print
    validation_data=validation_dataset,
    shuffle=True, # default value - whether to shuffle the training data before each epoch (ignored when x is a dataset)
    class_weight=None, # default value - used for weighting the loss function during training
    sample_weight=None, # default value - also used for weighting the loss function
    initial_epoch=0, # default value - the epoch to start training (used for resuming from previous training)
    # steps_per_epoch=None, # default value - the number of batches that make up one epoch. With a dataset as input, None runs each epoch until the dataset is exhausted
    validation_steps=None, # default value - the total number of steps to draw before stopping when performing validation at the end of every epoch
    validation_batch_size=None, # Number of samples per validation batch
    validation_freq=1, # default value - specifies how many training epochs to run before a new validation run is performed
    max_queue_size=10, # default value - maximum size for the generator queue
    workers=1, # default value - maximum number of processes to spin up when using process-based threading
    use_multiprocessing=False # default value - If true, use process-based threading
)
# see https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit for more details on this function
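
Since `steps_per_epoch` is left unset, each epoch runs through every batch in `train_dataset`. The number of batches follows from the dataset size and `batch_size`, as in this small sketch. The training-set size used here is an assumption, and `steps_per_epoch` below is a hypothetical helper, not the Keras argument.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


# 2936 is an assumed training-set size (80% of 3670 images); check the
# "Using ... files for training" message printed when the dataset is built.
print(steps_per_epoch(2936, 32))
```

The last batch of an epoch is smaller than the others whenever the dataset size is not a multiple of the batch size.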

In [None]:
model.summary()

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
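
When reading these curves, a common warning sign is a validation loss that starts rising while the training loss keeps falling, which usually points to overfitting. A small sketch for locating the epoch with the lowest validation loss, using made-up numbers rather than real results:

```python
# Hypothetical validation losses, one per epoch (not real results).
val_loss_example = [1.20, 0.95, 0.82, 0.78, 0.80, 0.85]

# index of the smallest value in the list
best_epoch = min(range(len(val_loss_example)), key=val_loss_example.__getitem__)
print(f"lowest validation loss after epoch {best_epoch + 1}")
```

The same line works on `history.history['val_loss']` from the cell above.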

In [None]:
sunflower_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg"
sunflower_path = tf.keras.utils.get_file('Red_sunflower', origin=sunflower_url)

img = tf.keras.utils.load_img(
    sunflower_path, target_size=(img_height, img_width)
)
plt.imshow(img)
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0)

predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])

print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)
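
The model outputs one raw score (logit) per class, and `tf.nn.softmax` turns those scores into probabilities that sum to 1. A NumPy sketch of the same calculation, with hypothetical logits standing in for a real prediction:

```python
import numpy as np


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


example_logits = np.array([1.0, 3.0, 0.5, 0.2, 2.0])  # hypothetical model output
example_score = softmax(example_logits)

# index of the most likely class and its probability in percent
print(np.argmax(example_score), round(100 * float(np.max(example_score)), 2))
```

Softmax never changes which class scores highest. It only rescales the scores so the largest one can be read as a confidence.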

### Saving the Model
Click [here](https://www.tensorflow.org/tutorials/keras/save_and_load) for more information on saving models, including saving models after a training epoch

In [None]:
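saved_models = pathlib.Path('saved_models')
saved_models.mkdir(exist_ok=True) # create the folder if it does not exist yet

# forward slashes work on every platform; '\m' is an invalid escape sequence
model.save('saved_models/my_model')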

In [None]:
new_model = tf.keras.models.load_model('saved_models/my_model')

# Check its architecture
new_model.summary()

In [None]:
predictions = new_model.predict(img_array)
score = tf.nn.softmax(predictions[0])
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)

### CSV Files

Using `pandas` to read CSV files

In [None]:
abalone_columns = ["Length", "Diameter", "Height", "Whole weight", "Shucked weight",
                   "Viscera weight", "Shell weight", "Age"]
abalone_file_path = pathlib.Path("abalone_train.csv")
abalone_url = "https://storage.googleapis.com/download.tensorflow.org/data/abalone_train.csv"

# read the local copy if there is one, otherwise download the file
abalone_source = abalone_file_path if abalone_file_path.is_file() else abalone_url
abalone_train = pd.read_csv(abalone_source, names=abalone_columns)
abalone_train.head()

We will try to predict the age based on the rest of the data.<br>
To do so, we copy the pandas DataFrame, remove the `'Age'` column to use as the labels, and keep the remaining columns as the features.

In [None]:
abalone_features = abalone_train.copy()
abalone_labels = abalone_features.pop('Age')
abalone_features = np.array(abalone_features)
abalone_features

In [None]:
abalone_model = tf.keras.Sequential([
  tf.keras.layers.Dense(64),
  tf.keras.layers.Dense(1)
])

abalone_model.compile(loss = tf.keras.losses.MeanSquaredError(),
                      optimizer = tf.optimizers.Adam())

abalone_model.fit(abalone_features, abalone_labels, epochs=10)
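
One thing worth noticing: neither `Dense` layer above sets an activation, so the two layers together compute a plain linear function of the features. The NumPy sketch below checks this with random, hypothetical weights. Adding an activation such as `'relu'` to the first layer would be one way to make the model nonlinear.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 7))  # 4 hypothetical rows with 7 features each
w1, b1 = rng.normal(size=(7, 64)), rng.normal(size=64)
w2, b2 = rng.normal(size=(64, 1)), rng.normal(size=1)

two_layers = (x @ w1 + b1) @ w2 + b2        # Dense(64) followed by Dense(1)
one_layer = x @ (w1 @ w2) + (b1 @ w2 + b2)  # a single equivalent Dense(1)

print(np.allclose(two_layers, one_layer))
```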

For more information on loading CSV files (including files that contain different data types and not just numbers) click [here](https://www.tensorflow.org/tutorials/load_data/csv)

### Loading Text
Click [here](https://www.tensorflow.org/tutorials/load_data/text) to see the basics of text loading and processing

# Further Studying
#### Want to learn more about TensorFlow? No better way to learn than the [official tutorials](https://www.tensorflow.org/tutorials).
*Be sure to look on the left-hand side or the hamburger menu dropdown for all of this great information and more*