<a href="https://colab.research.google.com/github/lakshitgosain/Tensorflow-ZTM/blob/main/Food_Vision_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Food Vision Project


What does mixed precision training do?
Mixed precision training uses a combination of single-precision (float32) and half-precision (float16) data types to speed up model training (up to 3x on modern GPUs).
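Why combine the two types instead of switching everything to float16? The NumPy snippet below compares them. NumPy is our assumption here; the notebook itself doesn't use it. float16 halves memory, but it has much less precision and range. As we understand it, this is why Keras' `mixed_float16` policy (enabled with `tf.keras.mixed_precision.set_global_policy("mixed_float16")`) runs most computations in float16 while keeping model variables in float32.

```python
import numpy as np

# Memory: float16 stores each value in 2 bytes, float32 in 4
print(np.dtype(np.float16).itemsize, np.dtype(np.float32).itemsize)  # 2 4

# Precision: a small update to a weight of 1.0 is lost in float16
# (float16 spacing near 1.0 is about 0.001) but kept in float32
w16 = np.float16(1.0) + np.float16(1e-4)
w32 = np.float32(1.0) + np.float32(1e-4)
print(w16 == np.float16(1.0), w32 == np.float32(1.0))

# Range: float16 cannot represent values above its maximum (about 65504)
print(np.finfo(np.float16).max)
```

Tiny weight updates can vanish in float16, so keeping the weights themselves in float32 matters.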

In [1]:
import os 

if not os.path.exists("helper_functions.py"):
    !wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
else:
    print("[INFO] 'helper_functions.py' already exists, skipping download.")

--2023-06-12 10:08:34--  https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10246 (10K) [text/plain]
Saving to: ‘helper_functions.py’


2023-06-12 10:08:34 (67.4 MB/s) - ‘helper_functions.py’ saved [10246/10246]



In [2]:
# Import series of helper functions for the notebook (we've created/used these in previous notebooks)
from helper_functions import create_tensorboard_callback, plot_loss_curves, compare_historys

In [3]:
# Get TensorFlow Datasets
import tensorflow_datasets as tfds

In [4]:
# List all the available datasets
datasets_list = tfds.list_builders()
print('food101' in datasets_list)  # is our target dataset in the list of TFDS datasets?

True


## Load in the data

In [None]:
(train_data, test_data), ds_info= tfds.load('food101',
                                            split=["train","validation"],
                                            shuffle_files=True,
                                            as_supervised=True,
                                            with_info=True)


Downloading and preparing dataset 4.65 GiB (download: 4.65 GiB, generated: Unknown size, total: 4.65 GiB) to /root/tensorflow_datasets/food101/2.0.0...



## Exploring the Food101 dataset

We want to find:

* Class Names
* The shape of our input tensors
* The datatype of our input data
* What do the labels look like? (e.g. are they one-hot encoded or label encoded?)
* Do the labels match up with the class names?
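The encoding question matters because it decides the loss function. Integer (label-encoded) targets pair with Keras' `sparse_categorical_crossentropy`, and one-hot targets pair with `categorical_crossentropy`. Below is a minimal NumPy sketch of the two formats, using made-up label values rather than real Food101 samples.

```python
import numpy as np

num_classes = 101  # Food101 has 101 classes

# Label-encoded targets: one integer class index per example (hypothetical values)
labels = np.array([0, 56, 100])

# One-hot encoded targets: one row per example with a single 1 at the class index
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot.shape)  # (3, 101)

# argmax over the class axis recovers the integer labels from one-hot rows
recovered = one_hot.argmax(axis=1)
print(recovered)
```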


In [None]:
ds_info.features

In [None]:
#Get the class names
class_names=ds_info.features['label'].names
class_names[:10]

In [None]:
# Take one sample of the train data

train_one_sample = train_data.take(1)  # samples are in the format (image_tensor, label)

In [None]:
# What does one sample of our training data look like?
train_one_sample

In [None]:
for image, label in train_one_sample:
  print(f"""
  Image Shape: {image.shape},
  Image Datatype: {image.dtype},
  Target class from food101 (tensor form): {label},
  Class Name (str form): {class_names[label.numpy()]}
        """)

In [None]:
# What does our image tensor look like?
image

In [None]:
import tensorflow as tf
tf.reduce_min(image), tf.reduce_max(image)

In [None]:
# Plot an image from TensorFlow Datasets
import matplotlib.pyplot as plt
plt.imshow(image)
plt.title(class_names[label.numpy()])  # add title to verify that the label associated with the image is correct
plt.axis(False);

## Create preprocessing functions for our data

What we know about our data:

* It's in uint8 datatype (values range from 0 to 255)
* It's comprised of tensors of all different sizes
* It's not scaled
* It's not in batches

What we know models want:

* Data in float32 dtype (or, with mixed precision, a combination of float16 and float32)
* For batches, TensorFlow needs all of the tensors within a batch to be the same size
* Scaled data (values between 0 and 1), also called normalized tensors, which generally perform better
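The model requirements above can be illustrated with NumPy on fake images. This is a sketch, not the notebook's TensorFlow pipeline. `nearest_resize` is a hypothetical helper that uses nearest-neighbour sampling, whereas `tf.image.resize` defaults to bilinear. The scaling step is only for models without built-in rescaling, so it isn't needed for EfficientNetB0.

```python
import numpy as np

rng = np.random.default_rng(42)
# Two fake uint8 "images" of different sizes, standing in for Food101 samples
img_a = rng.integers(0, 256, size=(512, 384, 3), dtype=np.uint8)
img_b = rng.integers(0, 256, size=(300, 512, 3), dtype=np.uint8)


def nearest_resize(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Hypothetical helper: nearest-neighbour resize to (size, size, channels)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows][:, cols]


# Tensors of different shapes can't be stacked into one batch
try:
    np.stack([img_a, img_b])
except ValueError as err:
    print("Can't batch:", err)

# After resizing to a common shape they can be batched
batch = np.stack([nearest_resize(img_a), nearest_resize(img_b)])

# Convert uint8 -> float32 and scale values into [0, 1]
batch = batch.astype(np.float32) / 255.0
print(batch.shape, batch.dtype, batch.min(), batch.max())
```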

With these points in mind, we've got a few things we can tackle with a preprocessing function.

Since we're going to be using an EfficientNetB0, we don't need to rescale our data: the Keras EfficientNet models have rescaling built in. This means that our function needs to:
1. Resize our images to all the same size
2. Convert the dtype of our image tensors from uint8 to float32


In [None]:
# Make a function for preprocessing images
def preprocess_img(image, label, img_shape=224):
  """
  Converts image datatype from uint8 -> float32.
  Resizes image to [img_shape, img_shape, color_channels].
  """
  image = tf.image.resize(image, [img_shape, img_shape])  # default (bilinear) resize already returns float32
  return tf.cast(image, tf.float32), label  # explicit cast keeps the output dtype guaranteed

In [None]:
# Preprocess a single sample image and check the outputs
preprocessed_img = preprocess_img(image, label)[0]
print(f"Image before preprocessing:\n {image[:2]}...,\nShape: {image.shape},\nDatatype: {image.dtype}\n")
print(f"Image after preprocessing:\n {preprocessed_img[:2]}...,\nShape: {preprocessed_img.shape},\nDatatype: {preprocessed_img.dtype}")

## Batch and prepare datasets

We're going to make our data input pipeline run as fast as possible.

Specifically, we're going to be using:

* map() - maps a predefined function to a target dataset (e.g. preprocess_img() to our image tensors)
* shuffle() - randomly shuffles the elements of a target dataset by sampling from a buffer of buffer_size elements (ideally, buffer_size equals the size of the dataset, however this may have implications for memory)
* batch() - turns elements of a target dataset into batches (size defined by the parameter batch_size)
* prefetch() - prepares subsequent batches of data whilst other batches are being computed on (improves data loading speed but costs memory)
* Extra: cache() - caches (saves for later) elements in a target dataset, saving loading time (only works if your dataset is small enough to fit in memory; standard Colab instances only have 12GB of memory)
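The buffer behaviour behind shuffle() explains why a small buffer_size only mixes elements that are close together. Below is a plain-Python sketch modelled on how the tf.data documentation describes shuffling: fill a buffer, emit a random element, and refill its slot. It also includes a simple batch(). `buffered_shuffle` and `batch` are hypothetical names, and this is an illustration rather than TensorFlow's implementation.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Shuffle a stream using a fixed-size buffer: fill the buffer, then repeatedly
    emit a random buffered element and put the next input element in its slot."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item  # refill the freed slot with the next input element
    rng.shuffle(buffer)  # input exhausted: emit the remainder in random order
    yield from buffer


def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive elements into lists of batch_size (the last may be smaller)."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


data = list(range(10))
shuffled = list(buffered_shuffle(data, buffer_size=3, seed=42))
print(sorted(shuffled) == data)  # True: same elements, different order
print([len(b) for b in batch(shuffled, batch_size=4)])  # [4, 4, 2]
```

With buffer_size=3, the first element emitted is always one of the first three inputs. This is why a buffer much smaller than the dataset gives only a local shuffle.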



In [None]:
# Map preprocessing function to training data (and parallelize)
train_data = train_data.map(map_func=preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)

# Shuffle train data, turn it into batches and prefetch
train_data = train_data.shuffle(buffer_size=1000).batch(batch_size=32).prefetch(buffer_size=tf.data.AUTOTUNE)

# Map preprocessing function to test data, then batch and prefetch (no shuffling needed for evaluation)
test_data = test_data.map(preprocess_img, num_parallel_calls=tf.data.AUTOTUNE).batch(32).prefetch(buffer_size=tf.data.AUTOTUNE)


In [None]:
train_data, test_data

"Hey TensorFlow, map this preprocessing function (preprocess_img) across our training dataset, then shuffle a number of elements and batch them together, and make sure you prepare new batches (prefetch) whilst the model is looking through the current batch."
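The prefetch step overlaps preparing the next batch with computing on the current one. Below is a plain-Python analogue that uses a background thread and a bounded queue. `prefetch` and `slow_batches` are hypothetical names, and the 50 ms delays are made up. tf.data does this inside the TensorFlow runtime, not with Python threads.

```python
import queue
import threading
import time
from typing import Iterable, Iterator

_END = object()  # sentinel marking the end of the input stream


def prefetch(items: Iterable, buffer_size: int = 2) -> Iterator:
    """Yield elements of items while a background thread prepares up to
    buffer_size of them ahead of time (a rough analogue of Dataset.prefetch)."""
    q: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for item in items:
                q.put(item)  # blocks while the buffer is full
        finally:
            q.put(_END)  # always signal the consumer (in this sketch, errors just end the stream)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is _END:
            return
        yield item


def slow_batches(n: int) -> Iterator[int]:
    """Hypothetical data source: each 'batch' takes about 50 ms to prepare."""
    for i in range(n):
        time.sleep(0.05)
        yield i


start = time.perf_counter()
for batch_idx in prefetch(slow_batches(5)):
    time.sleep(0.05)  # pretend a training step on this batch takes about 50 ms
elapsed = time.perf_counter() - start
# Sequentially this would take roughly 5 * (0.05 + 0.05) = 0.5 s; with loading
# and "training" overlapped it should finish noticeably faster
print(f"elapsed: {elapsed:.2f} s")
```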