
# 📖 Resources

### Documentation & APIs
* 📖 [Matrix Multiplication](https://www.mathsisfun.com/algebra/matrix-multiplying.html)
* 📖 [NumPy](https://numpy.org/doc/stable/reference/index.html)
* 📖 [TensorFlow](https://www.tensorflow.org/api_docs/python/tf)
* 📖 [Keras](https://www.keras.io)
* 📖 [Keras Transfer Learning](https://keras.io/guides/transfer_learning/#build-a-model)
* 📖 [TensorFlow Decision Forests](https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/all_symbols)

* 📖 [Pandas](https://pandas.pydata.org/docs/reference/index.html#api)
* 📖 [Seaborn](https://seaborn.pydata.org/)
* 📖 [Matplotlib](https://matplotlib.org/stable/plot_types/index.html)
* 📖 [SciPy](https://docs.scipy.org/doc/scipy/reference/index.html#scipy-api)
* 📖 [scikit-learn](https://scikit-learn.org/stable/)
* 📖 [Missingno](https://github.com/ResidentMario/missingno)
* 📖 [Daniel Bourke](https://www.mrdbourke.com)
* 📖 [Cheat Sheet Activation Function](https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html)
* 📖 [Papers w/ code SOTA](https://paperswithcode.com/sota)
* 📖 [Hugging Face](https://huggingface.co/)
* 📖 [ResNet v2](https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/5)
* 📖 [EfficientNet](https://tfhub.dev/tensorflow/efficientnet/b0/feature-vector/1)
* 📖 [EfficientNet (Google AI Blog)](https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html)

### Videos
* 🎥 [10 crazy announcements from Google I/O - Fireship](https://www.youtube.com/watch?v=nmfRDRNjCnM)
* 🎥 [Daniel Bourke YT Channel](https://www.youtube.com/channel/UCr8O8l5cCX85Oem1d18EezQ)
* 🎥 [MIT 6.S191 (2022): Convolutional Neural Networks](https://www.youtube.com/watch?v=uapdILWYTzE&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=4)
* 🎥 [mini-batch gradient descent](https://www.youtube.com/watch?v=-_4Zi8fCZO4)

### Learning
* 📖 [Kaggle](https://www.kaggle.com/)
* 📖 [ZTM (Zero To Mastery)](https://zerotomastery.io/)
* 📖 [CS231n: Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/convolutional-networks/)

### Tools
* 🛠 [Matrix Multiplication calculator](http://matrixmultiplication.xyz/)
* 🛠 [Google Colab](https://colab.research.google.com/)
* 🛠 [ChatGPT](https://chat.openai.com/)
* 🛠 [TF Playground](https://playground.tensorflow.org/)
* 🛠 [NN Case Study](https://cs231n.github.io/neural-networks-case-study/)
* 🛠 [Multi Layer Perceptron](https://github.com/GokuMohandas/Made-With-ML/blob/main/notebooks/08_Neural_Networks.ipynb)
* 🛠 [TensorBoard](https://tensorboard.dev)
* 🛠 [Weights & Biases](https://wandb.ai/site)

### Dojo

* 🥋 [mrdbourke/tensorflow-deep-learning Exercises](https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/README.md#-02-neural-network-classification-with-tensorflow-exercises)
* 🥋 [mrdbourke/tensorflow-deep-learning Discussion Page](https://github.com/mrdbourke/tensorflow-deep-learning/discussions)
* 🥋 [Data Augmentation Tutorial](https://www.tensorflow.org/tutorials/images/data_augmentation)
* 🥋 [CNN Explainer](https://poloclub.github.io/cnn-explainer/)


### Twitter

* 🪺 [@ylecun, Researcher in AI, Machine Learning, Robotics](https://twitter.com/ylecun)

### How to
* [Load images using TensorFlow](https://www.tensorflow.org/tutorials/load_data/images)

### Papers
* 📖 [A guide to convolution arithmetic for deep learning](https://arxiv.org/pdf/1603.07285.pdf)

* 📖 [ResNet - Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027v3.pdf)
* 📖 [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/abs/1801.06146)


# 🥵 TO-DO

Check these:
* 🛠 NN Case Study
* 🛠 Multi Layer Perceptron
* 🛠 Activation Functions Formulas

# 💡 Hyperparameters
* Hyperparameters are parameters that are set before training a model and determine how the model is trained. These are not learned during training, but are specified by the user before training begins.

* Hyperparameters are used to control aspects of the model's behavior, such as the learning rate, regularization strength, number of hidden layers, and number of nodes in each layer of a neural network. The choice of hyperparameters can have a significant impact on the model's performance, and finding the optimal values for them can be a challenging task that requires experimentation and tuning.

* Hyperparameter tuning is the process of finding the best values for these parameters by adjusting them, training the model repeatedly, and evaluating its performance each time on validation data (not on the final test set). The goal is to find the hyperparameter values that yield the best-performing model on a given task, such as classification or regression.
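As a minimal sketch of tuning, assuming scikit-learn is available, `GridSearchCV` trains one model per combination in a small grid and scores each with 5-fold cross-validation, so the choice is made on held-out folds. The grid values for `C` and `kernel` are arbitrary illustrative choices, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values (illustrative only)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Train and evaluate one SVC per combination with 5-fold cross-validation
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print(f"Mean cross-validated accuracy: {search.best_score_:.3f}")
```

With 3 values of `C` and 2 kernels, the search fits 6 candidates (times 5 folds), which is why grid search becomes expensive as the grid grows.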

# 🔑 Models
Many types of machine learning models exist, each with its own strengths and weaknesses. Here are some common ones:

* **Linear Regression**: A type of model used for predicting a continuous output value based on one or more input features.

* **Logistic Regression**: A type of model used for binary classification problems, where the output variable takes one of two possible values.

* **Decision Trees**: A type of model that makes decisions by recursively splitting the data based on the values of the input features.

* **Random Forests**: An ensemble model that trains many decision trees, each on a random sample of the data and features, and combines their votes to make predictions.

* **Support Vector Machines (SVM)**: A model that finds the hyperplane that separates the classes with the largest margin.

* **Naive Bayes**: A model that calculates the probability of each class from the input features using Bayes' theorem, assuming the features are independent of each other given the class (the "naive" assumption).

* **Neural Networks**: A type of model that consists of interconnected nodes that process information and can learn complex patterns in the data.

* **Clustering Models**: Unsupervised models that group data points by similarity, without using labels (e.g. k-means).

* **Dimensionality Reduction Models**: Models that reduce the number of input features while keeping as much of the relevant information as possible (e.g. PCA).

* **Reinforcement Learning Models**: Models that learn by trial and error, interacting with an environment to maximize a cumulative reward signal.
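Naive Bayes, clustering, and dimensionality reduction do not get a dedicated example later in this notebook. A minimal sketch with scikit-learn on the Iris dataset, assuming `GaussianNB`, `KMeans`, and `PCA` as representative models for each family (the choice of 3 clusters and 2 components is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Naive Bayes: a Gaussian likelihood per feature and class, combined with
# Bayes' theorem under the assumption that features are independent given the class
nb = GaussianNB().fit(X, y)
nb_accuracy = nb.score(X, y)  # training accuracy, so optimistic

# Clustering: k-means groups the samples into 3 clusters without ever seeing y
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_labels = kmeans.labels_

# Dimensionality reduction: project the 4 features onto 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(f"Naive Bayes training accuracy: {nb_accuracy:.3f}")
print("Cluster sizes:", [int((cluster_labels == c).sum()) for c in range(3)])
print("Reduced shape:", X_2d.shape)
print(f"Variance kept by 2 components: {pca.explained_variance_ratio_.sum():.3f}")
```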



## Linear Regression Model

This model uses TensorFlow's Keras API to create a sequential model with a single dense layer. The `units` argument specifies the number of neurons in the layer, and the `input_shape` argument specifies the shape of the input data.

The model is compiled with stochastic gradient descent (SGD) as the optimizer and mean squared error (MSE) as the loss function. The model is then trained on the training data (`x_train` and `y_train`) for 100 epochs.

Finally, the model is used to make a prediction on new data (`x_test`), and the predicted output is printed to the console.

In [None]:
import numpy as np
import tensorflow as tf

# Define the input and output variables (y = 2x + 1),
# shaped (samples, features) as Keras expects
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
y_train = np.array([3.0, 5.0, 7.0, 9.0, 11.0]).reshape(-1, 1)

# Define the model: a single neuron computes y = w * x + b
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss='mean_squared_error')

# Train the model
model.fit(x_train, y_train, epochs=100, verbose=0)

# Make a prediction (one sample with one feature)
x_test = np.array([[6.0]])
y_pred = model.predict(x_test)

y_pred




array([[13.381614]], dtype=float32)
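The training data follow y = 2x + 1 exactly, so the ideal prediction for x = 6 is 13. As a sanity check, a closed-form least-squares fit with NumPy recovers that line; in the run shown above, the Keras prediction of about 13.38 suggests that 100 epochs of SGD at a learning rate of 0.01 had not fully converged:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
prediction = slope * 6.0 + intercept

print(f"slope={slope:.4f}, intercept={intercept:.4f}, prediction at x=6: {prediction:.4f}")
```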

## Logistic Regression Model

We have a dataset with two features and binary classification labels. We define a logistic regression model using the `tf.keras.Sequential()` API, with a single `Dense` layer with one output unit and a sigmoid activation function.

The model is compiled using stochastic gradient descent (`SGD`) as the optimizer and binary cross-entropy as the loss function. We then train the model on the training data using the `fit()` method.

Finally, we use the trained model to make predictions on new data (`X_test`), and the predicted output is printed to the console.

In [None]:
import tensorflow as tf
import numpy as np

# Load the dataset
X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y_train = np.array([0, 0, 1, 1, 1])

# Create the model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, input_dim=2, activation='sigmoid'))

# Compile the model
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=1000, verbose=0)

# Use the model to make predictions
X_test = np.array([[6, 7], [7, 8]])
y_pred = model.predict(X_test)
y_pred


[[0.9735141]
 [0.9887447]]
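To see what the single `Dense` unit with a sigmoid activation computes, here is a from-scratch NumPy sketch on the same five samples: the model predicts $p = \sigma(\mathbf{w}^\top \mathbf{x} + b)$, and the weights move against the gradient of binary cross-entropy. It uses plain full-batch gradient descent; the learning rate (0.1) and number of steps (5000) are illustrative assumptions, not the values Keras uses. (With only five samples and Keras's default batch size of 32, each Keras SGD step also sees the whole dataset.)

```python
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 1, 1, 1], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(X.shape[1])  # one weight per feature
b = 0.0                   # bias
learning_rate = 0.1

for _ in range(5000):
    p = sigmoid(X @ w + b)  # predicted probabilities
    error = p - y           # gradient of binary cross-entropy w.r.t. the logits
    w -= learning_rate * X.T @ error / len(y)
    b -= learning_rate * error.mean()

X_test = np.array([[6, 7], [7, 8]], dtype=float)
probs = sigmoid(X_test @ w + b)
print(probs)
```

Both test points lie further along the positive-class direction than any training sample, so both probabilities come out above 0.5, matching the Keras predictions in spirit (the exact values depend on training length and learning rate).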


## Decision Trees Model
In this example, we use the Iris dataset. Despite the section title, the code below does not build a decision tree: it builds a small fully connected neural network with the `Sequential` API and two `Dense` layers. The first layer has 16 units and `ReLU` activation, while the second layer has 3 units and `softmax` activation, which suits multiclass classification (Iris has 3 classes).

The model is then compiled with the `adam` optimizer and `sparse_categorical_crossentropy` as the loss function. We then train the model on the entire dataset for 100 epochs.

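Finally, we evaluate the model on the same data it was trained on and print the loss and accuracy (labelled "Test" in the output, although no separate test set is used). Evaluating on the training data overstates performance; in practice, split the data into separate training and test sets so that overfitting can be detected.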

In [None]:
import tensorflow as tf
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Define the model
model = tf.keras.Sequential([
  tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
  tf.keras.layers.Dense(3, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=100, verbose=0)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Test loss: {loss}, Test accuracy: {accuracy}")


Test loss: 0.42342260479927063, Test accuracy: 0.9200000166893005
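A conventional decision tree (as opposed to the dense neural network trained in the cell above) can be built with scikit-learn's `DecisionTreeClassifier`. This sketch also holds out 30% of the data as a test set; `max_depth=3`, the split size, and `random_state=42` are illustrative assumptions. `export_text` prints the learned if/else rules, i.e. the recursive splits on feature values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Stratified split keeps the class proportions equal in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# Limit the depth to keep the tree small and readable
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

test_accuracy = tree.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
print(export_text(tree, feature_names=list(iris.feature_names)))
```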


## Random Forests Model

In this example, we use the Iris dataset and create a Random Forest model with scikit-learn's `RandomForestClassifier`, using 100 trees (`n_estimators=100`).

We then train the model on the entire dataset using the `fit` method.

Finally, we evaluate the model on the same dataset using the `score` method, which returns the mean accuracy on the given data and labels. Because this is the training data, the perfect score is optimistic; in practice, evaluate on a held-out test set so that overfitting can be detected.

In [None]:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Create a Random Forest model
model = RandomForestClassifier(n_estimators=100)

# Train the model
model.fit(X, y)

# Evaluate the model
score = model.score(X, y)
print(f"Test accuracy: {score}")


Test accuracy: 1.0
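Each tree in a random forest is trained on a bootstrap sample, so roughly a third of the samples are left out of any given tree and can act as a built-in validation set. A minimal sketch using scikit-learn's `oob_score=True` option (`random_state=0` is an arbitrary choice for reproducibility) compares this out-of-bag estimate with the training accuracy:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True scores each sample using only the trees whose bootstrap sample excluded it
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)

train_accuracy = forest.score(X, y)
print(f"Training accuracy:   {train_accuracy:.3f}")
print(f"Out-of-bag accuracy: {forest.oob_score_:.3f}")
```

The out-of-bag accuracy is typically lower than the training accuracy, which is a more honest estimate of how the forest performs on unseen data.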


## Support Vector Machines (SVM) Model

In this example, we use the Iris dataset and create an SVM model with scikit-learn's `SVC`, using a linear kernel (`kernel='linear'`) and a regularization parameter of `C=1`. Smaller values of `C` mean stronger regularization: a wider margin that tolerates more misclassified training points.

We then train the model on the entire dataset using the `fit` method.

Finally, we evaluate the model on the same dataset using the `score` method, which returns the mean accuracy on the given data and labels. Because this is the training data, the score is optimistic; in practice, evaluate on a held-out test set so that overfitting can be detected.

In [None]:
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Create a SVM model
model = SVC(kernel='linear', C=1)

# Train the model
model.fit(X, y)

# Evaluate the model
score = model.score(X, y)
print(f"Test accuracy: {score}")

Test accuracy: 0.9933333333333333
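A held-out estimate for the same linear SVM can be obtained with 5-fold cross-validation. This sketch also standardizes the features, an addition not in the original cell: SVMs are sensitive to feature scale, and putting `StandardScaler` inside the pipeline re-fits it on each training fold, so no statistics from the validation fold leak into training:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scaling + linear SVM as one estimator, so cross-validation treats them together
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1))

# One accuracy score per fold, each computed on data the model did not train on
scores = cross_val_score(clf, X, y, cv=5)
print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.3f}")
```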


## Neural Networks Model

In this example, we use the Iris dataset and create a neural network model with TensorFlow's Keras API: a sequential model with two dense layers, the first with 64 units and a ReLU activation function, and the second with 3 units and a softmax activation function that outputs the predicted class probabilities.

We compile the model with the 'adam' optimizer, the 'sparse_categorical_crossentropy' loss function (since our labels are integers), and the 'accuracy' metric to monitor during training.

We then train the model on the training data for 100 epochs, using the validation set to monitor the model's performance during training.

Finally, we evaluate the model on the test set using the `evaluate` method, which returns the loss and accuracy of the model on the given test data.
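The steps above can be sketched as follows. The split sizes (20% test, then 20% of the remainder for validation), the stratification, and `random_state=42` are assumptions, since the description does not specify them:

```python
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X = X.astype("float32")

# Hold out a test set, then carve a validation set out of the remaining training data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # one probability per class
])

# Integer labels, so sparse categorical cross-entropy
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# The validation set is only monitored, never trained on
history = model.fit(
    X_train, y_train, epochs=100, validation_data=(X_val, y_val), verbose=0
)

test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test loss: {test_loss:.4f}, Test accuracy: {test_accuracy:.4f}")
```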

## CNN

### Layer Hyperparameters

* **Padding** is often necessary when the kernel extends beyond the activation map. Padding conserves data at the borders of activation maps, which can improve performance, and it can preserve the input's spatial size, which allows an architecture designer to build deeper, higher-performing networks. Many padding techniques exist, but the most commonly used is zero-padding because of its performance, simplicity, and computational efficiency. It adds zeros symmetrically around the edges of an input. This approach is used by many high-performing CNNs, such as AlexNet.

* **Kernel size**, often also called filter size, is the size of the window that slides over the input. This hyperparameter has a large impact on image classification performance. Small kernels extract highly local features, and because they shrink each layer's output less (without padding and with stride 1, an n x n input convolved with a k x k kernel produces an (n - k + 1) x (n - k + 1) output), they allow for deeper architectures. Conversely, a large kernel extracts coarser information and reduces layer dimensions faster, which often leads to worse performance; large kernels are better suited to extracting larger features. The right kernel size depends on your task and dataset, but for image classification smaller kernels generally perform better, because an architecture designer can stack more layers to learn increasingly complex features.

* **Stride** is how many pixels the kernel shifts at a time. For example, Tiny VGG (the network visualized in CNN Explainer) uses a stride of 1 for its convolutional layers: the dot product is computed over a 3x3 window of the input to produce one output value, then the window shifts one pixel to the right for the next operation. Stride affects a CNN much like kernel size does. A smaller stride samples more positions, so more features are extracted and the output is larger; a larger stride limits feature extraction and produces a smaller output. One responsibility of the architecture designer is to make sure the kernel slides across the input symmetrically, i.e. that the padded input size minus the kernel size is divisible by the stride. The [CNN Explainer](https://poloclub.github.io/cnn-explainer/) lets you change the stride for various input and kernel sizes to see this constraint.
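Padding, kernel size, and stride combine in the standard output-size formula $o = \lfloor (n + 2p - k)/s \rfloor + 1$ for an $n \times n$ input, a $k \times k$ kernel, $p$ pixels of padding on each side, and stride $s$. A small sketch (the helper name `conv_output_size` is hypothetical) evaluates the formula and cross-checks one case against a Keras `Conv2D` layer with `padding="valid"`, which applies no padding:

```python
import tensorflow as tf

def conv_output_size(n, k, p=0, s=1):
    """Output width of a convolution over an n x n input with a k x k kernel,
    p pixels of zero-padding on each side, and stride s."""
    return (n + 2 * p - k) // s + 1

print(conv_output_size(64, k=3))       # 62: no padding shrinks the map by k - 1
print(conv_output_size(64, k=3, p=1))  # 64: padding of 1 preserves size for a 3x3 kernel
print(conv_output_size(64, k=5))       # 60: a larger kernel shrinks the map faster
print(conv_output_size(64, k=3, s=2))  # 31: stride 2 roughly halves the size

# Cross-check with Keras: "valid" means p = 0
x = tf.zeros((1, 64, 64, 3))
y = tf.keras.layers.Conv2D(filters=8, kernel_size=3, strides=2, padding="valid")(x)
print(y.shape)  # (1, 31, 31, 8)
```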

### ELI5:
> * **Padding**: Imagine you're coloring a picture, but some parts are missing around the edges. Padding is like adding extra space around the picture so you can color everything nicely without going outside the lines.

> * **Kernel Size:**
When you want to look closely at something, you might use a small magnifying glass. In a computer, a kernel is like a special magnifying glass that helps us see small details in pictures. The kernel size is like the size of that magnifying glass: smaller sizes let us see tiny things, and larger sizes show bigger things.

> * **Stride:**
When you take steps, you move from one place to another. Stride is like the size of your steps when you walk. In a computer, stride is about how far we move the special magnifying glass (the kernel) when we look at a picture. Small strides mean we take small steps and see lots of details, while big strides mean we take bigger steps and see less detail.

## ResNet Implementation

In a ResNet, we start by putting some blocks together to form the base of our tower. This is like the initial part of the network that helps us understand the basic features of the input, like the shape and color.


Then, we start adding more blocks on top of the base. But here's the interesting part: instead of just adding one block on top of another, we create something called a "residual block."


A residual block is like a special kind of block that keeps a connection to the previous block. It's like having a shortcut between the blocks. This shortcut helps us in two ways:

* It allows us to learn more complex features. As we go higher in the tower, we can start combining the basic features we learned in the base to form more intricate patterns. It's like using the building blocks in different ways to create more interesting shapes and structures.

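* It helps us avoid getting lost or confused. As we add more blocks, important information from earlier blocks can get lost, and very deep towers become hard to train. The shortcut passes a block's input straight through and adds it to the block's output, so each block only has to learn a small correction on top of what came before; if a block isn't useful, it can learn to change almost nothing. The shortcut also gives the training signal (the gradient) a direct path back to earlier blocks. It's like having a guide or map that shows us the right path if we get stuck.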

By using these residual blocks with shortcut connections, we can build a deeper and more powerful tower (or neural network). This helps us understand and recognize more complex patterns in the input data, like identifying different objects in images or understanding the meaning of words in text.

So, in simple terms, a ResNet is like building a tower of blocks where each block remembers what was learned before and helps us build more complex structures. This allows us to solve more challenging problems and make better predictions.
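For reference, the residual block can be written compactly in the notation of the ResNet papers (He et al.), where $\mathcal{F}$ is the stack of convolution and batch-normalization layers inside the block. When input and output shapes match, the shortcut is the identity; when they differ (in the `residual_block` code below, `use_projection=True`), a linear projection $W_s$ matches the dimensions. The ReLU applied after the addition is left out here for simplicity:

$$
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
\qquad \text{or} \qquad
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s \mathbf{x}
$$

Differentiating the identity case shows why the shortcut helps training: the Jacobian always contains an identity term, so gradients can reach earlier blocks even when $\partial \mathcal{F} / \partial \mathbf{x}$ is small.

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial \mathcal{F}(\mathbf{x}, \{W_i\})}{\partial \mathbf{x}} + I
$$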

* 📖 [ResNet - Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027v3.pdf)

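> 🔑 **Note**: Below is a basic implementation of a Deep Residual Network (ResNet) in TensorFlow with multiple residual blocks (four stages of three blocks each). The blocks use the original post-activation ordering (convolution, batch normalization, then ReLU, with a ReLU after the addition), not the pre-activation ordering proposed in the *Identity Mappings* paper linked above.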

In [None]:
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, strides=1, use_projection=False):
    identity = x

    # Projection shortcut to match dimensions (if needed)
    if use_projection:
        identity = layers.Conv2D(filters, kernel_size=1, strides=strides, padding='same')(identity)
        identity = layers.BatchNormalization()(identity)

    # First convolutional layer
    x = layers.Conv2D(filters, kernel_size=3, strides=strides, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Second convolutional layer
    x = layers.Conv2D(filters, kernel_size=3, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)

    # Adding the identity shortcut to the output
    x = layers.add([x, identity])
    x = layers.ReLU()(x)

    return x

def build_resnet(input_shape, num_classes):
    inputs = tf.keras.Input(shape=input_shape)

    # Initial convolutional layer
    x = layers.Conv2D(64, kernel_size=7, strides=2, padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(x)

    # Residual blocks
    x = residual_block(x, filters=64, strides=1, use_projection=False)
    x = residual_block(x, filters=64, strides=1, use_projection=False)
    x = residual_block(x, filters=64, strides=1, use_projection=False)

    x = residual_block(x, filters=128, strides=2, use_projection=True)
    x = residual_block(x, filters=128, strides=1, use_projection=False)
    x = residual_block(x, filters=128, strides=1, use_projection=False)

    x = residual_block(x, filters=256, strides=2, use_projection=True)
    x = residual_block(x, filters=256, strides=1, use_projection=False)
    x = residual_block(x, filters=256, strides=1, use_projection=False)

    x = residual_block(x, filters=512, strides=2, use_projection=True)
    x = residual_block(x, filters=512, strides=1, use_projection=False)
    x = residual_block(x, filters=512, strides=1, use_projection=False)

    # Final layers
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(num_classes, activation='softmax')(x)

    # Create the model
    model = tf.keras.Model(inputs=inputs, outputs=x)

    return model


The `build_resnet` function creates a ResNet model from the input shape and the number of classes for your classification task.

In [None]:
input_shape = (224, 224, 3)  # Example input shape for RGB images
num_classes = 10  # Example number of classes

model = build_resnet(input_shape, num_classes)


## There are two ways to instantiate a Model:

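More info: [tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model) and [Keras: Build a model](https://keras.io/guides/transfer_learning/#build-a-model). The cell below, adapted from that Keras transfer-learning guide, builds a classifier on top of a frozen, ImageNet-pretrained Xception base using the Functional API.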

In [None]:
from tensorflow import keras

# Random data augmentation: horizontal flips and small rotations
data_augmentation = keras.Sequential(
    [
        keras.layers.RandomFlip("horizontal"),
        keras.layers.RandomRotation(0.1),
    ]
)

base_model = keras.applications.Xception(
    weights="imagenet",  # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False,  # Do not include the ImageNet classifier at the top.
)

# Freeze the base_model
base_model.trainable = False

# Create new model on top
inputs = keras.Input(shape=(150, 150, 3))
x = data_augmentation(inputs)  # Apply random data augmentation

# Pre-trained Xception weights require that input be scaled
# from (0, 255) to a range of (-1., +1.); the rescaling layer
# outputs: `(inputs * scale) + offset`
scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
x = scale_layer(x)

# The base model contains batchnorm layers. We want to keep them in inference mode
# when we unfreeze the base model for fine-tuning, so we make sure that the
# base_model is running in inference mode here.
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x)  # Regularize with dropout
outputs = keras.layers.Dense(1)(x)  # Single logit for binary classification
model = keras.Model(inputs, outputs)

model.summary()

### 1 - With the "Functional API"

You start from `Input`, chain layer calls to specify the model's forward pass, and finally create your model from the inputs and outputs:

In [None]:
import tensorflow as tf

inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

A new Functional API model can also be created by using the intermediate tensors. This enables you to quickly extract sub-components of the model.

**Example:**

In [None]:
from tensorflow import keras

inputs = keras.Input(shape=(None, None, 3))
processed = keras.layers.RandomCrop(width=32, height=32)(inputs)
conv = keras.layers.Conv2D(filters=2, kernel_size=3)(processed)
pooling = keras.layers.GlobalAveragePooling2D()(conv)
feature = keras.layers.Dense(10)(pooling)

full_model = keras.Model(inputs, feature)
backbone = keras.Model(processed, conv)
activations = keras.Model(conv, feature)

> 🔑 **Note**: The `backbone` and `activations` models are not created with `keras.Input` objects, but with tensors that originate from `keras.Input` objects. Under the hood, the layers and weights are shared across these models, so the user can train `full_model` and use `backbone` or `activations` for feature extraction. The inputs and outputs of a model can also be nested structures of tensors, and the resulting models are standard Functional API models that support all the existing APIs.
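A minimal sketch of using the extracted sub-models, assuming TensorFlow's bundled Keras: it rebuilds the same layers so it runs on its own, feeds random 48x48 images (an arbitrary size, which `RandomCrop` reduces to 32x32) through `full_model`, and runs `backbone` directly on already-cropped 32x32 inputs to extract convolutional feature maps:

```python
import numpy as np
from tensorflow import keras

inputs = keras.Input(shape=(None, None, 3))
processed = keras.layers.RandomCrop(width=32, height=32)(inputs)
conv = keras.layers.Conv2D(filters=2, kernel_size=3)(processed)
pooling = keras.layers.GlobalAveragePooling2D()(conv)
feature = keras.layers.Dense(10)(pooling)

full_model = keras.Model(inputs, feature)
backbone = keras.Model(processed, conv)  # shares the Conv2D layer with full_model

# Hypothetical random images, just to exercise the models
images = np.random.rand(4, 48, 48, 3).astype("float32")
crops = images[:, :32, :32, :]

features = full_model(images)    # crop -> conv -> pool -> dense
feature_maps = backbone(crops)   # conv only: 3x3 "valid" conv turns 32x32 into 30x30
print(features.shape, feature_maps.shape)
```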

### 2 - By subclassing the Model class

In that case, define your layers in `__init__()` and implement the model's forward pass in `call()`.

In [None]:
import tensorflow as tf

class MyModel(tf.keras.Model):

  def __init__(self):
    super().__init__()
    self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
    self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)

  def call(self, inputs):
    x = self.dense1(inputs)
    return self.dense2(x)

model = MyModel()


If you subclass `Model`, you can optionally add a boolean `training` argument to `call()`, which you can use to specify different behavior during training and inference:

In [None]:
import tensorflow as tf

class MyModel(tf.keras.Model):

  def __init__(self):
    super().__init__()
    self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
    self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
    self.dropout = tf.keras.layers.Dropout(0.5)

  def call(self, inputs, training=False):
    x = self.dense1(inputs)
    if training:
      x = self.dropout(x, training=training)
    return self.dense2(x)

model = MyModel()

Once the model is created, you can configure it with losses and metrics using `model.compile()`, train it with `model.fit()`, or use it to make predictions with `model.predict()`.
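A minimal sketch of that workflow with the dropout variant of `MyModel`, assuming hypothetical random data (32 samples, 3 features, integer labels in [0, 5)). `fit()` calls `call()` with `training=True`, so dropout is active during training, while `predict()` uses `training=False`. Passing `training` straight to the `Dropout` layer is equivalent to the `if training:` check above:

```python
import numpy as np
import tensorflow as tf

class MyModel(tf.keras.Model):

    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation="relu")
        self.dense2 = tf.keras.layers.Dense(5, activation="softmax")
        self.dropout = tf.keras.layers.Dropout(0.5)

    def call(self, inputs, training=False):
        x = self.dense1(inputs)
        x = self.dropout(x, training=training)  # no-op when training=False
        return self.dense2(x)

# Hypothetical toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3)).astype("float32")
y = rng.integers(0, 5, size=(32,))

model = MyModel()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, epochs=3, batch_size=8, verbose=0)

probs = model.predict(X[:4], verbose=0)  # one probability per class for each sample
print(probs.shape)
```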