<a href="https://colab.research.google.com/github/sagunkayastha/CAI_Workshop/blob/main/Workshop_s2/DL_intro2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!wget -q https://raw.githubusercontent.com/sagunkayastha/CAI_Workshop/main/Workshop_s2/utils/utils.py
!wget -q https://raw.githubusercontent.com/sagunkayastha/CAI_Workshop/main/Workshop_s2/data/bmi_data.csv

***A Single Neuron***

!['Single_neuron'](https://raw.githubusercontent.com/sagunkayastha/CAI_Workshop/main/Workshop_s2/images/i1.png)

Predicting crop yield based on two inputs:
- the amount of water provided to the crop (irrigation)  
- the amount of fertilizer used.



-----------------
Both of these factors have optimal ranges, and too much or too little of either can negatively impact the crop yield.

- Water (Irrigation): Essential for growth, but both too little and too much can reduce crop yield due to drought stress or waterlogging.

- Fertilizer: Needed for nutrients; however, too little can stunt growth due to deficiency, and too much can harm yield through toxicity and environmental damage.






----
Let's define weights (importance) based on their impact on crop yield.

In [None]:
import numpy as np
weights = [0.6, 0.2]  # assumption: water is more important than fertilizer
bias = 0.1
# 0.6 is the weight of the first input (water), 0.2 is the weight of the second input (fertilizer)
# Inputs: 0.6 gallons of water and 0.3 lb of fertilizer
x = [0.6, 0.3]


Combining Inputs: In our model, these inputs are weighted based on their impact on crop yield, with a bias term included to account for other factors influencing yield (such as soil quality or pest levels).

$$z = \sum_{i=1}^{n} x_i w_i + b$$

$$output = \sigma(z)$$


The model calculates the predicted crop yield by balancing the effects of water and fertilizer.

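The sigmoid adds a non-linearity that squashes the output into the range 0 to 1. Note that a single neuron is monotonic in each input, so on its own it cannot represent the reversal described above, where yield drops once water or fertilizer goes beyond its optimal range; capturing that takes more neurons and layers.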

In [None]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


z = (x[0] * weights[0]) + (x[1] * weights[1]) + bias
crop_yield = sigmoid(z)
print("Crop yield for 0.6 gallons of water and 0.3 lb of fertilizer: ", crop_yield)

Learning from Feedback (Backpropagation):
After the harvest, the actual yields are compared to the model's predictions.

This feedback allows the model to adjust the weights of water and fertilizer inputs.

If yields are lower than expected at extreme values of either input, the model learns to adjust the importance (weight) it assigns to staying within optimal ranges.

Let's suppose the actual yield was 0.81.

In [None]:
# Squared-error loss for a single prediction (MSE is the mean of this over many samples)
def loss_function(predicted, real):
    return (predicted - real) **2

actual = 0.81
loss = loss_function(crop_yield, actual)
print("Error(Loss): ", loss)

![loss](https://www.researchgate.net/publication/329960546/figure/fig2/AS:865846410899458@1583445279578/Weight-update-by-gradient-descent-in-the-cost-function.png)

Adjusting Weights: **Adjusting Knobs** Through backpropagation, the model:

- Increases the negative weight of water and fertilizer inputs as they move beyond their optimal ranges, reflecting the detrimental effects of both excessive and insufficient application.
- Fine-tunes the bias and weights to better capture the complex, non-linear relationships between inputs and crop yield, aiming for the optimal use of resources.

In [None]:
# Forward propagation

def forward(x, weights, bias):
    z = np.dot(x, weights) + bias
    return sigmoid(z)

# change weights
# Try changing the weights and bias to see how they affect the error
weights = [1.4, 0.4]
bias = 0.1
x = [0.6, 0.3]
crop_yield = forward(x, weights, bias)
loss = loss_function(crop_yield, actual)
print("Error(Loss): ", loss)


Making Predictions: First, the network makes predictions based on its current settings (weights). Think of these weights like knobs that can be turned to change the network's behavior.

Measuring Mistakes: After making predictions, the network looks at how far off it was from the correct answers. This difference is called the loss, and the network's goal is to make this as small as possible.

Asking "How Much?": The network then asks, "How much does each weight affect the loss?" To find this out, it computes the partial derivatives of the loss function with respect to each weight. These partial derivatives are called gradients.

Finding Direction: The gradients tell the network not just how much, but also in which direction to adjust each weight (knob) to reduce the mistakes. If a gradient is positive, reducing the weight decreases the loss, and if it's negative, increasing the weight does.

Adjusting Knobs: Finally, the network slightly adjusts each knob (weight) in the direction indicated by the gradients to make better predictions next time. This step is repeated many times, and with each repetition, the network gets better at making predictions.
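
The "How Much?" and "Finding Direction" steps can be illustrated without any calculus by nudging each weight a tiny bit and watching the loss. This is only a sketch: the input, weights, bias, and target are assumed to be the crop-yield values used in this notebook, and `neuron_loss` and `eps` are names introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neuron_loss(weights, bias, x, target):
    """Squared error of a single sigmoid neuron on one example."""
    output = sigmoid(np.dot(x, weights) + bias)
    return (output - target) ** 2

# Assumed values, matching the crop-yield example
x = np.array([0.6, 0.3])
weights = np.array([0.6, 0.2])
bias = 0.1
target = 0.81

eps = 1e-6  # size of the nudge applied to one weight at a time
base = neuron_loss(weights, bias, x, target)

numeric_grads = np.zeros_like(weights)
for i in range(len(weights)):
    nudged = weights.copy()
    nudged[i] += eps
    # Change in loss per unit change in weight i (finite-difference estimate)
    numeric_grads[i] = (neuron_loss(nudged, bias, x, target) - base) / eps

print("Approximate gradients:", numeric_grads)
```

Both estimates come out negative for these values, which means increasing either weight would reduce the loss.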

---------------

--------------

The gradients for updating the weights and bias are calculated using the chain rule as follows:

**Dominoes** - the error travels backward like falling dominoes: first through the output, then the activation, then the summation (z).

- Gradient with respect to weights:
  $$dLoss/dWeights = dLoss/dOutput \cdot dOutput/dZ \cdot dZ/dWeights$$
  
- Gradient with respect to bias:
  $$dLoss/dBias = dLoss/dOutput \cdot dOutput/dZ \cdot dZ/dBias$$
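
For the single sigmoid neuron with a squared-error loss, each factor of the chain can be written out explicitly. This is a sketch that assumes the symbol $y$ for the actual yield and $L = (output - y)^2$ for the loss; these symbols are introduced here for illustration.

$$\frac{\partial L}{\partial output} = 2\,(output - y), \qquad \frac{\partial output}{\partial z} = output\,(1 - output), \qquad \frac{\partial z}{\partial w_i} = x_i, \qquad \frac{\partial z}{\partial b} = 1$$

Multiplying the factors gives:

$$\frac{\partial L}{\partial w_i} = 2\,(output - y) \cdot output\,(1 - output) \cdot x_i$$
$$\frac{\partial L}{\partial b} = 2\,(output - y) \cdot output\,(1 - output)$$

The constant factor of 2 only rescales the step size, so it can be folded into the learning rate.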


Feel free to skip the calculus if you find it too complicated. Essentially, we are trying to find out how much effect each weight (on the water and fertilizer inputs) has on our final loss.



Using a learning rate $\eta$, we update the weights and bias as follows:

$$w_i^{new} = w_i - \eta \cdot \frac{\partial L}{\partial w_i}$$
$$b^{new} = b - \eta \cdot \frac{\partial L}{\partial b}$$
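
To get a feel for what $\eta$ does, the sketch below applies the update rule 50 times with three different learning rates and compares the final loss. The data point, starting weights, and the helper name `train` are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Assumed single training example (crop-yield values)
x = np.array([0.6, 0.3])
target = 0.81

def train(eta, steps=50):
    """Run `steps` gradient-descent updates and return the final loss."""
    w = np.array([0.6, 0.2])
    b = 0.1
    for _ in range(steps):
        out = sigmoid(np.dot(x, w) + b)
        grad_z = 2 * (out - target) * out * (1 - out)  # dL/dz via the chain rule
        w = w - eta * grad_z * x   # w_new = w - eta * dL/dw
        b = b - eta * grad_z       # b_new = b - eta * dL/db
    return (sigmoid(np.dot(x, w) + b) - target) ** 2

initial_loss = (sigmoid(np.dot(x, np.array([0.6, 0.2])) + 0.1) - target) ** 2
losses = {eta: train(eta) for eta in (0.01, 0.1, 1.0)}

print("Initial loss:", initial_loss)
for eta, final_loss in losses.items():
    print(f"eta={eta}: loss after 50 steps = {final_loss}")
```

On this easy one-point problem a larger learning rate simply converges faster; on real datasets a learning rate that is too large can overshoot and make training unstable.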


In [None]:


def backward(x, weights, bias, output, target, learning_rate):
    """Perform backpropagation and update the weights and bias."""
    # Compute the derivative of the loss with respect to output
    dLoss_dOutput = -(target - output)  # we ignore the factor of 2 for simplicity

    # Compute the derivative of the output with respect to z
    dOutput_dZ = output * (1 - output)

    # Compute the gradient of the loss with respect to weights
    dLoss_dWeights = dLoss_dOutput * dOutput_dZ * x

    # Compute the gradient of the loss with respect to bias
    dLoss_dBias = dLoss_dOutput * dOutput_dZ


    # Update the weights and bias
    weights -= learning_rate * dLoss_dWeights
    bias -= learning_rate * dLoss_dBias

    return weights, bias

Let's implement the forward and backward pass together. One forward pass followed by one backward pass is called an **Iteration** (**Terminology**).

In [None]:

weights = np.array([0.6, 0.2])
bias = np.array(0.1)
x = np.array([0.6, 0.3])


crop_yield = forward(x, weights, bias)
loss = loss_function(crop_yield, actual)
print("Error(Loss): ", loss)

weights, bias = backward(x, weights, bias, crop_yield, actual, 0.1)
print("Updated weights: ", weights, "Updated bias: ", bias)

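We would perform this iteratively for all the samples in our dataset. We can update the weights after each example, after a batch of examples, or after the whole dataset.


Updating after each example is Stochastic Gradient Descent, updating after a small batch is Mini-batch Gradient Descent, and updating after the whole dataset is Batch Gradient Descent (**Terminology**).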
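
The three update schedules differ only in how many examples contribute to each weight update. Below is a minimal sketch on a hypothetical toy dataset; the function name `train`, the dataset, and the chosen learning rate are all assumptions for illustration, not part of the workshop utilities.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, y, batch_size, epochs=200, eta=0.5, seed=0):
    """Train one sigmoid neuron, averaging gradients over each batch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the examples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            out = sigmoid(X[idx] @ w + b)
            grad_z = 2 * (out - y[idx]) * out * (1 - out)
            w -= eta * (X[idx].T @ grad_z) / len(idx)  # mean gradient over the batch
            b -= eta * grad_z.mean()
    return np.mean((sigmoid(X @ w + b) - y) ** 2)

# Hypothetical toy dataset: 64 examples with 2 features in [0, 1)
rng = np.random.default_rng(42)
X = rng.random((64, 2))
y = sigmoid(X @ np.array([1.5, -1.0]) + 0.2)

# Loss of the untrained neuron (all-zero weights predict 0.5 everywhere)
initial_mse = np.mean((0.5 - y) ** 2)

results = {
    "stochastic (batch_size=1)": train(X, y, batch_size=1),
    "mini-batch (batch_size=16)": train(X, y, batch_size=16),
    "batch (batch_size=64)": train(X, y, batch_size=64),
}
print("Initial MSE:", initial_mse)
for name, mse in results.items():
    print(name, "-> final MSE:", mse)
```

With the same number of epochs, smaller batches perform more updates per epoch, which is one reason they often reach a lower loss sooner.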

In [None]:
# Since we have only one data point, every update uses the whole dataset.
# This is Batch Gradient Descent with a batch size of 1 (identical to Stochastic Gradient Descent in this case).


weights = np.array([0.6, 0.2])

bias = np.array(0.1)
x = np.array([0.6, 0.3])

initial_wb = [weights.copy(), bias.copy()]
for epoch in range(100):
    crop_yield = forward(x, weights, bias)
    loss = loss_function(crop_yield, actual)
    weights, bias = backward(x, weights, bias, crop_yield, actual, learning_rate=0.1)
    print(f"Epoch {epoch} Error(Loss): {loss}")



In [None]:
print("Initial weights and bias: ", initial_wb[0], initial_wb[1])
print("Updated weights and bias: ", weights, bias)

# Let's try a generated dataset

In [None]:
from utils import generate_data, plot_data

In [None]:
x1, x2, y = generate_data(1000)
fig = plot_data(x1, x2, y)
fig.show()

***Normalization*** (**Terminology**) is a data-preparation step that rescales all features to a similar range (here, 0 to 1). This is important because:

- Helps Learn Faster: Gradient descent usually converges faster when all features share a similar scale.
- Fair Treatment: Features with large numeric ranges do not overpower features with small ranges.
- Better Predictions: Training is more stable, which usually leads to more accurate predictions.
- Works Well with Many Models: Some machine learning models (for example neural networks and distance-based methods) need normalized data to work correctly.
- Avoids Problems: Prevents numerical issues that can happen when features are on very different scales.
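
As a rough sketch of what min-max normalization does under the hood, each value is rescaled as `(value - min) / (max - min)`. The feature values below are made up for illustration, and `min_max` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical raw features on very different scales
water = np.array([200.0, 450.0, 800.0, 1000.0])  # e.g. gallons
fertilizer = np.array([0.5, 1.0, 2.5, 3.0])      # e.g. lb

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    return (values - values.min()) / (values.max() - values.min())

water_scaled = min_max(water)
fertilizer_scaled = min_max(fertilizer)

print("Water:     ", water_scaled)
print("Fertilizer:", fertilizer_scaled)
```

After scaling, both features lie between 0 and 1, so neither dominates the weighted sum just because of its units.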

In [None]:
# normalize the data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X1 = scaler.fit_transform(x1.reshape(-1, 1)).flatten()
X2 = scaler.fit_transform(x2.reshape(-1, 1)).flatten()

X = np.array([X1, X2]).T


In [None]:
X

In [None]:
# Lets manually initialize the weights and bias
weights = np.array([-0.2, 0.4])
bias = np.array([0.4])

In [None]:
# Start with a single example
x_input = X[5]
actual = y[5]

output = forward(x_input, weights, bias)
loss = loss_function(output, actual)
print("Error(Loss): ", loss)
weights, bias = backward(x_input, weights, bias, output, actual, learning_rate=0.1)
print("Updated weights and bias: ", weights, bias)

Single Epoch

In [None]:
epoch_loss = 0
for iteration, (x_input,actual) in enumerate(zip(X, y)):
    output = forward(x_input, weights, bias)
    loss = loss_function(output, actual)
    weights, bias = backward(x_input, weights, bias, output, actual, learning_rate=0.1)

    # print("Previous output:", output, "Previous loss:", loss)
    # print("Updated output:", updated_output, "Updated loss:", updated_loss)
    epoch_loss += loss

epoch_loss = epoch_loss / len(X)
print("First Epoch loss:", epoch_loss)

Now for 100 epochs

In [None]:
weights = np.array([-0.2, 0.4])
bias = np.array([0.4])
epoch_losses = []
for epoch in range(100): # This is the number of times we iterate through the entire dataset

    epoch_loss = 0
    for iteration, (x_input, actual) in enumerate(zip(X, y)):
        output = forward(x_input, weights, bias)
        loss = loss_function(output, actual)
        weights, bias = backward(x_input, weights, bias, output, actual, learning_rate=0.01)

        # print("Previous output:", output, "Previous loss:", loss)
        # print("Updated output:", updated_output, "Updated loss:", updated_loss)
        epoch_loss += loss

    epoch_loss = epoch_loss / len(X)
    epoch_losses.append(epoch_loss)
    print(f"Epoch {epoch} loss:", epoch_loss[0])

Same data with TensorFlow

In [None]:
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import SGD  # the `experimental` namespace is not available in all TensorFlow versions


np.random.seed(402)
tf.random.set_seed(42)
weights = np.array([-0.2, 0.4])
bias = np.array([0.4])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=402)


# Define the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, activation='sigmoid', input_shape=(2,),
                          kernel_initializer=tf.keras.initializers.Constant(weights),
                          bias_initializer=tf.keras.initializers.Constant(bias))
])
model.summary()

model.compile(optimizer='SGD', loss='mse')




#### Number of Parameters
- ResNet-50 -> ~25M

- GPT-4 -> not officially disclosed (unofficial estimates are around 1.76 trillion parameters)

- Llama 2 -> 7B, 13B, 70B
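
For fully connected (Dense) layers, the parameter count can be worked out by hand: each unit has one weight per input plus one bias. The sketch below uses a hypothetical helper `dense_params` and an assumed stack of layers with 2 inputs followed by 6, 3, and 1 units.

```python
def dense_params(n_inputs, units):
    """Each unit has n_inputs weights plus 1 bias."""
    return (n_inputs + 1) * units

# Our single neuron: 2 weights + 1 bias
single_neuron = dense_params(2, 1)

# Assumed small network: 2 inputs -> 6 units -> 3 units -> 1 unit
layer_sizes = [2, 6, 3, 1]
total = sum(dense_params(n_in, n_out)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print("Single neuron parameters:", single_neuron)  # 3
print("Small network parameters:", total)          # 18 + 21 + 4 = 43
```

These numbers can be compared against the "Total params" line printed by `model.summary()`.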

In [None]:
# Train the model
# validation_split holds out 20% of the training data; the test set stays untouched for the final evaluation
model.fit(X_train, y_train, epochs=10, batch_size=1, validation_split=0.2)

In [None]:

# Define the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=6, activation='relu', input_shape=(2,),
                          ),
    tf.keras.layers.Dense(units=3, activation='relu'),
    tf.keras.layers.Dense(units=1)

])
model.summary()  # summary() prints by itself; wrapping it in print() adds a stray "None"

model.compile(optimizer='adam', loss='mse')

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)


Try changing the parameters and hyperparameters

##### Machine Learning Model vs Neural Network

In [None]:
from sklearn.metrics import r2_score, mean_squared_error
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error for Neural Network :", mse)

In [None]:

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=100, n_jobs=-1)
rf.fit(X_train, y_train.ravel())
y_pred = rf.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error for Random Forest:", mse)

### Parameters vs Hyperparameters

- Definition: Parameters are learned from data; hyperparameters are set before training.
- Role: Parameters make predictions; hyperparameters guide the learning process.
- Adjustment: Parameters adjust automatically; hyperparameters are chosen manually (or can be searched using algorithms).
- Examples: Parameters are weights/biases; hyperparameters include learning rate, epochs.
- Optimization: Parameters optimized during training; hyperparameters through testing various settings.
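
One way to "search using algorithms" is a grid search with cross-validation. The sketch below uses scikit-learn's `GridSearchCV` on a made-up regression dataset; the data, the grid values, and the variable names are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical toy regression data
rng = np.random.default_rng(0)
X_demo = rng.random((200, 2))
y_demo = np.sin(3 * X_demo[:, 0]) + X_demo[:, 1] ** 2 + rng.normal(0, 0.05, 200)

# Hyperparameters: set before training, tried out one combination at a time
param_grid = {"n_estimators": [10, 50], "max_depth": [2, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,                              # 3-fold cross-validation
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
)
search.fit(X_demo, y_demo)  # the parameters (the trees) are learned here, per combination

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

The same idea applies to neural networks, where the grid would contain settings such as learning rate, number of epochs, or layer sizes.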

In [None]:
import pandas as pd
df = pd.read_csv('bmi_data.csv')

In [None]:
df.head()

This is a classification task: we are trying to predict the BMI category (the `Index` column) from Gender, Height and Weight.

#### Preprocessing

- Convert Gender to a numeric categorical variable.
- Normalize the input data for better neural network performance.
- Split the data into training and testing sets.

In [None]:
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})
df.head()

In [None]:

X = df[['Gender', 'Height', 'Weight']].values

# Normalize X
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)




For a classification problem we have to change a few things:
- Input shape
- Activation in the output layer
- Loss function

You can use 6 outputs with softmax and either one-hot encoded labels (`categorical_crossentropy`) or integer labels 0 to 5 (`sparse_categorical_crossentropy`). A single sigmoid output is only suitable for binary classification. The loss function depends on what you choose for the output layer.

In this example we are using one-hot encoded y and softmax with categorical_crossentropy.

0 -> [1, 0, 0, 0, 0, 0]

1 -> [0, 1, 0, 0, 0, 0]

2 -> [0, 0, 1, 0, 0, 0]

and so on
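
The mapping above can be reproduced with plain NumPy by picking rows from an identity matrix. This is a small sketch with made-up labels; `labels` and `num_classes` are names chosen for illustration.

```python
import numpy as np

labels = np.array([0, 2, 5, 1])  # hypothetical class indices
num_classes = 6

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)

# argmax turns a one-hot vector back into its class index
recovered = one_hot.argmax(axis=1)
print(recovered)
```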

In [None]:
from tensorflow.keras.utils import to_categorical
Y = to_categorical(df['Index'].values)

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X_normalized, Y, test_size=0.2, random_state=42)

In [None]:
# Classification model: 3 input features, 6 output classes
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=16, activation='relu', input_shape=(3,),
                          ),
    tf.keras.layers.Dense(units=8, activation='relu'),
    tf.keras.layers.Dense(units=6, activation='softmax'),

])
model.summary()  # summary() prints by itself; wrapping it in print() adds a stray "None"

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy']) # for classification problems, we use categorical_crossentropy

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)


# Softmax activation
![Softmax](https://docs-assets.developer.apple.com/published/c2185dfdcf/0ab139bc-3ff6-49d2-8b36-dcc98ef31102.png)
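
Softmax turns the raw scores of the output layer into probabilities that are positive and sum to 1. Here is a minimal NumPy sketch; the scores are invented for illustration, and subtracting the maximum is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw outputs of the last layer for the 6 classes
scores = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0])
probs = softmax(scores)

print("Probabilities:", probs)
print("Sum:", probs.sum())
print("Predicted class:", probs.argmax())
```

The predicted class is the one with the highest probability, which is also the one with the highest raw score.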