***A Single Neuron***

!['Single_neuron'](https://raw.githubusercontent.com/sagunkayastha/CAI_Workshop/main/Workshop_s2/images/i1.png)

Predicting crop yield based on two inputs: ***(Regression task)*** 
- the amount of water provided to the crop (irrigation)  
- the amount of fertilizer used. 



-----------------
Both of these factors have optimal ranges, and too much or too little of either can negatively impact the crop yield.

- Water (Irrigation): Essential for growth, but both too little and too much can reduce crop yield due to drought stress or waterlogging.

- Fertilizer: Needed for nutrients; however, too little can stunt growth due to deficiency, and too much can harm yield through toxicity and environmental damage.






---- 
Let's define weights (importance) based on their impact on crop yield.

In [1]:
import numpy as np
weights = [0.6, 0.2]  # assumption: water matters more to yield than fertilizer
bias = 0.1
# 0.6 is the weight of the first input (water), 0.2 is the weight of the second input (fertilizer)
# inputs: 0.6 gallons of water and 0.3 lb of fertilizer
x = [0.6, 0.3]


Combining Inputs: In our model, these inputs are weighted based on their impact on crop yield, with a bias term included to account for other factors influencing yield (such as soil quality or pest levels).

$$z = \sum_{i=1}^{n} x_i w_i + b$$

$$output = \sigma(z)$$


The model calculates the predicted crop yield by balancing the effects of water and fertilizer. 

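The relationship we want to capture is non-linear: both inputs contribute positively to yield up to a point, but beyond their optimal ranges the effect reverses and becomes negative. A single sigmoid neuron is monotonic in each input, so on its own it can only approximate one side of that curve; capturing the full shape takes more neurons.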

In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))


z = (x[0] * weights[0]) + (x[1] * weights[1]) + bias
crop_yield = sigmoid(z)
print("Crop yield for 0.6 gallons of water and 0.3 lb of fertilizer: ", crop_yield)

Crop yield for 0.6 gallons of water and 0.3 lb of fertilizer:  0.6271477663131956
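
As a check on the printed value, the same forward pass can be worked by hand from the weights, bias, and inputs defined above (rounded to three decimals):

$$z = 0.6 \cdot 0.6 + 0.3 \cdot 0.2 + 0.1 = 0.36 + 0.06 + 0.1 = 0.52$$

$$\sigma(0.52) = \frac{1}{1 + e^{-0.52}} \approx 0.627$$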


Learning from Feedback (Backpropagation):
After the harvest, the actual yields are compared to the model's predictions. 

This feedback allows the model to adjust the weights of water and fertilizer inputs. 

If yields are lower than expected at extreme values of either input, the model learns to adjust the importance (weight) it assigns to staying within optimal ranges.

Let's suppose the actual yield was **0.81**.

In [3]:
# Squared-error loss for a single sample (MSE is the mean of this over many samples)
def loss_function(predicted, real):
    return (predicted - real) **2

actual = 0.81
loss = loss_function(crop_yield, actual)
print("Error(Loss): ", loss)

Error(Loss):  0.033434939364253756


![loss](https://www.researchgate.net/publication/329960546/figure/fig2/AS:865846410899458@1583445279578/Weight-update-by-gradient-descent-in-the-cost-function.png)

Adjusting Weights: **Adjusting Knobs** Through backpropagation, the model:

- Moves the weights of the water and fertilizer inputs in whichever direction reduces the error, so an input whose importance was over- or under-estimated gets corrected.
- Fine-tunes the bias and weights to better capture the complex, non-linear relationships between inputs and crop yield, aiming for the optimal use of resources.

In [4]:
# Forward propagation

def forward(x, weights, bias):
    z = np.dot(x, weights) + bias
    return sigmoid(z)

# change weights 
# Try changing the weights and bias to see how they affect the error
weights = [0.1, 0.5]
bias = 0.56
x = [0.6, 0.3]  
crop_yield = forward(x, weights, bias)
loss = loss_function(crop_yield, actual)
print("Error(Loss): ", loss)


Error(Loss):  0.015996964321260375



Making Predictions: First, the network makes predictions based on its current settings (weights). Think of these weights like knobs that can be turned to change the network's behavior.

Measuring Mistakes: After making predictions, the network looks at how far off it was from the correct answers. This difference is called the loss, and the network's goal is to make this as small as possible.

Asking "How Much?": The network then asks, "How much does each weight affect the loss?" To find this out, it computes the partial derivatives of the loss function with respect to each weight. These partial derivatives are called gradients.

Finding Direction: The gradients tell the network not just how much, but also in which direction to adjust each weight (knob) to reduce the mistakes. If a gradient is positive, reducing the weight decreases the loss, and if it's negative, increasing the weight does.

Adjusting Knobs: Finally, the network slightly adjusts each knob (weight) in the direction indicated by the gradients to make better predictions next time. This step is repeated many times, and with each repetition, the network gets better at making predictions.

---------------

--------------

**How much does each weight affect loss or how important is a particular weight**

The gradients for updating the weights and bias are calculated using the chain rule as follows: 

**Dominoes** - first output then activation then summation(z)

- Gradient with respect to weights:
  $$dLoss/dWeights = dLoss/dOutput \cdot dOutput/dZ \cdot dZ/dWeights$$
  
- Gradient with respect to bias:
  $$dLoss/dBias = dLoss/dOutput \cdot dOutput/dZ \cdot dZ/dBias$$


Ignore the calculus if you find it too complicated. Basically, we are trying to find out how much effect each weight (the one on water and the one on fertilizer) has on the final loss.
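
The question "how much does each weight affect the loss?" can also be answered without calculus: nudge one weight by a tiny amount and see how the loss changes. The sketch below does that for the crop example and compares the result with the chain-rule gradient. The step size `eps` and the helper `loss_for` are assumptions made for illustration. Note that the full chain rule includes a factor of 2 from the squared error, which the notebook's `backward` function drops for simplicity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_for(weights, bias, x, target):
    """Squared error of a single sigmoid neuron on one sample."""
    z = sum(xi * wi for xi, wi in zip(x, weights)) + bias
    return (sigmoid(z) - target) ** 2

x = [0.6, 0.3]          # water, fertilizer
weights = [0.6, 0.2]
bias = 0.1
target = 0.81
eps = 1e-6              # hypothetical nudge size

# Numerical gradient: change in loss per unit change in each weight
base = loss_for(weights, bias, x, target)
numeric_grads = []
for i in range(len(weights)):
    nudged = list(weights)
    nudged[i] += eps
    numeric_grads.append((loss_for(nudged, bias, x, target) - base) / eps)

# Chain-rule gradient: dLoss/dOutput * dOutput/dZ * dZ/dWeights
output = sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)
analytic_grads = [2 * (output - target) * output * (1 - output) * xi for xi in x]

print("numeric :", numeric_grads)
print("analytic:", analytic_grads)
```

Both gradients come out negative here, which means increasing either weight lowers the loss for this sample.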



Using a learning rate $\eta$, we update the weights and bias as follows:

$$w_i^{new} = w_i - \eta \cdot \frac{\partial L}{\partial w_i}$$
$$b^{new} = b - \eta \cdot \frac{\partial L}{\partial b}$$


In [5]:


def backward(x, weights, bias, output, target, learning_rate):
    """Perform backpropagation and update the weights and bias."""
    # Compute the derivative of the loss with respect to output
    dLoss_dOutput = -(target - output)  # we ignore the factor of 2 for simplicity

    # Compute the derivative of the output with respect to z
    dOutput_dZ = output * (1 - output)
    
    # Compute the gradient of the loss with respect to weights
    dLoss_dWeights = dLoss_dOutput * dOutput_dZ * x

    # Compute the gradient of the loss with respect to bias
    dLoss_dBias = dLoss_dOutput * dOutput_dZ


    # Update the weights and bias
    weights -= learning_rate * dLoss_dWeights
    bias -= learning_rate * dLoss_dBias

    return weights, bias

Let's implement the forward and backward passes together. One forward pass followed by one backward pass is called an **Iteration** (**Terminology**).

In [6]:

weights = np.array([0.6, 0.2])
bias = np.array(0.1)
x = np.array([0.6, 0.3])


crop_yield = forward(x, weights, bias)
loss = loss_function(crop_yield, actual)
print("Error(Loss): ", loss)

weights, bias = backward(x, weights, bias, crop_yield, actual, 0.1)
print("Updated weights: ", weights, "Updated bias: ", bias)

Error(Loss):  0.033434939364253756
Updated weights:  [0.60256542 0.20128271] Updated bias:  0.10427569678243
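
The printed update can be reproduced by hand from the chain rule, with the factor of 2 dropped as in `backward` and values rounded to four decimals:

$$\frac{\partial L}{\partial b} \approx (0.6271 - 0.81) \cdot 0.6271 \cdot (1 - 0.6271) \approx -0.0428$$

$$b^{new} = 0.1 - 0.1 \cdot (-0.0428) \approx 0.1043$$

$$w_1^{new} = 0.6 - 0.1 \cdot (-0.0428 \cdot 0.6) \approx 0.6026$$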


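We would perform this iteratively for all the samples in our dataset. We can update the weights after each example, after a batch of examples, or after the whole dataset.

Stochastic Gradient Descent (one example per update), Mini-batch Gradient Descent (a batch of examples per update), Batch Gradient Descent (the whole dataset per update) (**Terminology**)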
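
These update schedules differ only in how many examples are averaged before each weight update. Below is a minimal sketch, assuming a made-up four-sample dataset and a hypothetical helper `train_one_epoch`; it uses the same simplified gradient as `backward` (factor of 2 dropped).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_one_epoch(samples, targets, weights, bias, learning_rate, batch_size):
    """One pass over the data, updating once per batch with the averaged gradient."""
    weights = list(weights)
    for start in range(0, len(samples), batch_size):
        batch_x = samples[start:start + batch_size]
        batch_y = targets[start:start + batch_size]
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(batch_x, batch_y):
            output = sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)
            delta = (output - y) * output * (1 - output)  # factor of 2 dropped
            grad_w = [g + delta * xi for g, xi in zip(grad_w, x)]
            grad_b += delta
        n = len(batch_x)
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Made-up toy data: [water, fertilizer] -> yield
samples = [[0.6, 0.3], [0.2, 0.9], [0.8, 0.1], [0.4, 0.5]]
targets = [0.81, 0.40, 0.70, 0.55]

# batch_size=1 -> stochastic (4 updates), 2 -> mini-batch (2 updates), 4 -> full batch (1 update)
sgd_w, sgd_b = train_one_epoch(samples, targets, [0.6, 0.2], 0.1, 0.1, batch_size=1)
mini_w, mini_b = train_one_epoch(samples, targets, [0.6, 0.2], 0.1, 0.1, batch_size=2)
full_w, full_b = train_one_epoch(samples, targets, [0.6, 0.2], 0.1, 0.1, batch_size=len(samples))

print("SGD       :", sgd_w, sgd_b)
print("Mini-batch:", mini_w, mini_b)
print("Full batch:", full_w, full_b)
```

With a single sample and `batch_size=1`, this helper performs the same update as the `backward` call shown earlier.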

In [7]:
# Since we have only one data point, each pass over the data is a single update.
# This is effectively Batch Gradient Descent (every data point is used for each update), with a batch size of 1.

weights = np.array([0.6, 0.2])

bias = np.array(0.1)
x = np.array([0.6, 0.3])

initial_wb = [weights.copy(), bias.copy()]
for epoch in range(100):
    crop_yield = forward(x, weights, bias)
    loss = loss_function(crop_yield, actual)
    weights, bias = backward(x, weights, bias, crop_yield, actual, learning_rate=0.1)
    print(f"Error(Loss) epoch : {epoch} {loss}")


Error(Loss) epoch : 0 0.033434939364253756
Error(Loss) epoch : 1 0.03290729108643822
Error(Loss) epoch : 2 0.032389608201726115
Error(Loss) epoch : 3 0.03188167940840362
Error(Loss) epoch : 4 0.031383297959557674
Error(Loss) epoch : 5 0.03089426158373801
Error(Loss) epoch : 6 0.030414372405329917
Error(Loss) epoch : 7 0.029943436864789445
Error(Loss) epoch : 8 0.029481265638878904
Error(Loss) epoch : 9 0.029027673561031
Error(Loss) epoch : 10 0.02858247954195932
Error(Loss) epoch : 11 0.028145506490621124
Error(Loss) epoch : 12 0.02771658123563171
Error(Loss) epoch : 13 0.027295534447218824
Error(Loss) epoch : 14 0.026882200559798602
Error(Loss) epoch : 15 0.026476417695245096
Error(Loss) epoch : 16 0.026078027586921494
Error(Loss) epoch : 17 0.02568687550452988
Error(Loss) epoch : 18 0.025302810179834895
Error(Loss) epoch : 19 0.024925683733306187
Error(Loss) epoch : 20 0.024555351601723212
Error(Loss) epoch : 21 0.024191672466777253
Error(Loss) epoch : 22 0.02383450818470382
...

In [8]:
print("Initial weights and bias: ", initial_wb[0], initial_wb[1])
print("Updated weights and bias: ", weights, bias)

Initial weights and bias:  [0.6 0.2] 0.1
Updated weights and bias:  [0.77149565 0.28574782] 0.3858260804059064


# Let's try a generated dataset

In [9]:
from utils.utils import generate_data, plot_data

In [10]:
x1, x2, y = generate_data(1000)
fig = plot_data(x1, x2, y)
fig.show()

***Normalization*** (Terminology) is a data-preparation step that rescales every feature to a similar range (here, 0 to 1). This is important because:

- Helps Learn Faster: Gradient descent usually converges in fewer steps when all inputs are on a similar scale.
- Fair Treatment: No feature dominates the others just because its raw numbers are larger.
- Better Predictions: Training tends to be more stable, which often leads to more accurate predictions.
- Works Well with Many Models: Some machine learning models need normalized data to work correctly.
- Avoids Problems: Prevents numerical issues that can happen when features are on very different scales.
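
scikit-learn's `MinMaxScaler`, used in the next cell, maps each feature to the 0-to-1 range with `(x - min) / (max - min)` under its default settings. Below is a minimal sketch of that formula in plain Python, using made-up water amounts; the helper `min_max_scale` is hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no scale information; return zeros in this sketch
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

water_gallons = [2.0, 10.0, 6.0, 4.0]  # made-up raw values
scaled = min_max_scale(water_gallons)
print(scaled)
```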

In [11]:
# normalize the data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X1 = scaler.fit_transform(x1.reshape(-1, 1)).flatten()
X2 = scaler.fit_transform(x2.reshape(-1, 1)).flatten()

X = np.array([X1, X2]).T


In [12]:
X

array([[0.37173493, 0.18260941],
       [0.95075462, 0.54073995],
       [0.73095408, 0.87304912],
       ...,
       [0.13283943, 0.06599082],
       [0.95027532, 0.05404206],
       [0.44355353, 0.28003421]])

In [13]:
# Let's manually initialize the weights and bias
weights = np.array([-0.2, 0.4])
bias = np.array([0.4])

In [14]:
# Start with a single example
x_input = X[900]
actual = y[900]

output = forward(x_input, weights, bias)
loss = loss_function(output, actual)
print("Error(Loss): ", loss)
weights, bias = backward(x_input, weights, bias, output, actual, learning_rate=0.1)
print("Updated weights and bias: ", weights, bias)

Error(Loss):  [606.65951257]
Updated weights and bias:  [-0.08425026  0.71344   ] [0.9666839]


Single Epoch

In [15]:
epoch_loss = 0
for iteration, (x_input,actual) in enumerate(zip(X, y)):
    output = forward(x_input, weights, bias)
    loss = loss_function(output, actual)
    weights, bias = backward(x_input, weights, bias, output, actual, learning_rate=0.1)

    epoch_loss += loss

epoch_loss = epoch_loss / len(X)
print("First Epoch loss:", epoch_loss)

First Epoch loss: [10022.72649758]


Now for 100 epochs

In [16]:
weights = np.array([-0.2, 0.4])
bias = np.array([0.4])
epoch_losses = []
for epoch in range(100): # This is the number of times we iterate through the entire dataset

    epoch_loss = 0
    for iteration, (x_input, actual) in enumerate(zip(X, y)):
        output = forward(x_input, weights, bias)
        loss = loss_function(output, actual)
        weights, bias = backward(x_input, weights, bias, output, actual, learning_rate=0.01)

        # print("Previous output:", output, "Previous loss:", loss)
        # print("Updated output:", updated_output, "Updated loss:", updated_loss)
        epoch_loss += loss

    epoch_loss = epoch_loss / len(X)
    epoch_losses.append(epoch_loss)
    print(f"Epoch loss: {epoch}", epoch_loss[0])

Epoch loss: 0 10023.517666565402
Epoch loss: 1 10022.709938004873
Epoch loss: 2 10022.67171622697
Epoch loss: 3 10022.655868239188
Epoch loss: 4 10022.647112818275
Epoch loss: 5 10022.641540692855
Epoch loss: 6 10022.637676454788
Epoch loss: 7 10022.634836445844
Epoch loss: 8 10022.632659687835
Epoch loss: 9 10022.630937333246
Epoch loss: 10 10022.629540056389
Epoch loss: 11 10022.62838344643
Epoch loss: 12 10022.627410042829
Epoch loss: 13 10022.626579361757
Epoch loss: 14 10022.625862050414
Epoch loss: 15 10022.625236303169
Epoch loss: 16 10022.62468557953
Epoch loss: 17 10022.624197103794
Epoch loss: 18 10022.623760850329
Epoch loss: 19 10022.623368840623
Epoch loss: 20 10022.62301464556
Epoch loss: 21 10022.62269302635
Epoch loss: 22 10022.622399670807
Epoch loss: 23 10022.622130997188
Epoch loss: 24 10022.621884005355
Epoch loss: 25 10022.621656163365
Epoch loss: 26 10022.62144531951
Epoch loss: 27 10022.621249633286
Epoch loss: 28 10022.621067521162
Epoch loss: 29 10022.620897613
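
One likely reason the loss above barely moves: a sigmoid output is confined to the range 0 to 1, while the single-example loss of about 606 printed earlier implies a target roughly 25 units away from the prediction. The sketch below uses a hypothetical target of 25 (an assumption, since the actual values of `y` are not shown) to illustrate that no choice of weights can push the squared error below `(1 - target) ** 2`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

target = 25.0  # hypothetical target, chosen only for illustration

# Squared error for increasingly large pre-activations z
losses = {z: (sigmoid(z) - target) ** 2 for z in (-10.0, 0.0, 10.0, 100.0)}

# Sigmoid never exceeds 1, so this is the smallest loss any weights could reach
loss_floor = (1.0 - target) ** 2

for z, loss in losses.items():
    print(f"z = {z:6.1f}  output = {sigmoid(z):.4f}  loss = {loss:.2f}")
print("Lowest reachable loss:", loss_floor)
```

The usual remedies are to scale `y` into the sigmoid's range or to remove the activation from the output. The multi-layer Keras model later in this notebook ends in a `Dense(units=1)` layer with no activation, which removes this ceiling.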

Same data with TensorFlow

In [17]:
import tensorflow as tf
from sklearn.model_selection import train_test_split


np.random.seed(402)
tf.random.set_seed(42)
weights = np.array([-0.2, 0.4])
bias = np.array([0.4])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=402)


# Define the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, activation='sigmoid', input_shape=(2,),
                          kernel_initializer=tf.keras.initializers.Constant(weights),
                          bias_initializer=tf.keras.initializers.Constant(bias))
])
model.summary()

model.compile(optimizer='SGD', loss='mse')





Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 1)                 3         
                                                                 
Total params: 3 (12.00 Byte)
Trainable params: 3 (12.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________



#### Number of Parameters
- ResNet-50 -> about 25M

- GPT-4 -> not officially disclosed (unofficial estimates put it around 1.76 trillion parameters)

- Llama 2 -> 7B, 13B, 70B
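
The `Param #` column in the summary above follows a simple rule for `Dense` layers: one weight per input per unit, plus one bias per unit. Below is a minimal sketch (the helper names are hypothetical) that reproduces the count for the single neuron and for the two larger architectures defined later in this notebook.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def model_param_count(n_features, layer_units):
    """Total parameters of a stack of Dense layers with the given unit counts."""
    total = 0
    n_inputs = n_features
    for units in layer_units:
        total += dense_param_count(n_inputs, units)
        n_inputs = units  # this layer's outputs feed the next layer
    return total

print(model_param_count(2, [1]))         # single neuron with two inputs
print(model_param_count(2, [6, 5, 1]))   # regression network with two hidden layers
print(model_param_count(3, [16, 8, 6]))  # BMI classifier with three inputs
```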

In [18]:
# Train the model
# validation_data takes precedence over validation_split, so only one of them is passed
model.fit(X_train, y_train, epochs=10, batch_size=1, validation_data=(X_test, y_test))

Epoch 1/10

113/800 [===>..........................] - ETA: 0s - loss: 11818.6436

Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x158b4987d90>

In [19]:

# Define the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=6, activation='relu', input_shape=(2,),
                          ),
    tf.keras.layers.Dense(units=5, activation='relu'),
    tf.keras.layers.Dense(units=1)

])
print(model.summary())

model.compile(optimizer='adam', loss='mse')

# Train the model
model.fit(X_train, y_train, epochs=500, batch_size=32, validation_split=0.2)


Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_1 (Dense)             (None, 6)                 18        
                                                                 
 dense_2 (Dense)             (None, 5)                 35        
                                                                 
 dense_3 (Dense)             (None, 1)                 6         
                                                                 
Total params: 59 (236.00 Byte)
Trainable params: 59 (236.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None


Epoch 1/500
...

<keras.src.callbacks.History at 0x158b5b4e550>

##### Machine Learning Model vs Neural Network

In [20]:
from sklearn.metrics import r2_score, mean_squared_error
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error for Neural Network :", mse)

Mean Squared Error for Neural Network : 6.463613956419199


In [21]:

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=100, n_jobs=-1)
rf.fit(X_train, y_train.ravel())
y_pred = rf.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error for Random Forest:", mse)

Mean Squared Error for Random Forest: 6.4331221269193986


### Parameters vs Hyperparameters

- Definition: Parameters are learned from data; hyperparameters are set before training.
- Role: Parameters make predictions; hyperparameters guide the learning process.
- Adjustment: Parameters adjust automatically; hyperparameters are chosen manually (or can be searched using algorithms).
- Examples: Parameters are weights/biases; hyperparameters include learning rate, epochs.
- Optimization: Parameters are optimized during training; hyperparameters are tuned by testing various settings.
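
To make the distinction concrete, the sketch below treats the learning rate as a hyperparameter and tries a few hand-picked values (the candidate list is an assumption) on the single-sample crop example from earlier. The weights and bias are the parameters that each run learns; `final_loss` is a hypothetical helper, not part of the notebook.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def final_loss(learning_rate, epochs=100):
    """Train one sigmoid neuron on one sample and return the loss after training."""
    weights = [0.6, 0.2]   # parameters: learned during training
    bias = 0.1
    x = [0.6, 0.3]
    target = 0.81
    for _ in range(epochs):
        output = sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)
        delta = (output - target) * output * (1 - output)  # factor of 2 dropped
        weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * delta
    output = sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)
    return (output - target) ** 2

# Hyperparameter: chosen before training, then compared across runs
candidates = [0.001, 0.01, 0.1, 1.0]
results = {lr: final_loss(lr) for lr in candidates}
best_lr = min(results, key=results.get)

for lr, loss in results.items():
    print(f"learning_rate = {lr}: final loss = {loss:.6f}")
print("Best learning rate in this search:", best_lr)
```

On a real dataset the comparison would use a validation set rather than the training loss, and very large learning rates can make training unstable.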

## BMI Dataset

In [22]:
import pandas as pd
df = pd.read_csv('bmi_data.csv')

In [23]:
df.head()

Unnamed: 0,Gender,Height,Weight,Index
0,Male,174,96,4
1,Male,189,87,2
2,Female,185,110,4
3,Female,195,104,3
4,Male,149,61,3


This is a classification task: we are trying to predict the BMI category (the `Index` column) from Gender, Height and Weight.

#### Preprocessing

- Convert Gender to numeric categorical variable
- Normalize the input data for better neural network performance.
- Split the data into training and testing sets.

In [24]:
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})
df.head()

Unnamed: 0,Gender,Height,Weight,Index
0,0,174,96,4
1,0,189,87,2
2,1,185,110,4
3,1,195,104,3
4,0,149,61,3


In [25]:

X = df[['Gender', 'Height', 'Weight']].values  

# Normalize X
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)




For a classification problem we have to change a few things:
- Input shape
- Activation in the output layer
- Loss function

You can have 6 outputs (one-hot encoded) with softmax, or 1 output with sigmoid. A sigmoid output only ranges from 0 to 1, so with a single output the class index (0 to 5) would have to be rescaled to fit. The loss function will depend on what you choose for the output layer.

In this example we are using one-hot encoded y, softmax with categorical_crossentropy

0 -> [1, 0, 0, 0, 0, 0]

1 -> [0, 1, 0, 0, 0, 0]

2 -> [0, 0, 1, 0, 0, 0]

and so on
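
The mapping above can be sketched in a few lines of plain Python. The helper `one_hot` is hypothetical; the notebook itself uses Keras's `to_categorical`, which returns a float array rather than lists of integers.

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1 at position `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("index must be between 0 and num_classes - 1")
    row = [0] * num_classes
    row[index] = 1
    return row

labels = [4, 2, 4, 3, 3]  # the Index values shown by df.head() above
encoded = [one_hot(label, 6) for label in labels]

# The position of the 1 recovers the original class
decoded = [row.index(1) for row in encoded]

for label, row in zip(labels, encoded):
    print(label, "->", row)
```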

In [26]:
from tensorflow.keras.utils import to_categorical
Y = to_categorical(df['Index'].values)

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X_normalized, Y, test_size=0.2, random_state=42)

In [27]:
# For classification problems
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=16, activation='relu', input_shape=(3,),
                          ),
    tf.keras.layers.Dense(units=8, activation='relu'),
    tf.keras.layers.Dense(units=6, activation='softmax'),

])
print(model.summary())

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy']) # for classification problems, we use categorical_crossentropy

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)


Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_4 (Dense)             (None, 16)                64        
                                                                 
 dense_5 (Dense)             (None, 8)                 136       
                                                                 
 dense_6 (Dense)             (None, 6)                 54        
                                                                 
Total params: 254 (1016.00 Byte)
Trainable params: 254 (1016.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None
Epoch 1/50

...

<keras.src.callbacks.History at 0x158b813d4f0>

# Softmax activation

![Softmax](https://docs-assets.developer.apple.com/published/c2185dfdcf/0ab139bc-3ff6-49d2-8b36-dcc98ef31102.png)
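
Softmax turns the raw scores of the output layer into probabilities that are positive and sum to 1, so the largest score becomes the most likely class. Below is a minimal sketch in plain Python, using made-up scores for the six BMI classes; subtracting the maximum score before exponentiating is a common trick for numerical stability and does not change the result.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 2.0, 0.5, 0.1, 3.0, 0.2]  # made-up raw outputs, one per class
probs = softmax(scores)
predicted_class = probs.index(max(probs))

print("Probabilities:", [round(p, 3) for p in probs])
print("Predicted class:", predicted_class)
```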