# **Task 1: Observe How The Iteration, Gradient And Loss Values Change For The First Weight In The Data Set.**

Write code that produces the output below.

**Expected Output:**

```
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.250376  0.527516
2           3            0.196769  0.441138
3           4            0.163503  0.387700
4           5            0.140597  0.350752
..        ...                 ...       ...
95         96            0.010033  0.108377
96         97            0.009923  0.108025
97         98            0.009816  0.107678
98         99            0.009711  0.107337
99        100            0.009608  0.107001

[100 rows x 3 columns]
```

**Tips:**

* The modeling process and all other operations are still performed on all variables, as in the lesson contents; only the tracking is limited to a single variable.

* Since we want the gradient of a single variable, it is enough to select the first element of dw, where the gradients are calculated: **"dw[0]"**. This keeps the gradient of the first variable.

* It may be useful to define a dictionary as follows before starting the loop inside the **"gradient_descent"** function:

  **grad_tracking = {'Iteration': [], 'Feature 1 Gradient': [], 'Loss': []}**

  Then, on every pass through the for loop, append the iteration number **(epoch + 1)**, the gradient of the first variable **(dw[0])** and the loss **(loss)** for that iteration to the grad_tracking dictionary. In other words, the loop only appends values to the lists in a simple dictionary.

* We don't need the predict function; our goal is only to observe the iterations, gradients and loss.

* The output of the gradient_descent function should simply return grad_tracking.

* When calling the gradient_descent function, the arguments can be: gradient_descent(X_train_scaled, y_train, lr=0.1, epochs=100)

* The raw dictionary is hard to read when printed, so it is a good idea to store the output and then convert it to a dataframe.
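To make the **dw[0]** tip concrete, here is a minimal sketch on a made-up dataset with four samples and two features. The values of `X` and `y` are assumptions chosen for illustration and are not the breast cancer data. The sketch writes out in plain Python what the vectorized gradient expression computes, and shows that the gradient has one entry per feature, so index 0 picks the first feature.

```python
# Toy data: 4 samples, 2 features (illustrative values only).
X = [[1.0, 2.0],
     [-1.0, 0.5],
     [0.5, -1.5],
     [-0.5, -1.0]]
y = [1, 0, 1, 0]

# With all-zero weights and bias, z = 0 and sigmoid(0) = 0.5 for every sample.
y_pred = [0.5] * len(y)

# dw[j] is the mean over samples of (y_pred_i - y_i) * X[i][j],
# the same quantity as np.dot(X.T, (y_pred - y)) / len(y).
n_samples, n_features = len(X), len(X[0])
dw = [
    sum((y_pred[i] - y[i]) * X[i][j] for i in range(n_samples)) / n_samples
    for j in range(n_features)
]

first_feature_gradient = dw[0]  # the value that goes under 'Feature 1 Gradient'
print(dw)  # [-0.375, -0.125]
```

The weights are still updated with the full `dw` vector; only the first entry is recorded for display.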

# **Task 1 Solution**

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_loss(y, y_pred):
    epsilon = 1e-5  # keeps log() away from zero
    return -np.mean(y * np.log(y_pred + epsilon) + (1 - y) * np.log(1 - y_pred + epsilon))

def compute_gradients(X, y, y_pred):
    return np.dot(X.T, (y_pred - y)) / len(y)

def gradient_descent(X, y, lr=0.01, epochs=100):
    weights = np.zeros(X.shape[1])
    bias = 0

    # Dictionary that collects one entry per iteration.
    grad_tracking = {'Iteration': [], 'Feature 1 Gradient': [], 'Loss': []}

    for epoch in range(epochs):
        z = np.dot(X, weights) + bias
        y_pred = sigmoid(z)
        loss = compute_loss(y, y_pred)
        dw = compute_gradients(X, y, y_pred)
        db = np.mean(y_pred - y)

        grad_tracking['Iteration'].append(epoch + 1)        # iteration number (1-based)
        grad_tracking['Feature 1 Gradient'].append(dw[0])   # gradient of the first variable
        grad_tracking['Loss'].append(loss)                  # loss before this update

        weights -= lr * dw
        bias -= lr * db

    return grad_tracking
```

```python
grad_tracking = gradient_descent(X_train_scaled, y_train, lr=0.1, epochs=100)
grad_tracking_df = pd.DataFrame(grad_tracking)
print(grad_tracking_df)
```

```
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.250376  0.527516
2           3            0.196769  0.441138
3           4            0.163503  0.387700
4           5            0.140597  0.350752
..        ...                 ...       ...
95         96            0.010033  0.108377
96         97            0.009923  0.108025
97         98            0.009816  0.107678
98         99            0.009711  0.107337
99        100            0.009608  0.107001

[100 rows x 3 columns]
```
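The first loss value, 0.693127, can be checked by hand. With zero weights and zero bias every prediction is sigmoid(0) = 0.5, so the loss would be ln 2 ≈ 0.693147 if it were not for the small epsilon added inside the logarithms of compute_loss. The short sketch below reproduces that arithmetic with the standard library only; it assumes nothing beyond the epsilon value used in the solution.

```python
import math

epsilon = 1e-5   # same constant as in compute_loss
p = 0.5          # sigmoid(0): the prediction for every sample at the start

# For label 1 the loss term is -log(p + epsilon); for label 0 it is
# -log(1 - p + epsilon). With p = 0.5 both terms are equal, so the mean
# loss at the first iteration does not depend on the labels.
initial_loss = -math.log(p + epsilon)

print(round(initial_loss, 6))   # 0.693127
print(round(math.log(2), 6))    # 0.693147
```

This is why the first row is the same for every learning rate: the loss and gradient in iteration 1 are computed before any update has been applied.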


# **Task 2: Save The Output Of The "gradient_descent" Function And Convert It To A Dataframe.**

Call the gradient_descent function with the following three combinations, convert each result to a dataframe and print it.

1. gradient_descent(X_train_scaled, y_train, lr=0.1, epochs=100)
2. gradient_descent(X_train_scaled, y_train, lr=0.01, epochs=100)
3. gradient_descent(X_train_scaled, y_train, lr=0.001, epochs=100)

**Expected Output (one table per call, in the order listed above):**

```
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.250376  0.527516
2           3            0.196769  0.441138
3           4            0.163503  0.387700
4           5            0.140597  0.350752
..        ...                 ...       ...
95         96            0.010033  0.108377
96         97            0.009923  0.108025
97         98            0.009816  0.107678
98         99            0.009711  0.107337
99        100            0.009608  0.107001

[100 rows x 3 columns]
```




```
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.336617  0.673916
2           3            0.326640  0.655887
3           4            0.317060  0.638961
4           5            0.307888  0.623058
..        ...                 ...       ...
95         96            0.085450  0.259625
96         97            0.084800  0.258520
97         98            0.084160  0.257431
98         99            0.083530  0.256357
99        100            0.082909  0.255299

[100 rows x 3 columns]
```



```
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.345928  0.691178
2           3            0.344896  0.689242
3           4            0.343867  0.687318
4           5            0.342841  0.685405
..        ...                 ...       ...
95         96            0.265483  0.552141
96         97            0.264805  0.551030
97         98            0.264130  0.549925
98         99            0.263458  0.548826
99        100            0.262789  0.547733

[100 rows x 3 columns]
```

# **Task 2 Solution**

```python
first_iteration_results = gradient_descent(X_train_scaled, y_train, lr=0.1, epochs=100)
second_iteration_results = gradient_descent(X_train_scaled, y_train, lr=0.01, epochs=100)
third_iteration_results = gradient_descent(X_train_scaled, y_train, lr=0.001, epochs=100)

first_iteration_df = pd.DataFrame(first_iteration_results)
second_iteration_df = pd.DataFrame(second_iteration_results)
third_iteration_df = pd.DataFrame(third_iteration_results)
```

```python
print(first_iteration_df)
print("#"*80)
print(second_iteration_df)
print("#"*80)
print(third_iteration_df)
```

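```
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.250376  0.527516
2           3            0.196769  0.441138
3           4            0.163503  0.387700
4           5            0.140597  0.350752
..        ...                 ...       ...
95         96            0.010033  0.108377
96         97            0.009923  0.108025
97         98            0.009816  0.107678
98         99            0.009711  0.107337
99        100            0.009608  0.107001

[100 rows x 3 columns]
################################################################################
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.336617  0.673916
2           3            0.326640  0.655887
3           4            0.317060  0.638961
4           5            0.307888  0.623058
..        ...                 ...       ...
95         96            0.085450  0.259625
96         97            0.084800  0.258520
97         98            0.084160  0.257431
98         99            0.083530  0.256357
99        100            0.082909  0.255299

[100 rows x 3 columns]
################################################################################
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.345928  0.691178
2           3            0.344896  0.689242
3           4            0.343867  0.687318
4           5            0.342841  0.685405
..        ...                 ...       ...
95         96            0.265483  0.552141
96         97            0.264805  0.551030
97         98            0.264130  0.549925
98         99            0.263458  0.548826
99        100            0.262789  0.547733

[100 rows x 3 columns]
```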

# **Task 3: How Do Loss Values Change As The Learning Rate Value Changes? What Is The Reason?**

In the previous task, we used different learning rate values. How do the resulting loss values change with the learning rate, and why? Explain.

# **Task 3 Solution**

**The learning rate value** **(lr)** in the **first run** is **0.1**. The initial loss value is 0.693, and the loss value in the 100th iteration is 0.107. Since the learning rate is relatively high, the loss decreases rapidly in the first few iterations (it is already 0.351 by the 5th iteration). **With a learning rate of 0.1, the model optimizes more aggressively and quickly.**

**The learning rate value** in the **second run** is **0.01**. The initial loss value is 0.693, and the loss value in the 100th iteration is 0.255. Here the gradient and the loss shrink in smaller steps. **With a learning rate of 0.01, the learning process is more controlled and proceeds in smaller steps.**

**The learning rate value** in the **third run** is **0.001**. The initial loss value is 0.693, and the loss value in the 100th iteration is 0.548. Here the gradient and the loss decrease quite slowly. **A learning rate of 0.001 slows down the learning speed of the model significantly.**

The reason is that in the gradient descent algorithm the learning rate controls how much the model parameters change at each iteration: every update subtracts the learning rate times the gradient. A larger learning rate turns the same gradient into a larger step, so the parameters change faster; a smaller learning rate produces a smaller step, so the parameters change more slowly. All three runs start from the same point (zero weights, loss 0.693), so after the same 100 iterations the run that takes the largest steps has moved furthest towards the minimum.
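The same effect can be seen on a much smaller problem. The following is a minimal sketch, not the notebook's model: it minimizes the toy loss (w - 3)^2 with the same update rule and the same three learning rates. The function name `run_gradient_descent` and the toy loss are assumptions introduced only for this illustration.

```python
def run_gradient_descent(lr, epochs=100, w_start=0.0):
    """Minimize the toy loss (w - 3)^2 and return the loss after each update."""
    w = w_start
    losses = []
    for _ in range(epochs):
        grad = 2 * (w - 3)      # derivative of (w - 3)^2 with respect to w
        w -= lr * grad          # update rule: step size = lr * gradient
        losses.append((w - 3) ** 2)
    return losses

# Final loss after 100 updates for each learning rate used in Task 2.
final_losses = {lr: run_gradient_descent(lr)[-1] for lr in (0.1, 0.01, 0.001)}
for lr, loss in final_losses.items():
    print(f"lr={lr}: loss after 100 updates = {loss:.4f}")
```

As in the tables above, every run starts from the same point, and the smaller the learning rate, the higher the loss that remains after 100 updates.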

# **Task 4: What Is The New Weight Value As A Result Of The First Update Rule?**

```
    Iteration  Feature 1 Gradient      Loss
0           1            0.346963  0.693127
1           2            0.336617  0.673916
2           3            0.326640  0.655887
3           4            0.317060  0.638961
4           5            0.307888  0.623058
..        ...                 ...       ...
95         96            0.085450  0.259625
96         97            0.084800  0.258520
97         98            0.084160  0.257431
98         99            0.083530  0.256357
99        100            0.082909  0.255299

[100 rows x 3 columns]
```

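**We have these:**

* Learning rate: 0.001

* Gradient value in the first iteration: 0.346963 (the table above comes from the lr=0.01 run, but its first row is identical for every learning rate, because every run starts from zero weights)

* Initial value of the weight: 0

Given these values, what is the new weight value after the first update?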

# **Task 4 Solution**

```python
learning_rate = 0.001
gradient_value = 0.346963
initial_weight = 0

# Update rule: new weight = old weight - learning rate * gradient
new_weight = initial_weight - learning_rate * gradient_value
new_weight
```

```
-0.00034696300000000005
```
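The trailing digits in the result are a floating-point rounding artifact; the exact answer is -0.000346963. The sketch below, using only the standard library, applies the same update rule for each of the three learning rates from Task 2 and compares the result with a tolerance instead of `==`. The rounding to 9 decimal places is a display choice made for this illustration.

```python
import math

gradient_value = 0.346963   # first-iteration gradient from the table
initial_weight = 0.0

# First update of the first weight for each learning rate used in Task 2.
for learning_rate in (0.1, 0.01, 0.001):
    updated = initial_weight - learning_rate * gradient_value
    print(learning_rate, round(updated, 9))

# Floats should be compared with a tolerance rather than with ==.
new_weight = initial_weight - 0.001 * gradient_value
print(math.isclose(new_weight, -0.000346963))  # True
```

After this first update the three runs no longer share the same weights, which is why their gradients and losses differ from the second iteration onwards.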