<h2 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Shallow Neural Network Practice 
</font>
</h2>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
🤖 In this project, we will implement the structure of a neural network and its training algorithm from scratch, based on the concepts learned in previous lessons. Coding at this foundational level gives a practical, in-depth understanding of how neural network models operate, and should make it easier to analyze other neural network structures encountered later.

In this project, we will fully design a Shallow Neural Network and train it on the well-known Pima Indians Diabetes dataset. Finally, we will evaluate the model's performance on this dataset.
</font>
</p>


In [1]:
import numpy as np
import pandas as pd

<h2 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Dataset
</font>
</h2>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
In this project, we will use the simple yet famous <b>Pima Indians Diabetes</b> dataset. This dataset contains information on 768 Native American women from the Pima tribe, collected to examine risk factors for Type 2 diabetes. The features are the number of pregnancies, blood glucose level, blood pressure, skin thickness, insulin level, body mass index (BMI), the diabetes pedigree function (a score summarizing family history of diabetes), and age.
</font>
</p>

<h3 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Reading the Dataset
</font>
</h3>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
First, we need to read the dataset file. We can read the training data from the <code>diabetes_train.csv</code> file located in the <code>data</code> folder and use the samples in it to train the model.
The performance of our model will be evaluated on the <code>diabetes_test.csv</code> data, which is structured similarly to the training data except that the <code>Outcome</code> column has been removed.
</font>
</p>


In [2]:
train_data = pd.read_csv("data/diabetes_train.csv")
train_data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [3]:
test_data = pd.read_csv("data/diabetes_test.csv")
test_data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,6,98,58,33,190,34.0,0.43,43
1,9,154,78,30,100,30.9,0.164,45
2,6,165,68,26,168,33.6,0.631,49
3,1,99,58,10,0,25.4,0.551,21
4,10,68,106,23,49,35.5,0.285,47


<h2 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Preprocessing
</font>
</h2>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
First, we store the target variable column (<code>Outcome</code>) in a separate dataframe and then remove this column from the <code>train_data</code> dataframe to create the equivalent matrices $X$ and $y$.
</font>
</p>


In [4]:
train_data_outcome = train_data['Outcome']
train_data.drop(columns=['Outcome'], inplace=True)

train_data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,6,148,72,35,0,33.6,0.627,50
1,1,85,66,29,0,26.6,0.351,31
2,8,183,64,0,0,23.3,0.672,32
3,1,89,66,23,94,28.1,0.167,21
4,0,137,40,35,168,43.1,2.288,33


<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
One important preprocessing step is feature scaling. Here we standardize each feature (often loosely called normalization), which puts all features on a comparable scale; it does not turn the data into a normal distribution. Scaling helps reduce large variations in the weights and accelerates the convergence of the model.
In this project, we adjust the values of each feature so that their mean becomes <code>0</code> and their variance becomes <code>1</code>. We can achieve this using the following formula.
For a data series <code>X = [x_1, x_2, ..., x_n]</code> (here, the values of one column), we subtract the mean from each sample (<code>x_i</code>) and divide the result by the standard deviation (sigma) to obtain the scaled series.
</font>
</p>

$$ z_i = \frac{x_i - \bar{x}}{\sigma} $$

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
It is important to note that, since we only have information about the training dataset when building the model, we should also use the mean and standard deviation of the training samples to normalize the test samples.
</font>
</p>
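
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
As a minimal sketch of this idea on toy numbers (not taken from the dataset), the helper below standardizes a training array and then reuses the training mean and standard deviation for the test array. The function name <code>standardize</code> and the toy values are assumptions made for illustration only.
</font>
</p>

```python
import numpy as np

def standardize(train, test):
    """Z-score both arrays column-wise using statistics of `train` only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)  # ddof=1 matches the pandas default for .std()
    return (train - mean) / std, (test - mean) / std

# Toy values, chosen only to make the arithmetic easy to follow
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
toy_test = np.array([[2.0, 30.0]])

train_z, test_z = standardize(toy_train, toy_test)
print(train_z.mean(axis=0))         # approximately 0 for every column
print(train_z.std(axis=0, ddof=1))  # approximately 1 for every column
print(test_z)                       # the toy test row equals the training mean, so it maps to zeros
```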


In [5]:
for column in train_data.columns:
  mean = train_data[column].mean()
  std = train_data[column].std()
  train_data[column]=(train_data[column]- mean)/std
  test_data[column]=(test_data[column] - mean)/std
    
train_data.head()


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,0.649833,0.854539,0.166518,0.90088,-0.687695,0.222281,0.438405,1.443781
1,-0.835754,-1.096441,-0.140758,0.526362,-0.687695,-0.672046,-0.370035,-0.178571
2,1.244068,1.938416,-0.243184,-1.283807,-0.687695,-1.093658,0.570216,-0.093184
3,-0.835754,-0.972569,-0.140758,0.151844,0.123855,-0.480405,-0.908995,-1.032441
4,-1.132872,0.513891,-1.47229,0.90088,0.762734,1.436011,5.303692,-0.007797


In [6]:

# Append a constant column of ones so that the bias is learned as one more weight
train_data['Bias'] = 1.0

test_data['Bias'] = 1.0

train_data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Bias
0,0.649833,0.854539,0.166518,0.90088,-0.687695,0.222281,0.438405,1.443781,1.0
1,-0.835754,-1.096441,-0.140758,0.526362,-0.687695,-0.672046,-0.370035,-0.178571,1.0
2,1.244068,1.938416,-0.243184,-1.283807,-0.687695,-1.093658,0.570216,-0.093184,1.0
3,-0.835754,-0.972569,-0.140758,0.151844,0.123855,-0.480405,-0.908995,-1.032441,1.0
4,-1.132872,0.513891,-1.47229,0.90088,0.762734,1.436011,5.303692,-0.007797,1.0


<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
Before designing and training the model, we first need to convert the dataset from a DataFrame to a <code>numpy</code> array. Therefore, in this step, we convert the <code>train_data</code> and <code>train_data_outcome</code> DataFrames to <code>numpy</code> arrays.
Additionally, we use the <code>train_test_split</code> function to split this dataset into training and validation sets, holding out 20% of the samples for validation (<code>test_size=0.2</code>). Finally, we transpose the feature matrices so that each column holds one sample, which matches the matrix formulas used in the model.
<br>
</font>
</p>

In [7]:
from sklearn.model_selection import train_test_split
train_data_numpy = train_data.to_numpy()
train_data_outcome_numpy = train_data_outcome.to_numpy()
X_train, X_validation, y_train, y_validation = train_test_split(train_data_numpy, train_data_outcome_numpy, test_size=0.2) 

X_train = X_train.T
X_validation = X_validation.T
y_train = y_train.T
y_validation = y_validation.T
test_data_numpy = test_data.to_numpy().T
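
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
To make the resulting shapes concrete, here is a small sketch on random toy data. It uses a NumPy-only shuffle as a stand-in for <code>train_test_split</code>, so the variable names and the fixed seed are assumptions rather than part of the notebook's pipeline.
</font>
</p>

```python
import numpy as np

rng = np.random.default_rng(0)                # fixed seed so the split is repeatable
X = rng.normal(size=(10, 9))                  # toy data: 10 samples, 8 features + bias
y = rng.integers(0, 2, size=10)               # toy binary labels

order = rng.permutation(len(X))               # shuffled sample indices
n_val = int(round(0.2 * len(X)))              # hold out 20% for validation
val_idx, train_idx = order[:n_val], order[n_val:]

X_tr, X_val = X[train_idx].T, X[val_idx].T    # transpose to (features, samples)
y_tr, y_val = y[train_idx], y[val_idx]

print(X_tr.shape, X_val.shape)                # (9, 8) (9, 2)
print(y_tr.shape, y_tr.T.shape)               # (8,) (8,)
```

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
Note that transposing a one-dimensional label array leaves its shape unchanged, so only the feature matrices are actually affected by <code>.T</code>.
</font>
</p>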

<h2 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Modeling
</font>
</h2>
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
Here, we are implementing a simple shallow model using the gradient descent method from scratch.
<br>
This model is a shallow neural network with one hidden layer containing <code>1000</code> neurons. The activation function for this layer is the Rectified Linear Unit (<code>ReLU</code>). The activation function for the output layer is the sigmoid function (<code>sigmoid</code>).
</font>
</p>

<center>

```python
sigmoid_Z = 1 / (1 + np.exp(-Z))
```
<br>

```python
ReLU_Z = np.maximum(0, Z)
```

</center>
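
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
As a quick numeric check of the two activation functions, the sketch below applies them to a few sample values. The helper names <code>sigmoid</code> and <code>relu</code> are chosen here for illustration.
</font>
</p>

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

def relu(z):
    # Keeps positive values and replaces negative values with 0
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))     # [0. 0. 2.]
print(sigmoid(z))  # approximately [0.119 0.5 0.881]
```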


<h2 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Creating the <code>Model</code> Class
</font>
</h2>
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
In this part, we need to create a class named <code>Model</code> that contains the following four methods. We will explain the details of each method in the subsequent sections.
</font>
</p>

```python
def __init__(self)
def predict(self, inputs)
def update_weights_for_one_epoch(self, inputs, outputs, learning_rate)
def fit(self, inputs, outputs, learning_rate, epochs=64)
```


<h3 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
The <code>__init__</code> Function
</font>
</h3>
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
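In the <code>__init__(self)</code> function, initialize the weights of the hidden and output layers (<code>w1</code> and <code>w2</code>) with random values drawn from a normal distribution with a mean of <code>0</code> and a standard deviation of <code>0.01</code>. With 8 features plus the bias column and <code>1000</code> hidden neurons, <code>w1</code> has shape <code>(1000, 9)</code> and <code>w2</code> has shape <code>(1, 1000)</code>.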
</font>
</p>


<h3 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
The <code>predict</code> Function
</font>
</h3>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
The <code>predict(self, inputs)</code> function takes the inputs and returns the outputs of both layers (<code>A_1</code> and <code>A_2</code>) in order.
This operation is performed according to the following formulas.
</font>
</p>

$$Z^{[1]}=W^{[1]}X$$
$$A^{[1]}=\mathrm{ReLU}(Z^{[1]})$$
$$Z^{[2]}=W^{[2]}A^{[1]}$$
$$A^{[2]}=\sigma(Z^{[2]})=\frac{1}{1+e^{-Z^{[2]}}}=Y_{pred}$$
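
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
The forward pass can be traced on toy dimensions as follows. This is only a sketch: the hidden layer is shrunk to 4 neurons for readability, and the random inputs and variable names are assumptions.
</font>
</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_samples = 9, 4, 5      # small hidden layer, for readability only

W1 = rng.normal(0.0, 0.01, size=(n_hidden, n_features))
W2 = rng.normal(0.0, 0.01, size=(1, n_hidden))
X = rng.normal(size=(n_features, n_samples))   # one column per sample

Z1 = W1 @ X                                    # hidden pre-activation, shape (4, 5)
A1 = np.maximum(0, Z1)                         # ReLU
Z2 = W2 @ A1                                   # output pre-activation, shape (1, 5)
A2 = 1 / (1 + np.exp(-Z2))                     # sigmoid, one probability per sample

print(Z1.shape, A1.shape, Z2.shape, A2.shape)  # (4, 5) (4, 5) (1, 5) (1, 5)
print(A2)                                      # all values close to 0.5 before training
```

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
Because the initial weights are very small, every output starts near <code>0.5</code>; training is what moves the predictions away from this value.
</font>
</p>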


<h3 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
The <code>update_weights_for_one_epoch</code> Function
</font>
</h3>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
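In the <code>update_weights_for_one_epoch(self, inputs, outputs, learning_rate)</code> function, we need to update the network weights for one <code>epoch</code>, that is, one gradient descent step computed over all given samples. The <code>learning_rate</code> argument is the learning rate $\alpha$ in the formulas below. These formulas correspond to the mean squared error cost, $cost = \frac{1}{n}\sum (Y_{true}-A^{[2]})^2$, where $n$ is the number of samples, $\odot$ denotes element-wise multiplication, $\bullet$ denotes matrix multiplication, and $\frac{\partial A^{[1]}}{\partial Z^{[1]}}$ is the ReLU derivative (<code>1</code> where $Z^{[1]} > 0$ and <code>0</code> elsewhere).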
</font>
</p>
                                                                                     
$$W^{[2]} = W^{[2]} + \Delta W^{[2]}$$
$$\Delta W^{[2]} = - \alpha \frac{\partial cost}{\partial W^{[2]}}$$
$$\frac{\partial cost}{\partial W^{[2]}} = (\frac{-2}{n}(Y_{true}-A^{[2]})\odot A^{[2]}\odot (1-A^{[2]}))\bullet A^{[1]T}$$
$$W^{[2]}=W^{[2]}+(\frac{2 \alpha}{n}(Y_{true}-A^{[2]})\odot A^{[2]}\odot (1-A^{[2]}))\bullet A^{[1]T}$$


$$W^{[1]} = W^{[1]} + \Delta W^{[1]}$$
$$\Delta W^{[1]} = - \alpha \frac{\partial cost}{\partial W^{[1]}}$$

$$\frac{\partial cost}{\partial W^{[1]}} = (((\frac{-2}{n}(Y_{true}-A^{[2]})\odot A^{[2]}\odot (1-A^{[2]}))^T\bullet W^{[2]})^T\odot \frac{\partial A^{[1]}}{\partial Z^{[1]}}) \bullet X^T$$

$$W^{[1]}=W^{[1]}+(((\frac{2 \alpha}{n}(Y_{true}-A^{[2]})\odot A^{[2]}\odot (1-A^{[2]}))^T\bullet W^{[2]})^T\odot \frac{\partial A^{[1]}}{\partial Z^{[1]}}) \bullet X^T$$
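
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
One way to gain confidence in these formulas is a numerical gradient check: compare the analytic gradients with finite differences of the cost. The sketch below does this on a tiny network, assuming the mean squared error cost implied by the $\frac{2}{n}$ factor. The helper names, the toy sizes, and the larger weight scale (<code>0.5</code>, used so the gradients are not vanishingly small) are assumptions for illustration.
</font>
</p>

```python
import numpy as np

def forward(W1, W2, X):
    A1 = np.maximum(0, W1 @ X)
    A2 = 1 / (1 + np.exp(-(W2 @ A1)))
    return A1, A2

def mse(W1, W2, X, y):
    _, A2 = forward(W1, W2, X)
    return np.mean((y - A2) ** 2)

def analytic_grads(W1, W2, X, y):
    n = X.shape[1]
    A1, A2 = forward(W1, W2, X)
    delta2 = (-2 / n) * (y - A2) * A2 * (1 - A2)   # d cost / d Z2, shape (1, n)
    dW2 = delta2 @ A1.T
    dW1 = ((W2.T @ delta2) * (A1 > 0)) @ X.T       # (A1 > 0) is the ReLU derivative
    return dW1, dW2

def numeric_grad(cost_fn, W, eps=1e-6):
    """Central finite differences; W is perturbed in place and then restored."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        original = W[idx]
        W[idx] = original + eps
        cost_up = cost_fn()
        W[idx] = original - eps
        cost_down = cost_fn()
        W[idx] = original
        grad[idx] = (cost_up - cost_down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 6))                         # 3 features, 6 samples (toy data)
y = rng.integers(0, 2, size=6).astype(float)
W1 = rng.normal(0.0, 0.5, size=(4, 3))
W2 = rng.normal(0.0, 0.5, size=(1, 4))

dW1, dW2 = analytic_grads(W1, W2, X, y)
num_dW1 = numeric_grad(lambda: mse(W1, W2, X, y), W1)
num_dW2 = numeric_grad(lambda: mse(W1, W2, X, y), W2)

# Both differences should be tiny if the formulas are implemented consistently
print(np.max(np.abs(dW1 - num_dW1)), np.max(np.abs(dW2 - num_dW2)))
```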


<h3 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
The <code>fit</code> Function
</font>
</h3>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
The <code>fit(self, inputs, outputs, learning_rate, epochs=64)</code> function updates the network weights for the specified number of epochs.
</font>
</p>


In [8]:
class Model:

    def __init__(self):

        self.w1 = np.random.normal(loc=0.0, scale=0.01, size=(1000, 9))
        self.w2 = np.random.normal(loc=0.0, scale=0.01, size=(1, 1000))

    def predict(self, inputs):
        x = inputs

        Z_1 = self.w1 @ x
        A_1 = np.maximum(0, Z_1)

        Z_2 = self.w2 @ A_1
        A_2 = 1 / (1 + np.exp(-Z_2))

        return A_1, A_2

    def update_weights_for_one_epoch(self, inputs, outputs, learning_rate):
        x, y_true = inputs, outputs
        A_1, A_2 = self.predict(inputs)

        n = inputs.shape[1]  # number of samples (one per column)

        # Negative gradient of the MSE cost with respect to Z_2, shape (1, n)
        shared_coefficient = (2/n) * (y_true - A_2) * A_2 * (1 - A_2)
        # Derivative of ReLU: 1 where the hidden unit is active, 0 elsewhere
        relu_gradient = np.where(A_1 > 0, 1, 0)

        # w1 is updated first so that its gradient uses the pre-update w2
        self.w1 += learning_rate * (self.w2.T @ shared_coefficient * relu_gradient) @ x.T
        self.w2 += learning_rate * shared_coefficient @ A_1.T

    def fit(self, inputs, outputs, learning_rate, epochs=64):
        for _ in range(epochs):
            self.update_weights_for_one_epoch(inputs, outputs, learning_rate)

<h3 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Training and Evaluation
</font>
</h3>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir">
After designing the network structure, we first create an object of this class using <code dir="ltr">Model()</code>, and then call the <code>fit</code> function with appropriate arguments to start training the model. It is worth experimenting with different learning rates (such as <code>0.1</code>, <code>0.01</code>, <code>0.001</code>, etc.) and different numbers of training epochs, and comparing the results on the validation samples.
To evaluate the accuracy of the model, we can use the <code dir="ltr">evaluation(model, inputs, outputs)</code> function.
</font>
</p>
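
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir">
A learning-rate comparison of this kind could be sketched as below. To keep the example self-contained it uses a compact stand-in class (<code>TinyModel</code>, with only 32 hidden neurons) and synthetic data, all of which are assumptions; with the notebook's own <code>Model</code>, <code>X_train</code> and <code>X_validation</code>, the same loop applies.
</font>
</p>

```python
import numpy as np

class TinyModel:
    """Compact stand-in for the Model class, with a configurable hidden size."""

    def __init__(self, n_features, n_hidden, rng):
        self.w1 = rng.normal(0.0, 0.01, size=(n_hidden, n_features))
        self.w2 = rng.normal(0.0, 0.01, size=(1, n_hidden))

    def predict(self, x):
        a1 = np.maximum(0, self.w1 @ x)
        a2 = 1 / (1 + np.exp(-(self.w2 @ a1)))
        return a1, a2

    def fit(self, x, y, learning_rate, epochs):
        n = x.shape[1]
        for _ in range(epochs):
            a1, a2 = self.predict(x)
            coeff = (2 / n) * (y - a2) * a2 * (1 - a2)
            step_w1 = ((self.w2.T @ coeff) * (a1 > 0)) @ x.T  # uses the pre-update w2
            self.w2 += learning_rate * coeff @ a1.T
            self.w1 += learning_rate * step_w1

def accuracy(model, x, y):
    _, a2 = model.predict(x)
    return float(np.mean((a2 > 0.5).ravel() == y.ravel()) * 100)

# Synthetic, roughly standardized data: 9 rows (8 features + bias), 200 samples
rng = np.random.default_rng(0)
x_all = rng.normal(size=(9, 200))
x_all[-1] = 1.0                                    # last row plays the role of the bias
y_all = (x_all[0] + x_all[1] > 0).astype(float)    # toy labelling rule
x_tr, y_tr = x_all[:, :160], y_all[:160]
x_val, y_val = x_all[:, 160:], y_all[160:]

results = {}
for lr in (0.1, 0.01, 0.001):
    model = TinyModel(9, 32, np.random.default_rng(1))  # same initial weights for each run
    model.fit(x_tr, y_tr, learning_rate=lr, epochs=100)
    results[lr] = accuracy(model, x_val, y_val)

print(results)  # validation accuracy (in percent) for each learning rate
```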


In [9]:
def evaluation(model, inputs, outputs):
  _, A_2 = model.predict(inputs)
  prediction = (A_2 > 0.5)
  return np.mean(prediction == outputs) * 100
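
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir">
The accuracy computation relies on NumPy broadcasting between the <code>(1, n)</code> predictions and the <code>(n,)</code> labels. The toy sketch below (values invented for illustration) shows the calculation, and also what would happen if the labels were shaped <code>(n, 1)</code> instead, a case that does not occur in this notebook but is easy to run into elsewhere.
</font>
</p>

```python
import numpy as np

probs = np.array([[0.9, 0.2, 0.6, 0.4]])   # shape (1, 4), like A_2
labels = np.array([1, 0, 0, 0])            # shape (4,), like y_validation

pred = probs > 0.5
print(np.mean(pred == labels) * 100)       # 75.0

# Column-shaped labels broadcast to a (4, 4) comparison and give a misleading score
column_labels = labels.reshape(-1, 1)
wrong = np.mean(pred == column_labels) * 100
print((pred == column_labels).shape, wrong)

# Flattening both sides makes the comparison independent of the input shapes
safe = np.mean(pred.ravel() == column_labels.ravel()) * 100
print(safe)                                # 75.0
```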

In [12]:
model = Model()
model.fit(X_train, y_train, learning_rate = 0.01, epochs = 100)

# Model evaluation 
print(f"your model accuracy on given set: {round(evaluation(model, X_validation, y_validation), 2)}%")

your model accuracy on given set: 71.64%


<h2 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Prediction for Test Data and Output
</font>
</h2>


<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir">
Finally, we need to compute the model's output for the test samples. To do this, first we obtain the model's output on the test data. If the model predicts a higher probability that an individual has diabetes (output greater than <code>0.5</code>), we predict that the individual has diabetes; otherwise, we predict that the individual does not have diabetes. 
<br>Therefore, the <code>prediction</code> variable, which is a NumPy array, will contain <code>True</code> and <code>False</code> values.
</font>
</p>
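
<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir">
As a small illustration of this decision rule on invented sigmoid outputs, the sketch below thresholds the probabilities and, as an optional extra step that the notebook itself does not perform, converts the boolean result to <code>0</code>/<code>1</code> labels in the same encoding as <code>Outcome</code>.
</font>
</p>

```python
import numpy as np

output = np.array([[0.81, 0.12, 0.50, 0.67]])  # toy sigmoid outputs, shape (1, 4)

prediction = output > 0.5                      # exactly 0.5 is not "greater than", so it maps to False
labels = prediction.astype(int).ravel()        # optional: flat 0/1 labels

print(prediction.shape, prediction.dtype)      # (1, 4) bool
print(labels)                                  # [1 0 0 1]
```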


In [14]:
_, output = model.predict(test_data_numpy)
prediction = output > 0.5

<h2 align="left" style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
<b>Result Generation</b>
</font>
</h2>

<p style="text-align: justify; line-height:200%; font-family:vazir; font-size:medium">
<font face="vazir" size=3>
We execute the cells below to create the files for the implemented classes and our results.
</font>
</p>


In [15]:
from inspect import getsource

inspect_model = Model()

with open("model.py", "w") as f:
    f.write('import numpy as np\n')
    f.write('class Model:\n')
    f.write(getsource(inspect_model.__init__)+'\n')
    f.write(getsource(inspect_model.predict)+'\n')
    f.write(getsource(inspect_model.update_weights_for_one_epoch)+"\n")
    f.write(getsource(inspect_model.fit)+'\n')

test_data.to_csv('processed_test_data.csv', index=False)
np.save("prediction.npy", prediction)

In [16]:
import zipfile
import os

if not os.path.exists(os.path.join(os.getcwd(), 'play_with_shallow.ipynb')):
    %notebook -e play_with_shallow.ipynb
    
def compress(file_names):
    print("File Paths:")
    print(file_names)
    compression = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile("result.zip", mode="w") as zf:
        for file_name in file_names:
            zf.write('./' + file_name, file_name, compress_type=compression)

file_names = ['processed_test_data.csv', 'model.py', 'prediction.npy', 'play_with_shallow.ipynb']
compress(file_names)

File Paths:
['processed_test_data.csv', 'model.py', 'prediction.npy', 'play_with_shallow.ipynb']
