# Question 1:

Look up the Adam optimization functions in PyTorch
https://pytorch.org/docs/stable/optim.html . How does it work? Try at least one other
optimization function with the diabetes dataset shown in class. How does the model
perform with the new optimizer? Did it perform better or worse than Adam? Why do you
think that is?

Optimizers work by updating the parameters of the model at each step. At each step the optimizer can evaluate the model and return the loss. The Adam optimization functions uses a per-parameter learning rate that can change at each step.  It is similar to stochastic gradient descent, however it builds on that because it updates the learning rate for all parameters throughout training of the model, while stochastic gradient descent keeps the same learning rate throughout.  

In [141]:
import pandas as pd
import torch

diabetes_df = pd.read_csv("../week_13/diabetes.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [142]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X=diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)


In [143]:
import torch.nn as nn
import torch.nn.functional as F #this has activation functions

#Creating tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

print(X_train)

tensor([[1.0000e+00, 9.0000e+01, 6.2000e+01,  ..., 2.7200e+01, 5.8000e-01,
         2.4000e+01],
        [5.0000e+00, 1.2600e+02, 7.8000e+01,  ..., 2.9600e+01, 4.3900e-01,
         4.0000e+01],
        [2.0000e+00, 1.0500e+02, 5.8000e+01,  ..., 3.4900e+01, 2.2500e-01,
         2.5000e+01],
        ...,
        [1.0000e+00, 9.7000e+01, 7.0000e+01,  ..., 3.8100e+01, 2.1800e-01,
         3.0000e+01],
        [1.0000e+01, 1.1100e+02, 7.0000e+01,  ..., 2.7500e+01, 1.4100e-01,
         4.0000e+01],
        [4.0000e+00, 1.4400e+02, 5.8000e+01,  ..., 2.9500e+01, 2.8700e-01,
         3.7000e+01]])


In [146]:
class ANN_Model(nn.Module):
    def __init__(self, input_features=8, hidden1=20, hidden2=20, out_features=2):
        super().__init__()
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
        
    def forward(self, x): 
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [147]:
torch.manual_seed(42)

#instantiate the model
model = ANN_Model()

In [148]:
# Loss function
loss_function = nn.CrossEntropyLoss()

#Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)

In [149]:
#run model through multiple epochs
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = model.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1: 
        print(f'Epoch number: {epoch} with loss: {loss.item()}')
        
    optimizer.zero_grad()  #clears the gradient before running backwards propagation
    loss.backward() #for backward propagation
    optimizer.step() #performs one optimization step each epoch

Epoch number: 1 with loss: 0.7310296297073364
Epoch number: 11 with loss: 0.6581035852432251
Epoch number: 21 with loss: 0.6340878009796143
Epoch number: 31 with loss: 0.6144365668296814
Epoch number: 41 with loss: 0.591715395450592
Epoch number: 51 with loss: 0.5678935050964355
Epoch number: 61 with loss: 0.5529173612594604
Epoch number: 71 with loss: 0.5399057865142822
Epoch number: 81 with loss: 0.5277354121208191
Epoch number: 91 with loss: 0.5180846452713013
Epoch number: 101 with loss: 0.5108497738838196
Epoch number: 111 with loss: 0.5028660893440247
Epoch number: 121 with loss: 0.4947950541973114
Epoch number: 131 with loss: 0.48683962225914
Epoch number: 141 with loss: 0.47834235429763794
Epoch number: 151 with loss: 0.4712027311325073
Epoch number: 161 with loss: 0.46509668231010437
Epoch number: 171 with loss: 0.4602738320827484
Epoch number: 181 with loss: 0.45450371503829956
Epoch number: 191 with loss: 0.4466153681278229
Epoch number: 201 with loss: 0.44014832377433777
Ep

In [150]:
#predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = model(data)
        y_pred.append(prediction.argmax().item())

In [151]:
from sklearn.metrics import accuracy_score
a_score = accuracy_score(y_test, y_pred)
print(a_score)

0.7207792207792207


In [152]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.78      0.79      0.79       100
           1       0.60      0.59      0.60        54

    accuracy                           0.72       154
   macro avg       0.69      0.69      0.69       154
weighted avg       0.72      0.72      0.72       154



# Use a different optimization: Stochastic Gradient Descent

In [174]:
#Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)

In [175]:
#run model through multiple epochs
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = model.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1: 
        print(f'Epoch number: {epoch} with loss: {loss.item()}')
        
    optimizer.zero_grad()  #clears the gradient before running backwards propagation
    loss.backward() #for backward propagation
    optimizer.step() #performs one optimization step each epoch

Epoch number: 1 with loss: 0.8413105010986328
Epoch number: 11 with loss: 4.089559078216553
Epoch number: 21 with loss: 3.4423537254333496
Epoch number: 31 with loss: 2.8429617881774902
Epoch number: 41 with loss: 2.2629141807556152
Epoch number: 51 with loss: 1.7017086744308472
Epoch number: 61 with loss: 1.1851328611373901
Epoch number: 71 with loss: 0.8169639110565186
Epoch number: 81 with loss: 0.6704997420310974
Epoch number: 91 with loss: 0.6415143013000488
Epoch number: 101 with loss: 0.6374567151069641
Epoch number: 111 with loss: 0.636874258518219
Epoch number: 121 with loss: 0.6367053389549255
Epoch number: 131 with loss: 0.6366013288497925
Epoch number: 141 with loss: 0.636509120464325
Epoch number: 151 with loss: 0.6364208459854126
Epoch number: 161 with loss: 0.6363317966461182
Epoch number: 171 with loss: 0.6362389922142029
Epoch number: 181 with loss: 0.6361460089683533
Epoch number: 191 with loss: 0.6360546350479126
Epoch number: 201 with loss: 0.6359624862670898
Epoch 

In [176]:
#predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = model(data)
        y_pred.append(prediction.argmax().item())

In [177]:
from sklearn.metrics import accuracy_score
a_score = accuracy_score(y_test, y_pred)
print(a_score)

0.6428571428571429


In [178]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.65      0.99      0.78       100
           1       0.00      0.00      0.00        54

    accuracy                           0.64       154
   macro avg       0.32      0.49      0.39       154
weighted avg       0.42      0.64      0.51       154



Stochastic Gradient Descent didn't perform as well as Adam.  The accuracy score was 0.64 for SGD, while it was 0.72 for Adam.  This is because it uses a constant learning rate throughout, instead of an updated, per-parameter learning rate throughout like Adam. 

# Question 2:

Write a function that lists and counts the number of divisors for an input value.
Example 1:
Input: 5
Output: “There are 2 divisors: 1 and 5”
Example 2:
Input: 40
Output: “There are 8 divisors: 1, 2, 4, 5, 8, 10, 20, and 40"

In [137]:
def num_divisors(n):
    num_list = []
    for x in range(1,1000,1):
        if n % x == 0:
            num_list.append(x)
    name_string = ', '.join([str(x) for x in num_list[0:-1]]) + ' and ' + str(num_list[-1])
    return "There are " + str(len(num_list)) + " divisors: " + name_string

In [138]:
num_divisors(40)

'There are 8 divisors: 1, 2, 4, 5, 8, 10, 20 and 40'

In [139]:
num_divisors(5)

'There are 2 divisors: 1 and 5'

In [140]:
num_divisors(18)

'There are 6 divisors: 1, 2, 3, 6, 9 and 18'