## 1. Look up the Adam optimization functions in PyTorch
https://pytorch.org/docs/stable/optim.html . 
- How does it work? Try at least one other optimization function with the diabetes dataset shown in class. 
    - Optimizers work to reduce losses by changing the attributes of the neural network such as weights and learning rate.Adam(Adaptive moment) works by maintaining a learning rate for each network weight (parameter) and separately adapted as learning unfolds. Adam combines benefits of both AdaGrad(Adaptive Gradient) and RMSProp(Root Mean Square propogation).
    - Epoch number: 1 with loss: 0.6604259610176086
    - Epoch number: 491 with loss: 0.43983137607574463
    - accuracy_score - 0.7272727272727273
    - recall - 0.73
- How does the model perform with the new optimizer? 
    - Tried different optimizers Adagrad, SGD,Adadelta accuracy didn't improve infact it went down
    - With RMSprop optimizer accuracy went up to 0.72 
        - Epoch number: 1 with loss: 0.6150843501091003 
        - Epoch number: 491 with loss: 0.05505194514989853
        - acuuracy_score - 0.7207792207792207
        - recall - 0.72
- Did it perform better or worse than Adam? Why do you think that is?
    - Compared to Adam doing RMSprop Adam did perfom better.
    - Didn't quiet understand why values sometimes changed after every run ? after restaring kernel?

In [13]:
import pandas as pd
import torch

diabetes_df = pd.read_csv("../week_13/diabetes.csv")
diabetes_df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [14]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = diabetes_df.drop('Outcome', axis=1).values
y = diabetes_df['Outcome'].values

# Split into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42, stratify=y)

# #Standardize
sc= StandardScaler()
X_train=sc.fit_transform(X_train)
X_test=sc.fit_transform(X_test)

In [15]:
import torch.nn as nn
import torch.nn.functional as F #this has activation functions

# Creating tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

print(X_train)

tensor([[-0.8514, -0.9801, -0.4048,  ..., -0.6077,  0.3108, -0.7922],
        [ 0.3566,  0.1614,  0.4654,  ..., -0.3021, -0.1164,  0.5610],
        [-0.5494, -0.5045, -0.6223,  ...,  0.3726, -0.7649, -0.7076],
        ...,
        [-0.8514, -0.7582,  0.0303,  ...,  0.7800, -0.7861, -0.2847],
        [ 1.8665, -0.3142,  0.0303,  ..., -0.5695, -1.0194,  0.5610],
        [ 0.0546,  0.7322, -0.6223,  ..., -0.3149, -0.5770,  0.3073]])


In [16]:
class ANN_Model(nn.Module):
    def __init__(self, input_features=8, hidden1=20, hidden2=20, out_features =2):
        super().__init__()
        self.layer_1_connection = nn.Linear(input_features, hidden1)
        self.layer_2_connection = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, out_features)
    
    def forward(self, x):
        #apply activation functions
        x = F.relu(self.layer_1_connection(x))
        x = F.relu(self.layer_2_connection(x))
        x = self.out(x)
        return x

In [17]:
torch.manual_seed(42)

#instantiate the model
model = ANN_Model()

In [18]:
# loss function
loss_function = nn.CrossEntropyLoss()

#optimizer
#optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
#noted results for this learning rate
#optimizer =  torch.optim.SGD(model.parameters(), lr=0.05) #, momentum=0.9)

#According to what was discussed in group exercise, SparseAdam doesn't work here because this dataset doesn't have any missing/null values. SparseAdams is used for datasets where
#there are missing values.
#optimizer = torch.optim.SparseAdam(model.parameters(), lr=0.01,betas=0.9)
#optimizer = torch.optim.Adadelta(model.parameters(), lr=0.05 )#, rho=0.9, eps=1e-06, weight_decay=0)
#optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01,lr_decay=0, weight_decay=0, initial_accumulator_value=0)

optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
#Adamw?
#how fast they tune?
#how quickly the model can perform?
#different optimizer for regression problems?


In [19]:
#run model through multiple epochs/iterations
final_loss = []
n_epochs = 500
for epoch in range(n_epochs):
    y_pred = model.forward(X_train)
    loss = loss_function(y_pred, y_train)
    final_loss.append(loss)
    
    if epoch % 10 == 1:
        print(f'Epoch number: {epoch} with loss: {loss.item()}')
    
    optimizer.zero_grad() #zero the gradient before running backwards propagation
    loss.backward() #for backward propagation 
    optimizer.step() #performs one optimization step each epoch
    

Epoch number: 1 with loss: 0.6150843501091003
Epoch number: 11 with loss: 0.43112674355506897
Epoch number: 21 with loss: 0.41004201769828796
Epoch number: 31 with loss: 0.3770589232444763
Epoch number: 41 with loss: 0.361624538898468
Epoch number: 51 with loss: 0.34350842237472534
Epoch number: 61 with loss: 0.32837292551994324
Epoch number: 71 with loss: 0.3073282837867737
Epoch number: 81 with loss: 0.28974857926368713
Epoch number: 91 with loss: 0.2859925627708435
Epoch number: 101 with loss: 0.264140784740448
Epoch number: 111 with loss: 0.24424584209918976
Epoch number: 121 with loss: 0.23626987636089325
Epoch number: 131 with loss: 0.22080349922180176
Epoch number: 141 with loss: 0.2224786877632141
Epoch number: 151 with loss: 0.20888471603393555
Epoch number: 161 with loss: 0.20124228298664093
Epoch number: 171 with loss: 0.18742360174655914
Epoch number: 181 with loss: 0.187859907746315
Epoch number: 191 with loss: 0.17149567604064941
Epoch number: 201 with loss: 0.16896587610

In [20]:
#predictions
y_pred = []

with torch.no_grad():
    for i, data in enumerate(X_test):
        prediction = model(data)
        y_pred.append(prediction.argmax().item())


In [21]:
from sklearn.metrics import accuracy_score
a_score = accuracy_score(y_test, y_pred)
print(a_score)

0.7207792207792207


In [22]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.76      0.84      0.80       100
           1       0.63      0.50      0.56        54

    accuracy                           0.72       154
   macro avg       0.69      0.67      0.68       154
weighted avg       0.71      0.72      0.71       154



## 2. Write a function that lists and counts the number of divisors for an input value.
- Example 1:
- Input: 5
- Output: “There are 2 divisors: 1 and 5”
- Example 2:
- Input: 40
- Output: “There are 8 divisors: 1, 2, 4, 5, 8, 10, 20, and 40”

In [23]:
def getDivisorsAndCount(number):
    if number > 0:
        count = 0
        divisors = []
        for i in range(1,number+1):
            if number%i == 0:
                count = count + 1
                divisors.append(i)
        
    my_string = ','.join(map(str, divisors)) 
    new =' and '
    old=','
    maxreplace = 1

    result = new.join(my_string.rsplit(old, maxreplace))
    print("There are ",count,"divisors: " , result)
    

In [24]:
getDivisorsAndCount(5)
getDivisorsAndCount(40)
getDivisorsAndCount(13)

There are  2 divisors:  1 and 5
There are  8 divisors:  1,2,4,5,8,10,20 and 40
There are  2 divisors:  1 and 13
