
TASK 1:

The machine learning problem I would like to solve using logistic regression is classifying whether or not someone has diabetes.
Logistic regression is a natural choice for this problem because it models a binary outcome: it outputs the probability that
the answer is yes, and thresholding that probability (typically at 0.5) turns it into a yes or no. Therefore, logistic regression
can classify whether someone has diabetes based on the inputs.
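
As a minimal sketch of the yes/no decision described above: the model computes a linear score z = w·x + b, the sigmoid maps it to a probability, and the probability is compared with a 0.5 cut-off (the same threshold the notebook's `predict` function uses). The weights, bias, and two-feature "patients" here are hypothetical values chosen only for illustration, and `classify` is an illustrative helper, not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, w, b, threshold=0.5):
    # Hypothetical helper: probability of the positive class, then a 0/1 decision
    p = sigmoid(np.dot(x, w) + b)
    return p, int(p > threshold)

# Hypothetical weights and two hypothetical patients with two features each
w = np.array([0.8, -0.5])
b = -0.2
print(classify(np.array([2.0, 1.0]), w, b))  # score 0.9, so the label is 1
print(classify(np.array([0.0, 2.0]), w, b))  # score -1.2, so the label is 0
```

The probability itself is often as useful as the label: a value near 0.5 signals an uncertain case, which a bare yes/no hides.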

TASK 2:

In [None]:
# Exploratory Data Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns  # needed for the pairplot below
from google.colab import files
import io

uploaded = files.upload()
data = pd.read_csv(io.BytesIO(uploaded['diabetes2.csv']))

'''
Link to Dataset:
https://www.kaggle.com/kandij/diabetes-dataset
'''

'''
For my Exploratory Data Analysis, I chose to check for NaNs and for outliers. I chose these two
checks because they matter most when it comes to training a machine learning algorithm.
Unexpected NaNs negatively affect the training, and so do many outliers. Moreover, a dataset with
many NaNs or outliers may be a faulty or unreliable dataset for training a model.
'''

'''
I am using the .info() command to find the range index, the number of columns
and their labels, and how many non-null values each column contains.

Based on the .info() output, every column's non-null count equals the number of rows,
so there are no null values to drop or impute, which makes this a good dataset to work with.
'''
data.info()

'''
I am using a seaborn pairplot to look for outliers in the dataset. The plots show no major
outliers overall. However, skin thickness, insulin, and BMI have a few outliers, so I am curious
to see how these outliers will affect the performance of the logistic regression. Since only a
very small fraction of the data points are outliers, I chose not to remove them, as I hypothesize
that they will not have a major adverse effect on training.
'''
sns.pairplot(data, height=2.5)
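
The pairplot is a visual check. As a numeric complement, the sketch below counts NaNs per column with `isna().sum()` and counts outliers with the common 1.5 * IQR rule, which is an assumed rule of thumb (the notebook itself judges outliers by eye). It runs on a small hypothetical DataFrame (`toy`, with made-up `Insulin` and `BMI` values) so that it is self-contained; on the real data one would pass `data` instead of `toy`.

```python
import numpy as np
import pandas as pd

def count_iqr_outliers(df, k=1.5):
    # For each numeric column, count values outside [Q1 - k*IQR, Q3 + k*IQR]
    counts = {}
    for col in df.select_dtypes(include="number").columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        mask = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
        counts[col] = int(mask.sum())  # NaN rows compare False, so they are not counted
    return pd.Series(counts)

# Hypothetical toy data standing in for diabetes2.csv
toy = pd.DataFrame({
    "Insulin": [80, 90, 85, 95, 88, 600],
    "BMI": [30.1, 28.4, np.nan, 32.0, 29.5, 31.2],
})

print(toy.isna().sum())         # NaN count per column
print(count_iqr_outliers(toy))  # IQR outlier count per column
```

Counting rather than eyeballing makes the "very small fraction" judgment reproducible, and the threshold `k` can be raised to flag only extreme points.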

TASK 3 & TASK 4:
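
The next cell implements the batch gradients by hand as dw = (A - Y)^T X / n and db = sum(A - Y) / n, where A is the predicted probability. A central finite-difference check is a common way to confirm hand-derived gradients like these before training. The following is a minimal sketch on random hypothetical data; `cross_entropy`, `analytic_grad`, and `numeric_grad` are illustrative names, not functions from the notebook.

```python
import numpy as np

def cross_entropy(w, b, X, y):
    # Mean binary cross-entropy of a logistic model
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grad(w, b, X, y):
    # Hand-derived gradients: X^T (p - y) / n and mean(p - y)
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return X.T @ (p - y) / len(y), np.mean(p - y)

def numeric_grad(w, b, X, y, h=1e-6):
    # Central finite differences, one parameter at a time
    dw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        dw[j] = (cross_entropy(w + step, b, X, y) - cross_entropy(w - step, b, X, y)) / (2 * h)
    db = (cross_entropy(w, b + h, X, y) - cross_entropy(w, b - h, X, y)) / (2 * h)
    return dw, db

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))               # hypothetical features
y = (rng.random(20) > 0.5).astype(float)   # hypothetical 0/1 labels
w = rng.normal(size=3)
b = 0.1

dw_a, db_a = analytic_grad(w, b, X, y)
dw_n, db_n = numeric_grad(w, b, X, y)
print(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))  # both differences should be tiny
```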

In [None]:
# Importing Libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from google.colab import files
import io

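# Logistic (sigmoid) function of the linear score X.w + b
def sigmoid(w, X, b):
  dot_product = np.dot(X, w.T) + b
  dot_product = np.clip(dot_product, -500, 500)  # avoid overflow in np.exp
  sig = 1/(1 + np.exp(-dot_product))
  return sig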

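# Binary cross-entropy cost, used to compare the gradient descents and their optimizers
def cost_function(n, y_actual, y_pred):
  y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)  # keep log() finite when predictions saturate
  return (-1/n) * np.sum(y_actual * np.log(y_pred) + (1 - y_actual) * np.log(1 - y_pred))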

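# Batch gradient descent calculation over all n training rows
def gradient_descent_batch(w, b, X, Y):
  n = X.shape[0]
  # sigmoid returns shape (n, 1); flatten it so A - Y has shape (n,)
  # instead of broadcasting against Y into an (n, n) matrix
  A = sigmoid(w, X, b).ravel()
  cost = cost_function(n, Y, A)
  dw = np.dot(A - Y, X) / n
  db = np.sum(A - Y) / n
  gradients = {"dw": dw, "db": db}
  return gradients, cost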

# Stochastic gradient descent calculation for a single training example:
# X is one row of features and Y is its 0/1 label, so the batch size n is 1
def gradient_descent_stoch(w, b, X, Y):
  n = 1
  A = sigmoid(w, X, b)
  cost = cost_function(n, Y, A)
  dw = ((A - Y) * X) / n
  db = np.sum(A - Y) / n
  gradients = {"dw": dw, "db": db}
  return gradients, cost

# Trains w and b and returns (coeffs, last gradient, cost per epoch)
# gradient_type: 1 = batch, 2 = stochastic
# optimizer_type: 0 = no optimizer, 1 = adam, 2 = nadam
# n is the learning rate; b1, b2 and e are the adam/nadam betas and epsilon
def update(w, b, X, Y, data_list, n, b1, b2, e, epoch, gradient_type, optimizer_type):
  costs = []
  # Optimizer state is created once and carried across every update step;
  # t counts update steps and is incremented before use so it starts at 1
  m_w, v_w, m_b, v_b, t = 0, 0, 0, 0, 0

  def step(w, b, dw, db):
    nonlocal m_w, v_w, m_b, v_b, t
    if optimizer_type == 0:  # plain gradient descent
      return w - (n * dw), b - (n * db)
    t += 1
    optimizer = adam if optimizer_type == 1 else nadam
    w, b, m_w, v_w, m_b, v_b = optimizer(w, b, dw, db, m_w, v_w, m_b, v_b, n, b1, b2, e, t)
    return w, b

  for _ in range(epoch):
    if gradient_type == 1:  # batch: one update per epoch using all training rows
      gradients, cost = gradient_descent_batch(w, b, X, Y)
      dw, db = gradients["dw"], gradients["db"]
      w, b = step(w, b, dw, db)
    else:  # stochastic: one update per training example, in shuffled order
      np.random.shuffle(data_list)
      example_costs = []
      for example in data_list:
        gradients, example_cost = gradient_descent_stoch(w, b, example[0:-1], example[-1])
        dw, db = gradients["dw"], gradients["db"]
        w, b = step(w, b, dw, db)
        example_costs.append(example_cost)
      cost = np.mean(example_costs)  # average cost over the epoch
    costs.append(cost)

  coeffs = {"w": w, "b": b}
  gradient = {"dw": dw, "db": db}
  return coeffs, gradient, costs
        
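# Adam: decaying averages of the gradient (m) and squared gradient (v),
# bias-corrected using the step count t, which must start at 1
def adam(w, b, dw, db, m_w, v_w, m_b, v_b, n, b1, b2, e, t):
  m_w = b1 * m_w + (1 - b1) * dw
  v_w = b2 * v_w + (1 - b2) * dw ** 2

  m_b = b1 * m_b + (1 - b1) * db
  v_b = b2 * v_b + (1 - b2) * db ** 2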

  m_w_ = m_w / (1 - b1 ** t) 
  v_w_ = v_w / (1 - b2 ** t)

  m_b_ = m_b / (1 - b1 ** t)
  v_b_ = v_b / (1 - b2 ** t) 

  w = w - (n / (np.sqrt(v_w_) + e)) * m_w_ 
  b = b - (n / (np.sqrt(v_b_) + e)) * m_b_ 
  return w, b, m_w, v_w, m_b, v_b 


def nadam(w, b, dw, db, m_w, v_w, m_b, v_b, n, b1, b2, e, t):
  m_w = b1 * m_w + (1 - b1) * dw 
  m_b = b1 * m_b + (1 - b1) * db

  m_w_ = m_w / (1 - b1 ** t)
  m_b_ = m_b / (1 - b1 ** t) 

  v_w = b2 * v_w + (1 - b2) * (dw ** 2) 
  v_b = b2 * v_b + (1 - b2) * (db ** 2) 

  v_w_ = v_w / (1 - b2 ** t) 
  v_b_ = v_b / (1 - b2 ** t) 

  w = w - (n / (np.sqrt(v_w_) + e)) * (b1 * m_w_ + ((1 - b1) * dw) / (1 - b1 ** t))
  b = b - (n / (np.sqrt(v_b_) + e)) * (b1 * m_b_ + ((1 - b1) * db) / (1 - b1 ** t))

  return w, b, m_w, v_w, m_b, v_b 

def predict(y, y_size, x_size):
  y_pred = np.zeros((1, x_size))
  for i in range(y_size):
    if y[0][i] > 0.5:
      y_pred[0][i] = 1
  return y_pred 


# Data preprocessing
uploaded = files.upload()
data = pd.read_csv(io.BytesIO(uploaded['diabetes2.csv']))

X = data.iloc[:, 0:-1].values  # rows are patients, columns are the input features
Y = data.iloc[:, -1].values    # the last column is the 0/1 diabetes label

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Stochastic gradient descent iterates over these rows, so build them from the
# training split only; otherwise the test rows would leak into training
data_list = np.column_stack((X_train, Y_train))

feature_size = X_train.shape[1]

# costs[0] is an empty placeholder so that costs[1]..costs[6] match the plots below
costs = [[]]

# (description, gradient_type, optimizer_type) for the six experiments, in plot order
runs = [
  ("Batch Gradient with no optimizer", 1, 0),
  ("Batch Gradient with adam optimizer", 1, 1),
  ("Batch Gradient with nadam optimizer", 1, 2),
  ("Stochastic Gradient with no optimizer", 2, 0),
  ("Stochastic Gradient with adam optimizer", 2, 1),
  ("Stochastic Gradient with nadam optimizer", 2, 2),
]

for description, gradient_type, optimizer_type in runs:
  w = np.zeros((1, feature_size))
  b = 0

  coeff, gradient, cost = update(w, b, X_train, Y_train, data_list, n=0.002, b1=0.9, b2=0.999, e=1e-8,
                                 epoch=1000, gradient_type=gradient_type, optimizer_type=optimizer_type)
  costs.append(cost)

  # Evaluate on the held-out test split
  w = coeff["w"]
  b = coeff["b"]
  y = sigmoid(w, X_test, b).T  # probabilities with shape (1, number of test rows)
  y_pred = predict(y, y.shape[1], X_test.shape[0])
  accuracy = np.mean(y_pred[0] == Y_test)
  print(f"{description}: test accuracy = {accuracy:.3f}")

# Conclusive plotting 
plt.plot(costs[1], '-g', label='batch gradient--no optimizer')
plt.plot(costs[2], '-r', label='batch gradient--adam')
plt.plot(costs[3], '-b', label='batch gradient--nadam')
plt.ylabel('cost')
plt.xlabel('epochs')
plt.title('Cost Analysis of batch gradient')
plt.legend(loc='upper right')
plt.show()

plt.plot(costs[4], '-g', label='stoch gradient--no optimizer')
plt.plot(costs[5],'-r', label='stoch gradient--adam')
plt.plot(costs[6], '-b', label='stoch gradient--nadam')
plt.ylabel('cost')
plt.xlabel('epochs')
plt.title('Cost Analysis of stochastic gradient')
plt.legend(loc='upper right')
plt.show()

# Conclusive Writing
'''
In conclusion, using the optimizers gave better results than the vanilla gradients. Among the vanilla
gradients, batch gradient descent reduced the cost more than stochastic gradient descent. Therefore, I
recommend using an optimizer, because it improves how the coefficients w and b are updated. The
optimizers I used were adam and nadam. Both keep exponentially decaying averages of past gradients
(controlled by beta1) and of past squared gradients (controlled by beta2), correct those averages for
their bias toward zero early in training, and add a small epsilon to avoid division by zero.

When comparing nadam and adam, the two optimizers performed about equally well. This may be due to the
dataset I chose: the research paper argued that nadam improves on adam because it incorporates Nesterov
accelerated gradient (NAG) into the adam optimizer. Therefore, I hypothesize that nadam works best on
very large datasets, such as those used for image recognition where the algorithm trains on thousands of
images, while for a smaller dataset like mine it does not matter whether one uses adam or nadam to update
the coefficients of the logistic function.
'''
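
To make the role of the betas, epsilon, and bias correction concrete, here is a minimal one-parameter sketch of a single Adam step starting from zero-initialised moments, using the notebook's default hyperparameters (learning rate 0.002, b1 = 0.9, b2 = 0.999, e = 1e-8). `adam_first_step` is an illustrative name, not a notebook function. It shows that after bias correction the first step has size close to the learning rate whatever the gradient's scale, and that the step counter t must be at least 1, since with t = 0 the correction term 1 - b1**t is zero.

```python
import numpy as np

def adam_first_step(g, lr=0.002, b1=0.9, b2=0.999, e=1e-8):
    t = 1                          # first update step; t = 0 would divide by zero below
    m = (1 - b1) * g               # first moment after one step, biased toward 0
    v = (1 - b2) * g ** 2          # second moment after one step, biased toward 0
    m_hat = m / (1 - b1 ** t)      # bias correction recovers g
    v_hat = v / (1 - b2 ** t)      # bias correction recovers g ** 2
    return lr * m_hat / (np.sqrt(v_hat) + e)  # size of the parameter change

for g in (100.0, 0.01, -3.0):
    print(g, adam_first_step(g))
```

With plain gradient descent and the same learning rate, the step would instead be lr * g, i.e. 0.2 for g = 100 and 0.00002 for g = 0.01; this insensitivity to gradient scale is one reason the optimizers can behave quite differently from the vanilla updates.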