<a href="https://colab.research.google.com/github/savula13/ProjectsInMLandAI/blob/main/Assignment3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 3
Saipranav Avula

In [None]:
import pandas as pd
import seaborn as sn
import numpy as np
import random
import matplotlib.pyplot as plt
import tensorflow as tf
import keras_tuner as kt
import tensorflow.keras as keras
from keras_tuner import RandomSearch
from sklearn import model_selection as ms
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Research

To implement a neural network with two hidden layers, I used TensorFlow, specifically its Keras API.
From the Keras documentation, I learned how the Model class represents a network with a set number of input nodes,
how to use the Sequential class to stack hidden and output layers on the instantiated model,
and how to set the activation function for each layer.

The Keras Functional API allows a user to create more complex models than the Sequential class supports. That flexibility is not
needed for this particular assignment, but the guide also covers model summaries and model visualizations,
which are useful in evaluating models.

Keras also has a companion library for hyperparameter tuning, KerasTuner. I used the documentation for its tuner classes to learn how
methods such as RandomSearch, Hyperband, and Bayesian Optimization can be used to search for good hyperparameters, such as
the number of nodes in the hidden layers, the number of hidden layers, the learning rate, and momentum.



## Links
https://keras.io/guides/functional_api/

https://keras.io/api/models/sequential/

https://keras.io/api/models/model/

https://keras.io/getting_started/intro_to_keras_for_engineers/

https://keras.io/api/keras_tuner/tuners/random/

https://www.tensorflow.org/tutorials/keras/keras_tuner

https://towardsdatascience.com/the-art-of-hyperparameter-tuning-in-deep-neural-nets-by-example-685cb5429a38

https://www.analyticsvidhya.com/blog/2021/05/tuning-the-hyperparameters-and-layers-of-neural-network-deep-learning/



# Part 2
## 1. Exploratory Data Analysis
The dataset I am using is based on stellar classification, which uses the spectral data of stars to sort them into categories.
Specifically, the raw data has been processed so that Absolute Magnitude and B-V Color Index can be used to identify Giants and Dwarfs.

https://www.kaggle.com/datasets/vinesmsuic/star-categorization-giants-and-dwarfs

In [None]:
df = pd.read_csv("./Star39552_balanced.csv")
df.head()

In [None]:
df.drop(["SpType"], axis = 1, inplace = True)
rows, cols = df.shape
print("There are {} rows".format(rows))
print("There are {} columns".format(cols))

In [None]:
df.columns

This data has already been preprocessed, so the two target classes are balanced.

In [None]:
df.TargetClass.value_counts(normalize=True)

In [None]:
sn.pairplot(df, hue = "TargetClass")

In [None]:
X = df.iloc[:, 0:cols-1]
Y = df.iloc[:, cols-1]
Y.head()

## 2. Train Dev Test Split

In [1]:
X_train, X_temp, Y_train, Y_temp = ms.train_test_split(X, Y, test_size = 0.3, random_state= 42)
X_temp.shape


In [None]:
X_test, X_dev, Y_test, Y_dev = ms.train_test_split(X_temp, Y_temp, test_size=0.5, random_state=42)

In [None]:
Y_test.shape
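
The two calls above produce roughly a 70/15/15 train/test/dev split. The sketch below works out the expected sizes by hand. It assumes the dataset has 39552 rows (inferred from the file name, not checked against `df.shape`) and that scikit-learn rounds the held-out fraction up, which is my understanding of its behavior. The helper `split_sizes` is a hypothetical name used only for this illustration.

```python
import math

def split_sizes(n_rows, test_size):
    """Return (kept, held_out) sizes for a fractional split, rounding the held-out part up."""
    n_held_out = math.ceil(test_size * n_rows)
    return n_rows - n_held_out, n_held_out

# Assumption: 39552 rows, taken from the file name "Star39552_balanced.csv".
n_rows = 39552

# First split: 70% train, 30% temporary hold-out.
n_train, n_temp = split_sizes(n_rows, 0.3)

# Second split: the hold-out is divided in half into test and dev.
n_test, n_dev = split_sizes(n_temp, 0.5)

print(n_train, n_test, n_dev)
```

Comparing these numbers with `X_train.shape`, `X_test.shape`, and `X_dev.shape` is a quick check that no rows were lost between the two splits.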

## 3. Forward Propagation

For the forward propagation, I am using the ReLU activation function for the two hidden layers of the model.

Since it is linear for values greater than 0 and zero otherwise, ReLU is cheap to compute and a common choice of activation function.
The sigmoid activation is used for the output layer so that the outputs are between 0 and 1 and can be read as class probabilities.

In this case the number of nodes in each layer is set manually. Hyperparameter tuning is implemented later to search for a better number of nodes.
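
To make the forward pass concrete, here is a minimal pure-Python sketch of the same 5 → 12 → 6 → 1 architecture. It is only an illustration of the arithmetic, not how Keras implements it: the weights are random placeholders, the input vector is made up, and the helper names (`dense`, `random_layer`) are hypothetical.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each row of weights."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Placeholder parameters: small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
layers = [
    (random_layer(5, 12, rng), relu),     # first hidden layer
    (random_layer(12, 6, rng), relu),     # second hidden layer
    (random_layer(6, 1, rng), sigmoid),   # output layer
]

x = [0.1, -0.2, 0.3, 0.4, -0.5]  # made-up example with 5 input features
activations = x
for (weights, biases), activation in layers:
    activations = dense(activations, weights, biases, activation)

probability = activations[0]
print(probability)
```

Whatever the weights are, the sigmoid in the last layer keeps the output strictly between 0 and 1.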

In [None]:
init_model = tf.keras.Sequential()
init_model.add(Dense(12, input_shape=(cols-1,), activation='relu'))
init_model.add(Dense(6, activation='relu'))
init_model.add(Dense(1, activation='sigmoid'))

## 4 & 5. Cost Function and Gradient Descent

Gradient descent is implemented using binary cross-entropy as the loss function. Since the target variable (giant star or dwarf star) is binary,
binary cross-entropy is a natural choice, as it is the standard loss for binary classification problems. The loss is minimized with respect to the weights of every layer in the network.
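
As a small worked example, binary cross-entropy can be computed by hand as the average of -[y·log(p) + (1 - y)·log(1 - p)] over the examples. The sketch below is an illustration with made-up labels and probabilities. The clipping constant `eps` is an assumption added to avoid `log(0)`, not something taken from the notebook.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Same labels, but probabilities that agree or disagree with them.
confident_and_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_and_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(confident_and_right, confident_and_wrong)
```

The loss is small when the predicted probability agrees with the label and grows quickly when the model is confidently wrong.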

The Adam optimizer is used. It keeps running averages of past gradients and of their squares and uses them to scale each parameter's update, and it is commonly used in training neural nets.
Since it adapts the step size for each parameter on its own, it is a good default choice of optimizer.
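
For intuition, a single Adam update for one scalar parameter can be sketched as follows. This follows the standard published update rule and is not the Keras implementation. The toy objective f(x) = x², the starting point, and the learning rate of 0.1 are all assumptions chosen for the illustration.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize f(x) = x**2, whose gradient is 2*x.
x, m, v = 1.0, 0.0, 0.0
history = [x]
for t in range(1, 6):
    grad = 2.0 * x
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
    history.append(x)

print(history)
```

On the first step the update is almost exactly the learning rate in size, regardless of how large the gradient is, which is the sense in which Adam scales its own steps.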

In [None]:
init_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Below, the neural network is trained on the training data. Then it is validated on the dev set.

In [None]:
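# Convert the training data to tensors for the first training pass
Xt = tf.convert_to_tensor(X_train)
Yt = tf.convert_to_tensor(Y_train)
# First pass: 10 epochs on the training set with large batches
init_model.fit(Xt, Yt, epochs = 10, batch_size=500)
# Second pass: continues from the weights above for 10 more epochs, reporting dev-set metrics after each epoch
init_model.fit(X_train, Y_train, epochs= 10, validation_data=(X_dev, Y_dev))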

Once the neural network has been trained and validated, its predictions are evaluated on the test set.

In [None]:
acc_temp = init_model.evaluate(X_test, Y_test)

print("The accuracy with this initial neural network configuration is {}".format(acc_temp[1]))
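
The reported accuracy comes from turning each sigmoid output into a class label and counting matches. A minimal sketch of that calculation is below, assuming a 0.5 decision threshold (which I believe is the Keras default for binary accuracy). The labels and probabilities are made up.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(int(pred == y) for pred, y in zip(predictions, y_true))
    return correct / len(y_true)

# Two of the four made-up predictions land on the correct side of 0.5.
example_accuracy = binary_accuracy([1, 0, 0, 1], [0.9, 0.2, 0.6, 0.4])
print(example_accuracy)  # 0.5
```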

# Task 3

Now that the neural network has been initialized, better hyperparameters can be searched for using RandomSearch. KerasTuner provides a
tuner class that makes it relatively simple to run a random search within given hyperparameter constraints.

I chose this method because it is more efficient than GridSearch and generally about as effective.

I also kept the Adam optimizer because, in other machine learning research, it has proven to be an effective optimizer that adapts its own step sizes.

The varied hyperparameters are:

1. Nodes in First Hidden Layer
2. Nodes in Second Hidden Layer
3. Learning rate

I did not add weight regularization because, as seen in the initial runs,
the accuracy on the training set is no higher than the accuracy on the dev set (70% accuracy on training vs 83% accuracy on dev).
Therefore, there is no sign that the model is overfitting. The tuned model below does include Dropout layers, but their rate is fixed at 0.2 and is not part of the search.

In [None]:
def build_model(hp):
    
    first_layer = hp.Int(name = 'first_layer', min_value = 16, max_value = 128, step = 16)
    second_layer = hp.Int(name = 'second_layer', min_value = 16, max_value = 64, step = 16)

    #Forward Propagation

    #Creating neural network layers, with a dropout layer after each hidden layer
    model = tf.keras.Sequential()
    #input and first hidden layer (5 input features, i.e. cols-1 after dropping SpType)
    model.add(Dense(units = first_layer, input_shape=(5,), activation='relu'))
    model.add(keras.layers.Dropout(0.2))
    #second hidden layer
    model.add(Dense(units = second_layer, activation='relu'))
    model.add(keras.layers.Dropout(0.2))
    #output layer
    model.add(Dense(1, activation='sigmoid'))

    #choices for learning rate 
    hp_learning_rate = hp.Choice('learning_rate', values = [1e-2, 1e-3, 1e-4]) 

    #Cost Function and Gradient Descent Implementation
    #configuring model with choices from above
    model.compile(loss='binary_crossentropy', optimizer = keras.optimizers.Adam(learning_rate = hp_learning_rate), metrics=['accuracy'])

    return model

Once the RandomSearch and model are configured, each trial samples one combination of hyperparameters, fits that model on the training data, and scores it on the dev set.
The random search does not adjust its choices between trials. It simply keeps the combination with the best dev-set accuracy.
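
To give a sense of scale, the sketch below enumerates the search space defined in `build_model` and draws five random configurations from it, five being the trial budget used in this notebook. It only illustrates the sampling idea and is not how KerasTuner is implemented internally. The seed and the helper name `sample_configuration` are assumptions.

```python
import random

# The same ranges as in build_model: Int(16..128, step 16), Int(16..64, step 16), Choice of 3 rates.
search_space = {
    "first_layer": list(range(16, 129, 16)),
    "second_layer": list(range(16, 65, 16)),
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

# Total number of distinct combinations a full grid search would have to train.
n_combinations = 1
for values in search_space.values():
    n_combinations *= len(values)

def sample_configuration(rng):
    """Pick one value per hyperparameter, independently and uniformly at random."""
    return {name: rng.choice(values) for name, values in search_space.items()}

rng = random.Random(42)  # arbitrary seed, for repeatability only
trials = [sample_configuration(rng) for _ in range(5)]

print(n_combinations)
for trial in trials:
    print(trial)
```

With 96 possible combinations and only 5 trials, the search covers a small fraction of the space, so the "best" hyperparameters are best among those sampled rather than guaranteed optimal.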

In [None]:
tuner = kt.RandomSearch(build_model, objective = 'val_accuracy', max_trials = 5,
                        directory = 'temp2', project_name = 'random_search')

#each trial fits a sampled model on the training data and scores it on the dev set
tuner.search(X_train, Y_train, epochs = 10, validation_data = (X_dev, Y_dev))

Once the Random Search is done, then the best hyperparameters for the model are found.

In [None]:
#returning best hyperparameters to use in final model
best_model = tuner.get_best_models(1)[0]
best_hyperparameters = tuner.get_best_hyperparameters(1)[0]
print(f"""
The hyperparameter search has been completed.
The best number of units found for the first densely-connected
layer is {best_hyperparameters.get('first_layer')}. The best number of units found for the second densely-connected layer is {best_hyperparameters.get('second_layer')}.
The best learning rate found for the optimizer is {best_hyperparameters.get('learning_rate')}.
""")

A final neural network model is built with the best hyperparameters from the random search. It is then fit on the training set and validated using the dev set.

In [None]:

model = tuner.hypermodel.build(best_hyperparameters)

model.fit(X_train, Y_train, epochs= 10, validation_data=(X_dev, Y_dev))


Once the best model is trained and validated, it is evaluated using the test data.

In [None]:
_, acc = model.evaluate(X_test, Y_test)

print("The accuracy of the neural network with the optimal hyperparameters when evaluated on the test set is {}".format(acc))

# Task 4

I am creating a logistic regression model to compare its performance with the neural network's performance. Logistic regression is well suited to binary classification, and we have used it for previous binary classification problems.
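
For comparison with the network, a logistic regression prediction is just a sigmoid applied to one weighted sum of the inputs, which is the same computation as the network's output layer with no hidden layers in front of it. The sketch below illustrates this with made-up coefficients. They are hypothetical and are not the values fitted by scikit-learn.

```python
import math

def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of a single weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for 5 input features.
weights = [0.5, -0.25, 0.0, 1.0, -1.0]
bias = 0.0

p_at_boundary = predict_proba([0.0] * 5, weights, bias)               # weighted sum is 0
p_positive = predict_proba([1.0, 0.0, 0.0, 1.0, 0.0], weights, bias)  # weighted sum is 1.5
print(p_at_boundary, p_positive)
```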

In [None]:
lr_model = LogisticRegression()
lr_model.fit(X_train, Y_train)

In [None]:
Y_pred = lr_model.predict(X_test)

test_accuracy = metrics.accuracy_score(Y_test, Y_pred)

print("The accuracy of the logistic regression model when evaluated on the test set is {}".format(test_accuracy))

## Comparison

The accuracy of the neural network model is 0.8756, or about 87.6% (in my testing).

The accuracy of the logistic regression model is 0.877, or about 87.7% (in my testing).

Therefore, the two models achieve essentially the same accuracy on the same training and test sets.
One likely reason is that there is not a large number of features/columns
in the dataset. Neural networks tend to show substantial improvement over other models when using unstructured
data or data with many input features. Therefore, it makes sense that a simpler model such as Logistic Regression is
able to achieve very similar accuracy.
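
A rough way to check whether the two accuracies can be told apart is to compare their gap with the sampling error of an accuracy estimate. This is a back-of-the-envelope sketch, not a formal significance test. It assumes a test set of 5933 examples (15% of 39552 rows, inferred from the file name) and uses the binomial standard error formula.

```python
import math

nn_accuracy = 0.8756  # neural network, from the run reported above
lr_accuracy = 0.877   # logistic regression, from the run reported above
n_test = 5933         # assumption: 15% of 39552 rows

def accuracy_standard_error(accuracy, n):
    """Binomial standard error of an accuracy measured on n independent examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)

gap = abs(lr_accuracy - nn_accuracy)
standard_error = accuracy_standard_error(lr_accuracy, n_test)

print(gap, standard_error, gap < standard_error)
```

Under these assumptions the gap between the models is smaller than one standard error, which is consistent with treating the two results as equivalent.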

Another factor could be that the neural network I used had only 2 hidden layers. A network with more layers might have
led to a more substantial improvement over Logistic Regression.