# Assignment

For this assignment you will use the `wine.csv` data with the goal of building a red or white wine classifier. Use all the features in the dataset, allowing the network to decide how to build the internal weighting system.

1. Load the `wine.csv` data and prepare it for analysis: split the data into training and testing sets and normalize the features. <span style="color:red" float:right>[3 points]</span>
1. Train a logistic regression classifier to predict the type of wine (red vs. white). Report the accuracy of the model. <span style="color:red" float:right>[5 points]</span>
1. Train a multi-layer feed-forward neural network to predict the type of wine. Your network should have one hidden layer. You are free to choose how many neurons you want in the hidden layer. <span style="color:red" float:right>[15 points]</span>
1. Tune your neural network by trying different values for the learning rate and the number of neurons in the hidden layer. <span style="color:red" float:right>[10 points]</span>
1. Report the accuracy of the best model you obtained in the previous step. <span style="color:red" float:right>[5 points]</span>

Determine which neural network structure and hyperparameter settings result in the best predictive capability.

# End of assignment
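
As a rough illustration of what "one hidden layer" means for this task, the pure-Python sketch below runs a single forward pass: inputs go through a ReLU hidden layer, then a single sigmoid unit produces a probability for the binary class. The weights are random placeholders, the hidden size of 4 is arbitrary, and the 12 inputs assume every column except `Class` is used as a feature. It is not the trained model.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    """One forward pass: inputs -> ReLU hidden layer -> sigmoid output."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(logit)

random.seed(0)
n_features, n_hidden = 12, 4  # illustrative sizes (assumptions)

# Random placeholder weights; a real network learns these during training.
W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_features)]
      for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b2 = 0.0

x = [random.gauss(0.0, 1.0) for _ in range(n_features)]  # one normalized sample
p = forward(x, W1, b1, w2, b2)   # predicted probability of Class == 1
label = int(p >= 0.5)            # threshold at 0.5 to get a class label
```

Tuning the number of neurons changes the number of rows in `W1`; tuning the learning rate changes how quickly training updates these weights.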

In [1]:
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, mean_squared_error, precision_recall_fscore_support
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

from sklearn.model_selection import GridSearchCV

Load the wine.csv data and prepare the data for analysis: Split the data into training and testing and normalize the features. [3 points]

In [2]:
#load data and perform EDA
wine = pd.read_csv('wine.csv')
print(wine.dtypes)
print(wine.isna().sum())
wine.head()

#split data into train and test sets
seed = 24
X_train, X_test, y_train, y_test = train_test_split(wine.drop(columns = "Class"), wine["Class"], 
                                                    test_size = 0.30, random_state = seed)

X_train = X_train.reset_index(drop = True)
X_test = X_test.reset_index(drop = True)

#normalization
znormalizer = StandardScaler()
znormalizer.fit(X_train)
X_train_norm = pd.DataFrame(znormalizer.transform(X_train))
X_test_norm = pd.DataFrame(znormalizer.transform(X_test))

#logistic regression model
logit = LogisticRegression()
logit.fit(X_train_norm, y_train)
y_pred_logit = logit.predict(X_test_norm)
print("Accuracy of Logistic Regression is {}".format(accuracy_score(y_test, y_pred_logit)))


fixed acidity           float64
volatile acidity        float64
citric acid             float64
residual sugar          float64
chlorides               float64
free sulfur dioxide     float64
total sulfur dioxide    float64
density                 float64
pH                      float64
sulphates               float64
alcohol                 float64
quality                   int64
Class                     int64
dtype: object
fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
Class                   0
dtype: int64
Accuracy of Logistic Regression is 0.9917948717948718
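
The normalization step above fits the scaler on the training split only and then reuses those statistics for the test split. The sketch below shows the same idea by hand for a single column, using toy values rather than anything from `wine.csv`. The helper names `fit_zscore` and `apply_zscore` are made up for illustration.

```python
from statistics import fmean, pstdev

def fit_zscore(train_column):
    """Estimate mean and standard deviation from the training values only."""
    mean = fmean(train_column)
    std = pstdev(train_column, mu=mean)  # population std, which StandardScaler also uses
    return mean, (std if std > 0 else 1.0)  # guard against constant columns

def apply_zscore(column, mean, std):
    """Scale any split with statistics that came from the training split."""
    return [(x - mean) / std for x in column]

# Toy values standing in for one feature column (not real wine data).
train_alcohol = [9.4, 9.8, 10.5, 11.2, 12.8]
test_alcohol = [9.0, 13.1]

mean, std = fit_zscore(train_alcohol)
train_scaled = apply_zscore(train_alcohol, mean, std)
test_scaled = apply_zscore(test_alcohol, mean, std)  # no test statistics are used
```

Because the test values are scaled with the training mean and standard deviation, the scaled test column will generally not have mean 0 and variance 1. That is expected and avoids leaking test information into preprocessing.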


In [3]:
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, input_dim=X_train_norm.shape[1], activation="relu"),  # single hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # output layer: probability of Class == 1
])

# summary of model object (summary() prints directly and returns None)
model.summary()

#model compile
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1, momentum=0.0), 
              loss="binary_crossentropy",
              metrics=["accuracy"])

#train model
history = model.fit(X_train_norm, 
                    y_train, 
                    batch_size=16,
                    epochs=40,
                    verbose=1
)


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 64)                832       
dense_1 (Dense)              (None, 1)                 65        
Total params: 897
Trainable params: 897
Non-trainable params: 0
_________________________________________________________________
Epoch 1/40
...
Epoch 40/40
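
The 832 parameters reported for the 64-unit hidden layer can be checked by hand: a dense layer has one weight per input per unit, plus one bias per unit. The short sketch below does that arithmetic. The helper `dense_params` is an illustrative name, and the single-unit output layer is the usual choice for binary classification with `binary_crossentropy`.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters in a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

n_features = 12  # wine.csv has 13 columns; dropping "Class" leaves 12 inputs

hidden = dense_params(n_features, 64)  # 12 * 64 weights + 64 biases
output = dense_params(64, 1)           # 64 weights + 1 bias for a sigmoid unit

print(hidden)           # 832
print(output)           # 65
print(hidden + output)  # 897
```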


In [6]:
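#tuning with learning rate and num of neurons in hidden layer
learning_rates = [0.001, 0.01, 0.1]
num_neurons = [64, 32, 8]
results = {}

for rate in learning_rates:
    for neurons in num_neurons:
        model = keras.Sequential([
            keras.layers.Dense(neurons, input_dim=X_train_norm.shape[1], activation="relu"),  # single hidden layer
            keras.layers.Dense(1, activation="sigmoid"),  # output layer
        ])

        #model compile, using the learning rate under test
        model.compile(optimizer=keras.optimizers.SGD(learning_rate=rate, momentum=0.0),
                      loss="binary_crossentropy",
                      metrics=["accuracy"])

        #train model (verbose=0 keeps the nine runs from flooding the output)
        model.fit(X_train_norm, y_train, batch_size=16, epochs=40, verbose=0)

        #evaluate on the held-out test set
        _, acc = model.evaluate(X_test_norm, y_test, verbose=0)
        results[(rate, neurons)] = acc
        print("learning_rate={}, neurons={}: test accuracy {:.4f}".format(rate, neurons, acc))

#report the best configuration found
best = max(results, key=results.get)
print("Best (learning_rate, neurons): {} with test accuracy {:.4f}".format(best, results[best]))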
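
The selection logic behind a tuning loop like the one above can be separated from the training code, which makes it easy to test on its own. The sketch below enumerates the grid with `itertools.product` and picks the highest-scoring pair. `select_best` and `fake_score` are hypothetical names: `fake_score` is a made-up stand-in for "train a network and return its accuracy", and its numbers are not results from `wine.csv`.

```python
from itertools import product

def select_best(learning_rates, hidden_sizes, score_fn):
    """Score every (learning rate, hidden size) pair and return the best one."""
    results = {}
    for rate, neurons in product(learning_rates, hidden_sizes):
        results[(rate, neurons)] = score_fn(rate, neurons)
    best = max(results, key=results.get)
    return best, results

def fake_score(rate, neurons):
    # Hypothetical stand-in: a real score_fn would build, train, and evaluate
    # a model, ideally on a validation split rather than the test set.
    return 1.0 - abs(rate - 0.01) - abs(neurons - 32) / 1000

best, results = select_best([0.001, 0.01, 0.1], [64, 32, 8], fake_score)
print(best)          # (0.01, 32)
print(len(results))  # 9
```

In a real run, `score_fn` could train with Keras and return validation accuracy, for example by passing `validation_split` to `fit`, so that the test set is used only once to report the final model.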