Logistic Regression vs Basic Neural Network 
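 - a neural network with no hidden layer and a single output neuron using a sigmoid activation is essentially a logistic regression model
 - the network built below adds a hidden ReLU layer, so its sigmoid output unit acts as a logistic regression on 16 learned features rather than on the raw inputs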
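To make the equivalence concrete: a single neuron with no hidden layer computes p = sigmoid(X·w + b), and training it by minimizing binary cross-entropy is fitting a logistic regression. The NumPy sketch below is illustrative only. It assumes hypothetical toy data (two Gaussian blobs), plain full-batch gradient descent with a learning rate of 0.1 and 500 epochs chosen arbitrarily, and no regularization. scikit-learn's `LogisticRegression` applies L2 regularization by default, so its coefficients would differ somewhat from these.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_single_neuron(X, y, lr=0.1, epochs=500):
    """Fit one sigmoid unit (w, b) by full-batch gradient descent on BCE.

    This is unregularized logistic regression: p = sigmoid(X @ w + b).
    lr and epochs are hypothetical illustration values.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        error = p - y                      # gradient of BCE w.r.t. the logit
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Hypothetical toy data: two Gaussian blobs, one per class
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
X1 = rng.normal(loc=1.0, scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = train_single_neuron(X, y)
probs = sigmoid(X @ w + b)
preds = (probs >= 0.5).astype(int)

# Loss of the untrained neuron (p = 0.5 everywhere) versus the trained one
initial_loss = binary_cross_entropy(y, np.full_like(y, 0.5))
final_loss = binary_cross_entropy(y, probs)
accuracy = (preds == y).mean()
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, accuracy {accuracy:.3f}")
```

The untrained loss equals ln 2 because every prediction starts at 0.5; training lowers it as the neuron learns a linear decision boundary, which is the same kind of boundary logistic regression produces.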
 

In [10]:
#imports 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np  
import tensorflow as tf
from pathlib import Path

In [2]:
#load dataset
file_path = Path("../Resources/diabetes.csv")
diabetes_df = pd.read_csv(file_path)
diabetes_df.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
0            6      148             72             35        0  33.6                     0.627   50        1
1            1       85             66             29        0  26.6                     0.351   31        0
2            8      183             64              0        0  23.3                     0.672   32        1
3            1       89             66             23       94  28.1                     0.167   21        0
4            0      137             40             35      168  43.1                     2.288   33        1


The data needs to be standardized for the basic neural network. No preprocessing is required for the logistic regression, although scaling often helps scikit-learn's lbfgs solver converge. Keep track of both the scaled and the unscaled training and testing datasets.

In [3]:
#remove diabetes outcome target from data
y = diabetes_df.Outcome
X = diabetes_df.drop(columns="Outcome")

#split datasets into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)
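With the default `test_size=0.25`, `train_test_split` holds out a quarter of the rows, and `stratify=y` keeps the fraction of positive outcomes the same in both splits. The sketch below checks this on a hypothetical label vector of 768 rows, the row count implied by the 576 training samples and 192 test samples in the outputs. The 268/500 class split and the placeholder feature column are assumptions for illustration, not values read from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 768 rows, 268 positives and 500 negatives (illustrative counts)
y = np.array([1] * 268 + [0] * 500)
X = np.arange(len(y)).reshape(-1, 1)   # placeholder feature column

# Same call pattern as the notebook: default test_size=0.25, stratified on y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

print(len(y_train), len(y_test))                # sizes of the two splits
print(y.mean(), y_train.mean(), y_test.mean())  # positive-class fraction in each
```

Because 768 splits evenly into 576 and 192 here, the positive fraction comes out identical in the full set, the training set, and the test set.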

In [4]:
#preprocess numerical data for basic neural network

#create standardscaler instance
scaler = StandardScaler()

#fit scaler
X_scaler = scaler.fit(X_train)

#scale the data
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
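`StandardScaler` converts each feature to a z-score, (x - mean) / std, where the mean and standard deviation are learned from the training set only and then reused for the test set. The sketch below reproduces that by hand on small hypothetical arrays; note that `StandardScaler` uses the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: rows are samples, columns are features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 150.0], [3.0, 200.0], [4.0, 250.0]])
X_test = np.array([[2.5, 175.0], [5.0, 300.0]])

scaler = StandardScaler().fit(X_train)

# Manual equivalent: statistics come from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)   # population std (ddof=0), as StandardScaler uses
X_test_manual = (X_test - mean) / std

print(np.allclose(scaler.transform(X_test), X_test_manual))  # True
print(scaler.transform(X_train).mean(axis=0))                 # close to [0, 0]
```

Fitting the scaler on the training split alone avoids leaking test-set statistics into training, which is why the notebook calls `fit` on `X_train` and only `transform` on `X_test`.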

In [7]:
#define logistic regression model
log_classifier = LogisticRegression(solver="lbfgs", max_iter=200)

#train the model
log_classifier.fit(X_train,y_train)

#make predictions on the test set
y_pred = log_classifier.predict(X_test)


In [6]:
print(f"Logistic Regression model accuracy: {accuracy_score(y_test,y_pred):.3f}")

Logistic Regression model accuracy: 0.729


In [11]:
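#define basic neural network: one hidden ReLU layer, one sigmoid output unit
number_input_features = X_train_scaled.shape[1]  # 8 features
nn_model = tf.keras.models.Sequential()
nn_model.add(tf.keras.Input(shape=(number_input_features,)))
nn_model.add(tf.keras.layers.Dense(units=16, activation="relu"))
nn_model.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))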

#convert series to arrays 
y_test = np.array(y_test) 
y_train = np.array(y_train)

#compile the sequential model and customize metrics
nn_model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

#train the model
fit_model = nn_model.fit(X_train_scaled, y_train, epochs=50)

#evaluate the model
model_loss, model_accuracy = nn_model.evaluate(X_test_scaled, y_test, verbose=2)
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

Train on 576 samples
Epoch 1/50
...
Epoch 50/50
192/1 - 0s - loss: 0.4898 - accuracy: 0.7240
Loss: 0.48355482518672943, Accuracy: 0.7239583134651184
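For reference, the network above has a small number of trainable parameters, and its forward pass is two matrix multiplications. The NumPy sketch below counts the parameters of an 8-input, 16-unit ReLU layer followed by a 1-unit sigmoid layer and runs one forward pass. The random weights and the 5-row batch are hypothetical and only show the array shapes; they are not the trained Keras weights. The final threshold of 0.5 mirrors how binary accuracy is computed.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, n_hidden = 8, 16   # 8 input features, 16 hidden units as in the cell above

# Parameter count of Dense(16, relu) followed by Dense(1, sigmoid)
hidden_params = n_features * n_hidden + n_hidden   # weights + biases = 144
output_params = n_hidden * 1 + 1                    # weights + bias = 17
total_params = hidden_params + output_params        # 161

# Hypothetical random weights, only to show the shapes in one forward pass
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

X_batch = rng.normal(size=(5, n_features))   # 5 hypothetical standardized rows
hidden = relu(X_batch @ W1 + b1)             # shape (5, 16)
probs = sigmoid(hidden @ W2 + b2)            # shape (5, 1), values in (0, 1)
preds = (probs >= 0.5).astype(int)           # class labels via a 0.5 threshold

print(total_params, hidden.shape, probs.shape)
```

The last line, `sigmoid(hidden @ W2 + b2)`, is a logistic regression applied to the 16 hidden features; dropping the hidden layer would leave a plain logistic regression on the 8 inputs.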


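The two models have very similar predictive accuracy: logistic regression reached 73% and the basic neural network 72%. Neither model reached 80% accuracy, likely because the input data is limited: there are relatively few data points and few features. Both models need further optimization of their parameters, structure, and weights (for example, the number of epochs, hidden layers, and units in the neural network). Some features may also need to be removed or cleaned because they could be adding noise; for example, the sample rows above show zeros in Insulin and SkinThickness, which are physiologically implausible and likely stand in for missing measurements.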
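Since both accuracies are measured on the same 192 test rows, they can be converted back into counts of correct predictions. This small arithmetic check uses only the numbers printed in the outputs above; it suggests the gap between 73% and 72% amounts to a single test sample, which is likely too small to say that one model is better than the other.

```python
n_test = 192  # test-set size: 25% of 768 rows (576 training + 192 test)

log_reg_acc = 0.729             # printed by the notebook, rounded to 3 decimals
nn_acc = 0.7239583134651184     # returned by nn_model.evaluate

# accuracy = correct / n_test, so the number of correct predictions is accuracy * n_test
log_reg_correct = round(log_reg_acc * n_test)
nn_correct = round(nn_acc * n_test)

# Sanity check: the recovered counts reproduce the reported accuracies
assert round(log_reg_correct / n_test, 3) == log_reg_acc
assert abs(nn_correct / n_test - nn_acc) < 1e-6

print(log_reg_correct, nn_correct, log_reg_correct - nn_correct)  # 140 139 1
```

Only 140 correct predictions out of 192 rounds to 0.729, so the logistic regression count is unambiguous even though its accuracy was printed with three decimals.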