Let's build a model from scratch to detect heart disease using the dataset you've uploaded. We'll implement a neural network with TensorFlow/Keras and tune its hyperparameters with grid search.

### Step 1: Load and Preprocess the Dataset


In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
df = pd.read_csv('data/heart.csv')

# Split the data into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)




### Step 2: Create the Model Function
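We'll define a function that builds and compiles the neural network. It takes two hyperparameters: `neurons`, the width of the first hidden layer (the second and third hidden layers use half and a quarter as many units), and `dropout_rate`, the dropout rate applied after each hidden layer.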


In [3]:

import tensorflow as tf
from scikeras.wrappers import KerasClassifier

def create_model(neurons=128, dropout_rate=0.2):
    """Build and compile a binary classifier with three hidden layers."""
    model = tf.keras.Sequential([
        # One input per feature column in the scaled training data
        tf.keras.Input(shape=(X_train.shape[1],)),
        # Hidden layer widths shrink from neurons to neurons/2 to neurons/4
        tf.keras.layers.Dense(neurons, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(neurons // 2, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(neurons // 4, activation='relu'),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.BatchNormalization(),
        # Sigmoid output gives the predicted probability of the positive class (target = 1)
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model


2024-05-09 21:05:42.237239: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-09 21:05:42.237299: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-09 21:05:42.239348: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-09 21:05:42.256627: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
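
As a quick check on the architecture, the hidden-layer widths implied by a given `neurons` value can be computed without building the network. The helper `hidden_widths` below is a hypothetical name introduced for illustration; it only mirrors the halving scheme used in `create_model`.

```python
def hidden_widths(neurons):
    """Return the widths of the three hidden layers for a given `neurons` value.

    Hypothetical helper: mirrors the neurons, neurons/2, neurons/4 scheme
    used in create_model, without constructing a Keras model.
    """
    return [neurons, neurons // 2, neurons // 4]


# Widths for the three `neurons` values explored in the grid search
for n in (64, 128, 256):
    print(n, hidden_widths(n))
```

Even the smallest setting keeps 16 units in the last hidden layer, so none of the candidate networks collapses to a trivially narrow bottleneck.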



### Step 3: Wrap the Model in `KerasClassifier`
We'll wrap the model creation function in SciKeras's `KerasClassifier`, which exposes the scikit-learn estimator interface that `GridSearchCV` requires.


In [4]:

model = KerasClassifier(model=create_model, verbose=0)



### Step 4: Define the Grid Search Parameters
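We'll tune four hyperparameters: the number of neurons, the dropout rate, the batch size, and the number of training epochs. Keys prefixed with `model__` are passed to `create_model` as arguments, while `batch_size` and `epochs` are set on the `KerasClassifier` wrapper itself.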


In [5]:
param_grid = {
    'model__neurons': [64, 128, 256],
    'model__dropout_rate': [0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5],
    'batch_size': [32, 64],
    'epochs': [50, 100]
}
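
Because grid search trains one model per combination and per cross-validation fold, it helps to estimate the cost before launching it. The sketch below counts the combinations in the grid using only the standard library; `cv_folds = 3` is taken from the `cv=3` setting used in Step 5.

```python
from math import prod

param_grid = {
    'model__neurons': [64, 128, 256],
    'model__dropout_rate': [0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5],
    'batch_size': [32, 64],
    'epochs': [50, 100],
}

# Every combination of values is one candidate configuration
n_candidates = prod(len(values) for values in param_grid.values())

# Each candidate is trained once per cross-validation fold
cv_folds = 3
n_fits = n_candidates * cv_folds

print(f"{n_candidates} candidates, {n_fits} fits")
```

This gives 84 candidates and 252 fits, each training for 50 or 100 epochs. If that is too slow, trimming the dropout grid is the cheapest reduction, since it contributes seven of the values.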



### Step 5: Execute Grid Search
We'll use `GridSearchCV` to search for the best hyperparameters.


In [7]:
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='accuracy', cv=3, n_jobs=-1, verbose=1)
grid_result = grid.fit(X_train, y_train)

# Summarize the best results
print(f"Best: {grid_result.best_score_:f} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"{mean:f} ({stdev:f}) with: {param!r}")


Fitting 3 folds for each of 84 candidates, totalling 252 fits


ValueError: 
All the 252 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
252 fits failed with the following error:
Traceback (most recent call last):
  File "/home/gr00stl/anaconda3/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/gr00stl/anaconda3/lib/python3.11/site-packages/scikeras/wrappers.py", line 1501, in fit
    super().fit(X=X, y=y, sample_weight=sample_weight, **kwargs)
  File "/home/gr00stl/anaconda3/lib/python3.11/site-packages/scikeras/wrappers.py", line 770, in fit
    self._fit(
  File "/home/gr00stl/anaconda3/lib/python3.11/site-packages/scikeras/wrappers.py", line 928, in _fit
    self._ensure_compiled_model()
  File "/home/gr00stl/anaconda3/lib/python3.11/site-packages/scikeras/wrappers.py", line 439, in _ensure_compiled_model
    if not self.model_.compiled:
           ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Sequential' object has no attribute 'compiled'
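
The traceback shows SciKeras reading `self.model_.compiled`, an attribute that the `Sequential` object returned by `create_model` does not have. That pattern usually points to a version mismatch between SciKeras and the Keras implementation behind `tf.keras`, though this is an inference from the traceback rather than something the notebook confirms. A minimal sketch for listing the installed versions, so they can be compared against the compatibility notes in the SciKeras documentation (`installed_versions` is a hypothetical helper name):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("tensorflow", "keras", "scikeras", "scikit-learn")):
    """Map each package name to its installed version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

If the versions turn out to be incompatible, aligning them and rerunning Step 5 is a reasonable next step. Passing `error_score='raise'` to `GridSearchCV`, as the error message suggests, makes the first failing fit raise immediately instead of after all 252 attempts.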



### Step 6: Evaluate the Best Model
Once the grid search completes without errors, we'll evaluate the best model on the test set, print a classification report, and plot the confusion matrix.


In [None]:

from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Get the best estimator
best_model = grid_result.best_estimator_

# Make predictions
y_pred = best_model.predict(X_test)

# Print the classification report
print(classification_report(y_test, y_pred))

# Plot the confusion matrix
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
disp.plot(cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.show()
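
To make the confusion matrix easier to read: scikit-learn places true labels on the rows and predicted labels on the columns, so for binary labels `[0, 1]` the layout is `[[TN, FP], [FN, TP]]`. The sketch below derives accuracy, precision, and recall from such a matrix. The counts are made-up placeholders for illustration, not results from this dataset, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(cm):
    """Compute accuracy, precision, and recall from a 2x2 confusion matrix.

    Expects scikit-learn's layout: rows are true labels, columns are
    predicted labels, i.e. [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when a class is never predicted or never present
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Placeholder counts for illustration only (not from heart.csv)
example_cm = [[25, 5], [4, 26]]
print(binary_metrics(example_cm))
```

The same function accepts the NumPy array returned by `confusion_matrix`. For a screening task like heart disease detection, recall on the positive class is often the number to watch, since a false negative is a missed case.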



Together, these steps build a neural network for heart disease detection, tune its hyperparameters with grid search, and evaluate it on held-out data. Step 6 depends on `grid_result`, so the failure in Step 5 has to be resolved before it can run.