In [None]:
import tensorflow as tf
from keras import models, layers
import pandas as pd
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Diabetes Prediction with Deep Learning
This project demonstrates how to build a neural network that predicts diabetes from patient health data. The model is a simple feedforward neural network implemented with TensorFlow/Keras.

In [None]:
# Load the dataset, then separate the feature columns from the binary target
df = pd.read_csv('diabetes.csv')
x = df.drop('Outcome', axis=1)
y = df['Outcome']

# Data Loading and Exploration

We're using a diabetes dataset that contains various health metrics as features and a binary outcome indicating whether the patient has diabetes (1) or not (0).

*Expected features:* Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age

In [None]:
# Hold out 20% of the rows for testing; the rest is used for training
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Data Preparation

We split our data into training and testing sets using an 80-20 split. This allows us to train the model on one portion of the data and evaluate its performance on unseen data.
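
Because the split is random, the reported accuracy can differ from run to run. As a rough illustration of what an 80-20 split does and how a fixed seed makes it repeatable, here is a NumPy-only sketch. The array shapes, the seed, and all variable names are made up for illustration and are not part of the notebook; in the notebook itself, passing `random_state` (and optionally `stratify=y`) to `train_test_split` serves the same purpose.

```python
import numpy as np

# Synthetic stand-in for the real data: 10 rows, 8 feature columns (assumption)
rng = np.random.default_rng(0)            # fixed seed, so the split is the same on every run
features = rng.normal(size=(10, 8))
labels = rng.integers(0, 2, size=10)

# Shuffle the row indices, then cut at the 80% mark
indices = rng.permutation(len(features))
cut = int(len(features) * 0.8)
train_idx, test_idx = indices[:cut], indices[cut:]

X_tr, X_te = features[train_idx], features[test_idx]
y_tr, y_te = labels[train_idx], labels[test_idx]

print(X_tr.shape, X_te.shape)
```

No row appears in both parts, which is what lets the test portion stand in for unseen data.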

In [None]:
# Heatmap of pairwise correlations between all columns, including Outcome
plt.subplots(figsize=(8, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')

# Data Exploration

Let's examine the pairwise correlations between the features and the outcome. The heatmap shows which health metrics move together and which ones correlate most strongly with the diabetes label.
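
To check the class balance of the target directly, one option is to count the labels. The sketch below uses a made-up label array in place of `df['Outcome']`, so the numbers are illustrative only; on the real DataFrame, `df['Outcome'].value_counts(normalize=True)` gives the same information.

```python
import numpy as np

# Made-up labels standing in for df['Outcome'] (0 = no diabetes, 1 = diabetes)
outcome = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])

counts = np.bincount(outcome, minlength=2)   # counts[0] = negatives, counts[1] = positives
fractions = counts / counts.sum()

print("negatives:", counts[0], "positives:", counts[1])
print("positive fraction:", fractions[1])

# Accuracy obtained by always predicting the majority class
majority_baseline = fractions.max()
print("majority-class baseline accuracy:", majority_baseline)
```

The majority-class baseline is a useful reference point: a model's accuracy only means something if it clearly beats this number.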

In [None]:
model=models.Sequential([
    layers.Dense(256,activation='relu',input_shape=(X_train.shape[1],)),
    layers.Dropout(0.2),
    layers.Dense(128,activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(64,activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(128,activation='relu'),
    layers.Dense(1,activation='sigmoid')
])

# Model Architecture

We're building a sequential neural network with:
- *Input layer:* Size matches our number of features (8)
- *Hidden layer 1:* 256 neurons with ReLU activation, followed by dropout (rate 0.2)
- *Hidden layer 2:* 128 neurons with ReLU activation, followed by dropout (rate 0.2)
- *Hidden layer 3:* 64 neurons with ReLU activation, followed by dropout (rate 0.2)
- *Hidden layer 4:* 128 neurons with ReLU activation
- *Output layer:* 1 neuron with sigmoid activation (for binary classification)
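
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes 8 input features, as listed above; the helper name `dense_params` is made up for illustration. Dropout layers add no parameters, so only the Dense layers count, and `model.summary()` should report the same total.

```python
# Layer widths: 8 input features, then the Dense layers in the order they are stacked
layer_sizes = [8, 256, 128, 64, 128, 1]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Pair each layer width with the next one to get (inputs, outputs) per Dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)   # [2304, 32896, 8256, 8320, 129]
print(total)       # 51905
```
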

*Dropout* is a technique that randomly "drops out" (ignores) a fraction of neurons during each training iteration. This prevents the model from becoming too dependent on any single neuron and helps reduce overfitting.

- *Dropout rates of 0.2-0.5* are typically used; this model uses 0.2
- It works like an "ensemble" of multiple networks
- It is only active during training, not during prediction

This architecture allows the model to learn complex patterns in the health data.
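
The dropout behaviour described above can be sketched in plain NumPy. This is a minimal illustration of "inverted" dropout (zero some units, scale the survivors by 1 / (1 - rate)), not the Keras implementation itself; the function name, the seed, and the all-ones activations are made up for the example.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations                               # prediction time: pass through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob     # True where the unit is kept
    return activations * mask / keep_prob                # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
hidden = np.ones((1000, 256))                            # made-up layer output

train_out = dropout(hidden, rate=0.2, rng=rng, training=True)
test_out = dropout(hidden, rate=0.2, rng=rng, training=False)

print("fraction zeroed in training:", np.mean(train_out == 0))
print("mean activation in training:", train_out.mean())
print("unchanged at prediction time:", np.array_equal(test_out, hidden))
```

Roughly 20% of the units are zeroed on each training pass, while the mean activation stays close to its original value because the surviving units are scaled up.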

In [None]:
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])

# Model Compilation

We compile the model with:
- *Optimizer:* Adam (adaptive learning rate optimization)
- *Loss function:* Binary crossentropy (appropriate for yes/no classification)
- *Metric:* Accuracy (to track performance during training)
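
For intuition about the loss, binary crossentropy for a true label y and predicted probability p is -(y·log(p) + (1 - y)·log(1 - p)), averaged over the samples. The sketch below computes it in NumPy for two made-up predictions; the function name and the clipping constant `eps` are illustrative choices, not Keras internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary crossentropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

# Same two patients (one positive, one negative), two different sets of predictions
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print("confident and right:", round(confident_right, 4))
print("confident and wrong:", round(confident_wrong, 4))
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is what pushes the sigmoid output towards well-calibrated probabilities.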

In [None]:

history=model.fit(X_train,y_train,epochs=100,validation_data=(X_test,y_test))

# Model Training

We train the model for 100 epochs and pass the held-out test split as validation data, so Keras reports the loss and accuracy on it after every epoch. Comparing these values with the training metrics shows whether the model is starting to overfit.
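
One common way to act on the validation curve is early stopping: stop once the validation loss has not improved for a set number of epochs. The notebook does not do this; the sketch below only illustrates the logic in plain Python, with made-up loss values and a hypothetical helper name. Keras offers the same idea through its `EarlyStopping` callback (with `monitor` and `patience` arguments).

```python
def early_stop_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops at the first epoch where the validation loss has failed
    to improve for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch      # new best: reset the wait
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch                 # waited long enough: stop
    return len(val_losses) - 1, best_epoch           # never triggered: ran to the end

# Made-up validation losses: improving at first, then rising (overfitting)
val_losses = [0.70, 0.62, 0.58, 0.57, 0.59, 0.61, 0.64, 0.66]
stop, best = early_stop_epoch(val_losses, patience=3)

print("best epoch:", best, "stopped at epoch:", stop)
```

If early stopping or any other tuning is driven by a split, that split should be separate from the one used for the final accuracy figure.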

In [None]:
# evaluate() returns [loss, accuracy]; report the accuracy on the held-out test split
scores = model.evaluate(X_test, y_test)
print('Test Accuracy: %.2f%%\n' % (scores[1] * 100))

# Model Evaluation

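Now we evaluate the final model on the test set. The model's weights were never updated on this data, although the same split was monitored as validation data during training. The result is therefore a fair estimate of real-world performance only as long as no tuning decisions are based on it.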
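
Accuracy alone hides which kind of mistakes the model makes. As a sketch of what sits behind the number, the example below turns sigmoid probabilities into class labels with a 0.5 threshold and counts the four outcome types. The labels and probabilities are made up; on the real model, the probabilities would come from `model.predict(X_test)`.

```python
import numpy as np

# Made-up true labels and predicted probabilities for six patients
y_true = np.array([1, 0, 1, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.7, 0.6, 0.1])

y_pred = (y_prob >= 0.5).astype(int)        # threshold the sigmoid output at 0.5
accuracy = float(np.mean(y_pred == y_true))

tp = int(np.sum((y_pred == 1) & (y_true == 1)))   # diabetic, predicted diabetic
tn = int(np.sum((y_pred == 0) & (y_true == 0)))   # healthy, predicted healthy
fp = int(np.sum((y_pred == 1) & (y_true == 0)))   # healthy, predicted diabetic
fn = int(np.sum((y_pred == 0) & (y_true == 1)))   # diabetic, predicted healthy

recall = tp / (tp + fn)                     # share of diabetic patients that were caught
print("accuracy:", round(accuracy, 3))
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("recall:", round(recall, 3))
```

For a medical screening task, false negatives (missed diabetic patients) are often the costlier error, so recall is worth reporting alongside accuracy.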

# Conclusion

This diabetes prediction model achieved 77% accuracy on the test set. The model can be further improved by:
- Feature engineering and normalization
- Hyperparameter tuning
- Trying different architectures
- Addressing class imbalance if present
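
As an example of the normalization step, features can be standardized to zero mean and unit variance, using statistics computed on the training split only so that no information leaks from the test split. This is a minimal NumPy sketch on made-up data with two columns on different scales; scikit-learn's `StandardScaler` follows the same fit-on-train, transform-both pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up features on very different scales (two columns, purely illustrative)
train = rng.normal(loc=[120.0, 30.0], scale=[30.0, 6.0], size=(200, 2))
test = rng.normal(loc=[120.0, 30.0], scale=[30.0, 6.0], size=(50, 2))

# Fit the statistics on the training split only
mean = train.mean(axis=0)
std = train.std(axis=0)

# Apply the same transformation to both splits
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print("train mean after scaling:", train_scaled.mean(axis=0).round(6))
print("train std after scaling: ", train_scaled.std(axis=0).round(6))
```

The test split will not come out at exactly zero mean and unit variance, which is expected: it is scaled with the training statistics, just as new patient data would be.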

Once the trained model is saved (for example with Keras's model.save), it can be reloaded and used to make predictions on new patient data.