<a href="https://www.kaggle.com/code/khurshiduktamov/portfolio-diabet-diagnosis-v1?scriptVersionId=157974249" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

## CRISP-DM Methodology
<img src="https://cdn.sketchbubble.com/pub/media/catalog/product/optimized1/d/0/d063643deb841c9084106b83a7db3810cbe35aaa29adc1bbcd818b1d396e2f3f/crisp-dm-mc-slide1.png" alt="CRISP-DM" width="800"/>

1. **Business Understanding:**
   - Define the problem you are trying to solve from a business perspective.
   - Determine the goals and objectives of the project.

2. **Data Understanding:**
   - Collect initial data and explore its characteristics.
   - Identify data quality issues and gain insights into the structure and content of the data.

3. **Data Preparation:**
   - Cleanse and preprocess the data to address issues identified in the data understanding phase.
   - Select relevant features and create new features if necessary.

4. **Modeling:**
   - Choose appropriate modeling techniques based on the nature of the problem and data.
   - Build and train machine learning models on the prepared data.

5. **Evaluation:**
   - Evaluate the performance of the models against business objectives.
   - Fine-tune models and iterate on the process if needed.

6. **Deployment:**
   - Deploy the chosen model into a production environment.
   - Implement monitoring and maintenance procedures.


# Business Understanding
The business goal for this task is to develop a predictive model that can accurately predict whether a patient has diabetes based on diagnostic measurements. The model aims to assist healthcare professionals in the early identification of individuals at risk of diabetes, allowing for timely intervention, monitoring, and treatment. The ultimate objective is to improve the health outcomes of patients by providing an effective tool for diabetes risk assessment.

Key components of the business goal include:

1. **Early Detection:** The model should be able to identify individuals at risk of diabetes at an early stage, enabling proactive healthcare interventions.

2. **Risk Assessment:** Provide a quantitative measure of the likelihood of diabetes based on diagnostic measurements, supporting healthcare professionals in making informed decisions.

In summary, the business goal is centered around leveraging predictive modeling to enhance diabetes risk assessment and management, contributing to better patient care and public health outcomes.

**Goals:**

1. **Develop Predictive Model:**
   - *Objective:* Create a machine learning model for diabetes prediction. We use classification algorithms because the target is binary (diabetes: yes/no).

2. **Enhance Early Detection:**
   - *Objective:* Improve early identification of diabetes risk. To judge whether the model supports early detection, we examine the confusion matrix, focusing on False Negatives (FN) and True Positives (TP):
     - **False Negatives (FN):** the actual outcome is positive (diabetes), but the model predicts negative. These are patients who have diabetes but were missed by the model.
     - **True Positives (TP):** the actual outcome is positive and the model correctly predicts positive. These are patients who have diabetes and were correctly identified.

     For early detection we want to minimize False Negatives and maximize True Positives, which is the same as maximizing recall, TP / (TP + FN). A model with few False Negatives and many True Positives is effective at identifying individuals at risk of diabetes.
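
As a minimal sketch of the check described above, the snippet below derives recall (sensitivity) from a confusion matrix. The labels in `y_true` and `y_hat` are invented for illustration and are not results from this notebook.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels for illustration only (1 = diabetes, 0 = no diabetes)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

# Recall = TP / (TP + FN): the share of actual diabetes cases the model catches
recall = tp / (tp + fn)
print(f"TP={tp}, FN={fn}, recall={recall:.2f}")

# scikit-learn's recall_score computes the same quantity directly
print(recall_score(y_true, y_hat))
```

Tracking recall alongside accuracy keeps the evaluation tied to the early-detection goal, since every False Negative is a missed patient.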

**Objectives:**

1. **Data Preparation:**
   - *Objective:* Gather and preprocess relevant dataset.

2. **EDA and Feature Selection:**
   - *Objective:* Gain insights, select features, and potentially engineer new ones.

3. **Model Development and Evaluation:**
   - *Objective:* Implement, train, and evaluate machine learning models.

4. **Hyperparameter Tuning:**
   - *Objective:* Fine-tune model hyperparameters for optimal performance.

5. **Interpretability and Deployment:**
   - *Objective:* Ensure model interpretability and deploy for accessibility.

6. **Documentation and Reporting:**
   - *Objective:* Document the project and generate a comprehensive report.

7. **Continuous Monitoring and Improvement:**
   - *Objective:* Monitor real-world performance and consider updates as needed.

These concise goals and objectives aim to develop a practical and impactful predictive model for diabetes, supporting early detection and informed healthcare decisions.

# **Data understanding**

In [None]:
df = pd.read_csv("/kaggle/input/healthcare-diabetes/Healthcare-Diabetes.csv")
df

The dataset originates from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict whether a patient has diabetes based on diagnostic measurements.

**Content**

All patients in the dataset are women at least 21 years old.

**Columns**

* Pregnancies: number of pregnancies
* Glucose: glucose test result
* BloodPressure: diastolic blood pressure (mm Hg)
* SkinThickness: triceps skinfold thickness (mm)
* Insulin: 2-hour serum insulin (mu U/ml)
* BMI: body mass index (weight in kg / (height in m)^2)
* DiabetesPedigreeFunction: diabetes pedigree function
* Age: age (years)
* Outcome: class (0 = no diabetes, 1 = diabetes)

**Exploratory Data Analysis and Feature Selection**

In [None]:
df.info()

In [None]:
df.isnull().sum()

* The DataFrame looks clean at first glance: it has no object (string) columns and no NaN values.

In [None]:
df.describe()

*   We need to scale the inputs before modeling, because the features are on different scales.

**Zero values**
*   Glucose: [You may feel very confused, pass out, or have a seizure. Without prompt treatment, severe hypoglycemia may lead to a coma or even death. ](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwi40_fgmceDAxUlQfEDHVDGDQsQFnoECBAQAw&url=https%3A%2F%2Fwww.endocrine.org%2Fpatient-engagement%2Fendocrine-library%2Fsevere-hypoglycemia&usg=AOvVaw2nfoi5YZpLztMF0aqtwqrn&opi=89978449) - ***so we need to remove rows with 0 values in Glucose column if we have to use this column as a feature***
*   Diastolic Blood Pressure: [Extremely low or zero DBP is a possibility in cases of severe hypotension, stiff arteries in elderly, diabetes, arteriovenous malformation, and aortic dissection.](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwj8xZXJnceDAxUHExAIHbhWDBEQFnoECBEQAw&url=https%3A%2F%2Fjournals.lww.com%2Fiaaf%2Ffulltext%2F2016%2F17010%2F_zero__diastolic_blood_pressure.10.aspx&usg=AOvVaw1gs73dVbaDHf83zN0GXBpR&opi=89978449) - ***No action required***
*   SkinThickness: A triceps skinfold thickness of 0 mm is not physiologically meaningful in most cases and more likely indicates missing or incorrect data than a valid measurement. - ChatGPT3.5 ***No action required - we can use BMI instead (depending on the correlation, of course) or delete the rows with 0 values***
*   Insulin: A 2-hour serum insulin concentration of exactly 0 is physiologically unlikely. In a dataset, a recorded value of 0 can have several interpretations:
    1. Missing data or measurement limitation: a 0 may indicate missing data, or a measurement that fell below the detection limit of the assay. Some laboratory assays cannot accurately measure very low concentrations.
    2. Data entry error: it could also be a data entry error or an encoding convention for missing or undetectable values. - ChatGPT3.5

    ***No action required. Decide before using this column as a feature***
*   BMI: A Body Mass Index (BMI) of 0 is not physiologically meaningful and most likely indicates missing or incorrect data rather than a valid measurement. - ChatGPT3.5 ***We have to remove rows with 0 values if we use this column as a feature***
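
To see how many rows each of these decisions would affect, the zeros can be counted per column before anything is dropped. The sketch below uses a small invented frame named `toy` (not the real dataset), and the choice of which columns to treat as missing is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; values are invented for illustration
toy = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "Insulin": [0, 0, 0, 94],
    "Outcome": [1, 0, 1, 0],
})

# Columns where a literal 0 is not a plausible measurement (assumed list)
zero_as_missing = ["Glucose", "BMI", "Insulin"]

# Count zeros per column to see how much data each decision touches
zero_counts = (toy[zero_as_missing] == 0).sum()
print(zero_counts)

# Mark the zeros as missing, then drop rows only for the columns we insist on
cleaned = toy.copy()
cleaned[zero_as_missing] = cleaned[zero_as_missing].replace(0, np.nan)
rows_left = cleaned.dropna(subset=["Glucose", "BMI"])
print(rows_left.shape)
```

Converting zeros to `NaN` first keeps the decision per column explicit: rows can be dropped for some columns while others are left for later imputation or exclusion.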




In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Assuming 'df' is your DataFrame
sns.set(style="darkgrid")

# Create a countplot for the 'Outcome' variable
plt.figure(figsize=(7, 5))
sns.countplot(x='Outcome', data=df)

# Add labels and title
plt.xlabel('Outcome (Diabetes: No/Yes)')
plt.ylabel('Count')
plt.title('Distribution of Diabetes Outcome')

# Show the plot
plt.show()

In [None]:
df.Outcome.value_counts()


* We will first train on the data as-is, without balancing the classes. If the models turn out to be biased toward the majority class, we will need to balance the Outcome values.
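
If balancing does become necessary, one simple option is random undersampling of the majority class. The sketch below illustrates the idea on an invented frame named `toy`; it relies on `DataFrameGroupBy.sample`, which is available in pandas 1.1 and later.

```python
import pandas as pd

# Toy data standing in for the real frame: 6 negatives, 3 positives
toy = pd.DataFrame({
    "Glucose": [85, 90, 95, 100, 105, 110, 150, 160, 170],
    "Outcome": [0, 0, 0, 0, 0, 0, 1, 1, 1],
})

# Size of the smallest class
n_minority = int(toy["Outcome"].value_counts().min())

# Draw the same number of rows from every class, then shuffle the result
balanced = (
    toy.groupby("Outcome")
    .sample(n=n_minority, random_state=42)
    .sample(frac=1.0, random_state=42)
    .reset_index(drop=True)
)
print(balanced["Outcome"].value_counts())
```

Undersampling discards majority-class rows, so it trades data volume for balance; that cost is the reason to try the unbalanced data first.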

In [None]:
df.corr()

**The correlations are easier to read as a visualisation, so let's use a heatmap**

In [None]:
# Calculate the correlation matrix
correlation_matrix = df.corr()

# Create a heatmap using seaborn
plt.figure(figsize=(12, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".4f", linewidths=.5)

# Show the plot
plt.show()

In [None]:
# Calculate the correlation with 'Outcome'
correlation_with_outcome = df.corrwith(df['Outcome']).sort_values(ascending=False)

# Create a heatmap using seaborn for correlation with 'Outcome'
plt.figure(figsize=(8, 6))
sns.heatmap(pd.DataFrame(correlation_with_outcome, columns=['correlation']), annot=True, cmap='coolwarm', fmt=".4f", linewidths=.5)

# Show the plot
plt.title('Correlation with Outcome')
plt.show()

# **Data preparation**

In [None]:
# saving the features to another df_model
df_model = df[['Pregnancies', 'Age', 'BMI', 'Glucose', 'Outcome', 'DiabetesPedigreeFunction', 'Insulin', 'SkinThickness', 'BloodPressure']].copy()
print(df_model.shape)

# Drop rows with 0 values in Glucose, BMI, Insulin, BloodPressure, and SkinThickness
df_model = df_model[df_model['Glucose'] !=0 ]
df_model = df_model[df_model['BMI'] !=0 ]
df_model = df_model[df_model['Insulin'] !=0 ]
df_model = df_model[df_model['BloodPressure'] !=0 ]
df_model = df_model[df_model['SkinThickness'] !=0 ]
print(df_model.shape)

In [None]:
df_model.Outcome.value_counts()

In [None]:
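# Difference between the two class counts: the number of majority-class rows that balancing would drop
960-467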

* Let's try without equalizing Outcome values first

In [None]:
# # Equalizing Outcome column so model didn't skew to one side.

# # Assuming 'Outcome' is the column representing diabetes status
# # Replace 'df' with your actual DataFrame name
# df0 = df_model[df_model['Outcome'] == 0]

# # Number of rows to drop randomly
# rows_to_drop = 493

# # Randomly drop rows
# random_rows = df0.sample(n=rows_to_drop, random_state=42)  # You can change the random_state for reproducibility
# df_dropped = df0.drop(random_rows.index)

# # Remaining rows in the DataFrame (Outcome 1)
# df_remaining = df_model[df_model['Outcome'] == 1]

# # Concatenate the randomly dropped rows and the remaining rows
# df_merged = pd.concat([df_dropped, df_remaining], ignore_index=True)

# # Replace 'df_merged' with your actual merged DataFrame name
# df_merged = df_merged.sample(frac=1.0, random_state=42)  # You can change the random_state for reproducibility

# print(df_merged.shape)
# print(df_merged.Outcome.value_counts())

In [None]:
from sklearn.model_selection import train_test_split

# # Split data into features and target
# x = df_merged.drop('Outcome', axis=1)
# y = df_merged['Outcome']


# Split data into features and target
x = df_model.drop('Outcome', axis=1)
y = df_model['Outcome']

# Split data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [None]:
print(x_train.shape)
print(x_test.shape)

**Feature scaling (min-max normalization)**

In [None]:
from sklearn.preprocessing import MinMaxScaler
min_max_scaler = MinMaxScaler()

# Fit the scaler on the training set only, then reuse its min/max on the test set
x_train_scaled = min_max_scaler.fit_transform(x_train)
x_test_scaled = min_max_scaler.transform(x_test)
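
A scaler is normally fitted on the training split only and then applied unchanged to the test split, so that both are mapped with the same minimum and maximum. The sketch below shows the effect on invented one-column arrays; the names `train` and `test` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Tiny made-up feature matrices (one column) for illustration
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[15.0], [40.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learns min=10 and max=30 from train
test_scaled = scaler.transform(test)        # reuses the training min and max

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Test values outside the training range map outside [0, 1], which is expected. Calling `fit_transform` on the test set instead would rescale it with its own minimum and maximum and make the two splits inconsistent.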

# **Modeling & Evaluation**

# Logistic regression

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 2: Model Training
# Create a Logistic Regression model
logreg_model = LogisticRegression(random_state=42)

# Train the model on the scaled training data
logreg_model.fit(x_train_scaled, y_train)

# Step 3: Model Evaluation
# Make predictions on the testing set
y_pred = logreg_model.predict(x_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

# Display results
print(f'Accuracy: {accuracy:.2f}')
print('\nConfusion Matrix:')
print(conf_matrix)
print('\nClassification Report:')
print(classification_rep)
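
Because missed diabetes cases (False Negatives) matter most here, one option worth considering is lowering the decision threshold below the default of 0.5. This is an additional technique, not something the notebook applies; the sketch below uses synthetic data from `make_classification`, and the threshold of 0.3 is an arbitrary illustrative value.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a minority positive class (illustration only)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.65, 0.35], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Probability of the positive class for each test row
proba = clf.predict_proba(X_te)[:, 1]

# predict() uses a 0.5 cut-off; a lower cut-off flags more rows as positive
recall_default = recall_score(y_te, (proba >= 0.5).astype(int))
recall_lowered = recall_score(y_te, (proba >= 0.3).astype(int))
print(f"recall at 0.5: {recall_default:.2f}, recall at 0.3: {recall_lowered:.2f}")
```

Lowering the threshold can never reduce recall, but it usually increases False Positives, so the cut-off should be chosen with the cost of unnecessary follow-up tests in mind.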


**Check that the model is not skewed toward one class**

In [None]:
from sklearn.metrics import roc_curve, precision_recall_curve, auc

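# Get predicted probabilities for class 1 from the logistic regression model
y_pred_prob = logreg_model.predict_proba(x_test_scaled)[:, 1]

# Calculate ROC curve
fpr, tpr, _ = roc_curve(y_test, y_pred_prob)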
roc_auc = auc(fpr, tpr)

# Calculate Precision-Recall curve
precision, recall, _ = precision_recall_curve(y_test, y_pred_prob)

# Plot ROC curve and Precision-Recall curve
plt.figure(figsize=(12, 6))

# ROC curve
plt.subplot(1, 2, 1)
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')

# Precision-Recall curve
plt.subplot(1, 2, 2)
plt.plot(recall, precision, color='green', lw=2, label='Precision-Recall curve')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc='upper right')

plt.tight_layout()
plt.show()


# Decision Tree

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 1: Model Training
# Create a Decision Tree model
decision_tree_model = DecisionTreeClassifier(random_state=42)

# Train the model on the scaled training data
decision_tree_model.fit(x_train_scaled, y_train)

# Step 2: Model Evaluation
# Make predictions on the testing set
y_pred_tree = decision_tree_model.predict(x_test_scaled)

# Evaluate the Decision Tree model
accuracy_tree = accuracy_score(y_test, y_pred_tree)
conf_matrix_tree = confusion_matrix(y_test, y_pred_tree)
classification_rep_tree = classification_report(y_test, y_pred_tree)

# Display results for Decision Tree
print(f'Decision Tree Accuracy: {accuracy_tree:.2f}')
print('\nDecision Tree Confusion Matrix:')
print(conf_matrix_tree)
print('\nDecision Tree Classification Report:')
print(classification_rep_tree)


**Check that the model is not skewed toward one class**

In [None]:
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

# Get predicted probabilities for class 1
y_probs_tree = decision_tree_model.predict_proba(x_test_scaled)[:, 1]

# Compute ROC curve and AUC
fpr_tree, tpr_tree, _ = roc_curve(y_test, y_probs_tree)
roc_auc_tree = auc(fpr_tree, tpr_tree)

# Plot ROC curve for Decision Tree
plt.figure(figsize=(8, 6))
plt.plot(fpr_tree, tpr_tree, color='darkgreen', lw=2, label=f'Decision Tree ROC curve (AUC = {roc_auc_tree:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random Guess')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Decision Tree ROC Curve')
plt.legend(loc='lower right')
plt.show()


# Random Forest (Highest accuracy)

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Create a Random Forest model
random_forest_model = RandomForestClassifier(random_state=42)

# Train the model on the scaled training data
random_forest_model.fit(x_train_scaled, y_train)

# Make predictions on the testing set
y_pred_rf = random_forest_model.predict(x_test_scaled)

# Evaluate the Random Forest model
accuracy_rf = accuracy_score(y_test, y_pred_rf)
conf_matrix_rf = confusion_matrix(y_test, y_pred_rf)
classification_rep_rf = classification_report(y_test, y_pred_rf)

# Display results for Random Forest
print(f'Random Forest Accuracy: {accuracy_rf:.2f}')
print('\nRandom Forest Confusion Matrix:')
print(conf_matrix_rf)
print('\nRandom Forest Classification Report:')
print(classification_rep_rf)


In [None]:
from sklearn.metrics import confusion_matrix

# Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix_rf, annot=True, fmt='d', cmap='Blues', xticklabels=['No Diabetes', 'Diabetes'], yticklabels=['No Diabetes', 'Diabetes'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()


In [None]:
print(pd.Series(y_pred_rf).value_counts())

**Check that the model is not skewed toward one class: ROC and Precision-Recall curves**

In [None]:
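from sklearn.metrics import roc_curve, auc, precision_recall_curve

# Get predicted probabilities for class 1
y_probs_rf = random_forest_model.predict_proba(x_test_scaled)[:, 1]

# Compute ROC curve and AUC for Random Forest
fpr_rf, tpr_rf, _ = roc_curve(y_test, y_probs_rf)
roc_auc_rf = auc(fpr_rf, tpr_rf)

# Compute Precision-Recall curve
precision_rf, recall_rf, _ = precision_recall_curve(y_test, y_probs_rf)

# Plot ROC curve and Precision-Recall curve side by side
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))

# Plot ROC curve for Random Forest
axes[0].plot(fpr_rf, tpr_rf, color='darkorange', lw=2, label=f'Random Forest ROC curve (AUC = {roc_auc_rf:.2f})')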
axes[0].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random Guess')
axes[0].set_xlabel('False Positive Rate')
axes[0].set_ylabel('True Positive Rate')
axes[0].set_title('Random Forest ROC Curve')
axes[0].legend(loc='lower right')

# Plot Precision-Recall curve for Random Forest
axes[1].plot(recall_rf, precision_rf, color='darkorange', lw=2, label='Random Forest')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].set_title('Random Forest Precision-Recall Curve')
axes[1].legend(loc='upper right')

plt.show()


* In an ideal scenario, the ROC curve hugs the top-left corner and the AUC (Area Under the Curve) is close to 1. This means the model separates the two classes well: it reaches a high true positive rate at a low false positive rate.
* The Precision-Recall curve of a good model keeps precision high as recall increases across thresholds. If both curves look good, this is further evidence that the model is not skewed toward one class.
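
Both curves can also be summarized as single numbers, which makes models easier to compare. The sketch below uses invented labels and scores (not outputs of this notebook) and compares average precision against the positive-class rate, which is the score an uninformative classifier would be expected to reach.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

# Invented labels and predicted probabilities, for illustration only
toy_labels = [0, 0, 0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.8, 0.7, 0.9]

# Area under the ROC curve: 0.5 is chance level, 1.0 is perfect ranking
toy_auc = roc_auc_score(toy_labels, toy_scores)

# Average precision summarizes the Precision-Recall curve
toy_ap = average_precision_score(toy_labels, toy_scores)

# Share of positives: the reference level for average precision
positive_rate = sum(toy_labels) / len(toy_labels)

print(f"ROC AUC={toy_auc:.3f}, AP={toy_ap:.3f}, positive rate={positive_rate:.3f}")
```

On imbalanced data the Precision-Recall summary is often the more informative of the two, because it ignores the large number of True Negatives.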

# Support Vector Machine (SVM)

In [None]:
from sklearn.svm import SVC

# Create an SVM model
svm_model = SVC(random_state=42)

# Train the model on the scaled training data
svm_model.fit(x_train_scaled, y_train)

# Make predictions on the testing set
y_pred_svm = svm_model.predict(x_test_scaled)

# Evaluate the SVM model
accuracy_svm = accuracy_score(y_test, y_pred_svm)
conf_matrix_svm = confusion_matrix(y_test, y_pred_svm)
classification_rep_svm = classification_report(y_test, y_pred_svm)

# Display results for SVM
print(f'SVM Accuracy: {accuracy_svm:.2f}')
print('\nSVM Confusion Matrix:')
print(conf_matrix_svm)
print('\nSVM Classification Report:')
print(classification_rep_svm)


# Cross-Validation with Gradient Boosting

In [None]:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

# Create a Gradient Boosting model
gbm_model = GradientBoostingClassifier(random_state=42)

# Perform cross-validation
cv_scores_gbm = cross_val_score(gbm_model, x_train_scaled, y_train, cv=5, scoring='accuracy')

# Display cross-validation results for Gradient Boosting
print(f'Cross-Validation Scores for Gradient Boosting: {cv_scores_gbm}')
print(f'Mean Accuracy: {cv_scores_gbm.mean():.2f}')
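
When scaling is part of the workflow, a common pattern is to wrap the scaler and the model in a pipeline so that the scaler is re-fitted inside every fold. The sketch below is an assumption-laden illustration on synthetic data from `make_classification`; it also scores by recall rather than accuracy, which is a swapped-in choice that matches the early-detection goal.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the notebook would pass its unscaled training split
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# The scaler is fitted on each fold's training part only
pipeline = make_pipeline(MinMaxScaler(), GradientBoostingClassifier(random_state=42))

# One recall value per fold
cv_recall = cross_val_score(pipeline, X_demo, y_demo, cv=5, scoring="recall")
print(cv_recall)
print(f"Mean recall: {cv_recall.mean():.2f}")
```

Tree-based models such as gradient boosting are largely insensitive to feature scaling, so the pipeline matters more for models like logistic regression or SVM; it is shown here only to keep the example close to the cell above.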


# XGBoost (Second highest accuracy)

In [None]:
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Create an XGBoost model
xgb_model = XGBClassifier(random_state=42)

# Train the model on the scaled training data
xgb_model.fit(x_train_scaled, y_train)

# Make predictions on the testing set
y_pred_xgb = xgb_model.predict(x_test_scaled)

# Evaluate the XGBoost model
accuracy_xgb = accuracy_score(y_test, y_pred_xgb)
conf_matrix_xgb = confusion_matrix(y_test, y_pred_xgb)
classification_rep_xgb = classification_report(y_test, y_pred_xgb)

# Display results for XGBoost
print(f'XGBoost Accuracy: {accuracy_xgb:.2f}')
print('\nXGBoost Confusion Matrix:')
print(conf_matrix_xgb)
print('\nXGBoost Classification Report:')
print(classification_rep_xgb)


**Check that the model is not skewed toward one class**

In [None]:
from sklearn.metrics import roc_curve, auc, precision_recall_curve

# Get predicted probabilities for class 1
y_probs_xgb = xgb_model.predict_proba(x_test_scaled)[:, 1]

# Compute ROC curve and AUC for XGBoost
fpr_xgb, tpr_xgb, _ = roc_curve(y_test, y_probs_xgb)
roc_auc_xgb = auc(fpr_xgb, tpr_xgb)

# Compute Precision-Recall curve
precision_xgb, recall_xgb, _ = precision_recall_curve(y_test, y_probs_xgb)

# Plot ROC curve and Precision-Recall curve side by side
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))

# Plot ROC curve for XGBoost
axes[0].plot(fpr_xgb, tpr_xgb, color='darkorange', lw=2, label=f'XGBoost ROC curve (AUC = {roc_auc_xgb:.2f})')
axes[0].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random Guess')
axes[0].set_xlabel('False Positive Rate')
axes[0].set_ylabel('True Positive Rate')
axes[0].set_title('XGBoost ROC Curve')
axes[0].legend(loc='lower right')

# Plot Precision-Recall curve for XGBoost
axes[1].plot(recall_xgb, precision_xgb, color='darkorange', lw=2, label='XGBoost')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].set_title('XGBoost Precision-Recall Curve')
axes[1].legend(loc='upper right')

plt.show()


# Keras

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Create a Sequential model
model = Sequential()

# Add layers to the model
model.add(Dense(64, input_dim=x_train_scaled.shape[1], activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # Output layer for binary classification

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Display the model summary
model.summary()

# Train the model
history = model.fit(x_train_scaled, y_train, epochs=20, batch_size=32, validation_split=0.2)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test_scaled, y_test)
print(f'Test Accuracy: {test_acc:.4f}')
