# Feature Scaling & Normalization with Heart Disease

---
## The goal of this task
---

##### I want to build a classification model on the Heart Disease dataset (predicting whether a patient has heart disease), so that I can learn how feature scaling and normalization affect neural network training and convergence.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping



---
### Load data and train a baseline on raw features
---

In [7]:
# Load dataset (Heart Disease dataset, local CSV copy)
csv_path = r"C:\Users\bbuser\Downloads\heart.csv"
data = pd.read_csv(csv_path)

# Features and target
X = data.drop("target", axis=1)
y = data["target"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def build_model():
    model = Sequential([
        Dense(16, activation='relu', input_shape=(X_train.shape[1],)),
        Dense(8, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Raw data model
model_raw = build_model()
history_raw = model_raw.fit(X_train, y_train, validation_split=0.2, epochs=50, verbose=0)

# Evaluation
y_pred_raw = (model_raw.predict(X_test) > 0.5).astype(int)
print("Raw Data Performance:")
print(classification_report(y_test, y_pred_raw))



7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
Raw Data Performance:
              precision    recall  f1-score   support

           0       0.75      0.75      0.75       102
           1       0.75      0.75      0.75       103

    accuracy                           0.75       205
   macro avg       0.75      0.75      0.75       205
weighted avg       0.75      0.75      0.75       205



## Q2: Which scaling method works better: MinMaxScaler vs StandardScaler?

### Answer:
- **MinMaxScaler (Normalization):** Rescales each feature to [0, 1] using its minimum and maximum in the training set.  
  A reasonable choice when features are bounded and not Gaussian, but a single outlier can compress the remaining values into a narrow range.  
- **StandardScaler (Standardization):** Centers each feature to mean 0 and scales it to standard deviation 1.  
  Usually preferred when features are roughly normal or have very different variances.  
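As a minimal sketch of what the two scalers compute, the formulas can be written directly in NumPy. The values in `x` are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Illustrative values only (hypothetical, not from heart.csv)
x = np.array([120.0, 130.0, 140.0, 150.0, 160.0])

# Min-max normalization: (x - min) / (max - min), maps the column to [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization: (x - mean) / std, gives mean 0 and standard deviation 1.
# np.std defaults to the population std (ddof=0), which StandardScaler also uses.
x_std = (x - x.mean()) / x.std()

print("min-max:     ", x_minmax)
print("standardized:", x_std)
print("mean, std after standardization:", x_std.mean(), x_std.std())
```

In practice the minimum, maximum, mean, and standard deviation should come from the training split only, which is what `fit_transform` on the training data followed by `transform` on the test data does.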

We’ll compare both:


In [8]:
# MinMax Scaled data
scaler_minmax = MinMaxScaler()
X_train_minmax = scaler_minmax.fit_transform(X_train)
X_test_minmax = scaler_minmax.transform(X_test)

model_minmax = build_model()
history_minmax = model_minmax.fit(X_train_minmax, y_train, validation_split=0.2, epochs=50, verbose=0)

y_pred_minmax = (model_minmax.predict(X_test_minmax) > 0.5).astype(int)
print("MinMaxScaler Performance:")
print(classification_report(y_test, y_pred_minmax))


# Standard Scaled data
scaler_standard = StandardScaler()
X_train_std = scaler_standard.fit_transform(X_train)
X_test_std = scaler_standard.transform(X_test)

model_std = build_model()
history_std = model_std.fit(X_train_std, y_train, validation_split=0.2, epochs=50, verbose=0)

y_pred_std = (model_std.predict(X_test_std) > 0.5).astype(int)
print("StandardScaler Performance:")
print(classification_report(y_test, y_pred_std))




7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
MinMaxScaler Performance:
              precision    recall  f1-score   support

           0       0.86      0.72      0.78       102
           1       0.76      0.88      0.82       103

    accuracy                           0.80       205
   macro avg       0.81      0.80      0.80       205
weighted avg       0.81      0.80      0.80       205





7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step
StandardScaler Performance:
              precision    recall  f1-score   support

           0       0.87      0.78      0.82       102
           1       0.81      0.88      0.84       103

    accuracy                           0.83       205
   macro avg       0.84      0.83      0.83       205
weighted avg       0.84      0.83      0.83       205



## Q3: Do categorical features need to be one-hot encoded, and how does that affect performance?

### Answer:
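Generally yes. Neural networks expect numeric inputs, and nominal categories should be **one-hot encoded** rather than fed in as integer codes.  
In the Heart Disease dataset the categorical features are already stored as integers, so the network accepts them as they are, but for multi-category features such as `cp`, `restecg`, `slope`, and `thal` the codes imply an order and spacing that may not exist. Encoding prevents the model from assuming an ordinal relationship between categories. Binary features such as `sex`, `fbs`, and `exang` are already 0/1, so encoding them with `drop='first'` leaves them effectively unchanged.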
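To make the encoding concrete, here is a small NumPy sketch of one-hot encoding a single categorical column. The `cp` codes below are hypothetical values for five patients, chosen only to illustrate the transform.

```python
import numpy as np

# Hypothetical chest-pain codes for five patients (illustrative values)
cp = np.array([0, 2, 3, 1, 2])

# One column per category: row i has a 1 in the column of its own category
categories = np.unique(cp)
one_hot = (cp[:, None] == categories[None, :]).astype(int)

# Dropping the first column mirrors OneHotEncoder(drop='first'):
# category 0 becomes the all-zeros baseline row
one_hot_dropped = one_hot[:, 1:]

print(one_hot)
print(one_hot_dropped)
```

Dropping one column removes the redundancy between the indicator columns (they always sum to 1) without losing information.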

We’ll preprocess categorical variables with **OneHotEncoder** and compare performance.


In [13]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

categorical = ['sex','cp','fbs','restecg','exang','slope','thal']
numeric = [col for col in X.columns if col not in categorical]

# Column transformer: OneHotEncode categorical, scale numeric
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric),
    ('cat', OneHotEncoder(drop='first'), categorical)])

X_train_enc = preprocessor.fit_transform(X_train)
X_test_enc = preprocessor.transform(X_test)

# Build a model with input shape matching the encoded features
def build_model_enc():
    model = Sequential([
        Dense(16, activation='relu', input_shape=(X_train_enc.shape[1],)),
        Dense(8, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

model_enc = build_model_enc()
history_enc = model_enc.fit(X_train_enc, y_train, validation_split=0.2, epochs=50, verbose=0)

y_pred_enc = (model_enc.predict(X_test_enc) > 0.5).astype(int)
print("With One-Hot Encoding Performance:")
print(classification_report(y_test, y_pred_enc))




7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step
With One-Hot Encoding Performance:
              precision    recall  f1-score   support

           0       0.93      0.77      0.84       102
           1       0.81      0.94      0.87       103

    accuracy                           0.86       205
   macro avg       0.87      0.86      0.86       205
weighted avg       0.87      0.86      0.86       205



## Q4: How sensitive is the neural network to changes in learning rate when features are scaled vs unscaled?

### Answer:
- On **unscaled data**, features with large ranges dominate the gradients, so the learning rate is harder to tune: a rate small enough to stay stable may be too small to make much progress in 50 epochs.  
- On **scaled data**, larger learning rates (0.001–0.01) tend to stay stable and lead to faster convergence.  
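The intuition can be illustrated on a simpler problem than the network used here. The sketch below assumes plain gradient descent on a linear least-squares loss (not Adam, and not a neural network), where the largest stable step size is known to be 2 divided by the largest eigenvalue of the Hessian. The two features are synthetic and only meant to mimic columns on very different scales.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic features on very different scales (illustrative, not heart.csv):
# one in the hundreds, one between 0 and 1
X_raw = np.column_stack([rng.normal(250.0, 50.0, n), rng.uniform(0.0, 1.0, n)])
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

def max_stable_lr(X):
    """Upper bound on the step size for plain gradient descent on the loss
    0.5 * mean((X @ w - y)**2), whose Hessian is X.T @ X / n: 2 / lambda_max."""
    hessian = X.T @ X / len(X)
    return 2.0 / np.linalg.eigvalsh(hessian).max()

print(f"raw:    lr < {max_stable_lr(X_raw):.6f}")
print(f"scaled: lr < {max_stable_lr(X_scaled):.6f}")
```

Adam rescales its updates per parameter, so the effect on the Keras models in this notebook is less extreme than this bound suggests, but the direction is the same: standardized inputs tolerate much larger steps.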

We’ll compare training with different learning rates:


In [14]:
def train_with_lr(lr, scaled=True):
    if scaled:
        X_tr, X_te = X_train_std, X_test_std
    else:
        X_tr, X_te = X_train, X_test

    model = Sequential([
        Dense(16, activation='relu', input_shape=(X_tr.shape[1],)),
        Dense(8, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(X_tr, y_train, validation_split=0.2, epochs=50, verbose=0)
    results = model.evaluate(X_te, y_test, verbose=0)
    return results

for lr in [0.0001, 0.001, 0.01]:
    res_unscaled = train_with_lr(lr, scaled=False)
    res_scaled = train_with_lr(lr, scaled=True)
    print(f"LR={lr} | Unscaled Acc={res_unscaled[1]:.3f} | Scaled Acc={res_scaled[1]:.3f}")




LR=0.0001 | Unscaled Acc=0.551 | Scaled Acc=0.732




LR=0.001 | Unscaled Acc=0.771 | Scaled Acc=0.800




LR=0.01 | Unscaled Acc=0.776 | Scaled Acc=0.951
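Note that the scaled model at LR=0.001 scored 0.800 here but 0.83 in Q2 with the same settings, because no random seed is fixed. A rough way to account for this is to repeat each configuration over several seeds and report the mean and spread. The helper below is a sketch: `summarize_runs` and the `run_fn` callable are hypothetical names, and the stand-in `run_fn` simply replays the two accuracies reported in this notebook instead of training a model.

```python
import statistics

def summarize_runs(run_fn, seeds):
    """Call run_fn(seed) once per seed and return (mean, stdev) of the scores."""
    scores = [run_fn(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

# Stand-in for a real training run. A real run_fn would set the seed, train the
# model, and return test accuracy. Here it replays the two accuracies this
# notebook reported for StandardScaler at LR=0.001 (the seed keys are arbitrary).
reported = {0: 0.83, 1: 0.800}
mean_acc, std_acc = summarize_runs(lambda seed: reported[seed], seeds=[0, 1])
print(f"accuracy = {mean_acc:.3f} +/- {std_acc:.3f}")
```

With more seeds per configuration, differences of a few points between scalers can be judged against this spread rather than against a single run.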


# Final Observations

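1. **Raw (Unscaled Data):**
   - Reaches 0.75 test accuracy after 50 epochs, the lowest of the four preprocessing setups.
   - Convergence speed was not measured directly; plotting the `history_*` objects would show how the loss curves compare.

2. **MinMaxScaler vs StandardScaler:**
   - Both improve accuracy over raw data (0.80 and 0.83 vs 0.75).
   - StandardScaler scored slightly higher in this run, but the gap is within run-to-run variation (the same configuration scored 0.800 in the Q4 experiment), so the ranking is not conclusive.

3. **One-Hot Encoding:**
   - Gave the best result at the default learning rate (0.86 vs 0.83 with StandardScaler alone).
   - Prevents false ordinal relationships between category codes.

4. **Learning Rate Sensitivity:**
   - Unscaled data showed no sign of divergence at any tested rate, but accuracy was poor at 0.0001 (0.551) and plateaued near 0.77 at 0.001 and 0.01.
   - Scaled data benefited from higher learning rates: 0.732, 0.800, and 0.951 at 0.0001, 0.001, and 0.01.

**Conclusion:** In these single-run experiments, scaling and encoding consistently improved test accuracy, and scaled inputs made better use of larger learning rates.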
