# Feature Scaling & Normalization with Heart Disease

### How does training change if I don’t scale features at all?

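If we do not scale the features, those with large numeric ranges (for example `chol` and `trestbps`, which are in the hundreds) dominate training: they produce much larger gradients than small-range features such as `oldpeak`, so the weight updates are unbalanced and convergence is slower. In the run below, the model trained on raw features reaches 0.7463 test accuracy, compared with 0.8927 (MinMaxScaler) and 0.9512 (StandardScaler).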
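A minimal sketch of why this happens, using NumPy and made-up values loosely shaped like `chol` and `oldpeak` (the rows and labels are illustrative assumptions, not taken from `heart.csv`). For a single logistic unit with zero initial weights, the gradient for each weight is proportional to that feature's magnitude, so the large-range column receives a far larger update.

```python
import numpy as np

# Hypothetical toy rows: column 0 mimics chol, column 1 mimics oldpeak.
X = np.array([[212.0, 1.0],
              [203.0, 3.1],
              [174.0, 2.6],
              [294.0, 1.9]])
y = np.array([0.0, 0.0, 1.0, 1.0])  # made-up labels

def logistic_gradient(X, y, w):
    """Gradient of mean binary cross-entropy w.r.t. the weights (no bias term)."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

w0 = np.zeros(2)
grad_raw = logistic_gradient(X, y, w0)

# Standardize each column to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
grad_scaled = logistic_gradient(X_scaled, y, w0)

# How much larger is the gradient for column 0 than for column 1?
ratio_raw = abs(grad_raw[0]) / abs(grad_raw[1])
ratio_scaled = abs(grad_scaled[0]) / abs(grad_scaled[1])
print("raw gradient:   ", grad_raw, "ratio:", ratio_raw)
print("scaled gradient:", grad_scaled, "ratio:", ratio_scaled)
```

With these toy values the raw gradient for the first weight is more than a hundred times larger than for the second; after standardization the two are within the same order of magnitude.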

### Which scaling method (MinMaxScaler vs StandardScaler) works better on this dataset?

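StandardScaler works better here. The features have very different ranges and some contain outliers; standardizing each feature to zero mean and unit variance keeps training stable and treats all features evenly. In the run below it reaches 0.9512 test accuracy versus 0.8927 for MinMaxScaler, although this comes from a single run without a fixed seed for the model weights.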
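To make the difference concrete, here is a small sketch of the two per-column formulas written out in NumPy: min-max scaling computes `(x - min) / (max - min)` and standardization computes `(x - mean) / std`. The cholesterol-like values, including the outlier, are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical cholesterol-like values with one extreme outlier (564).
chol = np.array([174.0, 203.0, 212.0, 254.0, 294.0, 564.0])

# Min-max scaling: maps the column onto [0, 1].
chol_minmax = (chol - chol.min()) / (chol.max() - chol.min())

# Standardization: zero mean and unit variance (population std, ddof=0).
chol_standard = (chol - chol.mean()) / chol.std()

print("min-max :", np.round(chol_minmax, 3))
print("standard:", np.round(chol_standard, 3))
```

In this toy column the outlier takes the value 1 under min-max scaling and the five remaining values are squeezed into roughly the lowest third of the range. Standardization is also affected by the outlier, but it does not force the other values into a fixed narrow interval.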

### Do categorical features need to be one-hot encoded, and how does that affect performance?

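Yes. Neural networks work only with numerical data, so categorical features must be encoded. In this dataset the categorical columns (such as `cp`, `restecg`, `slope`, and `thal`) are already stored as integer codes; one-hot encoding them keeps the network from treating those codes as ordered magnitudes, which usually helps it learn faster and can improve accuracy.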
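As a rough illustration of what one-hot encoding does to a single column, the sketch below encodes a handful of hypothetical `cp` codes with plain NumPy. It assumes `cp` takes four values, 0 to 3; in practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would do the same job on the real DataFrame.

```python
import numpy as np

# Hypothetical chest-pain codes; cp is assumed to take the values 0..3.
cp = np.array([0, 0, 1, 2, 3, 1])
n_categories = 4

# Row i of the identity matrix is the one-hot vector for category i,
# so indexing the identity matrix with the codes encodes the whole column.
cp_onehot = np.eye(n_categories, dtype=int)[cp]
print(cp_onehot)
```

Each row now has exactly one 1, so no category is numerically "larger" than another.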

### How sensitive is the neural network to changes in learning rate when features are scaled vs unscaled?

Neural networks are much more sensitive to learning rate when features are unscaled.
Scaling makes optimization smoother, faster, and far less dependent on fine-tuning the learning rate.
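The effect can be sketched with plain gradient descent on a two-parameter quadratic loss, where the curvature along each weight grows with the square of the corresponding feature's scale. This is a simplified stand-in, not the notebook's model: the feature scales (1 and 100), the step count, and the divergence threshold are assumptions, and the notebook itself uses Adam, which adapts the step per parameter and so softens the effect.

```python
import numpy as np

def run_gd(curvatures, lr, steps=50):
    """Plain gradient descent on L(w) = 0.5 * sum(curvatures * w**2).

    Returns the final loss, or inf if the weights blow up.
    """
    w = np.ones_like(curvatures)
    for _ in range(steps):
        w = w - lr * curvatures * w      # gradient of L is curvatures * w
        if np.abs(w).max() > 1e6:        # treat as diverged and stop early
            return float("inf")
    return float(0.5 * np.sum(curvatures * w**2))

# Assumed feature scales 1 and 100 give curvatures 1 and 100**2.
unscaled = np.array([1.0, 100.0**2])
# After scaling, both directions have the same curvature.
scaled = np.array([1.0, 1.0])

for lr in (1e-4, 1e-2, 1.0):
    print(f"lr={lr:g}  unscaled loss={run_gd(unscaled, lr):.3g}  "
          f"scaled loss={run_gd(scaled, lr):.3g}")
```

In this toy setting the unscaled problem only stays stable at the smallest learning rate, while the scaled problem converges at every learning rate tried, across four orders of magnitude.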

In [50]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

In [51]:
df = pd.read_csv("Data/heart.csv")
df

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,52,1,0,125,212,0,1,168,0,1.0,2,2,3,0
1,53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
2,70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
3,61,1,0,148,203,0,1,161,0,0.0,2,1,3,0
4,62,0,0,138,294,1,1,106,0,1.9,1,3,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1020,59,1,1,140,221,0,1,164,1,0.0,2,0,2,1
1021,60,1,0,125,258,0,0,141,1,2.8,1,1,3,0
1022,47,1,0,110,275,0,0,118,1,1.0,1,1,2,0
1023,50,0,0,110,254,0,0,159,0,0.0,2,0,2,1


In [52]:
X = df.drop("target", axis=1)
y = df["target"]

In [53]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [54]:
# Create scalers
minmax = MinMaxScaler().fit(X_train)
std = StandardScaler().fit(X_train)

# Raw data
X_train_raw = X_train.values
X_test_raw = X_test.values

# MinMax scaled data
X_train_minmax = minmax.transform(X_train)
X_test_minmax = minmax.transform(X_test)

# Standard scaled data
X_train_std = std.transform(X_train)
X_test_std = std.transform(X_test)

In [55]:
# Training inputs for the three experiments. Reuse the arrays from the
# previous cell instead of fitting a second pair of scalers.
X_raw = X_train_raw
X_minmax = X_train_minmax
X_std = X_train_std

In [60]:
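from tensorflow.keras import Input

def build_model():
    """Small binary classifier: two hidden ReLU layers and a sigmoid output."""
    model = Sequential([
        # An explicit Input layer avoids the Keras warning about passing
        # input_shape to the first Dense layer.
        Input(shape=(X_train.shape[1],)),
        Dense(16, activation='relu'),
        Dense(8, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model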

In [None]:
model_raw = build_model()
history_raw = model_raw.fit(X_raw, y_train, epochs=100, verbose=0)

model_minmax = build_model()
history_minmax = model_minmax.fit(X_minmax, y_train, epochs=100, verbose=0)

model_std = build_model()
history_std = model_std.fit(X_std, y_train, epochs=100, verbose=0)



In [58]:
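def evaluate(model, X_test, y_test):
    """Return accuracy, precision, and recall on the test set."""
    # Threshold the sigmoid outputs at 0.5 and flatten (n, 1) -> (n,).
    y_pred = (model.predict(X_test) > 0.5).astype("int32").ravel()
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred)
    }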

In [None]:
results_raw = evaluate(model_raw, X_test_raw, y_test)
results_minmax = evaluate(model_minmax, X_test_minmax, y_test)
results_std = evaluate(model_std, X_test_std, y_test)

print("\nRaw Data Results:")
print(f"  Accuracy : {results_raw['accuracy']:.4f}")
print(f"  Precision: {results_raw['precision']:.4f}")
print(f"  Recall   : {results_raw['recall']:.4f}")

print("\nMinMax Scaled Results:")
print(f"  Accuracy : {results_minmax['accuracy']:.4f}")
print(f"  Precision: {results_minmax['precision']:.4f}")
print(f"  Recall   : {results_minmax['recall']:.4f}")

print("\nStandard Scaled Results:")
print(f"  Accuracy : {results_std['accuracy']:.4f}")
print(f"  Precision: {results_std['precision']:.4f}")
print(f"  Recall   : {results_std['recall']:.4f}")


7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step

Raw Data Results:
  Accuracy : 0.7463
  Precision: 0.7476
  Recall   : 0.7476

MinMax Scaled Results:
  Accuracy : 0.8927
  Precision: 0.8649
  Recall   : 0.9320

Standard Scaled Results:
  Accuracy : 0.9512
  Precision: 0.9429
  Recall   : 0.9612
