# **HEART DISEASE CLASSIFICATION**
## *About the dataset*

*This dataset contain patient records and my job is to train a model to detect whether a patient have a heart disease*

### **Features** 

1. Age: age of the patient [years]

2. Sex: sex of the patient [M: Male, F: Female]

3. ChestPainType: chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]

4. RestingBP: resting blood pressure [mm Hg]

5. Cholesterol: serum cholesterol [mm/dl]

6. FastingBS: fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]

7. RestingECG: resting electrocardiogram results [Normal: Normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria]

8. MaxHR: maximum heart rate achieved [Numeric value between 60 and 202]

9. ExerciseAngina: exercise-induced angina [Y: Yes, N: No]

10. Oldpeak: oldpeak = ST [Numeric value measured in depression]

11. ST_Slope: the slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]

12. HeartDisease: output class [1: heart disease, 0: Normal]

In [1]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import joblib


### **EXPLORATORY DATA ANALYSIS (EDA)**

In [2]:
# Read the data
data = pd.read_csv('Data\heart.csv')
data.head()

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,Cholesterol,FastingBS,RestingECG,MaxHR,ExerciseAngina,Oldpeak,ST_Slope,HeartDisease
0,40,M,ATA,140,289,0,Normal,172,N,0.0,Up,0
1,49,F,NAP,160,180,0,Normal,156,N,1.0,Flat,1
2,37,M,ATA,130,283,0,ST,98,N,0.0,Up,0
3,48,F,ASY,138,214,0,Normal,108,Y,1.5,Flat,1
4,54,M,NAP,150,195,0,Normal,122,N,0.0,Up,0


In [3]:
data.describe(include= 'all')

Unnamed: 0,Age,Sex,ChestPainType,RestingBP,Cholesterol,FastingBS,RestingECG,MaxHR,ExerciseAngina,Oldpeak,ST_Slope,HeartDisease
count,918.0,918,918,918.0,918.0,918.0,918,918.0,918,918.0,918,918.0
unique,,2,4,,,,3,,2,,3,
top,,M,ASY,,,,Normal,,N,,Flat,
freq,,725,496,,,,552,,547,,460,
mean,53.510893,,,132.396514,198.799564,0.233115,,136.809368,,0.887364,,0.553377
std,9.432617,,,18.514154,109.384145,0.423046,,25.460334,,1.06657,,0.497414
min,28.0,,,0.0,0.0,0.0,,60.0,,-2.6,,0.0
25%,47.0,,,120.0,173.25,0.0,,120.0,,0.0,,0.0
50%,54.0,,,130.0,223.0,0.0,,138.0,,0.6,,1.0
75%,60.0,,,140.0,267.0,0.0,,156.0,,1.5,,1.0


In [18]:
print(f'Data info: {data.info()}\n\n')
print('-------------------------------')
print(f'Null values in data: {data.isna().sum()}')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 918 entries, 0 to 917
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Age             918 non-null    int64  
 1   Sex             918 non-null    object 
 2   ChestPainType   918 non-null    object 
 3   RestingBP       918 non-null    int64  
 4   Cholesterol     918 non-null    int64  
 5   FastingBS       918 non-null    int64  
 6   RestingECG      918 non-null    object 
 7   MaxHR           918 non-null    int64  
 8   ExerciseAngina  918 non-null    object 
 9   Oldpeak         918 non-null    float64
 10  ST_Slope        918 non-null    object 
 11  HeartDisease    918 non-null    int64  
dtypes: float64(1), int64(6), object(5)
memory usage: 86.2+ KB
Data info: None


-------------------------------
Null values in data: Age               0
Sex               0
ChestPainType     0
RestingBP         0
Cholesterol       0
FastingBS         0
RestingECG 

### **DATA PREPROCESSING**

In [3]:
def preprocessing(data):
    # Convert binary features into 0 and 1
    binary_features = ['Sex', 'ExerciseAngina']
    for i in binary_features:
        data[i] = data[i].apply(lambda x: 1 if x in ['M', 'Y'] else 0)
    
    # Encoding features with more than 3 categories
    le = LabelEncoder()
    cat_features = ['ChestPainType', 'RestingECG', 'ST_Slope']
    for i in cat_features:
        data[i] = le.fit_transform(data[i])
    
    return data

data = preprocessing(data)

In [4]:
# Split data and scale it

X = data.drop('HeartDisease', axis= 1)
y = data.HeartDisease

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size= 0.3, random_state= 42)

y_train = np.array(y_train)
y_test = np.array(y_test)

### **MODELLING**

In [28]:
import numpy as np

class SVM:
    def __init__(self, alpha=0.01, regularization=0.001, iter=1000, print_cost=False):
        self.alpha = alpha
        self.lambda_ = regularization
        self.print_cost = print_cost
        self.iter = iter
        self.w = None
        self.b = None

    def fit(self, X, y):
        m, n = X.shape
        y_ = np.where(y <= 0, -1, 1)

        self.w = np.random.rand(n)
        self.b = np.random.rand()

        for _ in range(self.iter):
            for i, x_i in enumerate(X):
                condition = y_[i] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    dw = 2 * self.lambda_ * self.w
                    db = 0
                else:
                    dw = 2 * self.lambda_ * self.w - np.dot(y_[i], x_i)
                    db = y_[i]

                self.w -= self.alpha * dw
                self.b -= self.alpha * db
            
            if self.print_cost and _ % 50 == 0:
                cost = self.cost(X, y)
                print(f'Cost at {_}: {cost}')

    def predict(self, X):
        y_preds = np.sign(np.dot(X, self.w) - self.b)
        return y_preds

    def score(self, X, y):
        y_preds = self.predict(X)
        return np.mean(y_preds == y)

    def cost(self, X, y):
        y_ = np.where(y <= 0, -1, 1)
        hinge_loss = np.maximum(0, 1 - y_ * (np.dot(X, self.w) - self.b))
        regularization_loss = self.lambda_ * np.sum(self.w ** 2)
        total_loss = np.mean(hinge_loss) + regularization_loss
        return total_loss


### **TRAIN AND EVALUATE THE MODEL**

In [46]:
model = SVM(alpha=0.01, regularization=0.001, iter=3000)
model.fit(X_train, y_train)



In [47]:
print("Training Accuracy:", model.score(X_train, y_train))
print("Test Accuracy:", model.score(X_test, y_test))

Training Accuracy: 0.4672897196261682
Test Accuracy: 0.5036231884057971


In [None]:
joblib.dump(model, 'SVM_model.pkl')

### **USING SVM ON SCI-KIT LEARN**

In [90]:
from sklearn.svm import SVC
n = [i / 100.0 for i in range(1,100)]
for _ in n:
    model = SVC(kernel='rbf', gamma= _, random_state = 42)
    model.fit(X_train, y_train)

    print(f"Training Accuracy at gamma {_}: {model.score(X_train, y_train)}")
    print(f"Test Accuracy at gamma {_}: {model.score(X_test, y_test)}\n\n")

Training Accuracy at gamma 0.01: 0.8582554517133957
Test Accuracy at gamma 0.01: 0.8731884057971014


Training Accuracy at gamma 0.02: 0.8691588785046729
Test Accuracy at gamma 0.02: 0.8840579710144928


Training Accuracy at gamma 0.03: 0.8785046728971962
Test Accuracy at gamma 0.03: 0.8840579710144928


Training Accuracy at gamma 0.04: 0.8785046728971962
Test Accuracy at gamma 0.04: 0.8840579710144928


Training Accuracy at gamma 0.05: 0.883177570093458
Test Accuracy at gamma 0.05: 0.8876811594202898


Training Accuracy at gamma 0.06: 0.8847352024922118
Test Accuracy at gamma 0.06: 0.8913043478260869


Training Accuracy at gamma 0.07: 0.8862928348909658
Test Accuracy at gamma 0.07: 0.8913043478260869


Training Accuracy at gamma 0.08: 0.8878504672897196
Test Accuracy at gamma 0.08: 0.8913043478260869


Training Accuracy at gamma 0.09: 0.8925233644859814
Test Accuracy at gamma 0.09: 0.8913043478260869


Training Accuracy at gamma 0.1: 0.8940809968847352
Test Accuracy at gamma 0.1: 0.88

In [86]:
# Based on what i observed above the model fit best with rbf kernel and gamma = 0.09

model = SVC(kernel='rbf', gamma= 0.09)
model.fit(X_train, y_train)

print(f"Training Accuracy at gamma 0.09: {model.score(X_train, y_train)}")
print(f"Test Accuracy at gamma 0.09: {model.score(X_test, y_test)}\n\n")

Training Accuracy at gamma 0.99: 0.8925233644859814
Test Accuracy at gamma 0.99: 0.8913043478260869




In [109]:
i = 9
print(f'Model predict patient {i+1}: {model.predict(X_test[i].reshape(1, -1))}')
print(f'The ground truth is {y_test[i]}')

Model predict patient 10: [1]
The ground truth is 1
