### **Background**

Dataset: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer-valued from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1, 2, 3, 4) from absence (value 0).
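
The presence/absence reduction described above can be sketched as follows. The values in `num` are hypothetical and only illustrate the mapping; they are not rows from the dataset.

```python
import pandas as pd

# Hypothetical "goal" values (0-4) for six patients; illustration only.
num = pd.Series([0, 2, 1, 0, 4, 3], name="num")

# Any value above 0 counts as presence of heart disease (1); 0 stays absence (0).
target = (num > 0).astype(int)
print(target.tolist())  # [0, 1, 1, 0, 1, 1]
```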

The names and social security numbers of the patients were recently removed from the database and replaced with dummy values.

Only 14 attributes are used:
1. #3 (age)
2. #4 (sex)
3. #9 (cp): chest pain type

>>-- Value 1: typical angina

>>-- Value 2: atypical angina

>>-- Value 3: non-anginal pain

>>-- Value 4: asymptomatic

4. #10 (trestbps): resting blood pressure (in mm Hg on admission to the hospital)
5. #12 (chol): serum cholesterol in mg/dl
6. #16 (fbs): fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. #19 (restecg): resting electrocardiographic results

>>-- Value 0: normal

>>-- Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)

>>-- Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria

8. #32 (thalach): maximum heart rate achieved
9. #38 (exang): exercise-induced angina (1 = yes; 0 = no)
10. #40 (oldpeak): ST depression induced by exercise relative to rest
11. #41 (slope): the slope of the peak exercise ST segment

>>-- Value 1: upsloping

>>-- Value 2: flat

>>-- Value 3: downsloping

12. #44 (ca): number of major vessels (0-3) colored by fluoroscopy
13. #51 (thal): 3 = normal; 6 = fixed defect; 7 = reversible defect
14. #58 (num): the predicted attribute
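
As a rough sketch, the coded attributes above could be decoded into readable labels with plain dictionaries. The `sample` frame is hypothetical and exists only to show the mapping. Note that the `describe()` output later in this notebook shows `cp` ranging from 0 to 3 and `slope` from 0 to 2, so the CSV used here appears to follow a different coding, and the dictionaries would need to be checked against it before use.

```python
import pandas as pd

# Code-to-label mappings copied from the attribute list above.
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}
SLOPE_LABELS = {1: "upsloping", 2: "flat", 3: "downsloping"}

# Hypothetical rows for illustration only (not taken from heart.csv).
sample = pd.DataFrame({"cp": [1, 4, 3], "slope": [2, 1, 3]})

# Replace each integer code with its label; unmapped codes would become NaN.
decoded = sample.assign(
    cp=sample["cp"].map(CP_LABELS),
    slope=sample["slope"].map(SLOPE_LABELS),
)
print(decoded)
```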

### **Importing dependencies**

In [121]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### **Importing dataset**

In [122]:
dataset = pd.read_csv("heart.csv")
dataset.head()

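| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |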


In [123]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB


In [124]:
dataset.describe()

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366337 | 0.683168 | 0.966997 | 131.623762 | 246.264026 | 0.148515 | 0.528053 | 149.646865 | 0.326733 | 1.039604 | 1.39934 | 0.729373 | 2.313531 | 0.544554 |
| std | 9.082101 | 0.466011 | 1.032052 | 17.538143 | 51.830751 | 0.356198 | 0.52586 | 22.905161 | 0.469794 | 1.161075 | 0.616226 | 1.022606 | 0.612277 | 0.498835 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


In [125]:
# Features: every column except the last; labels: the last column (target).
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
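
Positional slicing works as long as `target` stays the last column. A name-based alternative is sketched below; it rebuilds the first two rows shown by `dataset.head()` so the snippet runs without `heart.csv`. The name `df` is only used for this illustration.

```python
import pandas as pd

columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

# First two rows from the dataset.head() output above.
rows = [
    [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1],
    [37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2, 1],
]
df = pd.DataFrame(rows, columns=columns)

# Select features and labels by name rather than by position.
X = df.drop(columns="target").values
y = df["target"].values
print(X.shape, y.shape)  # (2, 13) (2,)
```

Selecting by name keeps working if columns are later reordered or new ones are appended.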

### **Splitting dataset**

In [126]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=2)
print(X.shape, X_train.shape, X_test.shape)

(303, 13) (272, 13) (31, 13)
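
With only 31 test rows, the class balance of the test set can drift from that of the full data. One option, not used in this notebook, is the `stratify` argument of `train_test_split`. The sketch below uses synthetic stand-in arrays (`X_demo`, `y_demo`) with the same shape as `X` and `y`, and a class balance chosen to mirror the `target` mean reported by `describe()`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 303 rows, 13 features, 165 positives and 138 negatives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(303, 13))
y_demo = np.array([1] * 165 + [0] * 138)

# stratify=y_demo keeps the positive/negative ratio similar in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=2, stratify=y_demo
)
print(X_tr.shape, X_te.shape)
print(y_demo.mean(), y_tr.mean(), y_te.mean())
```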


In [127]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Scaling is left disabled, so the model below is trained on the raw features.
# X_train = sc.fit_transform(X_train)

### **Initializing and training model**

In [128]:
from sklearn.linear_model import LogisticRegression
# Despite the variable name, LogisticRegression is a classifier.
regressor = LogisticRegression()
regressor.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
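
The warning above suggests raising `max_iter` or scaling the data. A minimal sketch of doing both with a scikit-learn pipeline follows. It uses synthetic data from `make_classification` as a stand-in for `heart.csv`, so the printed score says nothing about the heart disease data, and the names `X_demo`, `y_demo`, and `model` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the notebook's data (303 x 13).
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=2
)

# The pipeline fits the scaler on the training rows only, then trains the
# classifier on the scaled features; the same scaling is reused at predict time.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print("Test accuracy:", model.score(X_te, y_te))
```

Keeping the scaler inside the pipeline avoids fitting it on test rows and removes the need to remember a separate `transform` call before each prediction.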

### **Predicting with the model**

In [129]:
from sklearn.metrics import accuracy_score
y_pred_train = regressor.predict(X_train)
print("Train accuracy:", accuracy_score(y_train, y_pred_train))

Train accuracy: 0.8382352941176471


In [130]:
y_pred = regressor.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))

Test accuracy: 0.9354838709677419
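
The test set holds 31 rows, so a single misclassified patient moves the test accuracy by roughly 3 percentage points, and a test score above the training score may simply reflect this particular split. Cross-validation gives a steadier estimate. The sketch below is one possible setup, run on synthetic stand-in data rather than `heart.csv`; the fold count and the pipeline are assumptions, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the notebook's data (303 x 13).
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Five stratified folds: every row is used for testing exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```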


### **Building a predictive system**

In [131]:
# Predict the class for a single patient.
input_data = (62, 0, 0, 140, 268, 0, 0, 160, 0, 3.6, 0, 2, 2)
input_data = np.array(input_data)
# input_data is a 1D array, but the model expects a 2D array of shape
# (n_samples, n_features), so reshape it into a single row.
input_data = input_data.reshape(1, -1)
output = regressor.predict(input_data)
print(output)

[0]
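
A bare class label hides how confident the model is. As a rough sketch, the single-row prediction can be wrapped in a helper that also returns the estimated probability of class 1 via `predict_proba`. The helper name `predict_one` and the model `clf` are hypothetical, and `clf` is trained on synthetic data so the snippet runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in model; in the notebook this would be the fitted `regressor`.
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)


def predict_one(model, features):
    """Return (predicted class, estimated probability of class 1) for one patient."""
    # Reshape the 1D feature vector into a single-row 2D array.
    row = np.asarray(features, dtype=float).reshape(1, -1)
    label = int(model.predict(row)[0])
    proba = float(model.predict_proba(row)[0, 1])
    return label, proba


label, proba = predict_one(clf, X_demo[0])
print(label, round(proba, 3))
```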
