**MediPredict: Dual-Disease ML Diagnosis for Heart and Parkinson’s**

MediPredict is a dual-disease prediction system built using machine learning algorithms to assist in the early detection of Heart Disease and Parkinson’s Disease.
This project leverages structured datasets and evaluates the effectiveness of six key classification models:

-->Support Vector Classifier (SVC)

-->K-Nearest Neighbors (KNN)

-->Decision Tree Classifier

-->Random Forest Classifier

-->Naive Bayes

-->Logistic Regression

Each model is trained and tested on publicly available datasets using standardized preprocessing techniques. Performance is measured using accuracy scores, classification reports, and confusion matrices to determine the most suitable algorithm for each condition.
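The three evaluation outputs named above can be produced together from one set of predictions. The following is a minimal sketch, assuming a synthetic dataset from `make_classification` in place of the real CSV files; the names `X_demo`, `y_demo`, and `clf` are illustrative and do not come from the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a binary diagnosis dataset (assumption: not the real CSV files).
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

acc = accuracy_score(y_te, y_hat)
cm = confusion_matrix(y_te, y_hat)           # rows: true class, columns: predicted class
report = classification_report(y_te, y_hat)  # per-class precision, recall, and F1

print("Accuracy:", acc)
print(cm)
print(report)
```

The diagonal of the confusion matrix counts correct predictions, so accuracy equals its trace divided by the number of test samples. The off-diagonal cells separate false positives from false negatives, which accuracy alone does not show.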

In [13]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [4]:
heart_data = pd.read_csv('/content/heart (2).csv')

In [35]:
parkinsons_data = pd.read_csv('/content/parkinsons.csv')

In [12]:
heart_data.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [36]:
parkinsons_data.head()

Unnamed: 0,name,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
0,phon_R01_S01_1,119.992,157.302,74.997,0.00784,7e-05,0.0037,0.00554,0.01109,0.04374,0.426,0.02182,0.0313,0.02971,0.06545,0.02211,21.033,1,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654
1,phon_R01_S01_2,122.4,148.65,113.819,0.00968,8e-05,0.00465,0.00696,0.01394,0.06134,0.626,0.03134,0.04518,0.04368,0.09403,0.01929,19.085,1,0.458359,0.819521,-4.075192,0.33559,2.486855,0.368674
2,phon_R01_S01_3,116.682,131.111,111.555,0.0105,9e-05,0.00544,0.00781,0.01633,0.05233,0.482,0.02757,0.03858,0.0359,0.0827,0.01309,20.651,1,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634
3,phon_R01_S01_4,116.676,137.871,111.366,0.00997,9e-05,0.00502,0.00698,0.01505,0.05492,0.517,0.02924,0.04005,0.03772,0.08771,0.01353,20.644,1,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975
4,phon_R01_S01_5,116.014,141.781,110.655,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,0.584,0.0349,0.04825,0.04465,0.1047,0.01767,19.649,1,0.417356,0.823484,-3.747787,0.234513,2.33218,0.410335


In [10]:
heart_data.tail()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
298,57,0,0,140,241,0,1,123,1,0.2,1,0,3,0
299,45,1,3,110,264,0,1,132,0,1.2,1,0,3,0
300,68,1,0,144,193,1,1,141,0,3.4,1,2,3,0
301,57,1,0,130,131,0,1,115,1,1.2,1,1,3,0
302,57,0,1,130,236,0,0,174,0,0.0,1,1,2,0


In [37]:
parkinsons_data.tail()

Unnamed: 0,name,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
190,phon_R01_S50_2,174.188,230.978,94.261,0.00459,3e-05,0.00263,0.00259,0.0079,0.04087,0.405,0.02336,0.02498,0.02745,0.07008,0.02764,19.517,0,0.448439,0.657899,-6.538586,0.121952,2.657476,0.13305
191,phon_R01_S50_3,209.516,253.017,89.488,0.00564,3e-05,0.00331,0.00292,0.00994,0.02751,0.263,0.01604,0.01657,0.01879,0.04812,0.0181,19.147,0,0.431674,0.683244,-6.195325,0.129303,2.784312,0.168895
192,phon_R01_S50_4,174.688,240.005,74.287,0.0136,8e-05,0.00624,0.00564,0.01873,0.02308,0.256,0.01268,0.01365,0.01667,0.03804,0.10715,17.883,0,0.407567,0.655683,-6.787197,0.158453,2.679772,0.131728
193,phon_R01_S50_5,198.764,396.961,74.904,0.0074,4e-05,0.0037,0.0039,0.01109,0.02296,0.241,0.01265,0.01321,0.01588,0.03794,0.07223,19.02,0,0.451221,0.643956,-6.744577,0.207454,2.138608,0.123306
194,phon_R01_S50_6,214.289,260.277,77.973,0.00567,3e-05,0.00295,0.00317,0.00885,0.01884,0.19,0.01026,0.01161,0.01373,0.03078,0.04398,21.209,0,0.462803,0.664357,-5.724056,0.190667,2.555477,0.148569


In [9]:
heart_data.shape

(303, 14)

In [38]:
parkinsons_data.shape

(195, 24)

In [14]:
heart_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB


In [39]:
parkinsons_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              195 non-null    object 
 1   MDVP:Fo(Hz)       195 non-null    float64
 2   MDVP:Fhi(Hz)      195 non-null    float64
 3   MDVP:Flo(Hz)      195 non-null    float64
 4   MDVP:Jitter(%)    195 non-null    float64
 5   MDVP:Jitter(Abs)  195 non-null    float64
 6   MDVP:RAP          195 non-null    float64
 7   MDVP:PPQ          195 non-null    float64
 8   Jitter:DDP        195 non-null    float64
 9   MDVP:Shimmer      195 non-null    float64
 10  MDVP:Shimmer(dB)  195 non-null    float64
 11  Shimmer:APQ3      195 non-null    float64
 12  Shimmer:APQ5      195 non-null    float64
 13  MDVP:APQ          195 non-null    float64
 14  Shimmer:DDA       195 non-null    float64
 15  NHR               195 non-null    float64
 16  HNR               195 non-null    float64
 1

In [15]:
heart_data.isnull().sum()

Unnamed: 0,0
age,0
sex,0
cp,0
trestbps,0
chol,0
fbs,0
restecg,0
thalach,0
exang,0
oldpeak,0


In [40]:
parkinsons_data.isnull().sum()

Unnamed: 0,0
name,0
MDVP:Fo(Hz),0
MDVP:Fhi(Hz),0
MDVP:Flo(Hz),0
MDVP:Jitter(%),0
MDVP:Jitter(Abs),0
MDVP:RAP,0
MDVP:PPQ,0
Jitter:DDP,0
MDVP:Shimmer,0


In [16]:
heart_data.describe()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
count,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0,303.0
mean,54.366337,0.683168,0.966997,131.623762,246.264026,0.148515,0.528053,149.646865,0.326733,1.039604,1.39934,0.729373,2.313531,0.544554
std,9.082101,0.466011,1.032052,17.538143,51.830751,0.356198,0.52586,22.905161,0.469794,1.161075,0.616226,1.022606,0.612277,0.498835
min,29.0,0.0,0.0,94.0,126.0,0.0,0.0,71.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,47.5,0.0,0.0,120.0,211.0,0.0,0.0,133.5,0.0,0.0,1.0,0.0,2.0,0.0
50%,55.0,1.0,1.0,130.0,240.0,0.0,1.0,153.0,0.0,0.8,1.0,0.0,2.0,1.0
75%,61.0,1.0,2.0,140.0,274.5,0.0,1.0,166.0,1.0,1.6,2.0,1.0,3.0,1.0
max,77.0,1.0,3.0,200.0,564.0,1.0,2.0,202.0,1.0,6.2,2.0,4.0,3.0,1.0


In [41]:
parkinsons_data.describe()

Unnamed: 0,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,MDVP:Shimmer(dB),Shimmer:APQ3,Shimmer:APQ5,MDVP:APQ,Shimmer:DDA,NHR,HNR,status,RPDE,DFA,spread1,spread2,D2,PPE
count,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0
mean,154.228641,197.104918,116.324631,0.00622,4.4e-05,0.003306,0.003446,0.00992,0.029709,0.282251,0.015664,0.017878,0.024081,0.046993,0.024847,21.885974,0.753846,0.498536,0.718099,-5.684397,0.22651,2.381826,0.206552
std,41.390065,91.491548,43.521413,0.004848,3.5e-05,0.002968,0.002759,0.008903,0.018857,0.194877,0.010153,0.012024,0.016947,0.030459,0.040418,4.425764,0.431878,0.103942,0.055336,1.090208,0.083406,0.382799,0.090119
min,88.333,102.145,65.476,0.00168,7e-06,0.00068,0.00092,0.00204,0.00954,0.085,0.00455,0.0057,0.00719,0.01364,0.00065,8.441,0.0,0.25657,0.574282,-7.964984,0.006274,1.423287,0.044539
25%,117.572,134.8625,84.291,0.00346,2e-05,0.00166,0.00186,0.004985,0.016505,0.1485,0.008245,0.00958,0.01308,0.024735,0.005925,19.198,1.0,0.421306,0.674758,-6.450096,0.174351,2.099125,0.137451
50%,148.79,175.829,104.315,0.00494,3e-05,0.0025,0.00269,0.00749,0.02297,0.221,0.01279,0.01347,0.01826,0.03836,0.01166,22.085,1.0,0.495954,0.722254,-5.720868,0.218885,2.361532,0.194052
75%,182.769,224.2055,140.0185,0.007365,6e-05,0.003835,0.003955,0.011505,0.037885,0.35,0.020265,0.02238,0.0294,0.060795,0.02564,25.0755,1.0,0.587562,0.761881,-5.046192,0.279234,2.636456,0.25298
max,260.105,592.03,239.17,0.03316,0.00026,0.02144,0.01958,0.06433,0.11908,1.302,0.05647,0.0794,0.13778,0.16942,0.31482,33.047,1.0,0.685151,0.825288,-2.434031,0.450493,3.671155,0.527367


In [17]:
heart_data['target'].value_counts()

target,count
1,165
0,138


In [43]:
parkinsons_data['status'].value_counts()

status,count
1,147
0,48
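The class counts above give a useful reference point: the accuracy of a classifier that always predicts the most frequent class. A small sketch of that arithmetic follows, using only the counts printed by `value_counts()`; the helper name `majority_baseline` is hypothetical.

```python
# Class counts taken from the value_counts() outputs above.
counts = {
    "heart": {1: 165, 0: 138},
    "parkinsons": {1: 147, 0: 48},
}

def majority_baseline(class_counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total

baselines = {name: majority_baseline(c) for name, c in counts.items()}
for name, value in baselines.items():
    print(f"{name}: majority-class baseline accuracy = {value:.4f}")
```

On the full Parkinson's data this baseline is about 0.754, so an accuracy near that value says little about what a model has learned. The heart data is closer to balanced, with a baseline of about 0.545.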


Label reference for the target columns

**For Heart Diseases**

1--->*represents unhealthy heart*

0--->*represents healthy heart*

**For Parkinson's disease**

1--->*Parkinson's positive*

0--->*Healthy*

Heart Disease: features are stored in X and labels in Y

Parkinson's Disease: features are stored in Xa and labels in Yb

In [18]:
X = heart_data.drop(columns='target')
Y = heart_data['target']

In [19]:
print(X)

     age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal
0     63    1   3       145   233    1        0      150      0      2.3      0   0     1
1     37    1   2       130   250    0        1      187      0      3.5      0   0     2
2     41    0   1       130   204    0        0      172      0      1.4      2   0     2
3     56    1   1       120   236    0        1      178      0      0.8      2   0     2
4     57    0   0       120   354    0        1      163      1      0.6      2   0     2
..   ...  ...  ..       ...   ...  ...      ...      ...    ...      ...    ...  ..   ...
298   57    0   0       140   241    0        1      123      1      0.2      1   0     3
299   45    1   3       110   264    0        1      132      0      1.2      1   0     3
300   68    1   0       144   193    1        1      141      0      3.4      1   2     3
301   57    1   0       130   131    0        1      115      1      1.2      1   1     3
302   57  

In [20]:
print(Y)

0      1
1      1
2      1
3      1
4      1
      ..
298    0
299    0
300    0
301    0
302    0
Name: target, Length: 303, dtype: int64


In [62]:
Xa = parkinsons_data.drop(columns=['name', 'status'])
Yb = parkinsons_data['status']

In [63]:
print(Xa)

     MDVP:Fo(Hz)  MDVP:Fhi(Hz)  MDVP:Flo(Hz)  ...   spread2        D2       PPE
0        119.992       157.302        74.997  ...  0.266482  2.301442  0.284654
1        122.400       148.650       113.819  ...  0.335590  2.486855  0.368674
2        116.682       131.111       111.555  ...  0.311173  2.342259  0.332634
3        116.676       137.871       111.366  ...  0.334147  2.405554  0.368975
4        116.014       141.781       110.655  ...  0.234513  2.332180  0.410335
..           ...           ...           ...  ...       ...       ...       ...
190      174.188       230.978        94.261  ...  0.121952  2.657476  0.133050
191      209.516       253.017        89.488  ...  0.129303  2.784312  0.168895
192      174.688       240.005        74.287  ...  0.158453  2.679772  0.131728
193      198.764       396.961        74.904  ...  0.207454  2.138608  0.123306
194      214.289       260.277        77.973  ...  0.190667  2.555477  0.148569

[195 rows x 22 columns]


In [64]:
print(Yb)

0      1
1      1
2      1
3      1
4      1
      ..
190    0
191    0
192    0
193    0
194    0
Name: status, Length: 195, dtype: int64


In [67]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)

In [22]:
print(X.shape, X_train.shape, X_test.shape)

(303, 13) (242, 13) (61, 13)


In [69]:
print(Y.shape, Y_train.shape, Y_test.shape)

(195,) (156,) (39,)


In [65]:
Xa_train, Xa_test, Yb_train, Yb_test = train_test_split(Xa, Yb, test_size=0.2, stratify=Yb, random_state=2)

In [66]:
print(Xa.shape, Xa_train.shape, Xa_test.shape)

(195, 22) (156, 22) (39, 22)


In [68]:
print(Yb.shape, Yb_train.shape, Yb_test.shape)

(195,) (156,) (39,)
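The `stratify` argument used in the splits above keeps the class ratio roughly equal in the training and test sets. The sketch below checks this on placeholder data that only mimics the Parkinson's class counts (147 positive, 48 healthy); `X_demo` and `y_demo` are illustrative names, not the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the same class counts as the Parkinson's data (147 positive, 48 healthy).
y_demo = np.array([1] * 147 + [0] * 48)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

full_rate = y_demo.mean()   # share of positive labels in the full data
train_rate = y_tr.mean()    # share of positive labels in the training split
test_rate = y_te.mean()     # share of positive labels in the test split
print(len(y_tr), len(y_te))
print(round(full_rate, 3), round(train_rate, 3), round(test_rate, 3))
```

Without `stratify`, a 39-row test set drawn from imbalanced data can drift away from the overall ratio, which makes accuracy figures harder to compare across runs.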


In [71]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

In [73]:
scaler.fit(Xa_train)

In [74]:
Xa_train = scaler.transform(Xa_train)
Xa_test = scaler.transform(Xa_test)
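Fitting the scaler on the training rows and then transforming both splits can also be expressed as a scikit-learn pipeline, which keeps the scaler and the model together and avoids overwriting `Xa_train` in place. This is a sketch under the assumption of synthetic data with 22 features; it is an alternative arrangement, not the notebook's own code.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 22 Parkinson's voice features (assumption: not the real data).
X_demo, y_demo = make_classification(n_samples=195, n_features=22, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# The pipeline fits the scaler on the training rows only, then reuses those
# statistics whenever it transforms data for predict() or score().
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipe.fit(X_tr, y_tr)

fitted_scaler = pipe.named_steps["standardscaler"]
test_accuracy = pipe.score(X_te, y_te)
print("Scaler statistics learned from", X_tr.shape[0], "training rows")
print("Test accuracy:", test_accuracy)
```

Because the raw arrays are never replaced by their scaled versions, later cells can still index the original rows without first working out whether they have already been transformed.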

In [75]:
print(Xa_train)

[[-3.37789850e-01 -1.86151275e-01 -9.11085922e-01 ...  3.02808525e-01
   3.67380761e-01 -1.01626972e-01]
 [ 1.09942206e+00  2.52399879e-01  7.59431971e-01 ...  9.62684763e-01
   2.30410182e-01  7.25430092e-03]
 [-8.75220075e-01 -5.64868721e-01 -3.69947894e-01 ... -1.24083946e-03
  -1.27562573e+00 -5.03037967e-01]
 ...
 [ 9.67834202e-01  1.38914623e-01 -8.24451036e-01 ...  5.83176337e-01
   5.94403638e-01 -2.56870663e-01]
 [-7.69983726e-01 -6.17537239e-01 -4.08691589e-01 ...  2.01206260e-01
  -9.18334164e-01 -4.43401072e-01]
 [ 1.19847659e+00  4.93351249e-01  1.11785168e+00 ... -1.03979251e-01
   5.12603529e-01 -5.39510027e-01]]


In [76]:
print(Yb_train)

13     1
142    1
174    0
113    1
88     1
      ..
123    1
190    0
109    1
153    1
144    1
Name: status, Length: 156, dtype: int64


**Model Training**

Support Vector Machine Model (SVM)

In [82]:
from sklearn import svm

model = svm.SVC(kernel='linear')

In [83]:
model.fit(Xa_train, Yb_train)

In [110]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = pd.read_csv('/content/heart (2).csv')

X = data.drop('target', axis=1)
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

svc_model = SVC(kernel='linear', C=1.0)

svc_model.fit(X_train, y_train)

y_pred = svc_model.predict(X_test)

print("Heart Disease Prediction Accuracy:", accuracy_score(y_test, y_pred))


Heart Disease Prediction Accuracy: 0.8688524590163934


In [111]:
data = pd.read_csv('/content/parkinsons.csv')

X = data.drop(['name', 'status'], axis=1)
y = data['status']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# SVC classifier
svc_model = SVC(kernel='rbf', C=1.0, gamma='scale')

svc_model.fit(X_train, y_train)
y_pred = svc_model.predict(X_test)

# Accuracy
print("Parkinson’s Prediction Accuracy:", accuracy_score(y_test, y_pred))


Parkinson’s Prediction Accuracy: 0.8974358974358975


KNN Model for Both Diseases

In [112]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

data = pd.read_csv('/content/heart (2).csv')

X = data.drop('target', axis=1)
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create KNN model
knn_model = KNeighborsClassifier(n_neighbors=5)

knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)

# Accuracy
print("Heart Disease Prediction Accuracy (KNN):", accuracy_score(y_test, y_pred))


Heart Disease Prediction Accuracy (KNN): 0.9016393442622951


In [113]:
data = pd.read_csv('/content/parkinsons.csv')

X = data.drop(['name', 'status'], axis=1)
y = data['status']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# KNN classifier
knn_model = KNeighborsClassifier(n_neighbors=5)

knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)

# Accuracy
print("Parkinson’s Prediction Accuracy (KNN):", accuracy_score(y_test, y_pred))


Decision Tree Model

In [114]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv('/content/heart (2).csv')

X = data.drop('target', axis=1)
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Decision Tree model
dt_model = DecisionTreeClassifier(criterion='gini', max_depth=4, random_state=42)
dt_model.fit(X_train, y_train)

# Prediction
y_pred = dt_model.predict(X_test)

# Accuracy
print("Heart Disease Prediction Accuracy (Decision Tree):", accuracy_score(y_test, y_pred))


Heart Disease Prediction Accuracy (Decision Tree): 0.8524590163934426


In [115]:
data = pd.read_csv('/content/parkinsons.csv')

X = data.drop(['name', 'status'], axis=1)
y = data['status']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Decision Tree model
dt_model = DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=42)
dt_model.fit(X_train, y_train)

# Prediction
y_pred = dt_model.predict(X_test)

# Accuracy
print("Parkinson’s Prediction Accuracy (Decision Tree):", accuracy_score(y_test, y_pred))


Parkinson’s Prediction Accuracy (Decision Tree): 0.9487179487179487


Random Forest Classifier

In [116]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv('/content/heart (2).csv')

X = data.drop('target', axis=1)
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Prediction
y_pred = rf_model.predict(X_test)

# Accuracy
print("Heart Disease Accuracy (Random Forest):", accuracy_score(y_test, y_pred))


Heart Disease Accuracy (Random Forest): 0.8360655737704918


In [117]:
data = pd.read_csv('/content/parkinsons.csv')

X = data.drop(['name', 'status'], axis=1)
y = data['status']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)

# Accuracy
print("Parkinson’s Accuracy (Random Forest):", accuracy_score(y_test, y_pred))


Parkinson’s Accuracy (Random Forest): 0.9487179487179487


Naive Bayes Classifier

In [121]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

heart = pd.read_csv("/content/heart (2).csv")
X_heart = heart.drop('target', axis=1)
y_heart = heart['target']

X_train_h, X_test_h, y_train_h, y_test_h = train_test_split(X_heart, y_heart, test_size=0.2, random_state=42)
scaler_h = StandardScaler()
X_train_h = scaler_h.fit_transform(X_train_h)
X_test_h = scaler_h.transform(X_test_h)

# Naive Bayes
nb_heart = GaussianNB()
nb_heart.fit(X_train_h, y_train_h)
y_pred_h = nb_heart.predict(X_test_h)

print("Heart Disease Accuracy (Naive Bayes):", accuracy_score(y_test_h, y_pred_h))


Heart Disease Accuracy (Naive Bayes): 0.8688524590163934


In [122]:
parkinsons = pd.read_csv("/content/parkinsons.csv")
X_parkinson = parkinsons.drop(['name', 'status'], axis=1)
y_parkinson = parkinsons['status']

X_train_p, X_test_p, y_train_p, y_test_p = train_test_split(X_parkinson, y_parkinson, test_size=0.2, random_state=42)
scaler_p = StandardScaler()
X_train_p = scaler_p.fit_transform(X_train_p)
X_test_p = scaler_p.transform(X_test_p)

# Naive Bayes
nb_parkinson = GaussianNB()
nb_parkinson.fit(X_train_p, y_train_p)
y_pred_p = nb_parkinson.predict(X_test_p)

print("Parkinson’s Accuracy (Naive Bayes):", accuracy_score(y_test_p, y_pred_p))


Parkinson’s Accuracy (Naive Bayes): 0.717948717948718


Logistic Regression Model

In [23]:
model = LogisticRegression()

In [24]:
model.fit(X_train, Y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [128]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


data = pd.read_csv('/content/heart (2).csv')

X = data.drop('target', axis=1)
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Logistic Regression model
log_reg_model = LogisticRegression(random_state=42)
log_reg_model.fit(X_train, y_train)

# Prediction
y_pred = log_reg_model.predict(X_test)

# Accuracy
print("Heart Disease Accuracy (Logistic Regression):", accuracy_score(y_test, y_pred))


Heart Disease Accuracy (Logistic Regression): 0.8524590163934426


In [129]:
data = pd.read_csv('/content/parkinsons.csv')

X = data.drop(['name', 'status'], axis=1)
y = data['status']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Logistic Regression model
log_reg_model = LogisticRegression(random_state=42)
log_reg_model.fit(X_train, y_train)

# Prediction
y_pred = log_reg_model.predict(X_test)

# Accuracy
print("Parkinson’s Accuracy (Logistic Regression):", accuracy_score(y_test, y_pred))


Parkinson’s Accuracy (Logistic Regression): 0.8974358974358975
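The per-model cells above repeat the same load, split, scale, and score steps. One way to compare the six classifiers side by side is a single loop with cross-validation, sketched here on synthetic data. The hyperparameters mirror those used in the cells above where they are given; the 5-fold setup and `max_iter=1000` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart data (assumption: 303 rows, 13 features).
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)

models = {
    "SVC": SVC(kernel="linear", C=1.0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, estimator in models.items():
    # Scaling lives inside the pipeline, so each fold fits its own scaler.
    pipe = make_pipeline(StandardScaler(), estimator)
    scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())

# Print models from highest to lowest mean accuracy.
for name, (mean_acc, std_acc) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:<20} {mean_acc:.3f} +/- {std_acc:.3f}")
```

With test sets of 61 and 39 rows, a single misclassified sample shifts accuracy by roughly 1.6 and 2.6 percentage points respectively, so averaging over folds gives a steadier basis for ranking the models than one split.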


Training and Test Accuracy for Both Datasets

In [124]:
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In [125]:
print('Accuracy score of training data: ', training_data_accuracy)

Accuracy score of training data:  0.7115384615384616


In [126]:
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In [127]:
print('Accuracy score of Test data: ', test_data_accuracy)

Accuracy score of Test data:  0.6923076923076923


In [123]:
Xa_train_prediction = model.predict(Xa_train)
training_data_accuracy = accuracy_score(Yb_train, Xa_train_prediction)

In [85]:
print('Accuracy score of training data: ', training_data_accuracy)

Accuracy score of training data:  0.8974358974358975


In [88]:
Xa_test_prediction = model.predict(Xa_test)
test_data_accuracy = accuracy_score(Yb_test, Xa_test_prediction)

In [90]:
print('Accuracy score of Test data: ', test_data_accuracy)

Accuracy score of Test data:  0.8974358974358975


Prediction of the diseases using the data

In [106]:
heart_disease_model = LogisticRegression()  # Create a new model for heart disease
heart_disease_model.fit(X_train, Y_train)    # Train the model on heart disease data

input_data = X.iloc[-1].values

input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

# Use the heart disease model for prediction
prediction = heart_disease_model.predict(input_data_reshaped)  # Predict using the heart disease model

print(prediction)

if prediction[0] == 0:
  print('The person does not have heart disease')
else:
  print('The person has heart disease')

[1]
The person has heart disease


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [98]:
parkinsons_data_model = LogisticRegression()

parkinsons_data_model.fit(Xa_train, Yb_train)

# row_index is not defined in the cells shown here; it is assumed to be set in an earlier cell
input_data = parkinsons_data.drop(columns=['name', 'status']).iloc[row_index].values

input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = parkinsons_data_model.predict(input_data_reshaped)

print(prediction)

if prediction[0] == 0:
  print("The person does not have Parkinson's")
else:
  print("The person has Parkinson's")

[0]
The person does not have Parkinson's
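When a model is trained on standardized features, a single input row needs the same transformation before prediction. The sketch below shows one way to handle this: keep the scaler and model in one pipeline and pass the row as a one-row DataFrame. It uses synthetic data, and the column names `feature_0` through `feature_21` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Parkinson's features (assumption: 195 rows, 22 columns).
X_arr, y_arr = make_classification(n_samples=195, n_features=22, random_state=0)
feature_names = [f"feature_{i}" for i in range(X_arr.shape[1])]  # hypothetical names
X_demo = pd.DataFrame(X_arr, columns=feature_names)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_demo, y_arr)

# Double brackets keep the row as a one-row DataFrame, so the column names
# are preserved and the pipeline applies the fitted scaler before predicting.
single_row = X_demo.iloc[[-1]]
prediction = pipe.predict(single_row)
probability = pipe.predict_proba(single_row)[0, 1]

label = "positive" if prediction[0] == 1 else "negative"
print(f"Predicted class: {prediction[0]} ({label}), P(class 1) = {probability:.3f}")
```

Reporting the class probability alongside the label shows how close a prediction is to the decision boundary, which the label alone hides.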
