# **Parkinson's Disease Prediction using Machine Learning**

The dataset used in this project is the Parkinson's Disease dataset:
www.kaggle.com/datasets/thecansin/parkinsons-data-set

Dataset Citation:
'Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection',
Little MA, McSharry PE, Roberts SJ, Costello DAE, Moroz IM.
BioMedical Engineering OnLine 2007, 6:23 (26 June 2007)

## Dataset Features

1. **name** – Subject name and recording number (ASCII string identifier).
2. **MDVP\:Fo(Hz)** – Average vocal fundamental frequency (pitch).
3. **MDVP\:Fhi(Hz)** – Maximum vocal fundamental frequency.
4. **MDVP\:Flo(Hz)** – Minimum vocal fundamental frequency.
5. **MDVP\:Jitter(%)** – Percentage of cycle-to-cycle variation in pitch.
6. **MDVP\:Jitter(Abs)** – Absolute Jitter.
7. **MDVP\:RAP** – Relative average perturbation.
8. **MDVP\:PPQ** – Five-point period perturbation quotient.
9. **Jitter\:DDP** – Difference of differences of periods (3 × RAP).
10. **MDVP\:Shimmer** – Local shimmer in amplitude.
11. **MDVP\:Shimmer(dB)** – Shimmer in decibels.
12. **Shimmer\:APQ3** – Three-point amplitude perturbation quotient.
13. **Shimmer\:APQ5** – Five-point amplitude perturbation quotient.
14. **MDVP\:APQ** – Eleven-point amplitude perturbation quotient.
15. **Shimmer\:DDA** – Average absolute difference of differences of consecutive amplitudes.
16. **NHR** – Noise-to-Harmonics Ratio.
17. **HNR** – Harmonics-to-Noise Ratio.
18. **RPDE** – Recurrence Period Density Entropy (nonlinear measure of signal periodicity).
19. **D2** – Correlation dimension (signal complexity).
20. **DFA** – Detrended Fluctuation Analysis (signal fractal scaling exponent).
21. **spread1** – First measure of nonlinear spread of fundamental frequency.
22. **spread2** – Second measure of nonlinear spread of fundamental frequency.
23. **PPE** – Pitch Period Entropy (irregularity in pitch).
24. **status** – Health status of subject:
    * `1` = Parkinson’s disease
    * `0` = Healthy

# 1. Data Collection

## 1.1. Import Libraries

In [1]:
import numpy as np     # linear algebra
import pandas as pd    # data manipulation
import seaborn as sns   # data visualization
import pickle  # library for saving and loading the model
import warnings
warnings.filterwarnings('ignore')

## 1.2. Read the dataset

In [None]:
# load the dataset into a pandas dataframe
df = pd.read_csv('parkinsons.csv')

## 1.3. Understand the data

In [None]:
# display first 5 rows
df.head()

In [None]:
# check the number of rows and columns
df.shape

In [None]:
df.info()

In [None]:
df.columns.tolist()

In [None]:
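# make column names identifier-friendly: ":" becomes "_", parentheses are dropped, "%" becomes "perc"
df = df.rename(columns=lambda x: x.replace(":", "_").replace("(", "").replace(")", "").replace("%", "perc"))

print("New columns:", df.columns.tolist())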
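To make the effect of the renaming rule concrete, here is a small standalone sketch that applies the same chain of replacements as the rename lambda above to three of the original column names. The helper name `clean_column_name` is illustrative only and is not part of the notebook.

```python
def clean_column_name(name: str) -> str:
    """Apply the same replacements as the rename lambda in the notebook."""
    return (
        name.replace(":", "_")
            .replace("(", "")
            .replace(")", "")
            .replace("%", "perc")
    )


# three column names as they appear in the raw dataset
original = ["MDVP:Fo(Hz)", "MDVP:Jitter(%)", "Jitter:DDP"]
cleaned = [clean_column_name(c) for c in original]

print(cleaned)  # ['MDVP_FoHz', 'MDVP_Jitterperc', 'Jitter_DDP']
```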

In [None]:
# drop the identifier column; it carries no predictive information
df = df.drop(columns='name')

In [None]:
df.info()

In [None]:
# get the statistical measures of the data
df.describe()

In [None]:
# display the unique values of the target column
df['status'].unique()

#### 0 --> Healthy

#### 1 --> Parkinson's Positive

In [None]:
# count how many samples belong to each class
df['status'].value_counts()

In [None]:
# mean of every feature, grouped by class
df.groupby('status').mean()

In [None]:
# correlation of every column with the target, sorted in ascending order
df.corr().status.sort_values()

In [None]:
# separate the features (X) from the target (y)
X = df.drop(columns='status')
y = df['status']

In [None]:
X.columns.tolist()

# 2. Modelling

## 2.1. Splitting to Train and Test Data

`test_size=0.2`: 20% of the dataset goes to the test set and 80% to the training set.

`random_state=2`: fixes the seed of the shuffle that precedes the split, which makes the result reproducible.

If you set it to an integer (e.g., 2), you get the same split every time you run the code.

If you leave it at the default (`None`), the split changes from run to run, because the shuffle then draws from NumPy's global random state, which is not seeded to a fixed value by default.
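The reproducibility claim can be checked with a short sketch. It uses toy arrays (`X_demo`, `y_demo`) as assumed stand-ins for the real features and labels, and calls `train_test_split` twice with the same seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# toy data standing in for the real feature matrix (assumption: 10 samples, 2 features)
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

# two independent calls with the same seed
_, X_test_a, _, y_test_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=2)
_, X_test_b, _, y_test_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=2)

# the same rows end up in the test set both times
same_split = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)

print("Same seed, same test rows:", same_split)
print("Test set size:", len(X_test_a))
```

With 10 samples and `test_size=0.2`, two rows land in the test set; which two depends on the seed.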

In [None]:
# library for splitting data
from sklearn.model_selection import train_test_split

# split data 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

In [None]:
print(X.shape, X_train.shape, X_test.shape)

## 2.2. Standardization

In [None]:
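from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# fit the scaler on the training data only, so no test-set statistics leak into it
scaler.fit(X_train)

# scale the training data and keep it under both names: the model below is trained on X_train
standardized_data = scaler.transform(X_train)
X_train = standardized_data

# apply the same training-set statistics to the test data
X_test = scaler.transform(X_test)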
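As a rough illustration of what the scaler does, the sketch below compares `StandardScaler` with a manual z-score, `(x - mean) / std`, where both statistics come from the training data only. The arrays `train` and `test` are made-up toy values, not rows from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# toy matrices (assumption: 2 features on very different scales)
train = np.array([[100.0, 0.001], [120.0, 0.003], [140.0, 0.005]])
test = np.array([[110.0, 0.002]])

demo_scaler = StandardScaler().fit(train)

# manual z-score using training statistics only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0), as used by StandardScaler
manual_test = (test - mean) / std

matches = np.allclose(demo_scaler.transform(test), manual_test)
print("Manual z-score matches StandardScaler:", matches)
```

After scaling, each training feature has mean 0 and unit variance, so features measured in Hz and features measured as small ratios contribute on a comparable scale.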

In [None]:
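# save the fitted scaler so the same transformation can be reused at prediction time
with open('scaler_parkinson.sav', 'wb') as f:
    pickle.dump(scaler, f)

# load it back to confirm the file is readable
with open('scaler_parkinson.sav', 'rb') as f:
    loaded_scaler = pickle.load(f)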

In [None]:
print(standardized_data)

## 2.3. Model Training

### Support Vector Machine

In [None]:
from sklearn import svm
# set up the model
model_svc = svm.SVC(kernel= 'linear')

In [None]:
# train the Support Vector Machine Classifier
model_svc.fit(X_train, y_train)

## 2.4. Model Evaluation

In [None]:
# library to evaluate model
from sklearn.metrics import accuracy_score

# accuracy on training data (accuracy_score expects the true labels first)
X_train_prediction_svc = model_svc.predict(X_train)
training_data_accuracy_svc = accuracy_score(y_train, X_train_prediction_svc)

print('Accuracy on training data for SVC:', training_data_accuracy_svc)

# accuracy on test data
X_test_prediction_svc = model_svc.predict(X_test)
test_data_accuracy_svc = accuracy_score(y_test, X_test_prediction_svc)

print('Accuracy on test data for SVC:', test_data_accuracy_svc)
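For reference, accuracy is simply the fraction of predictions that match the true labels. The sketch below checks this by hand against `accuracy_score` on made-up labels (`y_true` and `y_pred` are illustrative values, not model output).

```python
import numpy as np
from sklearn.metrics import accuracy_score

# made-up labels for illustration: 3 of the 5 predictions are correct
y_true = np.array([1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1])

manual = np.mean(y_true == y_pred)        # fraction of matching positions
library = accuracy_score(y_true, y_pred)  # signature is (y_true, y_pred)

print(manual, library)
```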

# 3. Predictive System

In [None]:
input_data = (113.71500,116.44300,96.91300,0.00349,0.00003,0.00171,0.00203,0.00514,0.01472,0.13300,0.00748,0.00905,0.01148,0.02245,0.00478,26.54700,0.380253,0.766700,-5.943501,0.192150,1.852542,0.179677)

# change the input data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)

# reshape the np array as we are predicting for one instance 
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

# standardize the input data
std_data = scaler.transform(input_data_reshaped)

# get prediction
prediction = model_svc.predict(std_data)
print(prediction)

In [None]:
if prediction[0] == 0:
    print("The person does not have Parkinson's disease.")
else:
    print("The person has Parkinson's disease.")
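The steps above (convert to an array, reshape to one row, scale, predict, map the label to a message) could be wrapped in a single helper. This is a minimal sketch under stated assumptions: `predict_status` is a hypothetical function name, and the toy scaler and model are trained on made-up two-feature data purely so the example runs on its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def predict_status(features, scaler, model):
    """Scale one sample with the fitted scaler and return a readable label."""
    sample = np.asarray(features, dtype=float).reshape(1, -1)
    label = model.predict(scaler.transform(sample))[0]
    return "Parkinson's disease" if label == 1 else "Healthy"


# toy stand-ins (assumption): two clearly separated clusters with 2 features
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_scaler = StandardScaler().fit(X_toy)
toy_model = SVC(kernel="linear").fit(toy_scaler.transform(X_toy), y_toy)

result_low = predict_status([0.1, 0.1], toy_scaler, toy_model)
result_high = predict_status([5.0, 5.1], toy_scaler, toy_model)
print(result_low)
print(result_high)
```

In the notebook itself, the same helper would be called with the fitted `scaler`, the trained `model_svc`, and a 22-value feature tuple in the same column order as `X`.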

# 4. Model Deploying

## 4.1. Save the trained model

In [None]:
# save the trained model
filename = 'trained_model_parkinson.sav'
with open(filename, 'wb') as f:
    pickle.dump(model_svc, f)

# load the saved model
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
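A quick way to gain confidence in a saved model is a round-trip check: the restored object should give the same predictions as the original. The sketch below does this with a toy classifier and a temporary directory; the data and the file name `toy_model.sav` are assumptions made for the example.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC

# toy model standing in for the trained classifier (assumption)
X_toy = np.array([[0.0], [1.0], [4.0], [5.0]])
y_toy = np.array([0, 0, 1, 1])
toy_model = SVC(kernel="linear").fit(X_toy, y_toy)

# write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.sav")
    with open(path, "wb") as f:
        pickle.dump(toy_model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# the restored model should reproduce the original predictions
round_trip_ok = np.array_equal(toy_model.predict(X_toy), restored.predict(X_toy))
print("Round trip preserved predictions:", round_trip_ok)
```

When deploying, the saved scaler and the saved model need to be loaded together, since the model expects standardized inputs. Pickle files should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.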