<a href="https://www.kaggle.com/code/manishkr1754/parkinson-s-disease-detection?scriptVersionId=144350542" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

---
<center><h1>Parkinson's Disease Detection</h1></center>
<center><h3>Part of 30 Days 30 ML Projects Challenge</h3></center>

---

## 1) Understanding Problem Statement
---

Parkinson's disease is a debilitating neurological disorder that affects millions of individuals globally, leading to a significant decline in their quality of life. Early diagnosis and accurate prediction of Parkinson's disease can greatly enhance patient care and treatment outcomes. Leveraging the power of machine learning, we aim to address this critical healthcare challenge.

This project falls within the domain of **Medical Diagnosis and Classification using Machine Learning**. The primary objective is **to build a predictive model for the early detection of Parkinson's disease by analyzing a combination of clinical data, neuroimaging results and patient demographics**.

## 2) Understanding Data
---

The project uses **Parkinson's Disease Data**, which contains a set of independent variables (the features) and one dependent variable, `status`, which is the outcome to be predicted.

## 3) Getting System Ready
---
Importing required libraries


In [None]:
import numpy as np
import pandas as pd

# for model building
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

## 4) Data Eyeballing
---

### Loading Data

In [None]:
parkinsons_disease_data = pd.read_csv('Datasets/Day14_Parkinsons_Disease_Data.csv') 

In [None]:
parkinsons_disease_data

In [None]:
print('The size of Dataframe is: ', parkinsons_disease_data.shape)
print('-'*100)
print('The Column Name, Record Count and Data Types are as follows: ')
parkinsons_disease_data.info()
print('-'*100)

In [None]:
# Defining numerical & categorical columns
numeric_features = [feature for feature in parkinsons_disease_data.columns if parkinsons_disease_data[feature].dtype != 'O']
categorical_features = [feature for feature in parkinsons_disease_data.columns if parkinsons_disease_data[feature].dtype == 'O']

# print columns
print('We have {} numerical features : {}'.format(len(numeric_features), numeric_features))
print('\nWe have {} categorical features : {}'.format(len(categorical_features), categorical_features))

In [None]:
print('Missing values per column of the DataFrame are as follows: ')
print('-'*100)
total=parkinsons_disease_data.isnull().sum().sort_values(ascending=False)
percent=(parkinsons_disease_data.isnull().sum()/parkinsons_disease_data.isnull().count()*100).sort_values(ascending=False)
pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])

In [None]:
print('Summary Statistics of numerical features for DataFrame are as follows:')
print('-'*100)
parkinsons_disease_data.describe()

In [None]:
parkinsons_disease_data['status'].value_counts() # status is target variable
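
Since `status` is the target, class proportions are worth reading alongside the raw counts, because a skewed split changes how accuracy should be interpreted. A minimal sketch, using a small made-up label series (`toy_status` is hypothetical and does not come from the dataset):

```python
import pandas as pd

# Hypothetical labels standing in for parkinsons_disease_data['status'];
# the values are made up for illustration only.
toy_status = pd.Series([1, 1, 1, 0, 1, 0, 1, 1], name="status")

# Absolute count and relative share of each class
class_counts = toy_status.value_counts()
class_share = toy_status.value_counts(normalize=True)

balance = pd.DataFrame({"Count": class_counts, "Share": class_share})
print(balance)
```

On the real data, the same two calls on `parkinsons_disease_data['status']` should give the corresponding table.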

## 5) Model Building
---

### Creating Feature Matrix (Independent Variables) & Target Variable (Dependent Variable)

In [None]:
# separating the data and labels
X = parkinsons_disease_data.drop(columns=['name', 'status']) # Feature matrix ('columns=' already implies axis=1)
y = parkinsons_disease_data['status'] # Target variable

In [None]:
X

In [None]:
y

### Data Standardization

In [None]:
scaler = StandardScaler()

In [None]:
scaler.fit(X)

In [None]:
standardized_data = scaler.transform(X)

In [None]:
standardized_data

In [None]:
X = standardized_data

In [None]:
X

### Train-Test Split

In [None]:
# train_test_split is already imported in the setup cell
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=45)

In [None]:
print(X.shape, X_train.shape, X_test.shape)

In [None]:
print(y.shape, y_train.shape, y_test.shape)
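
The standardization step above fits `StandardScaler` on the full feature matrix before the split, so the test rows contribute to the means and variances used for scaling. One possible alternative, sketched here on synthetic data from `make_classification` (an assumption: the notebook would pass its own unscaled `X` and `y` instead), is to wrap the scaler and the model in a pipeline so the scaler is fitted on the training rows only. All `*_demo` names and `leak_free_model` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the unscaled feature matrix and target (assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=45)

# Split first, so the test rows stay unseen during scaling
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=45
)

# The pipeline fits StandardScaler on the training rows only, then reuses
# those training statistics when transforming the test rows.
leak_free_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
leak_free_model.fit(X_train_demo, y_train_demo)

test_accuracy = leak_free_model.score(X_test_demo, y_test_demo)
print(f"Test accuracy: {test_accuracy:.3f}")
```

The same pipeline object can be passed anywhere a plain estimator is expected, which keeps the scaling and the model fitted together.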

### Model Comparison : Training & Evaluation

In [None]:
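# Candidate classifiers, each trained with default hyperparameters
models = [LogisticRegression, SVC, DecisionTreeClassifier, RandomForestClassifier]
accuracy_scores = []
precision_scores = []
recall_scores = []
f1_scores = []

for model in models:
    # A fixed random_state keeps the tree-based models reproducible across runs
    classifier = model(random_state=45).fit(X_train, y_train)
    y_pred = classifier.predict(X_test)

    # Metrics are computed on the held-out test set; status = 1 is the positive class
    accuracy_scores.append(accuracy_score(y_test, y_pred))
    precision_scores.append(precision_score(y_test, y_pred))
    recall_scores.append(recall_score(y_test, y_pred))
    f1_scores.append(f1_score(y_test, y_pred))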

In [None]:
classification_metrics_df = pd.DataFrame({
    "Model": ["Logistic Regression", "SVM", "Decision Tree", "Random Forest"],
    "Accuracy": accuracy_scores,
    "Precision": precision_scores,
    "Recall": recall_scores,
    "F1 Score": f1_scores
})

classification_metrics_df.set_index('Model', inplace=True)
classification_metrics_df
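
The table above comes from a single train-test split. As a rough cross-check, the same kind of comparison could be repeated over several stratified folds. The sketch below assumes synthetic data from `make_classification` in place of the real features, and the names `cv`, `cv_model` and `f1_per_fold` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target (assumption)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=45)

# Five stratified folds keep the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=45)

# Scaling lives inside the pipeline, so it is refitted on each training fold
cv_model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=45))

f1_per_fold = cross_val_score(cv_model, X_demo, y_demo, cv=cv, scoring="f1")
print(f"F1 per fold: {f1_per_fold}")
print(f"Mean F1: {f1_per_fold.mean():.3f} (std {f1_per_fold.std():.3f})")
```

A small spread across folds would suggest the ranking of models is not an artifact of one particular split.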

### Inference

In the context of Parkinson's disease prediction:
- **Accuracy:** All four models (Logistic Regression, SVM, Decision Tree and Random Forest) exhibit high accuracy on the test set, correctly classifying most individuals with or without Parkinson's disease.
- **Precision:** The models, Random Forest in particular, make few false-positive predictions.
- **Recall:** All models achieve perfect recall, meaning they identify every actual case of Parkinson's disease in the test set.
- **F1 score:** Random Forest has the highest F1 score, indicating the best balance between precision and recall.

Overall, these models show significant potential for early Parkinson's disease detection, with the **Random Forest model** standing out for its superior F1 score and strong overall performance.
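
To make the relationship between precision, recall and F1 concrete, the sketch below derives all three from a confusion matrix. The labels `y_true_demo` and `y_pred_demo` are made up for illustration and are not the notebook's predictions; they were chosen so that recall is perfect while one false positive lowers precision.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up labels (assumption): 7 actual positives, 3 actual negatives
y_true_demo = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred_demo = [1, 1, 1, 1, 1, 0, 1, 0, 1, 1]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Precision={precision:.3f}, Recall={recall:.3f}, F1={f1:.3f}")
```

With no false negatives, recall is 1.0, so any difference in F1 between models with perfect recall is driven entirely by precision.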