# What is Parkinson's Disease?

Parkinson's disease is a progressive disorder of the central nervous system that affects movement and causes tremors and stiffness. It is commonly described in 5 stages and is reported to affect more than 1 million people every year in India. It is chronic and has no cure yet. It is a neurodegenerative disease that affects the dopamine-producing neurons in the brain.

# What is XGBoost?

XGBoost is a machine learning algorithm designed with speed and performance in mind. The name stands for eXtreme Gradient Boosting, and the algorithm is built on decision trees. In this project, we will import XGBClassifier from the xgboost library.
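
To make the idea of boosting over decision trees concrete, here is a minimal sketch in plain Python. It assumes squared-error loss, a single input feature, and one-split "stumps" as the weak learners. The names `fit_stump`, `boost`, `predict`, and `mse` are illustrative only and are not part of the xgboost API. XGBoost itself is considerably more elaborate.

```python
# Toy gradient boosting for squared error with one-split "stumps".
# Illustrative sketch only; none of these names come from xgboost.

def fit_stump(xs, residuals):
    """Find the threshold on a 1-D input that minimises squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up toy data, used only to show the training error shrinking.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
weak = boost(xs, ys, n_rounds=1)
strong = boost(xs, ys, n_rounds=20)
print(mse(weak, xs, ys), mse(strong, xs, ys))
```

The second number printed is smaller than the first: adding more rounds lets later stumps correct the errors left by earlier ones, which is the core of gradient boosting.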

# Objective

Build a model that accurately detects the presence of Parkinson's disease in an individual.

# Imports

In [1]:
import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
df = pd.read_csv("data/parkinsons.data")
df.head()

| | name | MDVP:Fo(Hz) | MDVP:Fhi(Hz) | MDVP:Flo(Hz) | MDVP:Jitter(%) | MDVP:Jitter(Abs) | MDVP:RAP | MDVP:PPQ | Jitter:DDP | MDVP:Shimmer | ... | Shimmer:DDA | NHR | HNR | status | RPDE | DFA | spread1 | spread2 | D2 | PPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | phon_R01_S01_1 | 119.992 | 157.302 | 74.997 | 0.00784 | 7e-05 | 0.0037 | 0.00554 | 0.01109 | 0.04374 | ... | 0.06545 | 0.02211 | 21.033 | 1 | 0.414783 | 0.815285 | -4.813031 | 0.266482 | 2.301442 | 0.284654 |
| 1 | phon_R01_S01_2 | 122.4 | 148.65 | 113.819 | 0.00968 | 8e-05 | 0.00465 | 0.00696 | 0.01394 | 0.06134 | ... | 0.09403 | 0.01929 | 19.085 | 1 | 0.458359 | 0.819521 | -4.075192 | 0.33559 | 2.486855 | 0.368674 |
| 2 | phon_R01_S01_3 | 116.682 | 131.111 | 111.555 | 0.0105 | 9e-05 | 0.00544 | 0.00781 | 0.01633 | 0.05233 | ... | 0.0827 | 0.01309 | 20.651 | 1 | 0.429895 | 0.825288 | -4.443179 | 0.311173 | 2.342259 | 0.332634 |
| 3 | phon_R01_S01_4 | 116.676 | 137.871 | 111.366 | 0.00997 | 9e-05 | 0.00502 | 0.00698 | 0.01505 | 0.05492 | ... | 0.08771 | 0.01353 | 20.644 | 1 | 0.434969 | 0.819235 | -4.117501 | 0.334147 | 2.405554 | 0.368975 |
| 4 | phon_R01_S01_5 | 116.014 | 141.781 | 110.655 | 0.01284 | 0.00011 | 0.00655 | 0.00908 | 0.01966 | 0.06425 | ... | 0.1047 | 0.01767 | 19.649 | 1 | 0.417356 | 0.823484 | -3.747787 | 0.234513 | 2.33218 | 0.410335 |


## Extracting our labels and features from the DataFrame

In [3]:
features = df.loc[:, df.columns != 'status'].values[:, 1:]
labels = df.loc[:, 'status'].values
print("features: ", features)
print("labels: ", labels)

features:  [[119.992 157.302 74.997 ... 0.266482 2.301442 0.284654]
 [122.4 148.65 113.819 ... 0.33559 2.486855 0.368674]
 [116.682 131.111 111.555 ... 0.311173 2.342259 0.332634]
 ...
 [174.688 240.005 74.287 ... 0.158453 2.679772 0.131728]
 [198.764 396.961 74.904 ... 0.207454 2.138608 0.123306]
 [214.289 260.277 77.973 ... 0.190667 2.555477 0.148569]]
labels:  [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1
 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0
 0 0 0 0 0 0 0 0 0 0]


## Counting the values in the 'status' column, which holds 0 and 1

In [4]:
print("Values 0: ", labels[labels == 0].shape[0])
print("Values 1: ", labels[labels == 1].shape[0])

Values 0:  48
Values 1:  147
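
The two classes are clearly unbalanced, so a useful reference point is the accuracy of a hypothetical classifier that always predicts the majority class. The sketch below only does arithmetic on the counts printed above. The variable names are illustrative.

```python
# Class counts copied from the printed output of the 'status' column.
n_status_0 = 48
n_status_1 = 147
total = n_status_0 + n_status_1

# Accuracy of a hypothetical model that always predicts the larger class.
majority_baseline = max(n_status_0, n_status_1) / total

print(f"total samples: {total}")                                      # total samples: 195
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")   # 75.38%
```

Any model trained on this data should be judged against this baseline rather than against 50%.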


## Normalizing the 'features'
With MinMaxScaler we can scale the data to the range (-1, 1). The fit_transform method first fits the scaler to the data and then transforms it.

In [5]:
scaler = MinMaxScaler((-1,1))
x = scaler.fit_transform(features)
y = labels
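
As a rough illustration of what the scaling step computes for a single column, here is a minimal pure-Python sketch of min-max scaling to a target range. The function name `min_max_scale` and the handling of constant columns are assumptions of this sketch, not a copy of scikit-learn's implementation.

```python
def min_max_scale(column, feature_range=(-1.0, 1.0)):
    """Linearly map the values of one column onto feature_range."""
    lo, hi = feature_range
    col_min, col_max = min(column), max(column)
    span = col_max - col_min
    if span == 0:
        # Assumption of this sketch: a constant column maps to the lower bound.
        return [lo for _ in column]
    # Scale to [0, 1] first, then stretch and shift to [lo, hi].
    return [(v - col_min) / span * (hi - lo) + lo for v in column]


print(min_max_scale([10, 15, 20]))  # [-1.0, 0.0, 1.0]
```

The smallest value of each column lands on -1, the largest on 1, and everything else falls proportionally in between.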

## Now we split the dataset into training data and test data, keeping 20% of the data for testing.

In [6]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7)
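
To see what sizes a 20% test split produces for the 195 samples in this dataset, here is a small sketch of a shuffled index split using only the standard library. It assumes the test count is rounded up, and the helper name `simple_train_test_split` is hypothetical. It will not reproduce the exact rows that `train_test_split` selects with `random_state=7`, since the shuffling differs.

```python
import math
import random


def simple_train_test_split(n_samples, test_size=0.2, seed=7):
    """Return (train_indices, test_indices) after a seeded shuffle."""
    n_test = math.ceil(test_size * n_samples)  # assumption: round the test count up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = simple_train_test_split(195)
print(len(train_idx), len(test_idx))  # 156 39
```

With only 39 test samples, each misclassified sample changes the reported accuracy by roughly 2.6 percentage points.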

## Now we just initialize the XGBClassifier and train the model.

This classifier uses eXtreme Gradient Boosting. It falls under Ensemble Learning in ML, where we train many models and combine their predictions to produce a better result than a single model would.

In [7]:
modelo = XGBClassifier()
modelo.fit(x_train, y_train)





XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.300000012,
              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=8,
              num_parallel_tree=1, predictor='auto', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

## Finally, our prediction

In [8]:
y_pred = modelo.predict(x_test)
print("Accuracy of our model: ", accuracy_score(y_test, y_pred) * 100)

Accuracy of our model:  94.87179487179486
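
For reference, accuracy is simply the fraction of predictions that match the true labels. The sketch below is a minimal pure-Python version run on made-up demo labels. The function name `accuracy` and the demo lists are illustrative and are not the notebook's data. The last line checks that the reported figure is consistent with 37 correct predictions out of 39, assuming the test set holds 39 samples (20% of 195).

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, for illustration only.
y_true_demo = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred_demo = [1, 1, 0, 1, 1, 1, 1, 0]
print(accuracy(y_true_demo, y_pred_demo) * 100)  # 87.5

# 37 correct out of an assumed 39 test samples matches the reported accuracy.
print(round(37 / 39 * 100, 2))  # 94.87
```

Because the classes are unbalanced, accuracy alone does not show how the errors are distributed between the two classes.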


# Project references

- https://scikit-learn.org/stable/
- https://xgboost.readthedocs.io/en/stable/python/python_intro.html
- https://data-flair.training/blogs/gradient-boosting-algorithm/
- https://data-flair.training/blogs/python-machine-learning-project-detecting-parkinson-disease/
- https://drauziovarella.uol.com.br/doencas-e-sintomas/doenca-de-parkinson/