<a href="https://colab.research.google.com/github/lima-breno/machine_learning/blob/main/classificadores.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Classifiers**
**Author**: Renan Santos Mendes

**Email**: renansantosmendes@gmail.com

**Description**: This notebook presents example machine learning models for a classification problem.


# **Fetal Health**

Cardiotocograms (CTGs) are a simple, low-cost option for assessing fetal health, allowing healthcare professionals to act to prevent child and maternal mortality. The equipment works by sending ultrasound pulses and reading their response, shedding light on fetal heart rate (FHR), fetal movements, uterine contractions, and more.

This dataset contains 2126 records of features extracted from cardiotocogram exams, which were then classified by three expert obstetricians into 3 classes:

- Normal
- Suspect
- Pathological

In [None]:
%%time
!git clone https://github.com/renansantosmendes/ml_datasets.git

Cloning into 'ml_datasets'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 9 (delta 1), reused 9 (delta 1), pack-reused 0
Unpacking objects: 100% (9/9), 375.03 KiB | 6.82 MiB/s, done.
CPU times: user 14.2 ms, sys: 2.37 ms, total: 16.5 ms
Wall time: 612 ms


In [None]:
!pip install -U mlflow --quiet
!pip install -U threadpoolctl --quiet


In [None]:
import os
import pickle
import mlflow
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# 2 - Reading the dataset and assigning it to variables

In [None]:
data = pd.read_csv(os.path.join('ml_datasets','fetal_health.csv'))

# 3 - Preparing the data before training the model

In [None]:
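features_to_remove = data.columns[7:]  # keep only the first 7 columns as features
X = data.drop(features_to_remove, axis=1)
y = data["fetal_health"]
columns = list(X.columns)

scaler = preprocessing.StandardScaler()
X_df = scaler.fit_transform(X)
X_df = pd.DataFrame(X_df, columns=columns)

X_train, X_test, y_train, y_test = train_test_split(X_df,
                                                    y,
                                                    test_size=0.3,
                                                    random_state=42)

# Assumption: later cells log os.path.join(tmp_dir, 'scaler.pkl'), but neither tmp_dir
# nor the pickle file is created anywhere in this notebook, so both are created here.
import tempfile
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, 'scaler.pkl'), 'wb') as f:
    pickle.dump(scaler, f)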
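
The cell above fits the scaler on the full dataset before splitting, so statistics from the test rows influence the scaling. A minimal sketch of the alternative order, split first and then fit the scaler on the training rows only. The synthetic data from `make_classification` (7 features, 3 classes) is an assumption standing in for `fetal_health.csv`, and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7 CTG features and 3 classes (assumption, not the real data).
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)

# Split first so the test rows never influence the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean and std come from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.shape, X_test_scaled.shape)
```

The test features will generally not have exactly zero mean after this transform, which is expected: they are scaled with the training statistics.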

---

## Creating a DAGsHub account
Go to [DAGsHub](https://dagshub.com)

# 4 - Creating and training a Decision Tree

In [None]:
from sklearn import tree
from mlflow import MlflowClient
from sklearn.tree import DecisionTreeClassifier

In [None]:
mlflow.sklearn.autolog(log_models=True,
                       log_input_examples=True,
                       log_model_signatures=True)

In [None]:
# MLFLOW_TRACKING_URI='https://dagshub.com/renansantosmendes/IAAM02.mlflow'
# MLFLOW_TRACKING_USERNAME='renansantosmendes'
# MLFLOW_TRACKING_PASSWORD='<your DAGsHub access token>'  # never commit a real token

In [None]:
# os.environ['MLFLOW_TRACKING_USERNAME'] = MLFLOW_TRACKING_USERNAME
# os.environ['MLFLOW_TRACKING_PASSWORD'] = MLFLOW_TRACKING_PASSWORD

# mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
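
The commented-out cells above assign the tracking credentials as literals inside the notebook. A hedged sketch of one alternative: read them from environment variables that are set outside the notebook. The helper name `tracking_settings` and the example values are hypothetical; the three variable names are the ones used in the cells above.

```python
import os

REQUIRED = ("MLFLOW_TRACKING_URI",
            "MLFLOW_TRACKING_USERNAME",
            "MLFLOW_TRACKING_PASSWORD")

def tracking_settings(env=os.environ):
    """Return the tracking settings found in `env`, failing loudly if any is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED}

# A plain dict stands in for os.environ here; all values are placeholders.
example_env = {
    "MLFLOW_TRACKING_URI": "https://example.invalid/user/repo.mlflow",
    "MLFLOW_TRACKING_USERNAME": "user",
    "MLFLOW_TRACKING_PASSWORD": "token-from-secret-store",
}
settings = tracking_settings(example_env)
print(sorted(settings))
```

With real values in the environment, `mlflow.set_tracking_uri(settings["MLFLOW_TRACKING_URI"])` would point the client at the remote server without any secret appearing in the notebook.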

# SVM

In [None]:
from sklearn.svm import SVC

In [None]:
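%%time
# Assumption: the model-fitting lines are missing from this cell; an SVC with default
# hyperparameters is used as a placeholder. autolog records its parameters and metrics.
svm_clf = SVC()

with mlflow.start_run(run_name='svc'):
  svm_clf.fit(X_train, y_train)
  mlflow.log_artifact(os.path.join(tmp_dir, 'scaler.pkl'))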



CPU times: user 825 ms, sys: 60.4 ms, total: 886 ms
Wall time: 7.21 s
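
The run above is named `svc` and stores the fitted scaler as an artifact. A minimal, self-contained sketch of a related approach: bundle the scaler and the classifier in one scikit-learn pipeline, so both are fitted on the training rows and applied together at prediction time. The synthetic data and the hyperparameters (`kernel="rbf"`, `C=1.0`) are assumptions for illustration, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the CTG features (assumption).
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scaling matters for SVMs because the RBF kernel depends on distances between rows.
svc_pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svc_pipeline.fit(X_train, y_train)

test_accuracy = svc_pipeline.score(X_test, y_test)  # mean accuracy on held-out rows
print(f"SVC test accuracy: {test_accuracy:.3f}")
```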


# Decision Tree

In [None]:
%%time


In [None]:
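%%time
# Assumption: the model lines are missing from this cell. `tree_clf` is the name the
# visualization cells below expect; default hyperparameters are a placeholder.
tree_clf = DecisionTreeClassifier(random_state=42)

with mlflow.start_run(run_name='decision_tree') as run:
  tree_clf.fit(X_train, y_train)
  mlflow.log_artifact(os.path.join(tmp_dir, 'scaler.pkl'))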

### 4.3 Visualizing the tree

#### 4.3.1 Method 1

In [None]:
print(tree.export_text(tree_clf))

#### 4.3.2 Method 2

In [None]:
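fig = plt.figure(figsize=(25,20))
# feature_names must match the columns the tree was trained on (the 7 kept in `columns`),
# not every column of the original DataFrame.
_ = tree.plot_tree(tree_clf,
                   feature_names=columns,
                   class_names=['normal', 'suspect', 'pathological'],
                   filled=True)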
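
Both `export_text` and `plot_tree` accept `feature_names`, which needs exactly one entry per column the tree was trained on. A small sketch on synthetic data that checks this before printing the rules. The data, the names `feature_0` to `feature_6`, and `max_depth=2` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = [f"feature_{i}" for i in range(7)]  # hypothetical column names
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)

# A shallow tree keeps the printed rules short enough to read.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# One name per training column, otherwise the labels would not line up with the splits.
assert len(feature_names) == small_tree.n_features_in_
rules = export_text(small_tree, feature_names=feature_names)
print(rules)
```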

# **Ensemble Models**

## Voting

In [None]:
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier

In [None]:
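%%time

# Assumption: the right-hand sides were left blank; default hyperparameters are placeholders.
log_clf = LogisticRegression(max_iter=1000)
rnd_clf = RandomForestClassifier(random_state=42)
svm_clf = SVC()

# Hard voting: each model casts one vote and the majority class wins.
voting = VotingClassifier(estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
                          voting='hard')

with mlflow.start_run(run_name='voting') as run:
  voting.fit(X_train, y_train)
  mlflow.log_artifact(os.path.join(tmp_dir, 'scaler.pkl'))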
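
A voting ensemble can combine its members in two ways: hard voting takes the majority class, while soft voting averages the predicted class probabilities. The following is a self-contained sketch of both on synthetic data. The data, estimator labels, and hyperparameters are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    # probability=True is needed so the SVC can take part in soft voting.
    ("svc", SVC(probability=True, random_state=42)),
]

# VotingClassifier clones its estimators, so the same list can be reused.
hard_vote = VotingClassifier(estimators=estimators, voting="hard").fit(X_train, y_train)
soft_vote = VotingClassifier(estimators=estimators, voting="soft").fit(X_train, y_train)

print("hard voting accuracy:", hard_vote.score(X_test, y_test))
print("soft voting accuracy:", soft_vote.score(X_test, y_test))
```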

## Bagging

In [None]:
from sklearn.ensemble import BaggingClassifier

In [None]:
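%%time
# Assumption: the right-hand side was left blank; a bagged ensemble of decision trees
# with placeholder settings is used here.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=42)

with mlflow.start_run(run_name='bagging') as run:
  bagging.fit(X_train, y_train)
  mlflow.log_artifact(os.path.join(tmp_dir, 'scaler.pkl'))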
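
Bagging trains each base model on a bootstrap sample of the rows, which leaves some rows unseen by each model. Those out-of-bag rows give an accuracy estimate without a separate validation split. The following is a minimal sketch on synthetic data; the data and the choice of 50 trees are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)

# The base estimator is passed positionally because its keyword name differs
# between scikit-learn versions (base_estimator in older releases, estimator in newer ones).
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=True, oob_score=True, random_state=42)
bagging.fit(X, y)

# Out-of-bag score: accuracy measured on rows each tree did not see during training.
print(f"OOB accuracy: {bagging.oob_score_:.3f}")
```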

## Boosting

In [None]:
from sklearn.ensemble import AdaBoostClassifier
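
A minimal sketch of how `AdaBoostClassifier` could be trained, since the notebook leaves its AdaBoost cell empty. It runs on synthetic data, and `n_estimators=100` and `learning_rate=0.5` are illustrative assumptions rather than the author's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# With no base estimator given, AdaBoost boosts depth-1 decision trees ("stumps"),
# reweighting the training rows after each round toward the misclassified ones.
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)

ada_accuracy = ada.score(X_test, y_test)
print(f"AdaBoost test accuracy: {ada_accuracy:.3f}")
```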

In [None]:
%%time


In [None]:
from sklearn.ensemble import GradientBoostingClassifier

In [None]:
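%%time
# Assumption: the right-hand side was left blank; default hyperparameters are a placeholder.
grd = GradientBoostingClassifier(random_state=42)

with mlflow.start_run(run_name='grf') as run:
  grd.fit(X_train, y_train)
  mlflow.log_artifact(os.path.join(tmp_dir, 'scaler.pkl'))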

In [None]:
from xgboost import XGBClassifier
mlflow.xgboost.autolog(log_models=True,
                       log_input_examples=True,
                       log_model_signatures=True)
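
Recent XGBoost releases expect class labels to be consecutive integers starting at 0. If the `fetal_health` column uses other values (for example 1, 2, 3, which is an assumption about this dataset and is not shown above), the labels can be remapped before fitting an `XGBClassifier`. The following sketch uses scikit-learn's `LabelEncoder` on made-up labels.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up labels in the style 1.0 / 2.0 / 3.0 (assumption about the label encoding).
y_raw = np.array([1.0, 2.0, 3.0, 1.0, 1.0, 2.0])

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_raw)  # sorted unique labels become 0..n_classes-1

# inverse_transform maps model predictions back to the original label values.
y_restored = encoder.inverse_transform(y_encoded)
print(y_encoded, y_restored)
```

Keeping the fitted `encoder` alongside the model makes it possible to report predictions in the original label values.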

In [None]:
%%time
