# **Fetal Health**

**Author**: Renan Santos Mendes

**Email**: renansantosmendes@gmail.com

**Description**: This notebook presents an example of a machine learning model for a classification problem.


# **Fetal Health**

Cardiotocograms (CTGs) are a simple, low-cost way to assess fetal health, allowing healthcare professionals to act to prevent child and maternal mortality. The equipment works by sending ultrasound pulses and reading their response, which gives information on fetal heart rate (FHR), fetal movements, uterine contractions, and more.

This dataset contains 2126 records of features extracted from cardiotocogram exams, which were then classified by three expert obstetricians into 3 classes:

- Normal
- Suspect
- Pathological
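
In the data, these classes appear as numeric codes in the `fetal_health` column. The snippet below is a minimal sketch for checking class balance. It assumes that the codes 1.0, 2.0 and 3.0 stand for Normal, Suspect and Pathological in that order (this mapping is not stated in the notebook), and it uses a small made-up frame instead of the real dataset.

```python
import pandas as pd

# Assumed mapping from numeric code to class name (not confirmed by this notebook).
CLASS_NAMES = {1.0: "Normal", 2.0: "Suspect", 3.0: "Pathological"}

# Made-up stand-in for the real `fetal_health` column.
toy = pd.DataFrame({"fetal_health": [1.0, 1.0, 2.0, 3.0, 1.0]})

# Translate the codes into readable labels and compute each class's share.
labels = toy["fetal_health"].map(CLASS_NAMES)
class_share = labels.value_counts(normalize=True)
print(class_share)
```

On the real data, the same two lines can be applied to `data["fetal_health"]` once the CSV has been loaded. A strongly uneven share would suggest using a stratified split and per-class metrics.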

In [2]:
pip install -U mlflow -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.1[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [3]:
import os
import pickle
import mlflow
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# 2 - Reading the dataset and assigning it to the respective variables

In [5]:
data = pd.read_csv('https://raw.githubusercontent.com/renansantosmendes/lectures-cdas-2023/master/fetal_health.csv')

In [13]:
data.head()

Unnamed: 0,baseline value,accelerations,fetal_movement,uterine_contractions,light_decelerations,severe_decelerations,prolongued_decelerations,abnormal_short_term_variability,mean_value_of_short_term_variability,percentage_of_time_with_abnormal_long_term_variability,...,histogram_min,histogram_max,histogram_number_of_peaks,histogram_number_of_zeroes,histogram_mode,histogram_mean,histogram_median,histogram_variance,histogram_tendency,fetal_health
0,120.0,0.0,0.0,0.0,0.0,0.0,0.0,73.0,0.5,43.0,...,62.0,126.0,2.0,0.0,120.0,137.0,121.0,73.0,1.0,2.0
1,132.0,0.006,0.0,0.006,0.003,0.0,0.0,17.0,2.1,0.0,...,68.0,198.0,6.0,1.0,141.0,136.0,140.0,12.0,0.0,1.0
2,133.0,0.003,0.0,0.008,0.003,0.0,0.0,16.0,2.1,0.0,...,68.0,198.0,5.0,1.0,141.0,135.0,138.0,13.0,0.0,1.0
3,134.0,0.003,0.0,0.008,0.003,0.0,0.0,16.0,2.4,0.0,...,53.0,170.0,11.0,0.0,137.0,134.0,137.0,13.0,1.0,1.0
4,132.0,0.007,0.0,0.008,0.0,0.0,0.0,16.0,2.4,0.0,...,53.0,170.0,9.0,0.0,137.0,136.0,138.0,11.0,1.0,1.0


# 3 - Preparing the data before training the model

In [7]:
features_to_remove = data.columns[7:]

In [8]:
X=data.drop(features_to_remove, axis=1)
y=data["fetal_health"]
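
Dropping `data.columns[7:]` keeps only the first seven columns and depends on the column order in the CSV. As a possible alternative, the features can be selected by name. The sketch below assumes the first seven columns are the ones shown in the `data.head()` output above, and uses a one-row made-up frame (`toy`) rather than the real data.

```python
import pandas as pd

# Assumed feature names, copied from the head() output shown earlier.
FEATURES = [
    "baseline value",
    "accelerations",
    "fetal_movement",
    "uterine_contractions",
    "light_decelerations",
    "severe_decelerations",
    "prolongued_decelerations",
]

# One-row stand-in with two extra columns that should not end up in X.
toy = pd.DataFrame(
    [[120.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 73.0, 2.0]],
    columns=FEATURES + ["abnormal_short_term_variability", "fetal_health"],
)

# Selection by name: independent of column order.
X_by_name = toy[FEATURES]
y_toy = toy["fetal_health"]

# Selection by position, as in the notebook cell above.
X_by_position = toy.drop(toy.columns[7:], axis=1)
```

Both selections give the same columns here. The name-based version fails loudly with a `KeyError` if a column is missing, instead of silently picking different features.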

In [9]:
columns = list(X.columns)

In [10]:
scaler = preprocessing.StandardScaler()
X_df = scaler.fit_transform(X)
X_df = pd.DataFrame(X_df, columns=columns)

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X_df, y, random_state=42, test_size=0.3)
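
Note that the cells above fit the scaler on the full dataset before splitting, so statistics from the test rows influence the scaling of the training rows. A common alternative, sketched below on synthetic data, is to split first and fit the scaler on the training part only. The names `X_toy`, `y_toy` and `toy_scaler` and the use of `stratify` are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 100 rows, 3 features, imbalanced 3-class target.
rng = np.random.default_rng(42)
X_toy = pd.DataFrame(rng.normal(loc=5.0, scale=2.0, size=(100, 3)),
                     columns=["f1", "f2", "f3"])
y_toy = pd.Series(np.tile([1.0, 1.0, 1.0, 2.0, 3.0], 20))

# Split first; stratify keeps the class proportions similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

# Fit the scaler on the training rows only, then apply it to both parts.
toy_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = pd.DataFrame(toy_scaler.transform(X_tr),
                           columns=X_toy.columns, index=X_tr.index)
X_te_scaled = pd.DataFrame(toy_scaler.transform(X_te),
                           columns=X_toy.columns, index=X_te.index)
```

With this ordering the training features have zero mean after scaling, while the test features are only approximately centered, which is the expected behavior.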

In [25]:
import mlflow
from mlflow import MlflowClient
###################################################################################
# Remember to update the variables below with your own DagsHub user values
###################################################################################
os.environ['MLFLOW_TRACKING_USERNAME'] = 'suport.develop@gmail.com'
os.environ['MLFLOW_TRACKING_PASSWORD'] = '018b6c087f3bba30a88adfb1988cf53f1f42c4e5'

mlflow.set_tracking_uri('https://dagshub.com/suport.develop/mlops-puc-180923.mlflow')

mlflow.sklearn.autolog(log_models=True,
                       log_input_examples=True,
                       log_model_signatures=True)
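
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched below, is to read the values from the environment and fall back to an interactive prompt. The helper `read_secret` is hypothetical (it is not part of MLflow or DagsHub), and the values in the example are placeholders.

```python
import os
from getpass import getpass


def read_secret(name, env=os.environ, prompt=getpass):
    """Return env[name] if it is set; otherwise ask for it without echoing."""
    value = env.get(name)
    if not value:
        value = prompt(f"{name}: ")
    return value


# Example with stand-in values (no real credentials involved).
fake_env = {"MLFLOW_TRACKING_USERNAME": "example-user"}
username = read_secret("MLFLOW_TRACKING_USERNAME", env=fake_env)
token = read_secret("MLFLOW_TRACKING_PASSWORD", env=fake_env,
                    prompt=lambda _: "example-token")
```

In the notebook, the returned values would be assigned to `os.environ['MLFLOW_TRACKING_USERNAME']` and `os.environ['MLFLOW_TRACKING_PASSWORD']` before calling `mlflow.set_tracking_uri`, so the token never appears in the saved cells.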

# **Ensemble Models**

In [14]:
from sklearn.ensemble import GradientBoostingClassifier

In [20]:
%%time
# create the classifier
gradient_clf = GradientBoostingClassifier(max_depth=10, n_estimators=200, learning_rate=0.05)

# train the classifier: fit(input, output)
gradient_clf.fit(X_train, y_train)

2023/09/18 22:04:22 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '205f0e7444a244ea856e42f8f75b373e', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow


CPU times: user 4.39 s, sys: 89.4 ms, total: 4.48 s
Wall time: 7.01 s


GradientBoostingClassifier(learning_rate=0.05, max_depth=10, n_estimators=200)

In [27]:
%%time
# create the classifier
gradient_clf = GradientBoostingClassifier(max_depth=10, n_estimators=150, learning_rate=0.05)

# the start_run context creates a named run that records this execution's data and parameters
with mlflow.start_run(run_name='gradiente_bosting') as run:
  gradient_clf.fit(X_train, y_train)



CPU times: user 3.43 s, sys: 155 ms, total: 3.58 s
Wall time: 17.4 s
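
The runs above only fit the model, and the notebook does not evaluate it on `X_test`. The sketch below shows one way such an evaluation could look. It uses synthetic data from `make_classification` and smaller hyperparameters so it runs quickly, so the printed numbers say nothing about the fetal health dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem with 7 features (stand-in for the real data).
X_syn, y_syn = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42, stratify=y_syn
)

# Smaller model than in the notebook, chosen only to keep the sketch fast.
clf = GradientBoostingClassifier(
    max_depth=3, n_estimators=50, learning_rate=0.05, random_state=42
)
clf.fit(X_tr, y_tr)

# Held-out evaluation: overall accuracy plus per-class precision and recall.
y_pred = clf.predict(X_te)
test_accuracy = accuracy_score(y_te, y_pred)
print(f"test accuracy: {test_accuracy:.3f}")
print(classification_report(y_te, y_pred))
```

In the notebook, the same evaluation could be done with `gradient_clf`, `X_test` and `y_test` inside the `with mlflow.start_run(...)` block, followed by `mlflow.log_metric("test_accuracy", test_accuracy)` to store the held-out score with the run.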
