# First Machine Learning Model
## Predicting whether a given patient has diabetes

### References

- Reading data in .csv format with Pandas - <br><br>[Pandas Docs - API Reference - Input/Output - Flat File - read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas-read-csv)
<br><br>[Pandas Docs - User Guide - IO tools (text, CSV, HDF5, …)](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-tools-text-csv-hdf5)

***

- How to drop a column from a DataFrame (table) - <br><br>[Pandas Docs - API Reference - DataFrame - Reindexing/selection/label manipulation - DataFrame.drop](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html#pandas-dataframe-drop)

***

- How to select a single column from a DataFrame (table) - <br><br>[Pandas Docs - API Reference - DataFrame - Indexing, iteration - DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas-dataframe-loc)
<br><br>[Pandas Docs - User Guide - Indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-and-selecting-data)

***

- Splitting the data into training and test sets - <br><br>[Scikit-learn Docs - API Reference - sklearn.model_selection: Model Selection - Splitter Functions - model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn-model-selection-train-test-split)
<br><br>[Scikit-learn Docs - User Guide - 3. Model selection and evaluation - 3.1. Cross-validation: evaluating estimator performance](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation-evaluating-estimator-performance)

***

- Instantiating, training, and predicting - <br><br>[Scikit-learn Docs - Tutorial - An introduction to machine learning with scikit-learn - Learning and predicting](https://scikit-learn.org/stable/tutorial/basic/tutorial.html#learning-and-predicting)

***

- Using the Random Forest Classifier - <br><br>[Scikit-learn Docs - API Reference - sklearn.ensemble: Ensemble Methods - ensemble.RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn-ensemble-randomforestclassifier)

***

- Evaluating the model - <br><br>[Scikit-learn Docs - API Reference - sklearn.metrics: Metrics - Classification metrics - metrics.accuracy_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html#sklearn-metrics-accuracy-score)
<br><br>[Scikit-learn Docs - User Guide - 3. Model selection and evaluation - 3.3. Model evaluation: quantifying the quality of predictions - 3.3.2. Classification metrics](https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics)
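
The data-preparation steps covered by the references above (dropping a column, selecting a column, splitting into training and test sets) can be tried on a tiny table first. This is only a sketch: the `Glucose` and `BMI` columns and every value in `df_toy` are invented for illustration, and only the `Outcome` name mirrors the notebook.

```python
from pandas import DataFrame
from sklearn.model_selection import train_test_split

# toy table with invented values (hypothetical, not the real diabetes data)
df_toy = DataFrame({
    'Glucose': [85, 168, 122, 99, 140, 110, 155, 90, 130, 175],
    'BMI': [26.6, 43.1, 31.0, 25.4, 35.2, 28.9, 39.0, 24.1, 33.3, 41.5],
    'Outcome': [0, 1, 0, 0, 1, 0, 1, 0, 1, 1],
})

# features: every column except the label
X_toy = df_toy.drop(columns='Outcome')

# label: a single column selected with .loc
y_toy = df_toy.loc[:, 'Outcome']

# 30% of the 10 rows go to the test set, the rest to training
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=42)

print(list(X_toy.columns))    # ['Glucose', 'BMI']
print(X_tr.shape, X_te.shape)  # (7, 2) (3, 2)
```

Fixing `random_state` makes the split reproducible, so the same rows land in the test set on every run.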

In [3]:
# importing the helper libraries
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# reading the data
FILE_NAME = 'data/diabetes.csv'
df_diabetes = read_csv(filepath_or_buffer=FILE_NAME)

# split into features and labels
X = df_diabetes.drop(columns='Outcome')
y = df_diabetes.loc[:, 'Outcome']

# split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# instantiating and training the model
modelo = DecisionTreeClassifier(random_state=42)
modelo.fit(X_train, y_train)

# model predictions
y_pred = modelo.predict(X_test)

# evaluating the model
acc_modelo = accuracy_score(y_test, y_pred) * 100
print(f'Model accuracy: {acc_modelo:.2f}%')

Model accuracy: 70.13%
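
To make the metric concrete, accuracy is the fraction of predictions that match the true labels. The sketch below computes it by hand and compares the result with `accuracy_score`. The two label lists are invented for illustration and are not taken from the diabetes data.

```python
from sklearn.metrics import accuracy_score

# invented labels (hypothetical, for illustration only)
y_true_toy = [1, 0, 1, 1, 0]
y_pred_toy = [1, 0, 0, 1, 1]

# accuracy = number of matching positions / number of samples
hits = sum(t == p for t, p in zip(y_true_toy, y_pred_toy))
acc_manual = hits / len(y_true_toy)

# the same value computed by scikit-learn
acc_sklearn = accuracy_score(y_true_toy, y_pred_toy)

print(f'{acc_manual * 100:.2f}% vs {acc_sklearn * 100:.2f}%')  # 60.00% vs 60.00%
```

Accuracy alone can mislead when one class is much more frequent than the other, so it helps to compare it with the share of the majority class in `y_test`.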


***
***

# Second Machine Learning Model
## Predicting the median house price in Boston

In [41]:
# importing the helper libraries
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# reading the data
FILE_NAME = 'data/boston_housing.csv'
boston_housing = read_csv(filepath_or_buffer=FILE_NAME)

# split into features and labels
X = boston_housing.drop(columns='MEDV')
y = boston_housing.loc[:, 'MEDV']

# split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# instantiating and training the model
modelo = DecisionTreeRegressor(random_state=42)
modelo.fit(X_train, y_train)

# model predictions
y_pred = modelo.predict(X_test)

# evaluating the model
mae_modelo = mean_absolute_error(y_test, y_pred)
print(f'Model MAE: {mae_modelo:.2f}')

Model MAE: 2.41
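
For reference, the mean absolute error (MAE) is the average of the absolute differences between the true and predicted values. The sketch below computes it by hand and compares the result with `mean_absolute_error`. The values are invented for illustration and do not come from the housing data.

```python
from sklearn.metrics import mean_absolute_error

# invented target values and predictions (hypothetical, for illustration only)
y_true_toy = [24.0, 21.5, 34.5]
y_pred_toy = [22.0, 23.5, 33.0]

# absolute error of each prediction: [2.0, 2.0, 1.5]
abs_errors = [abs(t - p) for t, p in zip(y_true_toy, y_pred_toy)]

# MAE = mean of the absolute errors
mae_manual = sum(abs_errors) / len(abs_errors)

# the same value computed by scikit-learn
mae_sklearn = mean_absolute_error(y_true_toy, y_pred_toy)

print(f'{mae_manual:.2f} vs {mae_sklearn:.2f}')  # 1.83 vs 1.83
```

The MAE is expressed in the same unit as the target, so it reads directly as the typical size of a prediction error.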
