## In this challenge, you will build 4 models that predict whether an applicant is a low or high credit risk, based on relevant information such as credit history, requested credit amount, time in current employment, marital status, and other variables. You will use a dataset that includes information about each applicant along with their credit risk classification. You must train and tune classification models that can accurately predict the credit risk class of a new applicant from the information provided. Good luck!

### Algorithms to use and links to study materials
### KNN - [KNN](https://membro.comunidadedatascience.com/89193-fundamentos-de-machine-learning/2150255-aula-11-k-nearest-neighbors-teoria)

### Decision Tree - [Decision Tree](https://membro.comunidadedatascience.com/89193-fundamentos-de-machine-learning/2334896-aula-41-introducao-a-decision-tree)

### Random Forest - [Random Forest](https://membro.comunidadedatascience.com/89193-fundamentos-de-machine-learning/2394374-aula-47-random-forest-teoria)

### Logistic Regression - [Logistic Regression](https://membro.comunidadedatascience.com/89193-fundamentos-de-machine-learning/2424298-aula-50-introducao-a-logistic-regression)
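
As a rough orientation, the four algorithms listed above could be set up side by side with scikit-learn as in the sketch below. The synthetic data from `make_classification`, the hyperparameter values, and the `models` and `scores` names are assumptions made for illustration; the real credit features would need to be encoded as numbers first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data (assumption: the real features
# would already be numeric at this point).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One estimator per algorithm; every hyperparameter here is a placeholder.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Keeping the estimators in one dictionary makes it easy to compare them under the same split before tuning any of them individually.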

# Imports

In [34]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from category_encoders.count import CountEncoder
from sklearn.model_selection import train_test_split

In [37]:
url = "datasets/train.csv"
data = pd.read_csv(url)

| Attribute name | Description |
| --- | --- |
| checking_status | Status of the existing checking account |
| duration | Duration in months |
| credit_history | Applicant's credit history, including credits taken, credits paid back duly, delays, and critical accounts |
| purpose | Purpose of the requested credit |
| credit_amount | Requested credit amount |
| savings_status | Status of the applicant's savings account/bonds |
| employment | Applicant's time in current employment, in years |
| installment_commitment | Installment rate as a percentage of disposable income |
| personal_status | Applicant's marital and personal status |
| other_parties | Other parties involved in the credit |
| residence_since | Applicant's time at current residence, in years |
| property_magnitude | Property owned by the applicant (for example real estate, life insurance, or car) |
| age | Applicant's age |
| other_payment_plans | Other payment plans the applicant is enrolled in |
| housing | Applicant's type of housing |
| existing_credits | Number of existing credits currently in the applicant's name |
| job | Applicant's current type of job |
| num_dependents | Number of the applicant's financial dependents |
| own_telephone | Whether the applicant has their own telephone |
| foreign_worker | Whether the applicant is a foreign worker |
| class | Class (good or bad credit risk) assigned to the applicant based on their repayment capacity |
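
The attribute table can double as a quick sanity check that a loaded frame actually contains the documented columns. The sketch below is one way to do that; `EXPECTED_COLUMNS`, `missing_columns`, and the tiny `sample` frame are hypothetical names introduced here, not part of the notebook.

```python
import pandas as pd

# Column names copied from the attribute table above.
EXPECTED_COLUMNS = [
    "checking_status", "duration", "credit_history", "purpose",
    "credit_amount", "savings_status", "employment",
    "installment_commitment", "personal_status", "other_parties",
    "residence_since", "property_magnitude", "age", "other_payment_plans",
    "housing", "existing_credits", "job", "num_dependents", "own_telephone",
    "foreign_worker", "class",
]

def missing_columns(frame: pd.DataFrame) -> list:
    """Return the documented columns that are absent from `frame`."""
    return [col for col in EXPECTED_COLUMNS if col not in frame.columns]

# Tiny hypothetical frame holding only three of the documented columns.
sample = pd.DataFrame({"duration": [6.0], "age": [67.0], "class": ["good"]})
print(missing_columns(sample))
```

Running a check like this right after `pd.read_csv` would flag a file whose columns do not match the table before any modelling starts.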

In [40]:
data.shape

(103904, 25)

In [42]:
data = data.sample(50000)

In [44]:
data = data.drop(columns=['Unnamed: 0'])

In [45]:
train, test = data.

Unnamed: 0,id,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Inflight wifi service,Departure/Arrival time convenient,Ease of Online booking,...,Inflight entertainment,On-board service,Leg room service,Baggage handling,Checkin service,Inflight service,Cleanliness,Departure Delay in Minutes,Arrival Delay in Minutes,satisfaction
38356,116261,Female,Loyal Customer,49,Personal Travel,Eco,390,3,5,3,...,1,1,3,1,3,1,5,0,0.0,neutral or dissatisfied
64021,8679,Female,Loyal Customer,34,Business travel,Business,2169,2,5,5,...,2,2,2,2,2,2,3,9,8.0,neutral or dissatisfied
12265,13846,Male,Loyal Customer,11,Personal Travel,Eco,628,3,5,3,...,2,3,5,5,3,4,2,0,0.0,neutral or dissatisfied
58657,1547,Male,Loyal Customer,54,Personal Travel,Eco,862,3,5,3,...,1,3,3,5,5,4,1,10,5.0,neutral or dissatisfied
92327,90047,Female,Loyal Customer,34,Business travel,Business,2417,2,2,2,...,4,4,4,4,5,4,5,0,0.0,satisfied
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32693,15499,Female,Loyal Customer,51,Personal Travel,Eco,377,3,2,3,...,5,5,3,1,1,5,1,0,8.0,neutral or dissatisfied
75083,62807,Female,Loyal Customer,33,Personal Travel,Eco,2338,0,1,1,...,2,3,5,3,4,2,2,0,0.0,satisfied
4797,33442,Female,Loyal Customer,36,Personal Travel,Eco,2556,3,4,3,...,3,2,3,5,4,4,4,2,25.0,neutral or dissatisfied
73207,33721,Female,Loyal Customer,42,Personal Travel,Eco,667,3,4,3,...,4,4,3,4,4,4,5,0,3.0,neutral or dissatisfied


In [26]:
import hashlib
from datetime import datetime

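def calcular_hash_md5(texto):
    """Return the hexadecimal MD5 digest of `texto` (UTF-8 encoded)."""
    md5 = hashlib.md5()
    md5.update(texto.encode('utf-8'))
    return md5.hexdigest()

def obter_data_hora_atual():
    """Return the current time as a string with microseconds (HH:MM:SS.ffffff)."""
    data_hora_atual = datetime.now()
    return data_hora_atual.strftime('%H:%M:%S.%f')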

In [28]:
data['id'] = data.apply(lambda row: calcular_hash_md5(obter_data_hora_atual()), axis=1)
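
Hashing the current time gives each row an identifier, but two rows processed within the same microsecond would receive the same hash, so uniqueness is not guaranteed. The sketch below shows two possible alternatives using only the standard library: a random UUID, and a deterministic hash of the row position. The helper names `make_row_id` and `make_row_id_from_index`, and the `salt` value, are assumptions introduced for illustration.

```python
import hashlib
import uuid

def make_row_id() -> str:
    """Return a 32-character hex identifier from a random (version 4) UUID."""
    return uuid.uuid4().hex

def make_row_id_from_index(index: int, salt: str = "credit-risk") -> str:
    """Return the MD5 hex digest of a salted row index (deterministic)."""
    return hashlib.md5(f"{salt}:{index}".encode("utf-8")).hexdigest()

# Generate identifiers for 1000 hypothetical rows and check for duplicates.
ids = [make_row_id() for _ in range(1000)]
print(len(set(ids)) == len(ids))

# The index-based variant returns the same id every time for the same row.
print(make_row_id_from_index(0) == make_row_id_from_index(0))
```

With a DataFrame, something like `data['id'] = [make_row_id() for _ in range(len(data))]` would assign one identifier per row; the index-based variant is the better fit when the ids must be reproducible across runs.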

In [32]:
data = data[['id', 'checking_status', 'duration', 'credit_history', 'purpose',
       'credit_amount', 'savings_status', 'employment',
       'installment_commitment', 'personal_status', 'other_parties',
       'residence_since', 'property_magnitude', 'age', 'other_payment_plans',
       'housing', 'existing_credits', 'job', 'num_dependents', 'own_telephone',
       'foreign_worker', 'class']]

In [33]:
data

Unnamed: 0,id,checking_status,duration,credit_history,purpose,credit_amount,savings_status,employment,installment_commitment,personal_status,...,property_magnitude,age,other_payment_plans,housing,existing_credits,job,num_dependents,own_telephone,foreign_worker,class
0,fa230c7d6cd7a628228f57bebc12f16b,<0,6.0,critical/other existing credit,radio/tv,1169.0,no known savings,>=7,4.0,male single,...,real estate,67.0,none,own,2.0,skilled,1.0,yes,yes,good
1,e0f782d89aac7433cc922c9d85cf030b,0<=X<200,48.0,existing paid,radio/tv,5951.0,<100,1<=X<4,2.0,female div/dep/mar,...,real estate,22.0,none,own,1.0,skilled,1.0,none,yes,bad
2,e04997a7a779ccfd76d0c4967ce53dfe,no checking,12.0,critical/other existing credit,education,2096.0,<100,4<=X<7,2.0,male single,...,real estate,49.0,none,own,1.0,unskilled resident,2.0,none,yes,good
3,94e72fcafb35fed62cae3ac09d50c483,<0,42.0,existing paid,furniture/equipment,7882.0,<100,4<=X<7,2.0,male single,...,life insurance,45.0,none,for free,1.0,skilled,2.0,none,yes,good
4,c60f2737502effb8c3bb4b6cd2a5a202,<0,24.0,delayed previously,new car,4870.0,<100,1<=X<4,3.0,male single,...,no known property,53.0,none,for free,2.0,skilled,2.0,none,yes,bad
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,4913b7ac4a790873efed7691d9838cd0,no checking,12.0,existing paid,furniture/equipment,1736.0,<100,4<=X<7,3.0,female div/dep/mar,...,real estate,31.0,none,own,1.0,unskilled resident,1.0,none,yes,good
996,29ff83c3a455e71c1bdc050f03ba1d52,<0,30.0,existing paid,used car,3857.0,<100,1<=X<4,4.0,male div/sep,...,life insurance,40.0,none,own,1.0,high qualif/self emp/mgmt,1.0,yes,yes,good
997,d3d002cdf788e0a39f15b4fd2bca1d44,no checking,12.0,existing paid,radio/tv,804.0,<100,>=7,4.0,male single,...,car,38.0,none,own,1.0,skilled,1.0,none,yes,good
998,b5168ca12479348a4ad707b550219920,<0,45.0,existing paid,radio/tv,1845.0,<100,1<=X<4,4.0,male single,...,no known property,23.0,none,for free,1.0,skilled,1.0,yes,yes,bad


In [35]:
train, test = train_test_split(data, shuffle=True, stratify=data['class'])
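
The call above relies on the default split size (25% of the rows go to the test set when neither `test_size` nor `train_size` is given) and on a fresh random shuffle each run. A minimal sketch of a reproducible variant follows; the toy frame, `test_size=0.2`, and `random_state=42` are illustrative assumptions, not values taken from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame: 70 "good" rows and 30 "bad" rows.
toy = pd.DataFrame({
    "credit_amount": range(100),
    "class": ["good"] * 70 + ["bad"] * 30,
})

# Fixing random_state makes the split repeatable; stratify keeps the
# good/bad ratio approximately equal in both parts.
train_part, test_part = train_test_split(
    toy, test_size=0.2, shuffle=True, stratify=toy["class"], random_state=42
)

# Compare the class shares in the two parts.
print(train_part["class"].value_counts(normalize=True))
print(test_part["class"].value_counts(normalize=True))
```

Checking the class shares after splitting is a cheap way to confirm that stratification did what was intended, which matters when one class is much rarer than the other.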

In [36]:
train

Unnamed: 0,id,checking_status,duration,credit_history,purpose,credit_amount,savings_status,employment,installment_commitment,personal_status,...,property_magnitude,age,other_payment_plans,housing,existing_credits,job,num_dependents,own_telephone,foreign_worker,class
774,4487abfbdcf2cd5c247a10bd646681ce,>=200,12.0,critical/other existing credit,new car,1480.0,500<=X<1000,unemployed,2.0,male single,...,no known property,66.0,bank,for free,3.0,unemp/unskilled non res,1.0,none,yes,good
557,daa785606de7cb4927aa97a6c9e20de1,no checking,21.0,no credits/all paid,new car,5003.0,no known savings,1<=X<4,1.0,female div/dep/mar,...,life insurance,29.0,bank,own,2.0,skilled,1.0,yes,yes,bad
73,4a7e1bef15720a38a3015350130186d3,0<=X<200,42.0,critical/other existing credit,business,5954.0,<100,4<=X<7,2.0,female div/dep/mar,...,real estate,41.0,bank,own,2.0,unskilled resident,1.0,none,yes,good
324,3a9e3f8283ed171509026d6392bf2ad0,no checking,18.0,critical/other existing credit,new car,1028.0,<100,1<=X<4,4.0,female div/dep/mar,...,real estate,36.0,none,own,2.0,skilled,1.0,none,yes,good
662,ff09b63cd015e4e3892da28912f08d78,no checking,21.0,existing paid,furniture/equipment,2241.0,<100,>=7,4.0,male single,...,real estate,50.0,none,own,2.0,skilled,1.0,none,yes,good
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
954,a231079d1de8fd6e59670303ce5de9e3,<0,12.0,existing paid,new car,1893.0,<100,1<=X<4,4.0,female div/dep/mar,...,life insurance,29.0,none,own,1.0,skilled,1.0,yes,yes,good
742,a3fa93182f7fd8f95ecffae3b6d65d77,no checking,21.0,existing paid,radio/tv,3160.0,no known savings,>=7,4.0,male single,...,life insurance,41.0,none,own,1.0,skilled,1.0,yes,yes,good
143,589ef69c463d95fed3bac64d6ce0e0f3,<0,18.0,existing paid,furniture/equipment,2462.0,<100,1<=X<4,2.0,male single,...,car,22.0,none,own,1.0,skilled,1.0,none,yes,bad
649,6dece5424d1885388dbec7f56068fa8f,<0,12.0,existing paid,education,684.0,<100,1<=X<4,4.0,male single,...,car,40.0,none,rent,1.0,unskilled resident,2.0,none,yes,bad


In [None]:
cat_cols = X_train.select_dtypes(include=['object']).columns
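
`X_train` is not defined in the cells shown, so the sketch below assumes it is the training split without the `id` and `class` columns. It then count-encodes the categorical columns with plain pandas, as a stand-in for the imported `CountEncoder` rather than a reproduction of it. The miniature `train_demo` frame and the `count_encode` helper are hypothetical, and the sketch selects non-numeric columns with `exclude=["number"]` instead of `include=['object']`, an assumption made so it does not depend on how pandas stores text columns.

```python
import pandas as pd

# Hypothetical miniature of the training split.
train_demo = pd.DataFrame({
    "id": ["a1", "b2", "c3", "d4"],
    "purpose": ["radio/tv", "new car", "radio/tv", "education"],
    "housing": ["own", "own", "rent", "own"],
    "duration": [6.0, 48.0, 12.0, 42.0],
    "class": ["good", "bad", "good", "good"],
})

# Separate features from the target; the id carries no predictive signal.
X_train = train_demo.drop(columns=["id", "class"])
y_train = train_demo["class"]

# Treat every non-numeric feature as categorical.
cat_cols = X_train.select_dtypes(exclude=["number"]).columns

# Learn category frequencies from the training features only.
count_maps = {col: X_train[col].value_counts() for col in cat_cols}

def count_encode(frame, maps):
    """Replace each category with its training-set frequency (0 if unseen)."""
    encoded = frame.copy()
    for col, counts in maps.items():
        encoded[col] = frame[col].map(counts).fillna(0).astype(int)
    return encoded

X_train_encoded = count_encode(X_train, count_maps)
print(X_train_encoded)
```

Learning the counts on the training features and reusing the same `count_maps` for the test split avoids leaking test-set information into the encoding.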