
# **Technical Challenge - Americanas SA**

## Using traditional classifiers

Leonardo G. F. Rodrigues

https://www.linkedin.com/in/leonardogfrodrigues/

leonardogfrodrigues@gmail.com


---

Developed in Python (version 3.7.13) and run in the cloud on Google Colaboratory.

---

Mounting Google Drive to access the dataset file directly

In [17]:
from google.colab import drive
drive.mount('/content/gdrive')

path = '/content/gdrive/MyDrive/dataset_cdjr.parquet.gzip'

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


Importing the libraries and reading the dataset

In [18]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

In [19]:
df = pd.read_parquet(path)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 466 entries, 337 to 92
Data columns (total 17 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   feature0   466 non-null    float64
 1   feature1   466 non-null    int64  
 2   feature2   466 non-null    float64
 3   feature3   466 non-null    float64
 4   feature4   466 non-null    float64
 5   feature5   466 non-null    float64
 6   feature6   466 non-null    int64  
 7   feature7   466 non-null    float64
 8   feature8   466 non-null    float64
 9   feature9   466 non-null    int64  
 10  feature10  466 non-null    float64
 11  feature11  466 non-null    float64
 12  feature12  466 non-null    float64
 13  feature13  466 non-null    float64
 14  feature14  466 non-null    float64
 15  feature15  466 non-null    int64  
 16  target     466 non-null    int64  
dtypes: float64(12), int64(5)
memory usage: 65.5 KB


## Data Preparation

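**- Assignment:** features in X and labels in y

**- Dropping the target column:** the target (class) must not be used as an input feature, since the model would then be given the very answer it is supposed to predict

**- Split:** 75% for training and 25% for testing

**- Standardization:** `StandardScaler` rescales each feature to zero mean and unit variance, using the mean and standard deviation of the training set only, and applies that same transformation to the test set. This puts features with very different ranges on a common scale, which matters for scale-sensitive models such as KNN, SVM and logistic regression; tree-based models (Decision Tree, Random Forest, Gradient Boosting) are largely insensitive to it. It does not remove outliers: they stay just as far from the rest of the data, only expressed in different units.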
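As a minimal sketch of what `StandardScaler` computes for a single column (a pure-Python re-implementation for illustration only; `fit_standardizer` and `apply_standardizer` are hypothetical helpers, not part of scikit-learn, and the values are toy numbers): the statistics are learned from the training values and then reused unchanged on the test values. Like `StandardScaler`, it uses the population standard deviation (dividing by n) and leaves zero-variance columns unscaled.

```python
import math
from typing import List, Tuple


def fit_standardizer(column: List[float]) -> Tuple[float, float]:
    """Learn the mean and population standard deviation from training values."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    # Constant column: use a scale of 1.0 instead of dividing by zero.
    return mean, (std if std > 0 else 1.0)


def apply_standardizer(column: List[float], mean: float, std: float) -> List[float]:
    """Map each value to z = (x - mean) / std using the training statistics."""
    return [(x - mean) / std for x in column]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [5.0, 11.0]

mean, std = fit_standardizer(train)            # learned on training data only
train_z = apply_standardizer(train, mean, std)
test_z = apply_standardizer(test, mean, std)   # same mean/std reused on test data
print(mean, std, test_z)  # 5.0 2.0 [0.0, 3.0]
```

The test value 11.0 lands at z = 3.0: the scale changes, but an outlier remains an outlier.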

In [20]:
X = df.drop('target', axis = 1)
y = df['target']

In [21]:
# Split the dataset: 75% for training, 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 42)

In [22]:
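# Learn each feature's mean and standard deviation on the training set only...
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
# ...and reuse those training statistics on the test set, so no test information leaks in.
X_test = sc.transform(X_test)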
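The notebook imports `cross_val_score` but scores every model on one 117-sample test split. A sketch of how the same preprocessing could be combined with cross-validation, assuming scikit-learn is installed: wrapping `StandardScaler` and the classifier in a pipeline makes every fold refit the scaler on its own training portion. Synthetic data from `make_classification` is a hypothetical stand-in for `dataset_cdjr.parquet.gzip`; only its shape (466 rows, 16 features) mirrors the real data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real X, y (same shape: 466 rows, 16 features).
X, y = make_classification(n_samples=466, n_features=16, random_state=42)

# Scaler + model as one estimator: the scaler is refit inside every training fold.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

# Stratified 5-fold CV keeps the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(scores.round(4), f"mean={scores.mean():.4f} std={scores.std():.4f}")
```

Swapping `KNeighborsClassifier` for any of the other estimators gives a mean and a spread per model, a firmer basis for comparison than a single split.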

## KNN

In [23]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

predictions = knn.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.5532    0.4561    0.5000        57
           1     0.5571    0.6500    0.6000        60

    accuracy                         0.5556       117
   macro avg     0.5552    0.5531    0.5500       117
weighted avg     0.5552    0.5556    0.5513       117

[[26 31]
 [21 39]]
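`KNeighborsClassifier(n_neighbors=3)` stores the training set and labels each test point by a majority vote among its 3 closest training points; with scikit-learn's defaults (`weights='uniform'`, Minkowski metric with `p=2`) "closest" means Euclidean distance. A minimal pure-Python sketch of that rule, with made-up toy points; `knn_predict` is a hypothetical helper, not the library's implementation (which uses tree-based neighbor search and its own tie handling).

```python
import math
from collections import Counter
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X: List[Sequence[float]], train_y: List[int],
                x: Sequence[float], k: int = 3) -> int:
    """Predict the label of x by majority vote among its k nearest training points."""
    # Distance from x to every training point, paired with that point's label.
    neighbors = sorted(
        ((euclidean(p, x), label) for p, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    # Majority vote among the k closest labels (k odd and two classes: no tie).
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.5, 0.5)))  # 0
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # 1
print(knn_predict(train_X, train_y, (2.0, 2.0)))  # 0
```

Because the vote depends directly on distances, a feature with a large numeric range would dominate it, which is why KNN is run on the standardized features.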


## Naive Bayes

In [24]:
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)

predictions = nb.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.6667    0.2456    0.3590        57
           1     0.5521    0.8833    0.6795        60

    accuracy                         0.5726       117
   macro avg     0.6094    0.5645    0.5192       117
weighted avg     0.6079    0.5726    0.5233       117

[[14 43]
 [ 7 53]]
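`GaussianNB` assumes the features are conditionally independent given the class and models each feature, within each class, as a normal distribution whose mean $\mu_{c,j}$ and variance $\sigma^2_{c,j}$ are estimated from the training set (plus a small smoothing term controlled by `var_smoothing`). The prior $P(y = c)$ defaults to the class frequency in the training set, and the prediction is the class with the largest log-posterior, summed over the 16 features left after dropping `target`:

$$
\hat{y} = \underset{c \in \{0,1\}}{\arg\max} \left[ \log P(y = c) + \sum_{j=1}^{16} \log \mathcal{N}\!\left(x_j \mid \mu_{c,j}, \sigma^2_{c,j}\right) \right],
\qquad
\log \mathcal{N}(x \mid \mu, \sigma^2) = -\frac{1}{2}\log\left(2\pi\sigma^2\right) - \frac{(x - \mu)^2}{2\sigma^2}
$$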


## Linear SVM

In [25]:
from sklearn.svm import SVC

svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)

predictions = svm_linear.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.7222    0.2281    0.3467        57
           1     0.5556    0.9167    0.6918        60

    accuracy                         0.5812       117
   macro avg     0.6389    0.5724    0.5192       117
weighted avg     0.6368    0.5812    0.5237       117

[[13 44]
 [ 5 55]]
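In scikit-learn's `confusion_matrix`, rows are the true classes and columns the predicted classes, in label order (0, then 1). Every figure in the report above can be recomputed from that matrix; a pure-Python sketch using the linear SVM's matrix, where `report_from_confusion` is a hypothetical helper. It makes visible what accuracy alone hides: the model predicts class 1 for 99 of the 117 test samples, so class 1 recall is high while class 0 recall is only 0.2281.

```python
from typing import Dict, List


def report_from_confusion(cm: List[List[int]]) -> Dict[str, float]:
    """Binary metrics from a 2x2 confusion matrix (rows = true, cols = predicted)."""
    total = sum(sum(row) for row in cm)
    out = {"accuracy": (cm[0][0] + cm[1][1]) / total}
    for c in (0, 1):
        tp = cm[c][c]
        predicted_c = cm[0][c] + cm[1][c]   # column sum: samples predicted as c
        actual_c = sum(cm[c])               # row sum: true samples of c ("support")
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        out[f"precision_{c}"] = precision
        out[f"recall_{c}"] = recall
        denom = precision + recall
        out[f"f1_{c}"] = 2 * precision * recall / denom if denom else 0.0
    out["macro_f1"] = (out["f1_0"] + out["f1_1"]) / 2
    return out


svm_linear_cm = [[13, 44], [5, 55]]  # matrix printed by the linear SVM cell
for name, value in report_from_confusion(svm_linear_cm).items():
    print(f"{name:12s} {value:.4f}")
```

The printed values match the `classification_report` above (for example precision 0.7222 and recall 0.2281 for class 0, macro F1 0.5192).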


## SVM with polynomial kernel

In [26]:
from sklearn.svm import SVC

svm_poly = SVC(kernel='poly')
svm_poly.fit(X_train, y_train)

predictions = svm_poly.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.8333    0.0877    0.1587        57
           1     0.5315    0.9833    0.6901        60

    accuracy                         0.5470       117
   macro avg     0.6824    0.5355    0.4244       117
weighted avg     0.6786    0.5470    0.4312       117

[[ 5 52]
 [ 1 59]]


## SVM with Gaussian (RBF) kernel

In [27]:
from sklearn.svm import SVC

svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)

predictions = svm_rbf.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.7917    0.3333    0.4691        57
           1     0.5914    0.9167    0.7190        60

    accuracy                         0.6325       117
   macro avg     0.6915    0.6250    0.5940       117
weighted avg     0.6890    0.6325    0.5972       117

[[19 38]
 [ 5 55]]
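The three SVM cells differ only in the `kernel` argument of `SVC`. The kernel $K(x, x')$ is the similarity used by the decision function $f(x) = \sum_{i \in \text{SV}} \alpha_i y_i K(x_i, x) + b$, with $y_i \in \{-1, +1\}$ and the sum running over the support vectors; class 1 is predicted when $f(x) > 0$. With the scikit-learn defaults used here (`C=1.0`, `degree=3`, `coef0=0.0`, `gamma='scale'`):

$$
K_{\text{linear}}(x, x') = \langle x, x' \rangle, \qquad
K_{\text{poly}}(x, x') = \left(\gamma \langle x, x' \rangle + r\right)^{d}, \qquad
K_{\text{rbf}}(x, x') = \exp\!\left(-\gamma \lVert x - x' \rVert^{2}\right)
$$

where $d$ is `degree`, $r$ is `coef0`, and `gamma='scale'` sets $\gamma = 1 / \left(n_{\text{features}} \cdot \operatorname{Var}(X_{\text{train}})\right)$. None of these hyperparameters were tuned, so the comparison between kernels reflects default settings only.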


## Decision Tree

In [28]:
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

predictions = dt.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.5357    0.5263    0.5310        57
           1     0.5574    0.5667    0.5620        60

    accuracy                         0.5470       117
   macro avg     0.5465    0.5465    0.5465       117
weighted avg     0.5468    0.5470    0.5469       117

[[30 27]
 [26 34]]
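`DecisionTreeClassifier()` with default arguments chooses splits by Gini impurity (`criterion='gini'`) and grows until its leaves are pure (`max_depth=None`), which makes a fully grown tree prone to overfitting. Since no `random_state` is passed, the chosen splits, and therefore the scores above, can differ between runs when several candidate splits tie. A pure-Python sketch of the quantity each split tries to reduce; `gini` and `split_impurity` are hypothetical helpers and the labels are toy values.

```python
from collections import Counter
from typing import List


def gini(labels: List[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left: List[int], right: List[int]) -> float:
    """Size-weighted Gini impurity of the two children produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = [0, 0, 0, 0, 1, 1, 1, 1]
print(gini(parent))                                # 0.5: maximally mixed for 2 classes
print(split_impurity([0, 0, 0, 1], [0, 1, 1, 1]))  # 0.375: a weak split
print(split_impurity([0, 0, 0, 0], [1, 1, 1, 1]))  # 0.0: a perfect split
```

At each node the tree picks the feature and threshold whose split gives the lowest weighted impurity.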


## Random Forest

In [29]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(max_depth=2, random_state=0)
rf.fit(X_train, y_train)

predictions = rf.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.7027    0.4561    0.5532        57
           1     0.6125    0.8167    0.7000        60

    accuracy                         0.6410       117
   macro avg     0.6576    0.6364    0.6266       117
weighted avg     0.6564    0.6410    0.6285       117

[[26 31]
 [11 49]]
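`RandomForestClassifier` trains many trees (`n_estimators=100` by default), each on a bootstrap sample of the training rows and considering a random subset of features at each split; `max_depth=2` keeps every tree very shallow. scikit-learn combines the trees by averaging their predicted class probabilities and taking the class with the highest mean, not by a hard majority vote. A pure-Python sketch of both steps; `bootstrap_indices` and `forest_predict` are hypothetical helpers and the per-tree probabilities are made up.

```python
import random
from typing import List


def bootstrap_indices(n_rows: int, rng: random.Random) -> List[int]:
    """Draw n_rows row indices with replacement, as bagging does for each tree."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def forest_predict(tree_probas: List[List[float]]) -> int:
    """Average per-tree class probabilities and return the class with the highest mean."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    mean = [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


rng = random.Random(0)
sample = bootstrap_indices(10, rng)
print(sorted(sample))  # typically some rows repeat and others are left out ("out-of-bag")

# Made-up [P(class 0), P(class 1)] from three trees for one test sample.
probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
print(forest_predict(probas))  # 0: mean P(class 0) = 0.5833 beats 0.4167
```

In this toy case a hard majority vote would have picked class 1 (two of three trees), while probability averaging picks class 0.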


## Gradient Boosting

In [30]:
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
gb.fit(X_train, y_train)

predictions = gb.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.6383    0.5263    0.5769        57
           1     0.6143    0.7167    0.6615        60

    accuracy                         0.6239       117
   macro avg     0.6263    0.6215    0.6192       117
weighted avg     0.6260    0.6239    0.6203       117

[[30 27]
 [17 43]]
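`GradientBoostingClassifier` builds an additive model stage by stage. With the default log-loss for a binary target, it starts from the log-odds of the training class frequency $\bar{p}$ (fraction of class-1 samples) and, at each of the `n_estimators=100` stages, fits a regression tree $h_m$ (here a stump, since `max_depth=1`) to the pseudo-residuals, i.e. the negative gradient of the loss, before adding it scaled by the learning rate $\nu$ (scikit-learn then sets each leaf's value with a Newton step):

$$
F_0(x) = \log\frac{\bar{p}}{1 - \bar{p}}, \qquad
r_{i,m} = y_i - \sigma\!\left(F_{m-1}(x_i)\right), \qquad
F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

Class 1 is predicted when $\sigma(F_{100}(x)) > 0.5$. With `learning_rate=1.0` each stump is added at full strength; smaller values such as the library default of 0.1 shrink each step and usually need more stages.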


## Logistic Regression

In [31]:
from sklearn.linear_model import LogisticRegression

logr = LogisticRegression()
logr.fit(X_train, y_train)

predictions = logr.predict(X_test)

print(classification_report(y_test, predictions, digits=4))
print(confusion_matrix(y_test, predictions))

              precision    recall  f1-score   support

           0     0.7407    0.3509    0.4762        57
           1     0.5889    0.8833    0.7067        60

    accuracy                         0.6239       117
   macro avg     0.6648    0.6171    0.5914       117
weighted avg     0.6629    0.6239    0.5944       117

[[20 37]
 [ 7 53]]
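With only 117 test samples, each accuracy above carries considerable sampling noise. A rough way to gauge it, assuming independent test samples, is the normal-approximation (Wald) interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ for a binomial proportion. A pure-Python sketch applied to the nine models; `wald_interval` is a hypothetical helper, and the correct counts are the diagonals of the confusion matrices printed above.

```python
import math
from typing import Tuple


def wald_interval(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% confidence interval for an accuracy of correct / n."""
    p = correct / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Correct predictions = diagonal of each confusion matrix above (n = 117).
correct_counts = {
    "KNN": 26 + 39,
    "Naive Bayes": 14 + 53,
    "Linear SVM": 13 + 55,
    "SVM (poly)": 5 + 59,
    "SVM (RBF)": 19 + 55,
    "Decision Tree": 30 + 34,
    "Random Forest": 26 + 49,
    "Gradient Boosting": 30 + 43,
    "Logistic Regression": 20 + 53,
}
for name, correct in correct_counts.items():
    low, high = wald_interval(correct, 117)
    print(f"{name:20s} acc={correct / 117:.4f}  95% CI=[{low:.3f}, {high:.3f}]")
```

Every interval is roughly ±0.09 wide, and the intervals of the best model (Random Forest, 0.6410) and the weakest ones (0.5470) overlap, so the ranking should be read with caution; cross-validation would give a firmer comparison.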
