<a href="https://colab.research.google.com/github/eldercamposds/Fraude_Cartao/blob/main/Fraude_Cart%C3%A3oipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Credit Card Fraud Detection

## About the Dataset

Dataset obtained from [www.kaggle.com](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud?select=creditcard.csv)

### Objective
> Develop a Machine Learning model capable of recognizing fraudulent credit card transactions, so that customers are not charged for items they did not purchase.

### Contents
>The dataset contains transactions made with credit cards in September 2013 by European cardholders.

>The dataset covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. It is highly imbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features and further background information about the data were not provided. Features V1, V2, … V28 are the principal components obtained with PCA.

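As a quick check of the figures quoted above, the sketch below recomputes the fraud share from the two counts given in the dataset description. It also shows the accuracy a trivial classifier would reach by labelling every transaction as non-fraud, which is why accuracy alone can mislead on data this imbalanced. The variable names are illustrative.

```python
# Counts quoted in the dataset description
n_total = 284_807
n_fraud = 492
n_non_fraud = n_total - n_fraud

# Share of the positive (fraud) class
fraud_ratio = n_fraud / n_total

# Accuracy of a baseline that always predicts "non-fraud"
baseline_accuracy = n_non_fraud / n_total

print(f"fraud share: {fraud_ratio:.4%}")
print(f"always-non-fraud accuracy: {baseline_accuracy:.4%}")
```
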
### Variables


*   V1, V2, ... V28 = principal components obtained with PCA (anonymized values)
*   Time = seconds elapsed between each transaction and the first transaction in the dataset
*   Amount = transaction amount
*   Class = 1 for fraud and 0 otherwise





In [42]:
# imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

In [43]:
# load the data file
df_credit = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Dados/creditcard.csv")
df_credit.head(3)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0


In [44]:
# check the data types
df_credit.dtypes

Time      float64
V1        float64
V2        float64
V3        float64
V4        float64
V5        float64
V6        float64
V7        float64
V8        float64
V9        float64
V10       float64
V11       float64
V12       float64
V13       float64
V14       float64
V15       float64
V16       float64
V17       float64
V18       float64
V19       float64
V20       float64
V21       float64
V22       float64
V23       float64
V24       float64
V25       float64
V26       float64
V27       float64
V28       float64
Amount    float64
Class       int64
dtype: object

In [45]:
# check for null values
df_credit.isnull().sum()

Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

In [46]:
# summary statistics of Amount for non-fraud transactions

df_nao_fraude = df_credit.Amount[df_credit.Class == 0]
df_nao_fraude.describe()

count    284315.000000
mean         88.291022
std         250.105092
min           0.000000
25%           5.650000
50%          22.000000
75%          77.050000
max       25691.160000
Name: Amount, dtype: float64

In [47]:
# summary statistics of Amount for fraud transactions

df_fraude = df_credit.Amount[df_credit.Class == 1]
df_fraude.describe()

count     492.000000
mean      122.211321
std       256.683288
min         0.000000
25%         1.000000
50%         9.250000
75%       105.890000
max      2125.870000
Name: Amount, dtype: float64

In [48]:
df_credit.Class.value_counts()

0    284315
1       492
Name: Class, dtype: int64

>The dataframe is imbalanced, with 284,315 non-fraud records and only 492 fraud records. Before developing the model, the dataframe needs to be balanced, which is done in the cells below.

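One common way to balance such a dataframe is random undersampling: keep every fraud row and draw an equally sized random sample of non-fraud rows. The sketch below illustrates the idea on a small synthetic dataframe (the data and the `random_state` value are assumptions for illustration, not taken from the notebook). Fixing `random_state` makes the sample reproducible between runs.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 1000 rows, 20 of them fraud
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Amount": rng.uniform(0, 500, size=1000).round(2),
    "Class": [1] * 20 + [0] * 980,
})

fraud = toy[toy.Class == 1]
non_fraud = toy[toy.Class == 0]

# Draw as many non-fraud rows as there are fraud rows (seeded for reproducibility)
non_fraud_sample = non_fraud.sample(n=len(fraud), random_state=42)

# Stack both classes and shuffle so the classes are not in contiguous blocks
balanced = (
    pd.concat([non_fraud_sample, fraud], axis=0)
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(balanced.Class.value_counts())
```

Undersampling discards most of the majority class, so it keeps the example simple at the cost of throwing away information.
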
# Balancing the data

In [49]:
# select the fraud rows
df_fraude = df_credit[df_credit.Class == 1]
df_fraude

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
541,406.0,-2.312227,1.951992,-1.609851,3.997906,-0.522188,-1.426545,-2.537387,1.391657,-2.770089,...,0.517232,-0.035049,-0.465211,0.320198,0.044519,0.177840,0.261145,-0.143276,0.00,1
623,472.0,-3.043541,-3.157307,1.088463,2.288644,1.359805,-1.064823,0.325574,-0.067794,-0.270953,...,0.661696,0.435477,1.375966,-0.293803,0.279798,-0.145362,-0.252773,0.035764,529.00,1
4920,4462.0,-2.303350,1.759247,-0.359745,2.330243,-0.821628,-0.075788,0.562320,-0.399147,-0.238253,...,-0.294166,-0.932391,0.172726,-0.087330,-0.156114,-0.542628,0.039566,-0.153029,239.93,1
6108,6986.0,-4.397974,1.358367,-2.592844,2.679787,-1.128131,-1.706536,-3.496197,-0.248778,-0.247768,...,0.573574,0.176968,-0.436207,-0.053502,0.252405,-0.657488,-0.827136,0.849573,59.00,1
6329,7519.0,1.234235,3.019740,-4.304597,4.732795,3.624201,-1.357746,1.713445,-0.496358,-1.282858,...,-0.379068,-0.704181,-0.656805,-1.632653,1.488901,0.566797,-0.010016,0.146793,1.00,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.882850,0.697211,-2.064945,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.292680,0.147968,390.00,1
280143,169347.0,1.378559,1.289381,-5.004247,1.411850,0.442581,-1.326536,-1.413170,0.248525,-1.127396,...,0.370612,0.028234,-0.145640,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
280149,169351.0,-0.676143,1.126366,-2.213700,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.652250,...,0.751826,0.834108,0.190944,0.032070,-0.739695,0.471111,0.385107,0.194361,77.89,1
281144,169966.0,-3.113832,0.585864,-5.399730,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.253700,245.00,1


In [50]:
# select the non-fraud rows
df_nao_fraude = df_credit[df_credit.Class == 0]
df_nao_fraude

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.166480,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.167170,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.379780,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.108300,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.50,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.206010,0.502292,0.219422,0.215153,69.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,...,0.213454,0.111864,1.014480,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.055080,2.035030,-0.738589,0.868229,1.058415,0.024330,0.294869,0.584800,...,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.249640,-0.557828,2.630515,3.031260,-0.296827,0.708417,0.432454,...,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.240440,0.530483,0.702510,0.689799,-0.377961,0.623708,-0.686180,0.679145,0.392087,...,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.00,0


In [51]:
# undersample non-fraud to the same number of rows as fraud
df_nao_fraude =  df_nao_fraude.sample(n=492)
df_nao_fraude

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
150428,93328.0,-6.282187,3.584511,-3.346462,-0.668370,-1.096950,-1.513805,0.130837,0.646317,4.333338,...,-0.445666,0.932338,0.209392,-0.135873,-0.089574,-0.388241,-0.144466,-1.017340,3.79,0
129830,79222.0,-1.186894,0.422237,0.365205,-1.220625,2.862200,3.312624,0.298612,0.647693,-0.363863,...,-0.381357,-0.942530,0.076356,0.985938,0.376183,0.123704,-0.112295,-0.115520,9.72,0
7267,9663.0,-1.178157,0.715470,1.423573,1.467534,0.377157,0.222116,0.149876,0.142945,1.261478,...,-0.184988,0.075538,0.054361,-0.073493,-0.181383,-0.294114,0.074951,0.169789,9.99,0
81218,58835.0,1.397116,-0.928326,0.826125,-0.610322,-1.370642,0.035343,-1.272609,0.180214,0.309696,...,0.047470,0.166911,-0.101580,-0.453253,0.418099,-0.140156,0.041727,0.016281,13.98,0
136781,81865.0,-2.624081,2.153765,0.752903,0.234243,-1.195105,1.798252,-3.797852,-6.979159,-0.908887,...,-3.528764,2.117063,0.327859,0.248706,-0.463584,0.404743,0.207824,0.184777,29.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143423,85344.0,-1.361361,-0.955378,1.997082,-0.670347,-0.035455,-0.604432,-0.025980,0.179256,0.615099,...,0.115954,-0.066193,0.376562,0.076157,-0.311600,0.825886,-0.024937,0.147639,150.00,0
27374,34526.0,0.657965,-1.338376,-0.015055,0.649531,-1.077882,-0.493668,0.154319,-0.269853,-1.009965,...,-0.235567,-1.013211,-0.287906,-0.020170,0.313796,-0.479453,-0.018660,0.091958,371.20,0
226982,144885.0,-22.177139,-17.003882,-7.484396,4.232360,-0.835724,1.322928,3.308495,-0.671719,3.090143,...,-4.952402,-0.335761,1.588404,-0.330427,1.812323,0.323081,-5.228273,9.076549,192.72,0
207501,136704.0,-1.727487,1.667157,-0.267123,-0.461004,-0.446569,0.598532,-1.173080,-1.600658,0.528556,...,2.692764,0.494573,0.264058,-0.967258,-0.484305,-0.147637,-0.061596,0.082302,22.50,0


In [52]:
# concatenate the dataframes
df = pd.concat([df_nao_fraude, df_fraude], axis=0)
'''
After this step we are back to a single dataframe with
984 rows in total, now with balanced numbers of
fraud and non-fraud records
'''
df

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
150428,93328.0,-6.282187,3.584511,-3.346462,-0.668370,-1.096950,-1.513805,0.130837,0.646317,4.333338,...,-0.445666,0.932338,0.209392,-0.135873,-0.089574,-0.388241,-0.144466,-1.017340,3.79,0
129830,79222.0,-1.186894,0.422237,0.365205,-1.220625,2.862200,3.312624,0.298612,0.647693,-0.363863,...,-0.381357,-0.942530,0.076356,0.985938,0.376183,0.123704,-0.112295,-0.115520,9.72,0
7267,9663.0,-1.178157,0.715470,1.423573,1.467534,0.377157,0.222116,0.149876,0.142945,1.261478,...,-0.184988,0.075538,0.054361,-0.073493,-0.181383,-0.294114,0.074951,0.169789,9.99,0
81218,58835.0,1.397116,-0.928326,0.826125,-0.610322,-1.370642,0.035343,-1.272609,0.180214,0.309696,...,0.047470,0.166911,-0.101580,-0.453253,0.418099,-0.140156,0.041727,0.016281,13.98,0
136781,81865.0,-2.624081,2.153765,0.752903,0.234243,-1.195105,1.798252,-3.797852,-6.979159,-0.908887,...,-3.528764,2.117063,0.327859,0.248706,-0.463584,0.404743,0.207824,0.184777,29.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.882850,0.697211,-2.064945,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.292680,0.147968,390.00,1
280143,169347.0,1.378559,1.289381,-5.004247,1.411850,0.442581,-1.326536,-1.413170,0.248525,-1.127396,...,0.370612,0.028234,-0.145640,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
280149,169351.0,-0.676143,1.126366,-2.213700,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.652250,...,0.751826,0.834108,0.190944,0.032070,-0.739695,0.471111,0.385107,0.194361,77.89,1
281144,169966.0,-3.113832,0.585864,-5.399730,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.253700,245.00,1


In [53]:
# reset the index (the old index is kept as an 'index' column)
df.reset_index(inplace=True)
df

Unnamed: 0,index,Time,V1,V2,V3,V4,V5,V6,V7,V8,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,150428,93328.0,-6.282187,3.584511,-3.346462,-0.668370,-1.096950,-1.513805,0.130837,0.646317,...,-0.445666,0.932338,0.209392,-0.135873,-0.089574,-0.388241,-0.144466,-1.017340,3.79,0
1,129830,79222.0,-1.186894,0.422237,0.365205,-1.220625,2.862200,3.312624,0.298612,0.647693,...,-0.381357,-0.942530,0.076356,0.985938,0.376183,0.123704,-0.112295,-0.115520,9.72,0
2,7267,9663.0,-1.178157,0.715470,1.423573,1.467534,0.377157,0.222116,0.149876,0.142945,...,-0.184988,0.075538,0.054361,-0.073493,-0.181383,-0.294114,0.074951,0.169789,9.99,0
3,81218,58835.0,1.397116,-0.928326,0.826125,-0.610322,-1.370642,0.035343,-1.272609,0.180214,...,0.047470,0.166911,-0.101580,-0.453253,0.418099,-0.140156,0.041727,0.016281,13.98,0
4,136781,81865.0,-2.624081,2.153765,0.752903,0.234243,-1.195105,1.798252,-3.797852,-6.979159,...,-3.528764,2.117063,0.327859,0.248706,-0.463584,0.404743,0.207824,0.184777,29.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
979,279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.882850,0.697211,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.292680,0.147968,390.00,1
980,280143,169347.0,1.378559,1.289381,-5.004247,1.411850,0.442581,-1.326536,-1.413170,0.248525,...,0.370612,0.028234,-0.145640,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
981,280149,169351.0,-0.676143,1.126366,-2.213700,0.468308,-1.120541,-0.003346,-2.234739,1.210158,...,0.751826,0.834108,0.190944,0.032070,-0.739695,0.471111,0.385107,0.194361,77.89,1
982,281144,169966.0,-3.113832,0.585864,-5.399730,1.817092,-0.840618,-2.943548,-2.208002,1.058733,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.253700,245.00,1


# Holding out data for validation
> The steps below separate the held-out validation rows from the rest of the data

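A hold-out set like this can also be drawn at random rather than from the first and last rows. The sketch below is one possible alternative, not the notebook's method: it uses `DataFrame.groupby(...).sample(...)` (available in pandas 1.1 and later) to pick five rows per class from a synthetic balanced dataframe, then removes them from the training pool with `drop`. The toy data and the seed are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic balanced dataframe: 50 non-fraud rows followed by 50 fraud rows
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "V1": rng.normal(size=100),
    "Amount": rng.uniform(0, 300, size=100).round(2),
    "Class": [0] * 50 + [1] * 50,
})

# Pick 5 random rows from each class for validation
val = toy.groupby("Class").sample(n=5, random_state=0)

# Everything not selected for validation remains available for training
train_pool = toy.drop(val.index)

# Separate the validation labels from the validation features
val_y = val["Class"]
val_X = val.drop(columns=["Class"])

print(val_y.value_counts())
print(len(train_pool))
```
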


In [54]:
df_val_nao_fraude = df.head(5)
df_val_nao_fraude

Unnamed: 0,index,Time,V1,V2,V3,V4,V5,V6,V7,V8,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,150428,93328.0,-6.282187,3.584511,-3.346462,-0.66837,-1.09695,-1.513805,0.130837,0.646317,...,-0.445666,0.932338,0.209392,-0.135873,-0.089574,-0.388241,-0.144466,-1.01734,3.79,0
1,129830,79222.0,-1.186894,0.422237,0.365205,-1.220625,2.8622,3.312624,0.298612,0.647693,...,-0.381357,-0.94253,0.076356,0.985938,0.376183,0.123704,-0.112295,-0.11552,9.72,0
2,7267,9663.0,-1.178157,0.71547,1.423573,1.467534,0.377157,0.222116,0.149876,0.142945,...,-0.184988,0.075538,0.054361,-0.073493,-0.181383,-0.294114,0.074951,0.169789,9.99,0
3,81218,58835.0,1.397116,-0.928326,0.826125,-0.610322,-1.370642,0.035343,-1.272609,0.180214,...,0.04747,0.166911,-0.10158,-0.453253,0.418099,-0.140156,0.041727,0.016281,13.98,0
4,136781,81865.0,-2.624081,2.153765,0.752903,0.234243,-1.195105,1.798252,-3.797852,-6.979159,...,-3.528764,2.117063,0.327859,0.248706,-0.463584,0.404743,0.207824,0.184777,29.99,0


In [55]:
df_val_fraude = df.tail(5)
df_val_fraude

Unnamed: 0,index,Time,V1,V2,V3,V4,V5,V6,V7,V8,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
979,279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.88285,0.697211,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.29268,0.147968,390.0,1
980,280143,169347.0,1.378559,1.289381,-5.004247,1.41185,0.442581,-1.326536,-1.41317,0.248525,...,0.370612,0.028234,-0.14564,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
981,280149,169351.0,-0.676143,1.126366,-2.2137,0.468308,-1.120541,-0.003346,-2.234739,1.210158,...,0.751826,0.834108,0.190944,0.03207,-0.739695,0.471111,0.385107,0.194361,77.89,1
982,281144,169966.0,-3.113832,0.585864,-5.39973,1.817092,-0.840618,-2.943548,-2.208002,1.058733,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.2537,245.0,1
983,281674,170348.0,1.991976,0.158476,-2.583441,0.40867,1.151147,-0.096695,0.22305,-0.068384,...,-0.16435,-0.295135,-0.072173,-0.450261,0.313267,-0.289617,0.002988,-0.015309,42.53,1


In [56]:
# remove the first 5 rows
df = df.iloc[5:]

# remove the last 5 rows
df = df[:-5]

df


Unnamed: 0,index,Time,V1,V2,V3,V4,V5,V6,V7,V8,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
5,186694,127216.0,1.812308,-1.004487,-0.605354,0.191686,-0.020857,1.987091,-1.194588,0.655689,...,-0.343950,-0.543772,0.463556,-0.857703,-0.725692,-0.857053,0.112907,-0.028151,64.00,0
6,175641,122426.0,-0.699668,0.975529,0.089807,-0.768471,0.305718,-0.957884,0.856180,0.049865,...,-0.199548,-0.305629,0.141436,0.024112,-0.827579,0.148101,0.602717,0.378580,4.97,0
7,84815,60474.0,-0.762709,1.120886,2.256936,1.129910,0.335899,0.238069,0.584542,0.222209,...,0.231625,0.511111,-0.302621,-0.014859,0.166841,-0.088037,0.057714,0.092869,18.20,0
8,128466,78721.0,-0.973362,1.204274,0.890669,-0.115799,0.155060,-0.545068,0.401889,0.224244,...,-0.219969,-0.600577,0.091385,-0.049726,-0.133399,0.027878,-0.314829,-0.186520,1.98,0
9,50217,44394.0,0.386301,-1.555409,0.207626,0.585236,-1.313316,-0.461351,0.215276,-0.028543,...,-0.006121,-0.905674,-0.247855,0.576907,-0.032711,0.770866,-0.156880,0.068161,425.33,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
974,274382,165981.0,-5.766879,-8.402154,0.056543,6.950983,9.880564,-5.773192,-5.748879,0.721743,...,0.880395,-0.130436,2.241471,0.665346,-1.890041,-0.120803,0.073269,0.583799,0.00,1
975,274475,166028.0,-0.956390,2.361594,-3.171195,1.970759,0.474761,-1.902598,-0.055178,0.277831,...,0.473211,0.719400,0.122458,-0.255650,-0.619259,-0.484280,0.683535,0.443299,39.90,1
976,275992,166831.0,-2.027135,-1.131890,-1.135194,1.086963,-0.010547,0.423797,3.790880,-1.155595,...,-0.315105,0.575520,0.490842,0.756502,-0.142685,-0.602777,0.508712,-0.091646,634.30,1
977,276071,166883.0,2.091900,-0.757459,-1.192258,-0.755458,-0.620324,-0.322077,-1.082511,0.117200,...,0.288253,0.831939,0.142007,0.592615,-0.196143,-0.136676,0.020182,-0.015470,19.95,1


In [58]:
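# concatenate the validation dataframes

df_val_total = pd.concat([df_val_nao_fraude, df_val_fraude])
# 'index' already exists as a column, so this reset_index adds a 'level_0' column
df_val_total.reset_index(inplace=True)
# keep the true labels aside before dropping the non-feature columns
df_val_total_real = df_val_total.Class
df_val_total = df_val_total.drop(['level_0', 'index', 'Time', 'Class'], axis=1)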
df_val_total

Unnamed: 0,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,-6.282187,3.584511,-3.346462,-0.66837,-1.09695,-1.513805,0.130837,0.646317,4.333338,4.25905,...,0.666125,-0.445666,0.932338,0.209392,-0.135873,-0.089574,-0.388241,-0.144466,-1.01734,3.79
1,-1.186894,0.422237,0.365205,-1.220625,2.8622,3.312624,0.298612,0.647693,-0.363863,0.092985,...,0.164126,-0.381357,-0.94253,0.076356,0.985938,0.376183,0.123704,-0.112295,-0.11552,9.72
2,-1.178157,0.71547,1.423573,1.467534,0.377157,0.222116,0.149876,0.142945,1.261478,0.127816,...,-0.284529,-0.184988,0.075538,0.054361,-0.073493,-0.181383,-0.294114,0.074951,0.169789,9.99
3,1.397116,-0.928326,0.826125,-0.610322,-1.370642,0.035343,-1.272609,0.180214,0.309696,0.489179,...,-0.000424,0.04747,0.166911,-0.10158,-0.453253,0.418099,-0.140156,0.041727,0.016281,13.98
4,-2.624081,2.153765,0.752903,0.234243,-1.195105,1.798252,-3.797852,-6.979159,-0.908887,-2.033295,...,1.802606,-3.528764,2.117063,0.327859,0.248706,-0.463584,0.404743,0.207824,0.184777,29.99
5,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.88285,0.697211,-2.064945,-5.587794,...,1.252967,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.29268,0.147968,390.0
6,1.378559,1.289381,-5.004247,1.41185,0.442581,-1.326536,-1.41317,0.248525,-1.127396,-3.232153,...,0.226138,0.370612,0.028234,-0.14564,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76
7,-0.676143,1.126366,-2.2137,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.65225,-3.463891,...,0.247968,0.751826,0.834108,0.190944,0.03207,-0.739695,0.471111,0.385107,0.194361,77.89
8,-3.113832,0.585864,-5.39973,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,-5.245984,...,0.306271,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.2537,245.0
9,1.991976,0.158476,-2.583441,0.40867,1.151147,-0.096695,0.22305,-0.068384,0.577829,-0.888722,...,-0.017652,-0.16435,-0.295135,-0.072173,-0.450261,0.313267,-0.289617,0.002988,-0.015309,42.53


In [59]:
# check the distribution of fraud and non-fraud rows
df.Class.value_counts()

0    487
1    487
Name: Class, dtype: int64

In [60]:
# separate labels and features

X = df.drop(['index', 'Time', 'Class'], axis=1)
Y = df['Class']


In [61]:
print(X,Y)

           V1        V2        V3        V4        V5        V6        V7  \
5    1.812308 -1.004487 -0.605354  0.191686 -0.020857  1.987091 -1.194588   
6   -0.699668  0.975529  0.089807 -0.768471  0.305718 -0.957884  0.856180   
7   -0.762709  1.120886  2.256936  1.129910  0.335899  0.238069  0.584542   
8   -0.973362  1.204274  0.890669 -0.115799  0.155060 -0.545068  0.401889   
9    0.386301 -1.555409  0.207626  0.585236 -1.313316 -0.461351  0.215276   
..        ...       ...       ...       ...       ...       ...       ...   
974 -5.766879 -8.402154  0.056543  6.950983  9.880564 -5.773192 -5.748879   
975 -0.956390  2.361594 -3.171195  1.970759  0.474761 -1.902598 -0.055178   
976 -2.027135 -1.131890 -1.135194  1.086963 -0.010547  0.423797  3.790880   
977  2.091900 -0.757459 -1.192258 -0.755458 -0.620324 -0.322077 -1.082511   
978 -1.374424  2.793185 -4.346572  2.400731 -1.688433  0.111136 -0.922038   

           V8        V9       V10  ...       V20       V21       V22  \
5  

In [62]:
# split into training and test sets

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 42, stratify=Y )

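The effect of `stratify=Y` can be checked on placeholder data with the same class counts as the balanced dataframe (487 rows per class). This is only a sketch: the feature matrix is random noise standing in for the real columns, and the `*_demo` names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels with 487 rows per class
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(974, 3))
y_demo = np.array([0] * 487 + [1] * 487)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification both splits keep roughly the original 50/50 class ratio
print(len(y_tr), y_tr.mean())
print(len(y_te), y_te.mean())
```
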
In [65]:
# training

lr = LogisticRegression(max_iter = 1000)
lr.fit(X_train, Y_train)
pred = lr.predict(X_test)
acc = accuracy_score(Y_test, pred)

f'Accuracy: {acc * 100:.2F}%'

'Accuracy: 94.36%'

In [68]:
# validation

pred = lr.predict(df_val_total)
df = pd.DataFrame({'real': df_val_total_real, 'previsao':pred})
df

Unnamed: 0,real,previsao
0,0,0
1,0,0
2,0,0
3,0,0
4,0,1
5,1,1
6,1,1
7,1,1
8,1,1
9,1,0

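The ten validation rows shown above can be summarized with a confusion matrix, precision, and recall. The sketch below hard-codes the `real` and `previsao` columns from that table and uses scikit-learn's metric functions. The variable names are illustrative.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Values copied from the 'real' and 'previsao' columns of the validation table
real = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
previsao = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(real, previsao)
print(cm)

print("accuracy:", accuracy_score(real, previsao))
print("precision:", precision_score(real, previsao))  # TP / (TP + FP)
print("recall:", recall_score(real, previsao))        # TP / (TP + FN)
```

On these ten rows the model makes one false positive (row 4) and one false negative (row 9), so accuracy, precision, and recall all come out to 0.8. With only ten rows this is a spot check rather than a reliable estimate.
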