# CCFDetector: Using ML against Credit Card Frauds

The goal of this project, developed for the Fondamenti di Intelligenza Artificiale (Fundamentals of Artificial Intelligence) exam at the Università degli Studi di Salerno, is to build a Machine Learning system that detects fraudulent electronic payment transactions, that is, transactions made with a credit card without the cardholder's authorization.
## Project Setup and Data Understanding
---
### Importing the required libraries

In [17]:
# import the necessary packages 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
from matplotlib import gridspec

### Loading the dataset
The dataset contains more than 20 million rows, so, because of hardware limitations, I randomly sample 300,000 of them, using a fixed seed to keep the selection reproducible.

In [18]:
df = pd.read_csv('credit_card_transactions-ibm_v2.csv').sample(n=300000, random_state=42)
df.head(10)

| | User | Card | Year | Month | Day | Time | Amount | Use Chip | Merchant Name | Merchant City | Merchant State | Zip | MCC | Errors? | Is Fraud? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18199893 | 1470 | 0 | 2019 | 7 | 10 | 00:11 | $59.18 | Chip Transaction | -6853385250336487907 | Harwood | MD | 20776.0 | 5813 | | No |
| 9731325 | 822 | 1 | 2019 | 1 | 14 | 22:12 | $280.91 | Online Transaction | 4241336128694185533 | ONLINE | | | 4814 | | No |
| 536687 | 41 | 3 | 2010 | 3 | 15 | 07:07 | $-144.00 | Swipe Transaction | 190253443608377572 | Hemet | CA | 92543.0 | 3359 | | No |
| 13223840 | 1084 | 0 | 2015 | 9 | 20 | 14:58 | $6.76 | Chip Transaction | -7837310524365334241 | Littleton | CO | 80122.0 | 5300 | | No |
| 17070521 | 1384 | 0 | 2014 | 10 | 12 | 11:44 | $9.17 | Swipe Transaction | -5023497618971072366 | Gardner | KS | 66030.0 | 5812 | | No |
| 792843 | 55 | 3 | 2006 | 6 | 10 | 09:15 | $1.36 | Swipe Transaction | -6571010470072147219 | Rego Park | NY | 11374.0 | 5499 | | No |
| 8966297 | 776 | 0 | 2007 | 2 | 25 | 22:31 | $97.81 | Swipe Transaction | -6974082828836151610 | Milwaukee | WI | 53224.0 | 4900 | | No |
| 895801 | 66 | 0 | 2007 | 8 | 21 | 13:02 | $23.25 | Swipe Transaction | 3675785629314646441 | Gonzales | TX | 78629.0 | 7349 | | No |
| 22902727 | 1880 | 2 | 2010 | 12 | 31 | 01:06 | $486.70 | Swipe Transaction | -3398248499422470718 | Atlantic City | NJ | 8401.0 | 7995 | | No |
| 5845532 | 490 | 5 | 2015 | 11 | 28 | 13:45 | $92.67 | Chip Transaction | 4722913068560264812 | Pompano Beach | FL | 33063.0 | 5411 | | No |
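
Calling `pd.read_csv(...).sample(...)` parses the whole file into memory before sampling. If the hardware cannot hold the full file, one possible alternative is to read the CSV in chunks and sample a fraction of each chunk. The sketch below is only an illustration of that idea, not the approach used in this notebook: `sample_csv_in_chunks` is a hypothetical helper, the toy CSV stands in for the real file, and the result contains roughly `frac` of the rows rather than an exact row count.

```python
import io

import numpy as np
import pandas as pd


def sample_csv_in_chunks(source, frac, chunksize=100_000, seed=42):
    """Return a random sample of roughly `frac` of the rows of a CSV.

    The file is read `chunksize` rows at a time, so only the current chunk
    and the rows sampled so far are held in memory.
    """
    # One shared random state, so every chunk draws different positions.
    rng = np.random.RandomState(seed)
    sampled = [
        chunk.sample(frac=frac, random_state=rng)
        for chunk in pd.read_csv(source, chunksize=chunksize)
    ]
    return pd.concat(sampled)


# Toy stand-in for the real CSV: 1,000 rows with a single made-up column.
toy_csv = io.StringIO("Amount\n" + "\n".join(str(i) for i in range(1000)))
sample = sample_csv_in_chunks(toy_csv, frac=0.1, chunksize=100)
print(len(sample))
```

For the real file, `frac` would have to be derived from the total number of rows, which would need to be counted first because this notebook only gives it as "more than 20 million".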


### Dataset description


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 300000 entries, 18199893 to 18106096
Data columns (total 15 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   User            300000 non-null  int64  
 1   Card            300000 non-null  int64  
 2   Year            300000 non-null  int64  
 3   Month           300000 non-null  int64  
 4   Day             300000 non-null  int64  
 5   Time            300000 non-null  object 
 6   Amount          300000 non-null  object 
 7   Use Chip        300000 non-null  object 
 8   Merchant Name   300000 non-null  int64  
 9   Merchant City   300000 non-null  object 
 10  Merchant State  266377 non-null  object 
 11  Zip             264443 non-null  float64
 12  MCC             300000 non-null  int64  
 13  Errors?         4778 non-null    object 
 14  Is Fraud?       300000 non-null  object 
dtypes: float64(1), int64(7), object(7)
memory usage: 36.6+ MB
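
The non-null counts above imply that, out of 300,000 rows, **Merchant State** is missing in 33,623, **Zip** in 35,557 and **Errors?** in 295,222. A compact way to get these figures directly, instead of subtracting by hand, could look like the sketch below. `missing_summary` is a hypothetical helper, and the toy frame only mimics three of the columns with made-up values.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return per-column missing counts and percentages, most incomplete first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Toy frame mimicking three of the columns listed above (values are made up).
toy = pd.DataFrame({
    "Merchant State": ["MD", None, "CA", "CO"],
    "Zip": [20776.0, np.nan, 92543.0, 80122.0],
    "MCC": [5813, 4814, 3359, 5300],
})
summary = missing_summary(toy)
print(summary)
```

On the real data the same call would be `missing_summary(df)`.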


The target variable, i.e. the one the model will have to predict, is the **Is Fraud?** column, which holds the strings 'Yes' and 'No' rather than integers. For convenience, I create a new column called **Fraud** that maps 'Yes' to 1 and 'No' to 0, and then drop the original column.

In [20]:
# Create a new 'Fraud' column holding the values of 'Is Fraud?' encoded as 1 ('Yes') and 0 ('No')
df['Fraud'] = df['Is Fraud?'].map({'Yes': 1, 'No': 0})

# Drop the original 'Is Fraud?' column
df = df.drop('Is Fraud?', axis=1)
df.head(10)

| | User | Card | Year | Month | Day | Time | Amount | Use Chip | Merchant Name | Merchant City | Merchant State | Zip | MCC | Errors? | Fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18199893 | 1470 | 0 | 2019 | 7 | 10 | 00:11 | $59.18 | Chip Transaction | -6853385250336487907 | Harwood | MD | 20776.0 | 5813 | | 0 |
| 9731325 | 822 | 1 | 2019 | 1 | 14 | 22:12 | $280.91 | Online Transaction | 4241336128694185533 | ONLINE | | | 4814 | | 0 |
| 536687 | 41 | 3 | 2010 | 3 | 15 | 07:07 | $-144.00 | Swipe Transaction | 190253443608377572 | Hemet | CA | 92543.0 | 3359 | | 0 |
| 13223840 | 1084 | 0 | 2015 | 9 | 20 | 14:58 | $6.76 | Chip Transaction | -7837310524365334241 | Littleton | CO | 80122.0 | 5300 | | 0 |
| 17070521 | 1384 | 0 | 2014 | 10 | 12 | 11:44 | $9.17 | Swipe Transaction | -5023497618971072366 | Gardner | KS | 66030.0 | 5812 | | 0 |
| 792843 | 55 | 3 | 2006 | 6 | 10 | 09:15 | $1.36 | Swipe Transaction | -6571010470072147219 | Rego Park | NY | 11374.0 | 5499 | | 0 |
| 8966297 | 776 | 0 | 2007 | 2 | 25 | 22:31 | $97.81 | Swipe Transaction | -6974082828836151610 | Milwaukee | WI | 53224.0 | 4900 | | 0 |
| 895801 | 66 | 0 | 2007 | 8 | 21 | 13:02 | $23.25 | Swipe Transaction | 3675785629314646441 | Gonzales | TX | 78629.0 | 7349 | | 0 |
| 22902727 | 1880 | 2 | 2010 | 12 | 31 | 01:06 | $486.70 | Swipe Transaction | -3398248499422470718 | Atlantic City | NJ | 8401.0 | 7995 | | 0 |
| 5845532 | 490 | 5 | 2015 | 11 | 28 | 13:45 | $92.67 | Chip Transaction | 4722913068560264812 | Pompano Beach | FL | 33063.0 | 5411 | | 0 |
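
`Series.map` with a dictionary returns `NaN` for any value that is not one of the dictionary keys, so an unexpected label in **Is Fraud?** would silently become a missing target. A defensive variant of the mapping, followed by a quick look at the class proportions, might look like the sketch below. `encode_target` is a hypothetical helper and the four toy labels are made up; this is not part of the original notebook.

```python
import pandas as pd


def encode_target(labels):
    """Map 'Yes'/'No' labels to 1/0 and fail loudly on anything else."""
    encoded = labels.map({"Yes": 1, "No": 0})
    # Labels outside the mapping (including missing ones) come back as NaN.
    unexpected = labels[encoded.isna()].unique()
    if len(unexpected) > 0:
        raise ValueError(f"Unexpected labels: {list(unexpected)}")
    return encoded.astype("int64")


# Toy labels standing in for the real 'Is Fraud?' column.
toy_labels = pd.Series(["No", "No", "Yes", "No"], name="Is Fraud?")
fraud = encode_target(toy_labels)

# Share of each class; useful to see how imbalanced the target is.
print(fraud.value_counts(normalize=True))
```

On the real data, the equivalent check would be `df['Fraud'].value_counts(normalize=True)` after the mapping above.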
