# Dispute - work group
### Automatic identification of "safe" transactions and transactions to review

#### Recommender system
A recommender system can identify cardholders' preferences and spending profiles with respect to a selected set of merchants.

#### Scoring
In the proposed system, each card is associated with a list of <i>scores</i>, one for each merchant in the scope of the analysis, representing how much the cardholder favors that merchant. The higher the score, the more likely the cardholder is to transact with that merchant. It is therefore possible to build, for each cardholder, a ranking of the merchants that least match their spending habits.
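
The ranking described above can be sketched with pandas on a toy score matrix. The card IDs and score values below are made up for illustration, and `rank_merchants` is a hypothetical helper that is not part of the notebook; only the sorting logic matters.

```python
import pandas as pd

# Toy score matrix: rows are cards, columns are merchants (illustrative values only).
scores = pd.DataFrame(
    {"ZARA": [0.73, 0.05], "AC MILAN": [0.08, 0.40], "XENIA": [0.01, 0.20]},
    index=["card_A", "card_B"],
)

def rank_merchants(scores: pd.DataFrame, card: str, ascending: bool = True) -> pd.Series:
    """Return one card's merchants sorted by score (lowest affinity first by default)."""
    return scores.loc[card].sort_values(ascending=ascending)

least_affine = rank_merchants(scores, "card_A")
print(least_affine.index.tolist())
```

Passing `ascending=False` returns the same ranking from the most to the least favored merchant.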

#### Proposal
When new transactions are made, at the next training run the system infers an affinity coefficient (i.e. the model's point prediction) for each card-merchant pair. This score is used to personalize the user experience in the proposed system.

## An example

As an example, we take data on cards in Milan that transact with e-commerce merchants, for a total of 10,000 interactions.

The model is trained to recognize cardholders' preferences based on their past spending and on their similarity to other cardholders.
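
To give an idea of how preferences can be learned from past spending and from similar cardholders, here is a minimal sketch of a Poisson matrix factorization fitted with multiplicative updates. It is a stand-in written for illustration, not the `hpf_vi` model used in this notebook; the function name `poisson_mf`, its parameters and the toy count matrix are all assumptions.

```python
import numpy as np

def poisson_mf(counts, n_factors=2, n_iter=200, seed=0, eps=1e-10):
    """Fit counts ~ Poisson(theta @ beta.T) with non-negative factors.

    Uses the multiplicative updates of KL-divergence NMF, which increase the
    Poisson likelihood. Illustrative stand-in, NOT the notebook's hpf_vi.
    """
    rng = np.random.default_rng(seed)
    n_cards, n_merchants = counts.shape
    theta = rng.random((n_cards, n_factors)) + 0.1     # card preferences
    beta = rng.random((n_merchants, n_factors)) + 0.1  # merchant attributes
    for _ in range(n_iter):
        ratio = counts / (theta @ beta.T + eps)
        theta *= (ratio @ beta) / (beta.sum(axis=0) + eps)
        ratio = counts / (theta @ beta.T + eps)
        beta *= (ratio.T @ theta) / (theta.sum(axis=0) + eps)
    return theta, beta

# Toy card x merchant counts: cards 0-1 and cards 2-3 have similar habits.
counts = np.array([
    [3, 2, 0, 0],
    [2, 0, 0, 0],  # card 1 never used merchant 1, but the similar card 0 did
    [0, 0, 4, 1],
    [0, 0, 2, 3],
], dtype=float)

theta, beta = poisson_mf(counts)
predicted = theta @ beta.T
scores = predicted / predicted.max()  # same max-normalization used later in the notebook
print(np.round(scores, 3))
```

Because card 1 shares a latent factor with card 0, it receives a non-trivial score for merchant 1 even though it never transacted there; this is the "similarity with other cardholders" effect.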

In [58]:
# !pip install pyathena
# !pip install kneed

In [3]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from esbmr2 import esbmr2
from hpf_vi import hpf_vi
from pyathena import connect
import boto3
import plotly.express as px
import plotly.offline as py
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import datetime
from IPython.display import display, HTML
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from importlib import reload

# %% Importing custom modules
import train_test_split
import data_processing
import scoring
from CF import CF # importing the class for the heuristics

In [4]:
import pandas as pd
import numpy as np
import time
from functools import partial
import multiprocessing
from importlib import reload
# Custom modules
import multiprocessing_functions
reload(multiprocessing_functions)
from CF import CF

In [5]:
s3 = boto3.resource('s3')
s3_client = boto3.client('s3')
STAGING_BUCKET = 'nexi-tmp-athena-staging'
STAGING_AREA = 'tmp-athena-queries'
REGION_NAME = 'eu-west-1'
cursor = connect(s3_staging_dir="s3://{}/{}".format(STAGING_BUCKET,STAGING_AREA),
                 region_name=REGION_NAME,
                 work_group='WKGP-SM-DATA'
                ).cursor()
def runQueryFromString(sqlQuery):
    """Run an Athena query and return its CSV result as a DataFrame (None if it cannot be read)."""
    cursor.execute(sqlQuery)
    # output_location has the form s3://<bucket>/<key of the result CSV>
    bucket, key = cursor.output_location.replace('s3://', '').split('/', 1)
    try:
        obj = s3_client.get_object(Bucket=bucket, Key=key)
    except Exception:
        # The result object is missing or not accessible
        return None
    return pd.read_csv(obj['Body'])

### Model training

In [11]:
df =  runQueryFromString("""
SELECT * FROM merch_recom.interazioni_milano_2020
""")

In [19]:
df = df[:10000]

In [20]:
expl, id_car, nm_nome_cleaned = data_processing.cross_tab(df)
train, test = train_test_split.train_val_split(expl)
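
The internals of `data_processing.cross_tab` and `train_test_split.train_val_split` are not shown in this notebook. The sketch below is one plausible way to build a card x merchant count matrix and hold out some observed interactions for validation. The column names, the `holdout_split` helper and the toy data are assumptions, not the actual implementation.

```python
import numpy as np
import pandas as pd

# Toy interaction log; the column names are assumed, not taken from the real table.
interactions = pd.DataFrame({
    "id_car": ["c1", "c1", "c2", "c2", "c2", "c3"],
    "nm_nome_cleaned": ["ZARA", "ZARA", "XENIA", "ZARA", "YOUFIT", "XENIA"],
})

# Card x merchant matrix of interaction counts.
matrix = pd.crosstab(interactions["id_car"], interactions["nm_nome_cleaned"])

def holdout_split(counts: np.ndarray, test_frac: float = 0.2, seed: int = 0):
    """Move a random share of the observed (non-zero) cells from train to test."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(counts)
    n_test = max(1, int(round(test_frac * len(rows))))
    picked = rng.choice(len(rows), size=n_test, replace=False)
    train, test = counts.copy(), np.zeros_like(counts)
    test[rows[picked], cols[picked]] = counts[rows[picked], cols[picked]]
    train[rows[picked], cols[picked]] = 0
    return train, test

train_counts, test_counts = holdout_split(matrix.to_numpy())
print(matrix)
```

Every held-out cell is zeroed in the training matrix, so the model can later be evaluated on interactions it has not seen.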

In [21]:
expl.shape

(6078, 719)

In [57]:
model = hpf_vi()
model.fit(train,100)

In [23]:
model.predicted = model.predicted / model.predicted.max()

In [24]:
predicted_interactions = pd.DataFrame(model.predicted)

In [25]:
predicted_interactions.columns = nm_nome_cleaned

In [26]:
predicted_interactions.index = id_car

### Output

The model's output is a matrix of scores.

In [27]:
predicted_interactions

|  | ABBIGLIAMEN | AC MILAN | ACQUISTO | ACQUISTO ANTEO | ACQUISTO ONLINE | ADIDASITALY | ADSUPERCARS | AGOSDU | ALDO COPPOLA | ALESSI | ... | WYCONCOSMETICSCOM | XENIA | YAMAMAY | YAP RICARICA | YAP RICARICA CLICK | YELLOWTAXMU | YOUFIT | YVES ROCHER | ZARA | ZERO ITALY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B6DD772CF0D9808FA6BBC30D63DD17A3 | 0.000008 | 0.000082 | 0.000016 | 0.000011 | 0.000007 | 0.000093 | 0.000007 | 0.000007 | 0.000008 | 0.000008 | ... | 0.000008 | 0.000040 | 0.000022 | 0.000026 | 0.000011 | 0.000014 | 0.000006 | 0.000008 | 0.000734 | 0.000007 |
| B6DD77E990B366D9E5B924BDFC1FEC32 | 0.000030 | 0.000188 | 0.000033 | 0.000039 | 0.000033 | 0.000211 | 0.000032 | 0.000031 | 0.000033 | 0.000034 | ... | 0.000033 | 0.000089 | 0.000059 | 0.000091 | 0.000041 | 0.000033 | 0.000030 | 0.000035 | 0.000239 | 0.000032 |
| B6DD87DB2A0A0AC093051BE0C0F76131 | 0.000005 | 0.000071 | 0.000007 | 0.000010 | 0.000006 | 0.000081 | 0.000006 | 0.000006 | 0.000041 | 0.000007 | ... | 0.000041 | 0.000035 | 0.000020 | 0.000010 | 0.000010 | 0.000007 | 0.000012 | 0.000007 | 0.000165 | 0.000006 |
| B6DD97CC62F05D4B340BBFEA2B19F1B8 | 0.000003 | 0.000056 | 0.000005 | 0.000007 | 0.000004 | 0.000064 | 0.000004 | 0.000004 | 0.000013 | 0.000004 | ... | 0.000013 | 0.000028 | 0.000015 | 0.000007 | 0.000007 | 0.000004 | 0.000005 | 0.000005 | 0.000148 | 0.000004 |
| B6DD9F7DCEF9748C813B166645EF6795 | 0.000010 | 0.000107 | 0.000012 | 0.000016 | 0.000012 | 0.000121 | 0.000011 | 0.000011 | 0.000031 | 0.000012 | ... | 0.000031 | 0.000051 | 0.000030 | 0.000017 | 0.000017 | 0.000012 | 0.000014 | 0.000013 | 0.000193 | 0.000012 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| F4D45ABF688893E3666B46C305C5B912 | 0.000004 | 0.000059 | 0.000005 | 0.000007 | 0.000004 | 0.000067 | 0.000004 | 0.000004 | 0.000005 | 0.000005 | ... | 0.000005 | 0.000029 | 0.000016 | 0.000008 | 0.000007 | 0.000005 | 0.000003 | 0.000005 | 0.000151 | 0.000004 |
| F4D462D0304A867F6906298C8EC77733 | 0.000005 | 0.000058 | 0.000012 | 0.000007 | 0.000004 | 0.000067 | 0.000004 | 0.000004 | 0.000005 | 0.000005 | ... | 0.000005 | 0.000029 | 0.000016 | 0.000008 | 0.000007 | 0.000010 | 0.000003 | 0.000005 | 0.000643 | 0.000004 |
| F4D463B0BD74735677368D0F26463861 | 0.000018 | 0.000089 | 0.000052 | 0.000013 | 0.000009 | 0.000101 | 0.000008 | 0.000008 | 0.000019 | 0.000009 | ... | 0.000019 | 0.000043 | 0.000025 | 0.000014 | 0.000013 | 0.000040 | 0.000009 | 0.000010 | 0.003065 | 0.000009 |
| F4D465E14137E8A491D4975A1BC4D1BB | 0.000003 | 0.000056 | 0.000005 | 0.000007 | 0.000004 | 0.000064 | 0.000004 | 0.000004 | 0.000013 | 0.000004 | ... | 0.000013 | 0.000028 | 0.000015 | 0.000007 | 0.000007 | 0.000004 | 0.000005 | 0.000005 | 0.000148 | 0.000004 |


### Examples

Once the model has been trained, we have a vector of scores for each card under analysis.

For example, the first card in the dataset has the following scores:

In [44]:
pr_int = predicted_interactions.iloc[0,:][predicted_interactions.iloc[0,:]>0.00005] * 1000

In [49]:
fig = go.Figure(data=[
    go.Bar(name='Merchant', x=list(pr_int.index), y=list(pr_int)),
])

fig.update_layout(barmode = 'stack',
                  title='Affinity coefficients (score) for e-commerce merchants',
                  yaxis_title='Score',
                  xaxis_title='Merchant')
#fig.show(renderer="notebook")
py.iplot(fig)


### Another example, on a different cardholder

This time we look at the scores of a card that is particularly active in pharmacy spending:

In [50]:
pr_int_2 = predicted_interactions.iloc[10,:][predicted_interactions.iloc[10,:]>0.00005] * 1000

In [52]:
fig = go.Figure(data=[
    go.Bar(name='Merchant', x=list(pr_int_2.index), y=list(pr_int_2)),
])

fig.update_layout(barmode = 'stack',
                  title='Affinity coefficients (score) for e-commerce merchants',
                  yaxis_title='Score',
                  xaxis_title='Merchant')
#fig.show(renderer="notebook")
py.iplot(fig)

### Which merchants is the card least likely to transact with?

In other words, which merchants have the lowest affinity with a given card?

In [53]:
def worst_five_merchant(card):
    """Return the five merchants with the lowest score for the given card."""
    return predicted_interactions.loc[card, :].sort_values().head(5)

In [55]:
worst_five_merchant('F4D45ABF688893E3666B46C305C5B912')

EDICOLALAFO EB                 0.000003
HOMOBILEIT RICARICA CMILANO    0.000003
ONLYBOX EB                     0.000003
VALETATO EBA                   0.000003
BODYMELODY                     0.000003
Name: F4D45ABF688893E3666B46C305C5B912, dtype: float64
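
One hypothetical way to turn this ranking into the "safe" versus "to review" split named in the title is to flag a transaction when its merchant falls in the bottom share of the card's scores. The rule, the `bottom_frac` threshold and the `flag_transaction` helper are assumptions made for illustration; the notebook does not define a decision rule. The scores are a few values from the first row of the output table above, placed under a made-up card ID.

```python
import pandas as pd

def flag_transaction(scores: pd.DataFrame, card: str, merchant: str,
                     bottom_frac: float = 0.1) -> str:
    """Return 'review' if the merchant is in the card's bottom share of scores, else 'safe'."""
    # Percentile rank of every merchant within this card's scores (0 < pct <= 1).
    pct = scores.loc[card].rank(pct=True)
    return "review" if pct.loc[merchant] <= bottom_frac else "safe"

# Hypothetical card ID; score values copied from the first row of the output table.
scores = pd.DataFrame(
    [[0.000734, 0.000082, 0.000008, 0.000040]],
    index=["card_A"],
    columns=["ZARA", "AC MILAN", "ABBIGLIAMEN", "XENIA"],
)

print(flag_transaction(scores, "card_A", "ZARA", bottom_frac=0.25))
print(flag_transaction(scores, "card_A", "ABBIGLIAMEN", bottom_frac=0.25))
```

A rank-based threshold is used instead of an absolute score cutoff because score magnitudes differ considerably from card to card, as the output table shows.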

