- The first step is a regression. We then need to consider whether the client will attend the event, and whether the increase in spending was due to the event or to other factors such as Christmas
- Look at the correlation between premium (VIP) status and money spent
- Look at the overall increase in spending after a social celebrity action
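
A minimal sketch of the premium-status check listed above, on invented toy data. It assumes `client_premium_status` can be coded as a 0/1 flag, which the notebook does not confirm; the frame names `clients` and `transactions` are stand-ins for `df_clients` and `df_transactions`.

```python
import pandas as pd

# Toy stand-ins for df_clients and df_transactions (all values are invented).
clients = pd.DataFrame({
    "client_id": ["c1", "c2", "c3", "c4"],
    "client_premium_status": [1, 0, 1, 0],  # assumption: binary flag
})
transactions = pd.DataFrame({
    "client_id": ["c1", "c1", "c2", "c3", "c4"],
    "gross_amount_euro": [3000, 1000, 500, 5000, 700],
})

# Total spend per client, then attach the premium flag.
spend = (
    transactions.groupby("client_id", as_index=False)["gross_amount_euro"]
    .sum()
    .rename(columns={"gross_amount_euro": "total_spent"})
)
per_client = clients.merge(spend, on="client_id", how="left").fillna({"total_spent": 0})

# Pearson correlation against a 0/1 flag is the point-biserial correlation.
premium_flag = per_client["client_premium_status"].astype(int)
corr = premium_flag.corr(per_client["total_spent"])

# Group means are often easier to read than the coefficient itself.
mean_by_status = per_client.groupby("client_premium_status")["total_spent"].mean()
```

A correlation here says nothing about causality: premium clients may simply be the ones who were already spending more.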

## Methodology
Causal ML:
- First step: a regression model to predict uplift (the difference between purchases before and after the event).
- Second step: estimate the causal effect of the feature "Event", a binary feature indicating whether the person attended the event. We compare the people who attended the event with those who did not. We can also try a dedicated causal model.
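
The two steps can be sketched on a toy per-client table as follows. This is only an illustration under stated assumptions: the column names `attended`, `spend_before` and `spend_after` are hypothetical, the numbers are invented, and the least-squares fit stands in for whatever regression or causal model is eventually chosen.

```python
import numpy as np
import pandas as pd

# Toy per-client table (invented values): spend in a window before and after an event.
panel = pd.DataFrame({
    "client_id": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "attended": [1, 1, 1, 0, 0, 0],
    "spend_before": [1000.0, 2000.0, 1500.0, 1200.0, 1800.0, 900.0],
    "spend_after": [1600.0, 2500.0, 2100.0, 1300.0, 2000.0, 1000.0],
})

# Uplift = purchases after minus purchases before.
panel["uplift"] = panel["spend_after"] - panel["spend_before"]

# Naive effect: mean uplift of attendees minus mean uplift of non-attendees
# (a simple difference-in-differences over the two windows).
mean_uplift = panel.groupby("attended")["uplift"].mean()
naive_effect = mean_uplift[1] - mean_uplift[0]

# The same comparison as a regression that also adjusts for prior spend:
# uplift ~ intercept + attended + spend_before
X = np.column_stack([np.ones(len(panel)), panel["attended"], panel["spend_before"]])
coef, *_ = np.linalg.lstsq(X, panel["uplift"].to_numpy(), rcond=None)
adjusted_effect = coef[1]  # coefficient on the attendance flag
```

Neither number is a causal estimate unless attendance is as good as random given the covariates. Seasonal effects such as Christmas would need to enter as further columns of `X` or be handled by comparing the same calendar window.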


In [34]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [26]:
df_actions = pd.read_csv("data/actions.csv")
df_clients = pd.read_csv("data/clients.csv")
df_transactions = pd.read_csv("data/transactions.csv")

In [3]:
df_actions["action_type_label"].value_counts()

action_type_label
Collection                 6730
Social Celebrity Action    3176
Business Treatment           93
Lauch                         1
Name: count, dtype: int64

In [4]:
df_actions["action_label"]

0        Exclusive Offer
1        Exclusive Offer
2       Social Gathering
3       Social Gathering
4        Exclusive Offer
              ...       
9995     Exclusive Offer
9996     Exclusive Offer
9997     Exclusive Offer
9998     Exclusive Offer
9999     Exclusive Offer
Name: action_label, Length: 10000, dtype: object

In [5]:
df_clients.isna().sum(), df_clients.shape

(client_id                                      0
 client_country                              9656
 client_gender                              16207
 client_nationality                         16150
 client_city                                18264
 client_segment                                 0
 client_premium_status                          0
 client_is_phone_contactable                    0
 client_is_email_contactable                    0
 client_is_instant_messaging_contactable        0
 client_is_contactable                          0
 dtype: int64,
 (28751, 11))

In [6]:
df_transactions.columns, df_clients.columns, df_actions.columns

(Index(['client_id', 'transaction_id', 'transaction_date', 'product_quantity',
        'gross_amount_euro', 'product_category', 'product_subcategory',
        'product_style'],
       dtype='object'),
 Index(['client_id', 'client_country', 'client_gender', 'client_nationality',
        'client_city', 'client_segment', 'client_premium_status',
        'client_is_phone_contactable', 'client_is_email_contactable',
        'client_is_instant_messaging_contactable', 'client_is_contactable'],
       dtype='object'),
 Index(['action_id', 'action_type_label', 'action_subcategory_label',
        'action_start_date', 'action_year', 'action_end_date',
        'action_collection_year', 'action_collection', 'action_universe',
        'action_category_label', 'action_channel', 'action_label', 'client_id',
        'client_is_present', 'client_is_invited'],
       dtype='object'))

## Transaction data aggregation
We aggregate the transactions to one row per client per purchase day, which gives each client's daily spending and quantity. Days without a purchase do not appear.
- Features extracted from transactions:
    - Money spent
    - Items purchased
    - Favourite Product
    - Favourite Style


In [28]:
df_transactions["client_id"].value_counts()

client_id
c81328703    394
c93212715    337
c15284276    241
c27492922    186
c75296385    185
            ... 
c04620334      1
c67451642      1
c16609876      1
c92900730      1
c54482513      1
Name: count, Length: 13884, dtype: int64

In [30]:
df_transactions[["client_id", "transaction_date"]].value_counts()

client_id  transaction_date
c70119140  2021-12-07          40
c15284276  2021-12-05          28
c54934148  2020-08-08          27
           2020-03-12          26
c48728989  2021-10-27          26
                               ..
c38031613  2021-09-22           1
           2021-09-11           1
           2021-09-09           1
           2021-09-07           1
c99997359  2021-11-23           1
Name: count, Length: 34053, dtype: int64

In [31]:
df_transactions.head().style

Unnamed: 0,client_id,transaction_id,transaction_date,product_quantity,gross_amount_euro,product_category,product_subcategory,product_style
0,c17974679,t11288118,2020-07-25,1,3044,Women Ready-to-Wear,Unknown,Casual Style
1,c17974679,t93389142,2020-06-20,1,569,Women Accessory,Unknown,Stylish Fashion
2,c17974679,t62924895,2021-01-13,1,7727,Women Bags,Unknown,Leather
3,c17974679,t93389142,2020-06-20,1,3954,Women Ready-to-Wear,Unknown,Casual Style
4,c17974679,t33368160,2021-05-03,1,1541,Women Ready-to-Wear,bottom segment,Casual Style


In [54]:
    

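# One row per client per purchase day.
trans_gr = df_transactions.groupby(["client_id", "transaction_date"]).agg({
    'product_quantity': 'sum',
    'gross_amount_euro': 'sum',
    # Most frequent value (first one on ties), then the number of distinct values.
    'product_category': [lambda x: x.mode().iloc[0], lambda x: x.nunique()],
    # mode() returns a Series, so take .iloc[0]; guard against all-missing styles.
    'product_style': [lambda x: x.mode().iloc[0] if x.notna().any() else np.nan,
                      lambda x: x.nunique()],
})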

trans_gr.columns = ["nr_items_purchased", "money_spent", "favorite_category", "nr_categories_purchased", "favorite_style", "nr_styles_purchased"]
trans_gr = trans_gr.reset_index(drop=False)
trans_gr

Unnamed: 0,client_id,transaction_date,nr_items_purchased,money_spent,favorite_category,nr_categories_purchased,favorite_style,nr_styles_purchased
0,c00029531,2020-09-25,1,546,Women Accessory,1,Stylish Fashion,1
1,c00029531,2020-10-04,1,4652,Women Small Leather Goods,1,Canvas Style,1
2,c00029531,2021-03-15,1,1483,Women Accessory,1,Leather,1
3,c00055636,2018-07-20,1,620,Women Accessory,1,Stylish Fashion,1
4,c00055636,2018-12-19,1,2278,Men Bags,1,Canvas Style,1
...,...,...,...,...,...,...,...,...
34048,c99978675,2018-06-02,1,2471,Women Ready-to-Wear,1,Day Style,1
34049,c99978675,2018-12-07,1,3897,Women Ready-to-Wear,1,Unknown Style,1
34050,c99989096,2020-12-28,1,984,Women Accessory,1,Stylish Fashion,1
34051,c99995560,2018-07-13,1,3926,Women Bags,1,Leather,1


In [None]:
# apply() requires a function argument; len is only a placeholder (rows per client per day).
df_transactions.groupby(["client_id", "transaction_date"]).apply(len)

# Merge

In [12]:
df = pd.merge(df_transactions, df_clients, on="client_id", how="inner")
df = pd.merge(df, df_actions, on="client_id", how="outer")
df

Unnamed: 0,client_id,transaction_id,transaction_date,product_quantity,gross_amount_euro,product_category,product_subcategory,product_style,client_country,client_gender,...,action_year,action_end_date,action_collection_year,action_collection,action_universe,action_category_label,action_channel,action_label,client_is_present,client_is_invited
0,c00015183,,,,,,,,,,...,2021.0,2021-01-31,2021.0,Retail Action,Women's Fashion,Retail,In store,Lunar New Year Celebration,0.0,1.0
1,c00027842,,,,,,,,,,...,2021.0,2021-07-17,2021.0,Fall Collection,Women's Fashion,Retail,,Exclusive Offer,0.0,1.0
2,c00029531,t54964130,2021-03-15,1.0,1483.0,Women Accessory,Unknown,Leather,KR,F,...,2020.0,2020-10-02,2020.0,Winter Collection,Men's Fashion,Retail,In store,Exclusive Offer,0.0,1.0
3,c00029531,t54964130,2021-03-15,1.0,1483.0,Women Accessory,Unknown,Leather,KR,F,...,2021.0,2021-07-17,2021.0,Fall Collection,Women's Fashion,Retail,,Exclusive Offer,0.0,1.0
4,c00029531,t33011136,2020-10-04,1.0,4652.0,Women Small Leather Goods,Unknown,Canvas Style,KR,F,...,2020.0,2020-10-02,2020.0,Winter Collection,Men's Fashion,Retail,In store,Exclusive Offer,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88714,c99986266,,,,,,,,,,...,2021.0,2021-12-09,2021.0,Retail Action,Women's Fashion,Retail,In store,Holiday Celebration,1.0,1.0
88715,c99986588,,,,,,,,,,...,2020.0,2020-05-14,2020.0,ABCDER Collection,Women's Fashion,Retail,In store,Social Gathering,0.0,1.0
88716,c99989096,t27112485,2020-12-28,1.0,984.0,Women Accessory,Unknown,Stylish Fashion,,F,...,,,,,,,,,,
88717,c99995560,t75693702,2018-07-13,1.0,3926.0,Women Bags,Unknown,Leather,IT,F,...,,,,,,,,,,


In [13]:
df.isna().sum()

client_id                                      0
transaction_id                              4929
transaction_date                            4929
product_quantity                            4929
gross_amount_euro                           4929
product_category                            4929
product_subcategory                         4929
product_style                               4939
client_country                              9491
client_gender                              11886
client_nationality                         11721
client_city                                18331
client_segment                              4929
client_premium_status                       4929
client_is_phone_contactable                 4929
client_is_email_contactable                 4929
client_is_instant_messaging_contactable     4929
client_is_contactable                       4929
action_id                                  11489
action_type_label                          11489
action_subcategory_l
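
The missing-value counts above come from the outer merge: clients with actions but no transactions get empty transaction columns, and the reverse holds for clients with transactions but no actions. A small sketch of how this could be audited with the `indicator` option of `pd.merge`, using invented toy frames rather than the real data:

```python
import pandas as pd

# Toy frames (invented values) mimicking the many-to-many join on client_id.
transactions = pd.DataFrame({
    "client_id": ["c1", "c1", "c2"],
    "transaction_id": ["t1", "t2", "t3"],
})
actions = pd.DataFrame({
    "client_id": ["c1", "c1", "c3"],
    "action_id": ["a1", "a2", "a1"],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = pd.merge(transactions, actions, on="client_id", how="outer", indicator=True)
origin_counts = merged["_merge"].value_counts()

# Each client gets (number of transactions) x (number of actions) rows,
# so heavy buyers with many invitations dominate any row-level statistic.
rows_per_client = merged.groupby("client_id").size()
```

Here `c1` has two transactions and two actions and therefore four rows, which is worth keeping in mind when reading counts computed on the merged frame.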

In [17]:
df["client_is_invited"] = df["client_is_invited"].fillna(0)
df["client_is_present"] = df["client_is_present"].fillna(0)

In [18]:
df["client_is_present"].value_counts(), df["client_is_invited"].value_counts()

(client_is_present
 1.0    65397
 0.0    23322
 Name: count, dtype: int64,
 client_is_invited
 1.0    77230
 0.0    11489
 Name: count, dtype: int64)

In [20]:
df[["client_is_invited", "client_is_present"]].value_counts()

client_is_invited  client_is_present
1.0                1.0                  65397
                   0.0                  11833
0.0                0.0                  11489
Name: count, dtype: int64

In [21]:
pd.crosstab(df["client_is_invited"], df["client_is_present"])

client_is_present,0.0,1.0
client_is_invited,Unnamed: 1_level_1,Unnamed: 2_level_1
0.0,11489,0
1.0,11833,65397


In [35]:
((df["client_is_present"]==1) & (df["client_is_invited"]==1)).sum() / df["client_is_invited"].sum()

0.8467823384695067
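
The rate above is computed over rows of the merged frame, where every invitation is repeated once per transaction of that client. A hedged sketch of the alternative, counting each (client, action) invitation once, on invented toy data:

```python
import pandas as pd

# Toy merged frame (invented): c1's single invitation repeats for each of 3 transactions.
merged = pd.DataFrame({
    "client_id": ["c1", "c1", "c1", "c2", "c3"],
    "action_id": ["a1", "a1", "a1", "a1", "a2"],
    "client_is_invited": [1.0, 1.0, 1.0, 1.0, 1.0],
    "client_is_present": [1.0, 1.0, 1.0, 0.0, 0.0],
})

# Row-weighted rate, as in the cell above.
row_rate = merged["client_is_present"].sum() / merged["client_is_invited"].sum()

# Invitation-weighted rate: one row per (client, action) pair.
invitations = (
    merged.loc[merged["client_is_invited"] == 1]
    .drop_duplicates(["client_id", "action_id"])
)
invitation_rate = invitations["client_is_present"].mean()
```

In this toy example the row-weighted rate is 0.6 while only one invitation in three led to attendance. Whether the real data shows a similar gap would have to be checked on `df_actions` directly.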

# From src

In [4]:
import os
os.chdir("..")
os.getcwd()

'/Users/Dell/Desktop/HEC/Invit-ai-ton_by_Eleven'

In [21]:
from src.preprocessor import get_and_merge_data

In [22]:
PARENT_DIR = "data/"
df = get_and_merge_data(PARENT_DIR)
df

Unnamed: 0,client_id,transaction_id,transaction_date,product_quantity,gross_amount_euro,product_category,product_subcategory,product_style,client_country,client_gender,...,action_year,action_end_date,action_collection_year,action_collection,action_universe,action_category_label,action_channel,action_label,client_is_present,client_is_invited
0,c17974679,t11288118,2020-07-25,1,3044,Women Ready-to-Wear,Unknown,Casual Style,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
1,c17974679,t93389142,2020-06-20,1,569,Women Accessory,Unknown,Stylish Fashion,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
2,c17974679,t62924895,2021-01-13,1,7727,Women Bags,Unknown,Leather,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
3,c17974679,t93389142,2020-06-20,1,3954,Women Ready-to-Wear,Unknown,Casual Style,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
4,c17974679,t33368160,2021-05-03,1,1541,Women Ready-to-Wear,bottom segment,Casual Style,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
83785,c36938015,t19758562,2021-07-03,1,2220,Woman Shoes,Unknown,Easy Day Style,DE,M,...,,,,,,,,,,
83786,c19544295,t83687427,2021-12-27,1,6472,Women Bags,Unknown,Leather,JP,,...,,,,,,,,,,
83787,c07447234,t31595753,2021-05-27,1,111,Women Accessory,bottom segment,Fashion Style,US,F,...,,,,,,,,,,
83788,c08145778,t79698899,2021-11-02,1,1220,Women Accessory,Unknown,Stylish Fashion,AU,F,...,,,,,,,,,,


In [16]:
df.isna().sum().to_frame().style

Unnamed: 0,0
client_id,0
transaction_id,4929
transaction_date,4929
product_quantity,4929
gross_amount_euro,4929
product_category,4929
product_subcategory,4929
product_style,4939
client_country,9491
client_gender,11886


In [25]:
df.dtypes.to_frame().style

Unnamed: 0,0
client_id,object
transaction_id,object
transaction_date,object
product_quantity,int64
gross_amount_euro,int64
product_category,object
product_subcategory,object
product_style,object
client_country,object
client_gender,object


In [23]:
df["action_end_date"] = pd.to_datetime(df["action_end_date"])
df["action_start_date"] = pd.to_datetime(df["action_start_date"])
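
With the action dates parsed, spending before and after each event can be compared. The following is a sketch under assumptions rather than the notebook's method: the 30-day window is an arbitrary choice, the event date is taken to be `action_start_date`, and the data is invented.

```python
import pandas as pd

# Toy frames (invented values) with parsed dates.
transactions = pd.DataFrame({
    "client_id": ["c1", "c1", "c1", "c2"],
    "transaction_date": pd.to_datetime(
        ["2021-01-10", "2021-02-05", "2021-06-01", "2021-01-25"]
    ),
    "gross_amount_euro": [500, 1200, 900, 300],
})
actions = pd.DataFrame({
    "client_id": ["c1", "c2"],
    "action_id": ["a1", "a1"],
    "action_start_date": pd.to_datetime(["2021-02-01", "2021-02-01"]),
})

WINDOW = pd.Timedelta(days=30)  # assumption: symmetric 30-day window

# Pair every action with every transaction of the same client.
pairs = actions.merge(transactions, on="client_id", how="left")
delta = pairs["transaction_date"] - pairs["action_start_date"]

# Keep the amount only if the transaction falls inside the relevant window.
before = (delta < pd.Timedelta(0)) & (delta >= -WINDOW)
after = (delta >= pd.Timedelta(0)) & (delta <= WINDOW)
pairs["spend_before"] = pairs["gross_amount_euro"].where(before, 0)
pairs["spend_after"] = pairs["gross_amount_euro"].where(after, 0)

windowed = pairs.groupby(["client_id", "action_id"], as_index=False)[
    ["spend_before", "spend_after"]
].sum()
windowed["uplift"] = windowed["spend_after"] - windowed["spend_before"]
```

Transactions outside both windows, such as the one in June, contribute to neither side.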

In [60]:
df_actions.groupby("action_id").apply(lambda x: len(x["client_id"].value_counts()))


action_id
a000858    76
a004211     1
a005065    10
a005924     1
a006094     3
           ..
a987560    15
a988444     1
a993574     1
a996597     2
a999183     5
Length: 564, dtype: int64

# Recommending Clients who never attended an event before

In [55]:
df

Unnamed: 0,client_id,transaction_id,transaction_date,product_quantity,gross_amount_euro,product_category,product_subcategory,product_style,client_country,client_gender,...,action_year,action_end_date,action_collection_year,action_collection,action_universe,action_category_label,action_channel,action_label,client_is_present,client_is_invited
0,c17974679,t11288118,2020-07-25,1,3044,Women Ready-to-Wear,Unknown,Casual Style,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
1,c17974679,t93389142,2020-06-20,1,569,Women Accessory,Unknown,Stylish Fashion,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
2,c17974679,t62924895,2021-01-13,1,7727,Women Bags,Unknown,Leather,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
3,c17974679,t93389142,2020-06-20,1,3954,Women Ready-to-Wear,Unknown,Casual Style,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
4,c17974679,t33368160,2021-05-03,1,1541,Women Ready-to-Wear,bottom segment,Casual Style,DE,F,...,2020.0,2020-03-09,2020.0,Winter Collection,Women's Fashion,Client,Outside venue,Business Engagement,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
83785,c36938015,t19758562,2021-07-03,1,2220,Woman Shoes,Unknown,Easy Day Style,DE,M,...,,NaT,,,,,,,,
83786,c19544295,t83687427,2021-12-27,1,6472,Women Bags,Unknown,Leather,JP,,...,,NaT,,,,,,,,
83787,c07447234,t31595753,2021-05-27,1,111,Women Accessory,bottom segment,Fashion Style,US,F,...,,NaT,,,,,,,,
83788,c08145778,t79698899,2021-11-02,1,1220,Women Accessory,Unknown,Stylish Fashion,AU,F,...,,NaT,,,,,,,,
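
As a starting point for this section, the candidate pool could be built by filtering out every client with at least one recorded attendance. The sketch below uses invented toy frames; treating `client_is_present == 1` as "attended" is an assumption about the column's meaning.

```python
import pandas as pd

# Toy frames (invented values).
clients = pd.DataFrame({"client_id": ["c1", "c2", "c3", "c4"]})
actions = pd.DataFrame({
    "client_id": ["c1", "c2", "c2"],
    "action_id": ["a1", "a1", "a2"],
    "client_is_present": [1, 0, 0],
})

# Clients with at least one recorded attendance.
attended_ids = set(actions.loc[actions["client_is_present"] == 1, "client_id"])

# Candidates: everyone else, whether invited before or not.
never_attended = clients.loc[~clients["client_id"].isin(attended_ids)]

# Subset of candidates who were never even invited.
never_invited = never_attended.loc[
    ~never_attended["client_id"].isin(actions["client_id"])
]
```

Splitting the pool this way separates clients who declined earlier invitations (`c2` here) from clients with no action history at all, who may call for different features.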
