# Analysis: How does Discover change a customer's purchasing behaviour?

#### Context: 
Previously, visitors to the PCSG website may have felt overwhelmed or lost because of the sheer number of products listed. The product recommendation system, Discover, aims to reduce friction in navigation and purchase, thereby increasing sales.

#### Types of questions to answer:

1. How long does it take customers to buy something after using Discover?
    1. Completely new customers who have not made any purchase
    2. Existing customers who have made purchases before
    
    *Note: we do not have a "control group", because we do not know how long it used to take customers to make their first product purchase.*

#### Steps involved in formatting Discover User Data
Populate a table of emails, keeping only the EARLIEST Discover attempt of each customer.
This analysis uses a sample dataset, the CSV file "discover_user_input_results.csv", which was obtained from the partial JSON file.

**Note to self: since this is a "partial" dataset, I will need to rerun this experiment with the full dataset.**

In [7]:
import numpy as np
import pandas as pd
pd.set_option("display.max_columns", 100)

discover_data = pd.read_csv("discover_user_input_results.csv")
discover_data.head()


Unnamed: 0.1,Unnamed: 0,email,sensitivity,skinType,timestamp,concern_0,concern_1,concern_2,concern_3,concern_4,concern_5,concern_6,concern_7,concern_8,concern_9,concern_10,concern_11,concern_12,concern_13,result_0,result_1,result_2,result_3,result_4,result_5,result_6,result_7,result_8,result_9
0,935,jeremy@paulaschoice.sg,False,Combination,"September 9th 2017, 1:56:08 pm",Dehydration,Sun Damage,Men,Combination,,,,,,,,,,,7830,7780,7660,7740,7860,1720,8740,2760,5800,7880
1,936,jeremy@paulaschoice.sg,False,Oily,"September 9th 2017, 1:58:21 pm",Acne,Wrinkles,PIH,Men,Oily,,,,,,,,,,1150,7670,8720,7740,7870,6130,8740,2760,5700,6240
2,937,ck1411@singnet.com.sg,False,Combination,"September 9th 2017, 2:38:56 pm",Enlarged Pores,Sun Damage,Wrinkles,Dehydration,Loss of Firmness,Dullness,Combination,,,,,,,,7830,7780,7820,7740,7870,7760,7690,2760,7900,7960
3,938,jeremy@paulaschoice.sg,False,Oily,"September 9th 2017, 2:51:12 pm",Clogged Pores,Redness,Uneven Texture,Enlarged Pores,Acne,Wrinkles,Dehydration,Sun Damage,PIH,Dullness,Loss of Firmness,Men,Oily,,7830,7670,8720,7740,7870,7800,8740,2760,5700,7730
4,939,starlites18@gmail.com,False,Combination,"September 9th 2017, 2:59:07 pm",Enlarged Pores,Clogged Pores,Acne,Sun Damage,Uneven Texture,Redness,Combination,,,,,,,,6002,1350,2010,7740,7870,6130,7690,2750,5700,7730


In [8]:
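# Keep one row per email. keep='first' returns the EARLIEST attempt only if the rows
# are in chronological order, as they are in the sample shown above.
discover_first = discover_data.drop_duplicates(subset=['email'], keep='first').copy()
# Strip the ordinal suffix ("9th" -> "9") before parsing the timestamp
discover_first['timestamp_proper'] = pd.to_datetime(discover_first['timestamp'].map(lambda x: x.replace('th', "")))
discover_first.drop('timestamp', axis=1, inplace=True)
discover_first.head()
# discover_first.to_csv('discover_first.csv')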

Unnamed: 0.1,Unnamed: 0,email,sensitivity,skinType,concern_0,concern_1,concern_2,concern_3,concern_4,concern_5,concern_6,concern_7,concern_8,concern_9,concern_10,concern_11,concern_12,concern_13,result_0,result_1,result_2,result_3,result_4,result_5,result_6,result_7,result_8,result_9,timestamp_proper
0,935,jeremy@paulaschoice.sg,False,Combination,Dehydration,Sun Damage,Men,Combination,,,,,,,,,,,7830,7780,7660,7740,7860,1720,8740,2760,5800,7880,2017-09-09 13:56:08
2,937,ck1411@singnet.com.sg,False,Combination,Enlarged Pores,Sun Damage,Wrinkles,Dehydration,Loss of Firmness,Dullness,Combination,,,,,,,,7830,7780,7820,7740,7870,7760,7690,2760,7900,7960,2017-09-09 14:38:56
4,939,starlites18@gmail.com,False,Combination,Enlarged Pores,Clogged Pores,Acne,Sun Damage,Uneven Texture,Redness,Combination,,,,,,,,6002,1350,2010,7740,7870,6130,7690,2750,5700,7730,2017-09-09 14:59:07
5,940,angelleexinyin@hotmail.com,True,Combination,Acne,Clogged Pores,Uneven Texture,Enlarged Pores,PIH,Dehydration,Sensitivity,Combination,,,,,,,7830,1350,6200,7740,7870,6130,7690,2750,5700,7730,2017-09-09 15:17:38
6,941,leong_lorna@hotmail.com,True,Oily,Clogged Pores,Acne,PIH,Enlarged Pores,Uneven Texture,Sensitivity,Oily,,,,,,,,6002,1350,6200,7740,7870,6130,7690,2750,5700,7730,2017-09-09 15:38:11
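
The `replace('th', "")` step above only removes the `th` suffix, so days written as `1st`, `2nd` or `3rd` keep their suffix and the result then depends on how lenient the date parser is. A minimal sketch of a stricter alternative, assuming every timestamp follows the `Month Dth YYYY, h:mm:ss am/pm` pattern seen in the sample (the helper `strip_ordinal` and the sample values are made up for illustration):

```python
import re
import pandas as pd

# Matches an English ordinal suffix only when it directly follows a digit,
# e.g. the "th" in "9th" or the "st" in "21st" (but not the "st" in "August").
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")

def strip_ordinal(text: str) -> str:
    """Remove 'st'/'nd'/'rd'/'th' when it comes right after a day number."""
    return ORDINAL_SUFFIX.sub("", text)

# Hypothetical values in the same shape as the `timestamp` column
raw = pd.Series([
    "September 9th 2017, 1:56:08 pm",
    "October 1st 2017, 11:05:00 am",
    "August 22nd 2017, 12:30:15 am",
])

# An explicit format makes the parse fail loudly if a value does not fit the pattern
parsed = pd.to_datetime(raw.map(strip_ordinal), format="%B %d %Y, %I:%M:%S %p")
print(parsed)
```

Passing an explicit `format` is a design choice here: it trades flexibility for an early error when an unexpected timestamp shape appears in the full dataset.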


### Steps involved in formatting Shopify Sales Data
1. Import Shopify sales_data (dated 2018-02-07) and perform basic formatting 
2. Match the emails available from the Discover dataset to the sales data, and create a new column called `"discover_first_date"` to indicate if, and when, the customer first used Discover
    1. Note: doing this excludes all the customers who used Discover but DID NOT make a purchase.
3. Calculate `'used_discover_already'`: whether a transaction was made BEFORE/AFTER the customer first tried Discover.
    1. Note: customers who have not tried Discover *at all* but made a purchase will also show "Not yet"
    2. Perhaps those who have not tried Discover at all should have a third status ("Not at all")?
4. Calculate `'discover_sales_lead_time'`: time between the customer's first attempt of Discover and the current transaction.


In [9]:
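sales_data = pd.read_csv("shopify_orders_export_20180207.csv", 
                         low_memory=False, 
                         parse_dates=['Paid at', 'Fulfilled at', 'Created at'])

# Drop the ten columns at positions -11 to -2 (the last column is kept),
# then discard rows that have no email to match against Discover
sales_data_clean = sales_data.drop(sales_data.columns.to_series()[-11:-1], axis=1)
sales_data_clean.dropna(subset=['Email'], axis=0, inplace=True)

# Look up each customer's first Discover attempt; customers who never tried Discover get NaT
sales_data_clean['discover_first_date'] = sales_data_clean['Email'].map(discover_first.set_index('email')['timestamp_proper'])
# A comparison against NaT is False, so customers who never tried Discover are also labelled "Not yet"
sales_data_clean['used_discover_already'] = (sales_data_clean['Created at']> sales_data_clean['discover_first_date']).map({True: "Used Discover", False: "Not yet"})
# A negative lead time means the order was placed before the first Discover attempt
sales_data_clean['discover_sales_lead_time'] = sales_data_clean['Created at'] - sales_data_clean['discover_first_date']

sales_data_clean.head()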

Unnamed: 0,Name,Email,Financial Status,Paid at,Fulfillment Status,Fulfilled at,Accepts Marketing,Currency,Subtotal,Shipping,Taxes,Total,Discount Code,Discount Amount,Shipping Method,Created at,Lineitem quantity,Lineitem name,Lineitem price,Lineitem compare at price,Lineitem sku,Lineitem requires shipping,Lineitem taxable,Lineitem fulfillment status,Billing Name,Billing Street,Billing Address1,Billing Address2,Billing Company,Billing City,Billing Zip,Billing Province,Billing Country,Billing Phone,Shipping Name,Shipping Street,Shipping Address1,Shipping Address2,Shipping Company,Shipping City,Shipping Zip,Shipping Province,Shipping Country,Shipping Phone,Notes,Note Attributes,Cancelled at,Payment Method,Payment Reference,Refunded Amount,Vendor,Outstanding Balance,Employee,Location,Device ID,Id,Tags,Risk Level,Source,Lineitem discount,Phone,discover_first_date,used_discover_already,discover_sales_lead_time
0,191569914712,gilly.glanville@me.com,paid,2018-02-08 03:59:00,fulfilled,2018-02-08 03:59:01,yes,SGD,290.0,0.0,0.0,290.0,5OFFe53b8c2fb9df,5.0,,2018-02-08 03:59:00,1,Resist Skin Restoring Moisturizer SPF 50 - 60ml,48.0,0.0,7970,True,False,fulfilled,Gilly Glanville,,,,,,,,,,,,,,,,,,,,,,,External Credit,c563530104851.1,0.0,Paula's Choice,0.0,Jeremy Tan,Beauty Collective,9.0,181106500000.0,,Low,pos,0.0,,NaT,Not yet,NaT
1,191569914712,gilly.glanville@me.com,,NaT,,NaT,,,,,,,,,,2018-02-08 03:59:00,1,Resist Optimal Results Hydrating Cleanser - 19...,36.0,0.0,7600,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,NaT,Not yet,NaT
2,191569914712,gilly.glanville@me.com,,NaT,,NaT,,,,,,,,,,2018-02-08 03:59:00,1,Resist Advanced Smoothing Treatment 10% AHA (G...,55.0,0.0,7651,True,True,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,NaT,Not yet,NaT
3,191569914712,gilly.glanville@me.com,,NaT,,NaT,,,,,,,,,,2018-02-08 03:59:00,1,Resist C15 Super Booster - 20 ml,68.0,0.0,7770,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,NaT,Not yet,NaT
4,191569914712,gilly.glanville@me.com,,NaT,,NaT,,,,,,,,,,2018-02-08 03:59:00,1,Clinical Ceramide-Enriched Firming Moisturizer,88.0,0.0,2120,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,NaT,Not yet,NaT
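
The third status suggested in step 3 of the list above could be sketched as follows. This is only an illustration on a made-up three-row table (the `orders` frame and the example emails are hypothetical); the column names mirror `sales_data_clean`. It relies on the fact that a comparison against `NaT` evaluates to `False`, which is why customers without a Discover date currently fall into "Not yet".

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of sales_data_clean
orders = pd.DataFrame({
    "Email": ["a@example.com", "a@example.com", "b@example.com"],
    "Created at": pd.to_datetime(
        ["2017-09-01 10:00", "2017-10-01 10:00", "2017-10-05 09:00"]
    ),
    # b@example.com never tried Discover, so the lookup left NaT
    "discover_first_date": pd.to_datetime(
        ["2017-09-15 12:00", "2017-09-15 12:00", None]
    ),
})

never_used = orders["discover_first_date"].isna()
after_discover = orders["Created at"] > orders["discover_first_date"]

# Conditions are checked in order; rows matching neither get the default
orders["used_discover_already"] = np.select(
    [never_used, after_discover],
    ["Not at all", "Used Discover"],
    default="Not yet",
)
print(orders[["Email", "Created at", "used_discover_already"]])
```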


## Question 1-A: How long does it take for brand new customers to buy something after using Discover?
### First, identify (new) customers who had not made any purchase before Discover launch

1. Create a new column that indicates whether the customer made any purchase before the launch of Discover (2017-09-09)
2. Create a table containing Emails, First Tried Discover Date and whether they existed before 2017-09-09 (Discover launch)
    1. *Note: this EXCLUDES customers who have not tried Discover, regardless of whether they made a purchase before or after the launch of Discover*
3. Create two tables that include transactions before and on/after Discover launch (2017-09-09)

In [10]:
# Create two dataframes that include transactions before and on/after Discover launch (2017-09-09)
# (why?)
pre_discover_sales = sales_data_clean[sales_data_clean['Created at']< "2017-09-09"]
post_discover_sales = sales_data_clean[sales_data_clean['Created at']>= "2017-09-09"]

# Create a new dataframe from post_discover_sales
# that include only new users who took discover
# (why? what is the implication if I DON'T filter for pre-/post-Discover launch transactions?)

only_users_from_post_discover_sales = post_discover_sales[post_discover_sales['Email'].isin(discover_first['email'].unique())]
only_users_from_post_discover_sales

Unnamed: 0,Name,Email,Financial Status,Paid at,Fulfillment Status,Fulfilled at,Accepts Marketing,Currency,Subtotal,Shipping,Taxes,Total,Discount Code,Discount Amount,Shipping Method,Created at,Lineitem quantity,Lineitem name,Lineitem price,Lineitem compare at price,Lineitem sku,Lineitem requires shipping,Lineitem taxable,Lineitem fulfillment status,Billing Name,Billing Street,Billing Address1,Billing Address2,Billing Company,Billing City,Billing Zip,Billing Province,Billing Country,Billing Phone,Shipping Name,Shipping Street,Shipping Address1,Shipping Address2,Shipping Company,Shipping City,Shipping Zip,Shipping Province,Shipping Country,Shipping Phone,Notes,Note Attributes,Cancelled at,Payment Method,Payment Reference,Refunded Amount,Vendor,Outstanding Balance,Employee,Location,Device ID,Id,Tags,Risk Level,Source,Lineitem discount,Phone,discover_first_date,used_discover_already,discover_sales_lead_time
10,191569914707,jglyj82@gmail.com,paid,2018-02-07 13:02:33,unfulfilled,NaT,yes,SGD,82.0,0.0,0.0,82.0,,0.0,Free Delivery (2 Working Days),2018-02-07 13:02:33,1,Skin Balancing Oil-Reducing Cleanser - 237 ml,34.0,0.0,1150,True,False,pending,Joanne Tan,13367 Holly Oak Cir,13367 Holly Oak Cir,,,Cerritos,'90703,CA,US,(562) 404-7973,Joanne Tan,"23 Stevens Drive #03-01, Parc Stevens",23 Stevens Drive #03-01,Parc Stevens,,Singapore,'257914,,SG,9109 2164,,,,Stripe,c561204592659.1,0.0,Paula's Choice,0.0,,,,1.801691e+11,,Low,web,0.0,,2018-02-07 12:54:54,Used Discover,0 days 00:07:39
11,191569914707,jglyj82@gmail.com,,NaT,,NaT,,,,,,,,,,2018-02-07 13:02:33,1,Resist Youth-Extending Daily Hydrating Fluid S...,48.0,0.0,7800,True,False,pending,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,2018-02-07 12:54:54,Used Discover,0 days 00:07:39
22,191569914-1758,zarr.gyii@gmail.com,paid,2018-02-07 11:16:46,fulfilled,2018-02-07 11:16:46,yes,SGD,29.0,0.0,0.0,29.0,,0.0,,2018-02-07 11:16:45,1,Skin Perfecting 2% BHA (Salicylic Acid) Liquid...,13.0,,2017,True,False,fulfilled,Zarli Aung,,,,,,,,,,,,,,,,,,,,,,,Stripe,c561033281555.1,0.0,Paula's Choice,0.0,Eli Goh,Front Porch,14.0,1.801029e+11,,Low,pos,0.0,,2017-09-26 03:45:14,Used Discover,134 days 07:31:31
23,191569914-1758,zarr.gyii@gmail.com,,NaT,,NaT,,,,,,,,,,2018-02-07 11:16:45,1,Clinical 1% Retinol Treatment - 5 ml,16.0,,8017,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,2017-09-26 03:45:14,Used Discover,134 days 07:31:31
29,191569914-1755,ycobonpue@gmail.com,paid,2018-02-07 08:34:54,fulfilled,2018-02-07 08:31:50,yes,SGD,59.0,0.0,0.0,59.0,,0.0,,2018-02-07 08:31:50,1,Resist Omega+ Complex - 30ml,59.0,,2130,True,False,fulfilled,Yvonne Cobonpue,,,,,,,,,,,,,,,,,,,,,,,Gift Card + Stripe,c560797745171.2,0.0,Paula's Choice,0.0,Eli Goh,Front Porch,14.0,1.800194e+11,,Low,pos,0.0,,2017-10-31 06:06:57,Used Discover,99 days 02:24:53
34,191569914693,karenkhor27@gmail.com,paid,2018-02-07 07:15:40,fulfilled,2018-02-07 12:28:57,yes,SGD,171.0,0.0,0.0,171.0,e1567fa9e6db,25.0,Free Delivery (2 Working Days),2018-02-07 07:15:40,1,Clear Acne Regular Strength Cream 2.5% BP - 67 ml,29.0,,6100,True,False,fulfilled,Karen Khor,"08 Commonwealth Lane, #04-04",08 Commonwealth Lane,#04-04,G4S Secure Solutions Singapore Pte. Ltd.,Singapore,'149555,,SG,+6596656577,Karen Khor,"08 Commonwealth Lane, #04-04",08 Commonwealth Lane,#04-04,G4S Secure Solutions Singapore Pte. Ltd.,Singapore,'149555,,SG,+6596656577,,,,Stripe,c560624992275.1,0.0,Paula's Choice,0.0,,,,1.799875e+11,,Low,web,0.0,,2017-09-13 02:03:15,Used Discover,147 days 05:12:25
35,191569914693,karenkhor27@gmail.com,,NaT,,NaT,,,,,,,,,,2018-02-07 07:15:40,1,Calm Sensitive Daytime Moisturizer SPF 30 (Nor...,48.0,,9130,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,2017-09-13 02:03:15,Used Discover,147 days 05:12:25
36,191569914693,karenkhor27@gmail.com,,NaT,,NaT,,,,,,,,,,2018-02-07 07:15:40,1,Resist C15 Super Booster - 20 ml,68.0,,7770,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,2017-09-13 02:03:15,Used Discover,147 days 05:12:25
37,191569914693,karenkhor27@gmail.com,,NaT,,NaT,,,,,,,,,,2018-02-07 07:15:40,1,Calm Repairing Sensitive Serum - 30 ml,51.0,,3700,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,2017-09-13 02:03:15,Used Discover,147 days 05:12:25
42,191569914689,boazruth76@yahoo.com,paid,2018-02-07 05:19:22,fulfilled,2018-02-07 05:19:23,yes,SGD,103.0,0.0,0.0,103.0,5OFF93966be82039,5.0,,2018-02-07 05:19:22,1,Resist Ultra-Light Super Antioxidant Concentra...,55.0,0.0,7740,True,False,fulfilled,Sue Yeo,,,,,,,,,,,,,,,,,,,,,,,External Credit,c560377233427.1,0.0,Paula's Choice,0.0,Jeremy Tan,Beauty Collective,9.0,1.799393e+11,,Low,pos,0.0,,2018-01-24 09:57:32,Used Discover,13 days 19:21:50


In [37]:
sales_data_clean[sales_data_clean['Email'] == 'jovi.kau@gmail.com']

Unnamed: 0,Name,Email,Financial Status,Paid at,Fulfillment Status,Fulfilled at,Accepts Marketing,Currency,Subtotal,Shipping,Taxes,Total,Discount Code,Discount Amount,Shipping Method,Created at,Lineitem quantity,Lineitem name,Lineitem price,Lineitem compare at price,Lineitem sku,Lineitem requires shipping,Lineitem taxable,Lineitem fulfillment status,Billing Name,Billing Street,Billing Address1,Billing Address2,Billing Company,Billing City,Billing Zip,Billing Province,Billing Country,Billing Phone,Shipping Name,Shipping Street,Shipping Address1,Shipping Address2,Shipping Company,Shipping City,Shipping Zip,Shipping Province,Shipping Country,Shipping Phone,Notes,Note Attributes,Cancelled at,Payment Method,Payment Reference,Refunded Amount,Vendor,Outstanding Balance,Employee,Location,Device ID,Id,Tags,Risk Level,Source,Lineitem discount,Phone,discover_first_date,used_discover_already,discover_sales_lead_time
605,191569914409,jovi.kau@gmail.com,paid,2018-01-30 01:51:08,fulfilled,2018-02-01 04:08:04,yes,SGD,91.0,0.0,0.0,91.0,,0.0,Free Delivery (2 Working Days),2018-01-30 01:51:08,1,Clear Acne Extra Strength Exfoliating Treatmen...,43.0,,6210,True,False,fulfilled,Jovi Kau,211 HENDERSON ROAD #09-01,211 HENDERSON ROAD #09-01,,,Singapore,'159552,,SG,8611 4367,Jovi Kau,211 HENDERSON ROAD #09-01,211 HENDERSON ROAD #09-01,,,Singapore,'159552,,SG,8611 4367,,,,Stripe,c533816901651.1,0.0,Paula's Choice,0.0,,,,169644900000.0,,Low,web,0.0,,2017-10-31 05:49:38,Used Discover,90 days 20:01:30
606,191569914409,jovi.kau@gmail.com,,NaT,,NaT,,,,,,,,,,2018-01-30 01:51:08,1,Resist Daily Pore-Refining Treatment 2% BHA (S...,48.0,,7820,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,2017-10-31 05:49:38,Used Discover,90 days 20:01:30
4085,191569912733,jovi.kau@gmail.com,paid,2017-12-04 11:24:22,fulfilled,2017-12-04 15:39:46,yes,SGD,82.0,0.0,0.0,82.0,,0.0,Free Delivery (2 Working Days),2017-12-04 11:24:22,2,Clear Acne Regular Strength Exfoliating Treatm...,41.0,,6200,True,False,fulfilled,Jovi Kau,211 Henderson road #09-01,211 Henderson road #09-01,,,Singapore,'159552,,SG,8611 4367,Jovi Kau,211 Henderson road #09-01,211 Henderson road #09-01,,,Singapore,'159552,,SG,8611 4367,,,,Stripe,c195165454355.1,0.0,Paula's Choice,0.0,,,,82206160000.0,,Low,web,0.0,,2017-10-31 05:49:38,Used Discover,34 days 05:34:44
6665,191569911539,jovi.kau@gmail.com,paid,2017-10-31 06:01:38,fulfilled,2017-11-01 07:32:16,yes,SGD,117.0,0.0,0.0,117.0,DSCVR15,15.0,Free Delivery (2 Working Days),2017-10-31 06:01:38,1,Resist Daily Pore-Refining Treatment 2% BHA (S...,48.0,,7820,True,False,fulfilled,Jovi Kau,211 HENDERSON ROAD #09-01,211 HENDERSON ROAD #09-01,,,Singapore,'159552,,SG,8611 4367,Jovi Kau,211 HENDERSON ROAD #09-01,211 HENDERSON ROAD #09-01,,,Singapore,'159552,,SG,8611 4367,,,,Stripe,c33608466451.1,0.0,Paula's Choice,0.0,,,,13000770000.0,,Low,web,0.0,6586114000.0,2017-10-31 05:49:38,Used Discover,0 days 00:12:00
6666,191569911539,jovi.kau@gmail.com,,NaT,,NaT,,,,,,,,,,2017-10-31 06:01:38,1,Clear Acne Body Spray 2% BHA (Salicylic Acid),41.0,,6240,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,6586114000.0,2017-10-31 05:49:38,Used Discover,0 days 00:12:00
6667,191569911539,jovi.kau@gmail.com,,NaT,,NaT,,,,,,,,,,2017-10-31 06:01:38,1,Clear Acne Extra Strength Exfoliating Treatmen...,43.0,,6210,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,6586114000.0,2017-10-31 05:49:38,Used Discover,0 days 00:12:00
10160,19156809864,jovi.kau@gmail.com,paid,2017-09-10 14:44:11,fulfilled,2017-09-11 06:52:51,yes,SGD,82.0,0.0,0.0,82.0,,0.0,Free Delivery (2 Working Days),2017-09-10 14:44:10,2,Clear Acne Regular Strength Exfoliating Treatm...,41.0,,6200,True,False,fulfilled,Jovi Kau,211 Henderson road #09-01,211 Henderson road #09-01,,,Singapore,'159552,,SG,8611 4367,Jovi Kau,211 Henderson road #09-01,211 Henderson road #09-01,,,Singapore,'159552,,SG,8611 4367,,,,Stripe,c16773461587.2,0.0,Paula's Choice,0.0,,,,6099986000.0,,Low,web,0.0,,2017-10-31 05:49:38,Not yet,-51 days +08:54:32
11605,19156809177,jovi.kau@gmail.com,paid,2017-08-19 07:42:18,fulfilled,2017-08-21 15:26:25,yes,SGD,89.0,0.0,0.0,89.0,,0.0,Free Delivery (2 Working Days),2017-08-19 07:42:18,1,Clear Acne Regular Strength Exfoliating Treatm...,41.0,,6200,True,False,fulfilled,Jovi Kau,53 commonwealth drive #17-558,53 commonwealth drive #17-558,,,Singapore,'142053,,SG,8611 4367,Jovi Kau,53 commonwealth drive #17-558,53 commonwealth drive #17-558,,,Singapore,'142053,,SG,8611 4367,,,,Stripe,c16608159251.1,0.0,Paula's Choice,0.0,,,,6004308000.0,,Low,web,0.0,,2017-10-31 05:49:38,Not yet,-73 days +01:52:40
11606,19156809177,jovi.kau@gmail.com,,NaT,,NaT,,,,,,,,,,2017-08-19 07:42:18,1,Resist Daily Pore-Refining Treatment 2% BHA (S...,48.0,,7820,True,False,fulfilled,,,,,,,,,,,,,,,,,,,,,,,,,,,Paula's Choice,,,,,,,,,0.0,,2017-10-31 05:49:38,Not yet,-73 days +01:52:40
13558,19156808270,jovi.kau@gmail.com,paid,2017-07-23 06:23:35,fulfilled,2017-07-24 08:02:16,yes,SGD,82.0,0.0,0.0,82.0,,0.0,Free Delivery (2 Working Days),2017-07-23 06:23:34,1,Clear Acne Body Spray 2% BHA (Salicylic Acid),41.0,,6240,True,False,fulfilled,JOVI Kau,211 Henderson Road #09-01,211 Henderson Road #09-01,,,Singapore,'159552,,SG,8611 4367,JOVI Kau,211 Henderson Road #09-01,211 Henderson Road #09-01,,,Singapore,'159552,,SG,8611 4367,,,,Stripe,c16424178643.1,0.0,Paula's Choice,0.0,,,,5848871000.0,,Low,web,0.0,,2017-10-31 05:49:38,Not yet,-100 days +00:33:56


### Filter for transactions made within a day of using Discover (for the first time)
The result is stored in a table called `same_day_purchase`.

In [43]:
'''
comparison_table = sales_data_clean[['Email', 'Created at', 'discover_first_date', 'used_discover_already']]
comparison_table['discover_sales_lead_time'] = comparison_table['Created at'] - comparison_table['discover_first_date']
'''
# Filter for transactions that were made within a day of attempting Discover
same_day_purchase = sales_data_clean[(sales_data_clean['discover_sales_lead_time'] <= "1 days") & (sales_data_clean['discover_sales_lead_time'] > "0 days")]
same_day_purchases_count = len(same_day_purchase['Email'].unique())

# Perform a count and percentage of customer analysis 
post_discover_launch_customer_count = len(post_discover_sales['Email'].unique())
percentage_same_day_purchase = same_day_purchases_count/post_discover_launch_customer_count
print("same_day_purchases_count:", same_day_purchases_count)
print("post_discover_launch_customer_count:", post_discover_launch_customer_count)
print("percentage_same_day_purchase: {:.2f}%".format(100*percentage_same_day_purchase))

# same_day_purchase.to_excel("same_day_purchase.xlsx")



same_day_purchases_count: 335
post_discover_launch_customer_count: 2847
percentage_same_day_purchase: 11.77%


# About 12% of post-launch customers made a purchase within a day of first using Discover!
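
The 11.77% figure divides by every customer who purchased after launch, whether or not they tried Discover. A rough sketch of how the share changes with the denominator, on made-up data (the `orders` frame is hypothetical; only the column names come from `sales_data_clean`):

```python
import pandas as pd

# Hypothetical line-item rows; c@example.com never tried Discover (NaT),
# d@example.com only bought BEFORE trying Discover (negative lead time)
orders = pd.DataFrame({
    "Email": ["a@example.com", "a@example.com", "b@example.com",
              "c@example.com", "d@example.com"],
    "discover_sales_lead_time": pd.to_timedelta(
        ["0 days 00:12:00", "34 days 05:00:00", "3 days 00:00:00",
         pd.NaT, "-2 days"]
    ),
})

lead = orders["discover_sales_lead_time"]
zero, one_day = pd.Timedelta(0), pd.Timedelta(days=1)

# Customers with at least one order within a day AFTER their first Discover attempt
same_day_buyers = orders.loc[(lead > zero) & (lead <= one_day), "Email"].nunique()

# Denominator 1: every customer with an order
all_buyers = orders["Email"].nunique()
# Denominator 2: only customers with an order after their first Discover attempt
discover_buyers = orders.loc[lead > zero, "Email"].nunique()

share_of_all = same_day_buyers / all_buyers
share_of_discover = same_day_buyers / discover_buyers
print(f"share of all buyers: {share_of_all:.0%}")
print(f"share of Discover buyers: {share_of_discover:.0%}")
```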

## How long does it take for brand new customers to buy something after using Discover?
### Brand new customers = customers who had not purchased anything pre-Discover launch
There will be three types of post-launch new customers: 
1. Those who made their first purchase after trying out Discover
2. Those who made their first purchase WITHOUT trying out Discover 
    1. (either because they didn't bother, or didn't know of its existence)
3. Those who made their first purchase WITHOUT trying out Discover (i.e. point 2), but subsequently tried Discover
    1. This group is currently ignored

In [53]:
# Create a new dataframe containing UNIQUE emails from post_discover_sales
post_launch_emails = pd.DataFrame(post_discover_sales['Email'].unique(), 
                                  columns=["Post Launch Emails"])

# Test if these emails are also found in the pre_discover_sales email list
post_launch_emails['Exist before launch?'] = post_launch_emails['Post Launch Emails'].isin(pre_discover_sales['Email'].unique())

# Merge (aka join) Discover data with the emails in post_launch_emails
# and create a column indicating when each email first tried Discover.
# merge() defaults to an inner join, so emails that never tried Discover are dropped here.
post_launch_emails = post_launch_emails.merge(right=discover_first[['email', 'timestamp_proper']], 
                                              left_on='Post Launch Emails', 
                                              right_on='email')

# Delete the extra email column created after joining
post_launch_emails = post_launch_emails.drop('email', axis=1)

# Rename the column to something more descriptive
post_launch_emails = post_launch_emails.rename(index=str, columns={"timestamp_proper": "First Tried Discover"})

# get first transaction date of each of the "new" customers
# Remember that sales_data is reverse-chronologically indexed
post_launch_first_purchase = post_discover_sales[['Email', 'Created at']].drop_duplicates(subset='Email', keep='last')
print("Number of rows:", len(post_launch_first_purchase))
post_launch_first_purchase

post_launch_emails = post_launch_emails.merge(post_launch_first_purchase, 
                                              left_on="Post Launch Emails", 
                                              right_on="Email")
post_launch_emails = post_launch_emails.drop('Email', axis=1).rename(index=str, 
                                                                     columns={"Created at": "First Transaction Post Launch"})

post_launch_emails['Time To Buy'] = post_launch_emails['First Transaction Post Launch'] - post_launch_emails['First Tried Discover']

# Filter for purchases made in 1 day (if necessary)
# post_launch_emails = post_launch_emails[(post_launch_emails['Time To Buy']> "0 days") & (post_launch_emails['Exist before launch?'] == False)]

# Lead time to purchase for new customers after taking Discover
print(post_launch_emails[(post_launch_emails['Time To Buy'] > '0 days') & 
                         (post_launch_emails['Exist before launch?'] == False)]['Time To Buy'].describe())
print('Top 85% of time to buy:', post_launch_emails['Time To Buy'].quantile(0.85))

post_launch_emails

Number of rows: 2847
count                        346
mean     10 days 03:05:21.540462
std      21 days 16:32:44.176929
min              0 days 00:01:22
25%       0 days 00:19:19.500000
50%       0 days 09:24:47.500000
75%       9 days 00:29:25.250000
max            139 days 05:01:07
Name: Time To Buy, dtype: object
Top 85% of time to buy: 25 days 16:44:05.549999


Unnamed: 0,Post Launch Emails,Exist before launch?,First Tried Discover,First Transaction Post Launch,Time To Buy
0,jglyj82@gmail.com,False,2018-02-07 12:54:54,2018-02-07 13:02:33,0 days 00:07:39
1,zarr.gyii@gmail.com,True,2017-09-26 03:45:14,2017-09-26 05:10:37,0 days 01:25:23
2,ycobonpue@gmail.com,True,2017-10-31 06:06:57,2017-09-19 06:46:41,-42 days +00:39:44
3,karenkhor27@gmail.com,True,2017-09-13 02:03:15,2017-09-18 07:12:43,5 days 05:09:28
4,boazruth76@yahoo.com,False,2018-01-24 09:57:32,2018-01-11 04:18:16,-14 days +18:20:44
5,bbggf.0901@gmail.com,True,2017-10-27 17:38:14,2017-10-28 16:57:08,0 days 23:18:54
6,jeronblahblah@gmail.com,True,2017-09-17 05:58:14,2017-10-21 10:10:01,34 days 04:11:47
7,ruienseah@gmail.com,False,2018-02-06 08:16:14,2018-02-06 08:29:58,0 days 00:13:44
8,qingshuang111@gmail.com,False,2017-10-01 15:37:11,2017-10-07 07:22:03,5 days 15:44:52
9,roseleenlua@gmail.com,True,2018-01-16 05:28:25,2018-01-17 05:26:42,0 days 23:58:17


## 50% of new customers who bought after trying Discover did so within 10 hours!


In [38]:
post_launch_emails[post_launch_emails['Post Launch Emails'] == 'jovi.kau@gmail.com']

Unnamed: 0,Post Launch Emails,Exist before launch?,First Tried Discover,First Transaction Post Launch,Time To Buy
66,jovi.kau@gmail.com,True,2017-10-31 05:49:38,2017-09-10 14:44:10,-51 days +08:54:32


From the calculation above, we know that: 
1. 2847 customers have made at least one purchase since Discover was launched.
2. Of these, 346 are new customers who made their first purchase after trying Discover; on average, it took them about **10 days** to do so.
3. 50% of these new customers made their first purchase within about **9.5 hours** of trying out Discover.

#### Obtain a list of unique emails of NEW customers who made their first purchase *without* taking Discover

In [40]:
new_cust_without_discover = post_launch_emails[(post_launch_emails['Time To Buy'] < pd.Timedelta('00:00:00')) &
                                               (post_launch_emails['Exist before launch?'] == False)]
new_cust_without_discover

Unnamed: 0,Post Launch Emails,Exist before launch?,First Tried Discover,First Transaction Post Launch,Time To Buy
4,boazruth76@yahoo.com,False,2018-01-24 09:57:32,2018-01-11 04:18:16,-14 days +18:20:44
11,parsheechandnani@gmail.com,False,2017-10-31 06:32:51,2017-10-17 04:23:55,-15 days +21:51:04
34,trac_tan@yahoo.com.sg,False,2018-01-08 16:16:34,2018-01-06 11:58:38,-3 days +19:42:04
43,weejingshan@mail.com,False,2017-10-30 17:42:25,2017-10-15 16:30:02,-16 days +22:47:37
48,jeenaujames@gmail.com,False,2018-01-23 08:07:22,2017-11-11 08:47:29,-73 days +00:40:07
87,eeliew@rocketmail.com,False,2017-10-24 13:45:18,2017-10-18 09:28:27,-7 days +19:43:09
88,lilyloh26@gmail.com,False,2018-01-11 13:26:06,2018-01-09 14:45:21,-2 days +01:19:15
93,norainialwi@yahoo.com,False,2018-01-03 00:20:53,2017-12-20 09:41:08,-14 days +09:20:15
96,huiting.chen@yahoo.com,False,2018-02-04 09:56:00,2018-01-24 16:37:26,-11 days +06:41:26
102,liewjas@yahoo.com.sg,False,2018-01-23 07:03:59,2018-01-23 06:17:41,-1 days +23:13:42


In [36]:

# Obtain a list of new customers who had made their first purchase post-launch without using Discover
new_cust_without_discover = post_launch_emails[(post_launch_emails['Time To Buy'] < pd.Timedelta('00:00:00')) & 
                                               (post_launch_emails['Exist before launch?'] == False)] 
                                                #['Post Launch Emails'].unique()

post_discover_sales[~post_discover_sales['Email'].isin(new_cust_without_discover['Post Launch Emails'].unique())]

# post_launch_emails was already defined earlier and has not been altered in any way
first_sales_without_discover = post_discover_sales[~post_launch_emails.isin(pre_discover_sales['Email'].unique())]
first_sales_without_discover = first_sales_without_discover.dropna()
first_sales_without_discover

2              ycobonpue@gmail.com
4             boazruth76@yahoo.com
11      parsheechandnani@gmail.com
14      shireengill.1991@gmail.com
19           mirandalee88@live.com
23               manasa@luxola.com
26         jessiejiaqiye@gmail.com
33        lim.veronica21@yahoo.com
34           trac_tan@yahoo.com.sg
40          rinadarman08@gmail.com
41            ancarujoiu@yahoo.com
43            weejingshan@mail.com
45         Jennyann57@yahoo.com.sg
48           jeenaujames@gmail.com
49           ana_adl27@hotmail.com
53        faithyeoh.gw95@gmail.com
61            Xjxmandy@hotmail.com
66              jovi.kau@gmail.com
74            zhujin0928@gmail.com
75          yashoda_20@hotmail.com
79            jannawqe@hotmail.com
87           eeliew@rocketmail.com
88             lilyloh26@gmail.com
89         ivyaustasia@hotmail.com
93           norainialwi@yahoo.com
95           hayatiabdul@gmail.com
96          huiting.chen@yahoo.com
98           artspassion@gmail.com
101         seankais

In [33]:
type(new_cust_without_discover['Time To Buy'].iloc[0])

pandas._libs.tslib.Timedelta

In [22]:
new_cust_without_discover = post_launch_emails[post_launch_emails['Time To Buy'] < pd.Timedelta('00:00:00')]
new_cust_without_discover

Unnamed: 0,Post Launch Emails,Exist before launch?,First Tried Discover,First Transaction Post Launch,Time To Buy
2,ycobonpue@gmail.com,True,2017-10-31 06:06:57,2017-09-19 06:46:41,-42 days +00:39:44
4,boazruth76@yahoo.com,False,2018-01-24 09:57:32,2018-01-11 04:18:16,-14 days +18:20:44
11,parsheechandnani@gmail.com,False,2017-10-31 06:32:51,2017-10-17 04:23:55,-15 days +21:51:04
14,shireengill.1991@gmail.com,True,2018-02-05 14:31:05,2017-09-30 16:08:33,-128 days +01:37:28
19,mirandalee88@live.com,True,2017-11-01 02:48:44,2017-09-10 04:32:15,-52 days +01:43:31
23,manasa@luxola.com,True,2017-10-07 07:47:14,2017-09-12 06:18:06,-26 days +22:30:52
26,jessiejiaqiye@gmail.com,True,2017-10-23 10:32:45,2017-10-11 15:52:14,-12 days +05:19:29
33,lim.veronica21@yahoo.com,True,2017-09-10 06:43:12,2017-09-10 03:01:37,-1 days +20:18:25
34,trac_tan@yahoo.com.sg,False,2018-01-08 16:16:34,2018-01-06 11:58:38,-3 days +19:42:04
40,rinadarman08@gmail.com,True,2017-10-31 13:36:28,2017-09-19 05:44:48,-43 days +16:08:20


## Average Order Value of _New_ Customers who made their first purchase after taking Discover

In [22]:
new_discover_sale_transaction = post_discover_sales[post_discover_sales['Email'].isin(new_customers_test['Post Launch Emails'].unique())]
# new_discover_sale_transaction.groupby(by=['Email', 'Created at'])['Total'].sum()
new_discover_sale_transaction['Total'].mean()

92.17494824016563

## Average Order Value of _New_ Customers who made their first purchase _without_ taking Discover

In [23]:
new_non_discover_sale_transaction = post_discover_sales[~post_discover_sales['Email'].isin(new_customers_test['Post Launch Emails'].unique())]
new_non_discover_sale_transaction['Total'].mean()

105.15647869674186
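
The Shopify export has one row per line item, with `Total` filled in only on the first row of each order, so `Total.mean()` skips the blank rows. A sketch that makes the order-level averaging explicit and compares both groups in one pass, assuming `Name` identifies an order (the data and the `discover_emails` set are made up):

```python
import pandas as pd

# Hypothetical export rows: one row per line item, Total only on the first row of an order
orders = pd.DataFrame({
    "Name": ["1001", "1001", "1002", "1003", "1003"],
    "Email": ["a@example.com", "a@example.com", "b@example.com",
              "c@example.com", "c@example.com"],
    "Total": [100.0, None, 60.0, 80.0, None],
})
discover_emails = {"a@example.com"}  # hypothetical set of Discover users

# Collapse to one row per order so each order is counted exactly once
per_order = orders.groupby("Name", as_index=False).agg(
    Email=("Email", "first"),
    Total=("Total", "max"),  # the single non-missing Total of the order
)
per_order["Group"] = per_order["Email"].isin(discover_emails).map(
    {True: "Discover", False: "No Discover"}
)

# Average order value per group
aov = per_order.groupby("Group")["Total"].mean()
print(aov)
```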

# Comparison

In [24]:
table_to_compare = pivot_discover_first
source_table_to_compare = sales_data_clean

In [25]:
multiindex_output = table_to_compare.set_index(keys=['email', 'SKU'])
multiindex_output

Unnamed: 0_level_0,Unnamed: 1_level_0,step_no
email,SKU,Unnamed: 2_level_1
007lavender@gmail.com,6002,result_0
007lavender@gmail.com,7670,result_1
007lavender@gmail.com,6200,result_2
007lavender@gmail.com,7740,result_3
007lavender@gmail.com,7870,result_4
007lavender@gmail.com,6130,result_5
007lavender@gmail.com,7690,result_6
007lavender@gmail.com,2750,result_7
007lavender@gmail.com,5700,result_8
007lavender@gmail.com,6240,result_9


In [26]:
check_presence = multiindex_output.isin(source_table_to_compare[['Email', 'Lineitem sku']])
check_presence[check_presence['step_no'] == True]

Unnamed: 0_level_0,Unnamed: 1_level_0,step_no
email,SKU,Unnamed: 2_level_1
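
The result above is empty because `DataFrame.isin`, when given another DataFrame, only compares cells whose index and column labels both match, and the two tables share neither. A sketch of one way to test (email, SKU) pairs instead, using a left merge with `indicator=True` on made-up data; the `Purchased?` column name is borrowed from a commented-out attempt elsewhere in this notebook, and casting SKUs to strings is an assumption about the export's dtypes:

```python
import pandas as pd

# Hypothetical long-format recommendations (same shape as pivot_discover_first)
recommended = pd.DataFrame({
    "email": ["a@example.com", "a@example.com", "b@example.com"],
    "SKU": [7820, 6240, 7740],
    "step_no": ["result_0", "result_1", "result_0"],
})
# Hypothetical purchased line items (same column names as sales_data_clean)
purchased = pd.DataFrame({
    "Email": ["a@example.com", "a@example.com", "b@example.com"],
    "Lineitem sku": ["7820", "6210", "1150"],
})

# Cast both SKU columns to the same dtype so that 7820 and "7820" compare equal
rec = recommended.assign(SKU=recommended["SKU"].astype(str))
bought = (
    purchased.rename(columns={"Email": "email", "Lineitem sku": "SKU"})
    .assign(SKU=lambda d: d["SKU"].astype(str))
    .drop_duplicates(["email", "SKU"])
)

# indicator=True adds a `_merge` column; "both" means the recommended pair was purchased
checked = rec.merge(bought, on=["email", "SKU"], how="left", indicator=True)
checked["Purchased?"] = checked["_merge"] == "both"
print(checked[["email", "SKU", "step_no", "Purchased?"]])
```

The mean of `Purchased?` per email would then give the share of recommended SKUs that each customer actually bought.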


### The following code examines the proportion of SKUs *actually* recommended to the customers
This helps to examine the trend of skin type (although it is better to get this from the raw data).
The `result_*` SKU columns in `discover_first` are "unpivoted" ("melted", in pandas terms) so that it is easier to search for the recommended SKUs in `sales_data_clean`.

In [10]:
# pivot_discover_first = discover_first.unstack(index='email', columns = ['result_0', 'result_1',
#       'result_2', 'result_3', 'result_4', 'result_5', 'result_6', 'result_7',
#       'result_8', 'result_9'])
pivot_discover_first = pd.melt(discover_first, 
                               id_vars='email',
                               value_vars=list(discover_first.columns[-11:-1]),
                               var_name='step_no',
                               value_name= 'SKU')

In [11]:
pivot_discover_first.sort_values(['email', 'step_no'], inplace=True)
# pivot_discover_first.to_excel("pivot_discover_first.xlsx")
# pivot_discover_first
discover_first.columns[-10:-1]
discover_first[discover_first['email'] == "yuleandra21@gmail.com"]
# same_day_purchase.groupby(by=['Email','Lineitem sku'])['Lineitem sku'].count()
#pivot_discover_first['Purchased?'] = pivot_discover_first['email'].map(same_day_purchase.set_index(['Email', 'Lineitem sku'])[])

'''Faced issues creating a frequency plot for SKUs under each email based on sales data
Abandoned using jupyter, tried on excel instead, also gave up subsequently
(Might need to use for loop with two filter criteria, but could not figure out how)'''
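
For the frequency table described in the note above, a loop may not be needed. A sketch using `pd.crosstab` and `pivot_table` on made-up line items (column names follow `sales_data_clean`; the values are hypothetical):

```python
import pandas as pd

# Hypothetical purchased line items
purchased = pd.DataFrame({
    "Email": ["a@example.com", "a@example.com", "a@example.com", "b@example.com"],
    "Lineitem sku": ["7820", "7820", "6210", "7740"],
    "Lineitem quantity": [1, 2, 1, 1],
})

# Number of line items per (Email, SKU): one row per customer, one column per SKU
line_counts = pd.crosstab(purchased["Email"], purchased["Lineitem sku"])

# Units bought per (Email, SKU), summing the quantity column instead of counting rows
units = purchased.pivot_table(
    index="Email",
    columns="Lineitem sku",
    values="Lineitem quantity",
    aggfunc="sum",
    fill_value=0,
)
print(line_counts)
print(units)
```

Each row of `line_counts` can then be plotted per customer, for example with `line_counts.loc[email].plot.bar()` if matplotlib is installed.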