# Key Performance Indicators: Measuring Business Success

## A/B Testing

- **A/B Testing:** Test different ideas against each other in the real world
- Choose the one that statistically performs better
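"Statistically performs better" usually means the difference in a KPI is unlikely to be due to chance alone. As one common option, the sketch below compares a conversion-rate KPI between a control and a test group with a two-proportion z-test, using only the standard library. The function name and the example counts are hypothetical, and a real test should fix its sample size and significance level before looking at the results.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of group A (control) and group B (test).

    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 120 of 1000 control users and 150 of 1000 test users convert
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```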

**Why A/B testing is important**
- No guessing: decisions rest on measured user behavior rather than opinion
- Provides answers quickly

**A/B Testing Process**
1. Develop a hypothesis about your product or business
2. **Randomly** assign users to two different groups
3. Expose:
    - Group 1 to the current product rules
    - Group 2 to a product that tests the hypothesis
4. Pick whichever performs better according to a set of KPIs
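Random assignment (step 2) is what makes the comparison in step 4 fair. A minimal sketch, assuming users are identified by IDs such as the `uid` values used below: shuffle the IDs with a seeded generator and split them in half, so every user is equally likely to land in either group. The helper name `assign_groups` is hypothetical.

```python
import random

def assign_groups(user_ids, seed=None):
    """Randomly split user IDs into a control group and a test group."""
    rng = random.Random(seed)  # a seeded generator makes the split reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # Group 1 keeps the current product rules, group 2 sees the change
    return shuffled[:half], shuffled[half:]

control, test = assign_groups(range(1, 11), seed=42)
print("control:", sorted(control))
print("test:   ", sorted(test))
```

In production systems, assignment is often done by hashing the user ID instead, so a user stays in the same group across sessions; the shuffle above is simply the easiest version to reason about for a fixed list of users.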


```python
import pandas as pd

# User demographics (one row per user) and in-app purchases (one row per purchase)
customer_data = pd.read_csv("./datasets/user_demographics_v1.csv")
app_purchases = pd.read_csv("./datasets/purchase_data_v1.csv")

print(customer_data.columns)
print(app_purchases.columns)
```

```
Index(['uid', 'reg_date', 'device', 'gender', 'country', 'age'], dtype='object')
Index(['date', 'uid', 'sku', 'price'], dtype='object')
```


**Merging Mechanics**

```python
# Attach each purchase to the buyer's demographics via the shared 'uid' key
uid_combined_data = app_purchases.merge(
    customer_data,   # right DataFrame
    how='inner',     # join type: keep only uids present in both tables
    on=['uid'])      # column(s) to match on

print(uid_combined_data.head())
```

```
         date       uid            sku  price              reg_date device  \
0  2017-07-10  41195147  sku_three_499    499  2017-06-26T00:00:00Z    and   
1  2017-07-15  41195147  sku_three_499    499  2017-06-26T00:00:00Z    and   
2  2017-11-12  41195147   sku_four_599    599  2017-06-26T00:00:00Z    and   
3  2017-09-26  91591874    sku_two_299    299  2017-01-05T00:00:00Z    and   
4  2017-12-01  91591874   sku_four_599    599  2017-01-05T00:00:00Z    and   

  gender country  age  
0      M     BRA   17  
1      M     BRA   17  
2      M     BRA   17  
3      M     TUR   17  
4      M     TUR   17  
```
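The `how` argument decides which rows survive the merge. A small sketch with hypothetical toy frames that mirror the `uid` key above: an inner join keeps only users present in both tables, while a left join keeps every purchase and fills missing demographics with `NaN`. Passing `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical toy data mirroring the purchase/demographics schema
purchases = pd.DataFrame({'uid': [1, 1, 2, 3], 'price': [499, 299, 599, 299]})
customers = pd.DataFrame({'uid': [1, 2, 4], 'country': ['BRA', 'TUR', 'USA']})

# Inner join: uid 3 (no demographics) and uid 4 (no purchases) are dropped
inner = purchases.merge(customers, on='uid', how='inner')

# Left join: every purchase is kept; uid 3 gets NaN for country
left = purchases.merge(customers, on='uid', how='left', indicator=True)

print(inner)
print(left)
```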


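```python
# customer_data has no 'date' column, so on=['uid', 'date'] raises a KeyError.
# When the key columns have different names, pair them with left_on/right_on.
# Note: 'date' and 'reg_date' are strings in different formats, so convert both
# to datetimes first if you expect rows to match.
uid_date_combined_data = app_purchases.merge(
    customer_data, left_on=['uid', 'date'], right_on=['uid', 'reg_date'], how='inner')

print(uid_date_combined_data.head())
```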
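Merging on several keys only matches rows where every key agrees, and the values must share a type. A sketch with hypothetical toy data, assuming the goal is to find purchases made on the day a user registered: both date columns are parsed with `pd.to_datetime`, and the UTC `Z` suffix on `reg_date` is removed with `.dt.tz_localize(None)` so it can be compared with the timezone-naive purchase dates.

```python
import pandas as pd

# Hypothetical toy data in the same string formats as the CSV files
purchases = pd.DataFrame({'uid': [1, 1, 2],
                          'date': ['2017-06-26', '2017-07-10', '2017-02-01']})
customers = pd.DataFrame({'uid': [1, 2],
                          'reg_date': ['2017-06-26T00:00:00Z', '2017-01-05T00:00:00Z']})

# Parse both date columns; drop the UTC timezone so the dtypes match
purchases['date'] = pd.to_datetime(purchases['date'])
customers['reg_date'] = pd.to_datetime(customers['reg_date']).dt.tz_localize(None)

# Match on uid AND on purchase date == registration date
same_day = purchases.merge(customers,
                           left_on=['uid', 'date'],
                           right_on=['uid', 'reg_date'],
                           how='inner')
print(same_day)
```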



**Group by**

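```python
# as_index=False keeps the group keys as ordinary columns
# (uid_combined_data replaces the undefined sub_data_demo)
sub_data_grp = uid_combined_data.groupby(by=['country', 'device'], as_index=False)

sub_data_grp.price.mean()                    # mean price per group
sub_data_grp.price.agg('mean')               # same result via agg
sub_data_grp.price.agg(['mean', 'median'])   # several statistics at once
sub_data_grp.agg({'price': ['mean', 'min', 'max'],   # per-column statistics
                  'age': ['mean', 'min', 'max']})

# custom_function is an illustrative placeholder: price range per group
def custom_function(x):
    return x.max() - x.min()

sub_data_grp.agg({'price': [custom_function]})
```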
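Dictionary-style `agg` produces two-level column labels such as `('price', 'mean')`. Named aggregation (available since pandas 0.25) instead yields flat, descriptive column names in one step. A sketch on hypothetical toy data with the same `country`, `device`, `price`, and `age` columns:

```python
import pandas as pd

# Hypothetical toy data with the columns used above
df = pd.DataFrame({
    'country': ['BRA', 'BRA', 'TUR', 'TUR', 'TUR'],
    'device':  ['and', 'and', 'and', 'iOS', 'iOS'],
    'price':   [499, 299, 599, 299, 499],
    'age':     [17, 21, 17, 30, 25],
})

# Each keyword becomes an output column: name=(input column, aggregation)
summary = df.groupby(['country', 'device'], as_index=False).agg(
    mean_price=('price', 'mean'),
    max_price=('price', 'max'),
    mean_age=('age', 'mean'),
)
print(summary)
```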

```python
# Single statistic for one column
purchase_price_mean = uid_combined_data.price.agg('mean')

print(purchase_price_mean)

# Several statistics for one column
purchase_price_summary = uid_combined_data.price.agg(['mean', 'median'])

print(purchase_price_summary)

# Several statistics for several columns
purchase_summary = uid_combined_data.agg({'price': ['mean', 'median'], 'age': ['mean', 'median']})

print(purchase_summary)
```

```
406.77259604707973
mean      406.772596
median    299.000000
Name: price, dtype: float64
             price        age
mean    406.772596  23.922274
median  299.000000  21.000000
```


```python
purchase_data = uid_combined_data

# Summarize price separately for each device/gender combination
grouped_purchase_data = purchase_data.groupby(by=['device', 'gender'])

purchase_summary = grouped_purchase_data.agg({'price': ['mean', 'median', 'std']})

print(purchase_summary)
```

```
                    price                   
                     mean median         std
device gender                               
and    F       400.747504    299  179.984378
       M       416.237308    499  195.001520
iOS    F       404.435330    299  181.524952
       M       405.272401    299  196.843197
```
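A grouped summary like this has a two-level row index (`device`, `gender`) and two-level column labels (`price` over each statistic). A sketch on hypothetical toy data showing how to select a single cell with tuples and how to flatten the columns, for example before exporting to CSV:

```python
import pandas as pd

# Hypothetical toy data with the columns used above
df = pd.DataFrame({
    'device': ['and', 'and', 'iOS', 'iOS'],
    'gender': ['F', 'M', 'F', 'F'],
    'price':  [299, 499, 299, 599],
})

summary = df.groupby(['device', 'gender']).agg({'price': ['mean', 'median']})

# Select one cell: row key (device, gender), column key (variable, statistic)
ios_f_mean = summary.loc[('iOS', 'F'), ('price', 'mean')]

# Flatten the two-level column labels, e.g. ('price', 'mean') -> 'price_mean'
summary.columns = ['_'.join(col) for col in summary.columns]
summary = summary.reset_index()

print(ios_f_mean)
print(summary)
```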


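```python
from datetime import timedelta

# Fix "today" so the analysis is reproducible
current_date = pd.Timestamp(2018, 3, 17)

# Users must have registered before this date to have a full 28-day window
max_purchase_date = current_date - timedelta(days=28)

# Parse both date columns; reg_date ends in 'Z' (UTC), so drop the timezone
# to compare it with the timezone-naive timestamps above
purchase_data['reg_date'] = pd.to_datetime(purchase_data['reg_date']).dt.tz_localize(None)
purchase_data['date'] = pd.to_datetime(purchase_data['date'])

# Keep only users whose first 28 days have fully elapsed
purchase_data_filt = purchase_data[purchase_data['reg_date'] < max_purchase_date]

# Keep only purchases made within 28 days of registration
purchase_data_filt = purchase_data_filt[
    purchase_data_filt['date'] <= purchase_data_filt['reg_date'] + timedelta(days=28)]

# Average price of purchases made in users' first 28 days
print(purchase_data_filt.price.mean())
```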
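The same two-step filter generalizes to any window length. A sketch of a hypothetical helper, `first_n_day_mean_price`, assuming `date` and `reg_date` are already timezone-naive datetime columns: it first keeps only users whose full window has elapsed by `current_date`, so every included user had the same amount of time to purchase, and then keeps only purchases made within `days` of registration.

```python
import pandas as pd

def first_n_day_mean_price(purchases, current_date, days=28):
    """Mean purchase price during each user's first `days` days after registration.

    Assumes 'date' and 'reg_date' are timezone-naive datetime columns.
    """
    window = pd.Timedelta(days=days)
    # Only users whose full window has already elapsed
    eligible = purchases[purchases['reg_date'] < current_date - window]
    # Only purchases inside that window
    in_window = eligible[eligible['date'] <= eligible['reg_date'] + window]
    return in_window['price'].mean()

# Hypothetical toy data: two purchases by an early user, one by a recent user
toy = pd.DataFrame({
    'reg_date': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-03-10']),
    'date':     pd.to_datetime(['2018-01-05', '2018-03-01', '2018-03-12']),
    'price':    [299, 599, 499],
})
print(first_n_day_mean_price(toy, pd.Timestamp(2018, 3, 17)))
print(first_n_day_mean_price(toy, pd.Timestamp(2018, 3, 17), days=60))
```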