## Set up

```python
import math

import pandas as pd
import plotly.express as px
import scipy.stats

DATA_PATH = 'dataset/raw/marketing_AB.csv'
```

```python
marketing_data = pd.read_csv(DATA_PATH)
```

## Setting Up the Problem

1. Experiment goal: measure whether the conversion rate is higher when users are exposed to ads rather than to public service announcements (PSAs).
2. Metric: conversion rate.
3. Variants:
   - A. Control: users who see only public service announcements.
   - B. Treatment: users who see ads.
4. Hypotheses:

$$ H_0 : p_{\text{treatment}} \le p_{\text{control}} $$
$$ H_1 : p_{\text{treatment}} > p_{\text{control}} $$


## Experiment Design

1. Randomization unit: user (identified by `user id`)
2. Target of randomization: users
3. Sample size: because the group sizes in the dataset do not match the expected sample ratio, we draw a new random sample of 501 users per group. This size follows from these statistical properties:
   - A. Significance level (α): 0.05
   - B. Power (1 - β): 0.8
   - C. Population standard deviation (σ): 0.0564465
   - D. Minimum difference between control and treatment (θ): 0.01
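
For reference, the per-group sample size in the next cell follows the standard formula for a two-sample comparison with equal group sizes, where $z_q$ denotes the $q$ quantile of the standard normal distribution. Plugging in the values listed above:

$$ n = \left\lceil \frac{2\sigma^2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\theta^2} \right\rceil $$
$$ n = \left\lceil \frac{2(0.0564465)^2(1.95996 + 0.84162)^2}{0.01^2} \right\rceil = \lceil 500.2 \rceil = 501 $$

Because $H_1$ is one-sided, a design based on $z_{1-\alpha}$ instead of $z_{1-\alpha/2}$ would also be defensible and would require fewer users; the two-sided quantile used here is the more conservative choice.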
    

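```python
# Calculate the minimal sample size per group
alpha = 0.05         # significance level
power_level = 0.8    # power, 1 - beta
difference = 0.01    # minimum difference to detect (theta)
std_dev = 0.0564465  # population standard deviation (sigma)

# Standard normal quantiles (these equal the previously hard-coded 1.95996 and 0.84162)
z_alpha_div2 = scipy.stats.norm.ppf(1 - alpha / 2)
z_power_level = scipy.stats.norm.ppf(power_level)

numerator = 2 * (std_dev ** 2) * ((z_alpha_div2 + z_power_level) ** 2)
minimal_sample = math.ceil(numerator / (difference ** 2))
minimal_sample
```

```
501
```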
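The same calculation can be wrapped in a small reusable function so the inputs are explicit. This is a sketch that uses only Python's standard library (`statistics.NormalDist`) instead of SciPy; the name `min_sample_size` and its `two_sided` flag are illustrative and not part of the original notebook.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma: float, delta: float, alpha: float = 0.05,
                    power: float = 0.8, two_sided: bool = True) -> int:
    """Per-group sample size for a two-sample z-test with equal group sizes."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Critical value: z_{1-alpha/2} for a two-sided test, z_{1-alpha} for a one-sided test
    z_alpha = std_normal.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = std_normal.inv_cdf(power)  # z_{1-beta}
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)  # round up to a whole number of users


print(min_sample_size(0.0564465, 0.01))                   # prints 501
print(min_sample_size(0.0564465, 0.01, two_sided=False))  # prints 394
```

The two-sided call reproduces the 501 used in this experiment; the one-sided call shows how much the requirement shrinks when the critical value matches the one-sided alternative hypothesis.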

## Analysis and Interpretation

### Sanity Check

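```python
# Missing values per column
marketing_data.isnull().sum()
```

```
Unnamed: 0       0
user id          0
test group       0
converted        0
total ads        0
most ads day     0
most ads hour    0
dtype: int64
```

```python
# Fully duplicated rows
marketing_data.duplicated().sum()
```

```
0
```

```python
# Users that appear more than once
marketing_data['user id'].duplicated().sum()
```

```
0
```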

### Exploratory (First) Data Analysis

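```python
# Draw 501 users at random from each group.
# No random_state is set, so re-running this cell gives a different sample.
control_sample = marketing_data.loc[marketing_data['test group'] == 'psa'].sample(501)
experiment_sample = marketing_data.loc[marketing_data['test group'] == 'ad'].sample(501)
```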
#### Control Group Conversion Rate

```python
# Count converted (True) vs. not converted (False) users in the control sample
proportion_control = (control_sample['converted']
                      .value_counts()
                      .rename_axis('category')
                      .reset_index(name='count'))

viz_control = px.pie(proportion_control, values='count', names='category', template='seaborn')
viz_control.update_layout(title='Conversion on Control Group')
```

#### Treatment Group Conversion Rate

```python
experiment_sample_count = experiment_sample['converted'].value_counts()
experiment_sample_count
```

```
False    479
True      22
Name: converted, dtype: int64
```

```python
# Same pie chart for the treatment (ads) sample
proportion_experiment = (experiment_sample_count
                         .rename_axis('category')
                         .reset_index(name='count'))

viz_treat = px.pie(proportion_experiment, values='count', names='category', template='seaborn')
viz_treat.update_layout(title='Conversion on Experiment Group')
```

### Sample Ratio Mismatch (SRM) Check
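A sample ratio mismatch check compares the observed number of users per group with the split the experiment was designed for, using a chi-square goodness-of-fit test. The sketch below assumes two groups and a designed 50/50 split; both are assumptions, as are the helper name `srm_check` and the 0.001 threshold (a stricter threshold than the test's α is a common choice for SRM alerts). With two groups the test has one degree of freedom, so the p-value can be computed with `math.erfc` and no SciPy is needed.

```python
import math


def srm_check(n_control: int, n_treatment: int,
              expected_control_share: float = 0.5,
              threshold: float = 0.001) -> tuple[float, float, bool]:
    """Chi-square goodness-of-fit test for a two-group sample ratio mismatch.

    Returns (chi-square statistic, p-value, mismatch flag).
    """
    total = n_control + n_treatment
    expected = [total * expected_control_share, total * (1 - expected_control_share)]
    observed = [n_control, n_treatment]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Two groups -> 1 degree of freedom: P(chi2_1 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < threshold


print(srm_check(501, 501))  # prints (0.0, 1.0, False): balanced, no mismatch
print(srm_check(600, 400))  # 60/40 split against a 50/50 design: flagged as a mismatch
```

Applied to the full dataset, the inputs would be the `psa` and `ad` counts from `marketing_data['test group'].value_counts()`. The re-drawn 501/501 samples pass trivially, because their sizes are fixed by construction.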



### Hypothesis Testing

```python
from statsmodels.stats.proportion import proportions_ztest

# Sample size of each variant (treatment first, control second)
total_sample = [len(experiment_sample), len(control_sample)]

# Number of converted users in each sample ('converted' is boolean)
count_convert_control = int(control_sample['converted'].sum())
count_convert_treatment = int(experiment_sample['converted'].sum())

converteds = [count_convert_treatment, count_convert_control]
```

```python
# One-sided test: H1 says the treatment proportion is larger than the control proportion
z_score, p_value = proportions_ztest(count=converteds,
                                     nobs=total_sample,
                                     alternative='larger')
p_value
```

```
0.008849698583050033
```
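To make the test statistic less of a black box, here is a sketch of the pooled two-proportion z-test written with the standard library only. It is meant to mirror `proportions_ztest(..., alternative='larger')` with its default settings, which pool the two proportions under $H_0$; treat that equivalence as an assumption to check against your statsmodels version. `two_proportion_ztest` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(x_treat: int, n_treat: int,
                         x_ctrl: int, n_ctrl: int) -> tuple[float, float]:
    """One-sided pooled z-test of H1: p_treat > p_ctrl. Returns (z, p-value)."""
    p_treat = x_treat / n_treat
    p_ctrl = x_ctrl / n_ctrl
    # Under H0 both groups share one conversion rate, so pool the counts
    p_pool = (x_treat + x_ctrl) / (n_treat + n_ctrl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_treat + 1 / n_ctrl))
    z = (p_treat - p_ctrl) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value
```

Called with the notebook's `count_convert_treatment`, `count_convert_control` and the two sample sizes, it should reproduce the z statistic and p-value reported by `proportions_ztest`.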

#### Conclusion
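We draw the conclusion from the test result in two equivalent ways:
1. P-value: reject $H_0$ if the p-value is below α.
2. Z statistic: reject $H_0$ if the z statistic exceeds the one-sided critical value $z_{1-\alpha}$.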

```python
from scipy import stats

# Rejection based on the critical value
alpha = 0.05  # one-tailed test
z_crit = stats.norm.ppf(1 - alpha)

if z_score < z_crit:
    print(f'Z statistic Result :{z_score} While Z Criterion :{z_crit}.Fail to Reject H0')
else:
    print(f'Z statistic Result :{z_score} While Z Criterion :{z_crit}.Reject H0')

# Rejection based on the p-value
if p_value > alpha:
    print(f'P Value from Statistical Test  :{p_value} While alpha :{alpha}.Fail to Reject H0')
else:
    print(f'P Value from Statistical Test  :{p_value} While alpha :{alpha}.Reject H0')
```

```
Z statistic Result :2.371847439669594 While Z Criterion :1.6448536269514722.Reject H0
P Value from Statistical Test  :0.008849698583050033 While alpha :0.05.Reject H0
```


### Confidence Interval for the Difference in Conversion Rates

```python
from statsmodels.stats.proportion import confint_proportions_2indep

# 95% confidence interval for p_treatment - p_control
confidence_interval = confint_proportions_2indep(count1=count_convert_treatment, nobs1=len(experiment_sample),
                                                 count2=count_convert_control, nobs2=len(control_sample),
                                                 compare='diff', alpha=0.05)
print(confidence_interval)
```

```
(0.004326979744276309, 0.049227370708556026)
```


Based on this, we are 95% confident that the difference in conversion rate between the treatment and control groups lies between 0.0043 and 0.0492 (0.43 to 4.92 percentage points).
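As a readable illustration of what such an interval is, here is a sketch of the simple Wald (normal-approximation, unpooled) interval for a difference of two proportions. This is a swapped-in method: statsmodels' `confint_proportions_2indep` defaults to Newcombe's score-based interval for `compare='diff'`, so this sketch will not reproduce the printed bounds exactly. `wald_diff_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def wald_diff_ci(x_treat: int, n_treat: int, x_ctrl: int, n_ctrl: int,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Wald (unpooled normal-approximation) CI for p_treat - p_ctrl."""
    p_treat = x_treat / n_treat
    p_ctrl = x_ctrl / n_ctrl
    diff = p_treat - p_ctrl
    # Unpooled standard error: no common rate is assumed when estimating an interval
    se = math.sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return diff - z * se, diff + z * se
```

The interval is centred on the observed difference; score-based methods such as Newcombe's behave better when conversion rates are close to 0, as they are here.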



## Conclusion and Recommendation

### Statistically Significant and Practically Significant


|                      | Conversion Rate (Treatment) | Conversion Rate (Control) | Difference (percentage points) | 95% Confidence Interval (percentage points) |
|----------------------|-----------------------------|---------------------------|--------------------------------|---------------------------------------------|
| Treatment vs Control | 4.4%                        | 1.8%                      | 2.6                            | 0.43 to 4.92                                |

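Based on our experiment, the statistical result shows that the treatment (ads) yields a higher conversion rate. However, the result is not conclusively practically significant: the lower bound of the confidence interval (0.43 percentage points) is below the 1-percentage-point minimum difference (θ = 0.01) that we set as practically meaningful, so the true lift could be smaller than that threshold.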
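The decision logic in this paragraph can be written out explicitly. A minimal sketch, assuming the practical-significance threshold is the minimum difference θ = 0.01 from the sample-size calculation; the helper name `classify_result` and its labels are illustrative. It compares the whole confidence interval, not just the point estimate, against ±threshold.

```python
def classify_result(ci_low: float, ci_high: float, threshold: float) -> tuple[bool, str]:
    """Classify a CI for (treatment - control) against a practical-significance threshold."""
    # Statistically significant when the interval excludes zero
    stat_sig = ci_low > 0 or ci_high < 0
    if ci_low >= threshold or ci_high <= -threshold:
        practical = "practically significant"      # whole CI beyond +/- threshold
    elif -threshold < ci_low and ci_high < threshold:
        practical = "not practically significant"  # whole CI inside +/- threshold
    else:
        practical = "inconclusive"                 # CI straddles the threshold
    return stat_sig, practical


print(classify_result(0.004326979744276309, 0.049227370708556026, 0.01))
# prints (True, 'inconclusive')
```

With the interval computed above, the result is statistically significant while its practical significance remains inconclusive, which matches the conclusion stated here.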

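```python
# Plot the observed difference with its 95% CI against the practical-significance bounds (+/- theta)
diff = count_convert_treatment / len(experiment_sample) - count_convert_control / len(control_sample)
ci_low, ci_high = confidence_interval

fig = px.scatter(x=[diff], y=['Difference'],
                 error_x=[ci_high - diff], error_x_minus=[diff - ci_low],
                 template='seaborn')
fig.add_vline(x=0.01, line_width=3, line_dash="dash", line_color="green")   # +theta
fig.add_vline(x=-0.01, line_width=3, line_dash="dash", line_color="green")  # -theta
fig.add_vline(x=0, line_width=3, line_dash="dash", line_color="red")        # no difference
fig.update_layout(title='Practical Significance',
                  xaxis=dict(range=[-0.02, 0.05], title='Difference in Conversion Rate'),
                  yaxis=dict(title=''))
```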