## We pushed a change within our AI-coach, and we want to study its impact on long term retention as well as short term satisfaction.

### Suggest meaningful metric(s) to study this use case. How would you go about designing them? What are the set of requirements here?
There should be different metrics for the short term and for the long term satisfaction. 
For the short term some possible metrics could be:

    - Deviation of the mean usage time from the previous before/after the change
    - Distribution of usage
    - Rate of recommendation of the application within the social networks (freeletic network, NPS, Net Promoter Score)
    - Rate of mentions in social networks, like twitter, facebook or instagram and their interactions (like, comments, etc)
    - Rate of births and deaths in the AI-coach. The rate of new customer and the rate the customer give up with this application

For the long term some possible metrics could be:
    - Direct surveys to the customer. Users need some time to provide unbiased surveys that is why surveys are better in the long term. 
    - Subscription time to the service. The longer a user keep subscribe to the service the better will be their satisfaction.
    - Continuity of the subscription. Continuity in the subscription line of a customer will point with better levels of satisfaction.
    - Stable or growing usage of the AI-coach. Checking the historic records to check the variance of usage. Longer variances will mean more commitment with the training and coaching. It could be correlated with the subscription rate and continuity.
    

### Assuming we have traffic of X users/Day receiving coach weeks from the coach. What is a good way to test the hypothesis that the implemented measure retains users longer in their 4th calendar week?

One way to test this will be to sample the mean number of users within the first week and do the same the 4th week. Then we can compare whether the two means are significant similar with a <b>paired t-test</b>. Because we are talking about the same group (an hence there is a dependency) at different points of time, this seems the best option. 

### Assuming that retention in week 4 turns out to be at Y% for the group who saw the new treatment, and Z% who did not see the treatment: name, setup and design a test to validate the effectiveness of this measure.

A test to validate this measurement (the different retention percentage between the two groups) will be a <b>Welch's t-test</b>. It is similar to my suggestion to the case of study 1. This test is a good option when we have two different populations, with different variance and different population size like is the current case.  

We will have two hypothesis, the called null hypothesis and the alternative hypothesis.
The null hypothesis will be:

$H_0:$ the means of the retention the two groups are the same

and hence the alternative hypothesis will be:

$H_1:$ the means of the retention the two groups are the different

Then if the $t^*$ is below (within the distribution of the null hypothesis) than the t por the 5% of significant level ($t_{0.05}$) we could discard the null hypothesis and claim that the two different values are significant and likely due to the fact of having use the new treatment. In the other hand, if $t^*$ is not below that level the null hypothesis could not be rejected and hence we can not make any statement about the different retention percentages. This could be use as a confirmation that the retention is independent of the treatment and hence this has no impact on the customer experience.

### Implement the set of formulas that you would need to validate the hypothesis.


The set of formulas to check how significant are our results wihtin the distribution of the null hypothesis are:

$\Huge t^*=\frac{E[Y]-E[Z]}{\sqrt{\frac{s_{Y}^2}{N_Y}+\frac{s_{Z}^2}{N_Z}}}$

where:

$E[Y]$ and $E[Z]$ are the sample mean of each retention percentage 

$N_y$ and $N_z$ are the sample size of each population

$s_y$ and $s_z$ are the standard deviation of each population

The degrees of freedom are given by the formula:

$\Huge ddof=\frac{(\frac{s_{y}^2}{N_{y}}+\frac{s_{z}^2}{N_{z}})^2}{\frac{s_{y}^4}{N_{y}^2(N_{y}-1)}+\frac{s_{z}^4}{N_{z}^2(N_{z}-1)}}$



### Should we roll back this feature? Why? Why not? Please provide a set of numbers that you used for this decision. How confident are you in this decision?

This question has been addressed before. 

Then if the $t^*$ is below (within the distribution of the null hypothesis) than the t por the 5% of significant level ($t_{0.05}$) we could discard the null hypothesis and claim that the two different values are significant and likely due to the fact of having use the new treatment. In the other hand, if $t^*$ is not below that level the null hypothesis could not be rejected and hence we can not make any statement about the different retention percentages. This could be use as a confirmation that the retention is independent of the treatment and hence this has no impact on the customer experience.

In the last case it will be convenient to roll it back. 

### Assuming this feature was a success, how would you go about suggesting a new hypothesis to test? Can you provide a quantitative description to a qualitative dummy values to guide your thought process?

Because the new feature was a success (hurra!) increasing the retention of users after 4 weeks the team want to know what triggered the response of the users to join the new feature. Usually this is done base on advertisement. 
In the last news-letter the marketing department put the addvertisement of this feature without any picture that support the text. This made the users to join the treatment(coaching,feature) with a measured average rate of X users/day. 

In the next news-letter due to the impact in customer retention the bussiness intelligent suggest to give a bit more punch to the app and advertise it with a nice picture and some bold text. One week after this we have a measured joining rate of Y users/Day with Y>X.

It is hence a good marketing idea to make this add more appealing or is an extra effort that the users do not use for their choices?

To check this we have to perform also an statistics test. The welch test will be a good candidate for this.

In [2]:
# Neccessary imports
import math
import random
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.gridspec as grds
import scipy 
from scipy import signal as signal
import scipy.stats as stats

%matplotlib inline

In [38]:
#Code to give a quantitative description on the previous qualitative dummy problem

Total_Customers = 14e6
Percentage_of_users = 0.01 #Assuming that 1% of the users use the new feature
X = Total_Customers * Percentage_of_users 
Y = Total_Customers * (Percentage_of_users+ np.random.normal(loc=0,scale=Percentage_of_users*0.001,size=1)) #Assuming half of the customers acces the application

print 'X', X
print 'Y', Y[0]

#Lets model the two incomming distribution as a Poisson distributions
T = 7*24 #We observed the distributions for one week
                       
population_X = stats.poisson.rvs(loc=Percentage_of_users, mu=X, size=T)
population_Y = stats.poisson.rvs(loc=Percentage_of_users, mu=Y, size=T)
population_customers_arrivals = np.concatenate((population_X, population_Y))

                      
#Implementation of the Welch test in scipy
st,pvalue = stats.ttest_ind(a= population_X, b= population_Y, equal_var=False)  


if pvalue<0.05:
    print '*************************'
    print 'NULL HYPOTHESIS REJECTED'
    print '*************************'
    print 'Continue with the email campaign'
else:
    print '****************************'
    print 'NULL HYPOTHESIS NOT REJECTED'
    print '****************************'
    print 'Stop mail campaign and look for new marketing formulas'
        

X 140000.0
Y 139930.036106
*************************
NULL HYPOTHESIS REJECTED
*************************
Continue with the email campaign


### Optional: Is there a Bayesian way we can test our hypothesis? How does it possibly differ from the frequentist approach, both qualitatively and quantitatively.

Bayesian hypothesis testing works kind of similar to any bayesian inference problem. There is no needs of asymetry in the hypothesis (in the frequentist approach we test the null distribution but rejecting it does not imply that the other hypothesis will be true). It can be used with multiple hypothesis. 

In the previous case where we have two hypothesis $H_{0}$ and $H_{1}$ we will have:

$P(H_{0}|data)=\frac{P(data|H_{0})P(H_{0})}{P(data)}$

and

$P(H_{1}|data)=1-P(H_{0}|data)$


The probability of the data comes from the marginalization on the two division of the probability subspaces:

$P(data)=P(data|H_{0})P(H_{0})+P(data|H_{1})P(H_{1})$

The probabilites $P(data|H_{n})$ are the likelihood while the $P(H_{n})$ are the prior probabilities, that can be set to 0.5 in case of two hypothesis with np preference for any of them. In cases in which we do not have a strong belief on the priors one can always compute the Bayse factor (or likelihod ratio) that helps to quantify which hypotheis is more supported by a data set. 

Quantitatively we can update our posterior with any single new data. Qualitatively we need an expression for the likelyhood which is not always possible. 