In [1]:
import json
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller

In [2]:
import pandas as pd
import numpy as np
from plotly import graph_objects as go
from plotly import express as px

# Part 2 ‐ Experiment and metrics design
The neighboring cities of Gotham and Metropolis have complementary circadian rhythms: on
weekdays, Ultimate Gotham is most active at night, and Ultimate Metropolis is most active
during the day. On weekends, there is reasonable activity in both cities.
However, a toll bridge, with a two-way toll, between the two cities causes driver partners to tend
to be exclusive to each city. The Ultimate managers of city operations for the two cities have
proposed an experiment to encourage driver partners to be available in both cities, by
reimbursing all toll costs.
## 1) What would you choose as the key measure of success of this experiment in encouraging driver partners to serve both cities, and why would you choose this metric?

## 2) Describe a practical experiment you would design to compare the effectiveness of the proposed change in relation to the key measure of success. Please provide details on:

- a) how you will implement the experiment
- b) what statistical test(s) you will conduct to verify the significance of the observation
- c) how you would interpret the results and provide recommendations to the city operations team along with any caveats.

In [3]:
with open('ultimate_data_challenge.json', 'r') as f:
    data = json.load(f)

In [4]:
df = pd.DataFrame(data)

In [5]:
df.head()

| | city | trips_in_first_30_days | signup_date | avg_rating_of_driver | avg_surge | last_trip_date | phone | surge_pct | ultimate_black_user | weekday_pct | avg_dist | avg_rating_by_driver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | King's Landing | 4 | 2014-01-25 | 4.7 | 1.1 | 2014-06-17 | iPhone | 15.4 | True | 46.2 | 3.67 | 5.0 |
| 1 | Astapor | 0 | 2014-01-29 | 5.0 | 1.0 | 2014-05-05 | Android | 0.0 | False | 50.0 | 8.26 | 5.0 |
| 2 | Astapor | 3 | 2014-01-06 | 4.3 | 1.0 | 2014-01-07 | iPhone | 0.0 | False | 100.0 | 0.77 | 5.0 |
| 3 | King's Landing | 9 | 2014-01-10 | 4.6 | 1.14 | 2014-06-29 | iPhone | 20.0 | True | 80.0 | 2.36 | 4.9 |
| 4 | Winterfell | 14 | 2014-01-27 | 4.4 | 1.19 | 2014-03-15 | Android | 11.8 | False | 82.4 | 3.13 | 4.9 |


1) No column explicitly records whether a driver serves both cities, so the key measure of success has to be a proxy. Two features in the data might indicate that drivers are driving in multiple cities:
- avg_dist: the average distance in miles per trip taken in the first 30 days after signup. If the average distance is higher, then there is a higher chance that the driver is crossing between the two cities.
- trips_in_first_30_days: if users start taking more trips once there is a toll reimbursement, then this is good for Ultimate and is indicative that drivers may be driving across multiple cities.
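
As a rough illustration of how these two candidate metrics could be inspected, the sketch below rebuilds only the five rows shown in the `df.head()` output and averages both columns overall and per city. The names `sample` and `candidate_metrics` are introduced here for illustration; on the full dataset the same calls would apply to `df`.

```python
import pandas as pd

# The five rows from the df.head() output, restricted to the relevant columns.
sample = pd.DataFrame({
    "city": ["King's Landing", "Astapor", "Astapor", "King's Landing", "Winterfell"],
    "trips_in_first_30_days": [4, 0, 3, 9, 14],
    "avg_dist": [3.67, 8.26, 0.77, 2.36, 3.13],
})

# Hypothetical helper list naming the two proxy metrics discussed above.
candidate_metrics = ["avg_dist", "trips_in_first_30_days"]

# Mean of each candidate metric over all rows.
overall = sample[candidate_metrics].mean()

# Mean of each candidate metric within each city.
by_city = sample.groupby("city")[candidate_metrics].mean()

print(overall)
print(by_city)
```

Five rows are far too few to draw any conclusion; the point is only to show the shape of the summary that would later be compared between groups.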

2) 
- a) The experiment I would design is an A/B test. For the next 10,000 new drivers (an arbitrary, large sample), I would randomly assign the toll reimbursement feature in the Ultimate app: 5,000 new drivers would have the reimbursement feature (treatment) and the other 5,000 new drivers would not (control).

- b) The statistical test I would conduct checks whether there is a statistically significant difference in the mean trips_in_first_30_days between the two groups. <br>

The null hypothesis is that any difference between the control and treatment means is due to chance. <br>
$H_0: \mu_{control} = \mu_{treatment}$ <br>
$H_1: \mu_{control} \neq \mu_{treatment}$ <br>
We can pick $\alpha = 0.05$ and form a 95% confidence interval using bootstrap sampling of the control group. With large enough sample sizes, the means of the bootstrap samples will be approximately normally distributed around the mean of the control group. The 95% confidence interval is simply the range from the 2.5% quantile to the 97.5% quantile of those bootstrap means.
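
A minimal sketch of the percentile bootstrap described above, assuming the control group's `trips_in_first_30_days` values are available as an array. The function name `bootstrap_mean_ci`, the number of resamples, and the synthetic Poisson data are all assumptions made for illustration; no experimental data exists yet.

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    boot_means = np.empty(n_boot)
    for i in range(n_boot):
        # Resample the group with replacement and record the resample mean.
        boot_means[i] = rng.choice(values, size=n, replace=True).mean()
    # The interval runs from the alpha/2 to the 1 - alpha/2 quantile.
    low, high = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)

# Synthetic stand-in for 5,000 control drivers (assumed, not real data).
data_rng = np.random.default_rng(42)
control_trips = data_rng.poisson(lam=2.3, size=5000)

ci_low, ci_high = bootstrap_mean_ci(control_trips)
print(f"control mean: {control_trips.mean():.3f}")
print(f"95% bootstrap CI: ({ci_low:.3f}, {ci_high:.3f})")
```

The loop keeps memory use small; a vectorized version that draws all resamples at once would be faster but needs an `n_boot` by `n` array.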

- c) If the mean of the treatment group lies outside this interval, we would reject the null hypothesis that the treatment and control means are the same at the $\alpha = 0.05$ significance level.
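
The random assignment in a) and the decision rule in c) can be sketched as follows. The helper names `assign_groups` and `reject_null`, and the numbers passed to `reject_null`, are hypothetical and chosen only to exercise the logic.

```python
import numpy as np

def assign_groups(n_drivers=10_000, seed=0):
    """Return a boolean mask: True = treatment (tolls reimbursed), False = control."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_drivers)
    is_treatment = np.zeros(n_drivers, dtype=bool)
    # The first half of the shuffled driver indices goes to treatment.
    is_treatment[shuffled[: n_drivers // 2]] = True
    return is_treatment

def reject_null(treatment_mean, ci_low, ci_high):
    """Reject H0 when the treatment mean falls outside the control interval."""
    return not (ci_low <= treatment_mean <= ci_high)

is_treatment = assign_groups()
print("treatment:", int(is_treatment.sum()), "control:", int((~is_treatment).sum()))

# Hypothetical treatment mean and control interval, for illustration only.
print(reject_null(treatment_mean=2.9, ci_low=2.26, ci_high=2.34))
```

Shuffling indices and splitting them in half guarantees exactly equal group sizes, which independent coin flips per driver would not.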

