## Part 1

### Challenge:
AirBnB's webpage claims that professional photos raise both bookings and nightly price, so we should measure the number of reservations and the change in nightly price.

#### Samples:
The best option would be an A/B test on flats with similar characteristics that are also representative of the market average.

#### Metrics:
- Conversion rates (number of reservations).
- Click rates.
- Night price.

#### Hypothesis:
- Professional photos increase the number of views of the rental flat.
- As a result, the flat is rented more often.
- The host can then raise the price, as there is more demand.

#### Period:
The test should last at least 1 week; 1 month would be ideal, since it covers several weekends and shows whether they affect the metrics.
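
Once the test has run, the difference in conversion rate between the control and treatment groups could be checked with a two-proportion z-test. The sketch below is one possible way to do that and is not part of the original design; the function name `two_proportion_ztest` and all counts are hypothetical, chosen only for illustration.

```python
from math import sqrt, erfc


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts: A = current photos, B = professional photos
z, p_value = two_proportion_ztest(conv_a=120, n_a=2000, conv_b=150, n_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The same comparison would need a different test for nightly price, which is a continuous metric rather than a rate.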

## Part 2

The designed experiment is very simple. For a period of 5 days, all trips in 3 cities (Bravos, Pentos and Volantis) have been randomly assigned using linear or road distance:

- Trips whose trip_id starts with digits 0-8 were assigned using road distance.
- Trips whose trip_id starts with digits 9-f were assigned using linear distance.

### Data description
The collected data is available in this link. Each object represents a vehicle_interval with the following attributes:

- duration: how long the interval lasts, in seconds.
- distance: how far the vehicle moved in this interval, in meters.
- started_at: when the interval started, UTC Time.
- trip_id: uniquely identifies the trip.
- vehicle_id: uniquely identifies the vehicle.
- rider_id: uniquely identifies the rider. (Not found in the data)
- city_id: one of bravos, pentos or volantis.
- type: can be going_to_pickup, waiting_for_rider or driving_to_destination.


## Challenge
Try to answer the following questions:

- Should the company move towards road distance? What's the max price it would make sense to pay per query? (make all the assumptions you need, and make them explicit)
- How would you improve the experimental design? Would you collect any additional data?

In [1]:
import numpy as np, pandas as pd


In [2]:
data = pd.read_json('data/intervals_challenge.json', lines=True)
original_data = data.copy()  # keep an untouched copy without re-reading the file
data.head()

Unnamed: 0,duration,distance,started_at,trip_id,vehicle_id,city_id,type
0,857,5384,2016-10-03 13:00:00.286999941,c00cee6963e0dc66e50e271239426914,52d38cf1a3240d5cbdcf730f2d9a47d6,pentos,driving_to_destination
1,245,1248,2016-10-03 13:00:00.852999926,427425e1f4318ca2461168bdd6e4fcbd,8336b28f24c3e7a1e3d582073b164895,volantis,going_to_pickup
2,1249,5847,2016-10-03 13:00:01.670000076,757867f6d7c00ef92a65bfaa3895943f,8885c59374cc539163e83f01ed59fd16,pentos,driving_to_destination
3,471,2585,2016-10-03 13:00:01.841000080,d09d1301d361f7359d0d936557d10f89,81b63920454f70b6755a494e3b28b3a7,bravos,going_to_pickup
4,182,743,2016-10-03 13:00:01.970000029,00f20a701f0ec2519353ef3ffaf75068,b73030977cbad61c9db55418909864fa,pentos,going_to_pickup


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 165170 entries, 0 to 165169
Data columns (total 7 columns):
duration      165170 non-null object
distance      165170 non-null object
started_at    165170 non-null datetime64[ns]
trip_id       165170 non-null object
vehicle_id    165170 non-null object
city_id       165170 non-null object
type          165170 non-null object
dtypes: datetime64[ns](1), object(6)
memory usage: 8.8+ MB


In [4]:
data.shape

(165170, 7)

#### We only need the going_to_pickup intervals, so we can filter for them right away:

In [5]:
data.type.value_counts()

going_to_pickup           58510
waiting_for_rider         53746
driving_to_destination    52914
Name: type, dtype: int64

In [6]:
data_pickup = data[data.type == 'going_to_pickup'].copy()

In [7]:
data_pickup.head()

Unnamed: 0,duration,distance,started_at,trip_id,vehicle_id,city_id,type
1,245,1248,2016-10-03 13:00:00.852999926,427425e1f4318ca2461168bdd6e4fcbd,8336b28f24c3e7a1e3d582073b164895,volantis,going_to_pickup
3,471,2585,2016-10-03 13:00:01.841000080,d09d1301d361f7359d0d936557d10f89,81b63920454f70b6755a494e3b28b3a7,bravos,going_to_pickup
4,182,743,2016-10-03 13:00:01.970000029,00f20a701f0ec2519353ef3ffaf75068,b73030977cbad61c9db55418909864fa,pentos,going_to_pickup
5,599,1351,2016-10-03 13:00:02.154000044,158e7bc8d42e1d8c94767b00c8f89568,126e868fb282852c2fa95d88878686bf,volantis,going_to_pickup
9,1525,2674,2016-10-03 13:00:05.637000084,d3e6e8fb50c02d66feca2c60830c4fcc,b0906e917dc5cc0bcba190fd80079a74,bravos,going_to_pickup


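#### Define a function to separate road trips from linear trips

In [8]:
def find_type(value):
    # Per the experiment design, trip_ids starting with 0-8 used road distance
    # and trip_ids starting with 9-f used linear distance.
    road_prefixes = tuple(str(i) for i in range(0, 9))
    if value.startswith(road_prefixes):
        return "road"
    return "linear"

data_pickup["trip_type"] = data_pickup["trip_id"].apply(find_type)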
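
The same labelling can also be done without a row-wise `apply`, using pandas string methods on the first character of `trip_id`. This is only a sketch of an alternative; the `trips` frame and its ids are hypothetical stand-ins for `data_pickup`.

```python
import pandas as pd

# Hypothetical trip_ids, for illustration only
trips = pd.DataFrame({"trip_id": ["0a1b", "8fff", "9c00", "f123", "c00c"]})

# First character decides the arm: 0-8 -> road, 9-f -> linear
first_char = trips["trip_id"].str[0]
is_road = first_char.isin(list("012345678"))
trips["trip_type"] = is_road.map({True: "road", False: "linear"})

print(trips)
```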

In [9]:
data_pickup.head()

Unnamed: 0,duration,distance,started_at,trip_id,vehicle_id,city_id,type,trip_type
1,245,1248,2016-10-03 13:00:00.852999926,427425e1f4318ca2461168bdd6e4fcbd,8336b28f24c3e7a1e3d582073b164895,volantis,going_to_pickup,road
3,471,2585,2016-10-03 13:00:01.841000080,d09d1301d361f7359d0d936557d10f89,81b63920454f70b6755a494e3b28b3a7,bravos,going_to_pickup,linear
4,182,743,2016-10-03 13:00:01.970000029,00f20a701f0ec2519353ef3ffaf75068,b73030977cbad61c9db55418909864fa,pentos,going_to_pickup,road
5,599,1351,2016-10-03 13:00:02.154000044,158e7bc8d42e1d8c94767b00c8f89568,126e868fb282852c2fa95d88878686bf,volantis,going_to_pickup,road
9,1525,2674,2016-10-03 13:00:05.637000084,d3e6e8fb50c02d66feca2c60830c4fcc,b0906e917dc5cc0bcba190fd80079a74,bravos,going_to_pickup,linear


In [10]:
data_pickup.trip_type.value_counts()

road      33171
linear    25339
Name: trip_type, dtype: int64

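Road trips are more common than linear trips. This follows from the design: 9 of the 16 possible leading hex digits (0-8) map to road distance and only 7 (9-f) map to linear distance, so roughly 56% of trips are expected to be road.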
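
As a rough sanity check on the randomization, the observed road share can be compared with the share implied by the trip_id rule. The sketch below assumes the leading hex digit of `trip_id` is uniformly distributed, which the source does not state; it also treats each row as independent, although rows are intervals and a trip may appear more than once, so the z-score is only indicative.

```python
from math import sqrt

# Counts taken from the value_counts() output above
road, linear = 33171, 25339
n = road + linear

# Assumption: leading hex digits are uniform, so 9 of 16 digits map to road
expected_share = 9 / 16
observed_share = road / n

# Standard error of a proportion under the expected share
se = sqrt(expected_share * (1 - expected_share) / n)
z = (observed_share - expected_share) / se

print(f"expected {expected_share:.4f}, observed {observed_share:.4f}, z = {z:.2f}")
```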

In [11]:
data_pickup.drop(columns= ['trip_id', 'type'], inplace = True)

In [12]:
data_pickup.head()

Unnamed: 0,duration,distance,started_at,vehicle_id,city_id,trip_type
1,245,1248,2016-10-03 13:00:00.852999926,8336b28f24c3e7a1e3d582073b164895,volantis,road
3,471,2585,2016-10-03 13:00:01.841000080,81b63920454f70b6755a494e3b28b3a7,bravos,linear
4,182,743,2016-10-03 13:00:01.970000029,b73030977cbad61c9db55418909864fa,pentos,road
5,599,1351,2016-10-03 13:00:02.154000044,126e868fb282852c2fa95d88878686bf,volantis,road
9,1525,2674,2016-10-03 13:00:05.637000084,b0906e917dc5cc0bcba190fd80079a74,bravos,linear


In [13]:
data_pickup.isnull().any()

duration      False
distance      False
started_at    False
vehicle_id    False
city_id       False
trip_type     False
dtype: bool

In [14]:
data_pickup.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 58510 entries, 1 to 165169
Data columns (total 6 columns):
duration      58510 non-null object
distance      58510 non-null object
started_at    58510 non-null datetime64[ns]
vehicle_id    58510 non-null object
city_id       58510 non-null object
trip_type     58510 non-null object
dtypes: datetime64[ns](1), object(5)
memory usage: 3.1+ MB


In [15]:
data_pickup.sort_values(by= 'duration').duration.unique()

TypeError: '<' not supported between instances of 'str' and 'int'
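
The error arises because `duration` is an object column that mixes integers with the string `'NA'`. One way to handle this, shown here as a sketch rather than the approach taken in the notebook, is to coerce both columns to numeric so that `'NA'` becomes `NaN`, then keep only strictly positive values. The `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical rows mimicking the mixed int / 'NA' columns
sample = pd.DataFrame({
    "duration": [245, "NA", 0, 471],
    "distance": [1248, "NA", 0, 2585],
})

# errors="coerce" turns anything non-numeric (here 'NA') into NaN
for col in ["duration", "distance"]:
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# NaN compares as False, so this drops both missing and zero values
clean = sample[(sample["duration"] > 0) & (sample["distance"] > 0)]

print(sample.dtypes)
print(clean)
```

After the conversion the columns are numeric, so sorting and arithmetic no longer raise a `TypeError`.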

In [17]:
print('duration NA: ', data_pickup[data_pickup.duration == 'NA'].duration.count())
print('duration 0: ', data_pickup[data_pickup.duration == 0].duration.count())
print('distance NA: ', data_pickup[data_pickup.distance == 'NA'].distance.count())
print('distance 0: ', data_pickup[data_pickup.distance == 0].distance.count())


duration NA:  299
duration 0:  22
distance NA:  299
distance 0:  1421


In [18]:
data_pickup.drop(data_pickup[(data_pickup.duration == 'NA') |
                             (data_pickup.duration == 0) |
                             (data_pickup.distance == 'NA') |
                             (data_pickup.distance == 0)].index,
                 axis=0, inplace=True)

In [19]:
# reset_index returns a new DataFrame; without assignment, data_pickup keeps its original index
data_pickup.reset_index(drop = True)

Unnamed: 0,duration,distance,started_at,vehicle_id,city_id,trip_type
0,245,1248,2016-10-03 13:00:00.852999926,8336b28f24c3e7a1e3d582073b164895,volantis,road
1,471,2585,2016-10-03 13:00:01.841000080,81b63920454f70b6755a494e3b28b3a7,bravos,linear
2,182,743,2016-10-03 13:00:01.970000029,b73030977cbad61c9db55418909864fa,pentos,road
3,599,1351,2016-10-03 13:00:02.154000044,126e868fb282852c2fa95d88878686bf,volantis,road
4,1525,2674,2016-10-03 13:00:05.637000084,b0906e917dc5cc0bcba190fd80079a74,bravos,linear
...,...,...,...,...,...,...
56773,135,263,2016-10-04 20:33:03.627000093,2c8fd89e1efbb6f207855c453760a4a1,pentos,linear
56774,182,222,2016-10-04 20:33:03.806999922,7a7b7c6ae0671ea05ac039fe4d36874a,volantis,road
56775,77,285,2016-10-04 20:33:28.573999882,57aa029c4c246d713ed03a75cc72700a,volantis,road
56776,17,88,2016-10-04 20:33:49.171999931,05f5cb64e8bfc417e56437e6324d367a,bravos,road


In [20]:
# Average speed over the interval, in meters per second
data_pickup['dist_dur'] = data_pickup.distance / data_pickup.duration
data_pickup.head()

Unnamed: 0,duration,distance,started_at,vehicle_id,city_id,trip_type,dist_dur
1,245,1248,2016-10-03 13:00:00.852999926,8336b28f24c3e7a1e3d582073b164895,volantis,road,5.09388
3,471,2585,2016-10-03 13:00:01.841000080,81b63920454f70b6755a494e3b28b3a7,bravos,linear,5.48832
4,182,743,2016-10-03 13:00:01.970000029,b73030977cbad61c9db55418909864fa,pentos,road,4.08242
5,599,1351,2016-10-03 13:00:02.154000044,126e868fb282852c2fa95d88878686bf,volantis,road,2.25543
9,1525,2674,2016-10-03 13:00:05.637000084,b0906e917dc5cc0bcba190fd80079a74,bravos,linear,1.75344
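
A natural next step for the challenge question is to compare pickup duration and `dist_dur` between the two arms, city by city. The sketch below shows one possible summary using `groupby`; the `toy` frame and its values are hypothetical, and the real comparison would run on the cleaned `data_pickup` once its columns are numeric.

```python
import pandas as pd

# Hypothetical rows with the same columns as data_pickup
toy = pd.DataFrame({
    "city_id": ["bravos", "bravos", "pentos", "pentos", "bravos", "pentos"],
    "trip_type": ["road", "linear", "road", "linear", "road", "linear"],
    "duration": [200, 260, 180, 210, 240, 190],
    "distance": [1000, 1100, 900, 950, 1200, 800],
})
toy["dist_dur"] = toy["distance"] / toy["duration"]

# Mean, median and row count per city and experiment arm
summary = (
    toy.groupby(["city_id", "trip_type"])[["duration", "dist_dur"]]
       .agg(["mean", "median", "count"])
)

print(summary)
```

Reporting the median next to the mean helps because interval durations are likely to be skewed by a few very long pickups.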
