# Online Guesthouse Booking Experience

**mentor:** ZiXia

**author:** Cheng Hou

---
## Project Instructions

This project measures the online guesthouse booking experience, which works as follows: a guest finds an available room (listing) that she likes and then contacts the host. There are three ways to send an inquiry (a contact, a booking request, or an instant book), described in the Reminder below. Upon receiving the inquiry, the host decides whether or not to say yes to the request (accept it). A host may decline a guest for many reasons. Some are logistical, e.g. the dates do not work in the calendar, and some are more personal, e.g. the guest seems risky. Our goal is to help guests maximize their likelihood of being accepted by the hosts they contact.

Suppose we run an experiment where we require the guest to write a message that is at least 140 characters long to explain why he or she is interested in staying with the host, and we run this as a 50-50 experiment (50% in treatment, 50% in control). We then look at data on the contacts and bookings of users in the treatment group compared to the control group. We are interested in what happens to the experience of contacting and booking a place on Airbnb when the guest is required to write a message like this. We are also looking for suggestions for evaluating the future of this change. Should we launch the experiment to everyone or stop it? How would you explain the results and the decision to someone who was not highly technical?
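To make the design concrete, here is a minimal sketch of the rule under test. The function names (`assign_group`, `can_send_inquiry`) are hypothetical, and hashing the user id is only one common way to get a stable 50-50 split; the source does not say how users were actually assigned.

```python
import hashlib

MIN_MESSAGE_LENGTH = 140  # threshold stated in the experiment description


def assign_group(user_id: str) -> str:
    """Hypothetical deterministic 50-50 split: hash the id and use its parity."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"


def can_send_inquiry(message: str, group: str) -> bool:
    """Treatment guests need at least 140 characters; control guests are unrestricted."""
    if group == "treatment":
        return len(message) >= MIN_MESSAGE_LENGTH
    return True


# Usage: a short note passes in control but is blocked in treatment.
note = "Hi! Is the room free next weekend?"
print(can_send_inquiry(note, "control"))    # True
print(can_send_inquiry(note, "treatment"))  # False
```

Hashing keeps a user in the same group across sessions, which is what makes a user-level comparison meaningful.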

Using the (fabricated) experiment assignment data and the contact and booking data attached, please provide an analysis and a write-up that answer these questions.

### Reminder

There are three ways to book a place:

1. contact_me : The guest writes a message to the host to inquire about the listing. The host can pre-approve the guest to book the place, reject the guest, or reply with a free-text message that neither accepts nor rejects. If the host pre-approves, the guest can then click to make the booking.
2. book_it : The guest puts money down to book the place directly, but the host has to accept the reservation request. If the host accepts, the booking happens automatically.
3. instant_book : The guest books the listing directly, without the host having to actively accept or reject (the host auto-accepts).
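The three channels differ mainly in whether the host has to act before a booking exists. A small sketch encoding the steps as described above (the `Channel` enum and the `booking_steps` and `needs_host_action` helpers are illustrative names, not part of the data):

```python
from enum import Enum
from typing import List


class Channel(str, Enum):
    CONTACT_ME = "contact_me"
    BOOK_IT = "book_it"
    INSTANT_BOOK = "instant_book"


def booking_steps(channel: Channel) -> List[str]:
    """Steps from inquiry to booking, paraphrasing the three channel descriptions."""
    if channel is Channel.CONTACT_ME:
        return ["guest messages host",
                "host pre-approves (or rejects, or replies with free text)",
                "guest clicks to book"]
    if channel is Channel.BOOK_IT:
        return ["guest puts money down (reservation request)",
                "host accepts",
                "booking happens automatically"]
    return ["guest books directly (host auto-accepts)"]


def needs_host_action(channel: Channel) -> bool:
    """Only instant_book skips an explicit host decision."""
    return channel is not Channel.INSTANT_BOOK


# The enum values are the strings used in the contact channel column of the data.
print(needs_host_action(Channel("book_it")))  # True
```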

In [1]:
```python
import os
import numpy as np
import pandas as pd
```

---
## Data

**Assignments** : contains a row for every time that a user gets assigned to an experimental group.
* id_user : random id of the user.

* ab : The experimental group the user is assigned to (treatment or control).

In [8]:
```python
assignments = pd.read_csv('takehome_assignments.csv')
print('There are {} records in the original data.'.format(len(assignments)))
assignments.head(3)
```

There are 10000 records in the original data.

| | id_user | ab |
|---|---|---|
| 0 | f966752c-8533-48b2-af6f-8c6797d2b247 | treatment |
| 1 | 873f93fb-234c-4cfb-83c7-27ff0e582a8e | treatment |
| 2 | 7308791e-04c3-416a-be2d-4188816decc2 | control |


**Contacts** : contains a row for every time that a user (guest) makes an inquiry.

* id_guest : random id of the guest (user) making the inquiry. Can be linked to id_user.

* id_host : random id of the host (user) of the listing to which the inquiry is made.

* id_listing : random id of the listing to which the inquiry is made.

* ts_interaction_first : UTC timestamp of the moment the inquiry is made.

* ts_reply_at_first : UTC timestamp of the moment the host replies to the inquiry, if so. If missing, there is no reply.

* ts_accepted_at_first : UTC timestamp of the moment the host accepts the inquiry, if so. If missing, there is no acceptance.

* ts_booking_at : UTC timestamp of the moment the booking is made, if so. If missing, there is no booking.

* dim_contact_channel : The contact channel through which the inquiry was made. One of {contact_me, book_it, instant_book}.

* m_first_message_length : Length of the message the guest sent the host, in characters. If missing, there was no message.
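Because a missing timestamp means the event did not happen, the reply, acceptance and booking outcomes can be read directly from these columns. A minimal sketch on three made-up rows; `add_funnel_flags` and the derived column names are hypothetical:

```python
import pandas as pd

# Three made-up inquiries shaped like rows of the contacts table.
toy = pd.DataFrame({
    "dim_contact_channel": ["contact_me", "book_it", "instant_book"],
    "ts_interaction_first": pd.to_datetime(["2013-01-01 10:00", "2013-01-02 09:00", "2013-01-03 08:00"]),
    "ts_reply_at_first": pd.to_datetime(["2013-01-01 12:30", None, "2013-01-03 08:00"]),
    "ts_accepted_at_first": pd.to_datetime([None, None, "2013-01-03 08:00"]),
    "ts_booking_at": pd.to_datetime([None, None, "2013-01-03 08:00"]),
    "m_first_message_length": [230.0, None, 98.0],
})


def add_funnel_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn 'missing means it did not happen' into boolean flags."""
    out = df.copy()
    out["replied"] = out["ts_reply_at_first"].notna()
    out["accepted"] = out["ts_accepted_at_first"].notna()
    out["booked"] = out["ts_booking_at"].notna()
    out["has_message"] = out["m_first_message_length"].notna()
    # Time from inquiry to first reply, in hours (NaN when there was no reply).
    out["hours_to_reply"] = (
        out["ts_reply_at_first"] - out["ts_interaction_first"]
    ).dt.total_seconds() / 3600
    return out


print(add_funnel_flags(toy)[["replied", "accepted", "booked", "has_message", "hours_to_reply"]])
```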

In [76]:
```python
contacts = pd.read_csv('takehome_contacts.csv',
                       parse_dates=['ts_interaction_first', 'ts_reply_at_first',
                                    'ts_accepted_at_first', 'ts_booking_at'])
# Type of pandas' missing-timestamp sentinel. pd._libs.tslib.NaTType is a private
# path that has moved between pandas versions, so derive the type from pd.NaT.
NaT = type(pd.NaT)
print('There are {} records in the original data.'.format(len(contacts)))
contacts.head(3)
```

There are 10000 records in the original data.

| | id_guest | id_host | id_listing | ts_interaction_first | ts_reply_at_first | ts_accepted_at_first | ts_booking_at | dim_contact_channel | m_first_message_length |
|---|---|---|---|---|---|---|---|---|---|
| 0 | f966752c-8533-48b2-af6f-8c6797d2b247 | 4405ab66-1c68-449b-abd9-1ad1892a6c4d | fe07e0c4-c317-44bc-a82d-5b599a248049 | 2013-01-01 23:04:35 | 2013-01-03 23:15:23 | NaT | NaT | contact_me | 230.0 |
| 1 | 873f93fb-234c-4cfb-83c7-27ff0e582a8e | aa41b57b-e29f-4c95-bf27-48f27519e419 | d47717da-315a-42c2-8888-9b7d4bea8829 | 2013-01-02 00:21:26 | 2013-01-07 23:38:31 | NaT | NaT | contact_me | 98.0 |
| 2 | 7308791e-04c3-416a-be2d-4188816decc2 | 8b118ba1-b439-493e-88c7-2c89a81cec1b | ac231804-951c-4fcb-a0e6-1a4aecbfb6ce | 2013-01-02 02:30:19 | 2013-01-02 18:06:10 | 2013-01-02 18:06:10 | NaT | contact_me | 278.0 |


In [77]:
```python
contacts.dtypes
```

```
id_guest                          object
id_host                           object
id_listing                        object
ts_interaction_first      datetime64[ns]
ts_reply_at_first         datetime64[ns]
ts_accepted_at_first      datetime64[ns]
ts_booking_at             datetime64[ns]
dim_contact_channel               object
m_first_message_length           float64
dtype: object
```
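With `parse_dates`, an empty timestamp cell is read as `NaT`. The analysis cells detect missing timestamps by comparing each value's type with the type of `NaT`; a quick check on two made-up rows suggests that `Series.notna()` yields the same mask without depending on where pandas keeps `NaTType` internally.

```python
import io

import pandas as pd

# Two made-up rows: one accepted inquiry and one with an empty timestamp cell.
csv = io.StringIO(
    "id_guest,ts_accepted_at_first\n"
    "a,2013-01-02 18:06:10\n"
    "b,\n"
)
df = pd.read_csv(csv, parse_dates=["ts_accepted_at_first"])

# Type-based test, in the style of the notebook's analysis cells.
by_type = df["ts_accepted_at_first"].apply(type) != type(pd.NaT)
# Built-in missing-value test.
by_notna = df["ts_accepted_at_first"].notna()

print(by_type.tolist(), by_notna.tolist())
```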

---
## Data cleaning

Drop **duplicated** assignments (exact repeats of the same user and group).

In [36]:
```python
assign = assignments.drop_duplicates()  # remove exact repeats of (id_user, ab)
print(len(assign))
```

9487


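Find the **contradictory** assignments (users assigned to both the treatment and the control group) and drop them. Once exact duplicates are gone, any id_user that still appears more than once must carry conflicting labels.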
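A toy example makes the two cleaning steps concrete and cross-checks the `duplicated()` approach against a `groupby(...).nunique()` count of distinct labels per user. The ids and labels below are made up.

```python
import pandas as pd

# Made-up assignment log: u1 is logged twice with the same label,
# u2 appears with two different labels, u3 appears once.
toy = pd.DataFrame({
    "id_user": ["u1", "u1", "u2", "u2", "u3"],
    "ab": ["treatment", "treatment", "control", "treatment", "control"],
})

# Step 1: drop exact repeats of (id_user, ab).
dedup = toy.drop_duplicates()

# Step 2: any id_user still repeated must carry conflicting labels.
conflicting = dedup.loc[dedup["id_user"].duplicated(), "id_user"].tolist()
clean = dedup[~dedup["id_user"].isin(conflicting)]

# Cross-check: users with more than one distinct label.
n_labels = dedup.groupby("id_user")["ab"].nunique()
conflicting_by_groupby = n_labels[n_labels > 1].index.tolist()

print(conflicting, clean["id_user"].tolist())
```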

In [37]:
```python
contra_guest = assign.id_user[assign.id_user.duplicated()].tolist()
assign = assign[~assign.id_user.isin(contra_guest)]
assign.ab.value_counts()
```

```
treatment    4352
control      4349
Name: ab, dtype: int64
```
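Cleaning removed 786 rows (from 9487 down to 4352 + 4349 = 8701), so it is worth checking that the groups are still balanced, a so-called sample ratio mismatch check. A sketch of a chi-square goodness-of-fit test against an even split, using only the standard library (`srm_p_value` is a hypothetical helper name):

```python
import math


def srm_p_value(n_a: int, n_b: int) -> float:
    """Chi-square goodness-of-fit p-value (1 df) against an even 50/50 split."""
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # With 1 degree of freedom, chi2 = z**2, so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


p = srm_p_value(4352, 4349)  # treatment vs control counts after cleaning
print(round(p, 3))
```

A very small p-value would suggest the assignment or the cleaning skewed the split; with counts only 3 users apart, p here is close to 1.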

---
## Analysis
### The host accept rate

In [38]:
```python
guest_con = assign.id_user[assign.ab == 'control'].tolist()
guest_tre = assign.id_user[assign.ab == 'treatment'].tolist()
```

In [118]:
```python
# Boolean masks: which contacts come from each group, and which were accepted.
# notna() is True where ts_accepted_at_first holds a timestamp (i.e. is not NaT).
is_con = contacts.id_guest.isin(guest_con)
is_tre = contacts.id_guest.isin(guest_tre)
accepted = contacts.ts_accepted_at_first.notna()

con_accept = (is_con & accepted).sum()
con_failed = (is_con & ~accepted).sum()
tre_accept = (is_tre & accepted).sum()
tre_failed = (is_tre & ~accepted).sum()
```
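The same accepted/failed counts can also be produced in one pass by joining contacts to the cleaned assignments and grouping by `ab`. This is a sketch, not the notebook's method: it matches the mask-based counts only because each id_user appears at most once in the cleaned assignments, so the join cannot duplicate contacts. `accept_table` is a hypothetical helper, shown on toy data.

```python
import pandas as pd


def accept_table(contacts: pd.DataFrame, assign: pd.DataFrame) -> pd.DataFrame:
    """Accepted / failed counts and accept rate per experimental group (hypothetical helper)."""
    # Inner join keeps only contacts whose guest survived the assignment cleaning.
    merged = contacts.merge(assign, left_on="id_guest", right_on="id_user", how="inner")
    merged["accepted"] = merged["ts_accepted_at_first"].notna().astype(int)
    table = merged.groupby("ab")["accepted"].agg(accepted="sum", total="count")
    table["failed"] = table["total"] - table["accepted"]
    table["accept_rate"] = table["accepted"] / table["total"]
    return table[["accepted", "failed", "accept_rate"]]


# Toy data: guest "z" has no assignment, so the join drops its contact.
toy_contacts = pd.DataFrame({
    "id_guest": ["a", "a", "b", "c", "z"],
    "ts_accepted_at_first": pd.to_datetime(["2013-01-01", None, None, "2013-01-02", "2013-01-03"]),
})
toy_assign = pd.DataFrame({"id_user": ["a", "b", "c"], "ab": ["control", "control", "treatment"]})
print(accept_table(toy_contacts, toy_assign))
```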

In [125]:
```python
m = pd.DataFrame([[con_accept, con_failed, con_accept/(con_accept + con_failed)],
                  [tre_accept, tre_failed, tre_accept/(tre_accept + tre_failed)]])
m.columns = ['accepted', 'failed', 'accept_rate']
m.index = ['control', 'treatment']
m
```

| | accepted | failed | accept_rate |
|---|---|---|---|
| control | 1595 | 2955 | 0.350549 |
| treatment | 1561 | 2993 | 0.342776 |
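The treatment accept rate is about 0.8 percentage points lower (0.3428 vs 0.3505). A pooled two-proportion z-test is one standard way to ask whether a gap this size could be noise. This is a sketch under the assumption that contacts are independent, which is only approximately true here: one guest can send several inquiries, and users, not contacts, were randomized.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Control vs treatment, all channels (counts from the table above).
z, p = two_proportion_z_test(1595, 1595 + 2955, 1561, 1561 + 2993)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these counts z comes out at roughly 0.78 (two-sided p around 0.44), well short of the usual 1.96 cutoff, so the drop is not distinguishable from noise at the 5% level under this approximation.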


### The host accept rate in the *contact_me* channel

In [120]:
```python
contacts_me = contacts[contacts.dim_contact_channel == 'contact_me']
```

In [121]:
```python
# Same masks as for all channels, restricted to contact_me inquiries.
is_con_me = contacts_me.id_guest.isin(guest_con)
is_tre_me = contacts_me.id_guest.isin(guest_tre)
accepted_me = contacts_me.ts_accepted_at_first.notna()

con_accept_me = (is_con_me & accepted_me).sum()
con_failed_me = (is_con_me & ~accepted_me).sum()
tre_accept_me = (is_tre_me & accepted_me).sum()
tre_failed_me = (is_tre_me & ~accepted_me).sum()
```

In [126]:
```python
m_me = pd.DataFrame([[con_accept_me, con_failed_me, con_accept_me/(con_accept_me + con_failed_me)],
                     [tre_accept_me, tre_failed_me, tre_accept_me/(tre_accept_me + tre_failed_me)]])
m_me.columns = ['accepted', 'failed', 'accept_rate']
m_me.index = ['control', 'treatment']
m_me
```

| | accepted | failed | accept_rate |
|---|---|---|---|
| control | 1059 | 2589 | 0.290296 |
| treatment | 1042 | 2600 | 0.286107 |
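For the *contact_me* channel, a 95% confidence interval for the difference in accept rates shows both the size and the uncertainty of the effect, which may be easier to explain to a non-technical audience than a p-value. A sketch using the normal-approximation (Wald) interval, assuming contacts are independent (this ignores that one guest can send several inquiries); `diff_confidence_interval` is a hypothetical helper name.

```python
import math
from typing import Tuple


def diff_confidence_interval(x1: int, n1: int, x2: int, n2: int,
                             z: float = 1.96) -> Tuple[float, float]:
    """Wald interval for p1 - p2; z = 1.96 gives roughly 95% coverage."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Control minus treatment, contact_me channel only (counts from the table above).
lo, hi = diff_confidence_interval(1059, 1059 + 2589, 1042, 1042 + 2600)
print(f"[{lo:+.3f}, {hi:+.3f}]")
```

The interval runs from roughly -1.7 to +2.5 percentage points. Because it contains zero, these data are consistent both with no effect of the message requirement on the *contact_me* accept rate and with small effects in either direction.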
