# Getaround Analysis project - Rental Delay Analysis



Contents
--------
1. [Data loading](#loading)
2. [Exploratory data analysis](#eda)
3. [Conclusion and perspectives](#conclusion)



In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

## <a id="loading"></a> Data loading

In [3]:
df = pd.read_excel('./data/get_around_delay_analysis.xlsx')
df

Unnamed: 0,rental_id,car_id,checkin_type,state,delay_at_checkout_in_minutes,previous_ended_rental_id,time_delta_with_previous_rental_in_minutes
0,505000,363965,mobile,canceled,,,
1,507750,269550,mobile,ended,-81.0,,
2,508131,359049,connect,ended,70.0,,
3,508865,299063,connect,canceled,,,
4,511440,313932,mobile,ended,,,
...,...,...,...,...,...,...,...
21305,573446,380069,mobile,ended,,573429.0,300.0
21306,573790,341965,mobile,ended,-337.0,,
21307,573791,364890,mobile,ended,144.0,,
21308,574852,362531,connect,ended,-76.0,,


In [4]:
df.describe(include='all')

Unnamed: 0,rental_id,car_id,checkin_type,state,delay_at_checkout_in_minutes,previous_ended_rental_id,time_delta_with_previous_rental_in_minutes
count,21310.0,21310.0,21310,21310,16346.0,1841.0,1841.0
unique,,,2,2,,,
top,,,mobile,ended,,,
freq,,,17003,18045,,,
mean,549712.880338,350030.603426,,,59.701517,550127.411733,279.28843
std,13863.446964,58206.249765,,,1002.561635,13184.023111,254.594486
min,504806.0,159250.0,,,-22433.0,505628.0,0.0
25%,540613.25,317639.0,,,-36.0,540896.0,60.0
50%,550350.0,368717.0,,,9.0,550567.0,180.0
75%,560468.5,394928.0,,,67.0,560823.0,540.0


The dataset contains 21310 observations, each consisting of data pertaining to a car rental event. The dataset has 7 columns:
- The column `car_id` refers to the car that was rented. In the absence of further information, it is of no use to us.
- The columns `rental_id` and `previous_ended_rental_id` are identifiers of the current and previous rentals of a given car. We will use them to follow car rental sequences.
- The column `checkin_type` indicates whether the check-in was made with the Getaround connect functionality or by mobile.
- The column `state` indicates whether the rental was canceled or not.
- The column `delay_at_checkout_in_minutes` gives the time difference between the actual and expected checkout times. A negative value indicates that the checkout occurred earlier than expected, and a positive value indicates a late checkout. A late checkout that keeps the next customer waiting is problematic, and this is what we aim to mitigate by introducing a delay before availability.
- The column `time_delta_with_previous_rental_in_minutes` represents the expected amount of time between two consecutive rentals. This value is based on the *expected* checkout and checkin times, and does not include the checkout delay. A `NULL` value corresponds to a time delta larger than 12 h (720 min), in which case the rental is assumed to be non-consecutive (`previous_ended_rental_id` is also `NULL`).
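To make the relationship between these columns concrete, here is a minimal sketch on a hand-made frame (the values are illustrative and not taken from the dataset). It attaches the previous rental's checkout delay to each consecutive rental through a self-merge on `previous_ended_rental_id`. The helper names `toy`, `previous`, `chained`, `previous_delay` and `waiting_time` are assumptions introduced for this example only.

```python
import numpy as np
import pandas as pd

# Toy data with the same column names as the dataset (values are made up).
toy = pd.DataFrame({
    "rental_id": [1, 2, 3, 4],
    "car_id": [10, 10, 11, 11],
    "delay_at_checkout_in_minutes": [90.0, -15.0, 20.0, np.nan],
    "previous_ended_rental_id": [np.nan, 1.0, np.nan, 3.0],
    "time_delta_with_previous_rental_in_minutes": [np.nan, 60.0, np.nan, 120.0],
})

# Lookup table: checkout delay of each rental, keyed like the "previous" column.
previous = toy[["rental_id", "delay_at_checkout_in_minutes"]].rename(
    columns={
        "rental_id": "previous_ended_rental_id",
        "delay_at_checkout_in_minutes": "previous_delay",
    }
)
# Match the float dtype of the key column in `toy` (it contains NaN).
previous["previous_ended_rental_id"] = previous["previous_ended_rental_id"].astype(float)

# A left merge keeps every rental; non-consecutive rentals get NaN.
chained = toy.merge(previous, on="previous_ended_rental_id", how="left")

# Hypothetical metric: a positive value means the previous checkout ran
# past the planned gap, so the next customer had to wait.
chained["waiting_time"] = (
    chained["previous_delay"]
    - chained["time_delta_with_previous_rental_in_minutes"]
)
```

In this toy frame, rental 2 follows rental 1, which was returned 90 minutes late against a planned gap of 60 minutes, so `waiting_time` is 30 minutes. Rentals without a previous rental keep a missing value.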

## <a id="eda"></a> Exploratory data analysis

Before determining the impact of the introduction of a rental delay, we first gather some necessary insights about user behavior.

In [None]:
## Number of rentals using each method
df['checkin_type'].value_counts()

In [None]:
## Counts of rental states for each checkin type
df_ = df.groupby(['checkin_type', 'state']).count()['rental_id']
df_

In [None]:
## Probability of rental states for each checkin type
df_ / df_.groupby(level='checkin_type').sum()

- Customers favor mobile checkin (80%) over Getaround connect (20%). Part of this difference comes from the fact that only 46% of the cars have the Getaround connect option.
- Rental cancellation rates are higher when customers use the Getaround connect functionality (18.5%) than with mobile checkin (14.5%). One possible explanation is that cancellation is easier with Getaround connect.
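The figures above are shares and conditional frequencies. As a hedged illustration of how such numbers can be obtained, the sketch below works on a small made-up frame rather than on the dataset. The connect-equipped share is computed under an assumption: a car counts as connect-equipped if it appears at least once with a `connect` checkin. The notebook does not state how the 46% figure was derived, so this is only one possible proxy. The names `toy`, `checkin_share`, `state_rates`, `has_connect` and `connect_car_share` are hypothetical.

```python
import pandas as pd

# Toy data with the same column names as the dataset (values are made up).
toy = pd.DataFrame({
    "car_id": [1, 1, 2, 3, 3, 4],
    "checkin_type": ["mobile", "connect", "mobile", "mobile", "mobile", "connect"],
    "state": ["ended", "canceled", "ended", "canceled", "ended", "ended"],
})

# Share of rentals made with each checkin type.
checkin_share = toy["checkin_type"].value_counts(normalize=True)

# Frequency of each state within a checkin type (each row sums to 1).
state_rates = pd.crosstab(toy["checkin_type"], toy["state"], normalize="index")

# Assumed proxy: a car is connect-equipped if it has at least one connect checkin.
has_connect = toy["checkin_type"].eq("connect").groupby(toy["car_id"]).any()
connect_car_share = has_connect.mean()
```

`pd.crosstab` with `normalize="index"` gives the per-checkin-type rates in one step, which avoids dividing the grouped counts by the group totals by hand.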

In [None]:
## Number of missing values in each column, per checkin type and state
df_ = df.groupby(['checkin_type', 'state']).agg(lambda x: x.isnull().sum())
df_