![snap](https://lever-client-logos.s3.amazonaws.com/2bd4cdf9-37f2-497f-9096-c2793296a75f-1568844229943.png)

# GetAround 

[GetAround](https://www.getaround.com/?wpsrc=Google+Organic+Search) is the Airbnb for cars. You can rent cars from any person for a few hours to a few days! Founded in 2009, the company has grown rapidly. In 2019, it counted over 5 million users and about 20K available cars worldwide.

As Jedha's partner, they offered this great challenge:

## Context 

When renting a car, our users have to complete a checkin flow at the beginning of the rental and a checkout flow at the end of the rental in order to:

* Assess the state of the car and notify other parties of pre-existing damages or damages that occurred during the rental.
* Compare fuel levels.
* Measure how many kilometers were driven.

The checkin and checkout of our rentals can be done with three distinct flows:
* **📱 Mobile** rental agreement on native apps: driver and owner meet and both sign the rental agreement on the owner's smartphone
* **Connect:** the driver doesn't meet the owner and opens the car with their smartphone
* **📝 Paper** contract (negligible)

## Project 🚧

For this case study, we suggest that you put yourselves in our shoes, and run an analysis we made back in 2017 🔮 🪄

When using Getaround, drivers book cars for a specific time period, from an hour to a few days long. They are supposed to bring back the car on time, but it happens from time to time that drivers are late for the checkout.

Late returns at checkout can generate high friction for the next driver if the car was supposed to be rented again on the same day: Customer Service often reports users who are unsatisfied because they had to wait for the car to come back from the previous rental, or who even had to cancel their rental because the car wasn't returned on time.


## Goals 🎯

In order to mitigate those issues we've decided to implement a minimum delay between two rentals. A car won't be displayed in the search results if the requested checkin or checkout times are too close to an already booked rental.

It solves the late checkout issue but also potentially hurts Getaround's and owners' revenues: we need to find the right trade-off.
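The display rule described above can be sketched as a padded interval check. This is a minimal illustration, not Getaround's actual implementation: the function name `is_displayed`, its parameters, and the choice to apply the same `min_delay` on both sides of a booking are assumptions made for the example.

```python
from datetime import datetime, timedelta


def is_displayed(requested_start, requested_end, booked_rentals, min_delay):
    """Return True if the car can be shown for the requested period.

    booked_rentals is a list of (start, end) datetime pairs (hypothetical format),
    min_delay is a timedelta.
    """
    for booked_start, booked_end in booked_rentals:
        # Hide the car when the request starts less than min_delay after a booking ends
        # and ends less than min_delay before that booking starts (this covers overlaps too).
        if requested_start < booked_end + min_delay and requested_end > booked_start - min_delay:
            return False
    return True


# One existing booking from 10:00 to 12:00, with a 60-minute minimum delay
booked = [(datetime(2017, 1, 1, 10), datetime(2017, 1, 1, 12))]
delay = timedelta(minutes=60)

too_close = is_displayed(datetime(2017, 1, 1, 12, 30), datetime(2017, 1, 1, 14), booked, delay)
far_enough = is_displayed(datetime(2017, 1, 1, 13), datetime(2017, 1, 1, 15), booked, delay)
print(too_close, far_enough)
```

A larger `min_delay` hides the car for more requests, which is exactly the revenue side of the trade-off.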

**Our Product Manager still needs to decide:**
* **threshold:** how long should the minimum delay be?
* **scope:** should we enable the feature for all cars, or only for Connect cars?

In order to help them make the right decision, they are asking you for some data insights. Here are the first analyses they could think of to kickstart the discussion. Don't hesitate to perform any additional analysis that you find relevant.

* Which share of our owners' revenue would potentially be affected by the feature?
* How many rentals would be affected by the feature depending on the threshold and scope we choose?
* How often are drivers late for the next check-in? How does it impact the next driver?
* How many problematic cases will it solve depending on the chosen threshold and scope?
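As a possible starting point for the threshold and scope questions, the sketch below counts rentals that would fall under a given minimum delay. It relies on the `checkin_type` and `time_delta_with_previous_rental_in_minutes` columns of the delay dataset; the function name `affected_rentals` and the definition of "affected" (a rental starting less than `threshold_minutes` after the previous rental of the same car) are assumptions, and the sample data is hand-made.

```python
import pandas as pd


def affected_rentals(df, threshold_minutes, scope="all"):
    """Count rentals that a minimum delay of threshold_minutes would have hidden.

    scope is "all" (every car) or "connect" (Connect cars only).
    """
    delta = df["time_delta_with_previous_rental_in_minutes"]
    # Rentals without a known previous rental (NaN) are never counted as affected
    mask = delta.notna() & (delta < threshold_minutes)
    if scope == "connect":
        mask &= df["checkin_type"] == "connect"
    return int(mask.sum())


# Tiny hand-made sample, not the real dataset
sample = pd.DataFrame({
    "checkin_type": ["mobile", "connect", "connect", "mobile"],
    "time_delta_with_previous_rental_in_minutes": [30.0, 90.0, None, 200.0],
})

for threshold in (60, 120, 240):
    print(threshold, affected_rentals(sample, threshold), affected_rentals(sample, threshold, "connect"))
```

Running the same loop over a finer grid of thresholds gives the curve a dashboard could plot for each scope.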

### Web dashboard

First, build a dashboard that will help the Product Management team answer the above questions. You can use `streamlit` or any other technology that you see fit.


### Machine Learning - `/predict` endpoint

In addition to the above questions, the Data Science team is working on *pricing optimization*. They have gathered some data to suggest optimum prices for car owners using Machine Learning.

You should provide at least **one endpoint** `/predict`. The full URL would look something like this: `https://your-url.com/predict`.

This endpoint accepts the **POST method** with JSON input data, and it should return the predictions. We assume **inputs will always be well formatted**, which means you do not have to manage errors. We leave error handling as a bonus.

Input example:

```
{
  "input": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8], [7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]]
}
```

The response should be a JSON with one key `prediction` corresponding to the prediction.

Response example:

```
{
  "prediction":[6,6]
}
```
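The request and response shapes above can be illustrated with a framework-agnostic handler. This is only a sketch: `handle_predict`, `dummy_model`, the returned `(status, body)` pair and the error message format are hypothetical choices, and the placeholder model stands in for whatever trained model you serve.

```python
import json


def handle_predict(body, predict_fn):
    """Turn a raw JSON request body into a (status_code, JSON response body) pair."""
    try:
        rows = json.loads(body)["input"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Bonus error handling: malformed JSON or missing "input" key
        return 400, json.dumps({"error": "expected a JSON object with an 'input' key"})
    return 200, json.dumps({"prediction": list(predict_fn(rows))})


def dummy_model(rows):
    # Placeholder for a trained model: one constant prediction per input row
    return [6 for _ in rows]


status, response = handle_predict('{"input": [[7.0, 0.27], [7.0, 0.27]]}', dummy_model)
print(status, response)

bad_status, bad_response = handle_predict("not json", dummy_model)
print(bad_status, bad_response)
```

Keeping the parsing logic in a plain function like this makes it easy to call from the route of whichever web framework hosts the API.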

### Documentation page

You need to provide the users with a **documentation** about your API.

It has to be located at `/docs` on your website. If we take the URL example above, it should be located directly at `https://your-url.com/docs`.

This small documentation should at least include:
- An h1 title: the title is up to you.
- A description of every endpoint the user can call, with the endpoint name, the HTTP method, the required input and the expected output (you can give examples).

You are free to add any other relevant information and to style your HTML as you wish.
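If your framework does not generate a documentation page for you, the minimum content listed above can be rendered by hand. The sketch below uses only the standard library; `ENDPOINTS`, `render_docs` and the page title are hypothetical names chosen for the example.

```python
from html import escape

# Hypothetical description of the API: one entry per endpoint
ENDPOINTS = [
    {
        "path": "/predict",
        "method": "POST",
        "input": '{"input": [[...], [...]]}',
        "output": '{"prediction": [...]}',
    },
]


def render_docs(title, endpoints):
    """Build a small HTML page with an h1 title and one section per endpoint."""
    parts = [f"<h1>{escape(title)}</h1>"]
    for ep in endpoints:
        parts.append(f"<h2>{escape(ep['method'])} {escape(ep['path'])}</h2>")
        parts.append(f"<p>Input: <code>{escape(ep['input'])}</code></p>")
        parts.append(f"<p>Output: <code>{escape(ep['output'])}</code></p>")
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(parts) + "\n</body></html>"


page = render_docs("Getaround pricing API", ENDPOINTS)
print(page)
```

The returned string can be served as the response body of the `/docs` route.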

### Online production

You have to **host your API online**. We recommend using [Hugging Face](https://huggingface.co/spaces) as it is free of charge, but you are free to choose any other hosting provider.

## Helpers 🦮

To help you start with this project, here are some pieces of advice:

* Spend some time understanding the data
* Don't overlook the Data Analysis part; there are a lot of insights to find.
* Data Analysis should take 2 to 5 hours
* Machine Learning should take 3 to 6 hours
* You are not required to use libraries such as `mlflow` to handle your Machine Learning workflow, but we definitely advise you to do so.


### Share your code

In order to be evaluated, do not forget to share your code in a [GitHub](https://github.com/) repository. You can create a [`README.md`](https://guides.github.com/features/mastering-markdown/) file with a quick description of the project, how to set it up locally, and the online URL.

## Deliverable 📬

To complete this project, you should deliver:

- A **dashboard** in production (accessible via a web page for example)
- The **whole code** stored in a **GitHub repository**. You will include the repository's URL.
- A **documented online API** on a Hugging Face server (or any other provider you choose) containing at least **one `/predict` endpoint** that respects the technical description above. We should be able to request the API endpoint `/predict` using `curl`:

```shell
$ curl -i -H "Content-Type: application/json" -X POST -d '{"input": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]]}' http://your-url/predict
```

Or Python:

```python
import requests

response = requests.post("https://your-url/predict", json={
    "input": [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]]
})
print(response.json())
```

## Data 

There are two files you need to download:

* [Delay Analysis](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/get_around_delay_analysis.xlsx) 👈 Data Analysis
* [Pricing Optimization](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/get_around_pricing_project.csv) 👈 Machine Learning


Happy coding! 👩‍💻





## Exploratory Data Analysis


In [1]:
import pandas as pd
import numpy as np

In [21]:
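# The delay data is published as .xlsx; this cell reads a local ";"-separated CSV file
# (assumed to be an export of that workbook)
df_delay = pd.read_csv("Data/get_around_delay_analysis.csv", delimiter=";")
df_cars = pd.read_csv("Data/get_around_pricing_project.csv")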

In [45]:
print("Describe")
display(df_delay.describe())
display(df_cars.describe())
print("Info")
display(df_delay.info())
display(df_cars.info())

print("Head")
display(df_delay.head(10))
display(df_cars.head(10))

print("Tail")
display(df_delay.tail(10))
display(df_cars.tail(10))

print("Sum of NA values")
display(df_delay.isna().sum().sum())
display(df_cars.isna().sum().sum())


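# Caution: .isna().count() returns the number of rows, not the number of NAs
# (.isna().sum() would give the NA count)
display("number of NA values in column 7:", df_cars.iloc[:, 7].isna().count())
display("number of NA values in column 8:", df_cars.iloc[:, 8].isna().count())
display(df_delay["state"].unique())


# Missing time deltas are replaced by a placeholder of 24 hours (1440 minutes)
df_delay["time_delta_with_previous_rental_in_minutes"] = df_delay["time_delta_with_previous_rental_in_minutes"].fillna(24.0 * 60.0)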

Describe


| | rental_id | car_id | delay_at_checkout_in_minutes | previous_ended_rental_id | time_delta_with_previous_rental_in_minutes |
|---|---|---|---|---|---|
| count | 21308.0 | 21308.0 | 16345.0 | 1841.0 | 1841.0 |
| mean | 549716.383612 | 350030.899897 | 59.707862 | 550127.411733 | 279.28843 |
| std | 13859.380484 | 58208.494595 | 1002.591977 | 13184.023111 | 254.594486 |
| min | 504806.0 | 159250.0 | -22433.0 | 505628.0 | 0.0 |
| 25% | 540614.75 | 317632.5 | -36.0 | 540896.0 | 60.0 |
| 50% | 550352.0 | 368717.0 | 9.0 | 550567.0 | 180.0 |
| 75% | 560470.0 | 394928.0 | 67.0 | 560823.0 | 540.0 |
| max | 576401.0 | 417675.0 | 71084.0 | 575053.0 | 720.0 |


| | Unnamed: 0 | mileage | engine_power | rental_price_per_day |
|---|---|---|---|---|
| count | 4843.0 | 4843.0 | 4843.0 | 4843.0 |
| mean | 2421.0 | 140962.8 | 128.98823 | 121.214536 |
| std | 1398.198007 | 60196.74 | 38.99336 | 33.568268 |
| min | 0.0 | -64.0 | 0.0 | 10.0 |
| 25% | 1210.5 | 102913.5 | 100.0 | 104.0 |
| 50% | 2421.0 | 141080.0 | 120.0 | 119.0 |
| 75% | 3631.5 | 175195.5 | 135.0 | 136.0 |
| max | 4842.0 | 1000376.0 | 423.0 | 422.0 |


Info
<class 'pandas.core.frame.DataFrame'>
Index: 21308 entries, 0 to 21309
Data columns (total 7 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   rental_id                                   21308 non-null  int64  
 1   car_id                                      21308 non-null  int64  
 2   checkin_type                                21308 non-null  object 
 3   state                                       21308 non-null  object 
 4   delay_at_checkout_in_minutes                16345 non-null  float64
 5   previous_ended_rental_id                    1841 non-null   float64
 6   time_delta_with_previous_rental_in_minutes  1841 non-null   float64
dtypes: float64(3), int64(2), object(2)
memory usage: 1.8+ MB


None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4843 entries, 0 to 4842
Data columns (total 15 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   Unnamed: 0                 4843 non-null   int64 
 1   model_key                  4843 non-null   object
 2   mileage                    4843 non-null   int64 
 3   engine_power               4843 non-null   int64 
 4   fuel                       4843 non-null   object
 5   paint_color                4843 non-null   object
 6   car_type                   4843 non-null   object
 7   private_parking_available  4843 non-null   bool  
 8   has_gps                    4843 non-null   bool  
 9   has_air_conditioning       4843 non-null   bool  
 10  automatic_car              4843 non-null   bool  
 11  has_getaround_connect      4843 non-null   bool  
 12  has_speed_regulator        4843 non-null   bool  
 13  winter_tires               4843 non-null   bool  
 14  rental_p

None

Head


| | rental_id | car_id | checkin_type | state | delay_at_checkout_in_minutes | previous_ended_rental_id | time_delta_with_previous_rental_in_minutes |
|---|---|---|---|---|---|---|---|
| 0 | 505000 | 363965 | mobile | canceled | | | |
| 1 | 507750 | 269550 | mobile | ended | -81.0 | | |
| 2 | 508131 | 359049 | connect | ended | 70.0 | | |
| 3 | 508865 | 299063 | connect | canceled | | | |
| 4 | 511440 | 313932 | mobile | ended | | | |
| 5 | 511626 | 398802 | mobile | ended | -203.0 | | |
| 6 | 511639 | 370585 | connect | ended | -15.0 | 563782.0 | 570.0 |
| 9 | 513434 | 256528 | connect | ended | 23.0 | | |
| 10 | 513743 | 330658 | mobile | canceled | | | |
| 11 | 514161 | 366037 | connect | canceled | | | |


| | Unnamed: 0 | model_key | mileage | engine_power | fuel | paint_color | car_type | private_parking_available | has_gps | has_air_conditioning | automatic_car | has_getaround_connect | has_speed_regulator | winter_tires | rental_price_per_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Citroën | 140411 | 100 | diesel | black | convertible | True | True | False | False | True | True | True | 106 |
| 1 | 1 | Citroën | 13929 | 317 | petrol | grey | convertible | True | True | False | False | False | True | True | 264 |
| 2 | 2 | Citroën | 183297 | 120 | diesel | white | convertible | False | False | False | False | True | False | True | 101 |
| 3 | 3 | Citroën | 128035 | 135 | diesel | red | convertible | True | True | False | False | True | True | True | 158 |
| 4 | 4 | Citroën | 97097 | 160 | diesel | silver | convertible | True | True | False | False | False | True | True | 183 |
| 5 | 5 | Citroën | 152352 | 225 | petrol | black | convertible | True | True | False | False | True | True | True | 131 |
| 6 | 6 | Citroën | 205219 | 145 | diesel | grey | convertible | True | True | False | False | True | True | True | 111 |
| 7 | 7 | Citroën | 115560 | 105 | petrol | white | convertible | True | True | False | False | False | True | True | 78 |
| 8 | 8 | Peugeot | 123886 | 125 | petrol | black | convertible | True | False | False | False | False | True | True | 79 |
| 9 | 9 | Citroën | 139541 | 135 | diesel | white | convertible | False | False | False | False | True | False | True | 132 |


Tail


| | rental_id | car_id | checkin_type | state | delay_at_checkout_in_minutes | previous_ended_rental_id | time_delta_with_previous_rental_in_minutes |
|---|---|---|---|---|---|---|---|
| 21300 | 572293 | 355023 | connect | canceled | | | |
| 21301 | 572736 | 393486 | mobile | ended | 180.0 | | |
| 21302 | 573285 | 394863 | mobile | ended | | | |
| 21303 | 573305 | 392752 | mobile | ended | 72.0 | | |
| 21304 | 573322 | 376491 | connect | ended | -66.0 | | |
| 21305 | 573446 | 380069 | mobile | ended | | 573429.0 | 300.0 |
| 21306 | 573790 | 341965 | mobile | ended | -337.0 | | |
| 21307 | 573791 | 364890 | mobile | ended | 144.0 | | |
| 21308 | 574852 | 362531 | connect | ended | -76.0 | | |
| 21309 | 575056 | 351549 | connect | ended | 35.0 | | |


| | Unnamed: 0 | model_key | mileage | engine_power | fuel | paint_color | car_type | private_parking_available | has_gps | has_air_conditioning | automatic_car | has_getaround_connect | has_speed_regulator | winter_tires | rental_price_per_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4833 | 4833 | Toyota | 14533 | 85 | diesel | grey | van | True | True | True | False | False | False | True | 130 |
| 4834 | 4834 | Toyota | 47782 | 110 | diesel | blue | van | False | True | False | False | False | False | True | 122 |
| 4835 | 4835 | Toyota | 165707 | 110 | diesel | black | van | False | True | False | False | False | False | True | 117 |
| 4836 | 4836 | Toyota | 81230 | 100 | diesel | black | van | False | True | False | False | False | False | True | 119 |
| 4837 | 4837 | Toyota | 66770 | 110 | diesel | blue | van | False | True | False | False | False | False | True | 116 |
| 4838 | 4838 | Toyota | 39743 | 110 | diesel | black | van | False | True | False | False | False | False | True | 121 |
| 4839 | 4839 | Toyota | 49832 | 100 | diesel | grey | van | False | True | False | False | False | False | True | 132 |
| 4840 | 4840 | Toyota | 19633 | 110 | diesel | grey | van | False | True | False | False | False | False | True | 130 |
| 4841 | 4841 | Toyota | 27920 | 110 | diesel | brown | van | True | True | False | False | False | False | True | 151 |
| 4842 | 4842 | Audi | 195840 | 160 | diesel | grey | van | True | True | False | False | True | False | True | 124 |


Sum of NA values


np.int64(43897)

np.int64(0)

'number of NA values in column 7:'

np.int64(4843)

'number of NA values in column 8:'

np.int64(4843)

array(['canceled', 'ended'], dtype=object)
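The two per-column values printed above equal 4843, which is the number of rows in `df_cars`: calling `.count()` on the result of `.isna()` counts every entry of the boolean mask, whereas `.sum()` counts only the `True` entries, that is, the missing values. A small stand-alone illustration on a hand-made series (not the real data):

```python
import pandas as pd

# Four entries, one of which is missing
s = pd.Series([True, False, None, True])

n_rows = int(s.isna().count())    # counts all entries of the boolean mask
n_missing = int(s.isna().sum())   # counts only the True entries, i.e. the NAs

print(n_rows, n_missing)  # prints: 4 1
```

On a whole DataFrame, `df.isna().sum()` gives the number of missing values per column in one call.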

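The intent was to remove columns 7 and 8 of `df_delay` because they contain no values. However, the `info()` output above shows that `df_delay` has only 7 columns (indices 0 to 6), so there is nothing at those positions to drop, and the 4843 values printed above are row counts of `df_cars`, not NA counts.


In [46]:
# The original positional drop, df_delay.drop(df_delay.columns[[7, 8]], axis=1), failed with
# "IndexError: index 7 is out of bounds for axis 0 with size 7".
# Dropping columns that are entirely empty does not depend on column positions.
df_delay = df_delay.dropna(axis=1, how="all")


df_delay.describe()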

In [59]:
import plotly.express as px

# seuil: threshold in minutes used to trim extreme delays from the histogram
seuil = 6000.0
mask_ended = df_delay[(df_delay["delay_at_checkout_in_minutes"] < seuil) & (df_delay["delay_at_checkout_in_minutes"] > -seuil) & (df_delay["state"] == "ended")]
mask_canceled = df_delay[df_delay["state"] == "canceled"]
display(px.histogram(mask_ended["delay_at_checkout_in_minutes"], title="ended"))
display(px.histogram(mask_canceled["delay_at_checkout_in_minutes"], title="canceled"))
display(px.histogram(df_delay["time_delta_with_previous_rental_in_minutes"], title="time_delta_with_previous_rental_in_minutes"))

In [57]:
display(df_delay.count())
print("nb canceled")
display(df_delay[df_delay["state"]=="canceled"].count())
print("nb ended")
display(df_delay[df_delay["state"]=="ended"].count())
display(df_delay[df_delay["state"]=="canceled"]["delay_at_checkout_in_minutes"].isna().sum())
display(df_delay[df_delay["state"]=="canceled"]["time_delta_with_previous_rental_in_minutes"].isna().sum())
display(df_delay[df_delay["state"]=="canceled"]["time_delta_with_previous_rental_in_minutes"].mean())

rental_id                                     21308
car_id                                        21308
checkin_type                                  21308
state                                         21308
delay_at_checkout_in_minutes                  16345
previous_ended_rental_id                       1841
time_delta_with_previous_rental_in_minutes    21308
dtype: int64

nb canceled


rental_id                                     3264
car_id                                        3264
checkin_type                                  3264
state                                         3264
delay_at_checkout_in_minutes                     1
previous_ended_rental_id                       229
time_delta_with_previous_rental_in_minutes    3264
dtype: int64

nb ended


rental_id                                     18044
car_id                                        18044
checkin_type                                  18044
state                                         18044
delay_at_checkout_in_minutes                  16344
previous_ended_rental_id                       1612
time_delta_with_previous_rental_in_minutes    18044
dtype: int64

np.int64(3263)

np.int64(0)

np.float64(1359.6599264705883)
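The counts above show that only 1841 rentals carry a `previous_ended_rental_id`, so questions about the impact on the next driver concern that subset. One possible way to study it is to join each rental to its previous rental and compare the previous driver's checkout delay with the planned gap. The sketch below is an assumption-laden illustration: `flag_problematic`, the `previous_delay_in_minutes` and `problematic` column names, and the strict "delay greater than gap" rule are choices made for the example, and the sample is hand-made.

```python
import pandas as pd


def flag_problematic(df):
    """Join each rental to its previous rental and flag late returns that exceed the gap."""
    # Keep only rentals that are linked to a previous rental
    linked = df.dropna(subset=["previous_ended_rental_id"])
    linked = linked.assign(
        previous_ended_rental_id=linked["previous_ended_rental_id"].astype("int64")
    )
    # Lookup table: checkout delay of every rental, keyed like the link column
    previous = df[["rental_id", "delay_at_checkout_in_minutes"]].rename(
        columns={
            "rental_id": "previous_ended_rental_id",
            "delay_at_checkout_in_minutes": "previous_delay_in_minutes",
        }
    )
    merged = linked.merge(previous, on="previous_ended_rental_id", how="left")
    # A missing delay compares as False, so unknown cases are not flagged
    merged["problematic"] = (
        merged["previous_delay_in_minutes"]
        > merged["time_delta_with_previous_rental_in_minutes"]
    )
    return merged


# Tiny hand-made sample, not the real dataset
sample = pd.DataFrame({
    "rental_id": [1, 2, 3, 4],
    "delay_at_checkout_in_minutes": [90.0, -10.0, 30.0, None],
    "previous_ended_rental_id": [None, 1.0, 2.0, 3.0],
    "time_delta_with_previous_rental_in_minutes": [None, 60.0, 120.0, 15.0],
})

result = flag_problematic(sample)
print(result[["rental_id", "previous_delay_in_minutes", "problematic"]])
```

Filtering on `previous_ended_rental_id` rather than on the time delta keeps the result independent of the 1440-minute placeholder written into the missing time deltas earlier in the notebook.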