# Insights from Failed Orders
---
## Background
Gett, previously known as GetTaxi, is an Israeli-developed technology platform solely focused on corporate Ground Transportation Management (GTM). It has an application in which clients order taxis and drivers accept the rides (offers) sent to them. At the moment, when a client clicks the Order button in the application, the matching system searches for the most relevant drivers and offers them the order. In this task, we investigate some matching metrics for orders that did not complete successfully, i.e., the customer did not end up getting a car.

## Table of Contents
[Data Description](#dd)

[Questions](#q)

[Exploratory Data Analysis](#eda)

[Question 1](#q1)

[Question 2](#q2)

[Question 3](#q3)

[Question 4](#q4)

---

## <a id='dd'>Data Description</a>

Two datasets, `data_orders` and `data_offers`, are provided in CSV format.

The dataset `data_orders` contains:
- `order_datetime` - time of the order
- `origin_longitude` - longitude of the order
- `origin_latitude` - latitude of the order
- `m_order_eta` - time before order arrival
- `order_gk` - order number
- `order_status_key` - status of the order
    - `4` - cancelled by client
    - `9` - cancelled by system (a reject)
- `is_driver_assigned_key` - whether a driver has been assigned
- `cancellations_time_in_seconds` - how many seconds passed before cancellation
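
The two keys `order_status_key` and `is_driver_assigned_key` together describe why and when an order failed. As a minimal sketch, the helper below (`failure_category` is a hypothetical name, and the sample rows are synthetic, not taken from the dataset) combines them into one readable label per order:

```python
import pandas as pd

# Synthetic rows for illustration only; column names follow the description above.
sample = pd.DataFrame({
    "order_status_key": [4, 4, 9, 9],
    "is_driver_assigned_key": [1, 0, 0, 1],
})


def failure_category(status_key: int, driver_assigned: int) -> str:
    """Hypothetical helper: turn the two keys into a single readable label."""
    who = {4: "cancelled by client", 9: "rejected by system"}.get(status_key, "other")
    when = "driver assigned" if driver_assigned == 1 else "no driver assigned"
    return f"{who}, {when}"


# Build one label per order from the two key columns.
sample["failure_category"] = [
    failure_category(status, assigned)
    for status, assigned in zip(
        sample["order_status_key"], sample["is_driver_assigned_key"]
    )
]
print(sample)
```

Whether every combination actually occurs in the real data (for example, a system reject after a driver was assigned) is something to check rather than assume.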

The dataset `data_offers` contains:
- `order_gk` - order number, associated with the same column from the `data_orders` dataset
- `offer_id` - ID of an offer

## <a id='q'>Questions</a>

1. Build the distribution of orders by reason for failure: cancellations before and after driver assignment, and reasons for order rejection. Analyse the resulting plot. Which category has the highest number of orders?
2. Plot the distribution of failed orders by hour. Do certain hours have an abnormally high proportion of one category or another? Which hours have the most failures? How can this be explained?
3. Plot the average time to cancellation with and without a driver, by hour. If there are any outliers in the data, it would be better to remove them. Can we draw any conclusions from this plot?
4. Plot the distribution of average ETA by hour. How can this plot be explained?

## <a id='eda'>Exploratory Data Analysis</a>

In [1]:
import pandas as pd

In [3]:
orders = pd.read_csv(filepath_or_buffer="datasets/data_orders.csv")

In [4]:
#rows, columns
orders.shape

(10716, 8)

In [5]:
orders.head()

| | order_datetime | origin_longitude | origin_latitude | m_order_eta | order_gk | order_status_key | is_driver_assigned_key | cancellations_time_in_seconds |
|---|---|---|---|---|---|---|---|---|
| 0 | 18:08:07 | -0.978916 | 51.456173 | 60.0 | 3000583041974 | 4 | 1 | 198.0 |
| 1 | 20:57:32 | -0.950385 | 51.456843 | NaN | 3000583116437 | 4 | 0 | 128.0 |
| 2 | 12:07:50 | -0.96952 | 51.455544 | 477.0 | 3000582891479 | 4 | 1 | 46.0 |
| 3 | 13:50:20 | -1.054671 | 51.460544 | 658.0 | 3000582941169 | 4 | 1 | 62.0 |
| 4 | 21:24:45 | -0.967605 | 51.458236 | NaN | 3000583140877 | 9 | 0 | NaN |


In [6]:
offers = pd.read_csv(filepath_or_buffer="datasets/data_offers.csv")

In [7]:
offers.shape

(334363, 2)

In [8]:
offers.head()

| | order_gk | offer_id |
|---|---|---|
| 0 | 3000579625629 | 300050936206 |
| 1 | 3000627306450 | 300052064651 |
| 2 | 3000632920686 | 300052408812 |
| 3 | 3000632771725 | 300052393030 |
| 4 | 3000583467642 | 300051001196 |


As the two datasets share the `order_gk` column, we can merge them into one for easier manipulation.

In [9]:
df = orders.merge(right=offers, how='inner', on='order_gk')

In [10]:
df.head()

| | order_datetime | origin_longitude | origin_latitude | m_order_eta | order_gk | order_status_key | is_driver_assigned_key | cancellations_time_in_seconds | offer_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18:08:07 | -0.978916 | 51.456173 | 60.0 | 3000583041974 | 4 | 1 | 198.0 | 300050983403 |
| 1 | 20:57:32 | -0.950385 | 51.456843 | NaN | 3000583116437 | 4 | 0 | 128.0 | 300050986179 |
| 2 | 20:57:32 | -0.950385 | 51.456843 | NaN | 3000583116437 | 4 | 0 | 128.0 | 300050986174 |
| 3 | 20:57:32 | -0.950385 | 51.456843 | NaN | 3000583116437 | 4 | 0 | 128.0 | 300050986180 |
| 4 | 12:07:50 | -0.96952 | 51.455544 | 477.0 | 3000582891479 | 4 | 1 | 46.0 | 300050976275 |
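
The preview of the merged frame repeats order `3000583116437` once per offer, so counting rows of `df` counts offers rather than orders. An inner merge also drops any order that has no offer at all. The sketch below illustrates both effects on small synthetic frames (`orders_demo` and `offers_demo` are made-up stand-ins, not the real data):

```python
import pandas as pd

# Synthetic stand-ins for the two datasets.
orders_demo = pd.DataFrame({"order_gk": [1, 2, 3], "order_status_key": [4, 4, 9]})
offers_demo = pd.DataFrame({"order_gk": [1, 2, 2, 2], "offer_id": [10, 20, 21, 22]})

# Inner merge: one row per (order, offer) pair; orders without offers disappear.
inner = orders_demo.merge(offers_demo, how="inner", on="order_gk")

# Left merge with an indicator column keeps every order and flags the unmatched ones.
left = orders_demo.merge(offers_demo, how="left", on="order_gk", indicator=True)

offers_per_order = inner.groupby("order_gk")["offer_id"].count()
orders_without_offers = left.loc[left["_merge"] == "left_only", "order_gk"].tolist()

print(len(orders_demo), "orders ->", len(inner), "rows after the inner merge")
print(offers_per_order)
print("orders with no offer:", orders_without_offers)
```

For per-order statistics such as the ones in the questions, it is probably safer to work from `orders` directly, or to deduplicate on `order_gk` first.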


## <a id='q1'>Question 1</a>
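
One possible starting point, sketched on a tiny synthetic frame rather than the real data: group the orders by `is_driver_assigned_key` and `order_status_key` and count distinct rows per group. The frame `demo` and the column name `n_orders` are assumptions made for illustration.

```python
import pandas as pd

# Synthetic orders; only the column names mirror the real dataset.
demo = pd.DataFrame({
    "order_gk": [1, 2, 3, 4, 5, 6, 7],
    "order_status_key": [4, 4, 4, 9, 9, 4, 4],
    "is_driver_assigned_key": [0, 0, 1, 0, 0, 1, 0],
})

# Count orders in each (driver assigned, status) combination.
counts = (
    demo.groupby(["is_driver_assigned_key", "order_status_key"])["order_gk"]
    .count()
    .rename("n_orders")
    .reset_index()
)

# The largest category is the first row after sorting in descending order.
top = counts.sort_values("n_orders", ascending=False).iloc[0]
print(counts)
print("largest category:", dict(top))
```

The same `counts` table can be passed to a bar plot. Running this on `orders` rather than on the merged `df` avoids counting an order once per offer.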

## <a id='q2'>Question 2</a>
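
A minimal sketch of one way to approach this, assuming `order_datetime` holds `HH:MM:SS` strings as in the preview of `orders`: extract the hour, then cross-tabulate hours against the failure category. The data below is synthetic and `order_hour` is a column name introduced here.

```python
import pandas as pd

# Synthetic orders; times follow the HH:MM:SS format seen in the preview.
demo = pd.DataFrame({
    "order_datetime": ["18:08:07", "20:57:32", "12:07:50", "20:10:00", "08:15:30"],
    "order_status_key": [4, 4, 4, 9, 9],
    "is_driver_assigned_key": [1, 0, 1, 0, 0],
})

# Parse the time string and keep only the hour of day.
demo["order_hour"] = pd.to_datetime(demo["order_datetime"], format="%H:%M:%S").dt.hour

# Rows: hour of day. Columns: (driver assigned, status) pairs. Values: order counts.
by_hour = pd.crosstab(
    demo["order_hour"],
    [demo["is_driver_assigned_key"], demo["order_status_key"]],
)

# Share of each category within an hour, to spot hours dominated by one category.
share = by_hour.div(by_hour.sum(axis=1), axis=0)
print(by_hour)
print(share)
```

Looking at both the raw counts and the within-hour shares helps separate "busy hours" from "hours where one failure type is unusually common".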

## <a id='q3'>Question 3</a>
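
The question leaves the outlier rule open. The sketch below assumes a simple 1.5 × IQR fence over all cancellation times, which is only one of several reasonable choices, and then averages by hour and driver assignment. The data is synthetic and `order_hour` is assumed to have been derived from `order_datetime` beforehand.

```python
import pandas as pd

COL = "cancellations_time_in_seconds"

# Synthetic data: one extreme value (4000 s) and one missing value (a system reject).
demo = pd.DataFrame({
    "order_hour": [8, 8, 8, 8, 8, 9, 9, 9],
    "is_driver_assigned_key": [0, 0, 0, 0, 0, 1, 1, 1],
    COL: [50.0, 60.0, 70.0, 80.0, 4000.0, 100.0, 200.0, None],
})

# Keep only orders that have a cancellation time.
cancelled = demo.dropna(subset=[COL])

# Assumed outlier rule: drop values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = cancelled[COL].quantile(0.25), cancelled[COL].quantile(0.75)
iqr = q3 - q1
kept = cancelled[cancelled[COL].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Average time to cancellation per hour, one column per driver-assignment state.
avg = (
    kept.groupby(["order_hour", "is_driver_assigned_key"])[COL]
    .mean()
    .unstack("is_driver_assigned_key")
)
print(avg)
```

Applying the fence separately per group (with or without a driver) may be more appropriate if the two groups have very different typical cancellation times.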

## <a id='q4'>Question 4</a>
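
As a rough sketch on synthetic data: extract the hour from `order_datetime` and average `m_order_eta` per hour. In the preview of `orders`, the rows without an assigned driver have no ETA, and `mean()` skips missing values by default, so hours with no ETA at all come out as `NaN`.

```python
import pandas as pd

# Synthetic orders; the last row has no ETA, like the unassigned orders in the preview.
demo = pd.DataFrame({
    "order_datetime": ["18:08:07", "18:40:00", "12:07:50", "13:50:20", "21:24:45"],
    "m_order_eta": [60.0, 120.0, 477.0, 658.0, None],
})

# Hour of day from the HH:MM:SS string (order_hour is a name introduced here).
demo["order_hour"] = pd.to_datetime(demo["order_datetime"], format="%H:%M:%S").dt.hour

# Mean ETA per hour; missing ETAs are ignored within each hour.
avg_eta = demo.groupby("order_hour")["m_order_eta"].mean()
print(avg_eta)
```

The resulting series can be drawn as a line or bar plot and compared with the hourly failure counts from Question 2.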