# Project Name- Exploratory Data Analysis on Dataset "UBER REQUEST                                                        DATA"

## Dataset - https://drive.google.com/file/d/1qxKTDG3cIJFW98Xt1YbM1Q6fwvjqpcId/view?usp=sharing

### Introduction
This data set is a masked data set which is similar to what data analysts at Uber handle. 


### Business Understanding
You may have some experience of travelling to and from the airport. Have you ever used Uber
or any other cab service for this travel? Did you at any time face the problem of cancellation by
the driver or non-availability of cars?

### Import the Lybraries

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Load Dataset

In [3]:
data=pd.read_csv('Uber Request Data.csv')
data

Unnamed: 0,Request id,Pickup point,Driver id,Status,Request timestamp,Drop timestamp
0,619,Airport,1.0,Trip Completed,11/7/2016 11:51,11/7/2016 13:00
1,867,Airport,1.0,Trip Completed,11/7/2016 17:57,11/7/2016 18:47
2,1807,City,1.0,Trip Completed,12/7/2016 9:17,12/7/2016 9:58
3,2532,Airport,1.0,Trip Completed,12/7/2016 21:08,12/7/2016 22:03
4,3112,City,1.0,Trip Completed,13-07-2016 08:33:16,13-07-2016 09:25:47
...,...,...,...,...,...,...
6740,6745,City,,No Cars Available,15-07-2016 23:49:03,
6741,6752,Airport,,No Cars Available,15-07-2016 23:50:05,
6742,6751,City,,No Cars Available,15-07-2016 23:52:06,
6743,6754,City,,No Cars Available,15-07-2016 23:54:39,


Well, if these are the problems faced by customers, these very issues also impact the business
of Uber. If drivers cancel the request of riders or if cars are unavailable, Uber loses out on its
revenue.

As an analyst, you decide to address the problem Uber is facing - driver cancellation and
non-availability of cars leading to loss of potential revenue.

### Business Objectives
The aim of analysis is to identify the root cause of the problem (i.e. cancellation and
non-availability of cars) and recommend ways to improve the situation. As a result of your
analysis, you should be able to present to the client the root cause(s) and possible hypotheses
of the problem(s) and recommend ways to improve them.

In [4]:
data.head()

Unnamed: 0,Request id,Pickup point,Driver id,Status,Request timestamp,Drop timestamp
0,619,Airport,1.0,Trip Completed,11/7/2016 11:51,11/7/2016 13:00
1,867,Airport,1.0,Trip Completed,11/7/2016 17:57,11/7/2016 18:47
2,1807,City,1.0,Trip Completed,12/7/2016 9:17,12/7/2016 9:58
3,2532,Airport,1.0,Trip Completed,12/7/2016 21:08,12/7/2016 22:03
4,3112,City,1.0,Trip Completed,13-07-2016 08:33:16,13-07-2016 09:25:47


#### There are six attributes associated with each request made by a customer

### description of data

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6745 entries, 0 to 6744
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Request id         6745 non-null   int64  
 1   Pickup point       6745 non-null   object 
 2   Driver id          4095 non-null   float64
 3   Status             6745 non-null   object 
 4   Request timestamp  6745 non-null   object 
 5   Drop timestamp     2831 non-null   object 
dtypes: float64(1), int64(1), object(4)
memory usage: 316.3+ KB


In [6]:
data.shape

(6745, 6)

In [7]:
data.columns

Index(['Request id', 'Pickup point', 'Driver id', 'Status',
       'Request timestamp', 'Drop timestamp'],
      dtype='object')

#### 1. Request id: A unique identifier of the request


In [8]:
len(data['Request id'].unique())

6745

#### Checking Missing values

In [9]:
data.isnull().sum()

Request id              0
Pickup point            0
Driver id            2650
Status                  0
Request timestamp       0
Drop timestamp       3914
dtype: int64

In [10]:
data.shape[0]

6745

In [11]:
(data.isnull().sum()/data.shape[0]*100)

Request id            0.000000
Pickup point          0.000000
Driver id            39.288362
Status                0.000000
Request timestamp     0.000000
Drop timestamp       58.028169
dtype: float64

In [12]:
data['Request timestamp']

0           11/7/2016 11:51
1           11/7/2016 17:57
2            12/7/2016 9:17
3           12/7/2016 21:08
4       13-07-2016 08:33:16
               ...         
6740    15-07-2016 23:49:03
6741    15-07-2016 23:50:05
6742    15-07-2016 23:52:06
6743    15-07-2016 23:54:39
6744    15-07-2016 23:55:03
Name: Request timestamp, Length: 6745, dtype: object

In [13]:
data['Request.timestamp']=pd.to_datetime(data['Request timestamp'],dayfirst=True)
data

Unnamed: 0,Request id,Pickup point,Driver id,Status,Request timestamp,Drop timestamp,Request.timestamp
0,619,Airport,1.0,Trip Completed,11/7/2016 11:51,11/7/2016 13:00,2016-07-11 11:51:00
1,867,Airport,1.0,Trip Completed,11/7/2016 17:57,11/7/2016 18:47,2016-07-11 17:57:00
2,1807,City,1.0,Trip Completed,12/7/2016 9:17,12/7/2016 9:58,2016-07-12 09:17:00
3,2532,Airport,1.0,Trip Completed,12/7/2016 21:08,12/7/2016 22:03,2016-07-12 21:08:00
4,3112,City,1.0,Trip Completed,13-07-2016 08:33:16,13-07-2016 09:25:47,2016-07-13 08:33:16
...,...,...,...,...,...,...,...
6740,6745,City,,No Cars Available,15-07-2016 23:49:03,,2016-07-15 23:49:03
6741,6752,Airport,,No Cars Available,15-07-2016 23:50:05,,2016-07-15 23:50:05
6742,6751,City,,No Cars Available,15-07-2016 23:52:06,,2016-07-15 23:52:06
6743,6754,City,,No Cars Available,15-07-2016 23:54:39,,2016-07-15 23:54:39


In [14]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6745 entries, 0 to 6744
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Request id         6745 non-null   int64         
 1   Pickup point       6745 non-null   object        
 2   Driver id          4095 non-null   float64       
 3   Status             6745 non-null   object        
 4   Request timestamp  6745 non-null   object        
 5   Drop timestamp     2831 non-null   object        
 6   Request.timestamp  6745 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(1), object(4)
memory usage: 369.0+ KB


## 2.2. Time of request: The date and time at which the customer made the trip request


In [10]:
data['Drop timestamp']=pd.to_datetime(data['Drop timestamp'],dayfirst=True)
data

Unnamed: 0,Request id,Pickup point,Driver id,Status,Request timestamp,Drop timestamp
0,619,Airport,1.0,Trip Completed,11/7/2016 11:51,2016-07-11 13:00:00
1,867,Airport,1.0,Trip Completed,11/7/2016 17:57,2016-07-11 18:47:00
2,1807,City,1.0,Trip Completed,12/7/2016 9:17,2016-07-12 09:58:00
3,2532,Airport,1.0,Trip Completed,12/7/2016 21:08,2016-07-12 22:03:00
4,3112,City,1.0,Trip Completed,13-07-2016 08:33:16,2016-07-13 09:25:47
...,...,...,...,...,...,...
6740,6745,City,,No Cars Available,15-07-2016 23:49:03,NaT
6741,6752,Airport,,No Cars Available,15-07-2016 23:50:05,NaT
6742,6751,City,,No Cars Available,15-07-2016 23:52:06,NaT
6743,6754,City,,No Cars Available,15-07-2016 23:54:39,NaT


In [11]:
data['Request timestamp']=data['Request timestamp'].astype(str)

In [12]:
data['Request timestamp']=data['Request timestamp'].replace('/','-')

In [13]:
data['Request timestamp'][4]

'13-07-2016 08:33:16'

In [14]:
data['Drop timestamp']

0      2016-07-11 13:00:00
1      2016-07-11 18:47:00
2      2016-07-12 09:58:00
3      2016-07-12 22:03:00
4      2016-07-13 09:25:47
               ...        
6740                   NaT
6741                   NaT
6742                   NaT
6743                   NaT
6744                   NaT
Name: Drop timestamp, Length: 6745, dtype: datetime64[ns]

In [15]:
import datetime as dt

In [19]:
data['Request timestamp'].value_counts()


11/7/2016 17:57        6
11/7/2016 9:40         6
11/7/2016 19:02        6
11/7/2016 8:37         6
12/7/2016 4:46         5
                      ..
13-07-2016 15:41:05    1
15-07-2016 08:42:40    1
12/7/2016 5:12         1
13-07-2016 14:33:34    1
11/7/2016 18:19        1
Name: Request timestamp, Length: 5618, dtype: int64

In [21]:
data['Drop timestamp']

0      2016-07-11 13:00:00
1      2016-07-11 18:47:00
2      2016-07-12 09:58:00
3      2016-07-12 22:03:00
4      2016-07-13 09:25:47
               ...        
6740                   NaT
6741                   NaT
6742                   NaT
6743                   NaT
6744                   NaT
Name: Drop timestamp, Length: 6745, dtype: datetime64[ns]

In [36]:
import datetime as dt
day=data['Request timestamp'].dt.day

AttributeError: Can only use .dt accessor with datetimelike values

In [35]:
req_hour=data['Request timestamp'].dt.hour

AttributeError: Can only use .dt accessor with datetimelike values

In [29]:
pip install datetime

Collecting datetime
  Downloading DateTime-4.3-py2.py3-none-any.whl (60 kB)
Installing collected packages: datetime
Successfully installed datetime-4.3
Note: you may need to restart the kernel to use updated packages.


In [33]:
#hr=data['Request timestamp'].dt.hour

### Data Visualzation

In [42]:
sns.barplot(data=data,x='Status')
plt.show()
print(data['Status'].value_counts())

TypeError: Horizontal orientation requires numeric `x` variable.