## Assignment 1: Data Exploration

- Load the Uber.csv dataset.

- Display the number of rows and columns.

- Show unique categories in the CATEGORY column.

- Find how many null values there are in PURPOSE.

- Rename all columns to uppercase.

## Assignment 2: Data Filtering & Transformation

- Display rides where CATEGORY == 'Business'.

- Show the top 5 rides with the longest distance (MILES).

- Replace all missing PURPOSE values with "Not Specified".

- Create a new column TRIP_DURATION using END_DATE - START_DATE.

- Sort trips by distance in descending order.

## Assignment 3: Grouping and Aggregation

- Group by CATEGORY and find the average miles per category.

- Find the total number of trips for each PURPOSE.

- Identify the top 3 start locations by number of rides.

- Plot:
  - A bar chart of average miles by category.
  - A pie chart of trip purposes.
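
The two plots listed above can be sketched as follows. This is only an illustration under stated assumptions: the small `sample` frame and its values are made up for the example, the real data would come from the CSV, and the `Agg` backend is selected only so the snippet runs as a plain script (in a notebook that line can be dropped).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumption: running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows; the column names follow the dataset's starred headers.
sample = pd.DataFrame({
    "CATEGORY*": ["Business", "Business", "Personal", "Business"],
    "MILES*": [5.0, 7.0, 3.0, 6.0],
    "PURPOSE*": ["Meeting", None, "Errand/Supplies", "Meeting"],
})

# Average miles per category (input for the bar chart).
avg_miles = sample.groupby("CATEGORY*")["MILES*"].mean()

# Trips per purpose, with missing purposes given an explicit label (input for the pie chart).
purpose_counts = sample["PURPOSE*"].fillna("Not Specified").value_counts()

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

avg_miles.plot(kind="bar", ax=ax_bar, title="Average miles by category")
ax_bar.set_ylabel("Miles")

purpose_counts.plot(kind="pie", ax=ax_pie, autopct="%1.0f%%", title="Trip purposes")
ax_pie.set_ylabel("")  # drop the default y-label that pandas adds to pie charts

fig.tight_layout()
```

Labelling missing purposes before counting keeps them visible as their own pie slice instead of silently dropping them.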

# Assignment 1: Data Exploration

In [2]:
import pandas as pd
uber_data = pd.read_csv("Uber_Drives_2016.csv")
uber_data.head()

Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*
0,1/1/2016 21:11,1/1/2016 21:17,Business,Fort Pierce,Fort Pierce,5.1,Meal/Entertain
1,1/2/2016 1:25,1/2/2016 1:37,Business,Fort Pierce,Fort Pierce,5.0,
2,1/2/2016 20:25,1/2/2016 20:38,Business,Fort Pierce,Fort Pierce,4.8,Errand/Supplies
3,1/5/2016 17:31,1/5/2016 17:45,Business,Fort Pierce,Fort Pierce,4.7,Meeting
4,1/6/2016 14:42,1/6/2016 15:49,Business,Fort Pierce,West Palm Beach,63.7,Customer Visit


In [2]:
print(uber_data.shape)

(1156, 7)


In [12]:
print(uber_data['CATEGORY*'].unique())


['Business' 'Personal' nan]
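
The `nan` in the output above is a missing value, not a third category. A minimal sketch of listing only the real categories, assuming a small stand-in Series (`category` is a hypothetical name, not part of the notebook):

```python
import pandas as pd

# Stand-in for uber_data['CATEGORY*']; one entry is missing on purpose.
category = pd.Series(["Business", "Personal", "Business", None], name="CATEGORY*")

# unique() keeps NaN, so drop missing values first to list only real categories.
categories = category.dropna().unique().tolist()
print(categories)

# value_counts(dropna=False) also reports how many rows have no category at all.
counts = category.value_counts(dropna=False)
print(counts)
```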


In [14]:
print(uber_data['PURPOSE*'].isnull().sum())

503


In [15]:
uber_data.columns = [col.upper() for col in uber_data.columns]
print(uber_data.columns)

Index(['START_DATE*', 'END_DATE*', 'CATEGORY*', 'START*', 'STOP*', 'MILES*',
       'PURPOSE*'],
      dtype='object')


In [16]:
Uber_data=uber_data.rename(columns={'START_DATE*':'START_DATE','END_DATE*':'END_DATE','START*':'START'})
Uber_data

Unnamed: 0,START_DATE,END_DATE,CATEGORY*,START,STOP*,MILES*,PURPOSE*
0,1/1/2016 21:11,1/1/2016 21:17,Business,Fort Pierce,Fort Pierce,5.1,Meal/Entertain
1,1/2/2016 1:25,1/2/2016 1:37,Business,Fort Pierce,Fort Pierce,5.0,
2,1/2/2016 20:25,1/2/2016 20:38,Business,Fort Pierce,Fort Pierce,4.8,Errand/Supplies
3,1/5/2016 17:31,1/5/2016 17:45,Business,Fort Pierce,Fort Pierce,4.7,Meeting
4,1/6/2016 14:42,1/6/2016 15:49,Business,Fort Pierce,West Palm Beach,63.7,Customer Visit
...,...,...,...,...,...,...,...
1151,12/31/2016 13:24,12/31/2016 13:42,Business,Kar?chi,Unknown Location,3.9,Temporary Site
1152,12/31/2016 15:03,12/31/2016 15:38,Business,Unknown Location,Unknown Location,16.2,Meeting
1153,12/31/2016 21:32,12/31/2016 21:50,Business,Katunayake,Gampaha,6.4,Temporary Site
1154,12/31/2016 22:08,12/31/2016 23:51,Business,Gampaha,Ilukwatta,48.2,Temporary Site
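
The rename above only cleans three of the seven headers. One possible way to strip the trailing `*` from every column at once is sketched below; `sample` and `clean` are hypothetical names, and the single row is copied from the first row of the output above.

```python
import pandas as pd

columns = ["START_DATE*", "END_DATE*", "CATEGORY*", "START*", "STOP*", "MILES*", "PURPOSE*"]
sample = pd.DataFrame(
    [["1/1/2016 21:11", "1/1/2016 21:17", "Business",
      "Fort Pierce", "Fort Pierce", 5.1, "Meal/Entertain"]],
    columns=columns,
)

# Work on a copy so the original frame keeps its starred names.
clean = sample.copy()

# Remove the trailing '*' and normalise to uppercase for every column in one step.
clean.columns = clean.columns.str.rstrip("*").str.upper()
print(list(clean.columns))
```

Working on a copy matters here because the later cells in this notebook still refer to the starred names such as `'MILES*'`.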


# Assignment 2: Data Filtering & Transformation
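
The first task in this assignment, displaying rides where CATEGORY is 'Business', has no cell in the notebook. A minimal sketch using a boolean mask, assuming a made-up `sample` frame in place of the loaded data:

```python
import pandas as pd

# Hypothetical rows; the last one mimics a summary row with no category.
sample = pd.DataFrame({
    "CATEGORY*": ["Business", "Personal", "Business", None],
    "MILES*": [5.1, 2.0, 63.7, 12204.7],
})

# Comparing the column to a string yields a boolean mask;
# rows with a missing category compare as False and are left out.
business_rides = sample[sample["CATEGORY*"] == "Business"]
print(business_rides)
```

On the real frame the same expression would be `uber_data[uber_data['CATEGORY*'] == 'Business']`.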

In [4]:
# The file ends with a 'Totals' summary row, which is not a ride, so exclude it first.
trips_only = uber_data[uber_data['START_DATE*'] != 'Totals']
rides = trips_only.nlargest(5, 'MILES*')
rides

Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*
269,3/25/2016 16:52,3/25/2016 22:22,Business,Latta,Jacksonville,310.3,Customer Visit
270,3/25/2016 22:54,3/26/2016 1:39,Business,Jacksonville,Kissimmee,201.0,Meeting
881,10/30/2016 15:22,10/30/2016 18:23,Business,Asheville,Mebane,195.9,
776,9/27/2016 21:01,9/28/2016 2:37,Business,Unknown Location,Unknown Location,195.6,
546,7/14/2016 16:39,7/14/2016 20:05,Business,Morrisville,Banner Elk,195.3,


In [5]:
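# fillna returns a new Series and leaves uber_data unchanged, so check the result, not the original.
# To persist the change, assign it back: uber_data['PURPOSE*'] = purpose_filled
purpose_filled = uber_data['PURPOSE*'].fillna('Not Specified')
print(purpose_filled.isnull().sum())

0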


In [9]:
START_DATE = pd.to_datetime(uber_data['START_DATE*'])
END_DATE = pd.to_datetime(uber_data['END_DATE*'])
TRIP_DURATION = END_DATE - START_DATE
data = pd.DataFrame({'START_DATE': START_DATE,'END_DATE': END_DATE,'TRIP_DURATION': TRIP_DURATION})
print(data.head())

           START_DATE            END_DATE   TRIP_DURATION
0 2016-01-01 21:11:00 2016-01-01 21:17:00 0 days 00:06:00
1 2016-01-02 01:25:00 2016-01-02 01:37:00 0 days 00:12:00
2 2016-01-02 20:25:00 2016-01-02 20:38:00 0 days 00:13:00
3 2016-01-05 17:31:00 2016-01-05 17:45:00 0 days 00:14:00
4 2016-01-06 14:42:00 2016-01-06 15:49:00 0 days 01:07:00
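
The cell above builds a separate frame rather than adding a column to the trips themselves. A sketch of attaching TRIP_DURATION to the same frame and converting it to minutes follows; the `trips` frame reuses two rows from the output above, `TRIP_MINUTES` is a hypothetical extra column, and the explicit `format` string is an assumption about how the dates are written in the file.

```python
import pandas as pd

trips = pd.DataFrame({
    "START_DATE*": ["1/1/2016 21:11", "1/6/2016 14:42"],
    "END_DATE*": ["1/1/2016 21:17", "1/6/2016 15:49"],
})

# Parse the month/day/year timestamps explicitly instead of relying on inference.
start = pd.to_datetime(trips["START_DATE*"], format="%m/%d/%Y %H:%M")
end = pd.to_datetime(trips["END_DATE*"], format="%m/%d/%Y %H:%M")

# Subtracting two datetime columns gives a timedelta column.
trips["TRIP_DURATION"] = end - start

# A plain number of minutes is easier to sort, average, or plot.
trips["TRIP_MINUTES"] = trips["TRIP_DURATION"].dt.total_seconds() / 60
print(trips[["TRIP_DURATION", "TRIP_MINUTES"]])
```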


In [12]:
sorted_trips = uber_data.sort_values(by='MILES*', ascending=False)
sorted_trips

Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*
269,3/25/2016 16:52,3/25/2016 22:22,Business,Latta,Jacksonville,310.3,Customer Visit
270,3/25/2016 22:54,3/26/2016 1:39,Business,Jacksonville,Kissimmee,201.0,Meeting
881,10/30/2016 15:22,10/30/2016 18:23,Business,Asheville,Mebane,195.9,
776,9/27/2016 21:01,9/28/2016 2:37,Business,Unknown Location,Unknown Location,195.6,
546,7/14/2016 16:39,7/14/2016 20:05,Business,Morrisville,Banner Elk,195.3,
...,...,...,...,...,...,...,...
1110,12/24/2016 22:04,12/24/2016 22:09,Business,Lahore,Lahore,0.6,Errand/Supplies
516,7/5/2016 16:48,7/5/2016 16:52,Business,Whitebridge,Whitebridge,0.6,Errand/Supplies
44,1/26/2016 17:27,1/26/2016 17:29,Business,Cary,Cary,0.5,Errand/Supplies
120,2/17/2016 16:38,2/17/2016 16:43,Business,Katunayaka,Katunayaka,0.5,Errand/Supplies
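
# Assignment 3: Grouping and Aggregation

The notebook stops after Assignment 2. The grouping tasks of Assignment 3 could be approached roughly as below. This is a sketch, not the notebook's own solution: the `sample` frame and all of its values are invented for illustration, and only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical rows standing in for the loaded Uber data.
sample = pd.DataFrame({
    "CATEGORY*": ["Business", "Business", "Business", "Personal",
                  "Personal", "Business", "Personal"],
    "START*": ["Fort Pierce", "Fort Pierce", "Fort Pierce", "Cary",
               "Cary", "Lahore", "Latta"],
    "MILES*": [5.0, 7.0, 6.0, 2.0, 4.0, 10.0, 3.0],
    "PURPOSE*": ["Meeting", None, "Meeting", "Errand/Supplies",
                 None, "Customer Visit", None],
})

# Average miles per category.
avg_miles = sample.groupby("CATEGORY*")["MILES*"].mean()
print(avg_miles)

# Total trips for each purpose; missing purposes are labelled so they are counted too.
trips_per_purpose = sample["PURPOSE*"].fillna("Not Specified").value_counts()
print(trips_per_purpose)

# Top 3 start locations by number of rides (value_counts sorts in descending order).
top_starts = sample["START*"].value_counts().head(3)
print(top_starts)
```

On the real file, the 'Totals' summary row should be dropped before grouping, otherwise its mileage ends up in the aggregates.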
