# Uber Request Data Exploratory Data Analysis

## Project Overview
This notebook contains the exploratory data analysis of Uber request data. We'll analyze patterns in requests, cancellations, wait times, and other key metrics.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Set plot style (the 'seaborn' matplotlib style name is deprecated since matplotlib 3.6)
sns.set_theme()
plt.rcParams['figure.figsize'] = (12, 6)

# Read the data
df = pd.read_csv('../data/Uber_Request_Data.csv')

# Display basic information
print("Data Shape:", df.shape)
print("\nData Columns:")
print(df.columns.tolist())

# Convert timestamp columns to datetime
df['Request_timestamp'] = pd.to_datetime(df['Request_timestamp'])
df['Pickup_timestamp'] = pd.to_datetime(df['Pickup_timestamp'])

## 1. Data Cleaning
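This step derives time-based columns from the request timestamp, computes the wait time in minutes, and strips stray whitespace from the status labels.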

In [None]:
# Add calculated columns
df['Hour'] = df['Request_timestamp'].dt.hour
df['Day'] = df['Request_timestamp'].dt.day_name()
df['Weekday'] = df['Request_timestamp'].dt.weekday
df['Date'] = df['Request_timestamp'].dt.date

# Calculate wait time in minutes
df['Wait_Time'] = (df['Pickup_timestamp'] - df['Request_timestamp']).dt.total_seconds() / 60

# Clean Status column
df['Status'] = df['Status'].str.strip()

# Display cleaned data
df.head()
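Because `Wait_Time` is the difference between two timestamps, any request without a pickup timestamp ends up with a missing wait time. A quick missing-value summary makes this visible before plotting. The sketch below is a minimal illustration: the helper name `summarize_missing`, the sample rows, and the status labels are hypothetical and are not taken from the dataset.

```python
import pandas as pd


def summarize_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    missing_count = frame.isna().sum()
    missing_pct = (missing_count / len(frame) * 100).round(1)
    return pd.DataFrame({"missing": missing_count, "missing_pct": missing_pct})


# Hypothetical sample: two requests never received a pickup
sample = pd.DataFrame({
    "Status": ["Completed", "Cancelled", "Completed", "No Cars Available"],
    "Request_timestamp": pd.to_datetime([
        "2016-07-11 08:00", "2016-07-11 08:05",
        "2016-07-11 09:00", "2016-07-11 09:10",
    ]),
    "Pickup_timestamp": pd.to_datetime([
        "2016-07-11 08:06", None, "2016-07-11 09:04", None,
    ]),
})

missing_summary = summarize_missing(sample)
print(missing_summary)
```

Running the same helper on `df` after the cleaning cell would show how many rows have no `Pickup_timestamp` and therefore no `Wait_Time`.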

## 2. Data Analysis
### 2.1 Request Distribution by Location

In [None]:
# Request distribution by location
location_dist = df['Pickup_point'].value_counts()

plt.figure(figsize=(10, 6))
sns.barplot(x=location_dist.index, y=location_dist.values)
plt.title('Request Distribution by Location')
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

### 2.2 Status Distribution

In [None]:
# Status distribution
status_dist = df['Status'].value_counts()

plt.figure(figsize=(8, 8))
plt.pie(status_dist.values, labels=status_dist.index, autopct='%.1f%%', startangle=90)
plt.title('Request Status Distribution')
plt.tight_layout()
plt.show()
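The pie chart shows the overall status mix, but not whether that mix differs between pickup points. One way to check is a row-normalized cross-tabulation. This is a sketch under assumptions: the function name `status_share_by_location` and the sample values (including the status labels) are made up for illustration.

```python
import pandas as pd


def status_share_by_location(frame: pd.DataFrame) -> pd.DataFrame:
    """Percentage of requests in each status, computed per pickup point."""
    # normalize="index" makes each row (pickup point) sum to 1
    share = pd.crosstab(frame["Pickup_point"], frame["Status"], normalize="index")
    return share * 100


# Hypothetical sample data
sample = pd.DataFrame({
    "Pickup_point": ["Airport", "Airport", "City", "City", "City", "City"],
    "Status": ["Completed", "No Cars Available", "Completed",
               "Cancelled", "Cancelled", "Completed"],
})

status_share = status_share_by_location(sample)
print(status_share.round(1))
```

Each row sums to 100, so the rows can be compared directly even when one pickup point has far more requests than another.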

### 2.3 Hourly Pattern Analysis

In [None]:
# Hourly request distribution
hourly_dist = df.groupby('Hour')['Request_timestamp'].count()

plt.figure(figsize=(12, 6))
sns.lineplot(x=hourly_dist.index, y=hourly_dist.values)
plt.title('Hourly Request Distribution')
plt.xlabel('Hour of Day')
plt.ylabel('Number of Requests')
plt.xticks(range(24))
plt.grid(True)
plt.tight_layout()
plt.show()
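A grouped count only contains hours that appear in the data, so an hour with no requests would be silently skipped and the line plot would interpolate across it. Reindexing to all 24 hours makes empty hours explicit. The helper name `hourly_counts` and the sample hours are illustrative assumptions.

```python
import pandas as pd


def hourly_counts(frame: pd.DataFrame) -> pd.Series:
    """Count requests per hour, including hours with zero requests."""
    counts = frame["Hour"].value_counts().sort_index()
    # Fill hours that never occur with 0 so the x-axis covers 0..23
    return counts.reindex(range(24), fill_value=0)


# Hypothetical sample: requests at 08:00, 09:00 and 18:00 only
sample = pd.DataFrame({"Hour": [8, 8, 9, 18, 18, 18]})

hourly = hourly_counts(sample)
print(hourly[hourly > 0])
```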

### 2.4 Daily Pattern Analysis

In [None]:
# Daily request distribution
daily_dist = df.groupby('Day')['Request_timestamp'].count().sort_values()

plt.figure(figsize=(10, 6))
sns.barplot(x=daily_dist.index, y=daily_dist.values)
plt.title('Daily Request Distribution')
plt.xlabel('Day of Week')
plt.ylabel('Number of Requests')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
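Sorting by count ranks the days by volume, while calendar order (Monday first) makes week-long patterns easier to read. The numeric `Weekday` column created earlier (0 = Monday) can drive that ordering. This is a sketch, not the notebook's method: `DAY_NAMES`, `daily_counts_in_calendar_order`, and the sample timestamps are assumptions.

```python
import pandas as pd

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def daily_counts_in_calendar_order(frame: pd.DataFrame) -> pd.Series:
    """Count requests per day, ordered Monday..Sunday via the Weekday number."""
    counts = frame.groupby("Weekday").size()  # groupby sorts keys ascending: 0..6
    counts.index = [DAY_NAMES[i] for i in counts.index]
    return counts


# Hypothetical sample: 2016-07-11 is a Monday
timestamps = pd.to_datetime([
    "2016-07-15 10:00", "2016-07-11 08:00",
    "2016-07-11 09:00", "2016-07-13 18:00",
])
sample = pd.DataFrame({"Request_timestamp": timestamps})
sample["Weekday"] = sample["Request_timestamp"].dt.weekday

daily = daily_counts_in_calendar_order(sample)
print(daily)
```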

### 2.5 Wait Time Analysis

In [None]:
# Wait time analysis (only for completed rides)
completed_df = df[df['Status'] == 'Completed']

plt.figure(figsize=(12, 6))
sns.boxplot(x='Pickup_point', y='Wait_Time', data=completed_df)
plt.title('Wait Time Distribution by Location')
plt.xlabel('Pickup Point')
plt.ylabel('Wait Time (minutes)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
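A box plot is easier to interpret alongside the numbers behind it. The sketch below summarizes wait times per pickup point with named aggregations. The function name `wait_time_summary`, the choice of the 90th percentile, and the sample values are illustrative assumptions.

```python
import pandas as pd


def wait_time_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize wait time (minutes) per pickup point, ignoring missing waits."""
    with_wait = frame.dropna(subset=["Wait_Time"])
    return with_wait.groupby("Pickup_point")["Wait_Time"].agg(
        rides="count",
        median="median",
        mean="mean",
        p90=lambda s: s.quantile(0.9),  # 90th percentile, an arbitrary choice
    )


# Hypothetical sample data (wait times in minutes)
sample = pd.DataFrame({
    "Pickup_point": ["Airport", "Airport", "Airport", "City", "City", "City"],
    "Wait_Time": [4.0, 6.0, 8.0, 10.0, 20.0, None],
})

summary = wait_time_summary(sample)
print(summary.round(1))
```

The median is less sensitive than the mean to the few very long waits that appear as outliers in the box plot.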

## 3. Insights and Recommendations

### Key Areas of Analysis:
1. Location Analysis:
   - Identify the most popular pickup locations
   - Analyze location-wise service demand
   - Optimize driver allocation

2. Time Patterns:
   - Identify peak hours for service demand
   - Analyze wait times during peak hours
   - Optimize driver scheduling

3. Service Performance:
   - Track completion vs cancellation rates
   - Monitor wait times
   - Identify areas for improvement
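The time-pattern items above can be approached by ranking hours by request volume and checking the typical wait in the busiest ones. The following is a minimal sketch under assumptions: `peak_hour_wait`, the `top_n` parameter, and the sample values are hypothetical, and "peak" is simply defined here as the hours with the most requests.

```python
import pandas as pd


def peak_hour_wait(frame: pd.DataFrame, top_n: int = 3) -> pd.DataFrame:
    """Return the top_n busiest hours with their request count and median wait."""
    per_hour = frame.groupby("Hour").agg(
        requests=("Wait_Time", "size"),      # size counts rows, including missing waits
        median_wait=("Wait_Time", "median"),  # median skips missing waits
    )
    return per_hour.nlargest(top_n, "requests")


# Hypothetical sample: one request at 08:00 has no recorded wait time
sample = pd.DataFrame({
    "Hour": [8, 8, 8, 9, 18, 18, 3],
    "Wait_Time": [5.0, 7.0, None, 4.0, 10.0, 12.0, 3.0],
})

peaks = peak_hour_wait(sample, top_n=2)
print(peaks)
```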

### Recommendations:
1. Driver Allocation:
   - Increase driver availability during peak hours
   - Focus on high-demand locations
   - Implement dynamic driver distribution

2. Service Improvement:
   - Reduce wait times in high-demand areas
   - Address cancellation hotspots
   - Enhance weekend service coverage

3. Resource Planning:
   - Schedule drivers based on historical patterns
   - Monitor performance metrics
   - Implement location-specific strategies