This project explores Uber ride booking data to gain insights into ride patterns, user behavior, and trends using Python data science libraries. It also applies Machine Learning techniques for predictive modeling. The dataset is sourced from Kaggle and analyzed step by step in a Jupyter Notebook.
- Source: Uber Ride Analytics Dashboard - Kaggle
- File Used: ncr_ride_bookings.csv - Contains booking details such as ride status, pickup/drop information, payment type, and timestamps.
- Python
- Pandas - Data manipulation
- NumPy - Numerical computations
- Matplotlib & Seaborn - Data visualization
- Scikit-learn - Machine Learning models
- KaggleHub - Downloading the dataset from Kaggle
- Loaded CSV data
- Handled missing values
- Converted datetime columns into features (day, month, weekday, hour)
- Encoded categorical variables
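The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Date`, `Time`, `Booking Status`, `Payment Method`, `Ride Distance`) and the tiny inline CSV are assumptions standing in for ncr_ride_bookings.csv, and the choice of fill values and one-hot encoding is only one reasonable option.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for ncr_ride_bookings.csv.
# Column names and values are assumptions made for this sketch.
csv_text = """Date,Time,Booking Status,Payment Method,Ride Distance
2024-01-05,08:15:00,Completed,UPI,12.5
2024-01-06,18:40:00,Cancelled,,
2024-02-10,23:05:00,Completed,Cash,7.0
2024-03-03,09:30:00,Incomplete,UPI,3.2
"""

# Load the CSV data.
df = pd.read_csv(io.StringIO(csv_text))

# Handle missing values: a placeholder label for the categorical column,
# the column median for the numeric one.
df["Payment Method"] = df["Payment Method"].fillna("Unknown")
df["Ride Distance"] = df["Ride Distance"].fillna(df["Ride Distance"].median())

# Combine date and time into one timestamp, then derive time-based features.
timestamp = pd.to_datetime(df["Date"] + " " + df["Time"])
df["day"] = timestamp.dt.day
df["month"] = timestamp.dt.month
df["weekday"] = timestamp.dt.dayofweek  # Monday = 0, Sunday = 6
df["hour"] = timestamp.dt.hour

# Encode a categorical variable as one-hot indicator columns.
encoded = pd.get_dummies(df, columns=["Payment Method"])
print(encoded.head())
```

With the real file, the `io.StringIO(csv_text)` argument would be replaced by the path to the downloaded CSV.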
- Ride distribution by hours, weekdays, and months
- Pickup vs Drop location trends
- Ride status distribution (Completed, Cancelled, Incomplete)
- Payment method preferences
- Correlation heatmaps and count plots
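As a rough sketch of how plots like these might be produced with Seaborn and Matplotlib, the example below draws two count plots and a correlation heatmap. The DataFrame is synthetic and its column names (`hour`, `Booking Status`, `Ride Distance`, `Booking Value`) are assumptions, so the figures only illustrate the technique, not findings from the dataset.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the preprocessed bookings; all values are made up.
df = pd.DataFrame({
    "hour": [8, 9, 9, 18, 18, 18, 23],
    "Booking Status": ["Completed", "Completed", "Cancelled", "Completed",
                       "Incomplete", "Completed", "Cancelled"],
    "Ride Distance": [12.5, 4.0, 6.1, 9.8, 3.2, 15.0, 7.4],
    "Booking Value": [310, 120, 150, 260, 90, 400, 180],
})

# Tabular versions of the distributions shown in the count plots.
status_counts = df["Booking Status"].value_counts()
rides_per_hour = df["hour"].value_counts().sort_index()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Count plots: rides per hour and rides per status.
sns.countplot(data=df, x="hour", ax=axes[0])
axes[0].set_title("Rides by hour")
sns.countplot(data=df, x="Booking Status", ax=axes[1])
axes[1].set_title("Ride status distribution")

# Correlation heatmap over the numeric columns.
corr = df[["hour", "Ride Distance", "Booking Value"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", ax=axes[2])
axes[2].set_title("Correlation heatmap")

fig.tight_layout()
```

In a Jupyter Notebook the `matplotlib.use("Agg")` line can be dropped so the figure renders inline; in a script, `fig.savefig(...)` writes it to disk.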
- Feature Engineering: Extracted time-based and categorical features
- Classification Models (predicting ride status):
  - Logistic Regression
  - Random Forest
  - Decision Tree
- Clustering: KMeans for grouping rides by locations and patterns
- Evaluation Metrics: Accuracy, Precision, Recall, F1-score
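The modeling steps listed above could look something like the sketch below, which trains the three classifiers, reports the four metrics, and runs KMeans. Everything about the data is an assumption: the features (`hour`, `weekday`, `distance_km`), the rule that generates the status label, the macro averaging, and the choice of three clusters are illustrative only and do not reflect the notebook's actual setup or results.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic features standing in for the engineered columns (assumed names).
rng = np.random.default_rng(42)
n_rides = 300
X = pd.DataFrame({
    "hour": rng.integers(0, 24, n_rides),
    "weekday": rng.integers(0, 7, n_rides),
    "distance_km": rng.uniform(1, 30, n_rides),
})
# Made-up labeling rule so the models have a pattern to learn.
y = np.where((X["hour"] >= 22) | (X["distance_km"] > 25), "Cancelled", "Completed")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Fit each classifier and collect accuracy, precision, recall, and F1-score.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, predictions, average="macro", zero_division=0
    )
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

print(pd.DataFrame(results).T.round(3))

# Clustering: group rides by hour and distance (a stand-in for location features).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X[["hour", "distance_km"]])
print(pd.Series(cluster_labels).value_counts().sort_index())
```

Macro averaging weights each ride status equally, which is often a sensible default when one status (such as Completed) dominates the data; weighted or per-class scores are alternatives.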
- Clone this repository:
git clone https://github.com/mahadev19/Uber-Data-Science.git