<a href="https://colab.research.google.com/github/ishadvay3928/Ola-Ride-Insights/blob/main/Ola_Ride_Insights.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name** - **Ola Ride Insights Analysis**



# **GitHub Link -**

https://github.com/ishadvay3928/Ola-Ride-Insights/blob/main/Ola_Ride_Insights.ipynb

# ***Let's Begin!***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Load the Dataset
df = pd.read_csv("/content/OLA_DataSet.xlsx - July.csv")

### Dataset First View

In [None]:
# First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Duplicate Value Count
df.duplicated().sum()
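
`df.duplicated()` only counts rows that match in every column. `Booking_ID` is described below as a unique identifier, so a check on that key alone may also be informative. Below is a minimal sketch on a hypothetical toy frame with made-up IDs and values. On the real data, replace `toy` with `df`:

```python
import pandas as pd

# Hypothetical toy rows; Booking_ID values are illustrative, not taken from the dataset
toy = pd.DataFrame({
    "Booking_ID": ["CNR1", "CNR2", "CNR2", "CNR3"],
    "Booking_Value": [120, 250, 300, 90],
})

# Full-row duplicates: every column must match
full_row_dupes = toy.duplicated().sum()

# Key-based duplicates: only Booking_ID must match
id_dupes = toy.duplicated(subset=["Booking_ID"]).sum()

print(full_row_dupes, id_dupes)  # 0 1
```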

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count of datasets
df.isnull().sum()

### What did you know about your dataset?

- There are 103,024 rows and 20 columns in the dataset.
- There are missing values in columns V_TAT, C_TAT, Canceled_Rides_by_Customer, Canceled_Rides_by_Driver, Incomplete_Rides, Incomplete_Rides_Reason, Payment_Method, Driver_Ratings and Customer_Rating.
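
The raw null counts can be expressed as percentages to show which columns are most affected. Below is a minimal sketch on a hypothetical toy frame whose values are invented for illustration. On the real data, replace `toy` with `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Ola data; values are illustrative only
toy = pd.DataFrame({
    "Booking_ID": ["B1", "B2", "B3", "B4"],
    "V_TAT": [10.0, np.nan, 25.0, np.nan],
    "Payment_Method": ["UPI", None, "Cash", "UPI"],
})

# Count and share of missing values per column
missing = toy.isnull().sum()
missing_pct = (missing / len(toy) * 100).round(2)

# Keep only columns that actually have gaps, worst first
report = pd.DataFrame({"missing": missing, "pct": missing_pct})
report = report[report["missing"] > 0].sort_values("missing", ascending=False)
print(report)
```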

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description


| Variable Name                  | Description                                            |
| :----------------------------- | :----------------------------------------------------- |
| **Date** | The date on which the ride was booked.                 |
| **Time** | The time when the ride was booked.                     |
| **Booking_ID** | A unique identifier for each ride booking.             |
| **Booking_Status** | The final status of the booking (e.g., success, canceled).|
| **Customer_ID** | A unique identifier for the customer.                  |
| **Vehicle_Type** | The type of vehicle used for the ride (e.g., Bike, Prime Sedan, eBike).|
| **Pickup_Location** | The location where the customer was picked up.         |
| **Drop_Location** | The location where the customer was dropped off.       |
| **V_TAT** | Vehicle Turn Around Time. The time taken for a vehicle to complete a trip. |
| **C_TAT** | Customer Turn Around Time. The time taken for the customer's journey. |
| **Canceled_Rides_by_Customer** | Indicates if a ride was canceled by the customer.      |
| **Canceled_Rides_by_Driver** | Indicates if a ride was canceled by the driver.        |
| **Incomplete_Rides** | Indicates if the ride was incomplete.                  |
| **Incomplete_Rides_Reason** | The reason for the incomplete ride.                    |
| **Booking_Value** | The total monetary value of the ride.                  |
| **Payment_Method** | The method used for payment (e.g., UPI, Cash).         |
| **Ride_Distance** | The total distance of the ride in kilometers.          |
| **Driver_Ratings** | The rating given to the driver for the ride.           |
| **Customer_Rating** | The rating given by the customer for the ride.         |
| **Vehicle Images** | A link or reference to an image of the vehicle.        |

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable of dataset.
df.nunique()
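
The `nunique()` output can be read mechanically. A column with one distinct value per row behaves like an identifier. A column with few distinct values relative to the row count is a natural grouping key. Below is a sketch on a hypothetical toy frame. The 0.5 ratio cut-off is an arbitrary assumption, not a rule:

```python
import pandas as pd

# Hypothetical toy frame; IDs and vehicle types are illustrative only
toy = pd.DataFrame({
    "Booking_ID": ["B1", "B2", "B3", "B4", "B5", "B6"],
    "Customer_ID": ["C1", "C2", "C3", "C4", "C5", "C1"],
    "Vehicle_Type": ["Bike", "Bike", "eBike", "Bike", "eBike", "Bike"],
})

unique_counts = toy.nunique()
unique_ratio = unique_counts / len(toy)

# One distinct value per row: the column behaves like an identifier
id_like = unique_ratio[unique_ratio == 1.0].index.tolist()

# Few distinct values relative to row count (0.5 is an arbitrary cut-off):
# candidates for group-by analysis or the 'category' dtype
low_cardinality = unique_ratio[unique_ratio < 0.5].index.tolist()

print(id_like)          # ['Booking_ID']
print(low_cardinality)  # ['Vehicle_Type']
```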

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# 1. Standardize column names
df.columns = df.columns.str.strip().str.replace(' ', '_').str.replace('__', '_')
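
The renaming chain can be checked on a few sample headers. `str.replace('__', '_')` makes a single left-to-right pass, so a run of three underscores is reduced to two, not one. The headers below are hypothetical. The regex variant is an alternative and not the notebook's code:

```python
import pandas as pd

# Hypothetical raw headers; the stray spaces are illustrative
raw = pd.Index([" Booking_ID", "Vehicle Images ", "Ride _Distance", "Driver  _Ratings"])

# Same chain as the notebook: strip, spaces -> underscores, '__' -> '_' (single pass)
notebook_style = (
    raw.str.strip()
    .str.replace(" ", "_", regex=False)
    .str.replace("__", "_", regex=False)
)

# Regex variant (an alternative, not the notebook's code): collapse any run of
# whitespace and/or underscores into a single underscore
regex_style = raw.str.strip().str.replace(r"[\s_]+", "_", regex=True)

print(list(notebook_style))  # ['Booking_ID', 'Vehicle_Images', 'Ride_Distance', 'Driver__Ratings']
print(list(regex_style))     # ['Booking_ID', 'Vehicle_Images', 'Ride_Distance', 'Driver_Ratings']
```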

In [None]:
# 2. Convert 'Date' column to datetime objects
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
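
With `errors='coerce'`, unparseable values silently become `NaT`, so it is worth counting them. Below is a minimal sketch. The sample strings are hypothetical. The explicit `format` assumes ISO-style `YYYY-MM-DD` dates, which may not match the actual CSV:

```python
import pandas as pd

# Hypothetical raw values; "not a date" stands in for a malformed entry
raw_dates = pd.Series(["2024-07-01", "2024-07-15", "not a date"])

# errors="coerce" turns anything unparseable into NaT instead of raising.
# The explicit format is an assumption about how dates are written in the CSV.
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

# Count rows that failed to parse so they can be inspected or dropped later
n_invalid = parsed.isna().sum()
print(n_invalid)  # 1
```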

In [None]:
# 3. Drop 'Vehicle_Images' column as it's not relevant for analysis
if 'Vehicle_Images' in df.columns:
    df = df.drop(columns=['Vehicle_Images'])
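
`DataFrame.drop` accepts `errors='ignore'`, which skips labels that are not present and makes the membership check unnecessary. Below is a minimal sketch of that alternative on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame with the column to be removed
toy = pd.DataFrame({"Booking_ID": ["B1", "B2"], "Vehicle_Images": ["img1", "img2"]})

# errors="ignore" drops the column if present and does nothing otherwise
trimmed = toy.drop(columns=["Vehicle_Images"], errors="ignore")

# Re-running the same drop is a no-op rather than a KeyError
trimmed_again = trimmed.drop(columns=["Vehicle_Images"], errors="ignore")

print(list(trimmed_again.columns))  # ['Booking_ID']
```

This makes the cell safe to re-run in a notebook, where cells are often executed more than once.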

In [None]:
# 4. Handling Missing Values

# --- Imputation for Numerical Columns ---
numerical_cols = ['C_TAT', 'V_TAT', 'Driver_Ratings', 'Customer_Rating']
for col in numerical_cols:
    # Convert to numeric; unparseable entries become NaN
    df[col] = pd.to_numeric(df[col], errors='coerce')
    # Fill missing values with 0 and assign the result back (avoids chained-assignment warnings)
    df[col] = df[col].fillna(0)

# --- Imputation for Categorical Columns ---
# Impute 'Payment_Method' with its mode (most frequent value)
mode_payment = df['Payment_Method'].mode()[0]
df['Payment_Method'] = df['Payment_Method'].fillna(mode_payment)

# Impute the remaining categorical columns with the placeholder 'N/A'
categorical_cols_na = ['Canceled_Rides_by_Customer', 'Canceled_Rides_by_Driver',
                       'Incomplete_Rides', 'Incomplete_Rides_Reason']
for col in categorical_cols_na:
    df[col] = df[col].fillna('N/A')

# Verify that missing values have been handled
imputed_cols = numerical_cols + ['Payment_Method'] + categorical_cols_na
print("Missing values after imputation:")
print(df[imputed_cols].isnull().sum())

# Fail loudly if any gaps remain instead of silently continuing
assert df[imputed_cols].isnull().sum().sum() == 0
print("\nSuccessfully handled all missing values.")
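
Filling `Driver_Ratings` and `Customer_Rating` with 0 makes imputed rows indistinguishable from genuine zeros and pulls averages down. One option is to record a boolean missing-value flag before filling, so later analysis can exclude imputed rows. Below is a sketch on hypothetical ratings. The `_missing` column name is illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; NaN marks rides with no rating recorded
toy = pd.DataFrame({"Driver_Ratings": [4.0, np.nan, 5.0, np.nan, 3.0]})

# Record which rows were missing before filling, so imputed zeros stay identifiable
toy["Driver_Ratings_missing"] = toy["Driver_Ratings"].isna()
toy["Driver_Ratings"] = toy["Driver_Ratings"].fillna(0)

# Mean over all rows (imputed zeros included) vs. mean over observed ratings only
mean_with_zeros = toy["Driver_Ratings"].mean()
mean_observed = toy.loc[~toy["Driver_Ratings_missing"], "Driver_Ratings"].mean()

print(mean_with_zeros, mean_observed)  # 2.4 4.0
```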

In [None]:
# Save cleaned dataset
df.to_csv("Ola_clean_dataset.csv", index=False)
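
CSV does not store dtypes, so the `Date` column written here comes back as text unless it is parsed again on load. Below is a minimal round-trip sketch using a temporary directory and a hypothetical two-row frame:

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned rows with a proper datetime column
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2024-07-01", "2024-07-02"]),
    "Booking_Value": [120, 250],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Ola_clean_dataset.csv")
    toy.to_csv(path, index=False)

    # Without parse_dates, the dates are read back as plain text
    plain = pd.read_csv(path)
    # parse_dates restores the datetime dtype on reload
    restored = pd.read_csv(path, parse_dates=["Date"])

print(plain["Date"].dtype, restored["Date"].dtype)
```

On the real file, `pd.read_csv("Ola_clean_dataset.csv", parse_dates=["Date"])` would restore the dtype in the same way.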

### What manipulations have you done, and what insights did you find?

#### **Key Manipulations:**

* Standardized column names (stripped surrounding whitespace, replaced spaces with underscores, collapsed double underscores).
* Converted `Date` column to datetime (invalid → NaT).
* Dropped irrelevant column `Vehicle_Images`.
* Converted numerical cols (`C_TAT`, `V_TAT`, `Driver_Ratings`, `Customer_Rating`) to numeric & filled missing with `0`.
* Filled missing `Payment_Method` with mode.
* Filled other categorical cols (`Canceled_Rides_by_Customer`, `Canceled_Rides_by_Driver`, `Incomplete_Rides`, `Incomplete_Rides_Reason`) with `'N/A'`.
* Saved cleaned dataset as **`Ola_clean_dataset.csv`**.

---

#### **Insights Gained:**

* Clean column names prevent referencing errors.
* Datetime conversion enables trend/time analysis.
* Removing irrelevant data reduces noise.
* Imputation ensures dataset completeness for analysis.
* Mode imputation fills `Payment_Method` with the most frequent method, so no new payment category is introduced.
* `'N/A'` marks missing categories clearly without losing data.
* Dataset is ready for EDA, ML, or reporting.
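
The datetime conversion enables time-based analysis. For example, bookings can be counted per day and per weekday. Below is a minimal sketch on hypothetical dates. On the real data, replace `toy` with `df`:

```python
import pandas as pd

# Hypothetical booking dates; illustrative only
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2024-07-01", "2024-07-01", "2024-07-02", "2024-07-08"]),
})

# Bookings per calendar day, in date order
daily = toy.groupby(toy["Date"].dt.date).size()

# Bookings per weekday name, useful for spotting weekly patterns
by_weekday = toy["Date"].dt.day_name().value_counts()

print(daily)
print(by_weekday)
```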