# Indian Vehicle Booking Market Analysis

## 1. Introduction
This notebook analyzes the Indian vehicle booking market, using segmentation analysis to identify feasible market entry strategies for an online vehicle booking startup. It covers location suitability based on the Innovation Adoption Life Cycle, demographic targeting, and pricing strategy.

## 2. Data Collection and Preprocessing

The dataset used for this analysis is `All-timeTable-Bangalore-Wards.csv`, which contains cab booking data for various wards in Bangalore.

### 2.1. Initial Data Inspection

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv("/home/ubuntu/upload/All-timeTable-Bangalore-Wards.csv")

# Display the first few rows of the dataframe
print("First 5 rows of the dataframe:")
print(df.head())

# Display dataframe information
print("\nDataFrame Info:")
df.info()

# Display descriptive statistics
print("\nDescriptive Statistics:")
print(df.describe(include="all"))

# Check for missing values
print("\nMissing Values:")
print(df.isnull().sum())

### 2.2. Data Preprocessing

Before performing segmentation, the data needs to be cleaned and converted to appropriate numeric types. This involves stripping currency symbols and thousands separators from the count and currency columns, and converting percentage strings to fractions between 0 and 1.

In [None]:
# Function to clean and convert columns to numeric
def clean_numeric(series):
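    # regex=False treats "₹" and "," as literal characters, not patterns
    return (
        series.astype(str)
        .str.replace("₹", "", regex=False)
        .str.replace(",", "", regex=False)
        .astype(float)
    )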

# Columns to clean and convert
columns_to_clean = [
    "Searches",
    "Searches which got estimate",
    "Searches for Quotes",
    "Searches which got Quotes",
    "Bookings",
    "Completed Trips",
    "Cancelled Bookings",
    "Drivers' Earnings",
    "Distance Travelled (km)",
    "Average Fare per Trip"
]

for col in columns_to_clean:
    df[col] = clean_numeric(df[col])

# Convert percentage columns to float
percentage_columns = [
    "Search-to-estimate Rate",
    "Estimate-to-search for quotes Rate",
    "Quote Acceptance Rate",
    "Quote-to-booking Rate",
    "Booking Cancellation Rate",
    "Conversion Rate"
]

for col in percentage_columns:
    df[col] = df[col].astype(str).str.replace("%", "").astype(float) / 100

# Display the first few rows of the cleaned dataframe
print("First 5 rows of the cleaned dataframe:")
print(df.head())

# Display dataframe information to verify data types
print("\nDataFrame Info after cleaning:")
df.info()
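
The cleaning step above assumes that every cell parses as a number once the symbols are stripped; a single stray entry such as `-` would make `astype(float)` raise. The following is a minimal sketch of a more forgiving variant that uses `pd.to_numeric` with `errors="coerce"`, so unparseable cells become `NaN` instead. The helper names `to_number` and `to_fraction` and the sample values are hypothetical and are not taken from the dataset.

```python
import pandas as pd


def to_number(series: pd.Series) -> pd.Series:
    """Strip the rupee sign and thousands separators, then parse as float."""
    cleaned = (
        series.astype(str)
        .str.replace("₹", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    # Cells that still cannot be parsed become NaN instead of raising
    return pd.to_numeric(cleaned, errors="coerce")


def to_fraction(series: pd.Series) -> pd.Series:
    """Convert strings such as '12.5%' into fractions such as 0.125."""
    cleaned = series.astype(str).str.replace("%", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce") / 100


# Hypothetical sample rows, including one unparseable cell per column
sample = pd.DataFrame({
    "Drivers' Earnings": ["₹1,23,456", "₹7,890", "-"],
    "Conversion Rate": ["12.5%", "3%", None],
})

earnings = to_number(sample["Drivers' Earnings"])
rates = to_fraction(sample["Conversion Rate"])
print(earnings)
print(rates)
```

Any `NaN` values produced this way would still need to be dropped or imputed before clustering, because scikit-learn's `KMeans` does not accept missing values.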

### 2.3. Exploratory Data Analysis (EDA)

The following visualizations show the distribution of key features and help identify potential outliers.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")

# Histograms for key numerical features
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle("Distribution of Key Numerical Features", fontsize=16)

sns.histplot(df["Searches"], kde=True, ax=axes[0, 0])
axes[0, 0].set_title("Distribution of Searches")

sns.histplot(df["Bookings"], kde=True, ax=axes[0, 1])
axes[0, 1].set_title("Distribution of Bookings")

sns.histplot(df["Completed Trips"], kde=True, ax=axes[1, 0])
axes[1, 0].set_title("Distribution of Completed Trips")

sns.histplot(df["Drivers' Earnings"], kde=True, ax=axes[1, 1])
axes[1, 1].set_title("Distribution of Drivers' Earnings")

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()

# Box plots for key numerical features (to check for outliers)
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle("Box Plots of Key Numerical Features", fontsize=16)

sns.boxplot(y=df["Searches"], ax=axes[0, 0])
axes[0, 0].set_title("Box Plot of Searches")

sns.boxplot(y=df["Bookings"], ax=axes[0, 1])
axes[0, 1].set_title("Box Plot of Bookings")

sns.boxplot(y=df["Completed Trips"], ax=axes[1, 0])
axes[1, 0].set_title("Box Plot of Completed Trips")

sns.boxplot(y=df["Drivers' Earnings"], ax=axes[1, 1])
axes[1, 1].set_title("Box Plot of Drivers' Earnings")

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()

![EDA Histograms](/home/ubuntu/eda_histograms.png)
![EDA Box Plots](/home/ubuntu/eda_boxplots.png)
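
Box plots flag outliers visually; the same check can be done numerically. The sketch below applies the usual 1.5 × IQR whisker rule to a single column. The function name `iqr_outliers` and the sample values are hypothetical and are not taken from the dataset.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Hypothetical booking counts with one unusually busy ward
bookings = pd.Series([10, 12, 11, 13, 12, 100], name="Bookings")
outliers = iqr_outliers(bookings)
print(outliers.tolist())  # [100]
```

Applied to the real data, for example as `iqr_outliers(df["Bookings"])`, this would list the wards that appear as isolated points in the box plots.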

### 2.4. Correlation Matrix

The correlation matrix shows the pairwise relationships between the numerical features.

In [None]:
# Select only numerical columns for correlation matrix
numerical_cols = df.select_dtypes(include=["float64", "int64"]).columns
correlation_matrix = df[numerical_cols].corr()

plt.figure(figsize=(16, 12))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Matrix of Numerical Features", fontsize=16)
plt.show()

![Correlation Matrix](/home/ubuntu/correlation_matrix.png)
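
A large heatmap can be hard to read, so it may help to list only the strongly correlated feature pairs. The sketch below does this by keeping the upper triangle of the absolute correlation matrix. The function name `top_correlated_pairs`, the 0.9 threshold, and the sample values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(frame: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """List feature pairs whose absolute Pearson correlation meets the threshold."""
    corr = frame.corr().abs()
    # Keep only the upper triangle so each pair appears once and the diagonal is excluded
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs >= threshold].sort_values(ascending=False)


# Hypothetical ward-level values, not taken from the dataset
toy = pd.DataFrame({
    "Searches": [100, 200, 300, 400, 500],
    "Bookings": [10, 21, 29, 41, 50],
    "Average Fare per Trip": [150, 140, 160, 145, 155],
})

strong_pairs = top_correlated_pairs(toy)
print(strong_pairs)
```

Features that move almost in lockstep, as volume measures such as searches and bookings often do, contribute overlapping information to a distance-based method like K-Means, so such a list can inform which features to keep for clustering.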

## 3. Market Segmentation Analysis

K-Means clustering was used to segment the Bangalore wards based on their cab booking behavior. The Elbow Method was employed to determine the optimal number of clusters.

### 3.1. Elbow Method for Optimal K

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sns

# Select features for clustering
features = [
    "Searches",
    "Bookings",
    "Completed Trips",
    "Conversion Rate",
    "Average Distance per Trip (km)",
    "Average Fare per Trip",
    "Distance Travelled (km)",
    "Drivers' Earnings"
]

X = df[features]

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Determine the optimal number of clusters using the Elbow Method
# Sum of squared distances
ssd = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_scaled)
    ssd.append(kmeans.inertia_)

# Plot the Elbow Method graph
plt.figure(figsize=(10, 6))
plt.plot(range(1, 11), ssd, marker="o")
plt.title("Elbow Method for Optimal K")
plt.xlabel("Number of Clusters (K)")
plt.ylabel("Sum of Squared Distances (SSD)")
plt.grid(True)
plt.show()

![Elbow Method Plot](/home/ubuntu/elbow_method.png)
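
Reading an elbow off a plot is somewhat subjective. The silhouette score is a different technique from the one used in this notebook, but it can serve as a cross-check on the chosen K. The sketch below runs on synthetic data from `make_blobs` (an assumption made so the example is self-contained); on the real data, `X_scaled` would take the place of `X_demo_scaled`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the ward features: three well-separated groups
X_demo, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

# The silhouette score is only defined for at least two clusters
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_demo_scaled)
    scores[k] = silhouette_score(X_demo_scaled, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Best k by silhouette:", best_k)
```

If the silhouette score and the elbow plot point to the same K, that gives more confidence in the segmentation; if they disagree, the cluster profiles for both candidates are worth comparing.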

### 3.2. K-Means Clustering and Cluster Analysis

The elbow plot showed an 'elbow' at K=3, suggesting three distinct segments within the Bangalore cab booking market, so K=3 was used for clustering. The characteristics of each cluster were then analyzed.

In [None]:
# Use k=3, the value read from the elbow plot
k = 3
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
df["Cluster"] = kmeans.fit_predict(X_scaled)

# Analyze the characteristics of each cluster
cluster_centers = pd.DataFrame(scaler.inverse_transform(kmeans.cluster_centers_), columns=features)
cluster_centers["Cluster"] = range(k)
print("\nCluster Centers (Original Scale):")
print(cluster_centers)

# Count of wards in each cluster
print("\nNumber of Wards per Cluster:")
print(df["Cluster"].value_counts().sort_index())

# Visualize clusters (example: scatter plot of two features)
plt.figure(figsize=(12, 8))
sns.scatterplot(x="Completed Trips", y="Drivers' Earnings", hue="Cluster", data=df, palette="viridis", s=100, alpha=0.7)
plt.title("Clusters of Bangalore Wards by Completed Trips and Drivers' Earnings")
plt.xlabel("Completed Trips")
plt.ylabel("Drivers' Earnings")
plt.grid(True)
plt.show()

![Cluster Scatter Plot](/home/ubuntu/cluster_scatter_plot.png)
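
K-Means assigns cluster numbers arbitrarily, so which number ends up meaning "moderate activity" depends on the run. One way to make the segments easier to refer to is to name them by their average activity. The sketch below is an illustration under stated assumptions: it assumes exactly three clusters, and the function name `name_clusters_by_activity`, the segment names, and the sample values are hypothetical.

```python
import pandas as pd


def name_clusters_by_activity(
    frame: pd.DataFrame,
    cluster_col: str = "Cluster",
    metric: str = "Completed Trips",
) -> pd.Series:
    """Map arbitrary cluster numbers to names ordered by the mean of `metric`."""
    means = frame.groupby(cluster_col)[metric].mean().sort_values()
    names = ["low", "moderate", "high"]
    if len(means) != len(names):
        raise ValueError("this sketch assumes exactly three clusters")
    mapping = dict(zip(means.index, names))
    return frame[cluster_col].map(mapping)


# Hypothetical wards, not taken from the dataset
wards = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 2, 2],
    "Completed Trips": [100, 120, 900, 950, 400, 420],
})
wards["Segment"] = name_clusters_by_activity(wards)
print(wards)
```

With names attached, conclusions can refer to the "moderate" segment rather than to a cluster number that may change if the clustering is rerun with different settings.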

## 4. Location Analysis using Innovation Adoption Life Cycle

This section integrates the market segmentation with the Innovation Adoption Life Cycle to identify the most suitable location for early market entry.

### Location Analysis Conclusion

Based on the Innovation Adoption Life Cycle and the market segmentation analysis of Bangalore wards:

*   **Bangalore as a whole:** Is highly suitable for early market entry due to its status as a major tech hub, high smartphone penetration, and a generally tech-savvy and young population. These characteristics align with the 'Innovators' and 'Early Adopters' segments of the Innovation Adoption Life Cycle.

*   **Targeted Wards (Cluster 2):** The wards grouped into Cluster 2 exhibit moderate levels of searches, bookings, and completed trips, along with moderate drivers' earnings. This suggests a healthy level of activity without the potential saturation seen in the highest-demand areas (Cluster 1). This segment represents a promising target for a new vehicle booking service to gain early traction and build a loyal user base before expanding to more competitive or less developed areas.

**Recommendation:** The initial market entry should focus on Bangalore, specifically targeting the wards identified within Cluster 2. This approach allows the startup to leverage a receptive audience and optimize its service offerings in a manageable geographic area before scaling up.
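
To give a sense of scale for the 'Innovators' and 'Early Adopters' segments, the sketch below applies the standard adopter-category shares from the diffusion of innovations model (2.5%, 13.5%, 34%, 34%, 16%) to an addressable user base. The figure of 100,000 users is a purely hypothetical placeholder, not an estimate for Bangalore or for Cluster 2, and `segment_sizes` is an illustrative helper name.

```python
# Standard adopter-category shares from the diffusion of innovations model
ADOPTER_SHARES = {
    "Innovators": 0.025,
    "Early Adopters": 0.135,
    "Early Majority": 0.34,
    "Late Majority": 0.34,
    "Laggards": 0.16,
}


def segment_sizes(addressable_users: int) -> dict:
    """Split an addressable user base across the adopter categories."""
    return {
        segment: round(addressable_users * share)
        for segment, share in ADOPTER_SHARES.items()
    }


# Hypothetical addressable user base; replace with a researched figure
sizes = segment_sizes(100_000)
early_target = sizes["Innovators"] + sizes["Early Adopters"]
print(sizes)
print("Innovators + Early Adopters:", early_target)
```

Under this model, the two earliest segments together make up 16% of eventual adopters, which is the group an early market entry would be addressing.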

## 5. Strategic Recommendations and Pricing Analysis

### Targeting Strategy for Cluster 2 (Early Adopter Segment)

**Characteristics of Cluster 2:**
*   **Moderate Activity:** Wards in this cluster show a healthy, but not oversaturated, level of searches, bookings, and completed trips.
*   **Moderate Earnings:** Drivers in these areas have moderate earnings, suggesting a viable market for service providers.
*   **Potential for Growth:** These areas are likely to have a significant population that is open to new technologies and services, but might not be as heavily saturated by existing major players as the highest-demand areas.

**Targeting Strategy:**
1.  **Focus on Service Quality and Reliability:** For early adopters, a seamless and reliable experience is paramount. Emphasize consistent availability of vehicles, timely pickups, and professional drivers.
2.  **Competitive but Sustainable Pricing:** While affordability is important, avoid aggressive price wars that can lead to unsustainable business models. Focus on value for money.
3.  **Localized Marketing and Community Engagement:** Engage with local communities in these wards through targeted digital marketing campaigns, local partnerships, and community events. Highlight how the service addresses their specific transportation needs.
4.  **Driver Incentives and Support:** Attract and retain quality drivers by offering competitive incentives, fair commission structures, and strong driver support. A good driver experience translates directly to a good customer experience.
5.  **Feedback Loop and Iteration:** Actively solicit feedback from early users in these wards and rapidly iterate on the service based on their input. This will help in refining the product-market fit.
6.  **Highlight Unique Value Proposition:** Differentiate the service from existing players. This could be through specialized vehicle types (e.g., electric vehicles, premium cars), unique features (e.g., pre-booking for specific times, multi-stop trips), or a superior customer support experience.

### Pricing Analysis and Strategic Pricing Range

From the cluster analysis, the `Average Fare per Trip` for Cluster 2 is approximately **₹159.54** (from the `cluster_summary.csv` data).

To propose a strategic pricing range, we need to consider:
*   **Competitive Landscape:** Ola and Uber are dominant players. Their pricing models (surge pricing, different vehicle categories) need to be understood.
*   **Customer Willingness to Pay:** Early adopters might be willing to pay a slight premium for better service or unique features, but overall affordability is key in the Indian market.
*   **Cost Structure:** Operational costs, driver earnings, and platform maintenance.
*   **Profitability:** Ensuring the pricing allows for sustainable growth.

**Research on Competitive Pricing (Ola/Uber in Bangalore):**
Competitor fares have not yet been researched for this analysis. The proposal below assumes only a general understanding of their pricing structure, so the figures should be validated against collected fare data before launch.

**Strategic Pricing Range Proposal:**
Given the average fare per trip in Cluster 2 (approx. ₹159.54) and the competitive landscape, the pricing could follow these principles:

*   **Base Fare:** Slightly below or at par with existing major players to attract initial users.
*   **Per Kilometer Rate:** Competitive rates, possibly with dynamic pricing during peak hours, but with clear communication to avoid user frustration.
*   **Vehicle Categories:** Offer different vehicle categories (e.g., economy, comfort, premium) with corresponding pricing to cater to diverse needs and willingness to pay.
*   **Promotional Offers:** Initial promotional discounts, referral bonuses, and loyalty programs to incentivize early adoption and retention.

**Proposed Range:**
*   **Economy/Standard:** ₹10-12 per km, with a base fare of ₹50-70.
*   **Comfort/Sedan:** ₹14-16 per km, with a base fare of ₹80-100.
*   **Premium/SUV:** ₹18-22 per km, with a base fare of ₹120-150.

This range aims to be competitive while allowing for flexibility and potential for higher-margin services. The focus should be on transparent pricing and avoiding hidden charges to build trust with early adopters.
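
To see how the proposed range relates to the Cluster 2 average fare, the sketch below computes trip fares under a simple linear model, fare = base fare + per-km rate × distance. The linear model without surge pricing, waiting charges, or taxes is an assumption made for illustration, and the function names are hypothetical.

```python
AVERAGE_FARE = 159.54  # Cluster 2 average fare per trip, in rupees

# (base fare low, base fare high, per-km rate low, per-km rate high), in rupees
PROPOSED_RANGES = {
    "Economy/Standard": (50, 70, 10, 12),
    "Comfort/Sedan": (80, 100, 14, 16),
    "Premium/SUV": (120, 150, 18, 22),
}


def fare_range(category: str, distance_km: float) -> tuple:
    """Lowest and highest fare for a trip, assuming fare = base + rate * distance."""
    base_lo, base_hi, km_lo, km_hi = PROPOSED_RANGES[category]
    return base_lo + km_lo * distance_km, base_hi + km_hi * distance_km


def matching_distance_range(category: str, target_fare: float = AVERAGE_FARE) -> tuple:
    """Trip distances at which the priciest and cheapest tariffs reach target_fare."""
    base_lo, base_hi, km_lo, km_hi = PROPOSED_RANGES[category]
    return (target_fare - base_hi) / km_hi, (target_fare - base_lo) / km_lo


low_fare, high_fare = fare_range("Economy/Standard", 8)
print(f"8 km economy trip: ₹{low_fare} to ₹{high_fare}")

shortest, longest = matching_distance_range("Economy/Standard")
print(f"Economy fare equals ₹{AVERAGE_FARE} at {shortest:.1f} to {longest:.1f} km")
```

Under these assumptions, an 8 km economy trip costs ₹130 to ₹166, and the economy tariff reproduces the Cluster 2 average fare for trips of roughly 7.5 to 11 km. Comparing that distance band with the cluster's `Average Distance per Trip (km)` would be one way to check whether the proposed range sits above or below current fares.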
