# **Project Name**    - Hotel Booking EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

The `hotel_booking_eda` dataset consists of 119,390 entries and 32 columns, capturing detailed information about hotel bookings. Key features include booking details (`lead_time`, `is_canceled`, `stays_in_weekend_nights`), customer demographics (`adults`, `children`, `country`), and reservation specifics (`reserved_room_type`, `deposit_type`, `adr`).

Notable issues include missing values in the `children`, `country`, `agent`, and `company` columns; `company` is missing in roughly 94% of rows (112,593 of 119,390). The dataset is well-suited for analyzing booking trends, cancellation patterns, and customer preferences, providing actionable insights for improving hotel operations and customer satisfaction.

The project will involve cleaning and preprocessing data, conducting exploratory data analysis (EDA), and deriving insights using Python libraries like Pandas, Matplotlib, and Seaborn.

# **GitHub Link -**

https://github.com/kush-agra-soni/7_hotel_booking_eda.git

# **Problem Statement**


The hospitality industry faces challenges in understanding customer behavior, managing cancellations, and optimizing resources to enhance guest satisfaction and operational efficiency. The `hotel_booking_eda` dataset provides detailed information about hotel bookings, customer demographics, and reservation details.

The goal is to analyze this data to uncover patterns and trends, identify factors contributing to cancellations, and derive actionable insights. These insights can help hotels improve their services, predict customer needs, and implement strategies to reduce cancellations and maximize revenue.

#### **Define Your Business Objective?**

The primary objective is to leverage data analysis to enhance decision-making in the hospitality industry. By examining the `hotel_booking_eda` dataset, the project aims to:  

1. **Understand Customer Behavior**: Identify key trends in booking patterns, customer demographics, and preferences.  
2. **Reduce Cancellations**: Determine factors driving cancellations and suggest actionable strategies to mitigate them.  
3. **Optimize Revenue and Resources**: Analyze average daily rates, special requests, and parking requirements to improve operational planning and maximize profitability.  
4. **Improve Customer Satisfaction**: Use insights to enhance the overall guest experience and increase customer retention.  

This analysis will provide hotels with data-driven recommendations to improve efficiency and better meet customer expectations.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions, in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import missingno as msno
from sklearn.preprocessing import StandardScaler
from scipy import stats
import plotly.io as pio

### Dataset Loading

In [None]:
# Load Dataset
# Raw GitHub URL of the dataset CSV
dataset_url = "https://raw.githubusercontent.com/kush-agra-soni/7_hotel_booking_eda/refs/heads/main/Hotel%20Bookings.csv"

df = pd.read_csv(dataset_url)

### Dataset First View

In [None]:
# Dataset First Look
df.head(1)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Check for missing values
missing_values = df.isnull().sum()
missing_values = missing_values[missing_values > 0].sort_values(ascending=False)

# Plot missing values
plt.figure(figsize=(10, 6))
sns.barplot(
    x=missing_values.values,
    y=missing_values.index,
    hue=missing_values.index,
    palette="viridis",
    dodge=False,
    legend=False
)
plt.title("Missing Values by Column", fontsize=16)
plt.xlabel("Number of Missing Values", fontsize=12)
plt.ylabel("Columns", fontsize=12)
plt.show()

### What did you know about your dataset?

### Insights About the Dataset

The `hotel_booking_eda` dataset provides comprehensive details about hotel reservations. Here's what we know about the data so far:

1. **Structure**:  
   - Contains 119,390 rows and 32 columns.  
   - Features include numeric and categorical data types; date fields such as `reservation_status_date` are loaded as strings unless they are parsed explicitly.

2. **Content**:  
   - The dataset captures various aspects of hotel bookings, including:
     - **Booking Details**: Lead time, cancellation status, length of stay, room type.  
     - **Demographics**: Number of adults, children, babies, and country of origin.  
     - **Revenue**: Average daily rate (ADR).  
     - **Special Requests**: Parking spaces, additional services.  

3. **Key Variables**:  
   - `is_canceled`: Indicates whether a booking was canceled.  
   - `adr`: Average daily rate, i.e. the average lodging revenue per night of stay.  
   - `stays_in_weekend_nights` and `stays_in_week_nights`: Capture the length of stay.

4. **Missing Values**:  
   - Notable columns with missing data:
     - `children`: 4 missing values.
     - `country`: 488 missing values.
     - `agent`: 16,340 missing values.
     - `company`: Significant missing values (112,593).  

5. **Business Relevance**:  
   - Helps analyze trends like seasonality, booking cancellations, customer segmentation, and revenue generation.

6. **Potential Issues**:  
   - Missing values need to be handled carefully.
   - Columns like `company`, where most values are missing, are usually better excluded or reduced to a presence flag than imputed.

This dataset provides ample opportunities for exploratory data analysis, trend identification, and deriving actionable insights to support hotel operations and improve customer satisfaction.
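
The raw missing-value counts above are easier to judge as a share of all rows. The following is a minimal sketch of that calculation; the helper name `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook, and the real `df` would be passed in instead.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values for columns that have gaps."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually contain missing values
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_count", ascending=False)


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "country": ["PRT", None, "GBR", "ESP"],
    "company": [None, None, None, 40.0],
    "adults": [2, 2, 1, 2],
})
summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(df)` on the loaded dataset would give the same table for the real columns.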

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

#### **1. Booking Information**  
- `hotel`: Type of hotel (City Hotel or Resort Hotel).  
- `is_canceled`: Booking cancellation status (1: Canceled, 0: Not Canceled).  
- `lead_time`: Days between booking and arrival.  
- `arrival_date_year`, `arrival_date_month`, `arrival_date_week_number`, `arrival_date_day_of_month`: Details about arrival date.  
- `stays_in_weekend_nights`, `stays_in_week_nights`: Nights stayed on weekends and weekdays.  

#### **2. Guest Demographics**  
- `adults`, `children`, `babies`: Number of guests in each category.  
- `country`: Country of origin for the guests.  

#### **3. Pricing and Revenue**  
- `adr`: Average daily rate per room.  

#### **4. Reservation Details**  
- `reserved_room_type`, `assigned_room_type`: Codes for reserved and assigned room types.  
- `deposit_type`: Type of deposit paid (e.g., No Deposit, Refundable, Non-Refundable).  
- `booking_changes`: Number of changes made to a booking.  
- `required_car_parking_spaces`: Number of parking spaces requested.  
- `total_of_special_requests`: Number of special requests made.  

#### **5. Customer History**  
- `is_repeated_guest`: Indicates if the customer is a repeat guest.  
- `previous_cancellations`, `previous_bookings_not_canceled`: History of cancellations and completed bookings.  

#### **6. Market and Distribution**  
- `market_segment`: Market segment (e.g., Online TA, Offline TA/TO, Direct, Groups).  
- `distribution_channel`: Channel used for booking.  
- `agent`, `company`: IDs for travel agents and companies.  

#### **7. Reservation Status**  
- `reservation_status`: Final booking status (e.g., Check-Out, Canceled).  
- `reservation_status_date`: Date of the reservation status update.  

#### **8. Miscellaneous**  
- `meal`: Meal plan booked.  
- `days_in_waiting_list`: Days the booking spent on a waiting list.  
- `customer_type`: Type of customer (e.g., transient, group).  

This grouping provides clarity on the dataset structure and aids in focused analysis.

### Check Unique Values for each variable.

In [None]:
# Check unique values for each column in the dataset
unique_values = df.nunique()

# Display the unique values count for each column
print(unique_values)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Handling missing values for 'country' and 'agent' columns

# 1. Impute missing values in 'country' with the mode (most frequent value)
country_mode = df['country'].mode()[0]
df['country'] = df['country'].fillna(country_mode)

# 2. Impute missing values in 'agent' with 0 (assuming missing means no agent involved)
df['agent'] = df['agent'].fillna(0)

# Verify if missing values are handled
missing_values = df.isnull().sum()
print(missing_values[['country', 'agent']])  # Should show 0 for both columns after imputation

### What all manipulations have you done and insights you found?

### Manipulations:

1. **Handling Missing Values**:
   - **`country`**: Missing values were imputed with the most frequent (mode) country. This method assumes that the majority value in the dataset is a reasonable replacement for the missing data, reflecting the most common country for the bookings.
   - **`agent`**: Missing values were filled with `0`, assuming that if the `agent` value is missing, it indicates no agent involvement in the booking. This simplifies the data and avoids losing rows.

2. **Verification**:
   - After imputing the missing values, the `country` and `agent` columns were checked to confirm that they no longer contain nulls. The `children` and `company` columns are not treated in this step and still contain missing values.

### Insights Found:

1. **`country`**:
   - There was a small number of missing values (488 out of 119,390), which is a relatively small proportion of the dataset. Filling with the most frequent country value is a simple yet effective way to handle these missing entries without introducing significant bias.
   
2. **`agent`**:
   - The `agent` column had a larger number of missing values (16,340 out of 119,390). Imputing with `0` (no agent) assumes that the absence of an agent is the default condition. This could be a valid assumption, but further investigation into how often agents are involved in bookings could refine this approach.

3. **Data Quality**:
   - The imputations prevent the loss of data, ensuring that rows are preserved for further analysis. However, the imputation method chosen (mode for `country` and 0 for `agent`) might affect the distribution of values in these columns, so care should be taken when analyzing them.

Overall, the manipulations ensure that the dataset remains usable while minimizing data loss. However, further validation or domain-specific knowledge could improve the imputation methods if necessary.
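
The wrangling step leaves `children` and `company` untouched. One possible way to finish the cleanup is sketched below. It rests on two labeled assumptions: that a missing `children` value means no children, and that `company` is too sparse to impute and is better kept as a presence flag. The function name `finish_cleaning` and the column `has_company` are hypothetical.

```python
import pandas as pd


def finish_cleaning(frame: pd.DataFrame) -> pd.DataFrame:
    """Handle the two columns the wrangling step above leaves untouched."""
    cleaned = frame.copy()
    # Assumption: a missing `children` count means no children on the booking
    cleaned["children"] = cleaned["children"].fillna(0).astype(int)
    # Assumption: `company` is too sparse to impute, so keep only a presence flag
    cleaned["has_company"] = cleaned["company"].notna()
    cleaned = cleaned.drop(columns=["company"])
    return cleaned


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "children": [0.0, None, 2.0],
    "company": [None, 223.0, None],
})
cleaned = finish_cleaning(toy)
print(cleaned)
```

Working on a copy keeps the original frame available in case a different imputation choice turns out to be more appropriate.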

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1  Hotel Booking Cancellation Rate by Hotel Type

In [None]:
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='hotel', hue='is_canceled')
plt.title('Hotel Booking Cancellation Rate by Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Booking Count')
plt.legend(title='Cancellation Status', labels=['Not Canceled', 'Canceled'])
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar chart was chosen because it effectively compares the distribution of two categorical variables (Hotel Type and Cancellation Status) across two categories (Resort Hotel and City Hotel). It allows for easy visualization of the number of bookings that were canceled and not canceled for each hotel type.

##### 2. What is/are the insight(s) found from the chart?

- City Hotels have a higher number of bookings overall compared to Resort Hotels.
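- City Hotels also have more canceled bookings, and their canceled bar is larger relative to the not-canceled bar than it is for Resort Hotels, which points to a higher cancellation rate for City Hotels.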

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help create a positive business impact:

- Targeted Marketing: The higher cancellation rate for City Hotels suggests a need for targeted marketing strategies to reduce cancellations. This could involve offering incentives or flexible booking policies.
- Inventory Management: Understanding the booking patterns for each hotel type can help optimize inventory management. For example, City Hotels might require more flexible inventory management because of their higher cancellation rate.
- Pricing Strategy: Analyzing cancellation rates can help inform pricing strategies. If cancellations are frequent, adjusting prices to incentivize bookings or penalize cancellations might be considered.
> No, there are no insights leading to negative growth. The insights highlight areas where improvements can be made to increase revenue and optimize operations.
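
Because the count plot shows absolute numbers, the cancellation rate itself is worth computing directly. Since `is_canceled` is coded 0/1, its mean per group is the share of canceled bookings. This is a minimal sketch; the helper name `cancellation_rate_by` and the toy data are illustrative assumptions.

```python
import pandas as pd


def cancellation_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Share of canceled bookings (in percent) for each value of `column`."""
    # The mean of a 0/1 flag is the proportion of rows where the flag is 1
    rate = frame.groupby(column)["is_canceled"].mean() * 100
    return rate.round(1).sort_values(ascending=False)


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 0, 0, 0, 1],
})
rates = cancellation_rate_by(toy, "hotel")
print(rates)
```

On the real data, `cancellation_rate_by(df, "hotel")` would give the rates that the chart only implies.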

#### Chart - 2 Lead Time by Cancellation Status

In [None]:
plt.figure(figsize=(8, 6))
sns.boxplot(data=df, x='is_canceled', y='lead_time')
plt.title('Lead Time by Cancellation Status')
plt.xlabel('Cancellation Status')
plt.ylabel('Lead Time (days)')
# Replace the 0/1 codes on the x-axis with readable labels
plt.xticks([0, 1], ['Not Canceled', 'Canceled'])
plt.show()

##### 1. Why did you pick the specific chart?

A box plot was chosen to visualize the distribution of lead times for both canceled and not canceled bookings. Box plots are effective in showing the median, quartiles, and outliers of numerical data.

##### 2. What is/are the insight(s) found from the chart?

1. A longer lead time is associated with a higher likelihood of cancellation: The median lead time for canceled bookings is significantly higher than for not canceled bookings.
2. There is greater variability in lead times for canceled bookings: The box for canceled bookings is taller, indicating a larger interquartile range of lead times.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Yes, these insights can help create a positive business impact:

- Dynamic Pricing: The hotel can implement dynamic pricing strategies that adjust prices based on lead time, for example by pairing discounted long-lead rates with non-refundable or deposit terms to offset their higher cancellation risk.
- Incentivizing Commitment: Because bookings made far in advance are canceled more often, discounts or rewards for early booking should be tied to a deposit or a reconfirmation step rather than offered unconditionally.
- Cancellation Policies: The hotel can review its cancellation policies and consider stricter policies for longer lead times.
- Targeted Marketing: Targeted marketing campaigns can be directed towards guests with longer lead times to encourage them to confirm their bookings.
2. No, there are no insights leading to negative growth. The insights highlight areas where improvements can be made to reduce cancellations and increase revenue.
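
The box plot compares lead-time distributions, but the business question is how the cancellation rate changes with lead time. One way to check this is to group bookings into lead-time bands and compute the rate per band. The sketch below assumes illustrative bin edges and a hypothetical helper name; neither comes from the original notebook.

```python
import pandas as pd


def cancellation_rate_by_lead_time(frame, bin_edges=(0, 7, 30, 90, 180, 365, 10_000)):
    """Cancellation rate (percent) within lead-time bands; bin edges are illustrative."""
    # include_lowest=True keeps bookings with a lead time of exactly 0 days
    bands = pd.cut(frame["lead_time"], bins=list(bin_edges), include_lowest=True)
    result = frame.groupby(bands, observed=True)["is_canceled"].agg(["mean", "count"])
    result["mean"] = (result["mean"] * 100).round(1)
    return result.rename(columns={"mean": "cancel_pct", "count": "bookings"})


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "lead_time": [0, 3, 5, 100, 120, 150, 400],
    "is_canceled": [0, 0, 1, 1, 1, 0, 1],
})
by_band = cancellation_rate_by_lead_time(toy)
print(by_band)
```

Reporting the number of bookings next to each rate helps avoid over-reading bands that contain only a few rows.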

#### Chart - 3 Average Daily Rate (ADR) by Hotel Type

In [None]:
plt.figure(figsize=(8, 6))
sns.violinplot(data=df, x='hotel', y='adr')
plt.title('Average Daily Rate (ADR) by Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Average Daily Rate (ADR)')
plt.show()

##### 1. Why did you pick the specific chart?

A violin plot was selected because it combines features of a box plot and a kernel density plot. It provides a richer visualization of the distribution of Average Daily Rates (ADR) for each hotel type. The violin plot shows the density of ADR values, revealing the shape and spread of the data.

##### 2. What is/are the insight(s) found from the chart?

- City Hotels have a wider range of ADRs compared to Resort Hotels. The violin plot for City Hotels is more spread out, indicating a greater variation in prices.
- The median ADR for City Hotels is slightly higher than that for Resort Hotels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Yes, these insights can help create a positive business impact:

- Pricing Strategy: Understanding the distribution of ADRs for each hotel type can help optimize pricing strategies. City Hotels can leverage their wider range of ADRs to attract different segments of customers.
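- Revenue Management: The spread of ADR within each hotel type shows how much room there is for price differentiation. Identifying peak and off-peak pricing periods would additionally require breaking ADR down by arrival month.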
- Targeted Marketing: Targeted marketing campaigns can be tailored to different segments of customers based on their preferred ADR range.
2. No, there are no insights leading to negative growth. The insights highlight opportunities to optimize pricing and marketing strategies to increase revenue.
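
A few extreme or non-positive `adr` values can stretch the violin plot and make the medians hard to read, so a numeric summary is a useful cross-check. The sketch below trims the data before summarizing. The 99th-percentile cap and the removal of non-positive rates are assumptions made for illustration, and `adr_summary` is a hypothetical helper.

```python
import pandas as pd


def adr_summary(frame, upper_quantile=0.99):
    """Median and quartiles of ADR per hotel after trimming extreme values."""
    cap = frame["adr"].quantile(upper_quantile)
    # Assumption: non-positive rates and values above the cap are treated as outliers
    trimmed = frame[(frame["adr"] > 0) & (frame["adr"] <= cap)]
    grouped = trimmed.groupby("hotel")["adr"]
    return pd.DataFrame({
        "median": grouped.median(),
        "q1": grouped.quantile(0.25),
        "q3": grouped.quantile(0.75),
    })


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "adr": [80.0, 100.0, 120.0, 5400.0, -5.0, 60.0, 90.0, 150.0],
})
adr_stats = adr_summary(toy)
print(adr_stats)
```

Whether zero-rate bookings should be dropped depends on what they represent, so that filter is worth revisiting against the real data.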

#### Chart - 4 Booking Count by Market Segment

In [None]:
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='market_segment')
plt.title('Booking Count by Market Segment')
plt.xlabel('Market Segment')
plt.ylabel('Booking Count')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen because it effectively compares the distribution of a categorical variable (Market Segment) across different categories. It allows for easy visualization of the number of bookings for each market segment.

##### 2. What is/are the insight(s) found from the chart?

- Online TA is the dominant market segment with the highest number of bookings.
- Offline TA/TO, Groups, and Direct bookings also have a significant presence.
- Corporate, Complementary, and Aviation segments have a much smaller contribution to the overall bookings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help create a positive business impact:

- Marketing Strategy: The hotel can focus its marketing efforts on the dominant market segments like Online TA and Direct bookings to attract more customers.
- Distribution Channels: The hotel can optimize its distribution channels to prioritize Online TA platforms and direct bookings.
- Complementary Segment: This segment appears to consist of complimentary (free-of-charge) stays, so its volume is better monitored as a cost than targeted for growth.
- Group Bookings: Specific marketing strategies and incentives can be targeted towards groups to increase their contribution.
- Aviation Segment: The hotel can collaborate with airlines and travel agencies to attract more aviation-related bookings.

> No, there are no insights leading to negative growth. The insights highlight opportunities to optimize marketing and distribution strategies to increase bookings from different market segments.

#### Chart - 5 Cancellation Rate by Market Segment

In [None]:
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='market_segment', hue='is_canceled')
plt.title('Cancellation Rate by Market Segment')
plt.xlabel('Market Segment')
plt.ylabel('Booking Count')
plt.xticks(rotation=45)
plt.legend(title='Cancellation Status', labels=['Not Canceled', 'Canceled'])
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar chart was chosen because it effectively compares the distribution of two categorical variables (Market Segment and Cancellation Status) across multiple categories. It allows for easy visualization of the number of bookings that were canceled and not canceled for each market segment.

##### 2. What is/are the insight(s) found from the chart?

- Online TA has the highest number of both canceled and not canceled bookings.
- The cancellation rate varies across different market segments. Online TA and Groups have a relatively higher cancellation rate compared to segments like Direct and Corporate.
- The Complementary and Aviation segments have a very low number of bookings and cancellations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help create a positive business impact:

- Targeted Marketing: The hotel can implement targeted marketing strategies to reduce cancellations in segments with higher cancellation rates, such as Online TA and Groups. This could involve offering incentives or flexible booking policies.
- Distribution Channel Optimization: The hotel can optimize its distribution channels to prioritize segments with lower cancellation rates, such as Direct and Corporate.
- Cancellation Policies: The hotel can review its cancellation policies and consider stricter policies for segments with higher cancellation rates.
- Partnerships: The hotel can explore partnerships with airlines to increase bookings in the small Aviation segment.

> No, there are no insights leading to negative growth. The insights highlight areas where improvements can be made to reduce cancellations and increase revenue from different market segments.
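
To compare segments of very different sizes, the counts can be converted into row-wise shares with `pd.crosstab(..., normalize="index")`. The sketch below also keeps the booking volume next to each share, since rates for very small segments are unreliable. The helper name `cancellation_share` and the toy data are illustrative assumptions, and the code expects both canceled and non-canceled bookings to be present.

```python
import pandas as pd


def cancellation_share(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Row-normalized cancellation shares (percent) plus booking volume per category."""
    table = pd.crosstab(frame[column], frame["is_canceled"], normalize="index") * 100
    table = table.rename(columns={0: "not_canceled_pct", 1: "canceled_pct"})
    # Volume per category, aligned on the category labels
    table["bookings"] = frame[column].value_counts()
    return table.round(1).sort_values("canceled_pct", ascending=False)


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "market_segment": ["Online TA"] * 4 + ["Direct"] * 4 + ["Aviation"],
    "is_canceled": [1, 1, 0, 0, 0, 0, 0, 1, 0],
})
shares = cancellation_share(toy, "market_segment")
print(shares)
```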

#### Chart - 6 Arrival Year vs. Number of Bookings

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='arrival_date_year')
plt.title('Bookings per Year')
plt.xlabel('Year')
plt.ylabel('Booking Count')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen because it effectively compares the distribution of a categorical variable (Year) across different categories. It allows for easy visualization of the number of bookings for each year.

##### 2. What is/are the insight(s) found from the chart?

- The number of bookings rises sharply from 2015 to 2016 and is lower again in 2017.
- The dataset covers arrivals from July 2015 to August 2017, so 2015 and 2017 are partial years and the yearly totals are not directly comparable.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help create a positive business impact:

- Demand Forecasting: The hotel can use this data to forecast future demand and adjust its capacity and staffing accordingly.
- Marketing Strategy: The hotel can analyze the reasons for the increase in bookings in 2016 and implement similar strategies to maintain or increase bookings in the future.
- Pricing Strategy: The hotel can adjust its pricing strategy based on demand patterns. For example, they could increase prices during peak periods and offer discounts during off-peak periods.
>No, there are no insights leading to negative growth. The insights highlight opportunities to optimize operations and marketing strategies to increase bookings.
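
Yearly totals are only comparable if each year covers the same number of months. A quick check is to count how many distinct arrival months appear per year and to normalize bookings by that number. This is a sketch under that idea; `bookings_per_covered_month` is a hypothetical helper and the toy data is made up.

```python
import pandas as pd


def bookings_per_covered_month(frame: pd.DataFrame) -> pd.DataFrame:
    """Bookings per year, months covered per year, and bookings per covered month."""
    by_year = frame.groupby("arrival_date_year").agg(
        bookings=("arrival_date_month", "size"),
        months_covered=("arrival_date_month", "nunique"),
    )
    by_year["bookings_per_month"] = (
        by_year["bookings"] / by_year["months_covered"]
    ).round(1)
    return by_year


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016],
    "arrival_date_month": ["July", "July", "August",
                           "January", "February", "March", "April", "April", "April"],
})
yearly = bookings_per_covered_month(toy)
print(yearly)
```

In the toy data the second year has twice as many bookings but also twice as many months, so the per-month figure is identical.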

#### Chart - 7  Lead Time Distribution for Canceled and Non-Canceled Bookings

In [None]:
plt.figure(figsize=(8, 6))
sns.histplot(data=df, x='lead_time', hue='is_canceled', kde=True)
plt.title('Lead Time Distribution for Canceled and Non-Canceled Bookings')
plt.xlabel('Lead Time')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram was chosen to visualize the distribution of lead times for both canceled and not canceled bookings. Histograms are effective in showing the frequency distribution of numerical data.

##### 2. What is/are the insight(s) found from the chart?

- Most bookings have a lead time of less than 100 days.
- Both distributions are right-skewed, but canceled bookings have a heavier right tail, i.e. a higher proportion of long lead times.
- The distribution of lead times for not canceled bookings is more concentrated on shorter lead times.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help create a positive business impact:

- Dynamic Pricing: The hotel can implement dynamic pricing strategies that adjust prices based on lead time, for example by attaching non-refundable or deposit terms to discounted long-lead rates.
- Incentivizing Commitment: Since long lead times go together with more cancellations, early-booking discounts or rewards work best when they are tied to a deposit or a reconfirmation step.
- Cancellation Policies: The hotel can review its cancellation policies and consider stricter policies for longer lead times.
- Targeted Marketing: Targeted marketing campaigns can be directed towards guests with longer lead times to encourage them to confirm their bookings.
> No, there are no insights leading to negative growth. The insights highlight areas where improvements can be made to reduce cancellations and increase revenue.

#### Chart - 8 Bookings with Special Requests by Hotel Type

In [None]:
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='hotel', hue='total_of_special_requests')
plt.title('Bookings with Special Requests by Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Booking Count')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar chart was chosen because it effectively compares the distribution of two categorical variables (Hotel Type and Total Special Requests) across multiple categories. It allows for easy visualization of the number of bookings with different levels of special requests for each hotel type.

##### 2. What is/are the insight(s) found from the chart?

- City Hotels have a significantly higher number of bookings with special requests compared to Resort Hotels.
- The majority of bookings for both hotel types have no special requests.
- As the number of special requests increases, the number of bookings decreases for both hotel types.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can help create a positive business impact:

- Staffing and Resource Allocation: The hotel can allocate staff and resources based on the expected number of special requests for each hotel type.
- Training: Staff can be trained to handle special requests efficiently and effectively.
- Customer Service: The hotel can implement strategies to proactively address customer needs and exceed expectations, especially for bookings with special requests.
- Pricing Strategy: The hotel can consider charging a premium for bookings with special requests, especially for high-demand services.
> No, there are no insights leading to negative growth. The insights highlight opportunities to improve customer service and operational efficiency.

#### Chart - 9 Lead Time vs. ADR

In [None]:
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='lead_time', y='adr')
plt.title('Lead Time vs. ADR')
plt.xlabel('Lead Time')
plt.ylabel('Average Daily Rate (ADR)')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot was chosen because it effectively visualizes the relationship between two numerical variables: Lead Time and Average Daily Rate (ADR). It allows us to observe any patterns or trends in the data.

##### 2. What is/are the insight(s) found from the chart?

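The scatter plot suggests a weak negative relationship between Lead Time and ADR: the highest rates are concentrated at short lead times, and ADR tends to be lower for bookings made far in advance. With this many overlapping points the strength of the relationship is hard to judge by eye, so it should be confirmed with a correlation coefficient. If it holds, it suggests that guests who book further in advance may be more price-sensitive.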

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Dynamic Pricing: The hotel can implement dynamic pricing strategies that adjust prices based on lead time. Higher prices could be charged for short lead times and lower prices for longer lead times.
- Incentivizing Early Bookings: Offering discounts or rewards for early bookings can encourage guests to book further in advance, potentially leading to higher revenue.
- Revenue Management: The hotel can use this information to optimize its revenue management strategies and maximize revenue.
>No, there are no insights leading to negative growth. The insights highlight opportunities to optimize pricing and marketing strategies to increase revenue.
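
The strength of the relationship between lead time and ADR can be quantified rather than read off the scatter plot. The sketch below computes Pearson's coefficient and a rank-based (Spearman) coefficient, obtained as the Pearson correlation of the ranks. The helper name `lead_time_adr_correlation` and the toy values are illustrative assumptions.

```python
import pandas as pd


def lead_time_adr_correlation(frame: pd.DataFrame) -> dict:
    """Pearson and Spearman correlation between `lead_time` and `adr`."""
    pair = frame[["lead_time", "adr"]].dropna()
    pearson = pair["lead_time"].corr(pair["adr"])
    # Spearman's coefficient equals the Pearson correlation of the ranked values
    spearman = pair["lead_time"].rank().corr(pair["adr"].rank())
    return {"pearson": pearson, "spearman": spearman}


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "lead_time": [1, 2, 3, 4, 5],
    "adr": [200.0, 150.0, 120.0, 110.0, 105.0],
})
corr = lead_time_adr_correlation(toy)
print(corr)
```

The rank-based coefficient is less sensitive to the extreme ADR values visible in the scatter plot, which is why both are reported.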

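#### Chart - 10 Lead Time Distribution by Month

In [None]:
plt.figure(figsize=(10, 6))
# Plot months in calendar order instead of the order in which they first appear in the data
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
sns.boxplot(x='arrival_date_month', y='lead_time', data=df, order=month_order)
plt.title('Lead Time Distribution by Month')
plt.xlabel('Arrival Month')
plt.ylabel('Lead Time (days)')
plt.xticks(rotation=45)
plt.show()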

##### 1. Why did you pick the specific chart?

A box plot was chosen because it effectively visualizes the distribution of Lead Time for each month. Box plots are effective in showing the median, quartiles, and outliers of numerical data.

##### 2. What is/are the insight(s) found from the chart?

- There is a variation in lead times across different months. Some months have a higher median lead time compared to others.
- There are outliers in lead time for most months, indicating some bookings with significantly longer lead times.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Demand Forecasting: The hotel can use this data to forecast demand for each month and adjust its capacity and staffing accordingly.
- Pricing Strategy: The hotel can adjust its pricing strategy based on the demand patterns observed in the lead times. For example, they could charge higher prices during peak months with longer lead times.
- Marketing Strategy: The hotel can implement targeted marketing campaigns to encourage bookings during off-peak months with shorter lead times.
> No, there are no insights leading to negative growth. The insights highlight opportunities to optimize operations and marketing strategies to increase revenue.
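
A box plot shows medians rather than averages, so a table with both statistics per month makes the comparison explicit. The sketch below orders the months by calendar position using an ordered categorical. The helper name `lead_time_by_month`, the temporary column `month_ordered`, and the toy data are illustrative assumptions.

```python
import pandas as pd

MONTH_ORDER = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


def lead_time_by_month(frame: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, and count of lead time per arrival month, in calendar order."""
    months = pd.Categorical(frame["arrival_date_month"],
                            categories=MONTH_ORDER, ordered=True)
    # `month_ordered` is a temporary helper column used only for grouping
    grouped = frame.assign(month_ordered=months).groupby("month_ordered", observed=True)
    return grouped["lead_time"].agg(["mean", "median", "count"]).round(1)


# Toy frame standing in for the real `df` (values are made up for illustration)
toy = pd.DataFrame({
    "arrival_date_month": ["July", "July", "January", "January", "January"],
    "lead_time": [10, 30, 0, 2, 100],
})
monthly = lead_time_by_month(toy)
print(monthly)
```

In the toy data the January mean is far above its median because of a single long-lead booking, which is the kind of outlier effect the box plot highlights.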

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the analysis of the charts and plots, here are some suggestions to help the client achieve their business objectives:

**1. Optimize Pricing and Revenue Management:**
   * Implement dynamic pricing strategies that adjust prices based on factors like lead time, seasonality, and demand.
   * Analyze the impact of different pricing strategies on revenue and occupancy rates.
   * Consider offering discounts for early bookings or off-peak periods to stimulate demand.

**2. Improve Customer Experience and Loyalty:**
   * Focus on providing exceptional customer service, especially for bookings with special requests.
   * Implement efficient processes to handle cancellations and refunds.
   * Consider loyalty programs to reward repeat customers.

**3. Enhance Marketing and Distribution Strategies:**
   * Target marketing efforts towards specific market segments with higher potential.
   * Optimize distribution channels to prioritize high-performing channels like Online TAs and direct bookings.
   * Collaborate with travel agents and online platforms to increase visibility and bookings.

**4. Optimize Operations and Resource Allocation:**
   * Use data-driven insights to forecast demand and allocate resources effectively.
   * Implement efficient processes for check-in, check-out, and room assignments.
   * Train staff to handle special requests and provide excellent customer service.

**5. Monitor and Analyze Performance:**
   * Continuously monitor key performance indicators (KPIs) like occupancy rates, average daily rate, and revenue per available room.
   * Use data analytics to identify trends and opportunities for improvement.
   * Conduct regular reviews of marketing and operational strategies.

By implementing these strategies, the client can improve their business performance, increase revenue, and enhance customer satisfaction.


# **Conclusion**

Through the analysis of various charts and plots, we have gained valuable insights into the hotel's booking patterns, customer behavior, and operational performance. Key findings include the association between longer lead times and cancellations, the distribution of bookings across different market segments, and the relationship between average daily rate and lead time.

By leveraging these insights, the hotel can implement targeted strategies to optimize pricing, improve customer experience, and increase revenue. These strategies include dynamic pricing, effective marketing, efficient operations, and data-driven decision-making. By continuously monitoring performance and adapting to changing market conditions, the hotel can maintain a competitive edge and achieve long-term success.
