# **Project Name**    - Hotel Booking



##### **Project Type**    - EDA
##### **Contribution**    - Kushang Shah (Individual)

# **Project Summary -**

## **Exploratory Data Analysis (EDA) Summary: Understanding Hotel Bookings**

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process, allowing us to understand the structure and patterns within our data. In this summary, we delve into an EDA conducted on hotel booking data, aiming to extract meaningful insights and trends.

### **Dataset Overview:**
The dataset comprises information about hotel bookings, including various attributes such as booking dates, customer demographics, booking channels, and reservation details. It encompasses both hotel types: resorts and city hotels.

### **Data Exploration:**
#### **Data Cleaning**: Initially, duplicate rows were removed, missing values were filled, and column data types were corrected. This step ensured the dataset's integrity and reliability for analysis.
#### **Descriptive Statistics**: Basic statistics such as mean, median, standard deviation, and quartiles were calculated for numerical features like booking lead time, length of stay (in nights), and number of adults/children. This provided a snapshot of the central tendencies and spread of the data.
#### **Distribution Analysis**: Histograms and density plots were employed to visualize the distribution of key variables, revealing insights into their skewness, multimodality, and outliers. For instance, booking lead time exhibited a right-skewed distribution: most bookings were made relatively close to arrival, with a long tail of bookings made far in advance.
#### **Temporal Trends**: Time series analysis was conducted to explore temporal patterns in booking volumes over different months and years. This analysis uncovered seasonality effects, with peak booking periods occurring during certain months, possibly influenced by holidays or tourism seasons.
#### **Segmentation Analysis**: Customers were segmented by attributes such as country of origin, customer type, and market segment, and by booking characteristics such as length of stay and room type. This segmentation shed light on distinct booking behaviors among customer groups, which can support targeted marketing strategies.
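A quick way to check a right-skew claim such as the one for lead time is to compare the mean with the median and look at the sample skewness. The sketch below is a minimal illustration on hypothetical toy lead times; describe_skew is a hypothetical helper, and in the notebook it could be called on hb_df['lead_time'] instead.

```python
import pandas as pd


def describe_skew(values: pd.Series) -> dict:
    """Return mean, median and sample skewness of a numeric column."""
    return {
        "mean": values.mean(),
        "median": values.median(),
        # Positive skewness indicates a long right tail
        "skew": values.skew(),
    }


# Hypothetical lead times in days (not taken from the dataset)
lead_time = pd.Series([0, 2, 5, 7, 10, 14, 21, 30, 60, 120, 250, 400])
summary = describe_skew(lead_time)

# A mean above the median together with positive skewness suggests right skew
is_right_skewed = summary["mean"] > summary["median"] and summary["skew"] > 0
print(summary)
print(is_right_skewed)
```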


### **Insights and Trends:**

#### **Seasonal Variations**: The analysis revealed fluctuations in booking volumes across seasons, with summer and holiday seasons experiencing higher demand compared to off-peak periods. This insight can inform revenue management strategies and resource allocation.
#### **Booking Channels**: Examination of booking channels (e.g., online travel agencies, direct bookings) unveiled the preferred platforms through which customers make reservations. Understanding channel preferences can guide marketing efforts and partnership decisions.
#### **Cancellation Patterns**: Analysis of cancellation rates, and of the booking attributes associated with cancellations, provided insights into customer behavior and booking volatility. Factors linked to cancellations, such as deposit type and the flexibility of cancellation policies, can be optimized to minimize revenue loss.
#### **Booking Lead Time**: Exploration of booking lead time distribution highlighted booking patterns, with implications for inventory management and pricing strategies. Shorter lead times may necessitate dynamic pricing mechanisms to capitalize on last-minute bookings.
## **Conclusion**:
Through comprehensive exploratory data analysis, valuable insights have been gleaned regarding hotel booking trends, customer behavior, and operational dynamics. These insights can inform strategic decision-making processes, ranging from revenue management to customer experience enhancement. Continued analysis and refinement of these findings will facilitate data-driven optimization of hotel operations and service delivery.

# **Links -**

#### **GitHub Link:** - [EDA Project - Hotel Booking](https://github.com/KushangShah/AlmaBetter-Projects/tree/main/Module%202%20Numerical%20Programming%20in%20Python/2.1.%20Exploratory%20Data%20Analysis)

#### **Notebook Link (Colab):** - [EDA Project Notebook - Hotel Booking](https://drive.google.com/drive/folders/1fEWRRQ_sRAPVnAaM36EuD1hY2we3op2J?usp=sharing)

#### **Video Link:** - [EDA Project Presentation - Hotel Booking](https://drive.google.com/file/d/1NBAS8wNU_dQXXmGZptDdgkV-fwYCcLLK/view?usp=sharing)

# **Problem Statement**


##### --> The primary objective is to gain comprehensive insights into the underlying **patterns**, **trends**, and **dynamics of the booking process.**

##### 1. **Booking Patterns**: What typical booking patterns do we observe in terms of timing, duration, and seasonality?
Do discernible trends or fluctuations exist in booking volumes over different time periods?

##### 2. **Booking Dynamics**: What are the temporal trends in booking volumes and cancellation rates? Are there seasonal variations, and if so, how do they impact hotel occupancy and revenue?
##### 3. **Customer Segmentation**: How can customers be segmented based on demographics, booking behaviors, and preferences? What are the characteristics of different customer segments, and how can tailored marketing strategies be developed to cater to their needs?
##### 4. **Operational Efficiency**: What factors contribute to booking lead time, and how can inventory management and pricing strategies be optimized accordingly? Are there patterns in room type preferences, booking channels, and deposit types that influence operational efficiency?
##### 5. **Revenue Management**: How do pricing dynamics, such as ADR and booking changes, impact revenue generation? What are the implications of special requests, car parking requirements, and meal preferences on revenue maximization?




#### **Define Your Business Objective?**

**Business Objective:**

The primary business objective of conducting exploratory data analysis (EDA) on hotel bookings is to leverage data-driven insights to optimize revenue generation, enhance operational efficiency, and improve customer satisfaction within the hospitality industry. By delving into the dataset and extracting meaningful patterns and trends, the ultimate goal is to inform strategic decision-making processes and drive tangible outcomes for the hotel management.

1. **Revenue Optimization:**
   - Identify factors influencing revenue generation, such as pricing dynamics, booking patterns, and customer preferences.
   - Utilize insights to implement dynamic pricing mechanisms, targeted promotions, and revenue management strategies.

2. **Operational Efficiency:**
   - Enhance resource allocation, inventory management, and staff scheduling based on demand patterns and booking trends.
   - Optimize room allocation, booking channels, and distribution strategies to improve operational efficiency.
   - Streamline processes to minimize booking lead time, reduce cancellations, and optimize room utilization.

3. **Customer Satisfaction:**
   - Understand customer preferences, behaviors, and satisfaction drivers to deliver personalized experiences.
   - Segment customers based on demographics, booking behaviors, and preferences to tailor marketing efforts and services.
   - Anticipate and fulfill customer needs, preferences, and special requests to enhance overall satisfaction and loyalty.

4. **Risk Management and Decision Support:**
   - Identify potential risks, such as overbooking, cancellations, and revenue volatility, and develop mitigation strategies.
   - Provide decision support for strategic initiatives, investment opportunities, and expansion plans based on data-driven insights.
   - Monitor key performance indicators (KPIs) and metrics to track progress, evaluate performance, and adapt strategies accordingly.

Overall, the business objective of the EDA on hotel bookings is to leverage data analytics to drive strategic decision-making, optimize operations, and create value for both the hotel management and customers. By harnessing the power of data, the aim is to achieve sustainable growth, competitive advantage, and excellence in service delivery within the hospitality sector.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Installing Libraries

In [None]:
!pip3 install numpy

In [None]:
!pip3 install pandas

In [None]:
!pip3 install matplotlib

In [None]:
!pip3 install seaborn

In [None]:
!pip3 install missingno

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
hb_df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Cohort Paris/Module 2: Numerical Programming in Python/Capstone Project: Exploratory Data Analysis/Hotel_Bookings.csv")
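The guidelines reward exception handling. Below is a minimal sketch of a loader that fails with a clear message when the CSV is missing (for example, when Google Drive is not mounted) or empty. load_bookings is a hypothetical helper, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_bookings(path: str) -> pd.DataFrame:
    """Load the hotel bookings CSV with clear errors for common failure cases."""
    csv_path = Path(path)
    # A missing file usually means Drive is not mounted or the path is wrong
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Booking data not found at {csv_path}. "
            "Check that Google Drive is mounted and the path is correct."
        )
    try:
        return pd.read_csv(csv_path)
    except pd.errors.EmptyDataError as exc:
        # read_csv raises EmptyDataError when the file has no content
        raise ValueError(f"{csv_path} exists but is empty.") from exc
```

In the notebook, it could be called with the same Drive path as the read_csv cell above, so that a wrong path produces a readable error instead of a long traceback.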

### Dataset First View

In [None]:
# Dataset First Look
hb_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
hb_df.shape

### Dataset Information

In [None]:
# Dataset Info
hb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
hb_df.duplicated().sum()

In [None]:
# using drop_duplicates to get unique number of rows
hb_df.drop_duplicates(inplace=True)
unique_rows = hb_df.shape[0]
unique_rows
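The duplicate count and the number of remaining unique rows always add up to the original row count (here 119,390 = 87,396 + 31,994). A small hypothetical DataFrame shows how duplicated() and drop_duplicates() relate:

```python
import pandas as pd

# Hypothetical bookings: rows 0, 1 and 3 are identical
bookings = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 10, 45, 10],
    "adults": [2, 2, 1, 2],
})

# duplicated() marks every repeat of an earlier row as True (first occurrence stays False)
n_duplicates = bookings.duplicated().sum()

# drop_duplicates() keeps only the first occurrence of each distinct row
unique_bookings = bookings.drop_duplicates()

print(n_duplicates)              # 2
print(unique_bookings.shape[0])  # 2
```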

#### Missing Values/Null Values

In [None]:
# Finding for null value in each column.
hb_df.isna().sum().sort_values(ascending=False)[:6]

In [None]:
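# Fill missing company/agent IDs with 0 (no company or agent involved)
hb_df["company"] = hb_df["company"].fillna(0)
hb_df["agent"] = hb_df["agent"].fillna(0)
# Group unknown countries under "others"
hb_df["country"] = hb_df["country"].fillna("others")
# Assume no children where the count is missing
hb_df["children"] = hb_df["children"].fillna(0)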


In [None]:
# Missing values have been handled.
hb_df.isna().sum().sort_values(ascending=False)
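The four fillna calls can also be written as a single call that takes a dictionary of per-column fill values, followed by the integer conversion done in the wrangling step. This is a sketch on hypothetical toy rows, not the notebook's actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with the same kinds of gaps as the real dataset
bookings = pd.DataFrame({
    "company": [np.nan, 40.0, np.nan],
    "agent": [9.0, np.nan, 240.0],
    "country": ["PRT", np.nan, "GBR"],
    "children": [0.0, np.nan, 1.0],
})

# One fill value per column, matching the choices made in the cell above
fill_values = {"company": 0, "agent": 0, "country": "others", "children": 0}
bookings = bookings.fillna(value=fill_values)

# With no NaNs left, ID and count columns can be stored as integers
bookings = bookings.astype({"company": "int64", "agent": "int64", "children": "int64"})

print(bookings.isna().sum().sum())
print(bookings.dtypes)
```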

### What did you know about your dataset?

The hotel booking dataset contained 119,390 rows × 32 columns, of which 87,396 rows were unique and 31,994 were duplicates.

After removing the duplicates, the following columns contained null values:
```
company               82137
agent                 12193
country                 452
children                  4
```



The dataset contains the following 32 columns:
1. hotel: Type of hotel (City Hotel or Resort Hotel).
2. is_canceled: Binary indicator if the booking was canceled (1) or not (0).
3. lead_time: Number of days between the booking date and the arrival date.
4. arrival_date_year: Year of arrival date.
5. arrival_date_month: Month of arrival date.
6. arrival_date_week_number: Week number of arrival date.
7. arrival_date_day_of_month: Day of arrival date.
8. stays_in_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed.
9. stays_in_week_nights: Number of week nights (Monday to Friday) the guest stayed.
10. adults: Number of adults.
11. children: Number of children.
12. babies: Number of babies.
13. meal: Type of meal booked (e.g., BB for Bed & Breakfast).
14. country: Country of origin of the guest.
15. market_segment: Market segment designation (e.g., Online Travel Agents, Offline Travel Agents).
16. distribution_channel: Booking distribution channel (e.g., Direct, Corporate).
17. is_repeated_guest: Binary indicator if the guest is a repeated guest (1) or not (0).
18. previous_cancellations: Number of previous cancellations by the guest.
19. previous_bookings_not_canceled: Number of previous bookings not canceled by the guest.
20. reserved_room_type: Type of room reserved.
21. assigned_room_type: Type of room assigned to the guest.
22. booking_changes: Number of changes made to the booking.
23. deposit_type: Type of deposit made (e.g., No Deposit, Non Refund, Refundable).
24. agent: ID of the travel agency that made the booking.
25. company: ID of the company/entity that made the booking or is responsible for payment.
26. days_in_waiting_list: Number of days the booking was in the waiting list before it was confirmed to the guest.
27. customer_type: Type of booking (e.g., Contract, Group, Transient).
28. adr: Average Daily Rate, the average rental income per paid occupied room in a given time period.
29. required_car_parking_spaces: Number of car parking spaces requested by the guest.
30. total_of_special_requests: Number of special requests made by the guest (e.g., twin bed, high floor).
31. reservation_status: Reservation last status (e.g., Check-Out, Canceled).
32. reservation_status_date: Date at which the last status was set.




## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hb_df.columns

In [None]:
# Dataset Describe
hb_df.describe().round(2)

### Variables Description

- **is_canceled:**
  - The mean is 0.2749, i.e. 27.49% of bookings were canceled.
- **lead_time:**
  - The average lead time is approximately 79.89 days, with a standard deviation of around 86.05 days.
- **arrival_date_year:**
  - Bookings span from 2015 to 2017.
- **arrival_date_week_number and arrival_date_day_of_month:**
  - These columns give the week number and day of the month of the arrival date, respectively.
- **stays_in_weekend_nights and stays_in_week_nights:**
  - On average, guests stay for approximately 1 weekend night and 2.63 week nights.
- **adults, children, and babies:**
  - Average numbers of adults, children, and babies per booking are provided.
- **previous_cancellations and previous_bookings_not_canceled:**
  - These columns indicate the number of previous cancellations and bookings not canceled by the guest.
- **booking_changes:**
  - On average, there are around 0.27 booking changes per booking.
- **agent and company:**
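  - IDs of the travel agency and the company involved in the booking; 0 means no agency or company (missing values were filled with 0). Because these are identifiers, their summary statistics are not meaningful.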
- **days_in_waiting_list:**
  - Most bookings spent 0 days on the waiting list, so the average wait is short (the exact mean is shown in the describe() output).
- **adr (Average Daily Rate):**
  - The average daily rate is around 106.34 units.
- **required_car_parking_spaces and total_of_special_requests:**
  - These columns provide average counts for requested car parking spaces and special requests per booking.
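The cancellation figure above comes from the mean of is_canceled: for a column of 0/1 flags, the mean is the fraction of 1s. A small check on hypothetical values:

```python
import pandas as pd

# Hypothetical 0/1 cancellation flags (1 = canceled)
is_canceled = pd.Series([1, 0, 0, 1, 0, 0, 0, 1])

# For a binary column, the mean equals the share of 1s
cancel_rate = is_canceled.mean()
print(f"{cancel_rate:.2%}")  # 37.50%
```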

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in hb_df.columns:
  print(f"Unique values for {col}: {hb_df[col].unique()}\n")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
hb_df.info()

In [None]:
# Changing Data type of some columns
hb_df[['children', 'company', 'agent']] = hb_df[['children', 'company', 'agent']].astype('int64')
hb_df['reservation_status_date'] = pd.to_datetime(hb_df['reservation_status_date'], format='%Y-%m-%d')

In [None]:
# Adding important columns for data visualization

# Adding total stays from weekend stays and week stays
hb_df['total_stay'] = hb_df['stays_in_weekend_nights'] + hb_df['stays_in_week_nights']

# Adding total people from adult children and babies
hb_df['total_people'] = hb_df['adults'] + hb_df['children'] + hb_df['babies']

In [None]:
# Check the updated schema and preview the new columns
hb_df.info()
hb_df[['total_stay', 'total_people']].head()

### What all manipulations have you done and insights you found?

#### 1. **Data types**: Converted columns to the appropriate data types: the children, company, and agent columns were changed to integers (after their missing values were filled), and reservation_status_date was converted to datetime.

#### 2. **Created new columns from existing ones to gain more insight.**
  - Created the total_stay column by adding the weekend-night and week-night stay columns. It gives the total number of nights of each stay, regardless of weekday or weekend.
    - This makes it possible to analyze the total length of stay per booking for each hotel.
  - Created the total_people column by adding adults, children, and babies.
    - This gives the total number of guests per booking.
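The two derived columns can be sketched on a hypothetical toy frame. The extra filter for bookings with zero nights or zero guests is an added sanity check, an assumption on my part rather than part of the original analysis.

```python
import pandas as pd

# Hypothetical bookings; the last row has zero nights
bookings = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0, 0],
    "stays_in_week_nights": [3, 2, 0],
    "adults": [2, 1, 2],
    "children": [1, 0, 0],
    "babies": [0, 0, 1],
})

# Same derived columns as in the wrangling step
bookings["total_stay"] = bookings["stays_in_weekend_nights"] + bookings["stays_in_week_nights"]
bookings["total_people"] = bookings["adults"] + bookings["children"] + bookings["babies"]

# Added sanity check: bookings with zero nights or zero guests deserve a closer look
suspicious = bookings[(bookings["total_stay"] == 0) | (bookings["total_people"] == 0)]
print(bookings[["total_stay", "total_people"]])
print(suspicious.index.tolist())
```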
    



## ***4. Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables***

In [None]:
hb_df.info()

#### Chart - 1

In [None]:
# Chart - 1 Count of Canceled vs. Not Canceled Bookings: Bar chart
plt.figure(figsize=(8,6))
sns.countplot(data=hb_df, x='is_canceled', hue='hotel')
plt.xlabel('Booking cancellation status')
plt.ylabel('count')
plt.title('Count of Canceled Vs Not Canceled Booking')
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a countplot for the distribution of canceled vs. not canceled bookings because it provides a clear visual representation of the balance between the two categories. It's a straightforward way to compare the number of canceled bookings with the number of bookings that were not canceled.**

##### 2. What is/are the insight(s) found from the chart?

**From the chart, we can see the distribution of canceled and non-canceled bookings for each hotel type. Since about 27.5% of all bookings were canceled (the mean of is_canceled), non-canceled bookings clearly outnumber canceled ones overall. A cancellation share this large can still point to issues with booking management, customer satisfaction, or external factors affecting travel plans.**
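Because City Hotel receives more bookings overall, raw counts in the countplot mix booking volume with cancellation behavior. A per-hotel cancellation rate separates the two. This sketch uses pd.crosstab on hypothetical toy data; in the notebook, hb_df['hotel'] and hb_df['is_canceled'] would be passed instead.

```python
import pandas as pd

# Hypothetical bookings: 5 for City Hotel, 3 for Resort Hotel
bookings = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 3,
    "is_canceled": [1, 1, 0, 0, 0, 1, 0, 0],
})

# Absolute counts, as shown by the countplot (rows: hotel, columns: 0/1)
counts = pd.crosstab(bookings["hotel"], bookings["is_canceled"])

# Row-normalised table: share of non-canceled (0) and canceled (1) bookings per hotel
rates = pd.crosstab(bookings["hotel"], bookings["is_canceled"], normalize="index")

print(counts)
print(rates)
```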

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can help in making informed business decisions. Because a substantial share of bookings is canceled, the hotels may need to review cancellation policies, improve customer service, or implement strategies to reduce cancellations, such as offering flexible booking options or personalized incentives. On the negative side, cancellations that happen close to the arrival date can leave rooms empty that could otherwise have been sold, which leads to lost revenue and makes occupancy harder to forecast.**

#### Chart - 2

In [None]:
# Chart - 2

# Calculate the count of bookings for each hotel type
hotel_counts = hb_df['hotel'].value_counts()

# Create a pie chart
plt.figure(figsize=(7, 5))
plt.pie(hotel_counts, labels=hotel_counts.index, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Hotel Types', size=19)
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pie chart for the distribution of **City Hotel vs. Resort Hotel** because it clearly shows how the total number of bookings is split between the two categories.

##### 2. What is/are the insight(s) found from the chart?

From the pie chart, we can clearly see how bookings are split between City Hotel and Resort Hotel: City Hotel accounts for a larger share of bookings than Resort Hotel.<br>That means **customers book City Hotel more than Resort Hotel**.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from this chart can help in making informed business decisions. For example, Resort Hotel, which receives fewer bookings, could attract more customers by offering flexible payment options, an easy booking process, hassle-free check-ins and check-outs, and customer-oriented amenities to generate healthy revenue.<br><br>City Hotel, which receives more bookings, needs to make sure that rooms are not overbooked, to avoid unnecessary disputes with guests.

#### Chart - 3

In [None]:
# Chart - 3 Hotel bookings by top 10 countries
# Take the 10 countries with the most bookings
top_countries = hb_df['country'].value_counts().head(10)

# Plot the graph
top_countries.plot(kind='bar', color='teal')
plt.title('Top 10 Countries by Number of Bookings')
plt.xlabel('Country')
plt.ylabel('Number of Bookings')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a **bar chart** because it clearly shows the number of hotel bookings from each of the top 10 countries.

##### 2. What is/are the insight(s) found from the chart?

From the chart, the top 10 countries by number of bookings are:<br>
1. Portugal (PRT)
2. United Kingdom (GBR)
3. France (FRA)
4. Spain (ESP)
5. Germany (DEU)
6. Italy (ITA)
7. Republic of Ireland (IRL)
8. Belgium (BEL)
9. Brazil (BRA)
10. Netherlands (NLD)
<br>

**Guests from Portugal (PRT) made the largest number of bookings.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the chart can certainly help create a positive business impact. Knowing that Portugal (PRT) has the highest number of hotel bookings allows the business to focus marketing efforts and resources on this key market, potentially increasing revenue further. Additionally, understanding the distribution of bookings across the top 10 countries can help in tailoring services and promotions to match the preferences of customers from these regions.

However, the insights could also indicate potential risks. For example, if the business is too reliant on bookings from a few countries, any economic downturn or travel restrictions in those countries could lead to negative growth. Diversifying the customer base and attracting bookings from other regions could mitigate this risk and ensure more stable growth.
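To quantify how much the business relies on a few countries, one option is the share of all bookings that comes from the top n countries. top_n_share is a hypothetical helper and the sample below is toy data; in the notebook, hb_df['country'] could be passed instead.

```python
import pandas as pd


def top_n_share(values: pd.Series, n: int) -> float:
    """Fraction of all rows that belong to the n most frequent categories."""
    shares = values.value_counts(normalize=True)  # sorted from most to least frequent
    return float(shares.head(n).sum())


# Hypothetical sample of guest countries (not the real distribution)
country = pd.Series(["PRT"] * 6 + ["GBR"] * 2 + ["FRA", "ESP"])

print(top_n_share(country, 1))
print(top_n_share(country, 3))
```

A value close to 1 for a small n signals the concentration risk described above.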

#### Chart - 4

In [None]:
# Chart - 4 Arrival Date Month Distribution

# Set the graph
plt.figure(figsize=(15, 5))

# Draw the graph
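# Order months chronologically (by default countplot uses the order in which values appear)
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
sns.countplot(data=hb_df, x='arrival_date_month', hue="hotel", order=month_order)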
plt.xticks(rotation=45) # Rotate x label
plt.xlabel('Arrival Date Month')
plt.ylabel('Count')
plt.title("Arrival Date Month Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a grouped bar chart to visualize the distribution of bookings across different months while also considering the hotel type. This chart type allows for a comparison of booking counts between resort hotels and city hotels within each month, providing insights into any seasonal variations or differences in booking patterns between the two types of accommodations.**

##### 2. What is/are the insight(s) found from the chart?

**The grouped bar chart reveals the distribution of bookings across different months for both resort hotels and city hotels. Insights can be gleaned by examining patterns such as peak booking months, differences in booking behavior between hotel types during specific months, or any consistent trends in booking preferences throughout the year.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can positively impact business strategies by informing decisions related to seasonal pricing, marketing campaigns, and resource allocation for each hotel type. For example, if the chart shows that resort hotels experience a surge in bookings during summer months, the business can implement targeted marketing promotions or adjust pricing strategies to capitalize on seasonal demand. However, if there are indications of negative growth, such as a decline in bookings for both hotel types during traditionally busy months, it may signal broader economic or industry-related challenges that require strategic adjustments to mitigate negative impacts and stimulate growth.**

#### Chart - 5

In [None]:
# Chart - 5 Reserved Room Type Distribution

# Set graph
plt.figure(figsize=(7,6))

# Draw graph
sns.countplot(x='reserved_room_type', hue='reserved_room_type', data=hb_df, palette='coolwarm', dodge=False)
plt.title('Distribution of Reserved Room Types')
plt.xlabel('Room Type')
plt.ylabel('Number of Bookings')
plt.legend([],[], frameon=False)  # Hide the legend
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar chart because it effectively displays the frequency of different categories, making it easy to compare the number of bookings across various reserved room types. Bar charts are simple, clear, and ideal for categorical data, allowing quick identification of the most and least popular room types.

##### 2. What is/are the insight(s) found from the chart?

The bar chart shows the most and least popular room types based on customer reservations. If one room type (e.g., "Room Type A") has significantly more bookings than the others, customers prefer that room type; conversely, a room type with few bookings may be less desirable to customers.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help create a positive business impact. Understanding which room types are most popular allows the hotel to optimize pricing, availability, and marketing strategies. For example, the hotel might increase the price of highly demanded room types or offer promotions on less popular ones to balance occupancy and revenue.

Insights might also indicate potential issues. For example, if a particular room type consistently shows low bookings, it might suggest customer dissatisfaction with that room type's amenities, location, or price. This could lead to negative growth if not addressed, as unsatisfied customers might choose competitors. Therefore, the hotel might need to investigate why certain room types are less popular and consider improvements or adjustments to better meet customer needs.
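To support the pricing ideas above, booking counts and mean ADR per reserved room type can be computed together. This is a sketch on hypothetical toy data; in the notebook, hb_df could be grouped the same way. Mean ADR also varies with season, so this is only a first look.

```python
import pandas as pd

# Hypothetical bookings with reserved room types and daily rates
bookings = pd.DataFrame({
    "reserved_room_type": ["A", "A", "D", "D", "E"],
    "adr": [80.0, 100.0, 120.0, 140.0, 150.0],
})

# Number of bookings and mean ADR for each reserved room type, most-booked first
room_summary = (
    bookings.groupby("reserved_room_type")["adr"]
    .agg(n_bookings="count", mean_adr="mean")
    .sort_values("n_bookings", ascending=False)
)
print(room_summary)
```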

#### Chart - 6

In [None]:
# Chart - 6 Customer Type Distribution
# get numbers of customer type
customer_type = hb_df['customer_type'].value_counts()
# print(customer_type)

# Get the label and count for the pie chart
customer_type_label = customer_type.index
customer_type_size = customer_type.values
# print(customer_type_label, customer_type_size)

# create Pie chart
plt.figure(figsize=(8, 6))
plt.pie(customer_type_size, labels=customer_type_label, autopct='%1.1f%%')
plt.title('Customer Type Distribution')
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a pie chart because it effectively visualizes the distribution of different customer types as proportions of a whole. Each slice of the pie represents a customer type, and the size of each slice corresponds to the proportion of that customer type within the entire dataset. This type of chart is ideal for showcasing categorical data and comparing the relative sizes of different categories.**

##### 2. What is/are the insight(s) found from the chart?

**From the pie chart, it is clearly visible that Transient customers make up a larger share of bookings than any other customer type.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can positively impact business strategies related to customer segmentation, marketing campaigns, and service offerings. For example, if the pie chart shows that a significant portion of bookings comes from a specific customer type (e.g., transient), the business can tailor its marketing efforts and services to better cater to the needs and preferences of that customer segment, potentially leading to increased customer satisfaction and loyalty. However, if there are indications of negative growth, such as a decline in bookings from high-value customer segments, it may signal a need to reassess marketing strategies, improve customer experiences, or introduce targeted promotions to regain lost customers and prevent further decline in revenue.**

#### Chart - 7

In [None]:
# Reservation status counts by hotel type
sns.countplot(x='reservation_status', data=hb_df, palette='bright', hue='hotel')
plt.title("Reservation status")
plt.ylabel("Count")
plt.xlabel("Reservation status")
plt.show()

In [None]:
# Chart - 7 Market Segment Distribution

# Getting value counts for market_segment
market_segment = hb_df['market_segment'].value_counts()
# market_segment

# getting label and size or label and value to create pie chart
label = market_segment.index
size = market_segment.values
# print(label, size)

# Set the font size before drawing so it applies to this chart
# (rcParams changes also persist for all later charts)
plt.rcParams['font.size'] = 10

# Creating pie chart of market segment
plt.figure(figsize=(10, 8))
plt.pie(size, labels=label, autopct='%1.1f%%', startangle=190)
plt.title("Market Segment Distribution")
plt.axis('equal')
# Place the legend outside the pie, at the center right of the figure
plt.legend(label, loc="center left", bbox_to_anchor=(1, 0, 1, 1))
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a pie chart to visualize the distribution of bookings across different market segments because it provides a clear representation of the proportion of bookings attributed to each segment. The simplicity of a pie chart makes it easy to understand and compare the relative sizes of different segments.**

##### 2. What is/are the insight(s) found from the chart?

**The chart shows the distribution of bookings among the market segments, and insights can be gained by examining the relative proportion of each segment. For example, it may show that a significant portion of bookings comes from a particular segment, indicating the importance of targeting marketing efforts or tailoring services to meet the needs of that segment. Additionally, it can highlight underrepresented segments that may warrant attention or investment to capture a larger share of the market.**

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can positively impact business strategies related to marketing, customer segmentation, and product/service offerings. For example, if the pie chart indicates that a particular market segment accounts for a small portion of bookings, the business can focus on developing targeted marketing campaigns or special promotions to attract customers from that segment, potentially leading to increased revenue. Conversely, if there are indications of negative growth, such as a decline in bookings from key market segments, it may signal a need to reassess marketing strategies or address issues impacting customer satisfaction within those segments to prevent further decline and stimulate growth.**

#### Chart - 8

In [None]:
# Chart - 8 Distribution Channel Distribution

# Count bookings per distribution channel
distribution_channel = hb_df['distribution_channel'].value_counts()

# Use the channel names as labels and the counts as slice sizes
label = distribution_channel.index
size = distribution_channel.values

# Create the pie chart
plt.figure(figsize=(12, 6))
plt.pie(size, labels=label, autopct='%1.1f%%')
plt.legend(label, loc="center left", bbox_to_anchor=(1, 0, 1, 1))  # legend to the right of the pie
plt.title("Bookings by Distribution Channel")
plt.axis('equal')  # draw the pie as a circle
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a pie chart because it effectively displays the distribution of bookings across different distribution channels as proportions of a whole. This type of chart allows for easy comparison of the relative sizes of each distribution channel and provides a clear visual representation of their contributions to the overall booking volume.**

##### 2. What is/are the insight(s) found from the chart?

**The pie chart shows that bookings made through TA/TO (travel agents and tour operators) dominate, while the other channels contribute much less to the overall booking volume.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can positively impact business strategies related to distribution channel management, marketing efforts, and revenue optimization. For example, if the pie chart shows that a significant portion of bookings comes from a particular distribution channel (e.g., online travel agencies), the business can focus on strengthening partnerships with these channels or investing more resources in targeted marketing campaigns to further leverage their reach and drive additional bookings. However, if there are indications of negative growth, such as a decline in bookings from key distribution channels or an overreliance on a single channel, it may signal a need to diversify distribution channels, enhance direct booking channels, or renegotiate terms with existing partners to mitigate risks and ensure sustainable growth.**
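
"Overreliance on a single channel" can be quantified. One common option, swapped in here rather than taken from the original analysis, is the Herfindahl-Hirschman index (HHI): the sum of squared channel shares, which equals 1.0 when every booking comes through one channel and 1/k when k channels carry equal shares. The counts below are made up; in the notebook the shares would come from `hb_df['distribution_channel'].value_counts(normalize=True)`.

```python
import pandas as pd

# Made-up booking counts per distribution channel (illustration only).
channel_counts = pd.Series({"TA/TO": 80, "Direct": 12, "Corporate": 6, "GDS": 2})

# Convert counts to shares of total bookings.
shares = channel_counts / channel_counts.sum()

# Share of the single largest channel.
top_share = shares.max()

# Herfindahl-Hirschman index: sum of squared shares (1.0 = fully concentrated).
hhi = float((shares ** 2).sum())

print(f"Top channel share: {top_share:.2f}")
print(f"HHI: {hhi:.3f}")
```

Tracking the HHI over time (for example per arrival year) would show whether the channel mix is becoming more or less concentrated.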

#### Chart - 9

In [None]:
# Chart - 9 Cancellations by Distribution channel

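# Count bookings per distribution channel, split by is_canceled (0 = kept, 1 = canceled);
# fill_value=0 avoids NaN for a channel with no bookings in one of the two groups
cancellations_by_channel = hb_df.groupby(['distribution_channel', 'is_canceled']).size().unstack(fill_value=0)

# Draw a stacked bar chart: one bar per channel, stacked by cancellation status
cancellations_by_channel.plot(kind='bar', stacked=True)
plt.title('Cancellations by Distribution Channel')
plt.ylabel('Number of Bookings')
plt.xlabel('Distribution Channel')
plt.xticks(rotation=45)  # rotate the channel names so they do not overlap
plt.show()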

##### 1. Why did you pick the specific chart?

I chose a stacked bar chart because it shows, for each distribution channel, how many bookings were canceled and how many were not. This makes it easy to compare the composition of each channel's bookings and see which channels have a larger canceled portion relative to their total. Because the bars show absolute counts, the canceled share of small channels is harder to judge by eye than that of large ones.

##### 2. What is/are the insight(s) found from the chart?

The stacked bar chart shows how cancellations are distributed across distribution channels. If a particular channel (e.g., TA/TO) has a larger canceled portion relative to its total than the other channels, bookings made through it are more likely to be canceled. Conversely, channels with a smaller canceled portion indicate more reliable bookings.
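
Normalizing each channel's counts turns the comparison into explicit cancellation rates, which avoids judging small bars by eye. A minimal sketch with toy rows standing in for `hb_df`: `pd.crosstab(..., normalize='index')` makes each channel's row sum to 1, and because `is_canceled` is 0/1, its mean per channel is the cancellation rate.

```python
import pandas as pd

# Toy bookings: is_canceled is 1 for a canceled booking, 0 otherwise.
bookings = pd.DataFrame({
    "distribution_channel": ["TA/TO"] * 6 + ["Direct"] * 4 + ["GDS"] * 2,
    "is_canceled":          [1, 1, 1, 0, 0, 0] + [1, 0, 0, 0] + [0, 0],
})

# Row-normalized contingency table: fraction kept (0) vs canceled (1) per channel.
rates = pd.crosstab(bookings["distribution_channel"],
                    bookings["is_canceled"],
                    normalize="index")

# Since is_canceled is 0/1, its mean per channel is the cancellation rate.
cancel_rate = bookings.groupby("distribution_channel")["is_canceled"].mean()

print(rates)
print(cancel_rate.sort_values(ascending=False))
```

Calling `rates.plot(kind='bar', stacked=True)` would draw a 100% stacked version of the chart, where every bar has the same height and only the canceled share differs.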

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help create a positive business impact by identifying which distribution channels are most reliable and which are prone to cancellations. The hotel can use this information to negotiate better terms with high-cancellation channels, offer incentives to encourage bookings through more stable channels, or develop strategies to reduce cancellations (e.g., stricter deposit or cancellation terms for channels with high cancellation rates).

If not addressed, high cancellation rates through specific channels could lead to negative growth. For example, a high cancellation rate might indicate dissatisfaction among customers who book through a particular channel, possibly due to misleading information or poor user experience. This could damage the hotel's reputation and lead to a loss of future business. Addressing the root causes of these cancellations is crucial to prevent potential revenue loss and maintain customer satisfaction.

#### Chart - 10

In [None]:

# Chart - 10 Deposit Type Distribution

# Count bookings per deposit type
deposit_type = hb_df['deposit_type'].value_counts()

# Use the deposit types as labels and the counts as slice sizes
label = deposit_type.index
size = deposit_type.values

# Create the pie chart
plt.pie(size, labels=label, autopct='%1.1f%%')

# Add legend and title
plt.legend(label, loc='center left', bbox_to_anchor=(1, 0.5))  # legend vertically centered to the right
plt.title('Deposit Type Distribution')
plt.axis('equal')  # draw the pie as a circle

# Show the graph
plt.show()

##### 1. Why did you pick the specific chart?

**I selected a pie chart because it effectively visualizes the distribution of bookings across different deposit types as proportions of a whole. This type of chart allows for easy comparison of the relative sizes of each deposit type and provides a clear representation of their contributions to the overall booking volume.**

##### 2. What is/are the insight(s) found from the chart?

**The pie chart shows the proportion of bookings for each deposit type (No Deposit, Non Refund and Refundable). From it we can see which deposit types are most common and which are rarely used, and weigh their relative importance to the business.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this chart can positively impact revenue management and customer acquisition strategies. For example, if a large share of bookings are made with non-refundable deposits, the business could adjust pricing or introduce incentives for refundable bookings to attract guests who value flexibility, while monitoring whether this raises cancellation rates. Conversely, signs of negative growth, such as a decline in bookings with refundable deposits, may signal a need to reassess pricing or booking policies to limit revenue loss and ensure sustainable growth.**
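
Whether a deposit type is actually associated with more or fewer cancellations can be checked directly instead of assumed. The sketch groups by `deposit_type` and takes the count and mean of `is_canceled`; the rows are invented, and in the notebook the same lines would run on `hb_df`. A difference in rates is only an association, not proof that changing the deposit policy would change cancellations.

```python
import pandas as pd

# Invented bookings; category names follow the dataset's deposit_type values.
bookings = pd.DataFrame({
    "deposit_type": ["No Deposit"] * 5 + ["Non Refund"] * 3 + ["Refundable"] * 2,
    "is_canceled":  [0, 1, 0, 0, 1] + [1, 1, 0] + [0, 0],
})

# Number of bookings and cancellation rate for each deposit type.
summary = bookings.groupby("deposit_type")["is_canceled"].agg(["count", "mean"])
summary = summary.rename(columns={"count": "bookings", "mean": "cancel_rate"})

print(summary.sort_values("cancel_rate", ascending=False))
```

Keeping the `bookings` column next to the rate matters: a rate computed from a handful of bookings (as for a rare deposit type) is much less reliable than one computed from thousands.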

#### Chart - 11

In [None]:
# Chart - 11 Average Daily Rate (ADR) by Hotel
# Mean ADR for each hotel type
avg_adr_by_hotel = hb_df.groupby('hotel')['adr'].mean()
# Plot the averages as a bar chart
avg_adr_by_hotel.plot(kind='bar', color='skyblue')
plt.title('Average Daily Rate (ADR) by Hotel')
plt.ylabel('Average ADR')
plt.xlabel('Hotel')
plt.xticks(rotation=0)  # keep the x-axis labels horizontal
plt.show()

##### 1. Why did you pick the specific chart?

The bar chart was chosen because it provides a clear and straightforward comparison of the average daily rate (ADR) between different hotels. This type of chart is ideal for comparing numerical values across categories (in this case, different hotels) and quickly identifying which hotel has the highest or lowest ADR.

##### 2. What is/are the insight(s) found from the chart?

The bar chart shows the average daily rate (ADR) for each hotel (City Hotel and Resort Hotel), i.e., which one charges more per room on average. If one hotel has a significantly higher ADR than the other, it suggests that this hotel is positioned as the more premium option, potentially offering more luxurious amenities or a more desirable location. Conversely, a lower ADR might indicate a more budget-friendly option or weaker demand.
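
A mean is sensitive to extreme values, so a single unusually expensive (or zero-priced) booking can shift a hotel's bar noticeably. Comparing the mean with the median is a quick robustness check. This sketch uses made-up rates and assumes the same `hotel` and `adr` columns as the chart above.

```python
import pandas as pd

# Made-up ADR values; the 900.0 entry plays the role of an extreme outlier.
bookings = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 4,
    "adr":   [100.0, 105.0, 110.0, 95.0, 900.0] + [80.0, 90.0, 85.0, 95.0],
})

# Mean and median ADR side by side for each hotel.
adr_stats = bookings.groupby("hotel")["adr"].agg(["mean", "median"])

print(adr_stats)
```

Here the City Hotel mean (262.0) is pulled far above its median (105.0) by one booking, while the Resort Hotel's mean and median agree. If the two columns disagree sharply on the real data, the median is likely the fairer basis for comparing the hotels' pricing.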

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help create a positive business impact. Understanding the ADR differences between hotels allows the hotel management to make informed pricing decisions, optimize revenue management strategies, and better position their properties in the market. For instance, if one hotel has a significantly higher ADR, the management can investigate what factors contribute to this and apply similar strategies to other hotels (e.g., improving amenities, targeting higher-income customers).

If the ADR is too high relative to the perceived value or market conditions, it could lead to negative growth by driving customers to choose competitors with more competitive pricing. Similarly, if the ADR is too low, the hotel may not be maximizing its revenue potential, leading to underperformance. Balancing ADR with market demand and customer expectations is crucial to avoid these pitfalls and ensure sustainable growth.

#### Chart - 12

In [None]:
# Chart - 12 Special Requests vs. Total Stay

# Figure size
plt.figure(figsize=(8,6))

# Scatter plot: color by hotel type, marker size by cancellation status (0 = kept, 1 = canceled)
sns.scatterplot(data=hb_df, y='total_stay', x='total_of_special_requests', hue='hotel', size='is_canceled')

# Labels and title
plt.title('Special Requests vs. Total Stay')
plt.xlabel("Total Number of Special Requests")
plt.ylabel("Total Stay")
plt.grid(True)

# show graph
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a scatter plot because it visualizes the relationship between two numeric variables: the number of special requests and the total stay duration. Scatter plots are well suited to spotting patterns, trends, clusters and outliers, which makes them a natural first look at how special requests relate to the length of stay.**

##### 2. What is/are the insight(s) found from the chart?

**Each point in the scatter plot is one booking, positioned by its number of special requests and its total stay. The chart lets us check whether stay length tends to rise or fall with the number of requests, whether any relationship looks linear or nonlinear, and whether there are clusters or outliers.**
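
Because `total_of_special_requests` takes only a handful of integer values, many points overlap and a trend is hard to see. Summarizing stay length per request count, and computing a rank (Spearman) correlation, gives a numeric answer to the question the chart poses. This is a sketch on toy data; it assumes `total_stay` is the nights-per-booking column created earlier in the notebook, and it computes Spearman's rho as the Pearson correlation of the ranks so that no extra library is needed.

```python
import pandas as pd

# Toy bookings: number of special requests and total nights stayed.
bookings = pd.DataFrame({
    "total_of_special_requests": [0, 0, 0, 1, 1, 2, 2, 3],
    "total_stay":                [1, 2, 2, 3, 2, 4, 5, 6],
})

# Count, mean and median stay for each request count (one row per x value).
stay_by_requests = (bookings
                    .groupby("total_of_special_requests")["total_stay"]
                    .agg(["count", "mean", "median"]))

# Spearman rho = Pearson correlation of the ranks (ties get average ranks).
rho = bookings["total_of_special_requests"].rank().corr(bookings["total_stay"].rank())

print(stay_by_requests)
print(f"Spearman rho: {rho:.2f}")
```

A rho near 0 would suggest no monotonic relationship, and a positive rho that guests who stay longer tend to make more requests; either way, the correlation alone does not say which one drives the other.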

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this scatter plot can positively impact business strategies related to customer service, guest satisfaction, and revenue optimization. For instance, if the scatter plot shows a positive correlation between total special requests and total stay duration, it suggests that guests with longer stays are more likely to make special requests. In such cases, the business can use this information to enhance the guest experience by proactively addressing their needs and preferences, potentially leading to increased satisfaction, loyalty, and positive word-of-mouth recommendations. However, if there are indications of negative growth, such as a weak or negative correlation between special requests and total stay, it may signal missed opportunities to capitalize on longer stays or potential dissatisfaction among guests with fewer special requests. In such cases, the business may need to reevaluate its service offerings, communication strategies, or pricing incentives to better align with guest expectations and maximize revenue potential.**

#### Chart - 13

In [None]:
# Chart - 13 Meal Type Distribution

# Count bookings per meal type once and reuse the result
meal_counts = hb_df['meal'].value_counts()

# Draw the pie chart
plt.figure(figsize=(15, 5))
meal_counts.plot(kind='pie', autopct='%1.1f%%', colors=sns.color_palette('pastel'))

# Add legend and title
plt.legend(meal_counts.index, loc='center left', bbox_to_anchor=(1, 0.5))
plt.title('Meal Type Distribution')
plt.ylabel('')  # hide the default y-axis label

# show graph
plt.show()

##### 1. Why did you pick the specific chart?

**I chose a pie chart because it effectively visualizes the distribution of meal types chosen by customers as proportions of a whole. This type of chart allows for easy comparison of the relative sizes of each meal type category and provides a clear representation of their contributions to the overall meal choices.**

##### 2. What is/are the insight(s) found from the chart?

**The pie chart shows the proportion of bookings for each meal type, so we can identify which meal plans are most popular and which are rarely chosen. The relative sizes also indicate how much weight each meal plan deserves when planning dining services around guest preferences.**
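
The raw meal codes are terse, so mapping them to readable names before plotting makes the chart easier to interpret. The mapping below follows the meal codes as described in the dataset's documentation (BB, HB, FB, SC and Undefined); the toy rows are invented, and grouping SC with Undefined into one label is an illustrative choice, not something the original notebook does.

```python
import pandas as pd

# Meal codes as documented for the hotel bookings dataset.
MEAL_LABELS = {
    "BB": "Bed & Breakfast",
    "HB": "Half board",
    "FB": "Full board",
    "SC": "No meal package",
    "Undefined": "No meal package",  # treated the same as SC
}

# Toy stand-in for hb_df['meal'].
meals = pd.Series(["BB", "BB", "BB", "HB", "SC", "Undefined", "FB", "BB"])

# Map codes to readable labels, then compute each label's share of bookings.
meal_share = meals.map(MEAL_LABELS).value_counts(normalize=True)

print(meal_share)
```

Any code missing from the dictionary becomes NaN and is silently dropped by `value_counts`, so checking `meals.map(MEAL_LABELS).isna().sum()` first is a cheap safeguard.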

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights gained from this pie chart can positively impact business strategies related to dining options, menu planning, and customer satisfaction. For instance, if the pie chart shows that a significant portion of bookings opt for a particular meal type (e.g., breakfast included), the business can tailor its menu offerings, dining packages, and marketing efforts to better cater to customer preferences, potentially leading to increased satisfaction, loyalty, and positive reviews. However, if there are indications of negative growth, such as a decline in bookings for certain meal types or a lack of variety in menu options, it may signal missed opportunities to appeal to diverse customer preferences or dissatisfaction among guests with limited dining choices. In such cases, the business may need to review its menu offerings, explore opportunities to introduce new meal options or dining experiences, and solicit feedback from customers to address any concerns and enhance the dining experience to ensure positive business outcomes.**

#### Chart - 14 - Correlation Heatmap

Let's find the correlations between the numerical columns.

In [None]:
# Correlation Heatmap visualization code

# Select only the numeric columns (corr() needs numeric input)
numerical_data = hb_df.select_dtypes(include=['int64', 'float64'])

# Pairwise Pearson correlation matrix of the numeric columns
corr_data = numerical_data.corr()

# Draw the heatmap with each coefficient annotated
plt.figure(figsize=(20, 10))
sns.heatmap(corr_data, annot=True, cmap='coolwarm', linewidths=0.5)

# labeling
plt.title("Correlation Heatmap of Hotel Bookings")

# show graph
plt.show()


##### 1. Why did you pick the specific chart?

**I chose a correlation heatmap because it provides a visual representation of the relationships between different numerical variables in the dataset. This type of chart allows for easy identification of patterns, trends, and potential correlations between variables, making it useful for exploring the underlying structure of the data and identifying key factors that may influence certain outcomes or behaviors.**

##### 2. What is/are the insight(s) found from the chart?

**The correlation heatmap shows the strength and direction of the relationship between each pair of numerical variables. A coefficient close to 1 means two variables tend to increase or decrease together, while a coefficient close to -1 indicates an inverse relationship. Additionally, we can identify variables that are strongly correlated with an outcome of interest, such as is_canceled, which may be important predictors of that outcome.**
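
Reading a large heatmap cell by cell is slow. If `is_canceled` is treated as the outcome of interest (an assumption; the notebook does not name a target variable), the relevant column of the correlation matrix can be pulled out and ranked by absolute value. The numbers below are toy data; in the notebook, `corr_data['is_canceled']` would give the same column.

```python
import pandas as pd

# Toy numeric data standing in for numerical_data.
numeric = pd.DataFrame({
    "is_canceled":               [1, 0, 1, 0, 1, 0],
    "lead_time":                 [200, 10, 150, 30, 120, 5],
    "total_of_special_requests": [0, 2, 0, 1, 0, 3],
    "adults":                    [2, 2, 1, 2, 2, 1],
})

# Pearson correlation of every numeric column with is_canceled.
target_corr = numeric.corr()["is_canceled"].drop("is_canceled")

# Rank by strength regardless of sign, keeping the signed values.
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)

print(ranked.round(2))
```

`ranked` keeps the sign, so positive values mark columns that rise with cancellations and negative values columns that fall. Because `is_canceled` is 0/1, each of these Pearson coefficients is a point-biserial correlation.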

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

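# Pair plot of the same numeric columns used for the heatmap
# sns.pairplot creates its own figure, so no plt.figure() call is needed
sns.pairplot(numerical_data)

# Title for the whole grid (plt.title would only label the last subplot)
plt.suptitle('Pairwise Relationships in Hotel Bookings', y=1.02)

# show graph
plt.show()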

##### 1. Why did you pick the specific chart?

**I chose a pair plot because it allows for the visualization of pairwise relationships between multiple numerical variables in the dataset. This type of chart is useful for identifying patterns, trends, and potential correlations between variables, making it suitable for exploring the overall structure and relationships within the data.**

##### 2. What is/are the insight(s) found from the chart?

**The pair plot provides a visual overview of the relationships between numerical variables by displaying scatterplots for each pair of variables and histograms along the diagonal. By examining the pair plot, we can identify any linear or nonlinear relationships between variables, as well as detect outliers or clusters in the data. Additionally, we can observe the distribution and spread of individual variables and assess whether any variables exhibit similar patterns or trends across different pairs. This can provide insights into the underlying data structure and help identify key variables that may influence certain outcomes or behaviors.**

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain Briefly.

#### 1. **Optimize Pricing Strategies**: Utilize insights from the analysis of ADR distribution to optimize pricing strategies, adjusting room rates based on demand patterns, seasonality, and customer preferences. This can help maximize revenue while ensuring competitiveness in the market.
#### 2. **Enhance Customer Experience**: Leverage insights from the analysis of total special requests and total stay duration to enhance the guest experience. Implement proactive measures to address guest needs and preferences, such as personalized amenities, efficient check-in/check-out processes, and tailored services.
#### 3. **Diversify Dining Options**: Based on insights from the analysis of meal type distribution, diversify dining options to cater to a wider range of customer preferences. Introduce new menu offerings, dining packages, and promotional offers to attract customers and enhance satisfaction.
#### 4. **Improve Marketing Strategies**: Utilize insights from the analysis of market segment distribution to tailor marketing strategies and promotional campaigns to specific customer segments. Implement targeted marketing initiatives through appropriate channels to reach and engage with different customer segments effectively.
#### 5. **Optimize Booking Processes**: Streamline booking processes and enhance booking flexibility based on insights from the analysis of booking changes distribution. Implement user-friendly booking interfaces, flexible cancellation policies, and dynamic booking options to improve customer satisfaction and increase booking conversion rates.

# **Conclusion**

**In conclusion, by leveraging the insights gained from the exploratory data analysis and visualization techniques, the client can make data-driven decisions to optimize operations, enhance customer satisfaction, and maximize revenue. By focusing on pricing optimization, customer experience enhancement, diversification of dining options, improvement of marketing strategies, and optimization of booking processes, the client can achieve their business objectives effectively and position themselves competitively in the hospitality industry.**
