<a href="https://colab.research.google.com/github/mahhem36/EDA_Hotel_booking/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**            - Hemlata Mahajan


# **Project Summary -**


**Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions!
This dataset contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of parking spaces requested, among other things. All personally identifying information has been removed from the data.**




# **GitHub Link -**

https://github.com/mahhem36/EDA_Hotel_booking

# **Problem Statement**


The objective of this project is to perform an exploratory data analysis (EDA) on a hotel booking dataset and gain insights into various aspects of hotel bookings. The dataset contains information about bookings made at hotels, including features such as booking dates, guest demographics, room types, and booking status.

**Define Your Business Objective**

The business objective of this project is to leverage the hotel booking dataset to gain actionable insights that can drive business decisions and improve overall performance in the hotel industry. By conducting a comprehensive analysis of the dataset, the project aims to achieve the following business objectives:

1.  Enhance Revenue Management
2.  Improve Booking Conversion Rates
3.  Identify Market Opportunities
4.  Data-driven Decision Making





# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Students who provide them will be awarded additional credits.
     
     The additional credits will give an advantage over other students during Star Student selection.
       
     [ Note: - Deployment Ready Code is defined as follows: the whole .ipynb notebook should be executable in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


### Dataset Loading

In [None]:
# Mount Google Drive first so the dataset path below can be read
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
hotel_df=pd.read_csv("/content/drive/MyDrive/CAPSTONE EDA PROJECT ALMABETTER/DATASETS HOTEL BOOKING/Hotel Bookings.csv")
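The guidelines reward exception handling. Below is a minimal sketch of a loader that fails with a clear message when the CSV is missing (for example, if Google Drive has not been mounted yet). The helper name `load_bookings` is hypothetical; the notebook itself calls `pd.read_csv` directly.

```python
from pathlib import Path

import pandas as pd


def load_bookings(path):
    """Read the bookings CSV, raising a clear error if it cannot be used.

    Hypothetical helper, not part of the original notebook.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        # Fail early with an actionable message instead of a long pandas traceback
        raise FileNotFoundError(
            f"Dataset not found at {csv_path}. Has Google Drive been mounted?"
        )
    df = pd.read_csv(csv_path)
    if df.empty:
        # A header-only file would otherwise make every later chart fail
        raise ValueError(f"{csv_path} contains no rows")
    return df
```

In the notebook this would be called as `hotel_df = load_bookings(path)`, with `path` set to the same CSV location used above.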

### Dataset First View

In [None]:
# Dataset First Look
# View the first five rows of the dataset
hotel_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
#Dimensions of the dataset
row,columns=hotel_df.shape

print("Number of rows in dataset:",row)
print("Number of columns in dataset:",columns)

### Dataset Information

In [None]:
# Dataset Info
# Non-null counts and data types of each column
hotel_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_value=hotel_df.duplicated().sum()
print("Number of duplicate rows:",duplicate_value)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_values_count=hotel_df.isnull().sum()
missing_values_count


In [None]:
# Visualizing the missing values

# Create a bar plot of missing values

plt.figure(figsize=(10,6))
missing_values_count.plot(kind='bar')
plt.title('Missing Values Count')
plt.xlabel('Columns')
plt.ylabel('Count')
# plt.xticks(rotation=45)
plt.show()



**Observations**

So, null values are present in the `company`, `agent`, `country`, and `children` columns.
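The bar chart above plots every column, including the many with no nulls. A minimal sketch that lists only the affected columns with their share of missing rows, which makes it easier to decide whether to fill or drop them. The toy frame and the helper name `missing_summary` are hypothetical; on the real data it would be called as `missing_summary(hotel_df)`.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return count and percentage of nulls for columns that have any, largest first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one null and sort by severity
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy frame (illustrative values only, not the real dataset)
toy = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0],
    "agent": [9.0, np.nan, 9.0, 240.0],
    "country": ["PRT", "GBR", None, "PRT"],
    "adr": [75.0, 98.0, 107.0, 103.0],
})
print(missing_summary(toy))
```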

### What did you know about your dataset?





1.   The hotel booking dataset has 119,390 rows and 32 columns.
2.   Missing values occur only in the `company`, `agent`, `country`, and `children` columns; `company` and `agent` account for nearly all of them. The other columns have no missing values.
3.   The dataset contains 31,994 duplicate rows, which need to be removed.
4.   The dataset contains booking information for a city hotel and a resort hotel.
5.   It covers many aspects of hotel bookings, such as cancellations, customer types, guests' countries of origin, and room rates.






## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hotel_df_columns=hotel_df.columns
print(hotel_df_columns)

In [None]:
# Dataset Describe
hotel_df.describe()

### Variables Description



- hotel: Hotel type (City Hotel or Resort Hotel)
- is_canceled: Whether the booking was canceled (0 = not canceled, 1 = canceled)
- lead_time: Number of days between the booking date and the arrival date
- arrival_date_year: Year of arrival
- arrival_date_month: Month of arrival
- arrival_date_week_number: Week number of the arrival date
- arrival_date_day_of_month: Day of the month of the arrival date
- stays_in_weekend_nights: Number of weekend nights (Saturday or Sunday) stayed or booked
- stays_in_week_nights: Number of weeknights (Monday to Friday) stayed or booked
- adults: Number of adults in the booking
- children: Number of children in the booking
- babies: Number of babies in the booking
- meal: Type of meal booked
- country: Country of origin of the guest (as reported by them)
- market_segment: Market segment of the booking (e.g. Online TA, Offline TA/TO, Groups, Direct, Corporate)
- distribution_channel: Channel through which the booking was made (e.g. TA/TO, Direct, Corporate, GDS)
- is_repeated_guest: Whether the guest had booked before (0 = no, 1 = yes)
- previous_cancellations: Number of previous bookings canceled by the guest
- previous_bookings_not_canceled: Number of previous bookings not canceled by the guest
- reserved_room_type: Room type reserved by the guest
- assigned_room_type: Room type assigned to the guest
- booking_changes: Number of changes made to the booking
- deposit_type: Type of deposit made for the booking (No Deposit / Refundable / Non Refund)
- agent: ID of the travel agency that made the booking
- company: ID of the company that made the booking or is responsible for paying it
- days_in_waiting_list: Number of days the booking was on the waiting list before confirmation
- customer_type: Type of customer (Transient, Group, etc.)
- adr: Average Daily Rate (sum of all lodging transactions divided by the number of staying nights)
- required_car_parking_spaces: Number of car parking spaces requested
- total_of_special_requests: Total number of special requests made
- reservation_status: Last reservation status (Canceled, Check-Out, or No-Show)
- reservation_status_date: Date on which the last reservation status was set

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# Iterate over each column and display unique values
for column in hotel_df.columns:
    unique_values = hotel_df[column].unique()
    print(f"Unique values for {column}:")
    print(unique_values)
    print()


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

hotel_df.duplicated().value_counts()

In [None]:
hotel_df=hotel_df.drop_duplicates()

In [None]:
hotel_df.shape

In [None]:
hotel_df.isna().sum()

In [None]:
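# Filling/replacing null values with 0.
null_columns=['agent','children','company']
# Assign the result back instead of calling fillna(inplace=True) on a column selection,
# which pandas 2.x warns about (chained assignment) and which may not update hotel_df
hotel_df[null_columns]=hotel_df[null_columns].fillna(0)

# Replacing NA values with 'Unknown'
hotel_df['country']=hotel_df['country'].fillna('Unknown')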

In [None]:
hotel_df.isna().sum()

**Now we have no null values in columns.**





## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

# Univariate Analysis


**1) Which type of hotel is most preferred by guests?**

In [None]:
# Chart - 1 visualization code
# Visualizing the share of each hotel type with a pie chart.
hotel_df['hotel'].value_counts().plot.pie(explode=[0.06, 0.06], autopct='%1.1f%%', shadow=True, figsize=(10,8),fontsize=20)
plt.title('Pie Chart for Most Preferred Hotel')

##### 1. Why did you pick the specific chart?






A pie chart is typically used to display the proportion of each category within a dataset; here it shows the share of bookings for each of the two hotel types.

##### 2. What is/are the insight(s) found from the chart?

City Hotel is the hotel type most preferred by guests: it receives the larger share of bookings.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights help create a positive business impact: they show that guests mostly prefer the City Hotel, which helps decide where to focus capacity and marketing.

#### Chart - 2

**2) What is the percentage of cancellations?**

In [None]:
# Chart - 2 visualization code
hotel_df['is_canceled'].value_counts().plot.pie(explode=[0.06, 0.06], autopct='%1.1f%%', shadow=True, figsize=(10,8),fontsize=20)
plt.title("Cancellation and non Cancellation")

##### 1. Why did you pick the specific chart?





The pie chart was used in the code example provided because it is a suitable visualization for displaying the distribution of canceled and non-canceled hotel bookings.

##### 2. What is/are the insight(s) found from the chart?

In the chart, 0 = not canceled and 1 = canceled; 27.5% of the bookings were canceled.
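The same percentage can be read off without a chart. Because `is_canceled` is coded 0/1, its mean is the cancellation rate, and `value_counts(normalize=True)` gives the share of each class. A minimal sketch on hypothetical toy data (not the real dataset):

```python
import pandas as pd

# Hypothetical toy data: 1 = canceled, 0 = not canceled
toy = pd.DataFrame({"is_canceled": [0, 1, 0, 0, 1, 0, 0, 0]})

# Share of each class in percent (normalize=True returns proportions)
shares = toy["is_canceled"].value_counts(normalize=True).mul(100).round(1)
print(shares)

# For a 0/1 column, the mean is the fraction of 1s, i.e. the cancellation rate
cancel_rate = toy["is_canceled"].mean() * 100
print(f"Cancellation rate: {cancel_rate:.1f}%")
```

On the real data, replacing `toy` with `hotel_df` would reproduce the pie chart's percentages.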

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the distribution of canceled and non-canceled hotel bookings can potentially create a positive business impact. Understanding the proportion of cancellations and their impact on revenue allows hotels to implement strategies to minimize cancellations and optimize revenue. Effective resource allocation based on booking status distribution ensures efficient operations. By focusing on reducing cancellations and enhancing the customer experience, hotels can foster positive guest experiences, encourage guest loyalty, and drive growth. However, a high cancellation rate and ineffective revenue management strategies may have negative implications, leading to revenue loss and decreased guest loyalty. By addressing these challenges, hotels can create a positive business impact and foster long-term growth.

#### Chart - 3

**3) What is the percentage of repeated guests?**

In [None]:
# Chart - 3 visualization code

hotel_df['is_repeated_guest'].value_counts().plot.pie(explode=(0.05,0.05),autopct='%1.1f%%',shadow=True,figsize=(12,8),fontsize=20)

plt.title("Percentage (%) of repeated guests")

##### 1. Why did you pick the specific chart?

The pie chart was chosen because it effectively visualizes the distribution of repeated and non-repeated guests in the hotel.

##### 2. What is/are the insight(s) found from the chart?



1.   Repeated guests are very few: only 3.2% of bookings.

2.   To retain guests, management should collect feedback from guests and try to improve the services.



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Gained insights on the percentage of repeated guests can help create a positive business impact by highlighting the effectiveness of customer retention strategies and fostering guest loyalty. However, if the proportion of repeated guests is significantly low, it may indicate a lack of guest loyalty and hinder business growth, requiring efforts to enhance retention and attract repeat customers.





#### Chart - 4

**4) What is the percentage of booking changes made by customers?**

In [None]:
# Chart - 4 visualization code
# Count bookings per number of changes; rename_axis + reset_index(name=...) gives the same column names in pandas 1.x and 2.x
booking_changes_df=hotel_df['booking_changes'].value_counts().rename_axis('number_booking_changes').reset_index(name='Counts')

plt.figure(figsize=(12,8))
sns.barplot(x=booking_changes_df['number_booking_changes'],y=booking_changes_df['Counts']*100/hotel_df.shape[0])
plt.title("% of Booking change")
plt.xlabel('Number of booking changes')
plt.ylabel('Percentage(%)')

##### 1. Why did you pick the specific chart?



1.   The bar chart was chosen because it effectively visualizes the percentage of bookings for each number of booking changes.
2.   The bar chart allows for a clear comparison of different categories of booking changes. Each bar represents a specific number of booking changes, making it easy to observe and compare the frequencies or percentages associated with each category.



##### 2. What is/are the insight(s) found from the chart?

Almost 85% of the bookings were not changed by guests.
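The percentages in this chart can also be computed with `value_counts(normalize=True)`, which avoids a `reset_index`/`rename` step whose column names changed in pandas 2.0. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: number of changes made to each booking
toy = pd.DataFrame({"booking_changes": [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]})

# Percentage of bookings for each number of changes, ordered 0, 1, 2, ...
pct_by_changes = (
    toy["booking_changes"].value_counts(normalize=True).sort_index().mul(100)
)
print(pct_by_changes)

# Share of bookings that were never changed
unchanged_pct = (toy["booking_changes"] == 0).mean() * 100
print(f"Unchanged bookings: {unchanged_pct:.1f}%")
```

The result could be plotted with `sns.barplot(x=pct_by_changes.index, y=pct_by_changes.values)`.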

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Gained insights on booking changes can create a positive business impact by improving customer satisfaction and operational efficiency. However, a high percentage of booking changes may lead to negative growth due to increased costs and potential revenue loss.

#### Chart - 5

**What is the percentage distribution of deposit types?**

In [None]:
# Chart - 5 visualization code
hotel_df['deposit_type'].value_counts().plot.pie(explode=(0.05,0.05,0.05),autopct='%1.1f%%',shadow=False,figsize=(10,8),fontsize=20,labels=None)
plt.title("% Distribution of deposit type")
labels=hotel_df['deposit_type'].value_counts().index.tolist()
plt.legend(bbox_to_anchor=(1, 1), loc='upper left', labels=labels)


##### 1. Why did you pick the specific chart?

The pie chart was chosen in the provided code example to effectively visualize the distribution of deposit types in the hotel booking dataset. It allows for a clear representation of the proportion or percentage of each deposit type, making it easy to understand the relative prevalence of different categories at a glance. The exploded slices, along with the percentage labels, enhance the visual impact and aid in conveying the distribution information in a concise and visually appealing manner.

##### 2. What is/are the insight(s) found from the chart?

From the chart, the insights are:

1.   The vast majority of bookings do not require a deposit, indicating a flexible booking policy that may attract a wide range of guests.
2.   Non-refundable deposits account for only a small share of bookings, and refundable deposits are rare. The pie chart alone does not show whether deposits make guests less likely to cancel or modify their bookings.
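The pie chart shows how common each deposit type is, but not how each one relates to cancellations. One way to check whether a deposit type goes with fewer cancellations is to compare the cancellation rate per deposit type, as in this minimal sketch on hypothetical toy data (on the real data, replace `toy` with `hotel_df`):

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "deposit_type": ["No Deposit", "No Deposit", "Non Refund", "No Deposit",
                     "Non Refund", "Refundable", "No Deposit", "No Deposit"],
    "is_canceled": [0, 1, 1, 0, 1, 0, 0, 0],
})

# Number of bookings and cancellation rate per deposit type
summary = toy.groupby("deposit_type")["is_canceled"].agg(
    bookings="size", cancel_rate="mean"
)
summary["cancel_rate"] = (summary["cancel_rate"] * 100).round(1)  # as a percentage
print(summary)
```

Reporting the booking count next to the rate matters here, because a rate computed from very few bookings (such as a rare deposit type) is unreliable.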



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights on deposit types can help create a positive business impact by optimizing deposit policies, enhancing guest satisfaction, and improving revenue management. However, requiring non-refundable deposits more widely could deter potential guests who prefer flexible cancellation options, resulting in fewer bookings and lower customer satisfaction.

#### Chart - 6

**In which month most of the bookings happened?**


In [None]:
# Chart - 6 visualization code

# groupby arrival_date_month and taking the hotel count
bookings_by_months_df=hotel_df.groupby(['arrival_date_month'])['hotel'].count().reset_index().rename(columns={'hotel':"Counts"})
# Create list of months in order
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Convert to an ordered categorical so the months sort in calendar order (values are unchanged)
bookings_by_months_df['arrival_date_month']=pd.Categorical(bookings_by_months_df['arrival_date_month'],categories=months,ordered=True)
# sorting by arrival_date_month
bookings_by_months_df=bookings_by_months_df.sort_values('arrival_date_month')

bookings_by_months_df

In [None]:
# set plot size
plt.figure(figsize=(20,8))

#plotting lineplot: x = months, y = booking counts
sns.lineplot(x=bookings_by_months_df['arrival_date_month'],y=bookings_by_months_df['Counts'])

# set title for the plot
plt.title('Number of bookings across each month')
#set x label
plt.xlabel('Month')
#set y label
plt.ylabel('Number of bookings')

##### 1. Why did you pick the specific chart?

The line chart was chosen in the provided code example to effectively visualize the trend and variation in the number of bookings across each month.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how the number of bookings varies across the year, highlighting peak and off-peak months. This insight can help hotels plan their operations and marketing strategies to optimize occupancy during peak seasons and attract guests during slower periods.
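To answer this section's question directly, the peak month can be read from the monthly counts. A minimal sketch on hypothetical toy data, using the standard-library `calendar` module for month order instead of a hand-written list (`calendar.month_name` follows the current locale, which gives English names by default):

```python
import calendar

import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame(
    {"arrival_date_month": ["July", "August", "August", "January", "July", "August"]}
)

# 'January' ... 'December' in calendar order
month_order = list(calendar.month_name[1:])

# Count bookings per month, in calendar order, with 0 for months that have no bookings
monthly = toy["arrival_date_month"].value_counts().reindex(month_order, fill_value=0)

peak_month = monthly.idxmax()
print(monthly)
print("Month with the most bookings:", peak_month)
```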

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights on seasonal variations and booking trends can help create a positive business impact by enabling hotels to optimize occupancy, plan resources effectively, and tailor marketing strategies. However, if the line chart shows consistently declining booking trends, it may lead to negative growth as it indicates a potential decrease in demand, requiring proactive measures to attract bookings and mitigate revenue loss.

#### Chart - 7

**Which Distribution channel is mostly used for hotel bookings?**

In [None]:
# Chart - 7 visualization code

# Visualizing the share of each distribution channel with a pie chart.


#Creating labels
labels=hotel_df['distribution_channel'].value_counts().index.tolist()

# creating new df of distribution channel
distribution_channel_df=hotel_df['distribution_channel'].value_counts().rename_axis('distribution_channel').reset_index(name='count')

#adding percentage columns to the distribution_channel_df
distribution_channel_df['percentage']=round(distribution_channel_df['count']*100/hotel_df.shape[0],1)

#Creating list of percentage
sizes=distribution_channel_df['percentage'].values.tolist()

#plotting the pie chart
hotel_df['distribution_channel'].value_counts().plot.pie(explode=[0.05, 0.05,0.05,0.05,0.05], shadow=False, figsize=(15,8),fontsize=10,labels=None)

# setting legends with the percentage values
labels = [f'{l}, {s}%' for l, s in zip(labels, sizes)]
plt.legend(bbox_to_anchor=(0.85, 1), loc='upper left', labels=labels)
plt.title(' Mostly Used Distribution Channel for Hotel Bookings ')


##### 1. Why did you pick the specific chart?


1.   The pie chart was chosen to effectively visualize the share of each distribution channel in hotel bookings.

2.   A pie chart is well-suited for representing proportions or percentages. Here it shows the percentage of bookings per distribution channel, giving a clear visual understanding of the relative prevalence of each channel.



##### 2. What is/are the insight(s) found from the chart?

'TA/TO' is by far the most used channel for booking hotels (about 82% of bookings).
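The legend labels above are built from a `reset_index()`/`rename()` frame whose column names differ between pandas 1.x and 2.x. A minimal sketch that builds the same "channel, percent" labels straight from `value_counts(normalize=True)`, on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
channels = pd.Series(
    ["TA/TO", "TA/TO", "Direct", "TA/TO", "Corporate", "TA/TO", "Direct", "TA/TO"],
    name="distribution_channel",
)

# Share of each channel in percent, largest first
pct = channels.value_counts(normalize=True).mul(100).round(1)

# Labels in the same 'channel, share%' format used for the pie chart legend
legend_labels = [f"{channel}, {share}%" for channel, share in pct.items()]
print(legend_labels)
```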

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by enabling hotels to optimize their distribution strategies, allocate resources effectively, and target their marketing efforts. However, if there is a heavy reliance on a single distribution channel, it can lead to negative growth as it increases vulnerability to changes or disruptions in that channel, potentially resulting in revenue loss and limited market reach. Diversifying distribution channels can help mitigate this risk and ensure a more stable and sustainable business growth.

#### Chart - 8

**Which year had the highest bookings?**

In [None]:
# Chart - 8 visualization code
# set plot size
plt.figure(figsize=(12,8))

#  plot with countplot
sns.countplot(x=hotel_df['arrival_date_year'],hue=hotel_df['hotel'])
plt.title("Year Wise bookings")

##### 1. Why did you pick the specific chart?

The specific chart chosen in the provided code example is a countplot, which is essentially a bar plot that displays the count of observations in each category. This chart was selected because it effectively visualizes the number of bookings for each year and allows for easy comparison between different years. By using different hues for the hotel types, it further provides insights into the distribution of bookings across hotels for each year.

##### 2. What is/are the insight(s) found from the chart?




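1.   2016 had the highest number of bookings.
2.   2015 had the fewest bookings. Part of the gap is because the data covers only July to December 2015 (and January to August 2017), so these years are incomplete.

Overall, City Hotel received most of the bookings.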
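Raw yearly totals are not directly comparable, because the dataset covers arrivals from July 2015 to August 2017, so 2015 and 2017 are partial years. A minimal sketch on hypothetical toy data that cross-tabulates bookings by year and hotel, then divides each year's total by the number of months observed in that year:

```python
import pandas as pd

# Hypothetical toy data: (arrival_date_year, arrival_date_month, hotel)
rows = [
    (2015, "July", "City Hotel"),
    (2015, "July", "Resort Hotel"),
    (2015, "December", "City Hotel"),
    (2016, "January", "City Hotel"),
    (2016, "June", "City Hotel"),
    (2016, "December", "Resort Hotel"),
    (2017, "January", "City Hotel"),
    (2017, "August", "Resort Hotel"),
]
toy = pd.DataFrame(rows, columns=["arrival_date_year", "arrival_date_month", "hotel"])

# Bookings per year and hotel type (same numbers as the countplot)
bookings_per_year = pd.crosstab(toy["arrival_date_year"], toy["hotel"])

# Number of distinct months with arrivals in each year
months_covered = toy.groupby("arrival_date_year")["arrival_date_month"].nunique()

# Average bookings per observed month, comparable across partial years
avg_per_month = bookings_per_year.sum(axis=1) / months_covered
print(bookings_per_year)
print(avg_per_month)
```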


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the year-wise bookings countplot can help create a positive business impact. By understanding the increasing bookings trend, hotels can anticipate and plan for growing demand, adjust pricing strategies, and allocate resources effectively. However, if there is a significant variation in bookings between different hotel types, it could lead to negative growth for less popular or declining hotel segments, necessitating strategic adjustments to improve their market position or focus on more profitable segments.

# Bivariate and Multivariate Analysis

In [None]:
# group by hotel
grup_by_hotel=hotel_df.groupby('hotel')

#### Chart - 9

**Which Hotel type has the highest ADR?**

In [None]:
# Chart - 9 visualization code
# mean ADR for each hotel type
highest_adr=grup_by_hotel['adr'].mean().reset_index()

#set plot size
plt.figure(figsize=(10,8))

# plot the graph first: older seaborn versions overwrite axis labels that were set before plotting
sns.barplot(x=highest_adr['hotel'],y=highest_adr['adr'])

# set labels
plt.xlabel('Hotel type')
plt.ylabel('ADR')
plt.title("Avg ADR of each Hotel type")

##### 1. Why did you pick the specific chart?




The specific chart chosen in the provided code example is a bar plot, which effectively displays and compares the average ADR (Average Daily Rate) of each hotel type. This chart was selected because it allows for a clear visualization of the ADR values for each hotel type, making it easy to identify any variations and compare the rates across different hotels.

##### 2. What is/are the insight(s) found from the chart?

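City Hotel has the higher average ADR, meaning that a night there is priced higher on average than at the Resort Hotel. A higher ADR alone does not imply higher revenue: total revenue also depends on the number of nights sold and on cancellations.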
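Since ADR is a rate per night, a rough way to compare revenue between hotel types is ADR multiplied by the number of nights, summed over bookings that were not canceled. A minimal sketch on hypothetical toy data; the `revenue_proxy` column is an assumption introduced here and ignores extras and taxes:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": [0, 1, 0, 0],
    "adr": [120.0, 150.0, 90.0, 80.0],
    "stays_in_weekend_nights": [1, 0, 2, 2],
    "stays_in_week_nights": [2, 3, 5, 3],
})

# Canceled bookings bring no room revenue, so keep only stays that happened
stayed = toy[toy["is_canceled"] == 0].copy()
stayed["total_nights"] = stayed["stays_in_weekend_nights"] + stayed["stays_in_week_nights"]

# Rough revenue proxy: ADR x nights (assumption: ignores extras, taxes, and rate changes)
stayed["revenue_proxy"] = stayed["adr"] * stayed["total_nights"]
revenue_by_hotel = stayed.groupby("hotel")["revenue_proxy"].sum()
print(revenue_by_hotel)
```

In this toy example the Resort Hotel earns more despite its lower ADR, because its guests stay longer and none of its bookings were canceled.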

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the average ADR visualization can help create a positive business impact. By understanding the ADR variations across different hotel types, businesses can strategically set pricing strategies, identify opportunities for revenue optimization, and target specific customer segments effectively. However, if there are significant differences in ADR between hotel types and certain types consistently have lower rates, it could lead to negative growth as it may indicate a less profitable market segment or potential loss of revenue opportunities. In such cases, businesses may need to reevaluate their pricing or marketing strategies to address the issue.

#### Chart - 10

**Which hotel type has the longer lead time?**

In [None]:
# Chart - 10 visualization code
#group by hotel and taking mean of lead time
avg_lead_time=grup_by_hotel['lead_time'].mean().reset_index()

#set plot size
plt.figure(figsize=(10,8))

# plot the bar plot
sns.barplot(x=avg_lead_time['hotel'],y=avg_lead_time['lead_time'])
# set labels
plt.xlabel('Hotel type')
plt.ylabel('Average Lead time')
plt.title("Average Lead Time for each Hotel type")

##### 1. Why did you pick the specific chart?

The specific chart chosen in the provided code example is a bar plot, which effectively displays and compares the average lead time for each hotel type. This chart was selected because it allows for a clear visualization of the lead time values for each hotel type, making it easy to identify any variations and compare the average lead times across different hotels.

##### 2. What is/are the insight(s) found from the chart?

Resort Hotel has a slightly higher average lead time, which suggests that resort guests tend to plan their trips further in advance.
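Lead time tends to be right-skewed (a few bookings are made many months ahead), so the mean can be pulled up by a handful of long lead times. Comparing mean and median per hotel type is a quick check; a minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: one City Hotel booking made far in advance
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "lead_time": [5, 10, 15, 400, 30, 40, 50, 60],
})

# Mean and median lead time per hotel type
lead_stats = toy.groupby("hotel")["lead_time"].agg(["mean", "median"])
print(lead_stats)
```

Here the single 400-day booking makes the City Hotel's mean far exceed the Resort Hotel's, even though its median is lower.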

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the average lead time visualization can help create a positive business impact. By understanding the average lead time variations across different hotel types, businesses can optimize their operations, streamline processes, and enhance customer experience, resulting in improved efficiency and customer satisfaction.

#### Chart - 11

**Which distribution channel contributes the most to ADR, and thus to income?**

In [None]:
# Chart - 11 visualization code

# group by distribution channel and hotel
distribution_channel_df=hotel_df.groupby(['distribution_channel','hotel'])['adr'].mean().reset_index()

# set plot size and plot barchart
plt.figure(figsize=(16,8))
sns.barplot(x='distribution_channel', y='adr', data=distribution_channel_df, hue='hotel')
plt.title('ADR across Distribution channel')

##### 1. Why did you pick the specific chart?

The specific chart chosen in the provided code example is a grouped bar chart, which effectively compares the average ADR across different distribution channels while differentiating the bars by hotel types, allowing for clear visual comparison and analysis of the ADR trends within each distribution channel.

##### 2. What is/are the insight(s) found from the chart?


*   Corporate: bookings made through corporate booking companies.

*   GDS: a Global Distribution System is a worldwide conduit between travel bookers and suppliers, such as hotels and other accommodation providers. It communicates live product, price, and availability data to travel agents and online booking engines, and allows for automated transactions.

*   Direct: bookings made directly with the respective hotels.
*   TA/TO: bookings made through travel agents or tour operators.
*   Undefined: the booking channel was not recorded; these may, for example, be bookings made on arrival.


**Observation**


1.   From the plot it is clear that 'Direct' and 'TA/TO' contribute almost equally to ADR in both hotel types, 'City Hotel' and 'Resort Hotel'.
2.   GDS bookings contribute a high ADR for 'City Hotel'.
3.   There is an opportunity to increase Resort Hotel bookings through GDS.











##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the ADR visualization across distribution channels can help create a positive business impact. By understanding the ADR variations across different distribution channels, businesses can identify the most profitable channels, optimize pricing strategies, and allocate resources effectively.

#### Chart - 12

**Which distribution channel has the highest cancellation rate?**

In [None]:
# Chart - 12 visualization code

canceled_df=hotel_df[hotel_df['is_canceled']==1] # 1= canceled

# count canceled bookings per distribution channel and hotel type
canceled_df=canceled_df.groupby(['distribution_channel','hotel']).size().reset_index(name='Counts')

#set plot size and plot barchart
plt.figure(figsize=(12,8))
sns.barplot(x='distribution_channel',y='Counts',hue="hotel",data=canceled_df)

# set labels (the bars show counts of canceled bookings, not rates)
plt.xlabel('Distribution channel')
plt.ylabel('Number of canceled bookings')
plt.title('Canceled Bookings Vs Distribution channel')

##### 1. Why did you pick the specific chart?

The specific chart chosen is a grouped bar chart, which compares the number of canceled bookings across distribution channels, with bars split by hotel type, allowing a clear visual comparison within each distribution channel and across hotel types.

##### 2. What is/are the insight(s) found from the chart?



1.   In "TA/TO", City Hotel has far more cancellations than Resort Hotel.
2.   In "Direct", both hotels have almost the same number of cancellations.
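The bars show the number of canceled bookings, which grows with the number of bookings a channel receives. The question asks for a cancellation rate, which can be computed as the mean of `is_canceled` within each channel and hotel type. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: (distribution_channel, hotel, is_canceled)
rows = [
    ("TA/TO", "City Hotel", 1),
    ("TA/TO", "City Hotel", 1),
    ("TA/TO", "City Hotel", 0),
    ("TA/TO", "City Hotel", 0),
    ("Direct", "City Hotel", 0),
    ("Direct", "Resort Hotel", 1),
    ("Direct", "Resort Hotel", 0),
    ("TA/TO", "Resort Hotel", 0),
]
toy = pd.DataFrame(rows, columns=["distribution_channel", "hotel", "is_canceled"])

# Cancellation rate (%) per channel and hotel type; the mean of a 0/1 column is the rate
cancel_rate = (
    toy.groupby(["distribution_channel", "hotel"])["is_canceled"]
    .mean()
    .mul(100)
    .unstack("hotel")
)
print(cancel_rate)
```

`cancel_rate.plot.bar()` would draw the result as a grouped bar chart.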



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the cancellation rate visualization across distribution channels can help create a positive business impact by identifying channels with higher cancellation rates, allowing businesses to optimize their booking policies and strategies to minimize cancellations and increase revenue; however, if certain distribution channels consistently exhibit high cancellation rates, it could lead to negative growth as it may indicate inefficiencies or challenges in those channels that need to be addressed to reduce revenue loss and improve overall performance.

#### Chart - 13

**Which market segment has the highest cancellation rate?**

In [None]:
# keep only canceled bookings (is_canceled = 1)
market_segment_df=hotel_df[hotel_df['is_canceled']==1]
# count canceled bookings per market segment and hotel type
market_segment_df=market_segment_df.groupby(['market_segment','hotel']).size().reset_index(name='counts')

market_segment_df

In [None]:
#set plot size and plot barchart
plt.figure(figsize=(20,8))
sns.barplot(x='market_segment',y='counts',hue="hotel",data= market_segment_df)

# set labels (the bars show counts of canceled bookings, not rates)
plt.xlabel('Market segment')
plt.ylabel('Number of canceled bookings')
plt.title('Canceled Bookings Vs Market segment')

##### 1. Why did you pick the specific chart?

The specific chart chosen is a grouped bar chart, which compares the number of canceled bookings across market segments, with bars split by hotel type, allowing a clear visual comparison within each market segment and across hotel types.

##### 2. What is/are the insight(s) found from the chart?

**'Online TA' has the highest number of cancellations for both hotel types.**

To reduce booking cancellations, hotels need to review their refund and deposit policies.
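Raw cancellation counts are dominated by the largest segments. `pd.crosstab` with `normalize='index'` turns each row into proportions, giving the share of canceled and not-canceled bookings within each market segment. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Online TA", "Groups", "Groups", "Direct"],
    "is_canceled": [1, 1, 0, 1, 0, 0],
})

# Each row sums to 100: share of not-canceled (0) and canceled (1) bookings per segment
segment_share = (
    pd.crosstab(toy["market_segment"], toy["is_canceled"], normalize="index")
    .mul(100)
    .round(1)
)
print(segment_share)
```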

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

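The gained insights from the cancellation visualization across market segments can help create a positive business impact by showing which segments generate the most cancellations, so that deposit and cancellation policies can be targeted at those segments.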

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

plt.figure(figsize=(18,10))
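# numeric_only=True skips text columns such as 'hotel' (pandas 2.x raises an error otherwise)
sns.heatmap(hotel_df.corr(numeric_only=True),annot=True,fmt='.2f')
plt.title('Correlation of the columns')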


##### 1. Why did you pick the specific chart?



*   The specific chart chosen in the provided code example is a heatmap, which effectively visualizes the correlation between columns in the dataset.
*   The heatmap provides a color-coded representation of the correlation values, making it easy to identify strong, weak, or negative correlations between columns.



##### 2. What is/are the insight(s) found from the chart?

From the heatmap of column correlations, the observations are:

1.   There is a strong positive correlation between 'stays_in_week_nights' and 'stays_in_weekend_nights', indicating that guests who stay more weekday nights also tend to stay more weekend nights.
2.   'is_canceled' and 'lead_time' show a moderate positive correlation, suggesting that bookings made further in advance are more likely to be canceled.
3.   'is_repeated_guest' and 'previous_bookings_not_canceled' have a strong positive correlation, which suggests that repeat guests are less likely to cancel their bookings.
4.   'adults', 'children' and 'babies' are correlated with one another, and the guest counts also correlate positively with 'adr', suggesting that bookings with more guests tend to have a higher average daily rate.
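The strength labels above ("strong", "moderate") are read off the heatmap by eye. As a hedged sketch, the helper below ranks every pair of numeric columns by absolute Pearson correlation so those labels can be checked against the actual coefficients. The function name `top_correlations` and the toy data are hypothetical, and `numeric_only=True` requires pandas 1.5 or later.

```python
from itertools import combinations

import pandas as pd

def top_correlations(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)  # text columns such as 'hotel' are skipped
    # Visit each unordered pair of columns once, skipping the diagonal
    pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})
    # Rank by strength but keep the sign, so negative correlations stay visible
    return pairs.sort_values(ascending=False, key=lambda s: s.abs()).head(n)

# Toy data for illustration only
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'e': [1, 3, 2, 5, 4],
    'label': list('vwxyz'),
})
print(top_correlations(toy, n=2))
```

On the notebook data this would be called as `top_correlations(hotel_df, n=10)`; keeping the sign while sorting by magnitude means strong negative pairs rank alongside strong positive ones.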





#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
import seaborn as sns

# Select the columns of interest for the pairplot
columns_of_interest = ['lead_time', 'arrival_date_year', 'stays_in_weekend_nights', 'stays_in_week_nights', 'adults', 'children', 'babies']

# Create the pairplot
sns.pairplot(hotel_df[columns_of_interest])




##### 1. Why did you pick the specific chart?

The specific chart chosen in the provided example is a pairplot, which allows for the visualization of pairwise relationships between variables in the dataset. This chart was selected because it provides a comprehensive overview of the relationships and distributions between multiple variables, making it easy to identify patterns, correlations, and outliers in the data.

##### 2. What is/are the insight(s) found from the chart?

Insights that the pairplot of the hotel booking dataset can reveal include:

**Relationship between Variables:** The pairplot visualizes the relationships between pairs of variables. For example, it can reveal whether the relationship between lead time and the number of weekend or weekday nights stayed is linear or non-linear.

**Distribution of Variables:** The diagonal of the pairplot shows the distribution of each variable. This can provide insights into the spread, skewness, and outliers present in the dataset. For instance, it can indicate if there are any unusual patterns or concentrations in variables like the number of adults, children, or babies.

By examining the scatter plots and distributions in the pairplot, additional insights specific to the hotel booking dataset can be obtained, depending on the variables selected for visualization.
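One way to back up the "linear or non-linear" reading of a scatter panel with a number is to compare Pearson correlation, which measures linear association, with Spearman rank correlation, which measures any monotonic association. A Spearman value noticeably larger in magnitude than the Pearson value hints at a monotonic but non-linear relationship, or at outliers pulling Pearson down. The helper `linearity_gap` and the toy data are hypothetical; the sketch relies only on `DataFrame.corr` with its `method` parameter.

```python
import pandas as pd

def linearity_gap(df: pd.DataFrame, x: str, y: str) -> dict:
    """Compare Pearson (linear) and Spearman (rank-based) correlation of two columns."""
    pair = df[[x, y]].dropna()  # both coefficients use the same complete rows
    pearson = pair.corr(method='pearson').iloc[0, 1]
    spearman = pair.corr(method='spearman').iloc[0, 1]
    # A large positive gap suggests a monotonic but non-linear relationship (or outliers)
    return {'pearson': pearson, 'spearman': spearman, 'gap': abs(spearman) - abs(pearson)}

# Toy data: y always grows with x, but as a cube rather than a straight line
toy = pd.DataFrame({'x': range(1, 11)})
toy['y'] = toy['x'] ** 3
print(linearity_gap(toy, 'x', 'y'))
```

For example, `linearity_gap(hotel_df, 'lead_time', 'stays_in_week_nights')` would put a number on one of the panels in the pairplot.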

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

To achieve the business objective of the hotel booking dataset project, I would suggest the following:

1.  **Optimize booking strategies:** Use the insights from the EDA to identify peak booking periods, popular room types, and customer preferences in order to maximize occupancy and revenue. This can involve adjusting pricing, offering attractive packages or promotions, and targeting specific customer segments more effectively.

2.  **Improve customer experience:** A better guest experience can lead to positive reviews, customer loyalty, and more word-of-mouth referrals.

3.  **Monitor and adapt continuously:** Regularly analyze the data, adapt strategies based on market trends and customer feedback, and stay updated with industry best practices to remain competitive.

By implementing these suggestions based on the gained insights, the client can work towards achieving the business objective of maximizing revenue, improving customer satisfaction, and maintaining a competitive edge in the hotel booking industry.







# **Conclusion**


1.   City hotels are the hotel type most preferred by guests, which makes the City hotel the busiest.
2.   Most customers do not require car parking spaces.
3.   BB (Bed & Breakfast) is the meal type most preferred by guests.
4.   The largest number of guests came from Portugal.
5.   Most bookings for both City and Resort hotels occurred in 2016.
6.   The average ADR is higher for City hotels than for Resort hotels; combined with their larger booking volume, this means City hotels generate more revenue than Resort hotels.
7.   The booking cancellation rate is high for City hotels, at almost 30%.
8.   The average lead time is high for Resort hotels.
9.   The waiting time is longer for City hotels than for Resort hotels, which indicates that City hotels are much busier.
10.  Resort hotels have the most repeat guests.
11.  At both hotel types, the optimal stay is under 7 days; most guests stay for up to a week.










