# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**           -  Rishabh Kumar

# **Project Summary -**

Hotel Booking EDA is an exploratory data analysis project that analyzes hotel booking data. The dataset used in the project contains information on bookings made by guests such as booking dates, lead times, number of adults/children, room types, and cancellation status. The main objective of this EDA is to gain insights into the booking patterns and trends of guests and to identify factors that contribute to booking cancellations. The analysis includes data cleaning, data visualization, and statistical analysis using Python libraries such as pandas, numpy, matplotlib, and seaborn. The findings of this EDA can be useful for hotel managers and marketers to optimize their pricing strategies, room allocation, and cancellation policies.

# **GitHub Link -**

https://github.com/rishabh3000/EDA_Hotel_Booking_Analysis/blob/main/EDA_Project.ipynb

# **Problem Statement**


The hotel industry is a highly competitive market, and hotels must continually update their strategies to remain profitable and attract customers. The aim of this project is to perform an exploratory data analysis (EDA) of hotel booking data to gain insights into guests' booking patterns and trends, to identify factors that contribute to booking cancellations, and to help hotels improve so that they can attract more customers.

#### **Define Your Business Objective?**

The business objective is to help hotel management and marketers understand customer booking trends and patterns and determine the causes of cancellations. With these insights, hotels may increase their revenue and customer satisfaction by optimising their pricing strategies, room allocation, and cancellation policies. The project covers data cleaning, data visualisation, and statistical analysis using well-known Python tools: pandas, numpy, matplotlib, and seaborn.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df=pd.read_csv("Hotel Bookings.csv")
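
The guidelines above ask for exception handling and a notebook that runs without errors. As a minimal sketch of what that could look like for the loading step, the hypothetical helper `load_bookings` below (not part of the original notebook) fails with a clear message when the file is missing or empty. The file name is the one used in this notebook; everything else is an assumption for illustration.

```python
from pathlib import Path

import pandas as pd


def load_bookings(path):
    """Hypothetical helper: read the bookings CSV or fail with a clear message."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path.resolve()}")
    frame = pd.read_csv(csv_path)
    if frame.empty:
        raise ValueError(f"Dataset at {csv_path} contains no rows")
    return frame


try:
    bookings = load_bookings("Hotel Bookings.csv")
except FileNotFoundError as err:
    # Report the problem instead of letting later cells fail with a confusing error
    bookings = None
    print(err)
```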

### Dataset First View

In [None]:
# Dataset First Look
df.head(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info

df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().value_counts()

In [None]:
# Visualizing duplicates through a count plot
plt.figure(figsize=(10,8))
sns.countplot(x=df.duplicated());

In [None]:
# Now let's drop the duplicate rows

df=df.drop_duplicates()

In [None]:
df.shape

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values through a heatmap
plt.figure(figsize=(25, 10))
sns.heatmap(df.isnull(), cbar=False, yticklabels=False,cmap='viridis')
plt.xlabel("Name Of Columns")
plt.title("Location of missing values by column")

### What did you know about your dataset?

The dataset contains two hotel types: Resort Hotel and City Hotel.

The booking period ranges from July 2015 to August 2017.

There are two types of bookings: canceled and not canceled.

There are several numerical and categorical variables that describe each booking, including lead time, arrival date, number of weekend and week nights, number of adults and children, room type, meal type, and market segment.

The hotel itself is described only by its type (the hotel column); the columns described in the next section do not include a hotel location or a number of rooms.

The dataset includes missing/null values in some columns, including the country, agent, and company columns.

The dataset can be used for various purposes, such as exploring booking patterns, identifying seasonal trends, and predicting cancellation rates.
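
To put a number on the missing values mentioned above, one option is to compute the share of nulls per column. The sketch below uses a small made-up frame (`toy`, hypothetical values) rather than the real dataset, so the percentages it prints are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the bookings data (values are made up)
toy = pd.DataFrame({
    "country": ["PRT", None, "GBR", "FRA"],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "adults": [2, 2, 1, 2],
})

# Share of missing values per column, in percent, largest first
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)

# Keep only the columns that actually have missing values
missing_pct = missing_pct[missing_pct > 0]
print(missing_pct)
```

On the real data, the same two lines could be applied to `df` right after loading.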

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

# total columns in dataset
df.columns 

In [None]:
# Dataset Describe
df.describe()

### Variables Description 

1. hotel: The type of hotel (Resort Hotel or City Hotel).

2. is_canceled: Whether the booking was canceled (1 if canceled, 0 if not canceled).

3. lead_time: The number of days between the booking date and the arrival date.

4. arrival_date_year: The year of the arrival date.

5. arrival_date_month: The month of the arrival date.

6. arrival_date_week_number: The week number of the arrival date.

7. arrival_date_day_of_month: The day of the month of the arrival date.

8. stays_in_weekend_nights: The number of weekend nights (Saturday and Sunday) the customer stayed at the hotel.

9. stays_in_week_nights: The number of week nights (Monday to Friday) the customer stayed at the hotel.

10. adults: The number of adults

11. children: The number of children

12. babies: The number of babies

13. meal: The type of meal booked

14. country: The country of origin of the guest.

15. market_segment: The segment the booking was made for (e.g. Online Travel Agents, Corporate, Direct).

16. distribution_channel: The channel through which the booking was made (e.g. Online Travel Agents, Direct).

17. is_repeated_guest: Whether the booking was made by a repeated guest (1 = repeated guest, 0 = not a repeated guest).

18. previous_cancellations: The number of previous bookings that were cancelled by the guest

19. previous_bookings_not_canceled: The number of previous bookings that were not cancelled by the guest.

20. reserved_room_type: The type of room that was reserved by the guest.

21. assigned_room_type: The type of room that was actually assigned to the guest.

22. booking_changes: The changes made to the booking

23. deposit_type: The type of deposit.

24. agent: The travel agency that made the booking.

25. days_in_waiting_list: The number of days the booking was on the waiting list

27. customer_type: The type of customer making the booking (e.g. Transient, Transient-Party, Contract, Group).

29. required_car_parking_spaces: The number of car parking spaces required by the customer.

30. total_of_special_requests: The total number of special requests made by the customer (e.g. twin bed, high floor).

31. reservation_status: The status of the booking (e.g. Canceled, Check-Out, No-Show).

32. reservation_status_date: The date when the reservation status was last updated.

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
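# Fill null values with zero in the columns that contain them
null_columns=['agent','children','company']
for col in null_columns:
  # assign the result back; inplace=True on a selected column may not update df in newer pandas
  df[col]=df[col].fillna(0)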


In [None]:
# Write your code to make your dataset analysis ready.
# Replace missing country values with 'others'
df['country']=df['country'].fillna('others')


In [None]:
# checking the null values again
df.isna().sum()

In [None]:

# Remove the rows where adults + children + babies == 0, i.e. bookings with no guests (166 rows)
no_guest_rows=df[df['adults']+df['children']+df['babies']==0].index
length=len(no_guest_rows)
df.drop(no_guest_rows,inplace=True)

In [None]:
# Adding total staying days in hotels
df['total_stay'] = df['stays_in_weekend_nights']+df['stays_in_week_nights']

# Adding total people num as column, i.e. total people num = num of adults + children + babies
df['total_people'] = df['adults']+df['children']+df['babies']

In [None]:
df[['children', 'company','agent']]=df[['children', 'company','agent']].astype('int64')


# Convert the 'reservation_status_date' column to datetime.
df['reservation_status_date']=pd.to_datetime(df['reservation_status_date'],format = '%Y-%m-%d')



### What all manipulations have you done and insights you found?

The data manipulations for this hotel booking dataset include removing duplicate rows, handling missing values, and adding derived columns.

The following data manipulations were done:

1. Filled null values in 'agent', 'children' and 'company' with 0.

2. Replaced missing values in 'country' with 'others'.

3. Removed 166 rows from the dataset where the sum of the number of adults, children, and babies is zero.

4. Converted the datatype of columns 'children', 'company' and 'agent' from float to int.

5. Changed the datatype of column 'reservation_status_date' to datetime format.

6. Added the total staying days (weekend nights plus week nights) as a column.

7. Added the total number of people (adults, children and babies) as a column.
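
The steps listed above can also be sketched as a single function, which makes them easy to rerun and check. The helper name `wrangle` and the toy input are hypothetical; the column names are the ones used in this notebook. This is one possible arrangement under those assumptions, not the notebook's actual cell order.

```python
import numpy as np
import pandas as pd


def wrangle(raw):
    """Hypothetical helper that applies the listed steps to a copy of `raw`."""
    out = raw.drop_duplicates().copy()
    # Fill missing agent/children/company with 0 and store them as integers
    for col in ["agent", "children", "company"]:
        out[col] = out[col].fillna(0).astype("int64")
    # Replace missing countries with 'others'
    out["country"] = out["country"].fillna("others")
    # Drop bookings that have no guests at all
    guests = out["adults"] + out["children"] + out["babies"]
    out = out[guests != 0].copy()
    # Parse the status date as datetime
    out["reservation_status_date"] = pd.to_datetime(
        out["reservation_status_date"], format="%Y-%m-%d"
    )
    # Derived columns: total nights and total guests
    out["total_stay"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    out["total_people"] = out["adults"] + out["children"] + out["babies"]
    return out


# Made-up rows: one exact duplicate, one booking with zero guests
toy = pd.DataFrame({
    "adults": [2, 2, 0, 1],
    "children": [0.0, 0.0, 0.0, np.nan],
    "babies": [0, 0, 0, 0],
    "agent": [9.0, 9.0, np.nan, np.nan],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "country": ["PRT", "PRT", "GBR", None],
    "stays_in_weekend_nights": [1, 1, 0, 2],
    "stays_in_week_nights": [3, 3, 1, 5],
    "reservation_status_date": ["2016-07-01", "2016-07-01", "2016-08-15", "2017-01-03"],
})
clean = wrangle(toy)
print(clean[["country", "children", "total_stay", "total_people"]])
```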

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - Types of Hotels in the Dataset

In [None]:
# Let's check how many bookings belong to each hotel category
Hotel_typ =df['hotel'].value_counts()
Hotel_typ

In [None]:
# Chart -  visualization code
plt.figure(figsize=(12,12))
# Hotel_typ is a Series of counts, so the pie needs no x/y arguments
Hotel_typ.plot.pie(autopct='%1.0f%%',textprops={'weight': 'bold'},explode =[0.05]*2)
plt.ylabel('')
plt.title('Hotel type',fontweight="bold", size=20);


##### 1. Why did you pick the specific chart?

Because a pie chart makes it easier to see the percentage of each hotel type in our dataset, I used one to study the different hotel types in our dataset. It is simple to comprehend and offers immediate insights. Viewers can quickly determine the size of each type of hotel by glancing at the pie chart, which also helps them better grasp the data.

##### 2. What is/are the insight(s) found from the chart?

The fact that more guests (61%) in the dataset picked city hotels and fewer (39%) preferred resort hotels indicates that city hotels are in higher demand. Making strategic decisions on how to market, price, allocate resources, and invest in hotels can be done using this information. Businesses might, for instance, increase their marketing efforts for their city hotels, alter their rates, and put more of an emphasis on enhancing their facilities and services. Businesses who do this can better meet the tastes of their customers and boost sales.
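
The percentages quoted above can be read directly from the counts instead of from the pie labels. A minimal sketch on a made-up series (`toy_hotels`, hypothetical values, so the shares differ from the real 61% and 39%):

```python
import pandas as pd

# Made-up hotel column: three city bookings and two resort bookings
toy_hotels = pd.Series(["City Hotel"] * 3 + ["Resort Hotel"] * 2, name="hotel")

# normalize=True returns fractions instead of counts; scale to percent
share_pct = toy_hotels.value_counts(normalize=True).mul(100).round(1)
print(share_pct)
```

On the real data the same call would be applied to `df['hotel']`.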

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis can help create a positive impact on businesses. By focusing on promoting city hotels, adjusting pricing strategies, and improving amenities and services, businesses can potentially attract more customers and retain existing ones. However, the data may reveal negative growth if customers are dissatisfied with cleanliness or if a particular type of hotel is becoming less popular. Therefore, businesses should carefully consider the insights gained from the data analysis to inform strategic decisions and avoid any negative consequences.

#### Chart - Preferred Meal Type by the Guests

In [None]:
#visualization code

# group the data by meal type
meal_data = df.groupby('meal')['hotel'].count()

# create a pie chart
plt.figure(figsize=(10, 7))

meal_data.plot(kind='pie',colors=['green', 'black', 'blue', 'yellow', 'red'], autopct='%1.0f%%', fontsize=16,explode = (0.1, 0.1,0.1,0.1,0.1))

plt.legend(labels=['Bed & Breakfast', 'Full Board','Half Board','SC:Self catering or Room only','Undefined'], loc="best")

plt.title('Preferred meal types', fontsize=20)

plt.ylabel('')
plt.xlabel('')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is a suitable choice for this data as it displays the percentage of each meal package type in a clear and easy-to-understand manner.

##### 2. What is/are the insight(s) found from the chart?

Hotels and travel agencies can use the graph information to better understand the preferences of their consumers and make decisions about advertising, marketing, inventory management, and pricing. In order for hotels to grow income and improve customer happiness, they should concentrate on the most popular meal package selections, adding new items to it, and offering the best value.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

By concentrating on the most popular meal package options, expanding the menu, and offering the best pricing, the insights from this chart can have a favourable effect on firms in the hotel industry. However, there can be some drawbacks if hotels place an undue emphasis on particular meal package alternatives and don't offer enough diversity for guests who prefer other options. This can lead to a loss and lower client satisfaction.

#### Chart -  The types of Hotel Bookings by Customers

In [None]:
#visualization code

# group the data by customer type
customer_data = df.groupby('customer_type')['hotel'].count()

# create a horizontal bar chart
plt.figure(figsize=(8, 6))

customer_data.plot(kind='barh', color=['orange', 'red', 'lightgreen', 'lightblue'], edgecolor='black')

plt.title('Customer Types in Hotel Bookings Data', fontsize=16)
plt.xlabel('Count', fontsize=12)
plt.ylabel('Customer Type', fontsize=12)

plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar graph would be a clear and effective way to convey the information about the distribution of bookings across different customer types and their corresponding characteristics, making it easier for hotels to understand their customers and develop more effective marketing strategies.


##### 2. What is/are the insight(s) found from the chart?

The graph demonstrates that transient and transient-party customers make the majority of reservations, whereas contract and group customers make the fewest.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

If used properly, the information obtained from the chart can undoubtedly contribute to having a beneficial company impact. If hotels fail to make effective use of the information gleaned from the chart, there could be significant consequences. For instance, if hotels concentrate all of their marketing efforts on Transient and Transient-Party guests, they may neglect other types of consumers and lose out on prospective business.

#### Chart - About Parking Space

In [None]:
#  visualization code
parking_counts=df['required_car_parking_spaces'].value_counts()
# size explode from the data instead of assuming exactly five distinct values
parking_counts.plot.pie(explode=[0.05]*len(parking_counts), autopct='%1.1f%%',shadow=False,figsize=(10,8),fontsize=15,labels=None)

labels=parking_counts.index
plt.title('% Distribution of required car parking spaces')
plt.legend(bbox_to_anchor=(0.85, 1), loc='upper left', labels=labels);

##### 1. Why did you pick the specific chart?

A pie chart is an effective way to visually represent the distribution of hotel bookings with and without parking spaces. The chart can show the proportion of bookings with parking spaces and those without, making it easy to compare the two categories.

##### 2. What is/are the insight(s) found from the chart?

The graphic indicates that the majority of reservations do not include a parking space, which may indicate that hotel guests do not prioritise parking or that hotels are not meeting demand. Nonetheless, there are still a sizable number of reservations for parking spaces, indicating that hotels would need to provide this service to draw in particular kinds of visitors.

For instance, if a hotel is situated in a location with few parking alternatives, providing parking spaces may draw more clients and boost sales.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The learned lessons may have a beneficial business effect by assisting hotels in making decisions about offering parking places to their visitors. Yet, failing to give parking places when there is a need for them might stunt growth because it may make customers unhappy enough to book with rival companies who do. Hence, when choosing their amenities and services, hotels should take the need for parking places into account.

#### Chart - Deposit Types for Hotel Bookings

In [None]:

#visualization code

deposit_data = df['deposit_type'].value_counts()

plt.figure(figsize=(10,7))
deposit_data.plot(kind='bar', color=['pink', 'orange', 'red'])
plt.title('Distribution of Deposit Types', fontsize=16)
plt.xlabel('Deposit Types', fontsize=14)
plt.ylabel('Number of Bookings', fontsize=14)
plt.xticks(rotation=0)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a good way to compare the number of hotel bookings made under each deposit type. The chart shows that most bookings don't require a deposit, with non-refundable deposits used by some customers and refundable deposits being less common.

##### 2. What is/are the insight(s) found from the chart?

The graph shows that non-refundable deposits are more frequent than refundable deposits and that the majority of hotel reservations do not require a deposit. This means that even if some consumers are ready to pay a non-refundable deposit, many customers would prefer not to. With the use of the chart, hotels may better understand their patrons' preferences and modify their reservation procedures accordingly. For instance, hotels might decide to encourage non-refundable deposits to clients who prefer them or provide more flexible deposit options.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The chart's insights can, as already said, have a good business influence for hotels. The chart's findings could, however, have the opposite effect if hotels don't change their reservation procedures. A hotel can lose out on bookings from clients who would prefer not to pay a deposit or who would be ready to pay a non-refundable deposit, for instance, if they only accept refundable deposits. In a similar vein, hotels that do not provide flexible deposit alternatives risk losing potential reservations from clients who demand it.

#### Chart - Month-Wise Bookings

In [None]:
# Group by arrival_date_month and count the bookings
bookings_by_months_df=df.groupby(['arrival_date_month'])['hotel'].count().reset_index().rename(columns={'hotel':"Counts"})
# Create list of months in calendar order
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Make the month column an ordered categorical so it sorts in calendar order rather than alphabetically
bookings_by_months_df['arrival_date_month']=pd.Categorical(bookings_by_months_df['arrival_date_month'],categories=months,ordered=True)
# Sort by arrival_date_month
bookings_by_months_df=bookings_by_months_df.sort_values('arrival_date_month')
bookings_by_months_df

In [None]:
# visualization code

plt.figure(figsize=(20,8))

sns.lineplot(x=bookings_by_months_df['arrival_date_month'],y=bookings_by_months_df['Counts'])
plt.title('Number of bookings across each month')
plt.xlabel('Month')
plt.ylabel('Number of bookings');

##### 1. Why did you pick the specific chart?

A line graph is a suitable choice for illustrating changes in hotel bookings over the course of the year because it is an effective tool for showcasing trends and patterns over time.

##### 2. What is/are the insight(s) found from the chart?

The chart shows seasonal trends and patterns in hotel bookings throughout the year. The number of bookings increases from January to May, peaks in July and August, and drops in September and October. Bookings in November and December are significantly lower. By understanding these trends, hotels can develop strategies to attract more customers during periods of low demand and maximize revenue during periods of high demand. For example, hotels can raise their prices and increase their marketing efforts during periods of high demand, such as July and August, and they can restructure prices and offer extra features, discounts, and promotions during periods of low demand, such as September and October, to attract more customers.
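
The monthly counts above combine both hotel types. A possible extension, sketched here on made-up rows (`toy`, hypothetical values), splits the counts by hotel while keeping the months in calendar order through an ordered categorical. Passing `observed=True` keeps only the months that actually occur in the data.

```python
import pandas as pd

months = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

# Made-up bookings: hotel type and arrival month only
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"],
    "arrival_date_month": ["August", "January", "August", "July", "August"],
})

# An ordered categorical sorts by calendar position instead of alphabetically
toy["arrival_date_month"] = pd.Categorical(
    toy["arrival_date_month"], categories=months, ordered=True
)

# One row per month, one column per hotel type, zero where there are no bookings
monthly = (
    toy.groupby(["arrival_date_month", "hotel"], observed=True)
       .size()
       .unstack(fill_value=0)
)
print(monthly)
```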

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

As shown above, the chart's insights can contribute to beneficial business effects. But, hotels risk experiencing negative growth if they do not modify their tactics based on these seasonal trends. For instance, if a hotel charges high rates during times of high demand, they might not draw as many guests, which would result in lesser revenue. Moreover, guests might decide to book with rival hotels.

#### Chart - Year-Wise Number of Bookings

In [None]:
# visualization code

plt.figure(figsize=(12,8))

#  plot with countplot
sns.countplot(x=df['arrival_date_year'],hue=df['hotel'])
plt.title("Year Wise bookings");

##### 1. Why did you pick the specific chart?

A bar chart is an easy-to-read and informative graph that allows us to quickly see the year wise booking of the hotels. It provides a clear visual representation of the distribution of the data and makes it easy to compare the different year wise booking of the hotels.

##### 2. What is/are the insight(s) found from the chart?

The chart's key finding is that more hotel reservations were made from 2016 onward. Because the booking period runs from July 2015 to August 2017, 2015 and 2017 are only partly covered, so year-to-year differences should be read with care. This information is vital for hotels and travel companies to identify booking trends and alter their marketing efforts to attract more customers. Where bookings drop from one year to the next, they might need to pinpoint the causes, such as changes in the market or sector, in order to stay competitive. To succeed in the sector, they must also continuously track booking trends and modify their approaches as necessary.
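
The dataset overview earlier gives the booking period as July 2015 to August 2017, so 2015 and 2017 do not cover a full year. One way to make the yearly counts more comparable, sketched on made-up rows (`toy`, hypothetical values), is to count how many distinct months each year covers and divide the bookings by that number.

```python
import pandas as pd

# Made-up arrivals: year and month only
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2016, 2017],
    "arrival_date_month": ["July", "August", "January", "June", "December", "March"],
})

# Number of distinct months observed in each year
months_covered = toy.groupby("arrival_date_year")["arrival_date_month"].nunique()

# Raw bookings per year, then bookings per covered month
bookings_per_year = toy["arrival_date_year"].value_counts().sort_index()
per_month = (bookings_per_year / months_covered).round(2)
print(per_month)
```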

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insight gained from the chart can help hotels and travel companies understand the trends in booking and adjust their marketing strategies to suit customer demand. By using this information to develop targeted promotional campaigns and pricing strategies, hotels can attract more customers and increase their revenue.

#### Chart - Parking Spaces in Hotel Bookings

In [None]:
# visualization code
parking_counts=df['required_car_parking_spaces'].value_counts()
# size explode from the data instead of assuming exactly five distinct values
parking_counts.plot.pie(explode=[0.05]*len(parking_counts), autopct='%1.1f%%',shadow=False,figsize=(10,8),fontsize=15,labels=None)

labels=parking_counts.index
plt.title('% Distribution of required car parking spaces')
plt.legend(bbox_to_anchor=(0.85, 1), loc='upper left', labels=labels);

##### 1. Why did you pick the specific chart?

A pie chart is an effective way to visually represent the distribution of hotel bookings with and without parking spaces. The chart can show the proportion of bookings with parking spaces and those without, making it easy to compare the two categories.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that a majority of the bookings do not have a parking space, suggesting that parking spaces may not be a top priority for hotel guests or hotels may not be fulfilling the demand. However, there is still a significant number of bookings with parking spaces, suggesting that hotels may need to offer this amenity to attract certain types of guests.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially have a positive business impact as they can help hotels make informed decisions about providing parking spaces as a service to their guests. However, not providing parking spaces when there is a demand for them can lead to negative growth, as it may result in dissatisfied customers who may choose to book with competitors who offer this service. Therefore, hotels should consider the demand for parking spaces when making decisions about their amenities and services.

#### Chart - Waiting time

In [None]:
# visualization code

grouped_by_hotel = df.groupby('hotel') # group the rows by hotel type
# average number of days on the waiting list for each hotel type
Waiting_df = grouped_by_hotel['days_in_waiting_list'].mean().reset_index().rename(columns = {'days_in_waiting_list':'avg_waiting_period'})
plt.figure(figsize = (8,8))
sns.barplot(x = Waiting_df['hotel'], y = Waiting_df['avg_waiting_period'] )
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is an effective way to visualize the average waiting time for city hotels and resort hotels. It shows the average number of days a booking spent on the waiting list for each hotel type, making it easy to compare the two categories.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that City Hotel has a significantly longer average waiting time, which suggests that City Hotel is much busier than Resort Hotel.
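
An average can be pulled up by a few very long waits, so it may help to look at the median and at the share of bookings that were waitlisted at all before reading too much into the comparison. A minimal sketch on made-up rows (`toy`, hypothetical values):

```python
import pandas as pd

# Made-up waiting times: most bookings never wait, one per hotel does
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "days_in_waiting_list": [0, 0, 0, 40, 0, 0, 0, 4],
})

# Mean, median, and fraction of bookings with any waiting time, per hotel type
summary = toy.groupby("hotel")["days_in_waiting_list"].agg(
    avg_waiting_period="mean",
    median_waiting_period="median",
    share_waitlisted=lambda s: (s > 0).mean(),
)
print(summary)
```

In this toy example both hotels have the same share of waitlisted bookings, yet the averages differ by a factor of ten because of one long wait.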

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

This information is important for hotels and travel companies to understand booking trends and adjust their marketing strategies to attract more customers. Customers do not want to wait too long, so hotels can use this insight to build strategies that reduce waiting time and attract customers.

#### Chart - The Average Length of Stay for Guests in Different Types of Hotels

In [None]:
# visualization code

# Create a bar plot of the mean total stay (weekend nights + week nights) per hotel type
sns.barplot(x='hotel', y='total_stay', data=df, estimator=np.mean)

# Set labels and title
plt.xlabel('Hotel Type')
plt.ylabel('Average Length of Stay (in nights)')
plt.title('Average Length of Stay by Hotel Type')

# Display the plot
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is an easy-to-read and informative graph that allows us to quickly see the average length of stay for each hotel type. It provides a clear visual representation of the data and makes it easy to compare the different types of hotels.

##### 2. What is/are the insight(s) found from the chart?

Looking at the graph, we can see that Resort hotels have a longer average length of stay compared to City hotels. This could be because Resort hotels are often located in vacation destinations where people stay for longer periods, while City hotels are more frequently used for business trips or shorter stays.
It's important for hotels to understand these differences so they can market their properties and attract the right types of guests. For example, resorts could offer longer-stay packages or activities that encourage guests to extend their vacations, while city hotels could focus on providing better services for guests who are often on the go.

For guests who are looking for longer stays, resorts should add extra amenities such as a spa, golf course, or beach activities.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

There are no insights that directly indicate negative growth. It's essential to keep in mind that the insights obtained from this graph are just one part of a bigger picture, and they should be considered along with other data and factors that affect the business.

#### Chart - Where Most Guests Come From

In [None]:
# visualization code
plt.figure(figsize = (10,5))

# ten countries with the most bookings
top_countries = df['country'].value_counts().head(10)
sns.barplot(x=list(top_countries.index), y=list(top_countries.values))
plt.title("Number of bookings country wise",fontweight="bold", size=20);

##### 1. Why did you pick the specific chart?

This bar plot makes it easy to see where most guests are coming from, which is why I chose it.

##### 2. What is/are the insight(s) found from the chart?

Most of the guests come from Portugal, i.e. more than 25,000 guests. The country abbreviations are: PRT - Portugal, GBR - United Kingdom, FRA - France, ESP - Spain, DEU - Germany, ITA - Italy, IRL - Ireland, BEL - Belgium, BRA - Brazil, NLD - Netherlands. These are the places where most guests come from.
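
The abbreviations listed above can be turned into a lookup so that charts and tables show full country names. The sketch below uses a made-up country column (`toy_countries`, hypothetical values); the dictionary contains only the ten codes named in this section, and any other value is left unchanged.

```python
import pandas as pd

# Codes and names exactly as listed in this section
country_names = {
    "PRT": "Portugal", "GBR": "United Kingdom", "FRA": "France",
    "ESP": "Spain", "DEU": "Germany", "ITA": "Italy", "IRL": "Ireland",
    "BEL": "Belgium", "BRA": "Brazil", "NLD": "Netherlands",
}

# Made-up country column, including the 'others' placeholder used for missing values
toy_countries = pd.Series(
    ["PRT"] * 4 + ["GBR"] * 3 + ["FRA"] * 2 + ["others"], name="country"
)

# Count bookings per country, then replace known codes with full names
top = toy_countries.value_counts().head(4)
top.index = top.index.map(lambda code: country_names.get(code, code))
print(top)
```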


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

From this graph we can easily see that most of the guests come from Portugal, the United Kingdom, and France. Hotels can shape their business and marketing plans around these source countries.

#### Chart - The Most Preferred Room Type by the Customers

In [None]:
# visualization code
plt.figure(figsize=(12,8))

#plotting 
sns.countplot(x=df['assigned_room_type'],order=df['assigned_room_type'].value_counts().index)
#  set xlabel for the plot
plt.xlabel('Room Type')
# set y label for the plot
plt.ylabel('Count of Room Type')
#set title for the plot
plt.title("Most preferred Room type");

##### 1. Why did you pick the specific chart?

A vertical bar chart is a good choice for this question because it shows the count of each room type clearly, and ordering the bars by frequency makes it easy to compare room types and see which one is most popular. Separate charts for city and resort hotels would be a good idea too, because they would show which room types are popular for each type of hotel and how the two differ.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the room types for city and resort hotels. We can see that the standard double room is the most popular for both types of hotels. City hotels have more executive rooms and suites, while resort hotels have more family rooms and holiday apartments. Deluxe rooms are not so popular for either hotel type. City hotels have more room type categories than resort hotels. This information helps hotel managers and marketers know which room types people book more so they can target marketing and promotions better.
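
To compare room types between the two hotel types directly, a cross-tabulation is one option. The sketch below runs on made-up rows (`toy`), and the room type labels in it are placeholders, not values taken from the dataset.

```python
import pandas as pd

# Made-up bookings; room type labels are placeholders
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "Resort Hotel", "Resort Hotel"],
    "assigned_room_type": ["A", "A", "D", "A", "E", "E"],
})

# Rows are room types, columns are hotel types, cells are booking counts
room_by_hotel = pd.crosstab(toy["assigned_room_type"], toy["hotel"])

# Most frequently assigned room type for each hotel type
most_common = room_by_hotel.idxmax()
print(room_by_hotel)
print(most_common)
```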


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Knowing which room types guests book most often can help hotels increase revenue. If many guests book the more expensive rooms, hotels can focus on those. If guests mostly book the cheaper rooms, hotels need to watch their margins to avoid losing money.

#### Chart - Are there any patterns in cancellations


In [None]:

# visualization code

# Create a new dataframe with cancelled bookings only, keeping just the columns needed here
# (.copy() avoids pandas' SettingWithCopyWarning when columns are added below)
cancelled_df = df.loc[df['is_canceled'] == 1, ['reservation_status', 'reservation_status_date']].copy()

# Parse the status date and create a 'month' column from it
cancelled_df['reservation_status_date'] = pd.to_datetime(cancelled_df['reservation_status_date'])
cancelled_df['month'] = cancelled_df['reservation_status_date'].dt.month

# Create a countplot of cancellations by month
plt.figure(figsize=(12,6))
sns.countplot(x='month', data=cancelled_df)

# Set labels and title
plt.xlabel('Month')
plt.ylabel('Number of Cancellations')
plt.title('Cancellations by Month')

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

I chose this graph to visualize seasonal patterns in cancellations because it is a simple and effective way to show the number of cancellations in each month. It makes it easy to see whether any months have noticeably more or fewer cancellations than the rest of the year.

##### 2. What is/are the insight(s) found from the chart?

We can see from the bar graph that cancellations follow seasonal patterns. January, February, and December have the largest number of cancellations, while June, July, and August have the lowest.
Several factors may affect cancellations at different times of the year. For example, cancellations may be more common in the winter months because of weather conditions or changing holiday plans, while in summer, when people have more vacation days and firmer travel plans, there are fewer cancellations.

Knowing cancellation trends can aid hotels in better planning their staffing and inventory levels. By taking into account the ups and downs in cancellations, they can better manage their resources and offer superior customer service.
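Raw monthly counts can partly reflect how many bookings exist for each month, so a cancellation rate is a useful cross-check. The sketch below is an illustration under assumptions rather than part of the original analysis: it uses made-up rows in `sample_df`, and it groups by an assumed `arrival_date_month` column (the arrival month) instead of the cancellation date used in the chart above.

```python
import pandas as pd

# Hypothetical sample rows; 'arrival_date_month' is an assumed column name.
sample_df = pd.DataFrame({
    "arrival_date_month": ["January", "January", "January", "January",
                           "July", "July"],
    "is_canceled": [1, 0, 0, 0, 1, 0],
})

# Count all bookings and cancelled bookings per arrival month
monthly = (
    sample_df.groupby("arrival_date_month")["is_canceled"]
    .agg(bookings="count", cancellations="sum")
)

# Share of bookings that were cancelled in each month
monthly["cancellation_rate"] = monthly["cancellations"] / monthly["bookings"]
print(monthly)
```

In this toy data both months have the same number of cancellations but different rates, which is the kind of difference a count plot alone cannot show.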

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

With the information from this graph, hotels can plan their resources and workforce more effectively throughout the year. Failing to take these patterns into account, however, could result in overstaffing or understaffing as well as overstocking or understocking of inventory, which could harm the business through decreased customer satisfaction and lost revenue. Hence, it is crucial that hotels take these findings into account when developing their overall business plan.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Numeric columns to include in the correlation matrix
num_df = df[['lead_time', 'previous_cancellations', 'previous_bookings_not_canceled',
             'booking_changes', 'days_in_waiting_list', 'adr',
             'required_car_parking_spaces', 'total_of_special_requests',
             'total_stay', 'total_people']]


In [None]:
# Correlation Heatmap visualization code
corrmat = num_df.corr()
f, ax = plt.subplots(figsize=(12, 7))

sns.heatmap(corrmat,annot = True,fmt='.2f', annot_kws={'size': 10},  vmax=.8, square=True);

Total stay length and lead time are slightly correlated. This may mean that guests planning longer stays generally book somewhat further ahead of their arrival.

adr (average daily rate) is slightly correlated with total_people, which makes sense: more guests in a booking generally means more revenue and therefore a higher adr.
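To read pairs like these off the matrix without scanning the heatmap by eye, the correlations can be ranked by absolute value. The following is a minimal sketch on made-up numbers; `sample_num_df` is a hypothetical stand-in for `num_df` and reuses only four of its column names.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for a few columns of num_df
sample_num_df = pd.DataFrame({
    "lead_time": [10, 40, 5, 90, 60],
    "total_stay": [2, 4, 1, 7, 5],
    "adr": [80.0, 95.0, 120.0, 70.0, 100.0],
    "total_people": [1, 2, 3, 1, 2],
})

corrmat = sample_num_df.corr()

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
upper = corrmat.where(np.triu(np.ones(corrmat.shape, dtype=bool), k=1))

# Flatten to (column, column) pairs and sort by strength of correlation
pairs = (
    upper.stack()
    .dropna()
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(pairs.head(3))
```

Sorting by absolute value keeps strong negative correlations near the top as well, which a plain descending sort would push to the bottom.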

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

Based on the insights gained from the EDA on hotel booking data, the following recommendations can be made to the client to achieve their business objectives:

• To attract more guests and make more money, consider offering special deals and promotions during times when there aren't many people booking hotels.

• Adjust pricing strategies based on seasonal trends to remain competitive in the industry.

• Provide more parking spaces or prioritize parking as an amenity to attract certain types of guests.

• Provide a wider variety of meal options that can accommodate the various dietary requirements and personal tastes of guests.

• Create marketing plans that are tailored to specific types of hotel bookings, such as those made by people traveling for work or those traveling for pleasure.

• Offer flexible deposit options to accommodate different payment preferences of guests.

• Look at the booking patterns by month and year to understand when demand is high and low. This will help in predicting future demand and making necessary adjustments in hotel operations.

• Keep track of cancellation trends and develop plans to reduce cancellations, such as providing more flexible cancellation policies or offering incentives for guests to rebook their stay.

• Create different types of rooms that are designed to meet the unique needs and preferences of different groups of customers.

• Examine the variety of special requests made by guests and focus on fulfilling the most frequent requests to enhance guest satisfaction and loyalty.

• Determine the most profitable months for hotels and adjust pricing strategies accordingly.

• Identify the most common booking channels and focus marketing efforts on those channels to increase bookings.

• Analyze the customer review data to identify areas of improvement and enhance the guest experience.

• Use data on customer demographics and booking patterns to create targeted promotions and loyalty programs to retain customers and drive repeat business.
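The recommendations about monthly demand and the most profitable months could start from a summary like the one sketched here. It is only an illustration: the rows in `sample_df` are made up, the column names `arrival_date_year` and `arrival_date_month` are assumptions about the dataset, and average `adr` is used as a rough proxy for profitability.

```python
import pandas as pd

# Hypothetical sample rows; the arrival date column names are assumptions.
sample_df = pd.DataFrame({
    "arrival_date_year": [2016, 2016, 2016, 2016, 2017, 2017],
    "arrival_date_month": ["July", "July", "July", "January", "July", "January"],
    "is_canceled": [0, 1, 0, 0, 0, 0],
    "adr": [120.0, 110.0, 100.0, 60.0, 130.0, 70.0],
})

# Only bookings that were not cancelled count towards realised demand
kept = sample_df[sample_df["is_canceled"] == 0]

# Number of bookings and average daily rate per year and month
monthly_demand = (
    kept.groupby(["arrival_date_year", "arrival_date_month"])
    .agg(bookings=("adr", "size"), mean_adr=("adr", "mean"))
    .reset_index()
    .sort_values("bookings", ascending=False)
)
print(monthly_demand)
```

Months with many bookings and a high average rate are candidates for firmer pricing, while months with few bookings are candidates for promotions.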

# **Conclusion**

After examining the hotel booking dataset, we discovered some significant facts. The data covered two categories of hotels, resort hotels and city hotels, with city hotels being the more popular. The majority of the visitors were adults; there weren't many infants or young children.


Compared to city hotels, resort hotels typically had longer booking wait times, and the majority of guests did not choose meal packages. Bookings for leisure travel were more common than bookings for business travel.

The average length of stay varied by hotel type, with resort hotels having longer stays. Cancellations were frequent, particularly for city hotels. We also observed distinct booking trends for leisure and business travelers.

Lastly, some room types were more popular than others, and different booking types and customer segments had different needs.

Overall, these findings can assist lodging establishments and travel agencies in strengthening their marketing plans and customising their offerings to better match the demands of visitors. For instance, hotels might change their price plans during busy seasons, enhance room selections depending on customer preferences, and provide more incentives to draw business travellers.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***