# **Project Name**    -  **HOTEL BOOKING ANALYSIS**



##### **Project Type**    - EDA
##### **Contribution**    - Individual (Shantanu Choudhary)


# **Project Summary -**

This project aims to perform an exploratory data analysis (EDA) on a dataset containing information about hotel bookings. The dataset includes information such as the hotel's location, the time of the booking, the length of stay, and the number of guests.

The objective of this project is to gain insights into the booking patterns and trends of hotel guests, as well as identify any patterns or factors that may influence their booking decisions. 

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**




**The hotel industry is highly competitive, and hotels of all sizes are looking for ways to optimize their revenue management strategies. One key area of focus is understanding guest behavior and preferences when it comes to booking and canceling reservations. By analyzing hotel booking data, we can identify patterns and trends that can help hotels better anticipate cancellations, adjust their pricing strategies, and increase their occupancy rates. Specifically, this project aims to explore the factors that influence cancellations and occupancy rates in order to help hotels make data-driven decisions about their pricing and marketing strategies. By doing so, we hope to provide valuable insights that will help hotels improve their business performance and stay ahead in an increasingly crowded market.**


#### **Define Your Business Objective?**

**The objective of this Hotel Booking Analysis project is to gain insights into customer behavior and preferences, and to identify opportunities for improving hospitality services by analyzing a dataset of hotel booking records. The analysis will focus on answering questions such as: what are the most popular booking months, days of the week, and duration of stay? Which types of hotels and rooms are most commonly booked? Which hotel generated the maximum revenue over the course of 3 years ? What are the cancellation rates and how can we tackle them? Are there any seasonal trends or patterns in hotel booking data? What was the most preferred distribution channel for hotel bookings based on the analysis of the data?**

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt
import folium
from folium.plugins import HeatMap
import plotly.express as pltx

### Dataset Loading

In [None]:
# Load Dataset

from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Creating a file path for the dataset

path = '/content/drive/MyDrive/Hotel Bookings Analysis..csv'

In [None]:
# Assigning a dataset to a relevant variable

hotel_df = pd.read_csv(path)
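
The guidelines above ask for exception handling, so the load step could be guarded as in the minimal sketch below. The helper name `load_bookings` is hypothetical and is not part of the original notebook; it only wraps `pd.read_csv` and reports two common failure modes.

```python
import pandas as pd


def load_bookings(csv_path):
    """Read the bookings CSV, returning None if the file cannot be read.

    `load_bookings` is a hypothetical helper used for illustration only.
    """
    try:
        return pd.read_csv(csv_path)
    except FileNotFoundError:
        # Raised when the path does not exist (e.g. Drive is not mounted)
        print(f"File not found: {csv_path}")
        return None
    except pd.errors.EmptyDataError:
        # Raised when the file exists but has no columns to parse
        print(f"File is empty: {csv_path}")
        return None
```

With this in place, `hotel_df = load_bookings(path)` could replace the direct call, followed by a check that the result is not `None` before continuing.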

### Dataset First View

In [None]:
# Dataset First Look

hotel_df.head(5)

In [None]:
# Check last 5 rows
hotel_df.tail(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

hotel_df.shape

### Dataset Information

In [None]:
# Dataset summary

hotel_df.describe(include = 'all')

In [None]:
# Dataset Info

hotel_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

hotel_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

hotel_df.isnull().sum()

In [None]:
hotel_df_sample = pd.DataFrame(index = hotel_df.columns)

hotel_df_sample['Datatypes'] = hotel_df.dtypes

hotel_df_sample['Non Null Values'] = hotel_df.count()

hotel_df_sample['Null Values'] = hotel_df.isnull().sum()

# Share of missing rows per column, relative to the total number of rows
hotel_df_sample['Percent Of Null Values'] = hotel_df_sample['Null Values'] / len(hotel_df) * 100

hotel_df_sample

In [None]:
# Visualizing the missing values

plt.figure(figsize = (10,6), dpi = 100)

sns.heatmap(hotel_df.isnull(), cbar = False)


plt.show()

### What did you know about your dataset?

The dataset for hotel booking analysis contains information such as booking dates, check-in/check-out dates, room type, price, customer information, cancellation status, etc. The dataset consists of thousands of rows, so it is essential to check for missing and duplicate values and handle them before the analysis.

By analyzing this dataset, hotels can gain valuable insights into customer behavior and preferences, allowing them to tailor their services to better meet the needs of their guests. Additionally, by identifying factors that contribute to cancellations, hotels can implement measures to reduce the likelihood of cancellations and improve customer retention.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

hotel_df.columns

In [None]:
# Dataset Describe

hotel_df.describe(include = 'all')

### Variables Description 

1. hotel: indicates the type of hotel (city or resort)

2. lead_time: the number of days between the booking date and the arrival date

3. arrival_date_year: the year of arrival date

4. arrival_date_month: the month of arrival date

5. arrival_date_week_number: the week number of arrival date

6. arrival_date_day_of_month: the day of the month of arrival date

7. stays_in_weekend_nights: number of weekend nights (Saturday/Sunday) the guest stayed or booked to stay at the hotel

8. stays_in_week_nights: number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel

9. adults: number of adults in the booking

10. children: number of children in the booking

11. babies: number of babies in the booking

12. meal: type of meal booked (e.g. Full board, Half board, etc.)

13. country: country of origin of the guest

14. market_segment: market segment designation (e.g. Online Travel Agent, Offline Travel Agent, etc.)

15. distribution_channel: distribution channel through which the booking was made (e.g. Travel Agents, Direct, etc.)

16. is_repeated_guest: binary variable indicating if the guest is a repeated guest or not

17. previous_cancellations: number of previous bookings that were cancelled by the same guest prior to the current booking

18. previous_bookings_not_canceled: number of previous bookings not cancelled by the same guest prior to the current booking

19. reserved_room_type: code of room type reserved

20. assigned_room_type: code for the type of room assigned to the booking

21. booking_changes: number of changes/amendments made to the booking before or during the stay

22. deposit_type: type of deposit made to secure the booking

23. company: ID of the company/entity that made the booking or responsible for paying the bill

24. days_in_waiting_list: number of days the booking was on the waiting list before it was confirmed to the customer

25. customer_type: type of booking (e.g. transient, contract, group, etc.)

26. adr: average daily rate (total booking revenue divided by total number of nights)

27. required_car_parking_spaces: number of car parking spaces required by the guest

28. total_of_special_requests: number of special requests made by the guest (e.g. extra beds, late check-in, etc.)

29. reservation_status: current status of the booking (e.g. cancelled, checked-in, no-show, etc.)

30. reservation_status_date: date at which the reservation status was last updated

31. is_canceled: binary variable where 0 represents a booking that was not cancelled and 1 represents a cancelled booking

32. agent: represents the identifier of the travel agent or booking agent who made the reservation on behalf of the customer. 



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

for col in hotel_df.columns:
  num_unique_values = len(hotel_df[col].unique())

  print(col,':', num_unique_values)
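
The loop above counts a missing value as one distinct value, because `unique()` keeps `NaN`. A minimal sketch of the same count done in one call with `DataFrame.nunique` is shown below; the `toy` frame is made up for illustration and only stands in for `hotel_df`.

```python
import pandas as pd

# Made-up rows standing in for hotel_df
toy = pd.DataFrame({
    'country': ['PRT', None, 'GBR', 'PRT'],
    'adults': [2, 2, 1, 2],
})

# dropna=False counts a missing value as its own category,
# which matches len(toy[col].unique()) in the loop above
unique_with_nulls = toy.nunique(dropna=False)

# The default (dropna=True) ignores missing values
unique_without_nulls = toy.nunique()

print(unique_with_nulls)
print(unique_without_nulls)
```

Which variant is appropriate depends on whether a missing entry should be treated as a category of its own.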
  

## 3. ***Data Cleaning & Feature Engineering***

DATA CLEANING.

1. There are 31999 duplicate rows in the above hotel_df dataframe. Dropping all the duplicate rows.

In [None]:
# Dropping all the duplicate values

hotel_df.drop_duplicates(inplace = True)

In [None]:
# Checking if duplicate values still exist.

hotel_df.duplicated().sum()

2. The company, country and agent columns have a lot of null values.

In [None]:
# Counting the number of null values in the Company column

hotel_df['company'].isna().sum()

In [None]:
# Counting the number of Non Null Values in the Company column

hotel_df['company'].count()

Since more than 50% of the data in the company column above is missing, it is more appropriate to drop this column, as it will not be of much help in our hotel booking analysis.
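
The 50% rule used here can be written as a small reusable check. The sketch below is only illustrative: `drop_sparse_columns` and its `max_null_fraction` parameter are hypothetical names, and the `toy` frame is made-up data rather than the real bookings.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_null_fraction=0.5):
    """Return a copy of df without columns whose share of nulls exceeds the threshold.

    Hypothetical helper; the 0.5 default mirrors the 50% rule stated in the text.
    """
    # isnull().mean() gives the fraction of missing rows per column
    null_fraction = df.isnull().mean()
    sparse_columns = null_fraction[null_fraction > max_null_fraction].index
    return df.drop(columns=sparse_columns)


# Made-up rows: 'company' is 75% null, 'country' is 25% null
toy = pd.DataFrame({
    'company': [np.nan, np.nan, 40.0, np.nan],
    'country': ['PRT', None, 'GBR', 'ESP'],
    'adr': [75.0, 98.0, 120.5, 60.0],
})

cleaned = drop_sparse_columns(toy)
print(cleaned.columns.tolist())
```

The helper returns a new dataframe instead of modifying the input, which keeps the original available for comparison.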

In [None]:
hotel_df.drop(['company'], axis = 1, inplace = True)

In [None]:
# Checking null values in column agent

hotel_df['agent'].isnull().sum()

In [None]:
# Checking for non null values in column agent

hotel_df['agent'].count()

Dropping the above agent column from the dataframe, as it is unclear what the integer values in it represent; it is therefore better to exclude this column from our analysis.

In [None]:
# Dropping the column agent from the dataframe 

hotel_df.drop(['agent'], axis = 1, inplace = True)


In [None]:
# Checking the null values in the column country

hotel_df['country'].isnull().sum()

The country column has 452 null values. 
While the data was being collected, the names of some countries may have been missed or not recorded because of some inconsistency.
We therefore fill the null values with 'others'.

In [None]:
# Filling the null values in the country column with 'others'

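# Assign the result back instead of calling fillna(inplace=True) on a column selection,
# which newer pandas versions warn about and which may not update hotel_df
hotel_df['country'] = hotel_df['country'].fillna('others')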

3. The columns "previous_cancellations" and "previous_bookings_not_canceled" might not be relevant to the analysis of hotel bookings, as they only provide information about the past behavior of the customers and not their current booking status. Dropping these columns can help to simplify the dataset and reduce noise, allowing us to focus on more important variables that can impact current bookings. Additionally, these columns may not have a significant impact on the analysis, as they are not directly related to the current booking patterns. Therefore, dropping these columns can help to improve the overall quality and accuracy of the analysis.

In [None]:
hotel_df.drop(['previous_cancellations','previous_bookings_not_canceled'], axis =1, inplace = True)

FEATURE ENGINEERING.

In [None]:
# Adding the necessary columns for data analysis

In [None]:
hotel_df['total_people'] = hotel_df['adults'] + hotel_df['children'] + hotel_df['babies']


hotel_df['total_stay'] = hotel_df['stays_in_weekend_nights'] + hotel_df['stays_in_week_nights']

In [None]:
# Calculating the revenue earned by multiplying adr * total_stay with those bookings which are not cancelled

hotel_df['revenue'] = hotel_df.loc[hotel_df['is_canceled'] == 0, 'adr'] * hotel_df.loc[hotel_df['is_canceled'] == 0, 'total_stay']
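
The revenue rule above (nights multiplied by ADR, for non-cancelled bookings only) can also be expressed with `Series.where`, which leaves cancelled rows as `NaN`. The sketch below uses a made-up `toy` frame with the same column names as `hotel_df`; it is an alternative formulation, not the notebook's own code.

```python
import pandas as pd

# Made-up bookings: the second one is cancelled
toy = pd.DataFrame({
    'is_canceled': [0, 1, 0],
    'adr': [100.0, 80.0, 50.0],
    'total_stay': [2, 3, 4],
})

# Compute adr * total_stay for every row, then keep it only where the
# booking was not cancelled; cancelled rows become NaN
toy['revenue'] = (toy['adr'] * toy['total_stay']).where(toy['is_canceled'] == 0)

# sum() skips NaN by default, so cancelled bookings contribute nothing
total_revenue = toy['revenue'].sum()
print(toy)
print(total_revenue)
```

Because the cancelled rows hold `NaN` rather than 0, averages of `revenue` are taken over completed bookings only.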


## 4 . ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

1. Month-wise comparison of guest arrivals across the 3 years for both City Hotel and Resort Hotel.

In [None]:
# Filter out data for Resort Hotel bookings that were not canceled
resort_data = hotel_df[(hotel_df['hotel'] == 'Resort Hotel') & (hotel_df['is_canceled'] == 0)]

# Filter out data for City Hotel bookings that were not canceled
city_data = hotel_df[(hotel_df['hotel'] == 'City Hotel') & (hotel_df['is_canceled'] == 0)]


In [None]:
# Count the number of guests arriving at Resort Hotels each month
guest_resort = resort_data['arrival_date_month'].value_counts().reset_index()

# Rename the columns to make them more descriptive
guest_resort.columns = ['month','count_of_guest']

# Count the number of guests arriving at City Hotels each month
guest_city = city_data['arrival_date_month'].value_counts().reset_index()

# Rename the columns to make them more descriptive
guest_city.columns = ['month','count_of_guest']


In [None]:
# Merge the guest data for Resort and City Hotels on the 'month' column
merged_hotel_data_for_bookings = guest_resort.merge(guest_city, on='month')

# Rename the columns to make them more descriptive
merged_hotel_data_for_bookings.columns = ['month', 'resort_hotel', 'city_hotel']


In [None]:
# Define a list of month names in the desired order
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

# Convert the 'month' column to an ordered categorical variable with the defined month order
merged_hotel_data_for_bookings['month'] = pd.Categorical(merged_hotel_data_for_bookings['month'], categories=month_order, ordered=True)

# Sort the DataFrame by month
merged_hotel_data_for_bookings = merged_hotel_data_for_bookings.sort_values('month')

# Reset the index of the DataFrame after sorting
merged_hotel_data_for_bookings = merged_hotel_data_for_bookings.reset_index(drop=True)

In [None]:
merged_hotel_data_for_bookings

2. Calculating the month-wise average ADR over the course of 3 years for both City Hotel and Resort Hotel. (Here I have considered only those bookings which were completed, i.e. excluded the **Cancelled Bookings**.)

In [None]:
# Group the Resort Hotel data by arrival month and calculate the mean ADR value for each month
resort_data1 = resort_data.groupby(['arrival_date_month'])['adr'].mean().reset_index()

# Group the City Hotel data by arrival month and calculate the mean ADR value for each month
city_data1 = city_data.groupby(['arrival_date_month'])['adr'].mean().reset_index()

In [None]:
# Merge the Resort Hotel and City Hotel ADR data on the 'arrival_date_month' column
merged_hotel_data = resort_data1.merge(city_data1, on='arrival_date_month')

# Rename the columns to make them more descriptive
merged_hotel_data.columns = ['month', 'resort_hotel', 'city_hotel']

In [None]:
# Define a list of month names in the desired order
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

# Convert the 'month' column to an ordered categorical variable with the defined month order
merged_hotel_data['month'] = pd.Categorical(merged_hotel_data['month'], categories=month_order, ordered=True)

# Sort the DataFrame by month
merged_hotel_data = merged_hotel_data.sort_values('month')

# Reset the index of the DataFrame after sorting
merged_hotel_data = merged_hotel_data.reset_index(drop=True)

# Print the resulting DataFrame
merged_hotel_data
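
The month-ordering steps are repeated for both merged tables, so they could be factored into one function. The sketch below is a possible refactoring under that assumption; `sort_by_month` and `MONTH_ORDER` are hypothetical names, and the `toy` frame is made-up data.

```python
import pandas as pd

MONTH_ORDER = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']


def sort_by_month(df, month_col='month'):
    """Return a copy of df sorted in calendar order of its month-name column."""
    out = df.copy()
    # An ordered categorical makes sort_values follow calendar order
    # instead of alphabetical order
    out[month_col] = pd.Categorical(out[month_col], categories=MONTH_ORDER, ordered=True)
    return out.sort_values(month_col).reset_index(drop=True)


# Made-up rows in arbitrary order
toy = pd.DataFrame({'month': ['March', 'January', 'February'], 'bookings': [3, 1, 2]})
sorted_toy = sort_by_month(toy)
print(sorted_toy)
```

Both `merged_hotel_data_for_bookings` and `merged_hotel_data` could then be passed through the same function.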

3. From which country did the maximum number of guests arrive ?

In [None]:
# Filtering the data by including only non cancelled reservations and then calculating each country's booking count
country_wise_max_guest = hotel_df[(hotel_df['is_canceled'] == 0)]['country'].value_counts().reset_index()

# Changing the column names
country_wise_max_guest.columns = ['name_of_country','number_of_guest']

# Display the resulting dataframe
country_wise_max_guest

4. Find the number of instances when the room assigned matched the room reserved and when it didn't. For this, take into consideration all the bookings, both cancelled and non cancelled.

In [None]:
# Checking those instances where reserved room types matched with assigned room types
rooms_match= hotel_df[hotel_df['reserved_room_type'] == hotel_df['assigned_room_type']].shape[0]

# Checking instances where the reserved room type does not match the assigned room type
rooms_unmatch = hotel_df[hotel_df['reserved_room_type'] != hotel_df['assigned_room_type']].shape[0]

# Display both counts
rooms_match,rooms_unmatch

5. Which room type was assigned the most by both the hotels ?


In [None]:
# Group hotel_df by assigned_room_type and hotel, and count the number of occurrences
grouped_assigned_room_type = hotel_df.groupby(['assigned_room_type', 'hotel'])['hotel'].size().reset_index(name='count')

# Reshape the data so that there is one row for each assigned room type and columns for each hotel type, with the counts in each cell
most_assigned_room_type = pd.pivot_table(grouped_assigned_room_type, values='count', index='assigned_room_type', columns='hotel').reset_index()

# Show the resulting table with counts of assigned room types per hotel type
most_assigned_room_type

6. Which room type was reserved most by the customers for both hotels ?

In [None]:
# Group the data by 'reserved_room_type' and 'hotel', and count the number of values for each group
grouped = hotel_df.groupby(['reserved_room_type', 'hotel'])['hotel'].size().reset_index(name='counts')

# Create a new DataFrame using pivot_table, with 'reserved_room_type' as the index,
# 'hotel' as the columns, and 'counts' as the values
most_reserved_room_type = pd.pivot_table(grouped, values='counts', index='reserved_room_type', columns='hotel').reset_index()

most_reserved_room_type

7. Which room type generated the maximum revenue throughout the 3 years ?

In [None]:
# Filter out the cancelled bookings; .copy() avoids a SettingWithCopyWarning when a column is added below
filtered_hotel_df = hotel_df[hotel_df['is_canceled'] == 0].copy()

# Calculate total revenue and add it to the DataFrame as a new column
filtered_hotel_df['total_revenue'] = filtered_hotel_df['adr'] * filtered_hotel_df['total_stay']

# Group the filtered DataFrame by 'assigned_room_type' and sum the 'total_revenue' column
revenue_by_room_type = filtered_hotel_df.groupby('assigned_room_type')['total_revenue'].sum().reset_index().sort_values(by = 'total_revenue', ascending = True)

# Display the resulting dataframe
revenue_by_room_type


8. How many bookings were cancelled because of mismatch in the rooms reserved vs rooms assigned ?

In [None]:
# Filter the data to only include canceled bookings
canceled_bookings = hotel_df[hotel_df["is_canceled"] == 1]

# Filter the data to only include canceled bookings where the reserved room type and assigned room type do not match
mismatched_bookings = canceled_bookings[canceled_bookings["reserved_room_type"] != canceled_bookings["assigned_room_type"]]

# Count the number of bookings that match this criterion
num_cancellations_due_to_mismatch = len(mismatched_bookings)

# Print the result
print("Number of cancellations due to mismatched room types:", num_cancellations_due_to_mismatch)

9. How many booking changes were made by the customers ? Again, for this include all the bookings irrespective of whether they were cancelled or not.

In [None]:
# Calculating the booking changes by individual values and storing them in a dataframe
booking_changes_count = hotel_df['booking_changes'].value_counts().reset_index()

# Changing the column names
booking_changes_count.columns = ['no_of_changes','count_of_those_changes']

# Display the resulting dataframe
booking_changes_count

10. Which customer category has generated the maximum revenue ? 

In [None]:
# Grouping the dataframe by customer type and calculating the sum of revenue and then sorting the values
revenue_generated_by_customer_type = hotel_df.groupby(['customer_type'])['revenue'].sum().reset_index().sort_values(by = 'revenue', ascending = True)

# Display the resulting dataframe
revenue_generated_by_customer_type

11. Which market segment has generated the maximum revenue ?

In [None]:
# Grouping the dataframe by market segment and then summing up the revenue and then sorting the values
revenue_by_market_segment = hotel_df.groupby(['market_segment'])['revenue'].sum().reset_index().sort_values(by = 'revenue', ascending = True)

# Display the resulting dataframe
revenue_by_market_segment

12. Which customer category has made the most booking changes ? (Consider here **Booking Changes** > 0)

In [None]:
#Grouping the dataframe according to the customer type for only those booking changes which are greater than 0 and then sorting the values
most_booking_changes_by_customer_type = hotel_df[hotel_df['booking_changes'] > 0].groupby(['customer_type'])['booking_changes'].count().reset_index().sort_values(by = 'booking_changes', ascending = True)

# Display the resulting dataframe
most_booking_changes_by_customer_type

13. Which customer category had the most number of repeated guest ?

In [None]:
# Filtering for repeated guests and then counting them for each customer type
repeated_customer_type = hotel_df[hotel_df['is_repeated_guest'] == 1].groupby(['customer_type'])['is_repeated_guest'].count().reset_index(name='count')

# Display the resulting dataframe
repeated_customer_type


14. Show the bifurcation of cancelled bookings according to distribution channel for both City Hotel and Resort Hotel.

In [None]:
# Filter the cancelled bookings only
cancelled_df = hotel_df.loc[hotel_df['is_canceled'] == 1]

# Group by distribution channel and count the cancellations
cancellations_by_distribution_channel = cancelled_df.groupby(['distribution_channel', 'hotel'])['is_canceled'].count().unstack()

cancellations_by_distribution_channel


15. Show the bifurcation of booking cancellations according to market segment for both City Hotel and Resort Hotel.

In [None]:
# Filter the cancelled bookings only
cancelled_df = hotel_df.loc[hotel_df['is_canceled'] == 1]

# Group by market segment and count the cancellations
cancellations_by_market_segment = cancelled_df.groupby(['market_segment', 'hotel'])['is_canceled'].count().unstack()

cancellations_by_market_segment


16. Show the bifurcation of booking cancellations according to customer category for both City Hotel and Resort Hotel.

In [None]:
# Filter the cancelled bookings only
cancelled_df = hotel_df.loc[hotel_df['is_canceled'] == 1]

# Group by customer type and count the cancellations
cancellations_by_customer_type = cancelled_df.groupby(['customer_type', 'hotel'])['is_canceled'].count().unstack()

cancellations_by_customer_type
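
The three cancellation breakdowns above differ only in the grouping column, so the pattern could be wrapped in a single function. This is a sketch under that assumption: `cancellations_by` is a hypothetical name, the `toy` frame is made-up data, and `fill_value=0` is a deliberate change from the cells above, which leave missing combinations as `NaN`.

```python
import pandas as pd


def cancellations_by(df, column):
    """Count cancelled bookings per value of `column`, with one column per hotel."""
    cancelled = df[df['is_canceled'] == 1]
    # size() counts rows per (column value, hotel) pair; unstack turns the
    # hotel level into columns and fills absent combinations with 0
    return cancelled.groupby([column, 'hotel']).size().unstack(fill_value=0)


# Made-up bookings; the last one is not cancelled and is ignored
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'City Hotel'],
    'is_canceled': [1, 1, 1, 0],
    'customer_type': ['Transient', 'Group', 'Transient', 'Transient'],
})

by_customer_type = cancellations_by(toy, 'customer_type')
print(by_customer_type)
```

The same call with `'distribution_channel'` or `'market_segment'` would cover the other two breakdowns.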


17. Show the average ADR for each customer category for both City Hotel and Resort Hotel. (Consider here all the bookings irrespective of whether they were cancelled or not.)

In [None]:
# Group by hotel type and customer type, and calculate average ADR
customer_cum_hotel_wise_adr = hotel_df.groupby(['hotel', 'customer_type'])['adr'].mean().unstack()

# Print the resulting dataframe
customer_cum_hotel_wise_adr

18. Show the average ADR for each market segment for both City Hotel and Resort Hotel. (Again consider here all the bookings irrespective whether they were cancelled or not.)

In [None]:
# Group by hotel type and market segment, and calculate average ADR
average_adr_df_for_market_segment = hotel_df.groupby(['hotel', 'market_segment'])['adr'].mean().unstack()

# Print the resulting dataframe
average_adr_df_for_market_segment

19. Through which distribution channel were the maximum bookings made for both the hotels ?

In [None]:
# Calculating the distribution channel booking for individual channels
max_bookings_through_distribution_channel = hotel_df['distribution_channel'].value_counts().reset_index()

# Changing the column names
max_bookings_through_distribution_channel.columns = ['distribution_channel','count_of_bookings']

# Display the resulting dataframe
max_bookings_through_distribution_channel

20. Which customer category had the most car park requirements, broken down by the number of parking spaces required ? Again, consider here all the bookings irrespective of whether they were cancelled or not.


In [None]:
# group the data by customer type and required car parking spaces
grouped_df = hotel_df.groupby(['customer_type', 'required_car_parking_spaces'])

# count the number of occurrences of each combination
count_df = grouped_df.size().reset_index()

# Changing the column names
count_df.columns = ['customer_type','required_car_parking_spaces','count_of_parkings_required']

# Display the resulting dataframe
count_df


21. Which customer category required the most car parking spaces in total ? (Again, consider here all the bookings irrespective of whether they were cancelled or not.)


In [None]:
# Grouping the dataframe customer wise and then summing the car parking spaces required
# (count() would only return the number of bookings per customer type)
most_car_park_required  = hotel_df.groupby(['customer_type'])['required_car_parking_spaces'].sum().reset_index()

# Display the resulting dataframe
most_car_park_required

22. Which customer category made the most special requests ? (Again, consider here all the bookings irrespective of whether they were cancelled or not.)

In [None]:
# Grouping the dataframe customer wise, summing the special requests made by the customers and then sorting those values
max_special_request_by_customer_type = hotel_df.groupby(['customer_type'])['total_of_special_requests'].sum().reset_index().sort_values(by = 'total_of_special_requests', ascending = True)

# Display the resulting dataframe
max_special_request_by_customer_type
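
For questions about totals such as parking spaces or special requests, it matters whether the groupby uses `count()` or `sum()`: the first returns the number of bookings in each group, the second adds up the values themselves. A minimal sketch on made-up rows illustrates the difference.

```python
import pandas as pd

# Made-up bookings with the number of special requests on each
toy = pd.DataFrame({
    'customer_type': ['Transient', 'Transient', 'Group'],
    'total_of_special_requests': [0, 3, 1],
})

grouped = toy.groupby('customer_type')['total_of_special_requests']

# Number of bookings per customer type (non-null rows), regardless of the values
bookings_per_type = grouped.count()

# Total number of special requests per customer type
requests_per_type = grouped.sum()

print(bookings_per_type)
print(requests_per_type)
```

Here the Transient group has 2 bookings but 3 requests, so the two aggregations can rank customer types differently.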


23. How many bookings were cancelled because of a long waiting period, i.e. more than 10 days on the waiting list, for both the hotels ?

In [None]:
# filter the data to include only cancelled bookings with a waiting time of more than 10 days
cancelled_long_wait = hotel_df[(hotel_df['is_canceled'] == 1) & (hotel_df['days_in_waiting_list'] > 10)]

# count the number of bookings that meet the criteria
num_cancelled_long_wait = len(cancelled_long_wait)

# print the result
print(f"The number of bookings cancelled due to a waiting period of more than 10 days is {num_cancelled_long_wait}")


24. Which meal category was considered the most by the customers of both the hotels ?

In [None]:
# Calculating the individual meal count of each categories for both cancelled and non cancelled bookings
most_meal_type = hotel_df['meal'].value_counts().reset_index()

# Changing the column names for the above
most_meal_type.columns  = ['meal_catgories','meal_count']

# Display the resulting dataframe
most_meal_type

25. Which meal category was (consumed) the most considering only non cancelled bookings ?




In [None]:
# Filtering the dataframe by considering only non cancelled bookings and then calculating the individual meal count
most_meal_type_by_non_cancelled_bookings = hotel_df[(hotel_df['is_canceled'] ==  0)]['meal'].value_counts().reset_index()

# Changing the column names for above
most_meal_type_by_non_cancelled_bookings.columns = ['meal_categories','meal_count']

# Display the resulting dataframe
most_meal_type_by_non_cancelled_bookings

26. Which country contributed maximum in terms of revenue generations ?

In [None]:
# Keeping only the bookings which were not cancelled, grouping them country wise, summing up the revenue and finally sorting the values
country_wise_max_revenue = hotel_df[(hotel_df['is_canceled'] == 0)].groupby(['country'])['revenue'].sum().reset_index().sort_values(by = 'revenue', ascending = False)

# Display the resulting dataframe
country_wise_max_revenue


27. Which hotel generated the maximum revenue in the course of 3 years ?

In [None]:
# Grouping the dataframe hotel wise by filtering out cancelled bookings and then summing up the revenue
hotel_max_revenue  = hotel_df[(hotel_df['is_canceled'] == 0)].groupby(['hotel'])['revenue'].sum().reset_index()

# Display the resulting dataframe
hotel_max_revenue

28. Show the bifurcation of meal categories for each customer type for only non cancelled bookings.

In [None]:
# Filter out the cancelled bookings
non_cancelled_bookings = hotel_df[hotel_df['is_canceled'] == 0]

# Group the data by customer type and meal category, and count the number of bookings
meal_bookings_by_customer_type = non_cancelled_bookings.groupby(['customer_type', 'meal']).size().unstack().reset_index()

meal_bookings_by_customer_type


### What all manipulations have you done and insights you found?

1. When examining the bookings for both hotels over a period of three years, it is evident that the city hotel had a greater number of successful bookings than the resort hotel, with the difference between the two widening during the middle months of the year. Additionally, towards the end of the year, there was a decline in successful bookings for both hotels.


2. When considering the average ADR of both city and resort hotels for each month over a period of three years, it is observed that the mean ADR of the city hotel is higher than that of the resort hotel. Furthermore, during the middle months, the ADR of the city hotel remains relatively stable, while there is a significant upward trend in the ADR of the resort hotel, which eventually declines towards the end of the period.

3. It was also discovered that out of a total of 24,056 cancellations, only 617 were due to a mismatch in the rooms assigned versus those reserved by the customers. This accounts for just 2.56% of the total cancellations, suggesting that there may be various other reasons why customers cancel their bookings at both hotels besides room assignment mismatches.


4. The number of bookings that were cancelled due to a waiting period of more than 10 days was 268. When compared to the total number of bookings that were cancelled over the course of these three years, this represents a very small percentage. This suggests that the bookings management system for this hotel is well-managed, as cancellations due to waiting time are relatively low.

5. Among the total number of guests who arrived at the hotel, there were 3,415 repeated guests. Of these repeated guests, the highest number of repetitions came from the transient party category.

6. Although the city hotel had a greater number of cancellations, it generated revenue that was approximately 13 lakhs higher than that of the resort hotel.

7. In the context of hotel booking analysis, finding a 15% mismatch in the rooms reserved versus the rooms assigned by the hotels suggests that there may be issues with the hotel's room allocation process or with the booking system. This may result in customer dissatisfaction, cancellations, or potentially even revenue loss for the hotel. Improvements to the booking system or room allocation process could help reduce the number of mismatches and improve the overall customer experience.
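
The percentages quoted in insights 3 and 4 follow from simple ratios of the counts stated there. The sketch below reproduces that arithmetic using only the figures from the text (24,056 cancellations, 617 with a room mismatch, 268 with more than 10 days on the waiting list); the variable names are illustrative.

```python
# Figures quoted in the insights above
total_cancellations = 24056
mismatched_cancellations = 617
long_wait_cancellations = 268

# Share of all cancellations that involved a room type mismatch
mismatch_share = mismatched_cancellations / total_cancellations * 100

# Share of all cancellations that had more than 10 days on the waiting list
long_wait_share = long_wait_cancellations / total_cancellations * 100

print(f"Mismatch share of cancellations: {mismatch_share:.2f}%")
print(f"Long-wait share of cancellations: {long_wait_share:.2f}%")
```

Both shares are small relative to the total, which is consistent with the conclusion that most cancellations have other causes.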

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Barplot (Hotel Booking comparison month wise.)

In [None]:
# Set the figure size and dots per inch (dpi) for the plot
plt.figure(figsize=(10, 6), dpi=100)

# Set the style of the plot to 'darkgrid'
sns.set_style('darkgrid')

# Melt the dataframe to combine the resort and city hotel columns into one column
merged_hotel_data_for_bookings_melted = pd.melt(merged_hotel_data_for_bookings, id_vars=['month'], value_vars=['resort_hotel', 'city_hotel'], var_name='hotel_type', value_name='count')

# Create a bar plot showing the counts of bookings for both resort and city hotels for each month
sns.barplot(data=merged_hotel_data_for_bookings_melted, x='month', y='count', hue='hotel_type')

# Label the y-axis with bold font
plt.ylabel('Count of Bookings', fontdict={'fontsize': 12, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Label the x-axis with bold font
plt.xlabel('Month', fontdict={'fontsize': 12, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Add a bold title to the plot
plt.title('Hotel Bookings by Month', fontdict={'fontsize': 14, 'fontweight': 'bold', 'fontfamily': 'serif'}, pad=20)

# Rotating the xticks
plt.xticks(rotation = 90)

# Show the legend
plt.legend()

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

I picked this chart because a grouped bar chart in Seaborn lets me compare the number of bookings made for the resort hotel and the city hotel in each month.

Firstly, it provides a clear and concise way to compare the booking trends for different types of hotels. Secondly, it can help me identify any seasonal patterns or fluctuations in hotel bookings, which can be useful for optimizing hotel operations and revenue generation. 

##### 2. What is/are the insight(s) found from the chart?

It is evident that the city hotel had more bookings than the resort hotel throughout the three years of data collection, in every month.
Additionally, the gap between the number of bookings for the city hotel and the resort hotel widens in the middle months of the year.
This suggests that the city hotel is more popular than the resort hotel and that there may be seasonal trends that impact the popularity of the two hotel types differently. 
This insight could be useful for hotel management to understand customer preferences and adjust their marketing and operational strategies accordingly.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.


Yes, the gained insights can help create a positive business impact. For example, the insight that the city hotel had more bookings than the resort hotel throughout the three years and that the gap between the number of bookings for the city hotel and the resort hotel widens in the middle months of the year can help hotel management understand customer preferences and adjust their marketing and operational strategies accordingly. They may focus on advertising the city hotel more in the middle months of the year to increase bookings and revenue.

However, there is no insight provided that leads to negative growth. All of the insights provided in the context are based on observations of booking trends and can help hotel management make data-driven decisions that could result in positive business impact.
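
The plotting cell expects a table with one row per month and one booking count per hotel. How that table is built is not shown in this section, so the following is only a sketch of one possible construction. It assumes a hypothetical `arrival_date_month` column and uses made-up rows; `monthly_counts` and `gap` are illustrative names. Treating the month as an ordered categorical keeps the rows in calendar order rather than alphabetical order.

```python
import calendar
import pandas as pd

# Toy bookings; `arrival_date_month` is an assumed column name
toy_df = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel',
              'City Hotel', 'Resort Hotel', 'City Hotel'],
    'arrival_date_month': ['July', 'July', 'July',
                           'August', 'August', 'January'],
})

# Order months by calendar position so they do not sort alphabetically
month_order = list(calendar.month_name)[1:]
toy_df['arrival_date_month'] = pd.Categorical(
    toy_df['arrival_date_month'], categories=month_order, ordered=True
)

# Count bookings per month and hotel, then spread hotels into columns
counts = (
    toy_df.groupby(['arrival_date_month', 'hotel'], observed=True)
    .size()
    .unstack(fill_value=0)
)

# Rename to the column names used by the plotting cell
monthly_counts = (
    counts.rename(columns={'City Hotel': 'city_hotel',
                           'Resort Hotel': 'resort_hotel'})
    .reset_index()
    .rename(columns={'arrival_date_month': 'month'})
)
monthly_counts.columns.name = None

# Difference between city and resort bookings in each month
monthly_counts['gap'] = (
    monthly_counts['city_hotel'] - monthly_counts['resort_hotel']
)
print(monthly_counts)
```

A `gap` column of this kind makes the claim about the widening mid-year difference checkable as numbers, not only by eye.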

#### Chart - 2 Lineplot (Analysis of ADR of both City Hotel and Resort Hotel month wise.)

In [None]:
# Chart - 2 visualization code

# Set the figure size and dots per inch (dpi) for the plot
plt.figure(figsize=(18, 5), dpi=150)

# Set the style of the plot to 'darkgrid'
sns.set_style('darkgrid')

# Create a line plot for the resort hotel and city hotel data
sns.lineplot(data=merged_hotel_data, x='month', y='resort_hotel', label='Resort Hotel')
sns.lineplot(data=merged_hotel_data, x='month', y='city_hotel', label='City Hotel')

# Rotate the x-axis labels by 90 degrees
plt.xticks(rotation=90)

# Label the y-axis with bold font
plt.ylabel('Average ADR', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=12,)

#Label the x-axis with bold font
plt.xlabel('Month', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=12,)


# Add a bold title to the plot
plt.title('Average Daily Rate (ADR) by Hotel Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=15,)

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

I chose this chart to visualize the average daily rate (ADR) for two types of hotels - the resort hotel and the city hotel. The chart shows the changes in ADR over the course of different months. This chart can be useful to identify trends and patterns in the ADR for each hotel type and to compare the ADR between the two hotels.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that the city hotel has a higher ADR than the resort hotel throughout the three-year period, except from June to September, when the ADR for the resort hotel exceeds that of the city hotel. This insight could be useful for hotel management to understand the seasonal trends in pricing and demand for the two types of hotels. The city hotel may choose to adjust its pricing during the peak season (June to September) to stay competitive with the resort hotel and capture more revenue.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.



Yes, the gained insights from hotel booking analysis can definitely help create a positive business impact for hotels.

For example, if the analysis shows that the resort hotel has a higher ADR during peak season, the city hotel could adjust its pricing to be more competitive during that time and capture more revenue. Similarly, if the analysis shows that customers are increasingly looking for specific amenities or services, the hotel could consider adding those to its offerings to attract more bookings.


There can also be insights from hotel booking analysis that lead to negative growth. For example, if the analysis shows that there is a declining trend in demand for a certain type of hotel room or service, hotel management may need to adjust their offerings or pricing to remain competitive. Failure to do so could result in a loss of business to competitors and a decrease in revenue.
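
The month-wise comparison of ADR can also be expressed as a small table. The sketch below is an illustration on made-up rows, not the project's own aggregation: it assumes the columns `adr` and `arrival_date_month`, and `mean_adr` and `resort_higher` are hypothetical names.

```python
import pandas as pd

# Toy data; `adr` and `arrival_date_month` are assumed column names
toy_df = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel',
              'City Hotel', 'Resort Hotel', 'City Hotel', 'Resort Hotel'],
    'arrival_date_month': ['January', 'January', 'January', 'January',
                           'August', 'August', 'August', 'August'],
    'adr': [80.0, 90.0, 50.0, 60.0, 120.0, 180.0, 110.0, 190.0],
})

# Mean ADR per month (rows) and hotel (columns)
mean_adr = toy_df.pivot_table(
    index='arrival_date_month', columns='hotel', values='adr', aggfunc='mean'
)

# Months in which the resort hotel's mean ADR exceeds the city hotel's
resort_is_higher = mean_adr['Resort Hotel'] > mean_adr['City Hotel']
resort_higher = mean_adr[resort_is_higher].index.tolist()

print(mean_adr)
print('Resort ADR higher in:', resort_higher)
```

Listing the months explicitly avoids reading the crossover points off the line plot by eye.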

#### Chart - 3 Basemap

In [None]:
# Chart - 3 visualization code

# Create an empty folium base map (the choropleth below is drawn separately and does not use it)
Base_map = folium.Map()

# Create a choropleth map of the countries in country_wise_max_guest, colored by country name
Guest_map = pltx.choropleth(country_wise_max_guest, 
                            locations=country_wise_max_guest['name_of_country'],
                            color=country_wise_max_guest['name_of_country'],
                            hover_name=country_wise_max_guest['name_of_country'])

# Display the map
Guest_map.show()

##### 1. Why did you pick the specific chart?

I chose this choropleth map to visualize country-wise data for the maximum number of guests because it provides an intuitive way to display this information on a geographic map. This type of map is particularly useful when analyzing data that is location-based, such as travel and hospitality industry data. By color-coding countries based on the maximum number of guests, I can quickly identify which countries have the highest numbers of guests and which have the lowest.

##### 2. What is/are the insight(s) found from the chart?

The insight from the data analysis is that PRT, GBR, FRA, ESP, DEU, IRL, ITA, and BEL are the top countries in terms of guest arrival at the hotel. This information is valuable to hotel management as it can inform decisions related to marketing, customer service, and staffing. By focusing on the needs and preferences of guests from these countries, hotel management can work to improve the guest experience and tailor their services to meet the specific demands of these guests. 

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights from hotel booking analysis can help create a positive business impact for hotels. By identifying the top countries in terms of guest arrival, hotel management can tailor their marketing efforts to target these countries specifically. This can include creating targeted marketing campaigns in the local language of these countries, partnering with travel agencies in these countries, and offering promotions or discounts specifically for guests from these countries.


It is unlikely that insights from guest arrival data analysis would lead to negative growth for hotels. However, if the analysis shows that guests from certain countries are dissatisfied with certain aspects of the hotel, such as cleanliness or customer service, and those issues are not addressed, it could lead to negative reviews and a decrease in bookings from those countries. 
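
The ranking of countries behind the map can be reproduced with a simple count. This section does not show how `country_wise_max_guest` was built, so the sketch below is only an assumption-laden illustration: it uses made-up rows, an assumed `country` column holding ISO-3 codes, and a hypothetical `guest_count` column. If `pltx` is Plotly Express, as the call signature suggests, a numeric column like this could be passed to the `color` argument so that the shading reflects guest volume.

```python
import pandas as pd

# Toy arrivals; `country` is an assumed column holding ISO-3 codes
toy_df = pd.DataFrame({
    'country': ['PRT', 'PRT', 'GBR', 'FRA', 'PRT', 'GBR', 'ESP', 'PRT'],
    'is_canceled': [0, 0, 0, 0, 1, 0, 0, 0],
})

# Keep only bookings that were not cancelled
arrived = toy_df[toy_df['is_canceled'] == 0]

# Count bookings per country, largest first (ties broken alphabetically)
country_counts = (
    arrived.groupby('country').size()
    .reset_index(name='guest_count')
    .sort_values(['guest_count', 'country'], ascending=[False, True])
    .reset_index(drop=True)
    .rename(columns={'country': 'name_of_country'})
)

print(country_counts.head(3))
```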


#### Chart - 4 Histplot (Analysis of Waiting List Days.)

In [None]:
# Chart - 4 visualization code

# Set the figure size and dots per inch (dpi) for the plot
plt.figure(figsize = (10,6), dpi = 100)

# Set the style of the plot to 'whitegrid' with a light grey background
sns.set_style('whitegrid', {'axes.facecolor': '0.95'})

# Plot the histogram using seaborn with blue bars
sns.histplot(data = hotel_df, x = 'days_in_waiting_list', bins = 60, color='blue')

# Set the title and axis labels with bold font
plt.title('Distribution of Days in Waiting List',fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=20,)
plt.xlabel('Days in Waiting List', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,)
plt.ylabel('Frequency', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

I chose to use a histogram to visualize the distribution of days in the waiting list because it allows me to easily see how the waiting list is distributed across different time periods. By using the histplot function from the seaborn library, I can create a chart that shows the frequency of days in the waiting list in a clear and concise manner. Additionally, by setting the number of bins to 60, I can create a detailed view of the distribution, making it easier to identify any outliers or trends in the data. This chart is useful for identifying how long guests are typically waiting on the waiting list, which can help hotel management optimize their reservation and check-in processes to reduce guest wait times and improve the overall guest experience.

##### 2. What is/are the insight(s) found from the chart?

If 97% of customers had a short waiting period before checking into their rooms, this could suggest that the hotel has an efficient process for managing bookings and check-ins, and most customers were able to access their rooms quickly. Alternatively, it could indicate that the hotel has a policy of overbooking rooms, resulting in a shorter waiting list, which could be a potential source of dissatisfaction for guests. In either case, it is important for the hotel to strike a balance between ensuring a smooth check-in process and providing guests with flexibility and options to manage their booking preferences.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights from the histogram chart of days in waiting list can definitely help create a positive business impact for hotels. By understanding the distribution of days in waiting list, hotel management can identify potential bottlenecks in their booking and reservation process and make necessary improvements. For example, if a significant number of guests are consistently waiting for long periods of time, the hotel may need to consider increasing the number of staff or improving their booking technology to reduce wait times and improve the guest experience.

There may be insights from the histogram chart of days_in_waiting_list that lead to negative growth for hotels. For example, if the analysis shows that a significant number of guests are consistently waiting for long periods of time, and the hotel is unable to address this issue, it could lead to negative reviews and a decrease in bookings. 
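
Because the histogram is dominated by a single bar near zero, a few summary numbers can make the distribution easier to read. The sketch below uses made-up values for `days_in_waiting_list`; the variable names and the 10-day threshold (taken from the earlier conclusion about waits of more than 10 days) are illustrative assumptions.

```python
import pandas as pd

# Toy values; most bookings never enter the waiting list
toy_df = pd.DataFrame(
    {'days_in_waiting_list': [0, 0, 0, 0, 0, 0, 0, 3, 15, 120]}
)
waits = toy_df['days_in_waiting_list']

# Percentage of bookings that were never on the waiting list
share_no_wait = (waits == 0).mean() * 100

# Percentage of bookings that waited more than 10 days
share_over_10_days = (waits > 10).mean() * 100

# Median wait among only those bookings that did wait
median_wait_if_waiting = waits[waits > 0].median()

print(f'No wait: {share_no_wait:.1f}%')
print(f'More than 10 days: {share_over_10_days:.1f}%')
print(f'Median wait when waiting: {median_wait_if_waiting} days')
```

Reporting the median among waiting bookings separately keeps the large number of zero-day bookings from hiding how long the remaining guests actually wait.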


#### Chart - 5 Histplot (Analysis of Lead Time.)

In [None]:
# Chart - 5 visualization code
# Set the size of the figure and the resolution in dots per inch (dpi)
plt.figure(figsize = (14,7), dpi = 110)

sns.set_style('darkgrid')

# Create a histogram plot of lead time for both City and Resort hotels with default style
sns.histplot(data=hotel_df, x="lead_time", bins=50)

# Rotate the x-axis labels by 90 degrees for better readability
plt.xticks(rotation = 90)

# Set the y-axis label for the frequency of lead times
plt.ylabel('Frequency of lead times of both the hotels', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,)

# Set the x-axis label for the range of different lead times in both the hotels
plt.xlabel('Range of different lead times in both the hotels',fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,)

# Set the title of the plot with bold font
plt.title('Histplot showing the distribution of lead time over the 3 year period for both City and Resort Hotels',fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=20,)

# Display the plot
plt.show()



##### 1. Why did you pick the specific chart?

I would choose to use a histogram to visualize the distribution of the lead_time variable for several reasons. Firstly, histograms are easy to understand and familiar to most people, as the bars represent the frequency or count of data points that fall within a certain range of values. Secondly, by using a histogram, I can identify patterns and trends in the data, such as whether most of the lead times fall within a certain range, or whether there are peaks or valleys in the distribution.

##### 2. What is/are the insight(s) found from the chart?

Based on the data, most lead times fall between 0 and 100 days. This suggests that most customers tend to make their bookings within a relatively short time frame before their intended stay. This could have implications for hotel management, as they may want to adjust their pricing or marketing strategies to better target customers who book further in advance or customers who tend to book last minute. Additionally, it may be worthwhile to investigate why customers are making last-minute bookings, as this could provide insights into potential areas for improvement or opportunities to provide better customer service.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights from the analysis of lead time can definitely help create a positive business impact for hotels.

For example, if the data shows that a significant number of customers tend to book last minute, the hotel may want to consider offering special last-minute deals or promotions to incentivize these customers to book with them rather than their competitors. Alternatively, if the data shows that a significant number of customers tend to book well in advance, the hotel may want to consider offering early booking discounts or special packages to target this segment of customers.
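
To separate last-minute bookers from early bookers, the lead times can be grouped into bands. The sketch below uses made-up values; the band edges (7, 30, 100 and 365 days) are arbitrary assumptions chosen for illustration, not thresholds from the analysis.

```python
import pandas as pd

# Toy lead times in days
toy_df = pd.DataFrame({'lead_time': [0, 2, 14, 45, 99, 100, 150, 320]})

# Hypothetical band edges; right edges are inclusive, and 0 is included
bins = [0, 7, 30, 100, 365]
labels = ['0-7', '8-30', '31-100', '101-365']
toy_df['lead_time_band'] = pd.cut(
    toy_df['lead_time'], bins=bins, labels=labels, include_lowest=True
)

# Percentage of bookings in each band, kept in band order
band_share = (
    toy_df['lead_time_band'].value_counts(normalize=True, sort=False) * 100
)

# Percentage of bookings made at most 100 days ahead
share_within_100 = (toy_df['lead_time'] <= 100).mean() * 100

print(band_share)
print(f'Within 100 days: {share_within_100:.1f}%')
```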

#### Chart - 6 Barplot (Comparing Assigned Room Types between Two Hotels Using a Bar Plot)

In [None]:
# Create a grouped bar chart using pandas built-in plot function
fig, ax = plt.subplots(figsize=(14,6), dpi=110)
most_assigned_room_type.plot(x='assigned_room_type', kind='bar', width=0.8, ax=ax)

# Set axis labels and title with padding

ax.set_xlabel('Assigned Room Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,)
ax.set_ylabel('Count of Bookings',fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,)
ax.set_title('Assigned Room Type by Hotel', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=20,)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

I would choose to use this bar plot to visualize the different types of rooms assigned by the city hotel and resort hotel for several reasons. Firstly, the use of bars makes it easy to compare the different categories of assigned rooms and quickly identify which categories are the most common. Secondly, the plot provides a clear and concise overview of the distribution of assigned room types, which can help inform decisions about pricing and resource allocation. Thirdly, the plot is visually appealing and easy to understand, making it suitable for both technical and non-technical audiences. Finally, the plot allows for easy customization, such as changing the color scheme or adding labels, to meet the specific needs of the analysis. 

##### 2. What is/are the insight(s) found from the chart?

This chart provides an interesting insight into the different types of rooms assigned by the city hotel and resort hotel. The bar plot shows that the majority of the rooms assigned by both hotels were of category A, followed by category B. This could indicate that these room types are the most popular or most in-demand among customers. As a result, hotel management may want to consider pricing these room types accordingly to optimize revenue. Additionally, understanding which room types are most popular among customers can help hotels allocate their resources more efficiently, such as ensuring that they have enough staff available to clean and prepare these rooms for the next guest.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by allowing hotel management to price the most popular room types appropriately and allocate resources more efficiently.

There are no insights provided that would lead to negative growth. However, if the hotel management fails to act on the insights gained and does not adjust their pricing or resource allocation accordingly, it could potentially result in missed revenue opportunities and decreased customer satisfaction.
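
Raw counts favor the hotel with more bookings, so comparing room types as a percentage within each hotel can be more informative. The sketch below illustrates this on made-up rows; `room_share` and `top_type` are hypothetical names.

```python
import pandas as pd

# Toy assignments; the values are made up for illustration only
toy_df = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'City Hotel',
              'Resort Hotel', 'Resort Hotel', 'Resort Hotel', 'Resort Hotel'],
    'assigned_room_type': ['A', 'A', 'D', 'A', 'D', 'D', 'E'],
})

# Percentage of each assigned room type within each hotel (rows sum to 100)
room_share = pd.crosstab(
    toy_df['hotel'], toy_df['assigned_room_type'], normalize='index'
) * 100

# Most frequently assigned room type per hotel
top_type = room_share.idxmax(axis=1)

print(room_share.round(1))
print(top_type)
```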

#### Chart - 7 Barplot (Comparison of Reserved Room Types between Two Hotels using a Bar Plot.)

In [None]:

# Create a grouped bar chart using pandas built-in plot function
fig, ax = plt.subplots(figsize=(14,6), dpi=110)
most_reserved_room_type.plot(x='reserved_room_type', kind='bar', width=0.8, ax=ax)

# Set axis labels and title with padding
ax.set_xlabel('Reserved Room Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,)
ax.set_ylabel('Count of bookings', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,)
ax.set_title('Reserved Room Type by Hotel',fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=20,)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

I chose this bar plot to compare the number of bookings for each reserved room type at the City and Resort hotels. The plot makes it easy to see which room categories customers reserve most often at each hotel. This information can be useful for hotel management when making decisions regarding pricing and resource allocation.

##### 2. What is/are the insight(s) found from the chart?

Based on the insight that room categories A, D, and E were reserved the most by customers for both hotels, we can make the following observations:

Room categories A, D, and E are the most popular among customers, which could mean that these rooms offer desirable amenities or are priced attractively compared to other room categories.

The popularity of these room categories could also be an indicator of the customer's budget or travel preferences.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights about the most reserved room categories (A, D, and E) can potentially create a positive business impact for the hotels. The hotels can use this information to better understand the preferences of their customers and use it to optimize their pricing strategies and marketing efforts. By highlighting the amenities and features of these popular room categories, the hotels can attract more customers and increase their revenue.


However, there could be potential negative impacts if the hotels become overly reliant on these popular room categories and neglect the other ones. This could lead to a lack of variety in room options, which might turn away customers who are looking for specific features or amenities that are not available in these popular room categories. Additionally, if the hotels increase the prices of these popular room categories too much, it could lead to a decline in customer satisfaction and ultimately hurt the reputation of the hotels. Therefore, it is important for the hotels to use this insight as a guide and not as a sole basis for their business decisions.

#### Chart - 8 Countplot (Analysis of Frequency in Booking Changes)

In [None]:
# Chart - 8 visualization code

# Set the figure size and DPI
plt.figure(figsize=(14,6), dpi=110)

# Set the plot style and the face color of the plot
sns.set_style('darkgrid')

# Create a count plot of the booking changes variable in the hotel_df dataset
sns.countplot(x='booking_changes', data=hotel_df)

# Set the x-axis label with custom font style
plt.xlabel('Number of Booking Changes', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Set the y-axis label with custom font style
plt.ylabel('Frequency', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Set the plot title with custom font style
plt.title('Frequency of Booking Changes',fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, pad= 20)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

I would recommend using a countplot to visualize the frequency of booking changes made by customers. A countplot is a clear and concise way to display the distribution of a discrete variable, in this case the number of booking changes. It allows you to easily see the most common number of changes made by customers, as well as any outliers or uncommon values.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that most bookings had 0 changes, with a frequency of about 70k, followed by 1 change with a frequency of about 10k. This suggests that most customers did not make any changes to their bookings after making the initial reservation, which may indicate that the hotel's booking process is user-friendly and provides sufficient information to customers at the time of booking.

However, it is important to note that there were still a significant number of customers who made at least one change to their booking. As a hotel booking analyst, it would be worth exploring the reasons behind these changes, such as changes in travel plans, preferences, or unexpected circumstances.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insight can help create a positive business impact. By recognizing that the majority of customers did not make changes to their bookings, it indicates that the hotel's booking process is user-friendly and effective, which can lead to increased customer satisfaction and loyalty.
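
The counts read off the countplot can also be tabulated together with a cumulative percentage. The sketch below is an illustration on made-up values for `booking_changes`; `change_table` and `cum_pct` are hypothetical names.

```python
import pandas as pd

# Toy values for the number of changes made to each booking
toy_df = pd.DataFrame({'booking_changes': [0, 0, 0, 0, 0, 0, 1, 1, 2, 4]})

# Frequency of each number of changes, ordered by the number of changes
counts = toy_df['booking_changes'].value_counts().sort_index()

# Frequency table with the cumulative percentage of bookings
change_table = pd.DataFrame({
    'count': counts,
    'cum_pct': counts.cumsum() / counts.sum() * 100,
})

print(change_table)
```

The `cum_pct` column shows directly what share of bookings had at most a given number of changes.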

#### Chart - 9 Pie Chart (Analysis of Booking Percentage of both hotels)

In [None]:
# Count the number of bookings for each hotel type
hotel_counts = hotel_df['hotel'].value_counts()

# Define the colors for the pie slices
colors = ['#FFA500', '#1F77B4']

# Create a pie plot with styling
plt.figure(figsize=(10,6), dpi=110)
plt.pie(hotel_counts, labels=hotel_counts.index, colors=colors, explode=[0.05, 0], autopct='%1.1f%%', startangle=90,
        textprops={'fontsize': 12, 'fontfamily': 'sans-serif'}, wedgeprops={'width': 0.5}, shadow=True)

# Set the title and legend of the plot with styling
plt.title('Percentage of Bookings by Hotel Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
          pad=15)
# Take the legend labels from hotel_counts so they always match the slice order
plt.legend(loc='upper right', labels=list(hotel_counts.index), prop={'size': 10, 'family': 'sans-serif'})

# Remove unnecessary chart borders and ticks
plt.box(False)
plt.axis('equal')
plt.tick_params(axis='both', which='both', length=0)

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

I would pick a pie chart to analyze the number of bookings by hotel type because it provides a clear and concise visual representation of the data. A pie chart can easily show the proportion of bookings for each hotel type, making it easy to compare the number of bookings between the two hotel types. Additionally, the use of colors in the chart can help to make the data more visually appealing and easier to understand.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that the majority of bookings came from the City Hotel, accounting for 61.1% of all bookings, while the remaining 38.9% came from the Resort Hotel. This suggests that the City Hotel is a more popular choice among customers, which could be due to its location, amenities, or pricing strategy.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insight can help create a positive business impact. By recognizing that the City Hotel is more popular among customers, the hotel management can focus on leveraging this popularity to attract even more customers, potentially through targeted marketing campaigns or promotions. Additionally, understanding why the City Hotel is more popular, such as its location or pricing strategy, can help inform strategic decisions to further optimize revenue and customer satisfaction.

#### Chart - 10 Scatterplot (Analysis of Total stay vs Waiting days list)

In [None]:
# Set figure size and DPI for the plot
plt.figure(figsize=(14,6), dpi=110)

# Set the style of the plot
sns.set_style('darkgrid')

# Create scatter plot with days_in_waiting_list on x-axis and total_stay on y-axis
sns.scatterplot(x='days_in_waiting_list', y='total_stay', data=hotel_df, alpha=0.5, color='#1F77B4')

# Set the y-label and x-label with styling
plt.ylabel('Total Stay', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=12)
plt.xlabel('Waiting Days List', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=12)

# Set the title of the plot with styling
plt.title('Relationship between Total Stay and Waiting Days List', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
          pad=15)

# Add a horizontal line to highlight the average total stay
plt.axhline(y=hotel_df['total_stay'].mean(), color='red', linestyle='--', linewidth=1.5, label='Average Total Stay')

# Add a legend with styling
plt.legend(loc="upper right", prop={'size': 12, 'family': 'sans-serif'})

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

In this case, the scatterplot can provide insights into the typical length of stay and waiting times of hotel guests, as well as potential minimum stay requirements or booking behaviors. Additionally, it can help identify any outliers or unusual data points that may warrant further investigation.

##### 2. What is/are the insight(s) found from the chart?

The scatterplot shows the relationship between the number of waiting days and the total number of stay days in a hotel. The majority of data points on the scatterplot are clustered in the lower left portion, indicating that most guests had short stays and did not have to wait long before checking in.



There is a straight line of data points extending up to 70 units on the y-axis, which may indicate a minimum stay requirement of 1-2 nights or suggest that some guests book a room for a minimum number of nights regardless of how long they have to wait to check in.


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the scatterplot can potentially help create a positive business impact. For instance, identifying the pattern of short stays and low waiting times can help hotels adjust their pricing and room allocation strategies to maximize revenue.


However, there are also insights that could potentially lead to negative growth. For example, if the outliers in the scatterplot represent a significant number of guests who had to wait an exceptionally long time before checking in, this could indicate a problem with the hotel's booking and check-in processes. This may lead to negative customer reviews and could potentially result in a loss of business.
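
The scatterplot relies on a `total_stay` column whose derivation is not shown in this section. The sketch below shows one plausible derivation and a numeric check of the relationship. It uses made-up rows and assumes the columns `stays_in_weekend_nights` and `stays_in_week_nights`; the `waitlisted` grouping is introduced here for illustration only.

```python
import pandas as pd

# Toy bookings; the two stay columns are assumed names
toy_df = pd.DataFrame({
    'stays_in_weekend_nights': [0, 2, 1, 0, 2],
    'stays_in_week_nights':    [2, 3, 1, 1, 5],
    'days_in_waiting_list':    [0, 0, 40, 0, 0],
})

# Total nights stayed = weekend nights + week nights
toy_df['total_stay'] = (
    toy_df['stays_in_weekend_nights'] + toy_df['stays_in_week_nights']
)

# Pearson correlation between length of stay and days on the waiting list
corr = toy_df['total_stay'].corr(toy_df['days_in_waiting_list'])

# Mean stay for bookings that were waitlisted versus those that were not
waitlisted = toy_df['days_in_waiting_list'].gt(0).map(
    {True: 'waitlisted', False: 'not waitlisted'}
)
mean_stay = toy_df.groupby(waitlisted)['total_stay'].mean()

print(f'Correlation: {corr:.2f}')
print(mean_stay)
```

A correlation close to zero would support reading the two variables as largely unrelated, which a dense scatterplot can make hard to judge.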



#### Chart - 11 Barplot (Comparison of Bookings done through different Distribution Channels)

In [None]:
# Group the data by distribution channel and count the number of bookings made through each channel
max_bookings_through_distribution_channel = hotel_df.groupby('distribution_channel')['distribution_channel'].count().reset_index(name='count_of_bookings')

# Define a custom color palette with bright colors
custom_palette = ['#FF5733', '#FFC300', '#DAF7A6']

# Create a bar plot to compare the number of bookings made through each distribution channel
plt.figure(figsize=(14,6), dpi=100)
sns.set_style('darkgrid')
sns.barplot(
    data=max_bookings_through_distribution_channel,
    x='distribution_channel',
    y='count_of_bookings',
    palette=custom_palette,
    alpha=0.8,  
)

# Rotate the x-axis labels to make them more readable
plt.xticks(rotation=90)

# Add axis labels and a title to the plot
plt.xlabel(
    'Distribution channel',
    fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15
)
plt.ylabel(
    'No of bookings made',
    fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,
)
plt.title(
    'Comparison of bookings made through different distribution channels',
    fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=20,
)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

As a hotel analyst, I would recommend picking this chart as it provides valuable insights into the booking behavior of customers through different distribution channels. By visualizing the number of bookings made through each channel, we can identify the most popular and effective channels for the hotel to focus on, and tailor its distribution and marketing strategies accordingly.

##### 2. What is/are the insight(s) found from the chart?

The insight gained from this chart is that TA/TO is the most preferred booking channel for customers, accounting for almost 70,000 bookings, followed by Direct with around 12,000 bookings and Corporate with around 6,000 bookings. This suggests that customers are more likely to book through third-party channels such as travel agents or online travel agencies, rather than directly with the hotel or through corporate bookings.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The hotel can prioritize partnerships with popular third-party booking channels to increase visibility and attract more bookings. Additionally, the hotel can explore ways to incentivize customers to book directly through the hotel's website or customer service channels, such as offering exclusive deals or loyalty rewards. 

#### Chart - 12 Pie Chart (Comparing Cancellation Percentages for both Hotels)

In [None]:
# Chart - 12 visualization code

# calculate cancellation percentages for each hotel
cancel_percentages = hotel_df.groupby('hotel')['is_canceled'].mean()

# sort the values to find the hotel with lower cancellation percentage
cancel_percentages = cancel_percentages.sort_values()

# explode the slice with lower cancellation percentage
explode = (0.1, 0)

# create the pie chart
fig1, ax1 = plt.subplots(figsize=(8, 6), dpi=100)
ax1.pie(cancel_percentages, explode=explode, labels=cancel_percentages.index, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Bookings Cancellation Percentages for Hotels', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=15, )
plt.show()

##### 1. Why did you pick the specific chart?

One reason for picking this pie chart is that it provides a clear visual representation of the cancellation percentages for each hotel. The use of different colors and an exploded slice highlights the contrast between the cancellation percentages of the two hotels. 

##### 2. What is/are the insight(s) found from the chart?

Based on the pie chart, the Resort Hotel accounts for a smaller slice (43.9%) than the City Hotel (56.1%), so its cancellation rate is the lower of the two. Because the pie rescales the two per-hotel rates to sum to 100%, these figures are relative shares rather than the hotels' actual cancellation rates. The lower rate suggests that the Resort Hotel has more stable demand than the City Hotel, which could be due to factors such as location, amenities, or pricing strategy. For a hotel manager, understanding these cancellation levels can help optimize revenue by adjusting pricing or resource allocation accordingly.
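
A pie chart rescales whatever values it is given, so it can help to print the per-hotel cancellation rate next to the slice size. The following is a minimal sketch on a small made-up sample (the `sample_df` values are hypothetical and are not taken from `hotel_df`); it only assumes the `hotel` and `is_canceled` columns used in the cell above.

```python
import pandas as pd

# Hypothetical sample: 1 = cancelled, 0 = not cancelled.
sample_df = pd.DataFrame({
    'hotel': ['City Hotel'] * 5 + ['Resort Hotel'] * 5,
    'is_canceled': [1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
})

# Actual cancellation rate per hotel: cancelled bookings / all bookings.
cancel_rate = sample_df.groupby('hotel')['is_canceled'].mean()

# What a pie chart of the two rates displays: each rate divided by their sum.
pie_share = cancel_rate / cancel_rate.sum()

summary = pd.DataFrame({
    'cancel_rate_pct': (cancel_rate * 100).round(1),
    'pie_slice_pct': (pie_share * 100).round(1),
})
print(summary)
```

In this toy sample the two columns differ for both hotels, which is why the slice percentages should not be quoted as cancellation rates.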

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from the chart can help create a positive business impact. By identifying the difference in cancellation percentages between the Resort Hotel and the City Hotel, the hotels' management can take appropriate measures to reduce the higher cancellation rate of the City Hotel.

It is possible that the insights gained from the chart can lead to negative growth if the hotels' management fails to take appropriate action to address the higher cancellation rate of the City Hotel.

#### Chart - 13 Barplot (Visualizing Top 10 Revenue Generating Countries with respect to guest arrivals)

In [None]:
# sort dataframe by revenue and select top 10 countries
top_10 = country_wise_max_revenue.sort_values(by='revenue', ascending=False).head(10)

# define colors for bars
colors = ['#FFD700', '#FFA500', '#FF6347', '#ADFF2F', '#00FFFF', '#8B008B', '#FF00FF', '#1E90FF', '#FF1493', '#87CEEB']

# create bar plot
plt.figure(figsize=(12, 7), dpi=100)
sns.set_style('darkgrid')
plt.bar(top_10['country'], top_10['revenue'], color=colors)

# add labels and title
plt.xlabel(
    'Country',
    fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=12,
)
plt.ylabel(
    'Revenue (in lakhs)',
    fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=12,
)
plt.title(
    'Top 10 revenue generating countries with respect to guest arrivals',
    fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=15,
)

# rotate x-labels
plt.xticks(rotation=45)

# display plot
plt.show()


##### 1. Why did you pick the specific chart?

I would pick this bar plot to analyze the top 10 countries generating the maximum revenue. It provides a clear visual representation of the revenue generated by each country, making it easy to identify the countries that contribute the most to the hotel's revenue. The use of different colors for each bar also enhances the clarity of the plot.

##### 2. What is/are the insight(s) found from the chart?

Based on the analysis, it can be inferred that the country PRT (Portugal) is the leading contributor to the hotel booking industry in terms of guest arrival and revenue generation. This can be further studied to identify the reasons behind the popularity of Portugal among travelers.

Additionally, the other top countries in terms of guest arrivals and revenue generation are GBR (United Kingdom), FRA (France), and ESP (Spain). These insights can be leveraged by hotel owners and managers to tailor their services and marketing strategies towards the preferences of travelers from these countries to attract more bookings and increase their revenue.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can help create a positive business impact for the hotels. Knowing that the country PRT generates the maximum guest arrivals and revenue for both city and resort hotels, the hotels can focus on targeting and catering to the needs of this particular market segment. They can also analyze the factors behind the high guest arrivals and revenue from PRT, such as the time of year these guests prefer to travel, the types of rooms they book, and the amenities they seek, and use that information to create tailored marketing strategies to attract more guests from this market segment.

There do not seem to be any insights that lead to negative growth. However, it is important to keep in mind that insights should always be used as a starting point for further analysis and strategy development. Factors such as seasonality, economic changes, and shifts in consumer preferences can impact business growth, and it is important to continuously monitor and analyze these factors to ensure continued success.

#### Chart - 14 Barplot (Visualizing Revenue Generated by Customer Type using Bar Plot)

In [None]:
# Set the figure size and DPI
plt.figure(figsize=(10, 6), dpi=100)

# Define a custom color palette
custom_palette = ['#FFA500', '#1E90FF']

# Set the style to darkgrid and apply the custom color palette
sns.set_style('darkgrid')
sns.set_palette(custom_palette)

# Create a barplot of revenue generated by customer type
sns.barplot(data=revenue_generated_by_customer_type, x='customer_type', y='revenue', width=0.4) # Decrease bar width here

# Add labels for the x and y axes with styling
plt.xlabel(
    'Customer Type',
    fontdict={'fontsize': 12, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,
)
plt.ylabel(
    'Revenue Generated',
    fontdict={'fontsize': 12, 'fontweight': 'bold', 'fontfamily': 'serif'},
    labelpad=15,
)

# Add a title for the plot with styling
plt.title(
    'Revenue Generated by Customer Type',
    fontdict={'fontsize': 12, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=20,
)

# Annotate each bar with its percentage share of the total revenue
total_revenue = revenue_generated_by_customer_type['revenue'].sum()

for i, val in enumerate(revenue_generated_by_customer_type['revenue']):
    plt.text(
        i,
        val + 5,  # Place the label slightly above the top of the bar
        f"{round(val / total_revenue * 100, 2)}%",
        horizontalalignment='center',
        fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
        color='black',  # Black text for visibility above the bars
    )

# Remove top and right spines and set the color of remaining spines
sns.despine(top=True, right=True)
ax = plt.gca()
ax.spines['left'].set_color('#888888')
ax.spines['bottom'].set_color('#888888')

# Set the font size and weight for tick labels
plt.xticks(fontsize=12, fontweight='bold', fontfamily='serif')
plt.yticks(fontsize=12, fontweight='bold', fontfamily='serif')

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

As a data analyst, I would select this bar chart to visualize the revenue generated by customer type because it effectively shows the contribution of each customer category to the total revenue. The use of different colors for each bar and the addition of percentage labels on top of the bars enhances the clarity and allows for easy comparison between the different categories.

##### 2. What is/are the insight(s) found from the chart?

Based on the analysis of the revenue generated by customer type, it can be concluded that the Transient customer category is the major revenue driver for the hotel. Transient customers alone accounted for 80.6% of the total revenue, while Transient-Party customers accounted for 12.76%. This indicates that the hotel should focus on attracting and retaining Transient customers, as they are the most valuable segment in terms of revenue generation. However, it is also important to maintain a balance and not neglect other customer categories, as they may provide opportunities for growth and diversification in the long run.
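
The percentage shares quoted above come from dividing each customer type's revenue by the overall total. As a rough sketch of that calculation, assuming a per-booking `revenue` column and using a made-up `sample_df` (the figures are hypothetical and do not reproduce the notebook's numbers):

```python
import pandas as pd

# Hypothetical bookings; the revenue figures are made up for illustration.
sample_df = pd.DataFrame({
    'customer_type': ['Transient', 'Transient', 'Transient-Party', 'Contract', 'Group'],
    'revenue': [500.0, 300.0, 120.0, 60.0, 20.0],
})

# Total revenue per customer type, largest first.
revenue_by_type = (
    sample_df.groupby('customer_type', as_index=False)['revenue']
    .sum()
    .sort_values('revenue', ascending=False, ignore_index=True)
)

# Each type's share of overall revenue, in percent.
revenue_by_type['share_pct'] = (
    revenue_by_type['revenue'] / revenue_by_type['revenue'].sum() * 100
).round(2)

print(revenue_by_type)
```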

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing revenue by customer type can help create a positive business impact. By identifying the customer types that contribute the most to the hotel's revenue, the hotel management can make strategic decisions to allocate resources, adjust pricing strategies, and tailor marketing efforts to these segments more effectively. This can lead to increased revenue and profitability for the hotel.

However, the same insights may lead to negative growth in certain scenarios. For example, if demand from the top revenue-generating customer type is hit by a sudden economic downturn or a natural disaster, the hotel's revenue may decrease significantly if it relies too heavily on that segment.

#### Chart - 15 Pie Chart (Exploring Revenue Earned by Hotel Type)

In [None]:
# Set the figure size and DPI
plt.figure(figsize=(7,5), dpi=110)

# Define colors for the pie slices
colors = ['#FFD700', '#00BFFF']


# Define the value of explode parameter. Here we are exploding the first slice by 0.05
explode = (0.05, 0)

# Create a pie chart of revenue generated by hotel type with slicing feature (explode)
plt.pie(
    hotel_max_revenue['revenue'],
    labels=hotel_max_revenue['hotel'],
    colors=colors,
    autopct='%1.1f%%',
    startangle=90,
    textprops={'fontsize': 12, 'fontfamily': 'sans-serif'},
    wedgeprops={'width': 0.5},
    explode=explode, # Use explode parameter here
    shadow=True,
)

# Add a title for the plot with styling
plt.title(
    'Percentage of Revenue by Hotel Type',
    fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'},
    pad=15,
)

# Add a legend with styling
plt.legend(loc="upper right", prop={'size': 10, 'family': 'sans-serif'})

# Remove unnecessary chart borders and ticks
plt.box(False)
plt.axis('equal')
plt.tick_params(axis='both', which='both', length=0)

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

One reason for picking this pie chart is that it provides a clear visual representation of the revenue percentage for each hotel. The use of different colors and an exploded slice highlights the contrast between the revenue percentages of the two hotels. 

##### 2. What is/are the insight(s) found from the chart?

The chart shows that City Hotel generated more revenue than Resort Hotel, but the gap is small: only about 6 percentage points.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insight that City Hotel has generated slightly more revenue than Resort Hotel, albeit by a small margin of 6%, can still be useful for creating a positive business impact. Hotels can leverage this information to identify the reasons behind the difference and take actions to further increase revenue, such as offering more attractive deals or promotions for customers to book directly through their website or through preferred channels. However, as the difference in revenue percentage is small, there may not be any significant negative impact on the business.

#### Chart - 16 Bar Plot (Exploring Booking Cancellations for both City Hotels and Resort Hotels according to Distribution Channel)

In [None]:
# Set the figure size and DPI
fig = plt.figure(figsize=(14,6), dpi=110)

# Grouped bar plot with city and resort hotels side by side
ax = cancellations_by_distribution_channel.plot(kind='bar', stacked=False, ax=fig.add_subplot(1, 1, 1))

# Add legend outside the plot area
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))

# Add gridlines for better readability
ax.grid(True, axis='y')

# Set the x-label and y-label with padding for better visibility
plt.xlabel('Distribution Channel', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)
plt.ylabel('Cancelled Bookings', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Add a title to the plot
plt.title('Cancelled Bookings by Distribution Channel and Hotel Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, pad= 20)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

A barplot is a good choice in this context because it allows for easy comparison of cancellation rates across different distribution channels, which is the main focus of the analysis.

##### 2. What is/are the insight(s) found from the chart?

The chart indicates that the TA/TO distribution channel accounts for the most cancelled bookings, so the hotel can focus on optimizing its distribution channels to minimize cancellations, starting by identifying the reasons behind the TA/TO figures and taking corrective action. Since TA/TO is also the channel with the most bookings overall, the cancellations should be read relative to each channel's booking volume before concluding that its cancellation rate is the highest. The hotel can also explore incentivizing customers to book directly through its website or other preferred channels to reduce its dependence on third-party channels.
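
A count of cancelled bookings and a cancellation rate can rank channels differently, because a channel with many bookings will also tend to have many cancellations. The sketch below illustrates the difference on a made-up sample (all values in `sample_df` are hypothetical); it assumes only the `hotel`, `distribution_channel` and `is_canceled` columns and is not the way `cancellations_by_distribution_channel` was necessarily built.

```python
import pandas as pd

# Hypothetical bookings: 1 = cancelled, 0 = not cancelled.
sample_df = pd.DataFrame({
    'hotel': ['City Hotel'] * 7 + ['Resort Hotel'] * 4,
    'distribution_channel': ['TA/TO'] * 5 + ['Direct'] * 2 + ['TA/TO'] * 3 + ['Direct'],
    'is_canceled': [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0],
})

grouped = sample_df.groupby(['distribution_channel', 'hotel'])['is_canceled']

# Number of cancelled bookings: grows with the size of the channel.
cancel_counts = grouped.sum().unstack('hotel')

# Share of bookings that were cancelled: comparable across channels.
cancel_rates = grouped.mean().unstack('hotel')

print(cancel_counts)
print(cancel_rates.round(2))
```

In this toy sample, TA/TO has more cancelled City Hotel bookings than Direct but a lower cancellation rate, so the two tables can support different conclusions.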

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can help create a positive business impact by allowing the hotel to focus on optimizing its distribution channels and reducing booking cancellations, which can improve overall revenue and profitability. There are no insights that lead to negative growth, as all the analysis and insights provided aim to improve the hotel's performance and customer experience.

#### Chart - 17 Bar Plot (Exploring Cancelled Bookings for both City Hotels and Resort Hotels according to Market Segment)

In [None]:
# Set the figure size and DPI
fig = plt.figure(figsize=(14,6), dpi=110)

# Grouped bar plot with city and resort hotels side by side
ax = cancellations_by_market_segment.plot(kind='bar', stacked=False, ax=fig.add_subplot(1,1,1))

# Add legend outside the plot area
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))

# Add gridlines for better readability
ax.grid(True, axis='y')

# Set the x-label and y-label with padding for better visibility
plt.xlabel('Market Segment', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)
plt.ylabel('Cancelled Bookings', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Add a title to the plot
plt.title('Cancelled Bookings by Market Segment and Hotel Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, pad= 20)

# Save the plot as a high-resolution image
plt.savefig('cancelled_bookings.png', dpi=300, bbox_inches='tight')

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

The grouped bar chart is an effective choice for visualizing the cancellation rates by market segment and hotel type because it allows for easy comparison of the cancellation rates between different market segments and hotel types.

##### 2. What is/are the insight(s) found from the chart?

The finding that the Online TA/TO market segment is most vulnerable to booking cancellations for City Hotels suggests that hotels should focus on addressing the concerns of this segment. One possible reason for the high cancellation rates in this segment could be the lack of transparency in the booking process or unexpected changes in travel plans. Hotels can address these issues by providing more detailed information about their cancellation policies, offering flexible booking options, and communicating more effectively with their customers.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insight can positively impact the hotel's business by helping it optimize its market-segment mix and reduce cancellation rates. Neglecting to address the high cancellation rates of the Online TA/TO segment can lead to negative growth, resulting in revenue loss and decreased customer satisfaction. Proactive measures are crucial for improving overall business performance.

#### Chart - 18 Bar Plot (Exploring Cancelled Bookings for both City Hotels and Resort Hotels according to Customer Type)

In [None]:
# Set the figure size and DPI
fig = plt.figure(figsize=(14,6), dpi=110)

# Grouped bar plot with city and resort hotels side by side
ax = cancellations_by_customer_type.plot(kind='bar', stacked=False, ax=fig.add_subplot(111))

# Add legend outside the plot area
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))

# Add gridlines for better readability
ax.grid(True, axis='y')

# Set the x-label and y-label with padding for better visibility
plt.xlabel('Customer Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)
plt.ylabel('Cancelled Bookings', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Add a title to the plot
plt.title('Cancelled Bookings by Customer Category and Hotel Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, pad=20)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

A bar plot is an effective visualization tool for categorical data. In this case, we have two categorical variables, "Customer Type" and "Hotel Type," and we want to compare the cancelled bookings across them.

##### 2. What is/are the insight(s) found from the chart?

* The Transient customer category has the highest cancellation rates for both City Hotels and Resort Hotels.
* Hotels should focus on understanding the reasons for higher cancellation rates in the Transient category and take measures to reduce them.
* One possible reason for the higher cancellation rate in the Transient category could be the flexibility of the booking policies, which may encourage customers to make bookings and cancel them at a later date.
* Hotels should consider targeting group bookings to reduce the risk of cancellations, as the Group customer category has the lowest cancellation rates.
* The cancellation rates for the Contract and Group customer categories are relatively stable across different hotel types.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from the hotel booking analysis can have a positive impact on the business. By understanding the reasons for higher cancellation rates in the Transient category and taking measures to reduce them, hotels can improve their revenue management and operational efficiency. For example, hotels can implement stricter booking policies or offer incentives to customers who book non-refundable rates.


However, there are no insights that directly lead to negative growth. Instead, the lack of action or failure to address the higher cancellation rates in the Transient category can result in lost revenue and decreased customer satisfaction. Moreover, if hotels fail to adapt to changing customer behavior or preferences, they may lose market share to competitors who are more responsive to customer needs.





#### Chart - 19 Bar Plot (Exploring ADR for both City Hotels and Resort Hotels for each Customer Type)

In [None]:
# Define a list of colors for each customer type
colors = ['#F08080', '#00BFFF', '#90EE90', '#FFD700']

# Set the DPI and figure size
fig = plt.figure(figsize=(14,6), dpi=110)

# Create a bar plot of average ADR by customer type and hotel
ax = customer_cum_hotel_wise_adr.plot(kind='bar', ax=fig.add_subplot(1, 1, 1), color=colors)

# Set the x-axis label
ax.set_xlabel('Customer Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Set the y-axis label
ax.set_ylabel('Average ADR', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Set the plot title
ax.set_title('Average ADR by Customer Type and Hotel', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, pad= 20)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

A reason for picking a barplot in the above context is that it is an effective way to compare the average ADR of different customer types for each hotel, allowing for easy visualization of any patterns or trends in the data. 

##### 2. What is/are the insight(s) found from the chart?

One possible insight that can be framed from the above analysis is that the City Hotel tends to attract customers who are willing to pay higher prices for their accommodation compared to the Resort Hotel. This could be due to a number of factors such as location, amenities, or overall reputation. This insight can be useful for hotel managers in making pricing decisions and marketing strategies that cater to the preferences and expectations of their target customers. For example, the City Hotel could focus on promoting its premium amenities and services to attract more high-paying customers, while the Resort Hotel could emphasize its scenic location and recreational facilities to appeal to a broader audience with more varied budgets.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

There are no insights in this analysis that lead to negative growth as the analysis does not provide any information that suggests a decline in revenue or demand for either hotel. However, if the hotel managers do not act upon the gained insights and continue with the same pricing and marketing strategies, it may lead to negative growth due to missed opportunities to attract and retain high-paying customers.

#### Chart - 20 Bar Plot (Exploring ADR for both City Hotels and Resort Hotels for each Market Segment)

In [None]:
# Create a list of colors for each market segment
colors = ['#F08080', '#00BFFF', '#90EE90', '#FFD700', '#FFA07A', '#BA55D3', '#B0E0E6', '#F4A460']

# Create a Figure object with the desired DPI
fig = plt.figure(figsize=(10,6), dpi=110)

# Create a bar plot of average ADR by market segment
ax = average_adr_df_for_market_segment.plot(kind='bar', ax=fig.add_subplot(1, 1, 1), color=colors)

# Set the x-axis label
ax.set_xlabel('Market Segment', fontdict={'fontsize': 12, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Set the y-axis label
ax.set_ylabel('Average ADR', fontdict={'fontsize': 12, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Set the plot title
ax.set_title('Average ADR by Market Segment and Hotel', fontdict={'fontsize': 12, 'fontweight': 'bold', 'fontfamily': 'serif'}, pad=20)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

A grouped bar plot allows for easy comparison of the average ADR between the two hotel types for each market segment.

##### 2. What is/are the insight(s) found from the chart?

The insight gained from the analysis is that City Hotels have a higher average ADR than Resort Hotels across all market segments. This suggests that there is a higher demand for City Hotels among customers, and they are willing to pay more for the services provided by City Hotels compared to Resort Hotels. Hotel management can utilize this insight to adjust pricing strategies and marketing efforts to capitalize on this demand and maximize profits. Additionally, this insight can inform decisions related to investments and improvements in both City and Resort Hotels to attract more customers and increase profitability.
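
The claim that one hotel has the higher average ADR in every market segment can be checked directly on a segment-by-hotel table. This is a minimal sketch on a made-up sample (the `sample_df` values are hypothetical), assuming `hotel`, `market_segment` and `adr` columns; it shows one possible way to build a table shaped like `average_adr_df_for_market_segment`, not necessarily the one used earlier in the notebook.

```python
import pandas as pd

# Hypothetical bookings; ADR values are made up for illustration.
sample_df = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel',
              'Resort Hotel', 'City Hotel', 'Resort Hotel'],
    'market_segment': ['Direct', 'Online TA', 'Direct',
                       'Online TA', 'Direct', 'Online TA'],
    'adr': [120.0, 110.0, 90.0, 95.0, 100.0, 85.0],
})

# Mean ADR with market segments as rows and hotels as columns.
avg_adr = sample_df.pivot_table(
    index='market_segment', columns='hotel', values='adr', aggfunc='mean'
)

# True for each segment where City Hotel's mean ADR exceeds Resort Hotel's.
city_higher = avg_adr['City Hotel'] > avg_adr['Resort Hotel']

print(avg_adr)
print('City Hotel higher in every segment:', bool(city_higher.all()))
```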





##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights that the average ADR is higher for City Hotel compared to Resort Hotel for each market segment can help create a positive business impact. This suggests that there is a higher demand for City Hotel among customers and the hotel management can utilize this information to adjust pricing and marketing strategies to maximize profits.


However, if the Resort Hotel is not able to attract as many customers as the City Hotel, this may lead to negative growth for the Resort Hotel. The management of the Resort Hotel may need to investigate why there is a lower demand for their hotel and take necessary measures to address any issues to ensure sustainable business growth.





#### Chart - 21 Bar Plot (Exploring Different Meal Categories for each Customer Category)

In [None]:

# Create a Figure object with the desired DPI
fig = plt.figure(figsize=(14,6), dpi=110)

# Define a list of colors for each meal category
colors = ['#F08080', '#00BFFF', '#90EE90', '#FFD700', '#FFA07A']

# Create a bar plot of meal categories by customer type, using the colors list
ax = meal_bookings_by_customer_type.plot(x='customer_type', kind='bar', ax=fig.add_subplot(1, 1, 1), color=colors)

# Set the x-axis label
ax.set_xlabel('Customer Type', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Set the y-axis label
ax.set_ylabel('Count Of Meal categories', fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, labelpad=15)

# Set the plot title
ax.set_title('Comparison of different meal categories for different customer categories',fontdict={'fontsize': 10, 'fontweight': 'bold', 'fontfamily': 'serif'}, pad= 20)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart is an appropriate chart type to use in this context because it allows us to compare the number of meal bookings across different customer types and meal categories. 

##### 2. What is/are the insight(s) found from the chart?

In the above context, it can be interpreted that the demand for (BB) meal category is the highest for every customer type followed by (HB) meal category. The other meal categories, such as (FB), (SC), and (Undefined) have relatively low demand among customers.

One possible reason for the high demand for (BB) meal category could be its cost-effectiveness, as it includes breakfast only and is generally less expensive than other meal options. The (HB) meal category, which includes both breakfast and one other meal, is also popular among customers as it offers more value than (BB) but still at a reasonable cost.

On the other hand, (FB) meal category, which includes breakfast, lunch, and dinner, may not be as popular due to its higher cost compared to other meal options. Additionally, some customers may prefer to explore local cuisine or dine outside the hotel, leading to lower demand for (FB) meal category.

The (SC) and (Undefined) meal categories may have low demand due to their limited offerings or unclear definition. Customers may prefer to have more options or a clearer understanding of what is included in their meal plan, leading to lower demand for these categories.
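
The ranking of meal categories per customer type can be read off a simple cross-tabulation. The sketch below uses a made-up sample (hypothetical values in `sample_df`) and assumes `customer_type` and `meal` columns; `pd.crosstab` is one way to produce counts like those plotted above, though the notebook's `meal_bookings_by_customer_type` may have been built differently.

```python
import pandas as pd

# Hypothetical bookings for illustration only.
sample_df = pd.DataFrame({
    'customer_type': ['Transient', 'Transient', 'Transient',
                      'Contract', 'Contract', 'Contract', 'Group'],
    'meal': ['BB', 'BB', 'HB', 'BB', 'BB', 'SC', 'BB'],
})

# Rows: customer types, columns: meal categories, cells: number of bookings.
meal_counts = pd.crosstab(sample_df['customer_type'], sample_df['meal'])

# Most frequently booked meal category for each customer type.
top_meal = meal_counts.idxmax(axis=1)

print(meal_counts)
print(top_meal)
```

Calling `meal_counts.reset_index()` would turn `customer_type` back into a regular column, which is the shape the plotting cell expects when it passes `x='customer_type'`.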

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

In this case, the insight that BB meal category is the highest for every customer type followed by HB meal category can have a positive impact on the hotel's revenue. By understanding the customers' meal preferences, hotels can optimize their food and beverage offerings and pricing strategies to attract and retain customers. For example, if a hotel identifies that a majority of its customers prefer BB meal plans, it can create attractive packages and deals that include this meal plan to attract more customers and increase its revenue.

However, there can also be insights that lead to negative growth if not addressed properly. For instance, if the data analysis reveals that a significant number of customers are canceling their bookings due to poor customer service or cleanliness issues, it can result in negative growth. Such insights can harm the hotel's reputation and lead to a loss of customers and revenue in the long run. Therefore, it is important for hotels to identify such negative insights and take necessary corrective measures to address them and ensure customer satisfaction.

#### Chart - 22 Correlation Heatmap

In [None]:
sns.set(style="white")

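# Create correlation matrix from the numeric columns only
# (numeric_only=True requires pandas >= 1.5; text columns cannot be correlated)
corr = hotel_df.corr(numeric_only=True)

# Define color palette
cmap = sns.diverging_palette(5, 250, as_cmap=True)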

# Define function to magnify table on hover
def magnify():
    return [dict(selector="th",
                 props=[("font-size", "7pt")]),
            dict(selector="td",
                 props=[('padding', "0em 0em")]),
            dict(selector="th:hover",
                 props=[("font-size", "12pt")]),
            dict(selector="tr:hover td:hover",
                 props=[('max-width', '200px'),
                        ('font-size', '12pt')])]

# Apply background gradient and other formatting options
# (Styler.format(precision=2) replaces set_precision, which was removed in pandas 2.0)
corr.style.background_gradient(cmap, axis=1)\
    .set_properties(**{'max-width': '80px', 'font-size': '10pt'})\
    .set_caption("Hover to magnify")\
    .format(precision=2)\
    .set_table_styles(magnify())


##### 1. Why did you pick the specific chart?

A correlation matrix is a table showing correlation coefficients between variables. Each cell in the table shows the correlation between two variables. A correlation matrix is used to summarize data, as an input into a more advanced analysis, and as a diagnostic for advanced analyses. The range of correlation is [-1,1].

##### 2. What is/are the insight(s) found from the chart?

* There is a strong positive correlation (0.95) between total_stay and stays_in_week_nights. This suggests that week nights account for most of the variation in the total length of stay.
* There is a positive correlation (0.79) between total_stay and stays_in_weekend_nights. This indicates that longer stays also tend to include more weekend nights.
* There is a strong positive correlation (0.80) between total_people and adults. This suggests that most of the guests visiting the hotel are adults.
* There is a moderately strong positive correlation (0.70) between revenue and stays_in_weekend_nights. This implies that bookings with more weekend nights tend to generate more revenue.


Based on these insights, the hotel could potentially adjust its pricing strategies by offering promotional deals or discounts on weekdays to encourage longer stays and increase occupancy during weekdays. Additionally, they could focus marketing efforts on attracting more adult guests, given that the majority of their visitors fall into this demographic. Finally, the hotel could explore ways to further increase revenue during weekends, such as offering additional amenities or services that would appeal to weekend guests.
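
When reading correlations that involve a derived column, it is worth remembering that a total is correlated with its own components by construction. The sketch below illustrates this with synthetic data; it assumes (this is not confirmed in the section above) that `total_stay` is the sum of `stays_in_week_nights` and `stays_in_weekend_nights`, and the Poisson parameters are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Synthetic night counts, generated independently of each other.
sample_df = pd.DataFrame({
    'stays_in_week_nights': rng.poisson(lam=2.5, size=n),
    'stays_in_weekend_nights': rng.poisson(lam=1.0, size=n),
})

# Assumption: total_stay is simply the sum of the two components.
sample_df['total_stay'] = (
    sample_df['stays_in_week_nights'] + sample_df['stays_in_weekend_nights']
)

corr = sample_df.corr()
print(corr.round(2))
```

Even though the two components are unrelated here, both correlate positively with the total, and the component with the larger spread correlates more strongly. A high coefficient between `total_stay` and either night count therefore says little on its own about guest preferences.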

#### Chart - 23 - Pair Plot 

In [None]:
# Pair Plot visualization code

# Set plot size, resolution, and font scale
sns.set(rc={'figure.figsize': (10, 8), 'figure.dpi': 100}, font_scale=1.2)

# Select the columns to plot
cols_to_plot = ['total_people', 'total_stay', 'revenue', 'lead_time', 'stays_in_weekend_nights', 'stays_in_week_nights', 'days_in_waiting_list']

# Create the pair plot; corner=True draws only the lower triangle (no hue is applied)
sns.pairplot(hotel_df[cols_to_plot], corner=True)


##### 1. Why did you pick the specific chart?

A pair plot shows the pairwise relationships between the selected variables as a grid of scatterplots, with each variable's own distribution on the diagonal. This makes it easy to examine, for example, how total_people, total_stay, and revenue relate to one another.
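
If the pair plot should distinguish weekend from weekday stays by colour, one option is to derive a categorical column first and pass it as the hue. This is a minimal sketch, not part of the original analysis: `add_stay_type`, the `stay_type` column, and its labels are hypothetical, and the sample rows are invented.

```python
import pandas as pd

def add_stay_type(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a hypothetical 'stay_type' label per booking."""
    out = df.copy()
    has_week = out["stays_in_week_nights"] > 0
    has_weekend = out["stays_in_weekend_nights"] > 0

    # Start with the fallback label, then overwrite the other cases
    out["stay_type"] = "no nights"
    out.loc[has_week & ~has_weekend, "stay_type"] = "weekday only"
    out.loc[~has_week & has_weekend, "stay_type"] = "weekend only"
    out.loc[has_week & has_weekend, "stay_type"] = "both"
    return out

# Invented sample rows covering each case
sample = pd.DataFrame({
    "stays_in_week_nights": [2, 0, 3, 0],
    "stays_in_weekend_nights": [0, 2, 1, 0],
})
labelled = add_stay_type(sample)
print(labelled["stay_type"].tolist())
# → ['weekday only', 'weekend only', 'both', 'no nights']
```

The labelled frame could then be plotted with `sns.pairplot(labelled, hue="stay_type", corner=True)`, so that each group gets its own colour.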

##### 2. What is/are the insight(s) found from the chart?

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.


* The hotels should work to reduce their cancellation rate, as a rate of 15% is significant. One way to do this is to improve services and amenities so that the guest experience meets guests' expectations.


* Only 3415 guests are repeat customers, which is low for a three-year period across two hotels. This is an opportunity to improve guest loyalty: the hotels should investigate why guests are not returning and address the causes, so that they build a loyal customer base and increase revenue over time.


* City Hotel was the more popular of the two hotels in terms of bookings, and it also generated more revenue than Resort Hotel.


* To reduce mismatches between reserved and assigned rooms, hotels can implement a more thorough room assignment process and improve communication between front desk and housekeeping staff. This lowers the likelihood of errors and helps ensure that guests receive the rooms they expected.


* The peak season for bookings at both the city and resort hotels runs from June to September. One possible explanation is that these months coincide with summer vacation and holiday periods, when many people have more free time to travel. The warmer weather may also make trips more appealing, particularly to beach or resort destinations. This can be considered a seasonal trend.

* Room types A, D, and E were the most frequently assigned room types in both the City and Resort Hotels. Hotel management can use this information to ensure an adequate supply of these popular room types and to price them so as to maximize revenue.
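
The headline figures behind these recommendations (cancellation rate, repeat bookings, room mismatches, bookings per hotel, most assigned room types) can be recomputed with a few pandas aggregations. The sketch below is illustrative only. It assumes the columns `hotel`, `is_canceled`, `is_repeated_guest`, `reserved_room_type`, and `assigned_room_type` exist under those names, `booking_summary` is a hypothetical helper, and the toy rows are invented.

```python
import pandas as pd

def booking_summary(df: pd.DataFrame) -> dict:
    """Summarize headline booking figures (column names are assumptions)."""
    return {
        # Share of bookings flagged as cancelled (is_canceled is 0 or 1)
        "cancellation_rate": df["is_canceled"].mean(),
        # Number of bookings flagged as made by a returning guest
        "repeat_bookings": int(df["is_repeated_guest"].sum()),
        # Share of bookings where the assigned room differs from the reserved one
        "room_mismatch_rate": (df["reserved_room_type"] != df["assigned_room_type"]).mean(),
        # Booking counts per hotel
        "bookings_per_hotel": df["hotel"].value_counts().to_dict(),
        # Up to three most frequently assigned room types
        "top_assigned_rooms": df["assigned_room_type"].value_counts().head(3).index.tolist(),
    }

# Invented rows, for illustration only
toy_bookings = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0, 0],
    "is_repeated_guest": [0, 0, 1, 0],
    "reserved_room_type": ["A", "A", "D", "E"],
    "assigned_room_type": ["A", "A", "D", "A"],
})
summary = booking_summary(toy_bookings)
print(summary)
```

Calling `booking_summary(hotel_df)` on the cleaned data would give the corresponding figures for the real bookings, provided the columns carry these names.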



# **Conclusion**

In conclusion, this EDA of hotel bookings highlights several important insights. The hotels should focus on reducing cancellation rates and improving guest loyalty by enhancing the guest experience. City Hotel had more bookings and generated higher revenue than Resort Hotel. The most frequently assigned room types were A, D, and E, which can inform resource planning and pricing. The peak season for bookings runs from June to September. Cancellations due to waiting time or room assignment mismatches are relatively low, but room assignment mismatches should still be addressed, as they can lead to customer dissatisfaction and revenue loss.
