<a href="https://colab.research.google.com/github/sanwre001/EDA-project-/blob/main/EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel booking analysis 



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Avinash Jat
##### **Team Member 2 -**
##### **Team Member 3 -**
##### **Team Member 4 -**

# **Project Summary -**

The study of hotel booking data is important for any hospitality business because it provides insights into customer booking behavior and the channels through which bookings are made. This project involves analyzing a dataset of hotel bookings for city and resort hotels. The project activities are categorized as follows:



1.   Defining the problem statement: The objective of the study is determined.
2.   Collecting and preparing data: The data is cleaned, outliers are treated, and any necessary data preparation is performed.
3.   Performing exploratory data analysis: A deep study of the relationship between different features is conducted, and new variables are generated as needed to align with the related business objectives. The data is presented in an easily understandable form.
4.   Providing observations and recommendations: Based on the EDA, observations are made and recommendations are given to the business.

Overall, the project aims to provide valuable insights into hotel booking behavior that can be used to improve the hospitality business's operations and increase revenue.






# **GitHub Link -**

https://github.com/sanwre001/EDA-project-

# **Problem Statement**


This project aims to study different aspects of hotel booking behavior to help solve various business challenges faced by the hotel. The study includes: 
1. Understanding where guests come from to increase footfall from those countries and retain them through personalized contact even after they leave.
2. Analyzing monthly booking patterns to identify the busiest quarter/trimester of the year and adjust manpower, promotions, and prices to maximize occupancy during lean periods. 
3. Studying the duration of guests' stays to provide a smart pricing structure that encourages longer stays. 
4. Analyzing meal patterns to optimize kitchen inventory and cafeteria staffing. 
5. Examining booking patterns by market segment to reward customers who provide more business. 
6. Analyzing the average daily rate's dependence on the season, room type, and allotment of demanded room types to optimize pricing for higher occupancy. 
7. Studying booking cancellation patterns based on various parameters like hotel type, lead time, season, deposit, etc., to plan for overbooking and devise penalty amounts. 
8. Analyzing customer retention patterns based on hotel type to improve the experience and retain customers. 

Overall, this analysis aims to help the hotel improve operations, increase revenue, and provide better customer experiences.

#### **Define Your Business Objective?**

Maximise bookings, minimise cancellations, and maximise customer retention as well as length of stay.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd

# Importing visualization libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import folium

import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

df = pd.read_csv('/content/drive/MyDrive/Hotel Bookings.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape[0] , df.shape[1]

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(df[df.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
temp_df=df.isnull().sum().reset_index().rename(columns={'index':'Columns',0:'Null Values'})
# Checking Null Value by plotting Bar Graph
px.bar(temp_df,x='Columns',y='Null Values',width= 1000, height= 500,text_auto=True)

### What did you know about your dataset?

The given dataset is from the hotel industry, and our objective is to explore and analyze the data to discover important factors that influence bookings. The dataset has 119,390 rows and 32 columns. Only four columns (children, country, agent, and company) contain null values. The dataset also has 31,994 duplicate rows. Our goal is to explore the data under the various column headings to gain insights that can help improve the hotel's operations and increase revenue.
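The checks above (shape, duplicates, nulls) can be bundled into one small helper. The sketch below is only an illustration: `quality_summary` is a hypothetical name, and the toy frame is made-up data standing in for the real bookings table.

```python
import pandas as pd

# Toy stand-in for the bookings table (values are made up for illustration).
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "children": [0.0, 0.0, None, 0.0],
    "country": ["PRT", "PRT", "GBR", "PRT"],
})

def quality_summary(frame: pd.DataFrame) -> dict:
    """Return row/column counts, duplicate-row count, and the columns that contain nulls."""
    null_counts = frame.isnull().sum()
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "duplicate_rows": int(frame.duplicated().sum()),
        "columns_with_nulls": null_counts[null_counts > 0].to_dict(),
    }

summary = quality_summary(toy)
print(summary)
```

On the real data the same call would be `quality_summary(df)`.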

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description 

* **Hotel:**  H1= Resort Hotel, H2= City Hotel
* **is_canceled** : If the booking was canceled(1) or not(0)
* **lead_time** : Number of days that elapsed between the entering date of the  booking into the PMS(Property Management System) and the arrival date
* **arrival_date_year** : Year of arrival date.
* **arrival_date_month** : Month of arrival date.
* **arrival_date_week_number** : Week number for arrival date.
* **arrival_date_day_of_month**: Day of the month on which the guest arrives.
* **stays_in_weekend_nights**: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel.
* **stays_in_week_nights**: Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel.
* **adults** : Number of adults.
* **children** : Number of children.
* **babies** : Number of babies.
* **meal**: kind of meal opted for.
* **country** : Country code.
* **market_segment**: Through which channel hotels were booked.
* **distribution_channel** : Channel through which the booking was distributed, such as Corporate, Direct, or TA/TO (Travel Agents/Tour Operators).
* **is_repeated_guest** : Indicates whether the booking was from a repeated guest (1) or not (0).
* **previous_cancellations** : Number of previous bookings that were cancelled by the customer prior to the current booking.
* **previous_bookings_not_canceled** : Count of previous bookings not cancelled.
* **reserved_room_type** : Code of room type reserved.
* **assigned_room_type** : Code for the type of room assigned to the booking.
* **booking_changes** : Count of changes made to booking.
* **deposit_type** : Deposit type.
* **agent** : ID of the travel agency that made the booking (missing if the booking was not made through an agent).
* **company** : If the booking happens through companies, the company ID that made the booking or responsible for paying the booking.
* **days_in_waiting_list** : Number of days the booking was on the waiting list before the confirmation to the customer.
* **customer_type** : Booking type like Transient – Transient-Party – Contract – Group.
* **adr** : Average Daily Rate, defined as the sum of all lodging transactions divided by the total number of staying nights.
* **required_car_parking_spaces** : Number of car parking spaces required by the customer.
* **total_of_special_requests** : Number of special requests made by the customer.
* **reservation_status**: The last status of the reservation, one of three categories: Canceled (the booking was cancelled by the customer), Check-Out, or No-Show.
* **reservation_status_date**: The last status date.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for item in list(df.columns):
  print(f"Column name: {item} - No. of unique values: {df[item].nunique()}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Firstly lets make a copy of our df to work on
copy_df=df.copy()

In [None]:
# Dataset First Look
copy_df.head()

In [None]:
# The first thing we need to do to make our data clean is delete duplicate values from the dataset
copy_df.drop_duplicates(inplace=True)
copy_df.shape

In [None]:
# Now checking the percentage of null values for each column
100*(copy_df.isna().sum()/copy_df.shape[0]).sort_values(ascending=False)

In [None]:
# Here we can see that the company column has approx 94% missing data, which is very high so we should drop that column
copy_df.drop(columns=['company'],inplace=True)

In [None]:
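# The other columns have a low % of null values, so we can just fill them.
# Assigning the result back avoids chained in-place fills, which newer pandas versions warn about.
copy_df['agent'] = copy_df['agent'].fillna(0)
copy_df['country'] = copy_df['country'].fillna('Others')
copy_df['children'] = copy_df['children'].fillna(0)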

In [None]:
# Now again checking for Missing Values/Null Values Count
temp_df=copy_df.isnull().sum().reset_index().rename(columns={'index':'Columns',0:'Null Values'})
px.bar(temp_df,x='Columns',y='Null Values',width= 1000, height= 500,text_auto=True)

In [None]:
# Dataset info
copy_df.info()

In [None]:
# In the above info we can see the datatypes of all columns are correct except for children and agent.
# So to make further operations easy we should change their datatypes to suitable types.
copy_df['children']=copy_df['children'].astype(int)
copy_df['agent']=copy_df['agent'].astype(int) 

In [None]:
copy_df[['children','agent']].info()

In [None]:
# Babies, adults, and children can't all be zero at the same time, so we drop every booking where all three are zero.
# .copy() makes the result an independent DataFrame so later column assignments don't raise SettingWithCopyWarning.
copy_df=copy_df[~((copy_df['adults']==0) & (copy_df['children']==0) & (copy_df['babies']==0))].copy()

In [None]:
copy_df.shape

In [None]:
# Adding new columns for analysis
copy_df['total_stay_nights']=copy_df['stays_in_week_nights']+copy_df['stays_in_weekend_nights']
copy_df[['stays_in_week_nights','stays_in_weekend_nights','total_stay_nights']]

In [None]:
# Creating separate datasets for resort and city hotels
resort_df=copy_df[copy_df['hotel']=='Resort Hotel']
city_df=copy_df[copy_df['hotel']=='City Hotel']

### What all manipulations have you done and insights you found?



*   The dataset was first copied to avoid making changes to the original data.
*   The data was then cleaned by removing all duplicated rows and handling missing values: the `company` column was dropped, and missing `agent`, `country`, and `children` values were filled.
*   The data types of the `children` and `agent` columns were changed to integers for easier analysis.
*   Invalid records were removed, namely bookings with zero adults, zero children, and zero babies.
*   A new `total_stay_nights` column was added for better analysis, and two separate data frames were created, one per hotel type, to facilitate comparison.
*   These steps were taken to ensure that the dataset was ready for analysis and to gain insights that could help improve the hotel's operations and increase revenue.
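As a rough sketch, the cleaning steps can be collected in a single function so they can be re-run in one go. The helper name `clean_bookings` and the toy data are assumptions made for illustration; the column names are the ones used in this notebook.

```python
import pandas as pd

def clean_bookings(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that bundles the cleaning steps of this section."""
    cleaned = raw.drop_duplicates().copy()
    # Drop the mostly-empty 'company' column if it is present.
    cleaned = cleaned.drop(columns=["company"], errors="ignore")
    # Fill the remaining gaps with neutral defaults.
    cleaned = cleaned.fillna({"agent": 0, "children": 0, "country": "Others"})
    cleaned = cleaned.astype({"agent": int, "children": int})
    # Keep only bookings that have at least one guest.
    has_guests = (cleaned[["adults", "children", "babies"]] != 0).any(axis=1)
    cleaned = cleaned[has_guests].copy()
    cleaned["total_stay_nights"] = (
        cleaned["stays_in_week_nights"] + cleaned["stays_in_weekend_nights"]
    )
    return cleaned

# Made-up rows: one exact duplicate, one booking without guests, some nulls.
toy = pd.DataFrame({
    "adults": [2, 2, 0, 1],
    "children": [0.0, 0.0, 0.0, None],
    "babies": [0, 0, 0, 0],
    "agent": [9.0, 9.0, None, None],
    "company": [None, None, None, 40.0],
    "country": ["PRT", "PRT", None, "GBR"],
    "stays_in_week_nights": [2, 2, 1, 3],
    "stays_in_weekend_nights": [1, 1, 0, 2],
})

result = clean_bookings(toy)
print(result)
```

Keeping the steps in one function makes it harder to forget one of them when the notebook is re-run from the top.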





## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Checking the most preferred hotel
# Visualizing it with a pie chart (this uses the raw df, before duplicates are removed)
df['hotel'].value_counts().plot.pie(explode=[0.05, 0.05], autopct='%1.1f%%', shadow=True, figsize=(10,8),fontsize=20)
plt.title('Pie Chart for Most Preferred Hotel')

##### 1. Why did you pick the specific chart?

I picked a pie chart because it is a good way to show the proportion of different categories in a dataset.

##### 2. What is/are the insight(s) found from the chart?

The pie chart shows that City Hotel is more popular than Resort Hotel in the dataset. Specifically, City Hotel accounts for about 66.4% of the dataset, while Resort Hotel only accounts for about 33.6%.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the pie chart showing the proportion of City Hotel and Resort Hotel can potentially help or harm a business depending on the context. If the business owns or manages City Hotel, the finding that City Hotel is more popular than Resort Hotel may suggest that the business should focus on promoting City Hotel more and improving its amenities and services to maintain its popularity. On the other hand, if the business owns or manages Resort Hotel, the finding may suggest that the business should investigate the reasons for Resort Hotel being less popular and consider ways to improve it, such as by offering more amenities, reducing prices, or enhancing the quality of service. In general, the insights gained from the pie chart can inform strategic decisions and actions that align with the business goals and values. However, it's important to be careful not to draw hasty conclusions from the data and to consider the limitations and biases of the dataset.


#### Chart - 2

In [None]:
# Country wise guest count 
guest_country = copy_df[copy_df['is_canceled'] == 0]['country'].value_counts().reset_index()
guest_country.columns = ['Country', 'No of guests']
guest_country

In [None]:
# World map of guest counts by country (the dataset uses ISO-3 country codes)
fig = px.choropleth(guest_country, locations='Country',
                    color='No of guests', hover_name='Country')
fig.show()

In [None]:
# Top 10 countries by guest count (bookings that were not canceled)
not_canceled = copy_df[copy_df['is_canceled'] == 0]
top_countries = not_canceled['country'].value_counts().iloc[:10].index
z = sns.countplot(x='country', data=not_canceled, order=top_countries, palette='colorblind')
plt.title('Top 10 Countries of Origin of the Guests', weight='bold')
plt.xlabel('Country')
plt.ylabel('Reservation Count')
# Write the count above each bar
for p in z.patches:
    z.annotate(str(int(p.get_height())), (p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='bottom')

##### 1. Why did you pick the specific chart?

I picked these charts because they are a good way to visualize and compare the distribution of categorical data. In this case, the goal was to see where the guests come from and which country sends the most guests.

##### 2. What is/are the insight(s) found from the chart?

From the chart, we can see that Portugal (PRT) tops the list of the top 10 countries of origin of the guests.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Definitely, it will help create a positive impact on the business. Hotel managers and owners now have a clearer idea of what type of customers they serve every day, so they can manage their resources accordingly. For example, waiters and other staff should know the guests' language and the kind of food they like.

#### Chart - 3

In [None]:
# Checking total bookings for each year
copy_df['arrival_date_year'].value_counts().sort_index()

In [None]:
# Checking total bookings for each year for both hotel types
copy_df[['arrival_date_year','hotel']].value_counts().unstack()

In [None]:
# Function to show percentage in countplot
def barPerc(data,xVar,ax):

    # 1. how many X categories
    ##   check for NaN and remove (NaN != NaN); use the DataFrame passed in, not the global df
    numX=len([x for x in data[xVar].unique() if x==x])

    # 2. The bars are created in hue order, organize them
    bars = ax.patches
    ## 2a. For each X variable
    for ind in range(numX):
        ## 2b. Get every hue bar
        ##     ex. 8 X categories, 4 hues =>
        ##    [0, 8, 16, 24] are hue bars for 1st X category
        hueBars=bars[ind:][::numX]
        ## 2c. Get the total height (for percentages)
        total = sum([x.get_height() for x in hueBars])

        # 3. Print the percentage on the bars
        for bar in hueBars:
            ax.text(bar.get_x() + bar.get_width()/2.,
                    bar.get_height(),
                    f'{bar.get_height()/total:.0%}',
                    ha="center",va="bottom")

In [None]:
# Plotting count plot to show distribution of booking among Resort Hotels and City Hotels for all three years
plt.figure(figsize=(10,5))
ax=sns.countplot (x= 'arrival_date_year', data= copy_df, hue= 'hotel')
plt.xticks(size=15)
plt.xlabel('Year Of Booking',size=15)
plt.yticks(size=15)
plt.ylabel('count',size=15)
plt.title("Hotel Booking Distribution")
barPerc(copy_df,"arrival_date_year",ax)

##### 1. Why did you pick the specific chart?

This is a count plot from the seaborn library, a bar chart that shows the number of observations in each categorical bin. With a count plot we can easily visualize and compare the distribution of bookings.

##### 2. What is/are the insight(s) found from the chart?

From the graph we can conclude that in 2016 and 2017 city hotels received about 1.5 times as many bookings as resort hotels. In 2015 bookings were quite low compared to 2016 and 2017, and city and resort hotels received roughly equal numbers of bookings.
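A ratio like "about 1.5 times" can be read off a cross-tabulation instead of being estimated from bar heights. The sketch below uses a made-up toy frame, so its numbers are illustrative only and are not results from the dataset.

```python
import pandas as pd

# Made-up bookings: 2015 is balanced, 2016 has three city bookings for two resort bookings.
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2016, 2016, 2016],
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel",
              "City Hotel", "Resort Hotel", "Resort Hotel"],
})

# Rows are years, columns are hotel types, cells are booking counts.
counts = pd.crosstab(toy["arrival_date_year"], toy["hotel"])

# City-to-resort booking ratio for each year.
ratio = counts["City Hotel"] / counts["Resort Hotel"]
print(ratio)
```

With the cleaned data, the same two lines applied to `copy_df` would give the per-year ratio directly.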

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, it can have a positive business impact: the hotels can see which year brought in more business and then look for the reasons behind it.

#### Chart - 4

In [None]:
# Monthly arrival pattern of guests
month_counts = copy_df['arrival_date_month'].value_counts()
plt.figure(figsize=(14,7))
plt.pie(month_counts,
        labels=month_counts.index,  # take labels from the data so they always match the slices
        startangle=90,
        shadow=True,
        explode=[0.1]*3 + [0]*(len(month_counts) - 3),  # pull out the three busiest months
        autopct='%1.1f%%')
plt.title('Monthly Guest arrival pattern')

# Show the chart
plt.show()

In [None]:
# Bar plot
sns.countplot(x='arrival_date_month', hue='hotel', data=copy_df);
plt.title('Monthly Guest arrival pattern');

##### 1. Why did you pick the specific chart?

The first graph is a pie chart and the second is a count plot. Together they give an explicit view of monthly guest arrivals.

##### 2. What is/are the insight(s) found from the chart?

The pie chart and the count plot show the number of guest arrivals by month and by hotel type.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The two charts show in which months most guests arrive and which type of hotel (City Hotel or Resort Hotel) they stay in. This supports the staffing, promotion, and pricing adjustments described in the problem statement.

#### Chart - 5

In [None]:
# Checking monthly customer bookings and cancelations
copy_df[['arrival_date_month','is_canceled']].value_counts().unstack()

In [None]:
# Visualizing how many bookings got canceled for each month
plt.figure(figsize=(20,5))
ax=sns.countplot (x= 'arrival_date_month', data= copy_df, hue= 'is_canceled')
plt.xticks(size=15)
plt.xlabel('Month Of Booking',size=15)
plt.yticks(size=15)
plt.ylabel('count',size=15)
plt.title("Monthly Booking cancelations")
barPerc(copy_df,'arrival_date_month',ax)

In [None]:
# Visualizing monthly bookings that were not canceled
plt.figure(figsize=(12,5))
sort_order=['January','February','March','April','May','June','July','August','September','October','November','December']
monthly_bookings=copy_df[copy_df['is_canceled']==0]['arrival_date_month'].value_counts().reindex(sort_order)
plt.plot(monthly_bookings.index, monthly_bookings.values, 'bo', linestyle='dashed')
for x,y in zip(monthly_bookings.index,monthly_bookings.values):
    label = "{:.0f}".format(y)  # counts are whole numbers
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,10), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center

plt.show()

##### 1. Why did you pick the specific chart?

A count plot is used to check monthly bookings and cancellations, and a line chart shows the monthly bookings that were not canceled.

##### 2. What is/are the insight(s) found from the chart?

From these charts we can conclude that July and August receive the most bookings, almost double the number in January and December. In the other months bookings stay roughly constant at around 5000.


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

It will definitely create a positive business impact. We can see at which time of year most guests come to this location and plan accordingly, and we can also investigate why guests cancel their bookings.

#### Chart - 6

In [None]:
# Checking for how long people stay in Resort hotels
resort_df['total_stay_nights'].value_counts()

In [None]:
# Plotting graph to visualize and also show percentage distribution 
plt.figure(figsize=(24,5))
total = float(len(resort_df))
ax = sns.countplot(x='total_stay_nights', data=resort_df)
plt.title('Distribution Of Stay Duration For Resort Hotels', fontsize=20)
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()
    y = p.get_height()
    ax.annotate(percentage, (x, y),ha='right')
plt.show()

In [None]:
# Checking for how long people stay in City hotels
city_df['total_stay_nights'].value_counts()

In [None]:
# Plotting graph to visualize and also show percentage distribution 
plt.figure(figsize=(24,5))
total = float(len(city_df))
ax = sns.countplot(x='total_stay_nights', data=city_df)
plt.title('Distribution Of Stay Duration For City Hotels', fontsize=20)
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width()
    y = p.get_height()
    ax.annotate(percentage, (x, y),ha='right')
plt.show()

##### 1. Why did you pick the specific chart?

For this objective a count plot is used because, as mentioned above, it is well suited to visualizing the count of observations. We have plotted two graphs, one for Resort Hotels and the other for City Hotels.

##### 2. What is/are the insight(s) found from the chart?

From these we can conclude that in resort hotels most people stayed for just 1 night, while in city hotels most people stayed for 3 nights, followed by 2 nights and then 1 night.
The number of stays drops sharply after a week for City hotels and after a fortnight for Resort hotels.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

It will definitely have a business impact. It helps the hotels know how long a guest is likely to stay in a city hotel or a resort hotel, and what type of guests they are, for example whether they are staying for work or for travel.

#### Chart - 7

In [None]:
# Checking value counts for different meal types
ax=copy_df['meal'].value_counts()
ax

In [None]:
# Visualizing different meal types
plt.figure(figsize=(10,5))
plt.bar(ax.index, ax.values, color=(0.1, 0.1, 0.1, 0.1),  edgecolor='blue')
plt.title("Meal Types and their count")
plt.xlabel('meal')
plt.ylabel("count")

In [None]:
ax=copy_df[['meal','hotel']].value_counts().unstack()
ax

In [None]:
# Comparing preferred meal types for both hotels
plt.figure(figsize=(10,5))
ax=sns.countplot (x= 'hotel', data= copy_df, hue= 'meal')
plt.xticks(size=15)
plt.xlabel('Hotel',size=15)
plt.yticks(size=15)
plt.ylabel('count',size=15)
plt.title("Preferred Meal Types for both hotels")

##### 1. Why did you pick the specific chart?

In the first graph, a bar plot shows the booking count (a numeric value) for each meal type (a categorical variable). In the second graph, a count plot compares the preferred meal types between the two hotel types.

##### 2. What is/are the insight(s) found from the chart?


It can be concluded that BB (Bed & Breakfast) is the most preferred meal type for both hotels, followed by HB (half board) for Resort Hotel and SC (self catering) for City Hotel.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

It can have a business impact: hotel managers can improve the quality of the most requested meals so that guests want to come back, which raises profit.

#### Chart - 8

In [None]:
# Most common distribution channel for booking hotels
group_by_dc = copy_df.groupby('distribution_channel')
# Share of bookings per channel, relative to the cleaned dataset (copy_df, not the raw df)
d1 = pd.DataFrame(round((group_by_dc.size()/copy_df.shape[0])*100,2)).reset_index().rename(columns = {0: 'Booking_%'})
plt.figure(figsize = (8,8))
data = d1['Booking_%']
labels = d1['distribution_channel']
# One explode value per channel, so the call does not break if the number of channels changes
plt.pie(x=data, autopct="%.2f%%", explode=[0.05]*len(d1), labels=labels, pctdistance=0.5)
plt.title("Booking % by distribution channels", fontsize=14);

##### 1. Why did you pick the specific chart?

A pie chart is used here because it makes the share of each distribution channel easy to see.

##### 2. What is/are the insight(s) found from the chart?

According to the pie chart, almost 80% of the bookings are made through travel agents and tour operators (TA/TO).
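The same shares can also be computed as a table, which is handy when a slice is too thin to read on the pie. This is a minimal sketch on made-up values; `value_counts(normalize=True)` returns fractions that sum to 1.

```python
import pandas as pd

# Made-up channel labels for ten bookings (illustrative only).
channels = pd.Series(
    ["TA/TO"] * 8 + ["Direct"] + ["Corporate"],
    name="distribution_channel",
)

# Fraction of bookings per channel, converted to a percentage.
share = (channels.value_counts(normalize=True) * 100).round(2)
print(share)
```

On the notebook's data the equivalent call would be `copy_df['distribution_channel'].value_counts(normalize=True)`.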

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

There is considerable scope to increase bookings through GDS (Global Distribution System), which is grossly underutilized with just 0.21% of bookings.

#### Chart - 9

In [None]:
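# Visualizing the share of bookings made by each agent (agent 0 means the booking had no agent)
# figsize is passed to plot(); calling plt.figure() after plotting would only create an empty figure
copy_df.loc[copy_df['agent']!=0,'agent'].value_counts().plot(kind='pie',figsize=(8,8),title='Bookings done by agents comparison')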

##### 1. Why did you pick the specific chart?

A pie chart is used to visualize what proportion of bookings each agent accounts for.

##### 2. What is/are the insight(s) found from the chart?

It is observed that agents 240 and 9 are the most valuable agents. Their combined bookings account for more than 50% of all bookings made through agents.
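A claim about the combined share of the top agents is easier to check numerically than on a pie with many slices. The sketch below uses made-up agent IDs and counts, so the resulting share is illustrative and not a figure from the dataset.

```python
import pandas as pd

# Made-up agent IDs for ten agent bookings (illustrative only).
toy_agents = pd.Series([240, 240, 240, 9, 9, 9, 14, 7, 7, 1], name="agent")

# Fraction of agent bookings per agent, largest first.
agent_share = toy_agents.value_counts(normalize=True)

# Combined share of the two busiest agents.
top_two_share = agent_share.iloc[:2].sum()
print(agent_share.head(2))
print(f"Top two agents: {top_two_share:.0%} of agent bookings")
```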

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

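It can have a positive business impact: knowing which agents bring in the most bookings lets the hotel recognise and reward those partnerships.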

#### Chart - 10

In [None]:
# Distribution of Market Segment based on Deposit Type
plt.figure(figsize = (10,5))
sns.set(style = "whitegrid")
plt.title("Countplot Distribution of Market Segment by Deposit Type", fontdict = {'fontsize':20})
ax = sns.countplot(x = "market_segment", hue = 'deposit_type', data = copy_df)

##### 1. Why did you pick the specific chart?

A count plot from the seaborn library (`sns`) is used here to display the distribution of market segments by deposit type.


##### 2. What is/are the insight(s) found from the chart?


Most bookings in every market segment are made with no deposit. Some Groups and Offline TA/TO bookings are made with non-refundable deposits. Refundable deposits are very rare.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

It can have a positive business impact: refundable deposits are very rare, so offering more refundable options may increase their share of bookings.

#### Chart - 11

In [None]:
# Calculating the average ADR per hotel type
grouped_by_hotel = copy_df.groupby('hotel')
# .mean() replaces agg(np.mean), which newer pandas versions deprecate
d3 = grouped_by_hotel['adr'].mean().reset_index().rename(columns = {'adr':'avg_adr'})
plt.figure(figsize = (8,5))
sns.barplot(x = 'hotel', y = 'avg_adr', data = d3)
plt.title("Average ADR by hotel type")
plt.show()

##### 1. Why did you pick the specific chart?

I picked a bar graph because it is easy to use and easy to read.

##### 2. What is/are the insight(s) found from the chart?

The chart compares both hotels with respect to ADR, i.e. average daily rate. From the graph we can conclude that ADR is higher for City Hotels than for Resort Hotels.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

It can create a positive business impact by prompting a closer look at why city hotels have a higher ADR.

#### Chart - 12

In [None]:
# Checking How does the price per night vary over the year 
sns.lineplot(x = "arrival_date_month", y="adr", hue="hotel", data=copy_df, hue_order = ["City Hotel", "Resort Hotel"],palette= 'Set1')
plt.title("Room price per night  over the year", weight = 'bold')
plt.xlabel("Arrival Month")
plt.xticks(rotation=45)
plt.ylabel("ADR [EUR]")
plt.show()

##### 1. Why did you pick the specific chart?

The graph is a line plot from the seaborn library, used to show how the price per night varies over the year for both hotel types.


##### 2. What is/are the insight(s) found from the chart?

From the visualization we can observe that prices vary less for City Hotels than for Resort Hotels. Prices for Resort Hotel are highest in August and lowest in November and January. For City Hotel, prices are highest in May and lowest in January.
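Month names are plain strings, so a grouped summary sorts them alphabetically unless an explicit order is given. One possible approach, sketched here on made-up values, is to convert the month column to an ordered categorical before aggregating; `MONTH_ORDER` is a name introduced for this example.

```python
import pandas as pd

MONTH_ORDER = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

# Made-up rates (illustrative only).
toy = pd.DataFrame({
    "arrival_date_month": ["August", "January", "August", "May", "January"],
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel", "City Hotel"],
    "adr": [180.0, 50.0, 110.0, 120.0, 80.0],
})

# An ordered categorical makes sorting and grouping follow calendar order.
toy["arrival_date_month"] = pd.Categorical(
    toy["arrival_date_month"], categories=MONTH_ORDER, ordered=True
)

# Mean ADR per month and hotel; observed=True keeps only months present in the data.
monthly_adr = (
    toy.groupby(["arrival_date_month", "hotel"], observed=True)["adr"]
    .mean()
    .unstack("hotel")
    .sort_index()
)
print(monthly_adr)
```

The months with the highest and lowest average rate can then be read with `idxmax()` and `idxmin()` on each hotel column.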


##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

It can create a positive business impact by showing in which months hotels lower their prices.

#### Chart - 13

In [None]:
# Checking the effect of not allotting the reserved room type on adr
# Flag bookings where the assigned room type differs from the reserved one (1 = different, 0 = same).
# A vectorized comparison gives the same result as a row-wise apply and is much faster.
copy_df['same_room_not_alloted'] = (copy_df['reserved_room_type'] != copy_df['assigned_room_type']).astype(int)

In [None]:
plt.figure(figsize = (12,6))
# Plot the first 5000 rows only, taking both x and y from the same subset
sns.boxplot(x = 'same_room_not_alloted', y = 'adr', data = copy_df.head(5000))
plt.title("Effect of not allotting the reserved room type on adr")
plt.show()

##### 1. Why did you pick the specific chart?

A box plot is used because it shows the median, the quartiles, and the outliers of ADR for each group, which makes the two groups easy to compare.

##### 2. What is/are the insight(s) found from the chart?

The chart suggests that bookings where the reserved room type was not allotted have a lower ADR.
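The centre line of a box plot is the median, which can differ noticeably from the mean when a few rates are very high. A small sketch on made-up values shows how both can be computed per group to back up the visual comparison.

```python
import pandas as pd

# Made-up rates (illustrative only): 0 = reserved room allotted, 1 = different room allotted.
toy = pd.DataFrame({
    "same_room_not_alloted": [0, 0, 0, 1, 1, 1],
    "adr": [100.0, 120.0, 200.0, 60.0, 90.0, 95.0],
})

# Median (what the box plot shows) next to the mean, per group.
adr_summary = toy.groupby("same_room_not_alloted")["adr"].agg(["median", "mean"])
print(adr_summary)
```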

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

This insight does not by itself create a positive business impact.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
corr_df_data = copy_df[['lead_time', 'previous_cancellations', 'previous_bookings_not_canceled', 'booking_changes', 'days_in_waiting_list','adr',
          'required_car_parking_spaces','total_of_special_requests','total_stay_nights']]
corr_df=corr_df_data.corr()
plt.figure(figsize=(10,5))
sns.heatmap(corr_df, vmin=-1,annot=True,cmap='coolwarm')

##### 1. Why did you pick the specific chart?

I used a heatmap here because it is one of the best charts for visualizing the correlation between different variables in a dataframe.

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows that the highest correlation between any two different variables is 0.39, so none of these features is strongly correlated with another.
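Reading the strongest pair off an annotated heatmap is error-prone, so it can help to rank the pairs programmatically. The sketch below uses a small invented frame (`toy` and its values are assumptions, not the project data); on the real data, `corr` would be the `corr_df` computed above.

```python
import numpy as np
import pandas as pd

# Toy numeric frame; the values are invented for illustration only.
toy = pd.DataFrame({
    "lead_time": [10, 50, 200, 120, 30, 90],
    "adr": [120.0, 95.0, 60.0, 80.0, 110.0, 85.0],
    "booking_changes": [0, 1, 0, 2, 0, 1],
})
corr = toy.corr()

# Keep only the upper triangle; k=1 excludes the diagonal of 1.0 values
# and avoids listing each pair twice.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by absolute correlation, strongest first.
strongest = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(strongest)
```

Sorting by absolute value keeps strong negative correlations at the top as well, which a plain descending sort would push to the bottom.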

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code
cols = ['hotel', 'is_canceled', 'lead_time', 'adults', 'children', 'babies']
sns.pairplot(copy_df[cols], hue='hotel')
plt.show()

##### 1. Why did you pick the specific chart?

I used a pair plot here because it shows the pairwise relationships between several variables, and the distribution of each variable, in a single figure.

##### 2. What is/are the insight(s) found from the chart?

According to the chart, the cancellation percentage is higher for City hotels than for Resort hotels, and most adults preferred City hotels.
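A pair plot only suggests the difference in cancellations; the percentage itself can be computed directly. This is a minimal sketch on made-up data (`toy_df` and its values are assumptions), using the `hotel` and `is_canceled` column names from the pair plot code.

```python
import pandas as pd

# Toy stand-in for copy_df; the values are invented for illustration only.
toy_df = pd.DataFrame({
    "hotel": ["City Hotel"] * 5 + ["Resort Hotel"] * 5,
    "is_canceled": [1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
})

# The mean of a 0/1 flag is the fraction of cancelled bookings.
cancel_pct = toy_df.groupby("hotel")["is_canceled"].mean().mul(100).round(1)

# The gap can be stated in percentage points or relative to the City rate;
# the two readings differ, so it is worth saying which one is meant.
gap_points = cancel_pct["City Hotel"] - cancel_pct["Resort Hotel"]
gap_relative = gap_points / cancel_pct["City Hotel"] * 100
print(cancel_pct)
print(gap_points, gap_relative)
```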

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

My business objective was to increase bookings, decrease cancellations, and increase customer retention while also extending stays. Based on my investigation, I came up with the following recommendations:

*   Additional public marketing can help raise the number of visitors from certain nations. Even after guests depart, more effort can be made to retain them by keeping in touch through personalised emails, phone calls, etc.

*   Agents and market sectors that bring in more clients should also be recognised with awards and incentives.

*   Cancellations had a strong connection with new clients. If new clients are prone to cancel, further efforts should be made to retain them by providing discounts and offers. Additionally, greater efforts should be made to retain clients, as repeat clients generally cancel fewer bookings.

*   Data from each visitor's stay can be used to send them personalized offers to maximize their chances of booking again.

*   Launching customer loyalty programmes to reward loyal customers.

*   The length of each visitor's stay can be used to create a smart pricing model that encourages guests to prolong their stay.

*   Extra efforts should be made to foster positive relationships with clients, for example by sending them engaging emails such as "Thank you" and "Happy Holidays".

*   Good customer evaluations can have a significant impact on a hotel's brand value, and it is important to consider customer feedback and reviews in order to improve hotel amenities and the guest experience.

# **Conclusion**

1) Most of the guests are local, i.e. from Portugal. Among overseas visitors, European neighbours such as the United Kingdom, France, Spain, Germany and Italy had the highest footfall. The fewest visitors came from countries such as Zambia, Madagascar, Seychelles and the Faroe Islands. Hence there is scope to increase footfall from the countries that currently send few customers.

2) Barring 2015, almost two-thirds of the bookings in each year were for City Hotels. This may be due to frequent business travellers as well as easy accessibility.

3) Monthly visits show a wave-like pattern: July and August were heavily visited, while November, December and January were the least visited. Other months were moderately visited.

4) For every month, at least 21% of bookings were cancelled. The cancellation percentage was lowest in the least visited months (November, December, January) and highest in the heavily visited months. Hence there is scope for overbooking in all heavily visited as well as moderately visited months.

5) Duration of stay drops drastically after a week for City hotels and after a fortnight for Resort hotels. Resort hotels seem to be visited for weekend, weekly or fortnightly stays. People use City hotels mostly for short stays (1-4 days). Special customised packages can be offered to increase the stay duration.

6) Breakfast is the most preferred meal type, followed by Breakfast+Dinner and Self catering. Therefore special care should be taken of the customer experience during these meal timings, as well as of the availability of utensils and gas in kitchens or kitchenettes where guests can prepare their own food.

7) Almost 80% of the bookings are made through travel agents and tour operators. There is immense scope to increase bookings through GDS (Global Distribution System), as it is grossly underutilized with just 0.21% of bookings.
 
8) The lion's share of agent bookings comes from two agents (Agent 9 and Agent 240). Also, 75% of the bookings are made by just 8 agents. Therefore there is immense scope to increase bookings through other agents. Special care should also be taken to provide the best service to the two agents who bring in the most business.

9) Most bookings in all market segments are made without a deposit. However, some Groups bookings and Offline TA/TO bookings are made with non-refundable deposits too. Refundable deposits are minuscule and hence can be ignored.

10) City hotels have a higher adr and hence make more revenue per room than Resort hotels.

11) Price per night depends on the season. Heavily booked months have the highest adr.

12) Not allotting the demanded room lowers the adr, i.e. the room price paid by the customer, except for a few customers who paid a higher adr even though they were not allotted the same room. These customers do not seem to cancel their visit plans in spite of the different room allotment.

13) The cancellation rate of City hotels is around 30%. The cancellation rate of Resort hotels is around 20% lower than that of City hotels.

14) Cancellations show a similar pattern with respect to both waiting-list period and lead time: cancelled and non-cancelled bookings have similar waiting-time and lead-time values.

15) Cancellations are more frequent for higher average lead times (80 days). For lower average lead times (70 days or less) there are no cancellations, as people seem to have firm visit plans.

16) Repeat guests tend to cancel less than first-time visitors. This may be because of their prior satisfactory experience with the hotel. Efforts to retain customers should therefore also reduce the chance of cancellations.

17) Most cancellations come from Online TA, followed by Offline TA/TO, Direct, Groups and Corporate. The Complementary segment has no cancellations.

18) Most cancelled bookings were made by guests who did not pay any deposit. An incentive of a reduced adr can be given to customers who pay a deposit, and a further reduced adr to those who pay a non-refundable deposit.

19) Resort hotels have slightly more repeat customers as compared to City hotels. Therefore there is ample scope for city hotels to improve their services to increase repeated footfall.

20) Early bookings are made mostly through TA/TO, followed by Direct, GDS and Corporate. This might be because corporate visits are arranged at relatively shorter notice.

21) Among distribution channels, average waiting time is highest for TA/TO, followed by Direct and Corporate. Waiting time for GDS is negligible, possibly because the number of GDS bookings is itself very small.

22) The adr for City hotels has been increasing every year. For Resort hotels, the price dipped in 2016 but bounced back and increased appreciably in the following year.

23) City hotels have a higher cancellation rate, as the itineraries of their guests seem to change frequently, leading to more cancellations.
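Concentration figures like those in conclusion 8 (the share of bookings held by the top agents) can be reproduced with a cumulative-share calculation. The following is a minimal sketch on invented agent IDs (the `agents` series and its values are assumptions for illustration); on the project data the input would be the `agent` column of the cleaned dataframe.

```python
import pandas as pd

# Toy agent IDs, one entry per booking; the values are invented for illustration only.
agents = pd.Series([9, 9, 9, 9, 240, 240, 240, 14, 14, 7], name="agent")

# Share of bookings per agent; value_counts sorts from largest to smallest.
share = agents.value_counts(normalize=True)

# Running total of the share, starting from the largest agent.
cum_share = share.cumsum()

# Smallest number of top agents needed to reach at least 75% of bookings.
n_agents_for_75 = int((cum_share < 0.75).sum()) + 1
print(share)
print(n_agents_for_75)
```

Counting the agents whose running total is still below the threshold and adding one gives the first agent at which the threshold is reached.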

