# Exploring the dataset & Identifying the different factors that affect hotel reservation

Our goal is to identify the different factors that affect the reservation and thus revenues of the two different hotel types:

•	Explore the dataset to find how different variables like agents, booking channel, lead time etc. contribute to the reservation of City and Resort Hotel.

•	Visualize how the reservation rates of City and Resort Hotel differ from each other.

Dataset Source: https://www.kaggle.com/jessemostipak/hotel-booking-demand

Original Source:  Hotel Booking Demand Datasets, written by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019. 

https://www.sciencedirect.com/science/article/pii/S2352340918315191



#### Describe important characteristics of the data
- How are bookings distributed across the hotel types? (How many bookings does each type have?)
- What is the average cancellation rate for each hotel type and for different countries?
- Timing of bookings and cancellations (What are the peak periods for bookings and cancellations?)
- Lead time of hotel bookings.
- What are the average daily rates (ADR) of the two hotel types over the three-year period?

#### Focus on the cancellation
- Is lead_time related to cancellation?
- Does is_repeated_guest affect the probability of cancellation?

#### Something more about customers and customer behavior
- What's the composition of the customers?
- How do customers behave (lead_time, special requests, meals, parking spaces)? (descriptive)
- Do customers' requests differ when they travel with children?



# A glance at the data set:

In [13]:
%%bigquery 
--project ba775-team-6
SELECT *
FROM `ba-775-project-team6.HotelBooking.hotelbooking`
LIMIT 5;

Query complete after 0.00s: 100%|██████████| 1/1 [00:00<00:00, 507.54query/s]                          
Downloading: 100%|██████████| 5/5 [00:01<00:00,  3.35rows/s]


Unnamed: 0,hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,stays_in_weekend_nights,stays_in_week_nights,adults,...,deposit_type,agent,company,days_in_waiting_list,customer_type,adr,required_car_parking_spaces,total_of_special_requests,reservation_status,reservation_status_date
0,Resort Hotel,0,4,2015,November,45,2,1,1,1,...,No Deposit,1,,0,Transient,25.0,0,0,Check-Out,2015-11-04
1,Resort Hotel,0,5,2015,November,49,30,1,3,1,...,No Deposit,1,,0,Transient,25.0,0,0,Check-Out,2015-12-04
2,Resort Hotel,0,11,2016,January,2,3,2,1,2,...,No Deposit,1,,0,Transient,39.0,1,1,Check-Out,2016-01-06
3,City Hotel,0,3,2015,July,27,2,0,3,1,...,No Deposit,1,,0,Transient-Party,58.67,0,0,Check-Out,2015-07-05
4,City Hotel,0,43,2015,July,27,3,0,2,2,...,No Deposit,1,,0,Transient-Party,86.0,0,0,Check-Out,2015-07-05


In [32]:
%%bigquery
SELECT COUNT(*) AS Total_Sample_Number FROM `ba-775-project-team6.HotelBooking.hotelbooking`

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 1164.28query/s]                        
Downloading: 100%|██████████| 1/1 [00:01<00:00,  1.60s/rows]


Unnamed: 0,Total_Sample_Number
0,119390


The dataset contains 32 columns and 119,390 rows. Important variables include is_canceled (whether the booking was canceled), lead_time (how many days in advance the reservation was made), and is_repeated_guest (whether the customer is a repeated guest). The bookings come from all over the world and are divided into two hotel types: Resort Hotel and City Hotel.

# Describe important characteristics of the data

### How many booking cases and cancellations do different types of hotels have? 
### What's the cancellation rate of the two types of hotels?

In [78]:
%%bigquery
SELECT 
hotel AS Hotel_Type,
COUNT(*) AS Total_Number_of_Booking,
SUM(is_canceled) AS Total_Number_of_Cancellation,
AVG(is_canceled) AS Cancellation_Rate
FROM `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY hotel

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 987.13query/s]                         
Downloading: 100%|██████████| 2/2 [00:01<00:00,  1.77rows/s]


Unnamed: 0,Hotel_Type,Total_Number_of_Booking,Total_Number_of_Cancellation,Cancellation_Rate
0,Resort Hotel,40060,11122,0.277634
1,City Hotel,79330,33102,0.41727


The cancellation rate of Resort Hotel is 0.277634, and the cancellation rate of City Hotel is 0.417270.
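
As a quick sanity check, the Cancellation_Rate column should equal cancellations divided by bookings, because is_canceled is a 0/1 flag and its average is the share of ones. Below is a minimal Python sketch using the counts from the result above; the function name `cancellation_rate` is ours and is not part of the notebook.

```python
def cancellation_rate(cancellations: int, bookings: int) -> float:
    """Share of bookings that were canceled (same as AVG of a 0/1 flag)."""
    if bookings <= 0:
        raise ValueError("bookings must be positive")
    return cancellations / bookings

# (cancellations, bookings) copied from the query result above.
counts = {
    "Resort Hotel": (11122, 40060),
    "City Hotel": (33102, 79330),
}
rates = {hotel: cancellation_rate(c, n) for hotel, (c, n) in counts.items()}
for hotel, rate in rates.items():
    print(f"{hotel}: {rate:.6f}")
```

The recomputed values should agree with the Cancellation_Rate column to six decimal places.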

### Identify the cancellation rate for the countries with the most booking cases.

In [81]:
%%bigquery  
SELECT
  country,
  avg(is_canceled) as cancellation_rate
FROM
  `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY
  country
ORDER BY
  count(*) DESC
LIMIT
  5


Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 1727.95query/s]                        
Downloading: 100%|██████████| 5/5 [00:01<00:00,  2.90rows/s]


Unnamed: 0,country,cancellation_rate
0,PRT,0.566351
1,GBR,0.202243
2,FRA,0.185694
3,ESP,0.254085
4,DEU,0.167147


The cancellation rates of the five countries with the most booking cases are: PRT 0.566351, GBR 0.202243, FRA 0.185694, ESP 0.254085, DEU 0.167147. (Listed in descending order of booking count, as returned by the query.)
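
The query orders countries by booking count, so the rates themselves are not sorted. To rank the same five countries by cancellation rate instead, a small Python sketch over the values copied from the result above could look like this (the variable names are illustrative):

```python
# Rates copied from the query result above (top five countries by booking count).
rates_by_country = {
    "PRT": 0.566351,
    "GBR": 0.202243,
    "FRA": 0.185694,
    "ESP": 0.254085,
    "DEU": 0.167147,
}

# Sort by rate, highest first.
ranked = sorted(rates_by_country.items(), key=lambda kv: kv[1], reverse=True)
for country, rate in ranked:
    print(f"{country}: {rate:.1%}")
```

Ranked this way, PRT is far ahead of the other four, followed by ESP, GBR, FRA, and DEU.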

![](https://2615d6a73e1c8ea7-dot-us-west1.notebooks.googleusercontent.com/files/Project/Screenshot%202021-08-29%20at%2012.10.33%20PM.png?_xsrf=2%7C97c0573c%7C9ee4eeeb0f3089fb46944e66596858da%7C1629735722)


### What are the peak periods for cancellations and bookings?

In [83]:
%%bigquery
SELECT Month,Number_of_Booking,
ROUND(Number_of_Booking/Total_Booking,3) AS Proportion_of_Booking,
Number_of_Cancellation,
Round(Number_of_Cancellation/Number_of_Booking,3) AS Cancellation_Rate
FROM 
(SELECT arrival_date_month AS Month,
COUNT(*) AS Number_of_Booking,
SUM(is_canceled) Number_of_Cancellation,
 (SELECT COUNT(*) FROM `ba-775-project-team6.HotelBooking.hotelbooking`) AS Total_Booking
FROM `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY arrival_date_month)
ORDER BY PARSE_DATE('%b',Month)


Query complete after 0.00s: 100%|██████████| 5/5 [00:00<00:00, 2750.72query/s]                        
Downloading: 100%|██████████| 12/12 [00:01<00:00,  9.19rows/s]


Unnamed: 0,Month,Number_of_Booking,Proportion_of_Booking,Number_of_Cancellation,Cancellation_Rate
0,January,5929,0.05,1807,0.305
1,February,8068,0.068,2696,0.334
2,March,9794,0.082,3149,0.322
3,April,11089,0.093,4524,0.408
4,May,11791,0.099,4677,0.397
5,June,10939,0.092,4535,0.415
6,July,12661,0.106,4742,0.375
7,August,13877,0.116,5239,0.378
8,September,10508,0.088,4116,0.392
9,October,11160,0.093,4246,0.38


#### a) For a booking case

In [91]:
%%bigquery
SELECT Month,Number_of_Booking,
ROUND(Number_of_Booking/Total_Booking,3) AS Proportion_of_Booking,
Number_of_Cancellation,
Round(Number_of_Cancellation/Number_of_Booking,3) AS Cancellation_Rate
FROM 
(SELECT arrival_date_month AS Month,
COUNT(*) AS Number_of_Booking,
SUM(is_canceled) Number_of_Cancellation,
 (SELECT COUNT(*) FROM `ba-775-project-team6.HotelBooking.hotelbooking`) AS Total_Booking
FROM `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY arrival_date_month)
ORDER BY Number_of_Booking DESC
LIMIT 3

Query complete after 0.00s: 100%|██████████| 5/5 [00:00<00:00, 2824.45query/s]                        
Downloading: 100%|██████████| 3/3 [00:01<00:00,  1.94rows/s]


Unnamed: 0,Month,Number_of_Booking,Proportion_of_Booking,Number_of_Cancellation,Cancellation_Rate
0,August,13877,0.116,5239,0.378
1,July,12661,0.106,4742,0.375
2,May,11791,0.099,4677,0.397


![](https://2615d6a73e1c8ea7-dot-us-west1.notebooks.googleusercontent.com/files/Project/Screenshot%202021-08-29%20at%2012.31.37%20PM.png?_xsrf=2%7C97c0573c%7C9ee4eeeb0f3089fb46944e66596858da%7C1629735722)

The month with the highest number and highest proportion of bookings is August.

#### b) For a cancellation

In [92]:
%%bigquery
SELECT Month,Number_of_Booking,
ROUND(Number_of_Booking/Total_Booking,3) AS Proportion_of_Booking,
Number_of_Cancellation,
Round(Number_of_Cancellation/Number_of_Booking,3) AS Cancellation_Rate
FROM 
(SELECT arrival_date_month AS Month,
COUNT(*) AS Number_of_Booking,
SUM(is_canceled) Number_of_Cancellation,
 (SELECT COUNT(*) FROM `ba-775-project-team6.HotelBooking.hotelbooking`) AS Total_Booking
FROM `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY arrival_date_month)
ORDER BY Cancellation_Rate DESC
LIMIT 3

Query complete after 0.00s: 100%|██████████| 5/5 [00:00<00:00, 2626.04query/s]                        
Downloading: 100%|██████████| 3/3 [00:01<00:00,  1.72rows/s]


Unnamed: 0,Month,Number_of_Booking,Proportion_of_Booking,Number_of_Cancellation,Cancellation_Rate
0,June,10939,0.092,4535,0.415
1,April,11089,0.093,4524,0.408
2,May,11791,0.099,4677,0.397


![](https://2615d6a73e1c8ea7-dot-us-west1.notebooks.googleusercontent.com/files/Project/Screenshot%202021-08-29%20at%2012.33.52%20PM.png?_xsrf=2%7C97c0573c%7C9ee4eeeb0f3089fb46944e66596858da%7C1629735722)

The month with the highest rate of cancellation is June.

### How many days in advance are hotels booked (lead_time)?
### Is there any difference for different hotel types?

In [1]:
%%bigquery 
SELECT hotel,avg(lead_time)as avg_lead_time FROM `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY hotel


Query complete after 0.01s: 100%|██████████| 2/2 [00:00<00:00, 1184.83query/s]                        
Downloading: 100%|██████████| 2/2 [00:01<00:00,  1.44rows/s]


Unnamed: 0,hotel,avg_lead_time
0,Resort Hotel,92.675686
1,City Hotel,109.735724


![](https://2615d6a73e1c8ea7-dot-us-west1.notebooks.googleusercontent.com/files/Project/Screenshot%202021-08-29%20at%201.29.20%20PM.png?_xsrf=2%7C97c0573c%7C9ee4eeeb0f3089fb46944e66596858da%7C1629735722)

#### Attention:
The average lead time is 92.675686 days for Resort Hotel and 109.735724 days for City Hotel. But that's not the whole picture. The averages are high not because most people book nearly 100 days in advance, but mainly because a small number of very early bookings pull the mean up. The next two queries show this.

In [21]:
%%bigquery
SELECT
lead_weeks AS Weeks_in_advance,
COUNT(*) AS Number_of_Booking,
COUNT(*)/(SELECT COUNT(*) FROM `ba-775-project-team6.HotelBooking.hotelbooking`) AS Proportion_of_Booking
FROM
(SELECT ROUND(lead_time/7)+1 AS lead_weeks
FROM `ba-775-project-team6.HotelBooking.hotelbooking`)
GROUP BY lead_weeks
ORDER BY lead_weeks
LIMIT 5

Query complete after 0.00s: 100%|██████████| 5/5 [00:00<00:00, 2629.66query/s]                        
Downloading: 100%|██████████| 5/5 [00:01<00:00,  3.22rows/s]


Unnamed: 0,Weeks_in_advance,Number_of_Booking,Proportion_of_Booking
0,1.0,13690,0.114666
1,2.0,9162,0.07674
2,3.0,6582,0.05513
3,4.0,5108,0.042784
4,5.0,4849,0.040615


In [24]:
%%bigquery
SELECT
lead_weeks AS Weeks_in_advance,
COUNT(*) AS Number_of_Booking,
COUNT(*)/(SELECT COUNT(*) FROM `ba-775-project-team6.HotelBooking.hotelbooking`) AS Proportion_of_Booking
FROM
(SELECT ROUND(lead_time/7)+1 AS lead_weeks
FROM `ba-775-project-team6.HotelBooking.hotelbooking`)
GROUP BY lead_weeks
ORDER BY lead_weeks DESC
LIMIT 5

Query complete after 0.00s: 100%|██████████| 5/5 [00:00<00:00, 2920.82query/s]                        
Downloading: 100%|██████████| 5/5 [00:01<00:00,  3.90rows/s]


Unnamed: 0,Weeks_in_advance,Number_of_Booking,Proportion_of_Booking
0,106.0,1,8e-06
1,102.0,1,8e-06
2,91.0,17,0.000142
3,90.0,47,0.000394
4,89.0,17,0.000142


These results show a right-skewed distribution: about a third of all bookings fall into the first five week buckets, while a handful are made roughly two years ahead (buckets 102 and 106). This long tail raises the average lead time.
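
One detail of the bucketing is worth spelling out. The queries use `ROUND(lead_time/7)+1`, which rounds to the nearest week, so bucket 1 covers lead times of 0 to 3 days and bucket 2 covers 4 to 10 days. A floor-based bucket would instead give calendar-style weeks (0 to 6 days, 7 to 13 days). The Python sketch below mirrors both variants; the helper names are illustrative, and the floor version is an alternative for comparison, not what the notebook ran.

```python
import math

def round_week_bucket(lead_time_days: int) -> int:
    """Mirror of ROUND(lead_time/7)+1 for non-negative whole days.

    lead_time/7 is never exactly x.5 for whole days, so adding 0.5 and
    flooring gives the same result as rounding to the nearest integer.
    """
    return math.floor(lead_time_days / 7 + 0.5) + 1

def floor_week_bucket(lead_time_days: int) -> int:
    """Alternative: week 1 = days 0-6, week 2 = days 7-13, and so on."""
    return lead_time_days // 7 + 1

for days in (0, 3, 4, 6, 7, 10, 11):
    print(days, round_week_bucket(days), floor_week_bucket(days))
```

Either choice leaves the overall picture unchanged, but the bucket labels should be read with the rounding rule in mind.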

### What are the average daily rates of the two types of Hotels over the three-year period?

Average Daily Rate (ADR) - Calculated by dividing the sum of all lodging transactions by the total number of staying nights
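
To make the definition concrete, here is a minimal Python sketch of the ADR formula. The function name and the sample figures are hypothetical and are not taken from the dataset.

```python
def average_daily_rate(lodging_revenue: float, staying_nights: int) -> float:
    """ADR = total lodging revenue / total number of staying nights."""
    if staying_nights <= 0:
        raise ValueError("staying_nights must be positive")
    return lodging_revenue / staying_nights

# Hypothetical booking: 3 nights billed at 258.00 in total.
example_adr = average_daily_rate(258.0, 3)
print(example_adr)  # 86.0
```

Note that the query below averages the per-booking adr values, so each booking counts equally regardless of its length of stay.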

In [25]:
%%bigquery

SELECT hotel, AVG(adr) AS Avg_Dailyrate, arrival_date_year
FROM
  `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY
  arrival_date_year, hotel
ORDER BY 
  arrival_date_year, hotel;


Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 1236.77query/s]                        
Downloading: 100%|██████████| 6/6 [00:01<00:00,  4.32rows/s]


Unnamed: 0,hotel,Avg_Dailyrate,arrival_date_year
0,City Hotel,85.856915,2015
1,Resort Hotel,89.353417,2015
2,City Hotel,103.483683,2016
3,Resort Hotel,87.730762,2016
4,City Hotel,117.501864,2017
5,Resort Hotel,108.660217,2017


The average daily rates by year were: City Hotel 85.856915 and Resort Hotel 89.353417 in 2015; City Hotel 103.483683 and Resort Hotel 87.730762 in 2016; City Hotel 117.501864 and Resort Hotel 108.660217 in 2017. The City Hotel rate rose every year and was higher than the Resort Hotel rate from 2016 onward.

# Focus on the cancellation

### If a reservation is made very early, does it affect the likelihood of cancellation?

### Method 1

In [97]:
%%bigquery
SELECT
is_canceled As Canceled_or_Not,
AVG(lead_time) AS Days_in_Advance
FROM  `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY is_canceled

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 984.00query/s]                         
Downloading: 100%|██████████| 2/2 [00:01<00:00,  1.34rows/s]


Unnamed: 0,Canceled_or_Not,Days_in_Advance
0,0,79.984687
1,1,144.848815


The average lead time is 79.984687 days for bookings that were not canceled and 144.848815 days for canceled bookings.
The next query looks at this in more detail.

### Method 2

In [104]:
%%bigquery
SELECT
lead_weeks AS Weeks_in_advance,
AVG(is_canceled) AS Cancellation_Rate
FROM
(SELECT 
is_canceled,
lead_time,
CASE WHEN lead_time<=7 THEN 'One week'
WHEN lead_time<=14 THEN 'Two weeks'
WHEN lead_time<=21 THEN 'Three weeks'
ELSE 'More than three weeks' END AS lead_weeks
FROM `ba-775-project-team6.HotelBooking.hotelbooking`)
GROUP BY lead_weeks
ORDER BY Cancellation_Rate

Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 1630.33query/s]                        
Downloading: 100%|██████████| 4/4 [00:01<00:00,  2.27rows/s]


Unnamed: 0,Weeks_in_advance,Cancellation_Rate
0,One week,0.096323
1,Two weeks,0.22004
2,Three weeks,0.286881
3,More than three weeks,0.450422


![](https://2615d6a73e1c8ea7-dot-us-west1.notebooks.googleusercontent.com/files/Project/Screenshot%202021-08-30%20at%205.35.22%20PM.png?_xsrf=2%7C97c0573c%7C9ee4eeeb0f3089fb46944e66596858da%7C1629735722)

The cancellation rate rises with lead time, which matches intuition: the earlier a booking is made, the more time there is for plans to change.
The cancellation rate is 0.096323 for bookings made up to one week in advance, 0.220040 for up to two weeks, 0.286881 for up to three weeks, and 0.450422 for more than three weeks.
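
To put these rates on a common scale, each bucket can be expressed relative to the one-week bucket. The Python sketch below uses the rates copied from the result above; the variable names are illustrative.

```python
# Cancellation rates copied from the query result above.
rate_by_lead = {
    "One week": 0.096323,
    "Two weeks": 0.220040,
    "Three weeks": 0.286881,
    "More than three weeks": 0.450422,
}

# Express every bucket as a multiple of the one-week rate.
baseline = rate_by_lead["One week"]
relative = {bucket: rate / baseline for bucket, rate in rate_by_lead.items()}
for bucket, ratio in relative.items():
    print(f"{bucket}: {ratio:.2f}x the one-week rate")
```

Bookings made more than three weeks ahead are canceled roughly 4.7 times as often as bookings made within a week.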

### Is a change of room type related to cancellation?

In [105]:
%%bigquery
SELECT
Change_of_room_type,
AVG(is_canceled) AS Cancellation_Rate
FROM
(SELECT
is_canceled,
CASE WHEN reserved_room_type=assigned_room_type THEN 0
WHEN reserved_room_type!=assigned_room_type THEN 1 END AS Change_of_room_type
FROM `ba-775-project-team6.HotelBooking.hotelbooking`)
GROUP BY Change_of_room_type

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 1055.44query/s]                        
Downloading: 100%|██████████| 2/2 [00:01<00:00,  1.07rows/s]


Unnamed: 0,Change_of_room_type,Cancellation_Rate
0,0,0.415629
1,1,0.053764


The cancellation rate is 0.415629 for bookings without a room type change and 0.053764 for bookings with one. This looks surprising at first, but a change of room type does not necessarily mean lower quality: hotels often upgrade guests at no charge, and in that case guests have no reason to cancel.

### If a customer is a repeated guest, would it affect the probability of cancellation?

In [111]:
%%bigquery
SELECT
is_repeated_guest,
AVG(is_canceled) AS Cancellation_Rate
FROM `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY is_repeated_guest

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 1053.71query/s]                        
Downloading: 100%|██████████| 2/2 [00:01<00:00,  1.33rows/s]


Unnamed: 0,is_repeated_guest,Cancellation_Rate
0,0,0.377851
1,1,0.144882


Repeated guests tend to keep their bookings: having booked before, they are presumably satisfied with the hotel's environment and price.
The cancellation rate is 0.377851 for first-time guests and 0.144882 for repeated guests.

# Something more about customers and customer behavior

### Customer composition: What are the different types of customers who make reservations?

In [4]:

%%bigquery
SELECT 
customer_type, 
CASE
WHEN customer_type = 'Transient' THEN 'The booking is not part of a group'
WHEN customer_type = 'Transient-Party' THEN 'The booking is associated to other booking'
WHEN customer_type = 'Contract' THEN 'The booking has a contract associated to it'
WHEN customer_type = 'Group' THEN 'The booking is associated to a group' END AS Description,
COUNT(*) AS Number_of_Booking
FROM
  `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY
  customer_type
ORDER BY
  customer_type;

Query complete after 0.00s: 100%|██████████| 4/4 [00:00<00:00, 1366.22query/s]                        
Downloading: 100%|██████████| 4/4 [00:01<00:00,  2.98rows/s]


Unnamed: 0,customer_type,Description,Number_of_Booking
0,Contract,The booking has a contract associated to it,4076
1,Group,The booking is associated to a group,577
2,Transient,The booking is not part of a group,89613
3,Transient-Party,The booking is associated to other booking,25124


The types of customers are: Contract, Group, Transient, Transient-Party.

### What are the different types of Distribution Channels available?

In [28]:
%%bigquery

SELECT
  distribution_channel
FROM
  `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY
  distribution_channel;

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 1173.89query/s]                        
Downloading: 100%|██████████| 5/5 [00:01<00:00,  3.18rows/s]


Unnamed: 0,distribution_channel
0,TA/TO
1,Direct
2,Corporate
3,GDS
4,Undefined


The types of distribution channels available are: Travel Agents/Tour Operators, Direct, Corporate, GDS, and Undefined.

### Total cancellations of two hotel types with respect to the different distribution channels

In [6]:
%%bigquery

SELECT
  hotel, distribution_channel, SUM(is_canceled) AS Total_Cancellations, AVG(is_canceled) AS Cancellation_Rate
FROM
  `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY
  hotel, distribution_channel
ORDER BY 
  distribution_channel, hotel ;

Query complete after 0.00s: 100%|██████████| 3/3 [00:00<00:00, 1825.73query/s]                        
Downloading: 100%|██████████| 9/9 [00:01<00:00,  7.56rows/s]


Unnamed: 0,hotel,distribution_channel,Total_Cancellations,Cancellation_Rate
0,City Hotel,Corporate,786,0.230634
1,Resort Hotel,Corporate,688,0.210462
2,City Hotel,Direct,1232,0.181711
3,Resort Hotel,Direct,1325,0.168468
4,City Hotel,GDS,37,0.19171
5,City Hotel,TA/TO,31043,0.450257
6,Resort Hotel,TA/TO,9109,0.314918
7,City Hotel,Undefined,4,1.0
8,Resort Hotel,Undefined,0,0.0


![](https://2615d6a73e1c8ea7-dot-us-west1.notebooks.googleusercontent.com/files/Project/Screenshot%202021-08-29%20at%2011.57.58%20AM.png?_xsrf=2%7C97c0573c%7C9ee4eeeb0f3089fb46944e66596858da%7C1629735722)

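Among the defined distribution channels, Travel Agents/Tour Operators (TA/TO) has the highest cancellation rate for both hotels: 0.450257 for City Hotel and 0.314918 for Resort Hotel. (The Undefined channel shows a rate of 1.0 for City Hotel, but it covers only 4 bookings, which is too few to be meaningful.)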
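
The channel with the highest cancellation rate per hotel can be read off the result programmatically. The Python sketch below uses the rows copied from the result above and skips the Undefined channel, which for City Hotel covers only 4 bookings; skipping it is our assumption, not something the query does.

```python
# (hotel, channel, cancellations, cancellation_rate) copied from the result above.
rows = [
    ("City Hotel", "Corporate", 786, 0.230634),
    ("Resort Hotel", "Corporate", 688, 0.210462),
    ("City Hotel", "Direct", 1232, 0.181711),
    ("Resort Hotel", "Direct", 1325, 0.168468),
    ("City Hotel", "GDS", 37, 0.19171),
    ("City Hotel", "TA/TO", 31043, 0.450257),
    ("Resort Hotel", "TA/TO", 9109, 0.314918),
    ("City Hotel", "Undefined", 4, 1.0),
    ("Resort Hotel", "Undefined", 0, 0.0),
]

# Track the highest-rate channel per hotel, ignoring the Undefined channel.
best = {}
for hotel, channel, _cancellations, rate in rows:
    if channel == "Undefined":
        continue
    if hotel not in best or rate > best[hotel][1]:
        best[hotel] = (channel, rate)

for hotel, (channel, rate) in best.items():
    print(f"{hotel}: {channel} ({rate:.6f})")
```

For both hotels the result is TA/TO, with the City Hotel rate well above the Resort Hotel rate.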

### When traveling with children, are families more likely to drive?

In [39]:
%%bigquery
SELECT
  (CASE WHEN children>0 Then 'With_child'
WHEN children=0 Then 'Not_with_child' END) AS With_child_or_not,
   ROUND(AVG(required_car_parking_spaces),3) AS Posibility_of_Driving
FROM
  `ba-775-project-team6.HotelBooking.hotelbooking`
GROUP BY 
With_child_or_not

Query complete after 0.00s: 100%|██████████| 2/2 [00:00<00:00, 1049.36query/s]                        
Downloading: 100%|██████████| 2/2 [00:01<00:00,  1.38rows/s]


Unnamed: 0,With_child_or_not,Posibility_of_Driving
0,Not_with_child,0.058
1,With_child,0.116


The results show that guests traveling with children request about twice as many parking spaces per booking on average (0.116 vs. 0.058), which suggests they are more likely to drive and ask for reserved parking.

![](https://2615d6a73e1c8ea7-dot-us-west1.notebooks.googleusercontent.com/files/Project/Screenshot%202021-08-30%20at%205.58.25%20PM.png?_xsrf=2%7C97c0573c%7C9ee4eeeb0f3089fb46944e66596858da%7C1629735722)

![](https://2615d6a73e1c8ea7-dot-us-west1.notebooks.googleusercontent.com/files/Project/Screenshot%202021-08-30%20at%205.59.02%20PM.png?_xsrf=2%7C97c0573c%7C9ee4eeeb0f3089fb46944e66596858da%7C1629735722)

# Conclusion

This dataset consists of hotel booking cases collected from all over the world and divided into two hotel types: Resort Hotel and City Hotel. We analyzed this dataset to determine the variables that affect the booking and cancellation rates of the hotels. The variables with the most impact are lead time, repeated customers, and peak season.

Bookings peak in summer: August and July have the most arrivals, followed by May. Because the months in our queries are arrival months, the cancellation figures describe stays planned for that month. Bookings for June stays were canceled most often (0.415), ahead of April (0.408) and May (0.397).

The next variable that affects the cancellation rate is the lead time of the booking. We observed that a longer lead time is associated with a much higher cancellation rate. For bookings made more than three weeks ahead, the cancellation rate is almost five times that of bookings made within one week.

The next variable that we investigated was the impact of a room change on the cancellation rate. We expected the cancellation rate to increase when the hotel changed the customer's room type, but the result contradicted our expectations. The cancellation rate of customers who did not receive a room change was almost 8 times that of customers who did. This suggests that the change in room type did not compromise the quality of the rooms and still met the expectations of most customers. Additionally, the room change may have been an upgrade for some, which would explain the very low cancellation rate.

We then investigated the booking behaviour of repeated customers. We found that repeated customers have a lower cancellation rate, presumably because they are more satisfied with the hotel.
