<a href="https://colab.research.google.com/github/ish-war/Hotel-Booking-Analysis/blob/main/Hotel_Booking_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **🏩Hotel booking analysis 🏨**

Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay to get the best daily rate? What if you wanted to predict whether a hotel is likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions! It contains booking information for a city hotel and a resort hotel, including when the booking was made, the length of stay, the number of adults, children, and/or babies, and the number of required car parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyse the data to discover the important factors that govern bookings.



# **Objective 👽**
This EDA capstone project aims to extract meaningful insights from hotel booking data to improve decision-making in the hospitality industry. We will analyze booking trends, cancellations, pricing strategies, and customer preferences to provide actionable recommendations for optimizing occupancy rates, revenue, and customer satisfaction. 😀

# **Methodology 🙂**

In this project, we used Python libraries such as Pandas, NumPy, Matplotlib and Seaborn to examine, clean and analyse the “Hotel Booking Analysis” dataset.

In [15]:
```python
# first step - let's import necessary tools/libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
```

In [6]:
```python
from google.colab import drive
drive.mount('/content/drive')
```

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
```python
# let's upload the dataset
# from google.colab import files
# uploaded = files.upload()
```

In [16]:
```python
# load the csv file
df = pd.read_csv(r'/content/Hotel Bookings.csv')
```

In [17]:
```python
# total number of rows and columns before cleaning the data
print(f"We have a total of {df.shape[0]} rows and {df.shape[1]} columns")
```

We have a total of 119390 rows and 32 columns


## **Let's take a look at some rows and columns of the dataset ✅**

In [19]:
```python
# checking the first 5 rows of data
df.head()
```

| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit |  |  | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit |  |  | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit |  |  | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | 304.0 |  | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | ... | No Deposit | 240.0 |  | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03 |


In [18]:
```python
# checking the last 5 rows of data
df.tail()
```

| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 119385 | City Hotel | 0 | 23 | 2017 | August | 35 | 30 | 2 | 5 | 2 | ... | No Deposit | 394.0 |  | 0 | Transient | 96.14 | 0 | 0 | Check-Out | 2017-09-06 |
| 119386 | City Hotel | 0 | 102 | 2017 | August | 35 | 31 | 2 | 5 | 3 | ... | No Deposit | 9.0 |  | 0 | Transient | 225.43 | 0 | 2 | Check-Out | 2017-09-07 |
| 119387 | City Hotel | 0 | 34 | 2017 | August | 35 | 31 | 2 | 5 | 2 | ... | No Deposit | 9.0 |  | 0 | Transient | 157.71 | 0 | 4 | Check-Out | 2017-09-07 |
| 119388 | City Hotel | 0 | 109 | 2017 | August | 35 | 31 | 2 | 5 | 2 | ... | No Deposit | 89.0 |  | 0 | Transient | 104.4 | 0 | 0 | Check-Out | 2017-09-07 |
| 119389 | City Hotel | 0 | 205 | 2017 | August | 35 | 29 | 2 | 7 | 2 | ... | No Deposit | 9.0 |  | 0 | Transient | 151.2 | 0 | 2 | Check-Out | 2017-09-07 |


## **Dataset**
This dataset contains records of client stays at hotels. More specifically, it contains booking information for a city hotel and a resort hotel, including when the booking was made, the length of stay, the number of adults, children, and/or babies, whether the guest is a repeat visitor, and the number of days the booking spent on the waiting list, among other things. For the purpose of this analysis, we focus on a subset of these variables.
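The length of stay is not stored as a single column: it is split between `stays_in_weekend_nights` and `stays_in_week_nights`, and the party size is split across `adults`, `children` and `babies`. Below is a minimal sketch of how both could be derived. It assumes hypothetical column names `total_nights` and `total_guests` (they are not part of the dataset) and assumes that a missing `children` value means no children. It runs on a few made-up rows; the same expressions would apply to `df`.

```python
import pandas as pd

# Small synthetic sample using the dataset's column names (values are made up)
sample = pd.DataFrame({
    "stays_in_weekend_nights": [0, 2, 1],
    "stays_in_week_nights": [2, 5, 3],
    "adults": [2, 2, 1],
    "children": [0.0, None, 1.0],  # float dtype because of the missing value
    "babies": [0, 0, 1],
})

# Hypothetical derived column: total length of stay in nights
sample["total_nights"] = sample["stays_in_weekend_nights"] + sample["stays_in_week_nights"]

# Hypothetical derived column: party size; missing children counts are assumed to be 0
sample["total_guests"] = (
    sample["adults"] + sample["children"].fillna(0).astype(int) + sample["babies"]
)

print(sample[["total_nights", "total_guests"]])
```

Treating a missing `children` value as 0 is a modelling choice; dropping those rows would be an equally defensible alternative.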


In [None]:
```python
# checking the basic information of given dataset
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119390 entries, 0 to 119389
Data columns (total 32 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           119390 non-null  object 
 1   is_canceled                     119390 non-null  int64  
 2   lead_time                       119390 non-null  int64  
 3   arrival_date_year               119390 non-null  int64  
 4   arrival_date_month              119390 non-null  object 
 5   arrival_date_week_number        119390 non-null  int64  
 6   arrival_date_day_of_month       119390 non-null  int64  
 7   stays_in_weekend_nights         119390 non-null  int64  
 8   stays_in_week_nights            119390 non-null  int64  
 9   adults                          119390 non-null  int64  
 10  children                        119386 non-null  float64
 11  babies                          119390 non-null  int64  
 12  meal            
```

✅ The `.info()` method provided by pandas shows the data type of each column, the number of non-null values, and the memory usage. ✅
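`.info()` prints its summary instead of returning it. When the same facts are needed as values (for example, to select columns by type or by completeness), pandas exposes them separately through `DataFrame.dtypes`, `DataFrame.count()` and `DataFrame.memory_usage()`. A minimal sketch on a small synthetic frame (the values are made up; on the real data you would call the same methods on `df`):

```python
import pandas as pd

# Synthetic frame with one missing value, standing in for the hotel bookings data
sample = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel"],
    "lead_time": [342, 7, 13],
    "children": [0.0, None, 2.0],
})

dtypes = sample.dtypes                         # data type of each column
non_null = sample.count()                      # number of non-null values per column
memory_bytes = sample.memory_usage(deep=True)  # bytes per column, plus an 'Index' entry

print(dtypes)
print(non_null)
print(memory_bytes.sum())
```

`deep=True` makes pandas measure the actual size of the string objects in text columns, which the default shallow estimate does not.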

In [None]:
```python
# summary statistics to understand the distribution of the data;
# include='all' also summarises the text (object) columns,
# and values are rounded to 2 decimal places for readability
df.describe(include='all').round(2)
```


| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 119390 | 119390.0 | 119390.0 | 119390.0 | 119390 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | ... | 119390 | 103050.0 | 6797.0 | 119390.0 | 119390 | 119390.0 | 119390.0 | 119390.0 | 119390 | 119390 |
| unique | 2 |  |  |  | 12 |  |  |  |  |  | ... | 3 |  |  |  | 4 |  |  |  | 3 | 926 |
| top | City Hotel |  |  |  | August |  |  |  |  |  | ... | No Deposit |  |  |  | Transient |  |  |  | Check-Out | 2015-10-21 |
| freq | 79330 |  |  |  | 13877 |  |  |  |  |  | ... | 104641 |  |  |  | 89613 |  |  |  | 75166 | 1461 |
| mean |  | 0.37 | 104.01 | 2016.16 |  | 27.17 | 15.8 | 0.93 | 2.5 | 1.86 | ... |  | 86.69 | 189.27 | 2.32 |  | 101.83 | 0.06 | 0.57 |  |  |
| std |  | 0.48 | 106.86 | 0.71 |  | 13.61 | 8.78 | 1.0 | 1.91 | 0.58 | ... |  | 110.77 | 131.66 | 17.59 |  | 50.54 | 0.25 | 0.79 |  |  |
| min |  | 0.0 | 0.0 | 2015.0 |  | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... |  | 1.0 | 6.0 | 0.0 |  | -6.38 | 0.0 | 0.0 |  |  |
| 25% |  | 0.0 | 18.0 | 2016.0 |  | 16.0 | 8.0 | 0.0 | 1.0 | 2.0 | ... |  | 9.0 | 62.0 | 0.0 |  | 69.29 | 0.0 | 0.0 |  |  |
| 50% |  | 0.0 | 69.0 | 2016.0 |  | 28.0 | 16.0 | 1.0 | 2.0 | 2.0 | ... |  | 14.0 | 179.0 | 0.0 |  | 94.58 | 0.0 | 0.0 |  |  |
| 75% |  | 1.0 | 160.0 | 2017.0 |  | 38.0 | 23.0 | 2.0 | 3.0 | 2.0 | ... |  | 229.0 | 270.0 | 0.0 |  | 126.0 | 0.0 | 1.0 |  |  |


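Here, we see 32 columns in the dataframe, and some columns contain null values: for example 'children' (119,386 non-null of 119,390, so 4 missing), 'agent' (103,050 non-null, so 16,340 missing) and 'company' (6,797 non-null, so 112,593 missing). ✅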
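Instead of reading the gaps off `.info()` and `.describe()`, the missing values can be counted directly with `isnull().sum()` and expressed as a percentage of rows. A minimal sketch on made-up rows that mimic the pattern above (on the real data the same chain would start from `df`); the `summary` table layout is our own choice:

```python
import numpy as np
import pandas as pd

# Synthetic sample: 'children', 'agent' and 'company' contain gaps, 'adults' does not
sample = pd.DataFrame({
    "adults": [2, 2, 1, 2],
    "children": [0.0, np.nan, 1.0, 0.0],
    "agent": [304.0, np.nan, 240.0, np.nan],
    "company": [np.nan, np.nan, np.nan, 110.0],
})

missing = sample.isnull().sum()                        # number of nulls per column
missing_pct = (missing / len(sample) * 100).round(2)   # share of rows with a null, in %

summary = (
    pd.DataFrame({"missing": missing, "percent": missing_pct})
    .query("missing > 0")                              # keep only columns with gaps
    .sort_values("missing", ascending=False)           # most incomplete column first
)
print(summary)
```

Sorting by the count puts the most incomplete columns first, which is usually where the cleaning decisions (drop the column, fill it, or leave it) have to be made.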

## Variables Description



* hotel: Type of hotel (City Hotel or Resort Hotel)
* is_canceled: Whether the booking was canceled (1) or not (0)
* lead_time: Number of days between the booking date and the guests' arrival date
* arrival_date_year: Year of the arrival date
* arrival_date_month: Month of the arrival date
* arrival_date_week_number: Week number of the year for the arrival date
* arrival_date_day_of_month: Day of the month of the arrival date
* stays_in_weekend_nights: Number of weekend nights (Saturday or Sunday) spent at the hotel by the guests
* stays_in_week_nights: Number of weeknights (Monday to Friday) spent at the hotel by the guests
* adults: Number of adults among the guests
* children: Number of children
* babies: Number of babies
* meal: Type of meal booked
* country: Country of origin of the guests
* market_segment: Market segment designation
* distribution_channel: Name of the booking distribution channel
* is_repeated_guest: Whether the booking was made by a repeated guest (1) or not (0)
* previous_cancellations: Number of previous bookings canceled by the customer prior to the current booking
* previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking
* reserved_room_type: Code of the room type reserved
* assigned_room_type: Code of the room type assigned
* booking_changes: Number of changes made to the booking
* deposit_type: Type of deposit made by the guest
* agent: ID of the travel agent who made the booking
* company: ID of the company that made the booking
* days_in_waiting_list: Number of days the booking was on the waiting list
* customer_type: Type of customer, one of four categories
* adr: Average daily rate
* required_car_parking_spaces: Number of car parking spaces required by the customer
* total_of_special_requests: Number of special requests made by the customer
* reservation_status: Reservation status (Canceled, Check-Out, or No-Show)
* reservation_status_date: Date at which the last reservation status was updated
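Two of these variables are easier to analyse after a small transformation. The arrival date is spread across `arrival_date_year`, `arrival_date_month` (a full month name such as "July") and `arrival_date_day_of_month`. `is_canceled` is a 0/1 flag, so its mean is the cancellation rate: the `describe()` output above shows a mean of 0.37, i.e. roughly 37% of all bookings were canceled. A minimal sketch on made-up rows; the `arrival_date` and `cancel_rate` names are hypothetical choices, not dataset columns:

```python
import pandas as pd

# Synthetic rows using the dataset's column names (values are made up)
sample = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "is_canceled": [0, 1, 1, 1],
    "arrival_date_year": [2015, 2015, 2017, 2017],
    "arrival_date_month": ["July", "July", "August", "August"],
    "arrival_date_day_of_month": [1, 27, 30, 31],
})

# Combine the three arrival columns into one datetime; %B parses full English month names
sample["arrival_date"] = pd.to_datetime(
    sample["arrival_date_year"].astype(str)
    + "-" + sample["arrival_date_month"]
    + "-" + sample["arrival_date_day_of_month"].astype(str),
    format="%Y-%B-%d",
)

# is_canceled is 0/1, so its mean per hotel is the share of canceled bookings
cancel_rate = sample.groupby("hotel")["is_canceled"].mean()

print(sample["arrival_date"].dt.date.tolist())
print(cancel_rate)
```

Having a real datetime column makes it possible to sort, resample or extract weekdays, which the separate year/month/day columns do not support directly.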
