 # **Project Name** - **Booking.com - Hotel Booking Analysis**

##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Problem Statement**

##### Booking.com manages a wide range of hotel bookings across various locations, customer profiles, and channels. The dataset includes key details like booking lead times, room types, customer preferences, and reservation statuses. The challenge is to analyze how factors like booking windows, guest demographics, and reservation types affect hotel performance and customer experience. By leveraging these insights, Booking.com aims to **optimize booking efficiency, predict guest needs, and improve customer satisfaction, ensuring a competitive edge in the dynamic hospitality industry**.

# **General Guidelines:**

# ***Let's Begin!***

## ***1. Know Your Data***

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [188]:
df = pd.read_csv('/content/drive/MyDrive/Local Disk (D:)/Data Analyst/Project_Root/Data/Raw/Hotel Bookings.csv')

### Dataset First View

In [4]:
df.head()

| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | NaN | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | 304.0 | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | ... | No Deposit | 240.0 | NaN | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03 |


### Dataset Rows & Columns count

In [84]:
df.shape

(119390, 32)

### Dataset Information

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119390 entries, 0 to 119389
Data columns (total 32 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           119390 non-null  object 
 1   is_canceled                     119390 non-null  int64  
 2   lead_time                       119390 non-null  int64  
 3   arrival_date_year               119390 non-null  int64  
 4   arrival_date_month              119390 non-null  object 
 5   arrival_date_week_number        119390 non-null  int64  
 6   arrival_date_day_of_month       119390 non-null  int64  
 7   stays_in_weekend_nights         119390 non-null  int64  
 8   stays_in_week_nights            119390 non-null  int64  
 9   adults                          119390 non-null  int64  
 10  children                        119386 non-null  float64
 11  babies                          119390 non-null  int64  
 12  meal            

In [77]:
# Count nulls per column, then keep only the columns that have at least one
null_values = df.isnull().sum()
columns_with_nulls = null_values[null_values > 0]
print(columns_with_nulls)

children         4
country        488
agent        16340
company     112593
dtype: int64


### What did you know about your dataset?

In [209]:
!git status
!git add .
!git commit -m "Dataset loaded"

On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   Data/Raw/Hotel Bookings.csv
	modified:   Notebooks/Booking.com - Hotel Booking Analysis.ipynb

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   Notebooks/Booking.com - Hotel Booking Analysis.ipynb

[master 854c8fa] Dataset loaded
 2 files changed, 119392 insertions(+), 1 deletion(-)
 create mode 100644 Data/Raw/Hotel Bookings.csv
 rewrite Notebooks/Booking.com - Hotel Booking Analysis.ipynb (81%)


##### The dataset covers two types of hotels (City Hotel and Resort Hotel), and the country column holds 177 distinct country codes. It contains 119,390 rows and 32 columns. Four columns (children, country, agent, and company) have missing (NaN) values; company has the most, with 112,593 nulls.
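To put the null counts in proportion, the sketch below turns them into percentages of the total row count. It hard-codes the four counts and the row total printed earlier in this notebook, so it runs without the CSV; the names `null_counts` and `null_share` are introduced here for illustration only.

```python
import pandas as pd

# Null counts and row total copied from the outputs earlier in this notebook.
total_rows = 119390
null_counts = pd.Series(
    {"children": 4, "country": 488, "agent": 16340, "company": 112593}
)

# Share of rows missing in each column, as a percentage.
null_share = (null_counts / total_rows * 100).round(2)
print(null_share.sort_values(ascending=False))
```

On the loaded DataFrame, `df.isnull().mean() * 100` should give the same shares directly. The company column is missing in roughly 94% of rows, which is worth keeping in mind when deciding how to treat it during wrangling.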

## ***2. Understanding Your Variables***

In [87]:
df.columns

Index(['hotel', 'is_canceled', 'lead_time', 'arrival_date_year',
       'arrival_date_month', 'arrival_date_week_number',
       'arrival_date_day_of_month', 'stays_in_weekend_nights',
       'stays_in_week_nights', 'adults', 'children', 'babies', 'meal',
       'country', 'market_segment', 'distribution_channel',
       'is_repeated_guest', 'previous_cancellations',
       'previous_bookings_not_canceled', 'reserved_room_type',
       'assigned_room_type', 'booking_changes', 'deposit_type', 'agent',
       'company', 'days_in_waiting_list', 'customer_type', 'adr',
       'required_car_parking_spaces', 'total_of_special_requests',
       'reservation_status', 'reservation_status_date'],
      dtype='object')

#### Variables Description

* **hotel :** Hotel type (City Hotel or Resort Hotel)
* **is_canceled :** Whether the booking was canceled (1) or not (0)
* **lead_time :** Number of days between the date the booking was entered and the arrival date
* **arrival_date_year :** Year of arrival
* **arrival_date_month :** Month of arrival
* **arrival_date_week_number :** Week number of the arrival date
* **arrival_date_day_of_month :** Day of the month of arrival
* **stays_in_weekend_nights :** Number of weekend nights (Saturday or Sunday) the guest stayed or booked at the hotel
* **stays_in_week_nights :** Number of week nights (Monday to Friday) the guest stayed or booked at the hotel
* **adults :** Number of adults
* **children :** Number of children
* **babies :** Number of babies
* **meal :** Type of meal booked
* **country :** Country code
* **market_segment :** Market segment the customer belongs to
* **distribution_channel :** Channel through which the booking was made - Corporate/Direct/TA/TO
* **is_repeated_guest :** Whether the guest is a repeat guest (1) or a first-time guest (0)
* **previous_cancellations :** Number of previous bookings the customer canceled
* **previous_bookings_not_canceled :** Number of previous bookings the customer did not cancel
* **reserved_room_type :** Type of room reserved
* **assigned_room_type :** Type of room assigned to the guest
* **booking_changes :** Number of changes made to the booking
* **deposit_type :** Type of deposit - ['No Deposit', 'Refundable', 'Non Refund']
* **agent :** ID of the agent through which the booking was made
* **company :** ID of the company that made the booking
* **days_in_waiting_list :** Number of days the booking spent on the waiting list
* **customer_type :** Type of customer
* **adr :** Average daily rate
* **required_car_parking_spaces :** Number of car parking spaces required
* **total_of_special_requests :** Number of special requests
* **reservation_status :** Last reservation status (for example, Check-Out)
* **reservation_status_date :** Date on which the last reservation status was set
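Several of the variables above are pieces of one quantity: the three `arrival_date_*` columns form a single date, the two `stays_in_*` columns add up to the length of stay, and `adults`, `children`, and `babies` add up to the party size. A minimal sketch of combining them follows, assuming a small stand-in frame called `sample` with illustrative values. The derived column names `arrival_date`, `total_nights`, and `total_guests` are hypothetical and not part of the dataset.

```python
import pandas as pd

# Hypothetical stand-in frame using the column names described above;
# the values are illustrative, not taken from the full dataset.
sample = pd.DataFrame({
    "arrival_date_year": [2015, 2016],
    "arrival_date_month": ["July", "February"],
    "arrival_date_day_of_month": [1, 29],
    "stays_in_weekend_nights": [0, 2],
    "stays_in_week_nights": [1, 3],
    "adults": [2, 1],
    "children": [0.0, 1.0],
    "babies": [0, 0],
})

# Join year, month name, and day into one string, then parse it as a date.
date_text = (
    sample["arrival_date_year"].astype(str)
    + "-" + sample["arrival_date_month"]
    + "-" + sample["arrival_date_day_of_month"].astype(str)
)
sample["arrival_date"] = pd.to_datetime(date_text, format="%Y-%B-%d")

# Hypothetical derived columns: length of stay and party size per booking.
sample["total_nights"] = (
    sample["stays_in_weekend_nights"] + sample["stays_in_week_nights"]
)
sample["total_guests"] = sample["adults"] + sample["children"] + sample["babies"]

print(sample[["arrival_date", "total_nights", "total_guests"]])
```

Because `children` contains a few NaN values in the real data, `total_guests` would be NaN for those rows unless the nulls are filled first.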

In [92]:
df.describe(include='all')

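| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 119390 | 119390.0 | 119390.0 | 119390.0 | 119390 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | ... | 119390 | 103050.0 | 6797.0 | 119390.0 | 119390 | 119390.0 | 119390.0 | 119390.0 | 119390 | 119390 |
| unique | 2 | NaN | NaN | NaN | 12 | NaN | NaN | NaN | NaN | NaN | ... | 3 | NaN | NaN | NaN | 4 | NaN | NaN | NaN | 3 | 926 |
| top | City Hotel | NaN | NaN | NaN | August | NaN | NaN | NaN | NaN | NaN | ... | No Deposit | NaN | NaN | NaN | Transient | NaN | NaN | NaN | Check-Out | 2015-10-21 |
| freq | 79330 | NaN | NaN | NaN | 13877 | NaN | NaN | NaN | NaN | NaN | ... | 104641 | NaN | NaN | NaN | 89613 | NaN | NaN | NaN | 75166 | 1461 |
| mean | NaN | 0.370416 | 104.011416 | 2016.156554 | NaN | 27.165173 | 15.798241 | 0.927599 | 2.500302 | 1.856403 | ... | NaN | 86.693382 | 189.266735 | 2.321149 | NaN | 101.831122 | 0.062518 | 0.571363 | NaN | NaN |
| std | NaN | 0.482918 | 106.863097 | 0.707476 | NaN | 13.605138 | 8.780829 | 0.998613 | 1.908286 | 0.579261 | ... | NaN | 110.774548 | 131.655015 | 17.594721 | NaN | 50.53579 | 0.245291 | 0.792798 | NaN | NaN |
| min | NaN | 0.0 | 0.0 | 2015.0 | NaN | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | NaN | 1.0 | 6.0 | 0.0 | NaN | -6.38 | 0.0 | 0.0 | NaN | NaN |
| 25% | NaN | 0.0 | 18.0 | 2016.0 | NaN | 16.0 | 8.0 | 0.0 | 1.0 | 2.0 | ... | NaN | 9.0 | 62.0 | 0.0 | NaN | 69.29 | 0.0 | 0.0 | NaN | NaN |
| 50% | NaN | 0.0 | 69.0 | 2016.0 | NaN | 28.0 | 16.0 | 1.0 | 2.0 | 2.0 | ... | NaN | 14.0 | 179.0 | 0.0 | NaN | 94.575 | 0.0 | 0.0 | NaN | NaN |
| 75% | NaN | 1.0 | 160.0 | 2017.0 | NaN | 38.0 | 23.0 | 2.0 | 3.0 | 2.0 | ... | NaN | 229.0 | 270.0 | 0.0 | NaN | 126.0 | 0.0 | 1.0 | NaN | NaN |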
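Two details in the summary statistics above are worth checking in code. Because `is_canceled` is coded 0/1, its mean (0.370416) is the share of canceled bookings, and the minimum `adr` is negative (-6.38), which is unusual for a daily rate. The sketch below shows both checks on a hypothetical stand-in frame named `sample`; its values are illustrative apart from the -6.38 minimum.

```python
import pandas as pd

# Hypothetical stand-in frame; values are illustrative apart from the
# -6.38 minimum adr reported in the summary above.
sample = pd.DataFrame({
    "is_canceled": [0, 1, 0, 1, 0],
    "adr": [-6.38, 75.0, 98.0, 0.0, 126.0],
})

# For a 0/1 column, the mean equals the share of rows equal to 1.
cancel_rate = sample["is_canceled"].mean()

# Rows with a negative average daily rate, set aside for review.
negative_adr = sample[sample["adr"] < 0]

print(f"cancel rate: {cancel_rate:.1%}")
print(f"negative adr rows: {len(negative_adr)}")
```

Running the same two expressions against `df` would show how many bookings carry a negative rate, which helps decide whether to drop or correct them.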


#### Unique values in each variable

In [101]:
# Print the number of distinct (non-null) values in each column
for col in df.columns:
    print(f"Unique value for {col} : {df[col].nunique()}.")

Unique value for hotel : 2.
Unique value for is_canceled : 2.
Unique value for lead_time : 479.
Unique value for arrival_date_year : 3.
Unique value for arrival_date_month : 12.
Unique value for arrival_date_week_number : 53.
Unique value for arrival_date_day_of_month : 31.
Unique value for stays_in_weekend_nights : 17.
Unique value for stays_in_week_nights : 35.
Unique value for adults : 14.
Unique value for children : 5.
Unique value for babies : 5.
Unique value for meal : 5.
Unique value for country : 177.
Unique value for market_segment : 8.
Unique value for distribution_channel : 5.
Unique value for is_repeated_guest : 2.
Unique value for previous_cancellations : 15.
Unique value for previous_bookings_not_canceled : 73.
Unique value for reserved_room_type : 10.
Unique value for assigned_room_type : 12.
Unique value for booking_changes : 21.
Unique value for deposit_type : 3.
Unique value for agent : 333.
Unique value for company : 352.
Unique value for days_in_waiting_list : 128.
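The loop above can also be written as a single call, since `DataFrame.nunique()` returns one count per column. A minimal sketch on a hypothetical stand-in frame named `sample` (illustrative values) shows the call, and also that NaN is excluded from the count unless `dropna=False` is passed:

```python
import pandas as pd

# Hypothetical stand-in frame; values are illustrative.
sample = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "is_canceled": [0, 1, 0],
    "company": [None, 40.0, None],
})

# One distinct-value count per column; NaN is ignored by default.
unique_counts = sample.nunique()

# Passing dropna=False counts NaN as a value of its own.
unique_counts_with_nan = sample.nunique(dropna=False)

print(unique_counts)
print(unique_counts_with_nan)
```

This matters for columns such as agent and company, where the reported unique counts do not include the missing entries.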


## ***3. Data Wrangling***

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

## ***5. Solution to Business Objective***

## ***Conclusion***