
# **Project Name**    - Hotel Data Visualization

**Project Type**    - EDA

**Contribution**    - Individual

# **Project Summary -**


The hotel reservations dataset records guest bookings, guest demographics, and stay-related variables. This exploratory data analysis (EDA) project looks for patterns and trends in the dataset that describe how guests book and stay.

The analysis starts with booking patterns. The distribution of lead time shows how far in advance guests book, and the split between weekend and weeknight stays shows when they stay. These temporal patterns provide a foundation for interpreting guest behavior and preferences. The demographic analysis then examines the distribution of adults, children, and babies per reservation and the ratio of repeated guests to new guests, which characterizes the hotel's clientele and informs customer-engagement strategies.

The Average Daily Rate (ADR), a key pricing metric in the hospitality industry, is the focus of the next set of analyses. Comparing ADR distributions across room types, meal types, and deposit types shows how these reservation attributes relate to pricing and revenue, and the relationship between ADR and the total number of people in a reservation shows how pricing scales with group size. The analysis also compares ADR distributions across reservation statuses and examines how often customers change their bookings and what those changes look like. A correlation heatmap summarizes the relationships between numerical features.

Pair plots add depth by showing the pairwise relationships and distributions of selected numerical features. The goal is to give stakeholders in the hospitality industry actionable insights for refining pricing strategies, improving customer service, identifying target demographics, and raising overall customer satisfaction. Beyond describing past trends, the findings are intended to support data-driven decisions in the hotel reservation domain.
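The ADR comparisons described above come down to grouped summaries of `adr`. A minimal sketch, assuming the cleaned `hotel` DataFrame; `adr_by_category` and `add_total_people` are hypothetical helper names, and reporting the median next to the mean is an assumption made here because the raw `adr` column contains extreme values.

```python
import pandas as pd


def adr_by_category(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Booking count, median and mean ADR per category of `column`, highest median first."""
    return (
        df.groupby(column)["adr"]
        .agg(["count", "median", "mean"])
        .sort_values("median", ascending=False)
    )


def add_total_people(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 'total_people' column (adults + children + babies)."""
    return df.assign(total_people=df["adults"] + df["children"] + df["babies"])
```

For example, `adr_by_category(hotel, "reserved_room_type")` compares room types, and the same call works with `"meal"` or `"deposit_type"`; `add_total_people(hotel)` adds the group-size column to plot against `adr`.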






# **GitHub Link -**

https://github.com/surya12518/Project/tree/main

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code are a plus; students who provide them will be awarded additional credits.

     These additional credits give an advantage over other students during Star Student selection.

     [ Note: Deployment-ready code means the whole .ipynb notebook executes in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. Make sure every chart follows the format below and answers each of the following questions.


```
# Chart visualization code
```


*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
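Applied to this dataset, the UBM rule might look like the sketch below, assuming the cleaned `hotel` DataFrame with the `lead_time`, `adr`, `reserved_room_type`, `stays_in_week_nights`, and `adults` columns. The column choices are illustrative, not necessarily the charts this notebook uses, and `plot_ubm_examples` is a hypothetical name. It draws one univariate chart (lead time histogram), one numerical-categorical bivariate chart (ADR by reserved room type), and one multivariate chart (a correlation matrix of several numerical columns). Plain matplotlib is used to keep dependencies minimal; `sns.histplot`, `sns.boxplot`, and `sns.heatmap` are natural seaborn alternatives.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.figure import Figure


def plot_ubm_examples(df: pd.DataFrame) -> Figure:
    """Hypothetical helper: one Univariate, one Bivariate, one Multivariate chart."""
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    # U - Univariate: distribution of a single numerical column
    axes[0].hist(df["lead_time"], bins=30, color="steelblue")
    axes[0].set(title="U: lead_time distribution", xlabel="lead_time (days)", ylabel="bookings")

    # B - Bivariate (numerical vs categorical): one box of ADR values per room type
    room_types = sorted(df["reserved_room_type"].unique())
    adr_by_room = [df.loc[df["reserved_room_type"] == rt, "adr"] for rt in room_types]
    axes[1].boxplot(adr_by_room)
    axes[1].set_xticks(range(1, len(room_types) + 1))
    axes[1].set_xticklabels(room_types)
    axes[1].set(title="B: adr by reserved_room_type", xlabel="reserved_room_type", ylabel="adr")

    # M - Multivariate: pairwise correlations among several numerical columns
    cols = ["lead_time", "adr", "stays_in_week_nights", "adults"]
    corr = df[cols].corr().to_numpy()
    im = axes[2].imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    axes[2].set_xticks(range(len(cols)))
    axes[2].set_xticklabels(cols, rotation=45, ha="right")
    axes[2].set_yticks(range(len(cols)))
    axes[2].set_yticklabels(cols)
    axes[2].set_title("M: correlation matrix")
    fig.colorbar(im, ax=axes[2])

    fig.tight_layout()
    return fig
```

In the notebook, `plot_ubm_examples(hotel)` followed by `plt.show()` renders the three panels side by side.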





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [1]:
# Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive

### Dataset Loading

In [2]:
# Dataset loading from drive
drive.mount("/content/drive")
hotel = pd.read_csv("/content/drive/MyDrive/Practice data/project data/Hotel Bookings.csv")

Mounted at /content/drive
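Guideline 2 rewards exception handling and notebooks that run end to end. A minimal sketch of a more defensive loader, assuming the same Drive path as the cell above; `mount_drive_if_colab`, `load_bookings`, and `DATA_PATH` are hypothetical names. Drive is mounted only when `google.colab` is importable, and a missing file raises an error that names the path instead of a bare traceback.

```python
import pandas as pd

# Same location as the loading cell above; adjust when running outside Colab.
DATA_PATH = "/content/drive/MyDrive/Practice data/project data/Hotel Bookings.csv"


def mount_drive_if_colab(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running in Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def load_bookings(path_or_buffer=DATA_PATH) -> pd.DataFrame:
    """Read the bookings CSV, failing with a clear message if the file is missing."""
    try:
        return pd.read_csv(path_or_buffer)
    except FileNotFoundError as err:
        raise FileNotFoundError(
            f"Hotel bookings CSV not found at {path_or_buffer!r}. "
            "Mount Google Drive or update DATA_PATH."
        ) from err
```

Usage in the notebook would be `mount_drive_if_colab()` followed by `hotel = load_bookings()`.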


### Dataset First View

In [4]:
# Dataset First Look

hotel.head()

| | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | deposit_type | agent | company | days_in_waiting_list | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | No Deposit | NaN | NaN | 0 | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 |
| 2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | NaN | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | No Deposit | 304.0 | NaN | 0 | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 |
| 4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | ... | No Deposit | 240.0 | NaN | 0 | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03 |


### Dataset Rows & Columns count

In [5]:
hotel.shape

(119390, 32)

### Dataset Information

In [6]:
# Dataset Info

hotel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119390 entries, 0 to 119389
Data columns (total 32 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           119390 non-null  object 
 1   is_canceled                     119390 non-null  int64  
 2   lead_time                       119390 non-null  int64  
 3   arrival_date_year               119390 non-null  int64  
 4   arrival_date_month              119390 non-null  object 
 5   arrival_date_week_number        119390 non-null  int64  
 6   arrival_date_day_of_month       119390 non-null  int64  
 7   stays_in_weekend_nights         119390 non-null  int64  
 8   stays_in_week_nights            119390 non-null  int64  
 9   adults                          119390 non-null  int64  
 10  children                        119386 non-null  float64
 11  babies                          119390 non-null  int64  
 12  meal            

### Duplicate Values

In [7]:
# Dataset Duplicate Value Count

hotel.duplicated().sum()

31994

### Missing Values/Null Values

In [8]:
# Missing Values/Null Values Count

hotel.isna().sum()

hotel                                  0
is_canceled                            0
lead_time                              0
arrival_date_year                      0
arrival_date_month                     0
arrival_date_week_number               0
arrival_date_day_of_month              0
stays_in_weekend_nights                0
stays_in_week_nights                   0
adults                                 0
children                               4
babies                                 0
meal                                   0
country                              488
market_segment                         0
distribution_channel                   0
is_repeated_guest                      0
previous_cancellations                 0
previous_bookings_not_canceled         0
reserved_room_type                     0
assigned_room_type                     0
booking_changes                        0
deposit_type                           0
agent                              16340
company         

### What do you know about your dataset?

The hotel dataset contains 119,390 rows and 32 columns describing hotel reservations. Of these rows, 31,994 are exact duplicates. Four columns contain null values: 'company' has the most (112,593 missing, since only 6,797 values are present), followed by 'agent' (16,340), 'country' (488), and 'children' (4). The columns mix integer, float, and object (string) types. Key attributes include the hotel type, cancellation status, lead time, arrival details (year, month, week number, day of month), the number of weekend and weekday nights, guest counts (adults, children, babies), meal type, guest country, market segment, and distribution channel. The dataset also records whether the guest is a repeat visitor, their history of canceled and non-canceled bookings, reserved and assigned room types, booking changes, deposit type, agent and company IDs, days on the waiting list, customer type, average daily rate, required car parking spaces, special requests, and the reservation status with its date. Together these cover the full lifecycle of a reservation, from booking to check-out, which makes the dataset suitable for studying booking patterns, guest characteristics, and hotel operations.
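The null counts quoted above can be reproduced with a small helper. A minimal sketch; `missing_summary` is a hypothetical name. It keeps only columns with at least one null and reports the count and the percentage of rows, sorted so the worst column comes first.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Null count and percentage of rows per column, for columns with any nulls."""
    counts = df.isna().sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    summary = counts.to_frame(name="missing")
    summary["percent"] = (counts / len(df) * 100).round(2)
    return summary
```

On the raw `hotel` data, `missing_summary(hotel)` should list 'company', 'agent', 'country', and 'children' in that order.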

## ***2. Understanding Your Variables***

In [9]:
# Dataset Columns

hotel.columns

Index(['hotel', 'is_canceled', 'lead_time', 'arrival_date_year',
       'arrival_date_month', 'arrival_date_week_number',
       'arrival_date_day_of_month', 'stays_in_weekend_nights',
       'stays_in_week_nights', 'adults', 'children', 'babies', 'meal',
       'country', 'market_segment', 'distribution_channel',
       'is_repeated_guest', 'previous_cancellations',
       'previous_bookings_not_canceled', 'reserved_room_type',
       'assigned_room_type', 'booking_changes', 'deposit_type', 'agent',
       'company', 'days_in_waiting_list', 'customer_type', 'adr',
       'required_car_parking_spaces', 'total_of_special_requests',
       'reservation_status', 'reservation_status_date'],
      dtype='object')

In [10]:
# Dataset Describe

hotel.describe()

| | is_canceled | lead_time | arrival_date_year | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | children | babies | is_repeated_guest | previous_cancellations | previous_bookings_not_canceled | booking_changes | agent | company | days_in_waiting_list | adr | required_car_parking_spaces | total_of_special_requests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 119386.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 | 103050.0 | 6797.0 | 119390.0 | 119390.0 | 119390.0 | 119390.0 |
| mean | 0.370416 | 104.011416 | 2016.156554 | 27.165173 | 15.798241 | 0.927599 | 2.500302 | 1.856403 | 0.10389 | 0.007949 | 0.031912 | 0.087118 | 0.137097 | 0.221124 | 86.693382 | 189.266735 | 2.321149 | 101.831122 | 0.062518 | 0.571363 |
| std | 0.482918 | 106.863097 | 0.707476 | 13.605138 | 8.780829 | 0.998613 | 1.908286 | 0.579261 | 0.398561 | 0.097436 | 0.175767 | 0.844336 | 1.497437 | 0.652306 | 110.774548 | 131.655015 | 17.594721 | 50.53579 | 0.245291 | 0.792798 |
| min | 0.0 | 0.0 | 2015.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 6.0 | 0.0 | -6.38 | 0.0 | 0.0 |
| 25% | 0.0 | 18.0 | 2016.0 | 16.0 | 8.0 | 0.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 | 62.0 | 0.0 | 69.29 | 0.0 | 0.0 |
| 50% | 0.0 | 69.0 | 2016.0 | 28.0 | 16.0 | 1.0 | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 14.0 | 179.0 | 0.0 | 94.575 | 0.0 | 0.0 |
| 75% | 1.0 | 160.0 | 2017.0 | 38.0 | 23.0 | 2.0 | 3.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 229.0 | 270.0 | 0.0 | 126.0 | 0.0 | 1.0 |
| max | 1.0 | 737.0 | 2017.0 | 53.0 | 31.0 | 19.0 | 50.0 | 55.0 | 10.0 | 10.0 | 1.0 | 26.0 | 72.0 | 21.0 | 535.0 | 543.0 | 391.0 | 5400.0 | 8.0 | 5.0 |


### Variables Description

The raw dataset has 32 variables, each with 119,390 entries before cleaning. For each numerical variable, the summary above reports the count, mean, standard deviation, minimum, 25th/50th/75th percentiles, and maximum. Some extremes deserve a check before analysis: for example, 'adr' ranges from -6.38 to 5,400, and 'adults' reaches 55 in a single booking.
1. hotel: The type of hotel (Resort Hotel or City Hotel).

2. is_canceled: Whether the booking was canceled (0 = not canceled, 1 = canceled).

3. lead_time: The number of days between the booking date and the arrival date.

4. arrival_date_year: The year of arrival.

5. arrival_date_month: The month of arrival.

6. arrival_date_week_number: The week number of the arrival date.

7. arrival_date_day_of_month: The day of the month of arrival.

8. stays_in_weekend_nights: The number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay.

9. stays_in_week_nights: The number of weekday nights (Monday to Friday) the guest stayed or booked to stay.

10. adults: The number of adults in the booking.

11. children: The number of children in the booking.

12. babies: The number of babies in the booking.

13. meal: The type of meal booked (e.g., "BB" for Bed & Breakfast).

14. country: The guest's country of origin.

15. market_segment: The market segment of the booking.

16. distribution_channel: The distribution channel used to make the booking.

17. is_repeated_guest: Whether the guest is a repeated guest (0 = no, 1 = yes).

18. previous_cancellations: The number of previous bookings the guest canceled before this booking.

19. previous_bookings_not_canceled: The number of previous bookings the guest did not cancel.

20. reserved_room_type: The room type reserved.

21. assigned_room_type: The room type actually assigned to the guest (may differ from the reserved type).

22. booking_changes: The number of changes made to the booking before check-in or cancellation.

23. deposit_type: The type of deposit made for the booking.

24. agent: The ID of the travel agent that made the booking.

25. company: The ID of the company responsible for the booking, if any.

26. days_in_waiting_list: The number of days the booking was on the waiting list before it was confirmed.

27. customer_type: The type of customer (e.g., "Transient", "Contract", "Group", or "Transient-Party").

28. adr: The average daily rate, i.e. the lodging revenue divided by the number of nights stayed.

29. required_car_parking_spaces: The number of car parking spaces requested.

30. total_of_special_requests: The total number of special requests made by the guest.

31. reservation_status: The last status of the reservation ("Canceled", "Check-Out", or "No-Show").

32. reservation_status_date: The date on which the last reservation status was set.

## ***3. Data Wrangling***

### Data Wrangling Code

In [11]:
# Remove exact duplicate rows (keeps the first occurrence of each)
hotel.drop_duplicates(inplace=True)

In [12]:
hotel.shape

(87396, 32)

In [13]:
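# Handle null values
# 'children' has only 4 missing values, so drop those rows
hotel.dropna(subset=["children"], inplace=True)

# Fill missing 'country' values with the most frequent country (mode).
# Assign the result back instead of calling fillna(inplace=True) on a column,
# which recent pandas versions warn about as chained assignment.
hotel["country"] = hotel["country"].fillna(hotel["country"].mode()[0])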

In [14]:
# Drop the 'agent' and 'company' columns, which hold most of the null values

hotel.drop(['company', 'agent'], axis=1, inplace=True)

In [15]:
hotel.shape

(87392, 30)


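### What manipulations have you done, and what insights did you find?



1.  Removed 31,994 duplicate rows, reducing the dataset from 119,390 to 87,396 rows.
2.  Checked the shape of the data after deduplication.
3.  Handled null values: dropped the 4 rows with a missing 'children' value and filled missing 'country' values with the most frequent country (the mode).
4.  Dropped the 'agent' and 'company' columns, which contained most of the null values.
5.  Rechecked the shape: the cleaned dataset has 87,392 rows and 30 columns.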
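The wrangling steps above can be collected into one function, so cleaning runs in a single call and the raw frame is never modified in place. A sketch that assumes the steps and their order are exactly those in the cells above; `clean_bookings` is a hypothetical name.

```python
import pandas as pd


def clean_bookings(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's wrangling steps to a copy of the raw bookings data."""
    df = raw.drop_duplicates()                   # remove exact duplicate rows
    df = df.dropna(subset=["children"])          # drop rows missing 'children'
    df = df.assign(                              # impute 'country' with its mode
        country=df["country"].fillna(df["country"].mode()[0])
    )
    df = df.drop(columns=["agent", "company"])   # drop the mostly-null ID columns
    return df
```

Because the steps and their order match the cells above, `clean_bookings(raw_hotel)` on a freshly loaded copy should reproduce the (87392, 30) shape reported there.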



## ***4. Data Visualization, Storytelling & Experimenting with Charts: Understand the Relationships Between Variables***

#### Chart - 1

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 2

In [3]:
# Chart - 2 visualization code
# Question:- What is the distribution of the number of adults, children, and babies in the bookings?
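The Chart 2 cell above states the question but has no plotting code yet. One possible sketch, assuming the cleaned `hotel` DataFrame (so 'children' has no nulls): a bar chart of booking counts for each value of `adults`, `children`, and `babies`. The log-scaled y-axis is a choice made here so that rare large values (the raw summary statistics show up to 55 adults in one booking) stay visible next to the common cases; `plot_guest_counts` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.figure import Figure


def plot_guest_counts(df: pd.DataFrame) -> Figure:
    """Bar charts of how many bookings have each number of adults, children, and babies."""
    cols = ["adults", "children", "babies"]
    fig, axes = plt.subplots(1, len(cols), figsize=(15, 4))
    for ax, col in zip(axes, cols):
        # Bookings per distinct guest count, ordered 0, 1, 2, ...
        counts = df[col].astype(int).value_counts().sort_index()
        ax.bar(counts.index.astype(str), counts.values, color="teal")
        ax.set_yscale("log")  # keep rare large groups visible
        ax.set(title=f"Bookings by number of {col}", xlabel=col, ylabel="bookings (log scale)")
    fig.tight_layout()
    return fig
```

In the notebook, `plot_guest_counts(hotel)` followed by `plt.show()` would fill this chart slot.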

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 3

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 4

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 5

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

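#### Chart - 6

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?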

#### Chart - 7

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 8

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 9

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 10

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 11

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 12

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 13

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 14

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

#### Chart - 15

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***