
Hotel_Booking_Analysis_EDA


##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**    - Somnath Sapkal


# **Project Summary -**
Hotels and resorts are always searching for new ways to attract customers' attention. One of the most important aspects of this industry is managing bookings and reservations. Hotel owners need to be able to predict customer behavior, decide when to offer promotions and discounts, and optimize pricing strategies. That is why exploratory data analysis (EDA) plays an important role in understanding the factors that affect hotel bookings.

Write the summary here within 500-600 words.

# **GitHub Link -**

https://github.com/somnathsap123/almabetter

# **Problem Statement**


To analyse the hotel bookings dataset and identify patterns and trends in the data. The main goal of this EDA project is to give hotel managers and owners useful insights that can help them optimize their pricing and marketing strategies and increase the number of customers.

#### **Define Your Business Objective?**

Identify key factors that influence hotel bookings in order to improve the performance and profitability of hotels.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception handling, production-grade code and deployment-ready code will be a plus; students who provide them will be awarded additional credits.
     
     The additional credits will give them an advantage during Star Student selection.
       
     [ Note: Deployment-ready code means that the whole .ipynb notebook executes in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import missingno as msno

# Mount drive
from google.colab import drive
drive.mount('/content/drive')

### Dataset Loading


In [None]:
# Load Dataset
df = pd.read_csv('/content/drive/MyDrive/Copy of Copy of Hotel Bookings.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape


### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing value
plt.figure(figsize = (5,5))
sns.heatmap(df.isna(),cmap='Reds',cbar = False)
plt.show()
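The heatmap shows where values are missing but not how many. A small per-column table makes the counts explicit. This is a sketch with a hypothetical helper `missing_summary`, run on made-up rows; on the real data it would be called as `missing_summary(df)`.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the hotel bookings data
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "children": [0.0, np.nan, 1.0, 0.0],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
})

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of nulls per column."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps, most affected first
    return summary[summary["missing_count"] > 0].sort_values("missing_count", ascending=False)

summary = missing_summary(toy)
print(summary)
```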

### What did you know about your dataset?

The dataset has a total of 119,390 records and 32 columns.

The dataset contains missing values, mainly in the agent and company columns, with a few more in the children and country columns.

It has information such as the arrival date, lead time, length of stay, the number of adults, children and babies, the number of car parking spaces required, etc.

The dataset contains information on bookings for two types of hotels: a city hotel and a resort hotel.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description 

The variables are described as follows:

hotel: The type of hotel, either a resort hotel or a city hotel.

is_canceled: Whether the booking was canceled or not, where 1 indicates the booking was canceled and 0 indicates the booking was not canceled.

lead_time: The number of days between the booking date and the arrival date.

arrival_date_year: The year of arrival date.

arrival_date_month: The month of arrival date.

arrival_date_week_number: The week number of arrival date.

arrival_date_day_of_month: The day of the month of arrival date.

stays_in_weekend_nights: The number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel.

stays_in_week_nights: The number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel.

adults: The number of adults in the booking.

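children: The number of children in the booking.

babies: The number of babies in the booking.

country: The country of origin of the guest.

market_segment: The market segment of the booking ("TA" means travel agents and "TO" means tour operators).

distribution_channel: The booking distribution channel (for example TA/TO, Direct or Corporate).

reserved_room_type: The code of the room type reserved.

deposit_type: Whether a deposit was made to guarantee the booking (No Deposit, Non Refund or Refundable).

agent: The ID of the travel agency that made the booking.

company: The ID of the company or entity that made the booking or is responsible for paying for it.

customer_type: The type of booking (Contract, Group, Transient or Transient-Party).

adr: The average daily rate, i.e. the sum of all lodging transactions divided by the total number of staying nights.

reservation_status_date: The date on which the last reservation status was set.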

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
#Dropping duplicate values
df.drop_duplicates(inplace = True)


In [None]:
df.shape

In [None]:
# Fill null values in the company and agent columns with 0
df[['company','agent']] = df[['company','agent']].fillna(0)

In [None]:
# The children column has 4 null values, so replace them with the mean of the column
df['children'] = df['children'].fillna(df['children'].mean())


In [None]:
# Fill missing values in the country column with 'others'
df['country'] = df['country'].fillna('others')

In [None]:
#Checking if all null values are removed
df.isnull().sum()
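The null-handling steps above can be checked end to end on a few made-up rows. This sketch wraps them in a hypothetical `fill_missing` helper that works on a copy; the key point is that each column list is assigned back to the same columns it was filled from, so existing `children` values are never overwritten by `company` values.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps in the same columns as the real data
toy = pd.DataFrame({
    "children": [0.0, np.nan, 2.0, 1.0],
    "agent": [9.0, np.nan, 240.0, np.nan],
    "company": [np.nan, 40.0, np.nan, np.nan],
    "country": ["PRT", None, "GBR", "ESP"],
})

def fill_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the notebook's fill rules to a copy."""
    out = frame.copy()
    # agent/company: a missing ID is treated as 'no agent/company' (0)
    out[["company", "agent"]] = out[["company", "agent"]].fillna(0)
    # children: replace the few gaps with the column mean
    out["children"] = out["children"].fillna(out["children"].mean())
    # country: unknown origin becomes the label 'others'
    out["country"] = out["country"].fillna("others")
    return out

clean = fill_missing(toy)
print(clean)
```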

In [None]:
# Change the datatype of column 'reservation_status_date' to datetime.
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'], format = '%Y-%m-%d')

In [None]:
# Add total nights stayed: total_stay = weekend nights + week nights
df['total_stay'] = df['stays_in_weekend_nights']+df['stays_in_week_nights']

In [None]:
# Add total number of guests per booking: total_people = adults + children + babies
df['total_people'] = df['adults']+df['children']+df['babies']

In [None]:
df.head()

**Finding Insights**






In [None]:
# Monthly booking count for both hotel types
# This code gives the number of bookings done by the month for both hotels
busy_months = df.groupby(["hotel", "arrival_date_month"])["hotel"].count().reset_index(name="count") # Groupby 'hotel' and 'arrival_date_month' 
busy_months = busy_months.sort_values(["hotel", "count"], ascending=False) # Sort count in descending order for both types of hotel
print(busy_months) # Print busy months
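Sorting by count hides the seasonal order of the months. One option, sketched here on made-up rows, is to turn arrival_date_month into an ordered categorical so grouping and sorting follow the calendar. The `MONTH_ORDER` list is an assumption that the column holds full English month names.

```python
import pandas as pd

# Assumes arrival_date_month holds full English month names
MONTH_ORDER = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

# Made-up rows
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 3,
    "arrival_date_month": ["August", "January", "August", "March", "July", "January", "July"],
})

# Ordered categorical: sorting now follows the calendar, not the alphabet
toy["arrival_date_month"] = pd.Categorical(toy["arrival_date_month"],
                                           categories=MONTH_ORDER, ordered=True)

monthly = (
    toy.groupby(["hotel", "arrival_date_month"], observed=True)  # only months that occur
    .size()
    .reset_index(name="count")
    .sort_values(["hotel", "arrival_date_month"])
)
print(monthly)
```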

In [None]:
# The distribution of stays in the hotel by market segment
stay_by_market = df.groupby(["market_segment", "hotel"])["total_stay"].sum().reset_index(name="stays_in_hotel")
print(stay_by_market)

In [None]:
# Number of bookings by lead time
# This code gives the number of bookings by lead time
bookings_by_lead_time = df.groupby(["lead_time", "hotel"])["hotel"].count().reset_index(name="count") #grouping by lead_time and hotel
print(bookings_by_lead_time) # print bookings_by_lead_time


In [None]:
# Top 3 most common reserved room type
# This code gives the count of reserved_room_type
booking_types = df.groupby(["hotel", "reserved_room_type"])["reserved_room_type"].count().reset_index(name="count") # Grouping hotel and reserved_room_type
booking_types = booking_types.sort_values(["hotel", "count"], ascending=False) # Sorting the count values
booking_types = booking_types.groupby("hotel").head(3) # Filtering top three
print(booking_types) # Printing the booking_types

In [None]:
# The most preferred hotel type
# This code gives the most preferred hotel type (city hotel or resort hotel)
preferred_hotel_type = df.groupby(["hotel"])["hotel"].count().reset_index(name="count") # Grouping the hotel and the count of the hotel
preferred_hotel_type = preferred_hotel_type.sort_values(["count"], ascending=False).reset_index(drop=True) # Sorting the count value
most_preferred = preferred_hotel_type.loc[0, "hotel"] # The hotel with most number of counts
print(f"The most preferred hotel type is {most_preferred}")
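A shorter equivalent uses `Series.value_counts`, which also gives each hotel type's share of bookings. Sketched on made-up rows; on the real data, replace `toy` with `df`.

```python
import pandas as pd

# Made-up rows
toy = pd.DataFrame({"hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"]})

counts = toy["hotel"].value_counts()                                    # bookings per hotel type
shares = toy["hotel"].value_counts(normalize=True).mul(100).round(1)   # percentage share
most_preferred = counts.idxmax()                                        # label with the highest count

print(f"The most preferred hotel type is {most_preferred} ({shares[most_preferred]}% of bookings)")
```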


In [None]:
# Find the average daily rate for whole year by hotel type
# This code gives the average daily rate for respective hotel type 
adr_by_hotel_type = df.groupby(["hotel"])["adr"].mean().reset_index(name="adr") # Group by hotel and take the mean of the adr column
print(adr_by_hotel_type)

In [None]:
# Average daily rates per month for both hotel types
# This code gives the adr for all months for both the hotel types
daily_rates = df.groupby(["hotel", "arrival_date_month"])["adr"].mean().reset_index(name="avg_daily_rate") # Grouping hotel and arrival_date_month and finding the average of adr
daily_rates = daily_rates.sort_values(["hotel", "avg_daily_rate"], ascending=False) # Sorting the average daily rate in descending order
print(daily_rates)

In [None]:
# The distribution of room prices by hotel
# This code gives the average price for reserved room type per type of hotel
price_distribution = df.groupby(["hotel", "reserved_room_type"])["adr"].mean().reset_index(name="avg_price") # Grouping the hotel and reserved_room_type and finding average of adr
price_distribution = price_distribution.sort_values(["hotel", "avg_price"], ascending=False) # Sorting the average price in descending order
print(price_distribution)

In [None]:
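# Total revenue per hotel (estimate)
# adr is a per-night rate, so revenue per booking is approximated as adr * total_stay;
# canceled bookings are excluded because they did not generate stay revenue
not_canceled = df[df['is_canceled'] == 0]
revenue = (not_canceled['adr'] * not_canceled['total_stay']).groupby(not_canceled['hotel']).sum().reset_index(name="total_revenue")
print(revenue)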
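Because adr is a per-night rate, summing it per hotel does not give revenue. The made-up rows below compare the plain sum of adr with a revenue estimate of adr × nights stayed over non-canceled bookings only. The estimate ignores taxes, deposits kept on cancellations, and other charges, so treat it as an approximation.

```python
import pandas as pd

# Made-up bookings
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": [0, 1, 0, 0],
    "adr": [100.0, 120.0, 80.0, 50.0],
    "stays_in_weekend_nights": [1, 0, 2, 0],
    "stays_in_week_nights": [2, 3, 5, 1],
})
toy["total_stay"] = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]

# Plain sum of nightly rates: ignores length of stay and cancellations
naive = toy.groupby("hotel")["adr"].sum()

# Estimated revenue: nightly rate x nights, for bookings that were not canceled
realised = toy[toy["is_canceled"] == 0]
estimated = (realised["adr"] * realised["total_stay"]).groupby(realised["hotel"]).sum()

print(pd.DataFrame({"sum_of_adr": naive, "estimated_revenue": estimated}))
```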

In [None]:
# Average daily rates by lead time
rates_by_lead_time = df.groupby(["lead_time", "hotel"])["adr"].mean().reset_index(name="avg_daily_rate") # Grouping of lead_time and hotel and finding the mean of adr
print(rates_by_lead_time)
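Grouping by every single lead_time value produces hundreds of small, noisy groups. Binning lead time into bands with `pd.cut` gives a more readable comparison. The bin edges and labels below are hypothetical choices, and the rows are made up.

```python
import pandas as pd

# Made-up rows
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 2,
    "lead_time": [0, 5, 45, 200, 10, 400],
    "adr": [90.0, 110.0, 100.0, 70.0, 60.0, 40.0],
})

# Hypothetical band edges in days; intervals are right-closed, e.g. (7, 30]
bins = [-1, 7, 30, 90, 180, 365, float("inf")]
labels = ["0-7", "8-30", "31-90", "91-180", "181-365", "366+"]
toy["lead_time_band"] = pd.cut(toy["lead_time"], bins=bins, labels=labels)

rates_by_band = (
    toy.groupby(["hotel", "lead_time_band"], observed=True)["adr"]  # skip empty bands
    .mean()
    .reset_index(name="avg_daily_rate")
)
print(rates_by_band)
```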

In [None]:
# The distribution of bookings by customer type and hotel type
# This code gives the number of bookings for customer type and hotel type
bookings_by_customer_type = df.groupby(["customer_type", "hotel"])["hotel"].count().reset_index(name="count") # Grouping customer_type and hotel type
print(bookings_by_customer_type)

In [None]:
# Distribution of bookings by market segment and customer type
# This code gives the distribution of bookings by market segment and customer type
bookings_by_market_customer = df.groupby(["market_segment", "customer_type"])["hotel"].count().reset_index(name="count") # Grouping market_segment and customer_type
print(bookings_by_market_customer)


In [None]:
# The most common distribution channels by hotel type
# This code gives the count of bookings with respective distribution channels
booking_channels = df.groupby(["hotel", "distribution_channel"])["distribution_channel"].count().reset_index(name="count") # Grouping by hotel and distribution_channel and count of distribution_channel
booking_channels = booking_channels.sort_values(["hotel", "count"], ascending=False) # Sorting the count 
print(booking_channels)

In [None]:
# Distribution of bookings by customer type and deposit type
# This code gives the bookings by customer type and deposit type
bookings_by_deposit_type = df.groupby(["customer_type", "deposit_type"])["hotel"].count().reset_index(name="count") # Grouping of customer_type and deposit_type and hotel count
print(bookings_by_deposit_type)


In [None]:
# Distribution of bookings by customer type and cancellation status
# This code shows the number of bookings which are canceled and not canceled by customer type
bookings_by_cancellation_status = df.groupby(["customer_type", "is_canceled"])["hotel"].count().reset_index(name="count") # Group by customer_type and is_canceled and count the bookings
print(bookings_by_cancellation_status)
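Raw counts are hard to compare across customer types of very different sizes. A row-normalised `pd.crosstab` turns them into cancellation rates per customer type. Sketched on made-up rows; on the real data, replace `toy` with `df`.

```python
import pandas as pd

# Made-up rows
toy = pd.DataFrame({
    "customer_type": ["Transient", "Transient", "Transient", "Transient", "Contract", "Group"],
    "is_canceled": [1, 0, 1, 0, 0, 0],
})

# Each row sums to 1: share of kept (0) and canceled (1) bookings per customer type
cancel_share = pd.crosstab(toy["customer_type"], toy["is_canceled"], normalize="index")

# Cancellation rate in percent, highest first
cancel_rate = (cancel_share[1] * 100).round(1).sort_values(ascending=False)
print(cancel_rate)
```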

In [None]:
# Average daily rate by customer type
# This code shows the adr for hotel type and customer type
rate_by_customer_type = df.groupby([ "hotel","customer_type"])["adr"].mean().reset_index(name="avg_daily_rate") # Grouping of hotel and customer_type and mean of adr
rate_by_customer_type = rate_by_customer_type.sort_values(["hotel", "avg_daily_rate"], ascending=False) # Sorting the values in descending order
print(rate_by_customer_type)

In [None]:
# Distribution of bookings by room type and hotel type
# This code shows the number of bookings for room type and hotel type
bookings_by_room_type = df.groupby(["hotel", "reserved_room_type"])["is_canceled"].count().reset_index(name="count") # Grouping of hotel and reserved_room type
print(bookings_by_room_type)

In [None]:
# Average daily rate by room type
# This code shows the adr for reserved room type and hotel type
rate_by_room_type = df.groupby(["hotel", "reserved_room_type"])["adr"].mean().reset_index(name="avg_daily_rate") # Grouping of hotel and reserved_room_type and mean of adr
rate_by_room_type = rate_by_room_type.sort_values(["hotel", "avg_daily_rate"], ascending=False) # Sorting the adr values
print(rate_by_room_type)

### What all manipulations have you done and insights you found?

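Manipulations done:

1. Dropped duplicate rows.
2. Filled missing values: company and agent with 0, children with the column mean, and country with 'others'.
3. Converted reservation_status_date to a datetime column.
4. Added two columns: total_stay (weekend nights + week nights) and total_people (adults + children + babies).
5. Grouped the data by hotel, month, market segment, lead time, room type, customer type, distribution channel, deposit type and cancellation status to count bookings and compare average daily rates.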

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
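One possible Chart 1 is a line chart of monthly bookings per hotel type, built from the same grouping as busy_months. This is a sketch: `plot_monthly_bookings` is a hypothetical helper, the toy rows are made up, and the Agg backend is set only so the sketch also runs outside a notebook. In Colab, call `plot_monthly_bookings(df)` followed by `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd

# Assumes arrival_date_month holds full English month names
MONTH_ORDER = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

def plot_monthly_bookings(frame: pd.DataFrame):
    """Hypothetical helper: one line per hotel type, months in calendar order."""
    counts = (
        frame.groupby(["arrival_date_month", "hotel"])
        .size()
        .unstack("hotel", fill_value=0)        # one column per hotel type
        .reindex(MONTH_ORDER, fill_value=0)    # calendar order, 0 for missing months
    )
    fig, ax = plt.subplots(figsize=(10, 5))
    for hotel in counts.columns:
        ax.plot(counts.index, counts[hotel], marker="o", label=hotel)
    ax.set(title="Monthly bookings by hotel type", xlabel="Arrival month", ylabel="Number of bookings")
    ax.tick_params(axis="x", labelrotation=45)
    ax.legend()
    fig.tight_layout()
    return counts, fig, ax

# Made-up rows
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "arrival_date_month": ["July", "August", "August", "July"],
})
counts, fig, ax = plot_monthly_bookings(toy)
```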

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
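A correlation heatmap can be drawn from the numeric columns only. This sketch uses plain matplotlib (`imshow`) so it needs no extra packages; `sns.heatmap(corr, annot=True)` is the usual seaborn alternative. `plot_correlation_heatmap` is a hypothetical helper and the toy rows are made up. On the real data, it is probably worth dropping ID-like numeric columns such as agent and company first, since correlations on IDs carry no meaning.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd

def plot_correlation_heatmap(frame: pd.DataFrame):
    """Hypothetical helper: Pearson correlation of numeric columns as a heatmap."""
    corr = frame.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(8, 6))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(im, ax=ax, label="Pearson correlation")
    ax.set_title("Correlation heatmap of numeric columns")
    fig.tight_layout()
    return corr, fig

# Made-up rows; the text column 'hotel' is skipped by select_dtypes
toy = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "lead_time": [10, 50, 100, 200],
    "adr": [100.0, 90.0, 80.0, 60.0],
    "total_stay": [1, 2, 3, 4],
})
corr, fig = plot_correlation_heatmap(toy)
```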

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code
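A pair plot can be sketched with `pandas.plotting.scatter_matrix`, sampling rows first so roughly 100k bookings stay readable. `sns.pairplot(df[cols], hue="hotel")` is the seaborn alternative. The column choice, sample size and helper name `plot_pairs` are hypothetical, and the toy rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Colab
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical choice of numeric features to compare
PAIR_COLUMNS = ("lead_time", "adr", "total_stay", "total_people")

def plot_pairs(frame: pd.DataFrame, columns=PAIR_COLUMNS, sample_size=2000, seed=42):
    """Hypothetical helper: scatter matrix of selected columns with histograms on the diagonal."""
    subset = frame[list(columns)].dropna()
    # Sample rows so the scatter plots stay readable on large data
    if len(subset) > sample_size:
        subset = subset.sample(n=sample_size, random_state=seed)
    axes = scatter_matrix(subset, figsize=(10, 10), diagonal="hist", alpha=0.3)
    plt.suptitle("Pair plot of selected numeric features")
    return axes

# Made-up rows
toy = pd.DataFrame({
    "lead_time": [10, 50, 100, 200, 5],
    "adr": [100.0, 90.0, 80.0, 60.0, 120.0],
    "total_stay": [1, 2, 3, 4, 1],
    "total_people": [2, 2, 3, 1, 2],
})
axes = plot_pairs(toy)
```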

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***