# **Project Name**    -



##### **Project Type**    -  Hotel Booking EDA
##### **Contribution**    - Individual

# **Project Summary -**


**Objective**: To analyze the hotel booking dataset to uncover insights that can inform business decisions, optimize hotel booking strategies, and enhance customer satisfaction.

**Key Questions:**

**Booking Patterns:** What are the peak months for bookings at the city hotel and the resort hotel?

**Stay Duration:** How does the length of stay impact the daily rate and overall revenue?

**Customer Preferences:** Are there any trends in special requests made by guests?

**Resource Allocation:** How can the hotel optimize the use of parking spaces based on booking data?

**Data Set Overview:** The dataset includes booking information for a city hotel and a resort hotel, with variables such as booking date, length of stay, number of guests (adults, children, babies), and parking space availability.

**Methodology:**

**Data Cleaning:** Remove duplicates, handle missing values, and ensure data consistency.

**Data Visualization:** Use graphs and charts to visualize booking trends, stay duration, and special requests.



**Expected Outcomes**

**Optimal Booking Times:** Identify the best times of year to promote bookings.

**Rate Strategies:** Determine optimal pricing strategies based on stay duration.

**Special Requests Insights:** Understand the factors influencing special requests to improve guest experience.

**Parking Utilization:** Provide recommendations for efficient parking space management.

**Business Impact:** The insights gained from this EDA will enable the hotel to tailor its marketing efforts, price its rooms more effectively, and enhance guest satisfaction, ultimately leading to increased profitability.

**Conclusion:** This EDA project aims to provide actionable insights that align with the business context, helping the hotel to make data-driven decisions that improve operational efficiency and customer experience.



# **GitHub Link -**

# **Problem Statement**






In the competitive hospitality industry, understanding customer behavior and optimizing hotel operations are crucial for success. The hotel booking dataset presents an opportunity to analyze various factors that influence booking patterns and guest preferences. However, the challenge lies in extracting meaningful insights from this data that can directly impact business strategies and outcomes.

The primary problem is to determine the most influential factors affecting hotel bookings and revenue. This includes identifying the optimal times for room promotions, understanding the impact of stay duration on daily rates, predicting the likelihood of special requests, and managing resources like parking spaces effectively.

The goal is to leverage Exploratory Data Analysis (EDA) to:


*   Analyze the dataset to identify trends and patterns in hotel bookings.

*   Understand the relationship between booking details (such as booking time, length of stay, number of guests) and hotel revenue.




*   Predict factors that lead to a higher number of special requests.

*   Provide actionable insights that can be used to formulate strategies to increase occupancy rates, optimize pricing, and enhance guest satisfaction.

By addressing this problem, the hotel aims to improve its operational efficiency, create targeted marketing campaigns, and ultimately increase profitability while providing a superior customer experience.

#### **Define Your Business Objective?**

The business objectives for the hotel booking dataset analysis project can be defined as follows:

**Business Objectives:**

1. **Maximize Revenue:**
   - To identify strategies that maximize room occupancy and average daily rates throughout the year.
   - To optimize pricing models based on demand and booking patterns.

2. **Enhance Customer Experience:**
   - To understand guest preferences and tailor services to enhance satisfaction.
   - To predict and fulfill special requests more effectively, thereby improving guest loyalty.

3. **Operational Efficiency:**
   - To allocate resources such as parking spaces and staff based on predictive analysis of booking data.
   - To reduce overbooking and underbooking through better demand forecasting.

4. **Strategic Marketing:**
   - To create targeted marketing campaigns for different customer segments based on booking trends.
   - To identify the best times for promotional offers to attract more guests.

5. **Competitive Analysis:**
   - To benchmark against competitors by understanding industry trends and customer expectations.
   - To identify unique selling propositions (USPs) that set the hotel apart in the market.

6. **Data-Driven Decision Making:**
   - To establish a culture of data-driven decision making within the hotel management.
   - To use insights from data to inform policy and strategic decisions.

These objectives aim to leverage the insights gained from the Exploratory Data Analysis (EDA) to drive growth, improve customer satisfaction, and ensure the hotel's competitive edge in the hospitality industry.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive

drive.mount('/content/drive')
path = '/content/drive/MyDrive/Hotel.csv'

data = pd.read_csv(path)
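
Since the guidelines above ask for exception handling, the loading step could be wrapped in a small helper. This is a minimal sketch, not the notebook's actual loader: `load_bookings` is a hypothetical name, and the temporary CSV below only stands in for `Hotel.csv` so the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Read the bookings CSV, failing early with a clear message if it is missing."""
    csv_path = Path(csv_path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Bookings file not found: {csv_path}")
    return pd.read_csv(csv_path)


# Demonstration on a throwaway file (a stand-in for the real dataset).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.csv"
    demo_path.write_text("hotel,adults\nCity Hotel,2\nResort Hotel,1\n")
    demo = load_bookings(demo_path)

print(demo.shape)
```

In the notebook itself, the call would presumably be `data = load_bookings(path)` after mounting the drive.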


### Dataset First View

In [None]:
# Dataset First Look

data.head(10)


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data.shape

### Dataset Information

In [None]:
# Dataset data type

data.dtypes


In [None]:
# Dataset Info
data.info()


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
data.duplicated().value_counts()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_df = data.isnull().sum().sort_values(ascending=False)

missing_df


In [None]:
# Visualizing the missing values
plt.figure(figsize= (16,7))
sns.heatmap(data.isnull(), cbar=False,cmap='viridis')
plt.show()

### What did you know about your dataset?



1. **Dataset Structure:** The dataset comprises **119,390 rows** and **32 columns**, providing a substantial volume of data for analysis.

2. **Data Types:** The dataset features a diverse range of data types, including **integers**, **floats**, and **objects (strings)**, which facilitates a multifaceted approach to data analysis.

3. **Duplicate Entries:** There are **31,994 duplicate rows** in the dataset, a significant level of redundancy that should be removed before analysis.

4. **Missing Values:** The dataset has missing values in four columns:
   - **Country Column:** There are **488 missing entries**, suggesting gaps in geographical data.
   - **Agent Column:** A notable **16,340 entries are missing**, which could impact the analysis of booking sources.
   - **Company Column:** The large majority of entries are missing, amounting to **112,593** values.
   - **Children Column:** Only **4** entries are missing.
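
The missing-value counts can be complemented with percentages, which makes it easier to judge which columns are mostly empty. The sketch below uses a hypothetical helper `missing_summary` and a made-up toy frame; the numbers it prints are not results from the hotel dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return count and percentage of missing values per column, largest first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that actually have gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only.
toy = pd.DataFrame({
    "country": ["PRT", None, "GBR", "ESP"],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "adults": [2, 2, 1, 2],
})
result = missing_summary(toy)
print(result)
```

On the real data, `missing_summary(data)` would list the four affected columns with their share of missing rows.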












## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns


In [None]:
# Dataset Describe
data.describe()

Description of the variables in the dataset:

1. **`hotel`**: Type of hotel (e.g., "Resort Hotel" or "City Hotel").
2. **`is_canceled`**: Binary variable indicating whether the booking was canceled (1) or not (0).
3. **`lead_time`**: Number of days between booking date and arrival date.
4. **`arrival_date_year`**: Year of arrival date.
5. **`arrival_date_month`**: Month of arrival date.
6. **`arrival_date_week_number`**: Week number of arrival date.
7. **`arrival_date_day_of_month`**: Day of the month of arrival date.
8. **`stays_in_weekend_nights`**: Number of weekend nights (Saturday or Sunday) the guest stayed.
9. **`stays_in_week_nights`**: Number of weekday nights (Monday to Friday) the guest stayed.
10. **`adults`**: Number of adults.
11. **`children`**: Number of children.
12. **`babies`**: Number of babies.
13. **`meal`**: Type of meal booked (e.g., "BB" for Bed & Breakfast).
14. **`country`**: Country of origin of the guest.
15. **`market_segment`**: Market segment designation (e.g., "Online TA" for Online Travel Agents).
16. **`distribution_channel`**: Booking distribution channel (e.g., "Direct" or "Corporate").
17. **`is_repeated_guest`**: Binary variable indicating whether the guest is a repeated guest (1) or not (0).
18. **`previous_cancellations`**: Number of previous bookings canceled by the guest.
19. **`previous_bookings_not_canceled`**: Number of previous bookings not canceled by the guest.
20. **`reserved_room_type`**: Code for the type of room reserved.
21. **`assigned_room_type`**: Code for the type of room assigned at check-in.
22. **`booking_changes`**: Number of changes made to the booking.
23. **`deposit_type`**: Type of deposit made (e.g., "No Deposit" or "Non Refund").
24. **`agent`**: ID of the travel agency that made the booking.
25. **`company`**: ID of the company or entity that made the booking.
26. **`days_in_waiting_list`**: Number of days the booking was in the waiting list before being confirmed.
27. **`customer_type`**: Type of booking (e.g., "Transient" or "Contract").
28. **`adr`**: Average Daily Rate (price per room).
29. **`required_car_parking_spaces`**: Number of car parking spaces required by the guest.
30. **`total_of_special_requests`**: Total number of special requests made by the guest.
31. **`reservation_status`**: Current status of the reservation (e.g., "Check-Out").
32. **`reservation_status_date`**: Date when the reservation status was last updated.




### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

print(data.apply(lambda col: col.unique()))

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
#Handling the missing value
data.isnull().sum().sort_values(ascending= False  )


In [None]:
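# Fill the missing values: 0 for the numeric columns, 'others' for country.
# Assign the result of fillna() back directly; chaining ".inplace = True"
# onto the assignment would overwrite these columns with the value True.
data[['company', 'agent', 'children']] = data[['company', 'agent', 'children']].fillna(0)
data['country'] = data['country'].fillna('others')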
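
As a sanity check for the fill step, the same idea can be tried on a toy frame. This sketch passes one fill value per column through a dict, which is an alternative to filling column groups separately; the toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "company": [np.nan, 110.0],
    "agent": [9.0, np.nan],
    "children": [np.nan, 1.0],
    "country": ["PRT", None],
})

# One fill value per column: 0 for the numeric IDs and counts, a label for country.
fill_values = {"company": 0, "agent": 0, "children": 0, "country": "others"}
filled = toy.fillna(value=fill_values)

# fillna returns a new frame; the original toy frame is left untouched.
print(filled.isnull().sum().sum())
```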



In [None]:
#check the missing value

data.isnull().sum().sort_values(ascending= False  )


In [None]:
# Count the duplicated rows, then remove them
print(len(data[data.duplicated()]))
data = data.drop_duplicates()

In [None]:
data.shape

In [None]:
# Add the total length of stay (weekend nights + week nights)
data['total_stay'] = data['stays_in_weekend_nights'] + data['stays_in_week_nights']

# Add the total number of guests per booking
data['total_member'] = data['adults'] + data['children'] + data['babies']

# Add the total number of children and babies per booking
data['child'] = data['children'] + data['babies']
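
These sums depend on the missing `children` values having been filled beforehand, because `NaN` propagates through addition. A small illustration on made-up toy data:

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the first booking has a missing children value.
toy = pd.DataFrame({
    "adults": [2, 2],
    "children": [np.nan, 1.0],
    "babies": [0, 1],
})

# Without filling, the missing value makes the whole row total NaN.
naive_total = toy["adults"] + toy["children"] + toy["babies"]

# Filling with 0 first gives a usable total for every booking.
safe_total = toy["adults"] + toy["children"].fillna(0) + toy["babies"]

print(naive_total.tolist())
print(safe_total.tolist())
```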

### What all manipulations have you done and insights you found?



- The dataset initially contained **31,994 duplicate entries**. These duplicates were identified and subsequently removed to ensure the integrity of the analysis.
- We found missing data in four columns: **'company'**, **'agent'**, **'country'**, and **'children'**. To keep the dataset consistent, missing values in **'company'**, **'agent'**, and **'children'** were replaced with zeros, and missing **'country'** values were replaced with **'others'**.
- To make the dataset more useful for further analysis, three new columns were introduced: **'total_stay'**, the total duration of each stay; **'total_member'**, the total number of individuals (adults, children, and babies) per booking; and **'child'**, the number of children and babies per booking.



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### **Chart - 1  - Which type of hotel is most preferred by the guests?**


In [None]:
# Plot the share of bookings per hotel type
data['hotel'].value_counts().plot.pie(explode=[0.02, 0.02], autopct='%1.2f%%', figsize=(10, 7), fontsize=15)

# Set the title
plt.title('Pie Chart for Most Preferred Hotel', fontsize=20)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

I selected the pie chart as it effectively displays the percentage distribution of preferred hotels. This type of chart excels at illustrating the proportional breakdown of categorical data, allowing for a clear comparison of each category against the whole, which is represented as 100%.

##### 2. What is/are the insight(s) found from the chart?



City Hotel accounts for a larger share of bookings than Resort Hotel.

One possible explanation is that resort hotels tend to be more expensive, so many guests stick with the city hotel; the chart alone does not confirm this.
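
The shares shown in a pie chart can also be read off as numbers. A minimal sketch on a made-up toy frame (the values are illustrative, not results from the dataset):

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
})

# normalize=True returns fractions; multiply by 100 for percentages.
share = toy["hotel"].value_counts(normalize=True).mul(100).round(2)
print(share)
```

On the real data, `data['hotel'].value_counts(normalize=True)` would give the same percentages that `autopct` prints on the chart.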

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can inform business strategy in the following ways:

**Positive Business Impact:**
- **Market Demand Understanding**: The higher percentage of City hotel preference indicates a strong market demand for this type of accommodation. Businesses can capitalize on this by investing in City hotels or tailoring marketing strategies to highlight the benefits of City hotels.
- **Cost-Effective Solutions**: Since City hotels are generally perceived as more cost-effective than Resort hotels, promoting City hotels can attract a broader customer base looking for affordable lodging options.

**Potential Negative Growth Insights:**
- **Overlooking Niche Markets**: While the majority may prefer City hotels, there may be a niche market for Resort hotels that's being overlooked. If businesses focus solely on City hotels, they might miss out on the opportunity to cater to those willing to pay a premium for Resort hotels.
- **Price Sensitivity**: The preference for City hotels might also indicate a price-sensitive market. If the economic situation changes, such as an increase in disposable income, the same customers might opt for more luxurious accommodations, leading to a shift in market dynamics.



#### Chart - 2 - Which month is most preferred by customers during the year?

In [None]:
# Count bookings per arrival month across both hotel types
Group_by_months = data.groupby(['arrival_date_month'])['hotel'].count().reset_index().rename(columns = {'hotel':'Counts'})

# List the months in calendar order
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

# Convert the month column to an ordered categorical so it sorts in calendar order
Group_by_months['arrival_date_month'] = pd.Categorical(Group_by_months['arrival_date_month'], categories = months, ordered = True)

# Sort by month
Group_by_months = Group_by_months.sort_values('arrival_date_month')



Group_by_months

In [None]:
# Visualize the monthly booking trend


# Set plot size
plt.figure(figsize = (14,6))
# Plotting lineplot on x- months & y- bookings counts
sns.lineplot(x = Group_by_months['arrival_date_month'], y = Group_by_months['Counts'])

# Set title
plt.title('Number of bookings across each month', fontsize = 20)

# Set labels
plt.xlabel('Month', fontsize = 16)
plt.ylabel('Number of bookings', fontsize = 16)

# To show
plt.show()

##### 1. Why did you pick the specific chart?

I picked a line chart because a line gives a clear view of the trend over time.

##### 2. What is/are the insight(s) found from the chart?


1. The line charts indicate that throughout the year, there is an upward trend in bookings from June through August, with these months showing the highest counts.

2. August stands out as the month with the peak booking count for the year; however, there is a significant decline in bookings after August.
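
The key question asks for peak months at the city hotel and the resort hotel separately, whereas the chart pools both. One way to split the counts by hotel is sketched below; `monthly_counts_by_hotel` is a hypothetical helper and the toy rows are made up, so the peaks it prints are not findings from the dataset.

```python
import pandas as pd

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']


def monthly_counts_by_hotel(df):
    """Bookings per arrival month (rows, calendar order) and hotel (columns)."""
    counts = df.groupby(['arrival_date_month', 'hotel']).size().unstack(fill_value=0)
    # Reindex to calendar order; months with no bookings get a count of 0.
    return counts.reindex(MONTHS, fill_value=0)


# Toy data for illustration only.
toy = pd.DataFrame({
    'arrival_date_month': ['August', 'August', 'July', 'January',
                           'August', 'July', 'July'],
    'hotel': ['City Hotel', 'City Hotel', 'City Hotel', 'City Hotel',
              'Resort Hotel', 'Resort Hotel', 'Resort Hotel'],
})

counts = monthly_counts_by_hotel(toy)
peak = counts.idxmax()  # month with the most bookings for each hotel
print(peak)
```

Passing the resulting table to `counts.plot()` would draw one line per hotel.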


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the booking trends can indeed create a positive business impact. By identifying the peak booking months of June through August, with August being the highest, businesses can optimize their marketing strategies, staffing, and inventory to capitalize on this period. Special promotions or events can be planned to further boost bookings during these months.

On the other hand, the significant drop in bookings after August could potentially lead to negative growth if not addressed properly. This insight is crucial as it allows businesses to prepare for the off-peak season by adjusting their operational costs and exploring alternative revenue streams to maintain a steady cash flow. For instance, they could offer off-season discounts or target different customer segments to mitigate the impact of the booking decline.

In summary, while the insights highlight a period of high demand that can be leveraged for positive growth, they also reveal a subsequent downturn that requires strategic planning to prevent negative growth.

#### Chart - 3 - What is the distribution of arrivals across the days of the month?

In [None]:
# Set the figure size
plt.figure(figsize=(15,6))

# Count arrivals per day of the month, split by hotel
sns.countplot(data = data, x = 'arrival_date_day_of_month', hue='hotel', palette='pastel')

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

We picked a bar plot because the day of arrival is discrete data, and a bar plot shows discrete data clearly.

##### 2. What is/are the insight(s) found from the chart?


1. Arrivals drop noticeably on the last days of the month compared with other days. Part of this drop is expected, since only seven months have a 31st day.
2. City hotels experience a higher number of arrivals compared to resort hotels.
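
Raw counts for days 29 to 31 are naturally lower because those days do not occur in every month. One possible adjustment, sketched here under that assumption, divides the arrivals on each day number by how many months in the data actually contain that day. `arrivals_per_available_day` is a hypothetical helper and the toy rows are made up.

```python
import calendar

import pandas as pd

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']


def arrivals_per_available_day(df):
    """Average arrivals per day-of-month, adjusted for how often that day occurs."""
    months = df[['arrival_date_year', 'arrival_date_month']].drop_duplicates()
    # Number of days in each (year, month) present in the data.
    month_lengths = [
        calendar.monthrange(int(year), MONTHS.index(month) + 1)[1]
        for year, month in zip(months['arrival_date_year'], months['arrival_date_month'])
    ]
    arrivals = df['arrival_date_day_of_month'].value_counts().sort_index()
    # How many of those months contain each day number.
    occurrences = pd.Series(
        {day: sum(length >= day for length in month_lengths) for day in arrivals.index}
    )
    return arrivals / occurrences


# Toy data for illustration only: January and February 2016.
toy = pd.DataFrame({
    'arrival_date_year': [2016] * 6,
    'arrival_date_month': ['January', 'January', 'January',
                           'February', 'February', 'February'],
    'arrival_date_day_of_month': [31, 31, 15, 15, 15, 15],
})

adjusted = arrivals_per_available_day(toy)
print(adjusted)
```

In this toy case day 15 has twice the raw arrivals of day 31, but it also occurs in twice as many months, so the adjusted averages are equal.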

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights provided can certainly contribute to a positive business impact. Knowing that there is a lower arrival rate at the end of each month allows businesses to plan targeted marketing campaigns or special offers to boost arrivals during these typically slower periods.

Regarding the second insight, the fact that city hotels have more arrivals than resort hotels can inform strategic decisions. For instance, businesses could focus on enhancing the appeal of resort hotels or adjusting pricing strategies to attract more guests.

However, these insights could also indicate areas of potential negative growth. The lower end-of-month arrivals might suggest a recurring pattern of decreased revenue during these periods, which could impact overall profitability if not addressed. Similarly, if a business primarily operates resort hotels, the higher arrival rates at city hotels could represent a competitive disadvantage and lead to negative growth for the resort sector.



#### Chart - 4 - Number of adults in both hotels and Adults vs Cancelations

In [None]:

# Set the figure size
plt.figure(figsize=(15, 8))

# Left panel: number of adults per booking, split by hotel
plt.subplot(1, 2, 1)
sns.countplot(x='adults',hue='hotel', data=data, palette='pastel')
plt.title("Number of adults in both hotels",fontweight="bold", size=20)

# Right panel: number of adults per booking, split by cancellation status
plt.subplot(1, 2, 2)
sns.countplot(data = data, x = 'adults', hue='is_canceled', palette='husl')
plt.title('Adults vs Cancelations',fontweight="bold", size=20)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

When comparing two types of hotels and the number of adults preferring each, a count chart visually represents these numbers in a clear and comparative manner, making it easy to see which category is more popular.





##### 2. What is/are the insight(s) found from the chart?


Bookings for two adults were more common at city hotels than at resort hotels. Moreover, according to the chart, a significant portion of guests, exceeding half, canceled their reservations.
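
Statements about the share of canceled bookings are easier to verify as numbers than from bar heights. A minimal sketch, assuming `is_canceled` is coded 0/1 as described in the variable list; `cancellation_rate` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd


def cancellation_rate(df, by):
    """Booking count and share of canceled bookings (in percent) for each group."""
    grouped = df.groupby(by)['is_canceled']
    return pd.DataFrame({
        'bookings': grouped.size(),
        # The mean of a 0/1 column is the fraction of ones.
        'cancel_rate_pct': grouped.mean().mul(100).round(1),
    })


# Toy data for illustration only.
toy = pd.DataFrame({
    'adults': [1, 2, 2, 2, 2, 3],
    'is_canceled': [0, 1, 0, 0, 1, 1],
})

rates = cancellation_rate(toy, 'adults')
print(rates)
```

The same helper could be pointed at other columns, for example `cancellation_rate(data, 'stays_in_week_nights')` or `cancellation_rate(data, 'deposit_type')`.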

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:** The preference for city hotels over resort hotels among bookings for two adults suggests that city hotels may be more appealing to certain demographics. This insight could help hoteliers and marketers tailor their services and marketing strategies to attract similar guests, potentially leading to increased occupancy rates and revenue for city hotels.

**Negative Growth Insight:** The fact that more than half of the guests canceled their reservations could indicate underlying issues with customer satisfaction or the booking process. This insight, while negative, is valuable because it highlights an area for improvement. Addressing the reasons behind these cancellations could lead to better retention rates and a more positive guest experience.

#### Chart - 5 - Number of weekday night stays and weekday night stays vs cancelations

In [None]:
# Chart - 5 visualization

# Set the figure size
plt.figure(figsize=(15, 8))

# Left panel: number of weekday nights stayed, split by hotel
plt.subplot(1, 2, 1)
sns.countplot(x='stays_in_week_nights',hue='hotel', data=data, palette='rainbow_r')
plt.title("Number of stays on weekday nights",fontweight="bold", size=20)

# Right panel: number of weekday nights stayed, split by cancellation status
plt.subplot(1, 2, 2)
sns.countplot(data = data, x = 'stays_in_week_nights', hue='is_canceled', palette='magma_r')
plt.title('WeekStay vs Cancelations',fontweight="bold", size=20)

# Widen the subplot area
plt.subplots_adjust(right=2)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

I used a bar plot to examine the frequency of weekday night stays and the cancellations for those stays. A bar plot displays categorical data effectively and gives a straightforward view of customer counts.

##### 2. What is/are the insight(s) found from the chart?


1. **Two-Day Stays**: The maximum count of stays occurs for two days, indicating that customers often prefer this duration.
2. **Resort Preference**: Customers tend to prefer resort hotels for longer stays (more than four days).
3. **Cancellation Trends**: High cancellation rates are observed for stays lasting one  or two days.
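
The observation about longer stays can be checked by computing, for each hotel, the share of bookings with more than four weekday nights. A minimal sketch on made-up toy data; the threshold of four nights follows the insight above.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    'hotel': ['City Hotel'] * 4 + ['Resort Hotel'] * 4,
    'stays_in_week_nights': [1, 2, 2, 5, 2, 5, 7, 10],
})

# Flag long stays, then average the flag per hotel to get a share in percent.
is_long_stay = toy['stays_in_week_nights'].gt(4)
long_stay_share = is_long_stay.groupby(toy['hotel']).mean().mul(100)
print(long_stay_share)
```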

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.




1. **Positive Business Impact**:
   - The insight that customers prefer **two-day stays** can be leveraged to create targeted marketing campaigns or special offers for this duration. Promoting weekend packages or discounts for two-day bookings could attract more customers.
   - Additionally, the preference for **resort hotels** for longer stays (more than four days) suggests an opportunity to enhance resort amenities and services. Focusing on customer satisfaction during extended stays can lead to positive reviews and repeat business.

2. **Negative Growth Considerations**:
   - The high **cancellation rates** observed for stays lasting **one or two days** are concerning. This could impact revenue and occupancy rates. To mitigate this, the hotel management should investigate the reasons behind cancellations during these durations.
   - Possible reasons for cancellations might include pricing, room availability, or customer dissatisfaction. Addressing these issues could help reduce cancellations and improve overall business performance.

Understanding customer behavior and acting on these insights strategically can significantly improve business outcomes.

#### Chart - 6 - Number of adults in the hotel and Adults vs Cancelations

In [None]:
# Chart - 6 visualization code

# Set the figure size
plt.figure(figsize=(15, 8))

# Left panel: number of adults per booking, split by hotel
plt.subplot(1, 2, 1)
sns.countplot(x='adults',hue='hotel', data=data, palette='coolwarm')
plt.title("Number of adults in both hotels",fontweight="bold", size=20)

# Right panel: number of adults per booking, split by cancellation status
plt.subplot(1, 2, 2)
sns.countplot(data = data, x = 'adults', hue='is_canceled', palette='icefire')
plt.title('Adults vs Cancelations',fontweight="bold", size=20)

# Widen the subplot area
plt.subplots_adjust(right=1.7)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

The countplot is preferred for this analysis because it is specifically designed to show the frequency of categories. It is an excellent tool for visualizing the distribution of categorical data, as it provides a clear and concise way to see how many occurrences there are in each category.



##### 2. What is/are the insight(s) found from the chart?

Pairs of adults showed a stronger preference for city hotels over resort hotels. Interestingly, more than half of these bookings were subsequently canceled.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Analyzing the insights provided:

1. **Positive Business Impact**:
   - The insight regarding **adult pairs preferring city hotels** suggests a market segment that could be targeted more effectively. Tailoring services and marketing strategies to appeal to couples could enhance their experience and increase bookings.
   - Understanding this preference can also guide resource allocation, ensuring city hotels are well-equipped to meet the demands of this customer group.

2. **Negative Growth Potential**:
   - The fact that **more than half of these bookings were canceled** is a significant concern. It indicates a potential issue with customer satisfaction or expectations not being met.
   - This trend could lead to negative growth if not addressed. The reasons behind these cancellations need to be investigated, whether they're related to pricing, services, or other factors. Once identified, strategies must be implemented to reduce cancellations and improve customer retention.

In conclusion, while the insights have the potential to create a positive business impact by identifying a key customer segment, the high cancellation rate poses a risk to growth. Addressing the underlying causes of cancellations is crucial for turning this insight into a positive outcome.

#### Chart - 7 - Types of market segment and types of distribution channel

In [None]:
# Set the figure size
plt.figure(figsize=(15, 8))

# Left panel: number of bookings per market segment
plt.subplot(1, 2, 1)
sns.countplot(x='market_segment', data=data, palette='rocket')
plt.title('Types of market segment',fontweight="bold", size=20)

# Right panel: number of bookings per distribution channel
plt.subplot(1, 2, 2)
sns.countplot(data = data, x = 'distribution_channel',  palette='Set1_r')
plt.title('Types of distribution channels',fontweight="bold", size=20)

# Widen the subplot area
plt.subplots_adjust(right=1.7)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

For market segments and distribution channels, which are categorical data, a **count plot** is an effective choice. By visualizing the frequency of each category, we can gain valuable insights into the distribution and patterns within these segments and channels.

##### 2. What is/are the insight(s) found from the chart?

Based on the data, it's evident that the **majority of distribution channels and market segments** are represented by **travel agencies**, whether they operate **offline** or **online**. This insight suggests that allocating more resources and attention to this specific segment could be a strategic move. By focusing on travel agencies, businesses can tap into a significant portion of the market and potentially enhance their overall performance.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the data analysis can indeed create a positive business impact. Focusing on the majority distribution channels and market segments, which are travel agencies (both offline and online), can lead to targeted marketing strategies, improved customer engagement, and potentially higher sales conversions. By concentrating efforts on the most prevalent channels, businesses can optimize their resources and maximize their reach within the market.

However, it’s important to consider that while these insights are valuable, they could also lead to negative growth if not managed properly. For instance, over-reliance on a single segment may result in vulnerability to market changes or disruptions in that segment. Additionally, neglecting other channels or segments could mean missing out on opportunities for diversification and growth in other areas.



#### Chart - 8 - What are the types of deposit?

In [None]:
# Chart - 8 visualization code

# Set the figure size
plt.figure(figsize=(12, 6))

# Count bookings per deposit type, split by hotel
sns.countplot(data = data, x = 'deposit_type',hue='hotel', palette='cool')
plt.title('Types of Deposit type',fontweight="bold", size=20)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

For deposit type, which is categorical data, a **count plot** is an effective choice. By visualizing the frequency of each category, we can gain valuable insights into the distribution and patterns within deposit types.

##### 2. What is/are the insight(s) found from the chart?


**City hotels** did not require any deposits, whereas **resorts** had some deposits. This lack of deposit for city hotels could potentially lead to **booking cancellations**. Without a deposit, guests may be less committed to their reservations, and the hotel might face higher instances of no-shows or last-minute cancellations. To mitigate this risk, city hotels should consider implementing deposit policies or alternative methods to secure bookings effectively.
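
A cross-tabulation gives the exact booking counts behind a statement like this, which is useful because small bars can be invisible on a count plot. A minimal sketch on made-up toy data; the deposit labels are used for illustration and the counts are not results from the dataset.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'City Hotel',
              'Resort Hotel', 'Resort Hotel'],
    'deposit_type': ['No Deposit', 'No Deposit', 'Non Refund',
                     'No Deposit', 'Refundable'],
})

# Rows are hotels, columns are deposit types, cells are booking counts.
deposit_table = pd.crosstab(toy['hotel'], toy['deposit_type'])
print(deposit_table)
```

On the real data, `pd.crosstab(data['hotel'], data['deposit_type'])` would show whether any city hotel bookings carry a deposit.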

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**No Deposit for City Hotels vs. Deposits for Resorts:**
- **Positive Impact:** City hotels not requiring deposits may attract more bookings initially due to the lack of financial commitment. This could lead to higher occupancy rates.
- **Negative Growth Risk:** Without deposits, city hotels face a higher risk of booking cancellations, no-shows, or last-minute changes. This can impact revenue and operational efficiency.

#### Chart - 9 - Repeated guests

In [None]:
# Chart - 9 visualization code

# Set the figure size
plt.figure(figsize=(8, 6))

# Count bookings by the repeated-guest flag (0 = new guest, 1 = repeated guest)
sns.countplot(data=data, x='is_repeated_guest', palette='husl').set_title('Graph showing whether guest is repeated guest', fontsize=20)

# Render the figure
plt.show()

##### 1. Why did you pick the specific chart?

The repeated-guest flag is categorical data, so a **count plot** is an effective choice. By visualizing the frequency of each category, we can see how many bookings come from repeated guests compared with first-time guests.


##### 2. What is/are the insight(s) found from the chart?

**The data indicates a low frequency of repeat guests.** It highlights the importance of strategizing to engage these guests, as they have previously stayed and are familiar with the service offerings. Implementing targeted initiatives to encourage repeat visits can capitalize on their prior experience and potentially increase guest loyalty.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gathered can indeed contribute to a positive business impact, provided they are leveraged appropriately:

1. **Low Frequency of Repeat Guests:**
   - **Positive Impact:** Recognizing the low number of repeat guests presents an opportunity to develop targeted marketing strategies aimed at increasing customer retention. By focusing on repeat guests, businesses can enhance loyalty and secure a more stable revenue stream.
   - **Negative Growth Risk:** If the low frequency of repeat guests is not addressed, it could indicate a lack of guest satisfaction or engagement, potentially leading to a decline in brand reputation and customer base.

2. **Need to Target Repeat Guests:**
   - **Positive Impact:** Since repeat guests have previously booked, they are likely to have a higher lifetime value. By targeting them, businesses can improve the efficiency of their marketing efforts and increase the likelihood of repeat bookings.
   - **Negative Growth Risk:** Over-focusing on repeat guests at the expense of acquiring new customers could limit market expansion and lead to a plateau in growth. It's important to balance retention strategies with efforts to attract new guests.
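
To put a number on "low frequency of repeat guests", the share of repeated guests can be computed per hotel. This is a minimal sketch, assuming the `hotel` and `is_repeated_guest` columns used in the chart above; the helper name and the toy rows are hypothetical.

```python
import pandas as pd


def repeat_guest_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of bookings made by repeated guests, per hotel type."""
    # is_repeated_guest is 0/1, so the group mean is the share of repeated guests
    return df.groupby("hotel")["is_repeated_guest"].mean()


# Hypothetical toy rows (not taken from the real dataset)
toy_repeat = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_repeated_guest": [0, 0, 0, 1, 0, 1, 0, 1],
})
repeat_share = repeat_guest_share(toy_repeat)
print(repeat_share)
```

Running `repeat_guest_share(data)` on the real data would give a baseline against which any retention initiative could later be compared.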



#### Chart - 10 - ADR comparison over the year between Resort Hotel and City Hotel

In [None]:
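# Chart - 10 visualization code
# Order the months chronologically; as plain strings they would not follow the calendar
months_order = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
plot_data = data.assign(arrival_date_month=pd.Categorical(data['arrival_date_month'], categories=months_order, ordered=True))
# Set the figure size
plt.figure(figsize=(18, 8))
# Mean ADR per month, one line per hotel (style='hotel' is needed for markers=True to take effect)
sns.lineplot(x='arrival_date_month', y='adr', hue='hotel', style='hotel', markers=True, dashes=False, data=plot_data)
# Render the figure
plt.show()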

##### 1. Why did you pick the specific chart?

We need to analyze the trend in ADR (Average Daily Rate) across the months. A line chart is the most suitable visualization for observing a trend over a time period.

##### 2. What is/are the insight(s) found from the chart?


1. **Peak ADR Months by Hotel Type:**
   - During the months of **July, August, and September**, the **average daily rate (ADR)** for resort hotels is **higher**. This suggests that these months are peak seasons or have increased demand, leading to elevated prices.
   - For **city hotels**, the ADR is **slightly higher** during **March, April, and May**. These months may correspond to specific events, holidays, or favorable weather conditions in city destinations.

2. **Low ADR in November and January for Resort Hotels:**
   - The ADR for resort hotels is **lower** in the months of **November and January**. These periods might be off-peak seasons, resulting in more competitive pricing or discounts to attract guests.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


1. **High ADR During Peak Seasons:**
   - **Positive Impact:** Understanding that the ADR is higher during peak seasons (July, August, September for resorts, and March, April, May for city hotels) allows for strategic pricing and revenue management. This can lead to maximized profits during times of high demand.
   - **Negative Growth Risk:** If the increased ADR during these months is not accompanied by adequate service or value, it could lead to customer dissatisfaction and negative reviews, which can harm the hotel's reputation and future business.

2. **Low ADR During Off-Peak Seasons:**
   - **Positive Impact:** The lower ADR in November and January for resort hotels indicates an opportunity to attract budget-conscious travelers. Special promotions or packages during these months can increase occupancy and offset the lower rates.
   - **Negative Growth Risk:** If the low ADR is perceived as a reflection of poor quality or service, it could deter guests. Additionally, if the reduced rates are not managed properly, they could lead to decreased overall revenue.

In conclusion, while the insights can help in creating a positive business impact by informing pricing strategies, they also carry risks if not implemented thoughtfully. It's crucial to balance pricing with value offered and to manage perceptions to ensure that any pricing strategy contributes to long-term growth and customer satisfaction.

#### Chart - 11 - Booking cancellations over the years

In [None]:
# Keep only the cancelled bookings
df3 = data[data['is_canceled'] == 1]

plt.figure(figsize=(15, 7))
sns.set_style('whitegrid')
plt.rc('font', size=15)
# Count cancelled bookings per arrival year, split by hotel
sns.countplot(x=df3['arrival_date_year'], hue=df3['hotel'])
plt.legend()
plt.title('Year-wise Booking Cancellations', fontsize=20)
plt.xlabel('Year', fontsize=20)
plt.ylabel('No. of Cancelled Bookings', fontsize=20)
plt.show()

##### 1. Why did you pick the specific chart?

We need to plot the year-wise count of cancellations, so a bar chart is the right choice for showing the number of cancelled bookings.

##### 2. What is/are the insight(s) found from the chart?

Based on the chart, both the City Hotel and the Resort Hotel recorded their highest number of cancellations in 2016, and their lowest number in 2015. Because the chart shows raw counts rather than rates, part of this difference may simply reflect how many bookings each year contains. We should therefore focus on understanding the reasons behind the high number of cancellations in 2016.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from this analysis and their potential impact on the hotel business are evaluated below:

1. **Positive Business Impact**:
   - **Understanding Cancellation Trends**: The insights regarding cancellation rates in 2015 and 2016 can significantly benefit hotel management. By understanding these patterns, hotels can make informed decisions to optimize their operations.
   - **Tailoring Strategies**: Armed with this information, hotels can tailor their pricing strategies, booking policies, and marketing efforts. For instance:
     - **Dynamic Pricing**: Adjusting room rates based on demand can help maximize revenue. If high cancellation rates correlate with high prices, hotels can consider offering discounts during off-peak periods.
     - **Booking Policies**: Implementing flexible booking policies (e.g., free cancellation up to a certain date) can attract more guests and reduce cancellations.
     - **Marketing Campaigns**: Targeted marketing campaigns can address specific customer segments and encourage direct bookings, reducing reliance on offline travel agents.

2. **Potential Negative Impact**:
   - **Revenue Loss**: High cancellation rates directly impact revenue. Empty rooms due to cancellations mean lost income. Hotels need to strike a balance between flexibility (to attract bookings) and minimizing cancellations.
   - **Operational Challenges**: Managing cancellations requires resources (staff time, system updates, etc.). Frequent cancellations can strain hotel operations.
   - **Guest Experience**: Frequent cancellations affect guest experience. Guests may perceive instability if they see many canceled bookings.

3. **Justification**:
   - **Negative Growth**: The high cancellation rates in 2016 could lead to negative growth if not addressed. Revenue loss and operational challenges are clear risks.
   - **Mitigation Strategies**: To mitigate these negative impacts:
     - **Data-Driven Decisions**: Hotels should use data to refine their strategies.
     - **Customer Communication**: Clear communication about booking policies can manage guest expectations.
     - **Overbooking**: Implementing controlled overbooking (within reasonable limits) can compensate for cancellations.
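
The chart for this section counts cancelled bookings, which depends on how many bookings each year contains. A cancellation rate per year makes the years comparable. The following is a minimal sketch, assuming the `arrival_date_year`, `hotel` and `is_canceled` columns used above; the function name and the toy rows are hypothetical.

```python
import pandas as pd


def yearly_cancellation_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Total bookings, cancellations and cancellation rate per year and hotel."""
    return (
        df.groupby(["arrival_date_year", "hotel"])["is_canceled"]
        # count = all bookings, sum = cancelled bookings, mean = cancellation rate
        .agg(total_bookings="count", cancellations="sum", cancel_rate="mean")
        .reset_index()
    )


# Hypothetical toy rows: 2016 has more cancellations but a lower rate than 2015
toy_years = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016],
    "hotel": ["City Hotel"] * 8,
    "is_canceled": [1, 0, 1, 1, 0, 0, 0, 0],
})
yearly_summary = yearly_cancellation_summary(toy_years)
print(yearly_summary)
```

In the toy rows, the year with the larger number of cancellations has the smaller cancellation rate, which is why it may be worth running `yearly_cancellation_summary(data)` before concluding that 2016 was the worst year.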


---


#### Chart - 12 - ADR over the months (Resort Hotel)

In [None]:
data_resort = data[(data['hotel'] == 'Resort Hotel') & (data['is_canceled'] == 0)]

resort_hotel = data_resort.groupby(['arrival_date_month'])['adr'].mean().reset_index()

months_order = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

resort_hotel['arrival_date_month'] = pd.Categorical(resort_hotel['arrival_date_month'], categories=months_order, ordered=True)

df_sorted_resort = resort_hotel.sort_values(by='arrival_date_month')

df_sorted_resort

In [None]:
# Chart - 12 visualization code

# Set the figure size
plt.figure(figsize=(18, 8))
# Mean ADR per month for the resort hotel; marker='o' marks each monthly value
sns.lineplot(x='arrival_date_month', y='adr', marker='o', data=df_sorted_resort)

# Render the figure
plt.show()

##### 1. Why did you pick the specific chart?

A line chart provides a visual representation of how ADR varies over time. It can help hotel management make informed decisions based on these trends.



##### 2. What is/are the insight(s) found from the chart?

- **August**: The highest Average Daily Rate (ADR) occurs in August. This could be due to peak tourist seasons, special events, or other factors that drive up room rates during this month.
- **January to March**: Conversely, the ADR is lowest during the first quarter of the year (January to March). Possible reasons for this include off-peak travel periods, fewer bookings, or promotional pricing.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The monthly ADR trend for the resort hotel, with its peak in August and its low point from January to March, has the following business implications:

1. **August Optimization**: Hotels can capitalize on the high ADR in August by offering attractive packages, enhancing guest experiences, and promoting exclusive amenities.

2. **First Quarter Strategies**: During the low ADR months (January to March), hotels can consider:
   - **Promotions**: Offering discounts or bundled deals to attract guests.
   - **Events**: Hosting events or conferences to boost occupancy.
   - **Marketing**: Targeted marketing campaigns to increase bookings.


---


#### Chart - 13 - ADR over the months (City Hotel)

In [None]:
data_city = data[(data['hotel'] == 'City Hotel') & (data['is_canceled'] == 0)]

city_hotel = data_city.groupby(['arrival_date_month'])['adr'].mean().reset_index()

months_order = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

city_hotel['arrival_date_month'] = pd.Categorical(city_hotel['arrival_date_month'], categories=months_order, ordered=True)

df_sorted_city = city_hotel.sort_values(by='arrival_date_month')

df_sorted_city



In [None]:
# Chart - 13 visualization code

# Set the figure size
plt.figure(figsize=(20, 8))
# Mean ADR per month for the city hotel; marker='o' marks each monthly value
sns.lineplot(x='arrival_date_month', y='adr', marker='o', data=df_sorted_city)

# Render the figure
plt.show()


##### 1. Why did you pick the specific chart?

A line chart provides a visual representation of how ADR varies over time. It can help hotel management make informed decisions based on these trends.


##### 2. What is/are the insight(s) found from the chart?

For the city hotel, ADR is low in the first quarter of the year. From May to August, ADR is high, ranging from about 115 to 120.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights and their potential impact on the hotel business are evaluated below:

1. **Positive Business Impact**:
   - **High ADR in May to August**: The insight that ADR is high from May to August presents a positive business opportunity. Hotels can leverage this by:
     - Offering premium services during peak tourist seasons.
     - Promoting special events or packages to attract guests.
     - Maximizing revenue through strategic pricing.

2. **Negative Growth Considerations**:
   - **Low ADR in the First Quarter**: The low ADR during the first quarter (January to March) could lead to negative growth. Here's why:
     - **Revenue Impact**: Lower ADR means reduced revenue per room. Hotels need to find ways to offset this.
     - **Operational Challenges**: Managing operations during off-peak periods can be difficult. Fixed costs (staffing, maintenance) remain constant.
     - **Guest Perception**: Frequent discounts or low rates might affect guest perception of the hotel's quality.



---


#### Chart - 14 - Correlation Heatmap

In [None]:

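# Work on a copy so the label encoding does not overwrite the original text columns in `data`
data_encoded = data.copy()
for col in data_encoded.columns:
    if data_encoded[col].dtype == "object":
        # Replace each text category with an integer code so corr() can include it
        data_encoded[col] = data_encoded[col].astype("category").cat.codes

plt.figure(figsize=(28, 28))  # Adjust the figure size as needed
sns.heatmap(data_encoded.corr(), cmap="YlGnBu", annot=True)
plt.show()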


##### 1. Why did you pick the specific chart?

A heatmap is a graphical representation that uses color intensity to display the correlation matrix of a set of variables. It provides a visual summary of how strongly different variables are related to each other. With the sequential `YlGnBu` color map used here, darker cells correspond to higher (more positive) correlation values and lighter cells to lower (more negative) values, while the annotation in each cell gives the exact coefficient.

##### 2. What is/are the insight(s) found from the chart?


1. **Positive Correlation Between Arrival Date Year and Reservation Date**:
   - The arrival date year and reservation date exhibit a strong positive correlation. This is largely expected: the reservation date is converted to category codes for the heatmap, and bookings that arrive in later years generally have later reservation dates. On its own, this correlation does not show how far in advance bookings are made.
   - Business Implication: Hotels can use the booking timeline to optimize their booking processes, allocate resources efficiently, and plan for peak seasons.

2. **Negative Correlation Between Cancelled Status and Reservation Status**:
   - The negative correlation between cancelled status and reservation status is largely built into the data: the reservation status records the outcome of a booking, so it changes whenever a reservation is canceled. The two columns therefore carry much of the same information.
   - Business Implication: Hotels should closely monitor cancellation rates and implement strategies to reduce them. Improving guest satisfaction, offering flexible cancellation policies, and managing overbooking can help mitigate cancellations.
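
A 28 x 28 inch heatmap is hard to read cell by cell, so it can help to list the correlations with a single target column in sorted order. This is a minimal sketch under the same label-encoding approach as the heatmap cell; `encoded_correlations` is a hypothetical helper, and `lead_time` plus the toy values are assumptions made only for illustration.

```python
import pandas as pd


def encoded_correlations(df: pd.DataFrame, target: str = "is_canceled") -> pd.Series:
    """Correlation of every column with `target`, strongest (by absolute value) first."""
    encoded = df.copy()  # leave the caller's DataFrame untouched
    for col in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[col]):
            # Label-encode text columns; the codes are arbitrary, so read the sign with care
            encoded[col] = encoded[col].astype("category").cat.codes
    corr = encoded.corr()[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# Hypothetical toy rows (not taken from the real dataset)
toy_corr = pd.DataFrame({
    "lead_time": [10, 200, 30, 300, 5, 250],
    "deposit_type": ["No Deposit", "Non Refund", "No Deposit",
                     "Non Refund", "No Deposit", "Non Refund"],
    "is_canceled": [0, 1, 0, 1, 0, 1],
})
target_corr = encoded_correlations(toy_corr)
print(target_corr)
```

Because category codes are assigned alphabetically, the size of a coefficient for an encoded text column is more informative than its sign, especially for columns with more than two categories.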




#### Chart - 15 - Pair Plot

In [None]:
sns.pairplot(data,  hue= "hotel", palette= "rocket")

##### 1. Why did you pick the specific chart?

**Multivariate Analysis:** Pair plots provide a snapshot of how each variable in a dataset is related to every other variable. This is particularly useful for hotel booking data, which often includes numerous variables like stay duration, number of guests, room type, and booking rates.

**Identifying Correlations:** They help in identifying potential correlations or patterns between different variables, such as the relationship between the length of stay and the average daily rate (ADR). This can be crucial for understanding factors that influence hotel bookings.

**Spotting Trends and Outliers:** Pair plots can reveal trends and outliers across different combinations of variables. For instance, you might spot unusual booking patterns or rare combinations of features that could signify special cases in the data.

##### 2. What is/are the insight(s) found from the chart?

**Correlation Patterns:** Strong correlations between variables such as room rates and customer ratings, indicating that higher-rated rooms tend to be more expensive.

**Booking Trends:** Seasonal trends in bookings, with certain times of the year showing higher occupancy rates, which could be useful for pricing and marketing strategies.

**Customer Segmentation:** Clusters of data points that suggest different customer segments, such as business travelers vs. leisure travelers, based on booking patterns and preferences.

**Outliers:** Unusual data points that stand out from the rest, which could indicate special events or errors in the data.

**Variable Distributions:** The distribution of individual variables, like the length of stay, which might show that most bookings are for short durations.
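
A pair plot over every column of a large booking table can be slow to draw and difficult to read. One option is to plot a handful of columns on a random sample of rows. The following is a minimal sketch of that preparation step; `pairplot_sample` is a hypothetical helper, and `lead_time` and `stays_in_week_nights` are assumed column names that should be replaced with whichever columns are of interest.

```python
import pandas as pd


def pairplot_sample(df: pd.DataFrame, columns, hue: str = "hotel",
                    n: int = 2000, seed: int = 0) -> pd.DataFrame:
    """Return the chosen columns plus the hue column, sampled down to at most n rows."""
    subset = df[list(columns) + [hue]]
    if len(subset) > n:
        # A fixed random_state keeps the sample (and the plot) reproducible
        subset = subset.sample(n=n, random_state=seed)
    return subset


# Hypothetical toy rows (not taken from the real dataset)
toy_pairs = pd.DataFrame({
    "adr": [80.0, 95.0, 120.0, 60.0, 150.0, 110.0, 70.0, 130.0, 90.0, 100.0],
    "lead_time": [10, 45, 200, 3, 120, 60, 15, 300, 30, 75],
    "stays_in_week_nights": [1, 2, 5, 1, 3, 2, 1, 7, 2, 3],
    "hotel": ["City Hotel", "Resort Hotel"] * 5,
})
pair_subset = pairplot_sample(toy_pairs, ["adr", "lead_time", "stays_in_week_nights"], n=4)
print(pair_subset)
```

The result can then be passed to the same call used in this chart, for example `sns.pairplot(pair_subset, hue="hotel", palette="rocket")`.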

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

The business objectives derived from the insights, with suggested solutions for each, are as follows:

1. **Revenue Optimization**:
   - **Objective**: Maximize revenue by strategically adjusting prices to match travel demand, outperform the competition, increase profits, and improve hotel performance year-over-year.
   - **Solutions**:
     - **Dynamic Pricing**: Use revenue management software to optimize room rates based on demand and market conditions.
     - **Customized Rates**: Create customized product rates and rules that align with your hotel's value proposition.
     - **Promotions**: Offer special packages during peak seasons (May to August) to attract guests and maximize revenue.

2. **Cancellation Management**:
   - **Objective**: Minimize cancellations to prevent revenue loss.
   - **Solutions**:
     - **Advance Payments**: Take advance payments during reservations to discourage cancellations.
     - **Strict Refund Policies**: Implement no-refund or strict refund policies to incentivize guests to keep their bookings.
     - **Monitoring**: Continuously monitor cancellation rates and adjust strategies as needed.

3. **Guest Experience Enhancement**:
   - **Objective**: Improve guest satisfaction and encourage repeat visits.
   - **Solutions**:
     - **Targeted Advertising**: Tailor advertising to attract couples and business travelers, emphasizing amenities and experiences relevant to their needs.
     - **Loyalty Programs**: Offer special benefits (discounts, upgrades, loyalty points) to returning guests.
     - **Personalization**: Use guest data to personalize interactions and enhance their overall experience.

4. **Operational Efficiency**:
   - **Objective**: Optimize resource allocation and operational processes.
   - **Solutions**:
     - **Staffing**: Plan staffing based on booking trends, ensuring adequate coverage during peak and off-peak periods.
     - **Maintenance**: Efficiently manage maintenance tasks during low-demand periods.
     - **Cross-Department Collaboration**: Foster collaboration between departments (e.g., sales, marketing, operations) to streamline processes and improve efficiency.

By implementing these solutions, hotels can achieve positive growth, enhance guest satisfaction, and operate more efficiently.

---


# **Conclusion**

The key takeaways from the analysis are:

1. **Resort Hotels**:
   - **Marketing Strategy**: Resort hotels should focus on improving their marketing efforts. Promoting the hotels, especially through social media, can attract more guests.
   - **Price Optimization**: Consider reducing prices strategically to increase booking percentages. Competitive pricing can encourage more reservations.

2. **Peak Months (May-August)**:
   - **Business Opportunity**: May to August is the busiest period. Hotels should actively target customers during these months to maximize revenue.
   - **Seasonal Promotions**: Special offers, events, and packages can attract guests during peak seasons.

3. **City Hotels**:
   - **Booking vs. Cancellation**: Although city hotels have more bookings, they also face higher cancellation rates.
   - **Mitigation Strategies**:
     - **Advance Payments**: Taking advance payments during reservations can reduce cancellations.
     - **Strict Refund Policies**: Implementing no-refund or strict refund policies can discourage cancellations.

4. **Guest Demographics**:
   - **Couples and Business Travelers**: Most guests travel in pairs, and bringing children or babies is rare. Hotels can tailor advertising to attract couples and business travelers.
   - **Repeat Guests**: While most guests do not return, hotels can create targeted advertisements to encourage return visits. Offering special benefits to returning guests can enhance loyalty.

In summary, a combination of effective marketing, pricing strategies, and guest-focused policies can positively impact hotel bookings and guest satisfaction.

---

