<a href="https://colab.research.google.com/github/gunagreeshma/Hotel-booking/blob/main/Hotel_booking_analysis_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Hotel Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**


This project explores a comprehensive hotel booking dataset covering both a city hotel and a resort hotel. The primary objective is to extract meaningful insights and identify the factors that govern hotel bookings. The analysis addresses questions such as the best time to book, investigating whether certain periods offer lower prices or greater availability for either hotel type. It also looks for patterns in length of stay and whether specific stay durations are associated with more favorable daily rates. A further aim is to explore which factors are associated with a hotel receiving a higher number of special requests, with a view to predicting them. The dataset includes variables such as booking dates, guest counts (adults, children, and babies), and required car parking spaces, which makes it possible to examine how these elements interact with booking patterns. Through data exploration, descriptive statistics, visualizations, and potentially machine learning models, the project seeks to deliver actionable insights for hotel management. The expected outcomes are a detailed data analysis report, informative visualizations, and, if applicable, a predictive model for special requests. Ultimately, the goal is to provide practical recommendations that can improve booking strategies and overall customer satisfaction in the hotel industry.

# **GitHub Link -**

https://github.com/gunagreeshma/Hotel-booking

# **Problem Statement**



Have you ever wondered when the best time of year to book a hotel room is? Or the optimal length of stay in order to get the best daily rate? What if you wanted to predict whether or not a hotel was likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore those questions! This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things. All personally identifying information has been removed from the data. Explore and analyse the data to discover important factors that govern the bookings.

#### **Define Your Business Objective?**

The business objective of this project is to use data-driven insights to optimize hotel booking strategies, enhance customer experience, and improve overall business outcomes for both city and resort hotels. By analyzing the hotel booking dataset, the project aims to answer key questions about optimal booking timing, length-of-stay dynamics, the prediction of special requests, and the impact of variables such as guest demographics and car parking requirements. The project seeks to offer actionable recommendations to hotel management, helping them make informed decisions that increase efficiency, revenue, and customer satisfaction. Through a combination of data exploration, descriptive analytics, visualizations, and potentially predictive modeling, the ultimate goal is to provide insights that can be translated into practical strategies for the hospitality industry.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following questions are answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way, following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
%pip install pandas  # install the pandas library (already available on Colab)

In [None]:
# Import Libraries
import pandas as pd #Pandas allows for quick exploration of data.
import numpy as np #useful for performing operations on entire columns or rows of a dataset without the need for explicit loops.

### Dataset Loading

In [None]:
# Mounting Google Drive to access files in Colab
from google.colab import drive
drive.mount('/content/drive')


In [None]:
# Load Dataset
filepath = "/content/Hotel Bookings (1).csv"
auto_df = pd.read_csv(filepath)  # Reads a comma-separated values (CSV) file into a DataFrame.

### Dataset First View

In [None]:
# Dataset First Look
auto_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
# Display the shape of the dataset (number of rows and columns)
auto_df_shape = auto_df.shape
print(f"Dataset shape: {auto_df_shape}")

### Dataset Information

In [None]:
# Dataset Info
auto_df.info()

In [None]:
# Display the first n rows of the DataFrame (default n=5)
auto_df.head()


In [None]:
# Display the last n rows of the DataFrame (default n=5)
auto_df.tail()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# Use the duplicated() method to identify duplicate rows
duplicates = auto_df[auto_df.duplicated(keep='first')]

# Count the duplicate rows
duplicate_count = duplicates.shape[0]
duplicate_count

In [None]:
# Remove duplicate rows from the dataset
auto_df_cleaned = auto_df.drop_duplicates(keep='first')

# Confirming that duplicates are removed
cleaned_duplicate_count = auto_df_cleaned[auto_df_cleaned.duplicated(keep='first')].shape[0]

# Print the count of duplicates before and after removal
print(f"Duplicate rows before removal: {duplicate_count}")
print(f"Duplicate rows after removal: {cleaned_duplicate_count}")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Use the isna() or isnull() method to identify missing values
missing_values = auto_df_cleaned.isna()

# Count the missing values in each column
missing_count = missing_values.sum()


In [None]:
# Display the number of missing values in each column
missing_count

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# missing_count is already a Series (one value per column); turn it into a
# one-column DataFrame so seaborn can draw it as a heatmap
missing_count_df = missing_count.to_frame(name='missing_values')

# Setting up the plot size
plt.figure(figsize=(12, 8))

# Creating an annotated heatmap of the missing-value count per column
sns.heatmap(missing_count_df, cmap='viridis', cbar=False, annot=True, fmt='g')

# Adding a title to the plot
plt.title('Missing Values Heatmap')

# Displaying the plot
plt.show()

### What did you know about your dataset?


The raw dataset has 119,390 rows and 32 columns, of which 31,994 rows are duplicates. After removing the duplicates (leaving 87,396 rows), the columns with missing values are 'company' with 82,137, 'agent' with 12,193, 'country' with 452, and 'children' with 4.
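Raw counts are easier to judge as a share of the rows: 82,137 missing values in 'company' means most bookings have no company recorded at all. The minimal sketch below reports each column's missing count and percentage; the helper name `missing_summary` and the toy DataFrame are hypothetical illustrations, not part of the original notebook. On the real data it would be called as `missing_summary(auto_df_cleaned)`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values for columns that have any."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one missing value, largest count first
    return summary[summary["missing_count"] > 0].sort_values("missing_count", ascending=False)


# Toy data standing in for the hotel bookings DataFrame (hypothetical values)
toy = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0],
    "agent": [9.0, np.nan, 240.0, 9.0],
    "children": [0.0, 1.0, 0.0, 0.0],
})
print(missing_summary(toy))
```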

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
columns = auto_df_cleaned.columns  # Get the column names of 'auto_df_cleaned' and store them in 'columns'
columns


In [None]:

# Dataset Describe
# Calculate descriptive statistics for the numerical columns of 'auto_df_cleaned' using the describe() method
description = auto_df_cleaned.describe()

# Display the summary statistics, including count, mean, std, min, 25%, 50%, 75%, and max
print(description)

### Variables Description

The columns of the dataframe and the data they represent are listed below:

hotel : Type of hotel, either Resort Hotel or City Hotel

is_canceled : If the booking was canceled (1) or not (0)

lead_time : Number of days that elapsed between the date the booking was made and the guests' arrival date

arrival_date_year : Year of arrival date

arrival_date_month : Month of arrival date

arrival_date_week_number : Week number of the year for arrival date

arrival_date_day_of_month : Day of the month of the arrival date

stays_in_weekend_nights : Number of weekend nights (Saturday or Sunday) spent at the hotel by the guests.

stays_in_week_nights : Number of weeknights (Monday to Friday) spent at the hotel by the guests.

adults : Number of adults among the guests

children : Number of children accompanying the adults

babies : Number of babies accompanying the adults

meal : Type of meal booked

country : Country of origin of the guests

market_segment : Designation of market segment

distribution_channel : Name of booking distribution channel

is_repeated_guest : If the booking was from a repeated guest (1) or not (0)

previous_cancellations : Number of previous bookings that were cancelled by the customer prior to the current booking

previous_bookings_not_canceled : Number of previous bookings not cancelled by the customer prior to the current booking

reserved_room_type : Code of room type reserved

assigned_room_type : Code of room type assigned

booking_changes : Number of changes made to the booking

deposit_type : Type of the deposit made by the guest

agent : ID of travel agent who made the booking

company : ID of the company that made the booking

days_in_waiting_list : Number of days the booking was in the waiting list

customer_type : Type of customer, one of four categories: Contract, Group, Transient, or Transient-Party

adr : Average Daily Rate, as defined by dividing the sum of all lodging transactions by the total number of staying nights

required_car_parking_spaces : Number of car parking spaces required by the customer

total_of_special_requests : Number of special requests made by the customer

reservation_status : Reservation status (Canceled, Check-Out, or No-Show)

reservation_status_date : Date on which the last reservation status was set
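The `adr` definition above can be written as a formula. Rearranging it gives the per-booking revenue estimate (ADR multiplied by total nights) that Chart 10 computes. Here the total number of staying nights $N$ is taken to be the sum of the two stay columns, which is an assumption consistent with how the notebook computes stay duration.

$$
\text{adr} = \frac{\text{sum of all lodging transactions}}{N},
\qquad N = \text{stays\_in\_weekend\_nights} + \text{stays\_in\_week\_nights}
$$

$$
\text{revenue} \approx \text{adr} \times N
$$

The ratio is undefined when $N = 0$, so under this estimate a zero-night booking contributes no revenue.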



### Check Unique Values for each variable.

In [None]:

# Check Unique Values for each variable.
# Use the nunique() method to count unique values for each column
unique_values = auto_df_cleaned.nunique()
unique_values

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Make the dataset analysis-ready
# Review the missing-value count per column computed earlier
missing_count

In [None]:
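# Handling missing values
# Work on an explicit copy so later column assignments do not trigger SettingWithCopyWarning
auto_df_cleaned = auto_df_cleaned.copy()
# Handling missing values in 'children' column (numerical) by filling with the column mean
auto_df_cleaned['children'] = auto_df_cleaned['children'].fillna(auto_df_cleaned['children'].mean())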





In [None]:
# Check missing values in 'children' column
children_missing_count = auto_df_cleaned['children'].isnull().sum()

print(f"Missing values in 'children' column: {children_missing_count}")

In [None]:

# Handling missing values in 'company' and 'agent' columns by filling with 0
auto_df_cleaned[['company', 'agent']] = auto_df_cleaned[['company', 'agent']].fillna(0)


In [None]:
# Check missing values in the 'company' column
company_missing_count = auto_df_cleaned['company'].isnull().sum()

# Display the missing-value count for the 'company' column
print(f"Missing values in 'company' column: {company_missing_count}")

In [None]:

# Check missing values in the 'agent' column
agent_missing_count = auto_df_cleaned['agent'].isnull().sum()

# Display the missing-value count for the 'agent' column
print(f"Missing values in 'agent' column: {agent_missing_count}")

In [None]:
# Fill missing values in the 'country' column with the placeholder 'others'
auto_df_cleaned['country'] = auto_df_cleaned['country'].fillna('others')


In [None]:

# Display the number of missing values in the 'country' column
missing_country_count = auto_df_cleaned['country'].isnull().sum()
print(f"Number of missing values in 'country' column: {missing_country_count}")


In [None]:
# Check missing values count for each column
missing_values_count = auto_df_cleaned.isnull().sum()
print("Missing values count for each column:")
print(missing_values_count)


In [None]:
# Visualizing the distribution of Average Daily Rate (ADR) using a boxplot
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.boxplot(x=auto_df_cleaned['adr'], color='lightblue')
plt.title('Distribution of Average Daily Rate (ADR)')
plt.xlabel('Average Daily Rate (ADR)')
plt.show()

In [None]:
# Calculate Q1, Q3, and IQR for the 'adr' column
Q1 = auto_df_cleaned['adr'].quantile(0.25)
Q3 = auto_df_cleaned['adr'].quantile(0.75)
IQR = Q3 - Q1

# Define the upper and lower bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Create a mask for outliers based on the IQR method
outliers_mask = (auto_df_cleaned['adr'] < lower_bound) | (auto_df_cleaned['adr'] > upper_bound)

# Keep only rows inside the IQR fences (stored separately; the charts below still use auto_df_cleaned)
filtered_df_no_outliers = auto_df_cleaned[~outliers_mask]

# Visualize the box plot after removing outliers
plt.figure(figsize=(12, 8))
sns.boxplot(x='adr', data=filtered_df_no_outliers, color='lightblue')
plt.title('Distribution of ADR (After Removing Outliers)', fontsize=16)
plt.xlabel('ADR', fontsize=14)
plt.show()






### What all manipulations have you done and insights you found?

In preprocessing, exact duplicate rows were removed first. Missing values were then filled with the fillna method: 'children' with the column mean, 'company' and 'agent' with 0, and 'country' with the placeholder 'others'. Outliers in the 'adr' column were identified with the IQR rule (values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR) and visualized with box plots before and after filtering. The filtered rows are stored in `filtered_df_no_outliers`, while the charts below continue to use `auto_df_cleaned`. These steps leave a dataset with no remaining missing values and give a clearer view of the typical range of average daily rates, which improves data quality for the hotel booking analysis that follows.
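The IQR rule used above can be isolated into a small function to make the fences explicit. This is a minimal sketch; the helper `iqr_bounds` and the toy ADR values (including one extreme rate) are hypothetical. It keeps values equal to a fence, matching the notebook's mask, which flags only values strictly below the lower fence or strictly above the upper one.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple:
    """Return (lower, upper) fences of the IQR rule: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical ADR values, including one extreme rate
adr = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 5400.0])
lower, upper = iqr_bounds(adr)

# Series.between is inclusive on both ends by default
kept = adr[adr.between(lower, upper)]
print(lower, upper, len(kept))
```

With pandas' default linear interpolation, Q1 is 67.5 and Q3 is 102.5 here, so the fences are 15.0 and 155.0 and only the extreme value is dropped.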

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

Univariate Analysis

#### Chart - 1

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Setting up the plot size
plt.figure(figsize=(8, 5))

# Creating a histogram with a kernel density estimate (KDE) for lead_time
sns.histplot(auto_df_cleaned['lead_time'], bins=30, kde=True)

# Adding a title to the plot
plt.title('Distribution of Lead Time')

# Labeling the x-axis
plt.xlabel('Lead Time')

# Labeling the y-axis
plt.ylabel('Frequency')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

I chose a histogram with a kernel density estimate (KDE) because it effectively illustrates the distribution of lead times, providing a clear visualization of the frequency of different lead time intervals.

##### 2. What is/are the insight(s) found from the chart?

The histogram shows a right-skewed distribution, with most bookings in the 0 to 100 day range and a long tail of bookings made much further in advance. This suggests that a substantial share of bookings are made with relatively short lead times, which may point to a trend toward reservations made closer to arrival.
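A right skew can be confirmed numerically: the mean is pulled above the median by the long tail, and the share of bookings within a threshold quantifies how concentrated short lead times are. The sketch below assumes a 100-day threshold to match the range read off the chart; the helper `lead_time_profile` and the toy lead times are hypothetical. On the real data it would be called as `lead_time_profile(auto_df_cleaned['lead_time'])`.

```python
import pandas as pd


def lead_time_profile(lead_time: pd.Series, threshold: int = 100) -> dict:
    """Mean, median, and share of bookings made at most `threshold` days before arrival."""
    return {
        "mean": lead_time.mean(),
        "median": lead_time.median(),
        "share_within_threshold": (lead_time <= threshold).mean(),
    }


# Hypothetical lead times in days, with a long right tail
lead = pd.Series([0, 2, 5, 10, 20, 45, 80, 150, 300, 400])
profile = lead_time_profile(lead)
print(profile)
```

Here the mean (101.2 days) is about three times the median (32.5 days), the typical signature of a right-skewed distribution.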

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insight into a concentration of bookings with short lead times could positively impact business responsiveness to last-minute demand. However, it may also pose challenges for resource planning and demand forecasting, potentially leading to negative growth if operational strategies are not adjusted to efficiently accommodate short-term reservations. Balancing operational flexibility with resource optimization will be crucial for managing the potential impact on business growth.

#### Chart - 2

In [None]:

# Setting up the plot size
plt.figure(figsize=(8, 8))

# Creating a pie chart for the distribution of hotel types
auto_df_cleaned['hotel'].value_counts().plot.pie(autopct='%1.1f%%', startangle=90, colors=['skyblue', 'lightcoral'])

# Adding a title to the plot
plt.title('Distribution of Hotel Types')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

I selected a pie chart for its ability to effectively illustrate the distribution of hotel types in a visually intuitive manner, making it easy to compare the proportions of City Hotel and Resort Hotel.

##### 2. What is/are the insight(s) found from the chart?

The pie chart shows that City Hotel accounts for the majority of bookings at 61.1%, while Resort Hotel accounts for 38.9%. This gives a clear picture of the relative prevalence of each hotel type in the dataset.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights can positively impact business planning by informing strategic decisions based on the dominant presence of City Hotel. However, reliance on a single hotel type might pose a risk during market shifts or changes in customer preferences, potentially leading to negative growth if not diversified. A diversified approach considering market dynamics could mitigate such risks and foster positive business impact over the long term.

#### Chart - 3

In [None]:
# Setting up the plot size
plt.figure(figsize=(12, 6))

# Creating a count plot for the number of bookings by month
sns.countplot(x='arrival_date_month', data=auto_df_cleaned, order=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])

# Adding a title to the plot
plt.title('Number of Bookings by Month')

# Labeling the x-axis
plt.xlabel('Month')

# Labeling the y-axis
plt.ylabel('Number of Bookings')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

I opted for a count plot for its ability to showcase the distribution of bookings across months, providing a clear visual representation of the variations in booking volumes throughout the year.

##### 2. What is/are the insight(s) found from the chart?

The count plot reveals that August has the highest number of bookings, followed by July. This insight suggests a peak season during these summer months, potentially indicating a period of increased demand for hotel bookings.
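The ranking behind this insight can be read off a table as well as the chart. A count plot orders months only because of the explicit `order` list; the sketch below does the same for a table by reindexing `value_counts()` with the calendar-ordered month names, filling months with no bookings with 0. The helper `bookings_per_month` and the toy months are hypothetical; on the real data it would be called as `bookings_per_month(auto_df_cleaned['arrival_date_month'])`.

```python
import pandas as pd

# Calendar order, the same list passed to `order=` in the chart above
MONTH_ORDER = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December']


def bookings_per_month(months: pd.Series) -> pd.Series:
    """Count bookings per arrival month in calendar order; months with no bookings get 0."""
    return months.value_counts().reindex(MONTH_ORDER, fill_value=0)


# Hypothetical arrival months
toy_months = pd.Series(["August", "July", "August", "January", "August", "July"])
monthly = bookings_per_month(toy_months)
print(monthly)
print("Busiest month:", monthly.idxmax())
```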

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights into peak booking months, particularly in August and July, can positively impact business planning by allowing for optimized resource allocation and tailored marketing strategies during high-demand periods. However, if not managed efficiently, increased demand might lead to operational challenges, potentially resulting in negative growth due to overwhelmed resources and potential customer dissatisfaction. Strategic planning and operational preparedness are crucial to capitalize on the positive impact of heightened demand.






#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Setting up the plot size
plt.figure(figsize=(8, 5))

# Creating a count plot for the customer types
sns.countplot(x='customer_type', data=auto_df_cleaned)

# Adding a title to the plot
plt.title('Count of Customer Types')

# Labeling the x-axis
plt.xlabel('Customer Type')

# Labeling the y-axis
plt.ylabel('Count')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot to visually represent the distribution of customer types, as it provides a straightforward and effective way to compare the counts of different customer categories.

##### 2. What is/are the insight(s) found from the chart?

The count plot shows that 'Transient' is by far the most common customer type, with 'Transient-Party' second. This suggests that most bookings are made by customers who are not booking as part of a group or under a contract.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights can positively impact business strategies by tailoring services and marketing efforts to cater to the predominant 'Transient' customer type. However, relying heavily on short-term and individual bookings might pose challenges during periods of low demand, potentially leading to negative growth if the business is not diversified to accommodate varying customer preferences and stay durations. A balanced approach that considers diverse customer segments could mitigate potential risks and foster positive business impact over the long term.

#### Chart - 5

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Grouping by 'distribution_channel' in your DataFrame ('auto_df_cleaned')
group_by_dc = auto_df_cleaned.groupby('distribution_channel')

# Calculating the percentage of bookings for each distribution channel
d1 = pd.DataFrame(round((group_by_dc.size() / auto_df_cleaned.shape[0]) * 100, 2)).reset_index().rename(columns={'distribution_channel': 'Booking_Channel', 0: 'Booking_Percentage'})

# Plotting a pie chart
plt.figure(figsize=(8, 8))

# Extracting data for the pie chart
data = d1['Booking_Percentage']
labels = d1['Booking_Channel']

# Custom colors for the pie chart
custom_colors = ['lightblue', 'lightcoral', 'lightgreen', 'lightsalmon', 'lightseagreen']

# Creating the pie chart with explode, labels, and percentage distance customization
plt.pie(x=data, autopct="%.2f%%", explode=[0.05] * len(data), labels=labels, pctdistance=0.5, colors=custom_colors)

# Adding a title to the pie chart
plt.title("Booking Percentage by Distribution Channels", fontsize=14)

# Displaying the pie chart
plt.show()

##### 1. Why did you pick the specific chart?

I selected a pie chart to visually represent the distribution of booking channels, providing a clear and concise overview of the percentage contribution of each channel to the total bookings.

##### 2. What is/are the insight(s) found from the chart?

The pie chart shows that 'TA/TO' (Travel Agents/Tour Operators) is the dominant booking channel, accounting for 79.11% of total bookings. 'Direct' bookings are second with 14.86%, and the remaining channels together make up about 6% (100 - 79.11 - 14.86 = 6.03). This highlights the strong influence of travel agents and tour operators in driving hotel bookings.
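The percentages above come from dividing each channel's group size by the total row count. The same numbers can be obtained more directly with `value_counts(normalize=True)`, which returns each category's share of the non-missing values. This is a minimal sketch; the helper `channel_shares` and the toy channel list are hypothetical, and on the real data it would be called as `channel_shares(auto_df_cleaned['distribution_channel'])`.

```python
import pandas as pd


def channel_shares(channels: pd.Series) -> pd.Series:
    """Percentage of bookings per distribution channel, rounded to two decimals, largest first."""
    return (channels.value_counts(normalize=True) * 100).round(2)


# Hypothetical channel labels: 8 TA/TO, 1 Direct, 1 Corporate
toy_channels = pd.Series(["TA/TO"] * 8 + ["Direct"] + ["Corporate"])
shares = channel_shares(toy_channels)

# Share left over after the two largest channels, as in the insight above
remaining = round(100 - shares["TA/TO"] - shares["Direct"], 2)
print(shares)
print(remaining)
```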

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights can positively impact business strategies by emphasizing collaboration with, and optimizing services for, travel agents and tour operators, the primary contributors. However, overreliance on a single dominant channel may pose a risk during shifts in market dynamics or changes in customer behavior, potentially leading to negative growth if not diversified. A diversified approach across booking channels can help mitigate these risks and foster positive business impact over the long term.

#### Chart - 6

In [None]:
# Setting up the plot size
plt.figure(figsize=(8, 5))

# Creating a histogram with kernel density estimation for the distribution of total special requests
sns.histplot(auto_df_cleaned['total_of_special_requests'], kde=True)

# Adding a title to the plot
plt.title('Distribution of Total Special Requests')

# Labeling the x-axis
plt.xlabel('Total Special Requests')

# Labeling the y-axis
plt.ylabel('Frequency')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

I chose a histogram with kernel density estimation to visualize the distribution of total special requests as it provides a clear representation of the frequency distribution of discrete values, allowing for insights into the concentration of special requests.

##### 2. What is/are the insight(s) found from the chart?

The histogram shows that most bookings have between 0 and 2 special requests, and the frequency drops steadily as the number of requests increases. In other words, most customers make few or no special requests, and bookings with many requests are comparatively rare.
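Because the number of special requests is a small whole number, an exact frequency table is a useful companion to the histogram and makes the "0 to 2" claim directly checkable. The sketch below counts bookings per request count and converts them to percentages; the helper `request_distribution` and the toy values are hypothetical. On the real data it would be called as `request_distribution(auto_df_cleaned['total_of_special_requests'])`.

```python
import pandas as pd


def request_distribution(requests: pd.Series) -> pd.DataFrame:
    """Bookings and percentage of bookings for each number of special requests, ascending."""
    counts = requests.value_counts().sort_index()
    return pd.DataFrame({
        "bookings": counts,
        "pct": (counts / counts.sum() * 100).round(1),
    })


# Hypothetical special-request counts for ten bookings
toy_requests = pd.Series([0, 0, 0, 1, 1, 2, 0, 1, 3, 0])
dist = request_distribution(toy_requests)

# Share of bookings with 0, 1, or 2 requests (label slicing is inclusive)
share_0_to_2 = dist.loc[0:2, "pct"].sum()
print(dist)
print(share_0_to_2)
```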


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights can positively impact business operations by informing service offerings and resource allocation based on the typical distribution of total special requests. However, if the business focuses solely on accommodating higher numbers of special requests without considering the overall distribution, it may lead to operational challenges and increased costs, potentially resulting in negative growth. A balanced approach that aligns services with the most common request patterns could positively impact customer satisfaction and operational efficiency.

#### Chart - 7

In [None]:
# Setting up the plot size
plt.figure(figsize=(8, 5))

# Creating a countplot for the distribution of cancellation status
sns.countplot(x='is_canceled', data=auto_df_cleaned)

# Adding a title to the plot
plt.title('Count of Cancellation Status')

# Labeling the x-axis
plt.xlabel('Cancellation Status')

# Labeling the y-axis
plt.ylabel('Count')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot to visually represent the distribution of cancellation status, providing a clear comparison of the counts for canceled (1) and not canceled (0) bookings.

##### 2. What is/are the insight(s) found from the chart?

The count plot shows that non-canceled bookings (0) clearly outnumber canceled bookings (1): the non-canceled bar reaches roughly 60,000, well above the canceled bar. This indicates that most bookings are completed without cancellation.
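Since `is_canceled` is coded as 0/1, its mean is the cancellation rate, a single number that is easier to compare across segments than raw bar heights (Chart 11 applies the same idea per distribution channel). A minimal sketch, with the hypothetical helper `cancellation_summary` and toy flags; on the real data it would be called as `cancellation_summary(auto_df_cleaned['is_canceled'])`.

```python
import pandas as pd


def cancellation_summary(is_canceled: pd.Series) -> dict:
    """Counts of kept (0) and canceled (1) bookings plus the cancellation rate in percent."""
    counts = is_canceled.value_counts()
    return {
        "not_canceled": int(counts.get(0, 0)),
        "canceled": int(counts.get(1, 0)),
        # The mean of a 0/1 column is the fraction of 1s
        "cancellation_rate_pct": round(is_canceled.mean() * 100, 2),
    }


# Hypothetical cancellation flags for eight bookings
toy_flags = pd.Series([0, 0, 1, 0, 1, 0, 0, 0])
summary = cancellation_summary(toy_flags)
print(summary)
```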


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights can positively impact business planning by highlighting the higher proportion of successful bookings. However, an excessive focus solely on the count of non-canceled bookings may lead to complacency in managing and mitigating cancellations. Negative growth could occur if strategies are not in place to address and understand the factors contributing to cancellations, impacting revenue and potentially customer satisfaction. A holistic approach that considers both successful and canceled bookings is crucial for effective business impact and growth.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
import matplotlib.pyplot as plt

# Count the occurrences of each meal type
meal_counts = auto_df_cleaned['meal'].value_counts()

# Create a donut chart
plt.figure(figsize=(8, 8))
plt.pie(meal_counts, labels=meal_counts.index, autopct='%1.1f%%', startangle=90, wedgeprops=dict(width=0.3), colors=sns.color_palette('Set2'))
plt.title('Distribution of Meal Types')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a donut chart to visually represent the distribution of meal types, offering a clear and concise illustration of the proportion of each meal category.

##### 2. What is/are the insight(s) found from the chart?

The donut chart indicates that the majority of bookings (77.8%) have a 'Bed & Breakfast' (BB) meal type, followed by a smaller percentage (10.8%) for 'SC' (Self-Catering). This insight provides a quick overview of the prevalent meal types in the dataset.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights can positively impact business operations by informing catering and service strategies based on the prevalent meal types. However, if the business focuses solely on the most popular meal type (BB), it may neglect the potential demand for other meal options, potentially leading to negative growth if customer preferences are not adequately addressed. A balanced approach catering to a variety of meal types could positively impact customer satisfaction and overall business growth.

#### Chart - 9

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Grouping data by hotel type and calculating the average lead time for each
average_lead_time = auto_df_cleaned.groupby('hotel')['lead_time'].mean()

# Plotting a pie chart
plt.figure(figsize=(8, 8))
plt.pie(average_lead_time, labels=average_lead_time.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('pastel'))
plt.title('Average Lead Time by Hotel')
plt.show()


##### 1. Why did you pick the specific chart?


I chose a pie chart to compare the average lead times of Resort Hotel and City Hotel; each slice shows one hotel's average lead time as a share of the two averages combined.

##### 2. What is/are the insight(s) found from the chart?


The pie chart shows that Resort Hotel has a slightly higher average lead time than City Hotel, with slices of 51.8% and 48.2%. Because the slices are shares of the two averages combined, this means the Resort Hotel's average lead time is roughly 7% longer than the City Hotel's (51.8 / 48.2 ≈ 1.07).
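A pie chart of two averages shows their shares of a combined total, which is not a quantity with a direct business meaning. The sketch below computes the average lead time per hotel in days, the pie-style shares, and the ratio of the two averages, which states the comparison more directly. The helper `mean_lead_time_by_hotel` and the toy rows are hypothetical; on the real data it would be called as `mean_lead_time_by_hotel(auto_df_cleaned)`.

```python
import pandas as pd


def mean_lead_time_by_hotel(df: pd.DataFrame) -> pd.Series:
    """Average lead time in days for each hotel type."""
    return df.groupby("hotel")["lead_time"].mean()


# Hypothetical bookings
toy = pd.DataFrame({
    "hotel": ["Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
    "lead_time": [100, 120, 90, 110],
})
means = mean_lead_time_by_hotel(toy)

pie_shares = (means / means.sum() * 100).round(1)   # what the pie chart displays
ratio = means["Resort Hotel"] / means["City Hotel"]  # how much longer, on average
print(means, pie_shares, ratio, sep="\n")
```

In this toy example the pie would show 52.4% versus 47.6%, which corresponds to the Resort Hotel's average being 1.1 times the City Hotel's (110 versus 100 days).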

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.



The gained insights can help optimize operational strategies based on lead time patterns for each hotel type. However, potential negative impacts may arise if operational planning is exclusively tailored to the average lead time without considering variations and other factors influencing bookings. A more comprehensive approach, considering factors like seasonality and customer preferences, is essential to avoid potential negative growth due to inadequate operational responsiveness. Overall, utilizing the insights thoughtfully can contribute to positive business impact by aligning services with customer booking patterns.

#### Chart - 10

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns


# Create a new column 'revenue' by multiplying 'adr' with the total stay duration
auto_df_cleaned['revenue'] = auto_df_cleaned['adr'] * (auto_df_cleaned['stays_in_weekend_nights'] + auto_df_cleaned['stays_in_week_nights'])

# Group data by hotel type and calculate the total revenue for each
revenue_by_hotel = auto_df_cleaned.groupby('hotel')['revenue'].sum().reset_index()

# Plotting a bar plot
plt.figure(figsize=(8, 6))
sns.barplot(x='hotel', y='revenue', data=revenue_by_hotel, palette='viridis')
plt.title('Total Revenue by Hotel Type')
plt.xlabel('Hotel Type')
plt.ylabel('Total Revenue')
plt.show()

##### 1. Why did you pick the specific chart?


I chose a bar plot to visually compare the total revenue between City Hotel and Resort Hotel, as it provides an effective way to illustrate the revenue difference in a clear and easily interpretable format.

##### 2. What is/are the insight(s) found from the chart?


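The bar plot shows that City Hotel generates higher total revenue than Resort Hotel. Two caveats apply: the `revenue` column is estimated as `adr` multiplied by total nights for every booking, including canceled ones, and revenue is not the same as profit, so the chart alone does not show which hotel is more profitable.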
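Because the `revenue` column above also counts canceled bookings, a variant that drops them may give a closer estimate of realized lodging revenue. The sketch below assumes that a canceled booking (`is_canceled == 1`) produces no lodging revenue and ignores any deposits kept from non-refundable bookings; the helper `hotel_revenue` and the toy rows are hypothetical. On the real data it would be called as `hotel_revenue(auto_df_cleaned)`.

```python
import pandas as pd


def hotel_revenue(df: pd.DataFrame, exclude_canceled: bool = True) -> pd.Series:
    """Total estimated revenue (adr x total nights) per hotel type.

    When exclude_canceled is True, rows with is_canceled == 1 are dropped first,
    on the assumption that a canceled stay produces no lodging revenue.
    """
    data = df[df["is_canceled"] == 0] if exclude_canceled else df
    nights = data["stays_in_weekend_nights"] + data["stays_in_week_nights"]
    return (data["adr"] * nights).groupby(data["hotel"]).sum()


# Hypothetical bookings: the second City Hotel booking was canceled
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel"],
    "is_canceled": [0, 1, 0],
    "adr": [100.0, 120.0, 80.0],
    "stays_in_weekend_nights": [1, 0, 2],
    "stays_in_week_nights": [2, 3, 3],
})
print(hotel_revenue(toy, exclude_canceled=False))  # same definition as the chart
print(hotel_revenue(toy))                          # canceled bookings removed
```

In this toy example the ranking flips once cancellations are dropped (City Hotel falls from 660 to 300, below Resort Hotel's 400). Whether anything similar happens on the real data would need to be checked by running it.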

Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.


The gained insight into higher revenue for City Hotel can positively impact business strategies by informing resource allocation, marketing efforts, and service enhancements tailored to the higher-revenue hotel type. However, negative growth may follow if the business focuses solely on City Hotel without considering the specific strengths and market dynamics of Resort Hotel. A balanced approach that leverages the strengths of each hotel type could help mitigate these risks and contribute to overall positive business growth.
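Because the revenue column above also counts canceled bookings, it can help to compare "booked" revenue with "realized" revenue (non-canceled bookings only) before drawing conclusions. The sketch below uses a small hypothetical DataFrame with the notebook's column names. It does not use the project's data.

```python
import pandas as pd

# Hypothetical toy bookings using the same column names as auto_df_cleaned
bookings = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'adr': [100.0, 120.0, 90.0, 80.0],
    'stays_in_weekend_nights': [1, 0, 2, 1],
    'stays_in_week_nights': [2, 3, 3, 0],
    'is_canceled': [0, 1, 0, 0],
})

# Revenue as defined in the notebook: ADR x total nights, for every booking
bookings['revenue'] = bookings['adr'] * (
    bookings['stays_in_weekend_nights'] + bookings['stays_in_week_nights']
)

# Booked revenue counts all bookings; realized revenue keeps only non-canceled ones
booked = bookings.groupby('hotel')['revenue'].sum()
realized = bookings[bookings['is_canceled'] == 0].groupby('hotel')['revenue'].sum()

summary = pd.DataFrame({'booked_revenue': booked, 'realized_revenue': realized})
print(summary)
```

On the real data, a large gap between the two columns for one hotel type would mean that cancellations account for much of its apparent revenue lead.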

#### Chart - 11

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Grouping data by distribution channel and calculating cancellation percentage for each channel
cancellation_percentage = auto_df_cleaned.groupby('distribution_channel')['is_canceled'].mean() * 100

# Sorting channels based on cancellation percentage in descending order
sorted_channels = cancellation_percentage.sort_values(ascending=False)

# Plotting a bar chart
plt.figure(figsize=(10, 6))
sns.barplot(x=sorted_channels.index, y=sorted_channels, palette='viridis')
plt.title('Cancellation Percentage by Distribution Channel')
plt.xlabel('Distribution Channel')
plt.ylabel('Cancellation Percentage')
plt.show()

# Displaying the distribution channel with the highest cancellation percentage
highest_cancellation_channel = sorted_channels.idxmax()
print(f"The distribution channel with the highest cancellation percentage is: {highest_cancellation_channel}")


Why did you pick the specific chart?


The bar chart was chosen to clearly visualize and compare cancellation percentages across different distribution channels. Its simplicity and effectiveness make it suitable for presenting the distribution channel with the highest cancellation percentage.

What is/are the insight(s) found from the chart?  


The chart indicates that the "Undefined" distribution channel has the highest cancellation percentage, followed by TA/TO (Travel Agents/Tour Operators). A percentage computed from only a handful of bookings can be extreme, so the "Undefined" result should be read alongside the number of bookings in that channel. The TA/TO result carries more weight because it applies to a major booking channel.

Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.


The insights can help inform business strategies by addressing the challenges associated with the "Undefined" distribution channel and TA/TO. Understanding and mitigating factors contributing to higher cancellations in these channels could lead to positive business impact, such as targeted improvements in service offerings, customer communication, or relationship management.
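To tell whether a high cancellation percentage rests on many bookings or only a few, the percentage can be reported next to the booking count for each channel. This is a minimal sketch on hypothetical toy data. The channel names match the dataset, but the counts are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: two larger channels and one very small channel
bookings = pd.DataFrame({
    'distribution_channel': ['TA/TO'] * 10 + ['Direct'] * 5 + ['Undefined'] * 2,
    'is_canceled': [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] + [1, 0, 0, 0, 0] + [1, 1],
})

# Cancellation share and number of bookings per channel (named aggregation)
channel_stats = (
    bookings.groupby('distribution_channel')['is_canceled']
    .agg(cancel_pct='mean', n_bookings='size')
)
channel_stats['cancel_pct'] = channel_stats['cancel_pct'] * 100

# Highest cancellation percentage first, with its booking count alongside
ranked = channel_stats.sort_values('cancel_pct', ascending=False)
print(ranked)
```

Here the smallest channel tops the ranking with only two bookings. This is the situation the `n_bookings` column is meant to expose.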

#### Chart - 12

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns


# Count the occurrences of each country
top_countries = auto_df_cleaned['country'].value_counts().head(10)

# Setting up the plot size
plt.figure(figsize=(12, 6))

# Creating a bar chart for the top countries by number of bookings
sns.barplot(x=top_countries.index, y=top_countries, palette='viridis')

# Adding a title to the plot
plt.title('Top 10 Countries by Number of Bookings')

# Labeling the x-axis
plt.xlabel('Country')

# Labeling the y-axis (value_counts() counts bookings, not individual guests)
plt.ylabel('Number of Bookings')

# Rotating x-axis labels for better readability
plt.xticks(rotation=45, ha='right')

# Displaying the plot
plt.show()


Why did you pick the specific chart?


The bar chart was chosen because it shows clearly how bookings are distributed across countries, highlighting the top 10 countries by number of bookings. Bar charts are well suited to displaying counts of categorical data, such as the number of bookings from each country.

What is/are the insight(s) found from the chart?


The chart shows that Portugal (PRT) accounts for the most bookings, followed by the United Kingdom (GBR). It gives a quick, clear comparison of booking counts for the top 10 countries, making it easy to identify the markets that contribute most to the hotels' bookings. Because `value_counts()` counts booking rows, these figures are bookings rather than individual guests.

Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.


The gained insights can potentially have a positive business impact. Knowing the countries with the highest number of guests allows the hotel management to tailor marketing strategies, services, and communication to better meet the preferences and needs of guests from these countries. For example, targeted advertising in regions with high guest counts can lead to increased bookings.
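If the goal is to target countries by the number of guests rather than bookings, party sizes can be summed per country. The sketch below uses hypothetical toy rows with the dataset's `adults`, `children` and `babies` columns. It shows that the two rankings can differ.

```python
import pandas as pd

# Hypothetical toy bookings (one row per booking)
bookings = pd.DataFrame({
    'country': ['PRT', 'PRT', 'GBR', 'GBR', 'GBR', 'FRA'],
    'adults': [2, 1, 2, 2, 1, 4],
    'children': [0, 0, 1, 0, 0, 0],
    'babies': [0, 0, 0, 0, 0, 1],
})

# What the notebook's chart counts: booking rows per country
bookings_per_country = bookings['country'].value_counts()

# Guests per booking = adults + children + babies (NaN values are skipped by sum)
bookings['n_guests'] = bookings[['adults', 'children', 'babies']].sum(axis=1)
guests_per_country = (
    bookings.groupby('country')['n_guests'].sum().sort_values(ascending=False)
)

print(bookings_per_country)
print(guests_per_country)
```

In this toy example FRA ranks last by bookings but second by guests, because it has one large party.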

#### Chart - 13

Bivariate Analysis


How does the average daily rate (ADR) vary across months?

In [None]:
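import matplotlib.pyplot as plt
import seaborn as sns

# Calendar order for months (groupby would otherwise sort month names alphabetically)
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Setting up the plot size
plt.figure(figsize=(12, 6))

# Grouping data by arrival month, calculating mean ADR, and putting months in calendar order
adr_trends = (
    auto_df_cleaned.groupby('arrival_date_month')['adr'].mean()
    .reindex(month_order)
    .rename_axis('arrival_date_month')
    .reset_index()
)

# Creating a line chart for ADR trends over time (sort=False keeps the calendar order)
sns.lineplot(x='arrival_date_month', y='adr', data=adr_trends, marker='o', color='skyblue', sort=False)

# Adding a title and axis labels
plt.title('ADR Trends Over Time')
plt.xlabel('Month')
plt.ylabel('Average Daily Rate (ADR)')

# Rotating month names for readability and displaying the plot
plt.xticks(rotation=45)
plt.show()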

##### 1. Why did you pick the specific chart?


I chose a line chart to visualize the Average Daily Rate (ADR) trends over time as it provides a clear representation of how ADR fluctuates across different months, enabling a straightforward analysis of seasonal variations.

##### 2. What is/are the insight(s) found from the chart?

The line chart indicates that ADR rises from January through July, with the highest values between April and August. This suggests a high-demand season during these months.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights can positively impact business strategies by enabling pricing optimization and resource allocation during high-demand seasons. However, if ADR is consistently high without corresponding increases in service quality or customer value, it may lead to negative growth as customers might seek more affordable alternatives. Balancing pricing strategies with customer satisfaction and value proposition is crucial to ensure positive business impact over the long term.
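Month names are strings, so `groupby` sorts them alphabetically (April, August, December, ...). In that order, months that are not neighbors in the calendar end up next to each other on a line chart. The sketch below uses hypothetical ADR values to show the alphabetical order and how `reindex` restores calendar order.

```python
import pandas as pd

month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical toy data: a few months with made-up ADR values
bookings = pd.DataFrame({
    'arrival_date_month': ['August', 'April', 'January', 'July', 'December'],
    'adr': [150.0, 90.0, 70.0, 140.0, 80.0],
})

# groupby sorts string keys alphabetically
alphabetical = bookings.groupby('arrival_date_month')['adr'].mean()

# Reindexing by calendar order fixes the x-axis order; absent months become NaN
chronological = alphabetical.reindex(month_order)

print(list(alphabetical.index))
print(chronological.dropna())
```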

#### Chart - 14

In [None]:

# Setting up the plot size
plt.figure(figsize=(14, 8))

# Creating a grouped bar chart for market segment vs. ADR
sns.barplot(x='market_segment', y='adr', hue='hotel', data=auto_df_cleaned, palette='viridis')

# Adding a title to the plot
plt.title('Market Segment vs. ADR')

# Labeling the x-axis
plt.xlabel('Market Segment')

# Labeling the y-axis
plt.ylabel('ADR')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

The grouped bar chart was chosen to visually compare the average daily rate (ADR) across different market segments, with a further breakdown based on hotel types (City Hotel and Resort Hotel). This type of chart is effective for showing the relationship between categorical variables and a numerical variable, allowing for easy comparison and identification of patterns.

##### 2. What is/are the insight(s) found from the chart?

For the "Direct" market segment, ADR is higher for City Hotel than for Resort Hotel.
In the "Online TA" (online travel agents) market segment, ADR is similar for City Hotel and Resort Hotel.
Overall, the ADR gap between the two hotel types is larger in the "Direct" segment than in the "Online TA" segment.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Impact: The hotel management can use this information to tailor pricing strategies based on market segments. For the "Direct" segment, where City Hotel has a higher ADR, targeted marketing and promotional activities may be implemented to attract more direct bookings for City Hotel.

Negative Growth Consideration: If the ADR is significantly higher for City Hotel in a particular market segment, it might be essential to carefully manage pricing strategies to ensure competitiveness and avoid potential negative impacts on occupancy rates. Additionally, strategies to increase the ADR for Resort Hotel in specific market segments may be explored to balance revenue streams.
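A grouped bar chart makes the segment-by-hotel comparison visual, and a pivot table gives the exact mean ADR behind each bar together with the gap between the two hotels. This sketch uses hypothetical toy values.

```python
import pandas as pd

# Hypothetical toy bookings
bookings = pd.DataFrame({
    'market_segment': ['Direct', 'Direct', 'Direct', 'Online TA', 'Online TA', 'Online TA'],
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'adr': [120.0, 110.0, 95.0, 100.0, 98.0, 102.0],
})

# Mean ADR for each market segment (rows) and hotel type (columns)
adr_table = bookings.pivot_table(
    index='market_segment', columns='hotel', values='adr', aggfunc='mean'
).round(2)

# Gap between the two hotel types within each segment
adr_table['city_minus_resort'] = adr_table['City Hotel'] - adr_table['Resort Hotel']
print(adr_table)
```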

#### Chart - 15

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns


# Create a cross-tabulation for 'customer_type' and 'is_repeated_guest'
cross_tab = pd.crosstab(auto_df_cleaned['customer_type'], auto_df_cleaned['is_repeated_guest'])

# Plotting a stacked bar chart
cross_tab.plot(kind='bar', stacked=True, colormap='viridis', figsize=(10, 6))

# Adding a title to the plot
plt.title('Customer Type vs. Repeated Guest Status')

# Labeling the x-axis
plt.xlabel('Customer Type')

# Labeling the y-axis
plt.ylabel('Count')

# Displaying the plot
plt.show()


Why did you pick the specific chart?

The stacked bar chart was chosen as it effectively illustrates the relationship between 'Customer Type' and 'Repeated Guest Status.' This visualization allows for a clear comparison of counts between repeated and non-repeated guests within each customer type, offering a comprehensive view of the distribution.

What is/are the insight(s) found from the chart?

Transient Customer Type Dominance: The 'Transient' customer type is prevalent among both repeated and non-repeated guests, indicating a significant presence of this segment.

Repetition Across Customer Types: The presence of repeated guests is observed in various customer types, with both 'Transient' and 'Transient-Party' showing considerable counts for repeated guests.

Will the gained insights help create a positive business impact?

Positive Impact on Customized Services: Understanding the dominance of 'Transient' customers suggests an opportunity to tailor services for this significant segment, potentially leading to improved satisfaction and loyalty.

Enhanced Repeated Guest Strategies: Recognizing the presence of repeated guests across different customer types allows the business to implement targeted strategies, such as loyalty programs and personalized offerings, positively impacting customer retention and revenue.
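A stacked bar chart of raw counts is dominated by the largest customer type. To compare how common repeat guests are within each type, the cross-tabulation can be normalized by row. This sketch uses hypothetical counts and `pd.crosstab` with `normalize='index'`.

```python
import pandas as pd

# Hypothetical toy data (1 = repeated guest)
bookings = pd.DataFrame({
    'customer_type': ['Transient'] * 6 + ['Transient-Party'] * 4 + ['Contract'],
    'is_repeated_guest': [0, 0, 0, 0, 1, 1] + [0, 0, 0, 1] + [0],
})

# Raw counts, as plotted in the stacked bar chart
counts = pd.crosstab(bookings['customer_type'], bookings['is_repeated_guest'])

# Row-normalized shares: the fraction of each customer type that is repeated / not repeated
shares = pd.crosstab(bookings['customer_type'], bookings['is_repeated_guest'], normalize='index')
print(shares)
```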

#### Chart - 16

In [None]:
# Setting up the plot size
plt.figure(figsize=(10, 6))

# Creating a scatter plot for ADR vs. Lead Time with different colors for cancellation status
sns.scatterplot(x='lead_time', y='adr', hue='is_canceled', data=auto_df_cleaned, palette='Set1', alpha=0.7)

# Adding a title to the plot
plt.title('Scatter Plot: ADR vs. Lead Time')

# Labeling the x-axis
plt.xlabel('Lead Time')

# Labeling the y-axis
plt.ylabel('ADR')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot was chosen to visualize the relationship between the average daily rate (ADR) and lead time, with the additional dimension of cancellation status represented by different colors. Scatter plots are effective for understanding the distribution of two continuous variables and identifying patterns, clusters, or trends in the data.

##### 2. What is/are the insight(s) found from the chart?

Non-canceled bookings (is_canceled = 0) form a dense cloud of points across a wide range of lead times and ADR values.
Canceled bookings (is_canceled = 1) are more spread out and less concentrated than non-canceled bookings.
Cancellations appear to cluster within a particular range of lead times and ADR values. Because points overlap heavily in a scatter plot of this size, that range is best confirmed numerically, for example by computing the cancellation rate per lead-time bin.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Impact:

The insights can help the hotel management understand the patterns associated with cancellations. Strategies can be developed to target specific lead time ranges or ADR values where cancellations are more prevalent. For example, offering flexible cancellation policies or targeted promotions for bookings in the identified range may mitigate cancellations.

Negative Growth Consideration:

If a significant number of cancellations are concentrated in a particular lead time or ADR range, it could indicate potential issues with pricing, booking policies, or customer satisfaction. Failure to address these issues may lead to negative growth, as high cancellation rates can impact revenue and occupancy rates.
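One way to pin down the lead-time range where cancellations concentrate is to bin `lead_time` (in days) and compute the cancellation rate and booking count per bin. This is a sketch on hypothetical toy values. The bin edges are illustrative assumptions, not values taken from the analysis.

```python
import pandas as pd

# Hypothetical toy bookings
bookings = pd.DataFrame({
    'lead_time': [5, 10, 20, 40, 60, 90, 150, 200, 300, 400],
    'is_canceled': [0, 0, 0, 1, 0, 1, 1, 0, 1, 1],
})

# Illustrative lead-time bins in days: 0-30, 31-90, 91-180, 181-365, 366-800
bins = [0, 30, 90, 180, 365, 800]
bookings['lead_time_bin'] = pd.cut(bookings['lead_time'], bins=bins, include_lowest=True)

# Cancellation rate ('mean' of the 0/1 flag) and number of bookings per bin
rate_by_bin = bookings.groupby('lead_time_bin', observed=True)['is_canceled'].agg(['mean', 'size'])
print(rate_by_bin)
```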

#### Chart - 17

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Average ADR per distribution channel and hotel type, rounded to 2 decimals
d5 = (
    auto_df_cleaned.groupby(['distribution_channel', 'hotel'])['adr']
    .mean()
    .round(2)
    .reset_index()
    .rename(columns={'adr': 'avg_adr'})
)

plt.figure(figsize=(7, 5))
sns.barplot(x='distribution_channel', y='avg_adr', hue='hotel', data=d5, palette='Set2')  # Change the palette as needed
# Note: the y-axis starts at 40, which makes differences between bars look larger
plt.ylim(40, 140)
plt.title('Average Daily Rate (ADR) by Distribution Channel and Hotel')  # Change the title as needed
plt.xlabel('Distribution Channel')
plt.ylabel('Average Daily Rate (ADR)')
plt.show()

##### 1. Why did you pick the specific chart?

The grouped bar chart was selected to visually compare the average daily rate (ADR) across different distribution channels for both City Hotel and Resort Hotel. This type of chart is effective for displaying the relationship between categorical variables (distribution channel, hotel) and a numerical variable (ADR), allowing for easy comparison of ADR values across various channels and hotel types.

##### 2. What is/are the insight(s) found from the chart?

For City Hotel, the ADR is highest for the GDS (Global Distribution System) distribution channel, followed by the Direct channel.
For Resort Hotel, the ADR is highest for the "Undefined" distribution channel, followed by the Direct channel.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Impact:

The insights can guide pricing and marketing strategies for different distribution channels. For instance, since GDS has the highest ADR for City Hotel, targeted marketing efforts or promotions may be implemented to attract more bookings through this channel.

Negative Growth Consideration:

If there are inconsistencies or unexpected patterns in ADR across distribution channels, they might signal issues with pricing strategies or customer preferences. Ignoring such insights could lead to negative growth, as misaligned pricing or marketing strategies may result in reduced bookings or revenue.

#### Chart - 18

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Calculate total stay duration for each reservation
auto_df_cleaned['total_stay'] = auto_df_cleaned['stays_in_weekend_nights'] + auto_df_cleaned['stays_in_week_nights']

# Group by hotel type and reservation status, then sum the total stay duration
grouped_data = auto_df_cleaned.groupby(['hotel', 'reservation_status'])['total_stay'].sum().unstack()

# Plotting a stacked bar chart
fig, ax = plt.subplots(figsize=(10, 6))
grouped_data.plot(kind='bar', stacked=True, ax=ax, color=['skyblue', 'lightcoral', 'lightgreen'])

# Adding labels and title
plt.xlabel('Hotel Type')
plt.ylabel('Total Stay Duration')
plt.title('Total Stay Duration for City Hotel and Resort Hotel')
plt.legend(title='Reservation Status', bbox_to_anchor=(1.05, 1), loc='upper left')

# Display the plot
plt.show()








##### 1. Why did you pick the specific chart?

The stacked bar chart was chosen to visually represent the total stay duration for City Hotel and Resort Hotel, categorized by different reservation statuses. This type of chart is effective for illustrating the distribution of stay durations across various reservation statuses and hotel types.

##### 2. What is/are the insight(s) found from the chart?

For City Hotel, the total stay duration is higher than for Resort Hotel across all reservation statuses. Because these are sums over bookings, the difference reflects how many bookings each hotel has as well as how long guests stay.
The "Check-Out" reservation status contributes significantly to the total stay duration for both City Hotel and Resort Hotel.
The "Canceled" reservation status accounts for more stay nights at City Hotel than at Resort Hotel.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Impact:

The insights can guide resource allocation and planning for City Hotel, especially in managing stays associated with the "Canceled" reservation status. Strategies to minimize cancellations or optimize room allocation could positively impact revenue and resource utilization.

Negative Growth Consideration:

The higher total stay duration under the "Canceled" reservation status at City Hotel may indicate revenue loss due to cancellations. Addressing the factors behind these cancellations, for example by reviewing booking and cancellation policies or running targeted promotions, is crucial to mitigate negative growth.
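Raw sums of nights are driven by each hotel's booking volume. Expressing nights per reservation status as a share of that hotel's total nights puts the two hotels on the same scale. This sketch uses hypothetical toy rows with the notebook's column names.

```python
import pandas as pd

# Hypothetical toy bookings
bookings = pd.DataFrame({
    'hotel': ['City Hotel'] * 4 + ['Resort Hotel'] * 3,
    'reservation_status': ['Check-Out', 'Check-Out', 'Canceled', 'No-Show',
                           'Check-Out', 'Check-Out', 'Canceled'],
    'stays_in_weekend_nights': [1, 0, 2, 0, 2, 1, 0],
    'stays_in_week_nights': [2, 3, 2, 1, 5, 2, 1],
})
bookings['total_stay'] = bookings['stays_in_weekend_nights'] + bookings['stays_in_week_nights']

# Nights per hotel and reservation status (missing combinations filled with 0)
nights = bookings.groupby(['hotel', 'reservation_status'])['total_stay'].sum().unstack(fill_value=0)

# Each status as a share of the hotel's total nights (rows sum to 1)
nights_share = nights.div(nights.sum(axis=1), axis=0)
print(nights_share.round(3))
```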

#### Chart - 19

In [None]:
import matplotlib.pyplot as plt
from matplotlib import cm, colors

# Count the bookings from each country and keep the top 10
top_countries = auto_df_cleaned['country'].value_counts().head(10)

# Calculate the average ADR for each of those countries (same order as top_countries)
avg_adr_by_country = auto_df_cleaned.groupby('country')['adr'].mean().loc[top_countries.index]

# Map each country's average ADR onto the viridis color scale
norm = colors.Normalize(vmin=avg_adr_by_country.min(), vmax=avg_adr_by_country.max())
cmap = plt.get_cmap('viridis')
bar_colors = cmap(norm(avg_adr_by_country.values))

# Setting up the plot
fig, ax = plt.subplots(figsize=(12, 6))

# Bar height = number of bookings, bar color = average ADR
ax.bar(top_countries.index, top_countries.values, color=bar_colors)

# A colorbar serves as the legend for the continuous ADR scale
fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, label='Average Daily Rate (ADR)')

# Title, axis labels, and rotated country labels for readability
ax.set_title('Top 10 Countries by Number of Bookings, Colored by Average Daily Rate (ADR)')
ax.set_xlabel('Country')
ax.set_ylabel('Number of Bookings')
plt.setp(ax.get_xticklabels(), rotation=45, ha='right')

# Displaying the plot
plt.show()


Why did you pick the specific chart?


The bar chart with color-coded bars was chosen to simultaneously display both the number of guests and the average daily rate (ADR) for the top 10 countries. This type of chart provides a clear visual representation of two related metrics for each country, making it easy to compare the number of guests and ADR across different countries.

What is/are the insight(s) found from the chart?

Portugal (PRT) has the highest ADR among the top 10 countries, indicating that guests from Portugal tend to pay more on average for their stay.
Portugal also has the highest number of bookings (as in Chart 12), and the combination of a high booking count and a high ADR gives Portugal a significant impact on overall revenue.
The United Kingdom (GBR) follows Portugal with a substantial number of bookings and a relatively high ADR.

Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.


Positive Impact:

The insights can guide targeted marketing efforts and pricing strategies for countries with high ADR. For example, implementing promotional campaigns or loyalty programs for guests from Portugal could further boost revenue.

Negative Growth Consideration:

While high ADR is positive for revenue, neglecting strategies to attract guests from countries with lower ADR may lead to missed opportunities for increasing occupancy and overall revenue. Balancing strategies to cater to diverse guest segments is essential to prevent negative growth.
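A country's revenue contribution depends on booking volume, length of stay, and ADR together. A per-country revenue total (ADR x nights, as defined in Chart 10) combines all three. The sketch below uses hypothetical toy rows, and in it the highest-ADR country is not the top revenue contributor.

```python
import pandas as pd

# Hypothetical toy bookings
bookings = pd.DataFrame({
    'country': ['PRT', 'PRT', 'PRT', 'GBR', 'GBR', 'FRA'],
    'adr': [80.0, 90.0, 100.0, 120.0, 110.0, 200.0],
    'stays_in_weekend_nights': [1, 0, 1, 2, 1, 0],
    'stays_in_week_nights': [1, 2, 1, 3, 2, 1],
})
bookings['revenue'] = bookings['adr'] * (
    bookings['stays_in_weekend_nights'] + bookings['stays_in_week_nights']
)

# Bookings, average ADR, and total revenue per country (named aggregation)
by_country = (
    bookings.groupby('country')
    .agg(n_bookings=('adr', 'size'), avg_adr=('adr', 'mean'), revenue=('revenue', 'sum'))
    .sort_values('revenue', ascending=False)
)
print(by_country)
```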

Multivariate Analysis

#### Chart - 20  - Correlation Heatmap

In [None]:
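import matplotlib.pyplot as plt
import seaborn as sns

# Correlation Heatmap visualization code
# numeric_only=True restricts the matrix to numeric columns; pandas >= 2.0 raises an error on text columns otherwise
correlation_matrix = auto_df_cleaned.corr(numeric_only=True)
plt.figure(figsize=(15, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap')
plt.show()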



##### 1. Why did you pick the specific chart?

The correlation heatmap was selected to visually represent the correlation coefficients between different variables in the dataset. This type of chart is effective in identifying relationships and dependencies between variables, helping to understand how changes in one variable may be associated with changes in another.

##### 2. What is/are the insight(s) found from the chart?

The correlation heatmap reveals several relationships between variables in the dataset. The diagonal shows a perfect positive correlation, as expected, because each variable is perfectly correlated with itself.

The moderate negative correlation of -0.52 between 'arrival_date_year' and 'arrival_date_week_number' is most likely an artifact of the data collection window rather than a seasonal pattern. The source dataset covers arrivals from July 2015 to August 2017, so 2015 contributes only the later weeks of the year and 2017 only the earlier weeks. It should therefore not be read as a business trend.

Positive correlations, such as the one between 'stays_in_weekend_nights' and 'stays_in_week_nights', suggest that the hotel could design promotions or packages that encourage longer stays spanning both weekends and weekdays, potentially boosting revenue. Temporal variables such as arrival week and month remain important for adapting operations to seasonal variations and timing marketing campaigns, but that analysis is better based on monthly or weekly aggregates than on this correlation. Overall, these insights help the hotel make informed decisions and build a more resilient and adaptable business strategy.
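The collection-window explanation can be checked with a simulated calendar. The sketch assumes one arrival per day between July 2015 and August 2017 and uses a simple week-of-year approximation. Both are assumptions: real bookings are not spread evenly over days, and the dataset's own week numbering may differ. No seasonal behavior is simulated, yet year and week number still come out clearly negatively correlated.

```python
import pandas as pd

# Assumption: one arrival per day over the dataset's collection window
dates = pd.date_range('2015-07-01', '2017-08-31', freq='D')

calendar = pd.DataFrame({
    'arrival_date_year': dates.year,
    # Simple week-of-year approximation (weeks 1-53)
    'arrival_date_week_number': (dates.dayofyear - 1) // 7 + 1,
})

# Pearson correlation produced purely by the partial first and last years
r = calendar['arrival_date_year'].corr(calendar['arrival_date_week_number'])
print(round(r, 2))
```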







#### Chart - 21

In [None]:
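import matplotlib.pyplot as plt
import seaborn as sns


# Creating a new column 'revenue' by multiplying 'adr' with the total stay duration
auto_df_cleaned['revenue'] = auto_df_cleaned['adr'] * (auto_df_cleaned['stays_in_weekend_nights'] + auto_df_cleaned['stays_in_week_nights'])

# Grouping data by hotel type and month, then calculating the total revenue for each group
grouped_data = auto_df_cleaned.groupby(['hotel', 'arrival_date_month'])['revenue'].sum().reset_index()

# Mapping month names to month numbers so the x-axis follows calendar order instead of alphabetical order
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
month_number = {month: i for i, month in enumerate(month_order, start=1)}
grouped_data['month_num'] = grouped_data['arrival_date_month'].astype(str).map(month_number)

# Creating a line chart
plt.figure(figsize=(12, 6))
sns.lineplot(x='month_num', y='revenue', hue='hotel', data=grouped_data, marker='o')

# Adding labels and title
plt.xlabel('Month')
plt.ylabel('Total Revenue')
plt.title('Revenue Trend Over Months for City Hotel and Resort Hotel')

# Showing month names on the x-axis and displaying the plot
plt.xticks(range(1, 13), month_order, rotation=45)
plt.show()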


Why did you pick the specific chart?

I chose a line chart for multivariate analysis as it effectively illustrates the revenue trends over months for both Resort Hotel and City Hotel, allowing for a direct comparison of their performance.

What is/are the insight(s) found from the chart?

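The chart shows revenue rising from January to a peak between April and August for both hotels, which likely marks the peak season. These insights highlight periods of high demand and revenue potential. The monthly totals add up to the per-hotel totals in Chart 10, where City Hotel generates more total revenue, so Resort Hotel cannot lead City Hotel in every month. Any month-by-month comparison of the two lines should be checked against the legend with that constraint in mind.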

Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

The insights can positively impact business planning by aligning resources with high-demand periods. However, if not managed efficiently, increased demand during these peak periods might strain resources, potentially leading to negative growth due to operational challenges and customer dissatisfaction. Proper resource allocation and strategic planning are essential to capitalize on revenue opportunities while mitigating potential negative impacts.



#### Chart - 22

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# Reindexing the 'arrival_date_month' to have a custom order
reindex = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
auto_df_cleaned['arrival_date_month'] = pd.Categorical(auto_df_cleaned['arrival_date_month'], categories=reindex, ordered=True)

# Setting up the plot size
plt.figure(figsize=(15, 8))

# Creating a line chart for the mean ADR over the months with hue for hotel types
sns.lineplot(x='arrival_date_month', y='adr', hue='hotel', data=auto_df_cleaned, marker='o')

# Adding a title to the plot
plt.title('Average Daily Rate (ADR) Trends Over Months for City Hotel and Resort Hotel')

# Labeling the x-axis
plt.xlabel('Month')

# Labeling the y-axis
plt.ylabel('Average Daily Rate (ADR)')

# Displaying the legend
plt.legend(title='Hotel Type')

# Displaying the plot
plt.show()


Why did you pick the specific chart?


I chose a line chart to visualize the Average Daily Rate (ADR) trends over months for both City Hotel and Resort Hotel. A line chart is effective for showing trends and variations across an ordered sequence (months, in this case). Using a different color for each hotel type provides a clear visual distinction, making it easier to compare their ADR trends.

What is/are the insight(s) found from the chart?


The insights derived from the chart include the observation that Resort Hotel consistently maintains a higher ADR compared to City Hotel. Additionally, for Resort Hotel, there is a discernible increase in ADR from May to July, followed by a decline from September to November. These insights highlight the seasonal variations in pricing and suggest potential opportunities for strategic pricing adjustments.

Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.


The gained insights can positively impact business strategies. For instance, the higher ADR for Resort Hotel suggests the potential for increased revenue from guests seeking a more luxurious experience. The observed seasonal variations allow hotels to optimize pricing strategies, potentially capitalizing on high-demand periods and implementing targeted promotions during slower months. However, without additional context on the hotel's overall business strategy, it's challenging to definitively state whether these insights will lead to negative growth. Strategic decisions based on these insights should consider a holistic approach, taking into account broader business objectives and market dynamics.

#### Chart - 23 - Pair Plot

In [None]:
# Pair plot for numerical features
import seaborn as sns
import matplotlib.pyplot as plt

# Selecting specific numerical columns
selected_numerical_columns = [
    'lead_time', 'stays_in_weekend_nights','stays_in_week_nights',
    'adults', 'children', 'babies', 'previous_cancellations', 'booking_changes',
    'days_in_waiting_list', 'adr', 'required_car_parking_spaces', 'total_of_special_requests'
]

# Adding the target variable 'is_canceled' for color differentiation
selected_numerical_columns_with_target = ['is_canceled'] + selected_numerical_columns

# Creating pair plot for selected numerical columns
sns.pairplot(auto_df_cleaned[selected_numerical_columns_with_target], hue='is_canceled')
plt.suptitle('Pair Plot for Selected Numerical Columns with Target Variable', y=1.02)
plt.show()




##### 1. Why did you pick the specific chart?

I chose a pair plot for selected numerical features with the target variable 'is_canceled' to visually explore the relationships and distributions between these variables. A pair plot is useful for examining pairwise interactions, correlations, and identifying patterns in the data. The hue parameter allows us to differentiate between canceled and non-canceled bookings, providing insights into how numerical features relate to the cancellation status.

##### 2. What is/are the insight(s) found from the chart?

The pair plot reveals several insights:

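Lead Time vs. Previous Cancellations: Bookings with longer lead times tend to have fewer previous cancellations, suggesting that guests who plan well in advance have a shorter history of canceling. Note that 'previous_cancellations' describes a guest's past bookings, not whether the current booking was canceled; the 'is_canceled' hue is the better guide for the current booking.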

Total of Special Requests vs. Required Car Parking Spaces: There is no clear trend between these variables, indicating that special requests are not strongly correlated with the need for parking spaces.

ADR (Average Daily Rate) vs. Booking Changes: Higher ADR values are associated with a lower frequency of booking changes, indicating that guests with more expensive reservations might be less likely to modify their bookings.

Stays in Week Nights vs. Stays in Weekend Nights: The distribution of stays during the week and weekend nights shows various patterns, and it is not straightforward to conclude a clear relationship.

Cancellation Status and Numeric Features: The hue differentiation helps identify patterns specific to canceled and non-canceled bookings across various numeric features.
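A pair plot on the full dataset is slow to render and hard to read. A compact numeric complement is the mean of each feature split by `is_canceled`. This sketch uses hypothetical toy values and a subset of the notebook's selected columns.

```python
import pandas as pd

# Hypothetical toy bookings
bookings = pd.DataFrame({
    'is_canceled': [0, 0, 0, 1, 1],
    'lead_time': [10, 30, 50, 120, 200],
    'booking_changes': [1, 0, 2, 0, 0],
    'total_of_special_requests': [2, 1, 1, 0, 1],
})

features = ['lead_time', 'booking_changes', 'total_of_special_requests']

# Mean of each feature for non-canceled (0) and canceled (1) bookings
feature_means = bookings.groupby('is_canceled')[features].mean()
print(feature_means)
```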

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?


To achieve the business objectives successfully, I recommend adopting a strategic and balanced approach that integrates the valuable insights garnered from the various analyses and visualizations. Firstly, given the concentration of bookings with short lead times, the client should prioritize operational flexibility while maintaining efficient resource planning to accommodate last-minute demand effectively. Additionally, the dominance of City Hotel provides an opportunity for business planning, but to mitigate risks associated with shifts in market dynamics, the client should consider diversification strategies, possibly exploring avenues for enhancing Resort Hotel offerings.

Moreover, insights into peak booking months offer a chance for optimized resource allocation and tailored marketing strategies. However, operational challenges may arise during high-demand periods, making strategic planning and operational preparedness crucial. Tailoring services and marketing efforts to the predominant 'Transient' customer type is a positive strategy, but the client should aim for a balanced approach that considers diverse customer segments to ensure resilience during periods of low demand.

Furthermore, collaboration with travel agents and tour operators, being primary contributors, is essential, but a diversified approach considering various booking channels can mitigate risks during market shifts. Aligning services with the most common special request patterns positively impacts customer satisfaction and operational efficiency, but maintaining a balanced approach is essential to prevent operational challenges and increased costs.

In terms of revenue optimization, while insights into higher revenue for City Hotel are valuable, focusing solely on City Hotel without leveraging the strengths of Resort Hotel may hinder overall growth. Therefore, a balanced strategy that capitalizes on the strengths of each hotel type will be crucial for sustained positive business impact. In conclusion, the client should adopt a holistic and diversified business strategy that encompasses operational efficiency, customer satisfaction, and market adaptability to achieve the desired business objectives over the long term.

# **Conclusion**

In conclusion, the comprehensive analysis of the hotel dataset provides valuable insights for strategic decision-making and business planning. Understanding the trends and patterns in booking behavior, lead times, hotel types, and customer preferences is crucial for achieving positive business outcomes. The concentration of bookings with short lead times suggests the need for operational flexibility and responsiveness, balancing last-minute demand with efficient resource planning.

The dominance of City Hotel presents an opportunity for strategic planning, but diversification strategies should be explored to mitigate risks associated with market shifts. Leveraging insights into peak booking months, customer types, and booking channels allows for optimized resource allocation, tailored marketing strategies, and collaboration with key contributors. However, a balanced approach is essential to address challenges and maintain operational efficiency.

The client is recommended to adopt a diversified and customer-centric approach, considering the strengths of each hotel type, and incorporating flexibility in operations. Strategic collaboration, service alignment with customer preferences, and a focus on optimizing revenue for both hotel types will contribute to sustained positive business impact. The client should remain agile, responsive to market dynamics, and continuously refine strategies based on evolving trends to achieve long-term success and growth in the hospitality industry.
