# **Project Name** - Hotel Booking Analysis



##### **Project Type**    - Classification

---





##### **Contribution**    - Individual


# **Project Summary -**

The Hotel Booking Analysis project provides valuable insights into the factors influencing hotel bookings and customer behavior. By leveraging these insights, hotel managers can make informed decisions to optimize their operations, enhance customer experiences, and improve overall profitability. The findings underscore the importance of understanding booking trends, customer segmentation, and cancellation behaviors in the highly competitive hospitality industry.









# **Problem Statement**




**Key Features in the Dataset:**

*   Hotel Type: Type of hotel (e.g., City Hotel or Resort Hotel)
*   Booking Time: When the booking was made
*   Check-in/Check-out Dates: Arrival and departure dates
*   Length of Stay: Duration of stay in nights
*   Booking Channels: Source of booking (e.g., Online Travel Agent, Direct)
*   Customer Demographics: Information about the customer (e.g., country, age)
*   Booking Status: Whether the booking was canceled or confirmed
*   Special Requests: Any special requests made by the customer

**Key Questions to Address:**

*   Booking Trends:
    *   What are the peak booking periods (months or seasons)?
    *   Which types of rooms are most frequently booked?
*   Customer Segmentation:
    *   What is the demographic profile of the customers (age, country, etc.)?
    *   How do booking behaviors vary across different customer segments?
*   Cancellation Analysis:
    *   What is the cancellation rate?
    *   What factors contribute to booking cancellations?
*   Revenue Analysis:
    *   What is the average revenue per booking?
    *   How does revenue vary with different types of bookings (e.g., length of stay, room type)?
*   Channel Performance:
    *   Which booking channels are most popular?
    *   How do different channels compare in terms of booking volume and revenue?
*   Special Requests and Preferences:
    *   What are the common special requests made by customers?
    *   How do special requests impact customer satisfaction and booking decisions?

**Analytical Approach:**

*   Data Cleaning and Preprocessing:
    *   Handle missing values and outliers.
    *   Ensure data consistency and accuracy.
*   Descriptive Statistics:
    *   Summarize key attributes to understand the data distribution.
    *   Visualize trends and patterns using charts and graphs.
*   Exploratory Data Analysis (EDA):
    *   Identify correlations and relationships between variables.
    *   Segment customers based on booking behavior and demographics.
*   Predictive Modeling (optional):
    *   Build models to predict cancellations and revenue.
    *   Analyze the impact of different features on booking outcomes.
*   Recommendation Generation:
    *   Based on the analysis, provide actionable insights for improving hotel performance.
    *   Suggest strategies to reduce cancellations, optimize pricing, and enhance customer experience.

**Expected Outcomes:**

*   Comprehensive insights into booking trends and customer behavior.
*   Identification of key factors affecting bookings and cancellations.
*   Actionable recommendations to improve the hotel's booking process and revenue management.
*   Enhanced understanding of customer preferences and special requests.

By completing this analysis, the hotel can make informed decisions to optimize its operations, enhance customer satisfaction, and increase profitability.



#### **Define Your Business Objective?**

**Specific Goals:**

*   Increase Occupancy Rates:
    *   Identify patterns and trends in booking data to understand peak and off-peak periods.
    *   Analyze customer demographics to tailor marketing efforts and attract more bookings during off-peak times.
    *   Develop dynamic pricing strategies based on booking trends and competitor analysis.
*   Enhance Customer Experience:
    *   Analyze customer feedback and booking preferences to improve services and amenities.
    *   Personalize marketing and communication strategies to cater to repeat customers and attract new ones.
*   Optimize Marketing Efforts:
    *   Evaluate the effectiveness of different marketing channels and campaigns.
    *   Identify the most profitable customer segments and focus marketing efforts on them.
*   Increase Direct Bookings:
    *   Reduce dependency on third-party booking platforms by enhancing the hotel's direct booking platform.
    *   Implement loyalty programs and special offers for direct bookings.
*   Forecasting and Planning:
    *   Develop accurate forecasts for future demand to optimize staffing, inventory, and resource allocation.
    *   Use predictive analytics to anticipate trends and adjust strategies accordingly.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
# The standard-library math module is used here; `from numpy import math` is not available in recent NumPy releases
import math

import numpy as np
import pandas as pd
from numpy import loadtxt
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import rcParams

# Install and import the MySQL driver and SQLAlchemy helpers
!pip install pymysql
import pymysql
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# Note: this wildcard import adds every public scipy.stats name to the notebook namespace
from scipy.stats import *

# Statistics and modelling utilities
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn import metrics
from sklearn.metrics import roc_curve
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RepeatedStratifiedKFold
from xgboost import XGBClassifier
from xgboost import XGBRFClassifier
from sklearn.tree import export_graphviz



### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
file_path='/content/drive/MyDrive/m2_projects/Hotel Bookings.csv'

In [None]:
# Load Dataset
hmdb_df=pd.read_csv(file_path)
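
The guidelines above ask for exception handling, so the load step can be wrapped in a small helper. The following is a minimal sketch, not the notebook's actual loader: the function name `load_bookings` and its error messages are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def load_bookings(path):
    """Read the bookings CSV, failing early with a clear message.

    `load_bookings` is a hypothetical helper, not part of the original notebook.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    df = pd.read_csv(path)
    if df.empty:
        raise ValueError(f"Dataset at {path} contains no rows")
    return df


# Example call, using the path defined earlier in this notebook:
# hmdb_df = load_bookings('/content/drive/MyDrive/m2_projects/Hotel Bookings.csv')
```

Failing at load time with the offending path in the message makes a missing Drive mount easier to spot than a later `NameError` on `hmdb_df`.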

### Dataset First View

In [None]:
# Dataset First Look
hmdb_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

hmdb_df.shape

In [None]:
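# Number of non-null values in each row (count along axis=1)
non_null_per_row = hmdb_df.count(axis=1)
print(non_null_per_row)

In [None]:
# Number of non-null values in each column (count along axis=0)
non_null_per_column = hmdb_df.count(axis=0)
print(non_null_per_column)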

### Dataset Information

In [None]:
# Dataset Info
hmdb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(hmdb_df[hmdb_df.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(hmdb_df.isnull().sum().sum())

In [None]:
# Missing values per column (printed as a table)
print(hmdb_df.isnull().sum())

### What did you know about your dataset?

The dataset contains hotel booking records, and the task is to analyze it and draw out the insights behind the bookings. So far, this section has covered importing the libraries, loading the dataset, taking a first look at it, counting its rows and columns, and checking its column information, duplicate rows, and missing values.

The dataset has 119390 rows × 32 columns. It contains 129425 missing values in total and 31994 duplicate rows.
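
A per-column breakdown shows where the missing values sit, which the single total does not. The sketch below assumes a hypothetical helper named `missing_summary` and runs it on a tiny made-up frame; the demo values are illustrative and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return the missing-value count and percentage for columns with gaps."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have missing values, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Tiny illustrative frame (made-up values, not the real dataset)
demo = pd.DataFrame({
    "lead_time": [10, 45, 3, 120],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "hotel": ["City Hotel", "Resort Hotel", None, "City Hotel"],
})
print(missing_summary(demo))
```

On the notebook's data the call would be `missing_summary(hmdb_df)`.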


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hmdb_df.columns

In [None]:
# Dataset Describe
hmdb_df.describe(include='all')

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = hmdb_df.apply(lambda x: x.unique())
print(unique_values)
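
Printing every unique value can be hard to read for high-cardinality columns. As a compact alternative, the sketch below reports the number of distinct values and a short sample per column. The helper name `cardinality_report` and the demo frame are assumptions made for illustration.

```python
import pandas as pd


def cardinality_report(df, max_values=5):
    """Summarize each column's dtype, distinct-value count, and a few sample values."""
    rows = []
    for col in df.columns:
        values = df[col].dropna().unique()
        rows.append({
            "column": col,
            "dtype": str(df[col].dtype),
            "n_unique": len(values),
            "sample": list(values[:max_values]),
        })
    return pd.DataFrame(rows).set_index("column")


# Tiny illustrative frame (made-up values, not the real dataset)
demo = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", None],
    "is_canceled": [0, 1, 0, 1],
})
print(cardinality_report(demo))
```

Columns with only a handful of distinct values are natural candidates for `hue` or group-by variables in the charts that follow.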


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
hmdb_df.info()
hmdb_df.head()
hmdb_df.describe()
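
The cell above only inspects the data. One possible cleaning step for the duplicate rows and missing values found earlier is sketched below. The helper name `clean_bookings` is hypothetical, and the fill rules (0 for numeric gaps, "Unknown" for text gaps) are assumptions for illustration rather than choices made in this notebook.

```python
import numpy as np
import pandas as pd


def clean_bookings(df):
    """Drop exact duplicate rows, then fill gaps.

    Assumed fill rules: 0 for numeric columns, 'Unknown' for all other columns.
    """
    cleaned = df.drop_duplicates().copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    other_cols = cleaned.select_dtypes(exclude="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(0)
    cleaned[other_cols] = cleaned[other_cols].fillna("Unknown")
    return cleaned.reset_index(drop=True)


# Tiny illustrative frame (made-up values); the first two rows are duplicates
demo = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", None],
    "lead_time": [10, 10, 45, 3],
    "agent": [9.0, 9.0, np.nan, 240.0],
})
cleaned = clean_bookings(demo)
print(cleaned)
```

Whether 0 is a sensible stand-in depends on the column: for an identifier such as `agent` it would mean "no agent recorded", while for a measurement a median may be a better choice.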

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 histplot

In [None]:
# color is omitted because hue already assigns one color per customer_type
sns.histplot(data=hmdb_df, x="lead_time", bins=10, kde=True, hue='customer_type')

plt.title("Distribution of lead_time")
plt.xlabel("lead_time")
plt.ylabel("Frequency")

plt.show()

##### 1. Why did you pick the specific chart?

A histogram shows how lead_time (the number of days between booking and arrival) is distributed. Splitting it by customer_type with hue, and overlaying a KDE curve, makes it possible to compare the shape of that distribution across customer types in a single chart.

##### 2. What is/are the insight(s) found from the chart?

The histogram shows the lead time distribution for each customer type: which lead-time ranges occur most often, how the customer types compare, and whether a long tail of unusually early bookings is present.

##### 3. Will the gained insights help creating a positive business impact?

Knowing how far in advance each customer type books can help the hotel time its offers and plan capacity for each segment, which supports customer satisfaction and better resource allocation.

#### Chart - 2   KDE plot


In [None]:
sns.kdeplot(data=hmdb_df['lead_time'],fill =True, color = 'blue' )
plt.title('Kernel Density Estimate (KDE) of lead_time')
plt.xlabel('lead_time')
plt.ylabel('Density')

plt.show()

##### 1. Why did you pick the specific chart?

The chart used here is a KDE (Kernel Density Estimation) plot.

This type of chart gives a smooth view of the overall shape of the lead_time distribution and, unlike a histogram, does not depend on a choice of bin width, which makes it a suitable choice for this variable.

##### 2. What is/are the insight(s) found from the chart?


The KDE plot displays the distribution of lead times across the whole dataset.

Lead Time Distribution: The curve shows where lead times are concentrated and how far the tail extends.

Density of Lead Times: The y-axis shows the estimated probability density rather than a count, so a higher curve marks lead-time values that occur more often in the data.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights about the lead time distribution can help the business make data-driven decisions, improve operational workflows, enhance customer satisfaction, and optimize resource allocation, ultimately leading to a positive business impact.

In [None]:
hmdb_df.head(2)

#### Chart - 3  Box plot

In [None]:
sns.boxplot(x="lead_time", data=hmdb_df)
plt.title("lead_time Distribution")
# Horizontal box plot: the lead_time values run along the x-axis
plt.xlabel("lead_time")
plt.show()


##### 1. Why did you pick the specific chart?

The box plot was chosen because it gives a compact view of the lead_time distribution: the median, the interquartile range, and any outliers beyond the whiskers.

##### 2. What is/are the insight(s) found from the chart?

From the box plot of lead times, several insights can be gained:

Central Tendency: The line inside the box marks the median lead time, a quick indication of the typical lead time.

Variability: The length of the box (the interquartile range) shows how spread out the middle 50% of lead times is, and points beyond the whiskers flag potential outliers.

Comparison across Categories: This chart draws a single box for all bookings. Passing a categorical column such as customer_type as the second axis would draw one box per category, making it possible to see which categories tend to have longer or shorter lead times.

These observations can inform decisions about resource allocation, customer service, and operational efficiency.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the lead time distribution can help the business plan capacity and improve customer experiences. Negative impacts may arise from high variability or an abundance of outliers, since both make demand harder to anticipate. It is important to analyze these insights and act on them, for example by examining the outlying bookings separately, to support positive growth and customer satisfaction.
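
The outliers that a box plot draws beyond its whiskers can also be counted numerically. The sketch below applies the common 1.5 × IQR rule, which is also the default whisker length in seaborn and matplotlib box plots. The helper name `iqr_outliers` and the sample values are assumptions made for illustration.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return (lower fence, upper fence, number of values outside the fences)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    n_out = int(((series < lower) | (series > upper)).sum())
    return lower, upper, n_out


# Made-up lead times in days, with one extreme value
lead_time = pd.Series([0, 5, 10, 20, 30, 40, 60, 90, 500], name="lead_time")
lower, upper, n_out = iqr_outliers(lead_time)
print(lower, upper, n_out)
```

On the notebook's data the call would be `iqr_outliers(hmdb_df['lead_time'])`.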

#### Chart - 4 Bar Plot


In [None]:
iscancelled=hmdb_df[hmdb_df['is_canceled'] == 1].groupby('arrival_date_year').size().reset_index(name='count')

In [None]:
#Bar plot
x = list(iscancelled['arrival_date_year'])
y = list(iscancelled['count'])
sns.barplot(x=x, y=y)
plt.title('Year wise cancelled booking')
plt.xlabel('year')
plt.ylabel('cancelled')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot suits this context because it compares a count across a small number of categories (years), giving a clear and visually appealing view of how cancelled bookings are distributed by year.

##### 2. What is/are the insight(s) found from the chart?

Based on the provided bar plot depicting "Year-wise cancelled booking," several insights can be gleaned from the chart:

Trend Analysis: By observing the bar plot, one can identify the trend of cancelled bookings over the years. If there is a consistent increase or decrease in cancellations, it provides an understanding of booking behavior and potential shifts in customer preferences or market conditions over time.

Year-to-Year Variations: Yearly totals cannot show seasonality within a year, but a spike in cancellations in a particular year may correspond to economic factors, industry trends, or specific events that affected booking behavior.

Operational Impact: Understanding the fluctuations in cancellations year over year can provide insights into potential operational improvements. Consistently high cancellations in certain years may necessitate a review of policies, customer service practices, or market positioning to mitigate negative impacts.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

While insights gained from the year-wise cancellation plot can help businesses make strategic decisions to enhance customer satisfaction and operational efficiency, negative impact may arise from consistently high cancellations, unfavorable trends, and potential repercussions on customer retention and market reputation. It’s crucial for businesses to analyze these insights and take actions to address underlying issues, enhance customer experience, and reduce cancellations to ensure positive growth and customer loyalty.

In [None]:
hmdb_df.head()



#### Chart - 5 line plot

In [None]:
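# Chart - 5 visualization code
# sns.lineplot aggregates by default: it draws the mean lead_time per month with a confidence band
sns.lineplot(x='arrival_date_month', y='lead_time', data = hmdb_df)
plt.title("Average lead_time by arrival month")
plt.xlabel("arrival_date_month")
plt.ylabel("lead_time")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()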


##### 1. Why did you pick the specific chart?

This chart was selected to give a clear and intuitive view of how lead times vary across arrival months. A line plot is well suited to presenting such patterns and supports strategic decision-making through visual trend analysis.




##### 2. What is/are the insight(s) found from the chart?

The line plot created using sns.lineplot visualizes the lead time variation over the months. Insights from this chart may include:

Seasonal Trends: It helps in identifying seasonal patterns in lead times. For instance, if lead times are consistently higher during certain months, this insight can inform businesses to allocate resources accordingly, adjust staffing levels, and manage customer expectations more effectively.

Booking Behavior: lead_time measures how far in advance guests book, so months with a high average lead time are those that guests plan for early, while a low average points to more last-minute bookings.

Demand Fluctuations: Fluctuations in lead times can provide insights into demand fluctuations. If lead times increase during peak months, it can indicate the need for additional resources or operational adjustments to manage high demand periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

While insights gained from lead times across months can help businesses optimize resource allocation and improve operational planning, negative impacts may arise from unpredictable month-to-month variations, which make demand harder to plan for. Businesses need to use these insights to refine their operational strategies, ensure consistent service levels, and address inefficiencies to achieve positive growth and enhance customer satisfaction.
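
Month names stored as text do not sort in calendar order, which makes a month-by-month line harder to read. The sketch below computes the monthly averages and returns them in calendar order. The helper name `mean_by_month` is hypothetical, and it assumes that `arrival_date_month` holds full English month names; the demo values are made up.

```python
import pandas as pd

MONTH_ORDER = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def mean_by_month(df, value_col="lead_time", month_col="arrival_date_month"):
    """Average value_col per month, returned in calendar order."""
    monthly = df.groupby(month_col)[value_col].mean()
    # Reindex to calendar order and drop months that do not occur in the data
    return monthly.reindex(MONTH_ORDER).dropna()


# Tiny illustrative frame (made-up values, not the real dataset)
demo = pd.DataFrame({
    "arrival_date_month": ["March", "January", "March", "July"],
    "lead_time": [10, 20, 30, 40],
})
monthly = mean_by_month(demo)
print(monthly)
```

The resulting Series can be plotted directly, for example with `monthly.plot(marker='o')`, so that the x-axis runs from January to December.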




#### Chart - 6 Bar plot


In [None]:
# Chart - 6 visualization code
plt.rcParams['figure.figsize'] = (12, 7)

# The mean of the 0/1 is_canceled flag per year, times 100, is the cancellation percentage
cancel_pct = hmdb_df.groupby(['arrival_date_year'])['is_canceled'].mean() * 100
cancel_pct.sort_values(ascending = False).head(10).plot.bar(color = ['violet','indigo','b','g','y','orange','r'])
plt.title("Cancellation percentage by arrival_date_year", fontsize = 20)
plt.xlabel('arrival_date_year', fontsize = 15)
plt.ylabel('percentage', fontsize = 15)
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot with distinct bar colors, a larger figure size, and an informative title represents the cancellation percentage for each arrival year clearly, which makes it suitable for comparing cancellation rates over time.

##### 2. What is/are the insight(s) found from the chart?

The provided code generates a bar plot showing the percentage of cancellations based on arrival date year. Here are the insights found from the chart:

Trend of Cancellations: The chart provides insight into the trend of cancellations across different years. It is evident that certain years have notably higher cancellation rates compared to others.

Identification of Critical Years: By visualizing the cancellation percentages, the chart highlights specific years where the percentage of cancellations was notably high. This identification of critical years can prompt further investigation into the reasons behind these peaks in cancellations.

Potential Impact on Revenue and Operations: High cancellation percentages in certain years may have had a significant impact on revenue and operational planning. Understanding these trends can help the business make informed decisions about capacity planning, staffing, and managing fluctuations in demand.

Need for targeted improvements: The visualization underscores the need to focus on improving customer retention and satisfaction for the years with higher cancellation rates. The insights gained from this chart can guide the development of strategies aimed at reducing cancellations in subsequent years.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Insights gained from the analysis of cancellation percentages per arrival year can provide actionable data for business planning, revenue management, and customer satisfaction improvements. However, consistently high cancellation rates across specific years might signal deeper issues that could negatively impact customer satisfaction, market positioning, and financial performance. It's essential for businesses to leverage these insights to implement targeted strategies aimed at reducing cancellations, enhancing customer experience, and fostering sustainable growth.

#### Chart - 7 line plot


In [None]:
# Multiple columns Line Plot
# Select columns for the line plot
columns_to_plot = ['stays_in_weekend_nights', 'stays_in_week_nights']

# Plot the data
plt.figure(figsize=(12, 6))
for column in columns_to_plot:
    plt.plot(hmdb_df[column], label=column, linewidth=2)

# Add labels and title
plt.xlabel('Index', fontsize=12, fontweight='bold', color='gray')
plt.ylabel('night', fontsize=12, fontweight='bold', color='gray')
plt.title(' stay night', fontsize=16, fontweight='bold')

# Add legend with improved styling
plt.legend(fontsize=10, loc='upper left')

# Add grid with customized style
plt.grid(True, linestyle='--', linewidth=0.5, alpha=0.7)

# Customize tick parameters
plt.tick_params(axis='both', which='major', labelsize=10, colors='black')

# Customize spine colors
plt.gca().spines['top'].set_color('none')
plt.gca().spines['right'].set_color('none')
plt.gca().spines['bottom'].set_color('gray')
plt.gca().spines['left'].set_color('gray')

# Show plot
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A line plot was chosen to display the 'stays_in_weekend_nights' and 'stays_in_week_nights' columns together, with the aim of visualizing how stays on weekend nights and week nights vary. The line plot allows the two series to be compared, facilitates the observation of patterns, and assists in understanding the relationship between the two variables.

##### 2. What is/are the insight(s) found from the chart?

The code generates a line plot comparing the columns 'stays_in_weekend_nights' and 'stays_in_week_nights' from the dataset, plotted against the row index. Here are potential insights that could be derived from this chart:

Comparison of Stay Nights: The plot offers a clear visual comparison of the number of nights stayed during weekends versus weekdays. This insight can be valuable for understanding guests' booking patterns and how they differ between weekends and weekdays.

Trend Analysis: By observing the trends in both types of stays, the business can understand whether there are any variations in the length of stays during different parts of the week. For instance, identifying longer weekend stays could inform pricing strategies, package deals, or staffing arrangements.

Operational Planning: The plot can provide insights for operational planning, such as staffing levels, housekeeping scheduling, and resource allocation. Understanding stay patterns can help in optimizing staff schedules and service offerings.

Customer Behavior: This comparison might reveal insights into the behavioral preferences of guests, such as whether they tend to stay longer on weekends or extend their stays into the weekdays.

Demand Forecasting: Observing these trends can aid in forecasting demand for specific days of the week, supporting inventory management and revenue optimization strategies.

Overall, the chart allows for a direct comparison of the duration of stays during weekends and weekdays, offering insights that can inform operational decisions, marketing strategies, and overall business planning based on the observed trends and patterns in guest stays.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing stays across weekdays and weekends can contribute positively by optimizing resource allocation, influencing occupancy-driven marketing strategies, and refining pricing models. However, if these insights reveal underutilization, operational challenges, or deviation from industry practices, negative impacts might arise, prompting the need for strategic adjustments to mitigate potential drawbacks.

#### Chart - 8 Histograms and box plots

In [None]:
# Histogram with KDE for each numeric column; dashed lines mark the mean (magenta) and median (cyan)
for col in hmdb_df.describe().columns:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    feature = hmdb_df[col]
    sns.histplot(feature, kde=True, ax=ax)  # bins can be set here, if required
    ax.axvline(feature.mean(), color='magenta', linestyle='dashed', linewidth=2)
    ax.axvline(feature.median(), color='cyan', linestyle='dashed', linewidth=2)
    ax.set_title(col)
    plt.show()

# Box plot for each numeric column to show its spread and outliers
for col in hmdb_df.describe().columns:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    hmdb_df.boxplot(col, ax=ax)
    ax.set_title('Box plot of ' + col)
    ax.set_ylabel(col)  # label the axis with the column being plotted
    plt.show()


##### 1. Why did you pick the specific chart?

Histograms and box plots were chosen because they provide complementary insights into the distribution, central tendency, and variability of each numeric column. They are effective for identifying outliers, understanding skewness, and gaining an overall understanding of the data's statistical properties. This approach allows for a comprehensive visual exploration of the dataset's characteristics.

##### 2. What is/are the insight(s) found from the chart?

The provided Python code generates visualizations using histograms and box plots for each column in the DataFrame. By analyzing the visualizations created by the mentioned code, one can gain insights related to the data distribution for each column:

Insights from Histograms:

Central Tendency: The dashed magenta line represents the mean, while the dashed cyan line represents the median. By observing the relative positions of these lines in each histogram, one can gain insight into the skewness and central tendency of each feature.

Data Spread: The shape of the histograms and the density distribution (if kde is enabled) provide insights into the spread and variation of the data, helping to identify features with wide or narrow distributions.

Data Outliers: Histograms can reveal the presence of outliers in the data distribution, particularly if there are long tails or unusually high bars at specific ranges.

Insights from Box Plots:

Outlier Detection: Each box plot can help in identifying potential outliers, particularly those that fall outside the whiskers of the plot.

Variability: Comparing the lengths of the boxes and the positions of the medians can highlight differing variabilities and central tendencies across features.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These visualizations provide a comprehensive understanding of the data distributions, central tendencies, variability, and potential outliers. These insights are essential for making informed decisions in data analysis, identifying data preprocessing needs, and understanding relationships within the dataset.
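
The mean and median lines drawn on each histogram can be summarized in a table, together with a skewness value. The sketch below is one way to do that; the helper name `skew_report` and the demo values are assumptions made for illustration.

```python
import pandas as pd


def skew_report(df):
    """Mean, median, and sample skewness for every numeric column."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        "skew": numeric.skew(),
    })


# Tiny illustrative frame (made-up values): one right-skewed and one symmetric column
demo = pd.DataFrame({
    "lead_time": [1, 2, 3, 4, 100],
    "stays_in_week_nights": [1, 2, 3, 4, 5],
})
report = skew_report(demo)
print(report)
```

A mean well above the median, with a positive skew value, indicates a right-skewed column, which is the pattern the magenta and cyan lines reveal visually.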

#### Chart - 9 Histogram


In [None]:
# Chart - 9 visualization code
# Histogram of agent values (rows with a missing agent are excluded)
plt.hist(hmdb_df['agent'].dropna(), bins=20)
plt.xlabel('agent')
plt.ylabel('Number of bookings')
plt.title('Distribution of agent')
plt.show()


##### 1. Why did you pick the specific chart?

A histogram was chosen to show the frequency distribution of the "agent" variable. It gives a quick overview of how bookings are spread across agent values and helps identify potential patterns or concentrations.

##### 2. What is/are the insight(s) found from the chart?

The code provided creates a histogram of the "agent" data from the "hmdb_df" dataset.

From the histogram of agent distribution, potential insights could include:

Agent Contribution: Identification of which agents have the highest and lowest frequency of bookings or reservations. This insight can indicate which agents contribute significantly to the business and which ones may require further support or attention.

Distribution Spread: Understanding the distribution spread of agent bookings can highlight potential outliers or concentrations. For example, if there is a wide variation in the frequency of bookings among different agents, it could signify diverse levels of performance or popularity among agents.

Patterns or Trends: The histogram has no time dimension, so seasonal variations or shifts over time in agent bookings would need the data to be split by arrival month or year. Recognizing such patterns may lead to tailored incentive programs or targeted marketing efforts for specific agents.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights gained from the agent distribution can positively impact the business by recognizing top performers, identifying inefficiencies, and improving service quality. On the other hand, imbalanced workloads, lack of support or training, and overall inefficiency could lead to negative growth, affecting customer satisfaction and overall business performance. It's important to address any imbalances and inefficiencies to foster a positive impact on the business.
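
If `agent` is an identifier rather than a measurement, histogram bins group unrelated IDs together, and a per-agent count may be easier to interpret. The sketch below counts bookings per agent and keeps the busiest ones. The helper name `top_agents` and the demo values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def top_agents(df, n=10, agent_col="agent"):
    """Count bookings per agent ID, ignoring missing IDs, and keep the n busiest."""
    counts = df[agent_col].dropna().astype(int).value_counts()
    return counts.head(n)


# Tiny illustrative frame (made-up agent IDs, not the real dataset)
demo = pd.DataFrame({"agent": [9, 9, 240, np.nan, 9, 240, 14]})
busiest = top_agents(demo, n=2)
print(busiest)
```

The result can be drawn as a bar chart, for example with `busiest.plot.bar()`, so that each bar corresponds to exactly one agent.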

#### Chart - 10 Scatterplot


In [None]:
# Chart - 10 visualization code

# Scatterplot of stays_in_week_nights vs. agent, colored by hotel
sns.scatterplot(x='stays_in_week_nights', y='agent', data=hmdb_df, hue="hotel")
plt.show()

##### 1. Why did you pick the specific chart?

The scatterplot of 'stays_in_week_nights' versus 'agent', colored by 'hotel', was chosen to show the relationship and distribution of these variables in a single view. It makes it possible to look for relationships between weeknight stays and agent while also taking the hotel type into account, and to see how the categories interact within the dataset in one concise visualization.

##### 2. What is/are the insight(s) found from the chart?

The chart is a scatterplot of stays_in_week_nights vs. agent, with the points colored by the hotel field. Based on this scatterplot, here are potential insights that could be gleaned from the chart:

Booking Preferences: The scatterplot could potentially reveal patterns in the duration of stays (stays_in_week_nights) and choice of booking agent (agent) across different hotel types. For instance, it may show a clustering of longer stays for certain hotels, indicating varying booking behaviors.

Agency Distribution: It may uncover the distribution of different booking agencies across the stays in weeknights, shedding light on which agencies are involved in longer or shorter stays.

Hotel Occupancy Trends: By color-coding the points by hotel, you can potentially observe whether there are differences in stays based on the type of hotel, offering insight into occupancy rates and customer preferences for different hotels.
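
Because `agent` is an identifier rather than a continuous quantity, the patterns described above can be easier to confirm with a grouped summary than by eye. The following is a minimal sketch, assuming the columns `hotel`, `agent`, and `stays_in_week_nights` used in the chart; `toy_df` and its values are hypothetical stand-ins for `hmdb_df`.

```python
import pandas as pd

# Hypothetical toy data standing in for hmdb_df (illustration only)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "Resort Hotel"],
    "agent": [9, 9, 240, 240, 250],
    "stays_in_week_nights": [2, 3, 5, 7, 4],
})

# Median and maximum weeknight stay for each hotel type
stay_by_hotel = toy_df.groupby("hotel")["stays_in_week_nights"].agg(["median", "max"])

# Average weeknight stay for each (hotel, agent) pair, longest first
stay_by_agent = (
    toy_df.groupby(["hotel", "agent"])["stays_in_week_nights"]
    .mean()
    .sort_values(ascending=False)
)

print(stay_by_hotel)
print(stay_by_agent.head())
```

Replacing `toy_df` with `hmdb_df` should give the same kind of table for the real bookings, provided the column names match.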

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The chart relates weeknight stay length to booking agent for each hotel type (it does not plot price), so its insights can support agent partnerships, demand forecasting, and overall revenue planning. However, negative impacts could arise if longer stays depend on only a few agents, or if a hotel type consistently attracts shorter stays, which may affect occupancy and customer retention. It is essential for businesses to use these insights to align agent and channel strategy with demand patterns and to stay competitive, ensuring a positive impact on business growth and customer satisfaction.

#### Chart - 11 Pie Chart



In [None]:
x = list(iscancelled['arrival_date_year'])
y = list(iscancelled['count'])
plt.pie(y,labels=x,autopct='%1.1f%%')
plt.title('Year wise cancellation percentage')
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts emphasize proportions more effectively than some other chart types, making it easier to see the relative impact of cancellations across different years.

In summary, the pie chart is chosen for its ability to clearly represent proportions and percentages, making it a suitable choice for visualizing the distribution of cancellation counts across different years, ensuring an easy-to-understand comparison within this specific context.

##### 2. What is/are the insight(s) found from the chart?


From the pie chart showcasing "Year wise cancellation percentage," the following insights can be derived:

Yearly Comparison: The chart provides a comparison of cancellations across different years, illustrating the percentage of cancellations for each year. This comparison can help in identifying trends or patterns of cancellations over time.

Priority Areas: By analyzing cancellation percentages for each year, the business can identify which years experienced higher cancellation rates, hinting at areas that may need attention or improvement.

Trend Identification: If there are noticeable variations in cancellation percentages across years, this could denote evolving customer behavior, changing market conditions, or operational factors that warrant deeper investigation.
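
Note that the pie chart shows each year's share of all cancellations, which is not the same as the cancellation rate within a year (a year with more bookings will naturally hold a larger share). The sketch below contrasts the two, assuming a 0/1 `is_canceled` column alongside `arrival_date_year`; that column name and the `toy_df` values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for hmdb_df (illustration only)
toy_df = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017],
    "is_canceled":       [1,    0,    1,    1,    0,    0,    0,    1],
})

# Share of all cancellations falling in each year (what the pie chart displays)
cancel_counts = (
    toy_df.loc[toy_df["is_canceled"] == 1, "arrival_date_year"]
    .value_counts()
    .sort_index()
)
cancel_share = cancel_counts / cancel_counts.sum() * 100

# Cancellation rate within each year: cancelled bookings / all bookings that year
cancel_rate = toy_df.groupby("arrival_date_year")["is_canceled"].mean() * 100

print(cancel_share.round(1))
print(cancel_rate.round(1))
```

Comparing the two series indicates whether a year's large slice reflects a higher tendency to cancel or simply a higher booking volume.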

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the year-wise cancellation percentage pie chart can guide informed operational and strategic decisions. Negative growth can arise from consistent increases in cancellations, seasonal variations that reduce revenue, and overall customer dissatisfaction. It's crucial for businesses to act on these insights proactively, aiming to reduce cancellations and improve customer satisfaction, resulting in positive growth and sustainable business success.

#### Chart - 12 Bar plot

In [None]:
hmdb_df['required_car_parking_spaces'].unique()

In [None]:
sns.barplot(x='customer_type', y='required_car_parking_spaces', data=hmdb_df, estimator = 'mean')

# Set title and labels
plt.title('Average Required Car Parking Spaces by Customer Type')
plt.xlabel('customer_type')
plt.ylabel('required_car_parking_spaces')
plt.xticks(rotation = 90)

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?

 The bar plot was chosen for its ability to visually compare average parking space requirements between different customer types, making it a suitable choice for analyzing and presenting insights related to this specific comparison in the dataset.

##### 2. What is/are the insight(s) found from the chart?

The bar plot created using sns.barplot visualizes the average required car parking spaces based on customer type. From this chart, several insights can be derived:

Parking Space Utilization: The plot provides a clear comparison of the average parking space requirements between different customer types. It appears to show whether certain customer types typically require more parking spaces than others.

Customer Segmentation Impact: The graph reveals potential differences in parking space needs based on customer type. Understanding these variations can influence parking facility planning, allocation of resources, and even marketing strategies tailored to different customer segments.

Operational Insights: The visual comparison may uncover patterns indicating that certain customer types place a higher demand on parking facilities, potentially influencing operational decisions such as infrastructure investments, valet services, or priority parking access.

Decision Making: The insights from this chart can guide decision-making processes related to parking resource allocation and utilization, leading to improved customer experiences and operational efficiencies.
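
The bar heights can also be read off as numbers, which helps when the differences between customer types are small. This is a minimal sketch, assuming the `customer_type` and `required_car_parking_spaces` columns from the chart; `toy_df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for hmdb_df (illustration only)
toy_df = pd.DataFrame({
    "customer_type": ["Transient", "Transient", "Transient", "Contract", "Contract", "Group"],
    "required_car_parking_spaces": [0, 1, 0, 0, 0, 1],
})

# Mean parking spaces per booking and number of bookings for each customer type
parking_summary = (
    toy_df.groupby("customer_type")["required_car_parking_spaces"]
    .agg(mean_spaces="mean", bookings="count")
    .sort_values("mean_spaces", ascending=False)
)

print(parking_summary)
```

Keeping the booking count next to the mean is a deliberate choice: a high average that rests on very few bookings is a weaker basis for parking decisions.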

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the average required car parking spaces for different customer types can help businesses tailor their services and resource allocation, potentially leading to positive business impacts. However, it's important to address potential challenges related to operational efficiency, resource utilization, and competitive positioning to ensure that these insights contribute positively to business growth and customer satisfaction.

#### Chart - 13 Box Plot


In [None]:
# Box Plot
sns.boxplot(x='hotel', y='lead_time', data=hmdb_df)

# Set title and labels
plt.title('Box Plot')
plt.xlabel('hotel')
plt.ylabel('lead_time')

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?

The box plot was chosen because it shows the distribution of lead times for each hotel category in the dataset. It supports a side-by-side comparison of lead times between hotels and highlights variations and outliers that may matter for business decisions and operational improvements.

##### 2. What is/are the insight(s) found from the chart?

The box plot comparing lead times for different hotels can yield a variety of insights that can impact business decisions:

Lead Time Comparison: By comparing lead times between different hotels, businesses can gain insights into the time periods between booking and arrival for each hotel.

Identification of Outliers: The box plot can help identify potential outliers in lead times for each hotel. Outliers may represent extraordinary lead time durations, which could be further investigated to understand exceptional cases that might be impacting customer experiences.

Variability Assessment: The comparison between hotels can reveal differences in lead time variability. Understanding variability can influence capacity planning, staffing, and resource allocation strategies for different hotels.

Customer Expectations: Insights from the lead time comparison can help shape customer expectations about booking lead times, which in turn can inform marketing strategies and service level agreements for each hotel.

Operational Improvements: Variances in lead times might indicate areas for operational improvements, either in booking processes, customer service responsiveness, or demand forecasting, impacting the overall operational efficiency and customer satisfaction.
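
The quantities a box plot draws (median, quartiles, and points beyond 1.5 times the interquartile range) can be computed directly to back up the comparison and outlier observations above. The sketch below assumes the `hotel` and `lead_time` columns from the chart; `toy_df`, its values, and the helper `iqr_summary` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for hmdb_df (illustration only)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel"] * 6 + ["Resort Hotel"] * 6,
    "lead_time": [10, 20, 30, 40, 50, 400, 5, 15, 25, 35, 45, 55],
})

def iqr_summary(series):
    """Return the box-plot statistics for one group of lead times."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return pd.Series({
        "median": series.median(),
        "q1": q1,
        "q3": q3,
        "upper_limit": upper,
        # Points outside the whisker limits are drawn as outliers
        "n_outliers": int(((series < lower) | (series > upper)).sum()),
    })

# One row of statistics per hotel type
lead_time_summary = pd.DataFrame(
    {hotel: iqr_summary(group) for hotel, group in toy_df.groupby("hotel")["lead_time"]}
).T

print(lead_time_summary)
```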

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

In conclusion, the insights gained from the comparison of lead times between hotels can help identify operational best practices, enhance customer experiences, and improve overall operational effectiveness. However, negative business impacts may arise from inconsistent service levels, operational inefficiencies, and potential reputation damage associated with significant variations in lead times. It's essential for businesses to address discrepancies and strive for consistency in service delivery across all hotels to ensure positive growth and customer satisfaction.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Compute pairwise correlations between the numerical columns
corr = hmdb_df[hmdb_df.describe().columns].corr()
# Create the heatmap figure
plt.figure(figsize=(10, 8))
sns.set(font_scale=1.2)  # Increase font size for readability

# Define custom color palette with more contrast
cmap = sns.diverging_palette(220, 20, as_cmap=True)

# Draw the heatmap (set annot=True to print the coefficient in each cell)
sns.heatmap(corr, cmap=cmap, annot=False, fmt=".2f", annot_kws={"size": 12, "weight": "bold"},
            linewidths=0.5, cbar_kws={"shrink": 0.8, "label": "Correlation Coefficient"},
            square=True, linecolor='white', vmin=-1, vmax=1)

# Add title
plt.title('Correlation Heatmap', fontsize=18)

# Rotate y-axis labels for better readability
plt.yticks(rotation=0)

# Show plot
plt.tight_layout()  # Adjust layout to prevent clipping of labels
plt.show()

##### 1. Why did you pick the specific chart?


The heatmap was chosen for its ability to effectively showcase correlation patterns, its aesthetic appeal, the clarity of the custom color palette, annotation, and the overall readability provided, making it an excellent choice for visualizing the given correlation matrix.

##### 2. What is/are the insight(s) found from the chart?

The provided code generates a visually appealing heatmap to visualize the correlation among the numerical variables in the dataset. Insights from this chart and its impact on the business include the following:

Identifying Strong Relationships: The heatmap can reveal strong positive or negative correlations between variables. Identifying strong positive correlations can highlight potential opportunities for bundling products or services, cross-selling, or even identifying common factors that contribute to successful outcomes in the business.

Detecting Redundant Information: If variables exhibit high correlations, it may indicate redundant or overlapping information being captured. Identifying such instances can guide businesses to streamline data collection and analysis, leading to enhanced operational efficiency.

Understanding Dependencies: Correlation insights can help businesses understand how changes in one variable may impact others. This, in turn, aids in making informed decisions, formulating effective business strategies, and anticipating potential cause-and-effect relationships between different business parameters.

Potential Areas for Improvement: Discovering negative correlations might indicate areas where improvements in one variable could positively impact another. For instance, a negative correlation between customer wait time and customer satisfaction may prompt actions for reducing wait times to enhance overall customer experience.
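
Since the heatmap is drawn without cell annotations, the strongest relationships may be easier to identify by ranking the variable pairs by absolute correlation. The following is a minimal sketch; `toy_df`, its values, and the choice of columns (including `adr`) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for hmdb_df (illustration only)
toy_df = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10, 50, 120, 200, 300],
    "stays_in_week_nights": [1, 2, 3, 4, 5],
    "adr": [120.0, 95.0, 110.0, 80.0, 70.0],
})

# Correlation matrix over the numeric columns only
corr = toy_df.select_dtypes(include="number").corr()

# Keep the upper triangle so each pair appears once and the diagonal is dropped
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

# Rank pairs by the strength of the correlation, regardless of sign
ranked_pairs = pairs.reindex(pairs.abs().sort_values(ascending=False).index)

print(ranked_pairs)
```

Pairs near the top of `ranked_pairs` are candidates for the redundancy and dependency checks described above; correlation alone does not establish cause and effect.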

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the correlation heatmap can lead to positive business impacts by providing clearer insights into interrelationships among variables and areas for potential improvement. Conversely, identifying negative correlations or dependencies on single factors can help identify areas of risk and potential negative growth, prompting businesses to address these vulnerabilities to ensure long-term resilience and success.

#### Chart - 15 - Pair Plot



In [None]:
hmdb_df.head(2)

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Effective recommendations depend on the client's specific objective and context. The following general strategies apply to many businesses and can be tailored accordingly:

Define Clear Goals and Key Performance Indicators (KPIs): Clearly define what the business objective is and establish measurable KPIs to track progress. This ensures everyone in the organization is aligned and working towards the same goal.

Data Analysis and Insights: Utilize data analytics to gain insights into customer behavior, market trends, and operational efficiency. By understanding the data, businesses can make informed decisions and identify areas for improvement.

Customer-Centric Approach: Focus on understanding the needs and preferences of your target customers. Tailor products, services, and marketing strategies to meet their needs and enhance customer satisfaction.

Innovation and Adaptability: Embrace innovation and be willing to adapt to changing market conditions. Stay ahead of the competition by continuously improving products/services and exploring new opportunities.

Effective Marketing and Branding: Develop a strong brand identity and use targeted marketing strategies to reach your audience. Invest in channels that provide the highest return on investment (ROI) and continuously refine your marketing approach based on analytics and feedback.

Employee Engagement and Development: Ensure employees are engaged, motivated, and equipped with the necessary skills to contribute to the business objectives. Invest in training and development programs to enhance employee performance and job satisfaction.

Partnerships and Collaborations: Explore opportunities for strategic partnerships and collaborations that can help expand your reach, access new markets, or enhance your product/service offerings.

Financial Management: Maintain a sound financial management strategy to ensure the efficient use of resources and sustainable growth. Monitor key financial metrics closely and adjust strategies as needed to optimize profitability and cash flow.

Customer Experience Excellence: Provide exceptional customer service and prioritize delivering a positive customer experience at every touchpoint. Happy customers are more likely to become repeat customers and advocates for your brand.

Continuous Improvement: Foster a culture of continuous improvement within the organization. Encourage feedback from employees, customers, and stakeholders, and use it to drive innovation and refine strategies over time.

By implementing a combination of these strategies tailored to the specific business objective, organizations can increase their chances of success and achieve sustainable growth.

# **Conclusion**

You learned about the importance of exploratory data analysis (EDA) and its role in data analysis.

📊 You learned how to explore and analyze a real-world dataset using Python programming language and its data analysis libraries such as Pandas, NumPy, and Matplotlib.

🧹 You learned various EDA techniques including data cleaning, data preprocessing, feature engineering, data visualization, and statistical analysis.

💡 You learned how to interpret and communicate insights and inferences derived from the data analysis, and use them to draw meaningful conclusions.

🤔 Through this lesson, you developed critical thinking and problem-solving skills required for data analysis in various domains such as business, finance, healthcare, and social sciences.

👨‍💻 You also had a hands-on experience with the EDA process and gained confidence in working with real-world datasets.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***