<a href="https://colab.research.google.com/github/shaikhrnafees/Data-Analysis-for-Hotel-Bookings/blob/main/Shaikh_EDA_on_Hotel_Booking_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -




Project Type - **EDA**

Contribution - **Individual**

Name - **Nafees Shaikh**

# **Project Summary -**

This project aims to perform Exploratory Data Analysis (EDA) on a dataset containing hotel booking information to gain insights and understand patterns in customer behaviour, booking trends, and other relevant factors that impact the hospitality industry. By conducting a comprehensive EDA, we intend to extract meaningful information that can guide decision-making, improve customer experience, and optimise hotel management strategies.


---


**Work Flow:**

Identification of factors contributing to booking cancellations, such as lead time and market segment.

Insights into the effectiveness of different booking distribution channels and targeting segments.

Understanding guest demographics and preferences, which could be used to offer personalized services.

Analysis of average daily rates and their impact on customer satisfaction and revenue.

Opportunities to increase revenue through additional services such as car parking or special requests.

Identification of trends in customer behavior and preferences for targeted marketing efforts.

Optimization of operational efficiency by understanding lead times, room type preferences, and booking patterns.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**




*   Clean the data by handling missing values, data type inconsistencies, and any other data quality issues.

*   Visualize the cleaned data to gain insights into the booking patterns, cancellation rates, customer demographics, and other relevant factors.

*   Generate visualizations (e.g., bar plots, histograms, scatter plots) to showcase the relationships between different variables.


*   Identify any trends or patterns in the data that could be useful for making business decisions.







#### **Define Your Business Objective?**



*   Reducing booking cancellations by identifying factors that contribute to cancellations and taking steps to address them.

*   Improving booking distribution strategies by analyzing the effectiveness of different distribution channels and targeting segments.

*   Enhancing customer experience by offering personalized services based on guest demographics and preferences.

*   Optimizing pricing strategies by analyzing average daily rates and the impact on customer satisfaction and revenue.

*   Identifying opportunities to increase revenue through additional services such as car parking or special requests.


*   Identifying trends in customer behavior and preferences to tailor marketing and promotional efforts effectively.


*   Improving operational efficiency by understanding lead times, room type preferences, and booking patterns.










# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
# Load Dataset
hb=pd.read_csv("/content/Hotel Bookings.csv")  # reading the dataset file
hb.index=hb.index+1 # shift the default 0-based index so that rows are numbered from 1
hb
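The guidelines above ask for exception handling, so the loading step could be wrapped as in the following minimal sketch. `load_bookings` is a hypothetical helper name, and the demonstration uses a tiny temporary file rather than the real dataset.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def load_bookings(csv_path):
    """Read a bookings CSV and return a DataFrame with a 1-based index."""
    path = Path(csv_path)
    if not path.is_file():
        # Fail early with a readable message instead of a long traceback
        raise FileNotFoundError(f"Dataset not found at: {path}")
    df = pd.read_csv(path)
    df.index = df.index + 1  # shift the default 0-based index to start at 1
    return df


# Demonstration on a temporary two-row file (made-up data)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    pd.DataFrame({"hotel": ["City Hotel", "Resort Hotel"]}).to_csv(demo_path, index=False)
    demo = load_bookings(demo_path)

print(demo)
```

In the notebook itself the call would be `hb = load_bookings("/content/Hotel Bookings.csv")`, assuming the file is uploaded to that Colab path.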

### Dataset First View

In [None]:
# Dataset First Look

hb.head() # showing first five rows of DataFrame

In [None]:
hb.tail() # showing last 5 rows of DataFrame

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

rows_columns=hb.shape # (number of rows, number of columns)
rows_columns

In [None]:
hb.index # shows the start, stop and step of the row index

In [None]:
hb.columns # lists every column name in the DataFrame

### Dataset Information

In [None]:
# Dataset Info
hb.info() # prints dtypes and non-null counts; info() returns None, so there is nothing useful to assign

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

d_v_count= hb.duplicated().value_counts() # True = rows that repeat an earlier row, False = first occurrences
d_v_count

In [None]:
duplicate_counts=hb["hotel"].duplicated().value_counts() # duplicated() can also be applied to a single column
# For several columns at once: hb.duplicated(subset=['A', 'B']).value_counts()
duplicate_counts

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_counts=hb.isna().sum() # by this counting the missing values in every column
missing_counts

In [None]:
# Visualizing the missing values
m_values = missing_counts[missing_counts > 0] # keep only the columns that have missing values
per_miss=(m_values/len(hb))*100 # percentage of rows missing in each column
plt.bar(m_values.index,m_values.values) # column names on the x-axis, missing counts as bar heights
plt.xlabel("Columns")
plt.ylabel("Number of Missing Values")
plt.title("Number of Missing Values in Columns")
plt.grid(axis='y', linestyle='--', alpha=0.7)
# Display the percentage of missing values on top of each bar
for index, value in enumerate(m_values): # loop over the bars by position
    plt.text(index, value, f'{per_miss.iloc[index]:.2f}%', ha='center', va='bottom') # .iloc because per_miss is labelled by column name
plt.show()

### What did you know about your dataset?

The dataset contains 119390 rows and 32 columns. There are several duplicated rows, and some columns contain missing values, so each of these issues needs to be treated separately.

After cleaning the data and counting all duplicated entries, it was found that there were 29 duplicates in the dataset. This suggests that there were multiple instances of the same booking made by the same guest, which could be due to human error or system issues.

To visualize the missing values, a bar plot was created using the matplotlib library. The bar plot revealed that the 'company' column has by far the most missing values (more than 90% of its rows), followed by the 'agent' column. The 'country' and 'children' columns also have some missing values, but to a much lesser extent.

The presence of duplicates and missing values in the dataset indicates that the data may not be entirely reliable and could potentially impact the accuracy of any analysis or predictions made using this data. Therefore, it is important to carefully consider these factors when interpreting the results of any analysis or modeling performed on this dataset.
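The same information shown in the bar plot can be checked as a table. The following is a minimal sketch on a made-up four-row frame; `missing_summary` and the `toy` data are illustrative names and values, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return the count and percentage of missing values for columns that have any."""
    counts = df.isna().sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })


# Made-up data that only mimics the shape of the problem
toy = pd.DataFrame({
    "company": [np.nan, np.nan, np.nan, 40.0],
    "agent": [9.0, np.nan, 14.0, 14.0],
    "hotel": ["City Hotel"] * 4,
})

summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(hb)` on the real DataFrame would give the exact percentages behind each bar.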

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
hb.columns # lists all column names in the DataFrame

In [None]:
# Dataset Describe
hb.describe() # summary statistics (count, mean, std, min, quartiles, max) for the numerical columns

### Variables Description

1. hotel: Name of hotel (City hotel or Resort hotel)

2. is_canceled: If the booking was cancelled (1) or not (0)

3. lead_time: Number of days between the booking date and the arrival date

4. arrival_date_year: Year of hotel arrival date

5. arrival_date_month: Month of hotel arrival date

6. arrival_date_week_number: Week number of hotel arrival date

7. arrival_date_day_of_month: Day of hotel arrival date

8. stays_in_weekend_nights: Number of weekend nights (Saturday & Sunday) the guest stayed or booked to stay at the hotel

9. stays_in_week_nights: Number of weeknights (Mon to Fri) the guest stayed or booked to stay at the hotel

10. adults: Number of adults among guests

11. children: Number of children among guests

12. babies: Number of babies among guests

13. meal: Available options of meal for guests

14. country: country code

15. market_segment: Market segment that the booking belongs to

16. distribution_channel: Name of booking distribution channel. The term 'TA' means 'Travel Agents' and 'TO' means 'Tour Operators'.

17. is_repeated_guest: If the booking was made by a repeated guest (1) or not (0)

18. previous_cancellations: Number of previous bookings that were cancelled by the customer prior to the current booking

19. previous_bookings_not_canceled: Number of previous bookings that were not cancelled by the customer prior to the current booking

20. reserved_room_type: Code of reserved room type

21. assigned_room_type: Code of assigned room type

22. booking_changes: Number of changes made to the booking

23. deposit_type: Type of deposit made by guest

24. agent: ID of the travel agency that made the booking

25. company: ID of the company that made the booking or is responsible for paying for it

26. days_in_waiting_list: The number of days booking was in the waiting list.

27. customer_type: Type of customer, assuming one of four categories

28. adr: Average daily rate

29. required_car_parking_spaces: Number of car parking spaces required by the customer

30. total_of_special_requests: Number of special requests made by the customer

31. reservation_status: Status of reservation

32. reservation_status_date: The date at which the last reservation status was updated

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

unique_values = hb.apply(pd.Series.unique) # array of unique values for each column
unique_values
#unique_values["column_name"] ---- use case for inspecting a specific column

In [None]:
unique_values_counts = hb.apply(pd.Series.nunique) # number of unique values in each column
unique_values_counts

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Dropping the 'company' column since more than 90% of its values are missing
hb = hb.drop(columns='company') # columns= already selects the column axis, so axis=1 is not needed

In [None]:
# Filling the Missing value

hb[['agent','children']]= hb[['agent','children']].fillna(0)  # missing 'agent' and 'children' values are filled with 0
hb[['country']] = hb[['country']].fillna('other') # missing 'country' values are filled with the label 'other'

# 0 stands for "no agent" / "no children" and 'other' keeps unknown countries in their own category, so no rows are lost

In [None]:
hb.isnull().sum() # every column should now report zero missing values

In [None]:
hb.shape # the column count is one lower because the 'company' column was dropped

In [None]:
# Checking rows having duplicate values
hb.duplicated().sum()

In [None]:
# Dropping duplicated rows
hb= hb.drop_duplicates(keep='first') # dropping the duplicate rows but keeps the first one

In [None]:
hb.duplicated().sum()

In [None]:
# Creating a new column 'total_stay' by adding 'stays_in_weekend_nights' and 'stays_in_week_nights'
hb['total_stay'] = hb['stays_in_weekend_nights']+hb['stays_in_week_nights']


# Adding the total number of guests as a column: adults + children + babies
hb['total_guests'] = hb['adults'] + hb['children']+hb['babies']


# Now dropping the individual columns
hb.drop(columns=['stays_in_weekend_nights', 'stays_in_week_nights','adults','children','babies'], inplace=True)


In [None]:
# Guest counts must be whole numbers, so cast the float column to int
hb['total_guests']=hb['total_guests'].astype(int)
hb.head(20)

In [None]:
# Changing datatype of 'reservation_status_date' from object to datetime
hb['reservation_status_date'] = pd.to_datetime(hb['reservation_status_date'])
# Keep the datetime dtype: formatting with .dt.strftime() would turn the values back into strings
hb['reservation_status_date']

In [None]:
# Mapping the 'arrival_date_month' columns with its numerical equivalent
month_map = {'January': 1, 'February': 2, 'March': 3, 'April': 4, 'May': 5, 'June': 6,
    'July': 7, 'August': 8, 'September': 9, 'October': 10, 'November': 11, 'December': 12}

hb['arrival_date_month'] = hb['arrival_date_month'].map(month_map) # replace month names with month numbers
# Rename to 'year', 'month' and 'day', the column names pd.to_datetime expects when assembling dates
hb.rename(columns={'arrival_date_day_of_month': 'day',
                   'arrival_date_month': 'month',
                   'arrival_date_year': 'year'}, inplace=True)

# Combine year, month, and day columns into a single datetime column
hb['date'] = pd.to_datetime(hb[['year', 'month', 'day']], errors='coerce') # errors='coerce' turns invalid dates into NaT instead of raising

# Dropping the year, month and day columns
hb.drop(columns=['year', 'month', 'day'], inplace=True) # no longer needed now that 'date' holds the full arrival date

In [None]:
hb.head()

### What all manipulations have you done and insights you found?

**Each step of the dataframe manipulation process**:

---




***Dropping the 'company' column:*** The 'company' column has a very high percentage of missing values (more than 90%). Since imputing or filling these missing values would not provide meaningful insights, it's better to drop the column altogether to maintain data integrity.





***Filling null values in 'children' and 'agent' columns with zero:*** Null values in 'children' and 'agent' columns may indicate no children or no agent involved. Filling these null values with zero allows for better analysis and ensures that the dataset remains consistent.

***Filling null values in 'country' column with 'other'***: Null values in the 'country' column may have various causes, such as missing data or unidentified country codes. By filling these null values with 'other', we keep these bookings in a separate category and avoid losing valuable information.

***Dropping duplicated rows while keeping the first occurrence:*** Duplicate rows can skew analysis results and lead to incorrect conclusions. By dropping duplicated rows and keeping the first occurrence, we ensure that the dataset remains accurate and representative.

***Creating a new column 'total_stay':*** The 'stays_in_weekend_nights' and 'stays_in_week_nights' columns provide information about the length of a guest's stay. By adding these two columns into a new 'total_stay' column, we create a single metric that represents the total length of a guest's stay, which can be more informative for analysis. In the same way, 'adults', 'children', and 'babies' are summed into a 'total_guests' column, and the original columns are dropped.

***Mapping the 'arrival_date_month' column with its numerical equivalent:*** While 'arrival_date_month' provides the month names, mapping them to their numerical equivalents (e.g., January as 1, February as 2) makes it easier to perform numerical analysis and comparisons.

***Merging date, month, and year columns into a single column:*** The three separate columns containing date, month, and year information are merged into a single column called 'date'. This consolidation simplifies the dataset and makes it easier to work with for analysis.

***Changing the datatype of 'reservation_status_date':*** Parsing 'reservation_status_date' from object to datetime, like the arrival 'date' column built above, allows for easier manipulation, sorting, and filtering of date-related data.


---



In summary, these dataframe manipulation steps aim to improve the dataset's integrity, fill in missing values, create new features for analysis, and simplify date handling. By performing these steps, the dataset becomes more suitable for accurate and meaningful analysis and modeling.
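The steps above can be sketched end to end on a tiny made-up frame. The function name `prepare` and the three toy rows are assumptions for illustration; the column names follow the dataset, and the third row deliberately uses an impossible date (30 February) to show what `errors='coerce'` does.

```python
import numpy as np
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_MAP = {name: number for number, name in enumerate(MONTHS, start=1)}


def prepare(df):
    """Apply the wrangling steps described above to a bookings-like frame."""
    out = df.drop(columns="company")                      # mostly empty column
    out[["agent", "children"]] = out[["agent", "children"]].fillna(0)
    out["country"] = out["country"].fillna("other")
    out = out.drop_duplicates(keep="first")               # keep first occurrence
    out["total_stay"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    out["total_guests"] = (out["adults"] + out["children"] + out["babies"]).astype(int)
    parts = pd.DataFrame({
        "year": out["arrival_date_year"],
        "month": out["arrival_date_month"].map(MONTH_MAP),
        "day": out["arrival_date_day_of_month"],
    })
    out["date"] = pd.to_datetime(parts, errors="coerce")  # invalid dates become NaT
    return out


# Made-up rows: the first two are identical, the third has an invalid date
toy = pd.DataFrame({
    "company": [np.nan, np.nan, 40.0],
    "agent": [np.nan, np.nan, 9.0],
    "children": [np.nan, np.nan, 1.0],
    "country": [np.nan, np.nan, "PRT"],
    "stays_in_weekend_nights": [1, 1, 0],
    "stays_in_week_nights": [2, 2, 3],
    "adults": [2, 2, 1],
    "babies": [0, 0, 0],
    "arrival_date_year": [2016, 2016, 2017],
    "arrival_date_month": ["July", "July", "February"],
    "arrival_date_day_of_month": [5, 5, 30],
})

clean = prepare(toy)
print(clean[["country", "total_stay", "total_guests", "date"]])
```

Keeping the steps in one function makes it easier to rerun the whole cleaning on a fresh copy of the data, which helps when the notebook has to execute in one go.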

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code (Hotel Type Distribution)

hb_counts=hb["hotel"].value_counts() # number of bookings for each hotel type
plt.figure(figsize=(15, 8)) # figsize sets the size of the figure in inches
plt.pie(hb_counts, labels=hb_counts.index, autopct='%1.2f%%', startangle=45, explode=(0, 0.05), colors=['skyblue', 'lightgreen'],textprops={'fontsize': 13},shadow=True)
plt.title('Hotel Type Distribution')
plt.axis('equal')  # Equal aspect ratio ensures the pie chart is circular.
plt.legend(title='Hotel Type', loc='upper right') # legend entries come from the wedge labels, so the order always matches
plt.show()

##### 1. Why did you pick the specific chart?

The specific chart chosen is a pie chart, which is useful for visualizing the distribution of categorical data. In this case, the categorical data is the type of hotel—City Hotel or Resort Hotel.

##### 2. What is/are the insight(s) found from the chart?

The insight from the chart is that there is a significant difference in the distribution of hotel types. The City Hotel accounts for 61.14% of the total, while the Resort Hotel accounts for 38.86%. This indicates that the City Hotel is more popular or more prevalent in the dataset compared to the Resort Hotel.
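The percentages printed on the pie can also be verified numerically. This is a small sketch on made-up values; on the real data the same call would be applied to `hb["hotel"]`.

```python
import pandas as pd

# Made-up sample: three City Hotel bookings and two Resort Hotel bookings
hotels = pd.Series(["City Hotel"] * 3 + ["Resort Hotel"] * 2, name="hotel")

# normalize=True returns fractions instead of counts
shares = hotels.value_counts(normalize=True).mul(100).round(2)
print(shares)
```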


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights could potentially help create a positive business impact. For example, if the company owns both City and Resort Hotels, they might want to focus more on the City Hotel segment since it appears to be more popular. Alternatively, if they want to increase the presence of the Resort Hotel, they might need to invest in marketing or other strategies to attract more customers.

---



However, there could be negative growth or implications from this insight. For instance, if the company was primarily focused on the Resort Hotel segment and was not aware of the popularity of the City Hotel, they might have been missing out on potential revenue from the City Hotel segment. This could lead to a negative impact on overall business growth if they do not adjust their strategies accordingly.

#### Chart - 2

In [None]:
# Chart - 2 visualization code

In [None]:
# Assigned Booking Preference by Room Types

room_type_counts = hb['assigned_room_type'].value_counts()

# Plot the bar chart
plt.figure(figsize=(14, 8))

colors = ['skyblue', 'lightgreen','black', 'red', 'lightcoral', 'gold']
bars=plt.bar(room_type_counts.index, room_type_counts.values,color=colors)
plt.xlabel('Room Type',fontsize=12)
plt.ylabel('Number of Bookings',fontsize=15)
plt.title('Assigned Booking Preference by Room Types',fontsize=12)
plt.xticks(rotation=0, ha='right',fontsize=12)
plt.tick_params(axis='y', labelsize=10)

# Add the value on top of each bar
for bar in bars:
    height = bar.get_height()
    plt.annotate(f'{height}', xy=(bar.get_x() + bar.get_width() / 2, height), xytext=(0, 3),
                 textcoords="offset points", ha='center', va='bottom', fontsize=10, color='black')
plt.show()

In [None]:
# Reserved Booking Preference by Room Types

reserved_room_type_counts = hb['reserved_room_type'].value_counts()

# Plot the bar chart
plt.figure(figsize=(14, 8))

colors = ['lightgreen','yellow','pink','purple','blue','orange' ]
bars=plt.bar(reserved_room_type_counts.index, reserved_room_type_counts.values,color=colors)
plt.xlabel('Room Type',fontsize=12)
plt.ylabel('Number of Bookings',fontsize=15)
plt.title('Reserved Booking Preference by Room Types',fontsize=12)
plt.xticks(rotation=0, ha='right',fontsize=12)
plt.tick_params(axis='y', labelsize=10)

# Add the value on top of each bar
for bar in bars:
    height = bar.get_height() # This line gets the height of the current bar
    plt.annotate(f'{height}', xy=(bar.get_x() + bar.get_width() / 2, height), xytext=(0, 3),
                 textcoords="offset points", ha='center', va='bottom', fontsize=10, color='black') # annotate for lableling text on bar
plt.show()

##### 1. Why did you pick the specific chart?

The specific chart chosen is a bar chart, which is suitable for comparing the number of bookings for different categories (in this case, room types) and identifying trends or patterns within the data.

##### 2. What is/are the insight(s) found from the chart?

**Assigned Room Type Preference**: The bar chart for assigned room types shows that the most popular room type is 'A', followed by 'D' and 'E'. At the other end of the chart, room types such as 'P' and 'L' have almost no bookings. This suggests that certain room types are more preferred than others, possibly due to factors like room size, amenities, or location within the hotel.


---


**Reserved Room Type Preference**: The bar chart for reserved room types also shows that the most popular room type is 'A', followed by 'D' and 'E'. As with the assigned room types, room types such as 'P' and 'L' have almost no bookings. This indicates that the preference for room types is broadly consistent between reserved and assigned rooms.
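Comparing two separate bar charts only shows that the overall distributions look alike. A cross-tabulation checks booking by booking whether guests received the room type they reserved. The sketch below uses six made-up bookings; the variable names are illustrative.

```python
import pandas as pd

# Made-up bookings: one guest reserved 'A' but was assigned 'D'
toy = pd.DataFrame({
    "reserved_room_type": ["A", "A", "A", "D", "D", "E"],
    "assigned_room_type": ["A", "A", "D", "D", "D", "E"],
})

# Rows = reserved type, columns = assigned type; the diagonal holds matching bookings
match_table = pd.crosstab(toy["reserved_room_type"], toy["assigned_room_type"])

# Share of bookings where the assigned room differs from the reserved one
changed_share = (toy["reserved_room_type"] != toy["assigned_room_type"]).mean()

print(match_table)
print(f"Share of bookings with a different room type: {changed_share:.1%}")
```

Large off-diagonal counts in such a table would point to frequent upgrades or reassignments, which the two bar charts alone cannot reveal.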

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights could help create a positive business impact by informing hotel management about the popularity of certain room types. For example, if 'A' rooms are consistently in high demand, the hotel could allocate more resources to maintain these rooms or offer special promotions to attract guests to other room types that are less frequently booked.


---


However, there are no insights from the charts that directly indicate negative growth. Negative growth could occur if certain room types are consistently unpopular, leading to decreased occupancy rates and potential revenue loss. However, the charts do not show this trend as all room types have some bookings, albeit in varying frequencies.

#### Chart - 3

In [None]:
#Chart A (Booking Customer Type)

counts_df = hb.groupby(['customer_type', 'hotel']).size().reset_index(name='count') # count bookings for each customer type and hotel combination

# Pivot the data to create a table-like structure
pivot_table = counts_df.pivot(index='customer_type', columns='hotel', values='count').fillna(0) #used to reshape a DataFrame from long to wide format.
pivot_table['Total'] = pivot_table.sum(axis=1)
pivot_table
# Plot the donut chart
fig, ax = plt.subplots(figsize=(15, 8))
ax.axis('equal')

# Inner circle (donut hole)
ax.pie(pivot_table['Total'], labels=pivot_table.index, autopct='%1.2f%%', startangle=180, wedgeprops=dict(width=0.3, edgecolor='w'))
ax.legend()


# Set aspect ratio to be equal to ensure the donut chart is circular
ax.set_aspect('equal')

# Add a title
plt.title('Bookings by Customer Type',fontsize=20,fontweight='bold',loc='center')
plt.show()

In [None]:
#Chart B (Bookings by Customer Type in City Hotel vs Resort Hotel)

fig, ax = plt.subplots(1,2) # one row and two columns of plots, returned as the array ax

# Inner circle (donut hole)
# ax[0]- left plot/ ax[1]-right plot

ax[0].pie(pivot_table['City Hotel'],textprops={'fontsize': 7}, labels=pivot_table.index, autopct='%1.2f%%', startangle=180, wedgeprops=dict(width=0.4, edgecolor='w'))
ax[0].set_title('City Hotel', x=0.5, y=0.5,fontsize=8,fontweight='bold')
ax[0].legend(loc='center', bbox_to_anchor=(1.0, 0.0),fontsize='small')
ax[1].pie(pivot_table['Resort Hotel'],textprops={'fontsize': 7}, labels=pivot_table.index, autopct='%1.2f%%', startangle=180, wedgeprops=dict(width=0.4, edgecolor='y'))
ax[1].set_title('Resort Hotel', x=0.5, y=0.5,fontsize=8,fontweight='bold')
# Set aspect ratio to be equal to ensure the donut chart is circular
ax[0].set_aspect('equal')
ax[1].set_aspect('equal')
# Add a title for the whole figure (plt.title would only label the last subplot)
fig.suptitle('Bookings by Customer Type in City Hotel vs Resort Hotel',fontweight='bold')
plt.show()

##### 1. Why did you pick the specific chart?

The specific chart chosen is a donut chart. Donut charts are effective for showing the distribution of parts to a whole, often used to visualize categorical data. In this case, the chart helps to understand the distribution of customer types across different hotel types, providing a clear and concise overview of booking patterns.

##### 2. What is/are the insight(s) found from the chart?

**Transient Customers Dominance:** Across all hotels, transient customers constitute a significant portion of bookings, while contract customers make up a smaller percentage.


---


**City Hotels vs. Resort Hotels:** Both hotel types have a higher percentage of transient customers. However, City Hotels have a relatively higher percentage compared to Resort Hotels.
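The comparison between hotel types can be made precise by computing the share of each customer type within each hotel. The following sketch uses made-up bookings, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up bookings, four per hotel
toy = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "customer_type": ["Transient", "Transient", "Transient", "Contract",
                      "Transient", "Transient", "Transient-Party", "Contract"],
})

# normalize="columns" makes every hotel column sum to 1 before scaling to percent
share = (
    pd.crosstab(toy["customer_type"], toy["hotel"], normalize="columns")
    .mul(100)
    .round(1)
)
print(share)
```

Applied to `hb`, the same call gives the percentages drawn on the two donut charts side by side in one table.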

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

The gained insights can potentially lead to a positive business impact:

**Tailored Marketing Strategies:** Understanding the dominance of transient customers can help hotels tailor their marketing efforts to target this segment more effectively, potentially leading to increased bookings.

**Service Offerings:** Hotels can adjust their service offerings to cater better to the needs and preferences of transient customers, such as flexible booking options or short-term promotions.


---



Insights Leading to Negative Growth:

**Heavy Reliance on Transient Customers:** If a hotel business heavily relies on transient customers and this segment experiences a decline in demand (due to economic factors or shifts in travel preferences), it could lead to negative growth.

**Inability to Adapt:** If there's a significant shift in customer type distribution between City Hotels and Resort Hotels and the business is not able to adapt its services and marketing strategies, it could lead to negative growth.


---


**Conclusion:**

The insights from the donut chart provide valuable information that can guide strategic decisions to enhance customer experience and drive positive business outcomes. However, it's essential for hotel businesses to remain agile and responsive to market changes to mitigate potential negative impacts.

#### Chart - 4

In [None]:
# Chart - 4 visualization code (Monthly Bookings Trend)

# Count the bookings for each arrival month
hb['month']=hb['date'].dt.month # extract the month number from the date column
hotel_counts_by_month = hb.groupby('month')['hotel'].count()
plt.figure(figsize = (11,8))
plt.stem(hotel_counts_by_month.index, hotel_counts_by_month.values,linefmt=":",markerfmt="r^")
plt.xlabel('Month',fontsize=15)
plt.ylabel('Number of Bookings',fontsize=15)
plt.title('Monthly Bookings Trend',fontsize=15)
plt.xticks(ticks=range(1, 13), labels=['January', 'February', 'March', 'April', 'May', 'June',
                                       'July', 'August', 'September', 'October', 'November', 'December'],
           rotation=90,fontsize=9, fontweight='bold')
for i, value in enumerate(hotel_counts_by_month.values): # write the booking count above each stem
    plt.text(hotel_counts_by_month.index[i], value+30, f'{value}', ha='left', va='baseline',fontweight='bold',rotation=45,fontsize=7)
plt.ylim(bottom=0)
plt.show()

In [None]:
hb.drop(columns=['month'],inplace=True) # drop the temporary 'month' column once the chart is done

##### 1. Why did you pick the specific chart?

**Calculating Hotel Counts by Month:**

The code first extracts the month from the 'date' column and creates a new column named 'month'. Then, it groups the data by month and counts the number of bookings for each month.


---



**Plotting the Monthly Bookings Trend:**

The stem plot is used to visualize the monthly bookings trend. Each stem represents a month, with the height of the stem corresponding to the number of bookings in that month. The x-axis represents the months, and the y-axis represents the number of bookings.

##### 2. What is/are the insight(s) found from the chart?

**Monthly Variation:** The plot shows the variation in the number of bookings across different months. Some months have higher booking numbers than others, indicating seasonal trends or specific periods of higher demand.

**Peak Months**: The plot helps identify the months with the highest number of bookings, which can be useful for planning and resource allocation.

**Monthly Trend**: The plot shows the overall trend of bookings over the months, helping to identify patterns or changes in demand over time.
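To name the peak month rather than read it off the plot, the monthly counts can be computed directly. This sketch uses a handful of made-up arrival dates. One caveat is stated as an assumption: if the data does not cover each calendar month the same number of times, counts pooled across years can overstate some months.

```python
import pandas as pd

# Made-up arrival dates spanning two years
toy = pd.DataFrame({"date": pd.to_datetime([
    "2016-01-10", "2016-07-01", "2016-07-15",
    "2016-08-03", "2016-08-09", "2016-08-21", "2017-08-02",
])})

# Bookings per calendar month; reindex so months without bookings show up as 0
monthly = toy["date"].dt.month.value_counts().reindex(range(1, 13), fill_value=0)

peak_month = monthly.idxmax()  # month number with the most bookings
print(monthly)
print("Peak month number:", peak_month)
```

Grouping by both year and month instead would show whether a peak repeats every year or comes from a single season.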

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

The insights gained from the plot can help create a positive business impact by informing strategic decisions, such as:

**Marketing Strategies:** Based on the monthly trends, hotels can plan targeted marketing campaigns to attract more customers during peak months or during periods of low demand.

**Pricing Strategies:** Hotels can adjust their pricing strategies based on demand fluctuations. For example, they can offer discounts or promotions during low-demand months to attract more bookings.

**Resource Allocation:** By understanding the monthly variation in bookings, hotels can allocate resources more efficiently, such as staffing levels, inventory management, and maintenance schedules.


---


**Potential Negative Growth:**

**Seasonal Variation:** If a hotel business heavily relies on bookings during specific months and experiences a decline in demand during those months, it could lead to negative growth.

**Inability to Adapt:** If there's a significant change in monthly booking trends, and the business is not able to adapt its marketing or pricing strategies accordingly, it could lead to negative growth.



---


**Conclusion:**

The stem plot provides valuable insights into the monthly bookings trend, which can help hotels make informed decisions to optimize their operations and drive positive business outcomes. However, it's essential for hotel businesses to remain agile and responsive to changing market conditions to mitigate potential negative impacts.

#### Chart - 5

In [None]:
# Chart - 5 visualization code (ADR Across Distribution Channel)

# Group by distribution_channel and hotel and calculate the average ADR
distribution_channel_df = hb.groupby(['distribution_channel', 'hotel'])['adr'].mean().reset_index()

# Set plot size and plot bar chart
plt.figure(figsize = (11,8))
ax=sns.barplot(data= distribution_channel_df,x ='distribution_channel', y= 'adr', hue = 'hotel',palette='Accent')
for p in ax.patches:
    height = p.get_height()
    if not height > 0: # skip empty bars (zero or NaN height), which have nothing to label
        continue
    adr_value = f'ADR={height:.1f}' # label text with the mean ADR of this bar
    ax.annotate(adr_value, (p.get_x() + p.get_width() / 2., height), ha='center', va='center'
    , fontsize=8, color='black', xytext=(1, 5), textcoords='offset pixels')
plt.title('ADR Across Distribution Channel',fontsize=15)
plt.xticks(fontweight='bold')
plt.xlabel('Distribution Channel',fontsize=15)
plt.ylabel('Average Daily Rate (ADR)',fontsize=15)
plt.show() # show the current figure; 'fig' would refer to the figure created for Chart 3

##### 1. Why did you pick the specific chart?

**Grouping and Calculating ADR:**

The code first groups the data by the distribution_channel and hotel columns and calculates the mean ADR for each group. It then resets the index to create a new DataFrame called distribution_channel_df.



---



**Plotting the Bar Chart:**

The bar plot visualizes the average ADR across different distribution channels for each hotel type. Each bar represents a distribution channel, and the height of the bar represents the average ADR. The bars are grouped by hotel type, indicated by different colors.

##### 2. What is/are the insight(s) found from the chart?

**ADR Variation Across Channels:** The plot shows the variation in ADR across different distribution channels for each hotel type. Some channels may have higher or lower ADRs compared to others.

**Comparing Hotel Types:** The plot allows for a comparison of ADRs between City Hotels and Resort Hotels across different distribution channels.
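To read the exact values behind the bars, the same grouping can be laid out as a table. The sketch below is one possible way to do that. The helper `adr_by_channel` and the toy data are illustrative assumptions and not part of the original notebook; only the column names `hotel`, `distribution_channel`, and `adr` come from the dataset.

```python
import pandas as pd

def adr_by_channel(df: pd.DataFrame) -> pd.DataFrame:
    """Mean ADR with distribution channels as rows and hotel types as columns."""
    return df.pivot_table(index='distribution_channel', columns='hotel',
                          values='adr', aggfunc='mean')

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'distribution_channel': ['Direct', 'TA/TO', 'Direct', 'Direct'],
    'adr': [100.0, 80.0, 120.0, 60.0],
})

adr_table = adr_by_channel(toy)
print(adr_table)
```

A channel with no bookings for a hotel type shows up as `NaN` in the table, which is easy to miss in the bar chart.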

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

The insights gained from the plot can help create a positive business impact by informing strategic decisions, such as:

**Pricing Strategies:** Hotels can adjust their pricing strategies based on ADR variations across different distribution channels. For example, they can offer discounts or promotions through specific channels to attract more bookings.

**Distribution Channel Optimization:** Hotels can optimize their distribution channels based on ADR performance. They can focus on channels that yield higher ADRs and allocate resources accordingly.



---


**Potential Negative Growth:**

**ADR Decline:** If the ADR across all channels shows a declining trend, it could lead to negative growth as it may indicate a decrease in revenue per booking.

**Inability to Adapt:** If there's a significant change in ADR trends, and the business is not able to adapt its pricing or distribution strategies accordingly, it could lead to negative growth.



---


**Conclusion:**

The bar plot provides valuable insights into the average ADR across different distribution channels for City Hotels and Resort Hotels, which can help hotels make informed decisions to optimize their revenue and drive positive business outcomes. However, it's essential for hotel businesses to remain agile and responsive to changing market conditions to mitigate potential negative impacts.

#### Chart - 6

In [None]:
# Chart - 6 visualization code (Hotel Type vs. Cancellation Status)

plt.figure(figsize =(12,6))

#palette parameter allows you to customize the colors used in the plot
ax1 = sns.countplot(data = hb,x = 'hotel', hue = 'is_canceled',palette='Dark2')

# Add value annotations to each bar
for p in ax1.patches:
    height = p.get_height()
    width = p.get_width()
    x, y = p.get_xy()
    ax1.annotate(f'{int(height)}', (x + width / 2, height), ha='center', va='bottom', fontsize=10)

plt.title ('Hotel Type vs. Cancellation Status',fontsize=15)
plt.xlabel('Hotel',fontsize=15)
plt.ylabel('Number of Reservation',fontsize=15)


plt.show()

##### 1. Why did you pick the specific chart?

The chart type used is a countplot from the Seaborn library, which is suitable for comparing the frequency of categorical variables.

##### 2. What is/are the insight(s) found from the chart?



*  The plot shows the number of reservations for each hotel type (City Hotel and Resort Hotel) and differentiates between canceled and non-canceled reservations.

*  It can be observed that City Hotels have a higher number of both canceled and non-canceled reservations compared to Resort Hotels.

*  Additionally, the plot shows that canceled reservations are more common in City Hotels compared to Resort Hotels.
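Raw counts depend on how many bookings each hotel type has, so a cancellation rate is a fairer basis for comparison. A minimal sketch, assuming `is_canceled` is coded as 0/1; the helper name and the toy data are hypothetical.

```python
import pandas as pd

def cancellation_rate(df: pd.DataFrame) -> pd.Series:
    """Fraction of bookings canceled per hotel type (is_canceled assumed to be 0/1)."""
    return df.groupby('hotel')['is_canceled'].mean()

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy_bookings = pd.DataFrame({
    'hotel': ['City Hotel'] * 4 + ['Resort Hotel'] * 4,
    'is_canceled': [1, 0, 1, 0, 0, 0, 1, 0],
})

rates = cancellation_rate(toy_bookings)
print(rates)
```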





##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**


*   The insights gained from the plot can help hotels understand the distribution of canceled and non-canceled reservations across different hotel types.
*   This information can be used to optimize cancellation policies, improve customer service, and enhance booking processes to reduce the number of canceled reservations.


---


**Negative Growth Considerations:**


*   If the cancellation rate remains consistently high for a hotel, it can lead to a negative impact on revenue and overall business growth.
*   High cancellation rates may indicate dissatisfaction with the hotel's services or pricing, leading to negative reviews and a decline in bookings.


---

**Conclusion:**


*   The bar plot provides a clear visualization of the relationship between hotel type and cancellation status, which can help hotels make informed decisions to improve customer satisfaction and reduce cancellations.
*   By addressing the factors contributing to canceled reservations, hotels can work towards positive business outcomes, such as increased customer loyalty and improved financial performance.






#### Chart - 7

In [None]:
# Chart - 7 visualization code (Special Requests)

plt.figure(figsize =(12,6))
ax1 = sns.countplot(data = hb,x = 'hotel', hue = 'total_of_special_requests',palette='Dark2')

# Add value annotations to each bar
for p in ax1.patches:
    height = p.get_height()
    width = p.get_width()
    x, y = p.get_xy()
    ax1.annotate(f'{int(height)}', (x + width / 2, height), ha='center', va='bottom', fontsize=10)

plt.title ('Special Requests',fontsize=15)
plt.xlabel('Hotel',fontsize=15)
plt.ylabel('Requests',fontsize=15)


plt.show()

##### 1. Why did you pick the specific chart?

The chart type used is a countplot from the Seaborn library, which is suitable for comparing the frequency of categorical variables.

##### 2. What is/are the insight(s) found from the chart?



*   The plot shows the number of reservations for each hotel type (City Hotel and Resort Hotel), broken down by the number of special requests attached to each booking.

*   Comparing bar heights within a hotel type shows which request counts are most common, and comparing across hotel types shows whether guests of one hotel tend to make more requests than guests of the other.
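Because the two hotel types differ in booking volume, the share of bookings with at least one special request can be easier to compare than raw counts. The following is a rough sketch under that assumption; `share_with_requests` and the toy data are illustrative and not part of the notebook.

```python
import pandas as pd

def share_with_requests(df: pd.DataFrame) -> pd.Series:
    """Fraction of bookings per hotel type that carry at least one special request."""
    has_request = df['total_of_special_requests'] > 0
    return has_request.groupby(df['hotel']).mean()

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy_requests = pd.DataFrame({
    'hotel': ['City Hotel'] * 4 + ['Resort Hotel'] * 4,
    'total_of_special_requests': [0, 0, 1, 2, 0, 1, 1, 1],
})

request_share = share_with_requests(toy_requests)
print(request_share)
```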






##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**


*   The insights gained from the plot can help hotels understand the distribution of special requests across different hotel types.

*   This information can be used to enhance customer service, tailor amenities, and improve overall guest experience.


---







**Negative Growth Considerations:**


*   If guests consistently make a high number of special requests, it could indicate dissatisfaction with the standard services or amenities provided by the hotel.
*   High numbers of special requests may also increase operational costs for the hotel, affecting profitability.


---






**Conclusion:**


*   The bar plot provides a clear visualization of the relationship between hotel type and the total number of special requests, which can help hotels make informed decisions to improve guest satisfaction and overall service quality.
*   By addressing the factors contributing to special requests, hotels can work towards positive business outcomes, such as increased guest loyalty and improved guest ratings.






#### Chart - 8

In [None]:
# Chart - 8 visualization code (Hotel Type vs. Lead Time)

plt.figure(figsize=(10, 6))
sns.boxplot(data=hb, x='hotel', y='lead_time')
plt.title('Hotel Type vs. Lead Time',fontsize=15)
plt.xlabel('Hotel',fontsize=15)
plt.ylabel('Lead Time',fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

The chart type used is a boxplot from the Seaborn library, which is suitable for visualizing the distribution of a continuous variable (lead time) across different categories (hotel types).

##### 2. What is/are the insight(s) found from the chart?



*   The plot shows the distribution of lead times for each hotel type (City Hotel and Resort Hotel).

*   It can be observed that the median lead time is slightly higher for City Hotels compared to Resort Hotels.
*   The box plot also provides information about the spread of lead times, including the interquartile range (IQR) and any outliers.
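The median and interquartile range that the box plot draws can also be computed directly, which makes the comparison between hotel types explicit. This is a small sketch; the helper `lead_time_summary` and the toy values are assumptions made for illustration.

```python
import pandas as pd

def lead_time_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Quartiles and IQR of lead_time for each hotel type."""
    quartiles = (df.groupby('hotel')['lead_time']
                   .quantile([0.25, 0.5, 0.75])
                   .unstack())
    quartiles.columns = ['q1', 'median', 'q3']
    quartiles['iqr'] = quartiles['q3'] - quartiles['q1']
    return quartiles

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy_lead = pd.DataFrame({
    'hotel': ['City Hotel'] * 5 + ['Resort Hotel'] * 5,
    'lead_time': [0, 10, 20, 30, 40, 5, 5, 5, 5, 5],
})

summary = lead_time_summary(toy_lead)
print(summary)
```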







##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**


*   The insights gained from the plot can help hotels understand the distribution of lead times across different hotel types.
*   This information can be used to optimize operations, such as staffing levels and inventory management, based on expected lead times.


---







**Negative Growth Considerations:**


*   Lead time is the gap between the booking date and the arrival date. If lead times for a hotel are consistently long, rooms are committed far in advance, which can make it harder to react to changes in demand.
*   Long lead times also give guests more time to change their plans, so bookings made far ahead may carry a higher cancellation risk.


---





**Conclusion:**


*   The box plot provides a clear visualization of the relationship between hotel type and lead time, which can help hotels make informed decisions to improve operational efficiency and enhance guest experience.
*   By addressing the factors contributing to lead times, hotels can work towards positive business outcomes, such as increased customer satisfaction and improved operational performance.






#### Chart - 9

In [None]:
# Chart - 9 visualization code (Repeated Guest)

# Total number of repeated guests per hotel type
repeated_by_hotel = hb.groupby('hotel')['is_repeated_guest'].sum()

# Share of all repeated guests contributed by each hotel type
plt.pie(repeated_by_hotel, labels=repeated_by_hotel.index, autopct='%1.0f%%', startangle=45, explode=(0, 0.05), colors=['red', 'green'], textprops={'fontsize': 10}, shadow=True)
plt.title("Repeated Guest",fontsize=15,fontweight='bold')
plt.show()

##### 1. Why did you pick the specific chart?

The specific chart chosen is a pie chart. Pie charts are effective for showing the distribution of parts to a whole, often used to visualize categorical data. In this case, the chart helps to understand the proportion of repeated guests for each hotel type (City Hotel and Resort Hotel).

##### 2. What is/are the insight(s) found from the chart?

**Proportion of Repeated Guests:** The chart shows the proportion of repeated guests for each hotel type. It can be observed that a higher proportion of guests are repeated guests at the Resort Hotel compared to the City Hotel.

**Overall Proportion:** The chart shows only how repeated guests are split between the two hotel types. The share of repeated guests among all bookings would also require the total number of bookings.
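A pie chart of repeated-guest counts and a per-hotel repeat rate answer different questions, and the two can point in opposite directions when one hotel has far more bookings. The sketch below puts both side by side. The helper `repeated_guest_summary` and the toy data are hypothetical, and `is_repeated_guest` is assumed to be coded as 0/1.

```python
import pandas as pd

def repeated_guest_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Repeat rate within each hotel and each hotel's share of all repeated guests."""
    out = df.groupby('hotel')['is_repeated_guest'].agg(repeated='sum', bookings='count')
    out['repeat_rate'] = out['repeated'] / out['bookings']
    out['share_of_repeated'] = out['repeated'] / out['repeated'].sum()
    return out

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy_repeat = pd.DataFrame({
    'hotel': ['City Hotel'] * 8 + ['Resort Hotel'] * 2,
    'is_repeated_guest': [1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
})

repeat_summary = repeated_guest_summary(toy_repeat)
print(repeat_summary)
```

In this toy example the City Hotel holds the larger slice of repeated guests, while the Resort Hotel has the higher repeat rate.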

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

The insights gained from the chart can help create a positive business impact by informing strategic decisions, such as:


*   Customer Retention: Understanding the proportion of repeated guests can help hotels focus on retaining these guests by offering loyalty programs or personalized services.
*   Marketing Strategies: Hotels can tailor their marketing strategies to attract more repeated guests based on their preferences and booking behavior.


---







**Negative Growth Considerations:**


*   High Churn Rate: If the proportion of repeated guests is low or decreasing, it could indicate a high churn rate, which may lead to negative growth as it could affect the hotel's revenue and profitability.
*   Customer Satisfaction: A low proportion of repeated guests could also indicate dissatisfaction with the hotel's services or amenities, leading to negative reviews and a decline in bookings.


---







**Conclusion:**

The pie chart provides valuable insights into the proportion of repeated guests for each hotel type, which can help hotels make informed decisions to improve customer retention and drive positive business outcomes. However, it's essential for hotel businesses to remain agile and responsive to changing market conditions to mitigate potential negative impacts.

#### Chart - 10

In [None]:
# Chart - 10 visualization code (Hotel Type vs. Market Segment)

plt.figure(figsize=(12, 6))
sns.countplot(data=hb, x='hotel', hue='market_segment', palette='muted')
plt.title('Hotel Type vs. Market Segment',fontsize=15)
plt.xlabel('Hotel',fontsize=15)
plt.ylabel('Number of Bookings',fontsize=15)
plt.legend(title='Market Segment', loc='upper right')
plt.show()

##### 1. Why did you pick the specific chart?

The chart type used is a countplot from the Seaborn library, which is suitable for comparing the frequency of categorical variables.

##### 2. What is/are the insight(s) found from the chart?



*   The plot shows the number of bookings for each hotel type (City Hotel and Resort Hotel) and differentiates between market segments.

*   It can be observed that the majority of bookings for both hotel types come from the Online Travel Agent (Online TA) and Offline Travel Agent/Tour Operator (Offline TA/TO) segments.
*   The plot also shows variations in booking patterns across different market segments for each hotel type.







##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**


*   The insights gained from the plot can help hotels understand the distribution of bookings across different market segments for each hotel type.
*   This information can be used to optimize marketing strategies, allocate resources, and tailor services based on the preferences and behaviors of guests from different market segments.


---







**Negative Growth Considerations:**


*   If there's a significant decline in bookings from certain market segments, it could indicate a loss of market share or a decrease in demand, leading to negative growth.
*   High dependency on a specific market segment can also be a risk factor, as changes in the behavior or preferences of that segment can impact the hotel's revenue and profitability.


---






**Conclusion:**


*   The count plot provides a clear visualization of the relationship between hotel type and market segment, which can help hotels make informed decisions to optimize revenue and drive positive business outcomes.
*   By understanding the distribution of bookings across different market segments, hotels can tailor their marketing efforts, improve customer targeting, and enhance overall guest experience.
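The segment mix can also be expressed as shares within each hotel type, which removes the effect of different booking volumes. This is a minimal sketch; `segment_shares` and the toy data are assumptions for illustration, and only the column names come from the dataset.

```python
import pandas as pd

def segment_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each market segment within each hotel type (each row sums to 1)."""
    return pd.crosstab(df['hotel'], df['market_segment'], normalize='index')

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy_segments = pd.DataFrame({
    'hotel': ['City Hotel'] * 4 + ['Resort Hotel'] * 4,
    'market_segment': ['Online TA', 'Online TA', 'Direct', 'Groups',
                       'Direct', 'Direct', 'Online TA', 'Offline TA/TO'],
})

shares = segment_shares(toy_segments)
print(shares.round(2))
```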






#### Chart - 11

In [None]:
# Chart - 11 visualization code (Total Guests by Country)

# Group by 'country' and calculate the total number of guests for each country
guests_by_country = hb.groupby('country')['total_guests'].sum().reset_index()

# Sort the data by the total number of guests in descending order
guests_by_country = guests_by_country.sort_values('total_guests', ascending=False)

# Select the top 10 countries with the highest number of guests
top_10_countries = guests_by_country.head(10)

# Create the bar plot
plt.figure(figsize=(12, 6))
sns.barplot(data=top_10_countries, x='country',y ='total_guests', palette='viridis')
plt.title('Total Guests by Country',fontsize=15)
plt.xlabel('Country',fontsize=15)
plt.xticks(fontweight='bold')
plt.ylabel('Total Guests',fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

The chart type used is a barplot from the Seaborn library, which is suitable for comparing a numeric value (total guests) across the levels of a categorical variable (country).

##### 2. What is/are the insight(s) found from the chart?



*   The plot shows the total number of guests for each of the top 10 countries with the highest number of guests.

*   It can be observed that the top countries with the highest number of guests are not necessarily the most populous countries.
*   The plot also shows variations in the number of guests across different countries.
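To judge how dependent the hotels are on the leading countries, each country's share of all guests can be added to the totals. A rough sketch follows. The helper `top_countries` and the toy data are hypothetical, and `total_guests` is assumed to be the per-booking guest count derived earlier in the notebook.

```python
import pandas as pd

def top_countries(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Top-n countries by total guests, with each country's share of all guests."""
    totals = df.groupby('country')['total_guests'].sum()
    top = totals.nlargest(n).to_frame('total_guests')
    top['share'] = top['total_guests'] / totals.sum()
    return top

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy_countries = pd.DataFrame({
    'country': ['PRT', 'PRT', 'PRT', 'GBR', 'GBR', 'FRA'],
    'total_guests': [2, 2, 2, 2, 1, 1],
})

top_2 = top_countries(toy_countries, n=2)
print(top_2)
```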








##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**


*   The insights gained from the plot can help hotels understand the distribution of guests across different countries.

*   This information can be used to optimize marketing strategies, tailor services, and improve guest targeting based on the preferences and behaviors of guests from different countries.


---







**Negative Growth Considerations:**


*   If there's a significant decline in the number of guests from certain countries, it could indicate a loss of market share or a decrease in demand, leading to negative growth.

*   High dependency on a specific country for guests can also be a risk factor, as changes in the behavior or preferences of that country's guests can impact the hotel's revenue and profitability.


---







**Conclusion:**


*   The bar plot provides a clear visualization of the total number of guests by country, which can help hotels make informed decisions to optimize revenue and drive positive business outcomes.

*   By understanding the distribution of guests across different countries, hotels can tailor their marketing efforts, improve customer targeting, and enhance overall guest experience.






#### Chart - 12

In [None]:
# Chart - 12 visualization code (Total Stay vs. Total Guests)

plt.figure(figsize=(10, 6))
plt.scatter(hb['total_stay'], hb['total_guests'], color='r', marker='.')
plt.title('Total Stay vs. Total Guests',fontsize=15)
plt.xlabel('Total Stay (Weekend Nights + Week Nights)',fontsize=15)
plt.ylabel('Total Guests',fontsize=15)
plt.grid(True, axis='both', color='green', linestyle=':', linewidth=0.5, alpha=0.5, zorder=0)
plt.show()

##### 1. Why did you pick the specific chart?

The chart type used is a scatter plot from the Matplotlib library, which is suitable for visualizing the relationship between two numerical variables.

##### 2. What is/are the insight(s) found from the chart?



*   The plot shows the total number of guests on the y-axis and the total stay (weekend nights + week nights) on the x-axis for each booking.
*   It can be observed that there is a wide range of total stays and total guests, with some bookings having a high total stay and high total guests, while others have a low total stay and low total guests.

*   The plot also shows variations in the distribution of total guests across different total stays.
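Both variables are whole numbers, so many bookings land on the same point and the scatter plot hides how many bookings each point represents. One way to recover that information, sketched here with a hypothetical helper `pair_counts` and toy data, is to count bookings per (stay, guests) pair.

```python
import pandas as pd

def pair_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Number of bookings for every (total_stay, total_guests) combination."""
    return (df.groupby(['total_stay', 'total_guests'])
              .size()
              .reset_index(name='bookings'))

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy_stays = pd.DataFrame({
    'total_stay': [1, 1, 2, 2, 7],
    'total_guests': [2, 2, 2, 1, 4],
})

counts = pair_counts(toy_stays)
print(counts)
```

The `bookings` column could then drive marker size or color in the scatter plot.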









##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**


*   The insights gained from the plot can help hotels understand the relationship between the total stay and the total number of guests.

*   This information can be used to optimize pricing strategies, allocate resources, and tailor services based on the preferences and behaviors of guests with different total stays.


---







**Negative Growth Considerations:**


*   If there's a significant decline in the number of guests with longer total stays, it could indicate a decrease in demand for longer stays, leading to negative growth.

*  High dependency on a specific total stay duration can also be a risk factor, as changes in the behavior or preferences of guests with that total stay duration can impact the hotel's revenue and profitability.




---



**Conclusion:**


*   The scatter plot provides a clear visualization of the relationship between the total stay and the total number of guests, which can help hotels make informed decisions to optimize revenue and drive positive business outcomes.

*   By understanding the relationship between the total stay and the total number of guests, hotels can tailor their marketing efforts, improve customer targeting, and enhance overall guest experience.






#### Chart - 13

In [None]:
grouped = hb.groupby('hotel')
df1 = grouped.get_group('Resort Hotel')
df2 = grouped.get_group('City Hotel')
df1

In [None]:
# Chart - 13 visualization code (Data of Agent)

fig, axs = plt.subplots(1, 2, figsize=(12, 6))

# Top 10 agents by booking count for the Resort Hotel
top_10_df1 = df1['agent'].value_counts().head(10)
# Bar chart of the Resort Hotel's top agents
top_10_df1.plot(kind='bar', ax=axs[0],edgecolor='black')
axs[0].set_xlabel('Resort Hotel Agent')
axs[0].set_ylabel('Booking Counts')
axs[0].set_title('Top 10 Agents')

# Top 10 agents by booking count for the City Hotel
top_10_df2 = df2['agent'].value_counts().head(10)
# Bar chart of the City Hotel's top agents
top_10_df2.plot(kind='bar', ax=axs[1],edgecolor='black')
axs[1].set_xlabel('City Hotel Agents')
axs[1].set_ylabel('Booking Counts')
axs[1].set_title('Top 10 Agents')

# Adjust layout
plt.tight_layout()

# Show plot
plt.show()

In [None]:
# Top 20 (hotel, agent) pairs by booking count across both hotel types
top_20_counts = hb.groupby('hotel')['agent'].value_counts().sort_values(ascending=False).head(20)
top_20_counts.plot(kind='bar')
plt.show()

##### 1. Why did you pick the specific chart?

The specific chart chosen is a bar plot. Bar plots are effective for comparing the frequency of categorical variables across different categories. In this case, the chart helps to visualize the top 10 agents for booking counts in each hotel type (Resort Hotel and City Hotel) separately.

##### 2. What is/are the insight(s) found from the chart?



*   **Top Agents:** The chart shows the top 10 agents with the highest booking counts for each hotel type. It provides a clear comparison of the most influential agents for bookings in each hotel type.

*   **Booking Trends**: The chart also highlights the popularity of specific agents in each hotel type, which could indicate booking trends or preferences among guests.





##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**


*   **Agent Collaboration**: Understanding which agents contribute the most to bookings can help hotels establish closer collaborations with these agents, leading to more bookings and positive business impact.

*   **Marketing Strategies**: Hotels can tailor their marketing strategies based on the preferences of guests who book through specific agents, improving customer targeting and increasing bookings.



---




**Negative Growth Considerations:**


*   **Dependency on Specific Agents**: If a hotel is heavily dependent on a few agents for bookings, it could be risky, as changes in the behavior or preferences of these agents can impact the hotel's revenue and profitability.

*   **Booking Concentration:** If a significant portion of bookings is concentrated among a few agents, it could indicate a lack of diversity in the hotel's booking sources, leading to negative growth.
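The concentration risk described above can be quantified as the share of agent bookings handled by the top k agents. This is a minimal sketch. The function `top_agent_share` and the toy agent IDs are hypothetical, and bookings without an agent (missing values) are assumed to be excluded from the calculation.

```python
import pandas as pd

def top_agent_share(agents: pd.Series, k: int = 10) -> float:
    """Share of agent bookings handled by the k busiest agents."""
    counts = agents.value_counts()  # sorted descending; NaN values are excluded
    return counts.head(k).sum() / counts.sum()

# Toy data standing in for the real `hb` DataFrame (hypothetical agent IDs)
toy_agents = pd.DataFrame({
    'hotel': ['City Hotel'] * 6 + ['Resort Hotel'] * 4,
    'agent': [9, 9, 9, 9, 14, 7, 240, 240, 250, 6],
})

# Share of bookings handled by the single busiest agent in each hotel type
concentration = toy_agents.groupby('hotel')['agent'].apply(top_agent_share, k=1)
print(concentration)
```

A value close to 1 would mean that almost all agent bookings depend on the top k agents.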



---





**Conclusion:**


*   The bar plot provides valuable insights into the top agents for bookings in each hotel type, which can help hotels make informed decisions to optimize revenue and drive positive business outcomes.

*   By understanding the influence of specific agents on bookings, hotels can tailor their strategies, improve customer targeting, and enhance overall guest experience.




#### Chart - 14 - Correlation Heatmap

In [None]:
# Chart - 14 visualization code (Correlation Heatmap visualization code)

plt.figure(figsize = (18,10))
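# Correlation is only defined for numeric columns, so restrict to those first
sns.heatmap(hb.select_dtypes(include='number').corr(), annot=True, fmt='.2f')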
plt.title('Correlation of the Columns', fontsize = 20)
plt.show()

##### 1. Why did you pick the specific chart?

The chart type used is a heatmap from the Seaborn library. Heatmaps are used to visualize the correlation between different variables in a dataset.

##### 2. What is/are the insight(s) found from the chart?


*   The plot shows the correlation coefficients between different columns in the dataset. Each cell in the heatmap represents the correlation coefficient between two columns.

*   The correlation coefficient ranges from -1 to 1, where -1 indicates a perfect negative correlation, 0 indicates no correlation, and 1 indicates a perfect positive correlation.

*   The color of each cell encodes the value of the correlation coefficient according to the color bar beside the heatmap, so strongly correlated pairs stand out from pairs with coefficients near zero.
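With many columns the heatmap is hard to scan, so it can help to list the strongest pairs explicitly. The sketch below is one way to do this. The helper `strongest_pairs` and the toy columns `a`, `b`, and `c` are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

def strongest_pairs(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Top-n column pairs ranked by absolute Pearson correlation."""
    corr = df.select_dtypes(include='number').corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy data standing in for the real `hb` DataFrame (hypothetical values)
toy_corr = pd.DataFrame({
    'hotel': ['City Hotel', 'City Hotel', 'Resort Hotel', 'Resort Hotel'],
    'a': [1, 2, 3, 4],
    'b': [2, 4, 6, 8],
    'c': [1, 3, 2, 4],
})

top_pairs = strongest_pairs(toy_corr, n=3)
print(top_pairs)
```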







#### Chart - 15 - Pair Plot

In [None]:
# Chart - 15 visualization code (Pair Plot visualization code)
hb['year']=hb['date'].dt.year
columns= ['lead_time','year','total_of_special_requests','total_stay','adr','total_guests']
sns.pairplot(hb[columns])
plt.show()

In [None]:
# Remove the temporary 'year' column added for the pair plot
hb = hb.drop(columns=['year'])

##### 1. Why did you pick the specific chart?

The chart type used is a pair plot from the Seaborn library. Pair plots are used to visualize the relationships between multiple numerical variables in a dataset. Each scatterplot in the pair plot represents the relationship between two variables, and the diagonal shows the distribution of each variable.

##### 2. What is/are the insight(s) found from the chart?



*   The plot shows scatterplots for the relationships between different numerical variables in the dataset, such as lead time, year, total of special requests, total stay, average daily rate (ADR), and total guests.

*   Each scatterplot provides insights into the relationships between two variables. For example, the scatterplot between lead time and total stay shows how the lead time of bookings is related to the total stay duration.

*   The diagonal of the pair plot shows the distribution of each variable, providing insights into the spread and central tendency of each variable.






## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To achieve the business objective, which is likely to be maximizing revenue and optimizing operational efficiency, I would recommend the following strategies based on the insights from the dataset:

**Diversify Marketing Channels:**

The dataset suggests that a significant portion of bookings comes from Online Travel Agents (Online TA) and Offline Travel Agents/Tour Operators (Offline TA/TO). While these channels are important, diversifying marketing channels can help reach a wider audience and reduce dependency on specific agents or channels.

**Improve Customer Retention:**

The proportion of repeated guests is higher at the Resort Hotel compared to the City Hotel. This indicates that the Resort Hotel may have a more loyal customer base. Implementing loyalty programs and personalized services can help improve customer retention at both hotels.

**Enhance Operational Efficiency**:

Analyzing lead times, total stays, and total guests can provide insights into booking patterns and operational efficiency. For example, if a significant portion of bookings has short lead times and long stays, hotels can adjust staffing levels and inventory management to accommodate these trends.

**Optimize Pricing Strategies:**
Where customer satisfaction ratings are available, analyzing their relationship with ADR can help optimize pricing strategies. If higher ADR goes together with higher satisfaction, hotels can consider increasing prices for additional services or amenities.

**Improve Service Quality:**
Where customer satisfaction ratings are available, analyzing their relationship with special requests can help identify areas for improvement in service quality. Fulfilling more special requests may lead to higher customer satisfaction and potentially more repeat bookings.

**Monitor and Adapt:**

It's important to continuously monitor booking trends, customer feedback, and market dynamics to adapt strategies accordingly. Flexibility and responsiveness to changes in customer preferences and market conditions are key to achieving long-term business objectives.



---


In brief, diversifying marketing channels, improving customer retention, enhancing operational efficiency, optimizing pricing strategies, improving service quality, and continuously monitoring and adapting to changes in the market can help the client achieve their business objectives.

# **Conclusion**

The analysis of the dataset provides valuable insights into various aspects of the hotel business, including booking trends, customer behavior, and operational efficiency. Here are some key conclusions drawn from the analysis:

**Booking Trends:**

The analysis shows that the number of bookings fluctuates throughout the year, with peak months being August and September. This indicates seasonal variations in demand, which hotels can leverage to optimize pricing and marketing strategies.

**Customer Behavior:**

The analysis reveals that the majority of bookings come from Online Travel Agents (Online TA) and Offline Travel Agents/Tour Operators (Offline TA/TO). This highlights the importance of these channels in driving bookings and suggests that hotels should maintain strong partnerships with these agents.

**Operational Efficiency:**

The analysis of lead times, total stays, and total guests provides insights into booking patterns and operational efficiency. For example, if a significant portion of bookings has short lead times and long stays, hotels can adjust staffing levels and inventory management to accommodate these trends.

**Pricing Strategies:**

Relating ADR to customer satisfaction ratings, where such ratings are available, can help optimize pricing strategies. If higher ADR goes together with higher satisfaction, hotels can consider increasing prices for additional services or amenities.

**Service Quality:**

Relating special requests to customer satisfaction ratings, where such ratings are available, can help identify areas for improvement in service quality. Fulfilling more special requests may lead to higher customer satisfaction and potentially more repeat bookings.

**Marketing and Customer Retention:**

The analysis also reveals that the proportion of repeated guests is higher at the Resort Hotel compared to the City Hotel. This indicates that the Resort Hotel may have a more loyal customer base. Implementing loyalty programs and personalized services can help improve customer retention at both hotels.

**Recommendations:**

Based on the insights from the analysis, recommendations for the client include diversifying marketing channels, improving customer retention, enhancing operational efficiency, optimizing pricing strategies, improving service quality, and continuously monitoring and adapting to changes in the market.



---


In conclusion, the analysis provides valuable insights that can help the client optimize revenue, improve customer satisfaction, and drive positive business outcomes. By leveraging these insights and implementing the recommendations, the client can achieve their business objectives and maintain a competitive edge in the hotel industry.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***