## Data Analysis Project Steps:
1. Create a Problem Statement.
2. Identify the data you want to analyze.
3. Explore and Clean the data.
4. Analyze the data to get useful insights.
5. Present the findings as reports or dashboards using visualizations.

## Business Problem
In recent years, City Hotel and Resort Hotel have experienced high cancellation rates. As a result, each hotel is now dealing with a number of issues, including lower revenue and suboptimal room occupancy. Consequently, lowering cancellation rates is both hotels' primary goal in order to increase their efficiency in generating revenue, and our goal is to offer thorough business advice to address this problem.

This report focuses on the analysis of hotel booking cancellations, along with other factors that have a bearing on the hotels' business and yearly revenue generation.

## Assumptions
1. No unusual occurrences between 2015 and 2017 will have a substantial impact on the data used.
2. The information is still current and can be used to analyze a hotel's possible plans in an efficient manner.
3. There are no unanticipated downsides to the hotels adopting any recommended technique.
4. The hotels are not currently using any of the suggested solutions.
5. The biggest factor affecting the effectiveness of earning income is booking cancellations.
6. Cancellations result in vacant rooms for the booked length of time.
7. Clients make hotel reservations the same year they make cancellations.

## Research Questions
1. What are the variables that affect hotel reservation cancellations?
2. How can we reduce hotel reservation cancellations?
3. How can hotels be assisted in making pricing and promotional decisions?

## Hypothesis
1. More cancellations occur when prices are higher.
2. When there is a longer waiting list, customers tend to cancel more frequently.
3. The majority of clients make their reservations through offline travel agents.
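
As a rough illustration of how each hypothesis could be checked with pandas, the sketch below uses a tiny synthetic sample rather than the real dataset. The values are invented for illustration, and the `days_in_waiting_list` column name is an assumption about the dataset; `is_canceled`, `adr`, and `market_segment` are the columns used later in this notebook.

```python
import pandas as pd

# Tiny synthetic sample (hypothetical values, NOT taken from the real dataset).
# 'days_in_waiting_list' is an assumed column name; the segment labels are
# illustrative only.
sample = pd.DataFrame({
    "is_canceled": [0, 0, 0, 1, 1, 1],
    "adr": [80.0, 95.0, 100.0, 120.0, 110.0, 130.0],
    "days_in_waiting_list": [0, 0, 2, 0, 10, 5],
    "market_segment": ["Online TA", "Offline TA/TO", "Online TA",
                       "Online TA", "Groups", "Online TA"],
})

# Hypothesis 1: compare the average daily rate of canceled vs. kept bookings.
mean_adr = sample.groupby("is_canceled")["adr"].mean()

# Hypothesis 2: compare the average waiting-list time for the same two groups.
mean_wait = sample.groupby("is_canceled")["days_in_waiting_list"].mean()

# Hypothesis 3: share of bookings contributed by each market segment.
segment_share = sample["market_segment"].value_counts(normalize=True)

print(mean_adr, mean_wait, segment_share, sep="\n\n")
```

Group means like these only show an association; they do not establish that higher prices or longer waits cause cancellations.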

## Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

## Loading the dataset

In [None]:
df = pd.read_csv("hotel_bookings 2.csv")

## Exploratory Data Analysis and Data Cleaning

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.shape

In [None]:
df.columns

In [None]:
df['meal'].value_counts()

In [None]:
df.info()

In [None]:
df['reservation_status_date'] = pd.to_datetime(df['reservation_status_date'])

In [None]:
df.info()

In [None]:
df.describe(include='object')

In [None]:
obj_col = df.describe(include='object')

In [None]:
for col in obj_col.columns:
    print(col)
    print(df[col].unique())
    print("---------------------------------------------------------------------")

In [None]:
df.isnull().sum()
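
Raw null counts can be hard to judge without the row count next to them. A minimal sketch of one alternative view, the fraction of missing values per column, is shown below on a small synthetic frame. The values are hypothetical and only the column names echo this dataset.

```python
import numpy as np
import pandas as pd

# Synthetic frame for illustration only; values are hypothetical.
sample = pd.DataFrame({
    "country": ["PRT", "GBR", None, "ESP"],
    "agent": [9.0, np.nan, np.nan, np.nan],
    "company": [np.nan, np.nan, np.nan, np.nan],
    "adr": [80.0, 95.0, 100.0, 120.0],
})

# isnull() yields booleans, so the column mean is the share of missing values.
missing_share = sample.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a very high missing share are candidates for dropping entirely, while columns with only a few gaps can be handled by dropping the affected rows.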

In [None]:
df.drop(['agent', 'company'], axis = 1, inplace = True)
df.dropna(inplace=True)

In [None]:
df.isnull().sum()

In [None]:
df.describe()

In [None]:
df['adr'].plot(kind='box')

In [None]:
df = df[df['adr']<5000]
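
The cut-off of 5000 above is a hand-picked threshold. As a sketch of one common alternative (not the method used in this notebook), the interquartile-range rule derives an upper fence from the data itself. The `adr` values below are hypothetical, and the 1.5 multiplier is the conventional choice rather than a requirement.

```python
import pandas as pd

# Hypothetical adr values for illustration; 5400.0 plays the role of the outlier.
adr = pd.Series([75.0, 90.0, 110.0, 95.0, 130.0, 5400.0], name="adr")

q1, q3 = adr.quantile(0.25), adr.quantile(0.75)
iqr = q3 - q1
upper = q3 + 1.5 * iqr  # conventional Tukey fence; the multiplier is a choice

# Keep only values at or below the upper fence.
filtered = adr[adr <= upper]
print(upper, len(filtered))
```

On skewed price data the IQR rule may flag many legitimate high rates, so a fixed threshold can still be the more conservative option.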

In [None]:
df.describe()

## Data Analysis and Visualizations

In [None]:
cancelled_percentage = df['is_canceled'].value_counts(normalize=True)
cancelled_percentage

In [None]:
print(cancelled_percentage)

plt.figure(figsize= (5, 4))
plt.title('Reservation Status Counts')
# sort_index() guarantees the order 0, 1 so the counts line up with the labels
plt.bar(['Not Canceled', 'Canceled'], df['is_canceled'].value_counts().sort_index(), edgecolor = 'k', width= 0.7)

plt.show()

In [None]:
plt.figure(figsize= (8, 4))
ax1 = sns.countplot(x = 'hotel', hue = 'is_canceled', data = df, palette= 'Blues')

legend_labels,_ = ax1.get_legend_handles_labels()
ax1.legend(bbox_to_anchor=(1,1))

plt.title('Reservation status in different hotels', size = 15)
plt.xlabel('Hotel')
plt.ylabel('Number of Reservations')

plt.show()

In [None]:
resort_hotel = df[df['hotel'] == 'Resort Hotel']
resort_hotel['is_canceled'].value_counts(normalize = True)

In [None]:
city_hotel = df[df['hotel'] == 'City Hotel']
city_hotel['is_canceled'].value_counts(normalize = True)
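
Because `is_canceled` is coded as 0/1, its mean within a group is that group's cancellation rate. The sketch below shows how both hotels could be compared in a single step. It uses a small synthetic sample with hypothetical values, not the real data.

```python
import pandas as pd

# Synthetic sample for illustration only.
sample = pd.DataFrame({
    "hotel": ["City Hotel"] * 4 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 flag per group = share of canceled bookings per hotel.
cancel_rate = sample.groupby("hotel")["is_canceled"].mean()
print(cancel_rate)
```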

In [None]:
resort_hotel = resort_hotel.groupby('reservation_status_date')[['adr']].mean()
city_hotel = city_hotel.groupby('reservation_status_date')[['adr']].mean()

In [None]:
plt.figure(figsize= (20,8))
plt.title('Average Daily Rate in City and Resort Hotel', fontsize = 25)

plt.plot(resort_hotel.index, resort_hotel['adr'], label = 'Resort Hotel')
plt.plot(city_hotel.index, city_hotel['adr'], label = 'City Hotel')

plt.legend(fontsize = 20)
plt.show()

In [None]:
df['month'] = df['reservation_status_date'].dt.month
plt.figure(figsize=(16,8))
ax1 = sns.countplot(x = 'month', hue= 'is_canceled', data= df, palette= 'bright')
plt.title('Reservation Status per Month', size = 20)
plt.xlabel('Month')
plt.ylabel('Number of Reservations')
# A single legend call; the earlier ax1.legend(...) was overridden by this one anyway
plt.legend(['not canceled', 'canceled'])
plt.show()

In [None]:
plt.figure(figsize = (15, 8))
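# The plot below sums adr over canceled reservations only, so the title says so
plt.title('Total ADR of Canceled Reservations per Month', fontsize = 30)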

sns.barplot(x = 'month', y = 'adr', data = df[df['is_canceled'] == 1].groupby('month')[['adr']].sum().reset_index())
#plt.legend(fontsize = 20)
plt.show()

In [None]:
canceled_data = df[df['is_canceled'] == 1]
top_10_country = canceled_data['country'].value_counts()[:10]

plt.figure(figsize= (8,8))
plt.title('Top 10 Countries by Canceled Reservations')
plt.pie(top_10_country, autopct= '%.2f', labels= top_10_country.index)
plt.show()

In [None]:
df['market_segment'].value_counts()

In [None]:
df['market_segment'].value_counts(normalize=True)

In [None]:
canceled_data['market_segment'].value_counts(normalize=True)
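
Putting the two distributions side by side makes it easier to see which segments are over-represented among cancellations. The sketch below is one possible way to do this, using synthetic data with hypothetical values and illustrative segment labels. The names `all_bookings`, `canceled_only`, and `difference` are introduced here for illustration.

```python
import pandas as pd

# Synthetic sample for illustration only.
sample = pd.DataFrame({
    "market_segment": ["Online TA", "Online TA", "Online TA", "Online TA",
                       "Groups", "Groups", "Direct", "Direct"],
    "is_canceled": [1, 1, 0, 0, 1, 1, 0, 0],
})

# Segment share among all bookings and among canceled bookings only.
overall_share = sample["market_segment"].value_counts(normalize=True)
canceled_share = sample.loc[sample["is_canceled"] == 1,
                            "market_segment"].value_counts(normalize=True)

# Align on segment; segments with no cancellations get a share of 0.
comparison = pd.concat(
    [overall_share.rename("all_bookings"), canceled_share.rename("canceled_only")],
    axis=1,
).fillna(0.0)

# Positive values mark segments over-represented among cancellations.
comparison["difference"] = comparison["canceled_only"] - comparison["all_bookings"]
print(comparison)
```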

In [None]:
canceled_df_adr = canceled_data.groupby('reservation_status_date')[['adr']].mean()
canceled_df_adr.reset_index(inplace = True)
canceled_df_adr.sort_values('reservation_status_date', inplace = True)

not_canceled_data = df[df['is_canceled'] == 0]
not_canceled_df_adr = not_canceled_data.groupby('reservation_status_date')[['adr']].mean()
not_canceled_df_adr.reset_index(inplace = True)
not_canceled_df_adr.sort_values('reservation_status_date', inplace = True)

plt.figure(figsize= (20,6))
plt.title('Average Daily Rate')
plt.plot(not_canceled_df_adr['reservation_status_date'], not_canceled_df_adr['adr'], label = 'not_canceled')
plt.plot(canceled_df_adr['reservation_status_date'], canceled_df_adr['adr'], label = 'canceled')
plt.legend()
plt.show()

In [None]:
canceled_df_adr = canceled_df_adr[(canceled_df_adr['reservation_status_date'] > '2016') & (canceled_df_adr['reservation_status_date'] < '2017-09')]

not_canceled_df_adr = not_canceled_df_adr[(not_canceled_df_adr['reservation_status_date'] > '2016') & (not_canceled_df_adr['reservation_status_date'] < '2017-09')]

In [None]:
plt.figure(figsize= (20,6))
plt.title('Average Daily Rate', fontsize = 20)
plt.plot(not_canceled_df_adr['reservation_status_date'], not_canceled_df_adr['adr'], label = 'not_canceled')
plt.plot(canceled_df_adr['reservation_status_date'], canceled_df_adr['adr'], label = 'canceled')
plt.legend()
plt.show()