# Introduction
> The Hotel-Booking dataset contains 119,390 observations for a City Hotel and a Resort Hotel. Each observation represents a hotel booking between the 1st of July 2015 and the 31st of August 2017, including bookings that effectively arrived and bookings that were canceled. <br>
The data is originally from the article Hotel Booking Demand Datasets, written by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019.

In this notebook, I'll explore parts of the dataset using plots from the Seaborn library.

# Dataset's Content
> Since this is real hotel data, all data elements pertaining to hotel or customer identification were deleted.
Four columns, 'name', 'email', 'phone number' and 'credit_card', have been artificially created and added to the dataset.

# [Step 1] Importing libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# [Step 2] Reading dataset's CSV file

In [None]:
df = pd.read_csv("../input/hotel-booking/hotel_booking.csv");

The dataset's CSV file is read with the Pandas library and stored in the 'df' variable as a dataframe.

# [Step 3] Dataset's basic information
* Let's see the basic information about the dataset using the *info()* function.

In [None]:
df.info()

* Also, let's see the first 6 records of the dataset and their contents using the *head()* function.

In [None]:
df.head(6)

# [Step 5] Displaying plots

# 1 - How many passengers does each country have?
To find out how many passengers each country has, I summed the *adults*, *children* and *babies* columns. Since some bookings were canceled, I subtracted the *is_canceled* column from the sum. The results are stored in a new column named *total_passengers*. <br>
Then, I displayed the results as a bar plot using the *Seaborn* library.

In [None]:
# Creating a new column with the total number of passengers who traveled
df['total_passengers'] = (df['adults'] + df['children'] + df['babies']) - df['is_canceled'];

# Displaying in a bar plot
plt.figure(figsize=(60,20), dpi=200);
sns.set_style(style='whitegrid');
sns.barplot(data=df, x='country', y='total_passengers');
plt.xticks(rotation=45);
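
Two details of the cell above are worth keeping in mind: by default, `sns.barplot` draws the mean of `total_passengers` per country rather than the sum, and subtracting `is_canceled` removes exactly one person from each canceled booking rather than the whole party. As a minimal sketch of an alternative, assuming that "passengers per country" means the sum of guests over non-canceled bookings, the snippet below works on a small made-up frame (`toy`, its values, and the `guests` column are hypothetical and not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data that mimics the relevant columns of the dataset
toy = pd.DataFrame({
    "country": ["PRT", "PRT", "GBR", "GBR", "FRA"],
    "adults": [2, 1, 2, 2, 1],
    "children": [1.0, 0.0, None, 0.0, 0.0],
    "babies": [0, 0, 0, 1, 0],
    "is_canceled": [0, 1, 0, 0, 1],
})

# Guests per booking; treating a missing count as 0 is an assumption
toy["guests"] = toy["adults"] + toy["children"].fillna(0) + toy["babies"]

# Keep only bookings that were not canceled, then sum the guests per country
passengers_per_country = (
    toy[toy["is_canceled"] == 0]
    .groupby("country")["guests"]
    .sum()
    .sort_values(ascending=False)
)
print(passengers_per_country)
```

The resulting Series could then be passed to a bar plot in place of the raw rows, so that each bar shows a total instead of a per-booking average.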

# 2 - Total missing data in each column. (The columns with no missing data are not displayed.)
> Source: [yun.ir/h4j3m6](yun.ir/h4j3m6)

**Note that** the *total_passengers* column was created in the previous step and is not needed here, so I temporarily excluded it using the *drop()* function.

First, the percentage of missing data in each column is calculated by counting the missing values in that column, dividing by the total number of records in the dataset, and multiplying by 100. <br>
Second, the columns that have missing data (more than zero) are stored as a dataframe using the *to_frame()* function. <br>
Third, the column and the index of the dataframe are renamed. To reindex the dataframe with numeric indexes, I used the *reset_index()* function.

Finally, after renaming the columns of the *missing_values* dataframe, the results are displayed in a bar plot. The x-axis shows the columns and the y-axis shows the percentage of missing data. <br>
To make the plot easier to read, I set its figure size to *(4, 4)* and its *dpi* to 100 in the *figure()* function. <br>
Also, to keep the column names from being crammed together, I rotated them by 45 degrees using the *xticks()* function.

In [None]:
# Calculating the percentage of missing data in each column
missing_values = 100*(df.drop('total_passengers', axis=1).isnull().sum() / len(df));

# Creating a DataFrame for the columns
missing_values = missing_values[missing_values > 0].to_frame();
missing_values.columns = ['Total missing data'];
missing_values.index.names = ['Columns'];
missing_values = missing_values.reset_index();

# Displaying the results in a bar-plot
plt.figure(figsize=(4,4), dpi=100);
sns.set_style(style='ticks');
sns.barplot(data=missing_values, x='Columns', y='Total missing data');
plt.xticks(rotation = 45);
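
To make the calculation above easier to check by hand, here is a minimal sketch of the same steps on a tiny made-up frame. The `toy` data and the `Missing data (%)` label are assumptions for illustration only; `isnull().mean() * 100` is used as an equivalent way of writing `100 * isnull().sum() / len(df)`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values
toy = pd.DataFrame({
    "country": ["PRT", None, "GBR", "FRA"],
    "agent": [9.0, np.nan, np.nan, 14.0],
    "adults": [2, 1, 2, 2],
})

# Fraction of missing values per column, expressed as a percentage
missing_pct = toy.isnull().mean() * 100

# Keep only the columns that have missing values and build a tidy dataframe
missing_values = (
    missing_pct[missing_pct > 0]
    .rename_axis("Columns")
    .reset_index(name="Missing data (%)")
)
print(missing_values)
```

Here `country` has one missing value out of four rows and `agent` has two, while `adults` is complete and is therefore filtered out.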

# 3 - Total nights that passengers of each hotel stayed.
To get the total number of nights passengers stayed, I summed the *stays_in_week_nights* and *stays_in_weekend_nights* columns.

This plot shows how many bookings at each hotel were for a given total number of nights. For example, more than 20,000 City Hotel bookings were for a total stay of 2 or 3 nights. <br>
I limited the x-axis to the range 0 to 16 to make the plot easier to read, because the counts for longer stays were close to 0.

In [None]:
df['total_stays_in_nights'] = df['stays_in_week_nights'] + df['stays_in_weekend_nights'];

plt.figure(figsize=(16,8));
sns.countplot(data=df, x='total_stays_in_nights', hue='hotel');
plt.xticks(rotation=45);
plt.xlim(0,16);
plt.legend(loc=(0.3,0.9));
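
Since `sns.countplot` counts rows (that is, bookings) in each category, the numbers behind the bars can also be tabulated directly. The snippet below is a minimal sketch using `pd.crosstab` on a small made-up frame; the `toy` data is hypothetical and only mimics the relevant columns.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the dataset
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "stays_in_week_nights": [2, 1, 5, 2, 1],
    "stays_in_weekend_nights": [0, 1, 2, 1, 1],
})
toy["total_stays_in_nights"] = toy["stays_in_week_nights"] + toy["stays_in_weekend_nights"]

# Rows: total nights, columns: hotel, values: number of bookings
counts = pd.crosstab(toy["total_stays_in_nights"], toy["hotel"])
print(counts)
```

Each cell of `counts` corresponds to the height of one bar in the count plot, which makes it easy to read exact values for the stays that are barely visible in the figure.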