# Hotel Booking Dataset Analysis Using Pandas & Seaborn


### About the Dataset


#### This dataset contains 119,390 observations for a City Hotel and a Resort Hotel. Each observation represents a hotel
#### booking between 1 July 2015 and 31 August 2017, including bookings that effectively arrived and bookings that were canceled.
#### Since this is real hotel data, all data elements pertaining to hotel or customer identification were deleted.
#### Four columns, 'name', 'email', 'phone-number' and 'credit_card', have been artificially created and added to the dataset.
####
#### The data is originally from the article:
> Hotel Booking Demand Datasets, written by Nuno Antonio, Ana Almeida, and
 Luis Nunes for Data in Brief, Volume 22, February 2019.

#### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

#### Reading dataset from CSV file

In [None]:
df=pd.read_csv('../input/hotel-booking/hotel_booking.csv')

#### Dataset information

In [None]:
df.info()

#### Number of rows and columns

In [None]:
pd.DataFrame([df.shape],columns=['number_of_rows','number_of_columns'],index=['#'])

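#### Check for missing data

In [None]:
# Count missing values per column once, then report whether any exist,
# which column has the most, and how many that column is missing.
missing_counts = df.isnull().sum()
pd.DataFrame([(missing_counts.any(), missing_counts.idxmax(), missing_counts.max())], columns=['missing_data_existence', 'column_with_maximum_missing_data', 'number_of_missing_data'], index=['#'])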
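
The cell above reports only the single column with the most missing values. A per-column summary can make it easier to decide which columns to drop or fill. The following is a minimal sketch on an invented toy frame (`toy` and its values are assumptions standing in for `df`, and `missing_summary` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df; the values are invented.
toy = pd.DataFrame({
    'company': [np.nan, np.nan, 40.0, np.nan],
    'agent': [9.0, np.nan, 14.0, 9.0],
    'adr': [75.0, 98.0, 107.5, 60.0],
})

# Count missing values per column and keep only columns that have any.
missing = toy.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)

# Report each count alongside its share of all rows, in percent.
missing_summary = pd.DataFrame({
    'missing_count': missing,
    'missing_percent': (missing / len(toy) * 100).round(2),
})
print(missing_summary)
```

Replacing `toy` with `df` should give the same kind of table for the full dataset.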

#### Drop the 'company' column from the DataFrame

In [None]:
# drop() returns a new DataFrame, so assign the result back to keep the change
df = df.drop(columns='company')
df.columns.tolist()


#### Which countries have the most passengers?

In [None]:
df['total_passengers'] = df['adults'] + df['children'] + df['babies'] - df['is_canceled']
df[['country','adults','children','babies','is_canceled','total_passengers']].groupby('country').sum().nlargest(5,'total_passengers').reset_index()
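
Note that subtracting `is_canceled` removes one person per canceled booking rather than the whole party. If the goal is to count only passengers from bookings that were not canceled, one possible alternative is to set canceled rows to zero. The sketch below uses an invented toy frame; `arrived_passengers` is a hypothetical column name, and filling missing `children` values with 0 is an assumption about how gaps should be treated:

```python
import pandas as pd

# Hypothetical toy data standing in for df; the values are invented.
toy = pd.DataFrame({
    'country': ['PRT', 'PRT', 'GBR', 'GBR'],
    'adults': [2, 2, 1, 2],
    'children': [1.0, 0.0, 0.0, None],
    'babies': [0, 0, 0, 1],
    'is_canceled': [0, 1, 0, 0],
})

# Party size per booking; a missing children count is treated as 0 (assumption).
party_size = toy['adults'] + toy['children'].fillna(0) + toy['babies']

# Keep the party size for bookings that were not canceled, otherwise count 0.
toy['arrived_passengers'] = party_size.where(toy['is_canceled'] == 0, 0)

# Total arrived passengers per country, largest first.
by_country = (
    toy.groupby('country')['arrived_passengers']
    .sum()
    .nlargest(5)
    .reset_index()
)
print(by_country)
```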


#### Find the passenger who has the maximum ADR (Average Daily Rate)

In [None]:
# Look up the name in the row where 'adr' is largest
pd.DataFrame([(df.loc[df['adr'].idxmax(), 'name'], df['adr'].max())], columns=['passenger_name', 'price'], index=['max(ADR)'])

#### Average ADR across all bookings

In [None]:
pd.DataFrame([round(df['adr'].mean(), 2)], columns=['average_of_total_adrs'], index=['mean (ADR)'])

#### Find the average number of nights stayed

In [None]:
df['total_stays_in_nights'] = df['stays_in_week_nights'] + df['stays_in_weekend_nights']

In [None]:
df['total_stays_in_nights'].mean().round(2)

#### Find the name and email of people who made 5 special requests

In [None]:
df[df['total_of_special_requests'] == 5][['name','email']].reset_index()

#### Which last names occur most frequently?

In [None]:
# The second word of 'name' is taken as the last name.
# rename_axis() and reset_index(name=...) label the columns the same way on pandas 1.x and 2.x.
df['name'].str.split().str[1].value_counts().head().rename_axis('last_name').reset_index(name='value_counts')


#### Find the people who made reservations with the largest number of babies and children

In [None]:
df['total_babies_and_children'] = df['babies'] + df['children']
df[['name','email','phone-number','credit_card','hotel','total_babies_and_children']].sort_values('total_babies_and_children',ascending=False).head()

#### Find the phone area codes with the most reservations

In [None]:
# The first three characters of 'phone-number' are taken as the area code.
# rename_axis() and reset_index(name=...) label the columns the same way on pandas 1.x and 2.x.
df['phone-number'].str[:3].value_counts().head().rename_axis('phone_code').reset_index(name='value_counts')

### Exploratory Data Analysis

In [None]:
plt.figure(figsize=(6,4), dpi=150)
sns.countplot(x='hotel', data=df)

#### Chart analysis:
##### City Hotel has more bookings than Resort Hotel.


In [None]:
plt.figure(figsize=(6,4), dpi=150)
sns.barplot(x='customer_type', y='total_stays_in_nights', data=df)

#### Chart analysis:
##### Among the customer types, 'Contract' has the longest average stay.


In [None]:
plt.figure(figsize=(12,4), dpi=200)
sns.barplot(x='arrival_date_month', y='total_passengers',hue='hotel' ,data=df)
plt.legend(loc=(1.05, 1))
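
If `arrival_date_month` holds month names as strings (an assumption about this dataset), the x-axis follows the order in which the months first appear in the data rather than calendar order. One way to get calendar order is to convert the column to an ordered categorical before plotting. This is a minimal sketch on an invented toy frame; `month_order` and `monthly_mean` are hypothetical names:

```python
import pandas as pd

# Calendar order for English month names.
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical toy data standing in for df; the values are invented.
toy = pd.DataFrame({
    'arrival_date_month': ['July', 'August', 'January', 'July'],
    'total_passengers': [2, 3, 1, 4],
})

# An ordered categorical makes grouping and sorting follow the calendar.
toy['arrival_date_month'] = pd.Categorical(
    toy['arrival_date_month'], categories=month_order, ordered=True
)

# Mean passengers per booking for each month that occurs in the data.
monthly_mean = toy.groupby('arrival_date_month', observed=True)['total_passengers'].mean()
print(monthly_mean)
```

Alternatively, passing `order=month_order` to `sns.barplot` fixes the x-axis order without changing the column's dtype.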

#### Chart analysis:
##### In both hotels, the average number of passengers per booking is highest in July and August.
##### In every month, Resort Hotel has the higher average number of passengers per booking.