# <p style = "font-size : 42px; color : #000000 ; font-family : 'Oregon'; text-align : center; background-color : #dba514; border-radius: 5px 5px;"><strong>Hotel Booking Cancellation Prediction</strong></p>

<p align="center">
  <img style = "border:5px solid #ffb037;" src="https://5.imimg.com/data5/EF/GO/MY-17287433/hotel-bookings-500x500.jpg" alt="Project Banner" width="1000"/>
</p>

## <p style = "font-size : 25px; color : #ff0099; font-family : 'Comic Sans MS'; "><strong>About Notebook</strong></p> 

<div style="font-size: 16px; color: #ff9900; font-family: 'Comic Sans MS';">

This notebook focuses on preparing the hotel booking dataset for building robust predictive models. Through systematic `data preprocessing`, we aim to enhance data quality and model readiness by addressing missing values, handling noisy data, and transforming features as necessary. Key steps include:

- `Feature Engineering`: Selecting the most relevant features and deriving new ones where useful, reducing noise and improving model efficiency.
- `Handling Missing Values`: Addressing any missing data to ensure completeness and reliability.
- `Handling Noisy Data`: Cleaning the dataset to reduce inconsistencies or errors.
- `Encoding Categorical Features`: Converting categorical features to numerical values for model compatibility.
- `Normalization/Scaling`: Adjusting numerical features to comparable scales so that features with large ranges do not dominate scale-sensitive models.
- `Handling Imbalanced Data`: Assessing class distributions to determine if resampling techniques are needed.

This preprocessing pipeline prepares the data for modeling and sets a solid foundation for building accurate, reliable models.

</div>

### <p style = "font-size : 25px; color : #ff0099; font-family : 'Comic Sans MS'; "><strong>About Data</strong></p> 

<p style = "font-size : 16px; color : #ff9900; font-family : 'Comic Sans MS';">
    The data includes detailed information on hotel bookings, covering customer demographics, booking patterns, and reservation specifics.<br> 
    Key attributes include booking status, stay duration, guest count, booking channel, room assignment, and any special requests.<br> 
    It is suitable for analyzing booking trends, customer behaviors, and factors influencing cancellations and modifications.
</p> 

## <p style = "font-size : 25px; color : #ff0099; font-family : 'Comic Sans MS'; "><strong>Data Collection</strong></p> 

<p style = "font-size : 16px; color : #ff9900; font-family : 'Comic Sans MS';">
    The hotel booking data is stored in a MySQL database. We fetch it with a SQL query through SQLAlchemy and load it into a pandas DataFrame.
</p> 

### <div style = "font-size : 25px; color : #ff0099; font-family : 'Comic Sans MS'; "><strong>Dataset Description</strong></div>

<div align='center' style="border-radius:10px; padding: 15px; font-size:15px;">

| __Index__ | __Variable__ | __Description__ |
|   :---    |     :---     |       :---      |
| 1 | __hotel__ | Type of hotel (Resort Hotel, City Hotel) |
| 2 | __is_canceled__ | Reservation cancellation status (0 = not canceled, 1 = canceled) |
| 3 | __lead_time__ | Number of days between booking and arrival |
| 4 | __arrival_date_year__ | Year of arrival |
| 5 | __arrival_date_month__ | Month of arrival |
| 6 | __arrival_date_week_number__ | Week number of the year for arrival |
| 7 | __arrival_date_day_of_month__ | Day of the month of arrival |
| 8 | __stays_in_weekend_nights__ | Number of weekend nights (Saturday and Sunday) the guest stayed or booked |
| 9 | __stays_in_week_nights__ | Number of week nights the guest stayed or booked |
| 10 | __adults__ | Number of adults |
| 11 | __children__ | Number of children |
| 12 | __babies__ | Number of babies |
| 13 | __meal__ | Type of meal booked (BB, FB, HB, SC, Undefined) |
| 14 | __country__ | Country of origin of the guest |
| 15 | __market_segment__ | Market segment designation |
| 16 | __distribution_channel__ | Booking distribution channel |
| 17 | __is_repeated_guest__ | Whether the guest is a repeat customer (0 = not repeated, 1 = repeated) |
| 18 | __previous_cancellations__ | Number of previous bookings that were canceled by the customer |
| 19 | __previous_bookings_not_canceled__ | Number of previous bookings that were not canceled by the customer |
| 20 | __reserved_room_type__ | Type of reserved room |
| 21 | __assigned_room_type__ | Type of assigned room |
| 22 | __booking_changes__ | Number of changes made to the booking |
| 23 | __deposit_type__ | Type of deposit made (No Deposit, Refundable, Non Refund) |
| 24 | __agent__ | ID of the travel agent responsible for the booking |
| 25 | __company__ | ID of the company responsible for the booking |
| 26 | __days_in_waiting_list__ | Number of days the booking was in the waiting list |
| 27 | __customer_type__ | Type of customer (Transient, Contract, Transient-Party, Group) |
| 28 | __adr__ | Average Daily Rate (sum of lodging transactions divided by the number of staying nights) |
| 29 | __required_car_parking_spaces__ | Number of car parking spaces required |
| 30 | __total_of_special_requests__ | Number of special requests made |
| 31 | __reservation_status__ | Last reservation status (Check-Out, Canceled, No-Show) |
| 32 | __reservation_status_date__ | Date of the last reservation status |
| 33 | __name__ | Guest's name |
| 34 | __email__ | Guest's email address |
| 35 | __phone-number__ | Guest's phone number |
| 36 | __credit_card__ | Last four digits of the guest's credit card |
</div>

### <p style = "font-size : 40px; color : #f9858b ; font-family : 'Calibri'; text-align : center; background-color : #bdfff6; border-radius: 5px 5px;"><strong>Importing Libraries</strong></p> 

In [1]:
import warnings
warnings.filterwarnings("ignore")

# database connection
from sqlalchemy import create_engine

# data manipulation
import pandas as pd
import numpy as np

# visualization
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

# data preprocessing


### <p style = "font-size : 25px; color : #ff0099; font-family : 'Comic Sans MS'; "><strong>Load Dataset</strong></p> 

In [2]:
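# create the SQLAlchemy engine (URL format: dialect+driver://user:password@host:port/database)
engine = create_engine("mysql+mysqlconnector://projects:AIMLprojects1@127.0.0.1:3306/projects_db")
# open a connection, which also verifies that the server is reachable
conn = engine.connect()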
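
The connection URL above contains the database password in plain text. One common alternative is to read the credentials from environment variables. The sketch below only builds the URL string; the variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) and the helper `build_mysql_url` are hypothetical and not part of the original project.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=os.environ):
    """Build a SQLAlchemy-style MySQL URL from environment variables.

    The variable names are hypothetical; adapt them to your own setup.
    """
    user = env.get("DB_USER", "projects")
    password = env.get("DB_PASSWORD", "")
    host = env.get("DB_HOST", "127.0.0.1")
    port = env.get("DB_PORT", "3306")
    database = env.get("DB_NAME", "projects_db")
    # quote_plus escapes characters such as "@" or "/" that would break the URL
    return (
        f"mysql+mysqlconnector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Example with an explicit mapping instead of the real environment
example_url = build_mysql_url({"DB_USER": "projects", "DB_PASSWORD": "p@ss/word"})
print(example_url)
```

The returned string can be passed to `create_engine` in place of the literal URL, which keeps the password out of the notebook.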

In [3]:
query = "SELECT * FROM hotel_booking"

In [4]:
df = pd.read_sql(query, engine)
df.head()

|  | hotel | is_canceled | lead_time | arrival_date_year | arrival_date_month | arrival_date_week_number | arrival_date_day_of_month | stays_in_weekend_nights | stays_in_week_nights | adults | ... | customer_type | adr | required_car_parking_spaces | total_of_special_requests | reservation_status | reservation_status_date | name | email | phone-number | credit_card |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | Resort Hotel | 0 | 342 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 | Ernest Barnes | Ernest.Barnes31@outlook.com | 669-792-1661 | ************4322 |
| 1 | Resort Hotel | 0 | 737 | 2015 | July | 27 | 1 | 0 | 0 | 2 | ... | Transient | 0.0 | 0 | 0 | Check-Out | 2015-07-01 | Andrea Baker | Andrea_Baker94@aol.com | 858-637-6955 | ************9157 |
| 2 | Resort Hotel | 0 | 7 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 | Rebecca Parker | Rebecca_Parker@comcast.net | 652-885-2745 | ************3734 |
| 3 | Resort Hotel | 0 | 13 | 2015 | July | 27 | 1 | 0 | 1 | 1 | ... | Transient | 75.0 | 0 | 0 | Check-Out | 2015-07-02 | Laura Murray | Laura_M@gmail.com | 364-656-8427 | ************5677 |
| 4 | Resort Hotel | 0 | 14 | 2015 | July | 27 | 1 | 0 | 2 | 2 | ... | Transient | 98.0 | 0 | 1 | Check-Out | 2015-07-03 | Linda Hines | LHines@verizon.com | 713-226-5883 | ************5498 |


### <p style = "font-size : 40px; color : #f9858b ; font-family : 'Calibri'; text-align : center; background-color : #bdfff6; border-radius: 5px 5px;"><strong>Feature Engineering</strong></p> 
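
As one possible starting point, the sketch below drops the guest-identifying columns listed in the dataset description (`name`, `email`, `phone-number`, `credit_card`) together with `reservation_status`, which records the same outcome as the target `is_canceled`. It then derives two aggregate features. The choice of columns to drop and the derived names `total_nights` and `total_guests` are assumptions for illustration, and the toy frame stands in for `df`.

```python
import pandas as pd

# Toy rows with a subset of the described columns (values are illustrative)
toy = pd.DataFrame({
    "is_canceled": [0, 1, 0],
    "stays_in_weekend_nights": [0, 2, 1],
    "stays_in_week_nights": [1, 3, 0],
    "adults": [2, 1, 2],
    "children": [0, 1, 0],
    "babies": [0, 0, 1],
    "reservation_status": ["Check-Out", "Canceled", "Check-Out"],
    "name": ["A", "B", "C"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "phone-number": ["000", "111", "222"],
    "credit_card": ["****1", "****2", "****3"],
})

# Assumption: identifiers carry no predictive signal,
# and reservation_status leaks the target
drop_cols = ["name", "email", "phone-number", "credit_card", "reservation_status"]
features = toy.drop(columns=drop_cols)

# Hypothetical derived features: total length of stay and party size
features["total_nights"] = (
    features["stays_in_weekend_nights"] + features["stays_in_week_nights"]
)
features["total_guests"] = features[["adults", "children", "babies"]].sum(axis=1)

print(features[["total_nights", "total_guests"]])
```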

### <p style = "font-size : 40px; color : #f9858b ; font-family : 'Calibri'; text-align : center; background-color : #bdfff6; border-radius: 5px 5px;"><strong>Handling Missing Values</strong></p> 
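
A minimal sketch of this step, assuming that `children`, `country`, and `agent` are among the columns with gaps (the notebook does not state which columns are affected). It counts missing entries per column and then fills them with a simple, explicitly labeled rule for each column. The toy frame stands in for `df`.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "children": [0.0, np.nan, 2.0, 0.0],
    "country": ["PRT", None, "GBR", "PRT"],
    "agent": [9.0, np.nan, np.nan, 240.0],
})

# Count missing entries per column before deciding on a strategy
missing_counts = toy.isna().sum()
print(missing_counts)

cleaned = toy.copy()
# Assumption: a missing agent ID means the booking was made without an agent
cleaned["agent"] = cleaned["agent"].fillna(0).astype(int)
# Assumption: a missing child count means no children
cleaned["children"] = cleaned["children"].fillna(0).astype(int)
# Fill the categorical column with its most frequent value
cleaned["country"] = cleaned["country"].fillna(cleaned["country"].mode()[0])

print(cleaned)
```

Whether to fill, flag, or drop depends on how much of a column is missing; a column that is almost entirely empty may be better removed than imputed.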

### <p style = "font-size : 40px; color : #f9858b ; font-family : 'Calibri'; text-align : center; background-color : #bdfff6; border-radius: 5px 5px;"><strong>Handling Noisy Data</strong></p> 
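
One way to approach this step is to write down plausibility rules and drop the rows that violate them. The two rules below (at least one guest per booking, no negative `adr`) are assumptions chosen for illustration, not checks taken from the original notebook, and the toy frame stands in for `df`.

```python
import pandas as pd

toy = pd.DataFrame({
    "adults": [2, 0, 1, 2],
    "children": [0, 0, 0, 1],
    "babies": [0, 0, 0, 0],
    "adr": [75.0, 0.0, -6.38, 98.0],
})

# Rule 1 (assumption): a booking needs at least one guest
has_guests = toy[["adults", "children", "babies"]].sum(axis=1) > 0
# Rule 2 (assumption): the average daily rate cannot be negative
valid_adr = toy["adr"] >= 0

denoised = toy[has_guests & valid_adr].reset_index(drop=True)
print(f"dropped {len(toy) - len(denoised)} of {len(toy)} rows")
```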

### <p style = "font-size : 40px; color : #f9858b ; font-family : 'Calibri'; text-align : center; background-color : #bdfff6; border-radius: 5px 5px;"><strong>Encoding Categorical Features</strong></p> 
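
The sketch below shows two common encodings with pandas on a toy frame: an ordinal mapping for `arrival_date_month`, which has a natural calendar order, and one-hot encoding for `hotel` and `deposit_type`, which do not. Which encoding suits each column is an assumption here, not a decision recorded in the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "Resort Hotel"],
    "arrival_date_month": ["July", "January", "December"],
    "deposit_type": ["No Deposit", "Non Refund", "No Deposit"],
})

# Ordered category: map month names to 1..12 so calendar order is preserved
month_order = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]
month_to_num = {name: i for i, name in enumerate(month_order, start=1)}
encoded = toy.copy()
encoded["arrival_date_month"] = encoded["arrival_date_month"].map(month_to_num)

# Nominal categories: one-hot encode, one 0/1 column per category value
encoded = pd.get_dummies(encoded, columns=["hotel", "deposit_type"], dtype=int)

print(encoded)
```

High-cardinality columns such as `country` would produce many one-hot columns, so they may call for a different treatment (for example, grouping rare values first).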

### <p style = "font-size : 40px; color : #f9858b ; font-family : 'Calibri'; text-align : center; background-color : #bdfff6; border-radius: 5px 5px;"><strong>Normalizing/Scaling</strong></p> 
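
A minimal sketch of standardization (zero mean, unit variance) using plain pandas, assuming the data has already been split into training and test sets. The split, the column choice, and the use of standardization rather than min-max scaling are assumptions for illustration.

```python
import pandas as pd

train = pd.DataFrame({"lead_time": [0, 10, 20], "adr": [50.0, 100.0, 150.0]})
test = pd.DataFrame({"lead_time": [30], "adr": [100.0]})

# Learn the scaling statistics from the training split only
mean = train.mean()
std = train.std(ddof=0)  # population standard deviation

# Apply the same statistics to both splits to avoid leaking test information
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```

Computing the statistics on the training split alone keeps the test set unseen, which matters when the scaled data is later used to evaluate a model.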

### <p style = "font-size : 40px; color : #f9858b ; font-family : 'Calibri'; text-align : center; background-color : #bdfff6; border-radius: 5px 5px;"><strong>Handling Imbalanced Data</strong></p> 
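
The sketch below first measures the class balance of `is_canceled` and then, only as an illustration of one resampling option, applies random oversampling of the minority class with pandas. The toy data and the choice of oversampling are assumptions; the notebook does not state which technique, if any, is needed.

```python
import pandas as pd

toy = pd.DataFrame({
    "lead_time": [5, 40, 12, 300, 7, 90, 3, 21],
    "is_canceled": [0, 0, 0, 1, 0, 1, 0, 0],
})

# Step 1: measure the class balance of the target
class_share = toy["is_canceled"].value_counts(normalize=True)
print(class_share)

# Step 2 (only if needed): random oversampling of the minority class
counts = toy["is_canceled"].value_counts()
minority_label = counts.idxmin()
n_extra = int(counts.max() - counts.min())
extra_rows = toy[toy["is_canceled"] == minority_label].sample(
    n=n_extra, replace=True, random_state=42
)
balanced = pd.concat([toy, extra_rows], ignore_index=True)

print(balanced["is_canceled"].value_counts())
```

Resampling should be applied to the training split only, so that duplicated rows never appear in the data used for evaluation.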