# DYNAMIC PRICING OF HOTEL ROOMS

*Cristian Rubio Barato*

*Francisco Martínez Esteso*

*José Vicente García López*

*Víctor Ortega Gómez*


## TABLE OF CONTENTS

1. [EXPLORATORY DATA ANALYSIS](#1-exploratory-data-analysis)  
   1.1 [Analysis of Predictor Variables](#11-analysis-of-predictor-variables)  
   1.2 [Univariate Analysis](#12-univariate-analysis)  
   1.3 [Multivariate Analysis](#13-multivariate-analysis)  
   1.4 [Correlation of Continuous Variables](#14-correlation-of-continuous-variables)  
   1.5 [Importance of Continuous Variables](#15-importance-of-continuous-variables)  

2. [PREPROCESSING](#2-preprocessing)  
   2.1 [Data Cleaning](#21-data-cleaning)  
   2.2 [Data Pipeline](#22-data-pipeline)

## 1. EXPLORATORY DATA ANALYSIS

```python
import pandas as pd
```

```python
df = pd.read_csv('hotel_booking.csv')

# Split columns by dtype: text columns are treated as categorical, numeric ones as continuous
categorical_vars = df.select_dtypes(include=['object', 'category']).columns.tolist()
continuous_vars = df.select_dtypes(include=['int64', 'float64']).columns.tolist()

print("Categorical variables:", categorical_vars)
print("Continuous variables:", continuous_vars)
```

```
Categorical variables: ['hotel', 'arrival_date_month', 'meal', 'country', 'market_segment', 'distribution_channel', 'reserved_room_type', 'assigned_room_type', 'deposit_type', 'customer_type', 'reservation_status', 'reservation_status_date', 'name', 'email', 'phone-number', 'credit_card']
Continuous variables: ['is_canceled', 'lead_time', 'arrival_date_year', 'arrival_date_week_number', 'arrival_date_day_of_month', 'stays_in_weekend_nights', 'stays_in_week_nights', 'adults', 'children', 'babies', 'is_repeated_guest', 'previous_cancellations', 'previous_bookings_not_canceled', 'booking_changes', 'agent', 'company', 'days_in_waiting_list', 'adr', 'required_car_parking_spaces', 'total_of_special_requests']
```
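
`select_dtypes` classifies columns by dtype alone, so 0/1 flags such as `is_canceled` and `is_repeated_guest` land in the continuous list. A minimal sketch that separates binary columns using `nunique()`; the helper `split_binary` and the toy frame are hypothetical stand-ins for the real data.

```python
import pandas as pd

def split_binary(df: pd.DataFrame, numeric_cols: list[str]) -> tuple[list[str], list[str]]:
    """Split numeric columns into binary flags (at most two distinct values) and the rest."""
    binary = [c for c in numeric_cols if df[c].dropna().nunique() <= 2]
    other = [c for c in numeric_cols if c not in binary]
    return binary, other

# Hypothetical toy frame standing in for hotel_booking.csv
toy = pd.DataFrame({
    "is_canceled": [0, 1, 0, 1],
    "lead_time": [10, 200, 35, 7],
    "adr": [95.0, 120.5, 70.0, 210.0],
})
numeric = toy.select_dtypes(include=["int64", "float64"]).columns.tolist()
binary_vars, continuous_only = split_binary(toy, numeric)
print(binary_vars, continuous_only)  # ['is_canceled'] ['lead_time', 'adr']
```

Binary flags can then be treated like categorical indicators rather than scaled as continuous features.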


```python
# Count missing values per column
df.isnull().sum()
```

```
hotel                                  0
is_canceled                            0
lead_time                              0
arrival_date_year                      0
arrival_date_month                     0
arrival_date_week_number               0
arrival_date_day_of_month              0
stays_in_weekend_nights                0
stays_in_week_nights                   0
adults                                 0
children                               4
babies                                 0
meal                                   0
country                              488
market_segment                         0
distribution_channel                   0
is_repeated_guest                      0
previous_cancellations                 0
previous_bookings_not_canceled         0
reserved_room_type                     0
assigned_room_type                     0
booking_changes                        0
deposit_type                           0
agent                              16340
company
```
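
Raw counts are easier to judge as a share of all rows. A minimal sketch (the helper `missing_report` and the toy data are hypothetical) that lists only the columns with gaps, sorted by count, together with their percentage.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count and percentage per column, only for columns with gaps, largest first."""
    counts = df.isnull().sum()
    report = pd.DataFrame({"missing": counts, "pct": 100 * counts / len(df)})
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

# Hypothetical toy data
toy = pd.DataFrame({
    "children": [0.0, np.nan, 1.0, 0.0],
    "country": ["PRT", "GBR", None, None],
    "adr": [95.0, 120.5, 70.0, 210.0],
})
print(missing_report(toy))
```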

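```python
# Drop the agent/company IDs (agent alone has 16340 missing values), the week number and personal data
df.drop(columns=["agent", "company", "arrival_date_week_number", "email", "phone-number", "credit_card", "name"], inplace=True)
# Drop the rows that still have missing values (e.g. children, country)
df.dropna(inplace=True)
```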
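
`dropna()` without arguments removes every row that has a missing value in any column. Among the columns visible in the missing-value listing (which is cut off after `company`), only `children` (4) and `country` (488) still have gaps once `agent` and `company` are dropped. Naming them in `subset` makes the intent explicit and the row loss easy to audit. A sketch on hypothetical toy data; the subset is an assumption based on that listing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "children": [0.0, np.nan, 1.0, 0.0],
    "country": ["PRT", "GBR", None, "ESP"],
    "adr": [95.0, 120.5, 70.0, 210.0],
})

before = len(toy)
# Drop only the rows missing one of the columns known to have gaps
cleaned = toy.dropna(subset=["children", "country"])
print(f"Dropped {before - len(cleaned)} of {before} rows")
```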

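```python
# Parse reservation_status_date and keep only the date part
# (.dt.date returns Python date objects, so the column keeps dtype object)
df["reservation_status_date"] = pd.to_datetime(df["reservation_status_date"]).dt.date

# Extract the year of the reservation status date
df["reservation_status_year"] = pd.to_datetime(df["reservation_status_date"]).dt.year
```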
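
Because `.dt.date` produces Python `date` objects with dtype `object` (which is why `reservation_status_date` appears in the `describe(include="object")` summary of section 1.1), the year has to be re-parsed afterwards. Keeping the `datetime64` dtype lets the `.dt` accessor be used directly. A minimal sketch on hypothetical toy values:

```python
import pandas as pd

dates = pd.Series(["2015-10-21", "2016-03-05", "2017-08-31"])

as_date = pd.to_datetime(dates).dt.date   # Python date objects, dtype object
as_datetime = pd.to_datetime(dates)       # datetime64 dtype

print(as_date.dtype)                  # object
print(as_datetime.dt.year.tolist())   # [2015, 2016, 2017]
```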

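```python
# Convert arrival_date_month (month name) to its month number
df["arrival_date_month"] = pd.to_datetime(df["arrival_date_month"], format="%B").dt.month
# children was float because of its missing values; after dropna() it can be cast to int
df["children"] = df["children"].astype(int)
```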
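
An explicit lookup table is an alternative to parsing month names with `format="%B"`; it makes the expected spellings visible and fails loudly on unexpected values. A sketch assuming English month names such as "July"; `MONTHS` and `MONTH_TO_NUM` are hypothetical names.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
MONTH_TO_NUM = {name: i for i, name in enumerate(MONTHS, start=1)}

months = pd.Series(["July", "August", "December"])
month_num = months.map(MONTH_TO_NUM)

# Unknown spellings map to NaN; fail fast instead of silently keeping them
if month_num.isna().any():
    raise ValueError(f"Unrecognised month names: {months[month_num.isna()].unique()}")

month_num = month_num.astype(int)
print(month_num.tolist())  # [7, 8, 12]
```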

### 1.1 Analysis of Predictor Variables

```python
df.describe(include="object")
```

| Variable | count | unique | top | freq |
|---|---|---|---|---|
| hotel | 118898 | 2 | City Hotel | 79302 |
| meal | 118898 | 5 | BB | 91863 |
| country | 118898 | 177 | PRT | 48586 |
| market_segment | 118898 | 7 | Online TA | 56402 |
| distribution_channel | 118898 | 5 | TA/TO | 97730 |
| reserved_room_type | 118898 | 10 | A | 85601 |
| assigned_room_type | 118898 | 12 | A | 73863 |
| deposit_type | 118898 | 3 | No Deposit | 104163 |
| customer_type | 118898 | 4 | Transient | 89174 |
| reservation_status | 118898 | 3 | Check-Out | 74745 |
| reservation_status_date | 118898 | 926 | 2015-10-21 | 1461 |
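
The summary lists 10 reserved room types but 12 assigned ones, and type `A` is the most frequent in both columns with different counts: at least 85601 - 73863 = 11738 bookings reserved as `A` were assigned another type. A sketch of how the share of such mismatches could be measured; the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "reserved_room_type": ["A", "A", "D", "A", "E"],
    "assigned_room_type": ["A", "C", "D", "A", "F"],
})

mismatch = toy["reserved_room_type"] != toy["assigned_room_type"]
print(f"Share of bookings assigned a different room type: {mismatch.mean():.0%}")

# Cross-tabulate to see which reserved types are moved to which assigned types
print(pd.crosstab(toy["reserved_room_type"], toy["assigned_room_type"]))
```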


```python
df.describe()
```

| Variable | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| is_canceled | 118898.0 | 0.371352 | 0.483168 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| lead_time | 118898.0 | 104.311435 | 106.903309 | 0.0 | 18.0 | 69.0 | 161.0 | 737.0 |
| arrival_date_year | 118898.0 | 2016.157656 | 0.707459 | 2015.0 | 2016.0 | 2016.0 | 2017.0 | 2017.0 |
| arrival_date_month | 118898.0 | 6.552886 | 3.08705 | 1.0 | 4.0 | 7.0 | 9.0 | 12.0 |
| arrival_date_day_of_month | 118898.0 | 15.80088 | 8.780324 | 1.0 | 8.0 | 16.0 | 23.0 | 31.0 |
| stays_in_weekend_nights | 118898.0 | 0.928897 | 0.996216 | 0.0 | 0.0 | 1.0 | 2.0 | 16.0 |
| stays_in_week_nights | 118898.0 | 2.502145 | 1.900168 | 0.0 | 1.0 | 2.0 | 3.0 | 41.0 |
| adults | 118898.0 | 1.858391 | 0.578576 | 0.0 | 2.0 | 2.0 | 2.0 | 55.0 |
| children | 118898.0 | 0.104207 | 0.399172 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| babies | 118898.0 | 0.007948 | 0.09738 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| is_repeated_guest | 118898.0 | 0.032011 | 0.176029 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| previous_cancellations | 118898.0 | 0.087142 | 0.845869 | 0.0 | 0.0 | 0.0 | 0.0 | 26.0 |
| previous_bookings_not_canceled | 118898.0 | 0.131634 | 1.484672 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 |
| booking_changes | 118898.0 | 0.221181 | 0.652785 | 0.0 | 0.0 | 0.0 | 0.0 | 21.0 |
| days_in_waiting_list | 118898.0 | 2.330754 | 17.630452 | 0.0 | 0.0 | 0.0 | 0.0 | 391.0 |
| adr | 118898.0 | 102.003243 | 50.485862 | -6.38 | 70.0 | 95.0 | 126.0 | 5400.0 |
| required_car_parking_spaces | 118898.0 | 0.061885 | 0.244172 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 |
| total_of_special_requests | 118898.0 | 0.571683 | 0.792678 | 0.0 | 0.0 | 0.0 | 1.0 | 5.0 |
| reservation_status_year | 118898.0 | 2016.094535 | 0.71539 | 2014.0 | 2016.0 | 2016.0 | 2017.0 | 2017.0 |
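
Several extremes stand out: `adr` ranges from -6.38 to 5400, `adults` reaches 55 and `lead_time` 737. One common rule flags values outside Tukey's fences Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. With the `adr` quartiles above (Q1 = 70, Q3 = 126), IQR = 56 and 1.5 × IQR = 84, so the fences are 70 - 84 = -14 and 126 + 84 = 210. A minimal sketch of that rule; the function name `iqr_fences` and the toy values are hypothetical, and the factor 1.5 is the conventional choice rather than one fixed by this analysis.

```python
import pandas as pd

def iqr_fences(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return Tukey's lower and upper fences for a numeric series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical toy values chosen so the quartiles match the adr row above
adr = pd.Series([-6.38, 70.0, 95.0, 126.0, 5400.0])
low, high = iqr_fences(adr)
outliers = adr[(adr < low) | (adr > high)]
print(low, high, outliers.tolist())  # -14.0 210.0 [5400.0]
```

Note that the negative rate of -6.38 lies inside the fences, so a domain rule such as `adr >= 0` is still needed to catch it.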


### 1.2 Univariate Analysis

### 1.3 Multivariate Analysis
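
For pricing, one possible first multivariate view is how the average daily rate (`adr`) varies jointly by hotel and arrival month. A minimal sketch with `pivot_table` on hypothetical toy rows; using the mean as the aggregate is an assumption.

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"],
    "arrival_date_month": [7, 7, 7, 1, 1],
    "adr": [120.0, 140.0, 200.0, 60.0, 80.0],
})

# Rows: arrival month; columns: hotel; cells: mean adr
adr_by_month = toy.pivot_table(index="arrival_date_month", columns="hotel",
                               values="adr", aggfunc="mean")
print(adr_by_month)
```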

### 1.4 Correlation of Continuous Variables
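
A sketch that computes the Pearson correlation matrix of the continuous columns and lists the pairs whose absolute correlation exceeds a threshold, a quick way to spot redundant predictors. The helper `high_corr_pairs`, the 0.8 cut-off and the toy data are assumptions.

```python
from itertools import combinations

import pandas as pd

def high_corr_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return column pairs with |Pearson r| above threshold, strongest first."""
    corr = df.corr(method="pearson")
    # Each unordered pair once, diagonal excluded
    pairs = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    strong = [(a, b, round(float(r), 3)) for a, b, r in pairs if abs(r) > threshold]
    return sorted(strong, key=lambda t: abs(t[2]), reverse=True)

# Hypothetical toy data
toy = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016, 2016, 2017],
    "reservation_status_year": [2015, 2015, 2016, 2016, 2017],
    "adr": [100.0, 80.0, 120.0, 90.0, 110.0],
})
print(high_corr_pairs(toy))
```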

### 1.5 Importance of Continuous Variables
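
The section does not state which target importance is measured against. Assuming `adr` as the target, given the pricing goal, a simple first ranking orders the other continuous variables by absolute Spearman correlation with it. Here Spearman is computed as Pearson correlation on average ranks, which avoids a SciPy dependency. The helper `rank_by_spearman` and the toy data are hypothetical. Model-based measures such as permutation importance would be the usual next step.

```python
import pandas as pd

def rank_by_spearman(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank features by |Spearman rho| with the target (Pearson on average ranks)."""
    ranks = df.rank(method="average")
    rho = ranks.corr()[target].drop(target)
    return rho.reindex(rho.abs().sort_values(ascending=False).index)

# Hypothetical toy data
toy = pd.DataFrame({
    "lead_time": [5, 40, 120, 200, 300],
    "total_of_special_requests": [0, 2, 1, 0, 1],
    "adr": [150.0, 120.0, 100.0, 90.0, 60.0],
})
print(rank_by_spearman(toy, target="adr"))
```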

## 2. PREPROCESSING

### 2.1 Data Cleaning
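
The summary statistics in section 1.1 show a negative `adr` (minimum -6.38), and `adults`, `children` and `babies` each have a minimum of 0, so some bookings may have no guests at all. A sketch of a filter that removes such rows. Whether these bookings should be dropped is a modelling decision; the rules and the helper `drop_invalid_bookings` are assumptions.

```python
import pandas as pd

def drop_invalid_bookings(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows with a negative rate or no guests (hypothetical cleaning rules)."""
    guests = df["adults"] + df["children"] + df["babies"]
    valid = (df["adr"] >= 0) & (guests > 0)
    return df.loc[valid].reset_index(drop=True)

# Hypothetical toy data
toy = pd.DataFrame({
    "adults": [2, 0, 1, 2],
    "children": [0, 0, 1, 0],
    "babies": [0, 0, 0, 0],
    "adr": [95.0, 80.0, -6.38, 120.0],
})
clean = drop_invalid_bookings(toy)
print(clean)
```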

### 2.2 Data Pipeline
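
A data pipeline chains the preparation steps so they always run in the same order. A pandas-only sketch using `DataFrame.pipe` that drops unused columns and one-hot encodes categoricals. The step functions and toy data are hypothetical. For modelling, scikit-learn's `Pipeline` and `ColumnTransformer` would be the usual choice, since they fit encoders on the training data only.

```python
import pandas as pd

def drop_unused(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Drop columns not used as predictors, ignoring any that are already gone."""
    return df.drop(columns=cols, errors="ignore")

def encode_categoricals(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """One-hot encode the given categorical columns as 0/1 integers."""
    return pd.get_dummies(df, columns=cols, dtype=int)

# Hypothetical toy data
raw = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "meal": ["BB", "HB", "BB"],
    "lead_time": [10, 200, 35],
    "name": ["a", "b", "c"],
    "adr": [95.0, 120.5, 70.0],
})

prepared = (
    raw.pipe(drop_unused, cols=["name"])
       .pipe(encode_categoricals, cols=["hotel", "meal"])
)
print(prepared.columns.tolist())
```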