# Exploratory Data Analysis

The project data is provided as CSV files, which are hard to read as raw text. As usual, I'll load each file with pandas, inspect it, and fix problems as needed.

In [1]:
import pandas as pd

# Required Datasets

## Amenities

In [2]:
df_amenities = pd.read_csv('../data/csv/amenities.csv').set_index('amenity_id')

Looking at the amenities table, some of it is confusing. Is there a specific meaning to "booking required"? Some of the amenities say "by appointment only" but don't require booking, which seems contradictory. But without a subject matter expert on hand, there's no way to know.

In [3]:
df_amenities

amenity_id,category,name,price,duration,description,availability,location,booking_required,min_notice_hours
AM000001,Spa Services,Swedish Massage,120,60,Experience pure relaxation with our Swedish Ma...,24/7,Spa & Wellness Center,True,2
AM000002,Spa Services,Deep Tissue Massage,140,60,"Indulge in our signature Deep Tissue Massage, ...",6:00-22:00,Spa & Wellness Center,False,4
AM000003,Spa Services,Couples Massage Experience,280,90,Indulge in our signature Couples Massage Exper...,By appointment only,Spa & Wellness Center,False,0
AM000004,Spa Services,Hot Stone Therapy,160,90,Experience pure relaxation with our Hot Stone ...,6:00-22:00,Pool Area,False,1
AM000005,Spa Services,Luxury Facial Treatment,180,90,Indulge in our signature Luxury Facial Treatme...,24/7,Main Building,False,24
AM000006,Spa Services,Deluxe Manicure & Pedicure,120,120,Indulge in our signature Deluxe Manicure & Ped...,By appointment only,Main Building,False,4
AM000007,Spa Services,Aromatherapy Journey,150,75,Experience pure relaxation with our Aromathera...,6:00-22:00,Pool Area,False,0
AM000008,Fitness Services,Personal Training Session,80,45,Transform your fitness journey with our Person...,By appointment only,Pool Area,False,1
AM000009,Fitness Services,Yoga Class,40,60,Transform your fitness journey with our Yoga C...,24/7,In-room,True,1
AM000010,Fitness Services,Group Fitness Class,35,45,Achieve your wellness goals through our Group ...,24/7,Main Building,True,2
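
To see how widespread the apparent contradiction is, one option is to filter for amenities listed as "By appointment only" whose booking_required flag is False. The sketch below rebuilds a few rows from the table above by hand so that it runs on its own; `sample`, `mask`, and `contradictory` are illustrative names, and in the notebook the same mask would be applied to df_amenities.

```python
import pandas as pd

# A few rows copied by hand from the table above (not the full dataset).
sample = pd.DataFrame(
    {
        "name": ["Swedish Massage", "Couples Massage Experience",
                 "Deluxe Manicure & Pedicure", "Yoga Class"],
        "availability": ["24/7", "By appointment only",
                         "By appointment only", "24/7"],
        "booking_required": [True, False, False, True],
    },
    index=pd.Index(["AM000001", "AM000003", "AM000006", "AM000009"],
                   name="amenity_id"),
)

# Appointment-only amenities that nevertheless do not require booking.
mask = sample["availability"].eq("By appointment only") & ~sample["booking_required"]
contradictory = sample.loc[mask, ["name", "availability", "booking_required"]]
print(contradictory)
```

A list like this would be a reasonable thing to hand to a subject matter expert if one becomes available.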


In [4]:
df_amenities.to_pickle('../data/pandas/amenities.pkl')

## Customers

In [5]:
df_customers = pd.read_csv('../data/csv/customers.csv').set_index('customer_id')

Note that customers have preferences, which we may want to take into account when making recommendations. Each customer also has a language listed, but many of these are obscure, so the bot can't realistically support all of them.

In [6]:
df_customers

customer_id,first_name,last_name,email,phone,address,preferences,nationality,language,loyalty_tier
1,Robert,Smith,robert.smith@wyatt-huynh.com,001-625-164-8298,"698 Eddie Stravenue, East Marymouth, NH 45002","[""Turndown service"", ""Gluten-free options"", ""S...",Anguilla,Quechua,Gold
2,Tiffany,Garner,tiffany.garner@smith.com,001-924-161-9263x89503,"82496 Bob Expressway Suite 023, Amberburgh, MD...","[""King bed"", ""Kosher meals""]",Australia,Bihari languages,Titanium
3,Jennifer,Miller,jennifer.miller@mccormick.com,817.900.7596,"PSC 8351, Box 8217, APO AE 44482","[""Extra towels"", ""Late check-out"", ""Halal meal...",Djibouti,Guarani,Bronze
4,Frank,Roberts,frank.roberts@davis.info,(914)861-7623,"PSC 2055, Box 2622, APO AP 89621","[""Pet-friendly"", ""Anniversary celebration"", ""S...",Swaziland,Rundi,Bronze
5,Patricia,Choi,patricia.choi@holt.info,001-399-328-3594x2880,"58891 Eric Station Apt. 475, Parkerview, WA 65054","[""Turndown service"", ""Laundry service"", ""Valet...",Lesotho,Bashkir,Bronze
...,...,...,...,...,...,...,...,...,...
49996,Stacy,Harrison,stacy.harrison@snow.com,451-368-9921x96006,"126 Thomas Unions, North Selenahaven, SD 49492","[""Birthday celebration"", ""Concierge service"", ...",Slovakia (Slovak Republic),Ukrainian,Bronze
49997,Cheryl,Santiago,cheryl.santiago@stone-jimenez.com,+1-580-041-8918x34434,"067 Thomas Roads, Port Stevenside, MO 47990","[""Extra towels"", ""Extra pillows"", ""Business tr...",Turkmenistan,Panjabi,Bronze
49998,Kaitlyn,Davenport,kaitlyn.davenport@hunt-austin.com,(742)524-4647x339,"6226 David Extensions, Espinozafurt, ND 24552","[""Birthday celebration"", ""Dairy-free options"",...",Belarus,Afar,Platinum
49999,Grant,Mcintosh,grant.mcintosh@howard.com,902.465.0973x96638,"042 Jacob Glen, Port Brandy, OH 74179","[""Honeymoon package"", ""Garden view"", ""Twin beds""]",Bouvet Island (Bouvetoya),Kashmiri,Bronze
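
The preferences column appears to hold JSON-style arrays stored as strings (double-quoted items inside square brackets). Assuming that holds for every row, a small helper along these lines could turn them into Python lists before they are used for recommendations. `parse_preferences` is a hypothetical name, and falling back to an empty list on bad input is my own choice, not something the data dictates.

```python
import json

def parse_preferences(raw):
    """Return a list of preference strings, or [] if the value can't be parsed."""
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        # TypeError: not a string (e.g. None); ValueError: not valid JSON.
        return []
    return value if isinstance(value, list) else []

example = '["King bed", "Kosher meals"]'
print(parse_preferences(example))
```

In the notebook this could be applied column-wide with `df_customers['preferences'].apply(parse_preferences)`.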


Everything's non-null, so we're good to go.

In [7]:
df_customers.info()

<class 'pandas.core.frame.DataFrame'>
Index: 50000 entries, 1 to 50000
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   first_name    50000 non-null  object
 1   last_name     50000 non-null  object
 2   email         50000 non-null  object
 3   phone         50000 non-null  object
 4   address       50000 non-null  object
 5   preferences   50000 non-null  object
 6   nationality   50000 non-null  object
 7   language      50000 non-null  object
 8   loyalty_tier  50000 non-null  object
dtypes: object(9)
memory usage: 3.8+ MB


In [8]:
df_customers.to_pickle('../data/pandas/customers.pkl')

## FAQ

In [9]:
df_faq = pd.read_csv('../data/csv/faq_knowledge_base.csv').set_index('faq_id')

The FAQs have some information that may not be useful for us: category, subcategory, keywords, helpful_votes, and views. Otherwise, we just need to put the information in a vector database. The question remains, how do we tell when a relevant question is being asked?

In [10]:
df_faq

faq_id,category,subcategory,question,answer,keywords,last_updated,helpful_votes,views
FAQ000001,booking,reservations,How do I make a reservation?,You can make a reservation through our website...,"book, reserve, reservation, booking",2024-03-09,957,3813
FAQ000002,booking,check-in/out,What is the check-in/check-out time?,Check-in time is 3:00 PM and check-out time is...,"check in, check out, arrival, departure",2024-10-15,967,2793
FAQ000003,booking,modifications,Can I modify my booking?,"Yes, you can modify your booking through our w...","change, modify, update, edit",2024-12-05,443,1929
FAQ000004,booking,cancellations,What is the cancellation policy?,Free cancellation is available up to 48 hours ...,"cancel, refund, cancellation",2024-05-20,611,1300
FAQ000005,booking,special requests,Do you offer early check-in/late check-out?,"Subject to availability, early check-in and la...","early check-in, late check-out, extended stay",2024-12-18,239,3954
FAQ000006,amenities,facilities,What amenities are included?,"Our hotel offers complimentary Wi-Fi, fitness ...","amenities, facilities, services",2024-10-24,855,1860
FAQ000007,amenities,dining,Is breakfast included?,"Yes, breakfast is included with most room rate...","breakfast, dining, restaurant",2024-01-10,365,1943
FAQ000008,amenities,wellness,Do you have a fitness center?,"Yes, our fitness center is open 24/7 and featu...","fitness, gym, exercise",2024-10-05,988,1471
FAQ000009,amenities,business,Is there a swimming pool?,"Yes, we have both indoor and outdoor pools ope...","pool, swimming, recreation",2024-10-15,400,2881
FAQ000010,amenities,recreation,Do you offer spa services?,"Our full-service spa offers massages, facials,...","spa, wellness, treatments",2024-01-07,145,766
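
One common way to decide whether an incoming question is covered by the FAQ is to score it against every stored question and only answer when the best score clears a threshold. The sketch below uses a plain bag-of-words cosine similarity from the standard library as a stand-in for real embeddings and a vector database. The function names and the 0.5 threshold are assumptions for illustration and would need tuning against real queries.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_faq_match(query, questions, threshold=0.5):
    """Return (question, score) for the closest FAQ question, or None below the threshold."""
    if not questions:
        return None
    q_vec = bag_of_words(query)
    score, question = max((cosine(q_vec, bag_of_words(q)), q) for q in questions)
    return (question, score) if score >= threshold else None

# Three questions taken from the FAQ table above.
faq_questions = [
    "How do I make a reservation?",
    "What is the cancellation policy?",
    "Is there a swimming pool?",
]
print(best_faq_match("what is your cancellation policy", faq_questions))
print(best_faq_match("tell me a joke", faq_questions))
```

With a real embedding model the structure would stay the same: only `bag_of_words` and `cosine` would be swapped for the model's encoder and the vector store's similarity search.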


In [11]:
df_faq.to_pickle('../data/pandas/faq_knowledge_base.pkl')

## Room Availability

Booking a room is the most basic feature of the bot.

In [12]:
df_room_availability = pd.read_csv('../data/csv/room_availability.csv')

In [13]:
df_room_availability

,room_id,room_number,date,status,price,max_occupancy
0,RM000001,101,2025-01-04,Booked,,2
1,RM000001,101,2025-01-05,Booked,,2
2,RM000001,101,2025-01-06,Booked,,2
3,RM000001,101,2025-01-07,Booked,,2
4,RM000001,101,2025-01-08,Booked,,2
...,...,...,...,...,...,...
218995,RM000600,2030,2025-12-30,Booked,,6
218996,RM000600,2030,2025-12-31,Booked,,6
218997,RM000600,2030,2026-01-01,Booked,,6
218998,RM000600,2030,2026-01-02,Available,4617.6,6


In [14]:
df_room_availability.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 219000 entries, 0 to 218999
Data columns (total 6 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   room_id        219000 non-null  object 
 1   room_number    219000 non-null  int64  
 2   date           219000 non-null  object 
 3   status         219000 non-null  object 
 4   price          87377 non-null   float64
 5   max_occupancy  219000 non-null  int64  
dtypes: float64(1), int64(2), object(3)
memory usage: 10.0+ MB


The number of non-null prices (87,377) equals the number of rows whose status is "Available".

In [15]:
(df_room_availability['status']=="Available").sum()

np.int64(87377)
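
Matching counts don't by themselves guarantee that the priced rows are the available ones. A stricter check compares the two conditions row by row. The sketch below runs on a small hand-made sample with the same column names; in the notebook the same two masks would be built from df_room_availability.

```python
import pandas as pd

# Hand-made sample with a subset of the room_availability columns.
sample = pd.DataFrame(
    {
        "room_id": ["RM000001", "RM000001", "RM000600", "RM000600"],
        "status": ["Booked", "Maintenance", "Booked", "Available"],
        "price": [None, None, None, 4617.6],
    }
)

has_price = sample["price"].notna()
is_available = sample["status"].eq("Available")

# True only if a price is present exactly on the rows marked Available.
prices_match_availability = bool((has_price == is_available).all())
print(prices_match_availability)
```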

Check the unique statuses for the rooms

In [16]:
df_room_availability['status'].unique()

array(['Booked', 'Available', 'Maintenance'], dtype=object)

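Sanity check: the row count equals the number of distinct dates times the number of distinct rooms, which is consistent with every room being listed for every date.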

In [17]:
# Compute the counts once; this also avoids nesting the same quote type
# inside the f-strings, which is a SyntaxError before Python 3.12.
n_dates = df_room_availability['date'].nunique()
n_rooms = df_room_availability['room_number'].nunique()
print(f'Number of days listed: {n_dates}')
print(f'Number of rooms listed: {n_rooms}')
print(f'Product: {n_dates * n_rooms}')
print(f'Number of rows: {df_room_availability.shape[0]}')

Number of days listed: 365
Number of rooms listed: 600
Product: 219000
Number of rows: 219000
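
The product matching the row count is a necessary condition, not a sufficient one: a duplicated (room, date) pair could offset a missing one. A stricter version checks for duplicates and per-room coverage directly. This is a sketch on a hand-made sample; the column names match the CSV, the other names are illustrative.

```python
import pandas as pd

# Hand-made sample: two rooms, two dates each.
sample = pd.DataFrame(
    {
        "room_id": ["RM000001", "RM000001", "RM000002", "RM000002"],
        "date": ["2025-01-04", "2025-01-05", "2025-01-04", "2025-01-05"],
    }
)

n_dates = sample["date"].nunique()

# No (room, date) pair may appear twice...
no_duplicates = not sample.duplicated(subset=["room_id", "date"]).any()
# ...and every room must cover every distinct date.
full_coverage = bool((sample.groupby("room_id")["date"].nunique() == n_dates).all())

print(no_duplicates and full_coverage)
```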


In [18]:
df_room_availability.to_pickle('../data/pandas/room_availability.pkl')

## Rooms

In [13]:
df_rooms = pd.read_csv('../data/csv/rooms.csv').set_index('room_id')

Right away we can see an issue with view_type. "Evening Turndown Service" is not a view type. Also notice that some rooms have multiple view types listed under additional_amenities, but not under view_type. We'll have a bit of work to do there.

In [14]:
df_rooms

room_id,room_number,floor,type,square_feet,basic_amenities,additional_amenities,max_occupancy,bed_type,view_type,accessibility,status,last_renovation,base_rate,max_rate
RM000001,101,1,Standard,410,"['Air Conditioning', 'Smart TV', 'Premium Coff...",['City View'],2,Queen,City View,False,Available,2022-10-25,350,500
RM000002,102,1,Standard,361,"['Air Conditioning', 'Smart TV', 'Premium Coff...",['Evening Turndown Service'],2,Queen,Standard View,False,Available,2023-03-01,350,500
RM000003,103,1,Standard,431,"['Air Conditioning', 'Smart TV', 'Premium Coff...",['Courtyard View'],2,Double Queen,Courtyard View,False,Available,2022-08-10,350,500
RM000004,104,1,Standard,391,"['Air Conditioning', 'Smart TV', 'Premium Coff...","['City View', 'Courtyard View', 'Evening Turnd...",2,Queen,Courtyard View,True,Occupied,2023-09-06,350,500
RM000005,105,1,Standard,373,"['Air Conditioning', 'Smart TV', 'Premium Coff...","['Courtyard View', 'City View', 'Evening Turnd...",2,Double Queen,Evening Turndown Service,True,Occupied,2024-04-13,350,500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
RM000596,2026,20,Presidential Suite,2369,"['Air Conditioning', 'Multiple 75"" Smart TVs',...","['Private Pool', 'Private Terrace', 'Luxury Ca...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2024-10-01,3500,5000
RM000597,2027,20,Presidential Suite,2020,"['Air Conditioning', 'Multiple 75"" Smart TVs',...","['Private Terrace', 'Grand Piano', 'Private Gy...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2024-04-17,3500,5000
RM000598,2028,20,Presidential Suite,1883,"['Air Conditioning', 'Multiple 75"" Smart TVs',...","['Grand Piano', 'Private Terrace', 'Dedicated ...",6,King + Multiple Sofa Beds,Standard View,False,Occupied,2022-03-09,3500,5000
RM000599,2029,20,Presidential Suite,1643,"['Air Conditioning', 'Multiple 75"" Smart TVs',...","['Private Terrace', 'Private Chef Available', ...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2023-07-24,3500,5000


All items are non-null, which is convenient.

In [15]:
df_rooms.info()

<class 'pandas.core.frame.DataFrame'>
Index: 600 entries, RM000001 to RM000600
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   room_number           600 non-null    int64 
 1   floor                 600 non-null    int64 
 2   type                  600 non-null    object
 3   square_feet           600 non-null    int64 
 4   basic_amenities       600 non-null    object
 5   additional_amenities  600 non-null    object
 6   max_occupancy         600 non-null    int64 
 7   bed_type              600 non-null    object
 8   view_type             600 non-null    object
 9   accessibility         600 non-null    bool  
 10  status                600 non-null    object
 11  last_renovation       600 non-null    object
 12  base_rate             600 non-null    int64 
 13  max_rate              600 non-null    int64 
dtypes: bool(1), int64(6), object(7)
memory usage: 66.2+ KB


In [16]:
df_rooms.describe()

,room_number,floor,square_feet,max_occupancy,base_rate,max_rate
count,600.0,600.0,600.0,600.0,600.0,600.0
mean,1065.5,10.5,887.535,3.55,1070.0,1712.5
std,577.174266,5.771093,641.182981,1.245023,1051.876454,1505.987632
min,101.0,1.0,350.0,2.0,350.0,500.0
25%,583.25,5.75,466.0,3.0,500.0,750.0
50%,1065.5,10.5,578.0,3.0,500.0,750.0
75%,1547.75,15.25,1024.0,4.0,1000.0,2000.0
max,2030.0,20.0,2999.0,6.0,3500.0,5000.0


In [17]:
df_rooms.shape[0]

600

In [18]:
df_rooms['type'].unique()

array(['Standard', 'Deluxe', 'Suite', 'Presidential Suite'], dtype=object)

In [19]:
df_rooms['status'].unique()

array(['Available', 'Occupied', 'Maintenance'], dtype=object)

In [20]:
df_rooms['bed_type'].unique()

array(['Queen', 'Double Queen', 'King', 'Double King', 'King + Sofa Bed',
       'King + Multiple Sofa Beds'], dtype=object)

In [21]:
df_rooms['max_occupancy'].unique()

array([2, 3, 4, 6])

In [23]:
df_rooms['basic_amenities'].unique()

array(["['Air Conditioning', 'Smart TV', 'Premium Coffee Maker', 'Mini Fridge', 'Hair Dryer', 'In-Room Safe', 'Work Desk', 'High-Speed WiFi', 'Bathrobes', 'Slippers']",
       '[\'Air Conditioning\', \'55" Smart TV\', \'Nespresso Machine\', \'Mini Fridge\', \'Hair Dryer\', \'In-Room Safe\', \'Work Desk\', \'High-Speed WiFi\', \'Bluetooth Speaker\', \'Microwave\', \'Premium Bathrobes\', \'Designer Slippers\', \'Evening Turndown Service\']',
       '[\'Air Conditioning\', \'65" Smart TV\', \'Nespresso Machine\', \'Full-Size Refrigerator\', \'Hair Dryer\', \'In-Room Safe\', \'Executive Work Desk\', \'Ultra-High-Speed WiFi\', \'Bose Sound System\', \'Microwave\', \'Premium Bathrobes\', \'Designer Slippers\', \'Kitchenette\', \'Dining Area\', \'Living Room Area\', \'Guest Bathroom\', \'Evening Turndown Service\', \'Welcome Amenity\']',
       '[\'Air Conditioning\', \'Multiple 75" Smart TVs\', \'Professional Coffee Bar\', \'Full-Size Refrigerator\', \'Hair Dryer\', \'In-Room Safe\', \'Execu

The basic_amenities and additional_amenities columns hold lists serialized as strings. Convert them to Python lists so we can work with them.

In [25]:
import ast
df_rooms['basic_amenities'] = df_rooms['basic_amenities'].apply(ast.literal_eval)
df_rooms['additional_amenities'] = df_rooms['additional_amenities'].apply(ast.literal_eval)

In [26]:
df_rooms

room_id,room_number,floor,type,square_feet,basic_amenities,additional_amenities,max_occupancy,bed_type,view_type,accessibility,status,last_renovation,base_rate,max_rate
RM000001,101,1,Standard,410,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[City View],2,Queen,City View,False,Available,2022-10-25,350,500
RM000002,102,1,Standard,361,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[Evening Turndown Service],2,Queen,Standard View,False,Available,2023-03-01,350,500
RM000003,103,1,Standard,431,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[Courtyard View],2,Double Queen,Courtyard View,False,Available,2022-08-10,350,500
RM000004,104,1,Standard,391,"[Air Conditioning, Smart TV, Premium Coffee Ma...","[City View, Courtyard View, Evening Turndown S...",2,Queen,Courtyard View,True,Occupied,2023-09-06,350,500
RM000005,105,1,Standard,373,"[Air Conditioning, Smart TV, Premium Coffee Ma...","[Courtyard View, City View, Evening Turndown S...",2,Double Queen,Evening Turndown Service,True,Occupied,2024-04-13,350,500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
RM000596,2026,20,Presidential Suite,2369,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Pool, Private Terrace, Luxury Car Ser...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2024-10-01,3500,5000
RM000597,2027,20,Presidential Suite,2020,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Terrace, Grand Piano, Private Gym Equ...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2024-04-17,3500,5000
RM000598,2028,20,Presidential Suite,1883,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Grand Piano, Private Terrace, Dedicated Conci...",6,King + Multiple Sofa Beds,Standard View,False,Occupied,2022-03-09,3500,5000
RM000599,2029,20,Presidential Suite,1643,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Terrace, Private Chef Available, Priv...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2023-07-24,3500,5000
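
One thing to watch with the conversion cell: `ast.literal_eval` raises a ValueError when handed a value that is already a list, so re-running that cell on the converted DataFrame fails. A small guard makes the conversion safe to repeat. `to_list` is a hypothetical helper name, not part of the notebook.

```python
import ast

def to_list(value):
    """Parse a stringified Python list; pass through values that are already lists."""
    if isinstance(value, list):
        return value
    return ast.literal_eval(value)

raw = "['Air Conditioning', 'Smart TV', 'Work Desk']"
parsed = to_list(raw)
print(parsed)

# Calling it again on the parsed value is a no-op instead of an error.
print(to_list(parsed) is parsed)
```

In the notebook it would be used as `df_rooms['basic_amenities'].apply(to_list)`.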


There are issues with many rooms' view_type that we'll have to correct.

In [31]:
df_rooms['view_type'].unique()

array(['City View', 'Standard View', 'Courtyard View',
       'Evening Turndown Service', 'Pool View', 'Ocean View', 'Balcony',
       'Lounge Access', 'Soaking Tub', 'Steam Shower', 'Wine Fridge',
       'Corner View', 'Butler Service', 'Jacuzzi Tub',
       'Wraparound Balcony', 'Executive Lounge Access', 'Private Pool',
       'Private Chef Available', 'Grand Piano', 'Luxury Car Service',
       'Steam Room', 'Private Gym Equipment', 'Private Butler Service',
       'Private Terrace', 'Sauna', 'Dedicated Concierge',
       'Panoramic Ocean View'], dtype=object)

240 of the 600 rooms have a view_type that is not actually a view.

In [33]:
valid_views = {'City View', 'Standard View', 'Courtyard View', 'Pool View', 'Ocean View', 'Corner View', 'Panoramic Ocean View'}
df_rooms.loc[~df_rooms['view_type'].isin(valid_views), 'view_type']

room_id
RM000005    Evening Turndown Service
RM000015    Evening Turndown Service
RM000017    Evening Turndown Service
RM000018    Evening Turndown Service
RM000025    Evening Turndown Service
                      ...           
RM000588                       Sauna
RM000590      Private Chef Available
RM000594       Private Gym Equipment
RM000595          Luxury Car Service
RM000600      Private Butler Service
Name: view_type, Length: 240, dtype: object

Replace each room's view_type with a list of its valid views: the original view_type if it is valid, plus any valid views found in additional_amenities.

In [34]:
def get_room_views(row):
    # Note: uses valid_views from above
    room_views = set()
    if row['view_type'] in valid_views:
        room_views.add(row['view_type'])
        
    room_views |= set(row['additional_amenities']) & valid_views
    row['view_type'] = list(room_views)
    return row

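# apply() returns a new DataFrame, so assign it back to keep the corrected view_type
df_rooms = df_rooms.apply(get_room_views, axis=1)
df_rooms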

room_id,room_number,floor,type,square_feet,basic_amenities,additional_amenities,max_occupancy,bed_type,view_type,accessibility,status,last_renovation,base_rate,max_rate
RM000001,101,1,Standard,410,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[City View],2,Queen,[City View],False,Available,2022-10-25,350,500
RM000002,102,1,Standard,361,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[Evening Turndown Service],2,Queen,[Standard View],False,Available,2023-03-01,350,500
RM000003,103,1,Standard,431,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[Courtyard View],2,Double Queen,[Courtyard View],False,Available,2022-08-10,350,500
RM000004,104,1,Standard,391,"[Air Conditioning, Smart TV, Premium Coffee Ma...","[City View, Courtyard View, Evening Turndown S...",2,Queen,"[City View, Courtyard View]",True,Occupied,2023-09-06,350,500
RM000005,105,1,Standard,373,"[Air Conditioning, Smart TV, Premium Coffee Ma...","[Courtyard View, City View, Evening Turndown S...",2,Double Queen,"[Courtyard View, City View]",True,Occupied,2024-04-13,350,500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
RM000596,2026,20,Presidential Suite,2369,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Pool, Private Terrace, Luxury Car Ser...",6,King + Multiple Sofa Beds,[Standard View],False,Maintenance,2024-10-01,3500,5000
RM000597,2027,20,Presidential Suite,2020,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Terrace, Grand Piano, Private Gym Equ...",6,King + Multiple Sofa Beds,[Standard View],False,Maintenance,2024-04-17,3500,5000
RM000598,2028,20,Presidential Suite,1883,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Grand Piano, Private Terrace, Dedicated Conci...",6,King + Multiple Sofa Beds,[Standard View],False,Occupied,2022-03-09,3500,5000
RM000599,2029,20,Presidential Suite,1643,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Terrace, Private Chef Available, Priv...",6,King + Multiple Sofa Beds,[Standard View],False,Maintenance,2023-07-24,3500,5000
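
As an alternative to mutating each row inside `apply`, the corrected column can be built from the two existing columns and assigned in one step. The sketch below is a variant under my own assumptions, not the notebook's method: `collect_views` is a hypothetical helper, the sample rows are hand-made and modeled on the table above, and the views are sorted so the result doesn't depend on set ordering.

```python
import pandas as pd

VALID_VIEWS = {"City View", "Standard View", "Courtyard View", "Pool View",
               "Ocean View", "Corner View", "Panoramic Ocean View"}

def collect_views(view_type, additional_amenities):
    """Combine the listed view (if valid) with any views found among the amenities."""
    views = {view_type} & VALID_VIEWS
    views |= set(additional_amenities) & VALID_VIEWS
    return sorted(views)

# Hand-made sample: one clean row and one row with a non-view in view_type.
sample = pd.DataFrame(
    {
        "view_type": ["City View", "Evening Turndown Service"],
        "additional_amenities": [
            ["City View"],
            ["Courtyard View", "City View", "Evening Turndown Service"],
        ],
    },
    index=pd.Index(["RM000001", "RM000005"], name="room_id"),
)

# Assign the new column explicitly so the correction is kept.
sample["view_type"] = [
    collect_views(view, extras)
    for view, extras in zip(sample["view_type"], sample["additional_amenities"])
]
print(sample["view_type"].tolist())
```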


Check that no room is left without a view_type (the result is empty, so none are).

In [35]:
df_rooms[df_rooms['view_type'].apply(len) < 1]

room_id,room_number,floor,type,square_feet,basic_amenities,additional_amenities,max_occupancy,bed_type,view_type,accessibility,status,last_renovation,base_rate,max_rate


In [36]:
df_rooms.to_pickle('../data/pandas/rooms.pkl')

## Services

It's not clear what the difference between an amenity and a service is. In fact, the word "services" is used in the category name of most amenities above. Either way, I can have the bot provide information about services if I have time.

In [37]:
df_services = pd.read_csv('../data/csv/services.csv').set_index('service_id')

In [38]:
df_services

service_id,service_type,name,description,duration_minutes,price,department,booking_required,min_notice_hours
SV000001,Spa Treatment,Spa Treatment - 120 min,Swedish massage for 120 minutes using Modern t...,120,264.98,Spa & Wellness,True,4
SV000002,Spa Treatment,Spa Treatment - 60 min,Luxurious 60-minute Deep Tissue treatment feat...,60,232.03,Spa & Wellness,True,4
SV000003,Spa Treatment,Spa Treatment - 120 min,Premium Deep Tissue experience for 120 minutes...,120,251.99,Spa & Wellness,True,24
SV000004,Spa Treatment,Spa Treatment - 90 min,Luxurious 90-minute Hot Stone treatment featur...,90,157.55,Spa & Wellness,True,1
SV000005,Personal Training,Personal Training - 60 min,Custom 60-minute HIIT workout emphasizing Core...,60,133.13,Fitness Center,True,4
SV000006,Personal Training,Personal Training - 30 min,Personalized HIIT training for 30 minutes targ...,30,138.75,Fitness Center,True,4
SV000007,Personal Training,Personal Training - 60 min,Personalized Functional Fitness training for 6...,60,82.2,Fitness Center,True,24
SV000008,Personal Training,Personal Training - 60 min,Custom 60-minute Strength Training workout emp...,60,143.7,Fitness Center,True,2
SV000009,Wellness Consultation,Wellness Consultation - 60 min,In-depth Wellness Strategy session for 60 minu...,60,94.86,Spa & Wellness,True,1
SV000010,Wellness Consultation,Wellness Consultation - 45 min,Comprehensive 45-minute Nutrition Planning ass...,45,109.6,Spa & Wellness,True,2


In [77]:
df_services.to_pickle('../data/pandas/services.pkl')

# Non-Required Datasets

## Amenity Usage

This dataset is a secondary priority, but it's worth including because usage history can help build a picture of each customer's preferences, which I can use when interacting with the customer.

In [39]:
df_amenity_usage = pd.read_csv('../data/csv/amenity_usage.csv').set_index('usage_id')

In [40]:
df_amenity_usage

usage_id,customer_id,amenity_type,service_name,usage_date,duration_minutes,cost,payment_method,staff_id,staff_name,satisfaction_rating
AU000001,22293,Luxury Water Sports,Scuba Diving Experience,2025-01-02 12:01:57,240,400,Discover,ST000839,Joseph Myers,5
AU000002,44166,Entertainment Services,Live Music Performance,2024-10-01 12:30:01,120,800,Diners Club / Carte Blanche,ST000409,Sharon Stephenson,5
AU000003,35727,Room Service,Romantic Dinner Setup,2024-02-29 21:27:23,120,150,VISA 16 digit,ST000026,Breanna Santiago,3
AU000004,29773,Fitness Services,Group Fitness Class,2025-01-02 06:47:28,45,35,VISA 16 digit,ST000617,Christopher Willis,2
AU000005,1089,Entertainment Services,Private Movie Screening,2024-11-02 05:20:15,180,300,JCB 16 digit,ST000361,Riley Mason,2
...,...,...,...,...,...,...,...,...,...,...
AU049996,40901,Spa Services,Luxury Facial Treatment,2024-01-21 06:38:56,90,180,VISA 19 digit,ST000106,Christina Knight,2
AU049997,8167,Luxury Water Sports,Private Yacht Charter with Crew,2024-09-16 02:05:00,480,3000,Mastercard,ST000002,Lauren Hooper,2
AU049998,49633,Business Services,Business Center Access,2024-11-07 13:27:22,60,50,VISA 13 digit,ST000214,Sheila Manning,4
AU049999,46855,Business Services,Private Conference Room,2024-10-10 13:06:36,240,200,American Express,ST000681,Nicole Simpson,3


In [41]:
df_amenity_usage.info()

<class 'pandas.core.frame.DataFrame'>
Index: 50000 entries, AU000001 to AU050000
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   customer_id          50000 non-null  int64 
 1   amenity_type         50000 non-null  object
 2   service_name         50000 non-null  object
 3   usage_date           50000 non-null  object
 4   duration_minutes     50000 non-null  int64 
 5   cost                 50000 non-null  int64 
 6   payment_method       50000 non-null  object
 7   staff_id             50000 non-null  object
 8   staff_name           50000 non-null  object
 9   satisfaction_rating  50000 non-null  int64 
dtypes: int64(4), object(6)
memory usage: 4.2+ MB
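
One possible way to turn usage history into preferences is to aggregate per customer and amenity type, then keep the types the customer has rated well. The sketch below uses a hand-made sample with a subset of the columns. The rating threshold of 4 and the names `profile` and `liked` are assumptions for illustration.

```python
import pandas as pd

# Hand-made sample with a subset of the amenity_usage columns.
usage = pd.DataFrame(
    {
        "customer_id": [1089, 1089, 1089, 29773],
        "amenity_type": ["Spa Services", "Spa Services",
                         "Entertainment Services", "Fitness Services"],
        "satisfaction_rating": [5, 4, 2, 2],
    }
)

# Per customer and amenity type: how often it was used and how it was rated.
profile = (
    usage.groupby(["customer_id", "amenity_type"])["satisfaction_rating"]
    .agg(visits="count", mean_rating="mean")
    .reset_index()
)

# Keep only amenity types the customer rated well on average (arbitrary threshold).
liked = profile[profile["mean_rating"] >= 4]
print(liked)
```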


In [78]:
df_amenity_usage.to_pickle('../data/pandas/amenity_usage.pkl')

## Event Bookings

I probably won't do anything with this, as finding open slots for new events would take a lot of work. It also looks like the first row (EB000001) was accidentally deleted, taking the leading "E" of the next row's ID with it.

In [42]:
df_event_bookings = pd.read_csv('../data/csv/event_bookings.csv').set_index('event_booking_id')

In [43]:
df_event_bookings

event_booking_id,customer_id,event_type,space_id,event_date,duration_hours,attendees,setup_requirements,additional_setup_notes,setup_start_time,setup_completion_time
B000002,7827,Corporate Meeting,ES000003,2025-08-05 04:19:27,2,18,"Projector and screen, U-shaped table setup, Co...",,2025-08-05 00:19:27,2025-08-05 03:49:27
EB000003,32874,Anniversary,ES000004,2025-06-08 14:48:43,4,143,"Dance floor, Champagne station, Gift table, De...",,2025-06-08 12:48:43,2025-06-08 14:18:43
EB000004,35910,Anniversary,ES000002,2025-12-13 09:29:33,5,31,"Intimate table setup, Photo display area, Danc...",,2025-12-13 07:29:33,2025-12-13 08:59:33
EB000005,16931,Anniversary,ES000004,2025-09-25 11:57:58,3,35,"Gift table, Dance floor, Champagne station, Ph...",,2025-09-25 09:57:58,2025-09-25 11:27:58
EB000006,26370,Birthday,ES000001,2025-05-10 19:00:41,4,16,"Round table setup, Basic audio system, Dance f...",,2025-05-10 16:00:41,2025-05-10 18:30:41
...,...,...,...,...,...,...,...,...,...,...
EB049996,27738,Wedding,ES000002,2025-07-13 00:31:52,6,227,"Photo booth area, Dance floor installation, Au...",,2025-07-12 22:31:52,2025-07-13 00:01:52
EB049997,8898,Conference,ES000003,2025-11-29 02:02:11,6,46,"Power strips for laptops, Whiteboard or flipch...",,2025-11-29 00:02:11,2025-11-29 01:32:11
EB049998,42961,Anniversary,ES000003,2025-10-18 23:48:16,4,138,"Decorative lighting, Gift table, Dance floor, ...",,2025-10-18 20:48:16,2025-10-18 23:18:16
EB049999,28372,Conference,ES000004,2025-04-08 23:29:28,8,199,"Theater-style seating, Podium and microphone, ...",,2025-04-08 19:29:28,2025-04-08 22:59:28


In [44]:
# Restore the leading "E" that is missing from the first ID
df_event_bookings = df_event_bookings.rename(index={'B000002':'EB000002'})
df_event_bookings

event_booking_id,customer_id,event_type,space_id,event_date,duration_hours,attendees,setup_requirements,additional_setup_notes,setup_start_time,setup_completion_time
EB000002,7827,Corporate Meeting,ES000003,2025-08-05 04:19:27,2,18,"Projector and screen, U-shaped table setup, Co...",,2025-08-05 00:19:27,2025-08-05 03:49:27
EB000003,32874,Anniversary,ES000004,2025-06-08 14:48:43,4,143,"Dance floor, Champagne station, Gift table, De...",,2025-06-08 12:48:43,2025-06-08 14:18:43
EB000004,35910,Anniversary,ES000002,2025-12-13 09:29:33,5,31,"Intimate table setup, Photo display area, Danc...",,2025-12-13 07:29:33,2025-12-13 08:59:33
EB000005,16931,Anniversary,ES000004,2025-09-25 11:57:58,3,35,"Gift table, Dance floor, Champagne station, Ph...",,2025-09-25 09:57:58,2025-09-25 11:27:58
EB000006,26370,Birthday,ES000001,2025-05-10 19:00:41,4,16,"Round table setup, Basic audio system, Dance f...",,2025-05-10 16:00:41,2025-05-10 18:30:41
...,...,...,...,...,...,...,...,...,...,...
EB049996,27738,Wedding,ES000002,2025-07-13 00:31:52,6,227,"Photo booth area, Dance floor installation, Au...",,2025-07-12 22:31:52,2025-07-13 00:01:52
EB049997,8898,Conference,ES000003,2025-11-29 02:02:11,6,46,"Power strips for laptops, Whiteboard or flipch...",,2025-11-29 00:02:11,2025-11-29 01:32:11
EB049998,42961,Anniversary,ES000003,2025-10-18 23:48:16,4,138,"Decorative lighting, Gift table, Dance floor, ...",,2025-10-18 20:48:16,2025-10-18 23:18:16
EB049999,28372,Conference,ES000004,2025-04-08 23:29:28,8,199,"Theater-style seating, Podium and microphone, ...",,2025-04-08 19:29:28,2025-04-08 22:59:28


In [45]:
df_event_bookings.info()

<class 'pandas.core.frame.DataFrame'>
Index: 49999 entries, EB000002 to EB050000
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   customer_id             49999 non-null  int64 
 1   event_type              49999 non-null  object
 2   space_id                49999 non-null  object
 3   event_date              49999 non-null  object
 4   duration_hours          49999 non-null  int64 
 5   attendees               49999 non-null  int64 
 6   setup_requirements      49999 non-null  object
 7   additional_setup_notes  14923 non-null  object
 8   setup_start_time        49999 non-null  object
 9   setup_completion_time   49999 non-null  object
dtypes: int64(3), object(7)
memory usage: 4.2+ MB


In [79]:
df_event_bookings.to_pickle('../data/pandas/event_bookings.pkl')
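
The `info()` output above shows the three date columns stored as `object` (plain strings). A minimal sketch of converting them before any time arithmetic, using a toy frame that copies two rows from the preview rather than the real file. The `setup_in_order` check is my own sanity test, not something the dataset documents.

```python
import pandas as pd

# Toy frame mirroring two rows from the preview; the real table has 49,999 rows.
bookings = pd.DataFrame(
    {
        "event_date": ["2025-08-05 04:19:27", "2025-06-08 14:48:43"],
        "setup_start_time": ["2025-08-05 00:19:27", "2025-06-08 12:48:43"],
        "setup_completion_time": ["2025-08-05 03:49:27", "2025-06-08 14:18:43"],
    },
    index=pd.Index(["EB000002", "EB000003"], name="event_booking_id"),
)

# Parse the string columns into real timestamps.
date_cols = ["event_date", "setup_start_time", "setup_completion_time"]
for col in date_cols:
    bookings[col] = pd.to_datetime(bookings[col], format="%Y-%m-%d %H:%M:%S")

# Assumed sanity check: setup starts before it completes, and completes
# no later than the event itself.
setup_in_order = (
    (bookings["setup_start_time"] < bookings["setup_completion_time"])
    & (bookings["setup_completion_time"] <= bookings["event_date"])
)
```

If this were applied to `df_event_bookings`, it would need to run before the `to_pickle` call so the parsed dtypes are what gets saved.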

## Event Spaces

In [46]:
df_event_spaces = pd.read_csv('../data/csv/event_spaces.csv').set_index('space_id')

There are only four event spaces.

In [47]:
df_event_spaces

Unnamed: 0_level_0,name,location,capacity,square_feet,price_per_hour,features,layout_options,min_booking_hours,availability,catering_available,setup_time,cleanup_time,accessibility
space_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
ES000001,Grand Ballroom,Main Floor,500,10000,1000,"Stage, Dance Floor, Premium AV System, Chandel...","Theater, Banquet, Classroom, Reception",2,7:00-23:00,True,60,60,Full ADA Compliance
ES000002,Conference Room A,Business Level,100,2000,300,"Projector, Video Conference System, Whiteboard...","Boardroom, U-Shape, Classroom",1,24/7,True,30,30,Full ADA Compliance
ES000003,Garden Terrace,Outdoor,200,5000,500,"Garden Views, Covered Area, Outdoor Lighting, ...","Reception, Banquet, Ceremony",1,7:00-23:00,True,30,30,Partial Accessibility
ES000004,Rooftop Lounge,Top Floor,150,3000,600,"Ocean View, Bar Area, Lounge Furniture, Sound ...","Reception, Cocktail, Lounge",1,7:00-23:00,True,30,30,Full ADA Compliance


In [80]:
df_event_spaces.to_pickle('../data/pandas/event_spaces.pkl')
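
With only four spaces, it is cheap to check event bookings against each space's capacity. A rough sketch using toy frames built from rows shown in the two previews (the variable names `checked` and `over` are mine). In those rows, some bookings list more attendees than the space holds, so this is probably worth running on the full table.

```python
import pandas as pd

# Capacities copied from the event spaces preview.
spaces = pd.DataFrame(
    {
        "name": ["Grand Ballroom", "Conference Room A", "Garden Terrace", "Rooftop Lounge"],
        "capacity": [500, 100, 200, 150],
    },
    index=pd.Index(["ES000001", "ES000002", "ES000003", "ES000004"], name="space_id"),
)

# Four rows copied from the event bookings preview.
bookings = pd.DataFrame(
    {
        "space_id": ["ES000003", "ES000004", "ES000002", "ES000004"],
        "attendees": [18, 143, 227, 199],
    },
    index=pd.Index(
        ["EB000002", "EB000003", "EB049996", "EB049999"], name="event_booking_id"
    ),
)

# Look up each booking's space by space_id and compare head count to capacity.
checked = bookings.join(spaces[["name", "capacity"]], on="space_id")
checked["over_capacity"] = checked["attendees"] > checked["capacity"]
over = checked[checked["over_capacity"]]
```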

## Event Tracking

This table has nothing to do with event spaces. Rather, it tracks events tied to room bookings, such as booking creation, check-in, and housekeeping visits. I could work with it by giving the agent its own staff_id and appending a row whenever the agent books a room for a customer.

In [48]:
df_event_tracking = pd.read_csv('../data/csv/event_tracking.csv').set_index('event_id')

In [49]:
df_event_tracking

Unnamed: 0_level_0,booking_id,event_type,event_name,timestamp,details,staff_id,status
event_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
EVT000001,BK000001,Booking,Booking Created,2024-04-01 20:19:29,Initial booking created through website,ST000058,Completed
EVT000002,BK000001,Check-in,Guest Arrived,2024-04-21 20:19:29,Express check-in completed,ST000015,Completed
EVT000003,BK000001,Check-in,Key Issued,2024-04-21 20:24:29,Standard processing completed,ST000010,Completed
EVT000004,BK000001,Stay,Extra Amenities Delivered,2024-04-23 16:19:29,Follow-up scheduled if needed,ST000083,Completed
EVT000005,BK000001,Stay,VIP Service Provided,2024-04-24 05:19:29,Guest satisfaction confirmed,ST000044,Completed
...,...,...,...,...,...,...,...
EVT613225,BK050000,Stay,Noise Complaint,2025-02-07 22:35:23,Service delivered as requested,ST000049,Completed
EVT613226,BK050000,Stay,VIP Service Provided,2025-02-08 15:35:23,Guest satisfaction confirmed,ST000043,Completed
EVT613227,BK050000,Stay,Housekeeping Visit,2025-02-09 20:35:23,Service delivered as requested,ST000073,Completed
EVT613228,BK050000,Check-out,Final Bill Generated,2025-02-10 10:30:23,Express check-out processed,ST000010,Completed


In [50]:
df_event_tracking.info()

<class 'pandas.core.frame.DataFrame'>
Index: 613229 entries, EVT000001 to EVT613229
Data columns (total 7 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   booking_id  613229 non-null  object
 1   event_type  613229 non-null  object
 2   event_name  613229 non-null  object
 3   timestamp   613229 non-null  object
 4   details     613229 non-null  object
 5   staff_id    613229 non-null  object
 6   status      613229 non-null  object
dtypes: object(7)
memory usage: 37.4+ MB


In [81]:
df_event_tracking.to_pickle('../data/pandas/event_tracking.pkl')
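
The idea from the top of this section, giving the agent a staff_id and adding a row per booking, could look roughly like the sketch below. The `ST999999` ID, the `log_booking_created` helper, and the `details` text are all hypothetical. The ID scheme (highest existing number plus one) is an assumption about how `event_id` values are issued.

```python
import pandas as pd

# One row copied from the preview, standing in for the full table.
events = pd.DataFrame(
    {
        "booking_id": ["BK000001"],
        "event_type": ["Booking"],
        "event_name": ["Booking Created"],
        "timestamp": ["2024-04-01 20:19:29"],
        "details": ["Initial booking created through website"],
        "staff_id": ["ST000058"],
        "status": ["Completed"],
    },
    index=pd.Index(["EVT000001"], name="event_id"),
)

AGENT_STAFF_ID = "ST999999"  # hypothetical ID reserved for the agent


def log_booking_created(events, booking_id, timestamp, staff_id=AGENT_STAFF_ID):
    """Return a copy of `events` with one 'Booking Created' row appended."""
    # Assumed ID scheme: highest existing numeric suffix + 1, zero-padded to 6 digits.
    last = int(events.index.str.replace("EVT", "", regex=False).astype(int).max())
    new_id = f"EVT{last + 1:06d}"
    row = pd.DataFrame(
        {
            "booking_id": [booking_id],
            "event_type": ["Booking"],
            "event_name": ["Booking Created"],
            "timestamp": [timestamp],
            "details": ["Booking created by the agent"],  # hypothetical wording
            "staff_id": [staff_id],
            "status": ["Completed"],
        },
        index=pd.Index([new_id], name="event_id"),
    )
    return pd.concat([events, row])


updated = log_booking_created(events, "BK050001", "2025-02-11 09:00:00")
```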

## Feedback

I'll probably do nothing with this one. It's pretty far outside the scope of the requirements.

In [51]:
df_feedback = pd.read_csv('../data/csv/feedback.csv').set_index('feedback_id')

In [52]:
df_feedback

Unnamed: 0_level_0,booking_id,customer_id,rating,feedback_text,feedback_date,sentiment,category,subcategory,feedback_source,language,...,helpful_votes,satisfaction_scores,tags,status,response_required,response_text,staff_response,response_date,response_time_hours,resolved
feedback_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
FB000001,BK046730,48297,4,The room met standard expectations. air condit...,2024-12-12 13:00:19,neutral,room,cleanliness,email,de,...,60,"{'cleanliness': 5, 'service': 1, 'value': 3, '...","comfort, room quality, amenities",resolved,False,We appreciate your honest feedback about our r...,We appreciate your honest feedback about our r...,2024-12-13 20:00:19,31.0,True
FB000002,BK036590,26661,1,I was disappointed with the room experience. a...,2025-05-10 13:00:01,negative,room,maintenance,email,de,...,15,"{'cleanliness': 1, 'service': 4, 'value': 1, '...","comfort, maintenance",pending,True,We're sorry to hear about your issues with our...,We're sorry to hear about your issues with our...,2025-05-11 11:00:01,22.0,False
FB000003,BK001092,9972,5,What an outstanding service experience! effici...,2024-03-29 21:07:50,positive,service,valet,mobile,fr,...,44,"{'cleanliness': 1, 'service': 3, 'value': 2, '...",professionalism,resolved,True,We're delighted to hear about your experience ...,,2024-03-31 13:07:50,40.0,True
FB000004,BK034638,26872,5,What an outstanding dining experience! authent...,2025-11-07 11:00:08,positive,dining,menu variety,mobile,es,...,26,"{'cleanliness': 1, 'service': 3, 'value': 4, '...","menu, restaurant",resolved,True,Your kind words about our dining mean a lot to...,Your kind words about our dining mean a lot to...,2025-11-08 23:00:08,36.0,True
FB000005,BK011053,3094,4,The service exceeded all my expectations. quic...,2025-11-07 14:00:54,positive,service,front desk,in-person,es,...,96,"{'cleanliness': 5, 'service': 4, 'value': 4, '...","responsiveness, service quality, professionalism",resolved,False,We're delighted to hear about your experience ...,,2025-11-08 05:00:54,15.0,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
FB016662,BK008239,2428,4,I had an exceptional room experience during my...,2024-07-10 11:00:26,positive,room,cleanliness,mobile,zh,...,93,"{'cleanliness': 3, 'service': 3, 'value': 5, '...","housekeeping, amenities",resolved,False,Thank you for your wonderful feedback about ou...,,2024-07-11 09:00:26,22.0,True
FB016663,BK037149,36745,5,I had an exceptional amenities experience duri...,2025-02-26 16:00:53,positive,amenities,pool,mobile,de,...,75,"{'cleanliness': 3, 'service': 5, 'value': 2, '...","quality, amenities, availability",resolved,False,Your kind words about our amenities mean a lot...,,2025-02-27 10:00:53,18.0,True
FB016664,BK013042,29709,4,The dining was adequate for my needs. food tem...,2024-02-20 11:00:57,neutral,dining,value,in-person,es,...,68,"{'cleanliness': 2, 'service': 4, 'value': 1, '...","restaurant, menu",resolved,False,We appreciate your honest feedback about our d...,,2024-02-21 00:00:57,13.0,True
FB016665,BK030260,544,5,I had an exceptional service experience during...,2025-05-30 14:00:32,positive,service,housekeeping,in-person,en,...,65,"{'cleanliness': 5, 'service': 2, 'value': 1, '...",responsiveness,resolved,False,Your kind words about our service mean a lot t...,Your kind words about our service mean a lot t...,2025-05-31 13:00:32,23.0,True


In [53]:
df_feedback.info()

<class 'pandas.core.frame.DataFrame'>
Index: 16666 entries, FB000001 to FB016666
Data columns (total 21 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   booking_id           16666 non-null  object 
 1   customer_id          16666 non-null  int64  
 2   rating               16666 non-null  int64  
 3   feedback_text        16666 non-null  object 
 4   feedback_date        16666 non-null  object 
 5   sentiment            16666 non-null  object 
 6   category             16666 non-null  object 
 7   subcategory          16666 non-null  object 
 8   feedback_source      16666 non-null  object 
 9   language             16666 non-null  object 
 10  is_verified_stay     16666 non-null  bool   
 11  helpful_votes        16666 non-null  int64  
 12  satisfaction_scores  16666 non-null  object 
 13  tags                 16666 non-null  object 
 14  status               16666 non-null  object 
 15  response_required    16666 non-

In [82]:
df_feedback.to_pickle('../data/pandas/feedback.pkl')

## Payments

I may add this table if I have time. I have no idea how the processing fee is calculated, so I'd probably just choose an arbitrary percentage.

In [54]:
df_payments = pd.read_csv('../data/csv/payments.csv').set_index('payment_id')

In [55]:
df_payments

Unnamed: 0_level_0,booking_id,amount,payment_method,payment_provider,status,timestamp,processing_fee,total_charged
payment_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
PAY000001,BK000001,12550.217065,Credit Card,American Express,Completed,2024-03-28 20:19:29,364.256295,12914.473360
PAY000002,BK000002,1575.092632,Bank Transfer,Wire Transfer,Completed,2024-12-10 04:16:46,18.750926,1593.843559
PAY000003,BK000002,7875.463162,Bank Transfer,Wire Transfer,Completed,2024-12-13 04:16:46,93.754632,7969.217794
PAY000004,BK000003,3890.886356,Debit Card,MasterCard Debit,Completed,2024-08-23 15:36:00,58.563295,3949.449651
PAY000005,BK000004,7474.905837,Corporate Account,Corporate Card,Completed,2024-05-19 22:47:30,74.749058,7549.654895
...,...,...,...,...,...,...,...,...
PAY056829,BK049996,9326.472597,Corporate Account,Corporate Card,Completed,2025-03-14 06:15:43,93.264726,9419.737323
PAY056830,BK049997,3446.171316,Debit Card,Visa Debit,Completed,2024-05-05 10:24:06,51.892570,3498.063886
PAY056831,BK049998,7987.745199,Credit Card,MasterCard,Completed,2025-04-21 13:29:16,231.944611,8219.689809
PAY056832,BK049999,493.968508,Credit Card,Discover,Completed,2024-06-12 15:27:46,14.625087,508.593595


In [56]:
df_payments.info()

<class 'pandas.core.frame.DataFrame'>
Index: 56833 entries, PAY000001 to PAY056833
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   booking_id        56833 non-null  object 
 1   amount            56833 non-null  float64
 2   payment_method    56833 non-null  object 
 3   payment_provider  56833 non-null  object 
 4   status            56833 non-null  object 
 5   timestamp         56833 non-null  object 
 6   processing_fee    56833 non-null  float64
 7   total_charged     56833 non-null  float64
dtypes: float64(3), object(5)
memory usage: 3.9+ MB


In [83]:
df_payments.to_pickle('../data/pandas/payments.pkl')
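
Before settling on an arbitrary percentage for the processing fee, one option is to look at the effective rate already present in the data. The sketch below uses the first five rows of the preview as a toy frame. It only computes `processing_fee / amount` per payment method and checks that `total_charged` is the sum of the other two columns. It does not claim to recover the real fee formula, which may include a fixed component.

```python
import pandas as pd

# First five rows of the payments preview.
payments = pd.DataFrame(
    {
        "payment_method": [
            "Credit Card", "Bank Transfer", "Bank Transfer",
            "Debit Card", "Corporate Account",
        ],
        "amount": [12550.217065, 1575.092632, 7875.463162, 3890.886356, 7474.905837],
        "processing_fee": [364.256295, 18.750926, 93.754632, 58.563295, 74.749058],
        "total_charged": [12914.473360, 1593.843559, 7969.217794, 3949.449651, 7549.654895],
    },
    index=pd.Index(
        ["PAY000001", "PAY000002", "PAY000003", "PAY000004", "PAY000005"],
        name="payment_id",
    ),
)

# Does total_charged equal amount + processing_fee, up to display rounding?
difference = payments["amount"] + payments["processing_fee"] - payments["total_charged"]
totals_add_up = bool((difference.abs() < 1e-4).all())

# Effective fee rate per row, then its spread within each payment method.
payments["fee_rate"] = payments["processing_fee"] / payments["amount"]
rate_by_method = payments.groupby("payment_method")["fee_rate"].agg(["min", "max", "count"])
```

If `min` and `max` differ noticeably within a method on the full table, a flat percentage is probably not the whole story.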

## Promotions

I'll implement these if I have a chance. It may be complicated if the LLM can't understand the promotion terms directly from the text.

In [57]:
df_promos = pd.read_csv('../data/csv/promotions.csv').set_index('promo_id')

In [58]:
df_promos

Unnamed: 0_level_0,name,description,discount_type,discount_value,min_stay,applicable_room_types,start_date,end_date,blackout_dates,terms_conditions,booking_code,status
promo_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
PR000001,Early Bird Special,Book 60 days in advance and save 20% on your stay,Percentage,20,0,"Standard, Deluxe, Suite",2025-01-21,2025-05-24,,Cancellation policies apply; please refer to t...,PROMO4797,Ended
PR000002,Weekend Getaway,Special weekend rates including breakfast and ...,Package,15,2,"Deluxe, Suite",2025-01-19,2025-03-12,,Advance booking required; subject to availabil...,PROMO8026,Scheduled
PR000003,Summer Sale,Enjoy 25% off during summer months,Percentage,25,3,All,2025-01-08,2025-05-12,,Advance booking required; subject to availabil...,PROMO4398,Ended
PR000004,Long Stay Discount,Stay 7 nights or more and receive 30% off,Percentage,30,7,All,2025-01-11,2025-07-04,,Promotional rates are non-refundable and non-t...,PROMO8571,Scheduled
PR000005,Spa Package,$200 spa credit with minimum 2-night stay,Fixed Amount,200,2,"Deluxe, Suite, Presidential Suite",2025-01-31,2025-05-04,2025-01-04,Advance booking required; subject to availabil...,PROMO1558,Active


In [84]:
df_promos.to_pickle('../data/pandas/promotions.pkl')
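
If the LLM can't be trusted to read the promotion text, the structured columns could be applied in code instead. A minimal sketch, assuming `Percentage` means a percent off the stay total and `Fixed Amount` means a flat deduction. Neither is confirmed by the data (the Spa Package description reads more like a credit). The `discounted_total` helper is hypothetical, and it ignores dates, blackout dates, and status.

```python
import pandas as pd

# Two rows copied from the promotions preview (only the columns used here).
promos = pd.DataFrame(
    {
        "name": ["Summer Sale", "Spa Package"],
        "discount_type": ["Percentage", "Fixed Amount"],
        "discount_value": [25, 200],
        "min_stay": [3, 2],
        "applicable_room_types": ["All", "Deluxe, Suite, Presidential Suite"],
    },
    index=pd.Index(["PR000003", "PR000005"], name="promo_id"),
)


def discounted_total(total, nights, room_type, promo):
    """Apply one promotion row to a stay total; return it unchanged if the stay does not qualify."""
    rooms = promo["applicable_room_types"]
    # Split the comma-separated list so "Suite" does not match "Presidential Suite".
    room_ok = rooms == "All" or room_type in [r.strip() for r in rooms.split(",")]
    if nights < promo["min_stay"] or not room_ok:
        return total
    if promo["discount_type"] == "Percentage":
        return total * (1 - promo["discount_value"] / 100)
    if promo["discount_type"] == "Fixed Amount":
        # Assumption: subtract the amount from the bill, never going below zero.
        return max(total - promo["discount_value"], 0)
    # "Package" and any other type: meaning unclear from the table, so leave as-is.
    return total


summer = discounted_total(1000.0, 3, "Standard", promos.loc["PR000003"])
spa = discounted_total(1000.0, 2, "Deluxe", promos.loc["PR000005"])
```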

## Recommendations Knowledge Base

This isn't required, and using it would require correlating the hotel's location with the places being recommended. I don't know the hotel's location, so it's basically a no-go.

In [59]:
df_recommendations = pd.read_csv('../data/csv/recommendations_knowledge_base.csv').set_index('recommendation_id')

In [60]:
df_recommendations

Unnamed: 0_level_0,category,name,description,address,distance_km,price_range,rating,review_count,booking_required,seasonal,tags,keywords,last_verified
recommendation_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
REC000001,dining,International restaurant,Diverse menu with global flavors,"163 Nancy Streets, Janiceside, Kansas 72333",0.6,$,4.7,343,True,True,Romantic,"dining, international restaurant, $, popular, ...",2025-01-03
REC000002,dining,Local fine dining restaurant,Award-winning restaurant featuring seasonal lo...,"317 John Plains, Richardstad, Tennessee 62606",3.0,$$,4.3,988,True,False,Family Friendly,"dining, local fine dining restaurant, $$, loca...",2025-01-03
REC000003,activities,Adventure activity,Exciting outdoor adventure,"9197 Richard Wall, Christopherton, Louisiana 7...",2.6,$,4.8,273,True,True,Family Friendly,"activities, adventure activity, $, tourist spo...",2025-01-03
REC000004,entertainment,Art gallery,Contemporary art exhibitions,"8471 Smith Highway, West Tammyton, North Dakot...",1.0,$,4.4,565,True,False,"Budget, Family Friendly, Luxury","entertainment, art gallery, $, local favorite,...",2025-01-03
REC000005,activities,Adventure activity,Exciting outdoor adventure,"019 Troy Pike, Merritttown, Vermont 19256",1.4,$,4.9,296,False,True,"Business, Local Experience","activities, adventure activity, $, hidden gem,...",2025-01-03
...,...,...,...,...,...,...,...,...,...,...,...,...,...
REC000096,transportation,Walking tour,Guided walking tours,"3395 Emily Wells, Charleshaven, Nevada 28620",1.4,$,4.3,916,False,True,Romantic,"transportation, walking tour, $, popular, casu...",2025-01-03
REC000097,entertainment,Night club,Trendy nightlife spot,"4748 Michael Spring, North Kathleen, Tennessee...",2.5,$$,4.4,888,True,True,"Local Experience, Budget","entertainment, night club, $$, hidden gem, cas...",2025-01-03
REC000098,activities,Historical site tour,Guided tour of historic landmarks,"73785 Andres Road, South Aaron, Wisconsin 54881",8.8,$$,4.2,390,True,True,"Local Experience, Luxury, Business","activities, historical site tour, $$, popular,...",2025-01-03
REC000099,entertainment,Live music venue,Vibrant live music scene,"07104 Patrick Squares, Port Blakehaven, Michig...",1.8,$$$$,5.0,834,True,False,"Business, Budget","entertainment, live music venue, $$$$, popular...",2025-01-03


In [61]:
df_recommendations.info()

<class 'pandas.core.frame.DataFrame'>
Index: 100 entries, REC000001 to REC000100
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   category          100 non-null    object 
 1   name              100 non-null    object 
 2   description       100 non-null    object 
 3   address           100 non-null    object 
 4   distance_km       100 non-null    float64
 5   price_range       100 non-null    object 
 6   rating            100 non-null    float64
 7   review_count      100 non-null    int64  
 8   booking_required  100 non-null    bool   
 9   seasonal          100 non-null    bool   
 10  tags              100 non-null    object 
 11  keywords          100 non-null    object 
 12  last_verified     100 non-null    object 
dtypes: bool(2), float64(2), int64(1), object(8)
memory usage: 9.6+ KB


In [86]:
df_recommendations.to_pickle('../data/pandas/recommendations_knowledge_base.pkl')

## Restaurant Bookings

This is outside the scope of our requirements. I also know nothing about the restaurants, so I have no information to base bookings on. All I could do is set up some restaurant names and time slots, then accept party size information and special requests.

In [62]:
df_restaurant_bookings = pd.read_csv('../data/csv/restaurant_bookings.csv').set_index('booking_id')

In [63]:
df_restaurant_bookings

Unnamed: 0_level_0,customer_id,restaurant,booking_date,time_slot,party_size,special_requests,dietary_requirements
booking_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
RB000001,22768,Sunset Restaurant,2024-02-01 10:42:11,17:00,5,Booster seat needed; Bar seating; Custom menu ...,Gluten-free
RB000002,23381,Skyline Lounge,2025-09-17 20:28:11,19:00,2,Vegan menu options,Vegan
RB000003,5494,Skyline Lounge,2025-10-27 22:01:28,15:00,7,Private booth; Custom menu tasting,
RB000004,1444,Skyline Lounge,2024-11-26 06:38:57,14:00,4,Anniversary celebration - special dessert requ...,Vegetarian
RB000005,45122,Blue Horizon Main,2024-08-13 02:56:35,20:00,7,Nut-free preparation required; Extra space for...,Vegetarian
...,...,...,...,...,...,...,...
RB049996,48388,Sunset Restaurant,2024-01-09 21:11:18,13:00,8,Shellfish allergy - strict preparation require...,Gluten-free
RB049997,18492,Blue Horizon Main,2024-12-17 13:39:02,15:00,6,Private booth; Gluten-free options required; B...,Halal
RB049998,38430,Blue Horizon Main,2025-02-06 15:41:41,18:00,3,Business dinner - discrete service requested,Gluten-free
RB049999,14926,Skyline Lounge,2024-10-26 07:49:34,21:00,3,Romantic dinner setup requested; Kosher meal o...,Halal


In [64]:
df_restaurant_bookings.info()

<class 'pandas.core.frame.DataFrame'>
Index: 50000 entries, RB000001 to RB050000
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   customer_id           50000 non-null  int64 
 1   restaurant            50000 non-null  object
 2   booking_date          50000 non-null  object
 3   time_slot             50000 non-null  object
 4   party_size            50000 non-null  int64 
 5   special_requests      45842 non-null  object
 6   dietary_requirements  39912 non-null  object
dtypes: int64(2), object(5)
memory usage: 3.1+ MB


In [87]:
df_restaurant_bookings.to_pickle('../data/pandas/restaurant_bookings.pkl')

## Room Bookings

This is perhaps a secondary priority, but a good idea to include. I don't have a way to add all the information that goes into this table. However, I can use the information when interacting with the customer.

In [65]:
df_room_bookings = pd.read_csv('../data/csv/room_bookings.csv').set_index('booking_id')

In [66]:
df_room_bookings

Unnamed: 0_level_0,customer_id,room_number,room_type,check_in,check_out,duration_days,num_adults,num_children,loyalty_tier,special_amenities,special_requests,booking_status,payment_method,total_amount,points_earned
booking_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
BK000001,20963,1730,Suite,2024-04-21 20:19:29,2024-04-30 14:00:29,9,3,0,Gold,"Executive Lounge Access, Premium Welcome Drink",,Cancelled,JCB 16 digit,12550.217065,25100
BK000002,11750,928,Deluxe,2025-01-11 04:16:46,2025-01-24 16:16:46,12,1,2,Titanium,"Luxury Airport Transfer, Custom Room Fragrance",,Checked-in,Diners Club / Carte Blanche,7875.463162,31501
BK000003,2178,525,Deluxe,2024-08-24 15:36:00,2024-09-01 16:00:00,8,1,1,Platinum,"Premium Amenities, Airport Transfer",,Confirmed,VISA 19 digit,3890.886356,11672
BK000004,19877,113,Standard,2024-05-27 22:47:30,2024-06-10 13:00:30,14,2,0,Silver,"Premium WiFi, Fruit Basket",,Checked-in,JCB 16 digit,7474.905837,11212
BK000005,44458,1002,Deluxe,2025-11-11 02:01:34,2025-11-19 11:00:34,8,1,1,Bronze,"Basic WiFi, Welcome Drink",,Cancelled,JCB 16 digit,4426.074946,4426
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
BK049996,22089,2019,Presidential Suite,2025-04-03 06:15:43,2025-04-06 18:15:43,2,2,1,Titanium,"Premium Welcome Gift, Custom Room Fragrance","Near swimming pool, Connecting rooms needed, K...",Checked-in,Maestro,9326.472597,37305
BK049997,35986,614,Deluxe,2024-05-21 10:24:06,2024-05-29 16:00:06,8,1,0,Platinum,"Evening Turndown Service, Airport Transfer",,Confirmed,VISA 16 digit,3446.171316,10338
BK049998,2468,830,Deluxe,2025-05-03 13:29:16,2025-05-18 01:29:16,13,2,1,Titanium,"Private Butler Service, Personal Chef Available","Connecting rooms needed, High floor room reque...",Checked-in,VISA 16 digit,7987.745199,31950
BK049999,5728,206,Standard,2024-06-14 15:27:46,2024-06-15 11:00:46,1,2,0,Bronze,"Basic WiFi, Welcome Drink","High floor room requested, Memory foam pillows...",Cancelled,Discover,493.968508,493


Only special_requests has null values.

In [67]:
df_room_bookings.info()

<class 'pandas.core.frame.DataFrame'>
Index: 50000 entries, BK000001 to BK050000
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   customer_id        50000 non-null  int64  
 1   room_number        50000 non-null  int64  
 2   room_type          50000 non-null  object 
 3   check_in           50000 non-null  object 
 4   check_out          50000 non-null  object 
 5   duration_days      50000 non-null  int64  
 6   num_adults         50000 non-null  int64  
 7   num_children       50000 non-null  int64  
 8   loyalty_tier       50000 non-null  object 
 9   special_amenities  50000 non-null  object 
 10  special_requests   15138 non-null  object 
 11  booking_status     50000 non-null  object 
 12  payment_method     50000 non-null  object 
 13  total_amount       50000 non-null  float64
 14  points_earned      50000 non-null  int64  
dtypes: float64(1), int64(6), object(8)
memory usage: 6.1+ MB


In [88]:
df_room_bookings.to_pickle('../data/pandas/room_bookings.pkl')
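
The claim that only special_requests has null values can be read off the `info()` output, but it can also be checked directly. A small sketch on a toy frame with three rows based on the preview. The one non-null `special_requests` value is shortened, since the preview truncates it.

```python
import pandas as pd

# Three rows based on the room bookings preview (a subset of columns).
room_bookings = pd.DataFrame(
    {
        "room_type": ["Suite", "Deluxe", "Presidential Suite"],
        "loyalty_tier": ["Gold", "Titanium", "Titanium"],
        # Shortened value: the preview truncates the full request text.
        "special_requests": [None, None, "Near swimming pool"],
        "booking_status": ["Cancelled", "Checked-in", "Checked-in"],
    },
    index=pd.Index(["BK000001", "BK000002", "BK049996"], name="booking_id"),
)

# Count missing values per column and keep only the columns that have any.
null_counts = room_bookings.isna().sum()
columns_with_nulls = null_counts[null_counts > 0].index.tolist()
```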

## Service Appointments

Another secondary priority. I don't have all the information necessary to add rows to this table, but it could be useful to use when answering a customer's questions.

In [68]:
df_service_appointments = pd.read_csv('../data/csv/service_appointments.csv').set_index('appointment_id')

In [69]:
df_service_appointments

Unnamed: 0_level_0,booking_id,customer_id,service_type,department,appointment_date,appointment_end,duration_minutes,description,cost,status,staff_id,staff_name
appointment_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
SA000001,BK000097,9928,Wellness Consultation,Spa & Wellness,2025-01-30 09:00:00,2025-01-30 09:45:00,45,In-depth Health Coaching session for 45 minute...,170.16,Scheduled,ST000973,Staff Member
SA000002,BK000117,30189,Massage Therapy,Spa & Wellness,2025-01-20 09:00:00,2025-01-20 10:00:00,60,Specialized Relaxation therapy session for 60 ...,114.39,Scheduled,ST000879,Staff Member
SA000003,BK000117,30189,Spa Treatment,Spa & Wellness,2025-01-20 10:00:00,2025-01-20 12:00:00,120,Luxurious 120-minute Deep Tissue treatment fea...,248.11,Scheduled,ST000879,Staff Member
SA000004,BK000245,43542,Spa Treatment,Spa & Wellness,2025-01-28 09:00:00,2025-01-28 10:00:00,60,Premium Hot Stone experience for 60 minutes in...,182.03,Scheduled,ST000142,Staff Member
SA000005,BK000245,43542,Massage Therapy,Spa & Wellness,2025-01-28 10:00:00,2025-01-28 11:30:00,90,Targeted 90-minute Relaxation treatment employ...,168.93,Scheduled,ST000142,Staff Member
...,...,...,...,...,...,...,...,...,...,...,...,...
SA001366,BK049696,18145,Spa Treatment,Spa & Wellness,2025-01-15 13:45:00,2025-01-15 15:45:00,120,Luxurious 120-minute Aromatherapy treatment fe...,162.43,Scheduled,ST000056,Staff Member
SA001367,BK049722,6741,Spa Treatment,Spa & Wellness,2025-01-15 09:00:00,2025-01-15 10:30:00,90,Premium Aromatherapy experience for 90 minutes...,201.28,Scheduled,ST000298,Staff Member
SA001368,BK049722,6741,Wellness Consultation,Spa & Wellness,2025-01-14 11:15:00,2025-01-14 12:00:00,45,45-minute Wellness Strategy consultation focus...,114.13,Scheduled,ST000191,Staff Member
SA001369,BK049722,6741,Wellness Consultation,Spa & Wellness,2025-01-15 15:45:00,2025-01-15 16:30:00,45,45-minute Lifestyle Assessment consultation fo...,153.38,Scheduled,ST000056,Staff Member


In [70]:
df_service_appointments.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1370 entries, SA000001 to SA001370
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   booking_id        1370 non-null   object 
 1   customer_id       1370 non-null   int64  
 2   service_type      1370 non-null   object 
 3   department        1370 non-null   object 
 4   appointment_date  1370 non-null   object 
 5   appointment_end   1370 non-null   object 
 6   duration_minutes  1370 non-null   int64  
 7   description       1370 non-null   object 
 8   cost              1370 non-null   float64
 9   status            1370 non-null   object 
 10  staff_id          1370 non-null   object 
 11  staff_name        1370 non-null   object 
dtypes: float64(1), int64(2), object(9)
memory usage: 139.1+ KB


In [89]:
df_service_appointments.to_pickle('../data/pandas/service_appointments.pkl')
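
For answering a customer's questions, the simplest use of this table is a lookup by booking. A rough sketch, with a hypothetical `appointments_for_booking` helper and a toy frame of three rows copied from the preview.

```python
import pandas as pd

# Three rows copied from the service appointments preview (a subset of columns).
appointments = pd.DataFrame(
    {
        "booking_id": ["BK000117", "BK000117", "BK000245"],
        "service_type": ["Massage Therapy", "Spa Treatment", "Spa Treatment"],
        "appointment_date": [
            "2025-01-20 09:00:00", "2025-01-20 10:00:00", "2025-01-28 09:00:00",
        ],
        "duration_minutes": [60, 120, 60],
        "cost": [114.39, 248.11, 182.03],
        "status": ["Scheduled", "Scheduled", "Scheduled"],
    },
    index=pd.Index(["SA000002", "SA000003", "SA000004"], name="appointment_id"),
)
appointments["appointment_date"] = pd.to_datetime(appointments["appointment_date"])


def appointments_for_booking(appointments, booking_id):
    """Return one booking's appointments in chronological order."""
    mask = appointments["booking_id"] == booking_id
    cols = ["service_type", "appointment_date", "duration_minutes", "cost", "status"]
    return appointments.loc[mask, cols].sort_values("appointment_date")


result = appointments_for_booking(appointments, "BK000117")
total_cost = round(result["cost"].sum(), 2)
```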

## Staff

I won't use this. There's no good way to correlate it with anything besides the staff schedules table, which has its own problems.

In [71]:
df_staff = pd.read_csv('../data/csv/staff.csv').set_index('staff_id')

In [72]:
df_staff

Unnamed: 0_level_0,first_name,last_name,email,phone,hire_date,department,position,schedule,certifications
staff_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
ST000001,Melissa,Davis,melissa.davis@bluehorizon.com,(969)432-1482x1480,2022-07-24 03:17:39,Security,Security Manager,Night,
ST000002,Lauren,Hooper,lauren.hooper@bluehorizon.com,889-257-8085x08557,2021-07-08 01:05:20,Maintenance,Engineer,Afternoon,
ST000003,Holly,Owens,holly.owens@bluehorizon.com,178.127.9118x4786,2023-03-27 20:04:45,Front Desk,Guest Relations Manager,Afternoon,
ST000004,Kathryn,Brock,kathryn.brock@bluehorizon.com,075.768.1729x927,2020-03-22 02:32:30,Administration,Accountant,Afternoon,"Hospitality Management, Security License, Firs..."
ST000005,Brandy,Burke,brandy.burke@bluehorizon.com,131.302.7014,2021-02-26 04:11:22,Maintenance,Engineer,Night,"Massage Therapy License, First Aid"
...,...,...,...,...,...,...,...,...,...
ST000996,James,Stevenson,james.stevenson@bluehorizon.com,966.401.8159x25687,2021-07-19 00:21:21,Front Desk,Receptionist,Rotating,
ST000997,John,Walters,john.walters@bluehorizon.com,+1-296-295-3179x51567,2021-08-19 03:48:11,Administration,HR Specialist,Morning,
ST000998,Antonio,Burgess,antonio.burgess@bluehorizon.com,(602)807-4185x48168,2024-08-14 07:30:37,Front Desk,Receptionist,Night,First Aid
ST000999,Robert,Barrett,robert.barrett@bluehorizon.com,(836)075-9733,2024-01-08 22:20:42,Housekeeping,Supervisor,Rotating,


In [73]:
df_staff.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, ST000001 to ST001000
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   first_name      1000 non-null   object
 1   last_name       1000 non-null   object
 2   email           1000 non-null   object
 3   phone           1000 non-null   object
 4   hire_date       1000 non-null   object
 5   department      1000 non-null   object
 6   position        1000 non-null   object
 7   schedule        1000 non-null   object
 8   certifications  290 non-null    object
dtypes: object(9)
memory usage: 78.1+ KB


In [90]:
df_staff.to_pickle('../data/pandas/staff.pkl')

## Staff Schedules

This table is severely lacking. For me to make use of it, I need to be able to correlate it with the amenities and services, and it's missing most of that information.

In [74]:
df_staff_schedules = pd.read_csv('../data/csv/staff_schedules.csv').set_index(['staff_id', 'date'])

In [75]:
df_staff_schedules

Unnamed: 0_level_0,Unnamed: 1_level_0,shift,department
staff_id,date,Unnamed: 2_level_1,Unnamed: 3_level_1
ST000211,2025-01-04 03:03:28.336604,Morning,Security
ST000301,2025-01-04 03:03:28.336604,Morning,Security
ST000749,2025-01-04 03:03:28.336604,Morning,Security
ST000200,2025-01-04 03:03:28.336604,Afternoon,Security
ST000211,2025-01-04 03:03:28.336604,Afternoon,Security
...,...,...,...
ST000654,2025-02-03 03:03:28.336604,Afternoon,Spa & Wellness
ST000472,2025-02-03 03:03:28.336604,Afternoon,Spa & Wellness
ST000832,2025-02-03 03:03:28.336604,Night,Spa & Wellness
ST000103,2025-02-03 03:03:28.336604,Night,Spa & Wellness


In [76]:
df_staff_schedules.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 2046 entries, ('ST000211', '2025-01-04 03:03:28.336604') to ('ST000870', '2025-02-03 03:03:28.336604')
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   shift       2046 non-null   object
 1   department  2046 non-null   object
dtypes: object(2)
memory usage: 77.9+ KB


In [91]:
df_staff_schedules.to_pickle('../data/pandas/staff_schedules.pkl')
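
One concrete problem is visible in the preview: the same staff_id and date pair appears with more than one shift, so the (staff_id, date) index is not unique. A small sketch for surfacing those cases, using the first five preview rows as a toy frame (`repeated` and `shifts_per_key` are my own names).

```python
import pandas as pd

# First five rows of the staff schedules preview.
schedules = pd.DataFrame(
    {
        "staff_id": ["ST000211", "ST000301", "ST000749", "ST000200", "ST000211"],
        "date": ["2025-01-04 03:03:28.336604"] * 5,
        "shift": ["Morning", "Morning", "Morning", "Afternoon", "Afternoon"],
        "department": ["Security"] * 5,
    }
).set_index(["staff_id", "date"])

# Rows whose (staff_id, date) key occurs more than once.
is_repeat = schedules.index.duplicated(keep=False)
repeated = schedules[is_repeat]

# All shifts recorded for each (staff_id, date) key.
shifts_per_key = schedules.groupby(level=["staff_id", "date"])["shift"].agg(list)
```

Whether a repeated key means a double shift or a data error is not something the table can answer on its own.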