The raw data is hard to read as plain text, so we load it with pandas, inspect it, and fix problems as needed.

In [1]:
import pandas as pd

## Room Availability

In [2]:
df_room_availability = pd.read_csv('../data/csv/room_availability.csv')
df_room_availability.head(10)

,room_id,room_number,date,status,price,max_occupancy
0,RM000001,101,2025-01-04,Booked,,2
1,RM000001,101,2025-01-05,Booked,,2
2,RM000001,101,2025-01-06,Booked,,2
3,RM000001,101,2025-01-07,Booked,,2
4,RM000001,101,2025-01-08,Booked,,2
5,RM000001,101,2025-01-09,Booked,,2
6,RM000001,101,2025-01-10,Booked,,2
7,RM000001,101,2025-01-11,Booked,,2
8,RM000001,101,2025-01-12,Booked,,2
9,RM000001,101,2025-01-13,Booked,,2


In [3]:
df_room_availability.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 219000 entries, 0 to 218999
Data columns (total 6 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   room_id        219000 non-null  object 
 1   room_number    219000 non-null  int64  
 2   date           219000 non-null  object 
 3   status         219000 non-null  object 
 4   price          87377 non-null   float64
 5   max_occupancy  219000 non-null  int64  
dtypes: float64(1), int64(2), object(3)
memory usage: 10.0+ MB
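The `info()` output above shows `date` stored as `object` and `price` populated for only 87,377 of 219,000 rows. A minimal sketch of two follow-up checks, using a small made-up frame in place of the real file: parse the dates, then count missing prices per status. Whether missing prices line up with a particular status in the real data is something to verify, not something this sketch asserts.

```python
import pandas as pd

# Toy stand-in for df_room_availability; values are illustrative only.
toy = pd.DataFrame({
    "room_id": ["RM000001", "RM000001", "RM000002", "RM000002"],
    "date": ["2025-01-04", "2025-01-05", "2025-01-04", "2025-01-05"],
    "status": ["Booked", "Available", "Maintenance", "Available"],
    "price": [None, 350.0, None, 360.0],
})

# Convert the string dates to datetime64 so range filters and sorting work.
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d")

# Count rows with a missing price for each status.
missing_by_status = toy["price"].isna().groupby(toy["status"]).sum()
print(missing_by_status)
```

The same two lines should carry over to `df_room_availability` unchanged, assuming the real `date` column uses the `YYYY-MM-DD` format seen in the preview.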


Check the unique statuses for the rooms

In [4]:
df_room_availability['status'].unique()

array(['Booked', 'Available', 'Maintenance'], dtype=object)

Sanity check: every room should be listed for every date, so the row count should equal the number of days times the number of rooms.

In [5]:
# Use double quotes for the f-strings so the single-quoted column names
# inside the braces also parse on Python versions before 3.12.
n_days = df_room_availability['date'].nunique()
n_rooms = df_room_availability['room_number'].nunique()
print(f"Number of days listed: {n_days}")
print(f"Number of rooms listed: {n_rooms}")
print(f"Product: {n_days * n_rooms}")
print(f"Number of rows: {df_room_availability.shape[0]}")

Number of days listed: 365
Number of rooms listed: 600
Product: 219000
Number of rows: 219000
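Matching counts are necessary but not sufficient: a duplicated (room, date) pair could mask a missing one. A stricter sketch, shown on a toy frame; the helper name `is_complete_grid` is hypothetical and not part of the original notebook.

```python
import pandas as pd

def is_complete_grid(df, room_col="room_number", date_col="date"):
    """Return True if every room appears exactly once for every date."""
    n_rooms = df[room_col].nunique()
    n_dates = df[date_col].nunique()
    # With no duplicate pairs, rows == rooms * dates implies full coverage.
    no_duplicates = not df.duplicated(subset=[room_col, date_col]).any()
    return no_duplicates and len(df) == n_rooms * n_dates

toy = pd.DataFrame({
    "room_number": [101, 101, 102, 102],
    "date": ["2025-01-04", "2025-01-05", "2025-01-04", "2025-01-05"],
})
print(is_complete_grid(toy))

# Duplicate one pair in place of another: the row count still equals the product.
broken = toy.copy()
broken.loc[3, "date"] = "2025-01-04"
print(is_complete_grid(broken))
```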


## Rooms

In [6]:
df_rooms = pd.read_csv('../data/csv/rooms.csv').set_index('room_id')

Right away we can see an issue with view_type: "Evening Turndown Service" is an amenity, not a view. Also notice that some rooms list several views under additional_amenities but only one of them under view_type. We'll have a bit of work to do there.

In [7]:
df_rooms.head()

room_id,room_number,floor,type,square_feet,basic_amenities,additional_amenities,max_occupancy,bed_type,view_type,accessibility,status,last_renovation,base_rate,max_rate
RM000001,101,1,Standard,410,"['Air Conditioning', 'Smart TV', 'Premium Coff...",['City View'],2,Queen,City View,False,Available,2022-10-25,350,500
RM000002,102,1,Standard,361,"['Air Conditioning', 'Smart TV', 'Premium Coff...",['Evening Turndown Service'],2,Queen,Standard View,False,Available,2023-03-01,350,500
RM000003,103,1,Standard,431,"['Air Conditioning', 'Smart TV', 'Premium Coff...",['Courtyard View'],2,Double Queen,Courtyard View,False,Available,2022-08-10,350,500
RM000004,104,1,Standard,391,"['Air Conditioning', 'Smart TV', 'Premium Coff...","['City View', 'Courtyard View', 'Evening Turnd...",2,Queen,Courtyard View,True,Occupied,2023-09-06,350,500
RM000005,105,1,Standard,373,"['Air Conditioning', 'Smart TV', 'Premium Coff...","['Courtyard View', 'City View', 'Evening Turnd...",2,Double Queen,Evening Turndown Service,True,Occupied,2024-04-13,350,500


All items are non-null, which is convenient.

In [8]:
df_rooms.info()

<class 'pandas.core.frame.DataFrame'>
Index: 600 entries, RM000001 to RM000600
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   room_number           600 non-null    int64 
 1   floor                 600 non-null    int64 
 2   type                  600 non-null    object
 3   square_feet           600 non-null    int64 
 4   basic_amenities       600 non-null    object
 5   additional_amenities  600 non-null    object
 6   max_occupancy         600 non-null    int64 
 7   bed_type              600 non-null    object
 8   view_type             600 non-null    object
 9   accessibility         600 non-null    bool  
 10  status                600 non-null    object
 11  last_renovation       600 non-null    object
 12  base_rate             600 non-null    int64 
 13  max_rate              600 non-null    int64 
dtypes: bool(1), int64(6), object(7)
memory usage: 66.2+ KB


In [9]:
df_rooms.describe()

,room_number,floor,square_feet,max_occupancy,base_rate,max_rate
count,600.0,600.0,600.0,600.0,600.0,600.0
mean,1065.5,10.5,887.535,3.55,1070.0,1712.5
std,577.174266,5.771093,641.182981,1.245023,1051.876454,1505.987632
min,101.0,1.0,350.0,2.0,350.0,500.0
25%,583.25,5.75,466.0,3.0,500.0,750.0
50%,1065.5,10.5,578.0,3.0,500.0,750.0
75%,1547.75,15.25,1024.0,4.0,1000.0,2000.0
max,2030.0,20.0,2999.0,6.0,3500.0,5000.0


In [10]:
df_rooms.shape[0]

600

In [11]:
df_rooms['type'].unique()

array(['Standard', 'Deluxe', 'Suite', 'Presidential Suite'], dtype=object)

In [12]:
df_rooms['status'].unique()

array(['Available', 'Occupied', 'Maintenance'], dtype=object)

In [13]:
df_rooms['bed_type'].unique()

array(['Queen', 'Double Queen', 'King', 'Double King', 'King + Sofa Bed',
       'King + Multiple Sofa Beds'], dtype=object)

In [14]:
df_rooms['max_occupancy'].unique()

array([2, 3, 4, 6])

In [15]:
df_rooms['basic_amenities'].unique()

array(["['Air Conditioning', 'Smart TV', 'Premium Coffee Maker', 'Mini Fridge', 'Hair Dryer', 'In-Room Safe', 'Work Desk', 'High-Speed WiFi', 'Bathrobes', 'Slippers']",
       '[\'Air Conditioning\', \'55" Smart TV\', \'Nespresso Machine\', \'Mini Fridge\', \'Hair Dryer\', \'In-Room Safe\', \'Work Desk\', \'High-Speed WiFi\', \'Bluetooth Speaker\', \'Microwave\', \'Premium Bathrobes\', \'Designer Slippers\', \'Evening Turndown Service\']',
       '[\'Air Conditioning\', \'65" Smart TV\', \'Nespresso Machine\', \'Full-Size Refrigerator\', \'Hair Dryer\', \'In-Room Safe\', \'Executive Work Desk\', \'Ultra-High-Speed WiFi\', \'Bose Sound System\', \'Microwave\', \'Premium Bathrobes\', \'Designer Slippers\', \'Kitchenette\', \'Dining Area\', \'Living Room Area\', \'Guest Bathroom\', \'Evening Turndown Service\', \'Welcome Amenity\']',
       '[\'Air Conditioning\', \'Multiple 75" Smart TVs\', \'Professional Coffee Bar\', \'Full-Size Refrigerator\', \'Hair Dryer\', \'In-Room Safe\', \'Execu

The columns basic_amenities and additional_amenities hold lists stored as strings. Convert them to real Python lists so we can work with the individual items.

In [16]:
import ast
df_rooms['basic_amenities'] = df_rooms['basic_amenities'].apply(ast.literal_eval)
df_rooms['additional_amenities'] = df_rooms['additional_amenities'].apply(ast.literal_eval)
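`ast.literal_eval` only parses Python literals and does not execute code, which makes it a safer choice than `eval` for this conversion. One caveat: it raises an error if handed a value that is already a list, so re-running the cell above fails. A small sketch on a made-up cell value; the guard helper `to_list` is hypothetical.

```python
import ast

def to_list(value):
    """Parse a stringified list; pass through values that are already parsed."""
    return ast.literal_eval(value) if isinstance(value, str) else value

raw = "['Air Conditioning', 'Smart TV', 'Mini Fridge']"
parsed = to_list(raw)

print(type(raw).__name__, type(parsed).__name__)
print("Smart TV" in parsed)

# Applying the helper a second time is harmless.
print(to_list(parsed) is parsed)
```

Swapping `ast.literal_eval` for a guard like this in the `apply` calls would make the conversion cell safe to re-run.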

In [17]:
df_rooms

room_id,room_number,floor,type,square_feet,basic_amenities,additional_amenities,max_occupancy,bed_type,view_type,accessibility,status,last_renovation,base_rate,max_rate
RM000001,101,1,Standard,410,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[City View],2,Queen,City View,False,Available,2022-10-25,350,500
RM000002,102,1,Standard,361,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[Evening Turndown Service],2,Queen,Standard View,False,Available,2023-03-01,350,500
RM000003,103,1,Standard,431,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[Courtyard View],2,Double Queen,Courtyard View,False,Available,2022-08-10,350,500
RM000004,104,1,Standard,391,"[Air Conditioning, Smart TV, Premium Coffee Ma...","[City View, Courtyard View, Evening Turndown S...",2,Queen,Courtyard View,True,Occupied,2023-09-06,350,500
RM000005,105,1,Standard,373,"[Air Conditioning, Smart TV, Premium Coffee Ma...","[Courtyard View, City View, Evening Turndown S...",2,Double Queen,Evening Turndown Service,True,Occupied,2024-04-13,350,500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
RM000596,2026,20,Presidential Suite,2369,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Pool, Private Terrace, Luxury Car Ser...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2024-10-01,3500,5000
RM000597,2027,20,Presidential Suite,2020,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Terrace, Grand Piano, Private Gym Equ...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2024-04-17,3500,5000
RM000598,2028,20,Presidential Suite,1883,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Grand Piano, Private Terrace, Dedicated Conci...",6,King + Multiple Sofa Beds,Standard View,False,Occupied,2022-03-09,3500,5000
RM000599,2029,20,Presidential Suite,1643,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Terrace, Private Chef Available, Priv...",6,King + Multiple Sofa Beds,Standard View,False,Maintenance,2023-07-24,3500,5000


There are issues with many rooms' view_type that we'll have to correct.

In [18]:
df_rooms['view_type'].unique()

array(['City View', 'Standard View', 'Courtyard View',
       'Evening Turndown Service', 'Pool View', 'Ocean View', 'Balcony',
       'Lounge Access', 'Soaking Tub', 'Steam Shower', 'Wine Fridge',
       'Corner View', 'Butler Service', 'Jacuzzi Tub',
       'Wraparound Balcony', 'Executive Lounge Access', 'Private Pool',
       'Private Chef Available', 'Grand Piano', 'Luxury Car Service',
       'Steam Room', 'Private Gym Equipment', 'Private Butler Service',
       'Private Terrace', 'Sauna', 'Dedicated Concierge',
       'Panoramic Ocean View'], dtype=object)

240 of the 600 rooms have an incorrectly listed view_type

In [19]:
df_rooms.iloc[319]['additional_amenities']

['Ocean View', 'Soaking Tub', 'Lounge Access', 'Pool View', 'Balcony']

In [20]:
valid_views = {'City View', 'Standard View', 'Courtyard View', 'Pool View', 'Ocean View', 'Corner View', 'Panoramic Ocean View'}
df_rooms.loc[~df_rooms['view_type'].isin(valid_views), 'view_type']

room_id
RM000005    Evening Turndown Service
RM000015    Evening Turndown Service
RM000017    Evening Turndown Service
RM000018    Evening Turndown Service
RM000025    Evening Turndown Service
                      ...           
RM000588                       Sauna
RM000590      Private Chef Available
RM000594       Private Gym Equipment
RM000595          Luxury Car Service
RM000600      Private Butler Service
Name: view_type, Length: 240, dtype: object
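To see which non-view values are most common, the same mask can feed `value_counts()`. A sketch on a toy Series: the values and the short room ids are made up, and only `valid_views` is taken from the cell above.

```python
import pandas as pd

valid_views = {'City View', 'Standard View', 'Courtyard View', 'Pool View',
               'Ocean View', 'Corner View', 'Panoramic Ocean View'}

# Toy stand-in for df_rooms['view_type'].
view_type = pd.Series(
    ["City View", "Sauna", "Evening Turndown Service", "Sauna", "Ocean View"],
    index=["RM1", "RM2", "RM3", "RM4", "RM5"],
)

# True where the listed view_type is not a recognised view.
invalid_mask = ~view_type.isin(valid_views)
print(int(invalid_mask.sum()))
print(view_type[invalid_mask].value_counts())
```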

Replace each room's view_type with a list of its valid views: the original view_type if it is valid, plus any valid views found in additional_amenities.

In [21]:
def get_room_views(row):
    # Note: uses valid_views from above
    room_views = set()
    if row['view_type'] in valid_views:
        room_views.add(row['view_type'])
        
    room_views |= set(row['additional_amenities']) & valid_views
    row['view_type'] = list(room_views)
    return row

# apply() returns a new DataFrame, so assign it back to keep the fix.
df_rooms = df_rooms.apply(get_room_views, axis=1)
df_rooms

room_id,room_number,floor,type,square_feet,basic_amenities,additional_amenities,max_occupancy,bed_type,view_type,accessibility,status,last_renovation,base_rate,max_rate
RM000001,101,1,Standard,410,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[City View],2,Queen,[City View],False,Available,2022-10-25,350,500
RM000002,102,1,Standard,361,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[Evening Turndown Service],2,Queen,[Standard View],False,Available,2023-03-01,350,500
RM000003,103,1,Standard,431,"[Air Conditioning, Smart TV, Premium Coffee Ma...",[Courtyard View],2,Double Queen,[Courtyard View],False,Available,2022-08-10,350,500
RM000004,104,1,Standard,391,"[Air Conditioning, Smart TV, Premium Coffee Ma...","[City View, Courtyard View, Evening Turndown S...",2,Queen,"[Courtyard View, City View]",True,Occupied,2023-09-06,350,500
RM000005,105,1,Standard,373,"[Air Conditioning, Smart TV, Premium Coffee Ma...","[Courtyard View, City View, Evening Turndown S...",2,Double Queen,"[Courtyard View, City View]",True,Occupied,2024-04-13,350,500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
RM000596,2026,20,Presidential Suite,2369,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Pool, Private Terrace, Luxury Car Ser...",6,King + Multiple Sofa Beds,[Standard View],False,Maintenance,2024-10-01,3500,5000
RM000597,2027,20,Presidential Suite,2020,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Terrace, Grand Piano, Private Gym Equ...",6,King + Multiple Sofa Beds,[Standard View],False,Maintenance,2024-04-17,3500,5000
RM000598,2028,20,Presidential Suite,1883,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Grand Piano, Private Terrace, Dedicated Conci...",6,King + Multiple Sofa Beds,[Standard View],False,Occupied,2022-03-09,3500,5000
RM000599,2029,20,Presidential Suite,1643,"[Air Conditioning, Multiple 75"" Smart TVs, Pro...","[Private Terrace, Private Chef Available, Priv...",6,King + Multiple Sofa Beds,[Standard View],False,Maintenance,2023-07-24,3500,5000
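An alternative sketch computes only the new column instead of rebuilding every row, and sorts each result so the order is reproducible (converting a set straight to a list gives no guaranteed order). The helper name `clean_views` is hypothetical and the toy rows are made up.

```python
import pandas as pd

valid_views = {'City View', 'Standard View', 'Courtyard View', 'Pool View',
               'Ocean View', 'Corner View', 'Panoramic Ocean View'}

def clean_views(view_type, additional_amenities):
    """Valid views among the amenities, plus the listed view if it is valid."""
    views = set(additional_amenities) & valid_views
    if view_type in valid_views:
        views.add(view_type)
    return sorted(views)

# Toy stand-in for df_rooms with amenities already parsed into lists.
toy = pd.DataFrame({
    "view_type": ["City View", "Evening Turndown Service", "Standard View"],
    "additional_amenities": [
        ["City View"],
        ["Courtyard View", "City View", "Evening Turndown Service"],
        ["Private Pool"],
    ],
}, index=["RM1", "RM2", "RM3"])

toy["view_type"] = [
    clean_views(v, a) for v, a in zip(toy["view_type"], toy["additional_amenities"])
]
print(toy["view_type"].tolist())

# Every room should end up with at least one view.
assert (toy["view_type"].apply(len) >= 1).all()
```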


No room is left without a view_type: filtering for empty lists returns an empty frame.

In [22]:
df_rooms[df_rooms['view_type'].apply(len) < 1]

room_id,room_number,floor,type,square_feet,basic_amenities,additional_amenities,max_occupancy,bed_type,view_type,accessibility,status,last_renovation,base_rate,max_rate


## Customers

In [23]:
df_customers = pd.read_csv('../data/csv/customers.csv').set_index('customer_id')

Note that customers have preferences, which we may want to take into account when making recommendations. Each customer also has a language listed. Many of these languages are obscure, so we probably can't support them all.

In [24]:
df_customers.head(10)

customer_id,first_name,last_name,email,phone,address,preferences,nationality,language,loyalty_tier
1,Robert,Smith,robert.smith@wyatt-huynh.com,001-625-164-8298,"698 Eddie Stravenue, East Marymouth, NH 45002","[""Turndown service"", ""Gluten-free options"", ""S...",Anguilla,Quechua,Gold
2,Tiffany,Garner,tiffany.garner@smith.com,001-924-161-9263x89503,"82496 Bob Expressway Suite 023, Amberburgh, MD...","[""King bed"", ""Kosher meals""]",Australia,Bihari languages,Titanium
3,Jennifer,Miller,jennifer.miller@mccormick.com,817.900.7596,"PSC 8351, Box 8217, APO AE 44482","[""Extra towels"", ""Late check-out"", ""Halal meal...",Djibouti,Guarani,Bronze
4,Frank,Roberts,frank.roberts@davis.info,(914)861-7623,"PSC 2055, Box 2622, APO AP 89621","[""Pet-friendly"", ""Anniversary celebration"", ""S...",Swaziland,Rundi,Bronze
5,Patricia,Choi,patricia.choi@holt.info,001-399-328-3594x2880,"58891 Eric Station Apt. 475, Parkerview, WA 65054","[""Turndown service"", ""Laundry service"", ""Valet...",Lesotho,Bashkir,Bronze
6,Joseph,Johnson,joseph.johnson@taylor-smith.com,1936724534,"94506 Ethan Divide Suite 356, North Edward, AZ...","[""Queen bed"", ""King bed"", ""Halal meals""]",India,Kalaallisut,Silver
7,Michelle,Allen,michelle.allen@hernandez-clark.org,+1-524-011-8388,"082 Chapman Tunnel Apt. 767, West Elainefurt, ...","[""Queen bed"", ""Memory foam pillows"", ""Vegetari...",Palestinian Territory,Swedish,Bronze
8,Andre,Harrison,andre.harrison@flores.com,(699)091-7119,"3817 Harrison Manors, East Mark, GA 50727","[""Room service"", ""Turndown service"", ""Food all...",Myanmar,Northern Sami,Silver
9,John,Chaney,john.chaney@bradshaw-mccullough.net,001-832-703-1181,"43757 Hart Run, Hancockhaven, NC 04449","[""Slippers"", ""Ocean view"", ""Non-smoking"", ""Lau...",Malawi,Sindhi,Titanium
10,Renee,Richardson,renee.richardson@snyder-miller.info,537-469-1239,"Unit 6492 Box 0840, DPO AA 44056","[""Non-smoking"", ""Airport shuttle""]",Puerto Rico,Tswana,Gold
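The `preferences` cells shown above are double-quoted, JSON-style lists, so `json.loads` is one way to parse them before using them for recommendations. A sketch on made-up rows; the real column may contain values this does not handle.

```python
import json
import pandas as pd

# Toy stand-in for df_customers; only two columns are included.
toy = pd.DataFrame({
    "preferences": ['["King bed", "Kosher meals"]', '["Non-smoking", "King bed"]'],
    "language": ["Swedish", "Tswana"],
}, index=pd.Index([1, 2], name="customer_id"))

# Turn each JSON string into a Python list.
toy["preferences"] = toy["preferences"].apply(json.loads)

# One row per (customer, preference) makes frequency counts straightforward.
pref_counts = toy["preferences"].explode().value_counts()
print(pref_counts)
```

Running `value_counts()` on the `language` column in the same way would show how many customers fall outside whatever set of languages ends up being supported.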


Everything's non-null, so we're good to go.

In [25]:
df_customers.info()

<class 'pandas.core.frame.DataFrame'>
Index: 50000 entries, 1 to 50000
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   first_name    50000 non-null  object
 1   last_name     50000 non-null  object
 2   email         50000 non-null  object
 3   phone         50000 non-null  object
 4   address       50000 non-null  object
 5   preferences   50000 non-null  object
 6   nationality   50000 non-null  object
 7   language      50000 non-null  object
 8   loyalty_tier  50000 non-null  object
dtypes: object(9)
memory usage: 3.8+ MB


## Promotions

Will implement these if time allows. It may get complicated if the LLM can't interpret the promotion text on its own.

In [26]:
df_promos = pd.read_csv('../data/csv/promotions.csv').set_index('promo_id')

In [27]:
df_promos

promo_id,name,description,discount_type,discount_value,min_stay,applicable_room_types,start_date,end_date,blackout_dates,terms_conditions,booking_code,status
PR000001,Early Bird Special,Book 60 days in advance and save 20% on your stay,Percentage,20,0,"Standard, Deluxe, Suite",2025-01-21,2025-05-24,,Cancellation policies apply; please refer to t...,PROMO4797,Ended
PR000002,Weekend Getaway,Special weekend rates including breakfast and ...,Package,15,2,"Deluxe, Suite",2025-01-19,2025-03-12,,Advance booking required; subject to availabil...,PROMO8026,Scheduled
PR000003,Summer Sale,Enjoy 25% off during summer months,Percentage,25,3,All,2025-01-08,2025-05-12,,Advance booking required; subject to availabil...,PROMO4398,Ended
PR000004,Long Stay Discount,Stay 7 nights or more and receive 30% off,Percentage,30,7,All,2025-01-11,2025-07-04,,Promotional rates are non-refundable and non-t...,PROMO8571,Scheduled
PR000005,Spa Package,$200 spa credit with minimum 2-night stay,Fixed Amount,200,2,"Deluxe, Suite, Presidential Suite",2025-01-31,2025-05-04,2025-01-04,Advance booking required; subject to availabil...,PROMO1558,Active
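One way to sidestep free-text interpretation is to evaluate the structured columns directly. A sketch of an eligibility check using `status`, `min_stay`, `applicable_room_types`, `start_date`, and `end_date`. The function name `promo_applies` is hypothetical. Treating `All` as every room type and requiring the whole stay to fall inside the promotion window are assumptions, not rules stated in the data, and `blackout_dates` is ignored here.

```python
from datetime import date, timedelta

def promo_applies(promo, room_type, check_in, nights):
    """Return True if a promotion row (as a dict) covers the requested stay."""
    if promo["status"] != "Active":
        return False
    if nights < promo["min_stay"]:
        return False
    types = promo["applicable_room_types"]
    # Assumption: "All" means every room type qualifies.
    if types != "All" and room_type not in [t.strip() for t in types.split(",")]:
        return False
    start = date.fromisoformat(promo["start_date"])
    end = date.fromisoformat(promo["end_date"])
    # Assumption: every night of the stay must fall inside the promotion window.
    last_night = check_in + timedelta(days=nights - 1)
    return start <= check_in and last_night <= end

# Values copied from row PR000005 in the table above.
spa = {
    "status": "Active", "min_stay": 2,
    "applicable_room_types": "Deluxe, Suite, Presidential Suite",
    "start_date": "2025-01-31", "end_date": "2025-05-04",
}
print(promo_applies(spa, "Suite", date(2025, 2, 10), 2))
print(promo_applies(spa, "Standard", date(2025, 2, 10), 2))
```

Splitting `applicable_room_types` into exact names avoids a substring match wrongly accepting `Suite` when only `Presidential Suite` is listed.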


## FAQ

In [28]:
df_faq = pd.read_csv('../data/csv/faq_knowledge_base.csv').set_index('faq_id')

In [29]:
df_faq

faq_id,category,subcategory,question,answer,keywords,last_updated,helpful_votes,views
FAQ000001,booking,reservations,How do I make a reservation?,You can make a reservation through our website...,"book, reserve, reservation, booking",2024-03-09,957,3813
FAQ000002,booking,check-in/out,What is the check-in/check-out time?,Check-in time is 3:00 PM and check-out time is...,"check in, check out, arrival, departure",2024-10-15,967,2793
FAQ000003,booking,modifications,Can I modify my booking?,"Yes, you can modify your booking through our w...","change, modify, update, edit",2024-12-05,443,1929
FAQ000004,booking,cancellations,What is the cancellation policy?,Free cancellation is available up to 48 hours ...,"cancel, refund, cancellation",2024-05-20,611,1300
FAQ000005,booking,special requests,Do you offer early check-in/late check-out?,"Subject to availability, early check-in and la...","early check-in, late check-out, extended stay",2024-12-18,239,3954
FAQ000006,amenities,facilities,What amenities are included?,"Our hotel offers complimentary Wi-Fi, fitness ...","amenities, facilities, services",2024-10-24,855,1860
FAQ000007,amenities,dining,Is breakfast included?,"Yes, breakfast is included with most room rate...","breakfast, dining, restaurant",2024-01-10,365,1943
FAQ000008,amenities,wellness,Do you have a fitness center?,"Yes, our fitness center is open 24/7 and featu...","fitness, gym, exercise",2024-10-05,988,1471
FAQ000009,amenities,business,Is there a swimming pool?,"Yes, we have both indoor and outdoor pools ope...","pool, swimming, recreation",2024-10-15,400,2881
FAQ000010,amenities,recreation,Do you offer spa services?,"Our full-service spa offers massages, facials,...","spa, wellness, treatments",2024-01-07,145,766
