<a href="https://colab.research.google.com/github/hallosayaimroatubelajargithub/sistemrekomendasi/blob/main/content_based_filtering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Content-Based Filtering: Bandung Hotels**

In [3]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords
import re
import random

df = pd.read_csv("https://raw.githubusercontent.com/hallosayaimroatubelajargithub/sistemrekomendasi/main/dataset/hotel_bandung_english.csv")
df.head()

Unnamed: 0,name,address,description
0,Capital O 253 Topas Galeria Hotel,"Jl. Dr. Djundjunan No. 153, 40173 Bandung, Ind...","A 10-minute drive from Bandung Airport, Topas ..."
1,Sheraton Bandung Hotel & Towers,"Jl. Ir H Juanda 390, 40135 Bandung, Indonesia",Sheraton Hotel & Towers offers 5-star accommod...
2,OYO 794 Ln 9 Bandung Residence,"Jalan Lemahnendeut No 9, Sukajadi, 40164 Bandu...","Conveniently located in Sukajadi, Bandung, OYO..."
3,OYO 226 LJ hotel,"Jl. Malabar No.2, Malabar, Lengkong, Dago, Asi...","Featuring a shared lounge, OYO 226 LJ hotel is..."
4,OYO 230 Maleo Residence,"JI. Dangeur Indah II No. 15, Sukagalih, Sukaja...",Attractively set in the Sukajadi district of B...


**1. Overview**

In [4]:
df.describe()

Unnamed: 0,name,address,description
count,105,105,105
unique,101,102,103
top,OYO 794 Ln 9 Bandung Residence,"Jalan Lemahnendeut No 9, Sukajadi, 40164 Bandu...","Conveniently located in Sukajadi, Bandung, OYO..."
freq,3,3,2


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   name         105 non-null    object
 1   address      105 non-null    object
 2   description  105 non-null    object
dtypes: object(3)
memory usage: 2.6+ KB
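
The `describe()` output above reports 101 unique names among 105 rows, so a few hotels are listed more than once. Below is a minimal sketch of how such rows could be inspected and dropped. The `hotels` frame and its values are made up for illustration and are not taken from the dataset. Whether duplicates should be removed before modelling is a judgment call this notebook does not make.

```python
import pandas as pd

# Hypothetical toy data standing in for the hotel table
hotels = pd.DataFrame({
    "name": ["Hotel A", "Hotel B", "Hotel A", "Hotel C"],
    "description": ["pool and spa", "near station", "pool and spa", "garden view"],
})

# Flag every row whose name occurs more than once (keep=False marks all copies)
dup_mask = hotels.duplicated(subset="name", keep=False)
n_dup_rows = int(dup_mask.sum())

# Keep only the first occurrence of each name and renumber the rows
deduped = hotels.drop_duplicates(subset="name", keep="first").reset_index(drop=True)

print(n_dup_rows)
print(deduped)
```

If duplicates stay in the table, a hotel can be recommended several times, or recommended as a match for itself.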


**2. Hotel Descriptions (Before Preprocessing)**

In [6]:
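def print_description(index):
    # Look up one row by its index label; .loc raises KeyError for an unknown label
    example = df.loc[index, ['description', 'name', 'address']]
    print(example['description'])
    print('Nama:', example['name'])        # "Nama" = name
    print('Alamat:', example['address'])   # "Alamat" = address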

In [7]:
print_description(1)

Sheraton Hotel & Towers offers 5-star accommodation in the middle of a green landscape in Bandung. All spacious rooms come with a flat-screen cable TV. The hotel offers an outdoor pool, spa center and restaurant with mountain views. Wi-Fi access is available free in all areas of the hotel. Elegant rooms have modern interiors, light wood furnishings and large windows. Each provides a comfortable seating area, DVD player and private bathroom with shower. You can work out in the gym or enjoy body treatments at the spa. Reception staff are ready to serve your needs for 24 hours. International and Asian dishes are offered at Feast Restaurant, while soft drinks are served at Samsara Lounge. A variety of cocktails and snacks are also available at Poolside Terrace. Sheraton Bandung Hotel & Towers is a 10-minute drive from Juanda Culture Park and Dago area, where various factory outlets are located. Husein Sastranegara Airport is a 30-minute drive away.
Nama: Sheraton Bandung Hotel & Towers
Ala

In [8]:
print_description(50)

Featuring an outdoor pool and a restaurant, House-Sangkuriang is conveniently located just a 5-minute walk from Dago’s factory outlets. It has a 24-hour front desk and provides free Wi-Fi access in all areas. Elegant and warmly lit, the air-conditioned rooms in House-Sangkuriang include hardwood floors. A flat-screen satellite TV, an electric kettle and a free one-time minibar are among the in-room comforts, and a shower, slippers and a hairdryer are included in the private bathrooms. The hotel also serves daily afternoon tea in the lobby and on the pool terrace. Cihampelas Walk Mall is a 10-minute drive from the property, and Husein Sastranegara Airport is a 20-minute drive away. Airport transportation can be arranged upon request. The staff at the front desk can assist with valet parking and luggage storage. Housing a business center, the hotel also provides laundry service for a fee. International dishes are served at Dining Room. Guests can also dine in the comfort of their rooms.


In [9]:
print_description(89)

With Stasiun Hall Bus Terminal reachable in a 4-minute walk, neo MORITZ Homestay has accommodations, a restaurant, a garden, a bar and a terrace. Guests wishing to travel light can make use of Towels/Sheets (extra fee). A halal breakfast is available every morning at the family stay. Merdeka Palace is a 17-minute walk from neo MORITZ Homestay, while Braga City Walk is 1.4 km from the property. The nearest airport is Husein Sastranegara Airport, 5.1 km from the accommodation.
Nama: Neo MORITZ Homestay
Alamat: Jl. Kebon Jati No. 35 Luxor Permai Complex Behind the Market, 40181 Bandung, Indonesia


**3. Text Preprocessing**

In [17]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [19]:
clean_spcl = re.compile(r'[/(){}\[\]\|@,;]')   # separators that are replaced by a space
clean_symbol = re.compile(r'[^0-9a-z #+_]')     # any other character outside this set is deleted
# stopworda = set(stopwords.words('english'))

def clean_text(text):
    """
        text: a string

        return: the lowercased string with separators and symbols stripped
    """
    text = text.lower() # lowercase text
    text = clean_spcl.sub(' ', text)
    text = clean_symbol.sub('', text) # hyphens and periods are deleted, so "10-minute" becomes "10minute"
    # text = ' '.join(word for word in text.split() if word not in stopworda) # remove stopwords from the description column
    return text

df['desc_clean'] = df['description'].apply(clean_text)

In [20]:
df.head()

Unnamed: 0,name,address,description,desc_clean
0,Capital O 253 Topas Galeria Hotel,"Jl. Dr. Djundjunan No. 153, 40173 Bandung, Ind...","A 10-minute drive from Bandung Airport, Topas ...",a 10minute drive from bandung airport topas g...
1,Sheraton Bandung Hotel & Towers,"Jl. Ir H Juanda 390, 40135 Bandung, Indonesia",Sheraton Hotel & Towers offers 5-star accommod...,sheraton hotel towers offers 5star accommodat...
2,OYO 794 Ln 9 Bandung Residence,"Jalan Lemahnendeut No 9, Sukajadi, 40164 Bandu...","Conveniently located in Sukajadi, Bandung, OYO...",conveniently located in sukajadi bandung oyo...
3,OYO 226 LJ hotel,"Jl. Malabar No.2, Malabar, Lengkong, Dago, Asi...","Featuring a shared lounge, OYO 226 LJ hotel is...",featuring a shared lounge oyo 226 lj hotel is...
4,OYO 230 Maleo Residence,"JI. Dangeur Indah II No. 15, Sukagalih, Sukaja...",Attractively set in the Sukajadi district of B...,attractively set in the sukajadi district of b...
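
The cleaned column above shows tokens such as `10minute` and `5star`, because hyphens and periods are deleted rather than replaced by a space. Here is a hedged sketch of one alternative that treats them as separators. `clean_text_spaced` and its patterns are hypothetical names, not part of the notebook. The trade-off is that decimals such as `1.4` are split into `1 4`.

```python
import re

# Hypothetical variant of the notebook's patterns: '.' and '-' count as separators
SEPARATORS = re.compile(r"[/(){}\[\]|@,;.\-]")
NON_ALNUM = re.compile(r"[^0-9a-z #+_]")
SPACES = re.compile(r"\s+")

def clean_text_spaced(text):
    """Lowercase, turn separators into spaces, drop other symbols, collapse whitespace."""
    text = text.lower()
    text = SEPARATORS.sub(" ", text)   # "10-minute" -> "10 minute"
    text = NON_ALNUM.sub("", text)     # delete anything left outside the allowed set
    return SPACES.sub(" ", text).strip()

print(clean_text_spaced("A 10-minute drive. Free Wi-Fi!"))
```

With this variant, the word at the end of one sentence no longer fuses with the first word of the next, which matters because TF-IDF treats each fused string as a separate rare term.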


**4. Hotel Descriptions (After Preprocessing)**

In [21]:
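# Same lookup as print_description, but showing the preprocessed text
def print_description_clean(index):
    # Look up one row by its index label; .loc raises KeyError for an unknown label
    example = df.loc[index, ['desc_clean', 'name', 'address']]
    print(example['desc_clean'])
    print('Nama:', example['name'])        # "Nama" = name
    print('Alamat:', example['address'])   # "Alamat" = address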

In [22]:
print_description_clean(1)

sheraton hotel  towers offers 5star accommodation in the middle of a green landscape in bandungall spacious rooms come with a flatscreen cable tvthe hotel offers an outdoor pool  spa center and restaurant with mountain viewswifi access is available free in all areas of the hotelelegant rooms have modern interiors  light wood furnishings and large windowseach provides a comfortable seating area  dvd player and private bathroom with showeryou can work out in the gym or enjoy body treatments at the spareception staff are ready to serve your needs for 24 hoursinternational and asian dishes are offered at feast restaurant  while soft drinks are served at samsara loungea variety of cocktails and snacks are also available at poolside terracesheraton bandung hotel  towers is a 10minute drive from juanda culture park and dago area  where various factory outlets are locatedhusein sastranegara airport is a 30minute drive away
Nama: Sheraton Bandung Hotel & Towers
Alamat: Jl. Ir H Juanda 390, 4013

In [23]:
print_description_clean(50)

featuring an outdoor pool and a restaurant  housesangkuriang is conveniently located just a 5minute walk from dagos factory outlets it has a 24hour front desk and provides free wifi access in all areas elegant and warmly lit  the airconditioned rooms in housesangkuriang include hardwood floors a flatscreen satellite tv  an electric kettle and a free onetime minibar are among the inroom comforts  and a shower  slippers and a hairdryer are included in the private bathrooms the hotel also serves daily afternoon tea in the lobby and on the pool terrace cihampelas walk mall is a 10minute drive from the property  and husein sastranegara airport is a 20minute drive away airport transportation can be arranged upon request the staff at the front desk can assist with valet parking and luggage storage housing a business center  the hotel also provides laundry service for a fee international dishes are served at dining room guests can also dine in the comfort of their rooms
Nama: House Sangkuriang

In [24]:
print_description(89)

With Stasiun Hall Bus Terminal reachable in a 4-minute walk, neo MORITZ Homestay has accommodations, a restaurant, a garden, a bar and a terrace. Guests wishing to travel light can make use of Towels/Sheets (extra fee). A halal breakfast is available every morning at the family stay. Merdeka Palace is a 17-minute walk from neo MORITZ Homestay, while Braga City Walk is 1.4 km from the property. The nearest airport is Husein Sastranegara Airport, 5.1 km from the accommodation.
Nama: Neo MORITZ Homestay
Alamat: Jl. Kebon Jati No. 35 Luxor Permai Complex Behind the Market, 40181 Bandung, Indonesia


**5. TF-IDF & Cosine Similarity**

In [25]:
df.set_index('name', inplace=True)
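# min_df=1 (the default) keeps every term, the same effect the original integer 0 was meant to have
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')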
tfidf_matrix = tf.fit_transform(df['desc_clean'])
cos_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
cos_sim

array([[1.        , 0.02250818, 0.01254879, ..., 0.01044102, 0.04017144,
        0.03531754],
       [0.02250818, 1.        , 0.01040992, ..., 0.01269843, 0.02856891,
        0.01847406],
       [0.01254879, 0.01040992, 1.        , ..., 0.12575247, 0.01082423,
        0.02511644],
       ...,
       [0.01044102, 0.01269843, 0.12575247, ..., 1.        , 0.01065003,
        0.02392556],
       [0.04017144, 0.02856891, 0.01082423, ..., 0.01065003, 1.        ,
        0.03826221],
       [0.03531754, 0.01847406, 0.02511644, ..., 0.02392556, 0.03826221,
        1.        ]])
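
To make the matrix above easier to read, here is a small pure-Python sketch of TF-IDF weighting followed by cosine similarity. It is an illustration, not the notebook's pipeline. It splits on whitespace, uses single words only (no 1-3 n-grams, no stop-word removal), and assumes the smoothed IDF `ln((1 + n) / (1 + df)) + 1` with L2 normalisation, which is how scikit-learn documents its defaults. The three `docs` strings are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (assumed to match scikit-learn's default)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

def cosine(u, v):
    """Dot product of two unit-length sparse vectors, which equals their cosine similarity."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

docs = [
    "outdoor pool and spa",
    "outdoor pool and restaurant",
    "airport shuttle and garden",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first pair shares three words and scores much higher than the second pair, which shares only "and". The diagonal of 1.0 in the matrix above is each description compared with itself.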

In [26]:
# df is now indexed by 'name'; build a Series that maps row position -> hotel name
indices = pd.Series(df.index)
indices[:50]

0                Capital O 253 Topas Galeria Hotel
1                  Sheraton Bandung Hotel & Towers
2                   OYO 794 Ln 9 Bandung Residence
3                                 OYO 226 LJ hotel
4                          OYO 230 Maleo Residence
5                        OYO 167 Dago's Hill Hotel
6                   OYO 794 Ln 9 Bandung Residence
7                       OYO 196 Horizone Residence
8     OYO 483 Flagship Tamansari Panoramic Bandung
9               OYO 295 Grha Ciumbuleuit Residence
10                            OYO 193 SM Residence
11              Capital O 874 Hotel Nyland Pasteur
12                            OYO 352 Sabang Hotel
13                                  Hilton Bandung
14             InterContinental Bandung Dago Pakar
15                                Aryaduta Bandung
16               Art Deco Luxury Hotel & Residence
17                            Crowne Plaza Bandung
18          Best Western Premier La Grande Bandung
19                         éL R

**6. Modelling**

In [27]:
def recommendations(name, cos_sim = cos_sim):

    recommended_hotel = []

    # Find the row position of the requested hotel via the indices Series
    idx = indices[indices == name].index[0]

    # Build a Series of similarity scores against every hotel, highest first
    score_series = pd.Series(cos_sim[idx]).sort_values(ascending = False)

    # Take the 10 best matches, skipping the top-ranked entry (normally the hotel itself)
    top_10_indexes = list(score_series.iloc[1:11].index)

    for i in top_10_indexes:
        recommended_hotel.append(df.index[i])

    return recommended_hotel
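
The function above assumes that the requested name exists and that the top-ranked row is the hotel itself. The overview showed repeated names, so neither assumption is guaranteed. Below is a hedged sketch of a more defensive ranking step in plain Python. `top_k_similar`, the toy `names` list and the toy `sim` matrix are all hypothetical, not part of the notebook.

```python
def top_k_similar(name, names, sim_matrix, k=10):
    """Return up to k distinct names most similar to `name`, never `name` itself."""
    if name not in names:
        raise KeyError(f"unknown hotel: {name!r}")
    idx = names.index(name)  # first occurrence, as in the notebook's lookup
    # Rank all row positions by similarity to the query row, highest first
    ranked = sorted(range(len(names)), key=lambda j: sim_matrix[idx][j], reverse=True)
    results = []
    for j in ranked:
        candidate = names[j]
        # Skip the query row, duplicate listings of it, and names already collected
        if j == idx or candidate == name or candidate in results:
            continue
        results.append(candidate)
        if len(results) == k:
            break
    return results

# Toy data: "A" is listed twice with identical similarity rows
names = ["A", "B", "A", "C"]
sim = [
    [1.0, 0.2, 1.0, 0.5],
    [0.2, 1.0, 0.2, 0.1],
    [1.0, 0.2, 1.0, 0.5],
    [0.5, 0.1, 0.5, 1.0],
]
result = top_k_similar("A", names, sim, k=2)
print(result)
```

The same indexing pattern (`sim_matrix[idx][j]`) also works on a NumPy array such as `cos_sim`, with `list(df.index)` passed as `names`.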

**7. Prediction**

In [28]:
recommendations('Benua Hotel')

['FOX Lite Hotel Metro Indah Bandung',
 'InterContinental Bandung Dago Pakar',
 'Zest Sukajadi Hotel Bandung',
 'M Premiere Hotel Dago Bandung',
 'Ibis Bandung Pasteur',
 'Serela Cihampelas Hotel',
 'Grand Cordela Hotel Bandung ',
 'Favehotel Hyper Square',
 'HARRIS Hotel & Conventions Ciumbuleuit - Bandung',
 'Hemangini Hotel Bandung']

In [29]:
recommendations("Serela Cihampelas Hotel")

['Vio Cihampelas',
 'Grand Sovia Hotel',
 'Neo Dipatiukur Bandung',
 'Grand Tjokro Bandung',
 'HARRIS Hotel & Conventions Ciumbuleuit - Bandung',
 'InterContinental Bandung Dago Pakar',
 'Ibis Bandung Pasteur',
 'Tebu Hotel Bandung',
 'Benua Hotel',
 'Aryaduta Bandung']

In [30]:
recommendations("Ibis Bandung Pasteur")

['Aston Pasteur',
 'Neo Dipatiukur Bandung',
 'De JAVA Hotel Bandung',
 'OYO 193 SM Residence',
 'InterContinental Bandung Dago Pakar',
 'Ibis Budget Bandung Asia Africa',
 'Garden Permata Hotel',
 'HARRIS Hotel & Conventions Ciumbuleuit - Bandung',
 'The Luxton Bandung',
 'Favehotel Braga']

**And Then HAHAHA Finish**