
This notebook builds a recommendation model for tourism places from user ratings and additional information about each place.

A neural network learns embeddings for three categorical place attributes ('Place_Name', 'Category', 'City') and is trained to predict each place's rating. The textual fields ('Description' and 'Category') are combined and preprocessed into a 'Tags' column, which is kept in the saved test data but is not an input to the network defined here.

The code imports Pandas for data manipulation, scikit-learn for label encoding, train/test splitting, and TF-IDF vectorization, Keras for building the neural network, and Sastrawi for Indonesian text preprocessing.

In [1]:
!pip install Sastrawi

Collecting Sastrawi
  Downloading Sastrawi-1.0.1-py2.py3-none-any.whl (209 kB)
Installing collected packages: Sastrawi
Successfully installed Sastrawi-1.0.1


In [2]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from keras.models import Model
from keras.layers import Input, Embedding, Flatten, Dense, Concatenate
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory

Data from CSV files (tourism_rating.csv, tourism_with_id.csv, and user.csv) is loaded into Pandas DataFrames.


In [3]:
data_tourism_rating = pd.read_csv('tourism_rating.csv')
data_tourism_with_id = pd.read_csv('tourism_with_id.csv')
data_user = pd.read_csv('user.csv')

This cell initializes three text-processing tools commonly used in natural language processing (NLP):

*   TfidfVectorizer, which builds TF-IDF representations of text, with the vocabulary limited to the 5,000 most frequent terms (max_features=5000).
*   A stemmer from the Sastrawi library, which reduces Indonesian words to their base forms.
*   A stopword remover from the Sastrawi library, which removes common words that contribute little to the analysis.







In [4]:
tv = TfidfVectorizer(max_features=5000)
stem = StemmerFactory().create_stemmer()
stopword = StopWordRemoverFactory().create_stop_word_remover()
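
The notebook creates `tv` but does not show it being applied. As a rough sketch of how a TfidfVectorizer is typically used, the example below fits it on a small corpus and reuses the fitted vocabulary on new text. The sample strings and the names `corpus`, `new_text`, `corpus_matrix`, and `new_matrix` are hypothetical and are not part of the original project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical preprocessed tags standing in for the 'Tags' column
corpus = ['candi budaya sejarah', 'pantai pasir putih bahari']
new_text = ['pantai budaya']

tv = TfidfVectorizer(max_features=5000)

# Learn the vocabulary and IDF weights from the corpus
corpus_matrix = tv.fit_transform(corpus)

# Reuse the fitted vocabulary for text that was not seen during fitting
new_matrix = tv.transform(new_text)

print(corpus_matrix.shape, new_matrix.shape)
```

Both results are sparse matrices with one column per vocabulary term, so text that arrives later is projected onto the same columns as the corpus used for fitting.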

Columns that are not needed ('Time_Minutes', 'Coordinate', and two unnamed columns) are dropped from data_tourism_with_id.

In [5]:
data_tourism_with_id.drop(['Time_Minutes', 'Coordinate', 'Unnamed: 11', 'Unnamed: 12'], axis=1, inplace=True)

The average rating for each place is computed from data_tourism_rating, then merged with data_tourism_with_id on the 'Place_Id' column. The merge adds the average as a new 'Place_Ratings' column in the resulting DataFrame.

In [6]:
average_ratings = data_tourism_rating.groupby('Place_Id')['Place_Ratings'].mean().reset_index()

In [15]:
data_rekomendasi = pd.merge(average_ratings, data_tourism_with_id, on='Place_Id')
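
To make the aggregation and merge concrete, here is a minimal sketch on toy data. The two small DataFrames are hypothetical stand-ins for tourism_rating.csv and tourism_with_id.csv, and the names `ratings`, `places`, `avg`, and `merged` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the rating and place tables
ratings = pd.DataFrame({
    'User_Id': [1, 2, 1, 3],
    'Place_Id': [10, 10, 20, 20],
    'Place_Ratings': [4, 2, 5, 3],
})
places = pd.DataFrame({
    'Place_Id': [10, 20, 30],
    'Place_Name': ['A', 'B', 'C'],
})

# Mean rating per place, with Place_Id restored as a regular column
avg = ratings.groupby('Place_Id')['Place_Ratings'].mean().reset_index()

# pd.merge defaults to an inner join: place 30 has no ratings, so it is dropped
merged = pd.merge(avg, places, on='Place_Id')
print(merged)
```

Because the default join is an inner join, any place without at least one rating would be absent from the merged result. Passing `how='right'` would keep such places with a missing average instead.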








*   Combine relevant text data ('Description' and 'Category') into a new 'Tags' column.
*   Drop unnecessary columns ('Price', 'Place_Ratings', 'Description').
*   Apply text preprocessing techniques, including stemming and stopword removal, to the 'Tags' column.






In [16]:
def preprocessing(data):
    if isinstance(data, str):  # Check if data is a string
        data = data.lower()
        data = stem.stem(data)
        data = stopword.remove(data)
    return data

data_tempat = data_rekomendasi.copy()
data_tempat['Tags'] = data_tempat['Description'] + ' ' + data_tempat['Category']
data_tempat.drop(['Price', 'Place_Ratings', 'Description'], axis=1, inplace=True)


In [9]:
data_tempat.Tags = data_tempat.Tags.apply(preprocessing)
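
The `isinstance(data, str)` check in `preprocessing` matters when a row has a missing 'Description' or 'Category', because concatenating a string with a missing value yields a missing value rather than a string. The sketch below illustrates this on hypothetical toy data. Its `preprocessing` is a simplified stand-in that only lowercases, so it runs without Sastrawi.

```python
import pandas as pd

# Hypothetical rows; the second one has no description
df = pd.DataFrame({
    'Description': ['Candi Buddha Terbesar', None],
    'Category': ['Budaya', 'Bahari'],
})

# Concatenating with a missing value produces NaN, not a string
df['Tags'] = df['Description'] + ' ' + df['Category']

def preprocessing(data):
    # Stand-in for the notebook's function: lowercasing only, no Sastrawi calls
    if isinstance(data, str):
        data = data.lower()
    return data

# Missing values pass through untouched instead of raising an error
df['Tags'] = df['Tags'].apply(preprocessing)
print(df['Tags'].tolist())
```

Rows whose 'Tags' value is still missing after this step would need to be filled (for example with an empty string) before being passed to a vectorizer.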

Split the preprocessed data into training (80%) and testing (20%) sets using the train_test_split function, with a fixed random_state so the split is reproducible.

In [10]:
train_data, test_data = train_test_split(data_tempat, test_size=0.2, random_state=42)


A neural network model is defined using Keras. The model is then compiled and trained on the training dataset.

*   Define three input layers, one for each categorical variable ('Place_Name', 'Category', 'City').
*   Apply label encoding to map each category to an integer, and use one embedding layer per variable to learn a dense representation of size 10 (embedding_size = 10).
*   Flatten and concatenate the embeddings, then add dense layers that output a single predicted rating.


In [11]:
embedding_size = 10

input_place_name = Input(shape=(1,), name='place_name_input')
input_category = Input(shape=(1,), name='category_input')
input_city = Input(shape=(1,), name='city_input')


place_name_encoder = LabelEncoder()
category_encoder = LabelEncoder()
city_encoder = LabelEncoder()


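# Work on an explicit copy so the column assignments below are unambiguous
train_data = train_data.copy()
# Fit each encoder on the training data and replace the labels with integer codes
train_data['Place_Name'] = place_name_encoder.fit_transform(train_data['Place_Name'])
train_data['Category'] = category_encoder.fit_transform(train_data['Category'])
train_data['City'] = city_encoder.fit_transform(train_data['City'])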


embedding_place_name = Embedding(train_data['Place_Name'].nunique(), embedding_size)(input_place_name)
embedding_category = Embedding(train_data['Category'].nunique(), embedding_size)(input_category)
embedding_city = Embedding(train_data['City'].nunique(), embedding_size)(input_city)

flatten_place_name = Flatten()(embedding_place_name)
flatten_category = Flatten()(embedding_category)
flatten_city = Flatten()(embedding_city)

concatenated_inputs = Concatenate()([flatten_place_name, flatten_category, flatten_city])
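
Conceptually, each Embedding layer is a lookup table with one trainable row per class, and Flatten followed by Concatenate joins the three looked-up rows into one feature vector per sample. The NumPy sketch below mimics that data flow only. The vocabulary sizes, the random table values, and all variable names are hypothetical, and no training takes place.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_size = 10

# Hypothetical numbers of distinct classes in the three encoded columns
n_places, n_categories, n_cities = 5, 3, 2

# One row per class; random values stand in for learned weights
place_table = rng.normal(size=(n_places, embedding_size))
category_table = rng.normal(size=(n_categories, embedding_size))
city_table = rng.normal(size=(n_cities, embedding_size))

# A batch of two label-encoded samples
place_ids = np.array([0, 4])
category_ids = np.array([2, 1])
city_ids = np.array([1, 1])

# Look up each row by integer index, then join the three vectors per sample
features = np.concatenate(
    [place_table[place_ids], category_table[category_ids], city_table[city_ids]],
    axis=1,
)
print(features.shape)
```

Each sample ends up with 3 × 10 = 30 features, which is the width of the input that the first dense layer receives.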

In [12]:
dense1 = Dense(128, activation='relu')(concatenated_inputs)
dense2 = Dense(64, activation='relu')(dense1)
output_layer = Dense(1, activation='linear', name='output')(dense2)

embedding_model = Model(inputs=[input_place_name, input_category, input_city], outputs=output_layer)
embedding_model.compile(optimizer='adam', loss='mean_squared_error')

embedding_model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 place_name_input (InputLay  [(None, 1)]                  0         []                            
 er)                                                                                              
                                                                                                  
 category_input (InputLayer  [(None, 1)]                  0         []                            
 )                                                                                                
                                                                                                  
 city_input (InputLayer)     [(None, 1)]                  0         []                            
                                                                                              

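The model was compiled in the previous cell with the mean squared error loss function and the Adam optimizer.
Here it is trained on the training dataset for 10 epochs with a batch size of 32, using the 'Rating' column as the target and holding out 20% of the training rows for validation.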

In [13]:
# Train the model
X_train = [train_data['Place_Name'].values, train_data['Category'].values, train_data['City'].values]
y_train = train_data['Rating'].values

embedding_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7931f9ace0b0>
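
When `validation_split` is used, Keras takes the validation samples from the end of the input arrays before any per-epoch shuffling. The sketch below reproduces that slicing with NumPy on a hypothetical array of 10 targets. The names `y`, `y_fit`, and `y_val` are illustrative only.

```python
import numpy as np

n_samples = 10
validation_split = 0.2

# Hypothetical targets, numbered so the split is easy to see
y = np.arange(n_samples)

# The last 20% of the rows become the validation set
split_at = int(n_samples * (1 - validation_split))
y_fit, y_val = y[:split_at], y[split_at:]

print(y_fit, y_val)
```

In this notebook the rows were already shuffled by train_test_split, which shuffles by default, so taking the tail of the training set should still give a random validation sample.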

This cell saves several files that are used elsewhere in this project:

*   Save the trained neural network model to an HDF5 file (embedding_model.h5).
*   Save the test data to a pickle file (test_data.pkl).
*   Save the class labels learned during label encoding for 'Place_Name', 'Category', and 'City' to CSV files (place_name_encoder_classes.csv, category_encoder_classes.csv, city_encoder_classes.csv).






In [14]:
# Save the trained model to a file
embedding_model.save('embedding_model.h5')

# Save the test data to a pickle file
test_data.to_pickle('test_data.pkl')

# Save the classes learned by each label encoder so they can be reloaded later
place_name_encoder_df = pd.DataFrame({'class': place_name_encoder.classes_})
place_name_encoder_df.to_csv('place_name_encoder_classes.csv', index=False)

category_encoder_df = pd.DataFrame({'class': category_encoder.classes_})
category_encoder_df.to_csv('category_encoder_classes.csv', index=False)

city_encoder_df = pd.DataFrame({'class': city_encoder.classes_})
city_encoder_df.to_csv('city_encoder_classes.csv', index=False)
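
The encoders were fit on the training data only, so labels that appear only in the test data (every test 'Place_Name', if place names are unique) have no integer code. As a hedged sketch of how a downstream script might reload a saved class list and cope with unseen labels, the example below rebuilds an encoder from an in-memory stand-in for city_encoder_classes.csv. The sample city names and the `-1` sentinel are assumptions, not part of the original project.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for pd.read_csv('city_encoder_classes.csv')
classes_df = pd.DataFrame({'class': ['Bandung', 'Jakarta', 'Yogyakarta']})

# Rebuild an encoder by assigning the saved, already sorted classes
city_encoder = LabelEncoder()
city_encoder.classes_ = classes_df['class'].to_numpy()

# Hypothetical new data containing one city that was never seen in training
cities = pd.Series(['Jakarta', 'Surabaya', 'Bandung'])

# transform() raises ValueError on unseen labels, so encode only the known ones
known = cities.isin(city_encoder.classes_).to_numpy()
encoded = np.full(len(cities), -1)
encoded[known] = city_encoder.transform(cities[known])

print(encoded)
```

Rows marked with `-1` cannot be passed to an Embedding layer as they are. They would need to be dropped or mapped to a reserved index that the model was trained with.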

