# Machine Learning Model Using Collaborative Filtering Method  
The model applies both item-based and user-based collaborative filtering.  

**Item-based**: This approach evaluates similarities between the items (tourist attractions).  
**User-based**: This approach combines each user's interaction (rating) history with the relationships (similarities) between travel destinations to predict scores for places the user has not yet rated.

## Import All Packages/Libraries

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense, Concatenate, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
from scipy.spatial.distance import cosine
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from tensorflow.keras.models import save_model
from google.colab import files
import pickle

## Access Dataset in Drive
**Objective:** Mount Google Drive in Google Colab so that the dataset stored there can be loaded and processed directly.

In [None]:
# Mount drive to access dataset.
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
# Load the dataset from Drive.
dir = 'gdrive/My Drive/Internship/Portfolio/1_J-GO/'
data = pd.read_csv(dir+'data/data.csv')
data.head()

|  | place_id | place_name | description_id | category_id | description_en | category_en | price | place_rating | latitude | longitude | user_id | user_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Taman Pintar Yogyakarta | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | 6000 | 2.72 | -7.800671 | 110.367655 | 2 | 4.0 |
| 1 | 1 | Taman Pintar Yogyakarta | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | 6000 | 2.72 | -7.800671 | 110.367655 | 23 | 4.0 |
| 2 | 1 | Taman Pintar Yogyakarta | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | 6000 | 2.72 | -7.800671 | 110.367655 | 25 | 2.0 |
| 3 | 1 | Taman Pintar Yogyakarta | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | 6000 | 2.72 | -7.800671 | 110.367655 | 39 | 5.0 |
| 4 | 1 | Taman Pintar Yogyakarta | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (bahasa Jawa: Hanacara... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | Taman Pintar Yogyakarta (Javanese: Hanacaraka,... | 6000 | 2.72 | -7.800671 | 110.367655 | 43 | 4.0 |


## Collaborative Filtering

### Get and Convert Required Data  
**Objective:** Ensure the data is in a usable format before modeling.

In [None]:
# Get user ID, place name, and ratings data.
data = data[["user_id", "place_name", "place_rating"]]
data.head()

|  | user_id | place_name | place_rating |
|---|---|---|---|
| 0 | 2 | Taman Pintar Yogyakarta | 2.72 |
| 1 | 23 | Taman Pintar Yogyakarta | 2.72 |
| 2 | 25 | Taman Pintar Yogyakarta | 2.72 |
| 3 | 39 | Taman Pintar Yogyakarta | 2.72 |
| 4 | 43 | Taman Pintar Yogyakarta | 2.72 |


In [None]:
# In similarity analysis, converting data from long to wide format is essential for easier comparison and computation.

# Before that, check for and drop duplicated rows: pivot() raises a ValueError if the same (user_id, place_name) pair appears more than once.
data.duplicated().sum()

190

In [None]:
# Reassign the result instead of modifying the column subset in place (avoids pandas' SettingWithCopyWarning).
data = data.drop_duplicates()
data.duplicated().sum()

0

In [None]:
# Since duplicated rows have already been handled, the data can now be converted to wide format.
data_wide = data.pivot(index="user_id",columns="place_name",values="place_rating")
data_wide.head()

| user_id | ARTJOG MMXIX | Affandi Museum | Agro Tourism Bhumi Merapi | Air Terjun Banyu Nibo | Air Terjun Kedung Manglu | Air Terjun Kedung Pedut | Air Terjun Sindet | Air Terjun Sri Gethuk | Aisha tour planner & transport service | Alun Alun Selatan Yogyakarta | ... | Wisata Kraton Jogja | Wisata Pangol Hill | Wisata Taman Kelinci Borobudur | Wisata Telaga Potorono | Wisata Watu Amben | XT Square | Yogyakarta Night Tours - Meeting Point Klasik : Historical Walking and Food Tour | bukit indah patuk | pantai Trisik | puncak bucu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


In [None]:
# Replace NaN values with 0.
# This ensures that missing values are treated as no interaction or no rating given by the user.
data_wide.fillna(0, inplace=True)
data_wide.head()

| user_id | ARTJOG MMXIX | Affandi Museum | Agro Tourism Bhumi Merapi | Air Terjun Banyu Nibo | Air Terjun Kedung Manglu | Air Terjun Kedung Pedut | Air Terjun Sindet | Air Terjun Sri Gethuk | Aisha tour planner & transport service | Alun Alun Selatan Yogyakarta | ... | Wisata Kraton Jogja | Wisata Pangol Hill | Wisata Taman Kelinci Borobudur | Wisata Telaga Potorono | Wisata Watu Amben | XT Square | Yogyakarta Night Tours - Meeting Point Klasik : Historical Walking and Food Tour | bukit indah patuk | pantai Trisik | puncak bucu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
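The duplicate check, long-to-wide pivot, and zero-filling above can be sketched on a tiny hypothetical frame (`toy`, `place_a`, and `place_b` are illustrative names, not from the dataset). It shows why duplicates are dropped first: `DataFrame.pivot` raises a `ValueError` when the same (`user_id`, `place_name`) pair appears twice, whereas the de-duplicated frame pivots cleanly.

```python
import pandas as pd

# Hypothetical long-format ratings; the last row is an exact duplicate of the first.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 1],
    "place_name": ["place_a", "place_b", "place_a", "place_a"],
    "place_rating": [4.0, 3.0, 5.0, 4.0],
})

# pivot() cannot put two values into the same (user_id, place_name) cell.
try:
    toy.pivot(index="user_id", columns="place_name", values="place_rating")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# After removing exact duplicates, every cell holds at most one value.
toy_wide = (
    toy.drop_duplicates()
    .pivot(index="user_id", columns="place_name", values="place_rating")
    .fillna(0)  # unrated places become 0, as in the notebook
)
print(toy_wide)
```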


### Item-Based Collaborative Filtering


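This approach compares places (items) by the ratings they received across all users; the user IDs themselves are not needed, so the user column is dropped.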

#### Adjusting Data

In [None]:
# Drop the user ID column in a separate copy of the wide dataframe.
data_placebased = data_wide.copy()
data_placebased = data_placebased.reset_index()
data_placebased = data_placebased.drop("user_id", axis=1)
data_placebased.head()

|  | ARTJOG MMXIX | Affandi Museum | Agro Tourism Bhumi Merapi | Air Terjun Banyu Nibo | Air Terjun Kedung Manglu | Air Terjun Kedung Pedut | Air Terjun Sindet | Air Terjun Sri Gethuk | Aisha tour planner & transport service | Alun Alun Selatan Yogyakarta | ... | Wisata Kraton Jogja | Wisata Pangol Hill | Wisata Taman Kelinci Borobudur | Wisata Telaga Potorono | Wisata Watu Amben | XT Square | Yogyakarta Night Tours - Meeting Point Klasik : Historical Walking and Food Tour | bukit indah patuk | pantai Trisik | puncak bucu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


#### Find Relations (Similarities) Between Tourist Attractions

In [None]:
# Create a dataframe for place data (place vs place) to find relations.
place_similarities = pd.DataFrame(index=data_placebased.columns,
                                columns=data_placebased.columns)
place_similarities.head()

| place_name | ARTJOG MMXIX | Affandi Museum | Agro Tourism Bhumi Merapi | Air Terjun Banyu Nibo | Air Terjun Kedung Manglu | Air Terjun Kedung Pedut | Air Terjun Sindet | Air Terjun Sri Gethuk | Aisha tour planner & transport service | Alun Alun Selatan Yogyakarta | ... | Wisata Kraton Jogja | Wisata Pangol Hill | Wisata Taman Kelinci Borobudur | Wisata Telaga Potorono | Wisata Watu Amben | XT Square | Yogyakarta Night Tours - Meeting Point Klasik : Historical Walking and Food Tour | bukit indah patuk | pantai Trisik | puncak bucu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARTJOG MMXIX | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Affandi Museum | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Agro Tourism Bhumi Merapi | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Air Terjun Banyu Nibo | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Air Terjun Kedung Manglu | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


The similarities will be calculated using cosine similarity.  

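The resulting similarity ranges from −1, meaning exactly opposite, to 1, meaning exactly the same, with 0 indicating orthogonality (decorrelation) and in-between values indicating intermediate similarity or dissimilarity. Because the ratings here are non-negative (unrated places are 0), every similarity in this matrix falls between 0 and 1.  

Essentially, cosine similarity divides the dot product of two columns (the sum of their element-wise products) by the product of their Euclidean norms (the square roots of each column's sum of squares): $\cos(\mathbf{a}, \mathbf{b}) = \frac{\sum_u a_u b_u}{\sqrt{\sum_u a_u^2}\,\sqrt{\sum_u b_u^2}}$, where $a_u$ and $b_u$ are the ratings user $u$ gave to places $a$ and $b$.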
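As a cross-check of the formula, the sketch below computes the whole place-by-place similarity matrix at once with NumPy on a hypothetical 3 × 3 rating matrix, then compares one entry with `1 - scipy.spatial.distance.cosine`, which is what the notebook's loop uses. It assumes every place has at least one nonzero rating; an all-zero column would make the norm product 0 and produce NaN.

```python
import numpy as np
from scipy.spatial.distance import cosine

# Hypothetical user x place rating matrix (rows = users, columns = places); 0 = not rated.
ratings = np.array([
    [4.0, 0.0, 5.0],
    [3.0, 2.0, 0.0],
    [0.0, 4.0, 1.0],
])

# Vectorized cosine similarity between all pairs of columns (places):
# column dot products divided by the outer product of the column norms.
norms = np.linalg.norm(ratings, axis=0)
sim = (ratings.T @ ratings) / np.outer(norms, norms)

# scipy's cosine() returns the cosine distance, so 1 - distance is the similarity.
loop_value = 1 - cosine(ratings[:, 0], ratings[:, 2])
print(np.round(sim, 3))
```

For large matrices this avoids the nested Python loop, at the cost of holding the full matrix product in memory.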

In [None]:
# Calculate the cosine similarity between every pair of places (columns).
# scipy's cosine() returns the cosine *distance*, so 1 - distance gives the similarity.
for i in range(len(place_similarities.columns)):
    # Compare place i with every place j (including itself).
    for j in range(len(place_similarities.columns)):
        # Fill in the placeholder with the cosine similarity.
        place_similarities.iloc[i, j] = 1 - cosine(data_placebased.iloc[:, i], data_placebased.iloc[:, j])

In [None]:
place_similarities.head()

| place_name | ARTJOG MMXIX | Affandi Museum | Agro Tourism Bhumi Merapi | Air Terjun Banyu Nibo | Air Terjun Kedung Manglu | Air Terjun Kedung Pedut | Air Terjun Sindet | Air Terjun Sri Gethuk | Aisha tour planner & transport service | Alun Alun Selatan Yogyakarta | ... | Wisata Kraton Jogja | Wisata Pangol Hill | Wisata Taman Kelinci Borobudur | Wisata Telaga Potorono | Wisata Watu Amben | XT Square | Yogyakarta Night Tours - Meeting Point Klasik : Historical Walking and Food Tour | bukit indah patuk | pantai Trisik | puncak bucu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARTJOG MMXIX | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Affandi Museum | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Agro Tourism Bhumi Merapi | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Air Terjun Banyu Nibo | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Air Terjun Kedung Manglu | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


#### Display Most Similar Tourist Attractions

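With the similarity matrix filled out, each place's *neighbours* (i.e., closely similar places) can be found by looping through **place_similarities**, sorting each column in descending order, and taking the names of the top entries. In this case, I use the top 10. Because every place has a similarity of 1 with itself, the first neighbour is normally the place itself, which is why later steps skip position 0.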
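The same top-k lookup can be sketched with `numpy.argsort` on a hypothetical 4 × 4 similarity matrix (`names` and `sim` are illustrative, not notebook variables). Sorting the negated rows yields indices in descending order of similarity, so column 0 of the result is normally the place itself. With `kind="stable"`, ties are broken by column order, which may differ from how pandas' `sort_values` orders the many tied zero similarities in the output below.

```python
import numpy as np

# Hypothetical place names and a symmetric place x place similarity matrix.
names = np.array(["place_a", "place_b", "place_c", "place_d"])
sim = np.array([
    [1.0, 0.2, 0.7, 0.0],
    [0.2, 1.0, 0.1, 0.5],
    [0.7, 0.1, 1.0, 0.3],
    [0.0, 0.5, 0.3, 1.0],
])

k = 3  # the notebook uses 10
# Indices of each row's k largest similarities, highest first.
order = np.argsort(-sim, axis=1, kind="stable")[:, :k]
neighbours = names[order]
print(neighbours)
```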

In [None]:
# Looking for neighbour data based on the similarity matrix.
data_neighbours = pd.DataFrame(index=place_similarities.columns,columns=range(1,11))

# Loop through our similarity dataframe and fill in neighbouring place names.
for i in range(0,len(place_similarities.columns)):
    data_neighbours.iloc[i,:10] = place_similarities.iloc[0:,i].sort_values(ascending=False)[:10].index # Display the top 10 neighbours

data_neighbours

| place_name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| ARTJOG MMXIX | ARTJOG MMXIX | Pantai Ngrenehan | Pantai Sedahan | Pantai Sanglen | Pantai Samas | Pantai Sadranan | Pantai Pulang Sawal | Pantai Pok Tunggal | Pantai Patihan | Pantai Pasir Puncu |
| Affandi Museum | Affandi Museum | Pantai Patihan | Pantai Nguluran | Pantai Parangkusumo | Pantai Parangracuk | Pantai Parangtritis | Pantai Pasir Mendit | Pantai Pasir Puncu | Pantai Pok Tunggal | Puncak Kuda Sembrani-Desa Wisata Banjarasri |
| Agro Tourism Bhumi Merapi | Agro Tourism Bhumi Merapi | ARTJOG MMXIX | Pantai Pok Tunggal | Pantai Parangkusumo | Pantai Parangracuk | Pantai Parangtritis | Pantai Pasir Mendit | Pantai Pasir Puncu | Pantai Patihan | Pantai Pulang Sawal |
| Air Terjun Banyu Nibo | Air Terjun Banyu Nibo | ARTJOG MMXIX | Pantai Pok Tunggal | Pantai Parangkusumo | Pantai Parangracuk | Pantai Parangtritis | Pantai Pasir Mendit | Pantai Pasir Puncu | Pantai Patihan | Pantai Pulang Sawal |
| Air Terjun Kedung Manglu | Air Terjun Kedung Manglu | ARTJOG MMXIX | Pantai Pok Tunggal | Pantai Parangkusumo | Pantai Parangracuk | Pantai Parangtritis | Pantai Pasir Mendit | Pantai Pasir Puncu | Pantai Patihan | Pantai Pulang Sawal |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| XT Square | XT Square | ARTJOG MMXIX | Pantai Patihan | Pantai Nguluran | Pantai Parangkusumo | Pantai Parangracuk | Pantai Parangtritis | Pantai Pasir Mendit | Pantai Pasir Puncu | Pantai Pok Tunggal |
| Yogyakarta Night Tours - Meeting Point Klasik : Historical Walking and Food Tour | Yogyakarta Night Tours - Meeting Point Klasik ... | ARTJOG MMXIX | Pantai Patihan | Pantai Nguluran | Pantai Parangkusumo | Pantai Parangracuk | Pantai Parangtritis | Pantai Pasir Mendit | Pantai Pasir Puncu | Pantai Pok Tunggal |
| bukit indah patuk | bukit indah patuk | ARTJOG MMXIX | Pantai Patihan | Pantai Nguluran | Pantai Parangkusumo | Pantai Parangracuk | Pantai Parangtritis | Pantai Pasir Mendit | Pantai Pasir Puncu | Pantai Pok Tunggal |
| pantai Trisik | pantai Trisik | ARTJOG MMXIX | Pantai Patihan | Pantai Nguluran | Pantai Parangkusumo | Pantai Parangracuk | Pantai Parangtritis | Pantai Pasir Mendit | Pantai Pasir Puncu | Pantai Pok Tunggal |


### User-Based Collaborative Filtering

#### Get Similarity Scores

In [None]:
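# Helper function to predict a score as the similarity-weighted average of a user's ratings:
# sum(rating * similarity) / sum(similarity) over the neighbouring places.
# When the similarities sum to 0 the result is undefined (in this notebook it becomes NaN),
# and such scores are filtered out before training.
def getScore(history, similarities):
    return sum(history * similarities) / sum(similarities)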
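A small worked example of the weighted average, using hypothetical ratings and similarities for three neighbouring places (the place names are illustrative). It also shows that, because pandas Series multiplication aligns on the index (place names), the order in which neighbours are listed does not change the score.

```python
import pandas as pd


def getScore(history, similarities):
    # Same helper as in the notebook: similarity-weighted average of the user's ratings.
    return sum(history * similarities) / sum(similarities)


# Hypothetical ratings one user gave to three neighbouring places, and each
# neighbour's similarity to the target place, both indexed by place name.
history = pd.Series({"place_b": 4.0, "place_c": 0.0, "place_d": 5.0})
similarities = pd.Series({"place_b": 0.5, "place_c": 0.3, "place_d": 0.2})

# (4*0.5 + 0*0.3 + 5*0.2) / (0.5 + 0.3 + 0.2) = 3.0 / 1.0
score = getScore(history, similarities)

# Series multiplication aligns on place names, so neighbour order does not matter.
score_reordered = getScore(history, similarities[["place_d", "place_b", "place_c"]])
print(score, score_reordered)
```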

In [None]:
# Reset wide data index to start from 0 and store to user-based dataframe.
data_userbased = data_wide.reset_index()
data_userbased.head()

|  | user_id | ARTJOG MMXIX | Affandi Museum | Agro Tourism Bhumi Merapi | Air Terjun Banyu Nibo | Air Terjun Kedung Manglu | Air Terjun Kedung Pedut | Air Terjun Sindet | Air Terjun Sri Gethuk | Aisha tour planner & transport service | ... | Wisata Kraton Jogja | Wisata Pangol Hill | Wisata Taman Kelinci Borobudur | Wisata Telaga Potorono | Wisata Watu Amben | XT Square | Yogyakarta Night Tours - Meeting Point Klasik : Historical Walking and Food Tour | bukit indah patuk | pantai Trisik | puncak bucu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [None]:
# Create a placeholder matrix for each user's predicted place scores and fill in the user ID column.
user_similarities = pd.DataFrame(index=data_userbased.index,columns=data_userbased.columns)
user_similarities.iloc[:,:1] = data_userbased.iloc[:,:1]
user_similarities.head()

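|  | user_id | ARTJOG MMXIX | Affandi Museum | Agro Tourism Bhumi Merapi | Air Terjun Banyu Nibo | Air Terjun Kedung Manglu | Air Terjun Kedung Pedut | Air Terjun Sindet | Air Terjun Sri Gethuk | Aisha tour planner & transport service | ... | Wisata Kraton Jogja | Wisata Pangol Hill | Wisata Taman Kelinci Borobudur | Wisata Telaga Potorono | Wisata Watu Amben | XT Square | Yogyakarta Night Tours - Meeting Point Klasik : Historical Walking and Food Tour | bukit indah patuk | pantai Trisik | puncak bucu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |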


#### Processing the First 500 Users (Rows)

In [None]:
# New dataframe for the first 500 users (rows) and all places (columns) of the data_userbased dataframe.
data_userbased_500 = data_userbased.iloc[:500,:]

# New dataframe for the first 500 users (rows) and all places (columns) of the user_similarities dataframe.
user_similarities_500 = user_similarities.iloc[:500,:]
# Run for only 500 users because it might be too slow beyond that.

In [None]:
# Iterate through each user (row) and each place (column); column 0 holds user_id, so start at 1.
for i in range(len(user_similarities_500.index)):
    for j in range(1, len(user_similarities_500.columns)):

        # Get the current user (row label) and tourism (place name).
        user = user_similarities_500.index[i]
        tourism = user_similarities_500.columns[j]

        # If the user has already rated the place (nonzero entry, since NaN was filled with 0),
        # set the predicted rating to 0 so the place is not recommended again.
        if data_userbased_500.iloc[i, j] != 0:
            user_similarities_500.iloc[i, j] = 0
        # If the user has not rated the place, predict the rating.
        else:
            # Get the 9 places most similar to the current tourism (position 0 is the place itself).
            tourism_top_names = data_neighbours.loc[tourism].iloc[1:10]
            # Get the similarity scores between the current tourism and those similar places.
            tourism_top_sims = place_similarities.loc[tourism].sort_values(ascending=False).iloc[1:10]
            # Get the ratings the user has given to those similar places.
            user_rated = data_placebased.loc[user, tourism_top_names.tolist()]

            # Calculate the predicted rating using the getScore function.
            # Single-step .iloc[i, j] assignment avoids chained assignment (df.iloc[i][j] = ...).
            user_similarities_500.iloc[i, j] = getScore(user_rated, tourism_top_sims)
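The nested loop above is why the notebook stops at 500 users. The same item-based neighbourhood prediction can be sketched with matrix operations; `predict_scores` is a hypothetical helper, not part of the original notebook, and it has not been run on this dataset. It assumes a float user × place rating matrix (0 = not rated) and a place × place similarity matrix. The loop's `[1:10]` selects 9 neighbours, which corresponds to `k=9` here, and tied similarities may be broken differently than by pandas' `sort_values`.

```python
import numpy as np


def predict_scores(ratings, sim, k):
    """Item-based neighbourhood prediction for every user-place pair at once.

    ratings: (n_users, n_places) array, 0 = not rated.
    sim:     (n_places, n_places) place-similarity array.
    k:       neighbours per place, excluding the place itself.
    """
    n_places = sim.shape[0]
    # Column indices of each place's k most similar places, skipping the first
    # (normally the place itself, with similarity 1).
    order = np.argsort(-sim, axis=1, kind="stable")[:, 1:k + 1]
    # W[p, q] = sim[p, q] if q is one of p's neighbours, otherwise 0.
    rows = np.arange(n_places)[:, None]
    weights = np.zeros_like(sim)
    weights[rows, order] = sim[rows, order]
    # Similarity-weighted average of each user's ratings on each place's neighbours;
    # a zero weight sum gives NaN, like the notebook's getScore results.
    with np.errstate(invalid="ignore", divide="ignore"):
        scores = (ratings @ weights.T) / weights.sum(axis=1)
    # Places the user has already rated get 0, so they are not recommended again.
    scores[ratings != 0] = 0.0
    return scores


# Hypothetical data: 2 users x 3 places and a symmetric similarity matrix.
ratings = np.array([[4.0, 0.0, 5.0],
                    [0.0, 3.0, 0.0]])
sim = np.array([[1.0, 0.6, 0.2],
                [0.6, 1.0, 0.4],
                [0.2, 0.4, 1.0]])
scores = predict_scores(ratings, sim, k=1)
print(scores)
```

Here user 0 has not rated the second place, whose single neighbour (the first place, rated 4.0) gives it a predicted score of 4.0.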

### User Recommendation

In [None]:
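# Get the top places for each of the first 500 users (the only users with predicted scores).
data_recommend = pd.DataFrame(index=user_similarities_500.index, columns=['user_id','1','2','3','4','5','6'])
data_recommend.iloc[:, 0] = user_similarities_500.iloc[:, 0]

In [None]:
# Instead of the top place scores, store the place names.
# Sort only the place columns (excluding user_id) and keep the 6 highest-scoring places.
for i in range(len(user_similarities_500.index)):
    top_places = user_similarities_500.iloc[i, 1:].astype(float).sort_values(ascending=False).iloc[:6]
    data_recommend.iloc[i, 1:] = top_places.index.to_numpy()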

In [None]:
# Print samples.
print(data_recommend.iloc[:10, :4])

  user_id                                    1  \
0       1                       Pantai Patihan   
1       2                   Keraton Yogyakarta   
2       3              Taman Budaya Yogyakarta   
3       4  Museum Benteng Vredeburg Yogyakarta   
4       5                         Pantai Samas   
5       6  Museum Benteng Vredeburg Yogyakarta   
6       7                     Situs Warungboto   
7       8                 Studio Alam Gamplong   
8       9           Sindu Kusuma Edupark (SKE)   
9      10                         Pantai Jogan   

                                     2                                    3  
0  Desa Wisata Rumah Domes/Teletubbies                The Lost World Castle  
1              Taman Pintar Yogyakarta                      Candi Prambanan  
2          Bukit Paralayang, Watugupit                   Keraton Yogyakarta  
3   Alun-alun Utara Keraton Yogyakarta              Bukit Wisata Pulepayung  
4                 Desa Wisata Pulesari                     
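Ranking each user's scores and keeping the top place names, as the recommendation cell does with `sort_values(ascending=False)`, can be sketched with NumPy on a hypothetical score matrix (`place_names` and `scores` are illustrative). pandas places NaN last by default (`na_position="last"`); the sketch mimics that by mapping NaN to `-inf` before sorting.

```python
import numpy as np

# Hypothetical predicted scores (2 users x 4 places); NaN = no prediction possible.
place_names = np.array(["place_a", "place_b", "place_c", "place_d"])
scores = np.array([[0.0, 4.2, np.nan, 3.1],
                   [2.5, 0.0, 4.8, np.nan]])

n = 2  # the notebook keeps the top 6
# Replace NaN with -inf so it always ranks last.
ranked_scores = np.where(np.isnan(scores), -np.inf, scores)
# argsort of the negated scores gives indices from highest to lowest score.
top_idx = np.argsort(-ranked_scores, axis=1, kind="stable")[:, :n]
top_n = place_names[top_idx]
print(top_n)
```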

## Machine Learning Model

### Generate Training Data

In [None]:
train_users = []
train_places = []
train_ratings = []

In [None]:
# Create a mapping from place names to numerical indices.
item_index = {tourism: i for i, tourism in enumerate(user_similarities_500.columns[1:])}

for i in range(len(user_similarities_500.index)):
    for j in range(1, len(user_similarities_500.columns)):
        user = user_similarities_500.index[i]
        tourism = user_similarities_500.columns[j]
        score = user_similarities_500.iloc[i, j]
        # Only include meaningful scores in training.
        # np.isfinite is False for both NaN and inf, e.g. when getScore divides by a zero similarity sum.
        if np.isfinite(score):
            train_users.append(user)
            # Use the numerical index instead of the place name.
            train_places.append(item_index[tourism])
            train_ratings.append(score)
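The same (user, place, score) triples could be collected without Python loops. This sketch assumes the scores have first been converted to a float array without the `user_id` column (for example `user_similarities_500.iloc[:, 1:].to_numpy(dtype=float)`), so that row positions match the user indices and column positions match `item_index`; the matrix below is hypothetical.

```python
import numpy as np

# Hypothetical predicted-score matrix (2 users x 3 places) containing NaN and inf.
scores = np.array([[0.0, 3.5, np.nan],
                   [np.inf, 4.0, 2.0]])

# Row/column indices of every finite score, in the same row-major order as the nested loop.
users, places = np.nonzero(np.isfinite(scores))
targets = scores[users, places]
print(users, places, targets)
```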

In [None]:
# Convert to numpy arrays.
train_users = np.array(train_users)
train_places = np.array(train_places)
train_ratings = np.array(train_ratings)

### Input Training Data to Model

In [None]:
# Number of users and places.
n_users = len(user_similarities_500.index)
n_places = len(user_similarities_500.columns) - 1

In [None]:
# Define inputs.
user_input = Input(shape=(1,))
place_input = Input(shape=(1,))

### Layers of the Model

In [None]:
# Embeddings for users and places with L2 regularization to prevent overfitting.
user_embedding = Embedding(input_dim=n_users, output_dim=200, embeddings_regularizer=regularizers.l2(0.001))(user_input)
place_embedding = Embedding(input_dim=n_places, output_dim=200, embeddings_regularizer=regularizers.l2(0.001))(place_input)

In [None]:
# Flatten embeddings.
user_vec = Flatten()(user_embedding)
place_vec = Flatten()(place_embedding)

# Combine embeddings.
x = Concatenate()([user_vec, place_vec])

In [None]:
# Fully connected layers with Dropout and L2 regularization.
x = Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = Dropout(0.2)(x)  # Dropout to prevent overfitting.
x = Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.001))(x)
x = layers.BatchNormalization()(x) # Improve training.
x = Dropout(0.2)(x)  # Dropout to prevent overfitting.

output = Dense(1)(x)

### Compile The Model

In [None]:
# Compile the model.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.00005)
model = Model(inputs=[user_input, place_input], outputs=output)
model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['mae'])

### Train and Evaluate The Model

In [None]:
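# Define EarlyStopping to monitor the validation MAE, stop training if it doesn't improve for 10 epochs,
# and restore the weights from the best epoch.
# This helps avoid overfitting when the model starts to memorize the training data.
# Note: validation_split takes the last 20% of the samples (before shuffling). Because the arrays
# are ordered by user, the validation set consists of samples from the last users only.
early_stopping = EarlyStopping(monitor='val_mae', patience=10, restore_best_weights=True)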

# Train the model.
history = model.fit(
    [train_users, train_places],
    train_ratings,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stopping]
)

Epoch 1/50
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 14s 8ms/step - loss: 0.6592 - mae: 0.3897 - val_loss: 0.4416 - val_mae: 0.2404
Epoch 2/50
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 19s 6ms/step - loss: 0.4290 - mae: 0.2417 - val_loss: 0.4034 - val_mae: 0.2416
Epoch 3/50
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 12s 8ms/step - loss: 0.3838 - mae: 0.2326 - val_loss: 0.3591 - val_mae: 0.2417
Epoch 4/50
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 12s 8ms/step - loss: 0.3332 - mae: 0.2297 - val_loss: 0.3046 - val_mae: 0.2414
Epoch 5/50
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 21s 8ms/step - loss: 0.2748 - mae: 0.2262 - val_loss: 0.2500 - val_mae: 0.2412
Epoch 6/50
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 12s 8ms/step - loss: 0.2217 - mae: 0.2256 - val_loss: 0.2044 - val_mae: 0.2412
Epoch 7/50
1575/1575 ━━━━━━━━━━━━━━━━━━━━ ...

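Keras selects the `validation_split` samples from the end of the arrays, before any shuffling, and the training arrays here are built user by user. One way to get a mixed validation set is to shuffle all three arrays with one shared permutation before calling `model.fit`. A sketch on hypothetical stand-in arrays (5 users × 4 places), not the notebook's real data:

```python
import numpy as np

# Hypothetical stand-ins ordered by user, like the generation loop:
# sample k belongs to user k // 4 and place k % 4.
train_users = np.repeat(np.arange(5), 4)
train_places = np.tile(np.arange(4), 5)
train_ratings = np.linspace(0.0, 5.0, train_users.size)

# One shared permutation keeps each (user, place, rating) triple aligned.
rng = np.random.default_rng(seed=42)
perm = rng.permutation(train_users.size)
train_users = train_users[perm]
train_places = train_places[perm]
train_ratings = train_ratings[perm]

# The last 20% of the shuffled arrays is what validation_split=0.2 would hold out.
n_val = int(0.2 * train_users.size)
print("validation users:", train_users[-n_val:])
```

An alternative with the same effect is to split the arrays yourself and pass the held-out part through `model.fit(..., validation_data=...)`.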
In [None]:
# Evaluate the model on all generated samples.
# Note: 80% of these samples were used for training, so these are largely in-sample errors.
predicted_ratings = model.predict([train_users, train_places]).flatten()

# Keep only finite predictions as a precaution, in case the model outputs NaN or inf.
# New variables are used so that re-running this cell does not shrink train_ratings.
mask = np.isfinite(predicted_ratings)
y_pred = predicted_ratings[mask]
y_true = train_ratings[mask]

# Calculate RMSE.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# Calculate MAE.
mae = mean_absolute_error(y_true, y_pred)

print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"Mean Absolute Error (MAE): {mae}")

1969/1969 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step
Root Mean Squared Error (RMSE): 0.22269916424218514
Mean Absolute Error (MAE): 0.15909006009783125


**Interpreting the results:**
- RMSE: This metric is the square root of the mean squared difference between the predicted scores and the target scores (here, the collaborative-filtering scores the model was trained on). Because errors are squared before averaging, large errors weigh more heavily. A lower RMSE indicates better accuracy. On a typical 1 to 5 integer rating scale, an RMSE below 1 is generally considered good. In this case, I aimed for an RMSE below 0.25 to minimize the error.
- MAE: This metric is the mean absolute difference between the predicted and target scores. It is less sensitive to outliers than RMSE, and a lower MAE also indicates better accuracy. As with RMSE, an MAE below 1 is generally considered good for a 1 to 5 rating scale. In this case, I also aimed for an MAE below 0.25 to minimize the error.
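The two definitions above are $\text{RMSE} = \sqrt{\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2}$ and $\text{MAE} = \frac{1}{n}\sum_i |\hat{y}_i - y_i|$. The hypothetical values below, where a single prediction is off by 2, show how squaring makes RMSE react more strongly than MAE to one large error:

```python
import numpy as np

# Hypothetical targets and predictions; only the last prediction is wrong (by 2).
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])

errors = y_pred - y_true
rmse = np.sqrt(np.mean(errors ** 2))  # sqrt((0 + 0 + 0 + 4) / 4) = 1.0
mae = np.mean(np.abs(errors))         # (0 + 0 + 0 + 2) / 4 = 0.5
print(f"RMSE: {rmse}, MAE: {mae}")
```

RMSE is always at least as large as MAE; a large gap between them suggests a few big errors rather than many small ones.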

## Save The Model

In [None]:
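# Save the model to an H5 file and a Pickle file, two formats commonly used when deploying machine learning models.
# Note: recent Keras versions treat .h5 as a legacy format and recommend the native .keras format.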

# Save model to H5 file
model.save(dir+'model/jgo.h5')
files.download(dir+'model/jgo.h5')

# Save model to Pickle file
with open(dir+'model/jgo.pkl', 'wb') as f:
    pickle.dump(model, f)
files.download(dir+'model/jgo.pkl')


