# Acknowledgements
---

This dataset is taken from [https://www.kaggle.com](https://www.kaggle.com/datasets/ruchi798/bookcrossing-dataset)

# Import Dataset from Kaggle
---

Download the dataset first, then unzip it

In [1]:
# from google.colab import drive

# drive.mount('/content/drive')

In [2]:
# !mkdir ~/.kaggle
# !cp /content/drive/MyDrive/kaggle.json ~/.kaggle
# !chmod 600 ~/.kaggle/kaggle.json

In [3]:
# !kaggle datasets download -d ruchi798/bookcrossing-dataset

In [4]:
# !unzip bookcrossing-dataset.zip

# Import Library for Exploratory Data Analysis

---

Import the libraries used for data analysis, data visualization, data preprocessing, and modeling

In [5]:
# library for data loading and data analysis
import pandas as pd
import numpy as np

# library for data visualization
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline

# Data Loading

In [6]:
path = 'Books Data with Category Language and Summary/Preprocessed_data.csv'
df = pd.read_csv(path, index_col=[0])
df.head()

Unnamed: 0,user_id,location,age,isbn,rating,book_title,book_author,year_of_publication,publisher,img_s,img_m,img_l,Summary,Language,Category,city,state,country
0,2,"stockton, california, usa",18.0,195153448,0,Classical Mythology,Mark P. O. Morford,2002.0,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,Provides an introduction to classical myths pl...,en,['Social Science'],stockton,california,usa
1,8,"timmins, ontario, canada",34.7439,2005018,5,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],timmins,ontario,canada
2,11400,"ottawa, ontario, canada",49.0,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],ottawa,ontario,canada
3,11676,"n/a, n/a, n/a",34.7439,2005018,8,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],,,
4,41385,"sudbury, ontario, canada",34.7439,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],sudbury,ontario,canada


In [7]:
# check the shape of dataframe
print(f'The data has {df.shape[0]} records and {df.shape[1]} columns.')

The data has 1031175 records and 18 columns.


# Exploratory Data Analysis
---

The dataset has 1,031,175 rows and 18 columns:
* user_id : the ID of the user
* location : where the user lives
* age : the user's age
* isbn : the book's identifier code
* rating : the rating the user gave the book
* book_title : the title of the book
* book_author : the author of the book
* year_of_publication : the year the book was published
* publisher : the book's publisher
* img_s/img_m/img_l : URLs of the book's cover image (small, medium, large)
* Summary : a synopsis of the book
* Language : the language of the book
* Category : the category of the book
* city : the city, taken from the user's location
* state : the state or province, taken from the user's location
* country : the country, taken from the user's location

### Checking whether the data has missing values

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1031175 entries, 0 to 1031174
Data columns (total 18 columns):
 #   Column               Non-Null Count    Dtype  
---  ------               --------------    -----  
 0   user_id              1031175 non-null  int64  
 1   location             1031175 non-null  object 
 2   age                  1031175 non-null  float64
 3   isbn                 1031175 non-null  object 
 4   rating               1031175 non-null  int64  
 5   book_title           1031175 non-null  object 
 6   book_author          1031175 non-null  object 
 7   year_of_publication  1031175 non-null  float64
 8   publisher            1031175 non-null  object 
 9   img_s                1031175 non-null  object 
 10  img_m                1031175 non-null  object 
 11  img_l                1031175 non-null  object 
 12  Summary              1031175 non-null  object 
 13  Language             1031175 non-null  object 
 14  Category             1031175 non-null  object 
 15

As we can see, the non-null counts differ across columns. This indicates that the data contains missing values

In [9]:
print('Total missing value in dataframe:', df.isnull().sum().sum(), 'records')

Total missing value in dataframe: 72275 records


In [10]:
col_with_missing = [col for col in df.columns if df[col].isnull().any()]
print('Column with missing value:', col_with_missing)

Column with missing value: ['city', 'state', 'country']
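
The check above lists which columns have gaps, but not how large they are. A minimal sketch of a per-column report, assuming a small made-up frame (`toy`, `missing`, and `report` are illustrative names, not part of the original notebook):

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the structure of the real data
toy = pd.DataFrame({
    'location': ['stockton, california, usa', 'n/a, n/a, n/a', 'ottawa, ontario, canada'],
    'city': ['stockton', np.nan, 'ottawa'],
    'state': ['california', np.nan, 'ontario'],
})

# Count missing values per column and express them as a share of all rows
missing = toy.isnull().sum()
report = pd.DataFrame({'missing': missing, 'share': missing / len(toy)})

# Keep only the columns that actually have missing values
report = report[report['missing'] > 0]
print(report)
```

Running the same three lines on `df` instead of `toy` would show how much of each column is missing before deciding whether to drop it.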


As shown above, the city, state, and country columns contain missing values. There are many ways to handle missing values, but in this case we drop these columns because they have little influence on book recommendations

In [11]:
df_no_missing = df.drop(col_with_missing, axis=1)
df_no_missing.head()

Unnamed: 0,user_id,location,age,isbn,rating,book_title,book_author,year_of_publication,publisher,img_s,img_m,img_l,Summary,Language,Category
0,2,"stockton, california, usa",18.0,195153448,0,Classical Mythology,Mark P. O. Morford,2002.0,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,Provides an introduction to classical myths pl...,en,['Social Science']
1,8,"timmins, ontario, canada",34.7439,2005018,5,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses']
2,11400,"ottawa, ontario, canada",49.0,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses']
3,11676,"n/a, n/a, n/a",34.7439,2005018,8,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses']
4,41385,"sudbury, ontario, canada",34.7439,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses']


In [12]:
df_no_missing.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1031175 entries, 0 to 1031174
Data columns (total 15 columns):
 #   Column               Non-Null Count    Dtype  
---  ------               --------------    -----  
 0   user_id              1031175 non-null  int64  
 1   location             1031175 non-null  object 
 2   age                  1031175 non-null  float64
 3   isbn                 1031175 non-null  object 
 4   rating               1031175 non-null  int64  
 5   book_title           1031175 non-null  object 
 6   book_author          1031175 non-null  object 
 7   year_of_publication  1031175 non-null  float64
 8   publisher            1031175 non-null  object 
 9   img_s                1031175 non-null  object 
 10  img_m                1031175 non-null  object 
 11  img_l                1031175 non-null  object 
 12  Summary              1031175 non-null  object 
 13  Language             1031175 non-null  object 
 14  Category             1031175 non-null  object 
dty

In [13]:
print('Total missing value in dataframe:', df_no_missing.isnull().sum().sum(), 'records')

Total missing value in dataframe: 0 records


Load the rating dataset

In [14]:
ratings_path = 'Book reviews/Book reviews/BX-Book-Ratings.csv'
ratings = pd.read_csv(ratings_path, encoding='unicode_escape', sep=';')
ratings.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [15]:
# check the shape of dataframe
print(f'The data has {ratings.shape[0]} records and {ratings.shape[1]} columns.')

The data has 1149780 records and 3 columns.


In [16]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1149780 entries, 0 to 1149779
Data columns (total 3 columns):
 #   Column       Non-Null Count    Dtype 
---  ------       --------------    ----- 
 0   User-ID      1149780 non-null  int64 
 1   ISBN         1149780 non-null  object
 2   Book-Rating  1149780 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 26.3+ MB


In [17]:
print('Total missing value in dataframe:', ratings.isnull().sum().sum(), 'records')

Total missing value in dataframe: 0 records


In [18]:
ratings.rename(columns={'ISBN': 'isbn'}, inplace=True)
ratings.head()

Unnamed: 0,User-ID,isbn,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [19]:
ratings_no_dup = ratings.drop_duplicates(['isbn'])
print(f'The data has {ratings_no_dup.shape[0]} records and {ratings_no_dup.shape[1]} columns.')

The data has 340556 records and 3 columns.
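
Note that `drop_duplicates(['isbn'])` keeps only the first row for each ISBN, so the rating that survives is whichever user happened to come first. The sketch below shows this on made-up data and contrasts it with one possible alternative, averaging the non-zero ratings per ISBN. The toy frame and the names `first_per_isbn` and `mean_explicit` are assumptions for illustration and are not part of the original pipeline.

```python
import pandas as pd

# Made-up ratings: two users per book
toy_ratings = pd.DataFrame({
    'User-ID': [1, 2, 3, 4],
    'isbn': ['A', 'A', 'B', 'B'],
    'Book-Rating': [0, 8, 6, 10],
})

# What the notebook does: keep the first rating seen for each ISBN
first_per_isbn = toy_ratings.drop_duplicates(['isbn'])

# A possible alternative (assumption): average the non-zero ratings per ISBN
mean_explicit = (
    toy_ratings[toy_ratings['Book-Rating'] > 0]
    .groupby('isbn', as_index=False)['Book-Rating']
    .mean()
)

print(first_per_isbn)
print(mean_explicit)
```

The rest of the notebook continues with the first-rating approach, so the evaluation at the end should be read with that choice in mind.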


# Data Preparation
---

We drop many columns here because they do not carry information relevant to book recommendations, such as user_id, age, rating, and so on.

In [20]:
col_to_drop = ['user_id', 'location', 'age', 'rating', 'year_of_publication', 'img_s', 'img_m', 'img_l', 'Summary', 'Language']
df_dropped = df_no_missing.drop(col_to_drop, axis=1)
df_dropped.head()

Unnamed: 0,isbn,book_title,book_author,publisher,Category
0,195153448,Classical Mythology,Mark P. O. Morford,Oxford University Press,['Social Science']
1,2005018,Clara Callan,Richard Bruce Wright,HarperFlamingo Canada,['Actresses']
2,2005018,Clara Callan,Richard Bruce Wright,HarperFlamingo Canada,['Actresses']
3,2005018,Clara Callan,Richard Bruce Wright,HarperFlamingo Canada,['Actresses']
4,2005018,Clara Callan,Richard Bruce Wright,HarperFlamingo Canada,['Actresses']


In [21]:
# check the shape of dataframe
print(f'The data has {df_dropped.shape[0]} records and {df_dropped.shape[1]} columns.')

The data has 1031175 records and 5 columns.


In [22]:
# dropping duplicate record
df_no_dup = df_dropped.drop_duplicates(['book_title'])
df_no_dup.head()

Unnamed: 0,isbn,book_title,book_author,publisher,Category
0,195153448,Classical Mythology,Mark P. O. Morford,Oxford University Press,['Social Science']
1,2005018,Clara Callan,Richard Bruce Wright,HarperFlamingo Canada,['Actresses']
15,60973129,Decision in Normandy,Carlo D'Este,HarperPerennial,['1940-1949']
18,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,Farrar Straus Giroux,['Medical']
29,393045218,The Mummies of Urumchi,E. J. W. Barber,W. W. Norton & Company,['Design']


In [23]:
# check the shape of dataframe
print(f'The data has {df_no_dup.shape[0]} records and {df_no_dup.shape[1]} columns.')

The data has 241090 records and 5 columns.


Merge ratings dataframe with book dataframe

In [24]:
book_with_rating = df_no_dup.merge(ratings_no_dup, on='isbn').drop(['isbn', 'User-ID'], axis=1)
book_with_rating.head()

Unnamed: 0,book_title,book_author,publisher,Category,Book-Rating
0,Classical Mythology,Mark P. O. Morford,Oxford University Press,['Social Science'],0
1,Clara Callan,Richard Bruce Wright,HarperFlamingo Canada,['Actresses'],5
2,Decision in Normandy,Carlo D'Este,HarperPerennial,['1940-1949'],0
3,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,Farrar Straus Giroux,['Medical'],0
4,The Mummies of Urumchi,E. J. W. Barber,W. W. Norton & Company,['Design'],0


In [25]:
book_with_rating.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 241090 entries, 0 to 241089
Data columns (total 5 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   book_title   241090 non-null  object
 1   book_author  241090 non-null  object
 2   publisher    241090 non-null  object
 3   Category     241090 non-null  object
 4   Book-Rating  241090 non-null  int64 
dtypes: int64(1), object(4)
memory usage: 11.0+ MB


In [26]:
# check the shape of dataframe
print(f'The data has {book_with_rating.shape[0]} records and {book_with_rating.shape[1]} columns.')

The data has 241090 records and 5 columns.


Next, we remove categories that have fewer than 100 or more than 1000 books, to reduce computation time and memory

In [27]:
# count books per category once, then flag categories outside the 100-1000 range
cat_counts = book_with_rating['Category'].value_counts()
unused_cat = cat_counts[(cat_counts < 100) | (cat_counts > 1000)].index.tolist()

In [28]:
clean_cat_df = book_with_rating.loc[~book_with_rating['Category'].isin(unused_cat)]
clean_cat_df.head()

Unnamed: 0,book_title,book_author,publisher,Category,Book-Rating
3,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,Farrar Straus Giroux,['Medical'],0
4,The Mummies of Urumchi,E. J. W. Barber,W. W. Norton & Company,['Design'],0
16,More Cunning Than Man: A Social History of Rat...,Robert Hendrickson,Kensington Publishing Corp.,['Nature'],6
22,If I'd Known Then What I Know Now: Why Not Lea...,J. R. Parrish,Cypress House,['Reference'],10
70,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,Ballantine Books,['Science'],0


In [29]:
# check the shape of dataframe
print(f'The data has {clean_cat_df.shape[0]} records and {clean_cat_df.shape[1]} columns.')

The data has 19461 records and 5 columns.


### Data Preprocessing
Because the values in the Category column are wrapped in square brackets, they must be cleaned before they are passed to the model

In [30]:
def clean_category(text):
  """Strip square brackets, quotes, and periods from a raw Category string."""
  text = re.sub(r'[\[\]]', '', text)
  text = text.replace("'", '')
  text = text.replace('"', '')
  text = text.replace('.', '')
  return text

In [31]:
import re

# Take an explicit copy first so the new column is not assigned to a slice of book_with_rating
clean_cat_df = clean_cat_df.copy()
clean_cat_df['clean_category'] = clean_cat_df['Category'].apply(clean_category)
clean_cat_df.head()


Unnamed: 0,book_title,book_author,publisher,Category,Book-Rating,clean_category
3,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,Farrar Straus Giroux,['Medical'],0,Medical
4,The Mummies of Urumchi,E. J. W. Barber,W. W. Norton & Company,['Design'],0,Design
16,More Cunning Than Man: A Social History of Rat...,Robert Hendrickson,Kensington Publishing Corp.,['Nature'],6,Nature
22,If I'd Known Then What I Know Now: Why Not Lea...,J. R. Parrish,Cypress House,['Reference'],10,Reference
70,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,Ballantine Books,['Science'],0,Science
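
The same cleaning can also be written as a single vectorized string operation. This is a sketch of an alternative, not the notebook's method. The sample strings are made up to mimic the raw Category format, and `pattern` is an illustrative name.

```python
import pandas as pd

# Made-up raw values in the same shape as the Category column
categories = pd.Series([
    "['Social Science']",
    "[\"Children's stories\"]",
    "['Medical.']",
])

# One character class covering brackets, both quote styles, and periods
pattern = r"""[\[\]'".]"""
cleaned = categories.str.replace(pattern, '', regex=True)
print(cleaned.tolist())
```

On a column of this size the difference is mostly one of style, since `apply` with a small Python function is also fast enough here.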


Check whether the categories are now free of trailing periods

In [32]:
clean_cat_sort = np.sort(clean_cat_df['clean_category'].unique())
for cat in clean_cat_sort:
  print(cat)

Adolescence
Adventure and adventurers
Adventure stories
African Americans
American fiction
Animals
Antiques & Collectibles
Architecture
Art
Australia
Bible
Brothers and sisters
Cats
Childrens stories
Christian life
Comics & Graphic Novels
Crafts & Hobbies
Design
Detective and mystery stories
Dogs
Drama
Education
English fiction
English language
Families
Foreign Language Study
Friendship
Games
Games & Activities
Gardening
House & Home
Language Arts & Disciplines
Law
Literary Collections
Literary Criticism
Mathematics
Medical
Music
Nature
Performing Arts
Pets
Philosophy
Photography
Poetry
Political Science
Reference
Science
Sports & Recreation
Technology & Engineering
True Crime


Categories that mean the same thing but are spelled differently could be normalized at this point. Here we only drop the original Category column and keep clean_category

In [33]:
df_clean = clean_cat_df.drop(['Category'], axis=1)
df_clean.head()

Unnamed: 0,book_title,book_author,publisher,Book-Rating,clean_category
3,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,Farrar Straus Giroux,0,Medical
4,The Mummies of Urumchi,E. J. W. Barber,W. W. Norton & Company,0,Design
16,More Cunning Than Man: A Social History of Rat...,Robert Hendrickson,Kensington Publishing Corp.,6,Nature
22,If I'd Known Then What I Know Now: Why Not Lea...,J. R. Parrish,Cypress House,10,Reference
70,The Dragons of Eden: Speculations on the Evolu...,Carl Sagan,Ballantine Books,0,Science


In [34]:
# check the shape of dataframe
print(f'The data has {df_clean.shape[0]} records and {df_clean.shape[1]} columns.')

The data has 19461 records and 5 columns.


In [35]:
for cat in clean_cat_sort:
  print(cat, 'has', len(df_clean[df_clean['clean_category'] == cat]), 'records')

Adolescence has 109 records
Adventure and adventurers has 102 records
Adventure stories has 192 records
African Americans has 106 records
American fiction has 148 records
Animals has 305 records
Antiques & Collectibles has 152 records
Architecture has 183 records
Art has 867 records
Australia has 106 records
Bible has 123 records
Brothers and sisters has 153 records
Cats has 163 records
Childrens stories has 386 records
Christian life has 183 records
Comics & Graphic Novels has 539 records
Crafts & Hobbies has 654 records
Design has 117 records
Detective and mystery stories has 398 records
Dogs has 117 records
Drama has 685 records
Education has 620 records
English fiction has 128 records
English language has 132 records
Families has 128 records
Foreign Language Study has 349 records
Friendship has 195 records
Games has 218 records
Games & Activities has 228 records
Gardening has 419 records
House & Home has 291 records
Language Arts & Disciplines has 669 records
Law has 192 records
Li

# Modeling
---

For this model we use content-based filtering, whose goal is to find the similarity between books

In [36]:
from sklearn.feature_extraction.text import TfidfVectorizer

tf = TfidfVectorizer()
tfidf_matrix = tf.fit_transform(df_clean['clean_category'])

In [37]:
from sklearn.metrics.pairwise import cosine_similarity
 
# Compute the cosine similarity on the TF-IDF matrix
cosine_sim = cosine_similarity(tfidf_matrix)
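
To see what the two cells above compute, here is a minimal sketch on four made-up category strings (the strings and the `toy_` names are illustrative and do not come from the dataset). With default settings, `TfidfVectorizer` lowercases the text, splits it into word tokens, and L2-normalizes each row. Books with the same category therefore get a cosine similarity of 1, books that share only some words get a value between 0 and 1, and books with no words in common get 0.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up categories: two identical, one partial overlap, one unrelated
toy_categories = ['Political Science', 'Political Science', 'Science', 'Design']

toy_matrix = TfidfVectorizer().fit_transform(toy_categories)
toy_sim = cosine_similarity(toy_matrix)

# Rows and columns follow the order of toy_categories
print(np.round(toy_sim, 3))
```

Because category is the only feature, the model cannot rank books within the same category. Every pair with an identical category ties at 1.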

In [38]:
cosine_sim_df = pd.DataFrame(cosine_sim, index=df_clean['book_title'], columns=df_clean['book_title'])
cosine_sim_df.sample(5, axis=1).sample(10, axis=0)

book_title,The eye of the storm,The Art of Fiction: A Guide for Writers and Readers,The Artist's Handbook of Materials and Techniques,Contrary to Popular Opinion,The Bubblegum Crisis: Grand Mal
book_title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
"How to Select, Use and Maintain Garden Equipment",0.0,0.0,0.0,0.0,0.0
Ice Hockey (Play the Game),0.0,0.0,0.0,0.0,0.0
Doctor Faustus and Other Plays (Oxford World's Classics),0.0,0.0,0.0,0.0,0.0
Things Fall Apart (Cliffs Notes),0.0,1.0,0.0,1.0,0.0
A Coney Island of the Mind: Poems (New Directions Paperback No. 74),0.0,0.0,0.0,0.0,0.0
John Milton (Longman Critical Readers),0.0,0.408269,0.0,0.408269,0.0
"The Book of Movie Lists: An Offbeat, Provocative Collection of the Best and Worst of Everything in Movies",0.0,0.0,0.0,0.0,0.0
Timeline Month In The Life Of A Guy Who Refuses,0.0,1.0,0.0,1.0,0.0
The Throwing Madonna: Essays on the Brain,0.0,0.0,0.0,0.0,0.0
"The Three Pillars of Zen: Teaching, Practice, and Enlightenment",0.0,0.0,0.0,0.0,0.0
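
A dense 19461 x 19461 similarity matrix of 64-bit floats takes roughly 3 GB of memory. If that is a concern, one option is to compute similarities for a single book on demand instead. The sketch below assumes a small made-up frame and a hypothetical helper `similar_titles`. It uses `linear_kernel`, which equals cosine similarity here because the TF-IDF rows are L2-normalized by default.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Made-up books with cleaned categories
toy = pd.DataFrame({
    'book_title': ['A', 'B', 'C', 'D'],
    'clean_category': ['Design', 'Design', 'Science', 'Political Science'],
})
matrix = TfidfVectorizer().fit_transform(toy['clean_category'])

def similar_titles(title, k=2):
    """Return the k titles most similar to `title`, computing one row only."""
    row = toy.index[toy['book_title'] == title][0]
    # Dot products of one sparse row against all rows (1 x n result)
    scores = linear_kernel(matrix[row], matrix).ravel()
    scores[row] = -1.0  # exclude the query book itself
    top = np.argsort(scores)[::-1][:k]
    return toy.loc[top, 'book_title'].tolist()

print(similar_titles('A', k=1))
print(similar_titles('C', k=1))
```

This trades a little latency per query for not having to hold the full matrix in memory.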


In [39]:
def get_recommendations(book_title, similarity_data=cosine_sim_df, items=df_clean[['book_title', 'book_author', 'publisher', 'clean_category', 'Book-Rating']], k=10):
  """
  Recommend books based on the similarity dataframe.

  Parameters:
  ---
  book_title : str
              Book title (an index label of the similarity dataframe)
  similarity_data : pd.DataFrame
                    Symmetric similarity dataframe with book_title as both
                    index and columns
  items : pd.DataFrame
          Contains the book titles and the other features used to define similarity
  k : int
      Number of recommendations to return
  ---


  The index below selects the k entries with the largest similarity values
  for the given book_title.
  """

  index = similarity_data.loc[:,book_title].to_numpy().argpartition(range(-1, -k, -1))
  closest = similarity_data.columns[index[-1:-(k+2):-1]]
  closest = closest.drop(book_title, errors='ignore')
  return pd.DataFrame(closest).merge(items).head(k)
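
The selection step in `get_recommendations` relies on `np.argpartition`, which moves the largest values to the end of the array without fully sorting it. A minimal sketch of the same idea on a hypothetical score vector (the array and the helper name `top_k_indices` are illustrative only):

```python
import numpy as np

def top_k_indices(scores, k):
    """Return the positions of the k largest scores, highest first."""
    # argpartition guarantees that the last k positions hold the k largest
    # values, in no particular order
    candidates = np.argpartition(scores, -k)[-k:]
    # Sort only those k candidates by score, descending
    return candidates[np.argsort(scores[candidates])[::-1]]

scores = np.array([0.1, 0.9, 0.3, 0.7, 0.5])
print(top_k_indices(scores, 2))
```

Partitioning is linear in the number of books, which matters when each column of the similarity matrix has tens of thousands of entries.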

In [40]:
df_clean[df_clean['book_title'].eq('The Mummies of Urumchi')]

Unnamed: 0,book_title,book_author,publisher,Book-Rating,clean_category
4,The Mummies of Urumchi,E. J. W. Barber,W. W. Norton & Company,0,Design


In [41]:
recommended_book = get_recommendations('The Mummies of Urumchi')
recommended_book.sort_values(['Book-Rating'], ascending=False)

Unnamed: 0,book_title,book_author,publisher,clean_category,Book-Rating
1,Hole in the Wall,Judith M. Grieshaber,Cantz Editions,Design,10
2,Ready-To-Use Celtic Borders on Layout Grids (D...,Mallory Pearce,Dover Publications,Design,9
6,Fairie-Ality: The Fashion Collection,David Ellwand,Candlewick Press (MA),Design,9
7,Treasury of Fantastic and Mythological Creatur...,Richard Huber,Dover Publications,Design,7
8,Type Rules!,Ilene Strizver,North Light Books,Design,6
0,Ready-To-Use School and Education Illustration...,Tom Tierney,Dover Publications,Design,0
3,Zen Style: Balance and Simplicity for Your Home,Jane Tidbury,Universe Publishing (NY),Design,0
4,Art of the Book : From Medieval Manuscript to ...,James Bettley,Victoria & Albert Museum,Design,0
5,Stained Glass : From its Origins to the Present,Virginia Chieffo Raguin,Harry N Abrams,Design,0
9,Abstract and Geometric Patterns: Clip Art (Nor...,Not Applicable (Na ),F & W Pubns,Design,0


# Evaluation
---

To evaluate content-based filtering, we can use precision@k to determine whether the recommendations are relevant

In [42]:
k = 10
threshold = 5
book_ratings = recommended_book['Book-Rating'].values
book_relevances = book_ratings > threshold
precision = len(book_ratings[book_relevances]) / k
print(f'The precision of the recommendation system is {precision:.1%}')

The precision of the recommendation system is 50.0%
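
Precision@k is the number of relevant items among the k recommendations divided by k. The calculation above can be wrapped in a small reusable helper. This is a sketch: the function name `precision_at_k` is an assumption, and the ratings are copied from the recommendation table shown earlier. Treating a rating above 5 as relevant is the notebook's own choice. Ratings of 0 in Book-Crossing denote implicit interactions rather than a low score, so they count as not relevant here.

```python
import numpy as np

def precision_at_k(ratings, k, threshold=5):
    """Fraction of the top-k recommendations whose rating exceeds the threshold."""
    top_k = np.asarray(ratings)[:k]
    return float((top_k > threshold).sum()) / k

# Ratings of the 10 books recommended for 'The Mummies of Urumchi'
example_ratings = [10, 9, 9, 7, 6, 0, 0, 0, 0, 0]
print(f'{precision_at_k(example_ratings, k=10):.1%}')
```

A single query gives only one data point. Averaging this value over many query books would give a more stable estimate.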
