# Acknowledgements
---

This dataset is taken from [https://www.kaggle.com](https://www.kaggle.com/datasets/ruchi798/bookcrossing-dataset).

# Import Dataset from Kaggle
---

Download the dataset from Kaggle first, then unzip it.

In [4]:
 !mkdir ~/.kaggle
 !cp /content/drive/MyDrive/kaggle.json ~/.kaggle
 !chmod 600 ~/.kaggle/kaggle.json

cp: cannot stat '/content/drive/MyDrive/kaggle.json': No such file or directory
chmod: cannot access '/root/.kaggle/kaggle.json': No such file or directory


In [5]:
 !kaggle datasets download -d ruchi798/bookcrossing-dataset

Dataset URL: https://www.kaggle.com/datasets/ruchi798/bookcrossing-dataset
License(s): CC0-1.0
Downloading bookcrossing-dataset.zip to /content
 75% 57.0M/76.1M [00:00<00:00, 69.9MB/s]
100% 76.1M/76.1M [00:00<00:00, 82.4MB/s]


In [6]:
 !unzip bookcrossing-dataset.zip

Archive:  bookcrossing-dataset.zip
  inflating: Book reviews/Book reviews/BX-Book-Ratings.csv  
  inflating: Book reviews/Book reviews/BX-Users.csv  
  inflating: Book reviews/Book reviews/BX_Books.csv  
  inflating: Books Data with Category Language and Summary/Preprocessed_data.csv  


# Import Library for Exploratory Data Analysis
---

Import the libraries used for data analysis, data visualization, data preprocessing, and modeling.

In [7]:
# library for data loading and data analysis
import pandas as pd
import numpy as np

# library for data visualization
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline

# Data Loading

In [8]:
path = 'Books Data with Category Language and Summary/Preprocessed_data.csv'
df = pd.read_csv(path, index_col=[0])
df.head()

Unnamed: 0,user_id,location,age,isbn,rating,book_title,book_author,year_of_publication,publisher,img_s,img_m,img_l,Summary,Language,Category,city,state,country
0,2,"stockton, california, usa",18.0,195153448,0,Classical Mythology,Mark P. O. Morford,2002.0,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,Provides an introduction to classical myths pl...,en,['Social Science'],stockton,california,usa
1,8,"timmins, ontario, canada",34.7439,2005018,5,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],timmins,ontario,canada
2,11400,"ottawa, ontario, canada",49.0,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],ottawa,ontario,canada
3,11676,"n/a, n/a, n/a",34.7439,2005018,8,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],,,
4,41385,"sudbury, ontario, canada",34.7439,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],sudbury,ontario,canada


In [9]:
# check the shape of dataframe
print(f'The data has {df.shape[0]} records and {df.shape[1]} columns.')

The data has 1031175 records and 18 columns.


# Exploratory Data Analysis
---

This dataset has 1,031,175 rows and 18 columns:
* user_id : the ID of the user
* location : where the user lives
* age : the age of the user
* isbn : the identifier code of the book
* rating : the rating the user gave the book
* book_title : the title of the book
* book_author : the author of the book
* year_of_publication : the year the book was published
* publisher : the publisher of the book
* img_s/img_m/img_l : cover image URLs of the book
* Summary : the synopsis of the book
* Language : the language of the book
* Category : the category of the book
* city : the city part of the user's location
* state : the state or province part of the user's location
* country : the country part of the user's location

### Checking whether the data has missing values

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1031175 entries, 0 to 1031174
Data columns (total 18 columns):
 #   Column               Non-Null Count    Dtype  
---  ------               --------------    -----  
 0   user_id              1031175 non-null  int64  
 1   location             1031175 non-null  object 
 2   age                  1031175 non-null  float64
 3   isbn                 1031175 non-null  object 
 4   rating               1031175 non-null  int64  
 5   book_title           1031175 non-null  object 
 6   book_author          1031174 non-null  object 
 7   year_of_publication  1031175 non-null  float64
 8   publisher            1031175 non-null  object 
 9   img_s                1031175 non-null  object 
 10  img_m                1031175 non-null  object 
 11  img_l                1031175 non-null  object 
 12  Summary              1031175 non-null  object 
 13  Language             1031175 non-null  object 
 14  Category             1031175 non-null  object 
 15  cit

We can see that some columns have different non-null counts, which indicates that the data contains missing values.

In [11]:
print('Total missing value in dataframe:', df.isnull().sum().sum(), 'records')

Total missing value in dataframe: 72276 records


In [12]:
col_with_missing = [col for col in df.columns if df[col].isnull().any()]
print('Column with missing value:', col_with_missing)

Column with missing value: ['book_author', 'city', 'state', 'country']
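
To see how many values are missing in each affected column, rather than only the grand total, a per-column count can help. The sketch below uses a small toy frame with hypothetical values (the name `toy` and its contents are assumptions, not part of the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values, standing in for the real dataset
toy = pd.DataFrame({
    'book_title': ['A', 'B', 'C'],
    'book_author': ['X', None, 'Z'],
    'city': ['ottawa', np.nan, np.nan],
})

# Count missing values per column and keep only the affected columns
missing_per_col = toy.isnull().sum()
missing_per_col = missing_per_col[missing_per_col > 0]
print(missing_per_col)
```

Applied to `df`, the same two lines would show how the total number of missing values is distributed across the four columns listed above.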


As we can see, the book_author, city, state, and country columns have missing values. There are many ways to handle missing values, but in this case we drop these columns because they have little influence on book recommendations.

In [13]:
df_no_missing = df.drop(col_with_missing, axis=1)
df_no_missing.head()

Unnamed: 0,user_id,location,age,isbn,rating,book_title,year_of_publication,publisher,img_s,img_m,img_l,Summary,Language,Category
0,2,"stockton, california, usa",18.0,195153448,0,Classical Mythology,2002.0,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,Provides an introduction to classical myths pl...,en,['Social Science']
1,8,"timmins, ontario, canada",34.7439,2005018,5,Clara Callan,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses']
2,11400,"ottawa, ontario, canada",49.0,2005018,0,Clara Callan,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses']
3,11676,"n/a, n/a, n/a",34.7439,2005018,8,Clara Callan,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses']
4,41385,"sudbury, ontario, canada",34.7439,2005018,0,Clara Callan,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses']


In [14]:
df_no_missing.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1031175 entries, 0 to 1031174
Data columns (total 14 columns):
 #   Column               Non-Null Count    Dtype  
---  ------               --------------    -----  
 0   user_id              1031175 non-null  int64  
 1   location             1031175 non-null  object 
 2   age                  1031175 non-null  float64
 3   isbn                 1031175 non-null  object 
 4   rating               1031175 non-null  int64  
 5   book_title           1031175 non-null  object 
 6   year_of_publication  1031175 non-null  float64
 7   publisher            1031175 non-null  object 
 8   img_s                1031175 non-null  object 
 9   img_m                1031175 non-null  object 
 10  img_l                1031175 non-null  object 
 11  Summary              1031175 non-null  object 
 12  Language             1031175 non-null  object 
 13  Category             1031175 non-null  object 
dtypes: float64(2), int64(2), object(10)
memory usage: 118.0

In [15]:
print('Total missing value in dataframe:', df_no_missing.isnull().sum().sum(), 'records')

Total missing value in dataframe: 0 records


Load the rating dataset

In [16]:
ratings_path = 'Book reviews/Book reviews/BX-Book-Ratings.csv'
ratings = pd.read_csv(ratings_path, encoding='unicode_escape', sep=';')
ratings.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [17]:
# check the shape of dataframe
print(f'The data has {ratings.shape[0]} records and {ratings.shape[1]} columns.')

The data has 1149780 records and 3 columns.


In [18]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1149780 entries, 0 to 1149779
Data columns (total 3 columns):
 #   Column       Non-Null Count    Dtype 
---  ------       --------------    ----- 
 0   User-ID      1149780 non-null  int64 
 1   ISBN         1149780 non-null  object
 2   Book-Rating  1149780 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 26.3+ MB


In [19]:
print('Total missing value in dataframe:', ratings.isnull().sum().sum(), 'records')

Total missing value in dataframe: 0 records


In [20]:
ratings.rename(columns={'ISBN': 'isbn'}, inplace=True)
ratings.head()

Unnamed: 0,User-ID,isbn,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [21]:
ratings_no_dup = ratings.drop_duplicates(['isbn'])
print(f'The data has {ratings_no_dup.shape[0]} records and {ratings_no_dup.shape[1]} columns.')

The data has 340556 records and 3 columns.
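
Note that `drop_duplicates(['isbn'])` keeps only the first rating row found for each ISBN, so the rating carried forward is whichever one happens to appear first. A minimal sketch on toy data (all values hypothetical) illustrates this, next to a per-ISBN average that this notebook does not use:

```python
import pandas as pd

# Toy ratings with hypothetical values; ISBN '111' was rated by two users
toy_ratings = pd.DataFrame({
    'User-ID': [1, 2, 3],
    'isbn': ['111', '111', '222'],
    'Book-Rating': [0, 8, 5],
})

# Same call as in the notebook: only the first row per ISBN survives
first_only = toy_ratings.drop_duplicates(['isbn'])
print(first_only)

# Alternative (not used in this notebook): average rating per ISBN
mean_rating = toy_ratings.groupby('isbn')['Book-Rating'].mean()
print(mean_rating)
```

In the toy data, ISBN `111` keeps the rating 0 under `drop_duplicates`, while its average would be 4.0.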


# Data Preparation
---

We drop many columns here because they provide no information relevant to book recommendation, such as user_id, age, rating, and so on.

In [22]:
col_to_drop = ['user_id', 'location', 'age', 'rating', 'year_of_publication', 'img_s', 'img_m', 'img_l', 'Summary', 'Language']
df_dropped = df_no_missing.drop(col_to_drop, axis=1)
df_dropped.head()

Unnamed: 0,isbn,book_title,publisher,Category
0,195153448,Classical Mythology,Oxford University Press,['Social Science']
1,2005018,Clara Callan,HarperFlamingo Canada,['Actresses']
2,2005018,Clara Callan,HarperFlamingo Canada,['Actresses']
3,2005018,Clara Callan,HarperFlamingo Canada,['Actresses']
4,2005018,Clara Callan,HarperFlamingo Canada,['Actresses']


In [23]:
# check the shape of dataframe
print(f'The data has {df_dropped.shape[0]} records and {df_dropped.shape[1]} columns.')

The data has 1031175 records and 4 columns.


In [24]:
# dropping duplicate record
df_no_dup = df_dropped.drop_duplicates(['book_title'])
df_no_dup.head()

Unnamed: 0,isbn,book_title,publisher,Category
0,195153448,Classical Mythology,Oxford University Press,['Social Science']
1,2005018,Clara Callan,HarperFlamingo Canada,['Actresses']
15,60973129,Decision in Normandy,HarperPerennial,['1940-1949']
18,374157065,Flu: The Story of the Great Influenza Pandemic...,Farrar Straus Giroux,['Medical']
29,393045218,The Mummies of Urumchi,W. W. Norton & Company,['Design']


In [25]:
# check the shape of dataframe
print(f'The data has {df_no_dup.shape[0]} records and {df_no_dup.shape[1]} columns.')

The data has 241090 records and 4 columns.


Merge ratings dataframe with book dataframe

In [26]:
book_with_rating = df_no_dup.merge(ratings_no_dup, on='isbn').drop(['isbn', 'User-ID'], axis=1)
book_with_rating.head()

Unnamed: 0,book_title,publisher,Category,Book-Rating
0,Classical Mythology,Oxford University Press,['Social Science'],0
1,Clara Callan,HarperFlamingo Canada,['Actresses'],5
2,Decision in Normandy,HarperPerennial,['1940-1949'],0
3,Flu: The Story of the Great Influenza Pandemic...,Farrar Straus Giroux,['Medical'],0
4,The Mummies of Urumchi,W. W. Norton & Company,['Design'],0
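
`DataFrame.merge` performs an inner join by default, so only books whose `isbn` also appears in the ratings frame are kept. A small sketch with hypothetical toy frames (`books` and `toy_ratings` are illustrative names):

```python
import pandas as pd

# Hypothetical toy frames; book '333' has no rating
books = pd.DataFrame({
    'isbn': ['111', '222', '333'],
    'book_title': ['A', 'B', 'C'],
})
toy_ratings = pd.DataFrame({
    'isbn': ['111', '222'],
    'User-ID': [1, 2],
    'Book-Rating': [7, 0],
})

# Inner join on isbn, then drop the key and user columns as in the notebook
merged = books.merge(toy_ratings, on='isbn').drop(['isbn', 'User-ID'], axis=1)
print(merged)
```

Book `C` disappears from the result because it has no matching rating. In the notebook the merged frame has the same 241090 rows as `df_no_dup`, which suggests every book found a match.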


In [27]:
book_with_rating.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 241090 entries, 0 to 241089
Data columns (total 4 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   book_title   241090 non-null  object
 1   publisher    241090 non-null  object
 2   Category     241090 non-null  object
 3   Book-Rating  241090 non-null  int64 
dtypes: int64(1), object(3)
memory usage: 7.4+ MB


In [28]:
# check the shape of dataframe
print(f'The data has {book_with_rating.shape[0]} records and {book_with_rating.shape[1]} columns.')

The data has 241090 records and 4 columns.


Next, we remove categories with fewer than 100 or more than 1,000 books to reduce computation time and memory.

In [29]:
# count the books in each category once, then collect the categories outside the 100-1000 range
cat_counts = book_with_rating['Category'].value_counts()
unused_cat = cat_counts[(cat_counts < 100) | (cat_counts > 1000)].index.tolist()

In [30]:
clean_cat_df = book_with_rating.loc[~book_with_rating['Category'].isin(unused_cat)]
clean_cat_df.head()

Unnamed: 0,book_title,publisher,Category,Book-Rating
3,Flu: The Story of the Great Influenza Pandemic...,Farrar Straus Giroux,['Medical'],0
4,The Mummies of Urumchi,W. W. Norton & Company,['Design'],0
16,More Cunning Than Man: A Social History of Rat...,Kensington Publishing Corp.,['Nature'],6
22,If I'd Known Then What I Know Now: Why Not Lea...,Cypress House,['Reference'],10
70,The Dragons of Eden: Speculations on the Evolu...,Ballantine Books,['Science'],0


In [31]:
# check the shape of dataframe
print(f'The data has {clean_cat_df.shape[0]} records and {clean_cat_df.shape[1]} columns.')

The data has 19461 records and 4 columns.


### Data Preprocessing
Because the values in the Category column are wrapped in square brackets, they must be cleaned before the model can use them properly.

In [32]:
def clean_category(text):
  text = re.sub(r'[\[\]]', '', text)
  text = text.replace("'", '')
  text = text.replace('"', '')
  text = text.replace('.', '')
  return text
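
To make the effect of `clean_category` concrete, the sketch below repeats the function and applies it to a few sample strings. The sample strings are hypothetical and only mimic the bracketed format seen in the Category column:

```python
import re

def clean_category(text):
    # same logic as the notebook cell: strip brackets, quotes, and periods
    text = re.sub(r'[\[\]]', '', text)
    text = text.replace("'", '')
    text = text.replace('"', '')
    text = text.replace('.', '')
    return text

# Hypothetical raw values in the same bracketed format
samples = ["['Social Science']", "['Juvenile Fiction.']", '["Children\'s stories"]']
cleaned = [clean_category(s) for s in samples]
print(cleaned)
```

Because every apostrophe is removed, a value such as `Children's stories` would come out as `Childrens stories`, which matches the spelling in the cleaned category list.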

In [33]:
import re

# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
clean_cat_df = clean_cat_df.assign(clean_category=clean_cat_df['Category'].apply(clean_category))
clean_cat_df.head()


Unnamed: 0,book_title,publisher,Category,Book-Rating,clean_category
3,Flu: The Story of the Great Influenza Pandemic...,Farrar Straus Giroux,['Medical'],0,Medical
4,The Mummies of Urumchi,W. W. Norton & Company,['Design'],0,Design
16,More Cunning Than Man: A Social History of Rat...,Kensington Publishing Corp.,['Nature'],6,Nature
22,If I'd Known Then What I Know Now: Why Not Lea...,Cypress House,['Reference'],10,Reference
70,The Dragons of Eden: Speculations on the Evolu...,Ballantine Books,['Science'],0,Science


Check that the categories no longer have a trailing period.

In [34]:
clean_cat_sort = np.sort(clean_cat_df['clean_category'].unique())
for cat in clean_cat_sort:
  print(cat)

Adolescence
Adventure and adventurers
Adventure stories
African Americans
American fiction
Animals
Antiques & Collectibles
Architecture
Art
Australia
Bible
Brothers and sisters
Cats
Childrens stories
Christian life
Comics & Graphic Novels
Crafts & Hobbies
Design
Detective and mystery stories
Dogs
Drama
Education
English fiction
English language
Families
Foreign Language Study
Friendship
Games
Games & Activities
Gardening
House & Home
Language Arts & Disciplines
Law
Literary Collections
Literary Criticism
Mathematics
Medical
Music
Nature
Performing Arts
Pets
Philosophy
Photography
Poetry
Political Science
Reference
Science
Sports & Recreation
Technology & Engineering
True Crime


At this point, categories that refer to the same thing but are spelled differently could be normalized. The next cell only drops the original Category column, since clean_category replaces it.

In [35]:
df_clean = clean_cat_df.drop(['Category'], axis=1)
df_clean.head()

Unnamed: 0,book_title,publisher,Book-Rating,clean_category
3,Flu: The Story of the Great Influenza Pandemic...,Farrar Straus Giroux,0,Medical
4,The Mummies of Urumchi,W. W. Norton & Company,0,Design
16,More Cunning Than Man: A Social History of Rat...,Kensington Publishing Corp.,6,Nature
22,If I'd Known Then What I Know Now: Why Not Lea...,Cypress House,10,Reference
70,The Dragons of Eden: Speculations on the Evolu...,Ballantine Books,0,Science


In [36]:
# check the shape of dataframe
print(f'The data has {df_clean.shape[0]} records and {df_clean.shape[1]} columns.')

The data has 19461 records and 4 columns.


In [37]:
for cat in clean_cat_sort:
  print(cat, 'has', len(df_clean[df_clean['clean_category'] == cat]), 'records')

Adolescence has 109 records
Adventure and adventurers has 102 records
Adventure stories has 192 records
African Americans has 106 records
American fiction has 148 records
Animals has 305 records
Antiques & Collectibles has 152 records
Architecture has 183 records
Art has 867 records
Australia has 106 records
Bible has 123 records
Brothers and sisters has 153 records
Cats has 163 records
Childrens stories has 386 records
Christian life has 183 records
Comics & Graphic Novels has 539 records
Crafts & Hobbies has 654 records
Design has 117 records
Detective and mystery stories has 398 records
Dogs has 117 records
Drama has 685 records
Education has 620 records
English fiction has 128 records
English language has 132 records
Families has 128 records
Foreign Language Study has 349 records
Friendship has 195 records
Games has 218 records
Games & Activities has 228 records
Gardening has 419 records
House & Home has 291 records
Language Arts & Disciplines has 669 records
Law has 192 records
Li

# Modeling
---

For this model we use content-based filtering, which aims to find the similarity between books.

In [38]:
from sklearn.feature_extraction.text import TfidfVectorizer

tf = TfidfVectorizer()
tfidf_matrix = tf.fit_transform(df_clean['clean_category'])

In [39]:
from sklearn.metrics.pairwise import cosine_similarity

# compute cosine similarity on the TF-IDF matrix
cosine_sim = cosine_similarity(tfidf_matrix)
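
To build some intuition for these two steps, here is a minimal sketch on four hypothetical category strings. With its default settings, `TfidfVectorizer` lowercases the text and splits it into word tokens, so two categories are similar only when they share at least one word:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical category values for four books
toy_categories = ['Design', 'Design', 'Political Science', 'Science']

# Each row of the matrix is the TF-IDF vector of one book's category
toy_matrix = TfidfVectorizer().fit_transform(toy_categories)

# Pairwise cosine similarity between all books (4 x 4 matrix)
toy_sim = cosine_similarity(toy_matrix)
print(toy_sim.round(2))
```

Books with an identical category get a similarity of 1.0, books with no word in common get 0.0, and `Political Science` versus `Science` lands in between because they share the token `science`. This is why the sampled similarity table in the notebook consists almost entirely of 0.0 and 1.0 values.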

In [40]:
cosine_sim_df = pd.DataFrame(cosine_sim, index=df_clean['book_title'], columns=df_clean['book_title'])
cosine_sim_df.sample(5, axis=1).sample(10, axis=0)

book_title,"Decorating With Americana: How to Know It, Where to Find It, and How to Make It Work for You",New Chicana/Chicano Writing 1,"La souris, la mouche et l'homme","The Case of the Hotel Who-Done-It: A Novelization (Adventures of Mary-Kate & Ashley, No 7)",The Tragedy of King Lear (The New Cambridge Shakespeare)
book_title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
The Big Picture: Who Killed Hollywood? and Other Essays,0.0,0.0,0.0,0.0,1.0
Man on the Moon: The Shooting Script (Newmarket Shooting Script (Paper)),0.0,0.0,0.0,0.0,0.0
Creating Readers,0.0,0.0,0.0,0.0,0.0
TWISTER: THE SCIENCE OF TORNADOES AND THE MAKING OF A NATURAL DISASTER MOVIE : The Science of Tornadoes and the Making of a Natural Disaster Movie,0.0,0.0,1.0,0.0,0.0
The Pinball Effect: How Renaissance Water Gardens Made the Carburetor Possible-And Other Journeys Through Knowledge (Pinball Effect),0.0,0.0,0.0,0.0,0.0
You Can Get Published (You Can Write It Series),0.0,0.0,0.0,0.0,0.0
Jiffy Phrasebook French,0.0,0.0,0.0,0.0,0.0
American Women Writers,0.0,0.0,0.0,0.0,0.0
LUCY: THE BEGINNINGS OF HUMANKIND,0.0,0.0,0.0,0.0,0.0
Having It All,0.0,0.0,0.0,0.0,0.0


In [51]:
def get_recommendations(book_title, similarity_data=cosine_sim_df, items=df_clean[['book_title', 'publisher', 'clean_category', 'Book-Rating']], k=10):
  """
  Recommend books based on the similarity dataframe
  Parameters:
  ---
  book_title : str
              Title of the book (a label in the similarity dataframe)
  similarity_data : pd.DataFrame
                    Pairwise similarity dataframe
  items : pd.DataFrame
          Dataframe that holds the book information
  k : int
      Number of recommendations to return
  Return:
  ---
  pd.DataFrame
      Rows of `items` for the recommended books (empty if the title is not found)
  """
  try:
    # take the similarity scores of the input book and drop the book itself;
    # with many ties at 1.0, its position after sorting is not guaranteed
    similarity_scores = similarity_data[book_title].drop(book_title)

    # sort the remaining books by similarity score, highest first
    similarity_scores = similarity_scores.sort_values(ascending=False)

    # take the k books with the highest similarity
    top_k_books = similarity_scores.head(k).index.tolist()

    # return the recommended books together with their details
    return items[items['book_title'].isin(top_k_books)]
  except KeyError:
    print(f"Book titled '{book_title}' was not found in the dataset.")
    # return an empty dataframe so callers can still chain DataFrame methods
    return items.iloc[0:0]

In [52]:
df_clean[df_clean['book_title'].eq('The Mummies of Urumchi')]

Unnamed: 0,book_title,publisher,Book-Rating,clean_category
4,The Mummies of Urumchi,W. W. Norton & Company,0,Design


In [53]:
recommended_book = get_recommendations('The Mummies of Urumchi')
recommended_book.sort_values(['Book-Rating'], ascending=False)

Unnamed: 0,book_title,publisher,clean_category,Book-Rating
15484,Work Clothes (Chic Simple) : Casual Dress for...,Knopf,Design,8
58967,The Great American Pin-Up,Taschen,Design,6
26342,Woven by the Grandmothers: Nineteenth-Century ...,Smithsonian Books,Design,5
190279,Make Your Scanner a Great Design & Production ...,North Light Books,Design,5
222816,Artistically Cultivated Herbs: How to Train He...,Woodbridge Press Publishing Company,Design,5
76115,Model: The Ugly Business of Beautiful Women,Harpercollins,Design,0
115881,"1,001 Advertising Cuts from the Twenties and T...",Dover Publications,Design,0
122333,Missoni (Made in Italy),Gingko Press,Design,0
152008,Black + White,Chronicle Books,Design,0
226078,The Parisian Woman's Guide to Style,Universe Publishing (NY),Design,0


# Evaluation
---

To evaluate content-based filtering, we can use precision@k, which measures what share of the top k recommendations are relevant.

In [54]:
k = 10
threshold = 5
book_ratings = recommended_book['Book-Rating'].values
book_relevances = book_ratings > threshold
precision = len(book_ratings[book_relevances]) / k
print(f'The precision of the recommendation system is {precision:.1%}')

The precision of the recommendation system is 20.0%
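
Precision@k is the number of relevant items among the top k recommendations divided by k. The sketch below wraps the calculation above in a small helper; the name `precision_at_k` is an assumption for illustration, and a recommendation counts as relevant when its rating is strictly above the threshold, as in the cell above:

```python
import numpy as np

def precision_at_k(ratings, k, threshold):
    """Share of the top-k recommendations whose rating exceeds the threshold."""
    top_k = np.asarray(ratings)[:k]
    relevant = top_k > threshold
    return relevant.sum() / k

# Ratings of the ten recommendations listed in the output above
ratings = [8, 6, 5, 5, 5, 0, 0, 0, 0, 0]
print(f'{precision_at_k(ratings, k=10, threshold=5):.1%}')
```

Only the books rated 8 and 6 exceed the threshold of 5, so 2 of the 10 recommendations count as relevant, which gives 20.0%.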


In [55]:
# export the installed packages to requirements.txt

!pip freeze > requirements.txt