# "Bookrating (Collaborative-Filtering)"
> "Recommending books to read using collaborative filtering"

- toc: false
- branch: master
- badges: true
- comments: true
- categories: [jupyter, pytorch, pytorch-lightning]
- hide: false
- search_exclude: true

```python
%%capture
!pip install -U fastai
```

```python
from google.colab import drive
drive.mount('/content/drive')
```



```python
from fastai.collab import *   # CollabDataLoaders, collab_learner
import pandas as pd
import torch.nn as nn
```

```python
# The three Book-Crossing (BX) CSV files, stored on Google Drive
pathr = '/content/drive/MyDrive/my-datasets/collaborative-filtering/BX-Book-Ratings.csv'
pathb = '/content/drive/MyDrive/my-datasets/collaborative-filtering/BX-Books.csv'
pathu = '/content/drive/MyDrive/my-datasets/collaborative-filtering/BX-Users.csv'
```

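```python
# The BX files are ';'-separated and latin-1 encoded. `error_bad_lines=False`
# was removed in pandas 2.0; `on_bad_lines='warn'` (pandas >= 1.3) also skips
# malformed rows and reports each one.
dfr = pd.read_csv(pathr, sep=';', on_bad_lines='warn', encoding='latin-1')
dfb = pd.read_csv(pathb, sep=';', on_bad_lines='warn', encoding='latin-1')
dfu = pd.read_csv(pathu, sep=';', on_bad_lines='warn', encoding='latin-1')
```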

```
Skipping line 6452: expected 8 fields, saw 9
Skipping line 43667: expected 8 fields, saw 10
Skipping line 51751: expected 8 fields, saw 9
Skipping line 92038: expected 8 fields, saw 9
Skipping line 104319: expected 8 fields, saw 9
Skipping line 121768: expected 8 fields, saw 9
Skipping line 144058: expected 8 fields, saw 9
Skipping line 150789: expected 8 fields, saw 9
Skipping line 157128: expected 8 fields, saw 9
Skipping line 180189: expected 8 fields, saw 9
Skipping line 185738: expected 8 fields, saw 9
Skipping line 209388: expected 8 fields, saw 9
Skipping line 220626: expected 8 fields, saw 9
Skipping line 227933: expected 8 fields, saw 11
Skipping line 228957: expected 8 fields, saw 10
Skipping line 245933: expected 8 fields, saw 9
Skipping line 251296: expected 8 fields, saw 9
Skipping line 259941: expected 8 fields, saw 9
Skipping line 261529: expected 8 fields, saw 9
```
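The skipped rows above come from BX-Books.csv, the only one of the three files with eight columns. A minimal sketch of how the parser treats such rows, on a small made-up sample rather than the real file: with `on_bad_lines` (pandas 1.3 and later) a row with too many fields is dropped instead of aborting the read. It also shows a related pitfall worth guarding against: an ID column that happens to be all digits, such as ISBN, loses its leading zeros unless it is read as text. `RAW`, `books` and `naive` are illustrative names only.

```python
import io

import pandas as pd

# Hypothetical ';'-separated sample: the header has 3 fields, but the last
# row has 4, the same kind of defect pandas reports for BX-Books.csv.
RAW = "ISBN;Title;Year\n0001;Book A;2001\n0002;Book B;2002\n0003;Book C;extra;2003\n"

# on_bad_lines='skip' silently drops rows with too many fields
# ('warn' drops them too, but prints a message for each one).
books = pd.read_csv(io.StringIO(RAW), sep=';', dtype={'ISBN': str},
                    on_bad_lines='skip')
print(books)

# Without dtype=str, an all-digit ISBN column is parsed as integers and the
# leading zeros are lost.
naive = pd.read_csv(io.StringIO(RAW), sep=';', on_bad_lines='skip')
print(naive['ISBN'].tolist())
```

In the real BX-Books file some ISBNs end in `X`, so pandas keeps that column as text anyway; passing `dtype={'ISBN': str}` simply makes that explicit.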


```python
# Keep the book metadata; drop the three Image-URL columns.
dfb = dfb[['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher']]
```

```python
dfr.head()
```

|   | User-ID | ISBN | Book-Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |


```python
dfb.head()
```

|   | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher |
|---|---|---|---|---|---|
| 0 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press |
| 1 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada |
| 2 | 0060973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial |
| 3 | 0374157065 | Flu: The Story of the Great Influenza Pandemic of 1918 and the Search for the Virus That Caused It | Gina Bari Kolata | 1999 | Farrar Straus Giroux |
| 4 | 0393045218 | The Mummies of Urumchi | E. J. W. Barber | 1999 | W. W. Norton & Company |


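```python
# Inner join on the shared 'ISBN' column: ratings of books that are missing
# from dfb are dropped.
df = dfr.merge(dfb, on='ISBN')
df.head()
```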

|   | User-ID | ISBN | Book-Rating | Book-Title | Book-Author | Year-Of-Publication | Publisher |
|---|---|---|---|---|---|---|---|
| 0 | 276725 | 034545104X | 0 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 1 | 2313 | 034545104X | 5 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 2 | 6543 | 034545104X | 0 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 3 | 8680 | 034545104X | 5 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 4 | 10314 | 034545104X | 9 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
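Called without `on=`, `merge` joins on every column the two frames share (here only `ISBN`) and defaults to an inner join. A small sketch with made-up frames shows what that means for `df`: ratings whose ISBN is absent from the books table disappear, and so do books nobody rated. The frames `ratings` and `books` are hypothetical stand-ins for `dfr` and `dfb`.

```python
import pandas as pd

# Hypothetical miniature versions of dfr and dfb.
ratings = pd.DataFrame({'User-ID': [1, 2, 3],
                        'ISBN': ['A', 'B', 'Z'],
                        'Book-Rating': [5, 0, 8]})
books = pd.DataFrame({'ISBN': ['A', 'B', 'C'],
                      'Book-Title': ['Title A', 'Title B', 'Title C']})

# No `on=`: the join key is every shared column, i.e. 'ISBN'.
# Default how='inner': 'Z' (unknown book) and 'C' (unrated book) are dropped.
merged = ratings.merge(books)
print(merged)
```

Passing `how='left'` instead would keep the rating for `'Z'` with a missing title, which is one way to count how many ratings the inner join discards.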


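```python
# Users come from 'User-ID' and ratings from 'Book-Rating'. Items are keyed by
# title, so ISBNs sharing the exact same title string share one embedding.
dls = CollabDataLoaders.from_df(df, user_name='User-ID', item_name='Book-Title',
                                rating_name='Book-Rating', bs=64)
dls.show_batch()
```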

|   | User-ID | Book-Title | Book-Rating |
|---|---|---|---|
| 0 | 153662 | Too Much Too Soon Int | 0 |
| 1 | 93047 | Bartholomew and the Oobleck : (Caldecott Honor Book) | 9 |
| 2 | 25008 | The Talented Mr. Ripley (Vintage Crime/Black Lizard) | 0 |
| 3 | 172054 | The Quilter's Apprentice | 0 |
| 4 | 76942 | Carriers | 0 |
| 5 | 15957 | Ruth Park's "Harp in the South" Novels | 0 |
| 6 | 247429 | Women Pray: Voices Through the Ages, from Many Faiths, Cultures, and Traditions | 0 |
| 7 | 225199 | La VÃ?Â©nus d'ille | 0 |
| 8 | 224525 | Psychic Tarot: Illustrated with the Aquarian Tarot Deck | 10 |
| 9 | 154176 | SPELLBINDER X | 0 |
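Behind the scenes each title is replaced by an integer index, and that index selects a row of the item embedding matrix; `dls.classes['Book-Title'].o2i` is the title-to-index lookup. A pandas-only sketch of that encoding on a few titles from the batch above, assuming (as we understand fastai's `Categorify`) a sorted vocabulary with index 0 reserved for a `'#na#'` placeholder. It illustrates the idea and is not fastai's code; `vocab` and `o2i` here are plain Python objects rather than fastai's `CategoryMap`.

```python
import pandas as pd

titles = pd.Series(['Carriers', 'SPELLBINDER X', 'Carriers', 'Psychic Tarot'])

# Sorted vocabulary of distinct titles; index 0 is a '#na#' placeholder
# (assumed to mirror fastai's convention).
vocab = ['#na#'] + sorted(titles.unique())
o2i = {title: i for i, title in enumerate(vocab)}

# Each title becomes the row index of its embedding vector.
codes = titles.map(o2i)
print(vocab)
print(codes.tolist())
```

Because the lookup is by exact string, a title containing encoding debris (such as row 7 above) gets its own separate index and embedding.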


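```python
# Note: Book-Rating runs from 0 to 10, so y_range=(0, 5.5) caps every
# prediction at 5.5; a range such as (0, 10.5) would cover the full scale.
learn = collab_learner(dls, y_range=(0, 5.5), n_factors=50)
```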
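By default `collab_learner` builds a dot-product model with per-user and per-item biases (fastai's `EmbeddingDotBias`, which is where `learn.model.i_weight` comes from) and squashes its output into `y_range` with a scaled sigmoid. The plain-PyTorch sketch below re-creates that idea under that assumption; the class `DotProductBias`, the toy sizes and the helper `sigmoid_range` (a stand-in for fastai's function of the same name) are ours, and only the attribute names are chosen to match fastai's. It makes the effect of `y_range=(0, 5.5)` concrete: on a 0 to 10 scale the model cannot predict high ratings at all, which may be one contributor to the large MSE in the training log below.

```python
import torch
import torch.nn as nn


def sigmoid_range(x, lo, hi):
    "Squash x into the range lo..hi with a scaled sigmoid."
    return torch.sigmoid(x) * (hi - lo) + lo


class DotProductBias(nn.Module):
    "Predicted rating = user . item + user bias + item bias, squashed into y_range."

    def __init__(self, n_users, n_items, n_factors, y_range):
        super().__init__()
        self.u_weight = nn.Embedding(n_users, n_factors)
        self.i_weight = nn.Embedding(n_items, n_factors)
        self.u_bias = nn.Embedding(n_users, 1)
        self.i_bias = nn.Embedding(n_items, 1)
        # Small random init keeps the first predictions near the middle of y_range.
        for emb in (self.u_weight, self.i_weight, self.u_bias, self.i_bias):
            nn.init.normal_(emb.weight, std=0.01)
        self.y_range = y_range

    def forward(self, x):
        # x has shape (batch, 2): column 0 = user index, column 1 = item index.
        users, items = x[:, 0], x[:, 1]
        dot = (self.u_weight(users) * self.i_weight(items)).sum(dim=1)
        res = dot + self.u_bias(users).squeeze(1) + self.i_bias(items).squeeze(1)
        return sigmoid_range(res, *self.y_range)


torch.manual_seed(0)
model = DotProductBias(n_users=10, n_items=20, n_factors=50, y_range=(0, 5.5))
preds = model(torch.tensor([[0, 3], [4, 7], [9, 19]]))
print(preds)  # all three predictions lie between 0 and 5.5

# The output can never exceed 5.5, so a true rating of 10 adds at least
# (10 - 5.5) ** 2 = 20.25 to the squared error, however long we train.
min_sq_err_for_10 = (10 - 5.5) ** 2
```

Setting the upper bound slightly above the top rating (for example 10.5 rather than 10) is the usual choice, because a sigmoid only approaches its limits and never quite reaches them.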

```python
# 5 epochs with the one-cycle schedule, max learning rate 2e-3, weight decay 0.1
learn.fit_one_cycle(5, 2e-3, wd=0.1)
```

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 13.079836 | 13.260295 | 02:53 |
| 1 | 12.92739 | 12.971729 | 02:53 |
| 2 | 12.63707 | 12.929997 | 02:54 |
| 3 | 12.430797 | 12.923674 | 02:52 |
| 4 | 12.032089 | 12.930154 | 02:52 |


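```python
def recommend(book):
    "Return the 5 titles whose learned embeddings are most cosine-similar to `book`."
    book_factors = learn.model.i_weight.weight        # one embedding row per title
    idx = dls.classes['Book-Title'].o2i[book]         # title -> embedding row index
    sims = nn.CosineSimilarity(dim=1)(book_factors, book_factors[idx][None])
    # Position 0 of the descending sort is the query book itself, so skip it.
    indices = sims.argsort(descending=True)[1:6]
    return dls.classes['Book-Title'][indices]
```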

```python
res = recommend('Harry Potter and the Prisoner of Azkaban (Book 3)')
for i in res:
    print(i)
```

```
Harry Potter and the Goblet of Fire (Book 4)
Harry Potter and the Chamber of Secrets (Book 2)
The X-Planes: X-1 to X-45: 3rd Edition
Sanctuary: Finding Moments of Refuge in the Presence of God
Harry Potter and the Sorcerer's Stone (Book 1)
```
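`recommend()` ranks every title by the cosine similarity between its learned embedding and the query's embedding. The same steps on a hypothetical four-item embedding table, small enough that the answer can be checked by eye; `titles`, `factors` and `nearest` are illustrative names, not part of fastai.

```python
import torch
import torch.nn as nn

# Hypothetical embedding table: one row per item, one column per latent factor.
titles = ['Book 1', 'Book 2', 'Book 3', 'Unrelated']
factors = torch.tensor([[1.0, 0.0, 1.0],
                        [0.9, 0.1, 1.0],
                        [1.0, 0.2, 0.8],
                        [-1.0, 1.0, 0.0]])


def nearest(idx, k=2):
    # [None] turns the (n_factors,) query row into (1, n_factors), so it
    # broadcasts against every row of the table.
    sims = nn.CosineSimilarity(dim=1)(factors, factors[idx][None])
    # The query itself has similarity 1 and sorts first, so skip position 0.
    order = sims.argsort(descending=True)[1:k + 1]
    return [titles[i] for i in order.tolist()]


print(nearest(0))
```

Slicing from position 1 assumes the query ranks first; if two titles had identical embeddings, masking the query index explicitly (for example setting its similarity to -2 before sorting) would be the safer choice.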
