# Book recommender system
For my seminar project, I implemented the algorithms described in the assignment instructions and applied them to book rating data. The data is available at [this link](https://github.com/caserec/Datasets-for-Recommender-Systems/tree/master/Processed%20Datasets/BookCrossing).

In [1]:
RATINGS_FILE = 'data/book_crossing/book_ratings.dat'
BOOKS_FILE = 'data/book_crossing/items_info.dat'

## UserItemData
To read the ratings I wrote a class named UserItemData. Its constructor accepts the path to the ratings file and the minimum number of ratings a book must have to be kept. Alternatively, it can take the path to a pickle file, in which case the object's state is loaded from that .pkl file.
The class exposes a single public attribute, data, which is a Pandas DataFrame with the columns userID, bookID and rating.
The class has the following methods:
- save_to_db(path), takes the parameter path, which says where to save the class state as a .pkl file
- get_rating(user_id, book_id), takes a userID and a bookID and returns the rating the user gave to that book. If data contains no such rating, the method returns None
- get_number_of_users(), returns the number of distinct users in data
- nrating(), returns the total number of ratings in data
- get_sum_of_ratings(book_id), takes a bookID and returns the sum of all ratings for that book. If the book is not in data, it returns 0.
- get_number_of_ratings(book_id), takes a bookID and returns the number of ratings for that book. If the book is not in data, it returns 0.
- get_avg_rating(book_id), takes a bookID and returns the average rating for that book.
- get_movies(), returns the distinct bookIDs in data
- get_all_ratings(book), takes a bookID and returns all rows of data for that book
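
The filtering by minimum number of ratings and the per-book aggregates can be sketched on a toy table. This is only an illustration: the values below are made up, and `filter_min_ratings` is a hypothetical helper, not part of the class.

```python
import pandas as pd

# Toy ratings table with the same columns as UserItemData.data (values are made up).
ratings = pd.DataFrame(
    {
        "userID": [1, 1, 2, 2, 3],
        "bookID": [10, 20, 10, 30, 10],
        "rating": [8.0, 6.0, 9.0, 7.0, 10.0],
    }
)


def filter_min_ratings(df, min_ratings):
    """Keep only the books that have at least `min_ratings` ratings."""
    counts = df["bookID"].value_counts()
    valid_books = counts[counts >= min_ratings].index
    return df[df["bookID"].isin(valid_books)]


filtered = filter_min_ratings(ratings, min_ratings=2)

# Sum, count and mean of the ratings for every remaining book, computed in one groupby.
stats = filtered.groupby("bookID")["rating"].agg(["sum", "count", "mean"])
print(stats)
```

Computing all aggregates once with a single groupby avoids regrouping the whole table for every book, which matters when the per-book methods are called in a loop.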

In [17]:
import pickle
import pandas as pd

class UserItemData:
    def __init__(self, path=None, pickle_path=None, min_ratings=0):
        if pickle_path:
            self.__path, self.__min_ratings, self.data = self.__load_from_db(pickle_path)
        else:
            self.__path = path
            self.__min_ratings = min_ratings
            self.data = self.__load_data()
        
    def __load_data(self):
        data = []
        try:
            with open(self.__path, 'r') as f:
                next(f)
                for line in f:
                    userID, bookID, rating= line.strip().split('\t')
                    data.append([int(userID), int(bookID), float(rating)])
            
            df = pd.DataFrame(data, columns=['userID', 'bookID', 'rating'])
            book_counts = df['bookID'].value_counts()
            valid_books = book_counts[book_counts >= self.__min_ratings].index
            df = df[df['bookID'].isin(valid_books)]
            return df
        
        except FileNotFoundError:
            print('Error: File not found')
            return None
    
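    def save_to_db(self, path):
        # Store the same (path, min_ratings, data) tuple that __init__ unpacks on load.
        with open(path, 'wb') as f:
            pickle.dump((self.__path, self.__min_ratings, self.data), f)
    
    def __load_from_db(self, path):
        try:
            with open(path, 'rb') as f:
                return pickle.load(f)
        except FileNotFoundError:
            print('Error: File not found')
        except pickle.UnpicklingError:
            print('Error: Unpickling error')
        # Fall back to an empty state so the tuple unpacking in __init__ still works.
        return None, 0, None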
    
    def get_rating(self, user_id, book_id):
        rating = self.data[(self.data['userID'] == user_id) & (self.data['bookID'] == book_id)]['rating']
        if rating.empty:
            return None
        return rating.values[0]
    
    def get_number_of_users(self):
        return len(self.data['userID'].unique())
    
    def nrating(self):
        return len(self.data)
    
    def get_sum_of_ratings(self, book_id):
        book_ratings = self.data.groupby('bookID')['rating'].sum()
        if book_id in book_ratings:
            return book_ratings[book_id]
        else:
            return 0
    
    def get_number_of_ratings(self, book_id):
        book_ratings_count = self.data.groupby('bookID')['rating'].count()
        if book_id in book_ratings_count:
            return book_ratings_count[book_id]
        else:
            return 0
    
    def get_avg_rating(self, book_id):
        n = self.get_number_of_ratings(book_id)
        # A book without ratings has no average; avoid dividing by zero.
        return self.get_sum_of_ratings(book_id) / n if n else None
    
    def get_movies(self):
        return self.data['bookID'].unique()
    def get_all_ratings(self, book):
        return self.data[self.data['bookID'] == book]

In [3]:
uid = UserItemData(RATINGS_FILE)
print(uid.nrating())

62656


## BookData
A class for reading book information. It has the following methods:
- nbooks(), returns the number of books
- get_title(id), takes a bookID and returns the book's title if the book exists, otherwise None
- get_ids_and_titles(), returns a list of tuples with the bookID in first place and the title in second
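
Since get_title is called once per printed recommendation, a dictionary lookup can replace the scan over all rows. The sketch below is a possible alternative, not the class as written. It assumes a tab-separated file with a header row, the bookID in the first column and the title in the third; the header names, the middle column and the sample rows are invented for illustration.

```python
import csv
import io

# Made-up sample in the assumed layout: id <TAB> (some other field) <TAB> title.
sample = "id\tother\ttitle\n1\tx\tFirst Book\n2\ty\tSecond Book\n"

reader = csv.reader(io.StringIO(sample), delimiter="\t")
next(reader)  # skip the header row

# Build the id -> title mapping once; later lookups do not scan the rows again.
titles = {int(row[0]): row[2] for row in reader}

print(titles.get(2))   # title of an existing book
print(titles.get(99))  # None for an unknown bookID
```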

In [4]:
class BookData:
    def __init__(self, path):
        self.__path = path
        self.data = self.__load_data()
    
    def __load_data(self):
        data = []
        with open(self.__path, 'r') as f:
            next(f)
            for line in f:
                line = line.strip().split('\t')
                data.append(line)
        return data
    
    def nbooks(self):
        return len(self.data)

    def get_title(self, id):
        for line in self.data:
            if int(line[0]) == id:
                return line[2]
        return None

    def get_ids_and_titles(self):
        # The title is in column 2, the same column that get_title() reads.
        return [(int(line[0]), line[2]) for line in self.data]

In [5]:
bd = BookData(BOOKS_FILE)
print(bd.nbooks())

17384


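## Random predictor
The random predictor is a class whose constructor takes the minimum and maximum rating. The fit method stores the rating data, and the predict method returns a random integer between min and max (inclusive) for every book the user has not rated yet. If rec_seen is True, the books the user has already rated are included with their actual ratings.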

In [14]:
import random
class RandomPredictor:
    def __init__(self, min_rating, max_rating):
        self.min_rating = min_rating
        self.max_rating = max_rating
        self.uim = None

    def fit(self, X):
        self.uim = X

    def predict(self, user_id, rec_seen=True):
        if self.uim is None:
            raise ValueError("Error: fit() method was not called yet.")
        
        ratings = dict()
        
        md = BookData(BOOKS_FILE)

        for el in md.data:
            book_id = int(el[0])
            if self.uim.get_rating(user_id, book_id) is None:
                ratings[book_id] = random.randint(self.min_rating, self.max_rating)
            elif rec_seen:
                ratings[book_id] = self.uim.get_rating(user_id, book_id)
        
        return ratings



In [20]:
md = BookData(BOOKS_FILE)
uim = UserItemData(RATINGS_FILE)
rp = RandomPredictor(1, 10)
rp.fit(uim)
pred = rp.predict(78)
items = [1, 3, 20, 50, 100]
for item in items:
    print("Book: {}, rating: {}".format(md.get_title(item), pred[item]))

Book: Decision in Normandy, rating: 1
Book: What If?: The World's Foremost Military Historians Imagine What Might Have Been, rating: 8
Book: Isle of Dogs, rating: 5
Book: Female Intelligence, rating: 2
Book: Die Mars- Chroniken. Roman in ErzÃ?Â¤hlungen., rating: 7


## Recommending
A class for recommending books. Its constructor takes a predictor. The class has the following methods:
- fit(X), where X is of type UserItemData
- recommend(userID, n, rec_seen), where userID is the user the recommendations are for, n is the number of books to return, and rec_seen is a boolean that, when set to True, also recommends books the user has already read. The method returns a dictionary whose keys are bookIDs and whose values are the ratings predicted for this user, ordered from highest to lowest.
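
Selecting the n best predictions does not require sorting the whole dictionary. The sketch below uses `heapq.nlargest` from the standard library; `top_n` is a hypothetical helper and the scores are made up.

```python
import heapq


def top_n(predictions, n=10):
    """Return the n highest-scoring (book_id, score) pairs as a dict, best first."""
    best = heapq.nlargest(n, predictions.items(), key=lambda kv: kv[1])
    return dict(best)


# Made-up predictions: bookID -> predicted rating.
example = {101: 7.5, 102: 9.1, 103: 8.0, 104: 6.2}
print(top_n(example, n=2))
```

Because Python dictionaries preserve insertion order, the returned dictionary keeps the best-first ordering.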

In [6]:
class Recommender:
    def __init__(self, predictor):
        self.predictor = predictor
    
    def fit(self, X):
        self.predictor.fit(X)

    def recommend(self, user_id, n=10, rec_seen=False):
        predictions = self.predictor.predict(user_id, rec_seen)
        sorted_predictions = sorted(predictions.items(), key=lambda x: x[1], reverse=True)
        return {k: v for k, v in sorted_predictions[:n]}

In [24]:
md = BookData(BOOKS_FILE)
uim = UserItemData(RATINGS_FILE)
rp = RandomPredictor(1, 10)
rec = Recommender(rp)
rec.fit(uim)
rec_items = rec.recommend(78, n=5, rec_seen=False)
for idbook, val in rec_items.items():
    print("Book: {}, rating: {}".format(md.get_title(idbook), val))  

Book: The Mummies of Urumchi, rating: 10
Book: Timeline, rating: 10
Book: Prague : A Novel, rating: 10
Book: Shabanu: Daughter of the Wind (Border Trilogy), rating: 10
Book: Haveli (Laurel Leaf Books), rating: 10


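## Prediction with the average
The AveragePredictor class takes a parameter b, which must be greater than or equal to 0. The average for each book is computed with the following formula:
$$ \text{avg} = \frac{\text{vs} + b \cdot g_{\text{avg}}}{n + b} $$
Here vs is the sum of the book's ratings, n is the number of its ratings, and $g_{\text{avg}}$ is the global average over all ratings.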
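
To see how b influences the formula above, here is a small worked example. The function name and the numbers are illustrative assumptions, not taken from the dataset.

```python
def damped_average(sum_of_ratings, n_ratings, global_avg, b):
    """Average of a book's ratings, pulled toward the global average by b pseudo-ratings."""
    return (sum_of_ratings + b * global_avg) / (n_ratings + b)


# A book with a single rating of 10 stays close to the global average of 7.
print(damped_average(10, 1, 7.0, b=100))

# A book with 200 ratings averaging 9 moves much further away from it.
print(damped_average(1800, 200, 7.0, b=100))

# With b = 0 the formula reduces to the plain mean.
print(damped_average(1800, 200, 7.0, b=0))
```

Note that with b = 0 a book without any ratings would lead to a division by zero, so such books need separate handling in that case.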

In [27]:
class AveragePredictor:
    def __init__(self, b):
        self.__uid = None
        self.__b = b
    
    def fit(self, uim):
        self.__uid = uim
        self.__ratings = dict()
        global_sum_of_ratings = 0
        global_counter = 0
        book_dict = dict()  # book_id: (number_of_ratings, sum_of_ratings)
        md = BookData(BOOKS_FILE)

        for el in md.data:
            book_id = int(el[0])

            sum_of_ratings = self.__uid.get_sum_of_ratings(book_id)
            global_sum_of_ratings += sum_of_ratings
            number_of_ratings = self.__uid.get_number_of_ratings(book_id)
            global_counter += number_of_ratings

            book_dict[book_id] = (number_of_ratings, sum_of_ratings)
        
        global_avg = global_sum_of_ratings / global_counter
        for book_id, (number_of_ratings, sum_of_ratings) in book_dict.items():
            self.__ratings[book_id] = (sum_of_ratings + self.__b * global_avg) / (number_of_ratings + self.__b)

    def predict(self, user_id, rec_seen=True):
        if self.__uid is None:
            raise ValueError("Error: fit() method was not called yet.")
        
        
        predictions = {}
        user_rated_books = set()

        if not rec_seen:
            # Collect the books this user has already rated.
            data = self.__uid.data
            user_rated_books = set(data[data['userID'] == user_id]['bookID'])

        for book_id, avg_rating in self.__ratings.items():
            if rec_seen or book_id not in user_rated_books:
                predictions[book_id] = avg_rating

        return predictions

In [28]:
md = BookData(BOOKS_FILE)
uim = UserItemData(RATINGS_FILE)
rp = AveragePredictor(100)
rec = Recommender(rp)
rec.fit(uim)
rec_items = rec.recommend(78, n=5, rec_seen=False)
for idbook, val in rec_items.items():
    print("Book: {}, rating: {}".format(md.get_title(idbook), val))  

Book: Harry Potter and the Chamber of Secrets (Book 2), rating: 8.546645294863009
Book: Tycoon'S Temptation (Silhouette Desire, No. 1414), rating: 8.4728569131659
Book: Deadly Decisions, rating: 8.465670467002166
Book: Past Lives, Present Dreams: How to Use Reincarnation for Personal Growth, rating: 8.464733811640189
Book: Seabiscuit: An American Legend, rating: 8.43566448895003


## Prediction by number of views
The ViewsPredictor returns, for each book, the number of times the book was viewed. I counted every rating as one view.
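
Counting ratings per book can also be done in one pandas call. A minimal sketch on made-up data, assuming the same column names as UserItemData.data:

```python
import pandas as pd

# Made-up ratings; every row counts as one view of the book.
ratings = pd.DataFrame(
    {
        "userID": [1, 2, 3, 1, 2],
        "bookID": [10, 10, 10, 20, 30],
        "rating": [8.0, 9.0, 7.0, 6.0, 5.0],
    }
)

# value_counts() returns the number of rows per bookID, most frequent first.
views = ratings["bookID"].value_counts().to_dict()
print(views)
```

Books that appear only in the book file and have no ratings are missing from this dictionary, so a lookup such as `views.get(book_id, 0)` would be needed to give them a count of 0.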

In [29]:
class ViewsPredictor:
    def __init__(self):
        self.__uid = None
    
    def fit(self, uim):
        self.__uid = uim
        self.__ratings = dict()
        md = BookData(BOOKS_FILE)
        for line in md.data:
            book_id = int(line[0])
            self.__ratings[book_id] = uim.get_number_of_ratings(book_id)
    
    def predict(self, user_id, rec_seen=True):
        if self.__uid is None:
            raise ValueError("Error: fit() method was not called yet.")
        
        
        predictions = {}
        user_rated_books = self.__uid.data[self.__uid.data['userID'] == user_id]['bookID'].unique()

        for book_id, avg_rating in self.__ratings.items():
            if rec_seen or book_id not in user_rated_books:
                predictions[book_id] = avg_rating

        return predictions

In [30]:
md = BookData(BOOKS_FILE)
uim = UserItemData(RATINGS_FILE)
rp = ViewsPredictor()
rec = Recommender(rp)
rec.fit(uim)
rec_items = rec.recommend(78, n=5, rec_seen=False)
for idbook, val in rec_items.items():
    print("Book: {}, rating: {}".format(md.get_title(idbook), val))  

Book: Impossible Vacation, rating: 160
Book: The Rescue, rating: 119
Book: Tycoon'S Temptation (Silhouette Desire, No. 1414), rating: 89
Book: Past Lives, Present Dreams: How to Use Reincarnation for Personal Growth, rating: 88
Book: The Queen of the Damned (Vampire Chronicles (Paperback)), rating: 86


## Predicting ratings with item-to-item similarity
The ItemBasedPredictor class takes two constructor parameters: min_values, the minimum number of users who must have rated both books, and threshold, the minimum similarity two books must have for it to be taken into account. Similarity is computed with the adjusted cosine similarity:
$$
\text{sim}(i, j) = \frac{\sum_{u \in U_{i,j}} \left( (R_{u,i} - \bar{R}_u) \cdot (R_{u,j} - \bar{R}_u) \right)}
{\sqrt{\sum_{u \in U_{i,j}} (R_{u,i} - \bar{R}_u)^2} \cdot \sqrt{\sum_{u \in U_{i,j}} (R_{u,j} - \bar{R}_u)^2}}
$$
Here $U_{i,j}$ is the set of users who rated both books i and j, $R_{u,i}$ is the rating user u gave to book i, and $\bar{R}_u$ is the average rating of user u.
The class has the following methods:
- fit(X), where X is of type UserItemData. The method returns nothing, but builds the similarity matrix, whose rows and columns are books.
- predict(userID, rec_seen), which takes the userID to predict for and rec_seen, which also includes books the user has already read when set to True. The method returns a dictionary whose keys are bookIDs and whose values are the predicted ratings.
- similarity(i1, i2), which takes two bookIDs and returns the similarity between the two books.
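
The adjusted cosine similarity can be sketched compactly with NumPy. This is an illustration on a made-up rating matrix, not the implementation used in the class: missing ratings are marked with `np.nan`, and each user's average is taken over all books that user rated.

```python
import numpy as np

# Rows are users, columns are books; np.nan marks a missing rating (made-up values).
R = np.array(
    [
        [8.0, 6.0, np.nan],
        [9.0, 7.0, 5.0],
        [4.0, 8.0, 6.0],
    ]
)


def adjusted_cosine(R, i, j):
    """Adjusted cosine similarity between the books in columns i and j."""
    user_means = np.nanmean(R, axis=1)           # average over each user's rated books
    centered = R - user_means[:, None]           # subtract the user's average
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])  # users who rated both books
    a, b = centered[both, i], centered[both, j]
    den = np.sqrt((a ** 2).sum()) * np.sqrt((b ** 2).sum())
    return float((a * b).sum() / den) if den != 0 else 0.0


print(adjusted_cosine(R, 0, 1))
print(adjusted_cosine(R, 0, 2))
```

Using an explicit mask for missing values keeps a centered rating of exactly 0 (a rating equal to the user's average) in the computation, which a plain truthiness check would silently drop.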

In [7]:
class ItemBasedPredictor:
    def __init__(self, min_values=0, threshold=0):
        self.__uid = None
        self.__min_values = min_values
        self.__threshold = threshold
    
    def fit(self, uim):
        self.__uid = uim
        self.__uuid = sorted(list(self.__uid.data['userID'].unique()))
        self.__umid = sorted(list(self.__uid.data['bookID'].unique()))
        
        self.__matrix_ratings = [[None for _ in range(len(self.__umid))] for _ in range(len(self.__uuid))]
        
        for (_, (uid, mid, rating)) in self.__uid.data.iterrows():
            if(mid in self.__umid and uid in self.__uuid):
                self.__matrix_ratings[self.__uuid.index(uid)][self.__umid.index(mid)] = rating

        self.__df_matrix_ratings = pd.DataFrame(self.__matrix_ratings, columns=self.__umid, index=self.__uuid)
        self.__df_matrix_ratings.replace(0, pd.NA, inplace=True)

        self.__averages = list(self.__df_matrix_ratings.mean(axis=1, skipna=True))

        for i in range(len(self.__matrix_ratings)):
            for j in range(len(self.__matrix_ratings[i])):
                if self.__matrix_ratings[i][j]:
                    self.__matrix_ratings[i][j] = self.__matrix_ratings[i][j] - self.__averages[i]
        
        self.similarity_matrix = [[0 for _ in range(len(self.__umid))] for _ in range(len(self.__umid))]

        for i in range(len(self.__umid)):
            for j in range(i, len(self.__umid)):
                if i == j:
                    self.similarity_matrix[i][j] = 1
                else:
                    sim = self.__sim(i, j)
                    self.similarity_matrix[i][j] = sim
                    self.similarity_matrix[j][i] = sim

    def __sim(self, i1, i2):
        num = 0
        den1 = 0
        den2 = 0
        counter = 0

        for user_ratings in self.__matrix_ratings:
            if user_ratings[i1] and user_ratings[i2]:
                num += user_ratings[i1] * user_ratings[i2]
                den1 += user_ratings[i1] ** 2
                den2 += user_ratings[i2] ** 2
                counter += 1
        
        
        den = (den1 ** 0.5) * (den2 ** 0.5)
        return num / den if den != 0 and counter >= self.__min_values and num / den > self.__threshold else 0
    
    def similarity(self, i1, i2):
        return self.similarity_matrix[self.__umid.index(i1)][self.__umid.index(i2)]
    
    def predict(self, userID, rec_seen=True):
        if self.__uid is None:
            raise ValueError("Error: fit() method was not called yet.")
        
        predictions = {}

        for j in range(len(self.__matrix_ratings[0])):
            if self.__matrix_ratings[self.__uuid.index(userID)][j] is None:
                num = 0
                den = 0
                for i in range(len(self.__matrix_ratings[0])):
                    if self.__matrix_ratings[self.__uuid.index(userID)][i]:
                        num += self.similarity_matrix[j][i] * self.__matrix_ratings[self.__uuid.index(userID)][i]
                        den += self.similarity_matrix[j][i]
                if den != 0:
                    predictions[self.__umid[j]] = (num / den) + self.__averages[self.__uuid.index(userID)]
        
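        if rec_seen:
            u = self.__uuid.index(userID)
            for j, el in enumerate(self.__umid):
                if self.__matrix_ratings[u][j] is not None:
                    # Stored ratings are mean-centered, so add the user's average back.
                    predictions[el] = self.__matrix_ratings[u][j] + self.__averages[u]

        # Return the predictions in both cases (previously nothing was returned for rec_seen=True).
        return predictions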
        
    def get_n_most_similar_books(self, n):
        similar_books = dict()
        for i in range(len(self.similarity_matrix)):
            for j in range(i, len(self.similarity_matrix[i])):
                if i != j:
                    similar_books[(self.__umid[i], self.__umid[j])] = self.similarity_matrix[i][j]
        
        return dict(sorted(similar_books.items(), key=lambda x: x[1], reverse=True)[:n])
    
    def similar_items(self, item, n):
        return_dict = dict()
        for i in range(len(self.similarity_matrix[self.__umid.index(item)])):
            return_dict[self.__umid[i]] = self.similarity_matrix[self.__umid.index(item)][i]
        
        return dict(sorted(return_dict.items(), key=lambda x: x[1], reverse=True)[1:n + 1])

In [25]:
md = BookData(BOOKS_FILE)
uim = UserItemData(RATINGS_FILE, min_ratings=50)
rp = ItemBasedPredictor()
rec = Recommender(rp)
rec.fit(uim)
print("Similarity between the books", md.get_title(228), "and", md.get_title(2332) + ": ", rp.similarity(228, 2332))
print("Similarity between the books", md.get_title(228), "and", md.get_title(1199) + ":", rp.similarity(228, 1199))
print("Similarity between the books", md.get_title(228), "and", md.get_title(202) + ":", rp.similarity(228, 202))

Similarity between the books Impossible Vacation and Reckless Abandon:  0.012999118515033369
Similarity between the books Impossible Vacation and Martha Stuart's Better Than You at Entertaining (A Parody): 0.462016133727974
Similarity between the books Impossible Vacation and What a Wonderful World: A Lifetime of Recordings: 0.21345636076006932


In [79]:
print("Predictions for 78: ")
rec_items = rec.recommend(78, n=15, rec_seen=False)
for idbook, val in rec_items.items():
    print("  -Book: {}, rating: {}".format(md.get_title(idbook), val))

Predictions for 78: 
  -Book: Impossible Vacation, rating: 9.0
  -Book: The Girls' Guide to Hunting and Fishing, rating: 9.0
  -Book: The Rescue, rating: 9.0
  -Book: The Truth About Texas: Who Needs to Brag? We'Ve Got the Facts, rating: 9.0
  -Book: Martha Stuart's Better Than You at Entertaining (A Parody), rating: 9.0
  -Book: The Queen of the Damned (Vampire Chronicles (Paperback)), rating: 9.0
  -Book: The 10th Kingdom (Hallmark Entertainment Books), rating: 8.733898279090134
  -Book: Seabiscuit: An American Legend, rating: 8.693729339753363
  -Book: Past Lives, Present Dreams: How to Use Reincarnation for Personal Growth, rating: 8.606500990016562
  -Book: The Mummy or Ramses the Damned, rating: 8.485443086871669
  -Book: Deadly Decisions, rating: 8.475534550148861
  -Book: Carnal Innocence, rating: 8.44116432327583
  -Book: Harry Potter and the Chamber of Secrets (Book 2), rating: 8.276131616877434
  -Book: Life's Little Instruction Book, rating: 8.094883303942312


In [39]:
result = rp.get_n_most_similar_books(20)
for (film1, film2), v in result.items():
    print("Book 1:", md.get_title(int(film1)) + ", Book 2:", md.get_title(int(film2)) + ", Similarity:", v)

Book 1: Rules of the Wild, Book 2: She's Come Undone (Oprah's Book Club (Paperback)), Similarity: 1.0000000000000002
Book 1: Mama Makes Up Her Mind: And Other Dangers of Southern Living, Book 2: Harry Potter and the Chamber of Secrets (Book 2), Similarity: 1.0000000000000002
Book 1: Reckless Abandon, Book 2: Past Lives, Present Dreams: How to Use Reincarnation for Personal Growth, Similarity: 1.0000000000000002
Book 1: Airframe, Book 2: Tu Nombre Escrito En El Agua (La Sonrisa Vertical), Similarity: 1.0
Book 1: Airframe, Book 2: Martha Stuart's Better Than You at Entertaining (A Parody), Similarity: 1.0
Book 1: Tu Nombre Escrito En El Agua (La Sonrisa Vertical), Book 2: The Mummy or Ramses the Damned, Similarity: 1.0
Book 1: Tu Nombre Escrito En El Agua (La Sonrisa Vertical), Book 2: Imaginary Lands, Similarity: 1.0
Book 1: The Deal, Book 2: Mansfield Park (Signet Classics (Paperback)), Similarity: 1.0
Book 1: The Deal, Book 2: The Selfish Gene, Similarity: 1.0
Book 1: The De

### Recommendations of the type "Readers who read A also read B"

In [38]:
rec_items = rp.similar_items(1199, 10)
# Print the actual title of book 1199 instead of a hard-coded one.
print('Books similar to "{}": '.format(md.get_title(1199)))
for idbook, val in rec_items.items():
    print("Book: {}, similarity: {}".format(md.get_title(idbook), val))

Books similar to "Martha Stuart's Better Than You at Entertaining (A Parody)": 
Book: Jurassic Park, similarity: 1.0
Book: The Selfish Gene, similarity: 1.0
Book: Martha Stuart's Better Than You at Entertaining (A Parody), similarity: 1
Book: The Cider House Rules, similarity: 0.819017231472004
Book: Mama Makes Up Her Mind: And Other Dangers of Southern Living, similarity: 0.632855843958507
Book: Mansfield Park (Signet Classics (Paperback)), similarity: 0.6244709737980126
Book: Winterdance: The Fine Madness of Running the Iditarod, similarity: 0.5232506837254686
Book: Impossible Vacation, similarity: 0.462016133727974
Book: Seabiscuit: An American Legend, similarity: 0.3790011805627099
Book: El Senor De Los Anillos: LA Comunidad Del Anillo (Lord of the Rings (Spanish)), similarity: 0.3470071277297951


## Recommendations for me
I also added 49 ratings of my own.

In [36]:
md = BookData(BOOKS_FILE)
uim = UserItemData('data/book_crossing/book_ratings_mine.dat', min_ratings=50)
rp = ItemBasedPredictor()
rec = Recommender(rp)
rec.fit(uim)
print("Predictions for me: ")
rec_items = rec.recommend(69350, n=10, rec_seen=False)
for idbook, val in rec_items.items():
    print("  -Book: {}, rating: {}".format(md.get_title(idbook), val))

[ 202 2566 1714 1893 2565 1440  259  228   44  407 1018   74  388  799
    8 1136 1289 2597   14  132  643  900 2156   64  504 1272 1199  642
  329 1888 1169  997   56 1315  297  281  243  594 2332  707  595]
Predictions for me: 
  -Book: Desperation, rating: 6.499747368046444
  -Book: She's Come Undone (Oprah's Book Club (Paperback)), rating: 6.31118042904847
  -Book: The Selfish Gene, rating: 5.505889775778636
  -Book: Summer of Storms, rating: 5.495625056531859
  -Book: Tu Nombre Escrito En El Agua (La Sonrisa Vertical), rating: 5.027492317557118
  -Book: Coyote Waits (Joe Leaphorn/Jim Chee Novels), rating: 5.0
  -Book: The Rescue, rating: 4.977800712582963
  -Book: What a Wonderful World: A Lifetime of Recordings, rating: 4.791546032197206
  -Book: Jurassic Park, rating: 4.742171984750613
  -Book: Imaginary Lands, rating: 4.391107815975575


## Slope One
The SlopeOnePredictor class predicts ratings with the Slope One method. In the program I used the two formulas below.
$$
\text{dev}(i, j) = \frac{\sum_{u \in U_{i,j}} (R_{u,i} - R_{u,j})}{|U_{i,j}|}
$$
$$
\text{pred}(u, i) = \frac{\sum_{j \in I_u} \big((\text{dev}(i, j) + R_{u,j}) \cdot |U_{i,j}|\big)}{\sum_{j \in I_u} |U_{i,j}|}
$$
Here $U_{i,j}$ is the set of users who rated both books i and j, and $I_u$ is the set of books rated by user u.
The class again has the methods fit and predict: fit takes an instance of the UserItemData class, and predict takes userID and rec_seen.
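
The two formulas can be followed step by step on a tiny example. The sketch below is independent of the class: the users, books and ratings are made up, and `dev` and `predict` are hypothetical helper functions that work on a plain nested dictionary.

```python
# Made-up ratings: user -> {book: rating}.
ratings = {
    "ana": {"A": 5.0, "B": 3.0, "C": 2.0},
    "ben": {"A": 3.0, "B": 4.0},
    "cene": {"B": 2.0, "C": 5.0},
}


def dev(ratings, i, j):
    """Average difference R[u][i] - R[u][j] over users who rated both, and their count."""
    diffs = [r[i] - r[j] for r in ratings.values() if i in r and j in r]
    return (sum(diffs) / len(diffs), len(diffs)) if diffs else (0.0, 0)


def predict(ratings, user, i):
    """Weighted Slope One prediction of `user`'s rating for book i."""
    num = 0.0
    den = 0
    for j, r_uj in ratings[user].items():
        d, n = dev(ratings, i, j)
        num += (d + r_uj) * n  # weight each estimate by the number of co-raters
        den += n
    return num / den if den > 0 else None


# dev(C, A) = -3 from 1 user, dev(C, B) = 1 from 2 users,
# so pred(ben, C) = ((-3 + 3) * 1 + (1 + 4) * 2) / (1 + 2) = 10 / 3.
print(predict(ratings, "ben", "C"))
```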

In [39]:
class SlopeOnePredictor:
    def __init__(self):
        self.__uid = None
        self.__dev_matrix = None
        self.__uuid = None
        self.__umid = None
        self.__matrix_ratings = None
        self.__rated_both = None
    
    def fit(self, uid):
        self.__uid = uid
        self.__uuid = sorted(list(self.__uid.data['userID'].unique()))
        self.__umid = sorted(list(self.__uid.data['bookID'].unique()))

        self.__matrix_ratings = [[None for _ in range(len(self.__umid))] for _ in range(len(self.__uuid))]
        self.__rated_both = [[None for _ in range(len(self.__umid))] for _ in range(len(self.__umid))]
        
        for (_, (uid, mid, rating)) in self.__uid.data.iterrows():
            if(mid in self.__umid and uid in self.__uuid):
                self.__matrix_ratings[self.__uuid.index(uid)][self.__umid.index(mid)] = rating
        
        self.__dev_matrix = [[0 for _ in range(len(self.__umid))] for _ in range(len(self.__umid))]
        for i in range(len(self.__umid)):
            for j in range(i, len(self.__umid)):
                if i == j:
                    self.__dev_matrix[i][j] = 0
                else:
                    dev = self.__dev(i, j)
                    self.__dev_matrix[i][j] = dev[0]
                    self.__dev_matrix[j][i] = 0 - dev[0]
                    self.__rated_both[i][j] = dev[1]
                    self.__rated_both[j][i] = dev[1]
    
    def __dev(self, i, j):
        num = 0
        den = 0

        for u in range(len(self.__uuid)):
            if(self.__matrix_ratings[u][i] and self.__matrix_ratings[u][j]):
                num += (self.__matrix_ratings[u][i] - self.__matrix_ratings[u][j])
                den += 1
        
        return ((num/den), den) if den > 0 else (0, den)
            
    
    def predict(self, userID, rec_seen=True):
        predictions = {}
        for el in range(len(self.__umid)):
            if self.__matrix_ratings[self.__uuid.index(userID)][el] == None:
                predictions[self.__umid[el]] = self.__get_prediction(userID, el)
            elif rec_seen:
                predictions[self.__umid[el]] = self.__matrix_ratings[self.__uuid.index(userID)][el]
        return predictions

    def __get_prediction(self, userID, i):
        num = 0
        den = 0
        u = self.__uuid.index(userID)
        for j in range(len(self.__umid)):
            if self.__matrix_ratings[u][j]:
                # Reuse the deviation precomputed in fit() instead of recomputing it.
                num += (self.__dev_matrix[i][j] + self.__matrix_ratings[u][j]) * self.__rated_both[i][j]
                den += self.__rated_both[i][j]
        
        return num / den if den > 0 else 0

In [40]:
md = BookData(BOOKS_FILE)
uim = UserItemData(RATINGS_FILE, min_ratings=50)
rp = SlopeOnePredictor()
rec = Recommender(rp)
rec.fit(uim)

print("Predictions for 78: ")
rec_items = rec.recommend(78, n=15, rec_seen=False)
for idmovie, val in rec_items.items():
    print("Book: {}, rating: {}".format(md.get_title(idmovie), val))

Predictions for 78: 
Book: Mama Makes Up Her Mind: And Other Dangers of Southern Living, rating: 9.285714285714286
Book: Seabiscuit: An American Legend, rating: 9.222222222222221
Book: Tycoon'S Temptation (Silhouette Desire, No. 1414), rating: 9.1
Book: Mansfield Park (Signet Classics (Paperback)), rating: 8.818181818181818
Book: Martha Stuart's Better Than You at Entertaining (A Parody), rating: 8.538461538461538
Book: Flight from Big Tangle (Orca Young Reader), rating: 8.444444444444445
Book: The Cider House Rules, rating: 8.428571428571429
Book: Impossible Vacation, rating: 8.4
Book: El Senor De Los Anillos: LA Comunidad Del Anillo (Lord of the Rings (Spanish)), rating: 8.346153846153847
Book: Past Lives, Present Dreams: How to Use Reincarnation for Personal Growth, rating: 8.294117647058824
Book: Carnal Innocence, rating: 8.25
Book: Harry Potter and the Chamber of Secrets (Book 2), rating: 8.2
Book: The Truth About Texas: Who Needs to Brag? We'Ve Got the Facts, rating: 8.181818181818182
Book: S