# TER 
## Content-based book recommendation system

## Problem statement

How can we design an effective book recommendation system that exploits textual content (descriptions, summaries, user reviews) to suggest books similar to those a reader has enjoyed, without depending on the global user history?

## Objective

Predict the books a user is likely to enjoy, based on their preferences (collaborative recommendation) and on similarities with other books (content-based recommendation).

## Data exploration

## Preprocessing

- Clean the data (`dropna`, remove stop words with nltk or sklearn, ...)
- Normalize case (convert everything to lowercase)
- Tokenize (`simple_preprocess` from gensim)
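
The steps above can be sketched with the standard library alone. This is only an illustration: the `preprocess` helper and the tiny `STOP_WORDS` set are hypothetical names, and the regex tokenizer stands in for gensim's `simple_preprocess` and the nltk/sklearn stop-word lists that the plan mentions.

```python
import re

# Hypothetical, deliberately tiny stop-word list; the plan is to use the
# lists shipped with nltk or sklearn instead.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}

def preprocess(text):
    """Lowercase, tokenize on runs of ASCII letters, and drop stop words."""
    if not isinstance(text, str):  # treat missing values (None, NaN) as empty
        return []
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(preprocess("The Mummies of Urumchi"))  # ['mummies', 'urumchi']
```

Returning an empty list for missing values is one possible choice; dropping those rows with `dropna` beforehand, as listed above, would work as well.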

## Content-based part

### Model

- Use Word2Vec (`gensim.models`); possibly switch to BERT for a deep learning approach, or use TF-IDF (not decided yet)
- Train the model on the data
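
To make the TF-IDF option concrete, here is a minimal pure-Python sketch of the weighting. The `tfidf_vectors` helper and the toy documents are hypothetical. It uses the plain `tf * log(N / df)` formula, so the values will differ from sklearn's `TfidfVectorizer`, which by default smooths the idf term and L2-normalizes each vector.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs):
    """Return one {term: weight} dict per document (plain tf * idf)."""
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Toy tokenized titles (made up for the example).
docs = [
    ["classical", "mythology"],
    ["decision", "normandy"],
    ["classical", "music"],
]
vectors = tfidf_vectors(docs)
# "mythology" appears in one document only, so it outweighs "classical".
print(vectors[0])
```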

### Recommendation system

- Compute the similarity (cosine similarity, or return the nearest vectors)
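
As a rough sketch of this step, cosine similarity and a nearest-neighbour lookup can be written as follows. The `cosine_similarity` and `most_similar` helpers and the three toy vectors are hypothetical; in the actual project the vectors would come from the trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:  # a zero vector has no direction
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(target, vectors, n):
    """Return the n keys whose vectors are closest to vectors[target]."""
    scores = [
        (key, cosine_similarity(vectors[target], vec))
        for key, vec in vectors.items()
        if key != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [key for key, _ in scores[:n]]

# Toy book vectors (made up for the example).
book_vectors = {
    "book_a": [1.0, 0.0, 1.0],
    "book_b": [0.9, 0.1, 0.8],
    "book_c": [0.0, 1.0, 0.0],
}
print(most_similar("book_a", book_vectors, 2))  # ['book_b', 'book_c']
```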

## Collaborative recommendation part

### Model

- SVD (Surprise), matrix-factorization-based algorithms

### Recommendation system

- Take the top n items from the model's predictions

## Hybrid recommendation part

- Combine the two scores to recommend a book (weighted sum, where a weight alpha controls the influence of each type of recommendation)

- Return the n best recommendations
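
One possible reading of this weighted combination is sketched below. The `hybrid_recommend` helper and the sample scores are hypothetical. It assumes both scores have already been brought to the same scale (for example, SVD estimates on the 0 to 10 scale divided by 10), and it uses the weights `alpha` and `1 - alpha`, which is only one way to implement "addition with a weight alpha".

```python
def hybrid_recommend(content_scores, collab_scores, alpha, n):
    """Rank items by alpha * content score + (1 - alpha) * collaborative score."""
    items = set(content_scores) | set(collab_scores)
    combined = {
        # An item missing from one source contributes 0 for that source.
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }
    # Sort by decreasing score; break ties on the item id for determinism.
    ranked = sorted(combined, key=lambda item: (-combined[item], item))
    return ranked[:n]

# Made-up scores, both assumed to lie in [0, 1].
content = {"isbn_1": 0.9, "isbn_2": 0.2, "isbn_3": 0.5}
collab = {"isbn_1": 0.1, "isbn_2": 0.8, "isbn_3": 0.6}
print(hybrid_recommend(content, collab, alpha=0.7, n=2))  # ['isbn_1', 'isbn_3']
```

With `alpha=1.0` the ranking is purely content-based, and with `alpha=0.0` it is purely collaborative.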

## Graphical interface

- Streamlit/Flask

# To think about

### Supervised learning

- Classes based on genre -> filter the recommendations by genre
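
A minimal sketch of such a filter is given below. The `filter_by_genre` helper and the `genre_by_isbn` mapping are hypothetical: the dataset loaded in this notebook has no genre column, so the labels would have to come from a classifier or an external source.

```python
def filter_by_genre(ranked_isbns, genre_by_isbn, wanted_genre, n):
    """Keep the first n recommendations whose genre matches wanted_genre."""
    kept = [
        isbn for isbn in ranked_isbns
        if genre_by_isbn.get(isbn) == wanted_genre  # unlabeled books are dropped
    ]
    return kept[:n]

# Made-up ranking and genre labels.
ranked = ["isbn_1", "isbn_2", "isbn_3", "isbn_4"]
genres = {"isbn_1": "history", "isbn_2": "fantasy", "isbn_3": "history"}
print(filter_by_genre(ranked, genres, "history", 5))  # ['isbn_1', 'isbn_3']
```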

In [1]:
import pandas as pd
import numpy as np
from surprise import Dataset, Reader, SVD

In [2]:
path = "./datasets/"

books = pd.read_csv(path+"Books.csv")
users= pd.read_csv(path+"Users.csv")
ratings = pd.read_csv(path+"Ratings.csv")


In [8]:
from IPython.display import display, HTML

def show_image(url, width=100):
    return f'<img src="{url}" width="{width}">'


In [9]:
books.head().style.format({'Image-URL-S':show_image})

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,,http://images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,,http://images.amazon.com/images/P/0002005018.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0002005018.01.LZZZZZZZ.jpg
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,,http://images.amazon.com/images/P/0060973129.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0060973129.01.LZZZZZZZ.jpg
3,374157065,Flu: The Story of the Great Influenza Pandemic of 1918 and the Search for the Virus That Caused It,Gina Bari Kolata,1999,Farrar Straus Giroux,,http://images.amazon.com/images/P/0374157065.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0374157065.01.LZZZZZZZ.jpg
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,,http://images.amazon.com/images/P/0393045218.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/0393045218.01.LZZZZZZZ.jpg


In [None]:
users.head()

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [6]:
ratings.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [12]:
# Work on a copy so that the original `books` DataFrame is left untouched
new_books = books.copy()
new_books["Book-Title"] = new_books["Book-Title"].str.lower()
new_books["Book-Author"] = new_books["Book-Author"].str.lower()
new_books["Publisher"] = new_books["Publisher"].str.lower()
new_books

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,0195153448,classical mythology,mark p. o. morford,2002,oxford university press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,clara callan,richard bruce wright,2001,harperflamingo canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,decision in normandy,carlo d'este,1991,harperperennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,0374157065,flu: the story of the great influenza pandemic...,gina bari kolata,1999,farrar straus giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,0393045218,the mummies of urumchi,e. j. w. barber,1999,w. w. norton & company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...
...,...,...,...,...,...,...,...,...
271355,0440400988,there's a bat in bunk five,paula danziger,1988,random house childrens pub (mm),http://images.amazon.com/images/P/0440400988.0...,http://images.amazon.com/images/P/0440400988.0...,http://images.amazon.com/images/P/0440400988.0...
271356,0525447644,from one to one hundred,teri sloat,1991,dutton books,http://images.amazon.com/images/P/0525447644.0...,http://images.amazon.com/images/P/0525447644.0...,http://images.amazon.com/images/P/0525447644.0...
271357,006008667X,lily dale : the true story of the town that ta...,christine wicker,2004,harpersanfrancisco,http://images.amazon.com/images/P/006008667X.0...,http://images.amazon.com/images/P/006008667X.0...,http://images.amazon.com/images/P/006008667X.0...
271358,0192126040,republic (world's classics),plato,1996,oxford university press,http://images.amazon.com/images/P/0192126040.0...,http://images.amazon.com/images/P/0192126040.0...,http://images.amazon.com/images/P/0192126040.0...


In [4]:
reader = Reader(rating_scale=(0, 10))
data=Dataset.load_from_df(ratings[["User-ID", "ISBN", "Book-Rating"]], reader)

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer
vect = TfidfVectorizer()

In [5]:
# Collaborative recommendation

svd=SVD()
train_set=data.build_full_trainset()
svd.fit(train_set)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1fb129167a0>

In [6]:
# Recommendation function
def reco_collab(user_id, n):
    # Build the (user, item) pairs that have no rating yet.
    # Note: this covers every user, so it is expensive on a large dataset.
    anti_testset = train_set.build_anti_testset()

    # Keep only the entries for this user
    user_testset = [entry for entry in anti_testset if entry[0] == user_id]

    # Predict the ratings
    predictions = svd.test(user_testset)

    # Sort by decreasing estimated rating
    predictions.sort(key=lambda x: x.est, reverse=True)

    # Return the raw ids of the n best items
    reco = [pred.iid for pred in predictions[:n]]
    return reco

In [None]:
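def reco_collab(user_id, n):
    # Convert the raw user id to Surprise's inner id
    inner_uid = train_set.to_inner_uid(user_id)

    # Inner ids of the items this user has already rated
    user_items = set(j for (j, _) in train_set.ur[inner_uid])

    # Candidate (user, item, rating) triples for the unrated items only
    anti_testset_user = [
        (user_id, train_set.to_raw_iid(i), 0)  # 0 is a placeholder rating
        for i in train_set.all_items()
        if i not in user_items
    ]

    # Predict the ratings
    predictions = svd.test(anti_testset_user)

    # Sort by decreasing estimated rating
    predictions.sort(key=lambda x: x.est, reverse=True)

    # Return the raw ids of the n best items
    reco = [pred.iid for pred in predictions[:n]]
    return reco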


In [13]:
reco_collab(276729, 5)

['1844262553', '0615116426', '8826703132', '0439064864', '0394800389']

In [5]:
len(["Blasphemous", "Blasphemous 2"])

2