# Content-based book recommendation system

# Specifications

Project: Book recommendation system

Objective:
- Predict the books a user is likely to enjoy, based on their preferences (collaborative filtering) and on similarities with other books (content-based recommendation)

Techniques used:
- Collaborative filtering (memory-based / model-based)
- Content-based recommendation (TF-IDF, embeddings: Word2Vec, GloVe, etc.)
- Models: KNN, SVD, NMF, etc.
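
To make the model-based option more concrete, here is a minimal sketch of a low-rank approximation of a user-item matrix with a plain NumPy truncated SVD. The tiny matrix, the choice of `k = 2`, and the decision to treat 0 as "not rated" and fill it with each user's mean are illustrative assumptions. This is not the SGD-trained `SVD` algorithm from `surprise`, only the underlying idea.

```python
import numpy as np

# Hypothetical 4 users x 5 books rating matrix; 0 means "not rated" (assumption).
R = np.array([
    [8, 0, 9, 0, 3],
    [7, 6, 0, 2, 0],
    [0, 5, 8, 0, 4],
    [2, 0, 0, 9, 8],
], dtype=float)

rated = R > 0
# Fill unrated cells with each user's mean rating so the SVD is not pulled toward 0.
user_means = R.sum(axis=1) / rated.sum(axis=1)
R_filled = np.where(rated, R, user_means[:, None])

# Centre per user, factorise, and keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(R_filled - user_means[:, None], full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :] + user_means[:, None]

# For user 0, pick the unrated book with the highest predicted rating.
user = 0
candidates = np.where(~rated[user])[0]
best_book = candidates[np.argmax(R_hat[user, candidates])]
print(f"user {user}: recommend book column {best_book}")
```

On the real data the matrix is far too sparse to densify like this, which is one reason to use a library such as `surprise` instead.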

Steps:

Dataset
- Use: https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset
- Try other book recommendation datasets

Preprocessing
- Data cleaning (missing values, duplicates)
- Feature engineering (genres, authors, average ratings)
- Text vectorization (titles, descriptions) with TF-IDF, Word2Vec, GloVe, etc.
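
As a rough illustration of the text vectorization step, the sketch below turns a few titles into TF-IDF vectors with scikit-learn and compares them with cosine similarity. The four titles and the helper `most_similar` are hypothetical stand-ins; on the real data the input would be the title (or description) column.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical titles standing in for the dataset's title column.
titles = [
    "Classical Mythology",
    "Greek Mythology for Beginners",
    "Decision in Normandy",
    "The Battle of Normandy",
]

# Each title becomes a sparse TF-IDF vector; English stop words are dropped.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(titles)

# Pairwise cosine similarity between all titles.
sim = cosine_similarity(tfidf)

def most_similar(index):
    """Return the index of the title closest to `index`, excluding itself."""
    scores = sim[index].copy()
    scores[index] = -1.0
    return int(scores.argmax())

print(titles[most_similar(0)])
```

Titles alone carry little signal, so descriptions or categories would probably be needed before this gives useful recommendations.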

In [50]:
import pandas as pd
import numpy as np
from surprise import Dataset, Reader, SVD

In [91]:
path = "./datasets/"

books = pd.read_csv(path+"Books.csv")
users= pd.read_csv(path+"Users.csv")
ratings = pd.read_csv(path+"Ratings.csv")

In [105]:
books.rename(columns={'Book-Title':'title', 'Book-Author': 'author'}, inplace=True)
books.head()

Unnamed: 0,ISBN,title,author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In [53]:
def show_image(url, width=100):
    """Return an HTML <img> tag so the dataset's cover images render in the table."""
    return f'<img src="{url}" width="{width}">'

def url_to_img(df, func):
    return df.style.format({'Image-URL-S':func, 'Image-URL-M':func, 'Image-URL-L':func}, escape=False)

In [54]:
mask=np.where(books.isna())
books_nan=books.iloc[mask[0]]

books_nan

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
118033,0751352497,A+ Quiz Masters:01 Earth,,1999,Dorling Kindersley,http://images.amazon.com/images/P/0751352497.0...,http://images.amazon.com/images/P/0751352497.0...,http://images.amazon.com/images/P/0751352497.0...
128890,193169656X,Tyrant Moon,Elaine Corvidae,2002,,http://images.amazon.com/images/P/193169656X.0...,http://images.amazon.com/images/P/193169656X.0...,http://images.amazon.com/images/P/193169656X.0...
129037,1931696993,Finders Keepers,Linnea Sinclair,2001,,http://images.amazon.com/images/P/1931696993.0...,http://images.amazon.com/images/P/1931696993.0...,http://images.amazon.com/images/P/1931696993.0...
187689,9627982032,The Credit Suisse Guide to Managing Your Perso...,,1995,Edinburgh Financial Publishing,http://images.amazon.com/images/P/9627982032.0...,http://images.amazon.com/images/P/9627982032.0...,http://images.amazon.com/images/P/9627982032.0...


In [55]:
books.fillna('Unknown', inplace=True)

In [56]:
books.iloc[mask[0]]

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
118033,0751352497,A+ Quiz Masters:01 Earth,Unknown,1999,Dorling Kindersley,http://images.amazon.com/images/P/0751352497.0...,http://images.amazon.com/images/P/0751352497.0...,http://images.amazon.com/images/P/0751352497.0...
128890,193169656X,Tyrant Moon,Elaine Corvidae,2002,Unknown,http://images.amazon.com/images/P/193169656X.0...,http://images.amazon.com/images/P/193169656X.0...,http://images.amazon.com/images/P/193169656X.0...
129037,1931696993,Finders Keepers,Linnea Sinclair,2001,Unknown,http://images.amazon.com/images/P/1931696993.0...,http://images.amazon.com/images/P/1931696993.0...,http://images.amazon.com/images/P/1931696993.0...
187689,9627982032,The Credit Suisse Guide to Managing Your Perso...,Unknown,1995,Edinburgh Financial Publishing,http://images.amazon.com/images/P/9627982032.0...,http://images.amazon.com/images/P/9627982032.0...,http://images.amazon.com/images/P/9627982032.0...


Rows to check:

- 118033
- 128890
- 129037
- 187689
- 209540
- 220731
- 221678
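
One possible way to build such a list without `np.where` is a boolean row mask. Because `np.where(books.isna())` returns one entry per missing cell, a row with two gaps would show up twice; the mask below lists each row once. The small `books_demo` frame is a hypothetical stand-in for `books`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the books table, with gaps in two columns.
books_demo = pd.DataFrame({
    "ISBN": ["0751352497", "193169656X", "0000000000", "0195153448"],
    "author": [np.nan, "Elaine Corvidae", np.nan, "Mark P. O. Morford"],
    "Publisher": ["Dorling Kindersley", np.nan, np.nan, "Oxford University Press"],
})

# True for every row that has at least one missing cell.
has_nan = books_demo.isna().any(axis=1)
rows_to_check = books_demo[has_nan]

# Number of missing cells per row, to prioritise the manual check.
n_missing = books_demo.isna().sum(axis=1)
print(rows_to_check.assign(n_missing=n_missing[has_nan]))
```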

In [57]:
books_nan.style.format({'Image-URL-S':show_image, 'Image-URL-M':show_image, 'Image-URL-L':show_image})

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
118033,0751352497,A+ Quiz Masters:01 Earth,,1999,Dorling Kindersley,,,
128890,193169656X,Tyrant Moon,Elaine Corvidae,2002,,,,
129037,1931696993,Finders Keepers,Linnea Sinclair,2001,,,,
187689,9627982032,The Credit Suisse Guide to Managing Your Personal Wealth,,1995,Edinburgh Financial Publishing,,,


In [58]:
books.duplicated().sum()

0
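
`duplicated()` with no arguments only flags rows that are identical in every column. The sketch below, on a hypothetical `books_demo` frame, shows two looser checks that may be worth running as well: duplicates on the ISBN alone, and duplicates on a normalised title.

```python
import pandas as pd

# Hypothetical rows: same ISBN twice, with a trailing space in one title.
books_demo = pd.DataFrame({
    "ISBN": ["0001", "0001", "0002"],
    "title": ["Liar", "Liar ", "Clara Callan"],
})

# Exact duplicates across all columns (what the cell above counts).
full_dupes = int(books_demo.duplicated().sum())

# Rows whose ISBN already appeared earlier in the table.
isbn_dupes = int(books_demo.duplicated(subset="ISBN").sum())

# Titles that only differ by case or surrounding whitespace.
normalised = books_demo["title"].str.strip().str.lower()
title_dupes = int(normalised.duplicated().sum())

print(full_dupes, isbn_dupes, title_dupes)
```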

In [107]:
ratings.rename(columns={'Book-Rating':'rating'}, inplace=True)
ratings.head()

Unnamed: 0,User-ID,ISBN,rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [108]:
ratings.isna().sum()

User-ID    0
ISBN       0
rating     0
dtype: int64

In [109]:
ratings.duplicated().sum()

0

In [110]:
ratings.head()

Unnamed: 0,User-ID,ISBN,rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [111]:
ratings['User-ID'].value_counts()

User-ID
11676     13602
198711     7550
153662     6109
98391      5891
35859      5850
          ...  
116180        1
116166        1
116154        1
116137        1
276723        1
Name: count, Length: 105283, dtype: int64

In [79]:
X=ratings['User-ID'].value_counts() > 50
X

User-ID
11676      True
198711     True
153662     True
98391      True
35859      True
          ...  
116180    False
116166    False
116154    False
116137    False
276723    False
Name: count, Length: 105283, dtype: bool

In [80]:
X[X].shape

(3371,)

In [81]:
Y=X[X].index
Y

Index([ 11676, 198711, 153662,  98391,  35859, 212898, 278418,  76352, 110973,
       235105,
       ...
       195079, 239486,  63506, 220911, 187613,  90105, 109040, 137803,  83034,
       190939],
      dtype='int64', name='User-ID', length=3371)

In [82]:
ratings_test=ratings[ratings['User-ID'].isin(Y)]
ratings_test.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
173,276847,446364193,0
174,276847,3257200552,5
175,276847,3379015180,0
176,276847,3404145909,8
177,276847,3404148576,8
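
The cells above (`X`, `Y`, `ratings_test`) can be condensed into one small helper. This is only a sketch: `filter_active_users` and the toy `ratings_demo` frame are hypothetical names, and the threshold is lowered to 2 so the example stays small (the notebook uses 50).

```python
import pandas as pd

def filter_active_users(ratings, min_ratings, user_col="User-ID"):
    """Keep only the ratings of users with strictly more than `min_ratings` ratings."""
    counts = ratings[user_col].value_counts()
    active_users = counts[counts > min_ratings].index
    return ratings[ratings[user_col].isin(active_users)]

# Hypothetical ratings: user 1 has 3 ratings, user 2 has 1, user 3 has 2.
ratings_demo = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 3, 3],
    "ISBN": ["a", "b", "c", "a", "b", "c"],
    "rating": [0, 5, 8, 7, 0, 9],
})

ratings_active = filter_active_users(ratings_demo, min_ratings=2)
print(ratings_active)
```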


In [83]:
ratings_books=ratings.merge(books, on='ISBN')
ratings_books.head()

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
1,276726,0155061224,5,Rites of Passage,Judith Rae,2001,Heinle,http://images.amazon.com/images/P/0155061224.0...,http://images.amazon.com/images/P/0155061224.0...,http://images.amazon.com/images/P/0155061224.0...
2,276727,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
3,276729,052165615X,3,Help!: Level 1,Philip Prowse,1999,Cambridge University Press,http://images.amazon.com/images/P/052165615X.0...,http://images.amazon.com/images/P/052165615X.0...,http://images.amazon.com/images/P/052165615X.0...
4,276729,0521795028,6,The Amsterdam Connection : Level 4 (Cambridge ...,Sue Leather,2001,Cambridge University Press,http://images.amazon.com/images/P/0521795028.0...,http://images.amazon.com/images/P/0521795028.0...,http://images.amazon.com/images/P/0521795028.0...


In [118]:
book_rating_stats = ratings.groupby('ISBN')['rating'].mean().reset_index()
book_rating_stats

Unnamed: 0,ISBN,rating
0,0330299891,3.0
1,0375404120,1.5
2,0586045007,0.0
3,9022906116,3.5
4,9032803328,0.0
...,...,...
340551,cn113107,0.0
340552,ooo7156103,7.0
340553,§423350229,0.0
340554,´3499128624,8.0
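
Many of the means above are 0.0. In the Book-Crossing data that this dataset is usually said to derive from, a rating of 0 is described as implicit feedback rather than a score; treat that as an assumption to verify against the dataset page. Under that assumption, a sketch of a mean computed on explicit ratings only, together with the number of ratings behind it, could look like this (`ratings_demo` is a hypothetical stand-in for `ratings`):

```python
import pandas as pd

# Hypothetical ratings; 0 is assumed to mean "no explicit rating".
ratings_demo = pd.DataFrame({
    "ISBN": ["a", "a", "a", "b", "b", "c"],
    "rating": [0, 8, 6, 0, 0, 9],
})

# Drop the assumed-implicit rows before averaging.
explicit = ratings_demo[ratings_demo["rating"] > 0]

# One row per book with both the mean and the number of explicit ratings.
book_stats = (
    explicit.groupby("ISBN")["rating"]
    .agg(mean_rating="mean", num_ratings="count")
    .reset_index()
)
print(book_stats)
```

Keeping the count next to the mean makes it easy to ignore averages that rest on one or two ratings.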


In [119]:
books_w_ratings=book_rating_stats.merge(books, on='ISBN')
books_w_ratings.head()

Unnamed: 0,ISBN,rating,title,author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,0000913154,8.0,The Way Things Work: An Illustrated Encycloped...,C. van Amerongen (translator),1967,Simon &amp; Schuster,http://images.amazon.com/images/P/0000913154.0...,http://images.amazon.com/images/P/0000913154.0...,http://images.amazon.com/images/P/0000913154.0...
1,0001010565,0.0,Mog's Christmas,Judith Kerr,1992,Collins,http://images.amazon.com/images/P/0001010565.0...,http://images.amazon.com/images/P/0001010565.0...,http://images.amazon.com/images/P/0001010565.0...
2,0001046438,9.0,Liar,Stephen Fry,0,Harpercollins Uk,http://images.amazon.com/images/P/0001046438.0...,http://images.amazon.com/images/P/0001046438.0...,http://images.amazon.com/images/P/0001046438.0...
3,0001046713,0.0,Twopence to Cross the Mersey,Helen Forrester,1992,HarperCollins Publishers,http://images.amazon.com/images/P/0001046713.0...,http://images.amazon.com/images/P/0001046713.0...,http://images.amazon.com/images/P/0001046713.0...
4,000104687X,6.0,"T.S. Eliot Reading \The Wasteland\"" and Other ...",T.S. Eliot,1993,HarperCollins Publishers,http://images.amazon.com/images/P/000104687X.0...,http://images.amazon.com/images/P/000104687X.0...,http://images.amazon.com/images/P/000104687X.0...


In [85]:
number_ratings=ratings_books.groupby('Book-Title')['Book-Rating'].count().reset_index()
number_ratings.head()

Unnamed: 0,Book-Title,Book-Rating
0,A Light in the Storm: The Civil War Diary of ...,4
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1
4,Beyond IBM: Leadership Marketing and Finance ...,1


In [87]:
number_ratings.rename(columns={'Book-Rating': 'num_of_ratings'}, inplace=True)
number_ratings.head()

Unnamed: 0,Book-Title,num_of_ratings
0,A Light in the Storm: The Civil War Diary of ...,4
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1
4,Beyond IBM: Leadership Marketing and Finance ...,1


In [88]:
ratings_final=ratings_books.merge(number_ratings, on='Book-Title')
ratings_final.head()

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L,num_of_ratings
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,60
1,276726,0155061224,5,Rites of Passage,Judith Rae,2001,Heinle,http://images.amazon.com/images/P/0155061224.0...,http://images.amazon.com/images/P/0155061224.0...,http://images.amazon.com/images/P/0155061224.0...,14
2,276727,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,650
3,276729,052165615X,3,Help!: Level 1,Philip Prowse,1999,Cambridge University Press,http://images.amazon.com/images/P/052165615X.0...,http://images.amazon.com/images/P/052165615X.0...,http://images.amazon.com/images/P/052165615X.0...,1
4,276729,0521795028,6,The Amsterdam Connection : Level 4 (Cambridge ...,Sue Leather,2001,Cambridge University Press,http://images.amazon.com/images/P/0521795028.0...,http://images.amazon.com/images/P/0521795028.0...,http://images.amazon.com/images/P/0521795028.0...,1
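
A table like `ratings_final` is a common starting point for the KNN approach listed in the specifications. The sketch below is one way it might continue, assuming a book-by-user pivot table and cosine distance; the toy `ratings_demo` frame and the helper `similar_books` are hypothetical, and on the real data one would probably first keep only books with enough ratings.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical ratings: books A/B share fans, and so do books C/D.
ratings_demo = pd.DataFrame({
    "User-ID": [1, 1, 2, 2, 3, 3, 3, 4, 4],
    "Book-Title": ["A", "B", "A", "B", "A", "C", "D", "C", "D"],
    "Book-Rating": [9, 8, 8, 9, 2, 9, 8, 8, 9],
})

# Rows are books, columns are users; unrated cells become 0.
book_user = ratings_demo.pivot_table(
    index="Book-Title", columns="User-ID", values="Book-Rating", fill_value=0
)

# Brute-force nearest neighbours under cosine distance.
knn = NearestNeighbors(metric="cosine", algorithm="brute")
knn.fit(book_user.to_numpy())

def similar_books(title, k=1):
    """Return the k books whose rating vectors are closest to `title`."""
    row = book_user.loc[[title]].to_numpy()
    # Ask for k + 1 neighbours because the closest one is the book itself.
    _, idx = knn.kneighbors(row, n_neighbors=k + 1)
    names = [book_user.index[i] for i in idx[0]]
    return [name for name in names if name != title][:k]

print(similar_books("A"))
```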


In [62]:
users.isna().sum()

User-ID          0
Location         0
Age         110762
dtype: int64
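
With 110762 missing ages, dropping those rows would discard a large share of users. One hedged option is to impute the median and keep a flag marking which values were imputed. The 5 to 100 validity range and the `users_demo` frame below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical users, including two missing ages and one implausible one.
users_demo = pd.DataFrame({
    "User-ID": [1, 2, 3, 4, 5],
    "Age": [18.0, np.nan, 35.0, 244.0, np.nan],
})

# Assumption: ages outside 5-100 are entry errors and are treated as missing.
age = users_demo["Age"].where(users_demo["Age"].between(5, 100))

# Keep a flag so later steps can tell imputed ages from real ones.
users_demo["age_missing"] = age.isna()
users_demo["Age"] = age.fillna(age.median())
print(users_demo)
```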

In [63]:
users.duplicated().sum()

0

In [None]:
books_author=books.groupby('author').size().reset_index()
# separate name so the NaN-position `mask` defined earlier is not overwritten
author_mask=books_author[0]>2
author_mask

0         False
1         False
2         False
3         False
4         False
          ...  
102015    False
102016     True
102017    False
102018    False
102019    False
Name: 0, Length: 102020, dtype: bool

In [135]:
mask

(array([118033, 128890, 129037, 187689], dtype=int64),
 array([2, 4, 4, 2], dtype=int64))
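
To go from a per-author count to the matching book rows, one option is to broadcast the count back onto each row with `transform`. This is a sketch on a hypothetical `books_demo` frame; excluding the `'Unknown'` placeholder (introduced by the `fillna` step) is an assumption about what is wanted.

```python
import pandas as pd

# Hypothetical books: one author with three titles, two authors with one each.
books_demo = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4", "t5"],
    "author": ["Amy Tan", "Amy Tan", "Amy Tan", "Scott Turow", "Unknown"],
})

# Number of books per author, repeated on every row of that author.
books_per_author = books_demo.groupby("author")["title"].transform("size")

# Keep books by authors with more than 2 titles, ignoring the placeholder author.
prolific = books_demo[(books_per_author > 2) & (books_demo["author"] != "Unknown")]
print(prolific)
```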

In [49]:
# Tests with the Google Books API for data enrichment

import requests
import time

def get_book_info(isbn, api_key=None):
    """Look up one ISBN on the Google Books API and return a dict of metadata."""
    base_url = 'https://www.googleapis.com/books/v1/volumes'
    params = {
        'q': f'isbn:{isbn}',
    }
    if api_key:
        params['key'] = api_key

    # Same keys whether or not the book is found, so every row has the same columns
    info = {
        'Book-Title': None,
        'Book-Author': None,
        'categories': None,
        'description': None,
        'publishedDate': None
    }

    response = requests.get(base_url, params=params, timeout=10)
    if response.status_code != 200:
        return info

    data = response.json()
    items = data.get('items')
    if not items:
        return info

    volume_info = items[0]['volumeInfo']
    info.update({
        'Book-Title': volume_info.get('title'),
        'Book-Author': volume_info.get('authors'),
        'categories': volume_info.get('categories'),  # ← genres are here
        'description': volume_info.get('description'),
        'publishedDate': volume_info.get('publishedDate')
    })
    return info

# Example: enrich the first 10 books
books_df = pd.read_csv(path+'Books.csv')
enriched_data = []

for idx, row in books_df.head(10).iterrows():
    isbn = row['ISBN']
    info = get_book_info(isbn)  # ← pass your API key here
    enriched_data.append(info)
    time.sleep(0.1)  # to avoid getting blocked

enriched_df = pd.DataFrame(enriched_data)
books_df_enriched = pd.concat([books_df.head(10).reset_index(drop=True), enriched_df], axis=1)


In [48]:
books_df_enriched

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L,Book-Title.1,Book-Author.1,categories,description,publishedDate
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,Classical Mythology,"[Mark P. O. Morford, Robert J. Lenardon]",[Social Science],Provides an introduction to classical myths pl...,2003
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,Clara Callan,[Richard Bruce Wright],[Actresses],"In a small town in Canada, Clara Callan reluct...",2001
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,Decision in Normandy,[Carlo D'Este],[History],"Here, for the first time in paperback, is an o...",1991
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,Flu,[Gina Bari Kolata],[Medical],"""Scientists have recently discovered shards of...",1999
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,The Mummies of Ürümchi,[E. J. W. Barber],[Design],A look at the incredibly well-preserved ancien...,1999
5,0399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group,http://images.amazon.com/images/P/0399135782.0...,http://images.amazon.com/images/P/0399135782.0...,http://images.amazon.com/images/P/0399135782.0...,The Kitchen God's Wife,[Amy Tan],[Fiction],An absorbing narrative of Winnie Louie's life.,1991
6,0425176428,What If?: The World's Foremost Military Histor...,Robert Cowley,2000,Berkley Publishing Group,http://images.amazon.com/images/P/0425176428.0...,http://images.amazon.com/images/P/0425176428.0...,http://images.amazon.com/images/P/0425176428.0...,What If?,[Robert Cowley],[History],With its in-depth reflections on the monumenta...,2000-09-01
7,0671870432,PLEADING GUILTY,Scott Turow,1993,Audioworks,http://images.amazon.com/images/P/0671870432.0...,http://images.amazon.com/images/P/0671870432.0...,http://images.amazon.com/images/P/0671870432.0...,,,,,
8,0679425608,Under the Black Flag: The Romance and the Real...,David Cordingly,1996,Random House,http://images.amazon.com/images/P/0679425608.0...,http://images.amazon.com/images/P/0679425608.0...,http://images.amazon.com/images/P/0679425608.0...,Under the Black Flag,[David Cordingly],[Fiction],"For this rousing, revisionist history, the for...",1995
9,074322678X,Where You'll Find Me: And Other Stories,Ann Beattie,2002,Scribner,http://images.amazon.com/images/P/074322678X.0...,http://images.amazon.com/images/P/074322678X.0...,http://images.amazon.com/images/P/074322678X.0...,Where You'll Find Me,[Ann Beattie],[Fiction],"Now back in print, Ann Beattie's finest short ...",2002
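
The API returns authors and categories as lists, and `publishedDate` as either a year or a full date, so these fields need flattening before they can feed a TF-IDF step or a feature table. The sketch below shows one possible clean-up on a hypothetical `enriched_demo` frame; the helper `join_list` and the new column names are illustrative, not part of the notebook.

```python
import pandas as pd

# Hypothetical API results: list-valued fields, mixed date formats, one miss.
enriched_demo = pd.DataFrame({
    "authors": [["Mark P. O. Morford", "Robert J. Lenardon"], ["Amy Tan"], None],
    "categories": [["Social Science"], ["Fiction"], None],
    "publishedDate": ["2003", "2000-09-01", None],
})

def join_list(value, sep=", "):
    """Join a list of strings; return an empty string for missing values."""
    return sep.join(value) if isinstance(value, list) else ""

enriched_demo["authors_text"] = enriched_demo["authors"].apply(join_list)
enriched_demo["categories_text"] = enriched_demo["categories"].apply(join_list)

# Keep the leading 4-digit year; rows without a date stay missing (<NA>).
year = enriched_demo["publishedDate"].str.extract(r"^(\d{4})", expand=False)
enriched_demo["published_year"] = pd.to_numeric(year, errors="coerce").astype("Int64")

print(enriched_demo[["authors_text", "categories_text", "published_year"]])
```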
