Content-based filtering using TF-IDF over title, author, and publisher.
Optional collaborative filtering (CF) using ALS from the implicit library over ratings aligned to books by mapping Name→Id.
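The content-based half can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's recommender.py: the four-row catalogue is toy data, and the helper name similar_books is hypothetical. Because TfidfVectorizer L2-normalizes each row by default, a plain dot product (linear_kernel) equals cosine similarity.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy catalogue (hypothetical rows) using the BX-Books.csv column names.
books = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Name": ["Dune", "Dune Messiah", "Learning HTML", "HTML and CSS"],
    "Authors": ["Frank Herbert", "Frank Herbert", "Example Author", "Example Author"],
    "Publisher": ["Ace", "Ace", "Example Press", "Example Press"],
})

# One text document per book: Name + Authors + Publisher.
docs = (books["Name"].fillna("") + " " + books["Authors"].fillna("") + " "
        + books["Publisher"].fillna(""))

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)  # sparse matrix, rows L2-normalized


def similar_books(row: int, top_k: int = 2) -> list[int]:
    """Return the Ids of the top_k books most similar to the book at `row`."""
    # On L2-normalized rows the dot product is the cosine similarity.
    scores = linear_kernel(tfidf[row], tfidf).ravel()
    scores[row] = -1.0  # never recommend the query book itself
    best = scores.argsort()[::-1][:top_k]
    return books["Id"].iloc[best].tolist()


print(similar_books(0))
```

Precomputing the full n × n similarity matrix is an alternative, but scoring one query row at a time keeps memory linear in the catalogue size.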
● BX-Books.csv: Id, Name, Authors, Publisher, ISBN, … (comma-delimited).
● BX-Book-Ratings.csv (optional): ID, Name, Rating, where Rating holds text labels such as “it was amazing”, “really liked it”, “liked it”, “it was ok”, and “did not like it”.
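The text labels must become numbers before they can fill a user–item matrix. A minimal pandas sketch follows; the 5-to-1 scale in RATING_STRENGTH is an assumption for illustration (the project's loader may use different values), and the sample rows are hypothetical.

```python
import pandas as pd

# Assumed label -> strength scale (hypothetical values; 5 = strongest signal).
RATING_STRENGTH = {
    "it was amazing": 5.0,
    "really liked it": 4.0,
    "liked it": 3.0,
    "it was ok": 2.0,
    "did not like it": 1.0,
}


def to_strength(labels: pd.Series) -> pd.Series:
    """Map text labels to numeric strengths; unknown labels become NaN."""
    # Normalize case and surrounding whitespace before the dictionary lookup.
    return labels.astype(str).str.strip().str.lower().map(RATING_STRENGTH)


# Hypothetical rows shaped like BX-Book-Ratings.csv.
ratings = pd.DataFrame({
    "ID": [1, 1, 2],
    "Name": ["Dune", "Dune Messiah", "Dune"],
    "Rating": ["it was amazing", " Liked it", "no rating"],
})
ratings["strength"] = to_strength(ratings["Rating"])
ratings = ratings.dropna(subset=["strength"])  # drop labels outside the scale
print(ratings)
```

With this scale even “did not like it” counts as a weak positive interaction for implicit ALS; dropping such rows instead is an equally reasonable design choice.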
requirements.txt:

    pandas
    numpy
    scipy
    scikit-learn
    implicit

Install:

    pip install -r requirements.txt

Project layout:

    recommender/
    ├── data/
    │   ├── BX-Books.csv
    │   └── BX-Book-Ratings.csv
    ├── recommender.py
    ├── main.py
    ├── requirements.txt
    └── README.md

How it works:
- Content-based: build a TF-IDF matrix from Name + Authors + Publisher, compute cosine similarity, and return the nearest books for a given query; title lookup uses substring matching with a light fuzzy fallback.
- Collaborative: convert text ratings to numeric strengths, align them with books by mapping each rating's Name to a book Id via normalized titles, build a sparse user–item matrix, and train implicit ALS when ratings and books overlap.
- Hybrid: blend the normalized scores as hybrid = α · content + (1 − α) · CF, where α ∈ [0, 1] weights the content-based score.
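The blending step can be sketched with NumPy as below. Min-max scaling is an assumption (the text only says scores are normalized), the default alpha=0.7 is hypothetical, and both arrays must be aligned to the same item order.

```python
import numpy as np


def minmax(scores: np.ndarray) -> np.ndarray:
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores, dtype=float)
    return (scores - lo) / (hi - lo)


def hybrid_scores(content: np.ndarray, cf: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """hybrid = alpha * content + (1 - alpha) * CF, after putting both on one scale."""
    return alpha * minmax(content) + (1.0 - alpha) * minmax(cf)


# Scores for the same three candidate books from each model.
content = np.array([0.9, 0.1, 0.5])  # cosine similarities
cf = np.array([2.0, 4.0, 0.0])       # ALS dot-product scores
print(hybrid_scores(content, cf, alpha=0.5))
```

Normalizing first matters because cosine similarities lie in [0, 1] while ALS scores are unbounded dot products; without it, α would not mean what the formula suggests.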
Run:

    python main.py

Type a title fragment (e.g., “Harry”, “Programming”, “HTML”) to see content-based results; CF results appear only if enough ratings map to valid book Ids for user 'u_demo'.
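Resolving a typed fragment to catalogue titles, with a substring match first and a light fuzzy fallback, might look like the sketch below. Using the standard library's difflib for the fuzzy step is an assumption, and find_title and its cutoff are hypothetical names and values.

```python
import difflib


def find_title(query: str, titles: list[str], cutoff: float = 0.6) -> list[str]:
    """Return matching titles: substring hits first, else close fuzzy matches."""
    q = query.strip().lower()
    # 1) Case-insensitive substring match, e.g. "harry" -> "Harry Potter ...".
    hits = [t for t in titles if q in t.lower()]
    if hits:
        return hits
    # 2) Fuzzy fallback for typos via difflib's similarity ratio.
    #    (Titles differing only by case collapse to one key here.)
    lowered = {t.lower(): t for t in titles}
    close = difflib.get_close_matches(q, list(lowered), n=3, cutoff=cutoff)
    return [lowered[c] for c in close]


titles = ["Harry Potter and the Sorcerer's Stone", "Dune", "Programming Pearls", "HTML and CSS"]
print(find_title("Harry", titles))
print(find_title("Dnue", titles))  # typo, resolved by the fuzzy fallback
```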

    # Programmatic use of the project's BookRecommender class
    from recommender import BookRecommender

    rec = BookRecommender(
        books_csv="data/BX-Books.csv",
        books_sep=",",
        encoding="utf-8",
        ratings_csv="data/BX-Book-Ratings.csv",  # set to None to skip CF
        ratings_sep=",",
        id_col_books="Id",
        title_col="Name",
        author_col="Authors",
        publisher_col="Publisher",
        id_col_ratings_item="ID",
        id_col_ratings_user="user_id",
        rating_col="Rating",
    )
    rec.fit()

    cb = rec.recommend_like_title("Dune", top_k=10)  # content-based
    cf = rec.recommend_for_user("u_demo", top_k=10)  # collaborative (if trained)

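What ALS does with the user–item matrix can be shown with a small dense NumPy re-implementation of implicit-feedback ALS (the Hu, Koren and Volinsky formulation: confidence C = 1 + α·R, binary preference P = [R > 0]). This is not the implicit library's code, which uses sparse, optimized solvers; implicit_als, recommend, and all hyperparameter values are hypothetical, and a real user Id such as 'u_demo' would first need mapping to a row index.

```python
import numpy as np
from scipy.sparse import csr_matrix


def implicit_als(R: csr_matrix, factors: int = 8, reg: float = 0.1,
                 alpha: float = 10.0, iterations: int = 15, seed: int = 0):
    """Dense, small-scale implicit-feedback ALS; returns user and item factors."""
    rng = np.random.default_rng(seed)
    dense = R.toarray()
    C = 1.0 + alpha * dense            # confidence: 1 if unrated, larger if rated
    P = (dense > 0).astype(float)      # preference: did the user interact at all?
    I = reg * np.eye(factors)
    Y = rng.normal(scale=0.1, size=(R.shape[1], factors))  # initial item factors

    def solve(F, conf, pref):
        # For each row r: (F^T C_r F + reg*I) x_r = F^T C_r p_r
        out = np.empty((conf.shape[0], factors))
        for r in range(conf.shape[0]):
            Fc = F * conf[r][:, None]  # rows of F scaled by confidence
            out[r] = np.linalg.solve(F.T @ Fc + I, Fc.T @ pref[r])
        return out

    for _ in range(iterations):
        X = solve(Y, C, P)             # fix items, update users
        Y = solve(X, C.T, P.T)         # fix users, update items
    return X, Y


def recommend(X, Y, R: csr_matrix, user: int, top_k: int = 3) -> list[int]:
    """Top-k item indices for `user`, excluding items already rated."""
    scores = X[user] @ Y.T
    scores[R[user].indices] = -np.inf  # hide already-rated items
    return np.argsort(-scores)[:top_k].tolist()


# Toy strengths: rows are users, columns are book indices.
R = csr_matrix(np.array([
    [5, 4, 0, 0],
    [4, 0, 5, 0],
    [0, 0, 4, 5],
], dtype=float))
X, Y = implicit_als(R)
print(recommend(X, Y, R, user=0, top_k=2))
```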
● If CF results are empty, make sure that enough rating rows map to books via Name→Id; the loader already normalizes titles to improve overlap.
● To run content-only, set ratings_csv=None.
● The code limits BLAS to one thread to avoid OpenBLAS oversubscription warnings on Windows.
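The Name→Id alignment from the notes above (normalize titles, map each rating's Name to a book Id, then build a sparse user–item matrix) can be sketched as follows. The exact normalization rules, the user column name, and the sample rows are assumptions, not the loader's actual behavior.

```python
import re
import unicodedata

import pandas as pd
from scipy.sparse import csr_matrix


def normalize_title(title: str) -> str:
    """Hypothetical normalization: strip accents, lowercase, drop punctuation."""
    t = unicodedata.normalize("NFKD", str(title))
    t = t.encode("ascii", "ignore").decode("ascii").lower()
    t = re.sub(r"[^a-z0-9]+", " ", t)
    return " ".join(t.split())


books = pd.DataFrame({"Id": [10, 20, 30], "Name": ["Dune", "The Hobbit", "Café Stories"]})
ratings = pd.DataFrame({
    "user": ["u_demo", "u_demo", "u2", "u2"],  # hypothetical user column
    "Name": ["DUNE ", "the hobbit!", "Cafe Stories", "Unknown Book"],
    "strength": [5.0, 4.0, 3.0, 2.0],
})

# Normalized title -> book Id (first Id wins if titles collide).
title_to_id = (books.assign(key=books["Name"].map(normalize_title))
               .drop_duplicates("key").set_index("key")["Id"])
ratings["book_id"] = ratings["Name"].map(normalize_title).map(title_to_id)
aligned = ratings.dropna(subset=["book_id"]).astype({"book_id": int})  # unmapped rows dropped

# Contiguous row/column indices for the sparse user x item matrix.
user_codes, user_index = pd.factorize(aligned["user"])
item_codes, item_index = pd.factorize(aligned["book_id"])
matrix = csr_matrix((aligned["strength"].to_numpy(), (user_codes, item_codes)),
                    shape=(len(user_index), len(item_index)))
print(matrix.toarray())
```

The user_index and item_index arrays are what translate a user Id such as 'u_demo' into a matrix row, and a matrix column back into a book Id.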