This project was created for educational purposes, to gain an understanding of how recommendation systems work. To practice and compare different approaches, several models were built to provide personalized book recommendations.
First, the Book-Crossing and Goodreads datasets were analyzed, which showed that they contained a lot of missing and incorrect information. Therefore, only the rating information was kept, and the corresponding book and author data were downloaded from Penguin Random House. The downloaded raw data were then parsed and preprocessed for use in building the recommenders.
The work started with collaborative filtering approaches. The following types of collaborative algorithms were implemented and described in detail:
- memory-based item-item and user-user algorithms
- matrix factorization approach based on singular value decomposition
- matrix factorization approach using gradient descent with or without user and item biases.
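To illustrate the last item, here is a minimal sketch of matrix factorization trained with stochastic gradient descent, with optional user and item biases. It is not the project's implementation: the function names (`train_mf`, `predict`, `rmse`), the default hyperparameters, and the toy ratings are all assumptions made for this example.

```python
import numpy as np


def train_mf(ratings, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
             n_epochs=200, use_biases=True, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i on (user, item, rating) triples.

    With use_biases=False, mu, b_u and b_i stay at zero, so the model
    reduces to the plain dot product p_u . q_i.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(0.0, 0.1, (n_users, n_factors))  # user factors
    Q = rng.normal(0.0, 0.1, (n_items, n_factors))  # item factors
    bu = np.zeros(n_users)                          # user biases
    bi = np.zeros(n_items)                          # item biases
    mu = float(np.mean([r for _, _, r in ratings])) if use_biases else 0.0

    for _ in range(n_epochs):
        # Visit the observed ratings in a fresh random order every epoch.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            if use_biases:
                bu[u] += lr * (err - reg * bu[u])
                bi[i] += lr * (err - reg * bi[i])
            pu_old = P[u].copy()  # keep the old p_u for the q_i update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu_old - reg * Q[i])
    return mu, bu, bi, P, Q


def predict(model, u, i):
    """Predicted rating of item i by user u."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + P[u] @ Q[i]


def rmse(model, ratings):
    """Root mean squared error of the model on the given triples."""
    errors = [r - predict(model, u, i) for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))


# Toy data (made up): 4 users, 3 books, ratings on a 1-5 scale.
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
               (2, 1, 2), (2, 2, 5), (3, 0, 5), (3, 2, 2)]
model = train_mf(toy_ratings, n_users=4, n_items=3)
print("train RMSE:", rmse(model, toy_ratings))
print("user 0, book 2:", predict(model, 0, 2))
```

The learning rate, regularization strength, number of factors, and number of epochs in this sketch are exactly the kind of hyperparameters that need tuning.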
These approaches have many hyperparameters that need to be tuned, so Optuna was used for hyperparameter optimization and neptune.ai for monitoring. The results of tuning can be found in this dashboard.
Next, a brief explanation of content-based algorithms and a comparison with collaborative filtering are provided; implementing the content-based algorithms is planned for the future.