This repository contains the code and explanations for our Personalization Theory class project, in which we built book recommender systems using a dataset of the 10,000 most popular books on Goodreads. The final web application can be viewed at what-should-i-read-next.herokuapp.com
The file structure is as follows:
ratings.csv: contains the ratings data from Goodreads used for this project
ratings_us.csv: same as ratings.csv, with additional ratings for the four of us
requirements.txt: packages needed to run code in this repository
part1/: folder containing all work for part 1 of the project
    all_results.csv: runtime, accuracy, and coverage for each model
    baseline_algorithm.ipynb: implementation and results of baseline models
    data_exploration_goodreads.ipynb: preliminary data exploration and plots
    matrix-fact-manual-regularized.ipynb: implementation and results of Matrix Factorization
    neighborhood-models.ipynb: implementation and results of neighborhood-based models
    plot-model-comparisons.ipynb: plots comparing runtime, accuracy, and coverage of models
    report.pdf: writeup of part 1
    surprise_SVD_NMF.ipynb: implementation and results of SVD and NMF via SurPRISE
part2/: folder containing all work for part 2 of the project
    FastFM_Convergence.ipynb: implementation using the FastFM library, for comparison
    Grid_Search_LibFM.ipynb: implementation using the LibFM library, with grid search for parameter tuning
    LibFM_Accuracy_Results.ipynb: analysis of LibFM grid search results
    book_tags.csv: counts of user-defined tags for each book
    books.csv: book metadata
    books_meta_info.csv: book metadata with genre tag counts
    books_with_summaries.p: NLP features
    genre_recommender_final.ipynb: genre recommendation model
    genres.csv: list of genres
    get_genres.ipynb: code to match tags to genres
    make_FM_features.ipynb: code to generate model features
    report.pdf: writeup of part 2
    summaries.csv: book summaries
    summaries_nlp.ipynb: code to collect and process book summaries
    tags.csv: tag ID-label mapping
dash-app/: repository for the Heroku web application
    Procfile: process file for Heroku
    app.py: web app code
    books.csv: book metadata
    genre_diversity_recs.csv: pre-computed diversity recommendations for each user
    model_features_genres_only.npz: model features for predictions
    model_genres_only: model file for predictions
    predict.py: code to make predictions with user input
    ratings_us.csv: ratings data plus our ratings
    requirements.txt: package requirements for the web app
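
As a quick sanity check before opening the notebooks, the ratings data can be summarized with a few lines of standard-library Python. This is only a minimal sketch: the column names `user_id`, `book_id`, and `rating` are assumptions about the layout of ratings.csv, the helper `summarize_ratings` is hypothetical and not part of this repository, and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_ratings(lines):
    """Return basic counts for a ratings table given an iterable of CSV lines.

    Assumes a header row with `user_id`, `book_id`, and `rating` columns;
    these column names are an assumption, not confirmed by this README.
    """
    users, books = set(), set()
    rating_counts = Counter()
    for row in csv.DictReader(lines):
        users.add(row["user_id"])
        books.add(row["book_id"])
        rating_counts[int(row["rating"])] += 1
    n_ratings = sum(rating_counts.values())
    # Density: fraction of the user-by-book matrix that holds a rating.
    density = n_ratings / (len(users) * len(books)) if n_ratings else 0.0
    return {
        "n_users": len(users),
        "n_books": len(books),
        "n_ratings": n_ratings,
        "density": density,
        "rating_counts": dict(rating_counts),
    }


# Tiny made-up sample standing in for ratings.csv (not real project data).
sample = io.StringIO("user_id,book_id,rating\n1,10,5\n1,11,3\n2,10,4\n")
summary = summarize_ratings(sample)
print(summary)

# To run on the real file from the repository root (column names permitting):
# with open("ratings.csv", newline="") as f:
#     print(summarize_ratings(f))
```

The same function should work on ratings_us.csv if it shares the layout of ratings.csv, which the description above suggests but does not guarantee.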