Final project for Digital Text Analysis
HP-Clean contains the cleaned text of the top 100 Goodreads comments for each installment of the Harry Potter series.
comment-scrape uses the Goodreads API and the goodreads Python package to scrape the comments for a book identified by its Goodreads ID.
top-books-description-scrape uses the Goodreads API and the goodreads Python package to scrape the descriptions of the top 100 Goodreads books for every year in a given range.
clean-HP-comments is based on code by Fernando Nascimento and was used to process and clean the Harry Potter comment text files scraped from Goodreads.
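The exact cleaning steps in clean-HP-comments are not described here. As a rough illustration only, a minimal cleaning function might look like the sketch below. The function name `clean_comment`, the small stopword set, and the specific steps (stripping HTML tags, decoding HTML entities, lowercasing, tokenizing, dropping stopwords) are assumptions for illustration, not the contents of clean-HP-comments.

```python
import html
import re

# Hypothetical, deliberately tiny stopword set for illustration only;
# a real pipeline would likely use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "this", "in"}


def clean_comment(raw: str) -> list[str]:
    """Return lowercase word tokens from one raw scraped comment (sketch)."""
    text = re.sub(r"<[^>]+>", " ", raw)   # drop HTML tags such as <br />
    text = html.unescape(text)            # decode entities such as &amp;
    text = text.lower()
    # Keep runs of letters, allowing one internal apostrophe (e.g. "harry's").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [t for t in tokens if t not in STOPWORDS]


print(clean_comment("This is <b>the</b> best book &amp; Harry's finest!<br />"))
```

Removing tags before decoding entities avoids turning escaped text like `&lt;3` into something the tag pattern would then swallow.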
topic-modeling-gensim is based on code by Fernando Nascimento and creates topic models from the cleaned text files.
clustering-HP-comments is a Jupyter notebook based on code by Fernando Nascimento that clusters the comments for each Harry Potter book.