jchill-git/HP-comments

HP-comments

Final project for Digital Text Analysis

Corpus

HP-Clean contains the cleaned text of the top 100 Goodreads comments for each installment of the Harry Potter series.

Code

comment-scrape uses the Goodreads API and the goodreads Python package to scrape the comments for a book identified by its Goodreads ID.

top-books-description-scrape uses the Goodreads API and the goodreads Python package to scrape the descriptions of the top 100 Goodreads books for every year in a given range.

clean-HP-comments is based on code by Fernando Nascimento and was used to process and clean the Harry Potter comment text files scraped from Goodreads.
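
For readers unfamiliar with this kind of cleaning step, here is a minimal sketch of what cleaning a scraped comment might look like. It is not the code in clean-HP-comments: the function name `clean_comment`, the short `STOP_WORDS` set, and the specific steps (tag stripping, lowercasing, punctuation and digit removal) are assumptions chosen for illustration.

```python
import re
import string

# Hypothetical stop-word list for illustration only; the project's
# actual list is not described in this README.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "this", "i"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_comment(text):
    """Return a list of lowercase tokens with markup, punctuation,
    pure-digit tokens, and stop words removed."""
    text = re.sub(r"<[^>]+>", " ", text)      # drop leftover HTML tags
    text = text.lower()                       # normalize case
    text = text.translate(_PUNCT_TO_SPACE)    # punctuation becomes whitespace
    tokens = text.split()                     # split on any whitespace
    return [t for t in tokens if t not in STOP_WORDS and not t.isdigit()]


print(clean_comment("<b>I LOVED</b> this book, 10/10!"))
```

A real pipeline would likely use a fuller stop-word list and might add stemming or lemmatization, depending on what the downstream models need.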

topic-modeling-gensim is based on code by Fernando Nascimento and creates topic models from the cleaned text files.
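
Topic models of this kind usually work on a bag-of-words representation, in which each document becomes a list of (token id, count) pairs. The standard-library sketch below illustrates that representation only. It does not call gensim and is not the project's code; the helper names `build_vocabulary` and `to_bag_of_words` and the sample tokens are hypothetical.

```python
from collections import Counter


def build_vocabulary(docs):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bag_of_words(doc, vocab):
    """Convert a token list into sorted (token_id, count) pairs,
    ignoring tokens that are not in the vocabulary."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


# Made-up tokenized comments for illustration.
docs = [["harry", "magic", "harry"], ["magic", "school"]]
vocab = build_vocabulary(docs)
corpus = [to_bag_of_words(doc, vocab) for doc in docs]
print(corpus)
```

In gensim, `corpora.Dictionary` and its `doc2bow` method play the roles of these two helpers, and the resulting corpus is what a model such as `LdaModel` is trained on.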

clustering-HP-comments is a Jupyter notebook based on code by Fernando Nascimento that clusters the comments for each Harry Potter book.
