HP-comments

Final project for Digital Text Analysis

Corpus

HP_clean contains the cleaned text of the top 100 Goodreads comments for each installment of the Harry Potter series.

Code

comment-scrape uses the Goodreads API and the goodreads Python package to scrape the comments for a book identified by its Goodreads ID.

top-books-description-scrape uses the Goodreads API and the goodreads Python package to scrape the descriptions of the top 100 Goodreads books for every year in a given range.

clean-HP-comments is based on code by Fernando Nascimento and was used to process and clean the Harry Potter comment text files scraped from Goodreads.
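
As a rough illustration of what this kind of cleaning step can look like, here is a minimal sketch. It is not the code in clean-HP-comments: the function name `clean_comment`, the tiny `STOP_WORDS` set, and the specific rules (strip HTML tags, URLs, and punctuation, then drop stop words) are all assumptions made for the example.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "this", "i"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_comment(text):
    """Strip HTML tags, URLs, and punctuation, lowercase, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = text.lower().translate(PUNCT_TO_SPACE)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


cleaned = clean_comment("I <b>loved</b> this book! See https://example.com/review")
print(cleaned)
```

Applying a function like this to each scraped comment before writing it to HP_clean would give the later steps plain, whitespace-separated tokens to work with.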

topic-modeling-gensim is based on code by Fernando Nascimento and creates topic models from the cleaned text files.

clustering-HP-comments is a Jupyter notebook based on code by Fernando Nascimento that clusters the comments for each Harry Potter book.
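
The README does not say which clustering method the notebook uses. One common approach, shown here purely as an assumed example, is to vectorize the comments with TF-IDF and group them with k-means using scikit-learn. The `comments` list and the choice of two clusters are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical cleaned comments; the real input would come from HP_clean.
comments = [
    "magic school wizard friends",
    "wizard magic wand spell",
    "movie adaptation film cast",
    "film movie scene cast",
]

# Turn each comment into a TF-IDF weighted term vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(comments)

# Assign each comment to one of two clusters; random_state fixes the initialization.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

for label, comment in zip(labels, comments):
    print(label, comment)
```

The cluster ids themselves are arbitrary; what matters is which comments end up grouped together.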

jchill-git/HP-comments

Folders and files

Latest commit

History

Repository files navigation

HP-comments

Corpus

Code

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages