GitHub - kkogan/reddit_horror_nlp: NLP + topic modeling of Reddit horror stories

Design Draft

Question/need:

What can we uncover about the nature of the horror stories that we tell each other, and how has this changed over time?

Description of my sample data:

I used the pushshift.io Reddit archive to retrieve every r/NoSleep post between 2010 and November 2019, roughly 250K posts.

Techniques applied

This is an exploration of NLP and unsupervised learning techniques:

the scikit-learn, spaCy, nltk, and gensim libraries for tasks such as bigram phrase detection, lemmatization, stopword removal, and TF-IDF vectorization
latent semantic analysis/dimensionality reduction using singular value decomposition (SVD), non-negative matrix factorization (NMF), and latent Dirichlet allocation (LDA). NMF worked best for me.
a bit of practice with Flask and Tableau

Future Work

Improve the topic UI built on the Flask boilerplate, and deploy it to the cloud instead of only running a local web server.
