slazien/hackernews_nlp

hackernews_nlp

NLP on Hacker News data (work in progress)

This project aims to perform large-scale analysis of the text and graph data formed by posts and comments on Hacker News (HN).

Current status:

  1. All posts and comments from the beginning of HN until July 2021 are included in a PostgreSQL database
  2. Some basic data exploration code is included in a Jupyter notebook

To do:

  1. Spectral clustering of users based on their post/comment content
  2. HN-style natural language generation
  3. Automatic post tagging based on Latent Dirichlet Allocation
  4. Entity extraction
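
As a rough illustration of the planned LDA-based tagging, the sketch below fits a tiny Latent Dirichlet Allocation model with collapsed Gibbs sampling in pure Python, then tags each post with the top words of its dominant topic. None of this is the project's code: the function names (`lda_gibbs`, `tag_posts`), the hyperparameters, and the toy posts are assumptions made for the example, and a real run over the full dataset would more likely rely on an established topic-modeling library.

```python
import random


def lda_gibbs(docs, n_topics=2, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def tag_posts(docs, n_tags=2, **kwargs):
    """Tag each post with the top words of its dominant topic."""
    vocab, doc_topic, topic_word = lda_gibbs(docs, **kwargs)
    top_words = [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n_tags]]
        for row in topic_word
    ]
    return [
        top_words[max(range(len(counts)), key=counts.__getitem__)]
        for counts in doc_topic
    ]


# Toy posts invented for illustration (not taken from the HN dataset).
posts = [
    "rust compiler borrow checker rust memory".split(),
    "startup funding investors startup valuation".split(),
    "compiler memory rust performance".split(),
    "investors funding valuation startup founders".split(),
]
tags = tag_posts(posts)
for post, tag in zip(posts, tags):
    print(tag, "<-", " ".join(post))
```

On a corpus this small the topics are noisy and depend on the seed, so the printed tags only demonstrate the mechanics. The tokenization here is a plain whitespace split; stop-word removal and stemming are left out for brevity.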
