Display of statistics from data scraped from Hacker News

Hacker News Stats

Hacker News Stats displays statistics computed from data scraped from Hacker News, Y Combinator's news site. I use the Requests, Beautiful Soup, and asyncio libraries to fetch and parse the first three pages of posts from the main Hacker News feed asynchronously. I then store the data in an Amazon RDS instance of PostgreSQL and serve it through a series of API endpoints that return statistics for a given time period. For example, /api/hacker_news/stats/hour/average_comment_count returns the average comment count for posts from the past hour, and /api/hacker_news/stats/week/top_website returns the websites that articles were most commonly posted from in the past week.

The Stats page displays several Highcharts visualizations of the scraped data: a pie chart showing the breakdown of post types, a word cloud of the most common words in post comments (excluding stop words), and a bubble chart of the five users who posted the most comments, with each bubble's width reflecting that user's total word count. Buttons at the top of the Stats page let the user switch between time periods (e.g., past hour, past day, past week).
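
The time-period statistics behind the endpoints can be illustrated with a minimal Python sketch. It is not the project's actual code: the real endpoints presumably query PostgreSQL, whereas this version works on an in-memory list of posts. The field names (url, comment_count, created), the PERIODS mapping, and the function names are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

# Hypothetical mapping from the period segment of the URL to a time window.
PERIODS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def posts_in_period(posts, period, now=None):
    """Return the posts created within the given period, ending at `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - PERIODS[period]
    return [p for p in posts if cutoff <= p["created"] <= now]


def average_comment_count(posts, period, now=None):
    """Average number of comments per post in the period (0.0 if no posts)."""
    recent = posts_in_period(posts, period, now)
    if not recent:
        return 0.0
    return sum(p["comment_count"] for p in recent) / len(recent)


def top_websites(posts, period, limit=5, now=None):
    """Most common host names among posts in the period, as (host, count) pairs."""
    recent = posts_in_period(posts, period, now)
    hosts = Counter(urlparse(p["url"]).netloc for p in recent if p.get("url"))
    return hosts.most_common(limit)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"url": "https://example.com/a", "comment_count": 10,
         "created": now - timedelta(minutes=10)},
        {"url": "https://example.com/b", "comment_count": 20,
         "created": now - timedelta(minutes=30)},
        {"url": "https://example.org/c", "comment_count": 3,
         "created": now - timedelta(days=3)},
    ]
    print(average_comment_count(sample, "hour", now=now))
    print(top_websites(sample, "week", now=now))
```

Passing now explicitly keeps the functions deterministic, which makes the windowing logic easy to check independently of the database and the scraper.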

Setup

  1. Clone this repository locally or on your server.
  2. Go to the project root directory, and install the required dependencies by running npm install.
  3. Compile the required stylesheets and scripts by running npm run build.