Periodically polls Hacker News top stories, applies Suckless filters and publishes an HTML front page to an S3 bucket.

porkbrain/suckless.hn

sucklesshn.porkbrain.com

This section is about the motivation behind the project. For the tech stack, see Design below.

Some stories on HN are frustrating and time-consuming for dubious value. I believe there are other people who would also like to see less of certain types of content, hence Suckless HN.

  • Why doesn't this instead exist as an app into which I log in as an HN user and it hides stories on my behalf?

    As a user I wouldn't log in to a 3rd-party app. As a developer I don't want to manage user credentials.

  • Can I have custom filters configurable from a UI?

    Out of scope. Create an issue or submit a PR if there's a filter you wish to use.

  • Will you change a filter I use without my knowledge?

    I am reluctant to change the logic of a filter once it's published. However, if it absolutely needs to happen, you'll be informed by a short update notice at the bottom of the page.

  • Why not ML?

    I prefer a set of transparent and readable rules to decide what I don't see. Plus that's easier.

Suckless filters

A filter is given a story's data and flags the story if it matches. Feel free to create an issue for any missing but useful filter.

Each filter has two landing pages: one with only the stories that were flagged, and one with everything else. This is decided by two modifiers: + and -. For example, to see only stories from large newspapers, visit https://sucklesshn.porkbrain.com/+bignews. To get HN without large newspapers, visit https://sucklesshn.porkbrain.com/-bignews.

There are also groups of filters. For example, https://sucklesshn.porkbrain.com/-amfg-bignews filters out large newspapers and all mentions of big tech; this also happens to be the default view on the homepage. The - modifier in a group is conjunctive, i.e. only stories which didn't pass any of the filters are shown. The + modifier is disjunctive, i.e. stories which passed any of the filters are shown. For example, sucklesshn.porkbrain.com/+askhn+showhn shows "Show HN" or "Ask HN" stories.
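The group semantics above can be sketched in a few lines of Rust. This is an illustrative model, not the project's actual code; the names `Modifier` and `is_visible` are hypothetical.

```rust
/// Whether a group page uses the `+` or `-` modifier.
#[derive(Clone, Copy)]
enum Modifier {
    Plus,  // show only flagged stories
    Minus, // hide flagged stories
}

/// `flags[i]` is true if the story was flagged by the i-th filter
/// in the group.
fn is_visible(modifier: Modifier, flags: &[bool]) -> bool {
    match modifier {
        // "+" is disjunctive: show the story if ANY filter flagged it.
        Modifier::Plus => flags.iter().any(|&f| f),
        // "-" is conjunctive: show the story only if NO filter flagged it.
        Modifier::Minus => !flags.iter().any(|&f| f),
    }
}

fn main() {
    // A story flagged by the first filter in the group but not the second:
    let flags = [true, false];
    assert!(is_visible(Modifier::Plus, &flags)); // shown on a "+" group page
    assert!(!is_visible(Modifier::Minus, &flags)); // hidden on a "-" group page
}
```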

List

List of implemented filters:

  • +askhn/-askhn flags "Ask HN" titles

  • +showhn/-showhn flags "Show HN" titles

  • +bignews/-bignews flags URLs from large news sites: Bloomberg, VICE, The Guardian, WSJ, CNBC, BBC, Forbes, Spectator, LA Times, The Hill and NY Times. More large news sites may be added later. Any general news website with ~60 submissions (2 pages) in the past year falls into this category. HN search query: https://hn.algolia.com/?dateRange=pastYear&page=2&prefix=true&sort=byPopularity&type=story&query=${DOMAIN}.

  • +amfg/-amfg flags titles which mention "Google", "Facebook", "Apple" or "Microsoft". No more endless Google-bashing comment binging at 3 AM. Most of the time the submissions are scandalous and the comment sections low-entropy but addictive.

  • special +all front page which includes all HN top stories
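The title-based filters above amount to simple string checks. Here is a minimal sketch of what the askhn and amfg predicates could look like; the function names are hypothetical and the real filters may use stricter matching (e.g. whole-word boundaries).

```rust
/// Flags submissions whose title starts with "Ask HN" (case-insensitive).
fn flags_ask_hn(title: &str) -> bool {
    title.to_lowercase().starts_with("ask hn")
}

/// Flags titles mentioning any of the big four. A plain substring check
/// is used here for brevity.
fn flags_amfg(title: &str) -> bool {
    ["google", "facebook", "apple", "microsoft"]
        .iter()
        .any(|&name| title.to_lowercase().contains(name))
}

fn main() {
    assert!(flags_ask_hn("Ask HN: How do you take notes?"));
    assert!(!flags_ask_hn("Show HN: My new project"));
    assert!(flags_amfg("Microsoft open-sources something"));
    assert!(!flags_amfg("Rust 2.0 released"));
}
```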

List of filter groups:

Filters in a group are sorted alphabetically in ascending order.

Design

The binary is executed periodically (roughly every 30 minutes). Each generated page is an S3 object, so we don't need to provision a web server.

An SQLite database stores the ids of top HN posts that have already been downloaded, plus some other data (timestamp of insertion, submission title, url, which filters it passed).

The endpoint to query top stories on HN is https://hacker-news.firebaseio.com/v0/topstories.json. We download stories which we haven't checked before. The data about a story is available via the item endpoint.
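The two endpoints come from the official Hacker News Firebase API; the top-stories endpoint returns a JSON array of item ids, and each id can then be resolved via the item endpoint. A small sketch of the URL construction (helper names are illustrative):

```rust
const API_BASE: &str = "https://hacker-news.firebaseio.com/v0";

/// JSON array of up to 500 top story ids.
fn top_stories_url() -> String {
    format!("{}/topstories.json", API_BASE)
}

/// JSON object describing a single item (story, comment, job, ...).
fn item_url(id: u64) -> String {
    format!("{}/item/{}.json", API_BASE, id)
}

fn main() {
    assert!(top_stories_url().ends_with("/topstories.json"));
    assert_eq!(
        item_url(8863),
        "https://hacker-news.firebaseio.com/v0/item/8863.json"
    );
}
```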

We check each new story against the Suckless filters before inserting it into the database table stories. The flags for each filter are persisted in the story_filters table.

The final step is generating new HTML for the sucklesshn.porkbrain.com front pages and uploading it to an S3 bucket. The bucket sits behind a CloudFront distribution to which the sucklesshn.porkbrain.com DNS zone records point. We set up different combinations of filters and upload those combinations as different S3 objects. The objects all have Content-Type: text/html, although they don't have an .html extension.
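Since each page is named after its filter combination (e.g. /-amfg-bignews), the S3 object key can be derived mechanically from the modifiers and filter names. A sketch under that assumption (the function name is hypothetical):

```rust
/// Builds an S3 object key such as "-amfg-bignews" from
/// (modifier, filter name) pairs. Because the key has no .html
/// extension, the Content-Type must be set explicitly on upload.
fn object_key(filters: &[(char, &str)]) -> String {
    filters
        .iter()
        .map(|(modifier, name)| format!("{}{}", modifier, name))
        .collect()
}

fn main() {
    assert_eq!(object_key(&[('-', "amfg"), ('-', "bignews")]), "-amfg-bignews");
    assert_eq!(object_key(&[('+', "askhn"), ('+', "showhn")]), "+askhn+showhn");
}
```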

Rate limiting

We handle rate limiting by simply skipping the submission. Since we poll for missing stories periodically, they will be fetched eventually.

We don't need to check all top stories. We can slice the top stories endpoint response and only download the first ~50 entries.

The Wayback Machine has some kind of rate limiting which fails concurrent requests, so we run Wayback Machine GET requests sequentially.

Wayback machine

We leverage Wayback Machine APIs to provide users with a link to the latest archived snapshot at the time of submission.
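One way to look up such a snapshot is the Wayback Machine availability API, which returns the archived snapshot closest to a given timestamp. A sketch of the request URL construction, assuming that endpoint is the one used (the helper name is illustrative, and the url parameter would need percent-encoding for real submissions):

```rust
/// Builds a query URL for archive.org's availability API. The response
/// is a JSON object describing the closest snapshot to `timestamp`
/// (format: YYYYMMDDhhmmss, shorter prefixes allowed).
fn availability_url(url: &str, timestamp: &str) -> String {
    format!(
        "https://archive.org/wayback/available?url={}&timestamp={}",
        url, timestamp
    )
}

fn main() {
    assert_eq!(
        availability_url("example.com", "20200101"),
        "https://archive.org/wayback/available?url=example.com&timestamp=20200101"
    );
}
```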

Please donate to keep the Wayback Machine awesome.

Build

I run the binary on my k8s homelab cluster as a cron job. Originally it ran as a cron job on my Raspberry Pi 4, which is now a node in the cluster. I still build this project for ARM. See the k8s directory for more docs about how this project runs in the cluster.

I use a build script to build and test this project. First, you'll need to install cross:

cargo install --git https://github.com/anupdhml/cross.git --branch master

We use a custom image for compilation to support OpenSSL.

Next, either use the build script or directly compile for armv7-unknown-linux-gnueabihf:

cross build --target armv7-unknown-linux-gnueabihf --release

Locally I build the Docker image with the binary and push it to Docker Hub, from where my k8s cluster pulls it.

Env

See the .env.example file for the environment variables the binary expects.
