Extract just the content from a web page.

Extract is a wrapper that turns the Mercury Parser into a web service.


Mercury already offers an API component, meant to be deployed to AWS Lambda. Extract exists as an alternative for a few reasons:

  1. Deploy elsewhere. Extract is a vanilla Node.js app that is meant to run in a VM and has no platform-specific dependencies.

  2. Built-in authorization system.

  3. Performance. In my experience, running it on a VM has been faster than the Lambda version.

Here's a graph where you can see a decrease in average response time around the Feb. 17 mark. This is when Feedbin switched from the Lambda-hosted version to Extract running on a VPS.

[Graph: Response Time]


  1. Install Node.js and npm.

  2. Clone extract.

    git clone

  3. Install the dependencies.

    cd extract
    npm install

  4. Run the server.

    node app/server.js

    Alternatively, Extract includes an ecosystem.config.js file to use with pm2. You could use this in production.

    npm install --global pm2
    pm2 start ecosystem.config.js


Extract has a simple, file-based system for creating users and secret keys, which allows users to be added or removed while the system is running. In the ./users directory, each filename is a username and the file's contents are that user's secret key. To make a new user, run the following:

cd extract
mkdir users

# use your own secret key and username
echo "SECRET_KEY" > users/USERNAME
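
The same step can be scripted. Below is a minimal Ruby sketch, assuming you want a randomly generated key. The `create_user` helper, its `users_dir` parameter, and the key length are illustrative choices, not part of Extract.

```ruby
require "securerandom"
require "fileutils"

# Hypothetical helper (not part of Extract): writes a random secret key
# to <users_dir>/<username> and returns the key.
# The key length (32 random bytes, hex encoded) is an assumption.
def create_user(username, users_dir: "users")
  FileUtils.mkdir_p(users_dir)
  secret = SecureRandom.hex(32)
  File.write(File.join(users_dir, username), secret)
  secret
end
```

Calling `create_user("USERNAME")` from the extract directory would create `users/USERNAME` and return the key to hand to that user.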

Once a username and secret key have been created, you can make a request.

An example request looks like:


The parts that you need are:

  • username: your username
  • signature: the hexadecimal HMAC-SHA1 signature of the URL you want to parse, keyed with your secret key
  • base64_url: the Base64-encoded version of the URL you want to parse

The URL is base64-encoded to avoid any issues in the way different systems encode URLs. It must use the RFC 4648 url-safe variant with no newlines.
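
To illustrate how the three parts fit together, here is a rough Ruby sketch of how a request could be checked against the users directory. It is not Extract's actual server code (Extract is a Node.js app). The `valid_request?` helper, the whitespace stripping of the key file, and the constant-time comparison are all assumptions made for the example.

```ruby
require "openssl"
require "base64"

# Hypothetical check, not Extract's real implementation: look up the
# user's secret key and compare HMAC-SHA1 signatures of the decoded URL.
def valid_request?(username, signature, base64_url, users_dir: "users")
  # File.basename keeps the lookup inside users_dir.
  key_file = File.join(users_dir, File.basename(username))
  return false unless File.file?(key_file)

  # Assumption: trailing whitespace (such as the newline echo adds) is ignored.
  secret = File.read(key_file).strip
  url = Base64.urlsafe_decode64(base64_url)
  expected = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("sha1"), secret, url)

  # Constant-time comparison; OpenSSL.secure_compare needs Ruby 2.7 or newer.
  OpenSSL.secure_compare(expected, signature)
rescue ArgumentError
  false # base64_url was not valid Base64
end
```

Because the signature covers the decoded URL, a client has to sign the original URL string, not its Base64 form.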

If your platform does not offer a URL-safe Base64 option, you can replicate it. First create the standard Base64-encoded string. Then replace the following characters:

  • + => -
  • / => _
  • \n => ""
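
The replacements above can be sketched in Ruby as follows. The `manual_urlsafe_base64` name is made up for this example. Ruby does ship `Base64.urlsafe_encode64`, so the manual version is only useful as a reference for platforms that lack an equivalent.

```ruby
require "base64"

# Hypothetical helper: builds the URL-safe form by hand from standard Base64.
def manual_urlsafe_base64(url)
  # Standard Base64; Ruby's encode64 wraps its output with "\n".
  encoded = Base64.encode64(url)
  # + => -, / => _, and remove every newline.
  encoded.tr("+/", "-_").delete("\n")
end
```

For any input, the result should match what `Base64.urlsafe_encode64` returns, which makes it easy to check a port in another language against known values.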

Here's a sample implementation in Ruby. You can use it as a reference when checking your own implementation.

require "uri"
require "openssl"
require "base64"

username = "username"
secret = "secret"
host = "localhost"
port = 3000
url = ""

digest = OpenSSL::Digest.new("sha1")
signature = OpenSSL::HMAC.hexdigest(digest, secret, url)

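base64_url = Base64.urlsafe_encode64(url).gsub("\n", "")

# Build the request URL (plain HTTP is assumed here for a local server).
request_url = URI::HTTP.build({
  host: host,
  port: port,
  path: "/parser/#{username}/#{signature}",
  query: "base64_url=#{base64_url}"
}).to_s

puts request_url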

The above example would produce:


With the output:

{
    "title": "Private by Default",
    "author": null,
    "date_published": "2018-09-11T00:00:00.000Z",
    "dek": null,
    "lead_image_url": "",
    "content": "<div>content</div>",
    "next_page_url": null,
    "url": "",
    "domain": "",
    "excerpt": "September 11, 2018 by Ben Ubois I want Feedbin to be the opposite of Big Social. I think people should have the right not to be tracked on the Internet and Feedbin can help facilitate that. Since&hellip;",
    "word_count": 787,
    "direction": "ltr",
    "total_pages": 1,
    "rendered_pages": 1
}
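
A client would typically fetch the signed URL and pick a few fields out of this JSON. The following Ruby sketch shows one way to do that. The `fetch_body` and `summarize` helpers are hypothetical, and treating any non-2xx status as a failure is an assumption, since the error responses are not described here.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: fetch a signed parser URL and return the raw body.
# Assumption: any non-2xx response is treated as a failure.
def fetch_body(request_url)
  response = Net::HTTP.get_response(URI(request_url))
  raise "Request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Hypothetical helper: keep only a few fields from the parser output.
def summarize(body)
  article = JSON.parse(body)
  {
    title: article["title"],
    word_count: article["word_count"],
    content: article["content"]
  }
end
```

With a running server, `summarize(fetch_body(request_url))` would return a small hash with the title, word count, and extracted HTML content.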

