
API Specs

Matt Kiser edited this page Jun 2, 2017 · 5 revisions

Lightweight product requirements for the WTFJHT API.

Purpose.

WTFJHT isn't a destination. It's an antidote to an impossible news environment, with the goal of reaching users in the right place, at the right time, in the right format. To accomplish that goal, it needs to flow like water downhill to all the end nodes: FB, Twitter, WhatsApp, Instagram, chatbots, IFTTT recipes, SMS, Slack channels, mobile apps, etc.

Goals.

  • A static API, updated with each site build, containing every day, each blurb, the extracted source links, a static copy of each source, related content, blockquotes, and tags as a well-structured, intuitive, and clean JSON blob
  • Leverage DiffBot or Mercury to add clean, static copies of the cited content and topic tags to the API
  • As simple as fucking possible

Current status.

There are two APIs:

  1. The first is the F18 Jekyll API plugin, which can be found here: https://whatthefuckjusthappenedtoday.com/api/v1/pages.json
  2. The second is by @milo-wata, who added a "today" JSON feed here: https://whatthefuckjusthappenedtoday.com/api/today.json

The first is pretty rough. It strips out all the HTML, but doesn't escape all the characters correctly, making it pretty useless for anything but text mining.

The second is limited to only the latest post. It's a good start.
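
The escaping problem in the first API is the kind of thing that usually goes away when a JSON serializer does the escaping instead of hand-built strings. Here is a minimal sketch in Python, not the Jekyll plugin's actual code; `strip_html` and `to_json_record` are hypothetical helper names, and the `title`/`text` fields are assumptions:

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags; entities like &amp; are decoded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Hypothetical helper: return the text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def to_json_record(title, html):
    """Serialize one record; json.dumps handles quotes, backslashes,
    and control characters, so the result is always valid JSON."""
    return json.dumps({"title": title, "text": strip_html(html)})
```

Round-tripping the result through `json.loads` should give back the original text, quotes and ampersands included.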

User stories.

  1. As a developer, I want to be able to pull a list of every post found in _posts and return the Title, Date, Description, URL, images, and whatever else is found in the YAML, so I can iterate through every post.
  2. As a developer, I want to then be able to get all the content for a specific day so I can reproduce the post in its original form (meaning, give me the full HTML for the post).
  3. As a developer, I want to be able to get a specific, numbered blurb for a given day so I can reproduce this content elsewhere (in, say, a chatbot response).
  4. As a developer, I want to be able to get a list of all the URLs contained in the citation (the parenthetical list of third-party sources).
  5. As a developer, I want to be able to get a list of any blockquotes that are part of a blurb. Blockquotes are used as "updates" to an item – a way of signaling that this news blurb is evolving.
  6. As a developer, I want to be able to get a list of any related content that's part of a blurb. These are the blue unordered lists. They are typically just a string followed by a cited URL. Other times they're an embedded tweet.
  7. As a developer, I want to be able to access a static copy of the underlying, cited source URL. I have both a DiffBot and a Mercury account, which provide nice, structured JSON as a response. Adding this would be huge for being able to traverse from any day, to a blurb, down to the source in one go. The upside of DiffBot is that it provides topic tags, which can and will power future product extensions, like Topic following. This is a nice-to-have feature that can come later.
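
To make stories 1 through 4 concrete, here is a rough Python sketch of what a client could do with the static JSON once it is loaded. The sample data and every field name (`posts`, `blurbs`, `sources`, and so on) are assumptions for illustration, not a published schema:

```python
# Hypothetical sample payload; all field names are assumptions.
API = {
    "posts": [
        {
            "title": "Day 1",
            "date": "2017-01-20",
            "url": "/2017/01/20/day-1/",
            "html": "<p>First blurb.</p>",
            "blurbs": [
                {
                    "number": 1,
                    "text": "First blurb.",
                    "sources": [{"url": "https://example.com/a"}],
                    "blockquotes": ["An update."],
                    "related": [],
                }
            ],
        }
    ]
}


def list_posts(api):
    """Story 1: metadata for every post, without the heavy content fields."""
    return [
        {k: v for k, v in post.items() if k not in ("html", "blurbs")}
        for post in api["posts"]
    ]


def get_post(api, date):
    """Story 2: the full post for one day; raises KeyError if absent."""
    for post in api["posts"]:
        if post["date"] == date:
            return post
    raise KeyError(date)


def get_blurb(api, date, number):
    """Story 3: one numbered blurb from a given day."""
    for blurb in get_post(api, date)["blurbs"]:
        if blurb["number"] == number:
            return blurb
    raise KeyError((date, number))


def source_urls(blurb):
    """Story 4: the cited third-party URLs for a blurb."""
    return [source["url"] for source in blurb["sources"]]
```

In this shape, stories 5 and 6 are plain lookups: `blurb["blockquotes"]` and `blurb["related"]`.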

Example.

Here's a super rough example structure for how this might look:

API
+All Daily Posts
++Post 1
+++Metadata: {title, description, date, images, etc}
+++Blurb 1
++++Cited sources: [{URL 1, DiffBot metadata for URL 1}, {URL 2, DiffBot metadata for URL 2}]
++++Blockquotes: {1, 2, n}
++++Relateds: {1, 2, n}
++Post 2
++Post n
...
++Last Post
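
The same outline can be sketched as typed records that serialize to JSON. This is one possible shape under assumed names (`Source`, `Blurb`, `Metadata`, `Post`, `build_api`), not a settled schema; the `metadata` field on `Source` stands in for whatever DiffBot or Mercury returns:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Source:
    url: str
    metadata: Optional[dict] = None  # placeholder for DiffBot/Mercury output


@dataclass
class Blurb:
    number: int
    html: str
    sources: List[Source] = field(default_factory=list)
    blockquotes: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


@dataclass
class Metadata:
    title: str
    date: str
    description: str = ""
    images: List[str] = field(default_factory=list)


@dataclass
class Post:
    metadata: Metadata
    blurbs: List[Blurb] = field(default_factory=list)


def build_api(posts):
    """Serialize all posts into a single JSON document."""
    return json.dumps({"posts": [asdict(p) for p in posts]}, indent=2)
```

Running `build_api` once per site build and writing the result to a static file would match the "static API" goal.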

Gotchas.

  • Embedded tweets. These fuckers will probably render poorly or break things.
  • Multi-paragraph blurbs. Shit like this will likely mess up attempts to simply scrape based on paragraphs.
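
One way around the multi-paragraph gotcha is to group paragraphs under the most recent blurb marker instead of treating each paragraph as a blurb. The sketch below assumes each blurb opens with a marker like `1/ ` at the start of a paragraph; that marker format is an assumption here, and `group_blurbs` is a hypothetical name:

```python
import re

# Assumed blurb marker: a number followed by "/" and whitespace, e.g. "1/ ".
BLURB_START = re.compile(r"^(\d+)/\s+")


def group_blurbs(paragraphs):
    """Group plain-text paragraphs into numbered blurbs.

    A paragraph that starts with a marker opens a new blurb; any
    following unmarked paragraphs are attached to that blurb.
    Paragraphs before the first marker (intro text) are skipped.
    """
    blurbs = []
    for para in paragraphs:
        match = BLURB_START.match(para)
        if match:
            blurbs.append(
                {"number": int(match.group(1)), "paragraphs": [para[match.end():]]}
            )
        elif blurbs:
            blurbs[-1]["paragraphs"].append(para)
    return blurbs
```

Embedded tweets would still need their own handling before this step, since they arrive as markup rather than plain paragraphs.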

Open ideas.

  • Using Jekyll Collections as a way of keeping blurbs structured and well-formed. The downside is that this does not fit my current publishing workflow and would likely cause other issues down the road when I bring other folks on, though I don't actually know what those might be...