
Converting 1.x ingester to 2.x API

Srdjan Grubor edited this page Aug 16, 2017 · 13 revisions


Using new APIs to provide metadata

Assets now have setter functions for providing certain metadata.

set_as_static_page()

(Blog, News) Marks the page as a "static" page so that it will show up on the main menu of an app. Use this for e.g. "About the author" or "FAQ" pages.

set_author(name: String)

(Blog) Pass in the name of the post author here. (The Blog article format is currently designed to handle only one author.) For the News article format, see set_authors().

set_authors(names: String or Array of String)

(News) Pass an array of names of article authors. If names is a string instead of an array, it is assumed there is only one author. For the Blog article format, see set_author().

set_date_published(date: Date or Date-constructible)

(Blog, News) Pass the date that the post or article was published. You can pass a Date object here. If it is not a Date object, then new Date() will be called on it, so you can also pass e.g. a timestamp.
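Because anything that is not a Date goes through new Date(), a malformed value would silently become an Invalid Date. The sketch below is a hypothetical helper (toPublishedDate is not part of libingester) that mirrors the documented pass-through behaviour and adds a validity check. It also shows the common pitfall that Unix timestamps are in seconds, while JavaScript timestamps are in milliseconds.

```javascript
// Hypothetical helper, not part of libingester: mirrors the documented behaviour
// of set_date_published() (Date objects pass through, anything else goes through
// new Date()) and additionally rejects values that do not parse.
function toPublishedDate(value) {
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Unparseable publication date: ${value}`);
  }
  return date;
}

// Typical inputs scraped from a feed:
const fromIso = toPublishedDate('2017-08-16T10:00:00Z'); // ISO 8601 string
const fromMillis = toPublishedDate(1502877600000);       // JavaScript timestamp (milliseconds)
// Unix timestamps count seconds, so multiply by 1000 before converting:
const fromUnixSeconds = toPublishedDate(1502877600 * 1000);

// asset.set_date_published(fromIso);
```

All three values above refer to the same instant; passing the bare seconds value instead would produce a date in January 1970.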

set_section(tag: String)

(News) Pass in the ID string of a "set" (category, tag). This is not the human-readable string (though it's possible for the two to be identical); the human-readable name of the set is created later in the app builder. (Compare set_tags() for the blog article format; news articles are usually in one section of the news site at a time. If it turns out we need more tags, we can also add a set_tags() method to the news article format.)

set_source(sourceName: String)

(News) Name of the publication, e.g. "New York Times". This shows up in a link in the top right-hand corner of rendered news articles.

set_synopsis(text: String)

(All) Sets the text that will be shown as the synopsis of a document. (The synopsis is often shown in the UI on "cards" if there is enough space.)

set_tags(tags: Array of String)

(Blog) Pass in an array of ID strings for "sets" (categories, tags). The ID string is not the human-readable string (though it's possible for the two to be identical); the human-readable name of the set is created later in the app builder. (Compare set_section() for the news article format.)
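A minimal sketch of building set IDs for a WordPress-sourced blog, following the convention mentioned in the example-ingester checklist further down this page: categories are used as-is and freeform tags get a tag: prefix so that a tag and a category with the same name stay distinct. The helper name wordpressSetIds and its input shape (two arrays of names) are assumptions for illustration.

```javascript
// Hypothetical helper: turn WordPress categories and freeform tags into set ID
// strings suitable for set_tags(). Categories keep their names; tags are
// prefixed with "tag:" so "Recipes" the category and "Recipes" the tag differ.
function wordpressSetIds(categories, tags) {
  const ids = [
    ...categories,
    ...tags.map((tag) => `tag:${tag}`),
  ];
  // Drop duplicates while keeping first-seen order.
  return [...new Set(ids)];
}

const ids = wordpressSetIds(['Recipes', 'Travel'], ['vegan', 'Recipes', 'vegan']);
console.log(ids); // [ 'Recipes', 'Travel', 'tag:vegan', 'tag:Recipes' ]

// asset.set_tags(ids);
```

The human-readable names for these IDs are still assigned later in the app builder; the IDs only need to be stable and unique.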

set_thumbnail(img: ImageAsset)

(All) Sets an image asset to be the article's thumbnail. (An image asset is returned e.g. from libingester.util.download_img() or libingester.util.download_image().)

Converting to handle the nightly feed runner

Because a nightly feed runner will invoke the ingesters, and some API changes have been made for ease of use, each ingester needs a few changes:

- Hatch default params

Pass two parameters to the Hatch constructor: the first gives the output directory a proper name when running locally, and the second is the language that Xapian will use:

 function main() {
-    const hatch = new libingester.Hatch();
+    const hatch = new libingester.Hatch('alodita', 'en');

- Using https libingester module URI in package.json

We can't use ssh in the nightly feed runner to resolve libingester as a dependency, so the dependency URI in package.json must be switched to https:

   "dependencies": {
-    "libingester": "ssh://git@github.com:endlessm/libingester.git",
+    "libingester": "https://github.com/endlessm/libingester.git",

- Adding npm start script target to package.json

The nightly feed runner invokes this target, which gives all ingesters a uniform interface, so it must be specified in package.json. We also add an easier way to run the previewer (npm run preview <path>):

     "license": "UNLICENSED",
+    "scripts": {
+        "start": "node index.js",
+        "preview": "node_modules/.bin/hatch-preview"
+    },
     "dependencies": {

New API to support the new article formats

set_body(html: String or Cheerio object)

(Blog, News) Pass the body of the article here. Note that it will be rendered mostly unchanged inside an <article> element. The ingester itself is responsible for cleaning up this HTML so that the stylesheet applies cleanly to it; see "Clean up HTML before passing to set_body()" below for hints.

set_custom_scss(scss: String)

(Blog, News) Use this to customize the stylesheet. Most customizations can be accomplished just by tweaking the SCSS variables that the stylesheet exposes. Here is a list of the variables:

  • $primary-light-color
  • $primary-dark-color
  • $accent-light-color
  • $accent-dark-color
  • $background-light-color
  • $background-dark-color
  • $title-font
  • $body-font
  • $context-font
  • $support-font

The default stylesheet is pulled in with the line @import '_default'; (you can also leave out this import and start from a blank stylesheet to do your own thing entirely). If you are adding rules as well as customizing variables, make sure the variables are specified before the import and the rules after it. For example:
asset.set_custom_scss(`
    $body-font: 'Comic Sans';
    @import '_default';
`);
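When a spec supplies several variable overrides plus extra rules, it can help to assemble the string programmatically so the required order (variables, then the import, then rules) cannot be broken by accident. This is a sketch; buildCustomScss is a hypothetical helper, and the colour value and blockquote rule are placeholders.

```javascript
// Hypothetical helper: assemble the string for set_custom_scss().
// Variable overrides go before @import '_default'; extra rules go after it.
function buildCustomScss(variables, rules = '') {
  const overrides = Object.entries(variables)
    .map(([name, value]) => `$${name}: ${value};`) // e.g. "$body-font: 'Comic Sans';"
    .join('\n');
  return [overrides, "@import '_default';", rules.trim()]
    .filter((part) => part.length > 0) // skip empty sections
    .join('\n');
}

const scss = buildCustomScss(
  { 'body-font': "'Comic Sans'", 'accent-light-color': '#ff6600' },
  'blockquote { font-style: italic; }',
);
// asset.set_custom_scss(scss);
```

Variable values are passed through verbatim, so string values such as font names must carry their own quotes.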

set_lede(html: String or Cheerio object)

(News) Sets the lede (opening paragraph) of the news article. This paragraph is marked up specially. (Make sure to remove this paragraph from the HTML that you pass to set_body(), otherwise it will be present twice in the rendered article.)
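To avoid rendering the lede twice, it has to be cut out of the body fragment. The sketch below assumes the lede is the first <p> element of an HTML string; splitLede is a hypothetical helper. A regex is workable here only because <p> elements cannot nest; with Cheerio, selecting $('p').first() and calling .remove() on it achieves the same more robustly.

```javascript
// Hypothetical helper: split the first <p> off an HTML fragment so it can go to
// set_lede() and the remainder to set_body() without duplicating the paragraph.
function splitLede(html) {
  const match = /<p\b[^>]*>[\s\S]*?<\/p>/i.exec(html); // first <p ...>...</p>
  if (!match) {
    return { lede: null, body: html }; // no paragraph found; leave body untouched
  }
  const body = html.slice(0, match.index) + html.slice(match.index + match[0].length);
  return { lede: match[0], body: body.trim() };
}

const { lede, body } = splitLede('<p>First paragraph.</p>\n<p>Second paragraph.</p>');
// asset.set_lede(lede);
// asset.set_body(body);
```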

set_main_image(image: ImageAsset)

(Blog) Marks an image as the "main" image of the post, which is marked up specially. (Make sure to remove this image from the HTML that you pass to set_body(), otherwise it will be present twice in the rendered post.)

set_main_image(image: ImageAsset, caption: String or Cheerio object or undefined)

(News) Marks an image as the "main" image of the article, which is marked up specially. Includes an optional caption. Compare BlogArticle.set_main_image() and BlogArticle.set_main_image_caption(). (Make sure to remove this image from the HTML that you pass to set_body(), otherwise it will be present twice in the rendered article.)

set_main_image_caption(text: String)

(Blog) Sets a caption to go with the image set by BlogArticle.set_main_image().

set_read_more_link(html: String or Cheerio object)

(News) Sets the HTML rendered at the bottom of the article, which should include a link back to the original source. An example might be "Read more at <a href="http://...">Planet GNOME</a>".
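Since the URL and source name usually come from scraped data, escaping them before building the HTML keeps a stray & or quote from producing broken markup. This is a minimal sketch; buildReadMoreLink and escapeHtml are hypothetical helpers, and the URL is a placeholder.

```javascript
// Escape the five characters that are significant in HTML text and attributes.
// "&" is replaced first so the entities added afterwards are not double-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: build the HTML for NewsArticle.set_read_more_link().
function buildReadMoreLink(url, sourceName) {
  return `Read more at <a href="${escapeHtml(url)}">${escapeHtml(sourceName)}</a>`;
}

const link = buildReadMoreLink('http://example.com/post?id=1&lang=en', 'Planet GNOME');
// link is 'Read more at <a href="http://example.com/post?id=1&amp;lang=en">Planet GNOME</a>'
// asset.set_read_more_link(link);
```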

set_read_more_text(text: String)

(Blog) Sets the text rendered at the bottom of the post. Unlike NewsArticle.set_read_more_link(), this takes text rather than HTML, because the blog article format is a bit more strict about what it renders there. An example might be "Original article at Planet GNOME" which is turned into a link during rendering.

Example code changes to support the new article formats

Here's an example ingester using the blog article format: https://github.com/endlessm/endless-content-factory/blob/b821b1de3fda4df8112de847ad5c17689442de9e/src/gp-ingestors/livingloving/index.js

Looking at the history of that file shows a few things that must be done:

  • Make sure to use machine IDs for tags. (In practice, these can be equal to the human-readable names, but notice that the sample code prefixes freeform WordPress tags with tag: in order to distinguish them from fixed WordPress categories.)
  • Remove any HTML templating that the ingester does itself; libingester now does this instead.
  • Use the correct article format: libingester.NewsArticle or libingester.BlogArticle.
  • Use the new data- and metadata-setting APIs as described above.
  • Change your package.json to require the newer version of libingester.
  • Make sure to call asset.render() when you are done with all the setter methods.
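The checklist implies a fixed shape for the per-post code: call the setters, then render() last. The sketch below assumes `asset` is a libingester.BlogArticle and that `post` is an object produced by your own scraping code; populateBlogArticle, makeRecordingAsset and all `post` field names are hypothetical. Only the setter names come from this page. The recording stand-in lets you check the call order without installing libingester.

```javascript
// Hypothetical function: apply the documented BlogArticle setters to one post.
// `post` field names are placeholders for whatever your scraper produces.
function populateBlogArticle(asset, post) {
  asset.set_author(post.author);
  asset.set_date_published(post.published);
  asset.set_tags(post.tagIds);
  asset.set_synopsis(post.synopsis);
  if (post.mainImage) {
    asset.set_main_image(post.mainImage); // an ImageAsset
    if (post.mainImageCaption) {
      asset.set_main_image_caption(post.mainImageCaption);
    }
  }
  asset.set_body(post.bodyHtml);
  asset.set_read_more_text(post.readMoreText);
  asset.set_custom_scss(post.scss);
  // render() comes last, after every setter has been called.
  asset.render();
}

// Stand-in for trying the sequence without libingester: every property access
// returns a function that records the method name it was called as.
function makeRecordingAsset() {
  const calls = [];
  return new Proxy({}, {
    get(target, name) {
      if (name === 'calls') return calls;
      return () => { calls.push(name); };
    },
  });
}

const fake = makeRecordingAsset();
populateBlogArticle(fake, {
  author: 'Jane Doe',
  published: '2017-08-16T10:00:00Z',
  tagIds: ['Recipes', 'tag:vegan'],
  synopsis: 'A short summary.',
  bodyHtml: '<p>Hello.</p>',
  readMoreText: 'Original article at Planet GNOME',
  scss: "@import '_default';",
});
console.log(fake.calls[fake.calls.length - 1]); // render
```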

Most specs will include theme customizations in the form of SCSS variables. Pass these variables, and any other custom rules, to the set_custom_scss() method as described above.

Clean up HTML before passing to set_body()

Example 1: https://github.com/endlessm/endless-content-factory/commit/9f6414fe589125e447c81d2020d537b99361d732
Example 2: https://github.com/endlessm/endless-content-factory/commit/ace4e3738db3c8d6852fdd531ac406fc7d43abd8

The ingester is responsible for passing cleaned-up HTML to set_body() that the stylesheet's rules will apply to. In many cases this will not be necessary, but in other cases quite a lot of cleanup might be needed. Some hints to take into account:

  • Figures should be inside <figure> elements, not paragraphs
  • Figures with captions should also contain <figcaption> elements
  • Quotes should be inside <blockquote> elements, properly formatted inside with <p> (and <cite> if applicable)
  • Paragraphs should be in <p> elements, not denoted with <br> linebreaks
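Two of these cleanups are sketched below with plain string handling so the code runs without dependencies. The assumptions are loud ones: paragraphs in the source are separated by two or more consecutive <br> tags (a single <br> is kept as a line break), the fragment has no other block-level elements, and the figure's src and caption are already safe HTML. brToParagraphs and makeFigure are hypothetical helpers; a real ingester would more likely do this with Cheerio.

```javascript
// Turn runs of two or more <br> tags into paragraph boundaries and wrap
// each non-empty chunk in <p>. A lone <br> stays as a line break.
function brToParagraphs(html) {
  return html
    .split(/(?:<br\s*\/?>\s*){2,}/i)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    .map((chunk) => `<p>${chunk}</p>`)
    .join('\n');
}

// Wrap an image in <figure>, adding <figcaption> only when a caption is given.
function makeFigure(src, caption) {
  const img = `<img src="${src}">`;
  return caption
    ? `<figure>${img}<figcaption>${caption}</figcaption></figure>`
    : `<figure>${img}</figure>`;
}

console.log(brToParagraphs('One<br>still one<br><br>Two<br/><br />Three'));
// <p>One<br>still one</p>
// <p>Two</p>
// <p>Three</p>
```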