
Add a mechanism to create pages from Headless CMS API calls #401

Open · zoosky opened this issue Sep 6, 2018 · 16 comments

@zoosky commented Sep 6, 2018

Or make Gutenberg available as a library so that it can be used to generate content from headless CMS APIs instead of from Markdown files and directory structures.

@Keats (Collaborator) commented Sep 6, 2018

You can use Gutenberg as a library already; it's not on crates.io though, so you will need to vendor the repo directly.

Gutenberg itself is built as a library that is then used by a small bin crate.
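For illustration, a minimal sketch of what that could look like; the `site` crate name and the exact `Site` API here are assumptions based on the repo layout, not a documented interface:

```rust
// Cargo.toml (hypothetical; the crates aren't published on crates.io):
// [dependencies]
// site = { git = "https://github.com/Keats/gutenberg" }

use site::Site;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the site from disk and build it, roughly what the bin crate does.
    let mut site = Site::new("my_site", "config.toml")?;
    site.load()?;
    site.build()?;
    Ok(())
}
```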

Regarding headless CMS, I haven't looked into it yet but that's something I'm interested in. If you have more information about it and/or examples, please share!

@paulcmal (Contributor) commented Sep 6, 2018

I think the key request here and in #374 is to create an abstraction around different means/backends of extracting the "frontmatter" info:

  • from Markdown files with an explicit frontmatter (TOML at the moment, but more could be supported)
  • directly from Markdown files without a frontmatter (as suggested in #374)
  • from any external API such as a headless CMS

So basically that would mean standardizing the interface used to fetch the content of the site. That can mean documenting Gutenberg's use as a library and letting people build other tools on top of it, which is nice but does not actively encourage new use cases. Or it can mean implementing some sort of dispatcher that can use configurable backends, whether integrated into the Gutenberg source or as separate plugins following an API.
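To illustrate (all names here are hypothetical, not an existing Gutenberg API), such an interface could be a small trait that every backend implements:

```rust
use std::error::Error;

/// Hypothetical normalized page: the frontmatter fields the engine needs
/// plus the raw Markdown body, regardless of where they came from.
pub struct RawPage {
    pub title: String,
    pub slug: String,
    pub body: String,
}

/// Hypothetical backend interface: a content/ tree, bare Markdown files,
/// or a headless CMS API would each get their own implementation.
pub trait ContentSource {
    fn fetch_pages(&self) -> Result<Vec<RawPage>, Box<dyn Error>>;
}
```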

It shouldn't be too hard for adapters for different content providers to support easy migration from one solution to another, with Gutenberg's formats ⁽¹⁾ and expectations serving as a translation mechanism. This way we could migrate from one content source to another, such as from one headless CMS to another, or to Gutenberg's current content/ format.

More specifically about headless CMS, I'm very interested in that. In the past I've been experimenting with Directus CMS (a bit capricious, but by far the best-documented project I've found) as a backend to generate the Markdown files to be compiled by Hugo, but that's far from ideal. Being able to write a simple plugin (in Rust) to bridge Directus' API to a Gutenberg content API would make it easier to maintain in the long run.
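A sketch of such a bridge, assuming the hypothetical ContentSource/RawPage interface above plus the reqwest and serde crates; the endpoint and response shape are invented for illustration:

```rust
use serde::Deserialize;

// Invented shape of a Directus-like collection endpoint response.
#[derive(Deserialize)]
struct ApiEntry {
    title: String,
    slug: String,
    body: String,
}

#[derive(Deserialize)]
struct ApiResponse {
    data: Vec<ApiEntry>,
}

struct DirectusSource {
    base_url: String,
}

impl ContentSource for DirectusSource {
    fn fetch_pages(&self) -> Result<Vec<RawPage>, Box<dyn std::error::Error>> {
        // e.g. GET https://cms.example.com/items/articles
        let url = format!("{}/items/articles", self.base_url);
        let resp: ApiResponse = reqwest::blocking::get(url)?.json()?;
        Ok(resp
            .data
            .into_iter()
            .map(|e| RawPage { title: e.title, slug: e.slug, body: e.body })
            .collect())
    }
}
```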

In my opinion, although the tools themselves are works in progress, headless CMS go in the right direction: tools that do content/authorship management and do it right ⁽²⁾. Git having, AFAIK, no proper ACL system out-of-the-box (while anyone with write permission can rewrite history) makes it the wrong tool to build upon for such purposes. I believe that's why NetlifyCMS, for instance, is so popular.

Shifting to headless CMS raises multiple maintainability/security concerns like we have/had with "complete" CMS such as WordPress, Joomla, Grav… But with such a setup we're still reducing the attack surface IMHO (by separating the concerns) and definitely solving the distribution headache by exporting to a static-site generator (static sites are p2p-friendly).

⁽¹⁾ If we go in this direction, I'd personally advise implementing a standard format rather than applying Gutenberg's specific one. ActivityStreams 2.0 is a good candidate (see the intro to social web protocols).
⁽²⁾ Some "headless" CMS provide a web interface out-of-the-box. But personally, I think a writing/editing workflow is yet another question to solve with different tools leveraging existing standards. The whole point of a headless CMS is to provide a simple API to manage content.

PS: Sorry that was a bit long. Don't hesitate to tell me if I'm boring you or just burying you under piles of messages. I'm trying to make myself clear with long messages, but maybe I'm not so good at it, and I know it can be overwhelming :)

EDIT: Wrongly referenced an unrelated issue.

@zoosky (Author) commented Sep 6, 2018

@Keats There are two types of headless CMS. You can filter them at https://headlesscms.org/ by "API driven" or "Git-based". The latter produces Markdown files as input for a static website generator; the former has a more or less proprietary API. I'm most interested in those that use GraphQL as the API.
Strapi is evolving and is developed by a French core team. Gentics Mesh is IMHO one of the most promising ones in an enterprise environment.
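For the API-driven, GraphQL-flavoured kind, the fetch side could look roughly like this (the endpoint URL and schema are invented; the reqwest and serde_json crates are assumed):

```rust
use serde_json::json;

// Invented GraphQL endpoint and schema, just to show the shape of the call.
fn fetch_pages() -> Result<serde_json::Value, reqwest::Error> {
    let query = json!({ "query": "{ pages { slug title body } }" });
    reqwest::blocking::Client::new()
        .post("https://cms.example.com/graphql")
        .json(&query)
        .send()?
        .json()
}
```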

@piedoom (Contributor) commented Sep 6, 2018

I currently have success using Gutenberg with Netlify CMS (though since I plan on using Gutenberg for client work, I was looking to create a more robust, Gutenberg-specific frontend at some point). I'm not as well-read on the topic, unfortunately, so I can't contribute much more than my thoughts: having Git as storage is nice as it doesn't require any server that I have to run, but building on top of Git also feels very... messy. I'm glad this thread has come up, as I didn't know there was so much info out there. I had just planned a simple solution with an Elm app that would keep all changes in browser storage and then do one big commit on save, to avoid needing to use specific GitHub/GitLab APIs to show images like NetlifyCMS does.

@Keats (Collaborator) commented Sep 6, 2018

I actually want to build something like NetlifyCMS for Gutenberg when I find the time and Gutenberg has all the basic blocks in place! I'm not too convinced about making it work for all static site generators though, as you need to limit yourself to what the most basic tool will offer. Admittedly I haven't read too much on it yet and have never used any at all, so I could be wrong.
The API vs Git distinction is interesting. I didn't think of it that way before, but I don't think it's one or the other: you could have a site linked to a repository where commits happen on edits as a backup, and you can choose to use an API or that Git repo to render it. This way, if the company shuts down, you still have all your history.
I'll read about it and try some tools to get a better idea of what they are doing.

Keep in mind that any big features like that can only happen after i18n has landed; all the focus should go to that until it's done.

@piedoom (Contributor) commented Sep 6, 2018

How exciting (how exciting)! Seriously though, it's a great idea for after i18n. Without getting too off-topic, is there a place to discuss these things, like IRC/Discord/Slack/etc.?

@Keats (Collaborator) commented Sep 6, 2018

I'll try to set up a Discourse instance once the project has changed name.

@zoosky (Author) commented Sep 7, 2018

@paulcmal Thanks for listing possible use cases.

One particular use case is using the Gutenberg lib as an nginx module (see https://github.com/nginxinc/ngx-rust), or including it in a proxy like https://github.com/linkerd/linkerd2-proxy.

The purpose of this setup is rendering and caching on demand:

  1. nginx/Gutenberg gets a request from the browser
  2. turns around and asks the headless CMS API for the requested content
  3. renders the content together with the template to HTML and writes it to disk
  4. responds to the browser with the HTML

Further hits on the same URL are served from disk.

The headless CMS has a means of marking the disk cache stale, or removing the page, when CMS authors update or delete a page.
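A minimal sketch of that request flow (function names and cache layout invented; the actual nginx module plumbing is omitted):

```rust
use std::fs;
use std::path::PathBuf;

// Steps 1-4 from the list above: check the disk cache, otherwise fetch
// from the CMS, render against the template, persist, and return the HTML.
fn handle_request(url_path: &str) -> std::io::Result<String> {
    let cached = PathBuf::from("cache")
        .join(url_path.trim_start_matches('/'))
        .with_extension("html");
    if cached.exists() {
        return fs::read_to_string(&cached); // further hits are served from disk
    }
    let content = fetch_from_cms(url_path); // 2. ask the headless CMS API
    let html = render_with_template(&content); // 3. render content + template
    if let Some(dir) = cached.parent() {
        fs::create_dir_all(dir)?;
    }
    fs::write(&cached, &html)?; // write to disk for future hits
    Ok(html) // 4. respond with the HTML
}

// Placeholder stubs standing in for the CMS call and the template engine.
fn fetch_from_cms(path: &str) -> String {
    format!("(content for {})", path)
}

fn render_with_template(content: &str) -> String {
    format!("<html><body>{}</body></html>", content)
}
```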

What do you all think?

@paulcmal (Contributor) commented Sep 7, 2018

The headless CMS has a means of marking the disk cache stale, or removing the page, when CMS authors update or delete a page.

If your CMS has a proper hook system for such things, why access the data dynamically from your webserver in the first place? Couldn't a simple script build the new content and upload it to the folder served statically by nginx? Such a path of least privilege would probably be better for security and scaling reasons, don't you think?
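As a sketch of that less-privileged path (the command invocations are assumptions; this would be the body a CMS webhook triggers, not part of Gutenberg):

```rust
use std::process::Command;

// On a CMS "content changed" webhook: rebuild the whole site, then sync
// the output into the folder nginx serves statically. This only needs
// write access to the web root; no webserver extension involved.
fn rebuild_and_publish() -> std::io::Result<()> {
    let build = Command::new("gutenberg").arg("build").status()?;
    assert!(build.success(), "site build failed");

    let sync = Command::new("rsync")
        .args(["-a", "--delete", "public/", "/var/www/site/"])
        .status()?;
    assert!(sync.success(), "rsync to the web root failed");
    Ok(())
}
```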

Also, one piece of content changing may trigger taxonomy/section rebuilding. I don't think Gutenberg supports incremental builds at the moment, so you would probably have to rebuild the whole website every time anyway. So I really like your idea, but from an outsider's perspective I feel like you're overengineering it (I mean the part about actually linking nginx, Gutenberg, and the headless CMS) :)

@zoosky (Author) commented Sep 7, 2018

@paulcmal I can understand that this proposal seems like 'overengineering'. Such a setup is clearly not for small sites and blogs, but rather for huge sites.
In practice, the architecture I described is similar to what Adobe AEM CMS has in place. See https://helpx.adobe.com/experience-manager/dispatcher/using/dispatcher.html#WhyuseDispatchertoimplementCaching
When you have a large set of pages, you want to avoid generating all of them with each content update. Be aware that AEM is not (yet) a headless CMS.

@paulcmal (Contributor) commented Sep 7, 2018

seems like 'overengineering'

When I say I feel like it's overengineering, I'm not talking about the idea of only partially building the content as needed. I'm talking about having the process take place as a webserver extension. You could achieve the same result with a less-privileged script rsyncing the built content to your front servers (no TTL → immediate propagation), or by having those servers act as a caching reverse proxy in front of a web server exposing the built content (cache TTL). Sorry the discussion is going off-topic again (sysadmin concerns). Can't wait for a forum to open, to have broader discussions there and more to-the-point discussions on here :D

When you have a large set of pages, you want to avoid generating all of them with each content update

Supporting incremental builds directly in gutenberg would be a way to address the issue you're mentioning here, don't you think?

@Keats (Collaborator) commented Sep 7, 2018

I was about to post something roughly similar to what @paulcmal did.

Supporting incremental builds directly in gutenberg would be a way to address the issue you're mentioning here, don't you think?

I don't think it will be possible. With global functions you can do whatever you want, which means in lots of cases you don't actually know what you need to rebuild. The only way to do that effectively would be to inspect the AST of each template to see the global function calls, but you would need to take into account the conditionals, the blocks, etc. In short, a nightmare, and very likely to still miss things, which is not really acceptable for incremental rebuilds. I think that unless you go to 10000s of pages, a full rebuild would be preferable.

@paulcmal (Contributor) commented Sep 7, 2018

The only way to do that effectively would be to inspect the AST of each template

Wouldn't it be possible to keep track of content/template relations and dependencies at template compile time and store it somewhere in a cache folder? I hear that's how Nikola's engine deals with that. This way, when a file is added/changed/removed, you "just" have to walk through the dependency graph looking for either the page or its ancestors. ⁽¹⁾ That's a conservative approach that may rebuild more than needed in most cases, but I don't think we would miss any cases this way. Am I being too naive?
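A sketch of that conservative walk (the data layout is invented; nothing like this exists in Gutenberg today):

```rust
use std::collections::{HashMap, HashSet};

/// Invented dependency index, recorded while compiling templates: for each
/// output page, the set of content paths it was seen to read (individual
/// pages or whole sections).
struct DepGraph {
    deps: HashMap<String, HashSet<String>>, // output page -> content paths
}

impl DepGraph {
    /// Conservative invalidation: an output is stale if it depends on the
    /// changed path or on any ancestor of it, so a change inside a section
    /// also dirties outputs that read the whole section.
    fn stale_outputs(&self, changed: &str) -> Vec<&str> {
        self.deps
            .iter()
            .filter(|(_, paths)| {
                paths
                    .iter()
                    .any(|p| changed.starts_with(p.as_str()) || p.starts_with(changed))
            })
            .map(|(out, _)| out.as_str())
            .collect()
    }
}
```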

unless you go to 10000s of pages

This can happen pretty quickly if you use your static site as a social (micro-)blogging platform. Can also happen with a webshop, I guess?

⁽¹⁾ I think we need to check ancestry. For instance, consider a template that calls on a section to display its latest children. Individual page relations recorded at compile time would not catch this, but seeing that the template calls on the whole section, and that some content in the section has changed, should trigger a rebuild of the template.

@Keats (Collaborator) commented Sep 7, 2018

Wouldn't it be possible to keep track of content/template relations and dependencies at template compile time and store it somewhere in a cache folder?

Tera already does that itself for inheritance/macros. Here's a more concrete example of the issue: https://github.com/Keats/book/blob/master/templates/page.html#L34
This will look up the next section and use its permalink. If someone renames a section's .md file in the content or changes its path in the frontmatter, you will need to re-render every page, since the menu is built via a global function used on every page. To know that, you would need to see that it does {% set index = get_section(path="_index.md") %} and then iterates on all its subsections. Without looking at the AST, you wouldn't know that changing a single path would require re-rendering the whole site.
Another example: you want to display the 3 most recent articles at the bottom of each article you write. Adding a new post or editing one of the most recent ones requires a full re-render, while editing an older one would be fine with re-rendering just itself.

The current live reload already tries its best to do an incremental rebuild but it totally fails whenever you use global functions.

This can happen pretty quickly if you use your static site as a social (micro-)blogging platform. Can also happen with a webshop, I guess?

Speed mostly depends on having syntax highlighting on/off. With it off, it shouldn't take too long to render 10000s of pages. With it on, it's a bit more problematic :)

@Keats (Collaborator) commented Nov 9, 2018

This can happen pretty quickly if you use your static site as a social (micro-)blogging platform. Can also happen with a webshop, I guess?

Just to get back to that: the current big benchmark site I am using takes about 10-11s to render on the next branch.
It contains 10000 pages of decent length (~70 lines) with syntax highlighting on, shortcodes, taxonomies, and pagination (with 5 per page, so 2000 pages), so it should be fast enough for microblogging at scale.

@Keats (Collaborator) commented Nov 22, 2018

Can we move the discussion to https://zola.discourse.group/? I want to try to keep GitHub for bugs only.
