Add Custom Output Formats to Hugo #3122

Merged
merged 47 commits into from
Mar 27, 2017

Conversation

bep
Member

@bep bep commented Mar 2, 2017

@bep bep changed the title from "Add MediaTYpe" to "WORK IN PROGRESS: Add custom output types" Mar 2, 2017
@bep bep added the InProgress label Mar 2, 2017
@bep bep force-pushed the custom-output branch 2 times, most recently from e5d23dd to d975b9c Compare March 2, 2017 18:14
@bep bep added this to the v0.20 milestone Mar 2, 2017
@bep bep self-assigned this Mar 2, 2017
@bep
Member Author

bep commented Mar 5, 2017

It would be great with some feedback on the above:

  • Does it in general make sense? Any big holes in the logic?
  • People get confused by names ... MediaType is kind of given (it is a standard), but output and OutputType?
  • Other suggestions/corrections?

@rdwatters
Contributor

@bep This is a really exciting feature, and thank you so much for putting in all the hard work. As an aside, I'm impressed with how well this spec is written and articulated. I can learn a lot from you.

That said, I see some points of confusion for "newbs," but I leave it up to you as to whether Hugo wants to continue to target this audience.

Confusion Over .Type:

  • type: in front matter
  • .Type as a page variable
  • Content type as an instance of an archetype
  • Arche_type_ as a parent from which content types inherit
  • Kind vs Type, which is another conversation and clearly a point of confusion (although I personally love this feature and .GetPage)

Recommended Nomenclature

To avoid confusion over the use of type throughout the documentation, I'd recommend the following instead:

  1. Output Format. This one works especially well since we refer to content files as supported formats--although this can be cleared up even more. I think this creates a clearer relationship between input | output; i.e. .md => .html, .adoc => .json, etc.
  2. mime. This is a second option and makes sense for reasons you've already pointed out, but this will be way less intuitive to newcomers...

Config and Front Matter

Configuration:

Your proposal of...

# Add some custom output type definitions:
[[output]]
name = "Calendar"
mediaType = "text/calendar+ics"
isPlainText = true
...

is really appealing because arbitrarily configuring output formats is a nice and flexible solution, but I wonder if this causes unnecessary confusion or adds too much bloat to config files, which can get huge already.

Front Matter

Page frontmatter:

title = "My Home Page"
output = ["html", "rss", "json", "calendar", "amp" ]

Again this is great at a granular level, but what about people who have a Hugo site with 1800 pages? Do users need this type of per-page control--and tedium--as they create or edit content? (Let's assume they can't .sh these values in).

Proposed, Ideal State

I have no idea whether this is feasible, so take it as you will.

Consider the desired mental model for Hugo end users. Here are some considerations:

1. Defining your own formats in config

metadataFormat = "yaml"

Vs.

[[output]]
name = "AMP"
mediaType = "text/html"
path = "amp"

2. Naming Conventions for templates/lookup

Type Layouts
JSON json-index.json, index.json, _default/json-list.json, _default/list.json
AMP amp-index.html, index.html, _default/amp-list.html, _default/list.html

I think these will draw confusion:

  • _default/json-list.json vs _default/list.json
  • _default/amp-list.html vs _default/list.html

Suggested Alternative

1. Defining formats in config

The question becomes whether users will need to arbitrarily define as many output types as they want or whether Hugo should come with a set of baked-in formats for easier reference: json,amp,ics.

So, for example, this might be a cleaner configuration:

# config.yml
outputs: ['html','json','amp','ics']

With the sane default (when omitted) of just:

#config.yml
outputs: ['html']

2. Naming Conventions for templates/lookup

And then with a similar directory structure like the following:

.
├── content
│   ├── events
│   └── posts
├── layouts
│   ├── _default
│   │   ├── list.html
│   │   ├── list.json
│   │   ├── single.amp.html
│   │   ├── single.html
│   │   ├── single.ics
│   │   └── single.json
│   ├── events
│   │   ├── single.html
│   │   └── single.ics
│   ├── index.amp.html
│   ├── index.html
│   ├── index.json
│   └── section
│       ├── posts.amp.html
│       ├── posts.html
│       └── posts.json

This doesn't require users to learn anything new about the lookup order and doesn't conflate file extension with name (see the language used around the .File object).

Note: This helps too since the lookup is going to be even more consistent and intuitive with your recent improvements to section 😄
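To make the naming convention in the tree above concrete, here is a minimal Go sketch of how candidate template names could be derived. It is not Hugo's implementation: the function `layoutCandidates`, its parameters, and the fallback from `single.amp.html` to `single.html` are all assumptions made for illustration.

```go
package main

import "fmt"

// layoutCandidates returns template file names to try, most specific first.
// base is the layout ("single", "list", "index"), name is the output format
// identifier ("amp", "json"), and suffix is the file extension.
// The function and its parameters are hypothetical.
func layoutCandidates(base, name, suffix string) []string {
	if name == suffix {
		// For json/json or html/html the extension already names the format.
		return []string{base + "." + suffix}
	}
	return []string{
		base + "." + name + "." + suffix, // e.g. single.amp.html
		base + "." + suffix,              // assumed fallback, e.g. single.html
	}
}

func main() {
	fmt.Println(layoutCandidates("single", "amp", "html"))
	fmt.Println(layoutCandidates("single", "json", "json"))
}
```

With this shape, AMP needs a second name segment because it shares the `.html` extension with the regular HTML output, while JSON does not.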

Drawbacks

The drawbacks, however, are the same as one of the major drawbacks of the current lookup: how to keep certain pages from rendering at all. For example, in the structure above I might not want to create individual .ics files for every file in posts.

Even with the current HTML-only model we have now, users are required to create a blank file in certain cases at layouts/<SECTION>/single.html if they want to keep multiple instances of a content type in one content directory but only render a single list page (e.g., with documentation and smooth scrolling).

BUT, I think your solution now solves this problem even for people who are only interested in generating HTML (although I'm not sure if that was your intention, but if so, awesome).

At the page level, outputs in the front matter would/could override the lookup. So continuing with your idea of explicitly declared formats in individual front matter being the only format rendered, you could do this in content/posts/special-post.md:

---
outputs: ['html','json']
---

So then only the following are rendered:

  • example.com/posts/special-post/index.html
  • example.com/posts/special-post/index.json

And the following are not, because the front matter overrides the lookup:

  • example.com/posts/special-post/index.amp.html
  • example.com/posts/special-post/index.ics

And this would be fantastic if it meant that in any single-content file I could add an empty array....

#posts/special-post.md
outputs: []

...and then example.com/posts/special-post/index.html...would render as nothing; i.e., no file rendered at all.
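The override rule described here can be sketched in a few lines of Go. This is only an illustration of the proposal, not code from the PR: `effectiveOutputs` is a hypothetical helper, and it assumes that a missing `outputs` key (nil) is distinguishable from an explicitly empty list.

```go
package main

import "fmt"

// effectiveOutputs returns the formats to render for one page.
// A nil pageOutputs means the front matter has no "outputs" key, so the
// site-wide list applies. A non-nil slice, even an empty one, wins.
func effectiveOutputs(siteOutputs, pageOutputs []string) []string {
	if pageOutputs != nil {
		return pageOutputs
	}
	if len(siteOutputs) == 0 {
		return []string{"html"} // default when the config omits outputs
	}
	return siteOutputs
}

func main() {
	site := []string{"html", "json", "amp", "ics"}
	fmt.Println(effectiveOutputs(site, nil))                      // site default applies
	fmt.Println(effectiveOutputs(site, []string{"html", "json"})) // page override
	fmt.Println(effectiveOutputs(site, []string{}))               // nothing is rendered
}
```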

Questions

Also, I'm curious as to what you're planning for pretty VS ugly URLs in config. Would it follow the same pattern? That is, special-post.json vs special-post/index.json?
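For reference, the two URL shapes the question contrasts could be produced like this. The helper `targetPath` and its `ugly` flag are hypothetical names used only to illustrate the question; nothing in this thread says the PR works this way.

```go
package main

import (
	"fmt"
	"path"
)

// targetPath builds the output file path for a page and a file suffix.
// ugly=true yields "page.suffix"; ugly=false yields "page/index.suffix".
func targetPath(pagePath, suffix string, ugly bool) string {
	if ugly {
		return pagePath + "." + suffix
	}
	return path.Join(pagePath, "index."+suffix)
}

func main() {
	fmt.Println(targetPath("posts/special-post", "json", true))
	fmt.Println(targetPath("posts/special-post", "json", false))
}
```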

@bep
Member Author

bep commented Mar 5, 2017

@rdwatters thanks for the input.

  1. Media Type is kind of set in stone (it is well defined, rfc4288 -- MIME is something completely different). "Output type" could be something else, but "format" is too restrictive ... maybe. "Output definition"? But I think that using the keywords outputs (plural) in frontmatter and output per output definition (or whatever) makes sense.

  2. I like your lookup logic, I will change that.

  3. I think the user should also be able to replace the built-in default output set. But it would have to be "per Kind" (to match the current situation).

Also, I'm curious as to what you're planning for pretty VS ugly URLs in config.

You assume I have planned every detail ... Much of the details will have to be decided when I run into them in tests (believe me, there will be lots of tidbits). I have not thought too hard about uglyURLs.

metadataFormat = "yaml"

I'm not sure I understand. We may have more built-in "output formats" that can be used (like AMP, JSON, YAML etc.), but the definition for a new one (say a rare calendar format), cannot be one word (we need the file suffix etc.).

Hugo will still aim to be extremely easy to use, but with some added power tools if you need it.

@rdwatters
Contributor

Media Type is kind of set in stone (it is well defined, rfc4288 -- MIME is something completely different).

D'oh. My bad. Maybe I was thinking of mime.types and had a brain fart.

If this is really just a matter of running the same templating and changing the file extension:

#posts/post.md front matter
extensions: []

It's certainly pretty intuitive since it's a one-to-one relationship between what's in the front matter and the actual file extension used in the templating lookup I mentioned above, but I get the feeling this is too restrictive.

Hmmm...is just Outputs too abstract? That is, rather than Output Formats or Output Definitions.

But I think that using the keywords outputs (plural) in frontmatter and output per output definition (or whatever) makes sense.

So you're saying the following is not a feasible idea...

(Or maybe just a bad idea, which I'm okay with as well.)

#config.yml
outputs: ['html','json','amp']

#posts/post.md
outputs: ['html','json']

And instead...

#config.yaml
output: [...]?

Just to clarify, I'm talking about an actual one-liner in the configuration file, so...

logFile = ""
outputs = ["html","json","amp"]
metaDataFormat = "yaml"
pluralizelisttitles = false

With default being outputs = ["html"] if left blank...although the idea of Hugo just processing JSON templates and then outputting only JSON is kind of a neat idea; e.g.—explicitly declared outputs = ["json"] in a config creating only JSON files at build...could have some cool SPA implications, especially for those writing JS {{}} syntax in, say, their index.html.

@bep
Member Author

bep commented Mar 5, 2017

It's certainly pretty intuitive since it's a one-to-one relationship between what's in the front matter and the actual file extension used in the templating lookup I mentioned above

No, that does not work. That breaks already on AMP, which is kind of one of the main motivations behind all of this.

As I said before:

  1. Yes, we should add the most common "output types" as predefined in Hugo, so you can just use "json", "amp" etc. in your config or frontmatter.
  2. Yes, we should make it possible to re-define the default "outputs" per Kind in the site config (so you don't have to add outputs to every content file).
  3. We really need the full MIME type definition to be able to handle it properly in the Hugo web server.
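A small Go sketch may help show why a bare file extension is not enough to identify an output. The struct fields mirror the keys used in the config examples in this thread (`name`, `mediaType`, `path`, `isPlainText`); the `Suffix` field, the `builtin` table, and `lookupFormat` are assumptions for illustration, not the types this PR introduces.

```go
package main

import (
	"fmt"
	"strings"
)

// OutputFormat is a hypothetical definition of one output.
type OutputFormat struct {
	Name        string
	MediaType   string // full media type, needed to serve the file correctly
	Path        string // optional sub-path, e.g. "amp"
	Suffix      string // file extension (assumed field)
	IsPlainText bool
}

// builtin holds predefined formats that can be referenced by identifier alone.
var builtin = map[string]OutputFormat{
	"html": {Name: "HTML", MediaType: "text/html", Suffix: "html"},
	"amp":  {Name: "AMP", MediaType: "text/html", Path: "amp", Suffix: "html"},
	"json": {Name: "JSON", MediaType: "application/json", Suffix: "json", IsPlainText: true},
}

// lookupFormat resolves an identifier, letting site-defined formats
// take precedence over the predefined ones.
func lookupFormat(custom map[string]OutputFormat, id string) (OutputFormat, bool) {
	key := strings.ToLower(id)
	if f, ok := custom[key]; ok {
		return f, true
	}
	f, ok := builtin[key]
	return f, ok
}

func main() {
	custom := map[string]OutputFormat{
		"calendar": {Name: "Calendar", MediaType: "text/calendar+ics", Suffix: "ics", IsPlainText: true},
	}
	for _, id := range []string{"html", "amp", "calendar"} {
		f, _ := lookupFormat(custom, id)
		fmt.Printf("%s: media type %s, suffix %s, path %q\n", f.Name, f.MediaType, f.Suffix, f.Path)
	}
}
```

HTML and AMP share both media type and suffix here, which is the case that breaks a one-to-one mapping between an `outputs` entry and a file extension.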

@rdwatters
Contributor

rdwatters commented Mar 5, 2017

No, that does not work. That breaks already on AMP, which is kind of one of the main motivations behind all of this.

#config.toml
outputs = ["html","amp.html", "json"]

Yes, we should make it possible to re-define the default "outputs" per Kind in the site config (so you don't have to add outputs to every content file).

Why even "per Kind"? I think the notion of .Kind, honestly, is very confusing for users (i.e, the difference between section, taxonomy,taxonomyTerm,etc).

So, for example, what I'm saying is it's on the user to ensure mirroring of what's in outputs = [foo,bar,baz] and the templating lookup order I mentioned above....

Example

You would have the following one-liner in your configuration:

#config.toml
outputs = ["html","json"]

And then you create posts/my-post.md. If you want it to render as a .json file, you will have to have the corresponding template according to the lookup order (simplified and not including any specific type declaration):

  • layouts/posts/single.json
  • layouts/_default/single.json
    Etc, etc...

If there isn't a corresponding file, it just doesn't build it. This actually reflects the current system, right? I mean, in that if you don't have any single.html template files anywhere in the lookup for posts/my-post.md to pull from, you're not going to get any output.

If I take what you mean by configuring outputs per .Kind, that would mean having to explicitly say something to the effect of...

#config.yml
output: 
    section: ["html","json"]
    taxonomy: ["html","rss"]
    taxonomyTerm: ...
....
# With the assumption that unspecified kinds output default "html"

Whereas what I'm saying is that a layouts/_default/list.json is going to create a JSON list file for all list templates but then a person can remove the JSON from being created at posts/index.json by adding the following front matter to content/posts/_index.md:

outputs: ["html"]

Or maybe you just don't want that page to be rendered to anything because the content is most definitely a post, but you want to only include it in that directory for your own sort of organizational purposes. In this sense, the directory becomes more like a "collection" (quasi Jekyll reference) and you can keep it from rendering at all with...

outputs: []

Or further customize the layout of the intended JSON output by going up the ladder in the lookup.

Are we saying the same thing here?

@rdwatters
Contributor

@bep

How will partials be handled? That is, will there be such thing as partials/partialname.json?

I can't think of how blocks/base would provide a ton of benefit for media types other than HTML/XML...

@bep
Member Author

bep commented Mar 5, 2017

If there isn't a corresponding file, it just doesn't build it. This actually reflects the current system, right?

No, it doesn't. We currently treat missing template file(s) as a WARNING, but there is an issue about making it an ERROR.

But this is side-tracking the discussion.

The default output set must be defined per Kind because that is what we have and do today (RSS is for all != regular pages). You may find it confusing, but then you should really start a new thread about changing that. A very common use case would be to define a JSON search index for the home page, and maybe also some other stuff for the section lists etc.

Having the "if there is no template, then we do not render that output type" rule will not work: with the flexible layout system we have, we will in most cases find a template to use (esp. in the ambiguous cases like HTML vs AMP and the list templates etc.), and in many cases it does not make sense to render to a type just because we can dig up a template for it (RSS for regular pages etc.).

So we must have a defined set of rules.
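A minimal sketch of such a rule set, written in Go. It encodes only what is stated above (RSS for every Kind except regular pages, with the site config able to replace the default per Kind); `outputsForKind` and the kind strings are illustrative assumptions, not the defaults shipped by this PR.

```go
package main

import "fmt"

// outputsForKind returns the outputs to render for a page Kind.
// siteConfig holds optional per-Kind overrides from the site configuration.
func outputsForKind(kind string, siteConfig map[string][]string) []string {
	if outs, ok := siteConfig[kind]; ok {
		return outs
	}
	if kind == "page" {
		return []string{"html"} // regular pages: no RSS
	}
	return []string{"html", "rss"} // list-like kinds also get RSS
}

func main() {
	// Hypothetical override: add a JSON search index to the home page.
	cfg := map[string][]string{"home": {"html", "rss", "json"}}
	for _, kind := range []string{"page", "section", "home"} {
		fmt.Println(kind, outputsForKind(kind, cfg))
	}
}
```

Explicit rules like these decide what gets rendered; the template lookup then only decides how.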

Re. partials: That will eventually follow naturally when the big picture is painted -- as I said, I'm not digging into all the details now.

@rdwatters
Contributor

rdwatters commented Mar 5, 2017

Me:

That said, I see some points of confusion for "newbs," but I leave it up to you as to whether Hugo wants to continue to target this audience.

You:

You may find it confusing, but then you should really start a new thread about changing that.

I don't find it confusing, but if I did, I can assure you I wouldn't be alone in such sentiment.

You're right, I could

a. start a new thread
b. completely overhaul the documentation for this project

I'll go with b.

@bep
Member Author

bep commented Mar 5, 2017

I don't find it confusing, but if I did, I can assure you I'm not alone in this sentiment.

That sentence doesn't make sense ... :-)

Again, I appreciate your input on this, and I really want to make it as simple as possible ... With an emphasis on possible. If we want to do this, it should ... work. So any suggestions for a simpler model should be evaluated against the actual needs.

And the scope of this task is certainly NOT to redo the whole Kind thing (which we did have a pretty lengthy discussion about on the forum).

And note that

  • most people don't need to fiddle with Kind and friends
  • And if they do, new stuff is only confusing until it is familiar

@rdwatters
Contributor

It makes enough sense colloquially, but you're right that I should rephrase:

"I am not confused by the concept of Kind, but if I were, I can assure you I wouldn't be alone in such sentiment."

That damn subjunctive mood is not a strength of we Chicagoans, haha. Besides, I hope these threads never come to that level of pedantry.

P.S. "Front matter" has always been two words. 😉

@bep
Member Author

bep commented Mar 5, 2017

P.S. "Front matter" has always been two words.

It is a fairly new and constructed word, so I would say both would be equally correct; I prefer frontmatter. Besides, I'm from Norway, so English isn't my first language.

@rdwatters
Contributor

Besides, I'm from Norway, so English isn't my first language.

I know. I was making a bad joke; i.e., by saying that I hope these conversations never care about things like spelling rules and then making that comment at the end.

Seriously, I would never give a fellow American a hard time about "proper" spelling of "front matter" and I'm a publisher!

Now I just feel bad. Sorry, Bjorn.

@budparr

budparr commented Mar 6, 2017

So, as far as I can tell from this thread, to use this feature, one needs to:

  1. Declare the output format(s) in the config file.
  • There is a set of registered formats available, but we are declaring the ones we intend to use (?).
  • We can also register a new type that's not already registered with Hugo.

Is it necessary to declare them at all in the config (pardon Go/Hugo ignorance here), unless we were registering a completely new type? Would a piece of content declaring that it was using a particular output type be sufficient?


  2. Declare the output formats to be used in a post's front matter.
  • It would be preferable to set this at the Section level.

As an aside, it would also be neat to be able to turn off the default "HTML" for a given Section (like with Kind, I think). I understand that may not be the point of this feature, just wondering if it would make that possible.


  3. Create the appropriate template.

This layout scheme

'index.json.json, index.json, _default/list.json.json, _default/list.json'

does seem a bit clunky with the output prefix repeated with the extension, but from what you're saying, it seems this is necessary for the case of AMP, as an example, because it has HTML as a prefix, not "amp."


Also, I like "output format"


Hope that helps.

@bep
Member Author

bep commented Mar 6, 2017

@budparr I will make this more clear in the spec:

  1. We (Hugo) will hardcode a predefined list of the most common output types (not sure what that would be, suggestions welcome) with the obvious identifiers and sensible defaults: So whenever you want them, you can just say "json, yaml, amp ..." Once you need a more exotic output format (say a Calendar format?), or you need to override some setting(s) in the default (we probably have to add some settings on how to configure URLs), you must configure your own in config.toml.

  2. You should be able to set it both at the site config (a default list of outputs) and in the page front matter (<- @rdwatters ), but we probably need a way to restrict the site config to a subset of the pages (you may only want to create one output type for, say, the regular pages etc.)

  3. That looks clunky mainly because the example is bad. index.amp.json may be better -- and you can really ignore the alternatives you don't care for -- it just adds flexibility.

@wildhaber

wildhaber commented Mar 6, 2017

Great to see progress in this topic. I'm really happy with the explanations above and many things have already been discussed.

There is just one part I'm not sure about: how to link those different pages together. In AMP, for example, you need to (or should) specify the related AMP page of a regular page and vice versa.

From the Documentation:

Add the following to the non-AMP page:

<link rel="amphtml" href="https://www.example.com/url/to/amp/document.html">
And this to the AMP page:

<link rel="canonical" href="https://www.example.com/url/to/full/document.html">

Currently, for canonicals in Hugo we mostly use {{ .Permalink }}. How will this change for other paths?

Ideas:

{{ getPermalink <OutputTypeName> }}
# or
{{ .Permalink.<OutputTypeName> }} # same for .RelPermalink

I personally would prefer the getPermalink-Method but I cannot explain the reason :-)

Hope this part was not too much off-topic.

Keep up your great work.

Thanks.

@bep
Member Author

bep commented Mar 6, 2017

Hope this part was not too much off-topic.

@wildhaber no, this was very much on-topic. I haven't invested too much time on AMP, so any feedback in that department is welcomed.

I suggest we add .Page.AlternativeOutputs (name ...?), so you can do something ala:

{{ range .AlternativeOutputs }}
<link rel="{{ .Name }}" href="{{ .Permalink }}">
{{ end }}

Will have to look into the value of rel above.

Would that make sense?
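To see what the template above would emit, here is a self-contained Go sketch that renders it with the standard library's `text/template`. The `Page` and `Output` types are hypothetical stand-ins, and `text/template` is used instead of Hugo's own rendering pipeline to keep the example small.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Output and Page are hypothetical stand-ins for the proposed
// .Page.AlternativeOutputs value.
type Output struct {
	Name      string
	Permalink string
}

type Page struct {
	AlternativeOutputs []Output
}

// relLinks is the template from the comment above, one link per line.
const relLinks = `{{ range .AlternativeOutputs }}<link rel="{{ .Name }}" href="{{ .Permalink }}">
{{ end }}`

// renderRelLinks executes the template against a page and returns the markup.
func renderRelLinks(p Page) (string, error) {
	tmpl, err := template.New("rel").Parse(relLinks)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	p := Page{AlternativeOutputs: []Output{
		{Name: "amp", Permalink: "https://www.example.com/post/amp/"},
	}}
	out, err := renderRelLinks(p)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The rendered `rel` value is simply the output's name, which is why the right value for `rel` still needs looking into.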

@wildhaber

@bep I like the idea of having access to the AlternativeOutputs (name seems ok for me).

However I would like to have a direct access on an Output-Type like {{ .Page.AlternativeOutputs.<OutputTypeName> }} rather than only have the ability to loop through. Something like a map would fit well in this case.

So a developer can use this for different purposes without having to take care about the internal naming vs. standardized outputs.

AMP-Example:

# config.toml
[[output]]
name = "AMP"
mediaType = "text/html"
path = "amp"

So the value of .Name would be AMP but needs to be amphtml according to the specifications. That's why accessing as follows is crucial:

{{ if ne .Page.AlternativeOutputs.amp nil }}
<link rel="amphtml" href="{{ .Permalink }}">
{{ end }}

.ics-Example

#config.toml
[[output]]
name = "Calendar"
mediaType = "text/calendar+ics"
isPlainText = true

Creating a link in the document to download the .ics file directly like:

<a href="{{ .Page.AlternativeOutputs.ics.Permalink }}">Add to the Calendar.</a>

In cases of .ics we might also need to think about customizing the Protocol in the configuration like webcal:// in this case.

For example:

#config.toml
[[output]]
name = "Calendar"
mediaType = "text/calendar+ics"
isPlainText = true
protocol = "webcal://"

I think the two examples above show the flexibility that is needed for custom outputs. But on the other hand Hugo developers would really gain an amazing feature.
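The two requests above, direct access by name and a configurable protocol, could look roughly like this in Go. Everything here is a sketch under assumptions: the `Outputs` type, its `Get` method, the `Rel` field, and `withProtocol` are hypothetical names, not part of this PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Output describes one rendered alternative of a page (hypothetical type).
type Output struct {
	Name      string
	Rel       string // assumed field holding the standardized rel value
	Permalink string
}

// Outputs supports both ranging and lookup by name.
type Outputs []Output

// Get returns the output with the given name, ignoring case.
func (o Outputs) Get(name string) (Output, bool) {
	for _, out := range o {
		if strings.EqualFold(out.Name, name) {
			return out, true
		}
	}
	return Output{}, false
}

// withProtocol swaps the URL scheme, e.g. "webcal://" for calendar links.
func withProtocol(permalink, protocol string) (string, error) {
	u, err := url.Parse(permalink)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.TrimSuffix(protocol, "://")
	return u.String(), nil
}

func main() {
	outs := Outputs{
		{Name: "AMP", Rel: "amphtml", Permalink: "https://example.com/post/amp/"},
		{Name: "Calendar", Rel: "alternate", Permalink: "https://example.com/events/launch/index.ics"},
	}
	if amp, ok := outs.Get("amp"); ok {
		fmt.Printf("<link rel=%q href=%q>\n", amp.Rel, amp.Permalink)
	}
	if cal, ok := outs.Get("calendar"); ok {
		link, _ := withProtocol(cal.Permalink, "webcal://")
		fmt.Println(link)
	}
}
```

Keeping the internal name separate from the `rel` value addresses the AMP vs. amphtml mismatch without forcing templates to hard-code it.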

@bep bep changed the title from "WORK IN PROGRESS: Add custom output types" to "Add Custom Output Types to Hugo" Mar 6, 2017
@bep bep force-pushed the custom-output branch 3 times, most recently from 2c7ba73 to 9a3f3b9 Compare March 6, 2017 20:36
bep added 23 commits March 27, 2017 10:55
This isn't meant to be the final user docs on this feature!
To make it super-easy to create rel-links.
And remove the now superfluous setPageURLs method.
And some other unused fields and methods.
Using it for list pages doesn't work and has potential weird side-effects.

The user probably meant to range over .Site.RegularPages, and that is now marked clearly in the log.
@bep bep merged commit 4923273 into gohugoio:master Mar 27, 2017
@bep bep moved this from In progress to Merged in Custom Output Formats Mar 27, 2017
@egardner

Hi @bep – I'm late to this discussion but it sounds very promising!

To clarify, once this feature is finished, will Hugo support building pages in arbitrary formats and with arbitrary file extensions? I'm working on building some tooling on top of Hugo that could output a site as an EPUB, a process that requires building XML files with specific extensions like package.opf, toc.ncx, xhtml files, etc. – if this capability is about to land in Hugo that would be very exciting.

Thanks for all the great work on this project!

@bep
Member Author

bep commented Mar 30, 2017

@egardner -- short answer: Yes, but I have not added the option to add your own MIME-types and output formats, but that will land in a couple of days.

@github-actions

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 17, 2022