User-defined taxonomies #246

Closed
sjml opened this issue Mar 7, 2018 · 27 comments

4 participants
@sjml
commented Mar 7, 2018

(I reached out to @Keats on Twitter to ask about this, and he said to open an issue here, so here it is.)

I was considering starting work on a PR to allow for user-defined taxonomies. What I have in mind is fairly simple (and what I already have implemented in my own, very basic Python SSG), but I don't want to start in on it if it's the wrong thing or not wanted.

Like in Hugo, you declare the singular and plural form of the taxonomy in the config.toml. (From here out, I don't know if this matches or diverges from Hugo's behavior; I never went down the road too far on it because I was implementing my own by then.)

When parsing frontmatter, if any of the fields match a taxonomy name, the page or section is added to that taxonomy under the given term or terms, passed through a slugify text filter to make sure they can live in a URL.
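
A minimal sketch of that front-matter step, assuming a simplified page model. `Page`, `slugify`, and `collect_terms` are hypothetical names for illustration only and not Gutenberg's actual API, and the slugifier is deliberately ASCII-only:

```rust
use std::collections::BTreeMap;

/// Turn a term such as "John Smith" into a URL-safe slug ("john-smith").
/// ASCII-only on purpose; a real slugifier would also transliterate accents.
fn slugify(term: &str) -> String {
    let mut slug = String::new();
    for c in term.chars() {
        if c.is_ascii_alphanumeric() {
            slug.push(c.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            // Collapse any run of other characters into a single hyphen.
            slug.push('-');
        }
    }
    // Drop a trailing hyphen left by trailing punctuation or spaces.
    slug.trim_end_matches('-').to_string()
}

/// Hypothetical page model: a path plus list-valued front-matter fields.
struct Page {
    path: String,
    front_matter: BTreeMap<String, Vec<String>>,
}

/// Group page paths by slugified term for one taxonomy (e.g. "authors").
fn collect_terms(pages: &[Page], taxonomy: &str) -> BTreeMap<String, Vec<String>> {
    let mut terms: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for page in pages {
        if let Some(values) = page.front_matter.get(taxonomy) {
            for value in values {
                terms
                    .entry(slugify(value))
                    .or_default()
                    .push(page.path.clone());
            }
        }
    }
    terms
}
```

Each key of the returned map would correspond to one generated term page, and the set of keys to the list page for the taxonomy.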

Where this becomes useful is being able to declare, for instance, that author = "authors" is a taxonomy, so that author data in frontmatter generates a page where you can see all content from a specific author, without having to do any additional work. It makes http://example.com/authors into a list of the authors.

This is something that could be accomplished via tagging, but it keeps things a bit cleaner to be able to use possibly-existing data.

Now it could be that I've simply overlooked or misunderstood some part of the Gutenberg documentation -- if this is easily accomplishable with existing functionality, please let me know!

If not, I'm happy to start making an effort at adding the functionality, provided @Keats gives a thumbs up to the idea.

@Keats

Collaborator

commented Mar 7, 2018

Ah I see, didn't think of that. It is kind of doable right now but you would need to add the authors manually as a section, something like:

-authors
   - _index.md
   - author_x.md
   - author_z.md

And have the _index.md render a custom authors.html template that loops through its "pages" to render a list of authors. Each author_X.md would have the name used in the articles in its front-matter and would render an author.html that uses https://www.getgutenberg.io/documentation/templates/overview/#get-section to get the section and do a for loop on the pages to only display the ones by author_X.
The catch is that it doesn't really work if you have posts by the same author in several sections and want to have them ordered. The other catch is that the author page won't be created automatically; you will need to create the .md file yourself.
The advantage is that it's easy to customise each author page if you want, since each author page could use a different template, which I believe is not doable with custom taxonomies.

In short, I'm not really sure. I guess it depends on how common custom taxonomies are.

@sjml

Author

commented Mar 7, 2018

Right, it's the lack of automation that I'm most wanting to solve. It's a good question how common such things are -- "authors" feels like a natural extension, but I know Hugo's examples take it in a pretty freeform database-y direction. (It's a little overgeneralized and thus not the easiest thing to understand or configure.)

I don't know if there's a need for that kind of flexibility in most cases, but the simpler ability to collate pages by a specific taxonomic term seems pretty useful. I don't know if that applies to anything other than "author," but it's one of those things where my immediate temptation is to make it configurable. (i.e., just adding the functionality for authors would seem strange at this stage.)

@Keats

Collaborator

commented Mar 8, 2018

(i.e., just adding the functionality for authors would seem strange at this stage.)

Yeah definitely wouldn't be that way.

I guess we need to find out examples of custom taxonomies (I've never used them myself) to see what kind of usage they have before starting working on it.

Something to consider as well is that the taxonomy term will have to be in the extra section of the frontmatter, as those can be custom. Unless we add a taxonomies hashmap that would contain tags/categories as well, but that's a breaking change, so it would be for the next major version. If we add custom taxonomies though, we need to make sure tags/categories are not special-cased like right now, otherwise that would be confusing.

@sjml

Author

commented Mar 8, 2018

My instinct, as a first step, is to modify the existing categories and tags system to be generalized (i.e., the underlying code doesn't know the difference between categories and tags, just that they're taxonomies). After that, it's a question of where to pull the configuration data from -- I'd leave the thoughts on which part of the config file makes the most sense to you.

I would be interested in hearing other folks' thoughts on taxonomies, as my usage is admittedly pretty minor, too. I'd hazard a guess that Gutenberg doesn't need to allow as much flexibility there as other SSGs (its opinionated nature being one of its greatest strengths), but there would obviously be differences of thought as to where that line should be drawn.

@Keats

Collaborator

commented Mar 9, 2018

Hugo taxonomies page for reference: https://gohugo.io/content-management/taxonomies/

My instinct, as a first step, is to modify the existing categories and tags system to be generalized

Almost everything about taxonomies is already generic, just some hardcoded strings/names here and there.

First thought on the implementation:

Config file:

# A new taxonomy is defined by the plural name
taxonomies = ["tags", "categories", "authors"]

Front-matter:

[taxonomies]
# name has to match one of the config taxonomies
tags = ["something"]

To think of: do we want a way to specify that a taxonomy only takes a single item, like a category?

Currently, Gutenberg hardcodes the template names (tags.html, tag.html, categories.html, category.html). Instead, it should now look in templates/$TAXONOMY_NAME/{single,list}.html.

Hugo allows custom metadata in taxonomy terms (https://gohugo.io/content-management/taxonomies/#add-custom-metadata-to-a-taxonomy-term) but I'm not convinced it's worth it. If you do that, you might as well do the approach I outlined in the first post without dealing with custom taxonomies to get more flexibility.

Agreed on getting more feedback from taxonomy users, I just don't know where to find them!

@sjml

Author

commented Mar 11, 2018

My instinct, as a first step, is to modify the existing categories and tags system to be generalized

Almost everything about taxonomies is already generic, just some hardcoded strings/names here and there.

Ah, ok. I had been basing my previous assumptions on the find_tags_and_categories function, but I see that it's pretty flexible. Just needs to take the current hardcoded values and pull them from the configuration.

To think of: do we want a way to specify that a taxonomy only takes a single item, like a category?

This is why I was thinking of the config specifying both a singular and a plural, so that we can look for author with a single value or authors with an array. It makes the system a little more forgiving for folks who might forget that it needs to be authors = ["John Smith"] instead of author = "John Smith". It does make the code a little more complicated, though, and the way Hugo specifies it is not the most clearly declarative thing.

What is your thinking behind having [taxonomies] as a separate section in the frontmatter, as opposed to just scrubbing through it looking for the defined taxonomies? Is it a speed thing?

Agreed on getting more feedback from taxonomy users, I just don't know where to find them!

Well, we can see if this issue attracts any. 😄

@Keats

Collaborator

commented Mar 12, 2018

Ah something additional to think of if we add those custom taxonomies is whether we want to add pagination to them. Like being able to paginate each tag page for example.

@caemor

Contributor

commented Mar 29, 2018

What you have discussed here already looks pretty good.

  • Keats' first thoughts on the implementation are great.

  • sjml's more forgiving values in the frontmatter would also be a great addition. E.g. it wouldn't change the category field in the frontmatter and would still make it more flexible in general.

  • Pagination looks useful for every "field" which has more than approx. 5 to 15 posts, but it looks a bit difficult to turn on/off.
    Maybe something similar to the following to turn it on?

taxonomies_paginate = ["tags", 15]
  • It would be great to have a global function like get_taxonomy("tags") to get access to the array including all elements of that taxonomy if needed on a different page than templates/tags.html -- or make these taxonomies globally accessible variables
@Keats

Collaborator

commented Apr 25, 2018

Would be great if someone could summarize what to implement and to get more feedback on whether custom taxonomies are needed by many people since it could fit in the next release.

@sjml

Author

commented Apr 29, 2018

Would be great if someone could summarize what to implement and to get more feedback on whether custom taxonomies are needed by many people since it could fit in the next release.

I could have a go at this, but probably not for another month or so given the other commitments on my plate at the moment. Don't know what your schedule is for the next release...

@Keats

Collaborator

commented Apr 30, 2018

One month should be ok, I still have the markdown rendering to rewrite and the Witcher 3 to finish
I'll write a comment if I end up starting on it before you

@Keats Keats referenced this issue May 23, 2018

Closed

Custom RSS feeds #309

@Keats

Collaborator

commented May 23, 2018

Something else to add (from #309): enabling rss feed per taxonomy item.
Let's try to sum it up:

  • remove generate_tags_pages and generate_categories_pages from config.toml
  • add a taxonomies array of structs to the config file like the following pseudo-code
struct Taxonomy {
    name: String,            // the name used in the URL
    paginate: Option<usize>, // defaults to None
    rss: bool,               // defaults to false
}
  • templates default to templates/$TAXONOMY_NAME/{single,list}.html
  • if rss is true, generate an RSS feed at $base_url/$TAXONOMY_NAME/{some_value}/rss.xml using the existing rss.xml template
  • Add a get_taxonomy global function that returns all the TaxonomyItem for a particular taxonomy (TaxonomyItem is an existing struct in the taxonomies sub-crate)
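
As a rough illustration of the summary above, the pseudo-code struct could be fleshed out like this. The method names (`new`, `list_template`, `single_template`, `rss_url`) are assumptions made for this sketch, not functions that exist in the taxonomies sub-crate:

```rust
/// Mirrors the pseudo-code struct from the summary.
#[derive(Debug, Clone, Default, PartialEq)]
struct Taxonomy {
    name: String,            // the name used in the URL
    paginate: Option<usize>, // Default::default() gives None
    rss: bool,               // Default::default() gives false
}

impl Taxonomy {
    /// A taxonomy with only a name; pagination and RSS stay off.
    fn new(name: &str) -> Self {
        Taxonomy {
            name: name.to_string(),
            ..Default::default()
        }
    }

    /// Template listing every term: templates/$TAXONOMY_NAME/list.html
    fn list_template(&self) -> String {
        format!("templates/{}/list.html", self.name)
    }

    /// Template for a single term: templates/$TAXONOMY_NAME/single.html
    fn single_template(&self) -> String {
        format!("templates/{}/single.html", self.name)
    }

    /// Feed URL for one term, or None when `rss` is false.
    fn rss_url(&self, base_url: &str, term_slug: &str) -> Option<String> {
        if !self.rss {
            return None;
        }
        Some(format!(
            "{}/{}/{}/rss.xml",
            base_url.trim_end_matches('/'),
            self.name,
            term_slug
        ))
    }
}
```

With `rss: true` on an "authors" taxonomy and a base URL of http://blabla.com, the term "john-smith" would get its feed at http://blabla.com/authors/john-smith/rss.xml.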

So a config.toml with tags/categories/authors could look like:

base_url = "http://blabla.com"

taxonomies = [
   { name = "tags", paginate = 10 },
   { name = "categories" },
   { name = "authors", paginate = 10, rss = true },
]

And in the front-matter:

+++
title = "Hello World"

[taxonomies]
tags = ["Rust", "JavaScript"]
categories = "Dev" 
+++

For the forgiveness of having taxonomies directly in the front-matter, the issue with that is that right now the front-matter is deserialised directly into the struct, so it is easy to map front-matter <-> struct when looking at the code. If taxonomies live in a nested hashmap, we also can't break someone's site by adding a new field to the front-matter that happens to match a taxonomy name (like adding an author field, not that I'm planning to do that). I'm not entirely against it though.
Also, it wouldn't solve the category -> categories issue since we will need only one string to keep it simple and as we use the plural in the URLs...

Last issue, we can't enforce unique/multiple elements in this scheme: I could add 10 categories to a post while a post should only be in a single category. I don't think it's a big deal myself, as it's up to the users to use it how they see fit.

@caemor

Contributor

commented May 23, 2018

Your proposal looks great 👍

Add a get_taxonomy global function: detail on what that function will do is still a bit unclear to me (@caemor what do you want it to return exactly?)

  • get_taxonomy(tags) should give you the array of TaxonomyItem of the tags which is normally only available in the tags/list.html template, but might be sometimes useful elsewhere. But it should basically just be the full array of elements of that taxonomy

Last issue, we can't enforce unique/multiple elements in this scheme: I could add 10 categories to a post while a post should only be in a single category. I don't think it's a big deal myself, as it's up to the users to use it how they see fit.

  • for the unique/multiple elements part: wouldn't it be possible to enforce it by adding another field to the struct which defaults to multiple elements? Although i don't think it's very important

  • How will this issue change the section/page variables as they both have access to these taxonomies? (array of tags and just a single string for category at the moment). I assume there needs to be a breaking change there.

@Keats

Collaborator

commented May 23, 2018

get_taxonomy(tags) should give you the array of TaxonomyItem of the tags which is normally only available in the tags/list.html template, but might be sometimes useful elsewhere. But it should basically just be the full array of elements of that taxonomy

Updated

for the unique/multiple elements part: wouldn't it be possible to enforce it by adding another field to the struct which defaults to multiple elements? Although I don't think it's very important

Yeah I don't think it's important enough to bother doing it tbh

How will this issue change the section/page variables as they both have access to these taxonomies? (array of tags and just a single string for category at the moment). I assume there needs to be a breaking change there.

The whole thing will be a breaking change for users using tags/categories so it will be part of the 0.4 release (currently on the next branch). I can't think of other potential breaking changes I'd like to make though so it should be the last one for a while, unless i18n requires one.

If anyone wants to work on it, I think it should be fairly straightforward as the taxonomies code is mostly generic already. Work should be done on the next branch

@sjml

Author

commented Jun 12, 2018

So here I am much more than a month later realizing that I'll be out of useful internet range for most of the summer, so probably won't have a chance to work on this. Sorry for creating false hope, but I'm crossing my fingers that I wasn't the only one thinking about taking a bite of this...

@Keats

Collaborator

commented Jun 13, 2018

@sjml
No worries, anyone else wants to have a go at it?

@Keats

Collaborator

commented Jul 1, 2018

Last call before I pick it up!

@Keats

Collaborator

commented Jul 9, 2018

I have made good progress on it and have the following questions:

  • should we allow taxonomies without defining them in config.toml? That makes it easier to get started but you can get unwanted taxonomies created if you made a typo and we can't warn about it. I'm leaning toward requiring defining them
  • should we allow per-term rss.xml customisation? Ie putting a rss.xml in templates/$taxonomy/ would use that before looking up the default templates/rss.xml. I don't really see the point but maybe some people want it
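
To illustrate the first question: if taxonomies must be declared in config.toml, a typo in the front-matter can be reported instead of silently creating a new taxonomy. A minimal sketch, where `unknown_taxonomies` is a hypothetical helper and not something that exists in Gutenberg:

```rust
use std::collections::BTreeSet;

/// Return the taxonomy names used in a page's front-matter that are not
/// declared in the config, sorted and without duplicates.
fn unknown_taxonomies(declared: &[&str], used: &[&str]) -> Vec<String> {
    let declared: BTreeSet<&str> = declared.iter().copied().collect();
    let mut unknown: Vec<String> = used
        .iter()
        .filter(|name| !declared.contains(*name))
        .map(|name| name.to_string())
        .collect();
    unknown.sort();
    unknown.dedup();
    unknown
}
```

A non-empty result could be turned into a build error or a warning that names the page and the unexpected taxonomy, which is the check that is impossible when undeclared taxonomies are allowed.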

More like an update to the current text: if we allow pagination on taxonomy terms, we also need to add paginate_path to it to be equivalent to the pagination system in sections.

@Keats

Collaborator

commented Jul 9, 2018

@Keats

Collaborator

commented Jul 10, 2018

The WIP is there: #330

It is still missing pagination but that shouldn't be too hard to add

@Keats

Collaborator

commented Jul 12, 2018

The PR linked above should now have all the elements of the proposal @caemor @sjml

I removed get_taxonomy_url in favour of get_taxonomy which should be more general

@Keats Keats added the done label Jul 17, 2018

@Keats

Collaborator

commented Jul 24, 2018

It would be super helpful if people interested in custom taxonomies could give a try to the next branch before I release 0.4. It wouldn't be nice to have to release 0.5 because I missed something!

@codesections

Contributor

commented Jul 27, 2018

I've been testing this out on the next branch. A few minor comments:

  • I'm not a big fan of having the taxonomy templates live at templates/$TAXONOMY_NAME because that could result in a very cluttered templates directory if users have multiple taxonomies. Could it be templates/taxonomies/$TAXONOMY_NAME instead?
  • There doesn't currently appear to be a default template, even though the docs suggest that there is. I'm not sure if one is needed.
  • Should the term.pages objects have a weight field in case users want to sort items in the taxonomy by weight?
  • Confusingly, get_taxonomy did not work, but get_taxonomy_url worked and returned the full taxonomy

Finally, one bigger-picture question. Would it be possible to have earlier_in_taxonomy/later_in_taxonomy variables accessible on the taxonomy object? (Or, I guess, heavier_in_taxonomy/lighter_in_taxonomy if the pages are sorted by weight). The use case I'm thinking of is if someone wants to have separate series of posts in a blog and include a link to the next/previous entry in the series. For example, on your blog you might have all the Gutenberg-related posts in a gutenberg category/series and then want to have a link to the previous/next Gutenberg-related post. I tried to set this up with the existing behavior, and I got it working with the code below, but it was a bit hairy and made me wonder if there should be an easier way.

    {% if page.taxonomies.series %}
      {% set tax=get_taxonomy_url(kind="series", name=page.taxonomies.series) %}

      {% for tax_page in tax.items[0].pages %}
        {% if page.path == tax_page.path %}
          {% set_global series_number = loop.index0 %}
        {% endif %}
      {% endfor %}

      {% set next_in_series = series_number - 1 %}
      {% set previous_in_series = series_number + 1 %}
      {% set series = tax.items[0].pages %}
      {% if series[next_in_series] %}
        <a href="{{ series[next_in_series].permalink }}">Next in Series</a>
      {% endif %}
      {% if series[previous_in_series] %}
        <a href="{{ series[previous_in_series].permalink }}">Previous in Series</a>
      {% endif %}
    {% endif %}
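
Stripped of template syntax, the lookup above amounts to "find the current page in an ordered list and take its neighbours". A sketch of that logic in Rust, assuming the pages are ordered oldest first (the template appears to assume the opposite order, hence its index - 1 for "next"); `neighbours` is a hypothetical helper, not an existing Gutenberg function:

```rust
/// Find the entries before and after `current` in an ordered list of page paths.
/// Returns (previous, next); either side is None at the ends of the list,
/// and both are None when `current` is not in the list.
fn neighbours<'a>(series: &'a [&'a str], current: &str) -> (Option<&'a str>, Option<&'a str>) {
    match series.iter().position(|path| *path == current) {
        Some(i) => {
            // Guard the subtraction so the first entry has no predecessor.
            let previous = if i > 0 { series.get(i - 1).copied() } else { None };
            let next = series.get(i + 1).copied();
            (previous, next)
        }
        None => (None, None),
    }
}
```

Unlike the template version, the bounds are checked explicitly, so the first and last posts in a series simply get no link on that side.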
@Keats

Collaborator

commented Jul 27, 2018

I'm not a big fan of having the taxonomy templates live at templates/$TAXONOMY_NAME because that could result in a very cluttered templates directory if users have multiple taxonomies. Could it be templates/taxonomies/$TAXONOMY_NAME instead?

I'm not sure about that, I'm guessing most sites will have 1-2 taxonomies so it isn't a big deal.

There doesn't currently appear to be a default template, even though the docs suggest that there is. I'm not sure if one is needed.

Fixed the docs

Should the term.pages objects have a weight field in case users want to sort items in the taxonomy by weight?

Sorting in taxonomies is another can of worms that I hope to defer for as long as possible

Confusingly, get_taxonomy did not work, but get_taxonomy_url worked and returned the full taxonomy

Fixed

Would it be possible to have earlier_in_taxonomy/later_in_taxonomy variables accessible on the taxonomy object?

I don't think it makes too much sense. Taxonomies are just a way to group content: when you are on the actual page, it will be rendered using the page.html template or a custom template, not using the taxonomies templates, so showing a previous/next in taxonomy is a bit weird. If this is for a series, for example, it would make more sense for it to be a section than a taxonomy.

@codesections

Contributor

commented Jul 27, 2018

Thanks, all that makes sense.

On the last point, let me flesh out the use-case I had in mind. Take the blog post screenshotted below as an example:

[image: screenshot of the blog post]

It's part 8 of 9 in a series of posts from a blog that has ~10 categories. Posts in the same category aren't published consecutively but all form a cohesive unit when taken together. Thus, it makes sense to have a link to the next/previous post in the category even if they aren't consecutive (and you can see those links in the screenshot). But, on the main blog page, the feed of posts is strictly chronological.

Right now, I don't think gutenberg gives a very good way to deal with this scenario. You can have all the blog posts in a single blog section and put them in separate categories. But then you run into the problem in my code snippet above, where you need to (from the page.html template) import the taxonomies object and then loop over it oddly to get the next/previous page in the category.

Or, as you suggest, you could have each category in a separate section. But that would just move the difficulty around: then you'd need to do something fancy in the template for the blog page to get all posts showing up in a single chronological feed. And you'd have pretty messy URLs: e.g., you might have www.example.com/blog/programing/post-title for one post and www.example.com/random-thoughts/post-title for another. It feels wrong to push people into different sections to get different categories.

All that said, the work-around I found above works fine, even if it's a bit inelegant. So it's not something I feel that strongly about; I just wanted to flesh out the use-case a bit to show you what I was thinking about.

@Keats

Collaborator

commented Jul 27, 2018

For that kind of case, I would probably use a custom template for those series posts rather than use taxonomies. I can see the use though, maybe in a later version

@codesections

Contributor

commented Jul 27, 2018

My testing turned up a couple of other, minor issues that might not be worth changing. Other than that, this looks great!

  • All taxonomies are global and have a URL at the root of the site. This could potentially present two minor problems:
    • Taxonomies might be combined when users don't expect them to be. For example, if an ecommerce site had a products section and a blog section, and tagged pages in each section, they might expect to have a /products/tags page and a /blog/tags page, but they'd just get a /tags page that has both products and blog posts.
    • The URL might not be what site visitors expect (e.g., it'd be /tags instead of /blog/tags)
  • Second (and somewhat related) taxonomies can shadow pages/sections of the site. For example, if a user has an authors page and an authors taxonomy, both will live at /authors. Right now, whichever page was last modified earlier will be inaccessible, and there won't be any warning that this is occurring. But this seems like an edge case.
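
A check for this kind of shadowing could look roughly like the following. It is only a sketch: `shadowed_paths` and `normalise` are made-up names, and it assumes taxonomies always live at the root of the site as described above:

```rust
use std::collections::BTreeSet;

/// Treat "/authors/", "/authors" and "authors" as the same path.
fn normalise(path: &str) -> String {
    path.trim_matches('/').to_string()
}

/// List the URL paths claimed by both a taxonomy and a page or section.
fn shadowed_paths(taxonomies: &[&str], content_paths: &[&str]) -> Vec<String> {
    let content: BTreeSet<String> = content_paths.iter().map(|p| normalise(p)).collect();
    taxonomies
        .iter()
        .map(|t| normalise(t))
        .filter(|t| content.contains(t))
        .map(|t| format!("/{}", t))
        .collect()
}
```

Anything returned here could be surfaced as a warning at build time, so that neither output silently overwrites the other.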

@Keats Keats closed this Aug 3, 2018
