Arbitrary collection support #1941

Closed
benbalter opened this Issue · 34 comments
@benbalter

Breaking out the notes from #1929 so we can discuss the idea with more granularity and (hopefully) track its implementation.

The basic idea is that right now Jekyll recognizes two types of content: posts and pages. Pages can live anywhere, while posts must live in a _posts folder and must be named with a specific file format (0000-00-00-title.md). Both posts and pages can have tags, but only tag or category YAML front matter from posts makes its way into site.tags and site.categories. Essentially, we've got two ways to describe very similar things.

Instead, let's make the basic unit of Jekyll a CollectionObject (or something more creatively named). Think of this as similar to a Node in Drupal or a Post in WordPress. Out of the box, Jekyll will ship with Post and Page Collections initialized, but a user could introduce any number of additional ones.

CollectionObjects would contain the basic logic (yaml front matter, markdown, permalinks, etc), and could be extended by core collections. Posts for example, would have specific file names, Pages could live anywhere.
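
A minimal Ruby sketch of that hierarchy, for illustration only. The class layout, method names and the file-name pattern are assumptions, not Jekyll's actual implementation:

```ruby
require "date"

# Hypothetical base class: holds what every item shares (path, front matter, body).
class CollectionObject
  attr_reader :path, :data, :content

  def initialize(path, data = {}, content = "")
    @path = path
    @data = data
    @content = content
  end

  # Default title: the front matter title, else the file name without extension.
  def title
    data.fetch("title") { File.basename(path, ".*") }
  end
end

# Posts add the date-in-file-name convention (YYYY-MM-DD-title.ext).
class Post < CollectionObject
  FILENAME = /\A(\d{4}-\d{2}-\d{2})-(.+)\.[^.]+\z/

  def date
    Date.strptime(File.basename(path)[FILENAME, 1], "%Y-%m-%d")
  end

  def title
    data.fetch("title") { File.basename(path)[FILENAME, 2] }
  end
end

# Pages add nothing in this sketch: they may live anywhere in the site.
class Page < CollectionObject
end

post = Post.new("_posts/2014-01-15-hello-world.md")
puts post.date   # 2014-01-15
puts post.title  # hello-world
```

A user-defined collection such as puppies could then reuse CollectionObject directly, without any post-specific date logic.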

To implement a new collection (let's say "Puppies") a user would add the following to their _config.yml file:

collections:
  - puppies

This would tell Jekyll to look in the _puppies folder (_#{collection_name}) for markdown (or other) files. In that _puppies folder, we could have rover.md and spot.md, and they'd be available as site.puppies (site.posts and site.pages would continue to work as well). I could then iterate, filter, do whatever. Any file in the _puppies folder will be read in, with or without YAML front matter.
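
As a rough sketch of that lookup, assuming a hypothetical read_collections helper (not a Jekyll API) that only lists file names and does not parse front matter:

```ruby
require "tmpdir"
require "yaml"

# Hypothetical reader: maps each configured collection name to the files
# found in the matching "_<name>" folder, sorted by file name.
def read_collections(source, config)
  config.fetch("collections", []).map do |name|
    files = Dir.glob(File.join(source, "_#{name}", "*")).select { |f| File.file?(f) }
    [name, files.map { |f| File.basename(f) }.sort]
  end.to_h
end

Dir.mktmpdir do |source|
  Dir.mkdir(File.join(source, "_puppies"))
  # One file without front matter, one with: both are read in.
  File.write(File.join(source, "_puppies", "rover.md"), "Good dog.\n")
  File.write(File.join(source, "_puppies", "spot.md"), "---\nbreed: beagle\n---\nAlso a good dog.\n")

  config = YAML.safe_load("collections:\n  - puppies\n")
  site = read_collections(source, config)
  p site["puppies"] # ["rover.md", "spot.md"]
end
```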

You could also have kittens as _kittens and cars as _cars just as easily:

collections:
  - puppies
  - kittens
  - cars

If you want each puppy rendered (e.g., as site.com/puppies/rover.html or puppies/rover/index.html), you'd add it to a render array:

collections:
  - puppies

render: 
  - puppies

Again, by default, core collections (pages, posts) are rendered just like before. The idea here is that if a user is using Posts for something other than blog posts, or hacking Pages to be categorized, Jekyll has already failed them. Instead of being categories of pages (or posts), they should be a Collection.

Here are the full, non-prose notes from the meatspace brainstorming session:

Collections

  • If people are using blog posts for a non-blog post thing, Jekyll has already failed
  • By default, Jekyll registers [posts, pages]
  • Can describe additional collections in _config.yml
  • Can describe render in _config.yml, which by default contains pages, posts
  • Collections rendered, render to /collection/#{object} - maybe give it a path?
  • Jekyll looks to “/_#{collection}”
  • Files read in as site.#{collection}
  • Collection object basic functionality, extended by posts and pages methods
  • Anything in _collections folder is processed as a thing, much like _posts
  • Posts, pages extends Collection (Ruby Object) (or something)
  • Posts read tags, categories in to site.tags and site.categories, have dates in file names
  • Pages don’t live in _collections folder

h/t goes to @parkr for seeding the idea.

@konklone

Makes sense to me. Why are "posts" treated separately anyway? (I haven't used them, only pages.) Could you genericize this by merging "posts" back into "pages"?

@lukewil

I think this is an absolutely fantastic idea. I use Jekyll to publish books on the web, where Jekyll page == book chapter, and with collections implemented this way every author would have a collection all his own.

Just from my days dealing with custom post types in WP, CollectionObject is probably a scary name from a new user's point of view, but it's a heck of a lot better than calling them post types and then having a built in post type called posts!

@benbalter

Why are "posts" treated separately anyway?

In practical terms, posts have dates and are sorted reverse chronologically. They also live in a _posts folder so that they can be identified as posts.

Could you genericize this by merging "posts" back into "pages"?

That's the idea. We'd have a basic CollectionObject (or whatever), which would be extended by Posts and Pages to provide use-case-specific functionality such as date handling.

Pages would be extended to allow them to live anywhere (e.g., not in a new _pages folder or something)... the existing behavior.

Posts would be extended to support date logic and to continue to pipe tag/category metadata into the site.tags and site.categories name spaces.

Think of it similar to WordPress's custom post type support, but for a static site.

@benbalter

CollectionObject is probably a scary name from a new user's point of view

Agreed. I like Collections, which is intuitive to non-technical users, but the individual thing collected needs to be called something in the abstract... I just don't know what.

Glad to put up a personal bounty of a (cup | glass | mug | shot | pint) of your favorite beverage to anyone who thinks of the better name. :smile:

@drinks

I love this idea. Is Document too ambiguous a name for a collection's instance object?

@stevector

Sculpin, a Jekyll-like tool built in PHP, just added a "Content Type" concept that sounds quite like what is getting called a "collection" here. I suggested "content type" for Sculpin because I work primarily in Drupal and that is Drupal's name for the concept (well, also "node type"). Here is the pull request @simensen just merged yesterday: sculpin/sculpin#96

@scribu

If people are using blog posts for a non-blog post thing

Example?

@stevector

Hi @scribu, here's an example: the healthcare.gov "front end" which was built in Jekyll (http://developmentseed.org/blog/new-healthcare-gov-is-open-and-cms-free/) seemed to have a content-type-like concept within its _posts directory. The canonical repo was pulled down but I've still got a fork: https://github.com/stevector/healthcare.gov/tree/master/_posts

@simensen

It is fun to see this thread here. I'd be happy to talk about Sculpin's implementation or answer questions about how we ended up with what we did.

In Sculpin, everything was always considered a Source (all of the files) and Posts were always just a special case of a Source. That is, Posts were actually Sources themselves. The "collection" of Posts was really just a special collection of sources.

Initially there were a lot of post-specific classes (PostsCollection, and PostItem, which wrapped a Source), but with the Content Types stuff I've finally boiled it down to just "Content Types", and you can now define them via a kernel configuration. It uses a lot of magic to determine defaults (projects are found in _projects and automatically select the layout "project"), but most of that can be overridden.

I opted to handle taxonomies on a type-by-type basis rather than site-wide. This means that if you have a configuration like:

sculpin_content_types:
    posts:
        taxonomies: [tags, categories]
    projects:
        taxonomies: [tags]

... you end up with five data providers: posts, posts_tags, posts_categories, projects and projects_tags. This means taxonomies are grouped with their type.
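
To make the naming rule concrete, here is a small Ruby sketch (Sculpin itself is PHP; the data_providers helper is purely hypothetical) that derives the provider names from a configuration shaped like the one above:

```ruby
# Hypothetical helper: one provider per content type, plus one per
# (type, taxonomy) pair, named "<type>_<taxonomy>".
def data_providers(content_types)
  content_types.flat_map do |type, options|
    [type] + options.fetch("taxonomies", []).map { |taxonomy| "#{type}_#{taxonomy}" }
  end
end

content_types = {
  "posts"    => { "taxonomies" => %w[tags categories] },
  "projects" => { "taxonomies" => %w[tags] }
}

p data_providers(content_types)
# ["posts", "posts_tags", "posts_categories", "projects", "projects_tags"]
```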

Adding site-wide taxonomies after the fact would be easier than trying to make them global to begin with and later deciding, "we want to support a separate set of tags for posts and projects" down the road.

@troyswanson
Jekyll member

How would the "Collections" feature sort files within a directory? For instance, in the example @benbalter gave, how would you tell the system that you wanted "spot" to appear first and "rover" to appear second? This is the same question I posed in #670 and didn't really get a good answer.

@benbalter

How would the "Collections" feature sort files within a directory?

Great question. I'd imagine alphabetically (or however the file system sorts it) within the site.#{collection} array, and ideally, we'd empower the edge to sort. E.g., a sort_by filter, to sort by the value of a given YAML property. Dumb core, smart edge. Allow for maximum flexibility across use cases.
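
For illustration, a sort_by-style helper might look like the following Ruby sketch. The name sort_by_property and the use of plain hashes for documents are assumptions, not an existing Jekyll filter:

```ruby
# Hypothetical sort_by helper: orders documents (plain hashes of front matter)
# by one property; documents lacking the property keep their order and go last.
def sort_by_property(documents, property)
  with, without = documents.partition { |doc| doc.key?(property) }
  with.sort_by { |doc| doc[property] } + without
end

puppies = [
  { "name" => "rover", "order" => 2 },
  { "name" => "spot",  "order" => 1 },
  { "name" => "rex" }
]

p sort_by_property(puppies, "order").map { |doc| doc["name"] }
# ["spot", "rover", "rex"]
```

The core would still hand over site.puppies in its default order; the template decides whether and how to reorder it.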

@afeld

How would the "Collections" feature sort files within a directory?

I'd imagine alphabetically (or however the file system sorts it) within the site.#{collection} array, and ideally, we'd empower the edge to sort

Yep, sounds consistent to me: #1848 and #1849

@simensen

I wanted to ensure that collections could be sorted arbitrarily. One set (posts) might be pathname based, and another (say, projects) might be sorted based on title.

I was able to achieve this in a pretty nice way with the DiC I'm using (pasting the link to the commit below) but before I went that route I was considering making a bunch of sorters that you could configure:

collections:
  puppies:
    sort: meta
    meta_sort_key: breed
  kittens:
    sort: meta
    meta_sort_key: breed
    direction: ascending
  cars:
    sort: meta
    meta_sort_key: [size, name]
  posts:
    sort: filename
    direction: ascending

By having the collection type as a keyed array, you can have more flexibility down the road. For example, you could set the default layout or permalink strategy based on the type.

collections:
  puppies:
    sort: meta
    meta_sort_key: breed
    layout: puppy
    permalink: animals/puppies/:breed/:name

sculpin/sculpin@5043086
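
A rough Ruby sketch of how such configured sorters could be applied. This is not Sculpin's implementation; the sort_collection helper, the document shape and the "descending" direction value are all assumptions for illustration:

```ruby
# Hypothetical sorter driven by per-collection options like those above.
# "meta" sorts by one or more front matter keys; anything else sorts by file name.
def sort_collection(documents, options)
  keys = Array(options["meta_sort_key"])
  sorted =
    if options["sort"] == "meta"
      documents.sort_by { |doc| keys.map { |key| doc["data"][key] } }
    else
      documents.sort_by { |doc| doc["filename"] }
    end
  # Assumption: "descending" is the counterpart of the "ascending" value shown above.
  options["direction"] == "descending" ? sorted.reverse : sorted
end

cars = [
  { "filename" => "golf.md",  "data" => { "size" => 2, "name" => "golf" } },
  { "filename" => "mini.md",  "data" => { "size" => 1, "name" => "mini" } },
  { "filename" => "astra.md", "data" => { "size" => 2, "name" => "astra" } }
]

options = { "sort" => "meta", "meta_sort_key" => %w[size name] }
p sort_collection(cars, options).map { |doc| doc["filename"] }
# ["mini.md", "astra.md", "golf.md"]
```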

@troyswanson
Jekyll member

I'm afraid I have to be blunt - that solution is a pain in the ass.

The way I've seen a lot of file-based CMSs handle sorting is to use an integer value at the beginning of the filename. So, for instance, the file structure might look like this:

/_puppies
    01-spot.md
    02-rover.md

I suppose you could leave the sort_by filter in there in case there is a real reason for sorting by some random value (that isn't something obvious like a key/value pair like order: 2). Imo, this method eliminates so much configuration and annoyance that will very likely crop up as a result of this.
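
A minimal Ruby sketch of the filename-prefix convention, assuming a hypothetical ordered_names helper and the NN-name.ext pattern shown above:

```ruby
# Hypothetical helper: splits "01-spot.md" into a numeric position and a name.
# Files without a numeric prefix get no position and sort after the others, by name.
def ordered_names(filenames)
  entries = filenames.map do |filename|
    base = File.basename(filename, ".*")
    match = base.match(/\A(\d+)-(.+)\z/)
    match ? [match[1].to_i, match[2]] : [Float::INFINITY, base]
  end
  entries.sort.map { |_position, name| name }
end

p ordered_names(["02-rover.md", "10-rex.md", "01-spot.md", "fido.md"])
# ["spot", "rover", "rex", "fido"]
```

The prefix is compared as a number, so 10-rex.md sorts after 02-rover.md even without consistent zero padding.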

EDIT:
@simensen, I wasn't referring to your solution as a pain. That said, I'm firmly of the mind that the complexity of your solution is entirely too high for what most use cases of Jekyll are (anecdotally, obviously - I have no data around that claim). I'm probably being idealistic when it comes to simplicity, but this all seems very difficult to set up and train folks on. I'm not usually a fan of :sparkles: magic, but when it's obvious what is happening, it's simple, and it's consistent, I'm for it.

@budparr

I use posts for anything date-based and segregate various post types by category. I use category folders ('/category-name/_posts/yyyy-mm-dd-post-name.md') for organization and content creator convenience. Sounds very much like I would be able to replace that setup with collections, but do collections care whether a CollectionObject is date-based or not, or whether a collection has mixed content?

I'm working on a project where several different post types will fit into a common category (or would-be collection): under a high-level category I have multiple date-based discussion posts surrounding a common (to a group) date-based question (and possibly even pages with background information). I'm separating them out using posts and pages along with categories. I think collections would help here because they would give me a higher-level hierarchy to start with: I could have categories within collections (that is, be able to filter posts down from collection to categories within the collection).

The current (in progress and subject to change) hierarchy is here: https://github.com/sonnetmedia/prototype-iliadx-aba/tree/master/international-human-rights-law

Hope that's helpful to the discussion. Your explanation is certainly helpful after I puzzled over the original vision post for a while!

@mattr-
Jekyll member

If people are using blog posts for a non-blog post thing
Example?

Jekyll's own site used to consist of posts instead of pages before @benbalter converted it to use pages in e1f0496

@mattr-
Jekyll member

I'm afraid I have to be blunt - that solution is a pain in the ass.

Like @benbalter said, "Dumb core, smart edge." If a user doesn't like the default sorting (whatever that happens to be) then they have the power to change it. Or they can disregard any sorting entirely and implement navigation similar to how we implement it on our own site.

I'm afraid I don't see what's so wrong with allowing the user to sort on their own.

@troyswanson
Jekyll member

@mattr- I understand that adage and it's totally fair, but how edge is it to want to sort pages? When it comes to documentation of any kind, I imagine sorting is of paramount importance.

I'm not saying there is something so wrong about allowing the user to sort on their own - in fact I mentioned that there's no reason why the sort_by filter would have to go away so that folks could still do it that way. I guess my problem with it is that it's not obvious and it's always manual. Any kind of sorting activity will require at least some level of custom code.

By the way, my side discussion about sorting might be taking away from the original conversation about collections, so I'm going to stop talking about it. I'm all for collections, I just want them to be well-organized. :smiley:

@mattr-
Jekyll member

Other than a default alphabetical sort order (for pages at least), Jekyll doesn't want to be responsible for how a user organizes their content.

@penibelst
Jekyll member

Some of the mentioned problems are based on people's bad URL design. But I feel that arbitrary collection support would simplify Jekyll’s model, which is a nice step.

  1. How do _data fit into this model?
  2. What about default front-matter?
  3. What about collections of collections? (Xzibit mode)

@cobyism
Jekyll member

In terms of terminology used for this idea, is there a reason Model isn't good? A lot of people will be familiar with the term from Rails/other MVC frameworks, and I think it's pretty much exactly the same concept when you get down to it. For example, adding the "puppies" example from the top to the DB of a Rails app would be done with something like:

rails generate model puppy name:string [...]

Thoughts?

@circa1977

I've been keeping an eye on Jekyll for the right project or some upcoming publishing goals. I've been struck by the impression that only one... collection was supported (blog) in addition to pages, when many sites need a blog, news/press, and events, for example. This feels right to me.

I've generally no interest in seeing things become complicated and abstract, but it's unrealistic for every Jekyll-powered site to be limited to 1 blog and some pages.

@penibelst
Jekyll member

@circa1977 and the others say that Jekyll is

limited to 1 blog and some pages

I can’t believe what I read. I hope all of you know by heart Tim Berners-Lee’s "Cool URIs don't change" from 1998. Remember, it’s not necessary to put every single piece of information in the URL.

Please, please, show me one existing static website you cannot build with Jekyll’s posts and pages.

@circa1977

@penibelst

That's not quite what I said, so you shouldn't believe what you read. I said:

I've been struck by the impression that only one collection was supported (blog) in addition to pages

If you rely on the docs, you can create a set of blog posts or pages. Having a generic entity for multiple items alongside each other -- news, events, blog posts, products -- as well as specific pages would be a more practical solution, and a better documented method of building robust websites with multiple content collections.

If that's possible today, I also pointed out that:

I've been keeping an eye on Jekyll

I haven't built something with Jekyll yet. But I have read the docs. And the proposal here addresses a perceived shortcoming that is clear in the docs. If there's a way to do it today, it should be in the documentation. If it's not there because it's a roundabout way, I'm voting in favor of making it more straightforward.

I'm not entirely sure where I made any implications about what should go in a URL.

@wdenton

This seems like a great idea. Making pages and posts the same kind of thing (but still letting them be used as they are now, in the core), while allowing other varieties of that generic thing, and the means to handle them, would be very nice.

@afeld

Couldn't the collections be implicit based on the folder name? In other words, if there are a bunch of .md or .html files under a _puppycats directory, they could be collected under site.puppycats or site.collections.puppycats.
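
A hedged Ruby sketch of what folder-based discovery could look like. The implicit_collections helper and the list of skipped folders are assumptions for illustration, not Jekyll behavior:

```ruby
require "tmpdir"

# Assumption: folders Jekyll already gives a special meaning are skipped.
RESERVED_FOLDERS = %w[_data _includes _layouts _plugins _site].freeze

# Hypothetical discovery: every other underscore-prefixed directory directly
# under the source becomes a collection named after the folder.
def implicit_collections(source)
  Dir.children(source)
     .select { |entry| entry.start_with?("_") && File.directory?(File.join(source, entry)) }
     .reject { |entry| RESERVED_FOLDERS.include?(entry) }
     .map { |entry| entry.delete_prefix("_") }
     .sort
end

Dir.mktmpdir do |source|
  %w[_puppycats _layouts about].each { |dir| Dir.mkdir(File.join(source, dir)) }
  File.write(File.join(source, "_config.yml"), "")
  p implicit_collections(source) # ["puppycats"]
end
```

The need for a reserved list is one cost of this approach: any new underscore folder silently becomes a collection unless it is explicitly excluded.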

@penibelst
Jekyll member

I haven't built something with Jekyll yet.

I noticed.

@parkr
Jekyll member

Couldn't the collections be implicit based on the folder name?

Interesting proposition, but that introduces a bit of reverse magic I'd like to avoid. We do that for data already and it feels weird to me.

@benbalter

A lot of people will be familiar with the term from Rails/other MVC frameworks

Totally agree, but the Venn diagram of Jekyll users and Rails users may not overlap as well as it does with other Ruby frameworks, say Sinatra.

I'd rather optimize for somewhat non-technical users. It's a lot easier to say a Frindle is like a Model, than to try to explain MVC to someone who's never been exposed.

Document? File? The bounty still stands. :smile:

@lukewil

I couldn't agree more, Ben. I had never done any work in Rails (or even Ruby) before I got started with Jekyll last Oct/Nov, and it would have been utterly confusing for me to be reading about "models" on day one.

@cap10morgan

Couldn't the collections be implicit based on the folder name?

:thumbsup: @afeld. Convention over configuration is the way to go!

@parkr referenced this issue in merged pull request "Call me collections" #2199 (6 of 6 tasks complete)
@parkr closed this in #2199

@pmackay

My understanding is that collections support doesn't include categories or tags. Is that true? Is it by design? Or would adding that support be a valid new feature request? It would be useful IMHO.

@parkr
Jekyll member

My understanding is that collections support doesn't include categories or tags. Is that true? Is it by design? Or would adding that support be a valid new feature request? It would be useful IMHO.

You're right – at the moment, the categorization of documents is not by traditional categories or tags, but rather by which collection they belong to. We thought this was a better idea – all related posts could be categorized in the filesystem, giving at-a-glance clarity to the category of a particular document.

@melborne referenced this issue in jekyllrb-ja/jekyllrb-ja.github.io, in merged pull request "First draft translation of collections.md" (collections.md初版翻訳) #193

@erlend-sh

Would love to see this going into the 3.0 rewrite #2636, as it would certainly remedy most of my confusions expressed in #2391.

File and Document both have my vote. I'll add Item to the candidates.
