WIP: refining/clarifying data dir functionality #4379
Closed
[EDIT: The YAML specific issues previously raised herein were addressed by #4402, allowing me to narrow this issue.]
I started by looking into #4138, #3890, #4366, #4083 and #2441, but of course that led deeper into the rabbit hole of how Hugo is supposed to work or how it should work. I believe the analysis below is worth making and important. Even if the answer is to keep everything as it is, the clarifications I make should probably make it into the Hugo documentation. But the length of my attempt at clarification below is an indication that the current behavior might be too complicated. Worth repeating a little more loudly:
I chose to do a PR so I could include code that demonstrates current behavior. If it is decided to change this behavior, I could amend this commit with such changes.
Without further ado…
current Hugo behavior
Hugo loads data files into a data tree rooted in the `.Site.Data` variable. It translates the relative filesystem path of each file into a relative tree path to that file's data within the tree. The last node in the tree path corresponds to the filename. Let's call this the file's "tree insertion point".
One consequence of this is that file paths are indistinguishable from data. For example:
and
both produce
Another consequence is that given multiple data files, the data can overlap. When this happens data can be either combined, merged, or discarded according to precedence rules. The current rules are as follows:
If you want to see the code behind this, it's all in this method. But it will be a lot easier if you just look at actual Hugo inputs and outputs in the next section.
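As a rough illustration of the path-to-tree translation described above, here is a minimal sketch. It is not Hugo's implementation: `treePath`, `insert` and `sameTree` are hypothetical helpers, the file names and contents are made up, and Hugo's precedence rules are deliberately left out. It only shows why a directory level in a file path and a key inside a file end up indistinguishable in the tree.

```go
package main

import (
	"fmt"
	"path/filepath"
	"reflect"
	"strings"
)

// treePath turns a path relative to the data directory (e.g. "a/b.json")
// into tree keys (e.g. ["a", "b"]). Hypothetical helper, not Hugo's code.
func treePath(relPath string) []string {
	trimmed := strings.TrimSuffix(relPath, filepath.Ext(relPath))
	return strings.Split(filepath.ToSlash(trimmed), "/")
}

// insert places a file's top-level map at its tree insertion point,
// creating intermediate nodes as needed. Colliding keys are simply
// overwritten here; Hugo's precedence rules are not modeled.
func insert(tree map[string]interface{}, relPath string, data map[string]interface{}) {
	node := tree
	for _, key := range treePath(relPath) {
		child, ok := node[key].(map[string]interface{})
		if !ok {
			child = map[string]interface{}{}
			node[key] = child
		}
		node = child
	}
	for k, v := range data {
		node[k] = v
	}
}

// sameTree reports whether two data trees are structurally identical.
func sameTree(a, b map[string]interface{}) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	// Layout 1: a single file data/a/b.json containing {"c": 1}.
	t1 := map[string]interface{}{}
	insert(t1, "a/b.json", map[string]interface{}{"c": 1})

	// Layout 2: a single file data/a.json containing {"b": {"c": 1}}.
	t2 := map[string]interface{}{}
	insert(t2, "a.json", map[string]interface{}{
		"b": map[string]interface{}{"c": 1},
	})

	fmt.Println(sameTree(t1, t2)) // true
}
```

Both layouts yield the same tree, so a template reading the tree cannot tell which part of a tree path came from the filesystem and which came from the file's contents.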
current behavior, illustrated by actual Hugo results
I constructed a scenario for which the current behavior could make sense, but also included within it an example of how it potentially breaks down or becomes confusing. The demo data files shown below are embedded in the new demo test included in this PR, and the output also shown below is encoded as expected test output (the test passes).
First, the user uses a theme designed for a music-oriented website. The theme includes some music data that it uses for genre-specific page layouts:
File 1: <theme>/data/music/genres.json
The user takes advantage of the "user data has precedence" rule, overriding the icon for one of the theme-defined genres and also adding three new genres:
File 2: data/music/genres.json
The user then takes advantage of the "deeper data file has precedence" rule, adding a new field to one of the genres:
File 3: data/music/genres/blue-eyed-soul.json
The user then adds a data file for the actual music that will be listed on the site. While it references the genre data (essentially via a foreign key), it is supposed to be a separate table of data:
File 4: data/music.json
Here is the resulting data tree that Hugo makes available to templates via `.Site.Data` (shown as JSON):
non-obvious consequences
The non-obvious consequences are:
Data that doesn't belong in the same set can get mingled together. Grafting data files at deeper nodes in the tree can result in a potentially useful override of data inserted at shallower nodes (e.g. File 3 and File 2 respectively). But the same behavior can also result in data that should be distinct getting mixed together. File 4's song titles are mixed up with the genre list sourced from the other files. It's not obvious that data from files named `data/music/genres.json` and `data/music.json` would be mingled this way. Imagine the confusion when a template ranges over `.Site.Data.music`.
Merging of mapped data is "shallow", with map entries at the root of the data file being inserted or rejected wholesale. There is no attempt to merge the values of two colliding keys. Thus two maps of 10 entries each, with one overlapping key, produce 19 entries, and the values under that overlapping key are not merged. You can see this in how the `rock` genre data in File 2 replaces rather than merges with the info in File 1. Likewise the `blue-eyed-soul` genre data in File 3 replaces even the non-colliding leaf data in File 2. In both cases this is the opposite of what my imaginary user expected. Though Hugo emits useful warnings when this happens, I'm not sure that makes up for the complexity and potential for confusion:
Hugo performance. It likely complicates any solution to Data Files eat memory #1065.
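The shallow-merge arithmetic above (10 + 10 entries with one shared key giving 19) can be checked with a small sketch. This is an illustration only, not Hugo's code: `shallowMerge` and `tenGenres` are hypothetical helpers, and the genre names, icons and descriptions are invented placeholders rather than the contents of Files 1 and 2.

```go
package main

import "fmt"

// shallowMerge returns a new map holding every top-level entry of low,
// overlaid with every top-level entry of high. On a key collision the
// high-precedence value replaces the low one wholesale.
func shallowMerge(high, low map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(high)+len(low))
	for k, v := range low {
		out[k] = v
	}
	for k, v := range high {
		out[k] = v
	}
	return out
}

// tenGenres builds a 10-entry genre map: the given "rock" entry plus nine
// placeholder genres named "<owner>-genre-1" .. "<owner>-genre-9".
func tenGenres(owner string, rock map[string]interface{}) map[string]interface{} {
	genres := map[string]interface{}{"rock": rock}
	for i := 1; i <= 9; i++ {
		genres[fmt.Sprintf("%s-genre-%d", owner, i)] = map[string]interface{}{}
	}
	return genres
}

func main() {
	theme := tenGenres("theme", map[string]interface{}{
		"icon":        "theme-rock.png",
		"description": "guitars and drums",
	})
	user := tenGenres("user", map[string]interface{}{"icon": "user-rock.png"})

	merged := shallowMerge(user, theme)
	rock := merged["rock"].(map[string]interface{})
	_, keptDescription := rock["description"]

	fmt.Println(len(theme), len(user), len(merged)) // 10 10 19
	fmt.Println(rock["icon"], keptDescription)      // user-rock.png false
}
```

The user's `rock` entry only set an icon, yet the theme's description is gone from the merged result, which is the surprise described above.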
questions
decisions
See non-obvious consequences above for definition of intermingling vs merging.
Data intermingling
1. Keep things just as they are. Users can avoid the complexity if they want to.
2. Require all data be inserted at leaf nodes in the tree. This prevents the unintentional mingling of different data sets; every data set is in its own sandbox. Users can still override theme data, as files can still target the same leaf node. It makes it easier to support other data types (such as non-string-keyed maps) in the future.
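One possible reading of the leaf-node requirement is that no file's tree path may be a strict ancestor of another file's tree path, while identical tree paths (theme and user targeting the same node) remain allowed. The sketch below checks that condition. It is an assumption about how such a rule could be enforced, not existing Hugo behavior, and `nonLeafConflicts` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// nonLeafConflicts takes slash-separated tree paths (file paths relative to
// the data directory, without extension) and reports every pair in which one
// path is a strict ancestor of another. Identical paths are not conflicts.
func nonLeafConflicts(treePaths []string) []string {
	// De-duplicate so that theme and user files sharing a path count once.
	seen := map[string]bool{}
	var unique []string
	for _, p := range treePaths {
		if !seen[p] {
			seen[p] = true
			unique = append(unique, p)
		}
	}
	// After sorting, an ancestor always precedes its descendants.
	sort.Strings(unique)

	var conflicts []string
	for i, shallow := range unique {
		for _, deep := range unique[i+1:] {
			if strings.HasPrefix(deep, shallow+"/") {
				conflicts = append(conflicts, shallow+" contains "+deep)
			}
		}
	}
	return conflicts
}

func main() {
	// Tree paths of Files 1 to 4 from the scenario above.
	paths := []string{
		"music/genres",                // File 1 (theme)
		"music/genres",                // File 2 (user)
		"music/genres/blue-eyed-soul", // File 3
		"music",                       // File 4
	}
	for _, c := range nonLeafConflicts(paths) {
		fmt.Println(c)
	}
}
```

Under this reading, Files 1 and 2 could coexist, but File 3 and File 4 would each be flagged because they graft data above or below another file's insertion point.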
My recommendation is to do option #2. It makes Hugo data handling far easier for users to understand. By removing the unexpected intermingling of data, costly confusion is avoided and data integrity is improved. It will make a solution to #1065 far easier.
Data merging
1. Remove support for merging. No two files can have the same tree path. Leave merge semantics to the user, in the templates. The user can use arbitrary logic to figure out how one set of data overrides or gets merged with another.
2. Keep things as they are: shallow merging.
3. Support deep merges. This would essentially add inheritance semantics to the data, addressing the issues with the rock and blue-eyed-soul genres in the example.
I lean toward option #1. But I am unsure of current usage or its popularity. Option #2, as I stated in non-obvious consequences, can result in confusion, which limits its use. My gut says do no merging or go all the way. But since the user can always use whatever merge logic they want in their templates, option #1 makes the most sense.
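For comparison, here is a minimal sketch of what the deep-merge option could look like. It is not a proposal for the actual implementation: `deepMerge` is a hypothetical function, and the genre values are invented placeholders rather than the contents of the demo files.

```go
package main

import "fmt"

// deepMerge returns a new map containing low overlaid with high. When both
// sides hold a map under the same key, the two maps are merged recursively;
// otherwise the high-precedence value wins. The inputs are not modified,
// though unmerged nested values are shared with the result, not copied.
func deepMerge(high, low map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(high)+len(low))
	for k, v := range low {
		out[k] = v
	}
	for k, hv := range high {
		hm, highIsMap := hv.(map[string]interface{})
		lm, lowIsMap := out[k].(map[string]interface{})
		if highIsMap && lowIsMap {
			out[k] = deepMerge(hm, lm)
		} else {
			out[k] = hv
		}
	}
	return out
}

func main() {
	theme := map[string]interface{}{
		"rock": map[string]interface{}{
			"icon":        "theme-rock.png",
			"description": "guitars and drums",
		},
		"jazz": map[string]interface{}{"icon": "theme-jazz.png"},
	}
	user := map[string]interface{}{
		"rock": map[string]interface{}{"icon": "user-rock.png"},
	}

	merged := deepMerge(user, theme)
	rock := merged["rock"].(map[string]interface{})

	// The user's icon overrides the theme's, and the theme's description
	// survives because the two "rock" maps are merged key by key.
	fmt.Println(rock["icon"], "/", rock["description"])
	fmt.Println(len(merged)) // 2
}
```

With these semantics the user's partial `rock` entry behaves like an override of individual fields, which is the inheritance-style behavior the imaginary user in the example expected.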