Hierarchical Yaml Config #2218

Closed
waylan opened this issue Oct 29, 2020 · 19 comments

@waylan
Member

waylan commented Oct 29, 2020

In the real world, many users have bits and pieces of various MkDocs sites which are shared. They have shared content among multiple different sites. Or they have multiple sites which all come under the same organization and share much of the same config with only the content being different. Or they may have some combination of the two. Often times, they may even include the source of all of their sites under a single repo.

However, MkDocs requires that each 'site' be completely defined in its own standalone config file. The assumption is that each site is in its own repo. There are no options to include or inherit options from other config files. Then the build command must be run either with the config file explicitly named or from a unique subdir. To edit/change a single option across all sites, the option must be manually edited in every individual config file.

Naturally, many users have developed various hacks to work around those limitations. And every time we want to move forward with certain types of changes, those hacks always get in the way. Therefore, I propose we officially support some type of hierarchical system which can pull config options from multiple files together for a single site. Unfortunately, there is no single obvious way forward here as YAML provides little out-of-the-box in this regard. Therefore I have outlined multiple different possible solutions below.

To be clear, I am not promising that we will implement anything proposed here. The goal of this discussion is to explore and narrow the options down to determine if we can settle on a single workable solution. The conclusion could be that we won't do anything. Obviously input from users who have real world experience with the situations described above would be most helpful here.

PyYAML's built-in inheritance

The PyYAML lib, which we use to parse the config files, has limited support for inheritance.

A config file might look like this:

dev: &default
    site_name: My Site
    theme:
        name: material
        # complex theme config here...

production:
    <<: *default
    site_url: 'https://example.com'

files:
    <<: *default
    site_url: ''
    use_directory_urls: false

Of course, with that config, we obviously want to use a different one of the configs for the dev server, the production website and files which are distributed for browsing on the file system. We would need to provide some extra commandline flag to specify which one to use. But we might also define some defaults in the config file as well. Perhaps something like this:

defaults:
    - command: serve
      use: dev
    - command: deploy
      use: production
    - command: build
      use: files

Then, MkDocs would check the defaults and compare that against the command used to determine which of the configs to use. That seems simple enough to implement and requires no extra dependencies. However, I don't really think it is the right solution. In addition to requiring us to add another commandline flag, there are a number of limitations.

All separate configs must be defined in the same file. We do not get deep merging of nested key-value pairs. We could only override or add root level items. In other words, to override one item three levels deep into the theme config would require redefining the entire theme config. The same goes for a multilevel nav. This StackOverflow question explores workarounds for these limitations.
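
To make the deep-merge limitation concrete, here is a minimal Python sketch. It assumes the YAML `<<` merge key behaves like a shallow dictionary update once parsed, and the nested `palette` option is a made-up placeholder rather than a real setting from the example above.

```python
# Stand-in for the parsed `&default` anchor.
default = {
    "site_name": "My Site",
    "theme": {"name": "material", "palette": {"primary": "indigo"}},
}

# Stand-in for a section that uses `<<: *default` and then sets
# theme.palette.primary explicitly.
override = {"theme": {"palette": {"primary": "red"}}}

# A shallow merge only fills in missing top-level keys, so the explicit
# `theme` key replaces the inherited one wholesale.
production = {**default, **override}

print(production["theme"])
```

The inherited `theme.name` is gone from the result, which is why overriding one nested value would mean restating the whole `theme` block.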

File include

A simple implementation of this was proposed and rejected in #942. However, I think it is worthy of reevaluation. Of note is the pyyaml-include library, which provides a ready-made constructor. We would only need to add that constructor to the existing loader. To implement the same config as above, one might do:

theme-option.yml

name: material
# complex theme config here...

site-name-option.yml

"My Site"

dev.yml

site_name: !include site-name-option.yml
theme: !include theme-option.yml

production.yml

site_url: 'https://example.com'
site_name: !include site-name-option.yml
theme: !include theme-option.yml

file.yml

site_url: ''
use_directory_urls: false
site_name: !include site-name-option.yml
theme: !include theme-option.yml

To call this, you would need to explicitly provide one of dev.yml, production.yml or file.yml to the build command. There is no obvious way to define defaults here other than naming one of the files mkdocs.yml.

While this is very powerful, it requires a lot of files to be useful. As there is no inheritance, every root level key needs to be defined in every config as either a hard value or as an included value.

It seems to me that this might be more useful in stitching together a complex navigation (where each subdir might have its own nav.yml file and all are included in the base config), but is not really useful for the rest of the config.

File merge

I found at least two libraries which will deep merge multiple YAML files together into a single config: yamlreader and hiyapyco. They both take a list of YAML files, and then deep merge them in order.

Deep merging allows a user to redefine one obscure theme config option three levels deep, or insert a new subsection in the nav multiple levels down. However, the user needs to pass a list of config files in order every time. Perhaps something like:

mkdocs build -f foo.yml -f bar.yml -f baz.yml

Mixing up the order would break/change the config in possibly unintended ways. There is no way to define the hierarchy in the config files themselves. While this has the potential to be very powerful, I don't find it particularly compelling from a usability perspective.
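
The order sensitivity can be sketched in a few lines of Python. This is only an illustration with a hand-written `deep_merge` helper (a hypothetical stand-in, not the API of yamlreader or hiyapyco), operating on dicts that are assumed to be already parsed from `foo.yml` and `bar.yml`.

```python
from copy import deepcopy
from functools import reduce

def deep_merge(base, override):
    """Return a new dict: `override` merged into `base`, recursing into dicts."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result

def merge_in_order(configs):
    """Merge already-parsed configs left to right; later ones win."""
    return reduce(deep_merge, configs, {})

foo = {"site_name": "My Site", "theme": {"name": "material"}}
bar = {"site_url": "https://example.com", "theme": {"name": "readthedocs"}}

print(merge_in_order([foo, bar])["theme"]["name"])  # readthedocs
print(merge_in_order([bar, foo])["theme"]["name"])  # material
```

The same two files produce a different theme depending only on the order in which they are listed on the command line.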

File inheritance

We could implement our own inheritance scheme. A YAML file would define its parent. That parent would then be parsed, and the current file would then get deep merged with the parent. Of course, when the parent is loaded, it is recursively loaded in the same way, so that there can be any number of levels of inheritance. Some example libraries which might be useful for the merging include jsonmerge, mergedeep, and deep_merge (not a comprehensive list). jsonmerge has an interesting option of being able to define a scheme so that certain keys can be merged in different ways than others. The other solutions all provide a single merging behavior for everything.

There is no existing implementation that I could find so we would be inventing our own syntax here. I'm not sure how best to indicate that a file inherits from another so I went with !inheritsfrom filename.yml (I also considered !parent filename.yml). Suggestions are welcome. In any event, the same config as above might be defined like this:

dev.yml

site_name: My Site
theme:
    name: material
    # complex theme config here...

production.yml

!inheritsfrom dev.yml
site_url: 'https://example.com'
theme:
    some:
        deep:
            option: "new value"

Note the override of theme.some.deep.option, which would only override that value and not any of the rest of the deeply nested structure. See the docs for the libs linked above for details.

files.yml

!inheritsfrom production.yml
site_url: ''
use_directory_urls: false

As with the "file include" proposal above, to call this, you would need to explicitly provide one of dev.yml, production.yml or files.yml to the build command. There is no obvious way to define defaults here other than naming one of the files mkdocs.yml.

While this seems like the best option to me, it also has the largest barrier in that we would be building and maintaining it ourselves. We would prefer to use something which was developed and maintained as a separate library.

@waylan
Member Author

waylan commented Oct 29, 2020

In case I haven't made it clear, all of the proposed solutions would parse the various YAML files, merging or including them into a single config object. Only after the single config object is generated would we pass it to MkDocs' config validation. There would be no way for MkDocs' internal build process to even know that the config was defined in multiple files or which parts were defined where.

@waylan
Member Author

waylan commented Oct 30, 2020

I created a very limited proof of concept for the File inheritance option here. It's not usable like that, but at least proves that it's possible. See also the comments there. And please discuss any proposed changes there, not here.

@waylan
Member Author

waylan commented Nov 2, 2020

After thinking on all of the above proposals some more and exploring more code as well as the YAML spec, I have some more thoughts on each of the proposed solutions:

PyYAML's built-in inheritance

While this has built-in support, it would require a major scheme change for MkDocs. To maintain backward compatibility, we would need some way to inform the config validator that the new different scheme is being used. Additionally, as there is no support for deep merging, I don't find this proposal very compelling.

File include

I'm torn on this one. On the one hand, it is easy to implement as existing libraries exist; it is easy to understand how it works; and it is extremely powerful. However, it doesn't seem like the best solution for what we are trying to accomplish. It might make more sense if we could restrict it to only work inside the nav or something, although I'm not sure that would be straightforward to do.

File merge

Based on the fact that there is no way to define the hierarchy within the config, this one is removed from contention.

File inheritance

While this one seems to be the most promising, the issue is with the YAML spec. The proposal as I originally wrote it included invalid YAML (which is discussed in more detail here). The minimum valid YAML to make it work would be:

somekey: !inherit foo.yml
site_name: My Site

As we need to define the include within a key anyway, perhaps we could do the include/merge as a post-processor instead of having it be incorporated into the YAML parser itself. Therefore, the config might look like this:

inherit: foo.yaml
site_name: My Site

Then after parsing that and getting the Python dict object, check for the inherit key and, if it exists, load the file specified, remove the inherit key, and then merge the two. An added benefit is that, as we are no longer using a custom tag, any YAML parser which doesn't have knowledge of the custom inherit tag won't error on the unknown tag.

A simplified implementation might look something like this:

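import yaml

def load_config(file):
    # Parse the YAML document into a plain dict.
    config = yaml.safe_load(file)
    if 'inherit' in config:
        # Load the parent the same way, so any depth of inheritance works.
        with open(config['inherit']) as fd:
            parent = load_config(fd)
        config.pop('inherit')
        # merge() stands for a deep-merge helper which is not defined here.
        config = merge(parent, config)
    return config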

Of course, the above lacks error checking and fails to account for relative paths etc., but it conveys the general idea. As it recursively calls itself to load the parent, there can be as many levels of inheritance as one wants.
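
A slightly fuller sketch of the same idea, under stated assumptions: the parent path is resolved relative to the file that names it, a circular chain raises an error, and the merge is a hand-written `deep_merge` helper. The `parse` parameter and the `_seen` bookkeeping are hypothetical names introduced for illustration; `parse` defaults to `json.load` only so the sketch runs without third-party packages, and `yaml.safe_load` would be passed in for real configs.

```python
import json
import os

def deep_merge(parent, child):
    """Return a new dict with `child` merged over `parent`, recursing into dicts."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_config(path, parse=json.load, _seen=()):
    """Load `path`, then merge it over the file named by its `inherit` key."""
    path = os.path.abspath(path)
    if path in _seen:
        raise ValueError("circular inheritance: " + path)
    with open(path, encoding="utf-8") as fd:
        config = parse(fd) or {}
    parent_name = config.pop("inherit", None)
    if parent_name is None:
        return config
    # Resolve the parent relative to the file that names it.
    parent_path = os.path.join(os.path.dirname(path), parent_name)
    parent = load_config(parent_path, parse, _seen + (path,))
    return deep_merge(parent, config)
```

Something like `load_config("production.yml", parse=yaml.safe_load)` would then return one merged dict ready to hand to the validator.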

After completing the above, the final merged config would be passed to the validator.

@waylan waylan added this to the 1.2 milestone Nov 9, 2020
@osresearch

osresearch commented Nov 19, 2020

The !include extension plus environment variables seems like the most generic sort of functionality, which would reduce quite a bit of the hackery I'm using to generate mkdocs.yml files. Meta-Templating systems to control the configuration of templating systems are certainly a thing...

Restricting it to nav: seems very limiting - one of my uses is for a large list of redirects since I want to maintain compatibility with an old mediawiki URL scheme, so I have a script that generates redirect_maps permutations of file names with spaces replaced by _ and leading lower case file names matched to their uppercase files.

And since there is already a custom yaml loader in use, adding the additional constructor as described in https://stackoverflow.com/a/9577670/14105 wouldn't be very much new code.

@fralau

fralau commented Dec 7, 2020

Not sure whether this could help, but at present the mkdocs-macros plugin may provide solutions to a subset of the problems invoked (hierarchy of yaml files).

You might look at it as a workaround (disclaimer: I wrote mkdocs-macros, so my comment might come across as a plug for a plugin).

Anyway:
1. Include pieces of other yaml files into mkdocs.yml
2. Include other markdown files into existing ones.
3. You might also write a macro module, to do fancy footwork, such as adding navigation items etc. A macro module (written in Python) is to plugins, what plugins are to mkdocs (call it a pluglet? 🤔 ).

@waylan
Member Author

waylan commented Dec 7, 2020

So it seems that everyone is advocating for includes. While I recognize their power, I see them as a support nightmare and want to stay away from them. Why can't any of the other solutions work instead? To me, inheritance seems like the most sensible solution.

Personally, I don't need any of these features, so I would like some concrete use cases to be provided here. Please demonstrate why inheritance won't work for you and you need includes.

@fralau

fralau commented Dec 7, 2020

For what it's worth, here is how I implemented a similar function. The purpose was to merge several files containing extra values, but I guess it could be generalized.

Here is the original issue that started this thought: fralau/mkdocs-macros-plugin#15

Basically the solution we used does not rely on any yaml extensions. Instead, it starts from a list of files (in my case an argument for the plugin, called include_yaml). It then attempts (at the on_config() stage) to make an update on the data structure (more on this later).

    def _load_yaml(self):
        "Load the external yaml files"
        for el in self.config['include_yaml']:
            # get the directory of the yaml file:
            filename = os.path.join(self.project_dir, el)
            if os.path.isfile(filename):
                with open(filename) as f:
                    # load the yaml file
                    # NOTE: for the SafeLoader argument, see: https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
                    content = yaml.load(f, Loader=yaml.SafeLoader)
                    trace("Loading yaml file:", filename)
                    update(self.variables, content)
            else:
                trace("WARNING: YAML configuration file was not found!",
                    filename)

Here is how the update() function was implemented, to merge the two dictionaries:

from copy import deepcopy

def update(d1, d2):
    """
    Update dictionary d1 in place with dictionary d2, recursively.
    It has a simple behaviour:
    - if both values are dictionaries, attempt to merge keys
      (recursively).
    - otherwise simply makes a deep copy.
    """
    for key, value in d2.items():
        if isinstance(d1.get(key), dict) and isinstance(value, dict):
            # both sides are dictionaries: merge them recursively
            update(d1[key], value)
        else:
            # new key, or any other kind of object: replace with a deep copy
            d1[key] = deepcopy(value)

The advantage of that approach, is that one decides (with Python code) what the best merging strategy should be. Perhaps it is not perfect, but the idea worked.

@osresearch

Inheritance would likely work for my use case, although it seems significantly more complicated to implement compared to the straightforward inclusion. Would multiple inheritance be supported, however? I would like to separate out the generation of both the redirect_maps: as well as exclude-search: and nav: sections, all of which would be stored in separate generated files.

@fralau

fralau commented Dec 11, 2020

I believe I understand, more or less, what inheritance could mean, but how that would apply in the case of YAML files is a bit fuzzy. Does it mean that the 'mkdocs.yml' file of my mkdocs project gets to say "I import YAML file X, but I reimplement that part" (overriding, or implementing)?

I would have a tendency to look at it the other way round: i.e. that the external YAML files are "pieces and bits" ("odds and ends") that I want to assemble. Meaning I would want to say: "I want to import/append that branch of file X, and that branch of file Y". Am I on the opposite side of the spectrum?

@waylan
Member Author

waylan commented Dec 11, 2020

Would multiple inheritance be supported, however?

Yes, but not in the way you seem to want. For example, C inherits from B, which inherits from A. However, C cannot inherit from A and from B. In fact, there would be no reason to, as any document that inherits from B will always get A as A is B's ancestor.

This is the one clear advantage that includes have over all other options. Includes make it easy to stick multiple files together into a single config. The objection I have to them is that they are more complex for the simple cases. And I have never needed anything but the simple cases in anything I have worked on. I see MkDocs as catering to the simple cases. If you want to use it for the complex cases, that's fine, but you may need to jump through a few hoops to make it work.

My point is, no-one has yet provided me with a convincing reason to use includes.

@waylan
Member Author

waylan commented Dec 11, 2020

@fralau you got it. That is exactly the difference between inheritance and includes.

@fralau

fralau commented Dec 11, 2020

@waylan Thanks. I like the idea of pulling files recursively (which BTW, would also have to work with the includes).

But to make sure we are on the same page, could you give us a simple example of YAML inheritance, with files A, B, C ? (C would be the final config file for the MkDocs project, right?).

@waylan
Member Author

waylan commented Dec 11, 2020

To elaborate on the simple case I mentioned earlier. Go back to my original post. I have two sites which only have a different site_url and site_name but are otherwise identical. Includes are overly complex for that case. Inheritance makes much more sense.

However, the use cases which argue for includes are all very complex, and complex cases are not MkDocs' target. So, to support includes we would first need to change MkDocs' target audience. I'm not sure we want to do that.

could you give us a simple example of YAML inheritance, with files A, B, C ? (C would be the final config file for the MkDocs project, right?).

Yes, C would be the final target. Imagine an organization with multiple MkDocs sites which all share the same basic design. They would all have the same customized theme settings, etc, which would all be defined in A (I'm assuming A would not be deployable on its own). All of the various individual sites would inherit from A and define the unique items for each individual site (docs_dir, site_name, site_url, etc). One of those sites would be B. However, there could be a duplicate of site B which only exists at a different URL. C inherits from B and defines an override for the site_url only.

For that to work with includes, you would need to define each of the parts of A in separate files. Then in B, you would need to include each part (theme, plugins, etc). Finally you could define the parts unique to B. However, in C, you would need to copy everything you have in B and then change the one setting which is different. In this use case, inheritance is clearly simpler.

@fralau

fralau commented Dec 11, 2020

@waylan I find your example of inheritance compelling. I do see how that could be used to maintain a series of different sites with common features (particularly the plugins, etc.) and only specific items (title, URL, etc.) which are different. That could make the creation/deployment of a new MkDocs site much faster and more painless, including in large organizations.

I see, however two cases that could happen in a 'C' file:

  1. Replacing a branch (e.g. I want to have my own extra, or my own plugins), and that will overwrite what comes from B (and indirectly from A).
  2. Complementing a branch: typically I want to add my own items (extra, plugins, extra_css, markdown_extensions) to the inherited set. With the proviso that sometimes I might want to redefine an item already defined (e.g. a plugin or an extension), but with my own parameters (depending on how this is implemented, that might be a corner case or not).

Furthermore, that does not "exclude includes". I suppose that hacks for includes will crop up regardless, like mushrooms always do; but this time, they will integrate within the framework you will have set... That's the ecosystem 🤷‍♂️

@waylan
Member Author

waylan commented Dec 11, 2020

@fralau those cases are covered. Consider this base config which is truncated to the relevant section only:

markdown_extensions:
    - smarty
    - sane_lists
    - toc:
         separator: "_"
         permalink: True

Now, suppose for a specific site you want to disable permalinks on TOC.

inherit: base.yaml
markdown_extensions: 
    - toc:
        permalink: False

The two would be deep merged so that the config would be interpreted as:

markdown_extensions:
    - smarty
    - sane_lists
    - toc:
         separator: "_"
         permalink: False

The same could be done with plugins, themes, nav, etc.

@waylan
Member Author

waylan commented May 6, 2021

Sigh. I worked really hard to develop a proposal for using inheritance. But there is a serious problem with my proposal: It ignores the fact that we use lists extensively throughout our config. If we simply used key/value pairs, then deep merging would be easy. There are multiple libraries which already do all the hard work. But none of them work with lists. When they encounter a list, they simply replace the list with the new one, which means all children of the first list are lost. Therefore, the example I gave in my previous comment actually doesn't work.
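
The problem can be reproduced with a minimal Python sketch. The `deep_merge` helper below is a hand-written stand-in for the typical behaviour described here (dicts are merged, anything else is replaced), not the code of any particular library.

```python
def deep_merge(parent, child):
    """Merge dicts recursively; any non-dict value (lists included) is replaced."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {
    "markdown_extensions": [
        "smarty",
        "sane_lists",
        {"toc": {"separator": "_", "permalink": True}},
    ]
}
site = {"markdown_extensions": [{"toc": {"permalink": False}}]}

merged = deep_merge(base, site)
print(merged["markdown_extensions"])
# [{'toc': {'permalink': False}}]
```

Both `smarty` and `sane_lists` are lost, along with the inherited `separator` setting.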

Maybe a custom solution could be built, but that is a lot of work and would create an added unwanted maintenance burden. As most lists in our config consist of key/value pairs where only one pair is provided (a Python dict of length 1), we could try to come up with a solution where only the pair with the matching key is replaced in the list, but I couldn't find any implementation anywhere that does that. It feels too much like we would be inventing a weird one-off thing.
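
For the sake of discussion, such a one-off merge might look roughly like the sketch below. It is purely hypothetical: `entry_name` and `merge_named_list` are invented names, and it assumes every list entry is either a bare string or a dict of length 1 whose value is an options dict (or empty).

```python
def entry_name(item):
    """Name of a list entry: the string itself, or the key of a one-pair dict."""
    if isinstance(item, str):
        return item
    if isinstance(item, dict) and len(item) == 1:
        return next(iter(item))
    raise TypeError(f"unsupported list entry: {item!r}")

def merge_named_list(parent, child):
    """Merge two lists of names / one-pair dicts, matching entries by name."""
    merged = list(parent)
    index = {entry_name(item): i for i, item in enumerate(merged)}
    for item in child:
        name = entry_name(item)
        if name not in index:
            # Unknown entry: append it to the inherited list.
            index[name] = len(merged)
            merged.append(item)
            continue
        old = merged[index[name]]
        if isinstance(old, dict) and isinstance(item, dict):
            # Same entry on both sides: the child's options win key by key.
            merged[index[name]] = {name: {**(old[name] or {}), **(item[name] or {})}}
        else:
            merged[index[name]] = item
    return merged
```

Applied to the `markdown_extensions` example from my previous comment, this would keep `smarty` and `sane_lists` and only flip `toc.permalink`, but it bakes in assumptions about the list layout that a generic merge library cannot know.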

Given the above, it may be that includes are the way to go, which is too bad because I still think inheritance would have been a cleaner solution.

@waylan
Member Author

waylan commented May 18, 2021

It recently occurred to me that there could be an issue with any of the proposals in this issue when being used with local paths as values. All paths defined in the config are resolved as relative to the config file. However, if multiple files are being merged (whatever method is used to do so), then the end result needs to have all paths relative to the "primary" config file, which may be in a different location than any of the include/inherited files. However, as the values are defined in the included/inherited files and may get included in or inherited by multiple different "primary" config files, the base of each relative path would need to be adjusted accordingly. I see a few issues with this:

  1. This is not something MkDocs can address after the YAML parser (in config validation) as MkDocs would have no knowledge of the paths of any included/inherited config files as well as which file any given value came from. Therefore it would need to be handled by a YAML plugin.
  2. YAML does not have a "path" or "url" data type, which makes it difficult to address this in a YAML plugin directly. It's not impossible, but the plugin would need to include a complete implementation which both recognized local relative paths and modified them, based on the relative locations of the config files being merged. I'm not aware of any preexisting YAML plugins which offer this functionality.

The one exception may be the "File Merge" proposal above. In that proposal, MkDocs is actually passed the path to every config file and merges them after the YAML parser parses each and returns a dict. As MkDocs itself would be merging the dicts, it would allow us to address this directly. However, as that proposal provides no way to define the hierarchy within the config files, it was rejected.

Of course, we could avoid using any YAML plugin and simply wrap calls to the YAML parser in our own code. 1) parse the "primary" file; 2) check the returned dict for "include" or "inherit" directives; 3) if found, parse the included/inherited YAML file; and 4) merge the dicts while adjusting path values.
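
The path adjustment in step 4 could be sketched as follows. This is a rough illustration under a loud assumption: that the path-valued options are known in advance. `PATH_KEYS` and `rebase_paths` are hypothetical names, and only `docs_dir` and `site_dir` are listed; a real implementation would need the full set.

```python
import os

# Assumption: only these keys hold local paths.
PATH_KEYS = ("docs_dir", "site_dir")

def rebase_paths(config, config_dir, primary_dir):
    """Rewrite relative paths so they are relative to the primary config's dir."""
    rebased = dict(config)
    for key in PATH_KEYS:
        value = rebased.get(key)
        if isinstance(value, str) and not os.path.isabs(value):
            # Anchor the value at the file that defined it...
            absolute = os.path.normpath(os.path.join(config_dir, value))
            # ...then express it relative to the primary config file.
            rebased[key] = os.path.relpath(absolute, primary_dir)
    return rebased

inherited = {"docs_dir": "docs", "site_name": "My Site"}
shared = os.path.join("org", "shared")
primary = os.path.join("org", "sites", "a")
print(rebase_paths(inherited, shared, primary)["docs_dir"])
```

Each inherited dict would be passed through something like this before being merged, so that every path ends up relative to the "primary" file.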

Another option is to address this simply by documenting that any local path settings would need to be defined in the "primary" config file (or at least always relative to the "primary" config file). This also means that the user would not be able to use includes to stitch together a nav from multiple config files. Although, that has never been a goal of this proposal to begin with.

I think I'm inclined to go with the last option (document this as a limitation). If, in the future, it becomes a frequent issue for users, it can be reevaluated at that time and the implementation can be updated to account for it. However, if the requests are all (or at least mostly) about being able to stitch together a nav, then I think that should be resolved separately by MkDocs using some other directive which is not resolved by the YAML parser (perhaps in the call to get_nav).

@waylan
Member Author

waylan commented May 18, 2021

Thinking about this some more, here is my current proposal for implementing config inheritance:

  1. The config validation for markdown_extensions and plugins would be modified to accept two different formats: the list-based format we have now, and a new key/value pair format. If the latter format is detected by config validation, the values would be converted to the existing format to maintain backward compatibility.
  2. As YAML parsing (including the inheritance plugin which does the merging) would happen before config validation, then we would document that in order to use inheritance within the markdown_extensions and plugins options, the user would need to use the new format only. In this way, all merging would happen to Python dicts, which gives us the option to choose from multiple existing third-party libraries.
  3. This would effectively exclude the nav config option from the equation as it also relies heavily on lists. At best, a user could override the complete nav. However, the user could not stitch together bits and pieces of the nav. This would alleviate the concerns about relative paths needing to be resolved. Presumably, a separate solution could be worked out for this which would be implemented outside of the YAML parser. Any such solution should be developed (and discussed) separately from this.
  4. The theme config already uses key/value pairs rather than lists, and is therefore not an issue. No other options currently support deep nesting and should not be an issue.
  5. As plugins define their own validation, any third-party plugins which support deep nesting of config options will be able to take the same steps to add support if need be.
  6. We will need to include a warning in the documentation reminding users that docs_dir, site_dir and similar settings will be resolved as relative to the "primary" config file, so they need to be careful about trying to be clever with any path-based values they define in inherited config files.

Given the above, the example provided in this comment would instead be as follows.

base.yml (the "parent" config)

markdown_extensions:
    smarty: {}
    sane_lists: {}
    toc:
         separator: "_"
         permalink: True

mkdocs.yml (the "primary" config which is passed to MkDocs)

inherit: base.yml
markdown_extensions: 
    toc:
        permalink: False

The result

The two above files would be deep merged so that the config would be interpreted as follows:

markdown_extensions:
    smarty: {}
    sane_lists: {}
    toc:
         separator: "_"
         permalink: False

For comparison, that would be interpreted by config validation the same as this (currently supported) config:

markdown_extensions:
    - smarty
    - sane_lists
    - toc:
         separator: "_"
         permalink: False
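
The conversion step described in item 1 could be as small as the sketch below. `to_list_format` is a hypothetical name, and the sketch assumes an entry with empty options should collapse to a bare name, as in the list-based config just shown.

```python
def to_list_format(value):
    """Normalise `markdown_extensions`/`plugins` to the existing list format."""
    if isinstance(value, dict):
        # New key/value format: one entry per key, in definition order.
        # Entries with empty options collapse to a bare name.
        return [{name: opts} if opts else name for name, opts in value.items()]
    # Already a list (or unset): leave it for the existing validation.
    return value

merged = {
    "smarty": {},
    "sane_lists": {},
    "toc": {"separator": "_", "permalink": False},
}
print(to_list_format(merged))
```

Because the conversion runs after merging, the deep merge itself only ever sees dicts.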

@majkinetor

majkinetor commented Sep 2, 2021

I think this still can be improved. Inheritance is not useful in all scenarios.

For example, I would like to have several nav alternatives, which I might select with something such as:

...
...
!include !ENV [MKDOCS_NAV, 'nav.yml']

While this solves the simple case, in that I can now include several different navigations (for instance the complete documentation or just parts of it), it doesn't allow for inheritance, so stuff like this won't work:

mkdocs.yaml

site_name: FOO
...
!include !ENV [MKDOCS_NAV, 'nav.yml']

nav.yml

site_name: BAR
nav:
- Home: index.md
- Demonstration: demo.md

Note that I can currently use this as an alternative to include, but the merge is not happening:

mkdocs.yaml

site_name: FOO
...
INHERIT: !ENV [MKDOCS_NAV, 'nav.yml']

In this case nav is applied if not present in mkdocs.yaml, but you can't set an alternative site_name, as nav.yml becomes the base.

Let's say we have a MERGE keyword as a better version of INHERIT (INHERIT currently can't be used multiple times as far as I can see).

Then this could be done as this:

mkdocs.yml

site_name: FOO
MERGE: nav.yml

So the current file (mkdocs.yml) is merged with another one (nav.yml). A document could have multiple merges like this:

mkdocs.yml

site_name: FOO
MERGE: vars.yml
MERGE: nav.yml

The file given in argument overrides any already defined attributes (or alternatively the behavior could be specified via param, such as MERGE: [nav.yml, 'override|ignore'] to override or ignore existing attributes).

So, in the above case, mkdocs.yml can have its own site_name, and both vars.yml and nav.yml can define it too; nav.yml will win as the last one merged.
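
The semantics I have in mind could be sketched like this. It is only an illustration: `merge` and its `mode` argument are hypothetical, the overlays are assumed to be already-parsed dicts standing in for vars.yml and nav.yml, and they are passed as an ordered sequence because a parsed mapping cannot hold the same MERGE key twice.

```python
def merge(current, other, mode="override"):
    """Deep-merge `other` into a copy of `current`.

    mode="override": values from `other` win.
    mode="ignore": values already present in `current` are kept.
    """
    if mode not in ("override", "ignore"):
        raise ValueError(mode)
    merged = dict(current)
    for key, value in other.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = merge(merged[key], value, mode)
        elif key not in merged or mode == "override":
            merged[key] = value
    return merged

config = {"site_name": "FOO"}
overlays = (
    {"site_name": "VARS", "extra": {"version": 1}},     # stand-in for vars.yml
    {"site_name": "BAR", "nav": [{"Home": "index.md"}]},  # stand-in for nav.yml
)
for overlay in overlays:
    config = merge(config, overlay)

print(config["site_name"])  # BAR
```

With `mode="ignore"` the same loop would leave site_name as FOO and only add the keys that mkdocs.yml does not define.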

With this I can have environments like this:

mkdocs.yml

defaults ...
defaults...

# separation of concerns if needed
MERGE: vars.yml
MERGE: plugins.yml

MERGE: !ENV MKDOCS_ENV

where environment can override any default.
