Add static configuration (Sphinx.toml) #9040

Open
choldgraf opened this issue Mar 28, 2021 · 126 comments
Labels
internals:config type:enhancement enhance or introduce a new feature

@choldgraf
Contributor

choldgraf commented Mar 28, 2021

Background

One of the challenges in getting started with Sphinx is the conf.py file, for a few reasons:

  1. It is written in Python, and so it is Python-specific, even if the person writing the documentation is using a different language.
  2. It is a fully-flexible Python script, which can be overwhelming for users not accustomed to it.

Over the years, many other configuration formats have arisen; the two best known are probably YAML and TOML. For example, Jupyter Book provides a layer of YAML configuration on top of Sphinx. Users have responded that this is a really friendly pattern for beginners and experts alike. I wonder if Sphinx would be interested in allowing for YAML or TOML configuration as well.

Describe the solution you'd like

In addition to the current config option of conf.py, add another option:

Allow config with YAML. I think it would be useful if Sphinx allowed for:

  • conf.yml. This would be read-in with PyYAML.

This file would be read in and converted to Python variables directly, as if it was written in Python (conf.py). So for example:

# In conf.yml
key: value
mylist:
  - item1
  - item2
mydict:
  dk1: one
  dk2: two

would map onto

# In conf.py
key = "value"
mylist = ["item1", "item2"]
mydict = {"dk1": "one", "dk2": "two"}

Allow conf.py to be provided simultaneously. Some Sphinx builds will still need to run custom Python code (e.g., to set up some extensions etc). In this case, authors may wish to keep their "simple config" in the YAML file, and the complex config in pure Python.

If conf.py is supplied as well as conf.yml, then the environment defined in conf.py will over-rule anything in conf.yml.

So the order of operations would be:

  1. (if it exists) Read in variables from conf.yml
  2. Update with variables from conf.py if it exists, overwriting variables created in 1
  3. Everything else is the same...
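The order of operations above could be sketched roughly as follows. This is only an illustration, not how Sphinx loads its configuration: merge_config is a hypothetical helper, the static values are passed in as an already-parsed dict (in practice they might come from PyYAML's safe_load), and conf.py is executed from a string to keep the example short.

```python
from typing import Any, Optional


def merge_config(static: dict[str, Any], conf_py_source: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: layer conf.py values on top of static values."""
    # Step 1: start from the values read from the static file (if any).
    config = dict(static)
    # Step 2: run conf.py, if present, and let its variables win.
    if conf_py_source is not None:
        namespace: dict[str, Any] = {}
        exec(conf_py_source, namespace)
        for name, value in namespace.items():
            # Skip dunder entries such as __builtins__ that exec() adds.
            if not name.startswith("__"):
                config[name] = value
    # Step 3: everything downstream sees one merged mapping.
    return config


static_values = {"key": "value", "mylist": ["item1", "item2"]}
merged = merge_config(static_values, 'key = "from conf.py"\nextra = 1\n')
```

Here key ends up with the conf.py value, while mylist is carried over unchanged from the static file.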

Describe alternatives you've considered

I've tried creating a lightweight extension that allows this but didn't have success because of the way that extensions are activated.

I have also considered other documentation engines like mkdocs, which use YAML, but I'd love for this to be in the Sphinx ecosystem!

cc some others who have discussed this in the executablebooks/ repo: @pradyunsg @ericholscher @chrisjsewell

EDIT: I've updated the above description to remove mention of TOML, as I don't want that to derail conversation here!

@ericholscher
Contributor

Just wanted to chime in here and say this would be a great addition. I think it would improve the onboarding experience, and allow simple configurations to be machine-parsable. The dynamic nature of the Python configs definitely leads to a lot of customizations that are harder to support in varied development environments, which is a very common mistake for first-time Read the Docs users.

I'm in favor of adding it, and I also wanted to note that between the Executable Books & RTD teams, we'd probably be willing to implement and document this work, so we're mostly looking for a 👍 or 👎 from the team before starting a PR.

@pradyunsg
Contributor

pradyunsg commented Mar 29, 2021

A couple of thoughts from me:

  • This, with the cascading described, would be amazing!
  • Let’s only have a single file format though, and not allow for one-of YAML and TOML.
More unsolicited thoughts

IMO the choice for the file format comes down to “how much do you like nesting”. If you wanna have JSON-like arbitrary nesting, then YAML is likely a better fit.

I’m likely biased, but I do think conf.py’s generally flat structure translates very nicely toward TOML’s design. Most existing conf.py files are probably also almost-valid TOML files already!

Neither choice is wrong, both have gotchas, and I’d like it if we went with TOML here (it also helps the case for if/when I push to get a parser for that into the standard library).

@chrisjsewell
Member

chrisjsewell commented Mar 29, 2021

Indeed the main thing is to get a 👍 from the sphinx team, and I am definitely +1 😄

In terms of YAML vs TOML: I would note that both jupyter-book (_config.yml and _toc.yml) and RTD (.readthedocs.yml) currently use YAML for their configuration files, and so at least for those use cases, I feel TOML would be an additional overhead in understanding for the user

@jakobandersen
Contributor

I don't have a fully formed opinion yet, but some thoughts:

  • conf.py can not go away, both for backwards compatibility, but also as the dynamic nature can be quite useful sometimes, e.g., loading the version from somewhere else, auto-generating content, etc. So as noted in the OP this would be another layer of configuration loaded before conf.py.
  • Therefore, the machine-readability of a configuration would at best be a convention that one could use at a single project, not for arbitrary third-party projects. Theoretically, one could put such conventions on one's own conf.py and read static config data until a custom line comment, but this is indeed icky.
  • I think it is too much to add the special key for Python code. Putting static data into such a config file is fine, and achieves easy machine-readability of that part (by convention). If you need arbitrary Python code, I would say to just stick it in conf.py.
  • The config file should not be misleading with respect to how the configuration really works. I'm not familiar with TOML, but it looks like it has "sections", and I'm not sure how they map to the configuration system. YAML seems to map more naturally.

@hukkin

hukkin commented Mar 30, 2021

Therefore, the machine-readability of a configuration would at best be a convention that one could use at a single project, not for arbitrary third-party projects.

Maybe a terrible idea, but what if conf.py location was made configurable and nullable in conf.toml/yaml (perhaps still defaulting to conf.py?). Then in the conf.py == null case machine-readability would actually be a thing.

Maybe the null case could even be the default, given that the toml/yaml is a new feature, so it shouldn't break existing projects.

@chrisjsewell
Member

chrisjsewell commented Mar 30, 2021

I guess the classic example of the co-existence of such files is the setuptools setup.py and setup.cfg. I would certainly check their implementation

@pradyunsg
Contributor

pradyunsg commented Mar 30, 2021

What setuptools does is basically pretend there's a minimal setup.py file, if it doesn't exist. Notably, it's possible for tooling to detect whether setup.py exists and if it doesn't, it means that everything is declared statically.

For Sphinx, the minimal conf.py file would be empty, I guess?

@hukkin

hukkin commented Mar 30, 2021

I'm not familiar with TOML, but it looks like it has "sections" which I'm not sure how maps to the configuration system. YAML seems to map more naturally

TOML allows top level keys though with no section defined. The following is valid toml:

project = "sphinx"
version = "0.0.1"

Sections are only required if there are dictionaries in conf.py, in which case they feel very natural to me.

Why I'm not a huge YAML fan is that YAML types are difficult to parse for humans and machines alike. Consider something like

- yes    # bool
- "no"   # string
- false  # bool
- .6432  # float
- "0.1"  # string
- null   # null
- none   # string
- ~      # null
- 0xabba # int

@pradyunsg
Contributor

I'm not familiar with TOML, but it looks like it has "sections" which I'm not sure how maps to the configuration system. YAML seems to map more naturally.

Well, TOML is literally designed to be a configuration file format. https://toml.io uses the tag line: "A config file format for humans". Think of it as unambiguous INI files. The clarity and unambitious nature of the format is why pyproject.toml is TOML based, same for Cargo.toml and more. :)

To address the specific question raised: All key-value pairs in a table [section] end up in a dictionary named section. In other words, it's how you do nesting.


Anyway, the reason I kept my thoughts on file format choices in hidden-unless-you're-curious is because I didn't want to push this issue that way. I should've just omitted that whole thing.

Let's first wait for opinions on the general idea of static metadata in Sphinx, before discussing the exact file format further. :)

@chrisjsewell
Member

Yeh, at the end of the day, JSON/YAML/TOML all basically map to each other 1-to-1, so it won't really affect the underlying code/logic to be written

@tk0miya
Member

tk0miya commented Apr 2, 2021

+0 for supporting a static config file. I don't think a Python script is a bad choice for the config file, but it's reasonable to support a commonly used file format for Sphinx.
-1 for supporting the combination of .yaml and .py. It's too complicated and I don't understand the worth of it.

And I don't have an opinion on YAML vs TOML, because I've never written a .toml file.

@choldgraf
Contributor Author

@tk0miya thanks for your thoughts! Could you clarify why you don't want a combination of YAML and Python? I think the combination of YAML + Python fits the use-case that somebody wants 99% of their configuration in a well-structured config file, but also needs to run some custom Python code if a particular extension needs it or something. I think this is actually a pretty common use-case.

I think we should just scope this conversation to YAML since it is super common for config, and readthedocs uses it, and leave TOML to a later conversation

@chrisjsewell
Member

-1 for supporting the combination of .yaml and .py.

It also feels like it would be very difficult to migrate everyone from py to yaml?

@jakobandersen
Contributor

-1 for supporting the combination of .yaml and .py.

It also feels like it would be very difficult to migrate everyone from py to yaml?

What would be the benefit of doing that anyway? I can to some degree see the point that each author may want to shift as much as possible to something more easily parsable, but maybe I don't fully get the problem with the current setup:

  1. It's in Python, maybe the user doesn't know Python: well, the same argument can be made about YAML or whatever other format.
  2. It can contain arbitrary code: sure, but as a new user you don't need to put arbitrary code there, and if you read another project, then that arbitrary code would just move somewhere else where you would also need to understand it.

Can you elaborate on the reasoning behind this?

@choldgraf
Contributor Author

choldgraf commented Apr 2, 2021

Here's a few thoughts:

benefits of YAML

  • Structured and easier to parse (so you can machine-read/write it much easier)
  • Language agnostic (so you don't give one language special status for most use-cases)
  • Extremely common (mkdocs, Hugo, readthedocs + almost any other SSG configure things with YAML. Many people already have a mental model of configuration with YAML). You are correct that perhaps a new user will need to "learn YAML", but because YAML is not a computer language, it is already very commonly used across many other computer languages.

downsides of YAML

  • A decent number of "gotchas" (e.g. true, false, etc)
  • Inflexible (because it is just a data structure, it has no notion of execution etc)

benefits of Python

  • Flexible and extensible
  • Well-known language

downsides of Python

  • Not structured, so hard to parse
  • Less-commonly used as a configuration step in similar tools (though nikola does use conf.py as well)
  • Complex, and can be intimidating to new users who must now learn a computer language
  • Implies that Sphinx is "just for Python documentation", which I don't think is true

So to me this sounds like a reasonable basis for a proposal: support YAML for simple configuration use-cases, which are probably most use-cases. For anything advanced, let people provide a conf.py for more complex configuration. YAML maps pretty cleanly onto variable creation in Python and there are a ton of YAML readers out there, so this would be both low-maintenance, and a good entry-point into the Sphinx ecosystem for people who are used to configuring things with YAML. It would also make it easier for services to build on top of Sphinx - for example, Jupyter Book or ReadTheDocs.

As an aside, one of the most common things people like about Jupyter Book (which is built on Sphinx), is that it supports YAML configuration. One reason I opened this issue is because so many people have told me they prefer YAML, that I think it is worth considering for core Sphinx, as I think it would be a benefit to many.

@jakobandersen
Contributor

@choldgraf, right, basically I agree with all those points. Where the disagreement/confusion comes from is how this will work, and the comment from @chrisjsewell:

It also feels like it would be very difficult to migrate everyone from py to yaml?

Maybe I misunderstand, but that seems to imply that only one of conf.py and conf.yaml should exist?
One of the things I find really appealing with Sphinx, and Python in general, is the hackability. Sphinx is already quite extensible via documented means, but otherwise Python allows for easy run-time haxing of whatever is needed until a proper solution can be found.
Therefore I suggest conf.py and conf.yaml must co-exist, in the sense that variables in conf.py override those in conf.yaml. This makes the implementation backwards compatible and still allows arbitrary code for initialisation.

@chrisjsewell
Member

Maybe I misunderstand, but that seems to imply that only one of conf.py and conf.yaml should exist?

Oh no, I'm arguing for exactly the opposite lol

@choldgraf
Contributor Author

choldgraf commented Apr 2, 2021

@jakobandersen ah in that case I totally agree with you, I think @chrisjsewell was suggesting they need to co-exist as well. I'll try to clarify this in the title + top-comment as well

@choldgraf choldgraf changed the title from "Allow for configuration with YAML" to "Allow for configuration with YAML in addition to conf.py" Apr 2, 2021
@jakobandersen
Contributor

Ah, all good then :-)
As an add-on suggestion: sphinx-quickstart should be updated to generate both files, with static data and associated comments in the YAML file, and then additional comments explaining the relationship between the files and the rationale for having them (i.e., the essence of this thread). The final script output, which explains how to proceed, could also be updated to describe how to configure with the YAML and Python files.

@shimizukawa
Member

shimizukawa commented Apr 4, 2021

  • +1 for supporting static config file. I had thought about introducing conf.ini too, before YAML became as popular as it is now. This is because I felt that writing configuration in Python script is a subtle stumbling block for beginners.
  • -1 for supporting the combination of .yaml and .py. About the hackability of conf.py, I think it would be a good idea to be able to write a new extension mechanism for configuration, because I feel that allowing conf.yaml to override values in conf.py would introduce new stumbling blocks.

@pradyunsg
Contributor

allowing conf.yaml to override values in conf.py would introduce new stumbling blocks.

Hmm... I would've imagined setting a value in conf.yml and conf.py would result in an error OR cause the Python value to be used.

@pradyunsg
Contributor

pradyunsg commented Apr 4, 2021

Awesome! So, everyone is on board for (or ambivalent to) allowing static metadata! 🎉


I think there's a few things to decide on AFAICT:

  • file semantics
  • file name
  • file format

Remembering the law of triviality, I'm gonna focus on semantics first. :)

Semantics

At least 2 folks have stated a -1 for allowing both the static metadata file, and Python file to exist at the same time, because it would get confusing when folks define keys in both. I agree! Specifying values in both is weird and confusing.

I disagree that we shouldn't allow the files to complement each other when they both exist, without overlaps though. Allowing both to co-exist, and erroring out if the same value is defined in both (which isn't that much code complexity), allows for a significantly better experience with the static metadata:

I write a nice Sphinx site, with only static metadata. After some time, I realize I do need some amount of dynamic behaviour (idk, need to add to sys.path for autodoc to work). If we don't allow both files to co-exist, this means that now I'll have to translate the YAML configuration into Python values, and start all over again. Compare that to just being able to add that sys.path.append in a newly created conf.py and moving on. After the first experience, I don't think I'd bother with the YAML files again. The second one is much nicer!
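A rough sketch of the "co-exist, but error on overlap" semantics, to show how little code it might take. The names ConfigConflictError and merge_without_overlap are made up for this illustration and are not part of Sphinx; both inputs are assumed to be already-parsed mappings.

```python
from typing import Any


class ConfigConflictError(Exception):
    """Hypothetical error for a key that is set in both files."""


def merge_without_overlap(static: dict[str, Any], dynamic: dict[str, Any]) -> dict[str, Any]:
    """Combine static and conf.py values, refusing duplicated keys."""
    # Keys present in both mappings are ambiguous, so report all of them.
    overlap = sorted(static.keys() & dynamic.keys())
    if overlap:
        raise ConfigConflictError(
            "defined in both the static file and conf.py: " + ", ".join(overlap)
        )
    # No overlap, so the order of merging does not matter.
    return {**static, **dynamic}


combined = merge_without_overlap({"project": "demo"}, {"extensions": ["sphinx.ext.autodoc"]})
```

With this approach, adding a conf.py that only touches sys.path or sets keys absent from the static file would just work, while a duplicated key fails loudly instead of silently picking a winner.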

File name

In conf.yml

Let's use sphinx instead of conf in the filename?

That way, it's much clearer what tool is being used. Searching for the filename on search engines will actually yield useful results; which likely won't happen for conf.yml.

File format

Full disclosure: I am the primary maintainer of TOML now. And, unsurprisingly, I'd like to advocate for adopting TOML over YAML here.

Excuse me for being lazy, and quoting some pieces of writing:

In the toml-lang/toml README -- this is the only quote where I've contributed wording.

TOML shares traits with other file formats used for application configuration and data serialization, such as YAML and JSON. TOML and JSON both are simple and use ubiquitous data types, making them easy to code for or parse with machines. TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. TOML differs in combining these, allowing comments (unlike JSON) but preserving simplicity (unlike YAML).

Because TOML is explicitly intended as a configuration file format, parsing it is easy, but it is not intended for serializing arbitrary data structures. TOML always has a hash table at the top level of the file, which can easily have data nested inside its keys, but it doesn't permit top-level arrays or floats, so it cannot directly serialize some data. There is also no standard identifying the start or end of a TOML file, which can complicate sending it through a stream. These details must be negotiated on the application layer.

INI files are frequently compared to TOML for their similarities in syntax and use as configuration files. However, there is no standardized format for INI and they do not gracefully handle more than one or two levels of nesting.

Comparison of configuration file languages, done during the PEP 518 discussion

Personally, I would sum up the above as:

|                             | YAML | JSON | ConfigParser | TOML |
|-----------------------------+------+------+--------------+------|
| Well-defined                | yes  | yes  |              | yes  |
| Real data types             | yes  | yes  |              | yes  |
| Sensible commenting support | yes  |      |              | yes  |
| Consistent unicode support  | yes  | yes  |              | yes  |
| Makes humans happy          |      |      | yes          | yes  |

[snip] Given all of the above, I tend to think the trade-offs fall in favor of TOML.

PEP 518's discussion of "why not YAML"

One is that the specification is large: 86 pages if printed on letter-sized paper. That leaves the possibility that someone may use a feature of YAML that works with one parser but not another. It has been suggested to standardize on a subset, but that basically means creating a new standard specific to this file which is not tractable long-term.

Two is that YAML itself is not safe by default. The specification allows for the arbitrary execution of code which is best avoided when dealing with configuration data. It is of course possible to avoid this behavior -- for example, PyYAML provides a safe_load operation -- but if any tool carelessly uses load instead then they open themselves up to arbitrary code execution. While this PEP is focused on the building of projects which inherently involves code execution, other configuration data such as project name and version number may end up in the same file someday where arbitrary code execution is not desired.

Example demonstrating how YAML can be ambiguous in weird ways, from earlier in this thread

Why I'm not a huge YAML fan is that YAML types are difficult to parse for humans and machines alike. Consider something like

- yes    # bool
- "no"   # string
- false  # bool
- .6432  # float
- "0.1"  # string
- null   # null
- none   # string
- ~      # null
- 0xabba # int

One fun example of this is the Norway-YAML law.

Finally, thanks to pyproject.toml, a lot of Python tooling is going to be configured through TOML going forward. It'd be nice for Sphinx to hop on board as well! :)

@chrisjsewell
Member

At least 2 folks have stated a -1 for allowing both the static metadata file, and Python file to exist at the same time

Maybe I misunderstood, but I got the impression that @tk0miya wanted to completely remove the python file, rather than just restrict its use?

@chrisjsewell
Member

realize I do need some amount of dynamic behaviour

One dynamic thing I actually do a lot in projects is add a builder-inited event, to run sphinx-apidoc and automate the build of the api documentation pages (which I gitignore from the repo). But maybe I am missing a better way to do this?

@pradyunsg
Contributor

But maybe I am missing a better way to do this?

Well, if you're missing something, then it's you and I both. :)

One of the nice things about conf.py is that it also basically serves as an extension, once you add the setup function.

@bjones1

bjones1 commented Apr 5, 2021

If Sphinx does allow for configuration from static metadata, I would suggest using Python literals as the file format; see the example conf.py below (from sphinx_static_config.zip). Since this format is a subset of the Python language, everyone familiar with conf.py will already know how to encode configuration data, rather than having to learn to express values in YAML/TOML/JSON/etc.

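import ast


def setup(app):
    # Read the literals-only file; the path is relative to the working directory.
    with open("sphinx-conf.pylit", encoding="utf-8") as f:
        # Wrap the contents in braces so they parse as a single dict literal.
        cfg = ast.literal_eval("{\n" + f.read() + "\n}")
    # Copy each parsed value onto the Sphinx config object.
    for key, value in cfg.items():
        app.config[key] = value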
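As an illustration of what such a literals-only file could contain, here is a small self-contained sketch. The exact layout of sphinx-conf.pylit is an assumption inferred from the brace-wrapping in the snippet above, and the keys are just sample values.

```python
import ast

# Assumed contents of a literals-only file such as sphinx-conf.pylit:
# comma-separated "key": value pairs, without the enclosing braces.
PYLIT_TEXT = '''
"project": "demo",
"extensions": ["sphinx.ext.autodoc"],
"html_theme_options": {"navigation_depth": 2},
"language": None,
'''

# Wrapping the text in braces turns it into one dict literal.
cfg = ast.literal_eval("{\n" + PYLIT_TEXT + "\n}")

# Anything that is not a literal (a call, an import) is rejected.
try:
    ast.literal_eval('{"path": __import__("os").getcwd()}')
except ValueError:
    rejected = True
else:
    rejected = False
```

Note that None is a valid literal here, and that ast.literal_eval refuses function calls, so the file stays static even though it uses Python syntax.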

@LecrisUT

LecrisUT commented Mar 8, 2023

So what's the status on this? What's blocking it?

@astrojuanlu
Contributor

What's blocking it is reaching consensus about a way forward.

To repeat what was summarized in the last comment a few pixels above, some Sphinx maintainers think having 2 methods is a bad idea and adds complexity, so there should be a hard transition. Some others think having 2 methods could pave a way to a smoother transition. These two visions cannot coexist, and until consensus is reached, the status quo will be maintained.

@LecrisUT

LecrisUT commented Mar 8, 2023

I mean it's been 2 years and a few major releases in between. There is a simple solution: prioritize the YAML file, and if an include item is passed, then include it as either additional YAML or conf.py format.

Much of the functionality that needed to be dynamic seems to be handled on the plugin side. Otherwise, it can be handled by jinja templating (maybe with values from pyproject.toml or importlib.metadata) or by adding a few extra plugins, e.g. for the dynamic version.

@chrisjsewell
Member

Well that and yaml vs toml. Probably at this point, with toml in core python, it might make more sense

@astrojuanlu
Contributor

I mean, I don't think there's a "simple" solution (otherwise we probably wouldn't be having this conversation).

If you ask me, I do agree with @tk0miya and others that we should obliterate dynamic config completely, make it static (in whatever format we decide, please don't bring YAML vs TOML debates again), and any dynamic config should be handled by plugins/extensions/hooks/entry_points/whatever. The whole Python ecosystem is solidly moving in that direction.

@LecrisUT

LecrisUT commented Mar 8, 2023

"Simple" I mean it is straightforward to implement on top of #9170. I don't think either camp would be objecting to this, and it is just 5-ish extra lines of code. But also the ecosystem and people's experience change, so it is worth testing the waters again every now-and-then.

As for YAML vs TOML: most tools use a .sphinx.yaml-style format, which is common outside Python environments. But it's not as if both cannot be included

@abhiaagarwal

FYI, for anyone who's looking to at least have DRY for sphinx as a temporary solution, take a look at sphinx-toolbox/sphinx-pyproject by @domdfcoding so you can just make the conf.py a wrapper around pyproject.toml

@AA-Turner AA-Turner changed the title from "Allow for configuration with YAML in addition to conf.py" to "Add static configuration (Sphinx.toml)" Mar 17, 2023
@AA-Turner
Member

Static configuration is a good idea. I think we can move incrementally. A first version to be committed would be: if Sphinx.toml exists in confdir, use it as the only source. People needing more complex configuration (dynamic X/Y/Z) can still use conf.py, which isn't going away. In time, we might see how settings from Sphinx.toml and conf.py can be mixed, but for now I think we should use the exclusivity approach.

A

@AA-Turner AA-Turner modified the milestones: some future version, 7.0.0 Mar 17, 2023
@AA-Turner AA-Turner modified the milestones: 7.0.0, 7.1.0 Apr 29, 2023
@jeanas
Contributor

jeanas commented Jun 7, 2023

I was pretty enthusiastic for TOML, especially for pyproject.toml integration (e.g., the version could by default be read from the standard packaging metadata) and started writing a patch... but I hit a snag: TOML doesn't support an equivalent of None. There are some configuration variables that use None in their format. If Sphinx had started with TOML config from day 1, all existing confvals would be designed with this in mind (e.g., not (x, y) but {"x": x, "y": y} so that (None, y) can be expressed as {"y": y}), but it takes some work to migrate existing confvals...

So, with disappointment, I have to say that this pretty much kills TOML in favor of YAML.

@jfbu jfbu modified the milestones: 7.1.0, 7.2.0 Jul 25, 2023
@electric-coder

Static configuration is a good idea.

I don't think this is a priority, a lot of users are going to want conf.py if only for simple things like pulling their library version dynamically from pkg_resources for example, or having a custom extension to pprint a collection literal.
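For reference, the "pull the version dynamically" use case is only a few lines in conf.py. The sketch below uses importlib.metadata from the standard library rather than the pkg_resources mentioned above; "mypackage" is a placeholder distribution name and read_release is a made-up helper.

```python
# conf.py (sketch)
from importlib.metadata import PackageNotFoundError, version


def read_release(distribution: str, fallback: str = "0.0.0") -> str:
    """Return the installed version of a distribution, or a fallback."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The package is not installed in the build environment.
        return fallback


# "mypackage" is a placeholder distribution name.
release = read_release("mypackage")
```

A static file cannot express this lookup directly, so it would have to be provided by Sphinx itself or by an extension.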

I think it'd be better for contributors to be focused on existing bugs that likely affect the majority of current Sphinx users than wasting energy on new features like .toml that won't solve anything for the userbase that's already relying on a dynamic conf.py.

@Viicos

Viicos commented Jan 23, 2024

Considering the drawbacks of using static configuration and the lack of null values in TOML, would you consider an option to define a helper for static typing? Something like:

conf.py

from sphinx.somewhere import SphinxConf

conf = SphinxConf(
    project=...,
    ...
)

As an optional alternative of course.
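To flesh out the idea a little, such a helper could be as simple as a dataclass. Everything in this sketch is hypothetical: Sphinx does not ship a SphinxConf class, and the fields shown are only a small sample of the real configuration values.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class SphinxConf:
    """Hypothetical typed container; not an existing Sphinx class."""

    project: str
    author: str = ""
    release: Optional[str] = None  # None stays expressible, unlike in TOML
    extensions: list[str] = field(default_factory=list)

    def export(self, namespace: dict[str, Any]) -> None:
        """Copy the fields into a namespace, e.g. globals() of conf.py."""
        namespace.update(asdict(self))


conf = SphinxConf(project="demo", extensions=["sphinx.ext.autodoc"])

# In a real conf.py this would be conf.export(globals()).
exported: dict[str, Any] = {}
conf.export(exported)
```

A type checker would then flag a misspelled or mistyped option at edit time, while the file remains ordinary Python.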
