Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design docs for the yaml file #3878

Merged
merged 12 commits into from
May 3, 2018

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Mar 29, 2018

I'm publishing the first draft, there are a some things that will need more work/review.

Related readthedocs/readthedocs-build#29, readthedocs/readthedocs-build#30, readthedocs/readthedocs-build#3, #3806, #3685, #2904, #2813


Several settings are already implemented and documented on
https://docs.readthedocs.io/en/latest/yaml-config.html.
So, they aren't covered with much detail here.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I included it here anyway or maybe just do this on the spec?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this document is just the intended changes to the spec. It's not important to enumerate the entire spec here. We'll do this in a programmatic way next.


The current spec is documented on https://docs.readthedocs.io/en/latest/yaml-config.html
and https://github.com/rtfd/readthedocs-build/blob/master/docs/spec.rst

Copy link
Member Author

@stsewd stsewd Mar 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add the new spec here or maybe on another doc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you can write a general idea of each of the new fields here with the default/available options here.

Then we can discuss more specific details on each issue and update this document, probably.

But I think this document should reflect where we are going.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next step is to write schema so we can lay out our ideas before making code changes. This will be easier to parse, test, and verify than writing documentation.

There is some python tooling to help, ie: https://github.com/23andMe/Yamale

Bonus points if we can find a way to automate generating documentation/examples from our schema or validation models.

RTD will checkout to this branch and read this configurations before the real build process starts.

That solves one problem, but RTD still need to know when to update the others global settings.
Would be a waste of resources to made a new build each time a global setting is updated for it to take effect.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that maybe it's fine as it is, if the webhook is configured on the project, it will triggered a build anyway.

------------

Current repository which contains the code related to the configuration file:
https://github.com/rtfd/readthedocs-build
Copy link
Member Author

@stsewd stsewd Mar 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current repository which contains the code related to the configuration file:
https://github.com/rtfd/readthedocs-build

Footnotes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to put a section for the footnotes because Sphinx rendered it on the place were they are declared.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, still are rendered on the same place :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps its a character issue? All of these have -, the spec doesn't mention valid characters though.

Versioning the spec
-------------------

TODO
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more simple solution is to maintain a separated library for this (like https://github.com/rtfd/readthedocs-build/, but is to be moved to the rtd core, so I'm not sure).

@stsewd
Copy link
Member Author

stsewd commented Mar 29, 2018

Build failure on travis is unrelated

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

Another thing to consider is generating a YAML file from the existing DB options for a project, and then using that during the build. That would enable us to switch all builds over to YAML, without having users need to have a YAML file checked into their repo.

Global settings
~~~~~~~~~~~~~~~

Those settings will be read from the YAML file on the ``default branch``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the part that I'm the most unsure of. It feels quite magical to have the "default branch" be the place to set global project settings -- but I don't have a better option except to just keep those settings in the database, that I can think of.

It's really tricky to think about the best way to define redirects, for example. I could see reasons for them to be per-branch, but also wanting them to expand across all existing branches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there aren't too many global configurations (relevant ones). Analytics code and redirects are the most relevant here in my opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking this for redirects on the yaml file #2904 (comment)

What could be the case for per-version redirects? Just to keep them grouped? or maybe for the redirect page option? I think that could be a nice feature then, allow to specify the lang and version on the redirect page option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of reading the global settings from the default branch. This is the solution I was proposing in another discussion. I think we do need a place different that the DB to keep track/history for these settings and the best place I can think of is the default branch.

Also, I think we shouldn't put here any setting related to the build process. I'd like to be able to build very old versions of the documentation using the specific branch/tag without any failure. For this case, these global settings shouldn't affect the build process.

I'd say that global options that need to be overriden in a specific version for any reason (I don't have one, but Eric mentioned this case) could just have the same section of the default branch and override the particular setting: we can merge the global file with the version specific one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question here is: are we planning to remove all the data from the DB or we want to use the YAML from the default branch to populate our DB each time we receive a webhook?

On building process of stable we need,

  • check out default branch and pull
  • load the YAML in memory
  • check out stable
  • show YAML in build commands output
  • use that YAML to build

looks there are some coming and going here

or,

  • populate our DB when we receive a webhook on default branch (previously done)
  • check out stable
  • build using DB (as we currently do)

here we loose the ability to print the YAML

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, didn't understand well, are you proposing to use two files? (one for the global options and other for the regular ones)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, didn't understand well, are you proposing to use two files?

No. Just one file with different sections as we are doing now (maybe those sections can be overriden in the per-version yml in case it's needed)

Copy link
Member

@ericholscher ericholscher Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could be the case for per-version redirects?

Say you rename a file in v2:

  • intro -> getting-started

When a user goes to /en/v1/intro/ you don't want them to get redirected to /en/v1/getting-started/, because it will 404. But on v2, that's exactly what you want.

This is currently solved by us only doing redirects on a 404, but there are likely other examples of design decisions that you want "from now until the future, but not in the past". So perhaps redirects are version specific, but then we need to keep state for every version somewhere, probably a copy of the YAML file at the root of the build, so the web server can read it and take into account redirects?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to keep track of the YAML files, putting them in the DB for each Version object isn't a bad idea, but either way we need to keep that state in a persistent way.

I think the bigger question is relying on the default_branch to effect other builds. It feels really confusing as a user, and there will be times when we get into a state where the default_branch syncing has failed at some point, and there isn't a good way to debug it. I think global state should probably continue to live in the DB itself, since that is 100% obvious and consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to echo @ericholscher opinion here. Project level state should always be in the database, and we aren't looking to get rid of database storage of this data. We should focus on per-version options only. YAML state should not populate database state, only supplement it.

I think we should store config metadata on the build, not just the YAML file. That is, we should store metadata on what we determine the build settings were through YAML and db setting merge at build.

I think we can move redirects to per-version state and remove changes to global state from this spec. Redirects should still exist as project-level options as well, but I think a YAML config spec can supplement more per-version redirects. We can talk more on this implementation in a later design decision, but having it in the spec would be good. Again, we wouldn't alter user-viewable state with per-version redirects, but we'd have to hide these from users and just use them in our redirect logic.

Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this document. Left some feedback.

Global settings
~~~~~~~~~~~~~~~

Those settings will be read from the YAML file on the ``default branch``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of reading the global settings from the default branch. This is the solution I was proposing in another discussion. I think we do need a place different that the DB to keep track/history for these settings and the best place I can think of is the default branch.

Also, I think we shouldn't put here any setting related to the build process. I'd like to be able to build very old versions of the documentation using the specific branch/tag without any failure. For this case, these global settings shouldn't affect the build process.

I'd say that global options that need to be overriden in a specific version for any reason (I don't have one, but Eric mentioned this case) could just have the same section of the default branch and override the particular setting: we can merge the global file with the version specific one.

~~~~

The current spec is documented on https://docs.readthedocs.io/en/latest/yaml-config.html
and https://github.com/rtfd/readthedocs-build/blob/master/docs/spec.rst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the spec.rst file should be completely removed since it's obsolete and a duplication of the yaml-config.html (which is way better)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we should remove the old spec.

Global settings
~~~~~~~~~~~~~~~

Those settings will be read from the YAML file on the ``default branch``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question here is: are we planning to remove all the data from the DB or we want to use the YAML from the default branch to populate our DB each time we receive a webhook?

On building process of stable we need,

  • check out default branch and pull
  • load the YAML in memory
  • check out stable
  • show YAML in build commands output
  • use that YAML to build

looks there are some coming and going here

or,

  • populate our DB when we receive a webhook on default branch (previously done)
  • check out stable
  • build using DB (as we currently do)

here we loose the ability to print the YAML


The current spec is documented on https://docs.readthedocs.io/en/latest/yaml-config.html
and https://github.com/rtfd/readthedocs-build/blob/master/docs/spec.rst

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you can write a general idea of each of the new fields here with the default/available options here.

Then we can discuss more specific details on each issue and update this document, probably.

But I think this document should reflect where we are going.


To decouple the configuration file from the database and keep the compatibility with projects without one,
we need to generate a YAML file from the existing database options,
this will also help with the `Adoption of the configuration file`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After writing my previuos comment and reading this, I think this could be kind of complex: each time a new option is added we will need to tweak our "DB to YAML" builder to handle the new option/format/etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this can be done only for previous projects, new ones can be forced to use the yaml file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm... I'm not sure we can force the user to have a yaml file in the repo. I suppose that RTD tries to "just works" even when the repo only has an index.rst and nothing else. Forcing to have a yaml will break that idea.

To be considered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe autogenerate a configuration file for new projects like the defaults that are created when there isn't a sphinx project

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also only be converting DB -> YAML for options that exist in the DB. Presumably we won't be adding new DB options, so the DB -> YAML conversion should stay the same. If we add new YAML options, unless they are required, the conversion process will just not set them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to "convert" any database state. We don't want to decouple the actual database usage with our config file, just supplement it. Eventually, new projects don't have any version-level config in our database and we can hide the admin for these users. But, removing the database shouldn't be the focus. I think that removes the need to generate a config file, besides a basic suggestion file for old users.

Current repository which contains the code related to the configuration file:
https://github.com/rtfd/readthedocs-build

Footnotes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking like a good start, but I think we need to pump the brakes on removing our database. Storing state in the database isn't the problem, it's that we don't have a complete config file to push people away from using project level config for version level config options.

I think the next step is to start writing a schema and some actual examples. https://github.com/23andMe/Yamale looks promising.

- Have consistency around the spec
- Proper documentation for the end user
- Allow to specify the spec's version used on the YAML file
- Show the YAML file on the build process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should collect and show metadata on the yaml config and build configuration, not just display the yaml config.

Global settings
~~~~~~~~~~~~~~~

Those settings will be read from the YAML file on the ``default branch``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to echo @ericholscher opinion here. Project level state should always be in the database, and we aren't looking to get rid of database storage of this data. We should focus on per-version options only. YAML state should not populate database state, only supplement it.

I think we should store config metadata on the build, not just the YAML file. That is, we should store metadata on what we determine the build settings were through YAML and db setting merge at build.

I think we can move redirects to per-version state and remove changes to global state from this spec. Redirects should still exist as project-level options as well, but I think a YAML config spec can supplement more per-version redirects. We can talk more on this implementation in a later design decision, but having it in the spec would be good. Again, we wouldn't alter user-viewable state with per-version redirects, but we'd have to hide these from users and just use them in our redirect logic.


Several settings are already implemented and documented on
https://docs.readthedocs.io/en/latest/yaml-config.html.
So, they aren't covered with much detail here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this document is just the intended changes to the spec. It's not important to enumerate the entire spec here. We'll do this in a programmatic way next.

The spec of the configuration file must use this conventions.

- Use `[]` to indicate an empty list
- Use `null` to indicate a null value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for using YAML spec operators properly! :)


The current spec is documented on https://docs.readthedocs.io/en/latest/yaml-config.html
and https://github.com/rtfd/readthedocs-build/blob/master/docs/spec.rst

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next step is to write schema so we can lay out our ideas before making code changes. This will be easier to parse, test, and verify than writing documentation.

There is some python tooling to help, ie: https://github.com/23andMe/Yamale

Bonus points if we can find a way to automate generating documentation/examples from our schema or validation models.


To decouple the configuration file from the database and keep the compatibility with projects without one,
we need to generate a YAML file from the existing database options,
this will also help with the `Adoption of the configuration file`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to "convert" any database state. We don't want to decouple the actual database usage with our config file, just supplement it. Eventually, new projects don't have any version-level config in our database and we can hide the admin for these users. But, removing the database shouldn't be the focus. I think that removes the need to generate a config file, besides a basic suggestion file for old users.

-----------------

- The repository is updated
- Checkout to the default branch and read the global settings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global settings step can be skipped with removal of this requirement

- The repository is updated
- Checkout to the default branch and read the global settings
- Checkout to the current version and read the local settings
- Before the build process the YAML file is shown (similar to ``cat config.py`` step).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike this pattern, and dislike cat conf.py as well. I'm -1 on this, but we should collect metadata on the actual config we use to build. We can selectively show users this config

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cat conf.py looks more a debugging output than a final user output.

I think it's easier to collect metadata from the YAML file, but the conf.py could be more error prone to do. Although, we may just need things like theme_options html_context or similar that we are interested in. That's another topic, though.

- Checkout to the current version and read the local settings
- Before the build process the YAML file is shown (similar to ``cat config.py`` step).
- Try to parse the YAML file (the build fails if there is an error)
- The version is built according to the settings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an additional step of first bringing in settings from the database, and then merging the yaml config on top of this.

Current repository which contains the code related to the configuration file:
https://github.com/rtfd/readthedocs-build

Footnotes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps its a character issue? All of these have -, the spec doesn't mention valid characters though.

@stsewd
Copy link
Member Author

stsewd commented Apr 26, 2018

I made some changes based on the feedback, relevant changes are:

  • The YAML file only have per-version settings (no more globals here)
  • Use a validation scheme for the spec
  • Don't cat the YAML in the build process, but show the general settings that were used for the build

Hope I didn't forget anything.

@agjohnson
Copy link
Contributor

agjohnson commented May 3, 2018

Thanks @stsewd ! This looks great. I think we can start to move conversation on to more specific issues as we begin the planning phase of the work. A lot of this work will start of targeting our 2.5 release, and some of the more specific changes will be on a later milestone.

I think this project will start to be planned in a project for now, as the work will span milestones and is discrete enough to track in one spot. I've created https://github.com/orgs/rtfd/projects/2 where I'll start prioritizing some of the work that we'll be doing here.

@agjohnson
Copy link
Contributor

Closing here as i think we've been through this discussion enough here. We'll start by planning thing out a little bit more and setting up some priorities around the work. We have a number of issues to create once the schema is created, so that might be blocking for a little bit.

@agjohnson agjohnson closed this May 3, 2018
@agjohnson agjohnson reopened this May 3, 2018
@agjohnson agjohnson merged commit 3b2c1a8 into readthedocs:master May 3, 2018
@stsewd stsewd deleted the docs-design-yaml-file branch May 4, 2018 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants