Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add schema for configuration file with yamale #4084

Merged
merged 30 commits into from Jun 8, 2018

Conversation

@stsewd
Copy link
Member

@stsewd stsewd commented May 11, 2018

This refs to #4058

I haven't tested this yet, I will continue later, just wanted to upload it for an early review :)

@agjohnson agjohnson added this to the 2.5 milestone May 11, 2018
@@ -108,6 +108,21 @@ def make_test_hg():
return directory


def apply_fs(tmpdir, contents):
Copy link
Member Author

@stsewd stsewd May 14, 2018

Copy link
Member

@humitos humitos May 24, 2018

Since readthedocs depends on -build, why not just import and use it?

Copy link
Member Author

@stsewd stsewd May 25, 2018

ha, good point, but I feel a little unsure about that because we are going to re-integrate the build project 🤔

Copy link
Member

@humitos humitos May 29, 2018

I think that's a problem for the future :) and we will need to remove this function from here anyway, hehe.

apply_fs(self.tmpdir, file)
return path.join(self.tmpdir.strpath, 'rtd.yml')

def assertValidConfig(self, content):
Copy link
Member Author

@stsewd stsewd May 14, 2018

Camel case just for follow the same style as the self.assert methods

def assertValidConfig(self, content):
file = self.create_yaml(content)
data = yamale.make_data(file)
yamale.validate(self.schema, data)
Copy link
Member Author

@stsewd stsewd May 14, 2018

This function doesn't return anything when is successful

# Default: false
pip_install: bool(required=False)
# Install the project using python setup.py install or pip
install: enum('pip', 'setup.py', required=False)
Copy link
Member Author

@stsewd stsewd May 14, 2018

This is the first breaking change :), I think is more simple to do this, instead of having a boolean field for each

Copy link
Member Author

@stsewd stsewd May 14, 2018

Also, the extra requirements field just works when the project is installed using pip, but setup.py doesn't allow this too? 🤔

python: include('python', required=False)

# Configuration for sphinx
sphinx: include('sphinx', required=False)
Copy link
Member Author

@stsewd stsewd May 14, 2018

Instead of the Documentation type I have added a sphinx key. We want to still support mkdocs, right? If so, do you prefer the documentation type key back or having a mkdocs?

Copy link
Member Author

@stsewd stsewd May 15, 2018

We could use this to extend the build and support things like #1139

Copy link
Member

@ericholscher ericholscher May 24, 2018

Not sure. It does seem reasonable to have a sphinx and mkdocs section, and just always default to Sphinx unless we see a mkdocs section or something.

Copy link
Member

@humitos humitos May 24, 2018

Having two different sections could be useful be extended in the future.

In case this works as Eric mentioned, we will need a check to to accept one or the other, but not both.


# Give the virtual environment access to the global site-packages dir
# Default: false
system_packages: bool(required=False)
Copy link
Member Author

@stsewd stsewd May 14, 2018

I wasn't 100% sure if this should belong in the python key

Copy link
Member

@ericholscher ericholscher May 24, 2018

I mostly want this option to go away, but I don't know if we can do it quite yet. I think it can go in python though.

# Read the Docs configuration file

# The version of the spec to be use
version: enum('2')
Copy link
Member Author

@stsewd stsewd May 15, 2018

I assume this spec will be used only on the version 2, if there is another version, we will need to create another schema.

Also, I was thinking that we can use this to validate the current v1 API. But I'm not sure if that's worth it.

Copy link
Member

@ericholscher ericholscher May 24, 2018

Yea, I think we can keep v1 as it is.


# Formats of the documentation to be built
# Default: ['htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia']
formats: any(list(enum('htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia')), enum('all'), required=False)
Copy link
Member Author

@stsewd stsewd May 15, 2018

We can't break this line, sadly :/

Copy link
Member

@humitos humitos May 24, 2018

singlehtmllocalmedia shouldn't be in the default values after the discussion we had today.

Copy link
Member

@humitos humitos May 24, 2018

I also would remove the all option and make it always explicit

Copy link
Member

@ericholscher ericholscher May 25, 2018

I think maybe we should even just default to building HTML, and let users specify if they want more. We build a ton of PDF's & Epub's too that are never used.

The original thought was those were good for users, even if the doc owners never used them. Perhaps we could continue building PDF's, but epub's aren't heavily used.

Copy link
Member Author

@stsewd stsewd May 25, 2018

We build a ton of PDF's & Epub's too that are never used

haha, yeah I don't think much people download docs to read offline these days. I think the all keyword can remain just to keep consistency around the spec.

Maybe an empty list as default?

Copy link
Member

@humitos humitos May 29, 2018

Most of the people that uses the PDF is because they don't usually have 24/7 internet access --I know people who regularly ask for PDF to be downloaded.

I would like to suggest a crazy idea (for another PR, etc) that could help RTD servers to be cleaner and without tons of useless files: build extra resourses on demand and save a copy (sync servers) for the requested versions only. That way, we will only have the extra resources for those versions that at least where used once.

formats: any(list(enum('htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia')), enum('all'), required=False)

# The path to the requirements file from the root of the project
requirements_file: str(required=False)
Copy link
Member Author

@stsewd stsewd May 15, 2018

I think this can be moved to the python section

Copy link
Member Author

@stsewd stsewd May 15, 2018

People using conda, they just use the environment file, right?

Copy link
Member

@ericholscher ericholscher May 24, 2018

I believe so. Though moving it to Python could be a bit weird for projects not written in Python, but using this to install Sphinx plugins. I don't feel strongly either way.

Copy link
Member

@humitos humitos May 24, 2018

It sounds like something python specific to me since it's for pip which is something for python. I agree on moving it to the Python section.

redirects:
# Key/value list, represent redirects of type page
# from url -> to url
page: map(str(), str())
Copy link
Member Author

@stsewd stsewd May 15, 2018

For redirects, this is the only type that makes sense here. I didn't add the html_to_dir type, I think that's more like a global setting.

Copy link
Member Author

@stsewd stsewd May 15, 2018

Maybe we can just drop the page key and use only redirects

Copy link
Member

@ericholscher ericholscher May 24, 2018

Yea, we likely need a type as well (since we have a few types). So perhaps it's a list of dicts, with the dict being {type, from, to} or similar.

@stsewd
Copy link
Member Author

@stsewd stsewd commented May 16, 2018

I think we can add some stuff from Feature.

From the Feature model, I just found the pip_always_upgrade and use_setuptools_laterst features that may be useful.

@stsewd
Copy link
Member Author

@stsewd stsewd commented May 17, 2018

Additional options to consider:

Copy link
Member

@ericholscher ericholscher left a comment

This looks like a good approach. I like that it will do the validation for us, with us just maintaining the schema, instead of the entire validation code.

# Read the Docs configuration file

# The version of the spec to be use
version: enum('2')
Copy link
Member

@ericholscher ericholscher May 24, 2018

Yea, I think we can keep v1 as it is.

formats: any(list(enum('htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia')), enum('all'), required=False)

# The path to the requirements file from the root of the project
requirements_file: str(required=False)
Copy link
Member

@ericholscher ericholscher May 24, 2018

I believe so. Though moving it to Python could be a bit weird for projects not written in Python, but using this to install Sphinx plugins. I don't feel strongly either way.

python: include('python', required=False)

# Configuration for sphinx
sphinx: include('sphinx', required=False)
Copy link
Member

@ericholscher ericholscher May 24, 2018

Not sure. It does seem reasonable to have a sphinx and mkdocs section, and just always default to Sphinx unless we see a mkdocs section or something.


# Give the virtual environment access to the global site-packages dir
# Default: false
system_packages: bool(required=False)
Copy link
Member

@ericholscher ericholscher May 24, 2018

I mostly want this option to go away, but I don't know if we can do it quite yet. I think it can go in python though.

redirects:
# Key/value list, represent redirects of type page
# from url -> to url
page: map(str(), str())
Copy link
Member

@ericholscher ericholscher May 24, 2018

Yea, we likely need a type as well (since we have a few types). So perhaps it's a list of dicts, with the dict being {type, from, to} or similar.

Copy link
Member

@humitos humitos left a comment

This is just great! :)

@@ -108,6 +108,21 @@ def make_test_hg():
return directory


def apply_fs(tmpdir, contents):
Copy link
Member

@humitos humitos May 24, 2018

Since readthedocs depends on -build, why not just import and use it?


# Formats of the documentation to be built
# Default: ['htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia']
formats: any(list(enum('htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia')), enum('all'), required=False)
Copy link
Member

@humitos humitos May 24, 2018

singlehtmllocalmedia shouldn't be in the default values after the discussion we had today.


# Formats of the documentation to be built
# Default: ['htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia']
formats: any(list(enum('htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia')), enum('all'), required=False)
Copy link
Member

@humitos humitos May 24, 2018

I also would remove the all option and make it always explicit

formats: any(list(enum('htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia')), enum('all'), required=False)

# The path to the requirements file from the root of the project
requirements_file: str(required=False)
Copy link
Member

@humitos humitos May 24, 2018

It sounds like something python specific to me since it's for pip which is something for python. I agree on moving it to the Python section.

# Configuration for the documentation build process
build: include('build', required=False)

# Configuration of the Python executable to be used
Copy link
Member

@humitos humitos May 24, 2018

Instead of executable, maybe "Python environment"

with pytest.raises(ValueError) as excinfo:
yamale.validate(self.schema, data)
for msg in msgs:
self.assertIn(msg, str(excinfo.value))
Copy link
Member

@humitos humitos May 24, 2018

Is it possible to have multiple different messages in only one exception?

I think you can simplify this by receiving just a msg that it's a string.

Copy link
Member Author

@stsewd stsewd May 25, 2018

This is more for easy testing, so I don't have to write the whole error or use some weird regex p:

Copy link
Member

@humitos humitos May 29, 2018

Not following you.

What I mean here is that I think we don't need the for msg in msgs since the msgs will be only one always, right?

build:
image: "{image}"
'''
for image in ['1.0', '2.0', 'latest']:
Copy link
Member

@humitos humitos May 24, 2018

You may want to import the valid options from the -build package, so we don't need to keep this up to date every time we release a new version.

Copy link
Member Author

@stsewd stsewd May 25, 2018

Or maybe we can move the v1 to a similar spec. Also, if we release a new version, isn't better to just support it on the latest spec? I mean, so we can force to the users to update p:

content = '''
version: "2"
build:
image: "4.0"
Copy link
Member

@humitos humitos May 24, 2018

You may want to use an invalid name, since 4.0 will be valid sooner and we will need to change this.

Copy link
Member

@humitos humitos May 24, 2018

Mmm... although, if these tests are going to be fixed for v2, maybe we don't want this.

I'm not sure how we are going to handle the tests for different versions of the scheme. Maybe it worth to consider this when creating the test classes associated to this.

python:
guido: true
'''
self.assertValidConfig(content)
Copy link
Member

@humitos humitos May 24, 2018

Why this one does not fail?

Copy link
Member Author

@stsewd stsewd May 25, 2018

Because python loves Guido haha. This is for testing that we allow the user to put other keys on the spec :). Anyway, this was to test the version key wasn't required.

Copy link
Member

@humitos humitos May 29, 2018

This is for testing that we allow the user to put other keys on the spec :)

I suppose that this should explicity fail on keys that are not allowed/ignored. Otherwise, if you had a typo in a valid setting you won't be informed about this and your value/setting won't be picked by our system.

Similar case to what has happened with the readthedocs.yaml when it should be .yml.

Copy link
Member Author

@stsewd stsewd May 29, 2018

Yep, yamala doesn't have a strict mode, I'm trying to figure out if it can be patched

content = '''
version: "2"
python:
system_packages: not true
Copy link
Member

@humitos humitos May 24, 2018

What would happen here with null and with [] and '' for example?

Copy link
Member Author

@stsewd stsewd May 25, 2018

I'll add more test cases

@stsewd
Copy link
Member Author

@stsewd stsewd commented May 25, 2018

Just commenting about the yamale library:

  • There isn't a strict mode 23andMe/Yamale#27 (typos can cost lives)
  • There isn't a default value

I think we can solve both problems by monkey patching the lib (I don't see much activity in the project)

@stsewd
Copy link
Member Author

@stsewd stsewd commented May 26, 2018

I was having some mysterious problems with the temporal files, so have to change from class-based tests to pytest, also I wasn't using anything from django and now I can use a lot of pytest's features 😎

assertValidConfig(tmpdir, content)


@pytest.mark.parametrize('value', ['2', 'environment.py'])
Copy link
Member

@humitos humitos May 30, 2018

Why it does fail with environment.py? I think that it's a valid path and means that the environment.py file is at the root level of the repository.

(there are other places where I have the same question --for example, conda.environment)

Copy link
Member Author

@stsewd stsewd May 30, 2018

It fails because it doesn't exist in the current file system (there is only an environment.yaml file)

Copy link
Member

@humitos humitos May 30, 2018

Got it.

Two things here:

  1. I'd use something more descriptive on the values when we are testing something in particular, like non-existent-file.yml for example. In that case, you don't need to read the whole implementation to understand why it will fail
  2. I think the validation should raise two different exceptions. One for something that's not a "valid path" like http://not-a-valid-path.com and another one for /file/not/found

(note that in 2 I used what I proposed in 1 and it's very clear)

Copy link
Member Author

@stsewd stsewd May 30, 2018

I go a little crazy with the names on tests p:

I'm not sure how to validate a valid path without doing it in a hacky way or a complicated regex, not sure if it's worth it.

Copy link
Member

@humitos humitos May 31, 2018

We probably don't want to validate that the file exists on the schema, since any path is valid for schema but we want to validate that the file exists when using the YAML file to build the docs.

As @ericholscher said in another comment, we may want to have these validations at different places/steps.

(although, my suggestion about the names of the invalid options/files/etc is still valid and I think you should follow it :) )

ALL = 'all'


def flatten(dic, keep_key=False, position=None):
Copy link
Member Author

@stsewd stsewd May 31, 2018

This is a simplified version of the function used internally by yamale p:

Copy link
Member

@humitos humitos May 31, 2018

You may want to add a reference (for the future) to the source code of yamale with a comment above the definition of this function pointing to the source code of the original one.

Copy link
Contributor

@agjohnson agjohnson left a comment

So I think there are two PRs happening here. The first is writing a schema file so we can verify our thoughts around our spec.

The second is moving to use yamale as an application level dependency and move our validation to use yamale internals. I don't think we've made this decision yet, and would probably duplicate a lot of existing code.

Let's discuss this change in another issue. Save some of this work for now, but this PR should be dialed back to just include a way to describe our spec via a schema file in our unit tests.

The schema looks great. All of the additions I can think of are not yet RTD features, so we can hold off on new additions for just right now.


V2_SCHEMA = path.join(
path.dirname(__file__),
'../rtd_tests/fixtures/spec/v2/schema.yml'
Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

We shouldn't have code outside tests that references tests fixtures. This module should probably live under rtd_tests somewhere instead, as we've only decided to use the library for testing and validating our thoughts around our spec for now.

'unique': ['submodules.include', 'submodules.exclude'],
'default': 'submodules.include',
}
}
Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

Defaults and constraints should probably live where we are actually parsing the configuration file, as I don't think we're testing defaults yet. Testing defaults and constraints will happen at the parsing level, this PR is working on the spec level.

else:
child[item_position] = value

return child
Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

What is the underlying need for flattening the dictionary? This seems like it could be an unnecessary step.

Copy link
Member Author

@stsewd stsewd Jun 1, 2018

It makes it easy to manipulate and check the keys, also is better to describe constraints, for example

https://github.com/stsewd/readthedocs.org/blob/yaml-schema-yamale/readthedocs/buildconfig/__init__.py#L176-L180

We don't have to use recursion or several nested bucles to do things like that

Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

It's not clear if this is still required. It seems like a confusing way to do a nested dict merge. I think if we are implementing defaults through yamale there is a tighter integration rather than keeping this data in a dictionary.

return False


class BuildConfig(object):
Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

I haven't played with yamale, it's not clear how much of this class is needed to validate our specification in unit tests. If this is more for implementing yamale in our application, we should break this out and save it.

build:
# The build docker image to be used
# Default: '2.0'
image: enum('1.0', '2.0', 'latest', required=False)
Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

As per readthedocs/readthedocs-docker-images#62, this will be changing:

  • The default should be latest
  • stable is also an allowed tag
  • Numeric versions are allowed, but are not public knowledge (at least not referenced in our documentation)
  • We have versions 3.0 and 4.0rc1 (soon to be 4.0), we could have a 5.0 before this spec changes at all. I don't know how we plan to handle this (update the spec as images are released, and don't bump the version number of the spec?). At least we should include current images here.

Copy link
Member Author

@stsewd stsewd Jun 1, 2018

I think we just need to update the spec without increasing the version number. If there is a v3 spec we can stop updating the images there and add new ones in the most recent spec.

Numeric versions are allowed, but are not public knowledge (at least not referenced in our documentation)

But it is is our docs https://docs.readthedocs.io/en/latest/yaml-config.html#build-image

Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

Yup, that's changing. We don't want to support numeric versions directly anymore. See the discussion in readthedocs/readthedocs-docker-images#62 for more background.

I think we just need to update the spec without increasing the version number.

Yup, agreed. I think this just needs our current images then.

python:
# The Python version
# Default: '3.6'
version: enum('2', '2.7', '3', '3.5', '3.6', required=False)
Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

I'm not sure what to do here. This is dependent on the docker image. Our current latest image (our default image will soon be latest) has 2.7, 3.3, 3.4, 3.5, and 3.6. The 4.0 image has 2.7, 3.5, 3.6, and should probably have 3.7 soon.

I think verifying the python version list is a config parsing level validation, we don't necessarily need to worry about validating across fields as part of this exercise. Perhaps here, for our spec discussion, we just put all possible python versions?

Copy link
Member Author

@stsewd stsewd Jun 1, 2018

Perhaps here, for our spec discussion, we just put all possible python versions?

I'd say yes

@@ -0,0 +1,187 @@
"""Validator for the RTD configuration file."""
Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

As general feedback, I'm not sure this needs to be broken out into a separate django application path. I'd implement this next to similar code so that it was easy to find. I believe doc_builder/ already has similar code around our config file.

See note below though. I don't think this should be application level yet (pending a design discussion around usage of yamale)

@@ -64,6 +64,7 @@ Unipath==1.1
django-kombu==0.9.4
mimeparse==0.1.3
mock==2.0.0
yamale==1.7.0
Copy link
Contributor

@agjohnson agjohnson Jun 1, 2018

This is assuming yamale will be an application dependency, where I think it's only a testing dependency for now. This should be moved to testing.txt

@stsewd
Copy link
Member Author

@stsewd stsewd commented Jun 1, 2018

I have moved the code to just do the schema validation using yamale, the previous code can be recovered from this branch if is needed.

@humitos
Copy link
Member

@humitos humitos commented Jun 4, 2018

the previous code can be recovered from this branch if is needed.

Be careful with this. After this got merged, maybe the branch is deleted and it's more complicated to recover the code if not possible.

@stsewd stsewd requested a review from Jun 5, 2018

# Formats of the documentation to be built
# Default: []
formats: any(list(enum('htmlzip', 'pdf', 'epub')), enum('all'), required=False)
Copy link
Member Author

@stsewd stsewd Jun 6, 2018

This is only valid for sphinx, I wonder if we can move this to the sphinx key, or maybe we want to support this #1939? (still no pdf for mkdocs)

Copy link
Contributor

@agjohnson agjohnson Jun 8, 2018

It's only valid for sphinx for now. We might have other engines that produce a subset of these later. I'm fine with this being top level

@agjohnson
Copy link
Contributor

@agjohnson agjohnson commented Jun 8, 2018

The only thing I don't necessarily like is that we can only inspect the strings as errors from yamale, but that is a yamale problem. We can potentially fix these things later.

@agjohnson agjohnson removed this from the 2.7 milestone Jun 8, 2018
@agjohnson agjohnson added this to the 2.6 milestone Jun 8, 2018
@agjohnson agjohnson merged commit e5c0c1c into readthedocs:master Jun 8, 2018
1 check passed
@stsewd stsewd deleted the yaml-schema-yamale branch Jun 8, 2018
@stsewd stsewd restored the yaml-schema-yamale branch Jun 8, 2018
@stsewd stsewd mentioned this pull request Aug 16, 2018
@stsewd stsewd deleted the yaml-schema-yamale branch Feb 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants