Add schema for configuration file with yamale #4084

Merged
merged 30 commits into rtfd:master from stsewd:yaml-schema-yamale on Jun 8, 2018

Conversation

@stsewd
Member

stsewd commented May 11, 2018

This references #4058.

I haven't tested this yet; I'll continue later. I just wanted to upload it for an early review :)

@stsewd stsewd referenced this pull request May 11, 2018

Closed

Add schema for yaml configuration file #4079

0 of 4 tasks complete

@agjohnson agjohnson added this to the 2.5 milestone May 11, 2018

@@ -108,6 +108,21 @@ def make_test_hg():
return directory
def apply_fs(tmpdir, contents):


@humitos

humitos May 24, 2018

Member

Since readthedocs depends on -build, why not just import and use it?

@stsewd

stsewd May 25, 2018

Member

ha, good point, but I feel a little unsure about that because we are going to re-integrate the build project 🤔

@humitos

humitos May 29, 2018

Member

I think that's a problem for the future :) and we will need to remove this function from here anyway, hehe.

apply_fs(self.tmpdir, file)
return path.join(self.tmpdir.strpath, 'rtd.yml')
def assertValidConfig(self, content):

@stsewd

stsewd May 14, 2018

Member

Camel case is just to follow the same style as the self.assert* methods.

def assertValidConfig(self, content):
file = self.create_yaml(content)
data = yamale.make_data(file)
yamale.validate(self.schema, data)

@stsewd

stsewd May 14, 2018

Member

This function doesn't return anything when validation is successful.
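
For context, a minimal, self-contained sketch of how these helpers hang together with yamale (adapted from the tests in this PR; the schema path and the temp-file handling are stand-ins, not the exact code):

import tempfile
import pytest
import yamale

SCHEMA = yamale.make_schema('schema.yml')  # path to the v2 schema file (illustrative)

def create_yaml(content):
    # Write the YAML snippet to a temporary file and return its path.
    handle = tempfile.NamedTemporaryFile('w', suffix='.yml', delete=False)
    handle.write(content)
    handle.close()
    return handle.name

def assertValidConfig(content):
    data = yamale.make_data(create_yaml(content))
    # yamale.validate() raises ValueError on failure and returns nothing
    # useful on success, hence no assertion here.
    yamale.validate(SCHEMA, data)

def assertInvalidConfig(content, msg):
    data = yamale.make_data(create_yaml(content))
    with pytest.raises(ValueError) as excinfo:
        yamale.validate(SCHEMA, data)
    assert msg in str(excinfo.value)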

# Default: false
pip_install: bool(required=False)
# Install the project using python setup.py install or pip
install: enum('pip', 'setup.py', required=False)

@stsewd

stsewd May 14, 2018

Member

This is the first breaking change :). I think it's simpler to do this instead of having a boolean field for each one.

@stsewd

stsewd May 14, 2018

Member

Also, the extra requirements field only works when the project is installed using pip; doesn't setup.py allow this too? 🤔
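
For illustration, a config snippet that the proposed enum would accept, assuming these keys live under the python section as the surrounding schema suggests (checked with the assertValidConfig sketch above):

assertValidConfig('''
version: "2"
python:
  install: pip
''')
# 'setup.py' is the only other value the enum accepts; anything else fails validation.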

python: include('python', required=False)
# Configuration for sphinx
sphinx: include('sphinx', required=False)

@stsewd

stsewd May 14, 2018

Member

Instead of the documentation type, I have added a sphinx key. We still want to support mkdocs, right? If so, do you prefer bringing the documentation type key back, or having a mkdocs key?

@stsewd

stsewd May 15, 2018

Member

We could use this to extend the build and support things like #1139

@ericholscher

ericholscher May 24, 2018

Member

Not sure. It does seem reasonable to have a sphinx and mkdocs section, and just always default to Sphinx unless we see a mkdocs section or something.

@humitos

humitos May 24, 2018

Member

Having two different sections could be useful and could be extended in the future.

In case this works as Eric mentioned, we will need a check to accept one or the other, but not both.
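
A small sketch of the mutual-exclusion check described above; validate_doc_tool is a hypothetical helper, not code from this PR:

def validate_doc_tool(config):
    # Default to Sphinx unless a mkdocs section is present,
    # and reject configs that declare both sections.
    has_sphinx = 'sphinx' in config
    has_mkdocs = 'mkdocs' in config
    if has_sphinx and has_mkdocs:
        raise ValueError("Use either a 'sphinx' or a 'mkdocs' section, not both")
    return 'mkdocs' if has_mkdocs else 'sphinx'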

# Give the virtual environment access to the global site-packages dir
# Default: false
system_packages: bool(required=False)

@stsewd

stsewd May 14, 2018

Member

I wasn't 100% sure if this should belong in the python key

@ericholscher

ericholscher May 24, 2018

Member

I mostly want this option to go away, but I don't know if we can do it quite yet. I think it can go in python though.

# Read the Docs configuration file
# The version of the spec to be used
version: enum('2')

@stsewd

stsewd May 15, 2018

Member

I assume this spec will be used only for version 2; if there is another version, we will need to create another schema.

Also, I was thinking that we could use this to validate the current v1 API. But I'm not sure if that's worth it.

@ericholscher

ericholscher May 24, 2018

Member

Yea, I think we can keep v1 as it is.

# Formats of the documentation to be built
# Default: ['htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia']
formats: any(list(enum('htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia')), enum('all'), required=False)

@stsewd

stsewd May 15, 2018

Member

We can't break this line, sadly :/

@humitos

humitos May 24, 2018

Member

singlehtmllocalmedia shouldn't be in the default values after the discussion we had today.

@humitos

humitos May 24, 2018

Member

I also would remove the all option and make it always explicit

@ericholscher

ericholscher May 25, 2018

Member

I think maybe we should even just default to building HTML, and let users specify if they want more. We build a ton of PDFs and EPUBs too that are never used.

The original thought was that those were good for users, even if the doc owners never used them. Perhaps we could continue building PDFs, but EPUBs aren't heavily used.

@stsewd

stsewd May 25, 2018

Member

We build a ton of PDFs and EPUBs too that are never used

haha, yeah, I don't think many people download docs to read offline these days. I think the all keyword can remain just to keep consistency around the spec.

Maybe an empty list as default?

@humitos

humitos May 29, 2018

Member

Most of the people that use the PDF do so because they don't usually have 24/7 internet access --I know people who regularly ask for the PDF to be downloadable.

I would like to suggest a crazy idea (for another PR, etc.) that could help keep the RTD servers cleaner and without tons of useless files: build the extra resources on demand and save a copy (sync servers) for the requested versions only. That way, we will only have the extra resources for those versions that were used at least once.
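
For reference, values the formats rule above is meant to accept: an explicit subset, the all keyword, and (per the suggestion above, and assuming the list rule accepts an empty list) an empty list. A quick check using the assertValidConfig sketch from earlier:

assertValidConfig('''
version: "2"
formats:
  - htmlzip
  - pdf
''')
assertValidConfig('''
version: "2"
formats: all
''')
assertValidConfig('''
version: "2"
formats: []
''')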

formats: any(list(enum('htmlzip', 'pdf', 'epub', 'singlehtmllocalmedia')), enum('all'), required=False)
# The path to the requirements file from the root of the project
requirements_file: str(required=False)

@stsewd

stsewd May 15, 2018

Member

I think this can be moved to the python section

@stsewd

stsewd May 15, 2018

Member

People using conda just use the environment file, right?

@ericholscher

ericholscher May 24, 2018

Member

I believe so. Though moving it to Python could be a bit weird for projects not written in Python, but using this to install Sphinx plugins. I don't feel strongly either way.

@humitos

humitos May 24, 2018

Member

It sounds like something Python-specific to me, since it's for pip, which is a Python tool. I agree on moving it to the Python section.

redirects:
# Key/value list, representing redirects of type page
# from url -> to url
page: map(str(), str())

@stsewd

stsewd May 15, 2018

Member

For redirects, this is the only type that makes sense here. I didn't add the html_to_dir type; I think that's more of a global setting.

@stsewd

stsewd May 15, 2018

Member

Maybe we can just drop the page key and use only redirects

@ericholscher

ericholscher May 24, 2018

Member

Yea, we likely need a type as well (since we have a few types). So perhaps it's a list of dicts, with the dict being {type, from, to} or similar.
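
A sketch of the two shapes under discussion, written the way these tests embed YAML; the redirect types beyond page are illustrative:

# Current schema: a map of from-URL to to-URL, page redirects only.
current = '''
redirects:
  page:
    /install.html: /installation.html
'''

# Shape Eric suggests: a list of dicts with an explicit type.
proposed = '''
redirects:
  - type: page
    from: /install.html
    to: /installation.html
  - type: prefix
    from: /old/
    to: /new/
'''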

@stsewd

Member

stsewd commented May 16, 2018

I think we can add some stuff from Feature.

From the Feature model, I just found the pip_always_upgrade and use_setuptools_latest features that may be useful.

@stsewd

Member

stsewd commented May 17, 2018

Additional options to consider:

@ericholscher

This looks like a good approach. I like that it will do the validation for us, with us just maintaining the schema, instead of the entire validation code.


@humitos

This is just great! :)


# Configuration for the documentation build process
build: include('build', required=False)
# Configuration of the Python executable to be used

@humitos

humitos May 24, 2018

Member

Instead of executable, maybe "Python environment"

with pytest.raises(ValueError) as excinfo:
yamale.validate(self.schema, data)
for msg in msgs:
self.assertIn(msg, str(excinfo.value))

@humitos

humitos May 24, 2018

Member

Is it possible to have multiple different messages in only one exception?

I think you can simplify this by receiving just a msg that is a string.

@stsewd

stsewd May 25, 2018

Member

This is more for easy testing, so I don't have to write the whole error or use some weird regex p:

@humitos

humitos May 29, 2018

Member

Not following you.

What I mean here is that I think we don't need the for msg in msgs, since msgs will always contain only one message, right?

build:
image: "{image}"
'''
for image in ['1.0', '2.0', 'latest']:

@humitos

humitos May 24, 2018

Member

You may want to import the valid options from the -build package, so we don't need to keep this up to date every time we release a new version.

@stsewd

stsewd May 25, 2018

Member

Or maybe we can move v1 to a similar spec. Also, if we release a new version, isn't it better to just support it in the latest spec? I mean, so we can force users to update p:

content = '''
version: "2"
build:
image: "4.0"

@humitos

humitos May 24, 2018

Member

You may want to use an invalid name, since 4.0 will be valid soon and we will need to change this.

@humitos

humitos May 24, 2018

Member

Mmm... although, if these tests are going to be fixed for v2, maybe we don't want this.

I'm not sure how we are going to handle the tests for different versions of the schema. Maybe it's worth considering this when creating the test classes associated with this.

python:
guido: true
'''
self.assertValidConfig(content)

@humitos

humitos May 24, 2018

Member

Why does this one not fail?

@stsewd

stsewd May 25, 2018

Member

Because Python loves Guido haha. This is for testing that we allow the user to put other keys on the spec :). Anyway, this was to test that the version key wasn't required.

@humitos

humitos May 29, 2018

Member

This is for testing that we allow the user to put other keys on the spec :)

I suppose that this should explicitly fail on keys that are not allowed/ignored. Otherwise, if you had a typo in a valid setting, you wouldn't be informed about it and your value/setting wouldn't be picked up by our system.

Similar to what happened with readthedocs.yaml when it should be .yml.

@stsewd

stsewd May 29, 2018

Member

Yep, yamale doesn't have a strict mode; I'm trying to figure out if it can be patched.

content = '''
version: "2"
python:
system_packages: not true

@humitos

humitos May 24, 2018

Member

What would happen here with null and with [] and '' for example?

@stsewd

stsewd May 25, 2018

Member

I'll add more test cases

@stsewd

Member

stsewd commented May 25, 2018

Just commenting about the yamale library:

  • There isn't a strict mode 23andMe/Yamale#27 (typos can cost lives)
  • There is no support for default values

I think we can solve both problems by monkey-patching the lib (I don't see much activity in the project).
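
A hedged sketch of one way to approximate a strict mode on top of yamale: compare the keys in the user's config against the keys declared in the schema's root document. This is a workaround using PyYAML, not a yamale feature, and it assumes the include definitions live in follow-on YAML documents, as yamale schemas usually do:

import yaml

def check_unknown_keys(schema_path, config_path):
    """Reject top-level config keys that the schema does not declare.

    Nested keys would need the same treatment against the included sections.
    """
    with open(schema_path) as fh:
        # The first YAML document in a yamale schema file is the root mapping.
        root = next(yaml.safe_load_all(fh))
    with open(config_path) as fh:
        config = yaml.safe_load(fh) or {}
    unknown = set(config) - set(root)
    if unknown:
        raise ValueError('Unknown configuration keys: {}'.format(sorted(unknown)))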

@stsewd

Member

stsewd commented May 26, 2018

I was having some mysterious problems with the temporary files, so I had to change from class-based tests to pytest. Also, I wasn't using anything from Django, and now I can use a lot of pytest's features 😎

@@ -273,6 +292,16 @@ def test_python_extra_requirements_invalid(tmpdir):
)
@pytest.mark.parametrize('value', ['', 'null', '[]'])

@humitos

humitos May 30, 2018

Member

Oh, yes!

assertValidConfig(tmpdir, content)
@pytest.mark.parametrize('value', ['2', 'environment.py'])

@humitos

humitos May 30, 2018

Member

Why does it fail with environment.py? I think that it's a valid path and means that the environment.py file is at the root level of the repository.

(there are other places where I have the same question --for example, conda.environment)

@stsewd

stsewd May 30, 2018

Member

It fails because it doesn't exist in the current file system (there is only an environment.yaml file)

@humitos

humitos May 30, 2018

Member

Got it.

Two things here:

  1. I'd use something more descriptive on the values when we are testing something in particular, like non-existent-file.yml for example. In that case, you don't need to read the whole implementation to understand why it will fail
  2. I think the validation should raise two different exceptions. One for something that's not a "valid path" like http://not-a-valid-path.com and another one for /file/not/found

(note that in 2 I used what I proposed in 1 and it's very clear)

@stsewd

stsewd May 30, 2018

Member

I go a little crazy with the names on tests p:

I'm not sure how to validate a valid path without doing it in a hacky way or a complicated regex, not sure if it's worth it.

@humitos

humitos May 31, 2018

Member

We probably don't want to validate that the file exists in the schema, since any path is valid for the schema, but we do want to validate that the file exists when using the YAML file to build the docs.

As @ericholscher said in another comment, we may want to have these validations at different places/steps.

(although, my suggestion about the names of the invalid options/files/etc. is still valid and I think you should follow it :) )
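
A tiny sketch of the split described above: the schema only requires a string, and a separate build-time check (a hypothetical helper, not part of this PR) verifies that the file actually exists in the project checkout:

import os

def validate_file_exists(checkout_path, relative_path):
    # Schema-level validation already guaranteed this is a string;
    # existence is a build-time concern.
    full_path = os.path.join(checkout_path, relative_path)
    if not os.path.isfile(full_path):
        raise ValueError('File not found: {}'.format(relative_path))
    return full_path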

ALL = 'all'
def flatten(dic, keep_key=False, position=None):

@stsewd

stsewd May 31, 2018

Member

This is a simplified version of the function used internally by yamale p:

@humitos

humitos May 31, 2018

Member

You may want to add a comment above the definition of this function pointing to the source code of the original one in yamale, as a reference for the future.
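
For future readers, a minimal illustration of what such a flattening helper does: collapse nested mappings into dotted keys (this is neither the PR's nor yamale's exact implementation):

def flatten(dic, position=None):
    """Turn {'submodules': {'include': [...]}} into {'submodules.include': [...]}."""
    flat = {}
    for key, value in dic.items():
        dotted = key if position is None else '{}.{}'.format(position, key)
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat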

@agjohnson

So I think there are two PRs happening here. The first is writing a schema file so we can verify our thoughts around our spec.

The second is moving to use yamale as an application level dependency and move our validation to use yamale internals. I don't think we've made this decision yet, and would probably duplicate a lot of existing code.

Let's discuss this change in another issue. Save some of this work for now, but this PR should be dialed back to just include a way to describe our spec via a schema file in our unit tests.

The schema looks great. All of the additions I can think of are not yet RTD features, so we can hold off on new additions for just right now.

V2_SCHEMA = path.join(
path.dirname(__file__),
'../rtd_tests/fixtures/spec/v2/schema.yml'

@agjohnson

agjohnson Jun 1, 2018

Contributor

We shouldn't have code outside tests that references test fixtures. This module should probably live under rtd_tests somewhere instead, as we've only decided to use the library for testing and validating our thoughts around our spec for now.

'unique': ['submodules.include', 'submodules.exclude'],
'default': 'submodules.include',
}
}

@agjohnson

agjohnson Jun 1, 2018

Contributor

Defaults and constraints should probably live where we are actually parsing the configuration file, as I don't think we're testing defaults yet. Testing defaults and constraints will happen at the parsing level, this PR is working on the spec level.

else:
child[item_position] = value
return child

@agjohnson

agjohnson Jun 1, 2018

Contributor

What is the underlying need for flattening the dictionary? This seems like it could be an unnecessary step.

@stsewd

stsewd Jun 1, 2018

Member

It makes it easy to manipulate and check the keys; it's also better for describing constraints, for example:

https://github.com/stsewd/readthedocs.org/blob/yaml-schema-yamale/readthedocs/buildconfig/__init__.py#L176-L180

We don't have to use recursion or several nested loops to do things like that.

@agjohnson

agjohnson Jun 1, 2018

Contributor

It's not clear if this is still required. It seems like a confusing way to do a nested dict merge. I think if we are implementing defaults through yamale there is a tighter integration rather than keeping this data in a dictionary.
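
To show why the flattened form is convenient, a sketch of the uniqueness constraint from the snippet above ('submodules.include' vs 'submodules.exclude') checked against dotted keys; check_unique is illustrative and reuses the flatten helper sketched a few comments up:

def check_unique(flat_config, unique_keys):
    # At most one of the mutually exclusive keys may be present.
    present = [key for key in unique_keys if key in flat_config]
    if len(present) > 1:
        raise ValueError('Only one of {} is allowed'.format(', '.join(unique_keys)))

config = {'submodules': {'include': ['docs'], 'exclude': ['vendor']}}
check_unique(flatten(config), ['submodules.include', 'submodules.exclude'])  # raises ValueError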

return False
class BuildConfig(object):

@agjohnson

agjohnson Jun 1, 2018

Contributor

I haven't played with yamale, it's not clear how much of this class is needed to validate our specification in unit tests. If this is more for implementing yamale in our application, we should break this out and save it.

build:
# The build docker image to be used
# Default: '2.0'
image: enum('1.0', '2.0', 'latest', required=False)

@agjohnson

agjohnson Jun 1, 2018

Contributor

As per rtfd/readthedocs-docker-images#62, this will be changing:

  • The default should be latest
  • stable is also an allowed tag
  • Numeric versions are allowed, but are not public knowledge (at least not referenced in our documentation)
  • We have versions 3.0 and 4.0rc1 (soon to be 4.0), and we could have a 5.0 before this spec changes at all. I don't know how we plan to handle this (update the spec as images are released, and don't bump the version number of the spec?). At least we should include the current images here.

@stsewd

stsewd Jun 1, 2018

Member

I think we just need to update the spec without increasing the version number. If there is a v3 spec we can stop updating the images there and add new ones in the most recent spec.

Numeric versions are allowed, but are not public knowledge (at least not referenced in our documentation)

But it is in our docs https://docs.readthedocs.io/en/latest/yaml-config.html#build-image

@agjohnson

agjohnson Jun 1, 2018

Contributor

Yup, that's changing. We don't want to support numeric versions directly anymore. See the discussion in rtfd/readthedocs-docker-images#62 for more background.

I think we just need to update the spec without increasing the version number.

Yup, agreed. I think this just needs our current images then.

python:
# The Python version
# Default: '3.6'
version: enum('2', '2.7', '3', '3.5', '3.6', required=False)

@agjohnson

agjohnson Jun 1, 2018

Contributor

I'm not sure what to do here. This is dependent on the docker image. Our current latest image (our default image will soon be latest) has 2.7, 3.3, 3.4, 3.5, and 3.6. The 4.0 image has 2.7, 3.5, 3.6, and should probably have 3.7 soon.

I think verifying the python version list is a config parsing level validation, we don't necessarily need to worry about validating across fields as part of this exercise. Perhaps here, for our spec discussion, we just put all possible python versions?

@stsewd

stsewd Jun 1, 2018

Member

Perhaps here, for our spec discussion, we just put all possible python versions?

I'd say yes
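
If the enum is widened to all possible Python versions as discussed, a parametrized test in the style of this PR could cover them; the version list below comes from the comment above and would track the images, and assertValidConfig here refers to the module-level helper in these tests:

import pytest

@pytest.mark.parametrize(
    'version',
    ['2', '2.7', '3', '3.3', '3.4', '3.5', '3.6', '3.7'],
)
def test_python_version_is_valid(tmpdir, version):
    content = '''
version: "2"
python:
  version: "{version}"
'''.format(version=version)
    assertValidConfig(tmpdir, content)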

@@ -0,0 +1,187 @@
"""Validator for the RTD configuration file."""

@agjohnson

agjohnson Jun 1, 2018

Contributor

As general feedback, I'm not sure this needs to be broken out into a separate django application path. I'd implement this next to similar code so that it was easy to find. I believe doc_builder/ already has similar code around our config file.

See note below though. I don't think this should be application level yet (pending a design discussion around usage of yamale)

@@ -64,6 +64,7 @@ Unipath==1.1
django-kombu==0.9.4
mimeparse==0.1.3
mock==2.0.0
yamale==1.7.0

@agjohnson

agjohnson Jun 1, 2018

Contributor

This is assuming yamale will be an application dependency, where I think it's only a testing dependency for now. This should be moved to testing.txt

@stsewd

Member

stsewd commented Jun 1, 2018

I have moved the code to just do the schema validation using yamale; the previous code can be recovered from this branch if it is needed.

@humitos

Member

humitos commented Jun 4, 2018

the previous code can be recovered from this branch if it is needed.

Be careful with this. After this gets merged, the branch may be deleted, and then it will be more complicated, if not impossible, to recover the code.

@stsewd stsewd requested a review from rtfd/core Jun 5, 2018

# Formats of the documentation to be built
# Default: []
formats: any(list(enum('htmlzip', 'pdf', 'epub')), enum('all'), required=False)

@stsewd

stsewd Jun 6, 2018

Member

This is only valid for Sphinx; I wonder if we can move this to the sphinx key, or maybe we want to support this (#1939)? (still no PDF for mkdocs)

@agjohnson

agjohnson Jun 8, 2018

Contributor

It's only valid for sphinx for now. We might have other engines that produce a subset of these later. I'm fine with this being top level

@agjohnson

Contributor

agjohnson commented Jun 8, 2018

The only thing I don't necessarily like is that we can only inspect the strings as errors from yamale, but that is a yamale problem. We can potentially fix these things later.

@agjohnson agjohnson modified the milestones: 2.7, 2.6 Jun 8, 2018

@agjohnson agjohnson merged commit e5c0c1c into rtfd:master Jun 8, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@stsewd stsewd deleted the stsewd:yaml-schema-yamale branch Jun 8, 2018

@stsewd stsewd restored the stsewd:yaml-schema-yamale branch Jun 8, 2018

@stsewd stsewd referenced this pull request Aug 16, 2018

Open

Config file v2 docs #4451
