Nested schema auto resolution as default and respect modifiers in references #333
Conversation
lafrech
left a comment
This has the same (mostly theoretical) limitation as the current implementation: the container fields (Nested, List(Nested), Dict(Nested)) are hardcoded, it won't work with a custom container field.
But your proposal here reminds me that the whole field2property logic hardcodes the container fields. This is nothing new. In this comment I propose a hook to manage those cases. Perhaps could it be extended to manage the auto-ref logic.
In any case, after a quick look, this implementation looks better than the existing IMHO. This looks like the right place for the job. And the hook I mention in above paragraph could cover the custom container field corner case for both documentation and auto-ref.
I think your recent commits make my comment above irrelevant (unless it was irrelevant from the beginning), as the resolving happens in a single place (https://github.com/marshmallow-code/apispec/pull/333/files#diff-daf6a711ac369a8d051b9606e0a9eca6R367) and there is no need to add specific cases for each container field. This is great.
```python
property = self.field2property(
    field_obj, use_refs=use_refs, dump=dump, load=load, name=name,
)
jsonschema['properties'][observed_field_name] = property
```
AFAIU, this change and the change above (OrderedLazyDict() -> OrderedDict()) are related to each other but unrelated to the issue, right?
Last time I checked, I didn't understand this part of the code, so I appreciate that you could make it simpler, but I can't comment on the impact.
No, these changes are necessary if we want to add a schema to the spec from within field2property. The way it worked before is that for each field a prop_func was created with a default parameter of the field_obj. Then that prop_func was loaded into a LazyDict. At the end of fields2jsonschema the LazyDict is returned, but none of the prop_funcs are called. They only get called when accessing the item in the dict.
This strategy was added here, with the plan being that if execution of field2property was delayed until all of the schemas are in the spec (and in the ref dictionary), then when field2property eventually gets called, it will find the nested schemas in the refs, insert a ref into the spec, and stop infinite recursion.
Now, if we want to move the functionality that adds a nested schema to the spec into field2property, there is a problem with this approach. The first time we iterate over the spec's definitions, all of the prop_funcs in the spec get called for the first time, and field2property tries to add schemas to the dictionary during iteration, which throws an error. For some reason, things like print(spec.to_dict()) will work but json.dumps(spec.to_dict()) will not.
I think the original authors of the autoreferencing functionality were running into this, which is why they put the logic to iterate over the fields and add nested schemas to the spec where they did. In definition_helper, inspect_schema_for_autoreferencing was adding all of the nested schemas into the spec before any of the field2property functions are called.
The plan in this PR is to use regular dictionaries and call field2property immediately from fields2jsonschema. All nested schemas get added to the spec along the way via a call to spec.definition from field2property - instead of manually, as in the original plan, or via inspect_schema_for_autoreferencing.
I hope this clarifies things, but I fear it might not...
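The mid-iteration failure described above can be sketched with a toy model. This is illustrative stand-in code (not apispec's LazyDict implementation): the dict values are callables that register extra definitions when first evaluated, which is roughly what happened when serialization walked the spec.

```python
# Toy model of lazily-evaluated definitions that register nested schemas
# on first access. Iterating the dict then mutates it mid-iteration.
definitions = {}
definitions["Parent"] = lambda: definitions.setdefault(
    "Child", {"type": "object"}      # auto-registering a nested schema
)

error = None
try:
    for name in definitions:         # what serializing the spec effectively does
        definitions[name]()          # lazy evaluation fires here...
except RuntimeError as exc:          # ...and the dict grows during iteration
    error = exc

print(error)                         # dictionary changed size during iteration
```

Calling field2property eagerly, before anything iterates the definitions, sidesteps this entirely.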
Bangertm
left a comment
> I think your recent commits make my comment above irrelevant (unless it was irrelevant from the beginning), as the resolving happens in a single place (https://github.com/marshmallow-code/apispec/pull/333/files#diff-daf6a711ac369a8d051b9606e0a9eca6R367) and there is no need to add specific cases for each container field. This is great.
I'm glad you pointed it out for the Dict case, because I didn't realize that I had redundant logic in the List case. Can you take a look at the two test cases added to make sure they cover the appropriate cases?
Force-pushed aa4f857 to 3e69588
> Current implementation of […]

I think we both have the same nominal use case in mind.

> All schemas are called […]

I don't think this should be addressed here, but it pushes for a simple check-raise in spec.components.schema. I opened #340 about this.
On second thought, the auto-ref feature could be a reason not to raise an exception, because it might auto-register a schema that was already registered manually... We could add a test to check whether the already-registered schema with the same name is the same schema class, but then it becomes a bit smelly.
Do we want to force the feature or just make it the default?

In the current state of the PR, the only way to cancel the feature is to pass a no-op resolver to override the default. This is due to my change request of not passing the resolver default in the signature; I realized that afterwards. We could revert that change. It is not a friendly API.

We can go the other way and decide the feature is mandatory. This does not seem unreasonable. After all, when viewing the spec with ReDoc (and perhaps swagger-ui, I don't remember), it is done in the viewer frontend. If we do that, we should provide a default name for when the schema name can't be resolved (when the resolver returns None). It can be […]. Then we may write:

```python
schema_cls = resolve_schema_cls(schema)
if schema_cls not in self.refs:
    self.spec.components.schema(
        self.thin_layer_above_schema_name_resolver_providing_default_name(schema_cls),
        schema=schema,
        dump=dump,
        load=load,
    )
```

Or we don't try to think too much and we leave it as it is. I think this thought was triggered by the self-nesting or circular nesting case. I still don't totally grasp the change you made w.r.t. the delayed resolution ([…]). Don't miss the "good" in "too good to be true".

I like the code simplification brought by this PR.
I'm still trying to wrap my head around the implications of making this feature the default. There are just so many assumptions that have been made along the way.

Regarding collisions, my biggest concern is that a nested schema comes in once with […]. Part of the rationale for excluding fields when […]. For example, we could have a check early in […]. With the current implementation there is a check on whether […].
Exactly.
Yeah, that's what I thought, too. We should double-check that. I wouldn't mind dropping the dump/load logic, since a schema used as a reference should have the dumpOnly/loadOnly attributes correctly set.
OK, I thought so. We can't afford to break the circular reference use case, so it looks like we're in the middle of the bridge: either we keep things as they are, or we make the auto-ref mandatory. Keeping things as they are means going back to #307: either keep, remove, or deprecate the current implementation (which can be easily introduced by subclassing MarshmallowPlugin). If we choose to make the feature mandatory, we need to sort out a few things:
I wrote:

> This shouldn't be an issue since we check if the schema is already registered.

The issue is when manually registering a schema that has been auto-registered before. We could skip that second registration, but that means the auto-name will be used instead of the user-specified name. Doing this silently sounds wrong, so an exception might be acceptable. If the user wants to specify the name manually, he should register the schema before using it.

This logic could be a blocker for users wanting to specify names manually in the case of circular references. And if not a blocker, it could be an annoyance for users wanting to register manual names, as they'd need to ensure they register the schemas in the right order. In practice, I'm not sure it is such an issue. I think I already do that in my code.

We may postpone this to 2.0 if we can't sort it out easily. We'd still need to decide what to do in 1.0.
Force-pushed 128ee27 to 2e8acc2
Given that this PR is somewhat experimental, I added a couple of commits to move towards always adding the schema to the spec and inserting a reference whenever a marshmallow schema is encountered in a request or response body. I'm starting to think that this might generally be the most desirable out-of-the-box behavior. All of the schemas have names in the user's program. Part of the reason for creating an OpenAPI spec is to communicate the structure and the names of the objects to the user. Auto-resolving the names makes this easier. In fact, we may want to remove the […].

The only way to break the circular reference case is to override the default […].

I added code that warns the user if they attempt to add a schema that is already in the spec. I'm not sure we need an exception, but that could easily be changed. I went with a warning for now because if the user manually adds a schema that has already been added to the spec (automatically), the resulting spec will still be valid. It just may not have the names that they were expecting in the ref, and it may have an "unused" schema.

If we remove the dump/load logic from […]
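The warn-instead-of-raise behavior described here could be sketched as follows. `add_definition`, `definitions`, and `refs` are hypothetical names, not the PR's actual code:

```python
# Warn, rather than raise, when a schema class that was already
# (auto-)registered is added again under a different name: the resulting
# spec is still valid, but the second definition goes unused by $refs.
import warnings

def add_definition(definitions, refs, name, schema_cls):
    if schema_cls in refs and refs[schema_cls] != name:
        warnings.warn(
            "%s is already registered as %r; %r may be an unused definition"
            % (schema_cls.__name__, refs[schema_cls], name)
        )
    refs.setdefault(schema_cls, name)        # keep the first name for $refs
    definitions[name] = {"type": "object"}   # placeholder for the real conversion
```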
Force-pushed 1beb35e to 7653300
This method can be used without […].

So should we specify that a custom resolver MUST return a […]?
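One alternative to mandating a non-None return would be to wrap the custom resolver with a default-name fallback, as suggested earlier in the thread. `with_default_name` is an illustrative helper, not part of the PR:

```python
# Wrap a custom resolver so a None result falls back to the class name
# instead of leaving a reference without a name.
def with_default_name(resolver):
    def resolve(schema_cls):
        return resolver(schema_cls) or schema_cls.__name__
    return resolve

class PetSchema:
    pass

declining = with_default_name(lambda schema_cls: None)   # resolver opting out
print(declining(PetSchema))                              # PetSchema

renaming = with_default_name(lambda schema_cls: "Pet")
print(renaming(PetSchema))                               # Pet
```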
Force-pushed a0d3026 to 22f7e9a
After thinking about this for a few days, I think this is in a pretty good state. Out of the box: […]

With a custom […]

I think this seems like a decent mix of desired out-of-the-box behavior with enough customization for users who want that. The nice thing about this from my perspective is that, out of the box, every schema in the spec will have a name that matches what I call the schema in my code. I don't have to make sure to add nested schemas to the spec before adding the schemas that nest them, and I don't have to provide a name resolver. It also makes resolving schemas from a docstring much more convenient, because those schemas get automatically named as well. I can probably eliminate adding schemas manually altogether and just add the paths.
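For illustration, a name resolver in the spirit of the out-of-the-box behavior described here derives the definition name from the schema class name, trimming a trailing "Schema" suffix. This is a sketch, not necessarily the plugin's exact implementation:

```python
# Resolve a definition name from a schema class or instance:
# PetSchema -> "Pet"; a bare "Schema" keeps its full name.
def resolver(schema):
    schema_cls = schema if isinstance(schema, type) else type(schema)
    name = schema_cls.__name__
    if name.endswith("Schema"):
        return name[: -len("Schema")] or name
    return name

class PetSchema:        # stand-in; real code would subclass marshmallow.Schema
    pass

print(resolver(PetSchema))      # Pet
print(resolver(PetSchema()))    # Pet
```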
Great! I still need to review the code changes more thoroughly, but I agree on the principle.

Yeah, or an access to a class attribute:

```python
class MySchema(Schema):
    schema_name = 'MySuperSchema'
    ...

def resolver(schema):
    return schema.schema_name
```
Force-pushed ac3e8d6 to 03365c2
I agree on the principle. I had a quick look at the code. I'll review more thoroughly later. […]

Edit: This is wrong. This case is covered.
I think this is the way to go. Do you think you could rebase and update the docs?
Force-pushed fddbf33 to ed34c44
Got the branch up to date and added some information in the docs on working with nested fields and modifiers.
sloria
left a comment
Just did a quick once-over, mostly on the docs. I just had a few nits. Can you also update CHANGELOG.rst? Feel free to create a new entry for "1.0.0 (unreleased)". I didn't look at this in-depth, so I'm not the best person to describe these changes concisely.
No need to block on me for merging. @Bangertm @lafrech Feel free to merge when you're confident with this.
Force-pushed aeb50a1 to ed0a0e8
Force-pushed ed0a0e8 to ce1ca3b
@sloria changelog is updated. Can you take a look to make sure it's clear?
@Bangertm, sorry for the delay on this. I just reviewed it again and I think it is good to go once @sloria's comments are addressed. When this is merged, we can release an RC version (maybe with other fixes from https://github.com/marshmallow-code/apispec/milestone/2 if you have something ready), and we're close to a final v1!
Default schema_name_resolver now passed to MarshmallowPlugin.

Logic for adding nested schemas moved from MarshmallowPlugin.inspect_schema_for_auto_referencing to OpenAPIConverter.field2property.

Removed LazyDict in favor of a regular dict.

Warn if a schema is added twice.

Prefer to reference schemas in request and response bodies.

Catch the RuntimeError and raise an APISpecError indicating that the schema_name_resolver returned None for a circular reference. Also adds a note to the documentation advising against using a resolver function that returns None for a circular reference.

[…] because it doesn't do anything anymore.

Key is now a namedtuple of the schema class and its modifiers; resolves marshmallow-code#169. The many modifier is ignored because putting a schema in an array is addressed in separate logic, and we would typically want the same ref for a singular schema and for an array of those objects.
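The cache key described in that commit message could be sketched as follows. `MODIFIERS` and `make_schema_key` are illustrative names, and the modifier list mirrors the marshmallow attributes mentioned in the thread; treat this as an assumption rather than the PR's exact code:

```python
# Build a hashable key from the schema class plus its modifiers,
# deliberately leaving "many" out (arrays are handled separately).
from collections import namedtuple

MODIFIERS = ("only", "exclude", "load_only", "dump_only", "partial")
SchemaKey = namedtuple("SchemaKey", ("SchemaClass",) + MODIFIERS)

def make_schema_key(schema):
    values = []
    for modifier in MODIFIERS:
        attribute = getattr(schema, modifier, None)
        if isinstance(attribute, (list, set)):
            attribute = tuple(attribute)   # lists/sets are unhashable as-is
        values.append(attribute)
    return SchemaKey(type(schema), *values)

class PetSchema:
    only = None
    exclude = ["secret"]
    load_only = ()
    dump_only = ()
    partial = False
```

Two instances with identical modifiers share one key, and therefore one `$ref`, while an instance with different modifiers gets its own entry.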
Force-pushed ce1ca3b to f9d1765
Strangely enough, I wasn't notified about the push. Perhaps because it was a force-push.

Thanks @Bangertm for going through all this. Let's merge this! Anything else you'd like to add before we release a new RC version?
I've been working on a similar caching mechanism, and it was a bit more tricky because the key could include […].

I'm tempted to add a comment to avoid potential issues in the future:

```python
# This works because modifiers are either strings or lists/tuples.
# If MODIFIERS ever include dicts, this code will need to be adapted.
for modifier in MODIFIERS:
    attribute = getattr(schema, modifier)
    try:
        hash(attribute)
    except TypeError:
        attribute = tuple(attribute)
    modifiers.append(attribute)
```
Relates to #307
This is a proof of concept on combining the logic in inspect_schema_for_auto_referencing with the nesting logic in field2property.
All the tests are passing, but I presume that there are some minor differences in the specs that are not captured by the tests. Of potential concern would be cases like this: the second two definition adds would be unnecessary, and if the schema name resolves to something other than what the user is putting in as the name, there will be extra components in the model. I'm not sure whether that is a big deal or not.
At this point the cyclomatic complexity of field2property has increased, but there should be some simplifying that can be done. For example, I'm not sure that the unbound self-referencing cases are valid, and the concept of embedding the reference string in field metadata feels like a relic of not having a ref dictionary.
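The ref dictionary mentioned here is what stops infinite recursion on self-nesting schemas: register the name before descending into the fields, so the recursive call finds it and emits a `$ref`. A sketch with hypothetical names (`schema_to_definition`, `nested_fields`), not the PR's code:

```python
# Convert a schema class to an OpenAPI-style definition, using a
# class -> name ref dictionary to break cycles.
def schema_to_definition(schema_cls, refs, definitions, resolve_name):
    if schema_cls in refs:                       # already resolved (or in progress)
        return {"$ref": "#/definitions/" + refs[schema_cls]}
    name = resolve_name(schema_cls)
    refs[schema_cls] = name                      # register *before* recursing
    definitions[name] = {
        "type": "object",
        "properties": {
            field: schema_to_definition(nested, refs, definitions, resolve_name)
            for field, nested in schema_cls.nested_fields.items()
        },
    }
    return {"$ref": "#/definitions/" + name}

class NodeSchema:                                # self-nesting schema stand-in
    nested_fields = {}
NodeSchema.nested_fields = {"child": NodeSchema}

refs, definitions = {}, {}
ref = schema_to_definition(NodeSchema, refs, definitions, lambda c: c.__name__)
print(ref)                                       # {'$ref': '#/definitions/NodeSchema'}
```

Note that if `resolve_name` returned None here, the emitted `$ref` would be broken, which is why the PR raises an APISpecError in that case.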