Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser passes incomplete data to marshmallow.pre_load schema methods #173

Closed
benkogan opened this issue Jun 20, 2017 · 4 comments
Closed

Comments

@benkogan
Copy link

When using a Marshmallow schema with a @pre_load-decorated method, the data passed as the arg to the method includes only fields specified in the schema. This breaks pre_load methods that depend on data entires not in fields such as the enveloping example given in the Marshmallow docs.

Here's a minimal example using Flask to demonstrate the issue:

from flask import Flask, jsonify
from marshmallow import Schema, fields, pre_load
from webargs.flaskparser import use_args

class Args(Schema):
    foo = fields.String()

    class Meta:
        strict = True

    @pre_load
    def unwrap(self, data):
        return data['data']

app = Flask(__name__)

@app.route('/', methods=['POST'])
@use_args(Args)
def index(args):
    print('got: {}'.format(args))
    return jsonify(dict(got=args))

if __name__ == '__main__':
    app.run(port=5001, debug=True)

When sending data to the route above (for example using httpie: http POST :5001/ data:='{"foo": "bar"}' the following trace is produced:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 2000, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1991, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1567, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.4/dist-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1988, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1641, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1544, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.4/dist-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1639, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.4/dist-packages/flask/app.py", line 1625, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.4/dist-packages/webargs/core.py", line 438, in wrapper
    force_all=force_all)
  File "/usr/local/lib/python3.4/dist-packages/webargs/core.py", line 360, in parse
    result = self.load(parsed, schema)
  File "/usr/local/lib/python3.4/dist-packages/webargs/core.py", line 294, in load
    return schema.load(data)
  File "/usr/local/lib/python3.4/dist-packages/marshmallow/schema.py", line 580, in load
    result, errors = self._do_load(data, many, partial=partial, postprocess=True)
  File "/usr/local/lib/python3.4/dist-packages/marshmallow/schema.py", line 648, in _do_load
    original_data=data)
  File "/usr/local/lib/python3.4/dist-packages/marshmallow/schema.py", line 855, in _invoke_load_processors
    data=data, many=many, original_data=original_data)
  File "/usr/local/lib/python3.4/dist-packages/marshmallow/schema.py", line 957, in _invoke_processors
    data = utils.if_none(processor(data), data)
  File "/home/bkogan/webargs-issue.py", line 14, in unwrap
    return data['data']
KeyError: 'data'

If additional = ['data'] is added to the Meta class above, the example code works as intended since the envelope key is now a field. However, I doubt this seeming-hack is the intended approach.

The culprit appears to be in webargs.core.Parser#_parse_request. Per the line linked here, the parser iterates over and adds only values that correspond to schema.fields — rather than all request data — to the dict that is ultimately passed to the given schema's load method.

If this is indeed an unintended bug I would be happy to submit a PR.

@sloria
Copy link
Member

sloria commented Aug 16, 2017

Thanks for the report. I would be happy to review a PR.

@klapaukh
Copy link

I have been having a look at how this currently works / why this error is occurring in the hopes of creating a PR to correct it. I ran into the bug when trying to run the example from the Marshmallow validators example which throws a validation error when unused arguments are present.

It think the error boils down to the fact that the _parse_request() function does the wrong thing (based on the way it is used inside of parse()).
Currently the function extracts arguments that could be useful from all the locations specified. These are then passed into load() which does the rest of the construction, then calls the validator. However, any arguments that were not explicitly fields needed don't make it in, so the full set of arguments is never available to the validator.

I see a couple of solutions to this problems, however, they may have unintended consequences:

  1. Change _parse_request() to return all the parameters (tagged by their source location). Then parse those in load() where they are actually used. This does however raise the question of whether _parse_request() should only look in the specified locations, or in all of them. Which set of fields should be accessible as the full set of provided fields (i.e. only the specifically requested ones, or all of them)?
  2. Remove _parse_request() entirely and move the field fetching logic into load(). Fields are fetched as they are needed. The the full set of fields is only loaded if it is ever actually requested.
  3. Keep both functions exactly as they are. Change the logic that loads the original data in.

I am pretty sure that the logic for the last two options actually exists in Marshmallow rather than inside of Webargs so is not reasonable to change for this.

However, option 1 has a different problem. In the case of Fields that have specified load locations (instead of the general passed in ones) this can cause problems. I am pretty sure that this is a webargs specific feature since Marshmallow isn't going to be aware of the request context. This is because it is no longer clear what the full set of data to load should be (or rather there are now more options).

The options I see for full set of fields is:

  1. All the fields from all the different possible data sources. Even the ones not specified in the locations variable
  2. All the fields from the requested locations, and all the fields from locations specified as sources for individual fields.
  3. All the fields from the requested locations. Fields from specified locations are loaded afterwards (to overwrite others), but only the specific ones requested.

1 gives the validation method the ability to really be certain that no unnecessary arguments are being passed (e.g. in the example that I came here from), however, it also means that they would always have to think about what headers may be attached, which is not something you always want to do.

Both 1 & 2, however, come with the property that it is hard for them to actually limit the reading of arguments to only the locations that were requested. I believe that this is because loading is done using Marshamallow which doesn't have a notion of where an argument came from. Instead it will simply read arguments from the dictionary it gets no matter their original source.

3 is the option which is the middle ground between the two. This ensures that the loading code only gets fields which it is actually allowed to see, and does not require any changes. However, it does limit what is seen as the 'full' set of original parameters. It also runs into a difficulty with cases like that found in test_core.py:test_arg_location_param(). In this case there are two places to find the field foo. This method will correctly return the value as 42. However, it will also hide the fact that the foo option was present in the json field as well. Here again, a validator at the Marshmallow layer will not be aware of the problem as the json one would have been overwritten. It is not practical to keep both until validation as the loader will not be able to tell where which one came from and so which one should be loaded.

For my PR I am going to try implement options 1 & 3 above. Any feedback would be much appreciated.

@vashchukmaksim
Copy link

Are there any workarounds on this for now? I still can't get arguments that are not in schema in any @pre_* schema functions.

@sloria
Copy link
Member

sloria commented Dec 28, 2018

Let's continue discussion of this in #267, as it's the same issue described here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants