fix(request): Decouple form and URL param parsing #493

necaris · 2015-04-14T04:16:44Z

Separate form and URL parameter parsing into request.param and
request.form_param. Introduces new APIs for accessing the data.

Also, do not consume the POST stream unless the data is requested,
and make the raw POST data available on the request.

Closes #418

lwcolton · 2015-04-14T14:48:24Z

This PR doubles memory usage in the case where the form data is parsed. This happens due to storing the raw body and also the parsed form parameters, where previously the raw body was discarded when form parameters were parsed. The behavior to store the raw body when parsing should be optional.

necaris · 2015-04-14T16:20:47Z

@lwcolton Good point, I'll put that into a RequestOption

necaris · 2015-04-14T16:21:23Z

Also, it's worth noting that falcon-bench on my machine is about 5% slower on this branch :-(

necaris · 2015-04-18T18:41:16Z

Option added as above and rebased onto upstream master

necaris · 2015-04-18T20:57:15Z

@falconry/owners Are multiple commits in a PR OK, or should I squash this into one commit?

lwcolton · 2015-04-18T21:24:03Z

They will ask you to squash please and thank you

On Sat, Apr 18, 2015, 3:57 PM Rami Chowdhury notifications@github.com
wrote:

@falconry/owners https://github.com/orgs/falconry/teams/owners Are
multiple commits in a PR OK, or should I squash this into one commit?

—
Reply to this email directly or view it on GitHub
#493 (comment).

lwcolton · 2015-04-18T21:25:17Z

Also make sure to follow contribution guidelines when putting together your
commit message it needs to have a specific format. There is a
contributing.MD file in the root of the repo with that info

On Sat, Apr 18, 2015, 4:23 PM Colton Leekley-Winslow lwcolton@gmail.com
wrote:

They will ask you to squash please and thank you

On Sat, Apr 18, 2015, 3:57 PM Rami Chowdhury notifications@github.com
wrote:

@falconry/owners https://github.com/orgs/falconry/teams/owners Are
multiple commits in a PR OK, or should I squash this into one commit?

—
Reply to this email directly or view it on GitHub
#493 (comment).

necaris · 2015-04-19T03:27:57Z

Thanks @lwcolton -- I think I've complied with most of the requirements, although I'm not sure about where to put the docstrings for the get_param* methods on the Request. Since the logic for these has been moved to request_helpers.ParamProxy it makes sense to keep the docstring close to the implementation, but since Request.get_param* forms part of the public API perhaps the docstring should be left on Request?

lwcolton · 2015-04-20T05:11:42Z

Because it is part of the public API I personally would leave the docstring
on Request

On Sat, Apr 18, 2015 at 10:28 PM, Rami Chowdhury notifications@github.com
wrote:

Thanks @lwcolton https://github.com/lwcolton -- I think I've complied
with most of the requirements, although I'm not sure about where to put the
docstrings for the get_param* methods on the Request. Since the logic for
these has been moved to request_helpers.ParamProxy it makes sense to keep
the docstring close to the implementation, but since Request.get_param*
forms part of the public API perhaps the docstring should be left on
Request?

—
Reply to this email directly or view it on GitHub
#493 (comment).

Sincerely,

Colton Leekley-Winslow
651-242-9559
lwcolton@gmail.com

kgriffs · 2015-09-30T01:54:28Z

falcon/request.py

+            values. Behaves exactly as the ``params`` attribute.
+
+        raw_body (bytes): The raw POST data, if any. Note that Falcon will not
+            save the raw body unless the `save_raw_body` option is set.


I think it would be helpful to explain why it defaults to not buffering/saving the body.

necaris · 2015-09-30T04:42:19Z

@kgriffs Thanks for the comments! I've made some changes to address your concerns but noticed that the tests are failing after I rebased onto the most recent master -- I'll address these and add a get_as_date method to update the PR to the current API.

kgriffs · 2015-09-30T17:21:37Z

falcon/request.py

+        return self._param
+
+    @property
+    def raw_body(self):


I think the most surprising thing with Falcon's form parsing has always been that it consumes req.stream and you can't go back to read it again. To mitigate this problem, we could say "hey, go and use this raw_body attribute over here in that case; otherwise maybe use req.stream as usual?" But I'm concerned about that leading to more confusion about when to use what, and generally making apps harder to reason about.

I understand that adding a more general raw_body feature may be useful in some cases, yet I still hesitate to do this without better understanding the use cases from the community (and how common they are). I want to be sure we don't encourage anti-patterns, and that we can provide something more elegant than using a middleware component or hook (which is how you would otherwise accomplish this.) All things considered, I think it would be better to split the general case out into a separate issue/PR from the one currently under discussion.

For now, as an alternative, what do you think about replacing req.stream with a file-like object that has been seeked back to the beginning, and simply wraps an instance of bytes that was read in the course of parsing the form body? This would only happen the first time a form param was accessed (thus triggering the consumption and parsing of the original req.stream). Since url-encoded form bodies are generally of reasonable size, this shouldn't cause a significant degradation in memory efficiency. We would have to make sure that the mock stream behaves similarly to the ones provided by, e.g., mod_wsgi, Gunicorn and uWSGI.

That's an interesting alternative -- I think it could work pretty well. I agree that it's less than optimal to add more stuff to the API surface if it doesn't serve a very clear use case; I wonder if it could be as simple as using a BytesIO?

That might do it. I just tested on gunicorn under 2.7 and 3.4 and stream.read() returns bytes, and b'' when the stream is at EOF. wsgiref is similar. You may want to check mod_wsgi and uWSGI as well.

kgriffs · 2015-10-16T17:08:15Z

@necaris Will you have some time to work on this during the next week or two?

Separate form and URL parameter parsing into `request.param` and `request.form_param`. Also, do not consume the POST stream unless the data is requested, and make the raw POST data available on the request if the `store_raw_body` option is set. Closes #418

yohanboniface · 2015-11-10T07:00:53Z

falcon/request_helpers.py

+            if a query parameter is assigned a comma-separated list of
+            values (e.g., 'foo=a,b,c'), only one of those values will be
+            returned, and it is undefined which one. Use
+            `req.get_param_as_list()` to retrieve all the values.


req.params.get_as_list() now :)

yohanboniface · 2015-11-10T07:54:08Z

As we are using lazy proxy objects, here is a suggestion:

use req.query for query string parameters
use req.form for form like parameters
keep req.params as a merge of both: this will not break the current API plus keep the convenient shortcut.

And let's keep in mind that we'll need req.files at some point.

kgriffs · 2015-11-11T06:10:25Z

falcon/request.py

                body,
                keep_blank_qs_values=self.options.keep_blank_qs_values,
            )

-            self._params.update(extra_params)


Since self._params no longer contains form params, we will need to modify req.params to avoid a breaking change.

philiptzou · 2015-11-13T23:18:58Z

Perhaps we should allow customization and extension of ParamProxy class. For example I may want to use voluptuous to validate the input body. Also the content-type can be others for example "multipart/form-data" or "application/json" but not only the "application/x-www-form-urlencoded".

class Request(object):

    param_proxy_class = helpers.URLEncodedParamProxy
    # so we can easily custom the param proxy by inheriting from AbstractParamProxy class
    # which is neutral to any serialization formats and any validators

    ...

    @property
    def form(self):
        if self._form is None:
            # let param_proxy to handle the unserialization of raw_body but not in Request
            self._form = self.param_proxy_class(self, self.raw_body)
        return self._form

    def set_param_proxy_class(self, klass):
        # can also change param_proxy class by instance
        self.param_proxy_class = klass

    ...

So why not we decouple _parse_form_urlencoded from Request too?

philiptzou · 2015-11-24T04:48:15Z

@necaris Would you still like to follow up the comments of this pull-request? Or I can take over of it since I am interested in and if you don't mind :)

necaris · 2015-11-24T16:23:48Z

@philiptzou I don't have a clear plan to implement all the API changes requested recently so I'd be just fine with you taking over this branch 👍

philiptzou · 2015-11-29T05:24:00Z

@necaris I now understand why you don't have a clear plan... wow

I'm now consider to implement ParamProxy as a general lazy proxy but not only support for dictionary API. Since I want to allow future potential that JSON or XML can be parsed too.

kgriffs · 2015-12-11T01:40:38Z

@philiptzou This may not not be the best place to create an extension method for supporting arbitrary body media types. It isn't really a "form" at that point in the sense of the HTML-centric intent of this interface. I'd like to see how this would compare and contrast with parsing body documents via middleware methods or hooks.

For now I like @yohanboniface's suggestion of adding req.query and req.form alongside req.params. req.form and req.params would only consume the request body stream the first time either is called. Once consumed, the body would be buffered into a BytesIO instance which would then be returned by req.stream in lieu of the original req.env['wsgi.input'].

@philiptzou Were you still interested in taking this on? In preparation for the imminent Falcon 1.0, I'm considering putting up a PR to just add the BytesIO buffer, since that could be considered a subtle breaking change. Meanwhile, you could work on adding the additional, non-breaking req.query and req.form attributes in a 1.1 or something (assuming semantic versioning). Finally, in Falcon 2.0 we could consider removing req.params.

philiptzou · 2015-12-11T01:57:06Z

@kgriffs: Actually I have an unfinished BytesIO reversion at now. I think I'm just too caution to minimize resources usage and performance impact. I mean if you want to use plain BytesIO it will always double the memory usage as @lwcolton worried (#493 (comment)). And we'll also lose the advantages of stream reading (had to wait all request body arrived).

Alternatively, We can implement a compatible interface of BytesIO which proxying the original file-like object from WSGI. The advantages is we can still do stream reading, but the proxy will affect performance and there are a lot code to be written since we are going to implement an interface.

I have not decided yet which one should be implemented. And I don't think I have a clear idea. That's the reason why I need some more discussion but I'm still interested in taking this on.

kgriffs · 2015-12-21T17:43:32Z

I can understand the trepidation, and I don't see a problem with making the buffering optional via RequestOptions. However, I suspect that most application/x-www-form-urlencoded strings are not very long, and so would not incur a significant memory overhead when buffered. Let's suppose that the lengths of form bodies in a given app average 256 bytes. Even if one of my web heads is serving an extremely large number of simultaneous requests/sec, say 10,000, the app would only use about 2 mb of RAM.

Likewise, with such small bodies, I don't think streaming is going to be much of a concern. Since the body will usually fit in one or two TCP packets, all the data will already have arrived by the time the app attempts its first read.

Regardless, I've been thinking that we need to step back and re-examine our assumptions. I'm still not sure that I completely understand why people have been wanting to re-parse form bodies. Here are a few options:

To deal with collisions between field names in the query string and in the form. But this doesn't make a lot of sense, since form params are parsed after the query string, meaning the app would need to re-parse the raw query string to get the clobbered value, not the request body.
To work around some problem with the way Falcon parses application/x-www-form-urlencoded. Ideally this should be addressed by fixing the problem directly.

Whatever the reason, if the request body were only parsed on-demand (i.e., when the hypothetical req.form is first called), then at least the application would have some control over whether or not (or when) the body is consumed.

With that in mind, it may be better to just go ahead with the plan to disentangle the interfaces for form and query string parsing. Parse params just-in-time (the first time requested), and don't replace req.stream with an instance of BytesIO or similar. If the community feedback still indicates a need for buffering the body, we can cross that bridge when we come to it.

kgriffs · 2016-02-24T16:51:22Z

Another thought on this... perhaps we can provide robust HTML form handling via a standalone add-on (e.g., "falcon-forms"), vs. being part of the core framework. We could add an option to disable parsing form params as currently implemented, in the case that a developer would like to handle it in a different way.

kgriffs · 2016-05-10T23:17:05Z

Update: I plan on discussing this during the PyCon sprints. I'd like to settle on a design direction and get this finished up.

kgriffs · 2016-05-10T23:18:25Z

P.S. See also: https://github.com/yohanboniface/falcon-multipart/blob/master/falcon_multipart/middleware.py#L25

lwcolton · 2016-11-20T18:19:06Z

@kgriffs You previously commented that you liked the idea of lazy-access request attributes for form data, so that on first access the request stream is consumed, used to populate the form data and files request attributes, and not saved or wrapped. This is what I have seen in other frameworks and allows a developer to consume the stream themselves if they wish. Would that be acceptable here? Would love to help this finally get merged

kgriffs · 2016-11-23T02:01:21Z

@lwcolton Cool, thanks for following up on this. I've also been thinking about this and finally sat down today to write up a proposal. I'd love to get your thoughts on what I just posted to #418. I'm going to close this PR since it is so old. If this is something you'd like to work on, let me know. Otherwise we'll see if anyone else has some bandwidth to take and run with it.

mcm mentioned this pull request Apr 14, 2015

feat(request): Add function to get param as date #487

Closed

necaris mentioned this pull request Apr 18, 2015

feat(request): Add function to get param as date #498

Closed

necaris mentioned this pull request Apr 19, 2015

After doing req.stream.read(), is there a way to read the POST data again? #417

Open

lwcolton mentioned this pull request Sep 14, 2015

Decouple parsing of form fields and URL params #418

Closed

kgriffs reviewed Sep 30, 2015
View reviewed changes

necaris and others added 2 commits October 17, 2015 09:28

Changes to address @kgriffs concerns

14243db

kgriffs mentioned this pull request Nov 9, 2015

Reset stream after reading it while parsing urlencoded body #649

Closed

yohanboniface reviewed Nov 10, 2015
View reviewed changes

kgriffs reviewed Nov 11, 2015
View reviewed changes

philiptzou mentioned this pull request Nov 20, 2015

How do I handle multipart form based file uploads? #469

Closed

yohanboniface mentioned this pull request Jan 15, 2016

Add files to req.files instead when falcon is ready for that yohanboniface/falcon-multipart#2

Open

kgriffs added the needs-revision label Feb 3, 2016

kgriffs mentioned this pull request Mar 2, 2016

Add an option to RequestOptions to toggle parsing of form params #719

Closed

kgriffs added the design discussion label Mar 3, 2016

kgriffs added the blocked-by-release label Apr 12, 2016

kgriffs mentioned this pull request May 3, 2016

Revisit form-based file uploads #294

Closed

kgriffs removed the blocked-by-release label May 23, 2016

kgriffs removed the design discussion label Oct 28, 2016

kgriffs closed this Nov 23, 2016

vytas7 mentioned this pull request Nov 21, 2020

Add ability to cache the stream data #1746

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(request): Decouple form and URL param parsing #493

fix(request): Decouple form and URL param parsing #493

necaris commented Apr 14, 2015

lwcolton commented Apr 14, 2015

necaris commented Apr 14, 2015

necaris commented Apr 14, 2015

necaris commented Apr 18, 2015

necaris commented Apr 18, 2015

lwcolton commented Apr 18, 2015

lwcolton commented Apr 18, 2015

necaris commented Apr 19, 2015

lwcolton commented Apr 20, 2015

kgriffs Sep 30, 2015

necaris commented Sep 30, 2015

kgriffs Sep 30, 2015

necaris Sep 30, 2015

kgriffs Oct 2, 2015

kgriffs commented Oct 16, 2015

yohanboniface Nov 10, 2015

yohanboniface commented Nov 10, 2015

kgriffs Nov 11, 2015

philiptzou commented Nov 13, 2015

philiptzou commented Nov 24, 2015

necaris commented Nov 24, 2015

philiptzou commented Nov 29, 2015

kgriffs commented Dec 11, 2015

philiptzou commented Dec 11, 2015

kgriffs commented Dec 21, 2015

kgriffs commented Feb 24, 2016

kgriffs commented May 10, 2016

kgriffs commented May 10, 2016

lwcolton commented Nov 20, 2016

kgriffs commented Nov 23, 2016

fix(request): Decouple form and URL param parsing #493

fix(request): Decouple form and URL param parsing #493

Conversation

necaris commented Apr 14, 2015

lwcolton commented Apr 14, 2015

necaris commented Apr 14, 2015

necaris commented Apr 14, 2015

necaris commented Apr 18, 2015

necaris commented Apr 18, 2015

lwcolton commented Apr 18, 2015

lwcolton commented Apr 18, 2015

necaris commented Apr 19, 2015

lwcolton commented Apr 20, 2015

kgriffs Sep 30, 2015

Choose a reason for hiding this comment

necaris commented Sep 30, 2015

kgriffs Sep 30, 2015

Choose a reason for hiding this comment

necaris Sep 30, 2015

Choose a reason for hiding this comment

kgriffs Oct 2, 2015

Choose a reason for hiding this comment

kgriffs commented Oct 16, 2015

yohanboniface Nov 10, 2015

Choose a reason for hiding this comment

yohanboniface commented Nov 10, 2015

kgriffs Nov 11, 2015

Choose a reason for hiding this comment

philiptzou commented Nov 13, 2015

philiptzou commented Nov 24, 2015

necaris commented Nov 24, 2015

philiptzou commented Nov 29, 2015

kgriffs commented Dec 11, 2015

philiptzou commented Dec 11, 2015

kgriffs commented Dec 21, 2015

kgriffs commented Feb 24, 2016

kgriffs commented May 10, 2016

kgriffs commented May 10, 2016

lwcolton commented Nov 20, 2016

kgriffs commented Nov 23, 2016