fix(request): Decouple form and URL param parsing #493
This PR doubles memory usage in the case where the form data is parsed. This happens due to storing the raw body and also the parsed form parameters, where previously the raw body was discarded when form parameters were parsed. The behavior to store the raw body when parsing should be optional. |
@lwcolton Good point, I'll put that into a |
Also, it's worth noting that |
Option added as above and rebased onto upstream master |
@falconry/owners Are multiple commits in a PR OK, or should I squash this into one commit? |
They will ask you to squash, please and thank you. |
Also make sure to follow contribution guidelines when putting together your On Sat, Apr 18, 2015, 4:23 PM Colton Leekley-Winslow lwcolton@gmail.com
|
Thanks @lwcolton -- I think I've complied with most of the requirements, although I'm not sure about where to put the docstrings for the |
Because it is part of the public API I personally would leave the docstring On Sat, Apr 18, 2015 at 10:28 PM, Rami Chowdhury notifications@github.com
Sincerely, Colton Leekley-Winslow |
values. Behaves exactly as the ``params`` attribute. | ||
|
||
raw_body (bytes): The raw POST data, if any. Note that Falcon will not | ||
save the raw body unless the `save_raw_body` option is set. |
I think it would be helpful to explain why it defaults to not buffering/saving the body.
@kgriffs Thanks for the comments! I've made some changes to address your concerns but noticed that the tests are failing after I rebased onto the most recent master -- I'll address these and add a |
return self._param | ||
|
||
@property | ||
def raw_body(self): |
I think the most surprising thing with Falcon's form parsing has always been that it consumes req.stream and you can't go back to read it again. To mitigate this problem, we could say "hey, go and use this raw_body attribute over here in that case; otherwise maybe use req.stream as usual?" But I'm concerned about that leading to more confusion about when to use what, and generally making apps harder to reason about.
I understand that adding a more general raw_body feature may be useful in some cases, yet I still hesitate to do this without better understanding the use cases from the community (and how common they are). I want to be sure we don't encourage anti-patterns, and that we can provide something more elegant than using a middleware component or hook (which is how you would otherwise accomplish this). All things considered, I think it would be better to split the general case out into a separate issue/PR from the one currently under discussion.
For now, as an alternative, what do you think about replacing req.stream with a file-like object that has been seeked back to the beginning, and simply wraps an instance of bytes that was read in the course of parsing the form body? This would only happen the first time a form param was accessed (thus triggering the consumption and parsing of the original req.stream). Since url-encoded form bodies are generally of reasonable size, this shouldn't cause a significant degradation in memory efficiency. We would have to make sure that the mock stream behaves similarly to the ones provided by, e.g., mod_wsgi, Gunicorn and uWSGI.
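A minimal sketch of the alternative described above, for illustration only. It is not Falcon's implementation: `SketchRequest`, its `form_params` property, and the constructor arguments are hypothetical names, and the form body is decoded with the standard library's `urllib.parse.parse_qs`.

```python
import io
from urllib.parse import parse_qs


class SketchRequest:
    """Hypothetical stand-in for a request object (not Falcon's Request)."""

    def __init__(self, stream, content_length):
        self.stream = stream
        self.content_length = content_length
        self._form_params = None

    @property
    def form_params(self):
        # Parse on first access only; later accesses reuse the cached dict.
        if self._form_params is None:
            body = self.stream.read(self.content_length)
            self._form_params = parse_qs(body.decode("ascii"))
            # Swap the consumed stream for an in-memory copy positioned at
            # the beginning, so the app can still read the body afterwards.
            self.stream = io.BytesIO(body)
        return self._form_params


body = b"name=falcon&tag=a&tag=b"
req = SketchRequest(io.BytesIO(body), len(body))
form = req.form_params          # triggers parsing and the stream swap
replayed = req.stream.read()    # the body is still readable
```

The cost is one extra copy of the body held in memory, which is the trade-off the comment accepts for small url-encoded forms.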
That's an interesting alternative -- I think it could work pretty well. I agree that it's less than optimal to add more stuff to the API surface if it doesn't serve a very clear use case; I wonder if it could be as simple as using a BytesIO?
That might do it. I just tested on gunicorn under 2.7 and 3.4 and stream.read() returns bytes, and b'' when the stream is at EOF. wsgiref is similar. You may want to check mod_wsgi and uWSGI as well.
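The behavior reported above can be checked against `io.BytesIO` with a small helper. This is only a sketch: `behaves_like_wsgi_input` is a hypothetical name, and it covers just the two observations from the comment (`read()` returns `bytes`, then `b''` at EOF), not the full set of methods a server's input stream provides.

```python
import io


def behaves_like_wsgi_input(stream, expected):
    """Check the read() behavior described above on a byte stream."""
    data = stream.read()      # first read returns the whole body
    at_eof = stream.read()    # a second read happens at EOF
    return isinstance(data, bytes) and data == expected and at_eof == b""


buffered = io.BytesIO(b"a=1&b=2")
ok = behaves_like_wsgi_input(buffered, b"a=1&b=2")
buffered.seek(0)  # rewind so the body can be read again
```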
@necaris Will you have some time to work on this during the next week or two? |
Separate form and URL parameter parsing into `request.param` and `request.form_param`. Also, do not consume the POST stream unless the data is requested, and make the raw POST data available on the request if the `store_raw_body` option is set. Closes #418
if a query parameter is assigned a comma-separated list of | ||
values (e.g., 'foo=a,b,c'), only one of those values will be | ||
returned, and it is undefined which one. Use | ||
`req.get_param_as_list()` to retrieve all the values. |
req.params.get_as_list() now :)
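To illustrate what the docstring means by retrieving all the values of a comma-separated parameter, here is a standalone sketch built on the standard library. `get_as_list` is a hypothetical helper written for this example; it is not the code behind `req.get_param_as_list()` or `req.params.get_as_list()`.

```python
from urllib.parse import parse_qs


def get_as_list(query_string, name):
    """Hypothetical helper: return every value given for ``name``."""
    values = []
    for raw in parse_qs(query_string).get(name, []):
        # 'foo=a,b,c' arrives as the single string 'a,b,c'; split it up.
        values.extend(raw.split(","))
    return values


values = get_as_list("foo=a,b,c&bar=1", "foo")
repeated = get_as_list("foo=a&foo=b", "foo")
missing = get_as_list("bar=1", "foo")
```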
As we are using lazy proxy objects, here is a suggestion:
And let's keep in mind that we'll need |
body, | ||
keep_blank_qs_values=self.options.keep_blank_qs_values, | ||
) | ||
|
||
self._params.update(extra_params) |
Since self._params no longer contains form params, we will need to modify req.params to avoid a breaking change.
Perhaps we should allow customization and extension of
class Request(object):
    param_proxy_class = helpers.URLEncodedParamProxy
    # so we can easily customize the param proxy by inheriting from the
    # AbstractParamProxy class, which is neutral to any serialization
    # format and any validator
    ...
    @property
    def form(self):
        if self._form is None:
            # let param_proxy handle deserializing raw_body, rather than Request
            self._form = self.param_proxy_class(self, self.raw_body)
        return self._form
    def set_param_proxy_class(self, klass):
        # the param_proxy class can also be changed per instance
        self.param_proxy_class = klass
    ...
So why don't we decouple |
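A self-contained version of the suggestion above might look like the following sketch. All names are hypothetical: `AbstractParamProxy`, `URLEncodedParamProxy` and `param_proxy_class` are taken from the snippet in the comment, while `JSONParamProxy`, `SketchRequest` and `JSONRequest` are invented here. Unlike the snippet, the proxy receives only the raw body, not the request itself.

```python
import json
from urllib.parse import parse_qs


class AbstractParamProxy:
    """Lazily parses a raw body; subclasses choose the format."""

    def __init__(self, raw_body):
        self._raw_body = raw_body
        self._data = None

    def _parse(self, raw_body):
        raise NotImplementedError

    def get(self, name, default=None):
        # Deserialize only when a value is first requested.
        if self._data is None:
            self._data = self._parse(self._raw_body)
        return self._data.get(name, default)


class URLEncodedParamProxy(AbstractParamProxy):
    def _parse(self, raw_body):
        return parse_qs(raw_body.decode("ascii"))


class JSONParamProxy(AbstractParamProxy):
    def _parse(self, raw_body):
        return json.loads(raw_body.decode("utf-8"))


class SketchRequest:
    """Hypothetical request; the proxy class is a swappable attribute."""

    param_proxy_class = URLEncodedParamProxy

    def __init__(self, raw_body):
        self.raw_body = raw_body
        self._form = None

    @property
    def form(self):
        if self._form is None:
            self._form = self.param_proxy_class(self.raw_body)
        return self._form


class JSONRequest(SketchRequest):
    param_proxy_class = JSONParamProxy


form_value = SketchRequest(b"a=1").form.get("a")
json_value = JSONRequest(b'{"a": 1}').form.get("a")
```

Overriding a class attribute keeps the request class free of any format-specific parsing, which is the decoupling the comment is arguing for.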
@necaris Would you still like to follow up on the comments on this pull request? Otherwise I could take it over, since I'm interested, if you don't mind :) |
@philiptzou I don't have a clear plan to implement all the API changes requested recently so I'd be just fine with you taking over this branch 👍 |
@necaris I now understand why you don't have a clear plan... wow. I'm now considering implementing ParamProxy as a general lazy proxy that supports more than just the dictionary API, since I want to leave room for parsing JSON or XML in the future. |
@philiptzou This may not be the best place to create an extension method for supporting arbitrary body media types. It isn't really a "form" at that point in the sense of the HTML-centric intent of this interface. I'd like to see how this would compare and contrast with parsing body documents via middleware methods or hooks. For now I like @yohanboniface's suggestion of adding @philiptzou Were you still interested in taking this on? In preparation for the imminent Falcon 1.0, I'm considering putting up a PR to just add the BytesIO buffer, since that could be considered a subtle breaking change. Meanwhile, you could work on adding the additional, non-breaking |
@kgriffs: Actually I have an unfinished Alternatively, We can implement a compatible interface of I have not decided yet which one should be implemented. And I don't think I have a clear idea. That's the reason why I need some more discussion but I'm still interested in taking this on. |
I can understand the trepidation, and I don't see a problem with making the buffering optional via Likewise, with such small bodies, I don't think streaming is going to be much of a concern. Since the body will usually fit in one or two TCP packets, all the data will already have arrived by the time the app attempts its first read. Regardless, I've been thinking that we need to step back and re-examine our assumptions. I'm still not sure that I completely understand why people have been wanting to re-parse form bodies. Here are a few options:
Whatever the reason, if the request body were only parsed on-demand (i.e., when the hypothetical With that in mind, it may be better to just go ahead with the plan to disentangle the interfaces for form and query string parsing. Parse params just-in-time (the first time requested), and don't replace req.stream with an instance of BytesIO or similar. If the community feedback still indicates a need for buffering the body, we can cross that bridge when we come to it. |
Another thought on this... perhaps we can provide robust HTML form handling via a standalone add-on (e.g., "falcon-forms"), vs. being part of the core framework. We could add an option to disable parsing form params as currently implemented, in the case that a developer would like to handle it in a different way. |
Update: I plan on discussing this during the PyCon sprints. I'd like to settle on a design direction and get this finished up. |
@kgriffs You previously commented that you liked the idea of lazy-access request attributes for form data, so that on first access the request stream is consumed, used to populate the form data and files request attributes, and not saved or wrapped. This is what I have seen in other frameworks and allows a developer to consume the stream themselves if they wish. Would that be acceptable here? Would love to help this finally get merged |
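The lazy-access behavior described in this comment could be sketched as below. `LazyFormRequest` and its attributes are hypothetical names for illustration, not an API proposed in this PR. The stream is consumed on first access to `form` and is neither saved nor wrapped, so an app that never touches `form` can still read the stream itself.

```python
import io
from urllib.parse import parse_qs


class LazyFormRequest:
    """Hypothetical request with separate URL params and lazy form data."""

    def __init__(self, query_string, stream):
        self.params = parse_qs(query_string)  # URL parameters only
        self.stream = stream
        self._form = None

    @property
    def form(self):
        # First access consumes the stream; the raw body is not kept.
        if self._form is None:
            self._form = parse_qs(self.stream.read().decode("ascii"))
        return self._form


# Case 1: the app never touches .form and reads the stream itself.
untouched = LazyFormRequest("page=2", io.BytesIO(b"name=falcon"))
raw = untouched.stream.read()

# Case 2: the app uses .form; the stream is exhausted afterwards.
parsed = LazyFormRequest("page=2", io.BytesIO(b"name=falcon"))
form = parsed.form
leftover = parsed.stream.read()
```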
@lwcolton Cool, thanks for following up on this. I've also been thinking about this and finally sat down today to write up a proposal. I'd love to get your thoughts on what I just posted to #418. I'm going to close this PR since it is so old. If this is something you'd like to work on, let me know. Otherwise we'll see if anyone else has some bandwidth to take and run with it. |
Separate form and URL parameter parsing into request.param and request.form_param. Introduces new APIs for accessing the data.
Also, do not consume the POST stream unless the data is requested, and make the raw POST data available on the request.
Closes #418