Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support form param parsing #180

Closed
kgriffs opened this issue Sep 12, 2013 · 30 comments
Closed

Support form param parsing #180

kgriffs opened this issue Sep 12, 2013 · 30 comments

Comments

@kgriffs
Copy link
Member

kgriffs commented Sep 12, 2013

Maybe do this as a /contrib hook? Alternatively, role into get_param?

application/x-www-form-urlencoded
multipart/form-data

See also: http://www.w3.org/TR/html401/interact/forms.html#submit-format

@savorywatt
Copy link

I could really use this so I started looking at how bottle.py handles the parsing of the body. Is cgi.FieldStorage not allowed in Falcon? It would be great if this was rolled into get_param so you could just

data = req.get_param('form_name')

Going to take a harder look at this as right now because it directly affects my current project where I'm attempting to use Falcon instead of bottle.py. Suggestions are welcome.

@kgriffs
Copy link
Member Author

kgriffs commented Oct 30, 2013

cgi.FieldStorage could work. I wonder how fast it is?

Reference: http://hg.python.org/cpython/file/2.7/Lib/cgi.py

@andrew-d
Copy link

If you're interested, I've got a library for streaming multipart parsing in Python: https://github.com/andrew-d/python-multipart

Highlights: test coverage is 100%, it supports parsing multipart and form data, and it's very configurable.

@savorywatt
Copy link

@andrew-d that looks interesting. I'm now leaning more towards a general req.form property and only parsing the stream on the first access of the request's form property. The get_param already has the performance tweak to not parse it if the query string doesn't exist so we could do a 'delayed' parsing on the stream till it is needed.

@kgriffs
Copy link
Member Author

kgriffs commented Jan 3, 2014

👍 I like the req.form property idea.

@zzart
Copy link

zzart commented Jan 9, 2014

I like req.form as well. I can help if you need testing.

@savorywatt
Copy link

I'm circling back to this, are we thinking something that would allow you to access the form data like so?

<form>
First name: <input type="text" name="firstname">
Last name: <input type="text" name="lastname">
Letter Text: <input type="text" name="emailtext">
Send Now: <input type="checkbox" name="send">
</form>
first_name = req.form.get('firstname')
last_name = req.form.get('lastname')
email_text = req.form.get('emailtext')
send = req.form.get('send')

print first_name
Ross
print send
True

Essentially create a dictionary styled way of interacting with the form data but doing some smart on demand parsing. The names of the would correspond to keys in the dictionary and their content would be the value?

@kgriffs what do you think?

@kgriffs
Copy link
Member Author

kgriffs commented Feb 13, 2014

Good stuff. I've been thinking about this, and I would love to be able to meet the following goals with whatever design we come up with:

  • Hide the data structure implementation to afford non-breaking modifications to the same.
  • Avoid extra object creation if we can do so without degrading usability.
  • Be consistent with the query params interface.

With the above in mind, I mocked up some permutations based on your example for the sake of discussion:

"""
Using a form object as proposed, shown side-by-side with query param interface
"""

site_name = req.get_param('site', required=True)
limit = req.get_param_as_int('limit') or 10
coordinates = req.get_param_as_list('coordinates', 
                                    transform=float, required=True)

first_name = req.form.get('firstname')
last_name = req.form.get('lastname')
email_text = req.form.get('emailtext')
should_subscribe = req.form.get('subscribe')


"""
Side-by-side example: With new interface for params (additional work item)
"""

# New interface prototype
#
# This would deprecate the req.get_param interface, but not remove
# it until the 0.2 release.
#
# If we decide to go down this path, we will want to also update the 
# header interface to suite (i.e., get_header ===> header).
def param(name, type=str, transform=None, required=False, store=None):
    pass


site_name = req.param('site', required=True)
limit = req.param('limit', int) or 10
coordinates = req.param('coordinates', list, 
                        transform=float, required=True)

first_name = req.form('firstname')
last_name = req.form('lastname')
email_text = req.form('emailtext')
should_subscribe = req.form('subscribe')  # bool

"""
Without new interface for params
"""

site_name = req.get_param('site', required=True)
limit = req.get_param_as_int('limit') or 10
coordinates = req.get_param_as_list('coordinates', 
                                    transform=float, required=True)

first_name = req.form('firstname')
last_name = req.form('lastname')
email_text = req.form('emailtext')
should_subscribe = req.form('subscribe')  # bool

"""
Another option for forms, but doesn't allow for future additions, such as transform support
"""

first_name = req.form['firstname']
last_name = req.form['lastname']
email_text = req.form['emailtext']
send = req.form['send']

What do you think?

@kgriffs
Copy link
Member Author

kgriffs commented Mar 24, 2014

Discussed on IRC with sebasmagri, and we think it would make sense to just reuse the req.get_param interface to access form params. The two types of params are very similar, and unifying the interface would afford hooks that support other content types besides forms.

@sebasmagri
Copy link
Contributor

I would like to note that the req.param interface looks much better for me, so it would be nice just to use it for form parameters also.

@kgriffs
Copy link
Member Author

kgriffs commented Mar 31, 2014

Let me take a stab at implementing this under the existing req.get_param interface. Also, I created an issue to track the proposal for improving that interface in 0.3 (See also #242)

@kgriffs kgriffs self-assigned this Mar 31, 2014
@savorywatt
Copy link

Kurt,

I think going the get param route makes a lot of sense. If you want to get
a branch out there I'd be happy to work off that. I could help with test
coverage once you have something.

Ross
On Mar 31, 2014 2:55 PM, "Kurt Griffiths" notifications@github.com wrote:

Let me take a stab at implementing this under the existing req.get_paraminterface. Also, I created an issue to track the proposal for improving
that interface in 0.3 (See also #242#242
)

Reply to this email directly or view it on GitHubhttps://github.com//issues/180#issuecomment-39133519
.

kgriffs pushed a commit to kgriffs/falcon that referenced this issue Apr 1, 2014
@kgriffs
Copy link
Member Author

kgriffs commented Apr 1, 2014

I tried this out to see what it would look like (WIP: no tests, needs try blocks around cgi.FieldStorage calls):

https://github.com/kgriffs/falcon/blob/issues/180/falcon/request.py#L543

TBH, I'm still on the fence about this. On the one hand, overloading the get_param interface means you can get away with supporting form submissions and direct API requests with the same line of code in the responder; on the other hand, if something goes wrong, it may not be immediately obvious where the errant param came from.

What does everybody think about this WIP? If we like it, @hendricksonrw has volunteered to help with test coverage (I think we'll need a large, dedicated test module to do this justice).

@kgriffs
Copy link
Member Author

kgriffs commented Apr 3, 2014

Hmm, I'm starting to have second thoughts on this. What if we were to just land PR #172 and then contribute a before hook to Talons that uses https://github.com/andrew-d/python-multipart or something to handle file uploads?

Also, I was thinking, supporting multipart/form-data for file uploads directly in the Falcon "kernel" is going to be really hard to do efficiently without either taking a 3rd-party dependency or writing some Falcon-specific code. cgi.FieldStorage is OK, but I was looking at the code and it could be a lot better.

@kgriffs
Copy link
Member Author

kgriffs commented Apr 7, 2014

After discussing this proposal on IRC, I think it is the right way to go. Unless anyone objects in the next day or two, I will get PR #172 merged and then look at contributing something to talons.

kgriffs pushed a commit that referenced this issue Apr 14, 2014
This patch adds support for merging the body of an
'application/x-www-form-urlencoded' request into the params collection
so they can be queried via get_param & Co.

Credit goes to sorenh for the originaly idea and POC. Thanks Soren!

Partially implements issue #180
@kgriffs
Copy link
Member Author

kgriffs commented Apr 15, 2014

OK, only thing left now is to write a Talons hook for processing file uploads from a form post.

@kgriffs kgriffs modified the milestones: 0.3, 0.2 Apr 15, 2014
@kgriffs kgriffs modified the milestones: 0.4, Future-1 Dec 16, 2014
@kgriffs
Copy link
Member Author

kgriffs commented Dec 16, 2014

Question: If we can do this without external dependencies (using the cgi module), is there any reason not to just implement this directly in falcon.Request?

@richardolsson
Copy link
Contributor

Does the above (82ae94f in particular) not mean that it becomes impossible to POST form data to a URI with a query string, without one overwriting the other? This would be an edge case obviously, but I don't see a reason for sacrificing such functionality.

As an example, consider this (HTML) form:

<form action="/path/to/endpoint?foo=bar">
  <input name="foo" value="baz">
</form>

If both querystring and form variables are accessed using the same method on Request, how would you distinguish the two foo fields? Is this illegal in any way?

To answer the question of form data and/or file uploads directly in Request, IMO that seems fine as long as it doesn't have external dependencies and means no (or negligible) performance hit.

@kgriffs
Copy link
Member Author

kgriffs commented Jan 6, 2015

As currently implemented, the form data overwrites query string values in the case of a collision. If we wanted to separate them, we would need to introduce a parallel interface, perhaps something like this:

req.form.get_param
req.form.get_param_as_int
req.form.get_param_as_list

# etc.

This makes Request a bit more complicated, but may be worth it if we can find a good example of when an app/service would need to distinguish between fields submitted through a form vs. a query string.

@ex-nerd
Copy link
Contributor

ex-nerd commented Jan 6, 2015

Could also create separate accessors for GET, POST, and then a default prioritized combination like what Falcon does now.

There's definitely a use-case for retrieving lists of parameters with the same name (regardless of GET vs POST), e.g. <select multiple>

@dukedougal
Copy link

It's useful to have the form fields parsed, but it should be into a separate data structure to req.params and very importantly, it should not consume the POST data such that POST data can no longer be read. Perhaps such parsing should be optional or explicitly executed or something. My application needs the POST data.

@kgriffs
Copy link
Member Author

kgriffs commented Feb 10, 2015

it should not consume the POST data such that POST data can no longer be read. Perhaps such parsing should be optional or explicitly executed or something.

Seems like a reasonable approach to only consume the request stream "just in time" when the form data attribute/helper (however we end up implementing it) is first called.

Alternatively, we could buffer req.stream to make it seekable, but that would bring additional complexity and overhead (req.stream is normally a direct reference to environ['wsgi.input']). I'd personally rather avoid doing that if we can.

@kgriffs
Copy link
Member Author

kgriffs commented Mar 6, 2015

See also #418

@kgriffs
Copy link
Member Author

kgriffs commented Mar 6, 2015

Closing since remaining work is captured in #418.

@kgriffs kgriffs closed this as completed Mar 12, 2015
@kgriffs kgriffs removed this from the 0.3 milestone Mar 12, 2015
@dukedougal
Copy link

I'm not sure where this ended up, but my question is, am I both able to get Falcon to parse the form data, AND also get access to the POST data for my own purposes?

If I can only do one of these things then that's not an ideal outcome.

@kgriffs
Copy link
Member Author

kgriffs commented Mar 27, 2015

There's an issue (#418) to modify how this works for the 0.3 release. The current plan is to create a separate interface for form params vs. query string params. The request body will no longer be automatically consumed, only being parsed the first time a form param method is called.

In your case, it sounds like you would like to have Falcon parse the form params but also have access to the raw string? In the case of application/x-www-form-urlencoded, there shouldn't be much data to buffer, so I'd be open to supporting it.

@dukedougal
Copy link

Yes I need both to be able to parse the form data and get access to the raw POST data.

@kgriffs
Copy link
Member Author

kgriffs commented Apr 1, 2015

OK, thanks for confirming. I added a note to #418 so we don't forget.

@kgriffs kgriffs removed their assignment Oct 12, 2015
@TomIsYourName
Copy link

I saw #418 still open, so the issue still remain?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants