Should we support querystring / x-www-form-urlencoded messages? #247

Open
jcrist opened this issue Dec 21, 2022 · 4 comments

Comments

jcrist (Owner) commented Dec 21, 2022

URL querystrings/x-www-form-urlencoded forms are structured but untyped messages. The Python standard library has a few tools for encoding/decoding these:

In [2]: urllib.parse.parse_qs("x=1&y=true&z=a&z=b")
Out[2]: {'x': ['1'], 'y': ['true'], 'z': ['a', 'b']}

This is annoying to work with manually because the output is always of type dict[str, list[str]]. This means that:

  • The string values have to be manually cast to the expected types
  • Fields where you expect a single value have to be validated (or only the last value used)
  • Missing required fields and default values have to be manually handled
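To make the three pain points above concrete, here is a minimal stdlib-only sketch of the manual handling. The helper name `parse_manually` and the field names `x`, `y`, `z` are made up for illustration, and the rule for `y` (last value wins, only "true" counts as true) is an assumption rather than a recommendation.

```python
from urllib.parse import parse_qs


def parse_manually(qs: str) -> dict:
    """Hand-rolled validation of a querystring (hypothetical helper)."""
    raw = parse_qs(qs)  # always dict[str, list[str]]

    # Required single-valued int field: check presence and arity, then cast.
    if "x" not in raw:
        raise ValueError("missing required field 'x'")
    if len(raw["x"]) != 1:
        raise ValueError("expected exactly one value for 'x'")
    x = int(raw["x"][0])

    # Optional bool field with a default; only the last value is used.
    y = raw.get("y", ["false"])[-1] == "true"

    # Multi-valued field: keep the list, defaulting to empty.
    z = raw.get("z", [])
    return {"x": x, "y": y, "z": z}


result = parse_manually("x=1&y=true&z=a&z=b")
```

Every field needs its own presence check, arity check, and cast, which is the boilerplate a typed decoder would remove.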

A library like Pydantic may be used to ease some of the ergonomic issues here, but adds extra overhead.

Since msgspec is already useful for parsing JSON payloads into typed & structured objects, we might support a new querystring encoding/decoding that makes use of msgspec's existing type system to handle the decoding and validation. A lot of the code needed to handle this parsing already exists in msgspec; what remains is mostly plumbing to hook things together. For performance, I'd expect this to be ~as fast as our existing JSON encoder/decoder.

Proposed interface:

# msgspec/querystring.py

from typing import Any, Type, TypeVar

T = TypeVar("T")

def encode(obj: Any) -> bytes:
    """Encode an object as a querystring.

    This returns `bytes` not `str`, since that's what `msgspec` returns for other encodings.
    """
    ...

def decode(buf: bytes | str, type: Type[T] = dict[str, list[str]]) -> T:
    """Decode a querystring.

    If `type` is passed, a value of that type is returned (or an error is raised).

    If `type` is not passed, a `dict[str, list[str]]` is returned containing all passed query parameters.
    This matches the behavior of `urllib.parse.parse_qs`.
    """
    ...
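
To show roughly how the `type` argument could behave, here is a pure-Python sketch built on `urllib.parse.parse_qs` and dataclasses. It is not msgspec code: `decode_sketch`, `_cast`, and the `Query` class are hypothetical names, only `int`, `float`, `str`, `bool`, and lists of those are handled, and the accepted bool spellings are an assumption.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any, List, Type, TypeVar, get_args, get_origin, get_type_hints
from urllib.parse import parse_qs

T = TypeVar("T")

# Assumed bool spellings for this sketch only.
_BOOLS = {"true": True, "1": True, "false": False, "0": False}


def _cast(text: str, tp: Any) -> Any:
    """Convert one string value to a scalar type (hypothetical helper)."""
    if tp is bool:
        if text not in _BOOLS:
            raise ValueError(f"invalid bool: {text!r}")
        return _BOOLS[text]
    return tp(text)  # int, float, or str


def decode_sketch(buf: str, type: Type[T]) -> T:
    """Decode a querystring into a dataclass instance (illustration only)."""
    raw = parse_qs(buf)
    hints = get_type_hints(type)
    kwargs = {}
    for f in fields(type):
        tp = hints[f.name]
        if f.name not in raw:
            if f.default is MISSING and f.default_factory is MISSING:
                raise ValueError(f"missing required field {f.name!r}")
            continue  # let the dataclass default apply
        values = raw[f.name]
        if get_origin(tp) is list:
            (item_tp,) = get_args(tp)
            kwargs[f.name] = [_cast(v, item_tp) for v in values]
        elif len(values) != 1:
            raise ValueError(f"expected one value for {f.name!r}")
        else:
            kwargs[f.name] = _cast(values[0], tp)
    return type(**kwargs)


@dataclass
class Query:
    x: int
    y: bool = False
    z: List[str] = field(default_factory=list)


decoded = decode_sketch("x=1&y=true&z=a&z=b", Query)
```

A real implementation would reuse msgspec's existing type machinery and error types rather than plain `ValueError`.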

Proposed encoding/decoding scheme:

  • Nested objects are not supported due to querystring restrictions. We don't try to do anything complicated like Rails or Sinatra do (i.e. no foo[][key]=bar stuff).
  • A valid type must be a top-level object-like (struct, dataclass, ...) type, mapping fields to value types

The following value types are supported:

  • int, float, str, and str-like types (datetimes, ...) map to/from their str representations, quoting as needed
  • bool serializes to "true"/"false". When deserializing, "", "1" and "0" are also accepted (are there other common values?)
  • None serializes as "". When decoding, "null" is also accepted.
  • Sequences of the above (e.g. list/tuple/...) map to/from multiple values set for a field. So a field a with value ("x", None, True, 3) would be "a=x&a=&a=true&a=3"
  • All builtin constraints are also supported
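
The encoding rules above can be sketched with the stdlib's `urllib.parse.urlencode`. The names `encode_sketch` and `_to_text` are hypothetical, and the sketch takes a plain mapping instead of a struct to stay short.

```python
from typing import Any, Mapping
from urllib.parse import parse_qs, urlencode


def _to_text(value: Any) -> str:
    """Map a scalar to its proposed querystring text (hypothetical helper)."""
    if value is None:
        return ""
    if value is True:
        return "true"
    if value is False:
        return "false"
    return str(value)


def encode_sketch(obj: Mapping[str, Any]) -> bytes:
    """Encode a flat mapping as a querystring (illustration only)."""
    pairs = []
    for name, value in obj.items():
        if isinstance(value, (list, tuple)):
            # A sequence becomes one key=value pair per item.
            pairs.extend((name, _to_text(v)) for v in value)
        else:
            pairs.append((name, _to_text(value)))
    # urlencode percent-quotes as needed, so the result is pure ASCII.
    return urlencode(pairs).encode("ascii")


encoded = encode_sketch({"a": ("x", None, True, 3)})
# Blank values survive a stdlib round trip only with keep_blank_values=True.
round_trip = parse_qs(encoded.decode("ascii"), keep_blank_values=True)
```

One consequence for decoding: `parse_qs` drops blank values by default, so the `None` item would be lost without `keep_blank_values=True`.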

Questions:

  • Do the above encodings make sense?
  • Do the restrictions on supported types make sense? In particular, note the no-nested-objects/sequences restriction
  • Are there other options we'd want to expose on encode/decode? The stdlib also exposes a few options that I've never needed to change:
    • max_num_fields to limit the number of fields when decoding
    • separator to change the character used for separating fields (defaults to &).
  • Is msgspec.querystring the best namespace for this format, or is there a better name we could use?
  • Does this seem like something that would be useful to have in msgspec? The intent here is for msgspec to handle much of the parsing/validation that a typical web server would need to handle in a performant and useful way.
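
For reference, this is how the two stdlib options mentioned in the questions above behave in `urllib.parse.parse_qs`. The `separator` keyword needs a reasonably recent Python (3.9.2 or later, or a patched older release); the input strings are made up for illustration.

```python
from urllib.parse import parse_qs

# Use ";" instead of the default "&" as the field separator.
with_semicolons = parse_qs("x=1;y=2", separator=";")

# Inputs with more fields than max_num_fields are rejected with ValueError.
try:
    parse_qs("a=1&b=2&c=3", max_num_fields=2)
    rejected = False
except ValueError:
    rejected = True
```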
jcrist changed the title from "Shoule we support querystring (x-www-form-urlencoded) messages?" to "Should we support querystring / x-www-form-urlencoded messages?" on Dec 21, 2022
jcrist (Owner, Author) commented Dec 21, 2022

provinzkraut commented

Do the above encodings make sense?

They sound very reasonable. I specifically like the aspect of optionally coercing values into common expected types (int/bools and such)

Do the restrictions on supported types make sense?

I'd say so. If users need more complex parsing, they could always implement it on top of msgspec's output.

Is msgspec.querystring the best namespace for this format

To me, this depends on the direction you want to take this library. If it's, as you said, to provide general parsing / validation utilities commonly needed for webservers, then this would be a good namespace, since it follows the established naming scheme (msgspec.json, msgspec.msgpack) and could be easily extended (e.g. msgspec.multipart).

This brings me to my question regarding msgspec's mission statement:

As already expressed on Discord, I would welcome msgspec seeing itself as a "one-stop-shop offering fast, correct and type safe parsing / validation for webserver needs", since this is currently somewhat missing in Python. There are many projects that do parts of it, but installing 4 different libraries, each with a very narrow scope, often isn't desirable, so it would definitely fill a niche there.

The question, however, is what you think the scope of this would be. If query strings and form-urlencoded are supported, I'd say multipart would make sense as well. How about other things like URLs? If msgspec already parses query strings, this would kinda fit in.

rjdbcm commented Jan 7, 2023

This scheme makes a lot of sense for URL queries. I have been using a similar scheme with msgspec to serialize parsed database query tokens to "kwarg-like" collections I can evil eval(), obviously with similar caveats. I like your idea better.

troyswanson commented

I’m for it for the simple fact that the validation of POST JSON data and query string data should (in my mind) be handled by the same system. This makes producing error messages on bad requests consistent, as well as the interface for passing input data from the request to database operation handlers.

I am currently using Falcon in an experimental system and it has support for plucking items out of a query string one at a time and coercing them into Python objects (e.g.: request.get_param_as_datetime), but it doesn’t have the same kind of constraints that msgspec has (for instance, enforcing a value is time zone aware).

I would appreciate the ability to define a msgspec Struct and decode a query string into that object, or throw errors in the same way as a JSON payload.
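
As a rough stdlib-only illustration of the kind of constraint mentioned above (requiring a timezone-aware datetime from a query parameter), something like the following works today but has to be written per field. The helper name `get_aware_datetime` and the parameter name `when` are made up for the example.

```python
from datetime import datetime, timedelta
from urllib.parse import parse_qs


def get_aware_datetime(qs: str, name: str) -> datetime:
    """Pluck one ISO 8601 datetime from a querystring (hypothetical helper)."""
    values = parse_qs(qs).get(name)
    if not values:
        raise ValueError(f"missing required field {name!r}")
    value = datetime.fromisoformat(values[-1])  # only the last value is used
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError(f"{name!r} must be timezone-aware")
    return value


# "+" means a space in a querystring, so the UTC offset is sent as %2B.
when = get_aware_datetime("when=2023-01-07T12:00:00%2B00:00", "when")
```

With a typed decoder, the same check could be declared once on the field and reported through the same error path as a bad JSON payload.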
