Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.0 feature request: Require that sessions are JSON serializable #2709

Closed
digitalresistor opened this issue Jul 17, 2016 · 22 comments · Fixed by #3413
Closed

2.0 feature request: Require that sessions are JSON serializable #2709

digitalresistor opened this issue Jul 17, 2016 · 22 comments · Fixed by #3413

Comments

@digitalresistor
Copy link
Member

Currently the ISession interface requires that that keys and values of a session must be pickleable.

My proposal is to drop that requirement and simply state that all keys and values should be JSON serializeable. We should still give users the option to use the pickle serializer, but pick a more safe default.


Cookie based sessions right now by default use a PickleSerializer and then the cookie is protected using a secret and HMAC hashes. However if an attacker was able to get the the secret, they could potentially use the unpickle methods to execute code in the context of the running Python process.

@mmerickel
Copy link
Member

If JSON supported datetime this would be a no brainer for me.

@ztane
Copy link
Contributor

ztane commented Jul 17, 2016

@digitalresistor
Copy link
Member Author

@mmerickel there's gotta be a way to solve that problem though, with custom serializer/deserializer perhaps?

@ztane I don't want to add a 3rd party library to Pyramid core.

@mmerickel
Copy link
Member

@bertjwregeer deserialization is where things get shady. On the serialization side you basically end up marshaling a custom syntax into a string, or go even further and instead of a straight json dump you instead iterate through and build out private structures containing the type information. This is all possible but then there's a fine line between this and basically re-inventing pickle.

I'm not ruling anything out right now but I think it'd ideally be something general purpose, pluggable with custom types and not in pyramid's core. This way people could use it in their own session factories and they could add serializers for types like datetime and any other custom objects.

@digitalresistor
Copy link
Member Author

I was suggesting we support only the lowest common denominator. Basically ISession would state that all content to be stored in the session is to be something that can be JSON serialized. I was also thinking we could make an exception for datetime (although I don't use those in sessions...)

If you want to have an ISession that allows for complex objects it can still be implemented of course, just like you can currently take ISession and use a JSON serializer instead.

I was simply asking for the default to be changed.

@mmerickel
Copy link
Member

Yeah... that's not actually helpful for creating a session ecosystem though. Addons need to be able to use the session without worrying about the specific implementation. That's why pickle is so great right now - addons don't need to care about the serialization format and they can throw datetimes in there and get them back. If we switch to JSON in the way you suggested, they can no longer know what they will get back for any non-primitive types. The addon will need to ask itself... can I even store this object? What will I get back, if it even takes it? That's not an improvement.

@digitalresistor
Copy link
Member Author

IMHO storing anything but strings or small bits of data in the session is something that shouldn't be encouraged anyway.

Maybe there should be a way for the session to be interrogated before app startup, so that an extension can ask it what the serializer is, and fail out if it doesn't provide certain guarantees. This way for most cases deployers can be safer (and not possible allow an RCE if their secret is leaked), yet if required they can switch to a different serializer to support their add-ons and the ecosystem around it.

@jvanasco
Copy link
Contributor

jvanasco commented Sep 7, 2016

I switched from JSON to Msgpack a while back to minimize the size of server-side sessions in Redis (Pickle is FAST for builtins and standard-lib types, but the payload consumes more bytes than desired when using a LRU cache memory store).

The standard msgpack and json libs have similar approaches and implementations to serialization, and I will underscore mmerickel's concern that compatibility is an issue.

The concern isn't just for custom data types which might be stored in a session, but also

  1. not all builtins survive a decode(encode(payload)) roundtrip. for example, lists and tuples will both become a list.
  2. none of stdlib datatypes are handled, not just datetime. i.e. Decimal is out.

In terms of loads, dealing with the object_pairs_hook/object_hook is painfully slow as everything get inspected. It becomes a significant performance hit.

If you were to implement something / require a restriction... it would make sense to have a default JSON serializer that can be over-ridden by the user, and/or support an API where 3rd party plugins can register (de)serialization formats for non-standard types. You really don't want to support serialization for any unused object types.

(FWIW, I found encoding datetime&date as epoch seconds was the fastest option for deserialization.)

@dstufft
Copy link
Contributor

dstufft commented Nov 8, 2016

FWIW Django switched the default serializer to JSON (but kept the ability to set a Pickle serializer) and afaik apps just dealt with it and it hasn't caused any problems.

@mmerickel
Copy link
Member

FTR I don't have a problem making this switch. I think there are several cons regarding json's limited type support but I think overall it's a win.

@dstufft
Copy link
Contributor

dstufft commented Nov 19, 2016

FWIW It might also make sense to use msgpack and use custom types to represent common Python datatypes to allow them to still be serialized. This works a bit better in msgpack than it does in JSON because msgpack has type information in it's serialization format so you can register your own types (represented by an integer) and when it deserializes you get that integer + the data back out. That avoids needing to do things like guess about datetimes or what have you.

Something like:

import enum
import datetime
import struct

import msgpack

class Types(enum.IntEnum):
    datetime = 1

def default(obj):
    if isinstance(obj, datetime.datetime):
        return msgpack.ExtType(
            Types.datetime.value,
            struct.pack(">I", int(obj.timestamp())),
        )
    raise TypeError("Unknown type: %r" % (obj,))

def ext_hook(code, data):
    type_ = Types(code)

    if type_ == Types.datetime:
        return datetime.datetime.fromtimestamp(struct.unpack(">I", data)[0])

    return msgpack.ExtType(code, data)

data = {"foobar": datetime.datetime.now()}
packed = msgpack.packb(data, default=default, use_bin_type=True)
unpacked = msgpack.unpackb(packed, ext_hook=ext_hook, encoding="utf-8")

# \/ This actually fails because milliseconds don't roundtrip via timestamp
#     but you get the idea.
assert data == unpacked

@digitalresistor
Copy link
Member Author

Is msgpack in the stdlib? I'd prefer not to pull in another dependency. I'd be okay with shipping an optional session implementation that uses msgpack, but I don't think it's a great idea for core.

@jvanasco
Copy link
Contributor

msgpack isn't in the stdlib and there are a few implementations.

fwiw, it's much faster and more compact than json. if you stick to the core Python data types, it outperforms Pickle's speed on serialization and deserialization by a lot. the hooks for encoding/decoding other types are rather slow though, and each additional non-standard element introduces a performance hit. IIRC think 1-2 datetime elements is faster than cpickle, 3+ are increasingly slower.

@digitalresistor
Copy link
Member Author

I am a -1 on setting the default to something that requires installing another dependency. I have no problem with making it easily switchable (including back to pickle).

@mmerickel
Copy link
Member

I haven't really taken any time to evaluate msgpack but I will say that if it's a pure-python (or well-supported), trustable dependency then I'm not against it.

@dstufft
Copy link
Contributor

dstufft commented Nov 23, 2016

msgpack has a C accelerator but it falls back to pure Python when that can't compile (or on PyPy).

@ztane
Copy link
Contributor

ztane commented Nov 23, 2016

I love msgpack, it is awesome especially if you need to interface with C code, or if you need to dump shtloads of data. However I don't think it adds anything over JSON in this case, especially without speed-ups.

If serialization of complex object structures were needed, then I'd rather have https://pypi.python.org/pypi/cbor2. Supports cycles. Pure python.

@mcdonc
Copy link
Member

mcdonc commented Jul 3, 2018

FWIW, I am +1 on breaking bw compat in 2.X, moving to an interface that mandates that session data be JSON serializable. We could provide a PickleSession bw compat shim and instructions on how it would be used and point to these instructions when a deserialization or serialization error is raised within the default cookie serializer.

@jvanasco
Copy link
Contributor

jvanasco commented Jul 3, 2018

I like @mmerickel's initial idea of defaulting to JSON but being pluggable. this would allow people to switch to pickle/msgpack/etc or even specify a json encoder that can handle datetime.

while i understand his later concern over 'what happens if a 3rd party plugin needs to support X/Y/Z' -- that's what unittests are for, and the plugin can recommend encoder/decoders.

@mmerickel
Copy link
Member

mmerickel commented Jul 3, 2018

Yeah it sounds like Pyramid would define JSON data types as the target and people are free to use something more strict (like pickle). The downside for any addon / app that wanted richer types (like what pickle provided) would be that they may need to document more clearly the types of objects they store in the session and the session impl would then need to try to accommodate that but it'd be "not pyramid's problem".

@mcdonc
Copy link
Member

mcdonc commented Jul 3, 2018

@jvanasco what you describe is pretty much already the case, it's just that technically a session implementation currently must support the serialization of all pickleable objects to meet the API. In 2.0, it will be changed such that it must support only the serialization of all JSON-serializable objects, and specialized sessioning implementations can abide by less strict rules, as Michael said.

@mmerickel
Copy link
Member

See #3353 for how I deprecated pickle-based sessions in pyramid 1.10 in order to provide users with some info about the upcoming change in 2.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants