Skip to content

Conversation

whardier
Copy link

…ntation (Shouldn't be too confusing.. doesn't stray from current behaviour)

Added in canonical_uuid option to json_options. This was the only way I could think to not break current API globally but also allow for this option to exist.

>>> test_uuid = uuid.uuid1()
>>> test_uuid
UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')
>>> JSON_OPTIONS = bson.json_util.DEFAULT_JSON_OPTIONS.with_options(canonical_uuid=True)
>>> payload = bson.json_util.dumps({'uuid': test_uuid}, json_options=JSON_OPTIONS)
>>> payload
'{"uuid": {"$uuid": "e387e6cc-3b45-11eb-9ec8-00163e0987ed"}}'
>>> bson.json_util.loads(payload)
{'uuid': UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')}
>>> 

…ntation (Shouldn't be too confusing.. doesn't stray from current behaviour)
@ShaneHarvey
Copy link
Member

Thanks for the pull! If I understand correctly, the feature added here is that setting canonical_uuid=True changes the $uuid encoding output to include the dashes (-):

>>> bson.json_util.dumps({'uuid': test_uuid})
'{"uuid": {"$uuid": "e387e6cc3b4511eb9ec800163e0987ed"}}'
>>> bson.json_util.dumps({'uuid': test_uuid}, json_options=canonical_uuid_opts)
'{"uuid": {"$uuid": "e387e6cc-3b45-11eb-9ec8-00163e0987ed"}}'

Can you explain the motivation for this change?

@whardier
Copy link
Author

The motivation for the change is to provide consistency between MongoDB drivers. The MongoDB bson.js requires dashes (frontend related code) and I've put in a request to make that more flexible according to the UUID encoding specifications put together by the BSON team (Needs to support dashes, needs to support hex, needs to support URN) and there needs to be a way to set a preference on what is exported. The patch I presented does not support URN but that would be an easy change and an ENUM or preference class might be best going forward.

When serializing EJSON data between BSON SDKs a lot of recursive replacement of values related to $uuid keys is required for relaxed encodings. This includes using hex-to-uuid and uuid-to-hex JS modules in the case of working between Python and Javascript.

On top of that.. the UUID value as hex is not immediately useful in situations where URLs or content expects a canonical UUID. For instance.. sending EJSON encoded data to a front end where a route is developed at /account/profile/:uuid where dashes are required.. being given only a hex representation requires extra fiddling that could be avoided if the following specification were more closely followed for all MongoDB/BSON drivers.

Also.. the UUID default representation on the CLI is canonical UUIDs (dashed)

See: https://github.com/mongodb/specifications/blob/master/source/extended-json.rst#special-rules-for-parsing-uuid-fields

Also see:

https://github.com/mongodb/js-bson/blob/master/src/uuid.ts

Overall I feel as though EJSON needs to be reviewed between all drivers and a common options component implemented for handling serialization. There's a lot of gotchas with this specific asset.

@whardier
Copy link
Author

I think it would be a breaking change to directly implement the specification in pymongo bson. See: the referred document here as well for more text on how the STANDARD UUID output representation expected of EJSON is to be canonical.

https://github.com/mongodb/specifications/blob/master/source/uuid.rst#explicit-encoding-and-decoding

@ShaneHarvey
Copy link
Member

ShaneHarvey commented Dec 14, 2020

Overall I feel as though EJSON needs to be reviewed between all drivers and a common options component implemented for handling serialization. There's a lot of gotchas with this specific asset.

We have addressed compatibility between drivers via the Canonical Extended JSON output format. If you need to serialize BSON to JSON in one language and deserialize it in another then the best option is to use CANONICAL_JSON_OPTIONS (or any other JSONOptions with json_mode=JSONMode.CANONICAL), like this:

>>> from bson import json_util
>>> test_uuid = UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')
>>> json_util.dumps({'uuid': test_uuid}, json_options=json_util.CANONICAL_JSON_OPTIONS)
'{"uuid": {"$binary": {"base64": "44fmzDtFEeueyAAWPgmH7Q==", "subType": "03"}}}'

Note that the CANONICAL format sacrifices human readability in favor of improved compatibility between drivers and improved type preservation.

I think it would be a breaking change to directly implement the specification in pymongo bson.

We have already implemented both the UUID and Extended JSON specifications in PyMongo.

@whardier
Copy link
Author

Indeed. I currently use canonical JSON mode in order to avoid potential changes of behavior due to what I have seen in relaxed mode. Pymongo BSON does indeed implement the specification.. however it may be erroneous in consistency as a non binary type between drivers.

@ShaneHarvey
Copy link
Member

Can you explain what the problem is with using canonical JSON mode to exchange data from Python to JS? It sounds like the UUID type is hard to work with in JS? If so, NODE-1046 might be related.

@whardier
Copy link
Author

When working with hex string representations (via $uuid keys) as well as with canonical mode (more consistent and preferred) a conversion from value to dashed canonical UUID string is required in order to form a simple string representation of a UUID. This could be avoided by allowing for a preferred representation in relaxed mode when $uuid is used rather than $binary.

As it stands.. in python when exporting content containing UUIDs I'm recursing through the sent object and doing replacements in order to work around unnecessary processing in frontend code. Honestly.. If I wasn't using UUIDs as objects in order to deal with some functions that require UUID classes.. I would just use strings of my own choosing.

@ShaneHarvey
Copy link
Member

Ah okay I understand the motivation now. Thanks for explaining in detail!

This does jog my memory that when we adding support for parsing {"$uuid": "e387e6cc-3b45-11eb-9ec8-00163e0987ed"} we also discussed whether drivers should support generating that format in Relaxed mode. Relaxed mode seems like a good fit because it prefers human readability, however you are correct that it would be a backwards breaking change (at least to implement by default) so it would require a new knob/toggle/option like you have introduced in this PR. We rejected this idea because we generally prefer not to add knobs.

One problem I have here is that it seems like even with canonical_uuid=True, your JS app will end up with the same Binary object after decoding the extended JSON. In python we decode the following inputs to the same UUID:

>>> json_util.loads('{"uuid": {"$uuid": "e387e6cc-3b45-11eb-9ec8-00163e0987ed"}}')
{'uuid': UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')}
>>> json_util.loads('{"uuid": {"$binary": {"base64": "44fmzDtFEeueyAAWPgmH7Q==", "subType": "03"}}}')
{'uuid': UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')}

It seems odd to add a knob in pymongo to workaround a likely temporary pain point in the JS API. Right now I am leaning towards not accepting this PR but I will discuss it with the rest of the Python and Node teams first to get their take.

@whardier
Copy link
Author

whardier commented Dec 14, 2020 via email

@pooooodles
Copy link

@pooooodles pooooodles added the tracked-in-jira Ticket filed in Mongo's Jira system label Dec 14, 2020
@ShaneHarvey ShaneHarvey changed the title Fix hex only representation of UUID by allowing for canonical represe… PYTHON-2467 Fix hex only representation of UUID by allowing for canonical represe… Dec 14, 2020
@ShaneHarvey
Copy link
Member

Closing this because the underlying issue is a problem in the node driver (https://jira.mongodb.org/browse/NODE-2962). Instead of adding a new API in pymongo to workaround this issue (which we would need to support for a long time) we will simply fix the issue in the Node driver itself.

Also, as mentioned above there is already a workaround for this problem. You can use JSONMode.CANONICAL or JSONMode.RELAXED to output UUIDs in the "$binary" format. "$binary" is parsable by the node driver:

>>> from bson import json_util
>>> test_uuid = UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')
>>> json_util.dumps({'uuid': test_uuid}, json_options=json_util.CANONICAL_JSON_OPTIONS)
'{"uuid": {"$binary": {"base64": "44fmzDtFEeueyAAWPgmH7Q==", "subType": "03"}}}'

@ShaneHarvey ShaneHarvey closed this Jan 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tracked-in-jira Ticket filed in Mongo's Jira system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants