-
Notifications
You must be signed in to change notification settings - Fork 1.1k
PYTHON-2467 Fix hex only representation of UUID by allowing for canonical represe… #532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ntation (Shouldn't be too confusing.. doesn't stray from current behaviour)
Thanks for the pull! If I understand correctly, the feature added here is that setting >>> bson.json_util.dumps({'uuid': test_uuid})
'{"uuid": {"$uuid": "e387e6cc3b4511eb9ec800163e0987ed"}}'
>>> bson.json_util.dumps({'uuid': test_uuid}, json_options=canonical_uuid_opts)
'{"uuid": {"$uuid": "e387e6cc-3b45-11eb-9ec8-00163e0987ed"}}' Can you explain the motivation for this change? |
The motivation for the change is to provide consistency between MongoDB drivers. The MongoDB bson.js requires dashes (frontend related code) and I've put in a request to make that more flexible according to the UUID encoding specifications put together by the BSON team (Needs to support dashes, needs to support hex, needs to support URN) and there needs to be a way to set a preference on what is exported. The patch I presented does not support URN but that would be an easy change and an ENUM or preference class might be best going forward. When serializing EJSON data between BSON SDKs a lot of recursive replacement of values related to $uuid keys is required for relaxed encodings. This includes using hex-to-uuid and uuid-to-hex JS modules in the case of working between Python and Javascript. On top of that.. the UUID value as hex is not immediately useful in situations where URLs or content expects a canonical UUID. For instance.. sending EJSON encoded data to a front end where a route is developed at /account/profile/:uuid where dashes are required.. being given only a hex representation requires extra fiddling that could be avoided if the following specification were more closely followed for all MongoDB/BSON drivers. Also.. the UUID default representation on the CLI is canonical UUIDs (dashed) Also see: https://github.com/mongodb/js-bson/blob/master/src/uuid.ts Overall I feel as though EJSON needs to be reviewed between all drivers and a common options component implemented for handling serialization. There's a lot of gotchas with this specific asset. |
I think it would be a breaking change to directly implement the specification in pymongo bson. See: the referred document here as well for more text on how the https://github.com/mongodb/specifications/blob/master/source/uuid.rst#explicit-encoding-and-decoding |
We have addressed compatibility between drivers via the Canonical Extended JSON output format. If you need to serialize BSON to JSON in one language and deserialize it in another then the best option is to use >>> from bson import json_util
>>> test_uuid = UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')
>>> json_util.dumps({'uuid': test_uuid}, json_options=json_util.CANONICAL_JSON_OPTIONS)
'{"uuid": {"$binary": {"base64": "44fmzDtFEeueyAAWPgmH7Q==", "subType": "03"}}}' Note that the CANONICAL format sacrifices human readability in favor of improved compatibility between drivers and improved type preservation.
We have already implemented both the UUID and Extended JSON specifications in PyMongo. |
Indeed. I currently use canonical JSON mode in order to avoid potential changes of behavior due to what I have seen in relaxed mode. Pymongo BSON does indeed implement the specification.. however it may be erroneous in consistency as a non binary type between drivers. |
Can you explain what the problem is with using canonical JSON mode to exchange data from Python to JS? It sounds like the UUID type is hard to work with in JS? If so, NODE-1046 might be related. |
When working with hex string representations (via $uuid keys) as well as with canonical mode (more consistent and preferred) a conversion from value to dashed canonical UUID string is required in order to form a simple string representation of a UUID. This could be avoided by allowing for a preferred representation in relaxed mode when $uuid is used rather than $binary. As it stands.. in python when exporting content containing UUIDs I'm recursing through the sent object and doing replacements in order to work around unnecessary processing in frontend code. Honestly.. If I wasn't using UUIDs as objects in order to deal with some functions that require UUID classes.. I would just use strings of my own choosing. |
Ah okay I understand the motivation now. Thanks for explaining in detail! This does jog my memory that when we adding support for parsing One problem I have here is that it seems like even with >>> json_util.loads('{"uuid": {"$uuid": "e387e6cc-3b45-11eb-9ec8-00163e0987ed"}}')
{'uuid': UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')}
>>> json_util.loads('{"uuid": {"$binary": {"base64": "44fmzDtFEeueyAAWPgmH7Q==", "subType": "03"}}}')
{'uuid': UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')} It seems odd to add a knob in pymongo to workaround a likely temporary pain point in the JS API. Right now I am leaning towards not accepting this PR but I will discuss it with the rest of the Python and Node teams first to get their take. |
In JS it would be a string as native JSON. I won’t need BSON specifically
to deal with UUID in areas of code that aren't BSON aware. This includes
3rd party libs. This includes libs where parameter paths in dot notation
are used to send identifiers to logging and authorization providers
expecting a dashed UUID.
In my frontend code I can use the JSON and BSON as a preference depending
on what I need. Being able to use dashed UUIDs in frontend route
generation directly from a stringy source greatly simplifies things.
Currently I am simply using canonical EJSON and doing all the conversations
and then adding stringy UUIDs to the EJSON object in Python in a
prerendered state for other tools outside of my control to use.
|
Tracked in https://jira.mongodb.org/browse/PYTHON-2467 |
Closing this because the underlying issue is a problem in the node driver (https://jira.mongodb.org/browse/NODE-2962). Instead of adding a new API in pymongo to workaround this issue (which we would need to support for a long time) we will simply fix the issue in the Node driver itself. Also, as mentioned above there is already a workaround for this problem. You can use >>> from bson import json_util
>>> test_uuid = UUID('e387e6cc-3b45-11eb-9ec8-00163e0987ed')
>>> json_util.dumps({'uuid': test_uuid}, json_options=json_util.CANONICAL_JSON_OPTIONS)
'{"uuid": {"$binary": {"base64": "44fmzDtFEeueyAAWPgmH7Q==", "subType": "03"}}}' |
…ntation (Shouldn't be too confusing.. doesn't stray from current behaviour)
Added in canonical_uuid option to json_options. This was the only way I could think to not break current API globally but also allow for this option to exist.