Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jsonpickle 2.0.0 fails to decode pickled files created by jsonpickle 1.4.1 #364

Closed
Verequus opened this issue Oct 13, 2021 · 15 comments · Fixed by #372
Closed

jsonpickle 2.0.0 fails to decode pickled files created by jsonpickle 1.4.1 #364

Verequus opened this issue Oct 13, 2021 · 15 comments · Fixed by #372

Comments

@Verequus
Copy link

Verequus commented Oct 13, 2021

I've been attempting to upgrade the jsonpickle package in our project, where we store a number of files in pickled format. It happens that in some UTs for that area their evaluation failed, when I only change the jsonpickle package to 2.0.0. For 1.5.2 it still works. I also locally tested with the real data, in case it would be case of overly specific UTs. But those ended up with corrupted JSON data. As one can see in the screenshotRampdown-pickle-issue, instead of

"previousFeatureTeam": {
        "py/id": 1
      },

on the left side created with 1.5.2, the version created by jsonpickle suddenly creates a list and effectively doubles the output by reusing existing data (but even that seems corrupted).

I suspect that the format changed or that maybe 2.0.0 introduced a regression, but I couldn't find any information on what actually changed for 2.0.0. Do I need to do something special for conversions from older data?

@Theelx
Copy link
Contributor

Theelx commented Dec 17, 2021

Sorry for the late response, I forgot that I didn't have this repo watched, so I didn't get notifications:
#351 was where we noticed a backwards-compatibility issue with 1.5.1, so we released 1.5.2 to try and fix that. 2.0.0 is the release where we went ahead with the changes, because they fixed a bug (I don't remember the specific issue with the bug, but it'd be in CHANGES.rst).

Essentially, according to SemVer 2.0, which this project follows, incrementing the major version number (in this case from 1 to 2) signals a backwards-incompatible change. So yeah, 2.0.0's format changes a bit. If you still need help migrating (again, I'm very sorry for the late response), I can see if I can write a small migration script.

@Verequus
Copy link
Author

Yes, having such script would be very helpful. Not just for my project, but for anyone who would want to keep the old data available. In particular, since it is not possible to have both jsonpickle modules loaded at once (which I considered as possible migration path).

@Theelx
Copy link
Contributor

Theelx commented Dec 19, 2021

Small update: I believe it's possible to fix this by adding a compatibility mode to the Unpickler.restore_dict method that controls whether a function is called or not. Specifically, this line:

self._mkref(data)

Adding an option to toggle it on or off would allow the unpickler to switch between v2 behavior or v1 behavior for unpickling, and then the data could be re-encoded in the v2 format.

However, I want to preemptively state that I don't think it's a good idea to modify the pickler to allow encoding in the v1 format because enabling this current behavior fixed #255. Thoughts @davvid?

@Verequus
Copy link
Author

My 2 cents, I agree with not allowing v1 encoding. Being able to decode v1 data like that sounds like a great option though.

@davvid
Copy link
Member

davvid commented Dec 19, 2021

This sounds like it could be a useful option. We don't currently provide any tools, but a python -m jsonpickle.tool.convert_to_v2 <filename> could be a convenient command-line interface to add as well, but that'd be a cherry on top for sure.

@Theelx
Copy link
Contributor

Theelx commented Dec 20, 2021

This sounds like it could be a useful option. We don't currently provide any tools, but a python -m jsonpickle.tool.convert_to_v2 <filename> could be a convenient command-line interface to add as well, but that'd be a cherry on top for sure.

Regarding a CLI tool, I think that that's possible, but it'd be a bit clumsy when the user could just call an API function like py jsonpickle.decode_from_v1(pickled_string, *flags), use it, and re-encode it in the default v2 format like normal when they're done using it. Plus, a CLI tool would make it hard to pass classes and context arguments when decoding (since the command line doesn't have access to arbitrary Python objects), and other arguments might be affected too. I'd be interested to here a out-in-the-wild user's opinion though, what do you think @Verequus ?

Sorry if this turns out with poor formatting, I'm on my phone.

@Verequus
Copy link
Author

Not that familiar with the internals, but if access to Python classes is needed, could some path given to import everything?

@davvid
Copy link
Member

davvid commented Dec 20, 2021

Super valid points @Theelx a CLI tool is not a good idea in hindsight. We definitely don't need a CLI interface, and those points highlight that adding one incurs an entire category of maintenance burden and issues that are best avoided altogether.

decode_from_v1 sounds like a sensible approach to me since it should then be straightforward for users to write their own site-specific tool for doing conversions if they need it.

For simple cases this conversion tool might look like jsonpickle.encode(jsonpickle.decode_from_v1(json_string)) but that doesn't handle the general case. The general case is that the encode(...) arguments must match what the project is doing internally when they call encode(...), which we can't know in advance, so the CLI idea is a non-starter. Thanks for the reality check, and sorry for the distraction from the main topic.

@Theelx
Copy link
Contributor

Theelx commented Dec 21, 2021

Thanks for the reality check, and sorry for the distraction from the main topic.

Ah your CLI idea was absolutely on topic, since we want to make it as easy as possible for users to migrate, and a CLI could work quite well for some users if we didn't have so many options to deal with!

Anyway, I'm not sure whether the solution will be to make a jsonpickle.decode_from_v1(string_to_decode, *decoding_flags) function or to add a from_v1=boolean argument to jsonpickle.decode. I'm leaning towards the former because it'll be more explicit about what the decoder is doing, easier to find+replace, and also not clutter the decode function with more arguments, but the latter will reduce code duplication, and also kinda fits with what seems like the intended ethos of jsonpickle: add a flag for everything.

Edit: Both approaches would end up setting a class variable named something like v1_decoding on Unpickler, which the _restore_dict function would check when deciding whether to call _mkref. What I'm wondering is whether to present this option to the user as its own function or as an argument to decode.

@Theelx
Copy link
Contributor

Theelx commented Dec 21, 2021

@Verequus Can you send the specific data + code that produced that errant stream (or a minimal reproducible example)? I've been trying to reproduce the issue by comparing both versions so that I can have data to test the fix on, but I can't seem to use any combination of dictionaries and class objects to achieve that.

It seems to need "py/id" to show up as a tag, and that only happens when there's a cyclical reference (detected by _mkref returning False when encoding), but I have never created a cyclical reference in the context of jsonpickle before, so I'm actually stumped as to how to do that.

@Theelx
Copy link
Contributor

Theelx commented Dec 21, 2021

Based on the blacking out of data in the screenshot you sent, I assume that you have some private data that this is running on. In that case, can you git clone jsonpickle, apply this diff on jsonpickler/unpickler.py, pip install your local version (pip install . in the cloned directory), run it on the data, and let us know if the issue is fixed?
The line numbers may be off on this diff, I can't remember if I have any uncommitted changes in this directory from a long time ago.

diff --git a/jsonpickle/unpickler.py b/jsonpickle/unpickler.py
index 45eaed7..7c08be3 100644
--- a/jsonpickle/unpickler.py
+++ b/jsonpickle/unpickler.py
@@ -17,7 +17,8 @@ from .backend import json
 
 
 def decode(
-    string, backend=None, context=None, keys=False, reset=True, safe=False, classes=None
+    string, backend=None, context=None, keys=False, reset=True, safe=False,
+    classes=None, v1_decode=False
 ):
     """Convert a JSON string into a Python object.
 
@@ -45,7 +46,7 @@ def decode(
     36
     """
     backend = backend or json
-    context = context or Unpickler(keys=keys, backend=backend, safe=safe)
+    context = context or Unpickler(keys=keys, backend=backend, safe=safe, v1_decode=v1_decode)
     data = backend.decode(string)
     return context.restore(data, reset=reset, classes=classes)
 
@@ -121,10 +122,11 @@ def _obj_setvalue(obj, idx, proxy):
 
 
 class Unpickler(object):
-    def __init__(self, backend=None, keys=False, safe=False):
+    def __init__(self, backend=None, keys=False, safe=False, v1_decode=False):
         self.backend = backend or json
         self.keys = keys
         self.safe = safe
+        self.v1_decode = v1_decode
 
         self.reset()
 
@@ -540,9 +542,11 @@ class Unpickler(object):
     def _restore_set(self, obj):
         return {self._restore(v) for v in obj[tags.SET]}
 
-    def _restore_dict(self, obj):
+    def _restore_dict(self, obj, v1_decode=False):
         data = {}
-        self._mkref(data)
+        # We can remove v1_decode when it's time for jsonpickle 3
+        if not v1_decode:
+            self._mkref(data)

@Verequus
Copy link
Author

Since I'm on vacation, I can't do it myself until middle January. But I'll tell my colleagues how to check this and hopefully we get faster feedback.

@Theelx
Copy link
Contributor

Theelx commented Dec 22, 2021

Ok, thanks! I should add for completeness' sake, that applying that diff and pip installing alone won't be enough, your colleagues will also need to pass v1_decode=True to the decode function.

@Verequus
Copy link
Author

I've got feedback. It seems that there is a small bug with your patch:

  •    if not v1_decode:
    

should be

  •    if not self.v1_decode:
    

Otherwise, the failing UT works again as expected. If you could publish a 2.0.1 with that fix, I'd appreciate it.

@Theelx
Copy link
Contributor

Theelx commented Dec 23, 2021

Ohhh, yes thanks! I don't have permissions to publish to PyPi, but I'll commit the fix and hopefully @davvid will have time to publish it.

Edit: I'm going to fix it with a new flag to decode, to minimize code duplication.

Edit 2: I'm also not including the unneeded v1_decode parameter for _restore_dict that I included in the diff, in the final PR, as it's unused. I'm also going to make a pull request instead of committing directly so that tests can run.

@Theelx Theelx closed this as completed Dec 23, 2021
bors bot added a commit to meilisearch/meilisearch-digitalocean that referenced this issue Jan 24, 2022
247: Bump jsonpickle from 2.0.0 to 2.1.0 r=alallema a=dependabot[bot]

Bumps [jsonpickle](https://github.com/jsonpickle/jsonpickle) from 2.0.0 to 2.1.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/jsonpickle/jsonpickle/blob/main/CHANGES.rst">jsonpickle's changelog</a>.</em></p>
<blockquote>
<h1>v2.1.0</h1>
<pre><code>* Python 3.10 is now officially supported. (+376)
* Benchmarks were added to aid in optimization efforts.  ([#350](jsonpickle/jsonpickle#350)) (+352)
* ``is_reducible()`` was sped up by ~80%.  (+353) (+354)
* ``_restore_tags()`` was sped up by ~100%. Unpickling items
  with a lot of tuples and sets will benefit most. Python 2 users
  and users deserializing pickles from jsonpickle &lt;= 0.9.6 may see
  a slight performance decrease if using a lot of bytes, ref,
  and/or repr objects. (+354)
* ``is_iterator()`` was sped up by ~20% by removing an unnecessary
  variable assignment. (+354)
* ``jsonpickle.decode`` has a new option, ``v1_decode`` to assist in
  decoding objects created in jsonpickle version 1. ([#364](jsonpickle/jsonpickle#364))
* The ``encode()`` documentation has been updated to help sklearn users.
* ``demjson`` has been removed from the test suite. (+374)
* ``SQLALchemy&lt;1.2`` is no longer being tested by jsonpickle.
  Users of sqlalchemy + jsonpickle can always use 1.2 or 1.3.
  When jsonpickle v3 is released we will add SQLAlchemy 1.4 to
  the test suite alongside removal of support for Python 3.5 and earlier.
</code></pre>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/1594d3fdb6717cb06d05c79e46f93330cc38eb36"><code>1594d3f</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/jsonpickle/jsonpickle/issues/378">#378</a> from jsonpickle/fix-test-warnings</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/1591055228f2d477599cbf8ed148dc842edf7094"><code>1591055</code></a> Add missing dtypes</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/66649f09e039f014cf1ce23d4d1d57d8d4018ddd"><code>66649f0</code></a> Fix test warnings</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/2902ddb136cda0f09058cb5c95f8f88ae1563b7f"><code>2902ddb</code></a> Finish removing text</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/ac7da53048b628526ca06d92a10dc4328e242666"><code>ac7da53</code></a> Remove text</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/c3b9ab011c97244a7c8ddf00ae69fcb67200071a"><code>c3b9ab0</code></a> Update index.rst</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/22354e9cbace297b41b1931fabac5b3a9a9caf16"><code>22354e9</code></a> CHANGES: update release notes for jsonpickle v2.1.0</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/a83d3ae728c3133d00f3f6d5dfc1ccb7eda022d6"><code>a83d3ae</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/jsonpickle/jsonpickle/issues/376">#376</a> from jsonpickle/add-3.10-support</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/88f4610b5537a657c039f7f6edf214bbc019073b"><code>88f4610</code></a> Adjust SQLAlchemy versions tested</li>
<li><a href="https://github.com/jsonpickle/jsonpickle/commit/14dba931f44cd32c38dc26a516c63e7e847bcc08"><code>14dba93</code></a> Adjust SQLAlchemy versions tested</li>
<li>Additional commits viewable in <a href="https://github.com/jsonpickle/jsonpickle/compare/v2.0.0...v2.1.0">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=jsonpickle&package-manager=pip&previous-version=2.0.0&new-version=2.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting ``@dependabot` rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- ``@dependabot` rebase` will rebase this PR
- ``@dependabot` recreate` will recreate this PR, overwriting any edits that have been made to it
- ``@dependabot` merge` will merge this PR after your CI passes on it
- ``@dependabot` squash and merge` will squash and merge this PR after your CI passes on it
- ``@dependabot` cancel merge` will cancel a previously requested merge and block automerging
- ``@dependabot` reopen` will reopen this PR if it is closed
- ``@dependabot` close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- ``@dependabot` ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- ``@dependabot` ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants