Revisit the pickle jar procedure #10768

nthiery · 2011-02-10T17:43:21Z

The current pickle jar mechanism has some drawbacks:

We never add new pickles to the pickle jar
We don't know how old pickles in the pickle jar are
We may be testing an old pickle, but not a recent one
Updating specific pickles is a bit tedious

Here is a new proposal:

Pickles will no longer be stored in a .tar.bz2 file but simply as files within the directory extcode/pickle_jar/$VERSION. This will likely increase the on-disk space needed for a Sage install, but will not have a big influence on Sage distributions, since we have an extcode spkg anyway (which is tarred and compressed).
Pickles will be under git control (this will now become possible).
The $VERSION in the directory name refers to the Sage version used to create the pickle. Once a pickle has been made, it will remain in place in that directory, even in subsequent Sage versions (so sage-4.7.2 will contain pickle_jar/4.7, pickle_jar/4.7.1 and pickle_jar/4.7.2).
When making a new release, the release manager will unpickle all old pickles and repickle them with the new Sage version. Whenever a pickle has changed, the new (changed) pickle will be stored in pickle_jar/$NEWVERSION. The old pickle is kept where it was.
sage.structure.sage_object.unpickle_all will check all pickles (old and new).
If some day some pickle rots away and it is decided by consensus to not support unpickling it anymore, then the patch author would simply git remove the old pickle.
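The check over all version directories could look roughly like the sketch below. It is not the actual `sage.structure.sage_object.unpickle_all` implementation: it uses the standard library `pickle` module instead of Sage's loader, and the function name `check_all_pickles` and the `*.sobj` file pattern are assumptions made for illustration.

```python
import pickle
from pathlib import Path


def check_all_pickles(jar_root):
    """Try to load every pickle under jar_root/<version>/ and collect failures.

    Hypothetical helper: assumes one subdirectory per Sage version and
    plain (uncompressed) pickle files named *.sobj.
    """
    failures = []
    checked = 0
    version_dirs = sorted(p for p in Path(jar_root).iterdir() if p.is_dir())
    for version_dir in version_dirs:
        for pickle_file in sorted(version_dir.glob("*.sobj")):
            checked += 1
            try:
                with open(pickle_file, "rb") as f:
                    pickle.load(f)
            except Exception as exc:  # any failure counts as a broken pickle
                failures.append((version_dir.name, pickle_file.name, repr(exc)))
    return checked, failures
```

Each failure records the version directory it came from, so a report can say which Sage version produced the pickle that no longer loads.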

CC: @sagetrac-sage-combinat @ohanar

Component: pickling

Issue created by migration from https://trac.sagemath.org/ticket/10768

jdemeyer · 2011-03-29T07:26:33Z

comment:1

While we're at it, why does the pickle jar need to be a tar.bz2 file as opposed to just a directory in data/extcode/pickle_jar? When distributing the pickle jar, it is contained in the extcode spkg anyway, so I don't see the gain of having an additional layer of tarring.

jdemeyer · 2011-03-29T07:30:19Z

comment:2

One major advantage of not having the tar file would be that the pickle jar could be updated using standard hg commands. This would instantly solve 2 of the 3 complaints:

Using hg log, we would know exactly how old everything is
Updating specific pickles would become as easy as adding a patch to the Sage library.

jdemeyer · 2011-03-29T07:30:47Z

comment:3

Related ticket: #11069

jdemeyer · 2011-03-29T07:41:26Z

comment:4

Nicolas, just to make sure I understand you correctly, is your proposal the following:

Pickle jars are named after the Sage version (i.e. we would have a pickle_jar-4.6.2.tar.bz2 file or a pickle_jar-4.6.2 directory in my proposal).
We always keep the old versions unchanged (so sage-4.7 would still contain pickle_jar-4.6.2).
With every new Sage version, the release manager unpickles pickle_jar-$OLDVERSION, repickles them using the new Sage version and saves them as pickle_jar-$NEWVERSION.

I can see some merit to this proposal, however I would save only the pickles which actually changed. Otherwise you will end up with lots of copies of the same pickle.
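A minimal sketch of "save only the pickles which actually changed", assuming a directory-per-version layout and a separate directory of freshly generated pickles. The names `store_changed_pickles` and `file_digest` are hypothetical, and comparing raw bytes assumes that repickling an unchanged object produces identical bytes, which may not hold for every class.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def store_changed_pickles(fresh_dir, jar_root, new_version):
    """Copy into jar_root/new_version only the fresh pickles whose bytes
    differ from every stored copy with the same file name.

    Old pickles are never moved or deleted.
    """
    jar_root = Path(jar_root)
    known = {}  # file name -> set of digests already stored in some version
    if jar_root.is_dir():
        for version_dir in jar_root.iterdir():
            if version_dir.is_dir():
                for old in version_dir.iterdir():
                    if old.is_file():
                        known.setdefault(old.name, set()).add(file_digest(old))

    target = jar_root / new_version
    stored = []
    for fresh in sorted(Path(fresh_dir).iterdir()):
        if not fresh.is_file():
            continue
        if file_digest(fresh) in known.get(fresh.name, set()):
            continue  # an identical copy is already in the jar
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(fresh, target / fresh.name)
        stored.append(fresh.name)
    return stored
```

Comparing against every stored copy, rather than only the most recent one, is a design choice of this sketch; it avoids storing a duplicate when a pickle reverts to an earlier form.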

nthiery · 2011-03-29T07:57:58Z

comment:5

Replying to @jdemeyer:

One major advantage of not having the tar file would be that the pickle jar could be updated using standard hg commands. This would instantly solve 2 of the 3 complaints:

Using hg log, we would know exactly how old everything is

Updating specific pickles would become as easy as adding a patch to the Sage library.

+1, definitely! Actually I did not suggest it earlier because I was
worrying about the disk space usage, not for the Sage distribution but
for the Sage install. But if there is a consensus that this is well
used disk space, let's go for it.

I was also wondering whether this could possibly slow down
unpickle_all since this would require loading lots of little files
instead of slurping in one large archive. Any clue?

nthiery · 2011-03-29T08:21:42Z

comment:6

Hi Jeroen!

Replying to @jdemeyer:

Nicolas, just to make sure I understand you correctly, is your proposal the following:

I am going to use the occasion to amend the proposal a bit :-)

Pickle jars are named after the Sage version (i.e. we would have a pickle_jar-4.6.2.tar.bz2 file or a pickle_jar-4.6.2 directory in my proposal).

Yes.

We always keep the old versions unchanged (so sage-4.7 would still contain pickle_jar-4.6.2).

Yes. More precisely sage-4.7 would still contain the subset of the
pickles in pickle_jar-4.6.2 that:

still unpickle properly in sage-4.7
differ from the corresponding pickle in 4.7 (and any intermediate version)

With every new Sage version, the release manager unpickles pickle_jar-$OLDVERSION, repickles them using the new Sage version and saves them as pickle_jar-$NEWVERSION.

More precisely: the release manager recreates a fresh pickle jar by running all the sage tests with SAGE_PICKLE_JAR set (as described in unpickle_all). And then removes from pickle_jar-$OLDVERSION those that did not change. An easy thing to script.

I can see some merit to this proposal, however I would save only the pickles which actually changed. Otherwise you will end up with lots of copies of the same pickle.

+1; this is a good refinement of the last point in the ticket description. The comments above should take care of this.

Note that if the pickle_jar for 3.1 and 4.6.2 contain the same pickle X (version numbers just for the example), then I prefer to delete that of 3.1 and keep that of 4.6.2. Indeed, if X does not unpickle anymore with 4.7, then the relevant question is: "is it acceptable to not unpickle in 4.7 a pickle generated by 4.6.2?".

Do you mind rephrasing the ticket description accordingly, and then make a quick call for comments on sage-devel?

Thanks!

Cheers,
Nicolas

jdemeyer · 2011-03-29T08:32:18Z

comment:7

Replying to @nthiery:

Note that if the pickle_jar for 3.1 and 4.6.2 contain the same pickle X (version numbers just for the example), then I prefer to delete that of 3.1 and keep that of 4.6.2.

If we use hg to track the pickles, I actually think it is better not to constantly move pickles from one version to another. So while I understand your point, from a practical point of view, I prefer to keep the pickle in the old directory of the old version.

jdemeyer · 2011-03-29T08:40:18Z

comment:8

Replying to @nthiery:

+1, definitely! Actually I did not suggest it earlier because I was
worrying about the disk space usage, not for the Sage distribution but
for the Sage install.

Currently, the pickle jar contains 1174 files. Assuming each file takes 4kB of actual disk space, this would use a few megabytes. I don't think this is an issue.
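The estimate can be checked with a few lines of arithmetic. The 4 kB per file is the assumption stated above (roughly one file system block per small file); the function name `estimate_jar_size` is made up for this sketch.

```python
def estimate_jar_size(num_files, bytes_per_file=4 * 1024):
    """Estimate on-disk size in bytes, assuming each small pickle
    occupies one block of bytes_per_file bytes."""
    return num_files * bytes_per_file


total_bytes = estimate_jar_size(1174)
total_mib = total_bytes / (1024 * 1024)
print(f"{total_bytes} bytes, about {total_mib:.1f} MiB")
```

With 1174 files this gives 4808704 bytes, about 4.6 MiB, which is consistent with "a few megabytes".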

I was also wondering whether this could possibly slow down
unpickle_all since this would require loading lots of little files
instead of slurping in one large archive. Any clue?

This would depend very much on the operating system and file system...
But yes, on some systems this will be slower. On the other hand, it could even speed up things by not having to decompress and untar.

AndrewMathas · 2012-10-17T22:32:00Z

comment:10

Hi Nicolas,

I want to add to your proposal that the pickle_jar be properly documented. As far as I am aware, there is currently no documentation on what the pickle jar is for, how it should be used, and what to do when a pickle breaks with

sage -t  devel/sage-sf/sage/structure/sage_object.pyx

for example. A non-trivial example for using register_unpickle_override should also be added.

Secondly, I think that the procedure for adding new pickles to the jar needs to be streamlined. Again, I don't believe that it is described anywhere when or how this happens, but I do know that there are many "new" classes which are not represented in the pickle_jar, with the consequence that the pickle_jar is unable to check backward compatibility for these classes.

Andrew

vbraun · 2014-01-17T03:10:15Z

comment:11

Do we really put all that into the git repo? The current (incredibly old) pickle jar is about 2MB uncompressed. A new one is likely considerably larger. There are of the order of 10 minor Sage releases every year. I don't know how often the pickles change, but it seems likely that this'll generate on the order of 10MB/year that will be with us forever. The whole git repo is currently <100MB.

nthiery · 2014-01-24T09:49:57Z

comment:12

Hi Volker!

I don't have a good view on the orders of magnitude. Yet, with the proposed protocol, pickles that don't change don't get duplicated between versions, and I'd expect that only a few pickles get changed from one version to the other (especially if we emphasize pickling by construction rather than by internal data structure). A good experiment would be to regenerate a new pickle jar, and see how much we have added to it since last time!

I don't have a strong opinion about whether the pickle jar should be maintained under git or not. If we can afford it, that makes things easier, as changes to the pickle jar can be done within the usual workflow. But if it's too big, it's too big.

Cheers,
Nicolas

nthiery added c: pickling labels Feb 10, 2011

nthiery assigned williamstein Feb 10, 2011

vbraun added the s: needs info label Jan 17, 2014

seblabbe mentioned this issue Mar 28, 2011

Remove deprecated word objects from the pickle jar #10354

Closed

jdemeyer mentioned this issue Jun 7, 2011

Don't use version of Sage in default pickle directory #11069

Closed

kwankyu mentioned this issue May 20, 2011

Weighted degree term orders added #11316

Closed
