Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pickling of matrix_gfpn_dense #23411

Closed
simon-king-jena opened this issue Jul 12, 2017 · 81 comments
Closed

Fix pickling of matrix_gfpn_dense #23411

simon-king-jena opened this issue Jul 12, 2017 · 81 comments

Comments

@simon-king-jena
Copy link
Member

There currently are two problems related with pickling of Matrix_gfpn_dense:

  • When a string is provided as argument to __init__, it is interpreted as the name of a file from which data are read by some MeatAxe command. If the file doesn't exist, it was intended to have an uninitialised matrix (to be properly initialised later). But currently it crashes, which needs to be fixed.
  • The number of bytes used holding the data of a matrix row is machine independent. However, the number of bytes actually used to store the row always is a multiple of sizeof(long), which is machine dependent (at least in theory). Pickling should of course only involve the machine independent memory chunks, but currently it involves the machine dependent chunks.

Component: packages: optional

Author: Simon King

Branch/Commit: ba72aba

Reviewer: Jeroen Demeyer

Issue created by migration from https://trac.sagemath.org/ticket/23411

@simon-king-jena
Copy link
Member Author

@simon-king-jena
Copy link
Member Author

Author: Simon King

@simon-king-jena
Copy link
Member Author

New commits:

4960a81Fix reading mtx-matrix from non-existent file
4e85fcaMake pickling of matrix_gfpn_dense machine independent

@simon-king-jena
Copy link
Member Author

Commit: 4e85fca

@simon-king-jena
Copy link
Member Author

comment:3

Sorry, one commit is missing

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 12, 2017

Changed commit from 4e85fca to f9d878a

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 12, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

f9d878aAdd test for reading from non-existing file; add cimports

@simon-king-jena
Copy link
Member Author

comment:5

Now it builds and tests are passing...

@jdemeyer
Copy link

comment:6
  1. As I mentioned on some other ticket, use bytes instead of str for pickling.

  2. If you claim that this makes pickling machine-independent, it would be good to actually test that. You can hardcode a pickle in a doctest (essentially, the output of dumps()) and then test that.

@simon-king-jena
Copy link
Member Author

comment:7

Replying to @jdemeyer:

  1. As I mentioned on some other ticket, use bytes instead of str for pickling.

You also said I should split my original patchbomb into smaller parts. That's why I wanted to leave the transition from str to bytes for a ticket dedicated to cleaning code.

  1. If you claim that this makes pickling machine-independent, it would be good to actually test that. You can hardcode a pickle in a doctest (essentially, the output of dumps()) and then test that.

OK. Just to be on the safe side: You do not say that I should imitate a pickle string that results from a machine where, say, sizeof(long)==3? Cause otherwise we would never see the deprecation warning.

@simon-king-jena
Copy link
Member Author

comment:8

Replying to @simon-king-jena:

You also said I should split my original patchbomb into smaller parts. That's why I wanted to leave the transition from str to bytes for a ticket dedicated to cleaning code.

PS: Adding such patch would also imply to modify the lines where I import the PyString... functions, which means that there will be further conflicts with #21437.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 13, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

e477948Test meataxe unpickling being machine independent

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 13, 2017

Changed commit from f9d878a to e477948

@simon-king-jena
Copy link
Member Author

comment:10

In the latest commit, I add a test that should demonstrate machine independence of pickling (provided that the patchbots provide a whole lot of different architectures, including little and big endian and different sizes of long). I also add a test that triggers the deprecation warning.

I am not really happy about doing part of the code cleanup here. Would you accept to do all cleanup on a separate ticket. If you do, then please consider this ticket as "needs review".

@simon-king-jena
Copy link
Member Author

comment:11

Setting it to "needs review" in order to allow the patchbots to test the ticket.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 13, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

01e2c0dMark new meataxe test 'optional'

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 13, 2017

Changed commit from e477948 to 01e2c0d

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 17, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

564104eDon't use six.string_types, but basestring
982755fWorkaround: MeatAxe's MatLoad returns NULL on empty filename without setting error

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 17, 2017

Changed commit from 3d94133 to 982755f

@jdemeyer
Copy link

Reviewer: Jeroen Demeyer

@jdemeyer
Copy link

comment:61

As expected, this "machine dependent" doctest may fail depending on the machine:

        sage: t = 'Uq\x82\x00\x00\x00\x00\x00\xa7\x8bh\x00\x00\x00\x00\x00'
        sage: len(t)
        16
        sage: N == mtx_unpickle(MS, 2, 5, t, True)           # optional: meataxe

@jdemeyer
Copy link

comment:62

On a 32-bit system, this gives:

sage -t --long --warn-long 87.3 src/sage/matrix/matrix_gfpn_dense.pyx
**********************************************************************
File "src/sage/matrix/matrix_gfpn_dense.pyx", line 1635, in sage.matrix.matrix_gfpn_dense.mtx_unpickle
Failed example:
    N == mtx_unpickle(MS, 2, 5, t, True)           # optional: meataxe
Exception raised:
    Traceback (most recent call last):
      File "/home/jdemeyer/sage-git/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 509, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/home/jdemeyer/sage-git/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 872, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.matrix.matrix_gfpn_dense.mtx_unpickle[18]>", line 1, in <module>
        N == mtx_unpickle(MS, Integer(2), Integer(5), t, True)           # optional: meataxe
      File "sage/matrix/matrix_gfpn_dense.pyx", line 1712, in sage.matrix.matrix_gfpn_dense.mtx_unpickle (/home/jdemeyer/sage-git/src/build/cythonized/sage/matrix/matrix_gfpn_dense.c:14936)
        raise ValueError(f"Expected a pickle with {FfCurrentRowSizeIo}*{nr} bytes per row, got {lenData} instead")
    ValueError: Expected a pickle with 3*2 bytes per row, got 16 instead
**********************************************************************

@simon-king-jena
Copy link
Member Author

comment:63

Replying to @jdemeyer:

On a 32-bit system, this gives:

sage -t --long --warn-long 87.3 src/sage/matrix/matrix_gfpn_dense.pyx
...
    ValueError: Expected a pickle with 3*2 bytes per row, got 16 instead
**********************************************************************

Thank you for testing on different architectures!

So, what shall we do with that test?

  • On the one hand, that test is about a pickling format that is known to be machine dependent and is therefore deprecated. That's an argument for removing it (which I would like to avoid).
  • On the other hand, I could try to do better and make unpickling work in that case as well.

I think I prefer the second way.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 25, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

78d7b21MeatAxe: Enable unpickling of deprecated pickle created with different machine architecture

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 25, 2017

Changed commit from 982755f to 78d7b21

@simon-king-jena
Copy link
Member Author

comment:65

Done! And I demonstrate that a deprecated pickle can be opened even if it was created on a silly machine with sizeof(long)==9.

@jdemeyer
Copy link

comment:66

Cool test! Now, we are still missing a doctest for the else case, when elif pickled_rowsize < FfCurrentRowSizeIo

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 26, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

ba72abaTest another code path in unpickling meataxe matrices

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 26, 2017

Changed commit from 78d7b21 to ba72aba

@simon-king-jena
Copy link
Member Author

comment:68

Done!

There is now one test where the number of bytes is not divisible by the number of rows and one test where the number is divisible but too small to contain a matrix of the given dimensions. Together with the existing tests, I think it covers all code paths.


New commits:

ba72abaTest another code path in unpickling meataxe matrices

@vbraun
Copy link
Member

vbraun commented Jul 29, 2017

Changed branch from u/SimonKing/fix_pickling_of_matrix_gfpn_dense to ba72aba

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants