-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce compression #11
Conversation
…change :serialize to :julia_native
Codecov Report
@@ Coverage Diff @@
## master #11 +/- ##
==========================================
- Coverage 100% 92.68% -7.32%
==========================================
Files 5 5
Lines 54 82 +28
==========================================
+ Hits 54 76 +22
- Misses 0 6 +6
Continue to review full report at Codecov.
|
Why was it renamed? It's not clear that |
I'd suggest |
The problem with So I figured since changing the file version this is a chance to corect it. |
Anyway, as of now this has all the stuff to work with Transcoding stream compressors. |
to :julia_serialize
Right, here are some stats,
|
src/serialization.jl
Outdated
jlso.objects[name] = take!(io) | ||
# need to close buffer so any compression can write end of body stuffs. | ||
close(compressing_buffer) | ||
jlso.objects[name] = buffer.data # can't use take! as stream is now closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems gross, there must be a better way to get the end stuff in without closing the stream
look at
https://github.com/bicycle1885/CodecZlib.jl/blob/a5ec201eba8447c5fd8c941d050ff5fa89ebd28f/src/compression.jl#L148
and
https://github.com/bicycle1885/CodecZlib.jl/blob/a5ec201eba8447c5fd8c941d050ff5fa89ebd28f/src/compression.jl#L148
Default is now |
src/serialization.jl
Outdated
#) | ||
bson = ( | ||
deserialize! = first ∘ values ∘ BSON.load, | ||
serialize! = (io, value) -> bson(io, Dict("object" => value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "object"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is an element of JLSOFile.objects
and we only put it into aDict
here because that is how the BSON API likes it.
This dict is never visible to the User, except when they access the BSON directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds fine, but what changed to make this only necessary now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made bson
and julia_serialize
work with the same interface
BSON used to do Dict(name => value)
which was redundant and the name is already stored as the key to the parent of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, does bson
only accept a Dict
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. There is probably some internal function we could call instead.
But it would also make it much harder to keep loading v1 JLSO files so...
Co-Authored-By: Eric Davies <iamed2@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes seem reasonable, but I do have a suggestion for code readability.
Looks like the tests are failing because of the sample legacy files being loaded. I'd recommend just hard coding the legacy metadata in the test julia code rather than saving files that'll be julia version and architecture dependent. |
I have tests that directly check the legacy metadata but that would not catch changest in how those are interpreted (e.g. changing Not testing that can still load files feels bad. |
You can still test all of those things without needing to save binary files in the git repo. Just manually serialize the old structure to an IOBuffer and try loading it. I feel like the only thing that the binary files might catch are changes to the backend serialization format (e.g., saved using an old version of the bson library). |
Generating the old structure would be super easy to screw up though. A end to end integration test gives me much more confidence that it is correct. |
Also I would really like to know which things when serialized on one platform can't be loaded on another. |
Ok, I am pretty sure this is actually a bug in the JLSO format. So for now I have allowed failures on x86. But in anycase this convinces me that it is completely worth having these kinds of tests. |
O n testing of BSON on data from 32bit and 64 bit systems it handles it fine. |
In that case, I'd recommend saving all the different permutations in a datadeps just for testing rather than storing binary files in the repo. |
I would agree, if there were a bit larger. But they are pretty small really. I am still chasing down what is breaking on 32bit. |
I'm inclined to do it out of principle :) It's easy enough for someone to accidentally slip in a large binary file (especially if we want to have automated benchmarks), so using datadeps would make it more explicit about what the best practice is regardless of file size. |
Ok, I am now satisfied that this is a weird BSON.jl error. |
BSON error now has a PR to fix it open. @rofinn how keen are you on having that test data in a DataDep? |
If you're willing to make an issue to add DataDeps then I think I'm fine to approve. We should probably host these test files in a public S3 bucket. |
Done #16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine to me for now
This PR will close #7
Right now all it does is rename
:serialize
to:julia_native
,and make sure we are all setup to handle version changes.
We should probably have a sample file added to the repo (it i can be very small)
to test we can load formats serialized by old versions.