Add automatic giant array chunking in msgpack checkpoints. by levskaya · Pull Request #947 · google/flax

levskaya · 2021-01-27T09:38:40Z

msgpack only supports encoded leaf buffers of at most 2^32-1 bytes. Some giant embedding arrays exceed this limit, so this PR adds an automatic, reversible array-chunking pass to msgpack serialization.

This PR should -not- break compatibility with existing checkpoints.
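
To make the size limit concrete, here is a back-of-the-envelope sketch in Python. The embedding shape and the per-chunk budget are illustrative assumptions, not values taken from this PR, and the helper names are hypothetical.

```python
import math

# msgpack's largest binary format stores the payload length in 32 bits.
MSGPACK_MAX_BYTES = 2**32 - 1

def encoded_nbytes(shape, itemsize):
    """Raw buffer size in bytes for a dense array of the given shape."""
    return math.prod(shape) * itemsize

def chunks_needed(nbytes, max_chunk_bytes):
    """Number of pieces required so that no piece exceeds max_chunk_bytes."""
    return -(-nbytes // max_chunk_bytes)  # integer ceiling division

# Hypothetical embedding table: 2M rows x 1024 columns of float32.
embedding_bytes = encoded_nbytes((2_000_000, 1024), itemsize=4)
# Hypothetical per-chunk budget of 2**30 bytes (1 GiB).
num_chunks = chunks_needed(embedding_bytes, 2**30)

print(embedding_bytes > MSGPACK_MAX_BYTES)  # the single buffer is too large
print(num_chunks)                           # pieces needed under the budget
```

An array like this cannot be written as a single msgpack leaf, which is the case the chunking pass is meant to handle.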

codecov-io · 2021-01-27T09:49:18Z

Codecov Report

Merging #947 (4ecb68a) into master (836946a) will increase coverage by 0.08%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master     #947      +/-   ##
==========================================
+ Coverage   80.93%   81.01%   +0.08%     
==========================================
  Files          55       55              
  Lines        4421     4473      +52     
==========================================
+ Hits         3578     3624      +46     
- Misses        843      849       +6

Impacted Files	Coverage Δ
flax/serialization.py	`86.33% <88.88%> (+1.01%)`	⬆️
flax/linen/module.py	`95.01% <0.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jheek

Looks good, just some minor comments.

jheek · 2021-01-27T10:16:51Z

flax/serialization.py

I'm always a bit worried about the serialisation logic operating on jax arrays. Should we just move things to numpy at the very start of serialisation?

levskaya · Jan 28, 2021

Sure, added a pass to do this first.
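
One way such a conversion pass could look is sketched below. It is not the code from this PR: `to_numpy_leaves` and `FakeDeviceArray` are hypothetical names, and the stand-in class only mimics a device array by implementing NumPy's `__array__` protocol so the example runs without JAX.

```python
import numpy as np

class FakeDeviceArray:
    """Hypothetical stand-in for a JAX array; exposes the __array__ protocol."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._values, dtype=dtype)

def to_numpy_leaves(tree):
    """Return a copy of a nested dict with array-like leaves as np.ndarray."""
    if isinstance(tree, dict):
        return {key: to_numpy_leaves(value) for key, value in tree.items()}
    # Convert only non-NumPy objects that advertise the array protocol.
    if hasattr(tree, '__array__') and not isinstance(tree, np.ndarray):
        return np.asarray(tree)
    return tree  # plain Python leaves (ints, strings, ...) pass through

state = {'params': {'w': FakeDeviceArray([1.0, 2.0])}, 'step': 3}
host_state = to_numpy_leaves(state)
print(type(host_state['params']['w']).__name__)
```

Running the conversion once, before any other serialization logic, means the later passes only ever see NumPy arrays.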

jheek · 2021-01-27T10:24:25Z

flax/serialization.py

nit: this loop duplicates the tuple_to_dict and range logic. I think it's nicer to have:

    chunks = [flatarr[i: i + chunksize] for i in range(0, flatarr.size, chunksize)]
    data['chunks'] = _tuple_to_dict(chunks)

levskaya · Jan 28, 2021

Yeah, agreed, replaced it.
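
The suggested chunking can be sketched as a reversible round trip. This is a minimal illustration, not the merged implementation: `_tuple_to_dict` is assumed to produce an index-keyed dict, the `'shape'` key and the `chunk_array` / `unchunk_array` names are hypothetical, and the tiny `chunksize` is chosen only to keep the demo readable.

```python
import numpy as np

def _tuple_to_dict(items):
    """Index-keyed dict; assumed behaviour of the helper named in the review."""
    return {str(i): item for i, item in enumerate(items)}

def chunk_array(arr, chunksize):
    """Split arr into flat pieces of at most `chunksize` elements."""
    flatarr = arr.reshape(-1)
    chunks = [flatarr[i: i + chunksize] for i in range(0, flatarr.size, chunksize)]
    return {
        'shape': arr.shape,  # hypothetical metadata key, needed to reassemble
        'chunks': _tuple_to_dict(chunks),
    }

def unchunk_array(data):
    """Invert chunk_array: join the pieces in index order, then reshape."""
    num_chunks = len(data['chunks'])
    flat = np.concatenate([data['chunks'][str(i)] for i in range(num_chunks)])
    return flat.reshape(data['shape'])

arr = np.arange(10, dtype=np.float32).reshape(2, 5)
data = chunk_array(arr, chunksize=4)
restored = unchunk_array(data)
print(np.array_equal(restored, arr))
```

Storing the original shape alongside the chunks is what makes the pass reversible: the pieces are flat, so the reader needs the shape to rebuild the array.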

google-cla bot added the cla: yes label Jan 27, 2021

levskaya added the pull ready label Jan 27, 2021

levskaya requested review from avital and jheek January 27, 2021 09:40

jheek reviewed Jan 27, 2021

levskaya force-pushed the msgpackfix branch from 91f2552 to 4ecb68a Compare January 28, 2021 03:57

copybara-service bot merged commit 09a7c91 into google:master Jan 28, 2021
