fail to save big gs #32
Comments
This buffer size limitation was introduced by switching to
I see. Is it worth switching back then? I kept pretty detailed notes on minimization of the
Yeah, switching back to the full version of protobuf would be one quick solution. There are two other alternatives, both of which require changing the existing message format.
The second approach would potentially be good for concurrent loading as well as efficient sub-loading through
Either of the two could still fail in theory if a single sample reaches the same buffer limit (when the total number of gates is huge and the number of events is large enough). This probably would not happen in practice. (Or I could be wrong on this, given the nature of
Anyway, in the short run, I will do the switching. The discussion above is for future reference.
I ended up implementing the second, distributed approach, i.e. one pb file per sample. Now it saves OK for a big gs:
> save_gs(gs_big, tmp)
Done
To reload it, use 'load_gs' function
> list.files(tmp)
[1] "90b6757a-26ab-4158-bfd2-fb4272fd1054.pb" "s1.h5"
[3] "s1.pb" "s10.h5"
[5] "s10.pb" "s100.h5"
[7] "s100.pb" "s11.h5"
...
[195] "s96.pb" "s97.h5"
[197] "s97.pb" "s98.h5"
[199] "s98.pb" "s99.h5"
[201] "s99.pb"
And sub-loading is more efficient than before:
> system.time(gs1 <- load_gs(tmp, select = c("s1", "s100")))
user system elapsed
2.290 0.068 2.382
> sampleNames(gs1)
[1] "s1" "s100"
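For the record, the general idea behind one file per sample plus selective loading can be sketched in base R. This is only an illustration under stated assumptions: `save_per_sample`, `load_per_sample`, the `index.rds` file and the `.rds` format are hypothetical stand-ins, not the actual `save_gs`/`load_gs` implementation or its protobuf layout.

```r
# Hypothetical helpers illustrating a one-file-per-sample layout in base R.
# These are NOT flowWorkspace functions; names and file formats are assumptions.

save_per_sample <- function(samples, dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  # Small index file that lists the sample names
  saveRDS(names(samples), file.path(dir, "index.rds"))
  # One file per sample, so no single file has to hold the whole collection
  for (nm in names(samples)) {
    saveRDS(samples[[nm]], file.path(dir, paste0(nm, ".rds")))
  }
  invisible(dir)
}

load_per_sample <- function(dir, select = NULL) {
  available <- readRDS(file.path(dir, "index.rds"))
  if (is.null(select)) select <- available
  unknown <- setdiff(select, available)
  if (length(unknown) > 0) {
    stop("unknown sample(s): ", paste(unknown, collapse = ", "))
  }
  # Only the selected per-sample files are read from disk
  out <- lapply(select, function(nm) readRDS(file.path(dir, paste0(nm, ".rds"))))
  names(out) <- select
  out
}

# Toy data standing in for real samples
samples <- list(s1 = 1:3, s2 = 4:6, s100 = 7:9)
tmp <- tempfile()
save_per_sample(samples, tmp)
sub <- load_per_sample(tmp, select = c("s1", "s100"))
names(sub)
```

Because each sample lives in its own file, loading a subset touches only the files that were asked for, and the size of any single file is bounded by the largest sample rather than by the whole collection.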
This is great @mikejiang!
Here is the reproducible example