
[2.x.x] simplify txhashset zip creation and extraction #2908

Merged
merged 5 commits into from Jul 12, 2019


@antiochp
Member

commented Jun 21, 2019

This PR aims to simplify and improve a couple of implementation details around our txhashset zip handling.

The primary motivation here was to introduce some flexibility in the set of acceptable/expected files in the txhashset.zip archive.

We don't need the kernel hash file (we can rebuild it from the kernel data file), and we can save approx 45MB by excluding it.

Currently it is hard to exclude it when building the zip file without introducing a lot of code.

This PR makes the list of files explicit, so we could, for example, modify this list for protocol version 2.


We currently do the following -

  • Wrap the decompress logic in panic::catch_unwind and register a handler via panic::set_hook to handle unexpected panic scenarios.
  • Walk the src dir and zip everything up, being very permissive about what we allow in the zip; this is recursive and handles arbitrary file paths.
  • Prior to this, clean up unwanted files via check_and_remove_files using a regex pattern.
  • When decompressing, filter files again via check_and_remove_files.
  • Attempt to handle both / and \\ path separators.

Proposed Approach

  1. We know exactly which files to include in the zip file when creating txhashset.zip. The only variable part is the <hash> suffix on the "rewound" leaf set files for the output and rangeproof MMRs.
kernel/pmmr_data.bin
kernel/pmmr_hash.bin
output/pmmr_data.bin
output/pmmr_hash.bin
output/pmmr_leaf.bin.<hash>
output/pmmr_prun.bin
rangeproof/pmmr_data.bin
rangeproof/pmmr_hash.bin
rangeproof/pmmr_leaf.bin.<hash>
rangeproof/pmmr_prun.bin

We do not need to craft a regex to support these. We can simply define a list of file paths. These are the only files that will be included in the zip when creating it. These are the only files that will be extracted from the zip file when receiving it.

  2. Handle a potential panic when extracting files from the zip by running the extraction logic in a separate thread and checking the result of join() on its JoinHandle. https://doc.rust-lang.org/std/thread/

Fatal logic errors in Rust cause thread panic, during which a thread will unwind the stack, running destructors and freeing owned resources. While not meant as a 'try/catch' mechanism, panics in Rust can nonetheless be caught (unless compiling with panic=abort) with catch_unwind and recovered from, or alternatively be resumed with resume_unwind. If the panic is not caught the thread will exit, but the panic may optionally be detected from a different thread with join. If the main thread panics without the panic being caught, the application will exit with a non-zero exit code.
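The thread-and-join approach described above needs nothing beyond the standard library. This is a minimal sketch, not the PR's code: extract and extract_isolated are hypothetical names, and extract simply panics on a backslash to simulate an unexpected panic inside the zip library. The point it illustrates is that JoinHandle::join returns Err when the spawned thread panicked, so the caller sees an ordinary error without catch_unwind or a global panic hook.

```rust
use std::thread;

// Hypothetical stand-in for the real extraction logic. It panics on a
// backslash to simulate an unexpected panic deep inside a zip library.
fn extract(entries: Vec<String>) -> usize {
    for entry in &entries {
        if entry.contains('\\') {
            panic!("unexpected path separator in {}", entry);
        }
    }
    entries.len()
}

// Run extraction on a separate thread. JoinHandle::join returns Err with
// the panic payload if that thread panicked, so no global hook is needed.
fn extract_isolated(entries: Vec<String>) -> Result<usize, String> {
    let handle = thread::spawn(move || extract(entries));
    handle
        .join()
        .map_err(|_| "extraction thread panicked".to_string())
}
```

With the default panic hook, the spawned thread still prints its panic message to stderr, but the calling thread carries on and receives Err("extraction thread panicked"). This only works when building with the default panic=unwind; under panic=abort the whole process exits.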

Additional improvements -

  • Bump zip-rs to latest 0.5.2
  • The zip file "spec" permits the use of either / or \\ as the path separator in the names of files included in the zip file. For portability, we can restrict this and use only '/' on both Windows and Unix. We do not need to be permissive about handling a variety of path separators; we simply assume '/' and fail if the paths in the zip file do not meet this assumption.
  • Use start_file_from_path when creating the zip to ensure paths are handled safely.
  • Use BufReader and BufWriter for IO operations involving reading/writing zip files.
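The "only '/' and fail otherwise" rule and the buffered IO from the list above can be sketched with std alone. Assumptions: validate_entry_name, dest_path and copy_buffered are hypothetical helpers, not zip-rs functions; in the real code the reader would be a zip-rs entry and the writer a File. The rejection of empty, ".", ".." and ':' components goes slightly beyond what the PR states and is included as a defensive assumption.

```rust
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::{Path, PathBuf};

// Hypothetical helper: accept only relative, '/'-separated entry names.
// Backslashes fail outright rather than being reinterpreted as separators.
fn validate_entry_name(name: &str) -> Result<(), String> {
    if name.contains('\\') {
        return Err(format!("backslash in entry name: {}", name));
    }
    // Defensive assumption: reject empty components (this also catches a
    // leading '/'), "." and "..", and ':' (a drive prefix on Windows).
    for part in name.split('/') {
        if part.is_empty() || part == "." || part == ".." || part.contains(':') {
            return Err(format!("invalid component in entry name: {}", name));
        }
    }
    Ok(())
}

// Map a validated entry name onto the destination directory, pushing one
// component at a time so the platform's own separator is used on disk.
fn dest_path(dest_dir: &Path, name: &str) -> Result<PathBuf, String> {
    validate_entry_name(name)?;
    let mut path = dest_dir.to_path_buf();
    for part in name.split('/') {
        path.push(part);
    }
    Ok(path)
}

// Copy an entry's bytes through buffered reader/writer wrappers.
fn copy_buffered<R: Read, W: Write>(src: R, dst: W) -> io::Result<u64> {
    let mut reader = BufReader::new(src);
    let mut writer = BufWriter::new(dst);
    let copied = io::copy(&mut reader, &mut writer)?;
    // Flush explicitly so write errors surface here instead of being
    // silently dropped when the BufWriter goes out of scope.
    writer.flush()?;
    Ok(copied)
}
```

Building the on-disk path component by component keeps the zip entry names uniformly '/'-separated while still producing native paths on Windows.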

TODO -

  • Verify this works on Windows (both reading and writing the zip file)

@antiochp antiochp added this to the 2.x.x milestone Jun 21, 2019

@antiochp antiochp self-assigned this Jun 21, 2019

@antiochp

Member Author

commented Jun 21, 2019

Sample output when receiving a zip. We only look for these exact files in the zip, and we expect the paths in the zip to match them exactly. No attempt will be made to extract anything that does not match one of these exact paths.

20190621 13:28:39.278 DEBUG grin_chain::txhashset::txhashset - zip_write on path: "/antiochp/grin/node_mainnet/tmp"
20190621 13:28:39.307 INFO grin_util::zip - extract_files: "kernel/pmmr_data.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/kernel/pmmr_data.bin"
20190621 13:28:39.497 INFO grin_util::zip - extract_files: "kernel/pmmr_hash.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/kernel/pmmr_hash.bin"
20190621 13:28:39.516 INFO grin_util::zip - extract_files: "output/pmmr_data.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/output/pmmr_data.bin"
20190621 13:28:39.607 INFO grin_util::zip - extract_files: "output/pmmr_hash.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/output/pmmr_hash.bin"
20190621 13:28:39.608 INFO grin_util::zip - extract_files: "output/pmmr_prun.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/output/pmmr_prun.bin"
20190621 13:28:40.074 INFO grin_util::zip - extract_files: "rangeproof/pmmr_data.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/rangeproof/pmmr_data.bin"
20190621 13:28:40.144 INFO grin_util::zip - extract_files: "rangeproof/pmmr_hash.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/rangeproof/pmmr_hash.bin"
20190621 13:28:40.145 INFO grin_util::zip - extract_files: "rangeproof/pmmr_prun.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/rangeproof/pmmr_prun.bin"
20190621 13:28:40.145 INFO grin_util::zip - extract_files: "output/pmmr_leaf.bin.000003b18778" -> "/antiochp/grin/node_mainnet/tmp/txhashset/output/pmmr_leaf.bin.000003b18778"
20190621 13:28:40.147 INFO grin_util::zip - extract_files: "rangeproof/pmmr_leaf.bin.000003b18778" -> "/antiochp/grin/node_mainnet/tmp/txhashset/rangeproof/pmmr_leaf.bin.000003b18778"
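The log above is consistent with iterating over a fixed list of expected names and extracting only those present in the archive. Below is a minimal sketch of that selection step, assuming the file list from the PR description. expected_files and select_entries are hypothetical names, and archive entries are plain strings rather than zip-rs types. Missing files are skipped rather than treated as errors, in line with the code comment quoted later in the review that some files are potentially optional.

```rust
use std::collections::HashSet;

// The fixed entries from the PR description, relative to the txhashset dir.
const FIXED_FILES: [&str; 8] = [
    "kernel/pmmr_data.bin",
    "kernel/pmmr_hash.bin",
    "output/pmmr_data.bin",
    "output/pmmr_hash.bin",
    "output/pmmr_prun.bin",
    "rangeproof/pmmr_data.bin",
    "rangeproof/pmmr_hash.bin",
    "rangeproof/pmmr_prun.bin",
];

// Hypothetical helper: all expected entry names, including the two
// "rewound" leaf set files that carry a <hash> suffix.
fn expected_files(hash_suffix: &str) -> Vec<String> {
    let mut files: Vec<String> = FIXED_FILES.iter().map(|f| f.to_string()).collect();
    files.push(format!("output/pmmr_leaf.bin.{}", hash_suffix));
    files.push(format!("rangeproof/pmmr_leaf.bin.{}", hash_suffix));
    files
}

// Walk the expected list in order and keep only names present in the
// archive. Unexpected archive entries are never selected, and missing
// expected entries are skipped rather than treated as errors.
fn select_entries(archive_names: &[&str], expected: &[String]) -> Vec<String> {
    let present: HashSet<&str> = archive_names.iter().copied().collect();
    expected
        .iter()
        .filter(|name| present.contains(name.as_str()))
        .cloned()
        .collect()
}
```

For example, expected_files("000003b18778") yields the same ten names, in the same order, as the log lines above, with the two leaf set files last.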

antiochp added some commits Jun 18, 2019

bump version of zip-rs
cleanup create_zip (was compress)
use explicit list of files when creating zip archive

@antiochp antiochp force-pushed the antiochp:zip_simplify branch from 0052271 to e99573b Jul 9, 2019

@antiochp antiochp changed the base branch from master to milestone/2.x.x Jul 9, 2019

@antiochp antiochp force-pushed the antiochp:zip_simplify branch from 606accb to 7c5d7cd Jul 9, 2019

@DavidBurkett
Contributor

left a comment

I can't test at the moment, but we'll want to make sure this gets tested on Windows too before merging. It seems every change we make to this code breaks Windows due to file system differences (path separators, allowed filenames, etc.).

@hashmap
Member

left a comment

LGTM

// These are the *only* files we will attempt to extract from the zip file.
// If any of these are missing we will attempt to continue as some are potentially optional.
zip::extract_files(txhashset_data, &txhashset_path, files)?;
Ok(())

@hashmap

hashmap Jul 10, 2019

Member

I remember we discussed this before, but still: why not remove the ? and the last line? :)

@antiochp

antiochp Jul 11, 2019

Author Member

Just personal preference.

{
    one()?;
    two()?;
    three()?;
    Ok(())
}

reads better to me than -

{
    one()?;
    two()?;
    three()
}

And if you need to reorder those lines or add one at the end, you don't have to go back and reintroduce the ? (or forget to).

@antiochp

antiochp Jul 11, 2019

Author Member

I have been experimenting with this recently, though -

Ok(())
    .and_then(|_| one())
    .and_then(|_| two())
    .and_then(|_| three())?
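The two styles discussed in this thread behave the same way: both stop at the first Err and return it. Here is a small self-contained comparison, where step is a hypothetical stand-in for one(), two() and three(). Result::and_then takes a closure, so each step is wrapped in |_| ...; the turbofish on the initial Ok simply states the error type explicitly.

```rust
// Hypothetical stand-in for one(), two() and three(): step `n` fails
// when n == fail_at, otherwise succeeds.
fn step(n: u32, fail_at: u32) -> Result<(), String> {
    if n == fail_at {
        Err(format!("step {} failed", n))
    } else {
        Ok(())
    }
}

// Style 1: `?` on every line and an explicit Ok(()) at the end.
fn with_question_mark(fail_at: u32) -> Result<(), String> {
    step(1, fail_at)?;
    step(2, fail_at)?;
    step(3, fail_at)?;
    Ok(())
}

// Style 2: an and_then chain. Each closure runs only if the previous
// result was Ok; the first Err is passed through unchanged.
fn with_and_then(fail_at: u32) -> Result<(), String> {
    Ok::<(), String>(())
        .and_then(|_| step(1, fail_at))
        .and_then(|_| step(2, fail_at))
        .and_then(|_| step(3, fail_at))
}
```

Which one reads better is a matter of taste; the ? version keeps each step on its own statement, which makes reordering or appending steps a one-line change.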

let res = thread::spawn(move || {
    let mut archive = zip_rs::ZipArchive::new(from_archive).expect("archive file exists");
    for x in files {
        if let Ok(file) = archive.by_name(x.to_str().expect("valid path")) {

@hashmap

hashmap Jul 10, 2019

Member

Nitpick: the "valid path" expect message (and the others below) may look confusing in the log file, though it reads well in the source code.

@antiochp antiochp changed the title simplify txhashset zip creation and extraction [2.x.x] simplify txhashset zip creation and extraction Jul 11, 2019

@antiochp

Member Author

commented Jul 12, 2019

Going to merge this into the 2.x.x branch now.
We can test on that branch for Windows compatibility.

@antiochp antiochp merged commit 1395074 into mimblewimble:milestone/2.x.x Jul 12, 2019

10 checks passed

mimblewimble.grin Build #20190709.3 succeeded
mimblewimble.grin (linux api/util/store) succeeded
mimblewimble.grin (linux chain/core/keychain) succeeded
mimblewimble.grin (linux pool/p2p/src) succeeded
mimblewimble.grin (linux release) succeeded
mimblewimble.grin (linux servers) succeeded
mimblewimble.grin (macos release) succeeded
mimblewimble.grin (macos test) succeeded
mimblewimble.grin (windows release) succeeded
mimblewimble.grin (windows test) succeeded
3 participants