-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for downloading zipped input data from URL #445
Conversation
f8875ad
to
c613a2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 Only questions.
fmt::print(fg(fmt::color::red), "\nFile storage folder: {} not found.\n", | ||
cmd.storage_folder.string()); | ||
cmd.exit_code = EXIT_FAILURE; | ||
cmd.data_path_or_url = result["storage"].as<std::string>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it doesn't make sense any more to use a relative path in the config file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it does! I must have just deleted this without looking too closely at it. Oops 🙈
I'll fix it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait. This is a bit weird. It converts relative paths to absolute ones, but it doesn't actually change what the path points to. I assumed it was treating relative paths as if they were relative to the config file's path (and then fixing them up), but they're actually just relative to the CWD.
I suppose either way, if users are expecting to see absolute paths in log files etc., we should probably keep this behaviour though. What do you think?
If we were writing this from scratch, I think a better behaviour would be for paths to be relative to the config file, but I guess we'll break things if we change it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose either way, if users are expecting to see absolute paths in log files etc., we should probably keep this behaviour though. What do you think?
Yeah, probably for the best. Rather than fiddling around with data dir location and config relative paths -- sounds error prone. If it's easy to do the relative paths, though, go ahead if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's easy enough to do, but I don't want to break people's existing config files.
Hopefully going forward everyone will use URLs instead anyway!
std::filesystem::path path = path_or_url; | ||
if (std::filesystem::is_directory(path)) { | ||
root_ = std::move(path); | ||
} else if (std::filesystem::is_regular_file(path) && path.extension() == ".zip") { | ||
root_ = extract_zip_file_or_load_from_cache(path); | ||
} else { | ||
throw std::runtime_error(fmt::format( | ||
"Path must either point to a zip file or a directory: {}", path_or_url)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this implicitly check if the file exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I think under the hood stat()
is used to both check whether a file/dir exists and what it is (at least on *nix -- think it's the same on Windows).
throw std::runtime_error(fmt::format("Failed to create file {}", download_path.string())); | ||
} | ||
|
||
curlpp::Cleanup cleanup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleans up curl's global resources when it goes out of scope. Uses this function: https://curl.se/libcurl/c/curl_global_cleanup.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bit weird but I guess that's the C++/RAII way!
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
e6d3123
to
c2e562d
Compare
c2e562d
to
1f5a175
Compare
This PR adds support for entering a URL to the input data instead of a file path on the command line. I implemented this using a C++ wrapper for curl.
Note that files are downloaded to a temporary folder and that there is no caching of URLs, just caching to check whether a zip file matching a particular checksum has already been extracted. While we could implement this by allowing users to pass a checksum for the input data as a command line argument, I don't think that's particularly user friendly. The better option is to let users specify it via the config file (#407), which I'll do later.
Closes #364.