New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hailctl] Fix uploading files from parent directories of cwd #13812
Conversation
I think we might need a different name than "mount". In Docker, you can only use the same destination mount once. i.e. you can't have this. @danking Do you have any thoughts on the interface for this? |
Fixes #13785 |
@@ -164,6 +164,9 @@ def submit( | |||
files: Ann[ | |||
Optional[List[str]], Opt(help='Files or directories to add to the working directory of the job.') | |||
] = None, | |||
mounts: Ann[ | |||
Optional[List[str]], Opt(help='Files or directories to add to a specific mount location in the job. Equivalent to -v src:dest in Docker.') | |||
] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the term mount
b/c this isn't mounting the users hard drive into the cloud.
I think it's a bit confusing that there are two arguments with nearly identical descriptions.
How about this:
- Replace
--file
with the functionality provided by--mount
- Allow the new
--file
to elide the destination. When the destination is elided, it is assumed to be the same as the source (so:..
means..:..
both relative to the current working directory) - By default, make the working directory of the job match the working directory of the user's laptop.
These three things together allow
$ cd /Users/dking/projects/foo
$ hailctl batch submit ... \
--file ../foo \
--file ../../reference_data:/reference_data \
--file ../../other_project/thing.py
to work as expected. You'll get a container with this file structure:
/Users
/dking
/projects
/foo
thing.py
... everything else in the working directory of this script on the user's laptop ...
/reference_data
... everything in /Users/dking/projects/reference_data on the user's laptop ...
NB1: if you do --file .. --file foo.py
you're implicitly copying foo.py twice. Unnecessary, but should be fine if foo.py is not mutated while we are submitting.
NB2: --file .
and --file ../foo
, are the same in this instance. that seems fine.
The "Fixes #12345" needs to be in the description, I've edited the description |
bump. Test failure was unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! happy to hit approve once the conflicts are fixed
I rebased this @iris-garden |
Supporting |
I couldn't replicate the error locally. It seems to be transient because at least one of the CI builds succeeded. |
I didn't know what we currently support with regards to file paths, so I just made it that we're resilient to someone adding file:// |
I'm not sure what to do about the invalid block id transient error... |
This took me an incredibly long time to figure out, but I'm pretty sure the issue is that my last test copies the same file twice. Once explicitly as a single file and the second time as part of the directory to copy recursively. I don't think our copier code interacts correctly with Azure Blob Storage in this case and we end up with invalid block ID list errors. I'm still stewing over which option is the best to address this. I could either not have |
@jigold https://azure.github.io/Storage/docs/application-and-user-data/code-samples/concurrent-uploads-with-versioning Looks like this is the expected behavior from Azure Blob Storage when concurrently uploading blobs. It seems like all blocks are purged when any single upload succeeds. Unfortunately, it doesn't seem possible to distinguish between a truly invalid block list and this transiently invalid block list. I dislike this interface, but it is what we have. It seems to me that the right fix is to deduplicate the file list before uploading. Multiple files uploading to the same target should be an error if they're different source files. |
@iris-garden Can you review again when you get a chance? Tests are all passing now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
The bug in #13785 was that we used a
relpath
, but the files that were to be included was the parent directory of the current working directory of the script. To solve this, I added a new option--mounts
that uses the Docker syntax for-v src:dest
to specify where the contents of the source should be mounted into the container.Fixes #13785