
[hailctl] Fix uploading files from parent directories of cwd #13812

Merged · 17 commits · Jan 9, 2024

Conversation

@jigold (Collaborator) commented Oct 13, 2023

The bug in #13785 was that we used a relative path, but the files to be included were in the parent directory of the script's current working directory. To solve this, I added a new option, --mounts, that uses the Docker -v src:dest syntax to specify where the contents of the source should be mounted into the container.

Fixes #13785
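
For readers unfamiliar with the Docker -v syntax the option adopts, here is a minimal sketch of how such a spec might be split; the function name and details are illustrative assumptions, not the PR's actual code:

```python
import os
from typing import Tuple

def parse_mount_spec(spec: str) -> Tuple[str, str]:
    """Split a Docker-style 'src:dest' spec into (absolute local source, container destination).

    Hypothetical helper; a naive split like this would break on sources
    that themselves contain ':' (e.g. Windows drive letters).
    """
    src, sep, dest = spec.partition(':')
    if not sep or not src or not dest:
        raise ValueError(f'expected SRC:DEST, got {spec!r}')
    # Resolve the source against the user's cwd so parent-directory
    # sources like ../data work regardless of where the script lives.
    return os.path.abspath(src), dest

# e.g. parse_mount_spec('../data:/io/data') -> ('<abs path to data>', '/io/data')
```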

@jigold marked this pull request as ready for review October 18, 2023 21:41
@jigold (Collaborator, Author) commented Oct 20, 2023

I think we might need a different name than "mount". In Docker, a given destination can only be mounted to once, i.e. you can't have --mount file1:/ --mount file2:/.

@danking Do you have any thoughts on the interface for this?

@danking self-assigned this Oct 23, 2023
@jigold (Collaborator, Author) commented Oct 23, 2023

Fixes #13785

@jigold linked an issue Oct 23, 2023 that may be closed by this pull request
@@ -164,6 +164,9 @@ def submit(
     files: Ann[
         Optional[List[str]], Opt(help='Files or directories to add to the working directory of the job.')
     ] = None,
+    mounts: Ann[
+        Optional[List[str]], Opt(help='Files or directories to add to a specific mount location in the job. Equivalent to -v src:dest in Docker.')
+    ] = None,
@danking (Collaborator) commented Oct 23, 2023


I don't like the term "mount" because this isn't mounting the user's hard drive into the cloud.

I think it's a bit confusing that there are two arguments with nearly identical descriptions.

How about this:

  1. Replace --file with the functionality provided by --mount
  2. Allow the new --file to elide the destination. When the destination is elided, it is assumed to be the same as the source (so: .. means ..:.. both relative to the current working directory)
  3. By default, make the working directory of the job match the working directory of the user's laptop.

These three things together allow

$ cd /Users/dking/projects/foo
$ hailctl batch submit ... \
  --file ../foo \
  --file ../../reference_data:/reference_data \
  --file ../../other_project/thing.py

to work as expected. You'll get a container with this file structure:

/Users
  /dking
    /projects
      /foo
        thing.py
        ... everything else in the working directory of this script on the user's laptop ...
/reference_data
   ... everything in /Users/dking/reference_data on the user's laptop ...

NB1: if you do --file .. --file foo.py you're implicitly copying foo.py twice. Unnecessary, but should be fine if foo.py is not mutated while we are submitting.
NB2: --file . and --file ../foo are the same in this instance; that seems fine.
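
Under that proposal, spec parsing might look roughly like the following (a sketch under the assumptions above; parse_file_spec is a hypothetical name, not from the PR):

```python
import os
from typing import Tuple

def parse_file_spec(spec: str, cwd: str) -> Tuple[str, str]:
    """Parse a --file spec, defaulting an elided destination to the source path."""
    src, sep, dest = spec.partition(':')
    if not sep:
        dest = src  # elided destination: same as source, so '..' means '..:..'
    # Both sides are interpreted relative to the user's working directory;
    # an absolute destination like '/reference_data' is kept as-is by join().
    return (os.path.abspath(os.path.join(cwd, src)),
            os.path.normpath(os.path.join(cwd, dest)))

cwd = '/Users/dking/projects/foo'
parse_file_spec('../foo', cwd)
# -> ('/Users/dking/projects/foo', '/Users/dking/projects/foo')
parse_file_spec('../../reference_data:/reference_data', cwd)
# -> ('/Users/dking/reference_data', '/reference_data')
```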

@danking (Collaborator) commented Oct 23, 2023

The "Fixes #12345" needs to be in the description, I've edited the description

@jigold (Collaborator, Author) commented Nov 7, 2023

Bump. The test failure was unrelated.

@iris-garden (Collaborator) left a comment


lgtm! happy to hit approve once the conflicts are fixed

@jigold (Collaborator, Author) commented Nov 29, 2023

I rebased this @iris-garden

@iris-garden previously approved these changes Nov 29, 2023
@danking (Collaborator) commented Nov 30, 2023

Supporting file:// seems to suggest that we would support URIs in general (e.g. --files gs://foo/bar:/baz); AFAICT we don't support the latter, right? What's the motivation to support file://?

@danking (Collaborator) commented Nov 30, 2023

I couldn't replicate the error locally. It seems to be transient because at least one of the CI builds succeeded.

@jigold (Collaborator, Author) commented Dec 1, 2023

> Supporting file:// seems to suggest that we would support URIs in general (e.g. --files gs://foo/bar:/baz); AFAICT we don't support the latter, right? What's the motivation to support file://?

I didn't know what we currently support with regard to file paths, so I just made it resilient to someone adding a file:// prefix.
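
That kind of resilience might look like the following sketch (strip_file_scheme is a hypothetical helper, not necessarily what the PR does): accept a path with or without the file:// scheme and treat both uniformly.

```python
def strip_file_scheme(path: str) -> str:
    """Drop an optional leading file:// scheme so local paths and file:// URIs behave the same."""
    prefix = 'file://'
    return path[len(prefix):] if path.startswith(prefix) else path

assert strip_file_scheme('file:///home/user/data.csv') == '/home/user/data.csv'
assert strip_file_scheme('../data.csv') == '../data.csv'  # plain paths pass through
```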

@jigold (Collaborator, Author) commented Dec 1, 2023

I'm not sure what to do about the transient "invalid block ID" error...

@jigold (Collaborator, Author) commented Jan 8, 2024

This took me an incredibly long time to figure out, but I'm pretty sure the issue is that my last test copies the same file twice: once explicitly as a single file, and a second time as part of a directory copied recursively. I don't think our copier code interacts correctly with Azure Blob Storage in this case, and we end up with invalid block ID list errors.

I'm still stewing over which option is best to address this. I could either have submit.py not request the same file twice, or modify the copier to hold a lock on the create-blob writer stream so this can't happen. However, I'm worried about deadlocks given our extensive use of semaphores and parallelism. Thoughts? I think the right thing to do is to fix it in submit.py for now so we can get this in, and then open an issue for a more durable fix in the copier.

@danking (Collaborator) commented Jan 9, 2024

@jigold https://azure.github.io/Storage/docs/application-and-user-data/code-samples/concurrent-uploads-with-versioning

Looks like this is the expected behavior from Azure Blob Storage when concurrently uploading blobs. It seems like all blocks are purged when any single upload succeeds. Unfortunately, it doesn't seem possible to distinguish between a truly invalid block list and this transiently invalid block list. I dislike this interface, but it is what we have.

It seems to me that the right fix is to deduplicate the file list before uploading. Multiple files uploading to the same target should be an error if they're different source files.
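
A minimal sketch of that deduplication (dedupe_transfers is a hypothetical name; the PR's actual implementation may differ): collapse repeated (src, dest) pairs, and raise when two different sources target the same destination.

```python
from typing import Dict, List, Tuple

def dedupe_transfers(transfers: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Deduplicate (source, destination) pairs before uploading.

    Exact duplicates collapse silently; two different sources writing the
    same destination is a user error rather than a race to tolerate.
    """
    by_dest: Dict[str, str] = {}
    for src, dest in transfers:
        if dest in by_dest and by_dest[dest] != src:
            raise ValueError(
                f'conflicting sources {by_dest[dest]!r} and {src!r} for destination {dest!r}')
        by_dest[dest] = src
    return [(src, dest) for dest, src in by_dest.items()]

# e.g. [('a.py', '/job/a.py'), ('a.py', '/job/a.py')] -> [('a.py', '/job/a.py')]
```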

@jigold (Collaborator, Author) commented Jan 9, 2024

@iris-garden Can you review again when you get a chance? Tests are all passing now.

@iris-garden (Collaborator) left a comment


lgtm!

Successfully merging this pull request may close this issue: hailctl batch submit tries to copy bad directory files (#13785)