Skip to content

Conversation

@ccharest93
Copy link
Contributor

Added a docker API client, and changed the build function to its low-level version. This makes it return an event stream that we can then log to screen for real-time update on the image build process. This prints the same things as running docker build locally, so docker RUN commands can get quite verbose, we could filter the 'stream' events and only print the ones starting with 'Step ' or '---> ' .

Also changed the log message from api.py to make more sense (encompasses the case where dockerfile.torchx is used).

#813

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 3, 2024
@ccharest93
Copy link
Contributor Author

ccharest93 commented Feb 6, 2024

  • Fixed the pyre typing errors
  • Unitest are failling because of codecov failing to upload?
  • Slurm integration stuck in post

I also removed a redundant error case in my commit ( next() doesnt raise ValueError) and renamed output to build_msg.

@ccharest93 ccharest93 marked this pull request as draft February 11, 2024 12:49
@ccharest93
Copy link
Contributor Author

ccharest93 commented Feb 11, 2024

My previous commit #809 was flawed, it didn't change logic cause if it did, it would introduce a bug. Ill fix it in a different commit, in the meantime this commit is in draft mode.

@d4l3k
Copy link
Member

d4l3k commented Apr 9, 2024

@ccharest93 are you interested in polishing this up and landing it? cc @clumsy

if len(old_imgs) == 0 or role.image not in old_imgs:
role.image = image.id
build_msg: Dict[str, str] = {}
while True:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use a simple for loop?

for build_msg in build_events:
    if "stream" in build_msg:
        output_str = build_msg["stream"].strip("\r\n").strip("\n")
        if output_str:
            log.info(output_str)
    if "aux" in build_msg:
        image_id = build_msg["aux"]["ID"]
        break

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build events is a stream, until stop_iteration is given we need to wait for further input on the stream. A for_loop wouldn't wait for the stream to complete and would only print events up to this point

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it how Iterables/generators work in for loop in Python? If you give a for loop an Iterable/generator (something that has __next__()) it will call __next__() until it faces StopIteration and exists the loop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem to be working fine, please check the latest version of #874

)
log.info("Building workspace docker image (this may take a while)...")
image, _ = self._docker_client.images.build(
build_events = self._docker_api_client.build(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we simply use the second value in the tuple that self._docker_client.images.build() returns? It's the same build_events Iterable, isn't it? Then we don't have to copy the logic to retrieve the image id.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it done so that we can print all the output leading to the error?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so we don't seem to print error events

Copy link
Contributor Author

@ccharest93 ccharest93 Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i recall correctly, the first call is blocking and returns all the events at the same time, the second isnt and returns a stream which we can then print/process in real time

@ccharest93
Copy link
Contributor Author

Ok i cleaned up the commit, i still need to rebuild my devcontainer to run tests and fix any errors that show up, will do this weekend. Would still like feedback on my original post though. Don't want to end up cluttering the torchx logs too much is my opinion.

This prints the same things as running docker build locally, so docker RUN commands can get quite verbose, we could filter the 'stream' events and only print the ones starting with 'Step ' or '---> ' .

@clumsy
Copy link
Collaborator

clumsy commented Apr 10, 2024

@ccharest93 adding quiet scheduler_arg should help suppressing where it's needed.

@ccharest93
Copy link
Contributor Author

This was landed in #874 right? I can close this PR? @clumsy

@ccharest93 ccharest93 closed this Apr 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants