Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[batch] Add option to show docker output when building images #11428

Merged

Conversation

daniel-goldstein
Copy link
Contributor

This adds support to optionally view the docker output in build_python_image. I'm not sure what our general strategy is for adding inputs to user-facing functions. I tried to maintain backwards compatibility, but it feels very weird and similarly wrong to put a normal parameter after an underscore parameter. I feel like it would be best to have _tmp_dir be keyword-only. That would also technically be breaking, but does the underscore prefix relinquish us from that responsibility?

jigold
jigold previously requested changes Feb 28, 2022
@@ -39,13 +49,17 @@ async def check_shell_output(script: str, echo: bool = False) -> Tuple[bytes, by
return await check_exec_output('/bin/bash', '-c', script, echo=echo)


async def check_shell(script: str, echo: bool = False) -> None:
await check_shell_output(script, echo)
async def check_shell(script: str, echo: bool = False, inherit_std_out_err: bool = False) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is stdio better than std_out_err?



def build_python_image(fullname: str,
requirements: Optional[List[str]] = None,
python_version: Optional[str] = None,
_tmp_dir: str = '/tmp') -> str:
_tmp_dir: str = '/tmp',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea why we named the argument to start with an underscore. That seems like it was a mistake on my part when I first put this functionality in. It could have been a bad renaming operation that did more than I intended.

Not sure what @danking thinks, but I think I'd be okay with renaming the argument to tmp_dir or tmpdir and placing a * after the renamed tmp_dirso that all other arguments after this are keyword argument only. I'd also change the type to Optional[str] and set the default to /tmp inside the function.

Sorry about that! Also, why not make the show docker output always True and punt on the tmp_dir problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a better sense of how folks are using this API than I, if you're comfortable with changing tmp_dir, then I am also comfortable changing it. What are you thoughts on making tmp_dir keyword only? Does that feel a step too far?



def build_python_image(fullname: str,
requirements: Optional[List[str]] = None,
python_version: Optional[str] = None,
_tmp_dir: str = '/tmp') -> str:
_tmp_dir: str = '/tmp',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underscore parameters are not part of the public API; however, _tmp_dir is documented below which makes me think we have to pretend it actually is a public API. I'd prefer no underscores in public APIs, but I even more strongly think we should not change already-public APIs.

Without the constraint of publicity, I am inclined to make all the parameters other than fullname keyword-only. I roughly agree with the sk-learn team's thoughts on preferring keyword-only parameters when there is a variety of optional parameters. Going forward, I recommend making an argument keyword-only unless it is always required. Making arguments positional should generally be special exceptions that make the use of the API substantially more intuitive.

For now, let's make show_docker_output keyword only. We can change this for Hail 0.3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a list anywhere of "Breaking changes we want to make in 0.3"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query has one that exists in Asana, I don't know if there's services stuff in there too.

print(f'finished pulling image {fullname}')

if '/' in fullname:
sync_check_shell_output(f'docker push {fullname}')
sync_check_shell(f'docker push {fullname}', inherit_std_out_err=show_docker_output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These really should not be check shell, or at least we need to shell quote the paths and full name.


That suggests these should really be sync_check_exec, which does not yet exist.

I feel like we're building up a parallel process API that diverges from the subprocess.run API.

If we need to change our asyncio process API, I think we should prefer starting an asyncio copy of subprocess.run. In particular, inherit_std_out_err is known in subprocess.run as capture_output=False.


That said, this is synchronous code, so maybe we should just use subprocess.run? Unfortunately subprocess.run does not include the stdout and stderr in the CalledProcessError:

In [33]: subprocess.run(['/bin/bash', '-c', 'echo hello ; echo >&2 \'ERROR!\' ; false'], capture_output=True, check=True) 
    ...:                                                                                                                                          
Traceback (most recent call last):
  File "<ipython-input-33-3dac52dfa019>", line 1, in <module>
    subprocess.run(['/bin/bash', '-c', 'echo hello ; echo >&2 \'ERROR!\' ; false'], capture_output=True, check=True)
  File "/Users/dking/miniconda3/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
CalledProcessError: Command '['/bin/bash', '-c', "echo hello ; echo >&2 'ERROR!' ; false"]' returned non-zero exit status 1.

Unfortunately the printed form of CalledProcessError does not include the stdout and stderr. We might want to write our own shim around subprocess.run which just raises a Hail CalledProcessError with includes the stdout and stderr (which are available on subprocess.CalledProcessError as stdout and stderr).

@danking
Copy link
Contributor

danking commented Apr 11, 2022

@daniel-goldstein bump, let's get this resolved and merged.

@daniel-goldstein daniel-goldstein force-pushed the build-python-image-show-docker-output branch from a5877b1 to 5f56525 Compare April 11, 2022 13:40
@daniel-goldstein daniel-goldstein dismissed stale reviews from danking and jigold April 11, 2022 13:41

addressed

@daniel-goldstein daniel-goldstein force-pushed the build-python-image-show-docker-output branch from 5f56525 to 112ac3f Compare April 13, 2022 21:17
@daniel-goldstein
Copy link
Contributor Author

@jigold this could use a look

jigold
jigold previously requested changes Apr 21, 2022
@@ -78,15 +82,15 @@ def build_python_image(fullname: str,
python3 -m pip check
''')

sync_check_shell_output(f'docker build -t {fullname} {docker_path}')
sync_check_exec('docker', 'build', '-t', fullname, docker_path, capture_output=(not show_docker_output))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it (not show_docker_output)? I would have thought it should be the opposite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capture_output means to collect the stdout/stderr with a pipe instead of inheriting the standard streams of the calling process. If we want to show the docker output on the user's stdout/err, we want not to capture the output.

@@ -49,3 +50,13 @@ def sync_check_shell_output(script: str, echo=False) -> Tuple[bytes, bytes]:

def sync_check_shell(script: str, echo=False) -> None:
sync_check_shell_output(script, echo)


def sync_check_exec(command: str, *args: str, echo: bool = False, capture_output: bool = False) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for passing command and args separately? It doesn't look like command is used by itself anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll make it consistent with subprocess.run.

@daniel-goldstein
Copy link
Contributor Author

Tested with

hb.build_python_image('my_image', requirements=['hail'], show_docker_output=True)

@danking danking merged commit 4cd2286 into hail-is:main Apr 26, 2022
CDiaz96 pushed a commit to CDiaz96/hail that referenced this pull request Apr 27, 2022
…s#11428)

* [batch] Add option to show docker output when building images

* wip

* start doing a lot of things

* simplify sync_check_exec

* lint

* lint

* fix signature

* clean up imports

* fix signature
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants