Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hailctl] Autocomplete for hailctl config {get,set,unset} #13224

Merged
merged 22 commits into from Aug 14, 2023

Conversation

jigold
Copy link
Collaborator

@jigold jigold commented Jun 30, 2023

To get this to work, I had to run the following command:

hailctl --install-completion zsh

And then add this line to the end of the ~/.zshrc file: autoload -Uz compinit && compinit

Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a super cool change! It doesn't have to be in this PR, but can you follow up with a small note in the documentation (if there's no hailctl documentation it can even be just a note in the Installation section of the docs) about installing autocomplete and the need to have compinit in the zshrc? I'm assuming that if this is not a bug in the CLI tool then it is something that zsh users are just expected to know, so we should link to some zsh documentation describing that.

@@ -9,6 +9,7 @@
import typer
from typer import Argument as Arg

from hailtop.batch_client.parse import CPU_REGEXPAT, MEMORY_REGEXPAT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this import into the config_variables function.

I'm personally pretty sensitive to hailctl startup time, which unfortunately python is not great at and this is mostly due to transitively importing everything up front. This is why I went to some lengths to keep top-level imports in the hailctl module to a minimum so that subcommands only load parts of the codebase that they will need during execution (I should really document this it is super not obvious from the code). Unfortunately this import increases startup overhead by around 250ms, which to me is noticeable and a bit of a thorn when just trying to e.g. look at the help text which should in theory be instant.

If you want to examine import times you can do the following:

pip install tuna  # For viewing the profiler output
PYTHONPROFILEIMPORTTIME=1 hailctl 2>profile
tuna profile

On this commit it gave this output

Screenshot 2023-07-05 at 8 51 14 AM

which I traced back to this import.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how shell autocomplete is actually implemented? Is it just repeatedly invoking the hailctl executable or is there some long-lived hailctl process and it's communicating with the shell? This would be useful information to know for making the autocomplete snappier in the future as it's a bit laggy at the moment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's much snappier when we don't import the regex patterns for cpu and memory as well as the RouterAsyncFS.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you'd like to do here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So those imports are only needed to execute the validation lambdas, which aren't necessary for autocomplete, so I guess the crux is that we can't push the imports down into those lambdas.

I think that if we go with my other suggestion about turning these tuples into classes, and instead of those lambdas have a validate method on the classes, I think we should get best of both worlds because those imports can happen inside the validate method (which won't get called in the autocomplete code path)

@app.command()
def set(parameter: str, value: str):
def set(parameter: Ann[str, Arg(default=..., help="Configuration variable to set", autocompletion=complete_config_variable)], value: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there default=…. here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was necessary to get a required argument from some resource I can't find anymore, but I don't think that's actually true. Removing.

global _config_variables

if _config_variables is not None:
return _config_variables
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a measurable difference in stashing this in a global variable as opposed to generating the list whenever this function is called?

print(f"Error: unknown parameter {parameter}", file=sys.stderr)
sys.exit(1)

validation_func, msg = validations.get(parameter, (lambda _: True, '')) # type: ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
validation_func, msg = validations.get(parameter, (lambda _: True, '')) # type: ignore
validation_func, msg = validations[parameter]

We've now guaranteed that parameter will be present in the dictionary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried about preserving backwards compatibility, but I guess that's not necessary???

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I was worried that if I missed some configuration in the code base or something we add later on, then we've basically made this version of hailctl unusable. So basically, my goal was just to make what we have more helpful rather than stricter. So the real question is whether lines 163-165 are correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to pose this question to the larger team, but in my opinion it is a bug to accept arbitrary config values not a feature because if I'm allowed to set it I would expect it to do something. If we agree it's a bug, we are within our rights to revoke the ability to set arbitrary config values.

Regarding versions being unusable, that's an interesting question. I guess that would happen if someone does somewhere in the code configuration_of(some_new_config) without ever adding validations for it? And that would only make it through our tests if we only ever use the environment variable override instead of using hailctl config set which to be honest sounds very plausible. I suppose that for that problem I would propose the solution that configuration_of( should not accept some arbitrary string, but a variant of an Enum. Separately, we can enforce in a test (or ideally just within the type system too) that every variant of that enum can be properly set through hailctl. That way you can't introduce configuration without also adding support for it in hailctl. This would also give a lot more confidence that this PR caught all usages of configuration_of.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to address this comment.

(
'query/batch_worker_cores',
'Cores specification for the query worker',
(lambda x: re.fullmatch(CPU_REGEXPAT, x) is not None, 'should be an integer which is a power of two from 1 to 16 inclusive'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little odd to have the nested tuples here after adding the help text. Can we instead make classes that encapsulate this information instead of a Tuple[str, str, Tuple[Callable[...], str]]?

for name, help_text, _ in config_variables():
if name.startswith(incomplete):
completion_item = (name, help_text)
yield completion_item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: for brevity I would personally forego the additional binding and just do yield (name, help_text)

for name, current_value in elements:
if name.startswith(incomplete):
completion_item = (name, current_value)
yield completion_item
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea to use the actual value of the setting in the help text for autocomplete is neat, but I think it misses the original intention of the help text, which is to identify the name of a certain concept in the configuration. I think it would be more consistent and useful to use the help text from the set command that you have above. If there are multiple configurations that store names of buckets for example, I think it would be more helpful to read the help text explaining what those two buckets mean rather than what those buckets are (you can always actually execute the get command to get the actual value).

Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left responses in earlier comments.

for name, current_value in elements:
if name.startswith(incomplete):
help_msg = config_items.get(name)
yield (name, help_msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this whole function be deleted now and we use complete_config_variable in unset and get?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what we want the behavior to be for unsetting and getting variables that exist already that are not known in the validations. The way it's implemented now is we load all possibilities from the existing configuration. You're proposing to show all known possibilities even if they haven't been set before and don't exist in the configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I missed that distinction. In that case I'm good with listing what's already in the configuration (keeping this as is)

@jigold
Copy link
Collaborator Author

jigold commented Aug 3, 2023

@iris-garden Would you be interested in collaborating on this set of changes? I know you were starting to think about environment variables and configuration as well.

@jigold
Copy link
Collaborator Author

jigold commented Aug 3, 2023

I haven't tested the refactored code yet -- would like to see if this was the refactoring you had in mind with Enums.

Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya this is definitely what I was getting at. Just a couple small comments

ConfigVariableInfo = namedtuple('ConfigVariable', ['help_msg', 'validation'])


class ConfigVariable(Enum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya this is what I meant, with one step remaining. If you make the configuration_of function take a ConfigVariable instead of a section string and option string, then you don't need the assert, it will be impossible for someone to use configuration_of without that option being present in this Enum

@app.command()
def set(parameter: str, value: str):
def set(parameter: Ann[str, Arg(help="Configuration variable to set", autocompletion=complete_config_variable)], value: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make class ConfigVariable(Enum) into class ConfigVariable(Enum, str), then you can make this parameter Ann[ConfigVariable and typer will automatically reject inputs that aren't part of the enum and do the conversion to the corresponding Enum variant for you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't figure out how to do the import to be able to use that class annotation without the from hailtop.config import ConfigVariable at the top level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya it'll have to be at the top level. But we only need that single enum definition, we can put that in its own file that's cheap to import

Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, just a few small nits.

deprecated_envvar: Optional[str] = None) -> Union[str, T]:
config_variable_info = config_variables()[config_variable]
section = config_variable_info.section
option = config_variable_info.option
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of storing the section and option again in a map, you can do config_variable.value.split("/")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we went with your suggestion, we'd have to have something like this. Which do you prefer?

if '/' in config_variable.value:
    section, option = config_variable.value.split('/')
else:
    section = 'global'
    option = config_variable.value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we went with your suggestion, we'd have to have something like this. Which do you prefer?

if '/' in config_variable.value:
    section, option = config_variable.value.split('/')
else:
    section = 'global'
    option = config_variable.value

I would prefer this. If I understand correctly it would allow us to delete the repeated section and option information in config_variables


validations = {var: var_info.validation for var, var_info in config_variables().items()}

validation_func, msg = validations.get(parameter, (lambda _: True, '')) # type: ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to assert now that parameter is in the map. We can make a test that asserts every ConfigVariable is in config_variables (the ConfigVariable class itself is iterable so you can loop over all the variants of the enum).

# necessary for backwards compatibility test
os.environ['XDG_CONFIG_HOME'] = config_dir
_set('global', 'email', 'johndoe@gmail.com')
_set('batch', 'foo', '5')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents _set from changing in ways that are potentially valid going forward. I think the real backwards compatibility concern here is being able to read old configs, so let's do that directly by opening the file and writing these configs in.

assert res.exit_code == 0
assert res.stdout.strip() == 'johndoe@gmail.com'

res = bc_runner.invoke(cli.app, ['get', 'batch/foo'], catch_exceptions=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch and I think we should proceed in this PR as you have it currently. I feel like at some point though we should somehow communicate to users that they have bogus settings in their config. Maybe something we can discuss at team meeting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be part of a future hailctl config check where we also check the remote_tmpdir is correct for the regions and not cold storage.


def test_config_unset_unknown_name(runner: CliRunner):
res = runner.invoke(cli.app, ['unset', 'foo'], catch_exceptions=False)
assert res.exit_code == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good call here about allowing arbitrary str in unset.

@jigold
Copy link
Collaborator Author

jigold commented Aug 11, 2023

Should be all set except for the question in the comment.

Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded in comment

Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome change. Thank you for bearing with all the Enum changes, I think this will be very robust.

@jigold
Copy link
Collaborator Author

jigold commented Aug 14, 2023

A test failed because hailctl config unset now returns an error if the config variable does not exist. Let me know if you think we should maintain the current behavior -- otherwise, I slightly modified the tests.

@@ -96,10 +116,13 @@ def unset(parameter: str):
del config[section][key]
with open(config_file, 'w', encoding='utf-8') as f:
config.write(f)
else:
print(f"Error: unknown parameter {parameter!r}", file=sys.stderr)
sys.exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, let's just allow this. I don't see a problem with unseting a value that is not currently set.

@jigold
Copy link
Collaborator Author

jigold commented Aug 14, 2023

Sorry - one more thing I need help with. There's a cyclical import with the RouterFS in variables.py. Should I just pylint ignore it?

@daniel-goldstein
Copy link
Contributor

daniel-goldstein commented Aug 14, 2023

Sorry - one more thing I need help with. There's a cyclical import with the RouterFS in variables.py. Should I just pylint ignore it?

So it looks like that circular import is because of config_variables, but config_variables is only ever used in hailctl not elsewhere in hailtop:

hailctl/config/cli.py
10:from hailtop.config.variables import ConfigVariable, config_variables
46:    for var, var_info in config_variables().items():
56:    if parameter not in config_variables():
62:    config_variable_info = config_variables()[parameter]
86:    from hailtop.config import config_variables, get_user_config  # pylint: disable=import-outside-toplevel
99:    config_items = {var.name: var_info.help_msg for var, var_info in config_variables().items()}

config/__init__.py
4:from .variables import ConfigVariable, config_variables
14:    'config_variables',

config/variables.py
6:_config_variables = None
30:def config_variables():
34:    global _config_variables
36:    if _config_variables is None:
37:        _config_variables = {
112:    return _config_variables

Can you move config_variables into a file in hailtop/hailctl/config and keep the ConfigVariable enum in hailtop/config/variables.py? Then you can't have a circular reference because hailtop can't depend on hailctl.

@jigold
Copy link
Collaborator Author

jigold commented Aug 14, 2023

I think we're finally good on this. Thanks for the help!

@danking danking merged commit ca4ab15 into hail-is:main Aug 14, 2023
8 checks passed
sjparsa pushed a commit to sjparsa/hail that referenced this pull request Aug 15, 2023
)

To get this to work, I had to run the following command:
```
hailctl --install-completion zsh
```

And then add this line to the end of the ~/.zshrc file: `autoload -Uz
compinit && compinit`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants