New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hailctl] Autocomplete for hailctl config {get,set,unset} #13224
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a super cool change! It doesn't have to be in this PR, but can you follow up with a small note in the documentation (if there's no hailctl documentation it can even be just a note in the Installation
section of the docs) about installing autocomplete and the need to have compinit
in the zshrc? I'm assuming that if this is not a bug in the CLI tool then it is something that zsh users are just expected to know, so we should link to some zsh documentation describing that.
@@ -9,6 +9,7 @@ | |||
import typer | |||
from typer import Argument as Arg | |||
|
|||
from hailtop.batch_client.parse import CPU_REGEXPAT, MEMORY_REGEXPAT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this import into the config_variables
function.
I'm personally pretty sensitive to hailctl
startup time, which unfortunately python is not great at and this is mostly due to transitively importing everything up front. This is why I went to some lengths to keep top-level imports in the hailctl
module to a minimum so that subcommands only load parts of the codebase that they will need during execution (I should really document this it is super not obvious from the code). Unfortunately this import increases startup overhead by around 250ms, which to me is noticeable and a bit of a thorn when just trying to e.g. look at the help text which should in theory be instant.
If you want to examine import times you can do the following:
pip install tuna # For viewing the profiler output
PYTHONPROFILEIMPORTTIME=1 hailctl 2>profile
tuna profile
On this commit it gave this output
which I traced back to this import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how shell autocomplete is actually implemented? Is it just repeatedly invoking the hailctl
executable or is there some long-lived hailctl
process and it's communicating with the shell? This would be useful information to know for making the autocomplete snappier in the future as it's a bit laggy at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's much snappier when we don't import the regex patterns for cpu and memory as well as the RouterAsyncFS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you'd like to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So those imports are only needed to execute the validation lambdas, which aren't necessary for autocomplete, so I guess the crux is that we can't push the imports down into those lambdas.
I think that if we go with my other suggestion about turning these tuples into classes, and instead of those lambdas have a validate
method on the classes, I think we should get best of both worlds because those imports can happen inside the validate
method (which won't get called in the autocomplete code path)
@app.command() | ||
def set(parameter: str, value: str): | ||
def set(parameter: Ann[str, Arg(default=..., help="Configuration variable to set", autocompletion=complete_config_variable)], value: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there default=….
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was necessary to get a required argument from some resource I can't find anymore, but I don't think that's actually true. Removing.
global _config_variables | ||
|
||
if _config_variables is not None: | ||
return _config_variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a measurable difference in stashing this in a global variable as opposed to generating the list whenever this function is called?
print(f"Error: unknown parameter {parameter}", file=sys.stderr) | ||
sys.exit(1) | ||
|
||
validation_func, msg = validations.get(parameter, (lambda _: True, '')) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validation_func, msg = validations.get(parameter, (lambda _: True, '')) # type: ignore | |
validation_func, msg = validations[parameter] |
We've now guaranteed that parameter
will be present in the dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was worried about preserving backwards compatibility, but I guess that's not necessary???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I was worried that if I missed some configuration in the code base or something we add later on, then we've basically made this version of hailctl unusable. So basically, my goal was just to make what we have more helpful rather than stricter. So the real question is whether lines 163-165 are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to pose this question to the larger team, but in my opinion it is a bug to accept arbitrary config values not a feature because if I'm allowed to set it I would expect it to do something. If we agree it's a bug, we are within our rights to revoke the ability to set arbitrary config values.
Regarding versions being unusable, that's an interesting question. I guess that would happen if someone does somewhere in the code configuration_of(some_new_config)
without ever adding validations for it? And that would only make it through our tests if we only ever use the environment variable override instead of using hailctl config set
which to be honest sounds very plausible. I suppose that for that problem I would propose the solution that configuration_of(
should not accept some arbitrary string, but a variant of an Enum
. Separately, we can enforce in a test (or ideally just within the type system too) that every variant of that enum can be properly set through hailctl
. That way you can't introduce configuration without also adding support for it in hailctl
. This would also give a lot more confidence that this PR caught all usages of configuration_of
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to address this comment.
( | ||
'query/batch_worker_cores', | ||
'Cores specification for the query worker', | ||
(lambda x: re.fullmatch(CPU_REGEXPAT, x) is not None, 'should be an integer which is a power of two from 1 to 16 inclusive'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a little odd to have the nested tuples here after adding the help text. Can we instead make classes that encapsulate this information instead of a Tuple[str, str, Tuple[Callable[...], str]]
?
for name, help_text, _ in config_variables(): | ||
if name.startswith(incomplete): | ||
completion_item = (name, help_text) | ||
yield completion_item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: for brevity I would personally forego the additional binding and just do yield (name, help_text)
for name, current_value in elements: | ||
if name.startswith(incomplete): | ||
completion_item = (name, current_value) | ||
yield completion_item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea to use the actual value of the setting in the help text for autocomplete is neat, but I think it misses the original intention of the help text, which is to identify the name of a certain concept in the configuration. I think it would be more consistent and useful to use the help text from the set
command that you have above. If there are multiple configurations that store names of buckets for example, I think it would be more helpful to read the help text explaining what those two buckets mean rather than what those buckets are (you can always actually execute the get
command to get the actual value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left responses in earlier comments.
for name, current_value in elements: | ||
if name.startswith(incomplete): | ||
help_msg = config_items.get(name) | ||
yield (name, help_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this whole function be deleted now and we use complete_config_variable
in unset
and get
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on what we want the behavior to be for unsetting and getting variables that exist already that are not known in the validations. The way it's implemented now is we load all possibilities from the existing configuration. You're proposing to show all known possibilities even if they haven't been set before and don't exist in the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I missed that distinction. In that case I'm good with listing what's already in the configuration (keeping this as is)
@iris-garden Would you be interested in collaborating on this set of changes? I know you were starting to think about environment variables and configuration as well. |
I haven't tested the refactored code yet -- would like to see if this was the refactoring you had in mind with Enums. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya this is definitely what I was getting at. Just a couple small comments
ConfigVariableInfo = namedtuple('ConfigVariable', ['help_msg', 'validation']) | ||
|
||
|
||
class ConfigVariable(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya this is what I meant, with one step remaining. If you make the configuration_of
function take a ConfigVariable
instead of a section string and option string, then you don't need the assert, it will be impossible for someone to use configuration_of
without that option being present in this Enum
@app.command() | ||
def set(parameter: str, value: str): | ||
def set(parameter: Ann[str, Arg(help="Configuration variable to set", autocompletion=complete_config_variable)], value: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make class ConfigVariable(Enum)
into class ConfigVariable(Enum, str)
, then you can make this parameter Ann[ConfigVariable
and typer will automatically reject inputs that aren't part of the enum and do the conversion to the corresponding Enum variant for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't figure out how to do the import to be able to use that class annotation without the from hailtop.config import ConfigVariable
at the top level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya it'll have to be at the top level. But we only need that single enum definition, we can put that in its own file that's cheap to import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, just a few small nits.
deprecated_envvar: Optional[str] = None) -> Union[str, T]: | ||
config_variable_info = config_variables()[config_variable] | ||
section = config_variable_info.section | ||
option = config_variable_info.option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of storing the section and option again in a map, you can do config_variable.value.split("/")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we went with your suggestion, we'd have to have something like this. Which do you prefer?
if '/' in config_variable.value:
section, option = config_variable.value.split('/')
else:
section = 'global'
option = config_variable.value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we went with your suggestion, we'd have to have something like this. Which do you prefer?
if '/' in config_variable.value: section, option = config_variable.value.split('/') else: section = 'global' option = config_variable.value
I would prefer this. If I understand correctly it would allow us to delete the repeated section
and option
information in config_variables
|
||
validations = {var: var_info.validation for var, var_info in config_variables().items()} | ||
|
||
validation_func, msg = validations.get(parameter, (lambda _: True, '')) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to assert now that parameter
is in the map. We can make a test that asserts every ConfigVariable
is in config_variables
(the ConfigVariable
class itself is iterable so you can loop over all the variants of the enum).
# necessary for backwards compatibility test | ||
os.environ['XDG_CONFIG_HOME'] = config_dir | ||
_set('global', 'email', 'johndoe@gmail.com') | ||
_set('batch', 'foo', '5') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prevents _set
from changing in ways that are potentially valid going forward. I think the real backwards compatibility concern here is being able to read old configs, so let's do that directly by open
ing the file and writing these configs in.
assert res.exit_code == 0 | ||
assert res.stdout.strip() == 'johndoe@gmail.com' | ||
|
||
res = bc_runner.invoke(cli.app, ['get', 'batch/foo'], catch_exceptions=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good catch and I think we should proceed in this PR as you have it currently. I feel like at some point though we should somehow communicate to users that they have bogus settings in their config. Maybe something we can discuss at team meeting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be part of a future hailctl config check
where we also check the remote_tmpdir is correct for the regions and not cold storage.
|
||
def test_config_unset_unknown_name(runner: CliRunner): | ||
res = runner.invoke(cli.app, ['unset', 'foo'], catch_exceptions=False) | ||
assert res.exit_code == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good call here about allowing arbitrary str
in unset.
Should be all set except for the question in the comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responded in comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome change. Thank you for bearing with all the Enum changes, I think this will be very robust.
A test failed because |
@@ -96,10 +116,13 @@ def unset(parameter: str): | |||
del config[section][key] | |||
with open(config_file, 'w', encoding='utf-8') as f: | |||
config.write(f) | |||
else: | |||
print(f"Error: unknown parameter {parameter!r}", file=sys.stderr) | |||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, let's just allow this. I don't see a problem with unset
ing a value that is not currently set.
Sorry - one more thing I need help with. There's a cyclical import with the RouterFS in |
So it looks like that circular import is because of
Can you move |
I think we're finally good on this. Thanks for the help! |
To get this to work, I had to run the following command:
And then add this line to the end of the ~/.zshrc file:
autoload -Uz compinit && compinit