Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[autoscaler][aws] Update _configure_key_pair to available_node_types #15108

Closed

Conversation

DmitriGekhtman
Copy link
Contributor

@DmitriGekhtman DmitriGekhtman commented Apr 3, 2021

Why are these changes needed?

Closes #15096
Addresses key_pair part of #14856

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Added tests to check key-pair filling logic and assertion error on missing KeyName.

Did manual test: Can launch AWS cluster by specifying KeyName under available node types.

Copy link
Contributor

@ijrsvt ijrsvt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you modify log_to_cli to also check available node types?

Comment on lines +229 to +246
def test_missing_key(iam_client_stub):
"""
Check that an error is raised when ssh_private_key is given in the
auth_config but a node type is missing KeyName.
"""
config = helpers.load_aws_example_config_file("example-full.yaml")
config["auth"]["ssh_private_key"] = "~/fake/path"
config["available_node_types"]["ray.head.default"]["node_config"][
"KeyName"] = "fakeKey"

# bootstrap_config configures iam role and then raises an assertion error
stubs.configure_iam_role_default(iam_client_stub)
with pytest.raises(AssertionError):
helpers.bootstrap_aws_config(config)

# expect no pending responses left in IAM client stub queue
iam_client_stub.assert_no_pending_responses()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add one more test to ensure the UserData configuration works correctly? i.e. having the ssh_private_key passed in and having ["node_config"]["UserData"] set to a non-empty value

Copy link
Contributor

@ijrsvt ijrsvt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overall looks good! Thanks for doing this. OOC, would this break for workflows where a user is using available_node_types and also setting data in head_node and worker_nodes?

@DmitriGekhtman
Copy link
Contributor Author

This overall looks good! Thanks for doing this. OOC, would this break for workflows where a user is using available_node_types and also setting data in head_node and worker_nodes?

Eh, yeah, thanks -- need to consider this carefully, as that's way it currently has to be done... In an initial version, had a check for that, then got rid of it..

@ijrsvt
Copy link
Contributor

ijrsvt commented Apr 5, 2021

@DmitriGekhtman I think it might be good to support that workflow for a bit longer?

What are your thoughts @AmeerHajAli

@DmitriGekhtman
Copy link
Contributor Author

@DmitriGekhtman I think it might be good to support that workflow for a bit longer?

What are your thoughts @AmeerHajAli

Yes, for sure. No need to change things suddenly.
Shouldn't require too many more lines of code.

@DmitriGekhtman DmitriGekhtman added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 5, 2021
@simon-mo simon-mo removed their assignment Apr 7, 2021
if "UserData" not in node_config:
cli_logger.doassert( # todo: verify schema beforehand?
"KeyName" in node_config,
_key_assert_msg(node_type)) # todo: err msg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this todo?

for node_type in node_types:
node_config = node_types[node_type]["node_config"]
if "UserData" not in node_config:
cli_logger.doassert( # todo: verify schema beforehand?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this todo?

Comment on lines +345 to +346
node_config["KeyName"] == DEFAULT_KEY_PAIR["KeyName"]
node_config.pop("KeyName")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you trying to do?
I don't understand

config["auth"]["ssh_private_key"] = "~/fake/path"
config["available_node_types"]["ray.head.default"]["node_config"][
"KeyName"] = "fakeKey"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets explicitly unset the "ray.worker_default" to make it easier to read this code and understand that there is another key that does not have a KeyName.

Comment on lines -353 to +355
config["head_node"]["KeyName"] = key_name
config["worker_nodes"]["KeyName"] = key_name
for node_type in node_types:
node_config = node_types[node_type]["node_config"]
node_config["KeyName"] = key_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When doing something like this, we should explicitly raise a warning saying that KeyName in head_node/worker_nodes are deprecated and they must be passed in the node config.

Copy link
Contributor

@AmeerHajAli AmeerHajAli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we throw a warning for using worker_nodes and head_node and mention that they are deprecated and suggest using multi node types??

@AmeerHajAli
Copy link
Contributor

@DmitriGekhtman , closing this as it is no longer necessary, right?

@DmitriGekhtman
Copy link
Contributor Author

DmitriGekhtman commented May 25, 2021

@DmitriGekhtman , closing this as it is no longer necessary, right?

Right, superseded by #15584!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KeyName doesn't work with multi-node type autoscaler.
4 participants