[AWS] Enable p4de in the catalog #1827

Merged: 5 commits merged into master from p4de on Apr 1, 2023

@Michaelvll (Collaborator) commented Mar 31, 2023

Since our AWS credentials do not have permission to query the offerings for the A100-80GB instance (p4de.24xlarge), we now hardcode them in fetch_aws so that users are no longer blocked from using that instance.

Fixes #1201

Tested (run the relevant ones):

  • Any manual or new tests for this PR (please specify below)
    • python -m sky.clouds.service_catalog.data_fetchers.fetch_aws
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh
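A minimal sketch of the hardcoding approach described above, assuming the fetched catalog is a list of row dicts before it is written to CSV. This is not the code from this PR: the column names, the us-east-1 region, and the helpers merge_hardcoded and to_csv are hypothetical. The p4de.24xlarge specs (96 vCPUs, 1152 GiB memory, 8x A100 80GB) are AWS's published figures.

```python
import csv
import io
from typing import Dict, List

# Hypothetical hardcoded offering for the A100-80GB instance. The column names
# and region are illustrative assumptions, not SkyPilot's actual catalog schema.
HARDCODED_OFFERINGS: List[Dict[str, str]] = [
    {
        "InstanceType": "p4de.24xlarge",
        "AcceleratorName": "A100-80GB",
        "AcceleratorCount": "8",
        "vCPUs": "96",
        "MemoryGiB": "1152",
        "Region": "us-east-1",
    },
]


def merge_hardcoded(fetched: List[Dict[str, str]],
                    hardcoded: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Append hardcoded rows whose (InstanceType, Region) was not fetched."""
    seen = {(row["InstanceType"], row["Region"]) for row in fetched}
    merged = list(fetched)  # Copy so the caller's list is not mutated.
    for row in hardcoded:
        key = (row["InstanceType"], row["Region"])
        if key not in seen:
            merged.append(row)
            seen.add(key)
    return merged


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize non-empty rows to CSV text, using the first row's keys as header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Skipping rows that were already fetched means that if the credentials later gain permission to query p4de offerings, the queried row takes precedence over the hardcoded one, and merging twice is harmless.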

@concretevitamin (Collaborator) left a comment

Thanks @Michaelvll, some questions.

Also: after users upgrade past this PR, do they need to do anything, or do they have to wait up to 7 hours for the next catalog refresh? I think it's OK, just checking.

@concretevitamin (Collaborator)

One problem: I ran the fetcher locally, but the fetched catalog doesn't contain any A100-80GB.

@Michaelvll (Collaborator, Author)

Also: after users upgrade past this PR, do they need to do anything, or do they have to wait up to 7 hours for the next catalog refresh? I think it's OK, just checking.

Oops, missed this. The user does not have to do anything; the catalog will be updated automatically.

One problem: I ran the fetcher locally, but the fetched catalog doesn't contain any A100-80GB.

That is weird; it works with my account. Could you try the same command we run in the catalog repo?

python -m sky.clouds.service_catalog.data_fetchers.fetch_aws --no-az-mappings --check-all-regions-enabled-for-account

@concretevitamin (Collaborator)

python -m sky.clouds.service_catalog.data_fetchers.fetch_aws

succeeded but did not produce A100-80GB.

python -m sky.clouds.service_catalog.data_fetchers.fetch_aws --no-az-mappings --check-all-regions-enabled-for-account

failed with

RuntimeError: The following regions are not enabled: {'eu-south-1', 'af-south-1', 'me-central-1', 'ap-southeast-3', 'me-south-1', 'ap-east-1'}

Approving for now to unblock potential usage.
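The regions in the RuntimeError above (af-south-1, ap-east-1, ap-southeast-3, eu-south-1, me-central-1, me-south-1) are all AWS opt-in regions, which are disabled for an account by default. The following is a minimal sketch of what a check like --check-all-regions-enabled-for-account plausibly does; we have not confirmed how fetch_aws actually implements the flag, and the helpers regions_not_enabled and check_all_regions_enabled are hypothetical. The input is assumed to have the shape returned by boto3's ec2 describe_regions(AllRegions=True), which is passed in directly so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, Set


def regions_not_enabled(describe_regions_response: Dict[str, Any]) -> Set[str]:
    """Return the regions the account has not opted into.

    Expects a dict shaped like the EC2 DescribeRegions response with
    AllRegions=True: {"Regions": [{"RegionName": ..., "OptInStatus": ...}]}.
    """
    return {
        region["RegionName"]
        for region in describe_regions_response["Regions"]
        if region.get("OptInStatus") == "not-opted-in"
    }


def check_all_regions_enabled(describe_regions_response: Dict[str, Any]) -> None:
    """Raise RuntimeError if any region is not enabled (hypothetical check)."""
    disabled = regions_not_enabled(describe_regions_response)
    if disabled:
        raise RuntimeError(f"The following regions are not enabled: {disabled}")
```

Under this reading, the failure reflects the account configuration (opt-in regions not enabled) rather than a bug in the fetcher.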

@Michaelvll (Collaborator, Author)

Merging this for now to unblock users trying the A100-80GB instances on AWS. Let's debug offline, @concretevitamin. :)

@Michaelvll Michaelvll merged commit f8eeeea into master Apr 1, 2023
15 checks passed
@Michaelvll Michaelvll deleted the p4de branch April 1, 2023 06:49
@Michaelvll (Collaborator, Author)

Just confirmed in the catalog repo: p4de.24xlarge is now included in the catalog file.
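A quick way to reproduce this kind of check on a locally generated catalog (for example, after running fetch_aws as above) is to read the CSV and look for the expected instance types. A minimal sketch, assuming the catalog CSV has an InstanceType column; the helper names are hypothetical and the catalog path is left to the caller.

```python
import csv
from typing import Iterable, Set


def instance_types_in_catalog(csv_lines: Iterable[str]) -> Set[str]:
    """Collect distinct InstanceType values from an open CSV file or list of lines."""
    return {row["InstanceType"] for row in csv.DictReader(csv_lines)}


def missing_instance_types(csv_lines: Iterable[str],
                           expected: Iterable[str]) -> Set[str]:
    """Return the expected instance types that do not appear in the catalog."""
    return set(expected) - instance_types_in_catalog(csv_lines)
```

Usage: open the generated catalog file with open(path, newline="") and pass the file object with ["p4de.24xlarge"]; an empty set means the instance type is present.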

Successfully merging this pull request may close these issues: p4de.24xlarge not supported