Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spot] Fix AWS NoCredentialError caused by credential rotation #2695

Merged
merged 17 commits into from
Oct 19, 2023

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Oct 11, 2023

Fixes #2697

We fixed the cache for session.resource in #2675. However, it causes a user using SSO getting NoCredentialError when running spot jobs, which might be because of the rotation of the credential used by the cached session.resource. We avoid using cached resource, but create a new one to trigger refresh of the credential.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • sky spot launch --cloud aws --num-nodes 2 --cpus 2+ sleep 10000000; run with no static credential on spot controller for 11 hours with no issue.
  • All smoke tests: pytest tests/test_smoke.py --aws
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll changed the title [AWS] Fix NoCredentialError caused by cached session.resource [Spot] Fix AWS NoCredentialError caused by credential rotation Oct 11, 2023
Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice @Michaelvll! Some comments while tests/repro are underway.

Since this touches inner loops, we may need to run AWS smoke tests as well.

sky/provision/__init__.py Outdated Show resolved Hide resolved
sky/adaptors/aws.py Show resolved Hide resolved
sky/adaptors/aws.py Outdated Show resolved Hide resolved
sky/adaptors/aws.py Outdated Show resolved Hide resolved
sky/adaptors/aws.py Outdated Show resolved Hide resolved
sky/adaptors/aws.py Show resolved Hide resolved
sky/adaptors/aws.py Show resolved Hide resolved
@@ -37,6 +38,11 @@

version = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to bump this and/or do the importlib reload hack?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no API changes for this file in the file, so it should be fine not bumping the version.

Copy link
Collaborator

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @Michaelvll. We should run AWS smoke tests before merging?

sky/adaptors/aws.py Outdated Show resolved Hide resolved
sky/provision/__init__.py Outdated Show resolved Hide resolved
@Michaelvll Michaelvll merged commit a3311f6 into master Oct 19, 2023
18 checks passed
@Michaelvll Michaelvll deleted the fix-credential-issue-with-aws branch October 19, 2023 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Spot] NoCredentialError for spot jobs
2 participants