Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/sas token final #13140

Merged
merged 23 commits into from Jun 8, 2023
Merged

Feature/sas token final #13140

merged 23 commits into from Jun 8, 2023

Conversation

gregsmi
Copy link
Contributor

@gregsmi gregsmi commented Jun 2, 2023

Note this PR replaces the previous Feature/sas token merge because the original PR branch got jacked up beyond repair. All the comments on the earlier PR are responded to there and addressed in the code for this one.

This PR is to enable hail-az/https Azure file references to contain SAS tokens to enable bearer-auth style file access to Azure storage. Basic summary of the changes:

  • Update AzureAsyncFS url parsing function to look for and separate out a SAS-token-like query string. Note: made fairly specific to SAS tokens - generic detection of query string syntax interferes with glob support and '?' characters in file names
  • Added generate_sas_token convenience function to AzureAsyncFS. Adds new azure-mgmt-storage package requirement.
  • Updated AzureAsyncFS to use (account, container, credential) tuple as internal BlobServiceClient cache key
  • Updated AzureAsyncFSURL and AzureFileListEntry to track the token separately from the name, and extend the base classes to allow returning url with or without a token
  • Update RouterFS.ls function and associated listfiles function to allow for trailing query strings during path traversal
  • Update AsyncFS.open_from function to handle query-string urls in zero-length case
  • Change to existing behavior: LocalAsyncFSURL.__str__ no longer returns 'file:' prefix. Done to make str() output be appropriate for input to fs functions across all subclasses
  • Updated InputResource to not include the SAS token as part of the destination file name
  • Updated inter_cloud/test_fs.py to generically use query-string-friendly file path building functions to respect the new model, where it is no longer safe to extend URLs by just appending new segments with + "/" because there may be a query string, and added 'sas/azure-https' test case to the fixture. Running tests for the SAS case requires some new test variables to allow the test code to generate SAS tokens (build.yaml/test_hail_python_fs):
      # Required for SAS testing on Azure
      export HAIL_TEST_AZURE_RESGRP=haildev
      export HAIL_TEST_AZURE_SUBID=12ab51c6-da79-4a99-8dec-3d2decc97343

@gregsmi gregsmi mentioned this pull request Jun 2, 2023
@daniel-goldstein daniel-goldstein self-assigned this Jun 2, 2023
Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies for all the conflicts with your old branch! Thanks a lot for this contribution, this looks great.

@daniel-goldstein
Copy link
Contributor

daniel-goldstein commented Jun 5, 2023

@gregsmi Looks like the scopes we currently have set are insufficient for reading the storage account keys. Do you know best practice permissions for creating SAS tokens?

The client '96fe73da-25e0-4a69-9cd8-0043e56d0d0a' with object id '96fe73da-25e0-4a69-9cd8-0043e56d0d0a' does not have authorization to perform action 'Microsoft.Storage/storageAccounts/listKeys/action' over scope '/subscriptions/22cd45fe-f996-4c51-af67-ef329d977519/resourceGroups/haildev/providers/Microsoft.Storage/storageAccounts/hailtest' or the scope is invalid. If access was recently granted, please refresh your credentials.

@gregsmi
Copy link
Contributor Author

gregsmi commented Jun 6, 2023

ah yes - the identity used to create the SAS token needs to have a control plane role on the Storage Account - Owner, Contributor, or (most specific) Storage Account Key Operator Service Role... Is that a manageable role to configure for testing or should I try to explore alternatives in the generation?

@daniel-goldstein
Copy link
Contributor

ah yes - the identity used to create the SAS token needs to have a control plane role on the Storage Account - Owner, Contributor, or (most specific) Storage Account Key Operator Service Role... Is that a manageable role to configure for testing or should I try to explore alternatives in the generation?

Thanks! This is totally fine, I'll just configure the SP that we use for the inter-cloud tests with the key operator role and re-run the tests.

@danking danking merged commit d647b6a into hail-is:main Jun 8, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants