Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Azure User Identity for credential-less operation on Azure Batch #3314

Closed
abhi18av opened this issue Oct 24, 2022 · 21 comments · Fixed by #4897
Closed

Integrate Azure User Identity for credential-less operation on Azure Batch #3314

abhi18av opened this issue Oct 24, 2022 · 21 comments · Fixed by #4897
Assignees

Comments

@abhi18av
Copy link
Member

New feature

There are basically three modes of authentication which we can explore

  1. Service Principal
  2. User-assigned Identity
  3. Managed Identity

The user-assigned identity requires certain metadata to be made available to an Azure Resource such as client ID, Object ID, or Resource ID but beyond this, is effectively the same as Managed Identity (a user-identity managed by the Azure platform).

TLDR; The use of user/managed identity

  • Would only require us to do an azcopy login --managed within .command.run
  • Requires certain properties to be enabled in the Batch account (such as Identity = SystemAssigned and Storage account authentication = "BatchAccountManagedIdentity"
  • Requires the Managed Identity to have sufficient permissions

Managed/User Identity can be used with all major services which makes sense in Nextflow (and Tower) context such as

  • Monitoring
  • Logging
  • Managed Disk
  • Azure ARC Kubernetes
  • Key vault

Usage scenario

This enhancement would pave the way for a completely credential less deployment of Nextflow on Azure Batch.

Suggest implementation

The foundation of this enhancement would be the activeDirectory scope introduced in #3132

@abhi18av abhi18av changed the title Integrate Azure User Identity for credential less operation on Azure Batch Integrate Azure User Identity for credential-less operation on Azure Batch Oct 24, 2022
@abhi18av abhi18av self-assigned this Oct 24, 2022
@adamrtalbot
Copy link
Collaborator

A recent change on Azure means that low priority VMs have been replaced on Azure Batch. To use these, Azure Batch accounts must be in User Subscription mode. Once an account is in User Subscription mode, you must use AAD to authenticate, no more keys!

image

This means it is no longer possible to use low-priority VMs on Azure Batch via Nextflow. It's still possible to use dedicated VMs on a Batch account which uses the Batch Service as it's pool allocation mode, but I imagine this is on some sort of deprecation pathway.

Long story short, AAD based authentication is pretty critical for continued use of Azure Batch.

Thanks for your work so far @abhi18av, I'll happily volunteer some time and energy to testing if you need it.

@abhi18av
Copy link
Member Author

abhi18av commented Oct 29, 2022

Thanks @adamrtalbot , to address the use-case you have highlighted I have already implemented a service principal based auth here #3132 - I'd love to know hear your thoughts if this would address the immediate concerns?

The use of User-assigned Identity and Managed Identity would build upon the solution implemented in the indicated PR.

@adamrtalbot
Copy link
Collaborator

Thanks @abhi18av, I was getting a bit mixed up with layers of Microsoft AAD.

Could the above ticket allow us to run on Azure Batch without any keys? So we could use az login or a Managed Identity to authenticate? Or is this scope internal use within Nextflow?

@abhi18av
Copy link
Member Author

Could the above ticket allow us to run on Azure Batch without any keys? So we could use az login or a Managed Identity to authenticate? Or is this scope internal use within Nextflow?

Exactly, that's the plan 💯

Also, just to clarify the Managed Identity solution afaict could only be used from a VM which has been deployed with proper configuration and then NF head job should be able to offload the responsibility of authentication to the VM's own Managed Identity.

@adamrtalbot
Copy link
Collaborator

Sounds perfect!

@stale

This comment was marked as off-topic.

@stale stale bot added the stale label Jun 10, 2023
@pditommaso
Copy link
Member

What's missing to achive this?

@pditommaso pditommaso reopened this Jun 12, 2023
@stale stale bot removed the stale label Jun 12, 2023
@pditommaso
Copy link
Member

@adamrtalbot this is challenge for you 😉

@adamrtalbot
Copy link
Collaborator

@abhi18av did you make any progress on this when you were working on it? If I get a chance I will take another look.

@abhi18av
Copy link
Member Author

Hey @adamrtalbot , I assume that what you're aiming for is user/managed identity now right?

@adamrtalbot
Copy link
Collaborator

Yup.

@abhi18av
Copy link
Member Author

abhi18av commented Sep 30, 2023

I'm a bit out of touch on this, perhaps we could sync sometime on Monday?

Happy to pick this up or guide you through depending on what's needed.

@abhi18av
Copy link
Member Author

abhi18av commented Oct 2, 2023

Okay, here are the things @adamrtalbot and @abhi18av discussed in the meeting today to address the credential-less operations for azcopy (and also fusionfs)

@adamrtalbot
Copy link
Collaborator

adamrtalbot commented Dec 1, 2023

OK I've managed this, it seems pretty straightforward.

Firstly, I made an Azure Batch pool which was the same as the normal Nextflow pools, but with two changes:

  • The identity was a user assigned identity, which I selected from a set. I just used the UI so far and selected from a list. Docs: https://learn.microsoft.com/en-us/azure/batch/managed-identity-pools. Would be worth someone who actually understands Azure identities to check this out.
  • Modified the startTask to download a more recent version of AzCopy. Not a big deal. The script was this:
bash -c "tar -xzvf azcopy.tar.gz && chmod +x azcopy*/azcopy && mkdir $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy*/azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/"

and resource file was used this URL: https://aka.ms/downloadazcopy-v10-linux to file azcopy.tar.gz.

After this, you need to set some env variables to tell azcopy to authenticate automatically. This is unique to azcopy but I imagine there would be something similar if we used this system to access the Azure Key Vault to enable secrets. This was pretty straightforward with the env directive (and nothing too secret here). Docs here: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-authorize-azure-active-directory#authorize-by-using-a-system-wide-managed-identity.

I then ran the following Nextflow pipeline. I used Fusion to try and remove any secret azcopy authentication that may have occured. I also ran it on a different batch pool where it failed with this error message: Failed to perform Auto-login: ManagedIdentityCredential: no default identity is assigned to this resource.

main.nf:

process AZCOPY {
    container 'quay.io/nf-core/ubuntu:20.04'

    output:
        path("nf-1JnhkGMxTKrBuU.log"), emit: myFile

    """
    \$AZ_BATCH_NODE_SHARED_DIR/bin/azcopy copy 'https://account.blob.core.windows.net/path/nf-1JnhkGMxTKrBuU.log' nf-1JnhkGMxTKrBuU.log
    cat nf-1JnhkGMxTKrBuU.log
    """
}

workflow {
    AZCOPY()
}

nextflow.config:

workDir = "$AZURE_BATCH_WORK_DIR"

tower {
    enabled = true
}

wave {
    enabled = true
}

fusion {
    enabled = true
}

env {
    AZCOPY_JOB_PLAN_LOCATION = "."
    AZCOPY_AUTO_LOGIN_TYPE   = "MSI"
    AZCOPY_LOG_LOCATION      = "."
}

process {
    executor = 'azurebatch'
    queue = 'managed-identity'
}

azure {
    storage {
        accountName   = "$AZURE_STORAGE_ACCOUNT_NAME"
    }
    batch {
        location      = "$AZURE_BATCH_ACCOUNT_REGION"
        accountName   = "$AZURE_BATCH_ACCOUNT_NAME"
    }
}

All-in-all, it's just a couple of API call changes and updating azcopy, so nothing too troubling. Users will have to create a managed identity with the correct permissions so docs may take a bit longer because it's a bit fiddly. @vsmalladi would you or someone from msft be able to help?

Will try Azure Key Vault now.

@pditommaso
Copy link
Member

Worth mentioning the authentication of storage is not a problem. The use of SAS (temporary) tokens is fair enough.

The biggest problem is the authentication for Batch API.

@adamrtalbot
Copy link
Collaborator

The biggest problem is the authentication for Batch API.

What's missing from the existing Entra integration?

@pditommaso
Copy link
Member

pditommaso commented Dec 1, 2023

As long as you need to share the servicePrincipalSecret is useless from my point of view. We need a share-nothing solution

@adamrtalbot
Copy link
Collaborator

Yes, especially when every other Azure service can be authenticated by calling DefaultAzureCredential() it's frustrating that Batch is left behind.

But these changes would enable more secure storage access, secrets etc.

@abhi18av
Copy link
Member Author

abhi18av commented Dec 6, 2023

#3314 (comment) - thanks @adamrtalbot, I believe that with this we could have this implemented in the default setup too!

Also the managed identity solution should work for the use case Paolo mentioned, I just never got around to playing with the Azure Batch configs for this one 🙈

Now that we have some credits, I'll pick this up again and share the updates here 🤞

@adamrtalbot
Copy link
Collaborator

Thanks Abhinav, can we use a managed identity to authenticate with a batch service? I know we can with storage, key vault etc. but didn't think it was possible with Batch. Some googling suggests it is possible but you have to write some code to do it: https://stackoverflow.com/questions/76326790/how-to-use-user-managed-identity-to-access-azure-batchclient-programatically

@abhi18av
Copy link
Member Author

abhi18av commented Dec 6, 2023

Yes, it is possible. From what I can recall with the efforts done in #3132 (comment) I was able to use managed identity and user-assigned identity through some operations on the Azure portal.

I decided to focus only on the servicePrincipal at the time since azcopy also works with it and the PR was becoming too big.

I'll have to poke around in the Git history/branches but I'm sure that this operation was implemented (or sketched out at least) 🤞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants