Azure Storage Support #101

Closed
1 of 3 tasks
DevSecNinja opened this issue May 9, 2022 · 17 comments · Fixed by #171
Labels
enhancement New feature or request

Comments


DevSecNinja commented May 9, 2022

  • I'm submitting a ...

    • [ ] bug report
    • [x] feature request
    • [ ] support request
  • What is the expected behavior?

Enabling docker-volume-backup to back up to Azure Storage.

  • What is the motivation / use case for changing the behavior?

Azure Storage is a great solution (especially with cold storage) to back up Docker volumes to. I don't have any experience with Go, but based on the AWS implementation I might be able to add support for Azure Storage. Additionally, lots of professionals working with Microsoft technology have a work subscription they would like to use (e.g. via a Visual Studio subscription).

SDK Docs: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#pkg-overview

Thanks for this awesome tool!


m90 commented May 9, 2022

This would be a great addition indeed.

One question upfront: is it possible to spin up an Azure Blob Storage-compatible tool in a Docker container so it can be used in tests (and development) for this image? I would like to avoid (a) creating an Azure account just to be able to support this feature and (b) running tests against remote infrastructure. Ideally, this could work like S3, where the tests run a MinIO server, or WebDAV, where an instance of bytemark/webdav is running.


DevSecNinja commented May 9, 2022

> This would be a great addition indeed.
>
> One question upfront: is it possible to spin up an Azure Blob Storage-compatible tool in a Docker container so it can be used in tests (and development) for this image? I would like to avoid (a) creating an Azure account just to be able to support this feature and (b) running tests against remote infrastructure. Ideally, this could work like S3, where the tests run a MinIO server, or WebDAV, where an instance of bytemark/webdav is running.

Absolutely and totally understandable! It can be done with Azurite: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite

See also this chapter to configure it with the SDK: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&tabs=visual-studio#authorization-for-tools-and-sdks

For the Docker container, see: https://hub.docker.com/_/microsoft-azure-storage-azurite

If you use CI/CD pipelines, here is a great example of automating all of that: https://docs.microsoft.com/en-us/azure/storage/blobs/use-azurite-to-run-automated-tests
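For local development along the lines suggested above, a minimal Compose service for Azurite might look like this. This is a sketch, not configuration from the project: the service name `storage` and the host port mapping are assumptions, while the image name and port 10000 are the documented Blob-service defaults.

```yaml
services:
  storage:
    # Official Azurite image from Microsoft Container Registry.
    image: mcr.microsoft.com/azure-storage/azurite
    # Run only the Blob service and listen on all interfaces so other
    # containers on the Compose network can reach it.
    command: azurite-blob --blobHost 0.0.0.0
    ports:
      - "10000:10000" # default Blob endpoint port
```

Azurite ships with the well-known development account `devstoreaccount1` and a fixed key, so no real Azure credentials are required for tests.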


m90 commented May 9, 2022

Sounds good. If you or someone else wants to pick this up right away, I am happy to help with getting PRs merged (it should be relatively straightforward). Otherwise I'll keep it on the roadmap so we can get this supported sooner or later. Thanks for the suggestion.

m90 added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on May 9, 2022
@DevSecNinja (Author)

> Sounds good. If you or someone else wants to pick this up right away, I am happy to help with getting PRs merged (it should be relatively straightforward). Otherwise I'll keep it on the roadmap so we can get this supported sooner or later. Thanks for the suggestion.

Awesome! I must say that I'm also a part time CS grad student besides work, so I can't make any commitments. But it's high on my list to pick up when I have some spare time. Thanks!


m90 commented May 9, 2022

> so I can't make any commitments

No need to worry here.


m90 commented May 13, 2022

In case anyone starts working on this, it'd be wise to check in on the progress of #94, which might make implementing this feature a bit easier (or might change the approach entirely).


m90 commented Dec 24, 2022

@DevSecNinja I created a PR that adds support for Azure Blob Storage in #171. It seems to work well against Azurite, but as I have zero experience in the Azure ecosystem, I am wondering if I am doing something "wrong" here (e.g. a non-standard way of passing credentials and such). If you're interested, I would be very happy if you could give this PR a quick sanity check and let me know if you think it makes sense like this. Thank you 🎩

m90 removed the pr welcome label on Dec 24, 2022
@DevSecNinja (Author)

Really cool, thanks so much @m90! The PR looks good, but I'm not a Go expert! :) I made a small comment on the PR.

Regarding authentication, I always like to use managed identities. I have them on my Azure VMs and on my lab VMs via Azure Arc. A managed identity basically allows you to authenticate to resources without a password (and is therefore often safer). With an RBAC role, you can just give read/write access to the VM's managed identity, and the application can easily authenticate via azidentity.NewDefaultAzureCredential(nil): https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication?tabs=powershell

If you're interested in implementing this, you could think of running azidentity.NewDefaultAzureCredential(nil) if no Azure Storage account key was provided. In that case, the Go client will try the options defined in the Learn article I shared earlier, including managed identity auth.
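The fallback described above can be sketched as follows. This is illustrative only, not the code from #171: the env var name `AZURE_STORAGE_ACCOUNT_KEY` and the helper `pickAuthMode` are made up for this sketch, while `azblob.NewSharedKeyCredential` and `azidentity.NewDefaultAzureCredential` (mentioned in the comments) are the real SDK constructors referenced in this thread.

```go
package main

import (
	"fmt"
	"os"
)

// pickAuthMode decides how the storage client should authenticate:
// an explicit account key wins; otherwise fall back to the chain behind
// azidentity.NewDefaultAzureCredential (environment variables, managed
// identity, Azure CLI login, ...).
func pickAuthMode(accountKey string) string {
	if accountKey != "" {
		// Here you would construct a credential via
		// azblob.NewSharedKeyCredential(accountName, accountKey).
		return "shared-key"
	}
	// Here you would call azidentity.NewDefaultAzureCredential(nil),
	// which covers managed-identity auth on Azure VMs / Azure Arc.
	return "default-azure-credential"
}

func main() {
	// AZURE_STORAGE_ACCOUNT_KEY is an assumed variable name for this sketch.
	fmt.Println(pickAuthMode(os.Getenv("AZURE_STORAGE_ACCOUNT_KEY")))
}
```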


m90 commented Dec 25, 2022

> If you're interested in implementing this, you could think of running azidentity.NewDefaultAzureCredential(nil) if no Azure Storage account key was provided.

This sounds great and is exactly why I was asking. I'll implement it like this.

@DevSecNinja (Author)

> If you're interested in implementing this, you could think of running azidentity.NewDefaultAzureCredential(nil) if no Azure Storage account key was provided.

> This sounds great and is exactly why I was asking. I'll implement it like this.

Awesome, thanks! If you could push out a prerelease Docker image, I'd be happy to give it a spin in my lab before the release.


m90 commented Dec 25, 2022

There now is v2.24.0-pre.1 on Docker Hub.

If you find the time to give this a test drive, it'd be much appreciated. Updated documentation is here: https://github.com/offen/docker-volume-backup/blob/azure-storage/README.md


DevSecNinja commented Dec 25, 2022

> There now is v2.24.0-pre.1 on Docker Hub.
>
> If you find the time to give this a test drive, it'd be much appreciated. Updated documentation is here: https://github.com/offen/docker-volume-backup/blob/azure-storage/README.md

That's awesome, thanks for the Christmas present! 😄 I'll test the following soon:

  • Storage Account via public endpoint (via AZURE_STORAGE_ACCOUNT_NAME)
  • Storage Account via private endpoint (via AZURE_STORAGE_ENDPOINT)
  • Storage Account with Access Key
  • Storage Account with Managed Identity

Let me know if I'm missing a test case!

Storage Account via private endpoint (via AZURE_STORAGE_ENDPOINT)

  1. Noticed in testing with AZURE_STORAGE_ENDPOINT that if the storage account cannot be accessed for whatever reason, the email log (with info notification level) didn't tell me that the blob upload failed. Is that expected?
  2. AZURE_STORAGE_ENDPOINT seems to require a trailing slash. That's not very clear from the docs; I would suggest either throwing an error in the log if the string doesn't end with a slash or simply appending the slash in the code.
  3. AZURE_STORAGE_ENDPOINT doesn't seem to work with public (xyz.blob.core.windows.net) or private (privatelink.blob.core.windows.net) endpoints. Not sure why, but the log doesn't tell me anything:
Attaching to backup
backup                | Installing cron.d entry with expression @daily.
backup                | crond: crond (busybox 1.35.0) started, log level 8
backup                | Starting cron in foreground.
time="2022-12-25T15:08:18Z" level=info msg="Created backup of `/backup` at `/tmp/backup-2022-12-25T15-05-14.tar.gz`."
time="2022-12-25T15:08:46Z" level=info msg="Encrypted backup using given passphrase, saving as `/tmp/backup-2022-12-25T15-05-14.tar.gz.gpg`."
time="2022-12-25T15:09:02Z" level=info msg="Removed tar file `/tmp/backup-2022-12-25T15-05-14.tar.gz`."
time="2022-12-25T15:09:02Z" level=info msg="Removed GPG file `/tmp/backup-2022-12-25T15-05-14.tar.gz.gpg`."
time="2022-12-25T15:09:04Z" level=info msg="Finished running backup tasks."


m90 commented Dec 26, 2022

> Noticed in testing with AZURE_STORAGE_ENDPOINT that if the storage account cannot be accessed for whatever reason, the email log (with info notification level) didn't tell me that the blob upload failed. Is that as expected?

Hmm, this is kind of mysterious. If something fails, there should be an error and a non-zero exit. No errors are swallowed in the code and I've seen it fail a lot in development, too. I need to dig into the code of the SDK to see if there are any conditions that could cause a failing upload while still not returning an error.

When I deliberately break the test case by passing a bad account name, I will get the following log output:

➜  azure git:(azure-storage) ✗ TEST_VERSION=v2.24.0-pre.1 docker-compose up -d
Recreating azure_backup_1 ... 
azure_storage_1 is up-to-date
Recreating azure_backup_1 ... done
Starting azure_az_cli_1   ... done
➜  azure git:(azure-storage) ✗ docker-compose exec backup backup              
time="2022-12-26T08:09:39Z" level=info msg="Stopping 1 container(s) labeled `docker-volume-backup.stop-during-backup=true` out of 4 running container(s)."
time="2022-12-26T08:09:39Z" level=info msg="Created backup of `/backup` at `/tmp/test.tar.gz`."
time="2022-12-26T08:09:40Z" level=info msg="Restarted 1 container(s) and the matching service(s)."
time="2022-12-26T08:09:40Z" level=error msg="Fatal error running backup: copyArchive: error copying archive: (*azureBlobStorage).Copy: error uploading file /tmp/test.tar.gz: PUT http://storage:10000/devstoreaccount1xxx/test-container/path/to/backup/test.tar.gz\n--------------------------------------------------------------------------------\nRESPONSE 400: 400 Invalid storage account.\nERROR CODE: InvalidOperation\n--------------------------------------------------------------------------------\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n<Error>\n  <Code>InvalidOperation</Code>\n  <Message>Invalid storage account.\nRequestId:cb1979da-5926-4c55-bde8-4d82451cab96\nTime:2022-12-26T08:09:40.565Z</Message>\n</Error>\n--------------------------------------------------------------------------------\n"
time="2022-12-26T08:09:40Z" level=info msg="Removed tar file `/tmp/test.tar.gz`."

so I am not sure why you are seeing a successful exit right now.

> AZURE_STORAGE_ENDPOINT seems to require a slash at the end. That's not very clear from the docs and I would suggest to either throw an error in the log if the string doesn't end with a slash or just add the slash yourself in the code.

Auto-appending a slash if needed probably makes sense.

> AZURE_STORAGE_ENDPOINT doesn't seem to work with public (xyz.blob.core.windows.net) and private (privatelink.blob.core.windows.net) endpoints. Not sure why but the log doesn't tell me anything:

The endpoint needs to include a protocol, e.g. https://xyz.blob.core.windows.net/. Did you pass the values like this?
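The two endpoint issues raised in this exchange (a missing protocol, a missing trailing slash) could be handled with a small normalization step. This is a sketch, not the actual code from #171; the function name `normalizeEndpoint` is made up:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEndpoint rejects endpoints without a protocol and appends a
// missing trailing slash instead of failing on it.
func normalizeEndpoint(endpoint string) (string, error) {
	if !strings.HasPrefix(endpoint, "http://") && !strings.HasPrefix(endpoint, "https://") {
		return "", fmt.Errorf("endpoint %q must include a protocol", endpoint)
	}
	if !strings.HasSuffix(endpoint, "/") {
		endpoint += "/"
	}
	return endpoint, nil
}

func main() {
	e, _ := normalizeEndpoint("https://xyz.blob.core.windows.net")
	fmt.Println(e) // https://xyz.blob.core.windows.net/
}
```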

@DevSecNinja (Author)

Apologies for my late reply; I wanted to dive into it before replying.

Hmm, strange: I'm not seeing those logs. Could it be that AZURE_STORAGE_ACCOUNT_NAME also needs to be set alongside AZURE_STORAGE_ENDPOINT? I assumed that when I set AZURE_STORAGE_ENDPOINT I don't need to set AZURE_STORAGE_ACCOUNT_NAME, but that might be wrong.

> The endpoint needs to include a protocol, e.g. https://xyz.blob.core.windows.net/. Did you pass the values like this?

Yeah, I set it like this: AZURE_STORAGE_ENDPOINT=https://MyAccountName.privatelink.blob.core.windows.net/, and I've also tested with AZURE_STORAGE_ENDPOINT="https://MyAccountName.blob.core.windows.net/".

I'm getting an interesting error now that I'm using AZURE_STORAGE_ENDPOINT in combination with AZURE_STORAGE_ACCOUNT_NAME:

time="2022-12-30T16:25:13Z" level=error msg="Fatal error running backup: copyArchive: error copying archive: (*azureBlobStorage).Copy: error uploading file /tmp/backup-2022-12-30T16-21-39.tar.gz.gpg: parse \"\\\"https://MyAccountName.blob.core.windows.net\\\"/stct-mycontainername/backup-2022-12-30T16-21-39.tar.gz.gpg\": first path segment in URL cannot contain colon"


m90 commented Dec 30, 2022

> Could it be that for AZURE_STORAGE_ENDPOINT, also AZURE_STORAGE_ACCOUNT_NAME needs to be set?

Yes, for the version you are testing against, AZURE_STORAGE_ACCOUNT_NAME always needs to be set if you want to target Azure Blob Storage.

> I'm getting an interesting error now that I'm using the AZURE_STORAGE_ENDPOINT in combination with the AZURE_STORAGE_ACCOUNT_NAME

The double quoted string in the error message makes me wonder if you're running into the bane of this issue tracker: docker/compose#2854

Assuming you are using Docker Compose, could you try defining your environment variables in object notation instead, i.e. `AZURE_STORAGE_ENDPOINT: https://MyAccountName.privatelink.blob.core.windows.net/`?

In case this is not causing the error: do you set a value for AZURE_STORAGE_PATH?
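To illustrate the suggestion above: with Compose's list notation, surrounding quotes become part of the value, which matches the `first path segment in URL cannot contain colon` error quoted earlier. A sketch, with the service name and values taken as placeholders from this thread:

```yaml
services:
  backup:
    environment:
      # Object/map notation: the value is taken verbatim, no stray quotes.
      AZURE_STORAGE_ENDPOINT: https://MyAccountName.privatelink.blob.core.windows.net/
      AZURE_STORAGE_ACCOUNT_NAME: MyAccountName
      # By contrast, list notation such as
      #   - AZURE_STORAGE_ENDPOINT="https://MyAccountName.privatelink.blob.core.windows.net/"
      # keeps the literal double quotes in the value (docker/compose#2854).
```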


m90 commented Jan 11, 2023

@DevSecNinja Do you think you'll find the time to give this another test soon? In case you're too busy, that's no problem (it really is no problem, please do not feel obliged to do anything), I'd just like to wrap up that PR at some point. Considering the test setup against Azurite works, I'd be fine with merging this and seeing if people encounter issues in the wild, but if you still would like to test it, it'd be much appreciated as well.

@DevSecNinja (Author)

> @DevSecNinja Do you think you'll find the time to give this another test soon? In case you're too busy, that's no problem (it really is no problem, please do not feel obliged to do anything), I'd just like to wrap up that PR at some point. Considering the test setup against Azurite works, I'd be fine with merging this and seeing if people encounter issues in the wild, but if you still would like to test it, it'd be much appreciated as well.

Thanks @m90, understood! I think it's best to merge it and I'll test it afterwards as well. A new semester for my study program just started and I'll try to find some time for it!

m90 closed this as completed in #171 on Jan 11, 2023