Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azureblob: allow working with only list+create/write permissions (fix no_head_object/add no_read_for_metadata) #7813

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

moqmar
Copy link

@moqmar moqmar commented Apr 30, 2024

What is the purpose of this change?

It is now possible to use an azureblob storage without "read" permissions, by getting the metadata from a "list" operation instead. Also, --no-head works now also with a prefix set on the remote.

There are three reasons an option why one would avoid HEAD requests on azureblob, with this now mainly implementing the latter one:

  • no_head_object: Don't do a HEAD before writing to (or reading from) an object, to reduce the number of transactions and thus increase performance. This already exists in azureblob, a small issue was fixed here where using a prefix would still require a HEAD request to the prefix itself.
  • no_head: Don't do a HEAD after writing to an object, for the same reason but at the cost of "increasing the chance for undetected upload failures". This is still not implemented for azureblob yet and quite complex, see s3.Object.Update()!
  • no_read_for_metadata: Instead of doing a HEAD to get metadata, use a list operation, to avoid requiring READ permissions for immutable/append-only applications (this seems to be quite azureblob-specific) - one could also use both no_head_object and no_head together to achieve this, but then uploads wouldn't be verified. This is the main focus of this PR.

Was the change discussed in an issue or in the forum before?

This fixes #6162 and #7027. The issue with --no-head and a prefix wasn't discussed before AFAIK.

Checklist

  • I have read the contribution guidelines.
  • I have added tests for all changes in this PR if appropriate.
  • I have added documentation for the changes if appropriate.
  • All commit messages are in house style.
  • I'm done, this Pull Request is ready for review :-)

… no_head_object/add no_read_for_metadata)

There are three reasons an option why one would avoid HEAD requests on azureblob, with this now mainly implementing the latter one:

- no_head_object: Don't do a HEAD *before* writing to (or reading from) an object, to reduce the number of transactions and thus increase performance. **This already exists in azureblob, a small issue was fixed here where using a prefix would still require a HEAD request to the prefix itself.**
- no_head: Don't do a HEAD *after* writing to an object, for the same reason but at the cost of "increasing the chance for undetected upload failures".  **This is still not implemented for azureblob yet and quite complex, see s3.Object.Update()!**
- no_read_for_metadata: Instead of doing a HEAD to get metadata, use a list operation, to avoid requiring READ permissions for immutable/append-only applications (this seems to be quite azureblob-specific) - one could also use both no_head_object and no_head together to achieve this, but then uploads wouldn't be verified.

Fixes rclone#6162 and rclone#7027
Copy link
Member

@ncw ncw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change looks good.

I put a couple of queries inline I wasn't sure about.

Thank you :-)

}
}

// readMetaData gets the metadata using a ListBlobsFlatPager, to avoid requiring read access on the blob itself, and is used by Fs.readMetaData if no_read_for_metadata is set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could be setting MaxResults in container.ListBlobsFlatOptions to limit it to 1 result which might be a good idea. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is if there's a file hello and a file helloworld.txt, there's a chance that requesting the metadata for hello returns the metadata for helloworld.txt first (as the former is a prefix of the latter).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wait, there's this:

Blob names are returned in lexicographic order. For more information, see https://docs.microsoft.com/rest/api/storageservices/list-blobs.

(from https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#Client.NewListBlobsFlatPager)

So, limiting to 1 result should be possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can limit it, that would be good.


for _, currentBlob := range blobs.ListBlobsFlatSegmentResponse.Segment.BlobItems {

if *currentBlob.Name != containerPath {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a bit of Encoding - *currentBlob.Name will be in backend encoding where containerPath is in Standard encoding I think.

@ncw
Copy link
Member

ncw commented May 16, 2024

Let me know when you want me to take another look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move to Azure Blob without read permission fails on first try
2 participants