azureblob: allow working with only list+create/write permissions (fix no_head_object/add no_read_for_metadata) #7813

moqmar · 2024-04-30T06:10:59Z

What is the purpose of this change?

It is now possible to use an azureblob storage without "read" permissions, by getting the metadata from a "list" operation instead. Also, --no-head works now also with a prefix set on the remote.

There are three reasons an option why one would avoid HEAD requests on azureblob, with this now mainly implementing the latter one:

no_head_object: Don't do a HEAD before writing to (or reading from) an object, to reduce the number of transactions and thus increase performance. This already exists in azureblob, a small issue was fixed here where using a prefix would still require a HEAD request to the prefix itself.
no_head: Don't do a HEAD after writing to an object, for the same reason but at the cost of "increasing the chance for undetected upload failures". This is still not implemented for azureblob yet and quite complex, see s3.Object.Update()!
no_read_for_metadata: Instead of doing a HEAD to get metadata, use a list operation, to avoid requiring READ permissions for immutable/append-only applications (this seems to be quite azureblob-specific) - one could also use both no_head_object and no_head together to achieve this, but then uploads wouldn't be verified. This is the main focus of this PR.

Was the change discussed in an issue or in the forum before?

This fixes #6162 and #7027. The issue with --no-head and a prefix wasn't discussed before AFAIK.

Checklist

I have read the contribution guidelines.
I have added tests for all changes in this PR if appropriate.
I have added documentation for the changes if appropriate.
All commit messages are in house style.
I'm done, this Pull Request is ready for review :-)

… no_head_object/add no_read_for_metadata) There are three reasons an option why one would avoid HEAD requests on azureblob, with this now mainly implementing the latter one: - no_head_object: Don't do a HEAD *before* writing to (or reading from) an object, to reduce the number of transactions and thus increase performance. **This already exists in azureblob, a small issue was fixed here where using a prefix would still require a HEAD request to the prefix itself.** - no_head: Don't do a HEAD *after* writing to an object, for the same reason but at the cost of "increasing the chance for undetected upload failures". **This is still not implemented for azureblob yet and quite complex, see s3.Object.Update()!** - no_read_for_metadata: Instead of doing a HEAD to get metadata, use a list operation, to avoid requiring READ permissions for immutable/append-only applications (this seems to be quite azureblob-specific) - one could also use both no_head_object and no_head together to achieve this, but then uploads wouldn't be verified. Fixes rclone#6162 and rclone#7027

ncw

I think this change looks good.

I put a couple of queries inline I wasn't sure about.

Thank you :-)

ncw · 2024-05-13T17:59:18Z

backend/azureblob/azureblob.go

+	}
+}
+
+// readMetaData gets the metadata using a ListBlobsFlatPager, to avoid requiring read access on the blob itself, and is used by Fs.readMetaData if no_read_for_metadata is set


We could be setting MaxResults in container.ListBlobsFlatOptions to limit it to 1 result which might be a good idea. What do you think?

The issue is if there's a file hello and a file helloworld.txt, there's a chance that requesting the metadata for hello returns the metadata for helloworld.txt first (as the former is a prefix of the latter).

Ah wait, there's this:

Blob names are returned in lexicographic order. For more information, see https://docs.microsoft.com/rest/api/storageservices/list-blobs.

(from https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#Client.NewListBlobsFlatPager)

So, limiting to 1 result should be possible.

If we can limit it, that would be good.

ncw · 2024-05-13T18:00:36Z

backend/azureblob/azureblob.go

+
+		for _, currentBlob := range blobs.ListBlobsFlatSegmentResponse.Segment.BlobItems {
+
+			if *currentBlob.Name != containerPath {


This needs a bit of Encoding - *currentBlob.Name will be in backend encoding where containerPath is in Standard encoding I think.

ncw · 2024-05-16T17:04:12Z

Let me know when you want me to take another look.

ncw reviewed May 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

azureblob: allow working with only list+create/write permissions (fix no_head_object/add no_read_for_metadata) #7813

azureblob: allow working with only list+create/write permissions (fix no_head_object/add no_read_for_metadata) #7813

moqmar commented Apr 30, 2024

ncw left a comment

ncw May 13, 2024

moqmar May 16, 2024

moqmar May 16, 2024

ncw May 16, 2024

ncw May 13, 2024

ncw commented May 16, 2024


		for _, currentBlob := range blobs.ListBlobsFlatSegmentResponse.Segment.BlobItems {

		if *currentBlob.Name != containerPath {

azureblob: allow working with only list+create/write permissions (fix no_head_object/add no_read_for_metadata) #7813

Are you sure you want to change the base?

azureblob: allow working with only list+create/write permissions (fix no_head_object/add no_read_for_metadata) #7813

Conversation

moqmar commented Apr 30, 2024

What is the purpose of this change?

Was the change discussed in an issue or in the forum before?

Checklist

ncw left a comment

Choose a reason for hiding this comment

ncw May 13, 2024

Choose a reason for hiding this comment

moqmar May 16, 2024

Choose a reason for hiding this comment

moqmar May 16, 2024

Choose a reason for hiding this comment

ncw May 16, 2024

Choose a reason for hiding this comment

ncw May 13, 2024

Choose a reason for hiding this comment

ncw commented May 16, 2024