Assign implicit names to nameless AzureBlob objects #2973

Open
hmlkao opened this issue Feb 13, 2019 · 18 comments

@hmlkao

hmlkao commented Feb 13, 2019

Azure supports nameless "folders", but rclone cannot handle them.

For example, the commands

$ echo asdf > ~/Downloads/file.txt
$ az storage blob upload -c registry -f ~/Downloads/file.txt -n /
$ az storage blob upload -c registry -f ~/Downloads/file.txt -n test/
$ az storage blob upload -c registry -f ~/Downloads/file.txt -n test/file.txt

create the blobs

  • <no name>/<no name>
  • test/<no name>
  • test/file.txt
$ rclone -vv ls Azure:registry
2019/02/13 15:01:57 DEBUG : rclone: Version "v1.46" starting with parameters ["rclone" "-vv" "ls" "Azure:registry"]
2019/02/13 15:01:57 DEBUG : Using config file from "/home/centos/.config/rclone/rclone.conf"
2019/02/13 15:01:57 DEBUG : pacer: Reducing sleep to 0s
2019/02/13 15:01:57 ERROR : : Entry doesn't belong in directory "" (same as directory) - ignoring
2019/02/13 15:01:57 ERROR : test/: Entry doesn't belong in directory "test" (same as directory) - ignoring
        5 test/file.txt
2019/02/13 15:01:57 DEBUG : 4 go routines active
2019/02/13 15:01:57 DEBUG : rclone: Version "v1.46" finishing with parameters ["rclone" "-vv" "ls" "Azure:registry"]

So only one of the blobs is visible to rclone.

@ncw
Member

ncw commented Feb 13, 2019

Rclone will treat any blob ending in / as a directory marker, so it will appear as a directory in rclone listings. This is a widely used convention.
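
The convention can be illustrated with a minimal Python sketch. It is for illustration only: rclone itself is written in Go, and `is_directory_marker` is a hypothetical helper, not an rclone function. It assumes only what is stated above, namely that a key ending in / is read as a directory marker.

```python
def is_directory_marker(key: str) -> bool:
    """Return True if a blob key would be read as a directory marker.

    Assumption for this sketch: any key ending in "/" is a marker.
    """
    return key.endswith("/")


# The three blob names uploaded in the first comment.
keys = ["/", "test/", "test/file.txt"]
markers = [k for k in keys if is_directory_marker(k)]
plain_blobs = [k for k in keys if not is_directory_marker(k)]

print("markers:", markers)
print("blobs:  ", plain_blobs)
```

Under this reading, only test/file.txt is left as an ordinary blob, which matches the single entry in the listing above.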

@hmlkao
Author

hmlkao commented Feb 13, 2019

Well, but if the remote storage is able (for whatever reason) to create nameless blobs, I think that rclone should be able to work with them anyway.

If I create, with the commands

$ az storage blob upload -c registry -f ~/Downloads/file.txt -n //test////file.txt
$ az storage blob upload -c registry -f ~/Downloads/file.txt -n aaa/bbb/ccc/ddd/

this folder structure:

<no name>/<no name>/test/<no name>/<no name>/<no name>/file.txt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~
                 real "folders" on Azure                 real blob on Azure with own ETAG

aaa/bbb/ccc/ddd/<no name>
~~~~~~~~~~~~~~~ ~~~~~~~~~
       \           real blob on Azure with own ETAG
  real "folders" on Azure

These blobs have their own ETags and URLs

https://<storage_account>.blob.core.windows.net/<container>///test////file.txt
https://<storage_account>.blob.core.windows.net/<container>/aaa/bbb/ccc/ddd/

but rclone cannot handle the blobs because they are nameless or are inside a nameless "folder".
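
How the slashes in a key map to the `<no name>` entries shown above can be sketched in a few lines of Python. This is an illustration only, assuming the key is simply split on "/" and every empty segment is displayed as `<no name>`; `describe_key` is a hypothetical helper and is not part of rclone or the Azure CLI.

```python
def describe_key(key: str) -> str:
    """Render a blob key with each empty path segment shown as <no name>."""
    segments = key.split("/")
    return "/".join(seg if seg else "<no name>" for seg in segments)


# The two blob names uploaded in this comment.
for key in ["//test////file.txt", "aaa/bbb/ccc/ddd/"]:
    print(f"{key!r} -> {describe_key(key)}")
```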

@ncw
Member

ncw commented Feb 13, 2019

Azureblob doesn't really have a concept of folders; it is a key/value store.

By convention we use the unix file system paths path/to/file.txt but azureblob just sees that as a string. You could equally well use Windows paths path\to\file.txt and azureblob wouldn't care.

Rclone, however, has to deal with actual file systems like the one on your disk, which do have limitations on that string. So if you can't have a file called file.txt/, then rclone won't be able to deal with a blob called that.
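
The limitation can be illustrated with a small Python sketch. It uses the standard-library `posixpath.normpath` purely as a stand-in for how a POSIX-style file system treats repeated and trailing slashes; it is not rclone's code, and the keys are made-up examples.

```python
import posixpath

# Three distinct blob keys: a key/value store keeps them apart.
keys = ["test/file.txt", "test//file.txt", "test/file.txt/"]

# Normalising them as POSIX paths collapses the extra slashes.
local_paths = {key: posixpath.normpath(key) for key in keys}
for key, path in local_paths.items():
    print(f"{key!r} -> {path!r}")

# All three keys end up competing for one local path.
distinct_paths = set(local_paths.values())
print(len(keys), "keys,", len(distinct_paths), "local path(s)")
```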

This doesn't usually cause a problem though.

What problem is it causing you?

@hmlkao
Author

hmlkao commented Feb 14, 2019

OK, I know that azureblob is key/value storage, but the Azure web UI uses the term "folder", so I used it too.

Let's define some general terms to be sure that we talk about the same things:

  • file - file on workstation filesystem
  • folder - folder on workstation filesystem
  • blob - datafile on Azure
  • "folder" - structure item on Azure

The core of this problem is that on Azure you can store a blob called file.txt/ but it will point to the nameless blob <no name> in "folder" file.txt (file.txt/<no name>).

Well, I just found out that rclone can create nameless "folders", e.g. the command
rclone -vv copy ./file Azure:registry///asdf///fdsa
creates a blob on Azure at the path
asdf/<no name>/<no name>/fdsa/file
I expected it to be stored at this path instead (which would solve my problem):
<no name>/<no name>/asdf/<no name>/<no name>/fdsa/file

Another example: the command
rclone -vv copy ./file Azure:registry///asdf///fdsa/
would create the blob
<no name>/<no name>/asdf/<no name>/<no name>/fdsa/<no name>

In the other direction, the command
rclone -vv copy Azure:registry///asdf///fdsa/ ./file
should process the blob at the path
<no name>/<no name>/asdf/<no name>/<no name>/fdsa/<no name>
on Azure.

I assume this behavior should be switchable by a flag for backward compatibility, e.g. --azureblob-allow-nameless.

The background to this problem is that I want to use the Azure storage driver for docker-registry. It erroneously sends data to Azure with two slashes '//' instead of '/' (distribution/distribution#1247), so it creates a nameless root "folder" on Azure. I want to use rclone to migrate from Amazon S3 to Azure Blob Storage, and that fails on this issue because I am not able to create a nameless root "folder" with rclone.

@ncw
Member

ncw commented Feb 14, 2019

The background to this problem is that I want to use the Azure storage driver for docker-registry. It erroneously sends data to Azure with two slashes '//' instead of '/' (docker/distribution#1247), so it creates a nameless root "folder" on Azure. I want to use rclone to migrate from Amazon S3 to Azure Blob Storage, and that fails on this issue because I am not able to create a nameless root "folder" with rclone.

Can you paste the error messages you get please?

@hmlkao
Author

hmlkao commented Feb 14, 2019

It doesn't show any error message.

The problem is that I use docker-registry with the S3 storage driver, where blobs are stored under the docker/registry/v2/... path, while with the Azure storage driver blobs are stored in a nameless root "folder",
<no name>/docker/registry/v2/..., due to the mentioned bug with double slashes.

I would like to migrate blobs from AWS S3 to Azure with rclone, but rclone is not able to create a nameless root "folder". This is the main problem. In general, it would be great if rclone could create a nameless root "folder" where Azure allows it, and this is why I opened this issue.

@ncw
Member

ncw commented Feb 26, 2019

I would like to migrate blobs from AWS S3 to Azure with rclone, but rclone is not able to create a nameless root "folder". This is the main problem. In general, it would be great if rclone could create a nameless root "folder" where Azure allows it, and this is why I opened this issue.

It would be fairly easy to add an option to force a leading / on object keys. Would that be helpful?
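
What such an option might do can be sketched as follows. This is a hypothetical illustration in Python, not an existing rclone option or API: `to_object_key` and its `force_leading_slash` parameter are invented names, and the example path is a made-up stand-in for a registry blob path.

```python
def to_object_key(path: str, force_leading_slash: bool = False) -> str:
    """Map a relative path to an object key.

    Assumption for this sketch: without the option, leading slashes are
    dropped; with it, exactly one leading "/" is forced onto the key.
    """
    key = path.lstrip("/")
    return "/" + key if force_leading_slash else key


# Hypothetical source path, as the S3 storage driver would lay it out.
source_path = "docker/registry/v2/some-blob"

print(to_object_key(source_path))
print(to_object_key(source_path, force_leading_slash=True))
```

With the option on, the key starts with /, which is what produces the nameless root "folder" that the Azure storage driver writes to.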

@hmlkao
Author

hmlkao commented Feb 26, 2019

It probably doesn't solve the problem with rclone ls mentioned in the first comment, but yes, it would help with the migration problem (if I could handle the nameless root "folder" on Azure with a leading /).

@micku

micku commented Jun 10, 2019

I have the same (or at least a very similar) problem with GCS. It is possible to create something like gs://bucketname////file-with-slashes; I see these objects with rclone ls, but rclone sync ignores them with the error "ERROR : : Entry doesn't belong in directory "" (same as directory) - ignoring".
That path creates 3 slash "folders" with the file inside.
If you want or need, I can open a new issue since this is a different provider.

This is a problem because we can't control the path creation, but we should backup everything.

@ncw
Member

ncw commented Jun 11, 2019

This is a problem because we can't control the path creation, but we should backup everything.

Where are you backing it up to? You can't store a file like ///file-with-slashes on a local disk, or on anything except a key/value store like S3, GCS, Swift, etc.

@micku

micku commented Jun 14, 2019

We are giving the choice to do backups to ec3, gcp or ftp through a web interface.
I think we will add an alert when source is an object-value and destination is ftp.

@ncw
Member

ncw commented Jun 14, 2019

We are giving the choice to do backups to ec3, gcp or ftp through a web interface.
I think we will add an alert when source is an object-value and destination is ftp.

The best way of fixing this would be rclone's filename mangling scheme which is in progress at the moment... - what do you think @B4dM4n ?

@ivandeex
Member

ivandeex commented Feb 9, 2021

@hmlkao
This bug is reported against an obsolete version of rclone. Can you reproduce it with rclone 1.54?

@hmlkao
Author

hmlkao commented Feb 14, 2021

Well, it seems the main problem was solved.

Prerequisites

$ echo asdf > file.txt
$ az storage blob upload -c rclonetestcont -f file.txt -n /
$ az storage blob upload -c rclonetestcont -f file.txt -n test/
$ az storage blob upload -c rclonetestcont -f file.txt -n test/file.txt
$ az storage blob upload -c rclonetestcont -f file.txt -n ///test//aaaa.txt

rclone ls

Working well in v1.54

$ rclone -vv ls Azure:rclonetestcont
2021/02/14 01:02:01 DEBUG : rclone: Version "v1.54.0" starting with parameters ["rclone" "-vv" "ls" "Azure:rclonetestcont"]
2021/02/14 01:02:01 DEBUG : Using config file from "$HOME/.config/rclone/rclone.conf"
2021/02/14 01:02:01 DEBUG : Creating backend with remote "Azure:rclonetestcont"
        5 /
        5 ///test//aaaa.txt
        5 test/
        5 test/file.txt
2021/02/14 01:02:01 DEBUG : 4 go routines active

rclone sync

$ rclone -vv sync Azure:rclonetestcont bbbb
2021/02/14 01:03:10 DEBUG : rclone: Version "v1.54.0" starting with parameters ["rclone" "-vv" "sync" "Azure:rclonetestcont" "bbbb"]
2021/02/14 01:03:10 DEBUG : Using config file from "$HOME/.config/rclone/rclone.conf"
2021/02/14 01:03:10 DEBUG : Creating backend with remote "Azure:rclonetestcont"
2021/02/14 01:03:10 DEBUG : Creating backend with remote "bbbb"
2021/02/14 01:03:10 DEBUG : fs cache: renaming cache item "bbbb" to be canonical "$HOME/rclone-azure-infra/bbbb"
2021/02/14 01:03:10 ERROR : : Entry doesn't belong in directory "" (same as directory) - ignoring
2021/02/14 01:03:10 ERROR : test/: Entry doesn't belong in directory "test" (same as directory) - ignoring
2021/02/14 01:03:10 DEBUG : Local file system at $HOME/rclone-azure-infra/bbbb: Waiting for checks to finish
2021/02/14 01:03:10 DEBUG : Local file system at $HOME/rclone-azure-infra/bbbb: Waiting for transfers to finish
2021/02/14 01:03:10 DEBUG : preAllocate: got error on fallocate, trying combination 1/2: operation not supported
2021/02/14 01:03:10 DEBUG : preAllocate: got error on fallocate, trying combination 2/2: operation not supported
2021/02/14 01:03:10 DEBUG : test/file.txt: MD5 = 2b00042f7481c7b056c4b410d28f33cf OK
2021/02/14 01:03:10 INFO  : test/file.txt: Copied (new)
2021/02/14 01:03:10 DEBUG : Waiting for deletions to finish
2021/02/14 01:03:10 INFO  : 
Transferred:             5 / 5 Bytes, 100%, 59 Bytes/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:         0.7s

2021/02/14 01:03:10 DEBUG : 4 go routines active
$ ll bbbb/
total 12
drwxrwxr-x 3 ondra ondra 4096 úno 14 01:03 .
drwxr-xr-x 5 ondra ondra 4096 úno 14 01:03 ..
drwxrwxr-x 2 ondra ondra 4096 úno 14 01:03 test

$ ll bbbb/test/
total 20
drwxrwxr-x 2 ondra ondra 4096 úno 14 01:03 .
drwxrwxr-x 3 ondra ondra 4096 úno 14 01:03 ..
-rw-rw-r-- 1 ondra ondra    5 úno 13 18:20 file.txt

It might be helpful if rclone sync could synchronize nameless blobs and "folders" to the filesystem (only file.txt is synchronized now). Each blob has its own unique ETag.
I understand that files and folders on a filesystem must have a name, but this could be solved e.g. by some reserved pattern(s) like <noname> (or <nodir> and <nofile>) used in place of the file or folder name for nameless blobs and "folders".
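
The reserved-pattern idea can be sketched in Python as below. This is only an illustration of the proposal, not how rclone behaves: `key_to_local` and `local_to_key` are hypothetical helpers. The sketch uses the two-placeholder variant (<nodir> and <nofile>), because with a single placeholder the blob / and the blob ///test//aaaa.txt would both need the same local name, once as a file and once as a directory.

```python
NODIR = "<nodir>"    # stands in for an empty "folder" segment
NOFILE = "<nofile>"  # stands in for an empty final (blob name) segment


def key_to_local(key: str) -> str:
    """Map a blob key to a local relative path, naming empty segments."""
    *dirs, name = key.split("/")
    parts = [d if d else NODIR for d in dirs]
    parts.append(name if name else NOFILE)
    return "/".join(parts)


def local_to_key(path: str) -> str:
    """Invert key_to_local, turning placeholders back into empty segments."""
    *dirs, name = path.split("/")
    parts = ["" if d == NODIR else d for d in dirs]
    parts.append("" if name == NOFILE else name)
    return "/".join(parts)


# The four blobs from the prerequisites above.
for key in ["/", "test/", "test/file.txt", "///test//aaaa.txt"]:
    local = key_to_local(key)
    assert local_to_key(local) == key  # the mapping round-trips
    print(f"{key!r} -> {local!r}")
```

Two caveats for any real implementation: a blob whose segment is literally named <nodir> or <nofile> would collide with the placeholders, and the characters < and > are not allowed in file names on Windows, so different patterns might be needed there.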

Versions

$ az version
{
  "azure-cli": "2.19.1",
  "azure-cli-core": "2.19.1",
  "azure-cli-telemetry": "1.0.6",
  "extensions": {}
}
$ rclone version
rclone v1.54.0
- os/arch: linux/amd64
- go version: go1.15.7

@ivandeex changed the title from "Nameless files and folders" to "Handle nNameless files and folders" Feb 14, 2021
@ivandeex changed the title from "Handle nNameless files and folders" to "Assign implicit names to nameless AzureBlog objects" Feb 14, 2021
@ivandeex changed the title from "Assign implicit names to nameless AzureBlog objects" to "Assign implicit names to nameless AzureBlob objects" Feb 14, 2021
@ivandeex
Member

Got you. At least we have some progress.
BTW, can you do golang?

@hmlkao
Author

hmlkao commented Feb 14, 2021

A little, but I'm not sure my skills are enough to produce a PR 😃

@ivandeex
Member

We have 600 requests and a dozen hands in rclone.
If you could help it'd be awesome.
https://github.com/rclone/rclone/blob/master/CONTRIBUTING.md might give you some taste.
Happy Valentine!

@ivandeex
Member

FYI the #4412 (comment) provides some general ideas on using a truncated ID for generating ephemeral object names.


4 participants