If you push a subdirectory of your datasets folder, then rename it locally and push again, all of the files are re-uploaded. It seems to me that with `aws mv` it should be possible to avoid this.
I was reimplementing `aws sync` with plain boto3, and I can see why `aws sync` does not do this. The only way to prevent the re-upload is to:

1. Keep in memory which files are present locally and which files are on S3.
2. Do a big diff on paths and metadata to deduce what has changed.
This is a very expensive process, and given that we expect our users to use Nimbo with datasets that could potentially have millions of files, it is undesirable. A rough sketch of that diff follows below.
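A minimal sketch of what such a diff looks like with plain boto3, assuming a hypothetical bucket name, key prefix, and local root (none of these names come from Nimbo itself):

```python
import os
import boto3

# Hypothetical names, for illustration only.
BUCKET = "my-nimbo-bucket"
PREFIX = "datasets/"
LOCAL_ROOT = "datasets"

s3 = boto3.client("s3")

# 1. Collect every key currently on S3 under the prefix.
remote_keys = set()
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        remote_keys.add(obj["Key"])

# 2. Collect every local file, mapped to the key it would be uploaded as.
local_keys = set()
for dirpath, _, filenames in os.walk(LOCAL_ROOT):
    for name in filenames:
        rel = os.path.relpath(os.path.join(dirpath, name), LOCAL_ROOT)
        local_keys.add(PREFIX + rel.replace(os.sep, "/"))

# 3. The set differences tell us what to upload and what to delete.
#    A renamed directory shows up as keys in both sets, even though the bytes are identical.
to_upload = local_keys - remote_keys
to_delete = remote_keys - local_keys
```

Both sets have to be held in memory at once, which is where the cost for millions of files comes from.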
`aws mv` could indeed do the move, but we would still need a diff to work out when to use it.
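For reference, the move itself is cheap once the old and new keys are known: `aws mv` amounts to a server-side copy followed by a delete, so no bytes pass through the local machine. A hedged boto3 sketch, reusing the hypothetical bucket name from above:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-nimbo-bucket"  # hypothetical name, as above


def move_key(old_key: str, new_key: str) -> None:
    """Server-side rename: copy the object to the new key, then delete the old key."""
    # copy_object handles objects up to 5 GB; larger objects need a multipart copy.
    s3.copy_object(
        Bucket=BUCKET,
        Key=new_key,
        CopySource={"Bucket": BUCKET, "Key": old_key},
    )
    s3.delete_object(Bucket=BUCKET, Key=old_key)
```

The hard part is not the move; it is knowing which `(old_key, new_key)` pairs to feed it, which again requires the diff above.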
As a workaround, it might be worth providing a utility in the Nimbo CLI to rename part of a path. If we get more issues about renaming, we'll add it. If you have a specific use case where you frequently rename directories, and this issue was not a one-off, let us know.
EDIT: I did a back-of-the-envelope calculation: 100,000,000 file names would only take about 1 GB of RAM, and given that the current re-implementation of S3 push/pull already loads all differences into memory, this is feasible and will be addressed in #32.
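For a rough sense of where the 1 GB figure comes from (assuming on the order of 10 bytes of storage per file name; full Python string objects would cost several times more):

```python
# Back-of-the-envelope memory estimate for keeping all file names in memory.
n_files = 100_000_000
bytes_per_name = 10          # assumed average, e.g. compactly stored relative paths
print(f"{n_files * bytes_per_name / 1e9:.1f} GB")  # -> 1.0 GB
```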