This repository has been archived by the owner on Jan 3, 2022. It is now read-only.

Renaming a subdirectory in datasets and pushing reuploads all files #16

Closed
andrenmelo opened this issue May 7, 2021 · 1 comment
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@andrenmelo

If you push a subdirectory of your datasets folder, then rename it locally and push again, all of its files are re-uploaded. It seems to me that this could be avoided by using aws s3 mv.

@jozuas jozuas added the enhancement New feature or request label May 7, 2021
@jozuas jozuas self-assigned this Jun 3, 2021
@jozuas
Contributor

jozuas commented Jun 6, 2021

Won't fix.

While reimplementing aws s3 sync with plain boto3, I came to understand why aws s3 sync does not do this. The only way to prevent a re-upload is to:
1. Keep in memory the list of files present locally and the list of files on S3
2. Do a big diff on paths and metadata to deduce what has changed
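
To make the two steps above concrete, here is a minimal sketch of such a diff. It is not Nimbo's implementation: the function name `plan_sync`, the listing format (a dict mapping path to a `(size, checksum)` tuple), and the idea of matching renamed files by identical metadata are all assumptions made for illustration.

```python
from collections import defaultdict


def plan_sync(local, remote):
    """Diff two in-memory listings (hypothetical helper, not part of Nimbo).

    `local` and `remote` map a relative path to a (size, checksum) tuple.
    Returns (renames, uploads, deletes), where renames is a list of
    (old_remote_path, new_local_path) pairs.
    """
    local_only = {p for p in local if p not in remote}
    remote_only = {p for p in remote if p not in local}

    # Index remote-only files by metadata so a renamed file can be matched.
    by_meta = defaultdict(list)
    for path in sorted(remote_only):
        by_meta[remote[path]].append(path)

    renames, uploads = [], []
    for path in sorted(local_only):
        candidates = by_meta.get(local[path])
        if candidates:
            # Same metadata under a different path: treat it as a rename.
            renames.append((candidates.pop(0), path))
        else:
            uploads.append(path)

    # Same path on both sides but different metadata: upload again.
    for path in sorted(local.keys() & remote.keys()):
        if local[path] != remote[path]:
            uploads.append(path)

    matched_old = {old for old, _ in renames}
    deletes = sorted(remote_only - matched_old)
    return renames, sorted(uploads), deletes
```

Both listings have to be held in memory at once, which is where the cost comes from. Matching on size and checksum alone can also pair up unrelated files that happen to be identical, so a real implementation would need to decide whether that is acceptable.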

This is a very expensive process and, given that we expect users to run Nimbo on datasets that could have millions of files, an undesirable one.

aws s3 mv could indeed perform the move, but we would still need the diff to deduce when to use it.
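
For illustration, once a list of (old key, new key) pairs is known, the move itself can be done server-side as a copy followed by a delete, without re-uploading any data. The sketch below assumes a boto3-style S3 client is passed in; `move_objects` is a hypothetical helper name, not an existing Nimbo or boto3 function.

```python
def move_objects(client, bucket, renames):
    """Apply (old_key, new_key) pairs as a server-side copy plus delete.

    `client` is assumed to behave like a boto3 S3 client, i.e. to provide
    copy_object and delete_object. Returns the list of new keys.
    """
    moved = []
    for old_key, new_key in renames:
        # Copy within the bucket; the object data never leaves S3.
        client.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
        )
        # Remove the old key only after the copy call has returned.
        client.delete_object(Bucket=bucket, Key=old_key)
        moved.append(new_key)
    return moved
```

With a real client this would be called as `move_objects(boto3.client("s3"), "my-bucket", renames)`, where the bucket name is a placeholder. A single `copy_object` call is limited in object size, so very large files would need a multipart copy instead.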

As a workaround, it might be worth providing a utility in the Nimbo CLI to rename part of a path. If we get more issues about renaming, we'll add it. If you have a specific use case where you frequently rename directories, and this issue was not a one-off, let us know.

EDIT: A back-of-the-envelope calculation suggests that 100,000,000 file names would take only about 1 GB of RAM. Given that the current re-implementation of S3 push/pull already loads all differences into memory, this is feasible and will be addressed in #32.
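
As a rough way to check this kind of estimate, the arithmetic can be written out as below. The helper names are hypothetical, and the calculation only counts raw name bytes; any per-object overhead of the language or data structure used would come on top.

```python
def names_memory_bytes(n_files, avg_name_bytes):
    """Raw bytes needed to hold n_files names of a given average length."""
    return n_files * avg_name_bytes


def avg_bytes_per_name(budget_bytes, n_files):
    """Average name length that fits a memory budget for n_files names."""
    return budget_bytes / n_files


# 1 GB spread over 100,000,000 names
per_name = avg_bytes_per_name(10**9, 100_000_000)
```

Taken literally, 1 GB for 100,000,000 names works out to about 10 bytes per name, so the total scales up proportionally for longer paths.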

@jozuas jozuas closed this as completed Jun 6, 2021
@jozuas jozuas reopened this Jun 6, 2021
@jozuas jozuas added the wontfix This will not be worked on label Dec 12, 2021
@jozuas jozuas removed their assignment Dec 12, 2021
@jozuas jozuas closed this as completed Dec 12, 2021