If you push a subdirectory of your datasets folder, then rename it locally and push again, all of the files are re-uploaded. It seems to me that with `aws mv` it should be possible to avoid this.
I was reimplementing `aws sync` with plain boto3, and I can see why `aws sync` does not do this. The only way to prevent the re-upload is to:

1. Keep in memory which files are present locally and which files are on S3.
2. Do a big diff on paths and metadata to deduce what has changed.
This is a very expensive process, and given that we expect our users to use Nimbo with datasets that could potentially have millions of files, it is undesirable. A rough sketch of that diff follows below.
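A minimal sketch of what such a diff looks like with plain boto3, assuming a hypothetical bucket name, key prefix, and local root (none of these names come from Nimbo itself):

```python
import os
import boto3

# Hypothetical names, for illustration only.
BUCKET = "my-nimbo-bucket"
PREFIX = "datasets/"
LOCAL_ROOT = "datasets"

s3 = boto3.client("s3")

# 1. Collect every key currently on S3 under the prefix.
remote_keys = set()
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        remote_keys.add(obj["Key"])

# 2. Collect every local file, mapped to the key it would be uploaded as.
local_keys = set()
for dirpath, _, filenames in os.walk(LOCAL_ROOT):
    for name in filenames:
        rel = os.path.relpath(os.path.join(dirpath, name), LOCAL_ROOT)
        local_keys.add(PREFIX + rel.replace(os.sep, "/"))

# 3. The set differences tell us what to upload and what to delete.
#    A renamed directory shows up as keys in both sets, even though the bytes are identical.
to_upload = local_keys - remote_keys
to_delete = remote_keys - local_keys
```

Both sets have to be held in memory at once, which is where the cost for millions of files comes from.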
`aws mv` could indeed do the move, but we would still need a diff to work out when to use it.
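For reference, the move itself is cheap once the old and new keys are known: `aws mv` amounts to a server-side copy followed by a delete, so no bytes pass through the local machine. A hedged boto3 sketch, reusing the hypothetical bucket name from above:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-nimbo-bucket"  # hypothetical name, as above


def move_key(old_key: str, new_key: str) -> None:
    """Server-side rename: copy the object to the new key, then delete the old key."""
    # copy_object handles objects up to 5 GB; larger objects need a multipart copy.
    s3.copy_object(
        Bucket=BUCKET,
        Key=new_key,
        CopySource={"Bucket": BUCKET, "Key": old_key},
    )
    s3.delete_object(Bucket=BUCKET, Key=old_key)
```

The hard part is not the move; it is knowing which `(old_key, new_key)` pairs to feed it, which again requires the diff above.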
As a workaround, it might be worth providing a utility in the Nimbo CLI to rename part of a path. If we get more issues about renaming, we'll add it. If you have a specific use case where you frequently rename directories, and this issue was not a one-off, let us know.
EDIT: I did a back-of-the-envelope calculation: 100,000,000 file names would only take about 1 GB of RAM, and given that the current re-implementation of S3 push/pull already loads all differences into memory, this is feasible and will be addressed in #32.
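For a rough sense of where the 1 GB figure comes from (assuming on the order of 10 bytes of storage per file name; full Python string objects would cost several times more):

```python
# Back-of-the-envelope memory estimate for keeping all file names in memory.
n_files = 100_000_000
bytes_per_name = 10          # assumed average, e.g. compactly stored relative paths
print(f"{n_files * bytes_per_name / 1e9:.1f} GB")  # -> 1.0 GB
```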