S3-api permission issue #61
Comments
@denvdm have you tried tacl api? Could you please try tacl next time you import/export? |
For me
|
I've solved my I've upgraded to tacl v3.3.1, but still got the same error. Then I've tried uploading to p697 - this worked, but for p33 it still failed. |
Hey Alex,
Looks indeed promising and no, I hadn’t tried it yet. I did one successful test run just now and seems all good. However… I now need to import ~5TB of UKB diffusion imaging data. Would tacl be up to this? i.e. can it handle such a load (7k zips, each about 700MB), is the default import dir an acceptable location for this, and how about speed (this is coming from NIRD)?
Thanks. Best, Dennis
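As a rough sanity check on the volume described above, the quoted figures can be multiplied out. This is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and treats every zip as exactly 700 MB, which the message gives only as an approximation. The function name is made up for illustration.

```python
def total_size_tb(n_files: int, size_mb: float) -> float:
    """Return the combined size in TB, assuming 1 TB = 1,000,000 MB."""
    return n_files * size_mb / 1_000_000


# Figures from the message: about 7k zips of about 700 MB each.
full_import_tb = total_size_tb(7000, 700)
print(f"estimated import size: {full_import_tb:.1f} TB")
```

Under these assumptions the estimate is consistent with the "~5TB" stated in the message.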
… On 18 Jan 2021, at 16:40, Oleksandr Frei ***@***.***> wrote:
@denvdm have you tried tacl api?
https://www.uio.no/english/services/it/research/sensitive-data/use-tsd/import-export/import-data-using-the-tsd-api.html
My feeling is that the TSD team gives low priority to maintaining tsd-s3cmd. I confirmed last week that it is officially supported, but they advise users to go for tacl whenever possible, and in the long term TSD may consider deprecating tsd-s3cmd. If there are things that tacl can't do then we can push for maintaining tsd-s3cmd, but I'd like to check if there is a real need for this.
Could you please try tacl next time you import/export, and tell us here if it doesn't cover your needs or is a step back compared to tsd-s3cmd?
|
tacl worked very well, very pleasant, and it was able to copy almost half (3k, >2TB) of the zips from NIRD to the import dir. However, then I got kicked out of NIRD; after reconnecting and attempting to restart the upload I received the below error. Of course I checked and the file does exist (with every attempt, it complains about a different file). Any idea what is going on here? Out of frustration I also tried the --session-delete mentioned in the previous mails, but I get the same error after re-authenticating.
Thanks for any insights. Best, Dennis
|
@denvdm eh, too bad... what was the error? I don't see it attached. |
Ha! I figured it out: as per usual it was stupidity on my side, and your 'screen' remark was the key... I did use 'screen', which you taught me a few years back (and it has been a lifesaver for these long copying jobs), so when I got kicked out of NIRD, the job of course continued. However, I didn’t think of this and tried to restart the still running process, which must be what is causing the error. In the meantime I checked the import dir and indeed the number of files is still growing. Just to satisfy any curiosity, I have attached the error screenshot as a file rather than the image I pasted earlier.
By the way, I think I just assumed the job crashed because it had done that already a few (3-4) times earlier; not really a big deal and those times it did simply continue when resubmitting the upload command.
![PastedGraphic-5](https://user-images.githubusercontent.com/9023632/112373372-2e76cd00-8ce1-11eb-9e36-a73d38e35a72.png)
… On 24 Mar 2021, at 19:51, Oleksandr Frei ***@***.***> wrote:
@denvdm eh, too bad... what was the error? I don't see it attached.
Also, do you use a screen session? It's best to run sync within screen to make sure it survives a disconnect. But that's not an excuse for tacl not being able to resume the session: tacl should resume just fine, so let's investigate why it doesn't.
|
Re. the above: unfortunately, it turns out I did not figure it out. The error in the end did not seem to be caused by trying to restart a running process. The transfer this time is definitely dead (the number of files in the import dir hasn't increased in hours) and when I try to start up the tacl upload again (tacl p33 --upload 20250) I am still getting the same error, see pasted below (btw, that file is definitely present). Any thoughts? File "/nird/home/dennisva/.local/bin/tacl", line 11, in |
@denvdm Yeah, this is weird. The file definitely exists (I know where to look - checked just now), and "chmod" permissions are fine. I've noticed that the 20250 folder within TSD had exactly 5000 files - sounds like some sort of limit. Could you please submit a ticket to TSD-drift? And please add a link to this GitHub ticket - it's a good discussion here.... I've tried removing one file (the text file with field lists), and then re-running the sync with
Finally, after ~15 minutes, |
@denvdm btw, for me tsd-s3cmd works - I guess it could be the quickest way to resolve your data transfer. Long term, let's push for a better tacl - it seems quite handy. If there is a limit of 5000 it's likely a quick fix, but I'm more concerned about slow perf in |
What behaviour do you want from the sync here? Do you want files that are removed locally (from NIRD) to be removed remotely (from TSD)? |
The 5000 limit sounds weird, and I cannot imagine where that would come from. I will try to reproduce it. If this is just a directory upload, and not a sync of a routinely changing directory, then If it is a sync (and you want local changes to propagate to the remote) then you need to explicitly enable caching so you get resume:
You'll be using |
@leondutoit, I indeed used tacl p33 --upload dir. I don't know what may be causing the error, but as @ofrei indicated it does seem awfully coincidental that it gets stuck at such a round number. Then again, I already got an error a day earlier when it hadn't reached this number yet (see earlier messages). Re. the specific error message, as Alex also checked, the file definitely did exist. |
Ah ok, |
I can reproduce the
What's happening here is that the local cache contains all files in the directory when the upload starts, and removes them as they succeed. Then I cancel the upload, delete a file that has not been uploaded, and restart the upload. Now the missing local file is still listed in the cache, and when trying to upload it, it fails. |
Great, that makes a lot of sense, nicely solved! However, I did not delete any files in between the failed upload and the second attempt. And it seemed to be complaining about a different file missing at every attempt.
Anyway, thanks for looking into this in detail. Tacl in general does seem like a very good solution, and easier to get working than tsd-s3cmd.
… On 27 Mar 2021, at 18:32, Leon du Toit ***@***.***> wrote:
I can reproduce the FileNotFoundError like this:
ldt:~ leondutoit$ mkdir -p d3 && for i in `seq 1 10`; do mkfile 1k d3/$i.txt; done; tacl p11 --basic --upload d3
uploading directory d3
d3/10.txt |################################| 100%
d3/9.txt |################################| 100%
d3/8.txt |################################| 100%
d3/5.txt |################################| 100%
d3/4.txt |################################| 100%
^C
Aborted!
ldt:~ leondutoit$ rm d3/3.txt
ldt:~ leondutoit$ tacl p11 --basic --upload d3
uploading directory d3
resuming directory transfer from cache
d3/4.txt |################################| 100%
d3/6.txt |################################| 100%
d3/7.txt |################################| 100%
Traceback (most recent call last):
File "/usr/local/bin/tacl", line 33, in <module>
sys.exit(load_entry_point('tsd-api-client==3.3.1', 'console_scripts', 'tacl')())
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/tsd_api_client-3.3.1-py3.9.egg/tsdapiclient/tacl.py", line 518, in cli
File "/usr/local/lib/python3.9/site-packages/tsd_api_client-3.3.1-py3.9.egg/tsdapiclient/sync.py", line 286, in sync
File "/usr/local/lib/python3.9/site-packages/tsd_api_client-3.3.1-py3.9.egg/tsdapiclient/sync.py", line 586, in _transfer
File "/usr/local/lib/python3.9/site-packages/tsd_api_client-3.3.1-py3.9.egg/tsdapiclient/sync.py", line 427, in _transfer_local_to_remote
FileNotFoundError: [Errno 2] No such file or directory: 'd3/3.txt'
What's happening here is that the local cache contains all files in the directory when the upload starts, and removes them as they succeed. Then I cancel the upload, delete a file that has not been uploaded, and restart the upload. Now the missing local file is still listed in the cache, and when trying to upload it, it fails.
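The failure mode described above (a cached file list that has gone stale by the time the upload resumes) can be illustrated with a minimal sketch. This is not tacl's actual implementation: `resume_from_cache` and the `upload` callback are hypothetical names, and the cache is modelled as a plain list of paths. The only point is that re-checking each cached path before uploading turns the FileNotFoundError into a reportable skip.

```python
import os
from typing import Callable, Iterable, List, Tuple


def resume_from_cache(
    cached_paths: Iterable[str],
    upload: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Upload each cached path that still exists; collect the rest as skipped."""
    uploaded: List[str] = []
    skipped: List[str] = []
    for path in cached_paths:
        # Hypothetical guard: the cache was written when the first upload
        # started, so a file may have been removed locally since then.
        if not os.path.isfile(path):
            skipped.append(path)
            continue
        upload(path)
        uploaded.append(path)
    return uploaded, skipped
```

A caller could then log the skipped paths after the transfer instead of aborting on the first missing file. Whether skipping or aborting is the right behaviour is a design choice this sketch does not settle.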
|
@ofrei I made a configuration change on the server, which sped up the scanning part of my sync of 5001 files from 2min to 9sec. |
I'll assume this is not an issue anymore, but if so just ping me here. |
@leondutoit Thank you for fixing this! I'm busy (major grant deadline this Thursday), will re-test |
Done for now, and I don’t see anything major coming up this month at least. TACL worked well (apart from that little hiccup)!
… On 6 Apr 2021, at 20:41, Oleksandr Frei ***@***.***> wrote:
@leondutoit Thank you for fixing this! Major grant deadline this Thursday, I'll re-test on Friday.
@denvdm are you continuing some of the large-scale transfers, or are you already done by now?
|
@ofrei the latest release of tacl: https://pypi.org/project/tsd-api-client/3.4.0/ includes better sync performance, better Windows support, and some other small improvements, feel free to give it a go |
@leondutoit Upgraded to tacl 3.4.0 - now testing sync of github repo. From time to time it gives the following error:
|
@ofrei Ah yes, I saw that too. It is a config option on the server side which has to allow deletion of files, which I forgot to set. Will do later today. |
@leondutoit Thank you! Happy to re-test when this is ready. |
The order of the upload? Why is this a problem? |
It's not a big problem, but any non-deterministic behaviour is less user-friendly than when things happen in well determined order. |
I already explained the delete issue. As for "non-deterministic order" this is how python returns directory entries:
Note The list is in arbitrary order. I'm not going to sacrifice performance and memory usage to force a certain order. |
ok, I see! Thanks.
…On Wed, Apr 21, 2021 at 12:11 PM Leon du Toit ***@***.***> wrote:
I already explained the delete issue. As for "non-deterministic order"
this is how python returns directory entries:
In [81]: os.listdir?
Signature: os.listdir(path=None)
Docstring:
Return a list containing the names of the files in the directory.
path can be specified as either str, bytes, or a path-like object. If path is bytes,
the filenames returned will also be bytes; in all other circumstances
the filenames returned will be str.
If path is None, uses the path='.'.
On some platforms, path may also be specified as an open file descriptor;
the file descriptor must refer to a directory.
If this functionality is unavailable, using it raises NotImplementedError.
The list is in arbitrary order. It does not include the special
entries '.' and '..' even if they are present in the directory.
Type: builtin_function_or_method
Note *The list is in arbitrary order*. I'm not going to sacrifice
performance and memory usage to force a certain order.
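For readers who do want a reproducible order on the client side, the trade-off mentioned above can be sketched as follows. This is an illustration only, not something tacl does: `list_sorted` is a hypothetical helper, and it pays exactly the cost the comment refers to, since `sorted` has to hold and order the complete list of names before anything is returned.

```python
import os
from typing import List


def list_sorted(path: str = ".") -> List[str]:
    """Return directory entries in lexicographic order.

    os.listdir gives no ordering guarantee, so the names are sorted
    explicitly. This costs O(n log n) comparisons and keeps the whole
    list in memory, which matters for very large directories.
    """
    return sorted(os.listdir(path))
```

Note that the ordering is by string comparison, so "10.txt" sorts before "9.txt".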
|
The delete issue should be fixed now. |
@leondutoit I upgraded tacl, re-run
However I'm closing this ticket - the original question from @denvdm is solved and we're back to using |
Can't reproduce btw:
|
Getting permission errors when attempting to use s3-API on NIRD to import, export, or perform any other action. This worked fine last month. Exact error pasted below.