
Trouble Adding S3 URLs from NDA manifest #1

Open
tjhendrickson opened this issue Jul 30, 2021 · 24 comments

@tjhendrickson

Hello,

I am attempting to use the datalad-nda CLI to produce a datalad dataset based on the 09 2020 3165 ABCD release found on NDA. I have the manifest as an uncompressed text file, so I came up with the following command:

/home/faird/shared/code/external/utilities/datalad-nda/scripts/datalad-nda add2datalad
-i <(cat /spaces/ngdr/workspaces/hendr522/ABCD/datalad-ABCD-BIDS/abcd316520200818/datastructure_manifest.txt)
-d /spaces/ngdr/workspaces/hendr522/ABCD/datalad2.0-ABCD-BIDS -J 10 --fast --drop-after

I've attached the messages sent to STDOUT and STDERR in separate files:
datalad2.0-nda_ABCD-BIDS_5371074_STDOUT.txt
datalad2.0-nda_ABCD-BIDS_5371074_STDERR.txt

I am using the most recent version of the datalad-nda CLI (I pulled the repo maybe a week ago) and am running datalad version 0.14.4.

yarikoptic added a commit to yarikoptic/datalad that referenced this issue Jul 30, 2021
@yarikoptic
Member

Unrelated, but you don't need `<(cat ...)` if the file is already uncompressed -- you can pass the path to -i directly.

The error at the end of STDERR is:
[INFO] -> Adding URLs
Traceback (most recent call last):
  File "/home/faird/shared/code/external/utilities/datalad-nda/scripts/datalad-nda", line 442, in <module>
    main()
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/faird/shared/code/external/utilities/datalad-nda/scripts/datalad-nda", line 250, in add2datalad
    out = ds.addurls(
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/datalad/distribution/dataset.py", line 503, in apply_func
    return f(**kwargs)
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/datalad/interface/utils.py", line 486, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/datalad/interface/utils.py", line 474, in return_func
    results = list(results)
  File "/home/umii/hendr522/SW/miniconda3/envs/datalad/lib/python3.8/site-packages/datalad/interface/utils.py", line 459, in generator_func
    raise IncompleteResultsError(
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
[{'action': 'addurls',
  'message': 'First positional argument should be mapping '
             '[addurls.py:format:79]',
  'path': '/spaces/ngdr/workspaces/hendr522/ABCD/datalad2.0-ABCD-BIDS',
  'status': 'error',
  'type': 'dataset'}]

Filed datalad/datalad#5850 to help make that message a bit more informative.

Overall, please double-check that your .txt is actually a .tsv, i.e. tab-separated (that is what I had), since at the moment we just hardcode that assumption: https://github.com/ReproNim/datalad-nda/blob/master/scripts/datalad-nda#L78
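Since the script hardcodes the tab-separated assumption, a quick stdlib sanity check of the manifest's delimiter before a long run can save a failed one. A minimal sketch using csv.Sniffer (`detect_delimiter` is my name, not part of datalad-nda):

```python
import csv

def detect_delimiter(sample: str) -> str:
    """Guess the field delimiter of a manifest from a text sample,
    restricting the candidates to tab and comma."""
    return csv.Sniffer().sniff(sample, delimiters="\t,").delimiter

# A made-up two-line sample in the NDA manifest style:
sample = "submission_id\tdataset_id\tassociated_file\n12345\t67890\ts3://bucket/key\n"
detect_delimiter(sample)  # returns "\t" for a tab-separated file
```

In practice you would pass the first few lines of datastructure_manifest.txt as the sample.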

@yarikoptic
Member

Although it is probably not that -- since it would then have just failed here https://github.com/ReproNim/datalad-nda/blob/master/scripts/datalad-nda#L92 instead of nicely getting those records out... I should try to redo a sample run on what I had, to see whether something changed in datalad that rendered it non-working.

@tjhendrickson
Author

Let me know if you need anything from me. I'm using a newer manifest.txt than the one from when you originally built the ABCD-BIDS datalad dataset, so maybe that is part of it? If it is useful, I am happy to send you the manifest file I am using via BOX or something. I imagine you have an active ABCD DUC, right?

@yarikoptic
Member

Sorry about the delay; I will try to get to this today and let you know if we need to sync up on manifests ;-)

@yarikoptic
Member

I have pushed fefeb27, which should address the exception you (and I) were getting with 0.14.4 -- we managed to change something on the datalad end that stopped the automagic treatment of pathlib Paths as str where desired. But the main problem came later: I can't seem to get access to any of those prior S3 URLs -- maybe they have turned off that access approach already? But maybe it would somehow work for you?

@yarikoptic
Member

FWIW, better to upgrade to datalad 0.14.7, just in case ;)

@tjhendrickson
Author

Thanks for the push. I ran with it and datalad version 0.14.6 (the most recent version conda had).

Here is the information I have gotten from STDERR:
[INFO] Creating a new dataset at /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ABCD-BIDS-3165
[INFO] Creating a new annex repo at /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ABCD-BIDS-3165
[INFO] Creating a helper procedure /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ABCD-BIDS-3165/.datalad/procedures/cfg_nda
[INFO] Running procedure cfg_nda
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====
[INFO] Reading entire file
[INFO] Read 4764349 lines
[INFO] Loaded 4764347 records from 290 submissions for 290 datasets.
[INFO] Creating a new annex repo at /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ABCD-BIDS-3165/sourcedata
[INFO] Running procedure cfg_nda
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====
[INFO] Processing for submission 21948
[INFO] Getting records only for submission_id=21948
[INFO] Got 331758 records
[INFO] Processed entire file: got 331758 files, 5668 subdatasets with 7 files having multiple URLs to possibly reach them
[WARNING] - dataset_description.json: 45848 records
| - README: 45848 records
| - CHANGES: 45848 records
| - task-SST_bold.json: 1191 records
| - task-rest_bold.json: 1470 records
| - task-MID_bold.json: 1235 records
| - task-nback_bold.json: 1197 records
[INFO] Saved output to <_io.TextIOWrapper name='/spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ABCD-BIDS-3165/sourcedata/submission-21948.csv' mode='w' encoding='UTF-8'>
[INFO] -> Saving new submission
[INFO] -> Adding URLs
[INFO] Creating a new annex repo at /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ABCD-BIDS-3165/derivatives/abcd-hcp-pipeline
[INFO] Running procedure cfg_nda
[INFO] == Command start (output follows) =====
[INFO] == Command exit (modification check follows) =====

Is that looking like you expect? If so, can you give me some advice on the resources a job like this will need to run to completion? Currently I have specified 10 CPUs, 20 GB RAM, and 24 hours.

@yarikoptic
Member

Did it annex any files?

Could you please take a head (e.g. 100 rows) of the file, run till completion, and check that files are annexed and can be retrieved if you clone the result and remove the origin remote (so it doesn't just fetch from where you already fetched to)?

Resources -- hard to tell; it depends on whether you use --fast mode (no checksums in annex keys, suboptimal), on bandwidth, etc. IIRC a full run took about a week for me, and that was with roughly 10 jobs parallel across subdatasets and really good bandwidth. Do your manifests contain md5 checksums? If so, I might finally add support (recent git-annex already has the needed functionality) to just trust those, avoiding most if not all traffic -- we would populate the datasets without actually downloading data, only querying NDA/S3 for file sizes AFAIK.

@tjhendrickson
Author

I'm not totally sure which file you want me to take a "head" of, but it does look like it was able to annex at least one file within the "sourcedata" folder, named "submission-21948.csv", which appears to be the manifest as a CSV file.

It does not look like the manifest I used has md5 checksums; the file headers are:

  • "submission_id"
  • "dataset_id"
  • "submission_id"
  • "manifest_name"
  • "manifest_file_name"
  • "associated_file"

Here is the command that I used for the most recent execution:
/home/faird/shared/code/external/utilities/datalad-nda/scripts/datalad-nda add2datalad
-i /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/abcd316520200818/datastructure_manifest.txt
-d /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ABCD-BIDS-3165 -J 10 --fast --drop-after
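A quick stdlib way to check a manifest header for any checksum column, using the column names listed above (`manifest_has_checksum` is my helper, not part of datalad-nda):

```python
def manifest_has_checksum(header_line: str, delimiter: str = "\t") -> bool:
    """Return True if any header column looks like a checksum field."""
    columns = [c.strip().strip('"').lower() for c in header_line.split(delimiter)]
    return any(key in c for c in columns for key in ("md5", "sha1", "sha256", "checksum"))

# Header columns as listed in this thread (quoted, tab-separated):
header = '"submission_id"\t"dataset_id"\t"manifest_name"\t"manifest_file_name"\t"associated_file"'
manifest_has_checksum(header)  # returns False: no checksum column here
```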

@yarikoptic
Member

I'm not totally sure what file that you are wanting me to take a "head" of,

head -n 100 < /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/abcd316520200818/datastructure_manifest.txt > /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/abcd316520200818/datastructure_manifest-100.txt

/home/faird/shared/code/external/utilities/datalad-nda/scripts/datalad-nda add2datalad
-i /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/abcd316520200818/datastructure_manifest-100.txt
-d /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ABCD-BIDS-3165-test100 -J 10 --fast --drop-after

@yarikoptic
Member

It does not look like the manifest I used has md5 checksums, the file headers are:

Checked mine -- the same... I wonder what other file I had tried to work with. Oh well -- I guess there is no "speedy" way for now (unless we see a checksum added to the manifest by NDA -- they might be computing them upon upload to ensure data integrity, etc.)

@tjhendrickson
Author

Okay, what I am hearing you say is that it appears to be working, but it is going to take a lot longer than anticipated because it needs to compute the md5 checksums. Maybe I'll do a little research into how Tobias Kadelka was able to pre-generate the md5 checksums via: https://github.com/TobiasKadelka/build_hcp and http://handbook.datalad.org/en/inm7/usecases/HCP_dataset.html

@tjhendrickson
Author

Just thinking out loud: I could ask the NDA help desk whether they distribute md5 checksums for releases. The other thing I could do: I have the entire ABCD-BIDS collection 3165 as a read-only folder at Minnesota, so I could crawl all of the files and compute an md5 checksum for each. What do you think?

@yarikoptic
Member

Okay, what I am hearing you say is that it appears to be working

Nope -- I am waiting on you to say whether that 100-line head worked out nicely and files were annexed. When I tried, I failed to get access to those listed in the manifest I had.

I could ask the NDA help desk if they distribute md5 checksums for releases.

We could ask, or check their API -- maybe they provide them already... If they do, we could then more insistently ask for them to be included in the distributed manifests ;)

I have the entire ABCD-BIDS collection 3165 as a read-only folder at Minnesota. I could crawl all of the files and pull out the md5 checksums for each file. What do you think?

It is a "danger zone", since it assumes that that folder is 100% identical to what is in NDA and that no evil software/hardware/human bug caused divergence.
I would say that running something like cd /path/to/3165; find -type f | parallel --jobs 10 md5sum > ~/mine/3165.md5sums could be useful later, to compare against what we would get from NDA. But before then we should see whether they have a checksum and which one (it could be something other than md5, e.g. sha256).
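If such a local listing ever gets compared against NDA-provided checksums, the `md5sum` output format ("&lt;hex&gt;  &lt;path&gt;") is easy to diff in a few lines of Python. A sketch with made-up digests and paths:

```python
def parse_md5sums(text: str) -> dict:
    """Parse `md5sum` output lines ("<hex>  <path>") into {path: digest}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, path = line.partition("  ")
        result[path] = digest
    return result

def diverging_files(ours: str, theirs: str) -> list:
    """Paths present in both listings whose digests disagree."""
    a, b = parse_md5sums(ours), parse_md5sums(theirs)
    return sorted(p for p in a.keys() & b.keys() if a[p] != b[p])

# Hypothetical listings (digests shortened for readability):
mine = "d41d8cd98f00b204e9800998ecf8427e  ./sub-01/anat.nii.gz\naaaa  ./README\n"
nda  = "d41d8cd98f00b204e9800998ecf8427e  ./sub-01/anat.nii.gz\nbbbb  ./README\n"
diverging_files(mine, nda)  # -> ['./README']
```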

@tjhendrickson
Author

After running what you suggested here is what the datalad dataset looks like:

derivatives:
abcd-hcp-pipeline

derivatives/abcd-hcp-pipeline:
sub-XXXXXXXXX sub-XXXXXXXXX

derivatives/abcd-hcp-pipeline/sub-XXXXXXXXX:

derivatives/abcd-hcp-pipeline/sub-XXXXXXXXX:

sourcedata:
submission-22640.csv

@yarikoptic
Member

Without any data file under any of those directories? If so, that means it failed to fetch any file -- most likely the "good old direct S3 access" is no longer possible.

@tjhendrickson
Author

Right, no data files underneath any of the directories. The only file underneath any of the given directories is "submission-22640.csv". So what's next if "good old direct S3 access" is no longer possible?

By the way, how does your script go about authenticating users? I've been following along with the tutorial at http://handbook.datalad.org/en/latest/usecases/HCP_dataset.html#data-retrieval-and-interacting-with-the-repository. There it suggests that I explicitly add something to ".config/providers/nda-s3.cfg" for authentication, but it seems like you are going about it with a totally different strategy.

@yarikoptic
Member

By the way, how does your script go about authenticating users?

datalad comes with https://github.com/datalad/datalad/blob/master/datalad/downloaders/configs/nda.cfg (I don't spot the "suggests" you mention present in the handbook), so it uses the nda-s3 type of credential defined here: https://github.com/datalad/datalad/blob/master/datalad/downloaders/credentials.py#L353, which does the dance to get the token to access S3... datalad should ask you for user/password, store those, and then mint a new token whenever needed.
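The provider lookup behind this is regex-based: each provider config carries a URL pattern, and the first match decides which authenticator/credential pair handles the download (the debug log later in this thread shows a truncated `url_res` for the NDA provider). The pattern below is an illustrative assumption, not datalad's actual one:

```python
import re

# Hypothetical provider table in the spirit of datalad's downloader configs:
# the first provider whose url_re matches the URL handles authentication.
PROVIDERS = [
    ("NDA", re.compile(r"s3://ndar_central_\d+/.*", re.IGNORECASE)),
    ("default-s3", re.compile(r"s3://.*")),
]

def provider_for(url: str) -> str:
    for name, pattern in PROVIDERS:
        if pattern.match(url):
            return name
    return "none"

provider_for("s3://NDAR_Central_1/submission_22640/README")  # -> "NDA"
```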

You know -- I have tried now again, and it seemed to work out as it all should have. Maybe things are back to the "old normal"??? Please try again, and if it doesn't work (how? any errors?) we would indeed need to work from the same manifest.

My invocation was

 datalad-nda/scripts/datalad-nda --pdb add2datalad -i datastructure_manifest-100.txt -d testds-datalad2.0-ABCD-BIDS-100 -J 10 --drop-after

on a sample of the top 100 rows of the manifest I had, with datalad 0.14.7 (FWIW: we now have updated datalad and git-annex in conda-forge... I would be interested to discover whether that matters)...

Before that, you could try a plain direct datalad download-url s3://NDAR_Central_1/submission_22640/derivatives/abcd-hcp-pipeline/sub-SENSORED/ses-baselineYear1Arm1/img/DVARS_and_FD_task-nback01.png (find a correct GUID ;)). If that works, the datalad-nda helper should work. If it doesn't, we need to troubleshoot at this level first to get downloads working for you.

@tjhendrickson
Author

I've tried this, but it seems to just hang.

Based on our current discussion I get:
datalad download-url s3://NDAR_Central_1/submission_22640/derivatives/abcd-hcp-pipeline/sub-SENSORED/ses-baselineYear1Arm1/img/DVARS_and_FD_task-nback01.png
[INFO ] Downloading 's3://NDAR_Central_1/submission_22640/derivatives/abcd-hcp-pipeline/sub-SENSORED/ses-baselineYear1Arm1/img/DVARS_and_FD_task-nback01.png' into '/spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/'

And based on our discussion that I started with Michael Hanke, on the HCP-D data:
datalad download-url s3://NDAR_Central_4/submission_33171/SENSORED_V1_MR/unprocessed/Diffusion/SENSORED_V1_MR_dMRI_dir99_AP_SBRef.nii.gz
[INFO ] Downloading 's3://NDAR_Central_4/submission_33171/SENSORED_V1_MR/unprocessed/Diffusion/SENSORED_V1_MR_dMRI_dir99_AP_SBRef.nii.gz' into '/spaces/ngdr/workspaces/hendr522/HCP-D/'

In both cases it does not proceed any further.

@yarikoptic
Member

yarikoptic commented Aug 11, 2021

eh, I bet it is due to datalad/datalad#5099 ... I will look into at least making the situation more obvious.

edit: datalad/datalad#5884 should hopefully mitigate this "halting" issue by, first of all, being more informative

The original intention is to lock while querying credentials, and "theoretically" it should not hang unless another datalad process on the same box is asking for credentials.

Please rerun with datalad -l debug download-url ... -- before "hanging" it will state the lock file path (on Linux: ~/.cache/datalad/locks/downloader-auth.lck; I don't know on OSX). If that is where it hangs:

  • Ctrl-C that process
  • if on Linux, check whether another process holds that lock (fuser -v that-lock-file) and kill it
  • if nothing really holds it and it is just stale somehow, please just rm it and try download-url again
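For the stale-lock case, you can also check from Python whether anything actually holds a lock file before removing it. Note that datalad's own locking goes through the fasteners library; the plain flock below is only an approximation of the same cross-process semantics, and `lock_is_held` is my helper, not a datalad API:

```python
import fcntl

def lock_is_held(path: str) -> bool:
    """Try to grab an exclusive flock non-blockingly; if that fails,
    some process currently holds the lock (so do not delete the file)."""
    with open(path, "a") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(f, fcntl.LOCK_UN)
        return False

# e.g. lock_is_held("/home/user/.cache/datalad/locks/downloader-auth.lck")
```

If this returns False but the file is still present, the lock is stale and removing it (as in the steps above) should be safe.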

@tjhendrickson
Author

Deleting the lock file definitely did the trick. The only problem I am now having is that despite entering the correct NDA username and password, datalad download-url does not accept them. Any ideas?

Here are the messages sent to the debugger:

(datalad_and_nda) hendr522@ln0005 [/spaces/ngdr/workspaces/hendr522/ABCD/data/datalad] % datalad -l debug download-url s3://NDAR_Central_1/submission_22640/README
[DEBUG  ] Command line args 1st pass for DataLad 0.14.7. Parsed: Namespace() Unparsed: ['download-url', 's3://NDAR_Central_1/submission_22640/README'] 
[DEBUG  ] Discovering plugins 
[DEBUG  ] Building doc for <class 'datalad.core.local.status.Status'> 
[DEBUG  ] Building doc for <class 'datalad.core.local.save.Save'> 
[DEBUG  ] Building doc for <class 'datalad.interface.download_url.DownloadURL'> 
[DEBUG  ] Parsing known args among ['/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/bin/datalad', '-l', 'debug', 'download-url', 's3://NDAR_Central_1/submission_22640/README'] 
[DEBUG  ] Async run:
|  cwd=None
|  cmd=['git', '--git-dir=', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Launching process ['git', '--git-dir=', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Process 3556229 started 
[DEBUG  ] Waiting for process 3556229 to complete 
[DEBUG  ] Process 3556229 exited with return code 0 
[DEBUG  ] Determined class of decorated function: <class 'datalad.interface.download_url.DownloadURL'> 
[DEBUG  ] parseParameters: Given "", we split into [] 
[DEBUG  ] parseParameters: Given "credential: Credential, optional
|   Provides necessary credential fields to be used by authenticator
| authenticator: Authenticator, optional
|   Authenticator to use for authentication.", we split into [('credential', 'credential: Credential, optional\n  Provides necessary credential fields to be used by authenticator'), ('authenticator', 'authenticator: Authenticator, optional\n  Authenticator to use for authentication.')] 
[DEBUG  ] parseParameters: Given "", we split into [] 
[DEBUG  ] parseParameters: Given "credential: Credential, optional
|   Provides necessary credential fields to be used by authenticator
| authenticator: Authenticator, optional
|   Authenticator to use for authentication.", we split into [('credential', 'credential: Credential, optional\n  Provides necessary credential fields to be used by authenticator'), ('authenticator', 'authenticator: Authenticator, optional\n  Authenticator to use for authentication.')] 
[DEBUG  ] parseParameters: Given "", we split into [] 
[DEBUG  ] parseParameters: Given "credential: Credential, optional
|   Provides necessary credential fields to be used by authenticator
| authenticator: Authenticator, optional
|   Authenticator to use for authentication.", we split into [('credential', 'credential: Credential, optional\n  Provides necessary credential fields to be used by authenticator'), ('authenticator', 'authenticator: Authenticator, optional\n  Authenticator to use for authentication.')] 
[DEBUG  ] parseParameters: Given "", we split into [] 
[DEBUG  ] parseParameters: Given "method : callable
|   A callable, usually a method of the same class, which we decorate
|   with access handling, and pass url as the first argument
| url : string
|   URL to access
| *args, **kwargs
|   Passed into the method call", we split into [('method', 'method : callable\n  A callable, usually a method of the same class, which we decorate\n  with access handling, and pass url as the first argument'), ('url', 'url : string\n  URL to access'), ('*args, **kwargs', '*args, **kwargs\n  Passed into the method call')] 
[DEBUG  ] Reading files: ['/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/crawdad.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/crcns.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/dockerio.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/figshare.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/hcp.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/indi.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/kaggle.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/loris.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/nda.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/nitrc.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/nsidc.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/openfmri.cfg', '/home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/lib/python3.9/site-packages/datalad/downloaders/configs/providers.cfg'] 
[DEBUG  ] Assigning credentials into 21 providers 
[DEBUG  ] Returning provider Provider(authenticator=<<S3Authenticato++27 chars++one)>>, credential=<<NDA_S3(name='N++40 chars++'>>)>>, name='NDA', url_res=<<['s3://(ndar_c++27 chars++*)']>>) for url s3://NDAR_Central_1/submission_22640/README 
[INFO   ] Downloading 's3://NDAR_Central_1/submission_22640/README' into '/spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/' 
[DEBUG  ] Acquiring a currently existing lock to establish download session. If stalls - check which process holds b'/home/umii/hendr522/.cache/datalad/locks/downloader-auth.lck' 
[DEBUG  ] S3 session: Reconnecting to the bucket 
[DEBUG  ] Importing keyring 
[DEBUG  ] Generating token for NDA user hendr522 using <datalad.support.third.nda_aws_token_generator.NDATokenGenerator object at 0x7fb8cf6cfca0> talking to https://nda.nih.gov/DataManager/dataManager 
DEBUG  :datalad.downloaders.credentials:Generating token for NDA user hendr522 using <datalad.support.third.nda_aws_token_generator.NDATokenGenerator object at 0x7fb8cf6cfca0> talking to https://nda.nih.gov/DataManager/dataManager
ERROR:root:response had error message: Invalid username and/or password
[DEBUG  ] Access was denied: invalid username and/or password [credentials.py:_nda_adapter:330] 
DEBUG  :datalad.downloaders:Access was denied: invalid username and/or password [credentials.py:_nda_adapter:330]
Access to s3://NDAR_Central_1/submission_22640/README has failed.
Do you want to enter other credentials in case they were updated? (choices: yes, no): yes

You need to authenticate with 'NDA' credentials. https://ndar.nih.gov/access.html provides information on how to gain access
user: hendr522

password: 
password (repeat): 
INFO:datalad.ui.dialog:Clear progress bars
download_url(error): /spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/ (file) [Password not found [file_base.py:delete_password:180]]
INFO:datalad.ui.dialog:Refresh progress bars
[DEBUG  ] could not perform all requested actions: Command did not complete successfully. 1 failed:
[{'action': 'download_url',
  'exception_traceback': '[download_url.py:__call__:186,base.py:download:520,base.py:access:210,base.py:_handle_authentication:255,base.py:_enter_credentials:337,credentials.py:enter_new:269,credentials.py:refresh:277,credentials.py:delete:184,keyring_.py:delete:62,core.py:delete_password:65,file_base.py:delete_password:180]',
  'message': 'Password not found [file_base.py:delete_password:180]',
  'path': '/spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/',
  'status': 'error',
  'type': 'file'}] [utils.py:generator_func:459] 
DEBUG  :datalad.cmdline:could not perform all requested actions: Command did not complete successfully. 1 failed:
[{'action': 'download_url',
  'exception_traceback': '[download_url.py:__call__:186,base.py:download:520,base.py:access:210,base.py:_handle_authentication:255,base.py:_enter_credentials:337,credentials.py:enter_new:269,credentials.py:refresh:277,credentials.py:delete:184,keyring_.py:delete:62,core.py:delete_password:65,file_base.py:delete_password:180]',
  'message': 'Password not found [file_base.py:delete_password:180]',
  'path': '/spaces/ngdr/workspaces/hendr522/ABCD/data/datalad/',
  'status': 'error',
  'type': 'file'}] [utils.py:generator_func:459]

@yarikoptic
Member

A quick workaround (BTW, please share the output of datalad wtf --decor html_details) might be to go into your OS credentials manager, remove anything you find for datalad and NDA, and retry. More detail in the fresh issue datalad/datalad#5889.

@tjhendrickson
Author

tjhendrickson commented Aug 12, 2021

DataLad 0.14.7 WTF (configuration, credentials, datalad, dependencies, environment, extensions, git-annex, location, metadata_extractors, metadata_indexers, python, system)

WTF

configuration <SENSITIVE, report disabled by configuration>

credentials

  • keyring:
    • active_backends:
      • PlaintextKeyring with no encryption v.1.0 at /home/umii/hendr522/.local/share/python_keyring/keyring_pass.cfg
    • config_file: /home/umii/hendr522/.config/python_keyring/keyringrc.cfg
    • data_root: /home/umii/hendr522/.local/share/python_keyring

datalad

  • full_version: 0.14.7
  • version: 0.14.7

dependencies

  • annexremote: 1.5.0
  • appdirs: 1.4.4
  • boto: 2.49.0
  • cmd:7z: 16.02
  • cmd:annex: 8.20210803-g99bb214
  • cmd:bundled-git: UNKNOWN
  • cmd:git: 2.32.0
  • cmd:system-git: 2.32.0
  • cmd:system-ssh: 8.6p1
  • exifread: 2.1.2
  • humanize: 3.11.0
  • iso8601: 0.1.16
  • keyring: 23.0.1
  • keyrings.alt: 4.0.2
  • msgpack: 1.0.2
  • mutagen: 1.45.1
  • requests: 2.26.0
  • wrapt: 1.12.1

environment

  • LANG: en_US.UTF-8
  • PATH: /home/umii/hendr522/SW/miniconda3/envs/datalad_and_nda/bin:/home/umii/hendr522/SW/miniconda3/condabin:/panfs/roc/msisoft/fsl/6.0.1/bin:/panfs/roc/msisoft/R/4.0.0/bin:/home/umii/hendr522/SW/aws-cli/bin:/home/dhp/public/storage/s3policy_bin:/home/umii/hendr522/SW/sublime_text_3:/home/umii/hendr522/SW/workbench/bin_rh_linux64:/panfs/roc/msisoft/rclone/1.38/bin:/panfs/roc/groups/3/umii/hendr522/SW/VSCode-linux-x64/bin:/home/umii/hendr522/SW/pycharm-community-2021.1.1/bin:/home/umii/hendr522/bin:/home/faird/shared/CBRAIN_distro/cbrain_git_ruby_gems/ruby/2.7.0/bin:/panfs/roc/msisoft/ruby/2.7.0/bin:/opt/msi/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/puppetlabs/bin:/home/umii/hendr522/.rvm/bin:/home/umii/hendr522/.rvm/bin:/panfs/roc/groups/3/umii/hendr522/SW/simNIBS/bin

extensions

git-annex

  • build flags:
    • Assistant
    • Webapp
    • Pairing
    • Inotify
    • DBus
    • DesktopNotify
    • TorrentParser
    • MagicMime
    • Feeds
    • Testsuite
    • S3
    • WebDAV
  • dependency versions:
    • aws-0.22
    • bloomfilter-2.0.1.0
    • cryptonite-0.26
    • DAV-1.3.4
    • feed-1.3.0.1
    • ghc-8.8.4
    • http-client-0.6.4.1
    • persistent-sqlite-2.10.6.2
    • torrent-10000.1.1
    • uuid-1.3.13
    • yesod-1.6.1.0
  • key/value backends:
    • SHA256E
    • SHA256
    • SHA512E
    • SHA512
    • SHA224E
    • SHA224
    • SHA384E
    • SHA384
    • SHA3_256E
    • SHA3_256
    • SHA3_512E
    • SHA3_512
    • SHA3_224E
    • SHA3_224
    • SHA3_384E
    • SHA3_384
    • SKEIN256E
    • SKEIN256
    • SKEIN512E
    • SKEIN512
    • BLAKE2B256E
    • BLAKE2B256
    • BLAKE2B512E
    • BLAKE2B512
    • BLAKE2B160E
    • BLAKE2B160
    • BLAKE2B224E
    • BLAKE2B224
    • BLAKE2B384E
    • BLAKE2B384
    • BLAKE2BP512E
    • BLAKE2BP512
    • BLAKE2S256E
    • BLAKE2S256
    • BLAKE2S160E
    • BLAKE2S160
    • BLAKE2S224E
    • BLAKE2S224
    • BLAKE2SP256E
    • BLAKE2SP256
    • BLAKE2SP224E
    • BLAKE2SP224
    • SHA1E
    • SHA1
    • MD5E
    • MD5
    • WORM
    • URL
    • X*
  • operating system: linux x86_64
  • remote types:
    • git
    • gcrypt
    • p2p
    • S3
    • bup
    • directory
    • rsync
    • web
    • bittorrent
    • webdav
    • adb
    • tahoe
    • glacier
    • ddar
    • git-lfs
    • httpalso
    • borg
    • hook
    • external
  • supported repository versions:
    • 8
  • upgrade supported from repository versions:
    • 0
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
  • version: 8.20210803-g99bb214

location

  • path: /spaces/ngdr/workspaces/hendr522/HCP-D
  • type: directory

metadata_extractors

  • annex (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: None
    • module: datalad.metadata.extractors.annex
    • version: None
  • audio (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: None
    • module: datalad.metadata.extractors.audio
    • version: None
  • datacite (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: None
    • module: datalad.metadata.extractors.datacite
    • version: None
  • datalad_core (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: None
    • module: datalad.metadata.extractors.datalad_core
    • version: None
  • datalad_rfc822 (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: None
    • module: datalad.metadata.extractors.datalad_rfc822
    • version: None
  • exif (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: None
    • module: datalad.metadata.extractors.exif
    • version: None
  • frictionless_datapackage (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: None
    • module: datalad.metadata.extractors.frictionless_datapackage
    • version: None
  • image (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: None
    • module: datalad.metadata.extractors.image
    • version: None
  • xmp (datalad 0.14.7):
    • distribution: datalad 0.14.7
    • load_error: No module named 'libxmp' [xmp.py::20]
    • module: datalad.metadata.extractors.xmp

metadata_indexers

python

  • implementation: CPython
  • version: 3.9.6

system

  • distribution: centos/7/Core
  • encoding:
    • default: utf-8
    • filesystem: utf-8
    • locale.prefered: UTF-8
  • max_path_length: 294
  • name: Linux
  • release: 3.10.0-1160.36.2.el7.x86_64
  • type: posix
  • version: #1 SMP Wed Jul 21 11:57:15 UTC 2021

@yarikoptic
Member

yarikoptic commented Aug 13, 2021

I removed the verbatim decoration in the WTF output -- look at how it looks now ;)

So -- did you succeed with the credentials workaround?
FWIW, I submitted a PR to fix it so the workaround would not be needed: datalad/datalad#5892 (you can python3 -m pip install git+https://github.com/yarikoptic/datalad@bf-delete-credential; if datalad --version reports 0.14.6+60.g3cddd7c4a, you got it)
