This repository has been archived by the owner on Nov 2, 2021. It is now read-only.

feature request: hirni-import-dcm from ssh/sftp urls #135

Open
pvavra opened this issue Dec 4, 2019 · 4 comments

Comments

@pvavra

pvavra commented Dec 4, 2019

A nice feature would be importing dicoms from tarballs located on a remote file server (not yet a datalad dataset) that offers only ssh/sftp access.

My understanding is that hirni-import-dcm cannot use ssh:// or sftp:// urls, because git annex addurl doesn't handle those.

On my local machine I tried (and failed):

datalad hirni-import-dcm ssh://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556 acq1
[INFO   ] Creating a new annex repo at /home/petvav/tmp/test_hirni/sourcedata/acq1/dicoms 
[WARNING] Running addurl resulted in stderr output: Configuration does not allow accessing ssh://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556
download failed: Configuration does not allow accessing ssh://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556
git-annex: addurl: 1 failed
 
CommandError: command 'addurl'
Error, annex reported failure for addurl (url='ssh://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556'): {'command': 'addurl', 'success': False, 'error-messages': [], 'file': 'cm39_3556'}

as well as

datalad hirni-import-dcm sftp://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556 acq2
[INFO   ] Creating a new annex repo at /home/petvav/tmp/test_hirni/sourcedata/acq2/dicoms 
[WARNING] Running addurl resulted in stderr output: Configuration does not allow accessing sftp://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556
download failed: Configuration does not allow accessing sftp://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556
git-annex: addurl: 1 failed
 
CommandError: command 'addurl'
Error, annex reported failure for addurl (url='sftp://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556'): {'command': 'addurl', 'success': False, 'error-messages': [], 'file': 'cm39_3556'}

If I'm misunderstanding something and this should work, I would propose to improve the docs to reflect that ;)

@bpoldrack

Generally, your understanding of the error is correct. git-annex-addurl refuses ssh:// scheme URLs.
However, hirni should still be able to deal with that and if it currently doesn't I need to fix that.

Apart from this particular issue, @pvavra : Thanks for trying and reporting all the troubles you have lately!
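
To illustrate the scheme restriction discussed above, here is a minimal sketch of a pre-flight check. It assumes git-annex's documented default for `annex.security.allowed-url-schemes` (`http https ftp`); the helper name `addurl_would_accept` is hypothetical and is not part of hirni, datalad, or git-annex.

```python
from urllib.parse import urlparse

# Assumption: the documented git-annex default for
# annex.security.allowed-url-schemes; a local configuration may differ.
DEFAULT_ALLOWED_SCHEMES = ("http", "https", "ftp")


def addurl_would_accept(url, allowed_schemes=DEFAULT_ALLOWED_SCHEMES):
    """Hypothetical helper: True if the URL scheme is in the allowed list."""
    return urlparse(url).scheme.lower() in allowed_schemes


# The two URLs from the report above, plus a placeholder https URL.
for url in (
    "ssh://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556",
    "sftp://pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556",
    "https://example.com/dicoms.tar.gz",
):
    print(url, "->", "allowed" if addurl_would_accept(url) else "refused")
```

Such a check only predicts the refusal; it does not make ssh:// or sftp:// sources importable.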

@pvavra
Author

pvavra commented Dec 4, 2019

> However, hirni should still be able to deal with that and if it currently doesn't I need to fix that.

It does not work with datalad 0.12rc6 or 0.12rc4, in case this depends on the datalad version.

@pvavra
Author

pvavra commented Dec 4, 2019

@bpoldrack: given that hirni isn't currently handling this, I have the following question:
What's our best approach to import the data for now?

My plan is the following:

  1. create a "staging area", e.g. under $project_root/scratch/mini_helios/, and rsync the tarballs (and physio data) into it; the behavioral logfiles (saved as .mat files) can be copied there directly as well.
  2. use hirni-import-dcm with a local path instead of the ssh:// one; analogously for the physio and behavioral data.
  3. change the studyspec.json files to reflect what we need, especially for the .mat files, which need a custom procedure.

With that, the sourcedata folder should be ready for a hirni-spec2bids call.
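
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the rsync source uses the usual `user@host:/path` form, and the script is meant to be run from the study dataset root. It prints the commands by default and only executes them when `dry_run=False`.

```python
import shlex
import subprocess
from pathlib import Path, PurePosixPath


def staging_commands(remote_source, staging_dir, acquisition_id):
    """Build argv lists: rsync into the staging area, then import from the local copy."""
    staging_dir = staging_dir.rstrip("/")
    # Name of the tarball on the remote side, e.g. "cm39_3556".
    tarball_name = PurePosixPath(remote_source.split(":", 1)[-1]).name
    local_copy = f"{staging_dir}/{tarball_name}"
    return [
        ["rsync", "-av", remote_source, f"{staging_dir}/"],
        # Same CLI form as in the report above, but with a local path.
        ["datalad", "hirni-import-dcm", local_copy, acquisition_id],
    ]


def run_staging(remote_source, staging_dir, acquisition_id, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    if not dry_run:
        Path(staging_dir).mkdir(parents=True, exist_ok=True)
    for argv in staging_commands(remote_source, staging_dir, acquisition_id):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run_staging(
    "pvavra@medusa.ovgu.de:/home/data/sfb_cuetarget/sourcedata/dicoms/cm39_3556",
    "scratch/mini_helios",
    "acq1",
)
```

Editing the studyspec.json files (step 3) is left out, since it depends on the study's custom procedure.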

The question is whether it will later be possible to change the "source" of the dicoms to the ssh:// version, much like changing a git remote. I am aware that datalad has this functionality (though I haven't yet read carefully how it would apply to our case), but I'm unsure how to do that for hirni-import-dcm, and whether it is possible in the first place.

Or is there an alternative approach I am not seeing?

@bpoldrack

This is pretty much what I'd suggest to do for now.
Generally, additional (or alternative) sources can be added later. If annex can handle them, that's particularly easy. What exactly to add later in this case depends a bit on the solution I come up with, but one way or another this possibility will be there; I'm just not sure yet which mechanism to use. Access to that SSH source could be realized via a git-annex special remote. In that case, it would mean setting up such a remote and letting annex fetch availability information from there. The difference would be that what you need to do wouldn't happen at the file level, but at the (sub-)dataset level (the dicoms dataset created by import-dcm).
Either way: Yes, it will be possible to register the original source later on.

If you happen to experiment more with how annex natively supports such things: in the case of imported dicom archives, the location of the relevant file is somewhat "hidden" in a special branch called "incoming" within that dicom subdataset. The actual archive is registered there (and is in turn registered as the source for the extracted dicoms you see in the master branch).
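
For anyone who wants to look at that "incoming" branch, a minimal sketch follows. It assumes the dicoms subdataset lives at `sourcedata/acq1/dicoms` (the path from the log output earlier in this thread) and uses only `git ls-tree` and `git annex whereis --branch`; the function name is hypothetical. The commands are printed rather than executed.

```python
import shlex


def incoming_inspection_commands(dicom_dataset, branch="incoming"):
    """Build argv lists that inspect the archive registered in the given branch."""
    return [
        # List the files tracked in the branch (the imported archive).
        ["git", "-C", dicom_dataset, "ls-tree", "-r", "--name-only", branch],
        # Ask git-annex where the content of those files is known to be available.
        ["git", "-C", dicom_dataset, "annex", "whereis", f"--branch={branch}"],
    ]


for argv in incoming_inspection_commands("sourcedata/acq1/dicoms"):
    print(shlex.join(argv))
```

The output of the second command should show which locations annex currently knows for the archive, which is where a later-registered ssh source would presumably appear.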
