Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: DTITK Interface #2514

Merged
merged 58 commits into from
Apr 15, 2018
Merged

FIX: DTITK Interface #2514

merged 58 commits into from
Apr 15, 2018

Conversation

kesshijordan
Copy link
Contributor

@kesshijordan kesshijordan commented Mar 27, 2018

Fixes #2462

Changes proposed in this pull request

Closes #2463.

Kesshi Jordan and others added 29 commits September 11, 2017 11:04
…n in that directory through another mechanism.
@effigies effigies added this to the 1.0.3 milestone Mar 27, 2018
@effigies
Copy link
Member

Here's a mixin you can put in dtitk/base.py

class DTITKRenameMixin(object):
    def __init__(self, *args, **kwargs):
        classes = [cls.__name__ for cls in self.__class__.mro()]
        dep_name = classes[0]
        rename_idx = classes.index('DTITKRenameMixin')
        new_name = classes[rename_idx + 1]
        warnings.warn('The {} interface has been renamed to {}\n'
                      'Please see the documentation for DTI-TK '
                      'interfaces, as some inputs have been '
                      'added or renamed for clarity.'
                      ''.format(dep_name, new_name),
                      DeprecationWarning)
        super(DTITKRenameMixin, self).__init__(*args, **kwargs)

And then for classes that have been renamed, we can just have a big sequence of dummy classes like so:

dtitk/registration.py:

class RigidTask(DTITKRenameMixin, Rigid):
    pass
...
class diffeoScalarVolTask(DTITKRenameMixin, DiffeoScalarVol):
    pass

dtitk/utils.py:

class TVAdjustVoxSpTask(DTITKRenameMixin, TVAdjustVoxSp):
    pass
...
class BinThreshTask(DTITKRenameMixin, BinThresh):
    pass

@satra @mgxd @djarecka I suspect you might have preferences for how deprecations are handled. Feel free to suggest any modifications (e.g. do we want to set a version that these will be removed by?).

return
return fname_presuffix(os.path.basename(
self.inputs.in_df).split('.')[0],
suffix='_affdf.df.nii')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you originally had (a10da5d): name_template="%s_aff.df", keep_extension=True

I think we can achieve that pattern as follows:

return fname_presuffix(os.path.basename(self.inputs.in_df),
                       suffix='_aff.df', keep_ext=True)

Copy link
Contributor Author

@kesshijordan kesshijordan Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, fname_presuffix still keeps the .df, so we end up with an extra extension (filename.df_aff.df.nii). I wanted to make it match the other warps .df.nii extension with something like _affdf prepended so the user knows this warp includes both the affine and diffeomorphic transforms together (filename_affdf.df.nii or filename_affdf.df.nii.gz). Is the extra extension okay or would it be better to match the other warp files that come from just the diffeo transform?

Copy link
Member

@effigies effigies Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I misunderstood the situation (and I'm now no longer sure that fixing #2506 will let us do what you want). There are two perspectives here, IMO.

  1. File names are semantically meaningful, and as long as we're going to impose a convention within the DTI-TK interfaces, we should respect it.

  2. Nipype is often used to bring tools together that have zero respect for each others' conventions, so attempting to manipulate names semantically will fail in some cases. Naming should largely be handled at the DataSink level.

I think we can sort of split the difference here:

path, base, ext = split_filename(self.inputs.in_df)
suffix = '_affdf'
if base.endswith('.df'):
    suffix += '.df'
    base = base[:-3]
return fname_presuffix(base, suffix=suffix + ext, keep_ext=False)

This way if you get a file named x.df.nii, you'll get x_affdf.df.nii, but if you just get x.nii, you'll get x_affdf.nii.

Did I get your convention right? And does this seem like a reasonable way to achieve that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Thanks! I think it's a good idea to match the diffeomorphic naming convention.

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I think this is basically done at this point. One thing that caught my eye in these last few changes, but otherwise, I think this can go in when tests pass.

If anybody else wants to review before merging, make some noise before the tests are all green.

if name == "out_file":
return self._list_outputs()["out_file"]
return None
class BinThreshTASK(DTITKRenameMixin, BinThresh):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TASK is all caps.

@kesshijordan
Copy link
Contributor Author

@effigies can I test the automatic naming in a workflow before this gets merged?

@effigies
Copy link
Member

Sounds good. Just say when you're ready.

@kesshijordan
Copy link
Contributor Author

kesshijordan commented Apr 10, 2018

Thanks, @effigies . Something isn't right and I'm trying to figure out what it is... in the report.rst for nodes with multiple outputs that are named automatically, the transform is not getting logged as an output ("out_file_xfm" is undefined). If I look at the .json, "needed_outputs" lists only "out_file" ("out_file_xfm" is missing)... I think this is making it delete the transform from the directory since it isn't listed in "needed outputs" (the transforms are missing from the node directory). Do I need to do something with "aggregate_outputs" ?

@effigies
Copy link
Member

You can update the configuration to set remove_unnecessary_outputs to false, and see if that resolves the issue.

If remove_unnecessary_outputs is true, then any file that isn't fed into some other node or otherwise marked as needed will be removed. I don't remember the details of how to mark as needed, but you shouldn't need to mess with it. I set it to false while developing, to make sure that I can see all the outputs without re-running the node, but then make sure it's still true by default for final pipelines.

@kesshijordan
Copy link
Contributor Author

kesshijordan commented Apr 12, 2018

Thanks for the suggestion, @effigies. That was helpful in debugging. I think I figured out what's been causing problems and how to fix it. It seems like the outputs aren't moving between node directories correctly in registration nodes. Let me know if you think this is a good fix.

DTI-TK saves the outputs in the same directory as the inputs for the functions that don't take an output path (e.g. Rigid registration), so I addressed this by setting copyfile=False for both fixed and moving images (which means that the execution inputs are paths to the input files in the node working directory instead of the previous node's directory). Then, the outputs (image and rigid transform) get put in the node working directory instead of the input node directory (which is what we want). We still have an issue, however, in that there is no input for the initialization transform (to provide the Affine node with the Rigid node's output transform), so it needs to be copied to the Affine node working directory.

I addressed this by making the boolean flag instructing the use of a precomputed transform to initialize the registration (use_in_transform) into a file that takes in the .aff transform file. It adds the transform as a string, which triggers using the transform just like the "1" flag would, and leaves it off it it isn't set (thus not triggering "use transform". Then I can set copyfile=True to ensure that the transform makes it into the new directory even though it isn't specified as an input to the command line.

Cases:
nothing input to Affine.inputs.initialize_xfm:
dti_affine_reg ...working_dir/dtitk_reg/rigid_node/Ptensor_resampled.nii.gz ...working_dir/dtitk_reg/rigid_node/Ttensor_resampled.nii.gz EDS 4 4 4 0.01

Rigid.outputs.out_file_xfm connected to Affine.inputs.initialize_xfm
dti_affine_reg ...working_dir/dtitk_reg/affine_node/Ptensor_resampled.nii.gz ...working_dir/dtitk_reg/affine_node/Ttensor_resampled.nii.gz EDS 4 4 4 0.01 ...working_dir/dtitk_reg/affine_node/Ttensor_resampled.aff

One thing to be addressed with this approach is that we need to enforce the DTITK format or it's a silent failure (if it can't open the .aff transform file, it just skips it and still performs the registration with a different initialization strategy)
I was thinking: can we search the stderr for keywords to raise an error if the .aff file is not found?
stderr: filename.aff doesn't exist or can't be opened prints while it runs

@effigies
Copy link
Member

Yes, using copyfile=False is a good approach to making sure these outputs are placed in the correct directory. If I'm understanding the initialization issue correctly, 1 means "look for a file named <moving>.aff and use that", which is failing because we're no longer looking in the output directory of the old node? And thus we pass an absolute filename instead?

If that's correct, I think we still want to be able to use 1 (if someone is using the interface outside a workflow), but also accept filenames:

initialize_xfm = traits.Either(File(exists=True, copyfile=False), "1",
                               desc="Initialize with DTITK-FORMAT transform "
                                    "('1' to initialize with existing transform)",
                               position=5, argstr="%s")

I'm not sure whether copyfile is necessary, but it can't hurt.

And then, to check for stderr, you can do something like this:

def _run_interface(self, runtime):
runtime = super(EPIDeWarp, self)._run_interface(runtime)
if runtime.stderr:
self.raise_exception(runtime)
return runtime

You can modify the check to look for a specific string, if there may be innocuous stderr output.

@kesshijordan
Copy link
Contributor Author

Yes; that is my understanding of the issue. Your suggestion works for running the interface alone, but in the pipeline it doesn't put the symlink to .aff in the working directory. It works if you put copyfile=False outside the File(), but that breaks if you set it to 1. Do you know why copyfile=False would behave differently inside of traits.Either?

copyfile=False as argument to File inside of traits.Either sets the .aff in the rigid node and it can't find the .aff in the working directory.
dti_affine_reg ...working_dir/dtitk_reg/affine_node/Ptensor_resampled.nii.gz ...working_dir/dtitk_reg/affine_node/Ttensor_resampled.nii.gz EDS 4 4 4 0.01 ...working_dir/dtitk_reg/rigid_node/Ttensor_resampled.aff

copyfile=False as argument to traits.Either works, but then breaks if you set it to 1 because it can't copy the non-file
dti_affine_reg ...working_dir/dtitk_reg/affine_node/Ptensor_resampled.nii.gz ...working_dir/dtitk_reg/affine_node/Ttensor_resampled.nii.gz EDS 4 4 4 0.01 ...working_dir/dtitk_reg/affine_node/Ttensor_resampled.aff

@effigies
Copy link
Member

Oh, so if you specify the location, it uses that location as both input and output? I get it now. Out of curiosity, is it still a symlink after running? If so, I suspect it's overwriting the file back in the rigid node, in which case you'll want copyfile=True.

And I guess let's drop the Either and just use File. The copyfile in Either issue is probably a bug, but fixing it is definitely out-of-scope for this PR.

@kesshijordan
Copy link
Contributor Author

It still saves the .aff to the node working directory (because the moving file gets symlinked and that is the path used in the execution input, which is the basis for the output <moving>.aff). You can't specify the location of the output on the command line (there's no field)... the only thing it does is guess what the .aff would be from the moving filename and use it if the flag is set to anything and the file exists. Since I'm setting copyfile=False on the fixed and moving images, I think it's symlinking those files to the node working directory. (the original inputs are the previous node directories, but the execution inputs are the node working directory). That's why I need to symlink the initialize_xfm path to the working directory; because it will be looking there. If it doesn't find it (as is the case with Either), it still generates the output by the same name <moving>.aff, but does not use the previous affine to initialize. I'm cheating by specifying an input the command line function can't take, but if I set copyfile=False it symlinks and works the way we want it to.

I don't see the symlinks after running (I think they're deleted after the node processes), but I can infer that they're there during processing from the 'Execution Inputs' section of the report.rst file (and the output filename <moving>.aff would not be in the directory if that were not the case).

If we drop Either and just use File, then I can leave exists as False so we can still use the interface by itself with the flag '1'. It won't work with initialize_xfm=True or initialize_xfm=1 like we had it before, though... it must be a string because we're tricking it into thinking the flag is a filename and providing anything at all in that field to the command line function will trigger the desired behavior (using the pre-calculated transform in the working directory). Does that sound good? It's a little gross but I'm not sure how else to do it without changing copyfile in Either.

@effigies
Copy link
Member

So just regarding symlinks and <moving>.aff: Do you have a case where you are symlinking <moving>.aff from the Rigid node, and then have a <moving>.aff as a still-existing output in your Affine node? (It's okay if the symlinks of the files are deleted after being used.) I'm trying to figure out the file writing behavior of this utility.

Specifically, I'm trying to verify whether a new file is being written or if it's writing to the old file through the symlink. To illustrate the idea:

$ echo A > a
$ cat a
A
$ ln -s a b
$ echo B > b
$ cat a
B

Basically, if this is what's happening, we need to switch to copyfile=True, which will make a copy instead of a symlink. If it's unclear, I would go ahead and set copyfile=True no matter what, since this file should be well under a kilobyte.

Sorry if this was all clear to you and you were trying to make a different point, by the way.

If we drop Either and just use File, then I can leave exists as False so we can still use the interface by itself with the flag '1'.

So at this point we're balancing which is more useful to a user: checking whether the file exists or letting them type "1" to mean "<moving>.aff". I can go either way, but I'm slightly inclined toward exists=True, as the majority of the time it will be used will be in a workflow, where "1" isn't an option but checking existence is useful. (And they should be able to add affine.inputs.args = ["1"] to get that behavior, if they really need it.)

@kesshijordan
Copy link
Contributor Author

I think the intended behavior is to overwrite the previous .aff file behind the scenes to cut down on processing, but I don't think that happens backward through the symlink in our case because the "execution" paths are all to the working directory so the <moving>.nii file has a path in the current directory affine/<moving>.nii (I assume this is a symlink that gets deleted). That means the transform save path is affine/<moving>.aff so I think it's writing in the new directory and keeping the transform, since it's passed to the next step. The way it's written, though, the flag is set to copyfile=True for the transform, but the fixed and moving files have copyfile=False. I think it copies the rigid transform over, symlinks the fixed and moving files so that the saved transform will be in the right place (affine/<moving>.aff), runs the function initializing with the copied .aff in the working directory, and then overwrites the copied .aff and deletes the symlinks. Does that sound right? The affines are different in rigid vs affine when the images are different sizes so I think it's working as it should.

I agree that setting exists=True makes more sense to the user... it ensures that the user knows exactly what file they're initializing with. It can be confusing when the function is finding its own file in the working directory and I think that would cause problems if the user doesn't realize what is happening. Worst case scenario, they have to figure out the filename, which they should have anyway since they're using it to initialize (most software packages ask for the xfm path so this is more in line with normal use)... or, as you mentioned, they could add affine.inputs.args if they really want to use a 1.

initialize_xfm = File(copyfile=True, desc="Initialize w/DTITK-FORMAT" "affine", position=5, argstr="%s", exists=True)

I made that change, and added simple workflows for performing affine and diffeomorphic registration to make sure a user has a functioning pipeline example that adheres to dtitk's requirements (e.g. array size dimension zero must be a power of 2). Does that look good?

@effigies
Copy link
Member

Nice. https://148-59684390-gh.circle-artifacts.com/0/home/circleci/work/docs/interfaces/generated/workflows.dmri/dtitk.tensor_registration.html

This looks great. If you're all set, I'm happy to merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants