AddCSVRow interface #829

Merged
merged 19 commits into from
Jul 31, 2014

Conversation

oesteban
Contributor

oesteban commented Apr 3, 2014

A complementary interface to AddCSVColumn

@coveralls

Coverage Status

Coverage remained the same when pulling fd99ffd on oesteban:enh/AddCSVRow into cba8e41 on nipy:master.

@satra
Member

satra commented Apr 8, 2014

please do a make specs

@oesteban
Contributor Author

This is a much improved version using pandas. The fields are no longer checked manually (pandas DataFrames are created for this).

Additionally, the columns can now be named in a DataSink style: using custom input traits, the values are directly associated with the header.

Finally, it allows for saving lists, expanding the header to the number of elements in the list.
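
The behaviour described above can be illustrated with a small standalone sketch. It is not the code in this PR: `flatten_row` and `add_csv_row` are hypothetical helper names, and the `name_0`, `name_1` column naming for list values is an assumption made for illustration only.

```python
import os
import tempfile

import pandas as pd


def flatten_row(fields):
    """Expand list/tuple values into one column per element (hypothetical naming)."""
    row = {}
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            for i, item in enumerate(value):
                row["%s_%d" % (key, i)] = item
        else:
            row[key] = value
    return row


def add_csv_row(path, **fields):
    """Append one row to a CSV file, creating the file if it does not exist."""
    frame = pd.DataFrame([flatten_row(fields)])
    if os.path.exists(path):
        # Let pandas align the columns; missing values become NaN.
        frame = pd.concat([pd.read_csv(path), frame], ignore_index=True)
    frame.to_csv(path, index=False)
    return frame


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "results.csv")
    add_csv_row(out, subject="s1", dice_index=[0.8, 0.9])
    print(add_csv_row(out, subject="s2", dice_index=[0.7, 0.6]))
```

Keyword arguments stand in for the dynamically created input traits, so each argument name becomes a column header.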


class AddCSVRow(BaseInterface):
"""
Short interface to add an extra row to a text file
Member

could you please add here that pandas is required for this interface?

@oesteban
Contributor Author

Dear @satra, I based my code on the DataSink interface to get auto-generated names for columns. The interface seems to work fine in isolation, but when I insert it into a workflow, I get errors like "Module AddCSVRow has no input called dice_index".

What else should I write to get the interface working with workflows?

Thanks!

@coveralls

Coverage Status

Coverage remained the same when pulling 96462b2 on oesteban:enh/AddCSVRow into cba8e41 on nipy:master.

@oesteban
Contributor Author

Ok, I see that for DataSink and DataGrabber the check is skipped:

'.io' in str(destnode._interface.__class__)):

and so it should be for the new approach to AddCSVRow.

Should this interface be under io? Or could we add some indicator to allow dynamic traits?

@oesteban
Contributor Author

I'm playing around with this dynamically traited version of AddCSVRow, and it seems to me that this interface should be considered to be under IO:

  • It is unclear what happens if, under the MultiProc plugin, AddCSVRow is called simultaneously from two threads. A file-lock system should be implemented (I volunteer for this if you think it is really necessary).
  • There are problems integrating AddCSVRow into workflows, besides the one reported previously. More precisely, after adding one more exception at L352, I am experiencing crashes like this:
140411-10:52:19,424 workflow DEBUG:
         deepcopy of <class 'nipype.algorithms.misc.AddCSVRow'>
140411-10:52:19,425 workflow DEBUG:
         Aggregate: True
140411-10:52:19,426 workflow ERROR:
         ['Node AddRow.a2 failed to run on host hades.']
140411-10:52:19,426 workflow INFO:
         Saving crash info to crash-20140411-105219-oesteban-AddRow.a2.pklz
140411-10:52:19,426 workflow INFO:
         Traceback (most recent call last):
  File "/home/oesteban/workspace/nipype/nipype/pipeline/plugins/linear.py", line 38, in run
    node.run(updatehash=updatehash)
  File "/home/oesteban/workspace/nipype/nipype/pipeline/engine.py", line 1392, in run
    self._run_interface()
  File "/home/oesteban/workspace/nipype/nipype/pipeline/engine.py", line 1502, in _run_interface
    self._result = self._run_command(execute)
  File "/home/oesteban/workspace/nipype/nipype/pipeline/engine.py", line 1604, in _run_command
    self._originputs = deepcopy(self._interface.inputs)
  File "/usr/lib/python2.7/copy.py", line 174, in deepcopy
    y = copier(memo)
  File "/home/oesteban/workspace/nipype/nipype/interfaces/base.py", line 579, in __deepcopy__
    dup_dict = deepcopy(self.get(), memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 230, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/usr/lib/python2.7/copy.py", line 174, in deepcopy
    y = copier(memo)
  File "/usr/local/lib/python2.7/dist-packages/numpy/ma/core.py", line 5541, in __deepcopy__
    copied = MaskedArray.__new__(type(self), self, copy=True)
  File "/usr/local/lib/python2.7/dist-packages/numpy/ma/core.py", line 2688, in __new__
    data._mask.shape = data.shape
AttributeError: attribute 'shape' of 'numpy.generic' objects is not writable

and:

Traceback (most recent call last):
  File "/home/oesteban/bin/run_evaluations.py", line 89, in <module>
    wf.run()
  File "/home/oesteban/workspace/nipype/nipype/pipeline/engine.py", line 695, in run
    runner.run(execgraph, updatehash=updatehash, config=self.config)
  File "/home/oesteban/workspace/nipype/nipype/pipeline/plugins/base.py", line 261, in run
    slots=slots, graph=graph)
  File "/home/oesteban/workspace/nipype/nipype/pipeline/plugins/base.py", line 387, in _send_procs_to_workers
    tid = self._submit_job(deepcopy(self.procs[jobid]),
  File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 174, in deepcopy
    y = copier(memo)
  File "/home/oesteban/workspace/nipype/nipype/interfaces/base.py", line 579, in __deepcopy__
    dup_dict = deepcopy(self.get(), memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python2.7/copy.py", line 230, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/usr/lib/python2.7/copy.py", line 174, in deepcopy
    y = copier(memo)
  File "/usr/local/lib/python2.7/dist-packages/numpy/ma/core.py", line 5541, in __deepcopy__
    copied = MaskedArray.__new__(type(self), self, copy=True)
  File "/usr/local/lib/python2.7/dist-packages/numpy/ma/core.py", line 2688, in __new__
    data._mask.shape = data.shape
AttributeError: attribute 'shape' of 'numpy.generic' objects is not writable

Methods _outputs and _add_output_traits were missing to provide a
fully IOBase-like interface.
@oesteban
Contributor Author

I've found that the source of the problem was not the interface, but the inputs I was supplying (numpy masked arrays).
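
One way to sidestep this kind of crash is to convert masked values to plain Python objects before assigning them to the inputs. The sketch below is only an illustration under that assumption: `to_plain` is a hypothetical helper, not part of nipype, and replacing masked entries with NaN is a choice made here for simplicity.

```python
import numpy as np


def to_plain(value):
    """Convert a (possibly masked) numeric value to plain Python floats.

    Masked entries are replaced by NaN, so the result contains only
    built-in floats (or nested lists of floats), which deepcopy cleanly.
    """
    masked = np.ma.asarray(value).astype(float)
    return masked.filled(np.nan).tolist()


if __name__ == "__main__":
    scores = np.ma.masked_array([0.5, 0.7, 0.9], mask=[False, True, False])
    print(to_plain(scores))         # list of floats, NaN where masked
    print(to_plain(scores.mean()))  # a single built-in float
```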

Remaining decisions:

  • Add a file-locking system to the interface, to be threadsafe in MultiProc execution.
  • Allow dynamically traited interfaces, e.g., by modifying these lines as follows:
    def _check_inputs(self, parameter):
        if hasattr(self.inputs,'_outputs'):
            return True
        return hasattr(self.inputs, parameter)

@coveralls

Coverage Status

Coverage remained the same when pulling 916ddfe on oesteban:enh/AddCSVRow into cba8e41 on nipy:master.

@oesteban
Contributor Author

@satra, @chrisfilo what do you think about my last suggestions? Other than that, I'm currently using the new interface without any problems.

@chrisgorgo
Member

Sorry for the delay. I'm happy to merge it as long as we give a warning that it is not thread-safe (and add it to the description).

@satra
Member

satra commented May 27, 2014

one quick semantic note: should we call it GetCSVRow?

@oesteban
Contributor Author

@satra Well, GetCSVRow sounds more like reading a row, right? This is the counterpart to the existing AddCSVColumn, with some improvements: using pandas makes it much easier to implement features safely (managing columns, missing values, etc.). In any case, I will use whichever name you prefer, because I have no concerns about that.

@chrisfilo Regarding the file lock, I'll add the warning and a note in the description. Regarding the dynamically traited interfaces, I think the piece of code I suggested is necessary for this interface to work. Alternatively, we can include AddCSVRow as a DataSink, which already accepts dynamic traits.

@satra
Member

satra commented May 27, 2014

@oesteban - my bad - this is actually adding a row - no need to change names.

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 7e4d434 on oesteban:enh/AddCSVRow into cba8e41 on nipy:master.

@oesteban
Contributor Author

Travis is reporting a failure for Python 2.6, but it was successful for 2.7. I guess it's a problem in the test configuration for 2.6.

@coveralls

Coverage Status

Coverage decreased (-0.03%) when pulling 3b23d82 on oesteban:enh/AddCSVRow into ab0e8b9 on nipy:master.

Now, the interface is thread-safe using lockfile. It is included in
documentation, and also a warning is issued when the module is not
available or could not be imported.
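
For readers unfamiliar with the idea, the locking pattern can be sketched with the standard library alone. This is not the `lockfile`-based code from the commit: `simple_file_lock` and `append_line` are hypothetical names, and the exclusive-create lock file is a swapped-in technique that only illustrates how writers are serialized.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def simple_file_lock(path, timeout=10.0, poll=0.05):
    """Hold an exclusive '<path>.lock' file while the block runs."""
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another writer holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire %s" % lock_path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def append_line(path, line):
    """Append one line to a shared file while holding the lock."""
    with simple_file_lock(path):
        with open(path, "a") as handle:
            handle.write(line + "\n")


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "rows.csv")
    append_line(target, "subject,dice_index")
    append_line(target, "s1,0.8")
    print(open(target).read())
```

A lock file left behind by a crashed process would block later writers until the timeout expires, which is one reason a dedicated locking library may be preferable in practice.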
@coveralls

Coverage Status

Coverage decreased (-0.26%) when pulling 4d3e368 on oesteban:enh/AddCSVRow into ab0e8b9 on nipy:master.

@oesteban
Contributor Author

@satra, @chrisfilo: I think this interface is in a pretty mature state. But there's still something to be decided: how to allow dynamically traited inputs in a BaseInterface?

I proposed modifying these lines as follows:

    def _check_inputs(self, parameter):
        if hasattr(self.inputs,'_outputs'):
            return True
        return hasattr(self.inputs, parameter)

This is the simplest way to do it, but we could also add a metadata flag to the traits indicating whether they are dynamic (instead of checking for a special name).

@chrisgorgo
Member

Why not:

def _check_inputs(self, parameter):
    if isinstance(self.inputs, DynamicTraitedSpec):
        return True
    return hasattr(self.inputs, parameter)

instead?
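
To make the effect of this check concrete, here is a self-contained sketch using stand-in classes. `TraitedSpec`, `DynamicTraitedSpec`, and `Node` below are simplified placeholders defined only for this example; they borrow the names used in the thread but are not the nipype implementations.

```python
class TraitedSpec(object):
    """Stand-in for an input spec with a fixed set of fields."""

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)


class DynamicTraitedSpec(TraitedSpec):
    """Stand-in for an input spec that accepts fields added at run time."""


class Node(object):
    """Minimal stand-in for the object that validates connections."""

    def __init__(self, inputs):
        self.inputs = inputs

    def _check_inputs(self, parameter):
        # Dynamic specs accept any input name; fixed specs must declare it.
        if isinstance(self.inputs, DynamicTraitedSpec):
            return True
        return hasattr(self.inputs, parameter)


if __name__ == "__main__":
    fixed = Node(TraitedSpec(in_file="results.csv"))
    dynamic = Node(DynamicTraitedSpec(in_file="results.csv"))
    print(fixed._check_inputs("dice_index"))    # False: not declared
    print(dynamic._check_inputs("dice_index"))  # True: accepted dynamically
```

Checking the type of the spec avoids relying on a specially named attribute such as `_outputs`.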

@oesteban
Contributor Author

You're right, much more elegant 👍

@chrisgorgo
Member

Could you test if it works and add it to this PR?

On Thu, Jul 24, 2014 at 12:30 PM, Oscar Esteban wrote:

You're right, much more elegant 👍

@oesteban
Contributor Author

I'll let you know as soon as I test it :)

oesteban added 2 commits July 28, 2014 13:06
Conflicts:
	CHANGES
	nipype/algorithms/misc.py

Merge after updating master to upstream
@coveralls

Coverage Status

Coverage decreased (-0.02%) when pulling 5e780bf on oesteban:enh/AddCSVRow into 8a5a190 on nipy:master.

@oesteban
Contributor Author

This is working OK for me 👍

@coveralls

Coverage Status

Coverage decreased (-0.02%) when pulling 764eab6 on oesteban:enh/AddCSVRow into af406fe on nipy:master.

oesteban added a commit that referenced this pull request Jul 31, 2014
oesteban merged commit df7d9a6 into nipy:master Jul 31, 2014
oesteban deleted the enh/AddCSVRow branch July 31, 2014 12:01

4 participants