<img src="../static/images/mapnode.png"  width="300">

# MapNode

If you want to iterate over a list of inputs, but need to pass all of the resulting outputs together as one input (an array) to the next node, you need to use a **``MapNode``**. A ``MapNode`` is quite similar to a normal ``Node``, but it can take a list of inputs and operate on each input separately, ultimately returning a list of outputs. (The Nipype documentation has a [nice section](http://nipype.readthedocs.io/en/latest/users/mapnode_and_iterables.html) about ``MapNode`` and ``iterables`` if you want to learn more.)

Let's demonstrate this with a simple function interface:

In [None]:
from nipype import Function
def square_func(x):
    return x ** 2
square = Function(["x"], ["f_x"], square_func)

We see that this function just takes a numeric input and returns its squared value.

In [None]:
square.run(x=2).outputs.f_x

4

What if we wanted to square a list of numbers? We could set an iterable and split the workflow into multiple sub-workflows. But say we were building a simple workflow that squares a list of numbers and then sums them. The sum node expects a single list, but an iterable would create one sum node per number, and each of those would receive only one value from the list. The solution here is to use a `MapNode`.

The `MapNode` constructor has an argument called `iterfield`, which names the inputs that should receive a list and be processed one element at a time.

In [None]:
from nipype import MapNode
square_node = MapNode(square, name="square", iterfield=["x"])

In [None]:
square_node.inputs.x = [0, 1, 2, 3]
square_node.run().outputs.f_x

170716-01:40:53,809 workflow INFO:
	 Executing node square in dir: /tmp/tmp6886npt7/square
170716-01:40:53,816 workflow INFO:
	 Executing node _square0 in dir: /tmp/tmp6886npt7/square/mapflow/_square0
170716-01:40:53,828 workflow INFO:
	 Executing node _square1 in dir: /tmp/tmp6886npt7/square/mapflow/_square1
170716-01:40:53,841 workflow INFO:
	 Executing node _square2 in dir: /tmp/tmp6886npt7/square/mapflow/_square2
170716-01:40:53,853 workflow INFO:
	 Executing node _square3 in dir: /tmp/tmp6886npt7/square/mapflow/_square3


[0, 1, 4, 9]
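To make the square-then-sum motivation concrete, here is a minimal plain-Python sketch of the two data flows. It does not use Nipype at all; `sum_func`, `iterable_style` and `mapnode_style` are hypothetical names chosen for illustration, and the comprehensions only imitate how the values would be routed.

```python
def square_func(x):
    return x ** 2


def sum_func(values):
    # Downstream step that expects one list of numbers.
    return sum(values)


numbers = [0, 1, 2, 3]

# Iterables-style flow: the downstream step is duplicated once per value,
# so every "sum" only ever sees a single squared number.
iterable_style = [sum_func([square_func(x)]) for x in numbers]

# MapNode-style flow: the squaring fans out over the list, the results are
# gathered back into one list, and a single "sum" receives all of them.
mapnode_style = sum_func([square_func(x) for x in numbers])

print(iterable_style)  # [0, 1, 4, 9]
print(mapnode_style)   # 14
```

Only the second flow produces the single sum of squares that the workflow is after.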

Because `iterfield` can take a list of names, you can operate over multiple sets of data, as long as they're the same length. The values in each list will be paired; it does not compute a combinatoric product of the lists.

In [None]:
def power_func(x, y):
    return x ** y

In [None]:
power = Function(["x", "y"], ["f_xy"], power_func)
power_node = MapNode(power, name="power", iterfield=["x", "y"])
power_node.inputs.x = [0, 1, 2, 3]
power_node.inputs.y = [0, 1, 2, 3]
print(power_node.run().outputs.f_xy)

170716-01:41:23,366 workflow INFO:
	 Executing node power in dir: /tmp/tmpa5hjghpw/power
170716-01:41:23,372 workflow INFO:
	 Executing node _power0 in dir: /tmp/tmpa5hjghpw/power/mapflow/_power0
170716-01:41:23,385 workflow INFO:
	 Executing node _power1 in dir: /tmp/tmpa5hjghpw/power/mapflow/_power1
170716-01:41:23,400 workflow INFO:
	 Executing node _power2 in dir: /tmp/tmpa5hjghpw/power/mapflow/_power2
170716-01:41:23,414 workflow INFO:
	 Executing node _power3 in dir: /tmp/tmpa5hjghpw/power/mapflow/_power3
[1, 1, 4, 27]
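The difference between pairing and a combinatoric product can be illustrated without Nipype. The sketch below is only an analogy: it uses Python's built-in `zip` for the paired behaviour described above and `itertools.product` for the full cross product that `MapNode` does not compute. The names `paired` and `crossed` are made up for this example.

```python
from itertools import product


def power_func(x, y):
    return x ** y


xs = [0, 1, 2, 3]
ys = [0, 1, 2, 3]

# Paired: element i of xs is combined with element i of ys (4 calls).
paired = [power_func(x, y) for x, y in zip(xs, ys)]

# Cross product: every x is combined with every y (16 calls).
crossed = [power_func(x, y) for x, y in product(xs, ys)]

print(paired)        # [1, 1, 4, 27]
print(len(crossed))  # 16
```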


But not every input needs to be an iterfield. Inputs that are not listed in `iterfield` are passed unchanged to every iteration.

In [None]:
power_node = MapNode(power, name="power", iterfield=["x"])
power_node.inputs.x = [0, 1, 2, 3]
power_node.inputs.y = 3
print(power_node.run().outputs.f_xy)

170716-01:41:31,840 workflow INFO:
	 Executing node power in dir: /tmp/tmp7foq1xcp/power
170716-01:41:31,845 workflow INFO:
	 Executing node _power0 in dir: /tmp/tmp7foq1xcp/power/mapflow/_power0
170716-01:41:31,854 workflow INFO:
	 Executing node _power1 in dir: /tmp/tmp7foq1xcp/power/mapflow/_power1
170716-01:41:31,865 workflow INFO:
	 Executing node _power2 in dir: /tmp/tmp7foq1xcp/power/mapflow/_power2
170716-01:41:31,878 workflow INFO:
	 Executing node _power3 in dir: /tmp/tmp7foq1xcp/power/mapflow/_power3
[0, 1, 8, 27]
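As a rough mental model of what `iterfield` does, the following plain-Python sketch maps a function over the inputs named in `iterfield` while holding all other inputs constant. `map_over` is a hypothetical helper written for this illustration, not part of Nipype, and the length check is an assumption based on the requirement that the lists have the same length.

```python
def power_func(x, y):
    return x ** y


def map_over(func, iterfield, **inputs):
    """Call func once per position of the iterfield lists (illustrative only)."""
    lists = [inputs[name] for name in iterfield]
    # Assumption: lists of different lengths are rejected rather than truncated.
    if len({len(values) for values in lists}) > 1:
        raise ValueError("all iterfield inputs must have the same length")
    # Everything not named in iterfield is passed unchanged to every call.
    constants = {k: v for k, v in inputs.items() if k not in iterfield}
    return [
        func(**dict(zip(iterfield, combo)), **constants)
        for combo in zip(*lists)
    ]


print(map_over(power_func, ["x"], x=[0, 1, 2, 3], y=3))
# [0, 1, 8, 27]
print(map_over(power_func, ["x", "y"], x=[0, 1, 2, 3], y=[0, 1, 2, 3]))
# [1, 1, 4, 27]
```

The real `MapNode` additionally creates one sub-node and working directory per element, as the log output above shows.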


As in the case of `iterables`, each underlying `MapNode` execution can happen in **parallel**. Hopefully, you see how these tools allow you to write flexible, reusable workflows that will help you process large amounts of data efficiently and reproducibly.

# Why is this important?

Suppose we have multiple functional images (A), each of which should be motion corrected (B1, B2, B3, ...). Afterwards, we want to feed them all into a single GLM, i.e. the input for the GLM should be an array [B1, B2, B3, ...]. [Iterables](basic_iteration.ipynb) can't do that, because they would split up the pipeline. Therefore, we need **MapNodes**.

<img src="../static/images/mapnode.png"  width="300">

Let's look at a simple example, where we want to motion correct two functional images. For this we need two nodes:
 - Gunzip, to unzip the files (plural)
 - Realign, to do the motion correction

In [None]:
from nipype.algorithms.misc import Gunzip
from nipype.interfaces.spm import Realign
from nipype.pipeline.engine import Node, MapNode, Workflow

files = ['/data/ds000114/sub-01/func/sub-01_task-fingerfootlips_bold.nii.gz',
         '/data/ds000114/sub-01/func/sub-01_task-linebisection_bold.nii.gz']

realign = Node(Realign(register_to_mean=True),
               name='motion_correction')

If we try to specify the input for the **Gunzip** node with a simple **Node**, we get the following error:

In [None]:
gunzip = Node(Gunzip(), name='gunzip')
gunzip.inputs.in_file = files

TraitError: The 'in_file' trait of a GunzipInputSpec instance must be an existing file name, but a value of ['/data/ds000114/sub-01/func/sub-01_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-01/func/sub-01_task-linebisection_bold.nii.gz'] <class 'list'> was specified.

But if we do it with a **MapNode**, it works:

In [None]:
gunzip = MapNode(Gunzip(), name='gunzip',
                 iterfield=['in_file'])
gunzip.inputs.in_file = files

Now, we just have to create a workflow, connect the nodes and we can run it:

In [None]:
mcflow = Workflow(name='realign_with_spm')
mcflow.connect(gunzip, 'out_file', realign, 'in_files')
mcflow.base_dir = '/data'
mcflow.run('MultiProc', plugin_args={'n_procs': 4})

170716-01:43:50,105 workflow INFO:
	 Workflow realign_with_spm settings: ['check', 'execution', 'logging']
170716-01:43:50,133 workflow INFO:
	 Running in parallel.
170716-01:43:50,139 workflow INFO:
	 Executing: gunzip ID: 0
170716-01:43:50,156 workflow INFO:
	 Adding 2 jobs for mapnode gunzip
170716-01:43:50,164 workflow INFO:
	 Executing: _gunzip0 ID: 2
170716-01:43:50,177 workflow INFO:
	 Executing: _gunzip1 ID: 3
170716-01:43:50,181 workflow INFO:
	 Executing node _gunzip0 in dir: /data/realign_with_spm/gunzip/mapflow/_gunzip0
170716-01:43:50,200 workflow INFO:
	 Executing node _gunzip1 in dir: /data/realign_with_spm/gunzip/mapflow/_gunzip1
170716-01:43:51,801 workflow INFO:
	 [Job finished] jobname: _gunzip1 jobid: 3
170716-01:43:51,827 workflow INFO:
	 [Job finished] jobname: _gunzip0 jobid: 2
170716-01:43:51,831 workflow INFO:
	 Executing: gunzip ID: 0
170716-01:43:51,865 workflow INFO:
	 Executing node gunzip in dir: /data/realign_with_spm/gunzip
170716-01:43:51,966 workflow I

<networkx.classes.digraph.DiGraph at 0x7fc97fc25908>