+Some example tools which consume multiple datasets (including lists) include:
+
+- `multi_data_param <https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/multi_data_param.xml>`__ (small test tool in Galaxy test suite)
+Also see the tools-devteam repository `Pull Request #20 <https://github.com/galaxyproject/tools-devteam/pull/20>`__ modifying the cufflinks suite of tools for collection compatible reductions.
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Identifiers
--------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-As mentioned previously sample identifiers are preserved through mapping
+As mentioned previously, sample identifiers are preserved through mapping
 steps; during reduction steps one will likely want to use these for
 reporting, comparisons, etc. When using these multiple ``data`` parameters
 the dataset objects expose a field called ``element_identifier``. When these
@@ -155,22 +170,21 @@ derived from using a little fictitious program called ``merge_rows``.
 .. note:: Here we are rewriting the element identifiers to ensure everything is safe to
    put on the command-line. In the future collections will not be able to contain
    keys that are potentially harmful and this won't be necessary.
 
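The note above mentions rewriting element identifiers so they are safe to place on a command line, but the template that does this falls outside the visible hunk. Purely as an illustration, such a rewrite might look like the following Python sketch. The function name ``safe_identifier`` and the choice of ``_`` as the replacement character are assumptions, not taken from the Galaxy source.

```python
import re


def safe_identifier(element_identifier: str) -> str:
    """Replace every character that is not a word character or a hyphen with "_".

    Hypothetical helper: it only illustrates the kind of rewriting the note
    describes and is not the expression used by the documented tool.
    """
    return re.sub(r"[^\w\-]", "_", element_identifier)


if __name__ == "__main__":
    # Spaces and shell metacharacters are replaced; letters, digits,
    # underscores and hyphens pass through unchanged.
    print(safe_identifier("sample 1;rm -rf"))
```

Note that ``\w`` in Python 3 also matches non-ASCII letters and digits; passing ``flags=re.ASCII`` to ``re.sub`` would restrict the result to plain ASCII.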
-Some example tools which consume collections include:
-
-- `multi_data_param <https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/multi_data_param.xml>`__ (small test tool in Galaxy test suite)
-Also see the tools-devteam repository `Pull Request #20 <https://github.com/galaxyproject/tools-devteam/pull/20>`__ modifying the cufflinks suite of tools for collection compatible reductions.
-
--------------------------------
-Processing Collections
--------------------------------
+More on ``data_collection`` parameters
+----------------------------------------------
 
 The above three cases (users mapping over single tools, consuming pairs, and
 consuming lists using `multiple` ``data`` parameters) are hopefully the most
@@ -218,18 +232,30 @@ Some example tools which consume collections include:
 
 
 -------------------------------
-Collection as an Output
+Creating Collections
 -------------------------------
 
-Whenever possible simpler operations that produce datasets should be implicitly "mapped over" to produce collections - but there are a variety of situations for which this idiom is insufficient.
+Whenever possible simpler operations that produce datasets should be
+implicitly "mapped over" to produce collections as described above - but there
+are a variety of situations for which this idiom is insufficient.
+
+Progressively more complex syntax elements exist for the increasingly complex
+scenarios. Broadly speaking - the three scenarios covered are the tool
+produces...
 
-Progressively more complex syntax elements exist for the increasingly complex scenarios. Broadly speaking - the three scenarios covered are the tool produces...
+1. a collection with a static number of elements (mostly for ``paired``
+   collections, but if a tool does say fixed binning it might make sense to create a list this way as well)
+2. a ``list`` with the same number of elements as an input list
+   (this would be a common pattern for normalization applications for
+   instance).
+3. a ``list`` where the number of elements is not knowable until the job is
+   complete.
 
-- a collection with a static number of elements (mostly for paired, but if a tool does say fixed binning it might make sense to create a list this way as well)
-- a list with the same number of elements as an input (common pattern for normalization applications for instance).
-- a list where the number of elements is not knowable until the job is complete.
+1. Static Element Count
+-----------------------------------------------
 
-For the first case - the tool can simply declare standard data elements below an output collection element in the outputs tag of the tool definition.
+For this first case - the tool can simply declare standard data elements
+below an output collection element in the outputs tag of the tool definition.
 
 ::
 
@@ -239,7 +265,8 @@ For the first case - the tool can simply declare standard data elements below an
     </collection>
 
 
-Templates (e.g. the ``command`` tag) can then reference ``$forward`` and ``$reverse`` or whatever ``name`` the corresponding ``data`` elements are given - as demonstrated in ``test/functional/tools/collection_creates_pair.xml``.
+Templates (e.g. the ``command`` tag) can then reference ``$forward`` and ``$reverse`` or whatever ``name`` the corresponding ``data`` elements are given.
+- as demonstrated in ``test/functional/tools/collection_creates_pair.xml``.
 
 The tool should describe the collection type via the type attribute on the collection element. Data elements can define ``format``, ``format_source``, ``metadata_source``, ``from_work_dir``, and ``name``.
 
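The declaration that the hunk above refers to is elided from this diff. As a rough sketch only, the Python snippet below parses a hypothetical ``outputs`` fragment with the standard library and lists the ``data`` elements declared under each ``collection``. The names ``paired_output``, ``forward`` and ``reverse`` come from the surrounding text; the ``label`` and ``format`` values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical outputs fragment: a paired collection with two statically
# declared data elements. The attribute values are illustrative assumptions.
TOOL_OUTPUTS = """
<outputs>
  <collection name="paired_output" type="paired" label="Split Pair">
    <data name="forward" format="txt" />
    <data name="reverse" format="txt" />
  </collection>
</outputs>
"""


def collection_elements(outputs_xml: str) -> dict:
    """Map each collection name to its declared type and data element names."""
    root = ET.fromstring(outputs_xml)
    result = {}
    for collection in root.findall("collection"):
        names = [data.get("name") for data in collection.findall("data")]
        result[collection.get("name")] = (collection.get("type"), names)
    return result


if __name__ == "__main__":
    print(collection_elements(TOOL_OUTPUTS))
```

Because the element count is fixed in the declaration, each name (here ``forward`` and ``reverse``) is known before the job runs, which is what lets a command template refer to them directly.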
@@ -252,6 +279,9 @@ The above syntax would also work for the corner case of static lists. For paired
 
 In this case the command template could then just reference ``${paired_output.forward}`` and ``${paired_output.reverse}`` as demonstrated in ``test/functional/tools/collection_creates_pair_from_type.xml``.
 
+2. Computable Element Count
+-----------------------------------------------
+
 For the second case - where the structure of the output is based on the structure of an input - a ``structured_like`` attribute can be defined on the collection tag.
 
 ::
@@ -262,6 +292,9 @@ Templates can then loop over ``input1`` or ``list_output`` when building up comm
 
 ``format``, ``format_source``, and ``metadata_source`` can be defined for such collections if the format and metadata are fixed or based on a single input dataset. If the format or metadata instead depends on the elements of the collection it is structured like, ``inherit_format="true"`` and/or ``inherit_metadata="true"`` should be used; these handle corner cases such as subtle format or metadata differences between the elements of the incoming list.
 
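To make the "structured like an input" idea concrete, here is a minimal Python sketch. It is not Galaxy code: the function ``outputs_structured_like`` and the ``.out`` naming are hypothetical, and it only shows that the output list gets one element per input element, with the same identifiers in the same order.

```python
def outputs_structured_like(input_elements):
    """Return one (identifier, output_name) pair per input element.

    ``input_elements`` is a list of (identifier, path) tuples. Identifiers and
    their order are preserved; the ".out" suffix is an illustrative assumption.
    """
    return [(identifier, f"{identifier}.out") for identifier, _path in input_elements]


if __name__ == "__main__":
    inputs = [("sample1", "a.txt"), ("sample2", "b.txt")]
    print(outputs_structured_like(inputs))
```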
+3. Dynamic Element Count
+-----------------------------------------------
+
 The third and most general case is when the number of elements in a list cannot be determined until runtime. For instance, when splitting up files by various dynamic criteria.
 
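The idea of discovering list elements only after the job finishes can be sketched in Python as below. This is an illustration under stated assumptions rather than Galaxy's implementation: the regular expression ``NAME_AND_EXT`` and the function ``discover`` are hypothetical, and they simply treat each ``<name>.<ext>`` file name as one list element.

```python
import re

# Assumed pattern: everything before the last dot is the element name and
# everything after it is the extension. Not copied from Galaxy.
NAME_AND_EXT = re.compile(r"(?P<name>.+)\.(?P<ext>[^.]+)$")


def discover(filenames):
    """Map element name -> (filename, extension) for every matching file name."""
    found = {}
    for filename in filenames:
        match = NAME_AND_EXT.match(filename)
        if match:  # file names without an extension are skipped
            found[match.group("name")] = (filename, match.group("ext"))
    return found


if __name__ == "__main__":
    # The number of elements depends entirely on what the job wrote out.
    print(discover(["chr1.tabular", "chr2.tabular", "README"]))
```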
 In this case a collection may define one or more ``discover_datasets`` elements. As an example of one such tool that splits a tabular file out into multiple tabular files based on the first column see ``test/functional/tools/collection_split_on_column.xml`` - which includes the following output definition:
@@ -272,6 +305,12 @@ In this case a collection may define one or more ``discover_datasets`` elements. As a