How to deal with optional values and channels #129

AlexVCaron · 2024-04-19T15:25:39Z

AlexVCaron
Apr 19, 2024
Maintainer

As we start developing workflows and using channels, we are running into the problem of optionality: how to correctly handle channels, or objects inside them, that are optional. I tend to favor whatever is easiest for the end-user. We are the experts, so we should absorb the complications and code repetition, while still limiting them as much as possible.

So I want to start this discussion to decide on the best ways to handle optionality in the code base. I'd like to explore a few approaches and hear your ideas and experiences on the matter. To orient the discussion, I'd like to divide it in two parts:

Optionality in Dataflows: This happens inside channels, when they contain tuples (practically all the time in our case). The data at a given index is missing. We detect it and patch it with a valid empty value, so a module that consumes the tuple still works.
Optionality in Workflows: This happens inside pipelines and subworkflows, in Groovy. A channel is missing, identified by a null value (though any value that evaluates to false in Groovy does the trick). We detect it and account for its absence when operating on channels, e.g. by adding empty values to a channel instead of joining, or by using if/else conditions.

There are merits to doing the logic in either of those two parts. So far, we've mostly used the second way: conditions that either join a channel or, if it is absent, map empty values in its place. It works, but it becomes very heavy as the number of optional channels grows.
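To make the second way concrete, here is a minimal Nextflow DSL2 sketch of it, not code taken from any existing pipeline. The flags `params.with_mask` and `params.with_rev`, the helper `appendEmpty`, the channel names and the toy data are all hypothetical, and `[]` is assumed to be the valid empty value.

```groovy
// Hypothetical flags, standing in for whatever decides that an input exists.
params.with_mask = false
params.with_rev  = false

// Hypothetical helper: append an empty placeholder, e.g. [1, "dwi_1"] -> [1, "dwi_1", []]
def appendEmpty(List tuple) {
    return tuple + [[]]
}

workflow {
    // Toy data; ids and values are made up for illustration.
    ch_dwi  = Channel.of( [1, "dwi_1"], [2, "dwi_2"] )
    ch_mask = params.with_mask ? Channel.of( [1, "mask_1"], [2, "mask_2"] ) : null
    ch_rev  = params.with_rev  ? Channel.of( [1, "rev_1"], [2, "rev_2"] )   : null

    // One if/else per optional channel: join it when present, pad when absent.
    if ( ch_mask != null ) {
        ch_input = ch_dwi.join(ch_mask)
    }
    else {
        ch_input = ch_dwi.map { appendEmpty(it) }
    }

    if ( ch_rev != null ) {
        ch_input = ch_input.join(ch_rev)
    }
    else {
        ch_input = ch_input.map { appendEmpty(it) }
    }

    ch_input.view()   // tuples of the form [ id, dwi, mask or [], rev or [] ]
}
```

Each additional optional channel adds another if/else pair, which is where the weight comes from.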

I am a big proponent of the first way. With it, we can keep all optionality logic inside channels and replace the if/else conditions with careful use of the join operator and its behavior with outliers. Under this logic, an optional channel is either empty (Channel.empty()), contains only a subset of the datapoints (or ids), or contains tuples with missing indexes. We no longer check it with conditions, since they don't work on such channels. Instead, we use join with remainder: true, which fills the outliers with null entries. Then, we map over the joined channel to replace the null values with valid empty values.
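The join-then-patch idea can be sketched as follows in Nextflow DSL2. This is only an illustration under assumptions: the helper `fillNulls`, the channel names and the toy data are hypothetical, and `[]` is assumed to be the valid empty value.

```groovy
// Hypothetical helper: replace every null entry of a joined tuple with a fill value.
def fillNulls(List tuple, fillValue = []) {
    return tuple.collect { it == null ? fillValue : it }
}

workflow {
    // Toy data; ids and values are made up for illustration.
    ch_dwi  = Channel.of( [1, "dwi_1"], [2, "dwi_2"], [3, "dwi_3"] )
    ch_mask = Channel.of( [1, "mask_1"], [3, "mask_3"] )   // only a subset of ids

    ch_input = ch_dwi
        .join(ch_mask, remainder: true)   // unmatched id 2 becomes [2, "dwi_2", null]
        .map { fillNulls(it) }            // null patched: [2, "dwi_2", []]

    ch_input.view()
}
```

Note that this helper patches every null in the tuple, so it assumes null never appears as a legitimate value in the data.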

I've done it quite a lot in versaFlow. The preprocess workflow only has conditions on step options. Its channels are prepared for optionality using a reference id channel and two functions, filter_datapoints and fill_missing_datapoints.

filter_datapoints filters a channel based on an input closure (the Groovy equivalent of a Python lambda) and returns the set of ids (or meta) that satisfy it. fill_missing_datapoints takes a channel, a list of ids that should be present, an index where values may be missing and a fill_value to use there. It then checks all outlier ids and patches them with the fill_value at the given index if needed. Typical use case:

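// Channel.fromList replaces the deprecated Channel.from; each list element is emitted as one item
reference_ids_channel = Channel.fromList( [ 1, 2, 3, 4, 5 ] )
data_channel = Channel.fromList( [ [1, "allo", "salut"], [2, "mon"], [3, "coco", "ami"] ] )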

complete_data_channel = filter_datapoints( data_channel, { it.size() >= 3 } )
data_with_outliers_channel = fill_missing_datapoints( complete_data_channel, reference_ids_channel, 1, ["!", "!"] )
data_all_complete_channel = fill_missing_datapoints( data_with_outliers_channel, reference_ids_channel, 2, ["bing bong"] )

data_all_complete_channel.view() // should contain : [ [1, "allo", "salut"], [2, "mon", "bing bong"], [3, "coco", "ami"], [4, "!", "!"], [5, "!", "!"] ]

It is not implemented in the best way possible, but we can brainstorm on that and develop a better solution for nf-scil.

AlexVCaron · 2024-04-19T15:28:29Z

AlexVCaron
Apr 19, 2024
Maintainer Author

@ThoumyreStanislas @gagnonanthony @Manonedde I want to start a discussion. The first post is more of a brain dump on my part. I'd like us to turn this into a standard way of dealing with optional channels, so we can provide easy-to-use functions in all subworkflows and pipelines.
