How to deal with optional values and channels #129
AlexVCaron
started this conversation in
Ideas
Replies: 1 comment
-
@ThoumyreStanislas @gagnonanthony @Manonedde I want to start a discussion. The first post acts more as a brain dump on my part. I'd like us to find a way to |
As we start developing workflows and using channels, we now hit the problem of having to handle *optionality*, and of dealing correctly with channels, or objects inside them, that are optional. I tend to always look for the way that makes things easiest for the end-user: we are the experts, so we should absorb the complications and code repetitions, while still limiting them as much as possible.

So now I want to start this discussion, so we can decide on the best ways to handle *optionality* in the code base. I'd like to explore a few ways to deal with it, and to hear your ideas and experiences on the matter. To best orient the discussion, I'd like to divide it in two parts:

- Optionality in *Dataflows*. This happens inside `channels`, when they contain `tuples` (practically all the time in our case). At a given index, the data will be missing. We detect it and patch it with a *valid empty value*, so a module that consumes it still works.
- Optionality in *Workflows*. This happens inside `pipelines` and `subworkflows`, in `groovy`. A `channel` is missing, identified by a `null` value (though any value that evaluates to `false` in `groovy` does the trick). We detect it and account for its absence when doing operations on `channels`, e.g. either by adding *empty values* in a channel instead of joining, or by using `if/else` conditions.

There is good in doing the logic in either of those two parts. For now, we've mostly been using the *second way*, using conditions to either join a channel or, if it is absent, map empty values. It's a good way, but it becomes very heavy as the number of optional channels increases.

I am a big proponent of the *first way* of doing it. With it, we can handle all *optionality* logic inside channels, and replace all `if/else` conditions with *fancy* uses of the `join` operator and its behavior with *outliers*. With this logic, an *optional channel* is either empty (`Channel.empty()`), contains only a subset of the datapoints (or ids), or contains tuples with *missing indexes*. We don't check it with conditions anymore, since that no longer works: the channel object always exists, even when it carries no data. Instead, we use `join` with `remainder: true`, which fills the *outliers* with `null` entries. Then, we process this *joined channel* to replace the `null` values with valid *empty values*.

I've done it quite a lot in `versaFlow`. The preprocess workflow has conditions only on *step options*; the channels are prepared for *optionality* using a reference id channel and two functions, `filter_datapoints` and `fill_missing_datapoints`.

- `filter_datapoints` filters a channel based on an input `closure` (the groovy equivalent of a python `lambda`) and returns the set of `ids` (or `meta`) that satisfy it.
- `fill_missing_datapoints` takes a channel, a list of `ids` that should be present, an `index` where the values may be missing, and a `fill_value` to use there. It then checks all outlier `ids` and patches them with the `fill_value` at the given `index` if needed.

Typical use-case:

It is not implemented in the best way possible; we can brainstorm on that and develop a better solution for `nf-scil`.
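To make the channel-based approach concrete, here is a minimal Nextflow DSL2 sketch of joining with `remainder: true` and then patching the `null` entries. The channel names (`ch_dwi`, `ch_mask`), the sample ids and file names are made up for illustration, and using `[]` as the valid empty value is an assumption, not something decided in this discussion.

```groovy
// Minimal sketch, assuming tuples shaped as [ id, file ].
// All names and values below are illustrative only.
workflow {
    // Reference channel: every datapoint that must go through the pipeline.
    ch_dwi = Channel.of(
        ['sub-01', 'sub-01_dwi.nii.gz'],
        ['sub-02', 'sub-02_dwi.nii.gz']
    )

    // Optional channel: only a subset of the ids is present.
    ch_mask = Channel.of(
        ['sub-01', 'sub-01_mask.nii.gz']
    )

    ch_dwi
        // Keep unmatched ids; the missing side is emitted as null.
        .join(ch_mask, remainder: true)
        // Replace the null entry with an empty value (assumed here to be []).
        .map { id, dwi, mask -> [id, dwi, mask ?: []] }
        .view()
}
```

No `if/else` on the channel itself is needed here: the same code path handles datapoints with and without a mask.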
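As a starting point for the brainstorm, the patching step could be factored into a small per-tuple helper that takes an `index` and a `fill_value`, in the spirit of `fill_missing_datapoints`. This is a hypothetical sketch, not the `versaFlow` implementation: the function name `fill_missing_index`, the channel names and the `[]` fill value are all assumptions.

```groovy
// Hypothetical helper (NOT the versaFlow code): patch one tuple at a given index.
def fill_missing_index(tuple, index, fill_value) {
    // Copy the tuple so the original item is left untouched.
    def patched = new ArrayList(tuple)
    // Pad tuples that are too short to hold the requested index.
    while (patched.size() <= index) {
        patched << null
    }
    // Only patch when the value at that index is actually missing.
    if (patched[index] == null) {
        patched[index] = fill_value
    }
    return patched
}

workflow {
    // Illustrative data only.
    ch_dwi  = Channel.of(['sub-01', 'sub-01_dwi.nii.gz'], ['sub-02', 'sub-02_dwi.nii.gz'])
    ch_mask = Channel.of(['sub-01', 'sub-01_mask.nii.gz'])
    ch_rev  = Channel.of(['sub-02', 'sub-02_rev_b0.nii.gz'])

    ch_dwi
        .join(ch_mask, remainder: true)
        .join(ch_rev, remainder: true)
        // Index 2 holds the optional mask, index 3 the optional reverse b0.
        .map { fill_missing_index(it, 2, []) }
        .map { fill_missing_index(it, 3, []) }
        .view()
}
```

Keeping the helper per-tuple means it can be dropped into any `map`, and each additional optional channel costs one `join` and one `map` instead of a new `if/else` branch.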