
[Input filter enhancement] Adjusting input filter in node controller #243

Closed · LStruber opened this issue on Nov 17, 2021 · 27 comments
Labels: enhancement (New feature or request)

@LStruber (Collaborator) commented Nov 17, 2021

When you create a pipeline, you sometimes want to filter your database by a tag, a patient name, etc. There is a tool in MIA for that: the input filter (which is automatically added to the pipeline in iteration modes 2 and 3).

The documentation indicates that you select the outputs of the filter by right-clicking on it and choosing your desired keywords in the window that opens. Your filtered files are then shown in the "outputs" field of the input filter in the node controller. If you initialize, everything works. No issue here.

However, one might expect the filter button next to the outputs field in the node controller to perform exactly the same action, since that is where you would click to filter the outputs of the input filter. When you try to filter outputs with this button, it seems to work (the filtered files are shown in the "outputs" field of the node controller, as before), but when you initialize, the output field is reset to "all files in the database" and initialization is performed on the whole database! This is not a desirable behavior.

Minimum steps to reproduce:

  • In the pipeline manager, add a smooth brick and an input_filter brick
  • Export all unconnected plugs
  • In the input_filter node controller, set your whole database as inputs (filter button in the "inputs" field, then OK without any filtering rule)
  • Still in the input_filter node controller, set your filtered files as outputs (filter button in the "outputs" field, then filter on anat files only, for example)
  • Initialize the pipeline
    The outputs are reset to your whole project database, and in the data browser the paths of the smoothed files are created for the entire database.
@LStruber added the "enhancement" (New feature or request) label on Nov 17, 2021
@servoz (Contributor) commented Nov 29, 2021

I spent some time trying to understand this ticket. I must admit that I am having some difficulty.

To help me understand, could you please edit your post and give the link to the documentation you are talking about?

For the Minimum steps to reproduce, is it the following pipeline?
[Screenshot from 2021-11-29 18-45-46]

Why not use the right click Export to database_scans to export the input plug of the Input_filter brick?
[Screenshot from 2021-11-29 18-49-38]

To me, the result you get is normal (which is why I have a little trouble understanding this ticket, though it's not impossible that I just didn't understand!). When we run the brick, we define the input data, the brick does things with it, and then it outputs data. Unless I'm mistaken, you take all the data as input without applying any filter; you filter the output, but that is only a temporary value, which will be overwritten when the brick is run, since all the data were taken as input without any filtering. I have the impression that there is a confusion somewhere...

As far as I know, the input_filter brick needs the right click Open filter (after exporting the input plug with right click Export to database_scans), or the filtering to be done directly in the controller for the input plug (which is the same under the hood), to work properly. Output-only filtering will always be destroyed at initialisation time if no filtering is performed on the inputs (which is the purpose of this brick).

@LStruber (Collaborator, Author) replied:

> For the Minimum steps to reproduce, is it the following pipeline?

Yes, but it works with your second pipeline too.

> Why not use the right click Export to database_scans to export the input plug of the Input_filter brick?

Yes, you can use Export to database_scans instead of exporting the plug and setting your database as inputs. It comes to the same thing.

> As far as I know, the input_filter brick needs the right click Open filter (after exporting the input plug with right click Export to database_scans), or the filtering to be done directly in the controller for the input plug (which is the same under the hood), to work properly. Output-only filtering will always be destroyed at initialisation time if no filtering is performed on the inputs (which is the purpose of this brick).

So it is indeed an issue of understanding the brick. When you set the filter from right click Open filter, the inputs of the filter are left unchanged (in the controller), while your filtered files are assigned to the outputs. This is logical: your filter takes the database as input and sets its outputs by filtering the database.

Then, to me, it is very illogical that Open filter performs the same action as filtering in the controller for the input plug. Setting the input plug should not be possible, as it is directly linked to database_scans. On the other hand, setting the output plug should set the output of your filter and thus define the filtering.

I guess you understand it the other way around: you set the inputs because you want to filter your inputs. But as a consequence, when you filter inputs from the controller, your inputs are set to your filtered files and your outputs remain empty (whereas using right click Open filter sets the database as input and your filtered files as outputs).

@servoz (Contributor) commented Nov 30, 2021

After a private discussion with @LStruber, we thought it would be good to remove the Filter button from the Outputs part of the controller, because it can be confusing for users: it is the brick that will create the outputs, so it is useless to try to define them with the Filter button (the output values are actually created at initialisation and run time).

The Input_filter brick must also allow taking as input either the whole database (right click Export to database_scans) or the output of the previous brick (connected to the input plug of the Input_filter brick). Basically, the role of the Input_filter brick is to take a list of images as input and filter it according to the tags of these images.

In summary, to close this ticket, we need to:

  • Remove the output Filter button in the controller (only for the V2 controller).
  • Test, and modify the code if necessary, so that the Input_filter brick accepts either the whole database (right click Export to database_scans) or the output of the previous brick.
  • Test, and modify the code if necessary, so that the brick works well as a filter from the database (seems to work now) or from the output of the previous brick (does not seem to work now). Particular attention must then be paid to the case where the filter contains tags that do not exist for the input data (no crash; exception handling where needed; a sketch is given below).

Any remarks or comments on this roadmap are welcome!
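For the third point, here is a minimal sketch of tag filtering that does not crash on missing tags; the helper name and the data layout are illustrative only, not the actual MIA API:

```python
# Hypothetical sketch only: `scans` maps each file path to its dict of tag
# values; `conditions` maps tag names to required values. Not the MIA API.

def filter_scans(scans, conditions):
    """Keep the scans whose tags satisfy every (tag, value) condition."""
    kept = []
    for path, tags in scans.items():
        try:
            # A scan lacking one of the filtered tags is skipped instead
            # of raising KeyError and crashing the initialisation.
            if all(tags[tag] == value for tag, value in conditions.items()):
                kept.append(path)
        except KeyError:
            continue
    return kept

scans = {
    "raw_data/sub01_anat.nii": {"Type": "anat", "Patient": "sub01"},
    "raw_data/sub01_func.nii": {"Type": "func", "Patient": "sub01"},
    "raw_data/extra.nii": {"Patient": "sub02"},  # no "Type" tag: skipped
}
print(filter_scans(scans, {"Type": "anat"}))  # ['raw_data/sub01_anat.nii']
```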

@servoz self-assigned this on Nov 30, 2021
@denisri (Contributor) commented Dec 1, 2021

OK, I understand: it's a user interface issue.
I completely agree that the database Filter button on the file parameters of processes is normally an "input feature"; it should probably not be displayed on output plugs, and probably not even on the input plugs of processes that are connected to upstream outputs.
In the specific case of the Input_Filter brick, we could imagine, as @LStruber suggests, that this Filter button could be a shortcut to the right click popup and its "Open filter" menu for this brick, but that is not a general behaviour.
Speaking of this, I also agree that from the naive user's point of view it is probably not very intuitive, and not very convenient, to have to right-click on Input_Filter nodes and select the "Open filter" option in the menu; a graphical indication (a button on the brick box showing that a user action is needed?) might help.

@denisri (Contributor) commented Dec 1, 2021

I'm trying to change this...

denisri added a commit that referenced this issue on Dec 1, 2021: "...or inputs connected to the output of another node" (#243)
@denisri (Contributor) commented Dec 1, 2021

OK, I have removed the Filter buttons on outputs and connected inputs.

@servoz (Contributor) commented Dec 1, 2021

I finally found some peace to come back to this ticket!

So, great work @denisri, your code change exceeds what we had imagined with @LStruber!!! Indeed, it was also necessary to remove the Filter button for the input plugs of processes that are connected to upstream outputs. Very nice.

However, when starting on the rest of the ticket, I notice that there is still a Filter button if the input is a list (one for each element of the list). My first feeling is that it is not necessary, but I am just starting to think about it. For example, take the Input_filter brick and connect its input to the database (right click Export to database_scans). The Filter button then disappears next to the input plug, but it still exists for each element of the list if we look inside the list.
[Screenshot from 2021-12-01 18-25-01]

@servoz (Contributor) commented Dec 1, 2021

> Speaking of this, I also agree that from the naive user's point of view it is probably not very intuitive, and not very convenient, to have to right-click on Input_Filter nodes and select the "Open filter" option in the menu; a graphical indication (a button on the brick box showing that a user action is needed?) might help.

Okay, I agree with this idea. I'll see what can be done in that direction. There is an aesthetic side to it! A simple QPushButton? The chosen solution has to adapt to the node being resized by the user... I'll take any good ideas on the subject.

@servoz (Contributor) commented Dec 1, 2021

The more I think about user-friendly access to the filter in the Input_Filter brick, the more I think the best solution would be, as @LStruber suggested, to exceptionally have a Filter button in the controller for the input plug of the Input_Filter node, which would open the same window as right click - Open filter... Indeed, in this case, as @denisri noticed, it is not a general behavior. However, I still don't see the aesthetic solution for the graphic box...

@servoz (Contributor) commented Dec 3, 2021

> Test, and modify the code if necessary, so that the brick works well as a filter from the database (seems to work now) or from the output of the previous brick (does not seem to work now). Particular attention must then be paid to the case where the filter contains tags that do not exist for the input data (no crash; exception handling where needed).

I spent some time on this part today, and the least we can say is that the fix is not trivial: if we want to do things correctly, we have to tackle the old chestnut of the initialisation and of indexing the database with "data from the future".

Indeed, we currently index the database at initialisation time, when the outputs do not yet exist. This is very bad; it would be best to index the database at runtime and, to do it really well, when each process has finished running.

In particular, for this part of the ticket, it is currently not possible to filter on the input of the Input_filter brick/node when this plug is connected to an output of another brick, because the indexing of the database is currently done after the initialisation of the whole pipeline. This means that the Input_filter brick is only really useful (functional) when connected directly to the database, which greatly reduces its usefulness...

The only way I can see to deal with this part of the ticket would be to make some fairly major changes:

  • We index the outputs in the database after the initialisation of each brick, so that filtering with the Input_filter brick becomes possible in all cases. As we don't want to actually index the database at initialisation time, we should do this in a new temporary collection (COLLECTION_CURRENT_TEMP?, a copy of COLLECTION_CURRENT made at the start of initialisation). This would also let the user see the result of the initialisation (which represents what is expected during the run) via a pop-up at the end of initialisation displaying this temporary, ephemeral database. A sketch follows this list.
  • The database will be definitively indexed at run time. As previous discussions indicated that doing this after the run of each brick will still be quite difficult in the case of a remote run, we could already consider doing it like now (we would stay with data from the future), but at run time. We would be getting closer to what we want to do...
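A rough sketch of the first point, assuming a populse_db-style document API (the database method names are assumptions; the real calls may differ):

```python
# Illustrative only: the database methods below mimic a populse_db-style
# API but are assumptions, not verified MIA code. COLLECTION_CURRENT is
# the persistent collection; COLLECTION_CURRENT_TEMP is the ephemeral
# copy proposed above.

COLLECTION_CURRENT = "current"
COLLECTION_CURRENT_TEMP = "current_temp"

def start_initialisation(database):
    # Copy the current collection; only the copy is written to during
    # initialisation, so the persistent data stay untouched.
    database.add_collection(COLLECTION_CURRENT_TEMP)         # assumed call
    for doc in database.get_documents(COLLECTION_CURRENT):   # assumed call
        database.add_document(COLLECTION_CURRENT_TEMP, doc)  # assumed call

def index_brick_outputs(database, brick_outputs):
    # After each brick's initialisation, register its (future) outputs so
    # that a downstream Input_filter can query them as if they existed.
    for path in brick_outputs:
        database.add_document(COLLECTION_CURRENT_TEMP, {"name": path})

def end_initialisation(database):
    # Show the expected post-run state to the user, then drop the copy.
    preview = database.get_documents(COLLECTION_CURRENT_TEMP)
    database.remove_collection(COLLECTION_CURRENT_TEMP)      # assumed call
    return preview
```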

As these changes will be important, I think we have to think it through before we start.

All suggestions, remarks, ideas, discussions on this topic are welcome!

@LStruber changed the title from "[Input filter bug] Filtering outputs of input filter in node controller is not working" to "[Input filter enhancement] Adjusting input filter in node controller" on Dec 6, 2021
@LStruber (Collaborator, Author) commented Dec 6, 2021

I looked a bit into this issue this afternoon and I have a concern about it.
As a result of recent changes, contrary to brick initialisation, which is performed on the pipeline (complete_parameters() function), the database is indexed using a workflow object that is created from the pipeline after completion.
In other words, initialization is performed as follows:

  1. complete the pipeline (and brick) parameters that are not specified by the user
  2. convert the pipeline to a workflow
  3. index the database from the workflow
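Roughly, as a runnable toy outline (every name below is a placeholder standing in for the real MIA/Capsul machinery, not its actual API):

```python
# Placeholder outline of the current initialisation path described above;
# none of these functions are the real MIA/Capsul entry points.

def complete_pipeline_parameters(pipeline):
    # 1. fill in every parameter the user did not specify
    pipeline.setdefault("out_files", ["sub01_anat_smoothed.nii"])

def workflow_from_pipeline(pipeline):
    # 2. convert the completed pipeline into a flat list of jobs
    return [{"job": "smooth", "outputs": pipeline["out_files"]}]

def index_database_from_workflow(database, workflow):
    # 3. index all job outputs, once, from the workflow object
    for job in workflow:
        database.extend(job["outputs"])

pipeline, database = {}, []
complete_pipeline_parameters(pipeline)
index_database_from_workflow(database, workflow_from_pipeline(pipeline))
print(database)  # ['sub01_anat_smoothed.nii']
```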

In my opinion, it is therefore not easy to index the database after the completion of each brick (whether at initialisation or before the run). To do that, we would need to:

  1. Either index the database at each completion step (i.e. on the pipeline) --> in this case, we would face again all the problems that were solved by initializing through the workflow ([iteration enhancement] Check on 3rd party software and mandatory input parameters is not working in initialization #242, [Iteration enhancement] In Data Browser, for Bricks tag (history of the pipeline), Exec and Exec Time are wrong #218, [Iteration enhancement] In Data Browser, the Bricks tag contains n times THE SAME BRICK #219, at least)
  2. Or perform completion on the workflow, in order to loop the entire initialization over each job --> the problem here is that the completion engine is built around the pipeline and seems to be efficient; it would be annoying to recode it
  3. Or maybe another "mixed" solution, where we complete as many parameters as we can through the pipeline, then update the database accordingly through the workflow, retry to complete the "uncompleted" parameters, and update the database again (repeating until the completion no longer changes); a sketch follows this list
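A minimal sketch of that third, fixed-point option; both helpers are placeholders for the real pipeline-completion and workflow-indexing steps:

```python
# Hypothetical sketch of the "mixed" solution: alternate completion passes
# and database updates until nothing new gets completed.

def complete_parameters(pipeline, database):
    """One completion pass: fill every parameter whose dependency is
    already indexed. Returns True if anything was newly completed."""
    changed = False
    for spec in pipeline.values():
        if spec["value"] is None and spec["depends_on"] in database:
            spec["value"] = spec["depends_on"] + "_out"
            changed = True
    return changed

def update_database(pipeline, database):
    # Stand-in for indexing through the workflow: register every output
    # completed so far.
    database.update(spec["value"] for spec in pipeline.values()
                    if spec["value"] is not None)

pipeline = {
    "smooth": {"depends_on": "raw", "value": None},
    "filter": {"depends_on": "raw_out", "value": None},  # needs smooth's output
}
database = {"raw"}
# Repeat "complete what you can, then index" until a fixed point is reached.
while complete_parameters(pipeline, database):
    update_database(pipeline, database)
print(pipeline["filter"]["value"])  # 'raw_out_out'
```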

I need to think a bit more about the third solution to know whether it would be possible/interesting/necessary, or whatever.

I hope I've been clear, but I'm not sure!

@servoz (Contributor) commented Dec 6, 2021

Yes, it is very clear... that it is not very easy to index the database after each node/process initialisation!

However, this is the only solution if we want to be able to use the Input_filter brick in all cases (and especially if we want to filter a brick input linked to the output of another brick).

...Or we can decide to filter only from the database (which already works), but this removes a major interest of filtering the brick input (we can already do that without the Input_filter brick, with the Filter button in the controller!).

If my memory is good (it's getting old), in V1 this worked because we initialised each brick one after the other, with indexing after each initialisation. I admit that the code of this part in V1 was not very elegant and that the V2 method is much more readable and elegant... However, it would be a shame to lose this feature (the possibility to filter all entries, not only those coming from the database!).

However, I am confident!!!! From the discussion will come the light!

@denisri (Contributor) commented Dec 6, 2021

I didn't know the input filter brick was expected to work when connected to the output of another filter, so I didn't run into this problem. However, it's certainly an interesting feature...

In my vision of things, assigning parameter values that can be guessed from other parameters or database tags is the job of the completion system. I vaguely envision (without thinking too much...) that operating the filter afterwards would introduce a multi-pass initialization, which would rapidly become an intractable gas plant (with large overheads for multiple completions of the same nodes). This particular node, however, expects data which do not exist yet to be already indexed in the database, in order to make the filter work. If we need to support this use case, I guess that we should index a temporary database (or a temporary collection in the database) during the completion of each node, as part of a derived completion system for Mia. In other words, indexing should not be performed at the end of initialization, after completion, but inside the completion system. This way, initialization would still be a single pass over each node, within a general infrastructure and without special cases for input nodes.

However, this contradicts the former idea that adding data into the database should be done after the data actually exist... Thus maybe the temporary database / collection is the right one.

@servoz (Contributor) commented Dec 6, 2021

> This particular node, however, expects data which do not exist yet to be already indexed in the database, in order to make the filter work.

In fact, I have the impression that this is the case for all the bricks in a pipeline (except the first one, which is linked to the database). Apart from that first brick, all bricks receive inputs "from the future" at completion/initialisation time (including the indexing of outputs), as we currently produce it. The only difference for the Input_filter brick is that it does no calculation with its inputs; it just filters these data from the future for the next brick. I also have the feeling that the indexing step should be integrated at completion time (inside the completion system). It would be done in a temporary database, which would allow the user to visualise the state of the database expected after the run step (thus fulfilling one of the goals of the initialisation step: to check, as much as possible, that everything should be fine at run time).

> However, this contradicts the former idea that adding data into the database should be done after the data actually exist... Thus maybe the temporary database / collection is the right one.

I remain convinced that the persistent database should only be definitively indexed after each (successful) calculation for each brick... or at least as close to that as possible... So I think that using an ephemeral database for the initialisation step is the best method (at least in the current state of our thinking!). Anyway, the same issue will occur at run time if we don't index after each brick that has been run (the Input_filter brick won't be able to filter if the input data are not in the database)...

I think we are really discussing a deep concept for mia here. There are several open tickets that are not at all trivial, that require thought and discussion, and that are ultimately very strongly connected. Without listing them all, there is for example a ticket asking about the feasibility of having only one database for all the tabs opened in the editor (#150)... If we initialise all the pipelines in all the opened tabs and then launch a run on one of the initialised tabs, I think it could go wrong with the current functioning... To do it right, we should try to deal with these tickets together and find a solution that satisfies everyone and answers all of them (they are, I believe, very strongly connected!)...

Anyway, let's come back to this particular ticket ...

We currently have a problem with the initialisation, which could be fixed by using a temporary database and integrating the indexing into the completion step (one brick after the other).
We also have a problem with the run step, and here I don't see how to do anything other than indexing after the run of each brick, which doesn't seem very easy in the case of a remote run.

However, it is not impossible that we find a trick to achieve this!!!!

Or, as I said, there is also the easy solution, which would be to consider that the Input_filter brick has a real interest only when linked to the database (that already works!).
In short, the discussion is really open!

@servoz (Contributor) commented Dec 6, 2021

I have just removed the automatic generation of the output plug value in the Input_filter brick when the filter is defined (Open filter).
I think it could mislead the user, and above all it was atypical behavior compared to the other bricks (outputs should only be generated when the brick is launched and, maybe for checking that all is well, also at initialisation, with a temporary database; see the current discussions).

@servoz (Contributor) commented Dec 6, 2021

Just a quick point on what we still have to do (independently of the deeper discussions under way):

> In summary, to close this ticket, we need to:
> 1. Remove the output Filter button in the controller (only for the V2 controller).
> 2. Test, and modify the code if necessary, so that the Input_filter brick accepts either the whole database (right click Export to database_scans) or the output of the previous brick.
> 3. Test, and modify the code if necessary, so that the brick works well as a filter from the database (seems to work now) or from the output of the previous brick (does not seem to work now). Particular attention must then be paid to the case where the filter contains tags that do not exist for the input data (no crash; exception handling where needed).

  • 1: the Filter button still seems to exist when the parameter is a list (see above).
  • 2: OK.
  • 3: Currently only working when directly connected to the database.
  • 4: user-friendly access to the filter in the Input_Filter brick instead of right-clicking Open filter (a button on the node box): honestly, for me this is rather low priority (it is comfort, while we still have to deal with the essentials on other points). I propose to wait a little on this point.

As there are several points still to be settled in this ticket, we could close it and open several specific ones (it might get messy if we mix up the general discussion and the specific points to be dealt with)?

@LStruber (Collaborator, Author) commented Dec 7, 2021

> In fact, I have the impression that this is the case for all the bricks in a pipeline (except the first one, which is linked to the database). Apart from that first brick, all bricks receive inputs "from the future" at completion/initialisation time (including the indexing of outputs), as we currently produce it. The only difference for the Input_filter brick is that it does no calculation with its inputs; it just filters these data from the future for the next brick.

I had not even thought of that before... It brings me to a question, which might be silly since I have not dug into mia_processes so far: why does this particular brick fail to complete its outputs from its inputs (when they are not yet created in the database), while the other bricks manage to do it? Wouldn't it be possible to change only this brick's functioning (mimicking the other bricks) to solve this particular issue (not talking about initializing)? That is, filtering "from inputs" instead of "from database".

Regarding your comment, OK for splitting this issue into different ones.

@servoz (Contributor) commented Dec 7, 2021

During the initialisation (see the list_outputs method of the bricks), the Input_filter process queries the database, whereas for all the other processes it is only a parameter completion step: the inputs retrieve the outputs of the previous brick and the outputs are determined by completion.

@LStruber (Collaborator, Author) commented Dec 7, 2021

And what prevents us from doing "only a parameter completion step" for the input_filter brick?

@denisri (Contributor) commented Dec 7, 2021

> In fact, I have the impression that this is the case for all the bricks in a pipeline (except the first one, which is linked to the database). Apart from that first brick, all bricks receive inputs "from the future" at completion/initialisation time (including the indexing of outputs), as we currently produce it. The only difference for the Input_filter brick is that it does no calculation with its inputs; it just filters these data from the future for the next brick.

There is a notable difference, however: most (all, actually) other nodes perform completion based either on filenames (strings) given as some of their (input) parameters (like nipype nodes), or on a set of attributes (also strings), possibly given by a database, by filename parsing, or by other means, but provided as a dictionary of strings. The database itself is not mandatory; it is only used upstream to get either filenames or attribute dicts. The input filter node is different in that it directly uses a database (files already indexed in a database, even if they do not exist as files yet), which makes the database mandatory and requires it to be available, and up to date, during completion. This is probably not a problem, but this particular node thus has a specific completion requirement that the other nodes do not have.
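To make the contrast concrete (with made-up names, not the actual Capsul/MIA API): ordinary completion is a pure function of string attributes, whereas the input filter needs a live database query:

```python
# Made-up illustration of the difference described above; none of these
# names come from the actual Capsul/MIA code.

def ordinary_completion(attributes):
    """Most nodes: outputs derived purely from string attributes."""
    return "{subject}_{modality}_smoothed.nii".format(**attributes)

def input_filter_completion(database, conditions):
    """Input_filter: outputs come from querying an up-to-date database."""
    return [doc["name"] for doc in database
            if all(doc.get(tag) == value
                   for tag, value in conditions.items())]

print(ordinary_completion({"subject": "sub01", "modality": "anat"}))

database = [  # must already contain upstream outputs, even "future" ones
    {"name": "sub01_anat.nii", "Type": "anat"},
    {"name": "sub01_func.nii", "Type": "func"},
]
print(input_filter_completion(database, {"Type": "anat"}))  # anat only
```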

@servoz (Contributor) commented Dec 7, 2021

> And what prevents us from doing "only a parameter completion step" for the input_filter brick?

In that case we would lose the Input_filter brick's interest; at least, the interest as we defined it almost at the beginning of mia. Maybe it is time to redefine some things, some needs?

@servoz (Contributor) commented Dec 7, 2021

> There is a notable difference, however: most (all, actually) other nodes perform completion based either on filenames (strings) given as some of their (input) parameters (like nipype nodes), or on a set of attributes (also strings), possibly given by a database, by filename parsing, or by other means, but provided as a dictionary of strings. The database itself is not mandatory; it is only used upstream to get either filenames or attribute dicts. The input filter node is different in that it directly uses a database (files already indexed in a database, even if they do not exist as files yet), which makes the database mandatory and requires it to be available, and up to date, during completion. This is probably not a problem, but this particular node thus has a specific completion requirement that the other nodes do not have.

I completely agree: the fundamental difference of the Input_filter brick is that it needs an up-to-date database for all the i/o of the previous bricks... and there is not really any completion to do at initialisation, since we can even consider that the goal of this brick is precisely to do the completion for the output...

@denisri (Contributor) commented Dec 7, 2021

For me, the job of the input filter is precisely to perform completion. It is a specific completion, but it can take place within the existing completion framework. It only performs its job during completion and has no runtime work. This way, the current infrastructure and the way we operate don't need to change (or not much). The only things to do would be to:

  1. integrate the filtering as a completion implementation for the input filter node (a sketch follows below);
  2. update a temporary database during the completion of each node.

The completion system is already overloaded in Mia, so I don't think this would be very difficult to implement.
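For point 1, here is a sketch of what such a dedicated completion implementation could look like; the class shape mimics a Capsul-style completion engine, but the names and method signature are assumptions, not the exact API:

```python
# Sketch of point 1; treat the class and method names as assumptions.

class InputFilterCompletionEngine:
    def __init__(self, node, temp_database):
        self.node = node
        self.temp_database = temp_database  # updated during each node's completion

    def complete_parameters(self):
        # The "completion" of this node IS the filtering: query the
        # temporary collection (which already holds upstream outputs,
        # indexed during their own completion) and fill the output plug.
        conditions = self.node.filter_conditions
        self.node.outputs = [
            doc["name"] for doc in self.temp_database
            if all(doc.get(tag) == value
                   for tag, value in conditions.items())
        ]

class FakeNode:  # minimal stand-in for a real Input_filter node
    filter_conditions = {"Type": "anat"}
    outputs = []

temp_db = [{"name": "sub01_anat.nii", "Type": "anat"},
           {"name": "sub01_func.nii", "Type": "func"}]
node = FakeNode()
InputFilterCompletionEngine(node, temp_db).complete_parameters()
print(node.outputs)  # ['sub01_anat.nii']
```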

Unless we want it to do its job at runtime; but then we enter the world of "dynamic pipelines", where parameters and iterations could change during runtime. That schema is not supported for now, and would question any initialization anyway; we are not on this field for now.

@servoz (Contributor) commented Dec 7, 2021

> For me, the job of the input filter is precisely to perform completion. It is a specific completion, but it can take place within the existing completion framework. It only performs its job during completion and has no runtime work.

This is another, perhaps clearer, way of saying "there is not really any completion to do at initialisation, since we can even consider that the goal of this brick is precisely to do the completion for the output". So I totally agree!

> Unless we want it to do its job at runtime; but then we enter the world of "dynamic pipelines", where parameters and iterations could change during runtime. That schema is not supported for now, and would question any initialization anyway; we are not on this field for now.

I don't think we want to enter the world of "dynamic pipelines". I think we have a deterministic view of how the pipeline works: at the beginning, we can already know everything that will happen; we don't have in mind a functioning where the parameters and initialisations could change along the way. However, we can consider that the runtime will be identical to the initialisation; we could say that the runtime does not change anything because everything has already been determined during the initialisation... Well, I think I'm splitting hairs here, and we have other things to do. The main thing is that I think we agree on what we want and how to get there!

denisri added a commit that referenced this issue on Dec 7, 2021: "as discussed in #243, the filter is confusing."
@denisri (Contributor) commented Dec 7, 2021

I have removed the filter button for file parameters inside a list. I could not distinguish between lists which may use the filter and those which should not (outputs, and inputs connected to upstream outputs), since in the individual file editors we don't have the context of the parent list. But I guess the filter is mainly useful at the list level when a list is involved.

@servoz (Contributor) commented Dec 7, 2021

> I have removed the filter button for file parameters inside a list. I could not distinguish between lists which may use the filter and those which should not (outputs, and inputs connected to upstream outputs), since in the individual file editors we don't have the context of the parent list. But I guess the filter is mainly useful at the list level when a list is involved.

Cool! I think this is exactly what we needed!

@servoz (Contributor) commented Dec 7, 2021

I have just opened 3 tickets, #248 #249 #250, for the sake of simplicity, covering the 3 actions that still need to be done to fully fix the issues raised in this ticket. Feel free to add to or change them if I have forgotten something or if something is not clear!
I think we can close this ticket now.
