
[general need] Currently, DataBrowser Bricks tag does not give at a glance the whole history of a data #236

Closed · servoz opened this issue Nov 4, 2021 · 59 comments
Labels: enhancement (New feature or request)

servoz (Contributor) commented Nov 4, 2021

In the DataBrowser, we have a special tag (Bricks) that was created to provide the history of a piece of data.

Let's say we have this pipeline:
anat.nii -> |BrickA| -> A_anat.nii ->|BrickB| -> A_B_anat.nii

Currently, after launching this pipeline, we will find in the DataBrowser that the Bricks tag contains only the last brick the document has passed through. For this example:
anat.nii: Bricks tag is empty
A_anat.nii: Bricks tag = BrickA
A_B_anat.nii: Bricks tag = BrickB

I think this is not optimal. Indeed, we may want to see at a glance the whole history of a piece of data (rather than having to reconstruct this history, which may not be instantaneous). In this case we want, in the DataBrowser:

anat.nii: Bricks tag is empty
A_anat.nii: Bricks tag = BrickA
A_B_anat.nii: Bricks tag = BrickA BrickB

servoz added the enhancement (New feature or request) label Nov 4, 2021
servoz changed the title from "Currently, DataBrowser Bricks tag does not give at a glance the whole history of a data" to "[general need] Currently, DataBrowser Bricks tag does not give at a glance the whole history of a data" Nov 4, 2021
servoz (Contributor, Author) commented Nov 4, 2021

I haven't thought about how to do this yet, but since we are currently working on tickets about iteration and indexing, I think we can handle this ticket at the same time. Indeed, we could fix this ticket by inheriting the Bricks tag from the input data and adding the brick the data has just passed through.

Things may not be so simple in real life ... For example, if there are several input data ...
There is also the case of the estimate option of SPM, where the output is the same as the input (we only change the header) ... Anyway, it will be necessary to think a little before coding!

servoz (Contributor, Author) commented Jan 4, 2022

After a private discussion with @LStruber , it was decided:

  • Retrieving all the information for each crossed brick before arriving at the final data may not be trivial (just as examples: pipelines with parallel branches, SPM bricks with the "estimate" option that rewrite the input data header, etc.), but the latest work on the self.workflow object when improving iterations should make it possible to manage this without too much difficulty (to be tested).
  • Even if this recovery is possible in all cases, it does not seem desirable to always display all the crossed bricks, for aesthetic reasons in the case of a pipeline with many bricks. A solution would be to use a QComboBox displaying by default the last crossed brick, and all the bricks if the user decides to display them all.
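A minimal PyQt5 sketch of that QComboBox idea (purely illustrative; the widget setup and brick names are not taken from the Mia code base):

# Illustrative sketch only: a combo box that shows the last crossed brick by
# default, while letting the user unfold the full list of bricks.
import sys
from PyQt5.QtWidgets import QApplication, QComboBox

app = QApplication(sys.argv)

bricks = ["BrickA", "BrickB"]  # hypothetical history, oldest brick first

combo = QComboBox()
combo.addItems(bricks)
combo.setCurrentIndex(len(bricks) - 1)  # last crossed brick shown by default
combo.show()

sys.exit(app.exec_())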

LStruber (Collaborator) commented Jan 4, 2022

I just looked into the workflow object, which has a "dependencies" attribute listing the dependencies (links) between jobs. Using it, it should not be too hard to recursively retrieve all the dependencies of a brick.
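As a rough sketch of what that recursion could look like (assuming, as in soma_workflow, that dependencies is an iterable of (upstream_job, downstream_job) pairs):

# Sketch only: collect every job that a given brick/job transitively depends on,
# assuming workflow.dependencies yields (upstream_job, downstream_job) pairs.
def upstream_jobs(workflow, job, seen=None):
    """Return the set of all jobs the given job (transitively) depends on."""
    if seen is None:
        seen = set()
    for upstream, downstream in workflow.dependencies:
        if downstream is job and upstream not in seen:
            seen.add(upstream)
            upstream_jobs(workflow, upstream, seen)
    return seen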

However, thinking about these dependencies, I wondered how to order dependencies in the bricks tag, especially when a brick depends on several bricks.
For example, taking the Spatial_preprocessing pipeline:

[screenshot of the Spatial_preprocessing pipeline]
The normalize12_2 brick depends on the new_segment and coregister bricks, so there are two "paths" of dependencies:

            .--> NewSegment
Normalize--|
            '--> Coregister --> Realign

How do we want to order these dependencies in the bricks tag? Do we just append all previous dependencies in a list without taking the "paths" into account? Do we need to show the user that normalize relies on two bricks and then that coregister only depends on realign, and how do we do that?

We could imagine an "expandable" list as for packages:

> Normalize
    > NewSegment
    > Coregister
        > Realign

Of course it is simpler not to do that and just to list all previous dependencies at the same level. Moreover, the "paths" could be more complicated than in Spatial_preprocessing, with bricks having more than two dependencies, and this in cascade:

> brick A
    > brick Aa
        > brick Aa1
        > brick Aa2
    > brick Ab
> brick B
> brick C
    > brick Ca
    > brick Cb
        > brick Cb1
        > brick Cb2
            > brick Cb2_alpha

and so on...
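If we went for the expandable display, a minimal PyQt5 sketch could look like this (purely illustrative, not Mia code; the brick names are those of the example above):

# Purely illustrative: an expandable tree of brick dependencies, similar to the
# package browser, filled with the bricks of the example above.
import sys
from PyQt5.QtWidgets import QApplication, QTreeWidget, QTreeWidgetItem

app = QApplication(sys.argv)

tree = QTreeWidget()
tree.setHeaderLabel("Bricks")

normalize = QTreeWidgetItem(["Normalize"])
coregister = QTreeWidgetItem(["Coregister"])

tree.addTopLevelItem(normalize)
normalize.addChild(QTreeWidgetItem(["NewSegment"]))
normalize.addChild(coregister)
coregister.addChild(QTreeWidgetItem(["Realign"]))

tree.show()
sys.exit(app.exec_())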

LStruber self-assigned this Jan 5, 2022
servoz (Contributor, Author) commented Jan 7, 2022

Oh yes, you're right, I hadn't thought about the display to adopt depending on the dependency "paths" ...
I confess that I need to think about this a bit ...

denisri (Contributor) commented Jan 7, 2022

Hi,
I haven't followed this issue closely until now, but I have a different point of view (I'm not saying my view is the best, just that there are other ways to look at it).
As you have seen, the complete history of the processing chains that produce a given data file is complex: it's not always a linear chain; you have shown situations where it is a tree, but in the general case it's not even a tree, it's a graph.
Thus the tree view is not so bad, but still not an exact view of reality.
My opinion is that:

  • the complete history of a single data file should not be entirely stored with it in the database: we have the "direct ancestors" and that's enough to build the graph using queries. I think it's better to keep the database as simple and small as possible, avoiding putting redundant information in it (large portions of the history will be the same for many data files inside a pipeline).
  • we don't need to display it all the time. I guess the user may ask for it through a specific action, but we don't need to display it by default. Here again it's a simplicity principle: don't show things you don't need to see, only useful things.
  • when the user wants to see the complete history of a data file, it's generally because there is a problem with it, and that needs thinking, and also a clear view, taking as much space as needed on screen, in order to allow thinking about it (many people, including me, draw figures and schemas to help them think). So let's give the user a full figure for it, possibly in a separate window, only when he/she needs it. I think a graph representation, with graphical boxes, just like pipelines, would be the best. We could do this using graphviz, or Qt, maybe even using the pipeline view (we could build a sort of custom pipeline for it). The view could be triggered by clicking on a button or item on the data entry in the database view.
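To give an idea of the graphviz option, here is a small sketch of rendering such a history graph (illustration only; the ancestry data and node styles are made up, not the actual Mia implementation):

# Illustration only: render the processing graph of one data file with graphviz.
# The ancestry dictionary is made up; in Mia it would be built from the
# "direct ancestors" information stored in the database.
from graphviz import Digraph

# data file -> list of (producing brick, input data files)  (hypothetical)
ancestry = {
    "A_B_anat.nii": [("BrickB", ["A_anat.nii"])],
    "A_anat.nii": [("BrickA", ["anat.nii"])],
    "anat.nii": [],
}

dot = Digraph("data_history")
for data, producers in ancestry.items():
    dot.node(data, shape="note")
    for brick, inputs in producers:
        dot.node(brick, shape="box")
        dot.edge(brick, data)
        for inp in inputs:
            dot.edge(inp, brick)

dot.render("A_B_anat_history", format="svg")  # writes A_B_anat_history.svg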

servoz (Contributor, Author) commented Jan 10, 2022

@denisri, I think we agree on the main point, which is to offer the user a simple, useful solution that avoids redundancy.

It is also true that the potential complexity of a pipeline (which, as you say, is in general a graph) makes a simple representation as a succession of bricks, in the special Bricks tag/field of the database, as we had imagined at the beginning, difficult ...

I am for the most efficient solution that can be realised as soon as possible (here I tend to prefer an imperfect but existing solution to a perfect solution that does not exist), and I am always ready to take the best solution even if it is not the one I proposed. So I am not against following your proposal, which seems to meet the expectations.

To make sure that we are in agreement on the expectations:

The main idea is to have, when I am interested in a data, all its history:
- (1) the pipeline used to obtain the data
- (2) but also all the parameters used in this pipeline (up to the parameters of each brick/process, for example, for a smooth spm brick - to take a simple process - the parameters in_files, fwhm, out_prefix, etc.).

This is to be able to think when there is a problem with the data/pipeline, but not only that, it can be simply when we come back to a result several months later and want to see how exactly it was obtained.

For the (1) point, I totally agree that the best solution would be to have the possibility to access the pipeline as it is available in the editor and as you propose in a different window. This would be a real improvement compared to what we had imagined at the beginning.
For point (2), if we can find a way to have access to all the pipeline parameters, I say yes!

However, I must admit that I don't completely see how we can implement this. I think @LStruber and I are ready to code in this direction, but we'll have to agree on, and understand, at least a minimum of how to proceed!

The idea is to allow opening a window reproducing the pipeline as we see it in the editor (with the possibility to open the pipeline to see the bricks in it, etc.), plus the controller to have access to the parameters? I must admit that if this is the solution you are thinking about, I don't see a better one!

denisri (Contributor) commented Jan 10, 2022

(1) I don't think (as far as I remember) that full pipelines are recorded in the history, but just individual processes (bricks), thus we cannot retrieve the exact full pipeline that has produced the given data, but we can rebuild the part of the processing graph that has led to this data. In "standard" processing, we have to follow upstream data and the processes that have produced them, and rebuild a pipeline from them. (2) will come for free since parameters are recorded in the history with process instances. We will get the full history of the data and its "ancestors", even if they were produced by different pipelines at different times. Up to now it's not very difficult.
It's more difficult (and maybe not even possible) when, as you have noted, a data file is written several times, either by a process that modifies the input data (such as an SPM registration or normalization), or by a cyclic processing graph. Then a single data file has been written several times by distinct processes. I don't remember if, in such a case, we record all the processes that have written it, or just the last one. In the first case we can do something using the timestamp (which may be imprecise or wrong, thus may lead to errors in the pipeline order...); in the second case we lose part of the history.
Then we have to dig (again) into the data history questions...
It's possible (I vaguely remember) that the data history is erased when a process writes it, unless the same data is both an input and an output of the process, in which case an entry is added without erasing previous ones. But this is still insufficient, since processing cycles may happen with several distinct processes. For instance we can perform a first normalization, write a normalized image, then segment the brain, extract a skull-stripped brain, and perform a second normalization that overwrites the first one (that's what our Morphologist pipeline does, actually). We can say it's bad to erase the initial normalization and it's better to write a second, distinct one, but we cannot impose what pipelines actually decide to do (we could for Morphologist because it's ours and we can modify it, but in a general situation we cannot), and sometimes it's not so bad to get rid of intermediate data that are not that important and consume disk space.
On the other hand, if we never erase a data history, and we run a pipeline several times, the history will uselessly grow. So the only "good" method is to erase the history only if the process writing a data file does not have the data itself in its own input history. This is a bit tricky.
Saying that, there is another situation that I evoke just below: when intermediate data have been erased to save disk space. Then the history of the data we want to explore involves non-existing other data, which likely do not have a recorded history any longer - unless we keep the history of deleted data. This is another question.
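The erase rule described above could boil down to something like this sketch (the helper names are hypothetical, not the actual Mia database API):

# Hypothetical helpers, for illustration only: reset the recorded history of an
# output file only when that file is not among the inputs of the process that
# just wrote it (i.e. the process rewrote the data from scratch).
def update_history(db, output_path, new_process_record):
    history = db.get_history(output_path)      # hypothetical accessor
    if output_path not in new_process_record.inputs:
        history = []                           # fresh data: drop the old history
    history.append(new_process_record)
    db.set_history(output_path, history)       # hypothetical setter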

denisri (Contributor) commented Jan 10, 2022

Anyway I can try to propose a pipeline rebuilding tool.

servoz (Contributor, Author) commented Jan 10, 2022

(1) I don't think (as far as I remember) that full pipelines are recorded in the history, but just individual processes (bricks), thus we cannot retrieve the exact full pipeline that has produced the given data, but we can rebuild the part of the processing graph that has led to this data.

Yes, on the master branch. But @LStruber has just started to work on the fix4#236 branch, which at initialisation/run time (both are now launched automatically) recovers all the information (at that moment we have all the information about the pipeline and its parameters) in order to propose, in the database Bricks field, all the bricks crossed to arrive at the data in question, and, thanks to a combo box, to show by default only the last brick while letting the user display all the bricks if they wish. If you haven't done it yet, you can test this fix4#236 branch, just to see what it gives (see the database Bricks field in the DataBrowser).

Now there is a problem in the case of a non-linear pipeline, which is certainly the most common case, as raised by @LStruber earlier, and I won't come back to it. This way is not perfect, for all the reasons discussed earlier, but it has the advantage of keeping all the history at the time of the run and saving it in the database.

Doing it afterwards, while rebuilding, will inevitably lead to loss of information and to the impossibility of rebuilding the history in every case. I think that we should find a solution based on both points of view... what you propose (the possibility to represent the pipeline exactly as we see it in the editor, and the visualisation of all the parameters of the bricks as in the controller) and what we had imagined by saving the information (the self.workflow object or something derived from it?) at runtime?

denisri (Contributor) commented Jan 10, 2022

I haven't looked at branch fix4#236 (yet - I can't switch right now because I have modifications in progress in my current branch). But I wonder how you overcome the difficulties that have been raised above:

  • How do you know when to erase the history of a data file which will be re-written?
  • How do you get the list of bricks involved in the output of the given data? Do you search for it through the whole history, or in the current pipeline? Does it take into account the fact that a data file may be the result of several independent pipeline runs?
  • Does it duplicate full pipelines in the history of a single file? (I fear we store iterations over thousands of data in the history of any single data file)

Doing this, have you modified the structure of the database?

servoz (Contributor, Author) commented Jan 10, 2022

@LStruber, since you are the one who coded the branch in question, can you answer @denisri?
I could do it, but I think you will be more precise than me.

LStruber (Collaborator) commented:

Currently in the master branch, when a pipeline is run, all executed bricks are added to the brick table of the database (COLLECTION_BRICK) and all output files are added to the main database (COLLECTION_CURRENT). The link between both tables is made through the bricks tag of COLLECTION_CURRENT, which is appended with the (last) brick that produced the file.

What I did in the fix#236 branch is that, instead of appending only the last brick that produced the file, I appended to the bricks tag every brick that was used to produce the file. To answer your questions:

  • I did not deal with this point, so I guess the behavior is the same as in the master branch.
  • I retrieve the list of bricks by recursively looking at the dependencies attribute of the workflow. So I only search the history of the current pipeline, but since the bricks tag is appended to, the previous history is not erased. In this sense, it should take into account the fact that a data file may be the result of several pipelines.
  • Since the bricks tag is only a link between COLLECTION_BRICK and COLLECTION_CURRENT, I did not add any entries to the database, and did not duplicate any history.
  • The structure of the database is not modified; I just appended more bricks to the bricks tag than before.

Here is a screenshot of the result:
[screenshot of the DataBrowser showing the extended Bricks tag]
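In other words, the change roughly amounts to something like this sketch (hypothetical helper names, not the actual populse_mia/populse_db API; upstream_jobs refers to the recursion sketched earlier):

# Rough sketch with hypothetical helpers: tag each output file with the job
# that wrote it plus all of its (transitive) upstream jobs in the workflow,
# instead of the last job only.
def bricks_for_output(workflow, producing_job):
    """All brick ids to store in the Bricks tag of one output file."""
    jobs = upstream_jobs(workflow, producing_job)  # recursion sketched earlier
    jobs.add(producing_job)
    return [brick_id_of(job) for job in jobs]      # brick_id_of is hypothetical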

denisri (Contributor) commented Jan 10, 2022

OK then, the only concern I still have is about erasing a file's history: if I understand correctly, you always append brick ids to the bricks tag of a file, and never remove anything from it?
So if I run a pipeline once, then (maybe another day) rebuild the same pipeline and run it again, maybe with different parameters (say, a different smoothing value for instance), from the same input data, then all outputs will be overwritten, but the previous run will still be in the history, even if there is nothing left from it in the actual data.

servoz (Contributor, Author) commented Jan 10, 2022

Thanks @LStruber.
As I have already observed, we often end up looking for the same thing while imagining different solutions.
I am convinced that this is an asset, because unless we make a mistake in our reasoning, which can happen (and it has happened to me!), making the effort to spend some time discussing together can bring a better solution than each one taken separately.
The topic we are discussing now is very important in my opinion (being able to see the most complete history of a data file is fundamental to solving a problem, or simply to knowing how this data was created).
I know we all have a lot to do but maybe we could schedule a video meeting to discuss in a concrete and interactive way what exactly we want, how we can do it and how we can share the work?

LStruber (Collaborator) commented:

@denisri I'm not sure of the behavior in this case, I will check asap. What I'm sure of is that the behavior regarding erasing history is the same in the master branch as in the fix4#236 branch. In both branches, the new brick(s) is (are) appended to the history.

LStruber (Collaborator) commented:

I just checked in Mia. Actually the bricks tag seems to be erased each time a pipeline is run, so the bricks tag only keeps track of the last pipeline's history.

servoz (Contributor, Author) commented Jan 11, 2022

Currently the master branch crashes if we go to the DataBrowser while the pipeline is running. To avoid the crash, it is necessary to wait for the end of the calculation before going to the DataBrowser.

denisri added a commit that referenced this issue Jan 11, 2022
proceed backwards, using execution timestamps to avoid confusions when
data have been written several times.
Ambiguities still exist when several processes write the same data at
the same time, but this should not happen often.
denisri added a commit that referenced this issue Jan 11, 2022
in order to make all nodes activated
denisri added a commit that referenced this issue Jan 11, 2022
denisri (Contributor) commented Jan 11, 2022

Currently the master branch crashes if we go to the DataBrowser while the pipeline is running. To avoid the crash, it is necessary to wait for the end of the calculation before going to the DataBrowser.

I don't observe this behaviour but I have modified the code since this has been reported, so I don't know if it's fixed or if I haven't tested it in the same situation.

denisri (Contributor) commented Jan 11, 2022

I know we all have a lot to do but maybe we could schedule a video meeting to discuss in a concrete and interactive way what exactly we want, how we can do it and how we can share the work?

I agree, but I already have a number of meetings these days. Can it wait for about a couple of weeks?

servoz (Contributor, Author) commented Jan 11, 2022

Currently the master branch crashes if we go to the DataBrowser while the pipeline is running. To avoid the crash, it is necessary to wait for the end of the calculation before going to the DataBrowser.

I don't observe this behaviour but I have modified the code since this has been reported, so I don't know if it's fixed or if I haven't tested it in the same situation.

With master on the last commit I always observe the issue.
The raised exception:

Traceback (most recent call last):
  File "/casa/home/Git_projects/populse_mia/python/populse_mia/user_interface/main_window.py", line 1405, in tab_changed
    self.data_browser.table_data.add_rows(documents)
  File "/casa/home/Git_projects/populse_mia/python/populse_mia/user_interface/data_browser/data_browser.py", line 1019, in add_rows
    scan['FileName']))
TypeError: string indices must be integers

Edit: tested with mia_processes smooth or spatial_preprocessing_1. Same result: crash.

servoz (Contributor, Author) commented Jan 11, 2022

I know we all have a lot to do but maybe we could schedule a video meeting to discuss in a concrete and interactive way what exactly we want, how we can do it and how we can share the work?

I agree, but I already have a number of meetings these days. Can it wait for about a couple of weeks?

No problem, we can continue to discuss in the tickets.

denisri (Contributor) commented Jan 12, 2022

Currently the master branch crashes if we go to the DataBrowser while the pipeline is running. To avoid the crash, it is necessary to wait for the end of the calculation before going to the DataBrowser.

I don't observe this behaviour but I have modified the code since this has been reported, so I don't know if it's fixed or if I haven't tested it in the same situation.

With master on the last commit I always observe the issue. The raised exception:

Traceback (most recent call last):
  File "/casa/home/Git_projects/populse_mia/python/populse_mia/user_interface/main_window.py", line 1405, in tab_changed
    self.data_browser.table_data.add_rows(documents)
  File "/casa/home/Git_projects/populse_mia/python/populse_mia/user_interface/data_browser/data_browser.py", line 1019, in add_rows
    scan['FileName']))
TypeError: string indices must be integers

Edit: tested with mia_processes smooth or spatial_preprocessing_1. Same result: crash.

I have tried using spatial_preprocessing_1 and using Morphologist, and in each case, while it was running, I could switch to the data browser tab without a problem, and even see the data history. So I must be doing it in a different way, but I don't know how to reproduce the problem.
Is it related to the recent changes for data history? (because the general view in the data browser should not be affected by those changes)

LStruber (Collaborator) commented:

Thanks for the branch. I see what the problem is with my branch; however, I don't see for now how to fix it without disambiguating things with execution times as you've done in your branch, which is precisely what I'm trying to avoid because it seems that "sometimes" it does not work. By sometimes I mean that with your solution (based on the testing we did with @servoz a few days ago on the master branch), it may work or not work on the exact same pipeline (spatial preprocessing) with the same execution parameters (I didn't dig into your code to understand why...). Sometimes (most of the time!) the whole pipeline was displayed, sometimes only one brick, sometimes the brick that overrides the output (coregister) was missing...

Unfortunately, I'm not sure it is actually possible to fix all the problems we have without saving the real pipeline/workflow (and the brick collection is not adequate for that).

LStruber (Collaborator) commented:

I tried something to disambiguate things when several inputs/outputs have the same name: saving in the database (as a hidden list) the level of dependency of each brick in the workflow (a list of ints, 1 meaning a direct dependency, 2 meaning there is one brick in between, etc.). Then I check links between the input of a brick and the outputs of previous bricks by checking that they have the same name, and if there are several possibilities I choose the one with the lowest dependency level. This corresponds to commit ca7fc20. It is working on the spatial preprocessing pipeline (but it was already working before :))... @denisri could you give your pipeline another try?

Since I modified the database entries, you may have to create a new project.
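For the record, the disambiguation rule amounts to something like this sketch (the candidate structure is made up for the illustration):

# Illustration of the disambiguation rule: among the previous bricks whose
# output has the same name as the current input, keep the one with the lowest
# dependency level (1 = direct dependency, 2 = one brick in between, etc.).
def resolve_input_link(input_name, candidates):
    """candidates: list of (brick_id, output_name, dependency_level) tuples."""
    matches = [c for c in candidates if c[1] == input_name]
    if not matches:
        return None
    return min(matches, key=lambda c: c[2])[0]  # brick id with the lowest level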

denisri (Contributor) commented Mar 16, 2022

Well, it's still better but still not exact... :)
[screenshot of the reconstructed history graph]
There are still disabled bricks and missing / wrong links (compared to the "good" graph earlier).
And now it takes about 30 seconds of calculation before showing the history graph, where there was no delay earlier.
Sorry for the bad news...

LStruber (Collaborator) commented:

This time I'm not sure I understand what makes it not work... Either I made a mistake somewhere (not unlikely), or the "level of dependency" between bricks I added is not enough to cover all cases (also not unlikely :))

Anyway, in view of the 30 sec of calculation, the answer is not very important since it rules the solution out... This delay is due to the fact that, to disambiguate things, I need to iterate over all previous bricks to check whether there is a potential conflict, and this for each input! (whereas before I stopped iterating as soon as I found a match). I wanted to try a solution between @denisri's and mine, iterating over all previous bricks and disambiguating conflicts with the execution time, but it's a waste of time since the delay would be the same.

That leaves us with two solutions:

  1. Use @denisri's solution based only on execution time to recreate the pipeline. @servoz, I retried it (again on master) today and it managed to recreate the correct pipeline every time (over several tests)... Do you have minimal steps to reproduce the issue you mentioned earlier?

Taking as an example the spatial_preprocessing_1 pipeline from mia_processes: by clicking on the bricks in the Bricks tag, we don't always see the same result (never the whole pipeline, some bricks are missing, sometimes even just the selected brick rather than the whole pipeline, etc...). We should have the whole pipeline in all cases.

However, it won't solve the problem of deleted files/bricks; you also stated:

what happens if one or more bricks no longer exist in the mia (basically, we want to be able to find out how the data was created, even several months after it was created and even if the brick no longer exists in the mia)?

  2. At the time of initialization, save the pipeline/workflow into the database or on the disk (and refer to it in the database). For example we could pickle the workflow on disk and load it when clicking on the bricks tag, to rebuild the pipeline from the beginning up to the considered brick.

I may be repeating what @denisri, who was surely more clear-sighted, said a few weeks ago, but I was sure it was possible to recreate the pipeline from the bricks alone (I still think it is possible, but as mentioned before, not within a reasonable time).
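For reference, the execution-time-based reconstruction of solution 1 boils down to something like the following sketch (the brick record structure is hypothetical, not the actual implementation):

# Sketch of the timestamp-based backward reconstruction: for each input file of
# a brick, pick, among the recorded bricks that wrote that file, the most recent
# one that ran no later than the current brick. Record fields are hypothetical.
def previous_writer(brick_records, filename, before_time):
    """brick_records: list of records with .inputs, .outputs and .exec_time."""
    writers = [b for b in brick_records
               if filename in b.outputs and b.exec_time <= before_time]
    return max(writers, key=lambda b: b.exec_time) if writers else None

def rebuild_history(brick_records, last_brick):
    """Walk backwards from the brick that produced the data of interest."""
    graph, stack = [], [last_brick]
    while stack:
        brick = stack.pop()
        graph.append(brick)
        for inp in brick.inputs:
            parent = previous_writer(brick_records, inp, brick.exec_time)
            if parent is not None and parent not in graph and parent not in stack:
                stack.append(parent)
    return graph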

LStruber (Collaborator) commented:

What do you think?
I can try to propose something with pickle:

  1. Save the pipeline on disk at the time of initialisation
  2. Add a string tag in the database which contains the path of the pipeline file, for all files that the pipeline created
  3. Reload this file when clicking on the brick button
  4. Use @denisri's pipeline view to recreate a part of the pipeline (from the beginning up to the file)

denisri (Contributor) commented Mar 17, 2022

I haven't thought about it yet, but I just want to note that the pickle format is not suitable for long-term storage. It is good for temporary files, but no more: the pickle format (or its sub-formats) evolves, is not compatible across different versions of Python (it's impossible, in real life, to write a pickle in Python 2 and read it in Python 3 or vice versa, even if it is documented to be possible), and it loads modules, thus depends on modules and their versions... But we can save workflows in JSON format, and pipelines have an XML format that we may (should) move to JSON.
For the rest it needs a bit more thinking, which I have not done yet...

denisri (Contributor) commented Mar 17, 2022

Maybe it would be useful for testing purposes to have a fake Morphologist pipeline, with the same nodes and parameters as the "real" one, and its completion system, but with the execution part of each process replaced with fake things (basically a kind of generic function which, for instance, sleeps for a given time, and writes the output files as text files with fake content, such as one line containing the name of the process), so that it does not depend on actual execution code, programs and libraries. I could create it automatically via a script and send it to you, or put it somewhere in some test pipelines data. I guess this would also be useful to test / sandbox projects such as https://github.com/brainvisa/use-cases
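The generic replacement function could be as simple as this kind of sketch (not tied to the actual Capsul API):

# Sketch of a generic fake execution: sleep a bit, then write each declared
# output as a small text file naming the process, so that no real neuroimaging
# code or data is needed.
import time

def fake_execution(process_name, output_paths, duration=1.0):
    """Stand-in for a process run: wait, then write placeholder outputs."""
    time.sleep(duration)
    for path in output_paths:
        with open(path, "w") as f:
            f.write("fake output written by %s\n" % process_name)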

servoz (Contributor, Author) commented Mar 17, 2022

Yes, it would be a good thing to have a complex shared use case.
For the tests requested by @LStruber, I don't think I have at hand a more complex pipeline than the one proposed by @denisri. I'll do some tests as soon as possible, but I think that @denisri's is really a good example for testing.

denisri added a commit to populse/capsul that referenced this issue Mar 18, 2022
denisri added a commit to populse/capsul that referenced this issue Mar 18, 2022
because they are otherwise exported in an uncontrolled order depending
on the links iteration orders (populse/populse_mia#236)
denisri (Contributor) commented Mar 21, 2022

I'm quite struggling to build and save a fake pipeline which really has the same structure as our "real" Morphologist. Some details of the plug and link properties are not entirely saved (like an optional parameter exported as mandatory, or the contrary), and some of them are tuned internally when building the pipelines, so for now I always get a pipeline with a disabled part. I need more time to finish it...

servoz (Contributor, Author) commented Mar 21, 2022

No problem, it takes time to do things!

By the way, what you are doing is a very good thing, because Mia should make it easy to share pipelines. This is the opportunity to have a real case to test the construction/code of a pipeline and the use of the pipeline integration into Mia (as a reminder, there is a function in Mia for this last point):
[screenshot]

LStruber (Collaborator) commented:

I think we all agree that it is necessary to save the pipeline or workflow object in the database to cover all cases (JSON format). To start investigating/coding in this direction we need to make some decisions:

  • Do we store a string link to the JSON file in the database, or do we wait for populse_db v3, which seems to allow saving JSON directly?
  • Do we want the pipeline to appear in the database (as derived data), or do we want to mask it?
  • @denisri, when you said we can save a workflow as JSON, did you have a function/package in mind?

And finally, do we need a meeting to set things up, or do I start coding with the answers to the previous questions?

denisri (Contributor) commented Mar 28, 2022

@denisri, when you said we can save a workflow as JSON, did you have a function/package in mind?

soma_workflow.client.Helper.serialize('pouet.json', workflow)

I don't have definitive answers to the other questions. I guess that if we store workflows or pipelines in the database, it will be a good idea to have a means of seeing them somewhere (at least for debugging)?
We can have a meeting if it's useful, yes.
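For completeness, reading the workflow back later would look roughly like this (assuming Helper.unserialize as the reading counterpart of serialize; to be double-checked):

# Saving a workflow as JSON with soma_workflow's Helper, and loading it back
# (Helper.unserialize is assumed to be the reading counterpart of serialize).
from soma_workflow.client import Helper

Helper.serialize('pouet.json', workflow)      # write the workflow to disk
workflow2 = Helper.unserialize('pouet.json')  # reload it later, e.g. from the
                                              # path stored in the database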

LStruber (Collaborator) commented:

Closed with #262 and #263.

servoz pushed a commit that referenced this issue Sep 19, 2022
proceed backwards, using execution timestamps to avoid confusions when
data have been written several times.
Ambiguities still exist when several processes write the same data at
the same time, but this should not happen often.
servoz pushed a commit that referenced this issue Sep 19, 2022
in order to make all nodes activated
servoz pushed a commit that referenced this issue Sep 19, 2022
servoz pushed a commit that referenced this issue Sep 19, 2022
temp files do not exist any longer, are not indexed in the database,
thus break the history graphs. Moreover they all appear as "<temp>" in
process histories and several temp files cannot be distinguished.
servoz pushed a commit that referenced this issue Sep 19, 2022
Now graphs seem quite complete, including when temp intermediate values
are used (with a little risk of error, however).
But I had to disable the history cleaning for data and orphan bricks for
now, because the notion of obsolete/orphan has changed, and it was
erasing things that are used in the indirect history of data.
servoz pushed a commit that referenced this issue Sep 19, 2022
The implementation doesn't use Process instances but a light wrapper,
and a pipeline (with fake processes) is only built at the end, in
data_history_pipeline(). Other functions are Process-agnostic.
Docstrings have been added, and a simpler function,
get_data_history_bricks(), which may be used in the history cleaning
functions (which are currently disabled in pipeline runs).