This repository has been archived by the owner on Oct 29, 2019. It is now read-only.

reportedly incomplete datafusion / corrupted files #9

Closed
tstoeger opened this issue May 6, 2015 · 15 comments

@tstoeger

tstoeger commented May 6, 2015

On project
/BIOL/sonas/biol_uzh_pelkmans_s5/Data/Users/Prisca/240215_siRNA_SM_TfRecycling_SimpleRestartB

iBrain: warning: 240215_siRNA_SM_TfRecycling_SimpleRestartB (0 JOBS): Corrupt datafusion files found after second datafusion attempt.

@ewiger
Contributor

ewiger commented May 6, 2015

Could you provide more information? Which files are corrupt?
BASICDATA.mat, or individual CellProfiler measurement files (CPCluster step)?

@tstoeger
Author

tstoeger commented May 6, 2015

The statement about corruption refers to iBrain reporting that files are corrupt.

I did not test which ones are corrupt (or whether iBrain's error message is correct).

If one specific dataset appears highly suspicious to me, it is Measurements_Cytoplasm_Location, since a) yesterday evening its jobs were not fused (whereas the others were), and b) the second-newest file in BATCH is called "Measurements_Cytoplasm_Location.datacheck-incomplete".

@ewiger
Contributor

ewiger commented May 6, 2015

Indeed, DataFusionCheckAndCleanup_150506071109.results contains the line

!!! DATA INCOMPLETE - /BIOL/sonas/biol_uzh_pelkmans_s5/Data/Users/Prisca/240215_siRNA_SM_TfRecycling_SimpleRestartB/BATCH/Measurements_Cytoplasm_Location.mat

After load('Measurements_Cytoplasm_Location.mat') and inspecting the struct, I did not find anything suspicious.

The log file for merging this measurement shows no errors: DataFusion_Measurements_Cytoplasm_Location_150505 182525.results

I have removed the flag file (rm DataFusionCheckAndCleanup.submitted) to see whether the error is reproduced.

@ewiger
Contributor

ewiger commented May 6, 2015

A second run of DataFusionCheckAndCleanup did not produce any error.

I forgot to remove the previous log report DataFusionCheckAndCleanup_150506071109.results, so iBRAIN picked up the error from the old output.

After rm DataFusionCheckAndCleanup_150506071109.results the website should correctly report the status.

As discussed, this is an example of a general problem:

  • errors when restarting a project step (removing flag files and log reports)
  • lack of documentation

These issues are addressed in the iBRAIN_UZH version.
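In this case iBRAIN reported an error only because it read an old DataFusionCheckAndCleanup log. As a minimal sketch (the helper names are hypothetical and this is not iBRAIN's actual code), a status check that only trusts the newest log could look as follows. It assumes log names of the form DataFusionCheckAndCleanup_<yymmddHHMMSS>.results and problem lines starting with "!!! DATA INCOMPLETE - ", both inferred from the file and the line quoted above.

```python
import re
from pathlib import Path

# Assumption: log naming and line format are inferred from this issue
# (e.g. DataFusionCheckAndCleanup_150506071109.results), not from iBRAIN docs.
LOG_PATTERN = re.compile(r"^DataFusionCheckAndCleanup_(\d{12})\.results$")
INCOMPLETE_PREFIX = "!!! DATA INCOMPLETE - "


def newest_check_log(names):
    """Return the check log with the latest yymmddHHMMSS stamp, or None."""
    stamped = []
    for name in names:
        match = LOG_PATTERN.match(name)
        if match:
            stamped.append((match.group(1), name))
    # Fixed-width yymmddHHMMSS strings sort chronologically within one century.
    return max(stamped)[1] if stamped else None


def incomplete_paths(log_text):
    """Paths reported as incomplete in the text of a single check log."""
    return [
        line[len(INCOMPLETE_PREFIX):].strip()
        for line in log_text.splitlines()
        if line.startswith(INCOMPLETE_PREFIX)
    ]


def current_incomplete_paths(folder):
    """Read only the newest check log in folder; older logs are ignored.

    Returns None if no check log exists yet (i.e. the check has not run).
    """
    folder = Path(folder)
    latest = newest_check_log(p.name for p in folder.iterdir())
    if latest is None:
        return None
    return incomplete_paths((folder / latest).read_text())
```

Ignoring older logs by timestamp avoids the situation above, where a stale report outlived a successful rerun; the alternative, relying on users to delete old reports by hand, is exactly the undocumented manual step the thread complains about.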

@ewiger ewiger closed this as completed May 6, 2015
@tstoeger
Author

tstoeger commented May 6, 2015

To be sure that there really is no problem, I suggest that we:

a) check whether iBrain now continues with the project as expected, if everything was fine;
b) then remove the project from iBrain_Brutus, remove all flags created after the CP pipeline step, and resubmit the project (starting with CP).

If the error was indeed a rare event that by chance affected this single measurement (e.g. node usage or storage-system problems last night), the pipeline should complete without problems (note: data fusion of all other measurements appears to have finished without problems).

@tstoeger tstoeger reopened this May 6, 2015
@ewiger
Contributor

ewiger commented May 6, 2015

According to the code, if there was at least one resubmission of the DataFusion step (i.e. if DataFusion.resubmitted was ever created), the iBRAIN website will show the error forever. This means that restarting the DataFusion step should include removing DataFusion.resubmitted.
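A restart of the DataFusion step therefore has to clear more than the submission flag. The following is a minimal sketch, not part of iBRAIN_BRUTUS: the list of file names is assembled from the flags and logs mentioned in this issue, and whether they live in BATCH or in the project folder is an assumption, so the folder is a parameter. It defaults to a dry run that only lists what would be deleted.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical list, assembled from the flag and log names in this issue;
# the real iBRAIN_BRUTUS set of files may differ.
DATAFUSION_RESTART_PATTERNS = (
    "DataFusion.submitted",
    "DataFusion.resubmitted",               # otherwise the error is shown forever
    "DataFusionCheckAndCleanup.submitted",
    "DataFusionCheckAndCleanup_*.results",  # stale check reports
)


def select_reset_targets(names, patterns=DATAFUSION_RESTART_PATTERNS):
    """Return the file names that should go before DataFusion is restarted."""
    return sorted(n for n in names if any(fnmatch(n, p) for p in patterns))


def reset_datafusion(folder, dry_run=True):
    """Delete (or, with dry_run=True, only list) the matching files in folder."""
    folder = Path(folder)
    targets = select_reset_targets(p.name for p in folder.iterdir())
    if not dry_run:
        for name in targets:
            (folder / name).unlink()
    return targets
```

Unlike a blanket rm DataFusion*, these patterns leave the per-measurement DataFusion_Measurements_*.results logs in place; whether those should also be removed on a restart is not settled in this thread.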

Testing it now.

@tstoeger
Author

tstoeger commented May 6, 2015

Sounds very plausible, and I believe that it will work.

However, we will only be able to formally conclude that iBrain_Brutus / CPP can process this example pipeline if we restart it (which should take only 5 minutes of manual work for the resubmission). Without doing anything further, we will know for sure this evening or tomorrow whether it works.

(In the other scenario, if the error reproducibly remains after resubmitting the pipeline, we have a good starting point and test case for debugging this unexpected, measurement-specific error.)

@ewiger
Contributor

ewiger commented May 6, 2015

I would prefer that iBRAIN_BRUTUS handle it.
I now had to run rm DataFusion* to remove the submission flag as well. Waiting...

@tstoeger
Author

tstoeger commented May 6, 2015

Sorry, I guess this was a miscommunication.

a) I would first have iBrain_Brutus take care of it.
b) Once everything is fine (including the handling by iBrain_Brutus), we start again with the pipeline and ensure that iBrain_Brutus takes care of everything (so that there is no need for the rm DataFusionCheckAndCleanup_ that you had to do manually for the current run of the test pipeline).

@ewiger
Contributor

ewiger commented May 6, 2015

So it looks like an iBRAIN_BRUTUS bug to me. I do not see any CPP errors, but iBRAIN is clearly confused by the flags and logs in the BATCH and project folders. See the screenshot, where the second resubmission takes place.

[Screenshot: datafusion_bug]

@tstoeger
Author

tstoeger commented May 6, 2015

Contrary to our expectation,
a) (letting iBrain take care of it after the manual rm) did not work.

Now the results of b) (restarting the project) are in:
b) also did not work. Interestingly, it is again the same file (Measurements_Cytoplasm_Location).

Although I won't dive into debugging now, my suspicion is the following:

The CP pipeline uses the standard CP module ExpandOrShrink to shrink objects. Contrary to expectation, this module creates new, shrunken objects but does not ensure a 1:1 relation with the parent objects (e.g. cells and shrunken cells). Allocating a cytoplasm therefore breaks the implicit assumption of an unambiguous 1:1:1 mapping between nuclei, cells and cytoplasm (where all parts of the same biological cell have the same identifier). In addition, ExpandOrShrink completely removes objects that are smaller than the specified shrinking distance (which again triggers internal confusion in CP).

Indeed, 1/4 of the sites do not have the same number of cells and cytoplasms (see the Image_Object count measurement). These are the lucky situations where one notices that the mapping is wrong (instead of only allocating measurements to the wrong cells / nuclei).
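The count check described above can be expressed as a short consistency test. This is a minimal sketch that assumes the per-site object counts (e.g. from the Image_Object count measurement) have already been loaded into plain Python lists, one list per object type; how they are read from the .mat files is not shown, and the function names are hypothetical.

```python
def mismatched_sites(counts):
    """Indices of sites where the object types do not have equal counts.

    counts maps an object name (e.g. "Nuclei", "Cells", "Cytoplasm") to a
    list with one object count per site, in the same site order.
    """
    lengths = {len(per_site) for per_site in counts.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one object type, each with one count per site")
    n_sites = lengths.pop()
    return [
        site
        for site in range(n_sites)
        if len({per_site[site] for per_site in counts.values()}) > 1
    ]


def mismatch_fraction(counts):
    """Fraction of sites that violate the expected 1:1:1 mapping."""
    n_sites = len(next(iter(counts.values()), []))
    if n_sites == 0:
        return 0.0
    return len(mismatched_sites(counts)) / n_sites
```

Equal counts are only a necessary condition: a site can pass this check and still have objects assigned to the wrong parent, which is the silent case the comment above warns about.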

I assume that iBrain's MATLAB code that performs the fusion just happens to run into some rare situation where it gets confused by the massive misallocation of cytoplasms (note: fusing wrong data without any error would be an even worse outcome).

(This might happen within measurement-specific handling, e.g. some measurements have a different degree of nesting within handles and thus might be processed by a separate routine.)

I also assume that the error could be circumvented by replacing the ExpandOrShrink CP module with the ShrinkObjectSafely module, which never eliminates objects and always preserves the internal object ID (but which, in contrast to CP's original module, therefore does not always shrink objects to the specified extent).
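The behaviour attributed to the safe shrinking module can be illustrated with a small pure-Python sketch. This is not that module's code (its implementation is not shown in this thread); the function names are hypothetical. Each labelled object is eroded one pixel step at a time using 4-connectivity, keeps its ID, and any erosion step that would make it disappear is skipped, so small objects may end up shrunk by less than the requested distance.

```python
def erode(mask):
    """One step of 4-connected erosion of a set of (row, col) pixels."""
    return {
        (r, c)
        for (r, c) in mask
        if {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)} <= mask
    }


def shrink_safely(labels, distance):
    """Shrink every object in a non-empty 2D label image by up to `distance`
    erosion steps without ever removing an object or changing its ID."""
    h, w = len(labels), len(labels[0])
    masks = {}
    for r in range(h):
        for c in range(w):
            if labels[r][c]:
                masks.setdefault(labels[r][c], set()).add((r, c))
    out = [[0] * w for _ in range(h)]
    for obj_id, mask in masks.items():
        for _ in range(distance):
            smaller = erode(mask)
            if not smaller:
                break  # keep the last non-empty mask instead of deleting the object
            mask = smaller
        for r, c in mask:
            out[r][c] = obj_id  # same ID as the parent object
    return out
```

Because every input ID survives, a later cytoplasm step still finds exactly one shrunken object per cell, at the cost of not honouring the shrinking distance for objects that are too small.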

@tstoeger
Author

tstoeger commented May 7, 2015

Running the pipeline again with ShrinkObjectsSafely again left Cytoplasm_Location unfused (and, in addition, PlasmaMembrane_Location).

I believe that this is not a data fusion problem but an indicator of a major pipeline-specific bug in CPP (see pelkmanslab/CellProfilerPelkmans#15), which iBrain / data fusion does not know how to handle.

@tstoeger tstoeger changed the title Incomplete datafusion / corrupted files reportedly incomplete datafusion / corrupted files May 7, 2015
@ewiger
Contributor

ewiger commented May 7, 2015

Thank you for the detailed report, Thomas. We will track and resolve this bug
systematically.

@tstoeger
Author

tstoeger commented May 7, 2015

Things are getting even messier.

Without human input(?), the Image_Children measurement has changed compared to early afternoon, essentially setting the cytoplasm counts to 0 at every site (for children measurements like the ones from early afternoon, see or the ones from the last run, deactBATCHFromThomas05).

Whatever the origin of the inconsistent nucleus and cytoplasm measurements, it is a major bug (and I believe that we are lucky to have noticed that something is wrong).

@tstoeger
Author

After running the pipeline with fixed modules, iBrain no longer reports failed data fusion or a corrupt Cytoplasm_location (suggesting that this error message was wrong / misleading).

I rewrote most parts of the original IdentifyTertiary module, which contained several problems that could have been related to the reported problem in generating Cytoplasm_location.

Specifically:

  • The module did not adhere to CP's own convention of saving [0 0] as the location if a site does not have any object.
  • In the original module, primary + tertiary != secondary. It forces a shrinking of the primary object, presumably to generate at least a few pixels of tertiary. However, that logic is broken and can remove objects; it also does not ensure the presence of a tertiary object (and where it is not broken, it leads to misleading / wrong feature extraction for the tertiary object, since the tertiary object will always include outer parts of the primary). Also see Possibly wrong allocation of nucleus and cytoplasm CellProfilerPelkmans#15
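Both points can be stated as small invariants. The sketch below is an illustration under stated assumptions, not the rewritten IdentifyTertiary code: label images are plain 2D lists, primary objects (nuclei) lie inside secondary objects (cells) with the same ID, and the tertiary object (cytoplasm) is simply the secondary object minus the unshrunken primary, so that primary + tertiary == secondary pixel by pixel. The [x, y] = [column, row] ordering of the location is an assumption; the [0 0] fallback for empty sites follows the convention described above. Function names are hypothetical.

```python
def tertiary_labels(secondary, primary):
    """Cytoplasm = cell pixels not covered by the (unshrunken) nucleus.

    Object IDs are taken from the secondary image, so cytoplasm k belongs
    to cell k and nucleus k.
    """
    return [
        [s if p == 0 else 0 for s, p in zip(srow, prow)]
        for srow, prow in zip(secondary, primary)
    ]


def object_centers(labels):
    """Mean [x, y] = [column, row] per object ID, sorted by ID.

    Returns [[0, 0]] for a site without any object.
    """
    acc = {}
    for y, row in enumerate(labels):
        for x, obj_id in enumerate(row):
            if obj_id:
                n, sx, sy = acc.get(obj_id, (0, 0, 0))
                acc[obj_id] = (n + 1, sx + x, sy + y)
    if not acc:
        return [[0, 0]]
    return [[sx / n, sy / n] for _, (n, sx, sy) in sorted(acc.items())]
```

Like the original module, this sketch does not guarantee a non-empty cytoplasm: a nucleus that covers its whole cell leaves that cell without tertiary pixels.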
