
Update Qhull #248

Open
wants to merge 1 commit into main
Conversation

ximion
Contributor

@ximion ximion commented Sep 1, 2023

Hi!
This was created in an attempt to fix #235, which seems to be some kind of stack corruption issue. Unfortunately I could not track down the root cause, but updating Qhull and guarding the message functions appears to make the problem go away for normal operation (even without guarding the log functions, crashes become much less likely). Unfortunately AddressSanitizer is still unhappy with the result, so this will not resolve the root cause.

It would probably need an extensive session in Valgrind (a major pain given that TensorFlow is in the mix and we allocate huge chunks of memory) to find out what is going on.
In the meantime, I'll take the reduced crash rate.

Updating Qhull is probably a good idea anyway, since the shipped version was quite old.
Thanks for creating Stardist and considering this PR!

@maweigert
Member

Wow, thanks a lot for digging that deep! The whole Qhull part is indeed a bit opaque, and I remember that it was rather tedious to ensure it's using the thread-safe Qhull variants throughout (as needed for OpenMP).

@ximion
Contributor Author

ximion commented Sep 1, 2023

That makes sense! Unfortunately we're probably not done with this: I just had StarDist process my large image (instead of just a smaller chunk of it), and it crashed again, after working flawlessly for 30 test runs before 😒
If I knew the code better, it would probably be easier to know where to look for the real cause of this issue.

(Still, updating Qhull would be useful and I think this PR is still fine - the generated results look sane to me)

@maweigert
Member

Ok, too bad!

You mentioned that TensorFlow is a pain during debugging: one could avoid calling TensorFlow during the NMS/qhull section by splitting the prediction and the NMS call into two separate steps:

  1. Get predictions and save them somewhere:
     prob, dist, points = model.predict_sparse(img, ...)
  2. Do the NMS/qhull part:
     model._instances_from_prediction(img.shape, prob, dist, points, ...)

cf.

def _instances_from_prediction(self, img_shape, prob, dist, points=None, prob_class=None, prob_thresh=None, nms_thresh=None, overlap_label=None, return_labels=True, scale=None, **nms_kwargs):
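The two-step split above can be sketched as follows. Note this is a minimal, self-contained illustration: the `FakeModel` class is a hypothetical stand-in (so the snippet runs without StarDist or TensorFlow installed); only the method names `predict_sparse` and `_instances_from_prediction` mirror the real StarDist API mentioned in the thread.

```python
import numpy as np

class FakeModel:
    """Hypothetical stand-in for a StarDist model: returns random
    'predictions' instead of running a neural network."""

    def predict_sparse(self, img):
        n = 5
        prob = np.random.rand(n)                  # per-candidate probabilities
        dist = np.random.rand(n, 32)              # star-convex distances
        points = np.random.randint(0, min(img.shape), size=(n, img.ndim))
        return prob, dist, points

    def _instances_from_prediction(self, img_shape, prob, dist, points):
        # Placeholder for the NMS/qhull step; the real method runs
        # non-maximum suppression and builds a label image.
        labels = np.zeros(img_shape, dtype=np.int32)
        return labels, dict(prob=prob, dist=dist, points=points)

img = np.zeros((64, 64))
model = FakeModel()

# Step 1: network prediction (the TensorFlow part) -- save results to disk.
prob, dist, points = model.predict_sparse(img)
np.savez("predictions.npz", prob=prob, dist=dist, points=points)

# Step 2: NMS/qhull part, which could now run in a separate process
# (e.g. under Valgrind) without TensorFlow in the picture.
data = np.load("predictions.npz")
labels, res = model._instances_from_prediction(
    img.shape, data["prob"], data["dist"], data["points"])
print(labels.shape)
```

This isolates the Qhull/NMS code path for debugging, since the saved `.npz` file can be reloaded as often as needed without re-running the network.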

@ximion
Contributor Author

ximion commented Sep 2, 2023

Since #249 actually solves the linked bug, I amended this PR to include only the Qhull update and nothing else (so, no mutex patch).

@ximion ximion changed the title Update Qhull and guard its logging functions Update Qhull Sep 12, 2023
Successfully merging this pull request may close these issues.

Crashes when running predict_instances on a 3D image
3 participants