Skip to content
This repository has been archived by the owner on Mar 17, 2022. It is now read-only.

testPixaReplacePix crashes in native code #159

Closed
megabytefisher opened this issue Jul 1, 2016 · 17 comments
Closed

testPixaReplacePix crashes in native code #159

megabytefisher opened this issue Jul 1, 2016 · 17 comments
Labels

Comments

@megabytefisher
Copy link
Contributor

I'm creating this issue to discuss why testPixaReplacePix crashes in the JNI code, as mentioned here: #157 (comment)

The code referenced is here:

I am working on finding out exactly how to fix this, but the preliminary testing show that commenting out the following two lines fixes the crash:
returnedPix.recycle(); returnedBox.recycle();

I believe these Pix and Box get recycled and destroyed before they should be, and then when the Pixa tries to recycle them internally, it crashes. I am not 100% sure on this, and plan to test more.

@rmtheis rmtheis added the bug label Jul 4, 2016
@alexcohn
Copy link
Contributor

Still crashes (if uncommented).

It doesn't crash if one of pix.recylce() or returnedPix.recycle() is commented out: both share the same underlying nativePix object.

What's worse, the test completes successfully when I run it alone.

@alexcohn
Copy link
Contributor

alexcohn commented Jul 16, 2019

On the face of it, the cause is that pixaReplacePix() does not increase pix->refcount. I am not sure if this is a bug, though. @DanBloomberg, was it intentional?

alexcohn added a commit to alexcohn/leptonica that referenced this issue Jul 16, 2019
@DanBloomberg
Copy link

Not a bug. The previous pix in the pixa is destroyed, and the pixa takes ownership of the new pix, so its refcount is unchanged:

  • Notes:
  •  (1) In-place replacement of one pix.
    
  •  (2) The previous pix at that location is destroyed.
    

I'll add to the notes this:
(3) The pixa takes ownership of the new pix (it is an "insert" operation)

@alexcohn
Copy link
Contributor

alexcohn commented Jul 16, 2019

So, it's kind of move operation? To avoid the mistakes like in the Java test, the replacing pix should somehow be marked invalid.

But I must say that I don't see why increasing refcount is less clean.

@DanBloomberg
Copy link

Leptonica is consistent in having a container own the objects in it. So when we put the pix in the pixa, the pixa owns it.

Now, we could put a copy (or clone) of the pix in, but then you'd have the old one lying around and have to remember to destroy it. Just asking for a memory leak. Not good. So by default, the pix is INSERTED into the pixa. If you look at pixaAddPix(), you'll see that you have several choices. I decided not to include those choices in this function, because it makes the interface (and its use) more complicated, and for no reason at all, because if you put the pix in a pixa, you almost certainly don't want to mess with it using the original handle.

To be more safe, I could have nulled the original handle by the function (by passing in its address and dereferencing inside), to prevent an accidental double free, but again, it seemed like excess complication.

There are no right answers -- but I've tried to be consistent in use of ownership while keeping the interfaces as simple as possible.

@alexcohn
Copy link
Contributor

So, the fix for the Java test should be to set pix = null and remove pix.recylce()?

@alexcohn
Copy link
Contributor

@DanBloomberg On the second thought, I would try to convince you that by default, pixaReplacePix() should be a CLONE operation. The fact that this test has been crashing for 3 years speaks for that. Nobody actually read the notes, even after their porgram crashes.

But most importantly, consider C++ RAII when a Pix is allocated to replace a pix in Pixa. When the program goes out of the block where the original Pix was created, it will naturally destroy it, causing the same problem as we see in this small Java test.

I don't see a scenario when replace would be called with a Pix that cannot be destroyed, or where calling such destroy would be counter-intuitive.

@DanBloomberg
Copy link

Leptonica has patterns that are implemented consistently, and one of them is the ownership of the object by the container. Also, unnecessary copies of images should be avoided, as well as clones because they require pixDestroy() on both.

To change the behavior of this function would unfortunately cause every existing invocation that is not crashing to have a memory leak!

However, your point is interesting, and not something I've run up against because people usually do not use std::unique_ptr when allocating pix. If you do use such a pointer, with the current implementation you would need to call release() on your ptr to get a new pix ptr that is not "smart", and use that in the function call.

So it's a judgment call. The best I can do now is to annotate the function with a caution not to use the content of a smart pointer, which by definition maintains ownership.

Dan

@alexcohn
Copy link
Contributor

Note that the Java test that triggered this discussion does not use unique_ptr. The test goes like this:

Pixa pixa = new Pixa(…)
Pix pix = new Pix(…);
Box box = new Box(…);
pixa.replacePix(0, pix, box);
pix.recycle(); // this line is wrong
box.recycle(); // this line is wrong, too
pixa.recycle();

For some strange reason, box.recycle() doesn't cause a crash in test. But it is equally wrong as pix.recycle(). Essentially, after being used in replacePix(), both pix and box are but zombies.

I will fix the Java test, and add your explanation to JavaDoc.

alexcohn added a commit to alexcohn/tess-two that referenced this issue Jul 18, 2019
fix JavaDoc
enable and fix testPixaReplacePix

see rmtheis#159 (comment)
alexcohn added a commit to alexcohn/tess-two that referenced this issue Jul 18, 2019
fix JavaDoc
enable and fix testPixaReplacePix

see rmtheis#159 (comment)
@Robyer
Copy link
Contributor

Robyer commented Jul 19, 2019

@alexcohn For some reason it works correctly without crash in my fork. I tried to use same Leptonica version in tess-two, but it still crashes there. Maybe the difference is caused by different compiler or some other build configuration?

@alexcohn
Copy link
Contributor

@Robyer this is not a surprise, this crash caused by use-after-free bug did not happen for me in debug build, or when I ran this test single. It does not depend on version of Leptonica, because it is related to 'as designed' behavior of the library, see #159 (comment).

@Robyer
Copy link
Contributor

Robyer commented Jul 19, 2019

@alexcohn You are right, it doesn't crash when running only the single test, but it crashes when I run all tests. Difference with my fork and current tess-two is that in my case it crashes in some other random test, but obviously this is the cause. So I'll use your fix too, thanks.

Robyer added a commit to adaptech-cz/Tesseract4Android that referenced this issue Jul 19, 2019
Crash didn't happen when running the single test, but happened when running all the tests.

See rmtheis/tess-two#159
@alexcohn
Copy link
Contributor

Unfortunately, now the test produces a warning Box was not terminated using recycle(). It happens because the Java Box.mRecycled flag gets out of sync with the native refcount.

Same would have been reported for the pix, only there we don't have a finalize() override to let us worry.

I must say that I am in doubt about the cleanest way to deal with this issue. The easiest would be to remove Box.finalize(). This introduces another way to leak (native) memory, but it's not too bad, if you consider that there are no similar checks for Pix.

The damage that such finalize() can cause, is worse. When Java decides that an object must be finally destroyed, it will operate on the underlying native object, which could be already deallocated (that's what would have happened in this testPixaReplacePix case before my fix).

The big problem is that the JVM will trigger such finalize() at its own discretion, most likely when it goes low on memory. This means that an unexpected crash can easily pass the usual tests, and happen in production.

I hope that @DanBloomberg will help us choose the best route.

@DanBloomberg
Copy link

DanBloomberg commented Jul 22, 2019 via email

@alexcohn
Copy link
Contributor

alexcohn commented Jul 23, 2019

@DanBloomberg my question now is not how to deal with this specific pixaReplacePix() behavior. The finalizers could backfire in other scenarios where the (leptonica) refcount changes while the Java object is unaware.

If Pixa.replacePix() is the single relevant case, it can be deprecated, and done with it. But if there are other situations, as hinted by removal of Pix.finalize(), they require more serious approach.

@DanBloomberg
Copy link

These details are well above my pay grade :-)
I believe you now understand the issues better than I do.

@alexcohn
Copy link
Contributor

@rmtheis What do you think about this? Do you remember why @renard314 removed Px.finalize() ?

Robyer added a commit to adaptech-cz/Tesseract4Android that referenced this issue Sep 17, 2019
This makes it consistent with Pix implementation, where finalizer was also removed in rmtheis/tess-two@44627d8

This removes warning "Box was not terminated using recycle()" which was wrongly raised in testPixaReplacePix() test (see rmtheis/tess-two#159 (comment)), but now requires that user don't forget to release his Box instances (same as with Pix).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants