Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"zend_mm_heap corrupted #<$id>" | Added Error ID to Error Message #2445

Closed
wants to merge 2 commits into from

Conversation

schwindy
Copy link

I was recently debugging "zend_mm_heap corrupted" errors in pthreads. For this, I was trying to figure out which line of C was causing "zend_mm_heap corrupted" to be output.

I then noticed that there are 15 different lines in ./Zend/zend_alloc.c that output the exact error message "zend_mm_heap corrupted".

I feel it would be helpful for an ID of which "zend_mm_heap corrupted" to be output with the messsage, so that the root of whatever problem a developer is facing in terms of memory corruption can be tracked down.

I added #1, #2, #3....#15 to each "zend_mm_heap corrupted" check to accomplish this.

@schwindy schwindy changed the title master "zend_mm_heap corrupted #<$id>" | Added Error ID to Error Message Mar 31, 2017
@krakjoe
Copy link
Member

krakjoe commented Apr 3, 2017

It would be better to include the reason for the failed assertion, for example where chunk->heap == heap is checked, the message might read "zend_mm_heap_corrupted: chunk allocated in a different context` ...

Numbering the errors is not very useful without reading (and memorizing) zend_alloc.c.

@krakjoe krakjoe added the Feature label Apr 3, 2017
@nikic
Copy link
Member

nikic commented Apr 3, 2017

I don't have a problem with this change, but I don't think it's particularly useful for debugging either. MM panics are typically far removed from the root cause of the problem and knowing which one specifically was triggered will not get you any closer to it. The proper way of debugging is such issues is through liberal application of valgrind...

@tpunt
Copy link
Contributor

tpunt commented Apr 3, 2017

I agree with @nikic. If you would like to find out where the error specifically occurred, then a breakpoint can simply be set on the zend_mm_panic function.

@staabm
Copy link
Contributor

staabm commented Apr 3, 2017

this is true when you are a developer which has the required fu to start debugging and knows php underlying technology.

could those more detailed error messages help to produce better bug reports so core people can dig into the cause more easily?

@weltling
Copy link
Contributor

weltling commented Apr 3, 2017

Of course there can be bugs in ZMM, too, but usually the issue lays somewhere completely else. What matters is the backtrace and the reproducer. In most case, the msg text change at the touched places won't make a bug report any better.

Thanks.

@schwindy
Copy link
Author

schwindy commented Apr 4, 2017

Granted, this is the first time I've had to dig into the PHP internals. I am a novice C dev.

But, I had never seen an error like zend_mm_heap corrupted before, so I Googled it. The results were not overly helpful in solving my problem. It took a while to find https://wiki.php.net/internals/zend_mm, which I personally found to be more informative than http://php.net/manual/en/internals2.memory.php (though I do not know what has been refactored since it was written).

All I learned was that something was corrupting the memory (zend_mm_heap), exactly what the error output says. I tried all kinds of GC related things in PHP, nothing fixed the error. My only remaining option was to clone php-src and find the line where the error was being output with CTRL+F. Confused when I found 15, I then recompiled PHP after adding the error IDs.

This allowed me to:

  1. See which line of C the error was happening on, aka exactly which validation failed.
  2. See that it was not always the same validation that failed, given the same code, data and environment.

For example, I now realize commenting out those memory validations will not fix my particular problem. Given I was working with multiple extensions, I had no way of knowing if the check was failing but the memory was fine (again, novice, so that may not make sense but arbitrarily debugging this problem, that seemed logical to me).

I agree with @krakjoe in the sense that the validation itself should be output, but I think the ID numbers themselves are also useful so one could see that their code is causing different validations to fail, at different points in C execution each time. As there are multiple places where the same validation is run.

Thank you for noting zend_mm_panic @tpunt and valgrind @nikic, I had never heard of these. Would it be possible to add these notes to http://php.net/manual/en/internals2.memory.php?

I have added a new commit to this pull request that represents what I believe to be the optimal error output for someone with no prior knowledge of PHP internals.

Zend MM: Failed Validation #1 (page_offset != 0) | http://php.net/manual/en/internals2.memory.php

Even if this output only helps core devs in 1/1000 cases, and PHP developers in 1/1000000 cases, I feel the small changes required are worthwhile. I would be happy to add these error #s to an HTML file, though others could describe the logic behind each of these checks better. One could compare this situation to:

Possible integer overflow in memory allocation
vs
Possible integer overflow in memory allocation (%zu + %zu)

@krakjoe
Copy link
Member

krakjoe commented Apr 4, 2017

I see more detailed messages as useful for the person creating the reproducing code, who doesn't necessarily know how to use the tools we use to debug php itself, all they can do is fiddle with their code and try to reduce it to the simplest possible form. They may be looking at more than one kind of bug and not know it.

The backtrace is not always very useful on these occasions either. You may see is a trace to some function that frees a chunk or whatever, but what you want to see is the trace that decreases the refcount or otherwise manipulates a chunk of memory incorrectly, which may be somewhere totally other than the place that tries to free memory.

@weltling
Copy link
Contributor

weltling commented Apr 4, 2017

In practice, things like backtraces and mem dumps are helpful in 50/50 cases or even more often. AFM, numbering look weird, and then you'll have to renumber when the code changes which is a monkey job :) Then, linking to a page with internal details, well - that's unlikely people without tools can make any use of that. But otherwise, they witness a more hard weight bug in somewhere else, where someone will anyway have to get hands on debugger, create a debug build, turn off ZMM and and take other measures to get on the issue. So it is useless for people who don't know and don't want to know about PHP internals, which is the most case. Imho we should encourage people, especially newcomers, to use the right way to debug. To debug PHP itself through some output is already the day before yesterday, for C it was probably never the case. To debug ZMM - some more verbose thing could be useful, it just look like the reporter had a wrong expectation on what is actually being fixed till their last post.

@schwindy welcome to C and PHP internals, much success to you!

Thanks.

@schwindy
Copy link
Author

schwindy commented Apr 4, 2017

The main reason I added the link to http://php.net/manual/en/internals2.memory.php was to hopefully spark someone adding the way PHP core devs debug problems like this, to that page :D

In my case, this would have helped greatly, and I also probably could have saved @tpunt some time digging through my code finding the source of the problem.

@krakjoe
Copy link
Member

krakjoe commented Jul 24, 2017

While I personally didn't hate this, there isn't the kind of consensus we need from other core devs to merge it ... so I'm going to close the PR ...

@krakjoe krakjoe closed this Jul 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants