Skip to content

opcache.consistency_checks > 0 causes segfaults in PHP >= 8.1.5 in fpm context #8065

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
CyberLine opened this issue Feb 9, 2022 · 29 comments

Comments

@CyberLine
Copy link

CyberLine commented Feb 9, 2022

Description

After upgrading some projects to PHP 8.1.0 and 8.1.2 we could see requests getting slower and slower over time. Restart of the correspondig fpm-pool fixed that for short. After digging in the problem we could isolate and reproduce it:

  • use a Debian 10 or 11 maschine and install php8.1-fpm and php8.1-opcache.
  • modify opcache.ini:
zend_extension=opcache.so

[opcache]
opcache.consistency_checks=1
  • require laminas: composer require laminas/laminas-servicemanager
  • create a test script (we use laminas-servicemanager here):
<?php

declare(strict_types=1);

chdir(dirname(__DIR__));
require 'vendor/autoload.php';

use Laminas\ServiceManager\ServiceManager;
$container = new ServiceManager();
  • call that script in a loop 1000 or more times while running htop in another terminal:
    for i in {1..10000}; do curl localhost/test.php; done;

You would see an increment usage of memory that will not getting freed after all. Set opcache.consistency_checks to 0 to check that it will not keep any used memory.

PHP Version

8.1.0

Operating System

Debian 10/11

@cmb69
Copy link
Member

cmb69 commented Feb 9, 2022

Do you get any "Checksum failed for" log entries, if you set opcache.error_log=… and opcache.log_verbosity_level=4?

@CyberLine
Copy link
Author

Wed Feb  9 15:15:47 2022 (222625): Message Checksum failed for '/home/alexander/test/vendor/laminas/laminas-config-aggregator/src/PhpFileProvider.php':  expected=0x68e34d40, found=0x35584fc2
Wed Feb  9 15:15:47 2022 (222625): Message Cached script '/home/alexander/test/vendor/laminas/laminas-config-aggregator/src/PhpFileProvider.php'
Wed Feb  9 15:15:47 2022 (222625): Message Added key '/home/alexander/test/vendor/composer/../laminas/laminas-config-aggregator/src/PhpFileProvider.php'
Wed Feb  9 15:15:47 2022 (222625): Message Checksum failed for '/home/alexander/test/vendor/laminas/laminas-servicemanager/src/ServiceManager.php':  expected=0x76daed23, found=0x478bef0e
Wed Feb  9 15:15:47 2022 (222625): Message Cached script '/home/alexander/test/vendor/laminas/laminas-servicemanager/src/ServiceManager.php'
Wed Feb  9 15:15:47 2022 (222625): Message Added key '/home/alexander/test/vendor/composer/../laminas/laminas-servicemanager/src/ServiceManager.php'
Wed Feb  9 15:15:47 2022 (222625): Message Checksum failed for '/home/alexander/test/vendor/laminas/laminas-servicemanager/src/ServiceLocatorInterface.php':  expected=0xb598eac3, found=0x7ae8ecda
Wed Feb  9 15:15:47 2022 (222625): Message Cached script '/home/alexander/test/vendor/laminas/laminas-servicemanager/src/ServiceLocatorInterface.php'
Wed Feb  9 15:15:47 2022 (222625): Message Added key '/home/alexander/test/vendor/composer/../laminas/laminas-servicemanager/src/ServiceLocatorInterface.php'
Wed Feb  9 15:15:47 2022 (222625): Message Checksum failed for '/home/alexander/test/vendor/container-interop/container-interop/src/Interop/Container/ContainerInterface.php':  expected=0x20efdca4, found=0x5182df08
Wed Feb  9 15:15:47 2022 (222625): Message Cached script '/home/alexander/test/vendor/container-interop/container-interop/src/Interop/Container/ContainerInterface.php'
Wed Feb  9 15:15:47 2022 (222625): Message Added key '/home/alexander/test/vendor/composer/../container-interop/container-interop/src/Interop/Container/ContainerInterface.php'

@cmb69
Copy link
Member

cmb69 commented Feb 9, 2022

Thanks!

If the checksum comparison fails, the file is compiled again, and the old oparray is not freed – that is by design, to avoid copying shared memory. As such, this doesn't look like a bug, but I wonder why the checksum comparison fails in the first place. Are these files modified during script execution?

@CyberLine
Copy link
Author

CyberLine commented Feb 9, 2022

Are these files modified during script execution?

No, they aren't. You can run the same code on PHP 7.4 or 8.0 without "Checksum failed" error 🤷

Same output with php7.4 or 8.0:

Wed Feb  9 15:32:07 2022 (222628): Message Cached script '/home/alexander/test/vendor/laminas/laminas-servicemanager/src/ServiceManager.php'
Wed Feb  9 15:32:07 2022 (222628): Message Added key '/home/alexander/workspace/test/vendor/composer/../laminas/laminas-servicemanager/src/ServiceManager.php'
Wed Feb  9 15:32:07 2022 (222628): Message Cached script '/home/alexander/test/vendor/laminas/laminas-servicemanager/src/ServiceLocatorInterface.php'
Wed Feb  9 15:32:07 2022 (222628): Message Added key '/home/alexander/test/vendor/composer/../laminas/laminas-servicemanager/src/ServiceLocatorInterface.php'
Wed Feb  9 15:32:07 2022 (222628): Message Cached script '/home/alexander/test/vendor/psr/container/src/ContainerInterface.php'
Wed Feb  9 15:32:07 2022 (222628): Message Added key '/home/alexander/test/vendor/composer/../psr/container/src/ContainerInterface.php'
Wed Feb  9 15:32:07 2022 (222628): Message Cached script '/home/alexander/test/vendor/container-interop/container-interop/src/Interop/Container/ContainerInterface.php'
Wed Feb  9 15:32:07 2022 (222628): Message Added key '/home/alexander/test/vendor/composer/../container-interop/container-interop/src/Interop/Container/ContainerInterface.php'

@CyberLine
Copy link
Author

So with PHP 8.1.5 this causes random SIGSEGV instead of wasting just memory

@cmb69
Copy link
Member

cmb69 commented May 2, 2022

Ugh.

@cmb69 cmb69 changed the title opcache.consistency_checks > 0 is leaking memory in PHP >= 8.1.0 in fpm context opcache.consistency_checks > 0 causes segfaults in PHP >= 8.1.5 in fpm context May 2, 2022
@iluuu1994
Copy link
Member

I went down this rabbit hole for a few hours and I have an incomplete explanation. This seems to be related to multiple changes in PHP 8.1. The following fields differ for each new creation of zend_persistent_script for 4 files per request.

  • zend_class_entry.inheritance_cache
  • zend_class_entry.info.user.filename
  • zend_class_entry.info.user.doc_comment

inheritance_cache is a new addition in PHP 8.1, why filename and doc_comment are different, I don't know. I'd assume they should be interned.

Also, I noticed that there's likely a mistake in the hash calculation.

zend_adler32(checksum, mem, persistent_script_check_block_size);

I think this line should be:

checksum = zend_adler32(checksum, mem, persistent_script_check_block_size);

As zend_adler32 doesn't seem to be modifying the buffer and thus has no effect.

I'll take a closer look soon but wanted to document my findings, even if small.

@CyberLine
Copy link
Author

Nice one. Dunno if it could help, but even with checksums disabled, our prod fpm is segfaulting randomly on deploy when we call opcache_reset() on the pool. Eventually i can catch a core dump.

@cmb69
Copy link
Member

cmb69 commented May 9, 2022

Also, I noticed that there's likely a mistake in the hash calculation.

Oh, good catch! The code as is makes no sense.

@boesing
Copy link

boesing commented Jul 6, 2022

Is there anything where I can be of help here? I do work together with @CyberLine and we also have some reports from other parts of our company where random segfaults do appear after production deployments.

From my PoV, the suggestion from @iluuu1994 might be a first step?

Any hint on how to create a failing test case for this would be awesome, I'd be happy to contribute as much as I can if that brings us closer to a fix here.

@iluuu1994 iluuu1994 self-assigned this Jul 6, 2022
@iluuu1994
Copy link
Member

I completely forgot about this one.

From my PoV, the suggestion from @iluuu1994 might be a first step?

I don't think that will solve the issue. If anything, the mistake in the hash calculation would lead to stale cache being used but we're having the opposite problem here. (it should still be fixed of course, or at least removed). I'll have another look at this soon.

@ItsReddi
Copy link

ItsReddi commented Aug 18, 2022

@iluuu1994 anything we can help with?
we discovered this bug in production yesterday.

grafik

The SIGSEV's are coming after 128GB of ram, went full.

@iluuu1994
Copy link
Member

I'm looking more into it right now.

@iluuu1994
Copy link
Member

iluuu1994 commented Aug 20, 2022

@zsuraski Thank you for your assessment!

I want to point out that generally we compute the checksum by looking into the persistent_script structure, with everything we know about its semantics - factoring in our knowledge on which parts are immutable - while ignoring the parts that aren’t.

Note that this is different from the things that were already mutable because those things were all at the end of zend_persistent_script and thus easily skippable during hash calculation. The things that break the calculation right now are in the middle of the mem block, as they are changes in classes or op_arrays.

If you take a look at #9377, the inheritance_cache case is relatively easy to fix because the number of classes in a script are usually limited, so we can just loop over them and temporarily reset them. However, for JIT it's not as trivial because the JIT replaces various opcode handlers and managing this would just be increasingly complex.

Another option is to, instead of just iterating over mem, traverse the zend_persistent_script manually but this is also complex and will need to be updated every time something changes in the data structures which seems like something that can be forgotten easily (apart from also being much slower).

Of course, if we can find a simple way to fix this it's preferable to keep the consistency checks. Maybe there's a simple solution that I'm missing.

his is a production-oriented feature

In this case the documentation is wrong, which states this should not be enabled in production.

https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.consistency-checks

iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Aug 20, 2022
iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Aug 20, 2022
iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Aug 20, 2022
iluuu1994 added a commit to iluuu1994/php-src that referenced this issue Aug 20, 2022
@zsuraski
Copy link
Contributor

Thanks @iluuu1994!

I want to point out that generally we compute the checksum by looking into the persistent_script structure, with everything we know about its semantics - factoring in our knowledge on which parts are immutable - while ignoring the parts that aren’t.

Note that this is different from the things that were already mutable because those things were all at the end of zend_persistent_script and thus easily skippable during hash calculation. The things that break the calculation right now are in the middle of the mem block, as they are changes in classes or op_arrays.

To be perfectly honest - I remembered an entirely different implementation of the checksum code, that accounts for things that can change within op_arrays (most notably refcounts on zvals). But either my memory is wrong, or this somehow evolved into not being necessary - as early as when we open source opcache.
Either way - what I remembered - diving into the structures inside ->mem in order to calculate the checksums, clearly doesn't happen in the current codebase.

If you take a look at #9377, the inheritance_cache case is relatively easy to fix because the number of classes in a script are usually limited, so we can just loop over them and temporarily reset them. However, for JIT it's not as trivial because the JIT replaces various opcode handlers and managing this would just be increasingly complex.

Another option is to, instead of just iterating over mem, traverse the zend_persistent_script manually but this is also complex and will need to be updated every time something changes in the data structures which seems like something that can be forgotten easily (apart from also being much slower).

Of course, if we can find a simple way to fix this it's preferable to keep the consistency checks. Maybe there's a simple solution that I'm missing.

his is a production-oriented feature

In this case the documentation is wrong, which states this should not be enabled in production.

It may be that since the (apparent) change to the way checksums are calculated - mprotect is as good as consistency_checks. But I think we need to get to the bottom of this before moving forward:

  1. consistency_checks definitely used to be a production-quality feature, of course that came with a penalty of a certain performance hit - but running a checksum every now and again is often not a huge price to pay for the safety net it used to provide. Can it go back into being that? If it can be, then the docs should be updated. But we're certainly not there right now, and we need to figure out whether we can go back to it.
  2. protect_memory is positioned as a debug-only feature, without explanation as to why. IIRC, there are elements inside ->mem that actually can actually get updated in the course of a perfectly normal run, that in the past we used to filter out from the checksum calculation. If I'm right and it isn't just my inventive memory, perhaps that's why protect_memory isn't entirely reliable - but it doesn't explain why until recently, consistency_checks worked fine while doing a whole ->mem checksum. If this is a thing of the past, then perhaps protect_memory is just as good as consistency_checks reliability-wise, but likely better in terms of performance.

I'll wait for Dmitry to weigh in - maybe he remembers more about the evolution of consistency_checks, and he's certainly a lot more into the current codebase than I am...

@iluuu1994
Copy link
Member

consistency_checks definitely used to be a production-quality feature, of course that came with a penalty of a certain performance hit - but running a checksum every now and again is often not a huge price to pay for the safety net it used to provide. Can it go back into being that?

Provided we can fix the checksum calculation, I think we should change how errors are handled. Currently, only an info is logged (which are not enabled by default), so these errors are easily missed. From an info it's also unclear to the user whether that is even an issue. Scripts are also recompiled with every checksum mismatch which will fill shared memory very quickly if it happens after every request, and thus lead to a complete restart of opcache very often. I think that does more harm than good. It might make more sense to just log the error, keep using the old script and ask the user file a bug. If the safety net leads to filling of memory and crashing servers people will likely not use it 🙂

but it doesn't explain why until recently, consistency_checks worked fine while doing a whole ->mem checksum.

I checked and it looks like the original implementation is very similar to what we have now:

php-src/ZendAccelerator.c

Lines 818 to 839 in 528006a

static unsigned int zend_accel_script_checksum(zend_persistent_script *persistent_script)
{
signed char *mem = (signed char*)persistent_script->mem;
size_t size = persistent_script->size;
size_t persistent_script_check_block_size = ((char *)&(persistent_script->dynamic_members)) - (char *)persistent_script;
unsigned int checksum = ADLER32_INIT;
if (mem < (signed char*)persistent_script) {
checksum = zend_adler32(checksum, mem, (signed char*)persistent_script - mem);
size -= (signed char*)persistent_script - mem;
mem += (signed char*)persistent_script - mem;
}
zend_adler32(checksum, mem, persistent_script_check_block_size);
mem += sizeof(*persistent_script);
size -= sizeof(*persistent_script);
if (size > 0) {
checksum = zend_adler32(checksum, mem, size);
}
return checksum;
}

The zend_persistent_script.dynamic_members are explicitly filtered out, the rest is used for the checksum calculation. If I remember correctly classes were always copied from shared memory into process memory at the time specifically to avoid modifications to shared memory, so this is not something that should've occurred back then.

So, at least from how I understand it, the mem block (apart from the zend_persistent_script.dynamic_members) was indeed completely immutable. This only changed with the JIT (modifying opcode handlers) and inheritance cache (modifying zend_class_entry.inheritance_cache). The reason this was not noticed is likely because we don't have a single test that enables consistency_checks manually, nor do we test it in the opcache variation tests on nightly. This is definitely something we should do if we can sort things out.

I'll wait for Dmitry to weigh in

👍 Since this issue can be avoided by disabling consistency_checks for now there's no rush.

@zsuraski
Copy link
Contributor

consistency_checks definitely used to be a production-quality feature, of course that came with a penalty of a certain performance hit - but running a checksum every now and again is often not a huge price to pay for the safety net it used to provide. Can it go back into being that?

Provided we can fix the checksum calculation, I think we should change how errors are handled. Currently, only an info is logged (which are not enabled by default), so these errors are easily missed. From an info it's also unclear to the user whether that is even an issue. Scripts are also recompiled with every checksum mismatch which will fill shared memory very quickly if it happens after every request, and thus lead to a complete restart of opcache very often. I think that does more harm than good. It might make more sense to just log the error, keep using the old script and ask the user file a bug. If the safety net leads to filling of memory and crashing servers people will likely not use it 🙂

What we're currently seeing is a result of the checksum calculation being inherently broken. Behind that feature, there's a hidden assumption that generally speaking - neither that feature, nor some other glaring part in PHP - are fundamentally broken, and repeatedly result in a varying checksum on every request - which would in turn fill up the shared memory and do a lot more harm than good. If we put these scenarios aside (which apparently aren't that common) - this feature is aimed at providing a safety net against unexpected, not necessarily recurring bugs in PHP or underlying library that happen in certain edge cases.

E.g., IIRC in the past (20 years ago or so), there was a bug in the OCI8 library that overwrote parts of opcodes in some rare situations. The problem is that without that feature - it would effectively mean the server would start crashing on every access to the corrupted file (not just in case of the rare extreme bug, but any subsequent request following it) - without any realistic prospect for recovery other than a server restart. Over the years there have been issues that were detected by this feature, and I can't recall (not that it means that much 🙂) situations similar to the one we're currently experiencing, when corruption was so prevalent that this feature was effectively useless.

but it doesn't explain why until recently, consistency_checks worked fine while doing a whole ->mem checksum.

I checked and it looks like the original implementation is very similar to what we have now:

php-src/ZendAccelerator.c

Lines 818 to 839 in 528006a

static unsigned int zend_accel_script_checksum(zend_persistent_script *persistent_script)
{
signed char *mem = (signed char*)persistent_script->mem;
size_t size = persistent_script->size;
size_t persistent_script_check_block_size = ((char *)&(persistent_script->dynamic_members)) - (char *)persistent_script;
unsigned int checksum = ADLER32_INIT;
if (mem < (signed char*)persistent_script) {
checksum = zend_adler32(checksum, mem, (signed char*)persistent_script - mem);
size -= (signed char*)persistent_script - mem;
mem += (signed char*)persistent_script - mem;
}
zend_adler32(checksum, mem, persistent_script_check_block_size);
mem += sizeof(*persistent_script);
size -= sizeof(*persistent_script);
if (size > 0) {
checksum = zend_adler32(checksum, mem, size);
}
return checksum;
}

It's old, but it's certainly not the original implementation... The original was written back in 1999-2000. So a lot has passed between then and when we open sourced it in 2013. That said, it's quite possible that I'm just not remembering correctly (I remember a much more complex checksum calculation implementation). Unfortunately I no longer have access to the original CVS (!) code that hosted these archeological versions...

So, at least from how I understand it, the mem block (apart from the zend_persistent_script.dynamic_members) was indeed completely immutable. This only changed with the JIT (modifying opcode handlers) and inheritance cache (modifying zend_class_entry.inheritance_cache). The reason this was not noticed is likely because we don't have a single test that enables consistency_checks manually, nor do we test it in the opcache variation tests on nightly. This is definitely something we should do if we can sort things out.

I'll wait for Dmitry to weigh in

👍 Since this issue can be avoided by disabling consistency_checks for now there's no rush.

If there's no way around this then I guess disabling consistency_checks when jit is enabled could be a way to go... But, still need to understand whether we need both consistency_checks and protect_memory.

@dstogov
Copy link
Member

dstogov commented Aug 22, 2022

This is definitely a bug that was introduced with tracing (and adoptive) JIT and then with inheritance cache.

I think the easiest way to fix the problem (without binary compatibility breaks) is maintaining of a skip list (just a sorted plain array of offsets from zend_persistent_script.mem). This list should be used by zend_adler32() to skip some mutable pointers.

@iluuu1994 would you like to try implement this solution?

@iluuu1994
Copy link
Member

@dstogov Sure, I can give it a try. This should work well for inheritance_cache as it is always NULL initially. For tracing JIT, on the initial checksum calculation it will be unknown (I think?) which op handlers will be replaced at a later point in time, so would we need to skip all handlers for the hashsum calculation?

@dstogov
Copy link
Member

dstogov commented Aug 22, 2022

I think, we will need to skip only the handlers overridden by zend_jit_setup_hot_trace_counters() and potentially by zend_jit_trace_setup_ret_counter(). Function JIT may override only the handlers of the first instructions of basic blocks with (ZEND_BB_START|ZEND_BB_ENTRY|ZEND_BB_RECV_ENTRY) flags.

@dstogov
Copy link
Member

dstogov commented Aug 22, 2022

I would construct that skip list, by adding pointers directly from zend_jit_setup_hot_trace_counters() and similar functions for adoptive function JIT.

@iluuu1994
Copy link
Member

@dstogov Thanks for the suggestion! I'll give that a try soon.

@iluuu1994
Copy link
Member

@dstogov This seems to also be necessary for zend_jit_trace_setup_ret_counter. However, I have no idea under what circumstances this handler gets replaced. Can you guide me on how to identify these oplines? Here's the initial PoC:

master...iluuu1994:php-src:checksum-skip-list

It works well for the consistency_checks, (JIT_G(hot_loop) and JIT_G(hot_func), although I'm not sure if there's a way to avoid recomputation of the CFG for JIT_G(hot_loop) and just skipping some oplines more generously. Also, the patch is not binary compatible as we're modifying zend_persistent_script , accel_globals and zend_adler32, but I don't see a way to avoid that.

@dstogov
Copy link
Member

dstogov commented Aug 29, 2022

@iluuu1994 I made only a quick review...

zend_jit_trace_setup_ret_counter may set a handler on zend_op that follows DO_UCALL/DO_FCALL instruction. (run sapi/cli/php -d opcache.jit_debug=0x1ff002 ../Zend/bench.php 2>&1 | less than search for "start (return" for example)

It would be great to avoid CFG recalculation. For tracing JIT you may call checksum_skip_list_add directly from zend_jit_restart_hot_trace_counters() + add instructions after DO_UCALL/DO_FCALL. Function JIT should work out of the box. JIT on first exec, profile request and hot counter may need CFG.

Binary compatibility shouldn't be a problem, because zend_persistent_script , accel_globals and zend_adler32 are used only by opcache.

nielsdos added a commit to nielsdos/php-src that referenced this issue Feb 19, 2023
Fixes phpGH-8065 (see also for analysis)

The class inheritance cache must be excluded from the checksum
computation because it may change (legally) in between the computation
and recomputation.
Furthermore the handler pointer in the zend_op structure may be modified
by the JIT, and the op_array->reserved[] could be changed as well.

This approach is based on iluuu1994's original approach, with some
slight changes. First let's explain the main idea. The idea is to
keep a "skip list" of offsets to skip in the checksum computation.
This will include the offsets described above
(class inheritance cache, handler, reserved). These skips are of size
pointer_size (equals to the native pointer size).

It differs in the sense that this is a pragmatic approach. It was found
that in the original approach it was very difficult to figure out what
can be exactly changed when, and let to redundant computation of the CFG
for example. Furthermore, such a process is also error-prone and complex
to maintain. Instead of that, we just always skip the handler pointer
and the reserved area. The positive is that it leads to code which is
easier to understand and maintain. The negative is that we won't catch
corruptions in those memory areas. However, I believe this is acceptable
because of the following reasons:
* We'd have to be very unlucky to *only* have corruptions in *those*
  areas. Usually, corruptions are not that well-targetted.
* A checksum isn't a fully error-proof thing either, it's very possible
  that there is a corruption in other areas that the checksum doesn't
  capture.

Let's now talk about the implementation. Keeping all those entries in a
skip list wouldn't be efficient: we'd need an entry for every opcode,
which would lead to many many entries. However, this approach actually
implements a "compression" scheme. Instead of just storing the offset,
we actually store a triple:
<offset, size of the area to be checked, amount of repetitions>.
The offset indicates which pointer needs to be skipped. If the number of
repetitions is greater than zero, it will do the following
`repetitions` times:
* Checksum the bytes at
  [offset + pointer_size, offset + pointer_size + size of area to be checked]
* Move the offset to after the area that was just checksummed +
  pointer_size.
This allows us to only use one skip list entry for all the opcodes in an
op_array.

The offsets also aren't checked in the zend_adler32() function itself,
the zend_accel_script_checksum() function will checksum in "blocks"
of bytes that should not be skipped, by setting the size for
zend_alder32() in such a way that it computes the checksum until the
next offset to skip. The main advantage of this is better performance,
since there are fewer checks, and the accelerated SSE2 (or future other
accelerated) routines can keep being used.

Finally some other minor differences:
* We precompute the exact size needed for the skip list. This avoids
  reallocations and also fixes one issue where there was a warning about
  the computed size not being equals to the actual size.
nielsdos added a commit to nielsdos/php-src that referenced this issue Feb 19, 2023
…P >= 8.1.5 in fpm context

Fixes phpGH-8065 (see also for analysis)

The class inheritance cache must be excluded from the checksum
computation because it may change (legally) in between the computation
and recomputation.
Furthermore the handler pointer in the zend_op structure may be modified
by the JIT, and the op_array->reserved[] could be changed as well.

This approach is based on iluuu1994's original approach, with some
slight changes. First let's explain the main idea. The idea is to
keep a "skip list" of offsets to skip in the checksum computation.
This will include the offsets described above
(class inheritance cache, handler, reserved). These skips are of size
pointer_size (equals to the native pointer size).

It differs in the sense that this is a pragmatic approach. It was found
that in the original approach it was very difficult to figure out what
can be exactly changed when, and let to redundant computation of the CFG
for example. Furthermore, such a process is also error-prone and complex
to maintain. Instead of that, we just always skip the handler pointer
and the reserved area. The positive is that it leads to code which is
easier to understand and maintain. The negative is that we won't catch
corruptions in those memory areas. However, I believe this is acceptable
because of the following reasons:
* We'd have to be very unlucky to *only* have corruptions in *those*
  areas. Usually, corruptions are not that well-targetted.
* A checksum isn't a fully error-proof thing either, it's very possible
  that there is a corruption in other areas that the checksum doesn't
  capture.

Let's now talk about the implementation. Keeping all those entries in a
skip list wouldn't be efficient: we'd need an entry for every opcode,
which would lead to many many entries. However, this approach actually
implements a "compression" scheme. Instead of just storing the offset,
we actually store a triple:
<offset, size of the area to be checked, amount of repetitions>.
The offset indicates which pointer needs to be skipped. If the number of
repetitions is greater than zero, it will do the following
`repetitions` times:
* Checksum the bytes at
  [offset + pointer_size, offset + pointer_size + size of area to be checked]
* Move the offset to after the area that was just checksummed +
  pointer_size.
This allows us to only use one skip list entry for all the opcodes in an
op_array.

The offsets also aren't checked in the zend_adler32() function itself,
the zend_accel_script_checksum() function will checksum in "blocks"
of bytes that should not be skipped, by setting the size for
zend_alder32() in such a way that it computes the checksum until the
next offset to skip. The main advantage of this is better performance,
since there are fewer checks, and the accelerated SSE2 (or future other
accelerated) routines can keep being used.

Finally some other minor differences:
* We precompute the exact size needed for the skip list. This avoids
  reallocations and also fixes one issue where there was a warning about
  the computed size not being equals to the actual size.
nielsdos added a commit to nielsdos/php-src that referenced this issue Feb 20, 2023
…ts in PHP >= 8.1.5 in fpm context"

This reverts commit 9353452.
nielsdos added a commit to nielsdos/php-src that referenced this issue Feb 21, 2023
…P >= 8.1.5 in fpm context

Fixes phpGH-8065 (see also for analysis)

The class inheritance cache must be excluded from the checksum
computation because it may change (legally) in between the computation
and recomputation.
Furthermore the handler pointer in the zend_op structure may be modified
by the JIT, and the op_array->reserved[] could be changed as well.

This approach is based on iluuu1994's original approach, with some
slight changes. First let's explain the main idea. The idea is to
keep a "skip list" of offsets to skip in the checksum computation.
This will include the offsets described above
(class inheritance cache, handler, reserved). These skips are of size
pointer_size (equals to the native pointer size).

It differs in the sense that this is a pragmatic approach. It was found
that in the original approach it was very difficult to figure out what
can be exactly changed when, and let to redundant computation of the CFG
for example. Furthermore, such a process is also error-prone and complex
to maintain. Instead of that, we just always skip the handler pointer
and the reserved area. The positive is that it leads to code which is
easier to understand and maintain. The negative is that we won't catch
corruptions in those memory areas. However, I believe this is acceptable
because of the following reasons:
* We'd have to be very unlucky to *only* have corruptions in *those*
  areas. Usually, corruptions are not that well-targetted.
* A checksum isn't a fully error-proof thing either, it's very possible
  that there is a corruption in other areas that the checksum doesn't
  capture.

Let's now talk about the implementation. Keeping all those entries in a
skip list wouldn't be efficient: we'd need an entry for every opcode,
which would lead to many many entries. However, this approach actually
implements a "compression" scheme. Instead of just storing the offset,
we actually store a triple:
<offset, size of the area to be checked, amount of repetitions>.
The offset indicates which pointer needs to be skipped. If the number of
repetitions is greater than zero, it will do the following
`repetitions` times:
* Checksum the bytes at
  [offset + pointer_size, offset + pointer_size + size of area to be checked]
* Move the offset to after the area that was just checksummed +
  pointer_size.
This allows us to only use one skip list entry for all the opcodes in an
op_array.

The offsets also aren't checked in the zend_adler32() function itself,
the zend_accel_script_checksum() function will checksum in "blocks"
of bytes that should not be skipped, by setting the size for
zend_alder32() in such a way that it computes the checksum until the
next offset to skip. The main advantage of this is better performance,
since there are fewer checks, and the accelerated SSE2 (or future other
accelerated) routines can keep being used.

Finally some other minor differences:
* We precompute the exact size needed for the skip list. This avoids
  reallocations and also fixes one issue where there was a warning about
  the computed size not being equals to the actual size.
nielsdos added a commit to nielsdos/php-src that referenced this issue Feb 22, 2023
…P >= 8.1.5 in fpm context

Fixes phpGH-8065 (see also for analysis)

The class inheritance cache must be excluded from the checksum
computation because it may change (legally) in between the computation
and recomputation.
Furthermore the handler pointer in the zend_op structure may be modified
by the JIT, and the op_array->reserved[] could be changed as well.

This approach is based on iluuu1994's original approach, with some
slight changes. First let's explain the main idea. The idea is to
keep a "skip list" of offsets to skip in the checksum computation.
This will include the offsets described above
(class inheritance cache, handler, reserved). These skips are of size
pointer_size (equals to the native pointer size).

It differs in the sense that this is a pragmatic approach. It was found
that in the original approach it was very difficult to figure out what
can be exactly changed when, and let to redundant computation of the CFG
for example. Furthermore, such a process is also error-prone and complex
to maintain. Instead of that, we just always skip the handler pointer
and the reserved area. The positive is that it leads to code which is
easier to understand and maintain. The negative is that we won't catch
corruptions in those memory areas. However, I believe this is acceptable
because of the following reasons:
* We'd have to be very unlucky to *only* have corruptions in *those*
  areas. Usually, corruptions are not that well-targetted.
* A checksum isn't a fully error-proof thing either, it's very possible
  that there is a corruption in other areas that the checksum doesn't
  capture.

Let's now talk about the implementation. Keeping all those entries in a
skip list wouldn't be efficient: we'd need an entry for every opcode,
which would lead to many many entries. However, this approach actually
implements a "compression" scheme. Instead of just storing the offset,
we actually store a triple:
<offset, size of the area to be checked, amount of repetitions>.
The offset indicates which pointer needs to be skipped. If the number of
repetitions is greater than zero, it will do the following
`repetitions` times:
* Checksum the bytes at
  [offset + pointer_size, offset + pointer_size + size of area to be checked]
* Move the offset to after the area that was just checksummed +
  pointer_size.
This allows us to only use one skip list entry for all the opcodes in an
op_array.

The offsets also aren't checked in the zend_adler32() function itself,
the zend_accel_script_checksum() function will checksum in "blocks"
of bytes that should not be skipped, by setting the size for
zend_alder32() in such a way that it computes the checksum until the
next offset to skip. The main advantage of this is better performance,
since there are fewer checks, and the accelerated SSE2 (or future other
accelerated) routines can keep being used.

Finally some other minor differences:
* We precompute the exact size needed for the skip list. This avoids
  reallocations and also fixes one issue where there was a warning about
  the computed size not being equals to the actual size.
nielsdos added a commit to nielsdos/php-src that referenced this issue Mar 1, 2023
…P >= 8.1.5 in fpm context

Fixes phpGH-8065 (see also for analysis)

The class inheritance cache must be excluded from the checksum
computation because it may change (legally) in between the computation
and recomputation.
Furthermore the handler pointer in the zend_op structure may be modified
by the JIT, and the op_array->reserved[] could be changed as well.

This approach is based on iluuu1994's original approach, with some
slight changes. First let's explain the main idea. The idea is to
keep a "skip list" of offsets to skip in the checksum computation.
This will include the offsets described above
(class inheritance cache, handler, reserved). These skips are of size
pointer_size (equals to the native pointer size).

It differs in the sense that this is a pragmatic approach. It was found
that in the original approach it was very difficult to figure out what
can be exactly changed when, and let to redundant computation of the CFG
for example. Furthermore, such a process is also error-prone and complex
to maintain. Instead of that, we just always skip the handler pointer
and the reserved area. The positive is that it leads to code which is
easier to understand and maintain. The negative is that we won't catch
corruptions in those memory areas. However, I believe this is acceptable
because of the following reasons:
* We'd have to be very unlucky to *only* have corruptions in *those*
  areas. Usually, corruptions are not that well-targetted.
* A checksum isn't a fully error-proof thing either, it's very possible
  that there is a corruption in other areas that the checksum doesn't
  capture.

Let's now talk about the implementation. Keeping all those entries in a
skip list wouldn't be efficient: we'd need an entry for every opcode,
which would lead to many many entries. However, this approach actually
implements a "compression" scheme. Instead of just storing the offset,
we actually store a triple:
<offset, size of the area to be checked, amount of repetitions>.
The offset indicates which pointer needs to be skipped. If the number of
repetitions is greater than zero, it will do the following
`repetitions` times:
* Checksum the bytes at
  [offset + pointer_size, offset + pointer_size + size of area to be checked]
* Move the offset to after the area that was just checksummed +
  pointer_size.
This allows us to only use one skip list entry for all the opcodes in an
op_array.

The offsets also aren't checked in the zend_adler32() function itself,
the zend_accel_script_checksum() function will checksum in "blocks"
of bytes that should not be skipped, by setting the size for
zend_alder32() in such a way that it computes the checksum until the
next offset to skip. The main advantage of this is better performance,
since there are fewer checks, and the accelerated SSE2 (or future other
accelerated) routines can keep being used.

Finally some other minor differences:
* We precompute the exact size needed for the skip list. This avoids
  reallocations and also fixes one issue where there was a warning about
  the computed size not being equals to the actual size.
nielsdos added a commit to nielsdos/php-src that referenced this issue Mar 6, 2023
This feature does not work right now and leads to memory leaks and other
problems. For analysis and discussion see phpGH-8065. In phpGH-10624 it was
decided to disable the feature to prevent problems for end users.
If end users which to get some consistency guarantees, they can rely on
opcache.protect_memory.
nielsdos added a commit to nielsdos/php-src that referenced this issue Mar 6, 2023
This feature does not work right now and leads to memory leaks and other
problems. For analysis and discussion see phpGH-8065. In phpGH-10624 it was
decided to disable the feature to prevent problems for end users.
If end users which to get some consistency guarantees, they can rely on
opcache.protect_memory.
nielsdos added a commit that referenced this issue Mar 7, 2023
* PHP-8.1:
  Fix GH-8065: opcache.consistency_checks > 0 causes segfaults in PHP >= 8.1.5 in fpm context
  Fix GH-8646: Memory leak PHP FPM 8.1
nielsdos added a commit that referenced this issue Mar 7, 2023
* PHP-8.2:
  Re-add some CTE functions that were removed from being CTE by a mistake
  Fix GH-8065: opcache.consistency_checks > 0 causes segfaults in PHP >= 8.1.5 in fpm context
  Fix GH-8646: Memory leak PHP FPM 8.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants
@CyberLine @iluuu1994 @zsuraski @boesing @cmb69 @dstogov @ItsReddi and others