
Introduce a type check cache (TCC) #5096


Closed
wants to merge 2 commits into from

Conversation


@dtakken dtakken commented Jan 19, 2020

This was an experiment I did while trying to learn more about PHP internals. When the union types feature was merged, it raised some concerns regarding the cost of complex type checks. I was wondering whether type checks were cached in some way. It turned out they are not: the same type check is redone over and over again, both simple and complex checks.

I also noticed that the JIT compiler does not generate efficient code for complex type checks. A cache would turn complex checks into simple lookups, probably simple enough to implement in the JIT compiler. A double gain.

So here it is: a type check cache for PHP. The PR also extends the JIT compiler to exploit the cache. Where the JIT-generated code previously had to bail out to slow code paths, it now keeps running at full jitty speed.

Some key characteristics:

  • Only type checks involving classes are cached
  • Cache is global, shared between all op arrays
  • Cache size is dynamic, it grows on demand at run time
  • Cache is limited to a configurable maximum size
  • Argument type checks, return type checks and typed property writes are supported

An additional benefit of having a cache is that it might allow the PHP type system to continue developing in directions that are currently not considered due to the performance cost involved.

Weaknesses

  • When the cache hits the configured memory limit, some type checks will not be cached. Classes are assigned a cache slot on a first-come, first-served basis, so there is no guarantee that the classes used most in type checks get a cache slot. However, it is possible to re-assign cache slots at run time; it may be advantageous to track cache hits and misses at run time and optimize the slot assignment at some point. This could be implemented later.
  • For reasons of simplicity the cache will typically contain a lot of entries for type / class combinations that are never actually checked in the application.
  • Significant performance gains are only to be expected for hot, type check heavy code paths.
  • This PR only extends the JIT compiler to exploit the cache for doing argument type checks, because handlers for these checks are already in place. Handling of return type checking and typed property writes appears to be missing in the JIT compiler at this time. Accelerating these using the cache will have to wait.

Things I am not sure about

  • Sane defaults for cache capacity. Currently, the default is to have at most 1024 classes and 1024 type declarations, which means the cache can grow to 1 MB of memory. Having 1024 classes (not counting interfaces, abstract classes and traits) and 1024 globally distinct type declarations sounds like a lot to me, but I have no numbers for average real-world code bases.
  • Each entry in the TCC occupies one byte of memory to store a zero or a one, which is a bit wasteful. Using single bits is possible but this makes TCC lookups more expensive. In practice I do not expect the TCC to ever require much more than a couple of MB at most as it is, so I'm not sure if compressing it makes sense. There may be CPU caching effects to think about as well here.

Some numbers finally

Using a benchmarking script I measured the performance difference relative to current master. The script is based on the script written by Dmitry Stogov to benchmark union type checks. It can be found here:

https://gist.github.com/dtakken/1539d64170921363dc8d1ed62effcd45

Below I placed the benchmark results obtained from the master branch and the tcc branch side by side to compare the overall performance gain. The numbers in the leftmost columns are the time spent doing a large number of operations that trigger type checks in a tight loop; the overhead of the loop itself is subtracted. First, some numbers with the JIT turned off:

                                     master  tcc     speedup
Foo::$static_prop = ...              0.390   0.402   -3%
Foo::$class_union_static_prop = ...  0.878   0.627   40%
$o->prop = ...                       0.375   0.384   -2%
$o->class_union_prop = ...           0.831   0.682   22%
func($x)                             0.814   0.817   0%
func($obj)                           1.049   0.948   11%
func(A|B|C|null $obj) (null)         0.784   0.890   -12%
func(A|B|C|null $obj) (A)            1.148   1.155   -1%
func(A|B|C|null $obj) (C)            1.427   1.157   23%
func(A|B|C|null $obj) (D)            1.526   1.154   32%
func(A|B|C|null $obj) (E)            1.608   1.153   39%
func(A|B|C|null $obj) (F)            1.599   1.159   38%
func($obj): A|B|C|null (A)           0.986   0.991   -1%
func($obj): A|B|C|null (C)           1.218   0.989   23%
func($obj): A|B|C|null (D)           1.302   1.002   30%
func($obj): A|B|C|null (E)           1.341   1.006   33%
func($obj): A|B|C|null (F)           1.451   0.995   46%

The numbers are slightly noisy. Still, the effect of the TCC shows nicely here. With the TCC enabled, the cost of simple and complex checks is similar.

Next, the same run with JIT enabled:

                                     master  tcc     speedup
Foo::$static_prop = ...              0.608   0.641   -5%
Foo::$class_union_static_prop = ...  1.341   0.888   51%
$o->prop = ...                       0.685   0.685   0%
$o->class_union_prop = ...           1.344   0.898   50%
func($x)                             0.305   0.349   -13%
func($obj)                           0.430   0.425   1%
func(A|B|C|null $obj) (null)         0.364   0.380   -4%
func(A|B|C|null $obj) (A)            0.731   0.471   55%
func(A|B|C|null $obj) (C)            1.747   0.472   270%
func(A|B|C|null $obj) (D)            2.255   0.472   378%
func(A|B|C|null $obj) (E)            2.388   0.471   407%
func(A|B|C|null $obj) (F)            2.508   0.472   431%
func($obj): A|B|C|null (A)           0.914   0.927   -1%
func($obj): A|B|C|null (C)           1.217   0.926   31%
func($obj): A|B|C|null (D)           1.392   0.931   50%
func($obj): A|B|C|null (E)           1.543   0.930   66%
func($obj): A|B|C|null (F)           1.627   0.927   76%

While these numbers look really nice, there are some important things to take into consideration here.

  • The results for the argument type checks are a bit unfair, because they partly compare statically compiled code with fully JIT-generated code. On the other hand, without the TCC there would not have been efficient JIT code to start with.
  • Some operations, like typed property assignments and return type checks, have no JIT equivalent yet, which means that the JIT currently slows them down. The observed gains will increase once support for these operations is added.

The final measurement compares the performance of the master branch to the performance of the tcc branch while setting the TCC capacity to zero. This shows the worst case scenario of having a cache in place while badly misconfiguring it:

                                     master  tcc miss  speedup
Foo::$static_prop = ...              0.390   0.397     -2%
Foo::$class_union_static_prop = ...  0.878   0.908     -3%
$o->prop = ...                       0.375   0.384     -2%
$o->class_union_prop = ...           0.831   0.889     -7%
func($x)                             0.814   0.827     -2%
func($obj)                           1.049   1.013     4%
func(A|B|C|null $obj) (null)         0.784   0.836     -6%
func(A|B|C|null $obj) (A)            1.148   1.424     -19%
func(A|B|C|null $obj) (C)            1.427   1.612     -11%
func(A|B|C|null $obj) (D)            1.526   1.697     -10%
func(A|B|C|null $obj) (E)            1.608   1.82      -12%
func(A|B|C|null $obj) (F)            1.599   2.027     -21%
func($obj): A|B|C|null (A)           0.986   1.204     -18%
func($obj): A|B|C|null (C)           1.218   1.457     -16%
func($obj): A|B|C|null (D)           1.302   1.532     -15%
func($obj): A|B|C|null (E)           1.341   1.655     -19%
func($obj): A|B|C|null (F)           1.451   1.701     -15%

Please note that this is my first significant contribution; I'm not familiar with the code I had to touch. Careful review is highly appreciated.

@dtakken
Author

dtakken commented Jan 19, 2020

I noticed I need to rebase (master is moving fast!), will do that later.

@Girgias
Member

Girgias commented Jan 20, 2020

CI failures are legit, seems there are various Segfaults and Bus errors.

@@ -1039,6 +1040,9 @@ ZEND_API int pass_two(zend_op_array *op_array)
opline++;
}

// TODO: Should we re-assign CE columns in opcache after loading them from cache?
tcc_assign_ce_columns();
Member


So yeah, this is a problem. Classes loaded from opcache may be immutable, which means that it's not possible to change the index. One could use a MAP pointer for this purpose, which adds an extra level of indirection.

Author


CI failures are legit, seems there are various Segfaults and Bus errors.

I am looking into these. Thanks.

Author


So yeah, this is a problem. Classes loaded from opcache may be immutable, which means that it's not possible to change the index. One could use a MAP pointer for this purpose, which adds an extra level of indirection.

Ah, I did not consider this. Classes can be shared between processes, so these processes cannot write their own stuff into them.

Reconsidering, I think I still need the class entries to have a consecutive integer ID that is unique. But unique for all classes that exist in opcache, which means that opcache should assign them and guarantee uniqueness. Sounds tricky. Then, each process could map that global ID to a column index in the local TCC. I'm not sure if that is what you mean by a MAP pointer though.

@cmb69
Member

cmb69 commented Dec 28, 2021

I'm closing this PR due to inactivity. @dtakken, feel free to fix the merge conflicts, address the test failures, and re-open.

Thanks for your work, anyway! :)
