-
Notifications
You must be signed in to change notification settings - Fork 52
Description
Not sure where to put this, so I'm putting it an issue for now.
Speculative specialization
Speculative specialization is absolutely the key optimization for any dynamic language. The evidence for this is overwhelming. Most optimizations for dynamic languages will do one of these:
- Specialize
- Exploit specialization
- Enable specialization
Examples:
- Type profiling: enable specializations
- Compilation to machine code: Exploits specialization (we can't usefully compile until we have something like statically type code)
- Inline caches: enable specialization and specialize (that's why inline caches are good, they both enable and perform specialization)
Definition
Speculative specialization is where the VM runs fast for some code when some predicate holds, expecting that that predicate will nearly always be true. It is also necessary that the VM runs correctly when that predicate does not hold, although it may be very slow in that case.
Analysis
Ttotal = Tfast * p + Tslow *(1-p)
where p
is the probability of the predicate holding.
An optimization is useful if Tfast * p + Tslow *(1-p) < Tgeneral
where Tgeneral
is the time for the unoptimized operation.
Provided p ≃ 1
then it doesn't matter how large Tslow
is, because if p ≃ 1
then Ttotal ≃ Tfast
For smaller values of p
, some care needs to be taken that Tslow
isn't too much slower than Tgeneral
, the unoptimized case.
If Tslow ≃ Tgeneral
then it doesn't matter too much what p
is, although if p
is small, the performance gains will be small.
Practicalities
Although specialization is often thought to be by type, or something similar like a type, fundamentally specialization can be on any predicate. However we want things to be fast, so the predicates must be fast and thus simple. Typically they should be a few (ideally zero) simple operations on an object, followed by a test-and-branch.
if (predicate(obj) != EXPECTED_VALUE) {
goto slow_path;
}
Often we will have to use some a proxy for the predicate we want, as the exact predicate is too slow.
If predicate p1
is expensive to compute, we can use a proxy predicate p2
provided p2
is fast to compute and p2(obj) ⇒ p1(obj)
. Note that we do not require that p1(obj) ⇒ p2(obj)
.
For example, suppose we want to specialize an operation depending on whether some attribute of the class of an object is as expected, that is type(obj).attr is EXPECTED
.
However attribute lookup on a type is expensive, so we use the version of the type as a proxy.
The expensive test:
PyType_Lookup(obj->ob_type, "attr") == EXPECTED_OBJECT
can be replaced with the much cheaper test:
obj->ob_type->tp_version == EXPECTED_VERSION
We have to accept that the faster predicate may produce false negatives.