See also fatoptimizer optimizations.
Example:
def _get_sep(path):
    if isinstance(path, bytes):
        return b'/'
    else:
        return '/'

def isabs(s):
    """Test whether a path is absolute"""
    sep = _get_sep(s)
    return s.startswith(sep)
Inline _get_sep() into isabs() and simplify the code for the str type:
def isabs(s: str):
    return s.startswith('/')
It can be implemented as a simple call to the C function PyUnicode_Tailmatch().
Note: Inlining uses more memory and disk space because the original function must be kept, unless the inlined function becomes unreachable (for example, a "private" function?).
Links:
- Issue #10399: AST Optimization: inlining of function calls
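A minimal sketch of how the specialized version could coexist with the original one. The type check used as a guard and the names isabs_str() and isabs_guarded() are assumptions made for this illustration, not part of fatoptimizer's API:

```python
def _get_sep(path):
    if isinstance(path, bytes):
        return b'/'
    else:
        return '/'


def isabs(s):
    """Original function: works for both str and bytes."""
    sep = _get_sep(s)
    return s.startswith(sep)


def isabs_str(s):
    """Specialized version with _get_sep() inlined; only valid for str."""
    return s.startswith('/')


def isabs_guarded(s):
    # Hypothetical guard: use the specialized code only when the
    # assumption made during inlining (s is exactly a str) holds.
    if type(s) is str:
        return isabs_str(s)
    # Otherwise fall back to the original, which must therefore be kept.
    return isabs(s)
```

The fallback branch is the reason the original function cannot simply be dropped: bytes paths still need it.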
See issue #26110: Speedup method calls 1.2x
Example:
def func(obj, lines):
    for text in lines:
        print(obj.cleanup(text))
Becomes:
def func(obj, lines):
    local_print = print
    obj_cleanup = obj.cleanup
    for text in lines:
        local_print(obj_cleanup(text))
Local variables are faster than global variables and the attribute lookup is only done once.
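The two versions can be compared with a small sketch. The Cleaner class and the extra out parameter are assumptions added here only to make the output observable; they are not part of the original example:

```python
import io


class Cleaner:
    """Hypothetical object with a cleanup() method, for illustration only."""

    def cleanup(self, text):
        return text.strip()


def func(obj, lines, out):
    for text in lines:
        # print is looked up as a global and obj.cleanup as an attribute
        # at every iteration.
        print(obj.cleanup(text), file=out)


def func_hoisted(obj, lines, out):
    # Both lookups are done once, before the loop.
    local_print = print
    obj_cleanup = obj.cleanup
    for text in lines:
        local_print(obj_cleanup(text), file=out)


def run(function):
    """Call function on sample data and return what it printed."""
    buffer = io.StringIO()
    function(Cleaner(), [" a ", "b\n"], buffer)
    return buffer.getvalue()
```

Both functions produce the same output here. The transformation is only safe if neither print nor obj.cleanup is rebound while the loop runs, since the hoisted version would keep calling the old objects.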
Optimizations:
- Avoid reference counting
- Avoid memory allocations on the heap
- Release the GIL
Example:
def demo():
    s = 0
    for i in range(10):
        s += i
    return s
In specialized code, it may be possible to use basic C types like char or int instead of Python objects. Such values can be allocated on the stack instead of allocating objects on the heap. The i and s variables are integers in the range [0; 45], so a simple C type int (or even char) can be used:
PyObject *demo(void)
{
    int s, i;
    Py_BEGIN_ALLOW_THREADS
    s = 0;
    for(i=0; i<10; i++)
        s += i;
    Py_END_ALLOW_THREADS
    return PyLong_FromLong(s);
}
Note: if the function runs for a long time, it may need to check periodically whether a signal was received.
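The range claim for i and s can be checked with a short sketch. It records every value the two variables take in demo() and verifies that the extremes fit in a C signed char, using the struct module's 'b' format code; the helper name is an assumption for illustration:

```python
import struct


def demo_intermediate_values():
    """Return every value taken by s and i while demo() runs."""
    values = []
    s = 0
    for i in range(10):
        s += i
        values.extend((s, i))
    return values


values = demo_intermediate_values()
lowest, highest = min(values), max(values)

# 'b' is the struct format code for a C signed char. struct.pack()
# raises struct.error if the value does not fit in that type.
for value in (lowest, highest):
    struct.pack('b', value)
```

An optimizer would have to prove such bounds statically before choosing a C type; this sketch only confirms them by running the loop.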
Many methods of builtin types don't need the GIL. Example: "abc".startswith("def").
Examples:
- len('abc') becomes 3
- "python2.7".startswith("python") becomes True
- math.log(32) / math.log(2) becomes 5.0
Can be implemented in the AST optimizer.
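A minimal sketch of such an AST pass, assuming Python 3.9 or newer for ast.unparse(). The whitelists PURE_BUILTINS and PURE_STR_METHODS and the class name are assumptions for illustration; the sketch also assumes that names like len are not shadowed, which a real optimizer would have to verify:

```python
import ast

# Assumption: these names refer to the real builtins (not shadowed).
PURE_BUILTINS = {"len": len}
PURE_STR_METHODS = {"startswith", "endswith", "upper", "lower"}


class PureCallFolder(ast.NodeTransformer):
    """Replace calls to known pure functions on constants by their result."""

    def visit_Call(self, node):
        self.generic_visit(node)  # fold nested calls first
        if node.keywords:
            return node
        if not all(isinstance(arg, ast.Constant) for arg in node.args):
            return node
        args = [arg.value for arg in node.args]
        func = node.func
        try:
            if isinstance(func, ast.Name) and func.id in PURE_BUILTINS:
                value = PURE_BUILTINS[func.id](*args)
            elif (isinstance(func, ast.Attribute)
                  and isinstance(func.value, ast.Constant)
                  and isinstance(func.value.value, str)
                  and func.attr in PURE_STR_METHODS):
                value = getattr(func.value.value, func.attr)(*args)
            else:
                return node
        except Exception:
            # Keep the call so that the error is raised at run time.
            return node
        return ast.copy_location(ast.Constant(value=value), node)


def fold(source):
    """Return source with foldable pure calls replaced by constants."""
    tree = PureCallFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

For instance, fold("n = len('abc')") returns the source text n = 3, while a call with a non-constant argument such as len(x) is left unchanged.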
Propagate constant values of variables. Example:
Original | Constant propagation |
---|---|
|
|
Implemented in fatoptimizer.
Read also the Wikipedia article on copy propagation.
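As a minimal illustration of the transformation, the two functions below show a piece of code before and after constant propagation. The function names, variable names and values are assumptions chosen for this sketch:

```python
def original():
    x = 33
    y = x + 5
    return y


def propagated():
    x = 33
    # x is known to be the constant 33 at this point, so its use is
    # replaced by the value itself.
    y = 33 + 5
    return y
```

Both functions return the same value. After propagation, the expression 33 + 5 contains only constants, which makes it a candidate for constant folding.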
Compute simple operations at compile time. Usually, at least arithmetic operations (a+b, a-b, a*b, etc.) are computed. Example:
Original | Constant folding |
---|---|
|
|
Implemented in fatoptimizer and the CPython peephole optimizer.
See also
- issue #1346238: A constant folding optimization pass for the AST
- Wikipedia article on constant folding.
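A minimal sketch of constant folding on the AST, assuming Python 3.9 or newer for ast.unparse(). It is not the fatoptimizer or CPython implementation; the FOLDABLE table and the restriction to int and float operands are assumptions made to keep the example small:

```python
import ast
import operator

# Assumption: only these three arithmetic operators are folded.
FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class BinOpFolder(ast.NodeTransformer):
    """Replace 'constant <op> constant' by the computed constant."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold sub-expressions first
        compute = FOLDABLE.get(type(node.op))
        if compute is None:
            return node
        left, right = node.left, node.right
        if not (isinstance(left, ast.Constant)
                and isinstance(right, ast.Constant)):
            return node
        if not (isinstance(left.value, (int, float))
                and isinstance(right.value, (int, float))):
            return node
        try:
            value = compute(left.value, right.value)
        except Exception:
            # Keep the expression so that the error is raised at run time.
            return node
        return ast.copy_location(ast.Constant(value=value), node)


def fold_constants(source):
    """Return source with foldable arithmetic replaced by constants."""
    tree = BinOpFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

With this sketch, fold_constants("x = 1 + 2 * 3") gives x = 7, and in y = a + 2 * 3 only the constant sub-expression is folded, giving y = a + 6.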
See CPython peephole optimizer.
Example:
for i in range(4):
    print(i)
The loop body can be duplicated (twice in this example) to reduce the cost of a loop:
for i in range(0,4,2):
    print(i)
    print(i+1)
i = 3
Or the loop can be removed by duplicating the body for all loop iterations:
i=0
print(i)
i=1
print(i)
i=2
print(i)
i=3
print(i)
Combined with other optimizations, the code can be simplified to:
print('0')
print('1')
print('2')
i = 3
print('3')
Implemented in fatoptimizer.
Read also the Wikipedia article on loop unrolling.
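The following sketch checks that the partially unrolled loop behaves like the original one, including the final value of i. The function names and the capture() helper are assumptions for illustration:

```python
import io
from contextlib import redirect_stdout


def rolled():
    for i in range(4):
        print(i)
    return i


def unrolled_by_two():
    for i in range(0, 4, 2):
        print(i)
        print(i + 1)
    # The unrolled loop leaves i == 2, so the value the original loop
    # would have left in i must be restored explicitly.
    i = 3
    return i


def capture(function):
    """Return (printed text, final value of i) for the given function."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        final_i = function()
    return buffer.getvalue(), final_i
```

The explicit i = 3 assignment matters because, in Python, the loop variable stays visible after the loop; without it, code reading i afterwards would observe a different value.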
- Replace if 0: code with pass
- if DEBUG: print("debug") where DEBUG is known to be False
Implemented in fatoptimizer and the CPython peephole optimizer.
See also the Wikipedia article on dead code elimination.
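A minimal sketch of both cases as an AST pass, assuming Python 3.9 or newer for ast.unparse(). The known_constants parameter is a hypothetical way to tell the pass that a name such as DEBUG is known to be False; a real optimizer would need a guarantee that the name is never rebound:

```python
import ast


class DeadBranchRemover(ast.NodeTransformer):
    """Remove 'if' branches whose test is known at compile time."""

    def __init__(self, known_constants=None):
        # Hypothetical mapping: variable name -> known constant value.
        self.known = dict(known_constants or {})

    def _truth(self, test):
        """Return True or False if the test is known, None otherwise."""
        if isinstance(test, ast.Constant):
            return bool(test.value)
        if isinstance(test, ast.Name) and test.id in self.known:
            return bool(self.known[test.id])
        return None

    def visit_If(self, node):
        self.generic_visit(node)
        truth = self._truth(node.test)
        if truth is None:
            return node
        # Keep only the branch that can run; use 'pass' if it is empty.
        kept = node.body if truth else node.orelse
        return kept or ast.copy_location(ast.Pass(), node)


def remove_dead_code(source, known_constants=None):
    """Return source with statically dead 'if' branches removed."""
    tree = DeadBranchRemover(known_constants).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

For example, remove_dead_code('if DEBUG: print("debug")', {"DEBUG": False}) returns pass, while the same source without the DEBUG entry is left unchanged.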
Load globals when the module is loaded? Ex: load "print" name when the module is loaded.
Example:
def hello():
    print("Hello World")
Becomes:
local_print = print
def hello():
    local_print("Hello World")
Useful if hello() is compiled to C code.
fatoptimizer implements a "copy builtins to constants" optimization.
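Binding a name early changes behaviour if the name is rebound later, which the sketch below demonstrates. The shout() function is a hypothetical stand-in for a builtin such as print, used so that the example does not have to modify builtins:

```python
def shout(text):
    """Hypothetical global function standing in for a builtin."""
    return text.upper()


def hello():
    # shout is looked up each time hello() is called.
    return shout("Hello World")


# Looked up once, when this line runs ("module load" time).
local_shout = shout


def hello_early_bound():
    return local_shout("Hello World")


before = (hello(), hello_early_bound())


def shout(text):  # the name is rebound after the early lookup
    return text.lower()


after = (hello(), hello_early_bound())
```

After the rebinding, hello() uses the new function while hello_early_bound() still calls the old one. An optimizer applying this transformation therefore has to detect such rebinding, or be restricted to code where it cannot happen.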
With inlining and other optimizations, Python frames are no longer created for the optimized calls. This can make programs seriously harder to debug: tracebacks are an important feature of Python.
At least in debug mode, frames should be created.
PyPy supports lazy creation of frames if an exception is raised.