Investigate slow writes to class variables #80193
Benchmarks show that writes to class variables are anomalously slow.

class A(object):
    pass
A.x = 1 # This write is 3 to 5 times slower than other writes.

FWIW, the same operation for old-style classes in Python 2.7 was several times faster. We should investigate to understand why the writes are so slow. There might be a good reason or there might be an opportunity for optimization.

-------------------------------------------------

$ python3.8 Tools/scripts/var_access_benchmark.py
Variable and attribute read access:
4.3 ns read_local
4.6 ns read_nonlocal
14.5 ns read_global
19.0 ns read_builtin
18.4 ns read_classvar_from_class
16.2 ns read_classvar_from_instance
24.7 ns read_instancevar
19.7 ns read_instancevar_slots
19.5 ns read_namedtuple
26.4 ns read_boundmethod

Variable and attribute write access:
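To reproduce the class-variable write measurement in isolation, a small timeit script is enough. This is a minimal sketch, not the code of Tools/scripts/var_access_benchmark.py: the helper name time_stmt and the default loop counts are assumptions chosen for illustration, the figures include timeit's own loop overhead, and absolute numbers will differ by machine and Python version.

```python
import timeit


class A(object):
    pass


a = A()


def time_stmt(stmt, number=1_000_000, repeat=5):
    """Return the best per-operation time in nanoseconds for stmt.

    Hypothetical helper for illustration; not part of var_access_benchmark.py.
    """
    namespace = {"A": A, "a": a}
    # Take the minimum over several repeats to reduce scheduling noise.
    best = min(timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat))
    return best / number * 1e9


if __name__ == "__main__":
    for label, stmt in [
        ("write_classvar", "A.x = 1"),      # the slow case from this issue
        ("write_instancevar", "a.x = 1"),   # an ordinary write for comparison
    ]:
        print(f"{time_stmt(stmt):6.1f} ns\t{label}")
```

Comparing the two printed lines gives the ratio between a class-variable write and an instance-variable write on the interpreter under test.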
It seems 50% of the overhead (50ns) is due to two reasons:

If I replace

res = _PyObject_GenericSetAttrWithDict((PyObject *)type, name, value, NULL);

by

PyObject* dictptr = _PyObject_GetDictPtr(type);
res = _PyObjectDict_SetItem(type, dictptr, name, value);

and delete the update_slot(type, name) call afterwards, the times are reduced to 50ns.
Can you include your Python 2.7 runs? For me it looks similar.
It will give similar results unless you switch to old-style classes (edit out the inheritance from object).

class A:
    pass
A.x = 1

$ python2.7 var_access_benchmark.py
Variable and attribute read access:
6.7 ns read_local
10.9 ns read_global
18.9 ns read_builtin
24.4 ns read_classvar_from_class
30.2 ns read_classvar_from_instance
22.7 ns read_instancevar
25.5 ns read_instancevar_slots
99.3 ns read_namedtuple
40.9 ns read_boundmethod

Variable and attribute write access:
It turns out that "update_slot()" is always called, even when we are not updating a slot name (which is always a special dunder-name). The linear search for names in "update_slot()" is a huge waste of time here, and short-circuiting out of it when the name does not start with "_" cuts the overall update time by 50%. I pushed a PR. Another improvement would be a sub-linear algorithm for searching the slot name, but that's a bigger change.
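The short-circuit idea can be modelled in a few lines of Python. This is only a sketch of the reasoning, not the PR itself: the real code is C in CPython's typeobject.c, and SLOT_NAMES, update_slot_linear and update_slot_short_circuit are hypothetical names, with a heavily abbreviated table of special names.

```python
# Abbreviated, hypothetical stand-in for the table of special method names.
SLOT_NAMES = [
    "__init__", "__repr__", "__getattr__",
    "__setattr__", "__add__", "__len__",
]


def update_slot_linear(name):
    """Scan the whole table on every write; return (found, comparisons)."""
    comparisons = 0
    for slot_name in SLOT_NAMES:
        comparisons += 1
        if slot_name == name:
            return True, comparisons
    return False, comparisons


def update_slot_short_circuit(name):
    """Skip the scan when the name cannot be a special name.

    Assumption from the comment above: slot names always start with "_".
    """
    if not name.startswith("_"):
        return False, 0
    return update_slot_linear(name)
```

For an ordinary attribute such as x, the linear version performs one comparison per table entry before giving up, while the short-circuit version performs none. Names that do start with "_" still pay for the full search, which is why a sub-linear lookup would be a separate improvement.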
These are the timings that I am measuring with PR 11907:

Variable and attribute read access:

Variable and attribute write access:
Wow, I didn't expect to get an immediate win of this magnitude :-)
Some profiling using 'perf'. This is for cpython 63fa1cf.

children self

After applying PR 11907:

children self

The PyUnicode_InternInPlace() call can mostly be eliminated by testing PyUnicode_CHECK_INTERNED() first (we have already called PyUnicode_Check() on it). That only gives a 7% speedup on my machine though. The is_dunder_name() check is a much bigger optimization.
BTW, 'perf report [...]' has a really neat annotated assembly view. Scroll to the function you are interested in and press 'a'. Press 't' to toggle the time units (left side numbers). I'm attaching a screenshot of the disassembly of the lookdict function. The time units are sample counts. Each count is a point where the profiler woke up on that specific instruction.
Thanks Neil. The tooling does indeed look nice. I added your PyUnicode_InternInPlace() suggestion to the PR. At this point, the PR looks ready-to-go unless any of you think we've missed some low-hanging fruit.