-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unknown error handlers should be reported early #81569
Comments
I was just bit by specifying an nonexisitng error handler for bytes.decode() without noticing. Consider this code: >>> 'a'.encode('cp1250').decode('utf-8', errors='Boom, Shaka Laka, Boom!')
'a' Nobody notices that the error handler doesn't exist. However: >>> 'ž'.encode('cp1250').decode('utf-8', errors='Boom, Shaka Laka, Boom!')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LookupError: unknown error handler name 'Boom, Shaka Laka, Boom!' The error is only noticeable once there is an error in the data. While nobody could possibly mistake 'Boom, Shaka Laka, Boom!' for a valid error handler, I was bit by this:
Which in fact should have been
Yet I wasn't notified, because the bytes in question were actually decodeable as valid utf-8. I suggest that unknown error handler should rise an exception immediately like this: >>> 'b'.encode('cp1250').decode('utf-8', errors='Boom, Shaka Laka, Boom!')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LookupError: unknown error handler name 'Boom, Shaka Laka, Boom!' |
Getting an error handler is expensive compared to the time to encode text/decode bytes. Python loads the error handler lazily to provide best performances. Text codecs are performance critical for Python. If we add a check, it should only be enabled in development (python3 -X dev) and/or debug mode (./configure --with-pydebug). |
bench.py: microbenchmark to measure the overhead of PR 14341. |
I compared ref (commit e1a63c4) to patch (commit ed076ed). I ran bench.py using: # edit Makefile.pre.in to use: ./configure --enable-optimizations && make $ python3 -m pyperf compare_to ref_e1a63c4f2.json patch_ed076ed4.json
0B: Mean +- std dev: [ref_e1a63c4f2] 76.7 ns +- 0.9 ns -> [patch_ed076ed4] 77.5 ns +- 2.9 ns: 1.01x slower (+1%)
1B: Mean +- std dev: [ref_e1a63c4f2] 92.9 ns +- 0.8 ns -> [patch_ed076ed4] 94.0 ns +- 2.4 ns: 1.01x slower (+1%)
5B: Mean +- std dev: [ref_e1a63c4f2] 106 ns +- 2 ns -> [patch_ed076ed4] 110 ns +- 2 ns: 1.04x slower (+4%)
10B: Mean +- std dev: [ref_e1a63c4f2] 105 ns +- 1 ns -> [patch_ed076ed4] 109 ns +- 1 ns: 1.03x slower (+3%)
25B: Mean +- std dev: [ref_e1a63c4f2] 108 ns +- 3 ns -> [patch_ed076ed4] 111 ns +- 3 ns: 1.03x slower (+3%)
100B: Mean +- std dev: [ref_e1a63c4f2] 114 ns +- 1 ns -> [patch_ed076ed4] 115 ns +- 2 ns: 1.01x slower (+1%)
1000B: Mean +- std dev: [ref_e1a63c4f2] 267 ns +- 3 ns -> [patch_ed076ed4] 253 ns +- 4 ns: 1.06x faster (-5%)
$ python3 -m pyperf compare_to ref_e1a63c4f2.json patch_ed076ed4.json --table
+-----------+---------------+-----------------------------+
| Benchmark | ref_e1a63c4f2 | patch_ed076ed4 |
+===========+===============+=============================+
| 0B | 76.7 ns | 77.5 ns: 1.01x slower (+1%) |
+-----------+---------------+-----------------------------+
| 1B | 92.9 ns | 94.0 ns: 1.01x slower (+1%) |
+-----------+---------------+-----------------------------+
| 5B | 106 ns | 110 ns: 1.04x slower (+4%) |
+-----------+---------------+-----------------------------+
| 10B | 105 ns | 109 ns: 1.03x slower (+3%) |
+-----------+---------------+-----------------------------+
| 25B | 108 ns | 111 ns: 1.03x slower (+3%) |
+-----------+---------------+-----------------------------+
| 100B | 114 ns | 115 ns: 1.01x slower (+1%) |
+-----------+---------------+-----------------------------+
| 1000B | 267 ns | 253 ns: 1.06x faster (-5%) |
+-----------+---------------+-----------------------------+ The overhead of my change is around 1 ns, 4 ns (on 106 ns) in the worst case (5B). The change "looks" faster on the 1000B case, but it's likely a glitch of PGO compilation which is not really deterministic. |
Ok, the encoding and errors are now checked in almost all cases if you enable the development mode using -X dev command line option. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: