repl segfaults on non utf-8 input #91273
Some non-UTF-8 bytes segfault the Python REPL in 3.10 and later on Linux. Example:

$ python3.10
Python 3.10.4 (main, Mar 24 2022, 14:20:44) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> �
Segmentation fault (core dumped)

Python 3.9 and earlier handle it correctly:

$ python3.9
Python 3.9.12 (main, Mar 24 2022, 14:21:53)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> �
File "<stdin>", line 0
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

How to reproduce: In GNOME on Ubuntu 20.04 with the Swedish keyboard layout, holding left Alt and pressing the ö key enters the byte 0xb6 into the terminal.

I have only been able to make it crash the REPL. I can't make it crash the parser, for instance by trying to eval the byte.
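For anyone without a Swedish keyboard, here is a minimal sketch of the non-crashing side of the report. It only shows that the byte 0xb6 on its own is not valid UTF-8 and that passing it to eval produces an ordinary exception rather than a crash. It does not reproduce the REPL segfault itself. The helper names `decode_error` and `eval_error` are made up for illustration, and the exact exception type raised by eval may differ between Python versions, so the sketch only reports its name.

```python
# Assumption: 0xb6 is the byte from the SyntaxError message in the report.
BAD = b"\xb6"


def decode_error(data: bytes) -> str:
    """Return the UnicodeDecodeError message for data, or "" if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


def eval_error(data: bytes) -> str:
    """Return the name of the exception type raised by eval(data), or ""."""
    try:
        eval(data)
    except (SyntaxError, ValueError) as exc:
        # UnicodeDecodeError is a ValueError subclass, so it is covered too.
        return type(exc).__name__
    return ""


if __name__ == "__main__":
    print(decode_error(BAD))
    print(eval_error(BAD))
```

Running this on an affected version should print an error description and an exception name instead of crashing, which matches the observation that only the interactive path is affected.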
This looks similar to https://bugs.python.org/issue46206
Yes. I think they are the same. I can reproduce the emoji crash. This is much easier to reproduce. No need to have a Swedish keyboard layout.
Very similar backtrace too:

(gdb) run
Starting program: /home/jon/.pyenv/versions/3.10.4/bin/python3.10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 3.10.4 (main, Mar 24 2022, 14:20:44) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> _
Program received signal SIGSEGV, Segmentation fault.
__strchr_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:57
57 ../sysdeps/x86_64/multiarch/strchr-avx2.S: No such file or directory.
(gdb) bt
#0 __strchr_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:57
#1 0x00005555557d4a7a in get_error_line (lineno=lineno@entry=0, p=<optimized out>, p=<optimized out>) at Parser/pegen.c:443
#2 0x00005555557d541b in _PyPegen_raise_error_known_location (p=0x7ffff7885ed0,
errtype=0x5555558fe420 <_PyExc_SyntaxError>, lineno=0, col_offset=0, end_lineno=0, end_col_offset=-1,
errmsg=0x5555558a2dd3 "(%s) %U", va=0x7fffffffd410) at Parser/pegen.c:499
#3 0x00005555557d5646 in _PyPegen_raise_error (p=p@entry=0x7ffff7885ed0, errtype=<optimized out>,
errmsg=errmsg@entry=0x5555558a2dd3 "(%s) %U") at Parser/pegen.c:422
#4 0x00005555557d5839 in raise_decode_error (p=p@entry=0x7ffff7885ed0) at Parser/pegen.c:271
#5 0x00005555557d6193 in initialize_token (token_type=60, end=0x0, start=<optimized out>, token=0x7ffff7a55d10,
p=0x7ffff7885ed0) at Parser/pegen.c:720
#6 _PyPegen_fill_token (p=p@entry=0x7ffff7885ed0) at Parser/pegen.c:793
#7 0x00005555557fec00 in statement_newline_rule (p=0x7ffff7885ed0) at Parser/parser.c:1080
#8 interactive_rule (p=0x7ffff7885ed0) at Parser/parser.c:1002
#9 _PyPegen_parse (p=p@entry=0x7ffff7885ed0) at Parser/parser.c:34508
#10 0x00005555557d6c60 in _PyPegen_run_parser (p=0x7ffff7885ed0) at Parser/pegen.c:1342
#11 0x00005555557d718f in _PyPegen_run_parser_from_file_pointer (fp=fp@entry=0x7ffff7e29980 <_IO_2_1_stdin_>,
start_rule=start_rule@entry=256, filename_ob=filename_ob@entry=0x7ffff7a85670, enc=enc@entry=0x7ffff7a7c1a0 "utf-8",
ps1=<optimized out>, ps1@entry=0x1e000000160 <error: Cannot access memory at address 0x1e000000160>,
ps2=ps2@entry=0xe0000001a0 <error: Cannot access memory at address 0xe0000001a0>, flags=0x7fffffffd7f8,
errcode=0x7fffffffd724, arena=0x7ffff792cc70) at Parser/pegen.c:1448
#12 0x000055555575661c in _PyParser_ASTFromFile (fp=fp@entry=0x7ffff7e29980 <_IO_2_1_stdin_>,
filename_ob=filename_ob@entry=0x7ffff7a85670, enc=enc@entry=0x7ffff7a7c1a0 "utf-8", mode=mode@entry=256,
ps1=0x1e000000160 <error: Cannot access memory at address 0x1e000000160>, ps1@entry=0x7ffff7acf960 ">>> ",
ps2=0xe0000001a0 <error: Cannot access memory at address 0xe0000001a0>, ps2@entry=0x7ffff7af02e0 "... ",
flags=<optimized out>, errcode=<optimized out>, arena=<optimized out>) at Parser/peg_api.c:26
#13 0x00005555556cad97 in PyRun_InteractiveOneObjectEx (fp=fp@entry=0x7ffff7e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff7a85670, flags=flags@entry=0x7fffffffd7f8) at Python/pythonrun.c:257
#14 0x00005555556cba26 in _PyRun_InteractiveLoopObject (fp=fp@entry=0x7ffff7e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff7a85670, flags=flags@entry=0x7fffffffd7f8) at Python/pythonrun.c:148
#15 0x00005555556cc5ce in _PyRun_AnyFileObject (flags=<optimized out>, closeit=<optimized out>, filename=0x7ffff7a85670, fp=<optimized out>) at Python/pythonrun.c:84
#16 PyRun_AnyFileExFlags (fp=0x7ffff7e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x555555802103 "<stdin>", closeit=closeit@entry=0, flags=flags@entry=0x7fffffffd7f8) at Python/pythonrun.c:116
#17 0x00005555555bb5c7 in pymain_run_stdin (config=0x555555932ce0) at Modules/main.c:502
#18 pymain_run_python (exitcode=exitcode@entry=0x7fffffffd930) at Modules/main.c:590
#19 0x00005555555bba1f in Py_RunMain () at Modules/main.c:666
#20 pymain_main (args=0x7fffffffd8f0) at Modules/main.c:696
#21 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:720
#22 0x00007ffff7c610b3 in __libc_start_main (main=0x5555555aedb0 <main>, argc=1, argv=0x7fffffffda58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffda48)
at ../csu/libc-start.c:308
#23 0x00005555555ba57e in _start () at ./Include/internal/pycore_pyerrors.h:14
Ah yes, we have been defeated by half an emoji :)
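To illustrate what "half an emoji" means at the byte level, here is a small sketch. It assumes a 4-byte emoji (U+1F600 is picked arbitrarily; the input used in the linked issue is not shown in this thread) and checks that neither half of its UTF-8 encoding decodes on its own. The helper `is_valid_utf8` is hypothetical and exists only for this example.

```python
# Assumption: any emoji outside the Basic Multilingual Plane works here;
# U+1F600 is an arbitrary choice for illustration.
EMOJI = "\U0001F600"
ENCODED = EMOJI.encode("utf-8")  # four bytes: f0 9f 98 80


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(ENCODED.hex(" "))
    print(is_valid_utf8(ENCODED))      # the whole emoji
    print(is_valid_utf8(ENCODED[:2]))  # first half: truncated sequence
    print(is_valid_utf8(ENCODED[2:]))  # second half: stray continuation bytes
```

The first half is a lead byte without all of its continuation bytes, and the second half is continuation bytes without a lead byte, so both fail to decode.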
Thanks for the report, Jon!