NUL bytes in commented lines #64314

Closed
arigo mannequin opened this issue Jan 3, 2014 · 21 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@arigo
Mannequin

arigo mannequin commented Jan 3, 2014

BPO 20115
Nosy @gvanrossum, @arigo, @birkenfeld, @terryjreedy, @benjaminp, @jwilk, @alex, @serhiy-storchaka, @iritkatriel

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2014-01-03.17:59:13.390>
labels = ['interpreter-core', 'type-bug', '3.9', '3.10', '3.11']
title = 'NUL bytes in commented lines'
updated_at = <Date 2021-09-07.05:32:27.769>
user = 'https://github.com/arigo'

bugs.python.org fields:

activity = <Date 2021-09-07.05:32:27.769>
actor = 'gvanrossum'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2014-01-03.17:59:13.390>
creator = 'arigo'
dependencies = []
files = []
hgrepos = []
issue_num = 20115
keywords = []
message_count = 19.0
messages = ['207232', '207282', '207290', '207358', '207872', '207873', '207879', '207939', '208086', '208087', '218239', '395928', '395932', '395938', '395989', '401193', '401196', '401204', '401208']
nosy_count = 11.0
nosy_names = ['gvanrossum', 'arigo', 'georg.brandl', 'terry.reedy', 'benjamin.peterson', 'jwilk', 'Arfrever', 'alex', 'ita1024', 'serhiy.storchaka', 'iritkatriel']
pr_nums = []
priority = 'low'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20115'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

@arigo
Mannequin Author

arigo mannequin commented Jan 3, 2014

This is probably the smallest example of a .py file that behaves differently in CPython vs PyPy, and for once, I'd argue that the CPython behavior is unexpected:

   # make the file:
   >>> open('x.py', 'wb').write('#\x00\na')

   # run it:
   python x.py

Expected: either some SyntaxError, or "NameError: global name 'a' is not defined". Got: nothing. It seems that CPython completely ignores the line that is immediately after a line with a '#' and a following '\x00'.
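For readers on Python 3, the reproduction can be sketched roughly as follows. This is an illustrative script, not part of the original report: the helper name `run_nul_comment_demo` and the use of a temporary directory are assumptions, and a bytes literal replaces the Python 2 `str` used above. What the child interpreter prints depends on the CPython version, so no expected output is shown.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The two-line file from the report: a comment containing NUL, then a bare name.
SOURCE = b"#\x00\na"


def run_nul_comment_demo(source: bytes = SOURCE):
    """Write `source` to a temporary x.py and run it with the current interpreter.

    Returns (bytes written to disk, exit code, stderr text).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "x.py"
        script.write_bytes(source)
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
        )
        return script.read_bytes(), result.returncode, result.stderr


if __name__ == "__main__":
    written, code, err = run_nul_comment_demo()
    print("exit code:", code)
    print(err or "(no error output)")
```

An exit code of 0 with empty stderr would correspond to the "Got: nothing" behavior described above; a non-zero exit code means the interpreter rejected or failed on the file.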

@arigo arigo mannequin added build The build process and cross-build interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jan 3, 2014
@pitrou pitrou added type-bug An unexpected behavior, bug, or error and removed build The build process and cross-build labels Jan 4, 2014
@serhiy-storchaka
Member

Indeed. The CPython parser reads the first line '#\x00\n' and saves it in the buffer. But because C strings are used here (the result of decode_str()), the line is truncated to '#'. Since this data does not end with '\n', it is considered incomplete, and the next line is read and appended: '#' + 'a' -> '#a'. The combined line is now commented out.
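The mechanism described here can be modelled in a few lines of Python. This is a toy simulation under stated assumptions, not CPython's tokenizer code: `c_string_view` and `simulate_line_reader` are hypothetical names, and the model only captures the two effects mentioned above (truncation at the first NUL, and joining a line that appears to lack its newline with the next one).

```python
def c_string_view(raw: bytes) -> bytes:
    """Return what NUL-terminated C string handling would see of `raw`."""
    return raw.split(b"\x00", 1)[0]


def simulate_line_reader(source: bytes) -> list:
    """Model the reader: truncate each physical line at NUL, and keep
    appending physical lines until the buffer ends with a newline."""
    logical_lines = []
    pending = b""
    for raw_line in source.splitlines(keepends=True):
        pending += c_string_view(raw_line)
        if pending.endswith(b"\n"):
            logical_lines.append(pending)
            pending = b""
    if pending:
        # Last line of the file had no trailing newline.
        logical_lines.append(pending)
    return logical_lines


if __name__ == "__main__":
    print(simulate_line_reader(b"#\x00\na"))        # [b'#a']
    print(simulate_line_reader(b"#\x00\na\nb\n"))   # [b'#a\n', b'b\n']
```

In this model the line after the NUL-bearing comment is absorbed into the comment, which matches the symptom in the report: the `a` is never evaluated.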

@benjaminp
Contributor

I guess NUL bytes should just be banned.

@arigo
Mannequin Author

arigo mannequin commented Jan 5, 2014

Fwiw, both exec and eval() ban NUL bytes, which means that there is a strange case in which some files can be imported, but not loaded and exec'ed. So I agree with Benjamin.
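A small check of that claim, as a sketch: the helper `rejection_type` is a hypothetical name, and the exception class is deliberately not pinned down because it has differed between Python versions (this thread alone quotes both a TypeError and a ValueError for the same message).

```python
# Source text with a NUL byte inside a comment, followed by a valid expression.
NUL_SOURCE = "#\x00\n1"


def rejection_type(run, source=NUL_SOURCE):
    """Call `run(source)`; return the exception class name if the source
    is rejected, or None if it was accepted."""
    try:
        run(source)
    except (ValueError, SyntaxError, TypeError) as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # Fresh namespaces keep the demo from touching module globals.
    print("exec:", rejection_type(lambda src: exec(src, {})))
    print("eval:", rejection_type(lambda src: eval(src, {})))
```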

@terryjreedy
Member

Python should have a uniform definition of 'Python source' in both the doc and in practice in all source code processing functions. Currently, "2. Lexical analysis" in the Language Manual just says "Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8." UTF-8 encodes code point U+0000 as a null byte and this code point is nowhere excluded in the doc. (The definition of string literals uses 'source character' without any additional specification, so I take it to mean 'Unicode code point'.)

If U+0000 is a legal 'source character', it, as with other control chars not given special meaning, should be a SyntaxError unless occurring in a comment or string literal. Eval and exec exclude even the latter with
TypeError: source code string cannot contain null bytes
If null bytes are legal, this is wrong.

Simply truncating lines as done by the CPython parser is wrong whether or not U+0000 is legal.

The simplest change would be to change the parser to match exec and add " other than U+0000" after "Unicode code points" in the sentence quoted above.

@terryjreedy
Member

Armin, what is the different behavior of PyPy?

We should perhaps get Guido's opinion on this issue.

@arigo
Mannequin Author

arigo mannequin commented Jan 10, 2014

PyPy 2.x accepts null characters in all of import, exec and eval, and complains if they occur in non-comment.

PyPy 3.x refuses them in import, which is where this bug report originally comes from (someone complained that CPython 3.x "accepts" them but PyPy 3.x does not, even though this complaint doesn't really make sense, as CPython just gets very confused by them). I don't know about exec and eval.

We need a consistent decision for 3.5. I suppose it's not really worth backporting it to CPython 2.7 - 3.3 - 3.4, but it's your choice. PyPy will just follow the lead (or keep its current behavior for 2.x if CPython 2.x is not modified).

@birkenfeld
Member

I'm in favor of PyPy's behavior: null bytes anywhere in the source, even in comments, usually mean there's something weird or fishy going on with either the editor or (if downloaded/copied) the source of the code.

@serhiy-storchaka
Member

I'll try, but I'm not sure this is possible. Some of the C functions used (e.g. fgets()) return char* and don't work with strings containing null bytes. Some public APIs (e.g. PyParser_SimpleParseString()) work with null-terminated C strings.

@serhiy-storchaka
Member

See also bpo-13617.

@ita1024
Mannequin

ita1024 mannequin commented May 10, 2014

Do not touch that please!!!!

The null bytes are already rejected when forbidden by the encoding (utf-8 for example).

Null byte characters in comments are perfectly valid in ISO8859-1 encoding, and a few scripts depend on them:
http://ftp.waf.io/pub/release/waf-1.7.16

Parsing the commented lines is also likely to slow down the parser, so keep your hands off it please! There are too many regressions already! http://bugs.python.org/issue21086

@iritkatriel
Member

This is still the same in 3.11. I added another line to the example's file, which shows more clearly what's happening:

>>> open('x.py', 'wb').write(b'#\x00\na\nb\n')

% ./python.exe x.py
Traceback (most recent call last):
  File "x.py", line 2, in <module>
    a
NameError: name 'b' is not defined

@iritkatriel iritkatriel added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes labels Jun 16, 2021
@terryjreedy
Member

https://docs.python.org/3/reference/toplevel_components.html#file-input
says that file input and exec input (should) have the same grammar. This implies that the divergence is a bug.

@gvanrossum
Member

Yeah, null bytes should just be rejected. If someone comes up with a fix for this we'll accept it.

@iritkatriel
Member

See also bpo-1105770.

@terryjreedy
Member

The compile() doc currently says "This function raises SyntaxError if the compiled source is invalid, and ValueError if the source contains null bytes." And indeed, in repository builds of 3.9, 3.10, and 3.11,

>>> compile('\0','','exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: source code string cannot contain null bytes

The same happens when this is run from a file, either from IDLE or from the command line. The apparent exception to the rule is, sometimes, when the null is in a comment or string within the code.

>>> '\0'
'\x00'
>>> #\0
>>> compile('#\0','','single', 0x200)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: source code string cannot contain null bytes
>>> compile('"\0"','','single', 0x200)
ValueError: source code string cannot contain null bytes

I am puzzled because "\0" and #\0 in the IDLE shell are sent as strings containing the string or comment, to be compiled with the call above in codeop. There must be some difference in when \0 is interpreted.

@gvanrossum
Member

Which part puzzles you?

I see that you tried

>>> #\0

This does not contain a null byte, just three characters: a hash, a backslash, and a digit zero.
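The distinction can be checked directly. In this sketch the variable and helper names are illustrative only: a raw string stands in for the characters typed at the prompt, while an ordinary string literal has its `\0` escape turned into a real NUL before compile() ever sees it.

```python
# Characters as typed at the prompt: hash, backslash, digit zero.
typed_at_prompt = r"#\0"
# An ordinary literal: the \0 escape becomes an actual NUL character.
with_real_nul = "#\0"


def contains_nul(text: str) -> bool:
    return "\x00" in text


def compiles(text: str) -> bool:
    """Return True if compile() accepts `text` as module source."""
    try:
        compile(text, "<demo>", "exec")
    except (ValueError, SyntaxError, TypeError):
        return False
    return True


if __name__ == "__main__":
    for label, text in [("typed", typed_at_prompt), ("literal", with_real_nul)]:
        print(label, len(text), contains_nul(text), compiles(text))
```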

@terryjreedy
Member

What I missed before is that duplicating the effect of the first two interactive entries (no exception) requires escaping the backslash so that the source argument for the explicit compile does not have a null.

>>> compile("'\\0'", '', 'exec')
<code object <module> at 0x00000214431CAA20, file "", line 1>
>>> compile("#\\0", '', 'exec')
<code object <module> at 0x00000214431CAC30, file "", line 1>

So I did not actually see an exception to the rule.
---

*On Win 10*, I experimented with a version of Armin and Irit's example, without and with b'...' and 'wb'.

s = '#\x00\na\nb\n'
print(len(s))  # 7
with open("f:/Python/a/nulltest.py", 'w') as f:
    f.write(s)
import nulltest

When I ran a local repository build of 3.9, 3.10, or 3.11 with
f:\dev\3x>python f:/Python/a/nulltest.py
I got Irit's strange NameError instead of the proper ValueError.

When I ran with installed 3.9 or 3.10 with
py -3.10 -m a.nulltest
I got the null-byte ValueError.

When I ran from IDLE's editor running on either installed or repository python, the import gave the null-byte ValueError.

@gvanrossum
Member

Serhiy’s comment from 2014-01-04 gives the answer. It’s different reading from a file than from a string. And only “python x.py” still reads from a file.
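The "from a string" path can be approximated by reading the whole file into memory and handing the bytes to compile(), which is roughly what the import machinery does with a source file. This is a sketch under that assumption: `compile_like_import` is a hypothetical helper, not an importlib API, and the exception class it reports may vary by Python version.

```python
import tempfile
from pathlib import Path


def compile_like_import(source: bytes):
    """Write `source` to a temporary file, read it back as bytes, and compile it.

    Returns the exception class name if compile() rejects the source,
    or None if it compiles.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "nulltest.py"
        path.write_bytes(source)
        data = path.read_bytes()  # whole file in memory, NUL bytes included
        try:
            compile(data, str(path), "exec")
        except (ValueError, SyntaxError, TypeError) as exc:
            return type(exc).__name__
        return None


if __name__ == "__main__":
    print(compile_like_import(b"#\x00\na\nb\n"))
    print(compile_like_import(b"# plain comment\na = 1\n"))
```

Because the data here has an explicit length, the NUL is visible to the check that rejects it, whereas a line-by-line read into C strings loses everything after the NUL on each line.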

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@asottile
Contributor

I believe this can be closed as fixed by #96670.

@iritkatriel
Member

Agreed, this is fixed.

% ./python.exe x.py 
  File "/Users/iritkatriel/src/cpython-1/x.py", line 1
    #
SyntaxError: source code cannot contain null bytes


8 participants