shlex doesn't differentiate escaped characters in output #81078

Open
MatthewGamble mannequin opened this issue May 13, 2019 · 5 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@MatthewGamble
Mannequin

MatthewGamble mannequin commented May 13, 2019

BPO 36897
Nosy @ericvsmith

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-05-13.00:59:51.237>
labels = ['3.7', 'type-bug', 'library']
title = "shlex doesn't differentiate escaped characters in output"
updated_at = <Date 2019-05-14.01:12:01.944>
user = 'https://bugs.python.org/MatthewGamble'

bugs.python.org fields:

activity = <Date 2019-05-14.01:12:01.944>
actor = 'Matthew Gamble'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2019-05-13.00:59:51.237>
creator = 'Matthew Gamble'
dependencies = []
files = []
hgrepos = []
issue_num = 36897
keywords = []
message_count = 5.0
messages = ['342276', '342383', '342399', '342406', '342410']
nosy_count = 2.0
nosy_names = ['eric.smith', 'Matthew Gamble']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue36897'
versions = ['Python 3.6', 'Python 3.7']

@MatthewGamble
Mannequin Author

MatthewGamble mannequin commented May 13, 2019

The outputs of the following invocations are exactly the same:

list(shlex.shlex('a ; b', posix=True, punctuation_chars=True))

list(shlex.shlex('a \; b', posix=True, punctuation_chars=True))

They both output the following:

['a', ';', 'b']

This makes it impossible to determine when the user wanted to escape the semi-colon for some reason, such as if they were using find's -exec argument.
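A minimal sketch reproducing the report. The `tokens` helper is a hypothetical name introduced here for illustration, and a raw string is used so that the backslash reaches shlex unchanged:

```python
import shlex

def tokens(text):
    # Same call as in the report: POSIX mode with punctuation_chars enabled.
    return list(shlex.shlex(text, posix=True, punctuation_chars=True))

plain = tokens('a ; b')
escaped = tokens(r'a \; b')  # raw string: the input really contains a backslash

print(plain)             # ['a', ';', 'b']
print(escaped)           # ['a', ';', 'b']
print(plain == escaped)  # True
```

The escape character is consumed during tokenization, so nothing in the resulting list records that the second semi-colon was escaped.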

@MatthewGamble MatthewGamble mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels May 13, 2019
@ericvsmith
Member

The goal is to match posix shell semantics. Can you provide a concrete example where shlex.shlex does something different from a posix-compliant shell? With all the escaping, it's going to be tough.

Note also that your code raises a DeprecationWarning in 3.7, at least, and will be an error in the future. You should probably use r-strings in your examples.

@MatthewGamble
Mannequin Author

MatthewGamble mannequin commented May 13, 2019

The point is that it's not possible to use the output of shlex.shlex to try to match the behaviour of a POSIX-compliant shell by reliably splitting up a user's input into multiple commands. In the first case I presented (no escape character), the user entered two commands. In the second case, the user entered a single command with two arguments. However, there's no way to differentiate the two situations based on the output of shlex.

It's also worth noting that the output is the same with this too:

list(shlex.shlex('a \\; b', posix=True, punctuation_chars=True))
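To illustrate the ambiguity described in this comment, here is a rough sketch of a naive command splitter built on top of shlex. `split_commands` is a hypothetical helper, not part of the shlex module, and it assumes that any `;` token separates two commands:

```python
import shlex

def split_commands(text):
    """Hypothetical helper: start a new command at every ';' token."""
    commands, current = [], []
    for tok in shlex.shlex(text, posix=True, punctuation_chars=True):
        if tok == ';':
            # Assumption: every ';' token is a command separator.
            commands.append(current)
            current = []
        else:
            current.append(tok)
    commands.append(current)
    return commands

print(split_commands('a ; b'))    # [['a'], ['b']]
print(split_commands(r'a \; b'))  # [['a'], ['b']]
```

Both inputs are split into two commands, even though a shell would treat the second one as a single command `a` with the arguments `;` and `b`.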

I tested this code on Python 3.6.7 and 3.7.2, and didn't see any deprecation warnings at all. I also checked the history of shlex.py:

https://github.com/python/cpython/commits/master/Lib/shlex.py

The last commit was from 2017, and I don't see any usages of DeprecationWarning inside that file. I'm also not sure how r-strings are relevant, as I don't see any regular expressions used inside of the shlex class.

@ericvsmith
Member

Run 3.7 with -Wd:

$ python3 -Wd
Python 3.7.3 (default, Mar 29 2019, 13:03:53)
[GCC 7.4.0] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 'a \; b'
<stdin>:1: DeprecationWarning: invalid escape sequence \;
'a \\; b'
>>>

The deprecation is in relation to invalid escape sequences, not shlex.

My point is just that you should use r'a \; b' or 'a \\; b', and not rely on invalid escape sequences. For one thing, I can never remember how they're interpreted, and had to look it up. r-strings don't have anything to do with regular expressions per se; they're a way of changing how Python interprets strings, no matter what they're used for.
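A short sketch of the two recommended spellings, showing that they denote the same six-character string. The variable names are illustrative only:

```python
raw = r'a \; b'       # raw string: the backslash is kept literally
doubled = 'a \\; b'   # regular string: '\\' is an escaped backslash

print(raw == doubled)  # True
print(len(raw))        # 6
print(list(raw))       # ['a', ' ', '\\', ';', ' ', 'b']
```

Either form passes a single literal backslash to shlex without relying on how the interpreter handles the invalid escape sequence `\;`.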

> The point is that it's not possible to use the output of shlex.shlex to try to match the behaviour of a POSIX-compliant shell by reliably splitting up a user's input into multiple commands. In the first case I presented (no escape character), the user entered two commands. In the second case, the user entered a single command with two arguments. However, there's no way to differentiate the two situations based on the output of shlex.

My question is: can a posix-compliant shell tell the difference? I don't know, it's an honest question. Can you show some shell code where it can tell the difference?

@MatthewGamble
Mannequin Author

MatthewGamble mannequin commented May 14, 2019

My apologies, I didn't realise you were talking about the invalid escape sequence. Thanks for letting me know that it's deprecated; I'll definitely keep that in mind going forward.

In a bash shell with the find command available, run the following command:

find . -type f -exec ls {} \;

You should see a list of files.

If you run this:

find . -type f -exec ls {} ;

You should see an error message from find:

"find: missing argument to `-exec'"

If I pass the first example in this message to shlex, I get no indication that the user attempted to escape the semi-colon in their input.
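A minimal sketch of what this looks like for the find example, here using `shlex.split` (POSIX mode, whitespace splitting) rather than the `punctuation_chars=True` call from the original report. The variable names are illustrative only:

```python
import shlex

# Escaped semi-colon: the shell passes ';' to find as an argument.
escaped_argv = shlex.split(r'find . -type f -exec ls {} \;')

# Unescaped semi-colon: the shell treats ';' as a command terminator,
# so find never sees it and reports a missing argument to -exec.
unescaped_argv = shlex.split('find . -type f -exec ls {} ;')

print(escaped_argv)
# ['find', '.', '-type', 'f', '-exec', 'ls', '{}', ';']
print(escaped_argv == unescaped_argv)  # True
```

The shell distinguishes the two command lines, but the token lists produced by shlex are identical.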

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022