Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-22454: Add shlex.join() (the opposite of shlex.split()) #7605

Merged
merged 7 commits into from
May 29, 2019

Conversation

bbayles
Copy link
Contributor

@bbayles bbayles commented Jun 10, 2018

This PR adds shlex.join(), using the implementation suggested in the issue.

This is the opposite of shlex.split(), and is intended to prevent the naive use of ' '.join(list_of_shell_tokens), which can be unsafe.

https://bugs.python.org/issue22454

Doc/library/shlex.rst Outdated Show resolved Hide resolved
@supakeen
Copy link
Contributor

I do like having the inverse of shlex.split in the standard library however do be careful with the documentation. The returned value is not safe to execute when built up from user input just as using shlex.split(user_input)[0] and feeding it to a shell is not safe :)

Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment on the docs, otherwise this implementation looks fine to me. No comment on whether or not it should be added to the module, since I don't know the shlex use cases enough to weigh in.

With regards to the test strategy, I think it's probably fine as is, but I've given a suggestion anyway because I've been going back and forth on it. You are only testing the fact that join is an inverse of split, not testing the behavior of join itself (which would be useful even if split did not exist), but honestly it's probably fine to define join in terms of split and then say that it's sufficient to test split and the fact that join is the inverse of split. Still, I don't think a few explicit join tests would hurt.

Doc/library/shlex.rst Outdated Show resolved Hide resolved
Lib/test/test_shlex.py Show resolved Hide resolved
Copy link
Member

@berkerpeksag berkerpeksag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a note to Doc/whatsnew/3.8.rst?

Thank you!

Doc/library/shlex.rst Show resolved Hide resolved
Doc/library/shlex.rst Outdated Show resolved Hide resolved
Lib/shlex.py Show resolved Hide resolved
Lib/test/test_shlex.py Outdated Show resolved Hide resolved
Lib/test/test_shlex.py Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@mahmoud
Copy link

mahmoud commented Apr 17, 2019

I think this is a great idea! Always nice to have symmetry in an API. Speaking of which, one thing I noticed about this implementation is that it doesn't mirror the new posix flag in the split API. So you end up with slightly overzealous quoting:

>>> from shlex import join, split                                                                                                         
>>> join(split('test ê test'))                                                                                                          
"test 'ê' test"

Not sure if you're up to adding another flag, but figured I'd mention.

@bbayles
Copy link
Contributor Author

bbayles commented Apr 17, 2019

@berkerpeksag, I think I've made the changes you requested.

@mahmoud, do you have an implementation suggestion? The quote function doesn't have the posix flag, and this is just ' '.join(quote(arg) for arg in split_command).

@pganssle
Copy link
Member

I guess the problem is more going the other way, right?

>>> split("test 'ê' test", posix=True)
['test', 'ê', 'test']

>>> split("test 'ê' test", posix=False)
['test', "'ê'", 'test']

>>> split("test 'ê' test", posix=False)
['test', "'ê'", 'test']
>>> join(split("test 'ê' test", posix=False))
'test \'\'"\'"\'ê\'"\'"\'\' test'

I'm not sure how much of a bug this is, but it does mean that split(join(split(x, posix=False)), posix=False) is not an idempotent operation. I'm not sure how much it needs to be, though.

@mahmoud
Copy link

mahmoud commented Apr 17, 2019

@pganssle Bingo! Agreed that full roundtrippability is nice to have, but not necessarily a blocker. (when I've needed this, I've done exactly what join() does now, but that was before this new posix flag).

@bbayles This may be outside the scope of this change, but I think it makes sense to add the posix flag for symmetry.

If there's an appetite for that, then I think we could make the string constants inside the shlex class into global constants, shared between shlex and quote.

I haven't done the venn diagram, but there may be other corners lurking in quote's "exclusive" regex of unsafe chars (re.compile(r'[^\w@%+=:,./-]', re.ASCII)) and shlex's "inclusive" specification of safe chars:

        self.wordchars = ('abcdfeghijklmnopqrstuvwxyz'
                          'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_')
        if self.posix:
            self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
                               'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')

You can see how, instead of the regex, you could check if any of the word characters aren't those characters. Similar thing could happen for punctuation_chars.

@bbayles
Copy link
Contributor Author

bbayles commented Apr 17, 2019

I'll defer to @berkerpeksag on whether to increase the scope of this PR.

@pganssle
Copy link
Member

I think we want join to default to using the POSIX behavior, right? So if quote defaults to the correct behavior with posix=True (which it seems to), then I think posix=False can be added to join in a separate PR.

If quote defaults to posix=False, then I think it needs to be added into this PR (since we want the default for join to be posix=True, and it would be best if this were not a backwards-incompatible change).

@berkerpeksag
Copy link
Member

I think we need to add a posix parameter because split()'s posix defaults to True and users would expect join(split(cmd)) == cmd.

@pganssle
Copy link
Member

pganssle commented Apr 18, 2019

@berkerpeksag I'm not sure that join(split(cmd)) == cmd will ever be possible for all cmd when posix=True, because split is lossy in POSIX mode:

>>> shlex.split("one 'two' three")
['one', 'two', 'three']

>>> shlex.split("one two three")
['one', 'two', 'three']

If you turn off POSIX mode, the splitting is no longer lossy (at least in this example):

>>> shlex.split("one 'two' three", posix=False)
['one', "'two'", 'three']

>>> shlex.split("one two three", posix=False)
['one', 'two', 'three']

So right now, you have the situation where:

split(join(split(cmd, posix=True)), posix=True) == split(cmd, posix=True)

Which is I think as good as we can hope for with a default behavior. I think we can go ahead with this and add a posix parameter (defaulting to True) to both join and quote as a separate feature.

Is this accurate @mahmoud?

@mahmoud
Copy link

mahmoud commented Apr 26, 2019

@pganssle Just caught this, and yeah I agree. The sooner we get the functionality, the fewer ' '.join(args) will be written. Corner cases in a new ticket sounds good to me, too!

Copy link

@mahmoud mahmoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My other comments notwithstanding, this looks like a solid shlex addition I wish would've existed long ago. Thanks for adding it!

@pganssle
Copy link
Member

@berkerpeksag The feature freeze is coming up pretty soon, have you had a chance to review Mahmoud and my comments about the feature? It would be nice to get this merged for Python 3.8.

@berkerpeksag
Copy link
Member

I agree that shlex.split() is lossy in POSIX mode and I'm fine with it. I'd merge this PR now if the default mode of shlex.split() wasn't POSIX (it has been the default mode for more than a decade)

"Let's merge this and we'll fix other cases later" never works in practice and I'm not comfortable adding an unfinished API to the standard library.

Let's see what @vsajip thinks as he has been working on maintaining the shlex module lately.

@pganssle
Copy link
Member

pganssle commented May 18, 2019

I agree that shlex.split() is lossy in POSIX mode and I'm fine with it. I'd merge this PR now if the default mode of shlex.split() wasn't POSIX (it has been the default mode for more than a decade)

If I understand you correctly, it sounds like you have the opposite understanding from me as to what needs to be done. When I said that shlex.split is lossy in POSIX mode, what I mean is that this PR already does as well as it possibly can do with the default (POSIX) mode.

The new feature that we're blocking on is support for lossless round trips when posix=False is specified, and since the default is posix=True, I think it's a sufficiently clean separation that adding support for it can be considered a separate feature (especially considering that quote also has no way to specify posix=False).

@bbayles
Copy link
Contributor Author

bbayles commented May 19, 2019

I'm not available to increase the scope of this PR currently. So if shlex.join needs to be more than it is here, then I'll close this and allow someone else to contribute.

@vsajip vsajip merged commit ca80495 into python:master May 29, 2019
@pganssle
Copy link
Member

Thanks @bbayles for putting in the work on this, and thanks @berkerpeksag and @mahmoud for the reviews and discussion!

Thanks @vsajip for the review and merge!

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants