gh-55688: Add note about ending backslashes for raw strings #94768

Merged: 6 commits merged into python:main on Dec 28, 2022


@slateny (Contributor) commented Jul 12, 2022

@bedevere-bot added the awaiting review and docs (Documentation in the Doc dir) labels Jul 12, 2022
Doc/tutorial/introduction.rst (outdated)
@@ -189,6 +189,29 @@ the first quote::
>>> print(r'C:\some\name') # note the r before the quote
C:\some\name

There is one subtle aspect to raw strings: a raw string may not end in an odd
Contributor

The text of the addition is fine, but I am wondering whether such a subtle point should be part of introduction.rst. Perhaps keep only the first line and refer to a location with the details?

(not sure what would be a better location)

Contributor Author

It's not strictly a part of this file, see https://docs.python.org/3/reference/lexical_analysis.html:

... even a raw string cannot end in an odd number of backslashes ...

Contributor

The text you added in this PR will end up here: https://docs.python.org/3/tutorial/introduction.html#strings, right?

It is an informal introduction, so I would write something like:

There is one subtle aspect to raw strings: a raw string may not end in an odd
number of \ characters. For details, see String and Bytes literals.

But the text added is correct, so it is mostly a matter of style.

Contributor Author

@slateny Dec 24, 2022

Re-reading this, I realize that my comment above doesn't actually address your original concern about whether this point is too subtle for the tutorial introduction. IMO, since the section above already discusses both escaping quotes and raw strings, it's natural to at least mention the case where the two combine.

As for how brief the mention should be, the current phrasing can definitely be shortened. I also think just one of the workarounds can stay, with the other two examples moved elsewhere along with the wording specific to Windows paths. If the section introduces the problem that raw strings cannot end in an odd number of backslashes, I think it should also suggest a solution, so I would prefer to keep one example.

As for moving the rest of the information, I don't think the reference is the most appropriate place. Maybe an FAQ entry or a howto page instead?

Contributor

An FAQ entry sounds reasonable. Maybe we could write something like "There is one subtle aspect to raw strings: a raw string may not end in an odd number of \ characters; see [the FAQ entry] for workarounds"?

Contributor

@hauntsaninja left a comment

I agree with eendebakpt that this seems like too subtle a point to spend this much space discussing in the tutorial, so I think we should try to find another home for this.

Maybe an alternate approach would be to add some special casing to give users a more helpful error message in this case?
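For context on the error-message idea, here is a small sketch of how the compiler currently treats the two cases. The helper `compiles` and the path `C:\some\path` are hypothetical, chosen only for illustration. The literals are compiled from source text rather than written directly, because the invalid one would stop the whole file from parsing. The exact `SyntaxError` message differs between Python versions, so no message text is assumed.

```python
import ast


def compiles(source: str) -> bool:
    """Return True if `source` compiles as an expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Source text r'C:\some\path\' : the last backslash escapes the closing quote,
# so the literal never ends and compilation fails.
odd_ending = "r'C:\\some\\path\\'"

# Source text r'C:\some\path\\' : an even number of trailing backslashes is
# accepted, but both backslashes stay in the value.
even_ending = "r'C:\\some\\path\\\\'"

try:
    compile(odd_ending, "<example>", "eval")
except SyntaxError as exc:
    # The wording of this message varies by Python version.
    print("SyntaxError:", exc.msg)

print(compiles(odd_ending))   # False
print(compiles(even_ending))  # True
print(ast.literal_eval(even_ending) == 'C:\\some\\path\\\\')  # True
```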

Contributor

@hauntsaninja left a comment

Thanks, this looks broadly good to me.

My favourite workaround for this would actually be to use (implicit) concatenation. So:

r'C:\this\will\work' '\\'

or

r'C:\this\will\work' + '\\'

I think this is probably less error-prone than the strip approach (e.g. you probably want to use rstrip, and what if there are more trailing spaces?), so maybe we should swap that workaround for concatenation?
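A quick check of the concatenation workaround, using the path from the comment above. This is a sketch; the `doubled` variable is added only for comparison and shows the equivalent ordinary string with every backslash written twice.

```python
# Implicit concatenation: adjacent literals are joined at compile time.
implicit = r'C:\this\will\work' '\\'

# Explicit concatenation with +, evaluated at run time.
explicit = r'C:\this\will\work' + '\\'

# Equivalent ordinary (non-raw) string: every backslash is doubled.
doubled = 'C:\\this\\will\\work\\'

assert implicit == explicit == doubled
# The result ends in exactly one backslash.
assert implicit.endswith('\\') and not implicit.endswith('\\\\')
print(len(implicit))  # 18
```

Because the raw part never has to end in a backslash, neither form runs into the odd-backslash rule, and there is no trailing whitespace to strip afterwards.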

@hauntsaninja (Contributor)

Oh, one other confusing thing we could clarify here: in r'asdf\'asdf' the backslash escapes the quote only for the purposes of tokenisation; the backslash still appears in the final string. I'm not sure of the best way to phrase this, but if you think of something good, we could mention it!
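A minimal illustration of that point, comparing the raw literal from the comment with its non-raw counterpart:

```python
raw = r'asdf\'asdf'     # \' stops the tokenizer from ending the literal here...
cooked = 'asdf\'asdf'   # ...but only in a normal string is \' turned into '

print(raw)     # asdf\'asdf
print(cooked)  # asdf'asdf

# The backslash is still part of the raw string's value.
assert raw == "asdf\\'asdf"
assert len(raw) == 10 and len(cooked) == 9
```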

@@ -189,6 +189,16 @@ the first quote::
>>> print(r'C:\some\name') # note the r before the quote
C:\some\name

Note that escaping quotes in raw strings will keep the backslash::
Contributor Author

To be more explicit, maybe it could be something like:

Suggested change:
- Note that escaping quotes in raw strings will keep the backslash::
+ Note that, unlike the non-raw string case, escaping quotes in raw strings will keep the backslash::

Contributor

Thanks for making the change! This looks good, but let's move it to the FAQ section?

My reasoning is that, based on the description of raw strings, it's not really surprising that the backslash is preserved. It's only surprising in the context of the end-quote situation, where the backslash "escapes" the quote for the tokeniser but not for the actual value.

That is, if you read:

A raw string ending with an odd number of backslashes will escape the string's quote

you may come away with the impression that raw strings are raw except for quotes, or something like that. I guess a concrete proposal would be to add the following to the end of the FAQ entry:

Note that while a backslash will "escape" a quote for the purposes
of determining where the raw string ends, there are no escape
sequences that affect the interpretation of the value of the raw string.
That is, the backslash remains present in the value of the raw string::

>>> r'backslash\'preserved'
"backslash\\'preserved"
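On the tokenization point: the standard-library `tokenize` module can show that the whole raw literal is read as a single `STRING` token (the escaped quote does not end it), and `ast.literal_eval` then produces a value that still contains the backslash. A sketch using only the standard library:

```python
import ast
import io
import tokenize

# Source text: r'backslash\'preserved'
source = "r'backslash\\'preserved'"

# Collect the STRING tokens; the escaped quote does not end the literal.
string_tokens = [
    tok
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.STRING
]
print(len(string_tokens))       # 1
print(string_tokens[0].string)  # r'backslash\'preserved'

# Evaluating the token's text keeps the backslash in the resulting value.
value = ast.literal_eval(string_tokens[0].string)
print(repr(value))              # "backslash\\'preserved"
```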

Contributor

Okay, there's prior art. Here are the relevant words from Lexical Analysis:

Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw literal cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the literal, not as a line continuation.
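The examples in that passage can be checked directly. The backslash-newline case is built as source text and evaluated with `ast.literal_eval` so it fits on one line; the comparison with the same text without the `r` prefix is added for contrast.

```python
import ast

# r"\"" is a valid raw literal of two characters: a backslash and a double quote.
two_chars = r"\""
assert two_chars == '\\"' and len(two_chars) == 2

# Source text: r'a\<newline>b'. In a raw literal the backslash and the
# newline are both kept; there is no line continuation.
raw_source = "r'a\\\nb'"
# Same text without the r prefix: backslash-newline is a line continuation.
plain_source = "'a\\\nb'"

print(repr(ast.literal_eval(raw_source)))    # 'a\\\nb'
print(repr(ast.literal_eval(plain_source)))  # 'ab'
```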

Contributor Author

Hmm, I'm not too certain what tokenization / determining where the raw string ends means, but I've committed the changes (and added a link to the reference) for the time being.

@hauntsaninja added the needs backport to 3.10 (only security fixes) and needs backport to 3.11 (only security fixes) labels Dec 28, 2022
Contributor

@hauntsaninja left a comment

Thanks, this looks good to me. cc @JelleZijlstra

@hauntsaninja merged commit b95b1b3 into python:main Dec 28, 2022
@miss-islington (Contributor)

Thanks @slateny for the PR, and @hauntsaninja for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

@miss-islington (Contributor)

Sorry @slateny and @hauntsaninja, I had trouble checking out the 3.11 backport branch.
Please retry by removing and re-adding the "needs backport to 3.11" label.
Alternatively, you can backport using cherry_picker on the command line.
cherry_picker b95b1b3b25b0a93a22c7d58ac5bd5870e62070a8 3.11

@bedevere-bot

GH-100570 is a backport of this pull request to the 3.10 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Dec 28, 2022
…thonGH-94768)

(cherry picked from commit b95b1b3)

Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
Co-authored-by: hauntsaninja <hauntsaninja@gmail.com>
@bedevere-bot removed the needs backport to 3.10 (only security fixes) label Dec 28, 2022
@hauntsaninja added and removed the needs backport to 3.11 (only security fixes) label Dec 28, 2022
@miss-islington (Contributor)

Thanks @slateny for the PR, and @hauntsaninja for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@bedevere-bot

GH-100571 is a backport of this pull request to the 3.11 branch.

@bedevere-bot removed the needs backport to 3.11 (only security fixes) label Dec 28, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Dec 28, 2022
…thonGH-94768)

(cherry picked from commit b95b1b3)

Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
Co-authored-by: hauntsaninja <hauntsaninja@gmail.com>
miss-islington added a commit that referenced this pull request Dec 28, 2022
(cherry picked from commit b95b1b3)

Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
Co-authored-by: hauntsaninja <hauntsaninja@gmail.com>
miss-islington added a commit that referenced this pull request Dec 28, 2022
(cherry picked from commit b95b1b3)

Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
Co-authored-by: hauntsaninja <hauntsaninja@gmail.com>
@slateny deleted the s/55688 branch December 28, 2022 06:47
Labels: docs (Documentation in the Doc dir), skip news

6 participants