Format the hex codes in Unicode/hex escape sequences (\U, \u, \x) in string literals #2067

Closed
Jackenmen opened this issue Mar 26, 2021 · 6 comments · Fixed by #2916
Labels
F: strings Related to our handling of strings T: style What do we want Blackened code to look like?

Comments

@Jackenmen
Contributor

Is your feature request related to a problem? Please describe.
Currently, one can write either "\U0001f977" or "\U0001F977"; the two are equivalent but look different. Similarly, "\u200b" and "\u200B" are equivalent but look different. (Do mind that \U and \u are NOT interchangeable, though we could also talk about shortening "\U0000200b" to "\u200b", which I guess would make sense but is probably a separate issue.)
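To make the equivalence concrete, here is a small check. The variable names are only for illustration; the escapes are the ones quoted above.

```python
# Hex digits inside \U, \u and \x escapes are case-insensitive,
# so each pair below spells exactly the same string.
ninja_lower = "\U0001f977"
ninja_upper = "\U0001F977"
zwsp_lower = "\u200b"
zwsp_upper = "\u200B"

assert ninja_lower == ninja_upper
assert zwsp_lower == zwsp_upper

# \U takes eight hex digits and \u takes four, so the prefixes cannot be
# swapped freely, but both forms can spell the code point U+200B.
zwsp_long = "\U0000200b"
assert zwsp_long == zwsp_lower
```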

Right now, I have to decide for myself whether to use uppercase or lowercase letters, as Black doesn't enforce either.

Describe the solution you'd like
I think it would make sense to have these be consistent in some way, possibly in the same way as numeric literals. Personally, I think it would make more sense to have all uppercase in the case of \U0001F977 and all lowercase in the case of \u200b. I definitely think that \u200b should not become \u200B, but I don't have a strong opinion on \U0001f977 vs \U0001F977.

Describe alternatives you've considered
Alternatives were not considered.

Additional context
None.

@Jackenmen Jackenmen added the T: enhancement New feature or request label Mar 26, 2021
@JelleZijlstra
Collaborator

I agree that Black should make this consistent but have no real view on which direction the consistency should go. Some things that may help a decision:

  • Are there existing style guides that mandate one or the other?
  • Does the Python documentation prefer one over the other?
  • Does either output appear in repr() somehow? If so, I'd prefer we match it.

@Jackenmen
Contributor Author

  • Are there existing style guides that mandate one or the other?

None that I know of.

  • Does the Python documentation prefer one over the other?

Most occurrences seem to use the lowercase form, though these escapes don't appear that often in the Python documentation. As for CPython's own Python code, it seems to use the lowercase form more often, but I don't think it's that consistent there.

  • Does either output appear in repr() somehow? If so, I'd prefer we match it.

Python's repr() shows the lowercase form:

>>> "\x1B"
'\x1b'
>>> "\u200B"
'\u200b'
>>> "\U0001F977"
'\U0001f977'
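
A version-independent way to reproduce this is the built-in ascii(), which escapes every non-ASCII character. This is only a sketch. My assumption is that repr() on a Python whose Unicode database already knows U+1F977 may print the character itself instead of an escape, so ascii() is the safer check.

```python
# ascii() behaves like repr() but always escapes non-ASCII characters,
# so the result does not depend on what the interpreter considers printable.
samples = ["\x1B", "\u200B", "\U0001F977"]
escaped = [ascii(s) for s in samples]

for text in escaped:
    print(text)
```

In every case the hex digits come back in lowercase, whatever case was typed in the source.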

@JelleZijlstra
Collaborator

Thanks! That would make me lean towards using all lowercase in Black too.

@ichard26 ichard26 added T: style What do we want Blackened code to look like? and removed T: enhancement New feature or request labels Apr 2, 2021
@JelleZijlstra JelleZijlstra added the F: strings Related to our handling of strings label May 30, 2021
@TomFryers
Contributor

I think it may be relevant that Black currently formats the hex literal 0xf as upper case, 0xF, whereas Python's built-in hex() outputs lower case.
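
A quick illustration of that mismatch. The names are hypothetical, and the uppercase spelling is built with an ordinary format spec, not with anything from Black.

```python
value = 0xF

# The built-in hex() always produces lowercase digits.
builtin_text = hex(value)

# The uppercase-digit spelling with a lowercase "0x" prefix, as described
# above for numeric literals, built here with the "X" format spec.
upper_text = f"0x{value:X}"

# Both spellings parse back to the same integer.
assert int(builtin_text, 16) == int(upper_text, 16) == 15
```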

@JelleZijlstra
Collaborator

This also affects \x escapes. (#2828 is about \N escapes, where I think there's a pretty clear norm to use uppercase.)

I'm leaning towards keeping \U, \u, \x all lowercase but happy to change if people have a different preference.
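
For anyone who wants to experiment with the all-lowercase option, here is a minimal sketch of such a normalization. It is not taken from Black's source or from the PR that closed this issue. The function name normalize_hex_escapes and the is_bytes flag are hypothetical, and the caller is assumed to pass only the body of a non-raw literal.

```python
import re

# Match one backslash escape at a time. An escaped backslash ("\\\\" in the
# source) is consumed as a unit, so the "x" after it is never mistaken for
# the start of a \x escape.
ESCAPE_RE = re.compile(
    r"\\(?:(?P<hex>x[0-9a-fA-F]{2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|(?P<other>.))",
    re.DOTALL,
)


def normalize_hex_escapes(body: str, *, is_bytes: bool = False) -> str:
    """Lowercase the hex digits of \\x, \\u and \\U escapes in a literal body.

    Assumption: raw literals are skipped by the caller, because a backslash
    does not start an escape there.
    """

    def fix(match: re.Match) -> str:
        escape = match.group("hex")
        if escape is None:
            # Some other escape (\n, \\, \N{...}, ...): leave it alone.
            return match.group(0)
        if is_bytes and escape[0] in "uU":
            # \u and \U are not escapes in bytes literals, so changing
            # their case would change the value of the literal.
            return match.group(0)
        # Keep the prefix letter (x, u or U) and lowercase only the digits.
        return "\\" + escape[0] + escape[1:].lower()

    return ESCAPE_RE.sub(fix, body)


print(normalize_hex_escapes(r"\x1B \u200B \U0001F977"))
```

The prefix letter is left untouched because \u and \U mean different things. Only the digits after it are case-insensitive.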

@JelleZijlstra JelleZijlstra changed the title from "Format the hex codes in Unicode escape sequences (\U and \u) in string literals" to "Format the hex codes in Unicode/hex escape sequences (\U, \u, \x) in string literals" Dec 18, 2022
@JelleZijlstra
Collaborator

Also see https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals for reference on how these work.
