Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string_processing: Use explicit string concatenation #2553

Open
chasefinch opened this issue Oct 21, 2021 · 13 comments
Open

string_processing: Use explicit string concatenation #2553

chasefinch opened this issue Oct 21, 2021 · 13 comments
Labels
C: preview style Issues with the preview and unstable style. Add the name of the responsible feature in the title. F: strings Related to our handling of strings T: style What do we want Blackened code to look like?

Comments

@chasefinch
Copy link

chasefinch commented Oct 21, 2021

Describe the style change

I suggest to use explicit string concatenation, instead of implicit concatenation, for (currently experimental) string processing, with whitespace pushed to the end of each line.

Examples in the current Black style

(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor"
    " incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis"
    " nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod     "  # whitespace is split
    " incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis"
    " nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

(
    "https://www.some-really-really-really-really-really-really-really-really-really-really-really-really-long-url.com"
)

Desired style

(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor "
    + "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis "
    + "nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod      "  # whitespace is not split
    + "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis"
    + "nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

(
    "https://www.some-really-really-really-really-really-really-really-really-"
    + "really-really-really-really-long-url.com"
)

Additional context

Advantages of this method:

  • Lines with multiple whitespace characters in a row line up nicely
  • Lines with no whitespace can be split and still visually consistent (if desired)
  • Plus sign indicates that this is a line continuation, not a list, without having to glance at previous line ending
  • Avoid common coding errors by accidentally-left-out commas
  • Compatible with linters [1 2 3] which forbid implicit concatenation to enforce consistency and/or to help avoid aforementioned coding errors

Worth noting that Guido & the Python community have considered removing implicit concatenation for these reasons, but ran into complications (here's an article about this). However, the reasons why they left it in the language (% precedence, etc.) don't apply to whether Black uses the feature (for the % precedence issue, it doesn't apply at least because the multiline strings are wrapped in parentheses), so I think that Black ought to use the explicit version.

@chasefinch chasefinch added the T: style What do we want Blackened code to look like? label Oct 21, 2021
@felix-hilden
Copy link
Collaborator

Thanks for the suggestion, it's opposite to #2268, and if it was implemented it would make #3159 pretty redundant. I don't feel strongly, but to me this is more readable than no parens. I'm not sure if this or parens are better.

@felix-hilden felix-hilden added the F: strings Related to our handling of strings label Aug 15, 2022
@chasefinch
Copy link
Author

chasefinch commented Aug 16, 2022

Yeah I still think this is the way to go.

@yilei
Copy link
Contributor

yilei commented Sep 6, 2022

I don't have a strong preference between this (explicit str concat) and parens (#3159). There are some cases I think parens have better readability, e.g.

some_list = [
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

Parens make it more clear that it's a two element list.

However, during the implementation of #3159, we realized adding parens (thus another level of indentation) can result in significant changes to the code, which could be off putting.

--- a/django:django/db/backends/mysql/features.py
+++ b/django:django/db/backends/mysql/features.py
@@ -122,12 +132,14 @@
             skips.update(
                 {
                     "https://jira.mariadb.org/browse/MDEV-19598": {
-                        "schema.tests.SchemaTests."
-                        "test_alter_not_unique_field_to_primary_key",
+                        (
+                            "schema.tests.SchemaTests."
+                            "test_alter_not_unique_field_to_primary_key"
+                        ),
                     },
                 }
             )

v.s.

--- a/django:django/db/backends/mysql/features.py
+++ b/django:django/db/backends/mysql/features.py
@@ -122,12 +132,14 @@
             skips.update(
                 {
                     "https://jira.mariadb.org/browse/MDEV-19598": {
                         "schema.tests.SchemaTests."
-                        "test_alter_not_unique_field_to_primary_key",
+                        + "test_alter_not_unique_field_to_primary_key",
                     },
                 }
             )

This is the reason in #3159 we decided to limit parens to only list/tuple/set literals, and exclude function calls. IMO we should still do something to function calls, maybe explicit str concatenation is the answer. If so, then we can choose explicit str concatenation over parens for list/tuple/set literals too.

@chasefinch
Copy link
Author

chasefinch commented Sep 7, 2022

@yilei I prefer keeping the parens rather than by implementing #3260 — Would actually prefer for Black to continue adding them; they are consistent, if not always pretty — but if dropping the parens for function calls is the way to get explicit concatenation everywhere else, then ok 😛

@felix-hilden
Copy link
Collaborator

It would feel a bit off to use parens in other places and explicit concatenation in others in my opinion.

@yilei
Copy link
Contributor

yilei commented Sep 13, 2022

Agreed that we should try to use the same formatting in function calls v.s. list/tuple/set literals.

I don't think there exists a clear call that either explicit-str-concat or use-parens would be better.

Then how about this approach: Black takes existing parens (in unformatted code) into account and decides which format to use?

  1. If existing code doesn't have parens, use explicit str concatenations:
# before
function_call(
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

# after
function_call(
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
  1. If existing code already have parens, keep the parens and use implicit str concatenations inside parens:
# before
function_call(
    (" lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim"),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    (" lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim"),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

# after
function_call(
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

Then, we are able to use either of them as we see fit depending on the local context.

I understand that this will add another exception to Previous formatting is taken into account as little as possible, with rare exceptions like the magic trailing comma., but IMHO this is a good trade-off.

WDYT? @felix-hilden @JelleZijlstra

@felix-hilden
Copy link
Collaborator

I see your point, but we do really want to minimise the effect of the code itself to our formatting choices for consistency. If there isn't a clear winner but we want to format consistently, then I guess the decision has to be more or less arbitrary. Much like string prefixes or other minor details.

@yilei
Copy link
Contributor

yilei commented Sep 13, 2022

Hmm, focusing on existing parens here, this actually fits Black's paren management here:

Please note that Black does not add or remove any additional nested parentheses that you might want to have for clarity or further code organization. For example those parentheses are not going to be removed:

return not (this or that)
decision = (maybe.this() and values > 0) or (maybe.that() and values < 0)

These parens are inner parens that you might want to have for clarity or further code organization. Following this rule, the existing parens in these cases should not be removed, i.e. we should implement #3260.

With above implemented, we can then take a look at whether we want to use explicit str concatenations, or force adding parens + implicit str concatenations. If this is a more or less arbitrary decision, I think we can chose explicit str concatenations here as it produces less diffs than force adding parens.

The examples below illustrate above decisions:

# before
function_call(
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
function_call(
    (" lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim"),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    (" lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim"),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

# after
function_call(
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
function_call(
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        + " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        + " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

@felix-hilden
Copy link
Collaborator

Alright, turns out that in terms of a cohesive parens policy, we're not really there 😅 I'm not sure how each case was decided, but we definitely don't preserve all "organising" parens.

playground link

Perhaps we should consider the bigger picture first.

@yilei
Copy link
Contributor

yilei commented Sep 21, 2022

@felix-hilden hmm, I think we draw a different line between "organising" and "non-organising" parens.

The purpose of them is to (significantly) increase readability, so maybe we should really call them readability-parens.

  1. For the examples from the "current" blacks style:

    return not (this or that)
    decision = (maybe.this() and values > 0) or (maybe.that() and values < 0)

    It's not difficult to agree that they are readability-parens as you don't need to memorize the operator priority table.

  2. For the parens of concatenated strings across multiple lines, I'd argue they also have significant readability benefits. The "improved string processing" from --preview style doesn't agree right now, hence Consider keeping the parens of implicitly concatenated strings when it's a argument of a function call #3260 should be implemented instead.

  3. For the newly removed unnecessary parens in the --preview style (https://black.readthedocs.io/en/stable/the_black_code_style/future_style.html#improved-parentheses-management), I don't think they are readability-parens, so they still fit.

  4. Other unremoved parens from playground link, with the exception of (a) == (b), they don't belong to readability-parens. They aren't removed yet I think that's because they haven't been looked at (unlike the ones from --preview style's "Improved parentheses management", which are recently looked at).

@chasefinch
Copy link
Author

Bump for this one.

I think the parens argument (now resolved) was not related, and muddied this thread.

I would still love to see explicit concatenation replace implicit concatenation in all cases, for the reasons mentioned at the top:

  • Lines with multiple whitespace characters in a row line up nicely
  • Lines with no whitespace can be split and still visually consistent (if desired)
  • Plus sign indicates that this is a line continuation, not a list, without having to glance at previous line ending
  • Avoid common coding errors by accidentally-left-out commas
  • Compatible with linters [1 2 3] which forbid implicit concatenation to enforce consistency and/or to help avoid aforementioned coding errors

@tylerlaprade
Copy link

As an additional data point, --preview broke my code for both Pyright (reportImplicitConcatenation) and Ruff/Flake8 (ISC002) today, so I had to revert to non-preview mode (which I was otherwise really enjoying for the other enhancements! Breaking on the right side instead of the left is so much more aesthetic).

@saolof
Copy link

saolof commented Sep 4, 2023

It may be worth mentioning that starlark (and google internal linters) explicitly bans implicit python string concatenation. Guido himself has mentioned that implicit string concat only exists because of backwards compatibility: https://mail.python.org/pipermail/python-ideas/2013-May/020557.html

My own first experience of black adding implicit concat somewhere was complete surprise and confusion at this being a language feature, so while I do not have a strong opinion about black adding + operators, I am very strongly against black introducing implicit string concat anywhere where it was not there before, since that also sort of blocks most linters from having a lint rule against it that is enabled by default.

@hauntsaninja hauntsaninja added the C: preview style Issues with the preview and unstable style. Add the name of the responsible feature in the title. label Sep 5, 2023
@JelleZijlstra JelleZijlstra changed the title Use explicit string concatenation in (experimental) string processing string_processing: Use explicit string concatenation Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: preview style Issues with the preview and unstable style. Add the name of the responsible feature in the title. F: strings Related to our handling of strings T: style What do we want Blackened code to look like?
Projects
None yet
Development

No branches or pull requests

6 participants