PySpark long instructions style #1816

Closed
vmgustavo opened this issue Nov 11, 2020 · 4 comments
Labels
F: linebreak How should we split up lines? R: duplicate This issue or pull request already exists T: style What do we want Blackened code to look like?

Comments

@vmgustavo

Describe the style change
PySpark code frequently uses long statements chained with the dot operator. The official Apache Spark tutorials on how to use PySpark suggest using \ to break such a statement. Black's current style splits the statement only to stay within the line-length limit and doesn't consider readability. Splitting the statement at the dot operators would read better.

Examples in the current Black style

spark = (
    SparkSession.builder.appName("Python Spark SQL basic example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)
data.repartition("column").write.partitionBy("column").mode("ignore").option(
    "compress", "snappy"
).parquet(output_path)

Desired style
It should follow the Apache Spark code style:

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

Or, if not using \ to split the lines:

spark = (
    SparkSession
    .builder
    .appName("Python Spark SQL basic example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)
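
As a quick sanity check, the three layouts of the first example shown in this issue differ only in formatting. Here is a minimal sketch using the standard library `ast` module to confirm they parse to the same tree. The helper name `same_ast` is made up for illustration and is not part of Black.

```python
import ast

# Black's current output for the first example, copied from this issue.
current = '''
spark = (
    SparkSession.builder.appName("Python Spark SQL basic example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)
'''

# Backslash style; a raw string keeps the line-continuation backslashes intact.
backslash = r'''
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()
'''

# Parenthesized style with one attribute or call per line.
parenthesized = '''
spark = (
    SparkSession
    .builder
    .appName("Python Spark SQL basic example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)
'''


def same_ast(*sources: str) -> bool:
    """Hypothetical helper: True if every source string parses to the same AST."""
    # ast.dump omits line and column numbers by default, so layout is ignored.
    return len({ast.dump(ast.parse(src)) for src in sources}) == 1


print(same_ast(current, backslash, parenthesized))  # True
```

Since the parsed program is identical in all three cases, the choice between them is purely a question of layout.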

Additional context

@vmgustavo vmgustavo added the T: style What do we want Blackened code to look like? label Nov 11, 2020
@JelleZijlstra JelleZijlstra added the F: linebreak How should we split up lines? label May 30, 2021
@JelleZijlstra
Collaborator

We will not use the style with backslashes; that's against Black's general style principles.

I'm most interested in changing the output for your second example, with data.repartition. That does look ugly right now, and I'd welcome suggestions for how to make it better.

@ichard26 ichard26 added the S: needs discussion Needs further hashing out before ready for implementation (on desirability, feasibility, etc.) label Jan 29, 2022
@vmgustavo
Author

By simply wrapping the statement in parentheses, the style is already great:

(data.repartition("column").write.partitionBy("column").mode("ignore").option(
    "compress", "snappy"
).parquet(output_path))
(
    data.repartition("column")
    .write.partitionBy("column")
    .mode("ignore")
    .option("compress", "snappy")
    .parquet(output_path)
)

The bad formatting only happens when there are no parentheses. Would it be possible to "force add" parentheses?
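
To make the "force add" idea concrete, here is a rough sketch of the wrapping step on its own. It is not how Black works internally. The helper `force_parens` is hypothetical, it only handles a single bare expression statement, and it uses the standard library `ast` module to check that the added parentheses leave the parsed program unchanged.

```python
import ast


def force_parens(statement: str) -> str:
    """Hypothetical helper: wrap a single bare expression statement in parentheses."""
    tree = ast.parse(statement)
    # Leave assignments, multiple statements, and empty input untouched.
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Expr):
        return statement
    wrapped = "(" + statement.strip() + ")"
    # Limitation of this sketch: a trailing comment would swallow the closing
    # parenthesis, and ast.parse would then raise SyntaxError here.
    if ast.dump(ast.parse(wrapped)) != ast.dump(tree):
        raise ValueError("wrapping changed the meaning of the statement")
    return wrapped


# Black's current output for the second example, copied from this issue.
current = '''data.repartition("column").write.partitionBy("column").mode("ignore").option(
    "compress", "snappy"
).parquet(output_path)'''

print(force_parens(current))
```

The printed result is the parenthesized input from the comment above, which Black then lays out with one call per line.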

@felix-hilden
Collaborator

@vmgustavo We have an open issue about improvements to fluent interfaces: #571

@felix-hilden
Collaborator

And I think the case Jelle was interested in is also covered by that issue, so I'll close this as a duplicate.

@felix-hilden felix-hilden added R: duplicate This issue or pull request already exists and removed S: needs discussion Needs further hashing out before ready for implementation (on desirability, feasibility, etc.) labels Jan 31, 2022