Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEV/STYLE: use ruff for linting #50160

Merged
merged 19 commits into from Jan 10, 2023
Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Dec 9, 2022

flake8 is one of the slow parts of our linting (which can become especially annoying when using it as a pre-commit check), and https://github.com/charliermarsh/ruff/ is a faster replacement, and is starting to picked up by several other projects.

This PR only replaces flake8 and flake8-bugbear on code (not docs) and yesqa with ruff. Later on, we could also consider replacing other parts of our linting with ruff (eg isort, pylint, pyupgrade, or some of our custom checks)

@jorisvandenbossche
Copy link
Member Author

Currently, the errors that I get are:

$ ruff pandas
Found 25 error(s).
pandas/core/arrays/casting.py:100:1: E501 Line too long (101 > 88 characters)
pandas/core/arrays/casting.py:111:1: E501 Line too long (99 > 88 characters)
pandas/core/arrays/casting.py:144:70: W292 No newline at end of file
pandas/core/computation/pytables.py:247:25: E713 Test for membership should be `not in`
pandas/core/generic.py:10825:52: B026 Star-arg unpacking after a keyword argument is strongly discouraged. It only works when the keyword parameter is declared after all parameters supplied by the unpacked sequence, and this change of ordering can surprise and mislead readers.
pandas/io/formats/latex.py:57:1: B024 `RowStringConverter` is an abstract base class, but it has no abstract methods
pandas/io/formats/style.py:2250:24: E714 Test for object identity should be `is not`
pandas/io/parsers/python_parser.py:1001:21: E741 Ambiguous variable name: `l`
pandas/plotting/_matplotlib/core.py:458:5: B027 `MPLPlot` is an empty method in an abstract base class, but has no abstract decorator
pandas/plotting/_matplotlib/core.py:667:5: B027 `MPLPlot` is an empty method in an abstract base class, but has no abstract decorator
pandas/plotting/_matplotlib/core.py:1858:20: E741 Ambiguous variable name: `l`
pandas/tests/arithmetic/test_datetime64.py:208:9: E741 Ambiguous variable name: `l`
pandas/tests/arithmetic/test_datetime64.py:211:13: E741 Ambiguous variable name: `l`
pandas/tests/frame/test_constructors.py:2481:22: E713 Test for membership should be `not in`
pandas/tests/indexes/common.py:35:5: F401 `pandas.core.api.UInt64Index` imported but unused
pandas/tests/indexes/period/test_indexing.py:781:20: E713 Test for membership should be `not in`
pandas/tests/indexes/period/test_indexing.py:782:20: E713 Test for membership should be `not in`
pandas/tests/io/formats/test_format.py:143:13: E741 Ambiguous variable name: `l`
pandas/tests/io/test_html.py:346:59: B026 Star-arg unpacking after a keyword argument is strongly discouraged. It only works when the keyword parameter is declared after all parameters supplied by the unpacked sequence, and this change of ordering can surprise and mislead readers.
pandas/tests/plotting/test_datetimelike.py:1024:16: E741 Ambiguous variable name: `l`
pandas/tests/plotting/test_datetimelike.py:1048:16: E741 Ambiguous variable name: `l`
pandas/tests/plotting/test_datetimelike.py:1065:16: E741 Ambiguous variable name: `l`
pandas/tests/plotting/test_datetimelike.py:1089:16: E741 Ambiguous variable name: `l`
pandas/tests/test_problem.py:8:5: F841 Local variable `res` is assigned to but never used
pandas/util/_doctools.py:80:21: E741 Ambiguous variable name: `l`

Are that all things we are OK with to fix, or do we want to ignore those rules instead?

(it seems we are currenlty not ignoring E741 (ambiguous short variable name) and E713 (Test for membership should be not in), but so ruff seems to find a few more of those compared to flake8?)

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Jan 1, 2023

I think it'd be good to fix them - is it OK if I add a commit to do that? I'd be pretty keen to get this in and remove flake8, it'd be a nice improvement to dev workflow


EDIT: timings from running on all files in CI:

title-capitalization              0.06
pandas-errors-documented          0.06
validate-min-versions-in-sync     0.07
end-of-file-fixer                 0.09
pip-to-conda                      0.09
pg8000-not-installed-CI           0.12
invalid-ea-testing                0.13
seed-check-asv                    0.13
future-annotations                0.13
no-bool-in-core-generic           0.14
cython-casting                    0.17
no-return-exception               0.21
incorrect-backticks               0.24
np-testing-array-equal            0.27
rst-backticks                     0.27
trailing-whitespace               0.33
double-quote-cython-strings       0.59
rst-directive-colons              0.63
use-pd_array-in-core              0.82
validate-errors-locations         0.85
rst-inline-touching-normal        0.87
flake8-pyi                        0.90
unwanted-patterns                 1.05
use-io-common-urlopen             1.37
cpplint                           1.67
codespell                         1.70
ruff                              3.35
flake8-rst                        3.57
check-test-naming                 3.96
absolufy-imports                  5.13
debug-statements                  5.29
isort                             6.59
cython-lint                       6.98
vulture                          10.31
sphinx-lint                      14.08
pyupgrade                        21.74
black                            59.38

3.35 seconds to lint the entire codebase...incredible 😱

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, big +1 from me (though I added a commit to fixup the remaining issues, so probably best if someone else takes a look too)

@@ -121,12 +115,6 @@ repos:
rev: v0.6.7
hooks:
- id: sphinx-lint
- repo: https://github.com/asottile/yesqa
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruff has its own version of yesqa

@jorisvandenbossche jorisvandenbossche changed the title Test ruff for linting DEV/STYLE: use ruff for linting Jan 2, 2023
@jorisvandenbossche jorisvandenbossche added the Code Style Code style, linting, code_checks label Jan 2, 2023
@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review January 2, 2023 14:31
additional_dependencies: &flake8_dependencies
- flake8==6.0.0
- flake8-bugbear==22.7.1
- pandas-dev-flaker==0.5.0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely removing flake8 here means we also stop using pandas-dev-flaker. I am not familiar with it, but are there things we would like to keep from that? Is it possible to run flake8 only with the pandas-dev-flaker checks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into that but it's not so simple...I think the simplest thing would be to revert the pandas-dev-flaker PR #50519

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see, that's indeed also a good option.

pyproject.toml Outdated
]

[tool.ruff.per-file-ignores]
# ignoring multiline is not yet supported (https://github.com/charliermarsh/ruff/issues/1052)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be fixed now (since 0.0.203)

MarcoGorelli and others added 2 commits January 2, 2023 14:37
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
pyproject.toml Outdated
"pandas/util/__init__.py" = ["F401"]
"pandas/tests/extension/base/__init__.py" = ["F401"]
# undefined name with global (https://github.com/charliermarsh/ruff/issues/960)
"pandas/io/clipboard/__init__.py" = ["F821"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also be fixed since 0.0.171

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, thanks! all the per-file-ignores ones could be removed in fact, the Literal one was just due to Literal having been imported from pandas._typing rather than typing, so a static tool doesn't recognise it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at pandas/core/reshape/merge.py, this may no longer be relevant, but as a heads up, we do now support a setting specifically for this use-case, to mark modules as "typing redefinitions`.

Issue that motivated it (from Airflow): astral-sh/ruff#1744
Docs: https://github.com/charliermarsh/ruff#typing-modules

@jorisvandenbossche
Copy link
Member Author

(though I added a commit to fixup the remaining issues, so probably best if someone else takes a look too)

All code changes look fine to me.


ignore = [
# space before : (needed for how black formats slicing)
# "E203", # not yet implemented
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might seem that ruff doesn't implement full of flake8, since some of those rules we ignore are not implemented, but the context is that those were not prioritized, exactly because those are not needed when using black (astral-sh/ruff#1527 (comment))

@MarcoGorelli
Copy link
Member

(though I added a commit to fixup the remaining issues, so probably best if someone else takes a look too)

All code changes look fine to me.

Cool - good to merge, or shall we wait for a 3rd opinion?

@mroeschke
Copy link
Member

+1 to move to a faster style checker. Probably comfortable merging #50519 first so we don't lose the pandas style checker checks in the meantime

pyproject.toml Outdated Show resolved Hide resolved
@MarcoGorelli
Copy link
Member

good points @twoertwein , thanks - we OK to get this in now?

There's a few more checks that can be enabled (e.g. TID252 allows to get rid of absolufy-imports), but perhaps let's keep that to a follow-up?

Regarding removing linters from the environment file - yeah I'd be totally in favour of that, let's just touch base on this in the call and then do it

@MarcoGorelli
Copy link
Member

Regarding removing linters from the environment file - yeah I'd be totally in favour of that, let's just touch base on this in the call and then do it

Just got:

  error    libmamba Could not solve for environment specs
      Encountered problems while solving:
        - nothing provides requested ruff 0.0.217**
      
      The environment can't be solved, aborting the operation
      
  critical libmamba Could not solve for environment specs
  Error: Failed to execute ["bash","-c","micromamba create -n test --strict-channel-priority -y --log-level warning -f /home/runner/work/pandas/pandas/environment.yml"]

Let's just spare ourselves with extra headaches and just not include it in the environment.yml file?

@jorisvandenbossche
Copy link
Member Author

conda-forge is at 0.0.215 (https://anaconda.org/conda-forge/ruff), so you can use that version

@jorisvandenbossche
Copy link
Member Author

Regarding removing linters from the environment file - yeah I'd be totally in favour of that, let's just touch base on this in the call and then do it

If you want to discuss that, can you open an issue about it? (that can be done in advance of the call)

@MarcoGorelli
Copy link
Member

sure, have reverted to 0.0.215

there's #46561 from sometime ago, just never got around to discussing it in calls, perhaps worth revisiting if there's time

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Jan 10, 2023
@MarcoGorelli
Copy link
Member

Let's get this in - thanks @jorisvandenbossche !

@MarcoGorelli MarcoGorelli merged commit db8af0e into pandas-dev:main Jan 10, 2023
@jorisvandenbossche
Copy link
Member Author

And thanks you for much of the follow-up work!

# Function definition does not bind loop variable
"B023",
# Functions defined inside a loop must not use variables redefined in the loop
# "B301", # not yet implemented

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the drive-by, but I believe this was removed from flake8-bugbear when it went Python 3-only: PyCQA/flake8-bugbear#182

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thanks for your input! yes there's a few things that could be addressed in a follow-up, such as this one

@charliermarsh
Copy link

This is so cool! Thank you for trying Ruff. I had no idea this was happening. You all should've been bugging me way more for fixes :)

(And sorry again for the PR spam. Just never thought I'd see Ruff in a project like Pandas :))

@jorisvandenbossche
Copy link
Member Author

Just never thought I'd see Ruff in a project like Pandas :)

And thanks a lot to you for Ruff, it's a great project making the python linting experience so much better!

kaxil added a commit to astronomer/astro-sdk that referenced this pull request Jan 12, 2023
https://github.com/charliermarsh/ruff/ is a faster replacement of most of the linting tools we use, and is starting to picked up by several other projects. Even project like Pandas have adopted Ruff (pandas-dev/pandas#50160)

This PR replaces flake8, isort, pyupgrade.
kaxil added a commit to astronomer/astro-sdk that referenced this pull request Jan 12, 2023
https://github.com/charliermarsh/ruff/ is a faster replacement of most of the linting tools we use, and is starting to picked up by several other projects. Even project like Pandas have adopted Ruff (pandas-dev/pandas#50160)

This PR replaces flake8, isort, pyupgrade.
kaxil added a commit to astronomer/astro-sdk that referenced this pull request Jan 12, 2023
https://github.com/charliermarsh/ruff/ is a faster replacement of most of the linting tools we use, and is starting to picked up by several other projects. Even project like Pandas have adopted Ruff (pandas-dev/pandas#50160)

This PR replaces flake8, isort, pyupgrade.
kaxil added a commit to astronomer/astro-sdk that referenced this pull request Jan 12, 2023
https://github.com/charliermarsh/ruff/ is a faster replacement of most
of the linting tools we use, and is starting to picked up by several
other projects. Even project like Pandas have adopted Ruff
(pandas-dev/pandas#50160)

This PR replaces flake8, isort, pyupgrade.

```
❯ time pre-commit run flake8 --all-files
flake8...................................................................Passed
pre-commit run flake8 --all-files  3.48s user 0.55s system 372% cpu 1.084 total

❯ time pre-commit run isort --all-files
Run isort................................................................Passed
pre-commit run isort --all-files  0.31s user 0.18s system 50% cpu 0.973 total

❯ time pre-commit run pyupgrade --all-files
pyupgrade................................................................Passed
pre-commit run pyupgrade --all-files  1.26s user 0.23s system 280% cpu 0.530 total
```
 vs
```
❯ time pre-commit run ruff --all-files
ruff.....................................................................Passed
pre-commit run ruff --all-files  0.37s user 0.16s system 142% cpu 0.373 total
```

**_Drops from 5s to 0.3s_**


Some other popular tools that have adopted it:

- [FastAPI](https://github.com/tiangolo/fastapi)
- [Bokeh](https://github.com/bokeh/bokeh)
- [Zulip](https://github.com/zulip/zulip)
- [Pydantic](https://github.com/pydantic/pydantic)
- [Sphinx](https://github.com/sphinx-doc/sphinx)
- [Hatch](https://github.com/pypa/hatch)
- [Jupyter](https://github.com/jupyter-server/jupyter_server)
- [Synapse](https://github.com/matrix-org/synapse)
- [Saleor](https://github.com/saleor/saleor)
- [Polars](https://github.com/pola-rs/polars)
- [Ibis](https://github.com/ibis-project/ibis)
- [OpenBB](https://github.com/OpenBB-finance/OpenBBTerminal)

It will also be used by Apache Airflow :) and should be used in
Astronomer-providers too.
charliermarsh pushed a commit to astral-sh/ruff that referenced this pull request Jan 12, 2023
utkarsharma2 pushed a commit to astronomer/astro-sdk that referenced this pull request Jan 17, 2023
https://github.com/charliermarsh/ruff/ is a faster replacement of most
of the linting tools we use, and is starting to picked up by several
other projects. Even project like Pandas have adopted Ruff
(pandas-dev/pandas#50160)

This PR replaces flake8, isort, pyupgrade.

```
❯ time pre-commit run flake8 --all-files
flake8...................................................................Passed
pre-commit run flake8 --all-files  3.48s user 0.55s system 372% cpu 1.084 total

❯ time pre-commit run isort --all-files
Run isort................................................................Passed
pre-commit run isort --all-files  0.31s user 0.18s system 50% cpu 0.973 total

❯ time pre-commit run pyupgrade --all-files
pyupgrade................................................................Passed
pre-commit run pyupgrade --all-files  1.26s user 0.23s system 280% cpu 0.530 total
```
 vs
```
❯ time pre-commit run ruff --all-files
ruff.....................................................................Passed
pre-commit run ruff --all-files  0.37s user 0.16s system 142% cpu 0.373 total
```

**_Drops from 5s to 0.3s_**


Some other popular tools that have adopted it:

- [FastAPI](https://github.com/tiangolo/fastapi)
- [Bokeh](https://github.com/bokeh/bokeh)
- [Zulip](https://github.com/zulip/zulip)
- [Pydantic](https://github.com/pydantic/pydantic)
- [Sphinx](https://github.com/sphinx-doc/sphinx)
- [Hatch](https://github.com/pypa/hatch)
- [Jupyter](https://github.com/jupyter-server/jupyter_server)
- [Synapse](https://github.com/matrix-org/synapse)
- [Saleor](https://github.com/saleor/saleor)
- [Polars](https://github.com/pola-rs/polars)
- [Ibis](https://github.com/ibis-project/ibis)
- [OpenBB](https://github.com/OpenBB-finance/OpenBBTerminal)

It will also be used by Apache Airflow :) and should be used in
Astronomer-providers too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Style Code style, linting, code_checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants