Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Add unwanted pattern check #37298

Merged
merged 2 commits into from
Oct 21, 2020
Merged

CI: Add unwanted pattern check #37298

merged 2 commits into from
Oct 21, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Oct 21, 2020

Follow up to #37188

@dsaxton dsaxton added CI Continuous Integration Code Style Code style, linting, code_checks labels Oct 21, 2020
@MarcoGorelli
Copy link
Member

MarcoGorelli commented Oct 21, 2020

Nice! Would it work to make this a local pygrep hook?

Something like

    -   id: unwanted-namespace-usage
        name: Check for unwanted namespace usage in tests
        language: pygrep
        entry: |
            (?x)
            pd\.DataFrame|
            pd\.Series
        types: [python]
        files: ^pandas/tests

?

(if I've understood what this does - are you jus checking for instances of pd.DataFrame in the test suite?)

@jreback jreback added this to the 1.2 milestone Oct 21, 2020
@jreback jreback merged commit 021d831 into pandas-dev:master Oct 21, 2020
@jreback
Copy link
Contributor

jreback commented Oct 21, 2020

thanks @dsaxton

yes making this a pre-commit hook would be good

@dsaxton
Copy link
Member Author

dsaxton commented Oct 21, 2020

Nice! Would it work to make this a local pygrep hook?

Something like

    -   id: unwanted-namespace-usage
        name: Check for unwanted namespace usage in tests
        language: pygrep
        entry: |
            (?x)
            pd\.DataFrame|
            pd\.Series
        types: [python]
        files: ^pandas/tests

?

(if I've understood what this does - are you jus checking for instances of pd.DataFrame in the test suite?)

@MarcoGorelli This is trying to make sure that we aren't using both pd.DataFrame and DataFrame in the same test file. So it's okay if we use pd.DataFrame, just not in this way:

import pandas as pd
from pandas import DataFrame

df = pd.DataFrame(...)

As far as I can tell there's no way to write this as a single grep command, but maybe there is still a way to add it to pre-commit?

@dsaxton dsaxton deleted the ci-unwanted-pattern-dataframe branch October 21, 2020 13:58
@MarcoGorelli
Copy link
Member

@dsaxton Thanks for explaining! You could do

    -   id: namespace
        name: namespace
        language: pygrep
        entry: |
            (?x)
            (
                pd\.DataFrame\(    # pd.DataFrame(
                .*                 # match anything
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                DataFrame\(        # DataFrame(
            )|
            (
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                DataFrame\(        # DataFrame(
                .*                 # match anything
                pd\.DataFrame\(    # pd.DataFrame(
            )
        types: [python]
        args: [--multiline]
        files: ^pandas/tests

This brings up a further inconsistency in

diff --git a/pandas/tests/io/test_compression.py b/pandas/tests/io/test_compression.py
index 31e9ad4cf..8cbe8d264 100644
--- a/pandas/tests/io/test_compression.py
+++ b/pandas/tests/io/test_compression.py
@@ -205,8 +205,8 @@ def test_with_missing_lzma_runtime():
         import sys
         import pytest
         sys.modules['lzma'] = None
-        import pandas
-        df = pandas.DataFrame()
+        import pandas as pd
+        df = pd.DataFrame()

I'd be tempted to make a custom script if we want to add this to pre-commit though. IMO it'd be worth it, as accidentally using pd.DataFrame instead of DataFrame in tests is likely quite common, and waiting for the CI checks to all pass isn't idea...I'll pick this up later

Anyway, thanks for having done this!

@dsaxton
Copy link
Member Author

dsaxton commented Oct 21, 2020

    -   id: namespace
        name: namespace
        language: pygrep
        entry: |
            (?x)
            (
                pd\.DataFrame\(    # pd.DataFrame(
                .*                 # match anything
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                DataFrame\(        # DataFrame(
            )|
            (
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                DataFrame\(        # DataFrame(
                .*                 # match anything
                pd\.DataFrame\(    # pd.DataFrame(
            )
        types: [python]
        args: [--multiline]
        files: ^pandas/tests

Nice, yeah if it could be done in pre-commit that's probably better. Is there a way to make a single regex extensible to other classes (e.g., Series, Index, etc.) or would those be added hooks?

@MarcoGorelli
Copy link
Member

I'm not sure, I think we'd need a custom Python script which gets called with an argument (e.g. DataFrame), but I'll come back to this

@MarcoGorelli
Copy link
Member

@dsaxton got it, we can do

        language: pygrep
        entry: |
            (?x)
            (
                pd\.(\w+)\(        # pd.DataFrame(
                .*                 # match anything
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                \2\(               # DataFrame(
            )|
            (
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                (\w+)\(            # DataFrame(
                .*                 # match anything
                pd\.\4\(           # pd.DataFrame(
            )
        types: [python]
        args: [--multiline]
        files: ^pandas/tests

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Code Style Code style, linting, code_checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants