CI: Add unwanted pattern check #37298

dsaxton · 2020-10-21T01:23:52Z

Follow up to #37188

MarcoGorelli · 2020-10-21T08:05:52Z

Nice! Would it work to make this a local pygrep hook?

Something like

    -   id: unwanted-namespace-usage
        name: Check for unwanted namespace usage in tests
        language: pygrep
        entry: |
            (?x)
            pd\.DataFrame|
            pd\.Series
        types: [python]
        files: ^pandas/tests

?

(if I've understood what this does - are you jus checking for instances of pd.DataFrame in the test suite?)

jreback · 2020-10-21T12:31:21Z

thanks @dsaxton

yes making this a pre-commit hook would be good

dsaxton · 2020-10-21T13:58:10Z

Nice! Would it work to make this a local pygrep hook?

Something like

    -   id: unwanted-namespace-usage
        name: Check for unwanted namespace usage in tests
        language: pygrep
        entry: |
            (?x)
            pd\.DataFrame|
            pd\.Series
        types: [python]
        files: ^pandas/tests

?

(if I've understood what this does - are you jus checking for instances of pd.DataFrame in the test suite?)

@MarcoGorelli This is trying to make sure that we aren't using both pd.DataFrame and DataFrame in the same test file. So it's okay if we use pd.DataFrame, just not in this way:

import pandas as pd
from pandas import DataFrame

df = pd.DataFrame(...)

As far as I can tell there's no way to write this as a single grep command, but maybe there is still a way to add it to pre-commit?

MarcoGorelli · 2020-10-21T14:40:51Z

@dsaxton Thanks for explaining! You could do

    -   id: namespace
        name: namespace
        language: pygrep
        entry: |
            (?x)
            (
                pd\.DataFrame\(    # pd.DataFrame(
                .*                 # match anything
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                DataFrame\(        # DataFrame(
            )|
            (
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                DataFrame\(        # DataFrame(
                .*                 # match anything
                pd\.DataFrame\(    # pd.DataFrame(
            )
        types: [python]
        args: [--multiline]
        files: ^pandas/tests

This brings up a further inconsistency in

diff --git a/pandas/tests/io/test_compression.py b/pandas/tests/io/test_compression.py
index 31e9ad4cf..8cbe8d264 100644
--- a/pandas/tests/io/test_compression.py
+++ b/pandas/tests/io/test_compression.py
@@ -205,8 +205,8 @@ def test_with_missing_lzma_runtime():
         import sys
         import pytest
         sys.modules['lzma'] = None
-        import pandas
-        df = pandas.DataFrame()
+        import pandas as pd
+        df = pd.DataFrame()

I'd be tempted to make a custom script if we want to add this to pre-commit though. IMO it'd be worth it, as accidentally using pd.DataFrame instead of DataFrame in tests is likely quite common, and waiting for the CI checks to all pass isn't idea...I'll pick this up later

Anyway, thanks for having done this!

dsaxton · 2020-10-21T17:31:49Z

    -   id: namespace
        name: namespace
        language: pygrep
        entry: |
            (?x)
            (
                pd\.DataFrame\(    # pd.DataFrame(
                .*                 # match anything
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                DataFrame\(        # DataFrame(
            )|
            (
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                DataFrame\(        # DataFrame(
                .*                 # match anything
                pd\.DataFrame\(    # pd.DataFrame(
            )
        types: [python]
        args: [--multiline]
        files: ^pandas/tests

Nice, yeah if it could be done in pre-commit that's probably better. Is there a way to make a single regex extensible to other classes (e.g., Series, Index, etc.) or would those be added hooks?

MarcoGorelli · 2020-10-21T17:49:50Z

I'm not sure, I think we'd need a custom Python script which gets called with an argument (e.g. DataFrame), but I'll come back to this

MarcoGorelli · 2020-10-22T08:19:42Z

@dsaxton got it, we can do

        language: pygrep
        entry: |
            (?x)
            (
                pd\.(\w+)\(        # pd.DataFrame(
                .*                 # match anything
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                \2\(               # DataFrame(
            )|
            (
                (?<!pd\.)          # negative lookbehind: pd.
                (?<!\w)            # negative lookbehind: any character
                (\w+)\(            # DataFrame(
                .*                 # match anything
                pd\.\4\(           # pd.DataFrame(
            )
        types: [python]
        args: [--multiline]
        files: ^pandas/tests

CI: Add unwanted pattern check

19fcd17

dsaxton added CI Continuous Integration Code Style Code style, linting, code_checks labels Oct 21, 2020

Fix

78dbd31

jreback added this to the 1.2 milestone Oct 21, 2020

jreback merged commit 021d831 into pandas-dev:master Oct 21, 2020

dsaxton deleted the ci-unwanted-pattern-dataframe branch October 21, 2020 13:58

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020

CI: Add unwanted pattern check (pandas-dev#37298)

314e8ea

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

CI: Add unwanted pattern check (pandas-dev#37298)

d436dbd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CI: Add unwanted pattern check #37298

CI: Add unwanted pattern check #37298

dsaxton commented Oct 21, 2020

MarcoGorelli commented Oct 21, 2020 •

edited

Loading

jreback commented Oct 21, 2020

dsaxton commented Oct 21, 2020

MarcoGorelli commented Oct 21, 2020

dsaxton commented Oct 21, 2020

MarcoGorelli commented Oct 21, 2020

MarcoGorelli commented Oct 22, 2020

CI: Add unwanted pattern check #37298

CI: Add unwanted pattern check #37298

Conversation

dsaxton commented Oct 21, 2020

MarcoGorelli commented Oct 21, 2020 • edited Loading

jreback commented Oct 21, 2020

dsaxton commented Oct 21, 2020

MarcoGorelli commented Oct 21, 2020

dsaxton commented Oct 21, 2020

MarcoGorelli commented Oct 21, 2020

MarcoGorelli commented Oct 22, 2020

MarcoGorelli commented Oct 21, 2020 •

edited

Loading