Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: unify numpy.random-related imports in pandas/tests/plotting #37103

Conversation

onshek
Copy link
Contributor

@onshek onshek commented Oct 13, 2020

latest update

all files are moved to #37492


A separate PR(CI) will be opened to add related code_check(#37117).
PS: Should a whatsnew note be added?

update

The whole project is reformatted except pandas/_testing.py for there are quite a few tm.randn used in testing files.

below is the script to do the whole clean-up

import os
import re


class NumpyRandomRelatedScript:
    def __init__(self, base_dir: str = "/dir/of/pandas/pandas") -> None:
        self.base_dir = base_dir
        self.py_files = []
        self.backup_files = []
        self.p1 = re.compile("import numpy as np")
        self.p2 = re.compile("from numpy.random import[ ,a-zA-Z]*[\s]")
        self.p3 = re.compile("from numpy import random[\s]")

    def search_py_files(self, pandas_file_dir: str) -> None:
        if os.path.isfile(pandas_file_dir):
            if pandas_file_dir[-3:] == ".py":
                if pandas_file_dir != os.path.join(self.base_dir, "_testing.py"):
                    self.py_files.append(pandas_file_dir)
        elif os.path.isdir(pandas_file_dir):
            for d in os.listdir(pandas_file_dir):
                self.search_py_files(os.path.join(pandas_file_dir, d))

    def do_the_clean_up(self, file_dir: str) -> None:
        with open(file_dir, "r") as file:
            data = file.read()
            m1, m2, m3 = (
                re.search(self.p1, data),
                re.search(self.p2, data),
                re.search(self.p3, data),
            )
            if not (m2 or m3):
                # print("There's no need to change, please recheck!")
                return

        backup_dir = file_dir + ".issue37053_backup"
        self.backup_files.append(backup_dir)
        print("Backup: " + backup_dir)
        with open(backup_dir, "w+") as file:
            file.write(data)

        with open(file_dir, "w+") as file:
            if m2:
                if not m1:
                    data = re.sub(self.p2, "import numpy as np\n", data)
                    m1 = True
                else:
                    data = re.sub(self.p2, "", data)
                methods = (
                    m2.group(0)
                    .replace("from numpy.random import", "")
                    .replace(" ", "")
                    .replace("\n", "")
                    .split(",")
                )
                if isinstance(methods, str):
                    methods = [methods]
                for method in methods:
                    data = re.sub(
                        r"[\s]{meth}[^a-z]".format(meth=method),
                        " np.random.{meth}(".format(meth=method),
                        data,
                    )
                    data = re.sub(
                        r"[\s]-{meth}[^a-z]".format(meth=method),
                        " -np.random.{meth}(".format(meth=method),
                        data,
                    )
                    data = re.sub(
                        r"[\(]{meth}[^a-z]".format(meth=method),
                        "(np.random.{meth}(".format(meth=method),
                        data,
                    )
            if m3:
                if not m1:
                    data = re.sub(self.p3, "import numpy as np\n", data)
                else:
                    data = re.sub(self.p3, "", data)
                data = re.sub(r"[\s]random.{1}", " np.random.", data)
                data = re.sub(r"[\(]random.{1}", "(np.random.", data)
            file.write(data)
            print("Clean: " + file_dir)

    def remove_backup(self) -> None:
        for file in self.backup_files:
            print("Remove: " + file)
            os.remove(file)


if __name__ == "__main__":
    script = NumpyRandomRelatedScript()
    script.search_py_files(script.base_dir)
    for f in script.py_files:
        script.do_the_clean_up(f)
    script.remove_backup()

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @onshek for the PR!

No need for a whatsnew since this isn't a user facing change.

Also maybe note in the description that this partially closes #37053 (we'll leave that open until we get the code check in)

@onshek
Copy link
Contributor Author

onshek commented Oct 13, 2020

Thanks @onshek for the PR!

No need for a whatsnew since this isn't a user facing change.

Also maybe note in the description that this partially closes #37053 (we'll leave that open until we get the code check in)

updated

@charlesdong1991 charlesdong1991 changed the title ENH: unify numpy.random-related imports in pandas/tests/plotting CLN: unify numpy.random-related imports in pandas/tests/plotting Oct 13, 2020
Copy link
Member

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm, shall we hold this PR up and implement the logic in code_check.sh first and do this whole clean-up in one go by running code_check.sh? I assume this unification might pop up in other files too

@mroeschke
Copy link
Member

mroeschke commented Oct 13, 2020

As noted in #37053 (comment), I would personally like to standardize towards np.random.<method>(...) instead of from numpy.random import <method>; <method>(...).

IMO it's easier to reason about where <method> comes from.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we don't want to do this. apologize if i said this was ok omewhere else. we always want fully qualified imports. We spent a lot of time removing this over the past few years and don't want to add it back. fuly qualified is much more obvious / explicit. what we do want are explicit checks that we are not using the shortcut imports.

@onshek
Copy link
Contributor Author

onshek commented Oct 14, 2020

Fully understand what you said above. Well this PR is still necessary I think, since from numpy import random and from numpy.random import randn also do exist. A code_check will be added as well.
Thanks for your correction ! @charlesdong1991 @mroeschke @jreback @arw2019
One more question, is the new code check used for checking (just catching the location where random.<method> / <method> are used), or we also want it to generate the correct imports?

@charlesdong1991 charlesdong1991 removed the Visualization plotting label Oct 14, 2020
@jbrockmendel
Copy link
Member

@onshek can you merge master

jbrockmendel and others added 12 commits October 28, 2020 20:57
…v#37219)

* DOC: Updated resample.py and groupby.py to fix SA04 Errors

* DOC: Fixed flake8 issue

* pandas-dev#28792 DOC: Fixed SA04 Errors

* pandas-dev#28792 Doc Fix SA04 Errors

* Update pandas/core/arrays/integer.py

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

* Update pandas/core/arrays/boolean.py

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

* pandas-dev#28792 DOC: udpated reviewers comments

* pandas-dev#28792 DOC: Fixing flake8 Errors

Co-authored-by: Nagesh Kumar C <Nageshkumar.c@ab-inbev.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
@onshek
Copy link
Contributor Author

onshek commented Oct 29, 2020

@onshek can you merge master

sure, will do it later

@onshek onshek closed this Oct 29, 2020
@onshek onshek deleted the issue37053-unify-numpy-related-imports-in-pandas/tests/plotting branch October 29, 2020 17:26
@onshek onshek restored the issue37053-unify-numpy-related-imports-in-pandas/tests/plotting branch October 29, 2020 17:28
…plotting' of https://github.com/onshek/pandas into issue37053-unify-numpy-related-imports-in-pandas/tests/plotting
@onshek onshek reopened this Oct 29, 2020
…s/tests/plotting' of https://github.com/onshek/pandas into issue37053-unify-numpy-related-imports-in-pandas/tests/plotting"

This reverts commit 50a92fb, reversing
changes made to 5532ae8.
@onshek
Copy link
Contributor Author

onshek commented Oct 29, 2020

all files are moved to #37492

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLN: unify numpy.random-related imports