CLN: unify numpy.random-related imports #37492

Merged

onshek
Contributor

@onshek onshek commented Oct 29, 2020

update 2020-10-30

This is the replacement for #37103, since I messed up the git history in the previous PR (the conflicts are too complex to fix). The separate CI check is in #37117.

The whole project is reformatted except pandas/_testing.py, because quite a few testing files use tm.randn.

Below is the script that does the whole clean-up:

import os
import re


class NumpyRandomRelatedScript:
    def __init__(self, base_dir: str = "/dir/of/pandas/pandas") -> None:
        self.base_dir = base_dir
        self.py_files = []
        self.backup_files = []
        self.p1 = re.compile("import numpy as np")
        # Raw strings keep "\s" as a regex escape; "\." matches a literal dot.
        self.p2 = re.compile(r"from numpy\.random import[ ,a-zA-Z]*\s")
        self.p3 = re.compile(r"from numpy import random\s")

    def search_py_files(self, pandas_file_dir: str) -> None:
        if os.path.isfile(pandas_file_dir):
            if pandas_file_dir[-3:] == ".py":
                if pandas_file_dir != os.path.join(self.base_dir, "_testing.py"):
                    self.py_files.append(pandas_file_dir)
        elif os.path.isdir(pandas_file_dir):
            for d in os.listdir(pandas_file_dir):
                self.search_py_files(os.path.join(pandas_file_dir, d))

    def do_the_clean_up(self, file_dir: str) -> None:
        with open(file_dir, "r") as file:
            data = file.read()
            m1, m2, m3 = (
                re.search(self.p1, data),
                re.search(self.p2, data),
                re.search(self.p3, data),
            )
            if not (m2 or m3):
                # print("There's no need to change, please recheck!")
                return

        backup_dir = file_dir + ".issue37053_backup"
        self.backup_files.append(backup_dir)
        print("Backup: " + backup_dir)
        with open(backup_dir, "w+") as file:
            file.write(data)

        with open(file_dir, "w+") as file:
            if m2:
                if not m1:
                    data = re.sub(self.p2, "import numpy as np\n", data)
                    m1 = True
                else:
                    data = re.sub(self.p2, "", data)
                # Names imported from numpy.random; str.split always returns
                # a list, so no isinstance check is needed.
                methods = (
                    m2.group(0)
                    .replace("from numpy.random import", "")
                    .replace(" ", "")
                    .replace("\n", "")
                    .split(",")
                )
                for method in filter(None, methods):  # skip empty names
                    # Qualify bare calls such as "randn(" or "-randn(", keeping
                    # the text before the name (whitespace, "-", "(") intact.
                    data = re.sub(
                        r"(?<![\w.]){meth}\(".format(meth=re.escape(method)),
                        "np.random.{meth}(".format(meth=method),
                        data,
                    )
            if m3:
                if not m1:
                    data = re.sub(self.p3, "import numpy as np\n", data)
                else:
                    data = re.sub(self.p3, "", data)
                # Match a literal dot after "random" and keep the whitespace.
                data = re.sub(r"(\s)random\.", r"\1np.random.", data)
                data = re.sub(r"\(random\.", "(np.random.", data)
            file.write(data)
            print("Clean: " + file_dir)

    def remove_backup(self) -> None:
        for file in self.backup_files:
            print("Remove: " + file)
            os.remove(file)


if __name__ == "__main__":
    script = NumpyRandomRelatedScript()
    script.search_py_files(script.base_dir)
    for f in script.py_files:
        script.do_the_clean_up(f)
    script.remove_backup()
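
For illustration, the core rewrite can be sketched on an in-memory string rather than on files. This is a minimal sketch, not the code that was run for this PR: the helper name `qualify_numpy_random` and the sample source are hypothetical, and it only handles single-line `from numpy.random import ...` statements.

```python
import re

# Hypothetical helper for illustration only; it is not part of the PR.
FROM_IMPORT = re.compile(r"^from numpy\.random import ([ ,\w]+)\n", re.MULTILINE)


def qualify_numpy_random(source: str) -> str:
    """Rewrite calls to names imported from numpy.random as np.random.<name>(...)."""
    match = FROM_IMPORT.search(source)
    if match is None:
        return source
    names = [n.strip() for n in match.group(1).split(",") if n.strip()]
    # Drop the from-import; add "import numpy as np" only if it is missing.
    has_np = re.search(r"^import numpy as np$", source, re.MULTILINE) is not None
    source = FROM_IMPORT.sub("" if has_np else "import numpy as np\n", source, count=1)
    for name in names:
        # Only touch bare calls: not preceded by a word character or a dot,
        # so an existing "np.random.randn(" is left alone.
        pattern = r"(?<![\w.])" + re.escape(name) + r"\("
        source = re.sub(pattern, "np.random." + name + "(", source)
    return source


sample = (
    "import numpy as np\n"
    "from numpy.random import randn, rand\n"
    "\n"
    "x = randn(3)\n"
    "y = -rand(2) + np.random.randn(1)\n"
)
print(qualify_numpy_random(sample))
```

The negative lookbehind replaces the three separate substitutions (after whitespace, after "-", after "(") with one pattern and leaves the character before the name untouched.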

Member

@arw2019 arw2019 left a comment

Thanks @onshek!

Here or in a follow-on we'd like to add a CI check to enforce this in future PRs (unless it's already here and I missed it)

@arw2019 arw2019 added the Clean label Oct 29, 2020
@onshek
Contributor Author

onshek commented Oct 30, 2020

Hi @arw2019, the separate CI check is in #37117. Are these errors caused by my change?

@arw2019
Member

arw2019 commented Oct 30, 2020

The CI failures? I don't think so. Some lines can be a bit flaky

@onshek
Contributor Author

onshek commented Oct 30, 2020

I see, thanks @arw2019!
@jreback @charlesdong1991 @jbrockmendel let me know if there's any problem, thanks!

@onshek onshek changed the title CNL: unify numpy.random-related imports CLN: unify numpy.random-related imports Oct 30, 2020
@jreback jreback added this to the 1.2 milestone Oct 30, 2020
@jreback jreback added the Testing pandas testing functions or related to the test suite label Oct 30, 2020
@jreback jreback merged commit 31f59f7 into pandas-dev:master Oct 30, 2020
@jreback
Contributor

jreback commented Oct 30, 2020

Thanks @onshek. Happy to have a follow-up for the check (which you can implement using pre-commit); we already do some things like this, e.g. to enforce imports of DataFrame, Series and so on (so similar, but a slightly different check).

@onshek
Contributor Author

onshek commented Oct 30, 2020

Is this the pre-commit configuration? https://github.com/pandas-dev/pandas/blob/master/.pre-commit-config.yaml
If so, do I need to convert the related check in #37117 into an entry in the pre-commit file?
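
A minimal sketch of what such a check could look like, assuming it is written as a small Python function that a hook script calls on each file. The pattern and the names `FORBIDDEN` and `find_violations` are hypothetical; the actual check in #37117 may differ.

```python
import re

# Hypothetical pattern for this sketch: flags "from numpy.random import ..."
# and "from numpy import random". It does not catch "from numpy import nan, random".
FORBIDDEN = re.compile(r"^\s*from numpy(\.random import\b|\s+import\s+random\b)")


def find_violations(source: str) -> list:
    """Return (line_number, stripped_line) pairs for disallowed imports."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if FORBIDDEN.match(line)
    ]


example = (
    "import numpy as np\n"
    "from numpy.random import randn\n"
    "from numpy import random\n"
    "x = np.random.randn(3)\n"
)
for lineno, line in find_violations(example):
    print(f"line {lineno}: use np.random.<name> instead of: {line}")
```

pre-commit passes the staged file names to a hook as command-line arguments and treats a non-zero exit status as a failure, so a hook script built on this function would read each file and exit with status 1 whenever `find_violations` returns anything.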

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
ukarroum pushed a commit to ukarroum/pandas that referenced this pull request Nov 2, 2020