Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Use consistent imports in tests #23018

Closed
h-vetinari opened this issue Oct 6, 2018 · 22 comments
Closed

CLN: Use consistent imports in tests #23018

h-vetinari opened this issue Oct 6, 2018 · 22 comments
Labels
Clean Code Style Code style, linting, code_checks Testing pandas testing functions or related to the test suite

Comments

@h-vetinari
Copy link
Contributor

A while ago, I got the review from @jreback that:

I kept applying these rules until #22730, where @jreback then said:

Before we do any more of this
[quote of the above]
I would like a lint rule to see what is what (and basically list the offenders as exclusions for now).
We have been changing things like this for a long time and I think we somtimes change them back (e.g. to use the imports of assert_*, rather than tm., and to use the pd.* rather than the direct imports.

and further:

let's see which way is more common, then have a lint rule to enforce it, rather than ad-hoc conversions (which has been the policy up till now).

Opening this issue for someone to write such a lint rule, because I wouldn't know where to start.

@pambot
Copy link
Contributor

pambot commented Oct 8, 2018

@h-vetinari I'll give this a shot

@h-vetinari
Copy link
Contributor Author

@pambot Great!

@pambot
Copy link
Contributor

pambot commented Oct 8, 2018

@jreback Pinging you because it was your idea to add the linting rule - I'm happy to do it, but what's the best way to get get started? I looked around, and all I can find is ci/lint.sh (but that's probably just for CI) and a suggestion on StackOverflow to modify a tox.ini file (but it doesn't look like Tox is supported currently)

@h-vetinari
Copy link
Contributor Author

@pambot
It was not my idea but @jreback's. I opened this issue to record that review feedback for someone else, because I don't know how to tackle setting up custom linting rules...

Any tips for @pambot, @TomAugspurger @datapythonista, maybe?

@TomAugspurger
Copy link
Contributor

You may want to wait till #22854 and #22863 are in before working on this too much, since those are changing up the linters. I would also wait for some consensus on what we want these import rules to look like.

Finally, I think @alimcmaster1 has some plans for a larger import order refactor using isort. That will be a large change that'll cause conflicts in a lot of outstanding PRs. Depending on the size of the changes for this issue, it may be good to do those two PRs close together, so that people only have to fix conflicts once.

@TomAugspurger
Copy link
Contributor

On this issue in particular, I don't have strong preferences.

  • I prefer tm.assert* over assert*
  • I slightly prefer pd. rather than directly importing top-level things in the test files

@pambot
Copy link
Contributor

pambot commented Oct 8, 2018

Sounds good, I'm cool with waiting.

@h-vetinari
Copy link
Contributor Author

@TomAugspurger

Thanks for the infos/feedback.

Re:

I slightly prefer pd. rather than directly importing top-level things in the test files

there might be good reasons for having different rules between pandas/core and pandas/tests, but in the former, it's almost exclusively(?) direct imports.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Oct 8, 2018 via email

@jreback
Copy link
Contributor

jreback commented Oct 8, 2018

just to be clea
this is really aimed at the test files
we may extend this to the code itself at a later point

@datapythonista datapythonista changed the title CLN: internal import rules CLN: Use consistent imports in tests Oct 9, 2018
@datapythonista datapythonista added Testing pandas testing functions or related to the test suite Code Style Code style, linting, code_checks Clean labels Oct 9, 2018
@SaturnFromTitan
Copy link
Contributor

I just had a quick look at this. Using the following grep command with varying search texts gives the number of files where this pattern is used:
grep -R -l --include="*.py" <search_text> pandas/tests/ | wc -l

search_text = "from pandas.util.testing import ": 118
search_text = "import pandas.util.testing as tm": 324
search_text = "from pandas import ": 367
search_text = "import pandas as pd": 308

For the test utils, the tm. import seems to be pretty dominant - so that decision should be straight forward.

In the .pd case, I'd vote for direct imports because
1.) It's already the dominant variation (migrating 60 additional files is quite some effort)
2.) It's in line with @jreback's earlier review comments
3.) According to @TomAugspurger's comment above, this is the way it is mostly done in the rest of the project as well

Wdyt?

@jreback
Copy link
Contributor

jreback commented Oct 27, 2019

i would be +1 in applying those as lint rules (have to fix the current first of course)

prefer

search_text = "import pandas.util.testing as tm": 324
search_text = "from pandas import ": 367

@WillAyd
Copy link
Member

WillAyd commented Oct 28, 2019

The more I've been reading up on it I would prefer import pandas as pd to from pandas import ... if only because the latter seems more prone to circular imports; this probably doesn't matter much for tests but I think sets a better standard throughout code base

Guido himself recommends against using from <module> import ... :

https://docs.python.org/3/faq/programming.html?highlight=circular%20import#how-can-i-have-modules-that-mutually-import-each-other

@h-vetinari
Copy link
Contributor Author

@WillAyd [my bold]: The more I've been reading up on it I would prefer import pandas as pd to from pandas import ... if only because the latter seems more prone to circular imports; this probably doesn't matter much for tests but I think sets a better standard throughout code base

That may be true for the top level modules, but it is not true anymore when submodules are targeted directly, i.e. the following are not equivalent:

  1. import pandas as pd and then using pd.Series
  2. from pandas.core.series import Series and then using Series

Number 1. cannot be used in any code that defines something that is itself exposed through the toplevel module. Number 2. can be imported even there (except within pandas.core.series itself, obviously).

For this reason, it makes sense to stick with option 2. IMO, because it can be used consistently in more situations (and especially within the core itself).

@WillAyd
Copy link
Member

WillAyd commented Oct 28, 2019

Number 1. cannot be used in any code that defines something that is itself exposed through the toplevel module.

I don't believe that is true. read_json for instance is exposed in the top level and imports from there as well. Even pandas.core.series does a import pandas as pd

For this reason, it makes sense to stick with option 2. IMO, because it can be used consistently in more situations (and especially within the core itself).

I didn't realize this was one of the options, but I am -1 on that. Tests should generally be importing objects from the same locations that an end user would

@h-vetinari
Copy link
Contributor Author

@WillAyd: I don't believe that is true. read_json for instance is exposed in the top level and imports from there as well. Even pandas.core.series does a import pandas as pd

Interesting. That goes against my experience but python may have gotten smarter about this with doing some lazy importing. For example, pd is only used once in pandas.core.series in apply, but pd still cannot be used in the class constructor, for example (just tested this on master).

@WillAyd: I didn't realize this was one of the options, but I am -1 on that. Tests should generally be importing objects from the same locations that an end user would

There's probably a misunderstanding here. For tests, both styles are fine. But since you were specifically aiming at "a better standard throughout code base", I responded that the core has different restrictions. And if a uniform standard is desired, then tests should import from pandas ... (without submodules though).

@WillAyd
Copy link
Member

WillAyd commented Oct 28, 2019

Ah OK - sorry yea I think we misunderstood each other on the last post. I was specifically talking about tests.

Expanding on other areas in the code base though, from ... imports still will cause more circular dependencies than aliasing. You can see this locally with two modules categorical and dtype

# categorical.py
import dtype as dt

class Categorical:
    pass

# dtype.py
import categorical as cat
#from categorical import Categorical

class Dtype:

    def __init__(self):
        self.known_types = [cat.Categorical]


if __name__ == "__main__":
    print("Hello world")

If you uncomment and try to do from categorical import Categorical this will fail (at least on Python 3.7.4). I think we run into this issue a lot in core.

FWIW I'm not an expert here so maybe simple I'm overlooking, but generally just prefer to go with what Guido suggests on language elements :-)

@h-vetinari
Copy link
Contributor Author

FWIW I'm not an expert here so maybe simple I'm overlooking, but generally just prefer to go with what Guido suggests on language elements :-)

That's kind of what left me confused with your original comment - first you say you'd prefer import pandas as pd and then you quote Guido as supporting evidence to use from pandas import ... :)

@WillAyd
Copy link
Member

WillAyd commented Oct 28, 2019

There was a typo in my original comment that I've since fixed, but may have confused us both. Quoting directly from the article:

Guido van Rossum recommends avoiding all uses of from <module> import ...

@SaturnFromTitan
Copy link
Contributor

SaturnFromTitan commented Oct 29, 2019

I think @WillAyd's reasoning is sound and convinced me. It seems like using import pandas as pd has some real advantages, whereas all arguments for from pandas import ... seem to be about personal preference.

As there's no clear prefered version in the current test suite I'd rather convert everything to import pandas as pd.

If no one has further objections, I'd already start writing a follow-up ticket for the import pandas.util.testing as tm imports. By the time this is resolved, we hopefully came to a conclusion on the other case ✌️

@jreback
Copy link
Contributor

jreback commented Oct 29, 2019

As there's no clear prefered version in the current test suite I'd rather convert everything to import pandas as pd.

let's hold off on this one

use the import pandas.util.testing as tm is fine though

@mroeschke
Copy link
Member

Looks like #33203 is the more up to date discussion on this topic. Closing favor of that thread

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Code Style Code style, linting, code_checks Testing pandas testing functions or related to the test suite
Projects
None yet
Development

No branches or pull requests

8 participants