Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Lazy import testing on python >=3.7 #14097

Merged
merged 2 commits into from
Aug 22, 2019

Conversation

hmaarrfk
Copy link
Contributor

@hmaarrfk hmaarrfk commented Jul 24, 2019

See discussion here.

Just testing with CIs for now.

#14083

       before           after         ratio
     [ea965e4c]       [1a42f13e]
     <master>         <import_time_tester>
-       100±0.6ms       90.7±0.5ms     0.90  bench_import.Import.time_numpy

@charris charris changed the title WIP: MNT: Avoid importing tester. WIP: MAINT: Avoid importing tester. Jul 25, 2019
@charris
Copy link
Member

charris commented Jul 25, 2019

I can see removing Tester at some point, maybe even separating the tests into their own tree. For now, I wonder why it is so slow? All the instantiation does is set an internal variable. Maybe

    # Pytest testing
    from numpy._pytesttester import PytestTester
    test = PytestTester(__name__)
    del PytestTester

is a problem. What if we just import numpy._pytesttester and test = numpy._pytesttester.PytestTester(__name__)
or some such?

@hmaarrfk
Copy link
Contributor Author

The problem seems two fold: first testing imports all of unittest, which is slow.

Importing tester makes the testing namespace available, which many now expect.

I think different python versions deal differently with implicit namespaces. I kinda wanted to see where the tests fail first.

It may be possible to do something with the new module level getattr for modern python versions. In that case, we can lazy load tester for modern python. At this stage. This PR was more to get a full view of the problem as it stands now accross different python versions you all support.

@seberg
Copy link
Member

seberg commented Jul 26, 2019

On newer python versions, we could use the module __getattr__ to achieve the same thing? It would only optimize new versions, but that seems really fine (plus it makes half a step towards deprecating it)? Eg. see gh-13525 for the fun :).

@charris
Copy link
Member

charris commented Jul 26, 2019

Could make the unittest import lazy, it is only needed for testing. Might also look if pytest can supply the needed functionality.

EDIT: Note that we only keep the nose module for backwards compatibility, we don't use it and it is only tested if nose is present.

@hmaarrfk
Copy link
Contributor Author

Right. so I think I've isolated the 3 places where testing is used internally. Maybe the getattr trick @eric-wieser demo'ed could be used.

Serious question, does __getattr__ make things slower on the long run?

@seberg
Copy link
Member

seberg commented Jul 26, 2019

I am not 100% sure, but I think __getattr__ is only used as a backup solution, so unless you hit __getattr__ (or try to use variables that do not exist a lot) there is no slowdown.

@eric-wieser
Copy link
Member

Also, as soon as __getattr__ is used the first time for a lazy import, subsequent accesses will hit the module anyway:

# numpy/__init__.py
def __getattr__(attr):
    if attr == 'testing':
        import numpy.testing
        return numpy.testing
    raise AttributeError

In user code

np.testing  # tries np.__dict__, fails, runs __getattr__
np.testing  # tries np.__dict__, succeeds

@hmaarrfk
Copy link
Contributor Author

The issue of using the __getattr__ is that you don't deprecate the automatic importing of .testing.

(numpy) mark@mark-xps ~
$ ipython
Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 21:52:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.6.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from numpy import testing
/home/mark/miniconda3/envs/numpy/bin/ipython:1: UserWarning: some useful warning
  #!/home/mark/miniconda3/envs/numpy/bin/python
diff --git a/doc/sphinxext b/doc/sphinxext
--- a/doc/sphinxext
+++ b/doc/sphinxext
@@ -1 +1 @@
-Subproject commit a482f66913c1079d7439770f0119b55376bb1b81
+Subproject commit a482f66913c1079d7439770f0119b55376bb1b81-dirty
diff --git a/numpy/__init__.py b/numpy/__init__.py
index 14854f39f..54e856f15 100644
--- a/numpy/__init__.py
+++ b/numpy/__init__.py
@@ -183,12 +183,19 @@ else:
     numarray = 'removed'

     # Do we need this???
-    # if sys.version_info[0] >= 3 and sys.version_info[1] >= 5:
+    if sys.version_info[0] >= 3 and sys.version_info[1] >= 7:
         # pass
-    # else:
+        def __getattr__(attr):
+            if attr == 'testing':
+                import importlib
+                import warnings
+                warnings.warn("some useful warning", stacklevel=2)
+                return importlib.import_module(".testing", __name__)
+    else:
         # We don't actually use this ourselves anymore, but I'm not 100% sure that
         # no-one else in the world is using it (though I hope not)
-    #    from .testing import Tester
+        from .testing import Tester
+

     # Pytest testing
     from numpy._pytesttester import PytestTester

@hmaarrfk
Copy link
Contributor Author

Honestly, I like the testing module, I just don't like the general performance hit on most code.

@hmaarrfk
Copy link
Contributor Author

I like the fact that it is readily available

@eric-wieser
Copy link
Member

eric-wieser commented Jul 26, 2019

The issue of using the getattr is that you don't deprecate the automatic importing of .testing.

I have no idea what you mean by that. Isn't deprecation exactly what you just demoed, modulo the use of the wrong warning type?

@seberg
Copy link
Member

seberg commented Jul 26, 2019

In your diff, you should also add test to the __getattr__, or else parts of testing are imported through _pytesttester, no?

@hmaarrfk
Copy link
Contributor Author

You would have to create a whole new numpy testing package. Is that the plan too? I love the numpy testing functionality

@eric-wieser
Copy link
Member

eric-wieser commented Jul 26, 2019

@hmaarrfk: I have no idea what you are talking about, the patch you give there looks correct (although importlib is pointless, just use import). Please give an example of exactly what behavior is not intended from that patch, and show what the intended behavior would be.

@seberg
Copy link
Member

seberg commented Jul 26, 2019

@hmaarrfk if we deprecate it, all that would change is that you have to do from numpy import testing? I think that can be achieved, although I have not tried it.

@eric-wieser
Copy link
Member

eric-wieser commented Jul 26, 2019

I think it's worth a recap here that from numpy import testing expands to roughly

def _():  # function to avoid `numpy` being in `globals()`
    import numpy
    return numpy.testing
testing = _()

Ie, it does not mean import numpy.testing as testing. With the proposed __getattr__, the former would raise a warning, but the latter would not.

@seberg
Copy link
Member

seberg commented Jul 26, 2019

Right, sorry. In any case, IIRC the deprecation discussion was about Tester, not testing...

@hmaarrfk
Copy link
Contributor Author

I guess I never use

import numpy.testing as testing

I always thought it meant the exact same thing as

from numpy import testing

Though now I see that the first means

Find the file numpy/testing/__init__.py

While the second means

Find the attribute testing in numpy/__init__.py

@hmaarrfk
Copy link
Contributor Author

Do you all want to impose the first syntax on everybody?

@hmaarrfk
Copy link
Contributor Author

Right, sorry. In any case, IIRC the deprecation discussion was about Tester, not testing...

Importing Tester is implicitly importing the testing namespace

@eric-wieser
Copy link
Member

Do you all want to impose the first syntax on everybody?

Don't most people using it as np.testing anyway? If so, then a deprecation would just make them add import numpy.testing at the top of their file.

The other question is: what do we gain by deprecating vs leaving as a warning-free lazy import?

@hmaarrfk
Copy link
Contributor Author

I don't think we gain much.

That said. You can likely use the getattr deprecation warnings to finally get rid of oldnumeric and numarray

@hmaarrfk
Copy link
Contributor Author

Is this the behaviour you want:

(numpy) mark@mark-xps ~
$ python -c "import numpy.testing"
(numpy) mark@mark-xps ~
$ python -c "from numpy import testing"
-c:1: DeprecationWarning: Importing testing from numpy is deprecated and will be removed in numpy 1.22. Please use ``import numpy.testing as testing`` instead.
(numpy) mark@mark-xps ~
$ python -c "import numpy; numpy.testing"
-c:1: DeprecationWarning: Importing testing from numpy is deprecated and will be removed in numpy 1.22. Please use ``import numpy.testing as testing`` instead.
(numpy) mark@mark-xps ~

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer if this PR does not do any deprecations and limits to the speed up discussion. I also tend to think that there is not much of a point in deprecating testing at all. The other changes (in the testing files) should not be necessary afterwards.

I assume that you will get all the speedups (on new enough python) that you were looking for and it is a nice, fairly short, PR.

stacklevel=2)
return 'removed'

# This never gets triggered????
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try what np.asdfasdjfkwer gives you? (after import not during import)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did. I think I got it right, and I'll see if I can highlight a test that ensures that, or make a new one

" Please use ``import numpy.testing as testing`` instead.",
DeprecationWarning,
stacklevel=2)
return importlib.import_module("." + attr, __name__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Eric said, just use import, importlib is just harder to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I guess I just didn't understand the point of the PEP that had the importlib example

@hmaarrfk
Copy link
Contributor Author

I'll remove all the deprecation stuff. There was just some discussion about it and I wanted to make things explicit. The get attr does what we need.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might want a wider design for deprecating module level attributes before we do any deprecations. Perhaps a __deprecated_all__ member or something

test = PytestTester(__name__)
del PytestTester

if sys.version_info[0] >= 3 and sys.version_info[1] >= 7:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys.version_info[:2] >= (3, 7)

# on failure, this will produce alot of output, and potentially
# deadlock the call without a call to `communicate`
sub.communicate()
sub.wait()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just subprocess.check_call(...), which handles all this for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On failure, you would get some output that seems quite large, all looking like

  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "/home/mark/git/numpy/numpy/__init__.py", line 198, in __getattr__
    from . import testing
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "/home/mark/git/numpy/numpy/__init__.py", line 198, in __getattr__
    from . import testing
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "/home/mark/git/numpy/numpy/__init__.py", line 198, in __getattr__
    from . import testing
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "/home/mark/git/numpy/numpy/__init__.py", line 198, in __getattr__
    from . import testing
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
RecursionError: maximum recursion depth exceeded

I think you end up hitting

Warning The preexec_fn parameter is not safe to use in the presence of threads in your application. The child process could deadlock before exec is called. If you must use it, keep it trivial! Minimize the number of libraries you call into.

I think the python 2.7 docs explain it better than the 3.7 docs, and suggests doing what I implemented above
https://docs.python.org/2/library/subprocess.html#subprocess.check_output

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output that seems quite large,

This output goes on the testing console, quite annoying. Therefore I opted to PIPE the output. I guess I could also send it to a tempfile???

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your goal is to discard it, check_call(stdout= subprocess.DEVNULL, stderr= subprocess.DEVNULL) should do the job .

I'd maybe argue that the output contains useful debugging information, and pytest will discard it unless it fails anyway - so why not keep it around and use check_call with no extra arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pytest is not catching the error output and it is going straight to the consule when that one specific test is run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, ok, i found a way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows seems to have issues with piping stderr to stdout. Since this will only output stuff during a failure, I think it is acceptable to leave it undefined without complicating the code. Hopefully the test is more clear now.

@hmaarrfk hmaarrfk force-pushed the import_time_tester branch 2 times, most recently from f876581 to 8ae72f5 Compare August 1, 2019 04:11
numpy/__init__.py Outdated Show resolved Hide resolved
@hmaarrfk
Copy link
Contributor Author

@eric-wieser I think I've got this one to a place where I'm happy. Not sure if you are OK with the tests and the other changes I had to do to get things to pass. Feel free to remove the WIP tag

@hmaarrfk hmaarrfk changed the title WIP: MAINT: Avoid importing tester. MAINT: Avoid importing tester. Aug 17, 2019
@seberg seberg removed the 25 - WIP label Aug 17, 2019
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo __dir__ question.



def __dir__():
return __all__ + ['Tester']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also add testing? If there is a test for that somewhere for Tester, can you add this too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [3]: 'testing' in dir(numpy)                                                                          
Out[3]: True
In [4]: numpy.__version__                                                       
Out[4]: '1.17.0'

you are right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttributeError: module 'numpy.testing' has no attribute '__module__'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fixed up the tests. Hopefully that should address this concern.

@hmaarrfk hmaarrfk force-pushed the import_time_tester branch 2 times, most recently from 0303f21 to 26e613b Compare August 17, 2019 15:21
@hmaarrfk
Copy link
Contributor Author

seems like the failure isn't due to this PR right?

@mattip mattip requested a review from seberg August 22, 2019 06:02
@seberg seberg changed the title MAINT: Avoid importing tester. MAINT: Lazy import testing on python >=3.7 Aug 22, 2019
@seberg
Copy link
Member

seberg commented Aug 22, 2019

Thanks Mark, LGTM.

@seberg seberg merged commit 3ca0eb1 into numpy:master Aug 22, 2019
@hmaarrfk
Copy link
Contributor Author

happy this made it in!

@hmaarrfk hmaarrfk deleted the import_time_tester branch August 22, 2019 13:48
@hmaarrfk
Copy link
Contributor Author

Thank you all for the great discussion

Comment on lines +169 to +170
__all__.extend(['bool', 'int', 'float', 'complex', 'object', 'unicode',
'str'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a mistake to me - removed in #14881

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants