New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update scipy hook to support 1.0.0 revision #3048

Merged
merged 2 commits into from Mar 13, 2018

Conversation

Projects
None yet
2 participants
@mobiusklein
Contributor

mobiusklein commented Nov 29, 2017

This hook will add the new private scipy._lib C-extension modules to the archive, and when building on Windows, include the DLL bundle used by the official wheel provider stored in site-packages/scip/extra-dll

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

Test failures due to #3045.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

The Appveyor tests may also fail because they ran for over an hour, which seemed to be a hard limit for my account. When I forcibly reran the Python2.7 Library tests, it succeeded but took ~55 minutes.

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

That's unreasonable. The current test takes 16 minutes; that would mean that the scipy test alone takes ~40 minutes. If that's happening, something is wrong.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

I'm referring to the configuration with TEST_LIBRARIES=YES. In the other configuration, they do take about 15 minutes.

There aren't many timestamps in the Appveyor build-log, but here's the link:
https://ci.appveyor.com/project/mobiusklein/pyinstaller/build/job/epeufno6qja3k190

According to the "slowest tests" table at the end the scipy tests took around two minutes a piece, and there are only four scipy tests. It's running all the library tests, though, which is around 150 tests. If we assume that every test takes on average 15 seconds, we're at 37 minutes already. There were other worse offenders, like openpyxl and pandas which took almost 3 minutes.

@ghost

This comment has been minimized.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

I'm not sure I have a solid grasp of the state for each test, but given that the Travis builds ran in 30 minutes I'm tempted to go down a path that checks if the scipy hook is getting picked up on every test, but I'm not sure how to do that. There is a lot of runtime magic to trace through, and it's unclear how tests are isolated from each other (or the test harness program for that matter).

Does this sound like a reasonable line of inquiry given that I made so few changes, and if so, which module makes sense to start in?

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

Don't spend extra time on this now. I'm going to submit a PR to reduce the test time and then we'll see where we are.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

I wrote two test scripts, one that imported scipy and one that did not and ran PyInstaller on them. Both executables bundled the scipy DLLs.

I see now that I didn't realize that by adding that hook, it would be executed every time PyInstaller ran, not just every time a script with scipy was analyzed. So those DLLs were added to every test app on Windows, not just the scipy tests' apps.

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

Wait, what was the test app that did not import SciPy? What is the exact script?

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

Here is the test script with scipy

import numpy as np
from scipy import linalg
from scipy.optimize import leastsq

print(linalg.pinv(np.eye(100)))


def gaussian(x, m, s2):
    return 1. / np.sqrt(np.pi * s2 * 2) * np.exp((-(x - m) ** 2) / (2 * s2))


def generate_gaussian(x, m, s2):
    return gaussian(x, m, s2)


x = np.linspace(-10, 10)
y = generate_gaussian(x, 0.5, 5)


def fit_gaussian(params, x, y):
    m, s2 = params
    yhat = gaussian(x, m, s2)
    return y - yhat


params, _ = leastsq(fit_gaussian, (0, 1), (x, y))

print(params)

Here is the test script without scipy

import numpy as np


def gaussian(x, m, s2):
    return 1. / np.sqrt(np.pi * s2 * 2) * np.exp((-(x - m) ** 2) / (2 * s2))


def generate_gaussian(x, m, s2):
    return gaussian(x, m, s2)


x = np.linspace(-10, 10)
y = generate_gaussian(x, 0.5, 5)

print(y.mean(), y.sd())
# package the DLL bundle that official scipy wheels for Windows ship
if "windows" in platform.platform().lower():
dll_path = os.path.join(os.path.dirname(scipy.__file__), 'extra-dll', "*.dll")

This comment has been minimized.

@ghost

ghost Nov 29, 2017

This should be

from PyInstaller.utils.hooks import get_module_file_attribute
get_module_file_attribute('scipy')
@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

It seems not all scripts fire the scipy hook. A script with just the standard library didn't trigger that hook. I would have expected scipy to trigger including numpy, but not the other way around.

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

Can you post the trace log as a gist? It's pyinstaller --log-level=TRACE IIRC.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

This is where scipy is coming from:

7848 TRACE: _safe_import_hook 'scipy' SourceModule('numpy.testing.nosetester', 'C:\\Users\\Owner\\Anaconda2\\envs\\cython-debug\\lib\\site-packages\\numpy\\testing\\nosetester.py') None 0
@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

That makes sense. The file has been moved in HEAD, but is still there in the released version.

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

What we need to do is add a hook-numpy with excludedimports = collect_submodules('numpy.testing').

# don't bundle the testing module as this will cause
# `scipy` to be pulled in unintentionally
excludedimports = ['numpy.testing']

This comment has been minimized.

@ghost

ghost Nov 29, 2017

Sorry, this should be:

from PyInstaller.utils.hooks import collect_submodules
excludedimports = collect_submodules('numpy.testing')

If I understand correctly.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

It appears that after excluding numpy.testing, using either mode of expression, numpy becomes unimportable:

Traceback (most recent call last):
  File "C:\Users\Owner\Dev\testing\pyinstaller-scipy\other-script.py", line 1, in <module>
    import numpy as np
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\Lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 396, in load_module
    exec(bytecode, module.__dict__)
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\lib\site-packages\numpy\__init__.py", line 142, in <module>
    from . import add_newdocs
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\Lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 396, in load_module
    exec(bytecode, module.__dict__)
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\lib\site-packages\numpy\add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\Lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 396, in load_module
    exec(bytecode, module.__dict__)
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\lib\site-packages\numpy\lib\__init__.py", line 8, in <module>
    from .type_check import *
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\Lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 396, in load_module
    exec(bytecode, module.__dict__)
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\lib\site-packages\numpy\lib\type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\Lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 396, in load_module
    exec(bytecode, module.__dict__)
  File "C:\Users\Owner\Anaconda2\envs\cython-debug\lib\site-packages\numpy\core\__init__.py", line 74, in <module>
    from numpy.testing.nosetester import _numpy_tester
ImportError: No module named testing.nosetester
[9384] Failed to execute script other-script

I'll have to re-enable it for now.

The offending line: https://github.com/numpy/numpy/blob/master/numpy/__init__.py#L151

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

Oh well. But would you mind modifying the numpy hook to match the scipy hook? These DLLs will be in the next numpy release as well.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

Sure.

Since numpy drags scipy in, and numpy is used by many, many big C extension libraries, it follows that the scipy hook is firing more often than I expected. I'm still not sure how that translates to nearly doubling the runtime of the test suite beyond adding a few dozen DLLs to probe on a file system walk. Does PyInstaller do more with DLLs than copy them around?

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

Actually, since they're added to binaries, they're run through pefile to inspect DLL dependencies. That would explain most of the extra time.

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

@htgoebel What's the best way to solve this?

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

Thank you for helping me debug this in real-time. I've only written hooks for my own libraries by trial-and-error. I'll have to pick this up again tomorrow.

@htgoebel

Thanks for this pull-requests. I have some clean-up requests. Some of the comments for the numpy hook are valid for the scipy-hook, too.

When you finished, please cleanup the commits. There should be two: one for each hook. Please also follow the commit-message guide (https://pyinstaller.readthedocs.io/en/latest/development/commit-messages.html#commit-messages) or have a look at other "Add hook" commits.

Thanks.

binaries = []
# package the DLL bundle that official scipy wheels for Windows ship
if "windows" in platform.platform().lower():

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

Please use PyInstaller.compat.is_win

# package the DLL bundle that official scipy wheels for Windows ship
if "windows" in platform.platform().lower():
dll_path = os.path.join(os.path.dirname(get_module_file_attribute('numpy')), 'extra-dll', "*.dll")

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

I suggest naming this dll_glob sind it is no path, but a glob.

@@ -0,0 +1,15 @@
import os

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

Copyright header missing.

This comment has been minimized.

@ghost

ghost Dec 5, 2017

@mobiusklein Did you see this feedback?

import platform
from PyInstaller.utils.hooks import collect_submodules, get_module_file_attribute
# if we bundle the testing module, this will cause

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

This "If wee bundle" sounds crooked. I suggest rewording into somethink like: "Including the testing module, this would …"

# if we bundle the testing module, this will cause
# `scipy` to be pulled in unintentionally but numpy imports
# numpy.testing at the top level for historical reasons.
# excludedimports = collect_submodules('numpy.testing')

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

Why is this line commented out? Looks reasonable to me.

This comment has been minimized.

@mobiusklein

mobiusklein Dec 5, 2017

Contributor

numpy/__init__.py imports numpy.testing and instantiates a pair of benchmarking objects for historical reasons.

binaries = []
# package the DLL bundle that official scipy wheels for Windows ship

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

scipy? THis is the numpy hook?!

import platform
binaries = []

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

Blank line too much.

@@ -0,0 +1,15 @@
from PyInstaller.utils.hooks import get_module_file_attribute

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

Blank line. Please also move the PyInstaller import to the end. BTW: platform will no be required when using compat.is_win.

@@ -0,0 +1,15 @@
import os
import platform

This comment has been minimized.

@htgoebel

htgoebel Nov 29, 2017

Member

platform will no be required when using compat.is_win.

@htgoebel

This comment has been minimized.

Member

htgoebel commented Nov 29, 2017

What we need to do is add a hook-numpy with
excludedimports = collect_submodules('numpy.testing').

excludedimports = ['numpy.testing']

should be okay, given that this is the only module of the testing package imported superfluously.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Nov 29, 2017

I can make these changes, but the tail end of the exploration @xoviat and I did found that we must include numpy.testing because numpy.__init__ imports it for historical reasons. This means that any time we include numpy we also include scipy automatically.

This may explain why the tests are taking so long on Windows, since numpy will be pulled into at least 8 other tests directly, and a difficult to count number indirectly.

Since this adds ~30 DLLs to the bundle that all have to be inspected as @xoviat mentioned, it's unclear if there is an alternative solution that avoids this expensive step. Furthermore, it sounds like these same DLLs will be put in the numpy wheels in the future, meaning that even if we somehow break this linkage between numpy and scipy, in practice people building executables with scipy on Windows will have to analyze 60 DLLs, which could make build times even longer. They may even conflict and send us spiraling into DLL-Hell.

@ghost

This comment has been minimized.

ghost commented Nov 29, 2017

One solution may be to patch numpy with a runtime hook.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Dec 5, 2017

Yes, I just screwed up trying to get rid of all those commits. I'm about to start adding the properly committed files if I haven't destroyed things.

@ghost

This comment has been minimized.

ghost commented Dec 5, 2017

Wow, those commits are gone for good! Hopefully you can recover the changes.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Dec 5, 2017

I have the files backed up. I've never tried to rewrite published history before.

@ghost

This comment has been minimized.

ghost commented Dec 5, 2017

It looks like you're even with develop, so you can just copy and paste each file from your backup into the repository and commit. Simple but effective.

@mobiusklein mobiusklein reopened this Dec 5, 2017

@ghost

This comment has been minimized.

ghost commented Dec 5, 2017

Did you see some of the feedbacck in the review above? Particularly the copyright header?

@mobiusklein mobiusklein force-pushed the mobiusklein:develop branch from 66f4bfe to d664c90 Dec 5, 2017

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Dec 5, 2017

I appear to have added it to the scipy hook, but not the NumPy one. Fortunately, this I could just --amend.

@ghost

ghost approved these changes Dec 5, 2017

Not perfect, but good enough for me.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Dec 5, 2017

Nevermind, the backed up files were the wrong version. I'll get it right tomorrow.

@ghost

This comment has been minimized.

ghost commented Dec 5, 2017

If you want to go for another round, please make sure that the files comply with flake8. Particularly the lines are < 80 characters.

# isolated to non-Windows platforms. See also: scipy/scipy#5461
scipy==0.19.1; sys_platform != 'win32'
# SciPy now provides binary wheels for Windows
scipy==1.0.0

This comment has been minimized.

@ghost

ghost Dec 6, 2017

Also, probably should be ;python_version != 3.3.

@ghost

This comment has been minimized.

ghost commented Dec 6, 2017

Also, I don't think we can add scipy to requirements.txt due to the significant increase in test time.

Hooks: Add hook for scipy 1.0.0
* Added hidden imports for library-wide C-extensions
* Added conditional inclusion of DLL bundle included in
  official Windows wheel

@mobiusklein mobiusklein force-pushed the mobiusklein:develop branch from d664c90 to f9f79dd Dec 7, 2017

@htgoebel htgoebel added the hooks label Dec 7, 2017

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Dec 7, 2017

Hopefully I've addressed the issues in the format of the hooks and the commits. I've held off on updating the test requirements until discussion confirms whether they should be added.

I don't think it's a big time cost for non-Windows builds, but it might be worth noting that the hooks for pandas, h5py, and matplotlib definitely pull in numpy, and openpyxl and PyQT may pull in numpy. That in turn pulls in scipy.

Hooks: add hook for upcoming numpy build
* Added conditional inclusion of DLL bundle planned
  to be included in an upcoming official Windows wheel
* Noted an unresolvable coupling between numpy and scipy

@mobiusklein mobiusklein force-pushed the mobiusklein:develop branch from f9f79dd to 5452ba4 Dec 7, 2017

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Dec 8, 2017

@xoviat , @htgoebel tests appear to be passing, and I've addressed the stylistic issues. Are there more changes I should make to get the minimum functionality accepted? I've omitted the test requirements update as per @xoviat 's comment.

@ghost

ghost approved these changes Dec 13, 2017

LGTM. It's always possible to avoid the SciPy inclusion with a clean virtualenv, and I don't think this should be blocked on that.

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Jan 3, 2018

@htgoebel Did I make the changes you requested?

@mobiusklein

This comment has been minimized.

Contributor

mobiusklein commented Mar 9, 2018

@htgoebel I'm not sure what else is needed in order for this to be merged. Did I miss something?

@htgoebel htgoebel merged commit 9423db6 into pyinstaller:develop Mar 13, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@htgoebel htgoebel added this to the PyInstaller 3.4 milestone Mar 13, 2018

@htgoebel htgoebel referenced this pull request Mar 15, 2018

Closed

Scipy 1.0.0 hook #2987

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment