Rework the tensorflow hook to improve compatibility across different tensorflow versions #46

rokm · 2020-09-13T16:45:04Z

This PR reworks the tensorflow hook to improve compatibility across different tensorflow versions, with aim to fix issues such as pyinstaller/pyinstaller#5002.

Currently, the tensorflow hook collects the submodules from tensorflow_core in a rather roundabout way: by specifying tensorflow_core as a hidden import, and relying on this to pull in tensorflow_core.python. The hook for tensorflow_core.python is the one that actually collects data and submodules from tensorflow_core.

This mechanism is brittle, and seems to break on:

tensorflow 1.15.x: due to import rerouting between tensorflow_core and tensorflow, the tensorflow_core.python is not pulled in
tensorflow 2.2.0+: the tensorflow_core module has been removed, and its contents now reside within tensorflow itself.

To fix this issue and improve compatibility across different versions of tensorflow, the tensorflow_core.python hook is removed, and tensorflow hook is modified to collect data and submodules from either tensorflow or tensorflow_core, depending on version:

1.14.x and earlier: collect from tensorflow
1.15.x - 2.1.x: collect from tensorflow_core
2.2x and later: collect from tensorflow

The PR also adds two new tests for verifying that basic functionality of tensorflow actually works, and aim to uncover submodule-not-found errors that are not picked up by the existing (import) test.

In addition, all three tensorflow tests in onefile mode are now skipped, because the latest versions of tensorflow are quite large (~1.5 GB), and the onefile tests sometimes timeout due to compression/decompression times.

Added two new tests for tensorflow in order to verify that at least the basic tensorflow functionality is both importable and functional: * a basic use of a Conv2D layer * a basic model training and validation on MNIST dataset

…fferent versions Clean and up and rework the tensorflow hook to improve compatibility across different tensorflow versions. Currently, the `tensorflow` hook collects the submodules from `tensorflow_core` in a rather roundabout way: by specifying `tensorflow_core` as a hidden import, and relying on this to pull in `tensorflow_core.python`. The hook for `tensorflow_core.python` is the one that actually collects data and submodules from `tensorflow_core`. This mechanism is brittle, and seems to break on: * tensorflow 1.15.x: due to import rerouting between `tensorflow_core` and `tensorflow`, the `tensorflow_core.python` is not pulled in * tensorflow 2.2.0+: the `tensorflow_core` module has been removed, and its contents now reside within `tensorflow` itself. To fix this issue and improve compatibility across different versions of tensorflow, the `tensorflow_core.python` hook is removed, and `tensorflow` hook is modified to collect data and submodules from either `tensorflow` or `tensorflow_core`, depending on version: * 1.14.x and earlier: collect from `tensorflow` * 1.15.x - 2.1.x: collect from `tensorflow_core` * 2.2x and later: collect from `tensorflow`

Skip tensorflow tests in modes other than onedir mode. With recent tensorflow versions, the resulting dist size could be up to 1.5 GB, which may cause onefile tests to timeout due to relatively long compression and decompression time.

src/_pyinstaller_hooks_contrib/tests/test_libraries.py

bwoodsend

This is really good work. Only issue I see is you forgot the changelog entry 😝 :.

bwoodsend · 2020-09-13T17:27:31Z

Ohh and there's that linter to appease. God forbid we allow a 95 character long line into our repository...

bwoodsend · 2020-09-13T17:36:11Z

Oh, one genuine problem: You forgot to add tensorflow to the test-requirement.txt so your tests won't be run on CI or by any user unless they already have tensorflow.

Although, in the interest of keeping the CI reasonably quick, I might be tempted to leave it this way. I believe we've already run these tests on all platforms? Is there any need to keep re-running them? @rokm @Legorooj Thoughts on this?

rokm · 2020-09-13T18:04:26Z

Thanks for the feedback!

Yeah, I figured I'll add the changelog entry after the linter is appeased... (and to add insult to injury, the tests file already contains lines longer than the ones I added here :))

Two questions:

Is there a better way to compare versions in the hook file than using is_module_satisfies repeatedly?
If not, I was planning to work around the linter by defining four booleans:

tf_pre_1_15_0 = is_module_satisfies("tensorflow < 1.15.0")
tf_post_1_15_0 = is_module_satisfies("tensorflow >= 1.15.0")
tf_pre_2_0_0 = is_module_satisfies("tensorflow < 2.0.0")
tf_pre_2_2_0 = is_module_satisfies("tensorflow < 2.2.0")

if tf_pre_1_15_0:
    ...

and using those within the if/else clauses. Would that be OK?

For the added tests, would it make more sense to put them in the external scripts instead of having them inline? It might be nicer to break lines in a stand-alone script than within the string...

As for requirements-test.txt, this means that the existing import test wasn't being run, either. We could add it, but it would probably mean testing only against one version of tensorflow. In order to test all different versions, we would need a setup like the one you used last time (different platforms, different python versions, different tf versions) - but I imagine this is infeasible in the grand scheme of things.

On the other hand, with tensorflow, people will likely be using specific versions instead of the latest one due to incompatibilities and the fact that academic code usually doesn't get maintained after the paper publication...

bwoodsend · 2020-09-13T20:13:32Z

Is there a better way to compare versions in the hook file than using is_module_satisfies repeatedly?

Umm, well we can circumvent PyInstaller slightly:

from pkg_resources import get_distribution, Requirement
tf_version = get_distribution("tensorflow").version

Then bizarrely you use in to test version constraints. And you still have to write package>1.15 although the name itself is ignored:

tf_version in Requirement.parse("x>1.15")

To be honest, I'd go with what you were suggesting, just because it's less confusing to read.

the existing import test wasn't being run

That's reassuring 🤣. And you're right in that we can only test one version with our current CI setup. I say we leave this one out then. If/when a significantly different update hits either PyInstaller or tensorflow then we can do another 1-shot bulk test again. I'll be sure not to delete that branch - although I have big plans for our CI to make that kind custom of one-off testing more accessible.

would it make more sense to put them in the external scripts
I'd personally prefer all the non-trivial (not just import xxx) test scripts were separate files. That test_packages.py is going to end up enormous otherwise. @Legorooj What do you think?

I think your last point is an important one. Normally we'd be able to forget about much older versions of packages but I guess in this case we can't. Ohh well...

Legorooj · 2020-09-13T23:09:50Z

Ignore the linter failures - I've been meaning to adjust the limit to 100 chars.

A minor cleanup; use `is_module_satisfies()` to define version related booleans, and then use those in the if/else blocks to make them easier to read.

Also grouped the tensorflow tests together in the test_libraries.py

rokm · 2020-09-14T11:24:12Z

OK, I moved those is_module_satisfies from if/else block into booleans, and moved the tests into separate scripts. I ended up going with the decorator as well, because with test bodies simplified, it looks better this way.

news/46.update.rst

bwoodsend

Well, I'm happy. @Legorooj Ready to merge?

Legorooj

Nice work!

rokm added 3 commits September 13, 2020 18:29

tests: added two new tests for tensorflow

10b54be

Added two new tests for tensorflow in order to verify that at least the basic tensorflow functionality is both importable and functional: * a basic use of a Conv2D layer * a basic model training and validation on MNIST dataset

tests: perform tensorflow tests only in onedir mode

1d9cbbd

Skip tensorflow tests in modes other than onedir mode. With recent tensorflow versions, the resulting dist size could be up to 1.5 GB, which may cause onefile tests to timeout due to relatively long compression and decompression time.

rokm requested review from a team and bwoodsend and removed request for a team September 13, 2020 16:45

rokm marked this pull request as draft September 13, 2020 16:45

bwoodsend reviewed Sep 13, 2020

View reviewed changes

src/_pyinstaller_hooks_contrib/tests/test_libraries.py Outdated Show resolved Hide resolved

bwoodsend requested changes Sep 13, 2020

View reviewed changes

rokm added 5 commits September 14, 2020 12:22

hooks: tensorflow: minimize the use of is_module_satisfies()

1dd6888

A minor cleanup; use `is_module_satisfies()` to define version related booleans, and then use those in the if/else blocks to make them easier to read.

tests: moved the two complex tensorflow tests to external script files

dacd15b

Also grouped the tensorflow tests together in the test_libraries.py

tests: use a decorator to skip tensorflow tests in non-onedir mode

c8820b7

Added news fragment for the PR.

bef1275

hooks: tensorflow: appease linter

d9ca8a9

rokm marked this pull request as ready for review September 14, 2020 11:24

bwoodsend reviewed Sep 14, 2020

View reviewed changes

news/46.update.rst Outdated Show resolved Hide resolved

Updated the news fragment [skip-ci]

1a3a083

bwoodsend approved these changes Sep 14, 2020

View reviewed changes

Legorooj approved these changes Sep 14, 2020

View reviewed changes

bwoodsend merged commit 15632f4 into pyinstaller:master Sep 15, 2020

rokm deleted the tensorflow2 branch September 16, 2020 08:53

harshithdwivedi mentioned this pull request Sep 20, 2020

Unable to package Tensorflow 2.3 #49

Closed

bwoodsend mentioned this pull request Sep 23, 2020

'tensorflow_core.python' not found in tensorflow 1.15.0 pyinstaller/pyinstaller#5002

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework the tensorflow hook to improve compatibility across different tensorflow versions #46

Rework the tensorflow hook to improve compatibility across different tensorflow versions #46

rokm commented Sep 13, 2020

bwoodsend left a comment

bwoodsend commented Sep 13, 2020

bwoodsend commented Sep 13, 2020

rokm commented Sep 13, 2020

bwoodsend commented Sep 13, 2020

Legorooj commented Sep 13, 2020

rokm commented Sep 14, 2020 •

edited

bwoodsend left a comment

Legorooj left a comment

Rework the tensorflow hook to improve compatibility across different tensorflow versions #46

Rework the tensorflow hook to improve compatibility across different tensorflow versions #46

Conversation

rokm commented Sep 13, 2020

bwoodsend left a comment

Choose a reason for hiding this comment

bwoodsend commented Sep 13, 2020

bwoodsend commented Sep 13, 2020

rokm commented Sep 13, 2020

bwoodsend commented Sep 13, 2020

Legorooj commented Sep 13, 2020

rokm commented Sep 14, 2020 • edited

bwoodsend left a comment

Choose a reason for hiding this comment

Legorooj left a comment

Choose a reason for hiding this comment

rokm commented Sep 14, 2020 •

edited