Remove unintended session initialization on TF backend #8377
A recent PR #8021 added a check at the beginning of tensorflow_backend.py to determine the number of available GPUs. Unfortunately the check runs at import time, which unintentionally initializes a TensorFlow session as a side effect.
Moreover on commit 9166733 line 174, if the
NOTE: The unit-tests passed on my machine but I've been having issues with timeouts today on Travis. Please check the code, don't merge directly. Happy to update the PR if necessary based on feedback.
Let's agree not to merge until the tests on Travis are fixed. It's way better to close the PR than merge something that is "looks correct" but renders the tests on Travis unusable.
I just got home I'll test the codebase with a profiler on a different machine. I'm more than happy to close the PR, send a new one or let someone have another go. At this point, consider that this PR sketches a potential solution but not necessarily the best one. :)
It's the test_inceptionresnetv2_notop test.
Running it on its own works fine.
In a small VM with 1 CPU it takes 218 seconds, but it does finish. So it seems test_inceptionresnetv2_notop is stalling on Travis.
My guess is that the 3 reported issues are connected. With the GPU count removed from the backend, the session is no longer initialized on import but rather the first time we call
I patched the damn thing...
Apparently requesting the available devices is VERY costly. This is why @TimZaman, in his original PR, uses a cache variable which is populated only once. In hindsight, I should have understood this earlier... This is the reason the unit-tests were stalling. Anyway, the only thing that needed to change in the original PR is where we actually populate the variable and how.
The initialization happens the first time the
The Travis tests pass and the PR is complete from my side, so if there are no objections please merge. If possible, I would also suggest creating a hotfix release for this; it would be particularly useful for those who do distributed predictions using Keras + Spark.
Reintroducing the lazy device loading, and swapping out
Not sure how the other code you added regarding