Replace native code with regular functions #265
Use numpy.einsum() for sumouter and sumprod functions.
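For readers who have not seen the two helpers, here is a minimal sketch of how such functions can be written with numpy.einsum(). It assumes that sumouter sums the outer products of paired rows and that sumprod sums the elementwise products of paired rows; the signatures below are simplified illustrations, not the exact code in this PR.

```python
import numpy as np

def sumouter(us, vs):
    """Sum of outer products of paired rows: result[g, h] = sum_n us[n, g] * vs[n, h].

    Assumed semantics for illustration; us has shape (n, g), vs has shape (n, h).
    """
    return np.einsum("ng,nh->gh", us, vs)

def sumprod(us, vs):
    """Sum of elementwise products of paired rows: result[g] = sum_n us[n, g] * vs[n, g].

    Assumed semantics for illustration; us and vs both have shape (n, g).
    """
    return np.einsum("ng,ng->g", us, vs)

def sumouter_loop(us, vs):
    """Plain-Python reference that the einsum version is expected to match."""
    result = np.zeros((us.shape[1], vs.shape[1]))
    for u, v in zip(us, vs):
        result += np.outer(u, v)
    return result

def sumprod_loop(us, vs):
    """Plain-Python reference for sumprod."""
    result = np.zeros(us.shape[1])
    for u, v in zip(us, vs):
        result += u * v
    return result
```

Under these assumptions, sumouter is equivalent to the matrix product us.T @ vs, so the einsum form mainly makes the index pattern explicit.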
Looks good. I wonder whether the native code was originally used because these functions were slow in pure Python? I suspect not, and this is a worthwhile change. I actually found a bug with the native compiler just today: '-fopenmp' wasn't a recognized flag for the compiler I was using on OSX. But this fixes it in a nicer way :) If I'm not mistaken, this is the only place nutils is used, so why not remove |
Hi Nick!
I don't know... :-)
On OpenMP: there is no actual OpenMP code in nutils, so the '-fopenmp' flag is useless.
Originally, I removed that code, but for this PR I decided to keep it. Later, it can be removed or moved to 'OLD'.
In my mini benchmarks the new functions are a few times faster than the native ones in nutils. I ran ocropus-rtrain with '-N 1000' on uw3-500.tgz and training was about 20% faster. It would be good if someone else could verify this on their own machine.
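A micro-benchmark along these lines could be sketched as follows. Note the assumptions: it compares a plain Python loop against numpy.einsum(), not the compiled nutils code, and the array sizes and the helper name compare_sumouter are made up for illustration, so the timings will not reproduce the figures quoted above.

```python
import timeit
import numpy as np

def sumouter_loop(us, vs):
    # Reference implementation: accumulate one outer product per row pair.
    result = np.zeros((us.shape[1], vs.shape[1]))
    for u, v in zip(us, vs):
        result += np.outer(u, v)
    return result

def sumouter_einsum(us, vs):
    # einsum implementation: result[g, h] = sum_n us[n, g] * vs[n, h].
    return np.einsum("ng,nh->gh", us, vs)

def compare_sumouter(n=200, g=50, h=60, number=10, repeat=3):
    """Return (loop_seconds, einsum_seconds), each the best of `repeat` runs.

    Hypothetical helper; sizes are arbitrary and not taken from the training run.
    """
    rng = np.random.default_rng(0)
    us = rng.standard_normal((n, g))
    vs = rng.standard_normal((n, h))
    # Check that both versions agree before timing them.
    assert np.allclose(sumouter_loop(us, vs), sumouter_einsum(us, vs))
    t_loop = min(timeit.repeat(lambda: sumouter_loop(us, vs), number=number, repeat=repeat))
    t_einsum = min(timeit.repeat(lambda: sumouter_einsum(us, vs), number=number, repeat=repeat))
    return t_loop, t_einsum

if __name__ == "__main__":
    t_loop, t_einsum = compare_sumouter()
    print(f"loop: {t_loop:.4f}s  einsum: {t_einsum:.4f}s")
```

Timing the full ocropus-rtrain run, as done above, remains the more meaningful measurement, since it includes everything else the training loop does.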
lstm.py contains another pair of sumouter and sumprod functions. The sumprod in lstm.py is not used anywhere. The sumouter is used in the softmax layer. For some reason, it operates on lists instead of numpy ndarrays.
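If the list-based variant were merged into a single sumouter, one option would be to convert the inputs up front. This is only a sketch under the assumption that each list holds equal-length vectors; sumouter_any is a hypothetical name, not a function in lstm.py.

```python
import numpy as np

def sumouter_any(us, vs):
    """Sum of outer products that accepts lists of vectors or 2-D ndarrays.

    Hypothetical helper: np.asarray turns a list of equal-length vectors into
    a 2-D array and leaves an existing float ndarray untouched (no copy).
    """
    us = np.asarray(us, dtype=float)
    vs = np.asarray(vs, dtype=float)
    return np.einsum("ng,nh->gh", us, vs)

# Lists of vectors and ndarrays go through the same code path.
us_list = [[1.0, 2.0], [3.0, 4.0]]
vs_list = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
result = sumouter_any(us_list, vs_list)
```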
I just ran a few small benchmarks myself, and they are consistent with your findings. The first was ocropus-rtrain -N 100 uw3-500.tgz, where I found exactly a 20% improvement in speed, as you did. I then ran ocropus-rtrain -N 100 on some private training data I've been working with and saw about a 12% improvement in speed. So I think it's safe to say that this pull request isn't going to lose us speed :)
Good spot. I say remove sumprod, and rewrite so there's only one sumouter, later, in another pull request, maybe at the same time as removing the |
Looks good, thanks!
Also a good prospect, if it isn't necessary to compile on the fly anymore. We can then drop the OSX-specific instructions as well and reduce confusion about compiler versions, the .pynative directory, etc.
Thank you @amitdo. If you want to start a PR with the rest of your dropnative branch changes, feel free.