Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace native code with regular functions #265

Merged
merged 1 commit into from Dec 8, 2017

Conversation

amitdo
Copy link
Contributor

@amitdo amitdo commented Nov 27, 2017

Use numpy.einsum() for sumouter and sumprod functions.

Use numpy.einsum() for sumouter and sumprod functions.
@nickjwhite
Copy link

Looks good. I wonder whether the native code was originally used because these functions were slow through python? I suspect not, and this is a worthwhile change.

I actually just found a bug with the native compiler today, as '-fopenmp' wasn't a defined flag on the OSX compiler I was using. But this fixes it in a nicer way :)

If I'm not mistaken, this is the only place the nutils are used, so why not remove ocrolib/nutils.py and ocrolib/native.py too?

@amitdo
Copy link
Contributor Author

amitdo commented Nov 27, 2017

Hi Nick!

I wonder whether the native code was originally used because these functions were slow through python?

I don't know... :-)

OpenMP
https://github.com/tmbdev/ocropy/blob/master/README_OSX.md

There is no actual OpenMP 'code' in nutils, so the '-fopenmp' flag is useless.

If I'm not mistaken, this is the only place the nutils are used, so why not remove ocrolib/nutils.py and ocrolib/native.py too?

Originally, I removed that code, but for this PR I decided to keep it. Later, it can be removed or moved to 'OLD'.

@amitdo
Copy link
Contributor Author

amitdo commented Nov 27, 2017

In my mini benchmarks the new functions are a few times faster than the native ones in nutils.

I ran ocropus-rtrain with '-N 1000' on uw3-500.tgz and it looks like there is a ~20% improvement in speed for training. I think it's a good idea that someone else will verify this in his machine.

@amitdo
Copy link
Contributor Author

amitdo commented Nov 27, 2017

In lstm.py there are another sumouter and sumprod functions. The sumprod in lstm.py is not used anywhere. The sumouter is used in the softmax layer. For some reason, it operates on lists instead on numpy ndarrays.

@nickjwhite
Copy link

In my mini benchmarks the new functions are a few times faster than the native ones in nutils.

I just ran a few small benchmarks myself, which are consistent with your findings. My first one was with ocropus-rtrain -N 100 uw3-500.tgz, and found exactly 20% improvement in speed, as you did. I then compared ocropus-rtrain -N 100 with some private training data I've been working with, with about a 12% improvement in speed. So I think it's safe to say that this pull request isn't going to lose us speed :)

In lstm.py there are another sumouter and sumprod functions. The sumprod in lstm.py is not used anywhere. The sumouter is used in the softmax layer. For some reason, it operates on lists instead on numpy ndarrays.

Good spot. I say remove sumprod, and rewrite so there's only one sumouter, later, in another pull request, maybe at the same time as removing the nutils.py and native.py stuff.

@kba
Copy link
Collaborator

kba commented Nov 29, 2017

Looks good, thanks!

Good spot. I say remove sumprod, and rewrite so there's only one sumouter, later, in another pull request, maybe at the same time as removing the nutils.py and native.py stuff.

Also a good prospect, if it isn't necessary to compile on the fly anymore. We can then drop the OSX-specific instructions as well and reduce confusion about compiler versions and .pynative directory etc.

@kba kba merged commit ed5b545 into ocropus-archive:master Dec 8, 2017
@kba
Copy link
Collaborator

kba commented Dec 8, 2017

Thank you @amitdo. If you want to start a PR with the rest of your dropnative branch changes, feel free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants