Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

numba autojit _fisher_jenks_means if numba is available #829

Merged
merged 1 commit into from Jul 16, 2016

Conversation

mlyons-tcc
Copy link
Contributor

fixes 828. gets over 100x performance improvement for _fisher_jenks_means if numba is installed.

@sjsrey
Copy link
Member

sjsrey commented Jul 8, 2016

Thanks @mlyons-tcc - this looks really promising.

Can you add a test suite for this enhancement, in esda/tests/test_mapclassify.py ?

@ljwolf
Copy link
Member

ljwolf commented Jul 8, 2016

Yeah, this looks real cool @mlyons-tcc, thanks!

I've been looking at doing the same thing in cg, but haven't seen this kind of speedup. Jenks is a great target for this.

Locally, I go from ~10 seconds/run on the NAT HR90 attribute to ~80 ms.

@mlyons-tcc
Copy link
Contributor Author

Thanks. I tested out a Cython implementation that performs faster. Numba was on the same order of magnitude though, and I really like the autojit decorator. It's a shame that you have to jump through so many hoops to get Numba installed if you aren't using Conda. I'm not good enough at Cython to get Pure Python Mode working.

@sjsrey, there is already a test: TestFisherJenks that tests the algorithm.

If I understood correctly PySAL requires not depending on external packages other than numpy and scipy so I didn't force a numba dependency. I'm not experienced in using travis.yml scripts so I don't know the proper way to run tests with/without numba. It also seems weird to run the whole test suite twice for this one function with/without numba optimization.

When numba is installed, it just does an autojit on the one function. In my opinion, it's highly unlikely the function will not work properly when bypassing autojit if the autojit version is tested. However, it's possible for the autojit to break and not work despite the algo passing its test when autojit is bypassed. Because of that, I would recommend that I just add numba to the conda install section of the .travis.yml file.

Please advise.

@sjsrey
Copy link
Member

sjsrey commented Jul 8, 2016

I like this approach, but am thinking about two things: [1] whether we should use the same strategy more broadly throughout PySAL (where numba would pay off) and thus centralize the numba check a la this approach; and [2] how to modify our tests for the numba present and numba absent cases.

@ljwolf
Copy link
Member

ljwolf commented Jul 10, 2016

As @mlyons-tcc notes, it'd be real weird if autojit changed the results of the computation, but I understand the desire to test it just in case.

I'm working on setting up a test matrix to address this concern. Since we're going to need to do this anyway empower new soft dependencies, it's needed regardless of this contribution.

If I submit the test matrix PR, can we merge this?

@sjsrey
Copy link
Member

sjsrey commented Jul 10, 2016

+1

@sjsrey sjsrey merged commit ad96656 into pysal:dev Jul 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants