numba autojit _fisher_jenks_means if numba is available #829

mlyons-tcc · 2016-07-08T17:19:41Z

fixes 828. gets over 100x performance improvement for _fisher_jenks_means if numba is installed.

sjsrey · 2016-07-08T17:46:48Z

Thanks @mlyons-tcc - this looks really promising.

Can you add a test suite for this enhancement, in esda/tests/test_mapclassify.py ?

ljwolf · 2016-07-08T17:53:44Z

Yeah, this looks real cool @mlyons-tcc, thanks!

I've been looking at doing the same thing in cg, but haven't seen this kind of speedup. Jenks is a great target for this.

Locally, I go from ~10 seconds/run on the NAT HR90 attribute to ~80 ms.

mlyons-tcc · 2016-07-08T19:08:24Z

Thanks. I tested out a Cython implementation that performs faster. Numba was on the same order of magnitude though, and I really like the autojit decorator. It's a shame that you have to jump through so many hoops to get Numba installed if you aren't using Conda. I'm not good enough at Cython to get Pure Python Mode working.

@sjsrey, there is already a test: TestFisherJenks that tests the algorithm.

If I understood correctly PySAL requires not depending on external packages other than numpy and scipy so I didn't force a numba dependency. I'm not experienced in using travis.yml scripts so I don't know the proper way to run tests with/without numba. It also seems weird to run the whole test suite twice for this one function with/without numba optimization.

When numba is installed, it just does an autojit on the one function. In my opinion, it's highly unlikely the function will not work properly when bypassing autojit if the autojit version is tested. However, it's possible for the autojit to break and not work despite the algo passing its test when autojit is bypassed. Because of that, I would recommend that I just add numba to the conda install section of the .travis.yml file.

Please advise.

sjsrey · 2016-07-08T22:39:09Z

I like this approach, but am thinking about two things: [1] whether we should use the same strategy more broadly throughout PySAL (where numba would pay off) and thus centralize the numba check a la this approach; and [2] how to modify our tests for the numba present and numba absent cases.

ljwolf · 2016-07-10T21:24:47Z

As @mlyons-tcc notes, it'd be real weird if autojit changed the results of the computation, but I understand the desire to test it just in case.

I'm working on setting up a test matrix to address this concern. Since we're going to need to do this anyway empower new soft dependencies, it's needed regardless of this contribution.

If I submit the test matrix PR, can we merge this?

sjsrey · 2016-07-10T22:13:55Z

+1

numba autojit _fisher_jenks_means if numba is available

2186705

mlyons-tcc mentioned this pull request Jul 8, 2016

Fisher_Jenks pure python implementation is too slow #828

Closed

This was referenced Jul 12, 2016

[REBASED] PySAL+ optional testing matrix #830

Closed

PySAL+ optional testing matrix #833

Closed

sjsrey merged commit ad96656 into pysal:dev Jul 16, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

numba autojit _fisher_jenks_means if numba is available #829

numba autojit _fisher_jenks_means if numba is available #829

mlyons-tcc commented Jul 8, 2016

sjsrey commented Jul 8, 2016

ljwolf commented Jul 8, 2016

mlyons-tcc commented Jul 8, 2016

sjsrey commented Jul 8, 2016

ljwolf commented Jul 10, 2016

sjsrey commented Jul 10, 2016

numba autojit _fisher_jenks_means if numba is available #829

numba autojit _fisher_jenks_means if numba is available #829

Conversation

mlyons-tcc commented Jul 8, 2016

sjsrey commented Jul 8, 2016

ljwolf commented Jul 8, 2016

mlyons-tcc commented Jul 8, 2016

sjsrey commented Jul 8, 2016

ljwolf commented Jul 10, 2016

sjsrey commented Jul 10, 2016