Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Optimize Cython code. Use scipy blas function pointers. #1524

Merged
merged 4 commits into from Apr 6, 2014

Conversation

Projects
None yet
6 participants
@jseabold
Copy link
Member

commented Mar 27, 2014

Didn't have to change any of the build setup which is nice. Tested building on windows for Python 2.7 32- and 64-bit without problems.

The old mailing list example from 9/13 is now 10-30x faster.

Since the Z matrix in the univariate ARMA case is always [[1, 0, ..., 0]]. We now avoid writing general KF code and just use slicing to select what we want.

There's probably a bit more performance that can be squeezed out here. We can us PyArray_DATA rather than memoryviews for the arrays that don't need to be sliced often. There are also still a few cases where we use dgemm when we don't need to. E.g., we're taking the outer product of two r x 1 vectors, but I left these as dgemm. The initial variance estimation is also still pure Python because I didn't want to rewrite it. This might gain us the most, but it's only called once per hit. There might also be some more ways to avoid the temporary arrays, but my thinking in BLAS linear algebra needs some work.

Need to add a note to the release about this.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Mar 27, 2014

just to understand: this is a rewrite of #1069 where you replaced the lapack/blas part by using the scipy pointers ?

Do we have anyone to test on Apple? (given Pauli's comments about "funny" linear algebra libraries)

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 27, 2014

Yes, it's a rewrite. We don't use any of the problematic headers from Apple. They're mostly only for single precision and we use double everywhere. I made a note about it at https://github.com/ChadFulton/pykalman_filter/issues/1

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Mar 27, 2014

sounds good, one worry less.

However it would still be good if an Apple user could test this before merge.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 27, 2014

We don't use sdot or any of the other problematic functions that Pauli pointed out. AFAIK, it's a non-issue if you're using scipy >= 0.13.x too or a non-accelerate BLAS. If we want to support older scipy and use these problematic functions, then we'll need to do something or tell people not to use the borked Accelerate BLAS.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Mar 27, 2014

We need to bump the Cython version on Travis. Cf. #1523.

@bashtage

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2014

Ubuntu 14.04 LTS will have SciPy 0.13.1, which makes it a little easier to choose a newer version.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 3, 2014

The current issue is that CObject does not exist in Python >= 3.2. They switched to PyCapsule. PyCapsule exists in Python >= 2.7, but as long as we target 2.6, then we can't use the easy fix.

We need some kind of conditional header file. It looks like NumPy provides this in core/include/numpy/npy_3kcompat.h, but I can't get it to build on Python 3 for some reason. I have a message out to the cython-user list. If I don't get any help there, then I'll have another look at this when I have a few hours.

@ev-br

This comment has been minimized.

Copy link
Contributor

commented Apr 3, 2014

(Probably irrelevant) might want to use dsymm instead of dgemm here and there

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 3, 2014

Yes, P is a symmetric matrix but it not yet handled as such in the code.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 3, 2014

Would save some operations. I think I'll have one more round of optimization / profiling before merging this, provided we get the build crap pinned down. We may not need to use memoryviews here all that much either. We might be better off without buffer acquisition here and using the numpy C api more.

@ev-br

This comment has been minimized.

Copy link
Contributor

commented Apr 3, 2014

Ignorance here: what is the reason for not using scipy blas wrappers as is? xSYMM wrappers are available from 0.13 onwards.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 3, 2014

Well, we are using the scipy blas wrappers. We're using the C function pointer to the fblas function directly from Cython. We don't want to involve any Python function calls in the C code.

Our dependency for scipy right now is 0.11.0 or 0.12.0. I don't remember offhand.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 5, 2014

Ah, I think I got this. Was trying to be too cute. Just wrote a header with the proper py3k macros.

@coveralls

This comment has been minimized.

Copy link

commented Apr 5, 2014

Coverage Status

Coverage remained the same when pulling afba01a on jseabold:arma-speedup-scipy into bed3499 on statsmodels:master.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 5, 2014

All green. Debating whether to try to squeeze a little more performance out of this.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 6, 2014

I'm leaning towards merging this and starting a new issue for further micro-optimizations. This is already a big speed-up even if we can get more. No reason to wait until I have some time to do further profiling.

jseabold added a commit that referenced this pull request Apr 6, 2014

Merge pull request #1524 from jseabold/arma-speedup-scipy
ENH: Optimize Cython code. Use scipy blas function pointers.

@jseabold jseabold merged commit a738b4f into statsmodels:master Apr 6, 2014

@jseabold jseabold deleted the jseabold:arma-speedup-scipy branch Apr 6, 2014

@josef-pkt josef-pkt added the PR label Apr 14, 2014

@piskvorky

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2014

I'll be porting these scipy-blas shenanigans soon too, for dual py2/3 compatibility in gensim.

How has the "capsule" thing panned out since being merged, Skipper? Any issues?

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2014

Works fine AFAICT. No complaints on Travis. See Pauli's reply here [1] for potential issues on Apple's Accelerate with some functions. You can find all the known issues here [2]. There's also some workaround code in Theano that tries to detect problems and conditionally write the headers [3]. I'm fairly certain that's the same sdot issue, but I don't know theano all that well.

[1] http://scipy-user.10969.n7.nabble.com/SciPy-User-scipy-linalg-blas-before-0-12-0-td19274.html
[2] https://github.com/scipy/scipy/tree/master/scipy/_build_utils/src
[3] https://github.com/Theano/Theano/blob/master/theano/tensor/blas_headers.py

@piskvorky

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2014

Yeah I noticed, I reported these Apple problems some month ago: https://groups.google.com/d/msg/cython-users/V_DR1xi5Ang/48OamQSX7TwJ and numpy/numpy#4007

General response was "not our problem", so my solution in gensim was to detect whether the sdot signature returns "float" or "double" at runtime... a nasty hack, but seems to get the work done :)

Will check out the theano code too, thanks for the links and heads up, Skipper.

@jseabold

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2014

Oh right. I remember reading that thread now. Fun stuff and small world.

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1524 from jseabold/arma-speedup-scipy
ENH: Optimize Cython code. Use scipy blas function pointers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.