
GPU support? Or not for this project #6439

Closed

karasjoh000 opened this issue Jan 21, 2020 · 6 comments

@karasjoh000

I haven't looked at the architecture of statsmodels, but I would be interested in integrating GPU support and having a backend that runs in a systems programming language. Is that a good idea for this project, or would it need to be a completely new project? I didn't find any other open-source econometrics libraries, and I wanted to improve this one.

@josef-pkt
Member

It's unlikely that statsmodels can easily run on a GPU. We make very heavy use of scipy, and as far as I know that doesn't run on GPUs.
The main core parts are scipy.optimize and linear algebra, both numpy and scipy linalg.

I guess the most difficult part is that the tsa.statespace models use the BLAS/LAPACK wrappers that scipy provides directly in Cython.

@josef-pkt
Member

With 200K lines of code, statsmodels is also pretty big.
What I have seen in some cases is that outside developers took one part, like GLM, and refactored it into whatever they needed.

The other point is that I'm not sure where a GPU would help us.
A more likely route, for example, is to use Dask or something similar to get distributed computing, but the main target there would be out-of-core memory handling, with multi-core/distributed execution only secondary.

(I have never worked with a GPU; the one time I tried, I didn't manage to get it to install/work.)
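A rough illustration of the out-of-core pattern mentioned above: OLS estimation can accumulate the normal-equation blocks chunk by chunk, so only one chunk needs to be in memory at a time. This is a hypothetical sketch in plain numpy, not statsmodels or Dask code; `chunked_lstsq` and the data are made up for the example.

```python
import numpy as np

def chunked_lstsq(chunks):
    """Accumulate X'X and X'y over data chunks, then solve the normal
    equations. Only one chunk is in memory at a time -- the out-of-core
    pattern; Dask implements the same idea with lazy chunked arrays."""
    k = chunks[0][0].shape[1]
    xtx = np.zeros((k, k))
    xty = np.zeros(k)
    for X, y in chunks:
        xtx += X.T @ X
        xty += X.T @ y
    return np.linalg.solve(xtx, xty)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta  # noiseless, so OLS recovers beta exactly
chunks = [(X[i:i + 250], y[i:i + 250]) for i in range(0, 1000, 250)]
print(chunked_lstsq(chunks))  # ≈ [1., -2., 0.5]
```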

@karasjoh000
Author

I see. I will take a look.

@karasjoh000
Author

I think introducing an interface for linalg calls, to abstract over scipy and GPU operations, could work in some models. There is also the issue that sending small jobs to the GPU causes communication overhead.
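A minimal sketch of the kind of backend interface described above, assuming the GPU library exposes a numpy-compatible namespace (as cupy does). `LinalgBackend` is a hypothetical name for this sketch, not a statsmodels API.

```python
import numpy as np

class LinalgBackend:
    """Dispatch linear-algebra calls to a pluggable array library."""

    def __init__(self, xp=np):
        # xp is any module with a numpy-like API surface,
        # e.g. numpy on the CPU or (hypothetically) cupy on a GPU
        self.xp = xp

    def solve(self, a, b):
        return self.xp.linalg.solve(a, b)

    def lstsq(self, a, b):
        return self.xp.linalg.lstsq(a, b, rcond=None)

# models would receive a backend instead of calling numpy/scipy directly
backend = LinalgBackend(np)
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = backend.solve(a, b)  # → [2., 3.]
```

Swapping in a GPU backend would then be a one-line change at model construction, though as noted above, small solves would be dominated by transfer overhead.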

@bashtage
Member

IMO if one had a compelling case for GPU, then numba would be the way to go. In most cases the GPU would only be needed for a small fraction of the code.
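A sketch of the pattern bashtage suggests: JIT-compiling one small hot kernel with numba while the rest of the model stays plain Python/numpy. `weighted_ssr` is a made-up example function; the fallback decorator only lets the snippet run where numba isn't installed.

```python
import numpy as np

try:
    from numba import njit  # optional JIT; numba also has a CUDA target
except ImportError:
    def njit(func):  # no-op fallback so the sketch runs without numba
        return func

@njit
def weighted_ssr(resid, weights):
    # weighted sum of squared residuals: the kind of small hot kernel
    # worth compiling, rather than porting the whole model to the GPU
    total = 0.0
    for i in range(resid.shape[0]):
        total += weights[i] * resid[i] * resid[i]
    return total

r = np.array([1.0, -2.0, 0.5])
w = np.array([1.0, 0.5, 2.0])
print(weighted_ssr(r, w))  # 1.0 + 2.0 + 0.5 = 3.5
```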

@josef-pkt
Member

A while ago I looked at dask-ml (https://github.com/dask/dask-ml).
As far as I could figure out, they use distributed computing with scikit-learn only for the prediction part, but have to do the estimation in memory (i.e. on one computer).

In our case, estimation usually runs into memory problems before we get to long computation times. MKL and other linalg libraries use multiple cores, but often that's not efficient.

The only cases where we use explicit multiprocessing inside statsmodels are obviously parallel computations, e.g. in the bootstrap and maybe in cross-validation (AFAIR). In those cases we need the entire model available in the subprocesses, not just small pieces of code.
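To illustrate the "obviously parallel" case: each bootstrap replication is independent, so a process pool can map over them, but every worker needs the full estimation code available (here just a toy mean; in statsmodels it would be the whole model). `one_replication` and `bootstrap` are hypothetical names for this sketch.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def one_replication(args):
    """One bootstrap replication: resample with replacement, re-estimate,
    return the statistic. Must be importable at module level so worker
    processes can pickle and run it."""
    data, seed = args
    rng = np.random.default_rng(seed)
    sample = rng.choice(data, size=data.shape[0], replace=True)
    return sample.mean()  # stand-in for a full model fit

def bootstrap(data, n_rep=100):
    tasks = [(data, seed) for seed in range(n_rep)]
    # replications are independent, so a process pool parallelizes trivially
    with ProcessPoolExecutor() as pool:
        return list(pool.map(one_replication, tasks))

if __name__ == "__main__":
    estimates = bootstrap(np.arange(50.0), n_rep=20)
    print(len(estimates))  # 20 bootstrap estimates of the mean
```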

For a numba speed-up, we could just optimize a single function at the core of a model or a stats function.

There are likely only a few use cases inside statsmodels where a GPU might help (with the caveat that I don't know much about GPU use cases).

@bashtage bashtage added this to the 0.12 milestone Jul 27, 2020