
GPU support? Or not for this project #6439

Closed

karasjoh000 opened this issue Jan 21, 2020 · 6 comments

@karasjoh000

I haven't looked at the architecture of statsmodels, but I would be interested in integrating GPU support and having a backend that runs in a systems programming language. Is that a good idea for this project, or would it need to be a completely new project? I didn't find any other open-source econometrics libraries, and I wanted to improve this one.

@josef-pkt
Member

It's unlikely that statsmodels can easily run on a GPU. We make very heavy use of scipy, and as far as I know that doesn't run on GPUs.
The main core parts are scipy.optimize and linear algebra, both numpy and scipy linalg.

I guess the most difficult part is that the tsa.statespace models use the BLAS/LAPACK wrappers that scipy provides directly in Cython.

@josef-pkt
Member

With 200K lines of code, statsmodels is also pretty big.
What I have seen in some cases is that outside developers took one part, like GLM, and refactored it into whatever they needed.

The other point is that I'm not sure where a GPU would help us.
A more likely route, for example, is to use Dask or something similar to get distributed computing, but the main target there would be out-of-core memory handling, with multi-core/distributed execution only secondary.

(I have never worked with a GPU; the one time I tried, I didn't manage to get it to install/work.)
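A rough illustration of the out-of-core pattern mentioned above: OLS estimation can accumulate the normal-equation blocks chunk by chunk, so only one chunk needs to be in memory at a time. This is a hypothetical sketch in plain numpy, not statsmodels or Dask code; `chunked_lstsq` and the data are made up for the example.

```python
import numpy as np

def chunked_lstsq(chunks):
    """Accumulate X'X and X'y over data chunks, then solve the normal
    equations. Only one chunk is in memory at a time -- the out-of-core
    pattern; Dask implements the same idea with lazy chunked arrays."""
    k = chunks[0][0].shape[1]
    xtx = np.zeros((k, k))
    xty = np.zeros(k)
    for X, y in chunks:
        xtx += X.T @ X
        xty += X.T @ y
    return np.linalg.solve(xtx, xty)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta  # noiseless, so OLS recovers beta exactly
chunks = [(X[i:i + 250], y[i:i + 250]) for i in range(0, 1000, 250)]
print(chunked_lstsq(chunks))  # ≈ [1., -2., 0.5]
```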

@karasjoh000
Author

I see. I will take a look.

@karasjoh000
Author

I think introducing an interface for linalg calls, to abstract over scipy and GPU operations, could work in some models. There is also the issue that sending small jobs to the GPU causes communication overhead.
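A minimal sketch of the kind of backend interface described above, assuming the GPU library exposes a numpy-compatible namespace (as cupy does). `LinalgBackend` is a hypothetical name for this sketch, not a statsmodels API.

```python
import numpy as np

class LinalgBackend:
    """Dispatch linear-algebra calls to a pluggable array library."""

    def __init__(self, xp=np):
        # xp is any module with a numpy-like API surface,
        # e.g. numpy on the CPU or (hypothetically) cupy on a GPU
        self.xp = xp

    def solve(self, a, b):
        return self.xp.linalg.solve(a, b)

    def lstsq(self, a, b):
        return self.xp.linalg.lstsq(a, b, rcond=None)

# models would receive a backend instead of calling numpy/scipy directly
backend = LinalgBackend(np)
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = backend.solve(a, b)  # → [2., 3.]
```

Swapping in a GPU backend would then be a one-line change at model construction, though as noted above, small solves would be dominated by transfer overhead.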

@bashtage
Member

IMO if one had a compelling case for GPU, then numba would be the way to go. In most cases the GPU would only be needed for a small fraction of the code.
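A sketch of the pattern bashtage suggests: JIT-compiling one small hot kernel with numba while the rest of the model stays plain Python/numpy. `weighted_ssr` is a made-up example function; the fallback decorator only lets the snippet run where numba isn't installed.

```python
import numpy as np

try:
    from numba import njit  # optional JIT; numba also has a CUDA target
except ImportError:
    def njit(func):  # no-op fallback so the sketch runs without numba
        return func

@njit
def weighted_ssr(resid, weights):
    # weighted sum of squared residuals: the kind of small hot kernel
    # worth compiling, rather than porting the whole model to the GPU
    total = 0.0
    for i in range(resid.shape[0]):
        total += weights[i] * resid[i] * resid[i]
    return total

r = np.array([1.0, -2.0, 0.5])
w = np.array([1.0, 0.5, 2.0])
print(weighted_ssr(r, w))  # 1.0 + 2.0 + 0.5 = 3.5
```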

@josef-pkt
Member

A while ago I looked at dask-ml (https://github.com/dask/dask-ml).
As far as I could figure out, they use distributed computing with scikit-learn only for the prediction part, but have to do the estimation in memory (i.e. on one computer).

In our case, estimation usually runs into memory problems before we get to long computation times. MKL and other linalg libraries use multiple cores, but often that's not efficient.

The only cases where we use explicit multiprocessing inside statsmodels are obviously parallel computations, e.g. in the bootstrap and maybe in cross-validation (AFAIR). In those cases we need the entire model available in the subprocesses, not just small pieces of code.
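To illustrate the "obviously parallel" case: each bootstrap replication is independent, so a process pool can map over them, but every worker needs the full estimation code available (here just a toy mean; in statsmodels it would be the whole model). `one_replication` and `bootstrap` are hypothetical names for this sketch.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def one_replication(args):
    """One bootstrap replication: resample with replacement, re-estimate,
    return the statistic. Must be importable at module level so worker
    processes can pickle and run it."""
    data, seed = args
    rng = np.random.default_rng(seed)
    sample = rng.choice(data, size=data.shape[0], replace=True)
    return sample.mean()  # stand-in for a full model fit

def bootstrap(data, n_rep=100):
    tasks = [(data, seed) for seed in range(n_rep)]
    # replications are independent, so a process pool parallelizes trivially
    with ProcessPoolExecutor() as pool:
        return list(pool.map(one_replication, tasks))

if __name__ == "__main__":
    estimates = bootstrap(np.arange(50.0), n_rep=20)
    print(len(estimates))  # 20 bootstrap estimates of the mean
```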

For a numba speed-up, we could just optimize a single function at the core of a model or a stats function.

There are likely only a few use cases inside statsmodels where a GPU might help (with the caveat that I don't know much about GPU use cases).

@bashtage bashtage added this to the 0.12 milestone Jul 27, 2020