
Using PyMC3 on the GPU #1246

Closed
twiecki opened this issue Jul 18, 2016 · 15 comments
twiecki (Member) commented Jul 18, 2016

Most input provided by @fhuszar.

It seems there are at least two blockers to using PyMC3 on the GPU. The first is incompatibility with the float32 dtype. Here is an example model:

from pymc3 import Model, NUTS, sample
from pymc3.distributions import DensityDist
import pymc3 as pm
import theano
import numpy as np

theano.config.floatX = 'float32'
theano.config.compute_test_value = 'raise'
theano.config.exception_verbosity = 'high'

with Model() as denoising_model:
    # theano.config.compute_test_value = 'off'

    x = DensityDist('x',
            logp=lambda value: -(value**2).sum(),
            shape=(1, 1, 10, 10),
            testval=np.random.randn(1, 1, 10, 10).astype('float32'),
            dtype='float32',
        )

    sampler = pm.Metropolis()
    trace = sample(10, sampler)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/wiecki/miniconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    898             outputs =\
--> 899                 self.fn() if output_subset is None else\
    900                 self.fn(output_subset=output_subset)

TypeError: expected type_num 11 (NPY_FLOAT32) got 12

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-9a14b0f25285> in <module>()
     19 
     20     sampler = pm.Metropolis()
---> 21     trace = sample(10, sampler)

/home/wiecki/working/projects/pymc/pymc3/sampling.py in sample(draws, step, start, trace, chain, njobs, tune, progressbar, model, random_seed)
    148         sample_func = _sample
    149 
--> 150     return sample_func(**sample_args)
    151 
    152 

/home/wiecki/working/projects/pymc/pymc3/sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed)
    157     progress = progress_bar(draws)
    158     try:
--> 159         for i, strace in enumerate(sampling):
    160             if progressbar:
    161                 progress.update(i)

/home/wiecki/working/projects/pymc/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    239         if i == tune:
    240             step = stop_tuning(step)
--> 241         point = step.step(point)
    242         strace.record(point)
    243         yield strace

/home/wiecki/working/projects/pymc/pymc3/step_methods/arraystep.py in step(self, point)
    125         bij = DictToArrayBijection(self.ordering, point)
    126 
--> 127         apoint = self.astep(bij.map(point))
    128         return bij.rmap(apoint)
    129 

/home/wiecki/working/projects/pymc/pymc3/step_methods/metropolis.py in astep(self, q0)
    125             q = q0 + delta
    126 
--> 127         q_new = metrop_select(self.delta_logp(q, q0), q, q0)
    128 
    129         if q_new is q:

/home/wiecki/miniconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    910                     node=self.fn.nodes[self.fn.position_of_error],
    911                     thunk=thunk,
--> 912                     storage_map=getattr(self.fn, 'storage_map', None))
    913             else:
    914                 # old-style linkers raise their own exceptions

/home/wiecki/miniconda3/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    312         # extra long error message in that case.
    313         pass
--> 314     reraise(exc_type, exc_value, exc_trace)
    315 
    316 

/home/wiecki/miniconda3/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687 

/home/wiecki/miniconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    897         try:
    898             outputs =\
--> 899                 self.fn() if output_subset is None else\
    900                 self.fn(output_subset=output_subset)
    901         except Exception:

TypeError: expected type_num 11 (NPY_FLOAT32) got 12
Apply node that caused the error: Elemwise{sqr,no_inplace}(Reshape{4}.0)
Toposort index: 5
Inputs types: [TensorType(float32, (True, True, False, False))]
Inputs shapes: [(1, 1, 10, 10)]
Inputs strides: [(800, 800, 80, 8)]
Inputs values: ['not shown']
Outputs clients: [[Sum{acc_dtype=float64}(Elemwise{sqr,no_inplace}.0)]]

Debugprint of the apply node: 
Elemwise{sqr,no_inplace} [id A] <TensorType(float32, (True, True, False, False))> ''   
 |Reshape{4} [id B] <TensorType(float32, (True, True, False, False))> ''   
   |Subtensor{int64:int64:} [id C] <TensorType(float32, vector)> ''   
   | |inarray1 [id D] <TensorType(float32, vector)>
   | |Constant{0} [id E] <int64>
   | |Constant{100} [id F] <int64>
   |TensorConstant{[ 1  1 10 10]} [id G] <TensorType(int64, vector)>

Storage map footprint:
 - inarray, Input, Shape: (100,), ElemSize: 8 Byte(s), TotalSize: 800 Byte(s)
 - Reshape{4}.0, Shape: (1, 1, 10, 10), ElemSize: 8 Byte(s), TotalSize: 800 Byte(s)
 - inarray1, Input, Shape: (100,), ElemSize: 8 Byte(s), TotalSize: 800 Byte(s)
 - TensorConstant{[ 1  1 10 10]}, Shape: (4,), ElemSize: 8 Byte(s), TotalSize: 32 Byte(s)
 - Constant{0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 - Constant{100}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
 TotalSize: 2448.0 Byte(s) 0.000 GB
 TotalSize inputs: 1648.0 Byte(s) 0.000 GB

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.

Not sure where the float64 dtype comes in. @nouiz any ideas?

The second problem is related to #566 and should have a simple solution: only set the test-value behavior inside the model context.
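As an aside on where a stray float64 can sneak in: NumPy silently upcasts float32 arrays when they are combined with float64 values, and `np.random` returns float64 by default. A minimal sketch of that mechanism (plain NumPy, not PyMC3 code):

```python
import numpy as np

q0 = np.zeros(5, dtype='float32')        # float32 state, as required for the GPU
delta = np.random.normal(size=5)         # np.random returns float64 by default

q = q0 + delta                           # silent upcast: the result is float64
print(q.dtype)                           # float64

q_fixed = q0 + delta.astype('float32')   # casting the proposal keeps float32
print(q_fixed.dtype)                     # float32
```

Any step taken outside the compiled Theano graph (e.g. a proposal computed in NumPy) can upcast a value this way, and the compiled function then rejects the float64 input.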

nouiz (Contributor) commented Jul 18, 2016

This error is strange. Can you update Theano to the dev version?

It looks like the reshape is upcasting the float32 to float64 when it shouldn't. This is just calling numpy reshape; this is all on the CPU.


twiecki (Member, Author) commented Jul 18, 2016

Just updated to '0.7.0.dev-7ba9c05257347024ea90eed2f464f26cb4242b93' and the error is the same.

nouiz (Contributor) commented Jul 18, 2016

0.7.0.dev... is very old. Can you update to 0.9.0.dev2?


twiecki (Member, Author) commented Jul 18, 2016

Identical error on '0.9.0dev2.dev-d5944c965c453558ef834b439b671e0c01530b3c'.

twiecki (Member, Author) commented Jul 21, 2016

@nouiz Any idea what might be causing the error on the most recent Theano?

fhuszar commented Jul 21, 2016

Isn't the problem that the Metropolis sampler updates some of the variables in astep, and that this update happens outside Theano, so that's where the float64s come in? Perhaps somewhere a numpy array that is float64 is being added or multiplied; for example, what's the type of self.scaling?

A quick fix might be adding allow_input_downcast=True every time you call theano.function. For example, in NUTS: f = theano.function([q, p, e, q0, p0], [q1, p1, dE], profile=profile)


twiecki (Member, Author) commented Jul 21, 2016

@fhuszar Interesting, I'll look into that. I suppose proposals should take either the dtype of the RV or floatX.
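A minimal NumPy sketch of that idea, using a hypothetical `propose` helper (not the actual PyMC3 code): cast the proposal delta back to the dtype of the random variable before adding it.

```python
import numpy as np

def propose(q0, scaling=1.0):
    """Metropolis-style proposal that preserves the dtype of q0.

    Hypothetical helper for illustration; np.random.normal always
    returns float64, so the delta is cast back to q0's dtype.
    """
    delta = scaling * np.random.normal(size=q0.shape)
    return q0 + delta.astype(q0.dtype)

q0 = np.zeros((1, 1, 10, 10), dtype='float32')
q = propose(q0)
print(q.dtype)  # float32
```

Without the explicit `astype`, `q0 + delta` would be float64 and the compiled Theano function would raise the `expected type_num 11 (NPY_FLOAT32) got 12` error shown above.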

fhuszar commented Jul 21, 2016

Yes, but I think if the Theano function has allow_input_downcast it should automatically figure that out.

nouiz (Contributor) commented Jul 21, 2016

allow_input_downcast to theano.function() only works on the inputs to the function, not on the inputs of each node. We can't easily lift that restriction, and I think it would be a bad idea to do so.


fhuszar commented Jul 21, 2016

If you use the GPU in .theanorc then all nodes in the compute graph will consistently have the floatX type. I think it's only the input variables whose type will be the type of the numpy array, unless allow_input_downcast is set. I agree that it would be ideal if the float64s were manually downcast, but that may be hard to do.

Also, ideally the random number generation, leapfrog step, Metropolis updates, and accept/reject would also happen in Theano (perhaps best implemented via the updates parameter of theano.function, as is done for example in lasagne.updates), so there would be no back-and-forth data transfer between CPU and GPU. If you want to sample images or large parameter tensors for deep learning, significant time will be lost transferring data between CPU and GPU memory.


twiecki (Member, Author) commented Jul 22, 2016

@fhuszar You are certainly correct that this is the issue. I did some manual down-casting in Metropolis and NUTS and the above example works. Here is a branch that makes Python use the correct dtypes: #1253; can you have a look?

The leapfrog steps and Metropolis updates are already happening in Theano, I believe (but I could be wrong). The random number generation, however, is not; moving that into Theano would probably be a better fix. Could you be a bit more specific about what would need to change?

magnushiie commented Dec 31, 2016

Another issue preventing running on the GPU is the _psi C function in distributions/special.py, which is missing the __device__ annotation. Theano has a similar _psi function with a DEVICE macro:

// For GPU support
#ifdef __CUDACC__
#define DEVICE __device__
#else
#define DEVICE
#endif

#ifndef _PSIFUNCDEFINED
#define _PSIFUNCDEFINED
DEVICE double _psi(double x){

Perhaps PyMC3 should use Theano's Psi instead?
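For reference, the `_psi` routine above computes the digamma function. SciPy exposes the same special function as `scipy.special.psi`, which can serve as a quick correctness check against any C implementation (a sketch assuming SciPy is available; it is not part of the PyMC3 code under discussion):

```python
from scipy.special import psi

# psi(1) is the negative Euler-Mascheroni constant, about -0.5772156649.
print(psi(1.0))

# The digamma recurrence psi(x + 1) = psi(x) + 1/x is a handy sanity check.
print(psi(2.0) - psi(1.0))  # 1.0
```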

twiecki (Member, Author) commented Dec 31, 2016

@magnushiie Thanks for the pointer, can you point to where in the Theano code that is? We should definitely use that.

twiecki (Member, Author) commented Dec 31, 2016

twiecki (Member, Author) commented Sep 16, 2021

Supported by JAX.
