Deprecate all backends except text #2189

fbnrst · 2017-05-16T09:29:10Z

In https://pymc-devs.github.io/pymc3/api/backends.html#selecting-a-backend there is some pseudo code on how to select backends:

import pymc3 as pm
db = pm.backends.Text('test')
trace = pm.sample(..., trace=db)

I tried to get a minimal example running, but I ran into some issues. First, I think db = pm.backends.Text('test') needs a model context. Furthermore, the sampling needs at least one random variable:

import pymc3 as pm

model = pm.Model()

with model:
    db = pm.backends.Text('test')
    a = pm.Normal('a', mu=0, sd=1)
    trace = pm.sample(1000, n_init=1000, trace=db)

However, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-195-1b336b9db146> in <module>()
      6     db = pm.backends.Text('test')
      7     a = pm.Normal('a', mu=0, sd=1)
----> 8     trace = pm.sample(1000, n_init=1000, trace=db)

/home/fabian/anaconda2/envs/py3/lib/python3.5/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain, njobs, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, **kwargs)
    258         sample_func = _sample
    259 
--> 260     return sample_func(**sample_args)
    261 
    262 

/home/fabian/anaconda2/envs/py3/lib/python3.5/site-packages/pymc3/sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed, live_plot, **kwargs)
    273     try:
    274         strace = None
--> 275         for it, strace in enumerate(sampling):
    276             if live_plot:
    277                 if it >= skip_first:

/home/fabian/anaconda2/envs/py3/lib/python3.5/site-packages/tqdm/_tqdm.py in __iter__(self)
    814 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    815 
--> 816             for obj in iterable:
    817                 yield obj
    818                 # Update and print the progressbar.

/home/fabian/anaconda2/envs/py3/lib/python3.5/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    373                     strace.record(point, states)
    374                 else:
--> 375                     strace.record(point)
    376             else:
    377                 point = step.step(point)

/home/fabian/anaconda2/envs/py3/lib/python3.5/site-packages/pymc3/backends/text.py in record(self, point)
     92         """
     93         vals = {}
---> 94         for varname, value in zip(self.varnames, self.fn(point)):
     95             vals[varname] = value.ravel()
     96         columns = [str(val) for var in self.varnames for val in vals[var]]

/home/fabian/anaconda2/envs/py3/lib/python3.5/site-packages/pymc3/model.py in __call__(self, state)
    755 
    756     def __call__(self, state):
--> 757         return self.f(**state)
    758 
    759 

/home/fabian/anaconda2/envs/py3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    777 
    778             if len(args) + len(kwargs) > len(self.input_storage):
--> 779                 raise TypeError("Too many parameter passed to theano function")
    780 
    781             # Set positional arguments

TypeError: Too many parameter passed to theano function

Interestingly, using the SQLite backend gives the same error, but the example without a trace backend works just fine:

import pymc3 as pm

model = pm.Model()

with model:
    db = pm.backends.Text('test')
    a = pm.Normal('a', mu=0, sd=1)
    trace = pm.sample(1000, n_init=1000)


twiecki · 2017-05-16T09:49:13Z

Thanks @fabianrost84. What do you need the backend for? Have you tried the hdf5 backend?

We are considering deprecating the backends.

fbnrst · 2017-05-16T09:57:10Z

I thought of backends as a convenient way to store traces that result from long sampling, such that I can work with them later. Would there be another way to do that?

hdf5 backend gives the same error.

twiecki · 2017-05-16T10:01:38Z

Why do you need long sampling runs? Do you have poor convergence?

fbnrst · 2017-05-16T10:08:53Z

Haha ;) I should have been more specific. No, rather I have many datasets to which I fit the same model. For one dataset the sampling is reasonably fast (~1 minute). But when I fit 20 datasets, it takes 20 minutes, so I just wanted to store those traces in case I want to do further analysis.

twiecki · 2017-05-16T10:44:40Z

I see, so if you were able to store traces after sampling, that would be sufficient?

fbnrst · 2017-05-16T12:01:13Z

Exactly. I also tried to pickle them:

pt = pickle.dumps(trace)

However, pickle.loads(pt) then results in a recursion error.

junpenglao · 2017-05-16T12:09:09Z

You can also try pm.trace_to_dataframe and save the trace as a pandas DataFrame.

fbnrst · 2017-05-16T13:35:58Z

Thanks for the suggestion @junpenglao. That would work for saving some important information. But I would lose the sampler stats, right? And with a dataframe I lose some convenient functionality; for instance, I cannot create a traceplot from the dataframe.

Maybe to conclude from the discussion above:

- There is an issue with the documentation of backends (I could not get them working from what is written there). One reason might be that deprecating the backends is under consideration.
- There are two use cases for backends:
  - One is to save memory, but this should not be necessary any more, because "good" models should converge fast enough that we only need short traces.
  - The other is to store the trace for later use. But this does not seem to be the originally intended use case.

So maybe we should split this issue into three, something like:

1) Deprecation of backends
2) Update backend documentation
3) Storing traces after sampling

2) and 3) will have to wait until 1) is decided. 3) is a feature request which might have to wait.

twiecki · 2017-05-16T13:44:55Z

Can't you just pickle the trace object?

ColCarroll · 2017-05-16T13:45:03Z

Yes! This would all be great! For now I think a good hack would be something like:

import pickle

with open('model_{}.pkl'.format(model_id), 'wb') as buff:
    pickle.dump({'model': model, 'trace': trace}, buff)

where you specify a model id you can use. (You might not even need to keep the model, but you could use it to keep sampling)
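A minimal round-trip sketch of the hack above, assuming the object pickles and unpickles cleanly (fbnrst reported a recursion error on load earlier in this thread, so that may not hold for every trace). The `fake_trace` dict and the `model_id` value are stand-ins for illustration, not PyMC3 objects:

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a real trace: hypothetical data, not a PyMC3 MultiTrace.
fake_trace = {"a": [0.1, -0.3, 0.7], "sampler_stats": {"accept": [0.9, 0.8, 0.95]}}
model_id = "dataset_01"  # hypothetical identifier, one per fitted dataset

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_{}.pkl".format(model_id)

    # Save once sampling has finished.
    with open(path, "wb") as buff:
        pickle.dump({"trace": fake_trace}, buff)

    # Load later; only unpickle files you created yourself.
    with open(path, "rb") as buff:
        restored = pickle.load(buff)["trace"]
```

With 20 datasets, the same pattern would run once per dataset with a different `model_id`.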

ColCarroll · 2017-05-16T13:47:46Z

Also, I call this a "hack" because you shouldn't trust pickle objects! They can run arbitrary python code, but it should be fine for local work.
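To make the warning concrete, here is a harmless sketch of how unpickling can call a function chosen by whoever wrote the file. The `Payload` class is hypothetical; it uses the standard `__reduce__` hook so that `pickle.loads` calls `eval` on a string instead of rebuilding the object:

```python
import pickle


class Payload:
    """Hypothetical object whose pickle tells the loader what to call."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") instead of rebuilding a Payload.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs the expression stored in the pickle
```

Here the expression is only arithmetic, but the same mechanism could call any importable function, which is why pickles from untrusted sources should not be loaded.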

madanh · 2017-07-19T10:49:10Z

In my case I have the same errors as @fabianrost84, and I need long runs because there is some obvious random-walk behaviour in my (admittedly non-trivial) model. Anyway, being able to save intermediate traces is a legitimate need, so some mechanism to achieve this must be present.

junpenglao · 2017-07-19T11:03:49Z

Maybe we can refactor some part of the SMC code to make saving and loading intermediate traces a general feature.
@ColCarroll @hvasbath

madanh · 2017-07-19T11:13:16Z

I also wouldn't mind doing short runs interleaved with saving, but this means that we must be able to restart sampling from the previous state (possibly loaded from a file). Is there a clean way to do it now?
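As a rough illustration of what "restart from the previous state" requires, here is a toy random-walk Metropolis chain in plain Python. It is not PyMC3 API; all names (`run_chunk`, `initial_state`, `save_state`, `load_state`) are made up for this sketch. The point is that the saved state has to contain both the current point and the random number generator state for the resumed run to match an uninterrupted one:

```python
import math
import pickle
import random
import tempfile
from pathlib import Path


def log_density(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def initial_state(seed):
    rng = random.Random(seed)
    return {"x": 0.0, "rng_state": rng.getstate(), "samples": []}


def run_chunk(state, n_draws):
    """Advance the chain by n_draws steps and return the new state."""
    rng = random.Random()
    rng.setstate(state["rng_state"])  # continue the same random stream
    x = state["x"]
    samples = list(state["samples"])
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, 1.0)
        log_ratio = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return {"x": x, "rng_state": rng.getstate(), "samples": samples}


def save_state(state, path):
    with open(path, "wb") as buff:
        pickle.dump(state, buff)


def load_state(path):
    with open(path, "rb") as buff:
        return pickle.load(buff)


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "chain.pkl"
    save_state(run_chunk(initial_state(seed=42), 50), checkpoint)
    # Later, possibly in a new process: reload and keep sampling.
    resumed = run_chunk(load_state(checkpoint), 50)

# Reference run without any interruption, for comparison.
uninterrupted = run_chunk(initial_state(seed=42), 100)
```

A real sampler would also need to store tuning information such as step sizes, which this toy chain does not have.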

hvasbath · 2017-07-19T11:26:42Z

SMC does it already, yes.
But it seems very model-specific that the other backends stop working?!
Maybe the SMC trace wouldn't work either. I also don't understand why this error occurs.
It seems like the point that is supposed to be recorded has more RVs to be stored than your model function takes as input.
These backends (hdf5, sqlite) re-evaluate the model during point recording, which in my opinion should be changed anyway, as it is a huge performance drain to evaluate again what you already know ...

You do not get the error when not specifying the trace, because it then uses the numpy array as backend...

Yes, the SMC trace (which doesn't re-evaluate the model) takes a list of arrays as record input, whereas the other backends need a point dict. Once we have decided how best to refactor, we can do that.
Now that we have the list-to-dict bijection since the last SMC update, we could refactor the SMC trace to take points as well, so the SMC trace should then work for any sampler...
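The idea of a list-to-dict bijection can be sketched in a few lines. This is only an illustration of the concept, not PyMC3's implementation; `make_bijection` and the variable names in the example are hypothetical:

```python
def make_bijection(varnames):
    """Build converters between a point dict and a list in a fixed variable order."""
    order = tuple(varnames)

    def dict_to_list(point):
        # Keys not listed in `order` are ignored; missing ones raise KeyError.
        return [point[name] for name in order]

    def list_to_dict(values):
        if len(values) != len(order):
            raise ValueError(
                "expected {} values, got {}".format(len(order), len(values))
            )
        return dict(zip(order, values))

    return dict_to_list, list_to_dict


dict_to_list, list_to_dict = make_bijection(["a", "b_log__"])
point = {"a": [0.5], "b_log__": [-1.2, 0.3]}
as_list = dict_to_list(point)       # what a list-based trace would record
round_trip = list_to_dict(as_list)  # what a dict-based backend would record
```

With converters like these, a trace that records lists and a sampler that produces point dicts can be connected without evaluating the model again.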

ColCarroll · 2017-07-19T13:17:27Z

@madanh How would you intend to use the file-backed traces? If it is truly too large to fit in memory, then it seems like you would need bespoke machinery to do any analysis on it anyways.

It seems like one of dask, blaze, pytables, or numpy.memmap might be appropriate. In my beautiful future, there would be a small external project implementing an O(1) memory backend using whichever of those is most appropriate, which could then be merged back into pymc3. At the same time, we would deprecate the other backends and provide performant save and load functions for the NDArray backend for data persistence.
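For illustration, the O(1) memory idea can be shown with nothing but the standard library: append each draw to a file as it arrives, and analyse the file by streaming over it. The `CsvRecorder` class and `streaming_mean` function below are hypothetical and much simpler than a real backend (scalar variables only, one chain):

```python
import csv
import tempfile
from pathlib import Path


class CsvRecorder:
    """Hypothetical recorder that appends one draw per row to a CSV file."""

    def __init__(self, path, varnames):
        self.varnames = list(varnames)
        self._file = open(path, "w", newline="")
        self._writer = csv.writer(self._file)
        self._writer.writerow(self.varnames)  # header row

    def record(self, point):
        self._writer.writerow([repr(float(point[name])) for name in self.varnames])
        self._file.flush()  # push the row out of Python's buffer right away

    def close(self):
        self._file.close()


def streaming_mean(path, varname):
    """Compute a mean by reading one row at a time."""
    total, count = 0.0, 0
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            total += float(row[varname])
            count += 1
    return total / count


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chain-0.csv"
    recorder = CsvRecorder(path, ["a"])
    for value in [1.0, 2.0, 3.0, 6.0]:  # stand-in for draws from a sampler
        recorder.record({"a": value})
    recorder.close()
    mean_a = streaming_mean(path, "a")
```

Neither recording nor the mean computation holds more than one draw in memory. Analyses that need the whole trace at once are where the bespoke machinery mentioned above would come in.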

twiecki · 2017-07-19T14:33:26Z

@madanh What sampler do you use? What's your effective sample size? Is the model continuous?

madanh · 2017-07-20T06:15:10Z

@twiecki I'm in the process of figuring out what to use. I tried NUTS first, but the model is pretty finicky: it has a plateau where some components of the gradient are exactly zero for a certain region of parameter space (it's a feature), so NUTS is slowly random walking (which is expected, in hindsight). Metropolis is fast, but for some starting conditions rarely accepts (also expected). Slice hangs randomly. The reason seems to lie in the implementation: it tries to sample uniformly from a hypercube for non-scalar variables, rather than working them out component-by-component. And when those variables have large sizes and some covariance, it practically never succeeds. But that is a separate issue, which I will raise once I have made sure that this is indeed the case.
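The component-by-component alternative can be sketched in plain Python. This is an illustration of univariate slice sampling with stepping out and shrinkage, applied to one coordinate at a time; it is not PyMC3's Slice implementation, and the function names and target density are made up for the example:

```python
import math
import random


def slice_step_1d(logp, x, i, width, rng):
    """Return a copy of x with coordinate i resampled by one slice-sampling step."""

    def logp_at(value):
        trial = list(x)
        trial[i] = value
        return logp(trial)

    # Draw the slice height uniformly under the density at the current point.
    log_height = logp(x) + math.log(1.0 - rng.random())

    # Place an interval of the given width around x[i], then step out.
    left = x[i] - width * rng.random()
    right = left + width
    while logp_at(left) > log_height:
        left -= width
    while logp_at(right) > log_height:
        right += width

    # Sample from the interval, shrinking it towards x[i] after each rejection.
    while True:
        candidate = left + (right - left) * rng.random()
        if logp_at(candidate) >= log_height:
            new_x = list(x)
            new_x[i] = candidate
            return new_x
        if candidate < x[i]:
            left = candidate
        else:
            right = candidate


def logp_correlated_normal(x, rho=0.5):
    """Log-density (up to a constant) of a 2-D normal with unit variances."""
    a, b = x
    return -(a * a - 2.0 * rho * a * b + b * b) / (2.0 * (1.0 - rho * rho))


rng = random.Random(0)
x = [3.0, -3.0]
draws = []
for _ in range(2000):
    for i in range(len(x)):  # update one coordinate at a time
        x = slice_step_1d(logp_correlated_normal, x, i, width=1.0, rng=rng)
    draws.append(x)
```

Each rejection only shrinks a one-dimensional interval, so the acceptance problem of a high-dimensional hypercube does not arise, at the cost of one density evaluation per trial per coordinate.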

Anyway, the above has nothing to do with saving traces. As I now understand it, I don't need to save traces, but should rather try to fix the Slice sampler.

@ColCarroll For me it's not about the memory. One reason why having access to traces is nice is that it aids diagnosing/debugging. Imagine your sampler hangs after a reasonable amount of work (which happened to me with Slice): if you interrupt it, all is lost and you can't diagnose it. So you restart in debug mode and wait twice as long, or make a shorter run, trying to figure out what will happen before it happens, or whatever. But if your traces are safe on disk, you can begin investigating straight away. Doubly useful if you have njobs>1.

hvasbath · 2017-07-20T07:49:24Z

Why don't you give SMC a try? It would also be good for us, to further improve it and make its API smoother. How to start it is shown here:
https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/SMC2_gaussians.ipynb

madanh · 2017-07-20T10:49:27Z

@hvasbath I took a quick look. I will try that if hacking the Slice sampler does not help.

madanh · 2017-07-21T09:52:59Z

OK, fixed the Slice sampler in a quick and dirty fashion and now it at least does not hang. It's suboptimal though, as it returns only 1 sample per group of variables.
Here's a gist
https://gist.github.com/madanh/ca9dcf193d610042225118d2ba252e59

Also found a bug in my model, now NUTS is the best.

gaow · 2020-05-30T16:38:31Z

I have a model that currently only works well with a large number of samples, due to poor convergence. The in-memory backend cannot be used due to memory limits (32GB is not enough). At least until I can rethink the problem and implement an alternative model, I would like to keep the possibility of using large samples to work it out. I think backends are a useful feature that should not be deprecated.

michaelosthege · 2020-09-18T14:23:16Z

@gaow thank you for this information.

Can you be more specific which backend you're using?

Because we might still be able to deprecate some of the others. We have had deprecation warnings about those for several months now.

#3903 #3904 #3905 #3907

canyon289 · 2020-09-18T14:25:22Z

Per conversation drop all backends except text backend

hvasbath · 2020-09-18T20:28:07Z

Just wanted to add my five cents to underline the importance of keeping at least one backend. For the parts of the community that come from the physical modelling side sampling often takes days to weeks, because one likelihood evaluation may take several tens of seconds. Few hours would be fast!

fonnesbeck · 2020-09-18T20:30:17Z

Yes, the plan is to keep the plain text backend.

gaow · 2020-09-18T20:32:41Z

> Can you be more specific which backend you're using?

I was using in-memory backend. But I'm happy now that I see the text backend will be preserved.

twiecki · 2021-09-16T18:27:08Z

This is done.

ColCarroll mentioned this issue May 18, 2017

Loading from a backend #2195

Closed

twiecki changed the title ~~Backends documentation: How to get backends running?~~ Deprecate backends? May 18, 2017

madanh mentioned this issue Jul 26, 2017

Make slice sampler sample from 1D conditionals as it should #2446

Merged

ColCarroll mentioned this issue Aug 16, 2019

Feature Request: "db" option for other sampling methods #3137

Closed

michaelosthege mentioned this issue Aug 16, 2019

save and reload using text backend does not preserve iteration properties #3185

Closed

This was referenced Aug 16, 2019

Bug with SQLite as backend when vars specified in backend #3256

Closed

Backends other than NDArray don't work with sequential sampling #2856

Closed

SQLite Backend Cannot operate on a closed cursor or closed database. #3290

Closed

canyon289 added the hackathon label Sep 18, 2020

twiecki changed the title ~~Deprecate backends?~~ Deprecate all backends except text Sep 18, 2020

twiecki added the beginner friendly label Sep 18, 2020

avivajpeyi mentioned this issue Sep 27, 2020

Save posteriors as a FITS/HDF file dfm/tess-atlas#25

Closed

ricardoV94 removed beginner friendly hackathon labels Mar 17, 2021

twiecki closed this as completed Sep 16, 2021
