Add save_trace and load_trace #2975

Merged
merged 2 commits into from May 18, 2018

5 participants
@ColCarroll
Member

commented May 15, 2018

This provides functions to save and load traces, avoiding pickle. My main use would be saving traces while running a large notebook, or distributing the traces with code containing the models used to produce them.

Pros:

  • it should be compatible between Python versions (so long as these functions retain compatibility),
  • it avoids security concerns (all files are JSON or .npy),
  • it appears to be smaller (though it is missing some things): the test model was 400 kB, compared to 900 kB pickled,
  • answers a question that comes up reasonably often in issues about saving traces

Cons:

  • Requires model context to reload (in particular, the model is stored in the pickle, but not this file)
  • Does not contain any part of the trace.report yet (though that could be added without breaking compatibility)
  • Requires maintenance
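
The on-disk idea described above (plain JSON for metadata, one .npy file per variable, no pickle) can be sketched roughly as follows. This is an illustration only, not the code in this PR: `save_samples`, `load_samples`, and the `metadata.json` file name are hypothetical, and the samples are a plain dict of NumPy arrays rather than a PyMC3 trace.

```python
import json
import os
import tempfile

import numpy as np


def save_samples(samples, directory):
    """Write each array to its own .npy file plus a JSON index (hypothetical layout)."""
    os.makedirs(directory, exist_ok=True)
    for name, values in samples.items():
        # allow_pickle=False makes NumPy refuse object arrays, so no pickle is written.
        np.save(os.path.join(directory, name + ".npy"), values, allow_pickle=False)
    with open(os.path.join(directory, "metadata.json"), "w") as f:
        json.dump({"varnames": sorted(samples)}, f)


def load_samples(directory):
    """Read back the arrays listed in the JSON index."""
    with open(os.path.join(directory, "metadata.json")) as f:
        varnames = json.load(f)["varnames"]
    return {
        name: np.load(os.path.join(directory, name + ".npy"), allow_pickle=False)
        for name in varnames
    }


# Synthetic stand-in for posterior samples, for illustration only.
samples = {"weights": np.arange(6.0).reshape(3, 2), "sigma": np.ones(3)}
with tempfile.TemporaryDirectory() as tmp:
    save_samples(samples, tmp)
    restored = load_samples(tmp)
```

Nothing in this layout records the model itself, so a loader like this can only hand back raw arrays. That matches the first con listed above: the model context has to be supplied again at load time.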

ColCarroll added some commits May 15, 2018

@ColCarroll

Member Author

commented May 16, 2018

Also, here is an example of it in use. By default, this creates a local directory called .pymc.trace and saves the trace there.

[image: example usage]

@junpenglao

Member

commented May 16, 2018

THANK YOU! This will solve so many pickling issues!

@springcoil

Member

commented May 18, 2018

Just so I understand, what are the pickle issues? Incompatibility between Python versions and security concerns?
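
On the security side, the usual concern is that unpickling can call arbitrary callables chosen by whoever wrote the file. A minimal, harmless illustration using only the standard library (the `Payload` class is made up for this example):

```python
import pickle


class Payload:
    """Hypothetical object whose pickle asks the loader to call eval()."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" instead of object state.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes runs eval("6 * 7"); a hostile file could run anything.
result = pickle.loads(blob)
print(result)  # prints 42
```

What comes back is the result of the call, not a `Payload` instance. Formats such as JSON and non-object .npy arrays only describe data, so loading them does not run code in this way.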

@springcoil

Member

commented May 18, 2018

This looks good to me and ready to merge. Any objections, @ColCarroll?

@springcoil
Member

left a comment

LGTM

@twiecki merged commit 850a2a7 into pymc-devs:master on May 18, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage increased (+0.03%) to 88.597%
@twiecki

Member

commented May 18, 2018

Great stuff!

@twiecki

Member

commented May 18, 2018

Oh, I forgot, we should add this to the release-notes, and also add some example docs somewhere.

@springcoil

Member

commented May 18, 2018

@ColCarroll

Member Author

commented May 18, 2018

I will add release notes. I also realized there's an edge case I didn't cover, which requires deleting the directory before writing to it: if you save a model with variables x, y, and then save a new model with z, w, loading will give you a model with x, y, z, w.
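
The edge case described above can be reproduced with a toy saver that writes one file per variable into a directory. This is a sketch under assumptions, not the PR's code: `save_vars`, `load_vars`, the `overwrite` flag, and the one-JSON-file-per-variable layout are all hypothetical.

```python
import json
import os
import shutil
import tempfile


def save_vars(values, directory, overwrite=False):
    """Write one JSON file per variable (hypothetical layout)."""
    if overwrite and os.path.isdir(directory):
        # Clear leftovers from earlier saves before writing.
        shutil.rmtree(directory)
    os.makedirs(directory, exist_ok=True)
    for name, samples in values.items():
        with open(os.path.join(directory, name + ".json"), "w") as f:
            json.dump(samples, f)


def load_vars(directory):
    """Load every variable file found in the directory."""
    loaded = {}
    for filename in sorted(os.listdir(directory)):
        name, ext = os.path.splitext(filename)
        if ext == ".json":
            with open(os.path.join(directory, filename)) as f:
                loaded[name] = json.load(f)
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.trace")

    # Two saves into the same directory without clearing it first.
    save_vars({"x": [0.1], "y": [0.2]}, target)
    save_vars({"z": [0.3], "w": [0.4]}, target)
    stale = sorted(load_vars(target))  # x and y are still on disk

    # Clearing the directory before writing removes the stale variables.
    save_vars({"z": [0.3], "w": [0.4]}, target, overwrite=True)
    clean = sorted(load_vars(target))  # only w and z remain
```

Deleting the directory is the simplest fix. An alternative would be to write an index of variable names and have the loader read only those, at the cost of leaving unused files on disk.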

@sudiptamazumda


commented Aug 25, 2018

Does anyone have sample code for making predictions with this trace-loading functionality? I have a Gaussian model:
y = f(x) + e
f(x) ~ Gaussian(a, b)
e ~ N(0, sigma^2)
The trace stores the posteriors of a, b, and sigma.

My objective is to predict f(x) for a new x in a new Python session, without rerunning the model training step.

@ColCarroll

Member Author

commented Aug 25, 2018

I might need more detail about what you're trying to do. Here's an example, though:

First, generate some random data:

import os

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm  # needed for the pm.* calls below
import theano
import theano.tensor as tt

dims = 2
N = 100

true_weights = np.random.normal(size=(dims,))

data = np.random.normal(size=(N, dims))
noise = np.random.normal(0, 0.5, size=N)

y = np.dot(data, true_weights) + noise
print(true_weights)

Now do a cached prediction. Running this multiple times will work, even after changing predict_data.

cache_file = 'my_trace.trace'


s_data = theano.shared(data)

with pm.Model() as model:
    weights = pm.Normal('weights', mu=0, sd=1, shape=dims)
    y_obs = pm.Normal('y_obs', mu=tt.dot(s_data, weights), sd=0.5, observed=y, shape=s_data.shape[0].eval())

if not os.path.exists(cache_file):
    with model:
        trace = pm.sample()

    pm.save_trace(trace, directory=cache_file)
else:
    trace = pm.load_trace(cache_file, model=model)

    
predict_data = np.array([
    [0, 1],
    [1, 0],
    [1, 1],
    [2, 2],
])

s_data.set_value(predict_data)

with model:
    ppc = pm.sample_ppc(trace)

print(trace['weights'].mean(axis=0))  # pretty close to true weights
print(ppc['y_obs'].mean(axis=0))  # should be reasonable
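
For a linear model like the one above, the mean prediction at new inputs can also be computed directly from the saved posterior samples with NumPy, without calling the sampler. The sketch below uses randomly generated stand-in samples (not output from this model) in place of `trace['weights']`; with a real trace, the array of shape `(n_samples, dims)` would come from the loaded trace instead.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for trace['weights']: 1000 posterior draws of a 2-dimensional weight vector.
# These are synthetic values for illustration only.
weight_samples = rng.normal(loc=[1.0, -2.0], scale=0.05, size=(1000, 2))

predict_data = np.array([
    [0, 1],
    [1, 0],
    [1, 1],
    [2, 2],
])

# One predicted mean per posterior draw and per new input: shape (1000, 4).
mu_samples = np.dot(weight_samples, predict_data.T)

# Add observation noise with sd=0.5, matching the likelihood in the model above.
y_samples = rng.normal(mu_samples, 0.5)

print(mu_samples.mean(axis=0))  # posterior mean of the linear predictor per new input
print(np.percentile(y_samples, [2.5, 97.5], axis=0))  # 95% predictive interval per input
```

This only works when the prediction can be written in closed form from the sampled parameters. For anything more involved, sampling with the model, as in the example above, is likely the safer route.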
@sudiptamazumda


commented Aug 25, 2018
