
Is BOLFI robust to scaling of the parameters? #292

Closed
coschroeder opened this issue Aug 30, 2018 · 4 comments
coschroeder commented Aug 30, 2018

Summary:

Running BOLFI on the MA2 example with one scaled parameter does not give meaningful results.

Description:

If I run the BOLFI fitting algorithm on the MA2 toy example, with one parameter scaled by a factor of 10**4 for sampling and rescaled inside the MA2 function, the fitting breaks down. The acquisition function is not meaningful, and after running NUTS the posterior does not look like it should.
If I understand correctly, the fitted posterior should be independent of the order of magnitude of the parameters, and all quantities in the GP should scale accordingly. Is this correct?
I ran into this issue while trying to fit a more complex model.

Reproducible Steps:

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.INFO)

%matplotlib inline
import elfi
seed = 1

def MA2(t1, t2, n_obs=100, batch_size=1, random_state=None):
    # Make inputs 2d arrays for numpy broadcasting with w
    t1 = np.asanyarray(t1).reshape((-1, 1))/10000 # rescale parameter t1
    t2 = np.asanyarray(t2).reshape((-1, 1))
    random_state = random_state or np.random

    w = random_state.randn(batch_size, n_obs+2)  # i.i.d. sequence ~ N(0,1)
    x = w[:, 2:] + t1*w[:, 1:-1] + t2*w[:, :-2]
    return x

# true parameters
t1_true = 0.6 * 10000 # scale true parameter t1
t2_true = 0.2

y_obs = MA2(t1_true, t2_true)

# Plot the observed sequence
#plt.figure(figsize=(11, 6));
#plt.plot(y_obs.ravel());

# To illustrate the stochasticity, let's plot a couple of more observations with the same true parameters:
plt.plot(MA2(t1_true, t2_true).ravel());
plt.plot(MA2(t1_true, t2_true).ravel());

# a node is defined by giving a distribution from scipy.stats together with any arguments (here 0 and 2)
t1 = elfi.Prior(scipy.stats.uniform, 0, 2 * 10000)  # scale prior distribution of parameter t1
t2 = elfi.Prior('uniform', 0, 2)

Y = elfi.Simulator(MA2, t1, t2, observed=y_obs)

def autocov(x, lag=1):
    C = np.mean(x[:,lag:] * x[:,:-lag], axis=1)
    return C

S1 = elfi.Summary(autocov, Y)
S2 = elfi.Summary(autocov, Y, 2)  # the optional keyword lag is given the value 2

# Finish the model with the final node that calculates the squared distance (S1_sim-S1_obs)**2 + (S2_sim-S2_obs)**2
d = elfi.Distance('euclidean', S1, S2)

#elfi.draw(d)  # just give it a node in the model, or the model itself (d.model)

log_d = elfi.Operation(np.log, d.model['d'])
# scale the bounds and the acquisition noise variance accordingly
bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20, update_interval=10,
                   bounds={'t1':(-20000, 20000), 't2':(-1, 1)}, acq_noise_var=[0.1*10000**2, 0.1], seed=seed)

#%%time
post = bolfi.fit(n_evidence=200)

#%%time 
result_BOLFI = bolfi.sample(1000, info_freq=100)
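
For reference, plots like the ones below can be produced with ELFI's built-in helpers (a sketch; helper names as in ELFI 0.7, check your version):

bolfi.plot_state()             # GP mean/variance and acquisition surface
bolfi.plot_discrepancy()       # realized discrepancies vs. each parameter
result_BOLFI.plot_marginals()  # marginal histograms of the posterior sample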

Current Output:

The acquisition function looks broken:
[screenshot: acquisition surface]

and also the marginals are uninformative:

[screenshot: posterior marginals]

Expected Output:

I expected the acquisition function and the posterior to look similar to those of the unscaled model.

ELFI Version: 0.7

Python Version: 3.5

Operating System:

vuolleko (Member) commented

I'm not an expert in GPs, but I think you should use a different kernel: the default kernel has just one lengthscale shared by all parameters. So, for example, something like:

import GPy

# ARD=True gives each input dimension its own lengthscale
kernel = GPy.kern.RBF(2, lengthscale=[100, 1], ARD=True)
kernel += GPy.kern.Bias(2)
bounds = {'t1': (-20000, 20000), 't2': (-1, 1)}
gp = elfi.GPyRegression(['t1', 't2'], bounds=bounds, kernel=kernel)
bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20, update_interval=10,
                   bounds=bounds, target_model=gp, acq_noise_var=[100, 0.1], seed=seed)
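
Fitting and sampling then proceed exactly as in your script, only with the custom surrogate:

post = bolfi.fit(n_evidence=200)
result_BOLFI = bolfi.sample(1000, info_freq=100)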

I hope this helps you in the right direction.

Also, please update ELFI to the newest version 0.7.3. :)

coschroeder (Author) commented

Thanks for your fast response. I tried your suggestion, and it seems to help with the acquisition function.
Do you know if I have to take a similar scaling into account for the NUTS sampling? As you can see in the plots below, the chains get stuck in the larger dimension, and the posteriors are still not great.
The four chains:
[screenshot: traces of the four chains]
And the resulting marginals of the posterior (true values are (6000, 0.2)):
[screenshot: posterior marginals]

vuolleko (Member) commented Sep 4, 2018

I suppose very different scales can be problematic for NUTS as well; we have considered implementing some kind of normalization. You could also try the basic Metropolis sampler, which allows (requires, actually) manually setting the "scale" of each parameter.
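
A minimal sketch of the Metropolis option, assuming BOLFI.sample accepts algorithm='metropolis' and per-parameter proposal standard deviations via sigma_proposals (check the signature in your ELFI version):

# hypothetical proposal scales, roughly 1/10 of each parameter's range
result_mh = bolfi.sample(1000,
                         algorithm='metropolis',
                         sigma_proposals={'t1': 2000, 't2': 0.2})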

Have you considered trying the opposite in your case: keeping the parameter scales roughly equal and upscaling inside the simulator?
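
A sketch of that idea; T1_SCALE and run_model are illustrative stand-ins, not from your code:

import numpy as np
import elfi

T1_SCALE = 1e4  # illustrative: the physical magnitude of t1

def simulator_unit(t1_unit, t2, batch_size=1, random_state=None):
    # BOLFI and its GP surrogate only ever see t1_unit on an O(1) scale;
    # the physical value is recovered inside the simulator.
    t1_physical = np.asanyarray(t1_unit) * T1_SCALE
    return run_model(t1_physical, t2, batch_size, random_state)  # your actual simulator

# the prior, GP bounds and acquisition noise can then stay on an O(1) scale
t1 = elfi.Prior('uniform', 0, 2)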

coschroeder (Author) commented

Yeah, I already do a normalization/rescaling in my simulator, and it seems to work.
I just thought this is an issue other users might be interested in, since differently scaled parameters are quite common in real-world problems. Maybe it would be nice to drop a hint in the documentation; reading the BOLFI paper, this issue wasn't clear to me.
