Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABC rejection comparing two time series in deterministic Lotka-Volterra model #222

Closed
p-robot opened this issue Aug 8, 2017 · 5 comments

Comments

@p-robot
Copy link

p-robot commented Aug 8, 2017

Summary:

Running a deterministic Lotka-Volterra example parameterised using ABC rejection. This follows the example outlined in Toni et al, (2009) where two parameters are to be found in predator-prey system. It is required to choose parameters that minimize the sum of squared errors between the observed and simulated time series. Priors for the parameters are taken from the paper, observed data are estimated from the paper above.

Rather than outputting a multi-dimensional array from the LV model, I've concatenated results from the model into a 1D array.

Not quite sure which of these functions I need to change. Perhaps the LV() function needs to allow processing of vector inputs? Any help would be much appreciated!

Description:

The Lotka-Volterra system of ODEs is created in Python, priors are defined, a Simulator object is defined (and tested), distance measure defined (and tested), ABC rejection sampler defined, but sampling from the rejection sampler throws an error.

Reproducible Steps:

The following code reproduces the error.

import numpy as np
import elfi
from scipy.integrate import odeint

seed = 2017
np.random.seed(seed)

# Define the observed data and flatten the array
y_obs_lv = np.array([[1.9, 0.5],[0.63, 2.6],[0.2, 1.55],[0.3, 0.03],
    [1.65, 1.15],[1.15, 1.68],[0.25, 1.1],[2.94, 0.94]])
y_obs_lv = y_obs_lv.flatten()

# Define the Lotka-Volterra derivative function
def LVderivative(xy, t, a, b):
    x, y = xy
    
    dx = a*x - x*y
    dy = b*x*y - y
    return [dx, dy]

# Define the function as input to the elfi.Simulator object
def LV(a, b, batch_size = 1, random_state = None):
    
    # Initial conditions state
    y0 = [1.0, 0.5]
    
    # Define times over which to solve the system of ODEs
    times = np.linspace(0, 15, 151)
    
    # Actually solve the ODE
    result = odeint(LVderivative, y0, times, args = (*a, *b))
    
    # Define the subset of times of interest
    t_obs = np.array([1.2, 2.4, 3.9, 5.7, 7.5, 9.6, 11.9, 14.5])*10
    t_obs = t_obs.astype(int)
    
    # Subset the solution and flatten the returned output
    return result[t_obs,:].flatten()

# Define the priors for alpha and beta
a = elfi.Prior('uniform', -10, 20, name = 'a')
b = elfi.Prior('uniform', -10, 20, name = 'b')

# Define the elfi.Simulator object
Y_lv = elfi.Simulator(LV, a, b, observed = y_obs_lv)

# Check the Simulator object can generate data
Y_lv.generate()
> array([  5.73756329e-02,   3.45129771e-03,   3.49858603e-03,
         8.29044191e-04,   1.06221009e-04,   1.82366487e-04,
         1.60337990e-06,   3.01316351e-05,   2.38143636e-08,
         4.98299215e-06,  -1.46843950e-09,   6.08976108e-07,
        -1.32007952e-09,   5.80888435e-08,   5.60478623e-11,
         3.97446214e-09])

# Define sum of squared errors as the distance function
def SSE(x, y):
 return np.sum((x-y)**2)
d_lv = elfi.Distance(SSE, Y_lv)

# Check the Distance object can generate SSE correctly
d_lv.generate()
> 516.5330268484762

# Run ABC-rejection
rej = elfi.Rejection(d_lv, batch_size = 1, seed = seed)
result = rej.sample(10000, threshold = 4.3)

Current Output:

Current output gives the following error

> TypeError: object of type 'numpy.float64' has no len()

ELFI Version:

0.6.0

Python Version:

3.6.1

Operating System:

OSX 10.12.5

@p-robot p-robot changed the title ABC rejection comparing two time series ABC rejection comparing two time series in deterministic Lotka-Volterra model Aug 8, 2017
@jlintusaari
Copy link
Contributor

jlintusaari commented Aug 8, 2017

Hi, the problem is that ELFI expects the outputs of these operations to be numpy arrays of length batch_size (even with batch_size=1). Currently your distance function SSE returns a numpy scalar. So you can either vectorize it with elfi.tools.vectorize, or as a quick solution to write e.g.

def SSE(x, y):
   return np.atleast_1d(np.sum((x-y)**2))

However your simulator should also wrap the results in numpy arrays with length batch_size (in this case it would be a 2D numpy array), so that it would work with other batch sizes besides 1. You can either do that yourself or use the elfi.tools.vectorize. This in turn then requires that your distance handles them correctly (assuming your observed data is also 2D array):

def SSE(x, y):
    return np.sum((x-y)**2, axis=1)

We should think a way to detect these cases and report a better error message since these seem to be quite common.

@p-robot
Copy link
Author

p-robot commented Aug 9, 2017

Thanks for the quick response and suggested adjustment.

I made the adjustment to the distance function and continued on with the example. However, now the rejection sampler hangs and if I kill the process (ctrl-C) and plot the parameter space that's being explored then it looks like it's searching in the correct area but it seems to take a while to reach the result (roughly a,b = 1). I was curious about ways to speed this up?

# (as example above ...)

def SSE(x, y):
   return np.atleast_1d(np.sum((x-y)**2))

rej = elfi.Rejection(d_lv, batch_size = 1, seed = seed)
result = rej.sample(1000, threshold = 30.0)

# kill the process after about a minute (ctrl-C)

# Plot the parameter space that's been explored
fig, ax = plt.subplots()
rej.plot_state(ax = ax)
ax.set_xlim([-11, 11])
ax.set_ylim([-11, 11])
plt.savefig("parameter_space.png")
plt.close()

parameter_space

Additionally, when outputting the result with rej.extract_result() the threshold parameter seems to have been ignored (it was set to 30.0).

rej.extract_result()

Method: Rejection
Number of samples: 1000
Number of simulations: 21063
Threshold: 484
Sample means: a: -0.518, b: 2

@vuolleko
Copy link
Member

There are several things you can try to speed things up, for example:

  1. Is requesting threshold=30 reasonable? ELFI will keep sampling until it finds 1000 samples below that threshold, which for a very small threshold may never happen. To be safe, you could initially request e.g. quantile=0.01 instead of the threshold. ELFI will then draw 1000 * 1 / 0.01 samples, return the best 1000 and report the resulting threshold value. That said, it seems ELFI ran only 21063 simulations in your case, so even this would apparently take several minutes.

  2. Enable parallelization. See the documentation.

  3. Use a more advanced inference method, e.g. BOLFI.

  4. This is probably the most effective way, but also the most difficult (and not always even possible): try to vectorize your simulator so that you could use and benefit from a larger batch_size.

Since you killed the process, what you get using extract_result may not be what you want. I suspect there were not enough samples below the given threshold.

@p-robot
Copy link
Author

p-robot commented Aug 10, 2017

Thanks for those ideas.

I used a threshold of 30.0 because that is what is quoted as the first threshold in ABC-SMC in the paper by Toni et al., (2009) (assuming the first threshold of ABC-SMC would be a suitable threshold for ABC-rejection). They report ~26,000 data generation steps to find 1000 accepted samples (with ABC-SMC).

@jlintusaari
Copy link
Contributor

Also ABC-SMC speeds up the inference compared to Rejection ABC. You might be interested in the iterative advancing of the algorithm:

http://elfi.readthedocs.io/en/latest/usage/tutorial.html#iterative-advancing

Basically you set the objective and then you can advance the inference step by step and investigate the result at any point. Internally the sample method runs the iterative advancing loop until the objective is reached.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants