
Problem with missing data #79

Closed
poncev opened this issue Jan 4, 2024 · 6 comments
poncev commented Jan 4, 2024

According to this tutorial, missing data can be modelled with a FlatNormal distribution, so that is what I did for my first observations. I build a Bootstrap Feynman-Kac model and run SMC on it.

When I run the algorithm, all the weights are NaN at every time step and the result is meaningless. To work around this, I return the distribution Dirac(loc=np.zeros_like(x)) for the first observations instead. The algorithm then runs well and the results look good, but I don't think this is the expected behaviour of FlatNormal.

===========

I have another question. I fit my experimental data to my model using cmdstanpy, and I plan to use particles to filter online data in production. The output of Stan is a set of posterior samples of the parameters, so I think I should run the particle filter with the different parameter samples fitted in Stan. Does that make sense? Is there a way to run the same model with different parameters in particles? I could use a for loop, but those are slow in Python. I am not a Bayesian but a user, so apologies for the naive questions.

Thank you for this amazing package!

nchopin (Owner) commented Jan 10, 2024

Hi, sorry for the slow answer, I was on holiday.

  1. Please post an MRE (minimal reproducible example), so that I can better understand what you tried to do and how it failed.
  2. Yes, but first: not all loops are slow, only loops with a "lean body" (where the body of the loop does very little computation). If, instead, each iteration does something expensive, you are not going to lose much.
    Anyway, particles makes it possible to run several particle filters in parallel (using all your CPU cores instead of just one). To see how this works, have a look at the utils module:
    https://particles-sequential-monte-carlo-in-python.readthedocs.io/en/latest/_autosummary/particles.utils.html#module-particles.utils
    and the function multiSMC in the core module, which is illustrated in this tutorial:
    https://particles-sequential-monte-carlo-in-python.readthedocs.io/en/latest/notebooks/advanced_tutorial_ssm.html#Running-many-particle-filters-in-one-go
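As a generic illustration of point 2 (pure NumPy, not the particles API): a minimal bootstrap filter whose per-call work is expensive relative to loop overhead, run once per posterior draw of the state noise scale. The model, function names, and parameter values below are invented for the sketch; `sigma_draws` stands in for samples coming out of Stan.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter(data, sigma_x, sigma_y, N=200):
    # Minimal bootstrap particle filter for a toy Gaussian model:
    # X_0 ~ N(0, sigma_x^2), X_t | X_{t-1} ~ N(X_{t-1}, sigma_x^2),
    # Y_t | X_t ~ N(X_t, sigma_y^2). Returns the filtering means.
    x = rng.normal(0.0, sigma_x, size=N)        # sample from PX0
    means = np.empty(len(data))
    for t, y in enumerate(data):
        x = rng.normal(x, sigma_x)              # propagate through PX
        logw = -0.5 * ((y - x) / sigma_y) ** 2  # PY log-density (up to a constant)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * x)
        x = x[rng.choice(N, size=N, p=w)]       # multinomial resampling
    return means

data = np.ones(30)                              # artificial observations
sigma_draws = [0.3, 0.5, 0.8]                   # stand-ins for Stan posterior samples
results = [bootstrap_filter(data, sx, sigma_y=0.1) for sx in sigma_draws]
```

Each `bootstrap_filter` call dominates the cost of one loop iteration, so the outer Python loop over draws adds negligible overhead; `multiSMC` would additionally spread the runs over CPU cores.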

poncev (Author) commented Jan 10, 2024

A minimal code example would be:

class ToyModelWithMissingData(ssms.StateSpaceModel):
    def PX0(self):
        return dists.Normal(scale=self.sigmaX)
    def PX(self, t, xp):
        return dists.Normal(loc=xp, scale=self.sigmaX)
    def PY(self, t, xp, x):
        if t <= 10:
            return dists.FlatNormal(loc=x)
        else:
            return dists.Normal(loc=x, scale=self.sigmaY)

Now, if I run the Bootstrap for an instance of the class (toy_model), and some simulated data:

from particles.collectors import Moments

fk_model = ssm.Bootstrap(ssm=toy_model, data=data)
pf = particles.SMC(
    fk=fk_model, N=100, collect=[Moments()])
pf.run()

Then everything gets populated with NaNs. For the moment, I fixed it by replacing FlatNormal with Dirac(loc=np.zeros_like(x)).

nchopin (Owner) commented Jan 10, 2024

OK, I tried to fill in the blanks to turn your code snippets into an actual MRE. This is what I got; it seems to work for me?

import numpy as np  # was missing
import particles  # was missing
from particles.collectors import Moments
from particles import distributions as dists  # was missing
from particles import state_space_models as ssms  # was missing

class ToyModelWithMissingData(ssms.StateSpaceModel):
    def PX0(self):
        return dists.Normal(scale=self.sigmaX)
    def PX(self, t, xp):
        return dists.Normal(loc=xp, scale=self.sigmaX)
    def PY(self, t, xp, x):
        if t <= 10:
            return dists.FlatNormal(loc=x)
        else:
            return dists.Normal(loc=x, scale=self.sigmaY)

toy_model = ToyModelWithMissingData(sigmaX=0.5, sigmaY=0.1)  # was missing
data = np.ones(30)  # artificial data, was missing
fk_model = ssms.Bootstrap(ssm=toy_model, data=data)  # fixed typo
pf = particles.SMC(fk=fk_model, N=100, collect=[Moments()])
pf.run()

print(pf.summaries.moments)  # prints filtering mean/var at each time t (I don't get NaNs)

poncev (Author) commented Jan 10, 2024

I was careless with my MRE. Now that I try to reproduce it, I realize that in my case states, data = toy_model.simulate(100), so the data contains NaNs in the first 10 entries. With your data = np.ones(30) there are no NaNs, and it runs well for me too.
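To make the difference concrete (a pure-NumPy sketch, not the particles simulator): data produced the way described above carries NaNs for t <= 10, and any Gaussian density evaluated at those entries is NaN as well.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=100)
data[:11] = np.nan                       # t <= 10: missing observations, as in simulate()

x = 0.0                                  # some particle position
logpdf = -0.5 * ((data - x) / 0.1) ** 2  # Gaussian log-density at each data point
print(np.isnan(logpdf[:11]).all())       # True: NaN data -> NaN log-weights
print(np.isnan(logpdf[11:]).any())       # False: the remaining entries are fine
```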

nchopin added a commit that referenced this issue Jan 11, 2024
nchopin (Owner) commented Jan 11, 2024

OK, this is an actual bug then: FlatNormal.logpdf should not return NaN when a data point is NaN.
I pushed a fix on the experimental branch. Let me know if this works for you. This issue will close automatically when the fix is propagated to the master branch.
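The failure mode, and the shape of the fix, can be reproduced without particles. The snippet below is a pure-NumPy sketch of the idea, not the actual patch.

```python
import numpy as np

# One NaN log-weight poisons the exp-normalization used for particle weights:
logw = np.array([0.1, np.nan, -0.2])
w = np.exp(logw - np.max(logw))   # np.max propagates the NaN to every entry
w /= np.sum(w)                    # NaN / NaN: still NaN
print(np.isnan(w).all())          # True

# Sketch of the fix: a missing observation carries no information, so its
# log-density should be flat (0), not NaN.
y = np.nan                                 # missing data point
x = np.array([0.1, -0.3, 0.5])             # particle positions
raw = -0.5 * ((y - x) / 0.1) ** 2          # NaN because y is NaN
logpdf = np.where(np.isnan(y), 0.0, raw)   # flat log-likelihood instead
print(logpdf)                              # [0. 0. 0.]
```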

poncev (Author) commented Jan 11, 2024

Thank you! It runs well. I tested the code:

import matplotlib.pyplot as plt
import particles
from particles.collectors import Moments 
from particles import distributions as dists
from particles import state_space_models as ssms

class ToyModelWithMissingData(ssms.StateSpaceModel):
    def PX0(self):
        return dists.Normal(scale=self.sigmaX)
    def PX(self, t, xp):
        return dists.Normal(loc=xp, scale=self.sigmaX)
    def PY(self, t, xp, x):
        if t <= 10:
            return dists.FlatNormal(loc=x)
        else:
            return dists.Normal(loc=x, scale=self.sigmaY)

toy_model = ToyModelWithMissingData(sigmaX=0.5, sigmaY=0.1)
states, data = toy_model.simulate(100)

fk_model = ssms.Bootstrap(ssm=toy_model, data=data)
pf = particles.SMC(
    fk=fk_model, N=100, collect=[Moments()],
    store_history=True)
pf.run()

plt.plot(states, label='states')
plt.plot([m['mean'] for m in pf.summaries.moments], label='filter')
plt.legend()

plt.show()

It estimates the original states well for t > 10.

@poncev poncev closed this as completed Jan 11, 2024