
Problem with missing data #79

Closed
poncev opened this issue Jan 4, 2024 · 6 comments
poncev commented Jan 4, 2024

According to this tutorial, missing data can be modelled with a FlatNormal distribution, so that is what I did for my first observations. I build a Bootstrap Feynman-Kac model and run SMC on it.

When I run the algorithm, all the weights are NaN at every time step and the result is meaningless. To work around this, I return the distribution Dirac(loc=np.zeros_like(x)) for the first observations instead. The algorithm then runs well and the results look good, but I don't think this is the expected behaviour of FlatNormal.

===========

I have another question. I fit my experimental data to my model using cmdstanpy, and I plan to use particles to filter online data in production. The output of Stan is a set of posterior samples of the parameters, so I think I should run the particle filter with the different parameter samples fitted in Stan. Does that make sense? Is there a way to run the same model with different parameters in particles? I could use a for loop, but those are slow in Python. I am not a Bayesian but a user, so apologies for the naive questions.

Thank you for this amazing package!

nchopin (Owner) commented Jan 10, 2024

Hi, sorry for the slow answer, I was on holiday.

  1. Please post an MRE (minimal reproducible example), so that I can better understand what you tried to do and how it failed.
  2. Yes, but first: not all loops are slow, only loops with a "lean body" (where the body of the loop does very little computation). If, instead, each iteration does something expensive, you are not going to lose much.
    Anyway, particles makes it possible to run several particle filters in parallel (using all your CPU cores instead of just one). To see how this works, have a look at the utils module:
    https://particles-sequential-monte-carlo-in-python.readthedocs.io/en/latest/_autosummary/particles.utils.html#module-particles.utils
    and the function multiSMC in the core module, which is illustrated in this tutorial:
    https://particles-sequential-monte-carlo-in-python.readthedocs.io/en/latest/notebooks/advanced_tutorial_ssm.html#Running-many-particle-filters-in-one-go
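As a generic illustration of point 2 (pure NumPy, not the particles API): a minimal bootstrap filter whose per-call work is expensive relative to loop overhead, run once per posterior draw of the state noise scale. The model, function names, and parameter values below are invented for the sketch; `sigma_draws` stands in for samples coming out of Stan.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_filter(data, sigma_x, sigma_y, N=200):
    # Minimal bootstrap particle filter for a toy Gaussian model:
    # X_0 ~ N(0, sigma_x^2), X_t | X_{t-1} ~ N(X_{t-1}, sigma_x^2),
    # Y_t | X_t ~ N(X_t, sigma_y^2). Returns the filtering means.
    x = rng.normal(0.0, sigma_x, size=N)        # sample from PX0
    means = np.empty(len(data))
    for t, y in enumerate(data):
        x = rng.normal(x, sigma_x)              # propagate through PX
        logw = -0.5 * ((y - x) / sigma_y) ** 2  # PY log-density (up to a constant)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * x)
        x = x[rng.choice(N, size=N, p=w)]       # multinomial resampling
    return means

data = np.ones(30)                              # artificial observations
sigma_draws = [0.3, 0.5, 0.8]                   # stand-ins for Stan posterior samples
results = [bootstrap_filter(data, sx, sigma_y=0.1) for sx in sigma_draws]
```

Each `bootstrap_filter` call dominates the cost of one loop iteration, so the outer Python loop over draws adds negligible overhead; `multiSMC` would additionally spread the runs over CPU cores.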

poncev (Author) commented Jan 10, 2024

A minimal code example would be:

class ToyModelWithMissingData(ssms.StateSpaceModel):
    def PX0(self):
        return dists.Normal(scale=self.sigmaX)
    def PX(self, t, xp):
        return dists.Normal(loc=xp, scale=self.sigmaX)
    def PY(self, t, xp, x):
        if t <= 10:
            return dists.FlatNormal(loc=x)
        else:
            return dists.Normal(loc=x, scale=self.sigmaY)

Now, if I run the Bootstrap for an instance of the class (toy_model), and some simulated data:

from particles.collectors import Moments

fk_model = ssm.Bootstrap(ssm=toy_model, data=data)
pf = particles.SMC(
    fk=fk_model, N=100, collect=[Moments()])
pf.run()

Then everything gets populated with NaNs. For the moment, I fixed it by replacing FlatNormal with Dirac(loc=np.zeros_like(x)).

nchopin (Owner) commented Jan 10, 2024

OK, I tried to fill in the blanks to turn your code snippets into an actual MRE. This is what I got; it seems to work for me?

import numpy as np  # was missing
import particles  # was missing
from particles.collectors import Moments
from particles import distributions as dists  # was missing
from particles import state_space_models as ssms  # was missing

class ToyModelWithMissingData(ssms.StateSpaceModel):
    def PX0(self):
        return dists.Normal(scale=self.sigmaX)
    def PX(self, t, xp):
        return dists.Normal(loc=xp, scale=self.sigmaX)
    def PY(self, t, xp, x):
        if t <= 10:
            return dists.FlatNormal(loc=x)
        else:
            return dists.Normal(loc=x, scale=self.sigmaY)

toy_model = ToyModelWithMissingData(sigmaX=0.5, sigmaY=0.1)  # was missing
data = np.ones(30)  # artificial data, was missing
fk_model = ssms.Bootstrap(ssm=toy_model, data=data)  # fixed typo
pf = particles.SMC(fk=fk_model, N=100, collect=[Moments()])
pf.run()

print(pf.summaries.moments)  # prints filtering mean/var at each time t (I don't get NaNs)

poncev (Author) commented Jan 10, 2024

I was careless with my MRE. Now that I try to reproduce it, I realize that in my case states, data = toy_model.simulate(100), so the data contains NaNs in the first 10 entries. With your data = np.ones(30) there are no NaNs, and it runs well for me too.
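To make the difference concrete (a pure-NumPy sketch, not the particles simulator): data produced the way described above carries NaNs for t <= 10, and any Gaussian density evaluated at those entries is NaN as well.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=100)
data[:11] = np.nan                       # t <= 10: missing observations, as in simulate()

x = 0.0                                  # some particle position
logpdf = -0.5 * ((data - x) / 0.1) ** 2  # Gaussian log-density at each data point
print(np.isnan(logpdf[:11]).all())       # True: NaN data -> NaN log-weights
print(np.isnan(logpdf[11:]).any())       # False: the remaining entries are fine
```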

nchopin added a commit that referenced this issue Jan 11, 2024
nchopin (Owner) commented Jan 11, 2024

OK, this is an actual bug then: FlatNormal.logpdf should not return NaN when a data point is NaN.
I pushed a fix on the experimental branch. Let me know if this works for you. This issue will close automatically when the fix is propagated to the master branch.
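The failure mode, and the shape of the fix, can be reproduced without particles. The snippet below is a pure-NumPy sketch of the idea, not the actual patch.

```python
import numpy as np

# One NaN log-weight poisons the exp-normalization used for particle weights:
logw = np.array([0.1, np.nan, -0.2])
w = np.exp(logw - np.max(logw))   # np.max propagates the NaN to every entry
w /= np.sum(w)                    # NaN / NaN: still NaN
print(np.isnan(w).all())          # True

# Sketch of the fix: a missing observation carries no information, so its
# log-density should be flat (0), not NaN.
y = np.nan                                 # missing data point
x = np.array([0.1, -0.3, 0.5])             # particle positions
raw = -0.5 * ((y - x) / 0.1) ** 2          # NaN because y is NaN
logpdf = np.where(np.isnan(y), 0.0, raw)   # flat log-likelihood instead
print(logpdf)                              # [0. 0. 0.]
```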

poncev (Author) commented Jan 11, 2024

Thank you! It runs well. I tested the code:

import matplotlib.pyplot as plt
import particles
from particles.collectors import Moments 
from particles import distributions as dists
from particles import state_space_models as ssms

class ToyModelWithMissingData(ssms.StateSpaceModel):
    def PX0(self):
        return dists.Normal(scale=self.sigmaX)
    def PX(self, t, xp):
        return dists.Normal(loc=xp, scale=self.sigmaX)
    def PY(self, t, xp, x):
        if t <= 10:
            return dists.FlatNormal(loc=x)
        else:
            return dists.Normal(loc=x, scale=self.sigmaY)

toy_model = ToyModelWithMissingData(sigmaX=0.5, sigmaY=0.1)
states, data = toy_model.simulate(100)

fk_model = ssms.Bootstrap(ssm=toy_model, data=data)
pf = particles.SMC(
    fk=fk_model, N=100, collect=[Moments()],
    store_history=True)
pf.run()

plt.plot(states, label='states')
plt.plot([m['mean'] for m in pf.summaries.moments], label='filter')
plt.legend()

plt.show()

It estimates the original states well for t > 10.

@poncev poncev closed this as completed Jan 11, 2024