Potential bug with (cmd)stan(py)’s optimisation process running a probabilistic PCA #664

Closed
InfProbSciX opened this issue Mar 26, 2023 · 4 comments

Comments

InfProbSciX commented Mar 26, 2023

Posting here as the Stan forums post didn't get any replies. I'd like to understand why I'm seeing the results that I'm seeing when running a PCA with Stan on some simulated data.


The data: nine different MNIST digits, each rotated a number of times (the issue below also occurs with other datasets).


The model: Y | X ~ N(0, XX^T + vI)
(this is matrix normal notation for every column of Y being independently drawn from a zero mean multivariate normal with a dot product covariance function applied to the latents).

The model famously corresponds to probabilistic PCA (the maximum likelihood solution for $\mathbf{X} | \mathbf{Y}$ is attained at the scaled eigenvectors of the sample covariance).
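As a quick illustration (a minimal NumPy sketch, not from the original post), the closed-form solution referred to above can be computed by an eigendecomposition of the n × n sample covariance of the data, with the top-q eigenvectors scaled by the square roots of the eigenvalues minus the noise variance; `ppca_ml_latents` and the exact normalization of S are this sketch's assumptions:

```python
import numpy as np

def ppca_ml_latents(Y, q, sigma_sq):
    """Scaled-eigenvector ML latents for the dual-PPCA model above."""
    n, d = Y.shape
    S = Y @ Y.T / d                          # n x n sample covariance across data dims
    lam, U = np.linalg.eigh(S)               # eigenvalues in ascending order
    lam, U = lam[::-1][:q], U[:, ::-1][:, :q]  # keep the top-q pairs
    return U * np.sqrt(np.maximum(lam - sigma_sq, 0.0))

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 10))
X = ppca_ml_latents(Y, q=2, sigma_sq=0.1)    # shape (50, 2)
```

The columns of the result are orthogonal by construction, which is why any solution Stan finds should agree with this one only up to a rotation.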


The problem: if I use unconstrained latents, I get a different solution than if I use constrained latents. This is very surprising. Is it caused by numerical instabilities? If so, why?

If I set the line (in my parameters block below) as follows:

    matrix<lower=-100, upper=100>[n, q] X;  // latents

I obtain the following latents (plot omitted), which look incorrect.
If instead the line is changed to:

    matrix[n, q] X;  // latents

the result becomes (plot omitted):

which is absolutely fine: it's a rotation of the eigenvectors, and since the posterior is invariant to rotations, this makes sense.
I don't understand this behaviour, given my assumption that Stan places an improper uniform prior on the constrained parameters.
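The rotation invariance mentioned above is easy to verify numerically (a small sketch, not part of the original issue): for any orthogonal R, the Gram matrix of the latents, and hence the likelihood, is unchanged.

```python
import numpy as np

# For any orthogonal R, (X R)(X R)^T = X R R^T X^T = X X^T,
# so the covariance X X^T + vI (and the likelihood) is unchanged.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose((X @ R) @ (X @ R).T, X @ X.T)
```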

What’s more, using a BridgeStan model with the same constrained parameters and optimizing with scipy’s optimize works and produces the correct solution (not the weird axis-aligned one). Could someone shed some light on what’s going on?

I’m using cmdstan 2.31.0.

My code (a minimal reproducing example) is as follows:

import torch
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from cmdstanpy import CmdStanModel
from torchvision.transforms.functional import rotate
from tensorflow.keras.datasets.mnist import load_data

plt.ion(); plt.style.use('seaborn-pastel')
np.random.seed(42)

with open('dr.stan', 'w') as f: f.write("""
data {
    int n;  // num data
    int d;  // num data dims
    int q;  // num latents
    matrix[n, d] Y;  // data
}
transformed data {
    vector[d] w = rep_vector(1.0, d);
}
parameters {
    matrix<lower=-100, upper=100>[n, q] X;  // latents
    real<lower=1e-6, upper=1> sigma_sq;
}
model {
    Y' ~ multi_gp(add_diag(X * X', sigma_sq), w);
}
""")

def plot(X):
    plot_df = pd.DataFrame(dict(x=X[:, 0], y=X[:, 1], hue=c.astype(str)))
    plot_df = plot_df.set_index('hue').sort_index().reset_index()
    sns.scatterplot(data=plot_df, x='x', y='y', hue='hue', palette='Paired')

if __name__ == '__main__':

    (_, _), (Y, c) = load_data()

    c = np.arange(10)
    Y = np.concatenate([Y[[np.where(c == lb)[0][0]], ...] for lb in c], axis=0)
    Ys = []
    for Yi in Y:
        for angle in np.linspace(0, 350, 25):
            Ys.append(rotate(torch.tensor(Yi)[None, ...], angle))
    Y = torch.cat(Ys, axis=0).numpy().reshape(-1, 28**2)/255
    c = np.repeat(c, 25)

    Y += np.random.normal(0, 0.05, size=Y.shape)
    Y -= Y.mean(axis=0)
    Y -= Y.mean(axis=1)[..., None]

    (n, d), q = Y.shape, 2

    pd.Series(dict(
        Y=Y, n=n, d=d, q=2,
    )).to_json('data.json')

    model = CmdStanModel(stan_file='dr.stan')
    fit = model.optimize(data='data.json', show_console=True, iter=int(1e5), seed=42)

    X = fit.stan_variable('X')
    plot(X)
WardBrian (Member) commented:
> What’s more is that, using a bridgestan model that uses the constrained parameters, and using scipy’s optimize to optimize the parameters works and generates the correct solution

Just to check, is this bridgestan version being done with Jacobian=True?

InfProbSciX (Author) commented Mar 26, 2023:

I didn't change that flag. What I did was something like the code below (it isn't an MWE; I can provide one if you'd like):

import bridgestan as bs
from scipy.optimize import minimize
bs.set_bridgestan_path('/Users/aditya/github/bridgestan')

bs_model = bs.StanModel.from_stan_file("dr.stan", "data.json")

minimize(
    lambda x: -bs_model.log_density(x),
    np.hstack([X.reshape(-1), 0.0]),
)

WardBrian (Member) commented Mar 26, 2023:

Stan's optimization by default does not apply Jacobian adjustments (this will be configurable in the next release: #649), but the default in BridgeStan is to apply them.

I suspect this is the cause of the confusion, since the statement "Stan places an improper uniform prior on the constrained parameters" is only true in the presence of the Jacobian adjustment for the change of variables.

What happens if you set jacobian=False in your bridgestan example?
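For concreteness, here is a minimal NumPy sketch (not Stan's actual implementation) of the scaled-logit transform Stan documents for a `<lower=L, upper=U>` parameter, together with the log-Jacobian term that optimization omits by default:

```python
import numpy as np

def constrain(u, L, U):
    """Map an unconstrained value u to the interval (L, U) via a scaled logit."""
    inv_logit = 1.0 / (1.0 + np.exp(-u))
    return L + (U - L) * inv_logit

def log_jacobian(u, L, U):
    """log |dx/du| for the transform above."""
    inv_logit = 1.0 / (1.0 + np.exp(-u))
    return np.log(U - L) + np.log(inv_logit) + np.log(1.0 - inv_logit)

# With jacobian=True the optimizer targets log p(x) + log|dx/du|
# (a MAP in the unconstrained space); with jacobian=False it targets
# log p(x) alone, i.e. plain (penalized) ML over the constrained values.
```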

InfProbSciX (Author) commented:

Damn, great catch. I had assumed that Stan adds the Jacobians whenever its automatic transforms are used. I tried your suggestion with BridgeStan, and it does indeed produce results quite similar to cmdstanpy's.

I'm curious, does this mean that

real<lower=0, upper=1> x;
...
x ~ uniform(0, 1);

doesn't currently place a uniform density over x? (I'd guess that a constant is added to the target here, but no Jacobian term.) It's a very subtle point, but very interesting, as constraints do weird things to posteriors in terms of sparsity, etc.
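A toy sketch of that subtlety (illustrative only, not using Stan): with x = inv_logit(u) and x ~ uniform(0, 1), the target without the Jacobian is constant in u, while adding the log-Jacobian makes the unconstrained density logistic, peaked at u = 0 (i.e. x = 0.5):

```python
import numpy as np

def target(u, jacobian):
    x = 1.0 / (1.0 + np.exp(-u))              # x = inv_logit(u) in (0, 1)
    lp = 0.0                                  # log uniform(0,1) density: a constant
    if jacobian:
        lp += np.log(x) + np.log(1.0 - x)     # log |dx/du|
    return lp

us = np.linspace(-3, 3, 7)
flat = [target(u, jacobian=False) for u in us]   # identical everywhere
bumpy = [target(u, jacobian=True) for u in us]   # maximized at u = 0
```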
