to_dataframe() does not include all variables #451

Open
Bonnevie opened this Issue Apr 4, 2018 · 6 comments

Bonnevie commented Apr 4, 2018

I realize this is still an experimental feature, and I am not really sure how to make an MWE of this, but there is a discrepancy between the output I get from my StanFit object and the dataframe returned by to_dataframe(). In particular, I have matrix variables gamma and gamma_tilde of size (3,6) and (3,5), respectively, but while my fit object has elements for all variables, the dataframe is missing some.

E.g. running something like:

print([x for x in fit.flatnames if x.startswith("gamma")])
df = fit.to_dataframe()
print(df.filter(regex="gamma",axis=1).columns)

gives me the two variable name lists

['gamma_tilde[1,1]', 'gamma_tilde[2,1]', 'gamma_tilde[3,1]', 'gamma_tilde[1,2]', 'gamma_tilde[2,2]', 'gamma_tilde[3,2]', 'gamma_tilde[1,3]', 'gamma_tilde[2,3]', 'gamma_tilde[3,3]', 'gamma_tilde[1,4]', 'gamma_tilde[2,4]', 'gamma_tilde[3,4]', 'gamma_tilde[1,5]', 'gamma_tilde[2,5]', 'gamma_tilde[3,5]', 'gamma[1,1]', 'gamma[2,1]', 'gamma[3,1]', 'gamma[1,2]', 'gamma[2,2]', 'gamma[3,2]', 'gamma[1,3]', 'gamma[2,3]', 'gamma[3,3]', 'gamma[1,4]', 'gamma[2,4]', 'gamma[3,4]', 'gamma[1,5]', 'gamma[2,5]', 'gamma[3,5]', 'gamma[1,6]', 'gamma[2,6]', 'gamma[3,6]']
Index(['gamma_tilde[1,1]', 'gamma_tilde[2,1]', 'gamma_tilde[3,1]',
       'gamma_tilde[1,2]', 'gamma_tilde[2,2]', 'gamma_tilde[3,2]',
       'gamma_tilde[1,3]', 'gamma_tilde[2,3]', 'gamma_tilde[3,3]',
       'gamma_tilde[1,4]', 'gamma_tilde[2,4]', 'gamma_tilde[3,4]',
       'gamma_tilde[1,5]', 'gamma_tilde[2,5]', 'gamma_tilde[3,5]',
       'gamma[1,1]', 'gamma[2,1]', 'gamma[3,1]'],
      dtype='object')

where one is clearly shorter than the other, missing most (but oddly not all) of the gamma variables.

(This is not an issue with the regex; the key gamma[1,2] is for instance not present in the dataframe if queried directly. The behaviour is also consistent for both permuted=True and permuted=False).
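
One way to pin down exactly which columns are dropped is to diff the two name lists directly rather than comparing them by eye. A minimal sketch, using plain lists as stand-ins for fit.flatnames and df.columns (the helper name missing_columns is hypothetical and not part of PyStan):

```python
def missing_columns(flatnames, columns):
    """Return the flat names absent from the dataframe columns, in original order."""
    present = set(columns)
    return [name for name in flatnames if name not in present]


# Stand-ins for fit.flatnames and df.columns, shortened from the output above.
flatnames = ["gamma_tilde[1,1]", "gamma_tilde[2,1]", "gamma[1,1]", "gamma[2,1]", "gamma[1,2]"]
columns = ["chain", "gamma_tilde[1,1]", "gamma_tilde[2,1]", "gamma[1,1]", "gamma[2,1]"]

print(missing_columns(flatnames, columns))
```

With a real fit, the same call would be missing_columns(fit.flatnames, df.columns), which lists every dropped element at once instead of querying keys one by one.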

ahartikainen Apr 4, 2018

Collaborator

Looks like a bug. I have to test this.

Here is one possible problem:

par_flatnames = [
            flatname for flatname in fit.flatnames if flatname.startswith(par)
            ]

Bonnevie Apr 5, 2018

Assuming par is something like gamma, that prefix check would match both gamma_tilde[i,j] and gamma[i,j], which could cause trouble.

Bonnevie Apr 6, 2018

The simple fix to par_flatnames might be to just use flatname.startswith(par + '[') or flatname == par. I believe [ is an illegal character in Stan variable names, which should prevent the first filter from hitting anything other than the appropriate array elements.
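
To make the difference concrete, here is a small sketch comparing the plain prefix match with the stricter match suggested above. It runs on a hand-written list rather than a real fit, and the function names names_by_prefix and names_exact are made up for illustration; they are not PyStan functions.

```python
def names_by_prefix(par, flatnames):
    """Plain prefix match, as in the par_flatnames snippet quoted earlier."""
    return [name for name in flatnames if name.startswith(par)]


def names_exact(par, flatnames):
    """Match the scalar `par` itself or its indexed elements `par[...]` only."""
    return [name for name in flatnames if name == par or name.startswith(par + "[")]


# Hand-written stand-in for fit.flatnames.
flatnames = ["gamma[1,1]", "gamma[2,1]", "gamma_tilde[1,1]", "gamma_tilde[2,1]", "lp__"]

print(names_by_prefix("gamma", flatnames))  # also picks up the gamma_tilde entries
print(names_exact("gamma", flatnames))      # only the gamma[...] entries
```

Whether this collision is what actually drops columns in to_dataframe() is not established here; the sketch only shows that the two filters disagree whenever one parameter name is a prefix of another.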

ahartikainen Apr 6, 2018

Collaborator

Yes, there is also this in the code (doing a different thing):

if (par in pars) or (par[:par.find('[')] in pars)
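
As a side note on that slice: str.find returns -1 when there is no '[', so par[:par.find('[')] drops the last character of a scalar name instead of leaving it alone. The sketch below illustrates this on plain strings; the helper names are hypothetical, and str.partition is shown only as one possible alternative, not as what PyStan does.

```python
def base_name_find(flatname):
    """Slice up to the first '[', mirroring the quoted condition."""
    return flatname[:flatname.find("[")]


def base_name_partition(flatname):
    """Everything before the first '[', or the whole name if there is none."""
    return flatname.partition("[")[0]


for name in ["gamma[1,2]", "lp__"]:
    print(name, repr(base_name_find(name)), repr(base_name_partition(name)))
```

In the quoted condition the (par in pars) check comes first, so a scalar name is normally matched before the slice is evaluated. The truncated name would only matter if it happened to equal another requested parameter name.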

ahartikainen Apr 25, 2018

Collaborator

Hi, I tested this and I could not reproduce the error.

import pystan

model_code = """
parameters {
  matrix[3,6] gamma;
  matrix[3,5] gamma_tilde;
}
model {
  for (i in 1:3) {
    to_vector(gamma[i]) ~ normal(0,1);
    to_vector(gamma_tilde[i]) ~ normal(1,2);
  }
}
"""

sm = pystan.StanModel(model_code=model_code)
fit = sm.sampling()

df = fit.to_dataframe(permuted=False)

df.columns

Index(['chain', 'chain_idx', 'warmup', 'divergent__', 'energy__',
   'treedepth__', 'accept_stat__', 'stepsize__', 'n_leapfrog__',
   'gamma[1,1]', 'gamma[2,1]', 'gamma[3,1]', 'gamma[1,2]', 'gamma[2,2]',
   'gamma[3,2]', 'gamma[1,3]', 'gamma[2,3]', 'gamma[3,3]', 'gamma[1,4]',
   'gamma[2,4]', 'gamma[3,4]', 'gamma[1,5]', 'gamma[2,5]', 'gamma[3,5]',
   'gamma[1,6]', 'gamma[2,6]', 'gamma[3,6]', 'gamma_tilde[1,1]',
   'gamma_tilde[2,1]', 'gamma_tilde[3,1]', 'gamma_tilde[1,2]',
   'gamma_tilde[2,2]', 'gamma_tilde[3,2]', 'gamma_tilde[1,3]',
   'gamma_tilde[2,3]', 'gamma_tilde[3,3]', 'gamma_tilde[1,4]',
   'gamma_tilde[2,4]', 'gamma_tilde[3,4]', 'gamma_tilde[1,5]',
   'gamma_tilde[2,5]', 'gamma_tilde[3,5]', 'lp__'],
  dtype='object')

I also tried swapping the positions of gamma and gamma_tilde, but it still worked.

The filter function works correctly.

What pandas version do you have?

Bonnevie Apr 30, 2018

Hm, running your snippet I can replicate your results. My actual parameter list is perhaps unsurprisingly quite a lot longer, containing a number of matrix variables. Is there any chance I am hitting some sort of ceiling with respect to the number of variables? Seems unlikely that something like that would fail silently.

edit: Pandas version is 0.22
