Spaces in directory path cause errors in sampling #167

Closed
wesbarnett opened this issue Nov 7, 2019 · 14 comments · Fixed by #168

@wesbarnett wesbarnett commented Nov 7, 2019

Summary:

I was at the tutorial session at PyData NYC on Wednesday afternoon. I'm getting an error running some of the tutorial code on my Mac (but can run it fine on Linux).

Description:

Here are the Python code, the data, and the Stan code that cause the error:

import os
from cmdstanpy import CmdStanModel
import pandas as pd

season_1975 = pd.read_csv('efron-morris-75-data.csv')
data_dict = {'N': season_1975.shape[0], 'y' : season_1975['Hits'].tolist(), 'K' : season_1975['At-Bats'].tolist()}

model_complete_pool = CmdStanModel(stan_file='simple_pool.stan')
model_complete_pool.compile()
complete_pool_fit = model_complete_pool.sample(data=data_dict)
complete_pool_fit.summary().round(decimals=2)

FirstName,LastName,At-Bats,Hits,BattingAverage,RemainingAt-Bats,RemainingAverage,SeasonAt-Bats,SeasonHits,SeasonAverage
Roberto,Clemente,45,18,0.4,367,0.346,412,145,0.352
Frank,Robinson,45,17,0.378,426,0.2981,471,144,0.306
Frank,Howard,45,16,0.356,521,0.2764,566,160,0.283
Jay,Johnstone,45,15,0.333,275,0.2218,320,76,0.238
Ken,Berry,45,14,0.311,418,0.2727,463,128,0.276
Jim,Spencer,45,14,0.311,466,0.2704,511,140,0.274
Don,Kessinger,45,13,0.289,586,0.2645,631,168,0.266
Luis,Alvarado,45,12,0.267,138,0.2101,183,41,0.224
Ron,Santo,45,11,0.244,510,0.2686,555,148,0.267
Ron,Swaboda,45,11,0.244,200,0.23,245,57,0.233
Rico,Petrocelli,45,10,0.222,538,0.2639,583,152,0.261
Ellie,Rodriguez,45,10,0.222,186,0.2258,231,52,0.225
George,Scott,45,10,0.222,435,0.3034,480,142,0.296
Del,Unser,45,10,0.222,277,0.2635,322,83,0.258
Billy,Williams,45,10,0.222,591,0.3299,636,205,0.251
Bert,Campaneris,45,9,0.2,558,0.2849,603,168,0.279
Thurman,Munson,45,8,0.178,408,0.3162,453,137,0.302
Max,Alvis,45,7,0.156,70,0.2,115,21,0.183

data {
  int<lower=0> N;           // items
  int<lower=0> K[N];        // initial trials
  int<lower=0> y[N];        // initial successes
}
parameters {
  real<lower=0, upper=1> phi;  // chance of success (pooled)
}
model {
  y ~ binomial(K, phi);  // likelihood
}

The error I am getting is the following:

INFO:cmdstanpy:stan to c++ (/var/folders/yl/v6d742w56rjdcr406hdxkfpr0000gn/T/tmpuy6wrr9z/tmpwf4d7zmm.hpp)
INFO:cmdstanpy:compiling c++
INFO:cmdstanpy:compiled model file: simple_pool
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    complete_pool_fit = model_complete_pool.sample(data=data_dict)
  File "/Users/jwbarnet/Documents/PyData NYC 2019/bayesian_inference/venv/lib/python3.7/site-packages/cmdstanpy/model.py", line 615, in sample
    raise RuntimeError(msg)
RuntimeError: Error during sampling, chain 0 returned error code -1, chain 1 returned error code -1, chain 2 returned error code -1, chain 3 returned error code -1
deleting tmpfiles dir: /var/folders/yl/v6d742w56rjdcr406hdxkfpr0000gn/T/tmpjwvcen19
done

Additional Information:

This error only occurs on macOS. It does not seem to be a problem on my Linux machine. I've tried reinstalling both CmdStan and CmdStanPy.

Current Version:

cmdstanpy: 0.6.0
cmdstan: 2.21.0

@mitzimorris mitzimorris commented Nov 7, 2019

did you upgrade your Mac to Catalina?

@wesbarnett wesbarnett commented Nov 7, 2019

No, I'm on Mojave (10.14.5). (Thanks for the quick response!)

@mitzimorris mitzimorris commented Nov 7, 2019

complete_pool_fit = model_complete_pool.sample(data=data_dict)

I think what's going on is that this model has a really hard time fitting, given the parameterization. Each sampler chain has its own random seed, and it initializes the parameter values randomly between -2 and 2 (on the unconstrained scale) in order to get warmup off the ground. A bad combination of random values will probably kill it.
So, 2 things to try:

a) first, run the sample command with csv_basename= set to a pathname:

complete_pool_fit = model_complete_pool.sample(data=data_dict, csv_basename='./foo')

this will create both csv files and txt files - the latter should have interesting error messages. also, any thoughts on #133 would be appreciated.

b) run the sample command setting adapt_delta at 0.9 or higher:

complete_pool_fit = model_complete_pool.sample(data=data_dict, adapt_delta='0.99', csv_basename='./foo')

CmdStanPy needs to do a better job when the sampler process throws an error - cf #141
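
As a rough check of the initialization hypothesis above, the sketch below draws values uniformly from (-2, 2), maps them to phi with the inverse logit (the transform Stan applies to a parameter bounded between 0 and 1), and evaluates the binomial log-likelihood on the data from this issue. It is plain Python rather than anything from CmdStanPy; the helper names and the seed are assumptions made for illustration.

```python
import math
import random


def inv_logit(u):
    """Map an unconstrained value to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))


def binomial_log_lik(phi, y, K):
    """Binomial log-likelihood, dropping the constant binomial coefficients."""
    return sum(
        yi * math.log(phi) + (ki - yi) * math.log(1.0 - phi)
        for yi, ki in zip(y, K)
    )


# Hits and at-bats for the 18 players in the issue's data set.
y = [18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7]
K = [45] * 18

rng = random.Random(0)  # arbitrary seed, one draw per chain
inits = [rng.uniform(-2.0, 2.0) for _ in range(4)]
phis = [inv_logit(u) for u in inits]
log_liks = [binomial_log_lik(p, y, K) for p in phis]

for u, p, ll in zip(inits, phis, log_liks):
    print(f"unconstrained={u:+.3f}  phi={p:.3f}  log_lik={ll:.1f}")
```

Every draw lands at a phi between roughly 0.12 and 0.88 with a finite log-likelihood, which suggests that random initial values alone are unlikely to explain the failure for this particular model.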

@wesbarnett wesbarnett commented Nov 7, 2019

Unfortunately, adding the CSV output filename doesn't produce anything. It looks like it may be crashing before anything is written to the files. I also added adapt_delta and still get the same behavior, again with no output file (as an aside, I had to pass adapt_delta as a float rather than a string).

In contrast, I also ran the command from cmdstan outside of python and it seems to succeed (output not shown here):

./simple_pool data file=efron-morris-75-data.json sample

(I converted the csv to a json file):

{"N": 18, "y": [18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7], "K": [45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45]}
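
The CSV-to-JSON conversion mentioned above can be sketched with the standard library alone. This is only one way to do it, not necessarily what was used in the thread; the function name `csv_to_stan_data` is hypothetical, and the sample text is two rows from the data above cut down to four columns.

```python
import csv
import io
import json


def csv_to_stan_data(csv_text):
    """Build the N / y / K dict used in this issue from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "N": len(rows),
        "y": [int(r["Hits"]) for r in rows],     # initial successes
        "K": [int(r["At-Bats"]) for r in rows],  # initial trials
    }


# Two rows from the data set, truncated to the columns that matter here.
sample_csv = (
    "FirstName,LastName,At-Bats,Hits\n"
    "Roberto,Clemente,45,18\n"
    "Max,Alvis,45,7\n"
)

stan_data = csv_to_stan_data(sample_csv)
json_text = json.dumps(stan_data)
print(json_text)
```

To produce a file, the same dict can be written with `json.dump(stan_data, f)` on a file opened for writing.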

@mitzimorris mitzimorris commented Nov 7, 2019

Doh! sorry, so none of the models work - simple_pool, simple_no_pool - OK -
and if you run from CmdStanPy and specify json file as input instead of Dict, does that work?

complete_pool_fit = model_complete_pool.sample(data='efron-morris-75-data.json')

@wesbarnett wesbarnett commented Nov 7, 2019

Specifying the json file also does not work. None of the examples from the workshop work (in the Jupyter notebooks both for the women's soccer and baseball examples), but the basic example from the documentation does work:

import os
from cmdstanpy import CmdStanModel, cmdstan_path

bernoulli_path = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan')
bernoulli_model = CmdStanModel(stan_file=bernoulli_path)
bernoulli_model.compile()

bernoulli_data = { "N" : 10, "y" : [0,1,0,0,0,0,0,0,0,1] }
bernoulli_fit = bernoulli_model.sample(chains=5, cores=3, data=bernoulli_data)

bernoulli_fit.summary()

@mitzimorris mitzimorris commented Nov 7, 2019

what does the model object report about itself?
please try this:

model_complete_pool = CmdStanModel(stan_file='simple_pool.stan')
model_complete_pool.compile()
print(model_complete_pool)

@mitzimorris mitzimorris commented Nov 7, 2019

BTW, thank you so much for your patience - you are a much valued beta tester!

@mitzimorris mitzimorris commented Nov 7, 2019

OK, recreated bug on my machine - it's the fact that there's a space in the directory name -

>>> here = os.path.dirname(os.path.abspath('.'))
>>> here
'/Users/mitzi/Test Space'

@wesbarnett wesbarnett commented Nov 7, 2019

Oh wow, good catch! I can confirm that was the case. After renaming my directory to remove the spaces, it now runs. I would be willing to work on a fix. Would love to try to contribute.

@wesbarnett wesbarnett changed the title from "Error during sampling, chain 0 returned error code -1 on MacOS" to "Spaces in directory path cause errors in sampling" Nov 7, 2019

@ahartikainen ahartikainen commented Nov 7, 2019

@mitzimorris has something changed in Maybe... class?

Main problem is probably the bug that we create a cmd string and split it with str.split(). (Not a problem in Windows where we use shortpaths).
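
To illustrate the failure mode described above, here is a minimal sketch of what str.split() does to a command string whose paths contain a space. The paths and argument layout are made up for the example and are not taken from the CmdStanPy source.

```python
import shlex

# Hypothetical paths; any directory name with a space shows the problem.
exe = "/Users/mitzi/Test Space/simple_pool"
data = "/Users/mitzi/Test Space/efron-morris-75-data.json"

# Fragile: build one string, then split on whitespace.
cmd_string = f"{exe} sample data file={data}"
broken = cmd_string.split()
# The executable path is cut at the space, so the first element is
# "/Users/mitzi/Test" and the remaining pieces become bogus arguments.
print(broken)

# Safer: keep every argument as its own list element from the start.
cmd_list = [exe, "sample", "data", f"file={data}"]

# If a single string is unavoidable, quote each piece and use shlex.split.
quoted = " ".join(shlex.quote(part) for part in cmd_list)
round_trip = shlex.split(quoted)
print(round_trip)
```

A list such as `cmd_list` can be handed directly to `subprocess.run` or `subprocess.Popen` without `shell=True`, so no splitting or quoting is needed. Note that `shlex` follows POSIX shell rules, so the quoting variant is mainly relevant on macOS and Linux.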

@mitzimorris mitzimorris commented Nov 8, 2019

Main problem is probably the bug that we create a cmd string and split it with str.split().

exactly!
@ahartikainen spotted this a while back - #91
@wesbarnett - good first issue?

@wesbarnett wesbarnett commented Nov 8, 2019

Sure, I'll tackle #91.

@wesbarnett wesbarnett commented Nov 8, 2019

This line is actually also an issue. I believe it needs to include the entire path.

@wesbarnett wesbarnett referenced this issue Nov 8, 2019
2 of 2 tasks complete