Running multiple chains causes RecursionError #879
Ran into the same issue.
It seems to work for simpler models, but my stochastic volatility model only runs with njobs=2 and breaks with njobs=4. So odd.
Can you check whether e873d6d fixes it?
Well, I get a different error, so that's progress.
And what a specific error it is.
Yeah, that seemed odd -- creating an Exception subclass for an error that you're not totally sure about.
Anyway, it looks like we're maybe passing an object where an int is expected?
You can somewhat hack around this with sys.setrecursionlimit(2000), but that only works up to a certain number of parameters. With my latest model, around 450 parameters, it doesn't help. I really need the parallel implementation to work, otherwise my model has to run for months, so I would like to look into this. Can you point me to some code lines where I could start looking, as I am not yet so familiar with the code base? Thank you!
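The recursion-limit workaround mentioned above amounts to the following. Note that the ceiling of 10000 here is an arbitrary illustrative value, not a recommendation from the thread:

```python
import sys

# Pickling a large model graph recurses roughly in proportion to the
# graph's depth, so raise the interpreter's ceiling before sampling.
# 10000 is an arbitrary illustrative value, not a tuned recommendation.
if sys.getrecursionlimit() < 10000:
    sys.setrecursionlimit(10000)

print(sys.getrecursionlimit())
```

As the commenter notes, this only postpones the failure: a sufficiently large model will exceed any fixed limit.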
With the recursion limit increased and the latest commit from twiecki above (e873d6d), I get this error; it keeps running but does nothing. Does anybody have any advice on where I could start investigating?

```
Exception in thread Thread-14:
```
All of the multiprocessing business for PyMC3 is in the
I have also considered switching. The issue is that currently you can't launch processes internally (see ipython/ipyparallel#22 for a plan to change that).
That should not be a deal-breaker. Forcing the user to spin up
What about Dask? |
Would Dask be effective here? I could see it if we were applying the same algorithm to subsets of a dataset, but a set of parallel chains executes over the entire dataset for each chain, so it's not clear how Dask's collections would be beneficial. That said, it may be useful if we ever implement expectation propagation, which does subdivide the data.
Dask imperative plus the multiprocessing scheduler can schedule the chains without needing a specific collection to chunk the data. But this is out of my depth; maybe @mrocklin can chime in.
I don't think Dask, although awesome, can be leveraged here. |
If someone can briefly describe the problem I'd be happy to chime in if there is potential overlap. The dask schedulers are useful well outside the common use case of big chunked arrays. If you're considering technologies like multiprocessing or ipyparallel it's possible that one of the dask schedulers could be relevant.
@mrocklin Matt, this is Monte Carlo sampling for Bayesian statistical modeling. It's an embarrassingly parallel task that just simulates Markov chains using the same model on the same dataset, then uses the sampled chains (the output of the algorithm) for inference. We are currently using the multiprocessing module for this, but are contemplating a move to something more robust.
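The embarrassingly parallel pattern described here boils down to mapping one sampling function over a list of seeds. A toy sketch, not PyMC3's actual implementation: the "chain" below is just a random walk, and the thread-backed `multiprocessing.dummy` pool stands in for a real process pool so the snippet runs anywhere:

```python
import random
from multiprocessing.dummy import Pool  # thread-backed, same API as multiprocessing.Pool

def run_chain(seed, n_samples=1000):
    # Toy stand-in for a sampler: each "chain" is an independent random walk
    # over the full model/dataset, differing only in its seed.
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        x += rng.gauss(0, 1)
        samples.append(x)
    return samples

njobs = 4
with Pool(njobs) as pool:
    traces = pool.map(run_chain, range(njobs))  # one independent chain per seed

print(len(traces), len(traces[0]))
```

With real process pools the catch, as the rest of this thread shows, is that the returned traces must survive pickling.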
Something non-trivial must be going on to cause multiprocessing to hang. Looking at the traceback, it seems like you might be trying to send something that can't be serialized with the standard pickle module. If this is what is going on then the dill-based approach below might help. But really, I'm just guessing at the problem that you're trying to solve and so am probably out of my depth here. Happy to help if I can. Best of luck.
Thanks, Matt. Unfortunately
I write a function like the following:

```python
def apply(serialized_func, serialized_args, serialized_kwargs):
    func = dill.loads(serialized_func)
    args = dill.loads(serialized_args)
    kwargs = dill.loads(serialized_kwargs)
    return func(*args, **kwargs)
```

And then I dump my func, args, and kwargs ahead of time and call them with the apply function remotely. Something like the following:

```python
pool.map(apply,
         [dill.dumps(func) for i in range(len(sequence))],
         [dill.dumps(args) for args in sequence])
```
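A self-contained sketch of this pattern, substituting the stdlib `pickle` for `dill` (both expose the same `dumps`/`loads` interface) and using a hypothetical `square` task in place of a sampler:

```python
import pickle

def apply_serialized(serialized_func, serialized_args, serialized_kwargs):
    # Unpickle the (func, args, kwargs) triple and invoke it.
    func = pickle.loads(serialized_func)
    args = pickle.loads(serialized_args)
    kwargs = pickle.loads(serialized_kwargs)
    return func(*args, **kwargs)

def square(x):
    return x * x

result = apply_serialized(pickle.dumps(square),
                          pickle.dumps((7,)),
                          pickle.dumps({}))
print(result)  # 49
```

The point of the dill variant is that dill serializes closures and interactively defined functions that the standard pickler rejects; the calling convention is identical.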
I might have found a solution using Joblib, but will give this a shot if that doesn't work. Thanks again.
Oh great. That's much simpler.
I don't think this solves the problem, unfortunately... On the joblib branch, with njobs=4 and a pretty big model, I still get a max recursion exceeded exception (see below). On inspection, it looks like Joblib uses multiprocessing as its default backend, so I guess that makes sense. I tried switching to the threading backend, but that failed with a different set of errors.
It was worth a shot. I will try flavoring it with a little dill.
Actually,
I just updated to the latest Theano 0.8 and pymc3 and this problem has disappeared for me.
Yes, for me it also wants to install Theano 0.7 even though I have the dev version, which is somewhat annoying. I simply disabled it in the setup script, although there must be a nicer way.
It's trying to pull 0.7 when you run pymc3's setup.py?
Yes it does.
Yes, it seemed to install fine and use Theano 0.8, but it was rather confusing.
I have to abort it, because when I let it install, my import uses the 0.7 version instead of the dev version. They made so many improvements in the current dev version that it is really important to use it.
f9de16e should fix that.
Ah great, thanks!
Fixed it, thanks!
Is it time to close this?
I haven't done extensive testing, but on some high dimensional problems that originally threw the recursion error, the problem has disappeared. So perhaps for now it is solved. :)
That sounds amazing. I'll close it, but feel free to reopen if the problem persists with master pymc3 and theano.
Thanks for the recent bugfixes, guys; the updates to the build dependencies also mean I'm now running
EDIT: Okay, well, that does seem to have fixed it. I think I have a different bug though:
I assume the difference in 2 is that the model is already cached. It's tricky to replicate though, a bit of a Heisenbug!
I also still get my segmentation faults, even when creating all the Text backends in advance...
Oh! Really? Even with the latest pymc3 version, I am getting the same error with njobs=2:

```
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<MultiTrace: 1 chains, 10 iterations, 2106 variables>]'. Reason: 'RuntimeError('maximum recursion depth exceeded',)'
```

I have pymc3-3.0, numpy-1.11.0, Theano-0.8.1, scipy-0.17.0 installed.
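This failure mode is easy to reproduce without PyMC3: when a worker process returns its result, multiprocessing pickles it, and pickling any object graph deeper than the interpreter's recursion limit blows the stack. A minimal illustration with a hypothetical `Node` chain standing in for the long reference chain inside a large trace:

```python
import pickle
import sys

class Node:
    def __init__(self, nxt=None):
        self.next = nxt

# Build a linked structure much deeper than the recursion limit,
# analogous to the reference chain inside a trace with thousands
# of variables.
head = None
for _ in range(sys.getrecursionlimit() * 2):
    head = Node(head)

try:
    pickle.dumps(head)
    raised = False
except RecursionError:
    raised = True

print(raised)  # True
```

This is why raising the recursion limit helps only up to a point: the required depth grows with the size of the model.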
By "latest pymc3 version" do you mean that you installed it from GitHub master? That is,
I installed using
Make sure you use the
Oh, thank you so much for your quick response. I'll update using
Sorry @fonnesbeck, installing pymc3 with -U also leads to the same error. I have,
My .theanorc config is,
Is there anything else to be done?
Perhaps the GPU utilization is at fault? Have you tried with CPU?
Thanks @twiecki, I will try with CPU and post my updates.
Below is the snippet I am trying to execute:

```python
import pymc3 as pm
import theano.tensor as T
import pandas

def tinvlogit(x):
    return T.exp(x) / (1 + T.exp(x))

pandas_df = pandas.read_csv("data.csv")
x_col1 = pandas_df['col1']
x_col2 = pandas_df['col2']
x_col3 = pandas_df['col3']
n_col3 = len(pandas_df['col3'].unique())

with pm.Model() as model:
    b_0 = pm.Normal('b_0', mu=0, sd=100)
    b_col1 = pm.Normal('b_col1', mu=0, sd=100)
    b_col2 = pm.Normal('b_col2', mu=0, sd=100)
    sigma_col3 = pm.HalfNormal('sigma_col3', sd=100)
    b_col3 = pm.Normal('b_col3', mu=0, sd=sigma_col3, shape=n_col3)
    for i in range(0, len(pandas_df)):
        p = pm.Deterministic('p', T.maximum(0, T.minimum(1, tinvlogit(
            b_0 + b_col1 * x_col1.at[i] + b_col2 * x_col2.at[i] + b_col3[x_col3.at[i]]))))
    y = pm.Bernoulli('y', p, observed=pandas_df.y)
    start = pm.find_MAP()
    step_func = pm.NUTS()
    trace = pm.sample(5000, step=step_func, start=start, njobs=2, progressbar=True)
```
You get the recursion error because your graph will be very long: your loop runs 50k times, each time adding all the nodes. Although I don't really get the purpose of your model, I have the feeling you could vectorize it and get rid of the loop. The RVs have a shape parameter with which you can simply create vectors the length of your data frame.
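The vectorization being suggested can be illustrated in plain NumPy (the Theano expressions behave analogously; all names and data here are made up for the illustration). One array expression replaces the per-row loop and keeps the expression graph a constant size regardless of the number of rows:

```python
import numpy as np

def tinvlogit(x):
    return np.exp(x) / (1 + np.exp(x))

rng = np.random.default_rng(0)
n = 1000
col1 = rng.normal(size=n)
col2 = rng.normal(size=n)
col3 = rng.integers(0, 5, size=n)   # group index for each row
b_0, b_col1, b_col2 = 0.1, 0.5, -0.3
b_col3 = rng.normal(size=5)         # one coefficient per group

# Looped version: one scalar expression per row (what makes the graph huge).
looped = np.array([tinvlogit(b_0 + b_col1 * col1[i] + b_col2 * col2[i] + b_col3[col3[i]])
                   for i in range(n)])

# Vectorized version: a single array expression covering every row at once.
vectorized = tinvlogit(b_0 + b_col1 * col1 + b_col2 * col2 + b_col3[col3])

assert np.allclose(looped, vectorized)
```

In the PyMC3 model this corresponds to building one `Deterministic` over whole columns instead of one per row inside the loop.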
Setting the `njobs` parameter to run multiple chains results in an error.