Compilation error for large number of categorical features #624
Comments
I came across the same error as well. Were you able to come up with a workaround?
@slnovak Not yet, unfortunately.
It's kinda odd to get the nesting level error. At the core,
So I guess you could try to replace this loop https://github.com/pymc-devs/pymc/blob/master/pymc/glm/glm.py#L88 with the creation of a random vector (e.g.
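To illustrate why replacing the loop with a single random vector might help, here is a minimal pure-Python sketch. It is not PyMC or Theano code, and the thread does not confirm that this is the mechanism behind the compiler error; it only assumes that adding one term per regressor in a loop produces an expression whose nesting grows with the number of regressors, while a single dot product keeps it flat. The names `chained_sum`, `vector_form`, and `depth` are hypothetical.

```python
def chained_sum(names):
    """Build intercept + b_0*x_0 + b_1*x_1 + ... one term at a time.

    Each pass wraps the previous expression in another '+' node, so the
    tree gets one level deeper per regressor (toy model, not Theano).
    """
    expr = "intercept"
    for name in names:
        expr = ("+", expr, ("*", "b_" + name, "x_" + name))
    return expr


def vector_form():
    """The same linear predictor written with one coefficient vector."""
    return ("+", "intercept", ("dot", "X", "beta"))


def depth(expr):
    """Maximum nesting depth of a tuple-based expression tree (iterative)."""
    deepest = 0
    stack = [(expr, 1)]
    while stack:
        node, level = stack.pop()
        deepest = max(deepest, level)
        if isinstance(node, tuple):
            # node[0] is the operator name; the rest are operands
            stack.extend((child, level + 1) for child in node[1:])
    return deepest


if __name__ == "__main__":
    columns = ["%d" % i for i in range(500)]  # 500 dummy columns, as in the report
    print("loop-built depth:", depth(chained_sum(columns)))
    print("vector-form depth:", depth(vector_form()))
```

In this toy model the loop-built expression has a depth that scales with the number of columns, while the vector form stays constant regardless of how many levels the categorical feature has.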
Well, for me, I was using a Dirichlet distribution with a shape of ~800. I was able to get the resulting Theano code to compile by including the following in
However, I gave up, as it was taking 20+ minutes for Theano to compile the model. I'm trying PyMC 2.3.4 to see if the model will run.
@twiecki When I try your suggestion (
@d10genes Hm, yeah, there are a few more changes, I'm afraid. and
@slnovak Note that there is no
@twiecki Thanks! The second suggestion seemed to do it. At least, it ran without errors. But either I'm not reading the trace plots right, or it's having trouble converging to the right solution. I can't tell if that's due to this change, or if my artificial data set and model are just too ill-defined. If the code seems right to you, should I send a PR, or do we need something more robust (not sure what kind of additional tests this would call for)?
I had a similar problem when I was adding two Multinomials to the model per row of data. I could not include more than 20 data points in the model (out of one million!). I preferred the Python-based solution:
Just setting the bracket depth is not the right solution. The model being constructed is just more complex than it needs to be.
@d10genes Cool that it's at least compiling now! The convergence looks pretty odd indeed. It seems like one of the coefficients is being drawn to a bad region and that this interacts with the intercept, which suggests collinearity. You could try a scatter plot of the intercept vs. the offending coefficient. Above you said that you did
Regarding a PR, it's a bit more tricky, as I think both are valid use cases (individual priors for each regressor, and a random vector for all of them). Actually, maybe just an additional kwarg that causes creation of a random vector instead (
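As a numeric companion to the scatter plot suggested above, the following sketch computes the Pearson correlation between two lists of posterior samples. It assumes the samples have already been pulled out of the trace as plain Python lists; `intercept_samples` and `coef_samples` are hypothetical names with toy values, not output from the model in this thread.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sample lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant samples")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy placeholder values only; replace with samples from the real trace.
    intercept_samples = [0.9, 1.1, 1.4, 1.8, 2.3]
    coef_samples = [2.1, 1.9, 1.6, 1.2, 0.7]
    print("correlation:", pearson(intercept_samples, coef_samples))
```

A value close to -1 or +1 would be consistent with the intercept and the coefficient trading off against each other; a value near 0 would point away from that explanation.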
Assuming that this issue is stale. Closing.
I'm not sure if using a large number of categorical variables is an abuse of pymc, or if I'm just doing it wrong. I've reproduced the error with synthetic data with 500 possible string values for feature X (though the error appears with fewer values, like 250). I'm using the glm module with the following model, which I can get working in statsmodels: glm('Y ~ C(X)':
When I try to run the following model
I get
Exception: ('Compilation failed (return status=1): /Users/me/.theano/compiledir_Darwin-13.3.0-x86_64-i386-64bit-i386-2.7.8-64/tmp_cmBr5/mod.cpp:28159:32: fatal error: bracket nesting level exceeded maximum of 256.
full trace here.
Is this expected?
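For anyone trying to reproduce this, here is a rough sketch of the kind of synthetic data described above: one feature X with 500 distinct string levels and a numeric response Y. The original generation code is not shown in the thread, so the function names, level names, effect sizes, and noise scale are all assumptions. The dummy coding uses the first level as the reference, which is the usual treatment-coding convention for a formula like Y ~ C(X) with an intercept.

```python
import random


def make_data(n_levels=500, n_rows=2000, seed=0):
    """Generate (x, y) rows where x is one of n_levels string categories."""
    rng = random.Random(seed)
    levels = ["level_%03d" % i for i in range(n_levels)]
    # Hypothetical per-level effect; the real data in the report may differ.
    effects = dict((lvl, rng.gauss(0.0, 1.0)) for lvl in levels)
    rows = []
    for _ in range(n_rows):
        x = rng.choice(levels)
        y = effects[x] + rng.gauss(0.0, 0.1)
        rows.append((x, y))
    return levels, rows


def dummy_code(levels, x):
    """Treatment-code x: one 0/1 column per level except the first (reference)."""
    return [1 if x == lvl else 0 for lvl in levels[1:]]


if __name__ == "__main__":
    levels, rows = make_data()
    first_x, first_y = rows[0]
    encoded = dummy_code(levels, first_x)
    print("levels:", len(levels), "dummy columns:", len(encoded))
```

With 500 levels this yields 499 dummy columns plus the intercept, which gives a sense of how many separate terms a per-regressor model has to carry.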