Vectors of multivariate variables #535
Will it be obvious which dimension is the multivariate dimension?

It should be intuitive, if not obvious. I think most people would expect a vector of variables, which implies that the first dimension is the number of variable elements and the remaining dimension(s) the size of each variable.
We at least need to be able to do the analog of this:
This has been a show-stopper for me trying to use PyMC 3 for new work, so I'm going to try to set aside some time to work on this. Thinking about it some more, however, I think that
for a single variable and:
for a vector containing 4 MvNormals of dimension 3. Better yet, we ought to be able to infer the dimension of the MvNormal from its arguments.
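The "infer the dimension from its arguments" idea can be sketched in plain NumPy terms. The helper name here is hypothetical, not PyMC API: the point is only that the event dimension of an MvNormal is already determined by the shapes of its mean and covariance.

```python
import numpy as np

def infer_mvnormal_dim(mu, cov):
    """Hypothetical helper: read the MvNormal event dimension off its
    parameters instead of requiring it in a shape argument."""
    mu, cov = np.asarray(mu), np.asarray(cov)
    if cov.shape[-2:] != (mu.shape[-1], mu.shape[-1]):
        raise ValueError("mu and cov shapes are not aligned")
    return mu.shape[-1]

print(infer_mvnormal_dim(np.zeros(3), np.eye(3)))  # 3
```

The same inference works for a batch: a (4, 3) array of means with one (3, 3) covariance still yields an event dimension of 3.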
That makes some sense. The words shape and dim seem very close, so it seems easy to confuse them.

Perhaps using
Just bumping this one. I come up against it frequently in epidemiological analyses.

I like the originally proposed notation,

I'd be happy with that. We would just have to adopt the convention that the last dimension is always the size of the individual multivariate node, and not the size of the array containing the nodes. The tricky part comes when you have, say, a vector of Wisharts that is itself multidimensional, so the total shape could be (4,4,3,3) for a 4x4 array of 3x3 variables.

I would imagine it's a rare case, but it can't hurt to consider it and come up with a sane way to handle it. In the end, complex things will be complex in code, but defaulting to the last dimensions is an easy rule to keep in mind.
+1 for

Multivariate classes could have the appropriate dimension specified in the class to know how to deal with the shape argument. Wisharts will always be 2-dimensional, for example, so any remaining dimensions will always be how many Wisharts are in the set. Multinomials will always be a 1-d vector, etc.
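A sketch of that convention in plain Python. The dict and function names are illustrative, not PyMC API: each class declares how many trailing dimensions one variable occupies, and everything in front is the batch of independent variables.

```python
# Illustrative per-class event dimensions, not actual PyMC internals:
EVENT_NDIM = {"Wishart": 2, "MvNormal": 1, "Multinomial": 1, "Dirichlet": 1}

def split_shape(dist_name, shape):
    """Split a total shape into (batch dims, per-variable dims)."""
    k = EVENT_NDIM[dist_name]
    return shape[:-k], shape[-k:]

# A 4x4 array of 3x3 Wisharts, total shape (4, 4, 3, 3):
print(split_shape("Wishart", (4, 4, 3, 3)))  # ((4, 4), (3, 3))
```

Under this rule the tricky (4,4,3,3) Wishart case from above resolves unambiguously: the last two axes belong to each variable, the rest to the array of variables.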
Okay, are we agreed that when we do this the multivariate dimensions start at the back? We could start them at the front, but the way numpy.dot works suggests at the back. |
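The NumPy precedent in question can be seen directly: matmul treats the last two axes as the matrix dimensions and broadcasts any leading axes as a batch.

```python
import numpy as np

# np.matmul operates on the trailing two axes and treats leading axes
# as a batch, which is the precedent for putting multivariate dims last.
a = np.ones((4, 3, 3))  # a stack of four 3x3 matrices
b = np.ones((3, 3))
print(np.matmul(a, b).shape)  # (4, 3, 3)
```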
Agree on defaulting to last dimension. I wonder, is the shape argument not redundant? Seems like we can always infer it from the inputs. shape could then only add the dimensions. Personally I would find this less confusing:
The 3,3 is already encoded in np.eye(3), no?
Shape is not redundant when you want to have the same prior arguments for a set of variables.

Right, I'm only talking about the case where the input to the RV (e.g.
Yeah, we could do that. I'm slightly worried that it's going to make the implementation harder.
I recently ran into the confusion where I wanted 2 Dirichlets of len 3; which should I do? So with my proposal there's a clear rule, and I don't have to remember which dimensions of the shape kwarg match which dimensions of my input. Why do you think it would be harder to implement?
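The ambiguity above, made concrete with NumPy shapes only (no PyMC calls): for "2 Dirichlets of len 3", should the parameter array be (2, 3) or (3, 2)? Under the last-dims-are-multivariate rule there is exactly one reading.

```python
import numpy as np

# Last-dimension convention: the trailing axis is the per-variable
# (event) dimension, the leading axis is the batch.
a = np.ones((2, 3))  # 2 Dirichlets, each of length 3
batch, event = a.shape[:-1], a.shape[-1:]
print(batch, event)  # (2,) (3,)
```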
That does seem attractive from an API point of view. Let me check how that plays with broadcasting rules. |
That does seem to play nicely with things. I see two issues. Maybe we can resolve them. First, this change will break previously working models. Second, shape is a common argument for all distributions and this means the shape argument won't match the actual shape of the variable. I think that might not actually break anything right now, but seems like a bug waiting to happen. Perhaps we should have a different argument, not
Or maybe

What I also like about this is that it makes the translation from pymc2 style

So if we were to change this, do we still need the

And maybe we could even use
Theoretically we could even teach users to use repeat directly and not be concerned with all this in the API. E.g.:

```python
from pymc3 import repeat  # alias to theano.tensor.extra_ops.repeat
pm.Dirichlet(repeat(my_prior, 2))  # gives 2x3 if my_prior is shape 3
```
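One caveat on that sketch, shown with the NumPy equivalents (assuming Theano's repeat mirrors NumPy's behavior): a plain repeat of a length-3 vector flattens to length 6, so turning one prior into a (2, 3) batch is really a tile along a new leading axis.

```python
import numpy as np

my_prior = np.ones(3)
print(np.repeat(my_prior, 2).shape)   # (6,)  - repeat interleaves/flattens
print(np.tile(my_prior, (2, 1)).shape)  # (2, 3) - a true batch of 2 priors
```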
I don't think we should worry about breaking changes too much in a beta for such an important design decision. I like the idea of a

which results in an
Shape currently means the actual shape of the resulting variable, and I kind of want to keep that unless there's a good reason. |
@PietJones You shouldn't include observed variables to be sampled. |
Can PyMC3 give a better user error for that case?
Cool, I took that from: http://austinrochford.com/posts/2016-02-25-density-estimation-dpm.html After changing, I now get the following error:

Is there some size limit that I am not aware of? The data frame is not that large: (450, 1051)
You can use this PR for a workaround:
@nouiz Thanks for the advice. I'm not sure if this is what you meant I should do, but I tried the following, and I still get the same error:

I then restarted my IPython/Jupyter kernel and reran my code.
Delete your Theano cache. If that doesn't fix it, you are probably using an old version.
I tried the following.

Still the same problem.

If it helps, I am running this on macOS, in a conda virtualenv, using Jupyter (I did restart the kernel); I don't have CUDA. Sorry for the trouble.
I thought that you were on Windows with a GPU. Then you have a new case. Can you use this Theano flag: nocleanup=True, then after the error send me the file that failed to compile?
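For reference, a Theano flag like nocleanup can be passed through the THEANO_FLAGS environment variable for a single run (the script name below is a placeholder):

```shell
# Keep the generated C files after a compilation failure so the failing
# module (e.g. mod.cpp) can be inspected and shared:
THEANO_FLAGS=nocleanup=True python my_model.py
```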
Hi, find attached the mod.cpp file which failed to compile: https://gist.github.com/PietJones/8e53946b2738008095ced8fb9ab4db44 I don't think the entire file uploaded; just in case, here is a Google Drive link: https://drive.google.com/file/d/0B2e7WGnBljbJZnJ1T1NDU1FjS1k/view?usp=sharing
Update Theano to 0.8.2. I have the impression that you use an older version. Fred
I originally had that version of Theano, which gave the same error.
Can you confirm it was the pull request about the GpuJoin problem on Windows? I checked that code and it would have tested what I wanted to test. Can you manually apply this diff and test again?

local_elemwise_fusion = local_elemwise_fusion_op(T.Elemwise,

If it still fails, instead of a max of 512, try 256, 128, ... Tell me the biggest number that works.
Thanks for the advice. I tried all of the above: editing the file manually, removing the .theano directory, then restarting the Jupyter kernel and running the code again. I still get the same error. This also appears further down, before the actual traceback:
Which new value did you try? Only 512? Can you try something like 31? If it still fails with 31, then try this diff:

This opt could also cause this extra big Elemwise.
I have tried 1024, 512, 256 and 31; they all result in the same problem.

def local_add_mul_fusion(node):

which still gave an error:
@fonnesbeck I think this works for Multivariate now, right? |
A list comprehension seems to work now, yes. Ultimately I'd like to be able to specify a vector of multivariates using the
I think that should also work, no? At least for 3D multivariates. |
It runs, but does not "work":

```python
from pymc3 import *
import numpy as np

with Model():
    p = Dirichlet('p', np.ones(3), shape=(4, 3))
    x = Multinomial('x', np.array([20, 16, 10, 5]), p, shape=(4, 3))
    print('p initial:', p.tag.test_value)
    print('x initial:', x.tag.test_value)
    tr = sample(50)
    print('x final:', x.tag.test_value)
```

yields:
So, the x's don't sum to n, yet it does not fail! |
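The invariant being violated can be checked in plain NumPy (the draw values below are illustrative, not actual sampler output): each row of a Multinomial draw should sum to its corresponding n.

```python
import numpy as np

n = np.array([20, 16, 10, 5])
# A draw with correct row sums would satisfy this check:
x_draw = np.array([[20, 0, 0], [10, 6, 0], [5, 5, 0], [3, 1, 1]])
print(np.array_equal(x_draw.sum(axis=1), n))  # True
```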
This is tied up in the shape refactoring. Closing. |
It would be useful if we could model multiple independent multivariate variables in the same statement. For example, if I wanted four multivariate normal vectors with the same prior, I should be able to specify:
but it currently returns a ValueError complaining of non-aligned matrices.
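A sketch of what the batched case has to do under the hood, in plain NumPy rather than PyMC internals: with a (4, 3) stack of means and a single (3, 3) covariance, the log-density needs a per-row quadratic form rather than one matrix product of the stacked arrays, which is where naive code hits the alignment error.

```python
import numpy as np

mu = np.zeros((4, 3))   # four 3-dimensional mean vectors, same prior
cov = np.eye(3)         # one shared 3x3 covariance
x = np.ones((4, 3))     # four observations
delta = x - mu
# One Mahalanobis scalar per row via einsum, not a single np.dot:
maha = np.einsum('ij,jk,ik->i', delta, np.linalg.inv(cov), delta)
print(maha.shape)  # (4,)
```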