Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random dot product graph generator? #14

Closed
jovo opened this issue Dec 30, 2013 · 11 comments
Closed

random dot product graph generator? #14

jovo opened this issue Dec 30, 2013 · 11 comments

Comments

@jovo
Copy link

jovo commented Dec 30, 2013

i didn't see one in there.
there are obviously many ways to implement such a thing, i certainly do not know the best, and we often write papers in which we generate them, presumably, everybody using a different method.
perhaps we should have a reference RDPG generator?

@gaborcsardi
Copy link
Contributor

A little bit more specifically? Input is a vector for each vertex, and a similarity function?

@jovo
Copy link
Author

jovo commented Dec 30, 2013

that's correct, similarity function is just a dot-product.

let n be the number of vertices,
associated with each vertex v is a latent X_v \in R^d for some d (a hyperparameter)
in the undirected case, the probability of an edge between u and v is <X_u,X_v>=sum_i X_u(i) * X_v(i)
so, X \in R^{d*n}
the trouble is, if we want to ensure that X'X \in [0,1], and we want each vector X_u to be sampled independently,
then we must impose the constraint that each X_u \in R^d_+ \cap B(0,1), where B(0,1) is the unit ball centered at zero, and R^d_+ is the non-negative orthant of R^d.

a work-around of this is to just sample each X_u from a dirichlet, which more than satisfies the above constraints. however, if one samples from the dirichlet, one can never get anything whose ||X_v|| = 1 - eps, unless the vector is 1-sparse.

an alternative strategy is to sample vectors on the unit sphere. one can do this by just sampling from a multivariate gaussian, normalizing by its norm. then, just discard everything outside the non-negative orthant, and you get a set of vectors that satisfy the conditions (i imagine there are other ways). but, the "problem" with this is that you get nothing inside the unit ball, only vectors on the boundary. if i correctly understand, this gives us a degree corrected SBM, so it may be desirable in its own right, but it is not exactly what i'm requesting here.

i'm sure you can easily imagine algorithms that do sample from the proper space, some kind of rejection sampler parameterized by a multivariate gaussian i assume would do the trick (basically, a truncated gaussian). i guess the efficiency of that would depend on the parameters of the gaussian, but probably that is ok?

does that help/make sense?

@dpmcsuss
Copy link

I personally prefer Gabor's suggestion where the input is a vector for each vertex (or a matrix of size n by d, n=#vertices).
If there are non-standard distributions we want for the latent positions, these should be elsewhere.

There are some advantages to combining sampling latent positions and the graph together in one function but it does sacrifice flexibility, I believe.

@jovo
Copy link
Author

jovo commented Dec 30, 2013

oh, i see what you guys mean!
yes, i also prefer Gabor's suggestion then :)

for clarity, the input would then be a d-by-n matrix.

what would happen if any element of X'X is >1 or <0?
some options:

  1. truncate to 0 or 1 as appropriate, and tell the user that this happened
  2. throw an exception

i vote 1.

i guess another desirable function would be to generate vectors that are
sufficient? so, one function generates the graph from the vectors, another
completely separate function generates feasible vectors?

On Mon, Dec 30, 2013 at 11:15 AM, Daniel Sussman
notifications@github.comwrote:

I personally prefer Gabor's suggestion where the input is a vector for
each vertex (or a matrix of size n by d, n=#vertices).
If there are non-standard distributions we want for the latent positions,
these should be elsewhere.

There are some advantages to combining sampling latent positions and the
graph together in one function but it does sacrifice flexibility, I believe.


Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-31353042
.

perhaps consider allowing the quest for eudaimonia to guide you
openconnecto.me, jovo.me

@gaborcsardi
Copy link
Contributor

Btw. if we distance function is general, then this is quadratic. If we can use some properties of the distance function (e.g. if it is a dot-product and the vectors are on the unit sphere), and the graph is sparse, then we can maybe do something quicker, but it is tricky.

@jovo
Copy link
Author

jovo commented Jan 1, 2014

quicker sounds good, can you elaborate a bit?
dot-product is a special case of a similarity function,
in particular, of xMx', where M is d-by-d.
for dot-product, M=identity. is that what you mean?

On Wed, Jan 1, 2014 at 2:32 PM, Gabor Csardi notifications@github.comwrote:

Btw. if we distance function is general, then this is quadratic. If we can
use some properties of the distance function (e.g. if it is a dot-product
and the vectors are on the unit sphere), and the graph is sparse, then
we can maybe do something quicker, but it is tricky.


Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-31428537
.

perhaps consider allowing the quest for eudaimonia to guide you
openconnecto.me, jovo.me

@gaborcsardi
Copy link
Contributor

What I mean is that if the function is opaque, you have no other choice but
evaluate it for every potential edge.

If you have a special case, like a dot product, that has some structure,
then you can potentially avoid all evaluations. E.g. if you know that ||x||
< ||y|| and cos(angle(x,z)) < cos(angle(y,z)), then the probability of
connecting x to z is than connecting y to z. You can use this to avoid
evaluating all dot products. It is easier if all the vectors have norm one,
because then you don't need to deal with lengths.

This only makes sense if the graph is sparse, otherwise you have O(n^2)
edges anyway, although if the vectors are long (d>>1) you might still want
to avoid dot product evaluations.

On Wed, Jan 1, 2014 at 3:33 PM, joshua vogelstein
notifications@github.comwrote:

quicker sounds good, can you elaborate a bit?
dot-product is a special case of a similarity function,
in particular, of xMx', where M is d-by-d.
for dot-product, M=identity. is that what you mean?

On Wed, Jan 1, 2014 at 2:32 PM, Gabor Csardi notifications@github.comwrote:

Btw. if we distance function is general, then this is quadratic. If we
can
use some properties of the distance function (e.g. if it is a
dot-product
and the vectors are on the unit sphere), and the graph is sparse, then
we can maybe do something quicker, but it is tricky.


Reply to this email directly or view it on GitHub<
https://github.com/gaborcsardi/igraph/issues/14#issuecomment-31428537>
.

perhaps consider allowing the quest for eudaimonia to guide you
openconnecto.me, jovo.me

Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-31429642
.

@gaborcsardi
Copy link
Contributor

OK, so I am ready to write the functions to sample the vectors, and then we'll see if the dot-product generation itself can be made faster than O(n^2).

So, what kind of sampler do we want?

@jovo
Copy link
Author

jovo commented Jan 15, 2014

well, i guess a couple easy ones might be:

  1. uniform on the intersection of the sphere and non-negative orthant (perhaps with a single parameter to adjust the diameter of the sphere to be between 0 and 1.
  2. dirichlet
  3. uniform in the intersection of the ball & the non-negative orthant (again with a single parameter controlling the diameter of the ball)

slightly harder:

  1. truncated multivariate normal whose mean lives in the intersection of the sphere and non-negative orthant. i would think for this one, we might want to require something like x% of the mass lives in the feasible region.

i'm not sure if others have better suggestions, or a good prioritization of the above.

@gaborcsardi
Copy link
Contributor

OK, 1-3 are done, please reopen this issue if you have a more complete description of 4.

@jovo
Copy link
Author

jovo commented Jan 27, 2014

awesome, thanks!

On Mon, Jan 27, 2014 at 3:55 PM, Gabor Csardi notifications@github.comwrote:

OK, 1-3 are done, please reopen this issue if you have a more complete
description of 4.


Reply to this email directly or view it on GitHubhttps://github.com//issues/14#issuecomment-33421623
.

perhaps consider allowing the quest for eudaimonia to guide you
openconnecto.me, jovo.me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants