laplacian diagonal elements for fake nodes #18

Open
maosi-chen opened this issue Nov 8, 2017 · 4 comments

maosi-chen commented Nov 8, 2017

Hello Michaël,

My colleague told me that when I use the graph.laplacian function, I have to assign 0.0 to the diagonal elements of the L matrix on the fake nodes. The current function returns 1.0 on those elements.

My colleague's reasoning for the fake nodes:
the weights are W = 0.0, so d will be 0.0;
d += np.spacing(0.0) leaves d very close to 0.0 (e.g. 4.9e-324);
d = 1 / np.sqrt(d) then becomes a very large value (e.g. 4.5e+161);
D * W * D will be 0.0 (because 0.0 times a large but finite number is 0.0);
so L = I - D * W * D will be 1.0.
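
As a quick numeric sanity check of these steps (a small sketch I put together, not part of the library):

import numpy as np

d = 0.0                      # degree of a fake node: all its weights are 0
d += np.spacing(0.0)         # ~4.9e-324, the smallest positive double
d = 1.0 / np.sqrt(d)         # ~4.5e+161, large but finite
dwd = 0.0 * d                # the node's row of W is all zeros, so (D*W*D)_ii = 0.0
L_ii = 1.0 - dwd             # diagonal of the normalized Laplacian on the fake node
print(d, dwd, L_ii)          # ~4.5e+161  0.0  1.0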

The original Laplacian definition L = D - W tells us the original L will have 0.0 on the fake nodes (because D and W are both 0.0 there). Normalizing (squashing) the original L by D shouldn't change that 0.0 to 1.0.

Is my colleague right on this?

Thanks.
Maosi Chen

maosi-chen (Author) commented

Here is our revised graph.laplacian function:

import numpy as np
import scipy.sparse


def laplacian_r1(W, normalized=True):
    """Return the Laplacian of the weight matrix."""

    # Degree vector (sum of the weights incident to each node).
    orig_d = W.sum(axis=0)

    # Fake (isolated) nodes have zero degree: build a mask that is 0.0 on
    # them and 1.0 everywhere else.
    fake_node_mask = np.where(orig_d < 1e-30, 0.0, 1.0)
    fake_node_mask = fake_node_mask.astype(W.dtype)
    fake_node_mask = np.squeeze(fake_node_mask)

    d = orig_d

    # Laplacian matrix.
    if not normalized:
        # Combinatorial Laplacian L = D - W.
        D = scipy.sparse.diags(d.A.squeeze(), 0)
        L = D - W
    else:
        # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
        d += np.spacing(np.array(0, W.dtype))
        d = 1 / np.sqrt(d)
        tmp_d = d.A.squeeze()
        if tmp_d.size > 1:
            D = scipy.sparse.diags(tmp_d, 0)
        else:
            D = scipy.sparse.diags(np.array([tmp_d]), 0)
        I = scipy.sparse.identity(d.size, dtype=W.dtype)
        L = I - D * W * D

    # On fake nodes D_ii = 0, so (D * W * D)_ii = 0 and the normalized
    # Laplacian would have L_ii = 1.0; mask the diagonal so that L_ii = 0.0
    # on those nodes instead.
    L_diag_old = L.diagonal()
    L_diag_new = np.multiply(fake_node_mask, L_diag_old)
    L.setdiag(L_diag_new)

    # assert np.abs(L - L.T).mean() < 1e-9
    assert type(L) is scipy.sparse.csr.csr_matrix
    return L
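
As a quick check of the revised function (a small sketch, using the laplacian_r1 defined above) on a toy weight matrix where node 2 is an isolated fake node:

import numpy as np
import scipy.sparse

W = scipy.sparse.csr_matrix(np.array([[0., 1., 0.],
                                      [1., 0., 0.],
                                      [0., 0., 0.]]))

L = laplacian_r1(W, normalized=True)
print(L.diagonal())  # [1. 1. 0.] -- the fake node's diagonal is 0.0 instead of 1.0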

mdeff (Owner) commented Nov 15, 2017

Hi Maosi,

Regarding definitions, you're correct that the diagonal of the combinatorial Laplacian L = D - W is zero on the fake nodes. It is however undefined for the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}, because of the division by zero.

The d += np.spacing(np.array(0, W.dtype)) is a choice I made to avoid the division by zero and set a value of 1 on those nodes. Zero would be another choice. A one will preserve the value at the node when diffusing (with y = Lx) and filtering (with polynomials of L), while a zero will kill any value. In general, a one is thus preferred.

For this application it should not matter, as the value of a fake node should be zero anyway. But I guess this reasoning is related to your fear of propagating the bias (#17), right? And killing the value should clear any doubt.
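
To make the diffusion argument concrete, here is a small sketch (my own toy example, not code from the repository) that builds a 3-node weight matrix where node 2 is a fake node and compares y = Lx with L_22 = 1 versus L_22 = 0:

import numpy as np
import scipy.sparse

# Toy graph: nodes 0 and 1 are connected, node 2 is a fake (isolated) node.
W = scipy.sparse.csr_matrix(np.array([[0., 1., 0.],
                                      [1., 0., 0.],
                                      [0., 0., 0.]]))

d = np.asarray(W.sum(axis=0)).squeeze()                          # degrees: [1, 1, 0]
d_inv_sqrt = np.where(d > 0, 1 / np.sqrt(np.maximum(d, 1e-300)), 0.0)
D = scipy.sparse.diags(d_inv_sqrt)
I = scipy.sparse.identity(3)

L_one = I - D @ W @ D        # normalized Laplacian with L_22 = 1 (the np.spacing choice)
L_zero = L_one.tolil()       # variant with L_22 forced to 0 (the mask)
L_zero[2, 2] = 0.0

x = np.array([1., 2., 3.])
print(L_one @ x)             # [-1.  1.  3.]  -> the fake node's value is preserved
print(L_zero @ x)            # [-1.  1.  0.]  -> the fake node's value is killed

The fake node's value survives the diffusion with a one on the diagonal and is zeroed out with a zero, which is the trade-off described above.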

maosi-chen (Author) commented

Hi Michaël,

Yes, we found that setting the diagonal of L to one or zero on the fake nodes does not change the model performance in my application, because the corresponding values in x are zero (we applied masks on x after adding the bias and pooling).

For my application (spatial interpolation on dynamic nodes), cnn_graph currently shows a larger loss (~0.2) than the stacked bidirectional LSTM (~0.05) once both have stabilized. I think the change of graph (i.e. the locations and number of nodes) across samples may be the cause, because the weights trained in cnn_graph should be applied to a fixed graph?

Thanks.
Maosi

mdeff (Owner) commented Nov 16, 2017

Then it does not matter indeed.

The method was clearly developed with a fixed graph in mind, but filters can definitely be applied to multiple graphs (especially if they are local, i.e. of low polynomial order). Maybe you are destroying too much information with coarsening / pooling. I'm not sure I understand why you are doing this, but I saw in your other issue that you had 7 layers, with the last graphs being mostly disconnected.
