
Support GCN #89

Merged: 11 commits, Feb 23, 2018
Conversation

@anaruse (Contributor) commented Feb 14, 2018

This PR adds a GCN model and supports batch normalization for graph convolution. For the GCN model, please refer to https://arxiv.org/abs/1609.02907, though the GCN model in this branch is a bit different from the model in the paper. I also modified the tox21 example so that you can test the GCN model with the tox21 dataset.
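(For context, the layer-wise propagation rule from the cited paper can be sketched in plain NumPy as follows. This is a minimal illustration, not the code in this PR; the function names are mine.)

```python
import numpy as np

def renormalized_adjacency(adj):
    """Renormalization trick from the paper: A_hat = D~^{-1/2} (A + I) D~^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))  # D~^{-1/2} as a vector
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(h, adj, weight):
    """One GCN layer: H' = ReLU(A_hat H W)."""
    return np.maximum(renormalized_adjacency(adj) @ h @ weight, 0.0)

# toy 4-node path graph, 3 input features, 2 output channels
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
out = gcn_layer(rng.standard_normal((4, 3)), adj, rng.standard_normal((3, 2)))
```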

@corochann (Member) commented Feb 15, 2018

Thank you very much for the PR!
Let me take some time to read the paper briefly and review it.

These are my questions so far:

  1. Are the graph_convolution function and GraphConvolution link used? They are imported but seem unused.
  2. My concern is a "namespace" issue. I think GCN (GraphConvolutionalNetwork) is a very "general" term compared to other network implementations; do you have any comment on this?

@anaruse (Contributor, Author) commented Feb 15, 2018

Thank you for your feedback!

Re 1., I'm sorry, I forgot to remove them. They are not used now; I will remove them.

Re 2., I share the opinion that the name "GCN" is too general, though the authors call it just "GCN" in the paper. I once thought that "LBM" might be a good name for it, because there is a paper (link) that classifies this model as one of the "Laplacian Based Methods". However, "LBM" is a bit confusing to me, mainly because "LBM" means "Lattice Boltzmann Method" to me... Anyway, I'm fine with using a different name.

@codecov-io commented Feb 15, 2018

Codecov Report

Merging #89 into master will increase coverage by 0.8%.
The diff coverage is 29.88%.

@@            Coverage Diff            @@
##           master      #89     +/-   ##
=========================================
+ Coverage   68.35%   69.16%   +0.8%     
=========================================
  Files          58       61      +3     
  Lines        2067     2296    +229     
=========================================
+ Hits         1413     1588    +175     
- Misses        654      708     +54

@corochann (Member)

Re 2.,
Actually, I'm considering importing graph CNN from
https://github.com/pfnet-research/chainer-graph-cnn
and the same issue arises there: the name is too general.

I'm currently considering using the name "Spectral"GraphConvolution for this:
https://github.com/corochann/chainer-chemistry/blob/spectral_gcnn/chainer_chemistry/models/spectral_graph_cnn.py#L15-L19

Let us discuss the naming issue with other members as well.

h = functions.softmax(h, axis=2) # softmax along channel axis
y = functions.sum(h, axis=1) # sum along node axis
return y

Member:

The original paper does not define a "Readout" (getting a feature of the "whole graph"); instead, its output is per node, obtained by simply taking a softmax.

So, were output_weight and sum added by you to extract a "graph feature"?
Is my understanding correct?

Contributor (Author):

Yes. To be more precise, I had no idea what to do with the readout, so I used NFP's readout implementation.
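(For readers: the readout borrowed from NFP projects each node's features, applies a softmax along the channel axis, and sums along the node axis. A rough NumPy sketch of that kind of readout, with hypothetical names, not the PR's actual code:)

```python
import numpy as np

def nfp_style_readout(h, output_weight):
    """Project node features, softmax over channels, sum over nodes.

    h: (batch, nodes, in_dim); output_weight: (in_dim, out_dim).
    Returns a per-graph feature of shape (batch, out_dim).
    """
    z = h @ output_weight                          # per-node linear projection
    e = np.exp(z - z.max(axis=2, keepdims=True))   # numerically stable softmax
    p = e / e.sum(axis=2, keepdims=True)           # softmax along channel axis
    return p.sum(axis=1)                           # sum along node axis
```

Because each node's softmax sums to 1, the output entries for a graph always sum to the number of nodes.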

for i, (gconv, bnorm) in enumerate(zip(self.gconvs, self.bnorms)):
    h = gconv(h, w_adj)
    h = bnorm(h)
Member:

I referred to the PyTorch implementation by the author:
https://github.com/tkipf/pygcn/blob/master/pygcn/models.py#L14

Batch norm is not used in the original, so was it added by you?
If so, can you add an option in the __init__ kwargs to enable/disable batch normalization?
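(One way such an option could look, as a framework-agnostic NumPy sketch; the class and argument names here are hypothetical, not the PR's actual API:)

```python
import numpy as np

class GraphConvBlock:
    """Hypothetical graph-conv block with an optional batch-norm step."""

    def __init__(self, weight, use_batch_norm=True, eps=1e-5):
        self.weight = weight
        self.use_batch_norm = use_batch_norm
        self.eps = eps

    def __call__(self, h, a_hat):
        # h: (batch, nodes, channels); a_hat: (nodes, nodes) normalized adjacency
        h = a_hat @ h @ self.weight
        if self.use_batch_norm:
            # normalize each output channel over the batch and node axes
            mean = h.mean(axis=(0, 1), keepdims=True)
            var = h.var(axis=(0, 1), keepdims=True)
            h = (h - mean) / np.sqrt(var + self.eps)
        return h
```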

Contributor (Author):

OK, I will add an option to select whether to use batch normalization or not.

@@ -0,0 +1,32 @@
#!/bin/bash -x
Member:

Is this file necessary?

Contributor (Author):

No... I thought I removed it...

@@ -36,6 +36,8 @@
import data
import predictor

import numpy
Member:

Is this line necessary? It seems numpy is not used in this change.

@corochann (Member) commented Feb 16, 2018

I read the paper, https://arxiv.org/pdf/1609.02907.pdf,
and looked at your code. Overall it seems nice.
I just left some comments.

Re 2.,
In the conclusion of the paper, it is written:

Our GCN model uses an efficient layer-wise propagation rule that is based on a first-order approximation
of spectral convolutions on graphs.

and in Table 3, they finally adopt the Renormalized version of Eq. (8).

So in my opinion this is a "Renormalized First-order Spectral Graph Convolutional Network".
However, this name is of course long; how about RenormalizedSpectralGraphConvolutionalNetwork (RSGCN)?
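(For reference, the "renormalized" part of this proposed name refers to the renormalization trick around Eq. (8) of the paper, which replaces the first-order spectral filter with a single normalized matrix that includes self-loops:)

```latex
I_N + D^{-\frac{1}{2}} A D^{-\frac{1}{2}}
\;\longrightarrow\;
\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}},
\qquad
\tilde{A} = A + I_N,
\quad
\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}
```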

@corochann (Member)

I noticed we could add these implementations as well, but I think development speed is important.

So these items can be raised as issues, and we can merge this PR first:

  • Add test code.
  • Add the network and the paper citation to the README.
  • Add the network to the QM9 example as well.
  • Residual connection option:
    The original paper experiments with/without a residual connection in the appendix,
    so an option kwarg can be added to the network to enable or disable the residual connection.
  • Support sparse matmul:
    They claim the calculation can be faster with a sparse matrix.
    https://github.com/tkipf/pygcn/blob/master/pygcn/layers.py#L9
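(A sketch of how the sparse variant could look using SciPy, assuming the normalized adjacency is stored as CSR. Computing H W first keeps the sparse product to a single sparse-dense matmul. Names are illustrative, not the PR's code:)

```python
import numpy as np
from scipy import sparse

def gcn_layer_sparse(h, a_hat, weight):
    """GCN layer with a sparse normalized adjacency: H' = A_hat (H W)."""
    return a_hat @ (h @ weight)

# usage: the sparse and dense versions agree
rng = np.random.default_rng(3)
a_dense = np.array([[0.5, 0.5, 0.0],
                    [0.5, 0.0, 0.5],
                    [0.0, 0.5, 0.5]])
a_csr = sparse.csr_matrix(a_dense)   # most entries of a real adjacency are zero
h = rng.standard_normal((3, 4))
w = rng.standard_normal((4, 2))
out = gcn_layer_sparse(h, a_csr, w)
```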

@delta2323 (Member)

Regarding the naming issue, I would suggest checking how others refer to the method. For example:

  • Is there any paper that refers to it as GCN?
  • Does anyone mention and discuss the paper on, e.g., Stack Overflow?
  • Also, I'm wondering whether other frameworks (e.g. DeepChem) implement the paper.

If there isn't, we might need to coin some word.

@corochann (Member)

As far as I know, there are several graph-conv methods which use the Laplacian matrix formulation:

[1] 'Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering' https://arxiv.org/abs/1606.09375

[2] This PR: SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS
https://arxiv.org/pdf/1609.02907.pdf

And these are usually referred to as a group, e.g.,
A. https://arxiv.org/abs/1704.01212 refers to them as "Laplacian based methods"
B. https://arxiv.org/pdf/1801.03226.pdf refers to them as "Spectral Graph Convolution"

That is why I suggested
RenormalizedSpectralGraphConvolutionalNetwork (RSGCN):
"SpectralGraphConvolution" plus the word "Renormalized" to differentiate [2] from [1].

@anaruse (Contributor, Author) commented Feb 16, 2018

Thank you for your feedback! I've fixed the branch based on your suggestions. Please check again!

  • Name issue: RSGCN looks good to me (I think it is better than LBM).
  • Sparse matmul support: this will be needed especially when the graph size is large.

@corochann (Member)

Thank you for the fix. I'd like to merge it after the following discussion.

  1. Naming issue: let me know whether RenormalizedSpectralGraphConvolutionalNetwork is OK with you. @delta2323
     I also briefly checked DeepChem and moleculenet.ai, but I could not find any existing name for this.
     http://moleculenet.ai/models

  2. ReadOut module.
     If the original paper suggests nothing, I would like to take readout as an init arg of the network as well, so that users can change it if they want.
     The default value of readout may be None, and in that case we can use your implementation of GCNReadout (but I want to add a docstring saying this is not the original paper's implementation).
     What do you think?
     (I personally think that just summing along the atom axis is enough, and the extra output_weight and softmax are not necessary. Users may try different readout modules by passing them as args.)
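(The pattern being proposed could look roughly like this; a plain-Python sketch with made-up names, defaulting to a sum readout when the user passes nothing:)

```python
import numpy as np

def sum_readout(h):
    """Default readout: sum node features along the node (atom) axis."""
    return h.sum(axis=1)

class GraphNetwork:
    """Hypothetical model taking the readout function as an init arg."""

    def __init__(self, readout=None):
        # fall back to a plain sum when the user does not supply a readout
        self.readout = readout if readout is not None else sum_readout

    def __call__(self, h):
        # h: (batch, nodes, channels) node features after graph convolutions
        return self.readout(h)
```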

@anaruse (Contributor, Author) commented Feb 19, 2018

Thank you for your feedback, @corochann!

I'm not particular about the readout, so I added a readout option to the model so that users can specify which function to use for readout. It uses "sum readout" when no function is specified.

@delta2323 (Member) commented Feb 21, 2018

Regarding the naming issue, I looked up several papers that cite the original paper and checked what they call the method.

(1) Papers that only cite the original paper and do not name the algorithm: URL, URL.
(2) Papers that call the algorithm "GCN": URL, URL.
(3) A paper that calls both this algorithm and other ones "GCN": URL.

As we expected, it seems there is no de facto standard name except GCN, and GCN is also used in a more general sense. Therefore, I agree with the name RenormalizedSpectralGraphConvolutionalNetwork, or RSGCN for short. But we may want to note in the documentation that this is our coined term, the reason we coined it, and that the original authors call it GCN.

@corochann (Member) commented Feb 21, 2018

@anaruse
Thank you for your update. The readout option looks OK.

Finally, could you change the name from GCN to RSGCN?
And, if possible, could you write a docstring noting that the name Renormalized Spectral Graph Convolutional Network was coined by us and is not the original authors' naming?

After the above changes, I would like to test the model performance to see whether training progresses in my environment, and then merge it.

@anaruse (Contributor, Author) commented Feb 22, 2018

I've changed the name from GCN to RSGCN and added some docs. Please review again!

@corochann (Member) commented Feb 22, 2018

Thank you for the change, LGTM.

When I tested the model performance with the ROC AUC Evaluator (#62), I found the following.

  1. It is better to add a softmax function to the default readout func.
     Training becomes much more stable (somehow) when a softmax is added to the readout as follows:

     def _readout_sum(x):
         h = functions.softmax(x, axis=2)  # softmax along channel axis
         y = functions.sum(h, axis=1)  # sum along node axis
         return y

  2. Support the case where readout is a Link subclass.
     Currently, the readout arg in __init__ is not registered inside init_scope.
     But a user may want to pass a Chain class as readout; it would be more flexible to check whether readout is an instance of Link and, if so, register it inside init_scope.

It is OK for me to send PRs for 1 and 2 myself after this PR is merged. Is that OK with you?
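(A framework-agnostic sketch of point 2: dispatch on whether the readout is a learnable module and register it if so. `Link` here is a stand-in class, not Chainer's; in Chainer the registration would happen inside `init_scope` so the readout's parameters are trained with the model:)

```python
import numpy as np

class Link:
    """Stand-in for a framework Link: a module with learnable parameters."""

class LinearReadout(Link):
    def __init__(self, weight):
        self.weight = weight          # would be a learnable parameter in a real framework
    def __call__(self, h):
        return (h @ self.weight).sum(axis=1)

class GraphModel:
    def __init__(self, readout):
        self.registered = []          # stands in for init_scope registration
        if isinstance(readout, Link):
            # learnable readouts must be registered so their params are updated
            self.registered.append(readout)
        self.readout = readout
    def __call__(self, h):
        return self.readout(h)
```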

@anaruse (Contributor, Author) commented Feb 22, 2018

Thank you for your approval!

Regarding changes 1. and 2., I'm fine with both. Thanks.

@corochann corochann mentioned this pull request Feb 23, 2018
@corochann corochann added this to the 0.2.0 milestone Feb 23, 2018
@corochann corochann merged commit 2a40c91 into chainer:master Feb 23, 2018
@corochann corochann mentioned this pull request Feb 26, 2018