
Feature/1044 sortpooling layer #1210

Merged: 19 commits into develop, Apr 17, 2020

Conversation

PantelisElinas (Contributor):

This is an implementation of the SortPooling layer introduced in [1].

Note that sorting is performed using only the last column of the input tensor as stated in [1], "For convenience, we set the last graph convolution to have one channel and only used this single channel for sorting."

A related ticket (implementing DGCNN as proposed in [1]) is #1195

  1. An End-to-End Deep Learning Architecture for Graph Classification, M. Zhang, Z. Cui, M. Neumann, and Y. Chen, AAAI, 2018, https://www.cse.wustl.edu/~muhan/papers/AAAI_2018_DGCNN.pdf
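The last-channel sorting described above can be sketched in plain NumPy (an illustrative sketch only, not the layer's actual TensorFlow implementation; the `sort_pooling` name and `k` parameter are hypothetical here):

```python
import numpy as np

def sort_pooling(x, k):
    """Sort the rows of a (nodes, channels) array by the last channel,
    descending, then keep the top k rows (zero-padding if fewer than k)."""
    order = np.argsort(-x[:, -1], kind="stable")  # descending by last channel
    x_sorted = x[order]
    if len(x_sorted) >= k:
        return x_sorted[:k]
    pad = np.zeros((k - len(x_sorted), x.shape[1]))
    return np.vstack([x_sorted, pad])

x = np.array([[3.0, 4.0, 0.0], [1.0, 2.0, -1.0], [5.0, 0.0, 1.0]])
print(sort_pooling(x, 2))  # rows reordered by the last channel [0, -1, 1]
```

Only the final column drives the ordering; the other channels are carried along unchanged.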

@codeclimate

codeclimate bot commented Apr 6, 2020

Code Climate has analyzed commit 3e7168d and detected 0 issues on this pull request.

View more on Code Climate.

@huonw (Member) left a comment:


(Moving my review from #1212.)

"""
outputs = tf.map_fn(
lambda x: tf.gather(
x, tf.argsort(x[..., -1], axis=0, direction="DESCENDING")
@huonw (Member) commented Apr 7, 2020:


The paper talks about using index -2 (and -3, -4, ...) to break ties. This doesn't seem to be doing so, is that important?

(from https://github.com/stellargraph/stellargraph/pull/1212/files#r403892314)

Reply:

This is addressed in #1210.

My reply:

It might be good to have a comment, since it slightly mismatches the formal description.

PantelisElinas (Contributor, Author):


I'll just quote myself from above which also quotes from the paper,

Note that sorting is performed using only the last column of the input tensor as stated in [1], "For convenience, we set the last graph convolution to have one channel and only used this single channel for sorting."

I will update the docstring to mention the above. Whether it makes a difference or not, I do not know, since the original paper does not show results for sorting on anything other than the last column. I suspect that, given enough GCN layers, the approximation is good enough and there are few if any ties to break, so in practice sorting beyond the one column makes little difference.
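To make the difference concrete, here is a NumPy sketch (illustrative values only) comparing single-column sorting with the paper's lexicographic tie-breaking via channels -2, -3, ...:

```python
import numpy as np

# Two nodes tie on the last channel (both 1.0); a third scores 0.0.
x = np.array([[0.2, 1.0], [0.9, 1.0], [0.5, 0.0]])

# Single-column sort (this layer's behaviour): ties keep their input order.
single = np.argsort(-x[:, -1], kind="stable")

# Lexicographic sort (the paper's formal description): channel -2 breaks ties.
# Note np.lexsort treats its LAST key as the primary sort key.
lexico = np.lexsort((-x[:, -2], -x[:, -1]))

print(single.tolist())  # [0, 1, 2]
print(lexico.tolist())  # [1, 0, 2]: tie broken by the second-to-last channel
```

The two orderings differ only when the last channel has exact ties, which supports the point that with enough layers the single-column simplification rarely matters.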

"""
outputs = tf.map_fn(
lambda x: tf.gather(
x, tf.argsort(x[..., -1], axis=0, direction="DESCENDING")
(Member):


Also, I think this currently doesn't handle padding perfectly; any padding values may be sorted higher than some real elements, and this may lead to peculiar behaviour if graphs are grouped together differently across batches.

For a small example of what I'm thinking, suppose k=2 and batch_size=2, and we've got three graphs G1, G2 and G3, with 2, 2, 3 nodes respectively. If the network gives G1's nodes sorting score 1 and -1, for a batch G1, G2, the final output will include both of those nodes and be as expected. However, for a batch G1, G3, I think the x tensor here may look like:

[
 [..., 1],
 [..., -1],
 [..., 0] # padding
]

(For ease of identifying the elements, I'm assuming that there are no biases, or that they cancel out for the all-zero padding rows.)

And thus the chosen elements will be the higher-scoring node and the padding row, i.e. [[..., 1], [..., 0]].

Should this take a mask argument and obey it (somehow)?

(from https://github.com/stellargraph/stellargraph/pull/1212/files#r403898355)
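The scenario above can be reproduced in NumPy (hypothetical scores; the all-zero row stands in for generator padding):

```python
import numpy as np

# G1's two real node scores plus one all-zero padding row (batched with G3).
x = np.array([[1.0], [-1.0], [0.0]])  # last row is padding

order = np.argsort(-x[:, -1], kind="stable")  # mask-unaware descending sort
top2 = x[order][:2]
print(top2.tolist())  # [[1.0], [0.0]]: padding outranks the real -1.0 node
```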

(Contributor):


Hm, I think in this case we should also force any padded values (padded from the generator) to be zero - otherwise, if there are biases, the nodes padded by SortPooling will be zero while the nodes padded from the generator won't be.

PantelisElinas (Contributor, Author):


I did not consider the case of an activation function with support outside [0, 1]. I will add the mask as an argument so that only the relevant part of each tensor is sorted.

@kieranricardo I'm not sure I understand your concern. The generator pads with 0s already.
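One way the mask could be applied (a sketch under my own assumptions, not necessarily the code merged in this PR; `masked_sort_pooling` is a hypothetical name): replace masked-out sort keys with -inf so padding always sorts last, then zero any padding rows that survive truncation:

```python
import numpy as np

def masked_sort_pooling(x, mask, k):
    """Sort (nodes, channels) rows by the last channel, descending,
    forcing rows where mask is False (padding) to sort last."""
    key = np.where(mask, x[:, -1], -np.inf)  # padding can never outrank real nodes
    order = np.argsort(-key, kind="stable")
    out = x[order][:k].copy()
    out[~mask[order][:k]] = 0.0  # zero padding rows that survive truncation
    return out

x = np.array([[1.0], [-1.0], [0.0]])
mask = np.array([True, True, False])
print(masked_sort_pooling(x, mask, 2).tolist())  # [[1.0], [-1.0]]
```

With the mask applied, the real node with score -1.0 is kept ahead of the padding row, fixing the failure mode described above.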

@PantelisElinas PantelisElinas self-assigned this Apr 15, 2020
PantelisElinas (Contributor, Author):

@kieranricardo @huonw

I have updated the layer to use a mask to deal with padding, plus some smaller things like the option to flatten the output tensor as in the paper.

Have another look!

P.

@PantelisElinas PantelisElinas added enhancement New feature or request sg-library labels Apr 16, 2020
@huonw (Member) left a comment:


Are you planning to add this to docs/api.txt as part of the DGCNN pull request, instead of in this one?

stellargraph/layer/sort_pooling.py (outdated, resolved)


def test_flatten_output():
    data = np.array([[3, 4, 0], [1, 2, -1], [5, 0, 1]], dtype=int).reshape((1, 3, 3))
(Member):


Is it worth using a batch size > 1 here, to ensure that we don't accidentally refactor the code to do something like

outputs = tf.reshape(outputs, [1, -1])

instead of ... [tf.shape(outputs)[0], -1] ...
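In NumPy terms (illustrative shapes only), the difference is whether the batch dimension is preserved dynamically:

```python
import numpy as np

outputs = np.arange(12.0).reshape(2, 3, 2)  # (batch, k, channels)

# Analogous to tf.reshape(outputs, [tf.shape(outputs)[0], -1]):
flat = outputs.reshape(outputs.shape[0], -1)
print(flat.shape)  # (2, 6): one flattened row per graph in the batch

# Analogous to the hard-coded tf.reshape(outputs, [1, -1]):
wrong = outputs.reshape(1, -1)
print(wrong.shape)  # (1, 12): all graphs silently merged into one row
```

A test with batch size 1 cannot tell these two apart, which is the point of the suggestion.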

PantelisElinas (Contributor, Author):


Updated accordingly.


def test_mask():
    data = np.array([[3, 4, 0], [1, 2, -1], [5, 0, 1]], dtype=int).reshape((1, 3, 3))
    mask = np.array([[True, True, True]])
(Member):


If this test is testing the mask, should this be something like:

Suggested change:
- mask = np.array([[True, True, True]])
+ mask = np.array([[True, True, False]])

otherwise this seems like this first case is the same as test_sorting_truncation?

PantelisElinas (Contributor, Author):


This is testing the case that the mask allows all rows to be sorted and then truncated correctly because k=2. In line 91, the last row [5, 0, 1] is correctly moved to the first row followed by [3, 4, 0] while the last element is truncated from the result. If the mask is changed as suggested, then the result will be [3, 4, 0] followed by [1, 2, -1]. Generally, I want to make sure here that if all mask values are True, sorting still works across all rows.

There is a test for mask=[True, True, False] after this one.

(Member):


Makes sense. I guess I am just a bit disconcerted by the complete redundancy with the test_sorting_truncation test above, since that test performs exactly the same process with slightly different input values. My question was whether this was a typo, i.e. whether the intent was to test the combination of a non-trivial mask and truncation (with the next test covering a non-trivial mask plus padding), since all the other tests in this file use a trivial mask.

PantelisElinas (Contributor, Author) commented Apr 17, 2020:

> Are you planning to add this to docs/api.txt as part of the DGCNN pull request, instead of in this one?

I forgot about docs/api.txt. I will add it.

On second thought, I had a look at the structure of docs/api.txt and it might be best to add it together with DGCNN.

@huonw (Member) left a comment:


Awesome!



@PantelisElinas PantelisElinas merged commit 77ef054 into develop Apr 17, 2020
@PantelisElinas PantelisElinas deleted the feature/1044-sortpooling-layer branch April 17, 2020 07:44
@huonw huonw mentioned this pull request Apr 21, 2020