Feature/1044 sortpooling layer #1210
Conversation
…g tensor padding for k larger than the input graph size.
…uncation of output tensor.
… output tensor is checked at runtime for padding or truncation.
Code Climate has analyzed commit 3e7168d and detected 0 issues on this pull request.
(Moving my review from #1212.)
stellargraph/layer/sort_pooling.py
Outdated
""" | ||
outputs = tf.map_fn( | ||
lambda x: tf.gather( | ||
x, tf.argsort(x[..., -1], axis=0, direction="DESCENDING") |
The paper talks about using index -2 (and -3, -4, ...) to break ties. This doesn't seem to be doing so; is that important?
(from https://github.com/stellargraph/stellargraph/pull/1212/files#r403892314)
Reply:
This is addressed in #1210.
My reply:
It might be good to have a comment, since it slightly mismatches the formal description.
I'll just quote myself from above which also quotes from the paper,
Note that sorting is performed using only the last column of the input tensor as stated in [1], "For convenience, we set the last graph convolution to have one channel and only used this single channel for sorting."
I will update the docstring to mention the above. Whether it makes a difference or not, I do not know since the original paper does not show results for sorting with other than the last column. I suspect that given enough GCN layers, the approximation is good enough and that there are few if any ties to break that, in practice, makes little difference to sort beyond the one column.
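To make the single-column sorting concrete, here is a NumPy sketch of the reordering that the `tf.gather`/`tf.argsort` snippet performs (illustrative only; the layer itself operates on TensorFlow tensors):

```python
import numpy as np

# NumPy sketch of the per-graph sorting step: rows are reordered by the
# LAST feature column, descending, matching the single-channel sorting
# described in [1]. The function name is illustrative, not from the PR.
def sort_by_last_column(x):
    # Negate the scores so that an ascending, stable argsort yields a
    # descending order over the last column.
    order = np.argsort(-x[:, -1], kind="stable")
    return x[order]

x = np.array([[3, 4, 0],
              [1, 2, -1],
              [5, 0, 1]])
sorted_x = sort_by_last_column(x)
# rows reordered so the last column reads 1, 0, -1
```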
stellargraph/layer/sort_pooling.py
Outdated
""" | ||
outputs = tf.map_fn( | ||
lambda x: tf.gather( | ||
x, tf.argsort(x[..., -1], axis=0, direction="DESCENDING") |
Also, I think this currently doesn't handle padding perfectly; any padding values may be sorted higher than some real elements, and this may lead to peculiar behaviour as graphs are grouped together differently.
For a small example of what I'm thinking, suppose k=2 and batch_size=2, and we've got three graphs G1, G2 and G3, with 2, 2, 3 nodes respectively. If the network gives G1's nodes sorting scores 1 and -1, for a batch G1, G2, the final output will include both of those nodes and be as expected. However, for a batch G1, G3, I think the x tensor here may look like:
[
[..., 1],
[..., -1],
[..., 0] # padding
]
(For ease of identifying the elements I'm assuming that there's no biases/they cancel out for the all-zero paddings.)
And thus the chosen elements will be the higher node and the padding, i.e. [[..., 1], [..., 0]].
Should this take a mask argument and obey it (somehow)?
(from https://github.com/stellargraph/stellargraph/pull/1212/files#r403898355)
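The failure mode described above can be reproduced with a small NumPy sketch (illustrative values; the real layer uses TensorFlow ops):

```python
import numpy as np

# Sketch of the padding problem: a zero padding row has sorting score 0,
# which outranks a real node with score -1, so naive top-k selection
# keeps the padding row instead of the real node.
x = np.array([[1.0, 1.0],     # real node, sorting score 1
              [-1.0, -1.0],   # real node, sorting score -1
              [0.0, 0.0]])    # padding row, sorting score 0
k = 2
order = np.argsort(-x[:, -1], kind="stable")  # descending by last column
top_k = x[order][:k]
# top_k keeps the score-1 node and the PADDING row, not both real nodes
```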
Hm, I think in this case we should also force any padded values (padded from the generator) to be zero - otherwise, if there are biases, the nodes padded by SortPooling will be zero while the nodes padded from the generator won't be.
I did not consider the case of an activation function with support outside [0, 1]. I will add the mask as an argument to correctly sort only the relevant part of the tensors.
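One possible way a mask could be obeyed, sketched in NumPy (an assumption about the approach, not the actual implementation): replace the sorting scores of masked-out rows with -inf so that padding always sorts last.

```python
import numpy as np

# Illustrative sketch: masked-out (padding) rows get a -inf sorting
# score, so they can never outrank real nodes during top-k selection.
x = np.array([[1.0, 1.0],
              [-1.0, -1.0],
              [0.0, 0.0]])            # last row is padding
mask = np.array([True, True, False])  # False marks the padding row
scores = np.where(mask, x[:, -1], -np.inf)
order = np.argsort(-scores, kind="stable")  # descending; -inf sorts last
top2 = x[order][:2]
# now both real nodes are kept, padding is excluded
```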
@kieranricardo I'm not sure I understand your concern. The generator pads with 0s already.
I have updated to use a mask to deal with padding plus some smaller things like the option to flatten the output tensor as in the paper. Have another look! P.
Are you planning to add this to docs/api.txt as part of the DGCNN pull request, instead of in this one?
tests/layer/test_sort_pooling.py
Outdated
def test_flatten_output():
    data = np.array([[3, 4, 0], [1, 2, -1], [5, 0, 1]], dtype=int).reshape((1, 3, 3))
Is it worth using a batch size > 1 here, to ensure that we don't accidentally refactor the code to do something like outputs = tf.reshape(outputs, [1, -1]) instead of ... [tf.shape(outputs)[0], -1] ...?
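A quick NumPy sketch of why the leading dimension must be read from the tensor itself rather than hard-coded (illustrative; tf.reshape behaves analogously):

```python
import numpy as np

# With batch_size > 1, hard-coding a leading 1 in the reshape would
# merge all graphs into a single row; reading the batch dimension from
# the tensor's own shape keeps one flattened row per graph.
outputs = np.arange(12).reshape((2, 3, 2))        # batch of 2 graphs, k=3, 2 features
flat = outputs.reshape((outputs.shape[0], -1))    # analogous to tf.shape(outputs)[0]
# flat has shape (2, 6): one flattened row per graph in the batch
```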
Updated accordingly.
def test_mask():
    data = np.array([[3, 4, 0], [1, 2, -1], [5, 0, 1]], dtype=int).reshape((1, 3, 3))
    mask = np.array([[True, True, True]])
If this test is testing the mask, should this be something like:
mask = np.array([[True, True, False]]) instead of mask = np.array([[True, True, True]])? Otherwise this first case seems the same as test_sorting_truncation.
This is testing the case that the mask allows all rows to be sorted and then truncated correctly because k=2. In line 91, the last row [5, 0, 1] is correctly moved to the first row followed by [3, 4, 0] while the last element is truncated from the result. If the mask is changed as suggested, then the result will be [3, 4, 0] followed by [1, 2, -1]. Generally, I want to make sure here that if all mask values are True, sorting still works across all rows.
There is a test for mask=[True, True, False] after this one.
Makes sense, I guess I am just a bit disconcerted by the complete redundancy with the test_sorting_truncation test above, since that test does exactly the same process just with slightly different input values. My question was whether this was a typo for intending to test the combination of a non-trivial mask and truncation (and the next one is the combination of a non-trivial mask + padding), since all the other tests in this file test various combinations with a trivial mask.
On second thought, I had a look at the structure of docs/api.txt and it might be best to add it together with DGCNN.
Awesome!
This is an implementation of the SortPooling layer introduced in [1]. Note that sorting is performed using only the last column of the input tensor as stated in [1], "For convenience, we set the last graph convolution to have one channel and only used this single channel for sorting."
A related ticket (implementing DGCNN as proposed in [1]) is #1195