
Can Scatter Be Reproducible? #226

Closed
yuxiang-guo opened this issue Jul 9, 2021 · 7 comments

@yuxiang-guo

I used the scatter function to implement a GCN like this:

re = scatter(embed, index, dim=0, out=None, dim_size=embed.size(0), reduce='mean')

I found that even though the inputs (embed and index) are the same when I run the code twice, the outputs are still different. So I want to know whether the scatter method involves any randomness that makes the result non-reproducible. How can I make the result of scatter deterministic? Thank you very much.


rusty1s commented Jul 10, 2021

Scatter is a non-deterministic operation by design since it makes use of atomic operations in which the order of aggregation is non-deterministic, leading to minor numerical differences. As an alternative, you can make use of the segment_csr operation of torch_scatter.

@yuxiang-guo

Thanks very much. But how can I get the same result by using the segment_csr operation instead of the `scatter` operation?


rusty1s commented Jul 10, 2021

segment_csr expects indices to be sorted, and the ptr tensor denotes the compressed index representation, similar to the row pointer of a sparse matrix in CSR format:

index = torch.tensor([0, 0, 0, 1, 2, 2])
ptr = torch.tensor([0, 3, 4, 6])

scatter(x, index, dim=0) == segment_csr(x, ptr)
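For illustration, here is a minimal pure-Python sketch of the correspondence above (plain lists instead of tensors, and a plain sum in place of the library reduction): given a sorted index, ptr holds the position where each group starts, plus the total length, so each output slot reduces one contiguous slice in a fixed order.

```python
# Sorted index and its compressed (CSR-style) representation:
# index = [0, 0, 0, 1, 2, 2]  ->  ptr = [0, 3, 4, 6]
index = [0, 0, 0, 1, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Build ptr from the sorted index: count the size of each group,
# then take a running sum of the counts.
num_groups = index[-1] + 1
counts = [0] * num_groups
for i in index:
    counts[i] += 1
ptr = [0]
for c in counts:
    ptr.append(ptr[-1] + c)

# Deterministic segment sum: each output slot reduces a contiguous
# slice x[ptr[k]:ptr[k+1]], so the aggregation order never varies.
out = [sum(x[ptr[k]:ptr[k + 1]]) for k in range(len(ptr) - 1)]
print(ptr)  # [0, 3, 4, 6]
print(out)  # [6.0, 4.0, 11.0]
```

This is why segment_csr can be deterministic while scatter is not: the slice boundaries fix the reduction order, whereas atomic adds commit in whatever order the hardware schedules them.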

@yuxiang-guo

Thank you very much!

@yuxiang-guo

By the way, what if I cannot guarantee that the indices are in order, and sorting is too time-consuming? What can I do, or is there any other method that achieves the same result as `scatter`?


rusty1s commented Jul 12, 2021

If you cannot guarantee that indices are in order, there exists no parallelized operation that can group elements while still being deterministic. The resulting run-to-run differences are just the natural imprecision of floating-point arithmetic under varying summation order.
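One practical compromise, not spelled out in the thread but consistent with the answer above: if the grouping stays fixed across training iterations, sort the index once up front, cache the permutation and the ptr tensor, and reuse them every step, paying the sort cost only once. A pure-Python sketch of that preprocessing (a stable argsort plus boundary detection; the library version would use tensor ops instead):

```python
index = [2, 0, 1, 0, 2, 0]            # unsorted group ids
x = [10.0, 1.0, 7.0, 2.0, 20.0, 3.0]  # one value per id

# One-time preprocessing: a stable argsort of the index, and a ptr
# built from the positions where the sorted group id changes.
perm = sorted(range(len(index)), key=lambda i: index[i])
sorted_index = [index[i] for i in perm]
ptr = [0]
for pos in range(1, len(sorted_index)):
    if sorted_index[pos] != sorted_index[pos - 1]:
        ptr.append(pos)
ptr.append(len(sorted_index))

# Per-iteration work: permute the values, then reduce each
# contiguous slice deterministically.
xs = [x[i] for i in perm]
out = [sum(xs[ptr[k]:ptr[k + 1]]) for k in range(len(ptr) - 1)]
print(out)  # group 0: 1+2+3, group 1: 7, group 2: 10+20
```

If the index changes every iteration, this amortization no longer helps, and the trade-off between determinism and the cost of sorting is unavoidable, as noted above.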


github-actions bot commented Jan 9, 2022

This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?
