
An initial implementation of the LABOR-0 sampling algorithm #242

Closed. mfbalin wants to merge 14 commits.

mfbalin commented Jul 9, 2023

Implements the LABOR-0 sampling algorithm described in https://arxiv.org/abs/2210.13339 and https://docs.dgl.ai/en/1.1.x/generated/dgl.dataloading.LaborSampler.html.
Sequential Poisson sampling is used so that each seed vertex gets a deterministic number of sampled neighbors, min(fanout, degree). This fully matches the behavior of the neighbor sampler.

Update: To appear at NeurIPS 2023.
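A minimal Python sketch of the sampling rule described above, for a single layer on a homogeneous graph stored in CSR form (`rowptr`, `col`). The function name `labor0_sample` and its signature are hypothetical and are not the pyg_lib kernel or API. It only illustrates how sequential Poisson sampling, combined with one random variate per vertex shared by all seeds, yields exactly min(fanout, degree) neighbors per seed.

```python
import random
from typing import Dict, List, Sequence


def labor0_sample(
    rowptr: Sequence[int],
    col: Sequence[int],
    seeds: Sequence[int],
    fanout: int,
    rng_seed: int = 0,
) -> Dict[int, List[int]]:
    """Hypothetical LABOR-0 sketch on a homogeneous CSR graph.

    Returns, for each seed, the list of sampled neighbor ids.
    """
    num_nodes = len(rowptr) - 1
    rng = random.Random(rng_seed)
    # Draw one uniform variate r_t per vertex t and share it across all seeds
    # in the batch; seeds with overlapping neighborhoods therefore tend to
    # pick the same vertices, which shrinks the set of sampled vertices.
    r = [rng.random() for _ in range(num_nodes)]

    out: Dict[int, List[int]] = {}
    for s in seeds:
        nbrs = list(col[rowptr[s]:rowptr[s + 1]])
        if len(nbrs) <= fanout:
            # Degree <= fanout: every neighbor is kept (pi_t = 1).
            out[s] = nbrs
            continue
        # LABOR-0 assigns every neighbor of s the same inclusion probability
        # pi_t = fanout / deg(s). Sequential Poisson sampling keeps the
        # `fanout` neighbors with the smallest r_t / pi_t; with a constant
        # pi_t that is simply the `fanout` neighbors with the smallest r_t.
        out[s] = sorted(nbrs, key=lambda t: r[t])[:fanout]
    return out


if __name__ == "__main__":
    # Toy graph with 4 nodes: 0 -> {1, 2, 3}, 1 -> {0, 2, 3}, 2 -> {0}, 3 -> {}.
    rowptr = [0, 3, 6, 7, 7]
    col = [1, 2, 3, 0, 2, 3, 0]
    print(labor0_sample(rowptr, col, seeds=[0, 1, 2, 3], fanout=2, rng_seed=42))
```

Because the variates are drawn per vertex rather than per edge, seeds 0 and 1 in the toy graph rank their common neighbors 2 and 3 in the same order, which is what makes the sampled subgraph smaller than with independent per-seed sampling.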

mfbalin (author) commented Jul 9, 2023

I am unfamiliar with the disjoint, directed, and temporal options, so I removed them for now. However, since LABOR-0 can be used as a drop-in replacement for neighbor sampling, it should be possible to extend support to those cases as well. Extending it to the heterogeneous case should also be straightforward.

codecov bot commented Jul 9, 2023

Codecov Report

Merging #242 (d24cd6c) into master (c231a45) will increase coverage by 1.00%.
The diff coverage is 93.26%.

@@            Coverage Diff             @@
##           master     #242      +/-   ##
==========================================
+ Coverage   83.69%   84.70%   +1.00%     
==========================================
  Files          28       29       +1     
  Lines         883      987     +104     
==========================================
+ Hits          739      836      +97     
- Misses        144      151       +7     
Impacted Files                                 Coverage Δ
pyg_lib/csrc/sampler/cpu/labor_kernel.cpp       92.39% <92.39%> (ø)
pyg_lib/csrc/sampler/neighbor.cpp              100.00% <100.00%> (ø)


mfbalin (author) commented Jul 9, 2023

@rusty1s What are the next steps to seamlessly enable users to experiment with this new sampler in PyG?

mfbalin marked this pull request as draft July 9, 2023 23:53
mfbalin marked this pull request as ready for review July 10, 2023 03:19

mfbalin (author) commented Jul 10, 2023

Right now, only LABOR-0 is implemented. The importance sampling variants will additionally return an edge vector containing the reciprocals of the importance sampling probabilities. Should I add that vector as an optional return value now, before those variants are implemented, or is it fine to modify the API later?
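To illustrate how such a reciprocal-probability edge vector could be consumed, here is a hedged Python sketch of a Horvitz-Thompson style mean aggregation for one seed. The helper `ht_mean` and its arguments are hypothetical and are not part of pyg_lib or PyG. It assumes that for LABOR-0 every sampled neighbor of a seed with degree d has inclusion probability pi = min(1, fanout / d).

```python
import itertools
from typing import Sequence


def ht_mean(values: Sequence[float], inv_probs: Sequence[float], degree: int) -> float:
    """Horvitz-Thompson estimate of the mean over all `degree` neighbors.

    `values` holds features of the sampled neighbors only; `inv_probs` holds
    the matching 1 / pi_t edge weights (hypothetical name).
    """
    if degree == 0:
        return 0.0
    # Weighting each sample by 1 / pi_t gives an unbiased estimate of the full
    # neighborhood sum; dividing by deg(s) turns it into a mean.
    return sum(x * w for x, w in zip(values, inv_probs)) / degree


if __name__ == "__main__":
    # Seed with 4 neighbors, fanout 2, LABOR-0: pi_t = 2 / 4, so 1 / pi_t = 2.
    feats = [1.0, 2.0, 3.0, 4.0]
    estimates = [
        ht_mean([feats[i], feats[j]], [2.0, 2.0], degree=4)
        for i, j in itertools.combinations(range(4), 2)
    ]
    # Averaged over every equally likely 2-subset, the estimate equals the true mean.
    print(sum(estimates) / len(estimates))  # 2.5
```

With a constant pi, keeping the fanout smallest of i.i.d. uniform variates selects a uniformly random subset, so every 2-subset in the example is equally likely and the average of the estimates matches the true mean of 2.5. The non-uniform importance sampling variants would simply supply different 1 / pi_t values per edge.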

mfbalin (author) commented Jul 12, 2023

If weighted sampling is of interest to PyG users, I can contribute that too.

mfbalin (author) commented Jul 13, 2023

The attached file contains the results of the benchmark script: labor2023-07-13 034625.678968.csv

mfbalin (author) commented Jul 16, 2023

Is there an easy way to refactor neighbor_kernel and this file so that this implementation also gets the extra features, such as temporal and disjoint sampling, for free?

mfbalin (author) commented Jan 19, 2024

CPU and GPU implementations that support weighted sampling are available in DGL's GraphBolt: https://docs.dgl.ai/en/latest/generated/dgl.graphbolt.LayerNeighborSampler.html#dgl.graphbolt.LayerNeighborSampler
Since that library can be used as a data loader for PyG models, there is little point in maintaining another version here.

mfbalin closed this Jan 19, 2024