An initial implementation of the LABOR-0 sampling algorithm #242

mfbalin · 2023-07-09T20:01:54Z

Implements the LABOR-0 sampling algorithm described in https://arxiv.org/abs/2210.13339 and https://docs.dgl.ai/en/1.1.x/generated/dgl.dataloading.LaborSampler.html.
Sequential Poisson sampling is used so that each vertex gets a deterministic number of neighbors, fully matching the behavior of the neighbor sampler.
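
As a rough illustration of the idea (not the C++ kernel in this PR), here is a minimal Python sketch. It assumes uniform inclusion probabilities, in which case sequential Poisson sampling reduces to keeping, for each seed vertex, the `fanout` neighbors with the smallest random variates. The variate of a vertex is drawn once and shared across all seeds, which is what lets different seeds pick overlapping neighbors. The function name `labor0_sample` and the dict-based graph layout are hypothetical and chosen only for this example.

```python
import random


def labor0_sample(neighbors, fanout, seed=0):
    """Sketch of LABOR-0 with sequential Poisson sampling.

    neighbors: dict mapping each seed vertex to a list of unique neighbor ids
               (hypothetical layout, not the CSR/CSC input of the real kernel).
    fanout:    number of neighbors to keep per seed vertex.
    """
    rng = random.Random(seed)
    r = {}  # one uniform variate per neighbor vertex, shared by all seeds
    sampled = {}
    for s, nbrs in neighbors.items():
        for t in nbrs:
            if t not in r:
                r[t] = rng.random()
        # Exactly min(fanout, degree) neighbors, like the neighbor sampler.
        k = min(fanout, len(nbrs))
        # With uniform probabilities, ranking by r_t / pi_t is ranking by r_t.
        sampled[s] = sorted(nbrs, key=lambda t: r[t])[:k]
    return sampled


if __name__ == "__main__":
    graph = {0: [10, 11, 12, 13], 1: [10, 11, 12, 13], 2: [11]}
    print(labor0_sample(graph, fanout=2, seed=1))
```

Because seeds 0 and 1 share the same neighbor list and the same variates, they end up with the same sampled neighbors, so fewer unique vertices have to be fetched than with independent neighbor sampling.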

Update: To appear at NeurIPS 2023.

mfbalin · 2023-07-09T20:16:14Z

I am unfamiliar with the disjoint, directed, and temporal options, so I removed them for now. However, LABOR-0 can be used as a drop-in replacement for neighbor sampling, so it should be possible to extend its support to those cases too. Extending it to the heterogeneous case should also be straightforward.

codecov · 2023-07-09T20:37:04Z

Codecov Report

Merging #242 (d24cd6c) into master (c231a45) will increase coverage by 1.00%.
The diff coverage is 93.26%.

@@            Coverage Diff             @@
##           master     #242      +/-   ##
==========================================
+ Coverage   83.69%   84.70%   +1.00%     
==========================================
  Files          28       29       +1     
  Lines         883      987     +104     
==========================================
+ Hits          739      836      +97     
- Misses        144      151       +7

Impacted Files	Coverage Δ
pyg_lib/csrc/sampler/cpu/labor_kernel.cpp	`92.39% <92.39%> (ø)`
pyg_lib/csrc/sampler/neighbor.cpp	`100.00% <100.00%> (ø)`


mfbalin · 2023-07-09T21:17:44Z

@rusty1s What are the next steps to seamlessly enable users to experiment with this new sampler in PyG?


mfbalin · 2023-07-10T03:41:42Z

Right now, only LABOR-0 is implemented. The importance sampling variants will need to return an edge vector containing the reciprocals of the importance sampling probabilities. Should I add that vector as an optional return value now, before those variants are implemented, or is it fine to modify the API later?
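
To illustrate why the reciprocals would be returned, here is a small sketch under simplifying assumptions: neighbors are kept independently with probability pi_t (plain Poisson sampling, not the sequential variant used in this PR), and each kept neighbor's value is scaled by 1/pi_t. The helper `expected_ht_sum` is hypothetical; it enumerates every possible sample to show that the weighted sum matches the full neighborhood sum in expectation.

```python
from itertools import product


def expected_ht_sum(values, probs):
    """Exact expectation of sum_t x_t / pi_t over independently kept neighbors.

    values: per-neighbor values x_t.
    probs:  per-neighbor inclusion probabilities pi_t, each in (0, 1].
    """
    total = 0.0
    # Enumerate every keep/drop pattern over the neighbors.
    for mask in product([0, 1], repeat=len(values)):
        p_mask = 1.0  # probability of this exact pattern
        estimate = 0.0  # reciprocal-weighted sum for this pattern
        for keep, x, pi in zip(mask, values, probs):
            p_mask *= pi if keep else (1.0 - pi)
            if keep:
                estimate += x / pi
        total += p_mask * estimate
    return total


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    pi = [0.5, 0.25, 1.0]
    print(expected_ht_sum(x, pi), sum(x))
```

With uniform probabilities, as in LABOR-0, the weights are identical within a neighborhood, which would be consistent with the vector only being needed for the importance sampling variants.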

mfbalin · 2023-07-12T00:27:50Z

If weighted sampling is of interest to PyG users, I can contribute that too.

mfbalin · 2023-07-13T03:48:30Z

Attached are the results of the benchmark script: labor2023-07-13 034625.678968.csv


mfbalin · 2023-07-16T15:27:15Z

Is there an easy way to refactor the neighbor_kernel and this file so that the extra features such as temporal or disjoint come for free for this implementation as well?

mfbalin · 2024-01-19T05:08:51Z

CPU and GPU implementations that support weighted sampling can be found here: https://docs.dgl.ai/en/latest/generated/dgl.graphbolt.LayerNeighborSampler.html#dgl.graphbolt.LayerNeighborSampler
That library can be used as a dataloader for PyG models, so there is not much point in maintaining another version here.

mfbalin mentioned this pull request Jul 9, 2023

[Roadmap] Advanced Graph Sampling Routines 🚀 pyg-team/pytorch_geometric#7331

Open

17 tasks

implement LABOR-0

52a8c19

mfbalin force-pushed the labor_sampler branch from c0ab82d to 52a8c19 Compare July 9, 2023 20:36

mfbalin added 2 commits July 9, 2023 20:58

edit changelog

ee5136b

add the python binding

b773f62

mfbalin and others added 2 commits July 9, 2023 23:24

adding a benchmark

4c5849f

[pre-commit.ci] auto fixes from pre-commit.com hooks

0d3e300

for more information, see https://pre-commit.ci

mfbalin marked this pull request as draft July 9, 2023 23:53

mfbalin and others added 2 commits July 10, 2023 03:10

remove layer dependency as pyg has layers inside each other

8669347

[pre-commit.ci] auto fixes from pre-commit.com hooks

d34d1fb

for more information, see https://pre-commit.ci

mfbalin force-pushed the labor_sampler branch from 9ab1a63 to d34d1fb Compare July 10, 2023 03:18

mfbalin marked this pull request as ready for review July 10, 2023 03:19

mfbalin and others added 3 commits July 10, 2023 03:24

remove unnecessary change

fd166e6

fix linting

f8500c5

[pre-commit.ci] auto fixes from pre-commit.com hooks

832a4b6

for more information, see https://pre-commit.ci

mfbalin and others added 4 commits July 13, 2023 03:50

adding weighted sampling

5d3a764

[pre-commit.ci] auto fixes from pre-commit.com hooks

209999a

for more information, see https://pre-commit.ci

linting

d2b497a

[pre-commit.ci] auto fixes from pre-commit.com hooks

d24cd6c

for more information, see https://pre-commit.ci

rusty1s assigned mfbalin Sep 4, 2023

rusty1s added the 1 - Priority P1, feature, and sampler labels Sep 4, 2023

mfbalin closed this Jan 19, 2024
