
Support multiple node type sampling in NeighborLoader (V2) #5521

Open

wants to merge 5 commits into master
Conversation

@Padarn (Contributor) commented Sep 24, 2022

This PR adds support for sampling multiple node types in NeighborLoader.

The interface follows what was discussed in the roadmap (#4765):

NeighborLoader(
    input_nodes=dict(
        paper=torch.LongTensor([0, 1, 2]),
        author=torch.LongTensor([0, 1, 2]),
    ),
    ...
)

Internally, it converts this to a list of tuples.

[('paper', 0), ('paper', 1), ...]

This is not very efficient, but the benchmarks in #4765 (comment) showed the overhead to be acceptable.
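For context, a minimal sketch of that flattening step (the names and values are illustrative only, not the PR's actual code):

import torch

# Hypothetical seed nodes per node type, mirroring the interface above.
input_nodes = dict(
    paper=torch.LongTensor([0, 1, 2]),
    author=torch.LongTensor([0, 1, 2]),
)

# Flatten the per-type index tensors into one list of (node_type, index)
# tuples, which is what the underlying torch DataLoader iterates over.
flat = [(node_type, int(i))
        for node_type, tensor in input_nodes.items()
        for i in tensor]
# -> [('paper', 0), ('paper', 1), ('paper', 2), ('author', 0), ('author', 1), ('author', 2)]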

TODO:

  • Add tests

Addresses #4765

cc @mananshah99

@Padarn (Contributor, Author) commented Sep 24, 2022

@mananshah99 I was trying to figure out how to cleanly add this, and the easiest approach seemed to be what I've done here: wrapping the input nodes up into a class.

I know the code is probably not in the right place, and I've not actually used it to support multiple node types yet, but I wanted to get feedback on this approach.

@mananshah99 (Contributor)

Sorry, just saw this in my inbox. Will have a review in shortly.

@Padarn (Contributor, Author) commented Oct 23, 2022

Hey, just a bump @mananshah99 :-)

@rusty1s (Member) commented Oct 24, 2022

I'll try to get this merged today. Sorry for the slowness on our end :)

@Padarn (Contributor, Author) commented Oct 25, 2022

Thanks! But actually it's nowhere near ready for merging - just looking for feedback on the approach before I implement it more thoroughly.

@mananshah99 (Contributor) left a comment

Left a few comments. Overall, I think this design makes sense, summarizing my understanding below (feel free to correct me if I am mistaken):

  • We will be supporting dictionaries of input nodes to a loader
  • Internally, the loader converts this to a SamplingInputNodes representation, which has methods to convert between the dict representation and the flattened list that the PyTorch DataLoader needs in order to define the list of samples to batch (I wonder if we can override that to work on dicts instead of needing to flatten here...)
  • Within collate_fn, we will re-convert back to a dict representation containing potentially multiple node types that we then pass to the sampling implementation (which needs to support this now).
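A rough sketch of the round trip described in the last bullet, using a hypothetical regroup helper in place of from_list (illustrative only, not the actual implementation):

from collections import defaultdict
from typing import Dict, List, Tuple

import torch

def regroup(index: List[Tuple[str, int]]) -> Dict[str, torch.Tensor]:
    # Convert a flat batch of (node_type, index) pairs back into a
    # per-node-type tensor dict that a heterogeneous sampler could consume.
    grouped = defaultdict(list)
    for node_type, i in index:
        grouped[node_type].append(i)
    return {node_type: torch.tensor(idx) for node_type, idx in grouped.items()}

batch = [('paper', 0), ('author', 2), ('paper', 5)]  # a batch drawn from the flat list
print(regroup(batch))  # {'paper': tensor([0, 5]), 'author': tensor([2])}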

Comment on lines +215 to +216
def node_types(self) -> Tuple[Optional[str]]:
    return tuple(self.input_nodes.keys())

nit: return a list instead of a tuple?

Comment on lines +229 to +233
@property
def as_list(self) -> Tuple[Tuple[str, int]]:
    return tuple([(node_type, int(i))
                  for node_type, tensor in self.input_nodes.items()
                  for i in tensor])
I guess we need this (and the below function) so that we can pass the right data to the PyTorch DataLoader constructor; perhaps we can leave a note here explaining that?

        super().__init__(iterator, collate_fn=self.collate_fn, **kwargs)

    def collate_fn(self, index: NodeSamplerInput) -> Any:
        r"""Samples a subgraph from a batch of input nodes."""
        input_data: NodeSamplerInput = self.input_data[index]
        input_nodes: NodeSamplerInput = self.input_nodes[index]
Am I correct in understanding that, in the final version, InputData will be responsible for producing a Dict[str, NodeSamplerInput] (e.g. using from_list), and we will then sample on this dictionary by providing support for this in the sampler implementation?

Comment on lines +210 to +211
@dataclass
class SamplingInputNodes:
A few thoughts:

  • SamplingInputNodes and InputNodes as types have different functionalities, but almost identical names. Thoughts on changing this name (e.g. to something like SamplingInput) to increase the edit distance between them?
  • Would it be better to have input_nodes: InputNodes? I am not sure I understand the need for a separate Dict[Optional[str], Sequence] here. That would also make the relationship between this class and InputNodes more clear.
  • Am I correct in my understanding that we will pass sampling_input_nodes.as_list() to the DataLoader constructor, and once batches are selected by the torch DataLoader, we will convert them from_list back to a dict representation that we then pass to the sampler?
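To make the as_list / from_list discussion concrete, here is one way the wrapper could fit together; the class and method names mirror the diff, but the bodies are only a sketch, not the PR's implementation:

from dataclasses import dataclass
from typing import Dict, List, Tuple

import torch

@dataclass
class SamplingInputNodes:
    # Seed-node indices keyed by node type.
    input_nodes: Dict[str, torch.Tensor]

    @property
    def node_types(self) -> List[str]:
        return list(self.input_nodes.keys())

    def as_list(self) -> List[Tuple[str, int]]:
        # Flattened form handed to the torch DataLoader constructor.
        return [(node_type, int(i))
                for node_type, tensor in self.input_nodes.items()
                for i in tensor]

    @classmethod
    def from_list(cls, pairs: List[Tuple[str, int]]) -> 'SamplingInputNodes':
        # Regroup a sampled batch into the dict representation that is
        # then passed on to the sampler.
        grouped: Dict[str, List[int]] = {}
        for node_type, i in pairs:
            grouped.setdefault(node_type, []).append(i)
        return cls({k: torch.tensor(v) for k, v in grouped.items()})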

@Padarn (Contributor, Author) commented Nov 8, 2022

Your understanding is correct. Thanks for the feedback.

I will finish the implementation taking your comments into account 👍

@denadai2 (Contributor)

> Your understanding is correct. Thanks for the feedback.
>
> I will finish the implementation taking your comments into account 👍

Hi @Padarn, thanks for your contribution! Would you still be interested in finishing it? It would be amazing to have this feature.
