🚀 The feature, motivation and pitch
I am working on Graphs. Right now I have a model running that takes a subgraph and does some predictions.
To improve throughput I want to batch multiple subgraphs of different sizes together.
Padding them to the same size does not work in my case: I use an aggregation operation where I don't want to aggregate the padded neighbours, and masking out the padded neighbours is not possible.
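To illustrate the padding problem (a toy sketch with made-up numbers, not code from this issue): zero-padding changes the result of a mean aggregation, because the padded entries are counted as real neighbours:

```python
import torch

# Two real neighbour features for one node.
neighbors = torch.tensor([1.0, 3.0])
true_mean = neighbors.mean()  # mean over the real neighbours only

# Zero-padding the neighbourhood to a fixed size of 4.
padded = torch.tensor([1.0, 3.0, 0.0, 0.0])
padded_mean = padded.mean()   # the padded zeros skew the mean downwards
```

Without a way to mask the padded entries inside the aggregation, the two results differ, which is why variable-size batching (nested tensors) is attractive here.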
I tried modifying my model to support nested tensors as input, which somewhat worked, but I had to cut out some unsupported operations, specifically layer_norm.
Also, there are currently no supported loss functions, so a cross_entropy or nll_loss (and log_softmax) that supports nested tensors would be a big usability upgrade.
Also, some error messages related to nested tensors still point to https://github.com/pytorch/nestedtensor, which I suspect is no longer correct since nested tensors were moved into core.
Alternatives
I tried implementing layer_norm myself using the currently supported nested ops, but was not successful.
The issue is the "a/sqrt(b)" calculation, which I could not get to work without .pow() or element-wise division of two nested tensors.
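One possible fallback (a sketch, not the implementation attempted above, and assuming hypothetical subgraph sizes and a feature dimension of 8): unbind the nested tensor, apply the regular layer_norm to each subgraph separately, and re-nest the results. This sidesteps the missing nested .pow()/division at the cost of a Python-level loop:

```python
import torch
import torch.nn.functional as F

# Hypothetical nested input: two subgraphs with 4 and 6 nodes, feature dim 8.
nt = torch.nested.nested_tensor([torch.randn(4, 8), torch.randn(6, 8)])

# Normalize each constituent tensor on its own, then re-nest.
normed = torch.nested.nested_tensor(
    [F.layer_norm(t, normalized_shape=(8,)) for t in nt.unbind()]
)
```

Since layer_norm normalizes over the last (feature) dimension only, the per-subgraph results match what a native nested layer_norm would produce; the loop just makes it slow for large batches.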
For the loss function I can work around it by unbinding the output nested tensor and stacking the results, but this is very ugly.
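The workaround looks roughly like this (a sketch with hypothetical shapes: one logits vector per subgraph, all with the same class count, so stacking is valid):

```python
import torch
import torch.nn.functional as F

# Hypothetical per-subgraph logits: 2 subgraphs, 3 classes each.
out = torch.nested.nested_tensor([torch.randn(3), torch.randn(3)])
targets = torch.tensor([0, 2])

# Leave nested-tensor land before computing the loss:
logits = torch.stack(out.unbind())      # regular [B, C] tensor
loss = F.cross_entropy(logits, targets)
```

A cross_entropy/nll_loss that accepted nested tensors directly would remove this round-trip through regular tensors.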
Additional context
No response