Add `DensePaddingDataLoader` for flexible (padded) dense batching of `BaseData` objects #8518
for more information, see https://pre-commit.ci
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8518      +/-   ##
==========================================
- Coverage   90.02%   89.08%   -0.94%
==========================================
  Files         470      471       +1
  Lines       30164    30333     +169
==========================================
- Hits        27154    27022     -132
- Misses       3010     3311     +301
    return batch


class DensePaddingCollater:
Thanks for the PR. It looks like we now have multiple places that dictate how to collate or pad attributes from data objects. I wonder if we can re-use existing code from there, e.g., we have the `PadTransform`, that already does most of what you have written here. Do you think there is a chance to re-use some of this logic for this PR?
Hi, @rusty1s. It's possible we may be able to reuse existing assets here. I'm not too familiar with the `PadTransform` utility. Does it already support padding both float and non-float (e.g., int, bool) tensors?
Yes, that should be supported.
…thout auxiliary libraries
🚀 The new functionality, motivation and pitch
Adds a `DensePaddingDataLoader` class that enables flexible dense batching of `BaseData` objects.
This makes it possible to integrate `Data` or `HeteroData` objects into CV/NLP-driven (dense-batching) model training pipelines.
Alternatives
No alternatives exist for this functionality. PyG's `DenseDataLoader` does not cover it, because it assumes that every (sub)graph has the same node count as all other (sub)graphs.
Additional context
This new data loader enables one to easily experiment with message passing algorithms that operate on dense tensor representations of batched graph objects without having to redesign one's PyG-driven `Dataset` class. This idea was initiated per #8516 and raised as an issue in #8517.
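
For readers unfamiliar with padded dense batching, the idea can be sketched roughly as follows. This is a minimal pure-Python illustration, not the `DensePaddingCollater` implementation from this PR: the helper name `pad_dense_batch` is hypothetical, nested lists stand in for tensors, and it assumes every graph has at least one node and the same feature width. Graphs with fewer nodes are padded up to the largest node count in the batch, and a boolean mask records which rows are real nodes.

```python
from typing import List, Tuple

Features = List[List[float]]  # one row of features per node


def pad_dense_batch(
    graphs: List[Features],
    pad_value: float = 0.0,
) -> Tuple[List[Features], List[List[bool]]]:
    """Pad per-graph node features to a common node count (hypothetical helper).

    Returns a padded batch shaped [num_graphs, max_nodes, num_features] and a
    mask shaped [num_graphs, max_nodes] that is True for real nodes and False
    for padded rows.
    """
    max_nodes = max(len(x) for x in graphs)
    num_features = len(graphs[0][0])  # assumes a non-empty first graph
    batch, mask = [], []
    for x in graphs:
        num_pad = max_nodes - len(x)
        # Copy the real rows, then append rows filled with the padding value.
        padded = [list(row) for row in x]
        padded += [[pad_value] * num_features for _ in range(num_pad)]
        batch.append(padded)
        mask.append([True] * len(x) + [False] * num_pad)
    return batch, mask


graphs = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # graph with 3 nodes
    [[7.0, 8.0]],                          # graph with 1 node
]
batch, mask = pad_dense_batch(graphs)
```

Downstream dense layers would typically use the mask to ignore padded rows, for example when pooling over nodes. A tensor-based version would presumably also need a padding value suited to each dtype (float, int, bool), which is the concern raised in the review thread.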