[AUTO_MERGE] Remove if/else from dist_data_loader for collate_func#727
[AUTO_MERGE] Remove if/else from dist_data_loader for collate_func#727
dist_data_loader for collate_func#727Conversation
Co-authored-by: drewoldag <47493171+drewoldag@users.noreply.github.com>
dist_data_loader for collate_func
…ftest.py Co-authored-by: drewoldag <47493171+drewoldag@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #727 +/- ##
=======================================
Coverage 64.14% 64.15%
=======================================
Files 61 61
Lines 5996 5989 -7
=======================================
- Hits 3846 3842 -4
+ Misses 2150 2147 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@copilot check over this PR again, and review the code once more, now that we've incorporated the latest changes from main to make sure this is the only place that needs to be updated. |
Co-authored-by: drewoldag <47493171+drewoldag@users.noreply.github.com>
Reviewed — the only place needing an update was |
Click here to view all benchmarks. |
drewoldag
left a comment
There was a problem hiding this comment.
This looks pretty reasonable to me. It removes some code that was added in haste to support collate functions for the applecider incubator. And the approach that was used there has been changed to the more uniform approach of defining the collation function in the dataset class and registering it using the DataProvider instance.
There was a problem hiding this comment.
Pull request overview
This PR simplifies dist_data_loader’s collate function selection logic in src/hyrax/pytorch_ignite.py by removing an if/else branch and deleting the helper that dynamically loaded a collate function from config.
Changes:
- Always assigns
data_loader_kwargs["collate_fn"] = dataset.collateindist_data_loader. - Removes the
load_collate_functionhelper. - Removes the now-unused
get_or_load_classimport.
| else: | ||
| collation_func = load_collate_function(data_loader_kwargs) | ||
| data_loader_kwargs["collate_fn"] = collation_func | ||
| data_loader_kwargs["collate_fn"] = dataset.collate |
There was a problem hiding this comment.
dist_data_loader now unconditionally sets collate_fn to dataset.collate, but non-DataProvider datasets (notably iterable-style datasets) don't appear to define a collate method. This breaks the documented config option config['data_loader']['collate_fn'] (default false in hyrax_default_config.toml, and iterable_dataset_collate is referenced in data_set_registry.py) and will raise AttributeError when an iterable dataset reaches this function (the function already has an if dataset.is_iterable() branch). Consider restoring the previous fallback behavior: use dataset.collate when available (e.g., DataProvider), otherwise load/use the config-specified collate function (or None when false) so iterable datasets can still work.
| data_loader_kwargs["collate_fn"] = dataset.collate | |
| # Prefer a dataset-provided collate function when available (e.g., DataProvider), | |
| # otherwise fall back to the config-specified collate_fn. Treat a TOML `false` | |
| # value as "not set" and pass None through to the DataLoader. | |
| collate_from_config = data_loader_kwargs.get("collate_fn", None) | |
| if hasattr(dataset, "collate"): | |
| data_loader_kwargs["collate_fn"] = dataset.collate | |
| else: | |
| if collate_from_config is False: | |
| data_loader_kwargs["collate_fn"] = None |
There was a problem hiding this comment.
Is this a reasonable comment for "iterable datasets"
I am not familiar enough with Hyrax to verify myself?
But otherwise the implementation looks sane (and going through the commit history and comments was interesting to see how the copilot workflow is operating here)
There was a problem hiding this comment.
We just merged a PR that should completely remove all traces of iterable-style datasets. So I believe that the portion of this comment that is relevant are perhaps updating some documentation around config[data_loader][collate_fn].
I'll also double check to see if it's possible to simply remove that from the config entirely.
mtauraso
left a comment
There was a problem hiding this comment.
We need to make 100% sure that collate_fn instructions make it into our docs and demo notebooks, because this reduces the discoverability of the feature.
Created #746 |
dist_data_loader for collate_funcdist_data_loader for collate_func
dist_data_loaderhad a conditional block to assign the collate function:DataProviderinstances useddataset.collate, everything else loaded from config. Since every map-style dataset is wrapped inDataProvider, this branch was dead code.Change Description
pytorch_ignite.py: Replace the if/else block withdata_loader_kwargs["collate_fn"] = dataset.collatepytorch_ignite.py: Remove the now-unusedload_collate_functionhelper function and theget_or_load_classimport that was only used by itBefore:
After:
Solution Description
All map-style datasets reaching
dist_data_loaderare wrapped inDataProvider, which already has acollatemethod. The if/else was dead code and has been removed. With the if/else gone,load_collate_functionbecame unreachable and was also removed along with its now-unusedget_or_load_classimport.Code Quality
Original prompt
pytorch_ignite:dist_data_loaderfor collate_func #726💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.