Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataModule: common base class to reduce code duplication #1260

Merged
merged 3 commits into from
Apr 26, 2023

Conversation

adamjstewart
Copy link
Collaborator

A lot of the code in our base classes was shared, so I aggregated it.

Originally I was just planning on adding better error messages, but figured I would do this first.

@adamjstewart adamjstewart added this to the 0.5.0 milestone Apr 18, 2023
@github-actions github-actions bot added datamodules PyTorch Lightning datamodules documentation Improvements or additions to documentation labels Apr 18, 2023
Copy link
Member

@calebrob6 calebrob6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels kind of like premature optimization (is premature abstracting a thing?) as it only saves a net few lines (just repeated twice), and "DataModule" doesn't seem like the correct name. It is good because I can't think of a case why we would want to go back and make different plot, on_after_batch_transfer or prepare_data methods for the Geo and NonGeo data modules.

@@ -23,15 +23,76 @@
from .utils import MisconfigurationException


class GeoDataModule(LightningDataModule): # type: ignore[misc]
"""Base class for data modules containing geospatial information.
class DataModule(LightningDataModule): # type: ignore[misc]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call this a TorchgeoDataModule or something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could, I don't think it matters too much. The user should never use this class.

@adamjstewart
Copy link
Collaborator Author

For context, the reason I made this change was because I wanted to improve the datamodule error messages. The current error message just says:

Foo.setup does not define a 'foo_dataset'

but the actual error could be any of:

  • Foo.setup does not define a foo_dataset or dataset
  • Foo.setup does not define a foo_sampler or sampler or foo_batch_sampler or batch_sampler
  • The dataset length is 0
  • The sampler length is 0

Since this is a bit of code and requires access to self, I figured it would make sense to add this check to a base class. When I started working on a base class, that's when I realized how much duplicated code there was. So I figured it would be easiest to add the base class in one PR and improve error messages in a second PR.

@calebrob6
Copy link
Member

Totally fine! I just think the name "DataModule" might be confusing, maybe "BaseDataModule"? Else, I'm cool with it.

@calebrob6 calebrob6 merged commit 7f1ec52 into main Apr 26, 2023
@calebrob6 calebrob6 deleted the datamodules/base branch April 26, 2023 04:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datamodules PyTorch Lightning datamodules documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants