
Refactor trainers #1541

Merged: 19 commits from trainers/refactor into microsoft:main on Sep 11, 2023

Conversation

@adamjstewart (Collaborator) commented Sep 1, 2023

In #1195 we introduced a new structure for our trainers. This PR updates all existing trainers to match.

  • Add new base class (reduces code duplication)
  • No *args or **kwargs (prevents argument typos, adds default values and type hints, required for LightningCLI)
  • No typing.cast (declare type when initialized)
  • __init__ first (should be first thing in docs)
  • Add configure_(losses|metrics|models) methods (easier to override in a subclass; see the sketch below)
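To make the new shape concrete, here is a minimal sketch of the pattern (the hook names follow the list above, but the class body is illustrative and not the actual torchgeo implementation):

import torch
import torch.nn as nn
from lightning.pytorch import LightningModule


class BaseTask(LightningModule):
    """Sketch of the shared base class: explicit kwargs plus overridable hooks."""

    def __init__(self, lr: float = 1e-3) -> None:
        # Explicit keyword arguments (no *args/**kwargs) give defaults, type
        # hints, and the signature introspection that LightningCLI needs.
        super().__init__()
        self.save_hyperparameters()
        self.configure_models()
        self.configure_losses()
        self.configure_metrics()

    def configure_models(self) -> None:
        """Hook for subclasses to build self.model."""
        self.model: nn.Module = nn.Identity()

    def configure_losses(self) -> None:
        """Hook for subclasses to build self.criterion."""
        self.criterion: nn.Module = nn.MSELoss()

    def configure_metrics(self) -> None:
        """Hook for subclasses to build metric collections."""

    def configure_optimizers(self) -> torch.optim.Optimizer:
        """Standard Lightning hook; the hooks above mirror its naming."""
        return torch.optim.AdamW(self.parameters(), lr=self.hparams["lr"])


class MyCustomTask(BaseTask):
    def configure_losses(self) -> None:
        # Overriding a single hook swaps just the loss, nothing else.
        self.criterion = nn.SmoothL1Loss()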

This is a great time to bring #996 back up. At this point, our "trainers" are referred to as:

  • tasks: they all have a Task suffix
  • trainers: they live in torchgeo.trainers
  • models: this is what Lightning refers to them as
  • modules: our YAML config files list them as module: (this will go away in my next PR)

I would love it if we could decide on a single naming scheme and be consistent...

Closes #1393

@adamjstewart added this to the 0.5.0 milestone Sep 1, 2023
@github-actions bot added the trainers (PyTorch Lightning trainers) label Sep 1, 2023
@github-actions bot added the testing (Continuous integration testing) label Sep 1, 2023
@adamjstewart added the backwards-incompatible (Changes that are not backwards compatible) label Sep 1, 2023
@adamjstewart marked this pull request as draft September 2, 2023 01:03
@@ -76,7 +74,6 @@ def test_trainer(

# Instantiate model
model = instantiate(conf.module)
model.backbone = SegmentationTestModel(**conf.module)
@adamjstewart (Collaborator Author) commented:

BYOLTask has never had a backbone attribute, so this line of code didn't do anything; as a result, the tests are still slower and require more memory than necessary.


.. versionadded:: 0.4
"""

def config_task(self) -> None:
"""Configures the task based on kwargs parameters passed to the constructor."""
backbone_pretrained = self.hyperparams.get("pretrained", True)
@adamjstewart (Collaborator Author) commented:

Previously this would use a pretrained backbone and download weights from the internet by default, which doesn't match the behavior of any other trainer.

@@ -4,9 +4,8 @@ module:
model: "unet"
backbone: "resnet18"
weights: null
learning_rate: 1e-3
learning_rate_schedule_patience: 6
verbose: false
@adamjstewart (Collaborator Author) commented:

Verbose isn't a valid option for this trainer.

or None for random weights, or the path to a saved model state dict.
in_channels: Number of input channels to model.
lr: Learning rate for optimizer.
weight_decay: Weight decay (L2 penalty).
@adamjstewart (Collaborator Author) commented:

Many of these parameters were not previously documented.

@@ -274,6 +290,7 @@ def validation_step(self, *args: Any, **kwargs: Any) -> None:

def on_validation_epoch_end(self) -> None:
"""Logs epoch level validation metrics."""
# TODO: why is this method necessary?
@adamjstewart (Collaborator Author) commented:

None of our other trainers require special handling for metric logging. Is it only because of Lightning-AI/torchmetrics#1832 (comment)?
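For context, here is a rough sketch of the metric-logging pattern the other trainers rely on (illustrative code, not from this PR): updating the torchmetrics collection in the step and passing it to log_dict lets Lightning aggregate and reset it at epoch end without an on_validation_epoch_end override.

import torch.nn as nn
from lightning.pytorch import LightningModule
from torchmetrics import MetricCollection
from torchmetrics.classification import MulticlassAccuracy


class SketchTask(LightningModule):
    """Illustrative only: the usual torchmetrics + Lightning logging pattern."""

    def __init__(self, num_classes: int = 10) -> None:
        super().__init__()
        self.model = nn.LazyLinear(num_classes)
        self.criterion = nn.CrossEntropyLoss()
        self.val_metrics = MetricCollection(
            {"accuracy": MulticlassAccuracy(num_classes)}, prefix="val_"
        )

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = self.criterion(y_hat, y)
        self.val_metrics(y_hat, y)
        # Logging the MetricCollection itself lets Lightning compute the epoch
        # value and reset the metrics, so no epoch-end hook is required.
        self.log("val_loss", loss)
        self.log_dict(self.val_metrics)
        return loss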

@adamjstewart marked this pull request as ready for review September 2, 2023 17:17
@nilsleh (Collaborator) commented Sep 5, 2023

  • Should we have a public configure_models, configure_losses, configure_metrics for every trainer to make it easier for people to override these? This would be similar to configure_optimizers.

Regarding this point: for another domain-library project, I am also writing "trainers" that need a model, a loss, and metrics. When Isaac told me about hydra configs and instantiation, I started using that approach, at least for the model, loss, and optimizer. So you would have something like:

from collections.abc import Callable

import torch
import torch.nn as nn
from lightning.pytorch import LightningModule


class BaseModel(LightningModule):
    def __init__(
        self,
        model: nn.Module,
        # With _partial_: true in the config below, hydra passes a partially
        # instantiated optimizer: a callable still missing only the params.
        optimizer: Callable[..., torch.optim.Optimizer],
        loss_fn: nn.Module,
    ) -> None:
        super().__init__()
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn

    def configure_optimizers(self) -> torch.optim.Optimizer:
        """Initialize the optimizer with this module's parameters."""
        return self.optimizer(params=self.parameters())

You could also provide defaults. But what I like is that you have some nice control via config files, so I could have:

method:
  _target_: BaseModel
  model:
    _target_: some.MLP # timm.create_model for timm models
    # all arguments needed to instantiate the MLP, for example:
    n_outputs: 1
    n_hidden: [50]
  optimizer:
    _target_: torch.optim.Adam # can change optimizers here easily
    _partial_: true
    lr: 0.003
  loss_fn:
    _target_: torch.nn.MSELoss

So just via the config you have control over the optimizer and model architecture. I'm not sure whether this is good practice, but it has been convenient for running experiments and keeping track of configurations.
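For completeness, instantiating such a config is a one-liner; a rough sketch follows, where the config path is hypothetical and, in practice, each _target_ must be a fully qualified import path:

from hydra.utils import instantiate
from omegaconf import OmegaConf

# Hypothetical config file containing the "method" block above.
cfg = OmegaConf.load("configs/experiment.yaml")

# With _partial_: true, cfg.method.optimizer resolves to a functools.partial,
# so BaseModel can finish constructing the optimizer with its own parameters().
model = instantiate(cfg.method)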

@adamjstewart (Collaborator Author) commented:

You're in luck. Once this PR is merged, I'll open a follow-up PR that switches everything to LightningCLI, which supports exactly what you're describing without even changing the code. This all relies on jsonargparse, which supports command-line, YAML, and JSON configuration.
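Roughly, the entry point looks like the sketch below (illustrative only, not the exact torchgeo command):

from lightning.pytorch.cli import LightningCLI


def main() -> None:
    # With no model class pinned, LightningCLI runs in subclass mode: the task
    # is chosen in a YAML/JSON config (or on the command line) via
    # class_path/init_args, and jsonargparse maps every explicit __init__
    # keyword onto a config key. This is why the trainers needed real
    # signatures instead of **kwargs.
    LightningCLI()


if __name__ == "__main__":
    main()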

@adamjstewart merged commit 578aded into microsoft:main Sep 11, 2023
21 checks passed
@adamjstewart deleted the trainers/refactor branch September 11, 2023 22:05