Speeding up and reducing the memory footprint of the trainer tests #344

Merged 2 commits into releases/v0.2 from tests/fix_memory on Jan 1, 2022

Conversation

@calebrob6 (Member)

Our "tests/trainers/" unit tests currently use the following resources (based off of /usr/bin/time --verbose pytest tests/trainers/ on my machine):

User time (seconds): 682.60
System time (seconds): 269.87
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:49.86
Maximum resident set size (kbytes): 6274388

This PR monkeypatches/replaces the models used in the generic trainer tests (ResNet18s and UNets with ResNet18 backbones) with smaller models. This reduces the overall resource usage to:

User time (seconds): 503.18
System time (seconds): 212.22
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:41.41
Maximum resident set size (kbytes): 5501672

For "test_classification.py" specifically the before/after is:

User time (seconds): 53.64
System time (seconds): 35.74
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.28
Maximum resident set size (kbytes): 1941160

to

User time (seconds): 24.94
System time (seconds): 16.05
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.39
Maximum resident set size (kbytes): 1254892

Similarly, for "test_segmentation.py":

User time (seconds): 251.97
System time (seconds): 51.93
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:19.06
Maximum resident set size (kbytes): 2774676

to

User time (seconds): 228.96
System time (seconds): 38.08
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.87
Maximum resident set size (kbytes): 1403244
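
For reference, a minimal sketch of the kind of stand-in models being swapped in (class names and layer choices here are illustrative, not necessarily the exact classes this PR adds to test_utils):

import torch
import torch.nn as nn

class ClassificationTestModel(nn.Module):
    # Tiny replacement for a timm ResNet18 classifier.
    def __init__(self, in_chans: int = 3, num_classes: int = 10) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(in_chans, 1, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(1, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(self.conv1(x)).flatten(1))

class SegmentationTestModel(nn.Module):
    # Tiny replacement for a UNet with a ResNet18 backbone: a single 1x1 conv
    # mapping the input bands directly to per-pixel class scores.
    def __init__(self, in_channels: int = 3, classes: int = 2) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv1(x)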

github-actions bot added the "testing" (Continuous integration testing) label on Dec 31, 2021
@adamjstewart (Collaborator)

This isn't as much of a speed/memory difference as I was expecting. Any thoughts on why things are still quite slow and use a lot of memory?

@isaaccorley (Collaborator)

For classification, we could probably make the stride something large so we do fewer multiplies during convolution. I wonder if that would make a difference.
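
A back-of-envelope sketch of why a larger stride helps (pure arithmetic, not code from this PR): the output feature map shrinks roughly by the stride in each dimension, so the multiply count falls by about the stride squared.

def conv2d_multiplies(h: int, w: int, c_in: int, c_out: int, k: int = 3, stride: int = 1, padding: int = 1) -> int:
    # Multiplies = output pixels x output channels x (input channels x kernel area)
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    return out_h * out_w * c_out * c_in * k * k

print(conv2d_multiplies(224, 224, 3, 64, stride=1))  # ~86.7M multiplies
print(conv2d_multiplies(224, 224, 3, 64, stride=4))  # ~5.4M multiplies, roughly 16x fewer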

@calebrob6 (Member, Author)

For "test_classification.py", if I monkeypatch timm in every method we use ClassificationTask then it drops resource usage a little bit more:

User time (seconds): 17.32
System time (seconds): 10.92
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.26
Maximum resident set size (kbytes): 1079080

However, even if I replace the models with no-ops (this is kind of hacky, so I haven't committed it), the resource usage doesn't drop any lower. This means the datasets/datamodules are responsible for the remaining time/memory.

This is a decent improvement though: sampling from recent Ubuntu 3.6 CI runs, it drops test time from ~2m20s to ~1m35s (a 33% decrease).

@calebrob6 (Member, Author)

if I monkeypatch timm in every method where we use ClassificationTask

How can I do this once for the whole file?

@calebrob6 (Member, Author)

With the shrinking of the OSCD test data, "test_segmentation.py" is now:

User time (seconds): 74.53
System time (seconds): 25.18
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.36
Maximum resident set size (kbytes): 1139396

(a huge improvement)

@isaaccorley (Collaborator) previously approved these changes on Jan 1, 2022 and left a review comment:

LGTM. Not sure if we need docstrings for the classes in test_utils.

@@ -30,6 +32,8 @@ def test_trainer(self, name: str, classname: Type[LightningDataModule]) -> None:
model_kwargs = conf_dict["module"]
model = RegressionTask(**model_kwargs)

model.model = RegressionTestModel()
@calebrob6 (Member, Author) commented on this change:

I could monkeypatch torchvision.models.resnet18 here to avoid having the ResNet18 object actually created in the constructor, but it seems like that won't save much time/RAM and will change when we upgrade RegressionTask.
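
A rough sketch of what that alternative could look like (not committed in this PR; the tiny replacement module is a hypothetical stand-in, and whether the patch takes effect depends on how RegressionTask imports resnet18):

import pytest
import torch.nn as nn
import torchvision.models

def _tiny_resnet18(*args: object, **kwargs: object) -> nn.Module:
    # Accepts and ignores any resnet18() arguments, returning a trivial module.
    return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 1))

def test_trainer(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(torchvision.models, "resnet18", _tiny_resnet18)
    # ...then construct RegressionTask as usual; its constructor now builds
    # the tiny module instead of a real ResNet18.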

@adamjstewart (Collaborator)

Tested again with the latest commit and latest main:

main

$ /usr/bin/time -v pytest
User time (seconds): 735.98
System time (seconds): 289.61
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:19.87
Maximum resident set size (kbytes): 8144368

tests/fix_memory

$ /usr/bin/time -v pytest
User time (seconds): 620.70
System time (seconds): 210.79
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:09.44
Maximum resident set size (kbytes): 7420892

So that's a 15% (10 s) reduction in wall clock time and a 10% (700 MB) reduction in total RAM. This is definitely an improvement, but it's unclear if this is enough to avoid all OOM errors on Windows. I've still seen at least one OOM Windows error on main even after the reduction in OSCD pad size.

@calebrob6 (Member, Author)

Collection of comments:

  • From here: Windows and Linux CI machines have 7 GB of RAM
  • @adamjstewart do you know how I can monkeypatch timm for the entirety of "test_classification.py"? (or at a test class level?)
  • The RAM usage should improve once we get rid of the other trainer tests
  • I'll poke around some more today to check whether any huge test files are lying around and to measure the RAM usage of each part of the tests

@adamjstewart (Collaborator)

@adamjstewart do you know how I can monkeypatch timm for the entirety of "test_classification.py"? (or at a test class level?)

You would probably want to create a fixture in conftest.py. This file can be placed at any level in the repo (tests, tests/trainers, etc.) and defines fixtures available in all tests in the same directory and subdirectories.

@calebrob6 (Member, Author)

I don't think I want to make a fixture. I want it so that when the code in ClassificationTask calls timm.create_model, my code gets called instead. Currently I do that within a single test via monkeypatch.setattr(...)  # type: ignore[attr-defined], but I would need to repeat that line of code in every test.

@adamjstewart (Collaborator)

Fixtures don't need to have a return value, they can monkeypatch a section of code and then yield, allowing test functions to use that monkeypatched library.

@calebrob6 (Member, Author)

Oh cool, so it would be something like:

def monkeypatch_timm() -> None:
  # do the patching
  yield

Then I can add that as a parameter to any test I want to use monkeypatched timm in?

@adamjstewart (Collaborator)

Yep, plus @pytest.fixture of course. Might need to play around with the scope.
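
For the record, a minimal conftest.py sketch of what that fixture could look like (the TinyModel stand-in and the fixture name here are hypothetical, not the exact code this PR ends up with):

from typing import Any, Generator

import pytest
import timm
import torch.nn as nn

class TinyModel(nn.Module):
    # Hypothetical stand-in for whatever timm.create_model would return.
    def __init__(self, in_chans: int = 3, num_classes: int = 10) -> None:
        super().__init__()
        self.conv = nn.Conv2d(in_chans, 1, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(1, num_classes)

    def forward(self, x: Any) -> Any:
        return self.fc(self.pool(self.conv(x)).flatten(1))

@pytest.fixture
def mocked_timm(monkeypatch: pytest.MonkeyPatch) -> Generator[None, None, None]:
    def fake_create_model(model_name: str, *args: Any, **kwargs: Any) -> nn.Module:
        return TinyModel(
            in_chans=kwargs.get("in_chans", 3),
            num_classes=kwargs.get("num_classes", 10),
        )

    monkeypatch.setattr(timm, "create_model", fake_create_model)
    yield

Any test that lists mocked_timm as a parameter then gets the tiny model whenever ClassificationTask calls timm.create_model; widening the scope beyond the default function scope would need some care, since the built-in monkeypatch fixture is itself function-scoped.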

@adamjstewart (Collaborator)

Can you rebase now that dataset-specific trainers (and tests) are gone? That should hopefully show further reduction in memory usage.

@calebrob6 (Member, Author)

calebrob6 commented Jan 1, 2022

Now, after the rebase:

main

Command being timed: "pytest tests/trainers/"
User time (seconds): 360.17
System time (seconds): 184.81
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.12
Maximum resident set size (kbytes): 6773776

tests/fix_memory

Command being timed: "pytest tests/trainers/"
User time (seconds): 263.35
System time (seconds): 130.11
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:23.80
Maximum resident set size (kbytes): 4314044

@adamjstewart adamjstewart added this to the 0.2.1 milestone Jan 1, 2022
@adamjstewart adamjstewart changed the base branch from main to releases/v0.2 January 1, 2022 22:57
@adamjstewart (Collaborator) left a review comment:

Was going to wait until after 0.2.0, but given that those tests are consistently failing, I think we need this now. Changed the base to releases/v0.2 so that these can be merged into that PR branch.

@adamjstewart adamjstewart merged commit 4d9a729 into releases/v0.2 Jan 1, 2022
@adamjstewart adamjstewart deleted the tests/fix_memory branch January 1, 2022 22:59
@adamjstewart adamjstewart modified the milestones: 0.2.1, 0.2.0 Jan 1, 2022
isaaccorley added a commit that referenced this pull request Jan 2, 2022
* 0.2.0 release

* Fix notebooks

* Fix minimal dependency tests

* Fix integration tests

* Fix integration tests

* Try to avoid running GitHub Actions twice on release PRs

* Revert "Try to avoid running GitHub Actions twice on release PRs"

This reverts commit a1ac7ab.

* GeoDatasets use intersection, not addition

* Adding stack_samples to benchmarks

* Fix zero division error in SEN12MS tests

* Replaces test models with dummy models (#344)

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

* lc values must be < num_classes

* updated indices tutorial with latest indices

Co-authored-by: Caleb Robinson <calebrob6@gmail.com>
Co-authored-by: isaaccorley <22203655+isaaccorley@users.noreply.github.com>
@adamjstewart adamjstewart added utilities Utilities for working with geospatial data and removed utilities Utilities for working with geospatial data labels Jan 2, 2022
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
(same commit message as the Jan 2, 2022 commit above, referencing the PR as microsoft#344)