
Test fewer models in trainers to avoid exceeding RAM #1377

Merged
merged 5 commits into main from reduce_byol_tests on May 29, 2023

Conversation

calebrob6
Member

We don't need to test the BYOL trainer with every set of pretrained weights (we don't actually need to involve pretrained weights at all).

@github-actions github-actions bot added the testing Continuous integration testing label May 28, 2023
@adamjstewart
Collaborator

From #1376:

We might be able to only load 1 ResNet, but we'll still have to load all of them on the release branch, so we can't just avoid the problem.

We want to make sure our weights actually load in a model correctly. All of them. Every time someone adds a new one.

@calebrob6
Member Author

calebrob6 commented May 28, 2023

We certainly don't need to check the cross product of pretrained weights with every trainer, as that means a lot of duplicate work (e.g. we have several different sets of ResNet50 weights). We need to check that the pretrained weights are valid, and separately check that our trainers work with ResNets, ViTs, etc.
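A back-of-the-envelope sketch of the duplication argument, with made-up counts for illustration only: testing every (trainer, weight) pair grows multiplicatively, while testing weights once and trainers per architecture grows much more slowly.

```python
# Hypothetical counts, for illustration only.
trainers = 4       # e.g. BYOL, classification, regression, segmentation
weights = 10       # pretrained weight sets (several per architecture)
architectures = 3  # e.g. resnet18, resnet50, vit

cross_product = trainers * weights             # every pair: 40 model loads
separate = weights + trainers * architectures  # 10 + 12 = 22 model loads
assert separate < cross_product
```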

We want to make sure our weights actually load in a model correctly.

Right, but the BYOL trainer is not the right place for this. We can check that the weights load in the model correctly with:

from torchgeo.models import get_model, get_model_weights, list_models

# Instantiate every registered model with each of its pretrained weights.
for model_name in list_models():
    for weights in get_model_weights(model_name):
        model = get_model(model_name, weights=weights)

@calebrob6
Member Author

However, I would also say we should switch to strict=True in the model factory methods -- https://github.com/microsoft/torchgeo/blob/main/torchgeo/models/resnet.py#L223
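For context, here is a simplified pure-Python stand-in (not the real `torch.nn.Module.load_state_dict` implementation) showing why `strict=True` matters: mismatched keys raise an error instead of being silently ignored.

```python
# Simplified stand-in for torch.nn.Module.load_state_dict; illustration only.
def load_state_dict(model_keys, state_dict, strict=True):
    missing = [k for k in model_keys if k not in state_dict]
    unexpected = [k for k in state_dict if k not in model_keys]
    if strict and (missing or unexpected):
        raise RuntimeError(f"missing keys: {missing}, unexpected keys: {unexpected}")
    return missing, unexpected

model_keys = ["conv1.weight", "fc.weight", "fc.bias"]
ckpt = {"conv1.weight": ..., "fc.weight": ...}  # "fc.bias" is absent

# strict=False only reports the mismatch; strict=True fails loudly.
assert load_state_dict(model_keys, ckpt, strict=False) == (["fc.bias"], [])
try:
    load_state_dict(model_keys, ckpt, strict=True)
except RuntimeError:
    pass  # a broken weight file surfaces as an error instead of passing silently
```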

@adamjstewart
Collaborator

Can you make the same change to the other trainers?

@adamjstewart adamjstewart added this to the 0.4.2 milestone May 29, 2023
@adamjstewart adamjstewart changed the title Alternative fix for test RAM blowing up Test fewer models in trainers to avoid exceeding RAM May 29, 2023
@adamjstewart
Collaborator

Not only does this decrease memory usage, it also shaves off 1/3 of the time our tests take to run!

It's unclear whether this is a real solution or whether we're just kicking the can down the road. We could find that if we double the number of models, the model tests start to fail instead of the trainer tests. But I guess we'll find out when we add all of our Landsat weights.

@adamjstewart adamjstewart merged commit 108c94b into main May 29, 2023
18 checks passed
@adamjstewart adamjstewart deleted the reduce_byol_tests branch May 29, 2023 16:28
@isaaccorley
Collaborator

If we are using dummy models, does it even make sense to test that all the model weights work in a trainer other than just testing that the weights load properly into the specified backbone?

@adamjstewart
Collaborator

At a bare minimum, we need these tests for test coverage. But we also want to make sure that enums, strings, and paths all work correctly.
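A hypothetical sketch (names invented; not torchgeo's actual resolution logic) of what "enums, strings, and paths all work" means for a trainer's `weights` argument: the same option can arrive as an enum member, the enum's string name, or a checkpoint path, and should be normalized internally.

```python
from enum import Enum
from pathlib import Path

class ResNet18_Weights(Enum):  # stand-in for a torchgeo weights enum
    SENTINEL2_ALL_MOCO = "sentinel2_all_moco"

def resolve_weights(weights):
    """Normalize a user-supplied weights argument (illustrative helper)."""
    if isinstance(weights, ResNet18_Weights):
        return weights
    if isinstance(weights, str) and weights in ResNet18_Weights.__members__:
        return ResNet18_Weights[weights]
    return Path(weights)  # treat anything else as a checkpoint path

assert resolve_weights(ResNet18_Weights.SENTINEL2_ALL_MOCO) is ResNet18_Weights.SENTINEL2_ALL_MOCO
assert resolve_weights("SENTINEL2_ALL_MOCO") is ResNet18_Weights.SENTINEL2_ALL_MOCO
assert resolve_weights("/tmp/ckpt.pth") == Path("/tmp/ckpt.pth")
```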

@adamjstewart adamjstewart modified the milestones: 0.4.2, 0.5.0 Sep 28, 2023