Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCMerced: fix image shape bug #1238

Merged
merged 30 commits into from
Apr 16, 2023

Conversation

isaaccorley
Copy link
Collaborator

Previously we performed some preprocessing transforms for each individual sample. In #992 we changed this to perform all transforms on a batch.

However, some of the UCMerced images vary in size, e.g. 247, 254, 248. This causes an issue when loading batches of the imagery because they aren't the same shape and aren't being resized. This PR adds a default resize transform so that if a user doesn't specify the transform as a kwarg, the dataset will still work with a dataloader.

@isaaccorley isaaccorley self-assigned this Apr 12, 2023
@github-actions github-actions bot added datamodules PyTorch Lightning datamodules testing Continuous integration testing labels Apr 12, 2023
@calebrob6
Copy link
Member

literally just ran into this

However, some of the UCMerced images vary in size, e.g. 247, 254, 248.
253

Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the fake data for UC Merced to reflect the fact that the image size isn't consistent? This will let us test things properly.

tests/datamodules/test_ucmerced.py Outdated Show resolved Hide resolved
@adamjstewart adamjstewart added this to the 0.4.2 milestone Apr 12, 2023
@github-actions github-actions bot removed the testing Continuous integration testing label Apr 13, 2023
@isaaccorley
Copy link
Collaborator Author

Something seems to be wrong with the tests because the test data does vary in size already

agricultural02.tif (247, 247, 3)
agricultural00.tif (256, 256, 3)
agricultural01.tif (249, 256, 3)

@isaaccorley
Copy link
Collaborator Author

Ok I figured it out. It's because in the datamodule inside the test config we are using a batch size = 1, so it doesn't have to combine 2 images of different shapes into a batch unless batch_size > 2

@github-actions github-actions bot added the testing Continuous integration testing label Apr 13, 2023
@adamjstewart
Copy link
Collaborator

Is it worth resizing in the dataset instead of the datamodule?

@isaaccorley
Copy link
Collaborator Author

isaaccorley commented Apr 13, 2023

Is it worth resizing in the dataset instead of the datamodule?

Good point. I think keeping this in the datamodule is better because it doesn't make any assumptions about how the user wants to modify the dataset.

@adamjstewart
Copy link
Collaborator

I guess the benefit of doing it in the dataset is that users who use the dataset without a datamodule won't also have to do this. Are all the images approximately the same size? Looking at our existing code, we have 2 datasets (cyclone, seco) and 0 datamodules that do resizing, unless I miscounted.

@isaaccorley
Copy link
Collaborator Author

Yes they are all within a few pixels from 256x256. I'm good either way so I'll leave it up to you, just let me know so I can implement it.

@calebrob6
Copy link
Member

I like it in the dataset so users don't have to fool around with it. Without it someone (me) is going to do:

ds = UCMerced(...)
dl = DataLoader(ds, ...)

for batch in dl:
    ...

then get the error about size mismatch, groan, go lookup how to do resize and pass that as a transform, and continue with whatever they were doing.

@github-actions github-actions bot added the datasets Geospatial or benchmark datasets label Apr 14, 2023
@isaaccorley isaaccorley reopened this Apr 15, 2023
@isaaccorley
Copy link
Collaborator Author

Something wrong with the pyvista docs

@isaaccorley isaaccorley reopened this Apr 15, 2023
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to rebase now that we are using Python 3.9+ type hints

torchgeo/datasets/ucmerced.py Outdated Show resolved Hide resolved
@adamjstewart adamjstewart enabled auto-merge (squash) April 16, 2023 23:45
@adamjstewart adamjstewart merged commit 83a978a into microsoft:main Apr 16, 2023
17 checks passed
@isaaccorley isaaccorley deleted the datasets/ucmerced-size-bug branch April 22, 2023 21:46
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* fix resize bug in ucmerced

* remove unused test

* update tests

* move resize to dataset

* fix resize bug in ucmerced

* remove unused test

* update tests

* remove changes to datamodule

* Update torchgeo/datasets/ucmerced.py

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

* fix docstring

* fix resize bug in ucmerced

* remove unused test

* update tests

* move resize to dataset

* fix resize bug in ucmerced

* remove unused test

* update tests

* remove changes to datamodule

* Update torchgeo/datasets/ucmerced.py

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

* fix docstring

* update docstring

* update docstring x3

* remove Dict

* remove Tuple

* Fix base class docs

* Grammar fix

---------

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
@adamjstewart adamjstewart modified the milestones: 0.4.2, 0.5.0 Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants