Vaihingen datamodule #851

Closed
nilsleh opened this issue Oct 15, 2022 · 6 comments · Fixed by #853
Labels
datamodules PyTorch Lightning datamodules

@nilsleh
Collaborator

nilsleh commented Oct 15, 2022

Description

I would expect that with an existing Vaihingen datamodule, I only need to define a segmentation task and a pl.Trainer to train a model on this dataset (but maybe this expectation is wrong). However, the images in the Vaihingen dataset vary in size, so one cannot specify a batch_size > 1: tensors of different shapes cannot be stacked into a batch. So either the dataloaders in the datamodule should use a collate function that handles this, or the documentation should mention the limitation, because the default batch_size of the datamodule is 64.
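To make the failure and the collate-function option concrete, here is a minimal sketch in plain PyTorch. It assumes each sample is a dict with an "image" tensor of shape (C, H, W) and a "mask" tensor of shape (H, W); the function name `crop_collate` and the top-left cropping strategy are illustrative assumptions, not what torchgeo implements.

```python
import torch


def crop_collate(batch):
    """Crop every sample to the smallest height/width in the batch, then stack.

    Assumes each sample is a dict with an "image" tensor of shape (C, H, W)
    and a "mask" tensor of shape (H, W); both keys are assumptions here.
    """
    min_h = min(sample["image"].shape[-2] for sample in batch)
    min_w = min(sample["image"].shape[-1] for sample in batch)
    # Keep the top-left (min_h x min_w) window of each sample.
    images = torch.stack([s["image"][..., :min_h, :min_w] for s in batch])
    masks = torch.stack([s["mask"][..., :min_h, :min_w] for s in batch])
    return {"image": images, "mask": masks}


# Two small fake samples whose spatial sizes differ.
samples = [
    {"image": torch.zeros(3, 256, 191), "mask": torch.zeros(256, 191, dtype=torch.long)},
    {"image": torch.zeros(3, 128, 233), "mask": torch.zeros(128, 233, dtype=torch.long)},
]

# Plain stacking fails because the shapes differ.
try:
    torch.stack([s["image"] for s in samples])
    stack_failed = False
except RuntimeError:
    stack_failed = True

batch = crop_collate(samples)
```

A function like this could be passed to a `torch.utils.data.DataLoader` through its `collate_fn` argument. Cropping to the batch minimum discards data, so it is only one of several possible strategies.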

Steps to reproduce

from torchgeo.datamodules import Vaihingen2DDataModule
from torchgeo.trainers import SemanticSegmentationTask
import pytorch_lightning as pl

datamodule = Vaihingen2DDataModule(root="./data/Vaihingen")

task = SemanticSegmentationTask(
    segmentation_model="unet",
    encoder_name="resnet18",
    encoder_weights="imagenet",
    in_channels=3,
    num_classes=6,
    loss="jaccard",
    ignore_index=None,
    learning_rate=0.001,
    learning_rate_schedule_patience=5
)

trainer = pl.Trainer(
    fast_dev_run=True,
    enable_progress_bar=False
)

trainer.fit(
    model=task,
    datamodule=datamodule
)

Version

0.4.0.dev0

@adamjstewart
Collaborator

My vote is for data augmentation that pads or crops to a consistent size. How much do image sizes vary? A lot of other datasets have image sizes that vary by ± 1 px, so those are much easier to take care of.
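As a rough illustration of the padding option, the sketch below pads an image and its mask on the bottom and right edges up to a fixed target size using `torch.nn.functional.pad`. The helper name `pad_to_size` and the `mask_fill` parameter are hypothetical, and the choice of fill value for the mask is an assumption.

```python
import torch
import torch.nn.functional as F


def pad_to_size(image, mask, size, mask_fill=0):
    """Pad image (C, H, W) and mask (H, W) on the bottom/right up to `size`."""
    target_h, target_w = size
    pad_h = target_h - image.shape[-2]
    pad_w = target_w - image.shape[-1]
    if pad_h < 0 or pad_w < 0:
        raise ValueError("target size is smaller than the input")
    # F.pad takes (left, right, top, bottom) for the last two dimensions.
    padding = (0, pad_w, 0, pad_h)
    image = F.pad(image, padding, mode="constant", value=0.0)
    mask = F.pad(mask, padding, mode="constant", value=mask_fill)
    return image, mask


image = torch.ones(3, 5, 7)
mask = torch.ones(5, 7, dtype=torch.long)
padded_image, padded_mask = pad_to_size(image, mask, (8, 8))
```

Padded mask pixels receive a real class index here, so in practice the fill value would need to match whatever the loss is configured to ignore.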

@adamjstewart adamjstewart added the datamodules PyTorch Lightning datamodules label Oct 15, 2022
@calebrob6
Member

There are 16 samples in the training dataset and they are more like "tiles" or "scenes". I think the datamodule should randomly sample fixed size crops from them.

The sizes:

torch.Size([3, 2569, 1919])
torch.Size([3, 2566, 1893])
torch.Size([3, 2558, 2818])
torch.Size([3, 2565, 1919])
torch.Size([3, 1281, 2336])
torch.Size([3, 2546, 1903])
torch.Size([3, 2546, 1903])
torch.Size([3, 1783, 2995])
torch.Size([3, 2567, 1917])
torch.Size([3, 3007, 2006])
torch.Size([3, 2563, 1934])
torch.Size([3, 2555, 1980])
torch.Size([3, 2555, 1388])
torch.Size([3, 1995, 1996])
torch.Size([3, 2557, 1887])
torch.Size([3, 2557, 1887])
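
The smallest height in this list is 1281 and the smallest width is 1388, so any square crop up to 1281 px fits inside every tile. A minimal sketch of random fixed-size cropping follows; the helper name `random_crop` and the crop size of 512 are illustrative assumptions, and the tile is random stand-in data.

```python
import torch


def random_crop(image, mask, size, generator=None):
    """Cut the same random (size x size) window out of an image and its mask."""
    height, width = image.shape[-2:]
    if height < size or width < size:
        raise ValueError("crop size exceeds the tile size")
    # torch.randint's upper bound is exclusive, hence the + 1.
    top = int(torch.randint(0, height - size + 1, (1,), generator=generator))
    left = int(torch.randint(0, width - size + 1, (1,), generator=generator))
    image_crop = image[..., top : top + size, left : left + size]
    mask_crop = mask[..., top : top + size, left : left + size]
    return image_crop, mask_crop


# Stand-in tile with the shape of one of the listed samples (1281 x 2336).
tile = torch.rand(3, 1281, 2336)
labels = torch.randint(0, 6, (1281, 2336))
crops = [random_crop(tile, labels, size=512) for _ in range(4)]
images = torch.stack([c[0] for c in crops])
masks = torch.stack([c[1] for c in crops])
```

Because every crop has the same shape, the crops stack into a batch without a custom collate function.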

@adamjstewart
Collaborator

In that case, we should convert Vaihingen2D from a NonGeoDataset to a GeoDataset and use a GeoSampler like we do in NAIPChesapeakeDataModule.

@calebrob6
Member

The images aren't georeferenced.

@adamjstewart
Collaborator

Guess we can do something like this then: https://kornia-tutorials.readthedocs.io/en/latest/geometry_generate_patch.html
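
For readers without kornia at hand, the same idea of cutting a tile into a regular grid of patches can be sketched with `Tensor.unfold` in plain PyTorch. This is a stand-in for the linked tutorial, not kornia's API, and the helper name `extract_patches` is hypothetical.

```python
import torch


def extract_patches(image, size, stride):
    """Split a (C, H, W) tensor into a (N, C, size, size) stack of patches."""
    channels = image.shape[0]
    # After two unfolds the shape is (C, n_rows, n_cols, size, size).
    patches = image.unfold(1, size, stride).unfold(2, size, stride)
    # Move the grid dimensions to the front, then flatten them into N.
    patches = patches.permute(1, 2, 0, 3, 4)
    return patches.reshape(-1, channels, size, size)


# Tiny example tile: 3 channels, 6 rows, 8 columns.
tile = torch.arange(3 * 6 * 8, dtype=torch.float32).reshape(3, 6, 8)
patches = extract_patches(tile, size=2, stride=2)
```

Rows and columns that do not fill a complete window are dropped by `unfold`, so tiles whose sides are not a multiple of the stride lose a strip at the bottom and right edges unless they are padded first.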

@isaaccorley
Collaborator

OSCDDataModule is a good reference. It also has variable sized images and we take random crops during training.
