Error training model - band regex? #466

Closed
graceebc9 opened this issue Mar 16, 2022 · 5 comments

Labels: datasets (Geospatial or benchmark datasets)

Comments

graceebc9 commented Mar 16, 2022

Hello!

I've got a custom datamodule for Landcover / MODIS / Sentinel data. The datamodule works fine when called directly: sampling with a dataloader, I can plot the one mask and four channels.

The issue comes when I try to run this datamodule with a binary semantic segmentation task: the raster dataset in geo.py fails on the bands with 'no such group'. I've looked at the source code in datasets/geo.py, but I'm not clear how to solve it.

It seems to be some kind of issue with the band and the regex: the date appears to match fine, but the band group seems to fail. However, I copied the form of the band regex from the torchgeo Sentinel2 class.

from torchgeo.datasets import Sentinel2

class Sentinel2(Sentinel2):
    filename_glob = '*B03.tif'
    filename_regex = '^(?P<date>\d{6})\S{4}(?P<band>B[018][\dA]).tif$'
    date_format = '%Y%m'
    all_bands = ['B03', 'B08', 'B11']

def main():

    datamodule = MODISJDLandcoverSimpleDataModule(
        modis_root_dir="MODIS/",
        landcover_root_dir="landcover/Classified/",
        sentinel_root_dir="sentinel/",
        patch_size=250,
        batch_size=10,
        length=10,
        num_workers=0,
        one_hot_encode=False,
        balance_samples=False,
        burn_prop=0,
        grid_sampler=False,
        units=Units.PIXELS,
    )

    # ignore_zeros=True corresponds to ignoring the background class
    # in metrics evaluation
    model = BinarySemanticSegmentationTask(
        segmentation_model="unet",
        encoder_name="resnet18",
        encoder_weights=None, #"imagenet",
        in_channels=4,
        num_filters=64,
        num_classes=2,
        loss="jaccard",
        # tversky_alpha=0.7,
        # tversky_beta=0.3,
        # tversky_gamma=1.0,
        learning_rate=0.1,
        ignore_zeros=False,
        learning_rate_schedule_patience=5,
    )

    trainer = Trainer(gpus=1, fast_dev_run=True)


    # this is used when automatically finding the learning rate
    trainer.tune(
        model, datamodule
    )  
    trainer.fit(model, datamodule)


if __name__ == "__main__":

    # set random seed for reproducibility
    pl.seed_everything(0)

    # TRAIN
    main()
Global seed set to 0
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 1 batch(es).
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type             | Params
---------------------------------------------------
0 | model         | Unet             | 14.3 M
1 | loss          | JaccardLoss      | 0     
2 | train_metrics | MetricCollection | 0     
3 | val_metrics   | MetricCollection | 0     
4 | test_metrics  | MetricCollection | 0     
---------------------------------------------------
14.3 M    Trainable params
0         Non-trainable params
14.3 M    Total params
57.325    Total estimated model params size (MB)
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/data_loading.py:433: UserWarning: The number of training samples (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"
Epoch 0: 0% 0/2 [00:00<?, ?it/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-33-74529b3a24c6> in <module>()
     72 
     73     # TRAIN
---> 74     main()

26 frames
/usr/local/lib/python3.7/dist-packages/torchgeo/datasets/geo.py in __getitem__(self, query)
    415                     if match:
    416                         if "date" in match.groupdict():
--> 417                             start = match.start("band")
    418                             end = match.end("band")
    419                             filename = filename[:start] + band + filename[end:]

IndexError: no such group
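
For context, the frame above splices each requested band into the span matched by the named group "band" in the filename. A simplified sketch of that logic, with a made-up pattern and filename for illustration (this is a rough reading of the traceback, not torchgeo's actual code):

import re

# Hypothetical pattern and filename, only to illustrate the substitution
# performed in geo.py's __getitem__ above.
filename_regex = r"^(?P<date>\d{6})_(?P<band>B\d{2})\.tif$"
filename = "201808_B03.tif"

match = re.match(filename_regex, filename)
if match and "date" in match.groupdict():
    # geo.py then assumes a "band" group as well; if the compiled pattern has
    # no (?P<band>...) group, match.start("band") raises IndexError: no such group
    start, end = match.start("band"), match.end("band")
    for band in ["B03", "B08", "B11"]:
        # splice each band into the matched span of the original filename
        print(filename[:start] + band + filename[end:])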

graceebc9 (Author)

The format of the file names is '201808_21_B03.tif'.

adamjstewart (Collaborator)

This one took me a long time to figure out. You need to use a raw string (filename_regex = r'...') or replace all \ with \\.
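
For example, a minimal sketch of that fix applied to the snippet above (the subclass is renamed here to avoid shadowing the imported Sentinel2; the pattern itself is unchanged apart from the raw-string prefix):

import re

from torchgeo.datasets import Sentinel2

class MySentinel2(Sentinel2):
    filename_glob = "*B03.tif"
    # raw string, so \d and \S reach the regex engine unchanged
    filename_regex = r"^(?P<date>\d{6})\S{4}(?P<band>B[018][\dA]).tif$"
    date_format = "%Y%m"
    all_bands = ["B03", "B08", "B11"]

# quick check against the reported filename format
m = re.match(MySentinel2.filename_regex, "201808_21_B03.tif")
print(m.group("date"), m.group("band"))  # 201808 B03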

graceebc9 commented Mar 16, 2022

Thank you very much! That solved it, but now I'm getting this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-62-d2a0032e8c00> in <module>()
     70 
     71     # TRAIN
---> 72     main()

39 frames
/usr/local/lib/python3.7/dist-packages/segmentation_models_pytorch/unet/decoder.py in forward(self, x, skip)
     36         x = F.interpolate(x, scale_factor=2, mode="nearest")
     37         if skip is not None:
---> 38             x = torch.cat([x, skip], dim=1)
     39             x = self.attention1(x)
     40         x = self.conv1(x)

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 63 for tensor number 1 in the list.

adamjstewart (Collaborator)

This seems related to some previously reported bugs. Basically, the UNet that comes with SMP requires images with a patch_size divisible by 32. Can you try switching from 250 to 256 and see if that solves your issue?
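
A rough way to see why 250 trips this up (my own sketch, not from the thread; it assumes each of the encoder's five downsampling stages roughly halves the spatial size with ceil rounding, which matches the stride-2 layers in the ResNet-18 encoder):

def stage_sizes(patch_size, stages=5):
    # spatial size after each halving stage of the encoder
    sizes = [patch_size]
    for _ in range(stages):
        sizes.append((sizes[-1] + 1) // 2)  # ceil division by 2
    return sizes

print(stage_sizes(250))  # [250, 125, 63, 32, 16, 8]
print(stage_sizes(256))  # [256, 128, 64, 32, 16, 8]

With 250, the decoder upsamples 32 -> 64 while the corresponding skip connection is 63 wide, which is exactly the "Expected size 64 but got size 63" mismatch above; with 256 every stage doubles back cleanly, so setting patch_size=256 in the datamodule avoids it.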

graceebc9 (Author)

Yes, that worked, thanks so much!
