Overhaul BoundingBox and ZipDataset classes #144

calebrob6 · 2021-09-18T05:31:15Z

This PR completely overhauls the BoundingBox and ZipDataset classes. Notable enhancements include:

Closes #77
Closes #86
Closes #135
Closes #149
Closes #260
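
For readers skimming the thread, here is a rough sketch of the kind of set arithmetic the commits in this PR describe for BoundingBox (`&`, `|`, and `in`). The `Box` class below is hypothetical and 2D only; it is not torchgeo's actual BoundingBox, just an illustration of how such operators can be defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical 2D box used for illustration; not torchgeo's BoundingBox."""

    minx: float
    maxx: float
    miny: float
    maxy: float

    def __contains__(self, other: "Box") -> bool:
        # True if `other` lies entirely inside this box
        return (
            self.minx <= other.minx
            and other.maxx <= self.maxx
            and self.miny <= other.miny
            and other.maxy <= self.maxy
        )

    def __or__(self, other: "Box") -> "Box":
        # Union: the smallest box that covers both operands
        return Box(
            min(self.minx, other.minx),
            max(self.maxx, other.maxx),
            min(self.miny, other.miny),
            max(self.maxy, other.maxy),
        )

    def __and__(self, other: "Box") -> "Box":
        # Intersection: the region shared by both operands
        minx = max(self.minx, other.minx)
        maxx = min(self.maxx, other.maxx)
        miny = max(self.miny, other.miny)
        maxy = min(self.maxy, other.maxy)
        if minx > maxx or miny > maxy:
            raise ValueError("Boxes do not overlap")
        return Box(minx, maxx, miny, maxy)


a = Box(0, 10, 0, 10)
b = Box(5, 15, 5, 15)
overlap = a & b  # region covered by both a and b
cover = a | b  # smallest box covering a and b
```

Raising on an empty intersection is one possible design; returning an empty or sentinel box would be another.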

adamjstewart · 2021-11-30T17:40:35Z

> This will break benchmark.py right?

benchmark.py has now been updated to use & instead of + and to use stack_samples instead of the default collation function.
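
The thread does not show what `stack_samples` does internally, so the following is only a rough pure-Python sketch of the general idea: a collation function that groups sample values by key and tolerates samples whose keys differ (as one of the commits in this PR describes). `collate_by_key` is a hypothetical name, and a real implementation would presumably stack tensors rather than build lists.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def collate_by_key(samples: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: group the values of each sample by key.

    Keys that are missing from some samples are simply collected from
    the samples that do have them.
    """
    batch: Dict[str, List[Any]] = defaultdict(list)
    for sample in samples:
        for key, value in sample.items():
            batch[key].append(value)
    return dict(batch)


# Two toy samples; the second one has an extra "mask" key
samples = [
    {"image": [1, 2], "bbox": (0, 10, 0, 10)},
    {"image": [3, 4], "mask": [0, 1], "bbox": (5, 15, 5, 15)},
]
batch = collate_by_key(samples)
```

A function like this would be passed as the `collate_fn` argument of a PyTorch `DataLoader` in place of the default collation.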

adamjstewart · 2021-12-01T03:36:22Z

> I think it is worth re-running the paper experiment and comparing times.

Done. As usual, the results vary quite a bit and are so random as to be completely useless.

Before (main)

$ ./benchmark.py --landsat-root /datadrive/adam/landsat/original --cdl-root /datadrive/adam/cdl/original -n 8 -v
Global seed set to 0

RandomGeoSampler:
  duration: 23.199 sec
  count: 128 patches
  rate: 5.518 patches/sec

GridGeoSampler:
  duration: 17.035 sec
  count: 128 patches
  rate: 7.514 patches/sec

RandomBatchGeoSampler:
  duration: 47.418 sec
  count: 128 patches
  rate: 2.699 patches/sec

ResNet-34:
  duration: 4.040 sec
  count: 128 patches
  rate: 31.684 patches/sec
$ ./benchmark.py --landsat-root /datadrive/adam/landsat/warped --cdl-root /datadrive/adam/cdl/warped -c -n 8 -v
Global seed set to 0

RandomGeoSampler:
  duration: 112.901 sec
  count: 128 patches
  rate: 1.134 patches/sec
CacheInfo(hits=253, misses=834, maxsize=128, currsize=128)

GridGeoSampler:
  duration: 2.377 sec
  count: 128 patches
  rate: 53.855 patches/sec
CacheInfo(hits=1016, misses=8, maxsize=128, currsize=8)

RandomBatchGeoSampler:
  duration: 28.301 sec
  count: 128 patches
  rate: 4.523 patches/sec
CacheInfo(hits=967, misses=57, maxsize=128, currsize=57)

ResNet-34:
  duration: 1.999 sec
  count: 128 patches
  rate: 64.037 patches/sec

After (feature/zipdatasets)

$ ./benchmark.py --landsat-root /datadrive/adam/landsat/original --cdl-root /datadrive/adam/cdl/original -n 8 -v
Global seed set to 0

RandomGeoSampler:
  duration: 19.806 sec
  count: 128 patches
  rate: 6.463 patches/sec

GridGeoSampler:
  duration: 15.489 sec
  count: 128 patches
  rate: 8.264 patches/sec

RandomBatchGeoSampler:
  duration: 18.744 sec
  count: 128 patches
  rate: 6.829 patches/sec

ResNet-34:
  duration: 1.996 sec
  count: 128 patches
  rate: 64.137 patches/sec
$ ./benchmark.py --landsat-root /datadrive/adam/landsat/warped --cdl-root /datadrive/adam/cdl/warped -c -n 8 -v
Global seed set to 0

RandomGeoSampler:
  duration: 62.110 sec
  count: 128 patches
  rate: 2.061 patches/sec
CacheInfo(hits=253, misses=834, maxsize=128, currsize=128)

GridGeoSampler:
  duration: 1.161 sec
  count: 128 patches
  rate: 110.212 patches/sec
CacheInfo(hits=1016, misses=8, maxsize=128, currsize=8)

RandomBatchGeoSampler:
  duration: 5.916 sec
  count: 128 patches
  rate: 21.636 patches/sec
CacheInfo(hits=967, misses=57, maxsize=128, currsize=57)

ResNet-34:
  duration: 2.228 sec
  count: 128 patches
  rate: 57.454 patches/sec
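
Since single runs like the ones above vary a lot, one option is to repeat each measurement and report the mean and standard deviation instead of a single duration. The sketch below is not part of benchmark.py; `time_runs`, `summarize`, and `workload` are hypothetical names, and `workload` is only a stand-in for iterating over a sampler.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_runs(fn: Callable[[], None], repeats: int = 5) -> List[float]:
    """Run `fn` several times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the durations."""
    mean = statistics.mean(durations)
    # The sample standard deviation needs at least two data points
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stdev


def workload() -> None:
    # Stand-in for the real work, e.g. loading 128 patches from a sampler
    sum(i * i for i in range(100_000))


durations = time_runs(workload, repeats=5)
mean, stdev = summarize(durations)
print(f"mean: {mean:.3f} sec, stdev: {stdev:.3f} sec over {len(durations)} runs")
```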

adamjstewart · 2021-12-01T03:51:47Z

> I think it is probably worth having a section that explains how the geo dataset stuff is different than vanilla pytorch.

I assume you mean vanilla torchvision? Both torchvision and torchgeo are pure PyTorch; we aren't doing anything hacky here.

adamjstewart · 2021-12-01T04:42:56Z

@calebrob6 I fleshed out the README a bit more. I think I've addressed most of your comments. Ready for another round of review.

README.md

calebrob6 · 2021-12-02T07:39:40Z

> @calebrob6 I fleshed out the README a bit more. I think I've addressed most of your comments. Ready for another round of review.

I love it! Really well done -- I think it really effectively communicates why torchgeo is really cool. The only thing I'd add is pictures but we should do that later.

calebrob6 · 2021-12-02T07:43:20Z

Benchmarking...

"the results vary quite a bit and are so random as to be completely useless." This is troubling, and we'll definitely need to revisit it. We should be able to get less variance between runs by running longer; there is not anything too random going on.

calebrob6 · 2021-12-02T08:02:49Z

Just went through it again carefully and it looks good to me! I can't approve though because I originally opened this PR.

Plotting Landsat8 + CDL based on intersection dataset

calebrob6 · 2021-12-02T08:08:53Z

torchgeo/datasets/geo.py

-        except ValueError:
-            raise ValueError("Datasets have no overlap")
+        # Force dataset2 to have the same CRS/res as dataset1
+        dataset2.crs = dataset1.crs


Why is this necessary? (this is present in both Union and Intersection)

If the two datasets have a different CRS/res, we need to ensure that they have a matching CRS/res. Otherwise, the combined R-tree index would be meaningless. I added a getter/setter to GeoDataset that updates the entire index when you try to set the CRS to a different CRS.

Makes sense. Should we warn users that if they use a dataset in an Intersection/Union then it might have its properties changed? (I'm imagining throwing a warning if dataset1.crs != dataset2.crs and the same with res)

Done. I ended up using a print statement instead of a warning since warnings aren't displayed by default.
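
A minimal sketch of the getter/setter idea discussed above, assuming the CRS is a plain string. `GeoLikeDataset` is a hypothetical stand-in, not torchgeo's GeoDataset; the real setter also updates the R-tree index, which is only indicated by a comment here.

```python
class GeoLikeDataset:
    """Hypothetical stand-in for a dataset that exposes a CRS property."""

    def __init__(self, crs: str) -> None:
        self._crs = crs

    @property
    def crs(self) -> str:
        return self._crs

    @crs.setter
    def crs(self, new_crs: str) -> None:
        if new_crs == self._crs:
            return  # nothing to convert, so stay silent
        # print() is used instead of warnings.warn so the message is always shown
        print(f"Converting {type(self).__name__} CRS from {self._crs} to {new_crs}")
        # A real implementation would also reproject every index entry here
        self._crs = new_crs


dataset1 = GeoLikeDataset("EPSG:32610")
dataset2 = GeoLikeDataset("EPSG:4326")

# Force dataset2 to have the same CRS as dataset1; this prints a notice
dataset2.crs = dataset1.crs
```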

adamjstewart

Approving on behalf of @calebrob6

* Adding a UnionDataset
* Adding contains method to BoundingBox
* Finishing UnionDataset
* Add __contains__ method
* Overhaul BoundingBox, add set arithmetic
* mypy fixes
* pydocstyle fixes
* Ignore erroneous pydocstyle warnings
* rtree only supports tuples, not BoundingBoxes
* mypy fixes
* Use custom collate function to handle BoundingBoxes
* Add back support for Python 3.6
* Add tests for all new BoundingBox features
* Rename ZipDataset to IntersectionDataset
* Merge indices of IntersectionDataset, auto-convert CRS/res
* Get tests to pass
* Fix more tests
* Test more of RasterDataset/VectorDataset directly
* Increase UnionDataset test coverage
* IntersectionDataset stacks tensors, UnionDataset merges tensors
* Support collating dicts with differing keys, add tests
* Style fixes
* Samplers: compute intersection between index and ROI
* Update README with example usage
* GeoDataset addition is deprecated
* Add note about CRS/res
* More documentation for Intersection/UnionDatasets
* Use collate function in tutorial
* Don't use multiple workers
* Fix typo
* Drop support for adding GeoDatasets
* Remove unused import
* Add comment explaining coverage config settings
* Collation function needed for benchmark script
* Add more explanation to README
* Correct Landsat 8 bands
* Print warning when changing CRS/res

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>

adamjstewart added the datasets label Sep 19, 2021

adamjstewart force-pushed the feature/zipdatasets branch from 04b1257 to 38ef6f4 November 13, 2021 18:55

adamjstewart changed the title ~~Adding a UnionDataset~~ Overhaul BoundingBox and ZipDataset classes Nov 13, 2021

adamjstewart mentioned this pull request Nov 15, 2021

Adding eval script #244

Merged

adamjstewart force-pushed the feature/zipdatasets branch from 5ec58f3 to 2e85cb5 November 20, 2021 21:22

adamjstewart added this to the 0.2.0 milestone Nov 20, 2021

adamjstewart force-pushed the feature/zipdatasets branch from e595f80 to 453621e November 23, 2021 17:02

adamjstewart mentioned this pull request Nov 23, 2021

Samplers not completely respecting RoI #260

Closed

calebrob6 and others added 22 commits November 25, 2021 11:06

Adding a UnionDataset

f4f7241

Adding contains method to BoundingBox

50afb3c

Finishing UnionDataset

997c887

Add __contains__ method

bdb81a7

Overhaul BoundingBox, add set arithmetic

25217dc

mypy fixes

8ab616f

pydocstyle fixes

42fbdc5

Ignore erroneous pydocstyle warnings

bb5b065

rtree only supports tuples, not BoundingBoxes

df85408

mypy fixes

f0ce461

Use custom collate function to handle BoundingBoxes

798454e

Add back support for Python 3.6

e6dba2b

Add tests for all new BoundingBox features

6b89586

Rename ZipDataset to IntersectionDataset

b64532d

Merge indices of IntersectionDataset, auto-convert CRS/res

b82edbd

Get tests to pass

08f27e1

Fix more tests

e6e40da

Test more of RasterDataset/VectorDataset directly

e76a40f

Increase UnionDataset test coverage

6f4ff40

IntersectionDataset stacks tensors, UnionDataset merges tensors

b0591e9

Support collating dicts with differing keys, add tests

89718fe

Style fixes

fc78b4c

Remove unused import

8c76d19

Add comment explaining coverage config settings

5cccb55

Collation function needed for benchmark script

ac2aabe

Add more explanation to README

8b5d4eb

adamjstewart reviewed Dec 1, 2021

Correct Landsat 8 bands

9e6c131

calebrob6 commented Dec 2, 2021

adamjstewart previously approved these changes Dec 2, 2021

Print warning when changing CRS/res

dd3f9a7

adamjstewart dismissed their stale review via dd3f9a7 December 3, 2021 17:20

adamjstewart approved these changes Dec 3, 2021

adamjstewart merged commit 5d407b7 into main Dec 3, 2021

adamjstewart deleted the feature/zipdatasets branch December 3, 2021 22:40

adamjstewart added a commit that referenced this pull request Dec 24, 2021

Mark features added in #144 as new

a8437e9

adamjstewart mentioned this pull request Dec 24, 2021

Mark features added in #144 as new #328

Merged

adamjstewart added a commit that referenced this pull request Dec 24, 2021

Mark features added in #144 as new (#328)

20459d7

adamjstewart added the utilities, samplers, documentation, and testing labels and removed the utilities label Jan 2, 2022

yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023

Mark features added in microsoft#144 as new (microsoft#328)

29f6fd1
