-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul BoundingBox and ZipDataset classes #144
Conversation
04b1257
to
38ef6f4
Compare
5ec58f3
to
2e85cb5
Compare
e595f80
to
453621e
Compare
|
Done. As usual, the results vary quite a bit and are so random as to be completely useless. Before (main)$ ./benchmark.py --landsat-root /datadrive/adam/landsat/original --cdl-root /datadrive/adam/cdl/original -n 8 -v
Global seed set to 0
RandomGeoSampler:
duration: 23.199 sec
count: 128 patches
rate: 5.518 patches/sec
GridGeoSampler:
duration: 17.035 sec
count: 128 patches
rate: 7.514 patches/sec
RandomBatchGeoSampler:
duration: 47.418 sec
count: 128 patches
rate: 2.699 patches/sec
ResNet-34:
duration: 4.040 sec
count: 128 patches
rate: 31.684 patches/sec
$ ./benchmark.py --landsat-root /datadrive/adam/landsat/warped --cdl-root /datadrive/adam/cdl/warped -c -n 8 -v
Global seed set to 0
RandomGeoSampler:
duration: 112.901 sec
count: 128 patches
rate: 1.134 patches/sec
CacheInfo(hits=253, misses=834, maxsize=128, currsize=128)
GridGeoSampler:
duration: 2.377 sec
count: 128 patches
rate: 53.855 patches/sec
CacheInfo(hits=1016, misses=8, maxsize=128, currsize=8)
RandomBatchGeoSampler:
duration: 28.301 sec
count: 128 patches
rate: 4.523 patches/sec
CacheInfo(hits=967, misses=57, maxsize=128, currsize=57)
ResNet-34:
duration: 1.999 sec
count: 128 patches
rate: 64.037 patches/sec After (feature/zipdatasets)$ ./benchmark.py --landsat-root /datadrive/adam/landsat/original --cdl-root /datadrive/adam/cdl/original -n 8 -v
Global seed set to 0
RandomGeoSampler:
duration: 19.806 sec
count: 128 patches
rate: 6.463 patches/sec
GridGeoSampler:
duration: 15.489 sec
count: 128 patches
rate: 8.264 patches/sec
RandomBatchGeoSampler:
duration: 18.744 sec
count: 128 patches
rate: 6.829 patches/sec
ResNet-34:
duration: 1.996 sec
count: 128 patches
rate: 64.137 patches/sec
$ ./benchmark.py --landsat-root /datadrive/adam/landsat/warped --cdl-root /datadrive/adam/cdl/warped -c -n 8 -v
Global seed set to 0
RandomGeoSampler:
duration: 62.110 sec
count: 128 patches
rate: 2.061 patches/sec
CacheInfo(hits=253, misses=834, maxsize=128, currsize=128)
GridGeoSampler:
duration: 1.161 sec
count: 128 patches
rate: 110.212 patches/sec
CacheInfo(hits=1016, misses=8, maxsize=128, currsize=8)
RandomBatchGeoSampler:
duration: 5.916 sec
count: 128 patches
rate: 21.636 patches/sec
CacheInfo(hits=967, misses=57, maxsize=128, currsize=57)
ResNet-34:
duration: 2.228 sec
count: 128 patches
rate: 57.454 patches/sec |
I assume you mean vanilla torchvision? Both torchvision and torchgeo are all pure pytorch, we aren't doing anything hacky here. |
@calebrob6 I fleshed out the README a bit more. I think I've addressed most of your comments. Ready for another round of review. |
I love it! Really well done -- I think it really effectively communicates why torchgeo is really cool. The only thing I'd add is pictures but we should do that later. |
"the results vary quite a bit and are so random as to be completely useless." this is troubling and we'll definitely need to revisit. We should be able to get less variance between runs by running longer -- there is not anything too random going on. |
torchgeo/datasets/geo.py
Outdated
except ValueError: | ||
raise ValueError("Datasets have no overlap") | ||
# Force dataset2 to have the same CRS/res as dataset1 | ||
dataset2.crs = dataset1.crs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? (this is present in both Union and Intersection)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the two datasets have a different CRS/res, we need to ensure that they have a matching CRS/res. Otherwise, the combined R-tree index would be meaningless. I added a getter/setter to GeoDataset that updates the entire index when you try to set the CRS to a different CRS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Should we warn users that if they use a dataset in an Intersection/Union then it might have its properties changed? (I'm imagining throwing a warning if dataset1.crs != dataset2.crs
and the same with res)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I ended up using a print statement instead of a warning since warnings aren't displayed by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving on behalf of @calebrob6
* Adding a UnionDataset * Adding contains method to BoundingBox * Finishing UnionDataset * Add __contains__ method * Overhaul BoundingBox, add set arithmetic * mypy fixes * pydocstyle fixes * Ignore erroneous pydocstyle warnings * rtree only supports tuples, not BoundingBoxes * mypy fixes * Use custom collate function to handle BoundingBoxes * Add back support for Python 3.6 * Add tests for all new BoundingBox features * Rename ZipDataset to IntersectionDataset * Merge indices of IntersectionDataset, auto-convert CRS/res * Get tests to pass * Fix more tests * Test more of RasterDataset/VectorDataset directly * Increase UnionDataset test coverage * IntersectionDataset stacks tensors, UnionDataset merges tensors * Support collating dicts with differing keys, add tests * Style fixes * Samplers: compute intersection between index and ROI * Update README with example usage * GeoDataset addition is deprecated * Add note about CRS/res * More documentation for Intersection/UnionDatasets * Use collate function in tutorial * Don't use multiple workers * Fix typo * Drop support for adding GeoDatasets * Remove unused import * Add comment explaining coverage config settings * Collation function needed for benchmark script * Add more explanation to README * Correct Landsat 8 bands * Print warning when changing CRS/res Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
This PR completely overhauls the
BoundingBox
andZipDataset
classes. Notable enhancements include:BoundingBox
supports set arithmetic (intersection, union, contains)GeoDataset
supports set arithmetic (intersection, union)GeoDataset
no longer supports addition, as it is ambiguousZipDataset
has been replaced byIntersectionDataset
andUnionDataset
IntersectionDataset
andUnionDataset
merge the indices of their datasetsIntersectionDataset
andUnionDataset
work even if CRS/res don't matchIntersectionDataset
stacks tensors,UnionDataset
merges tensorsGeoSampler
ROI is respectedCloses #77
Closes #86
Closes #135
Closes #149
Closes #260