Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the EnviroAtlas dataset #364

Merged
merged 8 commits into from
Jan 28, 2022
Merged

Add the EnviroAtlas dataset #364

merged 8 commits into from
Jan 28, 2022

Conversation

calebrob6
Copy link
Member

Adds one of the datasets used in https://openreview.net/forum?id=AEa_UepnMDX

@calebrob6 calebrob6 marked this pull request as ready for review January 24, 2022 21:49
@github-actions github-actions bot added the datasets Geospatial or benchmark datasets label Jan 24, 2022
@adamjstewart adamjstewart marked this pull request as draft January 24, 2022 22:24
@github-actions github-actions bot added documentation Improvements or additions to documentation testing Continuous integration testing labels Jan 25, 2022
@calebrob6 calebrob6 marked this pull request as ready for review January 25, 2022 21:39
@calebrob6
Copy link
Member Author

Note my data.py is a very rough stab at something that can digest patches/tiles from the original dataset and replace them with fake data. I think a better way to do this would be to save a JSON file or something that had pairs of (filename in the dataset, the current metadata I have pasted in the script).

@calebrob6
Copy link
Member Author

image

@calebrob6
Copy link
Member Author

image

@calebrob6 calebrob6 added this to the 0.3.0 milestone Jan 25, 2022
@calebrob6 calebrob6 requested review from isaaccorley and adamjstewart and removed request for isaaccorley January 26, 2022 23:47
Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dataset has the same problems as ChesapeakeCVPR. If we find a way to improve sampling speed for RasterDataset, these datasets won't benefit. Also, I'm not sure if these datasets can be combined with any other GeoDataset, which violates a lot of the fundamental assumptions we make about GeoDataset instances and nullifies the whole point of using GeoDataset instead of VisionDataset. I'm fine with merging this as is because we already have a similarly broken dataset in TorchGeo, but we need to seriously consider how to handle these kinds of datasets in the future.

tests/data/enviroatlas/data.py Outdated Show resolved Hide resolved
torchgeo/datasets/enviroatlas.py Outdated Show resolved Hide resolved
torchgeo/datasets/enviroatlas.py Outdated Show resolved Hide resolved
torchgeo/datasets/enviroatlas.py Show resolved Hide resolved
@calebrob6
Copy link
Member Author

calebrob6 commented Jan 27, 2022

If we find a way to improve sampling speed for RasterDataset, these datasets won't benefit.

This dataset extends GeoDataset, not RasterDataset, so this much is obvious. Whether it should extend RasterDataset is a different question all together.

Also, I'm not sure if these datasets can be combined with any other GeoDataset

Why not? Does it not correctly extend GeoDataset? If there are some assumptions that GeoDataset is making then we need to enforce them.

similarly broken dataset

Do you have anything else in mind besides the above points?

@adamjstewart
Copy link
Collaborator

Why not? Does it not correctly extend GeoDataset? If there are some assumptions that GeoDataset is making then we need to enforce them.

If I'm reading things correctly, this dataset doesn't doesn't warp things into a common CRS, correct? This is incompatible with other GeoDatasets. GeoDataset makes the assumption that the index and the data returned by __getitem__ are in a single CRS. If we need to change that in GeoDataset, we can, but it requires a lot of thought (what to do when bbox in different CRS overlap).

Copy link
Collaborator

@adamjstewart adamjstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistencies between GeoDataset and ChesapeakeCVPR/EnviroAtlas and refactoring so that RasterDataset code can be shared with these datasets can be done in a follow-up PR, I vote we merge this if everything else looks good.

@calebrob6
Copy link
Member Author

The index is in a single CRS (EPSG:3857), but the problem is that __getitem__ will return imagery from the warped extent of a query bounding box in EPSG:3857. E.g. you query with a square bounding box, this bounding box will be warped into some off-axis rectangle (in the CRS of some tile in the dataset -- not EPSG:3857), then the rectangular bounding box of that will be used to get data.

The assumption that this (and ChesapeakeCVPR) breaks is that __getitem__ should return data warped to the CRS of the dataset. It would be relatively simple to change these datasets to follow that (although it would slow down sampling).

@calebrob6
Copy link
Member Author

Any thoughts on how to clean up data.py / do it differently (or do we not care too much about those)?

@calebrob6 calebrob6 merged commit d4c8a4b into main Jan 28, 2022
@calebrob6 calebrob6 deleted the dataset/enviroatlas branch January 28, 2022 03:16
@adamjstewart
Copy link
Collaborator

I'm not worried about data.py at the moment. Once we have more examples we could try to standardize things, but since it isn't installed with the library I don't want to be very strict about it.

@adamjstewart adamjstewart mentioned this pull request Jul 11, 2022
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
* Add dataset

* Add dataset to docs

* Tests for enviroatlas

* Test coverage

* Added numpy type

* Added plotting

* Code review changes

* Propagating code review comments to Chesapeake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets documentation Improvements or additions to documentation testing Continuous integration testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants