Add plot method and data.py to NAIP #407

Merged
merged 11 commits into from
Apr 9, 2022

nilsleh
Collaborator

@nilsleh nilsleh commented Feb 16, 2022

Since RasterDatasets should have their own plot method per #253, this PR adds a plot method as well as a data.py file to the NAIP dataset.

@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels Feb 16, 2022
@nilsleh nilsleh closed this Feb 17, 2022
@nilsleh nilsleh deleted the naipPlot branch February 17, 2022 14:25
@nilsleh nilsleh restored the naipPlot branch February 17, 2022 14:31
@nilsleh nilsleh reopened this Feb 17, 2022
@adamjstewart adamjstewart added this to the 0.3.0 milestone Feb 18, 2022
calebrob6
calebrob6 previously approved these changes Feb 20, 2022
@calebrob6 calebrob6 closed this Feb 20, 2022
@calebrob6 calebrob6 reopened this Feb 20, 2022
@calebrob6
Member

It looks like this is missing the data.py file, and the tests are not passing with the new test NAIP data

@calebrob6
Member

Maybe we can cut this down to 1PB with some optimizations?

image

Collaborator

@adamjstewart adamjstewart left a comment

Must be something with the transform that's causing the file size to blow up to PBs?

tests/data/naip/data.py: outdated, resolved
torchgeo/datasets/naip.py: outdated, resolved
Collaborator

@adamjstewart adamjstewart left a comment

We seem to still be allocating a ~1 PB numpy array somewhere, causing the tests to fail.

tests/datasets/test_naip.py: resolved
@adamjstewart adamjstewart mentioned this pull request Mar 16, 2022
19 tasks
@adamjstewart
Collaborator

Has anyone figured out what is going on with this giant numpy array?

@nilsleh
Collaborator Author

nilsleh commented Apr 3, 2022

Has anyone figured out what is going on with this giant numpy array?

Could it have to do with the size of the bounds that is passed to index the dataset?

@adamjstewart
Collaborator

That has to be it but I don't see anything fishy about the data. Maybe try printing the data length and dataset.bounds?
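
To make that check concrete, here is a rough sketch of how a query's bounds translate into array size. It assumes the returned array has shape (bands, height, width), with height and width equal to the extent divided by the pixel resolution. `estimate_array_bytes` and its parameters are hypothetical names for illustration, not part of torchgeo, and the large extent in the second call is made up.

```python
import math


def estimate_array_bytes(minx, maxx, miny, maxy, res, bands=4, itemsize=4):
    """Roughly estimate the memory needed for a raster covering the bounds.

    Hypothetical helper: assumes shape (bands, height, width) and a square
    pixel size `res` given in the same units as the bounds.
    """
    width = math.ceil((maxx - minx) / res)
    height = math.ceil((maxy - miny) / res)
    return bands * height * width * itemsize


# A 64 x 64 extent at 1 unit per pixel, 4 bands, 4 bytes per value
small = estimate_array_bytes(0.0, 64.0, -63.0, 1.0, res=1.0)
print(small)  # 65536

# Same pixel size, but bounds that are (hypothetically) millions of units wide
huge = estimate_array_bytes(0.0, 8e6, 0.0, 8e6, res=1.0)
print(f"{huge / 1e15:.3f} PB")
```

If the printed bounds are far larger than the test image's extent, an estimate like this would likely land in the petabyte range, which would be consistent with the failing allocation.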

@adamjstewart
Collaborator

P.S. To see the output of a print statement in a test file you'll need to use the -s flag:

$ pytest -s tests/datasets/test_geo.py::TestRasterDataset::test_getitem_single_file

@nilsleh
Collaborator Author

nilsleh commented Apr 4, 2022

That has to be it but I don't see anything fishy about the data. Maybe try printing the data length and dataset.bounds?

When running pytest on my Ubuntu machine, I get:
BoundingBox(minx=0.0, maxx=64.0, miny=-63.0, maxy=1.0, mint=1541285999.999999, maxt=1559771999.999999) torch.Size([4, 64, 64])

@adamjstewart
Collaborator

It's not the test_naip.py tests that are failing, it's the test_geo.py tests.

@nilsleh
Collaborator Author

nilsleh commented Apr 4, 2022

it's the test_geo.py tests.

If I uncomment the hardcoded crs = CRS.from_epsg(3005) in the test_geo.py file, then the array is normal-sized and the tests pass.

@adamjstewart
Collaborator

The new test failure is because our NAIP data and Chesapeake data have no overlap.

@nilsleh
Collaborator Author

nilsleh commented Apr 5, 2022

The new test failure is because our NAIP data and Chesapeake data have no overlap.

Just for my understanding: the error I am seeing is that for the IntersectionDataset of NAIP & Chesapeake, the bounding box is invalid: 'minx=1.7976931348623157e+308' > 'maxx=-1.7976931348623157e+308'. Is that what happens when two datasets have no overlap?

I am not sure how best to proceed. If we now tune the NAIP data.py to work with the Chesapeake bounds specifically, a new test might come along that also needs to overlap with NAIP but can't. What would you suggest?

@adamjstewart
Collaborator

Correct, there's a bug in rtree where an empty index actually has invalid bounds. That's the error you're seeing. If you print the length of the dataset, you'll find that it's zero.
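
A minimal pure-Python sketch of how an empty index can end up reporting inverted bounds. It assumes the bounds are accumulated starting from sentinel extremes (the largest finite float), which matches the values in the error message above. This is an illustration only, not rtree's actual implementation, and `union_bounds` is a hypothetical helper.

```python
import sys

FLOAT_MAX = sys.float_info.max  # 1.7976931348623157e+308


def union_bounds(boxes):
    """Return (minx, maxx) covering all boxes, starting from sentinel extremes."""
    minx, maxx = FLOAT_MAX, -FLOAT_MAX
    for box_minx, box_maxx in boxes:
        minx = min(minx, box_minx)
        maxx = max(maxx, box_maxx)
    return minx, maxx


print(union_bounds([(0.0, 64.0), (32.0, 96.0)]))  # (0.0, 96.0)

# With no entries the sentinels are never overwritten, so minx > maxx
empty_minx, empty_maxx = union_bounds([])
print(empty_minx > empty_maxx)  # True
```

Under that assumption, checking the dataset length before reading its bounds is the more reliable way to detect the empty case.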

I think it's fine to make sure that the bounding box for NAIP and Chesapeake match. The actual location of these datasets on earth doesn't matter, they just need to overlap. I don't envision a test being added that asserts that these particular datasets don't overlap.
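
As a sketch of that requirement, the check below tests whether two sets of bounds overlap in x, y, and time. The `Box` tuple mirrors the fields of the BoundingBox printed earlier in this thread, but it and `intersects` are stand-ins written for illustration, not torchgeo's own classes. Only the first box comes from the thread; the other two are made-up values.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Stand-in for a spatiotemporal bounding box (hypothetical, for illustration)."""

    minx: float
    maxx: float
    miny: float
    maxy: float
    mint: float
    maxt: float


def intersects(a: Box, b: Box) -> bool:
    """Return True if the boxes overlap (edges inclusive) in x, y, and t."""
    return (
        a.minx <= b.maxx
        and a.maxx >= b.minx
        and a.miny <= b.maxy
        and a.maxy >= b.miny
        and a.mint <= b.maxt
        and a.maxt >= b.mint
    )


# Bounds printed earlier in this thread
naip = Box(0.0, 64.0, -63.0, 1.0, 1541285999.999999, 1559771999.999999)
# Made-up bounds that share part of the same extent and time range
nearby = Box(32.0, 96.0, -31.0, 33.0, 1550000000.0, 1560000000.0)
# Made-up bounds located far away in x
far_away = Box(1000.0, 1064.0, -63.0, 1.0, 1550000000.0, 1560000000.0)

print(intersects(naip, nearby))    # True
print(intersects(naip, far_away))  # False
```

A check like this could be run against the bounds of the two test datasets before building their intersection, so a zero-length result is caught early.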

tests/data/naip/data.py: outdated, resolved
tests/datasets/test_geo.py: outdated, resolved
tests/datasets/test_naip.py: outdated, resolved
tests/datasets/test_geo.py: outdated, resolved
@adamjstewart adamjstewart reopened this Apr 8, 2022
@adamjstewart adamjstewart merged commit 1a35d42 into microsoft:main Apr 9, 2022
remtav pushed a commit to remtav/torchgeo that referenced this pull request May 26, 2022
* add plot method and data.py

* add version

* file typo

* forgot data.py

* add version change and larger image size

* requested changes

* test with print

* test geo

* change data to match chesapeake

* fix crs test
@adamjstewart adamjstewart mentioned this pull request Jul 11, 2022
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023
3 participants