Sampling from custom object detection dataset #454
Comments
Could be related to #399.
I saw that issue and read through it. In my case, shouldn't there be shapes that survive the filtering based on my other observations above? It certainly seems that no shapes are being returned, but I don't understand why. Any ideas on how I can test this other than the methods I tried above?
This is fixed by #467. However, I now have a question about IntersectionDataset. #467 allows me to sample from the intersection dataset without getting the error above, essentially by returning an "empty" tensor. However, if I intersect a vector and a raster dataset and sample from the result, I would expect it to sample only from patches where the two datasets intersect, which to me means every sample would contain an object from the vector dataset. Based on the behavior I am observing, and the logic in #467, this does not seem to be the case. (Maybe the intersection is not at an object level, as I would expect, but at an AOI level. I have looked at the code in [...].) Is there an existing way to constrain sampling so that it only draws from areas where I have an object in my vector dataset as well as raster data for that patch? Would this require me to subclass one of the samplers and/or a base dataset class? Thanks again for your work on #467 @adamjstewart 🙏
The internal representation of a GeoDataset is an r-tree index. Each entry in the r-tree is the bounding box of a file, whether it is a raster or vector file. When you combine two GeoDatasets into an IntersectionDataset, we compute the intersection of these bounding boxes so that we only sample from regions within bounding boxes from both datasets. Your problem probably stems from the fact that not all areas within the bounding box of the file may have shapes. For example, consider the following vector file with 3 features:
If you sample from any of the areas where there is a feature (A, B, C), you'll get what you expect. However, the bounding box of the vector file also contains region d, where there are no features. This is what is happening in your dataset and why things were previously crashing. One solution would be to store each shape as a separate entry in the r-tree index instead of storing things on a file-by-file basis. I haven't thought about this too much because we don't have a ton of VectorDatasets in TorchGeo yet, but that may work better than our current approach. This obviously wouldn't work for raster files. I would have to see how well things work when the r-tree index gets large, since some datasets like CBF have millions of shapes. In the meantime, you can also specify a [...]
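To make the difference between file-level and shape-level indexing concrete, here is a minimal sketch in plain Python. It does not use TorchGeo or a real r-tree: a dictionary scanned linearly stands in for the index, and the coordinates of features A, B, C and of the patches are made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersects(a: Box, b: Box) -> bool:
    """True if the two axis-aligned boxes overlap or touch."""
    return (a.minx <= b.maxx and b.minx <= a.maxx
            and a.miny <= b.maxy and b.miny <= a.maxy)


def union(boxes: List[Box]) -> Box:
    """Smallest box that contains all the given boxes."""
    return Box(min(b.minx for b in boxes), min(b.miny for b in boxes),
               max(b.maxx for b in boxes), max(b.maxy for b in boxes))


def query(index: Dict[str, Box], patch: Box) -> List[str]:
    """Names of all index entries whose box intersects the patch."""
    return sorted(name for name, box in index.items() if intersects(box, patch))


# Hypothetical feature bounding boxes inside one vector file.
features = {
    "A": Box(0, 0, 2, 2),
    "B": Box(8, 0, 10, 2),
    "C": Box(4, 8, 6, 10),
}

# File-level index: a single entry covering the whole file.
file_index = {"shapes.geojson": union(list(features.values()))}
# Shape-level index: one entry per feature.
shape_index = dict(features)

patch_on_a = Box(1, 1, 3, 3)  # overlaps feature A
patch_in_d = Box(4, 3, 6, 5)  # inside the file bounds, but touches no feature

print(query(file_index, patch_in_d))   # ['shapes.geojson']
print(query(shape_index, patch_in_d))  # []
print(query(shape_index, patch_on_a))  # ['A']
```

With the file-level index, the empty patch in region d still counts as a hit, which is why a sampler can land there. With one entry per shape, the same patch returns nothing.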
This makes sense. Thanks! The objects are sparsely distributed across the raster extent, so I will probably look at implementing a class for this use case.
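One possible shape for such a class is sketched below in plain Python, without TorchGeo. The function name `object_centered_patches` and all coordinates are hypothetical. The idea is to pick a random object first and then draw a patch around it, rather than drawing patches uniformly over the raster extent. Adapting this to an actual TorchGeo sampler would still require wiring it into the sampler interface, which is not shown here.

```python
import random
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def object_centered_patches(
    objects: List[BBox], raster: BBox, size: float, length: int, seed: int = 0
) -> List[BBox]:
    """Draw `length` square patches that each contain the center of an object."""
    rminx, rminy, rmaxx, rmaxy = raster
    if size > rmaxx - rminx or size > rmaxy - rminy:
        raise ValueError("patch size is larger than the raster extent")

    # Keep only objects whose center falls inside the raster extent.
    centers = []
    for ominx, ominy, omaxx, omaxy in objects:
        cx, cy = (ominx + omaxx) / 2, (ominy + omaxy) / 2
        if rminx <= cx <= rmaxx and rminy <= cy <= rmaxy:
            centers.append((cx, cy))
    if not centers:
        raise ValueError("no object centers fall inside the raster extent")

    rng = random.Random(seed)
    patches = []
    for _ in range(length):
        cx, cy = rng.choice(centers)
        # Jitter the patch so the object is not always in the middle.
        minx = rng.uniform(cx - size, cx)
        miny = rng.uniform(cy - size, cy)
        # Shift the patch back inside the raster extent if it sticks out.
        minx = min(max(minx, rminx), rmaxx - size)
        miny = min(max(miny, rminy), rmaxy - size)
        patches.append((minx, miny, minx + size, miny + size))
    return patches


# Hypothetical data: a 100 x 100 raster with three sparse objects.
raster_bounds = (0.0, 0.0, 100.0, 100.0)
object_boxes = [(5.0, 5.0, 7.0, 7.0), (60.0, 20.0, 62.0, 23.0), (95.0, 95.0, 99.0, 99.0)]
patches = object_centered_patches(object_boxes, raster_bounds, size=10.0, length=50)
print(patches[0])
```

Picking the object first means sparse objects are sampled as often as dense ones, which may or may not be what you want. Sampling uniformly over per-object patches weighted by area would be an alternative.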
Hi all. Thanks a lot for this library... I am really looking forward to integrating it into my workflow.
I am trying to sample from my own multiclass object detection dataset. I have a set of large raster images (GeoTIFFs) and polygons in a GeoJSON file. All the source data is in EPSG:4326. I am using torchgeo v0.2.0.
Here are my dataset classes:
Then I instantiate the dataset like this:
I am trying to sample from the image dataset, but only in spots where there is an object in my vector dataset. So, I combine the two datasets as above, since I believe that an IntersectionDataset is what I want.
Then I set up my sampler and dataloader:
However, I keep getting this error:
I have experimented with different values for size, but I did a sanity check by looking at a sample from just the raster dataset at this size and everything looks good. (Note that I am on v0.2.0 and I don't think the unit option is available for specifying pixel units.)
To troubleshoot, I have limited my image dataset to a single tiff and my vector dataset to a geojson with only 100 objects, some (but not all) of which fall on the image in my image dataset. I have also focused on one specific ROI on that image, a bounding box of size 1000 m^2 where I know there are objects located. I still receive the errors above.
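As an aside on the size parameter: if, as assumed here, the sampler in v0.2.0 interprets size in the units of the CRS (degrees for EPSG:4326), a patch size given in pixels has to be multiplied by the raster resolution first. The helper name and the resolution value below are hypothetical.

```python
def pixels_to_crs_units(size_px: int, res: float) -> float:
    """Convert a patch size in pixels to CRS units, given CRS units per pixel."""
    return size_px * res


# Hypothetical resolution: 0.00001 degrees per pixel in EPSG:4326.
res_degrees_per_pixel = 0.00001
size_in_degrees = pixels_to_crs_units(256, res_degrees_per_pixel)
print(size_in_degrees)
```

The resulting value would be what gets passed as size, on the assumption stated above.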
I can index into the raster dataset using that ROI bounding box and get back the correct imagery, and I can index into the vector dataset using that ROI bounding box and get back the correct mask for my objects. Also, I can sample from the MyImageDataset raster dataset, but I get the same error as above when attempting to sample from the MyObjectDataset vector dataset.
What am I doing wrong? Feel free to point me to any examples of setting up this type of dataset that I may have missed.