A couple of issues that may affect the usage of GridGeoSampler for benchmarking.
Overlapping patches
Back in #630, we modified GridGeoSampler to ensure that every part of the image is sampled from, even if the height/width of the image is not a multiple of the stride. At the time, I decided that we should adjust the stride of the last row/col in order to avoid sampling outside of the bounds of the image. In hindsight, I think this was a mistake.
The problem is that we end up with the last row/col overlapping with the second-to-last row/col, resulting in some areas being double counted when computing performance metrics. It also makes stitching together prediction patches unnecessarily complicated.
I think we should modify GridGeoSampler to avoid adjusting stride and instead sample outside the bounds of the image. I believe rasterio will simply return nodata pixels for areas outside of the image. @remtav are you okay with this solution? I believe this was actually the first idea you implemented; apologies for pushing that PR in the wrong direction.
Technically this issue also occurs when multiple images in the dataset intersect, but this is harder to mitigate without storing all predictions in one giant tensor and computing performance only on the final predicted mask. I think we would run out of memory very quickly.
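To make the difference between the two strategies concrete, here is a minimal sketch in one dimension. It is not the GridGeoSampler implementation: `offsets_adjusted`, `offsets_constant`, and `coverage` are hypothetical helpers written only for illustration, and the sizes are made up.

```python
def offsets_adjusted(length, size, stride):
    """Patch start offsets when the last patch is shifted back to stay in bounds."""
    starts = list(range(0, max(length - size, 0) + 1, stride))
    if starts[-1] + size < length:
        # Shrink the final step so the last patch ends exactly at the image edge.
        starts.append(length - size)
    return starts


def offsets_constant(length, size, stride):
    """Patch start offsets with a constant stride; the last patch may run past the edge."""
    starts = [0]
    while starts[-1] + size < length:
        starts.append(starts[-1] + stride)
    return starts


def coverage(starts, size, length):
    """Number of patches covering each in-bounds pixel."""
    counts = [0] * length
    for start in starts:
        for i in range(start, min(start + size, length)):
            counts[i] += 1
    return counts


length, size, stride = 250, 100, 100

adjusted = offsets_adjusted(length, size, stride)
constant = offsets_constant(length, size, stride)

print(adjusted)  # [0, 100, 150]: pixels 150..199 fall in two patches
print(constant)  # [0, 100, 200]: last patch covers 200..299, 50 pixels out of bounds

double_counted = sum(c == 2 for c in coverage(adjusted, size, length))
print(double_counted)  # 50
```

With a constant stride, every in-bounds pixel is covered exactly once whenever stride equals size, and the out-of-bounds remainder would be nodata (assuming the reader pads as described above).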
ignore_index weighting
This one may also affect training with other GeoSamplers, although I'm most concerned about evaluation.
When sampling from large tiles, many patches will consist partially or entirely of nodata pixels. TorchMetrics allows us to ignore these areas using ignore_index. However, it's unclear to me if all patches are weighted equally when computing the final performance metrics with Lightning. Ideally, the overall reported accuracy would match regardless of whether we chip up the image into small patches or if we compute accuracy on the entire image/mask in one go.
We could peruse the internals of TorchMetrics and Lightning, but I think it's actually easier to construct a toy example to determine whether or not this issue occurs. Consider an image with width 200 and height 100. Let the first 99 columns of the ground truth mask be 0, the 100th column be 1, and the last 100 columns be 2. Let the predicted mask be a tensor of all 1s. If we use a GridGeoSampler with size 100 and stride 100, and let ignore_index=0, the correct accuracy should be ~1% (100 correct pixels out of the 10,100 that are not ignored). If the actual reported accuracy is 50% (the mean of 100% for the first patch and 0% for the second), we'll know we have an issue.
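The two candidate outcomes of this toy example can be checked with plain Python before involving any library. The sketch below does not call TorchMetrics or Lightning, so it says nothing about which aggregation they actually use; it only confirms the expected numbers for pixel-weighted versus per-patch-averaged accuracy. All names in it are hypothetical.

```python
WIDTH, HEIGHT, SIZE, IGNORE_INDEX = 200, 100, 100, 0


def truth_value(col):
    """Ground truth class for a 0-indexed column, as described in the toy example."""
    if col < 99:
        return 0  # first 99 columns: ignored class
    if col == 99:
        return 1  # 100th column
    return 2      # last 100 columns


truth = [[truth_value(c) for c in range(WIDTH)] for _ in range(HEIGHT)]
pred = [[1] * WIDTH for _ in range(HEIGHT)]  # predicted mask of all 1s


def patch_counts(col_start, col_stop):
    """Return (correct, total) over non-ignored pixels in a column range."""
    correct = total = 0
    for r in range(HEIGHT):
        for c in range(col_start, col_stop):
            if truth[r][c] == IGNORE_INDEX:
                continue
            total += 1
            correct += pred[r][c] == truth[r][c]
    return correct, total


# Two non-overlapping 100x100 patches, as a grid with size 100 and stride 100 would give.
patches = [patch_counts(s, s + SIZE) for s in range(0, WIDTH, SIZE)]

# Accumulate counts across patches, then divide once (matches whole-image accuracy).
pixel_weighted = sum(c for c, _ in patches) / sum(t for _, t in patches)

# Compute accuracy per patch, then average the patch accuracies equally.
patch_averaged = sum(c / t for c, t in patches) / len(patches)

print(f"{pixel_weighted:.4f}")  # 0.0099
print(f"{patch_averaged:.4f}")  # 0.5000
```

Running the same masks through the real evaluation loop and comparing against these two numbers would show which behavior we get.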