feat: new ChessboardGeoSampler #2067
@microsoft-github-policy-service agree [company="{Collecte Localisation Satellites (CLS)}"] |
@microsoft-github-policy-service agree company="Collecte Localisation Satellites (CLS)" |
How is this different from |
It samples patches in an alternating pattern (like a chessboard), reducing overlap. The |
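To make the alternating pattern concrete, here is a minimal pure-Python sketch. It is not the sampler from this PR and does not use any TorchGeo API; the helper names `chessboard_cells` and `render` are hypothetical, and cells are indexed in units of one patch. It assumes the pattern is defined by the parity of row + column.

```python
def chessboard_cells(n_rows, n_cols, parity=0):
    """Return (row, col) indices of grid cells whose row + col has the given parity.

    Hypothetical helper for illustration only; one cell stands for one patch.
    """
    return [
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if (r + c) % 2 == parity
    ]


def render(n_rows, n_cols, cells):
    """Draw the grid as text: '#' for a selected cell, '.' for a skipped one."""
    chosen = set(cells)
    return "\n".join(
        "".join("#" if (r, c) in chosen else "." for c in range(n_cols))
        for r in range(n_rows)
    )


cells = chessboard_cells(4, 4)
print(render(4, 4, cells))
```

Under this assumption, passing `parity=1` selects the complementary set of cells, so the two parities together cover the grid with no cell selected twice.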
This is definitely useful to have and used in geoML papers, e.g. https://www.nature.com/articles/s41467-021-24638-z |
@calebrob6 that example is for train-test splitting, not for sampling. We already have random_grid_cell_assignment for that use case. |
@killian31 could you plot some example patches and overlay over the original full image? The math seems to math but just want to make sure we aren't missing an increment somewhere. |
It's still not clear what the purpose or advantage of this sampler is. We already have:
From what you've described so far, it sounds like you're trying to split your dataset. Does Of course, we could use the sampler as an alternative to the splitter that gets a similar result. Initially we had no splitters, only samplers. But I think we're trying to move in a direction where splitters replace most of the features of samplers, as splitters are more familiar to torchvision users than samplers are. I'm trying to balance making things intuitive and having "more than one way to do things" with maintaining a limited amount of features without code duplication. |
Does this need to be its own class? Can this be achieved with the GridGeoSampler but by setting the stride to be equal to the patch size * 2? |
You would need 2 GridGeoSamplers offset by 1 row/col to achieve the same thing. But again, I don't know why you would want to sample in this way unless you were trying to achieve a checkerboard dataset split, which we already support via other mechanisms. |
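The point about needing two offset grids can be checked with a small pure-Python sketch. This is only an illustration on cell indices, not a use of GridGeoSampler itself: the `grid` helper is hypothetical, and strides and offsets are expressed in units of one patch rather than in pixels or CRS units.

```python
def grid(n_rows, n_cols, stride, row_offset=0, col_offset=0):
    """Cells visited by a regular grid with the given stride and offsets.

    Hypothetical helper; strides and offsets are counted in patches.
    """
    return {
        (r, c)
        for r in range(row_offset, n_rows, stride)
        for c in range(col_offset, n_cols, stride)
    }


n_rows, n_cols = 4, 6

# One grid with stride equal to twice the patch size.
single = grid(n_rows, n_cols, stride=2)

# The same grid plus a second one shifted by one row and one column.
double = single | grid(n_rows, n_cols, stride=2, row_offset=1, col_offset=1)

# Reference chessboard: every cell whose row + col is even.
chessboard = {
    (r, c) for r in range(n_rows) for c in range(n_cols) if (r + c) % 2 == 0
}

print(single == chessboard)  # a single stride-2 grid covers only half of the pattern
print(double == chessboard)  # two offset grids reproduce the full pattern
```

In this sketch, a single stride-2 grid visits only the even-row, even-column cells, so a second grid offset by one row and one column is needed to pick up the odd-row, odd-column cells.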
Indeed I do use it at the splitting stage: I split my full ROI (the area inside the whole dataset's boundaries) into 2 ROIs, one for training and one for both validation and test. Then I define a @Bencpr do you have any clearer explanation of our needs? |
Thanks for the explanation! Yes, I think
Of course, if this sounds too difficult, we could save that feature for version 2. |
Add ChessboardGeoSampler to TorchGeo
Description
This Pull Request introduces the ChessboardGeoSampler. The ChessboardGeoSampler samples patches in a chessboard-like pattern, minimizing overlap and redundant computation during evaluation.
Features
Tests
tests/samplers/test_single.py:
Example Usage
Example of train-val-test split using ChessboardGeoSampler (blue tiles are training samples from RandomGeoSampler, orange and purple are validation and test samples from ChessboardGeoSampler):