.. module:: torchgeo.samplers

Samplers are used to index a dataset, retrieving a single query at a time. For :class:`~torchgeo.datasets.NonGeoDataset`, dataset objects can be indexed with integers, and PyTorch's built-in samplers are sufficient. For :class:`~torchgeo.datasets.GeoDataset`, dataset objects require a bounding box for indexing. For this reason, we define our own :class:`GeoSampler` implementations below. These can be used like so:

.. code-block:: python

   from torch.utils.data import DataLoader

   from torchgeo.datasets import Landsat
   from torchgeo.samplers import RandomGeoSampler

   dataset = Landsat(...)
   sampler = RandomGeoSampler(dataset, size=256, length=10000)
   dataloader = DataLoader(dataset, sampler=sampler)

This data loader returns 256x256 px images and has an epoch length of 10,000.

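
To make the idea of indexing by bounding box concrete, here is a minimal, library-independent sketch of how a random patch might be drawn from a region. It is not torchgeo's implementation: the function ``random_box``, the ``(minx, maxx, miny, maxy)`` tuple layout, and the 30 km extent are all assumptions made for illustration.

```python
import random


def random_box(bounds, size_px, res, rng):
    """Pick a random square box of size_px pixels inside bounds.

    bounds is a hypothetical (minx, maxx, miny, maxy) tuple in CRS units,
    and res is the number of CRS units per pixel.
    """
    minx, maxx, miny, maxy = bounds
    side = size_px * res  # patch side length in CRS units
    if side > maxx - minx or side > maxy - miny:
        raise ValueError("patch is larger than the region")
    # Choose the lower-left corner so the whole patch stays inside the bounds.
    x0 = rng.uniform(minx, maxx - side)
    y0 = rng.uniform(miny, maxy - side)
    return (x0, x0 + side, y0, y0 + side)


rng = random.Random(0)
region = (0.0, 30000.0, 0.0, 30000.0)  # made-up 30 km x 30 km extent
boxes = [random_box(region, size_px=256, res=30.0, rng=rng) for _ in range(5)]
```

Each element of ``boxes`` plays the role of one query: a sampler yields such boxes one at a time, and the dataset is responsible for returning the pixels that fall inside each box.
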
.. autoclass:: RandomGeoSampler
.. autoclass:: GridGeoSampler
.. autoclass:: PreChippedGeoSampler

When working with large tile-based datasets, randomly sampling patches from each tile can be extremely time-consuming. It's much more efficient to choose a tile, load it, warp it to the appropriate :term:`coordinate reference system (CRS)` and resolution, and then sample random patches from that tile to construct a mini-batch of data. For this reason, we define our own :class:`BatchGeoSampler` implementations below. These can be used like so:

.. code-block:: python

   from torch.utils.data import DataLoader

   from torchgeo.datasets import Landsat
   from torchgeo.samplers import RandomBatchGeoSampler

   dataset = Landsat(...)
   sampler = RandomBatchGeoSampler(dataset, size=256, batch_size=128, length=10000)
   dataloader = DataLoader(dataset, batch_sampler=sampler)

This data loader returns 256x256 px images, with a batch size of 128 and an epoch length of 10,000.

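
The tile-first strategy described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: ``random_batches``, its ``num_batches`` parameter, and the tuple-based tile bounds are hypothetical. The one point it relies on from PyTorch is that the ``batch_sampler`` argument of ``DataLoader`` consumes an iterable that yields one whole batch of indices at a time.

```python
import random


def random_batches(tiles, size, batch_size, num_batches, seed=0):
    """Yield num_batches batches; every box in a batch lies in one tile.

    tiles is a list of hypothetical (minx, maxx, miny, maxy) tuples and
    size is the patch side length in CRS units.
    """
    rng = random.Random(seed)
    for _ in range(num_batches):
        # Pick one tile per batch so it only has to be loaded and warped once.
        minx, maxx, miny, maxy = rng.choice(tiles)
        batch = []
        for _ in range(batch_size):
            x0 = rng.uniform(minx, maxx - size)
            y0 = rng.uniform(miny, maxy - size)
            batch.append((x0, x0 + size, y0, y0 + size))
        yield batch


# Two made-up, non-overlapping 10 km x 10 km tiles.
tiles = [(0.0, 10000.0, 0.0, 10000.0), (20000.0, 30000.0, 0.0, 10000.0)]
batches = list(random_batches(tiles, size=7680.0, batch_size=4, num_batches=3))
```

Because all boxes in a batch come from the same tile, the expensive load-and-warp step is paid once per batch rather than once per patch.
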
.. autoclass:: RandomBatchGeoSampler

If you want to write your own custom sampler, you can extend one of these abstract base classes.

.. autoclass:: GeoSampler
.. autoclass:: BatchGeoSampler
.. autofunction:: get_random_bounding_box
.. autofunction:: tile_to_chips

By default, the ``size`` parameter specifies the size of the image in pixel units. If you would instead like to specify the size in CRS units, you can change the ``units`` parameter like so:

.. code-block:: python

   from torch.utils.data import DataLoader

   from torchgeo.datasets import Landsat
   from torchgeo.samplers import RandomGeoSampler, Units

   dataset = Landsat(...)
   sampler = RandomGeoSampler(dataset, size=256 * 30, length=10000, units=Units.CRS)
   dataloader = DataLoader(dataset, sampler=sampler)

Assuming that the dataset has a resolution of 30 m per pixel, a size of 256 * 30 = 7,680 CRS units corresponds to 256 px, so this data loader returns 256x256 px images and has an epoch length of 10,000.

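
The arithmetic behind the two unit systems is a single multiplication or division by the resolution. The helper names below are hypothetical and exist only to spell out that conversion; they are not part of the library.

```python
def size_in_crs_units(size_px, res):
    """Convert a patch size in pixels to CRS units (res = CRS units per pixel)."""
    return size_px * res


def size_in_pixels(size_crs, res):
    """Convert a patch size in CRS units back to a whole number of pixels."""
    return round(size_crs / res)


crs_size = size_in_crs_units(256, 30.0)  # 7680.0 CRS units (metres here)
px_size = size_in_pixels(crs_size, 30.0)  # 256 pixels
```

The same patch therefore has a different numeric ``size`` depending on the dataset's resolution when CRS units are used, while its size in pixels stays fixed.
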
.. autoclass:: Units