Skip to content

Latest commit

 

History

History
104 lines (62 loc) · 3.09 KB

samplers.rst

File metadata and controls

104 lines (62 loc) · 3.09 KB

torchgeo.samplers

.. module:: torchgeo.samplers

Samplers

Samplers are used to index a dataset, retrieving a single query at a time. For :class:`~torchgeo.datasets.NonGeoDataset`, dataset objects can be indexed with integers, and PyTorch's builtin samplers are sufficient. For :class:`~torchgeo.datasets.GeoDataset`, dataset objects require a bounding box for indexing. For this reason, we define our own :class:`GeoSampler` implementations below. These can be used like so:

from torch.utils.data import DataLoader

from torchgeo.datasets import Landsat
from torchgeo.samplers import RandomGeoSampler

dataset = Landsat(...)
sampler = RandomGeoSampler(dataset, size=256, length=10000)
dataloader = DataLoader(dataset, sampler=sampler)

This data loader will return 256x256 px images, and has an epoch length of 10,000.

Random Geo Sampler

.. autoclass:: RandomGeoSampler

Grid Geo Sampler

.. autoclass:: GridGeoSampler

Pre-chipped Geo Sampler

.. autoclass:: PreChippedGeoSampler

Batch Samplers

When working with large tile-based datasets, randomly sampling patches from each tile can be extremely time consuming. It's much more efficient to choose a tile, load it, warp it to the appropriate :term:`coordinate reference system (CRS)` and resolution, and then sample random patches from that tile to construct a mini-batch of data. For this reason, we define our own :class:`BatchGeoSampler` implementations below. These can be used like so:

from torch.utils.data import DataLoader

from torchgeo.datasets import Landsat
from torchgeo.samplers import RandomBatchGeoSampler

dataset = Landsat(...)
sampler = RandomBatchGeoSampler(dataset, size=256, batch_size=128, length=10000)
dataloader = DataLoader(dataset, batch_sampler=sampler)

This data loader will return 256x256 px images, and has a batch size of 128 and an epoch length of 10,000.

Random Batch Geo Sampler

.. autoclass:: RandomBatchGeoSampler

Base Classes

If you want to write your own custom sampler, you can extend one of these abstract base classes.

Geo Sampler

.. autoclass:: GeoSampler

Batch Geo Sampler

.. autoclass:: BatchGeoSampler

Utilities

.. autofunction:: get_random_bounding_box
.. autofunction:: tile_to_chips

Units

By default, the size parameter specifies the size of the image in pixel units. If you would instead like to specify the size in CRS units, you can change the units parameter like so:

from torch.utils.data import DataLoader

from torchgeo.datasets import Landsat
from torchgeo.samplers import RandomGeoSampler, Units

dataset = Landsat(...)
sampler = RandomGeoSampler(dataset, size=256 * 30, length=10000, units=Units.CRS)
dataloader = DataLoader(dataset, sampler=sampler)

Assuming that each pixel in the CRS is 30 m, this data loader will return 256x256 px images, and has an epoch length of 10,000.

.. autoclass:: Units