Clarify RasterDataset documentation for is_image and dtype #1811

Merged
Changes from 14 commits
6 changes: 5 additions & 1 deletion docs/tutorials/custom_raster_dataset.ipynb
@@ -329,7 +329,11 @@
"\n",
"### `is_image`\n",
"\n",
"If your data only contains image files, as is the case with Sentinel-2, use `is_image = True`. If your data only contains segmentation masks, use `is_image = False` instead.\n",
"If your dataset only contains source data, use `is_image = True`. Source data includes imagery such as Sentinel-2 and digital surfaces such as a Digital Elevation Model, Digital Surface Model, Digital Terrain Model, or a raster of temperature values. If your dataset only contains target data, such as a segmentation mask for land use or land cover classification, use `is_image = False` instead.\n",
"\n",
"### `dtype`\n",
"\n",
"Defaults to float32 for `is_image == True` and long for `is_image == False`. This is what most datasets want, but it can be overridden for pixel-wise regression masks (where the target should be float32). Uint16 and uint32 are automatically cast to int32 and int64, respectively, because numpy supports these unsigned types but torch does not.\n",
"\n",
"### `separate_files`\n",
"\n",
16 changes: 15 additions & 1 deletion torchgeo/datasets/geo.py
@@ -57,6 +57,10 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
(e.g. Landsat and CDL)
* Combine datasets for multiple image sources for multimodal learning or data fusion
(e.g. Landsat and Sentinel)
* Combine imagery and digital surface data (e.g. elevation, temperature,
pressure) and sample from both simultaneously (e.g. Sentinel-2 and an ASTER
Global DEM tile)


These combinations require that all queries are present in *both* datasets,
and can be combined using an :class:`IntersectionDataset`:
@@ -342,7 +346,11 @@ class RasterDataset(GeoDataset):
#: ``start`` and ``stop`` groups.
date_format = "%Y%m%d"

#: True if dataset contains imagery, False if dataset contains mask
#: True if the dataset contains source data, such as imagery. False if the dataset
#: contains target data, such as a mask. This follows the same convention as Kornia.
#: When multiple datasets are combined and the same key is used by more than one of
#: them, for example two "image" datasets and one "mask" dataset, the channels are
#: stacked so that there is still a single value for that key.
is_image = True

#: True if data is stored in a separate file for each band, else False.
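
The `is_image` comment in this hunk describes two things: the flag selects the sample key ("image" for source data, "mask" for target data), and channels are stacked when several combined datasets share a key. The following is a minimal, dependency-free sketch of that behaviour, not TorchGeo's implementation. The names `sample_key`, `merge_samples`, and `constant_band` are hypothetical, and nested Python lists stand in for tensors.

```python
from typing import Dict, List

Band = List[List[float]]        # one channel: rows of pixel values
Sample = Dict[str, List[Band]]  # sample key -> list of channels


def sample_key(is_image: bool) -> str:
    """Return the key implied by is_image (assumed: "image" or "mask")."""
    return "image" if is_image else "mask"


def merge_samples(samples: List[Sample]) -> Sample:
    """Merge samples, concatenating channels whenever a key repeats."""
    merged: Sample = {}
    for sample in samples:
        for key, channels in sample.items():
            merged.setdefault(key, []).extend(channels)
    return merged


def constant_band(value: float) -> Band:
    """Build a 2x2 channel filled with one value (illustrative data only)."""
    return [[value, value], [value, value]]


# Two source datasets (is_image = True) and one target dataset (is_image = False).
optical = {sample_key(True): [constant_band(0.1), constant_band(0.2), constant_band(0.3)]}
elevation = {sample_key(True): [constant_band(512.0)]}
land_cover = {sample_key(False): [constant_band(3)]}

merged = merge_samples([optical, elevation, land_cover])
print({key: len(channels) for key, channels in merged.items()})
# prints {'image': 4, 'mask': 1}
```

In this sketch the elevation channel ends up as the fourth "image" channel, which is why a digital surface is treated as source data rather than as a mask.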
@@ -361,6 +369,12 @@ class RasterDataset(GeoDataset):
def dtype(self) -> torch.dtype:
"""The dtype of the dataset (overrides the dtype of the data file via a cast).

Defaults to float32 for is_image = True and long for is_image = False. This is
what we want for most datasets, but it can be overridden for pixel-wise
regression masks (where it should be float32). Uint16 and uint32 are
automatically cast to int32 and int64, respectively, because numpy supports
these unsigned types but torch does not.

Returns:
the dtype of the dataset
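
The rule in this docstring can be modelled without torch or numpy. The sketch below is a stand-in under stated assumptions: dtypes are plain strings rather than `torch.dtype` objects, and `RasterLike`, `RegressionMask`, `default_dtype`, `int_range`, and `fits` are hypothetical names, not part of TorchGeo. It shows the default, a property override for a regression target, and why each unsigned type needs the next wider signed type to hold all of its values.

```python
from typing import Tuple


def default_dtype(is_image: bool) -> str:
    """Default described in the docstring: float32 for images, long for masks."""
    return "float32" if is_image else "long"


class RasterLike:
    """Stand-in for a raster dataset class; not TorchGeo's RasterDataset."""

    is_image = True

    @property
    def dtype(self) -> str:
        return default_dtype(self.is_image)


class RegressionMask(RasterLike):
    """Target data with continuous values, so the default of long is overridden."""

    is_image = False

    @property
    def dtype(self) -> str:
        return "float32"


# Promotion named in the docstring, plus bit widths for a range check.
PROMOTION = {"uint16": "int32", "uint32": "int64"}
BITS = {"uint16": 16, "uint32": 32, "int16": 16, "int32": 32, "int64": 64}


def int_range(name: str) -> Tuple[int, int]:
    """Inclusive (min, max) of an integer type, derived from its bit width."""
    bits = BITS[name]
    if name.startswith("uint"):
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(src: str, dst: str) -> bool:
    """True if every value of src is representable in dst."""
    src_min, src_max = int_range(src)
    dst_min, dst_max = int_range(dst)
    return dst_min <= src_min and src_max <= dst_max


print(RasterLike().dtype, RegressionMask().dtype)
for src, dst in PROMOTION.items():
    print(src, "->", dst, "lossless:", fits(src, dst))
```

A signed type of the same width would not be enough: the maximum uint16 value is 65535, while int16 stops at 32767. In real code the override would presumably return `torch.float32` from a `RasterDataset` subclass, as the docstring suggests.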
