Clarify RasterDataset documentation for is_image and dtype #1811

Merged
4 changes: 2 additions & 2 deletions docs/api/geo_datasets.csv
@@ -2,15 +2,15 @@ Dataset,Type,Source,License,Size (px),Resolution (m)
`Aboveground Woody Biomass`_,Masks,"Landsat, LiDAR","CC-BY-4.0","40,000x40,000",30
`AgriFieldNet`_,"Imagery, Masks",Sentinel-2,"CC-BY-4.0","256x256",10
`Airphen`_,Imagery,Airphen,-,"1,280x960",0.047--0.09
-`Aster Global DEM`_,Masks,Aster,"public domain","3,601x3,601",30
+`Aster Global DEM`_,DEM,Aster,"public domain","3,601x3,601",30
`Canadian Building Footprints`_,Geometries,Bing Imagery,"ODbL-1.0",-,-
`Chesapeake Land Cover`_,"Imagery, Masks",NAIP,"CC-BY-4.0",-,1
`Global Mangrove Distribution`_,Masks,"Remote Sensing, In Situ Measurements","public domain",-,3
`Cropland Data Layer`_,Masks,Landsat,"public domain",-,30
`EDDMapS`_,Points,Citizen Scientists,-,-,-
`EnviroAtlas`_,"Imagery, Masks","NAIP, NLCD, OpenStreetMap","CC-BY-4.0",-,1
`Esri2020`_,Masks,Sentinel-2,"CC-BY-4.0",-,10
-`EU-DEM`_,Masks,"Aster, SRTM, Russian Topomaps","CSCDA-ESA",-,25
+`EU-DEM`_,DEM,"Aster, SRTM, Russian Topomaps","CSCDA-ESA",-,25
`EuroCrops`_,Geometries,EU Countries,"CC-BY-SA-4.0",-,-
`GBIF`_,Points,Citizen Scientists,"CC0-1.0 OR CC-BY-4.0 OR CC-BY-NC-4.0",-,-
`GlobBiomass`_,Masks,Landsat,"CC-BY-4.0","45,000x45,000",100
6 changes: 5 additions & 1 deletion docs/tutorials/custom_raster_dataset.ipynb
@@ -329,7 +329,11 @@
"\n",
"### `is_image`\n",
"\n",
-"If your data only contains image files, as is the case with Sentinel-2, use `is_image = True`. If your data only contains segmentation masks, use `is_image = False` instead.\n",
+"If your data only contains model inputs (such as images), use `is_image = True`. If your data only contains ground truth model outputs (such as segmentation masks), use `is_image = False` instead.\n",
+"\n",
+"### `dtype`\n",
+"\n",
+"Defaults to float32 for `is_image == True` and long for `is_image == False`. This is what you want for 99% of datasets, but can be overridden for tasks like pixel-wise regression (where the target mask should be float32).\n",
"\n",
"### `separate_files`\n",
"\n",
53 changes: 39 additions & 14 deletions torchgeo/datasets/geo.py
@@ -55,9 +55,11 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
based on latitude/longitude. This allows users to do things like:

* Combine image and target labels and sample from both simultaneously
-(e.g. Landsat and CDL)
+(e.g., Landsat and CDL)
* Combine datasets for multiple image sources for multimodal learning or data fusion
-(e.g. Landsat and Sentinel)
+(e.g., Landsat and Sentinel)
+* Combine image and other raster data (e.g., elevation, temperature, pressure)
+and sample from both simultaneously (e.g., Landsat and Aster Global DEM)

These combinations require that all queries are present in *both* datasets,
and can be combined using an :class:`IntersectionDataset`:
@@ -69,9 +71,9 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
Users may also want to:

* Combine datasets for multiple image sources and treat them as equivalent
-(e.g. Landsat 7 and Landsat 8)
+(e.g., Landsat 7 and Landsat 8)
* Combine datasets for disparate geospatial locations
-(e.g. Chesapeake NY and PA)
+(e.g., Chesapeake NY and PA)

These combinations require that all queries are present in *at least one* dataset,
and can be combined using a :class:`UnionDataset`:
@@ -108,7 +110,7 @@ class GeoDataset(Dataset[dict[str, Any]], abc.ABC):
def __init__(
self, transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None
) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new GeoDataset instance.

Args:
transforms: a function/transform that takes an input sample
@@ -344,7 +346,14 @@ class RasterDataset(GeoDataset):
#: ``start`` and ``stop`` groups.
date_format = "%Y%m%d"

-#: True if dataset contains imagery, False if dataset contains mask
+#: True if the dataset only contains model inputs (such as images). False if the
+#: dataset only contains ground truth model outputs (such as segmentation masks).
+#:
+#: The sample returned by the dataset/data loader will use the "image" key if
+#: *is_image* is True, otherwise it will use the "mask" key.
+#:
+#: For datasets with both model inputs and outputs, a custom
+#: :func:`~RasterDataset.__getitem__` method must be implemented.
is_image = True

#: True if data is stored in a separate file for each band, else False.
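
The key-selection rule in the `is_image` comment above can be illustrated with a toy, dependency-free stand-in. This is a minimal sketch, not torchgeo's implementation: `ToyRasterDataset`, `ToyMaskDataset`, `ToyPairedDataset`, and the `_read` helper are hypothetical names, nested lists stand in for tensors, and the thresholded target is made up purely for illustration.

```python
from typing import Any


class ToyRasterDataset:
    """Hypothetical stand-in for a raster dataset (not the torchgeo class)."""

    #: Mirrors the documented flag: True for model inputs, False for targets.
    is_image = True

    def _read(self, query: str) -> list[list[int]]:
        # Placeholder for reading pixels; always returns a fixed 2x2 grid.
        return [[1, 2], [3, 4]]

    def __getitem__(self, query: str) -> dict[str, Any]:
        # Store the data under "image" or "mask" depending on is_image.
        key = "image" if self.is_image else "mask"
        return {key: self._read(query)}


class ToyMaskDataset(ToyRasterDataset):
    is_image = False


class ToyPairedDataset(ToyRasterDataset):
    """Holds both inputs and targets, so it overrides __getitem__."""

    def __getitem__(self, query: str) -> dict[str, Any]:
        data = self._read(query)
        # Assumption for illustration: the target is a thresholded copy of the input.
        mask = [[int(value > 2) for value in row] for row in data]
        return {"image": data, "mask": mask}


image_sample = ToyRasterDataset()["any-query"]
mask_sample = ToyMaskDataset()["any-query"]
paired_sample = ToyPairedDataset()["any-query"]
```

In this sketch a single class-level flag is enough while a dataset holds only inputs or only targets; once it holds both, no single key fits, which is why the comment asks for a custom `__getitem__`.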
@@ -363,6 +372,10 @@ class RasterDataset(GeoDataset):
def dtype(self) -> torch.dtype:
"""The dtype of the dataset (overrides the dtype of the data file via a cast).

+Defaults to float32 if :attr:`~RasterDataset.is_image` is True, else long.
+Can be overridden for tasks like pixel-wise regression where the mask should be
+float32 instead of long.

Returns:
the dtype of the dataset
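
The default-and-override behavior of the `dtype` property can be sketched without any dependencies. This is a hedged illustration only: the class names are hypothetical, and plain strings stand in for `torch.dtype` values so the snippet runs without torch installed.

```python
class ToyRasterDataset:
    """Hypothetical stand-in; dtype names are plain strings, not torch dtypes."""

    is_image = True

    @property
    def dtype(self) -> str:
        # Documented default: float32 for images, long for masks.
        return "float32" if self.is_image else "long"


class ToyClassMask(ToyRasterDataset):
    # Segmentation labels are class indices, so the integer default applies.
    is_image = False


class ToyRegressionTarget(ToyRasterDataset):
    is_image = False

    @property
    def dtype(self) -> str:
        # Pixel-wise regression targets are continuous, so override the default.
        return "float32"


dtypes = {
    "image": ToyRasterDataset().dtype,
    "class_mask": ToyClassMask().dtype,
    "regression_target": ToyRegressionTarget().dtype,
}
```

The regression target still sets `is_image = False`, so it would continue to be returned under the "mask" key; only the cast changes.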

@@ -382,7 +395,7 @@ def __init__(
transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
cache: bool = True,
) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new RasterDataset instance.

Args:
paths: one or more root directories to search or files to load
@@ -605,7 +618,7 @@ def __init__(
transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
label_name: Optional[str] = None,
) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new VectorDataset instance.

Args:
paths: one or more root directories to search or files to load
@@ -873,9 +886,11 @@ class IntersectionDataset(GeoDataset):
This allows users to do things like:

* Combine image and target labels and sample from both simultaneously
-(e.g. Landsat and CDL)
+(e.g., Landsat and CDL)
* Combine datasets for multiple image sources for multimodal learning or data fusion
-(e.g. Landsat and Sentinel)
+(e.g., Landsat and Sentinel)
+* Combine image and other raster data (e.g., elevation, temperature, pressure)
+and sample from both simultaneously (e.g., Landsat and Aster Global DEM)

These combinations require that all queries are present in *both* datasets,
and can be combined using an :class:`IntersectionDataset`:
@@ -896,7 +911,12 @@ def __init__(
] = concat_samples,
transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new IntersectionDataset instance.

+When computing the intersection between two datasets that both contain model
+inputs (such as images) or model outputs (such as masks), the default behavior
+is to stack the data along the channel dimension. The *collate_fn* parameter
+can be used to change this behavior.

Args:
dataset1: the first dataset
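
The channel-stacking behavior described in this docstring can be pictured with a toy collate function. This is a sketch under stated assumptions, not the library's `concat_samples`: `toy_concat_samples` is a hypothetical name, and a "tensor" here is a nested list shaped `[C][H][W]` so the example needs no dependencies.

```python
from typing import Any

# A toy "tensor" is a list of channels, each a list of rows: shape [C][H][W].
Sample = dict[str, Any]


def toy_concat_samples(samples: list[Sample]) -> Sample:
    """Hypothetical collate function: join values that share a key along channels."""
    collated: Sample = {}
    for sample in samples:
        for key, value in sample.items():
            if key in collated:
                # The channel axis is the outermost list, so concatenating
                # the lists stacks channels and builds a new list.
                collated[key] = collated[key] + value
            else:
                collated[key] = value
    return collated


optical = {"image": [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]}  # 2 channels, 2x2
elevation = {"image": [[[9, 9], [9, 9]]]}  # 1 channel, 2x2
stacked = toy_concat_samples([optical, elevation])
```

Because both inputs use the "image" key, the result is a single three-channel image; passing a different function as *collate_fn* is the documented way to change that.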
@@ -1026,9 +1046,9 @@ class UnionDataset(GeoDataset):
This allows users to do things like:

* Combine datasets for multiple image sources and treat them as equivalent
-(e.g. Landsat 7 and Landsat 8)
+(e.g., Landsat 7 and Landsat 8)
* Combine datasets for disparate geospatial locations
-(e.g. Chesapeake NY and PA)
+(e.g., Chesapeake NY and PA)

These combinations require that all queries are present in *at least one* dataset,
and can be combined using a :class:`UnionDataset`:
@@ -1049,7 +1069,12 @@ def __init__(
] = merge_samples,
transforms: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
) -> None:
-"""Initialize a new Dataset instance.
+"""Initialize a new UnionDataset instance.

+When computing the union between two datasets that both contain model inputs
+(such as images) or model outputs (such as masks), the default behavior is to
+merge the data to create a single image/mask. The *collate_fn* parameter can be
+used to change this behavior.

Args:
dataset1: the first dataset
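
One way to picture "merge the data to create a single image/mask" is an overlay of partially filled tiles. The sketch below is illustrative only and is not the library's `merge_samples`: `toy_merge_samples` is a hypothetical name, and the rule it applies (a `nodata` value of 0 marks empty pixels and the first non-empty value wins) is an assumption made for this example.

```python
from typing import Any

Sample = dict[str, Any]


def toy_merge_samples(samples: list[Sample], nodata: int = 0) -> Sample:
    """Hypothetical collate function: overlay rasters that share a key.

    Assumption: ``nodata`` marks empty pixels, and the first non-empty value wins.
    """
    merged: Sample = {}
    for sample in samples:
        for key, grid in sample.items():
            if key not in merged:
                # Copy rows so the input sample is not modified later.
                merged[key] = [list(row) for row in grid]
                continue
            for r, row in enumerate(grid):
                for c, value in enumerate(row):
                    # Only fill pixels that are still empty.
                    if merged[key][r][c] == nodata:
                        merged[key][r][c] = value
    return merged


# Two single-channel 2x3 tiles covering different parts of the same query window.
west = {"image": [[1, 2, 0], [3, 4, 0]]}
east = {"image": [[0, 7, 5], [0, 8, 6]]}
merged = toy_merge_samples([west, east])
```

Unlike the intersection case, the channel count does not grow here: the two tiles fill in each other's gaps and produce one raster of the same shape.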