Source: https://torchgeo.readthedocs.io/en/stable/tutorials/getting_started.html

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Getting Started

In this tutorial, we demonstrate some of the basic features of TorchGeo and show how easy it is to use if you're already familiar with other PyTorch domain libraries like torchvision.

It's recommended to run this notebook on Google Colab if you don't have your own GPU. Click the "Open in Colab" button above to get started.

## Setup

You need a Python environment to run this notebook. You can use Google Colab or set up a local Python environment. I like to set mine up using conda:

In [None]:
# Create a conda environment with Python 3.11 for PyTorch
%conda create --name pyt python=3.11
%conda activate pyt

# Run this next line specific to your OS and GPU configuration. Use CPU if you don't have CUDA.
# https://pytorch.org/get-started/locally/
# This example is for a Linux system with CUDA 12.1
%conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

Now we can install TorchGeo. It's best to use `pip` for this.

In [None]:
%pip install torchgeo

## Imports

Next, we import TorchGeo and any other libraries we need.

In [2]:
import os
import tempfile

from torch.utils.data import DataLoader

from torchgeo.datasets import NAIP, ChesapeakeDE, stack_samples
from torchgeo.datasets.utils import download_url
from torchgeo.samplers import RandomGeoSampler

## Datasets

For this tutorial, we'll be using imagery from the [National Agriculture Imagery Program (NAIP)](https://catalog.data.gov/dataset/national-agriculture-imagery-program-naip) and labels from the [Chesapeake Bay High-Resolution Land Cover Project](https://www.chesapeakeconservancy.org/conservation-innovation-center/high-resolution-data/land-cover-data-project/). First, we manually download a few NAIP tiles and create a PyTorch Dataset.

In [3]:
naip_root = os.path.join(tempfile.gettempdir(), 'naip')
naip_url = (
    'https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/'
)
tiles = [
    'm_3807511_ne_18_060_20181104.tif',
    'm_3807511_se_18_060_20181104.tif',
    'm_3807512_nw_18_060_20180815.tif',
    'm_3807512_sw_18_060_20180815.tif',
]
for tile in tiles:
    download_url(naip_url + tile, naip_root)

naip = NAIP(naip_root)

Downloading https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/m_3807511_ne_18_060_20181104.tif to /tmp/naip/m_3807511_ne_18_060_20181104.tif
100%|██████████| 513332284/513332284 [01:06<00:00, 7700744.96it/s]

Downloading https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/m_3807511_se_18_060_20181104.tif to /tmp/naip/m_3807511_se_18_060_20181104.tif
100%|██████████| 521985441/521985441 [01:36<00:00, 5404524.82it/s]

Downloading https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/m_3807512_nw_18_060_20180815.tif to /tmp/naip/m_3807512_nw_18_060_20180815.tif
100%|██████████| 489865657/489865657 [01:10<00:00, 6981898.29it/s]

Downloading https://naipeuwest.blob.core.windows.net/naip/v002/de/2018/de_060cm_2018/38075/m_3807512_sw_18_060_20180815.tif to /tmp/naip/m_3807512_sw_18_060_20180815.tif
100%|██████████| 484476647/484476647 [01:16<00:00, 6323757.24it/s]

Next, we tell TorchGeo to automatically download the corresponding Chesapeake labels.

In [4]:
chesapeake_root = os.path.join(tempfile.gettempdir(), 'chesapeake')
os.makedirs(chesapeake_root, exist_ok=True)
chesapeake = ChesapeakeDE(chesapeake_root, crs=naip.crs, res=naip.res, download=True)

https://hf.co/datasets/torchgeo/chesapeake/resolve/1e0370eda6a24d93af4153745e54fd383d015bf5/de_lulc_2013_2022-Edition.zip

100%|██████████| 342966050/342966050 [00:03<00:00, 112920466.71it/s]


https://hf.co/datasets/torchgeo/chesapeake/resolve/1e0370eda6a24d93af4153745e54fd383d015bf5/de_lulc_2018_2022-Edition.zip

100%|██████████| 348540992/348540992 [00:02<00:00, 116923861.23it/s]


Finally, we use the `&` operator to create an `IntersectionDataset`, so that we can automatically sample from both GeoDatasets simultaneously.

In [5]:
dataset = naip & chesapeake
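
To build intuition for what an intersection of two geospatial datasets means, here is a minimal, library-free sketch: the combined dataset can only be sampled where both inputs overlap. The `Box` type, the `intersect` helper, and the coordinates are all hypothetical and made up for illustration; this is not TorchGeo's implementation.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Hypothetical axis-aligned box; not TorchGeo's own bounding box class."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    minx, maxx = max(a.minx, b.minx), min(a.maxx, b.maxx)
    miny, maxy = max(a.miny, b.miny), min(a.maxy, b.maxy)
    if minx >= maxx or miny >= maxy:
        return None
    return Box(minx, maxx, miny, maxy)


imagery_extent = Box(0.0, 10.0, 0.0, 10.0)  # made-up imagery footprint
label_extent = Box(5.0, 20.0, -5.0, 8.0)  # made-up label footprint
overlap = intersect(imagery_extent, label_extent)
print(overlap)
```

Only the overlapping region is useful for training, because every sample needs both an image and a mask.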

## Sampler

Unlike typical PyTorch Datasets, which are usually indexed by integers, TorchGeo GeoDatasets are indexed using lat/long/time (spatiotemporal) bounding boxes. This requires us to use a custom GeoSampler instead of the default sampler/batch_sampler that comes with PyTorch.

In [6]:
sampler = RandomGeoSampler(dataset, size=1000, length=10)
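
In the call above, `size` sets the side length of each sampled patch and `length` sets how many patches are drawn per epoch. The following sketch illustrates the general idea of random patch sampling in plain Python. The `random_patches` function, its `seed` parameter, and the extent are hypothetical, and the sketch assumes the patch size is expressed in the same units as the extent; it is not how `RandomGeoSampler` is implemented.

```python
import random
from typing import Iterator, Tuple

Patch = Tuple[float, float, float, float]  # (minx, maxx, miny, maxy)


def random_patches(
    extent: Patch, size: float, length: int, seed: int = 0
) -> Iterator[Patch]:
    """Yield `length` random square patches of side `size` inside `extent`."""
    minx, maxx, miny, maxy = extent
    if maxx - minx < size or maxy - miny < size:
        raise ValueError("extent is smaller than the requested patch size")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    for _ in range(length):
        # Pick the lower-left corner so the whole patch stays inside the extent.
        x = rng.uniform(minx, maxx - size)
        y = rng.uniform(miny, maxy - size)
        yield (x, x + size, y, y + size)


extent = (0.0, 5000.0, 0.0, 5000.0)  # made-up region of interest
patches = list(random_patches(extent, size=1000.0, length=10))
print(len(patches))
```

Each yielded tuple plays the role of an index: instead of asking the dataset for item `i`, the sampler asks for whatever data falls inside that box.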

## DataLoader

Now that we have a Dataset and Sampler, we can combine these into a single DataLoader.

In [7]:
dataloader = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)
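
Each sample is a dictionary (with keys such as `'image'` and `'mask'`), so the DataLoader needs a collate function that knows how to merge a list of dictionaries into one batch. The sketch below shows a simplified stand-in that only groups values by key. The `collate_by_key` helper and the toy samples are hypothetical; a real collate function such as `stack_samples` would additionally stack tensor values along a new batch dimension.

```python
from typing import Any, Dict, List


def collate_by_key(samples: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge a list of per-sample dicts into one dict of per-key lists."""
    batch: Dict[str, List[Any]] = {}
    for sample in samples:
        for key, value in sample.items():
            batch.setdefault(key, []).append(value)
    return batch


# Toy samples: nested lists stand in for image and mask tensors.
samples = [
    {"image": [[1, 2], [3, 4]], "mask": [[0, 1], [1, 0]]},
    {"image": [[5, 6], [7, 8]], "mask": [[1, 1], [0, 0]]},
]
batch = collate_by_key(samples)
print(sorted(batch))
print(len(batch["image"]))
```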

## Training

Other than that, the rest of the training pipeline is the same as it is for torchvision.

In [8]:
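# Each batch is a dictionary produced by the collate function
for sample in dataloader:
    image = sample['image']  # imagery from NAIP
    target = sample['mask']  # land cover labels from Chesapeake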

ValueError: WarpedVRT does not permit boundless reads