MapInWild dataset #1096

burakekim · 2023-02-08T15:26:05Z

Summary

Hi!

MapInWild is a large-scale multi-modal dataset curated for the novel task of wilderness mapping from space, introduced in our recent paper. We would be glad to see our dataset in torchgeo!

The dataset is originally hosted on Harvard Dataverse. Following @calebrob6's suggestion, I have mirrored it on Hugging Face to make things easier for everyone.
The original GitHub repository of the dataset is here.
Wilderness mapping is a supervised pixel-wise classification task. Although the annotations contain four classes/pixel values {0: Background, 1: Strict Nature Reserve, 2: Wilderness Area, 3: National Park}, we apply annotation[annotation != 0] = 1 and frame the task as a binary classification problem where 0 is Background and 1 is Wilderness Area.
The annotations (in the form of polygons) are derived from the World Database on Protected Areas.
The dataset is around 350 GB in size, so each modality is zipped separately, letting users download only the modalities they want to work with.
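
The label binarization mentioned above can be sketched with NumPy. This is only an illustration on a made-up 3 x 3 patch (the array values are hypothetical, not taken from the dataset); it shows that every non-zero class collapses to 1 while the background stays 0.

```python
import numpy as np

# Hypothetical 3x3 annotation patch using the four original class values:
# 0 = Background, 1 = Strict Nature Reserve, 2 = Wilderness Area, 3 = National Park.
annotation = np.array(
    [[0, 1, 2],
     [3, 0, 2],
     [1, 1, 0]],
    dtype=np.uint8,
)

# Work on a copy so the original four-class labels remain available.
binary = annotation.copy()
binary[binary != 0] = 1  # every protected-area class becomes 1

print(binary)
```

Copying first is a design choice for the sketch; the in-place form annotation[annotation != 0] = 1 does the same thing if the four-class labels are no longer needed.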

There are 1018 areas in the dataset, and each area contains the following modalities, each 1920 x 1920 pixels in size.

MapInWild
├── Dual Pol Sentinel-1 (2 Bands)
├── Sentinel-2 (10 Bands)
│   ├── Spring
│   ├── Autumn
│   ├── Winter
│   ├── Summer
│   └── Single Temporal Subset
├── VIIRS Night Time Light (1 Band)
└── ESA WorldCover (1 Band)

As explained in the paper, the single temporal subset contains the most informative Sentinel-2 season for each area. It is recommended for users who are not interested in the multi-seasonal aspect of the dataset.

Following the torchgeo logic, here are the bands and their modality-level combinations.

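    from typing import Dict, Tuple

    # Band names grouped by modality; every value is a tuple of band names.
    BAND_SETS: Dict[str, Tuple[str, ...]] = {
        "all": (
            "VV",
            "VH",
            "B2",
            "B3",
            "B4",
            "B5",
            "B6",
            "B7",
            "B8",
            "B8A",
            "B11",
            "B12",
            "2020_Map",
            "avg_rad",
        ),
        "s1": ("VV", "VH"),
        "s2-rgb": ("B4", "B3", "B2"),
        "s2-all": (
            "B2",
            "B3",
            "B4",
            "B5",
            "B6",
            "B7",
            "B8",
            "B8A",
            "B11",
            "B12",
        ),
        # Single-element tuples need a trailing comma (braces would create sets).
        "esa_wc": ("2020_Map",),
        "viirs": ("avg_rad",),
    }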
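
As a rough illustration of how such band sets could be used, the sketch below maps a band set name to channel positions. It assumes that a sample stacks its channels in the order of the "all" entry, which is an assumption made here and not something stated above; band_indices is a hypothetical helper, not part of torchgeo.

```python
from typing import Dict, List, Tuple

# Abbreviated copy of the band sets, written with tuples throughout.
BAND_SETS: Dict[str, Tuple[str, ...]] = {
    "all": (
        "VV", "VH",
        "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12",
        "2020_Map",
        "avg_rad",
    ),
    "s1": ("VV", "VH"),
    "s2-rgb": ("B4", "B3", "B2"),
    "esa_wc": ("2020_Map",),
    "viirs": ("avg_rad",),
}


def band_indices(band_set: str) -> List[int]:
    """Return the positions of a band set's bands within the "all" ordering.

    Assumption: channels are stacked in the order of BAND_SETS["all"].
    """
    all_bands = BAND_SETS["all"]
    return [all_bands.index(band) for band in BAND_SETS[band_set]]


print(band_indices("s2-rgb"))  # [4, 3, 2]
print(band_indices("viirs"))   # [13]
```

Looking indices up by name keeps the RGB order (B4, B3, B2) intact even though the bands are stored in ascending order.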

Rationale

No response

Implementation

When the user calls MapInWild(root="data/", modalities=[...], download=True), each requested modality can be loaded as shown below:

    from datasets import load_dataset

    # Each modality lives in its own data_dir of the Hugging Face mirror.
    mask = load_dataset("burakekim/mapinwild", data_dir="mask")
    s1 = load_dataset("burakekim/mapinwild", data_dir="s1")
    viirs = load_dataset("burakekim/mapinwild", data_dir="viirs")
    esa_wc = load_dataset("burakekim/mapinwild", data_dir="esa_wc")
    s2 = load_dataset("burakekim/mapinwild", data_dir="s2_temporal_subset")
    s2_autumn = load_dataset("burakekim/mapinwild", data_dir="s2_autumn")
    s2_spring = load_dataset("burakekim/mapinwild", data_dir="s2_spring")
    s2_winter = load_dataset("burakekim/mapinwild", data_dir="s2_winter")
    s2_summer = load_dataset("burakekim/mapinwild", data_dir="s2_summer")

The s1 and s2 modalities are each larger than 50 GB, so they are split into two zip files. For these modalities, the num_proc=2 argument of load_dataset can be used.
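
A minimal sketch of how num_proc=2 might be applied only to the split modalities follows. The names build_load_kwargs, load_modality, and SPLIT_MODALITIES are hypothetical, and the choice of which data_dir values count as split ("s1" and "s2_temporal_subset") is an assumption based on the variable names above. The sketch also assumes a version of the datasets library whose load_dataset accepts num_proc.

```python
from typing import Any, Dict

REPO_ID = "burakekim/mapinwild"

# Assumption: these data_dir values are the ones split into two zip files.
SPLIT_MODALITIES = {"s1", "s2_temporal_subset"}


def build_load_kwargs(data_dir: str) -> Dict[str, Any]:
    """Build keyword arguments for load_dataset for one modality."""
    kwargs: Dict[str, Any] = {"path": REPO_ID, "data_dir": data_dir}
    if data_dir in SPLIT_MODALITIES:
        # Two zip files, so request two worker processes.
        kwargs["num_proc"] = 2
    return kwargs


def load_modality(data_dir: str):
    """Download and load one modality (not called here: it downloads many GB)."""
    # Imported lazily so build_load_kwargs can be used without the library.
    from datasets import load_dataset

    return load_dataset(**build_load_kwargs(data_dir))


print(build_load_kwargs("s1"))
print(build_load_kwargs("mask"))
```

Keeping the argument construction separate from the download makes it easy to check which modalities would be loaded in parallel before any data is fetched.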

Alternatives

No response

Additional information

No response


adamjstewart · 2023-02-08T16:32:00Z

We would love to have your dataset in TorchGeo! Would you like to try opening a PR to add it? I would suggest looking at recent PRs that added new datasets to get a full list of files you would need to add/modify. I'm happy to help review a draft when it's ready.

burakekim · 2023-02-08T16:39:06Z

Sure, I can give it a go!

adamjstewart added the datasets Geospatial or benchmark datasets label Feb 8, 2023

burakekim mentioned this issue Feb 21, 2023

Add MapInWild dataset #1131

Merged

adamjstewart closed this as completed in #1131 Sep 29, 2023

adamjstewart added this to the 0.5.0 milestone Sep 29, 2023
