MapInWild is a large-scale multi-modal dataset curated for the novel task of wilderness mapping from space, introduced in our recent paper. We would be glad to see our dataset in torchgeo!
The dataset is originally hosted on Harvard Dataverse. Following @calebrob6's suggestion, I have mirrored it on Hugging Face to make things easier for everyone.
The original GitHub repository of the dataset is here.
Wilderness mapping is a supervised pixel-wise classification task. Although the annotations contain four classes/pixel values {0: Background, 1: Strict nature reserve, 2: Wilderness Area, 3: National Park}, we perform annotation[annotation != 0] = 1 and frame the task as a binary classification problem where 0 is Background and 1 is Wilderness Area.
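The relabeling step described above can be sketched with NumPy as follows. This is only a minimal illustration on a small made-up array, not the dataset's actual loading code, and the helper name `binarize_annotation` is hypothetical.

```python
import numpy as np


def binarize_annotation(annotation: np.ndarray) -> np.ndarray:
    """Collapse the three protected-area classes (1, 2, 3) into a single class 1.

    `binarize_annotation` is a hypothetical helper name used for illustration.
    """
    binary = annotation.copy()  # leave the caller's array untouched
    binary[binary != 0] = 1     # same operation as annotation[annotation != 0] = 1
    return binary


# Toy 2 x 3 mask with made-up values covering all four original classes.
toy_mask = np.array([[0, 1, 2], [3, 0, 2]], dtype=np.uint8)
binary_mask = binarize_annotation(toy_mask)
```

Copying before relabeling keeps the original four-class mask available, which may be useful if the finer-grained classes are needed later.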
The annotations (in the form of polygons) are derived from the World Database on Protected Areas (WDPA).
The dataset is around 350 GB in size, so each modality is zipped separately, letting users download only the modalities they want to work with.
There are 1018 areas in the dataset, and each area contains the following modalities, each with a size of 1920 x 1920 pixels.
MapInWild
├── Dual Pol Sentinel-1 (2 Bands)
├── Sentinel-2 (10 Bands)
│ ├── Spring
│ ├── Autumn
│ ├── Winter
│ ├── Summer
│ └── Single Temporal Subset
├── VIIRS Night Time Light (1 Band)
└── ESA WorldCover (1 Band)
As explained in the paper, the single temporal subset includes the most informative Sentinel-2 season for each area. It is suggested for users who are not interested in the multi-seasonal aspect of the dataset.
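Going by the band counts in the tree above, the channel count of a stacked input could be computed as in the sketch below. The keys `s1` and `s2` are used elsewhere in this issue; `viirs` and `esa_wc` are hypothetical short names, and `num_channels` is an illustrative helper rather than part of any existing API.

```python
# Band counts taken from the directory tree above. The short keys "viirs" and
# "esa_wc" are hypothetical names chosen for this sketch.
BAND_COUNTS = {
    "s1": 2,      # Dual-pol Sentinel-1
    "s2": 10,     # Sentinel-2 (one season, or the single temporal subset)
    "viirs": 1,   # VIIRS Night Time Light
    "esa_wc": 1,  # ESA WorldCover
}


def num_channels(modalities: list[str]) -> int:
    """Return the channel count of an image that stacks the given modalities."""
    unknown = [m for m in modalities if m not in BAND_COUNTS]
    if unknown:
        raise KeyError(f"unknown modalities: {unknown}")
    return sum(BAND_COUNTS[m] for m in modalities)


s1_s2_channels = num_channels(["s1", "s2"])      # Sentinel-1 + Sentinel-2
all_channels = num_channels(list(BAND_COUNTS))   # every modality, one S2 season
```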
Following the torchgeo logic, here are the bands and their modality-level combinations.
The s1 and s2 modalities are each larger than 50 GB and are split into two zip files. For these modalities, the num_proc=2 argument of load_dataset can be used.
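One way the `num_proc=2` hint might be wired up is sketched below. It assumes that each modality is exposed as a dataset configuration (the `name` argument of `load_dataset`), which this issue does not confirm; the repository ID is left for the caller to supply, and both helper names are hypothetical.

```python
# Modalities that, according to the issue, are split into two zip files.
SPLIT_MODALITIES = {"s1", "s2"}


def load_kwargs(modality: str) -> dict:
    """Extra keyword arguments for `datasets.load_dataset` for one modality."""
    return {"num_proc": 2} if modality in SPLIT_MODALITIES else {}


def load_modality(repo_id: str, modality: str):
    """Load one modality from a Hugging Face dataset repository.

    Assumption: each modality is a dataset configuration selected via `name`.
    `repo_id` must be supplied by the caller; it is not given in this sketch.
    """
    # Imported lazily so the helper above works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(repo_id, name=modality, **load_kwargs(modality))
```

Keeping the keyword selection in its own function makes it easy to check without downloading anything.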
Alternatives
No response
Additional information
No response
We would love to have your dataset in TorchGeo! Would you like to try opening a PR to add it? I would suggest looking at recent PRs that added new datasets to get a full list of files you would need to add/modify. I'm happy to help review a draft when it's ready.
Implementation
When requested by the user through MapInWild(root="data/", modalities=[...], download=True), any modality can be loaded.