gsaltintas/supremap-data

Supremap Data

This repository provides functionality to

  • Query and download data from Sentinel, provided via the Copernicus API, and from Swisstopo, provided by the Swiss Federal Office of Topography.
  • Query OpenStreetMap (OSM) to fetch map data.
  • Plot and create segmentation and instance maps of geotagged TIFFs using the OSM API.
  • Split the TIFF files, which are 10,000×10,000 pixels for Swisstopo, into patches of a desired size.
  • Preprocess Swisstopo and Sentinel data together: match and align tiles, convert CRS, etc.
  • Run a unified data-creation pipeline.
  • Downsample or resize images.
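The tile-splitting step above can be sketched as follows. This is an illustrative helper, not the repository's actual implementation; it assumes the tile has already been loaded as a NumPy array (e.g. via rasterio or GDAL):

```python
import numpy as np

def split_into_patches(tile: np.ndarray, patch_size: int = 256):
    """Split an H x W (x C) image array into non-overlapping square patches.

    Trailing rows/columns that do not fill a complete patch are discarded.
    """
    h, w = tile.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(tile[y:y + patch_size, x:x + patch_size])
    return patches

# Small demo: a 1024x1024 image yields a 4x4 grid of 256-pixel patches.
# A full 10,000x10,000 Swisstopo tile would yield 39 x 39 = 1521 patches.
tile = np.zeros((1024, 1024, 3), dtype=np.uint8)
print(len(split_into_patches(tile)))  # 16
```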

Data Sources

We provide a single entry point, download_data.py, for data download. For all available options, such as the maximum number of rows to query or the result ordering, run python download_data.py --help.

Swisstopo

Swisstopo data download is straightforward. For example, the following query returns all images captured at 10 cm ground sampling distance (the other available option is 2 m) over Switzerland from January 1st, 2018 to December 31st, 2018. Note that the date_range argument must be valid JSON.

python download_data.py --swisstopo --bbox "[5,46,10,48]" --date_range "[\"2018-01-01\", \"2018-12-31\"]" \
    --resolution 0.1  --save_dir "../out"
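Because --date_range is parsed as JSON, the quotes inside the list must survive the shell, hence the escaped quotes above. A quick way to sanity-check a value before passing it:

```python
import json

# The string exactly as download_data.py would receive it after shell unescaping.
date_range = '["2018-01-01", "2018-12-31"]'
start, end = json.loads(date_range)
print(start, end)  # 2018-01-01 2018-12-31
```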

You can also query for a single item using the following command. This is useful when you want a specific tile or have downloaded a CSV listing files from the Swisstopo website.

python download_data.py --swisstopo --id ID --save_dir "../out"

Copernicus API and Sentinel

The 10 m, 20 m, and 60 m bands from Sentinel-2 can be downloaded via the following command. For an extensive list of options, call the program with --help.

python download_data.py --sentinel --bbox "[5,46,10,48]" --date_range "[\"2022-01-01\", \"2022-01-05\"]" --save_dir "../out"

We use the OpenSearch API to query Sentinel data; other options such as the Sentinelsat library and AWS were considered but did not provide the flexibility we require. See here for the dhusget.sh script and here for the API options.
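For illustration, an OpenSearch query string for the bounding box and date range above could be assembled as below. This is a sketch, not the repository's code; build_opensearch_query is a hypothetical helper, and the keywords (platformname, footprint, beginposition) follow the public Copernicus OpenSearch API:

```python
def build_opensearch_query(bbox, start_date, end_date):
    """Build an OpenSearch 'q' parameter selecting Sentinel-2 products.

    bbox is (min_lon, min_lat, max_lon, max_lat) in WGS-84 degrees.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    # Counter-clockwise WKT polygon, closed by repeating the first vertex.
    polygon = (
        f"POLYGON(({min_lon} {min_lat},{max_lon} {min_lat},"
        f"{max_lon} {max_lat},{min_lon} {max_lat},{min_lon} {min_lat}))"
    )
    return (
        f'platformname:Sentinel-2 AND footprint:"Intersects({polygon})" '
        f"AND beginposition:[{start_date}T00:00:00.000Z TO {end_date}T23:59:59.999Z]"
    )

q = build_opensearch_query((5, 46, 10, 48), "2022-01-01", "2022-01-05")
# The query would then be sent authenticated, roughly:
#   requests.get(os.environ["DHUS_URL"] + "/search",
#                params={"q": q, "rows": 100},
#                auth=(os.environ["DHUS_USER"], os.environ["DHUS_PASSWORD"]))
```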

Full Dataset Creation

We use the same dataset to train both the pix2pixHD and Real-ESRGAN models and implemented a pipeline that

  • queries and downloads Swisstopo and Sentinel data for a given bounding box or point,
  • fixes alignment issues between the two formats,
  • creates segmentation and instance maps for each tile, and
  • patchifies the tiles (Swisstopo, Sentinel, segmentation map, and instance map) into smaller PNGs (configurable; 256×256 pixels by default) and saves each patch's bounding box as a GeoJSON.
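The per-patch bounding-box export in the last step could look roughly like this. It is a sketch under the assumption that the tile's geographic extent is known, square, and axis-aligned; patch_bbox_geojson is a hypothetical helper:

```python
import json

def patch_bbox_geojson(tile_bounds, tile_size_px, patch_origin_px, patch_size_px=256):
    """Return a GeoJSON Feature for one patch's geographic bounding box.

    tile_bounds: (min_lon, min_lat, max_lon, max_lat) of the whole tile.
    patch_origin_px: (x, y) pixel offset of the patch's top-left corner.
    """
    min_lon, min_lat, max_lon, max_lat = tile_bounds
    deg_per_px_x = (max_lon - min_lon) / tile_size_px
    deg_per_px_y = (max_lat - min_lat) / tile_size_px
    x, y = patch_origin_px
    west = min_lon + x * deg_per_px_x
    east = west + patch_size_px * deg_per_px_x
    north = max_lat - y * deg_per_px_y   # pixel row 0 is the northern edge
    south = north - patch_size_px * deg_per_px_y
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[west, south], [east, south],
                             [east, north], [west, north], [west, south]]],
        },
    }

feature = patch_bbox_geojson((7.0, 47.0, 7.1, 47.1), 10_000, (0, 0))
geojson_str = json.dumps(feature)  # ready to write to a .geojson file
```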

CSV files with WGS-84 coordinates in the columns x_center and y_center can be provided as input to create_imaginaire_dataset.py in order to use the Swisstopo tiles covering these specific locations.
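Reading such a CSV needs only the standard library; the column names x_center and y_center are the ones stated above, and the two sample rows here are illustrative values, not rows from the provided CSVs:

```python
import csv
import io

# In practice this would be open("csvs/<name>.csv"); a small inline sample here.
sample = "x_center,y_center\n8.5478,47.3763\n6.5668,46.5191\n"
points = [(float(row["x_center"]), float(row["y_center"]))
          for row in csv.DictReader(io.StringIO(sample))]
print(points)  # [(8.5478, 47.3763), (6.5668, 46.5191)]
```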

We have prepared the following CSV files for demonstration:

  • csvs/supremap_swisstopo_zurich_lausanne_interlaken_mini.csv: contains one point located in a Swisstopo tile showing ETH Zürich's Hauptgebäude
  • csvs/supremap_swisstopo_zurich_lausanne_interlaken_small.csv: contains a selection of 5 points from Zurich, Lausanne and Interlaken
  • csvs/supremap_swisstopo_zurich_lausanne_interlaken_large.csv: contains 309 points across Zurich, Lausanne and Interlaken

Use the following commands to create corresponding datasets:

  • Mini dataset: python src/create_imaginaire_dataset.py --csv=csvs/supremap_swisstopo_zurich_lausanne_interlaken_mini.csv --output-dir=datasets/supremap_swisstopo_zurich_lausanne_interlaken_mini
  • Small dataset: python src/create_imaginaire_dataset.py --csv=csvs/supremap_swisstopo_zurich_lausanne_interlaken_small.csv --output-dir=datasets/supremap_swisstopo_zurich_lausanne_interlaken_small
  • Large dataset (creation takes a long time): python src/create_imaginaire_dataset.py --csv=csvs/supremap_swisstopo_zurich_lausanne_interlaken_large.csv --output-dir=datasets/supremap_swisstopo_zurich_lausanne_interlaken_large

Please see create_imaginaire_dataset.py for more details.

Dataset

We perform the train-test split while creating the data. The total number of pixelwise instances of each class in our final dataset can be seen below:

Figure: Dataset Pixelwise Category Count

  • Note that we have 19177 beach pixels in the training set and 2459 in the validation set.
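The split performed during dataset creation can be sketched as a deterministic shuffle of patch filenames. This is an illustrative sketch, not the repository's exact procedure, and the 90/10 ratio is an assumption:

```python
import random

def train_val_split(filenames, val_fraction=0.1, seed=0):
    """Deterministically shuffle filenames and split them into train/val lists."""
    files = sorted(filenames)            # sort first so the split is reproducible
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_fraction)
    return files[n_val:], files[:n_val]

train, val = train_val_split([f"patch_{i:04d}.png" for i in range(100)])
print(len(train), len(val))  # 90 10
```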

See some samples from our dataset in the figures folder.

Requirements

Please install the required Python packages via

pip install -r requirements.txt

Copernicus

The Sentinel API requires authentication. Follow the instructions here to sign up, then add the following lines to your .bashrc:

export DHUS_USER="YOUR_USERNAME"
export DHUS_PASSWORD="YOUR_PASSWORD"
export DHUS_URL="https://apihub.copernicus.eu/apihub"
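Before running a download, a script can verify that all three credentials are set. A small sketch; the variable names are the ones above and check_credentials is hypothetical:

```python
import os

REQUIRED = ("DHUS_USER", "DHUS_PASSWORD", "DHUS_URL")

def check_credentials(env=os.environ):
    """Return the names of missing credential variables (empty tuple when all set)."""
    return tuple(name for name in REQUIRED if not env.get(name))

missing = check_credentials({"DHUS_USER": "u", "DHUS_PASSWORD": "p"})
print(missing)  # ('DHUS_URL',)
```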

Install GDAL

If you have difficulty installing GDAL, try installing it through conda-forge:

conda install -c conda-forge gdal
