Sharing New Data

Martin Gauch edited this page Jun 20, 2022 · 4 revisions

Caravan is a community project. It depends on contributions from people around the world who share data. If you have data for one of the white spots on our map, this document is for you. It describes how to share data in a way that makes it usable with the rest of Caravan.

[Image: map of Caravan basins]

Overview

To make your data part of Caravan, you need two things:

  1. The data itself. This includes:
    1. Catchment attributes
    2. Timeseries data
    3. A Shapefile of catchments
  2. License information. Your data must be shared under a permissive license. We recommend CC-BY-4.0, but other licenses may be compatible, too.

This guide assumes that you have followed the tutorial on how to extend Caravan and have successfully run the two notebooks that download and locally post-process data from Google Earth Engine. The following sections explain what to do once you have collected all the data and are ready to share it with the community.

Preparing the Data

After following the Caravan extension guide with your data, you should have a dataset folder on your disk whose root contains the following folders:

  • attributes
  • timeseries with subfolders csv/ and netcdf/
  • shapefiles
  • licenses

Each of these folders must contain a subfolder named {BASIN_PREFIX} and nothing else. {BASIN_PREFIX} is the short string that identifies your dataset; it must be unique within the Caravan data space. This way, your new sub-dataset can easily be merged with the existing Caravan data by copying the {BASIN_PREFIX} folders into the official Caravan folders.
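The layout rule above can be checked mechanically before you upload anything. The following is a minimal sketch, not an official Caravan tool: the function name check_layout and the constant EXPECTED_PARENTS are made up for illustration, and it assumes that for timeseries the "{BASIN_PREFIX} and nothing else" rule applies to the csv/ and netcdf/ subfolders.

```python
from pathlib import Path

# Folders that should each hold exactly one {BASIN_PREFIX} subfolder.
# Assumption: for timeseries, the rule applies inside csv/ and netcdf/.
EXPECTED_PARENTS = [
    Path("attributes"),
    Path("timeseries") / "csv",
    Path("timeseries") / "netcdf",
    Path("shapefiles"),
    Path("licenses"),
]


def check_layout(root, basin_prefix):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    for parent in EXPECTED_PARENTS:
        folder = root / parent
        if not folder.is_dir():
            problems.append(f"missing folder: {parent}")
            continue
        entries = sorted(p.name for p in folder.iterdir())
        if entries != [basin_prefix]:
            problems.append(
                f"{parent} should contain only '{basin_prefix}', found {entries}"
            )
        elif not (folder / basin_prefix).is_dir():
            problems.append(f"{parent / basin_prefix} is not a folder")
    return problems
```

Calling check_layout("path/to/dataset", "myprefix") with your own path and prefix returns an empty list when every folder contains only the prefix subfolder.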

Catchment Attributes

The attributes/{BASIN_PREFIX}/ folder should contain two comma-separated .csv files with exactly the following names:

  • attributes_caravan_{BASIN_PREFIX}.csv
  • attributes_hydroatlas_{BASIN_PREFIX}.csv
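Because the file names must match exactly, a small check can catch typos early. This sketch only tests that the two files named above exist; the helper names expected_attribute_files and missing_attribute_files are hypothetical, and the check does not look at the file contents.

```python
from pathlib import Path


def expected_attribute_files(basin_prefix):
    """Build the two attribute file names required by this guide."""
    return [
        f"attributes_caravan_{basin_prefix}.csv",
        f"attributes_hydroatlas_{basin_prefix}.csv",
    ]


def missing_attribute_files(root, basin_prefix):
    """Return the required attribute file names that are not present on disk."""
    folder = Path(root) / "attributes" / basin_prefix
    return [
        name
        for name in expected_attribute_files(basin_prefix)
        if not (folder / name).is_file()
    ]
```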

Timeseries Data

The timeseries folder must contain the time-series data (meteorological forcings, streamflow) as both csv files (in the csv/{BASIN_PREFIX}/ subfolder) and as netCDF files (in the netcdf/{BASIN_PREFIX}/ subfolder).
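Since the same data has to be present in both formats, it can help to confirm that the two subfolders cover the same basins. The sketch below rests on assumptions this page does not state: one file per basin, the same base name in both folders, and the extensions .csv and .nc. The function name unmatched_timeseries is hypothetical.

```python
from pathlib import Path


def unmatched_timeseries(root, basin_prefix):
    """Compare base names of CSV and netCDF files for one sub-dataset.

    Returns (only_in_csv, only_in_netcdf) as two sorted lists.
    Assumes one file per basin and the extensions .csv and .nc.
    """
    timeseries = Path(root) / "timeseries"
    csv_ids = {p.stem for p in (timeseries / "csv" / basin_prefix).glob("*.csv")}
    nc_ids = {p.stem for p in (timeseries / "netcdf" / basin_prefix).glob("*.nc")}
    return sorted(csv_ids - nc_ids), sorted(nc_ids - csv_ids)
```

Two empty lists mean that every basin found in one format also has a file in the other.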

Shapefiles

The shapefiles/{BASIN_PREFIX}/ folder should contain a shapefile of all catchments that you are contributing to Caravan.

Licenses

The licenses/{BASIN_PREFIX}/ folder should contain a single markdown file called license_{BASIN_PREFIX}.md, which contains information on the license, sources, and references for your data. Take a look at the license files from existing Caravan sub-datasets to get an idea of what this file should look like.

Note that your data must be shareable under a permissive license that is compatible with CC-BY-4.0.
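The "single markdown file" requirement is also easy to verify. This is a minimal sketch with a hypothetical function name, license_folder_ok; it checks only the file name and count, not whether the license text itself is compatible with CC-BY-4.0.

```python
from pathlib import Path


def license_folder_ok(root, basin_prefix):
    """True if licenses/{prefix}/ holds exactly one entry, license_{prefix}.md."""
    folder = Path(root) / "licenses" / basin_prefix
    if not folder.is_dir():
        return False
    entries = sorted(p.name for p in folder.iterdir())
    return entries == [f"license_{basin_prefix}.md"]
```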

Publishing the Data

When all data is in the correct format and folder structure, you are ready to upload it to the Zenodo data archive. Zenodo is a free service where you can upload your data and get a DOI for it.
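If you prefer to upload one file rather than many, you could bundle the dataset root into a zip archive first. This page does not prescribe an upload format, so treat the following as one possible approach; the function name pack_dataset is hypothetical.

```python
import shutil
from pathlib import Path


def pack_dataset(root, archive_base):
    """Zip the contents of root into '<archive_base>.zip' and return its path.

    archive_base should point outside root so the archive does not
    end up inside the folder that is being zipped.
    """
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=str(root))
    return Path(archive)
```

Paths inside the archive are relative to the dataset root, so the attributes, timeseries, shapefiles, and licenses folders appear at the top level after unpacking.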

Telling People About the Data

Once your data is published on Zenodo, head over to the issues section of the Caravan GitHub page and create a new entry using the "Data Contribution" template that's provided there. The template asks you to fill in some information on the dataset. Once all information is complete, a Caravan maintainer will post the information about your contribution in the New Data discussion thread.

That's it! Congratulations and a big thank you for sharing your data!