# MDE-GEO-01-B Introduction to Earth System Data

Earth System Data? What are those?

Data related to Earth Science (including Geology, Geophysics, Marine Science, Weather/Climate Science, etc.) are very heterogeneous, often very large, and of varying complexity.

This course won't provide an exhaustive view of the entire set of available Earth Science data, nor get into deep data analysis matters, but it will give a general overview of the range of data types, standards and content used in Earth Science. It will also provide an overview of available archives and tools. These evolve fast: file formats not so much, but tools and **services** (more on that later) are increasing in number and capability by the month.

Try any search engine and you will find that there are even journals dedicated to Earth System Data. Skimming through their recent articles can give an idea of what Earth System Data are about, e.g.:

https://www.earth-syst-sci-data.net

Some of the papers above also link to repositories holding data associated with peer-reviewed publications. In some cases (see further below), data not linked to any peer-reviewed publication can also be found.

# Data:

* **Who collects data?**
    * Yourself (metadata? ;) )
    * Someone else (see above)
    * Some organised experiment / institution / system (see above, but more - or less - relaxed..)
* **Data vs. metadata**
    * Metadata are data about data. In practice, metadata are the bits of information needed to use, process or interpret data. Example: the map projection information for your GPS coordinates. Your x and y are in meters, but referred to... what?
* **Data vs. quicklooks**
    * Quicklooks (or previews, or browse data) are typically low resolution ~thumbnail visualisations of complete data products. They can allow quick inspection (but no real analysis) of data without the need to open/visualise real data.
* **Processing levels, raw vs. processed data**
    * Data can be available at different levels of processing and science-readiness, e.g. images collected by a sensor (such as a CCD) can have units of voltage, digital number, or some physical quantity. All three data representations might exist, but the most directly usable one is the **higher processing level** one, e.g. the physical quantity.
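
To make the link between processing levels and metadata more concrete, here is a minimal sketch of converting raw digital numbers into a physical quantity. It assumes a simple linear calibration; the `gain`, `offset` and unit strings are invented for illustration and do not belong to any real sensor.

```python
# Hypothetical metadata for one image: all calibration values are invented.
metadata = {
    "units_raw": "digital number",
    "units_calibrated": "W m-2 sr-1 um-1",
    "gain": 0.5,
    "offset": 1.0,
}

# Lower processing level: raw counts as delivered by the sensor.
raw_counts = [[120, 130], [125, 140]]


def calibrate(counts, meta):
    """Apply an assumed linear calibration: physical = gain * DN + offset."""
    return [[meta["gain"] * dn + meta["offset"] for dn in row] for row in counts]


# Higher processing level: the same pixels expressed as a physical quantity.
radiance = calibrate(raw_counts, metadata)
print(radiance[0][0], metadata["units_calibrated"])
```

Without the metadata, the raw counts could not be turned into anything physically meaningful, which is why the two are usually distributed together.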

## Dataset vs. data product
* For your own data, the distinction might be more blurred, e.g. a data product could be your notebook notes from a field station, and a dataset the entire day of work, or the entire campaign. E.g. [this](https://doi.org/10.5281/zenodo.1084885) is a dataset.
* A data product (or data granule) is typically the smallest unit that can be searched, discovered and downloaded, although that can be flexible depending on how the data are accessed. Example: if a camera collects several images, each is a data product, but each image possibly has millions of pixels, and each of them can be accessed as a sub-granule. Tools or services might provide access to various levels of granularity. The higher the flexibility, the easier it is to get exactly what one needs or wants (without downloading huge files).
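
As a rough illustration of sub-granule access, the sketch below treats a tiny nested list as one image (the granule) and extracts only a window of pixels. The `subset` helper and the toy image are hypothetical; real services expose this idea through their own interfaces.

```python
# A toy 4 x 6 "image": each value encodes its position as row * 10 + column.
granule = [[row * 10 + col for col in range(6)] for row in range(4)]


def subset(image, row_start, row_stop, col_start, col_stop):
    """Return the pixels in the half-open window [row_start:row_stop, col_start:col_stop]."""
    return [row[col_start:col_stop] for row in image[row_start:row_stop]]


# Request only rows 1-2 and columns 2-4 instead of the whole granule.
window = subset(granule, 1, 3, 2, 5)
print(window)
```

Here only 6 of the 24 pixels are returned; for real data products the saving can be the difference between kilobytes and gigabytes.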

## Data vs. Data services

Data access can be provided via services that are used either through a web interface or more or less programmatically. Most modern institutional providers of Earth Science data use some sort of web mapping service, e.g. see [USGS 3DEP view](https://viewer.nationalmap.gov/basic/?basemap=b1&category=ned,nedsrc&title=3DEP%20View).

Lately, data access services also provide a variable amount of data subsetting, visualisation and analysis capabilities, e.g.:

* NASA Giovanni - https://giovanni.gsfc.nasa.gov/giovanni/
* EO Data Service - https://eodataservice.org
* [NASA ocean motion viewer](http://oceanmotion.org/html/resources/oscar.htm)
* [UNAVCO GPS velocity viewer](https://www.unavco.org/software/visualization/GPS-Velocity-Viewer/GPS-Velocity-Viewer.html)
* [NOAA Climate Indices: Monthly Atmospheric and Ocean Time Series](https://www.esrl.noaa.gov/psd/data/climateindices/)
* [ESA global current portal](https://globcurrent.oceandatalab.com/?date=735652800000&timespan=3m%3B6m&products=3857_GlobCurrent_L4_geostrophic_streamline%2C3857_GlobCurrent_L4_geostrophic_nrt_vectorfield%2C3857_GlobCurrent_drogue_15m%2C3857_ODYSSEA_SAF_SST%2C3857_ODYSSEA_MED_SST%2C3857_ODYSSEA_NWE_SST%2C3857_ODYSSEA_SST&extent=-2951577.706399%2C6082935.2252079%2C179282.971726%2C7658149.5038895&opacity=100%2C100%2C100%2C100%2C100%2C100%2C100&stackLevel=120%2C120.01%2C140%2C50%2C50.01%2C50.02%2C10)
* [Copernicus Global Flood Awareness System](http://globalfloods.jrc.ec.europa.eu/)
* [USGS Spectral Library viewer](https://crustal.usgs.gov/speclab/QueryAll07a.php)
* [CU Grace data portal](http://geoid.colorado.edu/grace/dataportal.html)

## Data by discipline

Disciplines often have their own data formats and standards. This has slightly improved in recent decades thanks to increasing convergence and standardisation of data formats. Still, one has to face many...

## Data by type/organisation


* Individual scientists or groups/institutions can use data repositories.
* The standards used are less rigorous than those of institutional repositories, i.e. one can upload more or less any kind of data. 
* Unless linked with a peer-reviewed data publication, there is no a priori control on the data. 
 
Example non-institutional data repositories, totally or partially used for Earth Science, include:
* https://www.pangaea.de
* https://zenodo.org
* https://figshare.com

Data sharing and workflow/reproducibility aspects are covered e.g. by:
* https://osf.io



# Tasks: data portals inspection
    
## T2.1

* Please inspect the web services above. Document each with a screenshot and a short description (1 sentence) of what the interface looks like. Please also document your impression of the ease of use.

## T2.2

* Please list the advantages and disadvantages of each web interface you encountered. A bullet list is appropriate.

# Workflow(s)

Workflows can be slightly different depending on the nature of the data, their acquisition, their type and structure, as well as the objectives and background of the investigators (source: Enthought).

<img src="./images/enthoughts-data.png" width=600>


# Archives

A multitude of archives exist and they are growing in number, complexity and data availability.

See e.g. https://ropensci.org/blog/2013/09/11/taskview/

# Tools

A wide variety of tools exists. Some are old, some unmaintained, and a few are actively developed. The latter might be more attractive, but keeping up to date is challenging nowadays. There are not many curated lists. We will use our partial, non-exhaustive list and use a few tools as examples.

GUI tools can be found on fatcat/DE/Earth/Exercises/Extra/resources

Types of tools used might be: 

* ~Low-level Programming tools (self-made or made by someone else)
    * e.g. https://confluence.ecmwf.int/display/ECC
* CLI or GUI programs with pre-configured capabilities
    * e.g. http://www.gdal.org (command line)
    * e.g. [NASA Panoply netCDF, HDF, Grib viewer](https://www.giss.nasa.gov/tools/panoply/)
* tools based on high-level programming languages (e.g. IDL-,  Matlab-, or python-based)
    * e.g. https://github.com/topics/earth-science (a good way to discover recent tools, BTW)
* GUI tools with capabilities expandable via plugins
    * e.g. http://qgis.org
* Tools running entirely on the web, e.g. via web browsers
    * e.g. http://eodataservice.org

## Tools by type

### Data discovery and search

Data discovery tools allow for finding data by theme or application and then refining the results. Examples are mentioned e.g. in the Earth Observation-based [CEOS OpenSearch](https://earthdata.nasa.gov/user-resources/standards-and-references/ceos-opensearch) documentation. Useful information is also included in the [Dataset Interoperability Recommendations for Earth Science](https://cdn.earthdata.nasa.gov/conduit/upload/5098/ESDS-RFC-028v1.1.pdf). More in the notebook on [Data archives](later).

Data search tools, often merged nowadays with data discovery, allow for parameter-based (e.g. by latitude/longitude range, date) or graphical, interactive map-based search.

e.g. 

UK Marine data discovery service - https://www.geodata.soton.ac.uk/geodata/web/project202
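
A parameter-based search boils down to filtering catalogue records by ranges. The sketch below assumes a tiny in-memory catalogue; the record layout, the product names and the `search` helper are all hypothetical and only meant to show the idea, not how any specific portal works.

```python
from datetime import date

# Hypothetical catalogue: one record per data product, with minimal metadata.
catalogue = [
    {"id": "product-A", "lat": 53.1, "lon": 8.8, "date": date(2017, 3, 1)},
    {"id": "product-B", "lat": 40.4, "lon": -3.7, "date": date(2017, 6, 15)},
    {"id": "product-C", "lat": 54.3, "lon": 10.1, "date": date(2018, 1, 20)},
]


def search(records, lat_range, lon_range, date_range):
    """Keep records whose latitude, longitude and date all fall inside the given ranges."""
    return [
        rec
        for rec in records
        if lat_range[0] <= rec["lat"] <= lat_range[1]
        and lon_range[0] <= rec["lon"] <= lon_range[1]
        and date_range[0] <= rec["date"] <= date_range[1]
    ]


hits = search(
    catalogue,
    lat_range=(50.0, 56.0),
    lon_range=(5.0, 12.0),
    date_range=(date(2017, 1, 1), date(2017, 12, 31)),
)
print([rec["id"] for rec in hits])
```

A map-based search does essentially the same thing, with the latitude/longitude ranges taken from a box drawn on the map.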

### Data Access

Data access can be provided via clients that also offer discovery and search capabilities, or via direct download (e.g. via FTP/HTTP) of individual granules or entire datasets, typically organised within a directory structure.
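
Archives organised as directory trees often encode product and date in the path, so a download location can be built programmatically. The following is a sketch under assumptions: the base URL, the `product/year/day-of-year` layout and the file naming are hypothetical and do not describe any real archive.

```python
from datetime import date

# Hypothetical archive root; real archives define their own layout.
BASE_URL = "https://example.org/archive"


def granule_url(day, product="sst"):
    """Build the URL of one daily granule in an assumed product/year/day-of-year tree."""
    day_of_year = day.timetuple().tm_yday
    return f"{BASE_URL}/{product}/{day.year}/{day_of_year:03d}/{product}_{day:%Y%m%d}.nc"


print(granule_url(date(2017, 11, 28)))
```

Once the layout of a given archive is known, a loop over dates with a helper like this is usually enough to fetch a whole time series of granules.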

### Data inspection

e.g.
NOAA trackline geophysical data viewer - https://maps.ngdc.noaa.gov/viewers/geophysics/

### Data analysis 

e.g. 
* NASA Giovanni - https://giovanni.gsfc.nasa.gov/giovanni/
* EO Data Service - http://eodataservice.org

### Data assimilation

e.g. see 
* ECMWF Data assimilation - https://www.ecmwf.int/en/research/data-assimilation
    * https://software.ecmwf.int/wiki/display/LDAS/LDAS+Home




# Data formats (used or useful) by discipline

Some formats and related tools worth noting, some of which we will use during the classes, are listed below, divided roughly by subject (NOTE: the list is not exhaustive!).

Data formats can be heterogeneous, especially for non-institutional data repositories, but several standards exist and are partially shared across disciplines.

See also [USGS data management & formats](https://www2.usgs.gov/datamanagement/plan/dataformats.php)

### Climate science

#### Common data formats in Climate Science

Please see also https://climatedataguide.ucar.edu/climate-tools#climate-data-formats (source of the list below):

* [GRIB1](http://www.nco.ncep.noaa.gov/pmb/docs):    GRIdded Binary (Edition 1), World Meteorological Organization
* [GRIB2](http://www.nco.ncep.noaa.gov/pmb/docs/grib2/):    GRIdded Binary (Edition 2), World Meteorological Organization
* [netCDF3](http://www.unidata.ucar.edu/software/netcdf/#netcdf_faq):  Network Common Data Form, (Version 3.x), Unidata (UCAR/NCAR)
* [netCDF4](http://www.unidata.ucar.edu/software/netcdf/#netcdf_faq):  Network Common Data Form, (Version 4.x), Unidata (UCAR/NCAR)
* [HDF4](http://www.hdfgroup.org/products/hdf4/):     Hierarchical Data Format, (Version 4.x),  NCSA/NASA
* [HDF4-EOS2](http://www.hdfgroup.org/hdfeos.html): HDF4-Earth Observing System, (Version 2; georeferenced data)
* [HDF5](http://www.hdfgroup.org/HDF5/h5site-levels.html):     Hierarchical Data Format, (Version 5.x),  NCSA/NASA
* [HDF5-EOS5](http://www.hdfgroup.org/HDF5/h5site-levels.html): HDF5-Earth Observing System, (Version 5; georeferenced data)
* [GeoTIFF](http://trac.osgeo.org/geotiff/): Georeferenced raster imagery


Plus: 

* [OGC](http://www.opengeospatial.org/docs/is) (WMS, WCS, et al.)

### Geology

Several formats are used. Raster geodata are often stored in geospatial formats:

* ASCII

* GeoTIFF, other [GDAL formats](http://www.gdal.org/formats_list.html)
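
ASCII rasters usually carry a small header (metadata) followed by the grid values. The sketch below assumes the widespread Esri ASCII grid layout (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`); the sample values and the `read_ascii_grid` helper are invented for illustration, and in practice a library such as GDAL would do this job.

```python
import io

# Invented sample in Esri ASCII grid layout: 6 header lines, then the rows.
ASCII_GRID = """ncols 3
nrows 2
xllcorner 500000.0
yllcorner 5800000.0
cellsize 30.0
NODATA_value -9999
10.5 11.0 -9999
9.8 10.1 10.7
"""


def read_ascii_grid(stream, header_lines=6):
    """Parse the header into a dict and the grid into rows, mapping no-data cells to None."""
    header = {}
    for _ in range(header_lines):
        key, value = stream.readline().split()
        header[key.lower()] = float(value)
    nodata = header["nodata_value"]
    rows = []
    for line in stream:
        if line.strip():
            values = [float(item) for item in line.split()]
            rows.append([None if v == nodata else v for v in values])
    return header, rows


header, grid = read_ascii_grid(io.StringIO(ASCII_GRID))
print(header["cellsize"], grid[0])
```

Note that the header says nothing about the coordinate reference system, so the corner coordinates remain ambiguous unless that information is supplied separately.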

Vector geodata, e.g. measurements represented as points, polylines or polygons:

* Shapefile, GeoJSON, and other [OGR formats](http://www.gdal.org/ogr_formats.html)
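
For a feel of what a vector format looks like, here is a minimal sketch that writes and reads back a single point measurement as GeoJSON using only the standard library. The station name, coordinates and `dip_deg` attribute are hypothetical; note that GeoJSON stores coordinates as `[longitude, latitude]`.

```python
import json

# One hypothetical field measurement as a GeoJSON point feature.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [8.65, 53.17]},  # [longitude, latitude]
    "properties": {"station": "hypothetical-site-1", "dip_deg": 35},
}
collection = {"type": "FeatureCollection", "features": [feature]}

# Serialise to text (what would be stored in a .geojson file) and parse it back.
text = json.dumps(collection, indent=2)
restored = json.loads(text)

lon, lat = restored["features"][0]["geometry"]["coordinates"]
print(lon, lat, restored["features"][0]["properties"]["dip_deg"])
```

The geometry holds the "where", while the `properties` dictionary holds the measured attributes, which is the same split found in Shapefiles and other vector formats.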

### Geophysics

Lots of specialised, proprietary data formats are used. Some standards and formats exist, e.g.

* [SEG-Y](https://wiki.seg.org/wiki/SEG-Y)
    * more on [Agile Geoscience post on SEG-Y, 2014](https://agilescientific.com/blog/2014/3/26/what-is-seg-y.html)
* [LAS](http://www.cwls.org/las/)
* netCDF
* HDF
* other ASCII
* ...

### Marine science

* HDF
* netCDF
* ASCII
* ...

### Remote Sensing

Several, partially shared by other communities, such as Climate/Weather (see above)

* HDF
* GeoTIFF
* CEOS
* proprietary formats, mostly supported by [GDAL](http://www.gdal.org/)


### GIS 

* Raster [GDAL formats](http://www.gdal.org/formats_list.html)
* Vector [OGR formats](http://www.gdal.org/ogr_formats.html)
* [OGC](http://www.opengeospatial.org/docs/is) (WMS, WCS, et al.)


# Tasks

## Task 2.3

Please download and install any *two* of the GUI tools (e.g. for opening HDF or netCDF) listed above and in the resources (you might need to use your own computer / laptop for this). Please open any data product, document it with screenshots and describe any difficulties you encounter.
<div class="alert alert-block alert-danger">  GUI tools information can be found on fatcat Data/resources.md or the ipython notebook with "resources" in the filename </div>

# References and useful links

* Davis, J. C. (2012) [Statistics and Data Analysis in Geology](https://jacobsuniversity.on.worldcat.org/oclc/856822059), 3rd edition, 638 p., ISBN: 978-0471172758, Wiley, New York.

* Trauth, M. (2007) [MATLAB recipes for Earth Sciences](https://jacobsuniversity.on.worldcat.org/oclc/654397287), 2nd edition, 336 p., ISBN: 978-3-540-72748-4, Springer, Berlin Heidelberg New York.

* [https://github.com/ltauxe](https://github.com/ltauxe) (2017) [Python-for-Earth-Science-Students](https://github.com/ltauxe/Python-for-Earth-Science-Students)

* [https://github.com/koldunovn](https://github.com/koldunovn) (2017) [Python for Geosciences](https://github.com/koldunovn/python_for_geosciences)

* ECMWF (2017) [Workshop on developing Python frameworks for earth system sciences](https://www.ecmwf.int/en/learning/workshops/workshop-developing-python-frameworks-earth-system-sciences), ECMWF | Reading | 28-29 November 2017

* http://www.scipy-lectures.org/