# Raster formats and libraries

This notebook is based on content from previous geohackweek raster tutorials: https://github.com/geohackweek/raster

<blockquote class="objectives">
    
## Overview

<div style="float: left;">

**teaching:** 15 minutes
    
**exercises:** 0
    </div> 
    
|questions| objectives | 
|:----------| :---  |
| What sorts of formats are available for representing raster datasets?  | Understand the high-level data interchange formats for raster datasets. |  

</blockquote>

### Table of contents

1. [**Raster Data Format**](#Raster-Data-Format)
    1. [About Raster Data](#About-Raster-Data)
1. [**Geospatial Data Abstraction Library (GDAL)**](#Geospatial-Data-Abstraction-Library-(GDAL))
    1. [Converting formats](#Converting-formats)
    1. [Reprojection](#Reprojection)
    1. [Visualization and computation](#Visualization-and-computation)
1. [**Raster Tiles**](#Raster-Tiles)  

### Raster Data Format 

Raster data can come in many different formats. One of the most common is the `GeoTIFF` format, which has the extension `.tif`. A `.tif` file stores metadata or attributes about the file as embedded `tif tags`. For instance, when your camera saves a `.tif`, it might store EXIF tags that describe the make and model of the camera or the date the photo was taken. A GeoTIFF is a standard `.tif` image format with additional spatial (georeferencing) information embedded in the file as tags. These tags should include the following raster metadata:

1. Geotransform (defines extent, resolution)
2. Coordinate Reference System (CRS)
3. Values that represent missing data (`NoDataValue`) 
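
To make the first item concrete: a geotransform is six numbers that map a pixel's column and row to map coordinates. The sketch below uses GDAL's coefficient ordering with the origin, pixel size, and raster size that `gdalinfo` reports for the reprojected file later in this notebook. The helper name `pixel_to_map` is made up for illustration.

```python
# Geotransform in GDAL's ordering:
# (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
# Values copied from the reprojected gdalinfo output shown later in this notebook.
GEOTRANSFORM = (-120.391439278664379, 0.000307568822054, 0.0,
                38.559377085787425, 0.0, -0.000307568822054)
N_COLS, N_ROWS = 8812, 7132  # from the "Size is 8812, 7132" line


def pixel_to_map(gt, col, row):
    """Return the (x, y) map coordinate of the upper-left corner of pixel (col, row)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


upper_left = pixel_to_map(GEOTRANSFORM, 0, 0)
lower_right = pixel_to_map(GEOTRANSFORM, N_COLS, N_ROWS)
print("Upper left :", upper_left)
print("Lower right:", lower_right)
```

The extent follows from the origin plus the pixel size times the raster size, which is why this single tag defines both extent and resolution. The negative pixel height reflects that row numbers increase downward while northing decreases.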

Spatially-aware applications such as [QGIS](https://qgis.org) are careful to interpret this metadata appropriately. If we aren't careful (or are using a raster-editing application that ignores spatial information), we can accidentally strip this spatial metadata. Photoshop, for example, can edit GeoTIFFs, but we'll lose the embedded CRS and geotransform if we save to the same file!

<blockquote class="callout">

## More Resources on the  `.tif` format

* [GeoTIFF on Wikipedia](https://en.wikipedia.org/wiki/GeoTIFF)
* [Cloud-optimized GeoTIFFs (COGs)](https://www.cogeo.org)

</blockquote>

### Geospatial Data Abstraction Library (GDAL)
[GDAL](http://gdal.org) is the de facto standard library for reading and manipulating geospatial raster data. The primary purpose of GDAL is to read, write and transform geospatial datasets in a way that respects their spatial metadata. GDAL also includes a set of [command-line utilities](http://www.gdal.org/gdal_utilities.html) (e.g., `gdalinfo`, `gdal_translate`, `gdalwarp`) for convenient inspection and manipulation of raster data.

GDAL's support for different file formats depends on the format drivers that have been implemented, and the libraries that are available at compile time. To find the available formats for your current install of GDAL:

In [15]:
# ! runs a terminal command
!gdalinfo --version

GDAL 2.4.1, released 2019/03/15


In [2]:
# Output cropped to 5 lines
!gdalinfo --formats | head -n 5

Supported Formats:
  VRT -raster- (rw+v): Virtual Raster
  DERIVED -raster- (ro): Derived datasets using VRT pixel functions
  GTiff -raster- (rw+vs): GeoTIFF
  NITF -raster- (rw+vs): National Imagery Transmission Format
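
The letters in parentheses summarize what each driver can do. As we understand the GDAL documentation, `ro` means read-only, `rw` means read plus write-by-copy, `rw+` adds creating and updating files, `v` marks support for virtual file systems, and `s` marks subdataset support. The small parser below is only a sketch for reading these lines; `parse_format_line` is a hypothetical helper, not part of GDAL.

```python
import re

# One line of `gdalinfo --formats` output, copied from above.
LINE = "  GTiff -raster- (rw+vs): GeoTIFF"

# Groups: short name, dataset kind, capability flags, long name
PATTERN = re.compile(r"^\s*(\S+) -([\w,]+)- \(([^)]+)\): (.+)$")


def parse_format_line(line):
    """Split a `gdalinfo --formats` line into its parts (illustrative helper)."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    short_name, kind, flags, long_name = match.groups()
    extras = flags[2:]  # everything after the leading "ro" or "rw"
    return {
        "short_name": short_name,
        "kind": kind,
        "long_name": long_name,
        "read_only": flags.startswith("ro"),
        "can_create": flags.startswith("rw+"),
        "virtual_io": "v" in extras,
        "subdatasets": "s" in extras,
    }


print(parse_format_line(LINE))
```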


In [3]:
# Get information about a given format driver (uncomment to run)
#!gdalinfo --format GTiff

In [13]:
# GDAL can operate on local files or even read files stored on a server:
HOST = 'http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1'
IMAGE = 'LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF'

In [16]:
# NOTE we are expanding python variables defined above with {}
!gdalinfo /vsicurl/{HOST}/{IMAGE}

Driver: GTiff/GeoTIFF
Files: /vsicurl/http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1/LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF
       /vsicurl/http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1/LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF.ovr
       /vsicurl/http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1/LC08_L1TP_042034_20170616_20170629_01_T1_MTL.txt
Size is 7821, 7951
Coordinate System is:
PROJCS["WGS 84 / UTM zone 11N",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_or

### Converting formats

Often you want files in a specific format. GDAL is great for format conversions. One of the most powerful and useful formats is the `virtual raster` [VRT format](https://www.gdal.org/gdal_vrttut.html). It is essentially an XML file that fully describes a raster, but does not duplicate the binary data. For example, you can save a reference to a remote file to your local disk without downloading the entire file!

In [17]:
%%bash
# Alternatively you can run a short bash script with %%bash
HOST='http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1'
IMAGE='LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF'
gdal_translate -of VRT /vsicurl/$HOST/$IMAGE LC08_L1TP_042034_20170616_20170629_01_T1_B4.vrt

Input file size is 7821, 7951


In [23]:
# Now you can forget about the strange '/vsicurl/' syntax and just work directly with the local file.
# gdalinfo reports the same metadata as earlier; here grep keeps only the PROJCS line.
!gdalinfo LC08_L1TP_042034_20170616_20170629_01_T1_B4.vrt | grep PROJCS

PROJCS["WGS 84 / UTM zone 11N",
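
A VRT stays small because it holds only metadata plus a pointer to where the pixels live. The sketch below parses a hand-written, heavily abbreviated VRT with Python's standard library. The XML is illustrative only: the `example.com` URL and the `UInt16` data type are assumptions, and a file written by `gdal_translate` contains more elements (for example the SRS and geotransform).

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated VRT for illustration only; not the file created above.
VRT_TEXT = """
<VRTDataset rasterXSize="7821" rasterYSize="7951">
  <VRTRasterBand dataType="UInt16" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="0">/vsicurl/http://example.com/scene_B4.TIF</SourceFilename>
      <SourceBand>1</SourceBand>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>
"""

root = ET.fromstring(VRT_TEXT.strip())
width = int(root.get("rasterXSize"))
height = int(root.get("rasterYSize"))
# Every SourceFilename element points at data stored somewhere else.
sources = [element.text for element in root.iter("SourceFilename")]

print(f"{width} x {height} pixels described by {len(VRT_TEXT)} characters of XML")
print("Pixel data still lives at:", sources)
```

Opening the real `.vrt` file in a text editor is a quick way to compare it with this sketch.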


### Reprojection

Another common task is warping an image to a different coordinate system. Note from above that the file we are working with has a CRS of UTM zone 11N. [Universal Transverse Mercator](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system) is a very common projection for rasters that cover small areas because the x and y coordinates are in units of meters.
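
UTM divides the globe into 60 zones, each 6 degrees of longitude wide. As a rough sketch (ignoring the special-case zones around Norway and Svalbard), the zone for our scene can be computed from its approximate center longitude, taken here as the midpoint of the corner longitudes that `gdalinfo` reports for the lat/lon version of the image further down. The function names are made up for illustration.

```python
import math


def utm_zone(lon_deg):
    """Standard UTM zone number (1-60) for a longitude in degrees east."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def central_meridian(zone):
    """Longitude (degrees east) of the central meridian of a UTM zone."""
    return zone * 6 - 183


# Approximate scene center: midpoint of the west and east corner longitudes
# reported by gdalinfo for the lat/lon version of this image.
scene_lon = (-120.3914393 + -117.6811428) / 2

zone = utm_zone(scene_lon)
print(f"longitude {scene_lon:.2f} -> UTM zone {zone}, "
      f"central meridian {central_meridian(zone)}")
```

Distortion grows with distance from the central meridian, which is one reason UTM suits scenes that fit inside a single zone.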

Whenever you want to convert from one CRS to another, it is extremely useful to use [EPSG Codes](https://spatialreference.org/ref/epsg/). These codes are shorthand identifiers from a widely used registry of coordinate reference systems. Two of the most common ones that are worth memorizing are `4326`, which is unprojected WGS84 lat/lon, and `3857`, which is Web Mercator (also known as Google Mercator), used extensively for maps on the web.
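
To get a feel for what moving between these two codes involves, here is a minimal sketch of the spherical Mercator formula that takes `4326` longitude/latitude to `3857` meters. The function name is hypothetical, and in practice you would let GDAL or a projection library do this rather than hand-coding it.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 lon/lat in degrees to EPSG:3857 x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Upper-left corner of the scene in lat/lon, as reported by gdalinfo
# for the reprojected file later in this section.
x, y = lonlat_to_web_mercator(-120.3914393, 38.5593771)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

The formula is undefined at the poles, which is why Web Mercator maps are cut off at roughly 85 degrees of latitude.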

The example command below warps the image from UTM Coordinates to WGS84 lat/lon coordinates:

In [19]:
!gdalwarp -t_srs EPSG:4326 -of VRT /vsicurl/{HOST}/{IMAGE} LC08_L1TP_042034_20170616_20170629_01_T1_B4-wgs84.vrt

Creating output file that is 8812P x 7132L.
Processing /vsicurl/http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1/LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF [1/1] : 0...10...20...30...40...50...60...70...80...90...100 - done.


In [26]:
# Confirm reprojection by looking at the new coordinates
# Note the pixel size and corner coordinates are now in units of degrees
!gdalinfo LC08_L1TP_042034_20170616_20170629_01_T1_B4-wgs84.vrt

Driver: VRT/Virtual Raster
Files: LC08_L1TP_042034_20170616_20170629_01_T1_B4-wgs84.vrt
       /vsicurl/http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1/LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF
Size is 8812, 7132
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
Origin = (-120.391439278664379,38.559377085787425)
Pixel Size = (0.000307568822054,-0.000307568822054)
Metadata:
  AREA_OR_POINT=Point
Corner Coordinates:
Upper Left  (-120.3914393,  38.5593771) (120d23'29.18"W, 38d33'33.76"N)
Lower Left  (-120.3914393,  36.3657962) (120d23'29.18"W, 36d21'56.87"N)
Upper Right (-117.6811428,  38.5593771) (117d40'52.11"W, 38d33'33.76"N)
Lower Right (-117.6

### Raster Tiles

When rasters cover very large areas, or are very high resolution, it is convenient to chop them up into discrete tiles that can be reassembled into the full image. This is especially true for interactive web maps, where we want to visualize a raster at a resolution appropriate for a given zoom level. If you are looking at the entire globe, there is no reason to render an image at 30 m resolution because our eyes can't discern the detail. Conversely, if we are zoomed into a city, we want the full-resolution data displayed. This schematic illustrates tiling:

<img src="ArcGIS-raster-tiles.png" width="300"/>
*Source: http://desktop.arcgis.com.*

It is important to note that the bounding boxes of tiles, as well as the pixel averaging scheme, affect how the raster appears at a given zoom level. There are common standard tiling schemes for web maps; check out the [mercantile library](https://github.com/mapbox/mercantile). GDAL also offers utilities to easily generate tiles from any raster. For example, see [`gdal2tiles.py`](https://gdal.org/programs/gdal2tiles.html) or the [`MBTiles` format](https://gdal.org/drivers/raster/mbtiles.html).
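
As a rough illustration of one such scheme, the sketch below computes which tile contains a point under the common XYZ ("slippy map") Web Mercator convention, along with the approximate ground resolution at each zoom level. The function names are made up, 256-pixel tiles are an assumption, and libraries such as mercantile offer tested versions of this arithmetic.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the Web Mercator XYZ tile containing a point."""
    n = 2 ** zoom  # number of tiles along each side at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    return min(x, n - 1), min(y, n - 1)


def meters_per_pixel(zoom, tile_size=256):
    """Approximate ground resolution at the equator for a given zoom level."""
    equator_m = 2 * math.pi * 6378137.0
    return equator_m / (tile_size * 2 ** zoom)


# Upper-left corner of the Landsat scene used earlier in this notebook.
for zoom in (0, 6, 12):
    tile = lonlat_to_tile(-120.3914393, 38.5593771, zoom)
    print(f"zoom {zoom:2d}: tile {tile}, "
          f"~{meters_per_pixel(zoom):.0f} m/pixel at the equator")
```

At zoom 0 a single tile covers the whole world at about 157 km per pixel, so 30 m data adds nothing there. By zoom 12 the resolution is roughly 38 m per pixel at the equator, close to the native resolution of the scene.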


### Visualization and computation

This just scratches the surface of what GDAL is capable of. Next you might want to look into sampling raster values with [`gdallocationinfo`](https://gdal.org/programs/gdallocationinfo.html) or performing computations with [`gdal_calc.py`](https://gdal.org/programs/gdal_calc.html). GDAL itself does not provide tools for visualizing rasters, but it underpins many graphical GIS programs, such as [QGIS](https://qgis.org). We'll look at visualization with Python libraries next.

<blockquote class="keypoints">

## key points 

- The Geospatial Data Abstraction Library (GDAL) is very useful for reading, writing and transforming rasters
    
</blockquote>

In [28]:
# This cell enables software carpentry lesson styling
from IPython.display import HTML

# Read the stylesheet that sits next to this notebook and apply it
with open('./carpentries-lesson.css', 'r') as f:
    style = f.read()
HTML(style)