# Working with individual files

In `0-preliminaries.ipynb` we searched NASA's archive for Sentinel-1 backscatter imagery and saved a list of URLs. Let's now start working with a single file!

Contents:

* [GDAL](#GDAL-command-line-tools)
    * [Subsetting](#subsetting)
    * [Reprojection](#reprojection)
* [Visualization](#visualization)
    * [Rasterio](#rasterio)
    * [Xarray and Holoviz](#xarray-and-holoviz)
    * [Save subset](#save-subset)

## GDAL command line tools

The [Geospatial Data Abstraction Library (GDAL)](https://gdal.org/) is foundational geospatial software that can convert between formats and projections and perform many common analysis tasks. GDAL can interact with 'network-based file systems' through an interface that translates local file system operations into network requests, simply by prefixing the path to a file with `/vsicurl/`. Read more in the [documentation](https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access); the idea is best illustrated with a simple example:

In [None]:
import os

In [None]:
with open('gamma0.txt', 'r') as f:
    gammas = [line.rstrip() for line in f]

In [None]:
gammas[:3]

In [None]:
# It turns out GDAL needs some environment variables set for authentication and efficiency,
# which get set here in front of the `gdalinfo` command
env_vars = 'GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR GDAL_HTTP_COOKIEFILE=.urs_cookies GDAL_HTTP_COOKIEJAR=.urs_cookies'
cog = gammas[1]
cmd = f'{env_vars} gdalinfo /vsicurl/{cog} -approx_stats'
print(cmd)

In [None]:
%%time

!{cmd}
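
The command string above follows a fixed pattern: environment variables first, then the tool name, then a `/vsicurl/`-prefixed path and any options. As a minimal sketch of that pattern, the helpers below (`vsicurl` and `gdal_command` are hypothetical names, not part of GDAL, and the URL is a placeholder) assemble such a string without running anything:

```python
import shlex

# Same settings as the env_vars string used above
GDAL_ENV = {
    'GDAL_DISABLE_READDIR_ON_OPEN': 'EMPTY_DIR',
    'GDAL_HTTP_COOKIEFILE': '.urs_cookies',
    'GDAL_HTTP_COOKIEJAR': '.urs_cookies',
}


def vsicurl(url):
    """Prefix a URL so GDAL reads it through its curl-based virtual file system."""
    return url if url.startswith('/vsicurl/') else f'/vsicurl/{url}'


def gdal_command(tool, *args, env=GDAL_ENV):
    """Assemble 'VAR=value ... tool arg ...' as one shell command string."""
    prefix = ' '.join(f'{name}={shlex.quote(value)}' for name, value in env.items())
    quoted = ' '.join(shlex.quote(str(arg)) for arg in args)
    return f'{prefix} {tool} {quoted}'.strip()


example_url = 'https://example.com/data/scene.tif'  # placeholder, not a real dataset
print(gdal_command('gdalinfo', vsicurl(example_url), '-approx_stats'))
```

Quoting each argument with `shlex.quote` is a precaution for URLs or file names that contain characters the shell would otherwise interpret.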

### Subsetting

Neat! That particular file is big (29520x53220 pixels, over 1 gigabyte on disk), but getting information about the projection and approximate statistics took less than 1 second. We have not downloaded the file; we are just streaming the metadata into memory. What if we'd like to download only a portion of this file rather than the whole thing? We can do this with `gdal_translate`:

In [None]:
bounding_box = '-54.85,69.31,-52.18,70.26' #West, South, East, North longitude and latitude bounds
ulx,uly,lrx,lry = [-54.85, 70.26, -52.18, 69.31]
src_dataset =  f'/vsicurl/{cog}'
filename = os.path.basename(cog)
dst_dataset = filename.replace('.tif', '_subset.tif')
cmd = f'{env_vars} gdal_translate -projwin_srs EPSG:4326 -projwin {ulx} {uly} {lrx} {lry} {src_dataset} {dst_dataset}'
print(cmd)

In [None]:
%%time 

!{cmd}

In [None]:
# Great! That only took a few seconds :) And now we can work with this local file that is of manageable size
# NOTE that we no longer need the special environment variables for reading remote NASA data
cmd = f'gdalinfo {dst_dataset} -stats'
print(cmd)

In [None]:
%%time 

!{cmd}

### Reprojection

So far we've kept the file in its original coordinate reference system, [EPSG:3413](https://epsg.io/3413), or "Polar Stereographic North". Perhaps you want to work with unprojected latitude and longitude coordinates, [EPSG:4326](https://epsg.io/4326). You can use `gdalwarp` to reproject the same subset on the fly and save it locally:

In [None]:
# NOTE: the target extent -te is ordered xmin ymin xmax ymax, unlike the ulx uly lrx lry ordering of the earlier -projwin option
dstfile = dst_dataset.replace('.tif','.wgs84.tif')
cmd = f'{env_vars} gdalwarp -overwrite -t_srs EPSG:4326 -te {ulx} {lry} {lrx} {uly} {src_dataset} {dstfile}'
print(cmd)

In [None]:
%%time 
!{cmd}

In [None]:
!gdalinfo {dstfile}
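
Because `-projwin` and `-te` expect the same four numbers in different orders, it can help to derive both from a single bounding box. The following is a minimal sketch, assuming a `west,south,east,north` string like the `bounding_box` defined earlier; the helper names are hypothetical, and the simple validity check assumes the box does not cross the antimeridian:

```python
def parse_bbox(bbox):
    """Parse a 'west,south,east,north' string into four floats."""
    west, south, east, north = (float(value) for value in bbox.split(','))
    if west >= east or south >= north:
        raise ValueError(f'expected west < east and south < north, got {bbox!r}')
    return west, south, east, north


def projwin_args(bbox):
    """Arguments for gdal_translate -projwin, ordered ulx uly lrx lry."""
    west, south, east, north = parse_bbox(bbox)
    return f'-projwin {west} {north} {east} {south}'


def te_args(bbox):
    """Arguments for gdalwarp -te, ordered xmin ymin xmax ymax."""
    west, south, east, north = parse_bbox(bbox)
    return f'-te {west} {south} {east} {north}'


bounding_box = '-54.85,69.31,-52.18,70.26'
print(projwin_args(bounding_box))  # -projwin -54.85 70.26 -52.18 69.31
print(te_args(bounding_box))       # -te -54.85 69.31 -52.18 70.26
```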

## Visualization

So far we have not visualized any of these images! GDAL is great for command line operations and batch processing, but not for visualizing results.

In [None]:
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 14})
plt.rcParams.update({'figure.figsize': (8.5,11)})

### Rasterio

There are many options in Python. We'll start with [`rasterio`](https://rasterio.readthedocs.io/en/latest/), a fantastic library that provides an intuitive Pythonic interface to GDAL and includes some convenience functions for plotting.

In [None]:
# Open the JPG overview with Rasterio and plot
import rasterio
import rasterio.plot

# Rasterio uses an environment context manager for GDAL environment variables
Env = rasterio.Env(GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
                   GDAL_HTTP_COOKIEFILE='.urs_cookies',
                   GDAL_HTTP_COOKIEJAR='.urs_cookies')

In [None]:
%%time 

jpg = 'https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.13/GL_S1bks_mosaic_13Jan15_24Jan15_gamma0_500m_v03.0.jpg'
title = os.path.basename(jpg)
with Env:
    with rasterio.open(jpg) as src:
        print(src.profile)
        rasterio.plot.show(src, cmap='gray', title=title)

In [None]:
# Accessing the full-resolution data is also straightforward, no need for /vsicurl/
url = gammas[1]
print(url)
title = os.path.basename(url)
with Env:
    with rasterio.open(url) as src:
        print(src.profile)  
        overview_factors = src.overviews(src.indexes[0])  # decimation factors of the first band
        overview_levels = list(range(len(overview_factors)))
        print('Overview levels: ', overview_levels)
        print('Overview factors: ',  overview_factors) 

In [None]:
# NOTE from above that overview_level=0 downsamples full-resolution by a factor of 2:
with Env:
    with rasterio.open(url, OVERVIEW_LEVEL=0) as src:
        print(src.profile)
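
Each overview factor is a decimation factor applied to the full-resolution image, so the approximate size of every level can be worked out ahead of time. The sketch below assumes power-of-two factors from 2 to 64 (an assumption for illustration; use the factors printed above for the real file) and assumes sizes are rounded up. Which of the two quoted dimensions is the width is also an assumption here.

```python
def overview_shapes(width, height, factors):
    """Return (width, height) at each decimation factor, rounding up."""
    return [(-(-width // f), -(-height // f)) for f in factors]


full_size = (29520, 53220)               # pixel dimensions quoted earlier; axis order assumed
assumed_factors = [2, 4, 8, 16, 32, 64]  # assumed values, check against the printed factors

shapes = overview_shapes(*full_size, assumed_factors)
for level, (factor, shape) in enumerate(zip(assumed_factors, shapes)):
    print(f'OVERVIEW_LEVEL={level}: factor {factor:>2} -> about {shape[0]} x {shape[1]} pixels')
```

Picking the coarsest level that still shows the features of interest keeps the amount of data streamed over the network small.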

In [None]:
%%time 

# Plot lowest resolution overview
with Env:
    with rasterio.open(url, OVERVIEW_LEVEL=5) as src:
        print(src.profile)
        rasterio.plot.show(src, cmap='gray', title=title)

### Xarray and Holoviz

Again, there are many ways to accomplish this in Python, but we'll demonstrate a particularly powerful combination for geospatial analysis. [RioXarray](https://github.com/corteva/rioxarray) combines Xarray + Rasterio for analysis of multidimensional geospatial data. [Holoviz](https://holoviz.org/) combines various Python plotting libraries for interactive visualization in a web browser.

In [None]:
import rioxarray as rx
import hvplot.xarray

In [None]:
# An alternative to using rasterio.Env() is to set global environment variables:
os.environ['GDAL_DISABLE_READDIR_ON_OPEN']='EMPTY_DIR'
os.environ['GDAL_HTTP_COOKIEFILE']='.urs_cookies' 
os.environ['GDAL_HTTP_COOKIEJAR']='.urs_cookies'
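
Variables set through `os.environ` stay in effect for the rest of the session. If that is not wanted, one possible middle ground is a small context manager that sets variables temporarily and restores the previous state afterwards. This is a standard-library sketch only: `temporary_env` and `DEMO_GDAL_OPTION` are hypothetical names, and whether a given library notices the change depends on when it reads the environment.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**variables):
    """Set environment variables inside a with-block, then restore the old values."""
    saved = {name: os.environ.get(name) for name in variables}
    os.environ.update(variables)
    try:
        yield
    finally:
        for name, old_value in saved.items():
            if old_value is None:
                os.environ.pop(name, None)   # variable did not exist before
            else:
                os.environ[name] = old_value


with temporary_env(DEMO_GDAL_OPTION='EMPTY_DIR'):  # hypothetical variable name
    print(os.environ['DEMO_GDAL_OPTION'])          # visible inside the block
print('DEMO_GDAL_OPTION' in os.environ)            # back to its previous state
```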

In [None]:
%%time

da = rx.open_rasterio(url, overview_level=4, masked=True).squeeze('band')  # read an overview and mask the nodata value
da

In [None]:
# Note this plot is interactive! You'll see coordinate and pixel values as you move the cursor, and the resolution updates as you zoom in
da.hvplot.image(rasterize=True, dynamic=True, aspect='equal', frame_width=200, cmap='gray',
                title=title)

In [None]:
# Read a subset window at full resolution
da = rx.open_rasterio(url, masked=True).squeeze('band') 

# Use pixel coordinates
subset = da.isel(x=slice(int(1e4),int(1.5e4)), y=slice(int(1e3),int(2e3)))
subset.hvplot.image(rasterize=True, dynamic=True, frame_width=400, cmap='gray')

In [None]:
# Use EPSG:3413 coordinates (left->right, top->bottom)
subset = da.sel(x=slice(-1.0e5, 1.0e5), y=slice(-7.5e5, -7.9e5))
subset.hvplot.image(rasterize=True, dynamic=True, frame_width=400, cmap='gray')
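
The two cells above select the same kind of window in two ways: by pixel index (`isel`) and by projected coordinate (`sel`). In a north-up raster the y coordinate decreases from the top row to the bottom row, which is why the y slice is written from the larger value to the smaller one. The plain-Python sketch below illustrates the relationship on a small hypothetical grid (the origin, pixel size, and helper names are made up for illustration); it is not how xarray implements label-based selection.

```python
def center_coords(origin, step, count):
    """Pixel-center coordinates along one axis; step is negative for a north-up y axis."""
    return [origin + (i + 0.5) * step for i in range(count)]


def index_slice(coords, first, last):
    """Index range of the pixel centers lying between first and last (inclusive)."""
    low, high = min(first, last), max(first, last)
    hits = [i for i, c in enumerate(coords) if low <= c <= high]
    return slice(hits[0], hits[-1] + 1) if hits else slice(0, 0)


# Hypothetical 20 x 20 grid with 100 m pixels and its upper-left corner at (-1000, 1000)
xs = center_coords(origin=-1000.0, step=100.0, count=20)   # increases left -> right
ys = center_coords(origin=1000.0, step=-100.0, count=20)   # decreases top -> bottom

x_window = index_slice(xs, -300.0, 300.0)   # left -> right
y_window = index_slice(ys, 500.0, 0.0)      # top -> bottom
print(x_window, y_window)
print(ys[y_window])
```

The resulting index ranges are what one would pass to `isel`, while the coordinate bounds are what one would pass to `sel`.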

In [None]:
# Reproject a small piece (preferred: more control over warp resampling, etc.; dataset easier to reference)
subset4326 = subset.rio.reproject('EPSG:4326') 
subset4326.hvplot.image(rasterize=True, dynamic=True, frame_width=400, cmap='gray')

In [None]:
# Use Basemap tiles (these are usually in a given projection to begin with)
subset4326.hvplot.image(geo=True, tiles=True, 
                         rasterize=True, dynamic=True, frame_width=400, frame_height=400, cmap='gray')

### Save subset

Finally, we may want to save the subset we've been working with for future use.

In [None]:
%%time

subset.rio.to_raster('mysubset.tif', dtype='float32', driver='GTiff', COMPRESS='LZW', NUM_THREADS=4)

In [None]:
# Round trip test
subset = rx.open_rasterio('mysubset.tif', masked=True)
subset