# Datasets: Environmental Covariates

The datasets below are those listed in [Supplementary_Data_File_1._environmental_covariates - Google Sheet](https://docs.google.com/spreadsheets/d/1hPw9G1A34SnlbDJ8sk3LYfwgoLN0Gbail2xdNb7viGc/edit#gid=106509025).

In [None]:
#default_exp env_data

In [None]:
#export
import os
from urlpath import URL
import easydict
import requests
import math
import numpy as np
import xarray as xr
import rioxarray

In [None]:
#export
URLs = easydict.EasyDict()

## Earthenv Cloud
Global 1-km Cloud Cover: https://www.earthenv.org/cloud

Reading GeoTIFF:
- http://xarray.pydata.org/en/stable/io.html#rasterio
- http://xarray.pydata.org/en/stable/generated/xarray.open_rasterio.html#xarray-open-rasterio

### EarthEnvCloudCover_MODCF_interannualSD

In [None]:
#export
n = 'EarthEnvCloudCover_MODCF_interannualSD'
URLs[n] = URL('https://data.earthenv.org/cloud/'
           'MODCF_interannualSD.tif')

In [None]:
url = URLs.EarthEnvCloudCover_MODCF_interannualSD
! wget {url} -O ../data/{url.name}

--2021-02-02 15:26:41--  https://data.earthenv.org/cloud/MODCF_interannualSD.tif
Resolving data.earthenv.org (data.earthenv.org)... 104.21.34.38, 172.67.196.246
Connecting to data.earthenv.org (data.earthenv.org)|104.21.34.38|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 669742763 (639M) [image/tiff]
Saving to: ‘../data/MODCF_interannualSD.tif’


2021-02-02 15:27:31 (13.2 MB/s) - ‘../data/MODCF_interannualSD.tif’ saved [669742763/669742763]



In [None]:
! du -hs ../data/{url.name}

640M	../data/MODCF_interannualSD.tif


### EarthEnvCloudCover_MODCF_intraannualSD
> Within-year seasonality represented as the standard deviation of mean 2000-2014 monthly cloud frequencies

In [None]:
#export
n = 'EarthEnvCloudCover_MODCF_intraannualSD'
URLs[n] = URL('https://data.earthenv.org/cloud/'
           'MODCF_intraannualSD.tif')

In [None]:
url = URLs.EarthEnvCloudCover_MODCF_intraannualSD
# ! wget {url} -O ../data/{url.name}

In [None]:
! du -hs ../data/{url.name}

671M	../data/MODCF_intraannualSD.tif
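
The `wget` line above is commented out once the file is on disk. A minimal sketch of the same "download only if missing" pattern in plain Python follows. The helper names `local_path` and `fetch_if_missing` are hypothetical and not part of this notebook's exports; only the standard library is assumed.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def local_path(url, data_dir='../data'):
    """Map a download URL to its target path inside data_dir."""
    # str(url) lets this accept both plain strings and urlpath.URL objects
    name = posixpath.basename(urlparse(str(url)).path)
    return os.path.join(data_dir, name)


def fetch_if_missing(url, data_dir='../data'):
    """Download url into data_dir unless a file of that name already exists."""
    path = local_path(url, data_dir)
    if not os.path.exists(path):
        os.makedirs(data_dir, exist_ok=True)
        urllib.request.urlretrieve(str(url), path)
    return path


# Usage (commented out to avoid a ~670 MB download):
# fetch_if_missing(URLs.EarthEnvCloudCover_MODCF_intraannualSD)
```

Note that this check only looks at whether the file exists, so a partially downloaded file would be treated as complete.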


In [None]:
%%time

da = xr.open_rasterio(f'../data/{url.name}')

CPU times: user 5.55 ms, sys: 9.19 ms, total: 14.7 ms
Wall time: 26.4 ms


## Earthenv Topography

Global 1,5,10,100-km Topography: https://www.earthenv.org/topography.

### EarthEnvTopoMed_Elevation
> Elevation (in meters)

No direct download URL could be obtained. The file 'elevation_1KMmd_GMTEDmd.tif' had to be selected manually from the dropdown menu and downloaded with the 'Download' button.

In [None]:
! du -hs ../data/elevation_1KMmd_GMTEDmd.tif

380M	../data/elevation_1KMmd_GMTEDmd.tif


In [None]:
%%time

da = xr.open_rasterio('../data/elevation_1KMmd_GMTEDmd.tif')

CPU times: user 4.51 ms, sys: 4.49 ms, total: 9 ms
Wall time: 21.5 ms


In [None]:
da

## FanEtAl_Depth_to_Water_Table_AnnualMean 

> Mean annual depth of the water table on the terrestrial land surface (in m below land surface).


http://thredds-gfnl.usc.es/thredds/catalog/GLOBALWTDFTP/catalog.html

### North America

In [None]:
n = 'NAMERICA_WTD_annualmean'
URLs[n] = URL('http://thredds-gfnl.usc.es/thredds/'
              'fileServer/GLOBALWTDFTP/annualmeans/'
              'NAMERICA_WTD_annualmean.nc')

In [None]:
url = URLs.NAMERICA_WTD_annualmean
# ! wget {url} -O ../data/{url.name}

In [None]:
! du -hs ../data/{url.name}

 80M	../data/NAMERICA_WTD_annualmean.nc


In [None]:
%%time

ds = rioxarray.open_rasterio(f'../data/{url.name}')
ds.rio.write_crs('epsg:4326', inplace=True)
ds = ds.squeeze()

CPU times: user 131 ms, sys: 43.4 ms, total: 174 ms
Wall time: 280 ms


In [None]:
ds.rio.crs

CRS.from_epsg(4326)

In [None]:
%%time

# ds.WTD.rio.to_raster(f'../data/{url.stem}.tiff')
ds.WTD.rio.to_raster('../data/NAMERICA_WTD_annualmean.tiff')

In [None]:
xr.open_rasterio('../data/FanEtAl_Depth_to_Water_Table_AnnualMean.tiff')

## MODIS_LAI
> Leaf Area Index (LAI) = MCD15A3H.006 Terra+Aqua Leaf Area Index

https://explorer.earthengine.google.com/#detail/MODIS%2F006%2FMCD15A3H

Load the collection in Google Earth Engine (GEE):
```javascript
var collection = ee.ImageCollection('MODIS/006/MCD15A3H');
```
Reduce over the time dimension (median) and export the result as a tif to Google Drive:

```javascript
var leaf_area_index = collection.reduce(ee.Reducer.median());
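
print('leaf_area_index: ', leaf_area_index);
// Map.setCenter(-122.3578, 37.7726, 12);
// `imageVisParam` is not defined in this snippet; it is assumed to come
// from the Imports section of the GEE Code Editor.
Map.addLayer(leaf_area_index, imageVisParam, 'median');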

Export.image.toDrive({
  image: leaf_area_index,
  description: 'timemedian_leaf_area_index',
  scale: 3000,
});
```

In [None]:
%%time

da = xr.open_rasterio('../data/timemedian_leaf_area_index.tif')

CPU times: user 1.55 ms, sys: 572 µs, total: 2.12 ms
Wall time: 3.43 ms


In [None]:
da

In [None]:
da.y.min(), da.y.max(), da.x.min(), da.x.max()

(<xarray.DataArray 'y' ()>
 array(26.39699462),
 <xarray.DataArray 'y' ()>
 array(42.40497299),
 <xarray.DataArray 'x' ()>
 array(-145.8369948),
 <xarray.DataArray 'x' ()>
 array(-44.58787913))

## ISRIC Data
ISRIC World Soil Information.  
Data Hub: https://data.isric.org/geonetwork/srv/eng/catalog.search#/home

### SG_Absolute_depth_to_bedrock
> BDTICM_M_250m_ll_b1 = Absolute Depth to Bedrock

Search for "Absolute Depth to Bedrock" on Data Hub, then download tif.

In [None]:
n = 'SG_Absolute_depth_to_bedrock'
URLs[n] = URL('https://files.isric.org/soilgrids'
              '/former/2017-03-10/data/BDTICM_M_250m_ll.tif')

The file is about 8 GB. At a download speed of ~200 KB/s, the estimated download time is ~12 hours.

Whenever the laptop is open, the download is resumed with
```bash
wget -c ...
```
inside a GNU screen session.
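
The time estimate and the resume behaviour of `wget -c` can be sketched in Python as below. This is an illustration under assumptions, not the method used here: the function names are hypothetical, only the standard library is used, and resuming works only if the server honours HTTP `Range` requests.

```python
import os
import urllib.request


def estimate_hours(size_bytes, rate_bytes_per_sec):
    """Rough download time in hours for a given size and transfer rate."""
    return size_bytes / rate_bytes_per_sec / 3600


def resume_request(url, dest):
    """Build a request that asks only for the bytes not yet on disk."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(str(url))
    if offset:
        req.add_header('Range', f'bytes={offset}-')
    return req, offset


def resume_download(url, dest, chunk_size=2**20):
    """Continue a partial download, similar in spirit to `wget -c`."""
    req, _ = resume_request(url, dest)
    # If the file is already complete, the server may answer 416,
    # which urlopen raises as an HTTPError.
    with urllib.request.urlopen(req) as r:
        # 206: the server sent only the missing part, so append.
        # 200: the server ignored the Range header, so start over.
        mode = 'ab' if r.status == 206 else 'wb'
        with open(dest, mode) as f:
            while True:
                chunk = r.read(chunk_size)
                if not chunk:
                    break
                f.write(chunk)


# 8 GB at 200 KB/s is a bit over 11 hours, consistent with the ~12 hrs above
print(round(estimate_hours(8e9, 200e3), 1))

# Usage (commented out to avoid an ~8 GB download):
# url = URLs.SG_Absolute_depth_to_bedrock
# resume_download(url, f'../data/{url.name}')
```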

In [None]:
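# Streamed download with requests (left commented out; wget -c is used instead)
# url = URLs.SG_Absolute_depth_to_bedrock
# with requests.get(str(url), stream=True) as r:
#     r.raise_for_status()
#     with open(f'../data/{url.name}', 'wb') as f:
#         for chunk in r.iter_content(chunk_size=2**13):
#             f.write(chunk)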

## WCS_Human_Footprint_2009
> Human Footprint 2009

http://wcshumanfootprint.org/

### Full dataset

How to unpack full dataset download:
```
$ unzip doi_10.5061_dryad.052q5__v2.zip
$ brew install p7zip
$ 7za x HumanFootprintv2.7z
```

In [None]:
path = '../data/Dryadv3/Maps'
ns = [f'{path}/{n}' for n in os.listdir(path) if n.endswith('.tif')]
sum([os.path.getsize(n) for n in ns]) / 1e9

3.296027813

In [None]:
%%time

da_fullset = xr.open_rasterio(
    '../data/Dryadv3/Maps/HFP2009.tif', chunks={'x':10, 'y':10})

CPU times: user 15.5 s, sys: 37.4 s, total: 53 s
Wall time: 1min 8s


In [None]:
%%time

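# NOTE: `da` here is presumably the summary raster opened from
# '../data/HFP2009/HFP2009.tif' in the "Summary 2009" section below.
(da_fullset - da).sum()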

CPU times: user 34.3 s, sys: 47.6 s, total: 1min 21s
Wall time: 1min 36s


###  Summary 2009

In [None]:
n = 'HFP2009'
URLs[n] = URL('http://wcshumanfootprint.org/data/HFP2009.zip')

In [None]:
url = URLs.HFP2009
! curl {url} -o ../data/{url.name}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  254M  100  254M    0     0  12.4M      0  0:00:20  0:00:20 --:--:-- 14.6M


Unpack downloaded 'HFP2009.zip'.

In [None]:
da = xr.open_rasterio(
    '../data/HFP2009/HFP2009.tif', chunks={'x':10, 'y':10})

In [None]:
da

| | Array | Chunk |
|---|---|---|
| Bytes | 2.36 GB | 400 B |
| Shape | (1, 16382, 36081) | (1, 10, 10) |
| Count | 5915152 Tasks | 5915151 Chunks |
| Type | float32 | numpy.ndarray |
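
The chunk count in the repr above follows directly from the array shape and `chunks={'x':10, 'y':10}`. The sketch below reproduces the arithmetic with the standard library only. `chunk_stats` is a hypothetical helper, and the 1024 x 1024 chunk size is an illustrative assumption rather than a tested recommendation. With nearly six million 400-byte chunks, task-graph overhead likely accounts for much of the minute-long timings reported above.

```python
import math


def chunk_stats(shape, chunks, itemsize=4):
    """Return (number of chunks, bytes per full chunk, total bytes)."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    total_bytes = math.prod(shape) * itemsize
    return n_chunks, chunk_bytes, total_bytes


# HFP2009.tif: (band, y, x), float32 = 4 bytes per value
shape = (1, 16382, 36081)

# Chunking used in this notebook: 10 x 10 pixels
print(chunk_stats(shape, (1, 10, 10)))
# (5915151, 400, 2364315768)

# Hypothetical alternative: 1024 x 1024 pixels
print(chunk_stats(shape, (1, 1024, 1024)))
# (576, 4194304, 2364315768)
```

The first result matches the repr: 5915151 chunks of 400 B each, about 2.36 GB in total.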

