# Datasets: Environmental Covariates

The datasets are listed in [Supplementary_Data_File_1._environmental_covariates - Google Sheet](https://docs.google.com/spreadsheets/d/1hPw9G1A34SnlbDJ8sk3LYfwgoLN0Gbail2xdNb7viGc/edit#gid=106509025).

In [2]:
#default_exp env_data

In [34]:
#export
import os
from pprint import pprint
from urlpath import URL
from easydict import EasyDict  # provides the attribute-style dict used for DATA_SRC
import requests
import math
import numpy as np
import pandas as pd
import xarray as xr
import rioxarray

In [4]:
#export
DATA_SRC = EasyDict()

In [6]:
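def check_id(data_id):
    """Raise if `data_id` is already registered in DATA_SRC."""
    if data_id in DATA_SRC:
        raise ValueError(f'Data_id "{data_id}" already exists. '
                         'Check that it is not re-used.')

def add_download(data_id, download_url):
    # Note: this replaces any existing entry for `data_id`.
    DATA_SRC[data_id] = {'download': URL(download_url)}

def add_filename(data_id, filename):
    # Note: this replaces any existing entry for `data_id`.
    DATA_SRC[data_id] = {'filename': filename}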
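Because `add_download` and `add_filename` each assign a fresh dict, calling both for the same `data_id` keeps only the last field. A minimal sketch of a merging variant follows; `REGISTRY` and `register` are hypothetical names, and a plain `dict` stands in for `DATA_SRC` so the example runs without `easydict`.

```python
# Hypothetical stand-in for DATA_SRC, using a plain dict.
REGISTRY = {}


def register(data_id, **fields):
    """Merge fields into the entry for data_id, refusing to overwrite a field."""
    entry = REGISTRY.setdefault(data_id, {})
    clash = set(entry) & set(fields)
    if clash:
        raise ValueError(f'{data_id!r} already has {sorted(clash)}')
    entry.update(fields)
    return entry


# Illustrative id and URL only; not one of the datasets in this notebook.
register('demo_layer', download='https://example.org/demo.tif')
register('demo_layer', filename='demo.tif')  # merged, not replaced
print(REGISTRY['demo_layer'])
```

With this shape, a second attempt to set the same field for the same id raises instead of silently replacing it.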

In [None]:
list(DATA_SRC)  # registered data ids; dict.get() needs a key argument

## EarthEnv Cloud
Global 1-km Cloud Cover: https://www.earthenv.org/cloud

In [11]:
add_download(
    'EarthEnvCloudCover_MODCF_interannualSD', 
    'https://data.earthenv.org/cloud/MODCF_interannualSD.tif')

add_download(
    'EarthEnvCloudCover_MODCF_intraannualSD',
    'https://data.earthenv.org/cloud/MODCF_intraannualSD.tif')

add_download(
    'EarthEnvCloudCover_MODCF_meanannual',
    'https://data.earthenv.org/cloud/MODCF_meanannual.tif')

add_download(
    'EarthEnvCloudCover_MODCF_seasonality_concentration',
    'https://data.earthenv.org/cloud/MODCF_seasonality_concentration.tif')

add_download(
    'EarthEnvCloudCover_MODCF_seasonality_theta',
    'https://data.earthenv.org/cloud/MODCF_seasonality_theta.tif')


In [34]:
DATA_SRC.EarthEnvCloudCover_MODCF_interannualSD

{'download': URL('https://data.earthenv.org/cloud/MODCF_interannualSD.tif')}

In [24]:
url = DATA_SRC.EarthEnvCloudCover_MODCF_interannualSD.download
! wget {url} -O ../data/{url.name}

--2021-02-05 20:15:10--  https://data.earthenv.org/cloud/MODCF_interannualSD.tif
Resolving data.earthenv.org (data.earthenv.org)... 172.67.196.246, 104.21.34.38
Connecting to data.earthenv.org (data.earthenv.org)|172.67.196.246|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 669742763 (639M) [image/tiff]
Saving to: ‘../data/MODCF_interannualSD.tif’


2021-02-05 20:16:28 (8.35 MB/s) - ‘../data/MODCF_interannualSD.tif’ saved [669742763/669742763]



In [28]:
! du -hs ../data/{url.name}

640M	../data/MODCF_interannualSD.tif
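The wget log reports 669742763 bytes (639M) while `du` shows 640M; `du` reports disk usage, so the two figures can differ slightly. A small sketch for checking a download against the byte count from the log is given here; `is_complete` is a hypothetical helper, not part of the notebook.

```python
import os


def is_complete(path, expected_bytes):
    """True when the file at path has exactly expected_bytes bytes."""
    return os.path.exists(path) and os.path.getsize(path) == expected_bytes


# Byte count reported by wget in the log above.
EXPECTED = 669742763
print(f'{EXPECTED / 2**20:.1f} MiB')

# Example call (path as used in the notebook):
# is_complete('../data/MODCF_interannualSD.tif', EXPECTED)
```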


## EarthEnv Topography

Global 1, 5, 10, 100-km Topography: https://www.earthenv.org/topography

The direct download URLs could not be obtained with the browser's developer tools, so only the file names are recorded here.

In [12]:
add_filename(
    'EarthEnvTopoMed_1stOrderPartialDerivEW',
    'dx_1KMmd_GMTEDmd.tif')

In [13]:
add_filename(
    'EarthEnvTopoMed_1stOrderPartialDerivNS', 
    'dy_1KMmd_GMTEDmd.tif')

In [14]:
add_filename(
    'EarthEnvTopoMed_2ndOrderPartialDerivEW',
    'dxx_1KMmd_GMTEDmd.tif')

In [15]:
add_filename(
    'EarthEnvTopoMed_2ndOrderPartialDerivNS',
    'dyy_1KMmd_GMTEDmd.tif')

In [16]:
add_filename(
    'EarthEnvTopoMed_AspectCosine',
    'aspectcosine_1KMmd_GMTEDmd.tif')

In [17]:
add_filename(
    'EarthEnvTopoMed_AspectSine',
    'aspectsine_1KMmd_GMTEDmd.tif')

In [18]:
add_filename(
    'EarthEnvTopoMed_Eastness',
    'eastness_1KMmd_GMTEDmd.tif')

In [19]:
add_filename(
    'EarthEnvTopoMed_Elevation', 
    'elevation_1KMmd_GMTEDmd.tif')

In [20]:
add_filename(
    'EarthEnvTopoMed_Northness',
    'northness_1KMmd_GMTEDmd.tif')

In [21]:
add_filename(
    'EarthEnvTopoMed_ProfileCurvature',
    'pcurv_1KMmd_GMTEDmd.tif')

In [22]:
add_filename(
    'EarthEnvTopoMed_Roughness',
    'roughness_1KMmd_GMTEDmd.tif')

In [23]:
add_filename(
    'EarthEnvTopoMed_Slope',
    'slope_1KMmd_GMTEDmd.tif')

In [24]:
add_filename(
    'EarthEnvTopoMed_TangentialCurvature',
    'tcurv_1KMmd_GMTEDmd.tif')

In [25]:
add_filename(
    'EarthEnvTopoMed_TerrainRuggednessIndex',
    'tri_1KMmd_GMTEDmd.tif')

In [26]:
add_filename(
    'EarthEnvTopoMed_TopoPositionIndex',
    'tpi_1KMmd_GMTEDmd.tif')

In [27]:
add_filename(
    'EarthEnvTopoMed_VectorRuggednessMeasure', 
    'vrm_1KMmd_GMTEDmd.tif')

In [32]:
for k, d in DATA_SRC.items():
    if 'EarthEnvTopoMed' in k:
        d['user_gee'] = 'bingosaucer'

## FanEtAl_Depth_to_Water_Table_AnnualMean 

http://thredds-gfnl.usc.es/thredds/catalog/GLOBALWTDFTP/catalog.html

In [36]:
n = 'NAMERICA_WTD_annualmean'
# Register the download URL in DATA_SRC.
add_download(n, 'http://thredds-gfnl.usc.es/thredds/'
                'fileServer/GLOBALWTDFTP/annualmeans/'
                'NAMERICA_WTD_annualmean.nc')

In [37]:
url = DATA_SRC[n].download
# ! wget {url} -O ../data/{url.name}

In [38]:
! du -hs ../data/{url.name}

 80M	../data/NAMERICA_WTD_annualmean.nc


In [39]:
%%time

ds = rioxarray.open_rasterio(f'../data/{url.name}')
ds.rio.write_crs('epsg:4326', inplace=True)
ds = ds.squeeze()

CPU times: user 131 ms, sys: 43.4 ms, total: 174 ms
Wall time: 280 ms


In [40]:
ds.rio.crs

CRS.from_epsg(4326)

In [41]:
%%time

# ds.WTD.rio.to_raster(f'../data/{url.stem}.tiff')
ds.WTD.rio.to_raster(f'../data/NAMERICA_WTD_annualmean.tiff')

CPU times: user 21min 11s, sys: 3min 35s, total: 24min 47s
Wall time: 25min 59s


In [146]:
# Assumes the GeoTIFF written above was renamed to match the dataset id.
rioxarray.open_rasterio('../data/FanEtAl_Depth_to_Water_Table_AnnualMean.tiff')

## MODIS_LAI
> Leaf Area Index (LAI) = MCD15A3H.006 Terra+Aqua Leaf Area Index

https://explorer.earthengine.google.com/#detail/MODIS%2F006%2FMCD15A3H

Load the collection in Google Earth Engine (GEE):
```javascript
var collection = ee.ImageCollection('MODIS/006/MCD15A3H');
```
Reduce over the time dimension with a median and export the result as a GeoTIFF to Google Drive:

```javascript
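var leaf_area_index = collection.reduce(ee.Reducer.median());

print('leaf_area_index: ', leaf_area_index);
// Map.setCenter(-122.3578, 37.7726, 12);
// `imageVisParam` is not defined in this snippet; presumably it is a
// visualization-parameter import in the Code Editor script.
Map.addLayer(leaf_area_index, imageVisParam, 'median');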

Export.image.toDrive({
  image: leaf_area_index,
  description: 'timemedian_leaf_area_index',
  scale: 3000,
});
```

In [10]:
%%time

da = rioxarray.open_rasterio('../data/timemedian_leaf_area_index.tif')

CPU times: user 1.55 ms, sys: 572 µs, total: 2.12 ms
Wall time: 3.43 ms


In [11]:
da

In [7]:
da.y.min(), da.y.max(), da.x.min(), da.x.max()

(<xarray.DataArray 'y' ()>
 array(26.39699462),
 <xarray.DataArray 'y' ()>
 array(42.40497299),
 <xarray.DataArray 'x' ()>
 array(-145.8369948),
 <xarray.DataArray 'x' ()>
 array(-44.58787913))

## ISRIC Data
ISRIC World Soil Information.  
Data Hub: https://data.isric.org/geonetwork/srv/eng/catalog.search#/home

### SG_Absolute_depth_to_bedrock
> BDTICM_M_250m_ll_b1 = Absolute Depth to Bedrock

Search for "Absolute Depth to Bedrock" on the Data Hub, then download the GeoTIFF.

In [19]:
n = 'SG_Absolute_depth_to_bedrock'
add_download(n, 'https://files.isric.org/soilgrids'
                '/former/2017-03-10/data/BDTICM_M_250m_ll.tif')

The file is about 8 GB and the download speed is ~200 KB/s, so the estimated download time is ~12 hrs.

Whenever the laptop is open, the download is resumed with
```bash
wget -c ...
```
in a GNU screen session.
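The download-time estimate can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 GB = 1e9 bytes, 1 KB = 1e3 bytes), which the note does not specify; it gives roughly 11 hours, in line with the ~12 hr figure.

```python
size_bytes = 8e9    # "about 8 GB", assuming 1 GB = 1e9 bytes
speed_bps = 200e3   # "~200 KB/sec", assuming 1 KB = 1e3 bytes

hours = size_bytes / speed_bps / 3600
print(f'{hours:.1f} h')
```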

In [None]:
# with requests.get(url, stream=True) as r:
#     r.raise_for_status()
#     with open(f'../data/{os.path.basename(url)}', 'wb') as f:
#         for chunk in r.iter_content(chunk_size=2**13):
#             f.write(chunk)
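The commented-out `requests` loop restarts from byte zero each time. A rough Python counterpart to `wget -c` is sketched below using only the standard library; `range_header` and `download_resumable` are hypothetical helpers, and the sketch assumes the server supports HTTP `Range` requests.

```python
import os
import urllib.request


def range_header(dest):
    """Return a Range header that skips the bytes already saved at dest."""
    done = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {'Range': f'bytes={done}-'} if done else {}


def download_resumable(url, dest, chunk_size=2**20):
    """Append the remaining bytes of url to dest."""
    req = urllib.request.Request(str(url), headers=range_header(dest))
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honoured the Range request; any other
        # success status means it sent the whole file, so start over.
        mode = 'ab' if resp.status == 206 else 'wb'
        with open(dest, mode) as f:
            while chunk := resp.read(chunk_size):
                f.write(chunk)
```

If the local file is already complete, a server will typically answer with a 416 error, which `urlopen` raises as an `HTTPError`; a fuller version would catch that case.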

## WCS_Human_Footprint_2009
> Human Footprint 2009

http://wcshumanfootprint.org/

### Full dataset

To unpack the full dataset download:
```
$ unzip doi_10.5061_dryad.052q5__v2.zip
$ brew install p7zip
$ 7za x HumanFootprintv2.7z
```

In [13]:
path = '../data/Dryadv3/Maps'
ns = [f'{path}/{n}' for n in os.listdir(path) if n.endswith('.tif')]
sum([os.path.getsize(n) for n in ns]) / 1e9

3.296027813

In [35]:
%%time

da_fullset = rioxarray.open_rasterio(
    '../data/Dryadv3/Maps/HFP2009.tif', chunks={'x':10, 'y':10})

CPU times: user 15.5 s, sys: 37.4 s, total: 53 s
Wall time: 1min 8s


In [None]:
%%time

(da_fullset - da).sum()

CPU times: user 34.3 s, sys: 47.6 s, total: 1min 21s
Wall time: 1min 36s


### Summary 2009

In [4]:
n = 'HFP2009'
add_download(n, 'http://wcshumanfootprint.org/data/HFP2009.zip')

In [5]:
url = DATA_SRC[n].download
! curl {url} -o ../data/{url.name}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  254M  100  254M    0     0  12.4M      0  0:00:20  0:00:20 --:--:-- 14.6M


Unpack the downloaded `HFP2009.zip`.
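The unpacking step can also be done from Python. This is a minimal sketch with the standard-library `zipfile` module; `unpack` is a hypothetical helper, and it assumes the archive holds `HFP2009.tif` at its top level so that the result matches the `../data/HFP2009/HFP2009.tif` path used in the next cell.

```python
import zipfile
from pathlib import Path


def unpack(zip_path, dest_root=None):
    """Extract zip_path into <dest_root>/<archive stem>/ and return that folder."""
    zip_path = Path(zip_path)
    dest = Path(dest_root or zip_path.parent) / zip_path.stem
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest


# Example call (path as used in the notebook):
# unpack('../data/HFP2009.zip')  # creates ../data/HFP2009/
```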

In [33]:
da = rioxarray.open_rasterio(
    '../data/HFP2009/HFP2009.tif', chunks={'x':10, 'y':10})

In [34]:
da

| | Array | Chunk |
|---|---|---|
| Bytes | 2.36 GB | 400 B |
| Shape | (1, 16382, 36081) | (1, 10, 10) |
| Count | 5915152 Tasks | 5915151 Chunks |
| Type | float32 | numpy.ndarray |


## WorldClim

https://www.worldclim.org/data/index.html 

### Precipitation

In [44]:
n = 'WorldClimV2_precipitation'
add_download(n, 'http://biogeo.ucdavis.edu/data/'
                'worldclim/v2.1/base/wc2.1_30s_prec.zip')

In [46]:
url = DATA_SRC[n].download
! wget {url} -O ../data/{url.name}

--2021-02-03 15:47:31--  http://biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_30s_prec.zip
Resolving biogeo.ucdavis.edu (biogeo.ucdavis.edu)... 128.120.228.172
Connecting to biogeo.ucdavis.edu (biogeo.ucdavis.edu)|128.120.228.172|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_30s_prec.zip [following]
--2021-02-03 15:47:32--  https://biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_30s_prec.zip
Connecting to biogeo.ucdavis.edu (biogeo.ucdavis.edu)|128.120.228.172|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://data.biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_30s_prec.zip [following]
--2021-02-03 15:47:33--  https://data.biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_30s_prec.zip
Resolving data.biogeo.ucdavis.edu (data.biogeo.ucdavis.edu)... 128.120.228.172
Connecting to data.biogeo.ucdavis.edu (data.biogeo.ucdavis.edu)|128.120.228.172|:443... conn

In [47]:
%cd ../data

! mkdir {n}
! mv {url.name} {n}/.
%cd {n}

! unzip {url.name}

%cd ../../nbs/

/Users/jack/git_repos/earthshotsoil/data


In [63]:
da = rioxarray.open_rasterio(
    '../data/WorldClimV2_precipitation/wc2.1_30s_prec_01.tif')

In [64]:
da

# Google Earth Engine Access

In [47]:
df = pd.DataFrame.from_dict(DATA_SRC).T.reset_index()
df.rename({'index':'asset_id'}, axis=1, inplace=True)

df = df[df.asset_id.notna() & df.user_gee.notna()]
df[['user_gee', 'asset_id']].reset_index(drop=True)

| | user_gee | asset_id |
|---|---|---|
| 0 | bingosaucer | EarthEnvTopoMed_1stOrderPartialDerivEW |
| 1 | bingosaucer | EarthEnvTopoMed_1stOrderPartialDerivNS |
| 2 | bingosaucer | EarthEnvTopoMed_2ndOrderPartialDerivEW |
| 3 | bingosaucer | EarthEnvTopoMed_2ndOrderPartialDerivNS |
| 4 | bingosaucer | EarthEnvTopoMed_AspectCosine |
| 5 | bingosaucer | EarthEnvTopoMed_AspectSine |
| 6 | bingosaucer | EarthEnvTopoMed_Eastness |
| 7 | bingosaucer | EarthEnvTopoMed_Elevation |
| 8 | bingosaucer | EarthEnvTopoMed_Northness |
| 9 | bingosaucer | EarthEnvTopoMed_ProfileCurvature |
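The `user_gee` and `asset_id` columns could be combined into an Earth Engine asset path. The sketch below assumes the legacy `users/<user>/<asset>` layout, which this notebook does not confirm; `gee_asset_path` is a hypothetical helper.

```python
def gee_asset_path(user_gee, asset_id):
    """Build a legacy-style asset path (assumed layout: users/<user>/<asset>)."""
    return f'users/{user_gee}/{asset_id}'


# Two rows copied from the table above.
rows = [
    ('bingosaucer', 'EarthEnvTopoMed_Elevation'),
    ('bingosaucer', 'EarthEnvTopoMed_Northness'),
]
paths = [gee_asset_path(user, asset) for user, asset in rows]
print(paths)
```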


# Reference

Reading GeoTIFF:
- http://xarray.pydata.org/en/stable/io.html#rasterio
- http://xarray.pydata.org/en/stable/generated/xarray.open_rasterio.html#xarray-open-rasterio