# Comparing spatial pattern of velocity response to forcing

In this notebook, we'll use the `iceutils` package (Bryan Riel) to invert for continuous, time-varying surface velocity fields on Helheim Glacier.  We'll then process several observational datasets (gathered by Denis Felikson) using the `nifl` module (Lizz Ultee) and compare time series of these variables against surface velocity at several points.  Finally, we'll visualize spatial differences in the relationship between surface velocity and each hypothesised forcing variable.

### Import the necessary packages

In [None]:
from netCDF4 import Dataset
from scipy import interpolate
import pyproj
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import iceutils as ice
import nifl_helper as nifl

### Define where the necessary data lives

In [None]:
flowline_fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/Felikson-flowlines/netcdfs/glaciera199.nc'
velocity_fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/Gld-Stack/'
gl_bed_fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/BedMachine-Greenland/BedMachineGreenland-2017-09-20.nc'
gl_smb_fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/HIRHAM5-SMB/DMI-HIRHAM5_GL2_ERAI_1980_2016_SMB_MM.nc'
catchment_smb_fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/Helheim-processed/HIRHAM_integrated_SMB.csv'
runoff_fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/Helheim-processed/RACMO2_3p2_Helheimgletscher_runoff_1958-2017.csv'
termini_fpath = '/Users/lizz/Documents/GitHub/Data_unsynced/Helheim-processed/HLM_terminus_widthAVE.csv'

### Define the domain of analysis

We will analyse along flowlines defined by Denis Felikson in his previous work, saved and shared as NetCDF files.  The flowlines are numbered 01-10 across the terminus; flowline 05 is close to the middle.  Note that Helheim Glacier has two large branches.  For now we'll study the main trunk, `glaciera199.nc`.  The more southerly branch is `glacierb199.nc`.

In [None]:
ncfile = Dataset(flowline_fpath, 'r')
xh = ncfile['flowline05'].variables['x'][:]
yh = ncfile['flowline05'].variables['y'][:]
# s = ncfile['flowline05']['geometry']['surface']['GIMP']['nominal'].variables['h'][:] # GIMP DEM
# b = ncfile['flowline05']['geometry']['bed']['BedMachine']['nominal'].variables['h'][:] # BedMachine v3
# dh = ncfile['flowline05']['dh']['GIMP-Arctic']['nominal']['dh'][:]
ncfile.close()

In [None]:
## Define points at which to extract
upstream_max = 500 # index of last xh,yh within given distance of terminus--pts roughly 50m apart
xys = [(xh[i], yh[i]) for i in range(0, upstream_max, 25)]
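
As a quick check on the sampling above, here is a minimal sketch that computes along-flowline distance and the spacing of every 25th point.  The coordinates are synthetic stand-ins (a straight line sampled every 50 m, matching the spacing assumed in the comment above); `x_demo`, `y_demo` and the other names are hypothetical and are not read from the Felikson files.

```python
import numpy as np

# Hypothetical flowline: a straight line sampled every 50 m (stand-in for xh, yh)
x_demo = 300000.0 - 50.0 * np.arange(500)
y_demo = np.full(500, -2575000.0)

# Cumulative along-flowline distance from the first point
segment_lengths = np.hypot(np.diff(x_demo), np.diff(y_demo))
distance = np.concatenate(([0.0], np.cumsum(segment_lengths)))

# Same subsampling as in the notebook: every 25th point up to index 500
sample_idx = np.arange(0, 500, 25)
sample_distance = distance[sample_idx]
print(len(sample_idx), np.diff(sample_distance)[0], sample_distance[-1])
```

Under the 50 m assumption this gives 20 points about 1.25 km apart, reaching just under 24 km from the first point.  Running the same calculation on the real `xh`, `yh` would show how uniform the actual spacing is.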

## Import and invert velocity observations

In [None]:
## Set up combined hdf5 stack
hel_stack = ice.MagStack(files=[velocity_fpath+'vx.h5', velocity_fpath+'vy.h5'])
data_key = 'igram' # B. Riel convention for access to datasets in hdf5 stack

In [None]:
# Create an evenly spaced time array for time series predictions
t_grid = np.linspace(hel_stack.tdec[0], hel_stack.tdec[-1], 1000)

# First convert the time vectors to a list of datetime
dates = ice.tdec2datestr(hel_stack.tdec, returndate=True)
dates_grid = ice.tdec2datestr(t_grid, returndate=True)

# Build the collection
collection = nifl.build_collection(dates)

# Construct a priori covariance
Cm = nifl.computeCm(collection)
iCm = np.linalg.inv(Cm)

# Instantiate a model for inversion
model = ice.tseries.Model(dates, collection=collection)

# Instantiate a model for prediction
model_pred = ice.tseries.Model(dates_grid, collection=collection)

## Access the design matrix for plotting
G = model.G

# Create lasso regression solver that does the following:
# i) Uses an a priori covariance matrix for damping out the B-splines
# ii) Uses sparsity-enforcing regularization (lasso) on the integrated B-splines
solver = ice.tseries.select_solver('lasso', reg_indices=model.itransient, penalty=0.05,
                                   rw_iter=1, regMat=iCm)
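
The time vectors used above (`hel_stack.tdec`, `t_grid`) are in decimal years.  As a minimal sketch of what such a conversion can look like, the helper below maps a `datetime` to a decimal year by the fraction of the calendar year elapsed.  `to_decimal_year` is a hypothetical name, and the convention used inside `iceutils` may differ in detail.

```python
from datetime import datetime

def to_decimal_year(d):
    """Convert a datetime to a decimal year (fraction of the calendar year elapsed)."""
    year_start = datetime(d.year, 1, 1)
    next_year_start = datetime(d.year + 1, 1, 1)
    # Dividing two timedeltas yields a float between 0 and 1
    return d.year + (d - year_start) / (next_year_start - year_start)

print(to_decimal_year(datetime(2006, 1, 1)))      # start of the year
print(to_decimal_year(datetime(2006, 7, 2, 12)))  # midpoint of a 365-day year
```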

Now that we are set up with our data and machinery, we'll ask the inversion to make us a continuous time series of velocity at each point we wish to study.

In [None]:
preds = []
for xy in xys:
    pred, st, lt = nifl.VSeriesAtPoint(xy, vel_stack=hel_stack, collection=collection, 
                                  model=model, model_pred=model_pred, solver=solver, 
                                  t_grid=t_grid, sigma=1.5, data_key='igram')
    preds.append(pred)

## Comparison data sets

### Bed topography

Mostly we will use this for plotting and for defining a standard coordinate system.  However, future analyses could combine bed topography with calving position or other variables to analyse their effect on surface velocity.

In [None]:
## Read in and interpolate BedMachine topography
fh = Dataset(gl_bed_fpath, mode='r')
xx = fh.variables['x'][:].copy() #x-coord (polar stereo (70, 45))
yy = fh.variables['y'][:].copy() #y-coord
s_raw = fh.variables['surface'][:].copy() #surface elevation
h_raw = fh.variables['thickness'][:].copy() # gridded thickness
b_raw = fh.variables['bed'][:].copy() # bed topo
thick_mask = fh.variables['mask'][:].copy()
# mask values: 0=ocean, 1=ice-free land, 2=grounded ice, 3=floating ice, 4=non-Greenland land
ss = np.ma.masked_where(thick_mask != 2, s_raw)
hh = np.ma.masked_where(thick_mask != 2, h_raw)
bb = b_raw #don't mask, to allow bed sampling from modern bathymetry (was subglacial in ~2006)
fh.close()

In [None]:
## Interpolate in area of Helheim
xl, xr = 6100, 6600
yt, yb = 12700, 13100
x_hel = xx[xl:xr]
y_hel = yy[yt:yb]
s_hel = ss[yt:yb, xl:xr]
b_hel = bb[yt:yb, xl:xr]
S_helheim = interpolate.RectBivariateSpline(x_hel, y_hel[::-1], s_hel.T[::,::-1]) #interpolating surface elevation provided
B_helheim = interpolate.RectBivariateSpline(x_hel, y_hel[::-1], b_hel.T[::,::-1]) #interpolating bed elevation provided
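
The reversals in the cell above are needed because `RectBivariateSpline` expects both coordinate arrays in strictly ascending order, while the gridded y-coordinate here is stored in descending order.  The sketch below illustrates the same flip-and-transpose pattern on a small synthetic grid; the arrays `x`, `y`, `z` are hypothetical and chosen so the expected value is easy to verify by hand.

```python
import numpy as np
from scipy import interpolate

x = np.linspace(0.0, 4.0, 5)   # ascending x-coordinate
y = np.linspace(3.0, 0.0, 4)   # descending y-coordinate, as in the gridded data
z = np.add.outer(y, 2.0 * x)   # z[j, i] = y[j] + 2*x[i], shape (ny, nx)

# Reverse y so it ascends, transpose z to (nx, ny), and reverse its y-axis to match
spline = interpolate.RectBivariateSpline(x, y[::-1], z.T[:, ::-1])

value = spline(1.5, 2.25)[0, 0]  # should reproduce 2.25 + 2*1.5
print(value)
```

Because the test surface is linear, the spline should reproduce it to within rounding error, which makes an axis mix-up easy to spot.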

### Surface mass balance

We load in surface mass balance from HIRHAM, cut out an area around Helheim, re-project it to Polar Stereographic North coordinates (to match our other data), and interpolate 2D fields for each year so that we can select surface mass balance at our points.

In [None]:
##Load in HIRHAM
fh2 = Dataset(gl_smb_fpath, mode='r')
x_lon = fh2.variables['lon'][:].copy() #x-coord (latlon)
y_lat = fh2.variables['lat'][:].copy() #y-coord (latlon)
ts = fh2.variables['time'][:].copy()
smb_raw = fh2.variables['smb'][:].copy()
fh2.close()

In [None]:
## Select Helheim
xl1, xr1 = 190, 260
yt1, yb1 = 345, 405
x_lon_h = x_lon[yt1:yb1, xl1:xr1]
y_lat_h = y_lat[yt1:yb1, xl1:xr1]

# WGS84 lat/lon (used by HIRHAM) -> Polar Stereographic North, EPSG:3413 (used by BedMachine, Felikson lines, ...)
# Transformer replaces the deprecated "+init=" syntax and pyproj.transform; always_xy=True keeps (lon, lat) input order
to_psn = pyproj.Transformer.from_crs("EPSG:4326", "EPSG:3413", always_xy=True)
xs, ys = to_psn.transform(x_lon_h, y_lat_h)
Xmat, Ymat = np.meshgrid(x_hel, y_hel) # BedMachine coords from helheim-profiles

In [None]:
## Timeslice-specific SMB functions 2006-2016
SMB_dict = {} #set up a dictionary of surface mass balance fields indexed by date
time_indices = range(311, 444) # indices into the monthly HIRHAM series, intended to span Jan 2006 to Dec 2016
smb_dates = pd.date_range(start='2006-01-01', end='2016-12-01', periods=len(time_indices))
smb_d = [d.utctimetuple() for d in smb_dates]
dates_interp = [ice.timeutils.datestr2tdec(d[0], d[1], d[2]) for d in smb_d]
for t,d in zip(time_indices, smb_dates):
    smb_t = smb_raw[t][0][::-1, ::][yt1:yb1, xl1:xr1]
    regridded_smb_t = interpolate.griddata((xs.ravel(), ys.ravel()), smb_t.ravel(), (Xmat, Ymat), method='nearest')
    SMB_dict[d] = interpolate.interp2d(x_hel, y_hel, regridded_smb_t, kind='linear')   
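
The regridding step above uses nearest-neighbour lookup to move values from the coarse, re-projected model grid onto the finer BedMachine grid.  Here is a minimal sketch of that step on a tiny synthetic example; the source points, values and target grid are hypothetical.

```python
import numpy as np
from scipy import interpolate

# Coarse "model" grid, treated as scattered points after re-projection (hypothetical values)
src_x, src_y = np.meshgrid([0.0, 10.0], [0.0, 10.0])
src_vals = np.array([[1.0, 2.0],
                     [3.0, 4.0]])

# Finer regular target grid
tgt_x, tgt_y = np.meshgrid(np.linspace(0.0, 10.0, 6), np.linspace(0.0, 10.0, 6))

# Each target node takes the value of the closest source point
regridded = interpolate.griddata((src_x.ravel(), src_y.ravel()), src_vals.ravel(),
                                 (tgt_x, tgt_y), method='nearest')
print(regridded)
```

With `method='nearest'` the output contains only values present in the source field, so the regridded field is blocky rather than smooth.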

Now, we compute the normalized cross-correlation between single-point SMB and surface velocity at the same point.  We will draw on the inverted velocity series saved in `preds` above.  We save the value of the maximum normalized cross-correlation, and the value in days of the lag where it occurs, to compare with other variables later.

In [None]:
smb_corr_max = []
smb_lag_max = []
for xy, pred in zip(xys, preds):
    corr, lags, ci = nifl.SmbXcorr(xy, smb_dictionary=SMB_dict, smb_dates=smb_dates, 
                              velocity_pred=pred, t_grid=t_grid, diff=1, normalize=True)
    smb_corr_max.append(max(corr))
    smb_lag_max.append(lags[np.argmax(corr)])
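
To make the quantities saved above concrete, here is a minimal NumPy sketch of a normalized cross-correlation and its peak lag.  It is not the `nifl.SmbXcorr` implementation, whose internals are not shown in this notebook; `normalized_xcorr` and the synthetic series are hypothetical, and the lag here is in samples rather than days.

```python
import numpy as np

def normalized_xcorr(a, b):
    """Cross-correlate two equal-length series after standardizing them.

    Returns (corr, lags); a positive lag at the peak means `a` lags behind `b`.
    """
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    corr = np.correlate(a, b, mode='full')
    lags = np.arange(-(len(b) - 1), len(a))
    return corr, lags

rng = np.random.default_rng(0)
forcing = rng.standard_normal(200)   # synthetic forcing series
response = np.roll(forcing, 3)       # synthetic response, delayed by 3 samples

corr, lags = normalized_xcorr(response, forcing)
peak_corr = corr.max()
peak_lag = lags[np.argmax(corr)]
print(peak_corr, peak_lag)
```

The notebook's calls also pass `diff=1`, which suggests the series are differenced before correlation; applying `np.diff` to both inputs first would be one way to mimic that here.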

### Catchment-integrated SMB

We load in a 1D timeseries of surface mass balance integrated over the whole Helheim catchment, with catchment defined following Mankoff.

In [None]:
smb = pd.read_csv(catchment_smb_fpath, parse_dates=[0])
smb_d = [d.utctimetuple() for d in smb['Date']]
smb_d_interp = [ice.timeutils.datestr2tdec(d[0], d[1], d[2]) for d in smb_d]
smb_func = interpolate.interp1d(smb_d_interp, smb['SMB_int'])

We compute the normalized cross-correlation between catchment-integrated SMB and surface velocity at each point.  Again we save the value of the maximum normalized cross-correlation, and the value in days of the lag where it occurs, to compare with other variables.

In [None]:
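# NOTE: these names are reused from the single-point SMB cell above, so running this cell overwrites those results
smb_corr_max = []
smb_lag_max = []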
for xy, pred in zip(xys, preds):
    corr, lags, ci = nifl.RunoffXcorr(xy, runoff_func=smb_func, runoff_dates=smb_d_interp, 
                              velocity_pred=pred, t_grid=t_grid, diff=1, normalize=True)
    smb_corr_max.append(max(corr))
    smb_lag_max.append(lags[np.argmax(corr)])

### Runoff

We import monthly runoff from the RACMO model, integrated over the Helheim catchment and shared as a CSV by Denis Felikson.  Because this data is catchment-integrated, we interpolate a single 1D time series that will be used at all points.

In [None]:
runoff = np.loadtxt(runoff_fpath, delimiter=',') 
rnf = runoff[runoff[:,0]>=2006] # trim values from before the start of the velocity series
rf = rnf[rnf[:,0]<=2016] #trim values after the end of the velocity series

runoff_dates = pd.date_range(start='2006-01-01', end='2016-12-01', periods=len(rf))
runoff_d = [d.utctimetuple() for d in runoff_dates]
d_interp = [ice.timeutils.datestr2tdec(d[0], d[1], d[2]) for d in runoff_d]
runoff_func = interpolate.interp1d(d_interp, rf[:,2])
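
As a small illustration of the interpolation step above, the sketch below builds a linear interpolant from a monthly series and evaluates it on a finer time grid.  The series is synthetic and the variable names are hypothetical.

```python
import numpy as np
from scipy import interpolate

# Two years of synthetic monthly values on a decimal-year axis
t_monthly = 2006.0 + np.arange(24) / 12.0
runoff_demo = np.maximum(0.0, np.sin(2 * np.pi * (t_monthly - 0.25)))

# Linear interpolant, as in the notebook (interp1d defaults to kind='linear')
runoff_demo_func = interpolate.interp1d(t_monthly, runoff_demo)

# Evaluate on a finer grid that stays inside the data range
t_fine = np.linspace(t_monthly[0], t_monthly[-1], 200)
runoff_fine = runoff_demo_func(t_fine)
print(runoff_fine.shape)
```

By default `interp1d` raises a `ValueError` for times outside the range of the input data, so any evaluation grid has to fall within the span of the trimmed series.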

We compute the normalized cross-correlation between catchment-integrated runoff and surface velocity at each point.  Again we save the value of the maximum normalized cross-correlation, and the value in days of the lag where it occurs, to compare with other variables.

In [None]:
runoff_corr_max = []
runoff_lag_max = []
for xy, pred in zip(xys, preds):
    corr, lags, ci = nifl.RunoffXcorr(xy, runoff_func=runoff_func, runoff_dates=d_interp, 
                              velocity_pred=pred, t_grid=t_grid, diff=1, normalize=True)
    runoff_corr_max.append(max(corr))
    runoff_lag_max.append(lags[np.argmax(corr)])

### Terminus position change

We import width-averaged terminus position change processed by Leigh Stearns.  These data give terminus position in km from a baseline, so they do not need to be processed into a coordinate system.

In [None]:
termini = pd.read_csv(termini_fpath, parse_dates=True, usecols=[0,1])
termini['date'] = pd.to_datetime(termini['date'])
trmn = termini.loc[termini['date'].dt.year >= 2006]
tm = trmn.loc[trmn['date'].dt.year <=2016]

termini_d = [d.utctimetuple() for d in tm['date']]
tm_d_interp = [ice.timeutils.datestr2tdec(d[0], d[1], d[2]) for d in termini_d]
termini_func = interpolate.interp1d(tm_d_interp, tm['term_km'])
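
The two-step year trimming above can also be written as a single mask.  A minimal sketch on a synthetic table follows; the column names mirror those used above, but the values are invented for illustration.

```python
import pandas as pd

# Synthetic terminus record with dates before, inside, and after the study window
demo = pd.DataFrame({
    'date': pd.to_datetime(['2005-06-01', '2006-01-15', '2010-07-01',
                            '2016-12-31', '2017-01-01']),
    'term_km': [0.0, 0.4, -1.2, -2.0, -2.1],
})

# Series.between is inclusive at both ends by default
in_window = demo['date'].dt.year.between(2006, 2016)
trimmed = demo.loc[in_window]
print(trimmed)
```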

In [None]:
terminus_corr_max = []
terminus_lag_max = []
for xy, pred in zip(xys, preds):
    corr, lags, ci = nifl.RunoffXcorr(xy, runoff_func=termini_func, runoff_dates=tm_d_interp, 
                              velocity_pred=pred, t_grid=t_grid, diff=1, normalize=True)
    terminus_corr_max.append(max(corr))
    terminus_lag_max.append(lags[np.argmax(corr)])

## Plotting

First, we plot the max correlation at each point for a single variable.

In [None]:
fig, ax = plt.subplots(1)
ax.contourf(x_hel, y_hel, b_hel, cmap='gist_earth', alpha=0.5)
sc = ax.scatter(np.asarray(xys)[:,0], np.asarray(xys)[:,1], c=smb_corr_max)
cb = fig.colorbar(sc, ax=ax)
cb.ax.set_title('Max. xcorr')
ax.set(xlim=(270000, 320000), xticks=(280000, 300000, 320000), 
      ylim=(-2590000, -2550000), yticks=(-2590000, -2570000, -2550000), 
       xticklabels=('280', '300', '320'), yticklabels=('-2590', '-2570', '-2550'),
      xlabel='Easting [km]', ylabel='Northing [km]')
plt.show()

Now, let's compare the patterns of correlation and lag for each variable.

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
ax1.contourf(x_hel, y_hel, b_hel, cmap='gist_earth', alpha=0.5)
sc1 = ax1.scatter(np.asarray(xys)[:,0], np.asarray(xys)[:,1], c=smb_corr_max)
cb1 = fig.colorbar(sc1, ax=ax1)
cb1.ax.set_title('Max. xcorr')
ax1.set(xlim=(270000, 320000), xticks=(280000, 300000, 320000), 
      ylim=(-2590000, -2550000), yticks=(-2590000, -2570000, -2550000), 
       xticklabels=('280', '300', '320'), yticklabels=('-2590', '-2570', '-2550'),
      xlabel='Easting [km]', ylabel='Northing [km]', title='Surface mass balance')
ax2.contourf(x_hel, y_hel, b_hel, cmap='gist_earth', alpha=0.5)
sc2 = ax2.scatter(np.asarray(xys)[:,0], np.asarray(xys)[:,1], c=runoff_corr_max)
cb2 = fig.colorbar(sc2, ax=ax2)
cb2.ax.set_title('Max. xcorr')
ax2.set(xlim=(270000, 320000), xticks=(280000, 300000, 320000), 
      ylim=(-2590000, -2550000), yticks=(-2590000, -2570000, -2550000), 
       xticklabels=('280', '300', '320'), yticklabels=('-2590', '-2570', '-2550'),
      xlabel='Easting [km]', ylabel='Northing [km]', title='Catchment-integrated runoff')
ax3.contourf(x_hel, y_hel, b_hel, cmap='gist_earth', alpha=0.5)
sc3 = ax3.scatter(np.asarray(xys)[:,0], np.asarray(xys)[:,1], c=terminus_corr_max)
cb3 = fig.colorbar(sc3, ax=ax3)
cb3.ax.set_title('Max. xcorr')
ax3.set(xlim=(270000, 320000), xticks=(280000, 300000, 320000), 
      ylim=(-2590000, -2550000), yticks=(-2590000, -2570000, -2550000), 
       xticklabels=('280', '300', '320'), yticklabels=('-2590', '-2570', '-2550'),
      xlabel='Easting [km]', ylabel='Northing [km]', title='Terminus position')
plt.subplots_adjust(right=2.0)
plt.show()

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
ax1.contourf(x_hel, y_hel, b_hel, cmap='gist_earth', alpha=0.5)
sc1 = ax1.scatter(np.asarray(xys)[:,0], np.asarray(xys)[:,1], c=smb_lag_max)
cb1 = fig.colorbar(sc1, ax=ax1)
cb1.ax.set_title('Lag [days] at peak xcorr')
ax1.set(xlim=(270000, 320000), xticks=(280000, 300000, 320000), 
      ylim=(-2590000, -2550000), yticks=(-2590000, -2570000, -2550000), 
       xticklabels=('280', '300', '320'), yticklabels=('-2590', '-2570', '-2550'),
      xlabel='Easting [km]', ylabel='Northing [km]', title='Surface mass balance')
ax2.contourf(x_hel, y_hel, b_hel, cmap='gist_earth', alpha=0.5)
sc2 = ax2.scatter(np.asarray(xys)[:,0], np.asarray(xys)[:,1], c=runoff_lag_max)
cb2 = fig.colorbar(sc2, ax=ax2)
cb2.ax.set_title('Lag [days] at peak xcorr')
ax2.set(xlim=(270000, 320000), xticks=(280000, 300000, 320000), 
      ylim=(-2590000, -2550000), yticks=(-2590000, -2570000, -2550000), 
       xticklabels=('280', '300', '320'), yticklabels=('-2590', '-2570', '-2550'),
      xlabel='Easting [km]', ylabel='Northing [km]', title='Catchment-integrated runoff')
ax3.contourf(x_hel, y_hel, b_hel, cmap='gist_earth', alpha=0.5)
sc3 = ax3.scatter(np.asarray(xys)[:,0], np.asarray(xys)[:,1], c=terminus_lag_max)
cb3 = fig.colorbar(sc3, ax=ax3)
cb3.ax.set_title('Lag [days] at peak xcorr')
ax3.set(xlim=(270000, 320000), xticks=(280000, 300000, 320000), 
      ylim=(-2590000, -2550000), yticks=(-2590000, -2570000, -2550000), 
       xticklabels=('280', '300', '320'), yticklabels=('-2590', '-2570', '-2550'),
      xlabel='Easting [km]', ylabel='Northing [km]', title='Terminus position')
plt.subplots_adjust(right=2.0)
plt.show()
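
The panels above repeat the same scatter, colorbar and axis setup for each variable.  One way to cut the repetition is a small helper, sketched below with synthetic points and values.  `plot_point_values`, `demo_points` and `demo_values` are hypothetical names, and the bed-topography background (`contourf`) is left out to keep the example self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_point_values(ax, points, values, cbar_title, panel_title):
    """Scatter `values` at `points` (x, y in metres) and attach a titled colorbar."""
    pts = np.asarray(points)
    sc = ax.scatter(pts[:, 0], pts[:, 1], c=values)
    cb = ax.figure.colorbar(sc, ax=ax)
    cb.ax.set_title(cbar_title)
    ax.set(xlim=(270000, 320000), xticks=(280000, 300000, 320000),
           ylim=(-2590000, -2550000), yticks=(-2590000, -2570000, -2550000),
           xticklabels=('280', '300', '320'), yticklabels=('-2590', '-2570', '-2550'),
           xlabel='Easting [km]', ylabel='Northing [km]', title=panel_title)
    return sc

# Synthetic stand-ins for xys and the three lists of per-point results
demo_points = [(280000 + 2000 * i, -2575000 + 500 * i) for i in range(10)]
demo_values = {
    'Surface mass balance': np.linspace(0.1, 0.9, 10),
    'Catchment-integrated runoff': np.linspace(0.9, 0.1, 10),
    'Terminus position': np.full(10, 0.5),
}

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (panel_title, vals) in zip(axes, demo_values.items()):
    plot_point_values(ax, demo_points, vals, 'Max. xcorr', panel_title)
```

The same helper could serve the lag panels by passing the lag lists and `'Lag [days] at peak xcorr'` as the colorbar title.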

## Annual chunks to compare changing seasonal cycle

We break signals into annual subsets and compute the cross-correlation signal for each single year of data.

In [None]:
rf_annual_corrs = []
rf_annual_lags = []
rf_annual_ci = []

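point_to_plot = 5 # index into xys and preds of the point to analyse
date_chks = range(2009, 2017) # year boundaries; consecutive pairs delimit the one-year windows 2009 through 2015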
for i in range(len(date_chks)-1):
    snippet = rf[rf[:,0]>=date_chks[i]]
    snpt = snippet[snippet[:,0]<date_chks[i+1]]
    d_chk = [d for d in d_interp if (d>=date_chks[i] and d<=date_chks[i+1])]
#     t_chk = np.asarray([t for t in t_grid if (t>=date_chks[i] and t<date_chks[i+1])])
#     t_chk = t_grid[t_grid>=date_chks[i]]
#     snpt_func = interpolate.interp1d(d_chk, snpt[:,2])
    corr, lags, ci = nifl.Xcorr1D(xys[point_to_plot], series_func=runoff_func, series_dates=d_interp, 
                              velocity_pred=preds[point_to_plot], t_grid=t_grid, t_limits=(date_chks[i], date_chks[i+1]),
                                  diff=1, normalize=True)
    rf_annual_corrs.append(corr)
    rf_annual_lags.append(lags)
    rf_annual_ci.append(ci)
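
The annual subsetting can be pictured as a set of boolean masks on the decimal-year time axis.  Below is a minimal sketch with a synthetic monthly series; `t_demo`, `values_demo` and `chunks` are hypothetical names, and the windows are half-open so that no sample falls into two years.

```python
import numpy as np

# Synthetic monthly series on a decimal-year axis, Jan 2009 through Dec 2016
t_demo = 2009.0 + np.arange(96) / 12.0
values_demo = np.sin(2 * np.pi * t_demo)  # idealized seasonal cycle

year_edges = range(2009, 2017)  # same boundaries as date_chks above
chunks = {}
for start, stop in zip(year_edges[:-1], year_edges[1:]):
    in_year = (t_demo >= start) & (t_demo < stop)  # half-open window [start, stop)
    chunks[start] = (t_demo[in_year], values_demo[in_year])

print({year: len(t) for year, (t, _) in chunks.items()})
```

With these boundaries there are seven windows, 2009 through 2015, each holding twelve monthly samples.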

In [None]:
# fig, axs = plt.subplots(len(rf_annual_corrs))
# for j in range(len(rf_annual_corrs)):
#     axs[j].plot(rf_annual_lags[j], rf_annual_corrs[j])
#     axs[j].plot(rf_annual_lags[j], rf_annual_ci[j], ls=':', color='k')
#     axs[j].plot(rf_annual_lags[j], -1*np.array(rf_annual_ci[j]), ls=':', color='k')

for j in range(len(rf_annual_corrs)):
    fig, ax = plt.subplots(1)
    ax.axvline(x=0, ls='-', color='k', alpha=0.5)
    ax.axhline(y=0, ls='-', color='k', alpha=0.5)
    ax.plot(rf_annual_lags[j], rf_annual_corrs[j])
    ax.plot(rf_annual_lags[j], rf_annual_ci[j], ls=':', color='k')
    ax.plot(rf_annual_lags[j], -1*np.array(rf_annual_ci[j]), ls=':', color='k')
    ax.set(ylim=(-1,1), title='Xcorr runoff-vel, {}'.format(date_chks[j]), 
           xlabel='Lag [days]', ylabel='xcorr')

Let's compare with the overall signal from the full period.

In [None]:
corr, lags, ci = nifl.Xcorr1D(xys[point_to_plot], series_func=runoff_func, series_dates=d_interp, 
                          velocity_pred=preds[point_to_plot], t_grid=t_grid, t_limits=(2009,2017), diff=1, normalize=True)
fig, ax = plt.subplots(1)
ax.axvline(x=0, ls='-', color='k', alpha=0.5)
ax.axhline(y=0, ls='-', color='k', alpha=0.5)
ax.plot(lags, corr)
ax.plot(lags, ci, ls=':', color='k')
ax.plot(lags, -1*np.array(ci), ls=':', color='k')
ax.set(ylim=(-1,1), title='Xcorr runoff-vel, 2009-2017', xlabel='Lag [days]', ylabel='xcorr')