## Andrew's Musings on the Jacobian/Sensitivity/H matrix.

In [None]:
######################################################################
#-- Read settings for location of data and set up, NO NEED TO CHANGE
######################################################################
orig_dir = getwd()
require(yaml,warn.conflicts = FALSE)
dat = yaml.load_file("../site_settings.yml")
Rcode_dir <- getwd()
data_dir = dat$global_paths$input_folder
output_dir = dat$global_paths$output_folder

#--  Load utility code file w/ setup()
source(file.path(Rcode_dir,"util_code_032024.R"))
setup()


#### Pieces of an inversion

Recall that there are several parts that go into an inversion:

- The state space $\mathbf{x}$, its prior mean $\mathbf{x}_0$, and prior covariance $\mathbf{S}_0$
- Observations $\mathbf{z}$, with error covariance $\mathbf{S}_z$
- Jacobian mapping state to obs: $\mathbf{H}$

We load and explore the Jacobian $\mathbf{H}$, also called the "sensitivity" matrix, below.
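To make the roles of these pieces concrete, here is a minimal toy sketch (not the workshop data) of how $\mathbf{H}$ maps a state vector to modeled observations. The names `H_toy`, `x0_toy`, and `z_model`, and all the numbers, are made up for illustration.

```r
# Toy forward model: 3 observations, 2 state elements (all values hypothetical)
H_toy <- matrix(c(0.5, 0.1,
                  0.2, 0.4,
                  0.0, 0.3),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("obs", 1:3), paste0("state", 1:2)))
x0_toy <- c(state1 = 2, state2 = 1)   # prior mean of the state

# Each modeled observation is a sensitivity-weighted sum of the state elements
z_model <- as.vector(H_toy %*% x0_toy)
names(z_model) <- rownames(H_toy)
print(z_model)
```

Each row of `H_toy` belongs to one observation and each column to one state element, which is the same layout as the full Jacobian explored here.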

#### Baseline Sensitivity Matrices (H and H^t)

These precalculated sensitivity matrices (the jacob object) give the sensitivity of 1,156,383 observations to the basis functions, which consist of 22 regions (11 land and 11 ocean) over 24 months. The jacob_bgd object holds the sensitivity of the observations to emission sources that will not be optimized here, in particular fire emissions (e.g. forest/grassland fires) and fossil fuel emissions.  At the end, we assign these objects to 'H' to match the notation used in the rest of the exercises and slides.

In [None]:
###############################################
#--  Load sensitivity matrices 
###############################################

d = load.ncdf(file.path(data_dir,"jacobians","trunc_full_jacob_030624_with_dimnames_sib4_4x5_mask.nc4"))

#-- Note: the 12/44 factor (molar mass of C over molar mass of CO2) is here because the
#-- forward runs from GEOS-Chem differed in whether mass was expressed as CO2 or as C
#-- Assign the jacob objects to H to match notation

H <- d$jacobian * 12/44
dimnames(H)[[1]] = d$observation.id
dimnames(H)[[2]] = d$pulse.id
rm(d)

###################################################################
#-- END END END ***Parent Directory and code for ALL inversions***
###################################################################

In [None]:
#- Load "catalog" of observations, subset to every 5th obs to match "short" Jacobians
load(file.path(data_dir,"obs/obs_catalog_042424_unit_pulse_hour_timestamp_witherrors_withdates.rda")) 

#how many obs of each type, and some sample observations locations/times
table(obs_catalog$TYPE)
print("Ten random rows of obs_catalog:")
obs_catalog[sample(1:dim(obs_catalog)[1],10),]


## Plot the "pulses" we are tracking 
There are 528 net CO2 flux (NEE) "pulses" that we look to track and adjust/optimize.  There are 22 different geographic regions (11 land and 11 ocean) and 24 months (Sept 2014 - Aug 2016, Month 1=Sept 2014, Month 2=Oct 2014, etc), i.e. 22*24 = 528. This function allows you to plot the pulses for various month and region combinations. The general structure of the function call is:

plot_base_pulse_flux(month=c(11),transcom_region=c(2))  # plot July 2015 for Transcom Region 2 (Temperate N.A.)
plot_base_pulse_flux(month=c(11,12),transcom_region=c(1,2))  # plot July 2015 and Aug 2015 for Transcom Regions 1 (Boreal N.A.) and 2 (Temperate N.A.)
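Since the month argument is an index rather than a calendar month, a small lookup can help avoid off-by-one mistakes. The sketch below assumes only the convention stated above (Month 1 = Sept 2014, 24 months total); the helper name `pulse_month_label` is hypothetical and is not part of the workshop utility code.

```r
# Hypothetical helper: convert a pulse month index (1..24) to a "YYYY-MM" label,
# using the convention that Month 1 is September 2014
pulse_month_label <- function(month_index) {
  stopifnot(all(month_index >= 1 & month_index <= 24))
  dates <- seq(as.Date("2014-09-01"), by = "month", length.out = 24)
  format(dates[month_index], "%Y-%m")
}

# First month, the two months used in the example calls, and the last month
pulse_month_label(c(1, 11, 12, 24))
```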

#### Notes

1) The "flux" here is constant over the entire month, i.e. a monthly average (not totally realistic).
2) The flux is not uniform over the entire geographic region; it follows a fixed spatial "pattern".

In [None]:
options(repr.plot.width = 12, repr.plot.height = 8, repr.plot.res = 150)

plot_base_pulse_flux(month=c(1:24),transcom_region=c(1))

### Plotting a COLUMN of the Jacobian/Sensitivity Matrix 
The following function finds all rows (observations) corresponding to a sampling "location" and then plots the sensitivity of that site to a particular column of the Jacobian (Transcom region and month) over time. In essence, this plots a "piece" of a single column of the Jacobian, hence the function name plot_Jacobian_cols_*.

Note that everything plotted here is in situ data, but we could do a similar analysis by subsetting the OCO-2 observations to a small lat-lon box.  Didn't get to that : )
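The row selection described above can be sketched on a toy matrix. This is only an illustration of "one site's rows, one pulse's column"; it is not the actual implementation of the plotting function, and `H_toy`, `obs_toy`, and the pulse names are hypothetical.

```r
# Toy Jacobian: 6 observations (rows) by 2 pulses (columns); all values hypothetical
H_toy <- matrix(c(0.1, 0.0,
                  0.3, 0.1,
                  0.2, 0.2,
                  0.0, 0.4,
                  0.5, 0.1,
                  0.1, 0.3),
                nrow = 6, byrow = TRUE,
                dimnames = list(NULL, c("pulse_A", "pulse_B")))

# Hypothetical catalog: the site and time that each row of H_toy belongs to
obs_toy <- data.frame(site = c("mlo", "brw", "mlo", "brw", "mlo", "brw"),
                      time = as.Date("2015-07-01") + c(0, 0, 10, 10, 20, 20))

# Pick the rows for one site, then one column of the Jacobian for those rows
rows_mlo  <- which(obs_toy$site == "mlo")
col_piece <- H_toy[rows_mlo, "pulse_A"]

# Sensitivity of that site to that pulse, ordered in time
print(data.frame(time = obs_toy$time[rows_mlo], sensitivity = col_piece))
```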

In [None]:
#- Keep in mind Month 1 is Sept 2014, Month 12 is Aug 2015, Month 24 is Aug 2016
#- Different sites plotted below...
#- "co2_mlo_surface-insitu_1_allvalid" : Mauna Loa, Hawaii
#- "co2_brw_surface-insitu_1_allvalid" : Barrow, Alaska
#- "co2_smo_surface-insitu_1_allvalid" : American Samoa, South Pacific Ocean
#- "co2_spo_surface-insitu_1_allvalid" : South Pole
#- "co2_lef_tower-insitu_1_allvalid-396magl" : Park Falls, Wisconsin

#- Single pulse example: transcom_region=1,month=12
#- Mult pulse example: transcom_region=c(1,2),month=c(11,12)
#- transcom_region: vector of integers between 1 and 22
#- month: vector of integers between 1 and 24
#- plot_sum: this sums all the pulse contributions into a single variable to be plotted (called "VALUE"), 
#-  otherwise individual pulses plotted, plot_sum has to be FALSE if you just select one pulse region (1 time and 1 region)

options(repr.plot.width = 20, repr.plot.height = 4, repr.plot.res = 150)

plot_Jacobian_cols_observations(transcom_region=c(1),month=c(11),
                                plot_sum=FALSE,
                                site_strings=c("co2_mlo_surface-insitu_1_allvalid",
                                   "co2_brw_surface-insitu_1_allvalid",
                                   "co2_smo_surface-insitu_1_allvalid",
                                   "co2_spo_surface-insitu_1_allvalid",
                                   "co2_lef_tower-insitu_1_allvalid-396magl"))



### Plotting ROW of Jacobian/Sensitivity Matrix 
The following function ingests a row number corresponding to a single observation in time.  It then plots the sensitivity of the observation to each Transcom Region over the 24 months.  Essentially, this function plots a "row" of the Jacobian.
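One way to picture a row is to fold its 528 entries into a region-by-month table. The sketch below does this for a toy row with 3 regions and 4 months. It ASSUMES the columns are ordered with region varying fastest within each month; the real ordering is not stated here and should be checked against `dimnames(H)[[2]]` before doing this on the actual matrix. All names and values are hypothetical.

```r
n_region <- 3   # toy size; the real problem has 22 regions
n_month  <- 4   # toy size; the real problem has 24 months

# ASSUMPTION: columns are ordered region-fastest, i.e. r1_m1, r2_m1, r3_m1, r1_m2, ...
pulse_names <- paste0("r", rep(1:n_region, times = n_month),
                      "_m", rep(1:n_month, each = n_region))
h_row <- setNames(seq_len(n_region * n_month) / 10, pulse_names)  # one toy Jacobian row

# matrix() fills column by column, so each column holds one month and each row one region
row_by_region_month <- matrix(h_row, nrow = n_region, ncol = n_month,
                              dimnames = list(region = paste0("r", 1:n_region),
                                              month  = paste0("m", 1:n_month)))
print(row_by_region_month)
```

If the real column order turned out to be month-fastest instead, the same idea would apply with `byrow = TRUE` and the dimensions swapped accordingly.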

In [None]:
#-- some representative samples
#-- I'm plotting the rows, sorting output by time for clarity
#-- 772122: MLO flask (Hawaii, US 12/31/2014)
#-- 768131: wbi in-situ (Iowa, US 12/24/2014)
#-- 924951: wbi in-situ (Iowa, US 8/10/2015)
#-- 1156374: spo flask ( South Pole 8/31/2016 )
#-- ....need a few more....

options(repr.plot.width = 20, repr.plot.height = 4, repr.plot.res = 150)

plot_Jacobian_rows_fluxes(772122)

#### Volunteers to explain the previous samples

1) Iowa, US 12/24/2014
2) Iowa, US 8/10/2015
3) South Pole 8/31/2016
4) Any other interesting one somebody sees?
5) (BONUS) Can you *roughly* gauge how long it takes for air to move from Canada to South Pole by these plots? or 'bracket' it at all?

###  BONUS: Fixed components to Jacobian/Sensitivity Matrix, things we "add" but don't plan to optimize
In actual inversion problems, you are usually optimizing a piece of the emissions, relative to prior guess fluxes, but often also adding other fixed contributions which you don't plan to optimize.  In the global CO2 flux problem, this usually manifests itself as adding in fossil fuel CO2 and biomass fire CO2 as fixed/unoptimized components and then trying to optimize the "natural" part due to photosynthesis/decomposition (NEE) and net ocean exchange.  Here we show both fossil and fires as single columns of the Jacobian (sensitivity of obs to a single "pulse") and plot the sensitivity of different sampling sites to the emission over time.
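As a minimal toy sketch of how fixed contributions enter, the modeled observation is the optimized part plus the fixed additions, so the fixed part can equivalently be subtracted from the observations before fitting. All values and names below (`H_toy`, `x_toy`, `fire_toy`, `fossil_toy`, `z_obs_toy`) are hypothetical and not taken from the workshop data.

```r
# Toy setup: 3 observations, 2 optimizable pulses (all values hypothetical)
H_toy      <- matrix(c(0.5, 0.1,
                       0.2, 0.4,
                       0.0, 0.3), nrow = 3, byrow = TRUE)
x_toy      <- c(2, 1)              # fluxes we would optimize
fire_toy   <- c(0.05, 0.00, 0.10)  # fixed fire contribution at each observation
fossil_toy <- c(1.00, 0.60, 0.20)  # fixed fossil contribution at each observation
z_obs_toy  <- c(2.30, 1.50, 0.50)  # "observed" values

# Fixed additions are summed per observation and never adjusted
z_fixed <- fire_toy + fossil_toy

# Full modeled observation: optimizable part plus fixed part
z_model <- as.vector(H_toy %*% x_toy) + z_fixed

# Equivalent view: what remains for the optimized fluxes to explain
z_to_fit <- z_obs_toy - z_fixed
print(rbind(z_model = z_model, z_to_fit = z_to_fit))
```

Either view gives the same residual: `z_obs_toy - z_model` equals `z_to_fit - H_toy %*% x_toy`.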

In [None]:
#-- These represent the fossil and biomass burning contributions to the observations (from fixed emission runs)
load(file.path(data_dir,"jacobians","jacob_bgd_060524.rda"))
H_bgd <- jacob_bgd 
fire_fixed <- H_bgd[,2]
fossil_fixed <- H_bgd[,3]

#-- Note the row dimension is the number of observations; the 5 columns are five possible extra fixed contributions,
#-- including fire and fossil as 2 of the columns
#paste("Dimension H_bgd:",dim(H_bgd)[1]," rows by ",dim(H_bgd)[2]," cols",sep="")




In [None]:
#-- The secret arguments "plot_fires_instead" and "plot_fossil_instead" bypass the region and month arguments
#-- and allow you to plot the "fixed" fossil and fire additions to the concentrations
#-- Note "izo" is Izana, Canary Islands, off the NW coast of Africa

options(repr.plot.width = 20, repr.plot.height = 4, repr.plot.res = 150)

plot_Jacobian_cols_observations(transcom_region=1,month=12,site_strings=c("izo_surface","brw_surface-insitu","mlo_surface-flask","smo_surface-insitu",
                                                          "co2_spo_surface-flask",
                                                          "lef","wkt","wbi","nwr","hun","cgo","cpt"),plot_fires_instead=TRUE,
                                                           plot_fossil_instead=FALSE)

### Discussion

Have a few volunteers contrast what possible Jacobians *in their own research problems* might look like compared to these.  Furthermore, what are the optimizable components, and what might be fixed additions?