### Baseline Batch Inversion Example

- Run the script as is, making sure you can step through an example and create plots.

- See if you can set the errors and/or choose the number of constraining observations to make the estimate:

    - As good as possible (tight boxplot/confidence bounds)
    - As poor as possible (loose boxplot/confidence bounds)

- Describe the above estimates:

    - Monthly and annual flux estimates for oceans vs land, how are they different?

    - Which land regions appear more difficult to constrain with limited global observations?

- Using what you learned above:
    - About how many of these roughly 1,100,000 observations do you think you need to reasonably constrain most of the land flux regions?

    - How much data do you think you'd need to simply constrain the global annual CO2 flux (all regions summed together)?


#### Setting up Environment for Computing
This cell reads the site_settings.yml file (in the top directory of the ssim-ghg repos) and sets up the environment, including directory references and libraries. Many more libraries and code scripts are loaded via the setup() function.

In [None]:
######################################################################
#-- Read settings for location of data and set up, NO NEED TO CHANGE
######################################################################
orig_dir = getwd()
require(yaml,warn.conflicts = FALSE)
dat = yaml.load_file("../site_settings.yml")
Rcode_dir <- getwd()
data_dir = paste(dat$global_paths$input_folder,"/",sep="")
output_dir = paste(dat$global_paths$output_folder,"/",sep="")

print(paste("Using",data_dir,"for data directory"))
print(paste("Using",output_dir,"for output directory"))
print(paste("Using",Rcode_dir,"for RCode dir"))

#--  Load utility code file w/ setup()
source(file.path(Rcode_dir,"util_code_032024.R"))
setup()

#### Solving the equation
Recall that we are minimizing the cost function below, so we need an input for each variable in it.

$$
\newcommand{\transpose}[1]{{#1^{\scriptscriptstyle T}}} 
J(x) = \transpose{(x_0 - x)} {S_x}^{-1}(x_0 - x) + \transpose{(z - Hx)} {S_z}^{-1}(z - Hx)
$$

$$
\newcommand{\transpose}[1]{{#1^{\scriptscriptstyle T}}} 
\hat{x} = (\transpose{H}{S_z}^{-1}H + {S_x}^{-1})^{-1}(\transpose{H}{S_z}^{-1}z+{S_x}^{-1}x_0)
$$

$$
\newcommand{\transpose}[1]{{#1^{\scriptscriptstyle T}}} 
S_{\hat{x}} = {({S_x}^{-1} + \transpose{H}{S_z}^{-1}H )}^{-1}
$$
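
As a minimal sketch of what these formulas compute, the toy problem below solves a 3-element inversion in base R. All names (H_toy, x_true_toy, and so on) and all numbers are made up for illustration, and it uses the standard closed-form solution for this linear-Gaussian cost function rather than the workshop's own inversion code.

```r
# Toy batch inversion: 3 unknown scalings, 50 synthetic observations.
set.seed(1)
n_state_toy <- 3
n_obs_toy   <- 50

H_toy      <- matrix(rnorm(n_obs_toy * n_state_toy), nrow = n_obs_toy)  # made-up sensitivities
x_true_toy <- c(0.5, -0.3, 0.1)                 # made-up "truth"
x0_toy     <- rep(0, n_state_toy)               # prior mean
Sx_toy     <- diag(0.5^2, n_state_toy)          # prior covariance
sz_sd_toy  <- rep(0.2, n_obs_toy)               # observation error std. dev.

# Simulated observations: signal plus independent Gaussian noise
z_toy <- as.vector(H_toy %*% x_true_toy) + rnorm(n_obs_toy, sd = sz_sd_toy)

# With a diagonal S_z, S_z^{-1} H is H with each row divided by its error variance
HtSzinv_toy <- t(H_toy / sz_sd_toy^2)
Sx_inv_toy  <- solve(Sx_toy)

S_xhat_toy <- solve(HtSzinv_toy %*% H_toy + Sx_inv_toy)
x_hat_toy  <- as.vector(S_xhat_toy %*% (HtSzinv_toy %*% z_toy + Sx_inv_toy %*% x0_toy))

print(round(cbind(truth    = x_true_toy,
                  estimate = x_hat_toy,
                  post_sd  = sqrt(diag(S_xhat_toy))), 3))
```

Shrinking sz_sd_toy or raising n_obs_toy tightens post_sd, which is the same lever the exercises above ask you to experiment with.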




#### Baseline Sensitivity Matrices (H and H^t)

These precalculated sensitivity matrices (the jacob object) give the sensitivity of 1,156,383 observations to the basis functions: 22 regions (11 land and 11 ocean) in each of 24 months, for 528 columns in total. The jacob_bgd object holds the sensitivity of the observations to emission sources that are not optimized here, in particular fire emissions (e.g. forest/grassland fires) and fossil fuel emissions. At the end we assign these objects to 'H' to match the notation used throughout the exercises and slides.

In [None]:
###############################################
#--  Load sensitivity matrices 
###############################################

load(file.path(data_dir,"jacobians/","trunc_full_jacob_030624_with_dimnames_sib4_4x5_mask.rda"))
load(file.path(data_dir,"jacobians/","jacob_bgd_060524.rda"))

#-- Note: the 12/44 factor accounts for a CO2 vs C mass difference in the GEOS-Chem forward runs
#-- Assign the jacob objects to H to match notation
H <- jacob * 12/44
H_bgd <- jacob_bgd 
rm(jacob);rm(jacob_bgd)

#-- These represent the fossil and biomass burning contributions to the observations (from fixed emission runs)
fire_fixed <- H_bgd[,2]
fossil_fixed <- H_bgd[,3]
###################################################################
#-- END END END ***Parent Directory and code for ALL inversions***
###################################################################
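
The column ordering of H is not spelled out in this cell. The sketch below assumes the ordering implied by the covariance construction later in the notebook: 24 consecutive months for region 1, then region 2, and so on, with the 11 land regions first. The helper state_index() is hypothetical and only illustrates that assumed layout.

```r
n_months_idx  <- 24
n_regions_idx <- 22

# Hypothetical helper: position of (region, month) in the 528-element state,
# assuming months vary fastest within each region.
state_index <- function(region, month) (region - 1) * n_months_idx + month

state_index(1, 1)    # 1   : first land region, first month
state_index(12, 1)   # 265 : first ocean region, first month
state_index(22, 24)  # 528 : last ocean region, last month

# All 24 state elements that belong to region 3
region3_idx <- state_index(3, 1:n_months_idx)
print(range(region3_idx))
```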

#### Set the "truth"

This block of code sets up the (simulated) "truth" for the 528-element state vector described above. The truth_array provides real-life examples of what such a state can look like. You can also simply set state_vector_true to any vector of length 528. Recall that the state 'x' represents the adjustment to a baseline prior guess of fluxes, such that the simulated true flux = 'prior best guess flux' * (1 + x). This state is then used to simulate our observations 'z'.

In [None]:

##################################################################
#- Inversion #1   *************************
##################################################################

#################################
#- Target truth in state space
#################################

##################################################################
#-- This array holds ratios of OCO2v10MIP fluxes and SiB4 fluxes
#-- as examples of "scalings" to be recovered. It also holds corresponding
#-- differences if the inversion attempts to directly solve for flux
#-- truth_array(24 months, 23 transcom, 98 inversions, (ratio, difference) )
#-- To try another "truth" from these, change inversion_number below
#-- to any value between 1 and 98 (it selects along the inversions dimension)
##################################################################

#-- Don't Change
#load("/projects/sandbox/inversion_workshop_scripts/truth_array.rda")
load(file.path(data_dir,"misc/truth_array.rda"))
#-- pulling out NA transcom region and subset to scalar vs flux adj
truth_array = truth_array[,-1,,1]
#-- Don't Change


#--  Choose our state from inversion list, option #1, and "truncate" to -1 and 1
inversion_number =1   #  choose this between 1 and 98
state_vector_true= tm(as.vector(- truth_array[,,inversion_number]),-1,1)

#-- Alternatively choose a "different" true state like the below ones
#-- The first just means the truth IS the prior, the second has a simple structure
#-- Land regions fluxes are (1+0.5) * prior guess and ocean fluxes are (1- 0.5) * prior guess.
#state_vector_true = c(rep(0,24*11),rep(0,24*11))
#state_vector_true = c(rep(0.5,24*11),rep(-0.5,24*11))
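
To make the meaning of the state concrete, here is a small sketch of how a scaling adjustment maps to a flux. The clamp() helper is a hypothetical stand-in for the truncation to [-1, 1] described above (it is not the workshop's tm() function), and the flux values are made up.

```r
# Hypothetical truncation helper: limit values to the interval [lo, hi]
clamp <- function(x, lo, hi) pmax(lo, pmin(hi, x))

prior_flux_toy <- c(2.0, -1.5, 0.8)    # made-up prior fluxes (arbitrary units)
raw_state_toy  <- c(0.5, -1.7, 0.0)    # made-up scaling adjustments
x_state_toy    <- clamp(raw_state_toy, -1, 1)

# true flux = prior flux * (1 + x); x = 0 leaves the prior unchanged,
# x = -1 switches the flux off entirely
true_flux_toy <- prior_flux_toy * (1 + x_state_toy)

print(x_state_toy)
print(true_flux_toy)
```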


#### Define the a priori flux covariance matrix
Here we define what we are calling S_x, the a priori flux covariance matrix. In essence, this defines the bounds within which we expect to find our "simulated" truth, relative to the baseline best guess for prior flux.

In [None]:
#########################################################
# Generate a prior flux covariance matrix Sx
# These first two lines form "diagonal" of Sx, e.g. marginal variances
# Long term, a catalog of predefined choices is best here I think
#########################################################
land_prior_sd = 0.5   #-- free to set this, implies you think "truth" for land is within +/- 3*this
ocean_prior_sd = 1    #-- free to set this, implies you think "truth" for ocean is within +/- 3*this

##############################################################################
#-- This is the structure of the 24 month subblock for each land/ocean region
#-- induce temporal correlations
##############################################################################

#-- This will set up a prior temporal correlation, 
#-- free to set month_to_month_correlation between 0 (independent) and 1
month_to_month_correlation = 0.5
sigma = bdiag(rep(list(ar_covariance(24, month_to_month_correlation)), 22))  #-- free to set 


#################################################
#-- scale by standard deviation for land/ocean (sets the diagonal of Sx to the variances)
#-- This simply puts together pieces above
#################################################
var_scaling_diagonal = diag(c(rep(land_prior_sd,24*11),rep(ocean_prior_sd,24*11)))

Sx = as.matrix(var_scaling_diagonal %*% sigma %*% t(var_scaling_diagonal))

#-- This is an alternative state_vector_true based *exactly* upon the prior covariance matrix
#-- as opposed to being able to pick your "truth" separately from your assumed dist where "truth" lives
#-- Probably don't want to change this unless you know what you are doing
#state_vector_true = t(rmvnorm(n=1,mean=rep(0,528),sigma=Sx))
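
The construction above can be sketched at a smaller size in base R. This assumes the temporal block is an AR(1) correlation matrix with entries rho^|i - j|; ar1_corr() is a hypothetical stand-in, and the actual ar_covariance() helper may differ in detail.

```r
# Assumed AR(1) correlation: corr(month i, month j) = rho^|i - j|
ar1_corr <- function(n, rho) rho^abs(outer(1:n, 1:n, "-"))

n_months_cov  <- 4          # 24 in the notebook; kept small here for printing
n_regions_cov <- 2          # 22 in the notebook (11 land + 11 ocean)
rho_cov       <- 0.5
region_sd_cov <- c(0.5, 1)  # one "land" and one "ocean" region

block_cov <- ar1_corr(n_months_cov, rho_cov)
# Block-diagonal correlation: months correlate within a region, not across regions
corr_cov  <- kronecker(diag(n_regions_cov), block_cov)

# Scale rows and columns by the standard deviations: Sx = D %*% corr %*% D
sd_vec_cov <- rep(region_sd_cov, each = n_months_cov)
Sx_small   <- diag(sd_vec_cov) %*% corr_cov %*% diag(sd_vec_cov)

print(round(Sx_small, 3))
```

The diagonal of Sx_small holds the variances (0.25 for the land block, 1 for the ocean block), and neighbouring months within a region have covariance rho times that variance.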


#### Choose which observations you want to assimilate
In other words, choose which observations will be used to optimize/estimate the unknown fluxes. This problem is somewhat overdetermined, with over a million observations available to constrain a 528-element state. With that in mind, small observation errors combined with LOTS of observations should "nail" the unknown solution quite well. The goal here is to create a TRUE/FALSE vector whose length equals the total number of observations in the sensitivity matrix loaded above (1,156,383). The obs_catalog is a data.frame (think of a matrix of 'items') with information about each observation, and it can be used to build a subset.

In [None]:
####################################################################################
#-- WHICH obs do you want to use in the inversion? 
#-- examples of selecting on stations, type of data, lat/lon box,etc
#-- essentially the "subset_indicator_obs" object is a vector of logicals (T/F) the 
#-- same length as the number of possible observations, i.e. 1156383
####################################################################################

load(file.path(data_dir,"obs/obs_catalog_042424_unit_pulse_hour_timestamp_witherrors_withdates.rda")) 

############################
#-- Example 0: USE ALL OBS
############################
subset_indicator_obs=rep(TRUE,dim(H)[1])

############################
#-- SAMPLE BY TYPE EXAMPLE
############################
#subset_indicator_obs = obs_catalog$TYPE == "TCCON"
#subset_indicator_obs = obs_catalog$TYPE == "OCO2"
#subset_indicator_obs = obs_catalog$TYPE == "IS"

###################################################################
#-- Example 1: SAMPLE BY NOAA STATION EXAMPLE: Just use Mauna Loa 2 week flasks
####################################################################
# subset_indicator_obs = (
#   grepl("mlo_surface-flask", obs_catalog$ID)  #, add another via : | grepl("lef", obs_catalog$ID)
# )

#####################################################################
#-- Example 2: SAMPLE BY TIME EXAMPLE, THIS IS ONLY FLASK DATA FOR Aug 2015
#####################################################################
# subset_indicator_obs=(
#   obs_catalog$YEAR == 2015
#   & obs_catalog$MONTH == 8
#    & obs_catalog$TYPE=="IS"
#    &  grepl("flask", obs_catalog$ID)
# )

##############################################################################
#-- Example 3:  SAMPLE BY LON & LAT EXAMPLE 
#-- Data within 3deg of Equator and S. Hemisphere data cases
##############################################################################

#-- Southern Hemi data
# subset_indicator_obs=(
#   obs_catalog$LAT > -90
#   & obs_catalog$LAT < -30   
# )

#-- Equator "band" of OCO2 XCO2 data
# subset_indicator_obs=(
#   obs_catalog$LAT > -3
#   & obs_catalog$LAT < 3
#   & obs_catalog$TYPE=="OCO2"
# )

############################
#-- USE SIMPLE SUBSET
############################
#subset_size = 40000
#subset_indicator_obs=rep(FALSE,dim(H)[1])
#subset_indicator_obs[round(seq(1,dim(H)[1],length.out=subset_size))] = TRUE


#######################################################################
#-- Downsample if necessary to 578191 obs, watch running out of RAM
#######################################################################

#if(sum(subset_indicator_obs) > 0.5*length(subset_indicator_obs)) {
#  new_ind = rep(FALSE,length(subset_indicator_obs))
#  new_ind[sample(x=grep(TRUE,subset_indicator_obs),size=floor(0.5*length(subset_indicator_obs)))] = TRUE
#  print(paste("downsampling from",sum(subset_indicator_obs),"to",
#              floor(0.5*length(subset_indicator_obs)),"observations"))
#  subset_indicator_obs = new_ind
#    }

#-- LEAVE THIS AS IT SUMMARIZES THE NUMBER OF OBS USED
print(paste("using",sum(subset_indicator_obs),"of",length(subset_indicator_obs),"observations"))
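
The selection examples above can be combined with the logical operators & (both conditions) and | (either condition). The sketch below uses a made-up six-row catalog; only the column names (ID, TYPE, LAT, YEAR, MONTH) mirror those used with obs_catalog above, and the rows themselves are invented.

```r
# Made-up miniature catalog for illustration only
toy_catalog <- data.frame(
  ID    = c("mlo_surface-flask_1", "lef_tower", "oco2_sounding_a",
            "oco2_sounding_b", "spo_surface-flask_1", "tccon_site"),
  TYPE  = c("IS", "IS", "OCO2", "OCO2", "IS", "TCCON"),
  LAT   = c(19.5, 45.9, 1.2, -40.0, -90.0, 36.6),
  YEAR  = c(2015, 2015, 2015, 2016, 2015, 2016),
  MONTH = c(8, 8, 3, 1, 8, 6),
  stringsAsFactors = FALSE
)

# Flask samples from August 2015
flask_aug_2015 <- toy_catalog$YEAR == 2015 & toy_catalog$MONTH == 8 &
  grepl("flask", toy_catalog$ID)

# OCO2 soundings within 3 degrees of the equator
tropical_oco2 <- toy_catalog$TYPE == "OCO2" & abs(toy_catalog$LAT) < 3

# Union of the two selections
toy_subset <- flask_aug_2015 | tropical_oco2

# The indicator must have one entry per observation
stopifnot(length(toy_subset) == nrow(toy_catalog))
print(table(toy_catalog$TYPE[toy_subset]))
```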

#### Set the observation errors
Recall that this component, the matrix Sz, is the sum of (assumed) independent errors describing instrument noise and various transport errors due to representation and aggregation. You can either set this error to be the same for all observations or use the realistic errors given in the obs_catalog object (from the OCO2MIP project). Note that we don't allow non-zero off-diagonal entries here, so we carry this matrix forward as a vector.

In [None]:
##########################################################
#-- sd for Gaussian i.i.d. errors, H is the sensitivity matrix
##########################################################

#-- Simple errors 
Sz_diagonal_in = rep(1,(dim(H)[1]))  # dim(H)[1] is length of obs possible

#-- More realistic errors, "real" errors we use for these observation sites in "real" data inversion
#Sz_diagonal_in = obs_catalog$SD

#### Simulate the true observations from the sensitivity matrix and the assumed observation errors
Here we literally multiply the sensitivity matrix by the prior guess (the 1 in the calculation below) plus our "true" state, and then add random errors drawn with the observation error standard deviations set above.

In [None]:
#############################################################
#-- Generate obs 'z'; call set.seed() first if you want reproducible noise
#-- currently leaving out bgd and all fixed
#-- non-optimizable contributions including fire and fossil
#############################################################

z_in = H %*% (1+state_vector_true) + rnorm(length(Sz_diagonal_in),sd=Sz_diagonal_in)
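
Because the noise is random, two runs of the cell above give different simulated observations. A minimal sketch of making the draw repeatable with set.seed() follows; the signal and error values are made up and the seed value is arbitrary.

```r
toy_noise_sd <- c(0.5, 1, 2)         # made-up per-observation error std. devs.
toy_signal   <- c(400, 401, 402)     # made-up noise-free values

# Resetting the seed before each draw reproduces the same noise
set.seed(42)
z_first  <- toy_signal + rnorm(length(toy_noise_sd), sd = toy_noise_sd)
set.seed(42)
z_second <- toy_signal + rnorm(length(toy_noise_sd), sd = toy_noise_sd)

print(identical(z_first, z_second))
```

Fixing the seed is useful when comparing two inversion setups, since any difference in the results then comes from the settings rather than from a different noise realization.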


### The "calculations"
Now that every component is defined, we simply do the calculations.

$$
\newcommand{\transpose}[1]{{#1^{\scriptscriptstyle T}}} 
J(x) = \transpose{(x_0 - x)} {S_x}^{-1}(x_0 - x) + \transpose{(z - Hx)} {S_z}^{-1}(z - Hx)
$$

$$
\newcommand{\transpose}[1]{{#1^{\scriptscriptstyle T}}} 
\hat{x} = (\transpose{H}{S_z}^{-1}H + {S_x}^{-1})^{-1}(\transpose{H}{S_z}^{-1}z+{S_x}^{-1}x_0)
$$

$$
\newcommand{\transpose}[1]{{#1^{\scriptscriptstyle T}}} 
S_{\hat{x}} = {({S_x}^{-1} + \transpose{H}{S_z}^{-1}H )}^{-1}
$$

Actual baseline "inversion" code is now below...





In [None]:
############################
#-- Run the actual inversion
############################
#-- Be aware that the DOF calc (DOF arg) and the Kalman gain calc (output_Kalman_Gain arg) are computationally costly
#-- DOF can be TRUE or FALSE, but leave output_Kalman_Gain=FALSE except in the Kalman gain notebook example

ret2  = invert_clean_notation(H=H,Sz_diagonal=Sz_diagonal_in,Sx=Sx,z=z_in,H_bgd=H_bgd,
                    subset_indicator_obs=subset_indicator_obs,DOF=FALSE,output_Kalman_Gain=FALSE,
                     state_vector_true=state_vector_true,force=TRUE)
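
The DOF and Kalman gain options refer to quantities that can be illustrated on a toy problem. The sketch below uses textbook definitions (Kalman gain K, averaging kernel A = K H, degrees of freedom for signal = trace of A); how invert_clean_notation() computes them internally is not shown here, so treat this as an assumed formulation with made-up inputs.

```r
set.seed(2)
n_state_dof <- 4
n_obs_dof   <- 30
H_dof  <- matrix(rnorm(n_obs_dof * n_state_dof), nrow = n_obs_dof)  # made-up sensitivities
Sx_dof <- diag(0.5^2, n_state_dof)                                  # prior covariance
Sz_dof <- diag(1, n_obs_dof)                                        # obs error covariance

# Kalman gain: K = Sx H^T (H Sx H^T + Sz)^{-1}
K_dof <- Sx_dof %*% t(H_dof) %*% solve(H_dof %*% Sx_dof %*% t(H_dof) + Sz_dof)

# Averaging kernel and degrees of freedom for signal
A_dof   <- K_dof %*% H_dof
dof_val <- sum(diag(A_dof))

# Cross-check with the information form: A = I - S_xhat Sx^{-1}
S_xhat_dof <- solve(t(H_dof) %*% solve(Sz_dof) %*% H_dof + solve(Sx_dof))
A_check    <- diag(n_state_dof) - S_xhat_dof %*% solve(Sx_dof)

print(dof_val)                     # between 0 and the number of state elements
print(max(abs(A_dof - A_check)))   # numerically close to zero
```

The gain form inverts a matrix of size n_obs by n_obs, which is one reason such calculations become expensive with hundreds of thousands of observations.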

#### "Sanity check"
The first sanity check here is to simply compare the predicted state with actual "true" state we defined above. If all is perfect, the points will line up on the 1:1 line.

In [None]:
#hist(ret2$posterior$x_hat[,1])
options(repr.plot.width=8, repr.plot.height=8)
plot(state_vector_true,ret2$posterior$x_hat,pch=16,cex=1.5,col=c(rep("orange",264),rep("blue",264)),
     xlab="True State Scaling",ylab="Estimated State Scaling",main="Estimated state vector vs true state vector (all time and regions)")
lines(c(-100,100),c(-100,100),lty=1,lwd=3,col="grey")
legend(min(state_vector_true),max(ret2$posterior$x_hat),c("Land","Ocean"),pch=c(16,16),col=c("orange","blue"))
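
Alongside the scatter plot, a numerical summary can make the comparison easier to track between experiments. The sketch below computes RMSE and correlation separately for land and ocean using made-up vectors; in the notebook you would substitute state_vector_true and ret2$posterior$x_hat.

```r
set.seed(3)
toy_truth <- runif(528, -1, 1)                    # made-up "true" state
toy_est   <- toy_truth + rnorm(528, sd = 0.1)     # pretend posterior estimate
is_land   <- rep(c(TRUE, FALSE), each = 264)      # first 264 land, last 264 ocean

# Root-mean-square error between two vectors
rmse <- function(a, b) sqrt(mean((a - b)^2))

summary_toy <- data.frame(
  group = c("land", "ocean"),
  rmse  = c(rmse(toy_est[is_land],  toy_truth[is_land]),
            rmse(toy_est[!is_land], toy_truth[!is_land])),
  cor   = c(cor(toy_est[is_land],  toy_truth[is_land]),
            cor(toy_est[!is_land], toy_truth[!is_land]))
)
print(summary_toy)
```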

#### Maps of Flux, Prior, Post and Truth

In [None]:
plot_flux_maps_annual_prior_post_truth=function (inv_object = ret2, true_state = state_vector_true, 
                                                 prior_mean_ncdf = file.path(data_dir, "priors/prior_SiB4.nc"), 
                                                 center_prior_on_zero = TRUE) 
{
  print("creating gridded fluxes....")
  con = nc_open(prior_mean_ncdf)
  longitude_prior = con$dim$longitude$vals
  latitude_prior = con$dim$latitude$vals
  NEE = ncvar_get(con, "NEE")
  nc_close(con)
  NEE_1x1 = aaply(NEE, 3, .fun = function(x) {
    expand_5x4_2_1x1(x)
  }) %>% aperm(c(2, 3, 1))
  NEE_transcom = aaply(NEE_1x1, 3, .fun = function(x) {
    grid2transcom(x)
  })
  tr_dir = file.path(data_dir, "transcom")
  x_prior_matrix = matrix(inv_object$prior$x_hat, nrow = 24, 
                          byrow = FALSE)
  x_hat_matrix = matrix(inv_object$posterior$x_hat, nrow = 24, 
                        byrow = FALSE)
  true_state_matrix = matrix(true_state, nrow = 24, byrow = FALSE)
  prior_flux_unc = diag(as.vector(NEE_transcom[, 1:22])) %*% 
    inv_object$prior$Sx %*% diag(as.vector(NEE_transcom[, 
                                                        1:22]))
  post_flux_unc = diag(as.vector(NEE_transcom[, 1:22])) %*% 
    inv_object$posterior$Sx %*% diag(as.vector(NEE_transcom[, 
                                                            1:22]))
  #-- Sum the 24 months belonging to each region; the state is ordered region by region
  #-- (24 months for region 1, then region 2, ...), matching the block structure of Sx
  A = kronecker(diag(1, nrow = 22), matrix(1, nrow = 1, ncol = 24))
  
  annual_avg_prior_flux_cov = 0.5 * A %*% prior_flux_unc %*% 
    t(0.5 * A)
  annual_avg_post_flux_cov = 0.5 * A %*% post_flux_unc %*% 
    t(0.5 * A)
  gridded_1x1_prior_sd_flux_annual = transcom2grid(sqrt(diag(annual_avg_prior_flux_cov))) * 
    1e-15
  gridded_1x1_post_sd_flux_annual = transcom2grid(sqrt(diag(annual_avg_post_flux_cov))) * 
    1e-15
  gridded_1x1_post_sd_reduction_annual = 1 - (gridded_1x1_post_sd_flux_annual/gridded_1x1_prior_sd_flux_annual)
  if (!center_prior_on_zero) {
    x_prior_matrix = x_prior_matrix + 1
    x_hat_matrix = x_hat_matrix + 1
    true_state_matrix = true_state_matrix + 1
  }
  gridded_1x1_prior_state = aaply(x_prior_matrix, 1, .fun = function(x) {
    transcom2grid(x, model.grid.x = 1, model.grid.y = 1, 
                  file_location = data_dir)
  }) %>% aperm(c(2, 3, 1))
  gridded_1x1_posterior_state = aaply(x_hat_matrix, 1, .fun = function(x) {
    transcom2grid(x, model.grid.x = 1, model.grid.y = 1, 
                  file_location = data_dir)
  }) %>% aperm(c(2, 3, 1))
  gridded_1x1_true_state = aaply(true_state_matrix, 1, .fun = function(x) {
    transcom2grid(x, model.grid.x = 1, model.grid.y = 1, 
                  file_location = data_dir)
  }) %>% aperm(c(2, 3, 1))
  gridded_1x1_prior_mean_flux = NEE_1x1 * gridded_1x1_prior_state
  gridded_1x1_posterior_mean_flux = NEE_1x1 * gridded_1x1_posterior_state
  gridded_1x1_truth = NEE_1x1 * gridded_1x1_true_state
  gridded_1x1_prior_mean_flux_annual = apply(gridded_1x1_prior_mean_flux, 
                                             c(1, 2), sum)/2
  gridded_1x1_posterior_mean_flux_annual = apply(gridded_1x1_posterior_mean_flux, 
                                                 c(1, 2), sum)/2
  gridded_1x1_true_mean_flux_annual = apply(gridded_1x1_truth, 
                                            c(1, 2), sum)/2
  library(maps)
  w = map("world", plot = FALSE)
  grd = expand.grid(longitude = seq(-179.5, 179.5, by = 1), 
                    latitude = seq(-89.5, 89.5, by = 1))
  units_scaling = 1000 * 12/44 * 3600 * 24 * 30.5
  grd$prior.mean = as.vector(gridded_1x1_prior_mean_flux_annual) * 
    units_scaling
  grd$posterior.mean = as.vector(gridded_1x1_posterior_mean_flux_annual) * 
    units_scaling
  grd$truth = as.vector(gridded_1x1_true_mean_flux_annual) * 
    units_scaling
  grd$difference = grd$posterior.mean - grd$truth
  grd$prior_sd = as.vector(gridded_1x1_prior_sd_flux_annual) * 
    units_scaling
  grd$post_sd = as.vector(gridded_1x1_post_sd_flux_annual) * 
    units_scaling
  grd$reduction_sd = as.vector(gridded_1x1_post_sd_reduction_annual)
  rng_mn = range(c(grd$prior.mean, grd$posterior.mean, grd$truth, 
                   grd$difference))
  rng_sd = range(c(grd$prior_sd, grd$post_sd))

  library(ggplot2)
  w2 =    as.data.frame(cbind(w$x,w$y))
  names(w2) = c("x","y")
  
  
  
  
  # Set color scale limits
  lims_mn <- seq(-max(abs(c(rng_mn))), max(abs(c(rng_mn))), length.out = 50)
  lims_sd <- seq(0, max(abs(c(rng_sd))), length.out = 50)
  
  plt1=ggplot(grd, aes(x = longitude, y = latitude, fill = prior.mean)) +
    geom_raster(interpolate = TRUE) +  # Or use geom_tile() if you want crisp edges
    scale_fill_gradientn(
      colours = my.col(50),
      limits = range(lims_mn),
      name = "gC/m²/yr"
    ) +
    geom_path(data = w2, aes(x = x, y = y), linewidth = 0.25, inherit.aes = FALSE, color = "black") +
    coord_fixed() +
    labs(title = "Annual Prior Mean Flux (gC/m2/yr)", x = "", y = "") +
    theme_minimal()
  
  plt2=ggplot(grd, aes(x = longitude, y = latitude, fill = posterior.mean)) +
    geom_raster(interpolate = TRUE) +  # Or use geom_tile() if you want crisp edges
    scale_fill_gradientn(
      colours = my.col(50),
      limits = range(lims_mn),
      name = "gC/m²/yr"
    ) +
    geom_path(data = w2, aes(x = x, y = y), linewidth = 0.25, inherit.aes = FALSE, color = "black") +
    coord_fixed() +
    labs(title = "Annual Posterior Mean Flux (gC/m2/yr)", x = "", y = "") +
    theme_minimal()
  
  plt3=ggplot(grd, aes(x = longitude, y = latitude, fill = truth)) +
    geom_raster(interpolate = TRUE) +  # Or use geom_tile() if you want crisp edges
    scale_fill_gradientn(
      colours = my.col(50),
      limits = range(lims_mn),
      name = "gC/m²/yr"
    ) +
    geom_path(data = w2, aes(x = x, y = y), linewidth = 0.25, inherit.aes = FALSE, color = "black") +
    coord_fixed() +
    labs(title = "True Flux (gC/m2/yr)", x = "", y = "") +
    theme_minimal()
  
  plt4=ggplot(grd, aes(x = longitude, y = latitude, fill = difference)) +
    geom_raster(interpolate = TRUE) +  # Or use geom_tile() if you want crisp edges
    scale_fill_gradientn(
      colours = my.col(50),
      limits = range(lims_mn),
      name = "gC/m²/yr"
    ) +
    geom_path(data = w2, aes(x = x, y = y), linewidth = 0.25, inherit.aes = FALSE, color = "black") +
    coord_fixed() +
    labs(title = "Posterior Mean - Truth (gC/m2/yr)", x = "", y = "") +
    theme_minimal()

    
  plt5=ggplot(grd, aes(x = longitude, y = latitude, fill = prior_sd)) +
    geom_raster(interpolate = TRUE) +  # Or use geom_tile() if you want crisp edges
    scale_fill_gradientn(
      colours = my.col(50)[26:50],
      limits = range(lims_sd),
      name = "gC/m²/yr"
    ) +
    geom_path(data = w2, aes(x = x, y = y), linewidth = 0.25, inherit.aes = FALSE, color = "black") +
    coord_fixed() +
    labs(title = "Prior Standard Deviation (PgC/region/yr)", x = "", y = "") +
    theme_minimal()
  
  plt6=ggplot(grd, aes(x = longitude, y = latitude, fill = post_sd)) +
    geom_raster(interpolate = TRUE) +  # Or use geom_tile() if you want crisp edges
    scale_fill_gradientn(
      colours = my.col(50)[26:50],
      limits = range(lims_sd),
      name = "gC/m²/yr"
    ) +
    geom_path(data = w2, aes(x = x, y = y), linewidth = 0.25, inherit.aes = FALSE, color = "black") +
    coord_fixed() +
    labs(title = "Posterior Standard Deviation (PgC/region/yr)", x = "", y = "") +
    theme_minimal()
  
  plt7=ggplot(grd, aes(x = longitude, y = latitude, fill = reduction_sd)) +
    geom_raster(interpolate = TRUE) +  # Or use geom_tile() if you want crisp edges
    scale_fill_gradientn(
      colours = rev(my.col(50)),
      limits = c(0, 1),  # uncertainty reduction is a unitless fraction
      name = ""
    ) +
    geom_path(data = w2, aes(x = x, y = y), linewidth = 0.25, inherit.aes = FALSE, color = "black") +
    coord_fixed() +
    labs(title = "Uncertainty Reduction: 1-Posterior_SD/Prior_SD", x = "", y = "") +
    theme_minimal()

    
  options(jupyter.plot_scale = 1)
  options(repr.plot.width=10, repr.plot.height=5)
  #plt = arrangeGrob(grobs=list(plt1,plt2))
  #plot(plt1)
  #plot(plt2,newpage=FALSE)
  #plot(plt3,newpage=FALSE)
  #plot(plt4,newpage=FALSE)  
  #plt = arrangeGrob(grobs=list(plt3,plt4))
  #plot(plt)
  #plt = arrangeGrob(grobs=list(plt5,plt6))
  #plot(plt)
  #plt = arrangeGrob(grobs=list(plt7))
  #plot(plt)
  #saved <- options()
   #options(repr.plot.width=8, repr.plot.height=8)
  #plt = marrangeGrob(grobs=list(plt1,plt2,plt3,plt4),layout_matrix=matrix(1:4,ncol=1))
  #plot(plt1)
  #plot(plt2)
#plot(plt3)
  #options(saved)
    return(list(plt1,plt2,plt3,plt4,plt5,plt6,plt7))
}

In [None]:
plot_flux_maps_annual_prior_post_truth(inv_object = ret2, true_state = state_vector_true, 
    prior_mean_ncdf = file.path(data_dir, "priors/prior_SiB4.nc"), 
    center_prior_on_zero = TRUE)
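
The annual-average uncertainty computed inside the function is a linear map applied to the monthly covariance, cov(A x) = A S t(A). The sketch below illustrates that step on a small made-up example; it assumes the state is ordered region by region (24 months for region 1, then region 2, ...), and the names agg_toy and monthly_cov_toy are hypothetical.

```r
n_regions_agg <- 3     # 22 in the notebook
n_months_agg  <- 24    # two years of months

# Row r of the aggregator has ones over the 24 months of region r
agg_toy <- kronecker(diag(n_regions_agg),
                     matrix(1, nrow = 1, ncol = n_months_agg))

# Made-up monthly flux std. devs.; months treated as independent for simplicity
monthly_sd_toy  <- rep(c(1, 2, 3), each = n_months_agg)
monthly_cov_toy <- diag(monthly_sd_toy^2)

# Two years of months: the annual average is half of the 24-month sum
annual_cov_toy <- (0.5 * agg_toy) %*% monthly_cov_toy %*% t(0.5 * agg_toy)
annual_sd_toy  <- sqrt(diag(annual_cov_toy))

print(annual_sd_toy)
```

With independent months the annual standard deviation is 0.5 * sqrt(24) times the monthly one; positive month-to-month correlation would make it larger.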

#### Produce a Monte Carlo estimate from analytical inversion output in order to quickly plot results
We could produce the following plots exactly from the analytical solution of the inversion (linear combinations of the different flux pieces in time and space), but we instead draw a large Monte Carlo sample from it to make plotting, boxplots in particular, quick and efficient. The first line returns a sample of fluxes that can be plotted (simply the perturbations to the prior that we are optimizing), while the second adds the original prior mean fluxes for both ocean and land. These take a little while to generate, but run both lines if you are unsure which to run.

In [None]:
org_data = generate_transcom_flux_ensemble_from_inversion(inv_object=ret2,samples=1000)

org_data_add_ocn_land_prior_means = generate_transcom_flux_ensemble_from_inversion(inv_object=ret2,samples=1000,include_ocn_land_prior=TRUE)
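
The idea behind the ensemble can be sketched in base R: draw samples from a Gaussian with a given mean and covariance via a Cholesky factor, then summarize any linear combination from the draws. The mean and covariance here are made up, and this is not necessarily how generate_transcom_flux_ensemble_from_inversion() is implemented.

```r
set.seed(4)
post_mean_toy <- c(0.2, -0.4)                        # made-up posterior mean
post_cov_toy  <- matrix(c( 0.04, -0.03,
                          -0.03,  0.09), nrow = 2)   # made-up posterior covariance

n_samples_toy <- 5000
R_chol <- chol(post_cov_toy)        # upper triangular, t(R_chol) %*% R_chol = post_cov_toy

# Standard normal draws times the Cholesky factor have covariance post_cov_toy
draws_toy <- matrix(rnorm(n_samples_toy * 2), ncol = 2) %*% R_chol
draws_toy <- sweep(draws_toy, 2, post_mean_toy, "+")

# Any linear combination, here the sum of both elements, can be read off the draws
total_draws <- rowSums(draws_toy)
print(quantile(total_draws, c(0.025, 0.5, 0.975)))

# Analytic check: var(sum) is the sum of all covariance entries
print(c(sampled_sd = sd(total_draws), analytic_sd = sqrt(sum(post_cov_toy))))
```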

### Annual flux average for 9/2014 - 8/2016 for each Transcom Region
**Very important:** all of these boxplots/confidence-bound plots show the deviation from the prior flux, i.e. H\*state_vector_true and not H\*(1+state_vector_true). They also represent the inversion estimate of the deviation from the prior, with no fire or fossil fuel emissions added back in. We *could* add the prior mean ocean and land fluxes to the boxplots below, but on the annual scale (unlike monthly/seasonal) they are close to zero, so it would be hard to even discern that they had been added.

In [None]:
plot_timeseries_flux_bytranscom(ret=org_data)

#### Plotting "confidence bounds" for monthly flux estimates by Transcom Region
In this next section of code we will plot the posterior "credible intervals" for each transcom region as a function of month from 9/2014 to 8/2016.

In [None]:
#Plot just the adjustment the inversion is making
#plot_transcom_flux_by_month(ret=org_data)

#Plot the adjustment PLUS the prior means for land and ocean
plot_transcom_flux_by_month(ret=org_data_add_ocn_land_prior_means)

#### Plotting prior/posterior correlations across fluxes
Here we plot the prior/posterior correlations of the 2-year flux averages (then month by month for different regions in the next code block). Note that the correlations are estimated from the samples in org_data, so the prior shows "some" correlation due to sampling noise even where none exists.


In [None]:
plot_inversion_correlations(org_data = org_data)

In [None]:
plot_inversion_correlations_by_transcom(org_data=org_data)
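
The sampling-noise effect mentioned above is easy to reproduce: correlations estimated from a finite number of draws of truly independent variables are small but not exactly zero. The sketch below uses made-up independent draws; the variable names are hypothetical.

```r
set.seed(5)
n_draws_cor <- 1000

# Six truly independent "regions"
indep_draws <- matrix(rnorm(n_draws_cor * 6), ncol = 6)

sample_cor <- cor(indep_draws)
off_diag   <- sample_cor[upper.tri(sample_cor)]

# Spurious correlations: nonzero, but of typical size about 1/sqrt(n_draws_cor)
print(round(range(off_diag), 3))
print(1 / sqrt(n_draws_cor))
```

Correlations in the plots that are no larger than this noise level should not be over-interpreted; increasing the samples argument shrinks it.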

#### Plot concentration time series at different sites
When add_prior_nee=FALSE, add_fossil=FALSE, and add_fire=FALSE, only the inversion-produced adjustments to the site-level concentrations are plotted (note that prior=0 in that case). add_prior_nee=TRUE adds the underlying best-guess prior, which the inversion is scaling, to the prior/posterior/obs curves; you can see the seasonal cycle of the prior *appearing*. add_fossil=TRUE and add_fire=TRUE add the fixed contributions from fossil fuels and biomass burning to all of the concentrations, resulting in more *realistic* concentration time series.

In [None]:
#key default arg here which you can change: site_strings=c("brw","mlo","smo","co2_spo_surface-flask","lef","wkt","wbi","nwr","hun")
plot_concentrations(inversion=ret2,add_prior_nee=TRUE,add_fossil=FALSE,add_fire=FALSE,
           site_strings=c("brw_surface-flask_1_representative","mlo_surface-flask_1_representative",
                          "smo_surface-flask_1_representative","spo_surface-flask_1_representative",
                          "lef","wkt","wbi","nwr_surface-flask_1_representative","hun")        )

#### Writing out the inversion "object" to a netcdf file so that anybody can read it
You don't have to save this object to netcdf files every time. In fact, I wouldn't, since it takes a few minutes to run. However, you *will* need the files to make the standard deviation maps in the next cell.

In [None]:
#-- Write out "posterior" object
system.time(write_inversion_2_netcdfs(inv_object=ret2,subobject="posterior",
                                      prior_mean_ncdf=file.path(data_dir,"priors/prior_SiB4.nc"),
                                      sample_number=100,output_dir=output_dir))

#-- Write out "prior" object
system.time(write_inversion_2_netcdfs(inv_object=ret2,subobject="prior",
                                      prior_mean_ncdf=file.path(data_dir,"priors/prior_SiB4.nc"),
                                      sample_number=100,output_dir=output_dir))

#### Prior vs Posterior Marginal Standard Deviation and Uncertainty Reduction
The inversion objects above must have been written out to netcdf files before running the plots below. Sorry, this code takes a bit of time to run.

In [None]:
#-- Assumes write_inversion_2_netcdfs() above wrote these two files to output_dir
plot_flux_maps_annual_prior_post(prior_file_nc=file.path(output_dir,"gridded_fluxes_prior.nc4"),
                                 posterior_file_nc=file.path(output_dir,"gridded_fluxes_posterior.nc4"))
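
The uncertainty reduction shown in these maps is defined as 1 - posterior SD / prior SD. A tiny worked example with made-up standard deviations:

```r
prior_sd_toy <- c(0.5, 0.5, 1.0)    # made-up prior std. devs. for three regions
post_sd_toy  <- c(0.1, 0.45, 0.5)   # made-up posterior std. devs.

# 0 means the observations taught us nothing; values near 1 mean a tight constraint
reduction_toy <- 1 - post_sd_toy / prior_sd_toy
print(reduction_toy)
```

Here the first region is well constrained (0.8), the second barely at all (0.1), and the third moderately (0.5).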

In [None]:
sessionInfo()
#show info