## Research Question: 
### What are the February and July fossil CO<sub>2</sub> estimates for Baltimore in 2019? Is our understanding of Baltimore's CO<sub>2</sub> emissions consistent with atmospheric observations?

## Method: 
### Use an atmospheric inversion to (1) estimate emissions and uncertainties, and (2) compare to prior emissions at the monthly scale.  
- We are going to run different observing system simulation experiments **(OSSEs)**.  You should always run an OSSE to test your code before running a real data case.
- This toy code has a series of blocks that load libraries and files, and that handle all the plotting and vector/df/matrix rearranging.
- Assume that biospheric emissions are perfectly known.
- Assume that you have already computed your Jacobian **H** in the right format.
- Assume that you have observations and that they have already been subset to afternoon hours.
- The toy code creates all the pieces of the inversion and provides output like statistics, plots, etc.
- For the different OSSEs, we are going to use "true emissions" and "prior emissions".  We will also vary how we parameterize **S<sub>z</sub>** and **S<sub>o</sub>**.  We have two choices for emissions: ACES_FFDAS (Gately et al., ) and GRA2PES (website).
- **Note**, please wait until you see the word **"done!"** before moving to the next block of code.  Some code blocks take a minute or two to run.

### Remember that we use OSSEs to (1) test your code, (2) explore sensitivities, and (3) find the best inversion setup for your situation.
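
The OSSE idea can be sketched on a small synthetic problem before touching the Baltimore files. The example below assumes the standard batch Bayesian update, $\hat{x} = x_p + S_o H^T (H S_o H^T + S_z)^{-1}(z - H x_p)$, with diagonal **S<sub>o</sub>** and **S<sub>z</sub>**. The function `batch_inversion` and all `toy_*` names and sizes are hypothetical and are not part of the toy code that follows.

```python
import numpy as np

def batch_inversion(H, x_prior, z, so_diag, sz_diag):
    """Batch Bayesian update with diagonal S_o and S_z (passed as 1-D variances)."""
    SoHt = so_diag[:, None] * H.T            # S_o H^T, shape (n, m)
    Psi = H @ SoHt + np.diag(sz_diag)        # H S_o H^T + S_z, shape (m, m)
    return x_prior + SoHt @ np.linalg.solve(Psi, z - H @ x_prior)

rng = np.random.default_rng(0)
n_obs, n_flux = 40, 15                                  # hypothetical toy sizes
toy_H = rng.uniform(0.0, 0.1, size=(n_obs, n_flux))     # synthetic footprints
toy_truth = rng.uniform(1.0, 5.0, size=n_flux)          # "true" emissions
toy_prior = 0.7 * toy_truth                             # deliberately biased prior
toy_z = toy_H @ toy_truth                               # noise-free pseudo-observations

toy_xhat = batch_inversion(toy_H, toy_prior, toy_z,
                           so_diag=np.full(n_flux, 4.0),
                           sz_diag=np.full(n_obs, 0.05**2))

err_prior = np.linalg.norm(toy_prior - toy_truth)
err_post = np.linalg.norm(toy_xhat - toy_truth)
print(f"prior error = {err_prior:.3f}, posterior error = {err_post:.3f}")
```

Because the truth is known by construction, the posterior error can be compared directly with the prior error, which is not possible with real data.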

------------------------------------------------------------------------
The code block below loads the necessary Python libraries and some files.  This is administrative stuff.

In [None]:
#Toy Code for Lagrangian Local-Scale Problems Using In-Situ Data from 3 Fixed Sites Around Baltimore
#Written by Kim Mueller (Kimlm@umich.edu)
#For GHG Center Summer School
#Date July 2025
#NOTE - for teaching purposes only.  Do not use outside of classroom, not responsible for any errors.

import sys
sys.path.append('C:/Users/klm3/AppData/Local/Programs/Python/Python311/Lib/site-packages')
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import warnings
from scipy.sparse import diags
from scipy.linalg import inv as dense_inv
from scipy.linalg import cholesky 
import scipy.sparse as sp
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from shapely.geometry import Polygon, Point, box
from matplotlib.colors import Normalize, LogNorm 

warnings.filterwarnings("ignore")
mask_df = pd.read_csv('G:/SummerSchool/shapefiles/mask.csv')
mask_array = mask_df['Mask'].values
num_fluxes = int(np.sum(mask_array == 1)) #number of grid cells kept by the mask
 
#add lons and lats
subdir = 'priors_processed'
priorstring = 'ACES_FFDAS'  
lats = np.load('G:/SummerSchool/priors/'+subdir+'/'+priorstring+'_lat.npy')
lons = np.load('G:/SummerSchool/priors/'+subdir+'/'+priorstring+'_long.npy')
lons = lons[mask_array.ravel().astype(bool)] #removes rows where mask_array = 0
lats = lats[mask_array.ravel().astype(bool)]    
lat_grid = np.unique(lats)
lon_grid = np.unique(lons)
mask = True
print('done!')

## Break into groups to run one of the six exercises (specified below) using the toy code
------------------------------------------------------------------------
### Example 1:  
### Ensure your inversion code is working correctly (base case)

#### To do this 
(1) Create the enhancement (**H${x}$<sub>o</sub>**) using **H** (created with WRF2-STILT) and **${x}$<sub>o</sub>** (ACES+FFDAS); we treat this enhancement as **z<sub>truth</sub>**.  
(2) Use **z<sub>truth</sub>** with **H** (created with WRF2-STILT) and **${x}$<sub>o</sub>** (ACES+FFDAS) to estimate $\hat{x}$.  
(3) Use a diagonal of ones multiplied by 0.001 for **S<sub>z</sub>**.  
(4) Use a diagonal of ones for **S<sub>o</sub>**. 

Please answer the following:  
- Are you able to retrieve the truth?
- Why did we construct the OSSE like this?
- Do you have any other insights?

------------------------------------------------------------------------

### Example 2:  
### Adding varying white noise to **z<sub>truth</sub>** to see the impact on the inversion results, while fixing the amount of assumed error in **S<sub>z</sub>** to simulate real-world violations of that assumption.    
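
A minimal sketch of this kind of noise experiment on synthetic data, assuming the prior equals the truth and that the assumed model-data mismatch stays at 0.1 ppm while the actual noise grows. The matrix `H`, the sizes, and the single rescaled noise draw `unit_noise` are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_flux = 60, 10                        # hypothetical toy sizes
H = rng.uniform(0.0, 0.1, size=(n_obs, n_flux))
x_truth = rng.uniform(1.0, 5.0, size=n_flux)
x_prior = x_truth.copy()                      # prior equals truth in this sketch
so_diag = np.ones(n_flux)                     # S_o = diagonal of ones
sz_diag = np.full(n_obs, 0.1**2)              # assumed error stays fixed at 0.1 ppm

unit_noise = rng.standard_normal(n_obs)       # one noise draw, rescaled below
SoHt = so_diag[:, None] * H.T                 # S_o H^T
Psi = H @ SoHt + np.diag(sz_diag)             # H S_o H^T + S_z

rmse_by_std = {}
for noise_std in [0.0, 0.1, 30.0, 50.0]:      # noise levels quoted in the code comments
    z = H @ x_truth + noise_std * unit_noise
    x_hat = x_prior + SoHt @ np.linalg.solve(Psi, z - H @ x_prior)
    rmse_by_std[noise_std] = float(np.sqrt(np.mean((x_hat - x_truth) ** 2)))
    print(f"noise std = {noise_std:5.1f} ppm -> RMSE = {rmse_by_std[noise_std]:.4f}")
```

With no noise the estimate stays at the truth; as the actual noise exceeds what **S<sub>z</sub>** assumes, the inversion maps that noise into the fluxes.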

### Example 3:
### Changing the prior and the covariance parameters of **S<sub>o</sub>** to explore how prior information and uncertainty assumptions impact the inversion results.  The point of this example is to help illustrate how bias, structure, and scale in the prior and **S<sub>o</sub>** influence the model's ability to recover the truth.
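
One way to picture the "tight" versus "loose" settings is to build the diagonal of **S<sub>o</sub>** from the prior itself with a floor. This is a sketch only: the helper `so_diagonal` and the four flux values are hypothetical, and the floor threshold is assumed to equal `q_floor`.

```python
import numpy as np

def so_diagonal(prior, q_param, q_floor=1.0):
    """Return the diagonal of S_o as q_param * max(prior, q_floor)**2."""
    floored = np.where(prior < q_floor, q_floor, prior)   # keep small fluxes adjustable
    return q_param * np.square(floored)

toy_prior = np.array([0.0, 0.5, 2.0, 10.0])   # hypothetical fluxes, umol/m2/s
tight = so_diagonal(toy_prior, q_param=1)
loose = so_diagonal(toy_prior, q_param=10)
print("tight:", tight)
print("loose:", loose)
```

The floor keeps grid cells with near-zero prior emissions from being locked at zero, since a zero variance would not let the observations adjust them.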


In [None]:
#%% Choose your group's exercise (set exactly one to True)
Exercise1 = False
Exercise2 = False
Exercise3 = True

bias = 0
#variables for the inversion DO NOT CHANGE
priorslist = ['ACES_FFDAS', 'GRAAPESCO2']
tower_names = ['NEB','NWB','HAL']
monthlist = ['02','07']
truemet = 'WRF2' 
met = truemet 
unc = False
z_directory = 'G:/SummerSchool/output/enhancements/'
y_save_directory = 'G:/SummerSchool/output/y/'

if Exercise1 or Exercise2:  #R = Sz, y = z, Q = So
    truth = 'ACES_FFDAS'  
    prior = 'ACES_FFDAS'
    y_bias = 0
    R_ones = True
    qones = True #Covariance matrix So is a diagonal of ones 
    q_param = 1 #This is a scaling factor to multiply So. Keep (1) for Exercise 1 & 2, Change for Exercise 3 (b) and (c)-----> CHANGE THIS
    q_param = q_param**2
    q_param_feb = q_param #the So block below expects per-month names
    q_param_jul = q_param
    prior_multiplier = 1 #the prior is not rescaled in Exercises 1 & 2
    R_whitenoise_mean_feb = 0
    R_whitenoise_std_feb = 0
    R_whitenoise_mean_jul = 0
    R_whitenoise_std_jul = 0
elif Exercise3:
    y_bias = 0
    R_ones = True
    R_whitenoise_mean_feb = 0
    R_whitenoise_std_feb = 0
    R_whitenoise_mean_jul = 0
    R_whitenoise_std_jul = 0
    qones = False # (1) = TRUE, (2) & (3) = FALSE ----->  CHANGE THIS

if Exercise1: #R = Sz, y = z
    R_param_feb = 0.05 #This is a scaling factor that you use to multiply on your Sz matrix
    R_param_feb =  R_param_feb**2 #to ppm2
    R_param_jul = 0.05 #This is a scaling factor that you use to multiply on your Sz matrix
    R_param_jul =  R_param_jul**2 #to ppm2
    y_whitenoise_mean_feb = 0 #Do not change this as we want mean 0
    y_whitenoise_std_feb = 0 #No noise
    y_whitenoise_mean_jul = 0 #Do not change this as we want mean 0
    y_whitenoise_std_jul = 0 #No noise

if Exercise2: #R = Sz, y = z
    R_param_feb = 0.1 #This is a scaling factor that you use to multiply on your Sz matrix
    R_param_feb =  R_param_feb**2 #to ppm2
    R_param_jul = 0.1 #This is a scaling factor that you use to multiply on your Sz matrix
    R_param_jul =  R_param_jul**2 #to ppm2
    y_whitenoise_mean_feb = 0 #Do not change this as we want mean 0
    y_whitenoise_std_feb = 0.1 #Picarro = 0.1, K30 = 30, very low cost sensor = 50 ----->  CHANGE THIS
    y_whitenoise_mean_jul = 0 #Do not change this as we want mean 0
    y_whitenoise_std_jul = 0.1 #Picarro = 0.1, K30 = 30, very low cost sensor = 50 ----->  CHANGE THIS

if Exercise3: 
    #IMPACT OF So & Sz
    qones = False #(1) True (2) & (3) are False
    truth = 'ACES_FFDAS'  
    prior = 'ACES_FFDAS' #(1) 'GRAAPESCO2', (2) 'ACES_FFDAS', & (3) 'ACES_FFDAS'  ----->  CHANGE THIS
    if qones:  #-------> CHANGE THIS! Make sure that it is (1) TRUE and (2) & (3) FALSE
        q_param_feb = 1 #(1) (Tight) 
        q_param_jul = 1 #(1) (Tight)  
        q_param_feb = q_param_feb**2
        q_param_jul = q_param_jul**2 
        R_param_feb = 0.1 #(1) = 0.1  
        R_param_feb =  R_param_feb**2 #to ppm2
        R_param_jul = 0.1 #(1) & (2) = 0.05 Very Small
        R_param_jul =  R_param_jul**2 #to ppm2 
        prior_multiplier = 1 #Do not change
    else: #Diag of R is (1) prior^2 or (2) (0.5*truth)^2
        q_param_feb = 10  #(2) = 1 (tight), & (3) = 10 (Really Loose) ----->  CHANGE THIS
        q_param_jul = 10 #(2) = 1 (tight), & (3) = 10 (Really Loose) ----->  CHANGE THIS
        q_floor = 1
        prior_multiplier = 0.75 #(1) & (2) = 1, (3) = 0.75 ----->  CHANGE THIS
        # Keep Picarro like noise on z but account for it in Sz
        R_param_feb = 0.05 #(2) = 0.05 (tightly to obs) (3) 0.05 (tightly to obs) 
        R_param_feb =  R_param_feb**2 #to ppm2
        R_param_jul = 0.05 #(2) = 0.05 (tightly to obs) (3) 0.05 (tightly to obs) 
        R_param_jul =  R_param_jul**2 #to ppm2 
    y_whitenoise_mean_feb = 0 #Do not change this as we want mean 0; but you can change this later to explore constant bias
    y_whitenoise_std_feb = 0.1 #Picarro = 0.1, K30 = 30, very low cost sensor = 50 
    y_whitenoise_mean_jul = 0 #Do not change this as we want mean 0; but you can change this later to explore constant bias
    y_whitenoise_std_jul = 0.1 #Picarro = 0.1, K30 = 30, very low cost sensor = 50

print(' ')
print('Met/Dispersion model = ' + truemet + '-STILT')
print('True emissions = ' + truth)
print('Prior fluxes = '+ prior) 
print('done!')

#### In the next block, we are loading data, adding noise (if specified), and computing the mean enhancement of **ztruth**
#### Write down answers on your sheet.

In [None]:
print('Loading Data ...')
def load_numpy_array(directory,fname):
    filename = f'{directory}{fname}'
    data = np.load(filename)
    return data

#load
values_feb_truth = []
values_feb_prior = []
values_jul_truth = []
values_jul_prior = []
unit_WRF2_feb =[]
unit_WRF2_jul = []

for site in tower_names:
    #Feb
    filename_feb_prior = site + '_y_' + prior+'_02_2019.npy'
    feb_values_prior = load_numpy_array(y_save_directory,filename_feb_prior)
    values_feb_prior.append(feb_values_prior)
    
    filename_feb_truth = site + '_y_' + truth+'_02_2019.npy'
    feb_values_truth = load_numpy_array(y_save_directory,filename_feb_truth)
    values_feb_truth.append(feb_values_truth)
    #print('y (feb)'+site + ' ' + str(len(feb_values_truth)))

    u_WRF2_feb = np.load(z_directory + 'unit_'+ site+'_2019_02_WRF2.npy') #change later
    u_WRF2_feb = [x for x in u_WRF2_feb if not pd.isna(x) and x!='nan']
    u_WRF2_feb  = np.array(u_WRF2_feb)
    unit_WRF2_feb.append(u_WRF2_feb) 
    
    #Jul    
    filename_jul_prior = site + '_y_' + prior+'_07_2019.npy'
    jul_values_prior = load_numpy_array(y_save_directory,filename_jul_prior)
    values_jul_prior.append(jul_values_prior)
    
    filename_jul_truth = site + '_y_' + truth+'_07_2019.npy'
    jul_values_truth = load_numpy_array(y_save_directory,filename_jul_truth)
    values_jul_truth.append(jul_values_truth)
    
    u_WRF2_jul = np.load(z_directory + 'unit_'+ site+'_2019_07_WRF2.npy') #change later
    u_WRF2_jul = [x for x in u_WRF2_jul if not pd.isna(x) and x!='nan']
    u_WRF2_jul  = np.array(u_WRF2_jul)
    unit_WRF2_jul.append(u_WRF2_jul)

#Feb
y_feb_array_truth = np.concatenate(values_feb_truth)
y_feb_array_prior = np.concatenate(values_feb_prior)
r_unit_WRF2_feb = np.concatenate(unit_WRF2_feb)
feb_mean = np.mean(y_feb_array_truth)
feb_mean_pr = np.mean(y_feb_array_prior)
feb_mean_u = np.mean(r_unit_WRF2_feb)

#Jul
y_jul_array_truth = np.concatenate(values_jul_truth)
y_jul_array_prior = np.concatenate(values_jul_prior)
r_unit_WRF2_jul = np.concatenate(unit_WRF2_jul)
jul_mean = np.mean(y_jul_array_truth)
jul_mean_pr = np.mean(y_jul_array_prior)
jul_mean_u = np.mean(r_unit_WRF2_jul)

noise_feb = np.random.normal(y_whitenoise_mean_feb, y_whitenoise_std_feb, size = y_feb_array_prior.shape)
noise_jul = np.random.normal(y_whitenoise_mean_jul, y_whitenoise_std_jul, size = y_jul_array_prior.shape)

y_feb_array_truth_hold = y_feb_array_truth.copy()
y_jul_array_truth_hold = y_jul_array_truth.copy()

y_feb_array_truth = y_feb_array_truth+noise_feb+bias
y_jul_array_truth = y_jul_array_truth+noise_jul+bias    

print(' ')
print('Mean enhancement (ztruth) for Feb is '+ str(round(feb_mean,2)) +' ppm')
print('Mean enhancement (ztruth) for Jul is '+ str(round(jul_mean,2)) + ' ppm')
print(' ')

print('Mean prior enhancement for Feb is '+ str(round(feb_mean_pr,2)) +' ppm')
print('Mean prior enhancement for Jul is '+ str(round(jul_mean_pr,2)) + ' ppm')
print('For exercise 3d - skip printing these enhancements')
print(' ')
    
print('Mean unit enhancement for Feb is '+ str(round(feb_mean_u,2)) +' ppm')
print('Mean unit enhancement for Jul is '+ str(round(jul_mean_u,2)) + ' ppm')

print(' ')
print('done!')
print('Configuring Sz')
print(' ')

#Create Sz
if R_ones:
    R_Feb = np.ones(y_feb_array_prior.shape)
    R_Jul = np.ones(y_jul_array_prior.shape)
    noise_feb = np.random.normal(R_whitenoise_mean_feb,R_whitenoise_std_feb, size = y_feb_array_prior.shape)
    noise_jul = np.random.normal(R_whitenoise_mean_jul,R_whitenoise_std_jul, size = y_jul_array_prior.shape)
    R_Feb = (R_Feb+noise_feb)*R_param_feb
    R_Jul = (R_Jul+noise_jul)*R_param_jul
    print('Sz = diag of one for Feb and July')
else:
    R_Feb  = r_unit_WRF2_feb*R_param_feb #unit enhancements concatenated over all towers, not only the last site
    R_Jul  = r_unit_WRF2_jul*R_param_jul
    print('Sz = diag of unit enhancements for Feb and July')

print(' ')
print('Scaling factor on Sz is ' + str(np.sqrt(R_param_feb)) + ' ppm in Feb')
print('Scaling factor on Sz is ' + str(np.sqrt(R_param_jul)) + ' ppm in Jul')


#### What do these average hourly enhancements mean in terms of the values of the errors you specify in **S<sub>z</sub>**?  
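
One rough way to frame this question is to compare the assumed 1-sigma error with the size of the signal. The sketch below uses made-up numbers (a 0.05 ppm assumed error and a 2 ppm mean enhancement); substitute the values printed by the block above. The helper `error_to_signal` is hypothetical.

```python
import numpy as np

def error_to_signal(sz_variance, mean_enhancement):
    """Ratio of the assumed 1-sigma model-data mismatch to the mean enhancement."""
    return np.sqrt(sz_variance) / abs(mean_enhancement)

# Hypothetical numbers: 0.05 ppm assumed error against a 2 ppm mean enhancement.
ratio = error_to_signal(sz_variance=0.05**2, mean_enhancement=2.0)
print(f"assumed error is {100 * ratio:.1f}% of the mean enhancement")
```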

##### In the next block, we create **S<sub>o</sub>** and load the **H** matrices for February and July 2019.

In [None]:
#%% Creating So & Loading Hmatrix
from scipy.spatial.distance import cdist
from scipy.sparse import diags  # still used for diagonal case

use_spatial_correlation = True

def load_sparse_matrix(filename):
    return sp.load_npz(filename)

Hmatrix_feb = load_sparse_matrix('G:/SummerSchool/output/Hmatrices/H_'+met+'_2019_02.npz') 
Hmatrix_jul = load_sparse_matrix('G:/SummerSchool/output/Hmatrices/H_'+met+'_2019_07.npz') 

prior_array_feb = np.load('G:/SummerSchool/output/prior/'+prior+'_2019_02.npy')*prior_multiplier
prior_array_jul = np.load('G:/SummerSchool/output/prior/'+prior+'_2019_07.npy')*prior_multiplier

truth_array_feb = np.load('G:/SummerSchool/output/prior/'+truth+'_2019_02.npy')
truth_array_jul = np.load('G:/SummerSchool/output/prior/'+truth+'_2019_07.npy')
q_size_jul = len(prior_array_jul)
q_size_feb = len(prior_array_feb)

#Check dimensions
print('')
print(f"Hmatrix shape(Feb): {Hmatrix_feb.shape}")
print('Prior length: ' + str(len(prior_array_feb)))
print('Size of So (Feb) = ' + str(q_size_feb) + ' x ' +str(q_size_feb))
print(' ')
print('Mean prior emission value for Feb is '+ str(round(np.mean(prior_array_feb),2)) + ' umol/m2s')
percentile_75_feb = np.percentile(prior_array_feb, 75)
print('75th percentile of prior emission value for Feb is '+ str(round(percentile_75_feb,2)) + ' umol/m2s')
print(' ')
print('Mean truth emission value for Feb is '+ str(round(np.mean(truth_array_feb),2)) + ' umol/m2s')
print(' ')
print(f"Hmatrix shape(July): {Hmatrix_jul.shape}")    
print('Size of So (Jul) = ' + str(q_size_jul) + ' x ' +str(q_size_jul))
print('Prior length: ' + str(len(prior_array_jul)))
print(' ')
print('Mean prior emission value for Jul is '+ str(round(np.mean(prior_array_jul),2)) + ' umol/m2s')
percentile_75_jul = np.percentile(prior_array_jul, 75)
print('75th percentile of prior emission value for Jul is '+ str(round(percentile_75_jul,2)) + ' umol/m2s')
print(' ')
print('Mean truth emission value for Jul is '+ str(round(np.mean(truth_array_jul),2)) + ' umol/m2s')

Hsp_jul = Hmatrix_jul@prior_array_jul #should be the same as original y_jul_array_prior
Hsp_feb = Hmatrix_feb@prior_array_feb #should be the same as original y_feb_array_prior

print(' ')
print('Created Hxsp')
print(' ')

fig, ax = plt.subplots(1,2,figsize=(18,6))
ax[0].plot(y_feb_array_truth,label='ztrue enh. w noise and bias',color='black',linewidth =.75)
ax[0].plot(Hsp_feb,label='Hxprior:modelled enhan.',color='red',linewidth =1)
ax[0].legend(fontsize=14)  # Adjust font size for legend entries
ax[0].set_ylabel('ppm', fontsize=14)  # Adjust font size for y-axis label
ax[0].set_xlabel('Time index (hourly) w gaps', fontsize=14)  # Adjust font size for x-axis label
ax[0].set_title("ztruth vs Hxprior Feb 2019", fontsize=14) 
ax[0].grid(True)

ax[1].plot(y_jul_array_truth,label='ztrue enh. w noise and bias',color='black',linewidth =.75)
ax[1].plot(Hsp_jul,label='Hxprior:modelled enh.',color='red',linewidth =1)
ax[1].legend(fontsize=14)  # Adjust font size for legend entries
ax[1].set_ylabel('ppm', fontsize=14)  # Adjust font size for y-axis label
ax[1].set_xlabel('Time index (hourly) w gaps', fontsize=14)  # Adjust font size for x-axis label
ax[1].set_title("ztruth vs Hxprior Jul 2019", fontsize=14)
ax[1].grid(True)

print('Mean of difference between Hxprior and ztrue (Feb) = ' + str(round(np.mean(Hsp_feb - y_feb_array_truth),4)))
print('Mean of difference between Hxprior and ztrue (Jul) = ' + str(round(np.mean(Hsp_jul - y_jul_array_truth),4)))
print('')
if qones:
    print('So is diag of ones')
    Q_diag_feb = (np.ones(prior_array_feb.shape[0])*q_param_feb)
    q_feb = Q_diag_feb
    Q_diag_feb = diags(Q_diag_feb,0)
    Q_diag_jul = (np.ones(prior_array_jul.shape[0])*q_param_jul)
    q_jul = Q_diag_jul
    Q_diag_jul = diags(Q_diag_jul,0)
else:
    print('So is varying (per prior values)') 
    q_prior_array_feb = prior_array_feb.copy()
    q_prior_array_feb[q_prior_array_feb < 1] = 1 #give So a floor of one
    q_diag_feb = (q_param_feb) * np.square(q_prior_array_feb)
    q_feb = q_diag_feb
    Q_diag_feb = diags(q_diag_feb,0)
    q_prior_array_jul =prior_array_jul.copy()
    q_prior_array_jul[q_prior_array_jul < 1] = q_floor
    q_diag_jul = (q_param_jul) * np.square(q_prior_array_jul)
    Q_diag_jul = diags(q_diag_jul,0)
    q_jul = q_diag_jul

print('') 
print('Scaling factor on So is ' + str(round(np.sqrt(q_param_feb),3)) + ' in Feb')
print('Scaling factor on So is ' + str(round(np.sqrt(q_param_jul),3)) + ' in Jul')
print(' ')
print('Size of So diag (Feb) = ' +  str(q_feb.shape[0]))
print('Size of So diag (Jul) = ' +  str(q_jul.shape[0]))

print('')
print('Created So July and So Feb')
print(' ')
print('done!')

#### What is the relationship between the ztrue and the red line?

##### Note that the block above prints the shapes of **H**, the prior, and **S<sub>o</sub>** for February and July 2019. You always want to check the dimensions of everything.

#### What can you say about how your modelled enhancements, i.e. **Hx<sub>prior</sub>**, differ from your **z<sub>truth</sub>**?   

#### How does that compare with your true enhancement?  

 
##### In the next block, we will create all the pieces for the inversion for Feb and July 2019 --- and most importantly, estimate our emissions!

In [None]:
#%% HSoHt &SoHt Feb and July
HQ_feb = Hmatrix_feb@Q_diag_feb
HQ_jul = Hmatrix_jul@Q_diag_jul

def create_diagonal_matrix(vector):
  vector_size = len(vector)
  diagonal_matrix = np.zeros((vector_size, vector_size))
  diagonal_matrix[np.diag_indices(vector_size)] = vector
  return diagonal_matrix

R_diagonal_Feb = create_diagonal_matrix(R_Feb)
R_diagonal_Jul = create_diagonal_matrix(R_Jul)

Htrans_feb = Hmatrix_feb.T
HQHt_feb = HQ_feb@Htrans_feb
QHt_feb = Q_diag_feb@Htrans_feb
#Add Sz on the diagonal only; adding the 1-D vector R_Feb would broadcast it across every row
Psi_feb = HQHt_feb.toarray()+R_diagonal_Feb
Psi_inv_feb = dense_inv(Psi_feb)

Htrans_jul = Hmatrix_jul.T
HQHt_jul = HQ_jul@Htrans_jul
QHt_jul = Q_diag_jul@Htrans_jul
Psi_jul = HQHt_jul.toarray()+R_diagonal_Jul
Psi_inv_jul = dense_inv(Psi_jul)

psi_z_feb= Psi_inv_feb@(y_feb_array_truth-Hsp_feb)
shat_feb = prior_array_feb+QHt_feb@psi_z_feb 

psi_z_jul= Psi_inv_jul@(y_jul_array_truth-Hsp_jul)
shat_jul = prior_array_jul+QHt_jul@psi_z_jul 

print('')
print('Feb: HSo (' +  str(HQ_feb.shape[0]) + ',' + str(HQ_feb.shape[1]) +')')
print('July: HSo (' +  str(HQ_jul.shape[0]) + ',' + str(HQ_jul.shape[1]) +')')
print(' ')
print('Feb: Sz diagonal (' +  str(R_diagonal_Feb.shape[0]) + ',' +  str(R_diagonal_Feb.shape[1]) +')')
print('July: Sz diagonal (' + str(R_diagonal_Jul.shape[0]) + ',' +  str(R_diagonal_Jul.shape[1]) +')')
print(' ')
print('Created HSoHt +Sz (Psi) and inv(Psi) for Feb and July')
print('')
print('Feb and Jul emissions shat estimates are completed')
print(' ')
print('done!')

##### In the next block (if you want to run it), we will "approximate" the uncertainty associated with the **x<sub>hat</sub>** emissions in Feb and July 2019.  This takes too long for our exercise time, so we are going to skip it.  

We cannot take the inverse of So, so we have to do some slicing for an approximation.  Do not use this code outside of this toy exercise for your own research.  Use the equations provided in the batch inversion day to estimate uncertainties - but use a computer with a lot of memory or use methods specified in Yadav et al. ().
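
For a problem small enough to hold dense matrices in memory, the posterior covariance can be formed directly. The sketch below assumes the common form $V_{\hat{x}} = S_o - S_o H^T (H S_o H^T + S_z)^{-1} H S_o$, which does not require inverting **S<sub>o</sub>**. The helper `posterior_covariance` and the `*_small` arrays are hypothetical, and this may differ from the equations given on the batch inversion day.

```python
import numpy as np

def posterior_covariance(H, so_diag, sz_diag):
    """V = S_o - S_o H^T (H S_o H^T + S_z)^-1 H S_o for diagonal S_o and S_z."""
    So = np.diag(so_diag)
    SoHt = So @ H.T                                # shape (n, m)
    Psi = H @ SoHt + np.diag(sz_diag)              # shape (m, m)
    return So - SoHt @ np.linalg.solve(Psi, SoHt.T)

rng = np.random.default_rng(2)
H_small = rng.uniform(0.0, 0.1, size=(30, 8))      # hypothetical toy Jacobian
so_small = np.full(8, 4.0)                         # prior variances
sz_small = np.full(30, 0.05**2)                    # model-data mismatch variances

V_small = posterior_covariance(H_small, so_small, sz_small)
post_std = np.sqrt(np.diag(V_small))
print("prior std    :", np.sqrt(so_small))
print("posterior std:", np.round(post_std, 3))
```

The dense (n, n) and (m, m) matrices are what make this impractical at the size of the real February and July problems.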

In [None]:
#Estimating Uncertainties
import numpy as np
from tqdm import tqdm
from scipy.sparse import csr_matrix
if unc:
    def compute_posterior_uncertainty_sparse(H, Q_diag, R_diag):
        """
        Efficiently compute the diagonal of the posterior covariance matrix.

        Parameters:
            H:       csr_matrix, shape (m, n), sensitivity matrix
            Q_diag:  1D array or diagonal matrix, shape (n,)
            R_diag:  1D array or diagonal matrix, shape (m,)

        Returns:
            posterior_std: 1D array of posterior standard deviations
            avg_std: float, mean of posterior std devs
        """
        if not isinstance(H, csr_matrix):
            H = H.tocsr()

        m, n = H.shape
        q_diag = Q_diag.diagonal() if hasattr(Q_diag, "diagonal") else Q_diag
        r_diag = R_diag.diagonal() if hasattr(R_diag, "diagonal") else R_diag
        R_inv_diag = 1.0 / r_diag

        vshat_diag = np.copy(q_diag)

        print("Computing diagonal of posterior covariance...")

        for i in tqdm(range(n), desc="State variable"):
            h_col = H[:, i].toarray().flatten()  # shape (m,)
            if np.any(h_col):  # skip if all zero
                vshat_diag[i] -= q_diag[i]**2 * np.sum((h_col**2) * R_inv_diag)

        posterior_std = np.sqrt(np.maximum(vshat_diag, 0))
        return posterior_std, np.mean(posterior_std)

    posterior_std_feb, avg_std_feb = compute_posterior_uncertainty_sparse(Hmatrix_feb, Q_diag_feb, R_diagonal_Feb)
    posterior_std_jul, avg_std_jul = compute_posterior_uncertainty_sparse(Hmatrix_jul, Q_diag_jul, R_diagonal_Jul)

    print(f"Average posterior std (Feb): {avg_std_feb:.4f} µmol/m²/s")
    print(f"Average posterior std (Jul): {avg_std_jul:.4f} µmol/m²/s")
    vshat_diag_feb = posterior_std_feb**2 #store variances so this matches q_feb in the else branch
    vshat_diag_jul = posterior_std_jul**2

else: 
    print('')
    print('Not running uncertainties, so posterior uncertainties are set equal to prior uncertainties.')
    vshat_diag_feb = q_feb 
    vshat_diag_jul = q_jul

##### The next code block is **UGLY** code, but it rearranges things so that we can look at the monthly mean.  (Not part of this exercise.) You can change the code to see how this looks at other time intervals.
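
The core of the rearranging can be sketched on a tiny array. The example assumes the flat state vector is ordered time-major (all grid cells for the first hour, then the next hour, and so on) and trims the same number of hours from each end, which is a simplification of the trimming used in the toy code. All names and sizes are hypothetical.

```python
import numpy as np

n_times, n_cells = 6, 4                         # hypothetical toy sizes
# Flat state vector ordered time-major: all cells for hour 0, then hour 1, ...
flat_state = np.arange(n_times * n_cells, dtype=float)

by_time = flat_state.reshape(n_times, n_cells)  # rows = hours, columns = grid cells
trim = 1                                        # drop spin-up hours at each end
trimmed = by_time[trim:-trim, :]

cell_mean = trimmed.mean(axis=0)                # time-mean flux per grid cell
domain_series = trimmed.mean(axis=1)            # domain-mean flux per hour
print(cell_mean)
print(domain_series)
```

Averaging over `axis=0` gives one value per grid cell (the map), while `axis=1` gives one value per time step; averaging over blocks of rows would give other time intervals.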

In [None]:
#%%
#Rearranging shat, priors, and truths
utctime_feb = np.load('G:/SummerSchool/output/prior/H_ACES_FFDAS_2019_02_utctimes.npy',allow_pickle=True)
utctime_jul = np.load('G:/SummerSchool/output/prior/H_ACES_FFDAS_2019_07_utctimes.npy',allow_pickle=True)
nhrs_back = 180
#February
#Shat
reshape_shat_feb = shat_feb.reshape(len(utctime_feb),num_fluxes)
trimmed_shat_feb = reshape_shat_feb[nhrs_back:-(nhrs_back-1),:]
flatten_shat_feb = trimmed_shat_feb.ravel(order='F')
mean_shat_feb = np.mean(trimmed_shat_feb,axis = 0)
mean_shat_feb_week = np.mean(trimmed_shat_feb, axis=1)
#Posterior Unc Vshat
reshape_vshat_diag_feb = vshat_diag_feb.reshape(len(utctime_feb),num_fluxes)
trimmed_vshat_diag_feb = reshape_vshat_diag_feb[nhrs_back:-(nhrs_back-1),:]
flatten_vshat_diag_feb = trimmed_vshat_diag_feb.ravel(order='F')
mean_vshat_diag_feb = np.mean(trimmed_vshat_diag_feb,axis = 0)
print('')
#if unc:
#    print('average posterior uncertainty Feb = ' + str(round(np.sqrt(np.mean(mean_vshat_diag_feb)),4)))
    
#Prior Unc Q
reshape_q_diag_feb = q_feb.reshape(len(utctime_feb),num_fluxes)
trimmed_q_diag_feb = reshape_q_diag_feb[nhrs_back:-(nhrs_back-1),:]
flatten_q_diag_feb = trimmed_q_diag_feb.ravel(order='F')
mean_q_diag_feb = np.mean(trimmed_q_diag_feb,axis = 0)
#print('average prior uncertainty Feb = ' + str(round(np.sqrt(np.mean(mean_q_diag_feb)),4)))

#July
#Shat
reshape_shat_jul = shat_jul.reshape(len(utctime_jul),num_fluxes)
trimmed_shat_jul = reshape_shat_jul[nhrs_back:-(nhrs_back-1),:]
flatten_shat_jul = trimmed_shat_jul.ravel(order='F')
mean_shat_jul = np.mean(trimmed_shat_jul,axis = 0)
mean_shat_jul_week = np.mean(trimmed_shat_jul, axis=1)
#Posterior Unc Vshat
reshape_vshat_diag_jul = vshat_diag_jul.reshape(len(utctime_jul),num_fluxes)
trimmed_vshat_diag_jul = reshape_vshat_diag_jul[nhrs_back:-(nhrs_back-1),:]
flatten_vshat_diag_jul = trimmed_vshat_diag_jul.ravel(order='F')
mean_vshat_diag_jul = np.mean(trimmed_vshat_diag_jul,axis = 0)
#if unc:
#    print('average posterior uncertainty Jul = ' + str(round(np.sqrt(np.mean(mean_vshat_diag_jul)),4)))
#Prior Unc Q
reshape_q_diag_jul = q_jul.reshape(len(utctime_jul),num_fluxes)
trimmed_q_diag_jul = reshape_q_diag_jul[nhrs_back:-(nhrs_back-1),:]
flatten_q_diag_jul = trimmed_q_diag_jul.ravel(order='F')
mean_q_diag_jul = np.mean(trimmed_q_diag_jul,axis = 0)
#print('average prior uncertainty Jul = ' + str(round(np.sqrt(np.mean(mean_q_diag_jul)),4)))

mean_shat_feb = mean_shat_feb.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know
mean_shat_jul = mean_shat_jul.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know
mean_vshat_diag_feb = mean_vshat_diag_feb.reshape(len(lon_grid),len(lat_grid)) #
mean_vshat_diag_jul = mean_vshat_diag_jul.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know

prior_array_feb = prior_array_feb.reshape(len(utctime_feb),num_fluxes)
prior_array_feb = prior_array_feb[nhrs_back:-(nhrs_back-1),:]
flatten_prior_array_feb = prior_array_feb.ravel(order='F')
mean_prior_array_feb = np.mean(prior_array_feb,axis = 0)
mean_prior_array_feb = mean_prior_array_feb.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know

prior_array_jul = prior_array_jul.reshape(len(utctime_jul),num_fluxes)
prior_array_jul = prior_array_jul[nhrs_back:-(nhrs_back-1),:]
flatten_prior_array_jul = prior_array_jul.ravel(order='F')
mean_prior_array_jul = np.mean(prior_array_jul,axis = 0)
mean_prior_array_jul = mean_prior_array_jul.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know

q_diag_array_feb = q_feb.reshape(len(utctime_feb),num_fluxes)
q_diag_array_feb = q_diag_array_feb[nhrs_back:-(nhrs_back-1),:]
flatten_q_diag_array_feb = q_diag_array_feb.ravel(order='F')
mean_q_diag_array_feb = np.mean(q_diag_array_feb,axis = 0)
mean_q_diag_array_feb = mean_q_diag_array_feb.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know

q_diag_array_jul = q_jul.reshape(len(utctime_jul),num_fluxes)
q_diag_array_jul = q_diag_array_jul[nhrs_back:-(nhrs_back-1),:]
flatten_q_diag_array_jul = q_diag_array_jul.ravel(order='F')
mean_q_diag_array_jul = np.mean(q_diag_array_jul,axis = 0)
mean_q_diag_array_jul = mean_q_diag_array_jul.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know

mean_prior_array_feb = mean_prior_array_feb.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know
mean_prior_array_jul = mean_prior_array_jul.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know
mean_q_diag_array_feb = mean_q_diag_array_feb.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know
mean_q_diag_array_jul = mean_q_diag_array_jul.reshape(len(lon_grid),len(lat_grid))

#Truth Feb and July
truth_array_feb = truth_array_feb.reshape(len(utctime_feb),num_fluxes)
truth_array_feb = truth_array_feb[nhrs_back:-(nhrs_back-1),:]
flatten_truth_array_feb = truth_array_feb.ravel(order='F')
mean_truth_array_feb = np.mean(truth_array_feb,axis = 0)
mean_truth_array_feb = mean_truth_array_feb.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know

truth_array_jul = truth_array_jul.reshape(len(utctime_jul),num_fluxes)
truth_array_jul = truth_array_jul[nhrs_back:-(nhrs_back-1),:]
flatten_truth_array_jul = truth_array_jul.ravel(order='F')
mean_truth_array_jul = np.mean(truth_array_jul,axis = 0)
mean_truth_array_jul = mean_truth_array_jul.reshape(len(lon_grid),len(lat_grid)) #lon and lat may need to be flipped but don't know

print('Configured xhat, xprior emissions, emissions truth, So!')
if unc:
    print('and Vxhat diagonal!')
print(' ')
print('done!')

### For all exercises: 
#### What do the statistics for the full domain tell us?

Note that the next code block **DOES NOT** calculate the chi-squared statistic.  This is something for you to do later.  It is an important metric to look at.
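As a starting point for that later exercise, here is a minimal sketch of one common form of the reduced chi-squared statistic for a Bayesian inversion, built on the prior innovations. The names `reduced_chi_squared`, `H_toy`, `Q_toy`, `R_toy`, `s_prior_toy`, and `z_toy` are hypothetical and are not variables from this notebook; `Q_toy` and `R_toy` stand in for the prior flux covariance and the model-data mismatch covariance, so map them onto whatever your setup calls those matrices.

```python
import numpy as np

def reduced_chi_squared(z, H, s_prior, Q, R):
    """Reduced chi-squared of the prior innovations (hypothetical helper)."""
    innovation = z - H @ s_prior              # observations minus prior-modelled values
    innovation_cov = H @ Q @ H.T + R          # expected covariance of the innovations
    chi2 = innovation @ np.linalg.solve(innovation_cov, innovation)
    return chi2 / len(z)                      # normalize by the number of observations

# Toy problem (all values made up for illustration only)
rng = np.random.default_rng(0)
n_obs, n_flux = 200, 50
H_toy = rng.random((n_obs, n_flux))           # stand-in Jacobian
Q_toy = 0.5 * np.eye(n_flux)                  # stand-in prior flux covariance
R_toy = 0.2 * np.eye(n_obs)                   # stand-in model-data mismatch covariance
s_prior_toy = np.ones(n_flux)

# Draw a "truth" and observations that are consistent with Q_toy and R_toy
s_true_toy = s_prior_toy + rng.multivariate_normal(np.zeros(n_flux), Q_toy)
z_toy = H_toy @ s_true_toy + rng.multivariate_normal(np.zeros(n_obs), R_toy)

chi2_r = reduced_chi_squared(z_toy, H_toy, s_prior_toy, Q_toy, R_toy)
print(f"Reduced chi-squared: {chi2_r:.3f}")
```

When the assumed covariances match the actual error statistics, the value should sit close to 1. Values well above 1 suggest the assumed uncertainties are too small, and values well below 1 suggest they are too large.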

In [None]:
#%%Calculating Statistics

diff_prior_feb = mean_shat_feb-mean_prior_array_feb
diff_prior_jul = mean_shat_jul-mean_prior_array_jul

diff_truth_feb = mean_shat_feb-mean_truth_array_feb
diff_truth_jul = mean_shat_jul-mean_truth_array_jul

mean_diff_feb = round(np.mean(diff_truth_feb),4)
mean_diff_jul = round(np.mean(diff_truth_jul),4)

mean_prior_feb = round(np.mean(mean_prior_array_feb),4)
mean_prior_jul = round(np.mean(mean_prior_array_jul),4)

mean_truth_feb =round(np.mean(mean_truth_array_feb),4)
mean_truth_jul =round(np.mean(mean_truth_array_jul),4)

mean_shat_feb_val =round(np.mean(mean_shat_feb),4)
mean_shat_jul_val =round(np.mean(mean_shat_jul),4)

def calculate_rmse(truth, estimated):
    assert truth.shape  == estimated.shape, "Arrays must be the same shape!"
    rmse = np.sqrt(np.nanmean((truth-estimated)**2))
    return rmse

rmse_feb = calculate_rmse(flatten_shat_feb,flatten_truth_array_feb)
rmse_jul = calculate_rmse(flatten_shat_jul,flatten_truth_array_jul)

corr_matrix_feb = np.corrcoef(flatten_truth_array_feb,flatten_shat_feb)
corr_coef_feb = corr_matrix_feb[0,1]
corr_matrix_jul = np.corrcoef(flatten_truth_array_jul,flatten_shat_jul)
corr_coef_jul = corr_matrix_jul[0,1]

std_err_feb = np.std(flatten_shat_feb - flatten_truth_array_feb)/np.sqrt(len(flatten_truth_array_feb))
std_err_jul = np.std(flatten_shat_jul - flatten_truth_array_jul)/np.sqrt(len(flatten_truth_array_jul))

print('Statistics in flux space')
print(' ')
print('Mean truth flux (Feb):' + str(round(mean_truth_feb,3)) + ' µmol/m2s')
print('Mean truth flux (Jul):' + str(round(mean_truth_jul,3)) + ' µmol/m2s')
print(' ')
print('Mean estimated flux (Feb):' + str(round(mean_shat_feb_val,3)) + ' µmol/m2s')
print('Mean estimated flux (Jul):' + str(round(mean_shat_jul_val,3)) + ' µmol/m2s')
print('')
print('Mean prior flux (Feb):' + str(round(mean_prior_feb,3)) + ' µmol/m2s')
print('Mean prior flux (Jul):' + str(round(mean_prior_jul,3)) + ' µmol/m2s')
print('')
print(f"Mean difference (xhat - xtruth) Feb: {mean_diff_feb:.4f} µmol/m2s")
print(f"Mean difference (xhat - xtruth) Jul: {mean_diff_jul:.4f} µmol/m2s")
print('')
print(f"Std Error Feb: {std_err_feb:.4f} µmol/m2s")
print(f"Std Error Jul: {std_err_jul:.4f} µmol/m2s")
print(' ')
print(f"Correlation Coefficient (xhat,xtruth) Feb: {corr_coef_feb:.4f}")
print(f"Correlation Coefficient (xhat,xtruth) Jul: {corr_coef_jul:.4f}")
print(' ')
print(f"RMSE Feb: {rmse_feb:.4f} µmol/m2s")
print(f"RMSE Jul: {rmse_jul:.4f} µmol/m2s")


#### Let's check out how things look in space across our entire domain and look at differences.  
- There will be 6 plots that show the estimates, priors, and truths.
- If we aren't running the perfect case (Exercise 1), there will also be four more plots showing the differences (estimates minus prior, and estimates minus truth).
- We skip the difference plots for the perfect case because they would only show small spurious noise.
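The reshape comments in the earlier block say that lon and lat "may need to be flipped". The sketch below shows how to check this. When `pcolormesh` is given 1-D coordinate vectors, it expects the data array to have shape `(len(lat), len(lon))`, that is rows along latitude and columns along longitude. The arrays `lon_demo`, `lat_demo`, and `flat_demo` are made up, and how the real flux vector is ordered is an assumption you need to confirm against how **H** was built.

```python
import numpy as np

lon_demo = np.linspace(-77.0, -76.0, 5)   # 5 longitudes (x axis)
lat_demo = np.linspace(39.0, 39.6, 4)     # 4 latitudes (y axis)

# Hypothetical flattened flux vector with one value per grid cell
flat_demo = np.arange(lat_demo.size * lon_demo.size, dtype=float)

# Case 1 (assumption): longitude varies fastest in the flat vector
field = flat_demo.reshape(lat_demo.size, lon_demo.size)

# Case 2 (assumption): latitude varies fastest in the flat vector
field_alt = flat_demo.reshape(lon_demo.size, lat_demo.size).T

# Either way, the array handed to pcolormesh(lon, lat, C) should be (nlat, nlon)
print(field.shape, field_alt.shape)
```

On a square grid the wrong ordering will not raise an error, so it is worth plotting a field with a known feature (for example, a single hot cell) to confirm the orientation.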

##### **Note:** a lot of **UGLY** plotting code in this block

In [None]:
#%%Plotting
print('Plotting spatial maps')
print('This can take awhile ... please be patient')
ua_pathname = 'G:/NEC/Regional/shapefiles/Census/UrbanAreas/tl_2023_us_uac20.zip'
gdf_ua = gpd.read_file(ua_pathname)
# Reproject to WGS84 instead of overwriting the CRS label (TIGER/Line files ship in NAD83)
gdf_ua = gdf_ua.to_crs("EPSG:4326")

# Select Baltimore areas
gdf_baltimore = gdf_ua[gdf_ua['NAME20'] == "Baltimore, MD"]
geometry = [-76.583, 39.315417] 
point = Point(geometry) 
twr_gdf_NEB = gpd.GeoDataFrame(crs="EPSG:4326", geometry=[point])
geometry = [-76.685071, 39.344541]  
point = Point(geometry) 
twr_gdf_NWB = gpd.GeoDataFrame(crs="EPSG:4326", geometry=[point])
geometry = [-76.675278, 39.255194]  
point = Point(geometry) 
twr_gdf_HAL = gpd.GeoDataFrame(crs="EPSG:4326", geometry=[point])

# Get bounds
minx, miny, maxx, maxy = gdf_baltimore.total_bounds
bounds = gdf_baltimore.total_bounds
mask_polygon = box(*bounds)

vmin_jul = np.percentile(flatten_shat_jul, 10)
vmax_jul = np.percentile(flatten_shat_jul, 90)
norm_jul = Normalize(vmin=vmin_jul,vmax=vmax_jul)

Plot_text = 'Estimates'
fig, ax = plt.subplots(2,3,figsize=(16,6), subplot_kw = {'projection':ccrs.PlateCarree()})
ax[0,0].add_feature(cfeature.COASTLINE)
ax[0,0].add_feature(cfeature.BORDERS)
ax[0,0].add_feature(cfeature.STATES)
mesh = ax[0,0].pcolormesh(lon_grid,lat_grid, mean_shat_jul,cmap='viridis', norm=norm_jul,shading='auto',transform=ccrs.PlateCarree())
gdf_baltimore.plot(ax=ax[0,0],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
twr_gdf_NEB.plot(ax=ax[0,0],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_HAL.plot(ax=ax[0,0],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_NWB.plot(ax=ax[0,0],marker='o', markersize=20, linewidth=0.5, color='red')
ax[0,0].set_title('Estimates July umols/m2s' + '(truth = ' + truth + ')', size = 10)
fig.colorbar(mesh, ax=ax[0,0])

Plot_text = 'Prior'
#mesh_grid = mean_truth_grid
ax[0,1].add_feature(cfeature.COASTLINE)
ax[0,1].add_feature(cfeature.BORDERS)
ax[0,1].add_feature(cfeature.STATES)
mesh = ax[0,1].pcolormesh(lon_grid,lat_grid, mean_prior_array_jul,cmap='viridis', norm=norm_jul,shading='auto',transform=ccrs.PlateCarree())
gdf_baltimore.plot(ax=ax[0,1],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
twr_gdf_NEB.plot(ax=ax[0,1],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_HAL.plot(ax=ax[0,1],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_NWB.plot(ax=ax[0,1],marker='o', markersize=20, linewidth=0.5, color='red')
ax[0,1].set_title(Plot_text+' July umols/m2s' + '(prior = ' + prior + ')', size = 10)
fig.colorbar(mesh, ax=ax[0,1])

Plot_text = 'Truth'
#mesh_grid = mean_truth_grid
ax[0,2].add_feature(cfeature.COASTLINE)
ax[0,2].add_feature(cfeature.BORDERS)
ax[0,2].add_feature(cfeature.STATES)
mesh = ax[0,2].pcolormesh(lon_grid,lat_grid, mean_truth_array_jul,cmap='viridis', norm=norm_jul,shading='auto',transform=ccrs.PlateCarree())
gdf_baltimore.plot(ax=ax[0,2],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
twr_gdf_NEB.plot(ax=ax[0,2],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_HAL.plot(ax=ax[0,2],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_NWB.plot(ax=ax[0,2],marker='o', markersize=20, linewidth=0.5, color='red')
ax[0,2].set_title(Plot_text+' July umols/m2s' + '(truth = ' + truth + ')', size = 10)
fig.colorbar(mesh, ax=ax[0,2])

Plot_text = 'Estimates'
vmin_feb = np.percentile(flatten_shat_feb, 10)
vmax_feb = np.percentile(flatten_shat_feb, 90)
norm_feb = Normalize(vmin=vmin_feb,vmax=vmax_feb)  # use the February percentiles computed above

ax[1,0].add_feature(cfeature.COASTLINE)
ax[1,0].add_feature(cfeature.BORDERS)
ax[1,0].add_feature(cfeature.STATES)
mesh = ax[1,0].pcolormesh(lon_grid,lat_grid, mean_shat_feb,cmap='viridis', norm=norm_feb,shading='auto',transform=ccrs.PlateCarree())
gdf_baltimore.plot(ax=ax[1,0],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
twr_gdf_NEB.plot(ax=ax[1,0],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_HAL.plot(ax=ax[1,0],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_NWB.plot(ax=ax[1,0],marker='o', markersize=20, linewidth=0.5, color='red')
ax[1,0].set_title('Estimates Feb umols/m2s' + '(truth = ' + truth + ')', size = 10)
fig.colorbar(mesh, ax=ax[1,0])

Plot_text = 'Prior'
ax[1,1].add_feature(cfeature.COASTLINE)
ax[1,1].add_feature(cfeature.BORDERS)
ax[1,1].add_feature(cfeature.STATES)
mesh = ax[1,1].pcolormesh(lon_grid,lat_grid, mean_prior_array_feb,cmap='viridis', norm=norm_feb,shading='auto',transform=ccrs.PlateCarree())
gdf_baltimore.plot(ax=ax[1,1],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
twr_gdf_NEB.plot(ax=ax[1,1],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_HAL.plot(ax=ax[1,1],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_NWB.plot(ax=ax[1,1],marker='o', markersize=20, linewidth=0.5, color='red')
ax[1,1].set_title(Plot_text+' Feb umols/m2s' + '(prior = ' + prior + ')', size = 10)
fig.colorbar(mesh, ax=ax[1,1])

Plot_text = 'Truth'
ax[1,2].add_feature(cfeature.COASTLINE)
ax[1,2].add_feature(cfeature.BORDERS)
ax[1,2].add_feature(cfeature.STATES)
mesh = ax[1,2].pcolormesh(lon_grid,lat_grid, mean_truth_array_feb,cmap='viridis', norm=norm_feb,shading='auto',transform=ccrs.PlateCarree())
gdf_baltimore.plot(ax=ax[1,2],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
twr_gdf_NEB.plot(ax=ax[1,2],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_HAL.plot(ax=ax[1,2],marker='o', markersize=20, linewidth=0.5, color='red')
twr_gdf_NWB.plot(ax=ax[1,2],marker='o', markersize=20, linewidth=0.5, color='red')
ax[1,2].set_title(Plot_text+' Feb umols/m2s' + '(truth = ' + truth + ')', size = 10)
fig.colorbar(mesh, ax=ax[1,2])

plt.subplots_adjust(wspace=0.01)
plt.show() 

vmin_jul_prior = np.percentile(diff_prior_jul, 10)
vmax_jul_prior = np.percentile(diff_prior_jul, 90)
norm_jul_prior = Normalize(vmin=vmin_jul_prior,vmax=vmax_jul_prior)

if not Exercise1:
    Plot_text = '[Estimates - Prior]'
    fig, ax = plt.subplots(2,2,figsize=(8,10), subplot_kw = {'projection':ccrs.PlateCarree()})
    ax[0,0].add_feature(cfeature.COASTLINE)
    ax[0,0].add_feature(cfeature.BORDERS)
    ax[0,0].add_feature(cfeature.STATES)
    mesh = ax[0,0].pcolormesh(lon_grid,lat_grid, diff_prior_jul,cmap='viridis', norm=norm_jul_prior,shading='auto',transform=ccrs.PlateCarree())
    gdf_baltimore.plot(ax=ax[0,0],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
    twr_gdf_NEB.plot(ax=ax[0,0],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_HAL.plot(ax=ax[0,0],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_NWB.plot(ax=ax[0,0],marker='o', markersize=20, linewidth=0.5, color='red')
    ax[0,0].set_title(Plot_text+' July umol/m2s', size = 10)
    fig.colorbar(mesh, ax=ax[0,0])

    vmin_jul_truth = np.percentile(diff_truth_jul, 10)
    vmax_jul_truth = np.percentile(diff_truth_jul, 90)
    norm_jul_truth = Normalize(vmin=vmin_jul_truth,vmax=vmax_jul_truth)
    Plot_text = '[Estimates - Truth]'
    ax[0,1].add_feature(cfeature.COASTLINE)
    ax[0,1].add_feature(cfeature.BORDERS)
    ax[0,1].add_feature(cfeature.STATES)
    mesh = ax[0,1].pcolormesh(lon_grid,lat_grid, diff_truth_jul,cmap='viridis', norm=norm_jul_truth,shading='auto',transform=ccrs.PlateCarree())
    gdf_baltimore.plot(ax=ax[0,1],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
    twr_gdf_NEB.plot(ax=ax[0,1],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_HAL.plot(ax=ax[0,1],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_NWB.plot(ax=ax[0,1],marker='o', markersize=20, linewidth=0.5, color='red')
    ax[0,1].set_title(Plot_text+' July umol/m2s', size = 10)
    fig.colorbar(mesh, ax=ax[0,1])

    vmin_feb_prior = np.percentile(diff_prior_feb, 10)
    vmax_feb_prior = np.percentile(diff_prior_feb, 90)
    norm_feb_prior = Normalize(vmin=vmin_feb_prior,vmax=vmax_feb_prior)
    Plot_text = '[Estimates - Prior]'
    ax[1,0].add_feature(cfeature.COASTLINE)
    ax[1,0].add_feature(cfeature.BORDERS)
    ax[1,0].add_feature(cfeature.STATES)
    mesh = ax[1,0].pcolormesh(lon_grid,lat_grid, diff_prior_feb,cmap='viridis', norm=norm_feb_prior,shading='auto',transform=ccrs.PlateCarree())
    gdf_baltimore.plot(ax=ax[1,0],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
    twr_gdf_NEB.plot(ax=ax[1,0],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_HAL.plot(ax=ax[1,0],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_NWB.plot(ax=ax[1,0],marker='o', markersize=20, linewidth=0.5, color='red')
    ax[1,0].set_title(Plot_text+' Feb umol/m2s', size = 10)
    fig.colorbar(mesh, ax=ax[1,0])

    vmin_feb_truth = np.percentile(diff_truth_feb, 10)
    vmax_feb_truth = np.percentile(diff_truth_feb, 90)
    norm_feb_truth = Normalize(vmin=vmin_feb_truth,vmax=vmax_feb_truth)
    Plot_text = '[Estimates - Truth]'
    ax[1,1].add_feature(cfeature.COASTLINE)
    ax[1,1].add_feature(cfeature.BORDERS)
    ax[1,1].add_feature(cfeature.STATES)
    mesh = ax[1,1].pcolormesh(lon_grid,lat_grid, diff_truth_feb,cmap='viridis', norm=norm_feb_truth,shading='auto',transform=ccrs.PlateCarree())
    gdf_baltimore.plot(ax=ax[1,1],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
    twr_gdf_NEB.plot(ax=ax[1,1],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_HAL.plot(ax=ax[1,1],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_NWB.plot(ax=ax[1,1],marker='o', markersize=20, linewidth=0.5, color='red')
    ax[1,1].set_title(Plot_text+ ' Feb umol/m2s', size = 10)
    fig.colorbar(mesh, ax=ax[1,1])
    plt.show()   

print (' ')
print ('done!')

#### If you estimated posterior uncertainties, let's check out the reduction in uncertainty across our entire domain.    
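The code block below maps the absolute reduction (prior standard deviation minus posterior standard deviation). Another commonly used way to express the same information is the percent reduction relative to the prior. The sketch below uses made-up variances, and `percent_uncertainty_reduction` is a hypothetical helper that is not part of this notebook.

```python
import numpy as np

def percent_uncertainty_reduction(prior_var, post_var):
    """Percent reduction in standard deviation from prior to posterior."""
    prior_std = np.sqrt(prior_var)
    post_std = np.sqrt(post_var)
    return 100.0 * (1.0 - post_std / prior_std)

# Made-up prior and posterior variances on a 2 x 2 grid
prior_var_demo = np.array([[4.0, 4.0], [1.0, 1.0]])
post_var_demo = np.array([[1.0, 4.0], [0.25, 0.81]])

reduction_demo = percent_uncertainty_reduction(prior_var_demo, post_var_demo)
print(np.round(reduction_demo, 1))
```

For these toy numbers the reductions are 50%, 0%, 50%, and 10%. A cell with 0% reduction is one where the observations added no information beyond the prior.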

In [None]:
if unc:
    #%% Uncertainty reduction for both Feb and July
    mesh_grid = np.sqrt(mean_q_diag_array_jul) - np.sqrt(mean_vshat_diag_jul)
    vshat_ave = np.sqrt(np.mean(mean_vshat_diag_jul))
    q_ave = np.sqrt(np.mean(mean_q_diag_array_jul))
    print('Average So (Jul) = ' + str(round(q_ave,4)))
    print('Average posterior uncertainty (Jul) = ' + str(round(vshat_ave,4)))
    vmin = np.percentile(mesh_grid.flatten(), 1)
    vmax = np.percentile(mesh_grid.flatten(), 99)
    norm = Normalize(vmin=vmin,vmax=vmax)

    fig, ax = plt.subplots(1,2,figsize=(8,10), subplot_kw = {'projection':ccrs.PlateCarree()})
    ax[0].add_feature(cfeature.COASTLINE)
    ax[0].add_feature(cfeature.BORDERS)
    ax[0].add_feature(cfeature.STATES)
    # mesh_grid is already a difference of standard deviations, so plot it directly
    mesh = ax[0].pcolormesh(lon_grid,lat_grid, mesh_grid,cmap='viridis', norm=norm,shading='auto',transform=ccrs.PlateCarree())
    gdf_baltimore.plot(ax=ax[0],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
    twr_gdf_NEB.plot(ax=ax[0],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_HAL.plot(ax=ax[0],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_NWB.plot(ax=ax[0],marker='o', markersize=20, linewidth=0.5, color='red')
    ax[0].set_title('Uncertainty Reduction Jul', size = 10)
    fig.colorbar(mesh, ax=ax[0],shrink=0.5)

    mesh_grid = np.sqrt(mean_q_diag_array_feb) - np.sqrt(mean_vshat_diag_feb)
    vshat_ave = np.sqrt(np.mean(mean_vshat_diag_feb))
    q_ave = np.sqrt(np.mean(mean_q_diag_array_feb))
    print('Average So (Feb) = ' + str(round(q_ave,4)))
    print('Average posterior uncertainty (Feb) = ' + str(round(vshat_ave,4)))
    vmin = np.percentile(mesh_grid.flatten(), 1)
    vmax = np.percentile(mesh_grid.flatten(), 99)
    norm = Normalize(vmin=vmin,vmax=vmax)

    ax[1].add_feature(cfeature.COASTLINE)
    ax[1].add_feature(cfeature.BORDERS)
    ax[1].add_feature(cfeature.STATES)
    # mesh_grid is already a difference of standard deviations, so plot it directly
    mesh = ax[1].pcolormesh(lon_grid,lat_grid, mesh_grid,cmap='viridis', norm=norm,shading='auto',transform=ccrs.PlateCarree())
    gdf_baltimore.plot(ax=ax[1],color='black',edgecolor = 'black', alpha=0.3,transform=ccrs.PlateCarree())
    twr_gdf_NEB.plot(ax=ax[1],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_HAL.plot(ax=ax[1],marker='o', markersize=20, linewidth=0.5, color='red')
    twr_gdf_NWB.plot(ax=ax[1],marker='o', markersize=20, linewidth=0.5, color='red')
    ax[1].set_title('Uncertainty Reduction Feb', size = 10)
    fig.colorbar(mesh, ax=ax[1],shrink=0.5)
    plt.show()  
    print(' ')
    print('done!')
else:
    print('Did not estimate uncertainties.  No map.')

## Back to Research Question: 
### What are the February and July fossil CO<sub>2</sub> estimates for Baltimore in 2019?  Is our understanding of Baltimore's CO<sub>2</sub> emissions consistent with atmospheric observations?   

- We want to report the estimates in units that everyone can understand, not just the scientists (Gg C per month rather than µmol/m2s).  
- We also want to compare the statistics for the Baltimore urban area with the statistics for our entire estimation domain.  
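To make the unit conversion concrete, here is a small worked example for a single grid cell. It mirrors the arithmetic of the conversion in the code block below (µmol to mol, mol to grams of carbon, per m2 to per cell, per second to per month), but the function name `flux_to_gigagrams_c` and the 1 km2 cell are hypothetical choices made for illustration.

```python
SEC_PER_DAY = 86400
G_C_PER_MOL = 12.01   # grams of carbon per mole of CO2

def flux_to_gigagrams_c(flux_umol_m2_s, area_m2, days):
    """Convert a mean flux in µmol/m2s to Gg C emitted over `days` days."""
    mol_per_s = flux_umol_m2_s * 1e-6 * area_m2          # µmol/m2s -> mol/s for the cell
    grams = mol_per_s * G_C_PER_MOL * days * SEC_PER_DAY  # mol/s -> g C over the period
    return grams / 1e9                                    # g -> Gg

# 1 µmol/m2s over a 1 km2 cell (1e6 m2) for the 28 days of February 2019
one_cell = flux_to_gigagrams_c(1.0, 1.0e6, 28)
print(f"{one_cell:.4f} Gg C")
```

So a steady flux of 1 µmol/m2s over 1 km2 corresponds to roughly 0.029 Gg C (about 29 tonnes of carbon) in February.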

##### **Note:** a lot of **UGLY** plotting code in this block


In [None]:
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt

print('Starting bar plot')
print('Takes a second ...')

lon_2d, lat_2d = np.meshgrid(lon_grid, lat_grid)
points = [Point(xy) for xy in zip(lon_2d.ravel(), lat_2d.ravel())]
grid_gdf = gpd.GeoDataFrame(geometry=points, crs=gdf_baltimore.crs)
mask_Balt = grid_gdf.within(gdf_baltimore.unary_union).values.reshape(lon_2d.shape)

days_feb, days_jul = 28, 31
sec_per_day = 86400

def flux_to_GgC(mean_flux, area_grid, days):
    mol_per_m2_s = mean_flux * 1e-6
    g_per_m2_s = mol_per_m2_s * 12.01
    g_per_cell_s = g_per_m2_s * area_grid
    g_per_cell = g_per_cell_s * days * sec_per_day
    return g_per_cell.sum() / 1e9

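def std_flux_to_GgC(std_flux, area_grid, days):
    # A standard deviation scales linearly with the unit conversion factors,
    # so convert each cell's std (µmol/m2s) to g C per cell per month first.
    mol_per_m2_s = std_flux * 1e-6
    g_per_m2_s = mol_per_m2_s * 12.01
    g_per_cell_s = g_per_m2_s * area_grid
    g_per_cell = g_per_cell_s * days * sec_per_day
    # Sum in quadrature. ASSUMPTION: grid-cell errors are uncorrelated, i.e. the
    # off-diagonal covariance terms are ignored, so the total may be under- or overestimated.
    return np.sqrt((g_per_cell**2).sum()) / 1e9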

#Prior mean arrays
prior_feb = np.where(mask_Balt, mean_prior_array_feb, 0)
prior_jul = np.where(mask_Balt, mean_prior_array_jul, 0)

#Shat arrays
post_feb = np.where(mask_Balt, mean_shat_feb, 0)
post_jul = np.where(mask_Balt, mean_shat_jul, 0)

truth_feb = np.where(mask_Balt, mean_truth_array_feb, 0)
truth_jul = np.where(mask_Balt, mean_truth_array_jul, 0)

#Unc arrays
post_std_feb = np.where(mask_Balt, np.sqrt(mean_vshat_diag_feb), 0)
post_std_jul = np.where(mask_Balt, np.sqrt(mean_vshat_diag_jul), 0)

# Prior std arrays
prior_std_feb = np.where(mask_Balt, np.sqrt(mean_q_diag_array_feb), 0)
prior_std_jul = np.where(mask_Balt, np.sqrt(mean_q_diag_array_jul), 0)

R_earth = 6.371e6
dlat = np.radians(lat_grid[1] - lat_grid[0])  # radians
dlon = np.radians(lon_grid[1] - lon_grid[0])  # radians
lon_2d, lat_2d = np.meshgrid(lon_grid, lat_grid)
area_grid = (R_earth**2) * dlat * dlon * np.cos(np.radians(lat_2d))  # in m^2

prior_feb_GgC = flux_to_GgC(prior_feb, area_grid, days_feb)
prior_jul_GgC = flux_to_GgC(prior_jul, area_grid, days_jul)

post_feb_GgC = flux_to_GgC(post_feb, area_grid, days_feb)
post_jul_GgC = flux_to_GgC(post_jul, area_grid, days_jul)

truth_feb_GgC = flux_to_GgC(truth_feb, area_grid, days_feb)
truth_jul_GgC = flux_to_GgC(truth_jul, area_grid, days_jul)

post_unc_feb_GgC = std_flux_to_GgC(post_std_feb, area_grid, days_feb)
post_unc_jul_GgC = std_flux_to_GgC(post_std_jul, area_grid, days_jul)

prior_unc_feb_GgC = std_flux_to_GgC(prior_std_feb, area_grid, days_feb)
prior_unc_jul_GgC = std_flux_to_GgC(prior_std_jul, area_grid, days_jul)

labels = ['February', 'July']
prior_vals_Balt = [prior_feb_GgC, prior_jul_GgC]
post_vals_Balt = [post_feb_GgC, post_jul_GgC]
truth_vals_Balt = [truth_feb_GgC, truth_jul_GgC]
post_uncs_Balt = [post_unc_feb_GgC, post_unc_jul_GgC]

x = np.arange(len(labels))
width = 0.5
fig, ax = plt.subplots(figsize=(5,6))

# Prior
ax.bar(x, prior_vals_Balt, width/3,capsize = 3, label='Prior', color='lightgray')
# Posterior with error bars
if unc: 
    # yerr needs a sequence (not a generator); error bars are 2-sigma
    ax.bar(x-width/3, post_vals_Balt, width/3, yerr=[2*u for u in post_uncs_Balt], capsize=3, label='Posterior', color='cornflowerblue')
else:
    ax.bar(x-width/3, post_vals_Balt, width/3, capsize=3, label='Posterior', color='cornflowerblue')
# Truth
ax.bar(x+width/3, truth_vals_Balt, width/3, label='Truth', color='forestgreen')

ax.set_ylabel('Emissions (Gg C per month)')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
plt.title('Baltimore Emissions (Prior, Posterior, Truth)')
ax.grid(True,zorder=0)
ax.set_axisbelow(True)
plt.tight_layout()
plt.show()

# More stats for that area

# Flatten all arrays
flat_shat_feb = mean_shat_feb.flatten()
flat_truth_feb = mean_truth_array_feb.flatten()
flat_prior_feb = mean_prior_array_feb.flatten()
flat_shat_jul = mean_shat_jul.flatten()
flat_truth_jul = mean_truth_array_jul.flatten()
flat_prior_jul = mean_prior_array_jul.flatten()

mask_Balt_flat = mask_Balt.flatten()

# Apply mask
shat_feb_Balt = flat_shat_feb[mask_Balt_flat]
truth_feb_Balt = flat_truth_feb[mask_Balt_flat]
prior_feb_Balt = flat_prior_feb[mask_Balt_flat]
shat_jul_Balt = flat_shat_jul[mask_Balt_flat]
truth_jul_Balt = flat_truth_jul[mask_Balt_flat]
prior_jul_Balt = flat_prior_jul[mask_Balt_flat]

def calculate_rmse(truth, estimated):
    return np.sqrt(np.nanmean((truth-estimated)**2))

#February
mean_truth_feb_Balt = np.mean(truth_feb_Balt)
mean_diff_feb_Balt = np.mean(shat_feb_Balt - truth_feb_Balt)
rmse_feb_Balt = calculate_rmse(truth_feb_Balt, shat_feb_Balt)
corr_coef_feb_Balt = np.corrcoef(truth_feb_Balt, shat_feb_Balt)[0,1]
std_err_feb_Balt = np.std(shat_feb_Balt - truth_feb_Balt)/np.sqrt(len(truth_feb_Balt))

#July
mean_truth_jul_Balt = np.mean(truth_jul_Balt)
mean_diff_jul_Balt = np.mean(shat_jul_Balt - truth_jul_Balt)
rmse_jul_Balt = calculate_rmse(truth_jul_Balt, shat_jul_Balt)
corr_coef_jul_Balt = np.corrcoef(truth_jul_Balt, shat_jul_Balt)[0,1]
std_err_jul_Balt = np.std(shat_jul_Balt - truth_jul_Balt)/np.sqrt(len(truth_jul_Balt))

shat_mean_feb_Balt = np.mean(shat_feb_Balt)
shat_mean_jul_Balt = np.mean(shat_jul_Balt)

prior_mean_feb_Balt = np.mean(prior_feb_Balt)
prior_mean_jul_Balt = np.mean(prior_jul_Balt)

print(' ')
print('Statistics')
print(' ')
print('Mean truth flux (Feb):' + str(round(mean_truth_feb_Balt,3)) + ' µmol/m2s')
print('Mean truth flux (Jul):' + str(round(mean_truth_jul_Balt,3)) + ' µmol/m2s')
print('')
print('Mean prior flux (Feb):' + str(round(prior_mean_feb_Balt,3)) + ' µmol/m2s')
print('Mean prior flux (Jul):' + str(round(prior_mean_jul_Balt,3)) + ' µmol/m2s')
print(' ')
print('Mean estimated flux (Feb):' + str(round(shat_mean_feb_Balt,3)) + ' µmol/m2s')
print('Mean estimated flux (Jul):' + str(round(shat_mean_jul_Balt,3)) + ' µmol/m2s')
print('')
print(f"Mean difference (xhat - xtruth) Feb (Baltimore): {mean_diff_feb_Balt:.4f} µmol/m2s")
print(f"Mean difference (xhat - xtruth) Jul (Baltimore): {mean_diff_jul_Balt:.4f} µmol/m2s")
print('')
print(f"Std Error Feb (Baltimore): {std_err_feb_Balt:.4f} µmol/m2s")
print(f"Std Error Jul (Baltimore): {std_err_jul_Balt:.4f} µmol/m2s")
print(' ')
print(f"Correlation Coefficient (xhat,xtruth) Feb (Baltimore): {corr_coef_feb_Balt:.4f}")
print(f"Correlation Coefficient (xhat,xtruth) Jul (Baltimore): {corr_coef_jul_Balt:.4f}")
print(' ')
print(f"RMSE Feb (Baltimore): {rmse_feb_Balt:.4f} µmol/m2s")
print(f"RMSE Jul (Baltimore): {rmse_jul_Balt:.4f} µmol/m2s")

print(' ')
print('done!')
