Wetland CH4 Flux Partition

Author: Will P. Richardson (wprichar@uark.edu)

This program partitions eddy-covariance methane fluxes from wetlands and lakes into ebullitive (i.e., bubbles) and diffusive (i.e., diffusion from water column and plant-mediated transport) subcomponents based on local scalar similarity between methane and other atmospheric scalars in the wavelet domain. The program and code are modified from those developed by Iwata et al. 2018, Boundary-Layer Meteorology.

Key differences between this program and that of Iwata et al. are:

  • Use of a dynamically changing ebullition threshold, which accounts for changes in wavelet coefficient magnitude due to global variance in the methane time series
  • Inclusion of canopy height information when calculating normalized frequencies for spectral analysis
  • Use of carbon dioxide as a reference scalar for methane flux partitioning in addition to the options of water vapor and sonic temperature included in the originally published program
  • Advanced post-processing steps including spectral and other stats derived from program outputs, an extra quality control step to remove periods in which reference scalar flux magnitudes are too low for reliable partitioning, creation of a unified master file of all half-hourly outputs, and visualization of spectral quantities to inform appropriate selection of the lower frequency bound parameter

Please refer to Iwata et al. (https://doi.org/10.1007/s10546-018-0383-1) and the manuscript accompanying this program code for detailed explanation of this method's theoretical background: Richardson et al. (2022), Boundary-Layer Meteorology (https://doi.org/10.1007/s10546-022-00703-y).


1. Computational Requirements

The program is designed to be run on a Linux computer or in any Linux-like environment. The user should make sure that the following programming tools and packages are installed for full functionality:

  • tcsh
  • gfortran
  • R packages:
    • r-base
    • r-cran-foreign
    • r-cran-mass
    • r-cran-metrics
    • r-cran-segmented
  • Python packages:
    • pandas
    • numpy
    • matplotlib
    • statsmodels

2. Program usage

i) Download the program and place it in a directory of your choice.

ii) Example data for the program are stored in the "sample" subdirectory. To confirm that your environment is configured properly and that the calculations are being done correctly, first run the partitioning program using this example data.

Open sh/wvlet.sh and sh/part_master.sh and set the variable WORKDIR to the full path where the partitioning program is located on your machine. Additionally, in sh/part_master.sh, for the variables HH_output, HH_stats, NonEb_periods, and zc_file, modify the portion of each path preceding the ref_data subdirectory to match WORKDIR.

Copy the folder of example high frequency data "sample/data/site_yr" into the "data" subdirectory of your partitioning program; note that the name of this folder is the same as the value of the variable site_yr displayed in sh/wvlet.sh and sh/part_master.sh. Copy the folder "sample/ref_data/Input/site_yr" into the "ref_data/Input" subdirectory of your partitioning program.

Before running the program, make sure that the shebang in the first line of each of the two shell scripts matches the location of the tcsh shell on your machine; you can find this by typing the command "whereis tcsh" in the terminal and using the first path it returns.

Run sh/wvlet.sh to perform the wavelet transformation on the high frequency data, and then run sh/part_master.sh to perform the partitioning. These shell scripts can be executed in the terminal by navigating to the working directory and typing "./" directly before the relative path to the shell script. The .sh scripts should be executable; if they are not, you can make them so by navigating to the location of the partitioning program in the terminal and typing the command "chmod u+x sh/wvlet.sh" (replacing wvlet.sh with whichever .sh script needs to be made executable).
After the program runs, check that the results in the "wvlet" and "flux" directories of your partitioning program match the results in the corresponding folders within the "sample" directory, and compare the figures generated in "ref_data/Output/site_yr" with those in the corresponding subdirectory of the sample folder. For reference, the "sample/ref_data/Output" directory also includes a folder containing some of the figures generated by the program using the full growing season of data from this site-year (see the Output files content guide for more information on these figures). Running the wavelet transformation on the sample data should take 3–5 minutes or less, and running the partitioning should take 10–15 minutes or less.

iii) Now try running the program with your own data.

*** At this stage, if you have not already made time series plots of the high frequency data for all observation periods, it is highly recommended to do so by running the script scripts/TS_plot.py (modify the user-defined variables at the top of the script to match the formatting of your high frequency data files, then execute it in the terminal). NOTE: this script should be run on a version of the data PRIOR to removing any NaNs.

a) Place your pre-processed high frequency data in a subdirectory within the "data" directory; the sub-folder holding the data should be named with an identifier for the site-year of data you are working with. These pre-processed high frequency data files should be named by their timestamp in the form 'YYYYmmdd_HHMM', have a ".dat" extension, and are typically separated into subfolders by day of observation so that the partitioning can be run iteratively over your entire data collection period.

Necessary pre-processing steps to perform on the high frequency data before feeding it into the partitioning program include despiking, coordinate rotation of the 3D wind vector data from the sonic anemometer, and synchronization (i.e., correction for time lags due to sensor separations) of gas concentration and wind velocity data. Note that this program is meant to operate on data from open-path gas analyzers, as a point-by-point correction of gas densities and sonic temperature is included in the program.

Variables in the high frequency data files should appear in the following order: streamwise wind velocity (m/s), vertical wind velocity (m/s), sonic temperature (degC), water vapor concentration (g/m3), carbon dioxide concentration (mg/m3), methane concentration (mmol/m3), atmospheric pressure (kPa), and air temperature (degC). Methane concentration, atmospheric pressure, and air temperature should be obtained from the open-path methane analyzer. All units should be converted in advance, and no header information should be included in the data files. Additionally, the user should ensure that no NaNs are present in the high frequency data; it is recommended to fill these values by linear interpolation with the addition of random noise.
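The NaN-filling recommendation above can be sketched as follows. This is a minimal pandas/numpy illustration, not the program's own code; the default noise level (the standard deviation of point-to-point differences of the valid data) is an assumption that should be tuned to your analyzer's noise characteristics.

```python
import numpy as np
import pandas as pd

def fill_nans(series, noise_scale=None, seed=0):
    """Fill NaN gaps by linear interpolation plus random noise.

    noise_scale defaults to the standard deviation of point-to-point
    differences of the valid data (an assumption; tune to your
    instrument's noise level).
    """
    s = pd.Series(series, dtype=float)
    gaps = s.isna()
    # Linear interpolation; limit_direction="both" also fills edge gaps
    filled = s.interpolate(method="linear", limit_direction="both")
    if noise_scale is None:
        noise_scale = float(np.std(np.diff(s.dropna().to_numpy())))
    rng = np.random.default_rng(seed)
    # Add noise only at the positions that were originally missing
    filled.loc[gaps] = filled.loc[gaps] + rng.normal(0.0, noise_scale, int(gaps.sum()))
    return filled

filled_series = fill_nans([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
```

Seeding the generator makes gap-filling reproducible across reruns, which helps when comparing partitioning runs.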

b) Before running on your data, you should place the ancillary input data files (i.e., HH_output, HH_stats, NonEb_periods, and zc_file below) into the directory 'ref_data/Input/site_yr' (where 'site_yr' is a subfolder you create, named with the site-year identifier specified below) and modify several other variables in sh/wvlet.sh and sh/part_master.sh for your site.

For both sh/wvlet.sh and sh/part_master.sh the user should modify:

  • site_yr: this variable should be set to the site-year identifier you used for the subfolder in the "data" directory that holds the high frequency data (e.g., US-HRA-2019)
  • DATA: this should contain the name of each folder of high frequency data you wish to process in the "data/site_yr" subdirectory
  • DT: the time interval between data points in your high frequency data (e.g., if data were collected at 10 Hz, DT = 1/10 = 0.1 s, at 20 Hz DT = 1/20 = 0.05 s, etc.)
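The DT arithmetic above is simply the reciprocal of the acquisition frequency; the helper name below is illustrative, not part of the program.

```python
def dt_from_frequency(freq_hz):
    """Sampling interval DT in seconds from acquisition frequency in Hz."""
    return 1.0 / freq_hz

# 10 Hz -> DT = 0.1 s; 20 Hz -> DT = 0.05 s
dt_10hz = dt_from_frequency(10.0)
dt_20hz = dt_from_frequency(20.0)
```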

For just sh/part_master.sh the user should modify:

  • run_ID: a unique alphanumeric identifier for each partitioning run so that multiple partitioning runs with various parameter values can be performed on a single site-year of data (e.g., Run1)
  • tz_local: the local time zone where the data were collected (e.g., America/Chicago)
  • tz_logging: the time zone in which the data were logged (e.g., Etc/GMT); time zone label options correspond to those found in the 'pytz' python package
  • first_run: should be set to True or False depending on whether or not you have partitioned this site-year of data before (used to eliminate redundancy in program steps that do not change with parameter selection)
  • HH_output: the location of the half-hourly output data file for this site-year from a typical flux processing software such as EddyPro
  • HH_stats: the location of other half-hourly output stats from flux processing software (if all half-hourly quantities are not included in the HH_output file); if all necessary stats are located in HH_output file, set this variable to None
  • NonEb_periods: the location of a text file containing a list of timestamps for half-hourly periods which have been identified as non-ebullitive; this is CRITICAL for accurately conducting partitioning. SEE THE ATTACHED GUIDE "Instructions for non-ebullitive period selection" for more detailed information on how to identify non-ebullitive periods within your data
  • zc_file: the location of a text file containing canopy height data at a daily time step; these data can be roughly approximated if direct observations are not available
  • Zm: the measurement height of your eddy-covariance system in meters
  • LB: the lower frequency bound defining which wavelet coefficients are included in the partitioning and iteratively reweighted least squares regressions (empirically determined; default value = 0.003)
  • UB: the upper frequency bound defining which wavelet coefficients are included in the iteratively reweighted least squares regressions (empirically determined; default value = 1.0)
    • LB and UB are dimensionless quantities
  • EbThresh_type: technique for fitting the empirical ebullition threshold (leave as NonEb_LinReg unless testing different ebullition threshold types, e.g., quantile regression on all observations)
  • EbThresh_qty: the x-quantity used to estimate the empirical ebullition threshold (should be sigma_m in most cases; can be configured to use sigma_m normalized by friction velocity)
  • ust_thresh: a u-star threshold value used to filter for adequate turbulence (optional; set to 0.0 if you do not want to use one; value should have units m/s)
  • WD_min, WD_max: the minimum and maximum wind directions (degrees CW from north) around the tower which contain the fetch of interest (if fetch surrounds tower in all directions, set to 0 and 360 respectively)
  • fetch_x: approximate distance at which desired fetch stops (in meters); if tower footprint does not ever extend past desired fetch, set to False
  • FCH4_min: the lowest total CH4 flux magnitude (in umol/m2/s) at which scalar similarity is maintained (approximately 0.01 umol/m2/s from accompanying manuscript)
  • LE_thresh, Fco2_thresh: approximate threshold values for reference scalar fluxes below which scalar similarity is not maintained even in the absence of ebullition; can leave default values and change after examining figure created by program; LE_thresh should be in units of W/m2, Fco2_thresh should be in units of umol/m2/s
  • NL_filt: whether or not to filter for non-local processes using flux-variance similarity (should be set to True or False; HIGHLY recommended to set to True to ensure quality of partitioning outputs)
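The tz_local/tz_logging convention can be illustrated with a concrete conversion. The sketch below uses the standard-library zoneinfo module, which accepts the same IANA labels as the pytz package referenced above; the timestamp and variable names are illustrative, and this is not the program's own conversion code.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

tz_logging = ZoneInfo("Etc/GMT")         # time zone the data were logged in
tz_local = ZoneInfo("America/Chicago")   # local time zone at the site

# A timestamp as it appears in the logged data...
t_logged = datetime(2019, 7, 1, 18, 0, tzinfo=tz_logging)
# ...expressed in local time at the site:
t_local = t_logged.astimezone(tz_local)
print(t_local.strftime("%Y%m%d_%H%M"))  # 20190701_1300 (CDT is UTC-5)
```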

Lastly, at the top of scripts/funcs.py, the user should modify the variables specifying column names and other specifics of the half-hourly output file being provided. These will then be used for proper loading and selection of data in the rest of the program.

File details:

  • 'raw_tstamp_beg': boolean variable specifying if timestamps in the raw data filenames refer to beginning or end of period (e.g., True indicates beginning of period, False indicates end of period)
  • 'HH_tstamp_beg': boolean variable specifying if timestamps in the user-supplied half-hourly flux output file refer to beginning or end of averaging period (e.g., True indicates beginning of period, False indicates end of period)
  • 'datetime_sep': boolean variable specifying if date and time information is separated into two columns (True means date and time are separate, False means there is one column containing all timestamp info)
  • 'tstamp_cols': a comma-separated list containing names of columns containing timestamp information (should be of length 2 if datetime_sep = True and length 1 if datetime_sep = False)
  • 'HH_output_header': a zero-indexed integer specifying which row in the user-supplied half-hourly flux output file contains the column headers
  • 'HH_output_skiprow': a list of zero-indexed row numbers containing other text annotations in the user-supplied half-hourly flux output file (i.e., other rows besides the headers preceding the data); set as an empty list ('[]') if no other lines to skip
  • 'NaN_vals': a list of the values used to represent "not a number" values in the user-supplied half-hourly flux output file
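Settings of this kind map naturally onto pandas.read_csv keywords. The sketch below is a hypothetical illustration (the file contents, column names, and values are invented), not the program's actual loading code:

```python
import io
import pandas as pd

# Illustrative values mirroring the funcs.py settings described above
HH_output_header = 0          # zero-indexed row holding the column names
HH_output_skiprow = [1]       # e.g., a units row between header and data
NaN_vals = ["-9999", "NaN"]   # strings representing missing values
datetime_sep = True
tstamp_cols = ["date", "time"]

# A stand-in for the user-supplied half-hourly flux output file
raw = io.StringIO(
    "date,time,FCH4\n"
    "[yyyy-mm-dd],[HH:MM],[umol/m2/s]\n"
    "2019-07-01,00:30,0.05\n"
    "2019-07-01,01:00,-9999\n"
)
df = pd.read_csv(raw, header=HH_output_header,
                 skiprows=HH_output_skiprow, na_values=NaN_vals)

# Build a single timestamp column from one or two timestamp columns
if datetime_sep:
    df["timestamp"] = pd.to_datetime(df[tstamp_cols[0]] + " " + df[tstamp_cols[1]])
else:
    df["timestamp"] = pd.to_datetime(df[tstamp_cols[0]])
```

Note that pandas applies skiprows before locating the header row, so the zero-indexed conventions above line up with read_csv's behavior.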

Column names:

  • 'FCH4_SS': test flag for methane flux stationarity test (H2O and CO2 flux stationarity follow the same naming scheme); 1 to 9 scale
  • 'w_ITC': integral turbulence characteristic test flag for vertical wind speed; 1 to 9 scale
  • 'sigma_x': standard deviation of methane and reference scalars
  • 'MOST_stab': atmospheric stability (Monin-Obukhov stability parameter)
  • 'WD': wind direction; [degrees]
  • 'ust': friction velocity; [m/s]
  • 'fetch_name': variable specifying the upwind distance defining the flux footprint; [m]
  • 'wq_cov': raw covariance between vertical wind speed and water vapor (same for 'wc_cov' except using carbon dioxide); covariances should be calculated using gas concentration units matching those of sigma_x
  • 'LE_name': latent heat flux; [W/m2]
  • 'Fco2_name': net CO2 flux; [umol/m2/s]

If the user has half-hourly air temperature in K and pressure values in Pa, these can be converted to Celsius and kPa, respectively, if desired (modify the values on the line starting with the variable 'air_temp_K' and change the value of 'anc_unit_conv' to True).
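The conversions implied here are a fixed offset and a fixed scaling; the helper names below are illustrative, not the program's own:

```python
def k_to_celsius(t_k):
    """Air temperature: kelvin -> degrees Celsius."""
    return t_k - 273.15

def pa_to_kpa(p_pa):
    """Pressure: Pa -> kPa."""
    return p_pa / 1000.0

# e.g., 300.15 K -> 27.0 degC; 101325 Pa -> 101.325 kPa
t_c = k_to_celsius(300.15)
p_kpa = pa_to_kpa(101325.0)
```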

If the user prefers to manually identify reference scalar thresholds, change the value of 'RefScal_Type' to user-supplied; otherwise, any other value for this variable will result in reference scalar thresholds estimated using a broken-line regression (see scripts/RefScalar_Thresh.R; still in progress).

All output files created by the program are indexed by the beginning of the averaging period.
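When comparing your own half-hourly file against these outputs, an end-of-period index can be shifted to beginning-of-period by subtracting one averaging interval (a sketch; the function name is illustrative):

```python
import pandas as pd

def to_period_beginning(timestamps, tstamp_beg):
    """Reference half-hourly timestamps to the start of each 30 min period."""
    ts = pd.to_datetime(pd.Series(timestamps))
    # If stamps already mark the period beginning, leave them alone
    return ts if tstamp_beg else ts - pd.Timedelta(minutes=30)

# An end-of-period stamp (01:00) becomes a beginning-of-period stamp (00:30)
shifted = to_period_beginning(["2019-07-01 01:00"], tstamp_beg=False)
```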


3. Determining Empirical Parameters

In addition to selecting non-ebullitive periods for fitting the ebullition threshold, the user should determine appropriate values for LB and UB using spectral analysis. When a site-year of data is partitioned for the first time, the program scripts/PostProc_diagnostics.py is run, creating two figures:

  a) a figure with normalized vertical wind speed - methane covariance, methane - water vapor coherence, methane - carbon dioxide coherence, and water vapor - carbon dioxide coherence spectra; the default LB and UB values are plotted as dashed vertical lines on these plots
  b) a figure showing how the inclusion of different wavelet scales in the partitioning changes as a function of LB

When looking at figure a), the region bounded by LB and UB should contain most of the vertical wind speed - methane covariance and should be characterized by strong coherence between scalars. When looking at figure b), an appropriate LB value should generally eliminate the inclusion of wavelet coefficients with time scale greater than 1 or 2 minutes. For reference, in the manuscript accompanying this partitioning program, UB did not need to be changed, but LB was increased from 0.003 to 0.02.
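For orientation, a dimensionless frequency of the usual micrometeorological form n = f (Zm − d) / U can be evaluated as below. This is a sketch only: the displacement height d ≈ 0.67·zc is the common rule of thumb, and it is an assumption that the program's normalization takes exactly this form.

```python
def normalized_frequency(f_hz, zm, zc, wind_speed):
    """Dimensionless frequency n = f * (zm - d) / U, with d ~ 0.67 * zc.

    f_hz: natural frequency (Hz); zm: measurement height (m);
    zc: canopy height (m); wind_speed: mean wind speed (m/s).
    """
    d = 0.67 * zc  # displacement height from canopy height (rule of thumb)
    return f_hz * (zm - d) / wind_speed

# e.g., a ~1 min time scale at zm = 3 m over a 1 m canopy in 2 m/s flow
n = normalized_frequency(1.0 / 60.0, zm=3.0, zc=1.0, wind_speed=2.0)
```

Evaluating n at the time scales of interest (e.g., the 1–2 minute cutoff mentioned above) gives a feel for where LB should sit for your site geometry and wind climate.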

Additionally, the user should examine the 'RefScalarThreshPlot' figure in ref_data/Output/site_yr/run_ID to check if the threshold values selected/estimated are reasonable for the site.


4. Wavelet visualization resources

Included with the partitioning program is a standalone script to visualize the best quality periods from a given partitioning run in the wavelet domain (scripts/wvlt_xplot.py). Each figure contains scatterplots of wavelet coefficients for: 1) methane vs. water vapor, 2) methane vs. carbon dioxide, and 3) carbon dioxide vs. water vapor. Plots 1) and 2) give insight into the nature of the detected ebullition (particularly with respect to frequency), while 3) is a useful screening tool to ensure that the detected ebullition is not due to events in the reference scalar time series. The header area of the figure contains annotations with the values of several ancillary variables at the time of observation (e.g., air temperature, friction velocity, stability, scalar fluxes, etc.). This script should be run after conducting at least one partitioning run. Prior to executing it in the terminal, the user variables at the top of the script should be modified (largely matching their counterparts in sh/part_master.sh). A few extra variable names from your half-hourly observations that are not specified at the top of scripts/funcs.py should be supplied here for the figure annotations (wind speed, sensible heat flux, air temperature in Celsius, and relative humidity (%)).
