Hard to Measure Well: Can Feasible Policies Reduce Methane Emissions?
Authors: Karl Dunkle Werner and Wenfeng Qiu
Steps to replicate (TL;DR)
- Read this README
- Make sure you have an appropriate OS (Linux or WSL2) and the necessary computing resources (see below)
- Unzip the replication files.
- If the data is saved somewhere outside the project folder, mount a copy inside the project folder. (Useful for development only)
- Install Conda and Snakemake (see below)
- Run Snakemake
- Check results
Putting it all together:
```sh
# 3. Unzip
mkdir methane_replication # or whatever you want
cd methane_replication
unzip path/to/replication_public.zip -d .
unzip path/to/replication_drillinginfo.zip -d .

# 4. OPTIONAL
# If the data is saved somewhere outside the project folder, mount
# a copy inside the project folder.
# This is only necessary if the data are stored somewhere *outside*
# the project folder. You may need to change these paths to fit
# your situation
data_drive="$HOME/Dropbox/data/methane_abatement"
scratch_drive="$HOME/scratch/methane_abatement"
project_dir="$(pwd)"
mkdir -p "$scratch_drive" "$project_dir/data" "$project_dir/scratch"
sudo mount --bind "$data_drive" "$project_dir/data"
sudo mount --bind "$scratch_drive" "$project_dir/scratch"

# 6. Install Conda and Snakemake
# If conda is not already installed, follow instructions here:
# https://docs.conda.io/en/latest/miniconda.html
conda env create --name snakemake --file code/envs/install_snakemake.yml
conda activate snakemake
snakemake --version
singularity --version # Should show versions, not an error

# 7. Run Snakemake to create all outputs
# (this takes about a day with 4 CPU)
/usr/bin/time -v snakemake
# snakemake --dry-run to see what will be run

# 8. Check results (optional and slow)
# Check everything into git, rerun snakemake, and verify results are the same.
git init
git add .
git commit -m "Replication run 1"
snakemake --delete-all-output
rm -r scratch/*
rm -r .snakemake/conda
snakemake --use-conda --use-singularity --singularity-args='--cleanenv'
# Results should be binary-identical if everything worked correctly
# (except software_cites_r.bib, which has some manual edits)
git diff
```
This code uses Singularity. You don't have to install it yourself, but you do have to be on an operating system where it can be installed. Good options are any recent version of Linux or Windows WSL2 (but not WSL1).
On macOS, or on Windows outside WSL2, things are more difficult. One approach is to install Vagrant, use Vagrant to create a virtual machine, and run everything inside that virtual machine. Good luck.
- To get started, first install Conda (mini or full-sized).
- Then use Conda to install Snakemake and Singularity from the file `install_snakemake.yml` (in the replication zipfile). In a terminal:

```sh
conda env create --name snakemake --file code/envs/install_snakemake.yml
conda activate snakemake
```

Run all other commands in that activated environment. If you close the terminal window, you need to re-run `conda activate snakemake` before running the rest of the commands. These downloads can be large.
What does Snakemake do?
Snakemake uses rules to generate outputs and manages the code environment to make it all work.
In particular, we're following a pattern Snakemake calls an "Ad-hoc combination of Conda package management with containers."
Snakemake uses Singularity (an alternative to Docker) to run code in a virtual environment, and uses conda to install packages. All of this is handled transparently as the rules are run.
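For readers unfamiliar with this pattern: each rule can declare both a Conda environment file and a container image, and Snakemake builds the Conda environment inside the container when both `--use-conda` and `--use-singularity` are set. A minimal sketch of such a rule (the rule name, file paths, image, and memory value below are hypothetical, not taken from this project's Snakefile):

```
rule example_analysis:
    input: "data/wells.csv"
    output: "scratch/estimates.rds"
    conda: "envs/r_env.yml"          # packages installed by Conda
    container: "docker://condaforge/mambaforge:4.10"  # hypothetical image
    resources: mem_mb=10000          # memory requirements are declared per rule
    script: "analyze.R"
```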
It can be useful to run `snakemake --dry-run` to see the planned jobs.
Snakemake keeps track of what needs to run and what doesn't. If something goes wrong midway through, Snakemake will see that some outputs are up-to-date and others aren't, and will only re-run the ones that need it.
The Snakefile is set up to retry failing jobs once, to avoid issues where temporary issues cause the build to fail (e.g. "Error creating thread: Resource temporarily unavailable").
If you would rather not restart failed jobs, remove the line `workflow.restart_times = 1` from the Snakefile. Note that Snakemake will still stop after a job fails twice (it will not run other jobs).
Files and data
We need to make sure the code can access the right files. There are two ways to do this: the straightforward way and the way Karl does it.
Recommended file access
Straightforward approach: Unzip the replication files, either interactively or with the commands below.
```sh
mkdir methane_replication # or whatever you want
cd methane_replication
unzip path/to/replication_public.zip -d .
unzip path/to/replication_drillinginfo.zip -d .
```
Alternative file access
Less straightforward, arguably better for development
- Store the `data` and `scratch` folders somewhere else.
- Create your own bind mounts to point to the `data` and `scratch` folders. (See the example `mount --bind` commands in the TL;DR above.)
For people familiar with Singularity: `$SINGULARITY_BIND` doesn't work here, because it's not used until the Singularity container is running, so Snakemake thinks files are missing.
For people familiar with symlinks: using symlinks (in place of bind mounts) does not work here, because Singularity will not follow them.
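To make the difference concrete, a small sketch (the `/tmp` paths are hypothetical): a symlink is a pointer that Singularity refuses to follow, while a bind mount presents the target as an ordinary directory.

```sh
# Hypothetical paths for illustration only.
mkdir -p /tmp/demo_data
ln -sf /tmp/demo_data /tmp/demo_link

# A symlink is detectable as a link -- Singularity will not follow it:
[ -L /tmp/demo_link ] && echo "demo_link is a symlink"

# A bind mount (needs root) would instead appear as a plain directory:
# sudo mount --bind /tmp/demo_data /tmp/demo_mount
# [ -d /tmp/demo_mount ] && [ ! -L /tmp/demo_mount ]
```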
All files in `scratch/` are auto-generated and can safely be deleted. Files in `data/` should not be deleted. Some files in `graphics/` are auto-generated, but the ones that are in the replication zipfile are not. Files in `scratch/` are ignored by git.
- The PDF outputs are built with Latexmk and LuaLaTeX.
- For size reasons, LuaLaTeX is not included in the set of software managed by conda. The `paper` job, which runs `latexmk`, might fail if those programs are not installed on your computer. All the outputs up to that point will be present.
- The tex files use some fonts that are widely distributed, but may not be installed by default.
- Note that the code depends on `moodymudskipper/safejoin`, which is a different package than others with the same name; `moodymudskipper/safejoin` will be renamed.
- In case the original author deletes the repository, a copy is here.
Computing resources for a full run
In addition to the programs above, parts of this program require significant amounts of memory and disk space. Most parts also benefit from having multiple processors available. (The slow parts parallelize well, so speed should increase almost linearly with processor count.)
The tasks that require significant memory are noted in the Snakemake file (see the memory resources declared for each rule).
The highest requirement for any task is 10 GB, though most are far lower.
(These could be overstatements; we haven't tried that hard to find the minimum memory requirements for each operation.)
The programs also use about 80 GB of storage in `scratch/`, in addition to the ~10 GB of input data and ~8 GB of output data.
Running the whole thing takes about 23 hours on a computer with 4 CPUs. Measured with `/usr/bin/time -v`, it uses 22:45 of wall time and 81:45 of user time.
Maximum resident set size is (allegedly) 3.62 GiB (this seems low).
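If you want to pull those statistics out of your own run, one approach is to save the `/usr/bin/time -v` report and grep it. The sketch below uses a saved sample report (with illustrative numbers matching the figures above) so the commands are self-contained; in a real run you would capture the report with `/usr/bin/time -v snakemake 2> time.log`.

```sh
# Sample report standing in for real `/usr/bin/time -v` output:
cat > /tmp/time.log <<'EOF'
	Elapsed (wall clock) time (h:mm:ss or m:ss): 22:45:00
	Maximum resident set size (kbytes): 3796000
EOF

# Pull out the wall time and peak memory lines:
grep -E "Elapsed \(wall clock\)|Maximum resident" /tmp/time.log
```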
Data sources and availability
The data in this study come from a variety of sources, with the sources in bold providing the central contribution.
- Scientific studies (except as noted, all from the published papers and supplementary information)
- Alvarez et al. (2018)
- Duren et al. (2019)
- Frankenberg et al. (2016)
- Includes data received by email from the authors
- Lyon et al. (2016)
- Omara et al. (2018)
- Zavala-Araiza et al. (2018)
- US Agencies
- St. Louis Federal Reserve
- Data providers
- SNL: prices at trading hubs
- Enverus (formerly Drillinginfo): Well production and characteristics
All datasets are included in the replication_public.zip file, except the Enverus data. I believe my Enverus data subset can be shared with people who have access to the Enverus entity production headers and entity production monthly datasets.
These notes are modestly outdated, and aren't useful for replication.
Other installation instructions
- Download and extract CmdStan
- Add these lines to a file named `local` in the CmdStan `make/` folder (create `local` if it doesn't already exist):

```
O_STANC=3
STANCFLAGS+= --O --warn-pedantic
STAN_THREADS=true
STAN_CPP_OPTIMS=true
STANC3_VERSION=2.27.0 # change this version to match the downloaded cmdstan version
```
- Set the user environment variable `CMDSTAN` to the CmdStan folder (e.g. the directory you extracted in the first step)
- Windows only: run `mingw32-make install-tbb` (even if …)
- Follow prompts, including adding the TBB dir to your path (Windows only)
- Test the install by building the example model: `make examples/bernoulli/bernoulli` (see install instructions)
- Installation tries to download `stanc` (because compilation is a hassle), but sometimes I've had to download it manually from https://github.com/stan-dev/stanc3/releases
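A hedged sketch of that manual download (the release tag and the asset name `linux-stanc` are assumptions; check the releases page for the asset matching your CmdStan version and OS):

```sh
# Version should match the STANC3_VERSION set in the `local` file;
# asset names (e.g. linux-stanc) are assumptions -- verify them at
# https://github.com/stan-dev/stanc3/releases
STANC3_VERSION="v2.27.0"
url="https://github.com/stan-dev/stanc3/releases/download/${STANC3_VERSION}/linux-stanc"
echo "$url"

# Then place the binary where CmdStan expects it and make it executable:
# curl -L -o bin/stanc "$url" && chmod +x bin/stanc
```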
- After installing conda, use `code/snakemake_environment_windows.yml` to create the `snakemake` environment (this will error if you already have an environment named `snakemake`):

```sh
conda env create -f code/snakemake_environment_windows.yml
```
- Install cyipopt manually:
  - Download and extract ipopt
  - Download and extract cyipopt
  - Copy the ipopt folders into cyipopt
  - Run `python ../cyipopt/setup.py install` (the relative path assumes you saved the cyipopt directory next to your working directory)
Running on Windows
- Activate the newly created `snakemake` environment, and do not set `--use-conda` when running Snakemake.
- There will be some warnings.
Connect to Overleaf by Git. See details here.
```sh
git remote add overleaf https://git.overleaf.com/5d6e86df6820580001f6bdfa
git checkout master
git pull overleaf master --allow-unrelated-histories
```