The code in this replication package takes as inputs a mixture of publicly available data and commercial data, and outputs the figures, tables, and LaTeX input files used in the paper. All code is written in the R programming language. The replicator can run all of the paper's code by executing the `make` command from the root level of the replication package. The makefile will also generate a PDF file of the paper, using LaTeX and the files generated by the aforementioned R programs. The replicator should expect the code to run for about 12 hours on a modern laptop computer. If the replicator does not have access to the commercial data used in the paper (see below), the `make` command will recognize this and execute the subset of analyses that are possible using only the publicly available data.
This paper uses several publicly accessible data sources, exact copies of which are included in the replication package, and one commercially accessible data source, which is not included. In the details provided below, we describe how we obtained each of these sources, and, where possible, provide internet addresses for current versions of these sources. In many cases, the publicly available data is occasionally updated, and, to our knowledge, the associated data providers do not provide permanent links to previous versions. As a result, the replication package includes copies of the exact versions used in the paper.
- [x] I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.
- [x] I certify that the author(s) of the manuscript have documented permission to redistribute/publish the data contained within this replication package.
- [ ] All data are publicly available.
- [x] Some data cannot be made publicly available.
- [ ] No data can be made publicly available.
The subset of the data used in this paper which can be made publicly available is deposited as a "data replication package" in a Harvard Dataverse repository, accessible at https://doi.org/10.7910/DVN/9WJ3JK. As described below in the "Instructions to Replicators" section, this paper's code replication package will automatically download this data replication package for replicators who do not download it themselves first.
Texas General Land Office (GLO)
The paper makes use of three kinds of GLO data. The first type, "raw" GLO data, is data that we either downloaded in shapefile or tabular form from the GLO website or received via public records request. Our code uses this data in its unmodified form. The second type, "manually entered" GLO data, is data that we created ourselves from PDF documents available on the GLO website, or from non-digitized responses to public records requests. The third type, "modified" GLO data, is data that represents manual corrections we have made to the first two types of data.
Raw data includes:
- Active and Inactive lease shape files, available in their current form from the GLO GIS website. This replication package includes the version of these files that we downloaded in June, 2020, in `raw_data/leases/June2020/`.
- The Lease Land File, which we received from GLO via a public records request. This file associates leases with their underlying parcels, and is included in the replication package as `raw_data/leases/LeaseLandFile.csv`.
- The Mineral Lease Summary File, which we received from GLO via a public records request. This file provides additional non-spatial lease information that is not included in the Active and Inactive lease shape files, and is included in the replication package as `raw_data/leases/tblMineralLeaseSummaryAll_2020.csv`.
- The Mineral Lease Assignment File, which we received from GLO via a public records request. This file provides information on when leases are assigned from one party to another, as well as identifying information about the parties involved. It is included in the replication package as `raw_data/Assignments/glo_assignments.csv`.
- The Royalty Revenue Files, which we received from GLO via a public records request. These files document the product-specific (oil vs. gas) monthly royalty revenues earned by each lease. They are included in the replication package as the set of Excel files matching `raw_data/payments/Royalty_Payments_*.xlsx`.
- The Bonus and Delay Rental Payment File, which we received from GLO via a public records request. This file records bonus payments, delay rental payments, and other miscellaneous non-royalty payments earned by each lease. It is included in the replication package as `raw_data/payments/2051506_PIR-20-0915_Rentals_Cleared_Payments.xlsx`.
Manually entered data includes:
- Auction Bid Notices, which are published by GLO in PDF format on their website. These are public notices documenting which parcels will be available in each auction, including auctions on parcels that ultimately do not transact. The current public version of this file covers auctions conducted since 2005. We additionally did public records searches for earlier auctions, back to the year 2000, on GLO's Land Grant Search. We converted the PDF into digital tabular format using OCR software and some manual checking/editing. The digital tabular version of this is included in the replication package as `raw_data/notices/old_notices.xlsx`, `raw_data/notices/newest_notices.xlsx`, and `intermediate_data/glo_notices_final.csv`.
- Auction Bid Results, which are published by GLO in PDF format on their website. These are public records of which bids were received for each parcel at each auction, including the name of the bidder and the bid amount. The current public version of this file covers auctions conducted since 2005. We additionally did public records searches for earlier auctions, back to the year 2000, on GLO's Land Grant Search. We converted the PDF into digital tabular format using OCR software and some manual checking. The digital tabular version of this is included in the replication package as the set of Excel files matching `raw_data/bids/*.xlsx`.
- RAL Lease Coversheets, which are included as a single page associated with each RAL lease's PDF file stored at the Land Grant Search. For each RAL lease, the coversheet shows the initial negotiated offer as well as GLO's "recommended" revised offer, and any associated comparable leases justifying this recommendation. We had a team of research assistants use the Land Grant Search to find each RAL lease's PDF document and then manually enter the coversheet information into Excel. We have included this information in the replication package as the Excel files `raw_data/coversheets/Final_Term_Sheet.xlsx` and `raw_data/coversheets/Highlighted_Terms.xlsx`.
- Orderly Development Agreement (ODA) Flags, which indicate sets of leases that have joint reporting of royalty payment revenue. Firms that sign an ODA with GLO report all associated royalty revenue on a single lease, so we use ODA information to allocate lease output to leases within an ODA. We found these by manually searching for ODA agreement numbers, starting at 1, and stopping at 6 (as of June, 2020, there were only 6), on the GLO Map Server. We have included a CSV file of the leases associated with each ODA in the replication package as `raw_data/leases/odas.csv`.
- Additional Scraped Parcel Information, which we recorded from the Land Grant Search. Some leases do not appear in the Lease Land File, which relates leases to their underlying parcels, so we used web scraping software to search for each lease on the Land Grant Search and record any available parcel information. We have included an R dataset containing these scraping results in the replication package as `raw_data/leases/lease_parcel_scraping/scraped_parcels.Rda`.
- Lease Addenda Information, which is included as several pages at the end of each RAL lease's PDF file stored at the Land Grant Search. For each RAL lease, the addenda information shows how the lessee and lessor have agreed to changes from the standard lease document. We had a team of research assistants use the Land Grant Search to find each RAL lease's PDF document and then manually enter and standardize the addenda information into a spreadsheet. We have included a CSV file of this data in the replication package as `raw_data/addenda.csv`.
- Manually Improved Firm Names, which we created from the population of lessee, assignee, and bidder names. We manually reviewed these names, did internet research to verify that similar sounding names corresponded to the same entity, and created a single identifier for sets of names representing the same firm. We have included an Excel file of this data in the replication package as `intermediate_data/manually_improved_names.xlsx`.
Modified data includes:
- A variety of manual edits to lease contract information that we made after reviewing lease PDFs, including edits to bonus payments, royalty rates, undivided interest status, effective dates, and expiration dates. We have included this information in tabular form in the replication package as:
  - `intermediate_data/leases/missing_bonus.csv`
  - `intermediate_data/leases/missing_bonus2.csv`
  - `intermediate_data/leases/missing_bonus3.xlsx`
  - `intermediate_data/leases/missing_bonus4.xlsx`
  - `intermediate_data/leases/lease_check.csv`
  - `intermediate_data/leases/missing_undivided.csv`
  - `intermediate_data/leases/royalty_fixes.xlsx`
  - `intermediate_data/leases/effective_date_fixes.xlsx`
  - `intermediate_data/leases/term_fixes.xlsx`
- A variety of manual edits to lease assignment information that we made after reviewing the lease assignment data and lease documents available at the Land Grant Search. We have included this information in tabular form in the replication package as `intermediate_data/assignments/partial_assignments.csv`, `intermediate_data/assignments/manual_pass.xlsx`, and `intermediate_data/assignments/glo_assignments_fix.csv`.
US Energy Information Administration (EIA)
We use EIA's oil and gas price information. In particular, we use their monthly West Texas Intermediate Crude Oil spot price series and their monthly Henry Hub spot price series, both available from the EIA website. We have included the versions of these files that we downloaded in September, 2019, in the replication package, as `raw_data/prices/RWTCm.xls` and `raw_data/prices/RNGWHHDm.xls`.
We also use EIA's shale play shapefiles. We use EIA's shale play boundaries, and the replication package includes versions of these files that we downloaded in May, 2019, as the set of files matching `shape_files/TightOil_ShaleGas_IndividualPlays_Lower48_EIA/*Boundary*`. Finally, we use EIA's shale play thickness information for the Permian Shale and the Eagle Ford Shale. We have included the versions of these files that we downloaded in May, 2019, in the replication package as the set of files matching `shape_files/TightOil_ShaleGas_IndividualPlays_Lower48_EIA/*Isopach*`.
Multi-Resolution Land Characteristics Consortium (MRLC)
We downloaded Land Cover data from the MRLC in November, 2017. Our analyses use the 2006 epoch of the National Land Cover Database (NLCD). This raster data covers the entire continental US, and, as such, is contained in an extremely large file that can be hard to work with on a laptop computer. To aid replicators, we have included a version that we clipped to the state of Texas (using a separate desktop computer running ArcGIS) in our replication package, as the files in the directory `shape_files/Land_Cover/landcover`.
Texas Department of Transportation (TXDOT)
We downloaded the Texas public road network (highways, county roads, city streets, toll roads, and local streets) in shape file format from the Texas Department of Transportation. The download link that we used is no longer active, but a current version of this data is available from TxDOT. We have included the version of this file that we downloaded in August, 2017, in the replication package as the set of files in the folder `shape_files/txdot-roads_tx/`.
US Geological Survey (USGS)
We downloaded shape files for rivers, streams, and water bodies from the US Geological Survey National Hydrography Dataset (NHD). Though it is possible to download the current version of the NHD from USGS, the version we downloaded in September, 2018, is no longer available, so we include it as part of the replication package as the set of files in the folder `shape_files/usgs-rivers_tx/`.
US Census (Census)
We downloaded a shapefile describing the boundaries of all US counties from the US Census. We use it to identify counties for leases and parcels, and to assemble a shape for the State of Texas. The download link that we used is no longer active, but a current version of this data is available from the Census. We have included the version of this file that we downloaded in January, 2017, in the replication package as the set of files in the folder `shape_files/us_county/`.
The commercially accessible data in this paper is the Texas Permanent School Fund Land Grid owned by P2 Energy Solutions. This data represents the location, shape, and land type of all original PSF parcels, in shape file format. We acquired an academic license to use this data in February, 2018, after approximately six months of negotiations, and are not able to redistribute it. Interested researchers can also access this data by contacting P2 Energy Solutions and negotiating a similar academic license. We would be happy to assist with any reasonable replication attempts for two years following publication.
- `make`, which is installed by default on UNIX-like systems (macOS, Linux, etc.). Windows users can install `make` from a variety of sources. We recommend installing Chocolatey first, and then using it to install `make` with the terminal command `choco install make`.
- A LaTeX distribution.
- R 4.2.0, with the following packages and their versions:
  - `boot` (1.3-28)
  - `broom` (1.0.2)
  - `exactextractr` (0.8.1)
  - `fixest` (0.11.0)
  - `Formula` (1.2-4)
  - `furrr` (0.3.1)
  - `fuzzyjoin` (0.1.6)
  - `grf` (2.2.1)
  - `grid` (4.2.2)
  - `gstat` (2.1-0)
  - `kableExtra` (1.3.4)
  - `knitr` (1.41)
  - `lmtest` (0.9-40)
  - `lubridate` (1.9.0)
  - `lwgeom` (0.2-10)
  - `raster` (3.6-13)
  - `readxl` (1.4.1)
  - `rgdal` (1.6-3)
  - `rgeos` (0.6-1)
  - `RISCA` (1.0.3)
  - `sandwich` (3.0-2)
  - `sf` (1.0-9)
  - `tidyverse` (1.3.2)
Note: the program `code/package_installation.R`, which is executed as part of the makefile process, checks for the presence of these packages and installs the newest version of any that are not currently available.
Approximate time needed to reproduce the analyses on a 2018 vintage laptop computer:
- [ ] <10 minutes
- [ ] 10-60 minutes
- [ ] 1-8 hours
- [x] 8-24 hours
- [ ] 1-3 days
- [ ] 3-14 days
- [ ] > 14 days
- [ ] Not feasible to run on a desktop machine, as described below.
The code was last run on a 4-core Intel-based laptop, with 16 gigabytes of RAM, running MacOS version 11.6.7.
- Programs in `code/Data_Cleaning` ingest and clean all of the raw data described above, saving their output in the data directories `generated_data` and `generated_shape_files`.
- Programs in `code/Analysis` process data from the `generated_data` folder and generate tables in the `output/tables` folder, figures in the `output/figures` folder, and LaTeX fragments in the `output/estimates` folder.
- Programs in `code/functions` define commonly used functions in the analysis and data cleaning programs.
- The program `code/paths.R` defines relative paths used by data cleaning and analysis programs. It depends on the presence of a `data.txt` file in the root level of the replication archive (for details, see below in the "Instructions to Replicators" section).
- The program `code/texas_constants.R` defines fixed values of various parameters used in data cleaning and analysis programs.
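Before starting a long run, it can help to confirm that an unpacked copy of the code package contains the folders and programs listed above. The following is a minimal sketch, not part of the replication package: the function name `check_layout` is hypothetical, and it checks only the `code/` paths named in this section.

```shell
# Sketch: report which of the code paths named in this README are missing.
# check_layout is a hypothetical helper, not shipped with the package.
check_layout() {
  root="${1:-.}"   # root of the unpacked code package (defaults to current dir)
  missing=0
  for p in code/Data_Cleaning code/Analysis code/functions \
           code/paths.R code/texas_constants.R; do
    if [ ! -e "$root/$p" ]; then
      echo "missing: $p"
      missing=1
    fi
  done
  return "$missing"   # 0 if everything was found, 1 otherwise
}
```

For example, `check_layout /Users/tcovert/texas_code` prints one line per missing path and returns a non-zero status if any are absent.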
This guide assumes you have already downloaded the code replication package and have expanded that archive into a known location on your machine, e.g., `/Users/tcovert/texas_code` or `C:\rsweeney\texas_code`. There is a separate data replication package, available as a Harvard Dataverse repository, at https://doi.org/10.7910/DVN/9WJ3JK. If you have not already downloaded it, the code replication process can download it for you (see step 8 below). However, if you wish to download it separately, make note of where the data replication package is saved, and update `data.txt` accordingly (see step 2 below).
1. Create a separate folder on your computer for the data replication package, e.g., `/Users/tcovert/texas_data` or `C:\rsweeney\texas_data`.
2. Save a `data.txt` file in the root level of your copy of the paper's code folder. It should contain the full path to where you want the data replication package to be saved, such as `/Users/tcovert/texas_data` or `C:\rsweeney\texas_data`, or to where it is already saved if you downloaded it separately.
3. If you haven't already installed R, install it.
4. If you haven't already installed LaTeX, install it.
5. If `make` isn't already installed, install it (see above for instructions to Windows users).
6. Navigate your terminal to the root level of the code repository.
7. Install the relevant R packages by typing `make install`. This step is required only if you do not already have all of the relevant packages installed; it will not overwrite the versions of any required packages that are already on your computer.
8. If you have not yet downloaded the data repository, type `make getdata`. This will download the data replication package and save it to the folder you have defined in `data.txt`. If you have already downloaded the data replication package (and saved it in the location specified in `data.txt`) you can skip this step.
9. To run the code replication and build a fresh PDF of the paper, type `make`.
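On a UNIX-like system, the folder, `data.txt`, and `make` steps above can be sketched as a short shell script. This is an illustration under stated assumptions rather than a script shipped with the package: `CODE_DIR` and `DATA_DIR` are placeholder locations, the function names are hypothetical, and the sketch assumes `data.txt` should hold the path alone on a single line.

```shell
# Sketch only: CODE_DIR and DATA_DIR are placeholder locations; use your own.
CODE_DIR="${CODE_DIR:-$HOME/texas_code}"   # where the code package was unpacked
DATA_DIR="${DATA_DIR:-$HOME/texas_data}"   # where the data package should live

# Create the data folder and record its full path in data.txt
# at the root of the code folder (assumed format: one line, path only).
write_data_txt() {
  mkdir -p "$DATA_DIR" || return 1
  printf '%s\n' "$DATA_DIR" > "$CODE_DIR/data.txt"
}

# Run the make targets named in this README from the code root.
run_replication() {
  (
    cd "$CODE_DIR" || exit 1
    make install &&   # install any missing R packages
    make getdata &&   # download the data package (skip if already downloaded)
    make              # run the analyses and build the PDF
  )
}
```

Calling `write_data_txt` and then `run_replication` performs the whole sequence; replicators who downloaded the data package separately can drop the `make getdata` line.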
If the replicator does not have access to the commercially available data, `make` will execute the subset of analyses that are possible using only publicly available data.
Note: many of the programs make use of computationally intensive Double/Debiased Machine Learning (DML) estimation techniques from Chernozhukov et al. (2018), which account for the vast majority of the computational time reported above. Replicators who are willing to sacrifice some accuracy in order to obtain faster results should change the value of the variable `dml_n` in `code/texas_constants.R` to an odd number smaller than its default value, which is 101. This variable refers to the number of cross-fitting steps that the DML estimators average over, so setting it to something smaller (e.g., 11) would reduce the time spent in DML computation by roughly a factor of 9 (101/11 ≈ 9.2), assuming that time scales linearly with `dml_n`.
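To choose a smaller `dml_n`, a rough speedup estimate can be computed in advance. The sketch below is not part of the replication package: the function name `dml_speedup` is hypothetical, and the estimate assumes that DML time scales linearly with the number of cross-fitting steps, which is an approximation.

```shell
# Sketch: validate a candidate dml_n and estimate the DML speedup,
# assuming DML time is proportional to the number of cross-fitting steps.
DEFAULT_DML_N=101   # default value stated in this README

dml_speedup() {
  n="$1"
  # The README asks for an odd number smaller than the default.
  if [ "$n" -le 0 ] || [ "$n" -ge "$DEFAULT_DML_N" ] || [ $((n % 2)) -eq 0 ]; then
    echo "dml_n must be a positive odd number smaller than $DEFAULT_DML_N" >&2
    return 1
  fi
  # Ratio of default to candidate steps, rounded to one decimal place.
  LC_ALL=C awk -v d="$DEFAULT_DML_N" -v n="$n" 'BEGIN { printf "%.1f\n", d / n }'
}

dml_speedup 11   # prints 9.2
```

The estimate covers only the DML portion of the run; data cleaning and the other analyses take the same time regardless of `dml_n`.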
The provided code reproduces:
- [x] All numbers provided in text in the paper
- [x] All tables and figures in the paper
- [ ] Selected tables and figures in the paper, as explained and justified below.
Figure/Table # | Program | Line Number | Output file | Note |
---|---|---|---|---|
Figure 1 | code/Analysis/lease_stats.R | 82 | output/figures/cohorts.png | |
Table 1 | code/Analysis/lease_stats.R | 223 | output/tables/summary_stats_by_type.tex | |
Table 2 | code/Analysis/parcel_stats.R | 169 | output/tables/summary_stats_parcel.tex | Requires commercial data |
Figure 2 | writeups/cs_texas.tex | 241 | none | |
Table 3 | code/Analysis/parcel_stats.R | 50 | output/tables/parcel_balance.tex | Requires commercial data |
Figure 3 | code/Analysis/leases_maps.R | 169 | output/figures/sample_glo_leases.png | |
Table 4 | code/Analysis/regressions_lease_contracts.R | 243 | output/tables/logbonus_regressions.tex | |
Table 5 | code/Analysis/regressions_lease_contracts.R | 244 | output/tables/royalty_term_regressions.tex | |
Table 6, panel a | code/Analysis/regressions_outputs.R | 295 | output/tables/stacked_output_levels.tex | |
Table 6, panel b | code/Analysis/regressions_outputs.R | 296 | output/tables/stacked_output_poisson.tex | |
Figure 4 | code/Analysis/parcel_monthplots.R | 97 | output/figures/active_plot.png | Requires commercial data |
Table 7 | code/Analysis/regressions_parcels.R | 181 | output/tables/parcel_regressions.tex | Requires commercial data |
Table 8 | code/Analysis/allocative_diffs.R | 110 | output/tables/allocative.tex | |
Table 9 | code/Analysis/regressions_firms.R | 61 | output/tables/firms_regressions.tex | |
Table 10 | code/Analysis/auction_analysis.R | 198 | output/tables/TopPairAuctionShares.tex | |
Table 11 | code/Analysis/auction_analysis.R | 775 | output/tables/auction_number_bids.tex | |
Table 12 | code/Analysis/auction_analysis.R | 362 | output/tables/auction_bonus_regressions.tex |
The Online Appendix contains additional tables and figures which map to code in this replication package as follows:
Figure/Table # | Program | Line Number | Output File | Note |
---|---|---|---|---|
Figure A.1 | code/Analysis/leases_maps.R | 91 | output/figures/glo_leases_in_texas.png | |
Figure A.2 | code/Analysis/parcel_hazard_analysis.R | 395 | output/figures/ipwkm10.png | Requires commercial data |
Figure A.3 | code/Analysis/parcel_hazard_analysis.R | 429 | output/figures/ipwkm20.png | Requires commercial data |
Table A.1 | code/Analysis/regressions_lease_contracts.R | 240 | output/tables/bonus_regressions.tex | |
Table A.2 | code/Analysis/regressions_extracontrols.R | 186 | output/tables/lease_regressions_extra_bonus.tex | |
Table A.3 | code/Analysis/regressions_extracontrols.R | 183 | output/tables/lease_regressions_extra_output.tex | |
Table A.4 | code/Analysis/regressions_drilled.R | 131 | output/tables/drilled_regressions.tex | |
Table A.5 | code/Analysis/regressions_drilled.R | 132 | output/tables/logdboe_drilled_regressions.tex | |
Table A.6 | code/Analysis/parcel_hazard_analysis.R | 146 | output/tables/spell_stats.tex | Requires commercial data |
Table A.7 | code/Analysis/parcel_hazard_analysis.R | 360 | output/tables/logrank_stats.tex | Requires commercial data |
Table A.8 | code/Analysis/regressions_lessors.R | 106 | output/tables/logbonus_regressions_lessor_heterogeneity.tex | |
Table A.9 | code/Analysis/size_het.R | 86 | output/tables/logbonus_size_heterogeneity.tex | |
Table A.10 | code/Analysis/regressions_parcels.R | 314 | output/tables/lease_parcel_comparisons_linear.tex | Requires commercial data |
Table A.11 | code/Analysis/regressions_parcels.R | 317 | output/tables/lease_parcel_comparisons_poisson.tex | Requires commercial data |
Table A.12 | code/Analysis/leases_stats.R | 340 | output/tables/summary_data_construction.tex | |
Table A.13 | code/Analysis/auction_appendix.tex | 965 | output/tables/auction_appendix.tex | |
Table A.14 | code/Analysis/regressions_bonus_raladdenda.R | 181 | output/tables/stacked_regressions_addenda.tex | |
Covert, Thomas; Sweeney, Richard, 2022, “Replication Data for: ‘Relinquishing Riches: Auctions vs Informal Negotiations in Texas Oil and Gas Leasing’,” https://doi.org/10.7910/DVN/9WJ3JK, Harvard Dataverse, V1.
Texas General Land Office, “Past Bid Sale Results,” April 2017.
Texas General Land Office, “Active Oil & Gas Leases,” June 2020.
Texas General Land Office, “Inactive Oil & Gas Leases,” June 2020.
Texas General Land Office, “Auction Bid Notices,” June 2020.
U.S. Energy Information Administration, “Eagle Ford play boundaries, structure and isopachs,” May 2019.
U.S. Energy Information Administration, “Henry Hub Natural Gas Spot Price,” September 2019.
U.S. Energy Information Administration, “Low permeability oil and gas play boundaries in Lower 48 States,” May 2019.
U.S. Energy Information Administration, “Permian Basin: Wolfcamp formation elevation and isopachs,” May 2019.
U.S. Energy Information Administration, “West Texas Intermediate Crude Oil Spot Price,” September 2019.
Multi-Resolution Land Characteristics Consortium, “National Land Cover Database,” November 2017.
Texas Department of Transportation, “TxDOT Roadways,” August 2017.
U.S. Geological Survey, “National Hydrography Dataset,” September 2018.
U.S. Census, “TIGER/Line Shapefiles,” January 2017.
P2 Energy Solutions, “Texas Permanent School Fund Land Grid,” February 2018.
V Chernozhukov, D Chetverikov, M Demirer, E Duflo, C Hansen, W Newey and J Robins, “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal, February 2018.