Tools for Quantitative Archaeology – in R

2020-11-06

Tools for Quantitative Archaeology (TFQA) is a collection of DOS programs developed by Keith Kintigh to perform statistical analyses used in archaeology. TFQA includes 50 programs and so gives a good representation of the range of analyses used by archaeologists. The purpose of this document is to track which of these analyses are currently available in R packages. We hope it will be useful both in porting TFQA-based analyses to R and in highlighting which methods are not yet implemented in R packages.

The table below presents a list of TFQA programs and their equivalent functions in R. By “equivalent”, we mean R functions that provide substantially the same functionality as the original TFQA program with a similar high-level user interface; we can assume that all of the analyses listed can be performed in R if the user is prepared to reimplement them themselves. The list of R equivalents is also not intended to be exhaustive. The packages/functions listed are a subjective assessment of the “best” (most complete/widely used/actively maintained) way to get the same results as the TFQA program using R. See open-archeo.info for a general list of archaeology-related R packages.
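As a minimal illustration of what "equivalent" means here, the FISHER program corresponds to a single call in base R. The counts below are hypothetical and invented purely for the example; only `stats::fisher.test()` comes from the table.

```r
# Hypothetical 2x2 table (made-up counts): presence/absence of a ware in two areas
sherds <- matrix(
  c(3, 1,
    1, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(area = c("north", "south"), ware = c("present", "absent"))
)

# Fisher's exact test of independence (two-sided by default)
result <- stats::fisher.test(sherds)
result$p.value
```

The returned object also carries the odds ratio estimate and its confidence interval (`result$estimate`, `result$conf.int`) for 2x2 tables.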

TFQA is still available and actively maintained. Some of the more recently added programs are also available as free and open source software (see https://github.com/kintigh). Matt Peeples has also ported many to R scripts. This list should therefore be seen as a list of alternatives to TFQA (for those who prefer R, open source software, and/or cannot run DOS programs), rather than superseding it.

This is a work-in-progress. Please feel free to contribute by editing the table and submitting a pull request, or opening an issue with suggestions.

R equivalents of TFQA programs

Generated from tfqar.csv.

TFQA Program	Description	Available in R?	R package(s)	R function(s)	Notes
Spatial Analysis
CONTIG	Monte Carlo evaluation of the statistical significance of the observed degree of contiguity of grid units assigned to the same cluster.
FISHER	Calculates Fisher’s Exact test	Yes	stats	stats::fisher.test()
GRID	Aggregates point-provenience data into counts by type for each grid unit.	Yes	sf	sf::st_join()	Tutorial
HOA	Computes Hodder and Okell’s A and dispersion ratios	Yes	GmAMisc	GmAMisc::Aindex()
KMEANS	Performs k-means cluster analysis with extensive output designed to facilitate interpretation. The program can be used to cluster analyze any data set, but has special features developed for use in archaeological spatial analysis. In particular, Kintigh and Ammerman’s (1982) k-means pure locational clustering method can be performed. The program also executes the clustering for Whallon’s (1984) unconstrained clustering method on data smoothed using the GRID or LDEN programs.				Unpackaged script: mpeeples2008/Kmeans
KMPLT	Plots the SSE and (2 dimensional) cluster configuration results of KMEANS on screen and creates hard-copy publishable quality plots				Unpackaged script: mpeeples2008/Kmeans
KOETJE	Performs the Monte Carlo analysis of homogeneity of cluster configurations as suggested by Koetje (1987).
LDEN	Performs Johnson’s (1984) Local Density Analysis on point-provenienced or grid data. The program also outputs counts or percentages of points of different types that occur within a circular neighborhood around each data point.
LDPLT	Plots selected local density coefficients computed by LDEN against radius, so behavior of coefficients for different pairs of classes can be easily observed over a range of radii
NEIG	An efficient, general-purpose nearest-neighbor (Whallon 1984) and gravity model program useful for intrasite spatial analysis or regional analysis. It allows categorization of items by class (e.g. site type or tool type) and permits the calculation of within or between class neighbors.
RANDPT	Generates random sets of coordinates, including for clumped distributions with different parameters. Also random walks any number of points in an existing distribution with arbitrary number of steps and step length.	Partially	spatstat	spatstat::rpoint() spatstat::runifpoint() spatstat::rpoispp()	Not sure about the “random walk” part.
Diversity
BOONE	Calculates, for a set of proveniences with counts by artifact class, Boone’s (1987) assemblage heterogeneity measure and related values.
DIVERS	Calculates richness and evenness (H/Hmax) dimensions of diversity for a given data set and uses Monte-Carlo methods to derive expected diversity for a model distribution over a range of sample sizes (Kintigh 1984, 1989).
DIVMEAS	Calculates several diversity measures including Richness, Simpson’s, Shannon’s, Brillouin’s, and the Renyi and Delta families of generalized diversity measures for any given distribution of counts.	Yes	tabula, vegan	tabula::index_richness() tabula::index_heterogeneity() vegan::renyi()	tabula is not currently available on CRAN
DIVPLT	Plots the results of DIVERS on screen and creates publishable quality plots
EVALC	Performs a Monte Carlo evaluation of the significance of an observed value of Simpson’s C measure of diversity relative to a given assumption about the population.
RAREFY	Performs rarefaction analysis for sets of sample counts in a CSV file as described by Baxter (2001). Provides expected richness, standard deviation of the expected, Z score, and probability for each larger sample to every smaller sample size. Also outputs expected richness for each sample up to its sample size for graphing.
Distance
BAYES	This program implements Bayesian methods for proportions as described by Iversen (1984). Intervals are calculated and graphed for Bayesian estimates of proportions based on both flat and informative priors.
BINOMIAL	Computes binomial probabilities and population proportion intervals for a sample.
BRSAMPLE	Provides a Monte Carlo estimate of the sampling error of differences of the Brainerd Robinson coefficient calculated between a sample and a known population or between two samples drawn from the same population
CLCA	Performs a Complete Linkage Cluster Analysis on up to 180 cases. It takes as input an upper triangular distance matrix, as is created by the DIST program. As output, it lists the sequence of item/cluster joins and fusion values but does not create a dendrogram.
DIST	Computes a triangular matrix of distance or similarity measures: Euclidean Distance, Pearson’s r, Brainerd-Robinson Coefficient, Jaccard’s Coefficient, Simple Matching Coefficient, and Gower Coefficient.	Partially	vegan	vegan::vegdist()	vegan implements Euclidean, Jaccard, and Gower distances.
FORD	Plots a publishable quality battleship curve (Ford) diagram	Yes	tabula	tabula::plot_ford()	tabula is not currently available on CRAN
POISSON	Computes Poisson and negative binomial probabilities, given expected counts.
resampleBRED	Provides Monte Carlo estimates of the sampling error of differences of the Brainerd-Robinson and Euclidean Distance coefficients calculated between a sample and a known population or between two samples drawn from the same population, as described and applied in Deboer et al. (1996).
TWOWAY	Provides tests of independence and measures of association and prints tables that have been standardized with a number of techniques. Standard Chi² and G tests of independence are provided. Using Monte Carlo methods, Chi² and G tests can be performed on tables with very small expected counts. A Chi² goodness of fit test (with externally determined expected values) can also be calculated. Measures of association include Yule’s Q, Phi, Cramer’s V and proportional reduction of error measures Tau and Lambda. Table standardization methods include median polish (Lewis 1986) and Mosteller (multiplicative) standardization as well as Haberman’s z-score standardization for independent variables used by Grayson (1984) and Allison’s binomial probability-based z-score standardization. It will also print row, column, and cell percents, Chi² cell contributions, and Chi² expected values.
Dating and Demography
ARRANGE	Creates a probabilistic estimate of the range of site dates based on the proportions of dated ceramic types in the assemblage. Output includes a density plot against time. The program also calculates mean ceramic dates. This method is described in Steponaitis and Kintigh (1993).				Unpackaged script: mpeeples2008/Mean-Ceramic-Date-and-Error-Estimation
C14	Provides a graphical way to analyze sets of radiocarbon dates. Each radiocarbon date is treated not as a single point in time but as a normally distributed probability with a mean and standard deviation given by the lab. In evaluating several dates, for each interval the probability distributions associated with the dates are summed. For each temporal interval, an expected number of dates is calculated and plotted in a histogram.	Yes	rcarbon	rcarbon::plot() rcarbon::spd()	Also stratigraphr for tidy alternatives.
CALCULATE_K	Calculates K for use in Cowgill’s formula that estimates the span of the true interval producing an observed set of measured dates with Gaussian errors. It calculates the value of K for any standard deviation of a Normal Distribution. See Cowgill and Kintigh (2020).	No			Pascal source available: kintigh/phaselen
DSPLIT	Compares and combines radiocarbon samples using the procedure published in Archaeometry by Wilson and Ward (1981).
MATCHINTERVAL	Performs a Monte Carlo evaluation of the correspondence between temporal intervals with extreme climate events and the occurrence dates of major cultural changes as described and applied by Kintigh & Ingram (2018).
PHASELEN	Provides a Monte Carlo analysis to estimate the span of the true interval producing an observed set of measured dates with Gaussian errors such as radiocarbon and obsidian hydration dates. The program has an option for calibration.	No			Pascal source available: kintigh/phaselen
ROOMACCUM	Estimates within-period rates of population growth (or decline) given structure counts dated to a sequence of chronological periods as described and applied by Kintigh and Peeples (2020). It assumes a knowledge of the number of structures that date to each specific period, the period lengths, and an estimated structure use life. The population growth rate estimates are derived by simulating the construction (due to replacement and population growth) and abandonment (due to the completion of the use life or population decline) of individual structures such that the observed number of rooms dating to a period matches the simulated number of rooms.	No			Pascal source available: kintigh/RoomAccum.
Subsurface Testing
PLACESTP	Calculates the optimal placement of test units in a rectangular or linear survey area. For a user-specified number of survey transects (or user-specified lengthwise and width-wise spacing of test units), in any one of three basic configurations, the program will print out the coordinates of the optimal test unit placement, along with some statistics about the largest circular site that can go unsampled in the survey area. This program implements the formulae provided by Krakker, Shott, and Welch (1983) and revised in Kintigh (1988).	No			Could be implemented in fieldwalkr
STP	Probabilistic evaluation of subsurface testing designs as described in Kintigh 1988. STP uses Monte-Carlo methods to evaluate the effectiveness of a test unit layout within a survey area to locate sites with a given size and artifact density.
Utility
ADFUTIL	Generates random data sets and manipulates files in the data format used by the analysis programs. It allows the creation of a random data set of any size. Variables may be uniform or normally distributed variables with user specified ranges or means and standard deviations. ADFUTIL allows the deletion of columns (variables), selective deletion of rows (observations) based on values in a column, replacement of values in a column, randomization of columns for Monte Carlo analysis, the addition of new columns from another data set, and selection of a random sample of cases.
CNTCNV	Program to speed data input and increase entry accuracy for count data, where the number of categories is large relative to the number of items counted for an observation (e.g. surface collection counts of 40 ceramic types divided into 8 vessel forms). It permits a highly abbreviated input format but it writes out a standard matrix (of the sort read by most analysis programs) with one count per category of each observation. The program provides labeled printouts of the data and can perform elaborate aggregation of count categories and simple aggregation of observations.
CntEdit	CntEdit is a companion program to CNTCNV and can be used to do global or selective substitutions of row or column field values in a data file formatted for CNTCNV.
CntRefmt	CntRefmt is a companion program to CNTCNV that reformats row-column-count segments of records formatted for CntCnv, e.g., to make differently formatted files consistent or to change the spacing to make reading easier.
CONVSYS	Converts a SYSTAT internal format data file into a raw data file, a variable label file, and a case label file that can be used with these and other programs that read free-format ASCII data. Works with versions 2.0 and above of SYSTAT, on files of any size.
HPPLOT	Provides a flexible user interface to Hewlett Packard compatible plotters. It can create customized analysis graphics from a raw data file edited to include the plot commands.
MVC	Permits arbitrarily complex copying of sets of columns in an input record into sets of columns in an output record. It can extract data from fixed-format data records for use with analytical programs that require free format input. Files of any size can be processed.
SCAT	Produces screen and publishable quality scatter plots of variables. All points may be plotted with the same symbol, or different symbols can be plotted based on the value of a variable.	Yes	ggplot2	ggplot2::geom_point()
SORTLINE	A general purpose sort utility, SORTLINE sorts fixed-format data files of up to 32,767 lines into an order defined by any number of user-specified sort fields.	Yes	dplyr	dplyr::arrange()
SPLIT	Divides a large file into sections that can be recombined with the DOS COPY command. Thus, large hard disk file can be split and copied onto several floppies.
UNTAB	Replaces tabs and control characters in a file with blanks so they can be used with analysis programs that require pure ASCII files (e.g. SYSTAT).
TFQA program descriptions copied from http://tfqa.com/programs.htm
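
For rows marked “Partially” or left blank, a short reimplementation is sometimes enough. The sketch below is one plausible route, not an assessment by the authors of this table: it computes the Brainerd-Robinson coefficient (which the DIST row notes is not covered by vegan) as 200 minus the sum of absolute differences in type percentages, then lists complete-linkage joins in the spirit of CLCA using base R's `stats::hclust()`. The counts and the helper name `brainerd_robinson()` are hypothetical.

```r
# Hypothetical counts: rows are assemblages, columns are artifact types
counts <- matrix(
  c(10, 20, 70,
    10, 20, 70,
    70, 20, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("type1", "type2", "type3"))
)

# Hypothetical helper: Brainerd-Robinson similarity on a 0-200 scale
brainerd_robinson <- function(x) {
  pct <- 100 * x / rowSums(x)  # convert each row to percentages
  # Sum of absolute percentage differences is the Manhattan distance
  200 - as.matrix(stats::dist(pct, method = "manhattan"))
}

br <- brainerd_robinson(counts)
br

# Complete-linkage clustering on the corresponding dissimilarity (200 - BR)
hc <- stats::hclust(stats::as.dist(200 - br), method = "complete")
hc$merge   # sequence of item/cluster joins
hc$height  # fusion values for each join
```

Here identical assemblages score 200 and assemblages with no types in common would score 0. Unlike CLCA, `plot(hc)` will also draw a dendrogram if one is wanted.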

sslarch/tfqar
