bspam is an R package that contains functions to fit the speed-accuracy psychometric model for repeatedly measured count outcome data (Potgieter, Kamata & Kara, 2017; Kara, Kamata, Potgieter & Nese, 2020), where accuracy and speed are modeled by binomial count and log-normal latent variable models, respectively.
For example, the use of this modeling technique allows model-based
calibration and scoring for oral reading fluency (ORF) assessment data.
This document demonstrates some uses of the bspam package by using an ORF assessment data set.
To install the bspam package, follow the steps below.
- Install the remotes package by running the following R code.
if(!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
- Install bspam by running the following R code.
remotes::install_github("kamataak/bspam")
bspam can implement a fully Bayesian approach for the calibration of model parameters and for scoring by using JAGS or Stan. If you want to use the fully Bayesian estimation, install the JAGS and Stan software on your computer.
To download and install JAGS, download the installation file for your operating system from https://sourceforge.net/projects/mcmc-jags/files/JAGS/ and follow the installation steps. bspam internally uses the runjags package as an interface to JAGS; the runjags package needs to be installed separately once JAGS is installed.
Stan can be installed by following the steps explained here: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started. Note that these steps describe the installation of RStan, which bspam uses internally to run Stan code. Please follow the installation steps carefully, along with the recommended versions of R and RStudio, to prevent any issues related to running Stan with bspam.
It is required that the data are prepared in long format, where each row contains the data for a unique case, namely a specific task from a specific person on a specific testing occasion. In the context of ORF assessment, each row contains the data for a specific passage read by a specific student on a specific testing occasion. A data file can be prepared in any file format that is readable into R.
For example, CSV-formatted data can be read into R by using the read.csv() function from the base packages. Also, data files in some statistical software formats can be read into R with functions from packages such as haven; for example, the read_spss() function can be used to read an SPSS-formatted data file into R.
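As a minimal sketch of the CSV route (the file contents and column names below are made up for illustration; tempfile() and writeLines() are used only to create a throwaway example file):

```r
# Create a small throwaway CSV file for illustration.
csv.path <- tempfile(fileext = ".csv")
writeLines(c("id.student,id.passage,wrc,sec",
             "2033,32004,45,53.1",
             "2043,32010,62,49.7"),
           csv.path)

# Read the CSV file into R as a data frame.
dat <- read.csv(csv.path)
str(dat)
```

In practice, csv.path would simply be the path to your own data file.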
The data set should minimally contain the following 7 variables: (1) individual ID, (2) group ID, (3) testing occasion ID, (4) task ID, (5) the maximum number of attempts in the task, (6) the number of successes in the task, and (7) the time taken to complete the task. Note that if the data come from only one testing occasion and one group, variables (2) group ID and (3) testing occasion ID should contain an arbitrary constant value, such as all 1's.
Variable names and the order of the variables in the data frame are flexible. When running functions in this package, the names of the required variables need to be specified, as shown in the examples below.
Then, the data need to be read into R as a data frame.
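For illustration, a minimal long-format data frame with these seven variables might look as follows; the variable names match the example data set used later in this document, but they are otherwise arbitrary, and the values are made up:

```r
# One row per case: a specific passage read by a specific student
# on a specific testing occasion. Values are illustrative only.
toy.data <- data.frame(
  id.student    = c(1, 1, 2),                 # (1) individual ID
  grade         = c(1, 1, 1),                 # (2) group ID (all 1's if only one group)
  occasion      = c("fall", "fall", "fall"),  # (3) testing occasion ID
  id.passage    = c("A", "B", "A"),           # (4) task ID
  numwords.pass = c(50, 46, 50),              # (5) maximum number of attempts
  wrc           = c(45, 40, 48),              # (6) number of successes
  sec           = c(53.1, 49.7, 47.2)         # (7) time taken to complete the task
)
toy.data
```

Here student 1 read passages A and B, while student 2 read only passage A; such unbalanced designs are allowed.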
The bspam package comes with several example data sets in the ORF assessment context. To demonstrate some basic usage of key functions in the package, a sample ORF assessment data set is used here. In the context of an ORF assessment, a task is a passage given to the student to read aloud, and an attempt is the student's reading of a word in the passage, which is scored correct (i.e., success) or not. Accordingly, the number of successes in the task is the number of words correctly read in the passage.
The data set passage2 contains passage-level student data, consisting of reading accuracy and time data for 12 passages from 85 students. Although all 85 students were assigned all 12 passages, the number of passages read per student varied from 2 to 12. The number of students per passage ranged from 59 to 79. This is a small subset of the data collected by Nese and Kamata (2014-2018).
Load the required packages, and view the example data set passage2.
library(tidyverse)
library(bspam)
View(passage2)
Calibrate the passages using the fit.model() function, which implements the Monte Carlo EM algorithm described in Potgieter et al. (2017).
MCEM_run <- fit.model(person.data = passage2,
                      person.id = "id.student",
                      task.id = "id.passage",
                      max.counts = "numwords.pass",
                      obs.counts = "wrc",
                      time = "sec",
                      k.in = 5,
                      reps.in = 50,
                      est = "mcem")
MCEM_run
By default, the standard errors for the model parameters are not estimated. This allows one to increase the number of Monte Carlo iterations reps.in to improve the quality of the model parameter estimates while minimizing the computation time. reps.in should be 50 to 100 in realistic calibrations. Also note that k.in is the number of imputations for the MCEM algorithm, where the default is 5. Note that SEs for the model parameters are not required for running the scoring() function to estimate latent factor scores and WCPM scores in the next step. If standard errors for the model parameters are desired, an additional argument se = "analytic" or se = "bootstrap" needs to be added to the fit.model() function.
Passage calibration can also be done by using the est = "bayes" option, which implements a fully Bayesian approach through Gibbs sampling or Hamiltonian Monte Carlo with JAGS or Stan, respectively. Currently, Stan is used with complete data, that is, when all students have data on all passages. Otherwise, JAGS is used, as it provides faster computation with missing observations. Note that bspam runs JAGS in auto mode, which does not require the user to supply any specifications for the Bayesian estimation (e.g., the number of iterations) or to monitor convergence. The standard deviations of the posterior distributions are reported as the standard errors. A specification of the fit.model() function can look as follows.
Bayes_run <- fit.model(person.data = passage2,
                       person.id = "id.student",
                       task.id = "id.passage",
                       max.counts = "numwords.pass",
                       obs.counts = "wrc",
                       time = "sec",
                       est = "bayes",
                       verbose = TRUE)
Bayes_run
In order to estimate latent factor scores and/or WCPM scores, the scoring() function needs to be run. Note that we use the output object MCEM_run from the previous passage calibration phase. By default (type = "general"), only factor scores and their standard errors are reported; with type = "orf", WCPM scores and their standard errors are also reported. Also, WCPM scores and/or factor scores and their standard errors will be estimated for all cases in the data by default.
There are several estimator options and standard error estimation options: maximum likelihood (MLE), maximum a posteriori (MAP), expected a posteriori (EAP), and a fully Bayesian approach. In this example, maximum a posteriori (MAP) estimation for scoring and the analytic approach for estimating standard errors are used.
To estimate WCPM scores for all observations in the data set passage2, we use the scoring() function as follows.
WCPM_all <- scoring(calib.data = MCEM_run,
                    person.data = passage2,
                    person.id = "id.student",
                    occasion = "occasion",
                    group = "grade",
                    task.id = "id.passage",
                    max.counts = "numwords.pass",
                    obs.counts = "wrc",
                    time = "sec",
                    est = "map",
                    se = "analytic",
                    type = "general")
summary(WCPM_all)
If computation of WCPM scores and/or factor scores for only selected cases is desired, a list of cases needs to be provided through the cases = argument. The list of cases has to be a one-variable data frame with the variable name cases. The format of the case values is personid_occasion. This one-variable data frame can be created manually, as shown below. It can also be generated by the get.cases() function, which is demonstrated later in this document.
sample.cases <- data.frame(cases = c("2033_fall", "2043_fall", "2089_fall"))
WCPM_sample <- scoring(calib.data = MCEM_run,
                       person.data = passage2,
                       person.id = "id.student",
                       occasion = "occasion",
                       group = "grade",
                       task.id = "id.passage",
                       max.counts = "numwords.pass",
                       obs.counts = "wrc",
                       time = "sec",
                       cases = sample.cases,
                       est = "eap",
                       se = "analytic",
                       type = "orf")
summary(WCPM_sample)
Also, we can specify a set of tasks (i.e., passages) with which to scale the WCPM scores. If WCPM scores are scaled with a set of passages that differs from the set of passages the student actually read, the set is referred to as an external passage set in the ORF assessment context, or an external task set in general.
The use of an external passage set is particularly important in the context of ORF assessment to make the estimated WCPM scores comparable between students who read different sets of passages, as well as within a student for longitudinal data, where a student is likely to read different sets of passages across occasions.
WCPM_sample_ext1 <- scoring(calib.data = MCEM_run,
                            person.data = passage2,
                            person.id = "id.student",
                            occasion = "occasion",
                            group = "grade",
                            task.id = "id.passage",
                            max.counts = "numwords.pass",
                            obs.counts = "wrc",
                            time = "sec",
                            cases = sample.cases,
                            external = c("32004","32010","32015","32016","33003","33037"),
                            est = "eap",
                            se = "analytic",
                            type = "orf")
summary(WCPM_sample_ext1)
A fully Bayesian approach can also be used as an estimator in the scoring() function. Note that if est = "bayes" is specified, there is no need to use the se = argument. By default, the standard deviations of the posterior distributions are reported as the standard errors, along with 95% highest density intervals, which are analogous to 95% confidence intervals. Here is an example of using the fully Bayesian approach for estimating WCPM scores with the same external passage set used in the previous example.
WCPM_sample_ext1_bayes <- scoring(calib.data = MCEM_run,
                                  person.data = passage2,
                                  person.id = "id.student",
                                  occasion = "occasion",
                                  group = "grade",
                                  task.id = "id.passage",
                                  max.counts = "numwords.pass",
                                  obs.counts = "wrc",
                                  time = "sec",
                                  cases = sample.cases,
                                  external = c("32004","32010","32015","32016","33003","33037"),
                                  est = "bayes",
                                  type = "orf")
summary(WCPM_sample_ext1_bayes)
Alternatively, we can run the scoring() function in two steps.
Step 1: Prepare the data using the prep() function, which prepares the data set required by the scoring() function, including changing variable names and generating the natural logarithm of the time data.
The output from the prep() function is a list of two components. The data.long component is a data frame containing the student response data in long format, and the data.wide component is a list of four components, including a wide format of the data as well as other information, such as the number of passages and the number of words in each passage.
One benefit of this two-step approach is that we can use another utility function, get.cases(), to generate a list of unique cases from the output of the prep() function. This list can be useful when our interest is to estimate WCPM scores only for selected cases.
data <- prep(data = passage2,
             person.id = "id.student",
             occasion = "occasion",
             group = "grade",
             task.id = "id.passage",
             max.counts = "numwords.pass",
             obs.counts = "wrc",
             time = "sec")
Generate a list of unique cases:
all.cases <- get.cases(data$data.long)
all.cases
Step 2: Run the scoring() function to estimate WCPM scores. Note that we pass the output object MCEM_run from the passage calibration phase, as well as the prepared data data.long from Step 1. By default, WCPM scores will be estimated for all cases in the data. Additionally, there are several estimator options and standard error estimation options. In this example, maximum a posteriori (MAP) estimation for scoring and the analytic approach for estimating standard errors are used.
WCPM_all <- scoring(calib.data = MCEM_run,
                    person.data = data$data.long,
                    est = "map",
                    se = "analytic",
                    type = "orf")
summary(WCPM_all)
WCPM_sample_ext2 <- scoring(calib.data = MCEM_run,
                            person.data = data$data.long,
                            cases = sample.cases,
                            external = c("32004","32010","32015","32016","33003","33037"),
                            est = "map",
                            se = "analytic",
                            type = "orf")
summary(WCPM_sample_ext2)
Please see the package website for more detailed usage of the package.
Kara, Y., Kamata, A., Potgieter, C., & Nese, J. F. (2020). Estimating model-based oral reading fluency: A Bayesian approach with a binomial-lognormal joint latent model. Educational and Psychological Measurement, 1-25.
Nese, J. F. T. & Kamata, A. (2014-2018). Measuring Oral Reading Fluency: Computerized Oral Reading Evaluation (Project No. R305A140203) [Grant]. Institute of Education Sciences, U.S. Department of Education. https://ies.ed.gov/funding/grantsearch/details.asp?ID=1492
Potgieter, N., Kamata, A., & Kara, Y. (2017). An EM algorithm for estimating an oral reading speed and accuracy model. Manuscript submitted for publication.
Qiao, X., Potgieter, N., & Kamata, A. (2023). Likelihood estimation of model-based oral reading fluency. Manuscript submitted for publication.
Copyright (C) 2022-2023 The ORF Project Team
The bspam package is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
The bspam package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this package. If not, see http://www.gnu.org/licenses/.