Commit: update README
Akihito Kamata committed Feb 23, 2024
1 parent e5a57af commit 3b38b34
Showing 2 changed files with 112 additions and 90 deletions.
59 changes: 30 additions & 29 deletions README.Rmd
output: github_document

# Speed-Accuracy Psychometric Modeling for Binomial Count Outcome Data with R

`bspam` is an R package that contains functions to fit the speed-accuracy psychometric model for repeatedly measured count outcome data (Potgieter, Kamata & Kara, 2017; Kara, Kamata, Potgieter & Nese, 2020), where accuracy and speed are modeled by binomial count and log-normal latent variable models, respectively.

For example, the use of this modeling technique allows model-based calibration and scoring for oral reading fluency (ORF) assessment data. This document demonstrates some uses of the `bspam` package by using an ORF assessment data set.

```{r eval = F}
remotes::install_github("kamataak/bspam")
```

### Optional:
`bspam` can implement a fully Bayesian approach for the calibration of model parameters and for scoring by using JAGS or Stan. If you wish to use fully Bayesian estimation, install the JAGS and/or Stan software on your computer.

To install JAGS, download the installation file for your operating system from https://sourceforge.net/projects/mcmc-jags/files/JAGS/ and follow the installation steps. `bspam` internally uses the `runjags` package as an interface to JAGS; the `runjags` package needs to be installed separately once JAGS is installed.
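As noted above, `runjags` must be installed separately once JAGS itself is in place; a minimal sketch of doing so from within R:

```{r eval = F}
# Install the runjags interface package (JAGS itself must already be installed)
install.packages("runjags")
```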

Stan can be installed by following the steps explained at https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started. Note that these steps cover the installation of `RStan`, which `bspam` uses internally to run Stan code. Please follow the installation steps carefully, along with the recommended versions of R and RStudio, to prevent issues related to running Stan with `bspam`.

## Basic Usage: Task-level Data Analysis
### Data Preparation
The data need to be prepared in long format, where each row contains the data for a unique case, namely, a specific task from a specific person on a specific testing occasion. In the context of ORF assessment, each row contains the data for a specific passage read by a specific student on a specific testing occasion. A data file can be prepared in any file format that is readable into R.

For example, a CSV-formatted data file can be read into R by using the `read.csv()` function from base R. Data files in some statistical software formats can also be read into R with functions from packages such as `haven`; for example, the `read_spss()` function can be used to read an SPSS-formatted data file into R.
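As a minimal sketch, a long-format data set of the kind described above might look like the following once read into R; the column names used here (`id.person`, `task.id`, `occasion`, `n.success`, `time.sec`) are hypothetical and chosen only for illustration:

```{r}
# Hypothetical long-format data: one row per case,
# i.e., one task (passage) from one person on one testing occasion
demo.data <- data.frame(
  id.person = c(2033, 2033, 2043, 2043),
  task.id   = c("32004", "32010", "32004", "32015"),
  occasion  = c("fall", "fall", "fall", "fall"),
  n.success = c(45, 52, 38, 41),  # number of words read correctly
  time.sec  = c(60, 75, 58, 66)   # reading time in seconds
)
demo.data
```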

Variable names and the order of the variables can be flexible in the data frame.
Then, the data need to be read into R as a data frame.

### Data Set For the Demonstration
The `bspam` package comes with several example data sets in the ORF assessment context. To demonstrate some basic usage of key functions in the package, a sample ORF assessment data set is used here. In the context of an ORF assessment, a task is a passage given to the student to read aloud, and an attempt is the student's reading of a word in the passage, which is scored correct (i.e., success) or not. Accordingly, the number of successes in a task is the number of words correctly read in the passage.

The data set `passage2` contains passage-level student data, consisting of reading accuracy and time data for 12 passages from 85 students. Although the 85 students were assigned all 12 passages, the number of passages read by a student varied from 2 to 12, and the number of students per passage ranged from 59 to 79. This is a small subset of the data collected by Nese and Kamata (2014-2018).

### Load Packages
Load required packages, and view the example data set `passage2`.
```{r eval = F}
library(bspam)
View(passage2)
```

### Task (i.e., Passage) Calibration

Calibrate the passages using the `fit.model()` function, which implements the Monte Carlo EM (MCEM) algorithm described in Potgieter et al. (2017).

```{r eval = F}
MCEM_run <- fit.model(person.data = passage2, ...)  # "..." marks arguments not shown in this view
MCEM_run
```

By default, standard errors for the model parameters are not estimated. This allows one to increase the number of Monte Carlo iterations `reps.in` to improve the quality of the model parameter estimates while minimizing computation time. The value of `reps.in` should be 50 to 100 in realistic calibrations. Also note that `k.in` is the number of imputations for the MCEM algorithm, with a default of 5. Standard errors for the model parameters are not required for running the `scoring()` function to estimate latent factor scores and WCPM scores in the next step. If standard errors for the model parameters are desired, an additional argument `se = "analytic"` or `se = "bootstrap"` needs to be added to the `fit.model()` function.
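For instance, a calibration call that also requests analytic standard errors might look like the following sketch; the `...` stands for the data and variable arguments omitted above, and `reps.in = 100` is shown only as an illustrative value within the recommended 50 to 100 range:

```{r eval = F}
# Sketch only: request analytic standard errors at calibration time
MCEM_se_run <- fit.model(person.data = passage2, ...,  # "..." marks arguments not shown
                         reps.in = 100,
                         se = "analytic")
MCEM_se_run
```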

Passage calibration can also be done by using the `est = "bayes"` option, which implements a fully Bayesian approach through Gibbs sampling with JAGS or Hamiltonian Monte Carlo with Stan. Currently, Stan is used with complete data, that is, when all students have data on all passages; otherwise, JAGS is used, as it provides faster computations with missing observations. Note that `bspam` runs JAGS in auto mode, which does not require the user to supply any specifications for the Bayesian estimation (e.g., the number of iterations) or to monitor convergence. The standard deviations of the posterior distributions are reported as the standard errors. A specification for the `fit.model()` function can look as follows.

```{r eval = F}
Bayes_run <- fit.model(person.data = passage2, ...,  # "..." marks arguments not shown in this view
                       est = "bayes")
Bayes_run
```

### Scoring: Estimating Latent Factor Scores and WCPM Scores 1

In order to estimate latent factor scores and/or WCPM scores, the `scoring()` function needs to be run.

Note that we use the output object `MCEM_run` from the previous passage calibration phase. By default (`type = "general"`), only the factor scores ($\theta$ and $\tau$) and their standard errors are reported. With the argument `type = "orf"`, WCPM scores and their standard errors are also reported. By default, scores and their standard errors are estimated for all cases in the data.

There are several estimator options and standard error estimation options: maximum likelihood (MLE), maximum a posteriori (MAP), expected a posteriori (EAP), and a fully Bayesian approach. In this example, the maximum a posteriori (MAP) estimator is used for scoring, and the analytic approach is used for estimating standard errors.

#### Scoring for All Cases
To estimate WCPM scores for all observations in the data set `passage2`, we use the `scoring()` function as follows.

```{r eval = F}
WCPM_all <- scoring(calib.data = MCEM_run, ...)  # "..." marks arguments not shown in this view
summary(WCPM_all)
```


#### Scoring for Selected Cases

If WCPM scores and/or factor scores are desired for only selected cases, a list of cases needs to be provided via the `cases =` argument. The list of cases has to be a one-variable data frame with the variable name `cases`. The format of the case values should be `personid_occasion`. This one-variable data frame can be created manually, as shown below, or generated by the `get.cases()` function, which is demonstrated later in this document.

```{r eval = F}
sample.cases <- data.frame(cases = c("2033_fall", "2043_fall", "2089_fall"))
WCPM_sample <- scoring(calib.data = MCEM_run, ...,  # "..." marks arguments not shown in this view
                       cases = sample.cases)
summary(WCPM_sample)
```

#### Scoring with External Task Set
Also, we can specify a set of tasks (i.e., passages) with which to scale the WCPM scores. If WCPM scores are scaled with a set of passages different from the set of passages the student actually read, that set of passages is referred to as an **external passage set** in the ORF assessment context, or an **external task set** in general.

The use of an external passage set is particularly important in the context of ORF assessment for making the estimated WCPM scores comparable between students who read different sets of passages, as well as within students for longitudinal data, where a student is likely to read different sets of passages across occasions.

```{r eval = F}
WCPM_sample_ext1 <- scoring(calib.data = MCEM_run, ...)  # "..." marks arguments not shown in this view
summary(WCPM_sample_ext1)
```

A fully Bayesian approach can also be used for estimation in the `scoring()` function. Note that if `est = "bayes"` is specified, there is no need to use the `se =` argument. By default, the standard deviations of the posterior distributions are reported as the standard errors, along with 95% highest density intervals, which are analogous to 95% confidence intervals. Here is an example of using the fully Bayesian approach for estimating WCPM scores with the same external passages used in the previous example.

```{r eval = F}
WCPM_sample_ext1_bayes <- scoring(calib.data = MCEM_run, ...,  # "..." marks arguments not shown in this view
                                  time = "sec",
                                  cases = sample.cases,
                                  external = c("32004","32010","32015","32016","33003","33037"),
                                  est = "bayes",
                                  type = "orf")
summary(WCPM_sample_ext1_bayes)
```

### Scoring: Estimating Latent Factor Scores and WCPM Scores 2

Alternatively, we can run the `scoring()` in two steps.

**Step 1:** Prepare the data using the `prep()` function, which prepares the data set required by the `scoring()` function, including renaming variables and generating the natural logarithm of the time data.

The output from the `prep()` function is a list of two components. The `data.long` component is a data frame containing the student response data in long format, and the `data.wide` component is a list of four components, including a wide-format version of the data as well as other information, such as the number of passages and the number of words in each passage.
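Given that description, inspecting the two components of the `prep()` output might look like the following sketch; the `...` stands for the `prep()` arguments, which are not shown in this document:

```{r eval = F}
# Sketch only: the prep() arguments are omitted here
prep_out <- prep(...)
head(prep_out$data.long)  # long-format student response data
str(prep_out$data.wide)   # wide-format data plus passage and word-count information
```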

```{r eval = F}
summary(WCPM_sample_ext2)
```
Please see the [package website](https://github.com/kamataak/bspam/) for more detailed usage of the package.

## Citation
Kara, Y., Kamata, A., Potgieter, C., & Nese, J. F. T. (2020). Estimating model-based oral reading fluency: A Bayesian approach with a binomial-lognormal joint latent model. *Educational and Psychological Measurement*, 1-25.

Nese, J. F. T. & Kamata, A. (2014-2018). Measuring Oral Reading Fluency: Computerized Oral Reading Evaluation (Project No. R305A140203) [Grant]. Institute of Education Sciences, U.S. Department of Education. https://ies.ed.gov/funding/grantsearch/details.asp?ID=1492

Potgieter, N., Kamata, A., & Kara, Y. (2017). An EM algorithm for estimating an oral reading speed and accuracy model. Manuscript submitted for publication.

Qiao, X., Potgieter, N., & Kamata, A. (2023). Likelihood estimation of model-based oral reading fluency. Manuscript submitted for publication.

## Copyright Statement
Copyright (C) 2022-2023 The ORF Project Team