# 3. Modeling Simple Decisions and Applications Using a Diffusion Model

* Psygrammer cognitive modeling study material [1]
* 김무성

# Contents

* Abstract
* Diffusion Models for Rapid Decisions
* Variants of the Standard Two-Choice Task
* Optimality
* Domains of Application 
* Situations in Which the Standard Model Fails
* Competing Two-Choice Models
* Conclusions    

# Abstract

* The <font color="red">diffusion model</font> is one of the major <font color="red">sequential-sampling models</font> for 
    - <font color="red">two-choice decision-making</font> and 
    - <font color="red">choice response time</font> in psychology. 
* The model conceives of decision-making 
    - <font color="red">as a process</font> 
    - in which <font color="red">noisy evidence is accumulated</font> 
    - until one of two response criteria is reached and 
    - the associated response is made. 
* The criteria represent 
    - the <font color="red">amount of evidence</font> needed to make each decision and 
    - reflect the decision maker’s 
        - <font color="red">response biases</font> and 
        - <font color="red">speed-accuracy trade-off</font> settings. 
* In this chapter we examine the application of the diffusion model in a variety of different settings. 
* We 
    - discuss the <font color="red">optimality of the model</font> and 
    - review its applications to a number of <font color="red">cognitive tasks</font>, including
        - <font color="red">perception</font>,
        - <font color="red">memory</font>, and 
        - <font color="red">language</font> tasks. 
* We also consider 
    - its applications to <font color="red">normal and special</font> populations, 
    - to the cognitive foundations of <font color="red">individual differences</font>, 
    - to <font color="red">value-based decisions</font>, and 
    - its role in understanding the <font color="red">neural basis of decision-making</font>.

# KeyWords

* diffusion model 
* sequential-sampling 
* drift rate
* choice
* decision time
* accuracy
* confidence 
* perceptual decision
* memory decision
* lexical decision

# Diffusion Models for Rapid Decisions

* EXPRESSION FOR ACCURACY AND RT DISTRIBUTIONS

* Over the last 30 or 40 years, there has been a steady development of models for <font color="red">simple decision making</font> that <font color="red">deal with both</font> the <font color="red">accuracy</font> of decisions and the <font color="red">time taken</font> to make them.
    - The models <font color="red">assume</font> that <font color="red">decisions</font> are made by <font color="red">accumulating noisy information</font> to <font color="red">decision criteria</font>, one criterion for each possible choice.
    - The models successfully <font color="red">account for the probability</font> that <font color="red">each choice</font> is made and the response time (RT) distributions for correct responses and errors.
    - The <font color="red">most frequent applications</font> of these models have been to <font color="red">tasks that require two-choice decisions that are made reasonably quickly, typically with mean RTs less than 1.0–2.0 s</font>
* <font color="red">An important feature of human decision-making</font> is that the processing system is <font color="red">very flexible</font> because humans can switch tasks, stimulus dimensions, and output modalities very quickly, from one trial to the next.
* The same decision mechanism might operate for all these tasks or the mechanism might be <font color="red">task</font> and <font color="red">modality specific</font>.
* For <font color="red">two-choice tasks</font>
    - the assumption usually made is that <font color="red">all decision-related</font> information, 
    - that is, all the <font color="red">information that comes from</font> a stimulus or memory, 
    - is <font color="red">collapsed</font> onto a single variable, called <font color="red">drift rate</font>, 
    - that characterizes the <font color="red">discriminative</font> or <font color="red">preference</font> information in the stimulus.
* In this chapter, we focus on one model of the class of <font color="red">sequential sampling models of evidence accumulation</font>, the <font color="red">diffusion model</font> (Ratcliff, 1978; Ratcliff & McKoon, 2008; Smith, 2000).
* A <font color="red">comparison of the diffusion model</font> with other sequential-sampling models, such as
    - the Poisson counter model (Townsend & Ashby, 1983), 
    - the Vickers accumulator model (Smith & Vickers, 1988; Vickers, 1970), and 
    - the leaky competing accumulator model (Usher & McClelland, 2001) can be found in Ratcliff and Smith (2004)

Fitting the model to data provides estimates of drift rates, decision boundaries, and a parameter representing the duration of nondecision processes.

The model’s ability to separate these components is one of its key contributions and places major constraints on its ability to explain data. 

Stimulus difficulty affects drift rate but not the criteria, and to a good approximation, speed-accuracy shifts are represented in the criteria, not drift rate.

If difficulty varies, changes in drift rate alone must accommodate all the changes in performance, namely accuracy and the changes in the spreads and locations of the correct and error RT distributions.

Likewise, changes in the criteria affect all the aspects of performance.

* In a perceptual task, drift rate depends on the quality of the perceptual information from a stimulus;
* in a memory task, it depends on the quality of the match between a test item and memory.
* In a brightness discrimination task, for example, if the accumulated evidence reaches the top boundary, a “bright” response is executed and a “dark” response would then correspond to the bottom boundary.

* Figure 3.1 shows an example, using a brightness discrimination task.
- The three paths in Figure 3.1 show three different outcomes, all with the same drift rate.
- Noise in the accumulation process produces errors when the accumulated evidence reaches the incorrect boundary and it produces variable RTs that form a distribution of RTs that has the shape of empirically obtained distributions.

* In the figure, one path leads to a fast correct decision, one to a slow correct decision, and one to an error. Most responses are reasonably fast, but there are slower ones that spread out the right-hand tails of the distributions (as in the distribution at the top of Figure 3.1).

* Figure 3.1 shows the accumulation-of-evidence process.

* Besides this, there are processes that encode stimuli, access memory, transform stimulus information into a decision-related variable that determines drift rate, and execute responses.

* These components of processing are combined into one “nondecision” component in the model, that has mean Ter. 

* The total processing time for a decision is the sum of the time taken by the decision process and the time taken by the nondecision component.

<img src="figures/fig3.1.png" width=600 />

* As drift rate changes from a large value to near zero, the mean of the RT distribution for both correct and error responses increases because the tail of the RT distribution spreads out. 

* Figure 3.2 shows simulated individual RTs from the model as a function of drift rate, which is assumed to vary from trial to trial.

* The shortest RTs change little with drift rate, and so a fast response says nothing about the difficulty of the trial.

* The probability of obtaining a slow response from a high drift rate is very small (e.g., Figure 3.2) and so conditions with the slowest responses come from lower drift rates (see Ratcliff, Philiastides, & Sajda, 2009).

<img src="figures/fig3.2.png" width=600 />

* The boundaries of the decision process can be manipulated by instructions (“respond as quickly as possible” or “respond as accurately as possible”), differential rewards for the two choices, and the relative frequencies with which the two stimuli are presented in the experiment.

* Changes in instructions, rewards, or biases affect both RTs and accuracy but in the model, to a good approximation, the effects on RTs and accuracy are due to shifts in boundary settings alone, not drift rates or nondecision time.

* In fact, the patterns of the relative speed of correct versus error responses are as follows: with accuracy instructions and/or difficult tasks, errors are slower than correct responses, and with speed instructions and/or easy tasks, errors are faster than correct responses (Luce, 1986).

<img src="figures/fig3.3.png" width=600 />

* In the diffusion model, the observed patterns of correct versus error RTs fall out naturally because there is trial-to-trial variability in drift rate and starting point (e.g., Ratcliff, 1981).

* Figure 3.4 illustrates how this mixing works with just two drift rates or two starting points instead of their full distributions.

* In Figure 3.4 left panel, the v1 drift rate produces high accuracy and fast responses, the v2 one lower accuracy and slow responses.

* The mixture of these produces errors slower than correct responses because 5% of the 400 ms process averaged with 20% of the 600 ms process gives a weighted mean of 560 ms, which is slower than the weighted mean for correct responses (491 ms). 

* In Figure 3.4, right panel, the distributions to the left are for processes that start near the correct boundary (the dotted arrow shows the distance the process has to go to make an error—the larger the distance, the slower the response) and the distributions to the right are for processes that start further away from the correct boundary.

* In practice, drift rate is assumed to be normally distributed from trial to trial and the starting point is uniformly distributed, but these specific functional forms are not critical (Ratcliff, 2013).

<img src="figures/fig3.4.png" width=600 />
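The weighted means quoted for the left panel of Figure 3.4 can be checked with a few lines of arithmetic. This is only a sketch: the helper name `weighted_mean_rt` is hypothetical, the two drift rates are assumed to be equally likely, and each process is assumed to have the same mean RT for its correct and error responses (400 ms and 600 ms), as in the simplified two-point illustration.

```python
def weighted_mean_rt(components):
    """Probability-weighted mean RT.

    components: list of (response probability, mean RT in ms), one entry per
    equally likely process (hypothetical helper, for illustration only).
    """
    total_p = sum(p for p, _ in components)
    return sum(p * rt for p, rt in components) / total_p


# v1 process: 95% correct, 5% errors, mean RT 400 ms (assumed for both responses).
# v2 process: 80% correct, 20% errors, mean RT 600 ms (assumed for both responses).
correct_mean = weighted_mean_rt([(0.95, 400.0), (0.80, 600.0)])
error_mean = weighted_mean_rt([(0.05, 400.0), (0.20, 600.0)])

print(round(correct_mean), round(error_mean))  # 491 560
```

Errors come mostly from the slow v2 process, so their mixture mean is pulled toward 600 ms, while correct responses are spread more evenly across the two processes.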

* For one set, the shapes and locations of the RT distributions were changed as a function of task difficulty, and for the other, the shapes and locations were changed as a function of speed versus accuracy instructions

## EXPRESSION FOR ACCURACY AND RT DISTRIBUTIONS

* For a two-boundary diffusion process with no across-trial variability in any of the parameters, the equation for accuracy, the proportion of responses terminating at the boundary at zero, is given by

<img src="figures/eq3.1.png" width=600 />

and the cumulative distribution of finishing times at the same boundary is given by

<img src="figures/eq3.2.png" width=600 />

* where a is boundary separation (the top boundary is at a, the bottom boundary is at 0 and the distribution of finishing times is the distribution at the bottom boundary),

* z is the starting point,

* v is drift rate,

* and s is the SD in the normal distribution of within-trial variability (square root of the diffusion coefficient).
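The accuracy equation is only available as an image here, so the following sketch assumes the standard closed form for the probability that a Wiener process with drift, started at z between absorbing boundaries at 0 and a, terminates at 0. The function name `prob_lower_boundary` and the default scaling s = 0.1 are assumptions made for illustration.

```python
import math


def prob_lower_boundary(v, a, z, s=0.1):
    """Probability of terminating at the boundary at 0 (assumed standard form).

    v: drift rate, a: boundary separation, z: starting point (0 < z < a),
    s: within-trial SD (0.1 is an assumed conventional scaling).
    """
    if abs(v) < 1e-12:
        # Zero-drift limit: a driftless process exits in proportion to distance.
        return 1.0 - z / a
    # (e^{-2va/s^2} - e^{-2vz/s^2}) / (e^{-2va/s^2} - 1), written with expm1
    # so that small drift rates do not lose precision.
    ea = math.expm1(-2.0 * v * a / s**2)
    ez = math.expm1(-2.0 * v * z / s**2)
    return (ea - ez) / ea


# Unbiased start (z = a/2) with positive drift: the lower boundary is the error.
print(prob_lower_boundary(v=0.2, a=0.1, z=0.05))
```

With z = a/2 the expression reduces to 1 / (1 + exp(va/s²)), which is a convenient sanity check on any implementation.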

* Because Equation 2 contains an infinite sum, values of the RT density function need to be computed numerically. 

* The series needs to be summed until it converges; this means that terms have to be added until subsequent terms become so small that they do not affect the total. This is complicated by the sine term, which can allow one value in the sum to be small, whereas the next one is not small. To deal with this practically, it is necessary to require that two or three successive terms are very small.
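The stopping rule can be sketched as follows. Since the equation for the finishing-time distribution is only available as an image, the code assumes the standard infinite-series form of the first-passage-time distribution at the lower boundary; the function names, the tolerance `tol`, and the `n_small` parameter are hypothetical choices for illustration.

```python
import math


def prob_lower_boundary(v, a, z, s=0.1):
    """Probability of terminating at 0 (assumed standard closed form)."""
    if abs(v) < 1e-12:
        return 1.0 - z / a
    ea = math.expm1(-2.0 * v * a / s**2)
    ez = math.expm1(-2.0 * v * z / s**2)
    return (ea - ez) / ea


def cdf_lower_boundary(t, v, a, z, s=0.1, tol=1e-12, n_small=3, max_terms=10000):
    """P(terminate at 0 by decision time t), for t > 0, via the series.

    Summation stops only after n_small successive terms fall below tol,
    because the sine factor can make an isolated term vanish.
    """
    total = 0.0
    small_run = 0
    for k in range(1, max_terms + 1):
        rate = v**2 / s**2 + (math.pi * k * s / a) ** 2
        term = (2.0 * k * math.sin(k * math.pi * z / a)
                * math.exp(-0.5 * rate * t) / rate)
        total += term
        small_run = small_run + 1 if abs(term) < tol else 0
        if small_run >= n_small:
            break
    scale = (math.pi * s**2 / a**2) * math.exp(-v * z / s**2)
    return prob_lower_boundary(v, a, z, s) - scale * total


# With z = a/2 every even-numbered term is (numerically) zero, so stopping at
# the first small term would truncate the series after a single term.
careful = cdf_lower_boundary(0.1, v=0.2, a=0.1, z=0.05)
hasty = cdf_lower_boundary(0.1, v=0.2, a=0.1, z=0.05, n_small=1)
print(careful, hasty)
```

The two printed values differ, which is the practical reason for requiring several successive small terms rather than one.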

* The predictions from the model are obtained by integrating the results from Equations 1 and 2 over the distributions of the model’s across-trial variability parameters using numerical integration. In the standard model, drift rate is normally distributed across trials with SD η, the starting point is uniformly distributed with range sz, and nondecision time is uniformly distributed with range st.

* The predicted values are “exact” numerical predictions in the sense that they can be made as accurate as necessary (e.g., 0.1 ms or better) by using more and more steps in the infinite sum and more and more steps in the numerical integrations (packages that perform fitting are mentioned later).
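As a rough illustration of integrating over across-trial variability, the sketch below averages the lower-boundary probability over a normal distribution of drift rates (SD `eta`) and a uniform distribution of starting points (range `sz`) with a simple midpoint rule. The closed form for the boundary probability is the assumed standard one, the function names are hypothetical, and the midpoint rule is a swapped-in simple scheme rather than the integration method used in any particular fitting package.

```python
import math


def prob_lower_boundary(v, a, z, s=0.1):
    """Probability of terminating at 0 (assumed standard closed form)."""
    if abs(v) < 1e-12:
        return 1.0 - z / a
    ea = math.expm1(-2.0 * v * a / s**2)
    ez = math.expm1(-2.0 * v * z / s**2)
    return (ea - ez) / ea


def midpoints(lo, hi, n):
    """n midpoints of equal-width cells on [lo, hi]; one point if the range is empty."""
    if hi <= lo:
        return [lo]
    h = (hi - lo) / n
    return [lo + (i + 0.5) * h for i in range(n)]


def mean_prob_lower(v, a, z, eta, sz, s=0.1, n_v=201, n_z=21):
    """Lower-boundary probability averaged over drift and start-point variability."""
    v_grid = midpoints(v - 5.0 * eta, v + 5.0 * eta, n_v)   # +/- 5 SD of drift
    z_grid = midpoints(z - sz / 2.0, z + sz / 2.0, n_z)     # uniform range sz
    if eta > 0:
        weights = [math.exp(-0.5 * ((x - v) / eta) ** 2) for x in v_grid]
    else:
        weights = [1.0]
    total = 0.0
    for w, vi in zip(weights, v_grid):
        inner = sum(prob_lower_boundary(vi, a, zi, s) for zi in z_grid) / len(z_grid)
        total += w * inner
    return total / sum(weights)   # normalise the discretised normal weights


print(mean_prob_lower(v=0.2, a=0.1, z=0.05, eta=0.1, sz=0.02))
```

Increasing `n_v` and `n_z` tightens the approximation, which mirrors the point that predictions can be made as accurate as needed by using more integration steps.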

* Alternative computational methods for obtaining predictions for diffusion models have been described by Smith (2000) and Diederich and Busemeyer (2003).

* Smith (1995) and Smith and Ratcliff (2009) have proposed models

* Diederich and Busemeyer (2003) proposed a matrix method for obtaining predictions for diffusion models.

* In some situations, it is important to generate predictions by simulation because simulated data can show the effects of all the sources of variability on a subject’s RTs and accuracy.

* The number of simulated observations can be increased sufficiently that the data approach the predictions that would be determined exactly from the numerical method.

* The expression for the update of evidence, Δx, on every time step Δt during the decision process, is determined by the drift rate, v, plus a noise term (Gaussian random variable, εi with SD σ) to represent variability in processing:

<img src="figures/eq3.3.png" width=600 />

* This equation provides the most straightforward method of simulating the diffusion process, but it is not the most efficient.
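A minimal simulation along these lines is sketched below. The update equation is only available as an image, so the code assumes the usual Euler step, in which evidence changes by vΔt plus Gaussian noise with SD s√Δt. The function name `simulate_trial`, the step size, and the parameter values are illustrative assumptions, and nondecision time is added as a constant Ter without across-trial variability.

```python
import random


def simulate_trial(v, a, z, s=0.1, dt=0.001, ter=0.3, rng=random, max_time=10.0):
    """Simulate one decision; returns (response, RT in seconds).

    response is "upper", "lower", or None if no boundary was reached within
    max_time. RT = decision time + nondecision time (ter).
    """
    x = z
    t = 0.0
    step_sd = s * dt**0.5                 # assumed noise scaling per time step
    while 0.0 < x < a and t < max_time:
        x += v * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
    if x >= a:
        response = "upper"
    elif x <= 0.0:
        response = "lower"
    else:
        response = None
    return response, t + ter


rng = random.Random(2024)                 # fixed seed for reproducibility
trials = [simulate_trial(v=0.2, a=0.1, z=0.05, rng=rng) for _ in range(1000)]
n_upper = sum(resp == "upper" for resp, _ in trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(n_upper / len(trials), mean_rt)
```

Because boundary crossings are only checked once per step, a coarse `dt` slightly biases accuracy and RT; reducing `dt` and increasing the number of trials brings the simulated values closer to the numerical predictions.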

* In fitting the diffusion model to data, accuracy and RT distributions for correct and error responses for all the conditions of the experiment must be simultaneously fit and the values of all of the components of processing estimated simultaneously.

* In any data set, there is the potential problem of outlier RTs, which could be fast (e.g., fast guesses) or slow (e.g., inattention).

* New methods for fitting the diffusion model have been developed recently and, over the last 6 or 7 years, fitting packages have been made available by Vandekerckhove and Tuerlinckx (2007) and Voss and Voss (2007).

* Also, Bayesian methods have been developed (Vandekerckhove, Tuerlinckx, & Lee, 2011) and a Bayesian package by Wiecki, Sofer & Frank (2013) has been made available. These Bayesian methods also implement hierarchical modeling schemes, in which model parameters for individual subjects are assumed to be random samples from population distributions that are specified within the model. 

* To show how well the diffusion model fits data, we plot RT quantiles against the proportions for which the two responses are made. The top panel of Figure 3.5 shows a histogram for an RT distribution.

* These quantiles can be used to construct a quantile-probability plot by plotting the 0.1–0.9 quantile RTs vertically,

* Example RT distributions constructed from the equal area rectangles are also shown in grey. When there is a bias in starting point or when the two response categories are not symmetric (as in lexical decision and memory experiments), two quantile probability plots are needed, one for each response category.

* With quantile probability plots, changes in RT distribution locations and spread as a function of response proportion can be seen easily and compared with model fits

* In the bottom panel of Figure 3.5, the 1–5 symbols are the data and the solid lines are the predictions from fits of the model to the data (with circles denoting the exact location of the predictions).

<img src="figures/fig3.5-1.png" width=600 />

<img src="figures/fig3.5-2.png" width=600 />
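The points of a quantile-probability plot can be computed per condition and response as in the sketch below. It assumes five quantiles (0.1, 0.3, 0.5, 0.7, 0.9) within the 0.1–0.9 range; the function name and the example data are hypothetical, and `statistics.quantiles` uses its default interpolation method.

```python
import statistics


def quantile_probability_point(rts, n_condition_trials,
                               probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (response proportion, quantile RTs) for one response in one condition.

    rts: RTs for this response; n_condition_trials: all trials in the condition.
    The proportion is the x-coordinate, the quantile RTs are the y-coordinates.
    """
    deciles = statistics.quantiles(rts, n=10)      # cut points at 0.1, ..., 0.9
    quantile_rts = [deciles[round(p * 10) - 1] for p in probs]
    proportion = len(rts) / n_condition_trials
    return proportion, quantile_rts


# Hypothetical RTs (ms) for the responses of one type in a 200-trial condition.
example_rts = [420 + 3 * i + (i * i) % 40 for i in range(150)]
print(quantile_probability_point(example_rts, n_condition_trials=200))
```

Plotting the returned quantile RTs in a vertical column at the returned proportion, once for each condition, yields the data points that the model predictions are compared against.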

# Variants of the Standard Two-Choice Task

* RESPONSE SIGNAL AND DEADLINE TASKS
* MEYER, IRWIN, OSMAN, & KOUNIOS, (1988) PARTIAL INFORMATION PARADIGM
* TIME-VARYING PROCESSING
* GO/NOGO TASK

The model has also been successfully applied to paradigms in which decision time is manipulated. Here we discuss three of these.

## RESPONSE SIGNAL AND DEADLINE TASKS

* For response signal and deadline tasks, a signal is presented after the stimulus and a subject is required to respond as quickly as possible (in, say, 200–300 ms). For a deadline paradigm, the time between the stimulus and the signal is fixed across trials. For a response signal paradigm, the time varies from trial to trial (Reed, 1973; Schouten & Bekker, 1967; Wickelgren, 1977; Wickelgren, Corbett, & Dosher, 1980). With the deadline paradigm, subjects can adopt different strategies or criteria for different deadlines. 

* To apply the diffusion model to response signal data, Ratcliff (1988, 2006) assumed that there are response criteria just as for the standard two-choice task, and at some signal lag, responses come from a mixture of processes, those that have terminated at one or the other of the boundaries and those that have not. This is in accord with subjects’ intuitions that, at the long lags, the decision has already been made, the response has been chosen, and the subject is simply waiting for the signal. As the time between stimulus and signal decreases, a larger and larger proportion of processes will have failed to terminate.

* Differences among experimental conditions of different difficulties appear as differences in the proportions of accumulated information at the different lags. At the longest lags (2 or more seconds), all or almost all processes will have terminated. For nonterminated processes, there are two possibilities: that decisions are made on the basis of the partial information that has already been accumulated (Figure 3.6 top panel) or that they are simply guesses (Figure 3.6 middle panel).

<img src="figures/fig3.6-1.png" width=600 />

<img src="figures/fig3.6-2.png" width=600 />

<img src="figures/fig3.6-3.png" width=600 />

* Ratcliff (2006) tested between these possibilities with a numerosity discrimination experiment (subjects decide whether the number of asterisks displayed on a PC monitor is greater than or less than 50). The same subjects participated in the response signal task and the standard task and examples of the response signal data and model fits are shown in Figure 3.7. 

<img src="figures/fig3.7.png" width=600 />

* When the model was fit to the two sets of data simultaneously, it fit well and it fit equally well for the two possibilities for nonterminated processes. 

## MEYER, IRWIN, OSMAN, & KOUNIOS, (1988) PARTIAL INFORMATION PARADIGM

* Meyer et al. developed a method based on a race model that decomposes accuracy on the signal trials (at each signal lag) into a component from fast finishing regular trials and a component based on partial information.

* Ratcliff (1988) examined the predictions of the diffusion model with the assumption that decisions on signal trials were a mixture of processes that terminated at a boundary and processes based on position in the decision process, that is, partial information.

* Therefore, if a process was above the starting point (i.e., the black area in the vertical distribution in the top panel of Figure 3.6), the decision corresponded to the choice at the upper boundary.

* Figure 3.6 bottom panel shows a heat map of the evolution of simulated diffusion processes. 

* The hotter the color (whiter), the more processes in that region. 

* As time goes by, the color becomes cooler because there are fewer and fewer processes that have not terminated.

* This produces an almost stationary distribution (the distribution to the right of the heat map), which gradually collapses over time (the two vertical distributions in the top panel of Figure 3.6).

* For the case in which partial information is used in the decision, the expression for the distribution of the positions x of decision processes at time t is given by:

<img src="figures/eq3.4.png" width=600 />

* where s² is the diffusion coefficient, z is the starting point, a is the separation between the boundaries, and v is the drift rate.

* For model fitting, the expression in Equation 4 must be integrated over the normal distribution of drift rates and the uniform distribution of starting points to include variability in drift rate and starting point across trials. 

* This can be accomplished with numerical integration using Gaussian quadrature.

* The series in Equation 4 must be summed until it converges; 

* Then, to obtain the probability of choosing each response alternative, the proportion of processes between 0 and a/2 (for the negative alternative) and between a/2 and a (for the positive alternative) is calculated by integrating the expression for the density over position.
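The two assumptions about nonterminated processes can be compared by simulation instead of by the series expression. In this sketch, processes that have reached a boundary by the signal lag respond according to that boundary; the rest respond either according to which side of a/2 they are on (partial information) or by an unbiased guess. The function name, the parameter values, the Euler step with noise SD s√Δt, and the fixed random seed are all illustrative assumptions.

```python
import random


def response_signal_accuracy(lag, v, a, z, use_partial, n_trials=2000,
                             s=0.1, dt=0.001, seed=0):
    """Proportion of 'upper' responses when a response is forced at `lag` seconds."""
    rng = random.Random(seed)
    step_sd = s * dt**0.5
    n_steps = int(round(lag / dt))
    n_upper = 0
    for _ in range(n_trials):
        x = z
        for _ in range(n_steps):
            x += v * dt + step_sd * rng.gauss(0.0, 1.0)
            if x <= 0.0 or x >= a:
                break                       # terminated before the signal
        if x >= a:
            upper = True
        elif x <= 0.0:
            upper = False
        elif use_partial:
            upper = x > a / 2.0             # decide from current position
        else:
            upper = rng.random() < 0.5      # nonterminated process guesses
        n_upper += upper
    return n_upper / n_trials


for lag in (0.05, 0.2, 1.0):
    partial = response_signal_accuracy(lag, 0.2, 0.1, 0.05, use_partial=True)
    guess = response_signal_accuracy(lag, 0.2, 0.1, 0.05, use_partial=False)
    print(lag, partial, guess)
```

With positive drift, "upper" is the correct response. The two assumptions differ most at short lags, where few processes have terminated, and converge at long lags, where almost all have.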

## TIME-VARYING PROCESSING

* Ratcliff (1980) examined two cases in which drift rate changes across the time course of processing. For one, drift rate changes discretely at one fixed time. 

* For another case, boundaries are removed completely and drift rate and the drift coefficient varied continuously over time.

## GO/NOGO TASK

* In the go/no go task, subjects are told to respond for one of the two choices but to make no response for the other choice. 

* Gomez, Ratcliff, and Perea (2007) proposed that there are two response boundaries for the go/no go task just as for the standard task, but subjects make a response only when accumulated evidence reaches the “go” boundary.

* Gomez et al. successfully fit the model simultaneously to data from the standard task and data from the go/no go task. 

# Optimality

* In animal studies, performance has been described in terms of how close it comes to maximizing reward rate.

* This is part of a larger theme in neuroscience, which reprises the classical signal detection and sequential-sampling literatures, in which reward rate is used as a criterion for understanding whether neural computations approach optimality.

* However, when this kind of optimality is translated to human studies, the a priori reasonableness comes into question.

* This is because humans do not aim to get the most correct per unit time. Instead, they aim to get the most correct in the available time.

* If a student takes a 2-hour exam and obtains 60% correct in 1 hour, but another student gets 80% correct in 2 hours, the first has more correct per unit time, but the second would be more likely to pass the course.

* Bogacz, Brown, Moehlis, Holmes, and Cohen (2006) performed extensive analyses of optimality and set the stage for analyses of data. They showed that optimality as defined by <font color="red">reward rate</font> can be adjusted by <font color="red">changing boundary settings</font>. If the <font color="blue">boundaries are too far apart</font>, subjects are accurate but slow, and so there are few correct responses per unit of time. <font color="blue">If boundaries are too narrow</font>, RT is short but accuracy is low and there are few correct responses per unit of time.

* Reward-rate optimality predicts that when difficulty increases, subjects should speed up and sacrifice accuracy to keep up the number of correct responses per unit time. Results showed subjects did the opposite, slowing down with increases in difficulty. This is the result we might expect from years of academic training to spend more time on difficult problems.

* Starns and Ratcliff (2010) analyzed several published data sets with young and older adults and found that young adults with accuracy feedback sometimes approached reward-rate optimality. But older adults rarely moved more than a few percent away from asymptotic accuracy. 
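The trade-off between wide and narrow boundaries can be illustrated numerically. This sketch assumes the simplest case: an unbiased starting point (z = a/2), no across-trial variability, reward rate defined as proportion correct divided by total time per trial, and the standard closed forms for accuracy and mean decision time in that case. The function names, the 0.8 s of non-decision and inter-trial time, and the grid of boundary separations are illustrative assumptions.

```python
import math


def accuracy(v, a, s=0.1):
    """Proportion correct for an unbiased start (assumed closed form)."""
    return 1.0 / (1.0 + math.exp(-v * a / s**2))


def mean_decision_time(v, a, s=0.1):
    """Mean decision time in seconds for an unbiased start (assumed closed form)."""
    return (a / (2.0 * v)) * math.tanh(v * a / (2.0 * s**2))


def reward_rate(v, a, other_time=0.8, s=0.1):
    """Correct responses per second; other_time = nondecision + inter-trial time."""
    return accuracy(v, a, s) / (mean_decision_time(v, a, s) + other_time)


grid = [0.01 * i for i in range(1, 51)]            # boundary separations 0.01..0.50
best_a = max(grid, key=lambda a: reward_rate(0.2, a))
for a in (grid[0], best_a, grid[-1]):
    print(round(a, 2), round(accuracy(0.2, a), 3), round(reward_rate(0.2, a), 3))
```

Under these assumptions the reward rate peaks at an intermediate boundary separation: very narrow boundaries give fast but near-chance responses, and very wide boundaries give accurate responses that take too long.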

# Domains of Application

* Perceptual Tasks
* Recognition Memory
* Lexical Decision
* Semantic and Recognition Priming Effects
* Value-Based Judgments
* Aging
* Individual Differences
* Child Development
* Clinical Applications
* Manipulations of Homeostatic State

Here we describe a number of applications in which new insights into <font color="red">processing, individual differences</font>, and <font color="red">differences among subject groups</font> are obtained.

* But in other cases, even when the obvious results are obtained, the model integrates the three dependent variables, namely, accuracy and correct and error RT distributions, into a common theoretical framework that provides explanations of data that many hypothesis-testing approaches do not. Hypothesis-testing approaches usually select only accuracy or only mean RT as the dependent variable.

* In some cases, the two variables tell the same empirical story, but in other cases, they are inconsistent. The model based approach helps to resolve such inconsistencies.

## Perceptual Tasks

* Recently diffusion models have been applied to psychophysical discrimination tasks in which stimuli are presented very briefly, often at low levels of contrast, sometimes with backward masks to limit iconic persistence. The focus has been to understand the perceptual processes involved in the computation of drift rates.

* Psychophysical paradigms have historically been used mainly with threshold or accuracy measures but recent studies have collected accuracy and RT data.

* The standard application of the model assumes that, at some point in time after stimulus encoding, the decision process turns on, and evidence is accumulated toward a decision.

* This time is assumed to be the same across conditions and drift rate is assumed to be at a constant value from the point the process turns on.

* The assumption of a constant drift rate could be relaxed:

* Smith and Ratcliff (2009) developed a model, the integrated system model, that is a <font color="red">continuous-flow model</font> comprised of <font color="blue">perceptual, memory, and decision processes operating</font> in cascade.

* The perceptual encoding processes are linear filters (Watson, 1986) and the transient outputs of the filters are encoded in a durable form in visual short-term memory (VSTM), which is under the control of spatial attention.

* The strength of the VSTM trace determines the drift rate for the diffusion process and the moment-to-moment variations in trace strength act as a source of noise in the decision process.

* The model has successfully accounted for accuracy and RT distributions in tasks with brief backward-masked stimuli.

* The main area of application of the integrated system model has been to tasks in which spatial attention is manipulated by spatial cues.

* In many cuing tasks, in which a single well-localized stimulus is presented in an otherwise empty display, attention shortens RT but increases accuracy only when stimuli are masked (Smith, Ratcliff, & Wolfgang, 2004; Smith, Ellis, Sewell, & Wolfgang, 2010).

* The model assumes that attention increases the efficiency with which perceptual information is transferred to VSTM and that masks interrupt the process of VSTM trace formation before it is complete. 

* Diederich and Busemeyer (2006) also considered the effects of attention on decision-making in a diffusion-process framework, studying decisions about multi-attribute stimuli for which it is plausible that people shift their attention sequentially from one attribute of a stimulus to the next.

* They assumed that some attributes would provide more information than others and modeled this successfully as a sequence of step changes in drift rate during the course of a trial.

## Recognition Memory

* One of the early applications of the diffusion model was to recognition memory. 

* In global memory models, a test item is matched against all <font color="red">memory in parallel</font>, and the <font color="red">output is a single value of strength or familiarity</font>. 

* From this point of view, the diffusion model provides a meeting point between the decision process and memory, specifically, the <font color="blue">drift rate</font> for a test item <font color="blue">represents the degree of match</font> between a test item and <font color="red">memory</font>.

* In signal detection approaches to recognition memory, there has been considerable interest in the <font color="red">relative standard deviations (SDs) in strength between old and new test items</font>, typically measured by confidence judgement paradigms.

* The common finding is that z-ROC functions (i.e., z-score transformed receiver operating characteristics) are approximately linear with a slope less than 1 (e.g., Ratcliff, Sheu, & Gronlund, 1992).

* Two accounts of this finding have been proposed. <font color="red">One is a single-process model</font> that assumes memory strength is normally distributed, with a larger SD for old items than for new items.

* The other is a <font color="red">dual-process model</font> in which the familiarity of old and new items comes from normal distributions with <font color="red">equal SDs</font> but there is an additional <font color="red">recollection process</font>

* In fits of the diffusion model to recognition memory data, it has been usually assumed that the SD in drift rate across trials is the same for studied and new items. 

* Starns and Ratcliff (2014) performed an analysis of existing data sets that <font color="red">allowed the across-trial variability in drift rate</font> to be different for studied and new items. 

* The advantage of this analysis is that the relative variability of studied and new items was able to be determined from two-choice data and did not require confidence judgments.

## Lexical Decision

* Much like recognition memory, a test item for lexical decision is matched against memory. 

* The output is a value of how “wordlike” the item is. For <font color="red">sequential sampling models</font>, proposals about <font color="blue">how lexical items are accessed in memory</font> must provide output values that, when mapped through a sequential sampling model, produce RTs and accuracy that fit data (Ratcliff, Gomez, & McKoon, 2004). 

* Often, <font color="red">lexical decision response time (RT)</font> has been <font color="blue">interpreted as a direct measure of the speed</font> with <font color="blue">which a word can be accessed in the lexicon</font>.

* For example, some researchers have argued that the well-known effect of <font color="red">word frequency</font>—shorter RTs for higher frequency words—demonstrates the greater accessibility of high frequency words (e.g., their order in a serial search, Forster, 1976; the resting levels of activation in units representing the words in a parallel processing system, Morton, 1969). However, other researchers have argued, as we do here, against a direct mapping from RT to accessibility. 

* For example, Balota and Chumbley (1984) suggested that the effect of word frequency might be a <font color="red">by-product of the nature of the task itself</font>, and <font color="blue">not a manifestation of accessibility</font>.

## Semantic and Recognition Priming Effects

* For <font color="red">semantic priming</font>, the task is usually a lexical decision. <font color="blue">A target word</font> is <font color="blue">immediately preceded</font> in a test list either by a word related to it (e.g., <font color="blue">cat dog</font>) or some other word (e.g. table dog). 

* For recognition priming, the task is old/new recognition and a target word is immediately preceded by a word that was studied near to it in the list of items to be remembered or far from it.

* <font color="red">In the diffusion model</font>, the <font color="blue">simplest assumption about priming effects</font> is that they result from <font color="blue">higher drift rates for primed than unprimed items</font>.

* It has been hypothesized that the <font color="red">difference in drift rates between primed and unprimed items</font> arises from the familiarity of compound cues to memory

* The compound cue for an item is a multiplicative combination of the familiarity of the target word and the familiarity of the prime

* If the prime and target words are related in memory, the combination produces a higher value of the joint familiarity than if they were not related.

* McKoon and Ratcliff (2012) compared priming in word recognition to associative recognition.

* Subjects studied pairs of words and then performed either a single-word recognition task or an associative recognition task (see also Ratcliff, Thapar, & McKoon, 2011).

* Data from the two tasks were fit with the <font color="red">diffusion model</font> and the <font color="blue">results showed parallel behavior</font>: the drift rates for associative recognition and those for priming were parallel across ages and IQ, indicating that they are based, at least to some degree, on the same information in memory.
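The compound-cue account above can be illustrated with a small numerical sketch. The functional form (a weighted product of cue strengths summed over memory images), the weight `prime_weight`, and every strength value are assumptions chosen for illustration; none of them come from the chapter.

```python
def compound_familiarity(prime_strengths, target_strengths, prime_weight=0.3):
    """Joint familiarity of a prime-target compound cue.

    Each argument lists the (hypothetical) strength of association between
    one cue word and every image in memory. The weighting exponents are an
    assumption in the style of global matching memory models.
    """
    target_weight = 1.0 - prime_weight
    return sum(
        (p ** prime_weight) * (t ** target_weight)
        for p, t in zip(prime_strengths, target_strengths)
    )


# Hypothetical strengths to four memory images: [cat, dog, table, chair].
# A word is strongly linked to its own image (1.0), moderately to a related
# word (0.6), and weakly to everything else (0.1).
dog = [0.6, 1.0, 0.1, 0.1]
cat = [1.0, 0.6, 0.1, 0.1]
table = [0.1, 0.1, 1.0, 0.6]

related = compound_familiarity(cat, dog)      # prime "cat", target "dog"
unrelated = compound_familiarity(table, dog)  # prime "table", target "dog"
print(f"related: {related:.3f}, unrelated: {unrelated:.3f}")
```

Because the combination is multiplicative, the joint familiarity is high only when prime and target are both strongly linked to the same images. In a diffusion model fit, the higher joint familiarity would correspond to a higher drift rate for the primed target.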

## Value-Based Judgments

* Busemeyer and Townsend (1993) developed a diffusion model called decision field theory to explain choices and decision times for decisions under uncertainty, and later Roe, Busemeyer, and Townsend (2001) extended it to multi-alternative and multi-attribute situations.

* According to the theory, at each moment in time, options are compared in terms of advantages and disadvantages with respect to an attribute. These evaluations are accumulated across time until a threshold is reached, and the first option to cross the threshold determines the choice that is made.

* The theory accounts for a number of findings that seem paradoxical from the perspective of rational choice theory. 

* Milosavljevic, Malmaud, Huth, Koch, & Rangel (2010) examined several variants of diffusion models for value-based judgments. 

* They examined value-based judgments for food items and had subjects choose which of two alternatives they preferred. They monitored eye fixations and in modeling, they assumed evidence was accumulated at a higher rate for the alternative fixated.

* Philiastides and Ratcliff (2013) examined value-based judgments of consumer choices with brand names presented on some trials as well as the items for which the choices were made.

* Application of the diffusion model showed that the effect of the brand was to alter drift rate but none of the other parameters of the model. This means that the value of the stimulus and brand name were processed as a whole.

* Currently, there is a growing interest in the application of diffusion models to decision-making in marketing and economics, including neuroeconomics. 
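The assumption that evidence accumulates at a higher rate for the fixated alternative can be sketched as a simulation. This is a minimal illustration, not the model fit in the cited work: the discount applied to the unfixated item (`unfixated_discount`), the fixed fixation length, and all parameter values are hypothetical.

```python
import random


def simulate_choice(value_left, value_right, p_fixate_left, rng,
                    unfixated_discount=0.3, drift_scale=1.0,
                    boundary=1.0, noise_sd=1.0, dt=0.01, fixation_steps=30):
    """Simulate one value-based choice; return (choice, decision time).

    Relative evidence moves toward +boundary for "left" and -boundary for
    "right". The value of the item that is not fixated is multiplied by
    unfixated_discount, so the fixated item gains evidence faster.
    """
    evidence, t, step = 0.0, 0.0, 0
    looking_left = rng.random() < p_fixate_left
    while abs(evidence) < boundary:
        if step > 0 and step % fixation_steps == 0:
            # Start a new fixation; its target is drawn independently.
            looking_left = rng.random() < p_fixate_left
        if looking_left:
            drift = drift_scale * (value_left - unfixated_discount * value_right)
        else:
            drift = drift_scale * (unfixated_discount * value_left - value_right)
        evidence += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        step += 1
    return ("left" if evidence > 0 else "right"), t


def proportion_left(p_fixate_left, n_trials=500, seed=1):
    """Proportion of "left" choices for two equally valued items."""
    rng = random.Random(seed)
    choices = [simulate_choice(1.0, 1.0, p_fixate_left, rng)[0]
               for _ in range(n_trials)]
    return choices.count("left") / n_trials


mostly_left = proportion_left(p_fixate_left=0.9)
mostly_right = proportion_left(p_fixate_left=0.1)
print(f"P(choose left), fixating left 90% of the time: {mostly_left:.2f}")
print(f"P(choose left), fixating left 10% of the time: {mostly_right:.2f}")
```

Even though the two items have equal value here, the item that is looked at more often is chosen more often, because only the drift rate changes with fixation.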

## Aging

* The application of the diffusion model to studies of aging has been especially successful, producing a different view of the effects of aging on cognition than has been usual in aging research. 

* These studies found that older adults had longer nondecision times and set their boundaries wider, but their drift rates were not lower than those of young adults.
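A minimal sketch of what this parameter pattern implies for behavior, using the standard closed-form expressions for accuracy and mean decision time of a diffusion process. It assumes a starting point midway between the boundaries and no across-trial variability in any parameter. The parameter values for the two groups are hypothetical and were chosen only to mirror the qualitative pattern above.

```python
import math


def predict(drift, boundary_sep, nondecision, noise_sd=0.1):
    """Return (accuracy, mean RT in seconds) for an unbiased diffusion process.

    Assumes the process starts halfway between the two boundaries and that
    drift, boundaries, and nondecision time do not vary across trials.
    """
    k = drift * boundary_sep / noise_sd ** 2
    accuracy = 1.0 / (1.0 + math.exp(-k))
    mean_decision_time = (boundary_sep / (2.0 * drift)) * math.tanh(k / 2.0)
    return accuracy, mean_decision_time + nondecision


# Hypothetical groups: same drift rate, but the older group has wider
# boundaries and a longer nondecision time.
young_acc, young_rt = predict(drift=0.25, boundary_sep=0.10, nondecision=0.40)
older_acc, older_rt = predict(drift=0.25, boundary_sep=0.14, nondecision=0.48)

print(f"young: accuracy={young_acc:.3f}, mean RT={young_rt:.3f} s")
print(f"older: accuracy={older_acc:.3f}, mean RT={older_rt:.3f} s")
```

With equal drift rates, the wider boundaries and longer nondecision time make the hypothetical older group slower but no less accurate, which is why slower RTs alone do not imply poorer evidence quality.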

## Individual Differences

* The diffusion model has been used to examine individual differences. To do so requires that the SDs in model parameters from estimation variability are smaller than the SDs between subjects. 

* In the aging studies described earlier, with about 45 minutes of data collection, individual differences in drift rates, boundary settings, and nondecision time were three to five times larger than the SDs of the model parameters. 

* Drift rates in the diffusion model were found to map onto working memory, speed of processing, and reasoning ability measures (each of these was measured by aggregated performance on several tasks).

* In aging studies by Ratcliff et al. (2010, 2011), IQs ranged from about 80 to about 140.

* In most real-life situations, we rarely encounter more than single decisions on a particular stimulus class (except perhaps at Las Vegas or in psychology experiments).

* This means that there is little chance of adjusting decision criteria in real life because there is little extended experience with a task in which the decision maker can extract statistics from a long sequence of trials in which the structure of the trials does not change.

* The diffusion model assumes that a decision maker uses this decision mechanism across many tasks, and so we would expect to see correlations in boundary separation across tasks. 

## Child Development

* A natural extension from the aging studies is to test children on similar tasks to those performed with older adults to trace the course of development within the model framework. 

* Ratcliff, Love, Thompson, and Opfer (2012) tested several groups of children on a numerosity discrimination task and a lexical decision task.

* In other laboratories, drift rates have been found to be lower for ADHD and dyslexic children relative to normal controls (ADHD, Mulder et al., 2010; dyslexia, Zeguers et al., 2011).

* These studies show that the diffusion model can be applied to data collected from children, a domain in which there has been relatively little research with decision models.

## Clinical Applications

* In research on psychopathology and clinical populations, two-choice tasks are commonly used to investigate processing differences between patients and healthy controls. 

* Depressive symptoms are more closely linked with abnormal emotional processing with a negative emotional bias in clinical depression, even-handedness (i.e., no emotional bias) in dysphoria, and a positive emotional bias in nondepressed individuals.

* However, item recognition and lexical decision tasks often fail to produce significant results.

* In drift rates, a positive emotional bias was found for nondysphoric subjects and even-handedness for dysphoric subjects.

* One limitation of these studies and similar ones is that there may be relatively few materials with the right kinds of properties or structures (this is also true in language processing experiments, for example).

* The emotional word pools for the experiments only contained 30 words each.

* This left relatively few observations (especially for errors) to use in fitting the diffusion model, which would result in unreliable parameter estimates.

* The results showed a bias for positive emotional words in the nondysphoric participants, but not in the dysphoric participants (White et al., 2009).

* This difference in emotional bias was not significant when the diffusion model was fit only to the emotional conditions with few observations, nor was it significant in comparisons of mean RT or accuracy.

* Another study examined the effects of aphasia in a lexical decision task.

* In diffusion model analyses, decision and nondecision processes were compromised, but the quality of the information upon which the decisions were based did not differ much from that of unimpaired subjects (Ratcliff, Perea, Colangelo, & Buchanan, 2004).

## Manipulations of Homeostatic State

* Ratcliff and Van Dongen (2009) looked at effects of sleep deprivation with a numerosity discrimination task, van Ravenzwaaij, Dutilh, and Wagenmakers (2012) looked at the effects of alcohol consumption with a lexical decision task, and Geddes et al. (2010) looked at the effects of reduced blood sugar with a numerosity discrimination task.

* Applying the model to all of these studies, the main effect was a reduced drift rate but with either small or no effect on boundary separation and nondecision time.

* These results show that the diffusion model is useful in providing interpretations of group differences among different subject populations.

* This means that this modeling approach, when paired with the right tasks, may have a useful role to play in neuropsychological assessment.

# Situations in Which the Standard Model Fails

* There are several cases in which the standard diffusion model fails to account for experimental data.

* These fall into two classes: one involves dynamic noise and categorical stimuli and the other involves conflict paradigms. 

# Competing Two-Choice Models

* Multichoice Decision-Making and Confidence Judgments
* One-Choice Decisions
* Neuroscience
    - MONKEY NEUROPHYSIOLOGY
* Human Neuroscience
    - EEG SUPPORT FOR ACROSS-TRIAL VARIABILITY IN DRIFT RATE
    - EEG SUPPORT FOR ACROSS-TRIAL VARIABILITY IN STARTING POINT
    - STRUCTURAL MRI
    - FMRI    

<img src="figures/fig3.8.png" width=600 />

<img src="figures/eq3.5.png" width=600 />

## Multichoice Decision-Making and Confidence Judgments

* Recently, interest in multichoice decision-making tasks has developed in the neuroscience domain for visual search (Basso & Wurtz, 1998; Purcell et al., 2010) and motion discrimination (Niwa & Ditterich, 2008; Ditterich, 2010).

* In psychology, there have been investigations using generalizations of standard two-choice tasks (Leite & Ratcliff, 2010) and in absolute identification (Brown, Marley, Donkin, & Heathcote, 2008).

* It is clear that there is no simple way to extend the two-choice model to tasks with three or more choices. But models with racing accumulators can be extended. Some models with racing accumulators become standard diffusion models when the number of choices is reduced to two.

* Research on multichoice decision making, including confidence judgments, is a growing industry, but the constraints provided by RT distributions and response proportions for the different choices make the modeling quite challenging.
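The idea of racing accumulators can be sketched with one accumulator per alternative, where the first to reach a threshold determines the response. This is an independent race, the simplest variant and an assumption made here for illustration. It is not one of the variants that reduces to the standard diffusion model with two choices. The drift rates and other parameter values are hypothetical.

```python
import random


def race_trial(drifts, rng, threshold=1.0, noise_sd=1.0, dt=0.01):
    """Run one race; return (index of winning accumulator, decision time)."""
    evidence = [0.0] * len(drifts)
    t = 0.0
    while True:
        t += dt
        # Every accumulator gets its own drift plus independent noise.
        for i, drift in enumerate(drifts):
            evidence[i] += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        leader = max(range(len(drifts)), key=evidence.__getitem__)
        if evidence[leader] >= threshold:
            return leader, t


rng = random.Random(7)
drifts = [1.5, 0.5, 0.5]  # hypothetical: alternative 0 has the strongest evidence
wins = [0] * len(drifts)
times = []
for _trial in range(1000):
    winner, decision_time = race_trial(drifts, rng)
    wins[winner] += 1
    times.append(decision_time)

print("response proportions:", [w / sum(wins) for w in wins])
print(f"mean decision time: {sum(times) / len(times):.3f} s")
```

Adding a fourth or fifth alternative only requires a longer `drifts` list. A full model would also have to reproduce the RT distribution for each response, which is where the constraints mentioned above become demanding.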

## One-Choice Decisions

* Relatively little work has been done recently on one-choice decisions. In these, there is only one key to press when a stimulus is detected.
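One common way to model such a task, assumed here rather than taken from these notes, is a diffusion process with a single boundary: evidence accumulates until it reaches the boundary and the key is pressed. For a constant drift, the mean first-passage time is the boundary divided by the drift rate, which the simulation below checks approximately. All parameter values are hypothetical.

```python
import random


def one_choice_rt(drift, boundary, rng, nondecision=0.2, noise_sd=1.0, dt=0.002):
    """Simulate one detection RT (seconds) from a single-boundary diffusion."""
    evidence, t = 0.0, 0.0
    while evidence < boundary:
        evidence += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return t + nondecision


rng = random.Random(3)
drift, boundary, nondecision = 2.0, 1.0, 0.2
rts = [one_choice_rt(drift, boundary, rng, nondecision) for _ in range(2000)]

mean_rt = sum(rts) / len(rts)
expected_mean = boundary / drift + nondecision  # mean first-passage time + nondecision
print(f"simulated mean RT: {mean_rt:.3f} s, theoretical: {expected_mean:.3f} s")
```

With only one response there are no error RTs or choice proportions, so the RT distribution alone has to constrain the model.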

## Neuroscience

* One of the major advances in understanding decision making has come from neuroscience applications: single-cell recording in monkeys (and rats), and human neuroscience including fMRI, EEG, and MEG.

* All these domains have had interactions between diffusion model theory and neuroscience measures. 

### MONKEY NEUROPHYSIOLOGY

* In both psychology and neuroscience, theories of decision processes have been developed that assume that evidence is gradually accumulated over time. 

* In these studies, cells in the lateral intraparietal cortex (LIP), frontal eye field (FEF), and the superior colliculus (SC) exhibit behavior that corresponds to a gradual buildup in activity that matches the buildup in evidence in making simple perceptual decisions.

<img src="figures/fig3.9.png" width=600 />

<img src="figures/fig3.10.png" width=600 />

## Human Neuroscience

### EEG SUPPORT FOR ACROSS-TRIAL VARIABILITY IN DRIFT RATE

### EEG SUPPORT FOR ACROSS-TRIAL VARIABILITY IN STARTING POINT

### STRUCTURAL MRI

### FMRI

# Conclusions

# References

* [1] The Oxford Handbook of Computational and Mathematical Psychology - http://www.amazon.com/Handbook-Computational-Mathematical-Psychology-Library/dp/0199957991
* [2] The Ratcliff diffusion model : Extensions and applications -  http://www.powershow.com/view/f5ee8-MWI3Y/The_Ratcliff_diffusion_model_powerpoint_ppt_presentation