Leinenger_2018_JEP:LMC
Mallorie Leinenger June 28, 2018
Leinenger M. (in press) Survival analyses reveal how early phonological processing affects eye movements during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition
Overview
These data come from four experiments investigating the time course of phonological coding during reading. Participants read single-sentence stimuli (see example stimuli below) while their eye movement behavior was recorded. Sentences contained either a correct target word that was moderately predictable in context (e.g., beach, sleep), a contextually inappropriate (non-)word that shared phonology with the correct target word (e.g., beech, sleap), or a contextually inappropriate (non-)word that was matched on orthographic overlap but did not share phonology (e.g., bench, slerp).
- The surfers traveled to the world-famous beach/beech/bench where the waves were very large. (Exps. 1 & 3)
- Because of his insomnia, Caleb couldn’t sleep/sleap/slerp even though he was tired. (Exps. 2 & 4)
In Experiments 1 & 2 participants directly fixated each type of target word, and in Experiments 3 & 4 the phonologically related and orthographic control (non-)words were presented as parafoveal previews using the invisible boundary display-change paradigm (Rayner, 1975). In addition to means analyses, survival analyses were conducted (following the method outlined in Reingold & Sheridan, 2014) to determine the earliest observable influence of phonological coding on the eye movement record.
Abstract
Numerous studies have provided evidence that readers generate phonological codes while reading. However, a central question in much of this research has been how early these codes are generated. Answering this question has implications for the roles that phonological coding might play for skilled readers, especially whether phonological codes affect the identification of most words, which can only be the case if these codes are generated rapidly. To investigate the time course of phonological coding during silent reading, the present series of experiments examined survival analyses of first-fixation durations on phonologically related (homophones, pseudohomophones) and orthographic control (orthographically matched words and non-words) stimuli that were either embedded in sentences in place of correct targets (Experiments 1 and 2) or presented as parafoveal previews for correct targets using the boundary paradigm (Experiments 3 and 4). Survival analyses revealed a discernible difference between processing the phonologically related versus the orthographic control items by as early as 160 ms from the start of fixation on average (160–173 ms across experiments). Because only approximately 18% of first fixation durations were shorter than these mean estimates and follow-up tests revealed that earlier divergence point estimates were associated with shorter gaze durations (e.g., more rapid word identification), results suggest that skilled readers rapidly generate phonological codes during normal, silent reading and that these codes may affect the identification of most words.
Experiment 1
Data for 48 participants (subject) reading 168 sentences (item) that either contained the correct target word, a homophone, or an orthographic control word.
- The surfers traveled to the world-famous beach/beech/bench where the waves were very large.
ffd = first fixation duration, sfd = single fixation duration, gzd = gaze duration, gpt = go-past time, tvt = total time, skp = fixation probability (inverse of skipping), rgi = regression-in probability, rgo = regression-out probability.
## 'data.frame': 7698 obs. of 12 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ item : Factor w/ 168 levels "1","2","3","4",..: 1 2 3 4 5 7 8 9 10 11 ...
## $ condition: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 1 2 3 1 2 ...
## $ ffd : int 265 158 NA NA 198 NA 230 NA NA NA ...
## $ sfd : int 265 158 NA NA 198 NA 230 NA NA NA ...
## $ gzd : int 265 158 NA NA 198 NA 230 NA NA NA ...
## $ tvt : int 265 158 NA 173 198 NA 230 NA NA NA ...
## $ gpt : int 265 158 NA NA 198 NA 230 NA NA NA ...
## $ skp : int 1 1 0 0 1 0 1 0 0 0 ...
## $ rgi : int 0 0 0 1 0 0 0 0 0 0 ...
## $ rgo : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Condition: Factor w/ 3 levels "Control","Homophone",..: 3 2 1 3 2 3 2 1 3 2 ...
## subject item condition ffd sfd gzd tvt gpt skp rgi rgo Condition
## 1 1 1 1 265 265 265 265 265 1 0 0 Target
## 2 1 2 2 158 158 158 158 158 1 0 0 Homophone
## 3 1 3 3 NA NA NA NA NA 0 0 0 Control
## 4 1 4 1 NA NA NA 173 NA 0 1 0 Target
## 5 1 5 2 198 198 198 198 198 1 0 0 Homophone
## 6 1 7 1 NA NA NA NA NA 0 0 0 Target
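To make the duration measures listed above concrete, here is a minimal sketch that derives them from a single, made-up sequence of fixations. The function name `fixation_measures`, the `region` coding, and all durations are hypothetical; this is not the pipeline that produced the data files, only an illustration of the standard definitions.

```r
# Toy fixation record for one trial (all values hypothetical).
# Each row is one fixation in temporal order; `region` is the word position.
fix <- data.frame(
  region   = c(4, 5, 5, 3, 5, 6),
  duration = c(210, 178, 256, 190, 220, 240)
)
target <- 5

fixation_measures <- function(fix, target) {
  on  <- fix$region == target
  # Total time: every fixation on the target, regardless of pass.
  tvt <- if (any(on)) sum(fix$duration[on]) else NA_real_
  out <- c(ffd = NA_real_, sfd = NA_real_, gzd = NA_real_,
           gpt = NA_real_, tvt = tvt)
  if (!any(on)) return(out)
  first <- which(on)[1]
  # First-pass measures are undefined if a later word was fixated first
  # (the target was skipped and only reached by a regression).
  if (any(fix$region[seq_len(first - 1)] > target)) return(out)
  after <- seq(first, nrow(fix))
  # First pass ends at the first fixation that leaves the target.
  left <- after[fix$region[after] != target]
  pass_end <- if (length(left)) left[1] - 1 else nrow(fix)
  first_pass <- first:pass_end
  # Go-past time ends at the first fixation to the right of the target.
  right <- after[fix$region[after] > target]
  gp_end <- if (length(right)) right[1] - 1 else nrow(fix)
  out["ffd"] <- fix$duration[first]
  out["gzd"] <- sum(fix$duration[first_pass])
  if (length(first_pass) == 1) out["sfd"] <- out["ffd"]
  out["gpt"] <- sum(fix$duration[first:gp_end])
  out
}

fixation_measures(fix, target)
# ffd = 178, sfd = NA, gzd = 434, gpt = 844, tvt = 654
```

Here the target received two first-pass fixations, so sfd is NA while gzd is their sum, and the regression to word 3 counts toward gpt but not gzd.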
Means & Standard Errors for Exp. 1
byCond <- group_by(data, Condition)
stats.m<-summarize_if(byCond,is.numeric,funs(mean), na.rm=TRUE)
stats.se<-summarize_if(byCond,is.numeric,funs(std.error), na.rm=TRUE)
stats.m
## # A tibble: 3 x 9
## Condition ffd sfd gzd tvt gpt skp rgi rgo
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 245. 249. 276. 436. 361. 0.671 0.361 0.119
## 2 Homophone 241. 244. 266. 372. 325. 0.668 0.264 0.0998
## 3 Target 225. 226. 239. 272. 280. 0.662 0.121 0.0736
stats.se
## # A tibble: 3 x 9
## Condition ffd sfd gzd tvt gpt skp rgi rgo
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 2.27 2.53 3.17 6.56 6.59 0.00929 0.00949 0.00639
## 2 Homophone 2.13 2.36 2.84 5.64 5.21 0.00932 0.00872 0.00593
## 3 Target 1.86 1.94 2.23 3.53 4.19 0.00931 0.00643 0.00514
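For readers without the packages used above, the same per-condition means and standard errors can be sketched in base R. I am assuming that `std.error` computes the standard deviation divided by the square root of the number of non-missing observations; the helper `se` and the toy data frame are hypothetical.

```r
# Standard error of the mean over the non-missing observations.
se <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Hypothetical first-fixation durations (ms); NA marks a skipped word.
toy <- data.frame(
  Condition = rep(c("Control", "Homophone", "Target"), each = 4),
  ffd = c(250, 240, NA, 245,  238, 244, 241, NA,  220, 230, 225, NA)
)

# The formula interface drops rows with a missing ffd before aggregating.
stats_m  <- aggregate(ffd ~ Condition, data = toy, FUN = mean)
stats_se <- aggregate(ffd ~ Condition, data = toy, FUN = se)
stats_m
stats_se
```

Because skp, rgi, and rgo are coded 0/1, their means in the tables above are proportions.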
Figure of early measures for Exp. 1
Because the pattern is easier to see in a figure than in a table, here are the earliest measures. Readers spent the least time fixating (i.e., processing) the correct target, the most time fixating the orthographic control word, and an intermediate amount of time fixating the homophone, demonstrating an advantage for processing a phonologically related word over a word that shares the same number of letters (i.e., is physically similar).
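The figure code is not shown here, so the following is only a rough base-R sketch of how such a plot might be drawn. It assumes that the "early measures" are ffd, sfd, and gzd, and it uses the rounded Exp. 1 means and standard errors from the tables above rather than the raw data.

```r
# Condition means and standard errors (ms), copied from the Exp. 1 tables.
means <- rbind(
  Target    = c(ffd = 225, sfd = 226, gzd = 239),
  Homophone = c(ffd = 241, sfd = 244, gzd = 266),
  Control   = c(ffd = 245, sfd = 249, gzd = 276)
)
ses <- rbind(
  Target    = c(1.86, 1.94, 2.23),
  Homophone = c(2.13, 2.36, 2.84),
  Control   = c(2.27, 2.53, 3.17)
)

# Grouped bars: one group per measure, one bar per condition.
mids <- barplot(means, beside = TRUE, ylim = c(0, 300),
                ylab = "Duration (ms)", legend.text = rownames(means),
                args.legend = list(x = "topleft", bty = "n"))
# Error bars of +/- 1 standard error around each mean.
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.03)
```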
Survival Analyses for Exp. 1
Although this graded processing suggests that readers activate phonological codes that help them more easily process words phonologically related to the expected/contextually appropriate target, it is not clear just how rapidly these codes come online and begin to influence behavior. To better characterize the time course of code generation, survival analyses were conducted to determine the earliest observable influence of phonology on behavior.
In these analyses, the rate of "surviving" fixations is compared across two conditions (here the phonologically related and orthographic control conditions) to determine the point at which the survival curves diverge (i.e., the point in time at which more readers have left the phonologically related word than the orthographic control word), which indicates easier processing in one condition relative to the other. The divergence point then reflects the earliest observable influence of phonological coding on behavior.
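The logic of a divergence point can be sketched in a few lines of base R. This is a deliberately simplified illustration, not the RTsurvival implementation and not the full bootstrap procedure: the function names, the 1-ms bins up to 600 ms, the 1.5% difference threshold, and the run of five consecutive bins are all assumptions chosen for the example.

```r
# Proportion of fixations still "surviving" (duration > t) at each bin t.
survival_curve <- function(durations, bins = 1:600) {
  vapply(bins, function(t) mean(durations > t), numeric(1))
}

# Earliest bin that starts a run of `run` consecutive bins in which the
# slower condition's survival exceeds the faster one's by >= `threshold`.
divergence_point <- function(fast, slow, bins = 1:600,
                             threshold = 0.015, run = 5) {
  above <- (survival_curve(slow, bins) - survival_curve(fast, bins)) >= threshold
  runs   <- rle(above)
  starts <- cumsum(runs$lengths) - runs$lengths + 1
  hit    <- which(runs$values & runs$lengths >= run)
  if (length(hit) == 0) return(NA_integer_)
  bins[starts[hit[1]]]
}

# Extreme toy case: every "fast" fixation lasts 200 ms, every "slow" one 300 ms,
# so the curves separate exactly when the fast fixations end.
divergence_point(fast = rep(200, 100), slow = rep(300, 100))   # 200

# Identical distributions never diverge.
divergence_point(fast = rep(250, 100), slow = rep(250, 100))   # NA
```

With real data the curves are noisy, which is why the estimate is normally obtained by re-sampling rather than from a single pass like this one.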
For the paper, I used Matlab to calculate divergence point estimates; here, however, I am using the new RTsurvival package.
tmp<-select(data,subject,ffd,condition) %>% arrange(condition) %>%
rename(duration=ffd)
tmp$condition <- as.numeric(tmp$condition)
survdata<- as.data.frame(tmp %>% as_tibble() %>% mutate(condition = condition-1)) %>%
filter(condition != 0, !is.na(duration))
survdata$condition <- as.factor(survdata$condition)
str(survdata)
## 'data.frame': 3425 obs. of 3 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ duration : int 158 198 230 198 304 484 260 173 244 390 ...
## $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
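One detail of the recoding above is worth spelling out: `as.numeric()` on a factor returns the internal level codes, which match the labels here only because the levels are "1", "2", "3" in that order. A small sketch with made-up factors (all object names hypothetical):

```r
# Labels equal to their integer codes, as in `data$condition`.
f <- factor(c("1", "2", "3", "1", "2"))
codes_f <- as.numeric(f)                    # 1 2 3 1 2

# When labels and codes differ, as.numeric() alone returns the codes.
g <- factor(c("10", "20", "30"))
codes_g  <- as.numeric(g)                   # 1 2 3
labels_g <- as.numeric(as.character(g))     # 10 20 30

# The recode used above: shift down by one, drop the correct target (0),
# leaving 1 = phonologically related and 2 = orthographic control.
cond <- codes_f - 1
recoded <- factor(cond[cond != 0])
levels(recoded)                             # "1" "2"
```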
We have 48 participants who each have between 35 and 100 data points:
n.per.sbj <- table(survdata$subject)
length(n.per.sbj)
## [1] 48
range(n.per.sbj)
## [1] 35 100
We can now use these data to generate divergence point estimates (DPE) for each participant:
ip.dpa <- DPA.ip(survdata$subject, survdata$duration, survdata$condition, quiet = TRUE)
dpe <- as.data.frame(ip.dpa$dp_matrix)
# critical columns in output
# 'dpcount' = the number of iterations (out of 1000) on which a DPE was obtained
# 'median_duration' = median of the DPEs obtained on each iteration
str(dpe)
## 'data.frame': 48 obs. of 6 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ dpcount : num 1000 1000 743 939 802 1000 816 1000 1000 1000 ...
## $ median_dp_point: num 45 1 621 209 1 109 998 1 1 1 ...
## $ median_duration: num 162 136 202 196 142 ...
## $ ci.lower : num 161.5 136.5 92.5 179 141.5 ...
## $ ci.upper : num 162 194 204 240 249 ...
dpe$subject[dpe$dpcount<500]
## [1] 19 20 21 25 27 30 38
## 48 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ... 48
Doing so reveals that the DPEs for 7 participants were unreliable (i.e., a DP was found on fewer than half of the iterations). Removing those participants yields a mean DPE of ~173 ms across the remaining participants (the value shifts slightly each time the bootstrap re-sampling procedure runs). This tells us that, on average, phonological coding was influencing behavior as early as 173 ms after fixation on the target word began.
dpe.rel <- filter(dpe, dpcount >= 500)
summarize(dpe.rel, mean.dpe = mean(median_duration, na.rm=TRUE))
## mean.dpe
## 1 172.7073
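To show where quantities like dpcount and median_duration come from, and why the estimate shifts between runs, here is a rough base-R sketch of a per-participant bootstrap. It mimics the column names of the output above but is not the package's code; the function names, the divergence criterion (a 1.5% difference sustained over five 1-ms bins), and the simulated durations are assumptions.

```r
# Survival proportion (duration > t) at each bin, via the empirical CDF.
survival_curve <- function(d, bins) 1 - ecdf(d)(bins)

# Start of the first run of `run` bins where slow exceeds fast by >= threshold.
first_divergence <- function(fast, slow, bins, threshold = 0.015, run = 5) {
  above  <- (survival_curve(slow, bins) - survival_curve(fast, bins)) >= threshold
  runs   <- rle(above)
  starts <- cumsum(runs$lengths) - runs$lengths + 1
  hit    <- which(runs$values & runs$lengths >= run)
  if (length(hit) == 0) NA_real_ else bins[starts[hit[1]]]
}

# Re-sample each condition with replacement and collect one DP per iteration.
bootstrap_dpe <- function(fast, slow, n_iter = 1000, bins = 1:600) {
  dps <- replicate(n_iter, first_divergence(sample(fast, replace = TRUE),
                                            sample(slow, replace = TRUE), bins))
  c(dpcount         = sum(!is.na(dps)),   # iterations on which a DP was found
    median_duration = median(dps, na.rm = TRUE),
    ci.lower        = unname(quantile(dps, 0.025, na.rm = TRUE)),
    ci.upper        = unname(quantile(dps, 0.975, na.rm = TRUE)))
}

set.seed(2018)   # fixing the seed makes this sketch's re-sampling repeatable
fast <- round(rnorm(80, mean = 230, sd = 50))   # simulated durations (ms)
slow <- round(rnorm(80, mean = 250, sd = 50))
res  <- bootstrap_dpe(fast, slow, n_iter = 200)
res
# Analogue of the reliability rule above: DP found on at least half the iterations.
res[["dpcount"]] >= 200 / 2
```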
Finally, we can represent this visually by examining the survival curves created using the ggsurv function. Note--the values displayed on this figure are from the published version of the manuscript and might vary by ~ 1ms from those generated here.
data$survdat<-as.integer(ifelse(data$Condition=="Target",NA,data$ffd))
tmp2 <- filter(data, !subject %in% c(19, 20, 21, 25, 27, 30, 38))
ffd.surv <- survfit(Surv(survdat) ~ Condition, data=tmp2)
pl2<-ggsurv(s=ffd.surv)
pl2 + geom_vline(xintercept = 172.61, linetype = "dotted") +
annotate("rect", xmin=152.4, xmax=192.9, ymin=0, ymax=1, alpha = .2) +
annotate("text", x = 450, y = 0.75, label = "Divergence Point = 173ms", size = 5) +
annotate("text", x = 450, y = 0.7, label = "95% CI: 152 - 193ms", size = 5) +
theme(axis.text.x = element_text(colour="grey4", size=16), axis.text.y = element_text(colour = "grey4", size=16)) +
labs(y = "Survival", x = "Time") + theme_grey(base_size=16)
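As a check on what these curves represent: with no censoring indicator, `Surv()` treats every fixation as an observed event, so the fitted curve is simply the proportion of fixations longer than each time point. A small sketch with made-up durations, using base graphics in place of ggsurv:

```r
library(survival)

# Hypothetical first-fixation durations (ms) for two conditions.
toy <- data.frame(
  ffd = c(150, 180, 210, 240, 270, 170, 220, 260, 300, 340),
  Condition = rep(c("Homophone", "Control"), each = 5)
)

fit <- survfit(Surv(ffd) ~ Condition, data = toy)

# Estimated survival at 200 ms for each condition (Control, then Homophone) ...
s200 <- summary(fit, times = 200)$surv
# ... equals the raw proportion of fixations longer than 200 ms.
direct <- tapply(toy$ffd, toy$Condition, function(d) mean(d > 200))

plot(fit, lty = 1:2, xlab = "Time (ms)", ylab = "Survival")
legend("topright", legend = names(fit$strata), lty = 1:2, bty = "n")
```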
Experiment 2
Data for 48 participants (subject) reading 180 sentences (item) that either contained the correct target word, a pseudohomophone (i.e., phonologically related non-word), or an orthographic control non-word.
- Because of his insomnia, Caleb couldn’t sleep/sleap/slerp even though he was tired.
ffd = first fixation duration, sfd = single fixation duration, gzd = gaze duration, gpt = go-past time, tvt = total time, skp = fixation probability (inverse of skipping), rgi = regression-in probability, rgo = regression-out probability.
## 'data.frame': 8347 obs. of 12 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ item : Factor w/ 180 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ condition: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 1 ...
## $ ffd : int NA 186 218 214 178 NA NA NA 192 219 ...
## $ sfd : int NA 186 218 214 NA NA NA NA 192 219 ...
## $ gzd : int NA 186 218 214 434 NA NA NA 192 219 ...
## $ tvt : int NA 186 218 214 434 286 525 379 666 219 ...
## $ gpt : int NA 186 218 214 434 NA NA NA 192 219 ...
## $ skp : int 0 1 1 1 1 0 0 0 1 1 ...
## $ rgi : int 0 0 0 0 0 1 1 1 0 0 ...
## $ rgo : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Condition: Factor w/ 3 levels "Control","Pseudohomophone",..: 3 2 1 3 2 1 3 2 1 3 ...
## subject item condition ffd sfd gzd tvt gpt skp rgi rgo Condition
## 1 1 1 1 NA NA NA NA NA 0 0 0 Target
## 2 1 2 2 186 186 186 186 186 1 0 0 Pseudohomophone
## 3 1 3 3 218 218 218 218 218 1 0 0 Control
## 4 1 4 1 214 214 214 214 214 1 0 0 Target
## 5 1 5 2 178 NA 434 434 434 1 0 0 Pseudohomophone
## 6 1 6 3 NA NA NA 286 NA 0 1 0 Control
Means & Standard Errors for Exp. 2
byCond <- group_by(data, Condition)
stats.m <- summarize_if(byCond,is.numeric,funs(mean), na.rm=TRUE)
stats.se <- summarize_if(byCond,is.numeric,funs(std.error), na.rm=TRUE)
stats.m
## # A tibble: 3 x 9
## Condition ffd sfd gzd tvt gpt skp rgi rgo
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 260. 263. 300. 410. 343. 0.739 0.273 0.0732
## 2 Pseudohomophone 248. 251. 275. 325. 309. 0.745 0.166 0.0623
## 3 Target 218. 220. 232. 250. 254. 0.676 0.0773 0.0469
stats.se
## # A tibble: 3 x 9
## Condition ffd sfd gzd tvt gpt skp rgi rgo
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 2.40 2.63 3.49 5.85 4.91 0.00836 0.00849 0.00496
## 2 Pseudohomophone 2.17 2.36 2.85 4.02 4.09 0.00825 0.00705 0.00457
## 3 Target 1.78 1.88 2.16 2.72 3.23 0.00885 0.00506 0.00400
Figure of early measures for Exp. 2
As with Experiment 1, the graded processing is clearly visible: readers show an advantage for processing a phonologically related (non-)word over a control (non-)word matched on visual similarity.
Survival Analyses for Exp. 2
Survival analyses were again conducted to determine just how rapidly phonological codes come online and begin to influence behavior. The DPE reflects the earliest observable influence of phonological coding on behavior.
tmp<-select(data,subject,ffd,condition) %>% arrange(condition) %>%
rename(duration=ffd)
tmp$condition <- as.numeric(tmp$condition)
survdata<- as.data.frame(tmp %>% as_tibble() %>% mutate(condition = condition-1)) %>%
filter(condition != 0, !is.na(duration))
survdata$condition <- as.factor(survdata$condition)
str(survdata)
## 'data.frame': 4120 obs. of 3 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ duration : int 186 178 249 164 203 431 228 196 295 206 ...
## $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
We have 48 participants who each have between 55 and 110 data points:
n.per.sbj <- table(survdata$subject)
length(n.per.sbj)
## [1] 48
range(n.per.sbj)
## [1] 55 110
We can now use these data to generate divergence point estimates (DPE) for each participant:
ip.dpa <- DPA.ip(survdata$subject, survdata$duration, survdata$condition, quiet = TRUE)
dpe <- as.data.frame(ip.dpa$dp_matrix)
# critical columns in output
# 'dpcount' = the number of iterations (out of 1000) on which a DPE was obtained
# 'median_duration' = median of the DPEs obtained on each iteration
str(dpe)
## 'data.frame': 48 obs. of 6 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ dpcount : num 1000 1000 1000 1000 997 984 0 1000 1000 1000 ...
## $ median_dp_point: num 430 138 1 239 1 ...
## $ median_duration: num 212 176 144 165 81 ...
## $ ci.lower : num 210.5 175 144.5 80.5 81 ...
## $ ci.upper : num 212 176 144 173 103 ...
dpe$subject[dpe$dpcount<500]
## [1] 7 16 26 45 46
## 48 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ... 48
Doing so reveals that the DPEs for 5 participants were unreliable (i.e., a DP was found on fewer than half of the iterations). Removing those participants yields a mean DPE of ~161 ms across the remaining participants in the published analysis (the run shown here gives ~163 ms; the value shifts slightly each time the bootstrap re-sampling procedure runs). This tells us that, on average, phonological coding was influencing behavior as early as 161 ms after fixation on the target word began.
dpe.rel <- filter(dpe, dpcount >= 500)
summarize(dpe.rel, mean.dpe = mean(median_duration, na.rm=TRUE))
## mean.dpe
## 1 162.6279
Finally, we can represent this visually by examining the survival curves created using the ggsurv function. Note--the values displayed on this figure are from the published version of the manuscript and might vary by ~ 1ms from those generated here.
data$survdat<-as.integer(ifelse(data$Condition=="Target",NA,data$ffd))
tmp2 <- filter(data, !subject %in% c(7, 16, 26, 45, 46))
ffd.surv <- survfit(Surv(survdat) ~ Condition, data=tmp2)
pl2<-ggsurv(s=ffd.surv)
pl2 + geom_vline(xintercept = 161.06, linetype = "dotted") +
annotate("rect", xmin=146, xmax=176, ymin=0, ymax=1, alpha = .2) +
annotate("text", x = 450, y = 0.75, label = "Divergence Point = 161ms", size = 5) +
annotate("text", x = 450, y = 0.7, label = "95% CI: 146 - 176ms", size = 5) +
theme(axis.text.x = element_text(colour="grey4", size=16), axis.text.y = element_text(colour = "grey4", size=16)) +
labs(y = "Survival", x = "Time") + theme_grey(base_size=16)
Experiment 3
Data for 48 participants (subject) reading the same 168 sentences (item) from Experiment 1. Instead of directly fixating the different target words, all readers fixated the correct target word but received a parafoveal preview that was either identical, the homophone, or the orthographic control word. Previews were presented using the gaze-contingent invisible boundary paradigm (Rayner, 1975).
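The display-change logic of the boundary paradigm can be sketched as follows. This is a toy illustration only: the function name, the character-position units, and the gaze samples are hypothetical, and real implementations run inside the eye-tracking software during the saccade.

```r
# Which string occupies the target location at each successive gaze sample.
# The preview is shown until gaze first crosses the invisible boundary;
# from then on the correct target stays on screen.
displayed_word <- function(gaze_x, boundary_x, preview, target) {
  crossed <- cumsum(gaze_x > boundary_x) > 0   # TRUE from the first crossing on
  ifelse(crossed, target, preview)
}

gaze_x <- c(12, 18, 25, 31, 29, 36)   # hypothetical horizontal gaze positions
displayed_word(gaze_x, boundary_x = 30, preview = "beech", target = "beach")
# "beech" "beech" "beech" "beach" "beach" "beach"
```

Note that the target remains displayed even when gaze briefly returns to the left of the boundary (the fifth sample).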
ffd = first fixation duration, sfd = single fixation duration, gzd = gaze duration, gpt = go-past time, tvt = total time, skp = fixation probability (inverse of skipping), rgi = regression-in probability, rgo = regression-out probability.
## 'data.frame': 6403 obs. of 13 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ item : Factor w/ 168 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ condition: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 1 ...
## $ ffd : int 187 NA NA NA 200 180 209 NA NA NA ...
## $ sfd : int 187 NA NA NA 200 180 NA NA NA NA ...
## $ gzd : int 187 NA NA NA 200 180 362 NA NA NA ...
## $ tvt : int 187 NA NA NA 200 180 362 NA NA NA ...
## $ gpt : int 187 NA NA NA 200 180 362 NA NA NA ...
## $ skp : int 1 0 0 0 1 1 1 0 0 0 ...
## $ rgi : int 0 0 0 0 0 0 0 0 0 0 ...
## $ rgo : int 0 0 0 0 0 0 0 0 0 0 ...
## $ launch : int 8 NA NA NA 5 6 14 NA NA NA ...
## $ Condition: Factor w/ 3 levels "Control","Homophone",..: 3 2 1 3 2 1 3 2 1 3 ...
## subject item condition ffd sfd gzd tvt gpt skp rgi rgo launch Condition
## 1 1 1 1 187 187 187 187 187 1 0 0 8 Target
## 2 1 2 2 NA NA NA NA NA 0 0 0 NA Homophone
## 3 1 3 3 NA NA NA NA NA 0 0 0 NA Control
## 4 1 4 1 NA NA NA NA NA 0 0 0 NA Target
## 5 1 5 2 200 200 200 200 200 1 0 0 5 Homophone
## 6 1 6 3 180 180 180 180 180 1 0 0 6 Control
Means & Standard Errors for Exp. 3
byCond <- group_by(data, Condition)
stats.m <- summarize_if(byCond,is.numeric,funs(mean), na.rm=TRUE)
stats.se <- summarize_if(byCond,is.numeric,funs(std.error), na.rm=TRUE)
stats.m
## # A tibble: 3 x 10
## Condition ffd sfd gzd tvt gpt skp rgi rgo launch
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 241. 244. 260. 290. 307. 0.655 0.146 0.0829 6.61
## 2 Homophone 235. 237. 253. 286. 296. 0.661 0.157 0.0715 6.63
## 3 Target 220. 221. 231. 257. 265. 0.654 0.109 0.0618 6.70
stats.se
## # A tibble: 3 x 10
## Condition ffd sfd gzd tvt gpt skp rgi rgo launch
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 2.20 2.35 2.73 3.59 4.93 0.0103 0.00768 0.00599 0.0911
## 2 Homophone 2.07 2.21 2.60 3.46 4.93 0.0103 0.00788 0.00559 0.0927
## 3 Target 1.89 2.00 2.34 3.31 4.42 0.0103 0.00672 0.00519 0.0941
Figure of early measures for Exp. 3
As with Experiment 1, the graded processing is clearly visible: readers show an advantage for processing a phonologically related word over a control word matched on visual similarity. This demonstrates that the advantage for the phonologically related word was not driven by the somewhat strange task of reading contextually inappropriate words, which may have inflated fixation times, especially in the control condition.
Survival Analyses for Exp. 3
Survival analyses were again conducted to determine just how rapidly phonological codes come online and begin to influence behavior. The DPE reflects the earliest observable influence of phonological coding on behavior.
tmp<-select(data,subject,ffd,condition) %>% arrange(condition) %>%
rename(duration=ffd)
tmp$condition <- as.numeric(tmp$condition)
survdata<- as.data.frame(tmp %>% as_tibble() %>% mutate(condition = condition-1)) %>%
filter(condition != 0, !is.na(duration))
survdata$condition <- as.factor(survdata$condition)
str(survdata)
## 'data.frame': 2797 obs. of 3 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ duration : int 200 294 191 246 315 222 230 176 273 384 ...
## $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
We have 48 participants who each have between 33 and 84 data points:
n.per.sbj <- table(survdata$subject)
length(n.per.sbj)
## [1] 48
range(n.per.sbj)
## [1] 33 84
We can now use these data to generate divergence point estimates (DPE) for each participant:
ip.dpa <- DPA.ip(survdata$subject, survdata$duration, survdata$condition, quiet = TRUE)
dpe <- as.data.frame(ip.dpa$dp_matrix)
# critical columns in output
# 'dpcount' = the number of iterations (out of 1000) on which a DPE was obtained
# 'median_duration' = median of the DPEs obtained on each iteration
str(dpe)
## 'data.frame': 48 obs. of 6 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ dpcount : num 1000 1000 1000 1000 1000 1000 1000 881 1000 1000 ...
## $ median_dp_point: num 1 1 1 278 150 64 51 694 383 1 ...
## $ median_duration: num 132 138 114 208 174 ...
## $ ci.lower : num 132 138 114 192 166 ...
## $ ci.upper : num 132 138 114 208 184 ...
dpe$subject[dpe$dpcount<500]
## [1] 14 27
## 48 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ... 48
Doing so reveals that the DPEs for 2 participants were unreliable (i.e., a DP was found on fewer than half of the iterations). Removing those participants yields a mean DPE of ~170 ms across the remaining participants (the value shifts slightly each time the bootstrap re-sampling procedure runs). This tells us that, on average, phonological coding was influencing behavior as early as 170 ms after fixation on the target word began.
dpe.rel <- filter(dpe, dpcount >= 500)
summarize(dpe.rel, mean.dpe = mean(median_duration, na.rm=TRUE))
## mean.dpe
## 1 169.8804
Finally, we can represent this visually by examining the survival curves created using the ggsurv function. Note--the values displayed on this figure are from the published version of the manuscript and might vary by ~ 1ms from those generated here.
data$survdat<-as.integer(ifelse(data$Condition=="Target",NA,data$ffd))
tmp2 <- filter(data, !subject %in% c(14, 27))
ffd.surv <- survfit(Surv(survdat) ~ Condition, data=tmp2)
pl2<-ggsurv(s=ffd.surv)
pl2 + geom_vline(xintercept = 170.32, linetype = "dotted") +
annotate("rect", xmin=155, xmax=186, ymin=0, ymax=1, alpha = .2) +
annotate("text", x = 450, y = 0.75, label = "Divergence Point = 170ms", size = 5) +
annotate("text", x = 450, y = 0.7, label = "95% CI: 155 - 186ms", size = 5) +
theme(axis.text.x = element_text(colour="grey4", size=16), axis.text.y = element_text(colour = "grey4", size=16)) +
labs(y = "Survival", x = "Time") + theme_grey(base_size=16)
Experiment 4
Data for 48 participants (subject) reading the 180 sentences (item) from Experiment 2, with the word-type manipulation done using the gaze-contingent boundary paradigm like Experiment 3, such that all readers directly fixated the correct target after having had a preview that was either identical, a pseudohomophone, or an orthographic control non-word.
ffd = first fixation duration, sfd = single fixation duration, gzd = gaze duration, gpt = go-past time, tvt = total time, skp = fixation probability (inverse of skipping), rgi = regression-in probability, rgo = regression-out probability.
## 'data.frame': 6704 obs. of 13 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ item : Factor w/ 180 levels "1","2","3","4",..: 1 2 3 4 5 7 8 9 10 11 ...
## $ condition: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 1 2 3 1 2 ...
## $ ffd : int NA 173 196 175 NA 215 198 143 195 130 ...
## $ sfd : int NA 173 NA 175 NA 215 198 143 195 130 ...
## $ gzd : int NA 173 317 175 NA 215 198 143 195 130 ...
## $ tvt : int 107 173 317 175 NA 215 198 197 195 130 ...
## $ gpt : int NA 264 317 175 NA 215 198 143 195 130 ...
## $ skp : int 0 1 1 1 0 1 1 1 1 1 ...
## $ rgi : int 1 0 0 0 0 0 0 1 0 0 ...
## $ rgo : int 0 1 0 0 0 0 0 0 0 0 ...
## $ launch : int NA 7 8 2 NA 5 7 5 13 7 ...
## $ Condition: Factor w/ 3 levels "Control","Pseudohomophone",..: 3 2 1 3 2 3 2 1 3 2 ...
## subject item condition ffd sfd gzd tvt gpt skp rgi rgo launch Condition
## 1 1 1 1 NA NA NA 107 NA 0 1 0 NA Target
## 2 1 2 2 173 173 173 173 264 1 0 1 7 Pseudohomophone
## 3 1 3 3 196 NA 317 317 317 1 0 0 8 Control
## 4 1 4 1 175 175 175 175 175 1 0 0 2 Target
## 5 1 5 2 NA NA NA NA NA 0 0 0 NA Pseudohomophone
## 6 1 7 1 215 215 215 215 215 1 0 0 5 Target
Means & Standard Errors for Exp. 4
byCond <- group_by(data, Condition)
stats.m <- summarize_if(byCond,is.numeric,funs(mean), na.rm=TRUE)
stats.se <- summarize_if(byCond,is.numeric,funs(std.error), na.rm=TRUE)
stats.m
## # A tibble: 3 x 10
## Condition ffd sfd gzd tvt gpt skp rgi rgo launch
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 238. 243. 257. 276. 291. 0.716 0.105 0.0740 6.05
## 2 Pseudohomophone 232. 235. 249. 271. 287. 0.727 0.100 0.0740 6.02
## 3 Target 215. 214. 229. 250. 256. 0.685 0.0851 0.0597 6.22
stats.se
## # A tibble: 3 x 10
## Condition ffd sfd gzd tvt gpt skp rgi rgo launch
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control 1.93 2.06 2.32 3.13 4.03 0.00952 0.00647 0.00553 0.0789
## 2 Pseudohomo… 1.80 1.90 2.22 3.19 5.36 0.00947 0.00638 0.00556 0.0815
## 3 Target 1.81 1.83 2.28 3.18 3.67 0.00981 0.00589 0.00500 0.0854
Figure of early measures for Exp. 4
As with the previous three experiments, we again see the graded processing indicative of an advantage for processing phonologically related previews.
Survival Analyses for Exp. 4
Survival analyses were again conducted to determine just how rapidly phonological codes come online and begin to influence behavior. The DPE reflects the earliest observable influence of phonological coding on behavior.
tmp<-select(data,subject,ffd,condition) %>% arrange(condition) %>%
rename(duration=ffd)
tmp$condition <- as.numeric(tmp$condition)
survdata<- as.data.frame(tmp %>% as_tibble() %>% mutate(condition = condition-1)) %>%
filter(condition != 0, !is.na(duration))
survdata$condition <- as.factor(survdata$condition)
str(survdata)
## 'data.frame': 3218 obs. of 3 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ duration : int 173 198 130 221 179 246 145 167 114 134 ...
## $ condition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
We have 48 participants who each have between 34 and 98 data points:
n.per.sbj <- table(survdata$subject)
length(n.per.sbj)
## [1] 48
range(n.per.sbj)
## [1] 34 98
We can now use these data to generate divergence point estimates (DPE) for each participant:
ip.dpa <- DPA.ip(survdata$subject, survdata$duration, survdata$condition, quiet = TRUE)
dpe <- as.data.frame(ip.dpa$dp_matrix)
# critical columns in output
# 'dpcount' = the number of iterations (out of 1000) on which a DPE was obtained
# 'median_duration' = median of the DPEs obtained on each iteration
str(dpe)
## 'data.frame': 48 obs. of 6 variables:
## $ subject : Factor w/ 48 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ dpcount : num 1000 1000 995 1000 1000 1000 909 1000 0 1000 ...
## $ median_dp_point: num 241 426 1 127 1 55 178 1 NA 154 ...
## $ median_duration: num 136 232 116 178 132 ...
## $ ci.lower : num 136 208 116 178 132 ...
## $ ci.upper : num 136 234 289 178 167 ...
dpe$subject[dpe$dpcount<500]
## [1] 9 20 22 29 33 37 39 43
## 48 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ... 48
Doing so reveals that the DPEs for 8 participants were unreliable (i.e., a DP was found on fewer than half of the iterations). Removing those participants yields a mean DPE of ~160 ms across the remaining participants (the value shifts slightly each time the bootstrap re-sampling procedure runs). This tells us that, on average, phonological coding was influencing behavior as early as 160 ms after fixation on the target word began.
dpe.rel <- filter(dpe, dpcount >= 500)
summarize(dpe.rel, mean.dpe = mean(median_duration, na.rm=TRUE))
## mean.dpe
## 1 160.55
Finally, we can represent this visually by examining the survival curves created using the ggsurv function. Note--the values displayed on this figure are from the published version of the manuscript and might vary by ~ 1ms from those generated here.
data$survdat<-as.integer(ifelse(data$Condition=="Target",NA,data$ffd))
tmp2 <- filter(data, !subject %in% c( 9, 20, 22, 29, 33, 37, 39, 43))
ffd.surv <- survfit(Surv(survdat) ~ Condition, data=tmp2)
pl2<-ggsurv(s=ffd.surv)
pl2 + geom_vline(xintercept = 159.81, linetype = "dotted") +
annotate("rect", xmin=147, xmax=172, ymin=0, ymax=1, alpha = .2) +
annotate("text", x = 450, y = 0.75, label = "Divergence Point = 160ms", size = 5) +
annotate("text", x = 450, y = 0.7, label = "95% CI: 147 - 172ms", size = 5) +
theme(axis.text.x = element_text(colour="grey4", size=16), axis.text.y = element_text(colour = "grey4", size=16)) +
labs(y = "Survival", x = "Time") + theme_grey(base_size=16)
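The abstract notes that only approximately 18% of first fixation durations were shorter than the mean divergence point estimates. A proportion of that kind could be computed along the following lines; the durations and the object names below are made up for illustration and will not reproduce the published figure.

```r
# Hypothetical first-fixation durations (ms); NA marks a skipped word.
ffd <- c(120, 150, 165, 180, 200, NA, 220, 240, 260, 300, 350)
dpe <- 160   # a mean divergence point estimate, in ms

# Proportion of observed first fixations that ended before the divergence point.
prop_shorter <- mean(ffd < dpe, na.rm = TRUE)
prop_shorter   # 0.2
```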