Playful iconicity: data & analyses

Mark Dingemanse & Bill Thompson (this version: 2019-12-03)

Introduction
Main analyses
End

Introduction

This code notebook provides a fully reproducible workflow for the paper Playful iconicity: Structural markedness underlies the relation between funniness and iconicity. To increase readability, not all code chunks present in the .Rmd source are shown in the output. A separate code notebook has the supplementary analyses.

Data sources

Primary data sources:

iconicity ratings: Perry, Lynn K. et al. Iconicity in the Speech of Children and Adults. Developmental Science. doi:10.1111/desc.12572
funniness ratings: Engelthaler, Tomas, and Thomas T. Hills. 2017. Humor Norms for 4,997 English Words. Behavior Research Methods, July, 1-9. doi:10.3758/s13428-017-0930-6

We use these ratings in our analyses, but we also feed them to our imputation method, which regresses the human ratings against semantic vectors in order to generate imputed ratings for an additional 63,680 words.
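As a rough illustration of what regressing ratings on semantic vectors involves, the sketch below fits a linear model on simulated data and predicts ratings for unrated words. Everything in it is an assumption made for illustration: the toy vectors, the five dimensions, the column names and the use of plain `lm()`. It is not the imputation method used in the paper.

```r
# Toy sketch: impute ratings for unrated words from (simulated) word vectors.
# All data and names here are hypothetical; real semantic vectors have
# hundreds of dimensions and the paper's method may differ.
set.seed(1)
n_words <- 200
n_dims  <- 5

vectors <- matrix(rnorm(n_words * n_dims), nrow = n_words,
                  dimnames = list(NULL, paste0("dim", 1:n_dims)))

# Simulated "true" ratings: a linear function of the vectors plus noise
true_rating <- as.vector(vectors %*% c(1, -0.5, 0.3, 0, 0.8)) +
  rnorm(n_words, sd = 0.1)

toy <- data.frame(rating = true_rating, vectors)
toy$rating[101:200] <- NA          # pretend the last 100 words are unrated
rated <- !is.na(toy$rating)

# Regress human ratings on the vector dimensions, using rated words only
fit <- lm(rating ~ ., data = toy[rated, ])

# Predict ratings for all words, including the unrated ones
toy$rating_imputed <- predict(fit, newdata = toy)

# How well do imputed ratings recover the held-out simulated ratings?
cor(toy$rating_imputed[!rated], true_rating[!rated])
```

Because the simulated ratings are linear in the vectors by construction, the held-out correlation here is very high; real ratings would be recovered far less cleanly.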

Secondary data sources:

number of morphemes: Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., … Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459. doi: 10.3758/BF03193014
word frequency: Brysbaert, M., New, B., & Keuleers, E. (2012). Adding part-of-speech information to the SUBTLEX-US word frequencies. Behavior Research Methods, 44(4), 991–997. doi: 10.3758/s13428-012-0190-4
lexical decision times: Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287-304. doi: 10.3758/s13428-011-0118-4
phonotactic measures: Vaden, K.I., Halpin, H.R., Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0. [Data file]. Available from http://www.iphod.com.

Secondary data sources used in supplementary analyses:

valence, arousal and dominance: Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191-1207
age of acquisition: Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978-990. doi: 10.3758/s13428-012-0210-4

After collating these data sources we add a range of summary variables, mainly for easy plotting and subset selection.

# Summary variables: ntile(x, 10) assigns each word to a decile (1 = lowest 10%, 10 = highest 10%)
words <- words %>%
  mutate(fun_perc = ntile(fun,10),
         fun_resid_perc = ntile(fun_resid,10),
         ico_perc = ntile(ico,10),
         diff_rank = fun_perc + ico_perc,
         ico_imputed_perc = ntile(ico_imputed,10),
         fun_imputed_perc = ntile(fun_imputed,10),
         fun_imputed_resid_perc = ntile(fun_imputed_resid,10),
         diff_rank_setB = fun_perc + ico_imputed_perc,
         diff_rank_setC = fun_imputed_perc + ico_imputed_perc,
         diff_rank_setD = fun_imputed_perc + ico_perc,
         logletterfreq_perc = ntile(logletterfreq,10),
         dens_perc = ntile(unsDENS,10),
         biphone_perc = ntile(unsBPAV,10),
         triphone_perc = ntile(unsTPAV,10),
         posprob_perc = ntile(unsPOSPAV,10),
         valence_perc = ntile(valence,10))

Descriptive data

We have 4,996 words rated for funniness, 2,945 rated for iconicity, and 1,419 in the intersection (set A). We have 3,577 words with human funniness ratings and imputed iconicity ratings (set B). We have imputed data for a total of 70,202 words, and we’re venturing outside the realm of rated words for 63,680 of them (set C).

(We also have 1,526 words with human iconicity ratings and imputed funniness ratings in set D, the mirror image of set B; this is not used in the paper but is reported in the separate supplementary analyses notebook.)

set	n
A	1419
B	3577
C	63680
D	1526

## # A tibble: 3 x 9
##   word    ico   fun ico_perc fun_perc ico_imputed fun_imputed
##   <chr> <dbl> <dbl>    <int>    <int>       <dbl>       <dbl>
## 1 wigg~   2.6  3.52       10       10        3.37        3.39
## 2 wobb~   2.4  3.15        9       10        3.06        3.11
## 3 wagg~  NA   NA          NA       NA        2.37        3.42
## # ... with 2 more variables: ico_imputed_perc <int>,
## #   fun_imputed_perc <int>

The most important columns in the data are shown below for set A. Sets B and C feature ico_imputed and fun_imputed instead of or in addition to the human ratings. The field diff_rank is the sum of fun and ico deciles for a given word: a word with diff_rank 2 occurs in the first decile (lowest 10%) of both funniness and iconicity ratings, and a word with diff_rank 20 occurs in the 10th decile (highest 10%) of both.
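To make the decile logic concrete, here is a minimal base-R sketch on toy data. The helper `decile()` is a hypothetical stand-in for `dplyr::ntile(x, 10)` that assumes no missing values, and the toy words and ratings are invented for illustration.

```r
# Base-R stand-in for dplyr::ntile(x, n); assumes x has no missing values
decile <- function(x, n = 10) {
  floor(n * (rank(x, ties.method = "first") - 1) / length(x)) + 1
}

# 20 invented words with increasing funniness and iconicity ratings
toy <- data.frame(
  word = paste0("w", 1:20),
  fun  = seq(1.5, 3.4, length.out = 20),
  ico  = seq(-1, 3.75, length.out = 20)
)

toy$fun_perc  <- decile(toy$fun)
toy$ico_perc  <- decile(toy$ico)
toy$diff_rank <- toy$fun_perc + toy$ico_perc   # sum of the two deciles

toy$diff_rank[1]    # 2: lowest decile on both measures
toy$diff_rank[20]   # 20: highest decile on both measures
```

With 20 words and 10 bins, each decile holds exactly two words, so `diff_rank` runs from 2 to 20 in steps of 2 in this toy case.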

Structure of the data

word	ico	fun	logletterfreq	logfreq	rt	nmorph	diff_rank
flop	3.142857	3.031250	-3.223260	2.075547	587.9189	1	20
whine	2.666667	2.833333	-2.706352	1.924279	588.6667	1	19
slip	2.615385	2.586207	-2.978875	3.120903	546.2000	1	18
sigh	2.800000	2.535714	-2.941718	2.240549	577.4595	1	17
must	1.500000	2.636364	-2.952673	4.552206	569.8056	1	16
frog	2.181818	2.440000	-3.097211	2.781037	533.2051	1	15
moose	0.300000	3.103448	-2.622790	2.451786	550.5263	1	14
stretch	2.400000	2.187500	-2.616122	2.874482	567.4615	1	13
block	2.428571	2.153846	-3.366796	3.315551	537.0000	1	12
lark	-0.900000	3.025641	-3.000745	1.924279	607.9375	1	11

Figures

For a quick impression of the main findings, this section reproduces the figures from the paper.

Figure 1: Overview

Figure 3: Funniness and iconicity

Figure 4: Highest rated words

Figure 5: Structural markedness

Main analyses

Funniness and iconicity

Reproducing prior results

Engelthaler & Hills report frequency as the strongest correlate with funniness (less frequent words are rated as more funny), and lexical decision RT as the second strongest (words with slower RTs are rated as more funny). By way of sanity check let’s replicate their analysis.

Raw correlations hover around 0.28 in magnitude, as reported (without corrections or controls) in their paper. A linear model with funniness as dependent variable and frequency and RT as predictors shows a role for both, though frequency accounts for a much larger portion of the variance (partial η² = 0.083) than RT (partial η² = 0.020).

To what extent do frequency and RT predict funniness?

Model m0: lm(formula = fun ~ logfreq + rt, data = words %>% drop_na(fun))

predictor	df	SS	F	p	partial η²
logfreq	1	78.329	454.096	0	0.083
rt	1	17.315	100.380	0	0.020
Residuals	4993	861.264
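The partial η² column can be recomputed from the sums of squares in the table as SS_effect / (SS_effect + SS_residual). The sketch below does this for model m0; the variable names are ours, and the numbers are copied from the table above.

```r
# Sums of squares copied from the ANOVA table for model m0
m0 <- data.frame(predictor = c("logfreq", "rt"),
                 SS        = c(78.329, 17.315))
ss_resid <- 861.264

# Partial eta squared: variance attributed to the predictor relative to
# that variance plus the residual variance
m0$partial_eta2 <- m0$SS / (m0$SS + ss_resid)

round(m0$partial_eta2, 3)   # 0.083 0.020
```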

Known knowns

If frequency and RT explain some of the variance in funniness ratings, how much is left for iconicity? We’ll do this analysis on the core set of 1419 words for which we have funniness and iconicity ratings.

It turns out that the magnitude estimate of iconicity is about half that of frequency, and positive rather than negative (higher funniness ratings go with higher iconicity ratings). The effect of iconicity ratings is much larger than that of RT, the second most important correlate reported by Engelthaler & Hills.

Model m1.1: lm(formula = fun ~ logfreq + rt, data = words %>% filter(set == "A"))

predictor	df	SS	F	p	partial η²
logfreq	1	36.143	247.813	0.000	0.149
rt	1	1.249	8.562	0.003	0.006
Residuals	1416	206.519

Model m1.2: lm(formula = fun ~ logfreq + rt + ico, data = words %>% filter(set == "A"))

predictor	df	SS	F	p	partial η²
logfreq	1	36.143	258.779	0.000	0.155
rt	1	1.249	8.941	0.003	0.006
ico	1	8.891	63.661	0.000	0.043
Residuals	1415	197.628

model comparison of m1.1 and m1.2

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
1416	206.5194
1415	197.6281	1	8.891332	63.66118	0
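The F statistic in this comparison follows directly from the two residual sums of squares: the drop in RSS per added parameter, divided by the residual mean square of the larger model. A small sketch using the values from the table (variable names are ours):

```r
# Residual sums of squares and residual df, copied from the comparison table
rss_small <- 206.5194; df_small <- 1416   # m1.1: fun ~ logfreq + rt
rss_large <- 197.6281; df_large <- 1415   # m1.2: adds ico

df_diff <- df_small - df_large

# F test for nested linear models
f_stat <- ((rss_small - rss_large) / df_diff) / (rss_large / df_large)
p_val  <- pf(f_stat, df_diff, df_large, lower.tail = FALSE)

round(f_stat, 2)   # 63.66
```

The p-value is far below any conventional threshold, which is why the table prints it as 0 after rounding.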

Partial correlations show that funniness and iconicity correlate at r = 0.21 when log frequency is partialed out. This shows that the relation between iconicity and funniness is not reducible to frequency alone.

funniness and iconicity controlling for word frequency

estimate	p.value	statistic	n	gp	Method
0.2064276	0	7.938811	1419	1	pearson
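For readers unfamiliar with partial correlations, the sketch below shows one standard way to compute them: correlate the residuals of two variables after regressing each on the control variable. The helper `partial_cor()` and the simulated data are illustrative assumptions, not the code used for the table. The second part checks that the reported statistic is consistent with the usual t formula, where `gp` is the number of control variables.

```r
# Partial correlation of x and y given z, via regression residuals
partial_cor <- function(x, y, z) {
  cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
}

# Simulated example: x and y both depend on z, and y also depends on x
set.seed(42)
z <- rnorm(500)
x <- 0.5 * z + rnorm(500)
y <- 0.5 * z + 0.3 * x + rnorm(500)
r_xy_z <- partial_cor(x, y, z)

# t statistic for the partial correlation reported in the table above
r  <- 0.2064276
n  <- 1419
gp <- 1                                  # one control variable (log frequency)
t_stat <- r * sqrt((n - 2 - gp) / (1 - r^2))

round(t_stat, 3)   # 7.939
```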

Example words

Both high: zigzag, squeak, chirp, pop, clunk, moo, clang, oink, zoom, smooch, babble, squawk, thud, gush, fluff, flop, waddle, giggle, tinkle, ooze

Both low: silent, statement, poor, cellar, incest, window, lie, coffin, platform, address, slave, wait, year, case

High funniness, low iconicity: belly, buttocks, beaver, chipmunk, turkey, bra, hippo, chimp, blonde, penis, pun, dingo, trombone, deuce, lark, gander, magpie, tongue, giraffe, hoe

High iconicity, low funniness: click, roar, crash, chime, scratch, swift, sunshine, low, break, clash, shoot, airplane, dread

N.B. controlling for frequency in these lists (by using fun_resid instead of fun) does not make a difference in ranking, so not done here and elsewhere.

What about compound nouns among high iconicity words? From eyeballing, it seems to be about 10% in a set of the highest rated 200 nouns. Many probable examples can be found by looking at highly rated nouns with multiple morphemes: zigzag, buzzer, skateboard, sunshine, zipper, freezer, snowball, juggler, airplane, bedroom, goldfish, seaweed, lipstick, mixer, corkscrew, doorknob, killer, moonlight, tummy, kingdom, razor, singer, ashtray, fireworks, pliers, racer, uproar (zigzag, one of the few reduplicative words in English, is included here because the Balota et al. database lists it as having 2 morphemes).

Funniness and imputed iconicity

Here we study the link between funniness ratings and imputed iconicity ratings.

Compared to model m2.1 with just log frequency and lexical decision time as predictors, model m2.2 including imputed iconicity as predictor provides a significantly better fit and explains a larger portion of the variance.

Model m2.1: lm(formula = fun ~ logfreq + rt, data = words.setB)

predictor	df	SS	F	p	partial η²
logfreq	1	39.528	218.214	0	0.058
rt	1	20.487	113.100	0	0.031
Residuals	3574	647.400

Model m2.2: lm(formula = fun ~ logfreq + rt + ico_imputed, data = words.setB)

predictor	df	SS	F	p	partial η²
logfreq	1	39.528	245.736	0	0.064
rt	1	20.487	127.365	0	0.034
ico_imputed	1	72.669	451.769	0	0.112
Residuals	3573	574.731

model comparison

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
3574	647.3997
3573	574.7308	1	72.66885	451.7694	0

A partial correlation analysis shows that, controlling for frequency, imputed iconicity values correlate with funniness ratings at least as strongly as actual iconicity ratings do (r = 0.32, p < 0.0001).

Example words

High imputed funniness and high imputed iconicity: swish, chug, bop, gobble, smack, blip, whack, oomph, poke, wallop, funk, chuckle, quickie, wriggle, quiver, scamp, burp, hooky, oodles, weasel

Low imputed funniness and low imputed iconicity: subject, ransom, libel, bible, siege, hospice, conduct, arsenic, clothing, negro, mosque, typhoid, request, expense, author, length, anthrax, mandate, plaintiff, hostage

High funniness and low imputed iconicity: heifer, dinghy, cuckold, nudist, sheepdog, oddball, spam, harlot, getup, rickshaw, sac, kiwi, whorehouse, soiree, condom, plaything, croquet, charade, fiver, loch

Low funniness and high imputed iconicity: shudder, scrape, taps, fright, heartbeat, puncture, choke, tremor, biceps, glimpse, disgust, doom, stir, dent, scold, bully, reign, blister, check, horror

What about analysable compounds among high iconicity nouns? Here too about 10%, with examples like heartbeat, mouthful, handshake, bellboy, comeback, catchphrase.

Imputed funniness and imputed iconicity

Model m3.1: lm(formula = fun_imputed ~ logfreq + rt, data = words.setC)

predictor	df	SS	F	p	partial η²
logfreq	1	92.862	1018.093	0	0.050
rt	1	13.422	147.150	0	0.008
Residuals	19204	1751.629

Model m3.2: lm(formula = fun_imputed ~ logfreq + rt + ico_imputed, data = words.setC)

predictor	df	SS	F	p	partial η²
logfreq	1	92.862	1258.529	0	0.062
rt	1	13.422	181.901	0	0.009
ico_imputed	1	334.714	4536.279	0	0.191
Residuals	19203	1416.915

model comparison

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
19204	1751.629
19203	1416.915	1	334.7145	4536.279	0

Partial correlations show that imputed iconicity and imputed funniness correlate at r = 0.43 when word frequency is partialed out.

imputed funniness and imputed iconicity controlling for word frequency

estimate	p.value	statistic	n	gp	Method
0.4274166	0	119.302	63680	1	pearson

Example words

High imputed funniness and high imputed iconicity: whoosh, whirr, whooshing, brr, argh, chomp, whir, swoosh, brrr, zaps, squeaks, whirring, squelchy, gulps, smacking, growls, clanks, squish, whoo, clop

Low imputed funniness and low imputed iconicity: apr, dei, covenants, palestinians, covenant, clothier, palestinian, variant, mitochondria, israelis, serb, sufferers, herein, isotope, duration, ciudad, appellant, palestine, alexandria, infantrymen

High imputed funniness and low imputed iconicity: pigs, monkeys, herr, raja, franz, lulu, von, beau, caviar, penguins, elves, virgins, lesbians, fez, amuse, hawaiian, hens, salami, perverts, gertrude

Low imputed funniness and high imputed iconicity: slashes, gunshots, footstep, cries, footsteps, fade, froze, cr, swelter, crushing, piercing, shoots, breathing, sobs, tremors, strokes, choking, slammed, shocked, ng

What about compound nouns here? In the top 200 nouns we can spot ~5 (shockwave, doodlebug, flashbulb, backflip, footstep) but that is of course a tiny tail end of a much larger dataset than the earlier two.

A better way is to sample 200 random nouns from a proportionate slice of the data. Set C is about 17.8 times the size of set B (63,680 / 3,577), so the proportionate slice is the top 200 * 17.8 = 3560 nouns in imputed iconicity. In this subset we find at least 30 non-iconic analysable compounds: fireworm, deadbolt, footstep, pockmark, uppercut, woodwork, biotech, notepad, spellbinder, henchmen, quicksands, blowgun, heartbreaks, moonbeams, sketchpad, et cetera.

library(dplyr)

# Top 200 nouns by imputed iconicity (highest decile only)
words.setC %>%
  filter(ico_imputed_perc > 9,
         POS == "Noun") %>%
  arrange(desc(ico_imputed)) %>%
  slice(1:200) %>%
  pull(word)

# Random sample of 200 nouns from the top 3560; seed fixed for reproducibility
set.seed(1983)
words.setC %>%
  filter(ico_imputed_perc > 9,
         POS == "Noun") %>%
  arrange(desc(ico_imputed)) %>%
  slice(1:3560) %>%
  sample_n(200) %>%
  pull(word)

Structural properties of highly rated words

Log letter frequency

Mean iconicity and mean funniness are higher for lower log letter frequency quantiles:

Mean funniness and iconicity by log letter frequency quantiles

logletterfreq_perc	mean_ico	mean_fun
1	1.2562724	2.510892
2	1.0972947	2.434144
3	0.9435569	2.339590
4	0.7677072	2.313565
5	0.6163793	2.323666
6	0.7206575	2.286704
7	0.7950753	2.361308
8	0.8434129	2.284869
9	0.7531960	2.249879
10	0.5100479	2.273432
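A table like this one can be produced by binning words into deciles of log letter frequency and averaging the ratings within each bin. The sketch below shows the idea in base R on simulated data; the toy values and the use of `cut()` with quantile breaks are assumptions for illustration, and the notebook's own chunk (not shown in the output) may do this differently.

```r
# Simulated toy data: ratings that decrease slightly with log letter frequency
set.seed(7)
n <- 100
toy <- data.frame(logletterfreq = rnorm(n, mean = -3, sd = 0.3))
toy$ico <- 1.0 - 0.8 * (toy$logletterfreq + 3) + rnorm(n, sd = 0.5)
toy$fun <- 2.4 - 0.3 * (toy$logletterfreq + 3) + rnorm(n, sd = 0.3)

# Bin into deciles using quantile breaks (1 = lowest log letter frequency)
breaks <- quantile(toy$logletterfreq, probs = (0:10) / 10)
toy$logletterfreq_perc <- cut(toy$logletterfreq, breaks = breaks,
                              include.lowest = TRUE, labels = FALSE)

# Mean iconicity and funniness per decile
by_decile <- aggregate(cbind(ico, fun) ~ logletterfreq_perc,
                       data = toy, FUN = mean)
names(by_decile) <- c("logletterfreq_perc", "mean_ico", "mean_fun")
by_decile
```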

High-iconicity high-funniness words tend to have lower log letter frequencies:

Log letter frequency percentiles for upper quantiles of funniness + iconicity

word	fun	ico	diff_rank	logletterfreq_perc
zigzag	3.113636	4.300000	20	1
squeak	3.230769	4.230769	20	2
chirp	3.000000	4.142857	20	1
buzzer	2.833333	4.090909	19	1
pop	3.294118	4.076923	20	1
bleep	2.931818	3.928571	19	6
clunk	3.344828	3.928571	20	1
moo	3.700000	3.882353	20	4
clang	3.200000	3.857143	20	2
boom	2.829268	3.846154	19	1
bang	2.843750	3.833333	19	1
murmur	2.812500	3.833333	19	1
whirl	2.911765	3.818182	19	2
crunch	2.857143	3.785714	19	1
rip	2.827586	3.736842	19	2
sludge	2.875000	3.700000	19	2
ping	2.875000	3.636364	19	1
oink	3.871795	3.615385	20	3
zoom	3.043478	3.600000	20	1
smooch	3.333333	3.600000	20	3

Model comparison with funniness as the DV and log letter frequency as an additional predictor shows that a model including log letter frequency provides a significantly better fit.

Model m4.1: lm(formula = fun ~ logfreq + rt + ico, data = words %>% filter(set == "A"))

predictor	df	SS	F	p	partial η²
logfreq	1	36.143	258.779	0.000	0.155
rt	1	1.249	8.941	0.003	0.006
ico	1	8.891	63.661	0.000	0.043
Residuals	1415	197.628

Model m4.2: lm(formula = fun ~ logfreq + rt + ico + logletterfreq, data = words %>% filter(set == "A"))

predictor	df	SS	F	p	partial η²
logfreq	1	36.143	265.179	0.000	0.158
rt	1	1.249	9.162	0.003	0.006
ico	1	8.891	65.236	0.000	0.044
logletterfreq	1	4.906	35.994	0.000	0.025
Residuals	1414	192.722

model comparison

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
1415	197.6281
1414	192.7222	1	4.905856	35.9942	0

Partial correlations show that funniness ratings and log letter frequency correlate at r = -0.157 when controlling for iconicity, and that iconicity and log letter frequency correlate at r = -0.163 when controlling for funniness ratings (all p < 0.0001, correcting for multiple comparisons).

funniness and log letter frequency controlling for iconicity

estimate	p.value	statistic	n	gp	Method
-0.157001	0	-5.982098	1419	1	pearson

iconicity and log letter frequency controlling for funniness

estimate	p.value	statistic	n	gp	Method
-0.1634579	0	-6.234739	1419	1	pearson

Model comparison for combined funniness and iconicity scores suggests that having log letter frequency as a predictor significantly improves fit over and above word frequency and lexical decision time.

Model m5.1: lm(formula = funico ~ logfreq + rt, data = words.setA)

predictor	df	SS	F	p	partial η²
logfreq	1	420.245	206.078	0.0	0.127
rt	1	5.516	2.705	0.1	0.002
Residuals	1416	2887.579

Model m5.2: lm(formula = funico ~ logfreq + rt + logletterfreq, data = words.setA)

predictor	df	SS	F	p	partial η²
logfreq	1	420.245	219.963	0.00	0.135
rt	1	5.516	2.887	0.09	0.002
logletterfreq	1	184.189	96.407	0.00	0.064
Residuals	1415	2703.390

model comparison

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
1416	2887.579
1415	2703.390	1	184.1887	96.40745	0

Structural analysis

We carry out a qualitative analysis of the 80 highest ranked words (top deciles for funniness+iconicity) to see if there are formal cues of foregrounding and structural markedness that can help predict funniness and iconicity ratings. Then we find these cues in the larger dataset and see if the patterns hold up.

This analysis reveals the following sets of complex onsets, codas, and verbal diminutive suffixes that are likely structural cues of markedness (given here in the form of regular expressions):

onsets: ^(bl|cl|cr|dr|fl|sc|sl|sn|sp|spl|sw|tr|pr|sq)
codas: (nch|mp|nk|rt|rl|rr|sh|wk)$
verbal suffix: ([b-df-hj-np-tv-xz]le)$ (i.e., look for -le after a consonant)

We tag these cues across the whole dataset (looking for the -le suffix only in verbs because words like mutable, unnameable, scalable, manacle are not the same phenomenon) in order to see how they relate to funniness and iconicity.
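The tagging step can be sketched with `grepl()` and the three regular expressions given above. The function name `tag_markedness()`, the `is_verb` argument and the toy part-of-speech tags are hypothetical. Treating cumulative markedness as the simple sum of the three cues is our assumption, though it matches the values shown for words like squish, brr and clunks in the tables in this section.

```r
# Regular expressions for the three structural cues (from the text above)
onset_re   <- "^(bl|cl|cr|dr|fl|sc|sl|sn|sp|spl|sw|tr|pr|sq)"
coda_re    <- "(nch|mp|nk|rt|rl|rr|sh|wk)$"
verbdim_re <- "([b-df-hj-np-tv-xz]le)$"

# Tag each word for the cues; the -le suffix only counts for verbs
tag_markedness <- function(word, is_verb) {
  onset   <- grepl(onset_re, word)
  coda    <- grepl(coda_re, word)
  verbdim <- is_verb & grepl(verbdim_re, word)
  data.frame(word, onset, coda, verbdim,
             cumulative = onset + coda + verbdim)  # assumed: sum of cues
}

toy <- tag_markedness(
  word    = c("squish", "brr", "clunks", "snaffle", "table"),
  is_verb = c(TRUE, FALSE, FALSE, TRUE, FALSE)     # hypothetical POS tags
)
toy$cumulative   # 2 1 1 2 0
```

Note that "table" ends in -le after a consonant but scores 0 because it is not tagged as a verb, which is the reason the suffix check is restricted by part of speech.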

We model the contribution of markedness relative to log letter frequency. Model comparison shows that a model including the measure of cumulative markedness as a predictor provides a significantly better fit (F = 52.78, p < 0.0001) and explains a larger portion of the variance (adjusted R² = 0.21 vs. 0.18) than a model with just word frequency, lexical decision time and log letter frequency.

Model m5.3: lm(formula = funico ~ logfreq + rt + logletterfreq + cumulative, data = words.setA)

predictor	df	SS	F	p	partial η²
logfreq	1	420.245	228.013	0.000	0.139
rt	1	5.516	2.993	0.084	0.002
logletterfreq	1	184.189	99.936	0.000	0.066
cumulative	1	97.283	52.783	0.000	0.036
Residuals	1414	2606.107

Model comparison of m5.2 and m5.3

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
1415	2703.390
1414	2606.107	1	97.28312	52.78307	0
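The adjusted R² values for m5.2 and m5.3 can be recovered from the sums of squares in the tables. The sketch below assumes n = 1419 (the size of set A, consistent with the residual degrees of freedom) and uses a small helper, `adj_r2()`, that is ours rather than part of the notebook.

```r
# Adjusted R squared from residual and total sums of squares
adj_r2 <- function(rss, tss, df_resid, n) {
  1 - (rss / df_resid) / (tss / (n - 1))
}

n   <- 1419                                     # words in set A
tss <- 420.245 + 5.516 + 184.189 + 2703.390     # total SS, from the m5.2 table

adj_m5.2 <- adj_r2(rss = 2703.390, tss = tss, df_resid = 1415, n = n)
adj_m5.3 <- adj_r2(rss = 2606.107, tss = tss, df_resid = 1414, n = n)

round(c(adj_m5.2, adj_m5.3), 2)   # 0.18 0.21
```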

Now we trace cumulative markedness in the imputed portions of the dataset, and do the same model comparison as above.

First have a look at a random sample of top imputed words and their markedness:

Cumulative markedness in a random sample of words from the highest quantile of imputed iconicity

word	ico_imputed_perc	ico_imputed	cumulative
brr	10	4.065481	1
squish	10	3.570914	2
clunks	10	2.965891	1
scamp	10	2.397420	2
spank	10	2.360551	2
squoosh	10	2.312993	2
crunk	10	2.165342	2
sw	10	1.898677	1
flipping	10	1.875252	1
flatly	10	1.819130	1
crumping	10	1.737702	1
flourish	10	1.722582	2
crispy	10	1.721382	1
snappish	10	1.612547	2
flush	10	1.598260	2
scrumptious	10	1.491476	1
blank	10	1.435437	2
tramp	10	1.426469	2
speakeasy	10	1.393087	1
scornfully	10	1.366685	1

And at a random sample of words from lower quantiles and their markedness:

Cumulative markedness in a random sample of words from lower quantiles of imputed iconicity

word	ico_imputed_perc	ico_imputed	cumulative
spoilsport	7	0.7898544	2
draughted	7	0.7022719	1
drank	6	0.6164557	2
sweetfish	6	0.5390918	2
protectress	6	0.5275526	1
transmits	5	0.4734109	1
schrank	5	0.4676592	2
blamed	5	0.4105656	1
flatfish	5	0.3870658	2
crystallised	5	0.3525224	1
trench	4	0.2827599	2
preshrunk	3	0.1581131	2
spectroscopic	3	0.1460491	1
splendours	3	0.0495967	1
prelaunch	3	0.0355877	2
spearfish	2	0.0134083	2
flemish	2	-0.0374782	2
flamingos	2	-0.0452134	1
cryptography	2	-0.1193397	1
triangulation	2	-0.1590728	1

It looks like random samples of 20 words high in cumulative markedness tend to feature a majority of high iconicity words:

Imputed ratings for 20 random words high in cumulative markedness

word	ico_imputed_perc	ico_imputed	fun_imputed	cumulative
squoosh	10	2.3129932	2.945588	2
squirt	10	2.1139378	3.302116	2
crump	10	1.7575309	3.072162	2
snaffle	10	1.6898653	3.005494	2
crank	10	1.6032802	2.772123	2
flush	10	1.5982598	2.668945	2
clump	10	1.5840648	2.744887	2
spangle	10	1.5792685	3.046803	2
scribble	10	1.5335878	2.832759	2
swink	10	1.5061947	3.015406	2
tramp	10	1.4264688	2.899633	2
slapdash	9	1.3323017	2.544342	2
prank	8	0.9101965	3.091282	2
crawfish	8	0.8564163	2.726631	2
sweetheart	8	0.8273857	2.711589	2
spinsterish	6	0.5475747	2.329725	2
scrimp	5	0.4658493	2.534127	2
flatfish	5	0.3870658	2.628965	2
prelaunch	3	0.0355877	2.304523	2
scottish	1	-0.2946582	2.556597	2

Let’s have a closer look at subsets. First quartiles, then deciles.

Markedness cues across quartiles of imputed iconicity

target_perc	n	onset	coda	verbdim	complexity
1	15920	0.0639447	0.0060302	0.0003769	0.0703518
2	15920	0.0731784	0.0076005	0.0009422	0.0817211
3	15920	0.0923367	0.0097362	0.0009422	0.1030151
4	15920	0.1583543	0.0155151	0.0049623	0.1788317

Markedness cues across deciles of imputed iconicity

target_perc	n	onset	coda	verbdim	complexity
1	6368	0.0565327	0.0061244	0.0003141	0.0629711
2	6368	0.0684673	0.0045540	0.0004711	0.0734925
3	6368	0.0714510	0.0084799	0.0001570	0.0800879
4	6368	0.0675251	0.0069095	0.0010992	0.0755339
5	6368	0.0788317	0.0080088	0.0012563	0.0880967
6	6368	0.0819724	0.0078518	0.0006281	0.0904523
7	6368	0.0978329	0.0105214	0.0010992	0.1094535
8	6368	0.1103957	0.0111495	0.0025126	0.1240578
9	6368	0.1350503	0.0144472	0.0028266	0.1523241
10	6368	0.2014761	0.0191583	0.0076947	0.2283291

Comparison of models with combined imputed funniness and iconicity as a dependent variable shows that a linear model including cumulative markedness as a predictor provides a significantly better fit (F(1, 19202) = 337.3, p < 0.0001) and explains a little more of the variance (adjusted R² = 0.124 vs. 0.109) than a model with just word frequency, lexical decision time and log letter frequency.

Model m5.4: lm(formula = funico_imputed ~ logfreq + rt + logletterfreq, data = words.setC)

predictor	df	SS	F	p	partial η²
logfreq	1	1025.303	361.774	0.000	0.018
rt	1	4.778	1.686	0.194	0.000
logletterfreq	1	5608.765	1979.029	0.000	0.093
Residuals	19203	54423.210

Model m5.5: lm(formula = funico_imputed ~ logfreq + rt + logletterfreq + cumulative, data = words.setC)

predictor	df	SS	F	p	partial η²
logfreq	1	1025.303	368.110	0.00	0.019
rt	1	4.778	1.716	0.19	0.000
logletterfreq	1	5608.765	2013.687	0.00	0.095
cumulative	1	939.486	337.299	0.00	0.017
Residuals	19202	53483.724

model comparison

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
19203	54423.21
19202	53483.72	1	939.4858	337.299	0

End

Thanks for your interest. Also see the separate code notebook with supplementary analyses.

If you find this useful, consider checking out the following resources that have been helpful in preparing this Rmarkdown document:

Two of my own past projects (remember, the person most grateful for your well-documented past code is future you):
- Expressiveness and grammatical integration (by Mark Dingemanse)
- Coloured vowels: open data and code (by Mark Dingemanse & Christine Cuskley)
Formatting ANOVA tables in R (by Rose Hartman, Understanding Data)
Iconicity in the speech of children and adults (by Bodo Winter)
English letter frequencies

And of course have a look at the paper itself — latest preprint here: Playful iconicity
