
Playful iconicity: data & analyses

Mark Dingemanse & Bill Thompson (this version: 2019-12-03)

Introduction

This code notebook provides a fully reproducible workflow for the paper Playful iconicity: Structural markedness underlies the relation between funniness and iconicity. To increase readability, not all code chunks present in the .Rmd source are shown in the output. A separate code notebook has the supplementary analyses.

Data sources

Primary data sources:

  • iconicity ratings: Perry, Lynn K. et al. Iconicity in the Speech of Children and Adults. Developmental Science. doi:10.1111/desc.12572
  • funniness ratings: Engelthaler, Tomas, and Thomas T. Hills. 2017. Humor Norms for 4,997 English Words. Behavior Research Methods, July, 1-9. doi:10.3758/s13428-017-0930-6

We use these ratings in our analyses, but we also feed them to our imputation method, which regresses the human ratings against semantic vectors in order to generate imputed ratings for an additional 63,680 words.
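The imputation step can be sketched in a few lines of R. This is a toy illustration with simulated data, not the actual pipeline: the paper regresses human ratings on distributed semantic vectors, which we mimic here with random vectors and a plain linear model.

```r
# Toy sketch of the imputation idea (simulated data, not the real vectors):
# regress human ratings on semantic vector dimensions, then predict
# ratings for words that were never rated.
set.seed(1)
vecs <- matrix(rnorm(100 * 5), nrow = 100)  # 100 "words" x 5 toy dimensions
rated <- 1:60                               # indices of words with human ratings
ratings <- as.vector(vecs[rated, ] %*% rnorm(5)) + rnorm(60, sd = 0.1)

fit <- lm(ratings ~ vecs[rated, ])                  # ratings ~ vector dimensions
imputed <- cbind(1, vecs[-rated, ]) %*% coef(fit)   # predict the other 40 words
```

In the actual workflow the predictors are high-dimensional word embeddings (and the regression is typically regularized), but the logic is the same: fit on rated words, predict for unrated ones.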

Secondary data sources:

  • number of morphemes: Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., … Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459. doi: 10.3758/BF03193014
  • word frequency: Brysbaert, M., New, B., & Keuleers, E. (2012). Adding part-of-speech information to the SUBTLEX-US word frequencies. Behavior Research Methods, 44(4), 991–997. doi: 10.3758/s13428-012-0190-4
  • lexical decision times: Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287-304. doi: 10.3758/s13428-011-0118-4
  • phonotactic measures: Vaden, K.I., Halpin, H.R., Hickok, G.S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0. [Data file]. Available from http://www.iphod.com.

Secondary data sources used in supplementary analyses:

  • valence, arousal and dominance: Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191-1207
  • age of acquisition: Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978-990. doi: 10.3758/s13428-012-0210-4

After collating these data sources we add a range of summary variables, mainly for easy plotting and subset selection.

words <- words %>%
  mutate(fun_perc = ntile(fun,10),
         fun_resid_perc = ntile(fun_resid,10),
         ico_perc = ntile(ico,10),
         diff_rank = fun_perc + ico_perc,
         ico_imputed_perc = ntile(ico_imputed,10),
         fun_imputed_perc = ntile(fun_imputed,10),
         fun_imputed_resid_perc = ntile(fun_imputed_resid,10),
         diff_rank_setB = fun_perc + ico_imputed_perc,
         diff_rank_setC = fun_imputed_perc + ico_imputed_perc,
         diff_rank_setD = fun_imputed_perc + ico_perc,
         logletterfreq_perc = ntile(logletterfreq,10),
         dens_perc = ntile(unsDENS,10),
         biphone_perc = ntile(unsBPAV,10),
         triphone_perc = ntile(unsTPAV,10),
         posprob_perc = ntile(unsPOSPAV,10),
         valence_perc = ntile(valence,10))

Descriptive data

We have 4,996 words rated for funniness, 2,945 rated for iconicity, and 1,419 in the intersection (set A). We have 3,577 words with human funniness ratings and imputed iconicity ratings (set B). We have imputed data for a total of 70,202 words, and we’re venturing outside the realm of rated words for 63,680 of them (set C).

(We also have 1,526 words with human iconicity ratings and imputed funniness ratings in set D, the mirror image of set B; this set is not used in the paper but is reported on in the Supplementary Analyses below.)

| set | n     |
|-----|-------|
| A   | 1419  |
| B   | 3577  |
| C   | 63680 |
| D   | 1526  |

## # A tibble: 3 x 9
##   word    ico   fun ico_perc fun_perc ico_imputed fun_imputed
##   <chr> <dbl> <dbl>    <int>    <int>       <dbl>       <dbl>
## 1 wigg~   2.6  3.52       10       10        3.37        3.39
## 2 wobb~   2.4  3.15        9       10        3.06        3.11
## 3 wagg~  NA   NA          NA       NA        2.37        3.42
## # ... with 2 more variables: ico_imputed_perc <int>,
## #   fun_imputed_perc <int>

The most important columns in the data are shown below for set A. Sets B and C feature ico_imputed and fun_imputed instead of or in addition to the human ratings. The field diff_rank is the sum of fun and ico deciles for a given word: a word with diff_rank 2 occurs in the first decile (lowest 10%) of both funniness and iconicity ratings, and a word with diff_rank 20 occurs in the 10th decile (highest 10%) of both.
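As a concrete illustration of diff_rank (toy data, using ntile() as in the preprocessing chunk above): a word must sit in the top decile of both measures to reach the maximum of 20.

```r
library(dplyr)

# Toy data: diff_rank is the sum of funniness and iconicity deciles,
# so it ranges from 2 (bottom decile on both) to 20 (top decile on both).
set.seed(2)
toy <- tibble(word = paste0("w", 1:100),
              fun = runif(100),
              ico = runif(100)) %>%
  mutate(fun_perc = ntile(fun, 10),
         ico_perc = ntile(ico, 10),
         diff_rank = fun_perc + ico_perc)

range(toy$diff_rank)
```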

Structure of the data

| word    | ico       | fun      | logletterfreq | logfreq  | rt       | nmorph | diff_rank |
|---------|-----------|----------|---------------|----------|----------|--------|-----------|
| flop    | 3.142857  | 3.031250 | -3.223260     | 2.075547 | 587.9189 | 1      | 20        |
| whine   | 2.666667  | 2.833333 | -2.706352     | 1.924279 | 588.6667 | 1      | 19        |
| slip    | 2.615385  | 2.586207 | -2.978875     | 3.120903 | 546.2000 | 1      | 18        |
| sigh    | 2.800000  | 2.535714 | -2.941718     | 2.240549 | 577.4595 | 1      | 17        |
| must    | 1.500000  | 2.636364 | -2.952673     | 4.552206 | 569.8056 | 1      | 16        |
| frog    | 2.181818  | 2.440000 | -3.097211     | 2.781037 | 533.2051 | 1      | 15        |
| moose   | 0.300000  | 3.103448 | -2.622790     | 2.451786 | 550.5263 | 1      | 14        |
| stretch | 2.400000  | 2.187500 | -2.616122     | 2.874482 | 567.4615 | 1      | 13        |
| block   | 2.428571  | 2.153846 | -3.366796     | 3.315551 | 537.0000 | 1      | 12        |
| lark    | -0.900000 | 3.025641 | -3.000745     | 1.924279 | 607.9375 | 1      | 11        |

Figures

For a quick impression of the main findings, this section reproduces the figures from the paper.

Figure 1: Overview

Figure 3: Funniness and iconicity

Figure 4: Highest rated words

Figure 5: Structural markedness

Main analyses

Funniness and iconicity

Reproducing prior results

Engelthaler & Hills report frequency as the strongest correlate with funniness (less frequent words are rated as funnier), and lexical decision RT as the second strongest (words with slower RTs are rated as funnier). As a sanity check, let’s replicate their analysis.

Raw correlations hover around 28%, as reported (without corrections or controls) in their paper. A linear model with funniness as dependent variable and frequency and RT as predictors shows a role for both, though frequency accounts for a much larger portion of the variance (8.3%) than RT (2%).

To what extent do frequency and RT predict funniness?

Model m0: lm(formula = fun ~ logfreq + rt, data = words %>% drop_na(fun))

| predictor | df   | SS      | F       | p | partial η² |
|-----------|------|---------|---------|---|------------|
| logfreq   | 1    | 78.329  | 454.096 | 0 | 0.083      |
| rt        | 1    | 17.315  | 100.380 | 0 | 0.020      |
| Residuals | 4993 | 861.264 |         |   |            |
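The partial η² column can be recovered from the SS column: for each predictor, η²p = SS_effect / (SS_effect + SS_residual). Checking this against the m0 table:

```r
# Recompute partial eta-squared from the m0 ANOVA table:
# eta_p^2 = SS_effect / (SS_effect + SS_residual)
ss_logfreq <- 78.329
ss_rt      <- 17.315
ss_resid   <- 861.264

round(ss_logfreq / (ss_logfreq + ss_resid), 3)  # 0.083, as in the table
round(ss_rt / (ss_rt + ss_resid), 3)            # 0.020, as in the table
```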

Known knowns

If frequency and RT explain some of the variance in funniness ratings, how much is left for iconicity? We’ll do this analysis on the core set of 1419 words for which we have funniness and iconicity ratings.

It turns out that the magnitude of the iconicity estimate is about half that of frequency, with a positive sign instead of a negative one (higher funniness ratings go with higher iconicity ratings). The effect of iconicity is much larger than that of RT, the second most important correlate reported by Engelthaler & Hills.

Model m1.1: lm(formula = fun ~ logfreq + rt, data = words %>% filter(set == "A"))

| predictor | df   | SS      | F       | p     | partial η² |
|-----------|------|---------|---------|-------|------------|
| logfreq   | 1    | 36.143  | 247.813 | 0.000 | 0.149      |
| rt        | 1    | 1.249   | 8.562   | 0.003 | 0.006      |
| Residuals | 1416 | 206.519 |         |       |            |

Model m1.2: lm(formula = fun ~ logfreq + rt + ico, data = words %>% filter(set == "A"))

| predictor | df   | SS      | F       | p     | partial η² |
|-----------|------|---------|---------|-------|------------|
| logfreq   | 1    | 36.143  | 258.779 | 0.000 | 0.155      |
| rt        | 1    | 1.249   | 8.941   | 0.003 | 0.006      |
| ico       | 1    | 8.891   | 63.661  | 0.000 | 0.043      |
| Residuals | 1415 | 197.628 |         |       |            |

model comparison of m1.1 and m1.2

| Res.Df | RSS      | Df | Sum of Sq | F        | Pr(>F) |
|--------|----------|----|-----------|----------|--------|
| 1416   | 206.5194 |    |           |          |        |
| 1415   | 197.6281 | 1  | 8.891332  | 63.66118 | 0      |
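The model comparisons throughout this notebook are incremental F tests on nested linear models, which in R is a call to anova() with both fits. A minimal self-contained sketch with toy data:

```r
# Nested model comparison with anova(): does adding x2 improve fit?
set.seed(42)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

m_small <- lm(y ~ x1, data = d)
m_large <- lm(y ~ x1 + x2, data = d)

comp <- anova(m_small, m_large)  # F test on the one extra predictor
comp
```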

Partial correlations show a correlation of r = .21 between funniness and iconicity when log frequency is partialed out as a mediator. This shows that the relation between funniness and iconicity is not reducible to frequency alone.

funniness and iconicity controlling for word frequency

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.2064276 | 0       | 7.938811  | 1419 | 1  | pearson |
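The partial correlations reported here (presumably computed with a package such as ppcor) can be reproduced with base R by correlating residuals: r(x, y | z) is the correlation between the residuals of x ~ z and y ~ z. A toy sketch:

```r
# Partial correlation via residualization: correlate what is left of x
# and y after regressing each on the control variable z.
set.seed(7)
z <- rnorm(200)
x <- z + rnorm(200)   # x and y are correlated only through z
y <- z + rnorm(200)

r_raw     <- cor(x, y)
r_partial <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
c(raw = r_raw, partial = r_partial)
```

Here the partial correlation is close to zero because the x–y association runs entirely through z; in the funniness–iconicity case a sizable correlation remains after partialing out frequency.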

Example words

Both high: zigzag, squeak, chirp, pop, clunk, moo, clang, oink, zoom, smooch, babble, squawk, thud, gush, fluff, flop, waddle, giggle, tinkle, ooze

Both low: silent, statement, poor, cellar, incest, window, lie, coffin, platform, address, slave, wait, year, case

High funniness, low iconicity: belly, buttocks, beaver, chipmunk, turkey, bra, hippo, chimp, blonde, penis, pun, dingo, trombone, deuce, lark, gander, magpie, tongue, giraffe, hoe

High iconicity, low funniness: click, roar, crash, chime, scratch, swift, sunshine, low, break, clash, shoot, airplane, dread

N.B. controlling for frequency in these lists (by using fun_resid instead of fun) does not change the rankings, so we do not do this here or elsewhere.

What about compound nouns among high-iconicity words? From eyeballing, compounds make up about 10% of the 200 highest-rated nouns. Many probable examples can be found by looking at highly rated nouns with multiple morphemes: zigzag, buzzer, skateboard, sunshine, zipper, freezer, snowball, juggler, airplane, bedroom, goldfish, seaweed, lipstick, mixer, corkscrew, doorknob, killer, moonlight, tummy, kingdom, razor, singer, ashtray, fireworks, pliers, racer, uproar (zigzag, one of the few reduplicative words in English, is included here because the Balota et al. database lists it as having 2 morphemes).

Funniness and imputed iconicity

Here we study the link between funniness ratings and imputed iconicity ratings.

Compared to model m2.1 with just log frequency and lexical decision time as predictors, model m2.2 including imputed iconicity as predictor provides a significantly better fit and explains a larger portion of the variance.

Model m2.1: lm(formula = fun ~ logfreq + rt, data = words.setB)

| predictor | df   | SS      | F       | p | partial η² |
|-----------|------|---------|---------|---|------------|
| logfreq   | 1    | 39.528  | 218.214 | 0 | 0.058      |
| rt        | 1    | 20.487  | 113.100 | 0 | 0.031      |
| Residuals | 3574 | 647.400 |         |   |            |

Model m2.2: lm(formula = fun ~ logfreq + rt + ico_imputed, data = words.setB)

| predictor   | df   | SS      | F       | p | partial η² |
|-------------|------|---------|---------|---|------------|
| logfreq     | 1    | 39.528  | 245.736 | 0 | 0.064      |
| rt          | 1    | 20.487  | 127.365 | 0 | 0.034      |
| ico_imputed | 1    | 72.669  | 451.769 | 0 | 0.112      |
| Residuals   | 3573 | 574.731 |         |   |            |

model comparison

| Res.Df | RSS      | Df | Sum of Sq | F        | Pr(>F) |
|--------|----------|----|-----------|----------|--------|
| 3574   | 647.3997 |    |           |          |        |
| 3573   | 574.7308 | 1  | 72.66885  | 451.7694 | 0      |

A partial correlation analysis shows that imputed iconicity values correlate with funniness ratings at least at the level of the actual iconicity ratings, controlling for frequency (r = 0.32, p < 0.0001).

Example words

High imputed funniness and high imputed iconicity: swish, chug, bop, gobble, smack, blip, whack, oomph, poke, wallop, funk, chuckle, quickie, wriggle, quiver, scamp, burp, hooky, oodles, weasel

Low imputed funniness and low imputed iconicity: subject, ransom, libel, bible, siege, hospice, conduct, arsenic, clothing, negro, mosque, typhoid, request, expense, author, length, anthrax, mandate, plaintiff, hostage

High funniness and low imputed iconicity: heifer, dinghy, cuckold, nudist, sheepdog, oddball, spam, harlot, getup, rickshaw, sac, kiwi, whorehouse, soiree, condom, plaything, croquet, charade, fiver, loch

Low funniness and high imputed iconicity: shudder, scrape, taps, fright, heartbeat, puncture, choke, tremor, biceps, glimpse, disgust, doom, stir, dent, scold, bully, reign, blister, check, horror

What about analysable compounds among high iconicity nouns? Here too about 10%, with examples like heartbeat, mouthful, handshake, bellboy, comeback, catchphrase.

Imputed funniness and imputed iconicity

Model 3.1: lm(formula = fun_imputed ~ logfreq + rt, data = words.setC)

| predictor | df    | SS       | F        | p | partial η² |
|-----------|-------|----------|----------|---|------------|
| logfreq   | 1     | 92.862   | 1018.093 | 0 | 0.050      |
| rt        | 1     | 13.422   | 147.150  | 0 | 0.008      |
| Residuals | 19204 | 1751.629 |          |   |            |

Model 3.2: lm(formula = fun_imputed ~ logfreq + rt + ico_imputed, data = words.setC)

| predictor   | df    | SS       | F        | p | partial η² |
|-------------|-------|----------|----------|---|------------|
| logfreq     | 1     | 92.862   | 1258.529 | 0 | 0.062      |
| rt          | 1     | 13.422   | 181.901  | 0 | 0.009      |
| ico_imputed | 1     | 334.714  | 4536.279 | 0 | 0.191      |
| Residuals   | 19203 | 1416.915 |          |   |            |

model comparison

| Res.Df | RSS      | Df | Sum of Sq | F        | Pr(>F) |
|--------|----------|----|-----------|----------|--------|
| 19204  | 1751.629 |    |           |          |        |
| 19203  | 1416.915 | 1  | 334.7145  | 4536.279 | 0      |

Partial correlations show that imputed iconicity and imputed funniness correlate at r = .43 controlling for word frequency.

imputed funniness and imputed iconicity controlling for word frequency

| estimate  | p.value | statistic | n     | gp | Method  |
|-----------|---------|-----------|-------|----|---------|
| 0.4274166 | 0       | 119.302   | 63680 | 1  | pearson |

Example words

High imputed funniness and high imputed iconicity: whoosh, whirr, whooshing, brr, argh, chomp, whir, swoosh, brrr, zaps, squeaks, whirring, squelchy, gulps, smacking, growls, clanks, squish, whoo, clop

Low imputed funniness and low imputed iconicity: apr, dei, covenants, palestinians, covenant, clothier, palestinian, variant, mitochondria, israelis, serb, sufferers, herein, isotope, duration, ciudad, appellant, palestine, alexandria, infantrymen

High imputed funniness and low imputed iconicity: pigs, monkeys, herr, raja, franz, lulu, von, beau, caviar, penguins, elves, virgins, lesbians, fez, amuse, hawaiian, hens, salami, perverts, gertrude

Low imputed funniness and high imputed iconicity: slashes, gunshots, footstep, cries, footsteps, fade, froze, cr, swelter, crushing, piercing, shoots, breathing, sobs, tremors, strokes, choking, slammed, shocked, ng

What about compound nouns here? In the top 200 nouns we can spot ~5 (shockwave, doodlebug, flashbulb, backflip, footstep) but that is of course a tiny tail end of a much larger dataset than the earlier two.

A better way is to sample 200 random nouns from a proportionate slice of the data, i.e. 200 * 17.8 = 3560 top nouns in imputed iconicity. In this subset we find at least 30 non-iconic analysable compounds: fireworm, deadbolt, footstep, pockmark, uppercut, woodwork, biotech, notepad, spellbinder, henchmen, quicksands, blowgun, heartbreaks, moonbeams, sketchpad, et cetera.

words.setC %>% 
  filter(ico_imputed_perc > 9,
         POS == "Noun") %>%
  arrange(-ico_imputed) %>%
  slice(1:200) %>%
  dplyr::select(word) %>% unlist %>% unname() 

set.seed(1983)
words.setC %>% 
  filter(ico_imputed_perc > 9,
         POS == "Noun") %>%
  arrange(-ico_imputed) %>%
  slice(1:3560) %>%
  sample_n(200) %>%
  dplyr::select(word) %>% unlist %>% unname() 

Structural properties of highly rated words

Log letter frequency

Mean iconicity and mean funniness are higher for lower log letter frequency quantiles:

Mean funniness and iconicity by log letter frequency quantiles

| logletterfreq_perc | mean_ico  | mean_fun |
|--------------------|-----------|----------|
| 1                  | 1.2562724 | 2.510892 |
| 2                  | 1.0972947 | 2.434144 |
| 3                  | 0.9435569 | 2.339590 |
| 4                  | 0.7677072 | 2.313565 |
| 5                  | 0.6163793 | 2.323666 |
| 6                  | 0.7206575 | 2.286704 |
| 7                  | 0.7950753 | 2.361308 |
| 8                  | 0.8434129 | 2.284869 |
| 9                  | 0.7531960 | 2.249879 |
| 10                 | 0.5100479 | 2.273432 |
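The quantile summary above can be computed by binning log letter frequency into deciles and averaging per bin. A base-R sketch with toy data (the notebook itself uses dplyr's ntile() and summarise()):

```r
# Bin a predictor into deciles and compute per-decile means (toy data).
set.seed(3)
d <- data.frame(logletterfreq = rnorm(500),
                ico = rnorm(500),
                fun = rnorm(500))

# Decile bins: a rank-based equivalent of dplyr::ntile(x, 10).
d$llf_perc <- ceiling(10 * rank(d$logletterfreq) / nrow(d))

aggregate(cbind(ico, fun) ~ llf_perc, data = d, FUN = mean)
```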

High-iconicity high-funniness words tend to have lower log letter frequencies:

Log letter frequency percentiles for upper quantiles of funniness + iconicity

| word   | fun      | ico      | diff_rank | logletterfreq_perc |
|--------|----------|----------|-----------|--------------------|
| zigzag | 3.113636 | 4.300000 | 20        | 1                  |
| squeak | 3.230769 | 4.230769 | 20        | 2                  |
| chirp  | 3.000000 | 4.142857 | 20        | 1                  |
| buzzer | 2.833333 | 4.090909 | 19        | 1                  |
| pop    | 3.294118 | 4.076923 | 20        | 1                  |
| bleep  | 2.931818 | 3.928571 | 19        | 6                  |
| clunk  | 3.344828 | 3.928571 | 20        | 1                  |
| moo    | 3.700000 | 3.882353 | 20        | 4                  |
| clang  | 3.200000 | 3.857143 | 20        | 2                  |
| boom   | 2.829268 | 3.846154 | 19        | 1                  |
| bang   | 2.843750 | 3.833333 | 19        | 1                  |
| murmur | 2.812500 | 3.833333 | 19        | 1                  |
| whirl  | 2.911765 | 3.818182 | 19        | 2                  |
| crunch | 2.857143 | 3.785714 | 19        | 1                  |
| rip    | 2.827586 | 3.736842 | 19        | 2                  |
| sludge | 2.875000 | 3.700000 | 19        | 2                  |
| ping   | 2.875000 | 3.636364 | 19        | 1                  |
| oink   | 3.871795 | 3.615385 | 20        | 3                  |
| zoom   | 3.043478 | 3.600000 | 20        | 1                  |
| smooch | 3.333333 | 3.600000 | 20        | 3                  |

Model comparison with funniness as the DV and log letter frequency as an additional predictor shows that a model including log letter frequency provides a significantly better fit.

Model m4.1: lm(formula = fun ~ logfreq + rt + ico, data = words %>% filter(set == "A"))

| predictor | df   | SS      | F       | p     | partial η² |
|-----------|------|---------|---------|-------|------------|
| logfreq   | 1    | 36.143  | 258.779 | 0.000 | 0.155      |
| rt        | 1    | 1.249   | 8.941   | 0.003 | 0.006      |
| ico       | 1    | 8.891   | 63.661  | 0.000 | 0.043      |
| Residuals | 1415 | 197.628 |         |       |            |

Model m4.2: lm(formula = fun ~ logfreq + rt + ico + logletterfreq, data = words %>% filter(set == "A"))

| predictor     | df   | SS      | F       | p     | partial η² |
|---------------|------|---------|---------|-------|------------|
| logfreq       | 1    | 36.143  | 265.179 | 0.000 | 0.158      |
| rt            | 1    | 1.249   | 9.162   | 0.003 | 0.006      |
| ico           | 1    | 8.891   | 65.236  | 0.000 | 0.044      |
| logletterfreq | 1    | 4.906   | 35.994  | 0.000 | 0.025      |
| Residuals     | 1414 | 192.722 |         |       |            |

model comparison

| Res.Df | RSS      | Df | Sum of Sq | F       | Pr(>F) |
|--------|----------|----|-----------|---------|--------|
| 1415   | 197.6281 |    |           |         |        |
| 1414   | 192.7222 | 1  | 4.905856  | 35.9942 | 0      |

Partial correlations show that funniness and log letter frequency correlate at r = -.157 controlling for iconicity, and that iconicity and log letter frequency correlate at r = -.163 controlling for funniness (all p < 0.0001, corrected for multiple comparisons).

funniness and log letter frequency controlling for iconicity

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| -0.157001 | 0       | -5.982098 | 1419 | 1  | pearson |

iconicity and log letter frequency controlling for funniness

| estimate   | p.value | statistic | n    | gp | Method  |
|------------|---------|-----------|------|----|---------|
| -0.1634579 | 0       | -6.234739 | 1419 | 1  | pearson |

Model comparison for combined funniness and iconicity scores suggests that having log letter frequency as a predictor significantly improves fit over and above word frequency and lexical decision time.

Model m5.1: lm(formula = funico ~ logfreq + rt, data = words.setA)

| predictor | df   | SS       | F       | p   | partial η² |
|-----------|------|----------|---------|-----|------------|
| logfreq   | 1    | 420.245  | 206.078 | 0.0 | 0.127      |
| rt        | 1    | 5.516    | 2.705   | 0.1 | 0.002      |
| Residuals | 1416 | 2887.579 |         |     |            |

Model m5.2: lm(formula = funico ~ logfreq + rt + logletterfreq, data = words.setA)

| predictor     | df   | SS       | F       | p    | partial η² |
|---------------|------|----------|---------|------|------------|
| logfreq       | 1    | 420.245  | 219.963 | 0.00 | 0.135      |
| rt            | 1    | 5.516    | 2.887   | 0.09 | 0.002      |
| logletterfreq | 1    | 184.189  | 96.407  | 0.00 | 0.064      |
| Residuals     | 1415 | 2703.390 |         |      |            |

model comparison

| Res.Df | RSS      | Df | Sum of Sq | F        | Pr(>F) |
|--------|----------|----|-----------|----------|--------|
| 1416   | 2887.579 |    |           |          |        |
| 1415   | 2703.390 | 1  | 184.1887  | 96.40745 | 0      |

Structural analysis

We carry out a qualitative analysis of the 80 highest ranked words (top deciles for funniness+iconicity) to see if there are formal cues of foregrounding and structural markedness that can help predict funniness and iconicity ratings. Then we find these cues in the larger dataset and see if the patterns hold up.

This analysis reveals the following sets of complex onsets, codas, and verbal diminutive suffixes that are likely structural cues of markedness (given here in the form of regular expressions):

  • onsets: ^(bl|cl|cr|dr|fl|sc|sl|sn|sp|spl|sw|tr|pr|sq)
  • codas: (nch|mp|nk|rt|rl|rr|sh|wk)$
  • verbal suffix: ([b-df-hj-np-tv-xz]le)$ (i.e., look for -le after a consonant)

We tag these cues across the whole dataset (looking for the -le suffix only in verbs because words like mutable, unnameable, scalable, manacle are not the same phenomenon) in order to see how they relate to funniness and iconicity.
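The tagging can be done with the regular expressions above; a minimal sketch (in the notebook the -le check is additionally restricted to verbs via the part-of-speech column):

```r
# Count markedness cues per word: complex onset, complex coda,
# and the verbal diminutive -le after a consonant.
onset   <- "^(bl|cl|cr|dr|fl|sc|sl|sn|sp|spl|sw|tr|pr|sq)"
coda    <- "(nch|mp|nk|rt|rl|rr|sh|wk)$"
verbdim <- "[b-df-hj-np-tv-xz]le$"

cumulative_markedness <- function(w) {
  grepl(onset, w) + grepl(coda, w) + grepl(verbdim, w)
}

cumulative_markedness(c("crunch", "wriggle", "window"))  # 2 1 0
```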

We model the contribution of markedness relative to log letter frequency. Model comparison shows that a model including the measure of cumulative markedness as a predictor provides a significantly better fit (F = 52.78, p < 0.0001) and explains a larger portion of the variance (adjusted R² = 0.21 vs. 0.18) than a model with just word frequency, lexical decision time, and log letter frequency.

Model m5.3: lm(formula = funico ~ logfreq + rt + logletterfreq + cumulative, data = words.setA)

| predictor     | df   | SS       | F       | p     | partial η² |
|---------------|------|----------|---------|-------|------------|
| logfreq       | 1    | 420.245  | 228.013 | 0.000 | 0.139      |
| rt            | 1    | 5.516    | 2.993   | 0.084 | 0.002      |
| logletterfreq | 1    | 184.189  | 99.936  | 0.000 | 0.066      |
| cumulative    | 1    | 97.283   | 52.783  | 0.000 | 0.036      |
| Residuals     | 1414 | 2606.107 |         |       |            |

Model comparison of m5.2 and m5.3

| Res.Df | RSS      | Df | Sum of Sq | F        | Pr(>F) |
|--------|----------|----|-----------|----------|--------|
| 1415   | 2703.390 |    |           |          |        |
| 1414   | 2606.107 | 1  | 97.28312  | 52.78307 | 0      |

Now we trace cumulative markedness in the imputed portions of the dataset, and do the same model comparison as above.

First have a look at a random sample of top imputed words and their markedness:

Cumulative markedness in a random sample of words from the highest quantile of imputed iconicity

| word        | ico_imputed_perc | ico_imputed | cumulative |
|-------------|------------------|-------------|------------|
| brr         | 10               | 4.065481    | 1          |
| squish      | 10               | 3.570914    | 2          |
| clunks      | 10               | 2.965891    | 1          |
| scamp       | 10               | 2.397420    | 2          |
| spank       | 10               | 2.360551    | 2          |
| squoosh     | 10               | 2.312993    | 2          |
| crunk       | 10               | 2.165342    | 2          |
| sw          | 10               | 1.898677    | 1          |
| flipping    | 10               | 1.875252    | 1          |
| flatly      | 10               | 1.819130    | 1          |
| crumping    | 10               | 1.737702    | 1          |
| flourish    | 10               | 1.722582    | 2          |
| crispy      | 10               | 1.721382    | 1          |
| snappish    | 10               | 1.612547    | 2          |
| flush       | 10               | 1.598260    | 2          |
| scrumptious | 10               | 1.491476    | 1          |
| blank       | 10               | 1.435437    | 2          |
| tramp       | 10               | 1.426469    | 2          |
| speakeasy   | 10               | 1.393087    | 1          |
| scornfully  | 10               | 1.366685    | 1          |

And here is a random sample of words from lower quantiles and their markedness:

Cumulative markedness in a random sample of words from lower quantiles of imputed iconicity

| word          | ico_imputed_perc | ico_imputed | cumulative |
|---------------|------------------|-------------|------------|
| spoilsport    | 7                | 0.7898544   | 2          |
| draughted     | 7                | 0.7022719   | 1          |
| drank         | 6                | 0.6164557   | 2          |
| sweetfish     | 6                | 0.5390918   | 2          |
| protectress   | 6                | 0.5275526   | 1          |
| transmits     | 5                | 0.4734109   | 1          |
| schrank       | 5                | 0.4676592   | 2          |
| blamed        | 5                | 0.4105656   | 1          |
| flatfish      | 5                | 0.3870658   | 2          |
| crystallised  | 5                | 0.3525224   | 1          |
| trench        | 4                | 0.2827599   | 2          |
| preshrunk     | 3                | 0.1581131   | 2          |
| spectroscopic | 3                | 0.1460491   | 1          |
| splendours    | 3                | 0.0495967   | 1          |
| prelaunch     | 3                | 0.0355877   | 2          |
| spearfish     | 2                | 0.0134083   | 2          |
| flemish       | 2                | -0.0374782  | 2          |
| flamingos     | 2                | -0.0452134  | 1          |
| cryptography  | 2                | -0.1193397  | 1          |
| triangulation | 2                | -0.1590728  | 1          |

Random samples of 20 high-complexity words consistently feature a majority of high-iconicity words:

Imputed ratings for 20 random words high in cumulative markedness

| word        | ico_imputed_perc | ico_imputed | fun_imputed | cumulative |
|-------------|------------------|-------------|-------------|------------|
| squoosh     | 10               | 2.3129932   | 2.945588    | 2          |
| squirt      | 10               | 2.1139378   | 3.302116    | 2          |
| crump       | 10               | 1.7575309   | 3.072162    | 2          |
| snaffle     | 10               | 1.6898653   | 3.005494    | 2          |
| crank       | 10               | 1.6032802   | 2.772123    | 2          |
| flush       | 10               | 1.5982598   | 2.668945    | 2          |
| clump       | 10               | 1.5840648   | 2.744887    | 2          |
| spangle     | 10               | 1.5792685   | 3.046803    | 2          |
| scribble    | 10               | 1.5335878   | 2.832759    | 2          |
| swink       | 10               | 1.5061947   | 3.015406    | 2          |
| tramp       | 10               | 1.4264688   | 2.899633    | 2          |
| slapdash    | 9                | 1.3323017   | 2.544342    | 2          |
| prank       | 8                | 0.9101965   | 3.091282    | 2          |
| crawfish    | 8                | 0.8564163   | 2.726631    | 2          |
| sweetheart  | 8                | 0.8273857   | 2.711589    | 2          |
| spinsterish | 6                | 0.5475747   | 2.329725    | 2          |
| scrimp      | 5                | 0.4658493   | 2.534127    | 2          |
| flatfish    | 5                | 0.3870658   | 2.628965    | 2          |
| prelaunch   | 3                | 0.0355877   | 2.304523    | 2          |
| scottish    | 1                | -0.2946582  | 2.556597    | 2          |

Let’s have a closer look at subsets: first quartiles, then deciles.

Markedness cues across quartiles of imputed iconicity

| target_perc | n     | onset     | coda      | verbdim   | complexity |
|-------------|-------|-----------|-----------|-----------|------------|
| 1           | 15920 | 0.0639447 | 0.0060302 | 0.0003769 | 0.0703518  |
| 2           | 15920 | 0.0731784 | 0.0076005 | 0.0009422 | 0.0817211  |
| 3           | 15920 | 0.0923367 | 0.0097362 | 0.0009422 | 0.1030151  |
| 4           | 15920 | 0.1583543 | 0.0155151 | 0.0049623 | 0.1788317  |

Markedness cues across deciles of imputed iconicity

| target_perc | n    | onset     | coda      | verbdim   | complexity |
|-------------|------|-----------|-----------|-----------|------------|
| 1           | 6368 | 0.0565327 | 0.0061244 | 0.0003141 | 0.0629711  |
| 2           | 6368 | 0.0684673 | 0.0045540 | 0.0004711 | 0.0734925  |
| 3           | 6368 | 0.0714510 | 0.0084799 | 0.0001570 | 0.0800879  |
| 4           | 6368 | 0.0675251 | 0.0069095 | 0.0010992 | 0.0755339  |
| 5           | 6368 | 0.0788317 | 0.0080088 | 0.0012563 | 0.0880967  |
| 6           | 6368 | 0.0819724 | 0.0078518 | 0.0006281 | 0.0904523  |
| 7           | 6368 | 0.0978329 | 0.0105214 | 0.0010992 | 0.1094535  |
| 8           | 6368 | 0.1103957 | 0.0111495 | 0.0025126 | 0.1240578  |
| 9           | 6368 | 0.1350503 | 0.0144472 | 0.0028266 | 0.1523241  |
| 10          | 6368 | 0.2014761 | 0.0191583 | 0.0076947 | 0.2283291  |

Comparison of models with combined imputed funniness and iconicity as the dependent variable shows that a linear model including cumulative markedness as a predictor provides a significantly better fit (F(1,19202) = 337.3, p < 0.0001) and explains somewhat more of the variance (adjusted R² = 0.124 vs. 0.109) than a model with just word frequency, lexical decision time, and log letter frequency.

Model m5.4: lm(formula = funico_imputed ~ logfreq + rt + logletterfreq, data = words.setC)

| predictor     | df    | SS        | F        | p     | partial η² |
|---------------|-------|-----------|----------|-------|------------|
| logfreq       | 1     | 1025.303  | 361.774  | 0.000 | 0.018      |
| rt            | 1     | 4.778     | 1.686    | 0.194 | 0.000      |
| logletterfreq | 1     | 5608.765  | 1979.029 | 0.000 | 0.093      |
| Residuals     | 19203 | 54423.210 |          |       |            |

Model m5.5: lm(formula = funico_imputed ~ logfreq + rt + logletterfreq + cumulative, data = words.setC)

| predictor     | df    | SS        | F        | p    | partial η² |
|---------------|-------|-----------|----------|------|------------|
| logfreq       | 1     | 1025.303  | 368.110  | 0.00 | 0.019      |
| rt            | 1     | 4.778     | 1.716    | 0.19 | 0.000      |
| logletterfreq | 1     | 5608.765  | 2013.687 | 0.00 | 0.095      |
| cumulative    | 1     | 939.486   | 337.299  | 0.00 | 0.017      |
| Residuals     | 19202 | 53483.724 |          |      |            |

model comparison

| Res.Df | RSS      | Df | Sum of Sq | F       | Pr(>F) |
|--------|----------|----|-----------|---------|--------|
| 19203  | 54423.21 |    |           |         |        |
| 19202  | 53483.72 | 1  | 939.4858  | 337.299 | 0      |

End

Thanks for your interest. Also see the separate code notebook with supplementary analyses.

If you find this useful, consider checking out the following resources that have been helpful in preparing this Rmarkdown document:

And of course have a look at the paper itself — latest preprint here: Playful iconicity