Playful iconicity: supplementary analyses

Mark Dingemanse & Bill Thompson (this version: 2019-11-28)

Part of supporting materials for the paper Playful iconicity: Structural markedness underlies the relation between funniness and iconicity. Here we report additional analyses that provide more details than we have room for in the paper. The main analyses, figures, and data are in a separate code notebook.

Imputed funniness and iconicity

In the paper, we test the imputation method by seeing whether the funniness ~ iconicity relation is upheld in imputed iconicity ratings. This is a good test case because we have a sizable test set (3,577 words) and there is an objective definition of iconicity (resemblance between aspects of form and aspects of meaning). Indeed we find that words with high imputed iconicity are clearly imitative, and we cite some evidence from OED definitions (though we don’t do this in a systematic way).

It is also reasonable to test the imputation method the other way around. Does the relation between human iconicity ratings and imputed funniness ratings make any sense? There are 1,526 words for which we have human iconicity ratings but not funniness ratings. Since this is a much smaller set and there is no objective way to judge the funniness of words, we don’t report this comparison in the paper, but it comes out just as expected.

We construct a linear model predicting imputed funniness based on word frequency and lexical decision time (RT), and compare it with a model that also includes human iconicity ratings, to see how much this improves our predictions.

Compared to model mS2.1, which predicts fun_imputed with just log frequency and lexical decision time, model mS2.2 including iconicity as predictor provides a significantly better fit (F = 125.88, p < 0.0001) and explains a larger portion of the variance (adjusted R2 = 0.32 vs. 0.24).
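
For reference, a minimal sketch of this comparison (assuming a data frame `words.setD` with the columns named above):

```r
# minimal sketch, assuming words.setD contains fun_imputed, logfreq, rt and ico
mS2.1 <- lm(fun_imputed ~ logfreq + rt, data = words.setD)
mS2.2 <- lm(fun_imputed ~ logfreq + rt + ico, data = words.setD)
anova(mS2.1, mS2.2)  # F-test for the added iconicity predictor
```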

Model mS2.1: lm(formula = fun_imputed ~ logfreq + rt, data = words.setD)

| predictor | df   | SS     | F       | p | partial η² |
|-----------|------|--------|---------|---|------------|
| logfreq   | 1    | 25.869 | 291.852 | 0 | 0.225      |
| rt        | 1    | 2.174  | 24.522  | 0 | 0.024      |
| Residuals | 1006 | 89.170 |         |   |            |

Model mS2.2: lm(formula = fun_imputed ~ logfreq + rt + ico, data = words.setD)

| predictor | df   | SS     | F       | p | partial η² |
|-----------|------|--------|---------|---|------------|
| logfreq   | 1    | 25.869 | 328.080 | 0 | 0.246      |
| rt        | 1    | 2.174  | 27.566  | 0 | 0.027      |
| ico       | 1    | 9.926  | 125.878 | 0 | 0.111      |
| Residuals | 1005 | 79.244 |         |   |            |

Model comparison

| Res.Df | RSS      | Df | Sum of Sq | F        | Pr(>F) |
|--------|----------|----|-----------|----------|--------|
| 1006   | 89.16985 |    |           |          |        |
| 1005   | 79.24433 | 1  | 9.92552   | 125.8784 | 0      |

A partial correlation analysis shows that there is 32% covariance between iconicity ratings and imputed funniness that is not explained by word frequency (r = 0.32, p < 0.0001). In other words, human iconicity ratings are a strong predictor of imputed funniness.
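
The partial correlation tables in this notebook have the column layout of `ppcor::pcor.test()` output; a minimal sketch of such a computation (package and column names assumed, tidyverse assumed loaded as elsewhere in this notebook):

```r
# minimal sketch, assuming ppcor is available and words.setD has ico,
# fun_imputed and logfreq columns
library(ppcor)

d <- words.setD %>% drop_na(ico, fun_imputed, logfreq)
pcor.test(d$ico, d$fun_imputed, d$logfreq)  # controls for log word frequency
```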

Imputed funniness and iconicity, controlling for word frequency

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.3230455 | 0       | 13.3213   | 1526 | 1  | pearson |

Example words

High imputed funniness and high iconicity: gurgle, mushy, screech, icky, goopy, hiss, quack, cooing, chirping, squishy, mini, crinkle, sizzle, slosh, slurp, purring, splat, crinkly, buzz, scoot

Low imputed funniness and low iconicity: synagogue, bequeath, require, choose, repent, condition, ambulance, polio, injury, attorney, oppose, resign, denial, motionless

High imputed funniness and low iconicity: buttock, knave, cockatoo, bib, yam, donut, zucchini, honeyed, dewy, emu, budgie, buttery, holey, vagina, leotards, parakeet, kitten, burl, downy, slang

Low imputed funniness and high iconicity: explosion, crushed, no, stinging, breathe, harsh, sting, huge, fibrous

Analysable morphology bias in iconicity ratings

An inspection of the top few hundred words reveals many clearly iconic words, but also a number of transparently compositional words like sunshine, seaweed, downpour, dishwasher, corkscrew, bedroom. Looking at top-rated iconic nouns with more than one morpheme is a good way of finding many of these, as the code below shows.

```r
# 200 most iconic words for visual inspection
words %>%
  drop_na(ico) %>%
  filter(ico_perc > 8) %>%
  arrange(-ico) %>%
  dplyr::select(word) %>%
  slice(1:200) %>% unlist() %>% unname()

# top-rated iconic nouns with >1 morphemes is a good way of getting at many of these
words %>%
  drop_na(ico) %>%
  filter(ico_perc > 8,
         nmorph > 1,
         POS == "Noun") %>%
  arrange(-ico) %>%
  dplyr::select(word) %>%
  slice(1:200) %>% unlist() %>% unname()
```

These analysable compound nouns are treated by naïve raters as “sounding like what they mean” and are therefore given high iconicity ratings, leading to rating artefacts. We can use data on number of morphemes from the English Lexicon Project (Balota et al. 2007) to filter out such words and look at monomorphemic words only.

The plots and partial correlations below show that the basic patterns emerge somewhat more clearly in monomorphemic words, as expected. All findings remain the same.
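
A sketch of the monomorphemic subset and the corresponding partial correlation (column names follow the code elsewhere in this notebook; note that nmorph may be coded as character in the ELP data):

```r
# sketch: monomorphemic subset of set A, then frequency-controlled partial correlation
words.setA.mono <- words %>%
  filter(set == "A", nmorph == 1) %>%   # nmorph from the English Lexicon Project
  drop_na(ico, fun, logfreq)

pcor.test(words.setA.mono$ico, words.setA.mono$fun, words.setA.mono$logfreq)
```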

There are 1278 monomorphemic words in set A (out of a total of 1419).

Mean iconicity by number of morphemes

| nmorph | n    | mean.ico  |
|--------|------|-----------|
| 1      | 1278 | 0.8546147 |
| 2      | 137  | 1.0236474 |
| 3      | 3    | 1.4055556 |
| NA     | 1    | 1.0000000 |

Highest 7 iconic words per number of morphemes (1-3)

| word       | ico       | fun      | nmorph |
|------------|-----------|----------|--------|
| click      | 4.4615385 | 2.135135 | 1      |
| beep       | 4.3571429 | 2.615385 | 1      |
| squeak     | 4.2307692 | 3.230769 | 1      |
| chirp      | 4.1428571 | 3.000000 | 1      |
| stomp      | 4.1000000 | 2.421053 | 1      |
| pop        | 4.0769231 | 3.294118 | 1      |
| bleep      | 3.9285714 | 2.931818 | 1      |
| zigzag     | 4.3000000 | 3.113636 | 2      |
| buzzer     | 4.0909091 | 2.833333 | 2      |
| skateboard | 3.6000000 | 2.208333 | 2      |
| sunshine   | 3.0909091 | 2.064516 | 2      |
| zipper     | 2.9230769 | 2.516129 | 2      |
| freezer    | 2.9166667 | 2.281250 | 2      |
| bubbly     | 2.8181818 | 3.352941 | 2      |
| fireworks  | 1.9000000 | 2.294118 | 3      |
| pliers     | 1.9000000 | 2.352941 | 3      |
| influence  | 0.4166667 | 1.914286 | 3      |

Partial correlations between funniness and iconicity, controlling for frequency, in monomorphemic words

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.2158506 | 0       | 7.893486  | 1278 | 1  | pearson |

There are 2176 monomorphemic words in set B (61% of 3577).

Mean iconicity by number of morphemes in set B

| nmorph | n    | mean.ico  |
|--------|------|-----------|
| #      | 14   | 0.8584171 |
| 1      | 2176 | 0.6878947 |
| 2      | 1321 | 0.5808049 |
| 3      | 42   | 0.4412872 |
| NA     | 24   | 0.2862270 |

Partial correlations between funniness and imputed iconicity, controlling for frequency, in monomorphemic words

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.3278004 | 0       | 16.17424  | 2176 | 1  | pearson |

There are only 5168 monomorphemic words in set C (out of 41548 words for which we have data on number of morphemes).

Mean iconicity by number of morphemes in set C

| nmorph | n     | mean.ico  |
|--------|-------|-----------|
| #      | 1320  | 0.4958385 |
| 1      | 5168  | 0.5410642 |
| 2      | 20456 | 0.6485362 |
| 3      | 11575 | 0.4194742 |
| 4      | 2689  | 0.3195566 |
| 5      | 329   | 0.2877888 |
| 6      | 11    | 0.3718408 |
| NA     | 22132 | 0.4706343 |

Partial correlations between imputed funniness and imputed iconicity, controlling for frequency, in monomorphemic words

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.4370105 | 0       | 34.91781  | 5168 | 1  | pearson |

Imputing ratings based on monomorphemic words only

Given what we know about the bias in iconicity ratings, it may make sense to base imputation only on monomorphemic words and see how this affects the results. It should lead to fewer analysable compounds showing up high in the imputed iconicity ratings of sets B and C.

Model comparison shows that a model with imputed monomorphemic iconicity has a significantly better fit (F = 227.5, p < 0.0001) and explains a larger amount of variance (R2 = 0.139 vs. 0.084) than a model with just frequency and RT. However, the original model with imputed iconicity based on all words still explains more of the variance (R2 = 0.187).

Partial correlations show 23% covariance in set B (n = 3036) between funniness and imputed iconicity based on monomorphemic words, controlling for word frequency.

Partial correlations between funniness and imputed monomorphemic iconicity, controlling for frequency

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.2292556 | 0       | 12.97119  | 3036 | 1  | pearson |

Example words

High funniness and high imputed monomorphemic iconicity: whack, burp, smack, fizz, chug, dud, wallop, beatnik, oddball, swish, snooze, bop, loony, squirm, chuckle, poof, bebop, getup, spunk, shindig

Low funniness and low imputed monomorphemic iconicity: housework, town, divorce, purchase, plaintiff, spacing, mean, prayer, hunting, arson, conscience, theft, shipping, visa, amends, bible, thyroid, concourse, union, wheelchair

High funniness and low imputed monomorphemic iconicity: rump, dodo, toga, scrotum, muskrat, satyr, sphincter, gourd, kebab, cheesecake, swank, girth, ducky, pubes, gad, rectum, sphinx, trump, harlot, lapdog

Low funniness and high imputed monomorphemic iconicity: doom, scrape, feedback, shudder, choke, replay, transient, shrapnel, fright, dental, thaw, lockup, tech, brow, cue, bloodbath, post, blend, decay, lair

Set C shows the same pattern: regressions are not much improved by using imputed scores based on monomorphemic words only.

Since the monomorphemic ratings were introduced specifically to check whether we can address the analysable compound bias in iconicity ratings, we use the original imputed funniness ratings, although we also have imputed funniness ratings based on monomorphemic words (fun_imputed_monomorph).

Model comparison shows that the imputed iconicity ratings based on monomorphemic words are pretty good, explaining more variance (R2 = 0.14 versus 0.06) than a model without iconicity. However, a model based on the original imputed ratings does much better (R2 = 0.24), so this is not giving us more power to detect the relation between funniness and iconicity ratings.

Example words

High imputed funniness and high imputed monomorphemic iconicity: tiddly, whir, sleaze, wibble, phat, whoo, whoosh, lah, rah, wah, buzzy, pung, popsy, plonk, phooey, thwack, whirr, chit, oozy, talky

Low imputed funniness and low imputed monomorphemic iconicity: upbringing, finalizing, surpassed, silva, p, received, suffrage, excused, undersigned, abase, disobedience, absences, biography, guilty, basin, sacredness, records, designating, scriptural, justifies

High imputed funniness and low imputed monomorphemic iconicity: copula, bratwurst, pisser, grum, ferme, prat, twitty, shags, wadi, gleba, lovebird, heifers, putz, chickweed, bungo, froufrou, burg, ramus, porgy, wiener

Low imputed funniness and high imputed monomorphemic iconicity: req, notify, engulf, concussive, desc, tox, undergoes, unbind, afb, hts, filmic, unrelentingly, undergo, ld, awl, excruciate, reeducation, adrenalin, storyboard, downpours

How about compounds?

In the new imputed ratings based on monomorphemic words, is it still easy to find analysable compound nouns rated as highly iconic? Yes, it is… oddball, cleanup, dustpan, killjoy, shakedown, showbizz, feedback, etc.

Visualisations of iconicity ratings by number of morphemes are hard to interpret. The distribution of the ratings is somewhat different (a more squat distribution in the ratings based on monomorphemic words), but it is not obvious that there are large differences in the relative preponderance of monomorphemic versus multimorphemic words in the top percentiles of iconicity ratings.

```
## # A tibble: 1 x 1
##       n
##   <int>
## 1   265
```
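
A sketch of how the tallies below can be produced (the `ico_imputed_monomorph_perc` column name is hypothetical, standing in for the percentiles of the monomorphemic-based imputation):

```r
# sketch: morpheme counts in the top 20% (deciles 9-10) of imputed iconicity in set B
words %>%
  filter(set == "B", ico_imputed_perc > 8) %>%
  count(nmorph)

# same, for the imputation based on monomorphemic words only
# (ico_imputed_monomorph_perc is a hypothetical column name)
words %>%
  filter(set == "B", ico_imputed_monomorph_perc > 8) %>%
  count(nmorph)
```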

Set B, top 20% of words by imputed iconicity based on all words

| nmorph | n   |
|--------|-----|
| 1      | 520 |
| 2      | 210 |
| 3      | 4   |

Set B, top 20% of words by imputed iconicity based on monomorphemic words

| nmorph | n   |
|--------|-----|
| 1      | 417 |
| 2      | 224 |
| 3      | 3   |

Set C, top 20% of words by imputed iconicity based on all words

| nmorph | n    |
|--------|------|
| 1      | 1083 |
| 2      | 5174 |
| 3      | 1408 |

Set C, top 20% of words by imputed iconicity based on monomorphemic words

| nmorph | n    |
|--------|------|
| 1      | 1157 |
| 2      | 4572 |
| 3      | 1759 |

In sum, while basing imputed iconicity ratings on monomorphemic words with human ratings gives reasonable results, it does not seem to result in a marked improvement of the imputed ratings, though further analysis is needed.

Markedness patterns in words with imputed ratings

While the primary focus of analysis 4 was on set A (the core set of human ratings), it’s interesting to see how well the structural cues fare in explaining independently imputed iconicity ratings in the larger datasets.
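
For orientation, a sketch of the cumulative markedness measure, under the assumption that it is simply the sum of the three binary structural cues (complex onset, complex coda, verbal diminutive) tabulated below:

```r
# hedged sketch: cumulative markedness as the sum of three binary cues;
# onset, coda and verbdim are the 0/1 columns shown in the decile tables
words <- words %>%
  mutate(cumulative = onset + coda + verbdim)
```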

Mean imputed scores by levels of cumulative markedness

| cumulative | n     | ico_imputed | fun_imputed |
|------------|-------|-------------|-------------|
| 0          | 59843 | 0.4908895   | 2.377589    |
| 1          | 7301  | 0.7852391   | 2.450599    |
| 2          | 113   | 1.2294607   | 2.646994    |

Cumulative markedness for <10 deciles of imputed iconicity

| n     | ico_imputed | fun_imputed | cumulative |
|-------|-------------|-------------|------------|
| 60940 | 0.3901764   | 2.353881    | 0.0985724  |

Imputed iconicity for 20 random words of high phonological complexity

| word     | ico_imputed_perc | ico_imputed | cumulative |
|----------|------------------|-------------|------------|
| clomp    | 10 | 2.7573962 | 2 |
| blurt    | 10 | 2.2853380 | 2 |
| squirt   | 10 | 2.1139378 | 2 |
| spunk    | 10 | 2.0987844 | 2 |
| dribble  | 10 | 2.0983419 | 2 |
| trunch   | 10 | 1.9866388 | 2 |
| flinch   | 10 | 1.8646337 | 2 |
| sluggish | 10 | 1.5854586 | 2 |
| cronk    | 10 | 1.4004689 | 2 |
| primp    | 8  | 0.9036671 | 2 |
| blueish  | 8  | 0.8951717 | 2 |
| crawfish | 8  | 0.8564163 | 2 |
| swinish  | 8  | 0.8504212 | 2 |
| snowbank | 7  | 0.7425518 | 2 |
| blondish | 5  | 0.4183398 | 2 |
| blandish | 5  | 0.4082991 | 2 |
| trench   | 4  | 0.2827599 | 2 |
| flank    | 4  | 0.2230756 | 2 |
| crayfish | 3  | 0.1607801 | 2 |
| prudish  | 3  | 0.1531699 | 2 |

Cumulative markedness scores per iconicity decile in Set B

| ico_imputed_perc | n   | ico        | fun      | onset     | coda      | verbdim   | cumulative |
|------------------|-----|------------|----------|-----------|-----------|-----------|------------|
| 1  | 182 | -0.4528116 | 2.220783 | 0.0714286 | 0.0164835 | 0.0000000 | 0.0879121 |
| 2  | 249 | -0.0841993 | 2.268928 | 0.0843373 | 0.0160643 | 0.0000000 | 0.1004016 |
| 3  | 247 | 0.1030573  | 2.318616 | 0.1052632 | 0.0202429 | 0.0000000 | 0.1255061 |
| 4  | 299 | 0.2579817  | 2.317502 | 0.1270903 | 0.0200669 | 0.0033445 | 0.1505017 |
| 5  | 290 | 0.4042797  | 2.349267 | 0.1068966 | 0.0172414 | 0.0068966 | 0.1310345 |
| 6  | 323 | 0.5487701  | 2.377754 | 0.1207430 | 0.0309598 | 0.0030960 | 0.1547988 |
| 7  | 333 | 0.7084445  | 2.403432 | 0.1141141 | 0.0180180 | 0.0000000 | 0.1321321 |
| 8  | 374 | 0.9002872  | 2.487929 | 0.1470588 | 0.0454545 | 0.0000000 | 0.1925134 |
| 9  | 370 | 1.1681607  | 2.528468 | 0.1594595 | 0.0297297 | 0.0081081 | 0.1972973 |
| 10 | 369 | 1.7764394  | 2.705826 | 0.2276423 | 0.0921409 | 0.0271003 | 0.3468835 |

Cumulative markedness scores per iconicity decile in Set C

| ico_imputed_perc | n    | ico        | fun      | onset     | coda      | verbdim   | cumulative |
|------------------|------|------------|----------|-----------|-----------|-----------|------------|
| 1  | 6643 | -0.4518873 | 2.245994 | 0.0575041 | 0.0058708 | 0.0003011 | 0.0636760 |
| 2  | 6540 | -0.0871170 | 2.271298 | 0.0677370 | 0.0053517 | 0.0004587 | 0.0735474 |
| 3  | 6507 | 0.1024110  | 2.291713 | 0.0705394 | 0.0075304 | 0.0003074 | 0.0783771 |
| 4  | 6402 | 0.2590349  | 2.307478 | 0.0670103 | 0.0078101 | 0.0010934 | 0.0759138 |
| 5  | 6345 | 0.4032373  | 2.334882 | 0.0780142 | 0.0077226 | 0.0011032 | 0.0868400 |
| 6  | 6297 | 0.5495897  | 2.357597 | 0.0865492 | 0.0079403 | 0.0007940 | 0.0952835 |
| 7  | 6208 | 0.7108025  | 2.397076 | 0.0979381 | 0.0106314 | 0.0011276 | 0.1096972 |
| 8  | 6188 | 0.9045974  | 2.449854 | 0.1115061 | 0.0119586 | 0.0021008 | 0.1255656 |
| 9  | 6056 | 1.1732741  | 2.521398 | 0.1370542 | 0.0143659 | 0.0034676 | 0.1548877 |
| 10 | 5778 | 1.8190651  | 2.692276 | 0.2057805 | 0.0188647 | 0.0074420 | 0.2320872 |

Markedness for iconicity vs funniness ratings

Cumulative markedness is particularly good for predicting iconicity, rivalling funniness, word frequency and log letter frequency as predictors of iconicity ratings (model mS.1). It is less good at predicting funniness ratings, which (as we know) are also influenced by semantic and collocational factors (model mS.2).
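
A sketch of the two models compared in this section (formulas as reported in the tables below, data assumed to sit in `words.setA`):

```r
# sketch of models mS.1 and mS.2
mS.1 <- lm(ico ~ logfreq + rt + fun + logletterfreq + cumulative, data = words.setA)
mS.2 <- lm(fun ~ logfreq + rt + logletterfreq + ico * cumulative, data = words.setA)
summary(mS.1)$adj.r.squared  # variance explained in iconicity
summary(mS.2)$adj.r.squared  # variance explained in funniness
```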

Model mS.1: lm(formula = ico ~ logfreq + rt + fun + logletterfreq + cumulative, data = words.setA)

| predictor     | df   | SS       | F      | p     | partial η² |
|---------------|------|----------|--------|-------|------------|
| logfreq       | 1    | 58.495   | 55.422 | 0.000 | 0.038      |
| rt            | 1    | 0.054    | 0.051  | 0.822 | 0.000      |
| fun           | 1    | 72.397   | 68.594 | 0.000 | 0.046      |
| logletterfreq | 1    | 44.700   | 42.351 | 0.000 | 0.029      |
| cumulative    | 1    | 73.125   | 69.284 | 0.000 | 0.047      |
| Residuals     | 1413 | 1491.344 |        |       |            |

Model mS.2: lm(formula = fun ~ logfreq + rt + logletterfreq + ico * cumulative, data = words.setA)

| predictor      | df   | SS      | F       | p     | partial η² |
|----------------|------|---------|---------|-------|------------|
| logfreq        | 1    | 36.143  | 266.115 | 0.000 | 0.159      |
| rt             | 1    | 1.249   | 9.195   | 0.002 | 0.006      |
| logletterfreq  | 1    | 7.653   | 56.346  | 0.000 | 0.038      |
| ico            | 1    | 6.144   | 45.241  | 0.000 | 0.031      |
| cumulative     | 1    | 0.092   | 0.676   | 0.411 | 0.000      |
| ico:cumulative | 1    | 0.858   | 6.315   | 0.012 | 0.004      |
| Residuals      | 1412 | 191.773 |         |       |            |

Phonotactic measures from IPhOD

A quick look at a range of IPhOD measures shows that none of them correlates as strongly with iconicity or funniness as logletterfreq, so they don’t offer us much additional explanatory power.

N.B. IPhOD contains homographs, but frequencies are given only at the level of orthographic forms. To avoid duplication of data we keep only the first of multiple homographs in IPhOD, accepting some loss of precision about possible pronunciations. We use IPhOD’s phonotactic probability and phonological density measures. Since we have no stress-related hypotheses we work with unstressed calculations. We work with values unweighted for frequency because we include frequency as a fixed effect in later analyses.
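
A sketch of the deduplication step described above (object and column names here are illustrative, not the exact IPhOD headers):

```r
# sketch: keep only the first entry per orthographic form to avoid duplicated
# frequencies for homographs; iphod.raw and Word are illustrative names
iphod <- iphod.raw %>%
  distinct(Word, .keep_all = TRUE)
```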

Valence helps explain high-iconicity low-funniness words

Valence is one reason for some iconic words not being rated as funny. Words like ‘crash’, ‘dread’, ‘scratch’ and ‘shoot’ (all in the lowest percentiles of valence) may be highly iconic but they have no positive or humorous connotation. In general, valence is of course already known to be related to funniness ratings: negative words are unlikely to be rated as highly funny.
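
A sketch of how the table below can be generated (column names as in the table):

```r
# sketch: highly iconic words in the lowest funniness decile, with valence percentiles
words.setA %>%
  filter(ico_perc > 8, fun_perc == 1) %>%
  arrange(-ico) %>%
  dplyr::select(word, ico, fun, ico_perc, fun_perc, valence_perc)
```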

Valence percentiles for words rated as iconic but not funny

| word    | ico      | fun      | ico_perc | fun_perc | valence_perc |
|---------|----------|----------|----------|----------|--------------|
| crash   | 3.769231 | 1.731707 | 10 | 1 | 1 |
| scratch | 3.285714 | 1.800000 | 10 | 1 | 5 |
| low     | 2.916667 | 1.575758 | 10 | 1 | 3 |
| shoot   | 2.600000 | 1.838710 | 10 | 1 | 2 |
| dread   | 2.545454 | 1.583333 | 10 | 1 | 1 |
| pulse   | 2.416667 | 1.923077 | 9  | 1 | 9 |
| slum    | 2.400000 | 1.696970 | 9  | 1 | 1 |
| stab    | 2.285714 | 1.666667 | 9  | 1 | 1 |
| killer  | 2.090909 | 1.466667 | 9  | 1 | 1 |
| carnage | 2.090909 | 1.885714 | 9  | 1 | 2 |
| sick    | 2.000000 | 1.846154 | 9  | 1 | 1 |
| torment | 2.000000 | 1.310345 | 9  | 1 | 1 |
| prompt  | 2.000000 | 1.914286 | 9  | 1 | 9 |
| stick   | 1.928571 | 1.769231 | 9  | 1 | 6 |
| small   | 1.923077 | 1.769231 | 9  | 1 | 7 |
| gloom   | 1.916667 | 1.888889 | 9  | 1 | 1 |
| corpse  | 1.900000 | 1.878788 | 9  | 1 | 1 |
| victim  | 1.846154 | 1.571429 | 9  | 1 | 1 |

Age of acquisition

Simon Kirby asked on Twitter whether the relation between funniness and iconicity might have something to do with child-directedness. This is hard to test directly (and unlikely to apply across the board) but if this were the case presumably it would also be reflected in AoA ratings — e.g., the more funny and iconic words would have relatively lower AoA ratings. (Importantly: we already know from Perry et al. 2017 that AoA is negatively correlated with iconicity: words rated higher in iconicity have a somewhat lower age of acquisition.)

We have AoA data for all 1,419 words in set A. It doesn’t really explain the iconicity + funniness relation. That is, words high in both iconicity and funniness are not strikingly low in AoA.

An important caveat is that this particular small subset may not be the best data to judge this on.
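
A sketch of the summary below (diff_rank is the combined funniness + iconicity decile used elsewhere in this notebook; the set A filter is an assumption):

```r
# sketch: mean AoA per decile of combined funniness and iconicity in set A
words %>%
  filter(set == "A") %>%
  drop_na(aoa, diff_rank) %>%
  group_by(diff_rank) %>%
  summarise(n = n(), mean.aoa = mean(aoa)) %>%
  kable(caption = "AoA ratings for every decile of combined iconicity and funniness")
```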

AoA ratings for every decile of combined iconicity and funniness

| diff_rank | n   | mean.aoa |
|-----------|-----|----------|
| 2  | 14  | 6.714286 |
| 3  | 39  | 7.150513 |
| 4  | 66  | 6.632273 |
| 5  | 71  | 6.578169 |
| 6  | 98  | 6.425612 |
| 7  | 104 | 6.498365 |
| 8  | 113 | 6.420443 |
| 9  | 122 | 6.417049 |
| 10 | 112 | 6.270446 |
| 11 | 124 | 6.340081 |
| 12 | 102 | 5.975392 |
| 13 | 88  | 6.211932 |
| 14 | 84  | 6.348333 |
| 15 | 62  | 6.193387 |
| 16 | 48  | 6.368542 |
| 17 | 48  | 6.667917 |
| 18 | 44  | 6.930454 |
| 19 | 40  | 7.022500 |
| 20 | 40  | 7.146500 |

The sign of simple (uncorrected) correlations is positive for funniness (r = 0.1), but negative for iconicity (r = -0.07), so if anything there is not a unitary effect here (and the two cancel each other out).

```r
# simple (uncorrected) correlations of funniness, iconicity and combined rank with AoA
cor.test(words$fun, words$aoa)
cor.test(words$ico, words$aoa)

cor.test(words$diff_rank, words$aoa)

# doesn't look very different in the ico_imputed ratings in set B
words %>%
  drop_na(aoa) %>%
  filter(set == "B") %>%
  group_by(diff_rank_setB) %>%
  summarise(n = n(), mean.ico = mean.na(ico_imputed), mean.aoa = mean.na(aoa)) %>%
  kable(caption = "AoA ratings for every decile of imputed iconicity and funniness in set B")
```

AoA ratings for every decile of imputed iconicity and funniness in set C

| diff_rank_setC | n    | mean.ico   | mean.aoa  |
|----------------|------|------------|-----------|
| 2  | 541  | -0.4430372 | 12.207763 |
| 3  | 820  | -0.2533501 | 12.026902 |
| 4  | 1103 | -0.1043019 | 11.916999 |
| 5  | 1342 | 0.0005344  | 12.027414 |
| 6  | 1470 | 0.0755901  | 11.939408 |
| 7  | 1724 | 0.1730946  | 11.833515 |
| 8  | 1658 | 0.2596555  | 11.817979 |
| 9  | 1803 | 0.3375967  | 11.925130 |
| 10 | 1831 | 0.4183328  | 11.685560 |
| 11 | 1714 | 0.5205835  | 11.647083 |
| 12 | 1576 | 0.5927657  | 11.566002 |
| 13 | 1445 | 0.6779066  | 11.528595 |
| 14 | 1258 | 0.7798878  | 11.503458 |
| 15 | 1109 | 0.8541895  | 11.429675 |
| 16 | 988  | 0.9600370  | 11.164443 |
| 17 | 870  | 1.0548924  | 11.102793 |
| 18 | 750  | 1.2269124  | 10.907840 |
| 19 | 694  | 1.3898187  | 10.604366 |
| 20 | 712  | 1.8827607  | 9.935927  |

Same for funniness:

| fun_imputed_perc | n    | mean.fun | mean.aoa |
|------------------|------|----------|----------|
| 1  | 1171 | 1.812639 | 11.31959 |
| 2  | 1170 | 1.957586 | 11.51386 |
| 3  | 1171 | 2.025905 | 11.45502 |
| 4  | 1170 | 2.077910 | 11.52602 |
| 5  | 1170 | 2.121456 | 11.58050 |
| 6  | 1171 | 2.161224 | 11.54835 |
| 7  | 1170 | 2.200252 | 11.56376 |
| 8  | 1171 | 2.236485 | 11.59654 |
| 9  | 1170 | 2.270268 | 11.65503 |
| 10 | 1170 | 2.303327 | 11.77170 |
| 11 | 1171 | 2.338253 | 11.66440 |
| 12 | 1170 | 2.375653 | 11.79544 |
| 13 | 1171 | 2.416009 | 11.83196 |
| 14 | 1170 | 2.458268 | 11.77729 |
| 15 | 1170 | 2.505473 | 11.88938 |
| 16 | 1171 | 2.560082 | 11.86482 |
| 17 | 1170 | 2.625283 | 11.69788 |
| 18 | 1171 | 2.711887 | 11.60738 |
| 19 | 1170 | 2.833098 | 11.57900 |
| 20 | 1170 | 3.091464 | 10.73097 |

Word classes

Reviewer 1 asked us to look into word classes. We report this here as an exploratory analysis. The correlation between funniness and iconicity ratings has the same sign across word classes. The somewhat steeper correlation in verbs (n = 241) can be attributed in part to the verbal diminutive suffix -le (n = 17).
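
A sketch of the summary below (the raw correlation column is simply `cor(fun, ico)` within each word class):

```r
# sketch: per-POS means and raw funniness ~ iconicity correlation in set A
words.setA %>%
  drop_na(POS, ico, fun) %>%
  group_by(POS) %>%
  summarise(n = n(),
            mean.ico = mean(ico),
            mean.fun = mean(fun),
            raw.correlation = cor(fun, ico)) %>%
  kable(caption = "Mean iconicity and funniness in set A across word classes")
```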

Mean iconicity and funniness in set A across word classes

| POS       | n    | mean.ico  | mean.fun | raw.correlation |
|-----------|------|-----------|----------|-----------------|
| Adjective | 109  | 0.9662906 | 2.270046 | 0.1839577       |
| Noun      | 1049 | 0.7212491 | 2.367076 | 0.2059030       |
| Verb      | 241  | 1.4846836 | 2.366951 | 0.5255179       |