Playful iconicity: supplementary analyses

Mark Dingemanse & Bill Thompson (this version: 2019-11-28)

Part of supporting materials for the paper Playful iconicity: Structural markedness underlies the relation between funniness and iconicity. Here we report additional analyses that provide more details than we have room for in the paper. The main analyses, figures, and data are in a separate code notebook.

Imputed funniness and iconicity

In the paper, we test the imputation method by seeing whether the funniness ~ iconicity relation is upheld in imputed iconicity ratings. This is a good test case because we have a sizable test set (3,577 words) and there is an objective definition of iconicity (resemblance between aspects of form and aspects of meaning). Indeed we find that words with high imputed iconicity are clearly imitative, and we cite some evidence from OED definitions (though we don’t do this in a systematic way).

It is also reasonable to test the imputation method the other way around. Does the relation between human iconicity ratings and imputed funniness ratings make any sense? There are 1,526 words for which we have human iconicity ratings but not funniness ratings. Since this is a much smaller set and there is no objective way to judge the funniness of words, we don’t report this comparison in the paper, but it comes out just as expected.

We construct a linear model predicting imputed funniness based on word frequency and lexical decision time (RT), and compare it with a model that also includes human iconicity ratings, to see how much this improves our predictions.

Compared to model mS2.1, which predicts fun_imputed with just log frequency and lexical decision time, model mS2.2 including iconicity as predictor provides a significantly better fit (F = 125.88, p < 0.0001) and explains a larger portion of the variance (adjusted R2 = 0.32 vs. 0.24).
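
For reference, a minimal sketch of this comparison (assuming a data frame `words.setD` with the columns named above):

```r
# minimal sketch, assuming words.setD contains fun_imputed, logfreq, rt and ico
mS2.1 <- lm(fun_imputed ~ logfreq + rt, data = words.setD)
mS2.2 <- lm(fun_imputed ~ logfreq + rt + ico, data = words.setD)
anova(mS2.1, mS2.2)  # F-test for the added iconicity predictor
```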

Model mS2.1: lm(formula = fun_imputed ~ logfreq + rt, data = words.setD)

| predictor | df   | SS     | F       | p | partial η² |
|-----------|------|--------|---------|---|------------|
| logfreq   | 1    | 25.869 | 291.852 | 0 | 0.225      |
| rt        | 1    | 2.174  | 24.522  | 0 | 0.024      |
| Residuals | 1006 | 89.170 |         |   |            |

Model mS2.2: lm(formula = fun_imputed ~ logfreq + rt + ico, data = words.setD)

| predictor | df   | SS     | F       | p | partial η² |
|-----------|------|--------|---------|---|------------|
| logfreq   | 1    | 25.869 | 328.080 | 0 | 0.246      |
| rt        | 1    | 2.174  | 27.566  | 0 | 0.027      |
| ico       | 1    | 9.926  | 125.878 | 0 | 0.111      |
| Residuals | 1005 | 79.244 |         |   |            |

Model comparison

| Res.Df | RSS      | Df | Sum of Sq | F        | Pr(>F) |
|--------|----------|----|-----------|----------|--------|
| 1006   | 89.16985 |    |           |          |        |
| 1005   | 79.24433 | 1  | 9.92552   | 125.8784 | 0      |

A partial correlation analysis shows that there is 32% covariance between iconicity ratings and imputed funniness that is not explained by word frequency (r = 0.32, p < 0.0001). In other words, human iconicity ratings are a strong predictor of imputed funniness.
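
The partial correlation tables in this notebook have the column layout of `ppcor::pcor.test()` output; a minimal sketch of such a computation (package and column names assumed, tidyverse assumed loaded as elsewhere in this notebook):

```r
# minimal sketch, assuming ppcor is available and words.setD has ico,
# fun_imputed and logfreq columns
library(ppcor)

d <- words.setD %>% drop_na(ico, fun_imputed, logfreq)
pcor.test(d$ico, d$fun_imputed, d$logfreq)  # controls for log word frequency
```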

Imputed funniness and iconicity, controlling for word frequency

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.3230455 | 0       | 13.3213   | 1526 | 1  | pearson |

Example words

High imputed funniness and high iconicity: gurgle, mushy, screech, icky, goopy, hiss, quack, cooing, chirping, squishy, mini, crinkle, sizzle, slosh, slurp, purring, splat, crinkly, buzz, scoot

Low imputed funniness and low iconicity: synagogue, bequeath, require, choose, repent, condition, ambulance, polio, injury, attorney, oppose, resign, denial, motionless

High imputed funniness and low iconicity: buttock, knave, cockatoo, bib, yam, donut, zucchini, honeyed, dewy, emu, budgie, buttery, holey, vagina, leotards, parakeet, kitten, burl, downy, slang

Low imputed funniness and high iconicity: explosion, crushed, no, stinging, breathe, harsh, sting, huge, fibrous

Analysable morphology bias in iconicity ratings

An inspection of the top few hundred words reveals many clearly iconic words, but also a number of transparently compositional words like sunshine, seaweed, downpour, dishwasher, corkscrew, bedroom. Looking at top-rated iconic nouns with more than one morpheme is a good way of finding many of these, as the code below shows.

```r
# 200 most iconic words for visual inspection
words %>%
  drop_na(ico) %>%
  filter(ico_perc > 8) %>%
  arrange(-ico) %>%
  dplyr::select(word) %>%
  slice(1:200) %>% unlist() %>% unname()

# top-rated iconic nouns with >1 morphemes is a good way of getting at many of these
words %>%
  drop_na(ico) %>%
  filter(ico_perc > 8,
         nmorph > 1,
         POS == "Noun") %>%
  arrange(-ico) %>%
  dplyr::select(word) %>%
  slice(1:200) %>% unlist() %>% unname()
```

These analysable compound nouns are treated by naïve raters as “sounding like what they mean” and are therefore given high iconicity ratings, leading to rating artefacts. We can use data on number of morphemes from the English Lexicon Project (Balota et al. 2007) to filter out such words and look at monomorphemic words only.

The plots and partial correlations below show that the basic patterns emerge somewhat more clearly in monomorphemic words, as expected. All findings remain the same.
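
A sketch of the monomorphemic subset and the corresponding partial correlation (column names follow the code elsewhere in this notebook; note that nmorph may be coded as character in the ELP data):

```r
# sketch: monomorphemic subset of set A, then frequency-controlled partial correlation
words.setA.mono <- words %>%
  filter(set == "A", nmorph == 1) %>%   # nmorph from the English Lexicon Project
  drop_na(ico, fun, logfreq)

pcor.test(words.setA.mono$ico, words.setA.mono$fun, words.setA.mono$logfreq)
```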

There are 1278 monomorphemic words in set A (out of a total of 1419).

Mean iconicity by number of morphemes

| nmorph | n    | mean.ico  |
|--------|------|-----------|
| 1      | 1278 | 0.8546147 |
| 2      | 137  | 1.0236474 |
| 3      | 3    | 1.4055556 |
| NA     | 1    | 1.0000000 |

Highest 7 iconic words per number of morphemes (1-3)

| word       | ico       | fun      | nmorph |
|------------|-----------|----------|--------|
| click      | 4.4615385 | 2.135135 | 1      |
| beep       | 4.3571429 | 2.615385 | 1      |
| squeak     | 4.2307692 | 3.230769 | 1      |
| chirp      | 4.1428571 | 3.000000 | 1      |
| stomp      | 4.1000000 | 2.421053 | 1      |
| pop        | 4.0769231 | 3.294118 | 1      |
| bleep      | 3.9285714 | 2.931818 | 1      |
| zigzag     | 4.3000000 | 3.113636 | 2      |
| buzzer     | 4.0909091 | 2.833333 | 2      |
| skateboard | 3.6000000 | 2.208333 | 2      |
| sunshine   | 3.0909091 | 2.064516 | 2      |
| zipper     | 2.9230769 | 2.516129 | 2      |
| freezer    | 2.9166667 | 2.281250 | 2      |
| bubbly     | 2.8181818 | 3.352941 | 2      |
| fireworks  | 1.9000000 | 2.294118 | 3      |
| pliers     | 1.9000000 | 2.352941 | 3      |
| influence  | 0.4166667 | 1.914286 | 3      |

Partial correlations between funniness and iconicity, controlling for frequency, in monomorphemic words

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.2158506 | 0       | 7.893486  | 1278 | 1  | pearson |

There are 2176 monomorphemic words in set B (61% of 3577).

Mean iconicity by number of morphemes in set B

| nmorph | n    | mean.ico  |
|--------|------|-----------|
| #      | 14   | 0.8584171 |
| 1      | 2176 | 0.6878947 |
| 2      | 1321 | 0.5808049 |
| 3      | 42   | 0.4412872 |
| NA     | 24   | 0.2862270 |

Partial correlations between funniness and imputed iconicity, controlling for frequency, in monomorphemic words

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.3278004 | 0       | 16.17424  | 2176 | 1  | pearson |

There are only 5168 monomorphemic words in set C (out of 41548 words for which we have data on number of morphemes).

Mean iconicity by number of morphemes in set C

| nmorph | n     | mean.ico  |
|--------|-------|-----------|
| #      | 1320  | 0.4958385 |
| 1      | 5168  | 0.5410642 |
| 2      | 20456 | 0.6485362 |
| 3      | 11575 | 0.4194742 |
| 4      | 2689  | 0.3195566 |
| 5      | 329   | 0.2877888 |
| 6      | 11    | 0.3718408 |
| NA     | 22132 | 0.4706343 |

Partial correlations between imputed funniness and imputed iconicity, controlling for frequency, in monomorphemic words

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.4370105 | 0       | 34.91781  | 5168 | 1  | pearson |

Imputing ratings based on monomorphemic words only

Given what we know about the bias in iconicity ratings, it may make sense to base imputation only on monomorphemic words and see how this affects the results. It should lead to fewer analysable compounds showing up high in the imputed iconicity ratings of sets B and C.

Model comparison shows that a model with imputed monomorphemic iconicity has a significantly better fit (F = 227.5, p < 0.0001) and explains a larger amount of variance (R2 = 0.139 vs. 0.084) than a model with just frequency and RT. However, the original model with imputed iconicity based on all words still explains more of the variance (R2 = 0.187).

Partial correlations show 23% covariance in set B (n = 3036) between funniness and imputed iconicity based on monomorphemic words, controlling for word frequency.

Partial correlations between funniness and imputed monomorphemic iconicity, controlling for frequency

| estimate  | p.value | statistic | n    | gp | Method  |
|-----------|---------|-----------|------|----|---------|
| 0.2292556 | 0       | 12.97119  | 3036 | 1  | pearson |

Example words

High funniness and high imputed monomorphemic iconicity: whack, burp, smack, fizz, chug, dud, wallop, beatnik, oddball, swish, snooze, bop, loony, squirm, chuckle, poof, bebop, getup, spunk, shindig

Low funniness and low imputed monomorphemic iconicity: housework, town, divorce, purchase, plaintiff, spacing, mean, prayer, hunting, arson, conscience, theft, shipping, visa, amends, bible, thyroid, concourse, union, wheelchair

High funniness and low imputed monomorphemic iconicity: rump, dodo, toga, scrotum, muskrat, satyr, sphincter, gourd, kebab, cheesecake, swank, girth, ducky, pubes, gad, rectum, sphinx, trump, harlot, lapdog

Low funniness and high imputed monomorphemic iconicity: doom, scrape, feedback, shudder, choke, replay, transient, shrapnel, fright, dental, thaw, lockup, tech, brow, cue, bloodbath, post, blend, decay, lair

Set C shows the same pattern: regressions are not much improved by using imputed scores based on monomorphemic words only.

Since the monomorphemic ratings were introduced specifically to check whether we can address the analysable compound bias in iconicity ratings, we use the original imputed funniness ratings, although we also have imputed funniness ratings based on monomorphemic words (fun_imputed_monomorph).

Model comparison shows that the imputed iconicity ratings based on monomorphemic words are pretty good, explaining more variance (R2 = 0.14 versus 0.06) than a model without iconicity. However, a model based on the original imputed ratings does much better (R2 = 0.24), so this is not giving us more power to detect the relation between funniness and iconicity ratings.

Example words

High imputed funniness and high imputed monomorphemic iconicity: tiddly, whir, sleaze, wibble, phat, whoo, whoosh, lah, rah, wah, buzzy, pung, popsy, plonk, phooey, thwack, whirr, chit, oozy, talky

Low imputed funniness and low imputed monomorphemic iconicity: upbringing, finalizing, surpassed, silva, p, received, suffrage, excused, undersigned, abase, disobedience, absences, biography, guilty, basin, sacredness, records, designating, scriptural, justifies

High imputed funniness and low imputed monomorphemic iconicity: copula, bratwurst, pisser, grum, ferme, prat, twitty, shags, wadi, gleba, lovebird, heifers, putz, chickweed, bungo, froufrou, burg, ramus, porgy, wiener

Low imputed funniness and high imputed monomorphemic iconicity: req, notify, engulf, concussive, desc, tox, undergoes, unbind, afb, hts, filmic, unrelentingly, undergo, ld, awl, excruciate, reeducation, adrenalin, storyboard, downpours

How about compounds?

In the new imputed ratings based on monomorphemic words, is it still easy to find analysable compound nouns rated as highly iconic? Yes, it is… oddball, cleanup, dustpan, killjoy, shakedown, showbizz, feedback, etc.

Visualisations of iconicity ratings by number of morphemes are hard to interpret. The distribution of the ratings is somewhat different (a more squat distribution in the ratings based on monomorphemic words), but it is not obvious that there are large differences in the relative preponderance of monomorphemic versus multimorphemic words in the top percentiles of iconicity ratings.

```
## # A tibble: 1 x 1
##       n
##   <int>
## 1   265
```
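
A sketch of how the tallies below can be produced (the `ico_imputed_monomorph_perc` column name is hypothetical, standing in for the percentiles of the monomorphemic-based imputation):

```r
# sketch: morpheme counts in the top 20% (deciles 9-10) of imputed iconicity in set B
words %>%
  filter(set == "B", ico_imputed_perc > 8) %>%
  count(nmorph)

# same, for the imputation based on monomorphemic words only
# (ico_imputed_monomorph_perc is a hypothetical column name)
words %>%
  filter(set == "B", ico_imputed_monomorph_perc > 8) %>%
  count(nmorph)
```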

Set B, top 20% of words by imputed iconicity based on all words

| nmorph | n   |
|--------|-----|
| 1      | 520 |
| 2      | 210 |
| 3      | 4   |

Set B, top 20% of words by imputed iconicity based on monomorphemic words

| nmorph | n   |
|--------|-----|
| 1      | 417 |
| 2      | 224 |
| 3      | 3   |

Set C, top 20% of words by imputed iconicity based on all words

| nmorph | n    |
|--------|------|
| 1      | 1083 |
| 2      | 5174 |
| 3      | 1408 |

Set C, top 20% of words by imputed iconicity based on monomorphemic words

| nmorph | n    |
|--------|------|
| 1      | 1157 |
| 2      | 4572 |
| 3      | 1759 |

In sum, while basing imputed iconicity ratings on monomorphemic words with human ratings gives reasonable results, it does not seem to result in a marked improvement of the imputed ratings, though further analysis is needed.

Markedness patterns in words with imputed ratings

While the primary focus of analysis 4 was on set A (the core set of human ratings), it’s interesting to see how well the structural cues fare in explaining independently imputed iconicity ratings in the larger datasets.
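
For orientation, a sketch of the cumulative markedness measure, under the assumption that it is simply the sum of the three binary structural cues (complex onset, complex coda, verbal diminutive) tabulated below:

```r
# hedged sketch: cumulative markedness as the sum of three binary cues;
# onset, coda and verbdim are the 0/1 columns shown in the decile tables
words <- words %>%
  mutate(cumulative = onset + coda + verbdim)
```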

Mean imputed scores by levels of cumulative markedness

| cumulative | n     | ico_imputed | fun_imputed |
|------------|-------|-------------|-------------|
| 0          | 59843 | 0.4908895   | 2.377589    |
| 1          | 7301  | 0.7852391   | 2.450599    |
| 2          | 113   | 1.2294607   | 2.646994    |

Cumulative markedness for <10 deciles of imputed iconicity

| n     | ico_imputed | fun_imputed | cumulative |
|-------|-------------|-------------|------------|
| 60940 | 0.3901764   | 2.353881    | 0.0985724  |

Imputed iconicity for 20 random words of high phonological complexity

| word     | ico_imputed_perc | ico_imputed | cumulative |
|----------|------------------|-------------|------------|
| clomp    | 10 | 2.7573962 | 2 |
| blurt    | 10 | 2.2853380 | 2 |
| squirt   | 10 | 2.1139378 | 2 |
| spunk    | 10 | 2.0987844 | 2 |
| dribble  | 10 | 2.0983419 | 2 |
| trunch   | 10 | 1.9866388 | 2 |
| flinch   | 10 | 1.8646337 | 2 |
| sluggish | 10 | 1.5854586 | 2 |
| cronk    | 10 | 1.4004689 | 2 |
| primp    | 8  | 0.9036671 | 2 |
| blueish  | 8  | 0.8951717 | 2 |
| crawfish | 8  | 0.8564163 | 2 |
| swinish  | 8  | 0.8504212 | 2 |
| snowbank | 7  | 0.7425518 | 2 |
| blondish | 5  | 0.4183398 | 2 |
| blandish | 5  | 0.4082991 | 2 |
| trench   | 4  | 0.2827599 | 2 |
| flank    | 4  | 0.2230756 | 2 |
| crayfish | 3  | 0.1607801 | 2 |
| prudish  | 3  | 0.1531699 | 2 |

Cumulative markedness scores per iconicity decile in Set B

| ico_imputed_perc | n   | ico        | fun      | onset     | coda      | verbdim   | cumulative |
|------------------|-----|------------|----------|-----------|-----------|-----------|------------|
| 1  | 182 | -0.4528116 | 2.220783 | 0.0714286 | 0.0164835 | 0.0000000 | 0.0879121 |
| 2  | 249 | -0.0841993 | 2.268928 | 0.0843373 | 0.0160643 | 0.0000000 | 0.1004016 |
| 3  | 247 | 0.1030573  | 2.318616 | 0.1052632 | 0.0202429 | 0.0000000 | 0.1255061 |
| 4  | 299 | 0.2579817  | 2.317502 | 0.1270903 | 0.0200669 | 0.0033445 | 0.1505017 |
| 5  | 290 | 0.4042797  | 2.349267 | 0.1068966 | 0.0172414 | 0.0068966 | 0.1310345 |
| 6  | 323 | 0.5487701  | 2.377754 | 0.1207430 | 0.0309598 | 0.0030960 | 0.1547988 |
| 7  | 333 | 0.7084445  | 2.403432 | 0.1141141 | 0.0180180 | 0.0000000 | 0.1321321 |
| 8  | 374 | 0.9002872  | 2.487929 | 0.1470588 | 0.0454545 | 0.0000000 | 0.1925134 |
| 9  | 370 | 1.1681607  | 2.528468 | 0.1594595 | 0.0297297 | 0.0081081 | 0.1972973 |
| 10 | 369 | 1.7764394  | 2.705826 | 0.2276423 | 0.0921409 | 0.0271003 | 0.3468835 |

Cumulative markedness scores per iconicity decile in Set C

| ico_imputed_perc | n    | ico        | fun      | onset     | coda      | verbdim   | cumulative |
|------------------|------|------------|----------|-----------|-----------|-----------|------------|
| 1  | 6643 | -0.4518873 | 2.245994 | 0.0575041 | 0.0058708 | 0.0003011 | 0.0636760 |
| 2  | 6540 | -0.0871170 | 2.271298 | 0.0677370 | 0.0053517 | 0.0004587 | 0.0735474 |
| 3  | 6507 | 0.1024110  | 2.291713 | 0.0705394 | 0.0075304 | 0.0003074 | 0.0783771 |
| 4  | 6402 | 0.2590349  | 2.307478 | 0.0670103 | 0.0078101 | 0.0010934 | 0.0759138 |
| 5  | 6345 | 0.4032373  | 2.334882 | 0.0780142 | 0.0077226 | 0.0011032 | 0.0868400 |
| 6  | 6297 | 0.5495897  | 2.357597 | 0.0865492 | 0.0079403 | 0.0007940 | 0.0952835 |
| 7  | 6208 | 0.7108025  | 2.397076 | 0.0979381 | 0.0106314 | 0.0011276 | 0.1096972 |
| 8  | 6188 | 0.9045974  | 2.449854 | 0.1115061 | 0.0119586 | 0.0021008 | 0.1255656 |
| 9  | 6056 | 1.1732741  | 2.521398 | 0.1370542 | 0.0143659 | 0.0034676 | 0.1548877 |
| 10 | 5778 | 1.8190651  | 2.692276 | 0.2057805 | 0.0188647 | 0.0074420 | 0.2320872 |

Markedness for iconicity vs funniness ratings

Cumulative markedness is particularly good for predicting iconicity, rivalling funniness, word frequency and log letter frequency as predictors of iconicity ratings (model mS.1). It is less good at predicting funniness ratings, which (as we know) are also influenced by semantic and collocational factors (model mS.2).
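
A sketch of the two models compared in this section (formulas as reported in the tables below, data assumed to sit in `words.setA`):

```r
# sketch of models mS.1 and mS.2
mS.1 <- lm(ico ~ logfreq + rt + fun + logletterfreq + cumulative, data = words.setA)
mS.2 <- lm(fun ~ logfreq + rt + logletterfreq + ico * cumulative, data = words.setA)
summary(mS.1)$adj.r.squared  # variance explained in iconicity
summary(mS.2)$adj.r.squared  # variance explained in funniness
```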

Model mS.1: lm(formula = ico ~ logfreq + rt + fun + logletterfreq + cumulative, data = words.setA)

| predictor     | df   | SS       | F      | p     | partial η² |
|---------------|------|----------|--------|-------|------------|
| logfreq       | 1    | 58.495   | 55.422 | 0.000 | 0.038      |
| rt            | 1    | 0.054    | 0.051  | 0.822 | 0.000      |
| fun           | 1    | 72.397   | 68.594 | 0.000 | 0.046      |
| logletterfreq | 1    | 44.700   | 42.351 | 0.000 | 0.029      |
| cumulative    | 1    | 73.125   | 69.284 | 0.000 | 0.047      |
| Residuals     | 1413 | 1491.344 |        |       |            |

Model mS.2: lm(formula = fun ~ logfreq + rt + logletterfreq + ico * cumulative, data = words.setA)

| predictor      | df   | SS      | F       | p     | partial η² |
|----------------|------|---------|---------|-------|------------|
| logfreq        | 1    | 36.143  | 266.115 | 0.000 | 0.159      |
| rt             | 1    | 1.249   | 9.195   | 0.002 | 0.006      |
| logletterfreq  | 1    | 7.653   | 56.346  | 0.000 | 0.038      |
| ico            | 1    | 6.144   | 45.241  | 0.000 | 0.031      |
| cumulative     | 1    | 0.092   | 0.676   | 0.411 | 0.000      |
| ico:cumulative | 1    | 0.858   | 6.315   | 0.012 | 0.004      |
| Residuals      | 1412 | 191.773 |         |       |            |

Phonotactic measures from IPhOD

A quick look at a range of IPhOD measures shows that none of them correlates as strongly with iconicity or funniness as logletterfreq, so they don’t offer us much additional explanatory power.

N.B. IPhOD contains homographs, but frequencies are given only at the level of orthographic forms. To avoid duplication of data we keep only the first of multiple homographs in IPhOD, accepting some loss of precision about possible pronunciations. We use IPhOD’s phonotactic probability and phonological density measures. Since we have no stress-related hypotheses we work with unstressed calculations. We work with values unweighted for frequency because we include frequency as a fixed effect in later analyses.
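
A sketch of the deduplication step described above (object and column names here are illustrative, not the exact IPhOD headers):

```r
# sketch: keep only the first entry per orthographic form to avoid duplicated
# frequencies for homographs; iphod.raw and Word are illustrative names
iphod <- iphod.raw %>%
  distinct(Word, .keep_all = TRUE)
```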

Valence helps explain high-iconicity low-funniness words

Valence is one reason for some iconic words not being rated as funny. Words like ‘crash’, ‘dread’, ‘scratch’ and ‘shoot’ (all in the lowest percentiles of valence) may be highly iconic but they have no positive or humorous connotation. In general, valence is of course already known to be related to funniness ratings: negative words are unlikely to be rated as highly funny.
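
A sketch of how the table below can be generated (column names as in the table):

```r
# sketch: highly iconic words in the lowest funniness decile, with valence percentiles
words.setA %>%
  filter(ico_perc > 8, fun_perc == 1) %>%
  arrange(-ico) %>%
  dplyr::select(word, ico, fun, ico_perc, fun_perc, valence_perc)
```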

Valence percentiles for words rated as iconic but not funny

| word    | ico      | fun      | ico_perc | fun_perc | valence_perc |
|---------|----------|----------|----------|----------|--------------|
| crash   | 3.769231 | 1.731707 | 10 | 1 | 1 |
| scratch | 3.285714 | 1.800000 | 10 | 1 | 5 |
| low     | 2.916667 | 1.575758 | 10 | 1 | 3 |
| shoot   | 2.600000 | 1.838710 | 10 | 1 | 2 |
| dread   | 2.545454 | 1.583333 | 10 | 1 | 1 |
| pulse   | 2.416667 | 1.923077 | 9  | 1 | 9 |
| slum    | 2.400000 | 1.696970 | 9  | 1 | 1 |
| stab    | 2.285714 | 1.666667 | 9  | 1 | 1 |
| killer  | 2.090909 | 1.466667 | 9  | 1 | 1 |
| carnage | 2.090909 | 1.885714 | 9  | 1 | 2 |
| sick    | 2.000000 | 1.846154 | 9  | 1 | 1 |
| torment | 2.000000 | 1.310345 | 9  | 1 | 1 |
| prompt  | 2.000000 | 1.914286 | 9  | 1 | 9 |
| stick   | 1.928571 | 1.769231 | 9  | 1 | 6 |
| small   | 1.923077 | 1.769231 | 9  | 1 | 7 |
| gloom   | 1.916667 | 1.888889 | 9  | 1 | 1 |
| corpse  | 1.900000 | 1.878788 | 9  | 1 | 1 |
| victim  | 1.846154 | 1.571429 | 9  | 1 | 1 |

Age of acquisition

Simon Kirby asked on Twitter whether the relation between funniness and iconicity might have something to do with child-directedness. This is hard to test directly (and unlikely to apply across the board) but if this were the case presumably it would also be reflected in AoA ratings — e.g., the more funny and iconic words would have relatively lower AoA ratings. (Importantly: we already know from Perry et al. 2017 that AoA is negatively correlated with iconicity: words rated higher in iconicity have a somewhat lower age of acquisition.)

We have AoA data for all 1,419 words in set A. It doesn’t really explain the iconicity + funniness relation. That is, words high in both iconicity and funniness are not strikingly low in AoA.

An important caveat is that this particular small subset may not be the best data to judge this on.
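
A sketch of the summary below (diff_rank is the combined funniness + iconicity decile used elsewhere in this notebook; the set A filter is an assumption):

```r
# sketch: mean AoA per decile of combined funniness and iconicity in set A
words %>%
  filter(set == "A") %>%
  drop_na(aoa, diff_rank) %>%
  group_by(diff_rank) %>%
  summarise(n = n(), mean.aoa = mean(aoa)) %>%
  kable(caption = "AoA ratings for every decile of combined iconicity and funniness")
```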

AoA ratings for every decile of combined iconicity and funniness

| diff_rank | n   | mean.aoa |
|-----------|-----|----------|
| 2  | 14  | 6.714286 |
| 3  | 39  | 7.150513 |
| 4  | 66  | 6.632273 |
| 5  | 71  | 6.578169 |
| 6  | 98  | 6.425612 |
| 7  | 104 | 6.498365 |
| 8  | 113 | 6.420443 |
| 9  | 122 | 6.417049 |
| 10 | 112 | 6.270446 |
| 11 | 124 | 6.340081 |
| 12 | 102 | 5.975392 |
| 13 | 88  | 6.211932 |
| 14 | 84  | 6.348333 |
| 15 | 62  | 6.193387 |
| 16 | 48  | 6.368542 |
| 17 | 48  | 6.667917 |
| 18 | 44  | 6.930454 |
| 19 | 40  | 7.022500 |
| 20 | 40  | 7.146500 |

The sign of simple (uncorrected) correlations is positive for funniness (r = 0.1), but negative for iconicity (r = -0.07), so if anything there is not a unitary effect here (and the two cancel each other out).

```r
# simple (uncorrected) correlations of funniness, iconicity and combined rank with AoA
cor.test(words$fun, words$aoa)
cor.test(words$ico, words$aoa)

cor.test(words$diff_rank, words$aoa)

# doesn't look very different in the ico_imputed ratings in set B
words %>%
  drop_na(aoa) %>%
  filter(set == "B") %>%
  group_by(diff_rank_setB) %>%
  summarise(n = n(), mean.ico = mean.na(ico_imputed), mean.aoa = mean.na(aoa)) %>%
  kable(caption = "AoA ratings for every decile of imputed iconicity and funniness in set B")
```

AoA ratings for every decile of imputed iconicity and funniness in set C

| diff_rank_setC | n    | mean.ico   | mean.aoa  |
|----------------|------|------------|-----------|
| 2  | 541  | -0.4430372 | 12.207763 |
| 3  | 820  | -0.2533501 | 12.026902 |
| 4  | 1103 | -0.1043019 | 11.916999 |
| 5  | 1342 | 0.0005344  | 12.027414 |
| 6  | 1470 | 0.0755901  | 11.939408 |
| 7  | 1724 | 0.1730946  | 11.833515 |
| 8  | 1658 | 0.2596555  | 11.817979 |
| 9  | 1803 | 0.3375967  | 11.925130 |
| 10 | 1831 | 0.4183328  | 11.685560 |
| 11 | 1714 | 0.5205835  | 11.647083 |
| 12 | 1576 | 0.5927657  | 11.566002 |
| 13 | 1445 | 0.6779066  | 11.528595 |
| 14 | 1258 | 0.7798878  | 11.503458 |
| 15 | 1109 | 0.8541895  | 11.429675 |
| 16 | 988  | 0.9600370  | 11.164443 |
| 17 | 870  | 1.0548924  | 11.102793 |
| 18 | 750  | 1.2269124  | 10.907840 |
| 19 | 694  | 1.3898187  | 10.604366 |
| 20 | 712  | 1.8827607  | 9.935927  |

Same for funniness:

| fun_imputed_perc | n    | mean.fun | mean.aoa |
|------------------|------|----------|----------|
| 1  | 1171 | 1.812639 | 11.31959 |
| 2  | 1170 | 1.957586 | 11.51386 |
| 3  | 1171 | 2.025905 | 11.45502 |
| 4  | 1170 | 2.077910 | 11.52602 |
| 5  | 1170 | 2.121456 | 11.58050 |
| 6  | 1171 | 2.161224 | 11.54835 |
| 7  | 1170 | 2.200252 | 11.56376 |
| 8  | 1171 | 2.236485 | 11.59654 |
| 9  | 1170 | 2.270268 | 11.65503 |
| 10 | 1170 | 2.303327 | 11.77170 |
| 11 | 1171 | 2.338253 | 11.66440 |
| 12 | 1170 | 2.375653 | 11.79544 |
| 13 | 1171 | 2.416009 | 11.83196 |
| 14 | 1170 | 2.458268 | 11.77729 |
| 15 | 1170 | 2.505473 | 11.88938 |
| 16 | 1171 | 2.560082 | 11.86482 |
| 17 | 1170 | 2.625283 | 11.69788 |
| 18 | 1171 | 2.711887 | 11.60738 |
| 19 | 1170 | 2.833098 | 11.57900 |
| 20 | 1170 | 3.091464 | 10.73097 |

Word classes

Reviewer 1 asked us to look into word classes. We report this here as an exploratory analysis. The correlation between funniness and iconicity ratings has the same sign across word classes. The somewhat steeper correlation in verbs (n = 241) can be attributed in part to the verbal diminutive suffix -le (n = 17).
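
A sketch of the summary below (the raw correlation column is simply `cor(fun, ico)` within each word class):

```r
# sketch: per-POS means and raw funniness ~ iconicity correlation in set A
words.setA %>%
  drop_na(POS, ico, fun) %>%
  group_by(POS) %>%
  summarise(n = n(),
            mean.ico = mean(ico),
            mean.fun = mean(fun),
            raw.correlation = cor(fun, ico)) %>%
  kable(caption = "Mean iconicity and funniness in set A across word classes")
```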

Mean iconicity and funniness in set A across word classes

| POS       | n    | mean.ico  | mean.fun | raw.correlation |
|-----------|------|-----------|----------|-----------------|
| Adjective | 109  | 0.9662906 | 2.270046 | 0.1839577       |
| Noun      | 1049 | 0.7212491 | 2.367076 | 0.2059030       |
| Verb      | 241  | 1.4846836 | 2.366951 | 0.5255179       |