# Diction

The term 'diction' generally refers to the stylistic choices an author makes when writing a text, and in particular to the choice of words. In stylometric research, it can be interesting to study the words that are characteristic of a given author, and to examine how one author's word choices differ from those of other authors.


## Defining corpora

This notebook explains how you can compare the words in two different corpora, and how you can identify the words that are distinctive within each of these corpora.

The code below defines two lists named `corpus1` and `corpus2`. The files that are mentioned are all stored in a folder named `Corpus`. `corpus1` contains two early 20th-century novels, and these will be compared to the two 19th-century novels in `corpus2`.


In [1]:
dir = 'Corpus'

corpus1 = [ 'ARoomWithAView.txt' , 'SonsandLovers.txt' ]
corpus2 = [ 'Ivanhoe.txt' , 'TreasureIsland.txt' ]

## Calculating frequencies

The code in the cell below reads the full text of the files that are listed in `corpus1`. Using the function `word_tokenise()` from the `tdm` module, it divides this text into tokens, and it then counts how often each token occurs. These frequencies are stored in a dictionary named `freq1`.

After this, the code does the same for the texts in `corpus2`. The word frequencies are placed in a dictionary named `freq2`.

After running this code, the variable `full_text1` will contain the *complete* texts of all the files in `corpus1`, joined into a single string. The dictionary named `freq1` will contain the frequencies of all the words in this full text.

The variables `full_text2` and `freq2` store the same type of information for the texts in `corpus2`.

In [None]:
import tdm 
from os.path import join


freq1 = dict()
full_text1 = ''

for text in corpus1:
    print('Reading ' + text + ' ... ')
    with open( join( dir,text) ) as file_handler:
        full_text1 += file_handler.read() + ' '

words = tdm.word_tokenise( full_text1 )
for w in words:
    freq1[w] = freq1.get(w,0) +1
    
        
        
freq2 = dict()
full_text2 = ''
    
for text in corpus2:
    print('Reading ' + text + ' ... ')
    with open( join( dir,text) ) as file_handler:
        full_text2 += file_handler.read() + ' '

words = tdm.word_tokenise( full_text2 )
for w in words:
    freq2[w] = freq2.get(w,0) +1
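
The counting loops in the cell above can also be written with `collections.Counter` from the standard library. The sketch below is only an illustration: the regular expression is a hypothetical stand-in for `tdm.word_tokenise()`, whose exact tokenisation rules are not shown in this notebook, so the resulting counts may differ slightly from those in `freq1` and `freq2`.

```python
import re
from collections import Counter

def count_tokens(text):
    # Hypothetical stand-in tokeniser: lower-case the text and keep runs of
    # letters, optionally joined by an apostrophe or a hyphen (e.g. "didn't").
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    # Counter maps every token to the number of times it occurs.
    return Counter(tokens)

sample = "She went home. She said she didn't want to go home."
freq = count_tokens(sample)

# List the three most frequent tokens in the sample sentence.
print(freq.most_common(3))
```

`Counter` objects can be added with `+`, so the counts for separate files can be combined afterwards if you prefer to tokenise each file on its own.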



## Dunning's log likelihood

One of the statistical methods that can be used to find distinctive words is *Dunning's log likelihood*. In short, it measures the distinctiveness of a word in one set of texts compared to the texts in a reference corpus, by comparing the frequencies that are observed with the frequencies that would be expected if the word were equally common in both. A good explanation of the formula can be found on the [WordHoard](https://wordhoard.northwestern.edu/userman/analysis-comparewords.html#loglike) website.
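
For reference, the two-term form of the statistic that this notebook implements can be written out as follows. The symbols follow the variable names used in the notebook's code: $a$ and $b$ are the frequencies of a word in the first and the second corpus, $c$ and $d$ are the total numbers of tokens in these corpora, and $E_1$ and $E_2$ are the frequencies that would be expected if the word were equally common in both.

$$
E_1 = \frac{c\,(a+b)}{c+d}, \qquad
E_2 = \frac{d\,(a+b)}{c+d}, \qquad
G^2 = 2\left(a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2}\right)
$$

When $a = E_1$ and $b = E_2$, both logarithms are zero and so is $G^2$. The further the observed frequencies move away from the expected ones, the larger the score becomes.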

Using the frequencies that have been calculated above, the cell below calculates the Dunning log likelihood scores for all of the words that occur in both `corpus1` and `corpus2`. The actual calculation takes place in a function named `log_likelihood()`. The scores that are calculated are all stored in a dictionary named `ll_scores`.

The formula that is implemented in `log_likelihood()` returns a number which can be either positive or negative. A positive score indicates that the word is relatively more frequent in the first corpus. A negative score indicates that the word is relatively more frequent in the second corpus. The tokens that are assigned the highest scores, in other words, are also the most distinctive of the first corpus.
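
To make the sign convention concrete, here is a small worked example. It is a sketch rather than the notebook's own function: the name `signed_g2` and the counts are invented for illustration. Unlike `log_likelihood()`, it also accepts a count of zero, on the assumption that a word which is absent from one corpus should simply contribute nothing for that corpus.

```python
import math

def signed_g2(count1, count2, total1, total2):
    """Signed two-term log likelihood (G2) for a single word.

    Positive: relatively more frequent in corpus 1.
    Negative: relatively more frequent in corpus 2.
    """
    # Expected frequencies if the word were equally common in both corpora.
    expected1 = total1 * (count1 + count2) / (total1 + total2)
    expected2 = total2 * (count1 + count2) / (total1 + total2)

    g2 = 0.0
    # A zero count is skipped (x * ln(x) tends to 0 as x tends to 0),
    # which avoids calling math.log(0) for words found in only one corpus.
    if count1 > 0:
        g2 += count1 * math.log(count1 / expected1)
    if count2 > 0:
        g2 += count2 * math.log(count2 / expected2)
    g2 *= 2

    # Attach the sign by comparing the relative frequencies.
    if count1 / total1 < count2 / total2:
        g2 = -g2
    return g2

# Invented counts: two corpora of 10,000 tokens each.
print(signed_g2(30, 10, 10_000, 10_000))   # more frequent in corpus 1
print(signed_g2(10, 30, 10_000, 10_000))   # more frequent in corpus 2
print(signed_g2(25, 0, 10_000, 10_000))    # absent from corpus 2
```

Swapping the two counts only flips the sign, which is what makes it possible to sort one list of scores and read off the distinctive words of both corpora from its two ends.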

The code below lists the words that are assigned a positive log likelihood score, i.e. the words that are distinctive of the first corpus.

In [3]:
import tdm
import math

def log_likelihood( word_count1 , word_count2, total1 , total2 ):

    # a, b: frequency of the word in corpus 1 and in corpus 2
    # c, d: total number of tokens in corpus 1 and in corpus 2
    a = word_count1
    b = word_count2
    c = total1
    d = total2

    # expected frequencies if the word were equally common in both corpora
    E1 = c*(a+b)/(c+d)
    E2 = d*(a+b)/(c+d)

    ln1 = math.log(a/E1)
    ln2 = math.log(b/E2)
    G2 = 2*((a* ln1) + (b* ln2))

    # the score is made negative if the word is less frequent
    # in corpus 1 than expected
    if a * ln1 < 0:
        G2 = -G2

    return G2



ll_scores = dict()

total1 = 0
total2 = 0

for word1 in freq1:
    total1 += freq1[word1]
for word2 in freq2:
    total2 += freq2[word2]

for word in freq1:
    if word in freq2:

        ll_score = log_likelihood( freq1[word] , freq2[word] , total1 , total2 )
        ll_scores[word] = ll_score
        
for word in reversed( tdm.sortedByValue(ll_scores) ):
    # stop before printing the first word without a positive score
    if ll_scores[word] <= 0:
        break
    print( word , ll_scores[word] )

she 4799.43414900771
her 2750.585629023951
he 1793.109255260812
mrs 888.4658988070781
mother 630.1915898193039
miss 606.0028624440555
was 546.7018419482931
you 530.7363177302946
miriam 514.0392051639936
went 477.99762710233773
mr 445.8871169880334
don't 388.08090591415487
yes 374.6271161352601
him 368.03641939910574
asked 336.30717542638547
did 266.31831842794605
oh 244.42533212027473
felt 240.59440647918996
it's 236.05332745755626
it 233.13871242569383
want 217.643454342447
looked 213.9195875252711
go 213.00950300503212
home 211.81313958194988
then 210.57660697150197
came 208.27955161898728
wanted 202.72016176574436
always 201.07279150227413
sat 187.1110011541112
going 173.72828299309708
girl 168.78720860329489
because 168.28410524297806
things 168.28410524297806
loved 159.98052081556054
why 153.79547706385085
they 151.1584791189709
herself 142.717634405583
very 136.76877061311896
laughed 133.5014626778187
something 132.0272733714723
away 130.8121537835389
up 129.9950976016297
really 

brushed 5.49211037479988
latch 5.49211037479988
roofs 5.49211037479988
earning 5.49211037479988
intensely 5.49211037479988
glare 5.49211037479988
hedges 5.49211037479988
twisting 5.49211037479988
meals 5.49211037479988
apologetically 5.49211037479988
me— 5.49211037479988
shallow 5.49211037479988
discussing 5.49211037479988
studying 5.49211037479988
vague 5.49211037479988
streaming 5.49211037479988
discuss 5.49211037479988
glossy 5.49211037479988
i—i 5.49211037479988
unpleasant 5.49211037479988
properly 5.49211037479988
intolerable 5.49211037479988
cheerful 5.49211037479988
works 5.49211037479988
sensation 5.49211037479988
bath 5.49211037479988
unkind 5.49211037479988
perfect 5.491668687730732
along 5.448409904614451
plenty 5.443985542106413
built 5.443985542106413
meadow 5.437410683687037
wedding 5.437410683687037
dissatisfied 5.437410683687037
boy's 5.437410683687037
erect 5.437410683687037
writing 5.418103814332879
loaf 5.321974679849748
patiently 5.321974679849748
corn 5.32197467984

warming 2.14371291391085
wheels 2.14371291391085
trooping 2.14371291391085
firelight 2.14371291391085
screwed 2.14371291391085
competition 2.14371291391085
arguing 2.14371291391085
expensive 2.14371291391085
ridge 2.14371291391085
wrap 2.14371291391085
sharing 2.14371291391085
cleaning 2.14371291391085
library 2.14371291391085
compact 2.14371291391085
brood 2.14371291391085
dazzled 2.14371291391085
collars 2.14371291391085
plants 2.14371291391085
attain 2.14371291391085
blocked 2.14371291391085
witty 2.14371291391085
secrecy 2.14371291391085
recoil 2.14371291391085
rains 2.14371291391085
cordial 2.14371291391085
tenant 2.14371291391085
contract 2.14371291391085
agents 2.14371291391085
swathed 2.14371291391085
hypocrisy 2.14371291391085
innumerable 2.14371291391085
detected 2.14371291391085
old-fashioned 2.14371291391085
spoiling 2.14371291391085
peeping 2.14371291391085
radiance 2.14371291391085
happier 2.14371291391085
inadequate 2.14371291391085
interpret 2.14371291391085
unwise 2.14

grain 0.827043204900844
stretching 0.827043204900844
midday 0.827043204900844
inflamed 0.827043204900844
earned 0.827043204900844
doctors 0.827043204900844
reluctantly 0.827043204900844
knuckles 0.827043204900844
uplifted 0.827043204900844
wench 0.827043204900844
carries 0.827043204900844
hilltop 0.827043204900844
a'ready 0.827043204900844
hotly 0.827043204900844
deceived 0.827043204900844
swollen 0.827043204900844
nodding 0.827043204900844
discomfort 0.827043204900844
stamped 0.827043204900844
claimed 0.827043204900844
caress 0.827043204900844
stifle 0.827043204900844
accounts 0.827043204900844
acquiesced 0.827043204900844
niece 0.827043204900844
desirable 0.827043204900844
shifting 0.827043204900844
amazement 0.827043204900844
beds 0.827043204900844
grinding 0.827043204900844
inflict 0.827043204900844
muscles 0.827043204900844
sloped 0.827043204900844
continually 0.827043204900844
scorched 0.827043204900844
correct 0.827043204900844
wings 0.827043204900844
immortal 0.827043204900844


admirably 0.413521602450422
narrowness 0.413521602450422
parishioners 0.413521602450422
thanking 0.413521602450422
quoting 0.413521602450422
directing 0.413521602450422
imparted 0.413521602450422
handkerchiefs 0.413521602450422
chestnut 0.413521602450422
series 0.413521602450422
receded 0.413521602450422
trot 0.413521602450422
slackened 0.413521602450422
lung 0.413521602450422
troubling 0.413521602450422
unlocked 0.413521602450422
bullocks 0.413521602450422
overflowing 0.413521602450422
sedulously 0.413521602450422
blundered 0.413521602450422
fashionable 0.413521602450422
demolishing 0.413521602450422
repressed 0.413521602450422
framed 0.413521602450422
bowl 0.41036531000486054
brook 0.41036531000486054
dignified 0.41036531000486054
smart 0.41036531000486054
brutal 0.41036531000486054
plunged 0.41036531000486054
actually 0.41036531000486054
anyone 0.3848445290304703
vulgar 0.3848445290304703
twilight 0.3848445290304703
join 0.3848445290304703
bundle 0.36495008167315746
ended 0.36495008

rejoiced 0.007357864328411845
fits 0.007357864328411845
yielding 0.007357864328411845
conviction 0.007357864328411845
darkening 0.007357864328411845
constitution 0.007357864328411845
grumble 0.007357864328411845
bounded 0.007357864328411845
upwards 0.007357864328411845
shrewd 0.007357864328411845
alacrity 0.007357864328411845
yea 0.007357864328411845
baffled 0.007357864328411845
wives 0.007357864328411845
neat 0.007357864328411845
creeping 0.007357864328411845
gulf 0.007357864328411845
volume 0.007357864328411845
trimmed 0.007357864328411845
notions 0.007357864328411845
keys 0.007357864328411845
conceived 0.007357864328411845
formal 0.007357864328411845
mist 0.007357864328411845
tricks 0.007357864328411845
holiness 0.007357864328411845
scattering 0.007357864328411845
tip 0.007357864328411845
magical 0.007357864328411845
bringing 0.007357864328411845
polished 0.007357864328411845
gestures 0.007357864328411845
kindled 0.007357864328411845
tilted 0.007357864328411845
argued 0.007357864328

heron 0.0024526214428039483
barked 0.0024526214428039483
swoop 0.0024526214428039483
glibness 0.0024526214428039483
breakfasted 0.0024526214428039483
obsequiously 0.0024526214428039483
earnestness 0.0024526214428039483
swarms 0.0024526214428039483
stove 0.0024526214428039483
straighter 0.0024526214428039483
slacked 0.0024526214428039483
weights 0.0024526214428039483
adventurous 0.0024526214428039483
suburbs 0.0024526214428039483
cats 0.0024526214428039483
proprietor 0.0024526214428039483
fiddle 0.0024526214428039483
serge 0.0024526214428039483
ne'er 0.0024526214428039483
stalked 0.0024526214428039483
oblong 0.0024526214428039483
puff 0.0024526214428039483
satin 0.0024526214428039483
farthing 0.0024526214428039483
miracle 0.0024526214428039483
refusing 0.0024526214428039483
bored 0.0024526214428039483
meagre 0.0024526214428039483
cheapest 0.0024526214428039483
beef 0.0024526214428039483
plums 0.0024526214428039483
monsieur 0.0024526214428039483
decipher 0.0024526214428039483
prouder 0.0

Words with negative log likelihood scores are more likely to appear in the reference corpus (i.e. the second corpus) than in the first corpus. 
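
Because the scores are signed, both ends of the ranking can be read from a single sorted list. The sketch below shows one way to do this without printing the full vocabulary. The helper name `top_words` and the parameter `n` are hypothetical, and the toy scores are rounded values taken from this notebook's output.

```python
def top_words(scores, n=10):
    """Return the n most positive and the n most negative words.

    Both results are lists of (word, score) pairs, strongest first.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    positive = [(word, score) for word, score in ranked if score > 0][:n]
    negative = [(word, score) for word, score in reversed(ranked) if score < 0][:n]
    return positive, negative

# Rounded scores borrowed from the output in this notebook.
demo_scores = {'she': 4799.4, 'her': 2750.6, 'of': -1439.9,
               'thou': -1165.5, 'show': -0.26}

corpus1_words, corpus2_words = top_words(demo_scores, n=2)
print(corpus1_words)
print(corpus2_words)
```

Limiting the list to the strongest words is usually sufficient, since scores close to zero say very little about either corpus.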

The code below lists the words with a negative score, starting with the words whose scores are most strongly negative.

In [4]:
for word in tdm.sortedByValue(ll_scores) :
    # stop before printing the first word without a negative score
    if ll_scores[word] >= 0:
        break
    print( word , ll_scores[word] )

of -1439.851543543724
thou -1165.454352230357
the -984.5633915979506
which -660.8971986083308
by -622.3438024663222
knight -619.2114293301083
thy -604.5251320200484
my -563.7511476323103
upon -402.58703978492156
our -398.1488283780656
de -352.0253264162649
thee -336.43378010930894
prince -291.59656239568824
john -250.50682097307597
master -246.90457234847958
silver -229.98937207976286
their -217.23914364669554
hath -201.4022062767266
sir -186.5180289898121
this -184.38126198092328
and -167.5447226714582
order -165.72641268977912
noble -160.73241872281545
holy -152.039599244743
knights -132.3714881171689
thine -131.61777325623805
may -131.13300855942148
we -127.5701492892266
art -126.13567631090822
who -124.83260960981829
will -123.88134633272443
castle -118.70650216304709
norman -114.97491006152725
whom -113.08794250914042
these -111.2969373929478
as -109.93187236401138
brian -107.7244691093924
king -103.71918597764731
saint -102.43030615389102
ship -102.21720850486588
honour -101.1924

portion -13.738373317118082
security -13.738373317118082
masters -13.738373317118082
gun -13.738373317118082
chase -13.738373317118082
lands -13.738373317118082
rear -13.738373317118082
witch -13.738373317118082
skeleton -13.738373317118082
anchor -13.738373317118082
mates -13.738373317118082
answered -13.69880630007728
defend -13.62638095404117
rules -13.62638095404117
glove -13.62638095404117
fellows -13.480108859968128
take -13.379500958298607
force -13.356183306769552
least -13.30024293742455
blood -13.27392664728692
guess -13.19105568935206
roof -13.19105568935206
step -13.118127596375867
sounds -13.024050978341538
risk -13.024050978341538
eye -13.008526308032089
expression -12.942299716137217
passage -12.92274342465973
parts -12.910448049954965
event -12.910448049954965
sum -12.910448049954965
strength -12.754618015542633
once -12.634478461424635
best -12.583780693662803
herald -12.554329516739848
honoured -12.554329516739848
weapons -12.554329516739848
stores -12.554329516739848

surf -6.813190477020585
echo -6.813190477020585
litter -6.813190477020585
compass -6.813190477020585
hermitage -6.813190477020585
doubt -6.760420035655088
feelings -6.756307257040211
weight -6.756307257040211
person -6.690867789215066
second -6.5376588051416675
age -6.5218882312513955
tone -6.463163229698214
sign -6.37918635916702
mounted -6.37918635916702
scorn -6.37918635916702
nearer -6.289388907317843
feet -6.2523834077376215
fur -6.244968793481989
traveller -6.244968793481989
punishment -6.244968793481989
adventures -6.244968793481989
gift -6.244968793481989
yourselves -6.244968793481989
pavilion -6.244968793481989
king's -6.244968793481989
port -6.244968793481989
le -6.244968793481989
issue -6.244968793481989
prison -6.244968793481989
amongst -6.223440634643193
hopes -6.223440634643193
haste -6.223440634643193
have -6.21896296451942
mile -6.102684494649828
bars -6.102684494649828
oak -6.102684494649828
willing -6.102684494649828
equally -6.102684494649828
aim -6.05740269684208
su

higher -2.864560590536044
changed -2.788862174189532
won -2.788862174189532
loss -2.755484581066737
separate -2.755484581066737
pleasure -2.748841848369537
attempt -2.7343240905957913
seated -2.7343240905957913
saved -2.7343240905957913
welcome -2.7343240905957913
bottle -2.7343240905957913
reduced -2.733937011071581
europe -2.733937011071581
envy -2.733937011071581
moment's -2.733937011071581
discover -2.733937011071581
brother's -2.733937011071581
reasonable -2.733937011071581
poverty -2.733937011071581
offering -2.733937011071581
measure -2.733937011071581
sprung -2.733937011071581
price -2.733937011071581
mistress -2.733937011071581
pitched -2.733937011071581
labour -2.733937011071581
suits -2.733937011071581
proper -2.7141425471522
choose -2.7141425471522
occasions -2.7141425471522
cruelty -2.7141425471522
committed -2.6382111378704125
date -2.6382111378704125
acquire -2.6382111378704125
majestic -2.6382111378704125
mourning -2.6382111378704125
jealousy -2.6382111378704125
praises

entertained -1.3777422905333685
accomplish -1.3777422905333685
exact -1.3777422905333685
custom -1.3777422905333685
ages -1.3777422905333685
collect -1.3777422905333685
condemned -1.3777422905333685
undo -1.3777422905333685
abandon -1.3777422905333685
folding -1.3777422905333685
concern -1.3777422905333685
preparations -1.3777422905333685
doubtful -1.3777422905333685
metal -1.3777422905333685
romantic -1.3777422905333685
admiration -1.3777422905333685
fiercely -1.3777422905333685
urged -1.3777422905333685
worldly -1.3777422905333685
turf -1.3777422905333685
calmly -1.3777422905333685
lives -1.3744209241847685
humour -1.3671620452978956
birth -1.3671620452978956
stones -1.3671620452978956
press -1.3671620452978956
beware -1.3671620452978956
waving -1.3671620452978956
alfred -1.3671620452978956
load -1.3671620452978956
grief -1.3671620452978956
lodging -1.3671620452978956
alongside -1.3671620452978956
frequently -1.3671620452978956
taken -1.3437968544784447
passed -1.310186246192373
spac

ambitious -0.5468648181191575
weep -0.5468648181191575
shoved -0.5468648181191575
whiteness -0.5468648181191575
desolation -0.5468648181191575
threshold -0.5468648181191575
homely -0.5468648181191575
clothing -0.5468648181191575
regretted -0.5468648181191575
morsel -0.5468648181191575
preceded -0.5468648181191575
screen -0.5468648181191575
tossed -0.5468648181191575
lame -0.5468648181191575
burns -0.5468648181191575
trampling -0.5468648181191575
uncouth -0.5468648181191575
impotent -0.5468648181191575
bled -0.5468648181191575
taut -0.5468648181191575
seizing -0.5468648181191575
me—i -0.5468648181191575
borrow -0.5468648181191575
continuing -0.5468648181191575
lain -0.5468648181191575
willow -0.5468648181191575
baggage -0.5468648181191575
shoulder -0.5441789799822088
scene -0.5360536361381296
cross -0.5207476928926953
roar -0.5043954410689602
paradise -0.5043954410689602
seats -0.5043954410689602
entering -0.5043954410689602
friendly -0.5043954410689602
corners -0.5043954410689602
depar

show -0.25594632443463716
surface -0.24565992116106772
rapidly -0.24565992116106772
future -0.24565992116106772
crowded -0.24565992116106772
bird -0.24565992116106772
smile -0.2403363938189651
apart -0.2331960233143242
empty -0.2331960233143242
drawn -0.2331960233143242
rock -0.2331960233143242
has -0.22835527097165276
hung -0.2276337272910478
mentioned -0.21049288340344496
action -0.21049288340344496
relieve -0.20952048490805075
gather -0.20952048490805075
attracted -0.20952048490805075
fourth -0.20952048490805075
stooping -0.20952048490805075
bravely -0.20952048490805075
managed -0.20952048490805075
swallowed -0.20952048490805075
grown -0.20952048490805075
suited -0.20952048490805075
curled -0.20952048490805075
sought -0.20952048490805075
thread -0.20952048490805075
temper -0.2093322844975658
whatever -0.2033032450998884
deep -0.19282278057448732
wind -0.1910482013972854
creature -0.18825508627721854
golden -0.18825508627721854
calling -0.18139299332740544
stroke -0.18139299332740544

## Mann-Whitney rank test

In a [blog post on identifying literary diction](https://tedunderwood.com/2011/11/09/identifying-the-terms-that-characterize-an-author-or-genre-why-dunnings-may-not-be-the-best-method/), Ted Underwood argues that Dunning's log likelihood function also has a number of disadvantages. It is sensitive to outliers, for example. He explains that the Mann-Whitney rank test can be a good alternative.

To perform the Mann-Whitney rank test, we first need to collect all the words that occur in the two corpora to be compared. Next, we need to divide the full texts of the two corpora into smaller chunks, all of the same size. These can be fragments of 500 words, for instance. We then count the number of times each word occurs in each of these chunks. For a given word, the test ranks all the chunks from both corpora by these counts, and it checks whether the chunks from corpus 1 tend to rank higher than the chunks from corpus 2 (or vice versa). If it is found, using these steps, that a word consistently ranks higher in one of the two corpora, this word can be identified as a distinctive word. Because the Mann-Whitney rank test works with ranks rather than with raw counts, it looks at occurrences across the whole corpus, and it neutralises the effect of exceptionally high counts in one or two of these chunks.
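
The effect of ranking can be illustrated with a few lines of plain Python. The U statistic on which the test is based can be computed by comparing every chunk from corpus 1 with every chunk from corpus 2, and counting how often the chunk from corpus 1 has the higher count (ties count as half). The helper `u_statistic` and the counts below are invented for this illustration, and the sketch does not compute a p-value.

```python
def u_statistic(sample1, sample2):
    """Mann-Whitney U for sample1: pairs won by sample1, ties count half."""
    u = 0.0
    for x in sample1:
        for y in sample2:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Invented counts of one word in six chunks from each corpus.
counts_corpus1 = [3, 4, 5, 4, 3, 5]
counts_corpus2 = [0, 1, 0, 40, 1, 0]   # one chunk with an exceptionally high count

u1 = u_statistic(counts_corpus1, counts_corpus2)
# Value of U that is expected when there is no difference between the corpora.
midpoint = len(counts_corpus1) * len(counts_corpus2) / 2

print(sum(counts_corpus1), sum(counts_corpus2))   # raw totals favour corpus 2
print(u1, midpoint)                               # U favours corpus 1
```

The single chunk with 40 occurrences dominates the raw totals, but it only wins six of the 36 pairwise comparisons, so the statistic still points to corpus 1.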

The Mann-Whitney test can be performed in Python using the `mannwhitneyu()` function from the `scipy.stats` module.

In [5]:
from tdm import word_tokenise
from scipy.stats import mannwhitneyu

# make a list of all the tokens in each corpus
words1 = word_tokenise(full_text1)
words2 = word_tokenise(full_text2)

def divide_into_chunks(words, length):

    # returns a list with one dictionary per chunk;
    # each dictionary contains the word frequencies in that chunk
    chunks = []

    for i in range(0, len(words), length):
        counts = dict()
        # a slice stops at the end of the list, so the
        # last chunk may be shorter than `length`
        for word in words[i:i+length]:
            counts[word] = counts.get(word,0)+1
        chunks.append(counts)
    return chunks


length = 500
chunks1 = divide_into_chunks(words1,length)
chunks2 = divide_into_chunks(words2,length)


# vocab is the union of terms in both sets
all_words = dict()
    
for chunk in chunks1:
    for word in chunk:
        all_words[word]= all_words.get(word,0) + 1
for chunk in chunks2:
    for word in chunk:
        all_words[word]= all_words.get(word,0) + 1
    
rho =  dict()
    
for word in all_words:
        
    a=[]
    b=[]
        
    for chunk in chunks1:
        a.append(chunk.get(word,0))
    for chunk in chunks2:
        b.append(chunk.get(word,0))

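    # two-sided test: do the counts in the chunks of corpus 1 (a)
    # differ from the counts in the chunks of corpus 2 (b)?
    stat,pval=mannwhitneyu(a,b, alternative="two-sided")
    # value of the U statistic expected when there is no difference
    mean =len(chunks1)*len(chunks2)*0.5
    # a statistic at or below this value means the word does not rank
    # higher in corpus 1; such words are given a negative sign
    if stat <= mean:
        pval = 0 - pval
            
    rho[word]= ( pval )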
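
One detail in the cell above deserves attention: `divide_into_chunks()` also keeps the final chunk of each corpus, which will usually be shorter than `length`. The counts in such a short chunk are not fully comparable with the counts in the other chunks. The sketch below shows one possible way to obtain chunks of exactly the same size, on the assumption that it is acceptable to discard the leftover tokens at the end. The function name `equal_chunks` is hypothetical.

```python
def equal_chunks(tokens, size):
    """Split tokens into consecutive chunks of exactly `size` tokens.

    A shorter leftover at the end of the list is dropped.
    """
    # Number of tokens that fit into complete chunks.
    full = len(tokens) - len(tokens) % size
    return [tokens[i:i + size] for i in range(0, full, size)]

tokens = 'a b c d e f g h i j k'.split()   # 11 toy tokens
chunks = equal_chunks(tokens, 4)
print(chunks)
```

With chunks of 500 words, at most 499 tokens per corpus are lost in this way, which is negligible for full-length novels.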


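Because of the sign that is added in the cell above, the words that are more frequent in corpus 1 have a positive value, and the words that are more frequent in corpus 2 have a negative value. The value itself is the p-value of the test: the closer it is to zero, the more distinctive the word.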
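
The dictionary `rho` stores signed p-values, so the sign records the direction and the magnitude records the strength of the evidence. Words with a large p-value are not really distinctive of either corpus. The sketch below keeps only the words below a threshold, treating positive values as corpus 1, as the `rho[word] > 0` test in this notebook does. The function name `split_by_corpus`, the threshold of 0.05 and the demo values are assumptions made for this illustration.

```python
import math

def split_by_corpus(signed_pvalues, alpha=0.05):
    """Split words by the sign of their value and keep those with p < alpha.

    Returns two lists of (word, p-value) pairs, most distinctive first.
    """
    corpus1, corpus2 = [], []
    for word, value in signed_pvalues.items():
        p = abs(value)
        if p >= alpha:
            continue   # not significant at this threshold
        # copysign also recognises the sign of a value that underflowed to -0.0
        if math.copysign(1.0, value) > 0:
            corpus1.append((word, p))
        else:
            corpus2.append((word, p))
    corpus1.sort(key=lambda item: item[1])
    corpus2.sort(key=lambda item: item[1])
    return corpus1, corpus2

# Invented signed p-values, for illustration only.
demo = {'she': 1e-40, 'sewing': 0.0383, 'wake': 0.1274,
        'thou': -1e-30, 'show': -0.61}

distinctive1, distinctive2 = split_by_corpus(demo)
print(distinctive1)
print(distinctive2)
```

Since the test is repeated for every word in the vocabulary, a stricter threshold than 0.05 may be more appropriate in practice.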

In [6]:
print( "The following words are distinctive in corpus 1:" )    
for word in tdm.sortedByValue( rho):
    if rho[word] > 0:
        print( f'{word}\t{rho[word]:.22f}' ) 

The following words are distinctive in corpus 1:
she	0.0000000000000000000000
her	0.0000000000000000000000
he	0.0000000000000000000000
paul	0.0000000000000000000000
mrs	0.0000000000000000000000
mother	0.0000000000000000000000
morel	0.0000000000000000000000
don't	0.0000000000000000000000
went	0.0000000000000000000000
yes	0.0000000000000000000000
was	0.0000000000000000000000
asked	0.0000000000000000000000
felt	0.0000000000000000000000
did	0.0000000000000000000000
it's	0.0000000000000000000000
came	0.0000000000000000000000
lucy	0.0000000000000000000000
you	0.0000000000000000000000
oh	0.0000000000000000000000
always	0.0000000000000000000000
looked	0.0000000000000000000000
it	0.0000000000000000000000
miss	0.0000000000000000000000
going	0.0000000000000000000000
then	0.0000000000000000000000
home	0.0000000000000000000000
things	0.0000000000000000000000
go	0.0000000000000000000000
wanted	0.0000000000000000000000
sat	0.0000000000000000000000
miriam	0.0000000000000000000000
want	0.00000000000000

restlessness	0.0383091209159346010593
straightened	0.0383091209159346010593
sewing	0.0383091209159346010593
hodgkisson	0.0383091209159346010593
ma'e	0.0383091209159346010593
thrilled	0.0383091209159346010593
glamour	0.0383091209159346010593
mun	0.0383091209159346010593
thysen	0.0383091209159346010593
appen	0.0383091209159346010593
pint	0.0383091209159346010593
chimney-piece	0.0383091209159346010593
butty	0.0383091209159346010593
stalls	0.0383091209159346010593
mornings	0.0383091209159346010593
despicable	0.0383091209159346010593
impotence	0.0383091209159346010593
stress	0.0383091209159346010593
noisily	0.0383091209159346010593
snap-bag	0.0383091209159346010593
smeared	0.0383091209159346010593
fender	0.0383091209159346010593
slices	0.0383091209159346010593
tidy	0.0383091209159346010593
bullied	0.0383091209159346010593
rim	0.0383091209159346010593
halfway	0.0383091209159346010593
pasture	0.0383091209159346010593
impersonal	0.0383091209159346010593
jerked	0.0383091209159346010593
spoons	0

wake	0.1274455050241983522508
thoroughly	0.1280379480157005922525
possibly	0.1280379480157005922525
self	0.1280379480157005922525
neighbour	0.1280379480157005922525
bunch	0.1280379480157005922525
pool	0.1283085062266602072167
pound	0.1283086696314033048338
flower	0.1285914620961300658397
warned	0.1286320983951279584012
faintly	0.1286320983951279584012
existed	0.1286320983951279584012
jumping	0.1286320983951279584012
visitor	0.1286875772818722729607
eggs	0.1286876318211934433489
owned	0.1290670824627238311155
sadness	0.1290670824627238311155
alley	0.1290670824627238311155
chairs	0.1293859430618738426411
drops	0.1293859430618738426411
level	0.1294368096386865618630
clouds	0.1294368096386865618630
spat	0.1294470212045160384395
mothers	0.1294470212045160384395
fearfully	0.1294470212045160384395
assented	0.1294470212045160384395
peering	0.1294470212045160384395
detached	0.1294470212045160384395
phrases	0.1294470212045160384395
shift	0.1294470212045160384395
wearied	0.1294470212045160384395


assertive	0.1436100785514247968333
poker	0.1436100785514247968333
raker	0.1436100785514247968333
saucer	0.1436100785514247968333
laboriously	0.1436100785514247968333
calico	0.1436100785514247968333
pit-singlet	0.1436100785514247968333
winsome	0.1436100785514247968333
locking	0.1436100785514247968333
stalk	0.1436100785514247968333
departing	0.1436100785514247968333
ash-pit	0.1436100785514247968333
dustpan	0.1436100785514247968333
swindle	0.1436100785514247968333
mixing	0.1436100785514247968333
floury	0.1436100785514247968333
guts	0.1436100785514247968333
underground	0.1436100785514247968333
pit-bank	0.1436100785514247968333
peppering	0.1436100785514247968333
mug	0.1436100785514247968333
woollen	0.1436100785514247968333
congregational	0.1436100785514247968333
dinners	0.1436100785514247968333
peeled	0.1436100785514247968333
barkled	0.1436100785514247968333
accidentally	0.1436100785514247968333
mill-race	0.1436100785514247968333
waggon	0.1436100785514247968333
fierily	0.1436100785514247968

whoever	0.1551387985857061724282
quicker	0.1551387985857061724282
ma	0.1551387985857061724282
knelt	0.1551387985857061724282
tapped	0.1551387985857061724282
gothic	0.1551387985857061724282
stimulated	0.1551387985857061724282
appalling	0.1551387985857061724282
grandmother	0.1551387985857061724282
egg	0.1551387985857061724282
serves	0.1551387985857061724282
sulky	0.1551387985857061724282
boil	0.1551387985857061724282
whereupon	0.1551387985857061724282
needles	0.1551387985857061724282
nun	0.1551387985857061724282
pigeon	0.1551387985857061724282
m	0.1554151768199185656982
attract	0.1554151768199185656982
gap	0.1554151768199185656982
catastrophe	0.1554151768199185656982
whereon	0.1554151768199185656982
tap	0.1554151768199185656982
ensued	0.1554151768199185656982
accompany	0.1554151768199185656982
solution	0.1554151768199185656982
drifted	0.1554151768199185656982
ugliness	0.1554151768199185656982
smallest	0.1554151768199185656982
melting	0.1554151768199185656982
reference	0.15541517681991856

harvested	0.3018316948382196995837
up—free	0.3018316948382196995837
theologian	0.3018316948382196995837
behaves—like	0.3018316948382196995837
universe	0.3018316948382196995837
twelve-winded	0.3018316948382196995837
blemish	0.3018316948382196995837
everlasting	0.3018316948382196995837
yes—a	0.3018316948382196995837
but—but—	0.3018316948382196995837
hobby	0.3018316948382196995837
bores	0.3018316948382196995837
lakes	0.3018316948382196995837
inflated	0.3018316948382196995837
spiritually	0.3018316948382196995837
esthetically	0.3018316948382196995837
tombstones	0.3018316948382196995837
collapsing	0.3018316948382196995837
invent	0.3018316948382196995837
empyrean	0.3018316948382196995837
marvelling	0.3018316948382196995837
executante	0.3018316948382196995837
labelled	0.3018316948382196995837
pictorial	0.3018316948382196995837
what—that	0.3018316948382196995837
decides	0.3018316948382196995837
diaries	0.3018316948382196995837
intoxicated	0.3018316948382196995837
auspices	0.30183169483821969958

no—more	0.3018316948382196995837
you—may	0.3018316948382196995837
absurdities	0.3018316948382196995837
business-like	0.3018316948382196995837
pince-nez	0.3018316948382196995837
flattened	0.3018316948382196995837
navvy—nay	0.3018316948382196995837
recast	0.3018316948382196995837
flowerlike	0.3018316948382196995837
rebuked	0.3018316948382196995837
manliness	0.3018316948382196995837
revere	0.3018316948382196995837
inmost	0.3018316948382196995837
humourist	0.3018316948382196995837
solicitor	0.3018316948382196995837
speculation	0.3018316948382196995837
remnants	0.3018316948382196995837
indigenous	0.3018316948382196995837
milieu	0.3018316948382196995837
satisfaction—which	0.3018316948382196995837
solicitors	0.3018316948382196995837
despise—of	0.3018316948382196995837
obtainable	0.3018316948382196995837
immigrants	0.3018316948382196995837
questioning—their	0.3018316948382196995837
affluence	0.3018316948382196995837
inexplosive	0.3018316948382196995837
paper-bags	0.3018316948382196995837
orang

brookside	0.3018316948382196995837
mines	0.3018316948382196995837
donkeys	0.3018316948382196995837
gin	0.3018316948382196995837
countryside	0.3018316948382196995837
burrowing	0.3018316948382196995837
coal-miners	0.3018316948382196995837
stockingers	0.3018316948382196995837
straying	0.3018316948382196995837
elbowed	0.3018316948382196995837
financiers	0.3018316948382196995837
carston	0.3018316948382196995837
waite	0.3018316948382196995837
notorious	0.3018316948382196995837
brooks	0.3018316948382196995837
carthusians	0.3018316948382196995837
farmlands	0.3018316948382196995837
valleyside	0.3018316948382196995837
bunker's	0.3018316948382196995837
beggarlee	0.3018316948382196995837
loop	0.3018316948382196995837
accommodate	0.3018316948382196995837
regiments	0.3018316948382196995837
quadrangles	0.3018316948382196995837
dots	0.3018316948382196995837
blank-six	0.3018316948382196995837
auriculas	0.3018316948382196995837
saxifrage	0.3018316948382196995837
sweet-williams	0.3018316948382196995837
d

trusses	0.3018316948382196995837
wash-leather	0.3018316948382196995837
horse-hair	0.3018316948382196995837
no—i—	0.3018316948382196995837
manner—he	0.3018316948382196995837
action—he	0.3018316948382196995837
crackling	0.3018316948382196995837
the—it's	0.3018316948382196995837
sir,—please	0.3018316948382196995837
me'—er—er—i	0.3018316948382196995837
the—er—'two	0.3018316948382196995837
pairs—gris	0.3018316948382196995837
fil	0.3018316948382196995837
bas—grey	0.3018316948382196995837
stockings'—er—er—'sans—without'—er—i	0.3018316948382196995837
words—er—'doigts—fingers'—er—i	0.3018316948382196995837
fingers'—as	0.3018316948382196995837
well—as	0.3018316948382196995837
rule—	0.3018316948382196995837
clod	0.3018316948382196995837
shut-off	0.3018316948382196995837
in—at	0.3018316948382196995837
station—at	0.3018316948382196995837
you—it's	0.3018316948382196995837
paving	0.3018316948382196995837
sun—apples	0.3018316948382196995837
oranges	0.3018316948382196995837
green-gage	0.301831694838219

rappelleras	0.3018316948382196995837
beaute	0.3018316948382196995837
caresses	0.3018316948382196995837
desiccated	0.3018316948382196995837
bookcase	0.3018316948382196995837
unswathed	0.3018316948382196995837
curtly	0.3018316948382196995837
glumly	0.3018316948382196995837
parcels—meat	0.3018316948382196995837
green-groceries	0.3018316948382196995837
curtains—	0.3018316948382196995837
yes—often	0.3018316948382196995837
hectoring	0.3018316948382196995837
i—and	0.3018316948382196995837
ovenful	0.3018316948382196995837
well—then	0.3018316948382196995837
ropes	0.3018316948382196995837
trapseing	0.3018316948382196995837
morning—	0.3018316948382196995837
rhythmic	0.3018316948382196995837
sateen	0.3018316948382196995837
mother—you	0.3018316948382196995837
her—i—i	0.3018316948382196995837
her—she	0.3018316948382196995837
why—painting—and	0.3018316948382196995837
herbert	0.3018316948382196995837
spencer	0.3018316948382196995837
now—and	0.3018316948382196995837
does—	0.3018316948382196995837
t's	0

bernhardt's	0.3018316948382196995837
howl	0.3018316948382196995837
catamaran—	0.3018316948382196995837
catamaran	0.3018316948382196995837
malays	0.3018316948382196995837
good-humouredly	0.3018316948382196995837
he?"—laughing	0.3018316948382196995837
wi'out	0.3018316948382196995837
confidentially	0.3018316948382196995837
pyjama	0.3018316948382196995837
pierrot	0.3018316948382196995837
cribbage	0.3018316948382196995837
crib	0.3018316948382196995837
shuffled	0.3018316948382196995837
eight—	0.3018316948382196995837
preparatory	0.3018316948382196995837
things"—she	0.3018316948382196995837
hand—"and	0.3018316948382196995837
room's	0.3018316948382196995837
scrubbed	0.3018316948382196995837
clanged	0.3018316948382196995837
hair-pins	0.3018316948382196995837
dressing-table—her	0.3018316948382196995837
hair-brush	0.3018316948382196995837
click!—he	0.3018316948382196995837
unfasten	0.3018316948382196995837
strung	0.3018316948382196995837
dangled	0.3018316948382196995837
adore	0.301831694838219699

scarred	0.5246907598586111332040
scrupulously	0.5246907598586111332040
repented	0.5246907598586111332040
weeping	0.5246907598586111332040
ripped	0.5246907598586111332040
tasks	0.5246907598586111332040
sparks	0.5246907598586111332040
jokes	0.5246907598586111332040
soaking	0.5246907598586111332040
soaked	0.5246907598586111332040
fumbled	0.5246907598586111332040
gnawed	0.5246907598586111332040
sup	0.5246907598586111332040
scolded	0.5246907598586111332040
plump	0.5246907598586111332040
exclaim	0.5246907598586111332040
trotted	0.5246907598586111332040
plumped	0.5246907598586111332040
gettin	0.5246907598586111332040
towering	0.5246907598586111332040
jollily	0.5246907598586111332040
godfather	0.5246907598586111332040
extracts	0.5246907598586111332040
scratching	0.5246907598586111332040
waxen	0.5246907598586111332040
alight	0.5246907598586111332040
shrieking	0.5246907598586111332040
transmitted	0.5246907598586111332040
dipping	0.5246907598586111332040
fulfilment	0.5246907598586111332040
muffle

ties	0.9627596031895512274090
foolishly	0.9627596031895512274090
twitch	0.9627596031895512274090
overlooks	0.9627596031895512274090
portals	0.9627596031895512274090
uneasiness	0.9627596031895512274090
allows	0.9627596031895512274090
unpardonable	0.9627596031895512274090
shockingly	0.9627596031895512274090
sufferance	0.9627596031895512274090
enlightened	0.9627596031895512274090
science	0.9627596031895512274090
extracting	0.9627596031895512274090
arrives	0.9627596031895512274090
begs	0.9627596031895512274090
pet	0.9627596031895512274090
prerogatives	0.9627596031895512274090
omit	0.9627596031895512274090
profitable	0.9627596031895512274090
potent	0.9627596031895512274090
alien	0.9627596031895512274090
deride	0.9627596031895512274090
continuance	0.9627596031895512274090
possessing	0.9627596031895512274090
pattern	0.9627596031895512274090
prompted	0.9627596031895512274090
workings	0.9627596031895512274090
cling	0.9627596031895512274090
leeches	0.9627596031895512274090
absurdly	0.96275960318

Words that are distinctive for corpus 2 have a negative value. The cell below prints these words, starting with the values that are closest to zero.

In [7]:
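print('The following words are distinctive in corpus 2:')

# Reverse the list returned by tdm.sortedByValue() and keep only the
# negative values; in the output below this puts the values closest
# to zero first.
for word in reversed(tdm.sortedByValue(rho)):
    if rho[word] < 0:
        print(f'{word}: {rho[word]:.22f}')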

The following words are distinctive in corpus 2:
of: -0.0000000000000000000000
by: -0.0000000000000000000000
thou: -0.0000000000000000000000
the: -0.0000000000000000000000
upon: -0.0000000000000000000000
which: -0.0000000000000000000000
knight: -0.0000000000000000000000
our: -0.0000000000000000000000
thy: -0.0000000000000000000000
saxon: -0.0000000000000000000000
my: -0.0000000000000000000000
this: -0.0000000000000000000000
and: -0.0000000000000000000000
cedric: -0.0000000000000000000000
thee: -0.0000000000000000000000
sir: -0.0000000000000000000000
de: -0.0000000000000000000000
these: -0.0000000000000000000000
master: -0.0000000000000000000001
john: -0.0000000000000000000001
may: -0.0000000000000000000001
captain: -0.0000000000000000000001
rebecca: -0.0000000000000000000001
hath: -0.0000000000000000000002
templar: -0.0000000000000000000002
hast: -0.0000000000000000000011
their: -0.0000000000000000000017
bois-guilbert: -0.0000000000000000000020
order: -0.0000000000000000000207
noble: -

struck: -0.0031420942915661408124
dick: -0.0031552701570675928226
turret: -0.0031793214861520636277
cheer: -0.0031852940314921348273
victim: -0.0031912499819814524847
discipline: -0.0031912499819814524847
bloody: -0.0031912499819814524847
stores: -0.0031912499819814524847
trade: -0.0031912499819814524847
charms: -0.0031912499819814524847
exchange: -0.0031971811558689955564
disgrace: -0.0031971811558689955564
sable: -0.0031971811558689955564
spanish: -0.0031971811558689955564
completely: -0.0031971811558689955564
ignorant: -0.0031971811558689955564
devotion: -0.0031971811558689955564
prompt: -0.0031971811558689955564
feast: -0.0031971811558689955564
heads: -0.0032310247768282578454
sea-cook: -0.0035996114848805371537
risen: -0.0035996114848805371537
brace: -0.0035996114848805371537
execution: -0.0035996114848805371537
methinks: -0.0035996114848805371537
contents: -0.0035996114848805371537
trumpet: -0.0035996114848805371537
stirrup: -0.0035996114848805371537
spears: -0.003599611484880537

morals: -0.0303981763700562727937
absolution: -0.0303981763700562727937
natives: -0.0303981763700562727937
effectually: -0.0303981763700562727937
respects: -0.0303981763700562727937
cautious: -0.0303981763700562727937
horsemen: -0.0303981763700562727937
bravest: -0.0303981763700562727937
ox: -0.0303981763700562727937
unfit: -0.0303981763700562727937
bondsman: -0.0303981763700562727937
clowns: -0.0303981763700562727937
coronet: -0.0303981763700562727937
bracelets: -0.0303981763700562727937
longitude: -0.0303981763700562727937
swine-herd: -0.0303981763700562727937
accoutred: -0.0303981763700562727937
eminence: -0.0303981763700562727937
convert: -0.0303981763700562727937
inflicted: -0.0303981763700562727937
degrees: -0.0303981763700562727937
inhabitants: -0.0303981763700562727937
races: -0.0303981763700562727937
multiplied: -0.0303981763700562727937
extensive: -0.0303981763700562727937
goat: -0.0303985537446484904711
esquires: -0.0303985537446484904711
submission: -0.030398553744648490471

accompanied: -0.0601830043409993589720
crest: -0.0601830043409993589720
violent: -0.0601830043409993589720
shouts: -0.0601830043409993589720
pursued: -0.0601830043409993589720
petty: -0.0601830043409993589720
blessing: -0.0601830043409993589720
described: -0.0601830043409993589720
concealed: -0.0601830043409993589720
extended: -0.0601830043409993589720
plainly: -0.0601830043409993589720
acquired: -0.0607751049502972817695
chief: -0.0612400863158017172427
supposed: -0.0632248938318316217044
places: -0.0638419486785467144019
southern: -0.0663241995274810591798
fixed: -0.0664792082969833525441
conflict: -0.0666969649316586504773
aloft: -0.0668502169457511480344
changing: -0.0671136356410222123525
information: -0.0671136356410222123525
animal: -0.0671140445467392626755
handle: -0.0671140445467392626755
proportion: -0.0671140445467392626755
presented: -0.0671140445467392626755
intelligence: -0.0671140445467392626755
fitting: -0.0671140445467392626755
patience: -0.0671140445467392626755
merc

outlandish: -0.0940102027295939002283
interwoven: -0.0940102027295939002283
namely: -0.0940102027295939002283
conversing: -0.0940102027295939002283
import: -0.0940102027295939002283
conveyance: -0.0940102027295939002283
ecclesiastic: -0.0940102027295939002283
personages: -0.0940102027295939002283
ascertain: -0.0940102027295939002283
amounted: -0.0940102027295939002283
tempest: -0.0940102027295939002283
requires: -0.0940102027295939002283
worshipful: -0.0940102027295939002283
pate: -0.0940102027295939002283
expound: -0.0940102027295939002283
lambs: -0.0940102027295939002283
ejaculated: -0.0940102027295939002283
nightfall: -0.0940102027295939002283
forsake: -0.0940102027295939002283
luxurious: -0.0940102027295939002283
esteemed: -0.0940102027295939002283
prolonged: -0.0940102027295939002283
cased: -0.0940102027295939002283
engraved: -0.0940102027295939002283
matted: -0.0940102027295939002283
scrip: -0.0940102027295939002283
scottish: -0.0940102027295939002283
calf: -0.0940102027295939002

avast: -0.1719124018672629761184
mutineer: -0.1719124018672629761184
bubbled: -0.1719124018672629761184
knee-deep: -0.1719124018672629761184
unprovided: -0.1719124018672629761184
camped: -0.1719124018672629761184
snuff-box: -0.1719124018672629761184
admixture: -0.1719124018672629761184
stumps: -0.1719124018672629761184
hailing: -0.1719124018672629761184
alexander: -0.1719124018672629761184
cannonade: -0.1719124018672629761184
tom's: -0.1719124018672629761184
reloaded: -0.1719124018672629761184
cracking: -0.1719124018672629761184
wade: -0.1719124018672629761184
ebb-tide: -0.1719124018672629761184
gig: -0.1719124018672629761184
process: -0.1719124018672629761184
gunner: -0.1719124018672629761184
bombardment: -0.1719124018672629761184
leeward: -0.1719124018672629761184
rippling: -0.1719124018672629761184
bread-bags: -0.1719124018672629761184
risking: -0.1719124018672629761184
you's: -0.1719124018672629761184
biscuits: -0.1719124018672629761184
ammunition: -0.1719124018672629761184
bellowe

soften: -0.1719124018672629761184
virelai: -0.1719124018672629761184
prelude: -0.1719124018672629761184
taper: -0.1719124018672629761184
studious: -0.1719124018672629761184
sharpens: -0.1719124018672629761184
long-bows: -0.1719124018672629761184
delilah: -0.1719124018672629761184
deeming: -0.1719124018672629761184
deserves: -0.1719124018672629761184
pastime: -0.1719124018672629761184
pattered: -0.1719124018672629761184
keepers: -0.1719124018672629761184
fort: -0.1719124018672629761184
emptied: -0.1719124018672629761184
hael: -0.1719124018672629761184
waes: -0.1719124018672629761184
provision: -0.1719124018672629761184
pewter: -0.1719124018672629761184
prudently: -0.1719124018672629761184
forest-walk: -0.1719124018672629761184
guest's: -0.1719124018672629761184
sinful: -0.1719124018672629761184
defile: -0.1719124018672629761184
entertainer: -0.1719124018672629761184
privations: -0.1719124018672629761184
forage: -0.1719124018672629761184
keeper's: -0.1719124018672629761184
provender: -0.

ladders: -0.1719126449865588934784
document: -0.1719126449865588934784
endor: -0.1719126449865588934784
diamonds: -0.1719126449865588934784
turret-chamber: -0.1719126449865588934784
unbridled: -0.1719126449865588934784
matilda: -0.1719126449865588934784
cruelties: -0.1719126449865588934784
threatens: -0.1719126449865588934784
loser: -0.1719126449865588934784
relenting: -0.1719126449865588934784
garlic: -0.1719126449865588934784
envoy: -0.1719126449865588934784
tosti: -0.1719126449865588934784
harold: -0.1719126449865588934784
wolfganger: -0.1719126449865588934784
morn: -0.1719126449865588934784
disappearance: -0.1719126449865588934784
uncanonical: -0.1719126449865588934784
trinkets: -0.1719126449865588934784
devout: -0.1719126449865588934784
ford: -0.1719126449865588934784
route: -0.1719126449865588934784
alighted: -0.1719126449865588934784
seneschal: -0.1719126449865588934784
scorns: -0.1719126449865588934784
baldwin: -0.1719126449865588934784
alicia: -0.1719126449865588934784
salisbu

airy: -0.3348510542848989191000
dereliction: -0.3348510542848989191000
saluting: -0.3348510542848989191000
mill-stones: -0.3348510542848989191000
imposter: -0.3348510542848989191000
imposter—a: -0.3348510542848989191000
single-handed: -0.3348510542848989191000
treasure-house: -0.3348510542848989191000
cheers: -0.3348510542848989191000
ambushed: -0.3348510542848989191000
diagonal: -0.3348510542848989191000
himself—given: -0.3348510542848989191000
useless—given: -0.3348510542848989191000
wormed: -0.3348510542848989191000
skeleton—it: -0.3348510542848989191000
half-idiot: -0.3348510542848989191000
pick-axes: -0.3348510542848989191000
eel: -0.3348510542848989191000
nick: -0.3348510542848989191000
mizzenmast: -0.3348510542848989191000
strangling: -0.3348510542848989191000
teetotum: -0.3348510542848989191000
musket-shots: -0.3348510542848989191000
crack!—three: -0.3348510542848989191000
then—crack: -0.3348510542848989191000
mates—: -0.3348510542848989191000
cub: -0.3348510542848989191000
scr

languor: -0.3348510542848989191000
surges: -0.3348510542848989191000
redescending: -0.3348510542848989191000
empire: -0.3348510542848989191000
re-established: -0.3348510542848989191000
whirr: -0.3348510542848989191000
simultaneous: -0.3348510542848989191000
marsh-birds: -0.3348510542848989191000
hands—well: -0.3348510542848989191000
dooty—: -0.3348510542848989191000
agin: -0.3348510542848989191000
lots: -0.3348510542848989191000
rope—"silver: -0.3348510542848989191000
man—and: -0.3348510542848989191000
tom—now: -0.3348510542848989191000
where'd: -0.3348510542848989191000
a-speaking: -0.3348510542848989191000
up—you: -0.3348510542848989191000
you—gold: -0.3348510542848989191000
blond: -0.3348510542848989191000
overhear: -0.3348510542848989191000
desperadoes: -0.3348510542848989191000
hearkening: -0.3348510542848989191000
live-oak: -0.3348510542848989191000
quack: -0.3348510542848989191000
reedy: -0.3348510542848989191000
knolls: -0.3348510542848989191000
brambles: -0.3348510542848989191

hand-barrow—a: -0.3348510542848989191000
one—the: -0.3348510542848989191000
creations: -0.3348510542848989191000
cooper: -0.3348510542848989191000
ballantyne: -0.3348510542848989191000
kingston: -0.3348510542848989191000
appetites: -0.3348510542848989191000
—so: -0.3348510542848989191000
youngsters: -0.3348510542848989191000
retold: -0.3348510542848989191000
schooners: -0.3348510542848989191000
tunes: -0.3348510542848989191000
purchaser: -0.3348510542848989191000
sweden—: -0.3348510542848989191000
chaluz: -0.3348510542848989191000
recur: -0.3348510542848989191000
tremour: -0.3348510542848989191000
numbered: -0.3348510542848989191000
dedicate: -0.3348510542848989191000
features—"that—may: -0.3348510542848989191000
wean: -0.3348510542848989191000
us—the: -0.3348510542848989191000
valueless: -0.3348510542848989191000
lady—to: -0.3348510542848989191000
value,—and: -0.3348510542848989191000
rebecca.—"you: -0.3348510542848989191000
tendering: -0.3348510542848989191000
ear-jewels: -0.33485105

lacking: -0.3348510542848989191000
brawl: -0.3348510542848989191000
none—and: -0.3348510542848989191000
hood.—"king: -0.3348510542848989191000
impelled: -0.3348510542848989191000
percy: -0.3348510542848989191000
warwickshire: -0.3348510542848989191000
unsheathing: -0.3348510542848989191000
other?—yet: -0.3348510542848989191000
kind—for: -0.3348510542848989191000
war—your: -0.3348510542848989191000
bodkin: -0.3348510542848989191000
meed—but: -0.3348510542848989191000
number—but: -0.3348510542848989191000
assurances: -0.3348510542848989191000
demean: -0.3348510542848989191000
besprinkled: -0.3348510542848989191000
macdonald: -0.3348510542848989191000
lordlings: -0.3348510542848989191000
xli: -0.3348510542848989191000
monarchs—bowed: -0.3348510542848989191000
style—a: -0.3348510542848989191000
jocose: -0.3348510542848989191000
joyously: -0.3348510542848989191000
himself—"but: -0.3348510542848989191000
altar-cloth: -0.3348510542848989191000
friar—: -0.3348510542848989191000
yearly—if: -0.3

master,—"and: -0.3348510542848989191000
peruse: -0.3348510542848989191000
thrashing-floor: -0.3348510542848989191000
fanners: -0.3348510542848989191000
professions: -0.3348510542848989191000
exclaimed—"here: -0.3348510542848989191000
literarum: -0.3348510542848989191000
lectione: -0.3348510542848989191000
forty-second: -0.3348510542848989191000
packthread: -0.3348510542848989191000
inspected: -0.3348510542848989191000
sword.—conrade: -0.3348510542848989191000
interrogator: -0.3348510542848989191000
armenian: -0.3348510542848989191000
jew.—give: -0.3348510542848989191000
dealest: -0.3348510542848989191000
scandalizing: -0.3348510542848989191000
questions.—what: -0.3348510542848989191000
unbeliever!—not: -0.3348510542848989191000
misbelieving: -0.3348510542848989191000
doubles: -0.3348510542848989191000
retreated.—"jew: -0.3348510542848989191000
it."—the: -0.3348510542848989191000
slavery: -0.3348510542848989191000
delinquents.—damian: -0.3348510542848989191000
weak—the: -0.3348510542848

it.—and: -0.3348510542848989191000
fool—i: -0.3348510542848989191000
jeopard: -0.3348510542848989191000
liberating: -0.3348510542848989191000
disadvantages: -0.3348510542848989191000
facilitate: -0.3348510542848989191000
ulrica's: -0.3348510542848989191000
pasture—let: -0.3348510542848989191000
———-and: -0.3348510542848989191000
xxxi: -0.3348510542848989191000
deathbed: -0.3348510542848989191000
parricide's: -0.3348510542848989191000
blasphemer: -0.3348510542848989191000
avaunt—avaunt!—: -0.3348510542848989191000
thee—for: -0.3348510542848989191000
thou?—speak: -0.3348510542848989191000
there?—ulrica: -0.3348510542848989191000
ear—"who: -0.3348510542848989191000
road—ha: -0.3348510542848989191000
me—a: -0.3348510542848989191000
prisoners—all: -0.3348510542848989191000
enterprises—the: -0.3348510542848989191000
strumpet—the: -0.3348510542848989191000
bracy—ulrica: -0.3348510542848989191000
templar—the: -0.3348510542848989191000
alone?—no—the: -0.3348510542848989191000
walls—thinkest: -0

canterbury: -0.3348510542848989191000
caper: -0.3348510542848989191000
arabic: -0.3348510542848989191000
sheepfold: -0.3348510542848989191000
permitting: -0.3348510542848989191000
flails: -0.3348510542848989191000
scythes: -0.3348510542848989191000
boar-spears: -0.3348510542848989191000
converts: -0.3348510542848989191000
township: -0.3348510542848989191000
orderly: -0.3348510542848989191000
equipments: -0.3348510542848989191000
arrow-flights: -0.3348510542848989191000
head-quarters: -0.3348510542848989191000
bestirred: -0.3348510542848989191000
association: -0.3348510542848989191000
fugitives: -0.3348510542848989191000
tenor:—"sir: -0.3348510542848989191000
engelred: -0.3348510542848989191000
carousals—: -0.3348510542848989191000
moment—"sir: -0.3348510542848989191000
bosom.—i: -0.3348510542848989191000
green-wood: -0.3348510542848989191000
send?—malvoisin: -0.3348510542848989191000
scaling: -0.3348510542848989191000
knight—ay: -0.3348510542848989191000
fork-headed: -0.334851054284898

ope: -0.3348510542848989191000
harp-strings: -0.3348510542848989191000
finger-ends: -0.3348510542848989191000
grape: -0.3348510542848989191000
pitches: -0.3348510542848989191000
surname: -0.3348510542848989191000
half-a-dozen: -0.3348510542848989191000
broadswords: -0.3348510542848989191000
scimitar: -0.3348510542848989191000
jael: -0.3348510542848989191000
tenpenny: -0.3348510542848989191000
sufficing: -0.3348510542848989191000
brotherly: -0.3348510542848989191000
impertinently: -0.3348510542848989191000
housekeeping: -0.3348510542848989191000
glades—resolve: -0.3348510542848989191000
prayers,—i: -0.3348510542848989191000
anon,—as: -0.3348510542848989191000
liege's: -0.3348510542848989191000
disport: -0.3348510542848989191000
abiding: -0.3348510542848989191000
trencher-man: -0.3348510542848989191000
therewithal: -0.3348510542848989191000
thews: -0.3348510542848989191000
brimmer: -0.3348510542848989191000
ceremonious: -0.3348510542848989191000
hooped: -0.3348510542848989191000
urus: -0

wanton: -0.3348510542848989191000
bull-baiting: -0.3348510542848989191000
laurel: -0.3348510542848989191000
adjudge: -0.3348510542848989191000
fourthly: -0.3348510542848989191000
matchless: -0.3348510542848989191000
warhorse: -0.3348510542848989191000
thirdly: -0.3348510542848989191000
outrance: -0.3348510542848989191000
proposing: -0.3348510542848989191000
unoccupied: -0.3348510542848989191000
gaged: -0.3348510542848989191000
reining: -0.3348510542848989191000
wantonness: -0.3348510542848989191000
stoned: -0.3348510542848989191000
jewess!—we: -0.3348510542848989191000
black-eyed: -0.3348510542848989191000
halidom: -0.3348510542848989191000
palamon: -0.3348510542848989191000
decrease: -0.3348510542848989191000
resounds: -0.3348510542848989191000
challenger: -0.3348510542848989191000
me—here: -0.3348510542848989191000
pleasest: -0.3348510542848989191000
heraldry: -0.3348510542848989191000
desist: -0.3348510542848989191000
weatherbrain: -0.3348510542848989191000
steps,—an: -0.33485105428

himself,—but: -0.3348510542848989191000
humblest: -0.3348510542848989191000
franca: -0.3348510542848989191000
lingua: -0.3348510542848989191000
filz: -0.3348510542848989191000
mes: -0.3348510542848989191000
redeeming: -0.3348510542848989191000
peep: -0.3348510542848989191000
banquet,—if: -0.3348510542848989191000
peasantry: -0.3348510542848989191000
largesses: -0.3348510542848989191000
revenues: -0.3348510542848989191000
covereth: -0.3348510542848989191000
commiseration: -0.3348510542848989191000
betters: -0.3348510542848989191000
severest: -0.3348510542848989191000
fleetest: -0.3348510542848989191000
best-trained: -0.3348510542848989191000
bowers: -0.3348510542848989191000
ennui: -0.3348510542848989191000
dispelling: -0.3348510542848989191000
delinquencies: -0.3348510542848989191000
volatile: -0.3348510542848989191000
coursers: -0.3348510542848989191000
large-jointed: -0.3348510542848989191000
manes: -0.3348510542848989191000
jerrid: -0.3348510542848989191000
el: -0.334851054284898919

lighted: -0.5184433139475266294482
humility: -0.5184433139475266294482
deeper: -0.5205950643169462432880
flying: -0.5205950643169462432880
indignation: -0.5205950643169462432880
acted: -0.5205950643169462432880
argument: -0.5205950643169462432880
health: -0.5269833361528146742359
break: -0.5304476103466577718848
taste: -0.5339075171321707724559
ceased: -0.5358406370403803187230
condemned: -0.5370775557467585281657
tools: -0.5385721713398341492507
expectation: -0.5385721713398341492507
night's: -0.5385721713398341492507
examine: -0.5385721713398341492507
innocence: -0.5385721713398341492507
disdain: -0.5385721713398341492507
die: -0.5386206776245235916534
influence: -0.5400361635846863128663
strode: -0.5400684160278934609067
arranging: -0.5400684160278934609067
veins: -0.5400684160278934609067
feed: -0.5400684160278934609067
pillow: -0.5400684160278934609067
midnight: -0.5400684160278934609067
earnestly: -0.5400684160278934609067
bluff: -0.5400684160278934609067
cheerily: -0.54006841602

pure: -0.9509490660958901386834
content: -0.9543146786110850987583
waste: -0.9543146786110850987583
bag: -0.9554390024733189212824
folded: -0.9638131636268729707240
desire: -0.9640434744114835252660
rock: -0.9791181809205186103995
should: -0.9868362063319010557549
meal: -0.9877342490308150813050
knowledge: -0.9877349095372265352921
shout: -0.9927069709699148924997
need: -0.9981287598834900354206
pointed: -0.9983988267647072589739
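
The listing above is very long, and it contains many tokens in which two words are joined by a long dash (for example `imposter—a`). The sketch below shows one possible way to shorten such a listing. It does not use the `tdm` module: `top_distinctive` and `sample_rho` are hypothetical names introduced for illustration only, and the scores in `sample_rho` are invented values loosely modelled on the output above. The sketch assumes that the scores are stored in a dictionary that maps words to numbers, as `rho` does in the cell above.

```python
def top_distinctive(scores, n=10, skip_chars='\u2014'):
    """Return up to n (word, score) pairs with a negative score.

    Pairs are ordered with the scores closest to zero first, which
    mirrors the order of the listing above. Words that contain any
    character in skip_chars (by default the long dash, U+2014) are left out.
    """
    selected = [
        (word, value)
        for word, value in scores.items()
        if value < 0 and not any(ch in word for ch in skip_chars)
    ]
    # Sort on the absolute value, so that scores near zero come first
    selected.sort(key=lambda pair: abs(pair[1]))
    return selected[:n]


# Hypothetical scores, for illustration only
sample_rho = {
    'thou': -1e-25,
    'knight': -2e-24,
    'struck': -0.0031,
    'imposter\u2014a': -0.3349,
    'pointed': -0.9984,
    'adore': 0.3018,
}

for word, value in top_distinctive(sample_rho, n=3):
    print(f'{word}: {value:.4f}')
```

To apply the same idea to the actual data, `sample_rho` could be replaced by `rho`. Filtering on the long dash only hides the joined tokens in the listing. A more thorough solution would be to split such tokens before the frequencies are calculated.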


## Bibliography

* Dunning, Ted, 'Accurate Methods for the Statistics of Surprise and Coincidence', in *Computational Linguistics*, 19:1 (1993).
* Rayson, P. and Garside, R., 'Comparing corpora using frequency profiling', in *Proceedings of the workshop on Comparing Corpora, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000)* (2000).
* Mann, H. and Whitney, D., 'On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other', in *Ann. Math. Statist.*, 18:1 (1947). <https://doi.org/10.1214/aoms/1177730491>
* Kilgarriff, Adam, 'Comparing Corpora', in *International Journal of Corpus Linguistics*, 6:1 (2001). <https://doi.org/10.1075/ijcl.6.1.05kil>