# Generate Random Cognates

At times you may want to compare how well an algorithm performs against a completely random algorithm for cognate cluster annotation. Here's how you do it. We use the ```random_cognates``` function, which assigns words to random cognate sets, and check how well these random cognates correspond to our gold standard. To evaluate the accuracy of these random cognate judgments, we use the ```bcubes``` function, which computes B-Cubed precision, recall, and F-score. First, we import the relevant functions (we also import ```test_data``` to have access to LingPy's test data).

```python
from lingpy import *
from lingpy.tests.util import test_data
from lingpy.evaluate.acd import random_cognates, bcubes
```
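LingPy's ```random_cognates``` fills the wordlist with random cognate IDs for us. The following is a minimal stand-alone sketch of one way such a random baseline *can* work. It is only meant to illustrate the idea. It assumes that words are grouped only with other words of the same concept. The name `random_partition` and its arguments are hypothetical and not part of LingPy, and LingPy's own sampling scheme may differ.

```python
import random
from collections import defaultdict


def random_partition(concept_of, seed=None):
    """Assign every word to a random cognate set within its own concept.

    concept_of maps a word ID to its concept (hypothetical input format).
    Returns a dict mapping each word ID to an integer cognate-set ID.
    """
    rng = random.Random(seed)
    words_by_concept = defaultdict(list)
    for word, concept in concept_of.items():
        words_by_concept[concept].append(word)

    labels = {}
    next_id = 0
    for words in words_by_concept.values():
        # A concept with n words gets n candidate set IDs of its own, so the
        # outcome can range from one big set to n singletons.
        candidate_ids = list(range(next_id, next_id + len(words)))
        for word in words:
            labels[word] = rng.choice(candidate_ids)
        next_id += len(words)
    return labels


concept_of = {1: "hand", 2: "hand", 3: "hand", 4: "foot", 5: "foot"}
print(random_partition(concept_of, seed=1))
```

Each concept gets its own disjoint range of IDs, so two words for different concepts can never end up in the same random set.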

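Now we test on Kessler's (2001) dataset (```KSL.qlc```). It contains many unrelated languages and therefore a large number of unrelated words. In each of ten runs, ```random_cognates``` adds a new random cognate column (```T_0``` to ```T_9```) to the wordlist, and ```bcubes``` compares it with the gold-standard ```cogid``` column. Each output line shows the run number followed by precision, recall, and F-score.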

```python
wl = Wordlist(test_data("KSL.qlc"))
for i in range(10):
    random_cognates(wl, ref='T_'+str(i))
    p, r, f = bcubes(wl, 'cogid', 'T_'+str(i))
    print('{0:2}:\t{1:.2f}\t{2:.2f}\t{3:.2f}'.format(i+1, p, r, f))
```

```
 1:	0.38	0.91	0.53
 2:	0.37	0.92	0.53
 3:	0.37	0.91	0.53
 4:	0.36	0.93	0.52
 5:	0.39	0.91	0.54
 6:	0.38	0.91	0.53
 7:	0.38	0.91	0.53
 8:	0.38	0.91	0.53
 9:	0.37	0.91	0.53
10:	0.37	0.91	0.53
```
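The minimal pure-Python sketch below shows what the three numbers measure, using the standard B-Cubed definition. For every word, precision is the share of the words in its test set that also belong to its gold set. Recall is the share of the words in its gold set that also belong to its test set. Both are averaged over all words, and the F-score is their harmonic mean. The function name `bcubed` and its dict-based input are assumptions made for illustration. LingPy's ```bcubes``` works directly on a wordlist and may differ in details.

```python
from collections import defaultdict


def bcubed(gold, test):
    """Return (precision, recall, f_score) for two flat clusterings.

    gold and test map the same item IDs (e.g. word IDs) to cluster labels.
    """
    if not gold:
        raise ValueError("no items to evaluate")
    if gold.keys() != test.keys():
        raise ValueError("gold and test must label the same items")

    # Collect the members of every cluster in both partitions.
    gold_members = defaultdict(set)
    test_members = defaultdict(set)
    for item in gold:
        gold_members[gold[item]].add(item)
        test_members[test[item]].add(item)

    precision = recall = 0.0
    for item in gold:
        g = gold_members[gold[item]]
        t = test_members[test[item]]
        overlap = len(g & t)
        precision += overlap / len(t)  # how pure the item's test set is
        recall += overlap / len(g)     # how much of its gold set was found
    precision /= len(gold)
    recall /= len(gold)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Gold standard of four unrelated words, randomly lumped into two pairs.
print(bcubed({1: "a", 2: "b", 3: "c", 4: "d"}, {1: "x", 2: "x", 3: "y", 4: "y"}))
```

In the usage example every gold set is a singleton, so recall comes out as 1.0 while precision drops to 0.5.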


Now we repeat this analysis, but this time with a dataset that contains more cognates, namely our Chinese test set (```phybo.qlc```).

```python
wl = Wordlist(test_data("phybo.qlc"))
for i in range(10):
    random_cognates(wl, ref='T_'+str(i))
    p, r, f = bcubes(wl, 'cogid', 'T_'+str(i))
    print('{0:2}:\t{1:.2f}\t{2:.2f}\t{3:.2f}'.format(i+1, p, r, f))
```

```
 1:	0.36	0.27	0.31
 2:	0.44	0.26	0.33
 3:	0.37	0.26	0.31
 4:	0.42	0.25	0.31
 5:	0.40	0.25	0.31
 6:	0.40	0.23	0.30
 7:	0.39	0.27	0.32
 8:	0.38	0.29	0.33
 9:	0.42	0.26	0.32
10:	0.40	0.25	0.31
```
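Each run draws new random sets, so the scores fluctuate slightly from run to run (F-scores between 0.52 and 0.54 on the Kessler data and between 0.30 and 0.33 on the Chinese data). It can therefore help to summarize the baseline by its mean and spread. The sketch below does this with Python's `statistics` module for the F-scores copied from the two outputs above. These values are already rounded to two decimals, so the summary is only approximate. In practice one would rather collect the unrounded `f` values inside the loop.

```python
from statistics import mean, stdev

# F-scores of the ten random runs, copied from the printed output above.
f_scores = {
    "KSL.qlc": [0.53, 0.53, 0.53, 0.52, 0.54, 0.53, 0.53, 0.53, 0.53, 0.53],
    "phybo.qlc": [0.31, 0.33, 0.31, 0.31, 0.31, 0.30, 0.32, 0.33, 0.32, 0.31],
}

for name, scores in f_scores.items():
    # Mean and sample standard deviation across the ten runs.
    print(f"{name}: mean F = {mean(scores):.3f}, sd = {stdev(scores):.3f}")
```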


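What is interesting in this context is that the random baseline achieves rather high scores on the Kessler dataset (F-scores around 0.53), while its scores on the Chinese dataset are rather low (around 0.31). This is surely related to the number of concepts and the number of languages in the data, but also to the overall "cognate density". On the Kessler data, recall is at least 0.91 in every run while precision stays below 0.40. Because most words in that dataset are unrelated, most gold-standard cognate sets presumably contain only a single word. Under B-Cubed, a word in a singleton gold set always receives a recall of 1, no matter how the random sets are drawn.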
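The notion of "cognate density" is left informal here. One simple way to quantify it is a diversity score (C − M) / (W − M). This measure is our own assumption, not a LingPy function. W is the number of words, M the number of concepts, and C the number of cognate sets counted per concept. The score is 0 when all words for each concept are cognate and 1 when every word forms its own set, so a low value indicates a high cognate density. The hypothetical helper `cognate_diversity` computes it from (concept, cognate ID) pairs, which could be read from a wordlist's concept and ```cogid``` columns (not shown).

```python
from collections import defaultdict


def cognate_diversity(entries):
    """Return (C - M) / (W - M) for a list of (concept, cognate_id) pairs.

    W: number of words, M: number of concepts, C: number of cognate sets,
    counted separately for each concept. 0.0 means each concept forms a
    single cognate set; 1.0 means every word is in a set of its own.
    """
    sets_per_concept = defaultdict(set)
    for concept, cogid in entries:
        sets_per_concept[concept].add(cogid)

    words = len(entries)
    concepts = len(sets_per_concept)
    cognate_sets = sum(len(ids) for ids in sets_per_concept.values())
    if words == concepts:
        # With one word per concept the score is undefined (0 / 0).
        raise ValueError("need at least one concept with two or more words")
    return (cognate_sets - concepts) / (words - concepts)


entries = [("hand", 1), ("hand", 1), ("hand", 2), ("foot", 3), ("foot", 4)]
print(cognate_diversity(entries))  # (4 - 2) / (5 - 2)
```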