<p  style="z-index: 101;background: #fde073;text-align: center;line-height: 2.5;overflow: hidden;font-size:22px;">Please <a href="https://www.pycm.io/doc/#Cite" target="_blank">cite us</a> if you use the software</p>

# Distance/Similarity

PyCM's `distance` method provides a wide range of string distance/similarity metrics for evaluating a confusion matrix by measuring its distance to a perfect confusion matrix. Distance/similarity metrics measure the distance between two vectors of numbers; a small distance between two objects indicates similarity. In PyCM's `distance` method, a measure is selected by passing a member of `DistanceType`. The measures' names follow the naming style suggested in [[1]](#ref1).

In [1]:
from pycm import ConfusionMatrix, DistanceType

In [2]:
cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}})

$$TP \rightarrow \text{True Positive}$$
$$TN \rightarrow \text{True Negative}$$
$$FP \rightarrow \text{False Positive}$$
$$FN \rightarrow \text{False Negative}$$
$$POP \rightarrow \text{Population}$$
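
The per-class values of these quantities can be derived from the example matrix with a short sketch. It assumes that the outer keys of the matrix are actual classes and the inner keys are predicted classes, and that each class is scored one-vs-rest; `one_vs_rest_counts` is a hypothetical helper, not part of PyCM.

```python
# Example matrix from the cell above.
# Assumption: outer keys are actual classes, inner keys are predicted classes.
matrix = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}


def one_vs_rest_counts(matrix):
    """Return ({class: (TP, FP, FN, TN)}, POP) for a nested-dict matrix."""
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    counts = {}
    for c in classes:
        tp = matrix[c][c]
        fn = sum(matrix[c].values()) - tp             # actual c, predicted as another class
        fp = sum(matrix[r][c] for r in classes) - tp  # predicted c, actually another class
        tn = pop - tp - fn - fp
        counts[c] = (tp, fp, fn, tn)
    return counts, pop


counts, pop = one_vs_rest_counts(matrix)
print(counts)  # {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}
print(pop)     # 12
```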

## AMPLE

AMPLE similarity [[2]](#ref2) [[3]](#ref3).

$$sim_{AMPLE}=|\frac{TP}{TP+FP}-\frac{FN}{FN+TN}|$$

In [3]:
cm.distance(metric=DistanceType.AMPLE)

{0: 0.6, 1: 0.3, 2: 0.17142857142857143}
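
The output above can be checked by hand. The sketch below applies the AMPLE formula to one-vs-rest counts derived from the example matrix (assuming rows are actual classes); `ample` is a hypothetical helper, not a PyCM function.

```python
# (TP, FP, FN, TN) per class, derived by hand from the example matrix
# under the assumption that rows are actual classes.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def ample(tp, fp, fn, tn):
    """|TP/(TP+FP) - FN/(FN+TN)|, as in the formula above."""
    return abs(tp / (tp + fp) - fn / (fn + tn))


ample_values = {c: ample(*v) for c, v in counts.items()}
print(ample_values)
```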

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Anderberg's D

Anderberg's D [[4]](#ref4).

$$sim_{Anderberg} =
\frac{(max(TP,FP)+max(FN,TN)+max(TP,FN)+max(FP,TN))-
(max(TP+FP,FP+TN)+max(TP+FP,FN+TN))}{2\times POP}$$

In [4]:
cm.distance(metric=DistanceType.Anderberg)

{0: 0.16666666666666666, 1: 0.0, 2: 0.041666666666666664}
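
A minimal sketch of the formula as written above, evaluated on one-vs-rest counts derived by hand from the example matrix (assuming rows are actual classes). `anderberg_d` is a hypothetical name used only for illustration.

```python
# (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def anderberg_d(tp, fp, fn, tn):
    """Transcription of the formula printed above; POP = TP + FP + FN + TN."""
    pop = tp + fp + fn + tn
    part1 = max(tp, fp) + max(fn, tn) + max(tp, fn) + max(fp, tn)
    part2 = max(tp + fp, fp + tn) + max(tp + fp, fn + tn)
    return (part1 - part2) / (2 * pop)


anderberg_values = {c: anderberg_d(*v) for c, v in counts.items()}
print(anderberg_values)
```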

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Andres & Marzo's Delta

Andres & Marzo's Delta correlation [[5]](#ref5).

$$corr_{AndresMarzo_\Delta} = \Delta =
\frac{TP+TN-2 \times \sqrt{FP \times FN}}{POP}$$

In [5]:
cm.distance(metric=DistanceType.AndresMarzoDelta)

{0: 0.8333333333333334, 1: 0.5142977396044842, 2: 0.17508504286947035}
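
As a hand check of the output above, the sketch below evaluates the Delta formula on one-vs-rest counts derived from the example matrix (assuming rows are actual classes). `andres_marzo_delta` is a hypothetical helper, not a PyCM function.

```python
import math

# (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def andres_marzo_delta(tp, fp, fn, tn):
    """(TP + TN - 2*sqrt(FP*FN)) / POP, as in the formula above."""
    pop = tp + fp + fn + tn
    return (tp + tn - 2 * math.sqrt(fp * fn)) / pop


delta_values = {c: andres_marzo_delta(*v) for c, v in counts.items()}
print(delta_values)
```

For class 0, $FN=0$ removes the square-root term, so the value reduces to $(TP+TN)/POP = 10/12$.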

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baroni-Urbani & Buser I

Baroni-Urbani & Buser I similarity [[6]](#ref6).

$$sim_{BaroniUrbaniBuserI} =
\frac{\sqrt{TP\times TN}+TP}{\sqrt{TP\times TN}+TP+FP+FN}$$

In [6]:
cm.distance(metric=DistanceType.BaroniUrbaniBuserI)

{0: 0.79128784747792, 1: 0.5606601717798213, 2: 0.5638559245324765}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baroni-Urbani & Buser II

Baroni-Urbani & Buser II correlation [[6]](#ref6).

$$corr_{BaroniUrbaniBuserII} =
\frac{\sqrt{TP \times TN}+TP-FP-FN}{\sqrt{TP \times TN}+TP+FP+FN}$$

In [7]:
cm.distance(metric=DistanceType.BaroniUrbaniBuserII)

{0: 0.58257569495584, 1: 0.12132034355964261, 2: 0.1277118490649528}
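
The two Baroni-Urbani & Buser measures share a denominator, and their formulas imply $corr_{BaroniUrbaniBuserII} = 2 \times sim_{BaroniUrbaniBuserI} - 1$. The sketch below checks this on one-vs-rest counts derived by hand from the example matrix (assuming rows are actual classes); the function names are hypothetical.

```python
import math

# (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def baroni_urbani_buser_i(tp, fp, fn, tn):
    root = math.sqrt(tp * tn)
    return (root + tp) / (root + tp + fp + fn)


def baroni_urbani_buser_ii(tp, fp, fn, tn):
    root = math.sqrt(tp * tn)
    return (root + tp - fp - fn) / (root + tp + fp + fn)


bub_i = {c: baroni_urbani_buser_i(*v) for c, v in counts.items()}
bub_ii = {c: baroni_urbani_buser_ii(*v) for c, v in counts.items()}

# The correlation form is a rescaling of the similarity form to [-1, 1].
for c in counts:
    assert math.isclose(bub_ii[c], 2 * bub_i[c] - 1)
print(bub_i)
print(bub_ii)
```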

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Batagelj & Bren

Batagelj & Bren distance [[7]](#ref7).

$$dist_{BatageljBren} =
\frac{FP \times FN}{TP \times TN}$$

In [8]:
cm.distance(metric=DistanceType.BatageljBren)

{0: 0.0, 1: 0.25, 2: 0.5}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu I

Baulieu I distance [[8]](#ref8).

$$dist_{BaulieuI} =
\frac{(TP+FP) \times (TP+FN)-TP^2}{(TP+FP) \times (TP+FN)}$$

In [9]:
cm.distance(metric=DistanceType.BaulieuI)

{0: 0.4, 1: 0.8333333333333334, 2: 0.7}
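
As a worked check, the Baulieu I formula can be rearranged into one minus the product of precision and recall (a rearrangement for illustration, not notation used elsewhere in this document). For class 0 of the example matrix, assuming the one-vs-rest counts $TP=3$, $FP=2$, $FN=0$:

$$dist_{BaulieuI} = 1-\frac{TP}{TP+FP}\times\frac{TP}{TP+FN} = 1-\frac{3}{5}\times\frac{3}{3} = 0.4$$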

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu II

Baulieu II similarity [[8]](#ref8).

$$sim_{BaulieuII} =
\frac{TP^2 \times TN^2}{(TP+FP) \times (TP+FN) \times (FP+TN) \times (FN+TN)}$$

In [10]:
cm.distance(metric=DistanceType.BaulieuII)

{0: 0.4666666666666667, 1: 0.11851851851851852, 2: 0.11428571428571428}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu III

Baulieu III distance [[8]](#ref8).

$$dist_{BaulieuIII} =
\frac{POP^2 - 4 \times (TP \times TN-FP \times FN)}{2 \times POP^2}$$

In [11]:
cm.distance(metric=DistanceType.BaulieuIII)

{0: 0.20833333333333334, 1: 0.4166666666666667, 2: 0.4166666666666667}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu IV

Baulieu IV distance [[9]](#ref9).

$$dist_{BaulieuIV} = \frac{FP+FN-(TP+\frac{1}{2})\times(TN+\frac{1}{2})\times TN  \times k}{POP}$$

In [12]:
cm.distance(metric=DistanceType.BaulieuIV)

{0: -41.45702383161246, 1: -22.855395541901885, 2: -13.85431293274332}

* The default value of k is Euler's number $e$
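
The effect of $k$ can be seen in a minimal sketch of the formula. `baulieu_iv` is a hypothetical helper, not PyCM's implementation; its `k` argument defaults to `math.e` to mirror the default stated above, and the counts are derived by hand from the example matrix (assuming rows are actual classes).

```python
import math

# (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def baulieu_iv(tp, fp, fn, tn, k=math.e):
    """(FP + FN - (TP + 1/2)*(TN + 1/2)*TN*k) / POP, as in the formula above."""
    pop = tp + fp + fn + tn
    return (fp + fn - (tp + 0.5) * (tn + 0.5) * tn * k) / pop


baulieu_iv_values = {c: baulieu_iv(*v) for c, v in counts.items()}
print(baulieu_iv_values)

# A smaller k shrinks the subtracted product term, moving the value toward (FP+FN)/POP.
print(baulieu_iv(*counts[0], k=1.0))
```

Because the product term grows with both $TP$ and $TN$, the result is strongly negative whenever the classifier gets most cases right, which explains the magnitudes in the output above.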

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu V

Baulieu V distance [[9]](#ref9).

$$dist_{BaulieuV} = \frac{FP+FN+1}{TP+FP+FN+1}$$

In [13]:
cm.distance(metric=DistanceType.BaulieuV)

{0: 0.5, 1: 0.8, 2: 0.6666666666666666}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu VI

Baulieu VI distance [[9]](#ref9).

$$dist_{BaulieuVI} = \frac{FP+FN}{TP+FP+FN+1}$$

In [14]:
cm.distance(metric=DistanceType.BaulieuVI)

{0: 0.3333333333333333, 1: 0.6, 2: 0.5555555555555556}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu VII

Baulieu VII distance [[9]](#ref9).

$$dist_{BaulieuVII} = \frac{FP+FN}{POP + TP \times (TP-4)^2}$$

In [15]:
cm.distance(metric=DistanceType.BaulieuVII)

{0: 0.13333333333333333, 1: 0.14285714285714285, 2: 0.3333333333333333}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu VIII

Baulieu VIII distance [[9]](#ref9).

$$dist_{BaulieuVIII} = \frac{(FP-FN)^2}{POP^2}$$

In [16]:
cm.distance(metric=DistanceType.BaulieuVIII)

{0: 0.027777777777777776, 1: 0.006944444444444444, 2: 0.006944444444444444}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu IX

Baulieu IX distance [[9]](#ref9).

$$dist_{BaulieuIX} = \frac{FP+2 \times FN}{TP+FP+2 \times FN+TN}$$

In [17]:
cm.distance(metric=DistanceType.BaulieuIX)

{0: 0.16666666666666666, 1: 0.35714285714285715, 2: 0.5333333333333333}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu X

Baulieu X distance [[9]](#ref9).

$$dist_{BaulieuX} = \frac{FP+FN+max(FP,FN)}{POP+max(FP,FN)}$$

In [18]:
cm.distance(metric=DistanceType.BaulieuX)

{0: 0.2857142857142857, 1: 0.35714285714285715, 2: 0.5333333333333333}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu XI

Baulieu XI distance [[9]](#ref9).

$$dist_{BaulieuXI} = \frac{FP+FN}{FP+FN+TN}$$

In [19]:
cm.distance(metric=DistanceType.BaulieuXI)

{0: 0.2222222222222222, 1: 0.2727272727272727, 2: 0.5555555555555556}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu XII

Baulieu XII distance [[9]](#ref9).

$$dist_{BaulieuXII} = \frac{FP+FN}{TP+FP+FN-1}$$

In [20]:
cm.distance(metric=DistanceType.BaulieuXII)

{0: 0.5, 1: 1.0, 2: 0.7142857142857143}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu XIII

Baulieu XIII distance [[9]](#ref9).

$$dist_{BaulieuXIII} = \frac{FP+FN}{TP+FP+FN+TP \times (TP-4)^2}$$

In [21]:
cm.distance(metric=DistanceType.BaulieuXIII)

{0: 0.25, 1: 0.23076923076923078, 2: 0.45454545454545453}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu XIV

Baulieu XIV distance [[9]](#ref9).

$$dist_{BaulieuXIV} = \frac{FP+2 \times FN}{TP+FP+2 \times FN}$$

In [22]:
cm.distance(metric=DistanceType.BaulieuXIV)

{0: 0.4, 1: 0.8333333333333334, 2: 0.7272727272727273}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Baulieu XV

Baulieu XV distance [[9]](#ref9).

$$dist_{BaulieuXV} = \frac{FP+FN+max(FP, FN)}{TP+FP+FN+max(FP, FN)}$$

In [23]:
cm.distance(metric=DistanceType.BaulieuXV)

{0: 0.5714285714285714, 1: 0.8333333333333334, 2: 0.7272727272727273}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Benini I

Benini I correlation [[10]](#ref10).

$$corr_{BeniniI} = \frac{TP \times TN-FP \times FN}{(TP+FN)\times(FN+TN)}$$

In [24]:
cm.distance(metric=DistanceType.BeniniI)

{0: 1.0, 1: 0.2, 2: 0.14285714285714285}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Benini II

Benini II correlation [[10]](#ref10).

$$corr_{BeniniII} = \frac{TP \times TN-FP \times FN}{min((TP+FN)\times(FN+TN), (TP+FP)\times(FP+TN))}$$

In [25]:
cm.distance(metric=DistanceType.BeniniII)

{0: 1.0, 1: 0.3333333333333333, 2: 0.2}
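
Benini I and Benini II share a numerator and differ only in the denominator. The sketch below evaluates both on one-vs-rest counts derived by hand from the example matrix (assuming rows are actual classes); the function names are hypothetical.

```python
# (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def benini_i(tp, fp, fn, tn):
    return (tp * tn - fp * fn) / ((tp + fn) * (fn + tn))


def benini_ii(tp, fp, fn, tn):
    denominator = min((tp + fn) * (fn + tn), (tp + fp) * (fp + tn))
    return (tp * tn - fp * fn) / denominator


benini_i_values = {c: benini_i(*v) for c, v in counts.items()}
benini_ii_values = {c: benini_ii(*v) for c, v in counts.items()}
print(benini_i_values)
print(benini_ii_values)
```

Since Benini II divides by the smaller of two candidates, the first of which is the Benini I denominator, its magnitude is never below that of Benini I.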

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Canberra

Canberra distance [[11]](#ref11) [[12]](#ref12).

$$dist_{Canberra} =
\frac{FP+FN}{(TP+FP)+(TP+FN)}$$

In [26]:
cm.distance(metric=DistanceType.Canberra)

{0: 0.25, 1: 0.6, 2: 0.45454545454545453}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Clement

Clement similarity [[13]](#ref13).

$$sim_{Clement} =
\frac{TP}{TP+FP}\times\Big(1 - \frac{TP+FP}{POP}\Big) +
\frac{TN}{FN+TN}\times\Big(1 - \frac{FN+TN}{POP}\Big)$$

In [27]:
cm.distance(metric=DistanceType.Clement)

{0: 0.7666666666666666, 1: 0.55, 2: 0.588095238095238}
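
The two terms of the Clement formula can be read as the precision on the predicted-positive side and its counterpart on the predicted-negative side, each weighted by one minus the share of the population it covers. A minimal sketch, using one-vs-rest counts derived by hand from the example matrix (assuming rows are actual classes) and a hypothetical `clement` helper:

```python
# (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def clement(tp, fp, fn, tn):
    pop = tp + fp + fn + tn
    positive_term = (tp / (tp + fp)) * (1 - (tp + fp) / pop)
    negative_term = (tn / (fn + tn)) * (1 - (fn + tn) / pop)
    return positive_term + negative_term


clement_values = {c: clement(*v) for c, v in counts.items()}
print(clement_values)
```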

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Consonni & Todeschini I

Consonni & Todeschini I similarity [[14]](#ref14).

$$sim_{ConsonniTodeschiniI} =
\frac{log(1+TP+TN)}{log(1+POP)}$$

In [28]:
cm.distance(metric=DistanceType.ConsonniTodeschiniI)

{0: 0.9348704159880586, 1: 0.8977117175026231, 2: 0.8107144632819592}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Consonni & Todeschini II

Consonni & Todeschini II similarity [[14]](#ref14).

$$sim_{ConsonniTodeschiniII} =
\frac{log(1+POP)-log(1+FP+FN)}{log(1+POP)}$$

In [29]:
cm.distance(metric=DistanceType.ConsonniTodeschiniII)

{0: 0.5716826589686053, 1: 0.4595236911453605, 2: 0.3014445045412856}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Consonni & Todeschini III

Consonni & Todeschini III similarity [[14]](#ref14).

$$sim_{ConsonniTodeschiniIII} =
\frac{log(1+TP)}{log(1+POP)}$$

In [30]:
cm.distance(metric=DistanceType.ConsonniTodeschiniIII)

{0: 0.5404763088546395, 1: 0.27023815442731974, 2: 0.5404763088546395}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Consonni & Todeschini IV

Consonni & Todeschini IV similarity [[14]](#ref14).

$$sim_{ConsonniTodeschiniIV} =
\frac{log(1+TP)}{log(1+TP+FP+FN)}$$

In [31]:
cm.distance(metric=DistanceType.ConsonniTodeschiniIV)

{0: 0.7737056144690831, 1: 0.43067655807339306, 2: 0.6309297535714574}

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## Consonni & Todeschini V

Consonni & Todeschini V correlation [[14]](#ref14).

$$corr_{ConsonniTodeschiniV} =
\frac{log(1+TP \times TN)-log(1+FP \times FN)}{log(1+\frac{POP^2}{4})}$$

In [32]:
cm.distance(metric=DistanceType.ConsonniTodeschiniV)

{0: 0.8560267854703983, 1: 0.30424737289682985, 2: 0.17143541431350617}
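
The five Consonni & Todeschini measures can be compared side by side. The sketch below evaluates them for class 0 of the example matrix, using the hand-derived one-vs-rest counts $TP=3$, $FP=2$, $FN=0$, $TN=7$ (assuming rows are actual classes). `consonni_todeschini` is a hypothetical helper, and natural logarithms are used; since every measure is a ratio of logarithms, the base does not affect the result.

```python
from math import log

# (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def consonni_todeschini(tp, fp, fn, tn):
    """Return measures I to V, keyed by Roman numeral."""
    pop = tp + fp + fn + tn
    return {
        "I": log(1 + tp + tn) / log(1 + pop),
        "II": (log(1 + pop) - log(1 + fp + fn)) / log(1 + pop),
        "III": log(1 + tp) / log(1 + pop),
        "IV": log(1 + tp) / log(1 + tp + fp + fn),
        "V": (log(1 + tp * tn) - log(1 + fp * fn)) / log(1 + pop ** 2 / 4),
    }


ct_class0 = consonni_todeschini(*counts[0])
for name, value in ct_class0.items():
    print(name, value)
```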

<ul>
    <li><span style="color:red;">Notice </span> :  new in <span style="color:red;">version 3.8</span> </li>
</ul>

## References

<blockquote id="ref1">1- C. C. Little, "Abydos Documentation," 2018.</blockquote>

<blockquote id="ref2">2- V. Dallmeier, C. Lindig, and A. Zeller, "Lightweight defect localization for Java," in <i>European conference on object-oriented programming</i>, 2005: Springer, pp. 528-550.</blockquote>

<blockquote id="ref3">3- R. Abreu, P. Zoeteweij, and A. J. Van Gemund, "An evaluation of similarity coefficients for software fault localization," in <i>2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06)</i>, 2006: IEEE, pp. 39-46.</blockquote>

<blockquote id="ref4">4- M. R. Anderberg, <i>Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks</i>. Academic press, 2014.</blockquote>

<blockquote id="ref5">5- A. M. Andrés and P. F. Marzo, "Delta: A new measure of agreement between two raters," <i>British journal of mathematical and statistical psychology</i>, vol. 57, no. 1, pp. 1-19, 2004.</blockquote>

<blockquote id="ref6">6- C. Baroni-Urbani and M. W. Buser, "Similarity of binary data," <i>Systematic Zoology</i>, vol. 25, no. 3, pp. 251-259, 1976.</blockquote>

<blockquote id="ref7">7- V. Batagelj and M. Bren, "Comparing resemblance measures," <i>Journal of classification</i>, vol. 12, no. 1, pp. 73-90, 1995.</blockquote>

<blockquote id="ref8">8- F. B. Baulieu, "A classification of presence/absence based dissimilarity coefficients," <i>Journal of Classification</i>, vol. 6, no. 1, pp. 233-246, 1989.</blockquote>

<blockquote id="ref9">9- F. B. Baulieu, "Two variant axiom systems for presence/absence based dissimilarity coefficients," <i>Journal of Classification</i>, vol. 14, no. 1, pp. 159-170, 1997.</blockquote>

<blockquote id="ref10">10- R. Benini, <i>Principii di demografia</i>. Barbera, 1901.</blockquote>

<blockquote id="ref11">11- G. N. Lance and W. T. Williams, "Computer programs for hierarchical polythetic classification (“similarity analyses”)," <i>The Computer Journal</i>, vol. 9, no. 1, pp. 60-64, 1966.</blockquote>

<blockquote id="ref12">12- G. N. Lance and W. T. Williams, "Mixed-Data Classificatory Programs I - Agglomerative Systems," <i>Australian Computer Journal</i>, vol. 1, no. 1, pp. 15-20, 1967.</blockquote>

<blockquote id="ref13">13- P. W. Clement, "A formula for computing inter-observer agreement," <i>Psychological Reports</i>, vol. 39, no. 1, pp. 257-258, 1976.</blockquote>

<blockquote id="ref14">14- V. Consonni and R. Todeschini, "New similarity coefficients for binary data," <i>Match-Communications in Mathematical and Computer Chemistry</i>, vol. 68, no. 2, p. 581, 2012.</blockquote>