Jaccard similarity coefficient score
===

* This function computes the average of the Jaccard similarity coefficients (also called the Jaccard index) between pairs of label sets.

* It is computed as the size of the intersection divided by the size of the union:
$$
J(y, \hat{y}) = 
\frac{|y \cap \hat{y}|}
{|y \cup \hat{y}|}
$$
where:

    * $y$ is the set of observed (true) labels.
    
    * $\hat{y}$ is the set of predicted labels.
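
As a minimal sketch of the definition (intersection size over union size), the index can be computed directly with Python sets. The helper name `jaccard_index` and the example labels are hypothetical, and returning 0.0 for two empty sets is an assumption made here, not something the text specifies.

```python
def jaccard_index(true_labels, predicted_labels):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(true_labels), set(predicted_labels)
    union = a | b
    if not union:
        # Assumption: two empty sets are scored as 0.0; other conventions exist.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical label sets for a single sample
score = jaccard_index({"cat", "dog"}, {"dog", "bird"})
print(score)  # one shared label out of three distinct labels
```

The value always lies between 0 (no labels in common) and 1 (identical sets).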
   

* In its basic form, the measure applies to binary problems.

* It can be extended to multilabel or multiclass problems by averaging.

Binary base case
---

In [1]:
import numpy as np
from sklearn.metrics import jaccard_score

y_true = [0, 1, 1]
y_pred = [1, 1, 0]

#           | y_pred
#           | 1    0
#   --------|--------  |---------
#         1 | 1    1   | TP   FN             TP             1
#   y_true  |          |            J = ------------ = ----------- = 0.33
#         0 | 1    0   | FP   TN        TP + FN + FP    1 + 1 + 1
#
jaccard_score(
    # -------------------------------------------------------------------------
    # Takes the same parameters as the metrics covered previously
    # -------------------------------------------------------------------------
    y_true=y_true,
    y_pred=y_pred,
    labels=None,
    pos_label=1,
    average="binary",
    sample_weight=None,
    zero_division="warn",
)

0.3333333333333333
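
The confusion-matrix arithmetic in the comments above can be reproduced by hand. The following is a sketch using only NumPy; `binary_jaccard` is a hypothetical helper, and returning 0.0 when there are no positives at all is an assumption of this sketch.

```python
import numpy as np


def binary_jaccard(y_true, y_pred, pos_label=1):
    """Jaccard index of the positive class: TP / (TP + FP + FN)."""
    true_pos = np.asarray(y_true) == pos_label
    pred_pos = np.asarray(y_pred) == pos_label
    tp = np.sum(true_pos & pred_pos)   # predicted positive and truly positive
    fp = np.sum(~true_pos & pred_pos)  # predicted positive but truly negative
    fn = np.sum(true_pos & ~pred_pos)  # predicted negative but truly positive
    denom = tp + fp + fn
    # Assumption: score 0.0 when neither vector contains the positive label.
    return float(tp / denom) if denom > 0 else 0.0


score = binary_jaccard([0, 1, 1], [1, 1, 0])
print(score)
```

True negatives never enter the formula, which is why the index is unaffected by how many negatives both vectors agree on.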

Multilabel (average="samples")
---

In [2]:
y_true = np.array([[0, 1, 1], [1, 1, 0]])
y_pred = np.array([[1, 1, 1], [1, 0, 0]])

#                                | y_pred
# y_true[0] = [0, 1, 1]          | 1    0
# y_pred[0] = [1, 1, 1]  --------|--------  |---------
#                              1 | 2    0   | TP   FN             2
#                        y_true  |          |            J = ----------- = 2/3
#                              0 | 1    0   | FP   TN         2 + 1 + 0
#
#                                | y_pred
# y_true[1] = [1, 1, 0]          | 1    0
# y_pred[1] = [1, 0, 0]  --------|--------  |---------
#                              1 | 1    1   | TP   FN             1
#                        y_true  |          |            J = ----------- = 1/2
#                              0 | 0    1   | FP   TN         1 + 0 + 1
#
# (2/3 + 1/2) / 2 = (4/6 + 3/6) / 2 = 7/12 = 0.58333...
#
jaccard_score(y_true, y_pred, average="samples")

0.5833333333333333
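
The per-sample averaging worked out in the comments can be sketched with NumPy row operations. This assumes binary indicator matrices and that every sample has at least one true or predicted label (otherwise the division would be undefined).

```python
import numpy as np

y_true = np.array([[0, 1, 1], [1, 1, 0]])
y_pred = np.array([[1, 1, 1], [1, 0, 0]])

# One Jaccard index per row (sample): labels in common over labels in either set
intersection = np.logical_and(y_true, y_pred).sum(axis=1)
union = np.logical_or(y_true, y_pred).sum(axis=1)
per_sample = intersection / union

# average="samples" corresponds to the mean of the per-sample indices
samples_score = per_sample.mean()
print(per_sample, samples_score)
```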

Multilabel (average=None)
---

In [5]:
#
#                   A  B  C    A  B  C
y_true = np.array([[0, 1, 1], [1, 1, 0]])
y_pred = np.array([[1, 1, 1], [1, 0, 0]])
#
#             Patterns   Confusion      Jaccard
#                        matrix
# Label A:      0 1         1 0         1 / (1 + 1 + 0) = 1/2
#               1 1         1 0
#
# Label B:      1 1         1 1         1 / (1 + 0 + 1) = 1/2
#               1 0         0 0
#
# Label C:      1 0         1 0         1 / (1 + 0 + 0) = 1
#               1 0         0 1
#
jaccard_score(y_true, y_pred, average=None)

array([0.5, 0.5, 1. ])
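
The per-label table above can be reproduced by counting TP, FP and FN down each column. This is a sketch with NumPy only; it assumes every label appears at least once in `y_true` or `y_pred`, so no denominator is zero.

```python
import numpy as np

# Columns are the labels A, B, C; rows are samples
y_true = np.array([[0, 1, 1], [1, 1, 0]], dtype=bool)
y_pred = np.array([[1, 1, 1], [1, 0, 0]], dtype=bool)

tp = (y_true & y_pred).sum(axis=0)   # per-label true positives
fp = (~y_true & y_pred).sum(axis=0)  # per-label false positives
fn = (y_true & ~y_pred).sum(axis=0)  # per-label false negatives

per_label = tp / (tp + fp + fn)
print(per_label)
```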

Multilabel (average="macro")
---

In [3]:
#
#                   A  B  C    A  B  C
y_true = np.array([[0, 1, 1], [1, 1, 0]])
y_pred = np.array([[1, 1, 1], [1, 0, 0]])
#
#             Patterns   Confusion      Jaccard
#                        matrix
# Label A:      0 1         1 0         1 / (1 + 1 + 0) = 1/2
#               1 1         1 0
#
# Label B:      1 1         1 1         1 / (1 + 0 + 1) = 1/2
#               1 0         0 0
#
# Label C:      1 0         1 0         1 / (1 + 0 + 0) = 1
#               1 0         0 1
#
# macro -> (1/2 + 1/2 + 1) / 3 = 2/3
#
jaccard_score(y_true, y_pred, average="macro")

0.6666666666666666

Multilabel (average="micro")
---

In [4]:
#
#                   A  B  C    A  B  C
y_true = np.array([[0, 1, 1], [1, 1, 0]])
y_pred = np.array([[1, 1, 1], [1, 0, 0]])
#
#             Patterns   Confusion     Num           Den
#                        matrix
# Label A:      0 1         1 0         1        (1 + 1 + 0) = 2
#               1 1         1 0
#
# Label B:      1 1         1 1         1        (1 + 0 + 1) = 2
#               1 0         0 0
#
# Label C:      1 0         1 0         1        (1 + 0 + 0) = 1
#               1 0         0 1
#                                   ---------      ----------
#                        Jaccard = (1 + 1 + 1)  /  (2 + 2 + 1) = 3/5 = 0.6
#
jaccard_score(y_true, y_pred, average="micro")

0.6
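
To make the difference between the two averages concrete, the sketch below computes both from the same per-label counts: macro averages the per-label ratios, while micro pools numerators and denominators before dividing. It uses NumPy only and assumes no label has an empty union.

```python
import numpy as np

y_true = np.array([[0, 1, 1], [1, 1, 0]])
y_pred = np.array([[1, 1, 1], [1, 0, 0]])

# Per-label counts, one entry per column (A, B, C)
intersection = np.logical_and(y_true, y_pred).sum(axis=0)  # TP
union = np.logical_or(y_true, y_pred).sum(axis=0)          # TP + FP + FN

macro = np.mean(intersection / union)     # mean of the per-label ratios
micro = intersection.sum() / union.sum()  # ratio of the pooled counts
print(macro, micro)
```

Macro gives every label equal weight; micro gives every individual label decision equal weight, so frequent labels dominate the result.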

Multiclass problems (average=None)
---

In [6]:
#
# Classes: 0, 1, 2
#
y_pred = [0, 2, 1, 2]
y_true = [0, 1, 2, 2]
#
#                           class  confusion  jaccard
#           |  y_pred              matrix
#           | 0  1  2       --------------------------------------
# ----------|---------      0      1 0        1 / (1 + 0 + 0) = 1
#         0 | 1  0  0              0 3
# y_true  1 | 0  0  1
#         2 | 0  1  1       1      0 1        0 / (0 + 1 + 1) = 0
#                                  1 2
#
#                           2      1 1        1 / (1 + 1 + 1) = 1/3
#                                  1 1
#
jaccard_score(y_true, y_pred, average=None)

array([1.        , 0.        , 0.33333333])
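
The one-vs-rest counts in the comments can be read off a multiclass confusion matrix: TP is the diagonal, FP is the rest of each column, and FN is the rest of each row. The sketch below builds the matrix with NumPy; it assumes the classes are the integers 0 to `n_classes - 1` and that each class occurs in `y_true` or `y_pred`.

```python
import numpy as np

y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 2, 1, 2])
n_classes = 3  # assumption: labels are exactly 0, 1, 2

# Confusion matrix: rows are true classes, columns are predicted classes
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp  # predicted as the class but belonging to another
fn = cm.sum(axis=1) - tp  # belonging to the class but predicted as another
per_class = tp / (tp + fp + fn)
print(cm)
print(per_class)
```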

Multiclass problems (average="macro")
---

In [7]:
#
#                           class  confusion  jaccard
#           |  y_pred              matrix
#           | 0  1  2       --------------------------------------
# ----------|---------      0      1 0        1 / (1 + 0 + 0) = 1
#         0 | 1  0  0              0 3
# y_true  1 | 0  0  1
#         2 | 0  1  1       1      0 1        0 / (0 + 1 + 1) = 0
#                                  1 2
#
#                           2      1 1        1 / (1 + 1 + 1) = 1/3
#                                  1 1
#
# jaccard = (1 + 0 + 1/3) / 3 = 4/9 = 0.44...
#
jaccard_score(y_true, y_pred, average="macro")

0.4444444444444444

Multiclass problems (average="micro")
---

In [8]:
#
#                           class  confusion  num  den
#           |  y_pred              matrix
#           | 0  1  2       --------------------------------------
# ----------|---------      0      1 0        1    (1 + 0 + 0) = 1
#         0 | 1  0  0              0 3
# y_true  1 | 0  0  1
#         2 | 0  1  1       1      0 1        0    (0 + 1 + 1) = 2
#                                  1 2
#
#                           2      1 1        1    (1 + 1 + 1) = 3
#                                  1 1
#                                        --------------------------
#                                         (1 + 0 + 1) / (1 + 2 + 3) = 2/6
#
jaccard_score(y_true, y_pred, average="micro")

0.3333333333333333
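
The pooled computation above can be sketched with NumPy. For single-label multiclass data, every misclassified sample counts once as a false positive and once as a false negative, so the pooled ratio can also be written as accuracy / (2 - accuracy); the sketch checks both routes on the same data.

```python
import numpy as np

y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 2, 1, 2])

classes = np.unique(np.concatenate([y_true, y_pred]))
tp = np.array([np.sum((y_true == c) & (y_pred == c)) for c in classes])
fp = np.array([np.sum((y_true != c) & (y_pred == c)) for c in classes])
fn = np.array([np.sum((y_true == c) & (y_pred != c)) for c in classes])

# Pool the counts over all classes before dividing
micro = tp.sum() / (tp.sum() + fp.sum() + fn.sum())

# Same value obtained from plain accuracy
accuracy = np.mean(y_true == y_pred)
micro_from_accuracy = accuracy / (2 - accuracy)
print(micro, micro_from_accuracy)
```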