# In this notebook we derive a pF1 score that is differentiable for any input
The original implementation was not differentiable when both precision and recall were 0, which is the case for an untrained model.

In [None]:
import sympy as sp

In [None]:
c_precision, c_recall, beta = sp.symbols("c_precision, c_recall, beta")
beta_squared = beta ** 2
# symbolic pF1 function:
sym_pF1 = (
    (1 + beta_squared)
    * (c_precision * c_recall)
    / (beta_squared * c_precision + c_recall)
)
sym_pF1

In [None]:
sym_pF1.limit(c_precision, 0)

In [None]:
sym_pF1.limit(c_recall, 0)

We have confirmed that the pF1 score goes to 0 when either precision or recall goes to 0, so in particular it approaches 0 at (recall == 0, precision == 0).

In [None]:
sp.plotting.plot3d(
    sym_pF1.subs(beta, 1),
    (c_precision, 1e-5, 1e-3),
    (c_recall, 1e-5, 1e-3),
    title="pF1 score near (0, 0) w.r.t. precision and recall: visual confirmation that pF1 goes to 0.",
);

In [None]:
sym_pF1.diff(c_recall).limit(c_recall, 0).limit(c_precision, 0)

In [None]:
sym_pF1.diff(c_precision).limit(c_recall, 0).limit(c_precision, 0)

Taking recall to 0 first and then precision to 0, the partial derivative of pF1 w.r.t. precision is 0, and w.r.t. recall it is `(beta**2 + 1) / beta**2`.  
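
For reference, the closed forms behind these two limits can be worked out by hand with the quotient rule. This is a sketch that writes $p$ for `c_precision`, $r$ for `c_recall` and $F_\beta$ for `sym_pF1`; these short names are introduced here for readability only.

$$
F_\beta(p, r) = \frac{(1 + \beta^2)\, p\, r}{\beta^2 p + r}
$$

$$
\frac{\partial F_\beta}{\partial r} = \frac{(1 + \beta^2)\, \beta^2 p^2}{(\beta^2 p + r)^2},
\qquad
\frac{\partial F_\beta}{\partial p} = \frac{(1 + \beta^2)\, r^2}{(\beta^2 p + r)^2}
$$

$$
\lim_{p \to 0} \lim_{r \to 0} \frac{\partial F_\beta}{\partial r} = \frac{1 + \beta^2}{\beta^2},
\qquad
\lim_{p \to 0} \lim_{r \to 0} \frac{\partial F_\beta}{\partial p} = 0
$$

The limits are iterated in the same order as in the cells above (recall first, then precision). The result depends on that order: taking $p \to 0$ first makes $\partial F_\beta / \partial r$ vanish for any fixed $r > 0$.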

If we want to use pF1 directly as a training objective for a DL model, the implementation must be differentiable for every input it can receive.
When both precision and recall are 0, the function is undefined because the denominator is zero.  
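
The undefined point can be reproduced with a few lines of plain Python. This is only a sketch: `pf1_naive` and `is_defined_at` are hypothetical helper names, not part of the original implementation, and the formula is the same one used for `sym_pF1`.

```python
def pf1_naive(c_precision: float, c_recall: float, beta: float = 1.0) -> float:
    """pF-beta written directly from the formula, with no special cases."""
    beta_squared = beta * beta
    return (
        (1 + beta_squared)
        * (c_precision * c_recall)
        / (beta_squared * c_precision + c_recall)
    )


def is_defined_at(c_precision: float, c_recall: float) -> bool:
    """Return False if evaluating the naive formula divides by zero."""
    try:
        pf1_naive(c_precision, c_recall)
    except ZeroDivisionError:
        return False
    return True


origin_defined = is_defined_at(0.0, 0.0)    # the denominator is exactly 0 here
nearby_defined = is_defined_at(1e-6, 1e-6)  # any point off the origin is fine
```

With plain Python floats the division raises `ZeroDivisionError`. With NumPy scalars the same 0/0 yields `nan` together with a `RuntimeWarning` instead of raising.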

We can patch the implementation of pF1 so that it passes a proper gradient at that point: we need a function that has the same value and the same partial derivatives as the limits of pF1 at (precision == 0, recall == 0).

In [None]:
f_of_the_same_gradient_at_0 = (
    (1 + beta_squared) / beta_squared * c_recall
)

In [None]:
f_of_the_same_gradient_at_0.diff(c_precision).limit(c_recall, 0).limit(c_precision, 0)

In [None]:
f_of_the_same_gradient_at_0.diff(c_recall).limit(c_recall, 0).limit(c_precision, 0)

In [None]:
f_of_the_same_gradient_at_0.subs(c_recall, 0).subs(c_precision, 0)  # at (0,0) the value is zero, same as pF1

In [None]:
# the function we have found:
f_of_the_same_gradient_at_0

Desired implementation of the pF1 score:
```python
import numpy as np
from numpy.typing import NDArray


def pfbeta(
    labels: NDArray[np.int_], preds: NDArray[np.float64], beta: float = 1.0
) -> float:
    preds = preds.clip(0, 1)
    y_true_count = labels.sum()
    ctp = preds[labels == 1].sum()
    cfp = preds[labels == 0].sum()
    beta_squared = beta * beta
    c_precision = ctp / (ctp + cfp)
    c_recall = ctp / y_true_count
    if (c_precision + c_recall) == 0:
        # By definition the value here is 0.0. Returning it as a multiple of
        # c_recall keeps the same partial derivatives w.r.t. precision and
        # recall as the true pF1 has in the limit where recall and then
        # precision go to 0.
        zero: float = (1 + beta_squared) / beta_squared * c_recall
        return zero
    result: float = (
        (1 + beta_squared)
        * (c_precision * c_recall)
        / (beta_squared * c_precision + c_recall)
    )
    return result
```

When pF1 is implemented as above in a framework that tracks gradients, the zero branch passes the intended gradient. For an untrained model (`pF1 == 0`) trained with pF1 as the loss, the first updates will push recall up, that is, increase the true positives. Note that pF1 is only calculated properly when it is computed over the whole dataset.
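
To illustrate the point about computing pF1 over the whole dataset, here is a minimal pure-Python sketch. `pfbeta_lists` is a hypothetical helper that mirrors `pfbeta` on plain lists, and the two batches are made-up numbers chosen only for the demonstration.

```python
def pfbeta_lists(labels: list[int], preds: list[float], beta: float = 1.0) -> float:
    """Pure-Python mirror of pfbeta for plain lists (illustrative only)."""
    preds = [min(max(p, 0.0), 1.0) for p in preds]  # same as clip(0, 1)
    y_true_count = sum(labels)
    ctp = sum(p for y, p in zip(labels, preds) if y == 1)
    cfp = sum(p for y, p in zip(labels, preds) if y == 0)
    beta_squared = beta * beta
    c_precision = ctp / (ctp + cfp)
    c_recall = ctp / y_true_count
    if (c_precision + c_recall) == 0:
        return (1 + beta_squared) / beta_squared * c_recall
    return (
        (1 + beta_squared)
        * (c_precision * c_recall)
        / (beta_squared * c_precision + c_recall)
    )


# Two made-up batches with different numbers of positives.
labels_a, preds_a = [1, 1, 0], [0.9, 0.8, 0.1]
labels_b, preds_b = [0, 0, 1], [0.6, 0.4, 0.3]

# Averaging per-batch scores versus scoring the concatenated data.
mean_of_batches = (
    pfbeta_lists(labels_a, preds_a) + pfbeta_lists(labels_b, preds_b)
) / 2
whole_dataset = pfbeta_lists(labels_a + labels_b, preds_a + preds_b)

print(f"mean of per-batch pF1: {mean_of_batches:.4f}")
print(f"pF1 over all samples:  {whole_dataset:.4f}")
```

The two numbers differ because pF1 is a ratio of sums, so averaging per-batch scores does not in general reproduce the score over the concatenated data.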