# Import & Dependencies

This notebook closely follows this [article](https://towardsdatascience.com/roc-curve-and-auc-from-scratch-in-numpy-visualized-2612bb9459ab). A copy is also in my saved folder.

In [None]:
!pip install celluloid
import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Ordering in ROC

As we see below, if the rank ordering of the `predictions` is the same, then the ROC AUC for them is the same. In other words, the AUC does not depend on the **actual probability values** in `predictions`; preserving their order is enough to guarantee the same AUC.
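
One way to see why only the ordering matters: the AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The sketch below is a minimal NumPy illustration of that pairwise view; the helper name `pairwise_auc` is made up for this note, and ties are counted as half a correct pair.

```python
import numpy as np


def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


target = [1, 0, 1, 1, 0]
preds_a = [0.5, 0.25, 0.2, 0.3, 0.1]
preds_b = [0.7, 0.15, 0.1, 0.2, 0.05]
preds_c = [100, 25, 20, 30, 10]

# All three score vectors rank the samples identically, so each gives 5/6.
for preds in (preds_a, preds_b, preds_c):
    print(pairwise_auc(target, preds))
```

Only comparisons between scores enter the computation, so any transformation that keeps the order of the scores leaves the result unchanged.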

In [None]:
target = [1, 0, 1, 1, 0]
preds = [0.5, 0.25, 0.2, 0.3, 0.1]

metric1 = sklearn.metrics.roc_auc_score(y_true=target, y_score=preds, average="macro", sample_weight=None, max_fpr=None)  # 0.833

fpr_1, tpr_1, thresholds_1 = sklearn.metrics.roc_curve(target, preds, drop_intermediate=True)
print(metric1, fpr_1, tpr_1, thresholds_1)

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize =[10,9])
plt.title('Receiver Operating Characteristic')
plt.plot(fpr_1, tpr_1, 'b', label = 'AUC = %0.2f' % metric1)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show();

In [None]:
target = [1, 0, 1, 1, 0]
preds = [0.7, 0.15, 0.1, 0.2, 0.05]

metric2 = sklearn.metrics.roc_auc_score(target, preds)  # 0.833

fpr_2, tpr_2, thresholds_2 = sklearn.metrics.roc_curve(target, preds, drop_intermediate=True)
print(metric2, fpr_2, tpr_2, thresholds_2)

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize =[10,9])
plt.title('Receiver Operating Characteristic')
plt.plot(fpr_2, tpr_2, 'b', label = 'AUC = %0.2f' % metric2)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show();

In [None]:
target = [1, 0, 1, 1, 0]
preds = [100, 25, 20, 30, 10]

metric = sklearn.metrics.roc_auc_score(target, preds)  # 0.833

fpr, tpr, thresholds = sklearn.metrics.roc_curve(target, preds, drop_intermediate=True)
print(metric, fpr, tpr, thresholds)

# Sorting

It seems that reordering the samples (for example, sorting `y_pred` together with its matching `y_true`) does not change the final AUC. So why do people sort them?
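
One plausible answer is that sorting is an implementation detail rather than a requirement on the input: once the samples are sorted by score, every ROC point can be read off with cumulative sums. The sketch below assumes there are no tied scores (ties would need to be merged into a single point), and the helper names `roc_by_sorting` and `trapezoid_area` are made up for this note.

```python
import numpy as np


def roc_by_sorting(y_true, y_score):
    """ROC points from one sort and two cumulative sums (assumes no tied scores)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score)      # indices from highest to lowest score
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted)         # true positives after admitting each sample
    fps = np.cumsum(1 - y_sorted)     # false positives after admitting each sample
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr


def trapezoid_area(x, y):
    """Composite trapezoidal rule written out explicitly."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))


y_true_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred_a = [0.99999, 0.98, 0.97, 0.96, 0.95, 0.94, 0.68139, 0.50961, 0.48880, 0.44951]

# Same (label, score) pairs, listed in a different order.
y_true_b = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
y_pred_b = [0.99999, 0.68139, 0.98, 0.50961, 0.97, 0.48880, 0.96, 0.44951, 0.95, 0.94]

auc_a = trapezoid_area(*roc_by_sorting(y_true_a, y_pred_a))
auc_b = trapezoid_area(*roc_by_sorting(y_true_b, y_pred_b))
print(auc_a, auc_b)
```

Because the function sorts internally, the order in which the pairs are supplied has no effect on the result.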

In [None]:
y_true = [1,1,1,1,1,1,0,0,0,0]
y_pred = [0.99999,0.98,0.97,0.96,0.95,0.94,0.68139,0.50961,0.48880,0.44951]
#y_pred.reverse()
#y_pred = [1,1,1,1,1,1,1,1,0,0]

metric3 = sklearn.metrics.roc_auc_score(y_true, y_pred)  

fpr_3, tpr_3, thresholds_3 = sklearn.metrics.roc_curve(y_true, y_pred, drop_intermediate=True)
print(metric3, fpr_3, tpr_3, thresholds_3)

In [None]:
y_true_reorder = [1,0,1,0,1,0,1,0,1,1]

y_pred_reorder = [0.99999,0.68139,0.98,0.50961, 0.97,0.48880,0.96,0.44951,0.95,0.94]

metric4 = sklearn.metrics.roc_auc_score(y_true_reorder, y_pred_reorder)  
fpr_4, tpr_4, thresholds_4 = sklearn.metrics.roc_curve(y_true_reorder, y_pred_reorder, drop_intermediate=True)
print(metric4, fpr_4, tpr_4, thresholds_4)

# Imbalanced Dataset

I just finished reading this discussion. They argue that PR AUC is better than ROC AUC on imbalanced dataset.

For example, suppose the test dataset has 10 samples: 9 are positive and 1 is negative. We have a terrible model which predicts everything as positive. The confusion matrix is therefore TP = 9, FP = 1, TN = 0, FN = 0.

Then, Precision = 0.9, Recall = 1.0. The precision and recall are both very high, but we have a poor classifier.

On the other hand, TPR = TP/(TP+FN) = 1.0, FPR = FP/(FP+TN) = 1.0. Because the FPR is very high, we can identify that this is not a good classifier.

Clearly, ROC is better than PR on imbalanced datasets. Can somebody explain why PR is better?

## PR curve

Usually when I evaluate imbalanced models, and even balanced ones, I look at PR for ALL my classes.

In your example, yes, your positive class has P = 0.9 and R = 1.0. But what you should look at are ALL your classes. For your negative class, R = 0 (the single negative is never found) and P is 0/0, which is conventionally reported as 0. And you usually don't look at precision and recall individually. You want to look at the F1-score, the harmonic mean of precision and recall for each class, averaged over class 1 and class 0 (F1 macro or F1 micro, depending on your problem). Your class 1 F1 is very good (about 0.95), but your class 0 F1 is 0, so the macro F1 is only about 0.47, which is the correct conclusion for your scenario.

TL;DR: Look at PR scores for ALL your classes, and combine them with a metric like macro F1-score to reach a realistic conclusion about your model performance.
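
The 9-positive, 1-negative scenario above can be checked directly. This is a minimal sketch using scikit-learn; `zero_division=0` is my choice for how to report the undefined 0/0 precision of the negative class.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_fscore_support

# 9 positives, 1 negative, and a model that predicts everything as positive.
y_true = np.array([1] * 9 + [0])
y_pred = np.ones(10, dtype=int)

# Per-class metrics, reported in the order given by `labels`: class 0, then class 1.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0
)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
micro_f1 = f1_score(y_true, y_pred, average="micro", zero_division=0)

print("precision per class:", precision)
print("recall per class:   ", recall)
print("F1 per class:       ", f1)
print("macro F1:", macro_f1, "micro F1:", micro_f1)
```

Note that the micro average is dominated by the majority class here, so it is the macro average that exposes the failure on the negative class.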

In [None]:
np.random.seed(1930)

"""
1000 rows of data in total (split 70/30 into train and test below);
20 features (think of tumor size, tumor color etc to predict malignancy)
2 classes - think of malignant and benign
"""
X, y = make_classification(
    n_samples=1000,
    n_informative=10,
    n_features=20,
    flip_y=0.2,
    random_state=1930,
    n_classes=2,
)
# print(X.shape)
# print(set(y))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1930
)

model = RandomForestClassifier()
model.fit(X_train, y_train)

prob_vector = model.predict_proba(X_test)[:, 1]
# 1d array of shape (300,): one entry per test row (30% of 1000), giving the predicted probability of class 1 (positive class)
# print(prob_vector)

The function `roc_from_scratch` takes `y_pred` and `y_true` plus a default argument `partitions = 100`, and returns an `np.array` of shape `(partitions+1, 2)` whose columns are `fpr` and `tpr`. It loops `partitions+1` times: with `partitions = 10`, `i` runs from 0 to 10, a total of 11 iterations. In each iteration we calculate the `fpr` and `tpr` of `y_pred` against `y_true` at `threshold_value = i/partitions`, predicting the positive class when `y_pred >= threshold_value`. The first and last iterations therefore always use thresholds 0 and 1, so the curve is anchored at (1,1) for threshold 0 and at (0,0) for threshold 1 (unless some prediction is exactly 1).

In [None]:
def true_false_positive(y_pred_thresholded: np.ndarray, y_true: np.ndarray):
    """
    Returns the true positive rate and the false positive rate. This is a simple 2 class confusion matrix, can be extended to multi class, but start from this first.
    1 is pos class, 0 is neg class

            Parameters:
                    y_pred_thresholded (np.ndarray): Vector of 1s and 0s obtained by applying a threshold t
                            to the raw scores (y_pred >= t implies positive class).

                    y_true (np.ndarray): Ground truth labels, 1s and 0s.

            Returns:
                    tpr (float):
                            True positive rate (TPR), TP / (TP + FN).

                    fpr (float):
                            False positive rate (FPR), FP / (FP + TN).
            Examples:
            --------
                    >>> import numpy as np
                    >>> y_true = np.array([1,1,0,1,0,0])
                    >>> y_pred_thresholded = np.array([1,1,1,0,0,0])
                    >>> tpr, fpr = true_false_positive(y_pred_thresholded, y_true)
                    >>> round(float(tpr), 2), round(float(fpr), 2)
                    (0.67, 0.33)
    """
    true_positive = np.equal(y_pred_thresholded, 1) & np.equal(y_true, 1)
    true_negative = np.equal(y_pred_thresholded, 0) & np.equal(y_true, 0)
    false_positive = np.equal(y_pred_thresholded, 1) & np.equal(y_true, 0)
    false_negative = np.equal(y_pred_thresholded, 0) & np.equal(y_true, 1)

    tpr = true_positive.sum() / (true_positive.sum() + false_negative.sum())
    fpr = false_positive.sum() / (false_positive.sum() + true_negative.sum())

    return tpr, fpr


def roc_from_scratch(y_pred: np.ndarray, y_true: np.ndarray, partitions=100):
    """
    This function `roc_from_scratch` takes in `y_pred` and `y_true` and a default argument `partitions = 100` and returns an `np.array` of shape `(partitions+1, 2)`.
    This function iterates `partitions+1` times. As an example, if we take `partitions = 10`, then `i` runs from 0 to 10, a total of 11 times.
    In the `for` loop, we calculate the `fpr` and `tpr` of `y_pred` and `y_true` at each `threshold_value`, given by `i/partitions`.
    So the first and final iterations always use the threshold values 0 and 1, which anchors the two ends of the curve at (1,1) and (0,0) respectively.

            Parameters:
                    y_pred (np.ndarray): predictions of the dataset as probabilities in [0, 1]

                    y_true (np.ndarray): ground truth of the dataset in 0 and 1 form for binary

            Returns:
                    tpr_fpr_array (np.ndarray):
                            Array of shape (partitions+1, 2). Row i holds [fpr, tpr] at threshold t = i/partitions,
                            using the decision rule y_pred >= t implies positive class.
            Examples:
            --------
                    >>> import numpy as np
                    >>> y_true = np.array([0, 0, 1, 1])
                    >>> y_pred = np.array([0.1, 0.4, 0.35, 0.8])
                    >>> roc_from_scratch(y_pred, y_true,partitions=10)
                    array([ [1. , 1. ],
                            [1. , 1. ],
                            [0.5, 1. ],
                            [0.5, 1. ],
                            [0.5, 0.5],
                            [0. , 0.5],
                            [0. , 0.5],
                            [0. , 0.5],
                            [0. , 0.5],
                            [0. , 0. ],
                            [0. , 0. ] ])
    """
    tpr_fpr_array = np.array([])

    for i in range(partitions + 1):
        threshold_value = i / partitions
        # mask boolean array cast to int => True = 1, False = 0
        y_pred_thresholded = np.greater_equal(y_pred, threshold_value).astype(int)
        
        tpr, fpr = true_false_positive(y_pred_thresholded, y_true)
        tpr_fpr_array = np.append(tpr_fpr_array, [fpr, tpr])
        tpr_fpr_array = tpr_fpr_array.reshape((-1, 2))
    return tpr_fpr_array


def auc(fpr: np.ndarray, tpr: np.ndarray):
    """
    This function takes in fpr and tpr and calculates the integral of the tpr vs fpr graph using the composite trapezoidal rule.
    We take y = tpr and x = fpr. Note that fpr must be monotone (either non-decreasing or non-increasing), with tpr given in the matching order.
    Since it is integration, we can also use rectangles to estimate the area under the graph, where `dx` is the interval between each adjacent x-value (fpr).
    The height is given by the value of y at the point x. We use `np.diff` to check whether the `dx` values are monotone.
    If the `dx` are all non-positive, the raw integral is negative and we multiply it by -1.

            Parameters:
                    fpr (np.ndarray): false positive rates, the x-values of the ROC curve

                    tpr (np.ndarray): true positive rates, the y-values of the ROC curve

            Returns:
                    area (float): area under the tpr vs fpr curve

            Raises:
                    ValueError: if fpr is neither non-decreasing nor non-increasing

            Examples:
            --------
                    >>> import numpy as np
                    >>> fpr = np.array([1.0, 0.5, 0.5, 0.0, 0.0])
                    >>> tpr = np.array([1.0, 1.0, 0.5, 0.5, 0.0])
                    >>> float(auc(fpr, tpr))
                    0.75
    """
    direction = 1
    dx = np.diff(fpr)
    if np.any(dx < 0):
        if np.all(dx <= 0):
            # fpr runs from 1 down to 0, so the raw integral comes out negative
            direction = -1
        else:
            raise ValueError(
                "x is neither increasing nor decreasing: {}.".format(fpr)
            )
    # np.trapz is named np.trapezoid in NumPy 2.0 and later
    area = direction * np.trapz(y=tpr, x=fpr)
    return area


In [None]:
import matplotlib.pyplot as plt 
import seaborn as sns
sns.set()
plt.figure(figsize=(15,7))

ROC = roc_from_scratch(prob_vector,y_test,partitions=10)
plt.scatter(ROC[:,0],ROC[:,1],color='#0F9D58',s=100)
plt.title('ROC Curve',fontsize=20)
plt.xlabel('False Positive Rate',fontsize=16)
plt.ylabel('True Positive Rate',fontsize=16)
plt.show()

In [None]:
from celluloid import Camera
camera = Camera(plt.figure(figsize=(17,9)))
for i in range(30):
    ROC = roc_from_scratch(prob_vector,y_test,partitions=(i+1)*5)
    plt.scatter(ROC[:,0],ROC[:,1],color='#0F9D58',s=100)
    plt.title('ROC Curve',fontsize=20)
    plt.xlabel('False Positive Rate',fontsize=16)
    plt.ylabel('True Positive Rate',fontsize=16)
    camera.snap()
anim = camera.animate(blit=True,interval=300)
anim.save('scatter.gif')

In [None]:
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, prob_vector)

plt.figure(figsize=(15, 7))
plt.scatter(fpr, tpr, s=100, alpha=0.5, color="blue", label="Scikit-learn")
plt.scatter(
    ROC[:, 0], ROC[:, 1], color="red", s=100, alpha=0.3, label="Our implementation"
)
plt.title("ROC Curve", fontsize=20)
plt.xlabel("False Positive Rate", fontsize=16)
plt.ylabel("True Positive Rate", fontsize=16)
plt.legend()

In [None]:

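def create_triangle(fpr_0, fpr_1, tpr_0, tpr_1):
    # Arguments are passed as (x0, x1, y0, y1): x-coordinates are FPR values, y-coordinates are TPR values.
    plt.plot([fpr_0, fpr_1], [tpr_0, tpr_1], '-', lw=2, color='#4285F4')
    plt.plot([fpr_0, fpr_1], [tpr_1, tpr_1], '-', lw=2, color='#4285F4')
    plt.plot([fpr_0, fpr_0], [tpr_0, tpr_1], '-', lw=2, color='#4285F4')
    print('area: ', (fpr_0 - fpr_1) * (tpr_0 - tpr_1) / 2)

def create_rectangle(fpr_0, tpr_0):
    # Vertical line from the ROC point (fpr_0, tpr_0) down to the x-axis.
    plt.plot([fpr_0, fpr_0], [tpr_0, 0], '-', lw=2, color='#4285F4')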

In [None]:
from celluloid import Camera

camera = Camera(plt.figure(figsize=(17, 9)))
for i in range(15):
    ROC = roc_from_scratch(prob_vector, y_test, partitions=5 + 2 * i)
    fpr, tpr = ROC[:, 0], ROC[:, 1]
    plt.scatter(fpr, tpr, color="#0F9D58", s=100)
    plt.plot([1, 0], [0, 0], "k-", lw=2, color="#4285F4")
    plt.title("ROC Curve", fontsize=20)
    plt.xlabel("False Positive Rate", fontsize=16)
    plt.ylabel("True Positive Rate", fontsize=16)

    for j in range(5 + 2 * i):
        create_rectangle(fpr[j], tpr[j])
    for k in range(5 + 2 * i):
        create_triangle(fpr[k], fpr[k + 1], tpr[k], tpr[k + 1])
    camera.snap()
anim = camera.animate(blit=True, interval=600)
anim.save("roc.gif")

In [None]:
import pandas as pd
import numpy as np
partitions = 100
ROC = roc_from_scratch(prob_vector, y_test, partitions=partitions)
fpr, tpr = ROC[:, 0], ROC[:, 1]
rectangle_roc = 0
for k in range(partitions):
        rectangle_roc = rectangle_roc + (fpr[k]- fpr[k + 1]) * tpr[k]
rectangle_roc

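Area under the ROC graph using the trapezoidal rule, where y = tpr and x = fpr.

In [None]:
# fpr from roc_from_scratch runs from 1 down to 0, so the raw integral is negative; flip the sign
-np.trapz(y=tpr, x=fpr)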

In [None]:
from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,prob_vector)

The `auc` function takes in fpr and tpr and calculates the integral of the tpr vs fpr graph using the composite trapezoidal rule.
We take y = tpr and x = fpr. Note that fpr must be monotone (either non-decreasing or non-increasing), with tpr given in the matching order.
Since it is integration, we can also use rectangles to estimate the area under the graph, where `dx` is the interval between each adjacent x-value (fpr). The height is given by the value of y at the point x. We can use `np.diff` to check whether the `dx` values are monotone. If the `dx` are all non-positive, the raw integral is negative and we need to multiply it by -1.
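
The sign issue described above can be seen on a small hand-made curve. The sketch below writes the rectangle and trapezoid sums out explicitly with NumPy; the `fpr` and `tpr` values are the ROC points of the toy example from the `roc_from_scratch` docstring, ordered from threshold 0 to threshold 1.

```python
import numpy as np

# ROC points ordered from threshold 0 to threshold 1, so fpr runs from 1 down to 0.
fpr = np.array([1.0, 1.0, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
tpr = np.array([1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0])

dx = np.diff(fpr)  # every entry is <= 0 because fpr is decreasing

# Rectangle rule: width dx, height taken at the left point of each interval.
signed_rectangle = np.sum(dx * tpr[:-1])

# Trapezoidal rule: width dx, height is the average of the two end points.
signed_trapezoid = np.sum(dx * (tpr[1:] + tpr[:-1]) / 2)

# Both sums are negative here; flip the sign when x is decreasing.
direction = -1 if np.all(dx <= 0) else 1
area = direction * signed_trapezoid

print(signed_rectangle, signed_trapezoid, area)
```

On this step-shaped curve the rectangle and trapezoid sums agree; on curves with sloped segments they generally differ.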


# Receiver operating characteristic (ROC)

[Wikipedia has an extensive explanation of the probability behind ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)


[Reference I](https://www.kaggle.com/c/siim-isic-melanoma-classification/discussion/173020)

[On why thresholds return 2 sometimes](https://stackoverflow.com/questions/52358114/why-is-roc-curve-return-an-additional-value-for-the-thresholds-2-0-for-some-cl), [second thread on the same question](https://stackoverflow.com/questions/23200518/scikit-learn-roc-curve-why-does-it-return-a-threshold-value-2-some-time)

[PR-Curve vs ROC-Curve](https://neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc#:~:text=ROC%20AUC%20vs%20PR%20AUC&text=What%20is%20different%20however%20is,and%20true%20positive%20rate%20TPR.)


https://stackoverflow.com/questions/59666138/sklearn-roc-auc-score-with-multi-class-ovr-should-have-none-average-available
https://stackoverflow.com/questions/39685740/calculate-sklearn-roc-auc-score-for-multi-class
https://datascience.stackexchange.com/questions/36862/macro-or-micro-average-for-imbalanced-class-problems#:~:text=Micro%2Daverage%20is%20preferable%20if,your%20dataset%20varies%20in%20size.
https://stackoverflow.com/questions/56227246/how-to-calculate-roc-auc-score-having-3-classes
https://glassboxmedicine.com/2019/02/23/measuring-performance-auc-auroc/


## Definition of Binary Classification ROC-AUC


    sklearn.metrics.roc_auc_score(y_true, y_score, *, average='macro', sample_weight=None, max_fpr=None, multi_class='raise', labels=None)
    
    
**y_score: array-like of shape (n_samples,) or (n_samples, n_classes)**

Target scores. In the binary and multilabel cases, these can be either probability estimates or non-thresholded decision values (as returned by decision_function on some classifiers). In the multiclass case, these must be probability estimates which sum to 1. The binary case expects a shape (n_samples,), and the scores must be the scores of the class with the greater label. The multiclass and multilabel cases expect a shape (n_samples, n_classes). In the multiclass case, the order of the class scores must correspond to the order of labels, if provided, or else to the numerical or lexicographical order of the labels in y_true.

Understanding the binary case is important: it expects a list/array of shape `(n_samples,)`, i.e. a 1d-array, whose entries must be the scores of the **greater label**. In other words, if you have classes 0 and 1, then the greater label is `max(0, 1) = 1`. As a consequence, you should pass only the scores of the "positive class", which is the "greater label" here, as `y_score` (for example `predict_proba(X)[:, 1]`).
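
A small check of what happens when the wrong column is passed. The sketch below uses a made-up 4-sample probability matrix whose columns stand for class 0 and class 1; passing the class 0 column gives 1 minus the intended AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])

# Made-up two-column probabilities: column 0 is P(class 0), column 1 is P(class 1).
proba = np.array(
    [
        [0.90, 0.10],
        [0.60, 0.40],
        [0.65, 0.35],
        [0.20, 0.80],
    ]
)

auc_greater_label = roc_auc_score(y_true, proba[:, 1])  # scores of the greater label (1)
auc_wrong_column = roc_auc_score(y_true, proba[:, 0])   # scores of class 0

print(auc_greater_label, auc_wrong_column)  # 0.75 and 0.25
```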

In multiclass, there are two cases: either you provide a `labels` argument, say `labels = [0,2,1]` or `labels = [0,1,2]`, or you do not, in which case the columns of `y_score` must follow the numerical/alphabetical order of the labels in `y_true`. In other words, if `y_true` has 3 unique labels 0, 1 and 2, then `y_score` will be a **2d-array** of the form `y_score = [[0.2, 0.3, 0.5],[...],[...]]` where the entries of `y_score[0] = [0.2,0.3,0.5]` correspond to classes 0, 1 and 2 respectively, unless otherwise stated in `labels`.


### First Interpretation

The ROC curve is a TPR vs FPR graph, and the AUC is literally the area under that curve. To find the ROC-AUC, we plot the `(fpr, tpr)` point obtained at each of many thresholds and compute the area under the resulting curve.

In the simple example below, there are a total of 6 points to plot, one per threshold, taken from `fpr` and `tpr` respectively. Here 1 is the positive class:

    y_true_1 = [0,0,1,1,0,0,1,1]
    y_preds_1 = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8]
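
The points for this example can be computed by hand, which is what the steps below walk through. As a minimal sketch, the code applies each threshold with the rule "score >= threshold implies positive" and counts the confusion-matrix cells; the helper `roc_point` is made up for this note, and the leading threshold 1.8 simply stands for "larger than every score".

```python
import numpy as np

y_true_1 = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_preds_1 = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])


def roc_point(threshold):
    """Return (fpr, tpr) when predicting positive for scores >= threshold."""
    y_hat = (y_preds_1 >= threshold).astype(int)
    tp = np.sum((y_hat == 1) & (y_true_1 == 1))
    fn = np.sum((y_hat == 0) & (y_true_1 == 1))
    fp = np.sum((y_hat == 1) & (y_true_1 == 0))
    tn = np.sum((y_hat == 0) & (y_true_1 == 0))
    return float(fp / (fp + tn)), float(tp / (tp + fn))


thresholds = [1.8, 0.8, 0.7, 0.5, 0.3, 0.1]
points = [roc_point(t) for t in thresholds]
for t, (fpr, tpr) in zip(thresholds, points):
    print(f"T={t}: fpr={fpr}, tpr={tpr}")
```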

1. We usually initialize the thresholds with a large number: `roc_curve` is written so that the ROC point corresponding to the highest threshold, `(fpr[0], tpr[0])`, is always `(0, 0)`. If this is not already the case, an extra threshold is prepended. In scikit-learn versions before 1.3 it is `max(y_score)+1`, which is why we get 1.8 as the first threshold here; from version 1.3 onwards it is `np.inf`. Either way, this large number ensures that `fpr, tpr` starts at (0,0).

2. Next, when the threshold is $T=0.8$, only one score reaches the threshold, so the thresholded predictions are `[0,0,0,0,0,0,0,1]` and we can calculate the FPR and TPR. FPR is 0 because no negative sample (0) is misclassified as 1. TPR is 0.25 because, by definition, TPR = TP/(TP+FN) = 1/(1+3) = 0.25. Therefore `(fpr, tpr) = (0,0.25)`

3. $T=0.7 \rightarrow$ thresholded predictions `[0,0,0,0,0,0,1,1]`. By the same logic, FPR is 0 because no negative sample is classified as 1 by our classifier, but TPR is 0.5 because TPR = TP/(TP+FN) = 2/(2+2) = 0.5. Therefore `(fpr, tpr) = (0,0.5)`

4. We continue this way until we exhaust all the thresholds returned, `array([1.8, 0.8, 0.7, 0.5, 0.3, 0.1])`, and plot the resulting points on the graph.

5. How, then, do we calculate the area under this curve? The source code of `sklearn.metrics.auc` shows that it uses the **[Trapezoidal Rule](https://en.wikipedia.org/wiki/Trapezoidal_rule)**.

6. This gives a rough idea of how the `ROC-AUC` area is computed. Bear in mind that the area is, in principle, calculated over all thresholds. In practice `sklearn` only uses the distinct score values as thresholds and, with `drop_intermediate=True`, drops points that do not change the shape of the curve, so you will not see the full range of thresholds here.

7. As discussed in [PR vs ROC](https://stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves), the key difference is that ROC curves will be the same no matter what the baseline probability is, but PR curves may be more useful in practice for needle-in-haystack type problems or problems where the "positive" class is more interesting than the negative class.

    To show this, first let's start with a very nice way to define precision, recall and specificity.  Assume you have a "positive" class called 1 and a "negative" class called 0.  $\hat{Y}$ is your estimate of the true class label $Y$.  Then:
    $$
    \begin{aligned}
    &\text{Precision} &= P(Y = 1 | \hat{Y} = 1)  \\
    &\text{Recall} = \text{Sensitivity} &= P(\hat{Y} = 1 | Y = 1)  \\
    &\text{Specificity} &= P(\hat{Y} = 0 | Y = 0)
    \end{aligned}
    $$
    The key thing to note is that sensitivity/recall and specificity, which make up the ROC curve, are probabilities *conditioned on the true class label*.  Therefore, they will be the same regardless of what $P(Y = 1)$ is.  Precision is a probability conditioned on *your estimate of the class label* and will thus vary if you try your classifier in different populations with different baseline $P(Y = 1)$.  However, it may be more useful in practice if you only care about one population with known background probability and the "positive" class is much more interesting than the "negative" class.  (IIRC precision is popular in the document retrieval field, where this is the case.)  This is because it directly answers the question, "What is the probability that this is a real hit given my classifier says it is?". 

    Interestingly, by Bayes' theorem you can work out cases where specificity can be very high and precision very low simultaneously.  All you have to do is assume $P(Y = 1)$ is very close to zero.  In practice I've developed several classifiers with this performance characteristic when searching for needles in DNA sequence haystacks.

    IMHO when writing a paper you should provide whichever curve answers the question you want answered (or whichever one is more favorable to your method, if you're cynical).  If your question is: "How meaningful is a positive result from my classifier *given the baseline probabilities of my problem*?", use a PR curve.  If your question is, "How well can this classifier be expected to perform *in general, at a variety of different baseline probabilities*?", go with a ROC curve.
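
The Bayes' theorem remark in item 7 can be made concrete. The sketch below writes precision in terms of sensitivity, specificity and the baseline probability $P(Y = 1)$; the function name and the numbers (sensitivity 0.9, specificity 0.99) are made up for illustration.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """P(Y=1 | Yhat=1) via Bayes' theorem, where prevalence = P(Y=1)."""
    true_positive_mass = sensitivity * prevalence
    false_positive_mass = (1 - specificity) * (1 - prevalence)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Same classifier (same sensitivity and specificity), two different populations.
balanced = precision_from_rates(sensitivity=0.9, specificity=0.99, prevalence=0.5)
rare = precision_from_rates(sensitivity=0.9, specificity=0.99, prevalence=0.001)

print(f"precision at P(Y=1)=0.5:   {balanced:.3f}")
print(f"precision at P(Y=1)=0.001: {rare:.3f}")
```

Sensitivity and specificity, and therefore the ROC curve, are identical in both populations, while precision drops from about 0.99 to about 0.08 when positives become rare.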




## Binary ROC code from `sklearn`

**Average Parameter Will be ignored when y_true is binary.**


The above is important: in the binary case it makes no difference whether you pass `average='macro'`, `'micro'`, `'samples'`, etc.
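
A quick way to convince yourself of this is to call `roc_auc_score` on binary labels with several `average` settings and compare the results. A minimal sketch on the 8-sample toy example:

```python
from sklearn.metrics import roc_auc_score

y_true_toy = [0, 0, 1, 1, 0, 0, 1, 1]
y_score_toy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# With binary y_true, every averaging option returns the same number.
scores = {
    avg: roc_auc_score(y_true_toy, y_score_toy, average=avg)
    for avg in ("macro", "micro", "weighted")
}
print(scores)
```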

In [None]:
y_true_binary = [0, 0, 1, 1, 0, 0, 1, 1]
y_preds_binary = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
roc_1 = sklearn.metrics.roc_auc_score(y_true=y_true_binary, y_score=y_preds_binary, average="macro", sample_weight=None, max_fpr=None)
fpr_1, tpr_1, thresholds_1 = sklearn.metrics.roc_curve(y_true_binary, y_preds_binary, drop_intermediate=True)
print(roc_1, fpr_1, tpr_1, thresholds_1)

# If we swap the labels in y_true_binary (class 0 becomes 1 and class 1 becomes 0), the AUC is just 1 - roc_1!

y_true_binary = [1,1,0,0,1,1,0,0]
y_preds_binary = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
roc_2 = sklearn.metrics.roc_auc_score(y_true=y_true_binary, y_score=y_preds_binary, average="macro", sample_weight=None, max_fpr=None)
fpr_2, tpr_2, thresholds_2 = sklearn.metrics.roc_curve(y_true_binary, y_preds_binary, drop_intermediate=True)
print(roc_2, fpr_2, tpr_2, thresholds_2)

#fpr_1, tpr_1, thresholds_1 = roc_curve(y_true, y_preds, drop_intermediate=False)

import matplotlib.pyplot as plt
plt.figure(figsize =[10,9])
plt.title('Receiver Operating Characteristic')
plt.plot(fpr_1, tpr_1, 'b', label = 'AUC = %0.2f' % roc_1)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show();

## Multi-Class ROC

In a multi-class model, we can plot N ROC curves (with N AUC values) for N classes using the One-vs-All methodology. For example, if you have three classes named X, Y and Z, you will have one ROC for X classified against Y and Z, another ROC for Y classified against X and Z, and a third one for Z classified against Y and X.


First, look at the code below from the [source](https://github.com/scikit-learn/scikit-learn/blob/0fb307bf3/sklearn/metrics/_ranking.py#L690), which uses the One-vs-Rest (`ovr`) idea. The first step is to `label_binarize` all the `y_true` labels. As we can see, `label_binarize` needs `y_true` and `classes`; if our classes are `[0,1,2,3,4,5]`, we can specify them through the `labels` argument of `roc_auc_score`. If we do not specify them, `_encode` will infer the classes for us, so passing `labels` is only necessary when the order of your labels matters.

    else:
        # ovr is same as multi-label
        y_true_multilabel = label_binarize(y_true, classes=classes)
        return _average_binary_score(_binary_roc_auc_score, y_true_multilabel,
                                     y_score, average,
                                     sample_weight=sample_weight)
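
The snippet above suggests that the macro-averaged `ovr` score is the mean of per-class binary AUCs computed on the binarized labels. The sketch below checks that on a made-up 3-class example; the labels and probabilities are invented for illustration, with each row summing to 1 as `roc_auc_score` requires in the multiclass case.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

classes = [0, 1, 2]
y_true_mc = np.array([0, 1, 2, 2, 1, 0, 1, 2])

# Made-up class probabilities; columns follow the order of `classes`.
y_score_mc = np.array(
    [
        [0.7, 0.2, 0.1],
        [0.2, 0.5, 0.3],
        [0.1, 0.3, 0.6],
        [0.3, 0.4, 0.3],
        [0.4, 0.4, 0.2],
        [0.5, 0.3, 0.2],
        [0.1, 0.8, 0.1],
        [0.2, 0.2, 0.6],
    ]
)

# One indicator column per class, then one binary AUC per column.
y_onehot = label_binarize(y_true_mc, classes=classes)
per_class_auc = [
    roc_auc_score(y_onehot[:, i], y_score_mc[:, i]) for i in range(len(classes))
]
manual_macro = float(np.mean(per_class_auc))

# Built-in One-vs-Rest macro average.
auc_ovr = roc_auc_score(
    y_true_mc, y_score_mc, multi_class="ovr", average="macro", labels=classes
)
print(per_class_auc, manual_macro, auc_ovr)
```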

In [None]:
y = [0,0,0,0,1]
y_binarize=sklearn.preprocessing.label_binarize(y, classes=[0,1])
y_binarize.shape
y_binarize[:,0]

In [None]:
# The code below also WORKS for binary class if you are using Softmax Predictions!
# replicating from https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#multiclass-settings
def multiclass_roc(y_true, y_preds_softmax_array, config):
    # one hot encode y_true once; column i is the indicator for config.class_list[i]
    y_true_multiclass_array = sklearn.preprocessing.label_binarize(y_true, classes=config.class_list)
    # if binary class, label_binarize returns shape (n_samples, 1): the indicator of class_list[1].
    # Expand it to two columns so that each column lines up with the matching softmax column;
    # otherwise class 1 labels would be scored against class 0 probabilities, giving 1 - AUC.
    if y_true_multiclass_array.shape[1] == 1:
        y_true_multiclass_array = np.hstack([1 - y_true_multiclass_array, y_true_multiclass_array])

    fpr = dict()
    tpr = dict()
    roc_auc = dict()
    for label_num in range(len(config.class_list)):
        y_true_for_curr_class = y_true_multiclass_array[:, label_num]
        y_preds_for_curr_class = y_preds_softmax_array[:, label_num]
        # calculate fpr, tpr and thresholds across various decision thresholds
        # pos_label = 1 because one hot encode guarantees it
        fpr[label_num], tpr[label_num], _ = sklearn.metrics.roc_curve(
            y_true=y_true_for_curr_class,
            y_score=y_preds_for_curr_class,
            pos_label=1,
        )
        roc_auc[label_num] = sklearn.metrics.auc(fpr[label_num], tpr[label_num])
    # macro average over the per-class AUC values
    avg_roc_score = np.mean(list(roc_auc.values()))
    return roc_auc, avg_roc_score