To evaluate link prediction approaches, it is common to use standard classification metrics from machine learning. We include the following ones:
The accuracy measures the proportion of correctly classified links, both positive and negative.
\mbox{Accuracy} = \frac{\mbox{TP} + \mbox{TN}}{\mbox{TP} + \mbox{TN} + \mbox{FP} + \mbox{FN}}
where TP is the number of true positives, FP is the number of false positives, TN is the number of true negatives, and FN is the number of false negatives.
We have two options here (mutually exclusive):
cutoff
: the (maximum) number of predicted links to consider (all the remaining links shall be considered as negatively predicted links).
threshold
: the minimum score to consider as positive (all the remaining links shall be considered as negatively predicted links).
When both appear in the configuration file, they will be considered separately.
Accuracy:
cutoff:
type: int
values: [1,5,10]
threshold:
type: double
values: [0.2,0.5,1.0]
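As an illustration, the accuracy under a `cutoff` can be computed as sketched below. The scores and labels are hypothetical: scores are assumed sorted in decreasing order, and the top-`cutoff` links are taken as positive predictions.

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of links classified correctly, positive and negative alike.
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical ranked predictions: scores sorted in decreasing order,
# labels mark whether the link actually exists (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1]

cutoff = 5  # the top-5 links are predicted positive, the rest negative
pred = [1 if i < cutoff else 0 for i in range(len(scores))]

tp = sum(p == 1 and l == 1 for p, l in zip(pred, labels))
tn = sum(p == 0 and l == 0 for p, l in zip(pred, labels))
fp = sum(p == 1 and l == 0 for p, l in zip(pred, labels))
fn = sum(p == 0 and l == 1 for p, l in zip(pred, labels))
print(accuracy(tp, tn, fp, fn))  # 4 of the 7 links classified correctly
```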
The area under the receiver operating characteristic curve (AUC), as its name indicates, measures the area under the ROC curve, which plots the true positive rate as a function of the false positive rate. Equivalently, the AUC is the probability that a randomly chosen positive link receives a higher score than a randomly chosen negative link.
Reference: T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874 (2006).
AUC:
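A minimal sketch of the probabilistic interpretation of the AUC: the fraction of (positive, negative) pairs where the positive link is scored higher, with ties counting one half. The scores below are made up for illustration.

```python
def auc(pos_scores, neg_scores):
    # Fraction of (positive, negative) pairs where the positive link
    # outranks the negative one; ties contribute one half.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores for existing (positive) and absent (negative) links.
print(auc([0.9, 0.8, 0.6, 0.2], [0.7, 0.4, 0.3]))  # 8 of 12 pairs -> 2/3
```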
The F1 score combines precision and recall (see Precision and Recall) in a single value. It is the harmonic mean of the two measures:
\mbox{F1-score} = \frac{2\cdot\mbox{TP}}{2\cdot\mbox{TP} + \mbox{FN} + \mbox{FP}}
where TP is the number of true positives, FP is the number of false positives and FN is the number of false negatives.
We have two options here (mutually exclusive):
cutoff
: the (maximum) number of predicted links to consider (all the remaining links shall be considered as negatively predicted links).
threshold
: the minimum score to consider as positive (all the remaining links shall be considered as negatively predicted links).
When both appear in the configuration file, they will be considered separately.
F1-score:
cutoff:
type: int
values: [1,5,10]
threshold:
type: double
values: [0.2,0.5,1.0]
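The formula above can be sketched directly in terms of the confusion-matrix counts; the counts in the example are hypothetical.

```python
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall, written directly
    # in terms of the confusion-matrix counts.
    return 2 * tp / (2 * tp + fn + fp)

# With 3 true positives, 2 false positives and 1 false negative:
# precision = 3/5, recall = 3/4, so F1 = 2/3.
print(f1_score(3, 2, 1))
```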
The precision measures the proportion of correctly predicted links among those the algorithm has labeled as positive.
\mbox{Precision} = \frac{\mbox{TP}}{\mbox{TP} + \mbox{FP}}
where TP is the number of true positives and FP is the number of false positives.
We have two options here (mutually exclusive):
cutoff
: the (maximum) number of predicted links to consider (all the remaining links shall be considered as negatively predicted links).
threshold
: the minimum score to consider as positive (all the remaining links shall be considered as negatively predicted links).
When both appear in the configuration file, they will be considered separately.
Precision:
cutoff:
type: int
values: [1,5,10]
threshold:
type: double
values: [0.2,0.5,1.0]
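A sketch of precision under a `threshold`: links scored at or above the threshold are predicted positive, and precision is the fraction of those that actually exist. Scores and labels are hypothetical.

```python
def precision(scores, labels, threshold):
    # Links scored at or above the threshold are predicted positive;
    # precision is the fraction of them that actually exist.
    predicted = [l for s, l in zip(scores, labels) if s >= threshold]
    return sum(predicted) / len(predicted)

scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 0, 1, 1, 0]
print(precision(scores, labels, 0.5))  # 2 of the 3 predicted links exist
```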
The recall measures the proportion of actually existing links (positives) that have been correctly predicted:
\mbox{Recall} = \frac{\mbox{TP}}{\mbox{TP} + \mbox{FN}}
where TP is the number of true positives and FN is the number of false negatives.
We have two options here (mutually exclusive):
cutoff
: the (maximum) number of predicted links to consider (all the remaining links shall be considered as negatively predicted links).
threshold
: the minimum score to consider as positive (all the remaining links shall be considered as negatively predicted links).
When both appear in the configuration file, they will be considered separately.
Recall:
cutoff:
type: int
values: [1,5,10]
threshold:
type: double
values: [0.2,0.5,1.0]
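Analogously to precision, recall under a `threshold` can be sketched as the fraction of existing links that are scored at or above the threshold. Scores and labels are again hypothetical.

```python
def recall(scores, labels, threshold):
    # Fraction of actually existing links that are scored
    # at or above the threshold.
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    positives = sum(labels)
    return tp / positives

scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 0, 1, 1, 0]
print(recall(scores, labels, 0.5))  # 2 of the 3 existing links recovered
```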