- if 'auto', the threshold is determined as in the
original paper,
The documentation states that if the parameter 'contamination' is set to 'auto', the threshold for LOF will be calculated as outlined in the original paper.
However, in the code we can see that the threshold is set to 1.5 in all cases.
The original paper (https://doi.org/10.1145/335191.335388) never mentions the "magic number" 1.5, which makes me wonder where it comes from.
I'd be glad if you could clarify this. If someone is able to provide a reference for the value of 1.5 I'll gladly update the documentation.
EDIT: The original paper mentions the number 1.5, but only in an example (Sec. 7.3). I don't think they meant to suggest that there is anything special about the number 1.5; they may even have settled on it because of space constraints in the paper.
We see that the objects in the uniform clusters all have their LOF equal to 1. Most objects in the Gaussian clusters also have 1 as their LOF values. Slightly outside the Gaussian clusters, there are several weak outliers, i.e., those with relatively low, but larger than 1, LOF values. The remaining seven objects all have significantly larger LOF values.
They are stating that the LOF threshold should be significantly higher than 1. In Sec. 7.2 they look at outliers with scores of 2, 2.4, 2.5, 2.8 and 6, and in Sec. 7.3 they use a threshold of 1.5.
This threshold could be set to legacy computation, I guess, if 1.5 is too confusing.
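To make the fixed-threshold rule concrete, here is a minimal sketch in plain Python. It is not the scikit-learn code: `flag_outliers` is a hypothetical helper, and the score 1.2 is an assumed stand-in for a "weak outlier"; the other scores are the ones quoted above. The second half assumes the negated-score convention, where a sample is flagged when its negated LOF falls below -1.5, which is how I understand the lines referenced in this issue to apply the threshold.

```python
# Hypothetical helper for illustration only; not part of scikit-learn.
def flag_outliers(lof_scores, threshold=1.5):
    """Return True for every LOF score strictly above the threshold."""
    return [score > threshold for score in lof_scores]


# Inliers sit at LOF = 1; 2, 2.4, 2.5, 2.8 and 6 are the Sec. 7.2 scores
# quoted in this thread. 1.2 is an assumed "weak outlier" value.
scores = [1.0, 1.0, 1.2, 2.0, 2.4, 2.5, 2.8, 6.0]
flags = flag_outliers(scores)

# Assumed negated-score convention: store -LOF and compare against -1.5.
# The decision is identical, only the sign is flipped.
negated_scores = [-score for score in scores]
flags_negated = [score < -1.5 for score in negated_scores]

print(flags)
print(flags == flags_negated)
```

With a cut at 1.5 the weak outlier stays an inlier while all five Sec. 7.2 scores are flagged, which matches the "significantly higher than 1" reading, but nothing in this sketch explains why 1.5 rather than, say, 1.3 or 2.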
In scikit-learn/sklearn/neighbors/_lof.py, lines 103 to 104 in baf0ea2
But here: scikit-learn/sklearn/neighbors/_lof.py, lines 147 to 148 in baf0ea2, and scikit-learn/sklearn/neighbors/_lof.py, line 310 in baf0ea2