The AUC-meter evaluates differently from classical statistics #42
The method of computing the AUC described in the article you linked to is a bit different from what is implemented here. Note how the article does linear interpolation between the actual observations. This is not entirely accurate: the linear interpolation is just an approximation of the AUC at the unobserved points. We use a more conservative, constant approximation: in our case, the plot would look like a step function that never lies above the linear-interpolation version (and is equal only at observed points). As a result, the AUC we measure will always be lower than what the article's method computes. It is still possible that there is a bug in `tnt.AUCMeter`.
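To make the difference concrete, here is a small hypothetical Python sketch (illustrative only, not the torchnet code; `roc_points`, `auc_trapezoid`, and `auc_step` are made-up names) comparing the linear-interpolation (trapezoidal) AUC with the conservative step-function AUC. With tied scores the ROC curve has diagonal segments, and the step estimate comes out strictly lower:

```python
def roc_points(scores, targets):
    """ROC points (FPR, TPR), sweeping thresholds from high to low."""
    pos = sum(targets)
    neg = len(targets) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        for s, t in zip(scores, targets):
            if s == thr:              # all samples crossing this threshold
                if t == 1:
                    tp += 1
                else:
                    fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    # linear interpolation between observed ROC points
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def auc_step(points):
    # constant approximation: hold the previous TPR; never above the trapezoid
    return sum((x2 - x1) * y1
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# tied scores make the two estimates differ
scores  = [0.8, 0.8, 0.8, 0.4, 0.4, 0.4]
targets = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, targets)
print(auc_trapezoid(pts))  # 2/3 ≈ 0.667
print(auc_step(pts))       # 4/9 ≈ 0.444
```

Without ties, every ROC segment is axis-aligned and the two estimates agree; ties are exactly where the conservative version loses area.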
Thanks for the explanation. I think I grasp the idea, but since I'm not entirely comfortable with the calculation I switched approach: now I simply check that a random guess gives an AUC of about 0.5 and a perfect guess an AUC of 1. Unfortunately I still seem to be missing something, as I can't get the "perfect guess" case to work. My current test code is:

```lua
function test.AUCMeter()
   local mtr = tnt.AUCMeter()
   local test_size = 10^3

   -- random guesses should give an AUC close to 0.5
   mtr:add(torch.rand(test_size), torch.zeros(test_size))
   mtr:add(torch.rand(test_size), torch.Tensor(test_size):fill(1))
   local err = mtr:value()
   tester:eq(err, 0.5, "Random guesses should provide an AUC close to 0.5", 10^-1)

   -- only correct guesses should give an AUC close to 1
   mtr:add(torch.Tensor(test_size):fill(0), torch.zeros(test_size))
   mtr:add(torch.Tensor(test_size):fill(0.1), torch.zeros(test_size))
   mtr:add(torch.Tensor(test_size):fill(0.2), torch.zeros(test_size))
   mtr:add(torch.Tensor(test_size):fill(0.3), torch.zeros(test_size))
   mtr:add(torch.Tensor(test_size):fill(0.4), torch.zeros(test_size))
   mtr:add(torch.Tensor(test_size):fill(1), torch.Tensor(test_size):fill(1))
   err = mtr:value()
   tester:eq(err, 1, "Only correct guesses should provide an AUC close to 1", 10^-1)

   -- simulate a random situation where all the guesses are correct
   mtr:reset()
   local output = torch.abs(torch.rand(test_size) - .5) * 2 / 3
   mtr:add(output, torch.zeros(test_size))
   output = torch.min(
      torch.cat(torch.rand(test_size) + .75,
                torch.Tensor(test_size):fill(1),
                2),
      2)
   mtr:add(output:fill(1), torch.Tensor(test_size):fill(1))
   err = mtr:value()
   tester:eq(err, 1, "Simulated random correct guesses should provide an AUC close to 1", 10^-1)
end
```

I've tried several versions of this, with the estimate coming out around 0.75. I guess it's related to the step quality, as it evaluates to 3/4, but the random attempt should in my mind smooth out the steps.
The first unit tests are a bit flaky because they contain randomness (maybe use an example for which you know the correct answer instead?). Also, note you're missing a `mtr:reset()` between the random-guess test and the perfect-guess test. This bug is now fixed. Thanks for spotting this!
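To see why not resetting the meter between sub-tests matters, here is a hypothetical Python sketch (illustrative only, not the tnt code) using the rank-statistic view of AUC, i.e. the probability that a positive outscores a negative. Pooling an earlier random-looking batch together with the later "perfect" batch keeps the result below 1:

```python
def auc(pos, neg):
    """AUC as P(score_pos > score_neg), counting ties as half a win."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

perfect_pos = [1.0, 1.0, 1.0, 1.0]
perfect_neg = [0.0, 0.1, 0.2, 0.3]
print(auc(perfect_pos, perfect_neg))  # 1.0 on the perfect batch alone

# without a reset, an earlier random batch stays in the pool
stale_pos = [0.2, 0.7, 0.4, 0.9]
stale_neg = [0.6, 0.1, 0.8, 0.3]
pooled = auc(perfect_pos + stale_pos, perfect_neg + stale_neg)
print(pooled)  # below 1.0
```

The meter accumulates every `add` call into one pooled sample, so the "perfect guess" assertion is evaluated on the mixture, not on the perfect batch alone.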
Thanks.
Okay, yeah, let's fix the random seed for the test then. I don't think it's a good idea to have the unit tests fail with some non-zero probability, since we plan to rely increasingly on Travis to determine whether or not pull requests are okay. Thanks for contributing these tests!
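For reference, a minimal sketch of the idea in generic Python (in the Lua/Torch tests the analogue would be calling `torch.manualSeed` at the start of the test): with a fixed seed, the "random" data is identical on every CI run, so the assertion either always passes or always fails:

```python
import random

def make_scores(n, seed=1234):
    # a local RNG with a fixed seed gives reproducible "random" data
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

print(make_scores(5) == make_scores(5))  # True on every run
```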
Done. Thank you for the excellent package and your patience with my questions.
I've finished writing a basic test suite for the meters, and apart from issue #41 I've encountered an unexpected problem with `tnt.AUCMeter`. The following test case should hopefully implement the classical AUC calculation based on this paper. Unfortunately, the AUC is lower (0.704) than the expected 0.893. I'm not familiar enough with ML to know whether the ML AUC differs in some significant way, but the value 0.704 seems intuitively low (my apologies if I missed something in the coding). After looking at how the AUC is calculated, there is a zero appended that could possibly be pulling the value down.