
LMNN: optimization by bounding slack term #1449

Closed
rcurtin opened this Issue Jun 27, 2018 · 10 comments


rcurtin commented Jun 27, 2018

This is an optimization for #1407; it focuses on the following term, which appears during the computation of the LMNNFunction objective function and gradient:

          eval = metric.Evaluate(transformedDataset.col(i),
                    transformedDataset.col(targetNeighbors(j, i))) -
                 metric.Evaluate(transformedDataset.col(i),
                    transformedDataset.col(impostors(l, i)));

        // Check bounding condition.
        if (eval < -1)
        {
          // update bound.
          bp = l;
          break;
        }

We're already avoiding the computation of extra slack terms with the if (eval < -1), which is great, but we can do better! Using a (not-yet-derived) bound depending on || L_{t + 1} - L_t ||_F^2, we can bound what eval will be at this iteration using last iteration's value. If eval cannot possibly be greater than -1, then we can skip the metric.Evaluate() calls entirely.

The first step here would be deriving the bound; then we would probably need to find a way to cache the eval values from previous iterations and implement the bound check. I think we can get a large speedup here, since for very many points eval will be far less than -1.
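To make the idea concrete, here is a rough sketch of what such a check might look like. This is only a sketch: cachedEval and slackBound are hypothetical names (not existing members of LMNNFunction), and the actual bound term still has to be derived.

    // Hypothetical sketch: skip the expensive Evaluate() calls whenever last
    // iteration's cached value guarantees that eval cannot exceed -1.
    // A real implementation would also have to accumulate the bound across
    // iterations in which the exact value was skipped.
    const double slackBound = /* term depending on || L_{t+1} - L_t ||_F */ 0.0;

    if (cachedEval(l, i) + slackBound < -1)
    {
      // eval cannot be greater than -1, so this triplet contributes no slack;
      // skip both metric.Evaluate() calls and update the bound as before.
      bp = l;
      break;
    }

    // Otherwise fall back to the exact computation and refresh the cache.
    eval = metric.Evaluate(transformedDataset.col(i),
               transformedDataset.col(targetNeighbors(j, i))) -
           metric.Evaluate(transformedDataset.col(i),
               transformedDataset.col(impostors(l, i)));
    cachedEval(l, i) = eval;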

(Actually, it might be interesting to run a quick experiment and see, typically, how many points have eval greater than -1, how many have eval greater than -2, etc., to get an idea of how much speedup this will get us.)
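A minimal sketch of how such an experiment could be instrumented, assuming the counters are declared before the loop and the check is placed right after eval is computed (the counter names are hypothetical):

    // Declared once before the loop.
    size_t countAboveNeg1 = 0, countBetweenNeg2AndNeg1 = 0, countBelowNeg2 = 0;

    // Bucket each eval value to estimate how often a bound could let us skip
    // the computation.
    if (eval > -1)
      ++countAboveNeg1;             // Slack term is active; must be computed.
    else if (eval > -2)
      ++countBetweenNeg2AndNeg1;    // Close to the boundary.
    else
      ++countBelowNeg2;             // Far below -1; a bound could likely skip these.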


manish7294 commented Jun 27, 2018

I ran some of the tests you mentioned, and I think it is apparent that there is a lot of potential speedup here.
Here are some of the figures:
Iris:

(eval > -1)    (-2 < eval < -1)    (eval < -2)
   1298              862              1590
    944              656              2150
    667              426              2657
    921              647              2182
    655              408              2687
    898              636              2216
    646              391              2713
    866              633              2251
    640              375              2735

vc2:

(eval > -1)    (-2 < eval < -1)    (eval < -2)
   1477              210              3488
   1430              226              3519
   1415              259              3501
   1427              257              3491
   1442              246              3487
   1445              257              3473
   1449              255              3471
   1457              248              3470

letters:

(eval > -1)    (-2 < eval < -1)    (eval < -2)
  30240            88219            377372
  15095            41007            443898
  20475            60067            419458
  24889            74262            400849

Comparatively, there are very few triplets for which we actually need to do the calculation.


rcurtin commented Jun 27, 2018

Great, this looks like it will be worthwhile to implement. Would you like to try to derive the bound, or would you like me to do that? I think it can be done based on the triangle inequality derivations in the lmnn_bounds.pdf file. I'm happy to do it if you'd prefer to work on other parts. :)


manish7294 commented Jun 28, 2018

I just gave it a quick thought, and I think the bound will be something like
eval < eval_old + || L_{t + 1} - L_t ||_F^2 * ( || transformedDataset.col(i) - transformedDataset.col(targetNeighbors(j, i)) ||^2 - || transformedDataset.col(i) - transformedDataset.col(impostors(l, i)) ||^2 )


manish7294 commented Jun 28, 2018

I tried doing some derivations, and here is what I found:
eval_new <= eval_old + || L_{t + 1} - L_t ||_F^2 * ( ||x_a||^2 + 2 * ||x_i||^2 + ||x_b||^2 )
Here x_a is the target neighbor, x_b is the impostor, and x_i is the current data point.

[Attached image: new doc 2018-06-28_1 (photo of the handwritten derivation)]
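For readability, the same claimed bound in LaTeX, with L_t denoting the transformation at iteration t:

    \mathrm{eval}_{t+1} \;\le\; \mathrm{eval}_{t}
        \;+\; \| L_{t+1} - L_{t} \|_F^2 \left( \| x_a \|^2 + 2 \| x_i \|^2 + \| x_b \|^2 \right)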


manish7294 commented Jun 28, 2018

So, as you mentioned in lmnn_bounds.pdf, we can cache ||x_i||^2 for all points, and then all that remains is calculating || L_{t + 1} - L_t ||_F^2.
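Both of those are cheap with Armadillo. A sketch, assuming dataset is the input data, transformation is the current transformation matrix, and oldTransformation is a hypothetical cached copy from the previous iteration:

    // Cache || x_i ||^2 for every point once, up front (one value per column).
    arma::rowvec squaredNorms = arma::sum(arma::square(dataset), 0);

    // Each iteration, compute || L_{t+1} - L_t ||_F^2 against the cached copy
    // of the previous transformation matrix.
    const double transformationDiff =
        std::pow(arma::norm(transformation - oldTransformation, "fro"), 2);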


manish7294 commented Jun 29, 2018

I tried this, and assuming there was nothing wrong with my implementation, either something is wrong with the bounds or the check just takes too much time; there is even a decrease in accuracy.


rcurtin commented Jul 2, 2018

I believe that your derivation is correct here, so are you sure the implementation is correct? If you want to push your code I can take a look at it.

I actually think the derivation is a little easier if you start with d(L_{t + 1} x_i, L_{t + 1} x_a) on the left hand side instead of d(L_t x_i, L_t x_a) (and the same for the second term), then you don't have to take the last step to "flip" the inequality. But it is still correct either way. :)

Another thing that can make this bound tighter is noting that the quantity bounded by the reverse triangle inequality is also bounded below by zero, so we can say, e.g., d(L_t x_i, L_t x_a) >= (what you derived) and also d(L_t x_i, L_t x_a) >= 0. But even before considering that, it should still work fine, so I think there must be an implementation issue or something.
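In other words, since distances are nonnegative, any reverse-triangle-inequality lower bound B can be replaced by max(0, B). Written out (B here stands for whatever expression the derivation above produces; its exact form is in the attached photo and is not reproduced here):

    d(L_t x_i,\, L_t x_a) \;\ge\; \max\!\big( 0,\; B(x_i, x_a, L_t, L_{t+1}) \big)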


manish7294 commented Jul 2, 2018

I will check the implementation once again as soon as I am back to work :)


rcurtin commented Jul 2, 2018

Of course, no hurry; there is no need to answer issues while you are on vacation. :)


rcurtin commented Jul 24, 2018

This is merged, so we can close this now. Thank you for the hard work, it is nice to see such speedups. 👍

rcurtin closed this Jul 24, 2018

rcurtin added the s: fixed label Jul 24, 2018
