# LMNN: optimization by bounding slack term #1449

Closed
opened this Issue Jun 27, 2018 · 10 comments

### rcurtin commented Jun 27, 2018

This is an optimization for #1407. This one is focused on the following term, which appears during the computation of the `LMNNFunction` objective function and gradient:

```c++
eval = metric.Evaluate(transformedDataset.col(i),
                       transformedDataset.col(targetNeighbors(j, i))) -
       metric.Evaluate(transformedDataset.col(i),
                       transformedDataset.col(impostors(l, i)));

// Check bounding condition.
if (eval < -1)
{
  // Update bound.
  bp = l;
  break;
}
```

We're already avoiding the computation of extra slack terms with the `if (eval < -1)`, which is great, but we can do better! Using a (not-yet-derived) bound depending on `|| L_{t + 1} - L_t ||_F^2`, we can bound what `eval` will be at this iteration using last iteration's value. If it's not possible that `eval` is greater than `-1`, then we can avoid the `metric.Evaluate()` call entirely.

The first step here would be deriving the bound; then we would probably need to find a way to cache `eval`s from previous iterations and implement the bound. I think that we can get a large speedup here, since for very many points `eval` will be far less than `-1`. (Actually, it might be interesting to run a quick experiment and see, typically, how many points have `eval` greater than `-1`, how many have `eval` greater than `-2`, etc., to get an idea of how much speedup this will get us.)


### manish7294 commented Jun 27, 2018

I ran some of the tests as you mentioned, and I think it is apparent that there's a lot of speedup available. Here are some of the figures:

Iris:

```
(eval > -1)   (-2 < eval < -1)   (eval < -2)
1298          862                1590
944           656                2150
667           426                2657
921           647                2182
655           408                2687
898           636                2216
646           391                2713
866           633                2251
640           375                2735
```

vc2:

```
(eval > -1)   (-2 < eval < -1)   (eval < -2)
1477          210                3488
1430          226                3519
1415          259                3501
1427          257                3491
1442          246                3487
1445          257                3473
1449          255                3471
1457          248                3470
```

letters:

```
(eval > -1)   (-2 < eval < -1)   (eval < -2)
30240         88219              377372
15095         41007              443898
20475         60067              419458
24889         74262              400849
```

There are comparatively very few triplets for which we actually need to calculate `eval`.

### rcurtin commented Jun 27, 2018

 Great, this looks like it will be worthwhile to implement. Would you like to try to derive the bound, or would you like me to do that? I think it can be done based on the triangle inequality derivations in the lmnn_bounds.pdf file. I'm happy to do it if you'd prefer to work on other parts. :)

### manish7294 commented Jun 28, 2018

I just gave it a quick thought, and I think the bound will be something like: `eval < eval_old + || L_{t + 1} - L_t ||_F^2 * ( || transformedDataset.col(i) - transformedDataset.col(targetNeighbors(j, i)) ||_F^2 - || transformedDataset.col(i) - transformedDataset.col(impostors(l, i)) ||_F^2 )`


### manish7294 commented Jun 28, 2018 • edited

I tried doing some derivations, and per my findings: `eval_new <= eval_old + || L_{t + 1} - L_t ||_F^2 * ( ||x_a||^2 + 2 * ||x_i||^2 + ||x_b||^2 )`, where `x_a` is the target neighbor, `x_b` is the impostor, and `x_i` is the current data point.
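For reference, one way a bound of this shape can be derived is via the triangle inequality (a sketch, assuming `eval` is built from plain Euclidean distances $d(\cdot, \cdot)$; the squared-distance case used in the code follows the same pattern with squared terms):

```latex
\begin{aligned}
d(L_{t+1} x_i, L_{t+1} x_a)
  &\le d(L_t x_i, L_t x_a) + \|(L_{t+1} - L_t) x_i\| + \|(L_{t+1} - L_t) x_a\| \\
  &\le d(L_t x_i, L_t x_a) + \|L_{t+1} - L_t\|_F \, (\|x_i\| + \|x_a\|), \\
d(L_{t+1} x_i, L_{t+1} x_b)
  &\ge d(L_t x_i, L_t x_b) - \|L_{t+1} - L_t\|_F \, (\|x_i\| + \|x_b\|),
\end{aligned}
```

so that

```latex
\mathrm{eval}_{t+1} \le \mathrm{eval}_t
  + \|L_{t+1} - L_t\|_F \, (2 \|x_i\| + \|x_a\| + \|x_b\|).
```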

### manish7294 commented Jun 28, 2018

So, as you mentioned in lmnn_bounds.pdf, we can cache `||x_i||^2` for all points, and all that remains is calculating `|| L_{t + 1} - L_t ||_F^2`.

### manish7294 commented Jun 29, 2018

I tried this, and assuming there was nothing wrong with my implementation, either something is wrong with the bounds or it simply takes too much time; there is even a decrease in accuracy.

### rcurtin commented Jul 2, 2018

 I believe that your derivation is correct here, so are you sure the implementation is correct? If you want to push your code I can take a look at it. I actually think the derivation is a little easier if you start with `d(L_{t + 1} x_i, L_{t + 1} x_a)` on the left hand side instead of `d(L_t x_i, L_t x_a)` (and the same for the second term), then you don't have to take the last step to "flip" the inequality. But it is still correct either way. :) Another thing that can make this bound tighter is noting that the reverse triangle inequality is also bounded by zero---so we can say, e.g., `d(L_t x_i, L_t x_a) >= (what you derived)` and also `d(L_t x_i, L_t x_a) >= 0`. But even before considering that it should still work fine, so I think there must be an implementation issue or something.
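Written out, the tightening via nonnegativity of distances could look like this (again a sketch, assuming Euclidean distances $d(\cdot, \cdot)$):

```latex
d(L_{t+1} x_i, L_{t+1} x_b) \ge
  \max\!\left( d(L_t x_i, L_t x_b)
    - \|L_{t+1} - L_t\|_F \, (\|x_i\| + \|x_b\|),\; 0 \right).
```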

### manish7294 commented Jul 2, 2018

 I will check the implementation once again as soon as I am back to work :)

### rcurtin commented Jul 2, 2018

 Of course, no hurry---no need to answer the issues while you are on a vacation. :)


### rcurtin commented Jul 24, 2018

 This is merged, so we can close this now. Thank you for the hard work, it is nice to see such speedups. 👍