# Implementation of BoostMetric Distance Learning #1487

Open · wants to merge 7 commits

## Conversation

Member

### manish7294 commented Aug 2, 2018

 An implementation of boostmetric distance learning technique.

Member

### rcurtin left a comment

 Just some quick first-pass comments. Do you have any timing simulations comparing it with LMNN? It would be interesting to see how it compares. Even though the two algorithms are pretty similar on paper, it seems the implementation ends up being quite different. Also, we can probably adapt some of the LMNN tests to BoostMetric. Definitely we should add tests one way or another, but you are probably already thinking that. :)

```cpp
 * @tparam MetricType The type of metric to use for computation.
 */
template<typename MetricType>
class BOOSTMETRIC
```

#### rcurtin Aug 2, 2018

Member

Seems to me like we should call it BoostMetric.

#### manish7294 Aug 3, 2018

Author Member

Sure

```cpp
  { /* nothing to do */ }

  // Calculate inner product of two matrices.
  inline double innerProduct(arma::mat& Ar, arma::mat& Z)
```

#### rcurtin Aug 2, 2018

Member

Why not just use `arma::accu(Ar % Z)`? (Actually `arma::dot()` might do the same thing, though you would likely need a `vectorise()` call there too.)

#### manish7294 Aug 3, 2018

Author Member

That's much better.

```cpp
arma::vec eigval;
arma::mat eigvec;
arma::eig_sym(eigval, eigvec, (Acap + trans(Acap) / 2));
```

#### rcurtin Aug 2, 2018

Member

I'd suggest timing this specific step also. If we need to, we can modify Armadillo to use ARPACK to compute only the largest eigenvalue, which can be a lot faster.

#### manish7294 Aug 3, 2018

Author Member

This varies from 0.020785s to 0.151616s for k = 3 to 45, where the total time varies from 0.061939s to 5.971738s on iris. But basically, it depends on the number of iterations the optimization takes. So it will definitely be good to avoid calculating unnecessary eigenvalues.

#### rcurtin Aug 6, 2018

Member

It'll also take much longer for datasets with large dimensionality, so I'd be interested to see how long it takes for covertype (55d) and MNIST (784d).

```cpp
u(j) = u(j) * std::exp(-H(j) * w);

// Normalize u so that sum(u_i) = 1.
u /= arma::sum(u);
```

#### rcurtin Aug 2, 2018

Member

Do we end up with cases where u is either extremely small or 0? In that case we could avoid adding it to the sum Acap.

#### manish7294 Aug 3, 2018

Author Member

Here's the simulation supporting this - https://gist.github.com/manish7294/01bb5b6d2f5c4cbdb60dc5af541e39dc
Let me know your thoughts on this.

#### rcurtin Aug 6, 2018

Member

I see, it looks like this could be a good pruning opportunity if we have time (but it's not clear to me whether this will provide much speedup; I'm not sure how much computation we save).

Member

### rcurtin commented Aug 6, 2018

 Thanks for the hard work on this; I haven't had time to review more deeply yet, but do you have any comparisons with the current LMNN code? (Sorry to ask twice, but I think that is one of the more important things we need to consider here.)
Member Author

### manish7294 commented Aug 6, 2018

 No worries! I remember doing some simulations earlier, at least on a small dataset, though I think I will do them again. I think this is what I was referring to - https://gist.github.com/manish7294/2388267666b1159ce261ce7b95dc923c
Member

### rcurtin commented Aug 15, 2018

 Right, so just FYI, I'm waiting on more accurate comparisons for both small and large datasets with the current LMNN code, to get a better idea of how this performs before I do a more in-depth review.
Member Author

### manish7294 commented Aug 15, 2018

 I am really sorry for the delay; I just got way too busy with college work, mainly due to industrial internship interviews. Hopefully, everything will be over by next week, and after that I will be able to work at full pace. Maybe I am asking too much, but do you think we can keep this on hold until then? Otherwise, I can try to take out some time.
Member

### rcurtin commented Aug 15, 2018

 Of course! There is no hurry. :) Good luck with the interviews! I just wanted to point it out in case you were waiting on me.
Member Author

### manish7294 commented Aug 15, 2018

 Thanks a lot, I will be back as soon as everything is over :)
Member Author

### manish7294 commented Aug 24, 2018

 @rcurtin Here are some simulations I performed today - https://gist.github.com/manish7294/fc50da22451fd676d0ab0f4ccb4bc2a0 They might not be sufficient, but I think they can at least help in getting some insights. While running simulations on the letters dataset, I noticed that the maximum eigenvalue kept oscillating instead of being minimized.