
Implementation of BoostMetric Distance Learning #1487

Closed
wants to merge 7 commits into from

Conversation

@manish7294 (Member)

An implementation of the BoostMetric distance metric learning technique.

@rcurtin (Member) left a comment:

Just some quick first-pass comments. Do you have any timing simulations comparing it with LMNN? It would be interesting to see how the two stack up. Even though the two algorithms are pretty similar on paper, it seems the implementations end up being quite different.

Also, we can probably adapt some of the LMNN tests to BoostMetric. Definitely we should add tests one way or another, but you're probably already thinking that. :)

* @tparam MetricType The type of metric to use for computation.
*/
template<typename MetricType = metric::SquaredEuclideanDistance>
class BOOSTMETRIC
@rcurtin (Member):

Seems to me like we should call it BoostMetric.

@manish7294 (Member Author):

Sure

{ /* nothing to do */ }

// Calculate inner product of two matrices.
inline double innerProduct(arma::mat& Ar, arma::mat& Z)
@rcurtin (Member):

Why not just use arma::accu(Ar % Z)? (Actually, arma::dot() might do the same thing, but you'd likely need a vectorise() call there too.)

@manish7294 (Member Author):

That's much better.


arma::vec eigval;
arma::mat eigvec;
arma::eig_sym(eigval, eigvec, (Acap + trans(Acap)) / 2);
@rcurtin (Member):

I'd suggest timing this specific step too. If we need to, we can modify Armadillo to use ARPACK to compute only the largest eigenvalue, which can be a lot faster.

@manish7294 (Member Author):

This varies from 0.020785s to 0.151616s for k = 3 to 45, while the total time varies from 0.061939s to 5.971738s on iris. But basically, it depends on how many iterations the optimization takes. So it would definitely be good to avoid computing unnecessary eigenvalues.

@rcurtin (Member):

It'll also take much longer for datasets with high dimensionality, so I'd be interested to see how long it takes on covertype (55d) and MNIST (784d).

u(j) = u(j) * std::exp(-H(j) * w);

// Normalize u so that sum(u_i) = 1.
u /= arma::sum(u);
@rcurtin (Member):

Do we end up with cases where u is either extremely small or zero? In those cases we could avoid adding the corresponding term to the sum Acap.

@manish7294 (Member Author):

Here's the simulation supporting this - https://gist.github.com/manish7294/01bb5b6d2f5c4cbdb60dc5af541e39dc
Let me know your thoughts on this.

@rcurtin (Member):

I see; it looks like this could be a good pruning opportunity if we have time (though it's not clear to me how much speedup it would provide, since I'm not sure how much computation we'd actually save).

@rcurtin (Member) commented Aug 6, 2018:

Thanks for the hard work on this; I haven't had time to review more deeply yet, but do you have any comparisons with the current LMNN code? (Sorry to ask twice, but I think that's one of the more important things we need to consider here.)

@manish7294 (Member Author):

No worries! I remember doing some simulations earlier, at least on a small dataset, though I think I will redo them. I think this is what I was referring to: https://gist.github.com/manish7294/2388267666b1159ce261ce7b95dc923c

@rcurtin (Member) commented Aug 15, 2018:

Right, so just FYI, I'm waiting on more accurate comparisons for both small and large datasets with the current LMNN code, to get a better idea of how this performs before I do a more in-depth review.

@manish7294 (Member Author):

I am really sorry for the delay; I just got way too busy with college work, mainly due to industrial internship interviews. Hopefully everything will be over by next week, and after that I will be able to work at full pace. Maybe I am asking too much, but do you think we can keep this on hold until then, or should I try to take out some time?

@rcurtin (Member) commented Aug 15, 2018:

Of course! There is no hurry. :) Good luck with the interviews! I just wanted to point it out in case you were waiting on me.

@manish7294 (Member Author):

Thanks a lot, I will be back as soon as everything is over :)

@manish7294 (Member Author):

@rcurtin Here are some simulations I performed today - https://gist.github.com/manish7294/fc50da22451fd676d0ab0f4ccb4bc2a0

They might not be sufficient, but I think they can at least help in getting some insights.

While running simulations on the letters dataset, I noticed that the maximum eigenvalue kept oscillating instead of decreasing.

@mlpack-bot (bot) commented Jul 19, 2019:

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Jul 19, 2019
@manish7294 (Member Author):

Ahh, finally, after a whole year of waiting, the bot took its call and marked this one as stale :)
So I think we should decide something for this, @rcurtin.

@mlpack-bot mlpack-bot bot removed the s: stale label Jul 19, 2019
@rcurtin (Member) commented Jul 19, 2019:

Yeah, this one fell off my list too, but I do think we should merge it in when we can. Let me try to do a review in the near future. I'll mark it 'keep open'. 👍
