
LMNN: low-rank matrix optimization #1447

Open
rcurtin opened this Issue Jun 27, 2018 · 17 comments

rcurtin commented Jun 27, 2018

This one is already implemented in #1407, so really this issue is just about verifying that it works okay. We want to ensure that when we run LMNN with a low-rank distance matrix, the following things are true:

  1. We can get roughly equivalent kNN accuracy improvements. (That will vary considerably with the choice of rank and dataset.)
  2. We get some amount of speedup by the use of the low-rank dataset, and the speedup increases as the rank gets lower (if the number of iterations is held constant).

I expect that low-rank does make it faster, since kNN search will be faster as the dimension decreases. So really this issue, at this point, is more of a sanity check to make sure that this idea works as we expect.
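For concreteness, here is a rough sketch of the kind of run I mean. The LMNN interface below is an assumption based on the in-progress #1407 code (which, like NCA's `LearnDistance()`, should take the initial matrix's size as the rank); the filenames are just the test datasets.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/lmnn/lmnn.hpp>

using namespace mlpack;

int main()
{
  arma::mat dataset;
  arma::Row<size_t> labels;
  data::Load("vc2.csv", dataset, true);
  data::Load("vc2_labels.txt", labels, true);

  // rank < dataset.n_rows gives a low-rank transformation.
  const size_t rank = 3;

  lmnn::LMNN<> lmnn(dataset, labels, 5 /* k */);

  // Start from a random (rank x d) matrix; its shape determines the rank.
  arma::mat transformation = arma::randu<arma::mat>(rank, dataset.n_rows);
  lmnn.LearnDistance(transformation);

  // kNN on the (rank x n) mapped points should be cheaper than on the
  // original (d x n) data, since search cost grows with dimension.
  arma::mat reduced = transformation * dataset;
}
```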

@rcurtin rcurtin added the T: task label Jun 27, 2018

@rcurtin rcurtin changed the title LMNN: low-rank matrix optimizatio LMNN: low-rank matrix optimization Jun 27, 2018

manish7294 commented Jun 27, 2018

I tried running a test for this one, but the results vary just too much: sometimes learning a low-rank matrix performs better, other times not so much (for vc2 it's always way better and the runtime is almost half; on the other hand, with iris the results are more or less the same). Overall I would say reducing the data dimensionality has some real, visible effects on the outcome. As a result, I have commented out the test for now: bc3b5bb

rcurtin commented Jun 27, 2018

Right, I am not too surprised---I would expect some amount of variation depending on the dataset. iris is already very low-dimensional so I would not expect to be able to reduce dimensionality further, but for vc2 it might do better. And I bet that covertype could do well too.

Do you think you could paste some of the results you have here? (Multiple runs would probably be good, since the random starting point could affect the results.)

manish7294 commented Jun 28, 2018

Parameters (k = 5, optimizer: L-BFGS)
Iris:
Accuracy:
Initial Accuracy: 96.6667%
Rank = 1: 97.3333%
Rank = 2: 96.6667%
Rank = 3: 97.3333%
Rank = 4 (max dimensionality reached): 96.6667%

vc2:
Accuracy:
Initial Accuracy: 77.7778%
Rank = 1: 71.4976%
Rank = 2: 78.2609%
Rank = 3: 82.1256%
Rank = 4: 77.7778%
Rank = 5: 77.7778%
Rank = 6 (max dimensionality reached): 78.744%

letters_recognition (Range: 100):
Accuracy:
Initial Accuracy: 99.505%
Rank = 1: 31.92%
Rank = 2: 47.565%
Rank = 3: 95.645%
Rank = 5: 99.86%
Rank = 10: 100%
Rank = 17 (max dimensionality reached): 100%

rcurtin commented Jun 28, 2018

Ah, can I ask you to re-run these simulations? Could you run each one 10 times with different random seeds, and report the mean accuracy, variance of the accuracy (or stddev, either is fine), min and max, as well as the mean/variance/min/max of the runtime? That will help give us a better idea of how much improvement we could expect. Also, could you run on the higher-dimensionality covertype dataset? (You can use a subset, that's ok.) Maybe MNIST is worth doing too but it will take a long time.
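To be concrete about the aggregation, something like this is what I have in mind; `runLMNNTrial()` is just a hypothetical stand-in for one full LMNN + kNN evaluation.

```cpp
#include <armadillo>
#include <iostream>

// Hypothetical stand-in for one full LMNN + kNN run with the given seed;
// it should return the resulting kNN accuracy (runtime works the same way).
double runLMNNTrial(const size_t seed)
{
  arma::arma_rng::set_seed(seed); // new random starting point per trial
  /* ...learn the transformation, evaluate kNN accuracy... */
  return 0.0; // placeholder
}

int main()
{
  arma::vec acc(10);
  for (size_t s = 0; s < acc.n_elem; ++s)
    acc(s) = runLMNNTrial(s + 1);

  std::cout << "mean " << arma::mean(acc) << ", var " << arma::var(acc)
            << ", min " << acc.min() << ", max " << acc.max() << std::endl;
  return 0;
}
```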

manish7294 commented Jun 29, 2018

Parameters (k = 5, optimizer: L-BFGS)

Each result is over 10 runs, all with different seeds; the timings are from my system with DEBUG=ON, and include the KNNAccuracy() time as well.

Iris:

| Rank | Acc. mean (%) | Acc. variance | Acc. min (%) | Acc. max (%) | Time mean | Time variance | Time min | Time max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 97.33 | 0 | 97.33 | 97.33 | 0.82 | 0.0003 | 0.79 | 0.85 |
| 2 | 96.67 | 0 | 96.67 | 96.67 | 0.86 | 0.0006 | 0.83 | 0.92 |
| 3 | 97.33 | 0 | 97.33 | 97.33 | 0.89 | 0.0003 | 0.87 | 0.92 |
| 4 | 96.67 | 0 | 96.67 | 96.67 | 0.83 | 0.0003 | 0.81 | 0.86 |

vc2:

| Rank | Acc. mean (%) | Acc. variance | Acc. min (%) | Acc. max (%) | Time mean | Time variance | Time min | Time max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 71.50 | 0 | 71.50 | 71.50 | 2.57 | 0.0020 | 2.51 | 2.67 |
| 2 | 78.26 | 0 | 78.26 | 78.26 | 2.19 | 0.0034 | 2.11 | 2.29 |
| 3 | 82.13 | 0 | 82.13 | 82.13 | 2.49 | 0.0034 | 2.40 | 2.60 |
| 4 | 77.78 | 0 | 77.78 | 77.78 | 2.81 | 0.0019 | 2.77 | 2.91 |
| 5 | 77.78 | 0 | 77.78 | 77.78 | 2.75 | 0.0060 | 2.61 | 2.90 |
| 6 | 78.74 | 0 | 78.74 | 78.74 | 3.32 | 0.0027 | 3.22 | 3.40 |

covertype-5k (Range: 10):

| Rank | Acc. mean (%) | Acc. variance | Acc. min (%) | Acc. max (%) | Time mean | Time variance | Time min | Time max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 38.14 | 0 | 38.14 | 38.14 | 107.49 | 0.4584 | 106.42 | 108.78 |
| 2 | 40.82 | 0 | 40.82 | 40.82 | 121.82 | 9.3717 | 118.57 | 125.11 |
| 10 | 86.58 | 0 | 86.58 | 86.58 | 156.45 | 25.4719 | 149.34 | 164.03 |
| 30 | 86.20 | 0 | 86.20 | 86.20 | 309.86 | 39.6771 | 304.14 | 319.99 |
| 40 | 82.84 | 0 | 86.20 | 86.20 | 184.96 | 14.1565 | 179.64 | 187.73 |
rcurtin commented Jun 30, 2018

Hmm, there's no variance in the result accuracy. Are you sure this is running correctly with different random seeds?

manish7294 commented Jun 30, 2018

These were the results I got when running with different seeds, but later the same day I found a logical error in how the iterations were updated, for which I pushed a commit. So the results may not be absolutely correct; I still have to check the accuracy after the new changes.

rcurtin commented Jul 2, 2018

Right, I saw that, but no matter what there should definitely be some amount of variance in the accuracy---I don't expect that LMNN would converge to the same result with SGD when it is starting from a different (random) initial point each time.

manish7294 commented Jul 2, 2018

I will have to cross-check with SGD-like optimizers.

manish7294 commented Jul 4, 2018

I see the problem: no matter what seed I pass, arma::randu() initializes the matrix to the same values every time. When I replaced math::RandomSeed() with arma::arma_rng::set_seed(), the variance in accuracy across different seeds became visible.
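A minimal repro of what I was seeing; the fix is just to seed Armadillo's own generator (the `arma_rng::set_seed()` call is the actual Armadillo API, the rest is illustrative).

```cpp
#include <armadillo>
#include <iostream>

int main()
{
  // Seeding Armadillo's own RNG controls arma::randu(); seeding only the
  // standard library RNG (as I was effectively doing) does not.
  arma::arma_rng::set_seed(42);
  arma::mat a = arma::randu<arma::mat>(2, 2);

  arma::arma_rng::set_seed(1337);
  arma::mat b = arma::randu<arma::mat>(2, 2);

  // a and b now differ, so a randomly initialized low-rank transformation
  // will vary from seed to seed as expected.
  std::cout << "a:\n" << a << "b:\n" << b;
  return 0;
}
```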

manish7294 commented Jul 9, 2018

These are the results with the current LMNN code after solving the RandomSeed() issue. The parameters are the same as in the previous simulation, except that each result is over 5 runs (previously 10) with different seeds.

Iris:

| Rank | Acc. mean (%) | Acc. variance | Acc. min (%) | Acc. max (%) | Time mean | Time variance | Time min | Time max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 97.33 | 0.0000 | 97.33 | 97.33 | 0.91 | 0.0070 | 0.82 | 1.04 |
| 2 | 97.06 | 0.3591 | 96.67 | 98.00 | 1.32 | 0.3152 | 0.87 | 2.09 |
| 3 | 96.93 | 0.1307 | 96.67 | 97.33 | 0.93 | 0.0068 | 0.88 | 1.07 |
| 4 | 97.33 | 0.2211 | 96.67 | 98.00 | 1.15 | 0.1355 | 0.82 | 1.69 |

vc2:

| Rank | Acc. mean (%) | Acc. variance | Acc. min (%) | Acc. max (%) | Time mean | Time variance | Time min | Time max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 73.91 | 16.570 | 68.59 | 79.71 | 5.02 | 4.5000 | 2.60 | 7.43 |
| 2 | 77.29 | 0.5834 | 76.32 | 78.26 | 2.66 | 0.0055 | 2.57 | 2.76 |
| 3 | 81.25 | 0.6310 | 80.19 | 82.12 | 2.96 | 0.0104 | 2.81 | 3.10 |
| 4 | 78.93 | 4.2666 | 77.29 | 81.64 | 3.02 | 0.0642 | 2.74 | 3.43 |
| 5 | 78.16 | 0.6341 | 77.29 | 79.22 | 3.17 | 0.0522 | 2.77 | 3.33 |
| 6 | 78.84 | 0.5136 | 77.77 | 79.71 | 3.12 | 0.0191 | 2.99 | 3.34 |

rcurtin commented Jul 9, 2018

Right, looks good. Could you also give results for covertype-5k and letters_recognition? For the accuracy results, it looks like as long as the rank is "high enough" (which for iris seems to be 1 or larger, and for vc2 3 or larger), LMNN can still perform well, which is good.

The runtime results look roughly like I was hoping, but it seems like in some cases the number of iterations must be different---for instance, iris with rank 2 takes longer than rank 3, and vc2 with rank 1 takes twice as long as with rank 2. Could you verify that when you hold the number of iterations constant, we see the expected pattern? (where runtime increases as the rank increases) You could do this by running with, e.g., max_iterations = 5 or something. The accuracy results may not be great, but we'll get a good idea of how much speedup low-rank computation can give us, while not having the number of iterations be a point of confusion in our results.
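Roughly like this is what I mean (assuming the #1407 class exposes its optimizer the way NCA does; the accessor names here are assumptions, not the final API):

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/lmnn/lmnn.hpp>
#include <mlpack/core/optimizers/lbfgs/lbfgs.hpp>

using namespace mlpack;

// Run LMNN for a fixed iteration budget at the given rank, so runtime
// differences reflect the rank alone rather than convergence behavior.
void FixedIterationRun(const arma::mat& data,
                       const arma::Row<size_t>& labels,
                       const size_t rank)
{
  lmnn::LMNN<metric::SquaredEuclideanDistance, optimization::L_BFGS>
      lmnn(data, labels, 5 /* k */);
  lmnn.Optimizer().MaxIterations() = 5; // hold the iteration count constant

  arma::mat transformation = arma::randu<arma::mat>(rank, data.n_rows);
  lmnn.LearnDistance(transformation);
}
```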

Remember overall the questions we'd like to answer are:

  • Can low-rank optimization still provide good-enough performance? (seems like yes, we should see on covertype-5k and letters_recognition too)
  • Does low-rank optimization provide acceleration? (this can be done with the experiment holding the number of iterations constant)

I think we are close to being able to answer both questions positively. :)

manish7294 commented Jul 9, 2018

May I take until tomorrow for the further simulations? They take quite some time ;)

I ran some quick simulations on iris with max_iterations = 5, and the results depend entirely on the seed: the runtime ranges from as low as 0.14 to as high as 0.87 (as can be seen from the results above).

rcurtin commented Jul 9, 2018

Sure, it's ok if it takes some time. Are you sure the maximum number of iterations is being properly passed and reached? I can't see any reason why the runtime for a given maximum number of iterations would depend on the random seed.

manish7294 commented Jul 9, 2018

As per the debug log, the number of iterations is exactly 5. However, the number of node combinations and base cases computed (which I think is part of the neighbor search) differs a lot.

rcurtin commented Jul 10, 2018

Hmm, I guess the different low-rank transformations could cause significantly different structure in the data, which could then cause significantly different nearest neighbor search runtimes. I would not have expected that to matter so much, but I think I will try to do a little investigation. In the meantime, I do think the results are at least promising (even if the speedup is not so much). For much higher-dimensional datasets, I expect that low-rank LMNN could provide even more speedup.

mlpack-bot bot commented Feb 18, 2019

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Feb 18, 2019

@rcurtin rcurtin added s: keep open and removed s: stale labels Feb 18, 2019
