
LMNN: Tree optimization #1472

Open
wants to merge 5 commits into base: master
Conversation

manish7294 commented Jul 20, 2018

See #1445

As of now, the changes are made only for the L-BFGS optimizer; the plan is to adapt the same approach for the other optimizers once it works for L-BFGS. So, basically, one of the Impostors() overloads is modified.

rcurtin commented Jul 21, 2018

I spent a little while looking into the issue and I have a patch, but I'm still using this laptop where I have no way to copy-paste, so I've just mailed you the patch. It also has a few TODOs in it---there is one remaining bug to handle, and I realized the test coverage is lacking in a few places that we should fix (it will be easy to fix though).

manish7294 commented Jul 22, 2018

> // TODO: here is our bug. The tree's dataset does not hold the whole dataset---remember, it holds dataset.cols(indexDiff[i]). But when the call knn.Train(dataset.cols(indexDiff[i])) happens, that submatrix is copied and rearranged. So, it's likely the easiest thing to do here is pass (transformation - oldTransformation) and multiply that with Tree.Dataset(), to incrementally update the values in the dataset here.

Right, this is definitely a bug. Thanks for pointing this out.
Unfortunately, we can't do that here: we apply the transformation to the original dataset, whereas Tree.Dataset() already contains the dataset transformed in the previous iteration. So it would end up as transformation2 * (transformation1 * dataset). We need to figure out something else.

manish7294 commented Jul 22, 2018

@rcurtin Is it possible to stop rearranging the dataset when the tree is built?

manish7294 commented Jul 22, 2018

I think this will work:

Tree.Dataset() = transformation * inverseOf (oldTransformation) * Tree.Dataset();

But the problem here will be computing inverseOf(oldTransformation), since in our case it is not necessarily a square matrix.
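(For reference, Armadillo does provide a Moore-Penrose pseudoinverse, arma::pinv(); but a sketch like the following would only undo the old transformation when that matrix has full column rank, which a low-rank LMNN transformation does not:)

// Hypothetical: pinv(oldTransformation) * oldTransformation is the identity
// only when oldTransformation has full column rank; otherwise the
// information discarded by the projection cannot be recovered.
arma::mat undo = arma::pinv(oldTransformation);
Tree.Dataset() = transformation * undo * Tree.Dataset();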

What do you think?

rcurtin commented Jul 22, 2018

I think maybe storing the original reordered data matrices after tree building into some std::vector<arma::mat> could work here. So then you could do

tree.Dataset() = transformation * origTreeData[i];

or something to this effect.
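A minimal sketch of the caching side (the names knnObjects and origTreeData here are illustrative, not existing mlpack API):

// Cache each reference tree's (already reordered) dataset right after the
// trees are built; these copies stay untransformed.
std::vector<arma::mat> origTreeData(knnObjects.size());
for (size_t i = 0; i < knnObjects.size(); ++i)
  origTreeData[i] = knnObjects[i].ReferenceTree().Dataset();

// Then, before each search, re-derive the tree's dataset from the original:
// knnObjects[i].ReferenceTree().Dataset() = transformation * origTreeData[i];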

manish7294 commented Jul 23, 2018

I think it still won't be the original dataset. It will depend on the initial learning point, and even if the initial learning point is the identity matrix, it won't work for optimizers using batches.

rcurtin commented Jul 23, 2018

Have you tried the idea? Why do you think it won't work for optimizers using batches?

manish7294 commented Jul 23, 2018

Just after the first batch is processed, the transformation will be updated, and hence the second batch will see a differently transformed dataset.

rcurtin commented Jul 23, 2018

Why will the transformation update be a problem? What I am saying is that we have a set of reference trees that we built when we created the Constraints() objects. We can store the datasets on which we build the trees---these are untransformed. Then, whenever we want to do a search in Impostors(), we simply set the tree's dataset to the current transformation multiplied with the original untransformed dataset, update the tree, and search. This should work for both batches and full-dataset computations. If I can clarify what I mean, just let me know.

manish7294 commented Jul 23, 2018

Oh! Are you referring to the reference trees being built in the LMNN constructor? If that's the case, I think that will work; just a bit of tweaking will be needed to pass them to Constraints and adapt the search accordingly.

rcurtin commented Jul 23, 2018

Right, those are the important ones to cache. If we can cache query trees for the whole-dataset evaluations that is great too but it's not likely to be possible or useful to cache a query tree for the small batch case. Do you think you have enough info and ideas to finish the implementation from here?

manish7294 commented Jul 23, 2018

I think it's at least a good starting point. I will first try doing it from here, and if by any chance I face some problem, I will let you know.

@@ -384,9 +384,15 @@ class BinarySpaceTree
*/
ElemType FurthestDescendantDistance() const;

//! Modify the furthest possible descendant distance.
ElemType& FurthestDescendantDistance() { return furthestDescendantDistance; }

manish7294 Jul 23, 2018

It seems like this change is making a number of tests fail.

rcurtin Jul 23, 2018

What have you tried to debug this?

manish7294 commented Jul 23, 2018

@rcurtin Can you please check and verify that this is what you were referring to earlier today?

@@ -390,9 +390,6 @@ class BinarySpaceTree
//! Return the minimum distance from the center of the node to any bound edge.
ElemType MinimumBoundDistance() const;

//! Modify the minimum distance from the center of the node to any bound edge.
ElemType& MinimumBoundDistance() { return minimumBoundDistance; }

rcurtin Jul 23, 2018

Right, it looks like the other version of this overload calls bound.MinWidth(). So the minimumBoundDistance member appears to be entirely unused in BinarySpaceTree and we can remove it.
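(Presumably the const overload is implemented roughly like this; the exact expression is an assumption based on the description above, not verbatim mlpack source:)

// Computed directly from the bound, so no cached member is required.
ElemType MinimumBoundDistance() const { return bound.MinWidth() / 2.0; }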

manish7294 Jul 23, 2018

And this solves the failing-tests issue as well :)

rcurtin left a comment

Right, this looks good! Do you want to run some runtime simulations and see what kind of speedup we get, just like for #1466? (Later we can do simulations with both; for now, just the speedup of this over the master branch is good.)

{
double minWidth = DBL_MAX;
arma::Col<double> mins(node.Dataset().n_rows);
arma::Col<double> maxs(node.Dataset().n_rows);

rcurtin Jul 24, 2018

Technically this should actually be arma::Col<typename TreeType::ElemType> (or something like this) to make it maximally generic. So if you can change it, that would be great, but it won't affect any functionality.

manish7294 Jul 25, 2018

Done!

arma::mat& outputDistance,
const arma::mat& dataset,
const arma::Row<size_t>& labels)
const arma::Row<size_t>& labels,
const arma::mat& diffTransform)

rcurtin Jul 24, 2018

I think the name diffTransform is misleading here; shouldn't it just be transformation?

manish7294 Jul 25, 2018

Ahh! I forgot to change this one. Thanks for pointing that out! It's corrected now.

auto Tree = knnObjects[i].ReferenceTree();

Tree.Dataset() = diffTransform * origDatasets[i];
UpdateTree(Tree);

rcurtin Jul 24, 2018

If you want to make it even shorter you can do:

knnObjects[i].ReferenceTree().Dataset() = diffTransform * origDatasets[i];
UpdateTree(knnObjects[i].ReferenceTree());

In fact I am not sure if the auto will automatically infer a reference here (I don't remember exactly how auto works), so it's probably safer to do what I suggested unless auto is guaranteed to always infer a non-const reference instead of a copy or anything.
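For reference, the deduction rule here is fixed: plain auto never deduces a reference, so the original snippet would copy the tree; auto& is required to get a reference. A minimal illustration:

auto  treeCopy = knnObjects[i].ReferenceTree();  // deduces a value: deep copy
auto& treeRef  = knnObjects[i].ReferenceTree();  // deduces a reference

UpdateTree(treeRef);   // updates the tree held by knnObjects[i]
UpdateTree(treeCopy);  // would update only the local copy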

manish7294 Jul 25, 2018

Agreed, that's much better :)

Tree.Dataset() = diffTransform * origDatasets[i];
UpdateTree(Tree);

knnObjects[i].Train(Tree);

rcurtin Jul 24, 2018

This line actually shouldn't be necessary---we've already updated the tree internally (assuming UpdateTree() is called on a reference).

manish7294 Jul 25, 2018

Done!
Sorry for not being available yesterday; I was moving to my new dorm. It took some paperwork and a lot of hard work, and ate up almost half of today as well :(

rcurtin Jul 25, 2018

No worries, I know how much work a move can be! I hope the heavy lifting is all done at this point. :)

manish7294 commented Jul 26, 2018

Here's the simulation, though the results are not good: https://gist.github.com/manish7294/106c16e82cc4c7bf44375f6ea51b6334

rcurtin commented Jul 26, 2018

Which optimizer was this for? Did you time how long the tree update part took? If the results are not good, we must debug them.

manish7294 commented Jul 26, 2018

I have done some timing simulations. Apparently, UpdateTree() is taking quite some time and is the reason for the increase in runtimes.

rcurtin commented Jul 26, 2018

Can you please give full details of what you find? In general, if I am trying to help debug something like this, two sentences don't give me enough clarity to offer suggestions or help out. This applies in general, not just to debugging this particular PR, by the way.

And which optimizers were you using for these simulations, and how do others perform?

manish7294 commented Jul 26, 2018

I am using L-BFGS for this since, currently, it's the only optimizer the optimization code is written for. The others work over batches, so I need to figure out the details for those.

Here's a simple comparison of the Impostors() calculation timing: for k = 10, the master branch LMNN takes 4.205126s, whereas the tree-optimized version takes 5.882666s, and of that, UpdateTree() alone takes 1.587435s.

This is all in terms of runtime simulations. I understand this doesn't make much sense for debugging purposes; earlier I was just mentioning the relative timings, sorry if that created confusion. Basically, I meant that UpdateTree() is taking a significant amount of time; maybe iteratively updating all the descendants of every node is just too costly.

rcurtin commented Jul 26, 2018

The results you're giving are all really inexact and it's hard for me to provide any in-depth guidance other than "have you tried to figure out why UpdateTree() is slow?"

Unfortunately the reality is that as a mentor there is no way I can devote the kind of time to this project that you are able to. I can try to do my best, but if you don't provide very in-depth observations about what is going on and why you think it is going on, then I have to make the in-depth observations myself, which can take many hours. To be honest, I don't really have the time to do that, so if you can't dig deeper here we're not likely to make any progress. If we hope to publish something on our implementation here, the publishability of our work to some extent depends on the optimization strategies being non-obvious; so, for instance, simply implementing it in C++ is not sufficient, instead we must provide additional speedup by using trees, pruning unneeded computations, and so forth. It's my belief that caching trees should be an optimization that should provide us some speedup.

Certainly you are right that iterating over all the descendants is costly, so you should be able to devise a way to update the bound values directly. Probably this means that instead of caching origDatasets, you'll have to store the trees themselves in some origTrees vector; for instance, something like the sketch below.
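(A hypothetical sketch; origTrees is an assumed name, and the tree's copy constructor is assumed to deep-copy:)

// Keep copies of the reference trees built on the untransformed data, so
// their per-node structure and bounds can serve as the starting point for a
// direct bound update instead of walking every descendant.
std::vector<TreeType> origTrees;
origTrees.reserve(knnObjects.size());
for (size_t i = 0; i < knnObjects.size(); ++i)
  origTrees.push_back(knnObjects[i].ReferenceTree());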

manish7294 commented Jul 26, 2018

Right, I understand! Don't worry, I am already working on this. I just meant to update you that, as of now, the implementation is slow.

rcurtin commented Jul 26, 2018

Ok, that is fine---but please be sure, when you write up an update, to be really clear about what you've done, what the simulation results show, why you think it was the right thing to do, etc. Not only does this help me out, but it helps us both out later when we go back to write up the full set of accelerations we found for LMNN.

manish7294 commented Jul 27, 2018

c3d4c9b: this change has helped make a significant improvement in this PR's runtimes. I think there is still a lot that can be improved.

I think I may have an idea to get some more speedup. Currently, we are calculating the mins and maxs for every node using the following lines of code:

arma::Col<typename TreeType::ElemType> mins(min(node.Dataset().cols(
      node.Begin(), node.Begin() + node.Count() - 1), 1));
arma::Col<typename TreeType::ElemType> maxs(max(node.Dataset().cols(
     node.Begin(), node.Begin() + node.Count() - 1), 1));

I think instead of recalculating this multiple times for each node, why don't we calculate it just once, by iteratively computing the mins and maxs for every dimension, starting from the last data point and moving to the first? That would take just O(dataset.n_cols * d) time and O(dataset.n_cols * d) space, and then we could carefully fetch the mins and maxs for a particular node from the precomputed result. Some more details still need to be figured out. Do you think it's a reasonable idea? A rough sketch of what I mean is below.
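(This is purely illustrative; it assumes that a node's points run contiguously from node.Begin() to the end of the dataset, which I still need to verify:)

// Hypothetical sketch: one backward pass computes, for each starting column
// j, the per-dimension mins/maxs over columns j..n-1, in O(n * d) time and
// O(n * d) space.
const arma::mat& data = node.Dataset();
const size_t n = data.n_cols;
arma::mat suffixMins(data.n_rows, n);
arma::mat suffixMaxs(data.n_rows, n);
suffixMins.col(n - 1) = data.col(n - 1);
suffixMaxs.col(n - 1) = data.col(n - 1);
for (size_t j = n - 1; j > 0; --j)
{
  // Element-wise min/max against the suffix starting one column later.
  suffixMins.col(j - 1) = arma::min(data.col(j - 1), suffixMins.col(j));
  suffixMaxs.col(j - 1) = arma::max(data.col(j - 1), suffixMaxs.col(j));
}
// A node whose points were columns [node.Begin(), n) could then read its
// bounds from suffixMins.col(node.Begin()) and suffixMaxs.col(node.Begin()).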

manish7294 commented Jul 28, 2018

@rcurtin Let's say we have a node whose points begin at index node.Begin(). Is it the case that it uses all the data points between node.Begin() and the end of the dataset, making node.Count() = dataset.n_cols - node.Begin(), or can node.Count() be anything?

manish7294 commented Jul 30, 2018

@rcurtin My idea above of computing the mins and maxs only once didn't work out as expected, since I had assumed that each node uses the data from node.Begin() to the end of the dataset. Other than this, I don't have any other ideas for getting a speedup.

One strange thing I have noticed (I am really sorry if this isn't quite clear; it's hard for me to explain): I see, on average, an increase in the number of node combinations scored and base cases calculated when using UpdateTree() compared to not using it, even though the final distance matrix is the same in both cases. I feel this is the reason for the lack of speedup.

For instance, here's a comparison of the above two quantities on iris.

With UpdateTree():

[INFO ] 215 node combinations were scored.
[INFO ] 550 base cases were calculated.
[INFO ] 244 node combinations were scored.
[INFO ] 1164 base cases were calculated.
[INFO ] 208 node combinations were scored.
[INFO ] 1686 base cases were calculated.
[DEBUG] L-BFGS iteration 1; objective 531.336, gradient norm 324.273, 0.239603.
[INFO ] 211 node combinations were scored.
[INFO ] 550 base cases were calculated.
[INFO ] 263 node combinations were scored.
[INFO ] 1100 base cases were calculated.
[INFO ] 208 node combinations were scored.
[INFO ] 1789 base cases were calculated.
[DEBUG] L-BFGS iteration 2; objective 498.662, gradient norm 188.14, 0.0614949.
[INFO ] 264 node combinations were scored.
[INFO ] 550 base cases were calculated.
[INFO ] 266 node combinations were scored.
[INFO ] 1035 base cases were calculated.
[INFO ] 230 node combinations were scored.
[INFO ] 1789 base cases were calculated.
[DEBUG] L-BFGS iteration 3; objective 498.097, gradient norm 52.4243, 0.00113241.

Average node combinations: 234.33
Average base cases: 1134.778

Without UpdateTree():

[INFO ] 159 node combinations were scored.
[INFO ] 750 base cases were calculated.
[INFO ] 283 node combinations were scored.
[INFO ] 946 base cases were calculated.
[INFO ] 150 node combinations were scored.
[INFO ] 1505 base cases were calculated.
[DEBUG] L-BFGS iteration 1; objective 531.336, gradient norm 324.273, 0.239603.
[INFO ] 112 node combinations were scored.
[INFO ] 750 base cases were calculated.
[INFO ] 262 node combinations were scored.
[INFO ] 974 base cases were calculated.
[INFO ] 148 node combinations were scored.
[INFO ] 1653 base cases were calculated.
[DEBUG] L-BFGS iteration 2; objective 498.662, gradient norm 188.14, 0.0614949.
[INFO ] 142 node combinations were scored.
[INFO ] 750 base cases were calculated.
[INFO ] 268 node combinations were scored.
[INFO ] 1165 base cases were calculated.
[INFO ] 170 node combinations were scored.
[INFO ] 1505 base cases were calculated.
[DEBUG] L-BFGS iteration 3; objective 498.097, gradient norm 52.4243, 0.00113241.

Average node combinations: 188.22
Average base cases: 1110.888

And with my earlier idea of precomputing the mins and maxs, I saw the same thing: the correct final distance matrix, but a large increase in node combinations and base case calculations, leading to bad runtime.

rcurtin commented Jul 31, 2018

Right---as we update the tree, it's no longer likely to be the best tree for search. So the number of prunes will slightly decrease (and in turn the number of scores and base cases will go up, which is exactly what you saw). However the idea here is that the cost of rebuilding the tree is far higher than the slight degradation of reusing the tree. But this only works if we can update the tree quickly.

Let me play with this one in the context of the branch I'm working in, and let's see if I can make it any better. I'll use your code as a base idea---thank you for the hard work, it will be easy to plug new ideas in because of how you've structured the code. 👍

I'll post an update here if I can get an improvement. :)


mlpack-bot bot commented Feb 18, 2019

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

mlpack-bot added the s: stale label Feb 18, 2019

mlpack-bot removed the s: stale label Feb 18, 2019
