Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement KDE #1301

Merged
merged 155 commits into from Jan 18, 2019

Conversation

Projects
None yet
3 participants
@robertohueso
Copy link
Contributor

commented Mar 9, 2018

Add Kernel Density Estimation algorithm with dual-trees (Based on the paper "Tree-Independent Dual-Tree Algorithms")
Referring to issue #150

Work in progress PR...

robertohueso added some commits Mar 9, 2018

Add a first implementation of KDE
Just a preview of Kernel Density Estimation algorithm implemented with dual trees
@rcurtin
Copy link
Member

left a comment

Thanks for taking on this task. It looks like a good start to me, but I would suggest writing a few simple tests next to make sure it is working ok. If something is confusing to you or doesn't seem to be working right let me know and I can try to help.

robertohueso added some commits Apr 6, 2018

Add KDE simple test
Also fix small compilation error on KDE
Python binding
Fix KDE dual-tree algorithm
There was a problem with trees using
a wrong dataset
@robertohueso

This comment has been minimized.

Copy link
Contributor Author

commented Apr 14, 2018

@rcurtin Should we allow CLI / Python binding users to select different metric/kernels/trees?
When I use templates like this it must be defined on compile time right?

template<typename MetricType = mlpack::metric::EuclideanDistance,
         typename MatType = arma::mat,
         typename KernelType = kernel::GaussianKernel,
         template<typename TreeMetricType,
                  typename TreeStatType,
                  typename TreeMatType> class TreeType = tree::KDTree>
@rcurtin
Copy link
Member

left a comment

Hi there Roberto,

Looks like there has been a lot of progress on this---thank you for taking the time to do that. I think mlpack has been missing KDE for a long time and I would be really happy to merge it.

So far the implementation looks pretty good to me. The single-tree pruning case should be pretty similar to the dual-tree pruning case. I left some comments that I hope are helpful. I think that the core of the algorithm is basically done (assuming it works and passes the tests) so the hard work is done, and only some refactoring and updating the existing code remains, then we can get to a merge.

@rcurtin Should we allow CLI / Python binding users to select different metric/kernels/trees?
When I use templates like this it must be defined on compile time right?

Yes, I think it would be good to do this. You're right that the type has to be known at compile time, so there is some wrangling to take the user's desired kernel and make it work. For instance you might need to do something like:

std::string kernelType = CLI::GetParam<std::string>("kernel");
if (kernelType == "gaussian")
{
  // get Gaussian kernel parameters and run KDE<GaussianKernel>...
}
else if (kernelType == "laplacian")
{
  // get laplacian kernel parameters and use KDE<LaplacianKernel>...
}
...

Do let me know if there is anything else I can help with. I was really busy with non-mlpack work last week but now things are a lot better for me so I am more available to help.

Show resolved Hide resolved src/mlpack/methods/kde/kde_main.cpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde_impl.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde_rules.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde_rules_impl.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde_rules_impl.hpp Outdated
Show resolved Hide resolved src/mlpack/tests/kde_test.cpp Outdated

robertohueso added some commits Apr 25, 2018

Delete leafSize parameter for KDE trees
A new constructor with a tree as a parameter
will handle different leaf sizes

robertohueso added some commits May 4, 2018

Implement Evaluate(Tree...)
Also add TreeAsArguments test
@robertohueso

This comment has been minimized.

Copy link
Contributor Author

commented May 5, 2018

Work to do:

  • Review KDERules
  • Add more and better tests
  • Implement KDE main
  • Add documentation
  • Add breadth-first support
@rcurtin

This comment has been minimized.

Copy link
Member

commented May 7, 2018

Looking good, let me know if you need any help or when it's ready for a detailed review. 👍

@robertohueso

This comment has been minimized.

Copy link
Contributor Author

commented May 10, 2018

Thanks =)

@robertohueso

This comment has been minimized.

Copy link
Contributor Author

commented Jan 7, 2019

I think this is closer to be ready :)
@rcurtin Could you have a look at the changes?

@rcurtin

rcurtin approved these changes Jan 7, 2019

Copy link
Member

left a comment

Thanks again for your hard work with this. I think this is one of the PRs that has had the most work go into it and it will definitely be the oldest bug that is resolved when we close #150 (that was originally opened in October 2011, so more than seven years ago). 👍

I left a few comments that you can optionally handle before merge, but it's up to you. There are also some things that I'll fix during merge but it would be too much work to point them all out and they are all really easy to fix. So if you want to go through and handle it now, feel free, if not, I'll do it during merge. No functionality changes, just mostly style things:

  • Adapt comments to complete sentences or make sure that they end with .
  • Sometimes add some spaces between blocks of code, like at kde_impl.hpp lines 230/231, 235/236, and so forth.
  • Reorder src/mlpack/methods/CMakeLists.txt so that it's alphabetical :)
  • Remove any underscores from variable names and convert to camelcase
  • When we have ifs where the body is multiple wrapped lines, we should enclose it in braces
  • If there is an if that has braces, the else should also have braces

Those are all really minor, so like I said, don't feel obligated.

Since this is such a big PR, I'll wait for a week before merging in case anyone else wants to take a look at it. Thanks again for the hard work over all the months. It will be really nice to have this functionality as a part of mlpack---dual-tree KDE is super fast and a beautiful showcase of fast mlpack implementations. 👍

"If specified, the KDE model will be saved here.",
"M");

// Configuration options

This comment has been minimized.

Copy link
@rcurtin

rcurtin Jan 7, 2019

Member

Often it can be really nice to specify leaf_size for kd-trees. I think you left a comment somewhere else hinting that it could be added later. So, we can open an issue to be handled later for this, or you can do it before merge---either way is just fine with me.

This comment has been minimized.

Copy link
@robertohueso

robertohueso Jan 7, 2019

Author Contributor

There're a couple of comments about that but I won't have time to do it since I have to start studying for my finals :/ but we can open an issue and I will handle that later :)

{
kdeModel = new KDEType<kernel::TriangularKernel, tree::RTree>
(relError, absError, kernel::TriangularKernel(bandwidth));
}

This comment has been minimized.

Copy link
@rcurtin

rcurtin Jan 7, 2019

Member

This is quite a collection of ifs... :) If you like (don't feel obligated), you could refactor this using template helper methods to make two sets of if statements (one for kernels, one for trees) instead of this which combines both kernels and trees.

This comment has been minimized.

Copy link
@ShikharJ

ShikharJ Jan 17, 2019

Member

I have a similar take on this, but maybe we can leave refactoring for later.

This comment has been minimized.

Copy link
@robertohueso

robertohueso Jan 18, 2019

Author Contributor

I can take care of this in the future (along with leaf sizes) probably by mid-February :)

// Can be paralellized but we avoid it for now because of a compilation
// error in visual C++ compiler.
// #pragma omp for
for (size_t i = 0; i < queryNode.NumDescendants(); ++i)

This comment has been minimized.

Copy link
@rcurtin

rcurtin Jan 7, 2019

Member

Did you want to handle this before merge? I'm not sure if it gave much speedup.

Show resolved Hide resolved src/mlpack/tests/kde_test.cpp Outdated
Show resolved Hide resolved src/mlpack/tests/kde_test.cpp Outdated
// Main params
SetInputParam("reference", reference);
SetInputParam("query", query);
SetInputParam("kernel", std::string("linux"));

This comment has been minimized.

Copy link
@rcurtin
@ShikharJ
Copy link
Member

left a comment

I only took a minor review of the code. I'll provide a thorough review over the next few days, though it mostly looks great! Thanks a lot for this valuable contribution. It'll be a great addition to mlpack!

Show resolved Hide resolved src/mlpack/methods/kde/kde.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde.hpp
Show resolved Hide resolved src/mlpack/methods/kde/kde_impl.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde_model.hpp
Show resolved Hide resolved src/mlpack/methods/kde/kde_model.hpp
Show resolved Hide resolved src/mlpack/methods/kde/kde_rules_impl.hpp Outdated
Show resolved Hide resolved src/mlpack/tests/kde_test.cpp Outdated
@ShikharJ
Copy link
Member

left a comment

I took a more detailed look. However, I'm still not near the confidence level with regards to the correctness of the code, it will just take me some more time. Please wait before merging, I'll provide the final review soon.

Show resolved Hide resolved src/mlpack/methods/kde/kde_model.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde_model.hpp Outdated
@rcurtin

This comment has been minimized.

Copy link
Member

commented Jan 14, 2019

@ShikharJ I'll wait to merge until you're done reviewing. 👍

@rcurtin

This comment has been minimized.

Copy link
Member

commented Jan 16, 2019

Hey @robertohueso, we merged a new Travis configuration that should take less time and have no random failures, so if you want to push a simple commit or something to trigger a rebase against the current master and CI build, it should pass Travis this time. 👍

@robertohueso

This comment has been minimized.

Copy link
Contributor Author

commented Jan 17, 2019

Much faster! 👍

@ShikharJ
Copy link
Member

left a comment

I only have some minor comments, rest all looks good. This must have taken a huge effort and we all thank you for it.

{
kdeModel = new KDEType<kernel::TriangularKernel, tree::RTree>
(relError, absError, kernel::TriangularKernel(bandwidth));
}

This comment has been minimized.

Copy link
@ShikharJ

ShikharJ Jan 17, 2019

Member

I have a similar take on this, but maybe we can leave refactoring for later.

Show resolved Hide resolved src/mlpack/methods/kde/kde_model.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde_model.hpp Outdated
Show resolved Hide resolved src/mlpack/methods/kde/kde_model.hpp Outdated
* - If TreeTraits<TreeType>::RearrangesDataset is False then it is possible
* to use an empty oldFromNewReferences vector.
*
* @param referenceTree New already created reference tree.

This comment has been minimized.

Copy link
@ShikharJ

ShikharJ Jan 17, 2019

Member

I'm not sure if New already created reference tree makes sense.

This comment has been minimized.

Copy link
@robertohueso

robertohueso Jan 18, 2019

Author Contributor

Tried to improve it in 20230ae :)

Show resolved Hide resolved src/mlpack/tests/main_tests/kde_test.cpp Outdated
Improve KDE docs
- Fix typos.
- Improve expressions.
@robertohueso

This comment has been minimized.

Copy link
Contributor Author

commented Jan 18, 2019

@ShikharJ @rcurtin Thank you both for the effort you took on reviewing this PR, I know this was time consuming for you :)

@rcurtin rcurtin merged commit 20230ae into mlpack:master Jan 18, 2019

6 checks passed

LaTeX Documentation Checks Build finished.
Details
Memory Checks Build finished.
Details
Static Code Analysis Checks Build finished.
Details
Style Checks Build finished.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@rcurtin

This comment has been minimized.

Copy link
Member

commented Jan 18, 2019

🎉 Thanks again for the contribution @robertohueso. It's great to have this merged and part of the library.

For the leaf size issue, if you like, you can just work on it when you are able. Or, if you want, you can open an issue detailing what needs to be done and see if anyone is interested in doing it. Up to you how to handle that. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.