
Universal B tree implementation #746

Merged
merged 14 commits into from Aug 29, 2016
50 changes: 49 additions & 1 deletion src/mlpack/core/tree/address.hpp
@@ -1,4 +1,28 @@

/**
* @file address.hpp
* @author Mikhail Lozhnikov
*
* This file contains a series of functions for translating points to addresses
* and back and functions for comparing addresses.
*
* The notion of addresses is described in the following paper.
* @code
* @inproceedings{bayer1997,
* author = {Bayer, Rudolf},
* title = {The Universal B-Tree for Multidimensional Indexing: General
* Concepts},
* booktitle = {Proceedings of the International Conference on Worldwide
* Computing and Its Applications},
* series = {WWCA '97},
* year = {1997},
* isbn = {3-540-63343-X},
* pages = {198--209},
* numpages = {12},
* publisher = {Springer-Verlag},
* address = {London, UK},
* }
* @endcode
*/
#ifndef MLPACK_CORE_TREE_ADDRESS_HPP
#define MLPACK_CORE_TREE_ADDRESS_HPP

@@ -8,6 +32,14 @@ namespace bound {

namespace addr {

/**
* Calculate the address of a point. Be careful: the point and the address
* variables should be equal-sized, and the type of the address should
* correspond to the type of the vector.
*
* @param address The resulting address.
* @param point The point that is being translated to the address.
*/
template<typename AddressType, typename VecType>
void PointToAddress(AddressType& address, const VecType& point)
Member

If you don't mind, I'd like to add some documentation on what exactly is going on in this function. I spent a long time reading it, and it appears that really all we are doing (from a high level) is reordering the bits in the floating-point number. So in the one-dimensional case, we have a number of the form

[[ mantissa ][ exponent ]]

but your code transforms it roughly to

[[ exponent ][ mantissa ]]

(but not exactly, since some modification of the mantissa may be necessary). In the multi-dimensional case, after we transform the representation, we have to interleave the bits of the new representation across all of the elements in the address vector. Is that correct? If so, I can add those comments (or you can add them if you like; maybe it is better if you add them, since you are more familiar with the code).
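The interleaving step described in this comment can be illustrated with a short sketch. It is not the code under review: it assumes each coordinate has already been turned into an order-preserving 64-bit key, and it writes the interleaved bits, most significant first, into an address with one 64-bit word per dimension. The function name `InterleaveBits` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleave the bits of per-dimension 64-bit keys into
// an address made of the same number of 64-bit words. Bit i of every key
// (counted from the most significant end) is emitted before bit i + 1 of any
// key, which gives a Z-order (Morton) address.
std::vector<uint64_t> InterleaveBits(const std::vector<uint64_t>& keys)
{
  constexpr size_t order = 64;
  const size_t dim = keys.size();
  std::vector<uint64_t> address(dim, 0);

  for (size_t i = 0; i < order; ++i)     // Bit position within each key.
    for (size_t j = 0; j < dim; ++j)     // Dimension.
    {
      const size_t pos = i * dim + j;    // Global bit position in the address.
      const size_t row = pos / order;    // Which address word.
      const size_t bit = pos % order;    // Position inside that word.
      const uint64_t b = (keys[j] >> (order - 1 - i)) & 1u;
      address[row] |= b << (order - 1 - bit);
    }

  return address;
}
```

In two dimensions, the most significant bit of the first key lands in the top bit of the address, the most significant bit of the second key in the next one, and so on; in one dimension the function is the identity.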

Contributor Author

@lozhnikov, Aug 25, 2016

Yeah, it's correct. In the one-dimensional case this transform preserves the ordering (lower addresses correspond to lower points). The function is similar to DiscreteHilbertValue::CalculateValue(). I introduced it since it is faster than the recursive calculation (indeed, RecursiveHilbertValue was slower than DiscreteHilbertValue).
I'll add the comments.
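The ordering property mentioned here (lower addresses correspond to lower points) can be demonstrated in one dimension with a well-known alternative to the exponent/mantissa repacking discussed above: reinterpret the IEEE 754 bits of a `double`, set the sign bit of non-negative values, and invert all bits of negative values. This swapped-in technique is only an illustration and is not what the PR implements; `DoubleToKey` and `KeyToDouble` are hypothetical names, and NaN is not supported.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: map a double to a 64-bit key so that
// a < b implies DoubleToKey(a) < DoubleToKey(b). NaN is not supported.
uint64_t DoubleToKey(double value)
{
  assert(value == value);  // Reject NaN.
  uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // Layout: [sign][exponent][mantissa].
  const uint64_t signMask = uint64_t(1) << 63;
  // Negative values: invert every bit so larger magnitudes sort lower.
  // Non-negative values: set the sign bit so they sort above all negatives.
  return (bits & signMask) ? ~bits : (bits | signMask);
}

// Hypothetical inverse of DoubleToKey().
double KeyToDouble(uint64_t key)
{
  const uint64_t signMask = uint64_t(1) << 63;
  const uint64_t bits = (key & signMask) ? (key & ~signMask) : ~key;
  double value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Note that `-0.0` and `+0.0` receive different keys even though they compare equal as doubles; the mapping is still strictly increasing for values that compare as strictly ordered.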

Member

Ok, thanks. Sorry my intuition about the recursive Hilbert value calculation was so wrong; I think my ideas wasted a lot of your time there. :(

Contributor Author

No problem :) I was not sure about DiscreteHilbertValue, and that was an interesting experiment.

{
@@ -85,6 +117,14 @@ void PointToAddress(AddressType& address, const VecType& point)
}
}

/**
* Translate the address to the point. Be careful: the point and the address
* variables should be equal-sized, and the type of the address should
* correspond to the type of the vector.
*
* @param address An address to translate.
* @param point The point that corresponds to the address.
*/
template<typename AddressType, typename VecType>
void AddressToPoint(VecType& point, const AddressType& address)
{
@@ -153,6 +193,11 @@ void AddressToPoint(VecType& point, const AddressType& address)
}
}

/**
* Compare two addresses. The function returns 1 if the first address is greater
* than the second one, -1 if the first address is less than the second one,
* otherwise the function returns 0.
*/
template<typename AddressType1, typename AddressType2>
int CompareAddresses(const AddressType1& addr1, const AddressType2& addr2)
{
@@ -173,6 +218,9 @@ int CompareAddresses(const AddressType1& addr1, const AddressType2& addr2)
return 0;
}

/**
* Returns true if an address is contained between two other addresses.
*/
template<typename AddressType1, typename AddressType2, typename AddressType3>
bool Contains(const AddressType1& address, const AddressType2& loBound,
const AddressType3& hiBound)
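Going by the doc comments above, `CompareAddresses` orders two addresses and `Contains` tests whether an address lies between two others. A minimal sketch, assuming addresses are equal-length `std::vector<uint64_t>` values stored most significant word first and that both bounds are inclusive (the doc comment does not say); `CompareAddressesSketch` and `ContainsSketch` are hypothetical names, not the functions in this file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: most significant word first.
using Address = std::vector<uint64_t>;

// Returns 1 if addr1 > addr2, -1 if addr1 < addr2, and 0 otherwise,
// comparing word by word like one very wide unsigned integer.
int CompareAddressesSketch(const Address& addr1, const Address& addr2)
{
  assert(addr1.size() == addr2.size());
  for (size_t i = 0; i < addr1.size(); ++i)
  {
    if (addr1[i] > addr2[i])
      return 1;
    if (addr1[i] < addr2[i])
      return -1;
  }
  return 0;
}

// Returns true if loBound <= address <= hiBound (inclusive bounds assumed).
bool ContainsSketch(const Address& address, const Address& loBound,
                    const Address& hiBound)
{
  return CompareAddressesSketch(loBound, address) <= 0 &&
         CompareAddressesSketch(address, hiBound) <= 0;
}
```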
24 changes: 13 additions & 11 deletions src/mlpack/core/tree/binary_space_tree/binary_space_tree.hpp
@@ -528,11 +528,12 @@ class BinarySpaceTree
* @param oldFromNew Vector which will be filled with the old positions for
* each new point.
*/
- size_t PerformSplit(MatType& data,
- const size_t begin,
- const size_t count,
- const typename UBTreeSplit<BoundType<MetricType>,
- MatType>::SplitInfo& splitInfo);
+ size_t PerformSplit(
+ MatType& data,
+ const size_t begin,
+ const size_t count,
+ const typename UBTreeSplit<BoundType<MetricType>,
+ MatType>::SplitInfo& splitInfo);

/**
* An overload for the universal B tree. For the first time the function
@@ -548,12 +549,13 @@ class BinarySpaceTree
* @param oldFromNew Vector which will be filled with the old positions for
* each new point.
*/
- size_t PerformSplit(MatType& data,
- const size_t begin,
- const size_t count,
- const typename UBTreeSplit<BoundType<MetricType>,
- MatType>::SplitInfo& splitInfo,
- std::vector<size_t>& oldFromNew);
+ size_t PerformSplit(
+ MatType& data,
+ const size_t begin,
+ const size_t count,
+ const typename UBTreeSplit<BoundType<MetricType>,
+ MatType>::SplitInfo& splitInfo,
+ std::vector<size_t>& oldFromNew);
Contributor Author

I fixed the spaces. But I am not sure whether you meant line 557 instead.

Member

It's fine as-is now, thanks for the fix. :)


/**
* Update the bound of the current node. This method does not take into
19 changes: 12 additions & 7 deletions src/mlpack/core/tree/binary_space_tree/binary_space_tree_impl.hpp
@@ -880,9 +880,12 @@ template<typename MetricType,
template<typename SplitBoundType, typename SplitMatType>
class SplitType>
size_t BinarySpaceTree<MetricType, StatisticType, MatType, BoundType,
- SplitType>::PerformSplit(MatType& data,
- const size_t begin, const size_t count,
- const typename UBTreeSplit<BoundType<MetricType>, MatType>::SplitInfo& splitInfo)
+ SplitType>::PerformSplit(
+ MatType& data,
+ const size_t begin,
+ const size_t count,
+ const typename UBTreeSplit<BoundType<MetricType>,
+ MatType>::SplitInfo& splitInfo)
{
return SplitType<BoundType<MetricType>, MatType>::PerformSplit(data, begin,
count, splitInfo);
@@ -895,9 +898,12 @@ template<typename MetricType,
template<typename SplitBoundType, typename SplitMatType>
class SplitType>
size_t BinarySpaceTree<MetricType, StatisticType, MatType, BoundType,
- SplitType>::PerformSplit(MatType& data,
- const size_t begin, const size_t count,
- const typename UBTreeSplit<BoundType<MetricType>, MatType>::SplitInfo& splitInfo,
+ SplitType>::PerformSplit(
+ MatType& data,
+ const size_t begin,
+ const size_t count,
+ const typename UBTreeSplit<BoundType<MetricType>,
+ MatType>::SplitInfo& splitInfo,
std::vector<size_t>& oldFromNew)
{
return SplitType<BoundType<MetricType>, MatType>::PerformSplit(data, begin,
@@ -910,7 +916,6 @@ template<typename MetricType,
template<typename BoundMetricType, typename...> class BoundType,
template<typename SplitBoundType, typename SplitMatType>
class SplitType>

template<typename BoundType2>
void BinarySpaceTree<MetricType, StatisticType, MatType, BoundType, SplitType>::
UpdateBound(BoundType2& boundToUpdate)
27 changes: 27 additions & 0 deletions src/mlpack/core/tree/cellbound.hpp
@@ -38,6 +38,33 @@
namespace mlpack {
namespace bound {

/**
* The CellBound class describes a bound that consists of a number of
* hyperrectangles. These hyperrectangles do not overlap each other. The bound
* is limited by an outer hyperrectangle and two addresses, the low address
* and the high address. Thus, the bound contains all points whose addresses
* lie between the low and the high addresses. The class caches the minimum
* bounding rectangle, the low and the high addresses, and the hyperrectangles
* that are described by the addresses.
*
* The notion of addresses is described in the following paper.
* @code
* @inproceedings{bayer1997,
* author = {Bayer, Rudolf},
* title = {The Universal B-Tree for Multidimensional Indexing: General
* Concepts},
* booktitle = {Proceedings of the International Conference on Worldwide
* Computing and Its Applications},
* series = {WWCA '97},
* year = {1997},
* isbn = {3-540-63343-X},
* pages = {198--209},
* numpages = {12},
* publisher = {Springer-Verlag},
* address = {London, UK},
* }
* @endcode
*/
template<typename MetricType = metric::LMetric<2, true>,
Member

Could you add some documentation here for the class itself? You can reuse what you've written at the top of the file. It might also be useful to point out that you are caching the minimum bounding rectangle of the points here.

typename ElemType = double>
class CellBound
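The CellBound comment describes a bound made of an outer (minimum bounding) rectangle plus a range of addresses: a point belongs to the bound if it lies in the rectangle and its address lies between the low and high addresses. The sketch below illustrates that test in two dimensions under simplifying assumptions: coordinates are unsigned 32-bit integers rather than floating-point values, so the address is simply the 64-bit Morton code of the two coordinates, and both address bounds are inclusive. `MortonAddress` and `CellBoundSketch` are hypothetical and do not mirror the interface of mlpack's `CellBound`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: 64-bit Z-order (Morton) address of a 2-D point with 32-bit
// unsigned coordinates; bits of x and y alternate, starting with x's MSB.
uint64_t MortonAddress(uint32_t x, uint32_t y)
{
  uint64_t address = 0;
  for (int i = 31; i >= 0; --i)
  {
    address = (address << 1) | ((x >> i) & 1u);
    address = (address << 1) | ((y >> i) & 1u);
  }
  return address;
}

struct CellBoundSketch
{
  // Cached minimum bounding rectangle.
  uint32_t minX, maxX, minY, maxY;
  // Inclusive range of addresses covered by the bound.
  uint64_t loAddress, hiAddress;

  bool Contains(uint32_t x, uint32_t y) const
  {
    // Cheap rejection against the cached rectangle first.
    if (x < minX || x > maxX || y < minY || y > maxY)
      return false;
    const uint64_t address = MortonAddress(x, y);
    return loAddress <= address && address <= hiAddress;
  }
};
```

Checking the cached rectangle first rejects far-away points without computing an address; points inside the rectangle can still fall outside the address range, which is why the bound is a union of non-overlapping cells rather than a single box.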