New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lsh table access #663

Merged
merged 12 commits into from Jun 2, 2016

Conversation

Projects
None yet
2 participants
@mentekid
Contributor

mentekid commented May 31, 2016

I modified Train() to accept an extra argument, a std::vector of projection tables (arma::mat). If the size of the vector and the tables it contains is correct, Train retrains the algorithm to use the specified projection tables.

The user can also call setProjectionTables() which simply calls Train().

The user can also see the projection tables by calling getProjectionTables()

bucketSize,
projTables
);
};

This comment has been minimized.

@rcurtin

rcurtin May 31, 2016

Member

Same here, we should call it Projections(vector<mat>&). (Don't forget to make it a reference, otherwise copying is going to happen. Also I think it should be const.)

@rcurtin

rcurtin May 31, 2016

Member

Same here, we should call it Projections(vector<mat>&). (Don't forget to make it a reference, otherwise copying is going to happen. Also I think it should be const.)

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin May 31, 2016

Member

I know I made a lot of comments on a very simple change, hopefully I am not being too picky. :) I can help with the changes I proposed, just let me know what you'd like me to do.

Member

rcurtin commented May 31, 2016

I know I made a lot of comments on a very simple change, hopefully I am not being too picky. :) I can help with the changes I proposed, just let me know what you'd like me to do.

rcurtin added a commit that referenced this pull request May 31, 2016

I'm not sure what line width Pari used, but it wasn't 80 columns.
This will probably make the merge of #663 and other LSH improvements by Yannis
harder...
@mentekid

This comment has been minimized.

Show comment
Hide comment
@mentekid

mentekid May 31, 2016

Contributor

It's ok, these are pretty simple so I think I can do them quickly tomorrow.

arma::cube slices are arranged the same way std::vector elements are in memory right? So it won't hurt performance to group tables like that - it would only hurt if they were arranged in some weird way so elements of each matrix weren't concentrated, but that wouldn't make sense.

But armadillo documentation says

Cube data is stored as a set of slices (matrices) stored contiguously within memory. Within each slice, elements are stored with column-major ordering (ie. column by column)

so we're good

Contributor

mentekid commented May 31, 2016

It's ok, these are pretty simple so I think I can do them quickly tomorrow.

arma::cube slices are arranged the same way std::vector elements are in memory right? So it won't hurt performance to group tables like that - it would only hurt if they were arranged in some weird way so elements of each matrix weren't concentrated, but that wouldn't make sense.

But armadillo documentation says

Cube data is stored as a set of slices (matrices) stored contiguously within memory. Within each slice, elements are stored with column-major ordering (ie. column by column)

so we're good

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin May 31, 2016

Member

Yep, slices are contiguous. If you volunteer to do them I will not do it then. :)

I don't expect to see a significant or even noticeable speed difference from this change, but it will simplify the code a bit and reduce memory usage at least some trivial amount.

Member

rcurtin commented May 31, 2016

Yep, slices are contiguous. If you volunteer to do them I will not do it then. :)

I don't expect to see a significant or even noticeable speed difference from this change, but it will simplify the code a bit and reduce memory usage at least some trivial amount.

@mentekid

This comment has been minimized.

Show comment
Hide comment
@mentekid

mentekid Jun 1, 2016

Contributor

I am not sure how boost serialization works, is there going to be a problem for users trying to load their models if we change the representation from vector to cube?
Shoul we write something to organize the conversion?

Contributor

mentekid commented Jun 1, 2016

I am not sure how boost serialization works, is there going to be a problem for users trying to load their models if we change the representation from vector to cube?
Shoul we write something to organize the conversion?

@mentekid

This comment has been minimized.

Show comment
Hide comment
@mentekid

mentekid Jun 1, 2016

Contributor

Is this failure related to commit e6d2ca7 ?

I don't think I changed anything so as to conflict previous versions

Contributor

mentekid commented Jun 1, 2016

Is this failure related to commit e6d2ca7 ?

I don't think I changed anything so as to conflict previous versions

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jun 1, 2016

Member

The boost serialization bit is fairly straightforward... we can use the version information with BOOST_CLASS_VERSION(). The implementation will probably look like this:

template<typename Archive>
void Serialize(Archive& ar, const unsigned int version)
{
  ...

  if (version == 0)
  {
    // In older versions, the projection tables were stored in a std::vector.
  std::vector<arma::mat> tmpProj;
  ar & data::CreateNVP(tmpProj, "projections");

  projections.set_size(tmpProj[0].n_rows, tmpProj[0].n_cols, tmpProj.size());
  for (size_t i = 0; i < tmpProj.size(); ++i)
    projections.slice(i) = tmpProj[i];
}
else
{
  ar & data::CreateNVP(projections, "projections");
}

...
}

And then you will have to set the class version number which you would normally do with BOOST_CLASS_VERSION(LSHSearch, 1); but this is a template class and the macro does not work with the template class so we have to use the expansion of the macro.

I wonder why Travis is not building this PR anymore? If it compiles and the LSH tests pass on your system I am fine with that, no need for Travis.

Member

rcurtin commented Jun 1, 2016

The boost serialization bit is fairly straightforward... we can use the version information with BOOST_CLASS_VERSION(). The implementation will probably look like this:

template<typename Archive>
void Serialize(Archive& ar, const unsigned int version)
{
  ...

  if (version == 0)
  {
    // In older versions, the projection tables were stored in a std::vector.
  std::vector<arma::mat> tmpProj;
  ar & data::CreateNVP(tmpProj, "projections");

  projections.set_size(tmpProj[0].n_rows, tmpProj[0].n_cols, tmpProj.size());
  for (size_t i = 0; i < tmpProj.size(); ++i)
    projections.slice(i) = tmpProj[i];
}
else
{
  ar & data::CreateNVP(projections, "projections");
}

...
}

And then you will have to set the class version number which you would normally do with BOOST_CLASS_VERSION(LSHSearch, 1); but this is a template class and the macro does not work with the template class so we have to use the expansion of the macro.

I wonder why Travis is not building this PR anymore? If it compiles and the LSH tests pass on your system I am fine with that, no need for Travis.

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jun 1, 2016

Member

The snippet you will need for versioning will be this...

namespace boost {
namespace serialization {

template<>
template<typename SortPolicy>
struct version<LSHSearch<SortPolicy>>
{
  typedef mpl::int_<1> type;
  typedef mpl::integral_c_tag tag;
  BOOST_STATIC_CONSTANT(int, value = version::type::value);
  BOOST_MPL_ASSERT((
    boost::mpl::less<
        boost::mpl::int_<N>, boost::mpl::int_<256>>
  ));
};

It might be worth making a nice macro for this situation, or some class or something that is easy to overload.

Member

rcurtin commented Jun 1, 2016

The snippet you will need for versioning will be this...

namespace boost {
namespace serialization {

template<>
template<typename SortPolicy>
struct version<LSHSearch<SortPolicy>>
{
  typedef mpl::int_<1> type;
  typedef mpl::integral_c_tag tag;
  BOOST_STATIC_CONSTANT(int, value = version::type::value);
  BOOST_MPL_ASSERT((
    boost::mpl::less<
        boost::mpl::int_<N>, boost::mpl::int_<256>>
  ));
};

It might be worth making a nice macro for this situation, or some class or something that is easy to overload.

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jun 1, 2016

Member

Five patches for serialization:

http://www.ratml.org/misc/0001-Add-a-templated-version-for-BOOST_CLASS_VERSION.patch
http://www.ratml.org/misc/0002-We-actually-need-to-wrap-mlpack-data-SecondShim-obje.patch
http://www.ratml.org/misc/0003-Include-new-serialization-version-macro.patch
http://www.ratml.org/misc/0004-Use-n_slices-not-size-to-fix-correctness.patch
http://www.ratml.org/misc/0005-Refactor-Serialize-add-backwards-compatibility-and-u.patch

You can download those then use git am to apply to your repo, and this should update this PR. Hopefully this is not too tedious, I don't know if I can add individual commits to your PR in the interface we have here. I'll remove the patches from my server once you've applied them.

Member

rcurtin commented Jun 1, 2016

Five patches for serialization:

http://www.ratml.org/misc/0001-Add-a-templated-version-for-BOOST_CLASS_VERSION.patch
http://www.ratml.org/misc/0002-We-actually-need-to-wrap-mlpack-data-SecondShim-obje.patch
http://www.ratml.org/misc/0003-Include-new-serialization-version-macro.patch
http://www.ratml.org/misc/0004-Use-n_slices-not-size-to-fix-correctness.patch
http://www.ratml.org/misc/0005-Refactor-Serialize-add-backwards-compatibility-and-u.patch

You can download those then use git am to apply to your repo, and this should update this PR. Hopefully this is not too tedious, I don't know if I can add individual commits to your PR in the interface we have here. I'll remove the patches from my server once you've applied them.

@rcurtin rcurtin merged commit 06fdfa8 into mlpack:master Jun 2, 2016

1 check failed

continuous-integration/appveyor/pr AppVeyor was unable to build non-mergeable pull request
Details
@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jun 2, 2016

Member

A few minor changes I made afterwards:

e0b6ce7 b30e697 d3e3c54 8ed22ce e3a23c2

I forgot about the versioning, which means we can't remove Projection() until 2.1.0:
https://github.com/mlpack/mlpack/blob/master/UPDATING.txt

Let me know if I broke anything. :)

Member

rcurtin commented Jun 2, 2016

A few minor changes I made afterwards:

e0b6ce7 b30e697 d3e3c54 8ed22ce e3a23c2

I forgot about the versioning, which means we can't remove Projection() until 2.1.0:
https://github.com/mlpack/mlpack/blob/master/UPDATING.txt

Let me know if I broke anything. :)

@mentekid

This comment has been minimized.

Show comment
Hide comment
@mentekid

mentekid Jun 2, 2016

Contributor

I pulled the changes from upstream:master and merged with my MultiprobeLSH branch, but there seems to be a problem when compiling mlpack_lsh.

g++ says all of the LSHSearch class variables have "not been declared in this scope". Not sure what this means, did I do something wrong when recompiling or is it something in the master branch code?

Here's the g++ error messages:

[ 95%] Building CXX object src/mlpack/methods/lsh/CMakeFiles/mlpack_lsh.dir/lsh_main.cpp.o
In file included from /home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search.hpp:377:0,
                 from /home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_main.cpp:17:
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:819:16: error: ‘SortPolicy’ was not declared in this scope
 void LSHSearch<SortPolicy>::Serialize(Archive& ar,
                ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:819:26: error: template argument 1 is invalid
 void LSHSearch<SortPolicy>::Serialize(Archive& ar,
                          ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp: In function ‘void mlpack::neighbor::Serialize(Archive&, unsigned int)’:
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:827:9: error: ‘ownsSet’ was not declared in this scope
     if (ownsSet)
         ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:828:14: error: ‘referenceSet’ was not declared in this scope
       delete referenceSet;
              ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:829:5: error: ‘ownsSet’ was not declared in this scope
     ownsSet = true;
     ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:831:18: error: ‘referenceSet’ was not declared in this scope
   ar & CreateNVP(referenceSet, "referenceSet");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:833:18: error: ‘numProj’ was not declared in this scope
   ar & CreateNVP(numProj, "numProj");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:834:18: error: ‘numTables’ was not declared in this scope
   ar & CreateNVP(numTables, "numTables");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:838:5: error: ‘projections’ was not declared in this scope
     projections.reset();
     ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:847:5: error: ‘projections’ was not declared in this scope
     projections.set_size(tmpProj[0].n_rows, tmpProj[0].n_cols, tmpProj.size());
     ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:853:20: error: ‘projections’ was not declared in this scope
     ar & CreateNVP(projections, "projections");
                    ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:856:18: error: ‘offsets’ was not declared in this scope
   ar & CreateNVP(offsets, "offsets");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:857:18: error: ‘hashWidth’ was not declared in this scope
   ar & CreateNVP(hashWidth, "hashWidth");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:858:18: error: ‘secondHashSize’ was not declared in this scope
   ar & CreateNVP(secondHashSize, "secondHashSize");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:859:18: error: ‘secondHashWeights’ was not declared in this scope
   ar & CreateNVP(secondHashWeights, "secondHashWeights");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:860:18: error: ‘bucketSize’ was not declared in this scope
   ar & CreateNVP(bucketSize, "bucketSize");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:861:18: error: ‘secondHashTable’ was not declared in this scope
   ar & CreateNVP(secondHashTable, "secondHashTable");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:862:18: error: ‘bucketContentSize’ was not declared in this scope
   ar & CreateNVP(bucketContentSize, "bucketContentSize");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:863:18: error: ‘bucketRowInHashTable’ was not declared in this scope
   ar & CreateNVP(bucketRowInHashTable, "bucketRowInHashTable");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:864:18: error: ‘distanceEvaluations’ was not declared in this scope
   ar & CreateNVP(distanceEvaluations, "distanceEvaluations");

Contributor

mentekid commented Jun 2, 2016

I pulled the changes from upstream:master and merged with my MultiprobeLSH branch, but there seems to be a problem when compiling mlpack_lsh.

g++ says all of the LSHSearch class variables have "not been declared in this scope". Not sure what this means, did I do something wrong when recompiling or is it something in the master branch code?

Here's the g++ error messages:

[ 95%] Building CXX object src/mlpack/methods/lsh/CMakeFiles/mlpack_lsh.dir/lsh_main.cpp.o
In file included from /home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search.hpp:377:0,
                 from /home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_main.cpp:17:
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:819:16: error: ‘SortPolicy’ was not declared in this scope
 void LSHSearch<SortPolicy>::Serialize(Archive& ar,
                ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:819:26: error: template argument 1 is invalid
 void LSHSearch<SortPolicy>::Serialize(Archive& ar,
                          ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp: In function ‘void mlpack::neighbor::Serialize(Archive&, unsigned int)’:
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:827:9: error: ‘ownsSet’ was not declared in this scope
     if (ownsSet)
         ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:828:14: error: ‘referenceSet’ was not declared in this scope
       delete referenceSet;
              ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:829:5: error: ‘ownsSet’ was not declared in this scope
     ownsSet = true;
     ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:831:18: error: ‘referenceSet’ was not declared in this scope
   ar & CreateNVP(referenceSet, "referenceSet");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:833:18: error: ‘numProj’ was not declared in this scope
   ar & CreateNVP(numProj, "numProj");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:834:18: error: ‘numTables’ was not declared in this scope
   ar & CreateNVP(numTables, "numTables");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:838:5: error: ‘projections’ was not declared in this scope
     projections.reset();
     ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:847:5: error: ‘projections’ was not declared in this scope
     projections.set_size(tmpProj[0].n_rows, tmpProj[0].n_cols, tmpProj.size());
     ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:853:20: error: ‘projections’ was not declared in this scope
     ar & CreateNVP(projections, "projections");
                    ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:856:18: error: ‘offsets’ was not declared in this scope
   ar & CreateNVP(offsets, "offsets");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:857:18: error: ‘hashWidth’ was not declared in this scope
   ar & CreateNVP(hashWidth, "hashWidth");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:858:18: error: ‘secondHashSize’ was not declared in this scope
   ar & CreateNVP(secondHashSize, "secondHashSize");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:859:18: error: ‘secondHashWeights’ was not declared in this scope
   ar & CreateNVP(secondHashWeights, "secondHashWeights");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:860:18: error: ‘bucketSize’ was not declared in this scope
   ar & CreateNVP(bucketSize, "bucketSize");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:861:18: error: ‘secondHashTable’ was not declared in this scope
   ar & CreateNVP(secondHashTable, "secondHashTable");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:862:18: error: ‘bucketContentSize’ was not declared in this scope
   ar & CreateNVP(bucketContentSize, "bucketContentSize");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:863:18: error: ‘bucketRowInHashTable’ was not declared in this scope
   ar & CreateNVP(bucketRowInHashTable, "bucketRowInHashTable");
                  ^
/home/et3rn1ty/Projects/MLPack/mlpack/src/mlpack/methods/lsh/lsh_search_impl.hpp:864:18: error: ‘distanceEvaluations’ was not declared in this scope
   ar & CreateNVP(distanceEvaluations, "distanceEvaluations");

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jun 2, 2016

Member

Not sure what the issue here, is the signature

template<typename Archive>
template<typename SortPolicy>
void LSHSearch<SortPolicy>::Serialize(Archive& ar, const unsigned int version)

? If not, maybe that is the issue, maybe the merge went wrong or something.

Member

rcurtin commented Jun 2, 2016

Not sure what the issue here, is the signature

template<typename Archive>
template<typename SortPolicy>
void LSHSearch<SortPolicy>::Serialize(Archive& ar, const unsigned int version)

? If not, maybe that is the issue, maybe the merge went wrong or something.

@mentekid

This comment has been minimized.

Show comment
Hide comment
@mentekid

mentekid Jun 2, 2016

Contributor

Yep, that was it, I only had template, I deleted the other one in the merge accidentaly.

By the way for some reason

template<typename SortPolicy>
template<typename Archive>

produced an error but

template<typename Archive>
template<typename SortPolicy>

didn't. The second version is how it is in the master branch.

Finally we can close this 😄

Contributor

mentekid commented Jun 2, 2016

Yep, that was it, I only had template, I deleted the other one in the merge accidentaly.

By the way for some reason

template<typename SortPolicy>
template<typename Archive>

produced an error but

template<typename Archive>
template<typename SortPolicy>

didn't. The second version is how it is in the master branch.

Finally we can close this 😄

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Jun 2, 2016

Member

Yes, the ordering makes a difference. You have to put the template declaration for the method before the template declaration for the class.

Member

rcurtin commented Jun 2, 2016

Yes, the ordering makes a difference. You have to put the template declaration for the method before the template declaration for the class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment