Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of Multiprobe LSH #691

Merged
merged 29 commits into from Jun 30, 2016
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
8dd409d
Implementation of Multiprobe LSH, version 1
mentekid Jun 8, 2016
8264347
Merge branch 'lsh-computeRecall' into lsh-multiprobe
mentekid Jun 8, 2016
f24ac2c
Merge branch 'lsh-deterministictest' into lsh-multiprobe
mentekid Jun 8, 2016
27510fe
There is a bug in ReturnIndicesFromTable
mentekid Jun 8, 2016
a48c842
There is a bug in ReturnIndicesFromTable
mentekid Jun 8, 2016
4adaf8f
Fixes two minor bugs causing major headaches
mentekid Jun 8, 2016
8bc5ced
Adds a first multiprobe test
mentekid Jun 9, 2016
a048f46
Merge branch 'lsh-computeRecall' into lsh-multiprobe
mentekid Jun 9, 2016
d332ea1
Add code that replaces multiprobe codes with zeros for bottleneck pro…
mentekid Jun 13, 2016
00377a8
Fixes minor typo that caused multiprobe test to not happen
mentekid Jun 13, 2016
f0638a0
Adds a deterministic test for multiprobe
mentekid Jun 14, 2016
6d11521
Removes some redundant code, fixes comments
mentekid Jun 14, 2016
fa7f62d
Fixes style, adds documentation
mentekid Jun 17, 2016
ae81ee5
Fixes style issues, optimizes code a bit
mentekid Jun 22, 2016
840efe9
Merged changes from #690 and #675
mentekid Jun 22, 2016
2af985c
Fixes bug caused merge. Changes code in lsh_test.cpp that caused test…
mentekid Jun 23, 2016
75dead3
Uses arma::Row<char> instead of std::vector for perturbation sets
mentekid Jun 23, 2016
89b3c7b
Fixes bug in perturbationValid
mentekid Jun 23, 2016
79b954a
Replaces arma::Row<bool> with std::vector<bool> to conserve space
mentekid Jun 23, 2016
e2596c5
Workaround to avoid copy of hashMat
mentekid Jun 23, 2016
5d603b2
More style fixes
mentekid Jun 24, 2016
039af23
Typo fix
mentekid Jun 24, 2016
71eda99
Removes MultiprobeDeterministicTest because it is not correct.
mentekid Jun 24, 2016
2aa839c
Makes Perturbation functions members of LSHSearch
mentekid Jun 28, 2016
cc6a5c2
Fixes a lot of style issues
mentekid Jun 28, 2016
4ba6f6d
Re-adds Deterministic Multiprobe Test
mentekid Jun 28, 2016
71edfbb
Merge branch 'master' into lsh-multiprobe
mentekid Jun 29, 2016
d418c09
Merge branch 'master' into lsh-multiprobe
mentekid Jun 29, 2016
f1e11fd
Fixes style issues in LSH Tests and LSH Class
mentekid Jun 29, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
6 changes: 3 additions & 3 deletions src/mlpack/methods/lsh/lsh_search.hpp
Expand Up @@ -359,10 +359,10 @@ class LSHSearch
* @param additionalProbingBins matrix. Each column will hold one additional
* bin.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies here about the line wrap spacing.

*/
void GetAdditionalProbingBins(const arma::vec &queryCode,
const arma::vec &queryCodeNotFloored,
void GetAdditionalProbingBins(const arma::vec& queryCode,
const arma::vec& queryCodeNotFloored,
const size_t T,
arma::mat &additionalProbingBins) const;
arma::mat& additionalProbingBins) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spacing seems off here, the second/third/fourth arguments should align with the first.


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a matter of preference, but to be consistent with the rest of the code: when declaring a variable, place the * or & adjacent to the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I didn't notice that.

//! Reference dataset.
const arma::mat* referenceSet;
Expand Down
27 changes: 13 additions & 14 deletions src/mlpack/methods/lsh/lsh_search_impl.hpp
Expand Up @@ -402,20 +402,22 @@ inline bool perturbationValid(const std::vector<size_t>& A,
}
*/

//Returns the score of a perturbation vector generated by perturbation set A.
//The score of a pertubation set (vector) is the sum of scores of the
//participating actions.
// Returns the score of a perturbation vector generated by perturbation set A.
// The score of a pertubation set (vector) is the sum of scores of the
// participating actions.
inline double perturbationScore(const std::vector<bool>& A,
const arma::vec& scores)
{
double score = 0.0;
for (size_t i = 0; i < A.size(); ++i)
score += A[i] ? scores(i) : 0; //add scores of non-zero indices
if (A[i])
score += scores(i) // add scores of non-zero indices
return score;
}

// Inline function used by GetAdditionalProbingBins. The vector shift operation
// replaces the largest element of a vector A with (largest element) + 1.
// Returns true if resulting vector is valid, otherwise false.
inline bool perturbationShift(std::vector<bool>& A)
{
size_t maxPos = 0;
Expand All @@ -434,18 +436,19 @@ inline bool perturbationShift(std::vector<bool>& A)

// Inline function used by GetAdditionalProbingBins. The vector expansion
// operation adds the element [1 + (largest_element)] to a vector A, where
// largest_element is the largest element of A.
// largest_element is the largest element of A. Returns true if resulting vector
// is valid, otherwise false.
inline bool perturbationExpand(std::vector<bool>& A)
{
// Find the last '1' in A
size_t maxPos = 0;
for (size_t i = 0; i < A.size(); ++i)
if (A[i]) //marked true
if (A[i]) // marked true
maxPos = i;

if ( maxPos + 1 < A.size()) // otherwise, this is an invalid vector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (maxPos + 1 < A.size())

{
A[maxPos+1] = 1;
A[maxPos + 1] = 1;
return true;
}
return false;
Expand All @@ -458,16 +461,12 @@ inline bool perturbationExpand(std::vector<bool>& A)
inline bool perturbationValid(const std::vector<bool>& A,
const size_t numProj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another small comment---we should name this function PerturbationValid (and the others should have a capital first letter too). It might make more sense to put them inside of the LSH class just for the sake of organization, but whatever you want to do there is fine.

{
// Stack allocation and initialization to 0 (bool check[numProj] = {0}) made
// some compilers complain, and std::vector might even be compressed (depends
// on implementation) so this saves some space.
// Use check to mark dimensions we have seen before in A. If a dimension is
// seen twice (or more), A is not a valid perturbation.
std::vector<bool> check(numProj);

if (A.size() > 2 * numProj)
{
Log::Assert(1 == 2);
return false; // This should never happen
}
return false; // This should never happen.

// Check that we only see each dimension once. If not, vector is not valid.
for (size_t i = 0; i < A.size(); ++i)
Expand Down