Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSHSearch Parallelization #700

Merged
merged 26 commits into from Jul 8, 2016
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
72999dd
Implements parallel query processing for LSH
mentekid Jun 19, 2016
95d417f
Implements parallel query processing for LSH
mentekid Jun 19, 2016
afcc881
Book-keeping of number of threads used
mentekid Jun 19, 2016
5db5423
Adds parallelization to bichromatic search
mentekid Jun 19, 2016
abef504
Adds Bichromatic Parallel Test
mentekid Jun 19, 2016
4cbd43e
Merge branch 'master' into lsh-parallelization
mentekid Jun 24, 2016
a60ff91
Adds parallelism option to CMakeFiles, removes most omp.h dependence
mentekid Jun 27, 2016
6152527
Removes numThreadsUsed variable, changes how maxThreads is initialized
mentekid Jun 27, 2016
7cf77cd
Changes placeholder code for OpenMP
mentekid Jun 27, 2016
2ca48c6
Fixes CMakeFiles and code to make openMP actually transparent
mentekid Jun 27, 2016
a6aca41
CMake trickery to maybe make travis not crash
mentekid Jun 27, 2016
3d536c7
Simplifies OpenMP transparency code
mentekid Jun 27, 2016
c04b073
Adds -Wno-unknown-pragmas
mentekid Jun 27, 2016
65983d1
Commit to switch branch
mentekid Jun 28, 2016
b95a3ce
Changes BaseCase to include loop over candidate set. Changes loop var…
mentekid Jul 1, 2016
0d38271
Removes commented-out old BaseCase code
mentekid Jul 1, 2016
3af80c3
Merges multiprobe LSH
mentekid Jul 1, 2016
b02e2f3
Changes BaseCase to include the loop over candidate set. Places paral…
mentekid Jul 6, 2016
a1e9c28
Merges 3fe0b72
mentekid Jul 6, 2016
ad8e6d3
Modifies CMakeLists to remove -DHAS_OPENMP
mentekid Jul 7, 2016
c4c8ff9
Removes old code and comments from CMakeLists.txt
mentekid Jul 7, 2016
074d726
Restores size_t for openMP loop counters, changes CmakeLists to requi…
mentekid Jul 8, 2016
f982ca5
Removes maxThreads functionality from LSHSearch class
mentekid Jul 8, 2016
1fb998f
Removes HAS_OPENMP definition in CMakeFiles
mentekid Jul 8, 2016
b92d465
Workaround for OpenMP 2.0 (based on dt_utils.cpp)
mentekid Jul 8, 2016
2fee61e
Transforms omp loop to reduction
mentekid Jul 8, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
3 changes: 1 addition & 2 deletions CMakeLists.txt
Expand Up @@ -27,7 +27,6 @@ option(BUILD_TESTS "Build tests." ON)
option(BUILD_CLI_EXECUTABLES "Build command-line executables." ON)
option(BUILD_SHARED_LIBS
"Compile shared libraries (if OFF, static libraries are compiled)." ON)
#option(HAS_OPENMP "Use OpenMP for parallel execution, if available." ON)

enable_testing()

Expand Down Expand Up @@ -237,7 +236,7 @@ add_definitions(-DBOOST_TEST_DYN_LINK)
# This way we can skip calls to functions defined in omp.h with code like:
# if (HAS_OPENMP == 1) { openMP code here }
# If OpenMP is found, define HAS_OPENMP to be 1. Otherwise define it to be 0.
find_package(OpenMP 3)
find_package(OpenMP 3.0.0 )
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This weird thing is happening in AppVeyor:
In line 283, it performs the OpenMP test and actually does find version 3.0.0:

277 -- Try OpenMP C flag = [/openmp]
278 -- Performing Test OpenMP_FLAG_DETECTED
279 -- Performing Test OpenMP_FLAG_DETECTED - Success
280 -- Try OpenMP CXX flag = [/openmp]
281 -- Performing Test OpenMP_FLAG_DETECTED
282 -- Performing Test OpenMP_FLAG_DETECTED - Success
283 -- Found OpenMP: /openmp (Required is at least version "3.0.0") 

Shouldn't the build actually not find OpenMP since Visual Studio only supports version 2.0 according to this?

This is the cause for the failure - My CMake code thinks it found OpenMP > 3, but it hasn't...

if (OPENMP_FOUND)
add_definitions(-DHAS_OPENMP)
set(HAS_OPENMP "1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the CMake variable HAS_OPENMP is necessary; it doesn't look like it's used anywhere. Maybe I overlooked something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am defining it here to use it in lsh_search_impl.hpp, where I check if HAS_OPENMP == 1 or HAS_OPENMP == 0. I'm not sure this is the correct usage for add_definitions() - I just saw it worked and went with it.

If I remove the num_threads() directive from the pragmas, then I guess this whole thing is skippable. num_threads() was added because I was planning to do the whole hybrid parallelization scheme, but now that's out, I can simplify the code a bit.

Expand Down
9 changes: 0 additions & 9 deletions src/mlpack/methods/lsh/lsh_search.hpp
Expand Up @@ -268,12 +268,6 @@ class LSHSearch
//! removed in mlpack 2.1.0!
const arma::mat& Projection(size_t i) { return projections.slice(i); }

//! Set the maximum number of threads the object is allowed to use
void MaxThreads(size_t numThreads) { maxThreads = numThreads;}

//! Return the current maxumum threads the object is allowed to use
size_t MaxThreads(void) const { return maxThreads; }

private:
/**
* This function takes a query and hashes it into each of the hash tables to
Expand Down Expand Up @@ -451,9 +445,6 @@ class LSHSearch
//! The number of distance evaluations.
size_t distanceEvaluations;

//! The maximum number of threads allowed.
size_t maxThreads;

}; // class LSHSearch

} // namespace neighbor
Expand Down
31 changes: 2 additions & 29 deletions src/mlpack/methods/lsh/lsh_search_impl.hpp
Expand Up @@ -12,22 +12,6 @@
namespace mlpack {
namespace neighbor {

// If OpenMP was found by the compiler and was used in compiling mlpack,
// then we can get more than one thread.
inline size_t CalculateMaxThreads()
{
// HAS_OPENMP should be defined by CMakeLists after the check for OpenMP has
// been performed.
#ifdef HAS_OPENMP
if (HAS_OPENMP) // If compiler has OpenMP support, use all available threads.
return omp_get_max_threads();
return 1; // Compiler doesn't support OpenMP. Hard-wire maxThreads to 1.
#endif

// In case HAS_OPENMP wasn't properly defined by CMakeLists, use 1 thread.
return 1;
}

// Construct the object with random tables
template<typename SortPolicy>
LSHSearch<SortPolicy>::
Expand All @@ -46,7 +30,6 @@ LSHSearch(const arma::mat& referenceSet,
bucketSize(bucketSize),
distanceEvaluations(0)
{
maxThreads = CalculateMaxThreads();
// Pass work to training function.
Train(referenceSet, numProj, numTables, hashWidthIn, secondHashSize,
bucketSize);
Expand All @@ -69,7 +52,6 @@ LSHSearch(const arma::mat& referenceSet,
bucketSize(bucketSize),
distanceEvaluations(0)
{
maxThreads = CalculateMaxThreads();
// Pass work to training function
Train(referenceSet, numProj, numTables, hashWidthIn, secondHashSize,
bucketSize, projections);
Expand All @@ -87,8 +69,6 @@ LSHSearch<SortPolicy>::LSHSearch() :
bucketSize(500),
distanceEvaluations(0)
{
// Only define maxThreads. Nothing else to do.
maxThreads = CalculateMaxThreads();
}

// Destructor.
Expand Down Expand Up @@ -763,7 +743,7 @@ void LSHSearch<SortPolicy>::ReturnIndicesFromTable(
// Retrieve candidates.
size_t start = 0;

for (long long int i = 0; i < numTablesToSearch; ++i) // For all tables
for (size_t i = 0; i < numTablesToSearch; ++i) // For all tables
{
for (size_t p = 0; p < T + 1; ++p)
{
Expand Down Expand Up @@ -842,14 +822,10 @@ void LSHSearch<SortPolicy>::Search(const arma::mat& querySet,
Timer::Start("computing_neighbors");

// Parallelization to process more than one query at a time.
// use as many threads possible but not more than allowed number
size_t numThreadsUsed = maxThreads;
#pragma omp parallel for \
num_threads ( numThreadsUsed )\
shared(avgIndicesReturned, resultingNeighbors, distances) \
schedule(dynamic)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions---

  • Is the dynamic schedule the right one to use here? My understanding was that the dynamic schedule had more overhead. In this case it seems like the default static schedule would be just fine.
  • I think the num_threads() call here will effectively set the number of threads used to the value of the environment variable OMP_NUM_THREADS, but that would be the same as the default anyway if you hadn't set num_threads(). So it makes me think that the maxThreads member is unnecessary (and the other supporting functionality).

Copy link
Contributor Author

@mentekid mentekid Jul 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the dynamic schedule the right one to use here? My understanding was that the dynamic schedule had more overhead. In this case it seems like the default static schedule would be just fine.

The problem with static scheduling is it doesn't leave room for work-stealing. Since queries get unequal sizes of candidate sets, in static scheduling some threads will finish their chunks quickly and then be useless. In dynamic scheduling, the compiler will detect slackers and give them more work to do.
I went with dynamic without trying static first, because of this. I can try that and see if there's a difference.

I think the num_threads() call here will effectively set the number of threads used to the value of the environment variable OMP_NUM_THREADS, but that would be the same as the default anyway if you hadn't set num_threads(). So it makes me think that the maxThreads member is unnecessary (and the other supporting functionality).

Yes I think I can simplify the code more now that we're not doing nested parallelism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About static vs dynamic scheduling, I ran some tests:

Sift100k

dynamic: 0.086 +/- 0.008 s
 static: 0.093 +/- 0.014 s

phy

dynamic: 3.35 +/- 0.124 s
 static: 3.46 +/- 0.203 s

Corel

dynamic: 0.228 +/- 0.03 s
 static: 0.234 +/- 0.03 s

Miniboone

dynamic: 0.711 +/- 0.07 s
 static: 0.701 +/- 0.09 s

In the first 3, I'd say dynamic is slightly faster. It's hard to tell for Miniboone because the standard deviation is much larger than the difference. I'll run covertype and pokerhand in a while when my PC is not used.

// Go through every query point. Use long int because some compilers complain
// for openMP unsigned index variables.
// Go through every query point.
for (size_t i = 0; i < querySet.n_cols; i++)
{

Expand Down Expand Up @@ -914,10 +890,7 @@ Search(const size_t k,
Timer::Start("computing_neighbors");

// Parallelization to process more than one query at a time.
// use as many threads possible but not more than allowed number
size_t numThreadsUsed = maxThreads;
#pragma omp parallel for \
num_threads ( numThreadsUsed )\
shared(avgIndicesReturned, resultingNeighbors, distances) \
schedule(dynamic)
// Go through every query point. Use long int because some compilers complain
Expand Down
8 changes: 6 additions & 2 deletions src/mlpack/tests/lsh_test.cpp
Expand Up @@ -784,8 +784,10 @@ BOOST_AUTO_TEST_CASE(ParallelBichromatic)
lshTest.Search(qdata, k, parallelNeighbors, distances);

// Now perform same search but with 1 thread
lshTest.MaxThreads(1);
size_t prevNumThreads = omp_get_max_threads(); // Store number of threads used.
omp_set_num_threads(1);
lshTest.Search(qdata, k, sequentialNeighbors, distances);
omp_set_num_threads(prevNumThreads);

// Require both have same results
double recall = LSHSearch<>::ComputeRecall(sequentialNeighbors, parallelNeighbors);
Expand Down Expand Up @@ -818,8 +820,10 @@ BOOST_AUTO_TEST_CASE(ParallelMonochromatic)
lshTest.Search(k, parallelNeighbors, distances);

// Now perform same search but with 1 thread.
lshTest.MaxThreads(1);
size_t prevNumThreads = omp_get_max_threads(); // Store number of threads used.
omp_set_num_threads(1);
lshTest.Search(k, sequentialNeighbors, distances);
omp_set_num_threads(prevNumThreads);

// Require both have same results.
double recall = LSHSearch<>::ComputeRecall(sequentialNeighbors, parallelNeighbors);
Expand Down