LSHSearch Parallelization #700
```diff
@@ -27,7 +27,6 @@ option(BUILD_TESTS "Build tests." ON)
 option(BUILD_CLI_EXECUTABLES "Build command-line executables." ON)
 option(BUILD_SHARED_LIBS
     "Compile shared libraries (if OFF, static libraries are compiled)." ON)
-#option(HAS_OPENMP "Use OpenMP for parallel execution, if available." ON)

 enable_testing()
@@ -237,7 +236,7 @@ add_definitions(-DBOOST_TEST_DYN_LINK)
 # This way we can skip calls to functions defined in omp.h with code like:
 # if (HAS_OPENMP == 1) { openMP code here }
 # If OpenMP is found, define HAS_OPENMP to be 1. Otherwise define it to be 0.
-find_package(OpenMP 3)
+find_package(OpenMP 3.0.0)
 if (OPENMP_FOUND)
   add_definitions(-DHAS_OPENMP)
   set(HAS_OPENMP "1")
```
**Comment:** I'm not sure if the CMake variable is needed.

**Reply:** I am defining it here so it can be used later. If I remove the `num_threads()` directive from the pragmas, then I guess this whole thing is skippable. `num_threads()` was added because I was planning to do the whole hybrid parallelization scheme, but now that that's out, I can simplify the code a bit.
```diff
@@ -12,22 +12,6 @@
 namespace mlpack {
 namespace neighbor {

-// If OpenMP was found by the compiler and was used in compiling mlpack,
-// then we can get more than one thread.
-inline size_t CalculateMaxThreads()
-{
-  // HAS_OPENMP should be defined by CMakeLists after the check for OpenMP has
-  // been performed.
-  #ifdef HAS_OPENMP
-  if (HAS_OPENMP) // If compiler has OpenMP support, use all available threads.
-    return omp_get_max_threads();
-  return 1; // Compiler doesn't support OpenMP. Hard-wire maxThreads to 1.
-  #endif
-
-  // In case HAS_OPENMP wasn't properly defined by CMakeLists, use 1 thread.
-  return 1;
-}
-
 // Construct the object with random tables
 template<typename SortPolicy>
 LSHSearch<SortPolicy>::
@@ -46,7 +30,6 @@ LSHSearch(const arma::mat& referenceSet,
     bucketSize(bucketSize),
     distanceEvaluations(0)
 {
-  maxThreads = CalculateMaxThreads();
   // Pass work to training function.
   Train(referenceSet, numProj, numTables, hashWidthIn, secondHashSize,
       bucketSize);
@@ -69,7 +52,6 @@ LSHSearch(const arma::mat& referenceSet,
     bucketSize(bucketSize),
     distanceEvaluations(0)
 {
-  maxThreads = CalculateMaxThreads();
   // Pass work to training function
   Train(referenceSet, numProj, numTables, hashWidthIn, secondHashSize,
       bucketSize, projections);
@@ -87,8 +69,6 @@ LSHSearch<SortPolicy>::LSHSearch() :
     bucketSize(500),
     distanceEvaluations(0)
 {
-  // Only define maxThreads. Nothing else to do.
-  maxThreads = CalculateMaxThreads();
 }

 // Destructor.
@@ -763,7 +743,7 @@ void LSHSearch<SortPolicy>::ReturnIndicesFromTable(
   // Retrieve candidates.
   size_t start = 0;

-  for (long long int i = 0; i < numTablesToSearch; ++i) // For all tables
+  for (size_t i = 0; i < numTablesToSearch; ++i) // For all tables
   {
     for (size_t p = 0; p < T + 1; ++p)
     {
@@ -842,14 +822,10 @@ void LSHSearch<SortPolicy>::Search(const arma::mat& querySet,
   Timer::Start("computing_neighbors");

-  // Parallelization to process more than one query at a time.
-  // use as many threads possible but not more than allowed number
-  size_t numThreadsUsed = maxThreads;
   #pragma omp parallel for \
-      num_threads ( numThreadsUsed )\
       shared(avgIndicesReturned, resultingNeighbors, distances) \
       schedule(dynamic)
```
**Comment:** Two questions: why `schedule(dynamic)` rather than static scheduling, and can this be simplified further?

**Reply:** The problem with static scheduling is that it leaves no room for work stealing. Since queries get candidate sets of unequal size, under static scheduling some threads finish their chunks quickly and then sit idle; with dynamic scheduling, the runtime detects the stragglers and hands them more work. And yes, I think I can simplify the code more now that we're not doing nested parallelism.

**Follow-up:** About static vs. dynamic scheduling, I ran some tests on Sift100k, phy, Corel, and Miniboone. In the first three, dynamic is slightly faster; it's hard to tell for Miniboone because the standard deviation is much larger than the difference. I'll run covertype and pokerhand in a while, when my PC is not in use.
```diff
-  // Go through every query point. Use long int because some compilers complain
-  // for openMP unsigned index variables.
+  // Go through every query point.
   for (size_t i = 0; i < querySet.n_cols; i++)
   {
@@ -914,10 +890,7 @@ Search(const size_t k,
   Timer::Start("computing_neighbors");

-  // Parallelization to process more than one query at a time.
-  // use as many threads possible but not more than allowed number
-  size_t numThreadsUsed = maxThreads;
   #pragma omp parallel for \
-      num_threads ( numThreadsUsed )\
       shared(avgIndicesReturned, resultingNeighbors, distances) \
       schedule(dynamic)
   // Go through every query point. Use long int because some compilers complain
```
**Comment:** A weird thing is happening on AppVeyor. At line 283 of the build log, CMake performs the OpenMP test and actually does find version 3.0.0. Shouldn't the build fail to find OpenMP, since Visual Studio only supports version 2.0? This is the cause of the failure: my CMake code thinks it found OpenMP ≥ 3, but it hasn't.