Hi Greg,
I am trying to do a simple parallel example. For some reason I am only getting 100% CPU usage, i.e. one core. Could you help explain what is going wrong? I think the answer would make informative documentation for other people.
I am certain that my pragma directives are not being ignored by the compiler. I am not very familiar with OpenMP, so maybe I am doing something wrong there. I have tried setting the number of threads both in the pragma and on the command line.
Let me know if you need any more information. I have already spent a lot of time trying to understand the possible solutions, but it seems annoyingly hard to parallelize a for loop like this. I don't really want to set up all of the threads myself and require them to be configured. OpenMP seems like a great fit because the thread count can be set on the command line.
#include <mutex>
#include <stdexcept>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <parallel_hashmap/phmap.h>

namespace py = pybind11;

template <typename Key, typename Value>
struct Dict {
    Dict () {}
    Dict ( Value default_value ) : default_value ( default_value ) {}

    // Insert (or overwrite) one value per key, in parallel over the arrays.
    void __setitem__ ( py::array_t<Key> & key_array, py::array_t<Value> & value_array ) {
        auto * key_array_ptr = (Key *) key_array.request().ptr;
        auto * value_array_ptr = (Value *) value_array.request().ptr;

        if ( key_array.size() != value_array.size() )
            throw std::runtime_error("The size of the key and value must match.");

        #pragma omp parallel for
        for ( size_t idx = 0; idx < key_array.size(); idx++ ) {
            dict.insert_or_assign( key_array_ptr[idx], value_array_ptr[idx] );
        }
    }

    // Look up one value per key, falling back to default_value for missing keys.
    py::array_t<Value> __getitem__ ( py::array_t<Key> & key_array ) {
        auto * key_array_ptr = (Key *) key_array.request().ptr;

        auto result_array = py::array_t<Value> ( key_array.request().shape );
        auto * result_array_ptr = (Value *) result_array.request().ptr;

        #pragma omp parallel for
        for ( size_t idx = 0; idx < key_array.size(); idx++ ) {
            auto search = dict.find( key_array_ptr[idx] );
            if ( search != dict.end() ) {
                result_array_ptr[idx] = search->second;
            } else {
                result_array_ptr[idx] = default_value;
            }
        }

        return result_array;
    }

    Value default_value;

    // 2^4 = 16 submaps, each protected by its own std::mutex.
    phmap::parallel_flat_hash_map<
        Key,
        Value,
        phmap::priv::hash_default_hash<Key>,
        phmap::priv::hash_default_eq<Key>,
        phmap::priv::Allocator<phmap::priv::Pair<Key,Value>>,
        4,
        std::mutex
    > dict;
};
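For reference, here is a minimal sketch of how one might double-check that the OpenMP pragmas are actually active in a build and that a `parallel for` loop really runs on more than one thread. The helper names `openmp_enabled` and `distinct_threads_used` are hypothetical and not part of pybind11 or parallel-hashmap. It assumes the standard `_OPENMP` macro, which the compiler defines only when OpenMP is enabled (e.g. with `-fopenmp` on GCC/Clang), and it is only a diagnostic, not a fix.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#ifdef _OPENMP
#include <omp.h>
#endif

// True only if this translation unit was compiled with OpenMP enabled.
// Without it, "#pragma omp" lines are ignored and loops run serially.
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Runs a parallel for loop of n iterations and returns how many distinct
// threads executed at least one iteration.
int distinct_threads_used(long n) {
    assert(n >= 0);
    std::set<int> ids;
    std::mutex m;  // protects ids, which is shared between threads

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
#ifdef _OPENMP
        int id = omp_get_thread_num();
#else
        int id = 0;  // serial build: everything runs on one thread
#endif
        std::lock_guard<std::mutex> lock(m);
        ids.insert(id);
    }
    return static_cast<int>(ids.size());
}
```

If `openmp_enabled()` returns false, or `distinct_threads_used` returns 1 for a large `n` even with `OMP_NUM_THREADS` set above 1, the pragmas are probably not taking effect in that build, whatever the hash map is doing.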