The opal_hwloc_base_get_npus() function needs to count the number of bits turned on in a cpuset.
This patch replaces a linear bit-counting loop with a call to hwloc_bitmap_weight(), an existing population count routine
that counts the set bits a word at a time instead of testing each bit individually.
I did not measure the performance gain, but as per-node core counts keep going up, this can't hurt.
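
To illustrate the difference between the two approaches, here is a minimal standalone sketch. It is not the Open MPI or hwloc code: the array-of-words bitmap and the names count_bits_linear(), popcount_word() and count_bits_by_word() are hypothetical and exist only for this example. The per-word step uses a portable clear-lowest-set-bit loop; a real implementation may use a different technique.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpuset: a plain array of words.
 * This is an assumption for illustration, not hwloc's internal layout. */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Linear approach: test every bit position, one at a time. */
static int count_bits_linear(const unsigned long *words, size_t nwords)
{
    int count = 0;
    for (size_t bit = 0; bit < nwords * WORD_BITS; bit++) {
        if (words[bit / WORD_BITS] & (1UL << (bit % WORD_BITS))) {
            count++;
        }
    }
    return count;
}

/* Population count of a single word: each iteration clears the lowest
 * set bit, so the loop runs once per set bit rather than once per bit. */
static int popcount_word(unsigned long w)
{
    int count = 0;
    while (w != 0) {
        w &= w - 1;
        count++;
    }
    return count;
}

/* Word-at-a-time approach: sum the population count of each word. */
static int count_bits_by_word(const unsigned long *words, size_t nwords)
{
    int count = 0;
    for (size_t i = 0; i < nwords; i++) {
        count += popcount_word(words[i]);
    }
    return count;
}
```

Both functions return the same count for any input; the second simply does less work per word, and entirely empty words cost a single comparison. In the actual patch the counting is delegated to hwloc_bitmap_weight(), which returns the number of indexes set in the bitmap, so none of the helper code above is needed there.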
See pull request #1297