Performance degrades with parallel openmp threads #47

drlight-code · 2020-10-05T14:57:29Z

While searching for the cause of #46, I set OMP_NUM_THREADS=1 to rule out threading issues. Interestingly, performance improved by about a factor of 6 on this 8-core Intel machine compared to OMP_NUM_THREADS=8.
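Comparing thread counts like this is easiest when each run starts a fresh process, because the OpenMP runtime reads OMP_NUM_THREADS when it is initialized. The following is a minimal sketch using only the Python standard library. `time_under_thread_counts` is a hypothetical helper, not part of probreg, and the command you pass in (for example a benchmark script) is up to you.

```python
import os
import subprocess
import sys
import time


def time_under_thread_counts(cmd, thread_counts):
    """Run `cmd` once per thread count, each in a fresh subprocess.

    OMP_NUM_THREADS is set in the child's environment so the OpenMP runtime
    sees it at start-up. Returns {n_threads: (wall_seconds, stdout_text)}.
    """
    results = {}
    for n in thread_counts:
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        start = time.perf_counter()
        proc = subprocess.run(cmd, env=env, capture_output=True,
                              text=True, check=True)
        results[n] = (time.perf_counter() - start, proc.stdout)
    return results
```

For example, `time_under_thread_counts([sys.executable, "test_pt2pl.py"], [1, 8])` would time the script from #46 under both settings. The wall time includes interpreter and import start-up, which is the same for every thread count.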


neka-nat · 2020-10-05T15:19:01Z

Hi,
Thank you for your feedback.
Do you mean that the computation with 8 threads is slower?
Which algorithm is slow?

drlight-code · 2020-10-05T15:29:59Z

Hi, yes: the filterreg algorithm (in the case where the computation fails to converge in the M-step) runs more than 6 times faster with one thread than with 8.

neka-nat · 2020-10-05T15:44:55Z

Thanks.
FilterReg uses OpenMP when initializing sigma2 in update_sigma2 mode but nowhere else, so the number of threads should have little effect.
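This reasoning can be made quantitative with Amdahl's law, a standard model that the thread itself does not mention. If a fraction p of the runtime is parallelized across n threads, the best possible speedup is 1 / ((1 - p) + p / n). The fractions below are hypothetical, since no one in the thread measured how much of a FilterReg run the sigma2 initialization takes.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Upper bound on speedup when only `parallel_fraction` of the runtime
    scales with the number of threads (Amdahl's law, zero overhead)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)


# Hypothetical parallel fractions, not measured values.
for p in (0.05, 0.5, 0.95):
    print(f"p={p:.2f}: at most {amdahl_speedup(p, 8):.2f}x faster on 8 threads")
```

The model assumes that parallelism costs nothing, so it can never predict a slowdown. A 6x slowdown like the one reported must therefore come from costs the model leaves out, such as thread start-up or synchronization.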
Here are my results (my machine has 4 cores):

$ fgrep 'cpu cores' /proc/cpuinfo | sort -u | sed 's/.*: //'
4
$ OMP_NUM_THREADS=1 python time_measurement.py
Format = auto
Extension = pcd
geometry::PointCloud with 1406 points.
ICP(Open3D):  0.057436556002357975
CPD:  2.887935592996655
SVR:  0.5230609589925734
GMMTree:  0.4820108529966092
FilterReg:  0.03199184100958519
$ OMP_NUM_THREADS=4 python time_measurement.py
Format = auto
Extension = pcd
geometry::PointCloud with 1406 points.
ICP(Open3D):  0.017392859008396044
CPD:  2.795173263992183
SVR:  0.2235398410120979
GMMTree:  0.2991166110004997
FilterReg:  0.025288498000008985
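To compare the two runs above method by method, you can parse the output and compute the ratio of 1-thread to 4-thread time. The following sketch uses only the numbers printed above. `parse_timings` is a hypothetical helper written for the `Name:  seconds` format shown, not part of probreg.

```python
def parse_timings(text):
    """Parse 'Name:  seconds' lines as printed by time_measurement.py.

    Lines whose text after the last ':' is not a number (for example
    'Format = auto' or 'geometry::PointCloud with 1406 points.') are skipped.
    """
    timings = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            timings[name.strip()] = float(value)
        except ValueError:
            continue
    return timings


# Output copied from the OMP_NUM_THREADS=1 and OMP_NUM_THREADS=4 runs above.
ONE_THREAD = """\
geometry::PointCloud with 1406 points.
ICP(Open3D):  0.057436556002357975
CPD:  2.887935592996655
SVR:  0.5230609589925734
GMMTree:  0.4820108529966092
FilterReg:  0.03199184100958519
"""
FOUR_THREADS = """\
geometry::PointCloud with 1406 points.
ICP(Open3D):  0.017392859008396044
CPD:  2.795173263992183
SVR:  0.2235398410120979
GMMTree:  0.2991166110004997
FilterReg:  0.025288498000008985
"""

t1 = parse_timings(ONE_THREAD)
t4 = parse_timings(FOUR_THREADS)
speedups = {name: t1[name] / t4[name] for name in t1}
for name, s in speedups.items():
    print(f"{name:12s} {s:.2f}x faster with 4 threads")
```

By these figures, FilterReg gains roughly 1.3x from 4 threads, while ICP(Open3D) gains roughly 3.3x. Each figure comes from a single run, so small differences may be noise.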

drlight-code · 2020-10-06T05:42:28Z

As far as I can tell from the source code, filterreg uses OpenMP both for the pt2pt objective (kabsch.cc) and for the pt2pl objective (point_to_plane.cc). If I run the minimal example I posted in #46 with timing measurements added, I get the following results.

$ OMP_NUM_THREADS=8 python test_pt2pl.py 
read frames: 2
time for registration: 15.1566481590271 seconds

$ OMP_NUM_THREADS=1 python test_pt2pl.py 
read frames: 2
time for registration: 2.2706847190856934 seconds
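In these timings, the single-thread run finishes about 6.7x faster (15.16 s vs. 2.27 s). Until the cause is found, you can get the single-thread behaviour without the shell prefix by setting OMP_NUM_THREADS inside the script. This is a minimal sketch. `pin_openmp_threads` is a hypothetical helper, and it assumes that probreg's extension (and possibly open3d) loads an OpenMP runtime that reads the variable at initialization. For that reason it has to run before those imports.

```python
import os
import sys
import warnings


def pin_openmp_threads(n=1):
    """Set OMP_NUM_THREADS for the current process.

    The OpenMP runtime reads OMP_NUM_THREADS when it is initialized, so this
    must run before importing modules that load it (assumed here: probreg
    and open3d). Warn if one of them is already imported.
    """
    already = [m for m in ("probreg", "open3d") if m in sys.modules]
    if already:
        warnings.warn(f"{', '.join(already)} already imported; "
                      "OMP_NUM_THREADS may not take effect", RuntimeWarning)
    os.environ["OMP_NUM_THREADS"] = str(n)


pin_openmp_threads(1)
# from probreg import filterreg  # import only after pinning
```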

neka-nat · 2020-10-06T11:23:58Z

Thanks.
I remember that example.
I tested the pt2pl case with your code, and it seemed to be fine.

import os
import time
import copy

import numpy as np
import open3d as o3d
import igl

from probreg import filterreg

#import logging
#log = logging.getLogger('probreg')
#log.setLevel(logging.DEBUG)

frame_vertices = []
frame_normals = []

for filename in ['pt2pl-no-converge/frame00019.obj', 'pt2pl-no-converge/frame00018.obj']:
    [v, _, n, _, _, _] = igl.read_obj(filename)
    frame_vertices.append(o3d.utility.Vector3dVector(v))
    frame_normals.append(o3d.utility.Vector3dVector(n))

print('read frames: ' + str(len(frame_vertices)))

test_source = o3d.geometry.PointCloud()
test_source.points = frame_vertices[0]
test_source.normals = frame_normals[0]

test_target = o3d.geometry.PointCloud()
test_target.points = frame_vertices[1]
test_target.normals = frame_normals[1]

s = time.time()
mstepresult = filterreg.registration_filterreg(test_source, test_target,
                                               target_normals=test_target.normals,
                                               maxiter=1000,
                                               sigma2=0.1,
                                               tol=0.001,
                                               objective_type='pt2pl',
                                               callbacks=[])
print(time.time() - s)

$ OMP_NUM_THREADS=1 python test2.py
read frames: 2
3.148405075073242
$ OMP_NUM_THREADS=4 python test2.py
read frames: 2
3.144928216934204
$ OMP_NUM_THREADS=8 python test2.py
read frames: 2
4.1369547843933105
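Single wall-clock measurements such as 3.148 s vs. 3.145 s can differ through run-to-run noise alone. This minimal sketch repeats a measurement in-process and reports the minimum and median, using only the standard library. `time_repeated` is a hypothetical helper name, not part of probreg.

```python
import statistics
import time


def time_repeated(fn, repeats=5):
    """Call fn() `repeats` times and return (min, median) wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples)


# Example with the registration call from the script above
# (requires the .obj frames and the variables defined there):
# best, med = time_repeated(lambda: filterreg.registration_filterreg(
#     test_source, test_target, target_normals=test_target.normals,
#     maxiter=1000, sigma2=0.1, tol=0.001, objective_type='pt2pl',
#     callbacks=[]))
```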

[Screenshot: CPU usage history]

drlight-code · 2020-10-07T08:25:57Z

For the record:

Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6)

neka-nat added the invalid label ("This doesn't seem right") on Oct 15, 2020

neka-nat closed this as completed Oct 18, 2020
