Performance degrades with parallel openmp threads #47

Closed
drlight-code opened this issue Oct 5, 2020 · 6 comments
Labels
invalid This doesn't seem right

Comments

@drlight-code

While searching for the cause of #46, I set OMP_NUM_THREADS=1 to rule out any threading issues. Interestingly, performance improved by about a factor of 6 on this 8-core Intel machine compared to OMP_NUM_THREADS=8.

@neka-nat
Owner

neka-nat commented Oct 5, 2020

Hi,
Thank you for your feedback.
Do you mean that the computation is slower with 8 threads?
Which algorithm is slow?

@drlight-code
Author

Hi, yes, the filterreg algorithm (in the case where the computation fails to converge in the M-step) runs more than 6 times faster on one core than on 8.

@neka-nat
Owner

neka-nat commented Oct 5, 2020

Thanks.
Filterreg uses OpenMP when initializing sigma2 in update_sigma2 mode,
but it doesn't use OpenMP for anything else, so it should have little effect.
This is my result (my environment has 4 cores).

$ fgrep 'cpu cores' /proc/cpuinfo | sort -u | sed 's/.*: //'
4
$ OMP_NUM_THREADS=1 python time_measurement.py
Format = auto
Extension = pcd
geometry::PointCloud with 1406 points.
ICP(Open3D):  0.057436556002357975
CPD:  2.887935592996655
SVR:  0.5230609589925734
GMMTree:  0.4820108529966092
FilterReg:  0.03199184100958519
$ OMP_NUM_THREADS=4 python time_measurement.py
Format = auto
Extension = pcd
geometry::PointCloud with 1406 points.
ICP(Open3D):  0.017392859008396044
CPD:  2.795173263992183
SVR:  0.2235398410120979
GMMTree:  0.2991166110004997
FilterReg:  0.025288498000008985
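
To make the two runs above easier to compare, the timings can be turned into speedup ratios (1-thread time divided by 4-thread time). This is only a minimal sketch that uses the numbers reported above; the dictionary names `t1`, `t4` and `speedup` are made up for illustration.

```python
# Timings in seconds, copied from the two runs above.
t1 = {  # OMP_NUM_THREADS=1
    "ICP(Open3D)": 0.057436556002357975,
    "CPD": 2.887935592996655,
    "SVR": 0.5230609589925734,
    "GMMTree": 0.4820108529966092,
    "FilterReg": 0.03199184100958519,
}
t4 = {  # OMP_NUM_THREADS=4
    "ICP(Open3D)": 0.017392859008396044,
    "CPD": 2.795173263992183,
    "SVR": 0.2235398410120979,
    "GMMTree": 0.2991166110004997,
    "FilterReg": 0.025288498000008985,
}

# A ratio above 1 means the 4-thread run was faster than the 1-thread run.
speedup = {name: t1[name] / t4[name] for name in t1}

for name, ratio in speedup.items():
    print(f"{name:12s} {ratio:.2f}x")
```

For these numbers every ratio comes out above 1, so none of the algorithms got slower with 4 threads in this environment.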

@drlight-code
Author

As far as I can tell from the source code, filterreg uses OpenMP for both the pt2pt objective (kabsch.cc) and the pt2pl objective (point_to_plane.cc). If I run the minimal example I posted in #46 with timing measurement added, I get the following results.

$ OMP_NUM_THREADS=8 python test_pt2pl.py 
read frames: 2
time for registration: 15.1566481590271 seconds

$ OMP_NUM_THREADS=1 python test_pt2pl.py 
read frames: 2
time for registration: 2.2706847190856934 seconds
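
The same kind of comparison can be scripted so that each thread count runs in a fresh process. The following is a rough sketch, not part of probreg: `time_with_threads` is a hypothetical helper, and the command it runs here is a placeholder workload rather than the actual test script. Note that it measures the whole process (including interpreter start-up and file loading), not just the registration call.

```python
import os
import subprocess
import sys
import time


def time_with_threads(cmd, num_threads):
    """Run cmd in a fresh process with OMP_NUM_THREADS set; return wall time in seconds."""
    # Copy the current environment and override only OMP_NUM_THREADS.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder workload; swap in e.g. [sys.executable, "test_pt2pl.py"].
    cmd = [sys.executable, "-c", "print(sum(range(10**6)))"]
    timings = {n: time_with_threads(cmd, n) for n in (1, 8)}
    for n, seconds in timings.items():
        print(f"OMP_NUM_THREADS={n}: {seconds:.3f} s")
    print(f"ratio 8 threads / 1 thread: {timings[8] / timings[1]:.2f}")
```

Setting the variable through the child's environment avoids any question of whether the OpenMP runtime had already been initialized before the value changed.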

@neka-nat
Owner

neka-nat commented Oct 6, 2020

Thanks.
I remember.
I tested the pt2pl case with your code and it seemed to be fine.

import os
import time
import copy

import numpy as np
import open3d as o3d
import igl

from probreg import filterreg

# Uncomment to enable probreg debug logging.
#import logging
#log = logging.getLogger('probreg')
#log.setLevel(logging.DEBUG)

frame_vertices = []
frame_normals = []

# Load vertices (v) and normals (n) from the two OBJ frames.
for filename in ['pt2pl-no-converge/frame00019.obj', 'pt2pl-no-converge/frame00018.obj']:
    [v, _, n, _, _, _] = igl.read_obj(filename)
    frame_vertices.append(o3d.utility.Vector3dVector(v))
    frame_normals.append(o3d.utility.Vector3dVector(n))

print('read frames: ' + str(len(frame_vertices)))

# Build the source and target point clouds, each with per-point normals.
test_source = o3d.geometry.PointCloud()
test_source.points = frame_vertices[0]
test_source.normals = frame_normals[0]

test_target = o3d.geometry.PointCloud()
test_target.points = frame_vertices[1]
test_target.normals = frame_normals[1]

# Time a single point-to-plane FilterReg registration (wall-clock seconds).
s = time.time()
mstepresult = filterreg.registration_filterreg(test_source, test_target,
                                               target_normals=test_target.normals,
                                               maxiter=1000,
                                               sigma2=0.1,
                                               tol=0.001,
                                               objective_type='pt2pl',
                                               callbacks=[])
print(time.time() - s)

$ OMP_NUM_THREADS=1 python test2.py
read frames: 2
3.148405075073242
$ OMP_NUM_THREADS=4 python test2.py
read frames: 2
3.144928216934204
$ OMP_NUM_THREADS=8 python test2.py
read frames: 2
4.1369547843933105

[Image "openmp": CPU usage history.]
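
Each setting above was timed once with time.time(), so some of the difference between runs may be noise. One way to reduce that is to repeat the measurement and take the median. This is only a sketch under that assumption; `median_runtime` is a hypothetical helper, and the workload in the example is a stand-in for the registration call.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Call fn() `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder workload; in the script above this would wrap the
    # filterreg.registration_filterreg(...) call in a function or lambda.
    print(median_runtime(lambda: sum(i * i for i in range(100_000))))
```

The median is less sensitive than the mean to a single slow run caused by other processes on the machine.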

@drlight-code
Author

For the record, my compiler (output of gcc -v):

Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6) 

@neka-nat added the "invalid" (This doesn't seem right) label on Oct 15, 2020