Currently, the distance computation kernels are not perfectly load balanced. Most kernels follow the same pattern: launch one thread per element of one operand, and have each thread loop over the points/segments of the corresponding element in the other operand. This can slow down the operation when the data are unbalanced/skewed.
Instead, the kernels should launch enough threads that each thread computes exactly one point-point/point-segment/segment-segment pair, then use atomic operations to aggregate the results. This avoids the slowdown when the data are skewed.
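To make the two patterns concrete, here is a minimal CPU-side C++ sketch for pairwise point-to-linestring distance. It is not the actual kernel code: the names (`Point`, `pairwise_distance_looped`, `pairwise_distance_balanced`, `atomic_min`) and the offsets-based linestring layout are assumptions made for illustration. Each iteration of the outer loop stands in for one GPU thread, and the compare-and-swap loop stands in for a device-side atomic min.

```cpp
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };

// Squared distance from point p to the segment a-b.
double point_segment_dist2(Point p, Point a, Point b) {
  double dx = b.x - a.x, dy = b.y - a.y;
  double len2 = dx * dx + dy * dy;
  double t = len2 > 0.0 ? ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2 : 0.0;
  t = std::max(0.0, std::min(1.0, t));  // clamp the projection onto the segment
  double cx = a.x + t * dx - p.x, cy = a.y + t * dy - p.y;
  return cx * cx + cy * cy;
}

// Atomic min via a compare-and-swap loop (stand-in for a device atomic).
void atomic_min(std::atomic<double>& target, double v) {
  double cur = target.load();
  while (v < cur && !target.compare_exchange_weak(cur, v)) {}
}

// Assumed layout: linestring i owns vertices [offsets[i], offsets[i + 1]).

// Current pattern: one "thread" per pair, looping over that pair's segments.
// A pair with a very long linestring makes its thread the straggler.
std::vector<double> pairwise_distance_looped(
    const std::vector<Point>& points,
    const std::vector<std::size_t>& offsets,
    const std::vector<Point>& vertices) {
  std::vector<double> out(points.size());
  for (std::size_t i = 0; i < points.size(); ++i) {  // one thread per pair
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t v = offsets[i]; v + 1 < offsets[i + 1]; ++v)
      best = std::min(best, point_segment_dist2(points[i], vertices[v], vertices[v + 1]));
    out[i] = std::sqrt(best);
  }
  return out;
}

// Proposed pattern: one "thread" per vertex (i.e. per candidate segment).
// Each thread finds its owning pair by binary search on the offsets,
// computes one point-segment distance, and aggregates with an atomic min.
std::vector<double> pairwise_distance_balanced(
    const std::vector<Point>& points,
    const std::vector<std::size_t>& offsets,
    const std::vector<Point>& vertices) {
  std::vector<std::atomic<double>> best(points.size());
  for (auto& b : best) b.store(std::numeric_limits<double>::infinity());
  for (std::size_t v = 0; v < vertices.size(); ++v) {  // one thread per vertex
    std::size_t pair =
        std::upper_bound(offsets.begin(), offsets.end(), v) - offsets.begin() - 1;
    if (v + 1 >= offsets[pair + 1]) continue;  // last vertex starts no segment
    atomic_min(best[pair], point_segment_dist2(points[pair], vertices[v], vertices[v + 1]));
  }
  std::vector<double> out;
  out.reserve(points.size());
  for (auto& b : best) out.push_back(std::sqrt(b.load()));
  return out;
}
```

In the balanced version every thread does the same amount of work regardless of how the segments are distributed across pairs. The price is a binary search per thread plus atomic contention on the per-pair result.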
This work requires further investigation into the performance of the load-balanced kernel. According to a recent benchmark, the "loop-segment" kernels perform better than the load-balanced kernel if the dataset is not skewed. However, there is still room to optimize the load-balanced kernel.