gpu report for fast collision detection

1 Environment description

1.1 Experimental environment

Graphics Card: GTX 1060 3GB

CPU: AMD 3090xt

Memory: 64GB

Environmental certification:

GPU proof

CPU proof

1.2 Timing Description

Timing has two standards

The first criterion is: start sending data to the gpu, until the calculation result is completely sent back to the cpu

The second criterion is: the execution time of the kernel function.

2 Experiment ideas and results

2.1 Self-collision detection

2.1.1 GPU idea 1

In the implementation of caffe, Jia Yangqing said: expand the data in the CPU, and minimize the operation of the GPU to access the video memory, so that the performance can be improved. But improving the performance will also bring about the memory usage of the CPU and the memory usage of the video memory. It may cause you to have to divide the data into blocks, so that the process of collision detection becomes two for loops, each data block and the collision between each block.

Brief description of ideas:

1: The data is expanded, and the sequence of vertex data of each triangle is sent to the GPU, rather than the sequence of vertex index and all vertex data of each triangle.
2. Send the data to the gpu, and perform intersection through the tri_contact function.

method advantages

1. In theory, by significantly reducing the number of times the GPU accesses the global video memory (although the GPU will have a delay mask, it cannot be completely eliminated), the waste of time can be reduced.

method bad:

1. A lot of video memory and memory are wasted, but the waste is within the limit of constant times, and it has not increased by an order of magnitude.

experiment result timing:

total GPU time Kernel time

Experimental results show

The kernel function is as follows: This code only uses an atomicadd operation, but the code has calculation results when a single face collision exceeds 32, and errors may occur.

__global__ void gpu_selfcheck(
vec3fcu* data,
vec3icu* dataid,
vec2fcu* res,
int* ressize,
int leftstart,
int leftsize,
int size)
{
int blockid = gridDim.x*blockIdx.y + blockIdx.x;
int threadid = blockDim.x * threadIdx.y + threadIdx.x ;
int threadsize = blockDim.x * blockDim.y;
__shared__ int mutexx;
__shared__ int contacted[32] ;
mutexx = 0;

if(blockid<leftsize && (blockid+leftstart)<size)
{

int leftcur = (leftstart + blockid)*3;
vec3fcu leftdata1 = data[ leftcur ] ;
vec3fcu leftdata2 = data[ leftcur + 1] ;
vec3fcu leftdata3 = data[ leftcur + 2] ;
vec3icu leftid = dataid[leftstart+blockid];

for( int i = leftcur+threadid*3+3 ; i < size*3 ; i += threadsize*3 )
{

vec3fcu next1 = data[i];
vec3fcu next2 = data[i + 1];
vec3fcu next3 = data[i + 2];
vec3icu nextid = dataid[int(i/3)];

bool cons = true;
for(int g = 0 ; g< 3; g++)
{
for(int h = 0 ; h < 3 ; h++)
{
if(leftid[g]==nextid[h])
{
cons = false;
break ;
}
}
}

if(cons&&cutri_contact(leftdata1,leftdata2,leftdata3, next1,next2,next3))
{
int collision_id = atomicAdd(&mutexx,1) ;
contacted[collusion_id] = i/3;

}

}

if(threadid<mutexx)
{
res[blockid*32+threadid+1] = vec2fcu(leftcur/3,contacted[threadid]);
}
if(threadid==0)
{
res[blockid*32] = vec2fcu(mutexx,leftcur/3);
*ressize += mutexx;
}

}
}

2.1.2 gpu idea 2

For the most common idea, it is to inherit the teacher's code in its entirety, but provide GPU support for some key data results and corresponding methods of key data structures. In the implementation process, I used a lot of Lock struct (see Cuda by example) and a lot of synchronization, but after the teacher pointed out, I used the return value of atomicAdd to refactor all the code and eliminate all synchronization.

Brief description of ideas:

1: Send each triangle's vertex index sequence and all vertex data to the GPU
2. Construct triangle data on the gpu side, and use the tri_contact function to intersect.

method advantages

1. Moderate, low memory and video memory usage.

method bad:

1. The gpu needs to reconstruct the entire triangle from the vertex data through the vertex index inside the triangle on the gpu side.

experiment result timing:

total GPU time kernel function time

Experimental results show

__global__ void gpu_self_check_using_trifs(vec3fcu* vtxs,
vec3icu* dataid,
vec2fcu* res,
int* ressize,
int leftstart,
int leftsize,
int size)
{
int blockid = gridDim.x*blockIdx.y + blockIdx.x;
int threadid = blockDim.x * threadIdx.y + threadIdx.x ;
int threadsize = blockDim.x * blockDim.y;
__shared__ int mutexx;
__shared__ int contacted[32];
mutexx = 0;

if(blockid<leftsize && (blockid+leftstart)<size)
{

int leftcur = leftstart + blockid;
vec3fcu leftdata1 = vtxs[dataid[leftcur].x];
vec3fcu leftdata2 = vtxs[dataid[leftcur].y];
vec3fcu leftdata3 = vtxs[dataid[leftcur].z];
vec3icu leftid = dataid[leftcur];

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.idea		.idea
BVH		BVH
cmake-build-debug		cmake-build-debug
cmake-build-release		cmake-build-release
cmake		cmake
cudalib		cudalib
external		external
images		images
inc		inc
lib		lib
set_result		set_result
src		src
CMakeLists.txt		CMakeLists.txt
README.md		README.md
objCDchecker.sln		objCDchecker.sln
objChecker.vcxproj		objChecker.vcxproj
objChecker.vcxproj.filters		objChecker.vcxproj.filters
objChecker.vcxproj.user		objChecker.vcxproj.user
vssver2.scc		vssver2.scc

lidan233/objCDchecker

Folders and files

Latest commit

History

Repository files navigation

gpu report for fast collision detection

1 Environment description

1.1 Experimental environment

1.2 Timing Description

2 Experiment ideas and results

2.1 Self-collision detection

2.1.1 GPU idea 1

2.1.2 gpu idea 2

About

Resources

Stars

Watchers

Forks

Languages