Core: improve CUDA code with C++11 features #529

Open
GPMueller opened this issue May 23, 2019 · 1 comment · May be fixed by #559

Comments

GPMueller commented May 23, 2019

See https://devblogs.nvidia.com/power-cpp11-cuda-7 for a great summary.
To use lambdas properly in the backend wrapper functions, the code has to be compiled with the nvcc flag --expt-extended-lambda, which requires CUDA Toolkit 7.5 or newer (see here).

The generic parallel lambda application function might look like

template<typename Lambda, typename... Args>
__global__
void cu_parallel_execution_kernel(size_t N, Lambda lambda, Args... args)
{
    // Compute the global index as size_t so the comparison with N is unsigned on both sides
    size_t idx = size_t(blockIdx.x) * blockDim.x + threadIdx.x;
    if( idx < N )
    {
        lambda(idx, args...);
    }
}

template<typename Lambda, typename... Args>
void cu_parallel_execution(size_t N, Lambda lambda, Args... args)
{
    // One thread per element, 1024 threads per block, block count rounded up
    cu_parallel_execution_kernel<<<(N+1023)/1024, 1024>>>(N, lambda, args...);
    CU_CHECK_AND_SYNC();
}

and backend wrapper functions would look like

void normalize_vectors(vectorfield & vf)
{
    auto lambda = [=] __device__ (size_t idx, Vector3 * vf) {
        vf[idx].normalize();
    };
    size_t n = vf.size();
    // The host wrapper chooses the launch configuration, so it is called like a normal function
    cu_parallel_execution(n, lambda, vf.data());
}

or

void add_c_cross(const scalar & c, const vectorfield & a, const vectorfield & b, vectorfield & out)
{
    auto lambda = [=] __device__ (size_t idx, scalar c, const Vector3 * a, const Vector3 * b, Vector3 * out)
    {
        out[idx] += c*a[idx].cross(b[idx]);
    };
    size_t n = out.size();
    // c is passed by value; the fields are passed as raw device pointers
    cu_parallel_execution(n, lambda, c, a.data(), b.data(), out.data());
}
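
The variadic forwarding used by these wrappers can be checked on the host without a GPU. The following is only a minimal C++11 sketch under stated assumptions: `host_parallel_execution` is a hypothetical serial stand-in for `cu_parallel_execution`, and `Vec3` is a hypothetical minimal replacement for `Vector3`; neither is part of the codebase. The lambda keeps the same signature shape as in `add_c_cross` above (index first, then the forwarded arguments), with the `__device__` annotation dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal stand-in for the project's Vector3 type
struct Vec3
{
    double x, y, z;

    Vec3 cross(const Vec3 & o) const
    {
        return { y * o.z - z * o.y, z * o.x - x * o.z, x * o.y - y * o.x };
    }
};

// Hypothetical serial stand-in for cu_parallel_execution:
// calls lambda(idx, args...) once for every idx in [0, N)
template<typename Lambda, typename... Args>
void host_parallel_execution(std::size_t N, Lambda lambda, Args... args)
{
    for( std::size_t idx = 0; idx < N; ++idx )
    {
        lambda(idx, args...);
    }
}

// Same structure as the add_c_cross wrapper: out[i] += c * (a[i] x b[i])
void add_c_cross(double c, const std::vector<Vec3> & a, const std::vector<Vec3> & b, std::vector<Vec3> & out)
{
    auto lambda = [](std::size_t idx, double factor, const Vec3 * pa, const Vec3 * pb, Vec3 * pout)
    {
        const Vec3 axb = pa[idx].cross(pb[idx]);
        pout[idx].x += factor * axb.x;
        pout[idx].y += factor * axb.y;
        pout[idx].z += factor * axb.z;
    };
    host_parallel_execution(out.size(), lambda, c, a.data(), b.data(), out.data());
}
```

Calling `add_c_cross(2.0, a, b, out)` with `a = {(1,0,0)}`, `b = {(0,1,0)}` and `out = {(0,0,1)}` adds twice the unit z vector to `out[0]`. A serial stand-in like this might also make it easier to keep the CPU and CUDA backends structurally identical, though that is a design choice beyond what this issue proposes.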
@GPMueller GPMueller linked a pull request Mar 28, 2020 that will close this issue
@GPMueller GPMueller linked a pull request May 25, 2020 that will close this issue