
use customised stream in thrust APIs #2

Closed
lgarithm opened this issue Sep 29, 2020 · 3 comments
@lgarithm (Collaborator)

@xiaoming-qxm @baoleai We can use thrust::cuda::par.on(stream) to run Thrust algorithms on a user-supplied CUDA stream.

e.g.

#include <cuda_runtime.h>

#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/transform.h>

template <typename T>
class cap_by
{
    const T cap;

  public:
    cap_by(const T cap) : cap(cap) {}

    __host__ __device__ T operator()(T x) const
    {
        if (x > cap) { return cap; }
        return x;
    }
};

void f(cudaStream_t stream)
{
    int n = 1 << 10;
    using T = int;
    thrust::device_vector<T> xs(n);
    thrust::device_vector<T> ys(n);

    thrust::sort(thrust::cuda::par.on(stream), xs.begin(), xs.end());
    int k = 5;
    thrust::transform(thrust::cuda::par.on(stream), xs.begin(), xs.end(),
                      ys.begin(), cap_by<T>(k));
}

int main()
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    f(stream);
    cudaStreamDestroy(stream);
    return 0;
}
@xiaoming-qxm

Awesome! So we can port pytorch_quiver to the actor framework easily.

@xiaoming-qxm

But it seems that the Thrust APIs don't support asynchronous execution; the unavoidable synchronization after every algorithm call is a performance killer.

Reference: NVIDIA/thrust#961
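(A possible mitigation, not discussed in the thread: newer Thrust releases ship a non-synchronizing execution policy. A minimal sketch, assuming Thrust >= 1.16 / CUDA 11.6+, where thrust::cuda::par_nosync is available; the helper g and its synchronization point are illustrative, not from the issue.)

```cuda
#include <cuda_runtime.h>

#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sort.h>

void g(cudaStream_t stream, thrust::device_vector<int> &xs)
{
    // par_nosync asks Thrust to skip the trailing stream synchronization
    // where possible. Algorithms that must return a value to the host
    // (e.g. thrust::reduce) still have to block regardless of the policy.
    thrust::sort(thrust::cuda::par_nosync.on(stream), xs.begin(), xs.end());

    // With par_nosync the caller owns the synchronization point: wait on
    // the stream before reading xs on the host.
    cudaStreamSynchronize(stream);
}
```

This keeps the algorithm launch asynchronous with respect to the host, so several calls can be enqueued on the stream before a single synchronization.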

@lgarithm (Collaborator, Author)
