stream create, copy and destroy example #3470
Comments
It looks like your test time increases linearly with the number of buffers you allocate, so this does not look like a hipStream issue.
In the test, the two datacenter GPUs are not installed on the same host, so I am not sure whether the different hosts affect the execution time.
RTX3090
Create+Copy+Synchronize+Destroy time for 1 streams and 1 buffers and 128 iterations 0.0128745 (ms)
gfx1030
Create+Copy+Synchronize+Destroy time for 1 streams and 1 buffers and 128 iterations 1.99878 (ms)
Yes, most of the time is spent on the data copy. I updated the summary of the issue.
@bdenhollander Which profiler are you using?
The screenshot is from Visual Studio 2019's built-in profiler.
Hi @jinz2014, if you want to profile the performance of multiple streams, you should do so with kernels that perform computation, as these can be overlapped with memory transfers. In this case you should also pin your host memory with
Thanks.
Running the stream create, copy and destroy example shows that the time on an MI210 is about 2X-3X longer than the time on an Nvidia A100 for the following cases (roughly 2.1x to 2.5x for the numbers listed). Thanks for your comments and suggestions.
Link:
https://github.com/zjin-lcf/HeCBench/tree/master/src/streamCreateCopyDestroy-hip/
MI210
Create+Copy+Synchronize+Destroy time for 1 streams and 5000 buffers and 16 iterations 49.6401 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 5000 buffers and 8 iterations 50.2982 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 5000 buffers and 4 iterations 57.4719 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 5000 buffers and 2 iterations 54.3432 (ms)
https://github.com/zjin-lcf/HeCBench/tree/master/src/streamCreateCopyDestroy-cuda
A100
Create+Copy+Synchronize+Destroy time for 1 streams and 5000 buffers and 16 iterations 23.3694 (ms)
Create+Copy+Synchronize+Destroy time for 2 streams and 5000 buffers and 8 iterations 23.2853 (ms)
Create+Copy+Synchronize+Destroy time for 4 streams and 5000 buffers and 4 iterations 23.38 (ms)
Create+Copy+Synchronize+Destroy time for 8 streams and 5000 buffers and 2 iterations 23.2302 (ms)