Use libkineto in profiler #46470
Commits on Oct 16, 2020
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels
Test Plan: python test/test_profiler.py
[ghstack-poisoned]
Commit a4d4124
Commits on Oct 27, 2020
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                      sgemm_32x32x32_NN         0.00%       0.000us         0.00%       0.000us       0.000us      12.000us        63.16%      12.000us      12.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.750us        14.47%       2.750us       2.750us             1
                        Memcpy HtoD (Pagable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us       2.250us        11.84%       2.250us       2.250us             1
                        Memcpy DtoH (Device -> Pagable)         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        10.53%       2.000us       2.000us             1
                                               aten::mm        25.87%     364.400ms        25.87%     364.426ms     364.426ms       0.000us         0.00%       0.000us       0.000us             1
                                            aten::empty         0.00%      39.585us         0.00%      39.585us      19.792us       0.000us         0.00%       0.000us       0.000us             2
                                           aten::stride         0.00%       3.363us         0.00%       3.363us       1.121us       0.000us         0.00%       0.000us       0.000us             3
                                              aten::add        74.12%        1.044s        74.12%        1.044s        1.044s       0.000us         0.00%       0.000us       0.000us             1
                                               aten::to         0.00%      13.155us         0.01%     116.398us     116.398us       0.000us         0.00%       0.000us       0.000us             1
                                    aten::empty_strided         0.00%      30.365us         0.00%      30.365us      30.365us       0.000us         0.00%       0.000us       0.000us             1
                                            aten::copy_         0.01%      72.878us         0.01%      72.878us      72.878us       0.000us         0.00%       0.000us       0.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```
[ghstack-poisoned]
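As a reading aid for the table in this test plan: the "Self CUDA %" values appear to be each row's self CUDA time divided by the sum of self CUDA time over all rows. That is an inference from the reported numbers, not something the commit message states. The sketch below redoes the arithmetic in plain Python; the shortened kernel names and the helper `self_cuda_percent` are hypothetical and exist only for this illustration.

```python
# Self CUDA times in microseconds, copied from the four GPU rows of the
# profiler table. The dictionary keys are abbreviated labels chosen for this
# sketch, not names emitted by the profiler.
self_cuda_us = {
    "sgemm_32x32x32_NN": 12.000,
    "vectorized_elementwise_kernel": 2.750,
    "Memcpy HtoD": 2.250,
    "Memcpy DtoH": 2.000,
}


def self_cuda_percent(times_us):
    """Return each entry's share of the summed self CUDA time, in percent.

    Hypothetical helper: assumes the percentage column is row time divided
    by the total over all rows, rounded to two decimals.
    """
    total = sum(times_us.values())
    return {name: round(100.0 * t / total, 2) for name, t in times_us.items()}


percentages = self_cuda_percent(self_cuda_us)
for name, pct in percentages.items():
    print(f"{name:<32}{pct:6.2f}%")
```

Under that assumption the four rows sum to 19.000us and the computed shares match the 63.16%, 14.47%, 11.84% and 10.53% shown in the table.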
Commit e27f74c
Commits on Nov 2, 2020
Update on "Use libkineto in profiler"
Commit 5c3833e
Update on "Use libkineto in profiler"
Commit 662431b
Update on "Use libkineto in profiler"
Commit ea956aa
Update on "Use libkineto in profiler"
Commit 7dfdbc9
Update on "Use libkineto in profiler"
Commit 6725778
Update on "Use libkineto in profiler"
Commit e9a219b
Update on "Use libkineto in profiler"
Commit 49a9fee
Update on "Use libkineto in profiler"
Commit 8edb346
Update on "Use libkineto in profiler"
Commit f288623
Commits on Nov 3, 2020
Update on "Use libkineto in profiler"
Commit 979cdfa
Update on "Use libkineto in profiler"
Commit c8cbeb0
Update on "Use libkineto in profiler"
Commit 226089c
Update on "Use libkineto in profiler"
Commit 266b75f
Update on "Use libkineto in profiler"
Commit 6958eac
Update on "Use libkineto in profiler"
Commit 97e5070
Update on "Use libkineto in profiler"
Commit 8d111d2
Update on "Use libkineto in profiler"
Commit bfb0360
Update on "Use libkineto in profiler"
Commit 1ff1a12
Update on "Use libkineto in profiler"
Commit b3b69d8
Update on "Use libkineto in profiler"
Commit 2faeb8a
Update on "Use libkineto in profiler"
Commit 67c890d
Update on "Use libkineto in profiler"
Commit ed8babe
Update on "Use libkineto in profiler"
Commit ffc11fd
Update on "Use libkineto in profiler"
Commit fe76b84
Update on "Use libkineto in profiler"
Commit 76ee80c
Update on "Use libkineto in profiler"
Commit 5761ea2
Update on "Use libkineto in profiler"
Commit dde5ec3
Update on "Use libkineto in profiler"
Commit 3a25bd2
Update on "Use libkineto in profiler"
Commit 6023998
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels Test Plan: USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install python test/test_profiler.py ``` ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ sgemm_32x32x32_NN 0.00% 0.000us 0.00% 0.000us 0.000us 12.000us 63.16% 12.000us 12.000us 1 void at::native::vectorized_elementwise_kernel<4, at... 0.00% 0.000us 0.00% 0.000us 0.000us 2.750us 14.47% 2.750us 2.750us 1 Memcpy HtoD (Pagable -> Device) 0.00% 0.000us 0.00% 0.000us 0.000us 2.250us 11.84% 2.250us 2.250us 1 Memcpy DtoH (Device -> Pagable) 0.00% 0.000us 0.00% 0.000us 0.000us 2.000us 10.53% 2.000us 2.000us 1 aten::mm 25.87% 364.400ms 25.87% 364.426ms 364.426ms 0.000us 0.00% 0.000us 0.000us 1 aten::empty 0.00% 39.585us 0.00% 39.585us 19.792us 0.000us 0.00% 0.000us 0.000us 2 aten::stride 0.00% 3.363us 0.00% 3.363us 1.121us 0.000us 0.00% 0.000us 0.000us 3 aten::add 74.12% 1.044s 74.12% 1.044s 1.044s 0.000us 0.00% 0.000us 0.000us 1 aten::to 0.00% 13.155us 0.01% 116.398us 116.398us 0.000us 0.00% 0.000us 0.000us 1 aten::empty_strided 0.00% 30.365us 0.00% 30.365us 30.365us 0.000us 0.00% 0.000us 0.000us 1 aten::copy_ 0.01% 72.878us 0.01% 72.878us 72.878us 0.000us 0.00% 0.000us 0.000us 1 ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ``` [ghstack-poisoned]
Commit 0bc66a6
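The `Self CUDA %` column in the profiler output from the commit message appears to be each row's share of the summed `Self CUDA` time. The short sketch below checks that reading against the four GPU rows (the kernel, the elementwise kernel, and the two memcpy entries). The helper name `self_cuda_percent` and the shortened row labels are illustrative assumptions, not part of the profiler's API.

```python
# Self CUDA times in microseconds, copied from the four GPU rows of the table.
# The dictionary keys are shortened labels chosen for this sketch.
self_cuda_us = {
    "sgemm_32x32x32_NN": 12.000,
    "vectorized_elementwise_kernel": 2.750,
    "Memcpy HtoD": 2.250,
    "Memcpy DtoH": 2.000,
}


def self_cuda_percent(times_us):
    """Return each entry's share of the summed self CUDA time, in percent."""
    total = sum(times_us.values())
    return {name: round(100.0 * t / total, 2) for name, t in times_us.items()}


shares = self_cuda_percent(self_cuda_us)
for name, pct in shares.items():
    print(f"{name:<32}{pct:>7.2f}%")
```

The four shares come out to 63.16, 14.47, 11.84 and 10.53, matching the table, with a total self CUDA time of 19.000us.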
Update on "Use libkineto in profiler"
Commit aa2d09e
Update on "Use libkineto in profiler"
Commit 91718ac
Update on "Use libkineto in profiler"
Commit 1556a7c
Update on "Use libkineto in profiler"
Commit 4a0fec9
Update on "Use libkineto in profiler"
Commit bb6396a
Update on "Use libkineto in profiler"
Commit 60b5dee
Update on "Use libkineto in profiler"
Commit e1a5480
Update on "Use libkineto in profiler"
Commit 38a37dd
Update on "Use libkineto in profiler"
Commit c6c6039
Update on "Use libkineto in profiler"
Commit 17767d1
Update on "Use libkineto in profiler"
Commit 3537e9d
Commits on Nov 4, 2020
Update on "Use libkineto in profiler"
Commit 043dcd2
Update on "Use libkineto in profiler"
Commit aa17339
Commits on Nov 11, 2020
Update on "Use libkineto in profiler"
Commit 9262f92
Update on "Use libkineto in profiler"
Commit 8371b33
Update on "Use libkineto in profiler"
Commit 67d4acb
Update on "Use libkineto in profiler"
Commit 9f1d24f
Update on "Use libkineto in profiler"
Commit e864205
Update on "Use libkineto in profiler"
Commit 380b874
Update on "Use libkineto in profiler"
Commit 165bb7c
Commits on Nov 12, 2020
Update on "Use libkineto in profiler"
Commit 445b8c1
Update on "Use libkineto in profiler"
Commit 7c317f5
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py
python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
Name                                                        Self CPU %       Self CPU    CPU total %      CPU total   CPU time avg      Self CUDA    Self CUDA %     CUDA total  CUDA time avg     # of Calls
-------------------------------------------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
sgemm_32x32x32_NN                                                0.00%        0.000us          0.00%        0.000us        0.000us       12.000us         63.16%       12.000us       12.000us              1
void at::native::vectorized_elementwise_kernel<4, at...          0.00%        0.000us          0.00%        0.000us        0.000us        2.750us         14.47%        2.750us        2.750us              1
Memcpy HtoD (Pagable -> Device)                                  0.00%        0.000us          0.00%        0.000us        0.000us        2.250us         11.84%        2.250us        2.250us              1
Memcpy DtoH (Device -> Pagable)                                  0.00%        0.000us          0.00%        0.000us        0.000us        2.000us         10.53%        2.000us        2.000us              1
aten::mm                                                        25.87%      364.400ms         25.87%      364.426ms      364.426ms        0.000us          0.00%        0.000us        0.000us              1
aten::empty                                                      0.00%       39.585us          0.00%       39.585us       19.792us        0.000us          0.00%        0.000us        0.000us              2
aten::stride                                                     0.00%        3.363us          0.00%        3.363us        1.121us        0.000us          0.00%        0.000us        0.000us              3
aten::add                                                       74.12%         1.044s         74.12%         1.044s         1.044s        0.000us          0.00%        0.000us        0.000us              1
aten::to                                                         0.00%       13.155us          0.01%      116.398us      116.398us        0.000us          0.00%        0.000us        0.000us              1
aten::empty_strided                                              0.00%       30.365us          0.00%       30.365us       30.365us        0.000us          0.00%        0.000us        0.000us              1
aten::copy_                                                      0.01%       72.878us          0.01%       72.878us       72.878us        0.000us          0.00%        0.000us        0.000us              1
-------------------------------------------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------

-------------------------------------------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
Name                                                        Self CPU %       Self CPU    CPU total %      CPU total   CPU time avg      Self CUDA    Self CUDA %     CUDA total  CUDA time avg     # of Calls        Node ID
-------------------------------------------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
sgemm_32x32x32_NN                                                0.00%        0.000us          0.00%        0.000us        0.000us       11.000us         64.71%       11.000us       11.000us              1              0
void at::native::vectorized_elementwise_kernel<4, at...          0.00%        0.000us          0.00%        0.000us        0.000us        3.000us         17.65%        3.000us        3.000us              1              0
Memcpy DtoH (Device -> Pageable)                                 0.00%        0.000us          0.00%        0.000us        0.000us        2.000us         11.76%        2.000us        2.000us              1              0
Memcpy HtoD (Pageable -> Device)                                 0.00%        0.000us          0.00%        0.000us        0.000us        1.000us          5.88%        1.000us        1.000us              1              0
aten::mm                                                        13.86%      421.014ms         27.73%      842.019ms      421.010ms        0.000us          0.00%        0.000us        0.000us              2              0
aten::empty                                                      0.00%       25.000us          0.00%       25.000us       12.500us        0.000us          0.00%        0.000us        0.000us              2              0
aten::stride                                                     0.00%        0.000us          0.00%        0.000us        0.000us        0.000us          0.00%        0.000us        0.000us              3              0
aten::add                                                       36.55%         1.110s         73.11%         2.220s         1.110s        0.000us          0.00%        0.000us        0.000us              2              0
aten::to                                                         0.00%        9.000us          0.00%       99.000us       99.000us        0.000us          0.00%        0.000us        0.000us              1              0
aten::empty_strided                                              0.00%       21.000us          0.00%       21.000us       21.000us        0.000us          0.00%        0.000us        0.000us              1              0
aten::copy_                                                      0.00%       69.000us          0.00%      133.000us       66.500us        0.000us          0.00%        0.000us        0.000us              2              0
cudaFree                                                        13.00%      394.907ms         13.00%      394.907ms      394.907ms        0.000us          0.00%        0.000us        0.000us              1              0
cudaDeviceGetAttribute                                           0.00%        1.000us          0.00%        1.000us        0.091us        0.000us          0.00%        0.000us        0.000us             11              0
cudaMalloc                                                       0.02%      632.000us          0.02%      632.000us      210.667us        0.000us          0.00%        0.000us        0.000us              3              0
cudaMemcpy                                                       0.00%       20.000us          0.00%       20.000us       20.000us        0.000us          0.00%        0.000us        0.000us              1              0
cudaEventCreateWithFlags                                         0.00%        9.000us          0.00%        9.000us        0.562us        0.000us          0.00%        0.000us        0.000us             16              0
cudaLaunchKernel                                                36.55%         1.110s         36.55%         1.110s      555.021ms        0.000us          0.00%        0.000us        0.000us              2              0
cudaMemcpyAsync                                                  0.00%       33.000us          0.00%       33.000us       33.000us        0.000us          0.00%        0.000us        0.000us              1              0
cudaStreamSynchronize                                            0.00%        4.000us          0.00%        4.000us        4.000us        0.000us          0.00%        0.000us        0.000us              1              0
-------------------------------------------------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------  -------------
```

[ghstack-poisoned]
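For the second table in this commit message, the `CPU time avg` column appears to be `CPU total` divided by `# of Calls`. The following is a minimal sketch that checks this against a few rows whose totals are printed in microseconds; the helper name `cpu_time_avg` is an illustrative assumption, and the zero-call guard is added only for safety.

```python
# (CPU total in microseconds, number of calls), copied from the second
# profiler table in the commit message.
rows = {
    "aten::empty": (25.000, 2),
    "aten::copy_": (133.000, 2),
    "cudaMalloc": (632.000, 3),
    "cudaDeviceGetAttribute": (1.000, 11),
}


def cpu_time_avg(total_us, calls):
    """Average CPU time per call; hypothetical helper, returns 0.0 for zero calls."""
    return total_us / calls if calls else 0.0


averages = {name: cpu_time_avg(total, calls) for name, (total, calls) in rows.items()}
for name, avg in averages.items():
    print(f"{name:24s} {avg:10.3f}us")
```

Rounded to three decimals these give 12.500us, 66.500us, 210.667us and 0.091us, which agree with the corresponding `CPU time avg` entries.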
Commit c904443
Update on "Use libkineto in profiler"
Commit 1f600f8
Update on "Use libkineto in profiler"
Commit 5aacc1c
Update on "Use libkineto in profiler"
Commit 651f556
Commits on Nov 13, 2020
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py
python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
sgemm_32x32x32_NN                                               0.00%       0.000us         0.00%       0.000us       0.000us      12.000us        63.16%      12.000us      12.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       2.750us        14.47%       2.750us       2.750us             1
Memcpy HtoD (Pagable -> Device)                                 0.00%       0.000us         0.00%       0.000us       0.000us       2.250us        11.84%       2.250us       2.250us             1
Memcpy DtoH (Device -> Pagable)                                 0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        10.53%       2.000us       2.000us             1
aten::mm                                                       25.87%     364.400ms        25.87%     364.426ms     364.426ms       0.000us         0.00%       0.000us       0.000us             1
aten::empty                                                     0.00%      39.585us         0.00%      39.585us      19.792us       0.000us         0.00%       0.000us       0.000us             2
aten::stride                                                    0.00%       3.363us         0.00%       3.363us       1.121us       0.000us         0.00%       0.000us       0.000us             3
aten::add                                                      74.12%        1.044s        74.12%        1.044s        1.044s       0.000us         0.00%       0.000us       0.000us             1
aten::to                                                        0.00%      13.155us         0.01%     116.398us     116.398us       0.000us         0.00%       0.000us       0.000us             1
aten::empty_strided                                             0.00%      30.365us         0.00%      30.365us      30.365us       0.000us         0.00%       0.000us       0.000us             1
aten::copy_                                                     0.01%      72.878us         0.01%      72.878us      72.878us       0.000us         0.00%       0.000us       0.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls       Node ID
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
sgemm_32x32x32_NN                                               0.00%       0.000us         0.00%       0.000us       0.000us      11.000us        64.71%      11.000us      11.000us             1             0
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.000us        17.65%       3.000us       3.000us             1             0
Memcpy DtoH (Device -> Pageable)                                0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        11.76%       2.000us       2.000us             1             0
Memcpy HtoD (Pageable -> Device)                                0.00%       0.000us         0.00%       0.000us       0.000us       1.000us         5.88%       1.000us       1.000us             1             0
aten::mm                                                       13.86%     421.014ms        27.73%     842.019ms     421.010ms       0.000us         0.00%       0.000us       0.000us             2             0
aten::empty                                                     0.00%      25.000us         0.00%      25.000us      12.500us       0.000us         0.00%       0.000us       0.000us             2             0
aten::stride                                                    0.00%       0.000us         0.00%       0.000us       0.000us       0.000us         0.00%       0.000us       0.000us             3             0
aten::add                                                      36.55%        1.110s        73.11%        2.220s        1.110s       0.000us         0.00%       0.000us       0.000us             2             0
aten::to                                                        0.00%       9.000us         0.00%      99.000us      99.000us       0.000us         0.00%       0.000us       0.000us             1             0
aten::empty_strided                                             0.00%      21.000us         0.00%      21.000us      21.000us       0.000us         0.00%       0.000us       0.000us             1             0
aten::copy_                                                     0.00%      69.000us         0.00%     133.000us      66.500us       0.000us         0.00%       0.000us       0.000us             2             0
cudaFree                                                       13.00%     394.907ms        13.00%     394.907ms     394.907ms       0.000us         0.00%       0.000us       0.000us             1             0
cudaDeviceGetAttribute                                          0.00%       1.000us         0.00%       1.000us       0.091us       0.000us         0.00%       0.000us       0.000us            11             0
cudaMalloc                                                      0.02%     632.000us         0.02%     632.000us     210.667us       0.000us         0.00%       0.000us       0.000us             3             0
cudaMemcpy                                                      0.00%      20.000us         0.00%      20.000us      20.000us       0.000us         0.00%       0.000us       0.000us             1             0
cudaEventCreateWithFlags                                        0.00%       9.000us         0.00%       9.000us       0.562us       0.000us         0.00%       0.000us       0.000us            16             0
cudaLaunchKernel                                               36.55%        1.110s        36.55%        1.110s     555.021ms       0.000us         0.00%       0.000us       0.000us             2             0
cudaMemcpyAsync                                                 0.00%      33.000us         0.00%      33.000us      33.000us       0.000us         0.00%       0.000us       0.000us             1             0
cudaStreamSynchronize                                           0.00%       4.000us         0.00%       4.000us       4.000us       0.000us         0.00%       0.000us       0.000us             1             0
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

[ghstack-poisoned]
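As a reading aid for the tables above, here is a minimal Python sketch of how the `Self CUDA %` column appears to relate to the `Self CUDA` column. It assumes (inferred from the printed numbers, not taken from the profiler source) that each percentage is the row's self CUDA time divided by the sum of self CUDA times over all rows. The helper name `self_cuda_percent` is hypothetical, and the kernel names are shortened for readability.

```python
# Self CUDA times in microseconds, copied from the first table of the test plan.
self_cuda_us = {
    "sgemm_32x32x32_NN": 12.0,
    "vectorized_elementwise_kernel": 2.75,
    "Memcpy HtoD": 2.25,
    "Memcpy DtoH": 2.0,
}


def self_cuda_percent(times_us):
    """Return each entry's share of the summed self CUDA time, in percent.

    Assumption: the profiler normalizes by the total self CUDA time of all rows.
    """
    total = sum(times_us.values())
    return {name: round(100.0 * t / total, 2) for name, t in times_us.items()}


percentages = self_cuda_percent(self_cuda_us)
for name, pct in percentages.items():
    print(f"{name:32s} {pct:6.2f}%")
```

Under that assumption the computed shares agree with the printed column (63.16%, 14.47%, 11.84%, 10.53%), which suggests the four GPU rows account for all recorded CUDA time in this run.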
Commit 9997011
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py
python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Memcpy HtoD (Pageable -> Device)                                0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
sgemm_32x32x32_NN                                               0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
Memcpy DtoH (Device -> Pageable)                                0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
aten::randn                                                     5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
aten::empty                                                     1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
aten::normal_                                                   1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
aten::to                                                       77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
aten::empty_strided                                             2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
aten::copy_                                                     2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
cudaMemcpyAsync                                                 4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
cudaStreamSynchronize                                           1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
aten::mm                                                        0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
aten::stride                                                    0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
cudaLaunchKernel                                                2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
aten::add                                                       0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

[ghstack-poisoned]
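The `CPU time avg` column in the table above looks like `CPU total` divided by `# of Calls`. The short Python sketch below checks that reading against a few rows. This is an inference from the printed values, not a statement about the profiler's implementation; the helper `cpu_time_avg_us` is a hypothetical name, and the printed totals are themselves rounded.

```python
# (CPU total in microseconds, number of calls), copied from the table above.
rows = {
    "aten::to": (1310.0, 3),        # printed as 1.310ms
    "aten::copy_": (160.0, 3),
    "aten::empty": (19.0, 4),
    "cudaMemcpyAsync": (62.0, 3),
}


def cpu_time_avg_us(cpu_total_us, num_calls):
    """Average CPU time per call, rounded to the table's three decimals."""
    return round(cpu_total_us / num_calls, 3)


averages = {
    name: cpu_time_avg_us(total, calls) for name, (total, calls) in rows.items()
}
for name, avg in averages.items():
    print(f"{name:20s} {avg:10.3f}us")
```

The results (436.667us, 53.333us, 4.750us, 20.667us) match the corresponding `CPU time avg` entries, so for these rows the column is consistent with a plain per-call mean.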
Commit cfd0424
Update on "Use libkineto in profiler"
Commit 30114d8
Update on "Use libkineto in profiler"
Commit 27e4e9c
Update on "Use libkineto in profiler"
Commit bde96f6
Update on "Use libkineto in profiler"
Commit b1a0292
Update on "Use libkineto in profiler"
Commit b7fda07
Commits on Nov 17, 2020
Update on "Use libkineto in profiler"
Commit 459df8e
Update on "Use libkineto in profiler"
Commit 09a4762
Update on "Use libkineto in profiler"
Commit cafee0f
Update on "Use libkineto in profiler"
Commit 39ff2b3
Update on "Use libkineto in profiler"
Commit 5502837
Commits on Nov 20, 2020
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py
python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Memcpy HtoD (Pageable -> Device)                                0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
sgemm_32x32x32_NN                                               0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
Memcpy DtoH (Device -> Pageable)                                0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
aten::randn                                                     5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
aten::empty                                                     1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
aten::normal_                                                   1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
aten::to                                                       77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
aten::empty_strided                                             2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
aten::copy_                                                     2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
cudaMemcpyAsync                                                 4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
cudaStreamSynchronize                                           1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
aten::mm                                                        0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
aten::stride                                                    0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
cudaLaunchKernel                                                2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
aten::add                                                       0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

[ghstack-poisoned]
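As a reading aid for the profiler table above, here is a minimal sketch that recomputes two of its derived columns from the raw values. It assumes that "Self CUDA %" is each row's share of the summed "Self CUDA" time and that the "time avg" columns are the total divided by "# of Calls". Both assumptions agree with the numbers shown, but they are not taken from the profiler source. The helper names `self_cuda_percent` and `time_avg` are hypothetical and exist only for this illustration.

```python
# Self CUDA times in microseconds, copied from the profiler table above.
# Only the four rows with non-zero Self CUDA time are listed.
self_cuda_us = {
    "Memcpy HtoD (Pageable -> Device)": 2.0,
    "sgemm_32x32x32_NN": 2.0,
    "void at::native::vectorized_elementwise_kernel<4, at...": 1.0,
    "Memcpy DtoH (Device -> Pageable)": 1.0,
}


def self_cuda_percent(times_us):
    """Assumed definition: each row's share of the total Self CUDA time."""
    total = sum(times_us.values())
    return {name: round(100.0 * t / total, 2) for name, t in times_us.items()}


def time_avg(total_us, calls):
    """Assumed definition: total time divided by the number of calls."""
    return total_us / calls


percents = self_cuda_percent(self_cuda_us)
for name, pct in percents.items():
    print(f"{name}: {pct:.2f}%")

# aten::to: CPU total 1.310ms over 3 calls, compared with the CPU time avg column.
print(f"aten::to CPU time avg: {time_avg(1310.0, 3):.3f}us")
# Memcpy HtoD: CUDA total 2.000us over 2 calls, compared with the CUDA time avg column.
print(f"Memcpy HtoD CUDA time avg: {time_avg(2.0, 2):.3f}us")
```

Under these assumptions the four kernel and memcpy rows account for all 6us of Self CUDA time, which is why the operator rows (`aten::*`, `cuda*`) show 0.00% in that column.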
Commit 7c2017b
-
Update on "Use libkineto in profiler"
Commit 525e5b5
-
Update on "Use libkineto in profiler"
Commit 1f50e4b
-
Update on "Use libkineto in profiler"
Commit f70a95c
-
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py
python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Memcpy HtoD (Pageable -> Device)                                0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
sgemm_32x32x32_NN                                               0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
Memcpy DtoH (Device -> Pageable)                                0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
aten::randn                                                     5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
aten::empty                                                     1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
aten::normal_                                                   1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
aten::to                                                       77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
aten::empty_strided                                             2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
aten::copy_                                                     2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
cudaMemcpyAsync                                                 4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
cudaStreamSynchronize                                           1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
aten::mm                                                        0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
aten::stride                                                    0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
cudaLaunchKernel                                                2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
aten::add                                                       0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

benchmark: https://gist.github.com/ilia-cher/a5a9eb6b68504542a3cad5150fc39b1a

[ghstack-poisoned]
Commit 5fed8be
-
Update on "Use libkineto in profiler"
Commit 2494879
-
Update on "Use libkineto in profiler"
Commit 4f401ff
Commits on Nov 21, 2020
-
Update on "Use libkineto in profiler"
Commit c689e6b
Commits on Nov 22, 2020
-
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py
python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Memcpy HtoD (Pageable -> Device)                                0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
sgemm_32x32x32_NN                                               0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
Memcpy DtoH (Device -> Pageable)                                0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
aten::randn                                                     5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
aten::empty                                                     1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
aten::normal_                                                   1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
aten::to                                                       77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
aten::empty_strided                                             2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
aten::copy_                                                     2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
cudaMemcpyAsync                                                 4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
cudaStreamSynchronize                                           1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
aten::mm                                                        0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
aten::stride                                                    0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
cudaLaunchKernel                                                2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
aten::add                                                       0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

benchmark: https://gist.github.com/ilia-cher/a5a9eb6b68504542a3cad5150fc39b1a

Differential Revision: [D25142223](https://our.internmc.facebook.com/intern/diff/D25142223)

[ghstack-poisoned]
Commit 4a5632f
-
Update on "Use libkineto in profiler"
Commit 6d0e7ab
Commits on Nov 23, 2020
-
Update on "Use libkineto in profiler"
Commit 95b686f
-
Update on "Use libkineto in profiler"
Commit d98a5fb
-
Update on "Use libkineto in profiler"
Summary: Adding ability to use Kineto (CUPTI) to profile CUDA kernels

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py
python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                       Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
                                      sgemm_32x32x32_NN         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                       Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                                            aten::randn         5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
                                            aten::empty         1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
                                          aten::normal_         1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
                                               aten::to        77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
                                    aten::empty_strided         2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
                                            aten::copy_         2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
                                        cudaMemcpyAsync         4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
                                  cudaStreamSynchronize         1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
                                               aten::mm         0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
                                           aten::stride         0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
                                       cudaLaunchKernel         2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
                                              aten::add         0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

benchmark: https://gist.github.com/ilia-cher/a5a9eb6b68504542a3cad5150fc39b1a

Differential Revision: [D25142223](https://our.internmc.facebook.com/intern/diff/D25142223)

[ghstack-poisoned]
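The derived columns in the profiler table above can be sanity-checked by hand. The sketch below is not taken from the profiler source. It assumes that "Self CUDA %" is each row's share of the summed "Self CUDA" column and that "CPU time avg" is "CPU total" divided by "# of Calls", which is consistent with the numbers shown. The helper names `self_cuda_percent` and `time_avg_us` are hypothetical and exist only for this illustration.

```python
# Self CUDA times in microseconds, copied from the device-side rows of the table.
self_cuda_us = {
    "Memcpy HtoD (Pageable -> Device)": 2.0,
    "sgemm_32x32x32_NN": 2.0,
    "void at::native::vectorized_elementwise_kernel<4, at...": 1.0,
    "Memcpy DtoH (Device -> Pageable)": 1.0,
}


def self_cuda_percent(times_us):
    """Assumed rule: a row's percentage is its share of the column total."""
    total = sum(times_us.values())
    return {name: round(100.0 * t / total, 2) for name, t in times_us.items()}


def time_avg_us(total_us, calls):
    """Assumed rule: average time is the total time divided by the call count."""
    return round(total_us / calls, 3)


percents = self_cuda_percent(self_cuda_us)
for name, pct in percents.items():
    print(f"{name}: {pct:.2f}%")

# CPU total and call counts copied from the aten::to, aten::randn and aten::empty rows.
print("aten::to avg:", time_avg_us(1310.0, 3), "us")
print("aten::randn avg:", time_avg_us(96.0, 2), "us")
print("aten::empty avg:", time_avg_us(19.0, 4), "us")
```

Under these assumptions the computed values match the table: the four device-side rows sum to 6 us of self CUDA time, and the three averages agree with the "CPU time avg" column.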
Commit cb7367e
Update on "Use libkineto in profiler"
Commit 5ad0a34
Update on "Use libkineto in profiler"
Commit d6bd96e
Update on "Use libkineto in profiler"
Commit 0c4faaa
Update on "Use libkineto in profiler"
Commit ab754e1
Commits on Nov 24, 2020
Update on "Use libkineto in profiler"
Commit aee38e8
Update on "Use libkineto in profiler"
Commit 671785f
Update on "Use libkineto in profiler"
Commit 8fde042
Commits on Nov 25, 2020
Update on "Use libkineto in profiler"
Commit ca6cb73