Remove clone data in PyTorch profiling #117

Merged
msaroufim merged 1 commit into gpu-mode:main from gau-nernst:profile_no_clone on Mar 10, 2026

Conversation

@gau-nernst
Contributor

Right now the PyTorch profiler output looks like this:

  -----------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
  -----------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                            aten::copy_         0.00%      64.939us         0.00%     117.867us      23.573us       2.395us        64.50%       2.395us       0.479us             5  
                         Memcpy DtoD (Device -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us       2.395us        64.50%       2.395us       0.479us             5  
                                   aiter::gemm_a4w4_asm        28.59%       22.632s        28.59%       22.634s       22.634s       0.839us        22.60%       0.839us       0.839us             1  
      aiter::f4gemm_bf16_per1x32Fp4_BpreShuffle_192x128         0.00%       0.000us         0.00%       0.000us       0.000us       0.839us        22.60%       0.839us       0.839us             1  
                 _dynamic_mxfp4_quant_kernel_asm_layout         0.00%       0.000us         0.00%       0.000us       0.000us       0.479us        12.90%       0.479us       0.479us             1  
                                    hipFuncGetAttribute         0.00%       1.000us         0.00%       1.000us       0.333us       0.479us        12.90%       0.479us       0.160us             3  
                            hipGetDevicePropertiesR0600         0.00%       3.570us         0.00%       3.570us       0.892us       0.479us        12.90%       0.479us       0.120us             4  
                                 hipPointerGetAttribute         0.00%       2.920us         0.00%       2.920us       0.973us       0.479us        12.90%       0.479us       0.160us             3  
                                            aten::clone         0.00%      23.480us         0.00%     258.624us      51.725us       0.000us         0.00%       2.395us       0.479us             5  
                                    aten::empty_strided         0.00%      67.668us         0.00%     117.277us      23.455us       0.000us         0.00%       0.000us       0.000us             5  
                                         hipMemcpyAsync         0.00%      52.928us         0.00%      52.928us      10.586us       0.000us         0.00%       0.000us       0.000us             5  
                                   hipStreamIsCapturing         0.00%       1.450us         0.00%       1.450us       1.450us       0.000us         0.00%       0.000us       0.000us             1  
                                              hipMalloc         0.00%      48.159us         0.00%      48.159us      48.159us       0.000us         0.00%       0.000us       0.000us             1  
                                            aten::empty         0.00%     248.794us         0.00%     248.794us      16.586us       0.000us         0.00%       0.000us       0.000us            15  
                                    hipModuleLoadDataEx         0.00%     768.851us         0.00%     768.851us     768.851us       0.000us         0.00%       0.000us       0.000us             1  
                                  hipModuleLaunchKernel         0.00%      66.869us         0.00%      66.869us      33.434us       0.000us         0.00%       0.000us       0.000us             2  
                                             aten::view         0.00%      51.728us         0.00%      51.728us      10.346us       0.000us         0.00%       0.000us       0.000us             5  
                                       aiter::gemm_a4w4         0.01%       7.508ms       100.00%       79.158s       79.158s       0.000us         0.00%       0.839us       0.839us             1  
                            aiter::get_cu_num_custom_op         0.21%     166.191ms         0.21%     166.191ms     166.191ms       0.000us         0.00%       0.000us       0.000us             1  
                                    aiter::get_padded_m        71.18%       56.349s        71.19%       56.350s       28.175s       0.000us         0.00%       0.000us       0.000us             2  
  -----------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
  Self CPU time total: 79.159s
  Self CUDA time total: 3.713us
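
As a quick sanity check on the table above, the Self CUDA percentages can be reproduced from the three device-side rows. This is only a sketch: the numbers are copied from the table, and it assumes the reported total counts each device activity once (the memcpy and the two kernels), which is consistent with the 3.713us total.

```python
# Self CUDA times in microseconds, copied from the device-side rows of the table.
# Assumption: the "Self CUDA time total" sums each device activity once.
self_cuda_us = {
    "Memcpy DtoD (Device -> Device)": 2.395,
    "aiter::f4gemm_bf16_per1x32Fp4_BpreShuffle_192x128": 0.839,
    "_dynamic_mxfp4_quant_kernel_asm_layout": 0.479,
}

total_us = round(sum(self_cuda_us.values()), 3)
shares = {name: f"{100 * t / total_us:.2f}%" for name, t in self_cuda_us.items()}

print(f"Self CUDA time total: {total_us}us")
for name, share in shares.items():
    print(f"{name}: {share}")
```

Under that assumption, the device-to-device copies account for roughly two thirds of the measured device time, more than the GEMM and quantization kernels combined.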

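The aten::copy_ calls (5 calls, 64.50% of Self CUDA time, matching the Memcpy DtoD and aten::clone rows) probably come from _clone_data(). Hence, this PR removes the _clone_data() call from the PyTorch profiling path.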
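
The effect can be illustrated with a minimal sketch, not the actual code in this repository. Here `clone_data` and `kernel` are hypothetical stand-ins for `_clone_data()` and the submission under test, and the profiled region is run once with the clone inside it and once without. It uses CUDA/ROCm when available and falls back to CPU otherwise.

```python
import torch
from torch.profiler import ProfilerActivity, profile


def clone_data(data):
    # Hypothetical stand-in for _clone_data(): copy every tensor input.
    return tuple(x.clone() if isinstance(x, torch.Tensor) else x for x in data)


def kernel(data):
    # Hypothetical stand-in for the kernel being profiled.
    a, b = data
    return a @ b


def profiled_op_names(clone_inside_profile: bool) -> set:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    data = (torch.randn(64, 64, device=device), torch.randn(64, 64, device=device))

    activities = [ProfilerActivity.CPU]
    if device == "cuda":
        activities.append(ProfilerActivity.CUDA)

    with profile(activities=activities) as prof:
        # Cloning here adds aten::clone / aten::copy_ rows to the report.
        inputs = clone_data(data) if clone_inside_profile else data
        kernel(inputs)

    return {evt.key for evt in prof.key_averages()}


with_clone = profiled_op_names(clone_inside_profile=True)
without_clone = profiled_op_names(clone_inside_profile=False)
print("aten::clone profiled with clone:", "aten::clone" in with_clone)
print("aten::clone profiled without clone:", "aten::clone" in without_clone)
```

With the clone kept out of the profiled region, the report only contains the ops launched by the kernel itself, which is the behavior this PR is after.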

@msaroufim msaroufim self-requested a review March 10, 2026 16:25
@msaroufim msaroufim merged commit 761093e into gpu-mode:main Mar 10, 2026