This issue was moved to a discussion.

Local DRAM accesses from pcm-numa does not match memory throughput from pcm-memory #669

Closed
QiongwenXu opened this issue Feb 6, 2024 · 1 comment

Comments

QiongwenXu commented Feb 6, 2024

Hi, I am using PCM to measure memory channel usage. I ran a few STREAM applications (memory read/write intensive) on CPU cores 16-23, with no other applications running, and then used pcm-numa and pcm-memory to measure memory channel usage. However, the Local DRAM accesses reported by pcm-numa (527 MB; the interval is 1 second, hence a throughput of 527 MB/s) do not match the memory throughput reported by pcm-memory (49375.64 MB/s). Is my understanding incorrect, or do you happen to know why this happens? Thanks!
pcm-memory:

Detected Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz "Intel(r) microarchitecture codename Cascade Lake-SP" stepping 7 microcode level 0x5003604
Update every 1 seconds
|---------------------------------------||---------------------------------------|
|--             Socket  0             --||--             Socket  1             --|
|---------------------------------------||---------------------------------------|
|--     Memory Channel Monitoring     --||--     Memory Channel Monitoring     --|
|---------------------------------------||---------------------------------------|
|-- Mem Ch  0: Reads (MB/s):    18.88 --||-- Mem Ch  0: Reads (MB/s): 11746.97 --|
|--            Writes(MB/s):     5.22 --||--            Writes(MB/s):  4684.87 --|
|--      PMM Reads(MB/s)   :     0.00 --||--      PMM Reads(MB/s)   :     0.00 --|
|--      PMM Writes(MB/s)  :     0.00 --||--      PMM Writes(MB/s)  :     0.00 --|
|-- Mem Ch  1: Reads (MB/s):    19.14 --||-- Mem Ch  1: Reads (MB/s): 11748.66 --|
|--            Writes(MB/s):     5.32 --||--            Writes(MB/s):  4686.62 --|
|--      PMM Reads(MB/s)   :     0.00 --||--      PMM Reads(MB/s)   :     0.00 --|
|--      PMM Writes(MB/s)  :     0.00 --||--      PMM Writes(MB/s)  :     0.00 --|
|-- Mem Ch  2: Reads (MB/s):    18.85 --||-- Mem Ch  2: Reads (MB/s): 11749.36 --|
|--            Writes(MB/s):     5.13 --||--            Writes(MB/s):  4686.62 --|
|--      PMM Reads(MB/s)   :     0.00 --||--      PMM Reads(MB/s)   :     0.00 --|
|--      PMM Writes(MB/s)  :     0.00 --||--      PMM Writes(MB/s)  :     0.00 --|
|-- NODE 0 Mem Read (MB/s) :    56.87 --||-- NODE 1 Mem Read (MB/s) : 35244.99 --|
|-- NODE 0 Mem Write(MB/s) :    15.67 --||-- NODE 1 Mem Write(MB/s) : 14058.11 --|
|-- NODE 0 PMM Read (MB/s):      0.00 --||-- NODE 1 PMM Read (MB/s):      0.00 --|
|-- NODE 0 PMM Write(MB/s):      0.00 --||-- NODE 1 PMM Write(MB/s):      0.00 --|
|-- NODE 0.0 NM read hit rate :  0.99 --||-- NODE 1.0 NM read hit rate :  1.00 --|
|-- NODE 0.1 NM read hit rate :  0.00 --||-- NODE 1.1 NM read hit rate :  0.00 --|
|-- NODE 0.2 NM read hit rate :  0.00 --||-- NODE 1.2 NM read hit rate :  0.00 --|
|-- NODE 0.3 NM read hit rate :  0.00 --||-- NODE 1.3 NM read hit rate :  0.00 --|
|-- NODE 0 Memory (MB/s):       72.54 --||-- NODE 1 Memory (MB/s):    49303.10 --|
|---------------------------------------||---------------------------------------|
|---------------------------------------||---------------------------------------|
|--            System DRAM Read Throughput(MB/s):      35301.86                --|
|--           System DRAM Write Throughput(MB/s):      14073.78                --|
|--             System PMM Read Throughput(MB/s):          0.00                --|
|--            System PMM Write Throughput(MB/s):          0.00                --|
|--                 System Read Throughput(MB/s):      35301.86                --|
|--                System Write Throughput(MB/s):      14073.78                --|
|--               System Memory Throughput(MB/s):      49375.64                --|
|---------------------------------------||---------------------------------------|

pcm-numa:

Core | IPC  | Instructions | Cycles  |  Local DRAM accesses | Remote DRAM Accesses
   0   0.97         24 M       25 M      3854                  21 K
   1   0.59        812 K     1366 K      1578                2367
   2   0.36        207 K      569 K       503                 653
   3   0.26        124 K      477 K       891                 443
   4   0.36        255 K      707 K       788                 545
   5   0.30        189 K      629 K       517                 344
   6   0.21        189 K      892 K       759                 528
   7   0.33        371 K     1109 K       966                 756
   8   0.32        151 K      480 K       439                 377
   9   0.31        221 K      719 K      1894                 684
  10   0.35        164 K      471 K       498                 384
  11   0.42        323 K      761 K       718                1670
  12   0.41        236 K      582 K       475                 611
  13   0.41        245 K      600 K       473                1195
  14   0.38        205 K      546 K       525                 647
  15   0.34        187 K      553 K       436                 900
  16   0.45       1315 M     2895 M        65 M                31 K
  17   0.45       1315 M     2895 M        65 M                17 K
  18   0.45       1317 M     2895 M        66 M                18 K
  19   0.45       1316 M     2895 M        66 M                15 K
  20   0.45       1315 M     2895 M        66 M                14 K
  21   0.45       1316 M     2895 M        66 M                46 K
  22   0.45       1315 M     2895 M        66 M                17 K
  23   0.45       1315 M     2895 M        65 M                18 K
  24   0.43        710 K     1651 K      3026                7390
  25   0.36        223 K      620 K      1055                1487
  26   0.38        208 K      544 K       922                1289
  27   0.26        112 K      435 K       757                 896
  28   0.28        110 K      396 K       638                 727
  29   0.33        107 K      327 K       562                 541
  30   0.32        148 K      466 K       677                 579
  31   0.10        102 K     1035 K      1069                1032
  32   0.08        162 K     2136 K      1489                 622
  33   0.42        257 K      619 K       545                 739
  34   0.39        172 K      444 K       390                 474
  35   0.46        449 K      980 K      2142                 973
  36   0.41        190 K      467 K       297                 437
  37   0.25        145 K      579 K       425                 353
  38   0.42        934 K     2219 K      1503                2496
  39   0.28        150 K      531 K      1024                 827
  40   0.37        149 K      402 K       339                 323
  41   0.34         85 K      249 K       234                 349
  42   0.38         94 K      246 K       282                 233
  43   0.45         80 K      177 K       269                 233
  44   0.33        105 K      318 K       301                 278
  45   0.34         97 K      284 K       308                 163
  46   0.37         92 K      250 K       262                 154
  47   0.36        332 K      911 K      1023                1164
  48   0.14        405 K     2848 K      2121                3280
  49   0.16        335 K     2048 K      1507                1278
  50   0.17        358 K     2064 K      1531                1392
  51   0.16        307 K     1953 K      1454                1055
  52   0.16        234 K     1478 K      1166                 874
  53   0.13        350 K     2606 K      2390                3506
  54   0.17        291 K     1727 K      1489                1019
  55   0.16        344 K     2133 K      1732                1365
  56   0.36        160 K      448 K       873                 654
  57   0.44        586 K     1333 K      2182                2141
  58   0.45        408 K      905 K      1591                1161
  59   0.44        357 K      804 K      1230                1014
  60   0.35        215 K      608 K      1117                1301
  61   0.33        157 K      478 K       733                 613
  62   0.30        197 K      654 K      1493                5732
  63   0.15        441 K     3022 K      4924                  15 K
-------------------------------------------------------------------------------------------------------------------
   *   0.45         10 G       23 G       527 M               279 K

rdementi commented Feb 21, 2024

There are a few more things to consider. pcm-numa measures accesses, not bytes. Each read access can trigger one 64-byte transfer (a cache line), and a write access can trigger up to two 64-byte transfers (read-for-ownership plus write-back), depending on the architecture. 527 M accesses/s * 64 bytes is about 33.7 GB/s, which is close to the read bandwidth measured by pcm-memory (about 35.3 GB/s). Some of these accesses are writes and generate the additional write bandwidth (about 14 GB/s in pcm-memory). Hardware prefetches can also generate additional traffic. pcm-numa is not intended to measure exact memory bandwidth; it is meant to assess the distribution of remote vs. local accesses.
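
As a rough illustration of the arithmetic above, here is a minimal Python sketch that converts the pcm-numa access count into an estimated bandwidth and compares it with the pcm-memory read figure. The numbers are copied from the outputs in this issue. The helper name `accesses_to_mb_per_s`, the assumption that every access moves exactly one 64-byte cache line, and the assumption that 1 MB means 10^6 bytes are illustrative choices, not something either tool guarantees.

```python
# Sketch only: assumes one 64-byte cache line per access and 1 MB = 10**6 bytes.
CACHE_LINE_BYTES = 64

# Values taken from the outputs posted in this issue (1-second interval).
local_accesses_per_s = 527e6       # pcm-numa "*" row, Local DRAM accesses
pcm_memory_read_mb_s = 35301.86    # pcm-memory, System DRAM Read Throughput
pcm_memory_write_mb_s = 14073.78   # pcm-memory, System DRAM Write Throughput


def accesses_to_mb_per_s(accesses_per_s, bytes_per_access=CACHE_LINE_BYTES):
    """Hypothetical helper: convert an access rate into MB/s."""
    return accesses_per_s * bytes_per_access / 1e6


estimated_mb_s = accesses_to_mb_per_s(local_accesses_per_s)
ratio_to_reads = estimated_mb_s / pcm_memory_read_mb_s

print(f"Estimated from pcm-numa accesses: {estimated_mb_s:.0f} MB/s")
print(f"pcm-memory read throughput:       {pcm_memory_read_mb_s:.0f} MB/s")
print(f"Estimate / measured reads:        {ratio_to_reads:.3f}")
print(f"pcm-memory write throughput:      {pcm_memory_write_mb_s:.0f} MB/s (not covered by the estimate)")
```

The estimate lands within a few percent of the measured read bandwidth. The write traffic and any prefetch traffic are not covered by this simple conversion, so the two tools should not be expected to agree exactly.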

@intel intel locked and limited conversation to collaborators Feb 24, 2024
@rdementi rdementi converted this issue into discussion #687 Feb 24, 2024
