How is the average memory access latency measured or calculated? #11

learning-chip · 2022-12-28T03:00:42Z

In the HDagg paper Section "Executor Evaluation" it is said that "The average memory access latency is used as a metric to measure locality." and "PAPI’s performance counters are used to measure architecture information needed in computations related to the locality and load balance metrics."

In the sptrsv_profiler.cpp example, the PAPI event list is:

aggregation/profiler/sptrsv_profiler.cpp, lines 106 to 111 in da293bb:

    std::vector<int> event_list = {PAPI_L1_DCM, PAPI_L1_TCM, PAPI_L2_DCM,
                                   PAPI_L2_DCA, PAPI_L2_TCM, PAPI_L2_TCA,
                                   PAPI_L3_TCM, PAPI_L3_TCA,
                                   PAPI_TLB_DM, PAPI_TOT_CYC, PAPI_LST_INS,
                                   PAPI_TOT_INS, PAPI_REF_CYC, PAPI_RES_STL,
                                   PAPI_BR_INS, PAPI_BR_MSP};

How is the average memory access latency obtained from the events above?


cheshmi · 2022-12-28T17:37:46Z

It is based on the average memory cycle defined in the computer architecture book (see page 75).
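
For reference, the standard textbook definition of average memory access time, which the formula above presumably refers to, can be written out for a three-level cache hierarchy. The symbols are notation assumed here for illustration: t is the access time of a level and m is its local miss rate (misses at that level divided by accesses reaching that level).

```latex
\begin{aligned}
\text{AMAT} &= \text{hit time} + \text{miss rate} \times \text{miss penalty} \\
            &= t_{L1} + m_{L1}\bigl(t_{L2} + m_{L2}\,(t_{L3} + m_{L3}\, t_{\text{mem}})\bigr)
\end{aligned}
```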

PAPI does not expose every counter you will need, and the available set varies by architecture. We used something like the following:

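def compute_memory_cycle_for_one_group(row, arch_params):
    # row: table of PAPI counter values (the .values calls suggest a pandas DataFrame)
    # arch_params: per-level access times for the target machine
    dl1_miss = row['PAPI_L1_DCM'].values
    dl2_miss = row['PAPI_L2_DCM'].values
    dl3_miss = row['PAPI_L3_TCM'].values  # total L3 misses stand in for L3 data misses
    # The load/store instruction count approximates the number of L1 data accesses
    dl1_access = row['PAPI_LST_INS'].values
    # Local miss rate of each level: its misses divided by the accesses that reach it
    l1_mr = dl1_miss / dl1_access
    l2_mr = dl2_miss / dl1_miss
    l3_mr = dl3_miss / dl2_miss
    l1_access_cost = arch_params['L1_ACCESS_TIME']
    l2_access_cost = arch_params['L2_ACCESS_TIME']
    l3_access_cost = arch_params['L3_ACCESS_TIME']
    mm_access_cost = arch_params['MAIN_MEMORY_ACCESS_TIME']
    # Average cost per access: each level adds its access time plus
    # its miss rate times the cost of going to the next level
    avg_mem_cycle = l1_access_cost + l1_mr*(l2_access_cost + l2_mr*(l3_access_cost + l3_mr*mm_access_cost))
    # Total memory cost = average cost per access * number of accesses
    exec_cycle = avg_mem_cycle * dl1_access
    return exec_cycle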

You will need to supply the architecture parameters yourself (the access time of each cache level and of main memory). You can improve the code by finding more accurate counters.
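
To make the arithmetic concrete, here is a minimal standalone sketch of the same calculation on plain numbers. The counter values and the latencies in `latency` are hypothetical, chosen only so the result is easy to check by hand. They are not measurements from any machine, and `average_memory_access_time` is an illustrative helper, not part of the HDagg code.

```python
def average_memory_access_time(l1_mr, l2_mr, l3_mr, latency):
    """Nested average memory access time for a three-level cache hierarchy.

    Each miss rate is local: misses at a level divided by accesses reaching it.
    """
    l3_cost = latency["L3"] + l3_mr * latency["MEM"]
    l2_cost = latency["L2"] + l2_mr * l3_cost
    return latency["L1"] + l1_mr * l2_cost


# Hypothetical access times in cycles (assumed values, not from a datasheet).
latency = {"L1": 4, "L2": 12, "L3": 40, "MEM": 200}

# Hypothetical PAPI counter readings for one run.
counters = {
    "PAPI_LST_INS": 1_000_000,  # load/store instructions, used as L1 accesses
    "PAPI_L1_DCM": 100_000,     # L1 data cache misses
    "PAPI_L2_DCM": 50_000,      # L2 data cache misses
    "PAPI_L3_TCM": 10_000,      # L3 total cache misses
}

l1_mr = counters["PAPI_L1_DCM"] / counters["PAPI_LST_INS"]  # 0.1
l2_mr = counters["PAPI_L2_DCM"] / counters["PAPI_L1_DCM"]   # 0.5
l3_mr = counters["PAPI_L3_TCM"] / counters["PAPI_L2_DCM"]   # 0.2

amat = average_memory_access_time(l1_mr, l2_mr, l3_mr, latency)
total_cycles = amat * counters["PAPI_LST_INS"]

print(f"average memory access time: {amat:.2f} cycles")
```

With these numbers the result is 4 + 0.1 * (12 + 0.5 * (40 + 0.2 * 200)) = 9.2 cycles per access. A real run would also need to guard against zero denominators, for example when a kernel records no L2 misses.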

learning-chip · 2022-12-29T00:47:44Z

Thanks, that makes sense! I see that VTune also provides an average latency metric, but it's good to calculate and verify it from scratch.

learning-chip closed this as completed Dec 29, 2022
