**Elapsed Time:** 171.936s

**Clockticks:** 383,531,400,000 **Instructions Retired:** 924,037,200,000

**CPI Rate:** 0.415 **MUX Reliability:** 0.998
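CPI Rate is Clockticks divided by Instructions Retired. The minimal Python sketch below recomputes it, and its reciprocal IPC (instructions per cycle), from the two counts reported above. The function name `cpi` is illustrative only.

```python
# Raw event counts copied from the summary above.
CLOCKTICKS = 383_531_400_000
INSTRUCTIONS_RETIRED = 924_037_200_000


def cpi(clockticks: int, instructions: int) -> float:
    """Cycles per instruction: lower means more work retired per cycle."""
    return clockticks / instructions


cpi_rate = cpi(CLOCKTICKS, INSTRUCTIONS_RETIRED)
ipc = 1.0 / cpi_rate  # instructions per cycle

print(f"CPI: {cpi_rate:.3f}")  # CPI: 0.415
print(f"IPC: {ipc:.2f}")       # IPC: 2.41
```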

**Retiring:** 62.6% of Pipeline Slots **Light Operations:** 58.2% of Pipeline Slots

- FP Arithmetic: 0.8% of uOps
  - FP x87: 0.0% of uOps
  - FP Scalar: 0.8% of uOps
  - FP Vector: 0.0% of uOps
- Other: 99.2% of uOps

**Heavy Operations:** 4.3% of Pipeline Slots **Microcode Sequencer:** 1.0% of Pipeline Slots **Assists:** 0.0% of Pipeline Slots

Front-End Bound:

Front-End Latency:

ICache Misses:

ITLB Overhead:

Branch Resteers:

Mispredicts Resteers:

Clears Resteers:

3.0% of Pipeline Slots

7.6% of Pipeline Slots

3.0% of Clockticks

4.5% of Clockticks

4.5% of Clockticks

0.0% of Clockticks

Unknown Branches: 1.0% of Clockticks
DSB Switches: 3.3% of Clockticks
Length Changing Prefixes: 0.2% of Clockticks
MS Switches: 0.9% of Clockticks
Front-End Bandwidth: 9.1% of Pipeline Slots

**Front-End Bandwidth MITE:** 22.1% of Clockticks **Front-End Bandwidth DSB:** 6.8% of Clockticks

(Info) DSB Coverage: 50.8%

**Bad Speculation:** 10.3% of Pipeline Slots **Branch Mispredict:** 10.3% of Pipeline Slots **Machine Clears:** 0.1% of Pipeline Slots
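VTune's hierarchy follows the Top-down Microarchitecture Analysis (TMA) method, in which the four level-1 categories (Retiring, Bad Speculation, Front-End Bound, Back-End Bound) partition all pipeline slots and should therefore sum to roughly 100%. The Python sketch below assumes that property and takes the Back-End Bound value (10.4%) from the breakdown that follows. It then infers the Front-End Bound share, which the flattened Front-End block above does not show clearly.

```python
# Level-1 top-down shares, in % of pipeline slots, taken from this report.
retiring = 62.6
bad_speculation = 10.3
back_end_bound = 10.4

# Assumption: the four level-1 categories sum to 100% of pipeline slots,
# so Front-End Bound is whatever remains (up to rounding in the inputs).
front_end_bound = 100.0 - (retiring + bad_speculation + back_end_bound)
print(f"Implied Front-End Bound: {front_end_bound:.1f}% of pipeline slots")

# Sub-categories should also add up to their parent, within rounding.
branch_mispredict, machine_clears = 10.3, 0.1
assert abs((branch_mispredict + machine_clears) - bad_speculation) <= 0.2
```

Under that assumption, the remainder comes to about 16.7%. That equals the Front-End Bandwidth value (9.1%) plus the 7.6% of Pipeline Slots figure in the Front-End block, which suggests the 7.6% line is the Front-End Latency share. Treat this as a reading of the flattened layout, not a value the report states explicitly.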

```
Back-End Bound: 10.4% of Pipeline Slots
  Memory Bound: 2.8% of Pipeline Slots
     L1 Bound: 5.3% of Clockticks
        DTLB Overhead: 5.0% of Clockticks
           Load STLB Hit: 4.9% of Clockticks
           Load STLB Miss: 0.0% of Clockticks
        Loads Blocked by Store Forwarding: 6.5% of Clockticks
        Lock Latency: 0.0% of Clockticks
        Split Loads: 0.4% of Clockticks
        4K Aliasing: 1.1% of Clockticks
        FB Full: 0.3% of Clockticks
     L2 Bound: 0.4% of Clockticks
     L3 Bound: 0.8% of Clockticks
        Contested Accesses: 0.0% of Clockticks
        Data Sharing: 0.0% of Clockticks
        L3 Latency: 3.8% of Clockticks
        SO Full: 0.0% of Clockticks
     DRAM Bound: 0.1% of Clockticks
        Memory Bandwidth: 1.5% of Clockticks
        Memory Latency: 4.3% of Clockticks
     Store Bound: 1.3% of Clockticks
        Store Latency: 9.0% of Clockticks
        False Sharing: 0.0% of Clockticks
        Split Stores: 0.2% of Clockticks
        DTLB Store Overhead: 3.0% of Clockticks
           Store STLB Hit: 3.0% of Clockticks
           Store STLB Miss: 0.0% of Clockticks
  Core Bound: 7.6% of Pipeline Slots
     Divider: 0.4% of Clockticks
     Port Utilization: 21.4% of Clockticks
        Cycles of 0 Ports Utilized: 7.3% of Clockticks
           Serializing Operations: 0.5% of Clockticks
           Mixing Vectors: 0.0% of uOps
        Cycles of 1 Port Utilized: 5.0% of Clockticks
        Cycles of 2 Ports Utilized: 8.1% of Clockticks
        Cycles of 3+ Ports Utilized: 30.0% of Clockticks
           ALU Operation Utilization: 39.7% of Clockticks
              Port 0: 37.3% of Clockticks
              Port 1: 40.2% of Clockticks
              Port 5: 41.7% of Clockticks
              Port 6: 39.6% of Clockticks
           Load Operation Utilization: 35.7% of Clockticks
              Port 2: 42.5% of Clockticks
              Port 3: 43.6% of Clockticks
           Store Operation Utilization: 30.6% of Clockticks
              Port 4: 30.6% of Clockticks
              Port 7: 15.8% of Clockticks
        Vector Capacity Usage (FPU): 24.6%

Average CPU Frequency: 2.266 GHz
Total Thread Count: 1
Paused Time: 0s
```
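The three port-group figures under Cycles of 3+ Ports Utilized appear to be derived from the per-port values. The Python sketch below assumes the formulas in Intel's TMA metric definitions for Skylake-family cores, which include Kaby Lake:

- ALU: the mean of ports 0, 1, 5 and 6.
- Load: (P2 + P3 + P7 - P4) / 2.
- Store: port 4.

With those formulas it reproduces the reported values to within rounding. The port numbering and formulas are an assumption about how VTune computes these metrics, not something this report states.

```python
# Per-port utilization, in % of clockticks, from the Port Utilization breakdown.
ports = {0: 37.3, 1: 40.2, 2: 42.5, 3: 43.6, 4: 30.6, 5: 41.7, 6: 39.6, 7: 15.8}

# Assumed ALU ports on this core: 0, 1, 5 and 6; take their average busy share.
alu = sum(ports[p] for p in (0, 1, 5, 6)) / 4

# Ports 2, 3 and 7 execute store-address uops (ports 2 and 3 also run loads).
# Each store also sends one store-data uop to port 4, so subtracting port 4
# roughly removes the store-address work, leaving loads spread over 2 ports.
load = (ports[2] + ports[3] + ports[7] - ports[4]) / 2

# Store data is executed on port 4 only.
store = ports[4]

print(f"ALU Operation Utilization:   {alu:.1f}%")    # 39.7%
print(f"Load Operation Utilization:  {load:.1f}%")
print(f"Store Operation Utilization: {store:.1f}%")  # 30.6%
```

The load figure lands at about 35.65%, against 35.7% in the report. The small gap is expected, because the per-port inputs are already rounded to one decimal place.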

## **Effective Physical Core Utilization:** 24.4% (0.978 out of 4)

The metric value is low, which may signal poor utilization of the physical CPU cores, caused by:

- load imbalance
- threading runtime overhead
- contended synchronization
- thread/process underutilization
- incorrect affinity that utilizes logical cores instead of physical cores

Explore sub-metrics to estimate the efficiency of MPI and OpenMP parallelism or run the Locks and Waits analysis to identify parallel bottlenecks for other parallel runtimes.

## **Effective Logical Core Utilization:** 12.3% (0.984 out of 8)

The metric value is low, which may signal poor utilization of the logical CPU cores. Consider improving physical core utilization first, and then look for opportunities to use logical cores, which in some cases can improve processor throughput and the overall performance of multi-threaded applications.
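Both utilization figures are the average number of busy cores divided by the number of cores available. The sketch below recomputes the two percentages from the values in the headings; the helper name `utilization_pct` is illustrative.

```python
def utilization_pct(busy_cores: float, total_cores: int) -> float:
    """Average busy cores as a percentage of the cores available."""
    return 100.0 * busy_cores / total_cores


physical = utilization_pct(0.978, 4)  # 4 physical cores
logical = utilization_pct(0.984, 8)   # 8 logical CPUs (2 per physical core)

print(f"Effective Physical Core Utilization: {physical:.1f}%")
print(f"Effective Logical Core Utilization:  {logical:.1f}%")
```

This run reports a Total Thread Count of 1, so on average about one core is busy. The physical figure therefore cannot rise much above 1/4 (25%) on this machine, and the logical figure cannot rise much above 1/8 (12.5%), unless the encoder runs more threads.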

## **Collection and Platform Info:**

**Application Command Line:** ./codecs/hm/encoder/TAppEncoderStatic "-c" "./configs/hm/encoder\_lowdelay\_main.cfg" "-i" "./sequences/CLASS\_C/RaceHorses\_416x240\_30.yuv" "-wdt" "416" "-hgt" "240" "-b" "./bin/hm/encoder\_lowdelay\_main.cfg/CLASS\_C/RaceHorses\_416x240\_30\_QP\_27\_hm.bin" "-o" "./rec\_yuv/hm/encoder\_lowdelay\_main.cfg/CLASS\_C/RaceHorses\_416x240\_30\_QP\_27\_hm.yuv" "-fr" "30" "-fs" "0" "-f" "50" "-q" "27"

**User Name:** root

**Operating System:** 5.4.0-65-generic DISTRIB\_ID=Ubuntu DISTRIB\_RELEASE=18.04 DISTRIB\_CODENAME=bionic DISTRIB\_DESCRIPTION="Ubuntu 18.04.5 LTS"

**Computer Name:** eimon

**Result Size:** 359.9 MB

**Collection start time:** 01:17:40 10/02/2021 UTC

**Collection stop time:** 01:20:32 10/02/2021 UTC

**Collector Type:** Event-based sampling driver

**CPU:**

**Name:** Intel(R) Processor code named Kabylake ULX

**Frequency:** 1.992 GHz

**Logical CPU Count:** 8

**Cache Allocation Technology:**

**Level 2 capability:** not detected

**Level 3 capability:** not detected