Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to create Linux perf event and metric json #27

Closed
wants to merge 19 commits into from
Closed

Add ability to create Linux perf event and metric json #27

wants to merge 19 commits into from

Conversation

captain5050
Copy link
Contributor

Add a script to create the Linux perf event and metric json. When the event and metric data is updated running this script will create json files for the Linux perf tool that can be mailed to the Linux kernel mailing list. A single script is used so, for example, metrics can verify events in the formula against those in the json files.

@ZhengjunXing0
Copy link

I used the new script to generate the core/uncore/metric JSON files, I diff the files, and they are almost the same except for the following :
1)Removed "CollectPEBSRecord"/ "Counter"/ "PEBScounters"/ "Speculative"
These items are not used by both old jevents.c and new jevents.py, so currently removing them should be ok, but we may use these items in the future.
2)Sort the events.
3) the following uncore events for BDX/IVT/BDW-DE/HSX/ are missed by new scripts
"EventName": "UNC_H_ADDR_OPC_MATCH.FILT",
"EventName": "UNC_H_ADDR_OPC_MATCH.ADDR",
"EventName": "UNC_H_ADDR_OPC_MATCH.OPC",
"EventName": "UNC_H_ADDR_OPC_MATCH.AD",
"EventName": "UNC_H_ADDR_OPC_MATCH.BL",
"EventName": "UNC_H_ADDR_OPC_MATCH.AK",
4) the following uncore event for BDX/IVT/BDE-DE/JKT/HSX is missed by new scripts
"EventName": "UNC_H_ADDR_OPC_MATCH.FILT",
5) the following uncore event for ICX/SNR/BDX/IVT/BDW-DE/JKT/HSX is missed by new scripts
"EventName": "UNC_I_SNOOP_RESP.HIT_ES"
6) Because the events are sorted, it is hard to diff, I manually do it, so may miss something else.

@captain5050
Copy link
Contributor Author

Thanks for looking! On reviewing the points above I don't find any issues to address in the code. A more detailed explanation is below.

  1. the following uncore events for BDX/IVT/BDW-DE/HSX/ are missed by new scripts
    "EventName": "UNC_H_ADDR_OPC_MATCH.FILT",
    "EventName": "UNC_H_ADDR_OPC_MATCH.ADDR",
    "EventName": "UNC_H_ADDR_OPC_MATCH.OPC",
    "EventName": "UNC_H_ADDR_OPC_MATCH.AD",
    "EventName": "UNC_H_ADDR_OPC_MATCH.BL",
    "EventName": "UNC_H_ADDR_OPC_MATCH.AK",

If you look at the events they have filters:

{
"Unit": "HA",
"EventCode": "0x20",
"UMask": "0x3",
"EventName": "UNC_H_ADDR_OPC_MATCH.FILT",
"BriefDescription": "QPI Address/Opcode Match; Address & Opcode Match",
"PublicDescription": "UNC_H_ADDR_OPC_MATCH.FILT",
"Counter": "0,1,2,3",
"MSRValue": "0x00",
"ELLC": "0",
"Filter": "HA_AddrMatch0[31:6], HA_AddrMatch1[13:0], HA_OpcodeMatch[5:0]",
"ExtSel": "0"
},

It is expected even in the old code that these are events are dropped:
https://github.com/intel/event-converter-for-linux-perf/blob/master/uncore_csv_json.py#L244
This behavior was discussed with Kan Liang on LKM.

  1. the following uncore event for BDX/IVT/BDE-DE/JKT/HSX is missed by new scripts
    "EventName": "UNC_H_ADDR_OPC_MATCH.FILT",

This is the same as above.

  1. the following uncore event for ICX/SNR/BDX/IVT/BDW-DE/JKT/HSX is missed by new scripts
    "EventName": "UNC_I_SNOOP_RESP.HIT_ES"

I don't see this. For example, in perf/icelakex/uncore-other.json:

{
    "BriefDescription": "Responses to snoops of any type that hit E or S line in the IIO cache",
    "EventCode": "0x12",
    "EventName": "UNC_I_SNOOP_RESP.ALL_HIT_ES",
    "PerPkg": "1",
    "PublicDescription": "Responses to snoops of any type (code, data, invalidate) that hit E or S line in the IIO cache",
    "UMask": "0x74",
    "Unit": "IRP"
},
  1. Because the events are sorted, it is hard to diff, I manually do it, so may miss something else.

Agreed. You can add a sort here:
https://github.com/intel/event-converter-for-linux-perf/blob/master/uncore_csv_json.py#L280
like:
events : list[Dict[str, str]] = sorted(list(iter), key=lambda event: event['EventName'])

to reduce this.

@ZhengjunXing0
Copy link

ZhengjunXing0 commented Nov 17, 2022 via email

@ZhengjunXing0
Copy link

ZhengjunXing0 commented Nov 17, 2022 via email

@ZhengjunXing0
Copy link

I checked all the platforms, and no other core/uncore events were missed when converted by the new scripts.
But for metrics, I find the following issues, the uncore events should not be "cpu@" (mostly in "MEM_Parallel_Reads")
SPR:
old : "MetricExpr": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / cha@UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD\,thresh\=1@",
new: "MetricExpr": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / cpu@UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD\,cmask\=1@",
SKL:
old: "MetricExpr": "arb@event\=0x80\,umask\=0x2@ / arb@event\=0x80\,umask\=0x2\,cmask\=1@",
new: "MetricExpr": "UNC_ARB_TRK_OCCUPANCY.DATA_READ / cpu@UNC_ARB_TRK_OCCUPANCY.DATA_READ\,cmask\=1@",
SKX:
old: "MetricExpr": "cha@event\=0x36\,umask\=0x21\,config\=0x40433@ / cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@",
new: "MetricExpr": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / cpu@UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD\,cmask\=1@",
HSX:
old :"MetricExpr": "cbox@event\=0x36\,umask\=0x3\,filter_opc\=0x182@ / cbox@event\=0x36\,umask\=0x3\,filter_opc\=0x182\,thresh\=1@",
new: "MetricExpr": "UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc\=0x182@ / UNC_C_TOR_OCCUPANCY.MISS_OPCODE@cmask\=1\,filter_opc\=0x182@",
CLX:
old: "MetricExpr": "cha@event\=0x36\,umask\=0x21\,config\=0x40433@ / cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@",
new: "MetricExpr": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / cpu@UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD\,cmask\=1@",

@captain5050
Copy link
Contributor Author

So this is wrong and something that can be fixed. The old logic is here:
https://github.com/intel/event-converter-for-linux-perf/blob/master/extract-tma-metrics.py#L91

You will see that the replacement is tagged as being only for SKL, however, you have seen that it is actually applying in the old code on SPR, SKL, SKX, HSX and CLX. The replacement is happening so that the cpu@...@ expansion of the c1 doesn't happen for an uncore event. We can do the same fix in the new code here:
https://github.com/captain5050/perfmon-metrics/blob/main/scripts/create_perf_json.py#L680

However, the new code has to be explicit with a rewrite for every architecture it applies to (the 5 listed above). I'll patch the original CLs, but it is also possible to fix issues like this once the code is merged, which would unblock the EBS_Mode work that's needed on the Valkyrie metrics.

calebbiggers and others added 19 commits November 17, 2022 22:04
Add new script that will create the Linux perf command json
files. Initially the command parses ../mapfile.csv and generates the
perf version of this file in perf/mapfile.csv.
Parse the perfmon event json and generate perf style json. The output
exactly matches the existing event json event except some unused, by
perf/jevents.py, dictionary values are dropped. Specifically the
following are no longer passed through: CollectPEBSRecord, Counter,
CounterHTOff, ELLC, L1_Hit_Indication and Offcore.
Reuse the existing conversion but extend for uncore and
uncore_experimental. Add necessary fix ups to keep the perf json near
identical to existing pref json. Differences include:
- don't remove events that have a "tbd" description
- sort events
- ascii-fication during reading
No support currently for the CSV file customization of events.
Uncore CSV files modify or duplicate events, the duplicated events can
add 'filter' values to the event that make it more specific. Metrics
may also be generated.

Differences from previous approach:
 - When a new event is created then this is updated rather than the
   existing event.
 - If an event has a filter and is in the CSV file but with no filter,
   then filter value isn't dropped.
 - Events without descriptions aren't dropped.
 - There is clean up of strings to remove unnecessary whitespace and
   avoid duplicated descriptions.
 - For tigerlake there are duplicated events with different units, the
   first rather than the second event is dropped (unit 'imc' rather
   than 'h_imc').
Support conversion of TMA metrics and E-core TMA metrics. Verification
that events in the metric exist is performed, from this a minimal set
of fixups are created and more metrics supported. For BDW-DE 3 metrics
are explicitly dropped due to missing events. The valkyrie metrics
aren't inserted currently.
Load extra metrics such as ICX/metrics/perf/icx_metric_perf.json. Use
the save_form function so that missing events are identified.
This commit introduces a brief section describing TMA files.
This commit releases v1.07 event files for Sapphire Rapids.
This commit releases v1.16 event files for Alder Lake.
This commit releases v1.08 event files for Tiger Lake. An uncore
experimental event file is also introduced.
This commit releases v1.08 event files for Sapphire Rapids.
The mapfile Version column primarily uses capital V. This commit fixes a
few of my previous updates with lower case v.
Leftover from the repository rename.
@captain5050
Copy link
Contributor Author

I added the rewrites such as for SKL here:
https://github.com/captain5050/perfmon-metrics/blob/main/scripts/create_perf_json.py#L741
and verified the CPU expansion wasn't applying to other uncore events. Note, I did use the raw event encoding like the legacy code as the named event is clearer and less error prone. PTAL and let me know if there are any other outstanding issues.

@ZhengjunXing0
Copy link

I tried the updated scripts, CLX/ICX/SPR/SKL/SKX "cpu@" issue disappeared.
But the following HSX issue still existed ("thresh" change to "cmask" for "MEM_Parallel_Reads")
HSX:
old :"MetricExpr": "cbox@event=0x36,umask=0x3,filter_opc=0x182@ / cbox@event=0x36,umask=0x3,filter_opc=0x182,thresh=1@",
new: "MetricExpr": "UNC_C_TOR_OCCUPANCY.MISS_OPCODE@filter_opc=0x182@ / UNC_C_TOR_OCCUPANCY.MISS_OPCODE@cmask=1,filter_opc=0x182@",

@captain5050
Copy link
Contributor Author

Agreed, the rewrite applies more than just to HSX:
https://github.com/captain5050/event-converter-for-linux-perf/blob/master/extract-tma-metrics.py#L96

It looks like we could use format validation on top of event validation for metrics. Fwiw on a HSX:

$ ls /sys/devices/uncore_cbox_0/format
edge   filter_c6    filter_link  filter_nid  filter_state  thresh  umask
event  filter_isoc  filter_nc    filter_opc  filter_tid    tid_en

@captain5050 captain5050 closed this by deleting the head repository Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants