[feat] Improve performance logging #2633
Conversation
1. The performance of the test is no longer logged from within the test inside the
`check_performance()` method; instead, it is logged through the associated
task from the test executor _after_ the test has finished. This also allows us
to log the final test result by introducing the new `check_result` test
attribute.
2. Performance logging through the `filelog` handler is completely revised:
- All performance variables are logged at once in a single line.
- A new option named `format_perfvars` is introduced for formatting the
performance variables. This will be used to format the
`%(check_perfvalues)s` specifier.
- The handler will now emit a header line describing what each logged field
is. This header is not generic: it gets its information from the record
itself, in order to obtain the names of the performance variables that are
being logged.
- To ensure consistency of the produced log files, if a header is changed in
a subsequent run of the same test (e.g., the format or the
performance variables changed), a new file will be generated and the old
one will be saved.
- The `filelog` handler in generic settings was adapted accordingly and it
now produces pure csv output, so that it can be easily imported in other
tools for visualization.
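The header-consistency rule described above could be sketched roughly as follows. This is a minimal, hypothetical illustration in plain Python; the `open_perflog` helper and the `.hN` rotation naming scheme are invented for this sketch and are not the handler's actual implementation:

```python
import os


def open_perflog(path, header):
    """Open a perflog file, rotating it if its header no longer matches.

    If `path` exists and its first line differs from `header` (e.g. the
    format or the set of performance variables changed between runs),
    the old file is saved under a new name and a fresh file is started.
    """
    if os.path.exists(path):
        with open(path) as fp:
            old_header = fp.readline().rstrip('\n')

        if old_header != header:
            # Save the old file out of the way, picking the first free suffix
            i = 0
            while os.path.exists(f'{path}.h{i}'):
                i += 1

            os.rename(path, f'{path}.h{i}')

    new_file = not os.path.exists(path)
    fp = open(path, 'a')
    if new_file:
        fp.write(header + '\n')

    return fp
```

This way, every produced file is guaranteed to have a single, accurate header line, so it stays directly importable as CSV.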
Codecov Report: Base: 86.26% // Head: 86.15% // Decreases project coverage by -0.12%.

```
@@            Coverage Diff             @@
##           master    #2633      +/-   ##
==========================================
- Coverage   86.26%   86.15%   -0.12%
==========================================
  Files          60       60
  Lines       10985    11047      +62
==========================================
+ Hits         9476     9517      +41
- Misses       1509     1530      +21
```
Also:
- Make the `filelog` handler respect the `systems/prefix` configuration parameter.
- Do not crash if the log format is wrong, but issue an error message instead.
We no longer use two different code paths for the old and new syntax. We simply convert the old syntax (`perf_patterns`) to the new one and proceed in a single code path.
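The conversion idea can be illustrated with a small sketch. The function name and signature here are hypothetical (the real conversion operates on deferred expressions inside the pipeline); the tuple layout `(value, ref, lower_thres, upper_thres, unit)` follows the `check_perfvalues` entries shown later in this thread:

```python
def perf_patterns_to_perfvalues(perf_patterns, reference, sys_name):
    """Map the legacy `perf_patterns`/`reference` pair to the unified
    per-variable representation used by the single code path.

    `perf_patterns` maps a variable name to its (already evaluated)
    value; `reference` maps '<system>:<variable>' to a
    (ref, lower_thres, upper_thres, unit) tuple.
    """
    perfvalues = {}
    for var, value in perf_patterns.items():
        key = f'{sys_name}:{var}'
        ref, lower, upper, unit = reference.get(key, (None, None, None, None))
        perfvalues[key] = (value, ref, lower, upper, unit)

    return perfvalues
```

For example, a legacy test with `perf_patterns = {'compilation_time': 0.49}` and a reference entry for `daint:gpu` would end up with a single `daint:gpu:compilation_time` entry carrying the value, reference, thresholds and unit together.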
@jenkins-cscs retry all
ekouts left a comment
I haven't gone through the PR very carefully, but removing `%(check_perf_lower_thres)s` etc. could be a problem for the data we send with `HTTPJSONHandler`, right? Since Kibana rejects non-homogeneous lists, how could we send the performance data there?
This PR is indeed breaking and that's why it was scheduled for 4.0. The problem is that we don't log multiple records anymore, one for each variable, but we log everything at once. Could you check what the `HTTPJSONHandler` logs now and if that is by any means parsable from Elastic? If not, we could add an option to that handler to emit multiple records.
As far as I can see, there is no way to get the performance data in `httpjson`, since we have to skip the … I tried with your PR and this is the JSON we are sending; the excluded field is shown commented out:

```
{
  "check_name": "HelloWorldTestOpenMP %linking=static %lang=c",
  "check_jobid": "42267202",
  "check_job_completion_time": "2022-10-21T10:53:35+02:00",
  "check_job_completion_time_unix": 1666342415.0,
  "check_info": "HelloWorldTestOpenMP %linking=static %lang=c /19d97538 @daint:gpu+PrgEnv-cray",
  "check_system": "daint",
  "check_partition": "gpu",
  "check_environ": "PrgEnv-cray",
  "check_perf_var": null,
  "check_perf_value": null,
  "check_perf_ref": null,
  "check_perf_lower_thres": null,
  "check_perf_upper_thres": null,
  "check_perf_unit": null,
  "check_result": "pass",
  "osuser": "eirinik",
  "osgroup": "csstaff",
  "version": "4.0.0-dev.2",
  "check_job_exitcode": 0,
  "check_job_nodelist": [
    "nid02356"
  ],
  "check_build_locally": true,
  "check_build_time_limit": null,
  "check_descr": "C Hello, World OpenMP Static",
  "check_display_name": "HelloWorldTestOpenMP %linking=static %lang=c",
  "check_exclusive_access": true,
  "check_executable": "./HelloWorldTestOpenMP_3",
  "check_executable_opts": [],
  "check_extra_resources": {},
  "check_hashcode": "19d97538",
  "check_keep_files": [],
  "check_local": false,
  "check_maintainers": [
    "VH",
    "EK"
  ],
  "check_max_pending_time": null,
  "check_modules": [],
  "check_num_cpus_per_task": 4,
  "check_num_gpus_per_node": 0,
  "check_num_tasks": 1,
  "check_num_tasks_per_core": null,
  "check_num_tasks_per_node": 1,
  "check_num_tasks_per_socket": null,
  "check_outputdir": "/users/eirinik/repos/reframe-test/output/daint/gpu/PrgEnv-cray/HelloWorldTestOpenMP_19d97538",
  # "check_perfvalues": {
  #   "daint:gpu:compilation_time": [
  #     0.492619976,
  #     60,
  #     null,
  #     0.1,
  #     "s"
  #   ]
  # },
  "check_postbuild_cmds": [
    "_rfm_build_time=\"$(($(date +%s%N)-_rfm_build_time))\"",
    "echo \"Compilations time (ns): $_rfm_build_time\""
  ],
  "check_postrun_cmds": [],
  "check_prebuild_cmds": [
    "_rfm_build_time=\"$(date +%s%N)\""
  ],
  "check_prefix": "/users/eirinik/repos/cscs-reframe-tests/checks/prgenv",
  "check_prerun_cmds": [],
  "check_readonly_files": [],
  "check_short_name": "HelloWorldTestOpenMP_19d97538",
  "check_sourcepath": "hello_world_openmp.c",
  "check_sourcesdir": "src/openmp",
  "check_stagedir": "/users/eirinik/repos/reframe-test/stage/daint/gpu/PrgEnv-cray/HelloWorldTestOpenMP_19d97538",
  "check_strict_check": true,
  "check_tags": [
    "production",
    "craype"
  ],
  "check_time_limit": null,
  "check_unique_name": "HelloWorldTestOpenMP_3",
  "check_use_multithreading": null,
  "check_valid_prog_environs": [
    "PrgEnv-aocc",
    "PrgEnv-cray",
    "PrgEnv-cray_classic",
    "PrgEnv-gnu",
    "PrgEnv-intel",
    "PrgEnv-pgi",
    "PrgEnv-gnu-nocuda",
    "PrgEnv-pgi-nocuda"
  ],
  "check_valid_systems": [
    "daint:gpu",
    "daint:mc",
    "dom:gpu",
    "dom:mc",
    "arolla:cn",
    "arolla:pn",
    "tsa:cn",
    "tsa:pn"
  ],
  "check_variables": {
    "CRAYPE_LINK_TYPE": "static",
    "OMP_NUM_THREADS": "4"
  },
  "facility": "reframe",
  "data-version": "1.0"
}
```

I think we should add the option to that handler to emit multiple records or give the option to send …
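The multi-record option discussed here could work roughly as follows. This is a hypothetical helper, not the handler's actual code; it expands a single record carrying a `check_perfvalues` mapping into one record per performance variable, restoring the flat `check_perf_*` fields that consumers such as Elastic expect:

```python
def split_perf_records(record):
    """Expand a record with a `check_perfvalues` mapping into one record
    per performance variable, each carrying flat `check_perf_*` fields.
    """
    base = {k: v for k, v in record.items() if k != 'check_perfvalues'}
    records = []
    for key, (value, ref, lower, upper, unit) in record.get(
            'check_perfvalues', {}).items():
        rec = dict(base)
        rec.update({
            # Keys look like '<system>:<partition>:<variable>'
            'check_perf_var': key.split(':')[-1],
            'check_perf_value': value,
            'check_perf_ref': ref,
            'check_perf_lower_thres': lower,
            'check_perf_upper_thres': upper,
            'check_perf_unit': unit,
        })
        records.append(rec)

    return records
```

Each emitted record is then homogeneous again, which avoids the Kibana rejection problem mentioned above.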
@ekouts I think you are right. We should bring back the option of emitting multiple lines/records per test. I had a discussion over this PR with @flx42 as well and it seems that abolishing the multiline …
This is disabled by default and can be enabled by setting the `perflog_compat` logging configuration parameter. This will restore the performance logging behaviour of pre-4.0 versions.
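A settings fragment showing where this parameter lives might look like the following. Only `format_perfvars` and `perflog_compat` are taken from this PR; the remaining handler keys and the format strings are illustrative assumptions:

```python
# Hypothetical ReFrame settings file (e.g. settings.py); only
# `format_perfvars` and `perflog_compat` come from this PR -- the other
# keys and the format strings are illustrative.
site_configuration = {
    'logging': [
        {
            # Restore the pre-4.0 behaviour of one record per perf variable
            'perflog_compat': True,
            'handlers_perflog': [
                {
                    'type': 'filelog',
                    'level': 'info',
                    # `%(check_perfvalues)s` is expanded using `format_perfvars`
                    'format': ('%(check_job_completion_time)s,%(check_info)s,'
                               '%(check_perfvalues)s%(check_result)s'),
                    'format_perfvars': '%(check_perf_value)s,%(check_perf_unit)s,',
                    'append': True
                }
            ]
        }
    ]
}
```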
@ekouts Could you retry this PR? To enable the old behaviour, you need to set …
This PR improves significantly how performance logging is done, as well as how the `filelog` handler behaves. More specifically:

1. The performance of the test is no longer logged from within the test inside the `check_performance()` method; instead, it is logged through the associated task from the test executor after the test has finished. This allows us to log the final test result by introducing the new `check_result` log record attribute. We could also log any task-related information, but we can add this on-demand in the future.
2. Performance logging through the `filelog` handler is completely revised:
   - A new option named `format_perfvars` is introduced for formatting the performance variables. This will be used to format the `%(check_perfvalues)s` specifier.
   - The `filelog` handler in generic settings was adapted accordingly and it now produces pure csv output, so that it can be easily imported in other tools for visualization.
3. If either the `format` or the `format_perfvars` are wrong, we don't crash anymore; we issue an error message as a log message instead.
4. The performance checking phase has been revised and simplified. We no longer maintain two code paths for the legacy `perf_patterns` and the new syntax. We convert the old syntax to the new one and continue from there.
5. Users can still opt in for the old behaviour, where a new record was logged for each performance variable and the `%(check_perf_*)s` specifiers were valid in each log record and not only for formatting the `%(check_perfvalues)s`. This behaviour can be enabled by setting the `logging/perflog_compat` configuration parameter to `true`.

Closes #1543.
Closes #2394.

Todos
- `FIXME` note in `pipeline.py`
- `filelog` handler