-
Notifications
You must be signed in to change notification settings - Fork 117
Redesign of performance logging and support for Graylog #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR adds the support to log the performance checks that do use using graylog. The implementation adds the following fields in graylog: * check_info which contains the check name the system:partition and PrgEnv * check_name which shows the check name * check_partition which displays the system partition used * check_perf_reference which shows the reference value of a given test * check_perf_upper_thres which displays the upper limit threshold of a test * check_perf_lower_thres which displays the lower limit threshold of a check * check_perf_value which holds the actual perf value recorded on that system * check_system which shows the system on which the test has ran * data-version which display the reframe version of the data * version which displays the reframe version
vkarak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that performance logging is configurable, it makes sense to add also unit tests to test what gets printed in the log file.
reframe/settings.py
Outdated
| @@ -1 +1 @@ | |||
| ../config/generic.py No newline at end of file | |||
| ../config/cscs.py No newline at end of file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this.
config/cscs.py
Outdated
| _perf_logging_config = { | ||
| 'level': 'INFO', | ||
| 'handlers': { | ||
| '__h_graylog': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to have the handlers as a list and each handler entry should rather have a type attribute specifying is purpose. So the handlers entry would look like this:
'handlers': [
{
'type': 'graylog',
...
},
{
'type': 'file',
...
}
]What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@victorusu The problem with this syntax is that we should change also the standard logging syntax...
reframe/core/logging.py
Outdated
|
|
||
| import reframe | ||
| import reframe.core.debug as debug | ||
| from reframe.settings import settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now obsolete. You should get the settings from the config.py using the corresponding function.
| def getlogger(): | ||
| return _context_logger | ||
|
|
||
| def getperflogger(check): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may omit completely the check argument and get the actual check from the current logger returned from getlogger(). This logger will be associated with the currently executing check. If there is no check associated, this method should throw.
reframe/core/pipeline.py
Outdated
| perf_extra['check_perf_lower_thres'] = low_thres | ||
|
|
||
| if high_thres is not None: | ||
| perf_extra['check_perf_upper_thres'] = high_thres |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember, the Graylog backend needs some arguments passed in the extra argument? Could you please remind me what is exactly needed? It seems we need to fine tune a bit this here.
|
@victorusu Please update also this branch with the latest master, cos it's quite behind. |
This commit brings several changes: - New syntax for logging handler configuration. The new syntax is more like JSON. A warning is issued if the old syntax is used and it is automatically converted to the new. - Logging fields for checks are enriched with performance information - A special logging handler is introduced that dynamically creates a file based on live information of the first record to be logged. This handler is used for file performance logging, where each performance test must have its own performance file. - No more properties in settings, because requires some programming skills to edit and adapt to future settings changes.
| job_submit_timeout = 60 | ||
| checks_path = ['checks/'] | ||
| checks_path_recurse = True | ||
| site_configuration = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made all these attributes public and removed properties. See the PR description for a rationale. Opinions?
| '%(check_perf_var)s=%(check_perf_value)s|' | ||
| 'ref=%(check_perf_ref)s ' | ||
| '(l=%(check_perf_lower_thres)s, ' | ||
| 'u=%(check_perf_upper_thres)s)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the produced performance log. Let me know if you like it:
2018-05-25T17:17:40|reframe 2.13-dev0|magma_zsymmetrize_prod on dom:gpu using PrgEnv-gnu|jobid=746609|cpu_perf=0.916756|ref=0.93 (l=None, u=0.05)
2018-05-25T17:17:40|reframe 2.13-dev0|magma_zsymmetrize_prod on dom:gpu using PrgEnv-gnu|jobid=746609|gpu_perf=158.9385|ref=158.4 (l=None, u=0.05)
reframe/core/logging.py
Outdated
| import reframe | ||
| import reframe.core.debug as debug | ||
| from reframe.core.exceptions import ConfigError, LoggingError | ||
| from reframe.settings import settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to remove this.
reframe/core/logging.py
Outdated
| 'check_perf_ref': None, | ||
| 'check_perf_lower_thres': None, | ||
| 'check_perf_upper_thres': None, | ||
| 'data-version': reframe.VERSION, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not using data-version, but perhaps I could. @victorusu What is the intention for that?
| hdlr = _create_graylog_handler(handler_config) | ||
| if hdlr is None: | ||
| sys.stderr.write('WARNING: could not initialize the ' | ||
| 'graylog handler; ignoring...\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am getting this warning, which means that Python-bare packages does not contain it. We should add it there.
| def process(self, msg, kwargs): | ||
| # Setup dynamic fields of the check | ||
| def _update_check_extras(self): | ||
| """Return a dictionary with all the check-specific information.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is obsolete.
reframe/core/pipeline.py
Outdated
| # Performance logging | ||
| self._perf_logger = logging.null_logger | ||
| self._perf_logfile = None | ||
| self._perf_logdir = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used.
reframe/frontend/cli.py
Outdated
| try: | ||
| logging.configure_logging(settings.logging_config) | ||
| logging.configure_logging(settings.logging_config, | ||
| settings.perf_logging_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will through an AttributeError if somebody tries to use this version with an old configuration file. I could perhaps catch it and issue a warning message. It would not be a problem if I was passing it as None, if the attribute didn't exist, but this would silently make the framework not to produce performance logs.
unittests/test_cli.py
Outdated
| captured_stdout = StringIO() | ||
| captured_stderr = StringIO() | ||
| print(sys.argv) | ||
| print(' '.join(sys.argv)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should remove this.
|
@victorusu Can you review and reply to my comments? PS: Technically, you cannot review, because you submitted this PR, but just make your comments. |
Performance logging logic (log files, log directory creatation) is now completely part of the performance logging configuration. More specifically, the performance log directory and files are only relevant to the `filelog` backend logging handler. For this reason, the logic of creating the logging prefix is now removed from the runtime resources as well.
reframe/core/logging.py
Outdated
|
|
||
| self.baseFilename = os.path.join(dirname, record.check_name + '.log') | ||
| self.stream = self._streams.get(self.baseFilename, None) | ||
| self._streams[self.baseFilename] = self.stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go after the super().emit(...)
- Also some fine-tuning of the Graylog backend impl.
Codecov Report
@@ Coverage Diff @@
## master #213 +/- ##
==========================================
- Coverage 91.3% 91.13% -0.18%
==========================================
Files 68 68
Lines 8107 8231 +124
==========================================
+ Hits 7402 7501 +99
- Misses 705 730 +25
Continue to review full report at Codecov.
|
config/cscs.py
Outdated
| 'handlers': [ | ||
| { | ||
| 'type': 'graylog', | ||
| 'host': 'your-sever-here', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to your-server-here
docs/running.rst
Outdated
| By default this field has the form ``<check_name> on <current_partition> using <current_environment>``. | ||
| It can be configured on a per test basis by overriding the :func:`info <reframe.core.pipeline.RegressionTest.info>` method in your regression test. | ||
| It can be configured on a per test basis by overriding the :func:`info <reframe.core.pipeline.RegressionTest.info>` method of a specific regression test. | ||
| - ``check_jobid``: Prints the job or process id of the job or process associated with currently executing regression test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to ... associated with the currently executing regression test
docs/running.rst
Outdated
| - ``check_outputdir``: The output directory associated with the currently executing test. | ||
| - ``check_partition``: The system partition a test is currently executing. | ||
| - ``check_stagedir``: The stage directory associated with the currently executing test. | ||
| - ``check_system``: The host system a test is currently executing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chage to The host system on which a test is currently executing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change it to "The host system where this test is currently executing."
docs/running.rst
Outdated
| """"""""""""""""""""""""""""""""" | ||
|
|
||
| The type of this handler is ``graylog`` and it logs performance data to a `Graylog <https://www.graylog.org/>`__ server. | ||
| Graylog is distributed enterprise log management service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to Graylog is a distributed ...
docs/running.rst
Outdated
| * ``extras``: (optional) A set of optional user attributes to be passed with each log record to the server. | ||
| These may depend on the server configuration. | ||
|
|
||
| This log handler uses internally `pygelf <https://pypi.org/project/pygelf/>`__, so this module Python must be available, otherwise this log handler will be ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to ... so this Python module must be available
|
Except for those minor documentation changes it looks fine to me. |
|
@victorusu Did you have time to check if Graylog backends actually works? |
|
@vkarak, I am getting the following error: |
|
@victorusu The problem seems to come from here. Can you try converting |
|
@victorusu Or just call |
- A log message is required for the server to accept the log, so we generate a default `sent by $USER` message, if not specified. Also: - Fix how check tags are logged. - Fix output of tags in check listing.
|
@teojgo Can you reiterate over this PR and approve it? It's now final and ready to be merged. Graylog backend works also fine. |
|
@jenkins-cscs retry dom |
Also: - Adapted PBS config file - PrettyPreinter prints colored messages for warnings and errors. - Simplify message format for Graylog handler.
EDIT by @vkarak
This PR, along with the support for Graylog performance logging, brings a redesign of the performance logging:
New syntax for logging handler configuration.
The new syntax is more like JSON.
A warning is issued if the old syntax is used and it is automatically
converted to the new.
Logging fields for checks are enriched with performance information
A special logging handler is introduced that dynamically creates a
file based on live information of the first record to be logged.
This handler is used for file performance logging, where each
performance test must have its own performance file.
No more properties in settings, because requires some programming
skills to edit and adapt to future settings changes.
Still todo:
Update documentationThe following description is now obsolete