
Conversation

@victorusu (Contributor) commented Mar 21, 2018

EDIT by @vkarak

This PR, along with the support for Graylog performance logging, brings a redesign of the performance logging:

  • New syntax for logging handler configuration.
    The new syntax is more like JSON.
    A warning is issued if the old syntax is used, and it is
    automatically converted to the new one.

  • Logging fields for checks are enriched with performance information

  • A special logging handler is introduced that dynamically creates a
    file based on live information of the first record to be logged.
    This handler is used for file performance logging, where each
    performance test must have its own performance file.

  • No more properties in settings, because they require some programming
    skills to edit and make it harder to adapt to future settings changes.
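
The new JSON-like configuration can be sketched as follows. The `'type': 'file'` / `'type': 'graylog'` keys mirror the examples quoted later in this discussion; the remaining option names and values are assumptions for illustration only:

```python
# Hedged sketch of the new JSON-like handler configuration described
# above. Only the 'type' values are taken from this discussion; the
# other keys are illustrative assumptions, not the actual schema.
perf_logging_config = {
    'level': 'INFO',
    'handlers': [
        {
            'type': 'file',
            'name': 'reframe_perf.log',   # hypothetical option name
            'level': 'DEBUG',
        },
        {
            'type': 'graylog',
            'host': 'my-graylog-server.example.org',  # placeholder host
            'port': 12201,
        },
    ],
}
```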

Still todo:

  • Update documentation

The following description is now obsolete

This PR adds support for logging the performance of checks using Graylog.

The implementation adds the following fields to Graylog:

  • check_info, which contains the check name, the system:partition and the PrgEnv
  • check_name, which shows the check name
  • check_partition, which displays the system partition used
  • check_perf_reference, which shows the reference value of a given test
  • check_perf_upper_thres, which displays the upper threshold of a test
  • check_perf_lower_thres, which displays the lower threshold of a check
  • check_perf_value, which holds the actual performance value recorded on that system
  • check_system, which shows the system on which the test has run
  • data-version, which displays the ReFrame version of the data
  • version, which displays the ReFrame version

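
Taken together, a single logged record carrying the fields above might look like this (every value below is made up for illustration):

```python
# Illustrative record with the Graylog fields listed above;
# all values are invented for the example.
record = {
    'check_info': 'mycheck on daint:gpu using PrgEnv-gnu',
    'check_name': 'mycheck',
    'check_partition': 'gpu',
    'check_perf_reference': 0.93,
    'check_perf_upper_thres': 0.05,
    'check_perf_lower_thres': None,
    'check_perf_value': 0.916756,
    'check_system': 'daint',
    'data-version': '1.0',
    'version': '2.13-dev0',
}
```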
@victorusu victorusu requested a review from vkarak March 21, 2018 13:02
@victorusu victorusu self-assigned this Mar 21, 2018
@victorusu victorusu added this to the ReFrame sprint 2018w11 milestone Mar 21, 2018
@vkarak vkarak modified the milestones: ReFrame sprint 2018w11, Upcoming sprint Mar 26, 2018
@vkarak (Contributor) left a comment


Now that performance logging is configurable, it makes sense to add also unit tests to test what gets printed in the log file.

@@ -1 +1 @@
../config/generic.py No newline at end of file
../config/cscs.py No newline at end of file

Please revert this.

config/cscs.py Outdated
_perf_logging_config = {
'level': 'INFO',
'handlers': {
'__h_graylog': {

I think it would be better to have the handlers as a list, where each handler entry has a type attribute specifying its purpose. The handlers entry would then look like this:

'handlers': [
    {
        'type': 'graylog',
        ...
    },
    {
        'type': 'file',
        ...
    }
]

What do you think?
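
For what it's worth, the auto-conversion from the old dict-keyed syntax mentioned in the PR description could be sketched along these lines (the function name and the name-based type guess are assumptions, not the actual implementation):

```python
def convert_handlers(old_handlers):
    """Convert dict-keyed handler config to the proposed list syntax."""
    new_handlers = []
    for name, conf in old_handlers.items():
        entry = dict(conf)
        # Guess the handler type from the old key name (an assumption)
        entry['type'] = 'graylog' if 'graylog' in name else 'file'
        new_handlers.append(entry)
    return new_handlers
```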


@victorusu The problem with this syntax is that we would also have to change the standard logging syntax...


import reframe
import reframe.core.debug as debug
from reframe.settings import settings

This is now obsolete. You should get the settings from the config.py using the corresponding function.

def getlogger():
    return _context_logger


def getperflogger(check):

We could omit the check argument completely and get the actual check from the current logger returned by getlogger(). This logger will be associated with the currently executing check; if there is no check associated, this method should throw.
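
A minimal sketch of that proposal (the exception type, the module-level variable and the perf_logger attribute are all assumptions, not the actual ReFrame code):

```python
class LoggingError(Exception):
    """Raised when no check is associated with the logging context."""


_current_check = None   # the framework would set this while a check runs


def getperflogger():
    # Derive the check from the current logging context instead of
    # taking it as an argument, as proposed above
    if _current_check is None:
        raise LoggingError('no check is associated with the '
                           'current logging context')
    return _current_check.perf_logger   # hypothetical attribute
```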

perf_extra['check_perf_lower_thres'] = low_thres

if high_thres is not None:
    perf_extra['check_perf_upper_thres'] = high_thres

As far as I remember, the Graylog backend needs some arguments passed in the extra argument. Could you please remind me what exactly is needed? It seems we need to fine-tune this a bit here.

@vkarak (Contributor) commented Apr 3, 2018

@victorusu Please update also this branch with the latest master, cos it's quite behind.

@vkarak vkarak modified the milestones: ReFrame sprint 2018w13, Upcoming sprint Apr 15, 2018
@vkarak vkarak modified the milestones: ReFrame sprint 2018w16, Upcoming sprint May 3, 2018
@vkarak vkarak self-assigned this May 3, 2018
Vasileios Karakasis added 2 commits May 24, 2018 09:57
This commit brings several changes:

- New syntax for logging handler configuration.
  The new syntax is more like JSON.
  A warning is issued if the old syntax is used and it is automatically
  converted to the new.

- Logging fields for checks are enriched with performance information

- A special logging handler is introduced that dynamically creates a
  file based on live information of the first record to be logged.
  This handler is used for file performance logging, where each
  performance test must have its own performance file.

- No more properties in settings, because they require some programming
  skills to edit and make it harder to adapt to future settings changes.
@vkarak vkarak requested a review from teojgo May 25, 2018 15:39
@vkarak vkarak changed the title Adding support graylog performance logging [WIP] Redesign of performance logging and support for Graylog May 25, 2018
job_submit_timeout = 60
checks_path = ['checks/']
checks_path_recurse = True
site_configuration = {

I made all these attributes public and removed properties. See the PR description for a rationale. Opinions?

'%(check_perf_var)s=%(check_perf_value)s|'
'ref=%(check_perf_ref)s '
'(l=%(check_perf_lower_thres)s, '
'u=%(check_perf_upper_thres)s)'

This is the produced performance log. Let me know if you like it:

2018-05-25T17:17:40|reframe 2.13-dev0|magma_zsymmetrize_prod on dom:gpu using PrgEnv-gnu|jobid=746609|cpu_perf=0.916756|ref=0.93 (l=None, u=0.05)
2018-05-25T17:17:40|reframe 2.13-dev0|magma_zsymmetrize_prod on dom:gpu using PrgEnv-gnu|jobid=746609|gpu_perf=158.9385|ref=158.4 (l=None, u=0.05)
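
Putting it together, the sample line above can be reproduced with plain %-style formatting over the record fields. The leading asctime/version/jobid fields are inferred from the sample output, not quoted from the diff:

```python
# Reconstruct the performance log line shown above with %-style
# formatting; the record values are copied from the first sample line.
fmt = ('%(asctime)s|reframe %(version)s|%(check_info)s|'
       'jobid=%(check_jobid)s|'
       '%(check_perf_var)s=%(check_perf_value)s|'
       'ref=%(check_perf_ref)s '
       '(l=%(check_perf_lower_thres)s, u=%(check_perf_upper_thres)s)')
record = {
    'asctime': '2018-05-25T17:17:40',
    'version': '2.13-dev0',
    'check_info': 'magma_zsymmetrize_prod on dom:gpu using PrgEnv-gnu',
    'check_jobid': 746609,
    'check_perf_var': 'cpu_perf',
    'check_perf_value': 0.916756,
    'check_perf_ref': 0.93,
    'check_perf_lower_thres': None,
    'check_perf_upper_thres': 0.05,
}
print(fmt % record)
```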

import reframe
import reframe.core.debug as debug
from reframe.core.exceptions import ConfigError, LoggingError
from reframe.settings import settings

I need to remove this.

'check_perf_ref': None,
'check_perf_lower_thres': None,
'check_perf_upper_thres': None,
'data-version': reframe.VERSION,

I am not using data-version, but perhaps I could. @victorusu What is the intention for that?

hdlr = _create_graylog_handler(handler_config)
if hdlr is None:
    sys.stderr.write('WARNING: could not initialize the '
                     'graylog handler; ignoring...\n')

I am getting this warning, which means that the bare Python packages do not contain it. We should add it there.

def process(self, msg, kwargs):
    # Setup dynamic fields of the check

def _update_check_extras(self):
    """Return a dictionary with all the check-specific information."""

This comment is obsolete.

# Performance logging
self._perf_logger = logging.null_logger
self._perf_logfile = None
self._perf_logdir = None

This is not used.

try:
    logging.configure_logging(settings.logging_config)
    logging.configure_logging(settings.logging_config,
                              settings.perf_logging_config)

This will throw an AttributeError if somebody tries to use this version with an old configuration file. I could perhaps catch it and issue a warning message. It would not be a problem if I were passing it as None when the attribute didn't exist, but this would silently make the framework not produce performance logs.
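
One way to handle this gracefully (an assumption for illustration, not necessarily the merged fix) is to look the attribute up with a default and warn loudly instead of failing:

```python
import sys


class OldStyleSettings:
    """Stand-in for a settings module predating perf_logging_config."""
    logging_config = {'level': 'INFO'}


settings = OldStyleSettings()
# getattr with a default avoids the AttributeError on old config files
perf_config = getattr(settings, 'perf_logging_config', None)
if perf_config is None:
    sys.stderr.write('WARNING: perf_logging_config not found in the '
                     'configuration; performance logs will not be '
                     'produced\n')
```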

captured_stdout = StringIO()
captured_stderr = StringIO()
print(sys.argv)
print(' '.join(sys.argv))

I should remove this.

@vkarak (Contributor) commented May 25, 2018

@victorusu Can you review and reply to my comments?

PS: Technically, you cannot review, because you submitted this PR, but just leave your comments.

Performance logging logic (log files, log directory creation) is now
completely part of the performance logging configuration. More
specifically, the performance log directory and files are only relevant
to the `filelog` backend logging handler. For this reason, the logic of
creating the logging prefix is now removed from the runtime resources as
well.

self.baseFilename = os.path.join(dirname, record.check_name + '.log')
self.stream = self._streams.get(self.baseFilename, None)
self._streams[self.baseFilename] = self.stream

This should go after the super().emit(...)
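
A self-contained sketch of the dynamic per-check file handler under discussion, with the stream cached only after super().emit(...) as suggested (the class name and constructor are assumptions, not the actual ReFrame code):

```python
import logging
import os


class MultiFileHandler(logging.FileHandler):
    """Write each check's records to its own log file (sketch)."""

    def __init__(self, dirname):
        self._dirname = dirname
        self._streams = {}
        # delay=True: no file is opened until the first record arrives
        super().__init__(os.path.join(dirname, 'default.log'), delay=True)

    def emit(self, record):
        # Choose the target file from live information in the record
        self.baseFilename = os.path.join(self._dirname,
                                         record.check_name + '.log')
        self.stream = self._streams.get(self.baseFilename)
        super().emit(record)          # opens a new stream if needed
        # Cache the stream only after emit(), per the review comment
        self._streams[self.baseFilename] = self.stream
```

Records must carry a `check_name` attribute, e.g. via `logger.info('...', extra={'check_name': 'mycheck'})`.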

- Also some fine-tuning of the Graylog backend impl.
@vkarak vkarak changed the title [WIP] Redesign of performance logging and support for Graylog Redesign of performance logging and support for Graylog Jun 4, 2018
@codecov-io commented Jun 7, 2018

Codecov Report

Merging #213 into master will decrease coverage by 0.17%.
The diff coverage is 82.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #213      +/-   ##
==========================================
- Coverage    91.3%   91.13%   -0.18%     
==========================================
  Files          68       68              
  Lines        8107     8231     +124     
==========================================
+ Hits         7402     7501      +99     
- Misses        705      730      +25
Impacted Files Coverage Δ
unittests/test_fields.py 100% <ø> (ø) ⬆️
reframe/core/runtime.py 87.39% <ø> (-0.01%) ⬇️
unittests/test_logging.py 98.48% <100%> (+0.19%) ⬆️
reframe/settings.py 100% <100%> (+32%) ⬆️
reframe/core/exceptions.py 82.17% <100%> (+0.13%) ⬆️
unittests/resources/settings.py 100% <100%> (+12%) ⬆️
reframe/core/pipeline.py 94.7% <100%> (-0.02%) ⬇️
reframe/frontend/cli.py 78.57% <57.89%> (-1.85%) ⬇️
reframe/frontend/printer.py 86.66% <66.66%> (-2.07%) ⬇️
reframe/core/logging.py 84.19% <79.45%> (-5.36%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bdf2090...71f4e30. Read the comment docs.

config/cscs.py Outdated
'handlers': [
{
'type': 'graylog',
'host': 'your-sever-here',

Change to your-server-here

docs/running.rst Outdated
By default this field has the form ``<check_name> on <current_partition> using <current_environment>``.
It can be configured on a per test basis by overriding the :func:`info <reframe.core.pipeline.RegressionTest.info>` method in your regression test.
It can be configured on a per test basis by overriding the :func:`info <reframe.core.pipeline.RegressionTest.info>` method of a specific regression test.
- ``check_jobid``: Prints the job or process id of the job or process associated with currently executing regression test.

Change to ... associated with the currently executing regression test

docs/running.rst Outdated
- ``check_outputdir``: The output directory associated with the currently executing test.
- ``check_partition``: The system partition a test is currently executing.
- ``check_stagedir``: The stage directory associated with the currently executing test.
- ``check_system``: The host system a test is currently executing.

Change to The host system on which a test is currently executing.

@vkarak (Contributor) Jun 14, 2018

I will change it to "The host system where this test is currently executing."

docs/running.rst Outdated
"""""""""""""""""""""""""""""""""

The type of this handler is ``graylog`` and it logs performance data to a `Graylog <https://www.graylog.org/>`__ server.
Graylog is distributed enterprise log management service.

Change to Graylog is a distributed ...

docs/running.rst Outdated
* ``extras``: (optional) A set of optional user attributes to be passed with each log record to the server.
These may depend on the server configuration.

This log handler uses internally `pygelf <https://pypi.org/project/pygelf/>`__, so this module Python must be available, otherwise this log handler will be ignored.

Change to ... so this Python module must be available

@teojgo (Contributor) commented Jun 13, 2018

Except for those minor documentation changes, it looks fine to me.

@vkarak (Contributor) commented Jun 14, 2018

@victorusu Did you have time to check if the Graylog backend actually works?

@victorusu (Contributor, Author)

@vkarak, I am getting the following error:

FAILURE INFO for gromacs_gpu_prod_check 
  * System partition: daint:gpu
  * Environment: PrgEnv-gnu
  * Stage directory: /users/hvictor/Work/grafana-test/reframe/stage/gpu/gromacs_gpu_prod_check/PrgEnv-gnu
  * Job type: batch job (id=8060458)
  * Maintainers: ['VH']
  * Failing phase: performance
  * Reason: unexpected error: Object of type 'set' is not JSON serializable
Traceback (most recent call last):
  File "/users/hvictor/Work/grafana-test/reframe/reframe/frontend/executors/__init__.py", line 58, in _safe_call
    return fn(*args, **kwargs)
  File "/users/hvictor/Work/grafana-test/reframe/reframe/core/pipeline.py", line 1003, in performance
    self.check_performance()
  File "/users/hvictor/Work/grafana-test/reframe/reframe/core/pipeline.py", line 1042, in check_performance
    ref, low_thres, high_thres)
  File "/users/hvictor/Work/grafana-test/reframe/reframe/core/logging.py", line 390, in log_performance
    self.log(level, msg)
  File "/users/hvictor/Work/grafana-test/reframe/reframe/core/logging.py", line 405, in log
    super().log(level, msg, *args, **kwargs)
  File "/opt/python/3.6.1.1/lib/python3.6/logging/__init__.py", line 1672, in log
    self.logger._log(level, msg, args, **kwargs)
  File "/opt/python/3.6.1.1/lib/python3.6/logging/__init__.py", line 1442, in _log
    self.handle(record)
  File "/opt/python/3.6.1.1/lib/python3.6/logging/__init__.py", line 1452, in handle
    self.callHandlers(record)
  File "/opt/python/3.6.1.1/lib/python3.6/logging/__init__.py", line 1514, in callHandlers
    hdlr.handle(record)
  File "/opt/python/3.6.1.1/lib/python3.6/logging/__init__.py", line 863, in handle
    self.emit(record)
  File "/apps/daint/UES/jenkins/6.0.UP04/gpu/easybuild/software/PyExtensions/3.6-CrayGNU-17.12/lib/python3.6/site-packages/pygelf-0.3.1-py3.6.egg/pygelf/handlers.py", line 156, in emit
    data = self.convert_record_to_gelf(record)
  File "/apps/daint/UES/jenkins/6.0.UP04/gpu/easybuild/software/PyExtensions/3.6-CrayGNU-17.12/lib/python3.6/site-packages/pygelf-0.3.1-py3.6.egg/pygelf/handlers.py", line 35, in convert_record_to_gelf
    self.compress
  File "/apps/daint/UES/jenkins/6.0.UP04/gpu/easybuild/software/PyExtensions/3.6-CrayGNU-17.12/lib/python3.6/site-packages/pygelf-0.3.1-py3.6.egg/pygelf/gelf.py", line 66, in pack
    packed = json.dumps(gelf).encode('utf-8')
  File "/opt/python/3.6.1.1/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/opt/python/3.6.1.1/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/opt/python/3.6.1.1/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/opt/python/3.6.1.1/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'set' is not JSON serializable

@vkarak (Contributor) commented Jun 18, 2018

@victorusu The problem seems to come from here. Can you try converting tags to a list in this point and retry?

@vkarak (Contributor) commented Jun 18, 2018

@victorusu Or just call str() on it.
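
The failure in the traceback above is easy to reproduce, and either suggestion fixes it (a minimal repro with a made-up tag, not ReFrame code):

```python
import json

tags = {'production'}    # a set, as in the failing check

try:
    json.dumps({'check_tags': tags})
except TypeError:
    pass                 # 'Object of type ... is not JSON serializable'

# Either suggestion makes the record serializable again:
as_list = json.dumps({'check_tags': list(tags)})
as_str = json.dumps({'check_tags': str(tags)})
```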

- A log message is required for the server to accept the log, so we
  generate a default `sent by $USER` message, if not specified.

Also:
- Fix how check tags are logged.
- Fix output of tags in check listing.
@vkarak (Contributor) commented Jun 19, 2018

@teojgo Can you go over this PR again and approve it? It's now final and ready to be merged. The Graylog backend also works fine.

@vkarak (Contributor) commented Jun 20, 2018

@jenkins-cscs retry dom

Also:
- Adapted PBS config file
- PrettyPrinter prints colored messages for warnings and errors.
- Simplify message format for Graylog handler.
@vkarak vkarak merged commit 4185d5f into master Jun 20, 2018
@vkarak vkarak deleted the feature/graylog-perf-support branch June 20, 2018 11:28
