
Conversation

@rsarm (Contributor) commented Jun 18, 2020

Closes #667

EDIT from @vkarak:

This PR adds to ReFrame the capability to produce a detailed JSON report whenever it runs. By default, the report is placed under $HOME/.reframe/reports, and each new ReFrame run generates a new report. A command-line option, a configuration parameter and an environment variable are added to control this. All other reports generated by ReFrame should use this JSON. This is already done for the standard failure report, but adapting the current performance report is left for future work (the performance data of the tests is already in the JSON). Here is an example output from running StreamTest:

{
  "session_info": {
    "cmdline": "./bin/reframe -c config/tresa.py -c cscs-checks/microbenchmarks/stream/stream.py --skip-prgenv-check --skip-system-check --performance-report -r",
    "config_file": "<builtin>",
    "data_version": "1.0",
    "hostname": "tresa.local",
    "prefix_output": "/Users/karakasv/Repositories/reframe/output",
    "prefix_stage": "/Users/karakasv/Repositories/reframe/stage",
    "user": "karakasv",
    "version": "3.1-dev2 (rev: 4c650425)",
    "workdir": "/Users/karakasv/Repositories/reframe",
    "time_start": "2020-07-24T00:04:00+0200",
    "time_end": "2020-07-24T00:04:02+0200",
    "time_elapsed": 1.8402886390686035,
    "num_cases": 1,
    "num_failures": 0
  },
  "runs": [
    {
      "num_cases": 1,
      "num_failures": 0,
      "runid": 0,
      "testcases": [
        {
          "build_stderr": "rfm_StreamTest_build.err",
          "build_stdout": "rfm_StreamTest_build.out",
          "description": "STREAM Benchmark",
          "environment": "builtin",
          "fail_reason": null,
          "fail_phase": null,
          "jobid": 75868,
          "job_stderr": "rfm_StreamTest_job.err",
          "job_stdout": "rfm_StreamTest_job.out",
          "name": "StreamTest",
          "maintainers": [
            "RS",
            "SK"
          ],
          "nodelist": [
            "tresa.local"
          ],
          "outputdir": "/Users/karakasv/Repositories/reframe/output/generic/default/builtin/StreamTest",
          "perfvars": [
            {
              "name": "triad",
              "reference": 0,
              "thres_lower": null,
              "thres_upper": null,
              "unit": "MB/s",
              "value": 17160.2
            }
          ],
          "result": "success",
          "stagedir": null,
          "scheduler": "local",
          "system": "generic:default",
          "tags": [
            "craype",
            "production"
          ],
          "time_compile": 0.7874908447265625,
          "time_performance": 0.0019030570983886719,
          "time_run": 0.988239049911499,
          "time_sanity": 0.0004889965057373047,
          "time_setup": 0.014341115951538086,
          "time_total": 1.8320262432098389
        }
      ]
    }
  ]
}
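
Since the report is plain JSON, it can be post-processed with standard tooling. Below is a minimal sketch of consuming it with Python's json module; only the default directory ($HOME/.reframe/reports) and the key names come from the example above, while the report filename pattern is an assumption.

```python
import glob
import json
import os

# Pick the most recent report under the default location; the '*.json'
# pattern is an assumption, not a documented naming scheme.
reports = sorted(glob.glob(os.path.expanduser('~/.reframe/reports/*.json')))
with open(reports[-1]) as fp:
    report = json.load(fp)

session = report['session_info']
print(f"ReFrame {session['version']}: {session['num_cases']} case(s), "
      f"{session['num_failures']} failure(s) in {session['time_elapsed']:.2f}s")

# Performance data is already embedded per test case
for run in report['runs']:
    for tc in run['testcases']:
        for pv in tc.get('perfvars') or []:
            print(f"{tc['name']}: {pv['name']} = {pv['value']} {pv['unit']}")
```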

TODO:

  • Update documentation.

@rsarm rsarm added this to the ReFrame sprint 20.09 milestone Jun 18, 2020
@rsarm rsarm requested a review from vkarak June 18, 2020 07:48
@rsarm rsarm self-assigned this Jun 18, 2020
@rsarm rsarm requested a review from ekouts June 18, 2020 07:49
@rsarm (Contributor, Author) commented Jun 18, 2020

It looks something like this:

[
    {
        "test": "UlimitCheck",
        "description": "Checking the output of ulimit -s in node.",
        "result": "fail",
        "system": "dom:gpu",
        "environment": "PrgEnv-gnu",
        "stagedir": "/users/sarafael/git/reframe/cscs-checks/prgenv/stage/dom/gpu/PrgEnv-gnu/UlimitCheck",
        "outputdir": "/users/sarafael/git/reframe/cscs-checks/prgenv/output/dom/gpu/PrgEnv-gnu/UlimitCheck",
        "nodelist": "nid00039",
        "jobtype": "batch job",
        "jobid": 1054524,
        "maintainers": [
            "RS",
            "CB"
        ],
        "tags": [
            "scs",
            "craype",
            "production"
        ],
        "retries": 0
    },
    {
        "test": "UlimitCheck",
        "description": "Checking the output of ulimit -s in node.",
        "result": "fail",
        "system": "dom:mc",
        "environment": "PrgEnv-gnu",
        "stagedir": "/users/sarafael/git/reframe/cscs-checks/prgenv/stage/dom/mc/PrgEnv-gnu/UlimitCheck",
        "outputdir": "/users/sarafael/git/reframe/cscs-checks/prgenv/output/dom/mc/PrgEnv-gnu/UlimitCheck",
        "nodelist": "nid00405",
        "jobtype": "batch job",
        "jobid": 1054525,
        "maintainers": [
            "RS",
            "CB"
        ],
        "tags": [
            "scs",
            "craype",
            "production"
        ],
        "retries": 0
    }
]

@vkarak vkarak marked this pull request as draft June 18, 2020 16:28
@ekouts (Contributor) left a comment

I have two small comments, which are more questions.

@vkarak (Contributor) commented Jun 22, 2020

@rsarm Can you add an example of the output for a successful test?

@rsarm (Contributor, Author) commented Jun 22, 2020

@vkarak I didn't make much of a distinction. The only difference is that the result key takes the value success or fail; see the sketch below.
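
For instance, failing cases can be picked out of this list in a few lines; a sketch against the list-of-objects format shown above, with report.json as a placeholder path:

```python
import json

with open('report.json') as fp:   # placeholder filename
    cases = json.load(fp)

# Each entry carries a 'result' key that is either 'success' or 'fail'
for c in (c for c in cases if c['result'] == 'fail'):
    print(f"{c['test']} failed on {c['system']} with {c['environment']}")
```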

@vkarak (Contributor) commented Jun 22, 2020

Apart from @ekouts' comments, here are some additional ones:

  • nodelist should be a list.
  • Replace jobtype with scheduler and use the registered_name of the scheduler registered with this partition.

We also need the following fields (see the sketch after this list):

  • failing_phase: the phase in which the test failed, or null if it hasn't failed.
  • failure_reason: the failure reason.
  • build_stdout: the basename of build's stdout file.
  • build_stderr: the basename of build's stderr file.
  • job_stdout: the basename of job's stdout file.
  • job_stderr: the basename of job's stderr file.
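
A hedged sketch of how these fields might be assembled; every name below is illustrative and not part of ReFrame's actual internals:

```python
import os

def testcase_fields(scheduler_name, nodelist, fail_phase, fail_reason,
                    build_stdout, build_stderr, job_stdout, job_stderr):
    '''Illustrative helper assembling the requested per-test report fields.'''
    return {
        'scheduler': scheduler_name,    # the registered_name of the partition's scheduler
        'nodelist': list(nodelist),     # always a list
        'failing_phase': fail_phase,    # None if the test hasn't failed
        'failure_reason': fail_reason,
        'build_stdout': os.path.basename(build_stdout),
        'build_stderr': os.path.basename(build_stderr),
        'job_stdout': os.path.basename(job_stdout),
        'job_stderr': os.path.basename(job_stderr),
    }
```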

@rsarm (Contributor, Author) commented Jun 23, 2020

Success:

[
    {
        "name": "UlimitCheck",
        "description": "Checking the output of ulimit -s in node.",
        "system": "dom:gpu",
        "environment": "PrgEnv-gnu",
        "tags": [
            "scs",
            "craype",
            "production"
        ],
        "maintainers": [
            "RS",
            "CB"
        ],
        "scheduler": "slurm",
        "job_stdout": "rfm_UlimitCheck_job.out",
        "job_stderr": "rfm_UlimitCheck_job.err",
        "jobid": 1061294,
        "nodelist": [
            "nid00038"
        ],
        "build_stdout": "rfm_UlimitCheck_build.out",
        "build_stderr": "rfm_UlimitCheck_build.err",
        "result": "success",
        "outputdir": "/users/sarafael/git/reframe/cscs-checks/prgenv/output/dom/gpu/PrgEnv-gnu/UlimitCheck"
    }
]

Fail:

[
    {
        "name": "UlimitCheck",
        "description": "Checking the output of ulimit -s in node.",
        "system": "dom:gpu",
        "environment": "PrgEnv-gnu",
        "tags": [
            "production",
            "scs",
            "craype"
        ],
        "maintainers": [
            "RS",
            "CB"
        ],
        "scheduler": "slurm",
        "job_stdout": "rfm_UlimitCheck_job.out",
        "job_stderr": "rfm_UlimitCheck_job.err",
        "jobid": 1061295,
        "nodelist": [
            "nid00038"
        ],
        "build_stdout": "rfm_UlimitCheck_build.out",
        "build_stderr": "rfm_UlimitCheck_build.err",
        "result": "fail",
        "failing_reason": "sanity error: pattern `Thr soft limit is unlimited' not found in `rfm_UlimitCheck_job.out'",
        "failing_phase": "sanity",
        "stagedir": "/users/sarafael/git/reframe/cscs-checks/prgenv/stage/dom/gpu/PrgEnv-gnu/UlimitCheck"
    }
]

@codecov-commenter commented Jun 23, 2020

Codecov Report

Merging #1377 into master will increase coverage by 0.10%.
The diff coverage is 96.52%.


@@            Coverage Diff             @@
##           master    #1377      +/-   ##
==========================================
+ Coverage   91.83%   91.93%   +0.10%     
==========================================
  Files          82       82              
  Lines       12801    12898      +97     
==========================================
+ Hits        11756    11858     +102     
+ Misses       1045     1040       -5     
Impacted Files                   Coverage Δ
reframe/core/systems.py          88.19% <ø> (ø)
reframe/frontend/cli.py          79.44% <90.24%> (+0.91%) ⬆️
reframe/frontend/statistics.py   95.62% <98.66%> (+4.02%) ⬆️
reframe/core/pipeline.py         92.56% <100.00%> (-0.15%) ⬇️
unittests/test_cli.py            90.97% <100.00%> (+0.16%) ⬆️
unittests/test_policies.py       98.71% <100.00%> (+0.05%) ⬆️
reframe/core/exceptions.py       90.07% <0.00%> (+4.25%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69afc67...65b1333.

@rsarm rsarm changed the title [wip][feat] Create a JSON output report [feat] Create a JSON output report Jun 24, 2020
@rsarm rsarm marked this pull request as ready for review June 24, 2020 14:48
@vkarak vkarak requested a review from ekouts July 3, 2020 07:21
@ekouts (Contributor) left a comment

lgtm

@ekouts (Contributor) commented Jul 6, 2020

@jenkins-cscs retry daint

1 similar comment
@ekouts (Contributor) commented Jul 6, 2020

@jenkins-cscs retry daint

@vkarak (Contributor) left a comment

The core part of the PR is fine, except for a couple of really minor omissions. We need to think a bit more about the CLI part and how we expose the functionality to the user.

@rsarm (Contributor, Author) commented Jul 21, 2020

@jenkins-cscs retry all

@vkarak (Contributor) commented Jul 22, 2020

@rsarm Even the generic unit tests from Travis are failing.

@vkarak (Contributor) commented Jul 22, 2020

The same happens if I run them locally. Can you please fix them?

@vkarak (Contributor) commented Jul 22, 2020

OK, I see the problem. You don't correctly handle the case where the $HOME/.local/reframe directory does not exist:

./bin/reframe: OS error: [Errno 2] No such file or directory: '/Users/user/.local/reframe'
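
The obvious fix is to create the directory before writing the report; a minimal sketch, assuming a writer function along these lines (the name save_report is hypothetical):

```python
import json
import os

def save_report(report, filename):
    # Create the parent directory first to avoid the ENOENT error above
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    with open(filename, 'w') as fp:
        json.dump(report, fp, indent=2)
```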

@vkarak vkarak assigned vkarak and rsarm and unassigned rsarm Jul 22, 2020
@vkarak (Contributor) commented Jul 22, 2020

But the unit tests must not create artifacts outside the temporary directory they are using, so just creating that directory is not acceptable. It needs further thinking.
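
One common way to keep such artifacts inside the test sandbox is to redirect HOME to a per-test temporary directory; a hedged sketch assuming pytest-style fixtures (the fixture name is illustrative):

```python
import pytest

@pytest.fixture(autouse=True)
def isolated_home(tmp_path, monkeypatch):
    # Redirect HOME so anything written under $HOME/.reframe lands
    # inside pytest's per-test temporary directory
    monkeypatch.setenv('HOME', str(tmp_path))
    yield tmp_path
```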

@vkarak (Contributor) left a comment

I will fix the PR.

@vkarak (Contributor) commented Jul 22, 2020

@rsarm We will also need some documentation for this. For the moment, we won't need a detailed description of what the report contains.

@vkarak vkarak modified the milestone: ReFrame sprint 20.11 Jul 22, 2020
@vkarak vkarak changed the title [feat] Create a JSON output report [feat] Produce detailed JSON report for a regression testing session Jul 23, 2020
@pep8speaks commented Jul 23, 2020

Hello @rsarm, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide.

Comment last updated at 2020-07-24 09:55:50 UTC

Vasileios Karakasis added 4 commits July 24, 2020 00:18
@vkarak vkarak merged commit fb9f24b into reframe-hpc:master Jul 24, 2020
@rsarm rsarm deleted the feat/json-output branch March 10, 2021 08:38
