Machine-readable output of tests #1284

Closed
wants to merge 2 commits into
base: master
from

@hauleth

hauleth commented Sep 17, 2015

### Structure
Output is stream of lines containing JSON objects separated by new line. Each

@skade

skade Sep 17, 2015

It would make sense to agree on something like http://dataprotocols.org/ndjson/ or http://jsonlines.org/ for that or is that up to TAP-J to define?

@hauleth

hauleth Sep 17, 2015

It could also use the ASCII code \1E, the "Record Separator". This is still under consideration, but one record per line is simple enough.

@skade

skade Sep 17, 2015

I'd take '\n' or '\r\n', it's a pretty common format in the NoSQL-world (e.g. Elasticsearch bulk uses it) and JSON can be written newline-free very well.

@hauleth

hauleth Sep 17, 2015

But that is inconsistent between platforms, as Windows still uses \r\n as a line separator, and I believe the println! macro respects that. This needs reconsideration, though: I chose the newline because it was simple enough to implement in the first draft of the spec.

@skade

skade Sep 17, 2015

That inconsistency doesn't matter in that case as '{"test": "foo"}\r' is the same JSON value as '{"test": "foo"}'. Even if your IO isn't prepared for that, this is not an issue.
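
To illustrate the point about line endings, here is a minimal Rust sketch. It assumes a consumer that splits the stream with the standard library's `str::lines`, which treats both `\n` and `\r\n` as line terminators. The `split_records` helper and the sample records are made up for illustration and are not part of the RFC.

```rust
/// Split a stream of line-delimited records into individual records.
/// `str::lines` strips a trailing "\n" as well as a trailing "\r\n",
/// so output written with either convention yields the same records.
fn split_records(stream: &str) -> Vec<&str> {
    stream
        .lines()
        // Skip blank lines so stray empty lines do not become records.
        .filter(|line| !line.trim().is_empty())
        .collect()
}
```

With this helper, `"{\"test\": \"foo\"}\n"` and `"{\"test\": \"foo\"}\r\n"` both produce the single record `{"test": "foo"}`.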

# Drawbacks
This is breaking change in tooling and will require new tool that will provide
current functionality for compatibility reasons, but IMHO small pain for big gain.

@skade

skade Sep 17, 2015

Is the test output a committed API? Using the test API directly requires #[feature(test)], so it is not breaking a committed API.

@hauleth

hauleth Sep 17, 2015

No, test isn't committed. But testing is available in stable Rust, and a change here will also be visible to people who aren't compiler developers, so I marked it as a breaking change.

@masklinn

masklinn Sep 17, 2015

Why not make the machine-readable output an option and keep the human-readable output the default? Machine-readable output by default makes for a dreadful interactive experience.

FWIW pytest outputs a human-readable format to stdout and optionally a machine-readable one to a specified file.

The ability to output both human-readable and machine-readable is actually convenient in CI, you can send the human-readable output to a human-readable log and the machine-readable output to the CI's parser.

@hauleth

hauleth Sep 17, 2015

That's the point of the Drawbacks section: such a tool should be written, but IMHO it isn't the point of this RFC. I can rewrite the RFC to provide a flag that prints machine-readable output.

As for writing machine-readable output to a file and human-readable output to stdout, I think one should use Unix tools:

cargo test | tee output.log | formatter

Here formatter is a tool that turns the machine-readable input into the desired human-readable output.

text/0000-machine-readable-tests-output.md
# Summary
Replace current test output with machine-readable one and add thin compatibility
layer on top of that.

@killercup

killercup Sep 17, 2015

Member

> and add thin compatibility layer on top of that.

What do you mean by "compatibility layer"? The JSON-based output (instead of binary Rust structs)? In Drawbacks you write that new tooling will be required for compatibility but this sentence sounds to me like that tooling is actually part of the RFC.

@hauleth

hauleth Sep 17, 2015

I mean the display. I don't know how the tooling should be done. There should be an external program that parses the output and displays it the same way it is displayed now. Maybe it should be included in rustc, maybe in cargo. I have no idea, and this needs to be resolved.

@killercup

killercup Sep 17, 2015

Member

I really like this proposal. Thanks for writing this!

@alexcrichton

alexcrichton Sep 17, 2015

Member

cc @rust-lang/tools, definitely our territory!

Thanks for the RFC @hauleth! Some thoughts of mine:

  • Perhaps there could be a record for when a test starts, as well as when it finishes? For interactive displays it could perhaps be nice to see what tests are in progress.
  • We probably want to start off pretty conservative and not have many fields required by default unless necessary. For example could we drop build and rustc from the suite record?
  • For a test record, the duration field could be dropped if we had start/stop records.
  • Could this specify how the libtest library will be changed? E.g. it's good that this is preserving backwards compatibility, but we'll probably want a new flag to test binaries which generates this form of output.
  • Does this output also affect rustdoc --test? I think it probably should and we'd just want to forward flags to the "test binary" basically.

I personally always find it a good exercise to implement features like this ahead of time to get a good feeling about what's needed to implement the current functionality we have today. Along those lines it'd certainly help weed out what's needed for maintaining the same output in terms of benchmarks and tests, but it may also be a pretty significant chunk of work!

@hauleth

hauleth Sep 17, 2015

About the first point: I assumed that the suite object would mark the beginning of the tests, so an additional object would be unneeded.

About the second: it is completely possible to do so. I only added those fields to be sure that we are running a valid binary.

About the third: of course that could be a solution, but I don't see the reason for providing timestamps for the beginning and end of a test; a duration will do as well (maybe it's only me, but I don't care when my test started, I want to know how long it has been running).

About the last two: it would be recommended to do so. It could simplify calls.

About implementing that: I had that in mind, but so far I haven't had enough time to dig into rustc to find out where and what to change, so I took what I loved in Ruby and powered it up to work nicely with Rust and to provide the functionality I need in my work. So it is very opinionated.

@jgraham

jgraham Sep 17, 2015

For comparison we designed a similar format for logging browser test results at Mozilla. You probably don't need everything in that format, and may want some different things, but it's another point in the possible design space to consider. One of the driving considerations there was that when you have tests from a third party source (e.g. because you are implementing some specification) there can be tests that you expect to fail but which you don't wish to edit. If that isn't a case you care about here you are likely to make different design choices.

@cmr

cmr Sep 17, 2015

Member

In general I'm a big fan of anything TAP based, and this JSON encoding removes most of the problems with TAP. Human-readable output should definitely be the default, though. Perhaps an environment variable such as RUST_MACHINE_OUTPUT or a universally understood command-line flag to enable machine output.

These tools are primarily for people, not machines. Changing Cargo to consume the machine-readable format is not an issue.

Looking forward to seeing something like this develop!

@diwic

diwic Sep 18, 2015

cargo test should continue to have human readable format, and machine output could be available with cargo test --json or so.

@erickt

erickt Sep 19, 2015

This is a great start! I think it could also be useful to capture the rustc command line arguments so that we can observe optimization levels, enabled feature flags, etc.

For reference, a bunch of other test libraries produce the xUnit XML File Format.

@tomjakubowski

tomjakubowski Sep 19, 2015

Contributor

> In general I'm a big fan of anything TAP based, and this JSON encoding removes most of the problems with TAP. Human-readable output should definitely be the default, though. Perhaps an environment variable such as RUST_MACHINE_OUTPUT or a universally understood command-line flag to enable machine output.

I don't see why it should be an environment variable; environment variables are much harder to reason about than command line flags (it's a bit like dynamic scope vs. lexical scope), and it's not like we need to adjust some behavior deep within a stack of executing programs.

A flag on the compiled test runner (--format=tap, bikeshedding welcome) combined with a flag on cargo test itself seems like the right fit to me.
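
As a sketch of how such a flag could be handled, the following Rust snippet keeps human-readable output as the default and switches format only when `--format=<name>` is passed. The flag name comes from the suggestion above; the `OutputFormat` enum, the `parse_format` function, and the accepted values `human`, `json`, and `tap` are assumptions for illustration, not an existing libtest or cargo interface.

```rust
/// Output formats a test runner might support (names are illustrative).
#[derive(Debug, PartialEq)]
enum OutputFormat {
    Human,
    Json,
    Tap,
}

/// Pick the output format from command-line arguments.
/// Human-readable output stays the default; `--format=<name>` opts in to
/// another format. Unknown names are reported as errors.
fn parse_format<I, S>(args: I) -> Result<OutputFormat, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut format = OutputFormat::Human;
    for arg in args {
        if let Some(name) = arg.as_ref().strip_prefix("--format=") {
            format = match name {
                "human" => OutputFormat::Human,
                "json" => OutputFormat::Json,
                "tap" => OutputFormat::Tap,
                other => return Err(format!("unknown output format: {}", other)),
            };
        }
    }
    Ok(format)
}
```

A test binary could call `parse_format(std::env::args().skip(1))`, and `cargo test` would only need to forward the flag unchanged.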

@alexcrichton

alexcrichton Sep 21, 2015

Member

@hauleth

> About the first point: I assumed that the suite object would mark the beginning of the tests, so an additional object would be unneeded.

Ah yeah I meant in addition to the suite object there'd also be an object for "this test has started to run". That gives consumers a notion of what tests are currently running, e.g. those you've seen start records for and haven't seen end records for. Additionally if a consumer of the output wants to provide a progress bar this would be useful information perhaps.

> About the third: of course that could be a solution, but I don't see the reason for providing timestamps for the beginning and end of a test; a duration will do as well (maybe it's only me, but I don't care when my test started, I want to know how long it has been running).

Ah just in the sense that we don't have a lot of timestamp support in the standard library just yet, so it'd be difficult to implement this in-tree (e.g. whereas it'd be pretty easy to do it out-of-tree), so for ease of implementation we may want to avoid this for now.

> About implementing that: I had that in mind, but so far I haven't had enough time to dig into rustc to find out where and what to change, so I took what I loved in Ruby and powered it up to work nicely with Rust and to provide the functionality I need in my work. So it is very opinionated.

Ah yeah no worries! You're not on the hook to implement this or anything like that, just some musings from me!
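
The start and finish records discussed above could be consumed roughly as in the following Rust sketch. The `Event` enum and the `Progress` struct are hypothetical and do not reflect the RFC's record schema; the sketch only shows how a consumer can derive the set of tests that have started but not yet finished.

```rust
use std::collections::BTreeSet;

/// Events a test runner could emit (hypothetical; not the RFC's schema).
enum Event {
    Started { name: String },
    Finished { name: String, passed: bool },
}

/// Tracks which tests have started but not yet finished.
#[derive(Default)]
struct Progress {
    running: BTreeSet<String>,
    finished: usize,
    failed: usize,
}

impl Progress {
    /// Update the state with one event from the stream.
    fn record(&mut self, event: Event) {
        match event {
            Event::Started { name } => {
                self.running.insert(name);
            }
            Event::Finished { name, passed } => {
                self.running.remove(&name);
                self.finished += 1;
                if !passed {
                    self.failed += 1;
                }
            }
        }
    }

    /// Names of the tests currently in progress, in sorted order.
    fn in_progress(&self) -> Vec<&str> {
        self.running.iter().map(String::as_str).collect()
    }
}
```

A progress bar would additionally need the total number of tests, which the suite record could supply.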

@withoutboats withoutboats referenced this pull request Sep 26, 2015

Closed

i10n #1292

@yoshuawuyts

yoshuawuyts Sep 29, 2015

What are the limitations of TAP that warrant the invention of another format? To my understanding this format would be TAP-J based, meaning it's something new.

Though it is slightly inconvenient to parse, I think that using TAP yields more benefits in terms of tooling, adoption, familiarity and interoperability than creating something new.

It's quite annoying if every language comes with its own custom way of formatting test output. E.g. I don't think Rust should fall into the same trap Golang fell into with their test output:

$ go test -v
=== RUN TestPrintSomething
Say hi
--- PASS: TestPrintSomething (0.00 seconds)
    v_test.go:10: Say bye
PASS
ok      so/v    0.002s

@hauleth

hauleth Sep 29, 2015

@yoshuawuyts the only problem with TAP is that it is a primitive format that doesn't provide a uniform way to convey a lot of things (like the type of test, performance, etc.). RusTAP was created to fit Rust's testing framework, which has some quirks that the original TAP-J doesn't cover (i.e. benches).

@IanConnolly

IanConnolly Oct 27, 2015

I'd be happy to help with moving this along, as I've been wanting this myself recently.

@hauleth hauleth referenced this pull request Oct 28, 2015

Open

Publish results somewhere #7

@brson

brson Nov 4, 2015

Contributor

My quick comments:

  • Never heard of TAP-J. Need to consider it.
  • Doesn't consider cargo integration
    • people do testing through cargo
    • cargo has other types of tests and I want to be able to analyze their results
  • Printing to stdout has problems
    • cargo also prints to stdout
    • test cases can print to stdout when the test runner is not capturing

@hauleth

hauleth Jan 8, 2016

Closing as I want to rewrite it for TAP13 protocol (wider usage, and already existing tools).

@hauleth hauleth closed this Jan 8, 2016

@tj

tj Jan 9, 2016

IMO the problem with Go's output is that you can't replace the output generation. I know the Go team has an issue open for considering JSON output, but to me it would be so much nicer if you could just:

import (
  _ "my/fancy/test/output" // register a replacement hook
)

Then you don't have to fight about JSON, TAP, etc, just use whatever you like. Maybe Rust could go that route instead of a base format?
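
In Rust, the replaceable-output idea could look roughly like the sketch below: a trait that the runner calls for every finished test, a built-in human-readable implementation used as the fallback, and an alternative implementation that can be swapped in. All names here (`Outcome`, `Reporter`, `HumanReporter`, `TapReporter`, `render`) are hypothetical and are not libtest's actual types.

```rust
/// Outcome of a single test (illustrative, not libtest's real type).
#[derive(Clone, Copy, PartialEq)]
enum Outcome {
    Passed,
    Failed,
}

/// A pluggable reporter: the runner calls it once per finished test.
trait Reporter {
    fn report(&mut self, name: &str, outcome: Outcome) -> String;
}

/// Default human-readable reporter, used when no replacement is supplied.
struct HumanReporter;

impl Reporter for HumanReporter {
    fn report(&mut self, name: &str, outcome: Outcome) -> String {
        let status = if outcome == Outcome::Passed { "ok" } else { "FAILED" };
        format!("test {} ... {}", name, status)
    }
}

/// A replacement reporter that emits TAP-style result lines.
#[derive(Default)]
struct TapReporter {
    count: usize,
}

impl Reporter for TapReporter {
    fn report(&mut self, name: &str, outcome: Outcome) -> String {
        self.count += 1;
        let status = if outcome == Outcome::Passed { "ok" } else { "not ok" };
        format!("{} {} - {}", status, self.count, name)
    }
}

/// Render results with the supplied reporter, falling back to the
/// human-readable default when none is given.
fn render(results: &[(&str, Outcome)], custom: Option<Box<dyn Reporter>>) -> Vec<String> {
    let mut reporter: Box<dyn Reporter> = match custom {
        Some(reporter) => reporter,
        None => Box::new(HumanReporter),
    };
    results
        .iter()
        .map(|(name, outcome)| reporter.report(name, *outcome))
        .collect()
}
```

Passing `None` keeps today's style of output, while passing a boxed `TapReporter` switches formats without touching the runner.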

@andrew-d

andrew-d Jan 9, 2016

👍 for that - being able to write a "test output plugin" (or whatever it ends up being called) seems like the best way to handle this.

@jonastepe

jonastepe Jan 9, 2016

Then there still has to be a sensible default or fallback for those that do not provide such a plugin.

@jonastepe

jonastepe Jan 9, 2016

However, I am also in favor of this plugin approach.

@hauleth

hauleth Jan 9, 2016

I am in the process of rewriting libtest to allow writing reporters in a sensible way. About TAP: TAP version 13 allows a YAML test description after each test, which IMHO seems a good approach. As YAML is a superset of JSON, it solves both problems.
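
For readers unfamiliar with TAP version 13, the following Rust sketch renders a list of results with an indented YAML diagnostic block, delimited by `---` and `...`, after a test line. The `TestResult` struct and the `to_tap13` function are illustrative assumptions and not part of libtest; values are written as plain YAML scalars, so the sketch assumes they need no quoting.

```rust
/// One finished test. `detail` carries optional key/value diagnostics.
struct TestResult<'a> {
    name: &'a str,
    passed: bool,
    detail: Vec<(&'a str, &'a str)>,
}

/// Render results as a TAP version 13 document. Diagnostics go into an
/// indented YAML block delimited by `---` and `...` after the test line.
fn to_tap13(results: &[TestResult<'_>]) -> String {
    let mut out = String::from("TAP version 13\n");
    // The plan line announces how many tests follow.
    out.push_str(&format!("1..{}\n", results.len()));
    for (index, result) in results.iter().enumerate() {
        let status = if result.passed { "ok" } else { "not ok" };
        out.push_str(&format!("{} {} - {}\n", status, index + 1, result.name));
        if !result.detail.is_empty() {
            out.push_str("  ---\n");
            for (key, value) in &result.detail {
                // Assumption: keys and values are simple scalars that are
                // valid YAML without quoting or escaping.
                out.push_str(&format!("  {}: {}\n", key, value));
            }
            out.push_str("  ...\n");
        }
    }
    out
}
```

Benchmark figures or failure messages could be carried in the `detail` pairs, which is the kind of structured data plain TAP result lines have no place for.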

Łukasz Jan Niemier

@kamalmarhubi

kamalmarhubi Jan 24, 2016

Contributor

@hauleth

> Closing as I want to rewrite it for TAP13 protocol (wider usage, and already existing tools).

Do you need any help with this? I'd very much like testing to get better in Rust!

@milgner

milgner Oct 9, 2016

Is this still being worked on? I think Rust would greatly benefit from this as it makes integration of tests into CI systems much more elegant. TAP 13 seems like a good format, too, even if it doesn't explicitly contain a differentiation between regular tests and benchmarks. But I guess a # benchmark directive could be added to address this?

@denniscollective

denniscollective Sep 3, 2017

Can you reopen this?
