Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JSON Reporter for more detailed workunit stats #7392

Merged
merged 9 commits into from Mar 22, 2019

Conversation

Projects
None yet
5 participants
@codealchemy
Copy link
Contributor

commented Mar 15, 2019

Problem

Currently there is no single source of detailed structured data for builds. RunTracker has the ability to post certain stats to external URLs or a json on the user's machine, but these stats do not include much information of each workunit. HtmlReporter has more detail, but not structured in a way that could be easily consumed by others.

Creating such a single source for build data could be leveraged in different use cases:

  • More data that the RunTracker could post externally (that would survive a clean-all)
    • If this supported sending deltas, the build could be viewed remotely in near-real time
  • Improve other reporters by having a single json that could be polled for deltas / progress

Solution

This adds a new JsonReporter that collects the outputs from each workunit and writes them to (currently) a json file (similar to how the HtmlReporter writes outputs to an html dir).

There are several more steps to plug this in fully for reporting these stats to an external URL, though that work could be broken up over subsequent PRs, specifically:

  • How to incorporate a JsonReporter into the current stats that are reported (perhaps by a versioning flag on the run tracker - which leads to the question of how tightly coupled to the RunTracker this should be).

Additionally, things that aren't solved here, but could be done as next steps:

  • Compressing the json for a completed build, since it's conceivable that those reports could be quite large.
  • Sending deltas during the build instead of a complete file at the end.

@codealchemy codealchemy marked this pull request as ready for review Mar 15, 2019


additional_data = {
'outcome': workunit.outcome_string(workunit.outcome()),
'end_time': workunit.end_time,

This comment has been minimized.

Copy link
@codealchemy

codealchemy Mar 15, 2019

Author Contributor

Related - #7389

Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved tests/python/pants_test/reporting/test_json_reporter.py Outdated
@benjyw
Copy link
Contributor

left a comment

Great work - I had a few nits, and a suggestion for the test. See inline.

Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved tests/python/pants_test/reporting/test_json_reporter.py Outdated
@benjyw
Copy link
Contributor

left a comment

Suggested some simplification to the report file handling - see inline!

Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated

codealchemy added some commits Mar 15, 2019

@codealchemy codealchemy force-pushed the codealchemy:run-stats-update branch from aa71530 to 55bbe98 Mar 18, 2019

Wrap json output in str to mitigate py2 issue
Exception message: write() argument 1 must be unicode, not str
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated
"""Implementation of Reporter callback."""

with open(self.report_path, 'w+') as fp:
fp.write(str(json.dumps(

This comment has been minimized.

Copy link
@benjyw

benjyw Mar 19, 2019

Contributor

OK, so there's a lot going on here, hidden behind this str hack, which is a bit voodoo and not fully correct (or rather, only correct because of certain facts about how json.dumps works).

Here's the deal:

Python2 is very loose about converting character strings to byte arrays and back as needed, and it uses the interpreter's default encoding to do so.

So in Python2 you can write bytes to a file opened in text mode or a string to a file opened in binary mode, and it works.

But Python3 is stricter: you must write a string to a file opened in text mode, and bytes to a file opened in binary mode.

(best way to get a feel for all this is to try it in a REPL).

Now, json.dumps() always returns a byte array (except in some weird circumstances on Python2, which we can ignore). This is because JSON itself supports encoding all non-ASCII characters using \uXXXX sequences. So any valid JSON can always be represented in pure ASCII, so there is no need to worry about encodings.

When you cast it to a str, on Python2 this is a no-op (dumps already returned a byte array), and on Python3 this turns the result of dumps from a byte array to a character string using whatever the interpreter default encoding is, and then fp.write() immediately decodes it using the same default encoding. This is obviously wonky.

Now, in practice all these encodings and decodings are no-ops, because dumps returned ASCII, but it's still confusing and unnecessary. It's not obvious to the casual reader what is going on, or why it works out to be correct in practice.

Instead of casting to str, you can just open the file in binary mode: wb. Then you can write the result of dumps directly.

This comment has been minimized.

Copy link
@codealchemy

codealchemy Mar 19, 2019

Author Contributor

Got it - thanks for the explanation. I still ran into issues in binary mode (on Python3) using wb (got a TypeError: a bytes-like object is required, not 'str') - looking back through RunTracker there appears to be two ways of resolving that issue -

params = cls._json_dump_options(stats)
if PY2:
params = params.decode('utf-8')
try:
with open(file_name, 'w') as f:
f.write(params)

mode = 'w' if PY3 else 'wb'
safe_file_dump(stats_file, self._json_dump_options(stats), mode=mode)

Both work here, though latter appears to be what was added most recently and it feels like that might be preferred. Adding safe_file_dump for writing the json report file looks to clear up the test failures related to clean-all as well.

Show resolved Hide resolved src/python/pants/reporting/json_reporter.py Outdated

codealchemy added some commits Mar 19, 2019

A few more tweaks / updates
File handling to handle py2 / 3, rewording comment for clarity, and add property for the report path to mitigate concerns of it being overwritten
@benjyw

benjyw approved these changes Mar 19, 2019

@benjyw benjyw requested review from illicitonion and Eric-Arellano Mar 19, 2019

@Eric-Arellano
Copy link
Contributor

left a comment

Cool! Some changes requested for Py3 support.


from future.moves.itertools import zip_longest
from future.utils import PY3
from six import string_types

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Mar 19, 2019

Contributor

We don't want to use this line for two reasons:

  1. We should never use Six, as Future is a better and safer version. For example, six.StringIO behaves differently between Py2 vs Py3, which is bad. The only exception is for our checker code, as it cannot use Future due to being a universal wheel.
  2. We should never use string_types, as it contradicts Py3 goal of splitting unicode from bytes. Instead, try to clarify the API, and if we really do need to support both types, use the idiom isinstance(my_stringy_thing, (str, bytes)).
def _render_messages(self, *msg_elements):
def _message_details(element):
if isinstance(element, string_types):
element = [element]

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Mar 19, 2019

Contributor

This part is a little confusing to me, specifically because I don't know what the type of *msg_elements might be? Is it like this?

JsonPrimitive = Union[str, bytes, int, float]
JsonElement = Union[JsonPrimitive, Dict[str, JsonElement], List[JsonElement]]

We can't define this in type hints yet, but think there are other ways to express this. For example, add some docstring, or add more ifinstance checks where relevant, such as

if ininstance(element, str):
  element = [element]
elif not isinstance(element, (int, list, dict):
  raise ValueError("Invalid element.")

This comment has been minimized.

Copy link
@codealchemy

codealchemy Mar 19, 2019

Author Contributor

Got it, this was another part that came over from the HtmlReporter - and it seems how we handle message elements in do_handle_log in similar (in that they use six.string_types) across all reporters so far.

Per report.py, it looks like this should either be a tuple or a str:

def log(self, workunit, level, *msg_elements):
"""Log a message.
Each element of msg_elements is either a message string or a (message, detail) pair.
"""
with self._lock:
for reporter in self._reporters.values():
reporter.handle_log(workunit, level, *msg_elements)

In which case it seems like it would be simpler to invert that logic and make element a tuple if it's not already and drop string_types entirely? Something like:

if not isinstance(element, tuple):
  element = (element,)

Not sure if that might risk running into other issues though... thoughts?

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Mar 19, 2019

Contributor

Hm, if this is how we handle reporting in every other reporter, then I won't block and it would be outside the scope to fix here.

Instead, I added a task to #6071 for us to clean this up.

You can keep everything as is!

@illicitonion
Copy link
Contributor

left a comment

Looks great, thanks!

I'd love a quick integration test if possible, just to show that all the pieces actually work in real life :)

Thanks!

@codealchemy codealchemy force-pushed the codealchemy:run-stats-update branch from 25e5e3e to 7a58632 Mar 22, 2019

@codealchemy codealchemy force-pushed the codealchemy:run-stats-update branch from 7a58632 to 9ac0583 Mar 22, 2019

@illicitonion illicitonion merged commit b963e89 into pantsbuild:master Mar 22, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@codealchemy codealchemy deleted the codealchemy:run-stats-update branch Mar 22, 2019

jsirois added a commit that referenced this pull request Apr 2, 2019

Add more pants run information to the json reporter (#7474)
Followup to #7392 - this adds `run_info` and `pantsd_stats` to the json
report and brings the data contained in the json reporter in line with
what's returned in the current run tracker `stats` (the difference being
the shape of workunit info) in preparation for introducing a
configurable flag for selecting one or the other.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.