Include API that will store target info in run_tracker #4561

Merged
merged 16 commits into from May 13, 2017

Conversation

Projects
None yet
4 participants
@dotordogh
Contributor

dotordogh commented May 7, 2017

Problem

We would like to collect stats about the targets and tests being run so we would like to add a way to track this info.

Solution

The change included some refactoring to ease code re-use, a method to parse all the test information from the xml files, and two additional methods to collect target info and test info.

Result

The changes shouldn't affect the end-user behavior. This change is for future development.

@stuhood

Thanks Dorothy!

src/python/pants/goal/run_tracker.py
+ else:
+ self._target_data.update({target: {scope: {key: val}}})
+
+ def report_test_info(self, scope, target, key, test_info):

This comment has been minimized.

@stuhood

stuhood May 8, 2017

Member

I don't think RunTracker should gain awareness of tests... given that this method is called from junit and pytest, it seems like maybe pants.task.testrunner_task_mixin.TestRunnerTaskMixin would be a better place for it?

@stuhood

stuhood May 8, 2017

Member

I don't think RunTracker should gain awareness of tests... given that this method is called from junit and pytest, it seems like maybe pants.task.testrunner_task_mixin.TestRunnerTaskMixin would be a better place for it?

+ def _get_test_info_from_junitxml(self, junitxml):
+ tests_in_path = []
+
+ SUCCESS = 0

This comment has been minimized.

@stuhood

stuhood May 8, 2017

Member

These magic values should be strings instead, and should probably move up to TestRunnerTaskMixin.

SUCCESS = 'success'
FAILURE = 'failure'
ERROR = 'error'

The loss in readability of magic values in the JSON outweighs any possible efficiency gains of using integers. If need be, internal systems can map from string constants to custom values.

@stuhood

stuhood May 8, 2017

Member

These magic values should be strings instead, and should probably move up to TestRunnerTaskMixin.

SUCCESS = 'success'
FAILURE = 'failure'
ERROR = 'error'

The loss in readability of magic values in the JSON outweighs any possible efficiency gains of using integers. If need be, internal systems can map from string constants to custom values.

+ test_error = testcase.getElementsByTagName('error')
+ test_fail = testcase.getElementsByTagName('failure')
+ test_skip = testcase.getElementsByTagName('skipped')
+

This comment has been minimized.

@stuhood

stuhood May 8, 2017

Member

There is a fair amount of duplication between this and the python parsing.

My hope would be that parsing out the "lowest common denominator" of data, and skipping language-specific information (like "classname") would allow for one implementation of this in TestRunnerTaskMixin, with custom-code-per-language only coming in later.

Does that sound feasible?

@stuhood

stuhood May 8, 2017

Member

There is a fair amount of duplication between this and the python parsing.

My hope would be that parsing out the "lowest common denominator" of data, and skipping language-specific information (like "classname") would allow for one implementation of this in TestRunnerTaskMixin, with custom-code-per-language only coming in later.

Does that sound feasible?

This comment has been minimized.

@dotordogh

dotordogh May 9, 2017

Contributor

There is also the option of adding all the xml parsing to a util file since there is also some overlap with parsing the failed tests' targets for pytests and junit tests

@dotordogh

dotordogh May 9, 2017

Contributor

There is also the option of adding all the xml parsing to a util file since there is also some overlap with parsing the failed tests' targets for pytests and junit tests

src/python/pants/goal/run_tracker.py
+ new_key = [str(key_item) for key_item in key]
+ key = '#'.join(new_key)
+
+ if target in self._target_data:

This comment has been minimized.

@stuhood

stuhood May 8, 2017

Member

A somewhat less error prone pattern for "create if not exists" is to do:

target_data = self._target_data.get(target, None)
if target_data is None:
  self._target_data[target] = target_data = {}
target_data[scope] ... # etc

Also possible to use setdefault or DefaultDict... DefaultDict might be the best option here.

@stuhood

stuhood May 8, 2017

Member

A somewhat less error prone pattern for "create if not exists" is to do:

target_data = self._target_data.get(target, None)
if target_data is None:
  self._target_data[target] = target_data = {}
target_data[scope] ... # etc

Also possible to use setdefault or DefaultDict... DefaultDict might be the best option here.

src/python/pants/goal/run_tracker.py
+ # Stringify list of keys to make one key
+ if not isinstance(key, basestring):
+ new_key = [str(key_item) for key_item in key]
+ key = '#'.join(new_key)

This comment has been minimized.

@stuhood

stuhood May 8, 2017

Member

This should trigger nesting rather than suffixing like this. Will likely need to do it recursively.

@stuhood

stuhood May 8, 2017

Member

This should trigger nesting rather than suffixing like this. Will likely need to do it recursively.

@dotordogh dotordogh changed the title from [WIP] Include API that will store target info in run_tracker to Include API that will store target info in run_tracker May 9, 2017

@stuhood stuhood requested review from benjyw and mateor May 9, 2017

@stuhood

stuhood approved these changes May 9, 2017

Awesome, thanks Dorothy! A few comments.

+ Add test information to target information
+ :param string scope: The running scope
+ :param Target target: The target that we want to store the test info under
+ :param list of strings or string key: The key for the info being stored

This comment has been minimized.

@stuhood

stuhood May 9, 2017

Member

In the context of tests, I think explicitly changing the key parameter to a string test_name is fine... the key-list bit is primarily important for generality in the RunTracker API, but less so here.

@stuhood

stuhood May 9, 2017

Member

In the context of tests, I think explicitly changing the key parameter to a string test_name is fine... the key-list bit is primarily important for generality in the RunTracker API, but less so here.

+ self._xml_path = xml_path
+ self._cause = cause
+
+ @property

This comment has been minimized.

@stuhood

stuhood May 9, 2017

Member

I don't think it's necessary to defensively make these @propertys... if you name the underlying fields self.xml_path and self.cause, it has the same effect, and you can convert it into an @property later if you need to.

@stuhood

stuhood May 9, 2017

Member

I don't think it's necessary to defensively make these @propertys... if you name the underlying fields self.xml_path and self.cause, it has the same effect, and you can convert it into an @property later if you need to.

src/python/pants/goal/run_tracker.py
+
+ def report_target_info(self, scope, target, key, val):
+ """
+ Add target information to run_info under target_data

This comment has been minimized.

@stuhood

stuhood May 9, 2017

Member

Style nit: this one-liner "slug" should be on the previous line, end in a period, and have a blank line after it.

@stuhood

stuhood May 9, 2017

Member

Style nit: this one-liner "slug" should be on the previous line, end in a period, and have a blank line after it.

This comment has been minimized.

@stuhood

stuhood May 9, 2017

Member

Oh, also: can you mark this method ":API: public" to signal that it is a pants public API? See http://www.pantsbuild.org/deprecation_policy.html

@stuhood

stuhood May 9, 2017

Member

Oh, also: can you mark this method ":API: public" to signal that it is a pants public API? See http://www.pantsbuild.org/deprecation_policy.html

src/python/pants/goal/run_tracker.py
+ if target_data is None:
+ self._target_data.update({target: {scope: val_to_store}})
+ else:
+ scope_data = self._target_data[target].get(scope, None)

This comment has been minimized.

@stuhood

stuhood May 9, 2017

Member

This reduces a bit further. Concretely, in this case:

scope_data = self._target_data[target].get(scope, None)
if scope_data is None:
  self._target_data[target][scope] = scope_data = {}
scope_data.update(val_to_store)
@stuhood

stuhood May 9, 2017

Member

This reduces a bit further. Concretely, in this case:

scope_data = self._target_data[target].get(scope, None)
if scope_data is None:
  self._target_data[target][scope] = scope_data = {}
scope_data.update(val_to_store)

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

This has different behavior than the current code - it won't stomp on all previous data from all scopes for a target, as the current code will, but it will stomp on any data for the (target, scope) pair that shares a common key prefix. Is that what we want?

@benjyw

benjyw May 12, 2017

Contributor

This has different behavior than the current code - it won't stomp on all previous data from all scopes for a target, as the current code will, but it will stomp on any data for the (target, scope) pair that shares a common key prefix. Is that what we want?

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

Also, scope_data = self._target_data[target] will crash if there is no existing data for target, so self._target_data would have to be a defaultdict (which is probably a good idea).

@benjyw

benjyw May 12, 2017

Contributor

Also, scope_data = self._target_data[target] will crash if there is no existing data for target, so self._target_data would have to be a defaultdict (which is probably a good idea).

src/python/pants/goal/run_tracker.py
+ :param list of keys key: The key for the info being stored
+ :param dict or string val: The value of the info being stored
+ """
+ def insert_key_value(keys, val_item, index):

This comment has been minimized.

@stuhood

stuhood May 9, 2017

Member

A comment explaining this function would be good...

# Recursively constructs a nested dict with keys made up of the components of the given key list.

...or maybe that explanation should go in the docstring for report_target_info?

@stuhood

stuhood May 9, 2017

Member

A comment explaining this function would be good...

# Recursively constructs a nested dict with keys made up of the components of the given key list.

...or maybe that explanation should go in the docstring for report_target_info?

+ if _JUNIT_XML_MATCHER.match(name):
+ parse_junit_xml_file(os.path.join(junit_xml_path, name))
+ else:
+ parse_junit_xml_file(junit_xml_path)

This comment has been minimized.

@jennazz

jennazz May 9, 2017

You don't need to check if it matches the _JUNIT_XML_MATCHER here?

@jennazz

jennazz May 9, 2017

You don't need to check if it matches the _JUNIT_XML_MATCHER here?

src/python/pants/goal/run_tracker.py
+ Add target information to run_info under target_data
+ :param string scope: The running scope
+ :param string target: The target that we want to store info for
+ :param string or list of items to make into a key key: The key for the info being stored

This comment has been minimized.

@jennazz

jennazz May 9, 2017

'key' is typed twice here.

@jennazz

jennazz May 9, 2017

'key' is typed twice here.

+ if _XML_MATCHER.match(name):
+ parse_xml_file(os.path.join(xml_path, name))
+ else:
+ parse_xml_file(xml_path)

This comment has been minimized.

@jennazz

jennazz May 9, 2017

You don't need to check if it matches _XML_MATCHER here?

@jennazz

jennazz May 9, 2017

You don't need to check if it matches _XML_MATCHER here?

This comment has been minimized.

@dotordogh

dotordogh May 10, 2017

Contributor

I think we could but I don't think it is necessary since the path is generated. @stuhood, @benjyw, and @mateor, thoughts? I could add it to junit xml parser and pytest runner.

@dotordogh

dotordogh May 10, 2017

Contributor

I think we could but I don't think it is necessary since the path is generated. @stuhood, @benjyw, and @mateor, thoughts? I could add it to junit xml parser and pytest runner.

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

Correct, I don't think you need to match it. The matcher is useful when given only a dir, because how else would you know which files in the dir are relevant? But if someone gives you a file, they already know what they want you to do with it, no matter what the name is.

@benjyw

benjyw May 12, 2017

Contributor

Correct, I don't think you need to match it. The matcher is useful when given only a dir, because how else would you know which files in the dir are relevant? But if someone gives you a file, they already know what they want you to do with it, no matter what the name is.

@stuhood stuhood added this to the 1.3.0 milestone May 10, 2017

@benjyw

Hi @dotordogh - the code looks good, as far as I can tell. What is lacking is appropriate documentation. If I'm having trouble figuring out what all this does, and how it works, then the casual reader will too. Admittedly, I'm not putting a lot of energy into figuring it out, but then I shouldn't really have to...

Once this is appropriately documented, I'll take another look at the logic itself. :)

src/python/pants/goal/run_tracker.py
+ def report_target_info(self, scope, target, key, val):
+ """Add target information to run_info under target_data.
+
+ Will Recursively construct a nested dict with the keys provided

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

Nit: end the sentence with a period. Also, you mention "keys provided", but the argument is named "key", singular. Please clarify.

@benjyw

benjyw May 10, 2017

Contributor

Nit: end the sentence with a period. Also, you mention "keys provided", but the argument is named "key", singular. Please clarify.

src/python/pants/goal/run_tracker.py
@@ -144,6 +144,8 @@ def __init__(self, *args, **kwargs):
self._aborted = False
+ self._target_data = {}

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

Document what the keys and values in this dict are expected to be?

@benjyw

benjyw May 10, 2017

Contributor

Document what the keys and values in this dict are expected to be?

src/python/pants/base/run_info.py
@@ -66,7 +67,10 @@ def add_infos(self, *keyvals, **kwargs):
if k in self._info:
raise ValueError('info key "{}" already exists with value {}. '
'Cannot add it again with value {}.'.format(k, self._info[k], v))
- self._info[k] = v
+ if k == 'target_data':

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

Might be nicer to keep RunInfo simple, and move this to the caller, which already knows when to do this eval, instead of having to know about this magic key string here?

@benjyw

benjyw May 10, 2017

Contributor

Might be nicer to keep RunInfo simple, and move this to the caller, which already knows when to do this eval, instead of having to know about this magic key string here?

src/python/pants/goal/run_tracker.py
+
+ Will Recursively construct a nested dict with the keys provided
+
+ :param string scope: The running scope

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

Nit: "running scope" isn't really a term of art. How about "The scope we're reporting information for."?

@benjyw

benjyw May 10, 2017

Contributor

Nit: "running scope" isn't really a term of art. How about "The scope we're reporting information for."?

src/python/pants/goal/run_tracker.py
+
+ :API: public
+ """
+ def insert_key_value(keys, val_item, index):

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

This is quite confusing. It's generally not clear what this helper function does. In particular, the first param is called "keys", and you index into it, implying that it's a list, but below you pass into it a value named "key", singular. Copious comments explaining what the types of the various arguments are, and what this function does with them, and why, would be very helpful...

@benjyw

benjyw May 10, 2017

Contributor

This is quite confusing. It's generally not clear what this helper function does. In particular, the first param is called "keys", and you index into it, implying that it's a list, but below you pass into it a value named "key", singular. Copious comments explaining what the types of the various arguments are, and what this function does with them, and why, would be very helpful...

+
+ @staticmethod
+ def parse_test_info(xml_path, error_handler, additional_testcase_attributes=None):
+ """Parses the junit file for info needed about each test

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

Nit: Period.

@benjyw

benjyw May 10, 2017

Contributor

Nit: Period.

@@ -63,6 +66,88 @@ def execute(self):
all_targets = self._get_targets()
self._execute(all_targets)
+ def report_test_info(self, scope, target, test_name, test_info):
+ """Add test information to target information

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

Nit: Period.

@benjyw

benjyw May 10, 2017

Contributor

Nit: Period.

@@ -55,6 +55,9 @@ def add_misses(self, cache_name, targets, causes): pass
artifact_cache_stats = DummyArtifactCacheStats()
+ def report_target_info(self, scope, target, key, val): pass

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

?

@benjyw

benjyw May 10, 2017

Contributor

?

This comment has been minimized.

@stuhood

stuhood May 10, 2017

Member

These are the mocked out versions of these methods for the test context.

@stuhood

stuhood May 10, 2017

Member

These are the mocked out versions of these methods for the test context.

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

👍 But s/key/keys/ for consistency.

@benjyw

benjyw May 12, 2017

Contributor

👍 But s/key/keys/ for consistency.

+ self.context.run_tracker.report_target_info(scope, address, [test_name], test_info)
+
+ @staticmethod
+ def parse_test_info(xml_path, error_handler, additional_testcase_attributes=None):

This comment has been minimized.

@benjyw

benjyw May 10, 2017

Contributor

Don't we already have two functions to parse the junit-style xml report (one for junit and one for pytest)? Can they be merged? I'm not sure we want three of them...

@benjyw

benjyw May 10, 2017

Contributor

Don't we already have two functions to parse the junit-style xml report (one for junit and one for pytest)? Can they be merged? I'm not sure we want three of them...

This comment has been minimized.

@dotordogh

dotordogh May 10, 2017

Contributor

I believe the two you are thinking of parse failed targets for each failed test and not the actual test information. See here and here.

@dotordogh

dotordogh May 10, 2017

Contributor

I believe the two you are thinking of parse failed targets for each failed test and not the actual test information. See here and here.

@stuhood

This comment has been minimized.

Show comment
Hide comment
@stuhood

stuhood May 10, 2017

Member

Thanks @benjyw! Regarding documentation: @dotordogh : I agree that expanding the method docs for this would be good. You might also consider adding a paragraph here, which is automatically rendered here.

Member

stuhood commented May 10, 2017

Thanks @benjyw! Regarding documentation: @dotordogh : I agree that expanding the method docs for this would be good. You might also consider adding a paragraph here, which is automatically rendered here.

@benjyw

Thanks for the comprehensive documentation! Things are much clearer now. Still have a few comments.

src/docs/dev_tasks.md
@@ -265,3 +265,15 @@ Pants allows more fine grained cache management, although it then becomes the re
if self.artifact_cache_writes_enabled():
self.update_artifact_cache((vt, [output_location]))
+Recording Target Specific Data
+------------------------------
+If you would like to track target information like the targets being run, their

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

s/like/, such as/

These are examples, not analogies.

@benjyw

benjyw May 12, 2017

Contributor

s/like/, such as/

These are examples, not analogies.

src/docs/dev_tasks.md
+Recording Target Specific Data
+------------------------------
+If you would like to track target information like the targets being run, their
+run times, or some other target-specific piece of data, `run_tracker` has this

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

s/has this ability by calling/provides this ability via/

In the original phrasing it sounds like the run_tracker calls the method, not that it's a method on the run_tracker.

@benjyw

benjyw May 12, 2017

Contributor

s/has this ability by calling/provides this ability via/

In the original phrasing it sounds like the run_tracker calls the method, not that it's a method on the run_tracker.

@@ -528,6 +538,17 @@ def _do_run_tests(self, targets):
safe_mkdir(external_junit_xml_dir)
shutil.copy(junitxml_path, external_junit_xml_dir)
failed_targets = self._get_failed_targets_from_junitxml(junitxml_path, targets)
+
+ def parse_error_handler(parse_error):
+ # Simple error handler to pass to xml parsing function

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

nit: Period.

@benjyw

benjyw May 12, 2017

Contributor

nit: Period.

src/python/pants/goal/run_tracker.py
+ """ Recursively constructs a nested dictionary with the keys pointing to the value.
+
+ :param list of string all_keys: The keys that will be recursively nested.
+ :param dict or string val_item: The value of the information being stored.

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

"item" means something specific in python dicts: a (key, value) pair (e.g., dict.items() is a list of such pairs). So probably best to just call this val.

@benjyw

benjyw May 12, 2017

Contributor

"item" means something specific in python dicts: a (key, value) pair (e.g., dict.items() is a list of such pairs). So probably best to just call this val.

+ if _XML_MATCHER.match(name):
+ parse_xml_file(os.path.join(xml_path, name))
+ else:
+ parse_xml_file(xml_path)

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

Correct, I don't think you need to match it. The matcher is useful when given only a dir, because how else would you know which files in the dir are relevant? But if someone gives you a file, they already know what they want you to do with it, no matter what the name is.

@benjyw

benjyw May 12, 2017

Contributor

Correct, I don't think you need to match it. The matcher is useful when given only a dir, because how else would you know which files in the dir are relevant? But if someone gives you a file, they already know what they want you to do with it, no matter what the name is.

@@ -55,6 +55,9 @@ def add_misses(self, cache_name, targets, causes): pass
artifact_cache_stats = DummyArtifactCacheStats()
+ def report_target_info(self, scope, target, key, val): pass

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

👍 But s/key/keys/ for consistency.

@benjyw

benjyw May 12, 2017

Contributor

👍 But s/key/keys/ for consistency.

src/python/pants/goal/run_tracker.py
+ def report_target_info(self, scope, target, keys, val):
+ """Add target information to run_info under target_data.
+
+ Will Recursively construct a nested dict with the keys provided.

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

Am I right in observing that in practice, keys is always a singleton? If so, do we need the complexity of the recursion?

@benjyw

benjyw May 12, 2017

Contributor

Am I right in observing that in practice, keys is always a singleton? If so, do we need the complexity of the recursion?

This comment has been minimized.

@stuhood

stuhood May 12, 2017

Member

The idea behind this API is that tasks might need additional structure, but will be making multiple calls to this method, rather than batching things up like the test code is doing.

So rather than calling:

report_target_info(scope, target, ['this'], {'is': {'a': {'deeply': {'namespaced': 'value'}}}})

...they'd call:

report_target_info(scope, target, ['this', 'is', 'a', 'deeply', 'namespaced'], 'value')
@stuhood

stuhood May 12, 2017

Member

The idea behind this API is that tasks might need additional structure, but will be making multiple calls to this method, rather than batching things up like the test code is doing.

So rather than calling:

report_target_info(scope, target, ['this'], {'is': {'a': {'deeply': {'namespaced': 'value'}}}})

...they'd call:

report_target_info(scope, target, ['this', 'is', 'a', 'deeply', 'namespaced'], 'value')

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

OK, but we have no current use for this, and this is not as straightforward as it appears. E.g., if you call:

report_target_info(scope, target, ['this', 'is', 'a', 'deeply', 'namespaced'], 'value1')
and then
report_target_info(scope, target, ['this', 'is', 'another', 'deeply', 'namespaced'], 'value2')

You might reasonably expect the value for (scope, target) to be:

{ 'this': { 'is' : { 
  'a': {'deeply': {'namespaced': 'value1' }},
  'another': {'deeply': {'namespaced': 'value2' }}
}}}

But actually, if I understand the current code correctly, it would be:

{ 'this': { 'is' : { 
  'another: {'deeply': {'namespaced': 'value2' }}
}}}

So the semantics are unclear, and it might be best to just avoid the problem entirely, rather than anticipate usage that hasn't actually been needed yet?

@benjyw

benjyw May 12, 2017

Contributor

OK, but we have no current use for this, and this is not as straightforward as it appears. E.g., if you call:

report_target_info(scope, target, ['this', 'is', 'a', 'deeply', 'namespaced'], 'value1')
and then
report_target_info(scope, target, ['this', 'is', 'another', 'deeply', 'namespaced'], 'value2')

You might reasonably expect the value for (scope, target) to be:

{ 'this': { 'is' : { 
  'a': {'deeply': {'namespaced': 'value1' }},
  'another': {'deeply': {'namespaced': 'value2' }}
}}}

But actually, if I understand the current code correctly, it would be:

{ 'this': { 'is' : { 
  'another: {'deeply': {'namespaced': 'value2' }}
}}}

So the semantics are unclear, and it might be best to just avoid the problem entirely, rather than anticipate usage that hasn't actually been needed yet?

This comment has been minimized.

@stuhood

stuhood May 12, 2017

Member

Yea, that looks like a bug. My primary concern is that this usecase works:

report_target_info(scope, target, ['this', 'is', 'a', 'deeply', 'namespaced'], 'value1')
report_target_info(scope, target, ['this', 'is', 'another', 'deeply', 'namespaced'], 'value2')

If need be, removing the ability to also use:

report_target_info(scope, target, ['this'], {'is': {'a': {'deeply': {'namespaced': 'value'}}}})

...would be preferable to me.

@stuhood

stuhood May 12, 2017

Member

Yea, that looks like a bug. My primary concern is that this usecase works:

report_target_info(scope, target, ['this', 'is', 'a', 'deeply', 'namespaced'], 'value1')
report_target_info(scope, target, ['this', 'is', 'another', 'deeply', 'namespaced'], 'value2')

If need be, removing the ability to also use:

report_target_info(scope, target, ['this'], {'is': {'a': {'deeply': {'namespaced': 'value'}}}})

...would be preferable to me.

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

OK, but, to verify that we're both on the same page, the use case you mention as your primary concern currently doesn't work. So it must be fixed (and appropriately tested), or we could just punt on the "list of keys" functionality for this release and add it in a separate change without the pressure of a release. Since your current use only needs a singleton, you could just pass in a single string key, and then the future change that adds the nesting functionality can test for string vs. list key and act accordingly.

@benjyw

benjyw May 12, 2017

Contributor

OK, but, to verify that we're both on the same page, the use case you mention as your primary concern currently doesn't work. So it must be fixed (and appropriately tested), or we could just punt on the "list of keys" functionality for this release and add it in a separate change without the pressure of a release. Since your current use only needs a singleton, you could just pass in a single string key, and then the future change that adds the nesting functionality can test for string vs. list key and act accordingly.

This comment has been minimized.

@stuhood

stuhood May 12, 2017

Member

So it must be fixed (and appropriately tested)

Correct.

@stuhood

stuhood May 12, 2017

Member

So it must be fixed (and appropriately tested)

Correct.

src/python/pants/goal/run_tracker.py
+
+ :API: public
+ """
+ def create_dict_with_nested_keys_and_val(all_keys, val_item, index):

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

I think this doesn't need the all_keys param at all? It can access keys directly from its closure.

@benjyw

benjyw May 12, 2017

Contributor

I think this doesn't need the all_keys param at all? It can access keys directly from its closure.

@benjyw

Doc and implementation changes look good, but I have some remaining comments about the behavior of updates of this nested dict data structure. There seems to be some confusion. E.g., unless I'm drastically misreading things, the current code is not capable of producing the multi-scope example you give in the comment!

This really needs a unit test, to prove that its behavior is as expected (and to establish what those expectations are...)

src/python/pants/goal/run_tracker.py
+ return {keys[index]: value}
+
+ val_to_store = create_dict_with_nested_keys_and_val(val, len(keys) - 1)
+ target_data = self._target_data.get(target, None)

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

It's a bit confusing that a value from a dict called "_target_data" is itself called "target_data".

Maybe rename _target_data to _target_to_data ? Or adopt Stu's more concise suggestion below, which avoids the intermediate var entirely.

@benjyw

benjyw May 12, 2017

Contributor

It's a bit confusing that a value from a dict called "_target_data" is itself called "target_data".

Maybe rename _target_data to _target_to_data ? Or adopt Stu's more concise suggestion below, which avoids the intermediate var entirely.

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

Also, I don't think you need the explicit default of None in get(target, None): The default is implicitly None if unspecified.

@benjyw

benjyw May 12, 2017

Contributor

Also, I don't think you need the explicit default of None in get(target, None): The default is implicitly None if unspecified.

src/python/pants/goal/run_tracker.py
+ def report_target_info(self, scope, target, keys, val):
+ """Add target information to run_info under target_data.
+
+ Will Recursively construct a nested dict with the keys provided.

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

OK, but we have no current use for this, and this is not as straightforward as it appears. E.g., if you call:

report_target_info(scope, target, ['this', 'is', 'a', 'deeply', 'namespaced'], 'value1')
and then
report_target_info(scope, target, ['this', 'is', 'another', 'deeply', 'namespaced'], 'value2')

You might reasonably expect the value for (scope, target) to be:

{ 'this': { 'is' : { 
  'a': {'deeply': {'namespaced': 'value1' }},
  'another': {'deeply': {'namespaced': 'value2' }}
}}}

But actually, if I understand the current code correctly, it would be:

{ 'this': { 'is' : { 
  'another: {'deeply': {'namespaced': 'value2' }}
}}}

So the semantics are unclear, and it might be best to just avoid the problem entirely, rather than anticipate usage that hasn't actually been needed yet?

@benjyw

benjyw May 12, 2017

Contributor

OK, but we have no current use for this, and this is not as straightforward as it appears. E.g., if you call:

report_target_info(scope, target, ['this', 'is', 'a', 'deeply', 'namespaced'], 'value1')
and then
report_target_info(scope, target, ['this', 'is', 'another', 'deeply', 'namespaced'], 'value2')

You might reasonably expect the value for (scope, target) to be:

{ 'this': { 'is' : { 
  'a': {'deeply': {'namespaced': 'value1' }},
  'another': {'deeply': {'namespaced': 'value2' }}
}}}

But actually, if I understand the current code correctly, it would be:

{ 'this': { 'is' : { 
  'another: {'deeply': {'namespaced': 'value2' }}
}}}

So the semantics are unclear, and it might be best to just avoid the problem entirely, rather than anticipate usage that hasn't actually been needed yet?

src/python/pants/goal/run_tracker.py
+ val_to_store = create_dict_with_nested_keys_and_val(val, len(keys) - 1)
+ target_data = self._target_data.get(target, None)
+ if target_data is None:
+ self._target_data.update({target: {scope: val_to_store}})

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

This will stomp on any previous data for the given target, including any previous data for this scope and for any other scopes. I assume this isn't what we want?

@benjyw

benjyw May 12, 2017

Contributor

This will stomp on any previous data for the given target, including any previous data for this scope and for any other scopes. I assume this isn't what we want?

src/python/pants/goal/run_tracker.py
+ if target_data is None:
+ self._target_data.update({target: {scope: val_to_store}})
+ else:
+ scope_data = self._target_data[target].get(scope, None)

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

This has different behavior than the current code - it won't stomp on all previous data from all scopes for a target, as the current code will, but it will stomp on any data for the (target, scope) pair that shares a common key prefix. Is that what we want?

@benjyw

benjyw May 12, 2017

Contributor

This has different behavior than the current code - it won't stomp on all previous data from all scopes for a target, as the current code will, but it will stomp on any data for the (target, scope) pair that shares a common key prefix. Is that what we want?

src/python/pants/goal/run_tracker.py
+ if target_data is None:
+ self._target_data.update({target: {scope: val_to_store}})
+ else:
+ scope_data = self._target_data[target].get(scope, None)

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

Also, scope_data = self._target_data[target] will crash if there is no existing data for target, so self._target_data would have to be a defaultdict (which is probably a good idea).

@benjyw

benjyw May 12, 2017

Contributor

Also, scope_data = self._target_data[target] will crash if there is no existing data for target, so self._target_data would have to be a defaultdict (which is probably a good idea).

src/python/pants/goal/run_tracker.py
+ if target_data is None:
+ self._target_data.update({target: {scope: val_to_store}})
+ else:
+ scope_data = self._target_data[target].get(scope, None)

This comment has been minimized.

@benjyw

benjyw May 12, 2017

Contributor

Ditto here re not needing the explicit default in get(scope, None).

@benjyw

benjyw May 12, 2017

Contributor

Ditto here re not needing the explicit default in get(scope, None).

@benjyw

benjyw approved these changes May 13, 2017

Thanks for bearing with me on the changes. This looks good to me now, and I look forward to seeing the followup change too!

@stuhood stuhood merged commit d523736 into pantsbuild:master May 13, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

stuhood added a commit that referenced this pull request May 15, 2017

Include API that will store target info in run_tracker (#4561)
### Problem
We would like to collect stats about the targets and tests being run so we would like to add a way to track this info. 

### Solution

The change included some refactoring to ease code re-use, a method to parse all the test information from the xml files, and two additional methods to collect target info and test info.

### Result

The changes shouldn't affect the end-user behavior. This change is for future development.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment