Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add content histograms to longitudinal view #155

Merged
merged 1 commit into from Jan 19, 2017

Conversation

harterrt
Copy link
Contributor

@harterrt harterrt commented Dec 23, 2016

This change also makes it easy to add new histogram endpoints in the
future. Just add a path and suffix to histogramLocations or
keyedHistogramLocations. This whitelist updates both the parsing code
and the dataset schema.

I prototyped a version of this code which would remove the whitelist by
automatically scanning for new keys under "payload.processes". However,
we have to add these new keys/suffixes to the dataset schema.


This change is Reviewable

@mozilla-autolander-deprecated

Autolander could not find a bug number in your pull request title. All pull requests should be in the format of: Bug [number] - [description].

@harterrt
Copy link
Contributor Author

@vitillo, Please take a look

@codecov-io
Copy link

codecov-io commented Dec 23, 2016

Current coverage is 20.13% (diff: 93.75%)

Merging #155 into master will increase coverage by 0.50%

@@             master       #155   diff @@
==========================================
  Files            25         25          
  Lines          2323       2329     +6   
  Methods        2161       2163     +2   
  Messages          0          0          
  Branches        106        112     +6   
==========================================
+ Hits            456        469    +13   
+ Misses         1867       1860     -7   
  Partials          0          0          

Powered by Codecov. Last update 71ad6ab...b4e1ffb

@vitillo
Copy link
Contributor

vitillo commented Dec 29, 2016

Reviewed 3 of 3 files at r1.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

  private val histogramLocations = List(
    HistogramLocation("payload.histograms", ""),
    HistogramLocation("payload.processes.content.histograms", "_CONTENT")

I am not sure I understand why we would need to add (manually?) new keys/suffixes to the dataset schema if we want to automatically scan for new processes under payload.processes.

Ideally we don't want to edit this job every time a new process type is added so it would be great if we can automate this part.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 32 at r1 (raw file):

  private val suffixes = (histogramLocations ++ keyedHistogramLocations).map(_.suffix).distinct

  private def parseHistogramLocation[HistFormat : Manifest](

s/HistFormat/HistogramFormat


Comments from Reviewable

@harterrt
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 2 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

I am not sure I understand why we would need to add (manually?) new keys/suffixes to the dataset schema if we want to automatically scan for new processes under payload.processes.

Ideally we don't want to edit this job every time a new process type is added so it would be great if we can automate this part.

Every histogram name needs to be registered in the schema generated by buildSchema[0]. Without this list of suffixes, we would need to scan through pings to collect new process types before generating the schema.

[0] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L187


Comments from Reviewable

@vitillo
Copy link
Contributor

vitillo commented Dec 29, 2016

Review status: all files reviewed at latest revision, 2 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, harterrt (Ryan Harter) wrote…

Every histogram name needs to be registered in the schema generated by buildSchema[0]. Without this list of suffixes, we would need to scan through pings to collect new process types before generating the schema.

[0] https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala#L187

@Dexterp37, is there a static list of valid processes somewhere?


Comments from Reviewable

@harterrt
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 3 unresolved discussions.


src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala, line 186 at r1 (raw file):

        "payload.histograms" -> compact(render(histograms)),
        "payload.keyedHistograms" -> compact(render(keyedHistograms)),
        "payload.processes.content.histograms" -> compact(render(histograms)),

@vitillo, if I understand your comment in Bug 1325653 [0], this field will not be available in the Heka message, correct? I'll have to refactor this change to read directly from the payload object. Please confirm.

[0] https://bugzilla.mozilla.org/show_bug.cgi?id=1325653#c1


Comments from Reviewable

@vitillo
Copy link
Contributor

vitillo commented Dec 30, 2016

Review status: all files reviewed at latest revision, 3 unresolved discussions.


src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala, line 186 at r1 (raw file):

Previously, harterrt (Ryan Harter) wrote…

@vitillo, if I understand your comment in Bug 1325653 [0], this field will not be available in the Heka message, correct? I'll have to refactor this change to read directly from the payload object. Please confirm.

[0] https://bugzilla.mozilla.org/show_bug.cgi?id=1325653#c1

That was a more general comment about why things are what they are.

I assumed you had already checked where exactly those fields sit in the payload; if not you can easily do that from the Hindsight UI.


Comments from Reviewable

@Dexterp37
Copy link
Contributor

Review status: all files reviewed at latest revision, 3 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

@Dexterp37, is there a static list of valid processes somewhere?

The list of Telemetry supported processes is only available here: the ping section names are built from that by removing the "#" character. I might refactor this a bit as part of bug 1278556 that is supposed to land soon-ish.


Comments from Reviewable

@vitillo
Copy link
Contributor

vitillo commented Jan 2, 2017

Review status: all files reviewed at latest revision, 3 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, Dexterp37 (Alessio Placitelli) wrote…

The list of Telemetry supported processes is only available here: the ping section names are built from that by removing the "#" character. I might refactor this a bit as part of bug 1278556 that is supposed to land soon-ish.

Hm, that's not very parser friendly though. What about storing the valid process types in a yaml or json file somewhere?


src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala, line 186 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

That was a more general comment about why things are what they are.

I assumed you had already checked where exactly those fields sit in the payload; if not you can easily do that from the Hindsight UI.

See also the corresponding code that generates the data stored on S3.


Comments from Reviewable

@Dexterp37
Copy link
Contributor

Review status: all files reviewed at latest revision, 3 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

Hm, that's not very parser friendly though. What about storing the valid process types in a yaml or json file somewhere?

I just filed bug 1328230 for that.


Comments from Reviewable

@harterrt harterrt force-pushed the child_histograms branch 6 times, most recently from f750165 to a733ec4 Compare January 4, 2017 23:41
@harterrt
Copy link
Contributor Author

harterrt commented Jan 5, 2017

Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, Dexterp37 (Alessio Placitelli) wrote…

I just filed bug 1328230 for that.

@vitillo, can we continue with this PR and refactor once Bug 1328230 is complete?


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 32 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

s/HistFormat/HistogramFormat

Done.


src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala, line 186 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

See also the corresponding code that generates the data stored on S3.

Great. I refactored this change to read content histograms directly from the json payload.

I ran the revised code over data from December 2016 and the results look good. The resulting data is in s3://telemetry-test-bucket/longitudinal/v20170103/.


Comments from Reviewable

@vitillo
Copy link
Contributor

vitillo commented Jan 5, 2017

Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, harterrt (Ryan Harter) wrote…

@vitillo, can we continue with this PR and refactor once Bug 1328230 is complete?

@Dexterp37, do you have a rough idea when that Bug will be tackled and how long it might take?


Comments from Reviewable

@vitillo
Copy link
Contributor

vitillo commented Jan 5, 2017

Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

@Dexterp37, do you have a rough idea when that Bug will be tackled and how long it might take?

@harterrt in the meantime you can assume that you are given a list with process names, e.g. List("content", "gpu"), instead of histogram locations. That will make it easier to adapt the code later on.


Comments from Reviewable

@Dexterp37
Copy link
Contributor

Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

@harterrt in the meantime you can assume that you are given a list with process names, e.g. List("content", "gpu"), instead of histogram locations. That will make it easier to adapt the code later on.

We're going to tackle that as soon as bug 1278556 lands, so probably in a week or so.


Comments from Reviewable

@harterrt harterrt force-pushed the child_histograms branch 3 times, most recently from 59198fb to 684d33e Compare January 17, 2017 21:46
@harterrt
Copy link
Contributor Author

Review status: 0 of 3 files reviewed at latest revision, 2 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, Dexterp37 (Alessio Placitelli) wrote…

We're going to tackle that as soon as bug 1278556 lands, so probably in a week or so.

@vitillo done. PTAL.

I'm happy to refactor once bug 1328230 is closed, but I'd like to unblock this PR.


src/test/scala/com/mozilla/telemetry/LongitudinalTest.scala, line 186 at r1 (raw file):

Previously, harterrt (Ryan Harter) wrote…

Great. I refactored this change to read content histograms directly from the json payload.

I ran the revised code over data from December 2016 and the results look good. The resulting data is in s3://telemetry-test-bucket/longitudinal/v20170103/.

Done.


Comments from Reviewable

@vitillo
Copy link
Contributor

vitillo commented Jan 18, 2017

Reviewed 1 of 3 files at r2, 2 of 2 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, harterrt (Ryan Harter) wrote…

@vitillo done. PTAL.

I'm happy to refactor once bug 1328230 is closed, but I'd like to unblock this PR.

OK, please open a Bug for the refactor.


src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala, line 805 at r4 (raw file):

      // Get the payload of the Heka frame, which contains the payload for the
      // main ping.
      val payload = parse(x.getOrElse("payload", return).asInstanceOf[String]).extract[JValue]

Doesn't parse return a JValue? I am not sure why there is a call to extract.


Comments from Reviewable

@harterrt
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


src/main/scala/com/mozilla/telemetry/histograms/Histogram.scala, line 22 at r1 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

OK, please open a Bug for the refactor.

Done:
https://bugzilla.mozilla.org/show_bug.cgi?id=1332117


src/main/scala/com/mozilla/telemetry/views/Longitudinal.scala, line 805 at r4 (raw file):

Previously, vitillo (Roberto Agostino Vitillo) wrote…

Doesn't parse return a JValue? I am not sure why there is a call to extract.

Done.


Comments from Reviewable

@harterrt harterrt force-pushed the child_histograms branch 2 times, most recently from e7d036d to 9d032d3 Compare January 18, 2017 23:16
This change also makes it easy to add new histogram endpoints in the
future. Just add a path and suffix to histogramLocations or
keyedHistogramLocations. This whitelist updates both the parsing code
and the dataset schema.

I prototyped a version of this code which would remove the whitelist by
automatically scanning for new keys under "payload.processes". However,
we have to add these new keys/suffixes to the dataset schema.
@harterrt
Copy link
Contributor Author

Rebased to clear the merge conflict.


Review status: 1 of 3 files reviewed at latest revision, 2 unresolved discussions.


Comments from Reviewable

@harterrt
Copy link
Contributor Author

@vitillo RFAL


Review status: 1 of 3 files reviewed at latest revision, 2 unresolved discussions.


Comments from Reviewable

@vitillo
Copy link
Contributor

vitillo commented Jan 19, 2017

:lgtm:


Reviewed 2 of 2 files at r5.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

@vitillo vitillo merged commit 7139c4a into mozilla:master Jan 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants