Limit build ids to the last 6 months #76

maurodoglio · 2017-09-20T20:47:10Z

This will reduce the number of aggregates we generate, which should
speed up aggregation and reduce the size of the output parquet file.

codecov-io · 2017-09-20T20:52:51Z

Codecov Report

Merging #76 into master will decrease coverage by 0.2%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master      #76      +/-   ##
==========================================
- Coverage   87.05%   86.85%   -0.21%     
==========================================
  Files           3        3              
  Lines         340      350      +10     
  Branches        6       10       +4     
==========================================
+ Hits          296      304       +8     
- Misses         44       46       +2

Impacted Files	Coverage Δ
.../mozilla/telemetry/streaming/ErrorAggregator.scala	`86.16% <100%> (ø)`	⬆️
...in/scala/com/mozilla/telemetry/pings/package.scala	`87% <81.81%> (-0.78%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eb6f527...0d1a386.

fbertsch · 2017-09-20T21:08:46Z

src/main/scala/com/mozilla/telemetry/pings/package.scala

+    def normalizedBuildId(): Option[String] = {
+      `environment.build`.flatMap(_.buildId) match {
+        case Some(buildId: String) => {
+          val buildIdDay = buildId.slice(0, 6).toString()


This is taking the YYYYMM of the build, you need YYYYMMDD

maurodoglio · 2017-09-21

Right you are
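The difference between the two slices can be seen on a sample value. This is a minimal sketch, assuming a build id in the 14-digit `yyyyMMddHHmmss` layout; the sample id and the object name `BuildIdPrefix` are illustrative and do not come from the PR.

```scala
object BuildIdPrefix {
  // Illustrative build id, assumed to follow the yyyyMMddHHmmss layout.
  val sampleBuildId = "20170602123456"

  // slice(0, 6) keeps only the year and month.
  val yearMonth: String = sampleBuildId.slice(0, 6)

  // slice(0, 8) keeps the year, month, and day.
  val yearMonthDay: String = sampleBuildId.slice(0, 8)
}
```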

fbertsch · 2017-09-20T21:09:59Z

src/test/scala/com/mozilla/telemetry/streaming/TestErrorAggregator.scala

+    import spark.implicits._
+    val messages = TestUtils.generateMainMessages(
+      1, Some(Map(
+        "environment.build" -> """{"buildId": "20170602"""",


These tests cover the case of a build dated after the submission date. We should be testing the opposite - a build BEFORE the submission date; specifically: a build less than 6 months before the submission_date, and a build more than 6 months before the submission_date. Any build dated after the submission_date should be ignored, since that makes no sense!

maurodoglio · 2017-09-21

Good catch, I also found another bug while fixing this 😄
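The three cases named in the review (a build inside the 6-month window, a build older than the window, and a build dated after the submission date) could be checked with a predicate along these lines. This is a sketch only: `BuildWindow` and `isRecentBuild` are hypothetical names, `java.time` stands in for the Joda-Time classes used in the PR so that the snippet has no external dependencies, and treating a build exactly 6 months old as inside the window is an assumption.

```scala
import java.time.LocalDate

object BuildWindow {
  // Hypothetical predicate, not the PR's implementation: keep a build only if
  // it is dated on or before the submission date and at most `months` months
  // earlier. The inclusive boundaries are an assumption.
  def isRecentBuild(
      buildDate: LocalDate,
      submissionDate: LocalDate,
      months: Int = 6): Boolean =
    !buildDate.isAfter(submissionDate) &&
      !buildDate.isBefore(submissionDate.minusMonths(months))
}
```

With a submission date of 2017-09-20, a build from 2017-06-02 falls inside the window, while builds from 2017-01-01 and 2017-10-01 do not.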

maurodoglio · 2017-09-21T14:14:31Z

This is RFAL (ready for another look)

fbertsch · 2017-09-21T14:53:26Z

src/main/scala/com/mozilla/telemetry/pings/package.scala

+        case Some(buildId: String) => {
+          val buildIdDay = buildId.slice(0, 8).toString()
+          val buildDateFormat = DateTimeFormat.forPattern("yyyyMMdd")
+          val buildDateTime = buildDateFormat.parseDateTime(buildIdDay)


This will fail on improper dates which our schema validation [0] doesn't necessarily catch. For example, 00000000 is validated, but it will cause this to fail with org.joda.time.IllegalFieldValueException: Cannot parse "00000000": Value 0 for monthOfYear must be in the range [1,12].

[0] https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/master/schemas/telemetry/main/main.4.schema.json#L11
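One way to keep a malformed build id from throwing is to wrap the parse in `Try` and return `None` on failure. The sketch below is not the fix that landed in this PR: `BuildIdParsing` and `parseBuildDate` are hypothetical names, and `java.time` with `DateTimeFormatter.BASIC_ISO_DATE` is swapped in for Joda-Time's `DateTimeFormat.forPattern("yyyyMMdd")` so the snippet runs without extra dependencies.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object BuildIdParsing {
  // Hypothetical helper: parse the leading yyyyMMdd digits of a build id.
  // Any parse failure (for example month 00, or too few digits) becomes None
  // instead of propagating an exception.
  def parseBuildDate(buildId: String): Option[LocalDate] =
    Try(LocalDate.parse(buildId.take(8), DateTimeFormatter.BASIC_ISO_DATE)).toOption
}
```

Under this approach, `parseBuildDate("00000000")` yields `None`, so a caller can drop the value rather than fail while processing the ping.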

fbertsch

🚢 🤡

maurodoglio requested a review from fbertsch September 20, 2017 20:47

maurodoglio self-assigned this Sep 20, 2017

fbertsch suggested changes Sep 20, 2017

maurodoglio force-pushed the normalize-build-id branch from 4aa35ae to fd82386 Compare September 21, 2017 14:28

fbertsch suggested changes Sep 21, 2017

Mauro Doglio added 3 commits September 21, 2017 17:44

Limit build ids to the last 6 months

a3c7dd5

This will reduce the number of aggregates we generate, which should speed up aggregation and reduce the size of the output parquet file.

Move normalization methods to val

b9372e5

Remove leftover print statement

0d1a386

maurodoglio force-pushed the normalize-build-id branch from bccb50b to 0d1a386 Compare September 21, 2017 16:55

fbertsch approved these changes Sep 21, 2017

fbertsch merged commit 0192c35 into master Sep 21, 2017
