Skip to content
This repository has been archived by the owner on Dec 6, 2019. It is now read-only.

Limit build ids to the last 6 months #76

Merged
merged 3 commits into from Sep 21, 2017
Merged

Limit build ids to the last 6 months #76

merged 3 commits into from Sep 21, 2017

Conversation

maurodoglio
Copy link
Contributor

This will reduce the number of aggregates we generate, which should
speed up aggregation and reduce the size of the output parquet file.

@maurodoglio maurodoglio self-assigned this Sep 20, 2017
@codecov-io
Copy link

codecov-io commented Sep 20, 2017

Codecov Report

Merging #76 into master will decrease coverage by 0.2%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #76      +/-   ##
==========================================
- Coverage   87.05%   86.85%   -0.21%     
==========================================
  Files           3        3              
  Lines         340      350      +10     
  Branches        6       10       +4     
==========================================
+ Hits          296      304       +8     
- Misses         44       46       +2
Impacted Files Coverage Δ
.../mozilla/telemetry/streaming/ErrorAggregator.scala 86.16% <100%> (ø) ⬆️
...in/scala/com/mozilla/telemetry/pings/package.scala 87% <81.81%> (-0.78%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eb6f527...0d1a386. Read the comment docs.

def normalizedBuildId(): Option[String] = {
`environment.build`.flatMap(_.buildId) match {
case Some(buildId: String) => {
val buildIdDay = buildId.slice(0, 6).toString()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is taking the YYYYMM of the build, you need YYYYMMDD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right you are

import spark.implicits._
val messages = TestUtils.generateMainMessages(
1, Some(Map(
"environment.build" -> """{"buildId": "20170602"""",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are testing the case of a build after a submission date. We should be testing the opposite - a build BEFORE a submission date; specifically: a build < 6 months before a submission_date, and a build > 6 months before a submission date. Any builds after a submission_date should be ignored, since that makes no sense!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I also found another bug while fixing this 😄

@maurodoglio
Copy link
Contributor Author

This is RFAL

case Some(buildId: String) => {
val buildIdDay = buildId.slice(0, 8).toString()
val buildDateFormat = DateTimeFormat.forPattern("yyyyMMdd")
val buildDateTime = buildDateFormat.parseDateTime(buildIdDay)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail on improper dates which our schema validation [0] doesn't necessarily catch. For example, 00000000 is validated, but it will cause this to fail with org.joda.time.IllegalFieldValueException: Cannot parse "00000000": Value 0 for monthOfYear must be in the range [1,12].

[0] https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/master/schemas/telemetry/main/main.4.schema.json#L11

Mauro Doglio added 3 commits September 21, 2017 17:44
This will reduce the number of aggregates we generate, which should
speed up aggregation and reduce the size of the output parquet file.
Copy link
Contributor

@fbertsch fbertsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 🤡

@fbertsch fbertsch merged commit 0192c35 into master Sep 21, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants