Crash aggregates #56
Conversation
Current coverage is 54.26%

```diff
@@            master    #56   diff @@
======================================
  Files           32     17    -15
  Lines         1402   1539   +137
  Methods       1338   1481   +143
  Messages         0      0
  Branches        48     55     +7
======================================
+ Hits           759    835    +76
- Misses         643    704    +61
  Partials         0      0
```
src/main/scala/streams/Crash.scala (outdated)
```scala
).toMap
val statsMap = (statsNames, stats).zipped.toMap

val schema = buildSchema()
```
We are trying to move away from Avro schemas in new datasets and to use only the schema support provided by SparkSQL. Basically you would have to port over the same code you used in your Python version.
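For illustration, a minimal sketch of what a SparkSQL-native schema could look like, assuming hypothetical field names rather than the actual ones in Crash.scala:

```scala
import org.apache.spark.sql.types._

// Hypothetical sketch: declare the dataset's schema directly as a SparkSQL
// StructType instead of going through an Avro schema definition.
def buildSchema(): StructType = StructType(List(
  StructField("channel", StringType, nullable = false),
  StructField("build_version", StringType, nullable = false),
  StructField("crash_count", LongType, nullable = false)
))
```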
After a lot of debugging, I found out that parquet-avro 1.8.1 (the most recent version) entirely breaks writing Parquet from Spark. Basically, if we even install parquet-avro, writing Parquet files from DataFrames no longer works:

There's a resolved issue in Parquet for this, but the fix is only in 1.8.2-SNAPSHOT, so it isn't really considered stable yet. We should either (1) move this into a separate project (so we don't need to depend on parquet-avro at all), or (2) move away from the Avro stuff altogether.
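As a rough sketch of option (2), and assuming the hypothetical StructType above, writing Parquet needs nothing beyond the plain DataFrame API, so no parquet-avro dependency is involved (the app name, rows, and output path here are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

// Sketch: build a DataFrame from Rows plus the SparkSQL schema, then write
// Parquet directly through the DataFrame writer.
val sparkConf = new SparkConf().setAppName("CrashAggregates")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)

val rows = sc.parallelize(Seq(Row("nightly", "50.0a1", 7L)))  // placeholder data
val frame = sqlContext.createDataFrame(rows, buildSchema())
frame.write.parquet("s3://hypothetical-bucket/crash_aggregates/v1")  // placeholder path
```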
```scala
}

def main(args: Array[String]) {
  // load configuration for the
```
Comment is incomplete.
Changed:
This looks good. Did you compare the output generated by this job with the output generated by the Python one?
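For reference, one minimal way to spot-check equivalence would be to diff the two Parquet outputs with the DataFrame API; the paths below are placeholders, and `except` compares distinct rows only:

```scala
// Sketch: read both jobs' output and look for rows present in one but not
// the other; zero counts on both sides suggest the outputs match.
val scalaOut = sqlContext.read.parquet("s3://hypothetical-bucket/scala-run")
val pythonOut = sqlContext.read.parquet("s3://hypothetical-bucket/python-run")
val onlyInScala = scalaOut.except(pythonOut).count()
val onlyInPython = pythonOut.except(scalaOut).count()
println(s"only in Scala job: $onlyInScala, only in Python job: $onlyInPython")
```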
Could you please also add some documentation about the metrics collected in this job, similar to what you have written down for the Python one?
Changes:
…l works in production
Looks good, thanks!
Intended to have behaviour equivalent to https://github.com/mozilla/moz-crash-rate-aggregates, but faster.
It's also a little bit simpler, since the Spark Scala API is somewhat richer and more robust than PySpark.