The Main Summary dataset

The Main Summary dataset is generated by src/main/scala/com/mozilla/telemetry/views/MainSummaryView.scala.

Generating the dataset

For distributed execution, build a self-contained JAR file, then run it with Spark. For example, to generate the main_summary dataset for April 12, 2016 to April 28, 2016 and store the resulting data in an S3 bucket called example_bucket:

sbt assembly
spark-submit \
    --master yarn \
    --deploy-mode client \
    --class com.mozilla.telemetry.views.MainSummaryView \
    telemetry-batch-view-1.1.jar \
    --bucket example_bucket \
    --from 20160412 \
    --to 20160428
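
The --from and --to values in the example above are written as yyyymmdd. Since a mistyped date only shows up after the job has been submitted, it can be worth checking the range locally first. The following is a minimal sketch of such a check; the helper names is_yyyymmdd and valid_range are hypothetical and not part of telemetry-batch-view, and the check covers only the shape and ordering of the dates, not whether they are real calendar days.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of telemetry-batch-view):
# succeeds only if the argument is exactly eight digits, matching the
# yyyymmdd shape used by --from and --to in the example above.
is_yyyymmdd() {
    case "$1" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeeds if both dates are well-formed and the
# start is not after the end. yyyymmdd values sort chronologically when
# compared as integers, so a numeric comparison is enough here.
valid_range() {
    is_yyyymmdd "$1" && is_yyyymmdd "$2" && [ "$1" -le "$2" ]
}

FROM=20160412
TO=20160428

if valid_range "$FROM" "$TO"; then
    echo "OK: would pass --from $FROM --to $TO"
else
    echo "Invalid date range: $FROM to $TO" >&2
fi
```

If the check passes, the same FROM and TO variables can be substituted into the spark-submit command in place of the literal dates.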