# WebRender dashboard ETL script

You probably want to look at the [published dashboard](file:///Users/tsmith/projects/webrender-dashboard/dashboard.html).

You can check [scheduled run status](https://dbc-caf9527b-e073.cloud.databricks.com/#job/715).


### What's different?

* Metrics are summarized by build (vs date).
* Performance metrics are aggregated over users. This matters because it reduces the impact of outlier users on our understanding of product performance, and because it reflects how WebRender changes the experience at the user level.
* Pings are artisanally hand-selected so that comparisons between experiment branches are fair even in the presence of the various enrollment weirdnesses.
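
As a rough illustration of the first two bullets, here is a minimal sketch in plain Python rather than Spark. It assumes "aggregated over users" means each client is first reduced to a single value, and only then are clients summarized within a build. The records, client IDs, values, and function names are all made up for the example, and the mean is only a stand-in for whatever summary statistic the real script uses.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ping records: (build_id, client_id, metric_value).
PINGS = [
    ("20181217", "a", 10.0),
    ("20181217", "b", 12.0),
    ("20181217", "c", 100.0),  # client "c" is an outlier that sends many pings
    ("20181217", "c", 100.0),
    ("20181217", "c", 100.0),
    ("20181217", "c", 100.0),
    ("20181220", "a", 11.0),
    ("20181220", "b", 13.0),
]


def ping_level_mean(pings):
    """Mean per build with every ping weighted equally."""
    by_build = defaultdict(list)
    for build_id, _client_id, value in pings:
        by_build[build_id].append(value)
    return {build: mean(values) for build, values in by_build.items()}


def user_level_mean(pings):
    """Mean per build where each client contributes one value (its own mean)."""
    by_build_client = defaultdict(list)
    for build_id, client_id, value in pings:
        by_build_client[(build_id, client_id)].append(value)

    by_build = defaultdict(list)
    for (build_id, _client_id), values in by_build_client.items():
        by_build[build_id].append(mean(values))
    return {build: mean(values) for build, values in by_build.items()}
```

With this toy data, the ping-level mean for build `20181217` is about 70.3, while the user-level mean is about 40.7, because the chatty outlier client counts once instead of four times.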

In [2]:
%r
library(boot)
library(dplyr, warn.conflicts=FALSE)
library(ggplot2)
library(sparklyr, warn.conflicts=FALSE)
library(tidyr, warn.conflicts=FALSE)

sc = spark_connect(method="databricks")

# Save result tables so we can access them from hala


In [3]:
from moztelemetry.dataset import Dataset
import pandas as pd
from pyspark.sql import Row
from pyspark.sql import functions as f
from pyspark.sql.types import StructType, StructField, StringType, BooleanType, IntegerType, DoubleType, LongType, MapType
from statsmodels.stats.weightstats import DescrStatsW

In [4]:
Dataset.from_source("telemetry").schema

In [5]:


pings = (
  Dataset
  .from_source("telemetry")
  .where(docType="main")
  .where(appUpdateChannel="beta")
  .where(submissionDate="20181220")
  .select(
    app_build_id="application.buildId",
    environment="environment.system",
    dwrite_init_problem="payload.histograms.DWRITEFONT_INIT_PROBLEM",
    client_id="clientId",
    normalized_channel="meta.normalizedChannel",
    profile_subsession_counter="payload.info.profileSubsessionCounter",
    session_id="payload.info.sessionId",
  )
  .records(sc, sample=0.1)
)

In [6]:
# Keep only pings where the DWRITEFONT_INIT_PROBLEM histogram is present (truthy).
pings = pings.filter(lambda p: p["dwrite_init_problem"])
pings.cache()
pings.take(1)

In [7]:
# Count the remaining pings by Windows build number.
(
  pings
  .filter(lambda p: "windowsBuildNumber" in p["environment"]["os"])
  .map(lambda p: p["environment"]["os"]["windowsBuildNumber"])
  .countByValue()
)

In [8]:
pings.count()