Telemetry analysis for unified-urlbar experiment.
https://bugzilla.mozilla.org/show_bug.cgi?id=1219505

In [1]:
import pandas as pd
import ujson as json
import numpy as np

from moztelemetry import get_pings, get_pings_properties, get_one_ping_per_client

%pylab inline

Unable to parse whitelist (/home/hadoop/anaconda2/lib/python2.7/site-packages/moztelemetry/bucket-whitelist.json). Assuming all histograms are acceptable.
Populating the interactive namespace from numpy and matplotlib




Define the pings we care about.
The experiment ran on Firefox Beta 44 and 45, between Jan 11th and Feb 24th.
We care about "main" telemetry pings.

NOTE: For now, while we are developing the notebook, we only take a 10% sample of the pings submitted between Jan 13th and Feb 3rd.

In [2]:
PING_OPTIONS = {
    "app": "Firefox",
    "channel": "beta",
    "version": ("44.0", "45.0"),
    "build_id": "*",
    "submission_date": ("20160113", "20160203"),
    "fraction": 0.1
}
pings = get_pings(sc, doc_type="main", **PING_OPTIONS)

pings.count()

22925554

We only need a subset of the ping data.

In [3]:
pings_data = get_pings_properties(pings,
                                  ["clientId",
                                   "environment/addons/activeExperiment/id",
                                   "environment/addons/activeExperiment/branch",
                                   "environment/settings/defaultSearchEngine",
                                   "environment/settings/userPrefs/browser.urlbar.suggest.searches",
                                   "environment/settings/userPrefs/browser.urlbar.userMadeSearchSuggestionsChoice",
                                   "payload/simpleMeasurements/UITelemetry/toolbars/defaultKept",
                                   "payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/search/searchbar",
                                   "payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/search/urlbar",
                                   "payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/search/abouthome",
                                   "payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/search/newtab",
                                   "payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/search-oneoff",
                                   "payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/click-builtin-item/urlbar/search-settings",
                                   "payload/simpleMeasurements/UITelemetry/toolbars/countableEvents/__DEFAULT__/click-builtin-item/searchbar/search-settings",
                                   "payload/histograms/FX_URLBAR_SELECTED_RESULT_TYPE"])

To prevent pseudoreplication, we keep only a single submission per client. Because this step requires a distributed shuffle, it should run only after get_pings_properties has extracted the attributes of interest, so that less data has to be shuffled.

In [4]:
pings_data = get_one_ping_per_client(pings_data)
pings_data.count()

4093667
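
The effect of this step can be illustrated on a plain Python list. The sketch below is not how get_one_ping_per_client is implemented; it simply keeps the first record seen for each clientId. The helper name one_record_per_client and the sample records are made up for illustration.

```python
def one_record_per_client(records):
    """Keep the first record seen for each clientId (illustrative only)."""
    seen = {}
    for record in records:
        # setdefault stores the record only if this clientId has not been seen yet.
        seen.setdefault(record["clientId"], record)
    return list(seen.values())

# Hypothetical records: client "a" submitted twice, client "b" once.
sample = [
    {"clientId": "a", "branch": "control"},
    {"clientId": "a", "branch": "control"},
    {"clientId": "b", "branch": "unified"},
]
deduplicated = one_record_per_client(sample)
print(len(deduplicated))
```

Without this step, a client that submits many pings would be counted many times and would weigh more heavily in any per-branch comparison.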

Keep only pings from clients enrolled in one of the experiment branches.
Also discard clients whose urlbar is no longer in its default position.

In [5]:
def experiment_filter(d):
    try:
        # Look up every key inside the try block, so that any missing key is reported below.
        toolbar = d["payload/simpleMeasurements/UITelemetry/toolbars/defaultKept"]
        return d["environment/addons/activeExperiment/id"] == "unified-urlbar@experiments.mozilla.org" \
            and d["environment/addons/activeExperiment/branch"] in ("control", "unified", "customized") \
            and toolbar is not None and "urlbar-container" in toolbar
    except KeyError:
        raise ValueError("Whoa nellie, missing a key: " + repr(d))

experiment_data = pings_data.filter(experiment_filter).cache()
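
To see what the filter keeps, it can be run on a few hand-written records. This is only a sketch: the records are hypothetical and carry just the three keys the filter reads, the constant names and make_record helper are introduced for illustration, and the filter is restated (without the error handling) so the example runs on its own.

```python
EXPERIMENT_ID = "unified-urlbar@experiments.mozilla.org"
ID_KEY = "environment/addons/activeExperiment/id"
BRANCH_KEY = "environment/addons/activeExperiment/branch"
KEPT_KEY = "payload/simpleMeasurements/UITelemetry/toolbars/defaultKept"

def experiment_filter(d):
    toolbar = d[KEPT_KEY]
    return (d[ID_KEY] == EXPERIMENT_ID
            and d[BRANCH_KEY] in ("control", "unified", "customized")
            and toolbar is not None
            and "urlbar-container" in toolbar)

def make_record(experiment_id, branch, kept_widgets):
    """Build a hypothetical record with only the keys the filter reads."""
    return {ID_KEY: experiment_id, BRANCH_KEY: branch, KEPT_KEY: kept_widgets}

records = [
    # Enrolled, urlbar still in its default position: kept.
    make_record(EXPERIMENT_ID, "control", ["urlbar-container", "search-container"]),
    # Enrolled, but the urlbar is not among the default-placed widgets: dropped.
    make_record(EXPERIMENT_ID, "unified", ["search-container"]),
    # Enrolled, but no toolbar data was reported: dropped.
    make_record(EXPERIMENT_ID, "control", None),
    # Not enrolled in any experiment: dropped.
    make_record(None, None, ["urlbar-container"]),
]
kept = [r for r in records if experiment_filter(r)]
print(len(kept))
```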

How many pings are left?

In [6]:
experiment_data.count()

255483

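Fix non-serializable values: reduce the histogram to a single integer count, and replace the defaultKept widget list with a boolean that tells whether it contains "search-container".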

In [7]:
def process_data(d):
    # Keep only the count stored at index 5 of the histogram, as a plain int.
    urlbar_result = d["payload/histograms/FX_URLBAR_SELECTED_RESULT_TYPE"]
    if urlbar_result is not None:
        d["payload/histograms/FX_URLBAR_SELECTED_RESULT_TYPE"] = int(urlbar_result[5])

    # Replace the list of default-placed widgets with a boolean:
    # is "search-container" still among them?
    widgetsInDefaultPosition = d["payload/simpleMeasurements/UITelemetry/toolbars/defaultKept"]
    if widgetsInDefaultPosition is not None:
        d["payload/simpleMeasurements/UITelemetry/toolbars/defaultKept"] = "search-container" in widgetsInDefaultPosition
    return d

serializable_data = experiment_data.map(process_data)
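
One way to check that a processed record really is serializable is to pass it through json.dumps. The sketch below makes several assumptions: it uses the standard json module instead of ujson, a plain list stands in for the histogram object, and the record values are made up. process_data is restated so the example is self-contained.

```python
import json

HISTOGRAM_KEY = "payload/histograms/FX_URLBAR_SELECTED_RESULT_TYPE"
KEPT_KEY = "payload/simpleMeasurements/UITelemetry/toolbars/defaultKept"

def process_data(d):
    # Reduce the histogram to the count at index 5, as a plain int.
    urlbar_result = d[HISTOGRAM_KEY]
    if urlbar_result is not None:
        d[HISTOGRAM_KEY] = int(urlbar_result[5])
    # Reduce the widget list to a boolean.
    widgets = d[KEPT_KEY]
    if widgets is not None:
        d[KEPT_KEY] = "search-container" in widgets
    return d

# Hypothetical record; the list is a stand-in for the real histogram object.
record = {
    "clientId": "hypothetical-client",
    HISTOGRAM_KEY: [0, 2, 7, 0, 1, 3, 0, 0, 4],
    KEPT_KEY: ["urlbar-container", "search-container"],
}
line = json.dumps(process_data(record), sort_keys=True)
print(line)
```

If JSON lines are the desired storage format, the same json.dumps call could be mapped over the RDD before saving; as written, the notebook saves the records without that step.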

Store the experiment data for later analysis.

In [8]:
s3_path = "s3n://net-mozaws-prod-us-west-2-pipeline-analysis/mak/unified-urlbar/v1/"
serializable_data.saveAsTextFile(s3_path)
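
For the later analysis, the stored lines have to be turned back into dicts. A minimal sketch, assuming each line is the Python repr of one record dict (which is what saveAsTextFile should write for dict elements that were not converted to strings first). The parse_line helper and the sample line are made up, and real data may contain values that ast.literal_eval cannot parse.

```python
import ast

def parse_line(line):
    """Parse one stored line back into a dict (assumes a Python-literal repr)."""
    record = ast.literal_eval(line)
    if not isinstance(record, dict):
        raise ValueError("expected a dict, got: " + repr(record))
    return record

# Hypothetical stored line, shaped like the output of process_data.
sample_line = (
    "{'clientId': 'hypothetical-client', "
    "'payload/histograms/FX_URLBAR_SELECTED_RESULT_TYPE': 3, "
    "'payload/simpleMeasurements/UITelemetry/toolbars/defaultKept': True}"
)
record = parse_line(sample_line)
print(record["clientId"])
```

In a Spark session, something like sc.textFile(s3_path).map(parse_line) would apply the same parsing to every stored line.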