# TAAR – Evaluating existing recommenders

Not every recommender can always make a recommendation. To evaluate the individual recommenders for the ensemble, we want to find out how often this is the case and how well the recommenders complement each other.

This notebook needs to be executed either in the [TAAR](http://github.com/mozilla/taar) repository or somewhere where TAAR is on the Python path, because it imports some of the TAAR recommenders.

## Retrieving the relevant variables from the longitudinal dataset

In [105]:
%%time
frame = sqlContext.sql("""
WITH valid_clients AS (
    SELECT *
    FROM longitudinal
    WHERE normalized_channel='release' AND build IS NOT NULL AND build[0].application_name='Firefox'
),

addons AS (
    SELECT client_id, feature_row.*
    FROM valid_clients
    LATERAL VIEW explode(active_addons[0]) feature_row
),
    
non_system_addons AS(
    SELECT client_id, collect_set(key) AS installed_addons
    FROM addons
    WHERE NOT value.is_system
    GROUP BY client_id
)

SELECT
    l.client_id,
    non_system_addons.installed_addons,
    settings[0].locale AS locale,
    geo_city[0] AS geoCity,
    subsession_length[0] AS subsessionLength,
    system_os[0].name AS os,
    scalar_parent_browser_engagement_total_uri_count[0].value AS total_uri,
    scalar_parent_browser_engagement_tab_open_event_count[0].value as tab_open_count,
    places_bookmarks_count[0].sum as bookmark_count,
    scalar_parent_browser_engagement_unique_domains_count[0].value as unique_tlds,
    profile_creation_date[0] as profile_date,
    submission_date[0] as submission_date
FROM valid_clients l LEFT OUTER JOIN non_system_addons
ON l.client_id = non_system_addons.client_id
""")

rdd = frame.rdd

CPU times: user 140 ms, sys: 12 ms, total: 152 ms
Wall time: 18min 43s


## Loading addon data (AMO)

We need to load the addon database to find out which addons are considered useful by TAAR.

In [448]:
import boto3
import json
import logging

from botocore.exceptions import ClientError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

AMO_DUMP_BUCKET = 'telemetry-parquet'
AMO_DUMP_KEY = 'telemetry-ml/addon_recommender/addons_database.json'

In [449]:
def load_amo_external_whitelist():
    """ Download and parse the AMO add-on whitelist.
    :raises RuntimeError: the AMO whitelist file cannot be downloaded or contains
                          no valid add-ons.
    """
    final_whitelist = []
    amo_dump = {}
    try:
        # Load the most current AMO dump JSON resource.
        s3 = boto3.client('s3')
        s3_contents = s3.get_object(Bucket=AMO_DUMP_BUCKET, Key=AMO_DUMP_KEY)
        amo_dump = json.loads(s3_contents['Body'].read())
    except ClientError:
        logger.exception("Failed to download from S3", extra={
            "bucket": AMO_DUMP_BUCKET,
            "key": AMO_DUMP_KEY})

    # If the load fails, amo_dump stays empty and the check below raises a RuntimeError.
    for key, value in amo_dump.items():
        addon_files = value.get('current_version', {}).get('files', {})
        # If any of the addon files are web_extensions compatible, it can be recommended.
        if any([f.get("is_webextension", False) for f in addon_files]):
            final_whitelist.append(value['guid'])

    if len(final_whitelist) == 0:
        raise RuntimeError("Empty AMO whitelist detected")

    return final_whitelist

In [450]:
whitelist = set(load_amo_external_whitelist())

INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3-us-west-2.amazonaws.com
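The filtering step inside `load_amo_external_whitelist` can be tried without S3 access. The following is a minimal sketch, assuming the AMO dump has the shape the loader reads (`guid`, `current_version`, `files`, `is_webextension`). The sample entries and GUIDs are made up for illustration, and `filter_webextensions` is a hypothetical helper name.

```python
# Hypothetical sample shaped like the AMO dump; GUIDs and entries are made up.
sample_amo_dump = {
    "1": {"guid": "webext@example.com",
          "current_version": {"files": [{"is_webextension": True}]}},
    "2": {"guid": "legacy@example.com",
          "current_version": {"files": [{"is_webextension": False}]}},
    "3": {"guid": "mixed@example.com",
          "current_version": {"files": [{"is_webextension": False},
                                        {"is_webextension": True}]}},
    "4": {"guid": "no-version@example.com"},  # no current_version at all
}


def filter_webextensions(amo_dump):
    """Return the GUIDs of add-ons with at least one WebExtension file."""
    guids = []
    for value in amo_dump.values():
        addon_files = value.get("current_version", {}).get("files", [])
        # One WebExtension-compatible file is enough to keep the add-on.
        if any(f.get("is_webextension", False) for f in addon_files):
            guids.append(value["guid"])
    return guids


sample_whitelist = set(filter_webextensions(sample_amo_dump))
print(sorted(sample_whitelist))  # ['mixed@example.com', 'webext@example.com']
```

Entries without a `current_version` are skipped silently because both `.get` calls fall back to empty containers.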


## Filtering out legacy addons 

This is a helper function that takes a list of addon IDs and only returns the IDs of addons that are useful for TAAR.

In [452]:
def get_whitelisted_addons(installed_addons):
    return whitelist.intersection(installed_addons)

## Completing client data

In [453]:
from dateutil.parser import parse as parse_date
from datetime import datetime

In [482]:
def compute_weeks_ago(formatted_date):
    try:
        date = parse_date(formatted_date).replace(tzinfo=None)
    except ValueError: # raised when the date is in an unknown format
        return float("inf")
    
    days_ago = (datetime.today() - date).days
    return days_ago / 7
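To see how the week computation behaves at the boundaries, here is a small standalone sketch. It is not the notebook's function: it takes a fixed reference date instead of `datetime.today()`, parses with `datetime.strptime` under an assumed `%Y-%m-%d` format instead of `dateutil`, and uses `//` so that the result is a whole number of weeks on both Python 2 and Python 3. It additionally treats a missing (`None`) date like an unparseable one, which is an assumption about the desired behaviour.

```python
from datetime import datetime


def weeks_between(formatted_date, reference, date_format="%Y-%m-%d"):
    """Whole weeks from `formatted_date` to `reference`; inf if unparseable."""
    try:
        date = datetime.strptime(formatted_date, date_format)
    except (ValueError, TypeError):  # unknown format or missing value
        return float("inf")
    return (reference - date).days // 7


reference = datetime(2017, 10, 1)  # hypothetical "today"
print(weeks_between("2017-09-30", reference))  # 0
print(weeks_between("2017-09-20", reference))  # 1
print(weeks_between("not a date", reference))  # inf
print(weeks_between(None, reference))          # inf
```

Returning infinity keeps clients with unusable dates out of every finite age range used later in the notebook.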

In [483]:
def complete_client_data(client_data):
    client = client_data.asDict()
    
    client['installed_addons'] = client['installed_addons'] or []
    client['disabled_addon_ids'] = get_whitelisted_addons(client['installed_addons'])
    client['locale'] = str(client['locale'])
    client['profile_age_in_weeks'] = compute_weeks_ago(client['profile_date'])
    client['submission_age_in_weeks'] = compute_weeks_ago(client['submission_date'])
    
    return client
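The `or []` guard matters because the `LEFT OUTER JOIN` in the query leaves `installed_addons` empty (`None`) for clients without non-system addons, and `set.intersection(None)` would raise a `TypeError`. A minimal sketch with a made-up whitelist and made-up GUIDs (`whitelisted` is a hypothetical stand-in for `get_whitelisted_addons`):

```python
# Hypothetical whitelist; the GUIDs are made up for illustration.
whitelist = {"a@example.com", "b@example.com"}


def whitelisted(installed_addons):
    # `or []` turns a missing (None) list into an empty one before intersecting.
    return whitelist.intersection(installed_addons or [])


print(sorted(whitelisted(["a@example.com", "legacy@example.com"])))  # ['a@example.com']
print(whitelisted(None))  # set()
```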

## Evaluating the existing recommenders

To check if a recommender is able to make a recommendation, it's sometimes easier and cleaner to directly query it instead of checking the important attributes ourselves. For example, this is the case for the locale recommender.

In [484]:
from taar.recommenders import CollaborativeRecommender, LegacyRecommender, LocaleRecommender

In [485]:
class DummySimilarityRecommender:
    def can_recommend(self, client_data):
        REQUIRED_FIELDS = ["geoCity", "subsessionLength", "locale", "os", "bookmark_count", "tab_open_count",
                           "total_uri", "unique_tlds"]

        has_fields = all([client_data.get(f, None) is not None for f in REQUIRED_FIELDS])
        return has_fields
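The field check of the dummy similarity recommender can be exercised on plain dictionaries. The sketch below repeats the same test as a standalone function (`has_required_fields` is a hypothetical name) on made-up client values. Note that a value of `0` counts as present; only `None` or a missing key makes the client ineligible.

```python
REQUIRED_FIELDS = ["geoCity", "subsessionLength", "locale", "os",
                   "bookmark_count", "tab_open_count", "total_uri", "unique_tlds"]


def has_required_fields(client_data):
    """True if every required field is present and not None."""
    return all(client_data.get(f) is not None for f in REQUIRED_FIELDS)


complete = {f: 1 for f in REQUIRED_FIELDS}      # hypothetical values
missing_one = dict(complete, geoCity=None)      # one field is null
zero_valued = dict(complete, bookmark_count=0)  # zero is still a value

print(has_required_fields(complete))     # True
print(has_required_fields(missing_one))  # False
print(has_required_fields(zero_valued))  # True
```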

In [486]:
recommenders = {
    "collaborative": CollaborativeRecommender(),
    "legacy": LegacyRecommender(),
    "locale": LocaleRecommender(),
    "similarity": DummySimilarityRecommender()
}

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3-us-west-2.amazonaws.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3-us-west-2.amazonaws.com


In [487]:
def test_recommenders(client):
    return tuple([recommender.can_recommend(client) for recommender in recommenders.values()])

## Computing combined counts

We iterate over all clients in the longitudinal dataset, change the attributes to the expected format and then query the individual recommenders.

In [488]:
from operator import add
from collections import defaultdict

In [489]:
rdd_completed = rdd.map(complete_client_data)

In [490]:
def analyse(rdd):
    results = rdd\
        .map(test_recommenders)\
        .map(lambda x: (x, 1))\
        .reduceByKey(add)\
        .collect()
        
    return defaultdict(int, results)
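The `map`/`reduceByKey` pipeline in `analyse` counts how often each tuple of availability flags occurs. For experimenting without a Spark cluster, the same counting can be sketched locally with `collections.Counter`. The function name `analyse_locally` and the sample tuples are made up for illustration.

```python
from collections import Counter, defaultdict


def analyse_locally(availability_tuples):
    """Count identical availability tuples, like map + reduceByKey(add)."""
    counts = Counter(availability_tuples)
    # As in the notebook, unseen combinations default to a count of 0.
    return defaultdict(int, counts)


sample = [
    (True, False, False, False),
    (True, False, True, False),
    (True, False, False, False),
]
local_results = analyse_locally(sample)
print(local_results[(True, False, False, False)])   # 2
print(local_results[(False, False, False, False)])  # 0
```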

In [491]:
%time results = analyse(rdd_completed)

CPU times: user 1.35 s, sys: 148 ms, total: 1.5 s
Wall time: 11min 48s


In [492]:
num_clients = sum(results.values())
total_results = results

## Computing individual counts

In [493]:
individual_counts = []

for i in range(len(recommenders)):
    count = 0
    
    for key, key_count in results.items():
        if key[i]:
            count += key_count
            
    individual_counts.append(count)
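The loop above sums, for each recommender position `i`, the counts of all combinations in which that recommender is available. A toy run with two hypothetical recommenders and made-up counts shows the idea:

```python
# Made-up combined counts for two hypothetical recommenders.
toy_results = {
    (True, False): 5,
    (True, True): 3,
    (False, False): 2,
}
num_recommenders = 2

# For each position, add up the counts of combinations where the flag is set.
toy_individual = [
    sum(count for key, count in toy_results.items() if key[i])
    for i in range(num_recommenders)
]
print(toy_individual)  # [8, 3]
```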

## Displaying the results

In [494]:
from pandas import DataFrame

In [495]:
def format_int(num):
    return "{:,}".format(num)

In [496]:
def format_frequency(frequency):
    return "%.5f" % frequency

In [497]:
def get_relative_counts(counts, total=num_clients):
    return [format_frequency(count / float(total)) for count in counts]

This is a bit hacky: sorting a data frame by formatted counts does not work, so we add the unformatted counts as a temporary column, sort the data frame by it, and then remove that column again.

In [498]:
def sorted_dataframe(df, order, key="unformatted_counts"):
    df[key] = order
    return df.sort_values(by=key, ascending=False).drop(key, axis=1)
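The same "attach the raw value, sort, drop it again" idea can be illustrated without pandas. The sketch below uses made-up counts and the notebook's `format_int` to show why sorting formatted strings goes wrong: strings are compared character by character, not by magnitude.

```python
def format_int(num):
    return "{:,}".format(num)


raw_counts = [9999, 10000, 250]  # made-up values
formatted = [format_int(n) for n in raw_counts]

# Sorting the formatted strings compares characters, not magnitudes.
print(sorted(formatted, reverse=True))  # ['9,999', '250', '10,000']

# Pair each string with its raw count, sort by the raw count, then drop it.
by_raw = sorted(zip(raw_counts, formatted), reverse=True)
sorted_formatted = [text for _, text in by_raw]
print(sorted_formatted)  # ['10,000', '9,999', '250']
```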

### Individual counts

In [499]:
df = DataFrame(index=recommenders.keys(),
          columns=["Relative count"],
          data=get_relative_counts(individual_counts)
)

sorted_dataframe(df, individual_counts)

Unnamed: 0,Relative count
locale,0.99977
collaborative,0.41949
similarity,0.28339
legacy,0.0


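$\implies$ The locale recommender can almost always generate a recommendation (about 99.98% of clients), while the collaborative recommender is available for about 42% and the similarity recommender for about 28% of clients. The legacy recommender was practically never able to make a recommendation (relative count of 0.0), as not many users seem to have (legacy) addons installed.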

### Combined counts

It's interesting to see how well the individual recommenders complement each other. In the following, we count how often different combinations of the recommenders can make recommendations.

The table is easier to read if cells are empty if a recommender is not available. If this is not desired, these variables can be changed:

In [500]:
recommender_available_label = "Available"
recommender_unavailable_label = ""

In [501]:
def format_labels(keys):
    return tuple([recommender_available_label if key else recommender_unavailable_label for key in keys])

In [502]:
def format_data(keys, counts):
    formatted_keys = map(format_labels, keys)
    return [elems + count for elems, count in zip(formatted_keys, zip(*counts))]
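`format_data` expects `counts` as a list of columns and uses `zip(*counts)` to transpose them into per-row tuples, which are then appended to the label tuples. A small sketch with made-up labels and values:

```python
# Two rows of labels (one per recommender combination), made up for illustration.
label_rows = [("Available", ""), ("", "Available")]

# Two columns of already formatted values, one entry per row.
columns = [["0.50000", "0.30000"], ["0.60000", "0.20000"]]

# zip(*columns) turns the list of columns into one tuple of values per row.
rows = [labels + values for labels, values in zip(label_rows, zip(*columns))]
print(rows[0])  # ('Available', '', '0.50000', '0.60000')
print(rows[1])  # ('', 'Available', '0.30000', '0.20000')
```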

In [503]:
columns = list(recommenders.keys()) + ["Relative counts"]

counts = get_relative_counts(results.values())
data = format_data(results.keys(), [counts])

In [504]:
df = DataFrame(columns=columns, data=data)
sorted_dataframe(df, results.values())

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
1,Available,,,,0.44747
2,Available,,Available,,0.26897
0,Available,,Available,Available,0.15043
4,Available,,,Available,0.1329
6,,,,,0.00011
5,,,Available,,6e-05
7,,,Available,Available,3e-05
3,,,,Available,3e-05


$\implies$ If any recommender is available, then the locale recommender is generally also available. Other than that, there is a good chance that the collaborative recommender is available.
There is only a very small portion of cases in which the similarity recommender was able to make a recommendation when locale and collaborative were not, and not a single such case for the legacy recommender.
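As a consistency check, the individual frequencies reported earlier can be recomputed from the combined table by summing all rows in which a recommender is available. The sketch below copies the rounded values from the combined table (column order: locale, legacy, collaborative, similarity). Because it works on rounded numbers, agreement is only expected up to the fifth decimal.

```python
names = ["locale", "legacy", "collaborative", "similarity"]

# Rounded relative counts copied from the combined table above.
combined = {
    (1, 0, 0, 0): 0.44747,
    (1, 0, 1, 0): 0.26897,
    (1, 0, 1, 1): 0.15043,
    (1, 0, 0, 1): 0.13290,
    (0, 0, 0, 0): 0.00011,
    (0, 0, 1, 0): 0.00006,
    (0, 0, 1, 1): 0.00003,
    (0, 0, 0, 1): 0.00003,
}

# Sum the rows in which the recommender at position i is available.
marginals = {
    name: round(sum(v for key, v in combined.items() if key[i]), 5)
    for i, name in enumerate(names)
}
print(marginals["locale"])         # 0.99977
print(marginals["collaborative"])  # 0.41949
print(marginals["similarity"])     # 0.28339
```

These sums match the individual counts table, and the legacy recommender does not appear in any combination.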

### Grouped by number of available recommenders

In [505]:
from itertools import groupby
from operator import itemgetter

In [506]:
from IPython.display import display, Markdown

In [507]:
for num, group in groupby(sorted(results.keys(), key=sum), sum):
    display(Markdown("#### %d available recommender%s" % (num, "s" if num != 1 else "")))
    
    sub_keys = list(group)
    formatted_keys = map(format_labels, sub_keys)
    
    sub_counts = [results[key] for key in sub_keys]
    sub_counts_to_total = get_relative_counts(sub_counts)
    sub_counts_to_table = get_relative_counts(sub_counts, sum(sub_counts))
    
    zipped_data = zip(formatted_keys, sub_counts_to_total, sub_counts_to_table)
    data = [elems + (counts, table_counts) for elems, counts, table_counts in zipped_data]
    
    columns = list(recommenders.keys()) + ["Relative to all", "Relative to this table"]
    
    df = DataFrame(columns=columns, data=data)
    df = sorted_dataframe(df, sub_counts)
    display(df)

#### 0 available recommenders

Unnamed: 0,locale,legacy,collaborative,similarity,Relative to all,Relative to this table
0,,,,,0.00011,1.0


#### 1 available recommender

Unnamed: 0,locale,legacy,collaborative,similarity,Relative to all,Relative to this table
0,Available,,,,0.44747,0.9998
2,,,Available,,6e-05,0.00014
1,,,,Available,3e-05,6e-05


#### 2 available recommenders

Unnamed: 0,locale,legacy,collaborative,similarity,Relative to all,Relative to this table
0,Available,,Available,,0.26897,0.66924
1,Available,,,Available,0.1329,0.33068
2,,,Available,Available,3e-05,7e-05


#### 3 available recommenders

Unnamed: 0,locale,legacy,collaborative,similarity,Relative to all,Relative to this table
0,Available,,Available,Available,0.15043,1.0
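The grouping loop relies on a detail of `itertools.groupby`: it only merges *consecutive* items with the same key, which is why the keys are sorted by `sum` (the number of available recommenders) first. A short sketch with made-up availability tuples:

```python
from itertools import groupby

# Made-up availability tuples for three hypothetical recommenders.
keys = [
    (True, False, True),
    (False, False, False),
    (True, False, False),
    (False, True, True),
]

# Without sorting, the two tuples with sum 2 end up in separate groups.
unsorted_groups = [num for num, _ in groupby(keys, key=sum)]
print(unsorted_groups)  # [2, 0, 1, 2]

# Sorting by the same key first yields exactly one group per sum.
grouped = {num: list(group) for num, group in groupby(sorted(keys, key=sum), key=sum)}
print(sorted(grouped))  # [0, 1, 2]
print(grouped[2])       # [(True, False, True), (False, True, True)]
```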


## By dates

In this section, we perform a similar analysis as before, but on subsets of the data. These subsets are defined by when the client profile was created. `conditions` is a list of ranges for the profile age in weeks. The end of each range is exclusive, similar to ranges in Python's standard library.

In [508]:
conditions = [
    (0, 1),
    (1, 2),
    (2, 3),
    (3, 4)
]

In [509]:
import numpy as np
from numpy import argsort
from itertools import product

In [510]:
def attribute_between(attr, min_weeks, max_weeks):
    return lambda client: min_weeks <= client[attr] < max_weeks
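A quick check of the half-open behaviour of `attribute_between`, using made-up clients: the lower bound is included, the upper bound is excluded, and an age of infinity (used for unparseable dates) falls outside every finite range.

```python
def attribute_between(attr, min_weeks, max_weeks):
    return lambda client: min_weeks <= client[attr] < max_weeks


in_first_week = attribute_between("profile_age_in_weeks", 0, 1)

# Made-up clients with different profile ages.
clients = [{"profile_age_in_weeks": w} for w in (0, 0.5, 1, float("inf"))]
flags = [in_first_week(c) for c in clients]
print(flags)  # [True, True, False, False]
```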

In [511]:
def get_conditioned_results(attr, conditions):
    conditioned_results = {}

    for (min_weeks, max_weeks) in conditions:
        sub_rdd = rdd_completed.filter(attribute_between(attr, min_weeks, max_weeks))
        conditioned_results[(min_weeks, max_weeks)] = analyse(sub_rdd)
        
    return conditioned_results
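`get_conditioned_results` filters and analyses the data once per condition, so the full dataset is traversed several times. A possible alternative is to assign each client to its range and count `(range, combination)` pairs in a single pass. The following is a local, Spark-free sketch of that idea. The names `bucket_for` and `conditioned_counts` and the sample clients are hypothetical, and whether this would actually be faster on the cluster was not measured.

```python
from collections import Counter, defaultdict


def bucket_for(value, conditions):
    """Return the first (min, max) range containing value, or None."""
    for min_weeks, max_weeks in conditions:
        if min_weeks <= value < max_weeks:
            return (min_weeks, max_weeks)
    return None


def conditioned_counts(clients, attr, conditions, test):
    """Count availability combinations per range in one pass over clients."""
    counts = Counter()
    for client in clients:
        bucket = bucket_for(client[attr], conditions)
        if bucket is not None:
            counts[(bucket, test(client))] += 1

    # Same shape as the notebook: one defaultdict(int) per condition.
    results = {condition: defaultdict(int) for condition in conditions}
    for (bucket, combination), n in counts.items():
        results[bucket][combination] = n
    return results


sample_conditions = [(0, 1), (1, 2)]
sample_clients = [
    {"age": 0, "ok": True},
    {"age": 0.5, "ok": False},
    {"age": 1, "ok": True},
    {"age": 5, "ok": True},  # outside every range, ignored
]
single_pass = conditioned_counts(
    sample_clients, "age", sample_conditions, test=lambda c: (c["ok"],))
print(dict(single_pass[(0, 1)]))  # {(True,): 1, (False,): 1}
print(dict(single_pass[(1, 2)]))  # {(True,): 1}
```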

### By profile age in weeks

In [512]:
%time conditioned_results = get_conditioned_results("profile_age_in_weeks", conditions)

CPU times: user 4.71 s, sys: 428 ms, total: 5.14 s
Wall time: 46min 37s


To make things a little bit easier to read, only recommender combinations that actually appear are displayed in the table.

In [513]:
def nonzero_combinations(conditioned_results):
    combinations = []

    for sub_result in conditioned_results.values():
        combinations += [key for key, value in sub_result.items() if value > 0]

    return set(combinations)

In [514]:
combinations = nonzero_combinations(conditioned_results)

In [515]:
def display_individual_filtered_results(conditioned_results, combinations, label):
    display(Markdown("### Filtering on the %s, Python-like exclusive ranges" % label))

    counts = []
    titles = []

    columns = list(recommenders.keys()) + ["Relative counts"]

    for key in conditions:
        sub_results = conditioned_results[key]
        values = [sub_results[sub_key] for sub_key in combinations]
        summed = sum(values)

        sub_counts = get_relative_counts(values, summed)
        data = format_data(combinations, [sub_counts])
        counts.append(sub_counts)

        title = "Between %d and %d weeks" % key
        titles.append(title)
        display(Markdown("#### %s" % title))

        df = DataFrame(columns=columns, data=data)
        df = sorted_dataframe(df, values)
        display(df)

    return counts, titles

In [516]:
counts, titles = display_individual_filtered_results(conditioned_results, combinations, label="profile age")

### Filtering on the profile age, Python-like exclusive ranges

#### Between 0 and 1 weeks

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
1,Available,,,,0.52749
2,Available,,Available,,0.28995
4,Available,,,Available,0.10739
0,Available,,Available,Available,0.07517
3,,,,Available,0.0
5,,,Available,,0.0
6,,,,,0.0
7,,,Available,Available,0.0


#### Between 1 and 2 weeks

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
1,Available,,,,0.55644
2,Available,,Available,,0.22328
4,Available,,,Available,0.13656
0,Available,,Available,Available,0.08356
6,,,,,0.0001
5,,,Available,,6e-05
7,,,Available,Available,1e-05
3,,,,Available,0.0


#### Between 2 and 3 weeks

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
1,Available,,,,0.53226
2,Available,,Available,,0.22522
4,Available,,,Available,0.14883
0,Available,,Available,Available,0.09353
5,,,Available,,6e-05
6,,,,,6e-05
7,,,Available,Available,3e-05
3,,,,Available,1e-05


#### Between 3 and 4 weeks

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
1,Available,,,,0.52304
2,Available,,Available,,0.22984
4,Available,,,Available,0.14583
0,Available,,Available,Available,0.10121
6,,,,,5e-05
5,,,Available,,3e-05
7,,,Available,Available,2e-05
3,,,,Available,0.0


To make things a little bit easier to read, we can display all results in a single table.

In [517]:
def display_merged_filtered_results(counts, titles, total_results, combinations, label):
    values = [total_results[sub_key] for sub_key in combinations]
    sub_counts = get_relative_counts(values)
    counts.append(sub_counts)
    titles.append("Total, without any condition")  

    columns = list(recommenders.keys()) + titles
    data = format_data(combinations, counts)

    df = DataFrame(columns=columns, data=data)
    df = sorted_dataframe(df, counts[0])

    display(Markdown("### Filtering on the %s, Python-like exclusive ranges – All in one table" % label))
    display(df)

In [518]:
display_merged_filtered_results(counts, titles, total_results, combinations, label="profile age")

### Filtering on the profile age, Python-like exclusive ranges – All in one table

Unnamed: 0,locale,legacy,collaborative,similarity,Between 0 and 1 weeks,Between 1 and 2 weeks,Between 2 and 3 weeks,Between 3 and 4 weeks,"Total, without any condition"
1,Available,,,,0.52749,0.55644,0.53226,0.52304,0.44747
2,Available,,Available,,0.28995,0.22328,0.22522,0.22984,0.26897
4,Available,,,Available,0.10739,0.13656,0.14883,0.14583,0.1329
0,Available,,Available,Available,0.07517,0.08356,0.09353,0.10121,0.15043
3,,,,Available,0.0,0.0,1e-05,0.0,3e-05
5,,,Available,,0.0,6e-05,6e-05,3e-05,6e-05
6,,,,,0.0,0.0001,6e-05,5e-05,0.00011
7,,,Available,Available,0.0,1e-05,3e-05,2e-05,3e-05


### By submission date in weeks

In [519]:
%time conditioned_results_submission_date = get_conditioned_results("submission_age_in_weeks", conditions)

CPU times: user 4.8 s, sys: 324 ms, total: 5.12 s
Wall time: 46min 36s


In [520]:
label = "submission date"
combinations = nonzero_combinations(conditioned_results_submission_date)
counts, titles = display_individual_filtered_results(conditioned_results_submission_date, combinations, label)
display_merged_filtered_results(counts, titles, total_results, combinations, label)

### Filtering on the submission date, Python-like exclusive ranges

#### Between 0 and 1 weeks

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
0,Available,,Available,Available,0.30124
2,Available,,Available,,0.25782
1,Available,,,,0.25026
4,Available,,,Available,0.19053
6,,,,,5e-05
3,,,,Available,5e-05
7,,,Available,Available,3e-05
5,,,Available,,2e-05


#### Between 1 and 2 weeks

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
1,Available,,,,0.37644
2,Available,,Available,,0.24523
0,Available,,Available,Available,0.19097
4,Available,,,Available,0.18718
6,,,,,8e-05
5,,,Available,,4e-05
7,,,Available,Available,3e-05
3,,,,Available,2e-05


#### Between 2 and 3 weeks

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
1,Available,,,,0.46093
2,Available,,Available,,0.27125
0,Available,,Available,Available,0.13411
4,Available,,,Available,0.13353
6,,,,,9e-05
5,,,Available,,5e-05
3,,,,Available,2e-05
7,,,Available,Available,2e-05


#### Between 3 and 4 weeks

Unnamed: 0,locale,legacy,collaborative,similarity,Relative counts
1,Available,,,,0.47828
2,Available,,Available,,0.27232
0,Available,,Available,Available,0.12519
4,Available,,,Available,0.12397
6,,,,,0.00015
5,,,Available,,5e-05
3,,,,Available,2e-05
7,,,Available,Available,2e-05


### Filtering on the submission date, Python-like exclusive ranges – All in one table

Unnamed: 0,locale,legacy,collaborative,similarity,Between 0 and 1 weeks,Between 1 and 2 weeks,Between 2 and 3 weeks,Between 3 and 4 weeks,"Total, without any condition"
0,Available,,Available,Available,0.30124,0.19097,0.13411,0.12519,0.15043
2,Available,,Available,,0.25782,0.24523,0.27125,0.27232,0.26897
1,Available,,,,0.25026,0.37644,0.46093,0.47828,0.44747
4,Available,,,Available,0.19053,0.18718,0.13353,0.12397,0.1329
3,,,,Available,5e-05,2e-05,2e-05,2e-05,3e-05
6,,,,,5e-05,8e-05,9e-05,0.00015,0.00011
7,,,Available,Available,3e-05,3e-05,2e-05,2e-05,3e-05
5,,,Available,,2e-05,4e-05,5e-05,5e-05,6e-05
