# TAAR – Evaluating existing recommenders

Not every recommender can always make a recommendation. To evaluate the individual recommenders for the ensemble, we want to find out how often this is the case and how well the recommenders complement each other.

This notebook must be run either in the [TAAR](http://github.com/mozilla/taar) repository or in an environment where TAAR is on the Python path, because it imports some of the TAAR recommenders.

## Retrieving the relevant variables from the longitudinal dataset

In [54]:
%%time
frame = sqlContext.sql("""
WITH addons AS (
    SELECT client_id, feature_row.*
    FROM longitudinal
    LATERAL VIEW explode(active_addons[1]) feature_row
),
    
non_system_addons AS (
    SELECT client_id, collect_set(key) AS installed_addons
    FROM addons
    WHERE NOT value.is_system
    GROUP BY client_id
)

SELECT
    l.client_id,
    non_system_addons.installed_addons,
    l.settings[1].locale AS locale,
    l.geo_city[1] AS geoCity,
    subsession_length[1] AS subsessionLength,
    system_os[1].name AS os,
    scalar_parent_browser_engagement_total_uri_count[1].value AS total_uri,
    scalar_parent_browser_engagement_tab_open_event_count[1].value AS tab_open_count,
    places_bookmarks_count[1].sum AS bookmark_count,
    scalar_parent_browser_engagement_unique_domains_count[1].value AS unique_tlds
FROM longitudinal l LEFT OUTER JOIN non_system_addons
ON l.client_id = non_system_addons.client_id
""")

rdd = frame.rdd

CPU times: user 128 ms, sys: 28 ms, total: 156 ms
Wall time: 18min 16s
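To make the two CTEs easier to follow, here is a minimal pure-Python sketch of what `non_system_addons` and the final LEFT OUTER JOIN compute. It assumes the exploded add-on map can be represented as hypothetical `(client_id, addon_id, is_system)` tuples; the function names are illustrative and not part of TAAR or Spark.

```python
from collections import defaultdict


def collect_non_system_addons(addon_rows):
    """Mirror of the `non_system_addons` CTE on plain Python data.

    addon_rows: iterable of (client_id, addon_id, is_system) tuples, a
    hypothetical flattened stand-in for the exploded `active_addons[1]` map.
    """
    installed = defaultdict(set)
    for client_id, addon_id, is_system in addon_rows:
        if not is_system:                       # WHERE NOT value.is_system
            installed[client_id].add(addon_id)  # collect_set(key) ... GROUP BY client_id
    return dict(installed)


def left_outer_join(client_ids, installed):
    # Clients without any non-system addon get None, like NULL in the SQL join.
    return {client_id: installed.get(client_id) for client_id in client_ids}


rows = [("a", "addon1", False), ("a", "sys1", True), ("b", "sys2", True)]
example = left_outer_join(["a", "b", "c"], collect_non_system_addons(rows))
# {'a': {'addon1'}, 'b': None, 'c': None}
```

Client `b` only has a system addon and client `c` has none at all, so both end up with `None`. This is why `complete_client_data` later replaces a missing `installed_addons` value with an empty list.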


## Loading addon data (AMO)

We need to load the addon database to find out which addons are legacy addons.

In [3]:
from taar.recommenders.utils import get_s3_json_content

In [4]:
AMO_DUMP_BUCKET = 'telemetry-parquet'
AMO_DUMP_KEY = 'telemetry-ml/addon_recommender/addons_database.json'

In [5]:
amo_dump = get_s3_json_content(AMO_DUMP_BUCKET, AMO_DUMP_KEY)

## Filtering out legacy addons

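This helper function takes a list of addon IDs and returns only the IDs of legacy addons, i.e. addons where none of the files of the current version is marked as a WebExtension. Addons that are missing from the AMO dump are skipped.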

In [6]:
def get_legacy_addons(installed_addons):
    legacy_addons = []

    for addon_id in installed_addons:
        # Addons that are missing from the AMO dump are skipped.
        if addon_id in amo_dump:
            addon = amo_dump[addon_id]
            addon_files = addon.get('current_version', {}).get('files', [])

            # An addon is a legacy addon if none of its files is a WebExtension.
            is_webextension = any(f.get("is_webextension", False) for f in addon_files)
            is_legacy = not is_webextension

            if is_legacy:
                legacy_addons.append(addon_id)

    return legacy_addons
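Because `get_legacy_addons` reads the global `amo_dump`, it is hard to try out in isolation. The sketch below applies the same filtering logic with the dump passed in as a parameter; the function name and the AMO records are hypothetical and contain only the fields the filter reads.

```python
def get_legacy_addons_from(installed_addons, amo_dump):
    """Same filtering logic as get_legacy_addons, with the AMO dump passed in."""
    legacy_addons = []
    for addon_id in installed_addons:
        if addon_id not in amo_dump:
            continue  # addons unknown to AMO are skipped, not treated as legacy
        addon = amo_dump[addon_id]
        files = addon.get("current_version", {}).get("files", [])
        if not any(f.get("is_webextension", False) for f in files):
            legacy_addons.append(addon_id)
    return legacy_addons


# Hypothetical AMO records with only the fields the filter looks at.
toy_amo_dump = {
    "webext@example.com": {"current_version": {"files": [{"is_webextension": True}]}},
    "legacy@example.com": {"current_version": {"files": [{"is_webextension": False}]}},
    "nofiles@example.com": {"current_version": {}},
}

legacy = get_legacy_addons_from(
    ["webext@example.com", "legacy@example.com", "nofiles@example.com", "unknown@example.com"],
    toy_amo_dump,
)
# ['legacy@example.com', 'nofiles@example.com']
```

The third record shows a consequence of the logic: an addon whose current version lists no files counts as legacy, because `any()` over an empty sequence is `False`.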

## Completing client data

In [7]:
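def complete_client_data(client_data):
    client = client_data.asDict()

    # Clients without non-system addons have installed_addons = None after the LEFT OUTER JOIN.
    client['installed_addons'] = client['installed_addons'] or []
    # Store the client's legacy addons under 'disabled_addon_ids'.
    client['disabled_addon_ids'] = get_legacy_addons(client['installed_addons'])
    # Note: str(None) == 'None', so a missing locale becomes the string 'None', not None.
    client['locale'] = str(client['locale'])

    return client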

## Evaluating the existing recommenders

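To check whether a recommender is able to make a recommendation, it is sometimes easier and cleaner to query it directly via `can_recommend` than to check the relevant attributes ourselves. This is the case for the locale recommender, for example. The similarity recommender, on the other hand, is replaced by a dummy that only checks whether all the fields it requires are present.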

In [8]:
from taar.recommenders import CollaborativeRecommender, LegacyRecommender, LocaleRecommender

In [9]:
class DummySimilarityRecommender:
    """Stand-in for the similarity recommender: only checks that all required fields are set."""
    def can_recommend(self, client_data):
        REQUIRED_FIELDS = ["geoCity", "subsessionLength", "locale", "os", "bookmark_count", "tab_open_count",
                           "total_uri", "unique_tlds"]

        has_fields = all(client_data.get(f) is not None for f in REQUIRED_FIELDS)
        return has_fields

In [10]:
recommenders = {
    "collaborative": CollaborativeRecommender(),
    "legacy": LegacyRecommender(),
    "locale": LocaleRecommender(),
    "similarity": DummySimilarityRecommender()
}

In [11]:
def test_recommenders(client):
    # One boolean per recommender; position i matches the i-th entry of recommenders.keys().
    return tuple(recommender.can_recommend(client) for recommender in recommenders.values())
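The key idea of `test_recommenders` is that each client is reduced to a tuple of booleans, one per recommender. A tuple is hashable, so it can serve as the key for `reduceByKey`, and position `i` corresponds to the `i`-th name in `recommenders.keys()` because an unmodified dict iterates `keys()` and `values()` in the same order. The sketch below illustrates this with hypothetical stub recommenders that only check a single field; they are not TAAR classes.

```python
from collections import OrderedDict


class FieldRecommender(object):
    """Hypothetical stand-in: can recommend iff `field` is present and not None."""

    def __init__(self, field):
        self.field = field

    def can_recommend(self, client_data):
        return client_data.get(self.field) is not None


stub_recommenders = OrderedDict([
    ("has_locale", FieldRecommender("locale")),
    ("has_city", FieldRecommender("geoCity")),
])


def availability_key(client, recommenders):
    # One boolean per recommender, in the iteration order of `recommenders`.
    return tuple(r.can_recommend(client) for r in recommenders.values())


key = availability_key({"locale": "en-US", "geoCity": None}, stub_recommenders)
# (True, False)
```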

## Computing combined counts

We iterate over all clients in the longitudinal dataset, convert their attributes to the format the recommenders expect, query each recommender, and count how many clients share each combination of availability results.

In [12]:
from operator import add
from collections import defaultdict

In [55]:
%%time
results = rdd\
    .map(complete_client_data)\
    .map(test_recommenders)\
    .map(lambda x: (x, 1))\
    .reduceByKey(add)\
    .collect()

CPU times: user 9.54 s, sys: 468 ms, total: 10 s
Wall time: 11min 5s
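On a single machine, the `map(lambda x: (x, 1))` / `reduceByKey(add)` pattern is simply a frequency count. The sketch below uses `collections.Counter` on hypothetical availability keys to show what `results` contains. With four recommenders there are at most 2^4 = 16 distinct keys, so `collect()` only brings a handful of `(key, count)` pairs back to the driver.

```python
from collections import Counter

# Hypothetical availability keys, as test_recommenders would produce them.
keys = [(True, False), (True, True), (True, False), (False, False)]

# Local equivalent of .map(lambda x: (x, 1)).reduceByKey(add).collect()
combined = Counter(keys)

num_clients = sum(combined.values())
# combined[(True, False)] == 2, num_clients == 4
```

Like `defaultdict(int, results)`, a `Counter` returns 0 for combinations that never occurred.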


In [56]:
results = defaultdict(int, results)

In [57]:
num_clients = sum(results.values())

## Computing individual counts

In [58]:
individual_counts = []

# For each recommender, add up the counts of all combinations in which it is available.
for i in range(len(recommenders)):
    count = 0

    for key, key_count in results.items():
        if key[i]:
            count += key_count

    individual_counts.append(count)
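The loop above marginalizes the combined counts: the individual count of recommender `i` is the sum over all combinations whose `i`-th entry is `True`. Here is a compact sketch of the same computation on hypothetical counts (the function name is illustrative):

```python
def individual_counts_from(results, num_recommenders):
    # For recommender i, add up every combination in which key[i] is True.
    return [sum(count for key, count in results.items() if key[i])
            for i in range(num_recommenders)]


results = {(True, False): 2, (True, True): 3, (False, False): 1}
counts = individual_counts_from(results, 2)
# [5, 3]
```

The individual counts generally do not add up to the number of clients (8 vs. 6 here): a client is counted once for every recommender that is available, and not at all if none is.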

## Displaying the results

In [59]:
from pandas import DataFrame

In [60]:
def format_int(num):
    return "{:,}".format(num)

In [61]:
def format_frequency(frequency):
    return "%.5f" % frequency

In [62]:
def get_relative_counts(counts, total=num_clients):
    return [format_frequency(count / float(total)) for count in counts]

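This is a bit hacky. The formatted frequencies are strings rounded to five decimal places, so sorting by them cannot distinguish counts that round to the same value. Instead, we add the unformatted counts as a temporary column, sort the data frame by it, and then drop that column again.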

In [63]:
def sorted_dataframe(df, order, key="unformatted_counts"):
    df[key] = order
    return df.sort_values(by=key, ascending=False).drop(key, axis=1)
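The same idea, sort on the raw numbers and format only for display, can be shown with the standard library alone. In this sketch the counts are hypothetical and chosen so that all three round to the same five-decimal string, which is exactly the case where sorting by the formatted values would lose the true order.

```python
def format_frequency(frequency):
    return "%.5f" % frequency


# Hypothetical counts that all round to the same five-decimal frequency.
counts = {"a": 4, "b": 1, "c": 3}
total = 1000000

# Sort on the raw counts first, format only for display.
by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)
table = [(name, format_frequency(count / float(total))) for name, count in by_count]
# [('a', '0.00000'), ('c', '0.00000'), ('b', '0.00000')]
```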

### Individual counts

In [65]:
df = DataFrame(index=recommenders.keys(),
               columns=["Relative count"],
               data=get_relative_counts(individual_counts))

sorted_dataframe(df, individual_counts)

| | Relative count |
|---|---|
| locale | 0.89227 |
| collaborative | 0.37663 |
| similarity | 0.2752 |
| legacy | 0.0147 |


$\implies$ The locale recommender can make a recommendation for most clients (about 89%), and the collaborative recommender for more than a third of them (about 38%). The legacy recommender can only rarely make a recommendation (about 1.5% of clients), as few users seem to have (legacy) addons installed.

### Combined counts

It's interesting to see how well the individual recommenders complement each other. In the following, we count how often different combinations of the recommenders can make recommendations.

The table is easier to read when the cells of unavailable recommenders are left empty. To use different labels, change these variables:

In [98]:
recommender_available_label = "Available"
recommender_unavailable_label = ""

In [99]:
def format_labels(keys):
    return tuple([recommender_available_label if key else recommender_unavailable_label for key in keys])

In [100]:
def format_data(keys, counts):
    formatted_keys = map(format_labels, keys)
    # counts is a list holding a single column of counts, so zip yields (labels, count) pairs.
    return [elems + (count,) for elems, count in zip(formatted_keys, *counts)]
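`format_data` unpacks exactly one count column, which is why it is called with `[counts]`. The sketch below is a hypothetical generalization that accepts any number of count columns, so it could also build the two-column rows that the grouped tables further down assemble by hand. The name `format_rows` is illustrative only.

```python
recommender_available_label = "Available"
recommender_unavailable_label = ""


def format_labels(keys):
    return tuple(recommender_available_label if key else recommender_unavailable_label
                 for key in keys)


def format_rows(keys, count_columns):
    """Hypothetical generalization of format_data to any number of count columns.

    count_columns is a list of equally long lists, one list per numeric column.
    """
    formatted_keys = [format_labels(key) for key in keys]
    # zip(*count_columns) turns the columns into one tuple of counts per row.
    return [labels + tuple(row) for labels, row in zip(formatted_keys, zip(*count_columns))]


keys = [(True, False, True), (False, False, False)]
rows = format_rows(keys, [["0.60000", "0.40000"]])
# [('Available', '', 'Available', '0.60000'), ('', '', '', '0.40000')]
two_columns = format_rows(keys, [["0.60000", "0.40000"], ["0.75000", "0.25000"]])
```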

In [101]:
columns = list(recommenders.keys()) + ["Relative counts"]

counts = get_relative_counts(results.values())
data = format_data(results.keys(), [counts])

In [102]:
df = DataFrame(columns=columns, data=data)
sorted_dataframe(df, results.values())

| | locale | legacy | collaborative | similarity | Relative counts |
|---|---|---|---|---|---|
| 1 | Available | | | | 0.38608 |
| 3 | Available | | Available | | 0.22323 |
| 0 | Available | | Available | Available | 0.13861 |
| 5 | Available | | | Available | 0.12965 |
| 2 | | | | | 0.10762 |
| 10 | Available | Available | Available | | 0.00781 |
| 7 | Available | Available | Available | Available | 0.00689 |
| 6 | | | Available | | 5e-05 |
| 9 | | | Available | Available | 3e-05 |
| 11 | | | | Available | 3e-05 |


$\implies$ If any recommender is available, the locale recommender is almost always available too. Beyond that, there is a good chance that the collaborative recommender is available.
Only in a very small share of cases could the similarity recommender make a recommendation when the locale and collaborative recommenders could not, and there is not a single such case for the legacy recommender.
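As a rough consistency check of these conclusions, the sketch below recomputes a few aggregates from the rounded relative counts in the combined table. The combinations that show up as 0.0 in the grouped tables are missing from that table, so the results are approximate.

```python
# Relative counts from the combined table, keyed by the availability of
# (locale, legacy, collaborative, similarity), i.e. the table's column order.
combined = {
    (True, False, False, False): 0.38608,
    (True, False, True, False): 0.22323,
    (True, False, True, True): 0.13861,
    (True, False, False, True): 0.12965,
    (False, False, False, False): 0.10762,
    (True, True, True, False): 0.00781,
    (True, True, True, True): 0.00689,
    (False, False, True, False): 0.00005,
    (False, False, True, True): 0.00003,
    (False, False, False, True): 0.00003,
}
LOCALE, LEGACY, COLLABORATIVE, SIMILARITY = range(4)

locale_rate = sum(f for key, f in combined.items() if key[LOCALE])
any_rate = sum(f for key, f in combined.items() if any(key))
without_locale = any_rate - locale_rate
legacy_without_collaborative = sum(
    f for key, f in combined.items() if key[LEGACY] and not key[COLLABORATIVE])
```

The locale rate comes out at 0.89227, matching the individual count, while at least one recommender is available for 0.89238 of clients. So only about 0.011% of clients have some recommender available without the locale recommender, and no listed combination has the legacy recommender without the collaborative one.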

### Grouped by number of available recommenders

In [92]:
from itertools import groupby
from operator import itemgetter

In [93]:
from IPython.display import display, Markdown

In [103]:
for num, group in groupby(sorted(results.keys(), key=sum), sum):
    display(Markdown("#### %d available recommender%s" % (num, "s" if num != 1 else "")))
    
    sub_keys = list(group)
    formatted_keys = map(format_labels, sub_keys)
    
    sub_counts = [results[key] for key in sub_keys]
    sub_counts_to_total = get_relative_counts(sub_counts)
    sub_counts_to_table = get_relative_counts(sub_counts, sum(sub_counts))
    
    zipped_data = zip(formatted_keys, sub_counts_to_total, sub_counts_to_table)
    data = [elems + (counts, table_counts) for elems, counts, table_counts in zipped_data]
    
    columns = list(recommenders.keys()) + ["Relative to all", "Relative to this table"]
    
    df = DataFrame(columns=columns, data=data)
    df = sorted_dataframe(df, sub_counts)
    display(df)
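Stripped of the display code, the loop above groups the availability keys by how many recommenders are available and computes two shares per key: relative to all clients and relative to the group. Here is a minimal stdlib sketch on hypothetical counts; note that `itertools.groupby` only merges adjacent items, which is why the keys are sorted by the same `sum` key first.

```python
from itertools import groupby


def group_by_num_available(results):
    """Group availability keys by the number of available recommenders.

    Returns a list of (num_available, {key: (relative_to_all, relative_to_group)}).
    """
    total = float(sum(results.values()))
    grouped = []
    # groupby only merges adjacent items, so sort by the same key first.
    for num, group in groupby(sorted(results, key=sum), key=sum):
        keys = list(group)
        group_total = float(sum(results[key] for key in keys))
        shares = {key: (results[key] / total, results[key] / group_total) for key in keys}
        grouped.append((num, shares))
    return grouped


# Hypothetical counts for two recommenders.
results = {(True, False): 6, (False, True): 2, (True, True): 1, (False, False): 1}
grouped = group_by_num_available(results)
```

Within each group, the "relative to this table" shares add up to 1, as in the tables below.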

#### 0 available recommenders

| | locale | legacy | collaborative | similarity | Relative to all | Relative to this table |
|---|---|---|---|---|---|---|
| 0 | | | | | 0.10762 | 1.0 |


#### 1 available recommender

| | locale | legacy | collaborative | similarity | Relative to all | Relative to this table |
|---|---|---|---|---|---|---|
| 0 | Available | | | | 0.38608 | 0.99979 |
| 1 | | | Available | | 5e-05 | 0.00014 |
| 2 | | | | Available | 3e-05 | 7e-05 |


#### 2 available recommenders

| | locale | legacy | collaborative | similarity | Relative to all | Relative to this table |
|---|---|---|---|---|---|---|
| 0 | Available | | Available | | 0.22323 | 0.63255 |
| 1 | Available | | | Available | 0.12965 | 0.36736 |
| 3 | | | Available | Available | 3e-05 | 8e-05 |
| 2 | | Available | Available | | 0.0 | 1e-05 |


#### 3 available recommenders

| | locale | legacy | collaborative | similarity | Relative to all | Relative to this table |
|---|---|---|---|---|---|---|
| 0 | Available | | Available | Available | 0.13861 | 0.94667 |
| 2 | Available | Available | Available | | 0.00781 | 0.05332 |
| 1 | | Available | Available | Available | 0.0 | 1e-05 |


#### 4 available recommenders

| | locale | legacy | collaborative | similarity | Relative to all | Relative to this table |
|---|---|---|---|---|---|---|
| 0 | Available | Available | Available | Available | 0.00689 | 1.0 |
