# Users without task impressions

This work is tracked in [T243571](https://phabricator.wikimedia.org/T243571). Per the task description, there are some users who have activated the Newcomer Tasks module, but do not have any task impression events. What's going on with those?

Marshall sent me some user IDs through email. The first one on Arabic is on mobile, and I note that on the mobile site the user has to click on the Newcomer Task module to get a task, otherwise they just see an overview.

The Czech user is on desktop, but appears to have only a couple of visits to the Homepage. The two visits are 4 seconds apart, and they happen about 40 seconds after the user registered. On the first visit, the SE module loads correctly and fetches tasks, but no task impression is shown. On the second visit, the SE module doesn't complete fetching tasks. Maybe the user navigated away before it finished loading?

The Vietnamese user activates the SE module, gets an SE module impression, but no task fetch impression. Again, did they leave before it got done loading?

The key to this QA is to split by desktop and mobile. For desktop events, identify all SE impression events with a complete state, and join them with SE task fetch events. For mobile events, identify what proportion of users with summary impressions of an activated SE module also clicked through to view the overlay. For those who clicked through to the overlay, check whether tasks finished loading.
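The desktop plan above amounts to a left join from impression sessions to fetch and task-impression sessions. Here is a minimal sketch on toy data; the frames, the `token` column and the `flag_funnel` helper are illustrative assumptions, not the production query further down.

```python
import pandas as pd

def flag_funnel(impressions, fetches, task_impressions):
    """Left-join impression sessions against fetch and task-impression sessions."""
    keys = ['wiki', 'token']
    merged = (impressions[keys].drop_duplicates()
              .merge(fetches[keys].drop_duplicates().assign(did_fetch=True),
                     on=keys, how='left')
              .merge(task_impressions[keys].drop_duplicates().assign(did_impress=True),
                     on=keys, how='left'))
    # Sessions without a match come back as NaN from the left join
    merged['did_fetch'] = merged['did_fetch'].notna()
    merged['did_impress'] = merged['did_impress'].notna()
    return merged

# Toy data (hypothetical tokens): three impression sessions, two fetches, one task impression
impressions = pd.DataFrame({'wiki': ['cswiki', 'cswiki', 'viwiki'],
                            'token': ['a', 'b', 'c']})
fetches = pd.DataFrame({'wiki': ['cswiki', 'viwiki'], 'token': ['a', 'c']})
task_impressions = pd.DataFrame({'wiki': ['cswiki'], 'token': ['a']})

funnel = flag_funnel(impressions, fetches, task_impressions)
print(funnel)
```

Every impression session stays in the result, so sessions that never fetched or never showed a task remain visible as `False` rather than dropping out.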

In [14]:
import datetime as dt

import pandas as pd
import numpy as np

from wmfdata import hive, mariadb

from growth import utils

from tabulate import tabulate

In [2]:
## Configuration variables

## Czech Wikipedia setup:

## User IDs of known users to exclude (Stephane, Elena, and Marshall's accounts)
cs_known_users = set([322106, 339583, 341191, 341611, 433381, 433382, 433511, 404765, 421667,
                      427625, 437386, 181724, 272273, 339583, 437386, 439783, 439792, 138342,
                      392634, 404765, 275298, 458487, 458049])

cs_start_timestamp = dt.datetime(2019, 11, 21, 0, 24, 0)
cs_end_timestamp = dt.datetime(2020, 10, 5, 0, 0, 0)

## Korean Wikipedia setup:

## User IDs of known users to exclude
ko_known_users = set([384066, 539296, 539299, 539302, 539303, 539304, 539305, 539306, 539307,
                      539298, 416361, 416360, 413162, 495265, 518393, 518394, 518396, 530285,
                      531579, 531785, 536786, 536787, 542720, 542721, 542722, 543192, 543193,
                      544145, 544283, 470932, 38759, 555673])

ko_start_timestamp = cs_start_timestamp
ko_end_timestamp = cs_end_timestamp

## Vietnamese Wikipedia setup:

vi_known_users = set()

vi_start_timestamp = cs_start_timestamp
vi_end_timestamp = cs_end_timestamp

## Arabic Wikipedia setup:

ar_known_users = set([1683215, 1696618, 1696619])

ar_start_timestamp = cs_start_timestamp
ar_end_timestamp = cs_end_timestamp

SPARK_CONFIG = {
    "spark.dynamicAllocation.maxExecutors": 128,
    "spark.executor.memory": "1g",
    "spark.executor.cores": 2
}

In [3]:
## Grab the user IDs of known test accounts so they can be added to the exclusion list

username_patterns = ["MMiller", "Zilant", "Roan", "KHarlan", "MWang", "SBtest", "Rho2019"]

known_user_query = '''
SELECT user_id
FROM user
WHERE user_name LIKE "{name_pattern}%"
'''

for u_pattern in username_patterns:
    cs_new_known = mariadb.run(known_user_query.format(
        name_pattern = u_pattern), 'cswiki')
    cs_known_users = cs_known_users | set(cs_new_known['user_id'])

for u_pattern in username_patterns:
    ko_new_known = mariadb.run(known_user_query.format(
        name_pattern = u_pattern), 'kowiki')
    ko_known_users = ko_known_users | set(ko_new_known['user_id'])
    
for u_pattern in username_patterns:
    vi_new_known = mariadb.run(known_user_query.format(
        name_pattern = u_pattern), 'viwiki')
    vi_known_users = vi_known_users | set(vi_new_known['user_id'])
    
for u_pattern in username_patterns:
    ar_new_known = mariadb.run(known_user_query.format(
        name_pattern = u_pattern), 'arwiki')
    ar_known_users = ar_known_users | set(ar_new_known['user_id'])
    
known_users = {
    'cswiki' : cs_known_users,
    'kowiki' : ko_known_users,
    'viwiki' : vi_known_users,
    'arwiki' : ar_known_users
}
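Repeating the lookup loop once per wiki makes it easy to query the wrong database name. A single loop over the wikis could look like the sketch below. It assumes a hypothetical `run_query(sql, wiki)` callable that stands in for `mariadb.run` and returns a list of user IDs; the fake runner and the IDs it returns are made up for illustration.

```python
def collect_known_users(run_query, seed_ids_by_wiki, patterns):
    """Extend each wiki's seed set with the IDs matching every username pattern."""
    query = 'SELECT user_id FROM user WHERE user_name LIKE "{name_pattern}%"'
    known = {}
    for wiki, seed_ids in seed_ids_by_wiki.items():
        ids = set(seed_ids)
        for pattern in patterns:
            # The wiki name comes from the loop variable, never from a copied literal
            ids |= set(run_query(query.format(name_pattern=pattern), wiki))
        known[wiki] = ids
    return known

# Hypothetical stand-in for the database call, recording which wiki was queried
queried = []
def fake_run_query(sql, wiki):
    queried.append(wiki)
    return [100] if wiki == 'arwiki' else [200]

known = collect_known_users(
    fake_run_query,
    {'viwiki': set(), 'arwiki': {1683215}},
    ['MMiller', 'Roan'])
print(known)
```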

## Desktop

In [7]:
def get_desktop_events(known_users):
    '''
    For all sessions with an impression of a completed SE module, left join it with all sessions
    with at least one complete SE task fetch event, and left join it with all sessions with at
    least one SE task impression event. This only looks for desktop events, because mobile
    events have a different workflow.
    '''
    
    event_query = '''
    SELECT hpm.wiki, hpm.homepage_pageview_token AS impression_token,
           fetch.homepage_pageview_token AS fetch_token,
           task.homepage_pageview_token AS task_impression_token
    FROM (
        SELECT DISTINCT wiki, event.homepage_pageview_token
        FROM event.homepagemodule
        WHERE (year = 2020 OR (year = 2019 AND month >= 11))
        AND event.action = "impression"
        AND event.module = "suggested-edits"
        AND event.is_mobile = false
        AND event.state = "activated"
        AND (
            (wiki = "cswiki" AND event.user_id NOT IN ({cs_known}))
            OR (wiki = "kowiki" AND event.user_id NOT IN ({ko_known}))
            OR (wiki = "viwiki" AND event.user_id NOT IN ({vi_known}))
            OR (wiki = "arwiki" AND event.user_id NOT IN ({ar_known}))
        )
    ) AS hpm
    LEFT JOIN (
        SELECT DISTINCT wiki, event.homepage_pageview_token
        FROM event.homepagemodule
        WHERE (year = 2020 OR (year = 2019 AND month >= 11))
        AND event.action = "se-fetch-tasks"
        AND event.is_mobile = false    
    ) AS fetch
    ON (hpm.wiki = fetch.wiki
        AND hpm.homepage_pageview_token = fetch.homepage_pageview_token)
    LEFT JOIN (
        SELECT DISTINCT wiki, event.homepage_pageview_token
        FROM event.homepagemodule
        WHERE (year = 2020 OR (year = 2019 AND month >= 11))
        AND event.action = "se-task-impression"
        AND event.is_mobile = false
    ) AS task
    ON (hpm.wiki = task.wiki
        AND hpm.homepage_pageview_token = task.homepage_pageview_token)
    '''
    
    return(hive.run(event_query.format(
        cs_known = ','.join([str(uid) for uid in known_users['cswiki']]),
        ko_known = ','.join([str(uid) for uid in known_users['kowiki']]),
        vi_known = ','.join([str(uid) for uid in known_users['viwiki']]),
        ar_known = ','.join([str(uid) for uid in known_users['arwiki']])
    ), spark_config = SPARK_CONFIG))
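One fragile spot in the formatting above is an empty exclusion set, which would render as `NOT IN ()`, a form most SQL dialects reject. A minimal sketch of a guard follows. The `sql_id_list` helper is hypothetical, and the `-1` placeholder rests on the assumption that no real user has a negative ID.

```python
def sql_id_list(user_ids, placeholder=-1):
    """Render user IDs as a comma-separated list that is never empty."""
    # int() also rejects anything that is not a plain number before it reaches the query
    ids = sorted(int(uid) for uid in user_ids)
    if not ids:
        # Assumption: the placeholder matches no real user, so the filter becomes a no-op
        ids = [placeholder]
    return ','.join(str(uid) for uid in ids)

print(sql_id_list({1696619, 1683215, 1696618}))
print(sql_id_list(set()))
```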

In [9]:
desktop_events = get_desktop_events(known_users)

In [10]:
## Binary variable for having a fetch-tasks event
desktop_events['did_fetch'] = pd.notna(desktop_events['fetch_token'])

## Binary variable for having a task impression (regardless of whether a fetch was logged)
desktop_events['did_impress'] = pd.notna(desktop_events['task_impression_token'])

Per-wiki aggregation of whether sessions had a "se-fetch-tasks" event or not:

In [11]:
desktop_fetch_agg = (desktop_events.groupby(['wiki', 'did_fetch'])
                     .agg({'impression_token' : 'count'})
                     .rename(columns = {'impression_token' : 'n_sessions'}))
desktop_fetch_agg['perc'] = (100 * desktop_fetch_agg['n_sessions'] /
                          desktop_fetch_agg.groupby('wiki')['n_sessions'].transform('sum'))
desktop_fetch_agg.round(1)

| wiki   | did_fetch   |   n_sessions |   perc |
|--------|-------------|--------------|--------|
| arwiki | False       |          350 |    4.1 |
| arwiki | True        |         8187 |   95.9 |
| cswiki | False       |          106 |    2.3 |
| cswiki | True        |         4545 |   97.7 |
| kowiki | False       |           72 |    3.6 |
| kowiki | True        |         1933 |   96.4 |
| viwiki | False       |          191 |    5.6 |
| viwiki | True        |         3211 |   94.4 |
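The count-then-percentage pattern used here recurs throughout the notebook. As a sketch of how it works, the hypothetical helper below (`share_within` is not part of the notebook) counts sessions per group and divides by the per-wiki total, shown on made-up toy data.

```python
import pandas as pd

def share_within(df, group_cols, within_levels, count_col='impression_token'):
    """Count sessions per group and express each count as a percentage of its parent group."""
    agg = (df.groupby(group_cols)
             .agg({count_col: 'count'})
             .rename(columns={count_col: 'n_sessions'}))
    # transform('sum') broadcasts the parent total back onto every row of that parent
    totals = agg.groupby(level=within_levels)['n_sessions'].transform('sum')
    agg['perc'] = 100 * agg['n_sessions'] / totals
    return agg

# Toy data: four sessions on one wiki, two on another
toy = pd.DataFrame({
    'wiki': ['cswiki'] * 4 + ['kowiki'] * 2,
    'impression_token': list('abcdef'),
    'did_fetch': [True, True, True, False, True, False]})

toy_agg = share_within(toy, ['wiki', 'did_fetch'], ['wiki'])
print(toy_agg.round(1))
```

The choice of `within_levels` sets the denominator: grouping on `['wiki']` gives shares of all sessions on a wiki, while adding a second level gives shares conditional on that level.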


In [None]:
print(tabulate(desktop_fetch_agg.round(1), tablefmt = 'github', headers = 'keys'))

Similar aggregation, but this time also whether the session had an impression event:

In [12]:
desktop_impress_agg = (desktop_events.groupby(['wiki', 'did_fetch', 'did_impress'])
                     .agg({'impression_token' : 'count'})
                     .rename(columns = {'impression_token' : 'n_sessions'}))
desktop_impress_agg['perc'] = (100 * desktop_impress_agg['n_sessions'] /
                          desktop_impress_agg.groupby(['wiki'])['n_sessions'].transform('sum'))
desktop_impress_agg.round(1)

| wiki   | did_fetch   | did_impress   |   n_sessions |   perc |
|--------|-------------|---------------|--------------|--------|
| arwiki | False       | False         |          348 |    4.1 |
| arwiki | False       | True          |            2 |    0.0 |
| arwiki | True        | False         |          243 |    2.8 |
| arwiki | True        | True          |         7944 |   93.1 |
| cswiki | False       | False         |          106 |    2.3 |
| cswiki | True        | False         |           53 |    1.1 |
| cswiki | True        | True          |         4492 |   96.6 |
| kowiki | False       | False         |           72 |    3.6 |
| kowiki | True        | False         |           79 |    3.9 |
| kowiki | True        | True          |         1854 |   92.5 |


In [23]:
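## Scope the Yes/No labels to the boolean columns. A frame-wide replace keyed on
## True/False can also rewrite the numbers 1 and 0.0 in the count and percentage columns.
yes_no = {True : 'Yes', False : 'No'}
print(tabulate(desktop_impress_agg.reset_index().round(1).replace({
    'did_fetch' : yes_no,
    'did_impress' : yes_no,
    'wiki' : {'arwiki' : 'Arabic',
              'cswiki' : 'Czech',
              'kowiki' : 'Korean',
              'viwiki' : 'Vietnamese'}}).rename(
    columns = {'wiki' : 'Wiki',
               'did_fetch' : '`se-fetch-tasks`',
               'did_impress' : '`se-task-impression`',
               'n_sessions' : 'N sessions',
               'perc' : 'Percentage'}),
               tablefmt = 'github', headers = 'keys', showindex = False))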

| Wiki       | `se-fetch-tasks`   | `se-task-impression`   |   N sessions | Percentage   |
|------------|--------------------|------------------------|--------------|--------------|
| Arabic     | No                 | No                     |          348 | 4.1          |
| Arabic     | No                 | Yes                    |            2 | 0.0          |
| Arabic     | Yes                | No                     |          243 | 2.8          |
| Arabic     | Yes                | Yes                    |         7944 | 93.1         |
| Czech      | No                 | No                     |          106 | 2.3          |
| Czech      | Yes                | No                     |           53 | 1.1          |
| Czech      | Yes                | Yes                    |         4492 | 96.6         |
| Korean     | No                 | No                     |           72 | 3.6          |
| Korean     | Yes                | No                     |           79 | 3.9          |

We see that some sessions have a completed task fetch but no task impression event. As a share of all sessions on the wiki, this varies from 1.1% on Czech to 3.9% on Korean, with Arabic and Vietnamese in between. These proportions don't seem alarming to me, unless the two events are supposed to fire in immediate succession. We also do not know how many of these stem from unreliable EventLogging.
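The percentages in the table use all sessions on a wiki as the denominator. As a rough cross-check, the rate among only those sessions that completed a fetch can be recomputed from the session counts in the `In [12]` output. The helper name is illustrative, and Vietnamese is left out because its rows do not appear in that output.

```python
def missing_impression_rate(no_impression, with_impression):
    """Percentage of fetched sessions that have no task impression event."""
    return 100 * no_impression / (no_impression + with_impression)

# (fetched without task impression, fetched with task impression), from the In [12] output
fetched_counts = {
    'arwiki': (243, 7944),
    'cswiki': (53, 4492),
    'kowiki': (79, 1854),
}

rates = {wiki: missing_impression_rate(*counts)
         for wiki, counts in fetched_counts.items()}
for wiki, rate in rates.items():
    print(f'{wiki}: {rate:.1f}% of fetched sessions lack a task impression')
```

Conditioning on a completed fetch raises each rate only slightly, since nearly all sessions fetch tasks, so the reading of the table stays the same.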

We can also see that there are a handful of sessions that do *not* have a task fetch event, but *do* have a task impression event. In that case, as well as in the previous case, I suspect this is partly due to EventLogging not being completely reliable.

## Mobile events

As mentioned above, identify what proportion of users with summary impressions of an activated SE module also clicked through to view the overlay. For those who clicked through to the overlay, see if tasks finished loading.

"mobile-details" is the server-side version of the module, used if the user clicks through before all JavaScript is loaded. Data from HomepageModule shows that there are an order of magnitude or two fewer such events than "mobile-overlay" events, so I'm going to ignore them for this analysis.

In [24]:
def get_mobile_events(known_users):
    '''
    See description above.
    '''

    ## The SE funnel on mobile is:
    ## 1: mobile-summary impression with state "activated"
    ## 2: mobile-overlay impression with state "activated" (user clicked to see the suggested edits)
    ## 3: se-fetch-tasks
    ## 4: se-task-impression
    
    event_query = '''
    SELECT hpm.wiki, hpm.homepage_pageview_token AS impression_token,
           overlay.homepage_pageview_token AS overlay_token,
           fetch.homepage_pageview_token AS fetch_token,
           task_impression.homepage_pageview_token AS task_impression_token
    FROM (
        SELECT DISTINCT wiki, event.homepage_pageview_token
        FROM event.homepagemodule
        WHERE (year = 2020 OR (year = 2019 AND month >= 11))
        AND event.action = "impression"
        AND event.module = "suggested-edits"
        AND event.mode = "mobile-summary"
        AND event.state = "activated"
        AND event.is_mobile = true
        AND (
            (wiki = "cswiki" AND event.user_id NOT IN ({cs_known}))
            OR (wiki = "kowiki" AND event.user_id NOT IN ({ko_known}))
            OR (wiki = "viwiki" AND event.user_id NOT IN ({vi_known}))
            OR (wiki = "arwiki" AND event.user_id NOT IN ({ar_known}))
        )
    ) AS hpm
    LEFT JOIN (
        SELECT DISTINCT wiki, event.homepage_pageview_token
        FROM event.homepagemodule
        WHERE (year = 2020 OR (year = 2019 AND month >= 11))
        AND event.action = "impression"
        AND event.module = "suggested-edits"
        AND event.mode = "mobile-overlay"
        AND event.is_mobile = true
    ) AS overlay
    ON (hpm.wiki = overlay.wiki
        AND hpm.homepage_pageview_token = overlay.homepage_pageview_token)
    LEFT JOIN (
        SELECT DISTINCT wiki, event.homepage_pageview_token
        FROM event.homepagemodule
        WHERE (year = 2020 OR (year = 2019 AND month >= 11))
        AND event.action = "se-fetch-tasks"
        AND event.is_mobile = true
    ) AS fetch
    ON (hpm.wiki = fetch.wiki
        AND hpm.homepage_pageview_token = fetch.homepage_pageview_token)
    LEFT JOIN (
        SELECT DISTINCT wiki, event.homepage_pageview_token
        FROM event.homepagemodule
        WHERE (year = 2020 OR (year = 2019 AND month >= 11))
        AND event.action = "se-task-impression"
        AND event.is_mobile = true
    ) AS task_impression
    ON (hpm.wiki = task_impression.wiki
        AND hpm.homepage_pageview_token = task_impression.homepage_pageview_token)
    '''
    
    return(hive.run(event_query.format(
        cs_known = ','.join([str(uid) for uid in known_users['cswiki']]),
        ko_known = ','.join([str(uid) for uid in known_users['kowiki']]),
        vi_known = ','.join([str(uid) for uid in known_users['viwiki']]),
        ar_known = ','.join([str(uid) for uid in known_users['arwiki']])
    ), spark_config = SPARK_CONFIG))

In [25]:
mobile_events = get_mobile_events(known_users)

In [26]:
## Binary variable for having an overlay event
mobile_events['did_overlay'] = pd.notna(mobile_events['overlay_token'])

## Binary variable for having a fetch-tasks event
mobile_events['did_fetch'] = pd.notna(mobile_events['fetch_token'])

## Binary variable for having a task impression
mobile_events['did_task_impress'] = pd.notna(mobile_events['task_impression_token'])
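With the three flags in place, each session can be placed on the four-step mobile funnel described in the query comments. The sketch below is a hypothetical helper pair, not part of the notebook: one reports the furthest step reached without a gap, the other flags sessions where a later event exists without an earlier one.

```python
FUNNEL_STEPS = ['summary', 'overlay', 'fetch', 'task_impression']

def furthest_step(did_overlay, did_fetch, did_task_impress):
    """Return the last funnel step reached, stopping at the first missing event."""
    reached = FUNNEL_STEPS[0]  # every session in the data has a summary impression
    for step, flag in zip(FUNNEL_STEPS[1:], (did_overlay, did_fetch, did_task_impress)):
        if not flag:
            break
        reached = step
    return reached

def skipped_a_step(did_overlay, did_fetch, did_task_impress):
    """True when a later event was logged without the one before it."""
    flags = [True, did_overlay, did_fetch, did_task_impress]
    return any(later and not earlier for earlier, later in zip(flags, flags[1:]))

print(furthest_step(True, True, False))
print(skipped_a_step(False, True, True))
```

Sessions flagged by `skipped_a_step` are candidates for dropped events rather than user behaviour, since the workflow should not allow a fetch without an overlay.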

## Clicking through to the mobile overlay

First, we aggregate to what extent users clicked through to the overlay. Because we're using HomepageModule to identify sessions, users who have those events should have completed loading the Homepage such that they would also potentially have a "mobile-overlay" event. In other words, users who left the Homepage before the page finished loading won't be part of this analysis.

In [27]:
mobile_overlay_agg = (mobile_events.groupby(['wiki', 'did_overlay'])
                     .agg({'impression_token' : 'count'})
                     .rename(columns = {'impression_token' : 'n_sessions'}))
mobile_overlay_agg['perc'] = (100 * mobile_overlay_agg['n_sessions'] /
                          mobile_overlay_agg.groupby(['wiki'])['n_sessions'].transform('sum'))
mobile_overlay_agg.round(1)

| wiki   | did_overlay   |   n_sessions |   perc |
|--------|---------------|--------------|--------|
| arwiki | False         |         9929 |   84.0 |
| arwiki | True          |         1896 |   16.0 |
| cswiki | False         |         1234 |   87.8 |
| cswiki | True          |          172 |   12.2 |
| kowiki | False         |         5279 |   93.5 |
| kowiki | True          |          368 |    6.5 |
| viwiki | False         |         1017 |   85.6 |
| viwiki | True          |          171 |   14.4 |


In [29]:
## Scope the Yes/No labels to the boolean column so numeric cells are left alone
print(tabulate(mobile_overlay_agg.reset_index().round(1).replace({
    'did_overlay' : {True : 'Yes', False : 'No'},
    'wiki' : {'arwiki' : 'Arabic',
              'cswiki' : 'Czech',
              'kowiki' : 'Korean',
              'viwiki' : 'Vietnamese'}}).rename(
    columns = {'wiki' : 'Wiki',
               'did_overlay' : 'Saw overlay',
               'n_sessions' : 'N sessions',
               'perc' : 'Percentage'}),
               tablefmt = 'github', headers = 'keys', showindex = False))

| Wiki       | Saw overlay   |   N sessions |   Percentage |
|------------|---------------|--------------|--------------|
| Arabic     | No            |         9929 |         84   |
| Arabic     | Yes           |         1896 |         16   |
| Czech      | No            |         1234 |         87.8 |
| Czech      | Yes           |          172 |         12.2 |
| Korean     | No            |         5279 |         93.5 |
| Korean     | Yes           |          368 |          6.5 |
| Vietnamese | No            |         1017 |         85.6 |
| Vietnamese | Yes           |          171 |         14.4 |


For three of the four wikis, we see a click-through rate in the 12–16% range (Czech 12.2%, Vietnamese 14.4%, Arabic 16.0%), while on Korean it's much lower (6.5%). Maybe our call-to-action could be stronger, making it more attractive for users to explore the suggested edits?
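To put the Korean rate in context, the click-through rates can be recomputed from the session counts in the overlay table, together with a pooled rate for the other three wikis. This is a quick arithmetic sketch; pooling the three wikis is my own simplification for comparison, not something the analysis above relies on.

```python
# (sessions without overlay, sessions with overlay), from the overlay table
overlay_counts = {
    'Arabic': (9929, 1896),
    'Czech': (1234, 172),
    'Korean': (5279, 368),
    'Vietnamese': (1017, 171),
}

# Per-wiki click-through rate in percent
click_through = {wiki: 100 * yes / (no + yes)
                 for wiki, (no, yes) in overlay_counts.items()}

# Pooled rate for every wiki except Korean
others = [counts for wiki, counts in overlay_counts.items() if wiki != 'Korean']
pooled_other = (100 * sum(yes for _, yes in others)
                / sum(no + yes for no, yes in others))

for wiki, rate in click_through.items():
    print(f'{wiki}: {rate:.1f}%')
print(f'Pooled (excluding Korean): {pooled_other:.1f}%')
```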

## Did they fetch tasks?

For users who did click through to the overlay, to what extent did their tasks complete loading?

In [48]:
mobile_fetch_agg = (mobile_events.loc[mobile_events['did_overlay'] == True]
                    .groupby(['wiki', 'did_fetch'])
                    .agg({'impression_token' : 'count'})
                    .rename(columns = {'impression_token' : 'n_sessions'}))
mobile_fetch_agg['perc'] = (100 * mobile_fetch_agg['n_sessions'] /
                          mobile_fetch_agg.groupby(['wiki'])['n_sessions'].transform('sum'))
mobile_fetch_agg.round(1)

| wiki     | did_fetch   |   n_sessions |   perc |
|----------|-------------|--------------|--------|
| arwiki   | False       |           30 |    1.7 |
| arwiki   | True        |         1765 |   98.3 |
| cswiki   | False       |            1 |    0.5 |
| cswiki   | True        |          191 |   99.5 |
| kowiki   | False       |            6 |    1.7 |
| kowiki   | True        |          343 |   98.3 |
| testwiki | True        |           28 |  100.0 |
| viwiki   | False       |            3 |    1.9 |
| viwiki   | True        |          155 |   98.1 |


Only in a few cases do users not complete the task fetching; under 2% looks to be the norm. I doubt this is a cause for concern.

## Did they see a suggested task?

For users who completed loading tasks, did they also have a task impression event?

In [49]:
mobile_task_impression_agg = (mobile_events.loc[mobile_events['did_fetch'] == True]
                              .groupby(['wiki', 'did_task_impress'])
                              .agg({'impression_token' : 'count'})
                              .rename(columns = {'impression_token' : 'n_sessions'}))
mobile_task_impression_agg['perc'] = (100 * mobile_task_impression_agg['n_sessions'] /
                          mobile_task_impression_agg.groupby(['wiki'])['n_sessions'].transform('sum'))
mobile_task_impression_agg.round(1)

| wiki     | did_task_impress   |   n_sessions |   perc |
|----------|--------------------|--------------|--------|
| arwiki   | False              |           22 |    1.2 |
| arwiki   | True               |         1745 |   98.8 |
| cswiki   | False              |            1 |    0.5 |
| cswiki   | True               |          190 |   99.5 |
| kowiki   | False              |            2 |    0.6 |
| kowiki   | True               |          341 |   99.4 |
| testwiki | True               |           28 |  100.0 |
| viwiki   | True               |          155 |  100.0 |


We do see a handful of users on some wikis completing the fetching process without also logging a task impression event. The largest proportion is on Arabic with 1.2%. Again, not a cause for concern?

In [31]:
mobile_impress_agg = (mobile_events.groupby(['wiki', 'did_fetch', 'did_task_impress'])
                     .agg({'impression_token' : 'count'})
                     .rename(columns = {'impression_token' : 'n_sessions'}))
mobile_impress_agg['perc'] = (100 * mobile_impress_agg['n_sessions'] /
                          mobile_impress_agg.groupby(['wiki'])['n_sessions'].transform('sum'))
mobile_impress_agg.round(1)

| wiki   | did_fetch   | did_task_impress   |   n_sessions |   perc |
|--------|-------------|--------------------|--------------|--------|
| arwiki | False       | False              |         9960 |   84.2 |
| arwiki | True        | False              |           23 |    0.2 |
| arwiki | True        | True               |         1842 |   15.6 |
| cswiki | False       | False              |         1235 |   87.8 |
| cswiki | True        | False              |            1 |    0.1 |
| cswiki | True        | True               |          170 |   12.1 |
| kowiki | False       | False              |         5286 |   93.6 |
| kowiki | True        | False              |            2 |    0.0 |
| kowiki | True        | True               |          359 |    6.4 |
| viwiki | False       | False              |         1021 |   85.9 |


In [33]:
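## Scope the Yes/No labels to the boolean columns. A frame-wide replace keyed on
## True/False can also rewrite the numbers 1 and 0.0 in the count and percentage columns.
yes_no = {True : 'Yes', False : 'No'}
print(tabulate(mobile_impress_agg.reset_index().round(1).replace({
    'did_fetch' : yes_no,
    'did_task_impress' : yes_no,
    'wiki' : {'arwiki' : 'Arabic',
              'cswiki' : 'Czech',
              'kowiki' : 'Korean',
              'viwiki' : 'Vietnamese'}}).rename(
    columns = {'wiki' : 'Wiki',
               'did_fetch' : '`se-fetch-tasks`',
               'did_task_impress' : '`se-task-impression`',
               'n_sessions' : 'N sessions',
               'perc' : 'Percentage'}),
               tablefmt = 'github', headers = 'keys', showindex = False))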

| Wiki       | `se-fetch-tasks`   | `se-task-impression`   | N sessions   | Percentage   |
|------------|--------------------|------------------------|--------------|--------------|
| Arabic     | No                 | No                     | 9960         | 84.2         |
| Arabic     | Yes                | No                     | 23           | 0.2          |
| Arabic     | Yes                | Yes                    | 1842         | 15.6         |
| Czech      | No                 | No                     | 1235         | 87.8         |
| Czech      | Yes                | No                     | 1            | 0.1          |
| Czech      | Yes                | Yes                    | 170          | 12.1         |
| Korean     | No                 | No                     | 5286         | 93.6         |
| Korean     | Yes                | No                     | 2            | 0.0          |
| Korean     | Yes                | Yes                    | 359          | 6.4          |

In [52]:
mobile_impress_agg = (mobile_events.groupby(['wiki', 'did_overlay', 'did_fetch', 'did_task_impress'])
                     .agg({'impression_token' : 'count'})
                     .rename(columns = {'impression_token' : 'n_sessions'}))
mobile_impress_agg['perc'] = (100 * mobile_impress_agg['n_sessions'] /
                          mobile_impress_agg.groupby(['wiki', 'did_overlay'])['n_sessions'].transform('sum'))
mobile_impress_agg.round(1)

| wiki   | did_overlay   | did_fetch   | did_task_impress   |   n_sessions |   perc |
|--------|---------------|-------------|--------------------|--------------|--------|
| arwiki | False         | False       | False              |         9299 |  100.0 |
| arwiki | False         | True        | False              |            1 |    0.0 |
| arwiki | False         | True        | True               |            1 |    0.0 |
| arwiki | True          | False       | False              |           30 |    1.7 |
| arwiki | True          | True        | False              |           21 |    1.2 |
| arwiki | True          | True        | True               |         1744 |   97.2 |
| cswiki | False         | False       | False              |         1129 |  100.0 |
| cswiki | True          | False       | False              |            1 |    0.5 |
| cswiki | True          | True        | False              |            1 |    0.5 |
| cswiki | True          | True        | True               |          190 |   99.0 |
