# Welcome Survey post-deployment QA

This notebook runs a set of sanity checks to give an indication of whether the Welcome Survey works as expected or not. It's largely based on the leading indicators defined for the first Welcome Survey deployment on the Czech and Korean Wikipedias.

## Configuration and setup

Below are a set of cells that import libraries and need setting up to support the given wiki we've deployed to.

In [1]:
import json
import datetime as dt
import pandas as pd

from wmfdata import hive
from growth import utils, db

You can find the source for `wmfdata` at https://github.com/neilpquinn/wmfdata


In [2]:
## Configuration variables

## Wiki we've deployed to
wiki = 'arwiki'

## Start timestamp from T226221#5334213, end timestamp set reasonably
start_ts = dt.datetime(2019, 7, 15, 18, 11, 11)
end_ts = dt.datetime(2019, 7, 22, 0, 0, 0)

## Set of known users, can be initialized with known accounts you want to make sure are ignored
known_users = set([1683215, 1696618, 1696619])

In [3]:
## Connect to the wiki's database
db_conn = db.get_db_conn(wiki)

In [4]:
username_patterns = ["MMiller", "Zilant", "Roan", "KHarlan", "MWang", "SBtest", "KacemMhenni"]

known_user_query = '''
SELECT user_id
FROM {wiki}.user
WHERE user_registration >= "{start_timestamp}"
AND user_name LIKE "{name_pattern}%"
'''

for u_pattern in username_patterns:
    new_known = pd.read_sql_query(known_user_query.format(
        wiki = wiki,
        start_timestamp = start_ts.strftime(utils.mw_format),
        name_pattern = u_pattern), db_conn)
    known_users = known_users | set(new_known['user_id'])


## Process responses

The cells below here collect the responses of users registered between `start_ts` and `end_ts` (that are not known users) and stores them in a data frame.

In [5]:
response_query = '''
SELECT up_user, up_value
FROM {wiki}.user_properties
JOIN {wiki}.user
ON up_user=user_id
WHERE up_property = 'welcomesurvey-responses'
AND user_registration >= "{start_timestamp}"
AND user_registration < "{end_timestamp}"
AND up_user NOT IN ({id_list})
'''

responses = pd.read_sql_query(
    response_query.format(
        wiki = wiki,
        start_timestamp = start_ts.strftime(utils.mw_format),
        end_timestamp = end_ts.strftime(utils.mw_format),
        id_list = ','.join([str(uid) for uid in known_users])),
    db_conn).applymap(utils.try_decode).rename(columns = utils.try_decode)

In [6]:
def process_responses(df):
    '''
    Process the survey responses found in the given DataFrame `df` and determine what group
    the user was in, as well as whether they saved, skipped, or abandoned the survey.
    Returns a `pandas.DataFrame` with five columns:
    user group (A/C), user ID, how the user reacted (save/skip/abandon), rendering timestamp, and save timestamp
    (if the user responded to the survey)
    '''
    
    groups = []
    userids = []
    responses = []
    render_timestamps = []
    submit_timestamps = []
    
    for row in df.itertuples():
        user_id = row.up_user
        response = json.loads(row.up_value)
        
        userids.append(user_id)
        
        if response['_group'] == 'exp2_target_popup':
            groups.append('C')
        elif response['_group'] == 'exp2_target_specialpage':
            groups.append('treatment')
        elif response['_group'] == 'exp1_group1':
            groups.append('target')
        elif response['_group'] == 'exp1_group2':
            groups.append('control')
        elif response['_group'] == 'NONE':
            groups.append('control')
            
        if not '_render_date' in response \
            or not response['_render_date']:
            render_timestamps.append(None)
        else:
            render_timestamps.append(dt.datetime.strptime(response['_render_date'],
                                                          '%Y%m%d%H%M%S'))
            
        if not '_submit_date' in response:
            responses.append('abandon')
            submit_timestamps.append(None)
            continue
        else:
            submit_timestamps.append(dt.datetime.strptime(response['_submit_date'],
                                                          '%Y%m%d%H%M%S'))
            if '_skip' in response and response['_skip'] == True:
                responses.append('skip')
            else:
                responses.append('save')
            
    return(pd.DataFrame(
        {'group': groups,
         'user_id': userids,
         'response': responses,
         'render_ts' : render_timestamps,
         'submit_ts' : submit_timestamps}))

response_df = process_responses(responses)

## Treatment/control group balance

When deploying the Welcome Survey to a new wiki, we prefer to run an A/B test in order to learn whether the survey changes user behavior, particularly whether users leave Wikipedia. Group assignment to the treatment (survey) and control groups is done randomly, and we want to check that it results in relatively evenly balanced groups (over time it should).

In [7]:
control_split = (response_df.groupby('group')['user_id']
                 .count()
                 .reset_index()
                 .rename(columns = {'user_id' : 'n'}))
control_split['percent'] = 100 * control_split['n'] / control_split['n'].sum()
control_split.round(1)

Unnamed: 0,group,n,percent
0,control,743,51.5
1,treatment,699,48.5


## Proportion of save/skip/abandon

In [8]:
response_types = (response_df.loc[response_df['group'] == 'treatment']
                  .groupby('response')['user_id']
                  .count()
                  .reset_index()
                  .rename(columns={'user_id' : 'n'}))
response_types['percent'] = 100 * response_types['n'] / response_types['n'].sum()
response_types.round(1)

Unnamed: 0,response,n,percent
0,abandon,124,17.7
1,save,500,71.5
2,skip,75,10.7


For historic comparisons, the save/skip/abandon rates on Czech and Korean Wikipedias as reported [in the initial WelcomeSurvey report](https://www.mediawiki.org/wiki/Growth/Analytics_updates/Welcome_survey_initial_report) after a month of deployment were as follows:

| Wiki | % saved | % skipped | % abandoned
| ----- | ----- | ----- | -----
| Czech | 67.4 | 12.1 | 20.5
| Korean | 62.0 | 11.8 | 26.2

### Split by desktop/mobile

Are there meaningful differences in response rates between the desktop and mobile sites? We can answer this by using the ServerSideAccountCreation schema to identify where the account was created, and combine that with our responses.

In [9]:
## Query to retrieve the user ID and whether the registration was on the mobile site
## for all non-autocreated non-app registrations between the given timestamps for the given wiki.

mob_query = '''SELECT event.userid AS user_id, event.displaymobile
FROM event_sanitized.serversideaccountcreation
WHERE year = 2019
AND month >= 7
AND dt BETWEEN "{start_timestamp}" AND "{end_timestamp}"
AND event.isapi = 0
AND event.isselfmade = 1
AND wiki = "{wiki}"
AND event.userid NOT IN ({id_list})
'''

mob_flags = hive.run(mob_query.format(
  start_timestamp = start_ts.strftime(utils.hive_format),
  end_timestamp = end_ts.strftime(utils.hive_format),
  wiki = wiki,
  id_list = ','.join([str(uid) for uid in known_users])))

In [10]:
response_mob_df = response_df.merge(mob_flags, on = 'user_id')

In [11]:
response_types_mob = (response_mob_df.loc[response_mob_df['group'] == 'treatment']
                      .groupby(['displaymobile', 'response'])['user_id']
                      .count()
                      .reset_index()
                      .rename(columns={'user_id' : 'n'}))
response_types_mob['percent'] = (
    100 * response_types_mob['n'] /
    response_types_mob.groupby('displaymobile')['n'].transform('sum'))
response_types_mob.round(1)

Unnamed: 0,displaymobile,response,n,percent
0,False,abandon,42,20.1
1,False,save,147,70.3
2,False,skip,20,9.6
3,True,abandon,82,16.8
4,True,save,352,72.0
5,True,skip,55,11.2


## Actual abandonment

We are wondering if the rate of users who abandon the wiki after having encountered the survey is comparable to users in the control group. To answer that question, we need data from the EditorJourney schema. If that is not deployed to the given wiki, then this question cannot be answered.

What we do to answer this question is that we count the number of events logged by EditorJourney. It should be 3 for any non-surveyed user, and 4 for someone who saw the survey. The events for a control-group user are:

1. Welcome
2. Central login
3. Redirect to the account creation context

A surveyed user has the survey show up as the new item 3, then the redirect post survey (or some other activity on their part). This means that if a control-group user has three or less events, they abandoned the site (for at least 24 hours).

Similarly, if a survey user has three or less events, they did not complete the survey and similarly abandoned the site for at least 24 hours. This approach will by definition label any user who saved or skipped the survey as not abandoning the site. This is partly because we are primarily interested in understanding whether users who abandon the survey *also abandon the site*.

In [12]:
ej_survey = '''
SELECT event.user_id, count(*) AS num_events
FROM event.editorjourney
WHERE year = 2019
AND month >= 7
AND wiki = "{wiki}"
AND dt BETWEEN "{start_timestamp}" AND "{end_timestamp}"
AND event.user_id NOT IN ({id_list})
GROUP BY event.user_id
LIMIT 100000
''' 

In [13]:
event_counts = hive.run(ej_survey.format(
    start_timestamp = start_ts.strftime(utils.hive_format),
    end_timestamp = end_ts.strftime(utils.hive_format),
    wiki = wiki,
    id_list = ','.join([str(uid) for uid in known_users])
))

In [14]:
response_events_df = response_mob_df.merge(event_counts, how = "left", on = "user_id").fillna(0)

In [41]:
response_events_df['did_abandon'] = False
response_events_df.loc[(response_events_df.group == 'control') &
                       (response_events_df.num_events <= 3), 'did_abandon'] = True
response_events_df.loc[(response_events_df.group == 'treatment') &
                       (response_events_df.num_events <= 3), 'did_abandon'] = True

### Overall abandonment rate

Across the entire population, what's the abandonment rate?

In [42]:
overall_abandonment = (response_events_df
                       .groupby('did_abandon')['user_id']
                       .count()
                       .reset_index()
                       .rename(columns={'user_id' : 'n'}))
overall_abandonment['percent'] = 100 * overall_abandonment['n'] / overall_abandonment['n'].sum()
overall_abandonment.round(1)

Unnamed: 0,did_abandon,n,percent
0,False,1315,91.3
1,True,126,8.7


This overall rate is roughly where we'd expect it to be, as we've seen similar numbers on Czech and Korean Wikipedias. On those wikis the abandonment rate was somewhat higher, though.

### Abandonment by group

Here we split it up into the survey and control groups.

In [44]:
by_group_abandonment = (response_events_df
                       .groupby(['group', 'did_abandon'])['user_id']
                       .count()
                       .reset_index()
                       .rename(columns={'user_id' : 'n'}))

by_group_abandonment['percent'] = (100 * by_group_abandonment['n'] /
                                   by_group_abandonment.groupby('group')['n'].transform('sum'))
by_group_abandonment.round(1)

Unnamed: 0,group,did_abandon,n,percent
0,control,False,663,89.2
1,control,True,80,10.8
2,treatment,False,652,93.4
3,treatment,True,46,6.6


Compared to Czech and Korean Wikipedias, these control group abandonment rate is about where we would expect them to be. The treatment group abandonment rate is much lower than expected, it wouldn't be out of the norm for it to be about the same as the control group.

### By group and response

We further dig in by looking at whether the survey users are more likely to abandon depending on what their response is.

In [45]:
by_group_response_abandonment = (response_events_df
                       .groupby(['group', 'response', 'did_abandon'])['user_id']
                       .count()
                       .reset_index()
                       .rename(columns={'user_id' : 'n'}))

by_group_response_abandonment['percent'] = (100 * by_group_response_abandonment['n'] /
                                   by_group_response_abandonment.groupby(['group', 'response'])['n'].transform('sum'))
by_group_response_abandonment.round(1)

Unnamed: 0,group,response,did_abandon,n,percent
0,control,abandon,False,663,89.2
1,control,abandon,True,80,10.8
2,treatment,abandon,False,80,64.5
3,treatment,abandon,True,44,35.5
4,treatment,save,False,497,99.6
5,treatment,save,True,2,0.4
6,treatment,skip,False,75,100.0


Here we can more clearly see that the rate of users in the survey group leaving is roughly similar to the rate  in the control group. Note that the counts are currently rather small, so these trends might change.

### Split by desktop/mobile, group, and response

Finally, we add the desktop/mobile aspect to it, to see if there are meaningful differences between desktop and mobile users.

In [46]:
mob_by_group_response_abandonment = (response_events_df
                       .groupby(['displaymobile', 'group', 'response', 'did_abandon'])['user_id']
                       .count()
                       .reset_index()
                       .rename(columns={'user_id' : 'n'}))

mob_by_group_response_abandonment['percent'] = (100 * mob_by_group_response_abandonment['n'] /
                                   mob_by_group_response_abandonment.groupby(['displaymobile', 'group', 'response'])['n'].transform('sum'))
mob_by_group_response_abandonment.round(1)

Unnamed: 0,displaymobile,group,response,did_abandon,n,percent
0,False,control,abandon,False,215,91.1
1,False,control,abandon,True,21,8.9
2,False,treatment,abandon,False,27,64.3
3,False,treatment,abandon,True,15,35.7
4,False,treatment,save,False,147,100.0
5,False,treatment,skip,False,20,100.0
6,True,control,abandon,False,448,88.4
7,True,control,abandon,True,59,11.6
8,True,treatment,abandon,False,53,64.6
9,True,treatment,abandon,True,29,35.4


## Number of questions answered

Do users tend to answer all the questions?

In [20]:
def count_answers(df):
    '''
    Process the survey responses for Variation C found in the given DataFrame `df` and count
    the number of responses for each user. Returns a `pandas.DataFrame` with three columns:
    user ID, how the user reacted (save/skip/abandon), and the number of questions answered.
    
    Note that we never count answers to the question about whether they wanted to be contacted
    by a mentor, as that question is always stored (it's either True or False).
    '''
    
    userids = []
    responses = []
    n_answers = []
    
    for _, row in df.iterrows():
        user_id = row[0]
        response = json.loads(row[1])
        
        userids.append(user_id)
        
        num_answers = 0
        
        if not '_submit_date' in response:
            responses.append('abandon')
            n_answers.append(0)
            continue
        elif '_skip' in response and response['_skip'] == True:
            responses.append('skip')
        else:
            responses.append('save')
            
        ## Did the user answer the "Have you edited Wikipedia before?" question?
        if 'edited' in response and response['edited'] != None:
            num_answers += 1
            
        ## Did the user answer the "Why did you create an account?" question?
        if 'reason' in response and \
            (response['reason'] != None or \
             response['reason'] != '' or \
             response['reason'] != 'placeholder'):
            num_answers += 1
            
        ## Did the user fill out any topics?
        topics = []
        if 'topics' in response:
            topics.extend(response['topics'])

        if 'topics-other-js' in response:
            topics.extend(response['topics-other-js'])
            
        if topics:
            num_answers += 1
    
        n_answers.append(num_answers)

    return(pd.DataFrame(
        {'user_id': userids,
         'response': responses,
         'num_answers': n_answers}))

num_answers_df = count_answers(responses)

In [21]:
## Grouped by number of answers, for users who didn't skip or abandon:

answers_per_user = (num_answers_df.loc[num_answers_df['response'] == 'save']
                    .groupby('num_answers')['response']
                    .count()
                    .reset_index()
                    .rename(columns={'response' : 'n'}))
answers_per_user['percent'] = 100 * answers_per_user['n'] / answers_per_user['n'].sum()
answers_per_user.round(1)

Unnamed: 0,num_answers,n,percent
0,2,63,12.6
1,3,437,87.4


## No freetext responses

We need to verify that the list of topics does not include any freetext responses as that should be turned off in this version of the Welcome Survey.

In [22]:
## Check for freetext responses.

## This is the set of autocompleted topics. We need it, because we'll ignore all of them.
autocomplete_topics = set(["entertainment", "food and drink", "biography",
                          "military", "economics", "technology", "film",
                          "philosophy", "business", "politics", "government",
                          "engineering", "crafts and hobbies", "games",
                          "health", "social science", "transportation",
                          "education"])

def check_topics(df):
    '''
    Iterate through rows of survey responses in the given Pandas data frame `df`.
    If the response was posted prior to `last_timestamp`, or the user is in a group matching
    any named in the set `control_groups`, the response will be ignored.
    
    Returns a set of topics added, either through the JavaScript-enabled free text form with
    autocompletion, or by a user that did not have JavaScript enabled.
    '''
    topics = set()
    
    for row in df.itertuples():
        user_id = row.up_user
        response = json.loads(row.up_value)
        
        ## Did the user skip the survey?
        if '_skip' in response:
            continue
            
        ## NOTE: we strip whitespace from the responses, so we'll have to do that when
        ## matching it later!
        if 'topics-other-js' in response \
            and response['topics-other-js']:
            topics.update([topic.strip() for topic in response['topics-other-js']
                           if topic not in autocomplete_topics])

        if 'topics-other-nojs' in response \
            and response['topics-other-nojs']:
            topics.add(response['topics-other-nojs'].strip())
        
    return(topics)
    
freetext_topics = check_topics(responses)

In [23]:
freetext_topics

set()

If the set is empty, then there are no freetext responses.