# Task Navigation and Clicks

What percentage of newcomers navigate between tasks or click on a given task in the Newcomer Tasks module?

To answer this, we join our canonical user table with HomepageModule to identify users in the experiment who also visited the Homepage. From the same HomepageModule events, we count each user's task navigations and task clicks, giving one row per user.

We use HomepageModule as the source for users visiting the Homepage to account for users who block EventLogging or don't have JavaScript enabled.
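The join and per-user aggregation described above can be sketched in pandas on toy data. This is an illustration only: the `users` and `events` frames and their values are made up, and the actual work in this notebook is done by the Spark SQL query further down.

```python
import pandas as pd

# Hypothetical stand-ins for the canonical user table and HomepageModule events
users = pd.DataFrame({
    "wiki_db": ["cswiki", "cswiki", "kowiki"],
    "user_id": [1, 2, 1],
})
events = pd.DataFrame({
    "wiki": ["cswiki", "cswiki", "cswiki", "kowiki"],
    "user_id": [1, 1, 1, 1],
    "action": ["impression", "se-task-navigation", "se-task-click", "impression"],
})

# Aggregate events to one row per user: any event counts as a visit,
# and the two action types are counted separately
per_user = (
    events.assign(
        visited_homepage=1,
        num_task_navigations=(events["action"] == "se-task-navigation").astype(int),
        num_task_clicks=(events["action"] == "se-task-click").astype(int),
    )
    .groupby(["wiki", "user_id"], as_index=False)
    .agg(
        visited_homepage=("visited_homepage", "max"),
        num_task_navigations=("num_task_navigations", "sum"),
        num_task_clicks=("num_task_clicks", "sum"),
    )
    .rename(columns={"wiki": "wiki_db"})
)

# Left join keeps users with no events; their counts become 0
counts = users.merge(per_user, on=["wiki_db", "user_id"], how="left")
count_cols = ["visited_homepage", "num_task_navigations", "num_task_clicks"]
counts[count_cols] = counts[count_cols].fillna(0).astype(int)
print(counts)
```

The left join followed by filling with zeros plays the same role as `LEFT JOIN` plus `COALESCE(..., 0)` in SQL: users who never appear in the event data stay in the result with all counts set to 0.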

In [2]:
import json
import datetime as dt

from collections import defaultdict

import numpy as np
import pandas as pd

from wmfdata import hive, spark, mariadb
from growth import utils

In [16]:
## Start timestamp of the Variant C/D experiment, which is when the edit tag bug fix went into place
exp_start_ts = dt.datetime(2020, 10, 28, 18, 40, 2)

## Ordered list of wikis that we'll be gathering data for
## Note that we're excluding euwiki due to their small number of registrations
wikis = ['cswiki', 'kowiki', 'viwiki', 'arwiki', 'ukwiki', 'huwiki', 'srwiki', 'hywiki',
         'frwiki', 'fawiki', 'hewiki', 'ruwiki', 'plwiki', 'ptwiki', 'svwiki', 'trwiki']

## The mediawiki_history snapshot we'll be working with
wmf_snapshot = '2020-11'

## The canonical user table of users in the experiment, which we'll join with HomepageModule
canonical_user_table = 'nettrom_growth.hp_variant_test2'

## Where we write out the resulting dataset for further analysis
navigation_click_counts_output_filename = 'datasets/variant-test-2-navigation-click-counts.tsv'

## Navigation and Click Data

In [13]:
navigation_click_query = '''
WITH hp AS ( -- visits, navigation, clicks within 24 hours of registration
    SELECT users.wiki_db, users.user_id,
    1 AS visited_homepage,
    SUM(IF(hpm.event.action = "se-task-navigation", 1, 0)) AS num_task_navigations,
    SUM(IF(hpm.event.action = "se-task-click", 1, 0)) AS num_task_clicks
    FROM {exp_user_table} users
    JOIN event_sanitized.homepagemodule hpm
    ON (users.wiki_db = hpm.wiki
        AND users.user_id = hpm.event.user_id)
    WHERE hpm.year = 2020
    AND hpm.month IN (10, 11)
    AND hpm.dt > "{start_ts}"
    AND (unix_timestamp(hpm.dt, "yyyy-MM-dd'T'HH:mm:ss'Z'") -
         unix_timestamp(users.user_registration, "yyyyMMddHHmmss") < 86400) -- within 24 hours of registration
    GROUP BY users.wiki_db, users.user_id
)
SELECT u.wiki_db, u.user_id, u.user_registration, u.reg_on_mobile, u.hp_enabled, u.hp_variant,
       COALESCE(hp.visited_homepage, 0) AS visited_homepage,
       COALESCE(hp.num_task_navigations, 0) AS num_task_navigations,
       COALESCE(hp.num_task_clicks, 0) AS num_task_clicks
FROM {exp_user_table} AS u
LEFT JOIN hp
ON u.wiki_db = hp.wiki_db
AND u.user_id = hp.user_id
'''
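The 24-hour window in the query compares two timestamps stored in different formats: the ISO 8601 event timestamp (`hpm.dt`) and the MediaWiki-style registration timestamp (`user_registration`). A minimal Python sketch of the same comparison, assuming both values are in the same timezone, is shown here. The helper `within_24h` and the sample timestamps are hypothetical and not part of the analysis code.

```python
import datetime as dt

def within_24h(event_dt: str, registration: str) -> bool:
    """Mirror the query's test: the event is less than 86400 seconds after registration."""
    event_time = dt.datetime.strptime(event_dt, "%Y-%m-%dT%H:%M:%SZ")
    reg_time = dt.datetime.strptime(registration, "%Y%m%d%H%M%S")
    return (event_time - reg_time).total_seconds() < 86400

# Made-up timestamps for illustration only
print(within_24h("2020-10-29T18:00:00Z", "20201028190000"))  # 23 hours after registration
print(within_24h("2020-10-29T19:00:00Z", "20201028190000"))  # exactly 24 hours after registration
```

Because the comparison is a strict `<`, an event exactly 86400 seconds after registration falls outside the window.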

In [14]:
navigation_click_counts = spark.run(
    navigation_click_query.format(
        start_ts = exp_start_ts.strftime(utils.hive_format),
        exp_user_table = canonical_user_table
    )
)

In [15]:
len(navigation_click_counts)

67599

In [None]:
navigation_click_counts.loc[navigation_click_counts['visited_homepage'] > 0].head()

In [None]:
navigation_click_counts.loc[navigation_click_counts['num_task_navigations'] > 0].head()

In [None]:
navigation_click_counts.loc[navigation_click_counts['num_task_clicks'] > 0].head()
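The percentages asked about at the top of this notebook can be derived from the per-user counts. The sketch below uses a toy DataFrame with the same column names as `navigation_click_counts`. The values and the variant labels `"C"` and `"D"` are made up, and using Homepage visitors as the denominator is an assumption rather than a choice documented here.

```python
import pandas as pd

# Toy stand-in for navigation_click_counts; all values are made up
toy_counts = pd.DataFrame({
    "hp_variant": ["C", "C", "C", "D", "D", "D"],
    "visited_homepage": [1, 1, 0, 1, 1, 1],
    "num_task_navigations": [2, 0, 0, 1, 0, 0],
    "num_task_clicks": [1, 0, 0, 0, 0, 3],
})

# Assumed denominator: users who visited the Homepage
visitors = toy_counts.loc[toy_counts["visited_homepage"] > 0]

# Share of visitors per variant with at least one navigation or click, in percent
pct_by_variant = (
    visitors.assign(
        navigated=visitors["num_task_navigations"] > 0,
        clicked=visitors["num_task_clicks"] > 0,
    )
    .groupby("hp_variant")[["navigated", "clicked"]]
    .mean()
    .mul(100)
)
print(pct_by_variant)
```

Taking the mean of a boolean column gives the proportion of `True` values, so multiplying by 100 yields the percentage directly.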

Write out the dataset for importing into R for further analysis.

In [21]:
navigation_click_counts.to_csv(navigation_click_counts_output_filename,
                               header = True, index = False, sep = '\t')