# Predicting Post-click Engagement Modes in Online News 

This notebook shows how to use the trained Gaussian Mixture Model from [1] to identify post-click engagement modes. If you use this model, please cite the work below.

[1] Nir Grinberg. 2018. Identifying Modes of User Engagement with Online News and Their Relationship to Information Gain in Text. In WWW 2018: The 2018 Web Conference, April 23–27, 2018, Lyon, France. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3178876.3186180

In [1]:
import numpy as np
try:
    import cPickle  # Python 2
except ImportError:
    import pickle as cPickle  # Python 3 has no cPickle module
from sklearn import mixture

In [2]:
# Input: an N by 6 matrix of post-click engagement summaries (N=5 in the example below).
# The columns are Depth (px), Dwell Time (sec), Active Engagement (sec), 
# Relative Depth (fraction), Speed (px/min), Normalized Engagement (sec).
# NOTE: the scaling code below treats column index 2 as Relative Depth, which differs
# from the order listed here; check the paper [1] before relying on this column order.
# Note that bounce backs (defined as <10 secs dwell time) were separated out prior to clustering, 
# so you may want to "hard assign" such engagements to a separate cluster, as done in the paper. 
eng_data_raw = np.array(
    [[   2970.0,   202.66,   1.11,   2.84,   872.1 ,   759.08],
     [   5245.0,   159.9 ,   1.01,   3.56,  1963.57,   435.55],
     [   2463.0,    81.14,   0.29,   1.14,  1806.78,   130.6 ],
     [   4211.0,    16.75,   0.13,   0.96, 14971.21,    19.97],
     [   1063.0,   368.65,   0.06,   0.39,   173.85,    23.76]])
eng_data = eng_data_raw.copy() # copy, so the in-place scaling below leaves eng_data_raw untouched
eng_data[:,0] /= 100 # the model was trained on Depth in units of 100 pixels
eng_data[:,2] *= 100 # the model was trained on Rel. Depth in 0-100 percent
eng_data = np.log2(1+eng_data)
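
The preprocessing in the cell above can be wrapped in a pair of helper functions. This is a minimal sketch, not code from the paper: the names `preprocess` and `to_raw_units` are hypothetical, and the column indices (0 and 2) are scaled exactly as in the cell above. The forward transform works on a copy so the caller's array is left untouched, and the inverse undoes the `log2(1 + x)` step and the unit scaling.

```python
import numpy as np

def preprocess(raw):
    """Scale and log-transform raw summaries (hypothetical helper)."""
    data = np.array(raw, dtype=float)  # np.array copies, so `raw` is not modified
    data[:, 0] /= 100                  # column 0: units of 100 pixels, as in the cell above
    data[:, 2] *= 100                  # column 2: 0-100 percent, as in the cell above
    return np.log2(1 + data)

def to_raw_units(transformed):
    """Invert preprocess(): undo the log2(1 + x) transform and the scaling."""
    data = 2 ** np.asarray(transformed, dtype=float) - 1
    data[:, 0] *= 100
    data[:, 2] /= 100
    return data

# First row of the example input above.
row = np.array([[2970.0, 202.66, 1.11, 2.84, 872.1, 759.08]])
model_input = preprocess(row)
recovered = to_raw_units(model_input)
```

Applying `to_raw_units` to `preprocess(row)` should recover `row` up to floating-point error, which makes for a quick sanity check before passing data to the model.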

In [3]:
# load trained model
with open('reads_balanced_gmm.pickle', "rb") as f_in:
    gmm = cPickle.load(f_in)
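
The cell above targets Python 2. Under Python 3, pickles written by Python 2 that contain NumPy arrays often need `encoding="latin1"` when loading. The helper below is a sketch under that assumption; `load_pickle_compat` is a hypothetical name, and loading may still fail or warn if the installed scikit-learn version differs from the one used to train and save the model.

```python
import pickle
import sys

def load_pickle_compat(path):
    """Load a pickle file under Python 2 or Python 3 (hypothetical helper)."""
    with open(path, "rb") as f_in:
        if sys.version_info[0] >= 3:
            # latin1 lets Python 3 decode byte strings written by Python 2
            return pickle.load(f_in, encoding="latin1")
        return pickle.load(f_in)

# Usage, assuming the model file sits next to the notebook:
# gmm = load_pickle_compat('reads_balanced_gmm.pickle')
```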



In [4]:
# helper function for pretty printing numbers
import contextlib

@contextlib.contextmanager
def printoptions(*args, **kwargs):
    original = np.get_printoptions()
    np.set_printoptions(*args, **kwargs)
    try:
        yield
    finally: 
        np.set_printoptions(**original)

In [5]:
# predict hard assignments for each session summary
gmm.predict(eng_data)

array([0, 2, 0, 4, 3])
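
As noted in the input description, bounce backs (dwell time under 10 seconds) were separated out before clustering. One possible way to hard-assign them after prediction is sketched below. The label `-1`, the function name, and the constants are assumptions made for illustration; only the 10-second threshold and the position of Dwell Time (column index 1, in seconds) come from this notebook. The second demo row is made up to show a bounce back.

```python
import numpy as np

BOUNCE_LABEL = -1            # hypothetical label for the bounce-back cluster
DWELL_COL = 1                # Dwell Time (sec) is the second column of the raw input
BOUNCE_THRESHOLD_SEC = 10.0  # bounce backs are defined as <10 secs dwell time

def assign_with_bounces(raw, labels):
    """Override model labels with BOUNCE_LABEL for bounce-back sessions."""
    out = np.asarray(labels).copy()
    out[np.asarray(raw)[:, DWELL_COL] < BOUNCE_THRESHOLD_SEC] = BOUNCE_LABEL
    return out

demo_raw = np.array(
    [[4211.0, 16.75, 0.13, 0.96, 14971.21, 19.97],  # row 4 of the example input
     [ 500.0,  4.20, 0.05, 0.20,  7000.00,  3.00]]) # made-up bounce back
demo_labels = np.array([4, 0])                      # stand-in for gmm.predict output
final_labels = assign_with_bounces(demo_raw, demo_labels)
```

With the notebook's variables, the call would be `assign_with_bounces(eng_data_raw, gmm.predict(eng_data))`; the dwell-time column is not rescaled by the preprocessing, so the raw matrix can be used for the threshold check.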

In [6]:
# get posterior probabilities
gmm.predict_proba(eng_data)

array([[7.73079589e-01, 2.69841151e-07, 6.43879378e-06, 2.26913702e-01,
        1.64301757e-13],
       [2.11651698e-07, 1.34451227e-10, 9.99915098e-01, 8.46902179e-05,
        4.36043107e-10],
       [8.01730666e-01, 1.65471885e-06, 4.79008473e-08, 1.85134265e-01,
        1.31333660e-02],
       [6.81120337e-12, 7.92945164e-10, 1.20250439e-36, 9.16894408e-07,
        9.99999082e-01],
       [3.88501462e-04, 6.92884242e-04, 3.65064903e-59, 9.98918614e-01,
        1.51767890e-42]])

In [7]:
# same as above, just fewer digits
with printoptions(precision=2, suppress=True):
    print(gmm.predict_proba(eng_data))

[[0.77 0.   0.   0.23 0.  ]
 [0.   0.   1.   0.   0.  ]
 [0.8  0.   0.   0.19 0.01]
 [0.   0.   0.   0.   1.  ]
 [0.   0.   0.   1.   0.  ]]


In each row, the hard assignment from `predict` matches the mode with the highest posterior probability.
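
This relationship between posteriors and hard assignments can be checked directly by taking the row-wise argmax. The sketch below uses the rounded posterior matrix printed above, copied by hand, so that it runs without the model file; with the model loaded, `gmm.predict_proba(eng_data)` would take the place of `proba`.

```python
import numpy as np

# Rounded posteriors copied from the output above (one row per session).
proba = np.array(
    [[0.77, 0.00, 0.00, 0.23, 0.00],
     [0.00, 0.00, 1.00, 0.00, 0.00],
     [0.80, 0.00, 0.00, 0.19, 0.01],
     [0.00, 0.00, 0.00, 0.00, 1.00],
     [0.00, 0.00, 0.00, 1.00, 0.00]])

# The most probable mode per row.
hard = np.argmax(proba, axis=1)
print(hard)  # [0 2 0 4 3], the same labels as the gmm.predict output above
```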

In [8]:
# Let's look at the cluster means. 
# Each row is a cluster mean in pre-log space with depth (first column) in units of 100 pixels
with printoptions(precision=2, suppress=True):
    print(2**gmm.means_-1)

[[  23.85   87.87   30.46    1.1  1628.72   86.17]
 [   6.56  647.42    8.14    0.37   50.58    9.6 ]
 [  60.38  226.99   79.21    1.68 1596.82  134.03]
 [  18.24  398.92   15.42    0.94  273.76   43.42]
 [  23.05   24.31   13.24    1.06 5671.11   37.54]]


The paper reorders these for ease of exposition as 2 (read), 5 (shallow), 3 (read long), 4 (idle), 1 (scan).

In [9]:
# For completeness, let's print the covariance matrices:
for cov in gmm.covariances_:
    print(np.around(cov, decimals=2))

[[ 0.19  0.15  0.22  0.04  0.04  0.11]
 [ 0.15  0.67  0.38  0.02 -0.52  0.29]
 [ 0.22  0.38  1.05  0.04 -0.16  0.88]
 [ 0.04  0.02  0.04  0.11  0.02  0.25]
 [ 0.04 -0.52 -0.16  0.02  0.57 -0.18]
 [ 0.11  0.29  0.88  0.25 -0.18  1.46]]
[[ 1.48  2.26  0.87  0.35  0.3   0.82]
 [ 2.26 11.78  0.93  0.47 -7.78  2.89]
 [ 0.87  0.93  2.29  0.3   0.21  1.85]
 [ 0.35  0.47  0.3   0.16  0.07  0.37]
 [ 0.3  -7.78  0.21  0.07  9.51 -0.88]
 [ 0.82  2.89  1.85  0.37 -0.88  4.85]]
[[ 0.36  0.27  0.31  0.12  0.09  0.12]
 [ 0.27  1.43  0.76  0.06 -1.16  0.56]
 [ 0.31  0.76  1.21  0.08 -0.45  0.97]
 [ 0.12  0.06  0.08  0.32  0.06  0.42]
 [ 0.09 -1.16 -0.45  0.06  1.25 -0.44]
 [ 0.12  0.56  0.97  0.42 -0.44  1.6 ]]
[[ 0.95  1.5   0.87  0.2  -0.5   0.47]
 [ 1.5   5.8   2.16  0.29 -4.22  1.57]
 [ 0.87  2.16  2.99  0.19 -1.25  2.73]
 [ 0.2   0.29  0.19  0.19 -0.08  0.43]
 [-0.5  -4.22 -1.25 -0.08  3.68 -1.07]
 [ 0.47  1.57  2.73  0.43 -1.07  3.65]]
[[ 0.63  0.2   0.3   0.12  0.45 -0.05]
 [ 0.2   0.36  0.32  