# Offline-to-Online Training

Our model is trained generatively---the observed data log-likelihood is maximized using the EM algorithm. However, our goal is to deploy the model in a predictive setting. We want to predict the most likely future trajectory given (1) any baseline information and (2) the noisy marker values observed so far. The focus of this notebook is to understand how we can adjust the parameters of a generative trajectory model in order to improve the performance on the trajectory prediction task.

## Related Work

The paper by [Raina and Ng (2003)](http://ai.stanford.edu/~rajatr/papers/nips03-hybrid.pdf) describes a hybrid generative/discriminative model. One of its key ideas concerns the relative importance of random variables in a generative model when it is applied in a predictive context (i.e. when the generative model is used to derive a conditional probability through Bayes' rule). On page 3 there is an interesting point: they show that the decision rule for binary classification of Usenet documents can be formulated as a comparison between sums of log-likelihood terms. They note that if features are extracted from, say, the message title and the message body, there are many more log-likelihood terms for the body than for the title. The title, however, may be highly informative for making the decision. A naive Bayes classifier (NBC), or more generally any generatively trained classifier, will nonetheless weight every term equally.
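
To make the imbalance concrete, here is a toy sketch of a decision rule written as a sum of per-feature log-likelihood ratios. All numbers, and the names `title_llr`, `body_llr`, `w_title`, and `w_body`, are invented for illustration and are not taken from the paper. The second score only mimics the general idea of giving each region its own weight.

```python
import numpy as np

# Hypothetical per-word log-likelihood ratios log p(w | c=1) - log p(w | c=0).
title_llr = np.array([1.2, 0.9])   # 2 title words, each strongly favours class 1
body_llr = np.full(50, -0.05)      # 50 body words, each weakly favours class 0

# Generatively trained rule: every log-likelihood term gets weight 1,
# so the 50 weak body terms outvote the 2 strong title terms.
nb_score = title_llr.sum() + body_llr.sum()

# Region-weighted rule: one weight per region (weights are made up here).
w_title, w_body = 1.0, 0.2
hybrid_score = w_title * title_llr.sum() + w_body * body_llr.sum()

print("equal weights :", nb_score)
print("region weights:", hybrid_score)
```

A positive score means class 1 is chosen. With equal weights the body terms flip the decision, which is the effect the online tuning below tries to counteract for the prior and likelihood terms of the trajectory model.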

## Experimental Setup

The metric of interest is the mean absolute error aggregated within the usual buckets we've defined---(1,2], (2,4], (4,8], and (8,25]. We'll begin by looking at predictions made after observing one year of data (i.e. we will only train a single online-adapted model). We will compare to two baselines. The first baseline will be the predictions made using the MAP estimate of the subtype under full information (i.e. observing all of the individual's pFVC data) and the second baseline will be the MAP estimate of the subtype under one year of data (i.e. the standard conditional prediction obtained via application of Bayes rule to the generative model).
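
As a rough illustration of the metric, the sketch below computes the mean absolute error within the right-closed buckets listed above. The actual scoring in this notebook is done by `score_predictions.R`, which is not shown, so the function name `bucketed_mae` and the toy data are assumptions made only for this example.

```python
import numpy as np

def bucketed_mae(times, y_true, y_pred, edges=(1, 2, 4, 8, 25)):
    """Mean absolute error within the buckets (1,2], (2,4], (4,8], (8,25]."""
    times = np.asarray(times, dtype=float)
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    # With right=True, digitize returns i such that edges[i-1] < t <= edges[i].
    # Times at or below the first edge map to 0 and are ignored below.
    idx = np.digitize(times, edges, right=True)
    out = {}
    for i in range(1, len(edges)):
        label = "({},{}]".format(edges[i - 1], edges[i])
        in_bin = idx == i
        out[label] = abs_err[in_bin].mean() if in_bin.any() else np.nan
    return out

# Toy measurements (years since first visit, observed pFVC, predicted pFVC).
times = np.array([0.5, 1.5, 2.0, 3.0, 6.0, 10.0])
y_true = np.array([80.0, 78.0, 77.0, 75.0, 70.0, 65.0])
y_pred = np.array([80.0, 80.0, 76.0, 72.0, 74.0, 60.0])
mae = bucketed_mae(times, y_true, y_pred)
print(mae)
```

The observation at 0.5 years falls outside every bucket and does not contribute, which matches the convention that predictions are only scored after the first year.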

## Methods

For each individual $i$, let $y_i$ denote the vector of observed measurements, $t_i$ the measurement times, and $x_i$ the vector of covariates used in the population and subpopulation models. Each individual is associated with a subtype, which we denote using $z_i \in \{1, \ldots, K\}$. Let $\Phi_1(t_i)$ denote the population feature matrix, $\Phi_2(t_i)$ denote the subpopulation feature matrix, and $\Phi_3(t_i)$ denote the individual-specific long-term effects feature matrix.

In the generative model, we specify the marginal probability of subtype membership and the conditional probability of observed markers given subtype membership. The marginal probability of subtype membership is modeled using softmax multiclass regression:

$$ p(z_i = k \mid w_{1:K}) \propto \exp \{ x_i^\top w_k \}. $$

The conditional probability of a marker sequence given subtype membership is

$$ p(y_i \mid z_i = k, \beta_{1:K}) = \mathcal{N} ( m_i(k), \Sigma_i ), $$

where

$$ m_i(k) = \Phi_1(t_i) \Lambda x_i + \Phi_2(t_i) \beta_k $$
and
$$ \Sigma_i = \Phi_3(t_i) \Sigma_b \Phi_3^\top(t_i) + K_{\text{OU}}(t_i) + \sigma^2 \mathbf{I}. $$

Given some observed data $y_i$, the posterior over subtype membership $z_i$ is

$$ p(z_i = k \mid y_i) \propto p(z_i = k \mid w_{1:K}) p(y_i \mid z_i = k, \beta_{1:K}). $$
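
In practice the posterior is best normalized in log space, because the Gaussian log-likelihoods of a whole marker sequence are large negative numbers. The sketch below is a minimal stand-alone illustration. The helper names `log_softmax` and `subtype_posterior` and all numbers are hypothetical, and the log-likelihoods are simply given rather than computed from the trajectory model.

```python
import numpy as np

def log_softmax(a):
    """Log of the softmax of a, computed stably."""
    a = np.asarray(a, dtype=float)
    a = a - a.max()
    return a - np.log(np.exp(a).sum())

def subtype_posterior(x, W, loglik):
    """x: covariates (d,), W: softmax weights (K, d), loglik: log p(y | z=k) (K,)."""
    log_prior = log_softmax(W @ x)          # log p(z = k | w_{1:K})
    return np.exp(log_softmax(log_prior + loglik))

# Made-up example with K = 3 subtypes and d = 3 covariates.
x = np.array([1.0, 0.0, 1.0])
W = np.array([[0.0, 0.0, 0.0],
              [0.5, -1.0, 0.2],
              [-0.3, 0.4, 0.1]])
loglik = np.array([-1050.0, -1047.0, -1049.0])

post = subtype_posterior(x, W, loglik)
print(post)

# Exponentiating the raw log-likelihoods underflows to zero,
# so normalizing in probability space would divide by zero.
print(np.exp(loglik))
```

The code cells below follow the same pattern with `logsumexp`.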

## Code

In [1]:
import numpy as np
import pandas as pd

from importlib import reload  # the imp module is deprecated and was removed in Python 3.12

In [2]:
import sys
sys.path.append('/Users/pschulam/Git/mypy')

In [140]:
np.set_printoptions(precision=2)

### B-spline Basis

This basis is **hard-coded** to implement the exact basis functions used to fit the model in the R code.

In [4]:
from mypy import bsplines

boundaries   = (-1.0, 23.0)
degree       = 2
num_features = 6
basis = bsplines.universal_basis(boundaries, degree, num_features)

### Kernel Function

In [5]:
from mypy.util import as_row, as_col

def kernel(x1, x2=None, a_const=1.0, a_ou=1.0, l_ou=1.0):
    symmetric = x2 is None
    d = differences(x1, x1) if symmetric else differences(x1, x2)
    K = a_const * np.ones_like(d)
    K += ou_kernel(d, a_ou, l_ou)
    if symmetric:
        K += np.eye(x1.size)
    return K

def ou_kernel(d, a, l):
    return a * np.exp( - np.abs(d) / l )

def differences(x1, x2):
    return as_col(x1) - as_row(x2)

In [6]:
x_test = np.linspace(0, 20, 41)
X_test = basis.eval(x_test)
K_test = kernel(x_test, a_const=16.0, a_ou=36.0, l_ou=2.0)

In [7]:
X_test[:5, :]

array([[ 0.6944,  0.2917,  0.0139,  0.    ,  0.    ,  0.    ],
       [ 0.5625,  0.4062,  0.0312,  0.    ,  0.    ,  0.    ],
       [ 0.4444,  0.5   ,  0.0556,  0.    ,  0.    ,  0.    ],
       [ 0.3403,  0.5729,  0.0868,  0.    ,  0.    ,  0.    ],
       [ 0.25  ,  0.625 ,  0.125 ,  0.    ,  0.    ,  0.    ]])
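
Since `mypy.bsplines` is a private module, the following sketch may help readers reproduce the basis without it. It assumes, based only on the values printed above, that `universal_basis` builds a clamped B-spline basis with uniformly spaced interior knots. The function name `bspline_basis` is hypothetical and the implementation is a plain Cox-de Boor recursion, not the author's code.

```python
import numpy as np

def bspline_basis(x, boundaries=(-1.0, 23.0), degree=2, num_features=6):
    """Evaluate a clamped, uniform-knot B-spline basis at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    lo, hi = boundaries
    n_interior = num_features - degree - 1
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    # Repeat each boundary knot degree + 1 times (clamped knot vector).
    knots = np.r_[[lo] * (degree + 1), interior, [hi] * (degree + 1)]

    # Degree-0 basis: indicators of the half-open knot spans [t_j, t_{j+1}).
    # Points equal to the upper boundary therefore evaluate to all zeros.
    B = np.array([(knots[j] <= x) & (x < knots[j + 1])
                  for j in range(len(knots) - 1)], dtype=float).T

    # Cox-de Boor recursion; terms with a zero-width span are skipped.
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(knots) - d - 1))
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_new[:, j] += (x - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (knots[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_new
    return B

X_check = bspline_basis(np.linspace(0, 20, 41))
print(np.round(X_check[:5], 4))
```

Under these assumptions the first rows should agree with the `X_test[:5, :]` output above to the printed precision.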

In [8]:
K_test[:5, :5]

array([[ 53.    ,  44.0368,  37.8351,  33.0052,  29.2437],
       [ 44.0368,  53.    ,  44.0368,  37.8351,  33.0052],
       [ 37.8351,  44.0368,  53.    ,  44.0368,  37.8351],
       [ 33.0052,  37.8351,  44.0368,  53.    ,  44.0368],
       [ 29.2437,  33.0052,  37.8351,  44.0368,  53.    ]])

### Softmax Model

In [650]:
import scipy.optimize as opt

from mypy.models import softmax
reload(softmax)

<module 'mypy.models.softmax' from '/Users/pschulam/Git/mypy/mypy/models/softmax.py'>

### Trajectory Model

In [65]:
from scipy.stats import multivariate_normal

a_const = 16.0
a_ou    = 36.0
l_ou    = 2.0

def phi1(x):
    return np.ones((x.size, 1))

def phi2(x):
    return basis.eval(x)

def gp_posterior(tnew, t, y, kern, **kwargs):
    from numpy import dot
    from scipy.linalg import inv, solve
    
    K11 = kern(tnew, **kwargs)
    K12 = kern(tnew, t, **kwargs)
    K22 = kern(t, **kwargs)
    
    m = dot(K12, solve(K22, y))
    K = K11 - dot(K12, solve(K22, K12.T))
    
    return m, K

def trajectory_means(t, x, b, B):
    from numpy import dot
    
    P1 = phi1(t)
    P2 = phi2(t)
    
    m1 = dot(P1, dot(b, x)).ravel()
    m2 = dot(B, P2.T)
    
    return m1 + m2

def trajectory_logl(t, x, y, z, B, b):
    if t.size < 1:
        return 0.0
    
    m = trajectory_means(t, x, b, B)[z]
    S = kernel(t, a_const=a_const, a_ou=a_ou, l_ou=l_ou)
    
    return multivariate_normal.logpdf(y, m, S)

### Load Parameters

In [20]:
b = np.loadtxt('param/pop.dat')
B = np.loadtxt('param/subpop.dat')
W = np.loadtxt('param/marginal.dat')
W = np.r_[ np.zeros((1, W.shape[1])), W ]

In [86]:
from scipy.special import logsumexp  # scipy.misc.logsumexp is no longer available in recent SciPy releases

def model_prior(t, x1, x2, y, b, B, W):
    return softmax.regression_log_proba(x2, W)

def model_likelihood(t, x1, x2, y, b, B, W):
    k = B.shape[0]
    return np.array([trajectory_logl(t, x1, y, z, B, b) for z in range(k)])

def model_posterior(t, x1, x2, y, b, B, W):
    prior = model_prior(t, x1, x2, y, b, B, W)
    likel = model_likelihood(t, x1, x2, y, b, B, W)
    lp = prior + likel
    return np.exp(lp - logsumexp(lp))

def model_evidence(t, x1, x2, y, b, B, W):
    prior = model_prior(t, x1, x2, y, b, B, W)
    likel = model_likelihood(t, x1, x2, y, b, B, W)
    lp = prior + likel
    return logsumexp(lp)

### Load Data

In [159]:
from copy import deepcopy

def PatientData(tbl):
    pd = {}
    pd['ptid'] = int(tbl['ptid'].values[0])
    pd['t']    = tbl['years_seen_full'].values.copy()
    pd['y']    = tbl['pfvc'].values.copy()
    pd['x1']   = np.asarray(tbl.loc[:, ['female', 'afram']].drop_duplicates()).ravel()
    pd['x2']   = np.asarray(tbl.loc[:, ['female', 'afram', 'aca', 'scl']].drop_duplicates()).ravel()
    pd['x2']   = np.r_[ 1.0, pd['x2'] ]
    return pd

def truncated_data(pd, censor_time):
    obs = pd['t'] <= censor_time
    pdc = deepcopy(pd)
    pdc['t'] = pd['t'][obs]
    pdc['y'] = pd['y'][obs]
    return pdc, pd['t'][~obs]

def eval_prior(pd, b=b, B=B, W=W):
    return model_prior(pd['t'], pd['x1'], pd['x2'], pd['y'], b, B, W)

def eval_likel(pd, b=b, B=B, W=W):
    return model_likelihood(pd['t'], pd['x1'], pd['x2'], pd['y'], b, B, W)

def run_inference(pd, b=b, B=B, W=W):
    ll = model_evidence(pd['t'], pd['x1'], pd['x2'], pd['y'], b, B, W)
    posterior = model_posterior(pd['t'], pd['x1'], pd['x2'], pd['y'], b, B, W)
    return ll, posterior

In [50]:
pfvc = pd.read_csv('data/benchmark_pfvc.csv')
data = [PatientData(tbl) for _, tbl in pfvc.groupby('ptid')]

In [89]:
ll, pst = run_inference(data[9], b, B, W)
np.round(pst, 3)

array([ 0.   ,  0.   ,  0.   ,  0.01 ,  0.022,  0.117,  0.816,  0.035])

### Online Tuning Algorithm

We're going to tune the posterior predictions of our model at a given time point by adjusting the relative strengths of the prior and the likelihood in the log ratios that determine the posterior. The goal is to fit the *full information posterior* by modifying the *partial information posterior*. For any observed marker sequence $y_i$, we can express the posterior probabilities by specifying, for each subtype, the log ratio of its joint probability to that of some *pivot* subtype (here subtype 1):

$$ r_{11} = \log \frac{p(z = 1)}{p(z = 1)} + \log \frac{p(y_i \mid z = 1)}{p(y_i \mid z = 1)} $$

$$ r_{21} = \log \frac{p(z = 2)}{p(z = 1)} + \log \frac{p(y_i \mid z = 2)}{p(y_i \mid z = 1)} $$

$$ r_{31} = \log \frac{p(z = 3)}{p(z = 1)} + \log \frac{p(y_i \mid z = 3)}{p(y_i \mid z = 1)} $$

$$ \ldots $$

Note that the first ratio is 0 since it is the log of a ratio that is always 1. When making a MAP estimate of an individual's subtype, the subtype with the largest ratio is selected. More generally, if we want to match the partial information posterior to the full information posterior as closely as possible, then we want to match these ratios as closely as possible. This suggests a simple adjustment algorithm: fit $K - 1$ separate regressions, one per non-pivot subtype, where the features are the partial information log ratios of each term in the joint distribution (prior and likelihood) and the target is the corresponding full information log ratio.
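
The reason the log ratios are a sufficient target is that the softmax is unchanged when a constant is subtracted from every entry, so the ratios to the pivot carry the same information as the unnormalized log joint. A small check with made-up numbers (the local `softmax` helper is defined here and is not the `mypy.models.softmax` module):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical log prior and log likelihood for K = 3 subtypes.
log_prior = np.log([0.5, 0.3, 0.2])
log_likel = np.array([-12.0, -10.5, -11.0])

joint = log_prior + log_likel
ratios = joint - joint[0]          # r_{k1}: log ratios to the pivot (subtype 1)

posterior_from_joint = softmax(joint)
posterior_from_ratios = softmax(ratios)

print(ratios)
print(posterior_from_joint)
print(posterior_from_ratios)
```

Both routes give the same posterior, and the MAP subtype is the index of the largest ratio.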

In [524]:
def log_ratio(L, pivot=0):
    R = L - L[:, pivot][:, np.newaxis]
    return R

In [525]:
full_log_priors = np.array([eval_prior(d) for d in data])
full_log_likels = np.array([eval_likel(d) for d in data])

yr01_log_priors = np.array([eval_prior(truncated_data(d, 1.0)[0]) for d in data])
yr01_log_likels = np.array([eval_likel(truncated_data(d, 1.0)[0]) for d in data])

yr02_log_priors = np.array([eval_prior(truncated_data(d, 2.0)[0]) for d in data])
yr02_log_likels = np.array([eval_likel(truncated_data(d, 2.0)[0]) for d in data])

In [526]:
L1 = log_ratio(yr01_log_priors)
L2 = log_ratio(yr01_log_likels)
Y  = log_ratio(full_log_priors) + log_ratio(full_log_likels)

#### Algorithm 1

In [527]:
def fit_adjustment(y, x1, x2):
    from scipy.linalg import lstsq
    n = y.size
    X = np.c_[ np.ones(n), x1, x2 ]
    w, _, _, _ = lstsq(X, y)
    return w

def make_adjustment(x1, x2, w):
    n = x1.size
    X = np.c_[ np.ones(n), x1, x2 ]
    return np.dot(X, w)

In [528]:
Yhat = np.zeros_like(Y)
N, K = Yhat.shape
W    = np.zeros((K, 3))
for k in range(1, K):
    w = fit_adjustment(Y[:, k], L1[:, k], L2[:, k])
    W[k] = w
    Yhat[:, k] = make_adjustment(L1[:, k], L2[:, k], w)

In [529]:
full_log_ratio = Y
yr01_log_ratio = L1 + L2

In [530]:
P  = np.array([softmax.softmax_func(y) for y in full_log_ratio])
Q1 = np.array([softmax.softmax_func(y) for y in yr01_log_ratio])
Q2 = np.array([softmax.softmax_func(y) for y in Yhat])

In [531]:
np.sum(P * np.log(P))

-430.77831436230178

In [532]:
np.sum(P * np.log(Q1))

-790.52486194944299

In [533]:
np.sum(P * np.log(Q2))

-1244.9793742351435

This simple approach doesn't work very well when evaluated with the multinomial regression objective, but this makes sense because the weights for each subtype are learned entirely independently. Another option for evaluation is to check whether the MAP under the adjusted distribution agrees with the MAP under full information more often than the MAP under partial information does.
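
For reference, the two evaluation quantities used here can be written as small helpers. The names `expected_log_score` and `map_agreement` and the clipping constant `eps` are assumptions added for this sketch. By Gibbs' inequality, $\sum P \log Q \le \sum P \log P$, so the value computed with `Q = P` above is the best attainable score, and the other two numbers should be read relative to it.

```python
import numpy as np

def expected_log_score(P, Q, eps=1e-12):
    """sum_i sum_k P[i, k] * log Q[i, k]; larger is better, maximized at Q = P."""
    return float(np.sum(P * np.log(np.clip(Q, eps, 1.0))))

def map_agreement(P, Q):
    """Fraction of rows where P and Q have the same most likely subtype."""
    return float(np.mean(np.argmax(P, axis=1) == np.argmax(Q, axis=1)))

# Toy full-information (P_toy) and partial-information (Q_toy) posteriors.
P_toy = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
Q_toy = np.array([[0.5, 0.3, 0.2],
                  [0.3, 0.3, 0.4]])

best = expected_log_score(P_toy, P_toy)
score = expected_log_score(P_toy, Q_toy)
agree = map_agreement(P_toy, Q_toy)
print(best, score, agree)
```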

In [534]:
np.mean(np.argmax(P, axis=1) == np.argmax(Q1, axis=1))

0.63690476190476186

In [535]:
np.mean(np.argmax(P, axis=1) == np.argmax(Q2, axis=1))

0.5267857142857143

Again, the results are not good. This isn't hopeless, though, since the way we trained the adjustment is pretty simple. For completeness, however, let's take a look at the confusion matrix to see if any key mistakes are being corrected using this approach.

In [536]:
from sklearn.metrics import confusion_matrix
confusion_matrix(np.argmax(P, axis=1), np.argmax(Q1, axis=1))

array([[ 17,   4,   2,   1,   0,   0,   0,   0],
       [  3, 103,  21,   7,   4,   1,   0,   0],
       [  0,  24,  77,   6,   3,   0,   1,   0],
       [  0,  10,  21,  51,  21,   0,   6,   0],
       [  0,   0,   2,  26,  93,   7,   5,   3],
       [  0,   0,   2,   0,  18,  35,   6,   1],
       [  0,   1,   4,  10,  14,   7,  35,   1],
       [  0,   0,   0,   1,   1,   0,   0,  17]])

In [537]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Q2, axis=1))

array([[ 5, 16,  0,  0,  3,  0,  0,  0],
       [ 0, 84, 49,  0,  6,  0,  0,  0],
       [ 0, 15, 82,  1, 13,  0,  0,  0],
       [ 0,  6, 41, 26, 35,  1,  0,  0],
       [ 0,  0,  5, 11, 84, 32,  3,  1],
       [ 0,  0,  0,  1,  9, 49,  1,  2],
       [ 0,  1, 11, 10, 11, 30,  9,  0],
       [ 0,  0,  0,  0,  1,  2,  1, 15]])

#### Algorithm 2

In [538]:
def multinom_pred(W, P, X1, X2):
    Z = np.zeros_like(P)
    N, K = Z.shape
    for k in range(1, K):
        w = W[k]
        X = np.c_[ np.ones(N), X1[:, k], X2[:, k] ]
        Z[:, k] = np.dot(X, w)
    
    Q = np.array([softmax.softmax_func(z) for z in Z])
    return Q

def multinom_cost(W, P, X1, X2):
    Q = multinom_pred(W, P, X1, X2)
    return np.sum(P * np.log(Q))

In [539]:
def multinom_grad(W, P, X1, X2):
    Z = np.zeros_like(P)
    N, K = Z.shape
    for k in range(1, K):
        w = W[k]
        X = np.c_[ np.ones(N), X1[:, k], X2[:, k] ]
        Z[:, k] = np.dot(X, w)
    Q = np.array([softmax.softmax_func(z) for z in Z])
        
    D = np.zeros_like(W)
    for k in range(1, K):
        for i, z in enumerate(Z):
            g = softmax.softmax_grad(z)
            x = np.r_[ 1.0, X1[i, k], X2[i, k] ]
            for j in range(K):
                D[k] += P[i, j] / Q[i, j] * g[j, k] * x
            
    return D

In [545]:
from sklearn.preprocessing import StandardScaler
#X1 = StandardScaler().fit_transform(L1)
X1 = (L1 - L1.mean()) / np.std(L1)
#X2 = StandardScaler().fit_transform(L2)
X2 = (L2 - L2.mean()) / np.std(L2)
W0 = np.zeros((P.shape[1], 3))

In [546]:
multinom_cost(W0, P, X1, X2)

-1397.3847160088496

In [547]:
multinom_grad(W0, P, X1, X2)

array([[   0.    ,    0.    ,    0.    ],
       [  51.7189,   -8.9617,  -48.1461],
       [  33.5423,  -14.7138,  -29.7908],
       [  26.2598,   13.4334,    7.1831],
       [  37.7879,   33.0711,   68.7322],
       [ -15.1784,   44.7643,   82.9596],
       [ -12.5592,    1.1756,   59.9074],
       [ -63.4855,   41.0997,  105.9259]])
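
The per-observation Jacobian loop in `multinom_grad` can be collapsed. Assuming `softmax.softmax_grad` returns the usual softmax Jacobian (its source is not shown here) and that each row of `P` sums to one, the derivative of $\sum_j P_{ij} \log Q_{ij}$ with respect to the score $z_{ik}$ reduces to $P_{ik} - Q_{ik}$. The sketch below uses hypothetical helper names (`softmax_rows`, `design`, `scores`, `cost`, `grad`) and random data, and is meant as an equivalent formulation rather than a replacement for the cells above.

```python
import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def design(XX, k):
    """Intercept plus the k-th column of every feature matrix in XX."""
    N = XX[0].shape[0]
    return np.column_stack([np.ones(N)] + [Xi[:, k] for Xi in XX])

def scores(W, XX):
    N, K = XX[0].shape
    Z = np.zeros((N, K))
    for k in range(1, K):              # class 0 is the pivot; its score stays 0
        Z[:, k] = design(XX, k) @ W[k]
    return Z

def cost(W, P, XX):
    return np.sum(P * np.log(softmax_rows(scores(W, XX))))

def grad(W, P, XX):
    R = P - softmax_rows(scores(W, XX))   # valid when rows of P sum to 1
    D = np.zeros_like(W)
    for k in range(1, W.shape[0]):
        D[k] = design(XX, k).T @ R[:, k]
    return D

rng = np.random.default_rng(0)
N, K = 20, 4
P_demo = rng.dirichlet(np.ones(K), size=N)            # rows sum to 1
XX_demo = [rng.normal(size=(N, K)) for _ in range(2)]
W_demo = np.zeros((K, 3))
G_demo = grad(W_demo, P_demo, XX_demo)
print(np.round(G_demo, 4))
```

At `W = 0` the predicted distribution is uniform, so the intercept column of the gradient is just the column sums of `P` minus `N / K`.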

In [548]:
def check_grad(f, x0, eps=1e-10):
    f0 = f(x0)
    n = x0.size
    g = np.zeros_like(x0)
    for i in range(n):
        dt = np.zeros_like(x0)
        dt[i] += eps
        f1 = f(x0 + dt)
        g[i] = (f1 - f0) / eps
        
    return g

In [549]:
wshape = W0.shape
f = lambda w: -multinom_cost(w.reshape(wshape), P, X1, X2)
g = lambda w: -multinom_grad(w.reshape(wshape), P, X1, X2).ravel()

In [551]:
check_grad(f, W0.ravel(), 1e-8).reshape(wshape)

array([[   0.    ,    0.    ,    0.    ],
       [ -51.7189,    8.9617,   48.1461],
       [ -33.5423,   14.7138,   29.7908],
       [ -26.2598,  -13.4334,   -7.1831],
       [ -37.7879,  -33.071 ,  -68.7322],
       [  15.1784,  -44.7643,  -82.9596],
       [  12.5592,   -1.1756,  -59.9074],
       [  63.4855,  -41.0997, -105.9259]])
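
The numerical gradient above matches `multinom_grad` up to sign, as expected, because `f` is the negated cost. The `check_grad` helper uses a one-sided difference, whose accuracy is sensitive to the step size. A central difference is a common alternative, sketched here with a hypothetical name (`central_diff_grad`) and a simple test function. SciPy also ships `scipy.optimize.check_grad`, which reports the norm of the difference between an analytic and a finite-difference gradient.

```python
import numpy as np

def central_diff_grad(f, x0, eps=1e-6):
    """Approximate the gradient of f at x0 with central differences."""
    x0 = np.asarray(x0, dtype=float)
    g = np.zeros_like(x0)
    for i in range(x0.size):
        step = np.zeros_like(x0)
        step[i] = eps
        g[i] = (f(x0 + step) - f(x0 - step)) / (2.0 * eps)
    return g

# Test function with a known gradient: f(x) = sum(x^3), grad = 3 x^2.
f_cubic = lambda x: np.sum(x ** 3)
x_demo = np.array([1.0, -2.0, 0.5])

approx = central_diff_grad(f_cubic, x_demo)
exact = 3.0 * x_demo ** 2
print(approx)
print(exact)
```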

In [552]:
s = opt.minimize(f, W.ravel(), jac=g, method='BFGS')

In [553]:
W2 = s.x.reshape(wshape)
W2

array([[  0.    ,   0.    ,   0.    ],
       [  7.2443,   5.7611,   9.5801],
       [  7.3638,   5.7388,   9.9947],
       [  7.4089,   5.6485,  10.5577],
       [  7.2686,   5.6118,  10.688 ],
       [  7.375 ,   5.5861,  10.7389],
       [  7.3706,   5.5802,  10.7423],
       [  7.2817,   5.7079,  10.7069]])

In [554]:
multinom_cost(W2, P, X1, X2)

-777.60273430499296

We have a slight improvement in terms of log-likelihood (-777.6 versus -790.5 for the unadjusted one-year posterior `Q1`). Let's check the accuracy of MAP subtype estimates.

In [556]:
Q3 = multinom_pred(W2, P, X1, X2)

In [557]:
np.mean(np.argmax(P, axis=1) == np.argmax(Q1, axis=1))

0.63690476190476186

In [558]:
np.mean(np.argmax(P, axis=1) == np.argmax(Q3, axis=1))

0.6339285714285714

No improvement in MAP accuracy: 0.634 for the adjusted posterior versus 0.637 for the unadjusted one.

In [594]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Q1, axis=1))

array([[ 17,   4,   2,   1,   0,   0,   0,   0],
       [  3, 103,  21,   7,   4,   1,   0,   0],
       [  0,  24,  77,   6,   3,   0,   1,   0],
       [  0,  10,  21,  51,  21,   0,   6,   0],
       [  0,   0,   2,  26,  93,   7,   5,   3],
       [  0,   0,   2,   0,  18,  35,   6,   1],
       [  0,   1,   4,  10,  14,   7,  35,   1],
       [  0,   0,   0,   1,   1,   0,   0,  17]])

In [593]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Q3, axis=1))

array([[17,  4,  2,  1,  0,  0,  0,  0],
       [ 3, 96, 29,  6,  4,  1,  0,  0],
       [ 0, 21, 80,  7,  3,  0,  0,  0],
       [ 0, 10, 21, 57, 16,  0,  5,  0],
       [ 0,  0,  2, 28, 87, 11,  7,  1],
       [ 0,  0,  2,  0, 16, 37,  6,  1],
       [ 0,  1,  4, 11, 12,  8, 36,  0],
       [ 0,  0,  0,  1,  1,  1,  0, 16]])

In [604]:
subtypes = pd.read_csv('benchmark_pfvc_subtypes.csv')
subtypes['subtype'] = np.argmax(Q3, axis=1) + 1
subtypes.to_csv('benchmark_pfvc_1y_subtypes_multinom.csv', index=False)

In [605]:
!Rscript score_predictions.R benchmark_pfvc_1y_subtypes_multinom.csv

Loading required package: methods
Source: local data frame [4 x 2]

     bin   mae
1  (1,2]  4.83
2  (2,4]  6.78
3  (4,8]  9.19
4 (8,25] 11.06
Source: local data frame [8 x 5]

  true_subtype (1,2] (2,4] (4,8] (8,25]
1            1  5.68  9.47  7.15  16.67
2            2  3.91  4.54  6.27   6.55
3            3  4.02  5.02  6.38   6.54
4            4  4.74  6.97 10.94  14.49
5            5  5.04  7.06 10.39  15.04
6            6  4.76  8.35 10.64  10.67
7            7  6.17  9.10 12.86  19.92
8            8  5.65  6.43  4.98   1.53
Source: local data frame [8 x 9]

  true_subtype    1     2     3     4     5     6     7     8
1            1 3.60  8.80 43.35 52.91    NA    NA    NA    NA
2            2 8.02  3.33  8.27 16.43 11.93 31.94    NA    NA
3            3   NA 10.56  3.57 11.86  8.81    NA    NA    NA
4            4   NA 18.54  7.75  3.49 14.26    NA 16.51    NA
5            5   NA    NA  5.61 11.20  4.79 10.59 26.34 32.66
6            6   NA    NA 21.57    NA 12.80  3.38  9.03 1

#### Future Directions

1. Use a Bayesian multinomial logistic regression classifier to sidestep the inexpressive linear model.

2. Change the objective function to more directly reflect the cost function used to evaluate the model (i.e. not all subtype misclassification mistakes are equal, some mistakes are more costly and perhaps we'd like to reflect that in the learning procedure).

3. Incorporate additional likelihood ratios based on other longitudinally measured outcomes.
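
As a rough illustration of the second item, one simple way to make mistakes unequal is to keep the fitted posterior but choose the subtype that minimizes expected cost instead of the MAP subtype. This changes the decision rule rather than the training objective, so it is only a partial answer to that item. The function name `min_expected_cost` and the cost matrix `C_demo` below are hypothetical. In this setting the entries of the cost matrix could, for example, be derived from the error made when predicting with one subtype's trajectory for an individual of another subtype.

```python
import numpy as np

def min_expected_cost(Q, C):
    """Q: (N, K) posteriors; C[j, k]: cost of predicting k when the truth is j."""
    expected = Q @ C                  # (N, K): expected cost of each prediction
    return np.argmin(expected, axis=1)

# One individual, three subtypes; confusing subtypes 1 and 3 is very costly.
Q_demo = np.array([[0.45, 0.40, 0.15]])
C_demo = np.array([[0.0, 1.0, 10.0],
                   [1.0, 0.0, 1.0],
                   [10.0, 1.0, 0.0]])

map_choice = np.argmax(Q_demo, axis=1)
cost_choice = min_expected_cost(Q_demo, C_demo)
print(map_choice, cost_choice)
```

Here the MAP estimate picks the first subtype, while the cost-aware rule hedges toward the middle subtype because it is never far from the truth.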

### Adding Additional Markers

In [651]:
L3 = np.loadtxt('param/gi_5.0_ratios.dat')
L4 = np.loadtxt('param/pdc_5.0_ratios.dat')
L5 = np.loadtxt('param/hrt_5.0_ratios.dat')

In [652]:
#X3 = StandardScaler().fit_transform(L3)
X3 = (L3 - L3.mean()) / np.std(L3)
#X4 = StandardScaler().fit_transform(L4)
X4 = (L4 - L4.mean()) / np.std(L4)
#X5 = StandardScaler().fit_transform(L5)
X5 = (L5 - L5.mean()) / np.std(L5)

In [653]:
def multinom_pred2(W, P, XX):
    Z = np.zeros_like(P)
    N, K = Z.shape
    for k in range(1, K):
        w = W[k]
        X = np.ones(N)
        for Xi in XX:
            X = np.c_[ X, Xi[:, k] ]
        Z[:, k] = np.dot(X, w)
    
    Q = np.array([softmax.softmax_func(z) for z in Z])
    return Q

def multinom_cost2(W, P, XX):
    Q = multinom_pred2(W, P, XX)
    return np.sum(P * np.log(Q))

In [654]:
def multinom_grad2(W, P, XX):
    Z = np.zeros_like(P)
    N, K = Z.shape
    for k in range(1, K):
        w = W[k]
        X = np.ones(N)
        for Xi in XX:
            X = np.c_[ X, Xi[:, k] ]
        Z[:, k] = np.dot(X, w)
    Q = np.array([softmax.softmax_func(z) for z in Z])
        
    D = np.zeros_like(W)
    for k in range(1, K):
        for i, z in enumerate(Z):
            g = softmax.softmax_grad(z)
            x = np.array([Xi[i, k] for Xi in XX])
            x = np.r_[ 1.0, x ]
            for j in range(K):
                D[k] += P[i, j] / Q[i, j] * g[j, k] * x
            
    return D

In [655]:
W0 = np.zeros((P.shape[1], 6))
f2 = lambda w: -multinom_cost2(w.reshape(W0.shape), P, [X1, X2, X3, X4, X5])
g2 = lambda w: -multinom_grad2(w.reshape(W0.shape), P, [X1, X2, X3, X4, X5]).ravel()

In [656]:
g2(W0.ravel()).reshape(W0.shape)

array([[  -0.    ,   -0.    ,   -0.    ,   -0.    ,   -0.    ,   -0.    ],
       [ -51.7189,    8.9617,   48.1461,   -7.2826,  -24.1162,   12.8552],
       [ -33.5423,   14.7138,   29.7908,    2.1474,   -6.9574,   11.9928],
       [ -26.2598,  -13.4334,   -7.1831,  -12.1964,  -18.2787,  -15.8236],
       [ -37.7879,  -33.0711,  -68.7322,  -22.7263,  -39.2572,   -3.8876],
       [  15.1784,  -44.7643,  -82.9596,  -13.1645,  -24.4863,   -0.9232],
       [  12.5592,   -1.1756,  -59.9074,  -10.9538,  -94.7906,  -25.9305],
       [  63.4855,  -41.0997, -105.9259,   -4.0192,  -13.823 ,   -1.6348]])

In [657]:
check_grad(f2, W0.ravel(), 1e-8).reshape(W0.shape)

array([[   0.    ,    0.    ,    0.    ,    0.    ,    0.    ,    0.    ],
       [ -51.7189,    8.9617,   48.1461,   -7.2826,  -24.1162,   12.8552],
       [ -33.5423,   14.7138,   29.7908,    2.1474,   -6.9574,   11.9928],
       [ -26.2598,  -13.4334,   -7.183 ,  -12.1963,  -18.2787,  -15.8236],
       [ -37.7879,  -33.071 ,  -68.7322,  -22.7263,  -39.2572,   -3.8876],
       [  15.1784,  -44.7643,  -82.9596,  -13.1644,  -24.4863,   -0.9232],
       [  12.5592,   -1.1756,  -59.9074,  -10.9538,  -94.7906,  -25.9305],
       [  63.4855,  -41.0997, -105.9258,   -4.0192,  -13.823 ,   -1.6348]])

In [658]:
s2 = opt.minimize(f2, np.random.normal(scale=5.0, size=W0.shape).ravel(), jac=g2, method='BFGS')

In [661]:
Q5 = multinom_pred2(s2.x.reshape(W0.shape), P, [X1, X2, X3, X4, X5])

In [662]:
np.mean(np.argmax(P, axis=1) == np.argmax(Q5, axis=1))

0.68005952380952384

In [663]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Q1, axis=1))

array([[ 17,   4,   2,   1,   0,   0,   0,   0],
       [  3, 103,  21,   7,   4,   1,   0,   0],
       [  0,  24,  77,   6,   3,   0,   1,   0],
       [  0,  10,  21,  51,  21,   0,   6,   0],
       [  0,   0,   2,  26,  93,   7,   5,   3],
       [  0,   0,   2,   0,  18,  35,   6,   1],
       [  0,   1,   4,  10,  14,   7,  35,   1],
       [  0,   0,   0,   1,   1,   0,   0,  17]])

In [664]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Q5, axis=1))

array([[ 18,   4,   2,   0,   0,   0,   0,   0],
       [  3, 104,  25,   3,   2,   2,   0,   0],
       [  0,  21,  79,   9,   1,   1,   0,   0],
       [  0,   8,  18,  63,  14,   1,   5,   0],
       [  0,   2,   2,  22,  93,  11,   3,   3],
       [  0,   0,   2,   0,  12,  41,   5,   2],
       [  0,   1,   4,   9,   7,   7,  44,   0],
       [  0,   0,   0,   1,   1,   2,   0,  15]])

In [665]:
subtypes = pd.read_csv('benchmark_pfvc_subtypes.csv')
subtypes['subtype'] = np.argmax(Q5, axis=1) + 1
subtypes.to_csv('benchmark_pfvc_1y_subtypes_multinom2.csv', index=False)

In [666]:
!Rscript score_predictions.R benchmark_pfvc_1y_subtypes_multinom2.csv

Loading required package: methods
Source: local data frame [4 x 2]

     bin  mae
1  (1,2] 4.62
2  (2,4] 6.04
3  (4,8] 7.67
4 (8,25] 8.51
Source: local data frame [8 x 5]

  true_subtype (1,2] (2,4] (4,8] (8,25]
1            1  5.37  7.14  6.37  10.80
2            2  3.38  4.40  5.19   5.44
3            3  4.40  4.93  6.22   6.48
4            4  4.36  6.05  7.94   9.98
5            5  5.05  6.60  8.99  10.56
6            6  4.91  7.68 10.19   8.81
7            7  5.22  6.71  9.57  14.79
8            8  5.91  6.43  4.98   1.53
Source: local data frame [8 x 9]

  true_subtype    1     2     3     4     5     6     7     8
1            1 3.45 11.47 43.35    NA    NA    NA    NA    NA
2            2 8.33  3.28  8.40 11.29  5.48 34.50    NA    NA
3            3   NA  9.32  3.72 12.68 15.16 21.62    NA    NA
4            4   NA 15.23  8.69  3.87 13.18  4.46 11.56    NA
5            5   NA 17.31  6.91  9.75  4.37 10.71 35.43 24.56
6            6   NA    NA 21.57    NA 12.58  4.37 11.50 14.50


#### Cross-validated results with 5 years of auxiliary data

In [484]:
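# Note: sklearn.cross_validation was removed in scikit-learn 0.20. Newer releases provide
# sklearn.model_selection.KFold, which takes n_splits and yields indices via .split(X).
from sklearn.cross_validation import KFold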

In [667]:
L3 = np.loadtxt('param/gi_5.0_ratios.dat')
L4 = np.loadtxt('param/pdc_5.0_ratios.dat')
L5 = np.loadtxt('param/hrt_5.0_ratios.dat')

#X3 = StandardScaler().fit_transform(L3)
X3 = (L3 - L3.mean()) / L3.std()
#X4 = StandardScaler().fit_transform(L4)
X4 = (L4 - L4.mean()) / L4.std()
#X5 = StandardScaler().fit_transform(L5)
X5 = (L5 - L5.mean()) / L5.std()

XX     = [X1, X2, X3, X4, X5]
W0     = np.zeros((P.shape[1], len(XX) + 1))

nfolds = 10
accs   = []
WW     = []
sols   = []
Qfinal = np.zeros_like(P)

for train, test in KFold(P.shape[0], nfolds, shuffle=True, random_state=0):
    print('Starting new fold.')
    f2 = lambda w: -multinom_cost2(w.reshape(W0.shape), P[train], [Xi[train] for Xi in XX])
    g2 = lambda w: -multinom_grad2(w.reshape(W0.shape), P[train], [Xi[train] for Xi in XX]).ravel()
    s2 = opt.minimize(f2, W0.ravel(), jac=g2, method='BFGS')
    sols.append(s2)
    W  = s2.x.reshape(W0.shape)
    WW.append(W)
    Qfinal[test] = multinom_pred2(W, P[test], [Xi[test] for Xi in XX])
    accs.append(np.mean(np.argmax(P[test], axis=1) == np.argmax(Qfinal[test], axis=1)))

Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.


In [675]:
subtypes = pd.read_csv('benchmark_pfvc_subtypes.csv')
subtypes['subtype'] = np.argmax(Qfinal, axis=1) + 1
subtypes.to_csv('benchmark_pfvc_1y_subtypes_multinom3.csv', index=False)

In [676]:
!Rscript score_predictions.R benchmark_pfvc_1y_subtypes_multinom3.csv

Loading required package: methods
Source: local data frame [4 x 2]

     bin  mae
1  (1,2] 4.78
2  (2,4] 6.21
3  (4,8] 8.14
4 (8,25] 9.12
Source: local data frame [8 x 5]

  true_subtype (1,2] (2,4] (4,8] (8,25]
1            1  5.63  8.62  6.37  10.80
2            2  3.44  4.31  5.23   6.33
3            3  4.28  4.97  6.81   6.99
4            4  4.35  6.09  8.49  10.94
5            5  5.68  6.77  9.67  10.90
6            6  5.03  8.16 10.63   8.98
7            7  5.33  6.80  9.68  15.97
8            8  6.06  7.33  8.16   1.53
Source: local data frame [8 x 9]

  true_subtype    1     2     3     4     5     6     7     8
1            1 3.41 11.47 32.23    NA    NA    NA    NA    NA
2            2 8.33  3.28  9.21 11.29  6.87 31.94    NA    NA
3            3   NA  9.67  3.78 12.09    NA 21.62    NA    NA
4            4   NA 14.61  9.00  3.90 13.87  4.46 12.37    NA
5            5   NA 18.35  7.50  9.89  4.39 10.86 34.91 24.56
6            6   NA    NA 21.57    NA 12.89  4.40 11.22 14.50


#### Partial auxiliary information

In [677]:
L3 = np.loadtxt('param/gi_1.0_ratios.dat')
L4 = np.loadtxt('param/pdc_1.0_ratios.dat')
L5 = np.loadtxt('param/hrt_1.0_ratios.dat')

X3 = (L3 - L3.mean()) / np.std(L3)
X4 = (L4 - L4.mean()) / np.std(L4)
X5 = (L5 - L5.mean()) / np.std(L5)

XX = [X1, X2, X3, X4, X5]
W0 = np.zeros((P.shape[1], len(XX) + 1))

In [678]:
f2 = lambda w: -multinom_cost2(w.reshape(W0.shape), P, XX)
g2 = lambda w: -multinom_grad2(w.reshape(W0.shape), P, XX).ravel()
s2 = opt.minimize(f2, W0.ravel(), jac=g2, method='BFGS')
W  = s2.x.reshape(W0.shape)
QQ = multinom_pred2(W, P, XX)

In [679]:
s2

      jac: array([ -0.0000e+00,  -0.0000e+00,  -0.0000e+00,  -0.0000e+00,
        -0.0000e+00,  -0.0000e+00,   2.5561e-07,  -1.3180e-06,
        -1.3497e-07,   1.2260e-07,   8.1624e-08,  -6.1790e-08,
         7.5914e-07,   3.5882e-07,   2.2768e-07,  -4.4178e-07,
        -2.2847e-07,  -3.5519e-07,   5.0500e-07,   1.3016e-06,
         9.3563e-07,   1.1893e-06,   5.7149e-07,   7.3290e-08,
         5.1786e-07,   1.1884e-06,   2.9156e-07,   1.7932e-06,
         2.2108e-08,   3.3173e-07,  -2.9477e-06,  -2.3275e-06,
        -4.4286e-06,  -1.1282e-06,  -2.5045e-06,  -3.0526e-07,
         5.4301e-07,   8.4115e-07,   9.0397e-07,  -7.0955e-08,
         3.2644e-06,  -1.6482e-07,   7.3053e-07,  -6.9838e-08,
         2.5223e-06,   1.1423e-07,   1.9763e-06,  -2.1059e-07])
   status: 0
  success: True
        x: array([  0.0000e+00,   0.0000e+00,   0.0000e+00,   0.0000e+00,
         0.0000e+00,   0.0000e+00,   7.3706e+00,   5.6264e+00,
         9.6464e+00,   3.7640e-01,  -2.2715e-01,   2.5330e-01,
   

In [680]:
np.mean(np.argmax(P, axis=1) == np.argmax(QQ, axis=1))

0.63690476190476186

In [681]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(QQ, axis=1))

array([[ 16,   5,   2,   1,   0,   0,   0,   0],
       [  2, 100,  28,   6,   2,   1,   0,   0],
       [  1,  20,  80,   9,   1,   0,   0,   0],
       [  0,  11,  20,  57,  17,   2,   2,   0],
       [  0,   0,   3,  28,  86,  10,   8,   1],
       [  0,   0,   2,   0,  15,  37,   6,   2],
       [  0,   2,   3,   9,  13,   8,  37,   0],
       [  0,   0,   0,   0,   1,   2,   1,  15]])

In [431]:
W

array([[  0.00e+00,   0.00e+00,   0.00e+00,   0.00e+00],
       [  5.30e+04,  -2.38e+04,  -1.15e+05,  -7.81e+04],
       [  3.43e+04,  -2.50e+04,  -6.14e+04,  -5.27e+04],
       [  2.69e+04,   9.25e+03,   4.95e+01,  -1.42e+04],
       [  3.87e+04,   2.70e+04,   6.26e+04,   2.70e+04],
       [ -1.55e+04,   4.61e+04,   7.36e+04,   4.44e+04],
       [ -1.29e+04,   2.81e+03,   5.92e+04,   3.73e+04],
       [ -6.50e+04,   6.22e+03,   4.63e+04,   4.82e+04]])

In [422]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(QQ, axis=1))

array([[  0,  24,   0,   0,   0,   0,   0,   0],
       [  0, 139,   0,   0,   0,   0,   0,   0],
       [  0, 111,   0,   0,   0,   0,   0,   0],
       [  0,  74,   9,  25,   1,   0,   0,   0],
       [  0,  63,   9,  58,   6,   0,   0,   0],
       [  0,   6,   5,  39,  12,   0,   0,   0],
       [  0,  20,   3,  40,   9,   0,   0,   0],
       [  0,   1,   0,   5,  13,   0,   0,   0]])

In [682]:
subtypes = pd.read_csv('benchmark_pfvc_subtypes.csv')
subtypes['subtype'] = np.argmax(QQ, axis=1) + 1
subtypes.to_csv('benchmark_pfvc_1y_subtypes_multinom4.csv', index=False)

In [683]:
!Rscript score_predictions.R benchmark_pfvc_1y_subtypes_multinom4.csv

Loading required package: methods
Source: local data frame [4 x 2]

     bin   mae
1  (1,2]  4.83
2  (2,4]  6.84
3  (4,8]  9.23
4 (8,25] 10.78
Source: local data frame [8 x 5]

  true_subtype (1,2] (2,4] (4,8] (8,25]
1            1  5.68  9.47  7.15  16.67
2            2  3.80  4.56  6.26   5.65
3            3  3.98  4.95  6.46   6.63
4            4  4.82  7.18 11.04  13.11
5            5  5.26  7.38 10.38  15.44
6            6  4.56  7.47  9.64  10.67
7            7  6.37  9.87 14.09  20.49
8            8  4.93  6.43  4.98   1.53
Source: local data frame [8 x 9]

  true_subtype    1     2     3     4     5     6     7     8
1            1  3.6  8.80 43.35 52.91    NA    NA    NA    NA
2            2  8.7  3.39  7.72 17.68  9.12 31.94    NA    NA
3            3 21.3 10.37  3.62 11.56  8.25    NA    NA    NA
4            4   NA 21.20  7.53  3.65 14.24  4.33 11.24    NA
5            5   NA    NA  5.97 11.87  4.30 11.03 26.55 32.66
6            6   NA    NA 21.57    NA 12.16  3.60  9.03 1

#### Cross-validated partial information

In [728]:
L3 = np.loadtxt('param/gi_1.0_ratios.dat')
L4 = np.loadtxt('param/pdc_1.0_ratios.dat')
L5 = np.loadtxt('param/hrt_1.0_ratios.dat')
L6 = np.loadtxt('param/pv1_1.0_ratios.dat')
L7 = np.loadtxt('param/rp_1.0_ratios.dat')

X3 = (L3 - L3.mean()) / np.std(L3)
X4 = (L4 - L4.mean()) / np.std(L4)
X5 = (L5 - L5.mean()) / np.std(L5)
X6 = (L6 - L6.mean()) / np.std(L6)
X7 = (L7 - L7.mean()) / np.std(L7)

XX = [X1, X2, X3, X4, X5, X6, X7]
W0 = np.zeros((P.shape[1], len(XX) + 1))

nfolds = 10
accs   = []
WW     = []
sols   = []
Qfinal = np.zeros_like(P)

for train, test in KFold(P.shape[0], nfolds, shuffle=True, random_state=0):
    print('Starting new fold.')
    f2 = lambda w: -multinom_cost2(w.reshape(W0.shape), P[train], [Xi[train] for Xi in XX])
    g2 = lambda w: -multinom_grad2(w.reshape(W0.shape), P[train], [Xi[train] for Xi in XX]).ravel()
    s2 = opt.minimize(f2, W0.ravel(), jac=g2, method='BFGS')
    sols.append(s2)
    W  = s2.x.reshape(W0.shape)
    WW.append(W)
    Qfinal[test] = multinom_pred2(W, P[test], [Xi[test] for Xi in XX])
    accs.append(np.mean(np.argmax(P[test], axis=1) == np.argmax(Qfinal[test], axis=1)))

Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.


In [735]:
[s.success for s in sols]

[True, True, True, True, True, True, True, True, True, True]

In [736]:
np.mean(np.argmax(P, axis=1) == np.argmax(Qfinal, axis=1))

0.62648809523809523

In [737]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Q1, axis=1))

array([[ 17,   4,   2,   1,   0,   0,   0,   0],
       [  3, 103,  21,   7,   4,   1,   0,   0],
       [  0,  24,  77,   6,   3,   0,   1,   0],
       [  0,  10,  21,  51,  21,   0,   6,   0],
       [  0,   0,   2,  26,  93,   7,   5,   3],
       [  0,   0,   2,   0,  18,  35,   6,   1],
       [  0,   1,   4,  10,  14,   7,  35,   1],
       [  0,   0,   0,   1,   1,   0,   0,  17]])

In [739]:
D = confusion_matrix(np.argmax(P, axis=1), np.argmax(Q1, axis=1))

In [744]:
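A = D.astype(float) / D.sum(axis=1)[:, np.newaxis]  # row-normalize: A[i, j] = fraction of true class i predicted as j
b = np.bincount(np.argmax(P, axis=1), minlength=P.shape[1]) / float(P.shape[0])  # class frequencies of the full-information MAP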

In [748]:
c = np.dot(A, b)

In [751]:
Qconf = Q1 * (c / b)
Qconf /= Qconf.sum(axis=1)[:, np.newaxis]

In [753]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Qconf, axis=1))

array([[19,  2,  2,  1,  0,  0,  0,  0],
       [ 8, 96, 23,  7,  2,  3,  0,  0],
       [ 1, 20, 79,  7,  0,  3,  1,  0],
       [ 0, 10, 21, 54, 16,  0,  8,  0],
       [ 0,  0,  2, 33, 76, 15,  7,  3],
       [ 0,  0,  2,  0,  9, 44,  6,  1],
       [ 0,  1,  4, 10, 12, 10, 34,  1],
       [ 0,  0,  0,  1,  1,  0,  0, 17]])

In [738]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Qfinal, axis=1))

array([[ 17,   3,   3,   1,   0,   0,   0,   0],
       [  2, 102,  26,   4,   4,   1,   0,   0],
       [  0,  23,  80,   8,   0,   0,   0,   0],
       [  0,  14,  21,  51,  19,   1,   3,   0],
       [  0,   1,   3,  24,  86,   9,  10,   3],
       [  0,   0,   2,   0,  15,  37,   6,   2],
       [  0,   1,   3,  11,  13,   9,  34,   1],
       [  0,   0,   0,   1,   1,   3,   0,  14]])

In [704]:
subtypes = pd.read_csv('benchmark_pfvc_subtypes.csv')
subtypes['subtype'] = np.argmax(Qfinal, axis=1) + 1
subtypes.to_csv('benchmark_pfvc_1y_subtypes_multinom5.csv', index=False)

In [705]:
!Rscript score_predictions.R benchmark_pfvc_1y_subtypes_multinom5.csv

Loading required package: methods
Source: local data frame [4 x 2]

     bin   mae
1  (1,2]  4.95
2  (2,4]  7.02
3  (4,8]  9.41
4 (8,25] 11.32
Source: local data frame [8 x 5]

  true_subtype (1,2] (2,4] (4,8] (8,25]
1            1  5.80 10.23  7.15  16.67
2            2  3.80  4.41  6.05   6.55
3            3  3.85  4.63  6.41   6.72
4            4  5.16  7.33 11.37  13.32
5            5  5.76  8.18 11.46  18.43
6            6  4.55  8.36 10.64  10.67
7            7  5.86  9.05 12.14  16.34
8            8  6.05  7.50  6.42   1.53
Source: local data frame [8 x 9]

  true_subtype    1     2     3     4     5     6     7     8
1            1 3.60  8.60 32.23 52.91    NA    NA    NA    NA
2            2 8.26  3.36  8.37 17.82 11.93 31.94    NA    NA
3            3   NA 10.36  3.57 11.97    NA    NA    NA    NA
4            4   NA 20.33  7.75  3.70 13.53  4.46 10.72    NA
5            5   NA  9.65  8.62 13.13  4.13 12.65 26.54 29.39
6            6   NA    NA 21.57    NA 12.74  3.39  9.04 1

#### Cross-validated partial information MAP target

In [727]:
L3 = np.loadtxt('param/gi_1.0_ratios.dat')
L4 = np.loadtxt('param/pdc_1.0_ratios.dat')
L5 = np.loadtxt('param/hrt_1.0_ratios.dat')
L6 = np.loadtxt('param/pv1_1.0_ratios.dat')
L7 = np.loadtxt('param/rp_1.0_ratios.dat')

X3 = (L3 - L3.mean()) / np.std(L3)
X4 = (L4 - L4.mean()) / np.std(L4)
X5 = (L5 - L5.mean()) / np.std(L5)
X6 = (L6 - L6.mean()) / np.std(L6)
X7 = (L7 - L7.mean()) / np.std(L7)

XX = [X1, X2, X3, X4, X5, X6, X7]
W0 = np.zeros((P.shape[1], len(XX) + 1))

nfolds = 10
accs   = []
WW     = []
sols   = []
Qfinal = np.zeros_like(P)
Pmap   = np.array([softmax.onehot_encode(np.argmax(p), P.shape[1]) for p in P])
Pmap  += 1e-4
Pmap  /= Pmap.sum(axis=1)[:, np.newaxis]

for train, test in KFold(P.shape[0], nfolds, shuffle=True, random_state=0):
    print('Starting new fold.')
    f2 = lambda w: -multinom_cost2(w.reshape(W0.shape), Pmap[train], [Xi[train] for Xi in XX])
    g2 = lambda w: -multinom_grad2(w.reshape(W0.shape), Pmap[train], [Xi[train] for Xi in XX]).ravel()
    s2 = opt.minimize(f2, W0.ravel(), jac=g2, method='BFGS')
    if not s2.success:
        print('Failed.')
        break
    sols.append(s2)
    W  = s2.x.reshape(W0.shape)
    WW.append(W)
    Qfinal[test] = multinom_pred2(W, Pmap[test], [Xi[test] for Xi in XX])
    accs.append(np.mean(np.argmax(Pmap[test], axis=1) == np.argmax(Qfinal[test], axis=1)))

Starting new fold.


KeyboardInterrupt: 

In [724]:
np.mean(np.argmax(P, axis=1) == np.argmax(Qfinal, axis=1))

0.6205357142857143

In [725]:
subtypes = pd.read_csv('benchmark_pfvc_subtypes.csv')
subtypes['subtype'] = np.argmax(Qfinal, axis=1) + 1
subtypes.to_csv('benchmark_pfvc_1y_subtypes_multinom5b.csv', index=False)

In [726]:
!Rscript score_predictions.R benchmark_pfvc_1y_subtypes_multinom5b.csv

Loading required package: methods
Source: local data frame [4 x 2]

     bin   mae
1  (1,2]  5.07
2  (2,4]  7.02
3  (4,8]  9.50
4 (8,25] 11.35
Source: local data frame [8 x 5]

  true_subtype (1,2] (2,4] (4,8] (8,25]
1            1  5.80 10.23  7.15  16.67
2            2  3.82  4.51  6.34   6.89
3            3  4.91  5.48  7.51   7.46
4            4  4.98  7.41 11.10  13.18
5            5  5.67  7.84 11.39  17.70
6            6  4.68  8.10 10.33  10.33
7            7  5.77  8.78 12.16  16.34
8            8  6.51  6.70  4.98   1.53
Source: local data frame [8 x 9]

  true_subtype    1     2     3     4     5     6     7     8
1            1 3.60  8.60 32.23 52.91    NA    NA    NA    NA
2            2 8.02  3.35  8.18 18.00 11.93 31.94    NA    NA
3            3   NA 10.57  3.64 11.99  3.09    NA 20.81 52.81
4            4   NA 20.03  8.57  3.74 13.31  4.46 10.72    NA
5            5   NA 15.47  8.62 13.26  4.22 13.26 25.60 29.39
6            6   NA    NA 21.57    NA 12.68  3.73  9.13  

#### Cross-validated 2-year partial information

In [709]:
L3 = np.loadtxt('param/gi_2.0_ratios.dat')
L4 = np.loadtxt('param/pdc_2.0_ratios.dat')
L5 = np.loadtxt('param/hrt_2.0_ratios.dat')
L6 = np.loadtxt('param/pv1_2.0_ratios.dat')
L7 = np.loadtxt('param/rp_2.0_ratios.dat')

X3 = (L3 - L3.mean()) / np.std(L3)
X4 = (L4 - L4.mean()) / np.std(L4)
X5 = (L5 - L5.mean()) / np.std(L5)
X6 = (L6 - L6.mean()) / np.std(L6)
X7 = (L7 - L7.mean()) / np.std(L7)

XX = [X1, X2, X3, X4, X5, X6, X7]
W0 = np.zeros((P.shape[1], len(XX) + 1))

nfolds = 10
accs   = []
WW     = []
sols   = []
Qfinal = np.zeros_like(P)

for train, test in KFold(P.shape[0], nfolds, shuffle=True, random_state=0):
    print('Starting new fold.')
    f2 = lambda w: -multinom_cost2(w.reshape(W0.shape), P[train], [Xi[train] for Xi in XX])
    g2 = lambda w: -multinom_grad2(w.reshape(W0.shape), P[train], [Xi[train] for Xi in XX]).ravel()
    s2 = opt.minimize(f2, W0.ravel(), jac=g2, method='BFGS')
    sols.append(s2)
    W  = s2.x.reshape(W0.shape)
    WW.append(W)
    Qfinal[test] = multinom_pred2(W, P[test], [Xi[test] for Xi in XX])
    accs.append(np.mean(np.argmax(P[test], axis=1) == np.argmax(Qfinal[test], axis=1)))

Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.
Starting new fold.


In [710]:
np.mean(np.argmax(P, axis=1) == np.argmax(Qfinal, axis=1))

0.6383928571428571

In [711]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Q1, axis=1))

array([[ 17,   4,   2,   1,   0,   0,   0,   0],
       [  3, 103,  21,   7,   4,   1,   0,   0],
       [  0,  24,  77,   6,   3,   0,   1,   0],
       [  0,  10,  21,  51,  21,   0,   6,   0],
       [  0,   0,   2,  26,  93,   7,   5,   3],
       [  0,   0,   2,   0,  18,  35,   6,   1],
       [  0,   1,   4,  10,  14,   7,  35,   1],
       [  0,   0,   0,   1,   1,   0,   0,  17]])

In [712]:
confusion_matrix(np.argmax(P, axis=1), np.argmax(Qfinal, axis=1))

array([[17,  4,  2,  1,  0,  0,  0,  0],
       [ 3, 99, 29,  4,  4,  0,  0,  0],
       [ 0, 24, 72, 11,  2,  0,  2,  0],
       [ 0,  8, 19, 57, 16,  1,  8,  0],
       [ 0,  3,  3, 28, 85,  8,  4,  5],
       [ 0,  0,  2,  0, 14, 39,  5,  2],
       [ 0,  1,  2,  5,  9,  8, 47,  0],
       [ 0,  0,  0,  0,  0,  4,  2, 13]])

In [713]:
subtypes = pd.read_csv('benchmark_pfvc_subtypes.csv')
subtypes['subtype'] = np.argmax(Qfinal, axis=1) + 1
subtypes.to_csv('benchmark_pfvc_1y_subtypes_multinom6.csv', index=False)

In [714]:
!Rscript score_predictions.R benchmark_pfvc_1y_subtypes_multinom6.csv

Loading required package: methods
Source: local data frame [4 x 2]

     bin   mae
1  (1,2]  4.55
2  (2,4]  6.57
3  (4,8]  8.71
4 (8,25] 10.69
Source: local data frame [8 x 5]

  true_subtype (1,2] (2,4] (4,8] (8,25]
1            1  5.68  9.47  7.15  16.67
2            2  3.59  4.37  5.56   6.34
3            3  4.23  4.94  6.67   7.19
4            4  4.22  6.71  9.89  12.47
5            5  5.10  6.63 10.52  16.46
6            6  4.73  8.24 10.27   9.21
7            7  5.03  8.89 10.70  16.22
8            8  5.38  6.38  7.98   1.53
Source: local data frame [8 x 9]

  true_subtype    1     2     3     4     5     6     7     8
1            1 3.60  8.80 43.35 52.91    NA    NA    NA    NA
2            2 8.33  3.34  8.36 11.52 11.93    NA    NA    NA
3            3   NA 10.19  3.72 11.33  3.09    NA 24.22    NA
4            4   NA 16.59  8.22  3.15 13.02  4.46 11.46    NA
5            5   NA 15.29  5.97 10.69  4.79  8.72 24.03 29.78
6            6   NA    NA 21.57    NA 11.88  4.45 12.24 1

### Conclusions and Next Steps

One surprising result is that similar accuracy of the MAP subtype estimate does not necessarily translate into similar trajectory prediction performance. The MAP accuracies of the models above are not too different, but we still see some reasonable improvements in the prediction error (MAE).
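To illustrate how this can happen, here is a minimal sketch with made-up numbers (the confusion counts and the per-cell costs below are toy values, not results from this notebook). It assumes that each (true subtype, predicted subtype) pair carries its own average error, much like the last table printed by `score_predictions.R`. Two classifiers with identical accuracy can then have very different mean error, depending on where their mistakes fall. The helper name `accuracy_and_mean_cost` is hypothetical.

```python
import numpy as np

def accuracy_and_mean_cost(confusion, cost):
    """Return (accuracy, mean cost) for a confusion matrix of counts.

    confusion[i, j] counts individuals with true class i predicted as j;
    cost[i, j] is the average error incurred for that pair (toy values here).
    """
    confusion = np.asarray(confusion, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n = confusion.sum()
    accuracy = np.trace(confusion) / n
    mean_cost = (confusion * cost).sum() / n
    return accuracy, mean_cost

# Toy cost matrix: confusing the two extreme classes is far more expensive.
cost_toy = np.array([[1.0, 2.0, 10.0],
                     [2.0, 1.0, 2.0],
                     [10.0, 2.0, 1.0]])

# Classifier A errs toward the neighbouring class, classifier B toward the far one.
conf_a = np.array([[8, 2, 0], [0, 10, 0], [0, 2, 8]])
conf_b = np.array([[8, 0, 2], [0, 10, 0], [2, 0, 8]])

acc_a, cost_a = accuracy_and_mean_cost(conf_a, cost_toy)
acc_b, cost_b = accuracy_and_mean_cost(conf_b, cost_toy)
print(acc_a, cost_a)
print(acc_b, cost_b)
```

Both toy classifiers get the same fraction of individuals right, but B pays much more per mistake, which is the kind of gap that accuracy alone cannot show.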

There are a few outstanding issues here.

1. What is the best way to train the generative models used to compute the likelihood ratios in the discriminatively trained component?

2. Is matching the full information posterior the best way to learn the adjustment? Are there other objectives that we should consider when tuning? E.g. should we try to maximize the probability of future measurements instead?

3. Should the learning task be broken up? Estimating a multinomial distribution seems difficult, especially when working with a relatively constrained parameterization. Is there a better way to leverage the information? For example, should we estimate partitions of the subtypes instead?
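As a rough sketch of the last idea, a posterior over subtypes can be collapsed into a posterior over groups of subtypes by summing the probabilities within each group. The grouping below is purely illustrative (it is not a partition proposed in this notebook), and `collapse_posterior` is a hypothetical helper name.

```python
import numpy as np

def collapse_posterior(Q, groups):
    """Sum subtype probabilities within each group.

    Q is an (n, K) array of posteriors over K subtypes; groups is a list of
    index tuples that together partition range(K). Returns an (n, len(groups))
    array of posteriors over the groups.
    """
    Q = np.asarray(Q, dtype=float)
    return np.column_stack([Q[:, list(g)].sum(axis=1) for g in groups])

# Toy posterior over 4 subtypes and an illustrative 2-group partition.
Q_toy = np.array([[0.4, 0.3, 0.2, 0.1],
                  [0.1, 0.2, 0.3, 0.4]])
groups_toy = [(0, 1), (2, 3)]

G_toy = collapse_posterior(Q_toy, groups_toy)
print(G_toy)
```

A coarser target of this kind has fewer classes to estimate, at the price of not distinguishing subtypes within a group.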

#### Additional outstanding issues

1. Should we use a different set of weights for each likelihood ratio model, or should the models share the same weights? This issue does not come up when we adjust the weighting of evidence to decide between two classes, so it's not clear how to proceed.

2. Can the problem be thought of as one where the a priori probabilities are changing? There is a method for adjusting a classifier when the a priori class probabilities in the test set differ from those in the training set.
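For the second point, the usual Bayes-rule correction rescales each posterior by the ratio of new to old class priors and renormalizes. The sketch below assumes both sets of priors are known; in practice the test-set priors would have to be estimated. The function name and the toy numbers are hypothetical. It has the same form as the `Qconf = Q1 * (c / b)` adjustment tried earlier.

```python
import numpy as np

def adjust_for_prior_shift(Q, prior_train, prior_test):
    """Reweight posteriors Q (n, K) for a change in class priors.

    Each column k is multiplied by prior_test[k] / prior_train[k], then every
    row is renormalized to sum to one.
    """
    Q = np.asarray(Q, dtype=float)
    ratio = np.asarray(prior_test, dtype=float) / np.asarray(prior_train, dtype=float)
    Q_adj = Q * ratio
    return Q_adj / Q_adj.sum(axis=1, keepdims=True)

# Toy example: two classes, balanced in training but 20/80 at test time.
Q_toy2 = np.array([[0.5, 0.5],
                   [0.9, 0.1]])
adj = adjust_for_prior_shift(Q_toy2, prior_train=[0.5, 0.5], prior_test=[0.2, 0.8])
print(adj)
```

An uninformative posterior (first row) simply becomes the new prior, while a confident posterior (second row) is pulled toward the class that is more common at test time.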

### Bayesian Decision Theory Approach

In [820]:
C = confusion_matrix(np.argmax(P, axis=1), np.argmax(Q1, axis=1))
C

array([[ 17,   4,   2,   1,   0,   0,   0,   0],
       [  3, 103,  21,   7,   4,   1,   0,   0],
       [  0,  24,  77,   6,   3,   0,   1,   0],
       [  0,  10,  21,  51,  21,   0,   6,   0],
       [  0,   0,   2,  26,  93,   7,   5,   3],
       [  0,   0,   2,   0,  18,  35,   6,   1],
       [  0,   1,   4,  10,  14,   7,  35,   1],
       [  0,   0,   0,   1,   1,   0,   0,  17]])

In [821]:
z_true = np.argmax(P,  axis=1)
z_pred = np.argmax(Q1, axis=1)

In [822]:
X = np.c_[ np.ones(Q1.shape[0]), StandardScaler().fit_transform(Q1)[:, 1:] ]
Y = P.copy()

def make_opt_problem(predicted, z_pred=z_pred, X=X, Y=Y):
    i  = z_pred == predicted
    Xi = X[i]
    Yi = Y[i]
    W0 = np.random.normal(size=(Yi.shape[1], Xi.shape[1]))
    f  = lambda w: -sum(softmax.regression_ll(x, y, w.reshape(W0.shape)) for x, y in zip(Xi, Yi))
    g  = lambda w: -sum(softmax.regression_ll_grad(x, y, w.reshape(W0.shape)) for x, y in zip(Xi, Yi)).ravel()
    return W0, f, g

In [824]:
solutions = []
for i in range(Y.shape[1]):
    print('Fitting model {}'.format(i))
    W0, f, g = make_opt_problem(i)
    s = opt.minimize(f, W0.ravel(), jac=g, method='BFGS')
    solutions.append(s)

Fitting model 0
Fitting model 1
Fitting model 2
Fitting model 3
Fitting model 4
Fitting model 5
Fitting model 6
Fitting model 7


In [825]:
[s.success for s in solutions]

[True, True, False, True, False, True, True, False]

In [826]:
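z_adju = z_pred.copy()
for k, s in enumerate(solutions):
    # Skip models whose optimization did not converge; keep the original prediction.
    if not s.success:
        continue

    W = s.x.reshape((len(solutions), -1))
    i = z_pred == k  # individuals whose original MAP prediction was subtype k
    Q = np.array([softmax.regression_proba(x, W) for x in X[i]])
    z_adju[i] = np.argmax(Q, axis=1)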

In [827]:
np.mean(z_true == z_pred)

0.63690476190476186

In [828]:
confusion_matrix(z_true, z_pred)

array([[ 17,   4,   2,   1,   0,   0,   0,   0],
       [  3, 103,  21,   7,   4,   1,   0,   0],
       [  0,  24,  77,   6,   3,   0,   1,   0],
       [  0,  10,  21,  51,  21,   0,   6,   0],
       [  0,   0,   2,  26,  93,   7,   5,   3],
       [  0,   0,   2,   0,  18,  35,   6,   1],
       [  0,   1,   4,  10,  14,   7,  35,   1],
       [  0,   0,   0,   1,   1,   0,   0,  17]])

In [829]:
np.mean(z_true == z_adju)

0.65327380952380953

In [830]:
confusion_matrix(z_true, z_adju)

array([[ 17,   4,   2,   0,   1,   0,   0,   0],
       [  3, 102,  25,   3,   5,   1,   0,   0],
       [  0,  20,  82,   6,   3,   0,   0,   0],
       [  0,  10,  25,  40,  29,   0,   5,   0],
       [  0,   0,   3,  11, 111,   4,   4,   3],
       [  0,   0,   2,   0,  19,  32,   8,   1],
       [  0,   1,   5,   5,  17,   5,  38,   1],
       [  0,   0,   0,   0,   2,   0,   0,  17]])

In [781]:
W = s.x.reshape(W0.shape)

In [782]:
Q = np.array([softmax.regression_proba(x, W) for x in X])

In [783]:
Q.shape

(53, 8)

In [790]:
np.argmax(Q, axis=1)

array([3, 6, 6, 6, 6, 6, 6, 4, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
       6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
       6, 3, 6, 6, 6, 6, 6])

In [792]:
(z_true[i] == np.argmax(Q, axis=1)).mean()

0.71698113207547165

In [793]:
(z_true[i] == z_pred[i]).mean()

0.660377358490566

In [794]:
z_tmp = z_pred.copy()
z_tmp[i] = np.argmax(Q, axis=1)
confusion_matrix(z_true, z_tmp)

array([[ 17,   4,   2,   1,   0,   0,   0,   0],
       [  3, 103,  21,   7,   4,   1,   0,   0],
       [  0,  24,  77,   7,   3,   0,   0,   0],
       [  0,  10,  21,  53,  21,   0,   4,   0],
       [  0,   0,   2,  26,  94,   7,   4,   3],
       [  0,   0,   2,   0,  18,  35,   6,   1],
       [  0,   1,   4,  10,  14,   7,  35,   1],
       [  0,   0,   0,   1,   1,   0,   0,  17]])

In [831]:
z_pred1 = np.argsort(Q1, axis=1)[:, -1]
z_pred2 = np.argsort(Q1, axis=1)[:, -2]

In [837]:
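# Oracle combination: use the second most likely subtype whenever the top one is wrong.
# This relies on z_true, so it measures top-2 accuracy rather than a usable prediction.
correct = z_true == z_pred1
z_comb = z_pred1.copy()
z_comb[~correct] = z_pred2[~correct]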

In [840]:
(z_comb == z_true).mean()

0.84970238095238093
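Because `z_comb` switches to the second choice only when the first is wrong, the number above is the fraction of individuals whose true subtype is among the two most probable subtypes under `Q1`, i.e. the top-2 accuracy. A minimal sketch of the general top-k version, on toy inputs (the helper name `top_k_accuracy` and the example arrays are hypothetical, and ties are broken arbitrarily by `argsort`):

```python
import numpy as np

def top_k_accuracy(Q, z_true, k=2):
    """Fraction of rows whose true class is among the k most probable classes."""
    Q = np.asarray(Q, dtype=float)
    z_true = np.asarray(z_true)
    top_k = np.argsort(Q, axis=1)[:, -k:]            # indices of the k largest entries per row
    hits = (top_k == z_true[:, np.newaxis]).any(axis=1)
    return hits.mean()

# Toy posterior over 3 classes for 3 individuals.
Q_toy3 = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.2, 0.7]])
z_toy = np.array([1, 2, 0])

top1 = top_k_accuracy(Q_toy3, z_toy, k=1)
top2 = top_k_accuracy(Q_toy3, z_toy, k=2)
print(top1, top2)
```

Reading the result as a top-2 accuracy makes clear that it is an upper bound on what any rule choosing between the two leading subtypes could achieve.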

In [841]:
confusion_matrix(z_true, z_comb)

array([[ 20,   0,   1,   0,   3,   0,   0,   0],
       [  0, 129,   3,   2,   3,   2,   0,   0],
       [  2,   1, 101,   3,   1,   3,   0,   0],
       [  1,   7,   9,  86,   2,   2,   2,   0],
       [  0,   0,   1,   1, 121,   3,   7,   3],
       [  0,   0,   0,   3,   4,  53,   2,   0],
       [  0,   3,   2,   8,   8,   7,  44,   0],
       [  0,   0,   0,   0,   0,   1,   1,  17]])

In [842]:
subtypes = pd.read_csv('benchmark_pfvc_subtypes.csv')
subtypes['subtype'] = z_comb + 1
subtypes.to_csv('benchmark_pfvc_1y_subtypes_combined.csv', index=False)

In [843]:
!Rscript score_predictions.R benchmark_pfvc_1y_subtypes_combined.csv

Loading required package: methods
Source: local data frame [4 x 2]

     bin  mae
1  (1,2] 4.41
2  (2,4] 5.31
3  (4,8] 7.16
4 (8,25] 8.19
Source: local data frame [8 x 5]

  true_subtype (1,2] (2,4] (4,8] (8,25]
1            1  6.35  8.62  6.73  13.28
2            2  3.46  3.56  4.44   5.84
3            3  4.09  4.34  5.27   5.79
4            4  4.09  4.91  7.11   7.72
5            5  4.93  6.34  8.94   9.85
6            6  4.30  5.64  7.58  10.63
7            7  5.36  7.16 12.43  13.88
8            8  4.30  5.54  3.48   1.53
Source: local data frame [8 x 9]

  true_subtype     1     2     3     4     5     6     7     8
1            1  3.25    NA 16.09    NA 44.00    NA    NA    NA
2            2    NA  3.20  8.81 12.49 14.78 36.26    NA    NA
3            3 19.12  6.00  3.60 12.95  2.36 24.89    NA    NA
4            4 26.12 15.55 10.34  3.92  7.46  4.33 18.26    NA
5            5    NA    NA  6.73 18.90  4.27 14.99 25.58 31.99
6            6    NA    NA    NA  4.58 17.02  3.67 12.98