# Bias-Variance

... ongoing ...

Based on a few papers [cite], the methodology is as follows.

The loss of an algorithm (not of a model!) can be decomposed as:
$$L(A) = Bias(A) + Variance(A) + Noise(A)$$
where:
 - Bias is how far the average prediction (over different training sets) is from the ideal prediction
 - Variance is how sensitive the prediction is to changes in the training set (overfitting)
 - Noise is the irreducible error in the dataset (learner independent)

Given a dataset $D$ and a number of iterations $l$, we need to produce $l$ score predictions
for each element in $D$, using $l$ different models. If the loss function $L$ is MSE, the decomposition is well defined.
If the loss function is based on NDCG, we found no previous work.

  0. Shuffle $D$ and split it into two halves $D_1$ and $D_2$ (with NDCG the split should be query-aware, i.e. all documents of a query stay in the same half)
  0. Learn a model from $D_1$ and apply it to $D_2$
  0. Learn a model from $D_2$ and apply it to $D_1$
  0. Store the scores produced so far (one score for each element of $D$)
  0. Repeat all of the above $l$ times
  0. For each document/query compute the average prediction by averaging the $l$ stored scores (for ranking this is actually not defined; we may average the predicted scores and then rank accordingly)
  0. The Bias of a query/document is the loss between the ideal prediction and the average prediction
  0. Average the above over every query/document to obtain the final Bias
  0. The Variance of a query/document is the average loss between the average prediction and each of the $l$ predictions
  0. Average the above over every query/document to obtain the final Variance
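
The steps above can be sketched for the MSE case as follows. This is only a minimal illustration, not the `rankeval` implementation: `fit_predict` (ordinary least squares) and `bias_variance_mse` are hypothetical names, the data is synthetic, and the split is a plain shuffle (not query-aware).

```python
import numpy as np

def fit_predict(train_X, train_y, test_X):
    """Hypothetical stand-in learner: ordinary least squares with an intercept."""
    A = np.column_stack([train_X, np.ones(len(train_X))])
    w, *_ = np.linalg.lstsq(A, train_y, rcond=None)
    return np.column_stack([test_X, np.ones(len(test_X))]) @ w

def bias_variance_mse(X, y, algo, l=10, seed=0):
    """Estimate (loss, bias, variance) of `algo` under MSE with l repeated half splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = np.empty((l, n))                      # one row of predictions per repetition
    for i in range(l):
        perm = rng.permutation(n)                  # shuffle D
        d1, d2 = perm[: n // 2], perm[n // 2:]     # two halves D_1 and D_2
        scores[i, d2] = algo(X[d1], y[d1], X[d2])  # learn on D_1, apply to D_2
        scores[i, d1] = algo(X[d2], y[d2], X[d1])  # learn on D_2, apply to D_1
    avg = scores.mean(axis=0)                      # average prediction per element
    loss = np.mean((scores - y) ** 2)              # average loss of the l models
    bias = np.mean((avg - y) ** 2)                 # loss of the average prediction
    variance = np.mean((scores - avg) ** 2)        # spread around the average prediction
    return loss, bias, variance

# Synthetic regression data, only for illustration
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

loss, bias, variance = bias_variance_mse(X, y, fit_predict, l=5)
print(loss, bias, variance)
```

With MSE the three numbers should satisfy loss = bias + variance up to floating point error. Since the noise is not estimated, it ends up inside the bias term.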
 
Usually Noise is assumed to be 0, as it is impossible to estimate it.
As a consequence, Variance is sometimes defined as $L(A)-Bias(A)$.

With Noise assumed to be 0, $L(A)$ should be equal to $Bias(A) + Variance(A)$. Is this true for NDCG? Probably not ...
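
For MSE the equality can be checked directly for a single element. The symbols here are introduced only for this derivation: $y$ is the ideal prediction, $p_1, \dots, p_l$ are the $l$ predictions and $\bar{p} = \frac{1}{l}\sum_{i=1}^{l} p_i$ is their average. Writing $y - p_i = (y - \bar{p}) + (\bar{p} - p_i)$ and expanding the square gives:

$$\frac{1}{l}\sum_{i=1}^{l}(y - p_i)^2 = (y - \bar{p})^2 + \frac{2}{l}(y - \bar{p})\sum_{i=1}^{l}(\bar{p} - p_i) + \frac{1}{l}\sum_{i=1}^{l}(\bar{p} - p_i)^2$$

The middle term is zero because $\sum_{i=1}^{l}(\bar{p} - p_i) = 0$, so:

$$\frac{1}{l}\sum_{i=1}^{l}(y - p_i)^2 = (y - \bar{p})^2 + \frac{1}{l}\sum_{i=1}^{l}(p_i - \bar{p})^2$$

that is, loss = bias + variance for that element. The argument relies on the loss being a square of a difference of predictions, which is not the case for NDCG.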

In principle, we may provide just the Loss and the Bias, and let the user compute variance by difference.

We might probably solve the ranking problem by saying that $L$ is not $1-NDCG$, but rather $(1-NDCG)^2$.
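
A rough sketch of what this squared loss could look like for a single query. The helpers `dcg_at_k`, `ndcg_at_k` and `squared_ndcg_loss` are hypothetical and not part of `rankeval`; the exponential gain $2^{label}-1$ and the choice of NDCG = 1 for queries without relevant documents are assumptions. Bias is computed on the ranking induced by the averaged scores, and variance is obtained by difference.

```python
import numpy as np

def dcg_at_k(labels_in_rank_order, k):
    """DCG@k with exponential gain (assumption) and log2 position discount."""
    labels = np.asarray(labels_in_rank_order, dtype=float)[:k]
    discounts = np.log2(np.arange(2, labels.size + 2))
    return np.sum((2.0 ** labels - 1.0) / discounts)

def ndcg_at_k(labels, scores, k=10):
    """NDCG@k of the ranking obtained by sorting documents by decreasing score."""
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ideal = dcg_at_k(np.sort(labels)[::-1], k)
    if ideal == 0.0:
        return 1.0  # assumption: a query without relevant documents counts as perfect
    return dcg_at_k(labels[order], k) / ideal

def squared_ndcg_loss(labels, scores, k=10):
    """Per-query loss (1 - NDCG@k)^2."""
    return (1.0 - ndcg_at_k(labels, scores, k)) ** 2

# One query with 4 documents, scored by l = 3 different models (made-up numbers)
labels = np.array([2, 1, 0, 0])
runs = np.array([[0.9, 0.2, 0.4, 0.1],
                 [0.3, 0.8, 0.1, 0.2],
                 [0.7, 0.6, 0.2, 0.5]])

loss = np.mean([squared_ndcg_loss(labels, s) for s in runs])  # average loss of the l models
bias = squared_ndcg_loss(labels, runs.mean(axis=0))           # loss of the averaged scores
variance = loss - bias                                        # variance by difference
print(loss, bias, variance)
```

Whether a variance computed directly on rankings would match this difference is exactly the open question above; the sketch does not settle it.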

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

from rankeval.core.dataset import Dataset
from rankeval.core.metrics import NDCG
from rankeval.core.metrics.rmse import RMSE
from rankeval.core.metrics.metric import Metric
from rankeval.analysis.statistical import bias_variance
import numpy as np
import math
import lightgbm
#from __future__ import print_function

from matplotlib import pyplot as plt

In [2]:
dataset = Dataset.load("/Users/claudio/docs/LAVORO/coding/ltr/QuickScorer/debug_data/msn1.fold1.test.5k.txt", name="msn1")

In [3]:
def lgbm_algo(train_X, train_Y, train_q, test_X):
    """Train a LightGBM ranker (lambdarank objective) and return its scores on test_X."""
    params = {'num_leaves': 128, 'objective': 'lambdarank',
              'learning_rate': 0.01, 'max_bin': 1024}

    # train_q is passed as LightGBM's `group` argument,
    # which expects the number of documents of each query
    training = lightgbm.Dataset(data=train_X, label=train_Y, group=train_q)

    bst = lightgbm.train(params, training, num_boost_round=10)

    # check that the number of trees and the other params are correct
    # print(len(bst.dump_model()[u'tree_info']))

    return bst.predict(test_X)

In [15]:
bias_variance(dataset, algo=lgbm_algo, metric="mse", L=5, k=3, verbose=2)

 + Dataset scoring 0 of 5
 - Processing fold 0 of 3
 - Processing fold 1 of 3
 - Processing fold 2 of 3
 + Dataset scoring 1 of 5
 - Processing fold 0 of 3
 - Processing fold 1 of 3
 - Processing fold 2 of 3
 + Dataset scoring 2 of 5
 - Processing fold 0 of 3
 - Processing fold 1 of 3
 - Processing fold 2 of 3
 + Dataset scoring 3 of 5
 - Processing fold 0 of 3
 - Processing fold 1 of 3
 - Processing fold 2 of 3
 + Dataset scoring 4 of 5
 - Processing fold 0 of 3
 - Processing fold 1 of 3
 - Processing fold 2 of 3


(1.0553246, 1.0552039, 0.00012066727)

In [10]:
a = bias_variance(dataset, algo=lgbm_algo, metric=NDCG(cutoff=10), L=10, k=2)

(' + Dataset scoring', 0, 'of', 10)
(' + Dataset scoring', 1, 'of', 10)
(' + Dataset scoring', 2, 'of', 10)
(' + Dataset scoring', 3, 'of', 10)
(' + Dataset scoring', 4, 'of', 10)
(' + Dataset scoring', 5, 'of', 10)
(' + Dataset scoring', 6, 'of', 10)
(' + Dataset scoring', 7, 'of', 10)
(' + Dataset scoring', 8, 'of', 10)
(' + Dataset scoring', 9, 'of', 10)


In [8]:
print (a)

(0.42988306, 0.41835901, 0.011524014)


In [9]:
m = NDCG(cutoff=10)