## Moment matching for averages 

In [1]:
import numpy as np
import pandas as pd

from pandas import Series
from pandas import DataFrame

from numpy.random import random

from plotnine import *

# Local imports
from convenience import *

In [2]:
np.random.seed(1351430195)

## Crossvalidation error as a double average

To derive naive confidence intervals, we only have to assume that:
* all crossvalidation splits are independent
* for each split we get a model with the same risk (true test error)

For clarity, let us consider 10-fold crossvalidation on a 1000-element dataset.
* Let the true test error be $0.75$.
* Then each prediction is wrong with probability $0.75$, i.e., each observation is 1 with probability $0.75$.
* The test error for each split is just the mean of a 100-element coin-toss sequence.

In [3]:
# Observations on a single fold (only 10 elements, for brevity)
display(1*(random(10) <= 0.75))

# Observations for all folds (again 10 elements per fold, for brevity)
display([1*(random(10) <= 0.75) for test_fold in range(10)])

# Corresponding dataframe
df = DataFrame([[test_fold, 1*(random(100) <= 0.75)] for test_fold in range(10)], 
               columns=['test_fold', 'observations'])
display(df)

array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1])

[array([1, 1, 1, 1, 0, 1, 0, 1, 0, 1]),
 array([1, 0, 0, 1, 1, 1, 1, 0, 1, 1]),
 array([1, 1, 1, 1, 0, 1, 1, 1, 1, 1]),
 array([1, 0, 1, 1, 0, 1, 1, 1, 1, 1]),
 array([0, 1, 1, 1, 1, 0, 1, 1, 1, 1]),
 array([1, 1, 1, 1, 1, 0, 0, 1, 1, 1]),
 array([0, 1, 1, 0, 1, 1, 1, 0, 1, 1]),
 array([1, 1, 1, 1, 1, 0, 1, 1, 1, 1]),
 array([0, 1, 1, 0, 1, 1, 1, 1, 1, 0]),
 array([1, 1, 1, 1, 1, 0, 1, 1, 1, 1])]

Unnamed: 0,test_fold,observations
0,0,"[1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, ..."
1,1,"[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, ..."
2,2,"[1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, ..."
3,3,"[1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..."
4,4,"[0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, ..."
5,5,"[0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, ..."
6,6,"[1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, ..."
7,7,"[1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, ..."
8,8,"[1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, ..."
9,9,"[1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, ..."


## Empirical mean and variance estimates

In [4]:
df = (df
      .assign(E=lambda df: df['observations'].apply(np.mean))
      .assign(observation_variance=lambda df: df['observations'].apply(np.var))
      .assign(observation_mean=lambda df: df['observations'].apply(np.mean)))
display(df)      

Unnamed: 0,test_fold,observations,E,observation_variance,observation_mean
0,0,"[1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, ...",0.7,0.21,0.7
1,1,"[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, ...",0.82,0.1476,0.82
2,2,"[1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, ...",0.75,0.1875,0.75
3,3,"[1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...",0.7,0.21,0.7
4,4,"[0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, ...",0.74,0.1924,0.74
5,5,"[0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, ...",0.71,0.2059,0.71
6,6,"[1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, ...",0.73,0.1971,0.73
7,7,"[1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, ...",0.78,0.1716,0.78
8,8,"[1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, ...",0.76,0.1824,0.76
9,9,"[1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, ...",0.7,0.21,0.7


* From this table it is clear that the variance of an individual observation is approximately $0.19$.

* The corresponding variance of $E$ is approximately $0.0014$.

In [10]:
display(np.mean(df['observation_variance']))
np.var(df['E'])

0.19144999999999998

0.001429000000000001
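
For reference, the first estimate can be compared with the exact value. Assuming each observation $I$ is a Bernoulli variable that equals 1 with probability $p=0.75$ (as in the simulation above), its variance is

\begin{align*}
Var(I)=p(1-p)=0.75\cdot 0.25=0.1875
\end{align*}

which agrees well with the empirical value of about $0.19$.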

## Theoretical variance estimate

* By construction, $E=\frac{1}{100}(I_1+\ldots+I_{100})$, where $I_i$ is a single observation.
* As all observations are independent, the variances add up: $Var(I_1+\ldots+I_{100})=100\cdot Var(I)$.
* A constant coefficient comes out of the variance as its square, so

\begin{align*}
Var(E)=\frac{1}{100^2}\cdot Var(I_1+\ldots+I_{100})=\frac{1}{100}\cdot Var(I)
\end{align*}

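This is roughly confirmed by our observations: with only ten folds the empirical variance of $E$ is noisy, but it has the same order of magnitude as the theoretical value.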

In [11]:
display(0.19/100)
np.var(df['E'])

0.0019

0.001429000000000001
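
Ten folds give only a rough estimate of $Var(E)$. As a sanity check, the following sketch simulates many independent folds and compares the empirical variance of the fold means with $Var(I)/100$. The number of simulated folds, the seed, and all variable names are illustrative assumptions, not part of the original experiment.

```python
import numpy as np

# Values from the running example: probability 0.75, 100 observations per fold.
p, fold_size = 0.75, 100
n_folds = 100_000  # assumption: many more folds than in real crossvalidation

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility

# Each row is one simulated fold of independent 0/1 observations.
observations = (rng.random((n_folds, fold_size)) <= p).astype(int)
fold_means = observations.mean(axis=1)

empirical_var = fold_means.var()
theoretical_var = p * (1 - p) / fold_size  # Var(I) / 100 for a Bernoulli(p) observation

print(empirical_var, theoretical_var)
```

With this many folds the two numbers should nearly coincide, which suggests that the gap seen above is sampling noise from using ten folds rather than a flaw in the formula.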

## Theoretical variance estimate for the average test error

Note that $\bar{E}=\frac{1}{10}(E_1+\cdots+E_{10})$ is just another average of independent observations with the same distribution. Thus
\begin{align*}
Var(\bar{E})=\frac{1}{10}\cdot Var(E_j)=\frac{1}{10}\cdot\frac{1}{100}\cdot Var(I_i)
= \frac{1}{1000}\cdot Var(I_i)
\end{align*}
The factor $\frac{1}{1000}$ is not a coincidence, as we get
\begin{align*}
\bar{E}=\frac{1}{1000}\cdot (I_1+\ldots+I_{1000})
\end{align*}
if we open all brackets.
Thus, under the naive assumptions, the crossvalidation error has the same variance as the error measured on a holdout sample of the same size.
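
The equivalence can be checked numerically. The sketch below repeats the simulated 10-fold experiment many times, verifies that the average of fold errors equals the mean over all 1000 observations, and compares the empirical variance of $\bar{E}$ with $Var(I)/1000$. It also builds a naive 95% interval of the form $\bar{E}\pm 1.96\sqrt{Var(\bar{E})}$, which relies on a normal approximation; that approximation, the use of the true probability in the variance, the number of repetitions, and the variable names are all assumptions made for illustration.

```python
import numpy as np

p, n_folds, fold_size = 0.75, 10, 100
n_runs = 5_000  # assumption: number of repeated simulated experiments

rng = np.random.default_rng(1)  # arbitrary seed, used only for reproducibility

# n_runs experiments, each with 10 folds of 100 independent 0/1 observations.
obs = rng.random((n_runs, n_folds, fold_size)) <= p

fold_errors = obs.mean(axis=2)        # E_1, ..., E_10 for every run
cv_errors = fold_errors.mean(axis=1)  # average of the fold errors

# The same observations viewed as a single holdout sample of 1000 elements.
holdout_errors = obs.reshape(n_runs, -1).mean(axis=1)
same_estimate = np.allclose(cv_errors, holdout_errors)

var_cv = cv_errors.var()
var_theory = p * (1 - p) / (n_folds * fold_size)  # Var(I) / 1000

# Naive 95% interval half-width under a normal approximation.
half_width = 1.96 * np.sqrt(var_theory)
coverage = np.mean(np.abs(cv_errors - p) <= half_width)

print(same_estimate, var_cv, var_theory, coverage)
```

In practice the true probability is unknown, so one would plug an estimate of $Var(I)$ into the half-width. The interval is only as trustworthy as the two independence assumptions stated at the beginning.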