# Chapter 2 Training and Testing
## Exercises

#### Exercise 2.1
1. Positive rays: break point $k = 2$ and $m_{\mathcal{H}}(k) = k+1 = 3 \lt 2^k = 4$.
1. Positive intervals: break point $k = 3$ and $m_{\mathcal{H}}(k) = {k+1 \choose 2} +1 = 7 \lt 2^k = 8$.
1. Convex sets: no break point exists. For any $k$, we can find a set of $k$ points on a circle that can be shattered. 
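
The growth-function values for the first two cases can be checked by brute force. The following is a minimal sketch, not part of the original solution: the helper names are hypothetical, and it assumes the $N$ points are distinct and sorted on the real line, so a dichotomy depends only on which gap each ray or interval endpoint falls into.

```python
from math import comb

def positive_ray_dichotomies(n):
    """Dichotomies of n sorted points under h(x) = sign(x - a)."""
    # The threshold falls in one of n + 1 gaps; points at index >= t are +1.
    return {tuple(+1 if i >= t else -1 for i in range(n)) for t in range(n + 1)}

def positive_interval_dichotomies(n):
    """Dichotomies of n sorted points under h(x) = +1 inside an interval."""
    # Both endpoints fall in the n + 1 gaps; lo == hi gives the all -1 case.
    return {
        tuple(+1 if lo <= i < hi else -1 for i in range(n))
        for lo in range(n + 1)
        for hi in range(lo, n + 1)
    }

def break_point(count_dichotomies, k_max=10):
    """Smallest k with fewer than 2^k dichotomies, or None up to k_max."""
    for k in range(1, k_max + 1):
        if count_dichotomies(k) < 2 ** k:
            return k
    return None

ray_k = break_point(lambda n: len(positive_ray_dichotomies(n)))
interval_k = break_point(lambda n: len(positive_interval_dichotomies(n)))
print("positive rays: break point", ray_k)
print("positive intervals: break point", interval_k)
```

The counts agree with $N+1$ and ${N+1 \choose 2}+1$ for every $N$ tried, which is what the break points above rely on.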

#### Exercise 2.2
1. (a) 
  * (i) $k=2$, $RHS = \sum^{1}_{i=0}{N \choose i} = N+1$, while $m_{\mathcal{H}}(N) = N+1$, so $m_{\mathcal{H}}(N) \le \sum^{k-1}_{i=0}{N \choose i}$ holds with equality.
  * (ii) $k=3$, $RHS = \sum^{2}_{i=0}{N \choose i} = \frac{N(N-1)}{2} + N+1 = \frac{N(N+1)}{2} + 1$, while $m_{\mathcal{H}}(N) = {N+1 \choose 2}+1 = \frac{N(N+1)}{2} + 1$, so $m_{\mathcal{H}}(N) \le \sum^{k-1}_{i=0}{N \choose i}$ holds with equality.
  * (iii) No break point exists, so $m_{\mathcal{H}}(N) = 2^N$. Taking the largest meaningful value $k = N+1$ gives $\sum^{N}_{i=0}{N \choose i} = 2^N$, so $m_{\mathcal{H}}(N) \le \sum^{k-1}_{i=0}{N \choose i}$ still holds.
  
1. (b) If $m_{\mathcal{H}}(N) = N+2^{\lfloor N/2 \rfloor}$, then $m_{\mathcal{H}}(3) = 5 \lt 2^3$, so the break point is $k=3$. By Theorem 2.4, we would need $m_{\mathcal{H}}(N) = N+2^{\lfloor N/2 \rfloor} \le \sum^{2}_{i=0}{N \choose i} = \frac{N(N+1)}{2} + 1$ for all $N$. This cannot hold for all $N$, since the left-hand side grows exponentially while the right-hand side grows polynomially. For example, when $N=20$ the left-hand side is $1044$ and the right-hand side is $211$. So no such hypothesis set exists.
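
The contradiction in (b) can also be located numerically. This is a small sketch under the assumption that the proposed growth function is $N + 2^{\lfloor N/2 \rfloor}$ and that the break point is $k=3$; the function names are hypothetical.

```python
from math import comb

def claimed_growth(n):
    # Growth function proposed in the exercise (assumed form, with the floor).
    return n + 2 ** (n // 2)

def polynomial_bound(n, k):
    # Theorem 2.4 bound for break point k: sum_{i=0}^{k-1} C(n, i).
    return sum(comb(n, i) for i in range(k))

def first_violation(k=3, n_max=100):
    # Smallest N at which the claimed growth function exceeds the bound.
    for n in range(1, n_max + 1):
        if claimed_growth(n) > polynomial_bound(n, k):
            return n
    return None

n_bad = first_violation()
print(n_bad, claimed_growth(n_bad), polynomial_bound(n_bad, 3))
```

Under these assumptions the bound first fails at $N=14$, where the claimed value is $142$ against a bound of $106$; $N=20$ is simply a later, more dramatic instance.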

#### Exercise 2.3
1. (i) $d_{VC} = 1$
1. (ii) $d_{VC} = 2$
1. (iii) $d_{VC} = \infty$

#### Exercise 2.4 TODO
1. (a) 

#### Exercise 2.5
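Solving equation (2.12) for $\delta$ with $N = 100$ and $\epsilon = 0.1$ gives $\delta = 4m_{\mathcal{H}}(2N)e^{-N\epsilon^2/8} \approx 709.53$. Since $1-\delta$ is negative, the bound is vacuous: it only tells us that the probability is greater than or equal to zero.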

In [13]:
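# Exercise 2.5
import numpy as np
N = 100          # sample size
d = 0.1          # tolerance epsilon
mh = 2*N + 1     # growth function evaluated at 2N
# solve (2.12) for delta: delta = 4 * m_H(2N) / exp(N * epsilon^2 / 8)
delta = 4*mh / np.exp(N * d**2 /8)
delta, np.exp(N * d**2 /8)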

(709.5275096780147, 1.1331484530668263)
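
To see how far $N=100$ is from a useful guarantee, the same formula can be scanned over $N$. This is a rough sketch, not part of the original solution: the function names are hypothetical, and it assumes the growth function $m_{\mathcal{H}}(N) = N+1$ implied by `mh = 2*N + 1` above.

```python
import math

def growth(n):
    # Growth function assumed from the cell above: m_H(N) = N + 1.
    return n + 1

def vc_delta(n, eps, growth_fn):
    """delta obtained by solving (2.12) for a tolerance eps."""
    return 4 * growth_fn(2 * n) * math.exp(-n * eps ** 2 / 8)

def smallest_informative_n(eps, growth_fn, n_max=10 ** 6):
    # Linear scan for the first N at which the bound 1 - delta is positive.
    for n in range(1, n_max + 1):
        if vc_delta(n, eps, growth_fn) < 1:
            return n
    return None

print(vc_delta(100, 0.1, growth))
print(smallest_informative_n(0.1, growth))
```

With $\epsilon = 0.1$, the scan only finds $\delta \lt 1$ for $N$ in the thousands, which is consistent with the bound being uninformative at $N=100$.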

#### Exercise 2.6
* (a) Apply the error bar in $(2.1)$, i.e. $E_{out}(g) \le E_{in}(g) + \sqrt{\frac{1}{2N}\ln{\frac{2M}{\delta}}}$. 

The calculation below gives an error bar of $0.1151$ on $E_{in}(g)$ and $0.096$ on $E_{test}(g)$. So the error bar on the in-sample error is larger than the error bar on the test error.

* (b) If we reserve more examples for testing, we have fewer examples for training. We may end up with a hypothesis that is worse than the one we would have obtained with more training examples. So $E_{test}(g)$ itself may be large even though the error bar on it is small.

In [11]:
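# Exercise 2.6
import numpy as np
delta = 0.05  # tolerance delta in the error bar (2.1)

# test bound: a single hypothesis (M = 1) on 200 test examples
N = 200
print('test bound: ', np.sqrt(np.log(2/delta)/2/N))

# train bound: M = 1000 hypotheses on 400 training examples
M = 1000
N = 400
print('train bound: ', np.sqrt(np.log(2*M/delta)/2/N))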

test bound:  0.09603227913199208
train bound:  0.11509037065006825
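
The trade-off in (b) can be made concrete by sweeping the size of the test set. This is a minimal sketch: the function name is hypothetical, and the total of 600 examples is an assumption taken from the 200 + 400 split used above.

```python
import math

def hoeffding_error_bar(n, m, delta):
    """Error bar sqrt(ln(2M/delta) / (2N)) from (2.1)."""
    return math.sqrt(math.log(2 * m / delta) / (2 * n))

TOTAL = 600   # assumed total number of examples (200 test + 400 train)
M = 1000      # number of hypotheses
DELTA = 0.05

for n_test in (50, 100, 200, 300, 400):
    n_train = TOTAL - n_test
    test_bar = hoeffding_error_bar(n_test, 1, DELTA)
    train_bar = hoeffding_error_bar(n_train, M, DELTA)
    print(f"n_test={n_test:3d}  test bar={test_bar:.4f}  train bar={train_bar:.4f}")
```

The test error bar shrinks as the test set grows, but the bars say nothing about how good $g$ is; that depends on the number of training examples left over.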


#### Exercise 2.7 
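1. (a) For binary targets in $\{0,1\}$, $(h(x)-f(x))^2$ equals $1$ when $h(x)\ne f(x)$ and $0$ otherwise, so
\begin{align}
P[h(x)\ne f(x)] &= P[h(x)\ne f(x)]\cdot 1 + P[h(x) = f(x)]\cdot0 \\
&= P[h(x)\ne f(x)]\, E[(h(x)-f(x))^2 \mid h(x)\ne f(x)] + P[h(x) = f(x)]\, E[(h(x)-f(x))^2 \mid h(x) = f(x)] \\
&= E[(h(x)-f(x))^2]
\end{align}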

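1. (b) For binary targets in $\{-1,+1\}$, $(h(x)-f(x))^2$ equals $4$ when $h(x)\ne f(x)$ and $0$ otherwise, so
\begin{align}
P[h(x)\ne f(x)] &= \frac{1}{4}P[h(x)\ne f(x)]\cdot 4 + \frac{1}{4}P[h(x) = f(x)]\cdot0 \\
&= \frac{1}{4}P[h(x)\ne f(x)]\, E[(h(x)-f(x))^2 \mid h(x)\ne f(x)] + \frac{1}{4}P[h(x) = f(x)]\, E[(h(x)-f(x))^2 \mid h(x) = f(x)] \\
&= \frac{1}{4}E[(h(x)-f(x))^2]
\end{align}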
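
Both identities are easy to check empirically. The sketch below uses independent random labels as hypothetical stand-ins for $h$ and $f$; any pair of binary functions would do, since the identities hold point by point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical h and f: independent random labels in {0, 1}.
h01 = rng.integers(0, 2, size=n)
f01 = rng.integers(0, 2, size=n)
p_err_01 = np.mean(h01 != f01)
mse_01 = np.mean((h01 - f01) ** 2)

# The same labels mapped to {-1, +1} for part (b).
hpm = 2 * h01 - 1
fpm = 2 * f01 - 1
p_err_pm = np.mean(hpm != fpm)
mse_pm = np.mean((hpm - fpm) ** 2)

print(p_err_01, mse_01)        # classification error vs. squared error
print(p_err_pm, mse_pm / 4)    # classification error vs. squared error / 4
```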

#### Exercise 2.8

1. (a) For any $x$, $\bar{g}(x)$ is an average of hypotheses in $\mathcal{H}$, weighted by the probability of each data set. This is a linear combination of members of $\mathcal{H}$, so if $\mathcal{H}$ is closed under linear combination, $\bar{g} \in \mathcal{H}$.

1. (b) Take $\mathcal{H}$ to be a set of functions that are constant on an interval and zero elsewhere, e.g. $h(x) = c$ when $x \in [a,b]$ and $h(x) = 0$ otherwise. Averaging hypotheses with different intervals generally gives a function that is not constant on any single interval, so $\bar{g}$ is not in the original hypothesis set.

1. (c) For binary classification, each $g(x)$ takes the value $+1$ or $-1$. Once weighted by probabilities, the average is in general no longer binary.
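
Part (c) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the learned hypotheses are modelled as hypothetical threshold classifiers $h_a(x) = \text{sign}(x-a)$ with $a$ drawn uniformly from $[-1,1]$, standing in for the $g$ obtained from different data sets.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 9)

# One hypothetical threshold per simulated data set.
thresholds = rng.uniform(-1.0, 1.0, size=1000)

# g[d, j] is the +/-1 prediction of the d-th hypothesis at x[j].
g = np.where(x[None, :] >= thresholds[:, None], 1.0, -1.0)

# Pointwise average over data sets, an estimate of g_bar.
g_bar = g.mean(axis=0)
is_binary = bool(np.all(np.isin(g_bar, (-1.0, 1.0))))

print(np.round(g_bar, 2))
print("g_bar is binary:", is_binary)
```

In this toy setting the average ramps from $-1$ to $+1$ across the interval, so it takes values strictly between the two labels and is not a member of the threshold model.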

#### Problem 2.1

For $\epsilon \le k$, we need $N \ge \frac{1}{2k^2}\ln\frac{2M}{\delta}$, rounded up to the next integer.
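
The sample-size condition comes from squaring the error bar in $(2.1)$. A short derivation, where $k$ is the target tolerance on $\epsilon$ and all quantities are positive:
\begin{align}
\epsilon = \sqrt{\frac{1}{2N}\ln\frac{2M}{\delta}} \le k
&\iff \frac{1}{2N}\ln\frac{2M}{\delta} \le k^2 \\
&\iff N \ge \frac{1}{2k^2}\ln\frac{2M}{\delta}
\end{align}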

In [10]:
import numpy as np

def calc_N(M, delta, k):
    # Smallest real-valued N with sqrt(ln(2M/delta) / (2N)) <= k
    return np.log(2*M/delta)/2/k/k

delta = 0.03
k = 0.05
for M in [1, 100, 10000]:
    N = calc_N(M, delta, k)
    print("Samples needed for M = {}:\tN = {}".format(M, N))


Samples needed for M = 1:	N = 839.9410155759854
Samples needed for M = 100:	N = 1760.9750527736032
Samples needed for M = 10000:	N = 2682.0090899712213


#### Problem 2.2
For $N=4$, we can pick points: $(1,3),(2,4),(3,1),(4,2)$. It's easy to see that these points are shattered by positive rectangles. So $m_{\mathcal{H}}(4) = 2^4$.

The idea is that for any two points, the rectangle having them as diagonal corners should NOT contain any other point. Otherwise, whenever the two diagonal points are labelled $+1$, the point inside is forced to $+1$ as well, which rules out labelling it $-1$.

For $N=5$, draw horizontal and vertical lines through each of the four points above, which divides the plane into a grid. The four points enclose a block of 9 grid cells. The fifth point cannot lie within this block; otherwise there is always a rectangle (spanned by two of the points) that contains the fifth point.

Likewise, if we place the fifth point outside the 9-cell block, it always lies beyond at least two of the points (in the x or y direction). These three points span a rectangle that contains another point. More generally, for any five points, a rectangle containing the leftmost, rightmost, topmost and bottommost points contains all five, so the dichotomy that labels those extreme points $+1$ and a remaining point $-1$ cannot be realized.
This shows that $m_{\mathcal{H}}(5) \lt 2^5$.

We have the VC dimension $d_{VC}(\mathcal{H}) = 4$, and $m_{\mathcal{H}}(N) \le \sum^{4}_{i=0} {N \choose i}$.
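
The shattering claims can be checked by enumeration. The sketch below is an illustration under one observation: a dichotomy is realizable by an axis-aligned rectangle exactly when the bounding box of its $+1$ points contains no $-1$ point. The function name and the candidate fifth points are hypothetical.

```python
from itertools import combinations

def shattered_by_rectangles(points):
    """True if axis-aligned positive rectangles realize every dichotomy."""
    n = len(points)
    # The all -1 dichotomy is always realizable, so start from one +1 point.
    for r in range(1, n + 1):
        for positives in combinations(range(n), r):
            xs = [points[i][0] for i in positives]
            ys = [points[i][1] for i in positives]
            # Smallest rectangle containing all +1 points.
            x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
            for j in range(n):
                if j in positives:
                    continue
                x, y = points[j]
                if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
                    return False  # a -1 point is forced inside the rectangle
    return True

four = [(1, 3), (2, 4), (3, 1), (4, 2)]
print(four, shattered_by_rectangles(four))

# A few hypothetical positions for a fifth point, inside and outside the block.
for extra in [(2.5, 2.5), (5, 2.5), (0, 0), (2.5, 5)]:
    print(extra, shattered_by_rectangles(four + [extra]))
```

The four points are reported as shattered, and none of the tried five-point sets are, in line with $d_{VC}(\mathcal{H}) = 4$.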

#### Problem 2.3
1. (a) $d_{VC}(\mathcal{H}) = 2$, $m_{\mathcal{H}}(N) = {N+1 \choose 1} + {N-1 \choose 1} = 2N$

1. (b) $d_{VC}(\mathcal{H}) = 3$, $m_{\mathcal{H}}(N) = {N+1 \choose 2}+1 +{N-1 \choose 2} = N^2 - N + 2$

1. (c) $d_{VC}(\mathcal{H}) = 2$, $m_{\mathcal{H}}(N) = {N+1 \choose 2}+1 = \frac{(N+1)N}{2} + 1$
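
The closed forms in (a) and (b) can be compared against a brute-force count. This is a minimal sketch with hypothetical helper names, assuming $N$ distinct sorted points on the line; a negative ray or interval is modelled as the sign flip of a positive one.

```python
def ray_dichotomies(n):
    """Dichotomies of n sorted points under positive or negative rays."""
    out = set()
    for t in range(n + 1):
        d = tuple(+1 if i >= t else -1 for i in range(n))
        out.add(d)
        out.add(tuple(-v for v in d))  # the matching negative ray
    return out

def interval_dichotomies(n):
    """Dichotomies of n sorted points under positive or negative intervals."""
    out = set()
    for lo in range(n + 1):
        for hi in range(lo, n + 1):
            d = tuple(+1 if lo <= i < hi else -1 for i in range(n))
            out.add(d)
            out.add(tuple(-v for v in d))  # the matching negative interval
    return out

for n in range(1, 8):
    rays = len(ray_dichotomies(n))
    intervals = len(interval_dichotomies(n))
    print(n, rays, 2 * n, intervals, n * n - n + 2)
```

Part (c) reduces to positive intervals on the radius $\sqrt{x_1^2 + \dots + x_d^2}$, so its count is the positive-interval formula already used in Exercise 2.1.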

#### Problem 2.4 TODO

#### Problem 2.5

For $N=1$, 