# Section II: Some fundamentals


In [None]:
# standard imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn
import scipy.special
# force plots to appear inline on this page
%matplotlib inline

## Overfitting and regularisation
### Overfitting 
### Penalisation
#### Lasso / ridge regression
### Randomised regularisation
#### Bootstrap / bagging

## Probabilistic approaches
#### Random variables
#### Distributions
#### Advantages of probabilistic modelling
#### Disadvantages of probabilistic modelling

## Probability theory
#### Probability as a calculus of belief
#### Axioms of probability
#### Bayes' Rule
#### Prior, likelihood, posterior
#### Integration over the evidence
##### Monte Carlo approximations
### Example: language modelling
#### n-gram character models
#### Bigram matrix
#### Sampling from the model
#### Evaluating likelihood

## Information theory
### Entropy
#### Shannon's law
#### Entropy examples
#### Mutual information
### Example: Fitts' law as an information theoretic model

## High-dimensional spaces
### What is a high-dimensional space?
### Why intuition is wrong
#### Example: cube-sphere volume
#### Example: Normal distribution in high dimensions
### Dealing with high-D problems
#### Pre-training
#### Heavier tailed distributions


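The volume of an $n$-dimensional sphere of radius $R = 1/2$ (i.e. the sphere inscribed in a unit cube) is $$ V_n(R) = \frac{\pi^{n/2}}{\Gamma(n/2+1)} R^n = \frac{\pi^{n/2}}{\Gamma(n/2+1)} \left(\frac{1}{2}\right)^n $$ The volume of the unit cube is $$1^n=1$$ in every dimension, so $V_n(1/2)$ is also the fraction of the cube that the inscribed sphere occupies.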

In [None]:
def sphere_volume(n):
    return 0.5**n * np.pi**(n/2.0) / scipy.special.gamma(n/2.0+1)

In [None]:
x = np.arange(0,20)
plt.plot(x, [sphere_volume(xi) for xi in x])
plt.xlabel("Dimension")
plt.ylabel("Volume")
plt.figure()
plt.semilogy(x, [sphere_volume(xi) for xi in x])
plt.xlabel("Dimension")
plt.ylabel("Volume")
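
The closed-form volume can be cross-checked numerically. The sketch below is not part of the original notebook: it assumes a hypothetical helper `mc_sphere_volume` that samples points uniformly in the unit cube and counts the fraction that fall within distance $1/2$ of the centre. Because the cube has volume 1, that fraction is a Monte Carlo estimate of the sphere volume. The sample count and seed are arbitrary choices.

```python
import numpy as np
import scipy.special

def sphere_volume(n):
    # closed-form volume of an n-dimensional sphere of radius 1/2
    return 0.5**n * np.pi**(n/2.0) / scipy.special.gamma(n/2.0+1)

def mc_sphere_volume(n, samples=200000, seed=0):
    # hypothetical helper (illustrative only): Monte Carlo volume estimate
    rng = np.random.default_rng(seed)
    # points uniform in the unit cube, centred on the origin
    pts = rng.uniform(-0.5, 0.5, size=(samples, n))
    # a point is inside the inscribed sphere if its squared norm is <= (1/2)**2
    inside = np.sum(pts**2, axis=1) <= 0.25
    # the cube has volume 1, so the hit fraction estimates the sphere volume
    return inside.mean()

for n in [1, 2, 3, 5, 8]:
    print(n, sphere_volume(n), mc_sphere_volume(n))
```

In higher dimensions the estimate degrades quickly, because almost no samples land inside the sphere; this is itself a symptom of the effect being illustrated.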

We can generate points uniformly at random inside a hypersphere. The hypersphere itself is hard to visualise, but we can take the radii of the sampled points and show them on a 2D circle:

In [None]:
def sphere_points(n, d):
    """Draw n points uniformly inside a d-dimensional hypersphere of radius 0.5.

    Returns the radii placed on random 2D directions, the radii themselves,
    and the full d-dimensional points.
    """
    # random directions on the unit circle (normalised Gaussian samples are uniform in direction)
    xn = np.random.normal(0,1,(n,2))
    r = np.sqrt(np.sum(xn**2, axis=1))
    surface_points = (xn.T/r).T

    # random directions on the unit d-dimensional hypersphere (uniformly)
    xv = np.random.normal(0,1,(n,d))
    r_d = np.sqrt(np.sum(xv**2, axis=1))
    d_surface_points = (xv.T/r_d).T

    # radii of points uniformly distributed in a d-dimensional hypersphere
    # of radius 0.5 can be drawn as 0.5 * u**(1/d), with u uniform on [0, 1]
    # [see: http://math.stackexchange.com/questions/87230/picking-random-points-in-the-volume-of-sphere-with-uniform-probability?rq=1 ]
    radius = np.random.uniform(0, 1, n) ** (1.0/d) * 0.5
    return (surface_points.T*radius).T, radius, (d_surface_points.T*radius).T
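
The radius formula used above follows from inverse transform sampling. As a short derivation (using $F$ for the cumulative distribution of the radius and $u$ for a uniform sample, symbols introduced here for illustration): the volume of a $d$-dimensional sphere scales as $r^d$, so for a point drawn uniformly inside a sphere of radius $R$ the probability that its radius is at most $r$ is $$ F(r) = \frac{V_d(r)}{V_d(R)} = \left(\frac{r}{R}\right)^d, \quad 0 \le r \le R $$ Setting $F(r) = u$ with $u$ uniform on $[0,1]$ and solving for $r$ gives $$ r = R\,u^{1/d} $$ which, with $R = 1/2$, is the expression `np.random.uniform(0, 1, n) ** (1.0/d) * 0.5`.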

We can define a function to plot this. We'll show the 1D distribution of radii (green, mirrored about the origin), the same radii placed on a 2D circle (default colour), and the true 2D projection of the hypersphere points, i.e. their first two coordinates (red).

In [None]:
def plot_sphere_density(d):
    sphere_pts, line_pts, hyp_pts = sphere_points(2000,d)
    
    # plot the radii as if the points lay on a 2D circle
    plt.scatter(sphere_pts[:,0], sphere_pts[:,1], alpha=0.5, s=2)    
    # plot the radii as 1D points along the positive x-axis
    plt.scatter(line_pts, np.zeros_like(line_pts), c='g', alpha=0.1, s=2)    
    # mirror the 1D radii onto the negative x-axis
    plt.scatter(-line_pts, np.zeros_like(line_pts), c='g', alpha=0.1, s=2)
    # below: the 2D projection (first two coordinates) of the hypersphere points
    if d>1:
        plt.scatter(hyp_pts[:,0], hyp_pts[:,1], c='r', alpha=0.5, s=2)
    plt.axis("equal")

In [None]:
plot_sphere_density(1)

In [None]:
plot_sphere_density(2)

In [None]:
plot_sphere_density(3)

In [None]:
plot_sphere_density(8)

In [None]:
plot_sphere_density(128)

#### Conclusion
If you look at any 2D projection of a hypersphere, the points appear increasingly concentrated at the centre as the dimension increases. But almost all of the volume actually lies in a very thin shell just inside the surface of the sphere. This is quite counter-intuitive.
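
The thin-shell claim can be quantified. Since volume scales as $r^d$, the inner sphere of relative radius $1-\epsilon$ holds a fraction $(1-\epsilon)^d$ of the volume, leaving $1-(1-\epsilon)^d$ in the outer shell. The sketch below is not from the original notebook; `shell_fraction` and `empirical_shell_fraction` are hypothetical helper names, and the 5% shell thickness is an arbitrary choice. It compares the closed form with an estimate from uniformly sampled radii.

```python
import numpy as np

def shell_fraction(d, eps=0.05):
    """Fraction of a d-dimensional sphere's volume within relative distance eps of its surface."""
    # volume scales as r**d, so the inner sphere of relative radius (1 - eps)
    # holds (1 - eps)**d of the total volume
    return 1.0 - (1.0 - eps) ** d

def empirical_shell_fraction(d, eps=0.05, n=100000, seed=0):
    """The same quantity, estimated from radii of points sampled uniformly in the sphere."""
    rng = np.random.default_rng(seed)
    # relative radii of uniform points in a d-dimensional sphere: u**(1/d), u ~ U(0, 1)
    rel_radius = rng.uniform(0, 1, n) ** (1.0 / d)
    return np.mean(rel_radius > 1.0 - eps)

for d in [1, 2, 3, 8, 128]:
    print(d, shell_fraction(d), empirical_shell_fraction(d))
```

For $d = 128$ more than 99% of the volume lies within the outer 5% of the radius, even though the 2D projection of the same points looks concentrated at the centre.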

## Kernel trick: a useful high-d space
### Kernel basics
### Linear decisions in kernel space
### Kernel functions
### Non-real data: bag-of-words


## Dealing with time-series
### Windowing
### Delay embedding
### Derivative information

### Stateful algorithms
#### Hidden Markov model
#### Kalman filter
#### Particle filter