# Foundations of Data Science - Solving the exercises

This notebook is an experiment with two purposes:

  * Evaluate whether Jupyter + GitHub is a viable blogging platform.
  * Learn more about data science.
  
The plan is to write a detailed solution for each exercise, using the embedded Python to create plots and compute numerical solutions.

## Chapter 2

### Exercise 2.1

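The first two items ask us to compute the expected values $E(x)$, $E(x^2)$, $E(x - y)$, $E(xy)$ and $E(x - y)^2$, where $E(x - y)^2$ denotes $E\left((x - y)^2\right)$ and $x$ and $y$ are taken to be independent, for two cases: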

  * When $x$ and $y$ are uniform variables in the interval $[0, 1]$.
  * When $x$ and $y$ are uniform variables in the interval $[-\frac{1}{2}, \frac{1}{2}]$.

#### Item 2.1.1

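By symmetry of the distribution around the midpoint of the interval,

$E(x) = \frac{1}{2}$,

and since $x$ and $y$ have the same distribution,

$E(x - y) = E(x) - E(y) = 0$.

Computing the other expressions directly: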

$E(x^2) = \int_0^1 dx\,x^2 = \left.\frac{x^3}{3}\right|_0^1 = \frac{1}{3}$

$E(xy) = \int_0^1 dx \int_0^1 dy\,xy = \int_0^1 dx\,x \int_0^1 dy\,y = \left(\int_0^1 dx\,x\right)^2 = \left(\left. \frac{x^2}{2}\right|_0^1\right)^2 = \left(\frac{1}{2}\right)^2 = \frac{1}{4}$

$E(x - y)^2 = E(x^2 - 2xy + y^2) = E(x^2) - 2E(xy) + E(y^2) = \frac{1}{3} - 2\cdot\frac{1}{4} + \frac{1}{3} = \frac{1}{6}$
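The integrals above can also be checked exactly, without sampling. The sketch below is only an illustration: the helper names `uniform_moment` and `exact_expectations` are made up here, and it assumes $x$ and $y$ are independent and uses the closed form $E(x^n) = \frac{b^{n+1} - a^{n+1}}{(n+1)(b-a)}$ for a uniform variable on $[a, b]$.

```python
from fractions import Fraction


def uniform_moment(n, a, b):
    """Exact n-th raw moment E(x^n) of a uniform variable on [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))


def exact_expectations(a, b):
    """Exact expectations for independent x, y, both uniform on [a, b]."""
    m1 = uniform_moment(1, a, b)
    m2 = uniform_moment(2, a, b)
    return {
        "E(x)": m1,
        "E(x^2)": m2,
        "E(x - y)": m1 - m1,                  # linearity of expectation
        "E(xy)": m1 * m1,                     # independence: E(xy) = E(x) E(y)
        "E(x - y)^2": 2 * m2 - 2 * m1 * m1,   # expand the square, then use the lines above
    }


for name, value in exact_expectations(0, 1).items():
    print(name, "=", value)
```

For the interval $[0, 1]$ this should reproduce the fractions derived by hand: $\frac{1}{2}$, $\frac{1}{3}$, $0$, $\frac{1}{4}$ and $\frac{1}{6}$.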

Using numpy and scipy to quickly check the values by sampling:

In [5]:
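from pylab import *  # star import; it also exposes numpy as np
import numpy as np  # explicit import, so the cells below do not rely on pylab for np
from scipy import stats
np.random.seed(12345678)  # fix the seed so the sampled values are reproducible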

In [6]:
x = stats.uniform.rvs(loc=0.0, scale=1.0, size=10000)
y = stats.uniform.rvs(loc=0.0, scale=1.0, size=10000)
print('E(x) ~=', np.average(x))
print('E(x^2) ~=', np.average(x ** 2))
print('E(x - y) ~=', np.average(x - y))
print('E(xy) ~=', np.average(x * y))
print('E(x - y)^2 ~=', np.average((x - y) ** 2))

E(x) ~= 0.497501152581
E(x^2) ~= 0.330487553796
E(x - y) ~= -0.00298308911664
E(xy) ~= 0.248818147307
E(x - y)^2 ~= 0.167162877


#### Item 2.1.2

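Now the distribution is symmetric around zero, so

$E(x) = 0$

and, as before,

$E(x - y) = 0$.

Because $x$ and $y$ are independent,

$E(xy) = E(x)E(y) = 0$.

By translational invariance (shifting $x$ and $y$ by the same constant leaves $x - y$ unchanged),

$E(x - y)^2 = \frac{1}{6}$, as in the previous item.

Since $E(x) = 0$, $E(x^2)$ is the variance of $x$, which we can compute directly: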

$E(x^2) = \int_{-\frac{1}{2}}^{\frac{1}{2}} dx\,x^2 = 2\int_0^{\frac{1}{2}} dx\,x^2 = 2\left.\frac{x^3}{3}\right|_0^{\frac{1}{2}} = 2\frac{\frac{1}{8}}{3} = \frac{1}{12}$.
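As a consistency check, the expansion of the square used in item 2.1.1 can be repeated with the values of this item; it reproduces the result obtained from translational invariance:

$E(x - y)^2 = E(x^2) - 2E(xy) + E(y^2) = \frac{1}{12} - 2\cdot 0 + \frac{1}{12} = \frac{1}{6}$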

In [8]:
x = stats.uniform.rvs(loc=-0.5, scale=1.0, size=10000)
y = stats.uniform.rvs(loc=-0.5, scale=1.0, size=10000)
print('E(x) ~=', np.average(x))
print('E(x^2) ~=', np.average(x ** 2))
print('E(x - y) ~=', np.average(x - y))
print('E(xy) ~=', np.average(x * y))
print('E(x - y)^2 ~=', np.average((x - y) ** 2))

E(x) ~= 0.00431562703003
E(x^2) ~= 0.08397723813
E(x - y) ~= 0.00536652416663
E(xy) ~= -2.09627695829e-06
E(x - y)^2 ~= 0.167627261595


#### Item 2.1.3

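The squared distance between two points is the sum of the squared differences of their $d$ coordinates. Every coordinate pair has the same distribution as $x$ and $y$ in item 2.1.1, so by linearity of expectation the result is $d$ times the value of $E(x - y)^2$,

$E(\mathrm{dist}^2) = \sum_{i=1}^{d} E(x_i - y_i)^2 = \frac{d}{6}$.

Checking for $d = 24$, where the exact value is $\frac{24}{6} = 4$ (the cell below prints the squared distance under the label `E(dist)`):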

In [11]:
def dist_rv(d):
    # Squared Euclidean distance between two uniform random points in [0, 1]^d.
    return np.sum((stats.uniform.rvs(loc=0.0, scale=1.0, size=d) -
                   stats.uniform.rvs(loc=0.0, scale=1.0, size=d)) ** 2)
print('E(dist) ~=', np.average([dist_rv(24) for _ in range(10000)]))

E(dist) ~= 4.00716505363
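The same check can be repeated for several dimensions to see the linear growth in $d$. This is a minimal sketch using only the standard library; the function name `mean_squared_distance` and the defaults `samples=2000` and `seed=0` are arbitrary choices made for this illustration, not part of the exercise.

```python
import random


def mean_squared_distance(d, samples=2000, seed=0):
    """Monte Carlo estimate of the expected squared distance between two
    independent uniform random points in the unit d-dimensional cube."""
    rng = random.Random(seed)  # private generator, so the result is reproducible
    total = 0.0
    for _ in range(samples):
        # One sample: sum of squared coordinate differences over d coordinates.
        total += sum((rng.random() - rng.random()) ** 2 for _ in range(d))
    return total / samples


for d in (1, 2, 6, 24):
    print("d =", d, "estimate ~=", mean_squared_distance(d), "exact =", d / 6)
```

Each estimate should land close to $\frac{d}{6}$, with a sampling error that shrinks roughly like $1/\sqrt{\text{samples}}$.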
