# Homework 10

## Problem 1: concentration of angles in high dimensions

In the workbook, we saw that for a random vector $x \in \mathbb{R}^d$ sampled uniformly at random from the unit ball $B_2 = \{x\in \mathbb{R}^d : \|x\|_2 \leq 1\}$, the distance from $x$ to the origin begins to _concentrate_ around 1 as $d$ grows. That is, the "mass" of the unit ball concentrates at the edge of the ball. We also saw that the _pairwise distance_ between two points $x_i$ and $x_j$ concentrates as $d$ gets large.

In this problem, we will see that a similar concentration phenomenon happens between angles in high dimensions. Since we will again be working with samples from the unit ball in this problem, we include the following function to sample uniformly from $B_2$.

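```python
import numpy as np

def sample_from_ball(n, d):
    # Sample n points uniformly from the d-dimensional unit ball.
    # Normalized Gaussian vectors give uniformly random directions.
    random_vec = np.random.randn(d, n)
    random_vec /= np.linalg.norm(random_vec, axis=0)
    # A radius of U**(1/d), with U uniform on [0, 1), makes the points uniform in volume.
    random_magnitude = np.random.rand(n) ** (1/d)
    # Return an (n, d) array with one sample per row.
    return (random_vec * random_magnitude).T
```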
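
As a quick usage check, the sketch below calls the sampler and confirms the shape of the output and that every sample lies inside the unit ball. The sampler is repeated so the snippet runs on its own; the seed and the choices $n=500$, $d=50$ are arbitrary and only for illustration.

```python
import numpy as np

def sample_from_ball(n, d):
    # Same sampler as above, repeated so this snippet is self-contained.
    random_vec = np.random.randn(d, n)
    random_vec /= np.linalg.norm(random_vec, axis=0)
    random_magnitude = np.random.rand(n) ** (1 / d)
    return (random_vec * random_magnitude).T

np.random.seed(0)  # arbitrary seed, only for reproducibility
points = sample_from_ball(500, 50)
norms = np.linalg.norm(points, axis=1)  # distance of each sample to the origin

print(points.shape)                 # one row per sample, one column per coordinate
print(norms.max() <= 1.0 + 1e-12)   # all samples lie in the unit ball (up to round-off)
```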

### Part A

Recall that the angle between two vectors $x$ and $y$ in $\mathbb{R}^d$ can be calculated as follows:

$$
\theta = \arccos\left(\frac{x^\top y}{\|x\|_2 \|y\|_2}\right)
$$

Write a function `angle_between_vectors(x,y)` which computes the angle between two vectors $x$ and $y$.
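
One way to sanity-check an implementation is to evaluate the formula on vectors whose angle is known in advance. The sketch below does this for three hand-picked pairs; the vectors and the helper `cosine` are illustrative assumptions, not part of the assignment. It also clips the cosine to $[-1, 1]$, since floating-point round-off can push it slightly outside that range and make `np.arccos` return `nan`.

```python
import numpy as np

# Hypothetical test vectors with known angles.
x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])  # orthogonal to x, so the angle is pi/2
z = np.array([3.0, 3.0])  # 45 degrees from x, so the angle is pi/4

def cosine(u, v):
    # Cosine of the angle between u and v: inner product over product of norms.
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# np.clip guards against round-off pushing the cosine outside [-1, 1].
theta_xy = np.arccos(np.clip(cosine(x, y), -1.0, 1.0))
theta_xz = np.arccos(np.clip(cosine(x, z), -1.0, 1.0))
theta_zz = np.arccos(np.clip(cosine(z, z), -1.0, 1.0))  # a vector with itself: angle 0

print(theta_xy, theta_xz, theta_zz)
```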

### Part B
For each $d = 3, 5, 10, 100, 1000$, sample $n=200$ points from the unit ball. For each pair of points $x_i, x_j$ with $i\neq j$, compute the angle between $x_i$ and $x_j$ using your function from Part A. Make a histogram of these angles for each value of $d$ (you can either make 5 separate plots or plot all the histograms together). Note: you should set the range of your $x$-axis to $[0,\pi]$ (if you're using matplotlib, you can do this with `plt.xlim(0,np.pi)`).
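
Looping over all pairs with the Part A function works, but it may be slow for $n=200$. A possible vectorized alternative is sketched below: normalize the rows, form the matrix of pairwise cosines, and read off the entries above the diagonal. The helper name `pairwise_angles` is hypothetical, and the sketch counts each unordered pair once (only $i<j$), which gives the same histogram shape as counting both orderings.

```python
import numpy as np

def pairwise_angles(X):
    """Angles between all distinct pairs of rows of X (hypothetical helper)."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalize each row
    cosines = np.clip(unit @ unit.T, -1.0, 1.0)          # matrix of pairwise cosines
    i, j = np.triu_indices(X.shape[0], k=1)              # index each pair i < j once
    return np.arccos(cosines[i, j])

# Check on the standard basis of R^4, where every pair is orthogonal.
X = np.eye(4)
angles = pairwise_angles(X)
print(len(angles))  # number of unordered pairs: 4 * 3 / 2
print(angles)
```

The returned one-dimensional array can be passed directly to `plt.hist`.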

### Part C
You should notice that the angles concentrate around a particular value as $d$ gets large. What value is this? What does this tell you about the relationship between two random vectors in $\mathbb{R}^d$ when $d$ is large?

## Problem 2: sampling from a high dimensional box

In this problem, we will investigate sampling from the $d$-dimensional "box" $[-1,1]^d$ (note that this is the same as the $L_\infty$ unit ball: $B_\infty = \{x\in \mathbb{R}^d : \|x\|_\infty \leq 1\}$). We will see that some of the properties of the $L_2$ ball $B_2$ translate to the box, while others do not.

### Part A
Define a function `sample_from_box(n,d)` which draws $n$ samples from the $d$-dimensional box $[-1,1]^d$. (Hint: you can use the numpy function `np.random.uniform`. Your function should return an $n\times d$ array.)
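
To illustrate the hint, the sketch below shows how the `size` argument of `np.random.uniform` controls the shape of the output. The helper `sample_uniform_array` and its interval arguments are hypothetical names used only for this example. Note that `np.random.uniform` draws from the half-open interval $[\text{low}, \text{high})$; the excluded endpoint has probability zero, so this does not affect the distribution.

```python
import numpy as np

def sample_uniform_array(n, d, low, high):
    # Hypothetical helper: an (n, d) array whose entries are drawn
    # independently and uniformly from [low, high).
    return np.random.uniform(low=low, high=high, size=(n, d))

samples = sample_uniform_array(5, 3, -1.0, 1.0)
print(samples.shape)  # one row per sample, one column per coordinate
```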

### Part B
For each $d=3,5,10,100,1000$, generate $n=1000$ samples from the $d$-dimensional box and make a histogram of the Euclidean distances of the points from the origin. Also compute the mean and variance of the distances for each $d$. What happens as $d$ gets larger? Do the distances to the origin concentrate?
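
The distance statistics can be computed in a few lines once the samples are stored as an $n\times d$ array. The sketch below uses a small hand-made array (an assumption for illustration, not box samples) so the numbers are easy to verify by hand, and it assumes "distance" means the Euclidean norm.

```python
import numpy as np

# Hypothetical 3 x 2 array standing in for n samples in d dimensions.
samples = np.array([[3.0, 4.0],
                    [0.0, 0.0],
                    [6.0, 8.0]])

distances = np.linalg.norm(samples, axis=1)  # one Euclidean distance per row
mean = distances.mean()
var = distances.var()  # population variance (ddof=0); pass ddof=1 for the sample variance

print(distances)  # distances are 5, 0 and 10
print(mean, var)  # mean is 5, variance is 50/3
```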

### Part C
Next, let's see what happens when we rescale the samples from Part B. Repeat the same steps from Part B, but this time divide each of your samples by $\sqrt{d}$. Now what happens as $d$ gets larger? Do the distances to the origin concentrate now?