# High Dimensionality Sprint Challenge

High Dimensionality is characterized by the _Curse of Dimensionality_. Humans can visualize data in 2D and 3D, but not in higher dimensions, so how do we reason about such data?

The High Dimensionality objectives are:
* Normalize a vector and represent it by its magnitude and unit direction vector
* Compute L1 and L2 distances of a pair of vectors
* Compute the distance between pairs of vectors and select the smallest

In [0]:
# LAMBDA SCHOOL
#
# MACHINE LEARNING
#
# MIT LICENSE

import numpy as np

# 1. Normalize a vector into magnitude and unit vector

In [0]:
v = np.array([17,7,5,2,1])

# m = Magnitude of v
m = np.linalg.norm(v)

# v_u = unit vector of v such that m * v_u = v
v_u = v / m

In [3]:
print(v_u)
print(m * v_u)

[0.88618626 0.36490022 0.26064302 0.10425721 0.0521286 ]
[17.  7.  5.  2.  1.]
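The two cells above can be wrapped into a small reusable helper. This is a minimal sketch: the name `normalize` and the choice to raise an error for the zero vector (which has no defined direction) are assumptions, not part of the challenge.

```python
import numpy as np

def normalize(v):
    """Split v into (magnitude, unit vector) so that magnitude * unit == v.

    Hypothetical helper: raises ValueError for the zero vector,
    whose direction is undefined.
    """
    v = np.asarray(v, dtype=float)
    magnitude = np.linalg.norm(v)  # L2 norm by default
    if magnitude == 0:
        raise ValueError("cannot normalize the zero vector")
    return magnitude, v / magnitude

m, v_u = normalize([17, 7, 5, 2, 1])
print(np.linalg.norm(v_u))                      # 1.0, up to floating-point rounding
print(np.allclose(m * v_u, [17, 7, 5, 2, 1]))   # True: magnitude * direction recovers v
```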


# 2. Compute the L1 and L2 distances of a pair of vectors

Recall that the L1 distance is the "Manhattan" distance. For the difference of two vectors, $v = a - b$, it is the sum of the absolute values of the components of $v$:

$d_{v}^{L1} = \sum_i{|v_i|}$

The L2 or "Euclidean" distance is the square root of the sum of the squares of the components of $v$:

$d_{v}^{L2} = \sqrt{\sum_i{v_i^2}}$
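Written out directly, both formulas apply elementwise to the difference vector $v = a - b$. A minimal sketch using only basic NumPy operations, with the same `a` and `b` as the next code cell:

```python
import numpy as np

a = np.array([1, 2, 0, 1, 2, 0, 1, 2])
b = np.array([9, 2, 4, 1, 1, 5, 0, 2])
v = a - b  # a distance between a and b is a norm of their difference

l1_manual = np.sum(np.abs(v))        # sum of absolute values
l2_manual = np.sqrt(np.sum(v ** 2))  # square root of the sum of squares

print(l1_manual)  # 19
print(l2_manual)  # sqrt(107), about 10.344
```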

[`np.linalg.norm`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html) has an `ord` argument that specifies the order of the norm. If `ord` is omitted, the L2 norm of a vector is computed.
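A short check of how `ord` behaves, as a sketch: for a 1-D array, `ord=1` gives the L1 norm and omitting `ord` is the same as `ord=2`. For a 2-D array, omitting `ord` returns the Frobenius norm of the whole matrix rather than one norm per row; passing `axis=1` computes a vector norm for each row instead. The matrix `M` is an arbitrary example.

```python
import numpy as np

v = np.array([-8, 0, -4, 0, 1, -5, 1, 0])  # a - b from the next code cell

print(np.linalg.norm(v, ord=1))                                 # L1 norm: 19.0
print(np.isclose(np.linalg.norm(v), np.linalg.norm(v, ord=2)))  # True: default is L2

M = np.array([[3.0, 4.0],
              [6.0, 8.0]])
print(np.linalg.norm(M))          # Frobenius norm of the whole matrix, sqrt(125)
print(np.linalg.norm(M, axis=1))  # one L2 norm per row: [ 5. 10.]
```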

In [0]:
a = np.array([1,2,0,1,2,0,1,2])
b = np.array([9,2,4,1,1,5,0,2])

# l1 = the L1 distance between a and b
l1 = np.linalg.norm(a-b, ord=1)

# l2 = the L2 distance between a and b
l2 = np.linalg.norm(a-b)

In [5]:
print('L1 distance: {}\nL2 distance: {}'.format(l1, l2))

L1 distance: 19.0
L2 distance: 10.344080432788601


# 3. Compute the distance between pairs of vectors and select the smallest

For this sprint challenge goal, you are _NOT_ required to compute the pairwise distance of every point in a set. Instead, given two separate sets of 2D points $x$ and $y$, find the row $i$ for which the distance between $x_i$ and $y_i$ is smallest.

[`np.argmin`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmin.html) returns the index of the minimum element of an array, while [`np.amin`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html) returns the value of the minimum element.
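A small illustration on a made-up array of distances (the values are arbitrary, chosen only for this example): `np.argmin` gives the position, `np.amin` gives the value, and indexing with the former recovers the latter. Note that with ties, `np.argmin` returns the first occurrence.

```python
import numpy as np

distances = np.array([3.2, 0.5, 7.1, 0.5])  # arbitrary example values

idx = np.argmin(distances)  # index of the minimum: 1 (first of the two 0.5s)
val = np.amin(distances)    # the minimum value itself: 0.5
print(idx, val)
print(distances[idx] == val)  # True
```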

The function returns both of these, so `i` and `d` can be assigned on the same line with tuple unpacking. Alternatively, we could call the function twice and index into each result separately:

```python
i = get_nearest_pair(x, y)[0]
d = get_nearest_pair(x, y)[1]
```

I chose multiple assignment because it is more efficient: the distances are computed once instead of twice.

In [0]:
x = np.array([[10.,  2.,  0.,  3., -4.,  1., -5., 10.,  7.,  6.],
              [-4.,  4., -0.,  4., -9.,  1., -2.,  2.,  1., -1.]]).T

y = np.array([[-7., 10., -8.,  7., -5., -4.,  0.,  7., -9., -2.],
              [-7.,  2.,  4.,  7., -5., 10., -2., -2.,  1., -2.]]).T

# Your code to find the pair of points at row i with the shortest distance between them 
def get_nearest_pair(x, y):
    dist_list = [np.linalg.norm(x[i]-y[i]) for i in range(x.shape[0])]
    return np.argmin(dist_list), np.amin(dist_list)

# i: row index of the shortest distance, d: distance between x_i and y_i
i, d = get_nearest_pair(x, y)

In [7]:
print(i)
print(d)
print(np.linalg.norm(x[i]-y[i]))

4
4.123105625617661
4.123105625617661
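Because `x` and `y` have the same shape, the list comprehension in `get_nearest_pair` can also be replaced by a single vectorized call. This is a sketch of an alternative, not the challenge's required solution, and the name `get_nearest_pair_vectorized` is hypothetical. `np.linalg.norm` with `axis=1` computes the L2 norm of each row of `x - y`:

```python
import numpy as np

x = np.array([[10.,  2.,  0.,  3., -4.,  1., -5., 10.,  7.,  6.],
              [-4.,  4., -0.,  4., -9.,  1., -2.,  2.,  1., -1.]]).T
y = np.array([[-7., 10., -8.,  7., -5., -4.,  0.,  7., -9., -2.],
              [-7.,  2.,  4.,  7., -5., 10., -2., -2.,  1., -2.]]).T

def get_nearest_pair_vectorized(x, y):
    """Return (row index, distance) of the closest pair x[i], y[i]."""
    dists = np.linalg.norm(x - y, axis=1)  # one Euclidean distance per row
    i = np.argmin(dists)                   # position of the smallest distance
    return i, dists[i]                     # index once instead of calling np.amin

i, d = get_nearest_pair_vectorized(x, y)
print(i, d)  # row 4, distance sqrt(17)
```

Indexing with the result of `np.argmin` avoids the second pass over the array that a separate `np.amin` call would make.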
