# L<sup>*p* </sup>-spaces and _p_-norms

Interactive visualizations of L<sup>*p* </sup>-spaces and p-norms and their use in numerical linear algebra.

### Imports

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import bqplot as bq
from ipywidgets import interactive
from mpl_toolkits.mplot3d import Axes3D
import plotly.plotly as py
import plotly.graph_objs as go
import plotly
import torch as tc

plotly.offline.init_notebook_mode(True)

%matplotlib inline

### Helper methods

In [None]:
def update_lines(change=None):
    with diag.hold_sync():
        diag.x = [np.min(scat.x), np.max(scat.x)]
        diag.y = [np.min(scat.y), np.max(scat.y)]
    with xlin.hold_sync():
        xlin.x = [np.min(scat.x), np.max(scat.x)]
        xlin.y = [np.min(scat.y), np.min(scat.y)]
    with ylin.hold_sync():
        ylin.x = [np.max(scat.x), np.max(scat.x)]
        ylin.y = [np.min(scat.y), np.max(scat.y)]
    with xlin_1.hold_sync():
        xlin_1.x = [np.min(scat.x), np.max(scat.x) / 2]
        xlin_1.y = [np.max(scat.y) / 2, np.max(scat.y) / 2]
    with ylin_1.hold_sync():
        ylin_1.x = [np.min(scat.x), np.min(scat.x)]
        ylin_1.y = [np.min(scat.y), np.max(scat.y) / 2]
    with xlin_2.hold_sync():
        xlin_2.x = [np.max(scat.x) / 2, np.max(scat.x)]
        xlin_2.y = [np.max(scat.y), np.max(scat.y)]
    with ylin_2.hold_sync():
        ylin_2.x = [np.max(scat.x) / 2, np.max(scat.x) / 2]
        ylin_2.y = [np.max(scat.y) / 2, np.max(scat.y)]

In [None]:
def plot_pnorm2d(p, r):
    pts = 2 * r * np.random.ranf((3000, 2)) - r
    idx = np.linalg.norm(pts, p, axis=1) < r
    plt.scatter(pts[idx,0], pts[idx,1])
    plt.axis('equal')

## *p*-norms

**Definition**

For a vector $x=(x_1, x_2,\ldots, x_n)$ in $\Bbb R^n$ we can calculate its length as $\sqrt{x_1^2+x_2^2+\ldots +x_n^2}$. This quantity, written $\lVert x \rVert_2$, is commonly called the Euclidean norm; it is also the 2-norm in the more general class of *p*-norms. For $p \ge 1$, the _p_-norm of a vector *x* in ${\Bbb R}^n$ is defined as follows:

$$ \lVert x\rVert_p=(|x_1|^p+|x_2|^p+\ldots +|x_n|^p)^{\frac{1}{p}} $$
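As a quick sanity check, the formula can be evaluated directly and compared with NumPy's built-in `np.linalg.norm`. The helper name `p_norm` is hypothetical and only meant as a minimal sketch of the definition above.

```python
import numpy as np

def p_norm(x, p):
    """Evaluate the p-norm formula directly (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
norm_1 = p_norm(x, 1)   # |3| + |-4|
norm_2 = p_norm(x, 2)   # sqrt(3^2 + 4^2)
norm_3 = p_norm(x, 3)   # (27 + 64)^(1/3)

print(norm_1, norm_2, norm_3)
print(np.linalg.norm(x, ord=3))  # NumPy's built-in p-norm for comparison
```

The absolute values matter: without them, the negative component would cancel part of the sum for odd $p$.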

### A Little Intuition

Before delving further into the topic of *p*-norms, here is a little example that helped me get a more visual understanding of it:

To begin, let's start in $\Bbb R^2$. If we have two points, then we can calculate the Euclidean distance between them quite easily.

The 1-norm is pretty simple too. It's just the distance you would have to travel along each axis to reach one point from another.

In [None]:
sc_x = bq.scales.LinearScale(min=0, max=10)
sc_y = bq.scales.LinearScale(min=0, max=10)

# initialize scatter and lines
scat = bq.marks.Scatter(x=[0, 10], y=[0, 10],colors=['black', 'black'], scales={'x': sc_x, 'y': sc_y},
                        enable_move=True)
diag = bq.marks.Lines(x=[], y=[], line_style='solid',scales={'x': sc_x, 'y': sc_y}, colors=['orange'])
xlin = bq.marks.Lines(x=[], y=[], line_style='solid',scales={'x': sc_x, 'y': sc_y}, colors=['red'])
ylin = bq.marks.Lines(x=[], y=[], line_style='solid',scales={'x': sc_x, 'y': sc_y}, colors=['blue'])
xlin_1 = bq.marks.Lines(x=[], y=[], line_style='dashed',scales={'x': sc_x, 'y': sc_y}, colors=['red'])
xlin_2 = bq.marks.Lines(x=[], y=[], line_style='dashed',scales={'x': sc_x, 'y': sc_y}, colors=['red'])
ylin_1 = bq.marks.Lines(x=[], y=[], line_style='dashed',scales={'x': sc_x, 'y': sc_y}, colors=['blue'])
ylin_2 = bq.marks.Lines(x=[], y=[], line_style='dashed',scales={'x': sc_x, 'y': sc_y}, colors=['blue'])

update_lines()

# update line on change of x or y of scatter
scat.observe(update_lines, names=['x'])
scat.observe(update_lines, names=['y'])
scat.update_on_move = True

ax_x = bq.axes.Axis(scale=sc_x)
ax_y = bq.axes.Axis(scale=sc_y, orientation='vertical')

fig = bq.figure.Figure(marks=[scat, diag, xlin, ylin, xlin_1, ylin_1, xlin_2, ylin_2],
                       axes=[ax_x, ax_y])
fig

Here we see that the 2-norm is just the length of the direct path from one point to another. On the other hand, the 1-norm is the sum of the components in the horizontal and vertical directions. The 1-norm is aptly named the Manhattan distance as it represents the distance a cab would have to travel from point A to B in the Manhattan grid.

Now that we have visualized the 1- and 2-norms in $\Bbb R^2$, let's try to visualize them in $\Bbb R^3$. If we extend the taxicab metaphor, we can imagine that the grid not only spans a horizontal plane but also allows for travel in the vertical direction (and we have taxicabs that can drive on it as well). In this case, the formulas for the 1- and 2-norms simply include the $x_3$ term as well.
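To put numbers on this, here is a small sketch that computes both distances with `np.linalg.norm`. The 2D points match the starting positions in the interactive plot above; the 3D points are arbitrary values chosen for illustration.

```python
import numpy as np

# Two points in the plane, matching the starting corners of the interactive plot
a2, b2 = np.array([0.0, 0.0]), np.array([10.0, 10.0])
manhattan_2d = np.linalg.norm(b2 - a2, ord=1)   # 10 + 10
euclidean_2d = np.linalg.norm(b2 - a2, ord=2)   # sqrt(10^2 + 10^2)

# The same idea in three dimensions: the formulas just gain an x_3 term
a3, b3 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 2.0])
manhattan_3d = np.linalg.norm(b3 - a3, ord=1)   # 1 + 2 + 2
euclidean_3d = np.linalg.norm(b3 - a3, ord=2)   # sqrt(1 + 4 + 4)

print(manhattan_2d, euclidean_2d)
print(manhattan_3d, euclidean_3d)
```

In both cases the taxicab distance is at least as long as the direct path, which matches the picture of driving along the grid instead of cutting across it.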

### Unit "Circle"

When thinking about unit circles in different *p*-norms, it helped (for me) to not rely on any visuals, but instead focus on the math. Traditionally, a circle is, well, a circle; but if we focus on its definition, the concept of a unit circle for different norms becomes more natural.

*Definition* 

Circle: a round plane figure whose boundary consists of points equidistant from a fixed point.

For me, the key was to stop limiting myself to thinking of equidistant from a Euclidean perspective. Instead, I convinced myself that "distance" could be measured differently: 1-norm, 2-norm, 3-norm, etc.

If we set the distance equal to 1, then depending on our definition of distance, we get different unit circles.
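One way to make this concrete is to take a set of directions and rescale each one so that its *p*-norm is exactly 1. The sketch below does that for the 1-, 2- and $\infty$-norms; `unit_circle_points` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def unit_circle_points(p, n=8):
    """Return n points whose p-norm is 1 (hypothetical helper for illustration)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    directions = np.column_stack((np.cos(theta), np.sin(theta)))
    # Dividing each direction by its own p-norm puts it at "distance" 1 from the origin
    return directions / np.linalg.norm(directions, ord=p, axis=1, keepdims=True)

circle_1 = unit_circle_points(1)
circle_2 = unit_circle_points(2)
circle_inf = unit_circle_points(np.inf)

# The 45-degree point (index 1) differs from norm to norm
print(circle_1[1], circle_2[1], circle_inf[1])
```

Along the 45-degree direction the unit "circle" sits at (0.5, 0.5) for the 1-norm, on the familiar round circle for the 2-norm, and at (1, 1) for the $\infty$-norm, which is why the shapes look like a diamond, a circle and a square.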

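Below is a visual showing "circles" of various radii for different *p*-norms. The slider also covers $p<1$; there the formula no longer satisfies the triangle inequality, so it is not a true norm, and the "circles" become non-convex, star-like shapes.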

In [None]:
f = lambda x, p: np.linalg.norm(x, ord=p, axis=2)

l = 2
dl = 0.1
X, Y = np.mgrid[-l:l:dl, -l:l:dl]
p_range = np.arange(0.1, 5.01, 0.1)

data = [go.Contour(
        z=f(np.dstack((X, Y)), p),
        x=np.arange(-l,l,dl),
        y=np.arange(-l,l,dl),
        contours=dict(
            coloring="lines",
            start=0.5,
            end=1.5,
            size=0.25
            ),
        line=dict(
            width=3
            ),
        visible=False) for p in p_range]
data[9]['visible'] = True

p_steps = []
for i in range(len(p_range)):
    step = dict(
        method='restyle',
        label="{0:0.1f}".format(p_range[i]),
        args=['visible', [False] * len(p_range)],
    )
    step['args'][1][i] = True
    p_steps.append(step)

sliders = [dict(
    active = 9,
    currentvalue = {"prefix": "p: "},
    pad = {"t": 50},
    steps = p_steps
)]

layout = go.Layout(
    height=600,
    width=600,
    sliders=sliders,
    yaxis=dict(scaleanchor="x", scaleratio=1)
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='pnorm-2d')

By adding another dimension, the unit circle can be extended to the unit sphere.
Here we have a 3D isosurface plot of the unit sphere for various values of *p*.

In [None]:
f = lambda x, p: np.linalg.norm(x, ord=p, axis=0)

l = 1.5
X, Y, Z = np.mgrid[-l:l:0.25, -l:l:0.25, -l:l:0.25]
p_range = np.arange(0.4, 5.01, 0.1)

# Plot isosurface
data = [go.Isosurface(
            visible=False,
            x=X.flatten(),
            y=Y.flatten(),
            z=Z.flatten(),
            value = f(np.vstack((X.flatten(), Y.flatten(), Z.flatten())), p),
            isomin=1,
            isomax=1,
            surface=dict(count=1, fill=1),
            showscale=False) for p in p_range]
data[6]['visible'] = True

# p-slider
p_steps = []
for i in range(len(p_range)):
    step = dict(
        method='restyle',
        label="{0:0.1f}".format(p_range[i]),
        args=['visible', [False] * len(p_range)],
    )
    step['args'][1][i] = True
    p_steps.append(step)

sliders = [dict(
    active = 6,
    currentvalue = {"prefix": "p: "},
    pad = {"t": 50},
    steps = p_steps
)]

layout = go.Layout(
    height=600,
    width=600,
    sliders=sliders
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='pnorm-3d')

### *p*-norm as *p* Goes to Infinity

Take a look back at the p-norm:
$$ \lVert x\rVert_p=(|x_1|^p+|x_2|^p+\ldots +|x_n|^p)^{\frac{1}{p}} $$

If we let $p$ grow toward $\infty$, the largest component dominates the sum and we get:
$$ \lVert x\rVert_\infty=\lim_{p\to\infty}(|x_1|^p +|x_2|^p +\ldots +|x_n|^p)^{\frac{1}{p}} = \max_i |x_i| $$

Put simply, the $L^\infty$-norm is just the largest absolute value among the components of $x$.
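A short numerical sketch shows this behaviour: as $p$ increases, the *p*-norm of a fixed vector approaches its largest absolute component. The vector `x` is an arbitrary example.

```python
import numpy as np

x = np.array([1.0, -3.0, 2.0])

# p-norms for increasing p, compared with the largest absolute component
for p in [1, 2, 4, 16, 64]:
    print(p, np.linalg.norm(x, ord=p))

norm_64 = np.linalg.norm(x, ord=64)
norm_inf = np.linalg.norm(x, ord=np.inf)   # max_i |x_i|
print(norm_inf)
```

The printed values decrease toward 3, the absolute value of the component $-3$.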

## L<sup>*p* </sup>-spaces

## Applications in Numerical Linear Algebra

### Simple Example

One of the most useful aspects of 1- and 2-norms is their use in minimization problems.

Often in linear algebra, we are given a problem $Ax=b$ where $A$ and $b$ are known but an exact solution for $x$ does not exist. This happens whenever $b$ does not lie in the column space of $A$. In these cases, we resort to finding an $\hat{x}$ that minimizes the error between $b$ and $A\hat{x}$. How we quantify that error then affects the solution we get for $\hat{x}$.
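To see that the choice of norm really changes the answer, here is a minimal sketch on a hypothetical system with three equations and one unknown. It uses a brute-force search over candidate values instead of a proper solver, purely for illustration.

```python
import numpy as np

# Hypothetical overdetermined system: three equations, one unknown, no exact solution
A = np.array([[1.0], [1.0], [1.0]])
b = np.array([0.0, 0.0, 3.0])

# Brute-force search over candidate values of x_hat
candidates = np.linspace(-1.0, 4.0, 501)
residuals = b[None, :] - candidates[:, None] * A[:, 0][None, :]

x_hat_l1 = candidates[np.argmin(np.linalg.norm(residuals, ord=1, axis=1))]
x_hat_l2 = candidates[np.argmin(np.linalg.norm(residuals, ord=2, axis=1))]

print(x_hat_l1, x_hat_l2)
```

For this particular system, minimizing the 2-norm of the error picks the mean of the entries of $b$, while minimizing the 1-norm picks their median, so the two answers differ.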

Let's first try this in 2D. We are given a point in $\Bbb R^2$ and some slope $m$, which together determine a line, and we are tasked with finding another point on that line. Here's the catch: the other point has to have the smallest l1-norm of all points on the line.

The equation of a line that passes through a point $(x,y)$ with slope $m$ is $y=mx+b$. We can rewrite this as $-mx+y=b$. In matrix form, we get:

$$\begin{bmatrix} -m & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}=b $$

Our goal is to find the solution vector $\begin{bmatrix} x \\ y \end{bmatrix}$ for which the L1-norm is minimized. If we forget about linear algebra for a moment and think about what this equation is asking, we can see that any combination of $x$ and $y$ that satisfies the equation is a valid answer. However, our goal is to find the $x$ and $y$ such that $|x|+|y|$ is minimized for the given $m$ and $b$.

In this example, we have a point (2,3) with a slope varying from -5 to 5. By playing around with the slope slider, we can visualize how the l1-minimization works for this simple problem.

In [None]:
# y-intercept b of the line through `point` with the given slope
f = lambda point, slope: point[1] - slope * point[0]

# Radius of the smallest l1 "circle" (a diamond) touching the line:
# the smaller of |y-intercept| and |x-intercept|; a horizontal line has no x-intercept
g = lambda point, slope: abs(f(point, slope)) if slope == 0 else min(abs(f(point, slope)), abs(f(point, slope) / slope))

# Initial data
pt = [2, 3]
m_range = np.arange(-5, 5.01, 0.25)
x_range = np.array([-15, 15])

# Plot line
lines = [go.Scatter(
            x=x_range,
            y=m * x_range + f(pt, m),
            visible=False) for m in m_range]
lines[10]['visible'] = True

# Plot square
sq = [go.Scatter(
            x=[g(pt, m), 0, -g(pt, m), 0, g(pt, m)],
            y=[0, g(pt, m), 0, -g(pt, m), 0],
            visible=False) for m in m_range]
sq[10]['visible'] = True

# Define steps for slope slider
m_steps = []
for i in range(len(m_range)):
    step = dict(
        method='restyle',
        label="{0:0.2f}".format(m_range[i]),  # two decimals, since the slope moves in steps of 0.25
        args=['visible', [False] * len(m_range)],
    )
    step['args'][1][i] = True
    m_steps.append(step)

# Create sliders
sliders = [dict(
    active = 10,
    currentvalue = {"prefix": "m (slope): "},
    pad = {"t": 50},
    steps = m_steps
)]

layout = go.Layout(
    height=600,
    width=600,
    sliders=sliders,
    xaxis=dict(
        range=[-15, 15]
    ),
    yaxis=dict(
        range=[-15, 15]
    ),
    showlegend=False
)

fig = dict(data=lines + sq, layout=layout)
py.iplot(fig, filename='l1opt_2d')

Though the plot does not present it well, it should be noted that when the slope of the line matches that of one of the sides of the square, there are infinitely many solutions (all $(x,y)$ that lie on that side of the square). If we were using an L2-norm, we would simply replace the square with a circle.
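The geometric picture can also be checked numerically. The sketch below assumes the same point (2, 3) and a slope of -2.5, one of the slider values, and computes the smallest-l1 and smallest-l2 points on the line in closed form. The helper `min_norm_points` is a hypothetical name introduced for this illustration.

```python
import numpy as np

def min_norm_points(point, slope):
    """Smallest-l1 and smallest-l2 points on the line through `point` (hypothetical helper)."""
    b = point[1] - slope * point[0]          # y-intercept, as in y = m*x + b
    # l1: the square first touches the line at one of its corners, i.e. at an axis intercept
    corners = [np.array([0.0, b])]
    if slope != 0:
        corners.append(np.array([-b / slope, 0.0]))
    l1_point = min(corners, key=lambda c: np.abs(c).sum())
    # l2: the circle first touches the line at the orthogonal projection of the origin
    l2_point = np.array([-slope * b, b]) / (1.0 + slope ** 2)
    return l1_point, l2_point

l1_point, l2_point = min_norm_points((2.0, 3.0), slope=-2.5)
print(l1_point, l2_point)
```

The l1 solution lands on an axis, so one of its coordinates is exactly zero, while the l2 solution has two nonzero coordinates. When the slope is exactly 1 or -1, both corners tie and the helper returns only one of the infinitely many minimizers.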

This example can be extended to three dimensions to visualize how l1 minimization can be used in higher dimensions. In three dimensions, however, the counterpart of the blue line from the two-dimensional plot can be either a line or a plane.



### Regularization

Another popular use of L1 and L2-norms is regularization.

$$  $$

In [28]:
def landscape(A, b, x=(-5, 5), y=(-5, 5), step=0.1, p=2):
    # Grid of candidate solutions; xy has shape (ny, nx, 2)
    xx, yy = np.meshgrid(np.arange(x[0], x[1], step), np.arange(y[0], y[1], step))
    xy = np.dstack((xx, yy))
    
    # p-norm of the residual b - A @ x at every grid point; ravel b so a column vector also works
    residual = np.ravel(b) - xy @ np.asarray(A).T
    loss = np.linalg.norm(residual, ord=p, axis=2)

    data = [go.Surface(
        x = xx,
        y = yy,
        z = loss,
        contours=go.surface.Contours(
            z=go.surface.contours.Z(
                show=True,
                usecolormap=True,
                highlightcolor="#42f462",
                project=dict(z=True)
            )
        )
    )]
    
    layout = go.Layout(
        title='Solution Landscape',
        scene = dict(
            zaxis=dict(
                title='loss',
                autorange=True
            )
        ),
        width=500,
        height=500,
    )
    
    fig = go.Figure(data=data, layout=layout)
    
    return py.iplot(fig, filename='solution-landscape')

# A = np.array([[1,1], [1,2], [1,3]])
# b = np.array([2, 3, 3])

# landscape(A, b, x=(-5, 5), y=(-5, 5), p=2)

Here I have plotted the landscape of possible solutions to the problem above:

In [29]:
# solution space of L1, L2-norm, overlap 3d surf plots. 0.4 -> 3-norm minimization solution

Let's create a linear system of equations that doesn't have an exact solution. $b$ is not in the column space (range) of $A$.

In [30]:
A = np.array([[1,1], [1,2], [1,3]])
b = np.array([[2], [3], [3]])

A, b

(array([[1, 1],
        [1, 2],
        [1, 3]]), array([[2],
        [3],
        [3]]))

In [31]:
landscape(A, b, x=(-5, 5), y=(-5, 5), step=0.5, p=2)

In [32]:
landscape(A, b, x=(-20, 20), y=(-20, 20), step=0.5, p=1)
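The lowest point of the 2-norm landscape can be cross-checked against NumPy's least-squares solver. This is a minimal sketch using the same $A$ and $b$ as above, with $b$ written as a flat vector for convenience.

```python
import numpy as np

A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([2, 3, 3], dtype=float)

# Least-squares solution: minimizes the 2-norm of the residual b - A @ x_hat
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = b - A @ x_hat

print(x_hat)
print(np.linalg.norm(residual, ord=2), np.linalg.norm(residual, ord=1))
```

The residual is nonzero, which confirms that $b$ is not in the column space of $A$. The minimizer of the 1-norm landscape is not given by `lstsq` and would need a different method, such as linear programming.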

### Clustering Algorithms

## Resources