---
title: Distance and Nearest Neighbors
subject: Inner Products and Norms
subtitle: Measuring how close two vectors are
short_title: Distance and Nearest Neighbors
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: Distance, Nearest Neighbors
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

## Reading

Material related to this page, as well as additional exercises, can be found in VMLS 3.2.

## Learning Objectives

By the end of this page, you should know:
- What is the Euclidean distance between two vectors?
- What are the properties of a general distance function?

# The Euclidean Distance

A distance function, or metric, describes how far apart two points are.

A familiar starting point for our study of distances will be the Euclidean distance, which is closely related to the Euclidean norm on $\mathbb R^n$:

:::{prf:definition} The Euclidean Distance
:label: euclidean_distance_defn

For vectors $\vv u, \vv v \in \mathbb{R}^n$, the Euclidean distance is defined as the Euclidean norm of their difference $\vv u - \vv v$. In other words,

\begin{align*}
    \text{dist}(\vv u, \vv v) = \| \vv u - \vv v\| = \sqrt{\langle \vv u - \vv v, \vv u - \vv v \rangle}
\end{align*}
:::
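As a minimal sketch, the definition translates directly into code. This assumes vectors are plain Python lists of floats, and the helper names `inner` and `euclidean_dist` are hypothetical. The code forms the difference $\vv u - \vv v$ entrywise, takes its inner product with itself, and then takes the square root.

```python
import math


def inner(u, v):
    """Standard inner product <u, v> = u_1 v_1 + ... + u_n v_n on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def euclidean_dist(u, v):
    """Euclidean distance dist(u, v) = ||u - v|| = sqrt(<u - v, u - v>)."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    diff = [ui - vi for ui, vi in zip(u, v)]  # the vector u - v
    return math.sqrt(inner(diff, diff))


print(euclidean_dist([0.0, 0.0], [3.0, 4.0]))  # → 5.0
```

Python 3.8 and later also provide `math.dist(p, q)` in the standard library, which computes the same Euclidean distance.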

Geometrically, $\text{dist}(\vv x, \vv y)$ is the length of the arrow drawn from point $\vv x$ to point $\vv y$:

:::{figure} ../figures/04-euc_dist.png
:label: Euclidean distance
:alt: Euclidean distance between two vectors $\vv x$ and $\vv y$
:width: 200px
:align: center
:::

````{exercise}  Euclidean distance
:label: distance-ex1

Find the Euclidean distance between $\bm 1\\ 2 \em$ and $\bm 3 \\ 4 \em$.

```{solution} distance-ex1
:class: dropdown

We have

\begin{align*}
    \text{dist}\left(\bm 1\\ 2 \em, \bm 3\\ 4 \em\right) = \left\| \bm 1\\ 2 \em - \bm 3\\ 4 \em \right\| = \sqrt{ (1 - 3)^2 + (2 - 4)^2} =\boxed{2\sqrt 2}
\end{align*}

```
````

# General Distances

In this course, we will only work with the Euclidean distance. However, given any vector space equipped with a norm (e.g., $\mathbb{R}^n$ with the Euclidean norm), we can define the distance between two vectors as the norm of their difference. This leads us to a more general notion of distance:

:::{prf:definition} General Distances
:label: general_distance_defn

For a set $S$, a function $d : S \times S \to \mathbb R$ is a distance function, or metric, if it satisfies the following:

1. **Symmetry.** For all $x, y \in S$,

\begin{align*}
    d(x, y) = d(y, x)
\end{align*}

2. **Positivity.** For all $x, y \in S$, 

\begin{align*}
    d(x, y) \geq 0
\end{align*}
and $d(x, y) = 0$ if and only if $x = y$.

3. **Triangle Inequality.** For all $x, y, z \in S$,

\begin{align*}
    d(x, z) \leq d(x, y) + d(y, z)
\end{align*}

:::

Try to convince yourself that the [Euclidean distance](#euclidean_distance_defn) satisfies each of these properties.
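A numerical spot-check can build intuition before you write an argument. The sketch below (the function names and the tolerance `tol` are hypothetical choices) draws random vectors in $\mathbb{R}^n$ and checks symmetry, nonnegativity, $d(x, x) = 0$, and the triangle inequality. Passing such checks is evidence, not a proof, and it does not test the "only if" direction of positivity.

```python
import math
import random


def euclidean_dist(u, v):
    """Euclidean distance ||u - v|| between two vectors of equal length."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def check_metric_properties(n=3, trials=1000, tol=1e-12, seed=0):
    """Spot-check the metric properties on random vectors in R^n.

    Raises AssertionError if any check fails; returns True otherwise.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = ([rng.uniform(-10, 10) for _ in range(n)] for _ in range(3))
        # Symmetry: d(x, y) = d(y, x)
        assert math.isclose(euclidean_dist(x, y), euclidean_dist(y, x))
        # Positivity: d(x, y) >= 0, and d(x, x) = 0
        assert euclidean_dist(x, y) >= 0
        assert euclidean_dist(x, x) == 0
        # Triangle inequality, with a small tolerance for floating-point rounding
        assert euclidean_dist(x, z) <= euclidean_dist(x, y) + euclidean_dist(y, z) + tol
    return True


print(check_metric_properties())  # → True
```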

When the distance $\| \vv x - \vv y \|$ between two vectors $\vv x, \vv y$ in a vector space $V$ is small, we say they are "close." If $\| \vv x - \vv y \|$ is large, we say they are "far." What counts as close or far is typically application dependent.

Note that one vector space can admit many distance functions. From here on, unless otherwise mentioned, we will only be considering the [Euclidean distance](#euclidean_distance_defn).
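To illustrate that the same pair of vectors can be at different distances depending on the choice of metric, the sketch below builds the distance $d(\vv x, \vv y) = \|\vv x - \vv y\|$ induced by three different norms on $\mathbb{R}^n$. The 1-norm (sum of absolute entries) and the $\infty$-norm (largest absolute entry) are used only for this illustration; all function names are hypothetical.

```python
import math


def dist_from_norm(norm):
    """Build the distance d(x, y) = norm(x - y) induced by a given norm."""
    def d(x, y):
        return norm([xi - yi for xi, yi in zip(x, y)])
    return d


def norm_2(v):    # Euclidean norm: sqrt of the sum of squared entries
    return math.sqrt(sum(vi * vi for vi in v))


def norm_1(v):    # 1-norm: sum of absolute values of the entries
    return sum(abs(vi) for vi in v)


def norm_inf(v):  # infinity-norm: largest absolute entry
    return max(abs(vi) for vi in v)


x, y = [0.0, 0.0], [3.0, 4.0]
for name, norm in [("2-norm", norm_2), ("1-norm", norm_1), ("inf-norm", norm_inf)]:
    print(name, dist_from_norm(norm)(x, y))
# prints:
# 2-norm 5.0
# 1-norm 7.0
# inf-norm 4.0
```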

# Applications of Distances

:::{prf:example} Feature distances
:label: distance-feature_distance

If $\vv x, \vv y \in V$ are vectors containing *features* of two objects, $\|\vv  x - \vv  y\|$ is called the *feature distance*. It gives a measure of how "different" two objects are. 

For example, suppose each vector represents a patient in a hospital with entries such as age, weight, height, and test results. We can use $\| \vv x - \vv y\|$ to check if patients $\vv x$ and $\vv y$ are "close" to each other with respect to these features.
:::
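A minimal sketch of the patient example, assuming each patient is stored as a list of features in a fixed order. The patient names and all numbers below are made up purely for illustration. The code computes the feature distance for every pair, and the smallest value identifies the most similar pair.

```python
import math


def feature_distance(x, y):
    """Euclidean feature distance ||x - y|| between two feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical patients: [age (years), weight (kg), height (cm), test result].
# These values are invented for illustration only.
patients = {
    "A": [45.0, 70.0, 175.0, 5.2],
    "B": [47.0, 72.0, 174.0, 5.0],
    "C": [30.0, 95.0, 160.0, 7.9],
}

for p, q in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(p, q, round(feature_distance(patients[p], patients[q]), 2))
```

Here patients A and B come out much closer to each other than either is to C. Note that the numerical value of a feature distance depends on the units chosen for each feature (e.g., height in centimeters versus meters).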

:::{prf:example} Nearest neighbors
:label: distance-nearest_neighbors

Suppose we are given a collection $\vv {z_1}, ..., \vv {z_m} \in V$ of $m$ vectors living in a vector space $V$. We say that $\vv{z_j}$ is the *nearest neighbor* of $\vv {x}$ among the vectors $\vv {z_1}, ..., \vv {z_m} \in V$ if 

\begin{align*}
    \| \vv x - \vv{z_j} \| \leq \| \vv x - \vv{z_i} \| \quad\text{for } i = 1, \ldots, m
\end{align*}

In words, $\vv{z_j}$ is the closest vector to $\vv x$ among $\vv{z_1}, \ldots, \vv{z_m}$. This is illustrated below. Note that the nearest neighbor need not be unique: if two or more of the $\vv{z_i}$ are at the same minimal distance from $\vv x$, each of them satisfies the condition above.

:::{figure} ../figures/04-nearest_neighbor.png
:label: Nearest neighbor
:alt: Nearest neighbor to a vector $\vv x \in \mathbb{R}^2$
:width: 400px
:align: center
:::
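The definition suggests a direct brute-force search: compute $\| \vv x - \vv{z_i} \|$ for every $i$ and keep the index (or indices) attaining the minimum. The sketch below is one straightforward way to do this, not an optimized method. It assumes vectors are Python lists, uses the hypothetical name `nearest_neighbors`, returns 0-based indices, and returns all minimizers so that ties are reported. The candidate points are made up for illustration.

```python
import math


def euclidean_dist(u, v):
    """Euclidean distance ||u - v||."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def nearest_neighbors(x, zs):
    """Return the 0-based indices of all z in zs that minimize ||x - z||.

    Brute force: compute every distance, then keep every index attaining
    the minimum, so more than one index is returned when there are ties.
    Ties are detected by exact floating-point equality.
    """
    if not zs:
        raise ValueError("need at least one candidate vector")
    dists = [euclidean_dist(x, z) for z in zs]
    best = min(dists)
    return [i for i, d in enumerate(dists) if d == best]


# Hypothetical candidate vectors z_1, ..., z_6 in R^2
zs = [[2.0, 1.0], [7.0, 2.0], [5.5, 4.0], [4.0, 8.0], [1.0, 5.0], [9.0, 6.0]]
print(nearest_neighbors([5.0, 6.0], zs))  # → [2], i.e. z_3 in 1-based notation

# A tie: [1, 0] and [0, 1] are both at distance 1 from the origin
print(nearest_neighbors([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]))  # → [0, 1]
```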