---
title: Angles and the Cauchy–Schwarz Inequality
subject: Inner Products and Norms
subtitle: angle between vectors
short_title: Angles
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: Angle, Cauchy–Schwarz Inequality
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

## Reading

Material related to this page, as well as additional exercises, can be found in ALA Ch. 3.2.

## Learning Objectives

By the end of this page, you should know:
- what the Cauchy-Schwarz Inequality states
- how the generalized angle between vectors is defined
- what orthogonality means
- what the triangle inequality states
- what norms are

## Generalized angle

Our starting point in defining the notion of angle in a general inner product space is the familiar formula
\begin{equation}
\label{angle_fam}
\vv v \cdot \vv w = \|\vv v\| \, \|\vv w\| \cos(\theta),
\end{equation}
where $\theta$ measures the angle between $\vv v$ and $\vv w$.
:::{figure}../figures/04-angle.png
:label:Angle
:alt: Angle
:width: 200px
:align: center
:::
Since $|\cos(\theta)| \leq 1$, we can bound the magnitude of $\vv v \cdot \vv w$ as
\begin{equation}
\label{cauchy_simple}
|\vv v \cdot \vv w| \leq \|\vv v\| \, \|\vv w\|.
\end{equation}
:::{prf:theorem} Cauchy-Schwarz Inequality
:label:cauchy_schwarz
The simplest form of the _Cauchy-Schwarz Inequality_ is [](#cauchy_simple), stated for the dot product. The same bound holds for any inner product. That is, it is always true that
\begin{equation}
\label{cauchy}
|\langle \vv v , \vv w \rangle| \leq \|\vv v\| \, \|\vv w\| \ \textrm{for all} \ \vv v, \vv w \in V.
\end{equation}
Here, $\|\vv v\| = \sqrt{\langle \vv v, \vv v\rangle}$ is the norm induced by the inner product, and $|\cdot|$ denotes the absolute value of a real number. 
:::

```{note}
Equality holds in [](#cauchy) if and only if $\vv v$ and $\vv w$ are parallel vectors.
```
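
As a quick numerical sanity check (not a proof), the sketch below tests the inequality on random vectors using the dot product, and then looks at a pair of parallel vectors, where the two sides should agree up to rounding. The helper names `dot` and `norm`, the vector dimension, and the tolerance are illustrative choices, not part of the source material.

```python
import math
import random

def dot(v, w):
    """Standard dot product of two equal-length sequences."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Norm induced by the dot product."""
    return math.sqrt(dot(v, v))

random.seed(0)  # fixed seed so the run is reproducible
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in range(5)]
    w = [random.uniform(-1, 1) for _ in range(5)]
    # A small tolerance guards against floating-point rounding.
    assert abs(dot(v, w)) <= norm(v) * norm(w) + 1e-12

# Parallel vectors: w = -2.5 v, so the gap between the two sides
# should vanish up to rounding error.
v = [1.0, -2.0, 3.0]
w = [-2.5 * vi for vi in v]
gap = norm(v) * norm(w) - abs(dot(v, w))
print(gap)
```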

:::{prf:definition} Generalized angle
:label: angle_defn
The [inequality](#cauchy) lets us define the following _generalized angle_ between any two nonzero vectors $\vv v$ and $\vv w$ in an inner product space:
\begin{equation}
\label{angle_defn}
\cos(\theta) = \frac{\langle \vv v, \vv w \rangle}{\|\vv v\| \, \|\vv w\|}
\end{equation}
:::
[](#angle_defn) makes sense because, by [](#cauchy), we know that
\begin{equation}
\label{angle_bounds}
-1 \leq \frac{\langle \vv v, \vv w \rangle}{\|\vv v\| \, \|\vv w\|} \leq 1.
\end{equation}
Hence, $\theta$ is well defined, and unique if restricted to be in $[0, \pi]$. 

## Angles between generic vectors

:::{prf:example} 
:label:dot_eg
The vectors $\vv v = \bm 1 \\ 1 \\ 0\em$ and $\vv w = \bm 2 \\ 1 \\ 1\em$ have dot product $\vv v \cdot \vv w = 3$ and norms $\|\vv v\| = \sqrt{2}, \|\vv w\| = \sqrt{6}$. Hence,
$$
\cos(\theta) = \frac{3}{\sqrt{2}\sqrt{6}} = \frac{\sqrt{3}}{2} \Rightarrow \theta = \arccos\left(\frac{\sqrt{3}}{2}\right) = \frac{\pi}{6}  \ \textrm{rad},
$$
which is the usual notion of angle. 

We can also compute the _angle_ between $\vv v$ and $\vv w$ with respect to the weighted inner product $\langle \vv v, \vv w \rangle = 2v_1w_1 + 3v_3w_3$. For this inner product, $\langle \vv v, \vv w \rangle = 4, \| \vv v\| = \sqrt{2}, \|\vv w\| = \sqrt{11}$. Hence,
$$
\cos(\theta) = \frac{4}{\sqrt{2}\sqrt{11}} = 0.8528 \Rightarrow \theta = \arccos\left(0.8528\right) = 0.5495 \ \textrm{rad}.
$$
:::
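
The two computations in the example above can be reproduced with a few lines of Python. This is only a sketch: the helper names `weighted_inner` and `angle` are made up for illustration, and the weight list `[2, 0, 3]` simply encodes the form $2v_1w_1 + 3v_3w_3$ as it is written in the example.

```python
import math

def weighted_inner(v, w, weights):
    """Weighted form sum_i weights[i] * v[i] * w[i]."""
    return sum(c * vi * wi for c, vi, wi in zip(weights, v, w))

def angle(v, w, weights):
    """Generalized angle in [0, pi] with respect to the weighted form."""
    cos_theta = weighted_inner(v, w, weights) / math.sqrt(
        weighted_inner(v, v, weights) * weighted_inner(w, w, weights)
    )
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

v = [1, 1, 0]
w = [2, 1, 1]

theta_dot = angle(v, w, [1, 1, 1])       # dot product: all weights equal to 1
theta_weighted = angle(v, w, [2, 0, 3])  # the form 2*v1*w1 + 3*v3*w3

print(theta_dot, math.pi / 6)   # the two values should agree up to rounding
print(round(theta_weighted, 4))
```

Passing the weights as an argument keeps the angle formula itself unchanged, which mirrors the point of the example: only the inner product differs between the two computations.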

:::{prf:example} 
:label:poly_eg
We can also define angles between vectors in a generic vector space, for example, polynomials. For ${p(x) = a_0 + a_1 x +a_2x^2, q(x) = b_0 + b_1x + b_2x^2 \in P^{(2)}}$, we define the inner product $\langle p, q\rangle = a_0b_0 + a_1b_1 + a_2b_2$. This agrees with the standard [dot product](./041-inner.ipynb#dot-product-defn) applied to the coefficient vectors $\vv p = \bm a_0 \\ a_1 \\ a_2\em, \vv q = \bm b_0 \\ b_1 \\ b_2\em$ and hence immediately satisfies [this definition](./041-inner.ipynb#inner_defn).  The angle between $p(x)$ and $q(x)$ is computed as
$$
\cos(\theta) = \frac{\langle p, q \rangle}{\|p\| \, \|q\|} = \frac{\langle \vv p, \vv q \rangle}{\|\vv p\| \, \|\vv q\|}.
$$
For example, if $p(x)  = 1 + x$ and $q(x) = -1 + x^2$, then $\langle p, q \rangle = -1$ and $\| p \| = \sqrt{2}, \| q \| = \sqrt{2}$, and ${\cos(\theta) = \frac{-1}{2} \Rightarrow \theta = \frac{2\pi}{3}}$.
:::
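
As a small illustration of the polynomial example above, the sketch below stores each polynomial as its list of coefficients `[a0, a1, a2]` and computes the angle from the coefficient inner product. The function names `coeff_inner` and `poly_angle` are hypothetical helpers introduced here.

```python
import math

def coeff_inner(p, q):
    """Inner product of polynomials given as coefficient lists [a0, a1, a2]."""
    return sum(a * b for a, b in zip(p, q))

def poly_angle(p, q):
    """Angle between two polynomials under the coefficient inner product."""
    cos_theta = coeff_inner(p, q) / math.sqrt(coeff_inner(p, p) * coeff_inner(q, q))
    # Clamp to [-1, 1] to protect acos from rounding error.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

p = [1, 1, 0]    # p(x) = 1 + x
q = [-1, 0, 1]   # q(x) = -1 + x^2

theta = poly_angle(p, q)
print(theta, 2 * math.pi / 3)   # the two values should agree up to rounding
```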

```{note}
The expression ([angle](#angle_defn)) is called the _cosine similarity_ of two vectors and measures how "aligned" they are. Cosine similarity plays an important role in modern chatbots like ChatGPT, which will be discussed in the case study.
```

## Orthogonal vectors

Continuing our strategy of extending familiar geometric concepts in Euclidean spaces to general inner product spaces, we focus on the notion of _perpendicular vectors_. Such vectors meet at a right angle, that is, $\theta = \frac{\pi}{2}$ or $\theta = -\frac{\pi}{2}$, so that $\cos(\theta) = 0$. In that case, the [angle formula](#angle_fam) gives $\vv v \cdot \vv w = 0$.

:::{prf:definition} Orthogonal
:label: orthogonal-defn
Two elements $\vv v, \vv w \in V$ of an inner product space are _orthogonal_ with respect to $\langle \cdot, \cdot\rangle$ if $\langle \vv v, \vv w \rangle = 0$.
:::

:::{prf:example} 
:label:orth_eg
The vectors $\vv v = \bm -3 \\ 2\em$ and $\vv w = \bm -4 \\ -6\em$ are orthogonal with respect to the dot product, since $\vv v \cdot \vv w = (-3)(-4) + (2)(-6) = 0$. We can see below that they meet at a right angle.
```{figure}../figures/04-orth.jpg
:label:Orth
:alt: Orthogonal
:width: 200px
:align: center
```
:::

```{warning}
The vectors $\vv v$ and $\vv w$ from the example above are _not orthogonal_ with respect to the weighted inner product $\langle \vv v, \vv w \rangle = 2v_1w_1 + 3v_2w_2$:
$$
\langle \vv v, \vv w \rangle = \left\langle \bm -3 \\ 2\em, \bm -4 \\ -6\em\right\rangle = 2(-3)(-4) + 3(2)(-6) = -12 \neq 0. 
$$
```

```{note}
Orthogonality, like angles in general, depends on the inner product being used.
```
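
The dependence on the inner product is easy to see numerically. The sketch below evaluates both forms on the same pair of vectors; the helpers `weighted_inner` and `is_orthogonal` and the tolerance `tol` are illustrative names, not something defined in the source.

```python
def weighted_inner(v, w, weights):
    """Weighted inner product sum_i weights[i] * v[i] * w[i]."""
    return sum(c * vi * wi for c, vi, wi in zip(weights, v, w))

def is_orthogonal(v, w, weights, tol=1e-12):
    """True when the weighted inner product vanishes up to a tolerance."""
    return abs(weighted_inner(v, w, weights)) <= tol

v = [-3, 2]
w = [-4, -6]

# Dot product (weights 1, 1): the vectors are orthogonal.
print(weighted_inner(v, w, [1, 1]), is_orthogonal(v, w, [1, 1]))   # 0 True
# Weighted inner product 2*v1*w1 + 3*v2*w2: they are not.
print(weighted_inner(v, w, [2, 3]), is_orthogonal(v, w, [2, 3]))   # -12 False
```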

:::{prf:example} 
:label:orth_poly_eg
The polynomials $f(x) = -1 + x$ and $g(x) = 1 + x - x^2$ are orthogonal with respect to the inner product on $P^{(2)}$ defined previously as $\langle p, q\rangle = a_0b_0 + a_1b_1 + a_2b_2$. Here, $a_0 = -1, a_1 = 1, a_2 = 0$ and $b_0 = 1, b_1 = 1, b_2 = -1$. So, $\langle f, g\rangle = (-1)(1) + (1)(1) + (0)(-1) = 0$. 

However, $f$ and $g$ are not orthogonal with respect to the inner product $\langle p, q \rangle = \int_0^1 p(x)q(x) dx$ defined on $C^{0}[0, 1]$:
$$
\langle f, g \rangle = \int_0^1 (-1 + x)(1 + x - x^2) dx = \int_0^1 (-1 + 2x^2 - x^3) dx = -1 + \frac{2}{3} - \frac{1}{4} = -\frac{7}{12} \neq 0.
$$
:::
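
Both inner products from the example above can be evaluated exactly with rational arithmetic. The sketch below is one possible way to do this, using the fact that $\int_0^1 x^k \, dx = \frac{1}{k+1}$; the function names `poly_mul`, `l2_inner`, and `coeff_inner` are assumptions made for this illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Coefficients of p(x) * q(x); index i holds the coefficient of x^i."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def l2_inner(p, q):
    """Exact value of the integral of p(x) * q(x) over [0, 1]."""
    # The integral of x^k over [0, 1] is 1 / (k + 1).
    return sum(Fraction(c, k + 1) for k, c in enumerate(poly_mul(p, q)))

def coeff_inner(p, q):
    """Coefficient inner product a0*b0 + a1*b1 + a2*b2."""
    return sum(a * b for a, b in zip(p, q))

f = [-1, 1, 0]   # f(x) = -1 + x
g = [1, 1, -1]   # g(x) = 1 + x - x^2

print(coeff_inner(f, g))  # 0
print(l2_inner(f, g))     # -7/12
```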

## The Triangle Inequality

From the law of cosines, we know that the length of one side of a triangle is at most the sum of the lengths of the other two sides.

```{figure}../figures/04-triangle.jpg
:label:triangle
:alt: Triangle
:width: 200px
:align: center
```

\begin{equation}
\label{cos_law}
\begin{aligned}
c^2 &= a^2 + b^2 - 2ab\cos(\theta) \\
&\leq a^2 + b^2 + 2ab \ (\textrm{since} \ \cos(\theta) \geq -1) \\
&= (a+b)^2 \\
\Rightarrow c &\leq a+b
\end{aligned}
\end{equation}

The idea in [](#cos_law) extends directly to general inner product spaces, where it relates $\|\vv v + \vv w\|$ to $\|\vv v\|$ and $\| \vv w\|$.

```{prf:theorem} Triangle Inequality
:label:triangle_ineq
The norm associated with an inner product satisfies the _triangle inequality_:
\begin{equation}
\label{tri_ineq_eq}
\|\vv v + \vv w\| \leq \|\vv v\| + \|\vv w\| \ \textrm{for all} \ \vv v, \vv w \in V.
\end{equation}
Equality holds in [](#tri_ineq_eq) if and only if one of $\vv v, \vv w$ is a non-negative multiple of the other. For nonzero vectors, this means $\vv v = c \vv w$ for some constant $c > 0$.

:::{prf:proof} Proof of [](#triangle_ineq)
:label: proof-triangle_ineq
:class: dropdown

This is very similar to the law of cosines. We set up a triangle as shown below and expand the squared norm of the sum:
$$
\begin{aligned}
\|\vv v + \vv w\|^2 &= \langle \vv v + \vv w, \vv v + \vv w\rangle = \|\vv v\|^2 + 2 \langle \vv v, \vv w\rangle + \|\vv w\|^2 \\
&= \|\vv v\|^2 + 2 \|\vv v\| \|\vv w\| \cos(\theta) + \|\vv w\|^2 \\
&\leq \|\vv v\|^2 +2 \|\vv v\| \|\vv w\| + \|\vv w\|^2 \ (\textrm{Cauchy-Schwarz}) \\
&= \left(\|\vv v\| + \| \vv w\|\right)^2.
\end{aligned}
$$
Since both sides are non-negative, taking square roots gives $\|\vv v + \vv w\| \leq \|\vv v\| + \|\vv w\|$.
:::{figure}../figures/04-triangle_thm.jpg
:label:triangle_thm
:alt: Triangle Theorem
:width: 200px
:align: center
:::

:::
```


:::{prf:example} 
:label:triangle_eg
\begin{equation}
\begin{gathered}
\vv v= \bm 2 \\ -1 \\ 3\em, \ \vv w = \bm 1 \\ 1 \\ -2\em \Rightarrow \vv v + \vv w = \bm 3 \\ 0 \\ 1\em \\
\|\vv v\| = \sqrt{14}, \ \|\vv w\| = \sqrt{6}, \ \|\vv v + \vv w\| = \sqrt{10} \\
\Rightarrow \|\vv v + \vv w\| = \sqrt{10} \approx 3.162 \leq \|\vv v\| + \|\vv w\| = \sqrt{14} + \sqrt{6} \approx 6.191
\end{gathered}
\end{equation}
:::
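
A short script can confirm the inequality for the vectors in the example above. This is a minimal sketch using the Euclidean norm; the helper name `norm` is an illustrative choice.

```python
import math

def norm(v):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(vi * vi for vi in v))

v = [2, -1, 3]
w = [1, 1, -2]
s = [vi + wi for vi, wi in zip(v, w)]   # the sum v + w

lhs = norm(s)             # ||v + w||
rhs = norm(v) + norm(w)   # ||v|| + ||w||
print(s, lhs, rhs, lhs <= rhs)
```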

## Norms

Although inner products allow us to define a notion of length, which we called a _norm_, there are other sensible ways of measuring the size of a vector, and the right choice depends on the application. We discuss these now.

:::{prf:example} 
:label:manh_norm
Consider the vector $\vv v = \bm 1 \\ -2\em$. Its Euclidean norm is $\|\vv v\| = \sqrt{5}$. Its taxi cab distance, which we denote by $\|\vv v\|_1$, is the distance driven by a cab driver in Manhattan who can only drive east/west and north/south:
$$
\|\vv v\|_1 = |1| + |-2| = 3.
$$
Note that the Euclidean norm and taxi cab distance are different! 

```{figure}../figures/04-manhat.jpg
:label:manhattan
:alt:Manhattan
:width: 300px
:align: center
```
:::

We now define a general norm on a vector space. The definition does not rely on an inner product, but a norm still acts as a measure of length.

:::{prf:definition} Norm
:label: norm-defn
A norm on a vector space $V$ assigns a non-negative real number $\|\vv v\|$ to each vector $\vv v \in V$, subject to the following axioms, valid for every $\vv v, \vv w \in V$ and $c \in \mathbb{R}$:
1. _Positivity_: $\|\vv v\| \geq 0,$ with $\|\vv v\| = 0$ if and only if $\vv v = \vv 0$.
2. _Homogeneity_: $\|c \vv v\| = |c| \|\vv v\|$.
3. _Triangle inequality_: $\|\vv v + \vv w\| \leq \|\vv v\| + \|\vv w\|$.
:::

### Describing [](#norm-defn)

Axiom 1 says that length should always be non-negative and that only the zero vector has zero length.

Axiom 2 says that if we stretch or shrink a vector by some factor, then its length should scale by the same factor. If $c < 0$, the vector also flips direction, but the flip itself does not affect the length.

Axiom 3 tells us that lengths of sums of vectors should behave like the side lengths of a triangle, as the law of cosines guarantees for the Euclidean norm.
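
To make the axioms concrete, the sketch below spot-checks all three for the taxi cab norm on randomly drawn vectors. It is a numerical sanity check under assumed settings (vector length 4, 1000 trials, tolerance `tol`), not a proof, and the helper name `norm1` is hypothetical.

```python
import random

def norm1(v):
    """Taxi cab norm: sum of the absolute values of the entries."""
    return sum(abs(vi) for vi in v)

random.seed(1)  # fixed seed so the run is reproducible
tol = 1e-9      # tolerance for floating-point rounding
for _ in range(1000):
    v = [random.uniform(-5, 5) for _ in range(4)]
    w = [random.uniform(-5, 5) for _ in range(4)]
    c = random.uniform(-3, 3)
    # Axiom 1 (positivity): the norm is never negative.
    assert norm1(v) >= 0
    # Axiom 2 (homogeneity): scaling by c scales the norm by |c|.
    assert abs(norm1([c * vi for vi in v]) - abs(c) * norm1(v)) <= tol
    # Axiom 3 (triangle inequality).
    assert norm1([vi + wi for vi, wi in zip(v, w)]) <= norm1(v) + norm1(w) + tol

# The zero vector has zero length.
print(norm1([0, 0, 0, 0]))
```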

```{note} Common norms
1. The _1-norm_ of a vector $\vv v = \bm v_1 \\ v_2 \\ \vdots \\ v_n\em \in \mathbb{R}^n$ is the sum of the absolute values of its entries:
$$
\|\vv v\|_1 = |v_1| + |v_2| + \ldots + |v_n|
$$
which we recognize as the [taxi cab distance](#manh_norm). 

2. The $\infty-$_norm_ or _max-norm_ is the largest absolute value among the entries:
$$
\|\vv v\|_{\infty} = \max\{|v_1|, |v_2|, \ldots, |v_n|\}.
$$

The _1-norm_, $\infty-$_norm_ and Euclidean norm (also called the _2-norm_) are examples of the general $p-$_norm_:
$$
\|\vv v\|_p = \left(\sum_{i=1}^n|v_i|^p\right)^{\frac{1}{p}} \ (p\textrm{-norm})
$$
which can be shown to be a valid norm for $1 \leq p < \infty$, with the $\infty-$_norm_ being a limiting case as $p \to \infty$.

The hard part in showing that the $p-$_norm_ is a norm is verifying the triangle inequality ([axiom 3](#norm-defn)), which for $p$-norms is known as _Minkowski's inequality_.
```
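
The norms listed above can be computed with one small function. The sketch below is one possible implementation, where the name `p_norm` and the convention of passing `math.inf` for the max-norm are assumptions made for this illustration. It evaluates the norms of the taxi cab example vector and shows the $p$-norm approaching the max-norm as $p$ grows.

```python
import math

def p_norm(v, p):
    """p-norm of v for 1 <= p < infinity, or the max-norm when p is math.inf."""
    if p == math.inf:
        return max(abs(vi) for vi in v)
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)

v = [1, -2]
print(p_norm(v, 1))         # 1-norm (taxi cab distance)
print(p_norm(v, 2))         # 2-norm (Euclidean norm)
print(p_norm(v, math.inf))  # max-norm

# As p grows, the p-norm approaches the max-norm.
for p in (4, 16, 64):
    print(p, p_norm(v, p))
```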