---
title: 3.2 Angles and the Cauchy–Schwarz Inequality
subject: Inner Products and Norms
subtitle: Computing angles between general vectors
short_title: 3.2 Angles
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: Angle, Cauchy–Schwarz Inequality
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=/02_Ch_3_Inner_Products_and_Norms/042-angle.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 05 - Inner products, length, angles, and norms.pdf>`

## Reading

Material related to this page, as well as additional exercises, can be found in ALA Ch. 3.2.

## Learning Objectives

By the end of this page, you should know:
- the Cauchy-Schwarz Inequality
- the generalized angle between vectors
- orthogonality between vectors
- the triangle inequality
- definition and examples of norms 

## Generalized Angle

Our starting point in defining the notion of angle in a general inner product space is the familiar formula
\begin{equation}
\label{angle_fam}
\vv v \cdot \vv w = \|\vv v\| \|\vv w\| \cos(\theta),
\end{equation}
where $\theta$ measures the angle between $\vv v$ and $\vv w$.
:::{figure}../figures/04-angle.png
:label:Angle
:alt: Angle
:width: 200px
:align: center
:::
Since $|\cos(\theta)| \leq 1$, we can bound the magnitude of $\vv v \cdot \vv w$ as
\begin{equation}
\label{cauchy_simple}
|\vv v \cdot \vv w| \leq \|\vv v\|\|\vv w\|.
\end{equation}
:::{prf:theorem} Cauchy-Schwarz Inequality
:label:cauchy_schwarz
The simplest form of the _Cauchy-Schwarz Inequality_ is [](#cauchy_simple). The same bound holds for any inner product. That is, it is always true that
\begin{equation}
\label{cauchy}
|\langle \vv v , \vv w \rangle| \leq \|\vv v\| \|\vv w\| \ \textrm{for all} \ \vv v, \vv w \in V.
\end{equation}
Here, $\|\vv v\| = \sqrt{\langle \vv v, \vv v\rangle}$ is the norm induced by the inner product, and $|\cdot|$ denotes the absolute value of a real number. 
:::

```{note}
Equality holds in [](#cauchy) if and only if $\vv v$ and $\vv w$ are parallel vectors.
```
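
As a quick numerical sanity check of [](#cauchy) for the dot product, the sketch below draws random vectors and confirms that the gap $\|\vv v\|\|\vv w\| - |\vv v \cdot \vv w|$ is never negative, and that it closes for parallel vectors. The random seed, dimension, and number of trials are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) so the run is reproducible

# Check |<v, w>| <= ||v|| ||w|| on a batch of random vectors in R^5.
gaps = []
for _ in range(1000):
    v = rng.standard_normal(5)
    w = rng.standard_normal(5)
    gaps.append(np.linalg.norm(v) * np.linalg.norm(w) - abs(np.dot(v, w)))
print("Smallest gap over random trials:", min(gaps))

# Equality case: w is a scalar multiple of v, so the vectors are parallel.
v = np.array([1.0, 0.0, 1.0])
w = -3.0 * v
lhs = abs(np.dot(v, w))
rhs = np.linalg.norm(v) * np.linalg.norm(w)
print("Parallel vectors:", lhs, "vs", rhs)
```

A random test like this is evidence, not a proof, but it is a handy way to catch a mistake in a hand computation.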

:::{prf:definition} Generalized Angle
:label: angle_defn
[This inequality](#cauchy) lets us define the following _generalized angle_ between any two nonzero vectors $\vv v$ and $\vv w$ in an inner product space:
\begin{equation}
\label{angle_defn}
\cos(\theta) = \frac{\langle \vv v, \vv w \rangle}{\|\vv v\|\|\vv w\|} \ \textrm{angle}
\end{equation}
:::
[](#angle_defn) makes sense because, by [](#cauchy), we know that
\begin{equation}
\label{angle_bounds}
-1 \leq \frac{\langle \vv v, \vv w \rangle}{\|\vv v\|\|\vv w\|} \leq 1.
\end{equation}
Hence, $\theta$ is well defined, and unique if restricted to be in $[0, \pi]$. 

## Angles between generic vectors

:::{prf:example} 
:label:dot_eg
The vectors $\vv v = \bm 1 \\ 0 \\ 1\em$ and $\vv w = \bm 0 \\ 1 \\ 1\em$ have dot product $\vv v \cdot \vv w = 1$ and norms $\|\vv v\| = \|\vv w\| = \sqrt{2}$. Hence,
$$
\cos(\theta) = \frac{1}{\sqrt{2}\sqrt{2}} = \frac{1}{2} \Rightarrow \theta = \arccos\left(\frac{1}{2}\right) = \frac{\pi}{3}  \ \textrm{rad},
$$
which is the usual notion of angle. 

We can also compute the _angle_ between $\vv v$ and $\vv w$ with respect to the weighted inner product $\langle \vv v, \vv w \rangle = v_1w_1 + 2v_2w_2 + 3v_3w_3$. For this inner product, $\langle \vv v, \vv w \rangle = 3, \| \vv v\| = 2, \|\vv w\| = \sqrt{5}$. Hence,
$$
\cos(\theta) = \frac{3}{2\sqrt{5}} = 0.67082 \Rightarrow \theta = \arccos\left(0.67082\right) = 0.83548 \ \textrm{rad}.
$$
:::
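
The two computations in this example can be reproduced with a short sketch. The helper names `weighted_inner` and `angle` are hypothetical, introduced here only for illustration; the weights $(1, 2, 3)$ are the ones from the weighted inner product above.

```python
import numpy as np

def weighted_inner(v, w, weights):
    """Weighted inner product sum_i weights[i] * v[i] * w[i] (weights must be positive)."""
    return float(np.sum(weights * v * w))

def angle(v, w, inner):
    """Generalized angle in [0, pi] induced by the inner product `inner`."""
    cos_theta = inner(v, w) / np.sqrt(inner(v, v) * inner(w, w))
    # Clip to guard against round-off pushing the ratio slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

v = np.array([1.0, 0.0, 1.0])
w = np.array([0.0, 1.0, 1.0])
weights = np.array([1.0, 2.0, 3.0])

theta_dot = angle(v, w, np.dot)
theta_weighted = angle(v, w, lambda a, b: weighted_inner(a, b, weights))
print("Angle w.r.t. dot product:     ", theta_dot, "rad")
print("Angle w.r.t. weighted product:", theta_weighted, "rad")
```

Passing the inner product in as a function keeps the angle formula itself unchanged, which mirrors the definition: only $\langle \cdot, \cdot \rangle$ differs between the two cases.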

:::{prf:example} 
:label:poly_eg
We can also define angles between vectors in a generic vector space, for example, polynomials. For ${p(x) = a_0 + a_1 x +a_2x^2, q(x) = b_0 + b_1x + b_2x^2 \in P^{(2)}}$, we define the inner product $\langle p, q\rangle = a_0b_0 + a_1b_1 + a_2b_2$. This agrees with the standard [dot product](./041-inner.ipynb#dot-product-defn) applied to the coefficient vectors $\vv p = \bm a_0 \\ a_1 \\ a_2\em, \vv q = \bm b_0 \\ b_1 \\ b_2\em$ and hence immediately satisfies [the definition of an inner product](./041-inner.ipynb#inner_defn). The angle between $p(x)$ and $q(x)$ is computed as
$$
\cos(\theta) = \frac{\langle p, q \rangle}{\|p\| \|q\|} = \frac{\langle \vv p, \vv q \rangle}{\|\vv p\| \|\vv q\|}.
$$
For example, if $p(x)  = 1 + x^2$ and $q(x) = x + x^2$, then $\langle p, q \rangle = 1$ and $\| p \| = \| q \| = \sqrt{2}$, and ${\cos(\theta) = \frac{1}{2} \Rightarrow \theta = \frac{\pi}{3}}$.
:::

:::{note}
The expression ([angle](#angle_defn)) is called the _cosine similarity_ of two vectors and measures how "aligned" they are. Cosine similarity plays an important role in modern chatbots like ChatGPT, which will be discussed in the case study.
:::

#### Python break!

We show how to use NumPy functions (`np.dot`, `np.linalg.norm`, `np.arccos`) to compute the angle between vectors. We also show how to compute cosine similarity from the [cosine distance](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cosine.html) between vectors using the `scipy.spatial` library.

```python
# angle between vectors
import numpy as np
from scipy.spatial import distance

v = np.array([1, 0, 1])
w = np.array([0, 1, 1])

cos_theta = np.dot(v, w)/(np.linalg.norm(v)*np.linalg.norm(w))
theta = np.arccos(cos_theta)
print("Angle between v and w is: ", theta, " rad")

# cosine distance -> cosine similarity
cosine_dist = distance.cosine(v, w)
cosine_sim = 1 - cosine_dist
print("Cosine similarity between v and w is: ", cosine_sim)
```

```
Angle between v and w is:  1.0471975511965979  rad
Cosine similarity between v and w is:  0.5
```


## Orthogonal vectors

The notion of _perpendicular vectors_ is an important one in Euclidean geometry. These are vectors that meet at a right angle, i.e., $\theta = \frac{\pi}{2}$ or $\theta = -\frac{\pi}{2}$, with $\cos \theta = 0$. This tells us that vectors $\vv v$ and $\vv w$ are perpendicular if and only if their dot product vanishes: $\vv v \cdot \vv w = 0$ (can you see why via [Cauchy-Schwarz](#cauchy_schwarz)?).

We continue with our strategy of extending familiar geometric concepts in Euclidean space to general inner product spaces. For historic reasons, we use the term _orthogonal_ instead of perpendicular.

:::{prf:definition} Orthogonal
:label: orthogonal-defn
Two elements $\vv v, \vv w \in V$ of an inner product space are _orthogonal_ with respect to $\langle \cdot, \cdot\rangle$ if $\langle \vv v, \vv w \rangle = 0$.
:::

Orthogonality is an **incredibly** useful and practical idea that appears all over the place in engineering, AI, and economics, which we will explore in detail next lecture.


:::{prf:example} 
:label:orth_eg
The vectors $\vv v = \bm 1 \\ 2\em$ and $\vv w = \bm 6 \\ -3\em$ are orthogonal with respect to the dot product, since $\vv v \cdot \vv w = 1 \cdot 6 + 2 \cdot (-3) = 0$. Indeed, if we draw them, we see they meet at a right angle.
```{figure}../figures/04-orth.jpg
:label:Orth
:alt: Orthogonal
:width: 200px
:align: center
```
:::

```{warning}
$\vv v$ and $\vv w$ are _not orthogonal_ with respect to the weighted inner product $\langle \vv v, \vv w \rangle = v_1w_1 + 2v_2w_2$.
$$
\langle \vv v, \vv w \rangle = \left\langle \bm 1 \\ 2\em, \bm 6 \\ -3\em\right\rangle = 1(1 \cdot 6) + 2(2 \cdot (-3)) = 6 - 12 = -6 \neq 0.
$$
```

```{note}
Orthogonality, like angles in general, depends on the inner product being used.
```
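
A small sketch makes the dependence on the inner product concrete for $\vv v = \bm 1 \\ 2\em$ and $\vv w = \bm 6 \\ -3\em$. The array `weights` is just an illustrative way of encoding the weighted inner product $v_1w_1 + 2v_2w_2$ used in the warning above.

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([6.0, -3.0])

# Dot product: 1*6 + 2*(-3)
dot_value = np.dot(v, w)

# Weighted inner product <v, w> = v1*w1 + 2*v2*w2
weights = np.array([1.0, 2.0])
weighted_value = np.sum(weights * v * w)

print("Orthogonal w.r.t. dot product?     ", np.isclose(dot_value, 0.0))
print("Orthogonal w.r.t. weighted product?", np.isclose(weighted_value, 0.0))
```

Comparing against zero with `np.isclose` rather than `==` is a deliberate choice: with non-integer entries, round-off can leave a tiny nonzero value even for vectors that are orthogonal on paper.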

:::{prf:example} 
:label:orth_poly_eg
The polynomials $f(x) = x$ and $g(x) = 1 + x^2$ are orthogonal with respect to the inner product on $P^{(2)}$ defined previously as $\langle p, q\rangle = a_0b_0 + a_1b_1 + a_2b_2$. Here, $a_0 = 0, a_1 = 1, a_2 = 0$ and $b_0 = 1, b_1 = 0, b_2 = 1$. So, $\langle f, g\rangle = 0\cdot 1 + 1 \cdot 0 + 0 \cdot 1 = 0$. 

However, $f$ and $g$ are not orthogonal with respect to the inner product $\langle p, q \rangle = \int_0^1 p(x)q(x) dx$ defined on $C^{0}[0, 1]$:
$$
\langle f, g \rangle = \int_0^1 x(1 + x^2) dx = \int_0^1 (x + x^3) dx = \frac{x^2}{2} + \frac{x^4}{4} \bigg|_{0}^1 = \frac{1}{2} + \frac{1}{4} = \frac{3}{4} \neq 0.
$$
:::
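
Both inner products from this example can be evaluated with `numpy.polynomial.Polynomial`, which stores coefficients in increasing degree. This is only a sketch: it integrates the product $f(x)g(x)$ exactly through its antiderivative, which works because the integrand is itself a polynomial.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Coefficients in increasing degree: f(x) = x, g(x) = 1 + x^2.
f_coef = np.array([0.0, 1.0, 0.0])
g_coef = np.array([1.0, 0.0, 1.0])

# Coefficient inner product: a0*b0 + a1*b1 + a2*b2.
coef_inner = np.dot(f_coef, g_coef)

# Integral inner product on [0, 1]: integrate f(x) g(x) via its antiderivative.
product = Polynomial(f_coef) * Polynomial(g_coef)
antiderivative = product.integ()
l2_inner = antiderivative(1.0) - antiderivative(0.0)

print("Coefficient inner product:", coef_inner)
print("Integral inner product:   ", l2_inner)
```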

## The Triangle Inequality

We know, e.g., from the law of cosines, that the length of one side of a triangle is at most the sum of the lengths of the other two sides.

```{figure}../figures/04-triangle.jpg
:label:triangle
:alt: Triangle
:width: 200px
:align: center
```

\begin{equation}
\label{cos_law}
\begin{aligned}
c^2 &= a^2 + b^2 - 2ab\cos(\theta) \\
&\leq a^2 + b^2 + 2ab \ (\textrm{since} \ \cos(\theta) \geq -1) \\
&= (a+b)^2 \\
\Rightarrow c &\leq a+b
\end{aligned}
\end{equation}

The idea  in [](#cos_law) extends directly to the setting where we want to relate the length $\|\vv v + \vv w\|$ of the sum of vectors $\vv v$, $\vv w$ to the lengths $\|\vv v\|$ and $\| \vv w\|$.

::::{prf:theorem} Triangle Inequality
:label:triangle_ineq
The norm associated with an inner product satisfies the _triangle inequality_:
\begin{equation}
\label{tri_ineq_eq}
\|\vv v + \vv w\| \leq \|\vv v\| + \|\vv w\| \ \textrm{for all} \ \vv v, \vv w \in V.
\end{equation}
Equality holds in [](#tri_ineq_eq) if and only if one of the vectors is a nonnegative multiple of the other, e.g., $\vv v = c \vv w$ for some constant $c \geq 0$.

:::{prf:proof} Proof of [](#triangle_ineq)
:label: proof-triangle_ineq
:class: dropdown

This is almost exactly the same as the law of cosines. We set up a triangle as follows
```{figure}../figures/04-triangle_thm.jpg
:label:triangle_thm
:alt: Triangle Theorem
:width: 200px
:align: center
```
and use that
$$
\begin{aligned}
\|\vv v + \vv w\|^2 &= \langle \vv v + \vv w, \vv v + \vv w\rangle \\
&= \|\vv v\|^2 + 2 \langle \vv v, \vv w\rangle + \|\vv w\|^2 \\
&= \|\vv v\|^2 + 2 \|\vv v\| \|\vv w\| \cos(\theta) + \|\vv w\|^2 \\
&\leq \|\vv v\|^2 +2 \|\vv v\| \|\vv w\| + \|\vv w\|^2 \ (\textrm{Cauchy-Schwarz}) \\
&= \left(\|\vv v\| + \| \vv w\|\right)^2
\end{aligned}
$$

Taking square roots of both sides, which are nonnegative, gives [](#tri_ineq_eq).

:::
::::

:::{prf:example} 
:label:triangle_eg
\begin{equation}
\begin{gathered}
\vv v= \bm 1 \\ 2 \\ -1\em, \ \vv w = \bm 2 \\ 0 \\ 3\em \Rightarrow \vv v + \vv w = \bm 3 \\ 2 \\ 2\em \\
\|\vv v\| = \sqrt{6}, \ \|\vv w\| = \sqrt{13}, \ \|\vv v + \vv w\| = \sqrt{17}
\end{gathered}
\end{equation}
The triangle inequality tells us that
$$
4.123 \approx \sqrt{17} =   \|\vv v + \vv w\| \leq \|\vv v\| + \|\vv w\| = \sqrt{6} + \sqrt{13} \approx 6.055,
$$
which is indeed true.
:::
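
The numbers in this example, together with the equality case of the triangle inequality, can be checked with a few lines of NumPy. The vector `v_par` is a hypothetical choice (a positive multiple of $\vv w$) added only to illustrate when equality holds.

```python
import numpy as np

v = np.array([1.0, 2.0, -1.0])
w = np.array([2.0, 0.0, 3.0])

lhs = np.linalg.norm(v + w)                  # ||v + w|| = sqrt(17)
rhs = np.linalg.norm(v) + np.linalg.norm(w)  # ||v|| + ||w|| = sqrt(6) + sqrt(13)
print(lhs, "<=", rhs)

# Equality case: v_par is a positive multiple of w, so the two vectors point the same way.
v_par = 2.0 * w
lhs_par = np.linalg.norm(v_par + w)
rhs_par = np.linalg.norm(v_par) + np.linalg.norm(w)
print(lhs_par, "vs", rhs_par)
```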

## Norms

We have seen that inner products allow us to define a natural notion of length. However, there are other sensible ways of measuring the size of a vector that do not arise from an inner product. For example, suppose we measure the size of a vector by its "taxicab distance": we pretend we are a cab driver in Manhattan who can only drive north-south and east-west. We then end up with a different measure of length that makes lots of sense!

:::{prf:example} 
:label:manh_norm
Consider the vector $\vv v = \bm 1 \\ -1\em$. Its Euclidean norm is $\|\vv v\| = \sqrt{1^2 + (-1)^2} = \sqrt{2}$. Its taxicab distance, which we will label $\|\vv v\|_1$ (for reasons that will become clear soon), is
$$
\|\vv v\|_1 = |1| \ \textrm{(drive 1 unit east)} \  + |-1| \ \textrm{(drive 1 unit south)} = 2.
$$
These are different! 

```{figure}../figures/04-manhat.jpg
:label:manhattan
:alt:Manhattan
:width: 300px
:align: center
```
:::

To define a general norm on a vector space, we will extract properties that "make sense" as a measure of distance but that do not directly rely on an inner product structure (like angles).

:::{prf:definition} Norm
:label: norm-defn
A norm on a vector space $V$ assigns a non-negative real number $\|\vv v\|$ to each vector $\vv v \in V$, subject to the following axioms, valid for every $\vv v, \vv w \in V$ and $c \in \mathbb{R}$:
1. _Positivity_: $\|\vv v\| \geq 0,$ with $\|\vv v\| = 0$ if and only if $\vv v = \vv 0$.
2. _Homogeneity_: $\|c \vv v\| = |c| \|\vv v\|$.
3. _Triangle inequality_: $\|\vv v + \vv w\| \leq \|\vv v\| + \|\vv w\|$.
:::

### Describing [](#norm-defn)

Axiom 1 says "length" should always be non-negative, and only the zero vector has zero length (seems reasonable!).

Axiom 2 says if I stretch/shrink a vector $\vv v$ by a factor $c \in \mathbb{R}$, then the length should **scale** accordingly (this is why we call $c \in \mathbb{R}$ a _scalar_!). Note that $c<0$ means we stretch/shrink **and** flip $\mathbf{v}$, but flipping shouldn't affect length, so $\|c\vv v\| = \|-c\vv v\| = |c|\|\mathbf{v}\|$.

Axiom 3 tells us that lengths of sums of vectors should "behave as if there is a cosine rule" even if there is no notion of angle. This is a less intuitive property, but it has been identified as a key property that makes norms useful to work with.

We will introduce two other norms that are commonly used in practice, but you should know that there are many, many more.

:::{note} Common norms
1. The _1-norm_ of a vector $\vv v = \bm v_1 \\ v_2 \\ \vdots \\ v_n\em \in \mathbb{R}^n$ is the sum of the absolute values of its entries:
$$
\|\vv v\|_1 = |v_1| + |v_2| + \ldots + |v_n|
$$
which we recognize as our [taxi cab distance](#manh_norm). 

2. The $\infty$-_norm_ or _max-norm_ is given by the maximal entry in absolute value:
$$
\|\vv v\|_{\infty} = \max\{|v_1|, |v_2|, \ldots, |v_n|\}.
$$

Checking the axioms of [](#norm-defn) is a good exercise for you. The basic inequality $|a + b| \leq |a| + |b|$ for $a, b \in \mathbb{R}$ is all you need.

The _1-norm_, $\infty$-_norm_, and Euclidean norm (also called the _2-norm_) are examples of the general $p$-_norm_:
$$
\|\vv v\|_p = \left(\sum_{i=1}^n|v_i|^p\right)^{\frac{1}{p}} \ (\textrm{p-norm})
$$
which can be shown to be a valid norm for $1 \leq p < \infty$ (the $\infty$-norm is the limiting case of the $p$-norm as $p \to \infty$).

The hard part in showing that the $p$-norm is a norm is verifying the triangle inequality ([axiom 3](#norm-defn)), which in this setting is known as [Minkowski's inequality](https://en.wikipedia.org/wiki/Minkowski_inequality).
:::

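```python
# Different norms
import numpy as np

v = np.array([1, -2])
v1 = np.linalg.norm(v, ord=1)         # 1-norm: sum of absolute values
v2 = np.linalg.norm(v)                # default is the Euclidean (2-) norm
vinf = np.linalg.norm(v, ord=np.inf)  # infinity norm: largest absolute value
print("\n1-norm: ", v1, "\n2-norm: ", v2, "\ninfinity norm: ", vinf)
```

```
1-norm:  3.0 
2-norm:  2.23606797749979 
infinity norm:  2.0
```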
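
To connect the general $p$-norm formula with the special cases, here is a minimal sketch that implements the formula directly and compares it with `np.linalg.norm`. The function name `p_norm`, the list of $p$ values, and the random test vectors are illustrative assumptions. The last lines spot-check Minkowski's inequality numerically, which is of course not a proof.

```python
import numpy as np

def p_norm(v, p):
    """p-norm (sum_i |v_i|^p)^(1/p), a valid norm for 1 <= p < infinity."""
    if p < 1:
        raise ValueError("the p-norm formula defines a norm only for p >= 1")
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

v = np.array([1.0, -2.0])
for p in [1, 2, 4, 16, 64]:
    # The two columns should agree; as p grows they approach the max-norm.
    print(p, p_norm(v, p), np.linalg.norm(v, ord=p))
print("max-norm:", np.max(np.abs(v)))

# Spot-check the triangle (Minkowski) inequality on one random pair of vectors.
rng = np.random.default_rng(0)  # arbitrary fixed seed
x, y = rng.standard_normal(4), rng.standard_normal(4)
minkowski_ok = all(
    p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
    for p in [1, 1.5, 2, 3, 10]
)
print("Minkowski holds on the sample:", minkowski_ok)
```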

