# Triple Median Absolute Deviations (MADs)

The DE Africa annual GeoMAD product can be broken down into two main components: the geomedian, and the Median Absolute Deviations (MADs).

The geomedian is a statistically-representative summary composed from a year's worth of satellite data; it produces one multispectral observation for each pixel on continental Africa. MADs are change statistics based on the geomedian. They show how much variation each pixel underwent in the given timeframe. There are multiple ways of quantifying how change has occurred, so this product computes three different MADs for use in data analysis.

Let's break down the acronym "MAD"; as in the title, it stands for *median absolute deviation*:

* This implies we have a collection of measurements
* We then find the deviation of each measurement from a baseline value
* We obtain one deviation for every measurement
* These deviations are all absolute values, so each deviation is equal to or greater than 0
* We then find the median, or middle value, of these deviations
* This gives us a median absolute deviation
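
The steps above can be sketched for the simplest case: a single band, with an ordinary median as the baseline. This is only an illustration with made-up numbers; the function name `median_absolute_deviation` and the sample values are assumptions, and the triple MADs described below replace the simple absolute difference with multispectral distances from the geomedian.

```python
import numpy as np

def median_absolute_deviation(measurements, baseline):
    """Median of the absolute deviations of each measurement from a baseline."""
    # One absolute deviation per measurement; each is >= 0
    deviations = np.abs(np.asarray(measurements, dtype=float) - baseline)
    # The middle value of those deviations
    return float(np.median(deviations))

# Hypothetical single-band values for one pixel, one per flyover
measurements = [2100, 2500, 1900, 4000, 2200]
baseline = float(np.median(measurements))  # ordinary median used as the baseline here
mad = median_absolute_deviation(measurements, baseline)
print(baseline, mad)
```

Note that the single large value (4000) has little effect on the result, which is one reason medians are used for this kind of summary.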


In this case, our "collection of measurements" is satellite data from each flyover. Even for a single pixel, we have multiple measurements in the time axis: one from every pass. 

The "baseline value" is the annual geomedian, which provides one multispectral result for each pixel. 

The "deviations" here are three different *distance* or *dissimilarity* values. We are calculating, in three separate ways, the deviation between the annual geomedian and a single flyover's measurement. These three values have been chosen to reflect a range of changes that appear in Earth observation data, and hence this section of the dataset is often referred to as "triple MADs". 

The three MADs used in DE Africa are:

* Euclidean MAD (based on Euclidean distance)
* Spectral MAD (based on cosine distance)
* Bray-Curtis MAD (based on Bray-Curtis dissimilarity)

Each is explained in its own section below. Example calculations with real numbers are at the very end.

> Note that many other statistical distances and dissimilarities can be used for median absolute deviation analysis (for example, Manhattan distance or Canberra distance; [many more exist](https://ricottalab.files.wordpress.com/2015/05/ricotta-podani-2017-ecocom-full.pdf)), and any of them could be used to calculate a MAD. However, in DE Africa, "triple MADs" or "MADs" always refers specifically to the three MADs included in the GeoMAD dataset: EMAD, SMAD, and BCMAD.

## Euclidean MAD (EMAD)

The most logical place to start thinking about any of the MADs is the Euclidean MAD (EMAD). This is because EMAD comes from Euclidean distance, and Euclidean distance can be explained with a physical analogy: it is how we measure straight-line distances between points. In our three-dimensional world, it may look like this:

<img src="../Supplementary_data/Triple_MADs/cartesian_euclidean.JPG" alt="Euclidean" width="400" align="left"/>

In the case of satellite data, we are measuring the Euclidean distance between a pixel's geomedian value and a single multispectral measurement. The number of dimensions is equal to the number of bands in the data. In the illustration below, $m$ is the geomedian value and $\mathbf{x}$ the measured value. In real data, there will be multiple measurements over a time period, so $t$ is the timestep number, otherwise noted in equations as superscript $(t)$.

<img src="../Supplementary_data/Triple_MADs/bands_euclidean.JPG" alt="Euclidean" width="1000" align="left"/>

Each timestep gives a separate Euclidean distance result. Then EMAD is the median of all those distances.

In most real life examples, there will be more than three timesteps and more than three bands. A general expression of Euclidean distance for $p$ bands is given as:

\begin{align}
\text{Multispectral Euclidean distance for timestep }t: \left| \left| \mathbf{x}^{(t)} - m \right| \right|_{\mathbb{R}^p}
\end{align}

\begin{align}
\text{Multispectral Euclidean distance (expanded) }:  \sqrt{ \left( x^{(t)}_{\text{band 1}} - m_{\text{band 1}} \right)^2 + \left( x^{(t)}_{\text{band 2}} - m_{\text{band 2}} \right)^2  + \dots  + \left( x^{(t)}_{\text{band p}} - m_{\text{band p}} \right)^2 }
\end{align}

EMAD for $N$ timesteps is then defined, following [Roberts, 2018](https://ieeexplore.ieee.org/abstract/document/8518312), as the median of the Euclidean distances from all the timesteps.

\begin{align}
\text{EMAD} = \text{median} \left( \left\{ \left| \left| \mathbf{x}^{(t)} - m \right| \right|_{\mathbb{R}^p}, t = 1, \dots , N \right\}  \right)
\end{align}
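
As a rough illustration of the EMAD formula for a single pixel, the NumPy sketch below takes an array of $N$ measurements with $p$ bands each and a geomedian that has already been computed. The function name `emad` and all input values are hypothetical, and the sketch ignores practical concerns such as masking invalid observations; it is not the code used to produce the GeoMAD product.

```python
import numpy as np

def emad(x, m):
    """Median Euclidean distance between each timestep and the geomedian.

    x: array of shape (N, p), one row per timestep, one column per band
    m: array of shape (p,), the geomedian for the same pixel (assumed given)
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    # Euclidean distance for each timestep (norm taken across the band axis)
    distances = np.linalg.norm(x - m, axis=1)
    return float(np.median(distances))

# Hypothetical three-band pixel with three timesteps
m = np.array([1000.0, 1500.0, 2000.0])
x = np.array([
    [1003.0, 1504.0, 2000.0],  # distance 5
    [1000.0, 1500.0, 2010.0],  # distance 10
    [1012.0, 1505.0, 2000.0],  # distance 13
])
print(emad(x, m))
```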

Valid values for EMAD fall within the range of the original spectral bands. In the case of Sentinel-2, this is `0 - 10000`.

EMAD is useful for showing albedo shifts in satellite spectra.

## Spectral MAD (SMAD)

The spectral MAD (SMAD) is the median of the cosine distances between the geomedian and the individual measurements.

In two dimensions, cosine distance can be graphically compared to Euclidean distance by the following figure:

<img src="../Supplementary_data/Triple_MADs/cosine_distance.JPG" alt="Cosine distance" width="400" align="left"/>

In a general sense, cosine distance is related to the angle $\theta$ between the two vectors drawn from the origin to each point, while Euclidean distance is the straight-line distance $d$ between the two points. As with Euclidean distance, points are more similar when the cosine distance between them is small. Cosine distance equals $1 - \cos\theta$, so it is smallest when $\theta$ is close to 0 and grows as $\theta$ increases, reaching 1 at 90$^{\circ}$ and its maximum of 2 at 180$^{\circ}$. Because surface reflectance values are non-negative, $\theta$ cannot exceed 90$^{\circ}$ for satellite data.

Notice we could have a small cosine distance but a large Euclidean distance; for example, if the angle between the vectors is small, but one is much longer than the other. This is an important property of cosine distance (and thus SMAD) - unlike Euclidean distance, cosine distance is not skewed by the magnitude of the measurements.

Cosine distance is defined more formally as:

\begin{align}
\text{Cosine distance (two dimensions)}: 1 - \frac{x_1 y_1 + x_2 y_2}{ \left( \sqrt{ \left( x_1\right) ^2 + \left( x_2\right) ^2 } \right) \left( \sqrt{ \left( y_1\right) ^2 + \left( y_2\right) ^2 } \right)}
\end{align}

For more than two dimensions, we can generalise the cosine distance formula for a single pixel. For a multispectral measurement of $p$ bands at timestep $t$, $\mathbf{x}^{(t)}$, and the geomedian at the same point $m$, the cosine distance is: 

\begin{align}
\text{cosdist}\left( \mathbf{x}^{(t)}, m \right)  = 1 - \frac{ \mathbf{x}^{(t)} \cdot m }{ \left| \left| \mathbf{x}^{(t)} \right| \right| \ \left| \left| m \right| \right|} \ \text{ for }  \mathbf{x}^{(t)}, m \in \mathbb{R}^{p}
\end{align}

\begin{align}
\text{Multispectral cosine distance (expanded)}: 1 - \left( \frac{\left( x_{\text{band 1}}^{(t)} \right) \left(m_{\text{band 1}} \right) + \left( x_{\text{band 2}}^{(t)} \right) \left(m_{\text{band 2}} \right) + \cdots + \left( x_{\text{band p}}^{(t)} \right) \left(m_{\text{band p}} \right)}{ \left(\sqrt{\left( x_{\text{band 1}}^{(t)} \right)^2 + \left( x_{\text{band 2}}^{(t)} \right)^2 + \cdots+ \left( x_{\text{band p}}^{(t)} \right)^2} \right) \left( \sqrt{\left( m_{\text{band 1}} \right)^2 + \left( m_{\text{band 2}} \right)^2 + \cdots+ \left( m_{\text{band p}} \right)^2 } \right)} \right)
\end{align}
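
The earlier observation that a small cosine distance can coincide with a large Euclidean distance can be checked numerically. In this sketch the helper names and the band values are assumptions chosen for illustration: the "measurement" is simply the geomedian doubled in every band, mimicking a uniform brightening.

```python
import numpy as np

def euclidean_distance(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # 1 - cos(theta); undefined if either vector is all zeros
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical geomedian and a measurement that is twice as bright in every band
m = np.array([1000.0, 2000.0, 3000.0])
brighter = 2.0 * m

d_euclid = euclidean_distance(brighter, m)  # large: equals the length of m
d_cos = cosine_distance(brighter, m)        # essentially zero: same direction
print(d_euclid, d_cos)
```

The two vectors point in the same direction, so the cosine distance is zero up to floating-point error, while the Euclidean distance is in the thousands.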

Then for $N$ timesteps, SMAD is the median of the cosine distances.

\begin{align}
\text{SMAD} = \text{median} \left( \left\{ \text{cosdist}\left( \mathbf{x}^{(t)}, m \right), t = 1, \dots , N \right\}  \right)
\end{align}
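
A minimal NumPy sketch of the SMAD formula for one pixel follows. As before, the function name `smad` and the two-band input values are hypothetical, the geomedian is assumed to be given, and pixels whose measurement or geomedian is all zeros are not handled.

```python
import numpy as np

def smad(x, m):
    """Median cosine distance between each timestep and the geomedian.

    x: array of shape (N, p); m: array of shape (p,), assumed already computed
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    dots = x @ m                                            # dot product per timestep
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(m)   # product of vector lengths
    cosdist = 1.0 - dots / norms
    return float(np.median(cosdist))

# Hypothetical two-band pixel with three timesteps
m = np.array([3000.0, 4000.0])
x = np.array([
    [6000.0, 8000.0],  # same direction as m: cosine distance 0
    [4000.0, 3000.0],  # cosine distance 1 - 24/25 = 0.04
    [5000.0, 0.0],     # cosine distance 1 - 15/25 = 0.4
])
print(smad(x, m))
```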

SMAD takes on values of `0 - 1`.

In applications of Earth observation data, SMAD is useful for showing areas of land cover change. One reason is that SMAD is less affected by cloud; unlike EMAD, it is invariant to albedo changes, such as those caused by the diffusion of solar radiation. SMAD can also be used to track water bodies, as water has high variation in reflectance.

## Bray-Curtis MAD (BCMAD)

The Bray-Curtis MAD (BCMAD) is calculated from the Bray-Curtis dissimilarity. The Bray-Curtis dissimilarity emphasises differences in each band between the measurement and the geomedian. 

For a single band of satellite data, the Bray-Curtis dissimilarity looks remarkably like a normalised band index. For example, if we only had red band data, it might look something like this:

\begin{align}
\text{Single-band Bray-Curtis dissimilarity at timestep }t: \frac{\left| x_{\text{red}}^{(t)} - m_{\text{red}}\right|}{ \left| x_{\text{red}}^{(t)} + m_{\text{red}} \right| } 
\end{align}

It can be generalised to a multispectral dataset with $p$ bands:

\begin{align}
\text{Multispectral Bray-Curtis dissimilarity for timestep }t: \frac{\left| x_{\text{band 1}}^{(t)} - m_{\text{band 1}}\right| + \left| x_{\text{band 2}}^{(t)} - m_{\text{band 2}} \right| + \dots + \left| x_{\text{band p}}^{(t)} - m_{\text{band p}} \right| }{ \left| x_{\text{band 1}}^{(t)} + m_{\text{band 1}} \right| + \left| x_{\text{band 2}}^{(t)} + m_{\text{band 2}} \right| + \dots + \left| x_{\text{band p}}^{(t)} + m_{\text{band p}} \right|} 
\end{align}

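For non-negative reflectance values, the Bray-Curtis dissimilarity reaches its maximum of `1` only when, in every band, either the measurement or the geomedian is zero; in practice, values approach `1` when the measurement differs greatly from the geomedian in every band. Conversely, the dissimilarity is small when each band of the measurement is close to the geomedian of that band.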

As with the other MADs, the BCMAD is found by taking the median of all the Bray-Curtis dissimilarities from $N$ timesteps.

\begin{align}
\text{BCMAD} = \text{median} \left( \left\{ \frac{\sum_{i=1}^{p} \left| x_{\text{band } i}^{(t)} - m_{\text{band } i} \right|}{\sum_{i=1}^{p} \left| x_{\text{band } i}^{(t)} + m_{\text{band } i} \right|}, t = 1, \dots , N \right\}  \right)
\end{align}
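
The BCMAD calculation can be sketched in the same way as the other two MADs. The function name `bcmad` and the sample values are assumptions for illustration only, and the geomedian is again taken as given.

```python
import numpy as np

def bcmad(x, m):
    """Median Bray-Curtis dissimilarity between each timestep and the geomedian.

    x: array of shape (N, p); m: array of shape (p,), assumed already computed
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    numerator = np.abs(x - m).sum(axis=1)    # sum over bands of |x - m|
    denominator = np.abs(x + m).sum(axis=1)  # sum over bands of |x + m|
    return float(np.median(numerator / denominator))

# Hypothetical two-band pixel with three timesteps
m = np.array([1000.0, 3000.0])
x = np.array([
    [1000.0, 3000.0],  # identical to m: dissimilarity 0
    [2000.0, 2000.0],  # (1000 + 1000) / (3000 + 5000) = 0.25
    [3000.0, 5000.0],  # (2000 + 2000) / (4000 + 8000) = 1/3
])
print(bcmad(x, m))
```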

BCMAD takes on values from `0 - 1`. 

> The Bray-Curtis dissimilarity is not referred to as a "distance" because it does not obey the triangle inequality.

## Appendix

### Example: calculating Euclidean distance

Let's take a selection of bands from one pixel, from one timestep. For that pixel, we have both the measurements taken by the single satellite flyover, and the geomedian value. 

| Band | Surface reflectance of one pixel from one flyover | Surface reflectance from annual geomedian of the same year |
|----------|-------------|----------------|
| Blue | 1028 | 969 |
| Green | 1468 | 1406 |
| Red | 2176 | 2032 |
| Near Infrared (NIR) 1 | 3090 | 3078 |

Then the Euclidean distance for this pixel at this timestep is:

\begin{align}
\text{Euclidean distance} &= \sqrt{\left( x^{(t)}_{\text{band 1}} - m_{\text{band 1}} \right)^2 + \left( x^{(t)}_{\text{band 2}} - m_{\text{band 2}} \right)^2  + \dots  + \left( x^{(t)}_{\text{band p}} - m_{\text{band p}} \right)^2 }\\
&= \sqrt{\left( x^{(t)}_{\text{red}} - m_{\text{red}} \right)^2 + \left( x^{(t)}_{\text{green}} - m_{\text{green}} \right)^2  + \left( x^{(t)}_{\text{blue}} - m_{\text{blue}} \right)^2 + \left( x^{(t)}_{\text{nir1}} - m_{\text{nir1}} \right)^2}\\
&= \sqrt{\left( 2176 - 2032 \right)^2 + \left( 1468 - 1406 \right)^2  + \left(1028 - 969 \right)^2 + \left( 3090 - 3078 \right)^2}\\
&\approx 167.9
\end{align}

To then calculate EMAD, the Euclidean distance calculation would need to be repeated for all the other timesteps, and the median of the resulting distances taken.
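
The arithmetic in this example can be reproduced with a few lines of NumPy, using the values from the table above. The variable names are chosen here for illustration.

```python
import numpy as np

# Band order: blue, green, red, NIR 1 (values from the table above)
flyover = np.array([1028.0, 1468.0, 2176.0, 3090.0])
geomedian = np.array([969.0, 1406.0, 2032.0, 3078.0])

# Sum of squared per-band differences, then the square root
squared_sum = float(np.sum((flyover - geomedian) ** 2))
distance = float(np.sqrt(squared_sum))
print(squared_sum, round(distance, 1))
```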