# Concentration Inequalities

This notebook works through problems on concentration inequalities, most of which concern sums of independent random variables.

The layout is as follows: I first state the inequalities, definitions, theorems, and main results in Markdown, with short plain-English explanations, and then give worked solutions to the book problems of "Concentration Inequalities" by Boucheron et al.

# Sums of Independent Random Variables & the Martingale Method

## Review of Independence

Here I review definitions and properties of independent random variables. Following Kolmogorov, I discuss independence of sets and $\sigma$-algebras first, and then independence of random variables.

**Definition**: Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space with $\{\mathcal{F}_i\}_{i\in\mathcal{I}}$ denoting sub-$\sigma$-algebras of $\mathcal{F}$. We say that the $\sigma$-algebras $\mathcal{F}_i$, $i\in\mathcal{I}$ are mutually $\mathbb{P}$-independent if for every finite subset $\{i_1,\dots,i_n\}$ of *distinct* elements of $\mathcal{I}$ and every choice of sets $A_{i_k}\in\mathcal{F}_{i_k}$ for $k\in[n]$, we have

$$\mathbb{P}\left(\bigcap\limits_{k=1}^n A_{i_k}\right)=\prod\limits_{k=1}^n \mathbb{P}\left(A_{i_k}\right).$$

In particular, if $\left\{A_i: i\in\mathcal{I}\right\}$ is a family of sets in $\mathcal{F}$, then we say $A_i$, $i\in\mathcal{I}$ are $\mathbb{P}$-independent if the associated $\sigma$-algebras $\mathcal{F}_i=\left\{\emptyset, A_i,A_i^c,\Omega\right\}$, $i\in\mathcal{I}$, are mutually $\mathbb{P}$-independent.
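The requirement that the product rule hold for *every* finite subfamily matters. As a small illustration (a sketch on a hypothetical sample space of two fair coin flips, with the events `A`, `B`, `C` chosen here only for the example), the following checks that three events can be pairwise independent without being mutually independent.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical sample space: two fair coin flips, all four outcomes equally likely.
omega = list(product([0, 1], repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 1}     # first flip is heads
B = {w for w in omega if w[1] == 1}     # second flip is heads
C = {w for w in omega if w[0] != w[1]}  # the two flips differ

def factorizes(events):
    """Check P(intersection of events) == product of the individual probabilities."""
    intersection = set(omega)
    prod_of_probs = Fraction(1)
    for e in events:
        intersection &= e
        prod_of_probs *= prob(e)
    return prob(intersection) == prod_of_probs

pairwise = all(factorizes(pair) for pair in combinations([A, B, C], 2))
triple = factorizes([A, B, C])
print(pairwise, triple)
```

Every pair factorizes, but $A\cap B\cap C=\emptyset$ while the product of the three probabilities is $1/8$, so the family is not mutually independent.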

Since random variables $X_i$ are measurable functions on $(\Omega,\mathcal{F})$ taking values in measurable spaces $(E_i,\mathcal{B}_i)$, we can define independence of random variables through independence of their pull-back $\sigma$-algebras.

**Definition**: We say that the random variables $X_i$, $i\in\mathcal{I}$ are mutually $\mathbb{P}$-independent if their pull-back $\sigma$-algebras 

$$\sigma(X_i)=\left\{X_i^{-1}(B_i):B_i\in\mathcal{B}_i\right\},\;\;\;i\in\mathcal{I}$$

are $\mathbb{P}$-independent. There are many equivalent formulations that will be discussed later.

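**Some Properties of Independent Random Variables**: Let $X_1,\dots,X_n$ be mutually independent random variables (real-valued wherever sums, products, or moments appear).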


- For measurable functions $f_1,\dots,f_n$ such that each $f_i(X_i)$ is integrable, we have $\mathbb{E}\left[\prod\limits_{i=1}^n f_i(X_i)\right]=\prod\limits_{i=1}^n \mathbb{E}[f_i(X_i)]$


- If each $X_i$ is square-integrable, then $\mathbb{V}\left(\sum\limits_{i=1}^n X_i\right) = \sum\limits_{i=1}^n \mathbb{V}(X_i)$


- From the first point, we see that the cumulant generating function of the sum of $X_i$ is the sum of the cumulant generating functions of $X_i$: 

$$\psi_{\sum_{i=1}^n X_i}(\lambda) = \log\left(\mathbb{E}\left[\exp\{\lambda \sum\limits_{i=1}^n X_i\}\right]\right)= \sum\limits_{i=1}^n \log\left(\mathbb{E}[e^{\lambda X_i}]\right) = \sum\limits_{i=1}^n \psi_{X_i}(\lambda)$$


- The joint law factorizes: for all $B_i\in\mathcal{B}_i$, $\mathbb{P}(X_1\in B_1,\dots,X_n\in B_n)=\prod\limits_{i=1}^n \mathbb{P}(X_i\in B_i)$


- From the first point, taking each $f_i$ to be the identity map (and assuming each $X_i$ is integrable), we also have $\mathbb{E}\left[\prod\limits_{i=1}^n X_i\right] = \prod\limits_{i=1}^n \mathbb{E}[X_i]$
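The additivity of the variance and of the cumulant generating function can be checked exactly on small discrete examples. The sketch below uses two hypothetical finite distributions (`X1` and `X2`, chosen only for illustration), builds the law of their sum under independence, and compares both sides.

```python
import itertools
import math

# Hypothetical finite distributions, written as {value: probability}.
X1 = {0: 0.5, 1: 0.5}
X2 = {-1: 0.2, 0: 0.3, 2: 0.5}

def mean(dist):
    return sum(v * p for v, p in dist.items())

def var(dist):
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())

def cgf(dist, lam):
    """Cumulant generating function: log E[exp(lam * X)]."""
    return math.log(sum(p * math.exp(lam * v) for v, p in dist.items()))

def law_of_sum(d1, d2):
    """Law of X1 + X2 when X1 and X2 are independent (joint law = product)."""
    out = {}
    for (v1, p1), (v2, p2) in itertools.product(d1.items(), d2.items()):
        out[v1 + v2] = out.get(v1 + v2, 0.0) + p1 * p2
    return out

S = law_of_sum(X1, X2)
lam = 0.7  # arbitrary test value
print(var(S), var(X1) + var(X2))
print(cgf(S, lam), cgf(X1, lam) + cgf(X2, lam))
```

Each printed pair should agree up to floating-point rounding.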

# Running List of Important Inequalities and Proofs


**Jensen's Inequality**: For a convex function $f$ and an integrable real-valued random variable $X$, we have

$$f\left(\mathbb{E}[X]\right)\leq \mathbb{E}[f(X)].$$


**Proof**: asdf
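One standard route to Jensen's inequality, sketched here under the assumption that $\mathbb{E}[f(X)]$ is well defined, uses a supporting line. Writing $m=\mathbb{E}[X]$, convexity of $f$ gives a slope $c\in\mathbb{R}$ (a symbol introduced only for this sketch) with $f(x)\geq f(m)+c(x-m)$ for all $x$. Substituting $x=X$ and taking expectations,

$$\mathbb{E}[f(X)]\geq f(m)+c\left(\mathbb{E}[X]-m\right)=f\left(\mathbb{E}[X]\right).$$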

# Example Inequalities

**Q1**: Prove that for a random variable $Y$ taking values in an interval $[a,b]$,

$$\mathbb{V}(Y)\leq \frac{(b-a)^2}{4}.$$

**Q1 Work**: As $c=\mathbb{E}[Y]$ minimizes the objective $c\mapsto\mathbb{E}\left[(Y-c)^2\right]$, we have

$$\mathbb{V}[Y]\leq \mathbb{E}\left[\left(Y-\frac{a+b}{2}\right)^2\right]=\frac{1}{4}\mathbb{E}\left[\left((Y-a)+(Y-b)\right)^2\right]$$

As $Y-a\geq 0$ and $Y-b\leq 0$, we have that $\left|(Y-a)+(Y-b)\right|\leq (Y-a)-(Y-b)=b-a$. Thus

$$\mathbb{V}[Y]\leq \frac{1}{4}\mathbb{E}\left[(b-a)^2\right]=\frac{(b-a)^2}{4}.$$

**Side note**: This upper bound is sharp: a Rademacher random variable takes values in $[-1,1]$ and has variance $1=(1-(-1))^2/4$.
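As a quick sanity check of the bound and of its sharpness, the sketch below computes the variance of two finite distributions exactly. The helper names and the `skewed` example are hypothetical, introduced only for illustration, and $a$, $b$ are taken to be the smallest and largest support points.

```python
def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    m = sum(v * p for v, p in dist.items())
    return sum((v - m) ** 2 * p for v, p in dist.items())

def range_bound(dist):
    """The bound (b - a)^2 / 4, with a and b the extreme support points."""
    a, b = min(dist), max(dist)
    return (b - a) ** 2 / 4

rademacher = {-1: 0.5, 1: 0.5}
skewed = {0: 0.9, 1: 0.05, 4: 0.05}  # hypothetical example on [0, 4]

for name, dist in [("rademacher", rademacher), ("skewed", skewed)]:
    print(name, variance(dist), range_bound(dist))
```

The Rademacher case attains the bound, while the skewed distribution stays well below it.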


**Q2**: Prove that a median minimizes the expected absolute deviation:

$$\mathbb{M}[X]\in\arg\min_y \mathbb{E}\left[|X-y|\right]$$

**Q2 Work**: Assume $X$ has a density $f$. Differentiating with respect to $y$, we obtain

$$\frac{d}{dy}\mathbb{E}\left[|X-y|\right]=-\mathbb{E}\left[\text{sign}(X-y)\right]=-\int\limits_{\mathbb{R}} \text{sign}(x-y)f(x)dx=\mathbb{P}(X<y)-\mathbb{P}(X>y).$$

This derivative is zero exactly when $\mathbb{P}(X<y)=\mathbb{P}(X>y)$, that is, when half of the probability mass lies below $y$ and half above. Such a point is a median, and since $y\mapsto\mathbb{E}\left[|X-y|\right]$ is convex, it is a minimizer.
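The same statement can be checked on an empirical distribution, where no density is needed. The sketch below uses a small hypothetical data set and a grid search over candidate values of $y$; both are assumptions made only for illustration.

```python
import statistics

data = [1, 2, 3, 10, 50]  # hypothetical sample, treated as the law of X

def mean_abs_dev(y, xs):
    """Empirical version of E|X - y|."""
    return sum(abs(x - y) for x in xs) / len(xs)

# Search a grid of candidate centers 0.0, 0.1, ..., 50.0.
candidates = [k / 10 for k in range(0, 501)]
best = min(candidates, key=lambda y: mean_abs_dev(y, data))

print(best, statistics.median(data))
```

With an odd number of distinct points the minimizer is unique and coincides with the sample median. The sample mean, by contrast, gives a strictly larger absolute deviation here.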





# Chapter 2 - Basic Inequalities Problems

**2.1** Let $\mathbb{M}[Z]$ be a median of the square-integrable random variable $Z$. Show that 

$$|\mathbb{M}[Z] - \mathbb{E}[Z]|\leq \sqrt{\mathbb{V}[Z]}.$$

- Since $\mathbb{M}[Z]$ is a constant, linearity of $\mathbb{E}[\cdot]$ together with Jensen's inequality (applied to $|\cdot|$) gives

$$|\mathbb{M}[Z] - \mathbb{E}[Z]| = \left|\mathbb{E}\left[\mathbb{M}[Z] - Z\right]\right| \leq \mathbb{E}\left[|\mathbb{M}[Z] - Z|\right].$$

Then from Q2 above, we know the median minimizes the expected L1 loss, so with this and after squaring, we obtain that

$$\left|\mathbb{M}[Z] - \mathbb{E}[Z]\right|^2 \leq \left(\mathbb{E}\left[|\mathbb{M}[Z] - Z|\right]\right)^2 \leq \left(\mathbb{E}\left[|\mathbb{E}[Z] - Z|\right]\right)^2,$$

to which we apply Jensen's inequality once more to see

$$\left(\mathbb{E}\left[|Z - \mathbb{E}[Z]|\right]\right)^2\leq \mathbb{E}\left[|Z-\mathbb{E}[Z]|^2\right]=\mathbb{V}[Z].$$

Taking square roots gives the claim.
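For a concrete feel of the inequality in 2.1, the sketch below evaluates both sides on a skewed hypothetical data set, treating its empirical distribution as the law of $Z$ (so the population standard deviation `pstdev` is the relevant one).

```python
import statistics

data = [1, 2, 3, 10, 50]  # hypothetical sample, treated as the law of Z

gap = abs(statistics.median(data) - statistics.mean(data))
std = statistics.pstdev(data)  # square root of the population variance

print(gap, std)
```

Even with a heavy right tail pulling the mean away from the median, the gap stays below one standard deviation.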

**2.2** Let $X$ be a random variable with median $\mathbb{M}[X]$ such that there exist positive constants $a$ and $b$, so that for all $t>0$,

$$\mathbb{P}\left\{ |X - \mathbb{M}[X]| > t \right\}\leq ae^{-t^2/b}.$$

Show that $|\mathbb{M}[X] - \mathbb{E}[X]|\leq \min\left(\sqrt{ab}, a\sqrt{b\pi}/2\right)$.

- asdf
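One possible route for 2.2, given as a sketch rather than as the intended solution, combines the first step of 2.1 with the tail-integral formulas $\mathbb{E}[W]=\int_0^\infty \mathbb{P}\{W>t\}dt$ and $\mathbb{E}[W^2]=\int_0^\infty 2t\,\mathbb{P}\{W>t\}dt$ for the nonnegative variable $W=|X-\mathbb{M}[X]|$ (the symbol $W$ is introduced only here). For the second term of the minimum,

$$|\mathbb{M}[X]-\mathbb{E}[X]|\leq \mathbb{E}[W]\leq a\int_0^\infty e^{-t^2/b}dt=\frac{a\sqrt{b\pi}}{2}.$$

For the first term, Jensen's inequality gives $\mathbb{E}[W]\leq\sqrt{\mathbb{E}[W^2]}$, and

$$\mathbb{E}[W^2]\leq a\int_0^\infty 2t\,e^{-t^2/b}dt=ab,$$

so that $|\mathbb{M}[X]-\mathbb{E}[X]|\leq\sqrt{ab}$ as well.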

**2.3** Prove the following one-sided improvement of Chebyshev's inequality: for any real-valued random variable $Y$ and $t>0$,

$$\mathbb{P}\left\{ Y - \mathbb{E}[Y] \geq t \right\} \leq \frac{\mathbb{V}(Y)}{\mathbb{V}(Y) + t^2}.$$

- asdf
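A common argument for 2.3 (Cantelli's inequality), sketched here assuming $0<\mathbb{V}(Y)<\infty$, shifts by a free parameter $u\geq 0$ (introduced only for this sketch) before applying Markov's inequality to a square:

$$\mathbb{P}\left\{Y-\mathbb{E}[Y]\geq t\right\}\leq \mathbb{P}\left\{\left(Y-\mathbb{E}[Y]+u\right)^2\geq (t+u)^2\right\}\leq \frac{\mathbb{V}(Y)+u^2}{(t+u)^2}.$$

Choosing $u=\mathbb{V}(Y)/t$ makes the right-hand side equal to

$$\frac{\mathbb{V}(Y)\left(t^2+\mathbb{V}(Y)\right)/t^2}{\left(t^2+\mathbb{V}(Y)\right)^2/t^2}=\frac{\mathbb{V}(Y)}{\mathbb{V}(Y)+t^2}.$$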

**2.4** Show that if $Y$ is a nonnegative random variable, then for any $a\in(0,1)$,

$$\mathbb{P}\left\{ Y \geq a\mathbb{E}[Y] \right\} \geq (1-a)^2\frac{(\mathbb{E}[Y])^2}{\mathbb{E}[Y^2]}.$$

- asdf
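For 2.4 (the Paley-Zygmund inequality), one standard sketch, assuming $0<\mathbb{E}[Y^2]<\infty$, splits the expectation at the threshold $a\mathbb{E}[Y]$ and applies the Cauchy-Schwarz inequality to the upper part:

$$\mathbb{E}[Y]=\mathbb{E}\left[Y\mathbb{1}_{\{Y<a\mathbb{E}[Y]\}}\right]+\mathbb{E}\left[Y\mathbb{1}_{\{Y\geq a\mathbb{E}[Y]\}}\right]\leq a\mathbb{E}[Y]+\sqrt{\mathbb{E}[Y^2]\,\mathbb{P}\left\{Y\geq a\mathbb{E}[Y]\right\}}.$$

Rearranging gives $(1-a)\mathbb{E}[Y]\leq\sqrt{\mathbb{E}[Y^2]\,\mathbb{P}\{Y\geq a\mathbb{E}[Y]\}}$, and squaring both sides (both are nonnegative since $a<1$) yields the claim.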

**2.5** Show that moment bounds for tail probabilities are always better than Cramér-Chernoff bounds. More precisely, let $Y$ be a nonnegative random variable and let $t>0$. The best moment bound for the tail probability $\mathbb{P}\{Y\geq t\}$ is $\min_q \mathbb{E}[Y^q]t^{-q}$ where the minimum is taken over all positive integers. The best Cramér-Chernoff bound is $\inf_{\lambda>0} \mathbb{E}[e^{\lambda (Y - t)}]$. Prove that

$$\min\limits_q \mathbb{E}[Y^q]t^{-q} \leq \inf\limits_{\lambda > 0} \mathbb{E}[e^{\lambda (Y - t)}].$$

- asdf
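Before attempting a proof of 2.5, it can help to compare the two bounds numerically in a case where both are explicit. The sketch below assumes $Y\sim\text{Exponential}(1)$, for which $\mathbb{E}[Y^q]=q!$ and $\mathbb{E}[e^{\lambda Y}]=1/(1-\lambda)$ for $\lambda<1$. The threshold `t`, the range of `q`, and the grid of `lam` values are arbitrary choices made for illustration.

```python
import math

t = 5.0  # hypothetical threshold, chosen only for illustration

# Y ~ Exponential(1): E[Y^q] = q!, so the moment bound at order q is q! / t^q.
moment_bound = min(math.factorial(q) / t**q for q in range(1, 51))

# E[exp(lam * (Y - t))] = exp(-lam * t) / (1 - lam) for 0 < lam < 1.
# A grid search only approximates the infimum, and does so from above.
lams = [k / 1000 for k in range(1, 1000)]
chernoff_bound = min(math.exp(-lam * t) / (1 - lam) for lam in lams)

exact_tail = math.exp(-t)  # P(Y >= t) for Exponential(1)
print(exact_tail, moment_bound, chernoff_bound)
```

In this example the true tail, the best moment bound, and the best Cramér-Chernoff bound appear in increasing order, consistent with the claim.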


# Sources

- "Concentration Inequalities" by Boucheron, Lugosi, and Massart

- https://math.mit.edu/~dws/175/prob01.pdf
- https://terrytao.wordpress.com/2015/10/12/275a-notes-2-product-measures-and-independence/