### Quantile Regression

We are given data $(x_1, y_1),...,(x_m, y_m)$, where the $x_i$ are vectors of regressors and the $y_i$ are observations of the dependent variable we want to predict. Quantile regression is a method for estimating the parameters $\theta$ such that $F(x^T\theta) = \alpha$, where $F$ is the cdf of the conditional distribution $Y|X = x$ and $\alpha \in (0, 1)$; equivalently, $x^T\theta = F^{-1}(\alpha)$ when $F$ is invertible. In other words, we want $\theta$ such that

\begin{equation}
P(Y \leq x^T\theta | X = x) = \alpha
\end{equation}

Note that $\tau$ is sometimes used instead of $\alpha$.
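As a quick sanity check of this definition, the sketch below drops the regressors entirely and verifies on synthetic data that the empirical $\alpha$-quantile $q$ satisfies $P(Y \leq q) \approx \alpha$. The normal distribution, the sample size, and the names `rng` and `coverage` are illustrative assumptions, not part of the problem above.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=100_000)  # synthetic samples of Y (no regressors)

alpha = 0.9
q = np.quantile(y, alpha)    # empirical alpha-quantile of the sample
coverage = np.mean(y <= q)   # fraction of samples at or below q

print(q, coverage)           # coverage should be close to alpha
```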

For example, the method of Least Absolute Deviations (also known as $\ell_1$-norm regression or median regression) finds the parameters $\theta$ that minimize $\sum_{i=1}^m |x_i^T\theta - y_i|$. This is the $\alpha = 1/2$ case of quantile regression: it fits $\hat y = x^T \theta$ as an estimate of the median of the distribution $Y | X = x$.
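The link between absolute deviations and the median is easiest to see in the intercept-only case, where the model is a single constant $c$. The following is a minimal sketch under that assumption: it searches over the data points for the $c$ minimizing $\sum_i |c - y_i|$ and compares it with `np.median`. The exponential sample and the brute-force search are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=1.0, size=501)  # skewed sample; odd length gives a unique median

# The objective is convex and piecewise linear in c, so a minimizer lies at a data point.
candidates = np.sort(y)
abs_loss = np.abs(candidates[:, None] - y[None, :]).sum(axis=1)
c_star = candidates[np.argmin(abs_loss)]

print(c_star, np.median(y), np.mean(y))   # c_star matches the median, not the mean
```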

To find these parameters $\theta$, we minimize the **quantile loss**, which is (expressed compactly, noting that there are other equivalent forms for this loss):

\begin{equation}
\mathcal{L}_\alpha(x^T\theta - y) = (x^T\theta - y)(\mathbb{1}_{x^T\theta - y \geq 0} - \alpha)
\end{equation}

where $\mathbb{1}_{x^T\theta - y \geq 0}$ is the indicator function that is $1$ when $x^T\theta - y \geq 0$ and $0$ otherwise.

The quantile loss penalizes overshooting (i.e. $x^T\theta - y \geq 0$) with a penalty of $(1 - \alpha)(x^T\theta - y)$ and undershooting (i.e. $x^T\theta - y < 0$) with a penalty of $\alpha\,(y - x^T\theta)$; both penalties are nonnegative. When $\alpha$ is large ($\alpha \geq 1 - \alpha$, i.e. $\alpha \geq 1/2$) we are looking for a high quantile (i.e. we want $x^T\theta$ to give a value that is larger than most $y$ values), so we penalize undershooting more than overshooting. Similarly, when $\alpha$ is small, we penalize overshooting more than undershooting. When $\alpha = 1/2$, we penalize undershooting and overshooting by exactly the same amount, and the loss reduces to half the absolute error. See the first reference for a more detailed derivation of why this is a suitable loss function.
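The asymmetry described above can be checked numerically. The sketch below implements the loss in plain NumPy (this `quantile_loss` takes a residual and a single $\alpha$; it is a hypothetical helper for illustration) and then, assuming an intercept-only model, confirms that the constant minimizing the total loss agrees with the empirical $\alpha$-quantile.

```python
import numpy as np

def quantile_loss(z, alpha):
    """Quantile loss of the residual z = prediction - target."""
    z = np.asarray(z, dtype=float)
    return z * ((z >= 0).astype(float) - alpha)

alpha = 0.9
print(quantile_loss(1.0, alpha))    # overshooting by 1 costs 1 - alpha
print(quantile_loss(-1.0, alpha))   # undershooting by 1 costs alpha

# Intercept-only fit: search the data points for the constant with the lowest total loss.
rng = np.random.default_rng(0)
y = rng.normal(size=1001)
candidates = np.sort(y)
total = quantile_loss(candidates[:, None] - y[None, :], alpha).sum(axis=1)
c_star = candidates[np.argmin(total)]

print(c_star, np.quantile(y, alpha))  # the two values should agree closely
```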

Therefore, for a given $\alpha$, we would like to solve the problem:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \sum_{i = 1}^m (x_i^T\theta - y_i)(\mathbb{1}_{x_i^T\theta - y_i \geq 0} - \alpha) \\
  \end{aligned}
\end{equation*}
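
One of the equivalent forms of the loss mentioned earlier makes the convexity of this objective explicit. Writing $z = x^T\theta - y$ for the residual and using $\alpha \in (0, 1)$:

\begin{equation*}
z\,(\mathbb{1}_{z \geq 0} - \alpha) =
\begin{cases}
(1 - \alpha)\,z & z \geq 0 \\
-\alpha\,z & z < 0
\end{cases}
= \max\{-\alpha z,\ (1 - \alpha) z\}
\end{equation*}

For $z \geq 0$ we have $(1 - \alpha)z \geq 0 \geq -\alpha z$, and for $z < 0$ we have $-\alpha z > 0 > (1 - \alpha)z$, so the maximum always selects the active branch. Since $z$ is affine in $\theta$, the loss is a pointwise maximum of two affine functions of $\theta$ and is therefore convex, which is what allows the problem to be handed to a convex solver in this form.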

However, if we solve this problem for many different values of $\alpha$, we may wish to impose the further constraint that the estimated quantiles do not contradict each other. For example, our estimate of the 0.90 quantile should never be smaller than our estimate of the 0.89 quantile. In other words, given $\theta_\alpha$ and $\theta_\beta$ corresponding to quantiles $\alpha$ and $\beta$ respectively, with $\alpha < \beta$, every $x_i$ should give a value at least as large when dotted with $\theta_\beta$ as when dotted with $\theta_\alpha$: for a fixed $x_i$, a higher quantile level should never produce a lower prediction $x_i^T\theta$. Therefore, we impose the additional constraint that, for every data point $x_i$:

\begin{equation}
x_i^T(\theta_\beta - \theta_\alpha) \geq 0
\end{equation}

This gives us the final problem:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \sum_{j = 1}^k \sum_{i = 1}^m (x_i^T\theta_j - y_i)(\mathbb{1}_{x_i^T\theta_j - y_i \geq 0} - \alpha_j) \\
    &\text{subject to} && x_i^T(\theta_l - \theta_j) \geq 0 &&& i = 1,...,m \\
    &                  &&                                   &&& 1 \leq j < l \leq k
  \end{aligned}
\end{equation*}

with variables $\theta_j$, $j = 1,..., k$ (where $\theta_j$ is the parameter vector for quantile $\alpha_j$), quantiles $\alpha_1,...,\alpha_k$ (in increasing order), and labelled data $(x_1, y_1),...,(x_m, y_m)$.
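
To illustrate what the non-crossing constraint rules out, here is a small sketch that counts violations for a given parameter matrix. It checks only adjacent columns of the prediction matrix, which is enough because the ordering is transitive. The helper `count_crossings` and the toy matrices `Theta_ok` and `Theta_bad` are hypothetical and exist only for this example.

```python
import numpy as np

def count_crossings(X, Theta):
    """Count (sample, adjacent-quantile) pairs where the higher quantile's
    prediction falls below the lower quantile's prediction."""
    preds = X @ Theta               # shape (m, k); column j belongs to alpha_j
    gaps = np.diff(preds, axis=1)   # preds[:, j+1] - preds[:, j]
    return int(np.sum(gaps < 0))

# Toy design matrix: an intercept column plus one feature t in {0, 1, 2}
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

# Three quantile lines with equal slope and increasing intercepts: never cross
Theta_ok = np.array([[0.0, 0.5, 1.0],
                     [1.0, 1.0, 1.0]])

# Increasing intercepts but decreasing slopes: the lines cross for t > 1
Theta_bad = np.array([[0.0, 0.5, 1.0],
                      [1.0, 0.5, 0.0]])

print(count_crossings(X, Theta_ok))
print(count_crossings(X, Theta_bad))
```

A matrix that is feasible for the constrained problem gives a count of zero on the training data; the constraint says nothing about points outside the dataset.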


References:

https://stats.stackexchange.com/questions/251600/quantile-regression-loss-function

In [9]:
import cvxpy as cp
import numpy as np
import scipy as sp

# setup

problemID = "quantile_0"
prob = None
opt_val = None

# Variable declarations

# Generate data
np.random.seed(0)
m = 400 # Number of data entries
n = 10 # Number of weights
k = 100 # Number of quantiles
p = 1
sigma = 0.1

x = np.random.rand(m)*2*np.pi*p
y = np.sin(x) + sigma*np.sin(x)*np.random.randn(m)
alphas = np.linspace(1./(k+1), 1-1./(k+1), k) # Do a bunch of quantiles at once

# RBF (Radial Basis Function) features
mu_rbf = np.array([np.linspace(-1, 2*np.pi*p+1, n)])
mu_sig = (2*np.pi*p+2)/n
X = np.exp(-(mu_rbf.T - x).T**2/(2*mu_sig**2)) # Gaussian
# X has dimension m x n

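# Requires cvxpy >= 1.0 (shape passed as a tuple, `@` for matrix products)
Theta = cp.Variable((n, k))  # column j holds the weights for quantile alphas[j]


# Problem construction

def quantile_loss(alphas, Theta, X, y):
    """Sum of quantile losses over all m samples and all k quantile levels."""
    m, n = X.shape
    k = len(alphas)
    Y = np.tile(y.flatten(), (k, 1)).T  # m x k, targets repeated per quantile
    A = np.tile(alphas, (m, 1))         # m x k, quantile level per column
    Z = X @ Theta - Y                   # m x k residuals (prediction - target)
    # max(-alpha*z, (1 - alpha)*z) equals z*(1[z >= 0] - alpha) for alpha in (0, 1)
    return cp.sum(
        cp.maximum(
            cp.multiply( -A, Z),
            cp.multiply(1-A, Z)))

f = quantile_loss(alphas, Theta, X, y)
# Non-crossing constraint: predictions must be nondecreasing across adjacent
# quantile columns, which implies the ordering for every pair of quantiles
C = [X @ (Theta[:,1:] - Theta[:,:-1]) >= 0]
prob = cp.Problem(cp.Minimize(f), C)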


# Problem collection

# Single problem collection
problemDict = {
    "problemID" : problemID,
    "problem"   : prob,
    "opt_val"   : opt_val
}
problems = [problemDict]



# For debugging individual problems:
if __name__ == "__main__":
    def printResults(problemID = "", problem = None, opt_val = None):
        print(problemID)
        problem.solve()
        print("\tstatus: {}".format(problem.status))
        print("\toptimal value: {}".format(problem.value))
        print("\ttrue optimal value: {}".format(opt_val))
    printResults(**problems[0])

quantile_0
	status: optimal
	optimal value: 718.7588164416889
	true optimal value: None
