# <center>Quantile methods</center>
### <center>Alfred Galichon (NYU+Sciences Po)</center>
## <center>'math+econ+code' masterclass on optimal transport and economic applications</center>
#### <center>With python code examples</center>
© 2018-2022 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/team).

**If you reuse material from this masterclass, please cite as:**<br>
Alfred Galichon, 'math+econ+code' masterclass on optimal transport and economic applications, January 2022. https://github.com/math-econ-code/mec_optim

## References
* Koenker and Bassett (1978). Regression quantiles. *Econometrica*.
* Koenker (2005). *Quantile Regression*. Cambridge University Press.
* Koenker and Hallock (2001). Quantile regression. *Journal of Economic Perspectives* 15(4), pp. 143-156.

### What is the right definition of a quantile?


In dimension one, the following statements equivalently define quantiles of a
distribution $Y\sim\nu$:

* The quantile map is the (generalized) inverse of the cdf of $\nu$:
$F_{\nu}^{-1}$.

* The quantile map is the nondecreasing map $T$ such that if
$U\sim\mathcal{U}\left(  \left[  0,1\right]  \right)  $, then $T\left(
U\right)  \sim\nu$.

* The quantile at level $t$, $F_{\nu}^{-1}\left(  t\right)  $, is the solution of
$\min_{q}\mathbb{E}\left[  \rho_{t}\left(  Y-q\right)  \right]  $, where
$\rho_{t}\left(  z\right)  =tz^{+}+\left(  1-t\right)  z^{-}$.

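* The quantile map is the solution to the Monge problem between
distribution $\mathcal{U}\left(  \left[  0,1\right]  \right)  $ and $\nu$
with surplus $\Phi\left(  u,y\right)  =uy$, i.e. the map $T$ maximizing $\mathbb{E}\left[  UT\left(  U\right)  \right]$ among maps such that $T\left(  U\right)  \sim\nu$.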
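A small numerical check of three of these characterizations, as a sketch on simulated data (the sample, the level $t=0.3$ and the helper `check_loss` are illustrative choices, not part of the original): the generalized inverse of the empirical cdf coincides with the minimizer of the empirical check loss, and on a handful of points the matching maximizing $\sum_i u_i y_{\sigma(i)}$ pairs sorted $u$'s with sorted $y$'s.

```python
import itertools
import numpy as np

def check_loss(z, t):
    """rho_t(z) = t z^+ + (1 - t) z^-, elementwise."""
    return t * np.maximum(z, 0.0) + (1 - t) * np.maximum(-z, 0.0)

rng = np.random.default_rng(0)
y = rng.exponential(size=1001)
t = 0.3

# (1) Generalized inverse of the empirical cdf: smallest y with F_n(y) >= t.
y_sorted = np.sort(y)
q_inverse_cdf = y_sorted[int(np.ceil(t * len(y))) - 1]

# (2) Minimizer of the empirical check loss; a minimizer lies among the data points.
losses = check_loss(y[None, :] - y[:, None], t).mean(axis=1)   # row j uses q = y[j]
q_check_loss = y[int(np.argmin(losses))]

# (3) Monge problem with surplus u * y on 6 points: brute force over all matchings.
u = np.sort(rng.uniform(size=6))
w = rng.normal(size=6)
best = max(itertools.permutations(range(6)), key=lambda s: float(u @ w[list(s)]))
matched = w[list(best)]   # value of Y assigned to the i-th smallest u
```

Here $nt=300.3$ is not an integer, so the check-loss minimizer is unique; the optimal matching is increasing, as the rearrangement inequality predicts.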


### Quantile: properties



Quantiles have a number of enjoyable properties that make them easy to work with.

* They fully characterize the distribution $\nu$.

* They allow one to construct a representation of $\nu$: if $U\sim\mu:=\mathcal{U}\left(  \left[  0,1\right]  \right)  $, then $F_{\nu}^{-1}\left(
U\right)  $ has distribution $\nu$.

* They embed the median ($F_{\nu}^{-1}\left(  1/2\right)  $) and the
extreme values ($F_{\nu}^{-1}\left(  0\right)  $ and $F_{\nu}^{-1}\left(
1\right)  $).

* They provide a distance between distributions:
for $p\geq1$,<br>
$
\left(  \int_{0}^{1}\left\vert F_{\nu_{1}}^{-1}\left(  t\right)  -F_{\nu_{2}}^{-1}\left(
t\right)  \right\vert ^{p}dt\right)  ^{1/p}
$<br>
is the $p$-Wasserstein distance between $\nu_{1}$ and $\nu_{2}$.

* They allow for a natural construction of robust statistics by trimming
the interval $\left[  0,1\right]  $.

* **They lend themselves to a natural notion of regression: quantile
regression (Koenker and Bassett, 1978; Koenker 2005).**
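For two empirical distributions with the same number $n$ of atoms, both quantile functions are step functions on the same intervals of length $1/n$, so the integral above reduces to matching sorted samples. A minimal sketch (the function name `wasserstein_1d` is hypothetical): translating a sample by a constant $c$ should give $W_p=c$ for every $p$.

```python
import numpy as np

def wasserstein_1d(x, y, p=1.0):
    """p-Wasserstein distance between two empirical measures with n atoms each,
    computed as (int_0^1 |F_x^{-1}(t) - F_y^{-1}(t)|^p dt)^(1/p)."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # each sorted pair (x_(i), y_(i)) occupies a quantile interval of length 1/n
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
x = rng.normal(size=500)
w_shift = [wasserstein_1d(x, x + 3.0, p) for p in (1, 2, 4)]
```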



### Quantile/quantile regression: applications

Quantiles are widely used in economics, finance and statistics.

* Comonotonicity: $\left(  F_{\nu_{1}}^{-1}\left(  U\right)  ,F_{\nu_{2}}^{-1}\left(  U\right)  \right)  $ for $U\sim\mu:=\mathcal{U}\left(  \left[
0,1\right]  \right)  $ is a comonotone representation of $\nu_{1}$ and
$\nu_{2}$.

* Measures of risk: Value-at-Risk $F_{\nu}^{-1}\left(  1-\alpha\right)  $;
CVaR $\frac{1}{\alpha}\int_{1-\alpha}^{1}F_{\nu}^{-1}\left(  t\right)  dt$.

* Non-expected utility: Yaari's rank-dependent EU (Choquet integral)
$\int_{0}^{1}F_{\nu}^{-1}\left(  t\right)  w\left(  t\right)  dt$.

* Demand theory: Matzkin's identification of hedonic models.

* Income and inequality: Chamberlain (1994)'s study of the effect of
unionization on wages.

* Biometrics: growth charts.
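The risk measures above only need the quantile function. A sketch, assuming the quantile function is available in closed form and approximating the integrals with a midpoint rule (the helpers `distortion_value` and `value_at_risk` are hypothetical): CVaR, with the usual $1/\alpha$ normalization, is the Yaari/Choquet functional with weight $w(t)=1\{t\geq1-\alpha\}/\alpha$. For $\mathrm{Exp}(1)$, VaR is $\log(1/\alpha)$ and CVaR is $1+\log(1/\alpha)$.

```python
import numpy as np

def distortion_value(quantile_fn, w, n=200_000):
    """Midpoint-rule approximation of int_0^1 F^{-1}(t) w(t) dt."""
    t = (np.arange(n) + 0.5) / n
    return np.mean(quantile_fn(t) * w(t))

alpha = 0.05

def value_at_risk(quantile_fn):
    return quantile_fn(1 - alpha)

def cvar_weight(t):
    # CVaR as a Yaari functional: w(t) = 1{t >= 1 - alpha} / alpha
    return (t >= 1 - alpha) / alpha

uniform_qf = lambda t: t                  # F^{-1} of U[0, 1]
expo_qf = lambda t: -np.log1p(-t)         # F^{-1} of Exp(1)

var_u, cvar_u = value_at_risk(uniform_qf), distortion_value(uniform_qf, cvar_weight)
var_e, cvar_e = value_at_risk(expo_qf), distortion_value(expo_qf, cvar_weight)
```

Other distortion weights $w$ can be plugged into `distortion_value` in the same way.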


### What is quantile regression?

* Quantile regression adopts a parameterization of the conditional quantile of $Y$ given $X=x$
which is linear in $x$. That is<br>
$
Q_{Y|X}\left(  \tau|x\right)  =x^{\intercal}\beta_{\tau}
$<br>
(note that one can always augment $x$ with nonlinear functions of $x$, so this
parameterization is quite general).

* In order to estimate $\beta_{\tau}$, first note that<br>
$
Q_{Y|X}\left(  \tau|x\right)  =\arg\min_{q}\mathbb{E}\left[  \rho_{\tau
}\left(  Y-q\right)  |X=x\right]
$<br>
where $\rho_{\tau}\left(  w\right)  =\tau w^{+}+\left(  1-\tau\right)  w^{-}$.

* Therefore, if the conditional quantile has the specified form,
$\beta_{\tau}$ is the solution to<br>
$
\min_{\beta\in\mathbb{R}^{k}}\mathbb{E}\left[  \rho_{\tau}\left(
Y-X^{\intercal}\beta\right)  |X=x\right]$<br>
for each $x$, and therefore it is the solution to the quantile regression
problem introduced by Koenker and Bassett (1978)<br>
$
\min_{\beta\in\mathbb{R}^{k}}\mathbb{E}\left[  \rho_{\tau}\left(
Y-X^{\intercal}\beta\right)  \right]  .
$<br>

### Quantile regression as linear programming

* Koenker and Bassett showed that this problem has a linear programming formulation. Indeed, consider its sample version<br>
$\min_{\beta\in\mathbb{R}^{k}}\sum_{i=1}^{n}\rho_{\tau}\left(  Y_{i}-X_{i}^{\intercal}\beta\right)$


* Introducing $Y_{i}-X_{i}^{\intercal}\beta=P_{i}-N_{i}$ with $P_{i},N_{i}\geq0$, we have<br>
$
\begin{array}{rl}
\min_{\substack{\beta\in\mathbb{R}^{k}\\P_{i}\geq0,N_{i}\geq0}} &  \sum_{i=1}^{n}\tau P_{i}+\left(  1-\tau\right)  N_{i}\\
s.t.~ &  P_{i}-N_{i}=Y_{i}-X_{i}^{\intercal}\beta
\end{array}$<br>
therefore $\beta$ can be obtained by simple linear programming.

* Substituting $N_{i}=P_{i}-Y_{i}+X_{i}^{\intercal}\beta\geq0$ and dropping the constant $-\left(  1-\tau\right)  \sum_{i}Y_{i}$, the above simplifies to<br>
$
\begin{array}{rl}
\min_{\substack{\beta\in\mathbb{R}^{k}\\P_{i}\geq0}} &  \sum_{i=1}^{n} P_{i}+\left(  1-\tau\right)  X_{i}^{\intercal}\beta\\
s.t.~ &  P_{i} + X_{i}^{\intercal}\beta\geq Y_{i}~\left[  V_i\geq 0\right]
\end{array}$<br>

* The dual of the latter is<br>
$\begin{array}{rl}
\max_{V\geq 0} & \sum_i  Y_iV_i  \\
s.t.~ &   V_i\leq1~\left[  P_i\geq0\right]  \\
&  \frac{1}{n}\sum_i  V_i X_{ik}  =\left(  1-\tau\right)  \bar{x}_k  ~\left[  \beta_k\right]
\end{array}$<br>
where $\bar{x}_k:=\frac{1}{n}\sum_i X_{ik}$.
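A solver-agnostic sketch of this primal/dual pair, assuming SciPy ≥ 1.7 with its HiGHS backend (`scipy.optimize.linprog(..., method="highs")`) instead of Gurobi, and simulated data instead of the Engel sample. The helper name `quantile_regression_lp` is hypothetical. It solves the simplified primal, reads $V$ off the inequality marginals, and checks the dual constraints, strong duality, and the fact that the LP value is the check loss plus the constant $(1-\tau)\sum_i Y_i$.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression_lp(X, Y, tau):
    """Hypothetical helper: solve
        min_{P >= 0, beta} sum_i P_i + (1 - tau) sum_i X_i' beta
        s.t. P_i + X_i' beta >= Y_i   [V_i >= 0]
    and return beta, the multipliers V and the optimal value."""
    n, k = X.shape
    c = np.concatenate([np.ones(n), (1 - tau) * X.sum(axis=0)])
    A_ub = -np.hstack([np.eye(n), X])          # linprog expects A_ub z <= b_ub
    bounds = [(0, None)] * n + [(None, None)] * k
    res = linprog(c, A_ub=A_ub, b_ub=-Y, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    # HiGHS reports d(optimal value)/d(b_ub) <= 0; V_i is its negative
    V = -res.ineqlin.marginals
    return res.x[n:], V, res.fun

rng = np.random.default_rng(2)
n, tau = 200, 0.5
x = rng.uniform(0, 10, size=n)
Y = 1.0 + 0.5 * x + (0.2 + 0.1 * x) * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
beta, V, value = quantile_regression_lp(X, Y, tau)

# check loss at the LP solution, to compare with the LP value
r = Y - X @ beta
check = np.sum(tau * np.maximum(r, 0) + (1 - tau) * np.maximum(-r, 0))
```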

Let's import the libraries we shall need.

In [None]:
!pip install statsmodels

In [None]:
import pandas as pd
import numpy as np
import gurobipy as grb
import scipy.sparse as spr

## Loading the data

We shall use a historical dataset by Engel on households' food expenditure as a function of income.

In [None]:
engle_path = 'https://raw.githubusercontent.com/alfredgalichon/VQR/master/engle-data/'
engle_data = pd.read_csv(engle_path+ 'engel.csv')

engle_data.head()

In [None]:
income = np.array(engle_data['income'])
food = np.array(engle_data['food'])
housing = np.array(engle_data['housing'])
nbi=len(income)
X_i_k = np.array([np.ones(nbi),income]).T
#Y = np.array([food,housing]).T
_,nbk = X_i_k.shape

In [None]:
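qr_lp=grb.Model()
τ = 0.5
# P_i >= 0 (Gurobi's default lower bound is 0); β is free
P = qr_lp.addMVar(shape=nbi, name="P")
β = qr_lp.addMVar(shape=nbk, name="β", lb=-grb.GRB.INFINITY )
# objective: sum_i P_i + (1-τ) sum_i X_i'β
qr_lp.setObjective(np.ones(nbi) @ P + (1-τ) * (np.ones(nbi) @ X_i_k) @ β, grb.GRB.MINIMIZE)
# constraint: P_i + X_i'β >= Y_i, with Y = food expenditure
qr_lp.addConstr(P  + X_i_k @ β >= food)
qr_lp.optimize()

# variables are stored in creation order (P first, then β): keep the last nbk values
βhat = qr_lp.getAttr('x')[-nbk:]
βhat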

We can recover the result using the `quantreg` function of the `statsmodels` library:

In [None]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

Fit using:

In [None]:
model = smf.quantreg('food ~ income', engle_data)
print(model.fit(q=τ).summary())

In [None]:
# code taken from the statsmodels documentation:
# https://www.statsmodels.org/dev/examples/notebooks/generated/quantile_regression.html

quantiles = np.arange(0.05, 0.96, 0.1)


def fit_model(q):
    res = model.fit(q=q)
    return [q, res.params["Intercept"], res.params["income"]] + res.conf_int().loc[
        "income"
    ].tolist()


models = [fit_model(x) for x in quantiles]
models = pd.DataFrame(models, columns=["q", "a", "b", "lb", "ub"])

ols = smf.ols("food ~ income", engle_data).fit()
ols_ci = ols.conf_int().loc["income"].tolist()
ols = dict(
    a=ols.params["Intercept"], b=ols.params["income"], lb=ols_ci[0], ub=ols_ci[1]
)

print(models)
print(ols)

In [None]:
# code taken from the statsmodels documentation:
# https://www.statsmodels.org/dev/examples/notebooks/generated/quantile_regression.html

x = np.arange(engle_data.income.min(), engle_data.income.max(), 50)
get_y = lambda a, b: a + b * x

fig, ax = plt.subplots(figsize=(8, 6))

for i in range(models.shape[0]):
    y = get_y(models.a[i], models.b[i])
    ax.plot(x, y, linestyle="dotted", color="grey")

y = get_y(ols["a"], ols["b"])

ax.plot(x, y, color="red", label="OLS")
ax.scatter(engle_data.income, engle_data.food, alpha=0.2)
ax.set_xlim((240, 3000))
ax.set_ylim((240, 2000))
legend = ax.legend()
ax.set_xlabel("Income", fontsize=16)
ax.set_ylabel("Food expenditure", fontsize=16)

# Part 3: vector quantile regression

## References
* Ekeland, Galichon and Henry (2012). Comonotonic measures of multivariate risks. *Mathematical Finance*.
* Carlier, Chernozhukov and Galichon (2016). Vector quantile regression: an optimal transport approach. *Annals of Statistics*.
* Carlier, Chernozhukov and Galichon (2017). Vector quantile regression beyond correct specification. *Journal of Multivariate Analysis*.
* Chernozhukov, Galichon, Hallin and Henry (2017). Monge-Kantorovich Depth, Quantiles, Ranks and Signs. *Annals of Statistics*.
* Carlier, Chernozhukov, De Bie and Galichon (2021). Vector quantile regression and optimal transport, from theory to numerics. Forthcoming, *Empirical Economics*.

### Classical quantile regression and duality


* Recall<br>
$\begin{array}{rl}
\min_{P\geq0,N\geq0,\beta} &  \mathbb{E}\left[  \tau P+\left(  1-\tau\right)
N\right]  \\
s.t.~ &  P-N=Y-X^{\top}\beta~\left[  V\right]
\end{array}$<br>
where<br>
$P=\left(  Y-X^{\top}\beta\right)  ^{+}$ and $N=\left(  Y-X^{\top}
\beta\right)  ^{-}$,<br>


* Eliminate $N$ and rewrite<br>
$\begin{array}{rl}
\min_{P\geq0,\beta} &  \mathbb{E}\left[  P+\left(  1-\tau\right)  X^{\top}\beta \right]  \\
s.t.~ &  P + X^{\top}\beta\geq Y ~\left[  V\right]
\end{array}$<br>
which we call the *dual* problem.

* The corresponding primal problem is<br>
$\begin{array}{rl}
\max_{V\geq 0} & \mathbb{E}\left[  YV\right]  \\
s.t.~ &   V\leq1~\left[  P\geq0\right]  \\
&  \mathbb{E}\left[  VX\right]  =\left(  1-\tau\right)  \mathbb{E}\left[
X\right]  ~\left[  \beta\right]
\end{array}$<br>

### Complementary slackness

* Let $V\left(  \tau\right)  $ and $\beta\left(  \tau\right)  $ be
solutions to the above program. Complementary slackness yields<br>
$\left\{\begin{array}{rl}
Y-X^{\top}\beta\left(  \tau\right)   &  <0\implies V\left(  \tau\right)  =0\\
Y-X^{\top}\beta\left(  \tau\right)   &  >0\implies V\left(  \tau\right)  =1
\end{array}\right.$<br>
therefore<br>
$1\left\{  Y>X^{\top}\beta\left(  \tau\right)  \right\}  \leq V\left(
\tau\right)  \leq1\left\{  Y\geq X^{\top}\beta\left(  \tau\right)  \right\}.$

* Assume that $\left(  X,Y\right)  $ has a continuous distribution. Then for
any $\beta$, $\Pr\left(  Y-X^{\top}\beta=0\right)  =0$, and therefore one has
almost surely<br>
$V\left(  \tau\right)  =1\left\{  Y\geq X^{\top}\beta\left(  \tau\right)
\right\}.$
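In a finite sample some residuals are exactly zero, so only the two-sided bound holds. A self-contained sketch (SciPy's HiGHS backend and the simulated data are assumptions, not part of the original) checks that observations strictly above the fitted line get $V_i=1$ and those strictly below get $V_i=0$. Combined with the intercept row of the dual constraint, $\sum_i V_i=(1-\tau)n$, this implies that at most a share $1-\tau$ of observations lies strictly above the fit and at most a share $\tau$ strictly below.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, tau = 300, 0.25
x = rng.uniform(0, 5, size=n)
Y = 2.0 + x + (0.5 + 0.3 * x) * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
k = X.shape[1]

# Simplified primal LP in the variables (P, beta): P + X beta >= Y, P >= 0
c = np.concatenate([np.ones(n), (1 - tau) * X.sum(axis=0)])
res = linprog(c, A_ub=-np.hstack([np.eye(n), X]), b_ub=-Y,
              bounds=[(0, None)] * n + [(None, None)] * k, method="highs")
beta = res.x[n:]
V = -res.ineqlin.marginals          # multipliers of P_i + X_i' beta >= Y_i

r = Y - X @ beta                    # residuals at the optimum
tol = 1e-7
above, below = r > tol, r < -tol    # strictly above / below the fitted line
```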


### Quantile curve regression

* Consider now solving the problems above for all values of $\tau$ *all at once*. We can write:<br>
$\max_{V\left(  .\right) \geq 0 }\int_{0}^{1}\mathbb{E}\left[  YV\left(
\tau\right)  \right]  d\tau$<br>
s.t.<br>
$V\left(  \tau\right)  \leq1~\left[  P\left(  \tau\right)  \geq0\right] $<br>
$ \mathbb{E}\left[  V\left(  \tau\right)  X\right]  =\left(  1-\tau\right)
\mathbb{E}\left[  X\right]  ~\left[  \beta\left(  \tau\right)  \right]$<br>


* The problem has dual<br>
$\begin{array}{rl}
\min_{P\geq0,\beta}  &  \int_{0}^{1}\mathbb{E}\left[  P\left(
\tau\right)  +\left(  1-\tau\right)  X^{\top}\beta\left(  \tau\right)
\right]  d\tau\\
s.t.~  &  P\left(  \tau\right)  \geq Y-X^{\top}\beta\left(
\tau\right)  ~\left[  V\left(  \tau\right)  \right]
\end{array}$


* The solutions to these problems coincide with the solutions to the previous problems for each $\tau$: no constraint links different values of $\tau$, so the problem decouples across $\tau$.

* Sample version, with $I$ observations and a grid $\tau_{1}<\dots<\tau_{T}$:<br>
$\begin{array}{rl}
\max_{V_{ti}\geq 0}  &  \frac{1}{I}\sum_{t,i} V_{ti}Y_{i}\\
s.t.~  &  V_{ti}\leq 1\\
& \frac{1}{I}\left( VX\right) _{tk}=\left( 1-\tau _{t}\right) \bar{x}_{k}~\left[ \beta_{tk} \right]
\end{array}$

* In matrix terms, this is<br>
$\begin{array}{rl}
\max_{V \geq 0}  &  \frac{1}{I}1^\top_T V Y\\
s.t.~  &  V\leq 1\\
& \frac{1}{I} VX =\left( 1_T-\tau \right) \bar{x}^\top~\left[ \beta \right]
\end{array}$

* After vectorization $v=vec(V)$, where $vec$ stacks the rows of $V$ (so that $v_{(t-1)I+i}=V_{ti}$),<br>
$\begin{array}{rl}
\max_{v \geq 0}  &   \frac{1}{I}\left( 1_{T}\otimes Y\right) ^{\top }v\\
s.t.~  &  v\leq 1\\
& \frac{1}{I}\left( I_{T}\otimes X^{\top }\right) v=vec\left( \left( 1_T-\tau \right) \bar{x}^{\top }\right)
\end{array}$
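The vectorized form relies on the row-major convention, which is NumPy's default for `flatten`. A quick sketch on random matrices (sizes arbitrary) confirms that the Kronecker expressions reproduce the matrix objective and the matrix constraint.

```python
import numpy as np

rng = np.random.default_rng(4)
T, I, K = 5, 7, 3
V = rng.uniform(size=(T, I))
X = rng.normal(size=(I, K))
Y = rng.normal(size=I)
v = V.flatten()        # row-major vec: v[t * I + i] = V[t, i]

# objective: (1/I) 1_T' V Y  versus  (1/I) (1_T kron Y)' v
obj_matrix = np.ones(T) @ V @ Y / I
obj_vec = np.kron(np.ones(T), Y) @ v / I

# constraint: (1/I) V X  versus  (1/I) (I_T kron X') v, reshaped to T x K
lhs_matrix = V @ X / I
lhs_vec = (np.kron(np.eye(T), X.T) @ v / I).reshape(T, K)
```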






Code this as:

In [None]:
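Y_i_1 = food.reshape((-1,1))
nbt=21
τ_t_1 = np.linspace(0,1,nbt).reshape((-1,1))    # grid τ_1 = 0 < ... < τ_T = 1
# constraint matrix (1/I)(I_T ⊗ X'), acting on the row-major vectorization of V
A = spr.kron(spr.identity(nbt),X_i_k.T) / nbi
# objective coefficients (1/I)(1_T ⊗ Y)'
obj = np.kron(np.ones((nbt,1)),Y_i_1).T / nbi
xbar_1_k = X_i_k.mean(axis = 0).reshape((1,-1))
rhs = ((1-τ_t_1) * xbar_1_k).flatten()           # vec((1_T - τ) x̄')
qrs_lp=grb.Model()
qrs_lp.setParam( 'OutputFlag', False )
v = qrs_lp.addMVar(shape=nbi*nbt, name="v",lb=0,ub=1)
qrs_lp.setObjective(obj @ v , grb.GRB.MAXIMIZE)
qrs_lp.addConstr(A @ v == rhs)
qrs_lp.optimize()

# β(τ_t) are the multipliers of the moment constraints, one row per τ_t
βqrs_t_k = np.array(qrs_lp.getAttr('pi')).reshape((nbt,nbk))
βqrs_t_k[10,:]   # τ_11 = 0.5: compare with βhat above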

### Quantile curve regression under dual monotonicity constraint

* Koenker and Ng (2005) consider imposing a monotonicity constraint on
the estimated quantile curves. Thus, they impose a constraint on the dual,
namely:<br> $X^{\top}\beta\left(  \tau\right)  \geq X^{\top}\beta\left(
\tau^{\prime}\right)  $ for $\tau\geq\tau^{\prime}$,<br>
that is<br>
$\begin{array}{rl}
\min_{P\geq0,N\geq0,\beta}  &  \int_{0}^{1}\mathbb{E}\left[  P\left(
\tau\right)  +\left(  1-\tau\right)  X^{\top}\beta\left(  \tau\right)
\right]  d\tau\\
s.t.~  &  P\left(  \tau\right)  -N\left(  \tau\right)  =Y-X^{\top}\beta\left(
\tau\right)  ~\left[  V\left(  \tau\right)  \right] \\
&  X^{\top}\beta\left(  \tau\right)  \geq X^{\top}\beta\left(  \tau^{\prime
}\right)  ,~\tau\geq\tau^{\prime}
\end{array}$

* This is the most natural approach to solve the non-monotonicity problem. However, it does not lead to a simple duality.
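Whether estimated quantile curves cross can be checked directly from coefficients on a grid of $\tau$, for instance an array shaped like `βqrs_t_k` above. The helper `crossing_observations` and the coefficients below are hypothetical, chosen so that the curves $\tau+x(1-\tau)$ are increasing in $\tau$ for $x<1$, flat at $x=1$ and decreasing for $x>1$.

```python
import numpy as np

def crossing_observations(X_i_k, beta_t_k, tol=1e-12):
    """Flag observations i whose fitted quantiles X_i' beta(tau_t)
    decrease somewhere along the tau grid (rows of beta_t_k ordered by tau)."""
    fitted_i_t = X_i_k @ beta_t_k.T                # I x T fitted quantiles
    return np.any(np.diff(fitted_i_t, axis=1) < -tol, axis=1)

# Hypothetical coefficients beta(tau) = (tau, 1 - tau): the intercept increases
# with tau while the slope decreases, so the curves cross for x > 1.
taus = np.linspace(0.1, 0.9, 5)
beta_t_k = np.column_stack([taus, 1.0 - taus])
X_i_k = np.column_stack([np.ones(4), [0.0, 0.5, 1.0, 2.0]])
flags = crossing_observations(X_i_k, beta_t_k)
```

The small tolerance keeps floating-point noise at $x=1$, where the fitted quantile is constant in $\tau$, from being reported as a crossing.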

### Quantile curve regression under primal monotonicity constraint

* By contrast, [CCG]'s vector quantile regression approach imposes the constraint that the primal variable<br>
$\tau\rightarrow V\left(  \tau\right)  $<br> 
should be nonincreasing. This is justified by the fact that<br> 
$V\left(  \tau\right)  =1\left\{  Y\geq X^{\top}\beta\left(  \tau\right)  \right\}  $,<br>
so<br> 
$ X^{\top} \beta\left(  \tau\right) \text{ nondecreasing in } \tau \implies V\left(  \tau\right)  $ nonincreasing.

* Therefore, we consider the program
\begin{align*}
&  \max_{V\left(  \tau\right)  }\int_{0}^{1}\mathbb{E}\left[  YV\left(
\tau\right)  \right]  d\tau\\
s.t.~  &  V\left(  \tau\right)  \geq0~\left[  N\left(  \tau\right)
\geq0\right] \\
&  V\left(  \tau\right)  \leq1~\left[  P\left(  \tau\right)  \geq0\right] \\
&  \mathbb{E}\left[  V\left(  \tau\right)  X\right]  =\left(  1-\tau\right)
\mathbb{E}\left[  X\right]  ~\left[  \beta\left(  \tau\right)  \right] \\
&  V\left(  \tau\right)  \leq V\left(  \tau^{\prime}\right)  ,~\tau\geq
\tau^{\prime}
\end{align*}

### Primal monotonicity constraint, sample version

* Consider $\tau_{1}=0<...<\tau_{T}\leq1$ and let $\bar{x}$ be the
$1\times K$ row vector whose $k$-th entry is $\mathbb{E}\left[  X_{k}\right]
$.<br>One has
\begin{align*}
&  \max_{V_{ti}\geq0}\frac{1}{I}\sum_{\substack{1\leq i\leq I\\1\leq t\leq T}}V_{ti}Y_{i}\\
\text{s.t.}~&  V_{ti}\leq1\\
&  \frac{1}{I}\sum_{1\leq i\leq I}V_{ti}X_{ik}=\left(  1-\tau_{t}\right)
\bar{x}_{k}\\
&  V_{\left(  t+1\right)  i}\leq V_{ti}
\end{align*}


* As $\tau_{1}=0$, and assuming the first entry of $X$ is one, the constraint for $k=1$ at $t=1$ reads $\frac{1}{I}\sum_{i}V_{1i}=1$ with $V_{1i}\leq1$; hence necessarily $V_{1i}=1$, and the program becomes
\begin{align*}
&  \max_{V_{ti}}\frac{1}{I}\sum_{\substack{1\leq i\leq I\\1\leq t\leq T}}V_{ti}Y_{i}\\
\text{s.t.}~&  V_{1i}=1\\
&  \frac{1}{I}\sum_{1\leq i\leq I}V_{ti}X_{ik}=\left(  1-\tau_{t}\right)
\bar{x}_{k}\\
&  V_{1i}\geq V_{2i}\geq...\geq V_{\left(  T-1\right)  i}\geq V_{Ti}\geq0.
\end{align*}

### Matrix notations

* Let $\tau$ be the $T\times1$ column vector with entries $\tau_{t}$.

* Let $D$ be a $T\times T$ matrix defined as
$$
D=
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0\\
-1 & 1 & 0 & \ddots & \vdots & \vdots\\
0 & -1 & 1 & \ddots & 0 & 0\\
\vdots & \ddots & \ddots & \ddots & 0 & 0\\
\vdots &  & 0 & -1 & 1 & 0\\
0 &  & 0 & 0 & -1 & 1
\end{pmatrix}
$$
we have $V^{\top}D\geq0$ if and only if $$V_{1i}\geq V_{2i}\geq...\geq
V_{\left(  T-1\right)  i}\geq V_{Ti}\geq0.$$

* One can write (note that $V^{\top}D1_{T}$ is the first row of $V$)
\begin{align*}
&  \max_{V}\frac{1}{I}1_{T}^{\top}VY\\
\text{s.t.}~&  \frac{1}{I}VX=\left(  1_{T}-\tau\right)  \bar{x}\\
&  V^{\top}D1_{T}=1_{I}\\
&  V^{\top}D\geq0
\end{align*}



* Thus, setting<br> $\pi=D^{\top}V/I$, $U=D^{-1}1_{T}$, $\mu=D^{\top}\left(
1_{T}-\tau\right)  $, and $p=1_{I}/I$,<br>one has<br>
\begin{align*}
&  \max_{\pi}U^{\top}\pi Y\\
\text{s.t.}~&  \pi X=\mu\bar{x}\\
&  \pi^{\top}1_{T}=p\\
&  \pi\geq0
\end{align*}


* Assume that the first entry of $X$ is one, so that $\bar{x}_{1}=1$. Taking $k=1$ in $\pi X=\mu\bar{x}$ shows that if $\pi$ satisfies the constraints, then
$$\sum_{i=1}^{I}\pi_{ti}=\mu_{t}\text{ and }\sum_{t=1}^{T}\pi_{ti}=p_{i},$$
thus $\pi$ can be thought of as a joint probability on the grid points $t$ and the observations $i$, with marginals $\mu$ and $p$.

* One has
\begin{align*}
&  \max_{\pi\geq0}\sum_{\substack{1\leq t\leq T\\1\leq i\leq I}}\pi_{ti}
U_{t}Y_{i}\\
\text{s.t.}~&  \sum_{1\leq i\leq I}\pi_{ti}X_{ik}=\mu_{t}\bar{x}_{k}\\
&  \sum_{1\leq t\leq T}\pi_{ti}=p_{i}
\end{align*}
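The matrix identities of this section can be checked numerically. A small sketch with random data (the sizes and the particular monotone `V` are illustrative choices, and `has_monotone_columns` is a hypothetical helper): $D^{-1}$ is the lower-triangular matrix of ones, so $U=D^{-1}1_T=(1,2,\dots,T)^\top$; $V^\top D\geq0$ detects nonincreasing nonnegative columns; and $\pi=D^\top V/I$ maps the objective and constraints onto the transport-like form, with $\mu$ summing to one.

```python
import numpy as np

rng = np.random.default_rng(5)
T, I = 6, 8
tau = np.linspace(0.0, 1.0, T, endpoint=False)   # tau_1 = 0 < ... < tau_T < 1
D = np.eye(T) - np.eye(T, k=-1)                   # 1 on the diagonal, -1 just below

X = np.column_stack([np.ones(I), rng.normal(size=I)])   # first entry of X is one
Y = rng.normal(size=I)

# D^{-1} is lower triangular with ones, hence U = D^{-1} 1_T = (1, 2, ..., T)
U = np.linalg.solve(D, np.ones(T))

def has_monotone_columns(V):
    """True iff V^T D >= 0, i.e. V[0, i] >= V[1, i] >= ... >= V[T-1, i] >= 0."""
    return bool(np.all(V.T @ D >= 0))

# A V with first row equal to one and nonincreasing, nonnegative columns
tail = np.sort(rng.uniform(size=(T - 1, I)), axis=0)[::-1]
V = np.vstack([np.ones(I), tail])
V_bad = V.copy()
V_bad[3, 0] = V_bad[2, 0] + 0.1                  # breaks monotonicity in column 0

# change of variables
pi = D.T @ V / I
mu = D.T @ (np.ones(T) - tau)
p = np.ones(I) / I

obj_V = np.ones(T) @ V @ Y / I                   # (1/I) 1_T' V Y
obj_pi = U @ pi @ Y                              # U' pi Y
```

This `V` does not satisfy the moment constraint; the point is only that $\pi X=D^\top\left(\frac{1}{I}VX\right)$ holds for any $V$, so the constraint $\frac{1}{I}VX=(1_T-\tau)\bar{x}$ becomes $\pi X=\mu\bar{x}$.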

### Computation

* Recall the problem to compute
\begin{align*}
&  \max_{\pi\geq0}U^{\top}\pi Y\\
\text{s.t.}~&  \pi X=\mu\bar{x}\\
&  \pi^{\top}1_{T}=p
\end{align*}
which we will vectorize using $vec\left(  A\pi B\right)  =\left(  A \otimes B^{\top}\right)  vec\left(  \pi\right)  $, valid when $vec$ stacks the rows of its argument, so that the constraint becomes
$$
\begin{pmatrix}
 I_{T}\otimes X^{\top}\\
1_{T}^{\top} \otimes I_{I}
\end{pmatrix}
vec\left(  \pi\right)  =\binom{vec\left(  \mu\bar{x}\right)  }{p}
$$


* There are $IT$ primal variables and $KT+I$ constraints.

* Computation is done with a linear programming solver such as Gurobi. Large-scale linear programming solvers exploit the sparsity of the constraint matrix. However, when $Y$ is multivariate, the grid of values of $U$ must cover $\left[  0,1\right]  ^{d}$, so $T$ needs to be large.
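A sketch checking the stacked constraint on random data (sizes arbitrary): with row-major vectorization, the sparse matrix assembled with `scipy.sparse.kron` maps $vec(\pi)$ to $\left(vec(\pi X),\pi^\top 1_T\right)$, has $KT+I$ rows and $IT$ columns, and only $(K+1)IT$ nonzero entries, which is the sparsity that large-scale solvers exploit.

```python
import numpy as np
import scipy.sparse as spr

rng = np.random.default_rng(6)
T, I, K = 4, 5, 2
X = np.column_stack([np.ones(I), rng.normal(size=I)])
pi = rng.uniform(size=(T, I))

A1 = spr.kron(spr.identity(T), X.T)               # vec(pi) -> vec(pi X)
A2 = spr.kron(np.ones((1, T)), spr.identity(I))   # vec(pi) -> pi' 1_T
A = spr.vstack([A1, A2]).tocsr()
lhs = A @ pi.flatten()                            # row-major vec(pi)
density = A.nnz / (A.shape[0] * A.shape[1])
```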

### Recovering the $\beta$

* $\beta$ is the vector of Lagrange multipliers of the constraint $ \frac{1}{I}V X= \left( 1_{T}-\tau \right) \bar{x} $ in the former problem.

* Let $\psi$ be the $T\times K$ matrix of Lagrange multipliers of the constraint $\pi X=\mu \bar{x}$ in the latter problem.

* We have $\beta = D \psi$. Indeed, since $\pi X-\mu\bar{x}=D^{\top}\left( \frac{1}{I}V X-\left( 1_{T}-\tau \right) \bar{x}\right)$, the corresponding Lagrangian terms coincide:<br>
$\operatorname{tr}\left( \psi^{\top}\left( \pi X-\mu \bar{x}\right)\right) =\operatorname{tr}\left( \left( D\psi\right) ^{\top }\left( \frac{1}{I}V X-\left( 1_{T}-\tau \right) \bar{x}\right)\right)$<br>
so $D\psi$ is the multiplier of the constraint $\frac{1}{I}VX=\left( 1_{T}-\tau \right) \bar{x}$, that is, $\beta$.

* Compute in the following manner:

In [None]:
D_t_t = spr.diags([1, -1], [ 0, -1], shape=(nbt, nbt))     # 1 on the diagonal, -1 below it

U_t_1 = np.linalg.inv(D_t_t.toarray()) @ np.ones( (nbt,1))  # U = D^{-1} 1_T = (1, ..., T)'
μ_t_1 = D_t_t.T @ (np.ones((nbt,1)) - τ_t_1)                # μ = D'(1_T - τ)

A1 = spr.kron(spr.identity(nbt),X_i_k.T)                    # vec(π) -> vec(πX)
A2 = spr.kron(np.array(np.repeat(1,nbt)),spr.identity(nbi)) # vec(π) -> π'1_T
A = spr.vstack([A1, A2])
rhs = np.concatenate( [(μ_t_1 * xbar_1_k).flatten(), np.ones(nbi)/nbi]) 
obj = np.kron(U_t_1, Y_i_1).T                               # (U ⊗ Y)'
vqr_lp=grb.Model()
pi = vqr_lp.addMVar(shape=nbi*nbt, name="pi")               # π >= 0 (default lower bound)
vqr_lp.setParam( 'OutputFlag', False )
vqr_lp.setObjective( obj @ pi, grb.GRB.MAXIMIZE)
vqr_lp.addConstr(A @ pi == rhs)
vqr_lp.optimize()

# the first nbt*nbk duals are the multipliers ψ of πX = μx̄ (named ϕ here)
ϕ_t_k = np.array(vqr_lp.getAttr('pi'))[0:(nbt*nbk)].reshape((nbt,nbk))

βvqr_t_k = D_t_t.toarray() @ ϕ_t_k                          # β = Dψ

βvqr_t_k[10,:]
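The identity $\beta=D\psi$ used in this cell rests on $D$ and $D^{\top}$ being adjoint for the trace inner product. Since $D^{-1}$ is the lower-triangular matrix of ones, it also says that $\psi_t=\sum_{s\leq t}\beta_s$. A standalone check with random matrices (sizes arbitrary; `M` merely stands in for $\frac{1}{I}VX-(1_T-\tau)\bar{x}$):

```python
import numpy as np

rng = np.random.default_rng(8)
T, K = 7, 3
D = np.eye(T) - np.eye(T, k=-1)
psi = rng.normal(size=(T, K))      # multipliers of pi X = mu xbar
M = rng.normal(size=(T, K))        # stands for (1/I) V X - (1_T - tau) xbar

lhs = np.trace(psi.T @ (D.T @ M))  # tr(psi' (pi X - mu xbar))
rhs = np.trace((D @ psi).T @ M)    # tr(beta' ((1/I) V X - (1_T - tau) xbar))
beta = D @ psi                     # beta_t = psi_t - psi_{t-1}
```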

## Vector quantile regression in the continuous case

* This rewrites as a *Vector quantile regression* (for now still in the scalar case), introduced in [CCG16]
\begin{align*}
\max_{\pi}  &  \mathbb{E}_{\pi}\left[  UY\right] \\
s.t.  &  U\sim\mu\\
&  \left(  X,Y\right)  \sim P\\
&  \mathbb{E}\left[  X|U\right]  =\mathbb{E}\left[  X\right]
\end{align*}


* This is an extension of the Monge-Kantorovich optimal transport problem. As a matter of fact, when $X$ reduces to the constant $1$, this is *exactly* an optimal transport problem.

### Vector quantile regression, primal problem

* Recall our previous problem
\begin{align*}
\max_{\pi}  &  \mathbb{E}_{\pi}\left[  UY\right] \\
s.t.  &  U\sim\mu\\
&  \left(  X,Y\right)  \sim P\\
&  \mathbb{E}\left[  X|U\right]  =\mathbb{E}\left[  X\right]
\end{align*}
and replace mean-independence by independence; one has
\begin{align*}
\max_{\pi}  &  \mathbb{E}_{\pi}\left[  UY\right] \\
s.t.  &  U\sim\mu\\
&  \left(  X,Y\right)  \sim P\\
&  X {\perp\!\!\!\perp}U
\end{align*}


* The solution to the latter problem is simply<br>
$U=F_{Y|X}\left(  Y|X\right)$<br>
which yields the nonparametric conditional quantile representation<br>
$Y=F_{Y|X}^{-1}\left(  U|X\right).$
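A simulated illustration, with the hypothetical model $Y\mid X=x\sim\mathcal{U}([0,x])$ chosen because its conditional cdf is explicit: $U=F_{Y|X}(Y|X)=Y/X$ is uniform on $[0,1]$ and independent of $X$, and $Y=F^{-1}_{Y|X}(U|X)$. The checks below only test the mean and the correlation, as a sketch rather than a formal independence test.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
X = rng.uniform(1.0, 3.0, size=n)
Y = X * rng.uniform(size=n)        # Y | X = x ~ Uniform[0, x]

F_cond = lambda y, x: y / x        # conditional cdf F_{Y|X}(y|x)
F_cond_inv = lambda u, x: u * x    # conditional quantile F^{-1}_{Y|X}(u|x)

U = F_cond(Y, X)
```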

### Vector quantile regression, dual problem

* The dual problem yields
\begin{align*}
\min_{\psi,b}  &  \mathbb{E}_{P}\left[  \psi\left(  X,Y\right)  \right]
+\bar{x}^{\top}\mathbb{E}_{\mu}\left[  b\left(  U\right)  \right] \\
s.t.~  &  \psi\left(  x,y\right)  +x^{\top}b\left(  \tau\right)  \geq\tau
y,~\forall x,y,\tau
\end{align*}


* Optimality of $\left(  \psi,b\right)  $ yields
$$
\psi\left(  x,y\right)  =\sup_{\tau\in\left[  0,1\right]  }\left\{  \tau
y-x^{\top}b\left(  \tau\right)  \right\}
$$
and, for the optimal coupling, the supremum is attained at $\tau=U$; the first-order condition then yields, if $b$ is differentiable,
$$
Y=X^{\top}\beta\left(  U\right)
$$
where $(U,X,Y)$ are the solutions to the primal problem and $\beta\left(
\tau\right)  =b^{\prime}\left(  \tau\right)  $.
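To see the first-order condition at work, here is a sketch with a hypothetical smooth $b$ such that $\beta(\tau)=b'(\tau)=(\tau,\tau^{2})$, one regressor with $x=(1,2)$, and the supremum taken over a fine grid of $[0,1]$. Because $x^\top\beta(\tau)$ is increasing in $\tau$ here, $\tau\mapsto\tau y-x^\top b(\tau)$ is concave; if $y=x^\top\beta(\tau_0)$, the maximizer is $\tau_0$ and the dual constraint binds there.

```python
import numpy as np

# Hypothetical b with b'(tau) = beta(tau) = (tau, tau**2)
b = lambda tau: np.stack([tau ** 2 / 2, tau ** 3 / 3], axis=-1)
beta = lambda tau: np.stack([tau, tau ** 2], axis=-1)

tau_grid = np.linspace(0.0, 1.0, 10_001)

def psi_and_argmax(x, y):
    """psi(x, y) = sup_tau { tau y - x' b(tau) } over the grid, and its maximizer."""
    values = tau_grid * y - b(tau_grid) @ x
    j = int(np.argmax(values))
    return values[j], tau_grid[j]

x = np.array([1.0, 2.0])            # intercept and one regressor
tau0 = 0.37
y = x @ beta(tau0)                  # y = x' beta(tau0)
psi, tau_star = psi_and_argmax(x, y)
```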


### Multivariate case

* Vector quantile regression yields a natural way to extend classical
quantile regression to the case when the dependent variable is multivariate.
If $Y$ is valued in $\mathbb{R}^{d}$, one may take $U$ valued in $\left[  0,1\right]  ^{d}$, with
$\mu=\mathcal{U}\left(  \left[  0,1\right]  ^{d}\right)  $. We replace the
product $\tau y$ by the scalar product $\tau^{\top}y$, and
the analysis goes through unmodified.

* We get a nice tensorization of vector quantile regression that way: when
the components of $Y$ are independent, the previous approach amounts to running
the scalar version component by component.