# Lecture 4

The application uses data on a population of heterosexual men and women, which include education, height, BMI, health, the "big 5" personality traits, and a measure of risk aversion. These data are stored in the files `Xvals.csv` (men) and `Yvals.csv` (women) and cover 1,158 men and women. (The data come from married households, which is why there are as many men as women.)

In [1]:
require("Matrix")
require("gurobi")
thePath = getwd()
nbcar = 10  # number of characteristics

data_X = as.matrix(read.csv(paste0(thePath, "/Xvals.csv"), sep = ",", header = TRUE))  # loads the data
Xvals = matrix(as.numeric(data_X[, 1:nbcar]), ncol = nbcar)

data_Y = as.matrix(read.csv(paste0(thePath, "/Yvals.csv"), sep = ",", header = TRUE))  # loads the data
Yvals = matrix(as.numeric(data_Y[, 1:nbcar]), ncol = nbcar)

Loading required package: Matrix
Loading required package: gurobi
Loading required package: slam


In [2]:
head(data_X)

educm,heightm,BMIm,healthm,consm,extram,agreem,emom,autom,riskym
2,186,28.90508,3,-0.752877,-0.360787,-0.711276,-0.291031,0.840217,0.479437
2,176,27.4406,3,0.345542,-0.805524,-0.251796,-0.305475,-0.064454,0.030303
3,187,23.16337,3,-0.759678,0.898007,-0.029462,-0.672859,-0.961691,-0.556598
1,184,29.24149,2,-0.455688,-1.053375,-0.041612,0.436133,0.121873,0.992084
1,174,23.78121,4,-1.440239,1.16373,0.29375,-0.538922,0.782285,-1.401034
1,186,21.96786,3,-1.008298,-0.484221,1.155301,0.267899,0.927354,0.011056


In [3]:
head(data_Y)

educv,heightv,BMIv,healthv,consv,extrav,agreev,emov,autov,riskyv
2,159,22.94213,4,-0.352262,0.065096,-0.713136,-0.529817,-0.06674,0.271632
2,165,22.03857,3,-0.741707,-0.484221,0.219906,0.706937,0.685428,0.353834
3,170,20.76125,3,0.327571,-0.180299,1.05207,0.999001,0.177472,-0.201117
2,160,22.65625,4,-0.18764,-1.299261,1.223071,0.154011,2.284336,-0.17207
2,165,22.77319,3,0.078951,0.613921,0.122749,0.073875,-0.253068,0.042352
1,168,19.13265,3,-1.429069,0.472616,1.240802,0.695631,1.163253,-1.445496


We postulate that the form of the surplus function is
\begin{align*}
\Phi_{ij}=x_{i}^{\intercal} Ay_{j}
\end{align*}
where $x_{i}$ and $y_{j}$ are the 10-dimensional characteristics of man $i$ and woman $j$, and $A$, a $10\times10$ matrix, is taken as given (it is stored in the file `affinitymatrix.csv`). Again, we'll see later how to solve the econometrics problem of estimating $A$.

In [4]:
data_aff = as.matrix(read.csv(paste0(thePath, "/affinitymatrix.csv"), sep = ",", 
    header = TRUE))  # loads the data
A = matrix(as.numeric(data_aff[1:nbcar, 2:(nbcar + 1)]), nbcar, nbcar)

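# standardize each characteristic to mean 0 and standard deviation 1
sdX = apply(Xvals, 2, sd)
sdY = apply(Yvals, 2, sd)
mX = apply(Xvals, 2, mean)
mY = apply(Yvals, 2, mean)
# t(Xvals) has one row per characteristic, so mX and sdX are recycled row by row
Xvals = t((t(Xvals) - mX)/sdX)
Yvals = t((t(Yvals) - mY)/sdY)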
nobs = dim(Xvals)[1]

Phi = Xvals %*% A %*% t(Yvals)
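
As a sanity check on the cell above, here is a minimal sketch on toy data. All names ending in `_toy` and the helper `standardize` are hypothetical and not part of the lecture's data. It checks that the transposition trick used for standardization agrees with base R's `scale()`, and that an entry of the matrix product coincides with $x_{i}^{\intercal}Ay_{j}$.

```r
# Toy data (hypothetical): 4 men, 3 women, 2 characteristics each
set.seed(1)
X_toy = matrix(rnorm(8), nrow = 4, ncol = 2)
Y_toy = matrix(rnorm(6), nrow = 3, ncol = 2)
A_toy = matrix(c(0.5, 0.1, -0.2, 0.3), 2, 2)  # hypothetical affinity matrix

# Column-wise standardization, written as in the lecture's cell
standardize = function(Z) t((t(Z) - apply(Z, 2, mean))/apply(Z, 2, sd))
X_std = standardize(X_toy)
Y_std = standardize(Y_toy)

# Should agree with base R's scale(), up to attributes
same_as_scale = isTRUE(all.equal(X_std, scale(X_toy), check.attributes = FALSE))

# Surplus matrix: one row per man, one column per woman
Phi_toy = X_std %*% A_toy %*% t(Y_std)

# Entry (2, 3) computed directly from the definition x_i' A y_j
phi_23 = as.numeric(t(X_std[2, ]) %*% A_toy %*% Y_std[3, ])
print(c(Phi_toy[2, 3], phi_23))
```

The two printed numbers should coincide, which is what justifies computing all the surpluses at once with a single matrix product.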

In [5]:
head(data_aff)

X,educw,heightw,BMIw,healthw,consw,extraw,agreew,emow,autow,riskyw
educm,0.56,0.02,-0.08,0.02,-0.04,-0.01,-0.03,-0.04,0.05,-0.02
heightm,0.01,0.18,0.04,-0.01,-0.04,0.05,0.02,0.02,0.02,0.02
BMIm,-0.05,0.05,0.21,0.01,0.06,0.0,-0.04,0.04,-0.01,0.01
healthm,-0.07,0.0,-0.06,0.14,-0.04,0.05,-0.04,0.04,0.02,0.0
consm,-0.06,-0.03,0.07,0.0,0.14,0.07,0.04,0.06,-0.02,-0.01
extram,0.01,-0.02,0.05,0.02,-0.06,0.02,-0.02,-0.01,-0.03,-0.05


Computing a solution to the Optimal Assignment Problem, more specifically the triple $\left(\pi,u,v\right)$, is one of the most studied problems in Computer Science. Dozens, if not hundreds, of algorithms exist whose running time is polynomial in $\max\left(n,m\right)$, typically growing no faster than the cube of the latter.

Famous algorithms include the Hungarian algorithm (Kuhn-Munkres), Bertsekas' auction algorithm, and Goldberg and Kennedy's pseudoflow algorithm. For more on these, see the book by Burkard, Dell'Amico, and Martello, and an introductory presentation at http://www.assignmentproblems.com/doc/LSAPIntroduction.pdf.

Here, we will show how to solve the problem with the help of a Linear Programming solver used as a black box. The challenge is to set up the constraint matrix carefully as a sparse matrix, so that a large-scale Linear Programming solver such as Gurobi can recognize and exploit the sparsity of the problem.

Let $\Pi$ and $\Phi$ be the matrices with typical elements $\left(
\pi_{xy}\right)  $ and $\left(  \Phi_{xy}\right)  $. We let $p$, $q$, $u$,
$v$, and $1$ denote the column vectors with entries $\left(  p_{x}\right)  $,
$\left(  q_{y}\right)  $, $\left(  u_{x}\right)  $, $\left(  v_{y}\right)  $,
and $1$, respectively. The optimal assignment problem
\begin{align}
\max_{\pi\geq0}  &  \sum_{xy}\pi_{xy}\Phi_{xy}\label{OAP}\\
s.t.~  &  \sum_{y\in\mathcal{Y}}\pi_{xy}=p_{x}~\left[  u_{x}\right]
\nonumber\\
&  \sum_{x\in\mathcal{X}}\pi_{xy}=q_{y}~\left[  v_{y}\right] \nonumber
\end{align}

can be rewritten using matrix algebra as
\begin{align}
&  \max_{\Pi\geq0}Tr\left(  \Pi^{\prime}\Phi\right) \label{matrixLP}\\
&  \Pi1_{M}=p\nonumber\\
&  1_{N}^{\prime}\Pi=q^{\prime}.\nonumber
\end{align}

We need to convert matrices into vectors; this can be done, for instance,
by stacking the columns of $\Pi$ into a single column vector (the convention
in R and Matlab). This operation is called *vectorization*, and we denote it
\begin{align*}
vec\left(  A\right)  ,
\end{align*}
which reshapes an $N\times M$ matrix into an $NM\times1$ vector. In `R`, this is
implemented by `c(A)`; in Matlab, by `A(:)`.

The objective function rewrites as
\begin{align*}
vec\left(  \Phi\right)  ^{\prime}vec\left(  \Pi\right)  .
\end{align*}
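
To make the equivalence of the three ways of writing the objective concrete, here is a small sketch on a toy example. The matrices `Pi_toy` and `Phi_toy` are hypothetical and chosen only for illustration. It checks that `c()` stacks columns and that the entrywise sum, the trace form, and the vectorized form agree.

```r
# Hypothetical 2 x 3 matching and surplus matrices
Pi_toy = matrix(1:6, nrow = 2, ncol = 3)/21  # entries sum to 1
Phi_toy = matrix(c(1, 0, 2, -1, 0.5, 3), nrow = 2, ncol = 3)

# c() stacks the columns: first column, then second, then third
vec_Pi = c(Pi_toy)
print(vec_Pi * 21)

# Three equivalent expressions of the objective
obj_sum = sum(Pi_toy * Phi_toy)  # sum over (x, y) of pi_xy * Phi_xy
obj_trace = sum(diag(t(Pi_toy) %*% Phi_toy))  # Tr(Pi' Phi)
obj_vec = sum(c(Phi_toy) * c(Pi_toy))  # vec(Phi)' vec(Pi)
print(c(obj_sum, obj_trace, obj_vec))
```

The vectorized form is the one an LP solver needs, since it expects the objective as a single coefficient vector.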

In [6]:
obj = c(Phi)

Recall that if $A$ is an $M\times p$ matrix and $B$ an $N\times q$ matrix,
then the Kronecker product $A\otimes B$ of $A$ and $B$ is an $MN\times pq$
matrix such that, for any $q\times p$ matrix $X$,
\begin{equation}
vec\left(  BXA^{\prime}\right)  =\left(  A\otimes B\right)  vec\left(
X\right)  . \label{VecAndKronecker}%
\end{equation}
In R, $A\otimes B$ is implemented by `kronecker(A,B)`; in Matlab, by `kron(A,B)`.
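
The identity above can be checked numerically. The following is a minimal sketch on random matrices; the names `A_k`, `B_k`, `X_k` and the chosen dimensions are hypothetical and serve only this illustration.

```r
set.seed(42)
M_k = 3; p_k = 2; N_k = 4; q_k = 5  # hypothetical dimensions

A_k = matrix(rnorm(M_k * p_k), M_k, p_k)  # M x p
B_k = matrix(rnorm(N_k * q_k), N_k, q_k)  # N x q
X_k = matrix(rnorm(q_k * p_k), q_k, p_k)  # q x p, so that B X A' is N x M

K = kronecker(A_k, B_k)
print(dim(K))  # MN x pq

lhs = c(B_k %*% X_k %*% t(A_k))  # vec(B X A')
rhs = c(K %*% c(X_k))  # (A kron B) vec(X)
print(max(abs(lhs - rhs)))  # zero up to floating-point error
```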

The first constraint, $I_{N}\Pi1_{M}=p$, therefore vectorizes as
\begin{align*}
\left(  1_{M}^{\prime}\otimes I_{N}\right)  vec\left(  \Pi\right)  =vec\left(
p\right)  ,
\end{align*}
and similarly, the second constraint, $1_{N}^{\prime}\Pi I_{M}=q^{\prime}$,
vectorizes as
\begin{align*}
\left(  I_{M}\otimes1_{N}^{\prime}\right)  vec\left(  \Pi\right)  =vec\left(
q^{\prime}\right)  .
\end{align*}

Note that the matrix $1_{M}^{\prime}\otimes I_{N}$ is of size $N\times
NM$, and the matrix $I_{M}\otimes1_{N}^{\prime}$ is of size $M\times NM$;
hence the full matrix involved in the left-hand side of the constraints is of
size $\left(  N+M\right)  \times NM$. In spite of its large size, this matrix
is *sparse*: each of its columns contains exactly two nonzero entries. In `R`,
the identity matrix $I_{N}$ can be coded as `sparseMatrix(1:N,1:N)`; in Matlab,
as `speye(N)`.
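
To see what the two Kronecker blocks do, here is a sketch on a small example. It uses dense base-R matrices (`diag`) purely for readability, whereas the lecture's code uses sparse matrices; all names ending in `_toy` are hypothetical. It checks that the first block returns the row sums of $\Pi$, the second its column sums, and counts the nonzero entries of the stacked matrix.

```r
set.seed(7)
N_toy = 3; M_toy = 4
Pi_toy = matrix(runif(N_toy * M_toy), N_toy, M_toy)  # arbitrary N x M matrix

# Dense versions of the two constraint blocks (illustration only)
A1_toy = kronecker(matrix(1, 1, M_toy), diag(N_toy))  # N x NM
A2_toy = kronecker(diag(M_toy), matrix(1, 1, N_toy))  # M x NM
Acons_toy = rbind(A1_toy, A2_toy)  # (N + M) x NM

row_sums = c(A1_toy %*% c(Pi_toy))  # should equal rowSums(Pi_toy)
col_sums = c(A2_toy %*% c(Pi_toy))  # should equal colSums(Pi_toy)

print(dim(Acons_toy))
nnz = sum(Acons_toy != 0)  # number of nonzero entries
print(c(nnz, length(Acons_toy)))
```

Each column of the stacked matrix corresponds to one pair $(x,y)$ and has a one in row $x$ and a one in row $N+y$, so only $2NM$ of its $(N+M)NM$ entries are nonzero. With $N=M=1158$ a dense representation would be impractical, which is why the sparse encoding matters.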

In [7]:
N = dim(Phi)[1]
M = dim(Phi)[2]

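# sparse constraint blocks: A1 yields the row sums of Pi, A2 its column sums
A1 = kronecker(matrix(1, 1, M), sparseMatrix(1:N, 1:N))
A2 = kronecker(sparseMatrix(1:M, 1:M), matrix(1, 1, N))
# note: this overwrites the affinity matrix A loaded earlier (Phi is already computed)
A = rbind2(A1, A2)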

In [8]:
p = rep(1/nobs, nobs)
q = rep(1/nobs, nobs)
d = c(p, q)

Setting $z=vec\left(  \Pi\right)  $, the Linear Programming problem then
becomes
\begin{align}
&  \max_{z\geq0}vec\left(  \Phi\right)  ^{\prime}z\label{LPvectorized}\\
s.t.~  &  \left(  1_{M}^{\prime}\otimes I_{N}\right)  z=vec\left(  p\right)
\nonumber\\
&  \left(  I_{M}\otimes1_{N}^{\prime}\right)  z=vec\left(  q^{\prime}\right)
\nonumber
\end{align}
which is ready to be passed on to a linear programming solver such as Gurobi.

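An LP solver typically solves programs of the form
\begin{align}
&  \max_{z\geq0}c^{\prime}z\label{standardLP}\\
&  s.t.~Az=d.\nonumber
\end{align}
Here $c=vec\left(  \Phi\right)  $, $A$ is the $\left(  N+M\right)  \times NM$ matrix obtained by stacking $1_{M}^{\prime}\otimes I_{N}$ on top of $I_{M}\otimes1_{N}^{\prime}$, and $d$ stacks $p$ on top of $q$.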

In [9]:
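result = gurobi(list(A = A, obj = obj, modelsense = "max", rhs = d, sense = "="), 
    params = list(OutputFlag = 0))
if (result$status == "OPTIMAL") {
    # optimal matching, reshaped column by column (note: shadows R's constant pi)
    pi = matrix(result$x, nrow = N)
    u = result$pi[1:N]  # multipliers of the N row-sum constraints
    v = result$pi[(N + 1):(N + M)]  # multipliers of the M column-sum constraints
    val = result$objval
} else {
    stop(paste0("Gurobi did not return an optimal solution; status: ", result$status))
}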

In [10]:
print(paste0("Value of the problem (Gurobi) = ", val))
print(u[1:10])
print(v[1:10])

[1] "Value of the problem (Gurobi) = 1.70388302245657"
 [1] 1.922803 1.286440 2.351599 3.030279 3.741377 2.725222 1.252313 1.988384
 [9] 1.445145 1.525087
 [1] -0.7409078 -0.9616074  0.6039734 -0.2880301 -1.1140921 -0.1014630
 [7] -0.6873566  0.5351975 -0.5838891 -0.1083689
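
The multipliers $u$ and $v$ returned by the solver can be checked against standard linear programming duality, which is taken as given here rather than derived in this section: at an optimum one expects $u_{x}+v_{y}\geq\Phi_{xy}$ for all pairs, with equality on matched pairs, and $\sum_{x}p_{x}u_{x}+\sum_{y}q_{y}v_{y}$ equal to the value of the problem. Below is a minimal sketch of such a check on a hand-solved $2\times2$ example; all names ending in `_toy` are hypothetical, and the toy duals were computed by hand rather than by Gurobi.

```r
# Hypothetical 2 x 2 problem, solved by hand
Phi_toy = matrix(c(3, 1, 1, 2), 2, 2)
p_toy = c(0.5, 0.5)
q_toy = c(0.5, 0.5)

pi_toy = diag(0.5, 2)  # man 1 matched with woman 1, man 2 with woman 2
u_toy = c(2, 1)  # candidate multipliers, chosen by hand
v_toy = c(1, 1)

primal_val = sum(pi_toy * Phi_toy)
dual_val = sum(p_toy * u_toy) + sum(q_toy * v_toy)

# slack_xy = u_x + v_y - Phi_xy, which should be nonnegative everywhere
slack = outer(u_toy, v_toy, "+") - Phi_toy
dual_feasible = all(slack >= -1e-09)

# complementary slackness: zero slack wherever pi_xy > 0
comp_slack = sum(pi_toy * slack)

print(c(primal_val, dual_val, comp_slack))
```

The same three checks can be run on the `pi`, `u`, `v`, and `Phi` computed above, replacing the toy objects with the actual ones.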
