# <center>Block 9: Optimal transport with entropic regularization</center>
### <center>Alfred Galichon (NYU)</center>
## <center>`math+econ+code' masterclass on matching models, optimal transport and applications</center>
<center>© 2018-2019 by Alfred Galichon. Support from NSF grant DMS-1716489 is acknowledged. James Nesbit contributed.</center>

### Learning objectives

* Entropic regularization

* The log-sum-exp trick

* The Iterated Proportional Fitting Procedure (IPFP)

### References

* [OTME], Ch. 7.3

* Peyré, Cuturi, Computational Optimal Transport, Ch. 4.

### Entropic regularization of the optimal transport problem

Consider the problem

\begin{align*}
\max_{\pi\in\mathcal{M}\left(  p,q\right)  }\sum_{ij}\pi_{ij}\Phi_{ij}-\sigma\sum_{ij}\pi_{ij}\ln\pi_{ij}
\end{align*}

where $\sigma>0$. The problem coincides with the optimal assignment problem when $\sigma=0$. When $\sigma\rightarrow+\infty$, the solution to this problem approaches the independent coupling, $\pi_{ij}=p_{i}q_{j}$.

Later on, we will provide microfoundations for this problem, and connect it with a number of important methods in economics (BLP, gravity model, Choo-Siow...). For now, let's just view this as an extension of the optimal transport problem.

### Dual of the regularized problem

Let's compute the dual by the minimax approach. We have

\begin{align*}
\max_{\pi\geq0}\min_{u,v}\sum_{ij}\pi_{ij}\left(  \Phi_{ij}-u_{i}-v_{j}%
-\sigma\ln\pi_{ij}\right)  +\sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}%
\end{align*}

thus

\begin{align*}
\min_{u,v}\sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}+\max_{\pi\geq0}\sum_{ij}%
\pi_{ij}\left(  \Phi_{ij}-u_{i}-v_{j}-\sigma\ln\pi_{ij}\right)
\end{align*}

By the first-order condition in the inner problem, one has $\Phi_{ij}-u_{i}-v_{j}-\sigma\ln \pi_{ij}-\sigma=0$, thus

\begin{align*}
\pi_{ij}=\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}-\sigma}{\sigma}\right)
\end{align*}

and $\pi_{ij}\left(  \Phi_{ij}-u_{i}-v_{j}-\sigma\ln\pi_{ij}\right) =\sigma\pi_{ij}$, thus the dual problem is

\begin{align*}
\min_{u,v}\sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}+\sigma\sum_{ij}\exp\left(
\frac{\Phi_{ij}-u_{i}-v_{j}-\sigma}{\sigma}\right)  .
\end{align*}

After relabeling $v_{j}+\sigma$ as $v_{j}$ and using $\sum_{j}q_{j}=1$, the dual is

\begin{align*}
\min_{u,v}\sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}+\sigma\sum_{ij}\exp\left(
\frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)  -\sigma. \tag{V1}
\end{align*}

### Another expression of the dual

**Claim:** the problem is equivalent to

<a name='V2'></a>
\begin{align*}
\min_{u,v}\sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}+\sigma\log\sum_{i,j}
\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)  \tag{V2}
\end{align*}

Indeed, let us go back to the minimax expression

\begin{align*}
\min_{u,v}\sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}+\max_{\pi\geq0}\sum_{ij}\pi_{ij}\left(  \Phi_{ij}-u_{i}-v_{j}-\sigma\ln\pi_{ij}\right)
\end{align*}

At the optimum, $\pi$ has marginals $p$ and $q$, so it automatically satisfies $\sum_{ij}\pi_{ij}=1$; thus we can add this constraint to the inner problem:

\begin{align*}
\min_{u,v}\sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}+\max_{\pi\geq0:\sum_{ij}\pi_{ij}=1}\sum_{ij}\pi_{ij}\left(  \Phi_{ij}-u_{i}-v_{j}-\sigma\ln\pi_{ij}\right)
\end{align*}

which yields [our desired result](#V2).
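For completeness, here is a sketch of the computation behind this last step. The shorthand $a_{ij}=\Phi_{ij}-u_{i}-v_{j}$ and the multiplier $\lambda$ are notation introduced only for this sketch. Maximizing $\sum_{ij}\pi_{ij}\left(a_{ij}-\sigma\ln\pi_{ij}\right)$ subject to $\sum_{ij}\pi_{ij}=1$ gives

\begin{align*}
a_{ij}-\sigma\ln\pi_{ij}-\sigma-\lambda=0
\quad\Rightarrow\quad
\pi_{ij}=\frac{\exp\left(a_{ij}/\sigma\right)}{\sum_{kl}\exp\left(a_{kl}/\sigma\right)},
\end{align*}

and substituting $\sigma\ln\pi_{ij}=a_{ij}-\sigma\log\sum_{kl}\exp\left(a_{kl}/\sigma\right)$ back into the objective yields

\begin{align*}
\sum_{ij}\pi_{ij}\left(a_{ij}-\sigma\ln\pi_{ij}\right)=\sigma\log\sum_{kl}\exp\left(\frac{a_{kl}}{\sigma}\right).
\end{align*}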

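[This expression](#V2) is interesting because, taking *any* $\hat{\pi}\in
\mathcal{M}\left(  p,q\right)$, it can be re-expressed (up to the positive factor $1/\sigma$ and an additive constant that does not depend on $u$ and $v$) as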

\begin{align*}
\max_{u,v}\sum_{ij}\hat{\pi}_{ij}\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)  -\log\sum_{ij}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)
\end{align*}

Consider the model in which the parameter is $\theta=\left(  u,v\right)$, the observations are
$ij$ pairs, and the likelihood of $ij$ is

\begin{align*}
\pi_{ij}^{\theta}=\frac{\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma
}\right)  }{\sum_{ij}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)
}
\end{align*}

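Since $\sum_{ij}\hat{\pi}_{ij}=1$, the objective above is the expected log-likelihood $\sum_{ij}\hat{\pi}_{ij}\log\pi_{ij}^{\theta}$; hence [our expression](#V2) coincides with maximum likelihood estimation in this model.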

### A third expression of the dual problem

Consider

<a name='V3'></a>
\begin{align*}
\min_{u,v}  &  \sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j} \\
s.t. \quad &  \sum_{i,j}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)
=1
\end{align*}

It is easy to see that the solutions of this problem coincide with [version 2](#V2). Indeed, the Lagrange multiplier is forced to be one. In other words,

\begin{align*}
\min_{u,v}  &  \sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}\\
s.t. \quad &  \sigma\log\sum_{i,j}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma
}\right)  =0
\end{align*}

### Small-temperature limit and the log-sum-exp trick

Recall that when $\sigma\rightarrow0$, one has

\begin{align*}
\sigma\log\left(  e^{a/\sigma}+e^{b/\sigma}\right)  \rightarrow\max\left(
a,b\right)
\end{align*}

Indeed, letting $m=\max\left(  a,b\right)$,

<a name='lse'></a>
\begin{align*}
\sigma\log\left(  e^{a/\sigma}+e^{b/\sigma}\right)  =m+\sigma\log\left(\exp\left(  \frac{a-m}{\sigma}\right)  +\exp\left(  \frac{b-m}{\sigma}\right)\right),
\end{align*}
and the argument of the logarithm lies between $1$ and $2$, so the second term lies between $0$ and $\sigma\log2$ and vanishes as $\sigma\rightarrow0$.

This simple remark is actually a useful numerical recipe called the *log-sum-exp trick*: when $\sigma$ is small, using [the formula above](#lse) to compute $\sigma\log\left(  e^{a/\sigma}+e^{b/\sigma}\right)$ ensures the exponentials won't blow up.
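The recipe can be sketched in a few lines of R. This is a minimal illustration, not part of the original notebook: the helper name `lse` and the numbers `x` and `sigma_small` are assumptions chosen only to show the overflow.

```r
# Stabilized sigma * log(sum(exp(x / sigma))); `lse` is a hypothetical helper name.
lse <- function(x, sigma) {
  m <- max(x)                                # factor out the largest term
  m + sigma * log(sum(exp((x - m) / sigma))) # all exponents are now <= 0
}

x <- c(1.0, 0.5)      # illustrative values of a and b
sigma_small <- 0.001

# Naive evaluation: exp(1 / 0.001) = exp(1000) overflows to Inf in double precision.
naive <- sigma_small * log(sum(exp(x / sigma_small)))

# Stabilized evaluation stays finite and is close to max(x).
stable <- lse(x, sigma_small)

print(naive)
print(stable)
```

The same function works for any number of terms, since only the maximum is subtracted before exponentiating.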

Back to the third expression, with $\sigma\rightarrow0$, one has

\begin{align*}
\min_{u,v}  &  \sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}\tag{V3}\\
s.t.  &  \max_{ij}\left(  \Phi_{ij}-u_{i}-v_{j}\right)  =0\nonumber
\end{align*}

This is equivalent to the classical Monge-Kantorovich dual expression

\begin{align*}
\min_{u,v}  &  \sum_{i}u_{i}p_{i}+\sum_{j}v_{j}q_{j}\tag{V3}\\
s.t.  &  \Phi_{ij}-u_{i}-v_{j}\leq0\nonumber
\end{align*}

### Computation

We can compute $\min F\left(  x\right)$ by two methods:

Either by gradient descent: $x_{t+1}=x_{t}-\epsilon_{t}\nabla F\left(  x_{t}\right)$. (Steepest descent has $\epsilon_{t}=1/\left\vert \nabla F\left(  x_{t}\right)  \right\vert$.)

Or by coordinate descent: $x_{i}\left(  t+1\right)  =\arg\min_{x_{i}}F\left(  x_{i},x_{-i}\left(  t\right)  \right)$.

Why do these methods converge? Let's provide some justification. We will move from $x_{t}$ to $x_{t}-\epsilon d_{t}$, where $d_{t}$ is normalized by $\left\vert d_{t}\right\vert _{p}:=\left(  \sum_{i=1}^{n}\left\vert d_{t}^{i}\right\vert ^{p}\right) ^{1/p}=1$. At first order, we have

\begin{align*}
F\left(  x_{t}-\epsilon d_{t}\right)  =F\left(  x_{t}\right)  -\epsilon d_{t}^{\intercal}\nabla F\left(  x_{t}\right)  +O\left(  \epsilon^{2}\right).
\end{align*}

We need to maximize $d_{t}^{\intercal}\nabla F\left(  x_{t}\right)$ over $\left\vert d_{t}\right\vert _{p}=1$.

* For $p=2$, we get $d_{t}=\nabla F\left(  x_{t}\right)  /\left\vert \nabla F\left(  x_{t}\right)  \right\vert $, which is the gradient descent direction.

* For $p=1$, we get $d_{t}^{i}=\operatorname{sign}\left(  \partial F\left(  x_{t}\right)/\partial x^{i}\right)  $ if $\left\vert \partial F\left(  x_{t}\right) /\partial x^{i}\right\vert =\max_{j}\left\vert \partial F\left(  x_{t}\right) /\partial x^{j}\right\vert $, and $d_{t}^{i}=0$ otherwise, so only a coordinate with the largest partial derivative moves, in the spirit of coordinate descent.
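A small numerical illustration of the two cases, as a sketch only: the gradient vector `g` below is made up for the example and stands in for $\nabla F\left(  x_{t}\right)$.

```r
# Hypothetical gradient at the current point (illustrative numbers only).
g <- c(3, -4, 1)

# p = 2: the maximizer of d'g over the Euclidean unit sphere is g / |g|_2.
d2 <- g / sqrt(sum(g^2))

# p = 1: the maximizer over the l1 unit sphere puts all weight on the
# coordinate with the largest absolute partial derivative.
i_star <- which.max(abs(g))
d1 <- numeric(length(g))
d1[i_star] <- sign(g[i_star])

print(sum(d2 * g))  # first-order decrease per unit step, equals |g|_2
print(sum(d1 * g))  # first-order decrease per unit step, equals max_i |g_i|
```

Both directions have unit norm in their respective sense, so the two printed values are the best achievable first-order decreases under each normalization.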

In our context, gradient descent is

\begin{align*}
u_{i}\left(  t+1\right)    & =u_{i}\left(  t\right)  -\epsilon\frac{\partial
F}{\partial u_{i}}\left(  u\left(  t\right)  ,v\left(  t\right)  \right)
,\text{ and }\\
v_{j}\left(  t+1\right)    & =v_{j}\left(  t\right)  -\epsilon\frac{\partial
F}{\partial v_{j}}\left(  u\left(  t\right)  ,v\left(  t\right)  \right)
\end{align*}

while coordinate descent is

\begin{align*}
\frac{\partial F}{\partial u_{i}}\left(  u_{i}\left(  t+1\right)
,u_{-i}\left(  t\right)  ,v\left(  t\right)  \right)  =0,\text{ and }
\frac{\partial F}{\partial v_{j}}\left(  u\left(  t\right)  ,v_{j}\left(
t+1\right)  ,v_{-j}\left(  t\right)  \right)  =0.
\end{align*}

### Gradient descent

Gradient of objective function in version 1 of our problem:

\begin{align*}
\left(  p_{i}-\sum_{j}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)
,q_{j}-\sum_{i}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)
\right)
\end{align*}

Gradient of objective function in version 2

\begin{align*}
\left(  p_{i}-\frac{\sum_{j}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma
}\right)  }{\sum_{ij}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)
},q_{j}-\frac{\sum_{i}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)
}{\sum_{ij}\exp\left(  \frac{\Phi_{ij}-u_{i}-v_{j}}{\sigma}\right)  }\right)
\end{align*}
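As an illustration, gradient descent on version 1 can be sketched as follows. This is not the notebook's code: the $3\times3$ matrix `Phi`, the step size `eps`, the starting point and the iteration count are all assumptions picked so that a constant step is stable on this toy problem.

```r
# Toy problem (illustrative numbers only).
Phi <- matrix(c(0.9, 0.1, 0.4,
                0.2, 0.8, 0.3,
                0.5, 0.6, 0.7), nrow = 3, byrow = TRUE)
p <- rep(1/3, 3)
q <- rep(1/3, 3)
sigma <- 1
eps <- 0.2          # constant step size (assumption, not tuned)
u <- rep(1.5, 3)    # starting point chosen so that pi starts small
v <- rep(1.5, 3)

for (t in 1:5000) {
  # pi_ij = exp((Phi_ij - u_i - v_j) / sigma)
  pi_uv <- exp((Phi - outer(u, v, "+")) / sigma)
  # Gradient of the version 1 objective with respect to u and v
  grad_u <- p - rowSums(pi_uv)
  grad_v <- q - colSums(pi_uv)
  u <- u - eps * grad_u
  v <- v - eps * grad_v
}

pi_uv <- exp((Phi - outer(u, v, "+")) / sigma)
print(max(abs(rowSums(pi_uv) - p)))  # row-marginal error
print(max(abs(colSums(pi_uv) - q)))  # column-marginal error
```

The gradient vanishes exactly when the marginals of $\pi$ match $p$ and $q$, so the two printed errors measure how far the iterate is from a stationary point.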

### Coordinate descent

Coordinate descent on objective function in version 1:

\begin{align*}
p_{i}  & =\sum_{j}\exp\left(  \frac{\Phi_{ij}-u_{i}\left(  t+1\right)
-v_{j}\left(  t\right)  }{\sigma}\right)  ,\\
q_{j}  & =\sum_{i}\exp\left(  \frac{\Phi_{ij}-u_{i}\left(  t\right)
-v_{j}\left(  t+1\right)  }{\sigma}\right)
\end{align*}

that is

\begin{align*}
\left\{
\begin{array}
[c]{c}
u_{i}\left(  t+1\right)  =\sigma\log\left(  \frac{1}{p_{i}}\sum_{j}\exp\left(
\frac{\Phi_{ij}-v_{j}\left(  t\right)  }{\sigma}\right)  \right)  \\
v_{j}\left(  t+1\right)  =\sigma\log\left(  \frac{1}{q_{j}}\sum_{i}\exp\left(
\frac{\Phi_{ij}-u_{i}\left(  t\right)  }{\sigma}\right)  \right)
\end{array}
\right.
\end{align*}

This is called the Iterated Proportional Fitting Procedure (IPFP), or Sinkhorn's algorithm.

Coordinate descent on objective function in version 2 does not yield a closed-form expression.

### IPFP, linear version

Letting $a_{i}=\exp\left(  -u_{i}/\sigma\right)  $ and $b_{j}=\exp\left(  -v_{j}/\sigma\right)  $ and $K_{ij}=\exp\left(  \Phi_{ij}/\sigma\right)  $, one has $\pi_{ij}=a_{i}b_{j}K_{ij}$, and the procedure reexpresses as

\begin{align*}
\left\{
\begin{array}
[c]{l}%
a_{i}\left(  t+1\right)  =p_{i}/\left(  Kb\left(  t\right)  \right)
_{i}\text{ and }\\
b_{j}\left(  t+1\right)  =q_{j}/\left(  K^{\intercal}a\left(  t\right)
\right)  _{j}.
\end{array}
\right.
\end{align*}
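A compact sketch of this iteration in R follows. The function name `ipfp_linear`, its arguments and the small $2\times3$ example are assumptions made for illustration. As in the notebook code further down, the $b$ update uses the freshly computed $a$.

```r
# Linear-form IPFP sketch; `ipfp_linear` is a hypothetical helper name.
ipfp_linear <- function(Phi, p, q, sigma, tol = 1e-10, maxiter = 10000) {
  K <- exp(Phi / sigma)                 # K_ij = exp(Phi_ij / sigma)
  b <- rep(1, length(q))                # initial guess: vector of ones
  for (iter in seq_len(maxiter)) {
    a   <- p / as.vector(K %*% b)       # a_i = p_i / (K b)_i
    Ka  <- as.vector(crossprod(K, a))   # (K' a)_j
    err <- max(abs(b * Ka / q - 1))     # column-marginal error before updating b
    b   <- q / Ka                       # b_j = q_j / (K' a)_j
    if (err < tol) break
  }
  list(pi = outer(a, b) * K, a = a, b = b, iter = iter)
}

# Illustrative data (not from the notebook).
Phi <- matrix(c(1.0, 0.2, 0.3,
                0.4, 0.9, 0.5), nrow = 2, byrow = TRUE)
p <- c(0.5, 0.5)
q <- c(0.3, 0.3, 0.4)

res     <- ipfp_linear(Phi, p, q, sigma = 0.5)
res_big <- ipfp_linear(Phi, p, q, sigma = 100)

print(rowSums(res$pi))                       # close to p
print(colSums(res$pi))                       # close to q
print(max(abs(res_big$pi - outer(p, q))))    # small: near the independent coupling
```

The last line illustrates the remark made at the start of this block: for large $\sigma$ the solution is close to $\pi_{ij}=p_{i}q_{j}$.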

### The log-sum-exp trick

The previous program is extremely fast, partly due to the fact that it involves linear algebra operations. However, it breaks down when $\sigma$ is small; this is best seen taking a log transform and returning to $u^{k}=-\sigma\log a^{k}$ and $v^{k}=-\sigma\log b^{k}$, that is

\begin{align*}
\left\{
\begin{array}
[c]{l}%
u_{i}^{k}=\mu_{i}+\sigma\log\sum_{j}\exp\left(  \frac{\Phi_{ij}-v_{j}^{k-1}%
}{\sigma}\right) \\
v_{j}^{k}=\zeta_{j}+\sigma\log\sum_{i}\exp\left(  \frac{\Phi_{ij}-u_{i}^{k}%
}{\sigma}\right)
\end{array}
\right.
\end{align*}

where $\mu_{i}=-\sigma\log p_{i}$ and $\zeta_{j}=-\sigma\log q_{j}$.

One sees what may go wrong: if $\Phi_{ij}-v_{j}^{k-1}$ is positive in the first sum, then the exponential blows up because of the small $\sigma$ in the denominator. However, the log-sum-exp trick can be used to avoid this issue.

Consider

\begin{align*}
\left\{
\begin{array}
[c]{l}%
\tilde{v}_{i}^{k}=\max_{j}\left\{  \Phi_{ij}-v_{j}^{k}\right\} \\
\tilde{u}_{j}^{k}=\max_{i}\left\{  \Phi_{ij}-u_{i}^{k}\right\}
\end{array}
\right.
\end{align*}

(the indexing is not a typo: $\tilde{v}$ is indexed by $i$ and $\tilde{u}$ by $j$).

One has

\begin{align*}
\left\{
\begin{array}
[c]{l}%
u_{i}^{k}=\mu_{i}+\tilde{v}_{i}^{k-1}+\sigma\log\sum_{j}\exp\left(  \frac
{\Phi_{ij}-v_{j}^{k-1}-\tilde{v}_{i}^{k-1}}{\sigma}\right) \\
v_{j}^{k}=\zeta_{j}+\tilde{u}_{j}^{k}+\sigma\log\sum_{i}\exp\left(  \frac
{\Phi_{ij}-u_{i}^{k}-\tilde{u}_{j}^{k}}{\sigma}\right)
\end{array}
\right.
\end{align*}

and now the arguments of the exponentials are always nonpositive, ensuring the exponentials don't blow up.

## Application

We will return to our marriage example from Lecture 4. We will do this both using synthetic data and real data.

In [6]:
library(gurobi)
library(Matrix)
library(tictoc)
syntheticData = TRUE
doGurobi = TRUE
doIPFP1 = FALSE
doIPFP2 = TRUE

tol = 1e-09
maxiter = 1e+06
sigma = 0.1  # note: 0.1 to 0.001

Let's generate some synthetic data, or load up the `affinitymatrix.csv`, `Xvals.csv` and `Yvals.csv` that you will recall from Lecture 4.

In [7]:
if (syntheticData) {
    seed = 777
    nbX = 10
    nbY = 8
    set.seed(seed)
    Phi = matrix(runif(nbX * nbY), nrow = nbX)
    p = rep(1/nbX, nbX)
    q = rep(1/nbY, nbY)
} else {
    thePath = getwd()
    data = as.matrix(read.csv(paste0(thePath, "/affinitymatrix.csv"), sep = ",", 
        header = TRUE))  # loads the data
    nbcar = 10
    A = matrix(as.numeric(data[1:nbcar, 2:(nbcar + 1)]), nbcar, nbcar)
    
    data = as.matrix(read.csv(paste0(thePath, "/Xvals.csv"), sep = ",", header = TRUE))  # loads the data
    Xvals = matrix(as.numeric(data[, 1:nbcar]), ncol = nbcar)
    data = as.matrix(read.csv(paste0(thePath, "/Yvals.csv"), sep = ",", header = TRUE))  # loads the data
    Yvals = matrix(as.numeric(data[, 1:nbcar]), ncol = nbcar)
    sdX = apply(Xvals, 2, sd)
    sdY = apply(Yvals, 2, sd)
    mX = apply(Xvals, 2, mean)
    mY = apply(Yvals, 2, mean)
    Xvals = t((t(Xvals) - mX)/sdX)
    Yvals = t((t(Yvals) - mY)/sdY)
    nobs = dim(Xvals)[1]
    Phi = Xvals %*% A %*% t(Yvals)
    p = rep(1/nobs, nobs)
    q = rep(1/nobs, nobs)
    nbX = length(p)
    nbY = length(q)
}
nrow = min(8, nbX)
ncol = min(8, nbY)

We are going to run a horse race between solving this problem using Gurobi and two IPFP algorithms. First, Gurobi.

In [8]:
if (doGurobi) {
    A1 = kronecker(matrix(1, 1, nbY), sparseMatrix(1:nbX, 1:nbX))
    A2 = kronecker(sparseMatrix(1:nbY, 1:nbY), matrix(1, 1, nbX))
    A = rbind2(A1, A2)
    
    d = c(p, q)
    
    tic()
    result = gurobi(list(A = A, obj = c(Phi), modelsense = "max", rhs = d, sense = "="), 
        params = list(OutputFlag = 0))
    toc()
    
    if (result$status == "OPTIMAL") {
        pi = matrix(result$x, nrow = nbX)
        u_gurobi = result$pi[1:nbX]
        v_gurobi = result$pi[(nbX + 1):(nbX + nbY)]
        val_gurobi = result$objval
    } else {
        stop("optimization problem with Gurobi.")
    }
    
    print(paste0("Value of the problem (Gurobi) = ", val_gurobi))
    print(u_gurobi[1:nrow] - u_gurobi[nrow])
    print(v_gurobi[1:ncol] + u_gurobi[nrow])
    print("***********************")
}

0 sec elapsed
[1] "Value of the problem (Gurobi) = 0.869151732779574"
[1] -0.29191338 -0.36441451 -0.24172456 -0.07125784 -0.15843220 -0.28708162
[7] -0.37751187  0.00000000
[1] 1.0663077 1.0247037 1.1065443 1.1264727 1.2761079 0.9572269 1.0530869
[8] 1.1938532
[1] "***********************"


Next IPFP.

In [9]:
tic()
cont = TRUE
iter = 0

K = exp(Phi/sigma)
B = rep(1, nbY)  # Guess B = vector of ones
while (cont) {
    iter = iter + 1
    A = p/c(K %*% B)
    KA = c(t(A) %*% K)
    error = max(abs(KA * B/q - 1))
    if ((error < tol) | (iter >= maxiter)) {
        cont = FALSE
    }
    B = q/KA
}
u = -sigma * log(A)
v = -sigma * log(B)
pi = (K * A) * matrix(B, nbX, nbY, byrow = T)
val = sum(pi * Phi) - sigma * sum(pi * log(pi))
toc()

if (iter >= maxiter) {
    print("Maximum number of iterations reached in IPFP1.")
} else {
    print(paste0("IPFP1 converged in ", iter, " steps"))
    print(paste0("Value of the problem (IPFP1) = ", val))
    print(paste0("Sum(pi*Phi) (IPFP1) = ", sum(pi * Phi)))
    print(u[1:nrow] - u[nrow])
    print(v[1:ncol] + u[nrow])
}

0.02 sec elapsed
[1] "IPFP1 converged in 59 steps"
[1] "Value of the problem (IPFP1) = 1.17810676894248"
[1] "Sum(pi*Phi) (IPFP1) = 0.842657479124696"
[1] -0.1960913 -0.2920093 -0.1694472 -0.1817577 -0.1516859 -0.1758683 -0.2942356
[8]  0.0000000
[1] 1.412527 1.314859 1.370709 1.393089 1.468269 1.204802 1.364797 1.413045


The IPFP is extremely fast, but breaks down when $\sigma$ is small.

In [10]:
sigma = 0.001
tic()
iter = 0
cont = TRUE
v = rep(0, nbY)
mu = -sigma * log(p)
nu = -sigma * log(q)

while (cont) {
    # print(iter)
    iter = iter + 1
    u = mu + sigma * log(apply(exp((Phi - matrix(v, nbX, nbY, byrow = T))/sigma), 
        1, sum))
    KA = apply(exp((Phi - u)/sigma), 2, sum)
    error = max(abs(KA * exp(-v/sigma)/q - 1))
    if ((error < tol) | (iter >= maxiter)) {
        cont = FALSE
    }
    
    v = nu + sigma * log(KA)
}
pi = exp((Phi - u - matrix(v, nbX, nbY, byrow = T))/sigma)
val = sum(pi * Phi) - sigma * sum((pi * log(pi))[which(pi != 0)])
toc()  # stop the timer started by tic() above

if (iter >= maxiter) {
    print("Maximum number of iterations reached in IPFP1bis.")
} else {
    print(paste0("IPFP1bis converged in ", iter, " steps"))
    print(paste0("Value of the problem (IPFP1bis) = ", val))
    print(paste0("Sum(pi*Phi) (IPFP1bis) = ", sum(pi * Phi)))
    print(u[1:nrow] - u[nrow])
    print(v[1:ncol] + u[nrow])
}

ERROR: Error in if ((error < tol) | (iter >= maxiter)) {: missing value where TRUE/FALSE needed


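With this small $\sigma$ the exponentials overflow, so `error` becomes `NaN` and the loop condition cannot be evaluated. However, if we use the log-sum-exp trick, the iteration goes through: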

In [11]:
tic()
iter = 0
cont = TRUE
v = rep(0, nbY)
mu = -sigma * log(p)
nu = -sigma * log(q)
uprec = -Inf
while (cont) {
    # print(iter)
    iter = iter + 1
    vstar = apply(t(t(Phi) - v), 1, max)
    
    u = mu + vstar + sigma * log(apply(exp((Phi - matrix(v, nbX, nbY, byrow = T) - 
        vstar)/sigma), 1, sum))
    error = max(abs(u - uprec))
    uprec = u
    
    ustar = apply(Phi - u, 2, max)
    v = nu + ustar + sigma * log(apply(exp((Phi - u - matrix(ustar, nbX, nbY, byrow = T))/sigma), 
        2, sum))
    
    if ((error < tol) | (iter >= maxiter)) {
        cont = FALSE
    }
    
}
pi = exp((Phi - u - matrix(v, nbX, nbY, byrow = T))/sigma)
val = sum(pi * Phi) - sigma * sum(pi * log(pi))
toc()

if (iter >= maxiter) {
    print("Maximum number of iterations reached in IPFP2.")
} else {
    print(paste0("IPFP2 converged in ", iter, " steps"))
    print(paste0("Value of the problem (IPFP2) = ", val))
    print(paste0("Sum(pi*Phi) (IPFP2) = ", sum(pi * Phi)))
    print(u[1:nrow] - u[nrow])
    print(v[1:ncol] + u[nrow])
}

106.12 sec elapsed
[1] "IPFP2 converged in 485768 steps"
[1] "Value of the problem (IPFP2) = NaN"
[1] "Sum(pi*Phi) (IPFP2) = 0.869151724470475"
[1] -0.29052708 -0.37524875 -0.24172458 -0.08168658 -0.16788013 -0.28569533
[7] -0.37572011  0.00000000
[1] 1.0790391 1.0378405 1.1192756 1.1276767 1.2773118 0.9595294 1.0651250
[8] 1.1961558
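The `NaN` reported for the value of the problem most likely comes from the entropy term rather than from the iteration itself: with $\sigma=0.001$ some entries of `pi` underflow to exactly zero, and `0 * log(0)` evaluates to `NaN` in R. The earlier IPFP1bis cell avoids this by restricting the sum to nonzero entries. A minimal sketch of that guard, with a hypothetical helper name `entropy_term` and a made-up matrix `pi_demo`:

```r
# Sum of pi * log(pi) over strictly positive entries only,
# using the convention 0 * log(0) = 0. `entropy_term` is a hypothetical name.
entropy_term <- function(pi_mat) {
  pos <- pi_mat > 0
  sum(pi_mat[pos] * log(pi_mat[pos]))
}

# Illustrative coupling with exact zeros (not from the notebook).
pi_demo <- matrix(c(0.5, 0, 0, 0.5), nrow = 2)

naive   <- sum(pi_demo * log(pi_demo))  # 0 * (-Inf) produces NaN
guarded <- entropy_term(pi_demo)        # finite

print(naive)
print(guarded)
```

Replacing `sum(pi * log(pi))` by such a guarded sum in the cell above should give a finite value; the recorded output is left as it was produced.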
