# <center>The gravity equation</center>
### <center>Alfred Galichon (NYU & Sciences Po)</center>
## <center>'math+econ+code' masterclass series</center>
#### <center>With python code examples</center>
© 2018-2023 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/theteam), in particular Giovanni Montanari.

**If you reuse material from this masterclass, please cite as:**<br>
Alfred Galichon, 'math+econ+code' masterclass series. https://github.com/math-econ-code/mec_optim

## Learning objectives

* Regularized optimal transport

* The gravity equation

* Generalized linear models

* Pseudo-Poisson maximum likelihood estimation

## References

* Anderson and van Wincoop (2003). "Gravity with Gravitas: A Solution to the Border Puzzle". *American Economic Review*.

* Head and Mayer (2014). "Gravity Equations: Workhorse, Toolkit and Cookbook". *Handbook of International Economics*.

* Choo and Siow (2006). "Who Marries Whom and Why". *Journal of Political Economy*.

* Gourieroux, Trognon, Monfort (1984). "Pseudo Maximum Likelihood Methods: Theory". *Econometrica*.

* McCullagh and Nelder (1989). *Generalized Linear Models*. Chapman and Hall/CRC.

* Santos Silva and Tenreyro (2006). "The Log of Gravity". *Review of Economics and Statistics*.

* Yotov et al. (2016). *An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model*. WTO.

* Carneiro, Guimarães and Portugal (2012). "Real Wages and the Business Cycle: Accounting for Worker, Firm, and Job Title Heterogeneity". *AEJ: Macro*.

* Dupuy and Galichon (2014). "Personality traits and the marriage market". *Journal of Political Economy*.

* Dupuy, Galichon and Sun (2019). "Estimating matching affinity matrix under low-rank constraints". *Information and Inference*.

* Carlier, Dupuy, Galichon and Sun "SISTA: learning optimal transport costs under sparsity constraints." *Communications on Pure and Applied Mathematics* (forthcoming).

# Motivation

The gravity equation is a very useful tool for explaining trade flows by various measures of proximity between countries.

A number of regressors have been proposed. They include geographic distance, common official language, common colonial past, share of common religions, etc.

The dependent variable is the volume of exports from country $x$ to country $y$, for each pair of countries $\left(  x, y\right)$.

Today, we shall see a close connection between gravity models of international trade and separable matching models.

# The gravity equation

The "structural gravity equation" of Anderson and van Wincoop (2003), as exposited in the handbook chapter by Head and Mayer (2014), reads:

\begin{align*}
\mu_{xy}=\frac{n_x}{\Psi_{x}} \frac{m_y}{\Omega_{y}}  \Phi_{xy}%
\end{align*}

where $x$=exporter,  $y$=importer, $\mu_{xy}$=trade flow from $x$ to $y$, $n_x=\sum_{y}\mu_{xy}$ is the value of production,  $m_y=\sum_{x}\mu_{xy}$ is the importer's expenditure, and $\Phi_{xy}$=bilateral accessibility of $x$ to $y$.

$\Omega_{y}$ and $\Psi_{x}$ are *multilateral resistances*, satisfying the set of implicit equations

\begin{align*}
\Psi_{x}=\sum_{y}\frac{\Phi_{xy}m_y}{\Omega_{y}}\text{ and }\Omega_{y}%
=\sum_{x}\frac{\Phi_{xy}n_x}{\Psi_{x}}%
\end{align*}

We will see that these are exactly the same equations as those of the regularized OT.
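The two implicit equations can be solved numerically by alternating between them. The following is a minimal sketch, assuming a strictly positive accessibility matrix and margins with equal totals; the function name `multilateral_resistances` and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np

def multilateral_resistances(Phi_x_y, n_x, m_y, tol=1e-12, max_iter=10_000):
    """Alternate between the two implicit equations until Omega stabilizes."""
    Omega_y = np.ones_like(m_y)
    for _ in range(max_iter):
        Psi_x = Phi_x_y @ (m_y / Omega_y)        # Psi_x = sum_y Phi_xy m_y / Omega_y
        Omega_new = Phi_x_y.T @ (n_x / Psi_x)    # Omega_y = sum_x Phi_xy n_x / Psi_x
        converged = np.max(np.abs(Omega_new / Omega_y - 1)) < tol
        Omega_y = Omega_new
        if converged:
            break
    return Psi_x, Omega_y

# Toy example (made-up numbers): 4 exporters, 4 importers
rng = np.random.default_rng(0)
Phi_x_y = rng.uniform(0.5, 1.5, size=(4, 4))
n_x = np.array([0.1, 0.2, 0.3, 0.4])
m_y = np.array([0.25, 0.25, 0.25, 0.25])

Psi_x, Omega_y = multilateral_resistances(Phi_x_y, n_x, m_y)
mu_x_y = (n_x / Psi_x)[:, None] * (m_y / Omega_y)[None, :] * Phi_x_y
print(mu_x_y.sum(axis=1), mu_x_y.sum(axis=0))  # should reproduce n_x and m_y
```

Note that $\Psi$ and $\Omega$ are only pinned down up to a common scaling ($\Psi_x c$, $\Omega_y / c$), so the values returned depend on the starting point while the implied flows $\mu_{xy}$ do not.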

## Explaining trade

Parameterize $\Phi_{xy}=\exp\left(  \sum_{k=1}^{K}\beta_{k}D_{xy}^{k}\right)  $, where the $D_{xy}^{k}$ are $K$ pairwise measures of distance between $x$ and $y$. We have

\begin{align*}
\mu_{xy}=\exp\left(  \sum_{k=1}^{K}\beta_{k}D_{xy}^{k}-a_{x}-b_{y}\right)
\end{align*}

where fixed effects $b_{y}=-\ln \frac{m_y}{\Omega_{y}}$ and $a_{x}=-\ln \frac{n_x}{\Psi_{x}}$ are adjusted by

\begin{align*}
\sum_{y}\mu_{xy}=n_x\text{ and }\sum_{x}\mu_{xy}=m_y.
\end{align*}

Standard choices of $D_{xy}^{k}$'s:

* Logarithm of bilateral distance between $x$ and $y$

* Indicator of contiguous borders; of common official language; of
colonial ties

* Trade policy variables: presence of a regional trade agreement; tariffs

* Could include many other measures of proximity, e.g. measure of genetic/cultural distance, intensity of communications, etc.

### Regularized optimal transport

Consider the optimal transport duality

\begin{align*}
\max_{\mu\in\mathcal{M}\left(  P,Q\right)  }\sum_{xy}\mu_{xy}\Phi_{xy}=\min_{u_{x}+v_{y}\geq\Phi_{xy}}\sum_{x\in\mathcal{X}}n_xu_{x}+\sum_{y\in\mathcal{Y}}m_yv_{y}
\end{align*}

Now let's assume that we are adding an entropy to the primal objective function. For any $\sigma>0$, we get

\begin{align*}
&  \max_{\mu\in\mathcal{M}\left(  P,Q\right)  }\sum_{xy}\mu_{xy}\Phi_{xy}-\sigma\sum_{xy}\mu_{xy}\ln\mu_{xy}\\
&  =\min_{u,v}\sum_{x\in\mathcal{X}}n_xu_{x}+\sum_{y\in\mathcal{Y}}m_y v_{y}+\sigma\sum_{xy}\exp\left(  \frac{\Phi_{xy}-u_{x}-v_{y}-\sigma}{\sigma}\right)
\end{align*}

The latter problem is an unconstrained convex optimization problem. But the most efficient numerical computation technique is often coordinate descent, i.e. alternate between minimization in $u$ and minimization in $v$.
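For completeness, here is a sketch of how the dual arises, assuming that $\mathcal{M}\left(  P,Q\right)$ denotes the set of $\mu\geq0$ with margins $\sum_{y}\mu_{xy}=n_x$ and $\sum_{x}\mu_{xy}=m_y$, and that $u_x$ and $v_y$ are the Lagrange multipliers of these constraints. The first-order condition with respect to $\mu_{xy}$ in the Lagrangian is

\begin{align*}
\Phi_{xy}-\sigma\ln\mu_{xy}-\sigma-u_{x}-v_{y}=0\quad\Longleftrightarrow\quad\mu_{xy}=\exp\left(  \frac{\Phi_{xy}-u_{x}-v_{y}-\sigma}{\sigma}\right)
\end{align*}

Substituting back, each term $\mu_{xy}\left(  \Phi_{xy}-u_{x}-v_{y}-\sigma\ln\mu_{xy}\right)$ equals $\sigma\mu_{xy}$, so that the Lagrangian reduces to

\begin{align*}
\sum_{x\in\mathcal{X}}n_xu_{x}+\sum_{y\in\mathcal{Y}}m_yv_{y}+\sigma\sum_{xy}\exp\left(  \frac{\Phi_{xy}-u_{x}-v_{y}-\sigma}{\sigma}\right)
\end{align*}

which is the dual objective.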

### Iterated fitting

Minimizing with respect to $u$ yields

\begin{align*}
e^{-u_{x}/\sigma}=\frac{n_x}{\sum_{y}\exp\left(  \frac{\Phi_{xy}-v_{y}-\sigma}{\sigma}\right)  }
\end{align*}

and minimizing with respect to $v$ yields

\begin{align*}
e^{-v_{y}/\sigma}=\frac{m_y}{\sum_{x}\exp\left(  \frac{\Phi_{xy}-u_{x}-\sigma}{\sigma}\right)  }
\end{align*}

This is the "iterative proportional fitting procedure" (IPFP), also known as "matrix scaling", "RAS algorithm", "Sinkhorn-Knopp algorithm", "Kruithof's method", "Furness procedure", "biproportional fitting procedure", or "Bregman's procedure". See the survey in Idel (2016).

Maybe the most often reinvented algorithm in applied mathematics. Recently rediscovered in a machine learning context.
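The two updates can be sketched in a few lines of NumPy. This is an illustrative sketch, not the estimation routine used in this notebook: the function name `ipfp` and the toy inputs are hypothetical, and the additive constant $-\sigma$ inside the exponential is dropped, which only shifts $u$ by a constant.

```python
import numpy as np

def ipfp(Phi_x_y, n_x, m_y, sigma=1.0, tol=1e-10, max_iter=10_000):
    """Coordinate descent on the dual: alternate the closed-form updates of u and v."""
    K_x_y = np.exp(Phi_x_y / sigma)   # kernel; the constant -sigma is absorbed in u
    v_y = np.zeros_like(m_y)
    for _ in range(max_iter):
        u_x = sigma * np.log(K_x_y @ np.exp(-v_y / sigma) / n_x)   # fit row margins
        v_y = sigma * np.log(np.exp(-u_x / sigma) @ K_x_y / m_y)   # fit column margins
        mu_x_y = K_x_y * np.exp(-(u_x[:, None] + v_y[None, :]) / sigma)
        # after the v-update the column margins hold exactly; check the row margins
        if np.max(np.abs(mu_x_y.sum(axis=1) / n_x - 1)) < tol:
            break
    return u_x, v_y, mu_x_y

# Toy example (made-up numbers): 3 origins, 4 destinations
rng = np.random.default_rng(0)
Phi_x_y = rng.normal(size=(3, 4))
n_x = np.array([0.2, 0.3, 0.5])
m_y = np.full(4, 0.25)

u_x, v_y, mu_x_y = ipfp(Phi_x_y, n_x, m_y)
print(mu_x_y.sum(axis=1), mu_x_y.sum(axis=0))  # should reproduce n_x and m_y
```

For small $\sigma$ the kernel `np.exp(Phi_x_y / sigma)` can overflow or underflow; a log-domain variant would be safer in that regime.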

### Econometrics of matching

The goal is to estimate the matching surplus $\Phi_{xy}$. For this, take a linear parameterization

\begin{align*}
\Phi_{xy}^{\beta}=\sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}.
\end{align*}

Following Choo and Siow (2006), Galichon and Salanie (2011) introduce logit heterogeneity in individual preferences and show that the equilibrium now maximizes the *regularized Monge-Kantorovich problem*

\begin{align*}
W\left(  \beta\right)  =\max_{\mu\in\mathcal{M}\left(  P,Q\right)  }\sum_{xy}\mu_{xy}\Phi_{xy}^{\beta}-\sigma\sum_{xy}\mu_{xy}\ln\mu_{xy}
\end{align*}

By duality, $W\left(  \beta\right)  $ can be expressed

\begin{align*}
W\left(  \beta\right)  =\min_{u,v}\sum_{x}n_xu_{x}+\sum_{y}m_yv_{y}+\sigma\sum_{xy}\exp\left(  \frac{\Phi_{xy}^{\beta}-u_{x}-v_{y}-\sigma}{\sigma}\right)
\end{align*}

Without loss of generality, one can set $\sigma=1$ and drop the additive constant $-\sigma$ in the $\exp$.

### Estimation

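We observe the actual matching $\hat{\mu}_{xy}$. Note that, by the envelope theorem, $\partial W/ \partial\beta_{k}=\sum_{xy}\mu_{xy}\phi_{xy}^{k}$, where $\mu$ is the optimal matching at $\beta$. Hence $\beta$ is estimated by matching these moments to their empirical counterparts $\sum_{xy}\hat{\mu}_{xy}\phi_{xy}^{k}$, that is, by running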

<a name='objFun'></a>
\begin{align*}
\min_{u,v,\beta}\sum_{x}n_xu_{x}+\sum_{y}m_yv_{y}+\sum_{xy}\exp\left(\Phi_{xy}^{\beta}-u_{x}-v_{y}\right)  -\sum_{xy,k}\hat{\mu}_{xy}\beta_{k}\phi_{xy}^{k}
\end{align*}

which is still a convex optimization problem.
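To make the moment-matching interpretation concrete, the sketch below evaluates this objective and its gradient with respect to $\beta$, which is $\sum_{xy}\left(  \mu_{xy}-\hat{\mu}_{xy}\right)  \phi_{xy}^{k}$ with $\mu_{xy}=\exp\left(  \Phi_{xy}^{\beta}-u_{x}-v_{y}\right)$, and checks the formula against finite differences. The function names and the random toy data are hypothetical.

```python
import numpy as np

def objective(beta_k, u_x, v_y, phi_x_y_k, muhat_x_y):
    """Estimation objective with sigma = 1; margins are taken from muhat."""
    n_x, m_y = muhat_x_y.sum(axis=1), muhat_x_y.sum(axis=0)
    Phi_x_y = phi_x_y_k @ beta_k
    mu_x_y = np.exp(Phi_x_y - u_x[:, None] - v_y[None, :])
    return n_x @ u_x + m_y @ v_y + mu_x_y.sum() - (muhat_x_y * Phi_x_y).sum()

def grad_beta(beta_k, u_x, v_y, phi_x_y_k, muhat_x_y):
    """Gradient in beta: predicted moments minus observed moments."""
    mu_x_y = np.exp(phi_x_y_k @ beta_k - u_x[:, None] - v_y[None, :])
    return np.einsum('xy,xyk->k', mu_x_y - muhat_x_y, phi_x_y_k)

# Toy data (made-up): 3 x 4 market with K = 2 regressors
rng = np.random.default_rng(0)
phi_x_y_k = rng.normal(size=(3, 4, 2))
muhat_x_y = rng.uniform(size=(3, 4))
beta_k, u_x, v_y = rng.normal(size=2), rng.normal(size=3), rng.normal(size=4)

# Central finite differences in each coordinate of beta
eps = 1e-6
fd_k = np.array([
    (objective(beta_k + eps * e_k, u_x, v_y, phi_x_y_k, muhat_x_y)
     - objective(beta_k - eps * e_k, u_x, v_y, phi_x_y_k, muhat_x_y)) / (2 * eps)
    for e_k in np.eye(2)
])
print(fd_k, grad_beta(beta_k, u_x, v_y, phi_x_y_k, muhat_x_y))
```

At a minimizer the gradient vanishes, so the predicted moments $\sum_{xy}\mu_{xy}\phi_{xy}^{k}$ coincide with the observed ones.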

As we will show later, this is the negative of the log-likelihood (up to a term that does not depend on the parameters) of a Poisson regression with $x$ and $y$ fixed effects, where we assume

\begin{align*}
\mu_{xy}|xy\sim Poisson\left(  \exp\left(  \sum_{k=1}^{K}\beta_{k}\phi
_{xy}^{k}-u_{x}-v_{y}\right)  \right)  .
\end{align*}

---
To start working with our application, let's load some of the libraries we shall need.

In [1]:
import numpy as np  # used to work with arrays
import pandas as pd # used to load and work with dataframes
#import math  # used to work with logs and infinities
import time  # used to time the execution of the code

import scipy.sparse as spr  # used to work with sparse matrices (when working with the Poisson matrix representation)
from sklearn import linear_model  # used to implement the Poisson regression

And let's load our data, which comes from the book *An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model*, by Yotov et al. We will estimate the gravity model using optimal transport as well as using Poisson regression.

While the table of trade data includes several possible regressors, we focus on four types of regressors: the logarithm of the distance between countries and dummy variables for whether any two countries are contiguous, share a common official language, or share colonial ties. These are regressors known in the literature to have explanatory power, and they are the same as the ones used in Yotov et al.

Our data look as follows:

In [2]:
thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/gravity_wtodata/'

tradedata = pd.read_csv(thepath + '1_TraditionalGravity_from_WTO_book.csv') # load full table
tradedata = tradedata[['exporter', 'importer','year', 'trade', 'DIST','ln_DIST', 'CNTG', 'LANG', 'CLNY']]   # focus on a subset of regressors

tradedata.sort_values(['year','exporter','importer'], inplace = True)
tradedata.reset_index(inplace = True, drop = True)

nbt = len(tradedata['year'].unique())  # number of periods
nbi = len(tradedata['importer'].unique())  # number of countries (we have the same number of importers and exporters)
nbk = 4  # number of regressors we are interested in using 

tradedata.head()

| | exporter | importer | year | trade | DIST | ln_DIST | CNTG | LANG | CLNY |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ARG | ARG | 1986 | 61288.590263 | 533.90824 | 6.280224 | 0 | 0 | 0 |
| 1 | ARG | AUS | 1986 | 27.764874 | 12044.574134 | 9.39637 | 0 | 0 | 0 |
| 2 | ARG | AUT | 1986 | 3.559843 | 11751.146521 | 9.371706 | 0 | 0 | 0 |
| 3 | ARG | BEL | 1986 | 96.102567 | 11305.285764 | 9.333026 | 0 | 0 | 0 |
| 4 | ARG | BGR | 1986 | 3.129231 | 12115.572046 | 9.402246 | 0 | 0 | 0 |


Let's extract the data from the table and format it using multidimensional tensors. Note that we are storing both absolute and normalized trade flows, as we need the former (properly cleaned of the flows within a country) to normalize the latter. 

In [3]:
years = tradedata['year'].unique()  # array of calendar years reported in the data
distances = np.array(['ln_DIST', 'CNTG', 'LANG', 'CLNY'])  # array of trade "distances" we are interested in using as regressors

D_x_y_t_k = np.zeros((nbi,nbi,nbt,nbk)) # Initialize an empty tensor of dimensions nbi x nbi x nbt x nbk to store distances
tradevol_x_y_t = np.zeros((nbi,nbi,nbt)) # Initialize empty tensor nbi x nbi x nbt to store trade volume
muhat_x_y_t = np.zeros((nbi,nbi,nbt)) # Initialize empty tensor nbi x nbi x nbt to store normalized trade flows

# fill tensors with distance, contiguity, language, and colony variables, as well as the trade flow
for t, year in enumerate(years):
    tradevol_x_y_t[:, :, t] = np.array(tradedata.loc[tradedata['year'] == year, 'trade']).reshape((nbi, nbi))  # store trade flows
    np.fill_diagonal(tradevol_x_y_t[:, :, t], 0)  # set to zero the trade within a country; we will repeat this operation within the estimation functions
    for k, distance in enumerate(distances):
        D_x_y_t_k[:, :, t, k] = np.array(tradedata.loc[tradedata['year'] == year, distance]).reshape((nbi, nbi))  # store distances

# normalize and store trade flows
muhat_x_y_t = tradevol_x_y_t / (tradevol_x_y_t.sum() / len(years))

We define a class `GravityModel` to store the data relevant for estimation. Later on we will populate it with our estimation methods.

In [4]:
class GravityModel():
    def __init__(self, muhat_x_y_t, D_x_y_t_k):
        self.nbi, _, self.nbt, self.nbk = D_x_y_t_k.shape  # number of countries, periods, and regressors
        self.muhat_x_y_t = muhat_x_y_t    # tensor of trade flows over time
        self.D_x_y_t_k = D_x_y_t_k  # tensor of bilateral resistances in each time period 

We will solve this model by fixing a $\beta$ and solving the matching problem using IPFP. Then in an outer loop we will solve for the $\beta$ which minimizes the distance between model and empirical moments, where the optimization is based on gradient descent:

In [5]:
def fit_ipfp(self, sigma = 1, maxiterIpfp = 1000, maxiter = 500, tolIpfp = 1e-12, tolDescent = 1e-6, t_s = 0.03):

    iterCount = 0
    contIter = True
    meanD_k =self.D_x_y_t_k.mean(axis=(0,1,2))
    sdD_k = self.D_x_y_t_k.std(axis=(0, 1), ddof = 1).mean(axis = 0)
    D_x_y_t_k = (self.D_x_y_t_k-meanD_k[None,None,None,:]) / sdD_k[None,None,None,:]
    n_x_t = self.muhat_x_y_t.sum(axis=1)
    m_y_t = self.muhat_x_y_t.sum(axis=0)

    beta_k = np.zeros(self.nbk)

    ptm = time.time()
    while(contIter):
        iterCount += 1
        thegrad = np.zeros(self.nbk)  # use the instance attribute rather than the global nbk
        for t in range(self.nbt):
            v_y = np.zeros(self.nbi)
            D_xy_k = D_x_y_t_k[:,:,t,:].reshape((-1,self.nbk))
            K_x_y = np.exp(D_xy_k @ beta_k / sigma).reshape((self.nbi,self.nbi))
            np.fill_diagonal(K_x_y, 0)  # no self-flow: having already exponentiated Phi, we set the diagonal to zero
            contIpfp = True
            iterIpfp = 0
            
            while(contIpfp):
                iterIpfp += 1
                u_x = sigma * np.log(  ( K_x_y @  np.exp(-v_y / sigma)  ) / n_x_t[:,t] ).flatten() 
                v_y = sigma * np.log(  ( np.exp(-u_x / sigma) @ K_x_y   ) / m_y_t[:,t] ).flatten()
                mu_x_y = (K_x_y *  np.exp(-(u_x[:,None] +v_y[None,:] ) / sigma))
                if (np.max(np.abs(   mu_x_y.sum(axis=1) /  n_x_t[:,t] - 1)) < tolIpfp or iterIpfp >= maxiterIpfp):
                    contIpfp = False
                    #print(iterIpfp)
            thegrad = thegrad + ((mu_x_y - self.muhat_x_y_t[:, :, t]).flatten().dot(D_xy_k)).flatten()
        beta_k = beta_k - t_s * thegrad
        #print(beta_k)
        
        if (iterCount > maxiter or np.sum(np.abs(thegrad)) < tolDescent):  # measure distance against value of the problem
            contIter = False

    diff = time.time() - ptm
    print('Time elapsed = ', diff, 's.')

    return np.asarray(beta_k / sdD_k).round(3)

GravityModel.fit_ipfp = fit_ipfp

Let's test this solution method by initializing an instance of the `GravityModel` class with the data from Yotov et al. :

In [6]:
trade_yotov = GravityModel(muhat_x_y_t, D_x_y_t_k)

Let's run our estimation method:

In [7]:
trade_yotov.fit_ipfp()

Time elapsed =  2.7788071632385254 s.


array([-0.841,  0.437,  0.247, -0.222])

We recover the PPML estimates on Table 1 p. 42 of [Yotov et al.'s book](https://www.wto.org/english/res_e/booksp_e/advancedwtounctad2016_e.pdf).

We now proceed to show how this problem can be recast as an instance of Poisson regression with fixed effects.

---

### Poisson regression with fixed effects

Let $\theta=\left(  \beta,u,v\right)  $ and $Z=\left(  \phi,-D^{x},-D^{y}\right)  $ where $D_{x^{\prime}y^{\prime}}^{x}=1\left\{  x=x^{\prime}\right\}  $ and $D_{x^{\prime}y^{\prime}}^{y}=1\left\{  y=y^{\prime}\right\}$ are $x$- and $y$-dummies, which enter with a negative sign so that $\theta^{\intercal}Z_{xy}=\sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}$. Let $\lambda_{xy}\left(  Z;\theta\right)  =\exp\left(\theta^{\intercal}Z_{xy}\right)  $ be the parameter of the Poisson distribution.

The conditional log-likelihood of $\hat{\mu}_{xy}$ given $Z_{xy}$ is, up to the term $-\log\left(  \hat{\mu}_{xy}!\right)$ which does not depend on $\theta$,

\begin{align*}
l_{xy}\left(  \hat{\mu}_{xy};\theta\right)   &  =\hat{\mu}_{xy}\log \lambda_{xy}\left(  Z;\theta\right)  -\lambda_{xy}\left(  Z;\theta\right) \\
&  =\hat{\mu}_{xy}\left(  \theta^{\intercal}Z_{xy}\right)  -\exp\left(\theta^{\intercal}Z_{xy}\right) \\
&  =\hat{\mu}_{xy}\left(  \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right)  -\exp\left(  \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right)
\end{align*}

Summing over $x$ and $y$, the sample log-likelihood is

\begin{align*}
\sum_{xy}\hat{\mu}_{xy}\sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-\sum_{x}n_xu_{x}-\sum_{y}m_yv_{y}-\sum_{xy}\exp\left(  \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right)
\end{align*}

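which is the negative of the [objective function](#objFun): maximizing the Poisson log-likelihood is equivalent to solving the minimization problem of the estimation section.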
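The stacking of regressors and dummies into $Z$ can be sketched with Kronecker products. The example below is illustrative only (toy dimensions and random values, a single period); it checks that $\theta^{\intercal}Z_{xy}$ equals $\sum_{k}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}$ when the dummies enter with a minus sign, and that the sample log-likelihood is the negative of the estimation objective.

```python
import numpy as np

nbx, nby, nbk = 3, 4, 2
rng = np.random.default_rng(0)
phi_x_y_k = rng.normal(size=(nbx, nby, nbk))
muhat_x_y = rng.uniform(size=(nbx, nby))
beta_k, u_x, v_y = rng.normal(size=nbk), rng.normal(size=nbx), rng.normal(size=nby)

# Rows are pairs (x, y) in row-major order, i.e. row index = x * nby + y
Dx = np.kron(np.eye(nbx), np.ones((nby, 1)))   # x-dummies, shape (nbx*nby, nbx)
Dy = np.kron(np.ones((nbx, 1)), np.eye(nby))   # y-dummies, shape (nbx*nby, nby)
Z = np.hstack([phi_x_y_k.reshape(-1, nbk), -Dx, -Dy])
theta = np.concatenate([beta_k, u_x, v_y])

# Linear index computed two ways
index_xy = (Z @ theta).reshape(nbx, nby)
direct_xy = phi_x_y_k @ beta_k - u_x[:, None] - v_y[None, :]

# Poisson log-likelihood (theta-dependent part) versus estimation objective
loglik = (muhat_x_y * index_xy - np.exp(index_xy)).sum()
n_x, m_y = muhat_x_y.sum(axis=1), muhat_x_y.sum(axis=0)
objective = (n_x @ u_x + m_y @ v_y + np.exp(direct_xy).sum()
             - (muhat_x_y * (phi_x_y_k @ beta_k)).sum())
print(loglik, -objective)  # the two numbers should coincide
```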

### From Poisson to pseudo-Poisson

If $\mu_{xy}|xy$ is Poisson, then $\mathbb{E}\left[\mu_{xy}\right]=\lambda_{xy}\left(  Z_{xy};\theta\right)  =\mathbb{V}ar\left(  \mu_{xy}\right)  $. While it makes sense to assume the former equality, the latter is a rather strong assumption.

For estimation purposes, $\hat{\theta}$ is obtained by

\begin{align*}
\max_{\theta}\sum_{xy}l\left(  \hat{\mu}_{xy};\theta\right)  =\sum_{xy}\left(\hat{\mu}_{xy}\left(  \theta^{\intercal}Z_{xy}\right)  -\exp\left(\theta^{\intercal}Z_{xy}\right)  \right)
\end{align*}

However, for inference purposes, one should not assume the Poisson distribution. Instead,

\begin{align*}
\sqrt{N}\left(  \hat{\theta}-\theta\right)  \Longrightarrow\mathcal{N}\left(  0,\left(A_{0}\right)  ^{-1}B_{0}\left(  A_{0}\right)  ^{-1}\right)
\end{align*}

where $N=\left\vert \mathcal{X}\right\vert \times\left\vert \mathcal{Y}\right\vert $ and $A_{0}$ and $B_{0}$ are estimated by

\begin{align*}
\hat{A}_{0}  &  =-N^{-1}\sum_{xy}D_{\theta\theta}^{2}l\left(  \hat{\mu}_{xy};\hat{\theta}\right)  =N^{-1}\sum_{xy}\exp\left(  \hat{\theta}^{\intercal}Z_{xy}\right)  Z_{xy}Z_{xy}^{\intercal}\\
\hat{B}_{0}  &  =N^{-1}\sum_{xy}\left(  \hat{\mu}_{xy}-\exp\left(  \hat{\theta}^{\intercal}Z_{xy}\right)  \right)  ^{2}Z_{xy}Z_{xy}^{\intercal}.
\end{align*}
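A minimal NumPy sketch of these two estimators and of the resulting sandwich covariance follows. It assumes a design matrix `Z` of full column rank (with both sets of fixed effects, one normalization such as dropping a dummy would be needed, since $u+c$ and $v-c$ give the same fit). The function name `sandwich_cov` is hypothetical, and in the toy example the data-generating parameter stands in for the estimate $\hat{\theta}$, purely to keep the example short.

```python
import numpy as np

def sandwich_cov(Z, y, theta):
    """Approximate covariance of theta_hat: A^{-1} B A^{-1} / N."""
    N = Z.shape[0]
    lam = np.exp(Z @ theta)                           # fitted Poisson means
    A = (Z * lam[:, None]).T @ Z / N                  # minus the average Hessian
    B = (Z * ((y - lam) ** 2)[:, None]).T @ Z / N     # average outer product of scores
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv / N

# Toy data (made-up): 200 observations, 3 regressors, no fixed effects
rng = np.random.default_rng(0)
Z = 0.3 * rng.normal(size=(200, 3))
theta0 = np.array([0.5, -0.2, 0.1])
y = rng.poisson(np.exp(Z @ theta0)).astype(float)

V = sandwich_cov(Z, y, theta0)     # theta0 used as a stand-in for theta_hat
std_errors = np.sqrt(np.diag(V))
print(std_errors)
```

If the Poisson variance assumption held exactly, $A_{0}$ and $B_{0}$ would coincide and the covariance would collapse to $A_{0}^{-1}$; the sandwich form remains valid when it does not.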

## Implementation: Poisson regression with fixed effects

We now introduce an additional method to recover the coefficients of interest through Poisson regression. Notice that this approach naturally recovers the fixed effects too, although in this application we are not directly interested in them. 

Consistent with common practice and as previously described, we do not want to consider the trade flow between a country and itself. To incorporate this restriction in the Poisson regression, we use weights and assign zero weight to the trade flow between a country and itself. We can then recover the Poisson objective function expressed in matrix formulation by appropriately stacking the matrix of bilateral resistances and the matrices of fixed effects.
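As an illustration of the weighting, the sketch below builds a zero-one weight vector for observations ordered as $(x,y,t)$ in row-major order, which is the ordering obtained by flattening an `nbi x nbi x nbt` tensor. The toy dimensions are made up; the example shows that a Kronecker-product construction and a plain `np.repeat` give the same vector.

```python
import numpy as np

nbi, nbt = 3, 2   # toy sizes: 3 countries, 2 periods

# Weight 1 for x != y and 0 for x == y, flattened over (x, y)
offdiag_xy = (np.ones((nbi, nbi)) - np.eye(nbi)).flatten()

# Each (x, y) weight is repeated for every period t
weights_a = np.repeat(offdiag_xy, nbt)

# Equivalent Kronecker-product construction
weights_kron = np.kron(np.eye(nbi**2), np.ones((nbt, 1))) @ offdiag_xy

print(weights_a.reshape(nbi, nbi, nbt)[:, :, 0])  # zeros on the diagonal, ones elsewhere
```

The `np.repeat` version avoids forming an identity matrix of size `nbi**2`, which may matter when the number of countries is large.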

In [8]:
def fit_glm(self, verbosity=0, max_iter = 8000, tol=1e-12, pretest = False):
    """
    Estimate the gravity equation by weighted Poisson regression with
    exporter-time and importer-time fixed effects.
    Returns the nbk slope coefficients, rounded to three decimals.
    """

    kr = spr.kron   # shorthand to implement the Kronecker product

    M1 = kr(spr.identity(self.nbi), kr(np.ones((self.nbi, 1)), spr.identity(self.nbt)))
    M2 = kr(np.ones((self.nbi, 1)), kr(spr.identity(self.nbi), spr.identity(self.nbt)))
    C_a_k = spr.hstack([self.D_x_y_t_k.reshape((-1, self.nbk)), -M1, -M2])
    muhat_a = self.muhat_x_y_t.flatten()

    weighting_matrix_xyt = kr(np.eye(self.nbi**2), np.ones((self.nbt, 1)))@(np.ones((self.nbi, self.nbi)) - np.eye(self.nbi)).flatten()

    clf = linear_model.PoissonRegressor(fit_intercept=False, tol=tol , max_iter=max_iter, verbose=verbosity, alpha=0)
    clf.fit(C_a_k, muhat_a, sample_weight=weighting_matrix_xyt)

    return clf.coef_[:self.nbk].round(3)

GravityModel.fit_glm = fit_glm

Solving the model using the newly added method:

In [9]:
estimates_glm = trade_yotov.fit_glm()
estimates_glm

array([-0.841,  0.438,  0.248, -0.223])

Again, we recover the same estimates as in the book by Yotov et al.; they differ from the IPFP output above by at most 0.001.