# <span style="color:darkblue">netivreg: Estimation of Peer Effects in Endogenous Social Networks</span> 

Pablo Estrada, Juan Estrada, Kim P. Huynh, David T. Jacho-Chavez and Leonardo Sanchez-Aragon

# Stata Command: netivreg

<font size="3"> 
    
The command **netivreg** implements the following estimators for endogenous linear-in-means models:
    
- Generalized Two-Stage Least Squares (G2SLS)
- Generalized Three-Stage Least Squares (G3SLS)
- Generalized Method of Moments (GMM)

# Simulated Data

<font size="3"> 
    
We use the following version of the linear-in-means model:

$$
\begin{align}
y_{i}=&1+0.7\sum_{j=1}^n\overline{w}_{ij}y_{j}+0.33\sum_{j=1}^n\overline{w}_{ij}x_{1j}+0.33\sum_{j=1}^n\overline{w}_{ij}x_{2j}+0.33\sum_{j=1}^n\overline{w}_{ij}x_{3j}\nonumber\\ 
 &+0.33x_{1i}+0.33x_{2i}+0.33x_{3i}+v_{i}\text{,}\label{sim_equation}
\end{align}
$$

where the $x_{ki}$ are i.i.d. draws from $N(0,3)$ for $k=1,2,3$, independent of each other. The weights $\overline{w}_{ij}$ are the row-normalized entries of the adjacency matrix $\mathbf{W}=[w_{ij}]$, i.e., $\overline{w}_{ij}=w_{ij}/\sum_{k=1}^n w_{ik}$. 
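
As a rough illustration of the row normalization, the following Python sketch builds an adjacency matrix from a list of `source`/`target` pairs and divides each row by its sum. The helper `row_normalize`, the toy edge list, the 1-based node ids, and the treatment of nodes without links (rows left at zero) are assumptions made for this example; they are not taken from the `netivreg` source.

```python
def row_normalize(edges, n):
    """Build an n x n adjacency matrix from (source, target) pairs
    and divide each row by its row sum."""
    W = [[0.0] * n for _ in range(n)]
    for source, target in edges:
        # Assumption: node ids are 1-based and links are directed source -> target
        W[source - 1][target - 1] = 1.0
    W_bar = []
    for row in W:
        total = sum(row)
        # Assumption: a node with no links keeps a row of zeros
        W_bar.append([w / total if total > 0 else 0.0 for w in row])
    return W_bar


# Hypothetical toy edge list on 4 nodes (node 4 has no outgoing links)
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
W_bar = row_normalize(edges, n=4)
for row in W_bar:
    print(row)
```

Each row with at least one link sums to one, so multiplying `W_bar` by a vector returns the average of that vector over each node's peers.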
    
The adjacency matrix $\mathbf{W}$ is generated from $\mathbf{W}_0=[w_{0;ij}]$, which in turn is generated from a random graph with a density of 0.01.
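
To make the data-generating process concrete, here is a minimal Python sketch that solves the model for $y$ on a toy network. It is not the code used to create `data_sim.dta`. The 6-node network, the error distribution $v_i \sim N(0,1)$, the reading of $N(0,3)$ as variance 3, and the function names are all assumptions for illustration. Because the weights are row-normalized and the peer coefficient is 0.7, the map $y \mapsto c + 0.7\,\overline{\mathbf{W}}y$ is a contraction, so simple fixed-point iteration converges.

```python
import random


def peer_average(W_bar, z):
    """Return W_bar @ z for a row-normalized matrix stored as nested lists."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W_bar]


def simulate_y(W_bar, X, v, alpha=1.0, beta=0.7, gamma=0.33, delta=0.33, iters=300):
    """Solve y = alpha + beta*W y + delta*W x + gamma*x + v by fixed-point iteration."""
    own = [sum(x_i) for x_i in X]       # x_1i + x_2i + x_3i (equal coefficients)
    peers = peer_average(W_bar, own)    # peer averages of the covariates
    c = [alpha + gamma * o + delta * p + e for o, p, e in zip(own, peers, v)]
    y = c[:]
    for _ in range(iters):
        y = [ci + beta * wy for ci, wy in zip(c, peer_average(W_bar, y))]
    return y


random.seed(0)
n = 6
# Hypothetical network: node i is linked to the next two nodes on a ring
W_bar = [[0.5 if j in ((i + 1) % n, (i + 2) % n) else 0.0 for j in range(n)]
         for i in range(n)]
# Assumption: N(0,3) means variance 3, hence standard deviation sqrt(3)
X = [[random.gauss(0, 3 ** 0.5) for _ in range(3)] for _ in range(n)]
# Assumption: standard normal errors (the error distribution is not stated above)
v = [random.gauss(0, 1) for _ in range(n)]
y = simulate_y(W_bar, X, v)
print(y)
```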

In [None]:
use ../data/stata/data_sim.dta, replace
format y_endo y_exo x1 x2 x3 x4 %9.3f
list in 1/5, table 

In [2]:
use ../data/stata/W_sim.dta, replace
list in 113/117, table 




     +-----------------+
     | source   target |
     |-----------------|
113. |     28      259 |
114. |     28      361 |
115. |     29       67 |
116. |     29       79 |
117. |     29      196 |
     +-----------------+


In [3]:
use ../data/stata/W0_sim.dta, replace
list in 113/117, table 




     +-----------------+
     | source   target |
     |-----------------|
113. |     30      167 |
114. |     30      325 |
115. |     31       38 |
116. |     31       83 |
117. |     31      132 |
     +-----------------+


# 1. G2SLS

## 1.1. Assumptions

<font size="3">

1. No correlated effects: $\mathbb{E}[\mathbf{v}|\mathbf{X}, \mathbf{W}] = 0$
    
2. Relevance: $\mathbf{I}$, $\mathbf{W}$, and $\mathbf{W}^{2}$ are linearly independent, and $\gamma\beta + \delta \neq 0$
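
The linear-independence part of the relevance condition can be checked numerically for a given network. The sketch below is an illustration only, with hypothetical helper names: it vectorizes $\mathbf{I}$, $\mathbf{W}$, and $\mathbf{W}^{2}$ and computes their rank with exact rational arithmetic. The two 3-node networks are toy examples: in a complete network $\mathbf{W}^{2}$ is a linear combination of $\mathbf{I}$ and $\mathbf{W}$, while in a line network it is not.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(rows):
    """Rank of a list of vectors via exact Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


def powers_independent(W):
    """True if I, W and W^2 are linearly independent as matrices."""
    n = len(W)
    W = [[Fraction(w) for w in row] for row in W]
    eye = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    W2 = matmul(W, W)
    flatten = lambda M: [x for row in M for x in row]
    return rank([flatten(eye), flatten(W), flatten(W2)]) == 3


half = Fraction(1, 2)
complete = [[0, half, half], [half, 0, half], [half, half, 0]]  # everyone linked
line = [[0, 1, 0], [half, 0, half], [0, 1, 0]]                  # 1 - 2 - 3

print(powers_independent(complete))  # False: W^2 = I/2 + W/2
print(powers_independent(line))      # True
```

For the second part of the condition, the simulation design above uses $\gamma = \delta = 0.33$ and $\beta = 0.7$, so $\gamma\beta + \delta$ is clearly nonzero.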

## 1.2. Estimation

<br>

<font size="3">

- **Step 1:**
    
Regress $\mathbf{y}$ on $\boldsymbol{\iota}$, $\mathbf{X}$, $\mathbf{W}\mathbf{y}$, and $\mathbf{W}\mathbf{X}$ by 2SLS using $\boldsymbol{\iota}$, $\mathbf{X}$, $\mathbf{W}^{2}\mathbf{X}$, and $\mathbf{W}\mathbf{X}$ as instruments. 
    
- **Step 2:**

Regress $\mathbf{y}$ on $\boldsymbol{\iota}$, $\mathbf{X}$, and $[\widehat{\mathbf{W}\mathbf{y}} \quad \widehat{\mathbf{W}\mathbf{X}}]$ by IV using $\boldsymbol{\iota}$, $\mathbf{X}$, and $[\widehat{\mathbf{Z}} \quad \widehat{\mathbf{W}\mathbf{X}}]$ as instruments, where $\widehat{\mathbf{Z}}$ is the optimal instrument.
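
The instrument set used in Step 1 can be assembled directly from $\mathbf{W}$ and $\mathbf{X}$. The following Python sketch is only an illustration on a hypothetical 3-node line network with one covariate; the function name `g2sls_instruments` and the column order $[\boldsymbol{\iota}, \mathbf{X}, \mathbf{W}\mathbf{X}, \mathbf{W}^{2}\mathbf{X}]$ are choices made for this example, not the internals of `netivreg`.

```python
def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def g2sls_instruments(W, X):
    """Stack [iota, X, WX, W^2 X] column-wise, one row per observation."""
    WX = matmul(W, X)      # peers' average covariates
    W2X = matmul(W, WX)    # covariates of peers of peers
    return [[1.0] + x + wx + w2x for x, wx, w2x in zip(X, WX, W2X)]


# Hypothetical row-normalized line network 1 - 2 - 3 and a single covariate
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
X = [[1.0], [2.0], [4.0]]

Z = g2sls_instruments(W, X)
for row in Z:
    print(row)
```

The last column, $\mathbf{W}^{2}\mathbf{X}$, is the excluded instrument for $\mathbf{W}\mathbf{y}$; it carries information only when the relevance condition in Section 1.1 holds.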

In [4]:
use ../data/stata/data_sim.dta, replace
frame create edges
frame edges: use ../data/stata/W_sim.dta
netivreg g3sls y_exo x1 x2 x3 x4 (edges = edges) 






Network IV (G3SLS) Regression                         Number of obs =      400
                                                      Wald chi2(10)  =  2021.41
                                                      Prob > chi2   =   0.0000
                                                      R-squared     =   0.8571
                                                      Root MSE      =     .966
------------------------------------------------------------------------------
       y_exo | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
W_y          |
       y_exo |   .7187389   .0459317    15.65   0.000     .6284341    .8090436
-------------+----------------------------------------------------------------
W_x          |
          x1 |   .3628661   .0614882     5.90   0.000     .2419763    .4837559
          x2 |   .3197861    .051635     6.19   0.000     .2182683     .421304
          x3 |  

In [5]:
netivreg g3sls y_endo x1 x2 x3 x4 (edges = edges) 


Network IV (G3SLS) Regression                         Number of obs =      400
                                                      Wald chi2(10)  =  1797.04
                                                      Prob > chi2   =   0.0000
                                                      R-squared     =   0.8379
                                                      Root MSE      =    1.104
------------------------------------------------------------------------------
      y_endo | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
W_y          |
      y_endo |   .9775033   .0571161    17.11   0.000     .8652093    1.089797
-------------+----------------------------------------------------------------
W_x          |
          x1 |   .1499146   .0678751     2.21   0.028     .0164676    .2833615
          x2 |   .1307811    .058853     2.22   0.027     .0150721      .24649
          x3 |    .2

# 2. G3SLS

## 2.1. Assumptions

<font size="3">

1. No correlated effects: $\mathbb{E}[\mathbf{v}|\mathbf{X}, \mathbf{W}_0] = 0$
    
2. Relevance: $\mathbf{I}$, $\mathbf{W}_{0}$, and $\mathbf{W}_{0}^{2}$ are linearly independent, and $\gamma\boldsymbol{\pi}_{1}^{\top}\theta+\boldsymbol{\pi}_{2}^{\top}\theta \neq 0$

## 2.2. Estimation

<br>

<font size="3">

- **Step 1:**

Regress $\mathbf{W}\mathbf{S}$ on $\mathbf{W}_0\mathbf{S}$ by OLS. Get $\widehat{\mathbf{W}\mathbf{S}}$ and $\widehat{\mathbf{U}}$.
    
    
    
- **Step 2:**
    
    
Regress $\mathbf{y}$ on $\boldsymbol{\iota}$, $\mathbf{X}$, $\mathbf{W}_{0}\mathbf{y}$, and $\mathbf{W}_{0}\mathbf{X}$ by 2SLS using $\boldsymbol{\iota}$, $\mathbf{X}$, $\mathbf{W}_{0}^{2}\mathbf{X}$, and $\mathbf{W}_{0}\mathbf{X}$ as instruments. From the equation 
    
    
$$\widehat{\alpha}_{\text{2SLS}}\boldsymbol{\iota}+\widehat{\theta}^\ast_{\text{2SLS}}\mathbf{W}_{0}\mathbf{S}+\widehat{\gamma}_{\text{2SLS}}\mathbf{X},$$
    
    
    
get the parameters $[\widehat{\alpha}_{\text{2SLS}},\widehat{\gamma}_{\text{2SLS}},\widehat{\boldsymbol{\theta}}_{\text{2SLS}}]^{\top}$, where $\widehat{\boldsymbol{\theta}}_{\text{2SLS}} = \widehat{\boldsymbol{\Pi}}^{-1}\widehat{\theta}^\ast_{\text{2SLS}}$.
    
    
- **Step 3:**
    
    

Regress $\mathbf{y}$ on $\boldsymbol{\iota}$, $\mathbf{X}$, and $\widehat{\mathbf{W}\mathbf{S}}$ by IV using $\boldsymbol{\iota}$, $\mathbf{X}$, and $[\widehat{\mathbf{Z}}\quad \mathbf{W}_{0}\mathbf{X}]\widehat{\boldsymbol{\Pi}}$ as instruments, where $\widehat{\mathbf{Z}}=\mathbf{W}_{0}\left(\mathbf{I}-(\widehat{\boldsymbol{\pi}}_1^\top\widehat{\boldsymbol{\theta}}_{\text{2SLS}})\mathbf{W}_{0}\right)^{-1}\left\{\widehat{\alpha}_{\text{2SLS}}\boldsymbol{\iota}+\left(\widehat{\gamma}_{\text{2SLS}}\mathbf{I}+(\widehat{\boldsymbol{\pi}}_2^\top\widehat{\boldsymbol{\theta}}_{\text{2SLS}})\mathbf{W}_{0}\right)\mathbf{X}\right\}$. From the equation
    
    
    
$$\widehat{\alpha}_{\text{G3SLS}}\boldsymbol{\iota}+\widehat{\theta}_{\text{G3SLS}}\mathbf{W}\mathbf{S}+\widehat{\gamma}_{\text{G3SLS}}\mathbf{X},$$
    
    
    
get the parameters $[\widehat{\alpha}_{\text{G3SLS}},\widehat{\gamma}_{\text{G3SLS}},\widehat{\beta}_{\text{G3SLS}},\widehat{\delta}_{\text{G3SLS}}]^{\top}$ and the residuals $\widehat{\mathbf{v}}\equiv\mathbf{y}-\widehat{\alpha}_{\text{G3SLS}}\boldsymbol{\iota}-\widehat{\gamma}_{\text{G3SLS}}\mathbf{X}-\widehat{\beta}_{\text{G3SLS}}\mathbf{W}\mathbf{y}-\widehat{\delta}_{\text{G3SLS}}\mathbf{W}\mathbf{X}$.

In [6]:
frame create edges0
frame edges0: use ../data/stata/W0_sim.dta
netivreg g3sls y_endo x1 x2 x3 x4 (edges = edges0)





Network IV (G3SLS) Regression                         Number of obs =      400
                                                      Wald chi2(10)  =   822.26
                                                      Prob > chi2   =   0.0000
                                                      R-squared     =   0.8176
                                                      Root MSE      =    1.194
------------------------------------------------------------------------------
      y_endo | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
W_y          |
      y_endo |   .7059194   .0934719     7.55   0.000     .5221476    .8896911
-------------+----------------------------------------------------------------
W_x          |
          x1 |   .3464024   .1277675     2.71   0.007     .0952031    .5976017
          x2 |   .3280795   .0870187     3.77   0.000     .1569951    .4991639
          x3 |   

In [7]:
netivreg g3sls y_endo x1 x2 x3 x4 (edges = edges0), first


Projection of W on W0
------------------------------------------------------------------------------
             | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
W_y_endo     |
   W0_y_endo |   .9927161   .0110994    89.44   0.000     .9708939    1.014538
       W0_x1 |   .0038403   .0419735     0.09   0.927    -.0786823     .086363
       W0_x2 |  -.0030277   .0372046    -0.08   0.935    -.0761744     .070119
       W0_x3 |   .0010825   .0378073     0.03   0.977    -.0732492    .0754142
       W0_x4 |    .002885   .0664465     0.04   0.965    -.1277531     .133523
-------------+----------------------------------------------------------------
W_x1         |
   W0_y_endo |  -.0924276   .0041002   -22.54   0.000    -.1004889   -.0843663
       W0_x1 |   .8743972   .0155054    56.39   0.000     .8439126    .9048817
       W0_x2 |   .0074888   .0137437     0.54   0.586    -.0195322    .0345098

In [8]:
netivreg g3sls y_endo x1 x2 x3 x4 (edges = edges0), second



2SLS Regression                                       Number of obs =      400
                                                      Wald chi2(10)  =   839.20
                                                      Prob > chi2   =   0.0000
                                                      R-squared     =   0.8095
                                                      Root MSE      =    1.224
------------------------------------------------------------------------------
      y_endo | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
W_y          |
      y_endo |   .6568384   .1030623     6.37   0.000     .4542113    .8594655
-------------+----------------------------------------------------------------
W_x          |
          x1 |   .3798965   .1172551     3.24   0.001     .1493653    .6104278
          x2 |   .3556006   .0846906     4.20   0.000     .1890934    .5221079
          x3 |   .3

# 3. GMM

## 3.1. Assumptions

<font size="3">
    
We define the matrices $\mathbf{D}=[\mathbf{W}\mathbf{y}, \mathbf{W}\mathbf{X}, \widetilde{\mathbf{X}}]$ to be the matrix of regressors of the linear-in-means model and $\mathbf{Z}=[\mathbf{W}_{N,0}^{p}\mathbf{X},\mathbf{W}_{N,0}^{p-1}\mathbf{X},\dots,\mathbf{W}_{N,0}\mathbf{X}, \widetilde{\mathbf{X}}]$ to be the matrix of *instruments*.

1. No correlated effects: $\mathbb{E}[\mathbf{v}|\mathbf{X}, \mathbf{W}_0] = 0$    
2. The conditional probability $\mathcal{F}(\mathcal{G}, \mathbf{v}\mid \mathcal{G}_{0},\mathbf{X})$ is such that $\Pr(w_{i,j}>0|\mathcal{G}_{0},\mathbf{X})=\rho(w_{0;i,j},\mathcal{G}_{0},\mathbf{X},\mathbf{v})$
3. The matrix $\mathbb{E}[\sum_{i \in \mathcal{I}_{N}}\mathbf{z}_{i}\mathbf{d}_{i}^{\top}]$ is finite and has full column rank.

## 3.2. Estimation

<br>

<font size="3">

Standard linear GMM estimator:

$$\widehat{\boldsymbol{\psi}}_{\text{GMM}} =  [\mathbf{D}_{n}^{\top}\mathbf{Z}_{n}\mathbf{A}_{n} \mathbf{Z}_{n}^{\top}\mathbf{D}_{n}]^{-1} [\mathbf{D}_{n}^{\top}\mathbf{Z}_{n}\mathbf{A}_{n}\mathbf{Z}_{n}^{\top}\mathbf{y}_{n}]$$

where the full-rank weighting matrix $\mathbf{A}_{n}$ is assumed to converge in probability to $\mathbf{A}$.
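
The closed form above is straightforward to compute once $\mathbf{D}_{n}$, $\mathbf{Z}_{n}$, and $\mathbf{A}_{n}$ are available. The Python sketch below is a generic illustration rather than the `netivreg` implementation: the data are a hypothetical noiseless toy example with two regressors and three instruments, the weighting matrix defaults to the identity (an assumption), and exact rational arithmetic is used so that the estimator reproduces the true coefficients exactly.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [x - M[r][col] * p for x, p in zip(M[r], M[col])]
    return [row[-1] for row in M]


def linear_gmm(D, Z, y, A=None):
    """Compute [D'Z A Z'D]^{-1} [D'Z A Z'y]."""
    q = len(Z[0])
    if A is None:
        # Assumption: identity weighting matrix when none is supplied
        A = [[Fraction(int(i == j)) for j in range(q)] for i in range(q)]
    ZtD = matmul(transpose(Z), D)                       # q x p
    Zty = matmul(transpose(Z), [[yi] for yi in y])      # q x 1
    DtZA = matmul(matmul(transpose(D), Z), A)           # p x q
    lhs = matmul(DtZA, ZtD)                             # p x p
    rhs = [row[0] for row in matmul(DtZA, Zty)]
    return solve(lhs, rhs)


# Hypothetical toy data: regressors [1, x], instruments [1, x, x^2], no noise
x = [Fraction(k) for k in (1, 2, 3, 5)]
D = [[Fraction(1), xi] for xi in x]
Z = [[Fraction(1), xi, xi * xi] for xi in x]
psi_true = [Fraction(2), Fraction(1, 2)]
y = [psi_true[0] + psi_true[1] * xi for xi in x]

psi_hat = linear_gmm(D, Z, y)
print(psi_hat)  # [Fraction(2, 1), Fraction(1, 2)]
```

With noisy data the choice of $\mathbf{A}_{n}$ affects efficiency but not consistency, which is why the formula is stated for a general full-rank weighting matrix.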

In [9]:
netivreg gmm y_endo x1 x2 x3 x4 (edges = edges0)


Network IV (GMM) Regression                           Number of obs =      400
                                                      Wald chi2(10)  =  5414.09
                                                      Prob > chi2   =   0.0000
                                                      R-squared     =   0.8166
                                                      Root MSE      =    1.194
------------------------------------------------------------------------------
      y_endo | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
W_y          |
      y_endo |   .7039281    .076149     9.24   0.000     .5542141     .853642
-------------+----------------------------------------------------------------
W_x          |
          x1 |   .3493545   .0781754     4.47   0.000     .1956565    .5030524
          x2 |   .3245619   .0617699     5.25   0.000     .2031183    .4460055
          x3 |   .36

# Real Data

In [10]:
use ../data/stata/articles.dta, replace
describe


(Data on articles published in the aer, eca, jpe, & qje between 2000-2002)


Contains data from ../data/stata/articles.dta
 Observations:           729                  Data on articles published in
                                                the aer, eca, jpe, & qje between
                                                2000-2002
    Variables:            12                  12 Sep 2020 14:09
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
id              int     %9.0g                 Article unique identifier
lcitations      float   %9.0g                 Log of total citations 8 years
                                                post publication
editor          int     %8.0g                 1 if at least one of the article's
                                      

In [11]:
gen citations = exp(lcitations)
tabulate journal year, summarize(citations)




          Means, Standard Deviations and Frequencies of citations

Journal=ae |
r,eca,jpe, |      Year=2000,2001,2002
       qje |      2000       2001       2002 |     Total
-----------+---------------------------------+----------
       aer | 52.417721  54.931821  48.652174 | 51.934364
           | 73.653308  90.712233  49.893607 | 72.800192
           |        79         88         92 |       259
-----------+---------------------------------+----------
       eca | 49.627451  43.328125  37.177779 | 42.195122
           | 52.045351  51.688336  43.565161 | 48.397466
           |        51         64         90 |       205
-----------+---------------------------------+----------
       jpe | 34.530612  32.863637  46.666667 | 38.141844
           | 26.257619  30.229811  35.717172 | 31.362075
           |        49         44         48 |       141
-----------+---------------------------------+----------
       qje | 59.380952  72.285714    102.475 | 77.653226
           |  74.70297  

In [12]:
summarize editor diff_gender isolated n_pages n_authors n_references


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      editor |        729    .0452675     .208033          0          1
 diff_gender |        729    .1303155    .3368814          0          1
    isolated |        729    .5281207    .4995513          0          1
     n_pages |        729    25.15775    11.53631          3         76
   n_authors |        729    1.888889    .7486251          1          5
-------------+---------------------------------------------------------
n_references |        729    31.40329    17.84755          0        177


In [13]:
frame reset
frame create edges
frame edges: use ../data/stata/edges.dta
frame edges: list in 1/5, table 




(Co-authorship network among articles published in the aer, eca, jpe, & qje betw
> e)


     +-----------------+
     | source   target |
     |-----------------|
  1. |      4      472 |
  2. |      5      221 |
  3. |      5      463 |
  4. |      5      478 |
  5. |      5      665 |
     +-----------------+


In [14]:
frame create edges0
frame edges0: use ../data/stata/edges0.dta
frame edges0: list in 1/5, table 



(Alumni network among articles published in the aer, eca, jpe, & qje between 200
> 0)


     +-----------------+
     | source   target |
     |-----------------|
  1. |      2      482 |
  2. |      2      534 |
  3. |      4      129 |
  4. |      4      136 |
  5. |      4      407 |
     +-----------------+


In [15]:
use ../data/stata/articles.dta
tabulate journal, g(journal)
tabulate year, g(year)


(Data on articles published in the aer, eca, jpe, & qje between 2000-2002)


Journal=aer |
,eca,jpe,qj |
          e |      Freq.     Percent        Cum.
------------+-----------------------------------
        aer |        259       35.53       35.53
        eca |        205       28.12       63.65
        jpe |        141       19.34       82.99
        qje |        124       17.01      100.00
------------+-----------------------------------
      Total |        729      100.00


Year=2000,2 |
   001,2002 |      Freq.     Percent        Cum.
------------+-----------------------------------
       2000 |        221       30.32       30.32
       2001 |        238       32.65       62.96
       2002 |        270       37.04      100.00
------------+-----------------------------------
      Total |        729      100.00


In [16]:
netivreg g3sls lcitations editor diff_gender n_pages n_authors n_references ///
                          isolated journal2-journal4 year2-year3 (edges = edges0), ///
                          wx(editor diff_gender) cluster(c_coauthor)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 19, in calc_netivreg
  File "/Users/pablo/opt/anaconda3/lib/python3.9/site-packages/networkx/linalg/g
> raphmatrix.py", line 157, in adjacency_matrix
    return nx.to_scipy_sparse_matrix(G, nodelist=nodelist, dtype=dtype, weight=w
> eight)
  File "/Users/pablo/opt/anaconda3/lib/python3.9/site-packages/networkx/convert_
> matrix.py", line 868, in to_scipy_sparse_matrix
    raise nx.NetworkXError(f"Node {n} in nodelist is not in G")
networkx.exception.NetworkXError: Node 1 in nodelist is not in G


r(7102);



