# <span style="color:darkblue">netivreg: Estimation of Peer Effects in Endogenous Social Networks</span> 

Pablo Estrada, Juan Estrada, Kim P. Huynh, David T. Jacho-Chavez and Leonardo Sanchez-Aragon

<font size="3"> 
    
The command **netivreg** implements the following estimators for endogenous linear-in-means models:
    
- Generalized Two-Stage Least Squares (G2SLS)
- Generalized Three-Stage Least Squares (G3SLS)
- Generalized Method of Moments (GMM)

# Start Stata from Python

In [None]:
import stata_setup
#stata_setup.config('/Applications/Stata','mp')   # for macOS
stata_setup.config('/usr/local/stata17','mp')   # for ubuntu

# Simulated Data

<font size="3"> 
    
We use the following version of the linear-in-means model:

$$
\begin{align}
y_{i}=&1+0.7\sum_{j=1}^n\overline{w}_{ij}y_{j}+0.33\sum_{j=1}^n\overline{w}_{ij}x_{1j}+0.33\sum_{j=1}^n\overline{w}_{ij}x_{2j}+0.33\sum_{j=1}^n\overline{w}_{ij}x_{3j}\nonumber\\ 
 &+0.33x_{1i}+0.33x_{2i}+0.33x_{3i}+v_{i}\text{,}\label{sim_equation}
\end{align}
$$

where the $x_{ki}$, $k=1,2,3$, are i.i.d. draws from $N(0,3)$ and are independent of each other. The weights $\overline{w}_{ij}$ are the entries of the row-normalized adjacency matrix $\mathbf{W}=[w_{ij}]$, i.e., $\overline{w}_{ij}=w_{ij}/\sum_{l=1}^n w_{il}$. 
    
The adjacency matrix $\mathbf{W}$ is generated from $\mathbf{W}_0=[w_{0;ij}]$, which in turn is generated from a random graph with a density of 0.01.
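
The data-generating process above can be sketched in Python, the host language of this notebook. This is a minimal illustration, not the code that produced `data_sim.dta`: the sample size, the seed, the reading of $N(0,3)$ as a variance of 3, the standard normal error, and the helper names `row_normalize` and `simulate_outcome` are all assumptions. The step that derives $\mathbf{W}$ from $\mathbf{W}_0$ is left out because it is not specified here.

```python
import numpy as np

def row_normalize(W):
    """Divide each row of W by its row sum; rows without links stay at zero."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

def simulate_outcome(W_bar, X, v, alpha=1.0, beta=0.7, delta=0.33, gamma=0.33):
    """Solve y = alpha + beta*W_bar y + delta*sum_k(W_bar x_k) + gamma*sum_k(x_k) + v for y."""
    n = W_bar.shape[0]
    rhs = alpha + delta * (W_bar @ X).sum(axis=1) + gamma * X.sum(axis=1) + v
    return np.linalg.solve(np.eye(n) - beta * W_bar, rhs)

rng = np.random.default_rng(0)                   # assumed seed
n = 400                                          # assumed sample size
W = (rng.random((n, n)) < 0.01).astype(float)    # random graph with density 0.01
np.fill_diagonal(W, 0.0)                         # no self-links
W_bar = row_normalize(W)
X = rng.normal(0.0, np.sqrt(3.0), size=(n, 3))   # assumes 3 is the variance
v = rng.normal(size=n)                           # assumed standard normal error
y = simulate_outcome(W_bar, X, v)
```

Because the rows of the normalized matrix sum to at most one and the peer-effect coefficient is 0.7, the matrix $\mathbf{I}-0.7\overline{\mathbf{W}}$ is invertible, so the outcome vector is well defined.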

In [None]:
%%stata
use ../data/stata/data_sim.dta, clear
format y_endo y_exo x1 x2 x3 x4 %9.3f
list in 1/5, table 

In [None]:
%%stata
use ../data/stata/W_sim.dta, clear
list in 113/117, table 

In [None]:
%%stata
use ../data/stata/W0_sim.dta, clear
list in 113/117, table 

# 1. G2SLS

## 1.1. Assumptions

<br>

<font size="3">

1. No correlated effects: $\mathbb{E}[\mathbf{v}|\mathbf{X}, \mathbf{W}] = 0$
    
2. Relevance: $\mathbf{I}$, $\mathbf{W}$, and $\mathbf{W}^{2}$ are linearly independent, and $\gamma\beta + \delta \neq 0$

## 1.2. Estimation

<br>

<font size="3">

- **Step 1:**
    
Regress $\mathbf{y}$ on $\boldsymbol{\iota}$, $\mathbf{X}$, $\mathbf{W}\mathbf{y}$, and $\mathbf{W}\mathbf{X}$ by 2SLS using $\boldsymbol{\iota}$, $\mathbf{X}$, $\mathbf{W}^{2}\mathbf{X}$, and $\mathbf{W}\mathbf{X}$ as instruments. 
    
- **Step 2:**

Regress $\mathbf{y}$ on $\boldsymbol{\iota}$, $\mathbf{X}$, and $[\widehat{\mathbf{W}\mathbf{y}} \quad \widehat{\mathbf{W}\mathbf{X}}]$ by IV using $\boldsymbol{\iota}$, $\mathbf{X}$, and $[\widehat{\mathbf{Z}} \quad \widehat{\mathbf{W}\mathbf{X}}]$ as instruments, where $\widehat{\mathbf{Z}}$ is the optimal instrument.
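
Step 1 can be illustrated with a short NumPy sketch. It is not the implementation inside **netivreg**: the simulated network, sample size, seed, error distribution, and the helper name `two_stage_least_squares` are assumptions made for illustration, and the optimal-instrument refinement of Step 2 is not shown.

```python
import numpy as np

def two_stage_least_squares(y, D, Z):
    """2SLS: project the regressors D on the instruments Z, then regress y on the projection."""
    D_hat = Z @ np.linalg.lstsq(Z, D, rcond=None)[0]
    return np.linalg.lstsq(D_hat, y, rcond=None)[0]

rng = np.random.default_rng(1)                   # assumed seed
n = 500                                          # assumed sample size
W = (rng.random((n, n)) < 0.02).astype(float)    # assumed exogenous random graph
np.fill_diagonal(W, 0.0)
deg = W.sum(axis=1, keepdims=True)
W = np.divide(W, deg, out=np.zeros_like(W), where=deg > 0)   # row-normalize

X = rng.normal(0.0, np.sqrt(3.0), size=(n, 3))   # assumes 3 is the variance
v = rng.normal(size=n)                           # assumed standard normal error
iota = np.ones((n, 1))

# True coefficients, ordered like the columns of D: [alpha, gamma (3), beta, delta (3)]
true_psi = np.array([1.0, 0.33, 0.33, 0.33, 0.7, 0.33, 0.33, 0.33])
rhs = 1.0 + 0.33 * X.sum(axis=1) + 0.33 * (W @ X).sum(axis=1) + v
y = np.linalg.solve(np.eye(n) - 0.7 * W, rhs)    # reduced form of the model

D = np.hstack([iota, X, (W @ y)[:, None], W @ X])   # regressors: iota, X, Wy, WX
Z = np.hstack([iota, X, W @ W @ X, W @ X])          # instruments: iota, X, W^2 X, WX
psi_hat = two_stage_least_squares(y, D, Z)
```

The entries of `psi_hat` follow the column order of `D`, so the fifth entry is the estimated peer effect on $\mathbf{W}\mathbf{y}$. The only excluded instruments are the columns of $\mathbf{W}^{2}\mathbf{X}$, which is why the relevance condition involves $\mathbf{W}^{2}$.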

In [None]:
%%stata
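* Load the simulated data and read the network W into its own frame
use ../data/stata/data_sim.dta, clear
frame create edges
frame edges: use ../data/stata/W_sim.dta
* Same network on both sides of (edges = edges): W is used as its own instrument
netivreg g3sls y_exo x1 x2 x3 x4 (edges = edges) 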

In [None]:
%%stata
netivreg g3sls y_endo x1 x2 x3 x4 (edges = edges) 

# 2. G3SLS

## 2.1. Assumptions

<br>

<font size="3">

1. No correlated effects: $\mathbb{E}[\mathbf{v}|\mathbf{X}, \mathbf{W}_0] = 0$
    
2. Relevance: $\mathbf{I}$, $\mathbf{W}_{0}$, and $\mathbf{W}_{0}^{2}$ are linearly independent, and $\gamma\boldsymbol{\pi}_{1}^{\top}\theta+\boldsymbol{\pi}_{2}^{\top}\theta \neq 0$

## 2.2. Estimation

<br>

<font size="3">

- **Step 1:**

Regress $\mathbf{W}\mathbf{S}$ on $\mathbf{W}_0\mathbf{S}$ by OLS. Keep the coefficient matrix $\widehat{\boldsymbol{\Pi}}$, the fitted values $\widehat{\mathbf{W}\mathbf{S}}$, and the residuals $\widehat{\mathbf{U}}$.
    
    
    
- **Step 2:**
    
    
Regress $\mathbf{y}$ on $\boldsymbol{\iota}$, $\mathbf{X}$, $\mathbf{W}_{0}\mathbf{y}$, and $\mathbf{W}_{0}\mathbf{X}$ by 2SLS using $\boldsymbol{\iota}$, $\mathbf{X}$, $\mathbf{W}_{0}^{2}\mathbf{X}$, and $\mathbf{W}_{0}\mathbf{X}$ as instruments. From the equation 
    
    
$$\widehat{\alpha}_{\text{2SLS}}\boldsymbol{\iota}+\widehat{\theta}^\ast_{\text{2SLS}}\mathbf{W}_{0}\mathbf{S}+\widehat{\gamma}_{\text{2SLS}}\mathbf{X},$$
    
    
    
get the parameters $[\widehat{\alpha}_{\text{2SLS}},\widehat{\gamma}_{\text{2SLS}},\widehat{\boldsymbol{\theta}}_{\text{2SLS}}]^{\top}$, where $\widehat{\boldsymbol{\theta}}_{\text{2SLS}} = \widehat{\boldsymbol{\Pi}}^{-1}\widehat{\theta}^\ast_{\text{2SLS}}$.
    
    
- **Step 3:**
    
    

Regress $\mathbf{y}$ on $\boldsymbol{\iota}$, $\mathbf{X}$, and $\widehat{\mathbf{W}\mathbf{S}}$ by IV using $\boldsymbol{\iota}$, $\mathbf{X}$, and $[\widehat{\mathbf{Z}}\quad \mathbf{W}_{0}\mathbf{X}]\widehat{\boldsymbol{\Pi}}$ as instruments, where $\widehat{\mathbf{Z}}=\mathbf{W}_{0}\left(\mathbf{I}-(\widehat{\boldsymbol{\pi}}_1^\top\widehat{\boldsymbol{\theta}}_{\text{2SLS}})\mathbf{W}_{0}\right)^{-1}\left\{\widehat{\alpha}_{\text{2SLS}}\boldsymbol{\iota}+\left(\widehat{\gamma}_{\text{2SLS}}\mathbf{I}+(\widehat{\boldsymbol{\pi}}_2^\top\widehat{\boldsymbol{\theta}}_{\text{2SLS}})\mathbf{W}_{0}\right)\mathbf{X}\right\}$. From the equation
    
    
    
$$\widehat{\alpha}_{\text{G3SLS}}\boldsymbol{\iota}+\widehat{\theta}_{\text{G3SLS}}\mathbf{W}\mathbf{S}+\widehat{\gamma}_{\text{G3SLS}}\mathbf{X},$$
    
    
    
get the parameters $[\widehat{\alpha}_{\text{G3SLS}},\widehat{\gamma}_{\text{G3SLS}},\widehat{\beta}_{\text{G3SLS}},\widehat{\delta}_{\text{G3SLS}}]^{\top}$ and the residuals $\widehat{\mathbf{v}}\equiv\mathbf{y}-\widehat{\alpha}_{\text{G3SLS}}\boldsymbol{\iota}-\widehat{\gamma}_{\text{G3SLS}}\mathbf{X}-\widehat{\beta}_{\text{G3SLS}}\mathbf{W}\mathbf{y}-\widehat{\delta}_{\text{G3SLS}}\mathbf{W}\mathbf{X}$.
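
The OLS projection in Step 1 can be sketched as follows. This is only an illustration under stated assumptions: the way $\mathbf{W}$ is built from $\mathbf{W}_0$ (keeping 80% of the links and adding a few new ones), the random placeholder used for $\mathbf{S}$, the unnormalized adjacency matrices, and the helper name `first_stage` are all hypothetical and are not taken from **netivreg**.

```python
import numpy as np

def first_stage(WS, W0S):
    """OLS of each column of WS on W0S; returns (Pi_hat, fitted values, residuals)."""
    Pi_hat = np.linalg.lstsq(W0S, WS, rcond=None)[0]
    fitted = W0S @ Pi_hat
    return Pi_hat, fitted, WS - fitted

rng = np.random.default_rng(2)                     # assumed seed
n = 300                                            # assumed sample size
W0 = (rng.random((n, n)) < 0.03).astype(float)     # assumed exogenous network
keep = rng.random((n, n)) < 0.8                    # assumed: W keeps 80% of the W0 links
extra = rng.random((n, n)) < 0.005                 # assumed: plus a few new links
W = np.clip(W0 * keep + extra, 0.0, 1.0)
for M in (W0, W):
    np.fill_diagonal(M, 0.0)                       # no self-links

S = rng.normal(size=(n, 2))                        # placeholder for the columns of S
Pi_hat, WS_hat, U_hat = first_stage(W @ S, W0 @ S)
```

By construction the residuals `U_hat` are orthogonal to the columns of $\mathbf{W}_0\mathbf{S}$, and `WS_hat` equals $\mathbf{W}_0\mathbf{S}\widehat{\boldsymbol{\Pi}}$, which is the quantity carried into Step 3.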

In [None]:
%%stata
frame create edges0
frame edges0: use ../data/stata/W0_sim.dta
netivreg g3sls y_endo x1 x2 x3 x4 (edges = edges0)

In [None]:
%%stata
netivreg g3sls y_endo x1 x2 x3 x4 (edges = edges0), first

In [None]:
%%stata
netivreg g3sls y_endo x1 x2 x3 x4 (edges = edges0), second

# 3. GMM

## 3.1. Assumptions

<br>

<font size="3">
    
We define $\mathbf{D}=[\mathbf{W}\mathbf{y}, \mathbf{W}\mathbf{X}, \widetilde{\mathbf{X}}]$ as the matrix of regressors of the linear-in-means model and $\mathbf{Z}=[\mathbf{W}_{0}^{p}\mathbf{X},\mathbf{W}_{0}^{p-1}\mathbf{X},\dots,\mathbf{W}_{0}\mathbf{X}, \widetilde{\mathbf{X}}]$ as the matrix of *instruments*.

1. No correlated effects: $\mathbb{E}[\mathbf{v}|\mathbf{X}, \mathbf{W}_0] = 0$    
2. The conditional probability $\mathcal{F}(\mathcal{G}, \mathbf{v}\mid \mathcal{G}_{0},\mathbf{X})$ is such that $\Pr(w_{i,j}>0|\mathcal{G}_{0},\mathbf{X})=\rho(w_{0;i,j},\mathcal{G}_{0},\mathbf{X},\mathbf{v})$.
3. The matrix $\mathbb{E}[\sum_{i \in \mathcal{I}_{N}}\mathbf{z}_{i}\mathbf{d}_{i}^{\top}]$ is finite and has full column rank.

## 3.2. Estimation

<br>

<font size="3">

Standard linear GMM estimator:

$$\widehat{\boldsymbol{\psi}}_{\text{GMM}} =  [\mathbf{D}_{n}^{\top}\mathbf{Z}_{n}\mathbf{A}_{n} \mathbf{Z}_{n}^{\top}\mathbf{D}_{n}]^{-1} [\mathbf{D}_{n}^{\top}\mathbf{Z}_{n}\mathbf{A}_{n}\mathbf{Z}_{n}^{\top}\mathbf{y}_{n}]$$

where the full-rank weighting matrix $\mathbf{A}_{n}$ is assumed to converge in probability to $\mathbf{A}$.
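
The closed-form estimator above translates directly into a few lines of NumPy. The sketch below is illustrative only: the helper name `linear_gmm`, the default weighting matrix $(\mathbf{Z}^{\top}\mathbf{Z})^{-1}$ (which reduces the estimator to 2SLS), and the generic simulated instruments are assumptions, not the choices made by **netivreg**.

```python
import numpy as np

def linear_gmm(y, D, Z, A=None):
    """psi = [D'Z A Z'D]^{-1} [D'Z A Z'y]; A defaults to (Z'Z)^{-1}, which gives 2SLS."""
    if A is None:
        A = np.linalg.inv(Z.T @ Z)
    DZ = D.T @ Z
    return np.linalg.solve(DZ @ A @ DZ.T, DZ @ A @ (Z.T @ y))

rng = np.random.default_rng(3)                     # assumed seed
n = 1000                                           # assumed sample size
Z_excl = rng.normal(size=(n, 4))                   # placeholder excluded instruments
u = rng.normal(size=n)                             # structural error
# Endogenous regressor: driven by the instruments and correlated with u
d = Z_excl @ np.array([1.0, 0.5, -0.5, 0.25]) + 0.8 * u + rng.normal(size=n)

D = np.column_stack([np.ones(n), d])               # regressors
Z = np.column_stack([np.ones(n), Z_excl])          # instruments, including the constant
y = D @ np.array([1.0, 0.7]) + u

psi_2sls = linear_gmm(y, D, Z)                             # A = (Z'Z)^{-1}
psi_identity = linear_gmm(y, D, Z, A=np.eye(Z.shape[1]))   # A = I
```

Any full-rank choice of $\mathbf{A}_{n}$ yields a consistent estimator under the assumptions above; the choice affects efficiency, which is why the two calls generally return slightly different estimates in finite samples.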

In [None]:
%%stata
netivreg gmm y_endo x1 x2 x3 x4 (edges = edges0)

# Real Data

In [None]:
%%stata
use data/articles.dta, clear
describe

In [None]:
%%stata
gen citations = exp(lcitations)
tabulate journal year, summarize(citations)

In [None]:
%%stata
summarize editor diff_gender isolated n_pages n_authors n_references

In [None]:
%%stata
frame reset
frame create edges
frame edges: use data/edges.dta
frame edges: list in 1/5, table 

In [None]:
%%stata
frame create edges0
frame edges0: use data/edges0.dta
frame edges0: list in 1/5, table 

In [None]:
%%stata
use data/articles.dta
tabulate journal, g(journal)
tabulate year, g(year)

## Real Data Analysis 

<br>

<font size="3">
    
    
$$
\begin{align}
  y_{i,r,t} ={}&\alpha+\beta\sum_{j \neq i}w_{i,j}y_{j,r,t}+\sum_{j \neq i}w_{i,j}\mathbf{x}_{j,r,t}^{\top}\boldsymbol{\delta}+ \mathbf{x}_{i,r,t}^{\top}\boldsymbol{\gamma}\nonumber\\
  & +\lambda_{r}+\lambda_{t}+v_{i,r,t}\text{,}
  \label{modelo}
\end{align}
$$

- $y_{i,r,t}$ represents the natural logarithm of article $i$'s citations eight years post publication (lcitations) in journal $r$ in year $t$.
- $\mathbf{x}_{j,r,t}$ includes diff_gender and editor of article $j$ in journal $r$ in year $t$.
- $\mathbf{x}_{i,r,t}$ includes the same characteristics for article $i$ plus its number of pages (n_pages), authors (n_authors), bibliographic references (n_references), and whether or not it shares a co-author relationship with other articles (isolated).
- Fixed effects include journal ($\lambda_{r}$) and year ($\lambda_t$).
- The co-authors' network ($\mathbf{W}$) is endogenous, while the alumni network ($\mathbf{W}_0$) is predetermined and is therefore assumed to be exogenous.

In [None]:
%%stata
use ../data/stata/articles.dta, clear
describe

In [None]:
%%stata
gen citations = exp(lcitations)
tabulate journal year, summarize(citations)

In [None]:
%%stata
summarize editor diff_gender isolated n_pages n_authors n_references

In [None]:
%%stata
frame create edges1
frame edges1: use ../data/stata/edges.dta
frame edges1: list in 1/5, table 

In [None]:
%%stata
frame create edges2
frame edges2: use ../data/stata/edges0.dta
frame edges2: list in 1/5, table 

In [None]:
%%stata
use ../data/stata/articles.dta, clear
tabulate journal, g(journal)
tabulate year, g(year)

In [None]:
%%stata
netivreg g3sls lcitations editor diff_gender n_pages n_authors n_references ///
isolated journal2-journal4 year2-year3 (edges1 = edges2), ///
wx(editor diff_gender) cluster(c_coauthor)

In [None]:
%%stata
netivreg gmm lcitations editor diff_gender n_pages n_authors n_references /// 
isolated journal2-journal4 year2-year3 (edges1 = edges2), wx(editor diff_gender) ///
wz(editor diff_gender n_pages n_authors n_references isolated) maxp(4)