
**"The dissertation will include only the chapter on co-worker effects. It should be a stand-alone piece of work with a clearly specified contribution to the literature. It should simply acknowledge that the clustering algorithm is developed in a different paper, and it seems sufficient to briefly summarize it."**

I acknowledge that the hierarchical clustering (HC) algorithm (defined in **Section 4.1**) is developed in Hagedorn, Manovskii and Xin (2021) (henceforth, the ML paper). I propose the following plan to clarify the methodological contribution:

1. In **"Section 1. Introduction - related literature"**, I will use a dedicated paragraph to discuss the relationship between the method used in the dissertation and the ML paper. Specifically:
  
  * Points taken from the ML paper:
    * The vision: similar workers get similar wages in similar firms.
    * The idea of using hierarchical clustering (HC) to cluster workers in a hierarchical order based on their similarity. 
    * The insight of integrating a fast recommender system algorithm with HC to achieve better computational performance.

  * Modifications/innovations made in the dissertation to apply HC to the framework with coworker effects:
    * The dissertation uses a different similarity metric (the "average point-in-time wage differences"; see also **Section 3.1**), while the ML paper uses "M-similarity" (i.e. two workers are similar if and only if they are observed meeting in two different firms, getting similar within-firm wages, at two different levels across the two firms).
    * The dissertation uses a different design of HC. The goal is to cluster workers only, while the goal of the ML paper is to bi-cluster both workers and firms.
    * The dissertation uses a different identification assumption to incorporate coworker effects, requiring wages to be monotonically increasing in worker productivity. It establishes a different set of theoretical results for identification.
    
  
2. I plan to use **Section 3.1** to document these modifications/innovations for the worker clustering stage in detail, and **Section 4.2** to document the specific usage of the HC algorithm and its relationship to the ML paper in detail.


**"The only innovation here seems to be in using Graph Neural Network instead of a different graph embedding method in the other paper. To assess the implications of this innovation, it would be good to document whether and in which aspects (speed, accuracy...) this method improves over the previous one and by how much."**

3. I developed a new machine learning method from scratch to cluster workers using a graph convolutional network (GCN) with the Amazon DGL library. The method independently solves the worker clustering problem (defined in **Section 4.1**). I plan to document the GCN algorithm in detail in **Section 4.3**.

5. I integrate GCN with HC to achieve better computational performance, following the ML paper. I plan to document the integrated algorithm (GCN+HC) in detail in **Section 4.4**.

6. I plan to use the following evaluation metrics to assess the performance of the proposed methods (GCN and GCN+HC) relative to the previous one (HC), and to document the simulation results in **Section 5.3**:

  * **Metric 1**: The figure of the confusion matrix of the clustering (i.e. the $x_i$-$\hat{x}_i$ figure; the closer to the 45-degree line, the better).

  * **Metric 2**: The figure of the estimated spillover function implied by the worker clustering vs. the ground truth. I report the root mean squared error (RMSE) of the estimated spillover (as % of the standard deviation of total log wages).

  * **Metric 3**: RMSE of out-of-sample wage prediction for job switchers on the test set (as % of standard deviation of total log wages).

  * **Metric 4**: Algorithm Runtime (seconds).


**The true contribution seems to be in the second step, i.e. measuring co-worker effects. To help frame the paper and make its contribution clear, it would seem important to provide a clear economic rationalization of what is wrong with the existing methods and how exactly the new method overcomes those limitations.**


7. The dissertation develops a method that measures heterogeneous and asymmetric coworker spillovers. I will propose the method in **Section 3.2**.

8. I show that existing methods relying on additive worker and firm fixed effects are subject to a misspecification bias in the presence of wage complementarities. I will present the economic rationalization in **Appendix A3.1**, illustrate the bias with a simple 2x2 toy example in **Appendix A3.2**, and assess the size of the bias with a calibrated simulation in **Appendix A3.3**.

**It would be desirable that the design of numerical experiments documenting the performance of alternative methods reflects the relevant features of the data, at least approximately.**

9. In **Section 5**, I propose a specific plan to assess the performance of the proposed methods (HC, GCN, GCN+HC) relative to that of the existing methods, using four evaluation metrics and empirically relevant simulations.



# OUTLINE: MEASURING THE EFFECT OF COWORKERS ON WAGES
 
## 1. Introduction  

* Motivation
 
* Challenges

  * **Selection bias**: coworker spillover confounded by unobserved worker and firm characteristics.
  * **Wage complementarities**: estimates based on additive worker and firm fixed effects are biased (selection bias -> misspecification bias).
  * **Heterogeneous spillovers**: it is difficult to estimate heterogeneous coworker effects between workers who differ in unobserved characteristics.
 
* This paper:

  * Identifies unobserved worker characteristics by clustering workers with similar productive attributes.
  * Estimates the spillover effects of coworkers accounting for sorting and complementarities in wages, based on the clustering.
  * Measures asymmetric and heterogeneous spillover effects based on unobserved coworker characteristics.

* **Related literature**:

  * **Literature 1**: Identifying unobserved heterogeneity in sorting and wage complementarities:
    * Relationship with Hagedorn, Manovskii, and Xin ('21) 
    * Relationship with the "grouped fixed effects" by Bonhomme, Lamadon and Manresa ('21)
  * **Literature 2**: Coworker effects based on unobserved worker heterogeneity.
    * Homogeneous coworker effects: Arcidiacono et al. ('12), Cornelissen et al. ('17)
    * Asymmetric spillover: Herkenhoff et al. ('18), Jarosch et al. ('19)

## 2. Econometric Framework 

#### 2.1 Environment (Matched Employer-Employee Data)

* For worker $i\in [N]\equiv\{1,...,N\}$, firm  $f\in[M]$, time $t\in[T]$: match indicator $m_{ift}\in\{0,1\}$, wages $w_{ift}>0$ when $m_{ift}=1$, set of coworkers ${\bf p}_{ift}\in [N]^{n}$; worker $i$ has $n\equiv |{\bf p}_{ift}|\ge 1$ coworkers in $f$ at $t$.

* Worker types $x_i\in [K]$, firm types $y_f$, coworker types $\mathbf{z}_{\mathbf{p}_{ift}}\equiv \{x_j\}_{j\in\mathbf{p}_{ift}} \in [K]^{n}$.

* Sorting and complementarities in wages (backed by a principal-agent model proposed in **Appendix A1**)

$$m_{ift} \sim d(x_i,y_f),\ \ w_{ift} \sim  g_n(x_i,y_f;\mathbf{z}_{\mathbf{p}_{ift}})$$


* The goal: estimate counterfactual effects of coworkers on wages

$$\mathbb{E}w_{ift}(\mathbf{z}) = g_n\big(x_i,y_f, \text{do}(\mathbf{z}_{\mathbf{p}_{ift}}=\mathbf{z})\big)$$

#### 2.2 Discussions

* Link with methods measuring unobserved heterogeneity, complementarities, and coworker effects in wages.


## 3. Method 

### 3.1 Identification (i.e. with  known distribution of the data)

#### 3.1.1 Identifying the types of workers 

* **Average wage differences** ("point-in-time comparison"):
$$D_{i,j} = \underset{{f\in[M],t\in[T]}}{\text{mean}}\  m_{ift} \  m_{jft} \  (w_{ift} - w_{jft}) $$

* **Assignment**: Assign $i,j$ into the same cluster $c_i=c_j$ if and only if $|D_{i,j}|<\kappa$.

* **Assumption (A1) "Single-crossing"**: $\frac{\partial w(x,y)}{\partial x}>0$; **"Homophily"**: $\mathbb{E}\big[\sum_{ft} m_{ift}\ m_{jft}\big]>\frac{\log |{\bf x}_k|}{|{\bf x}_k|}$, $\forall\ i,j\in {\bf x}_k$.

* **Proposition (P1) "Identification"**: $\forall\ \epsilon, \ \kappa>0, \exists\ \tau\ \ \text{s.t.}\ \ \forall\ \ N>0,\ i,j\in [N], \ T>\tau: $
$$\mathbb{P}\big[x_i=x_j \big|\ |D_{i,j}|<\kappa\big]>1-\epsilon,\ \ \mathbb{P}\big[c_i=c_j\ \big|\ x_i=x_j\big]=1-O\big(|{\bf x}_k|^{-1}\big).$$
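
As a rough illustration of the point-in-time comparison and the threshold rule above, the sketch below computes $D_{i,j}$ from a long-format list of employment records. It assumes that "mean" averages over the firm-period cells in which both workers are employed together (the formula could also be read as averaging over all cells), and the names `average_wage_differences`, `similar_pairs`, and the toy `records` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def average_wage_differences(records):
    """records: iterable of (worker, firm, period, wage) with m_ift = 1.

    Returns {(i, j): D_ij} for i < j, averaging w_ift - w_jft over the
    firm-period cells where both workers are employed (an assumption).
    """
    cells = defaultdict(list)
    for i, f, t, w in records:
        cells[(f, t)].append((i, w))

    total = defaultdict(float)
    count = defaultdict(int)
    for members in cells.values():
        # sorted() orders by worker id, so every pair is stored with i < j
        for (i, wi), (j, wj) in combinations(sorted(members), 2):
            total[(i, j)] += wi - wj
            count[(i, j)] += 1
    return {pair: total[pair] / count[pair] for pair in total}


def similar_pairs(D, kappa):
    """Pairs that satisfy |D_ij| < kappa."""
    return {pair for pair, d in D.items() if abs(d) < kappa}


# Toy data: workers 0 and 1 earn nearly the same wage in two firms,
# worker 2 earns much more in firm "A".
records = [
    (0, "A", 1, 1.00), (1, "A", 1, 1.02), (2, "A", 1, 1.50),
    (0, "B", 2, 1.10), (1, "B", 2, 1.08),
]
D = average_wage_differences(records)
close = similar_pairs(D, kappa=0.05)
```

In this toy example only workers 0 and 1 pass the threshold; pairs that never share a firm-period simply have no entry in `D`.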





#### 3.1.2 Identifying the causal effect of coworkers

* **Assumption (A2)**: Conditional unconfoundedness:

$$w_{ift}(\mathbf{z}) \perp \mathbf{z}\ |\ x_i, y_f.$$

* **Assumption (A3)**: Separable complementarity and spillover:

$$g_n(x_i,y_f;\mathbf{z}_{\mathbf{p}_{ift}})= w(x_i,y_f) +\frac{1}{n} \sum_{j\in \mathbf{p}_{ift}} s(x_i, x_j),$$

$$\underset{{j\in[N],f\in[M],t\in[T]}}{\text{mean}}\  m_{ift}\ m_{jft} \ {s}(x_i,x_j)=0.$$

* **Proposition (P2) "Causal identification"**: the spillover function $s(x,x')$ is identified.

### 3.2 Estimation

##### 3.2.1 **Step 1. Clustering**

* Use **ML algorithms** to group similar workers $\hat{x}_i\in [K]$.

##### 3.2.2 **Step 2. Estimation**

* Estimate the **complementarities** and **coworker spillover** in wages using a grouped fixed effects (GFE) estimator based on the worker clustering: 

$$\hat{w}(\hat{x}_i, \hat{y}_f),\ \ \hat{s}(\hat{x}_i, \hat{x}_j)$$

* Estimate the **average spillover**

$$\hat{a}(\hat{x}_j) =  \underset{{i\in[N],f\in[M],t\in[T]}}{\text{mean}}\   m_{ift}\ m_{jft} \ \hat{s}(\hat{x}_i,\hat{x}_j)$$
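
A minimal sketch of the averaging step, under the assumption that the mean is taken over focal workers $i$ who share a firm-period with a coworker of a given estimated type. The function name `average_spillover` and the toy inputs are hypothetical, and $\hat{s}$ is taken as already estimated.

```python
from collections import defaultdict


def average_spillover(coworker_pairs, x_hat, s_hat):
    """coworker_pairs: iterable of (i, j), one entry per firm-period in which
    worker i has coworker j (m_ift * m_jft = 1).
    x_hat: dict worker -> estimated type.
    s_hat: dict (type of i, type of j) -> estimated spillover.

    Returns {k: mean of s_hat over pairs whose coworker j has type k}.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for i, j in coworker_pairs:
        k = x_hat[j]
        total[k] += s_hat[(x_hat[i], k)]
        count[k] += 1
    return {k: total[k] / count[k] for k in total}


# Toy example with two types: type-1 coworkers raise the wages of type-0 workers.
x_hat = {0: 0, 1: 1, 2: 1}
s_hat = {(0, 0): 0.0, (0, 1): 0.04, (1, 0): -0.02, (1, 1): 0.0}
pairs = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]
a_hat = average_spillover(pairs, x_hat, s_hat)
```

Here `a_hat[1]` averages two pairs with spillover 0.04 and two with 0.0, so it reflects both the spillover function and who actually works with type-1 coworkers.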


## 4. ML Algorithms

### 4.1 Define ML algorithm: $\mathcal{A}: D \rightarrow \mathcal{C}.$

  * Input: average wage differences:
  $$D=\{D_{i,j}\}_{(i,j)\in[N]^2}.$$
  * Output: worker clustering:
  $$\mathcal{C}=\{\hat{x}_i\}_{i\in[N]}.$$
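
To make the map $\mathcal{A}: D \rightarrow \mathcal{C}$ concrete, here is one very simple instance: link two workers whenever $|D_{i,j}|<\kappa$ and label the connected components of the resulting graph with a depth-first search. This is a stand-in for illustration only, not the HC algorithm of Section 4.2; the name `cluster_by_threshold` and the toy `D` are hypothetical.

```python
def cluster_by_threshold(D, workers, kappa):
    """Label connected components of the graph that has an edge (i, j)
    whenever |D_ij| < kappa. Returns {worker: cluster label}."""
    neighbors = {i: [] for i in workers}
    for (i, j), d in D.items():
        if abs(d) < kappa:
            neighbors[i].append(j)
            neighbors[j].append(i)

    labels = {}
    next_label = 0
    for start in workers:
        if start in labels:
            continue
        # Iterative depth-first search from an unlabeled worker
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if j not in labels:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Workers 0-1 and 1-2 are close, 0-2 are not, 3-4 are close.
D = {(0, 1): 0.01, (1, 2): -0.02, (0, 2): 0.30, (3, 4): 0.0, (2, 3): 0.5}
labels = cluster_by_threshold(D, workers=range(5), kappa=0.05)
```

Note that workers 0 and 2 end up in the same cluster through worker 1 even though $|D_{0,2}|$ exceeds the threshold. Pairwise similarity is not transitive in finite samples, so any practical algorithm needs a rule for resolving such conflicts.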

### 4.2 The Baseline Algorithm: Hierarchical clustering (HC) 

#### 4.2.1 Full definition of HC algorithm.

#### 4.2.2 The relationship with HBC algorithm used in HMX ('21).

### 4.3 The Alternative Algorithm: Graph Convolutional Network (GCN)

I propose GCN as an alternative ML algorithm to solve the problem in Section 4.1. Compared to the baseline HC:
* GCN is memory efficient and scalable.
* GCN is less accurate.

### 4.4 The Integrated Algorithm (GCN+HC)

* I integrate GCN with the baseline HC for better computational performance. 

## 5. Simulation Results

### 5.1 Data Generating Process

To demonstrate the analytical and computational performance of the method, I simulate data from the economic model (Appendix A1): 

$$m_{ift}\sim \underbrace{d(x_i,y_f)}_{\text{Calibrated by SS ('00) model}},$$

$$w_{ift}\sim \underbrace{w\big(x_i,y_f\big)}_{\text{Calibrated by SS ('00) model}} +\underbrace{\frac{1}{|\mathbf{p}_{ift}|} \sum_{j\in \mathbf{p}_{ift}} s(x_i, x_j)}_{\text{Calibrated by CDS ('17)'s $\hat{\gamma}=\frac{Cov(\text{worker FE}, \text{coworker FE})}{Var(\text{worker FE}, \text{coworker FE})}$}}.$$



### 5.2 Calibration
#### 5.2.1 The calibration matches the distribution of observed wages $w_{ift}$ and matches $m_{ift}$ to the Shimer and Smith ('00) model $\big(d(x,y), \ w(x,y)\big)$ using Hagedorn et al. ('17)'s calibration for German data.

|Parameter |  &nbsp;&nbsp;&nbsp; Notations &nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Calibrations &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|
|-------:|:-------:|:-----:|
|Production function (PAM) | $f(x_i,y_f)$ | $0.6+0.4(x_i^{0.5}+y_f^{0.5})^2$ | 
|Meeting function (CRS)| $m(u,v)$ | $0.4u^{0.5}v^{0.5}$ | 
|Worker distribution (Uniform) |  $d_w$ |   $\mathcal{U}[0,1]$ | 
|Firm distribution (Uniform) |  $d_f$ |   $\mathcal{U}[0,1]$ | 
|Discount factor | $β$ |  $0.996$ |
|Separation rate| $δ$| $0.01$|
|Worker's bargaining weight |$α$ | $0.5$ |

#### 5.2.2 Calibrate coworker spillover 

* I test the performance of the algorithm for four specifications for $s(x,x')$:

|Parameter|     &nbsp;&nbsp;&nbsp;&nbsp;Notations&nbsp;&nbsp;&nbsp;&nbsp;  | &nbsp;&nbsp;&nbsp;&nbsp;Option 1: Zero &nbsp;&nbsp;&nbsp;&nbsp;     | &nbsp;Option 2: Linear &nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;Option 3: Quadratic&nbsp;&nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;Option 4: Asymmetric  &nbsp;&nbsp;&nbsp;&nbsp;| 
|---:|:---:|:-----:|:----:|:---:| :-------:|
|Spillover function | $s(x,x')$ | $0$ | $0.02\times x'$ | $-0.1\times (x'-0.6)^2$|$0.1\times\max(x'-x,0)^2$ |
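
The four specifications in the table can be written down directly. The sketch below also shows how a spillover function would enter wages through the peer average $\frac{1}{n}\sum_{j} s(x_i,x_j)$ from Assumption (A3); the function names are hypothetical, and types are treated as continuous values on $[0,1]$.

```python
def s_zero(x, x_peer):
    """Option 1: no spillover."""
    return 0.0


def s_linear(x, x_peer):
    """Option 2: spillover linear in the coworker's type."""
    return 0.02 * x_peer


def s_quadratic(x, x_peer):
    """Option 3: spillover peaks at a coworker type of 0.6."""
    return -0.1 * (x_peer - 0.6) ** 2


def s_asymmetric(x, x_peer):
    """Option 4: only coworkers better than the worker generate spillover."""
    return 0.1 * max(x_peer - x, 0.0) ** 2


def spillover_term(x, peer_types, s):
    """Average spillover a type-x worker receives from a list of coworker types."""
    return sum(s(x, z) for z in peer_types) / len(peer_types)


# A type-0.2 worker with coworkers of types 0.6 and 1.0 under Option 4
example = spillover_term(0.2, [0.6, 1.0], s_asymmetric)
```

Under Option 4 a worker at the top of the team receives no spillover, for example `spillover_term(0.9, [0.5], s_asymmetric)` is zero, which is what makes the specification asymmetric.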

* Each of **Options 1-4** is motivated by the empirical literature. Each is calibrated such that the simulated data recover the Cornelissen et al. ('17) (CDS) estimator $\hat{\gamma}=0.011$. The simulation also approximately recovers other, untargeted moments for the German Social Security Data reported by Cornelissen et al. ('17):

|Moments|  &nbsp;&nbsp; Notations    &nbsp;&nbsp;    |  Model        | Data / Estimate   | Source | Targeted |
|-------:|:-------:|:-----|:-------:|:--:|:---:|
|Coworker spillover (CDS Estimator) | $\hat \gamma$ | $0.011$ | $0.011$ | CDS ('17) | Yes |
|Standard deviation worker fixed effect |  $\hat\sigma({\alpha}_i)$ | $0.45$ | $0.32$ | CDS ('17) | No |
|Standard deviation average peer fixed effect |  $\hat\sigma({\bar\alpha}_{ift})$  |  $0.26$ |$0.24$ |CDS ('17) | No |
|Correlation worker / average peer fixed effect|  $\hat\rho({\alpha}_i, {\bar\alpha}_{ift})$ | $0.55$| $0.64$ | CDS ('17) | No |
|Workers to peer groups ratio | $N_i/N_{p}$ | $1.9$ | $1.7$ | CDS ('17) | No |
|Unemployment rate | $U$ | $0.033$ |  | | No |
|Job finding rate (monthly)|  $\lambda$ | $0.28$ | | | No |



* I test the method for three scales of simulations:

|Parameter|     &nbsp;&nbsp;&nbsp;&nbsp;Notations&nbsp;&nbsp;&nbsp;&nbsp;  | &nbsp;&nbsp;&nbsp;&nbsp;Scale 1: Small &nbsp;&nbsp;&nbsp;&nbsp;     | &nbsp;&nbsp;Scale 2: Medium&nbsp;&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;Scale 3: Large&nbsp;&nbsp;&nbsp;&nbsp; | 
|---:|:---:|:-----:|:----:|:---:|
|Number of workers | $N_i$ | $50,000$ | $100,000$ | $500,000$|
|Number of workers per firm | $N_i/N_f$ | $50$ |  | |
|Number of discretized worker types on $[0,1]$ | $N_x$ | $5,000$ | | |
|Number of discretized firm types on $[0,1]$ | $N_y$ | $500$ | | |


### 5.3 Evaluation metrics

To assess the performance of the proposed method, I report **4 evaluation metrics** for each of the 12 simulations (coworker spillover **Option 1-4** by simulation **Scale 1-3**):

* **Metric 1**: The $x_i$-$\hat{x}_i$ plot for worker clustering (a.k.a. the confusion matrix). The demo is for "quadratic" spillover and 500,000 workers.
<img src='https://drive.google.com/uc?id=1IYg071udWZbv8H1BEzMykjnR-tvDDtNa'>
* **Metric 2**: Estimated average spillover function $\hat{a}(\hat{x}_j)$. The demo is for "quadratic" spillover and 500,000 workers. 
<img src='https://drive.google.com/uc?id=1bsV5jL5n_5vzgo2zYBDstslH7XPSJ1np'>
Estimated spillover function $\hat{s}(\hat{x}_i, \hat{x}_j)$. The demo is for "asymmetric" spillover and 500,000 workers. 
<img src='https://drive.google.com/uc?id=1PZ2bxuUy8-L4KKSE6KbF9rRwdEfayPm5'>
RMSE of the estimated spillover (as % of standard deviation of total log wages).  The demo is for "quadratic" spillover at all scales.

$$\begin{align}
 RMSE(s, \hat{s})=  \left(\ \underset{(i,f,t,w_{ift}) \in \mathcal{D}^\text{train}}{\text{mean}}  \  \ m_{ift}\ m_{jft}\  \left(\hat{s}(\hat{x}_i, \hat{x}_j)-s(x_i,x_j) \right)^2\ \ \right)^{1/2}.
\end{align}$$


| Scale|&nbsp;&nbsp;HC&nbsp;&nbsp;|&nbsp;&nbsp;GCN&nbsp;&nbsp;|&nbsp;&nbsp;GCN-10&nbsp;&nbsp;|&nbsp;&nbsp;HC (+GCN)&nbsp;&nbsp;|&nbsp;&nbsp;CDS&nbsp;&nbsp;|
|---:|:--:|:--:|:--:|:--:|:--:|
|Small (50,000 workers)|0.928|3.23|5.06|1.12|14|
|Medium (100,000 workers)| | | | | |
|Large (500,000 workers)|1.05|3.53|5.2|1.01|14|


* **Metric 3**: RMSE of out-of-sample wage prediction $\hat{w}_{ift}$ based on clustering (as % of standard deviation of total log wages). The demo is for "quadratic" spillover at all scales.

$$\begin{align}
 RMSE(w,\ \hat{w})=   \left(\ \ \underset{(i,f,t,w_{ift}) \in \mathcal{D}^\text{valid}}{\text{mean}} \ \ m_{ift}\  \left(\hat{w}_{ift}-w_{ift}\right)^2\ \ \right)^{1/2}.
\end{align}$$

| Scale |&nbsp;&nbsp;HC&nbsp;&nbsp;|&nbsp;&nbsp;GCN&nbsp;&nbsp;|&nbsp;&nbsp;GCN-10&nbsp;&nbsp;|&nbsp;&nbsp;HC (+GCN)&nbsp;&nbsp;|&nbsp;&nbsp;CDS&nbsp;&nbsp;|
|---:|:--:|:--:|:--:|:--:|:--:|
| Small (50,000 workers)|4.51|12.7|16.7|4.69|6.34|
| Medium (100,000 workers)| | | | | |
|Large (500,000 workers)|4.54|14.7|18.6|4.57|6.33|


* **Metric 4**: Runtime (seconds). The demo is for "quadratic" spillover at all scales. 

|Scale|&nbsp;&nbsp;HC&nbsp;&nbsp;|&nbsp;&nbsp;GCN&nbsp;&nbsp;|&nbsp;&nbsp;GCN-10&nbsp;&nbsp;|&nbsp;&nbsp;HC (+GCN)&nbsp;&nbsp;|&nbsp;&nbsp;CDS&nbsp;&nbsp;|
|---:|:--:|:--:|:--:|:--:|:--:|
|Small (50,000 workers)|898|31.6|21.5|704|11.1|
|Medium (100,000 workers)| | | | | |
|Large (500,000 workers)|19212|442.84|220.74|16423|126.73|
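
For reference, the normalization used in the RMSE tables above (RMSE as a percentage of the standard deviation of total log wages) can be sketched as follows. The function name `rmse_pct_of_wage_sd` is hypothetical, and the use of the population standard deviation and the choice of wage sample for the denominator are assumptions.

```python
import math
import statistics


def rmse_pct_of_wage_sd(predicted, actual, log_wages):
    """RMSE between predicted and actual values, expressed as a percentage
    of the (population) standard deviation of log wages."""
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return 100.0 * math.sqrt(mse) / statistics.pstdev(log_wages)


# Toy example: every prediction is off by 0.1 log points
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.1, 3.9]
score = rmse_pct_of_wage_sd(predicted, actual, log_wages=actual)
```

The same helper applies to the spillover RMSE (pass estimated and true spillovers for each coworker pair) and to the out-of-sample wage RMSE (pass predicted and observed wages on the validation set).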


## 6. Conclusion





# Appendix 
 
## A1 The Economic Model 
### A1.1 Overview 
### A1.2 Setup 
### A1.3 Equilibrium Outcome

* Wages: $w_{ift} = w(x_i,y_f) +\frac{1}{|\mathbf{p}_{ift}|} \sum_{j\in \mathbf{p}_{ift}} s(x_i, x_j)$

* Sorting: $m_{ift} \sim d(x_i,y_f)$

## A2 Control Observed Characteristics

## A3 Sorting, Complementarities, and Coworker Effects

### A3.1

Why do sorting and wage complementarities lead to bias in the estimated spillover in CDS (and other methods relying on additive worker and firm fixed effects)?

* The additive fixed effects model predicts an identical firm premium for all job switchers.
* In the presence of wage complementarities, the "firm premium" is lower the farther a worker is from the optimal match. The difference in firm effects systematically underpredicts (overpredicts) the wage increase for the good (bad) switcher moving upward to a better firm, leading to a positive (negative) "AKM residual" $w_{ift}-\hat{\alpha}_i - \hat{\psi}_f$.
* In the presence of sorting, the job switcher meets better coworkers when moving upward.
* The result is a spurious positive (negative) correlation between the change in the "AKM residual" and the average coworker quality witnessed by the job switcher.
* The sign of the resulting bias is therefore undetermined.

### A3.2 Illustrate with a simple 2x2 example
* Two worker types $x\in\{l, h\}$, two firm types $y\in\{L,H\}$;
  * Sorting is PAM, i.e. $H$ firms have more $h$ workers than $L$ firms;
  * Wage complementarities: $(w_{hH}-w_{lH})-(w_{hL}-w_{lL})>0$; 
  * AKM estimator overpredicts $w_{lH}$, but correctly predicts $w_{lL}, w_{hL}, w_{hH}$;
* Focusing on $l$ workers moving upward from the $L$ to the $H$ firm:
  * With sorting, the change in coworker quality is positive  $\Delta \bar{x}=2/3$;
  * The change in "AKM residual" $\Delta w=-1$;
* Downward bias in estimated spillover due to the spurious negative correlation.

<img src='https://drive.google.com/uc?id=13Swap-_UgyZBzXaq4fQQ92zQcoIlVlSU' style="width: 100px;">



### A3.3 Simulation

* **Metric 2**: Estimated spillover effects vs. the ground truth (Scale 1-3 x Spillover 1-4). 


* **Metric 3**: RMSE of the estimated spillover (% of standard deviation of total log wages)

|Scale| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Option 1: Zero &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     | &nbsp;Option 2: Linear (CDS)&nbsp; |&nbsp;&nbsp;&nbsp;&nbsp;Option 3: Quadratic&nbsp;&nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;Option 4: Two-dimensional  &nbsp;&nbsp;&nbsp;&nbsp;| 
|---:|:-----:|:----:|:---:| :-------:|
|Small (50,000 workers)| | | | |
|Medium (100,000 workers)| | | | |
|Large (500,000 workers)| | | | |

## A4 Proofs of Propositions 
### A4.1 Proposition 1 
### A4.2 Proposition 2 

## A5 Computation Methods

### A5.1 A Fast CUDA implementation of a "similarity filter" $(i,j)\in [N]^2$ with $|D_{i,j}|<\kappa$ using sparse matrix-matrix multiplication (spmm).
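
One way the pairwise sums behind $D_{i,j}$ could be expressed as sparse matrix products is sketched below. The matrices $A$ and $W$ are notation introduced here for illustration, not taken from the text: $A\in\{0,1\}^{N\times MT}$ is the sparse employment indicator with $A_{i,(f,t)}=m_{ift}$, and $W$ is the matching wage matrix with $W_{i,(f,t)}=m_{ift}\,w_{ift}$. Then

$$\sum_{f,t} m_{ift}\ m_{jft}\ (w_{ift}-w_{jft}) = \big(WA^{\top}-AW^{\top}\big)_{ij},\qquad \sum_{f,t} m_{ift}\ m_{jft} = \big(AA^{\top}\big)_{ij}.$$

Under this formulation, the wage-difference sums and the co-employment counts for all pairs come from three sparse products, and only pairs with a nonzero entry in $AA^{\top}$ need to be evaluated against the threshold.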


### A5.2 Resolving Disagreement with Depth First Search.








