# Symbolic Computation of Wage Estimates (directed search on outcome data)

James Yu, 20 September 2021

This Jupyter notebook computes the estimated wages of the departments in the four-type VSE-EJM network estimation model. It implements the system of equations outlined at the end of [the first reading](https://montoya.econ.ubc.ca/papers/markets/notes.pdf).

We start by loading the data required to estimate two particular variables:

In [1]:
using JSON

In [2]:
type_metrics = JSON.Parser.parsefile("type_metrics.json")
placement_rates = type_metrics["generic"]
do_not_print = 1 # dummy last expression, so the cell echoes 1 instead of the parsed data

1

The first of the two variables is the success rate $\rho_s$, our estimate of the probability $Q_s$ that a firm offering wage $w_s$ hires a worker. For an arbitrary department of type $s$, the success rate is taken to be the average success rate of all the type $s$ departments. In this particular setup we assume all the departments of type $s$ act as one unit with wage $w_s$, meaning the success rate is the number of applicants hired at $w_s$ divided by the total number of applicants.

The number of applicants hired at $w_s$ is the sum of hires into $s$ from each of the four possible academic types:

In [3]:
success_rates = [0 0 0 0] # holds hire counts here; divided by the applicant total in In [5]
for i in 1:4
    println(placement_rates[i]["total_from"], ": total hires = ", sum(placement_rates[i]["total_from"]))
    success_rates[i] = sum(placement_rates[i]["total_from"])
end
println()
print("hire counts = ", success_rates)

Any[568, 109, 23, 10]: total hires = 710
Any[586, 258, 87, 21]: total hires = 952
Any[762, 681, 350, 58]: total hires = 1851
Any[136, 197, 94, 88]: total hires = 515

hire counts = [710 952 1851 515]

Each of the four printed rows corresponds to one hiring type $s$. Its entries, read left to right, are the numbers of applicants hired at $w_s$ after graduating from each of the four academic types.
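The same hire counts can be obtained without a mutating loop. The following is a minimal sketch that hard-codes the rows printed above instead of reading `type_metrics.json`; the names `hires_from` and `hire_counts` are ours, not the notebook's.

```julia
# Rows copied from the printed output of In [3]:
# one row per hiring type, one entry per graduating academic type.
hires_from = [
    [568, 109, 23, 10],
    [586, 258, 87, 21],
    [762, 681, 350, 58],
    [136, 197, 94, 88],
]

# Total hires into each type: sum each row.
hire_counts = [sum(row) for row in hires_from]
println("hire counts = ", hire_counts)
```

A comprehension returns a `Vector{Int}` rather than the 1×4 matrix used in the notebook, which is usually the more convenient shape for later indexing.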

The total number of applicants is the sum of all the graduates. These include the sinks, which do not show up in the "from" data because sinks do not graduate individuals. The "to" data, however, has them:

In [4]:
total_applicants = 0
for i in 1:4
    println(placement_rates[i]["total_to"], ": total graduates = ", sum(placement_rates[i]["total_to"]))
    total_applicants += sum(placement_rates[i]["total_to"])
end
println()
print("total applicants = ", total_applicants)

Any[568, 586, 762, 136, 306, 358, 460, 434]: total graduates = 3610
Any[109, 258, 681, 197, 273, 274, 225, 587]: total graduates = 2604
Any[23, 87, 350, 94, 170, 117, 89, 435]: total graduates = 1365
Any[10, 21, 58, 88, 87, 23, 11, 152]: total graduates = 450

total applicants = 8029

Each of the four printed rows here corresponds to one graduating academic type. Its eight entries count that type's graduates who were hired in each of the four academic types or the four sinks, for a total of eight routes.

It follows that $\rho_s$ is the hire count for type $s$ divided by the total number of applicants:

In [5]:
rho = success_rates / total_applicants

1×4 Matrix{Float64}:
 0.0884294  0.11857  0.230539  0.0641425

We can also determine what fraction of applicants were unsuccessful in finding a position at an academic type:

In [6]:
sum(rho) # this is the fraction of successful applicants

0.5016814049072114

In [7]:
1 - sum(rho) 
# this is what we expect to be the fraction of unsuccessful applicants, 
# that ended up in the sinks

0.49831859509278864

or through a direct measurement method:

In [8]:
(total_applicants - sum(success_rates)) / total_applicants 
# this is the number of applicants not hired at an academic type,
# divided by the total number of applicants
# it should be the exact same number as 1 - sum(rho)

0.49831859509278864
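A third way to measure the same fraction is to count the sink placements themselves. The sketch below hard-codes the rows printed by In [4] and assumes that entries 5 to 8 of each row are the four sinks, which matches the first four entries lining up with the "from" data; the variable names are ours.

```julia
# Rows copied from the printed output of In [4]:
# one row per graduating type; entries 1-4 are academic types,
# entries 5-8 are assumed to be the four sinks.
placed_to = [
    [568, 586, 762, 136, 306, 358, 460, 434],
    [109, 258, 681, 197, 273, 274, 225, 587],
    [23, 87, 350, 94, 170, 117, 89, 435],
    [10, 21, 58, 88, 87, 23, 11, 152],
]

academic_hires = sum(sum(row[1:4]) for row in placed_to)
sink_hires = sum(sum(row[5:8]) for row in placed_to)
total = academic_hires + sink_hires

# Fraction of applicants who ended up in a sink.
sink_share = sink_hires / total
println("sink placements = ", sink_hires, " of ", total)
println("sink share = ", sink_share)
```

If the column assumption holds, the sink share should coincide with $1 - \sum_s \rho_s$.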

Now that we have $\rho_s$, the second of the two variables to find is the probability $F(x_s)$ that a worker has a type less than or equal to $s$. We estimate it here by simply counting the number of such workers, or in this case applicants.

In this case counting refers to observing the number of graduates that graduated from a type $s$ department.

In [9]:
F = [total_applicants 0 0 0 0]
counter = total_applicants
for i in 1:4
    counter -= sum(placement_rates[i]["total_to"])
    F[i+1] = counter
end   
F = F / total_applicants

1×5 Matrix{Float64}:
 1.0  0.55038  0.226056  0.0560468  0.0


To read this, we observe $F(x_1) = 1$, $F(x_2) \approx 0.55$, etc., with $F(x_5) = 0$ as the terminal value.
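The counting in In [9] amounts to subtracting a running total of graduates from the applicant total. A minimal sketch, assuming the graduate totals printed by In [4] and using our own variable names:

```julia
# Graduates per academic type, copied from the output of In [4].
graduates = [3610, 2604, 1365, 450]
total = sum(graduates)

# Applicants remaining after removing types 1..s, for s = 1..4.
remaining = total .- cumsum(graduates)

# F(x_1), ..., F(x_5): start from everyone, end at nobody.
F = [total; remaining] ./ total
println(F)
```

The differences `F[t] - F[t+1]` are then the shares of applicants of each type $t$, which is how $F$ enters the equations for $Q_s$.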

Now we can set up and solve the system of equations that provides estimates for the wages $w_s$. A version of this is (0.7) in the original reading, but here we derive it from scratch so that it aligns better with what will eventually be the code.

We found $\rho_s$, which is the success rate of an average firm of each type. We defined $\rho_s = Q_s(w_1, \dots, w_s)$ as the probability that a firm of type $s$ hires an applicant.

This means $Q_s = 1 - P(\text{the firm does not hire anyone}) = 1 - P(\text{nobody applies to the firm}) = 1 - (P(\text{a particular applicant does not apply to the firm}))^m$ where $m$ is the number of applicants. We know $m = 8029$ by the above results.

Thus, $Q_s = 1 - (1 - P(\text{a particular applicant applies to the firm}))^m$. The question now is what $P(\text{a particular applicant applies to the firm})$ is. Denote this unknown as $R_s$.

Since the type of any one applicant is not known, we take this value as the linear combination of application probabilities for every type. Mathematically:

$$R_s = \sum_{\text{all types t}} \big[(F(x_t) - F(x_{t+1})) \cdot P(x_t \text{ applies to } w_s)\big]$$

Instead of deriving the probability inside the equation for $R_s$, we can construct a lookup table for the probability that $x_t$ applies to $w_s$. This is simply the equilibrium strategies from the table on page 2 of the reading. In particular, since the number of types here is four, our table would look like:

$$\begin{bmatrix}
x_1: & \pi_1 & 0 & 0 & 0 \\
x_2: & (1 - \pi_2)\pi_1 & \pi_2 & 0 & 0 \\
x_3: & (1 - \pi_3)(1 - \pi_2)\pi_1 & (1 - \pi_3)\pi_2 & \pi_3 & 0 \\
x_4: & (1 - \pi_4)(1 - \pi_3)(1 - \pi_2)\pi_1 & (1 - \pi_4)(1 - \pi_3)\pi_2 & (1 - \pi_4)\pi_3 & \pi_4
\end{bmatrix}$$

where $\pi_1 = 1$ is included to show the structural patterns in the table. This allows the following four equations to be constructed:

$$Q_1 = 1 - \bigg[1 - \big[(F(x_1) - F(x_2))(\pi_1) + (F(x_2) - F(x_3))(1-\pi_2)(\pi_1) + (F(x_3) - F(x_4))(1 - \pi_3)(1 - \pi_2)(\pi_1) + (F(x_4) - F(x_5))(1 - \pi_4)(1 - \pi_3)(1 - \pi_2)(\pi_1)\big]\bigg]^m$$

$$Q_2 = 1 - \bigg[1 - \big[(F(x_2) - F(x_3))(\pi_2) + (F(x_3) - F(x_4))(1 - \pi_3)(\pi_2) + (F(x_4) - F(x_5))(1 - \pi_4)(1 - \pi_3)(\pi_2)\big]\bigg]^m$$

$$Q_3 = 1 - \bigg[1 - \big[(F(x_3) - F(x_4))(\pi_3) + (F(x_4) - F(x_5))(1 - \pi_4)(\pi_3)\big]\bigg]^m$$

$$Q_4 = 1 - \bigg[1 - \big[(F(x_4) - F(x_5))(\pi_4)\big]\bigg]^m$$

This is the system of equations from which we can solve for $\pi_s$, which then allows us to solve for wages.
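To make the structure of the table and the four equations concrete, here is a minimal sketch of the forward direction: given values of $\pi_s$, build the lookup table and evaluate $Q_s$. The helpers `apply_table` and `hire_probabilities` are hypothetical names, and the $\pi$ values and the small $m$ in the demo are illustrative only, not estimates.

```julia
# P[t, s] = probability that a type-t applicant applies to wage w_s,
# following the lower-triangular pattern of the table above.
function apply_table(pi_vals)
    n = length(pi_vals)
    P = zeros(n, n)
    for t in 1:n, s in 1:t
        skip = 1.0
        for k in (s + 1):t        # empty range when s == t
            skip *= 1 - pi_vals[k]
        end
        P[t, s] = skip * pi_vals[s]
    end
    return P
end

# Q_s = 1 - (1 - R_s)^m, where R_s weights column s of the table
# by the type shares F(x_t) - F(x_{t+1}).
function hire_probabilities(pi_vals, F, m)
    n = length(pi_vals)
    shares = [F[t] - F[t + 1] for t in 1:n]
    R = apply_table(pi_vals)' * shares
    return 1 .- (1 .- R) .^ m
end

pi_demo = [1.0, 0.3, 0.5, 0.2]                      # illustrative values
F_demo = [1.0, 0.5503798729605182, 0.2260555486361938,
          0.056046830240378626, 0.0]                # printed by In [9]
P = apply_table(pi_demo)
Q = hire_probabilities(pi_demo, F_demo, 10)         # m = 10 for illustration
println(P)
println(Q)
```

With $\pi_1 = 1$ every row of the table sums to one, which is a quick sanity check that each applicant type applies somewhere.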

Most importantly, because $Q_4$ is a function of $\pi_4$ alone, we can solve for $\pi_4$ first and then back-substitute to retrieve the remaining values of $\pi_s$. Using $F(x_5) = 0$:

$$1 - (1 - Q_4)^\frac{1}{m} = F(x_4)\pi_4$$

$$\pi_4 = \frac{1 - (1 - Q_4)^\frac{1}{m}}{F(x_4)}$$

From here it is convenient to substitute numerical values at each back-substitution step. We do the following:

In [10]:
println(rho)
println(F)

[0.08842944326815294 0.11857018308631212 0.2305392950554241 0.06414248349732221]
[1.0 0.5503798729605182 0.2260555486361938 0.056046830240378626 0.0]


In [11]:
m = total_applicants
π_4 = (1 - (1 - rho[4])^(1/m)) / F[4]

0.00014731503639711047

We know:

$$1 - (1 - Q_3)^\frac{1}{m} = (F(x_3) - F(x_4))(\pi_3) + (F(x_4))(1 - \pi_4)(\pi_3)$$

$$1 - (1 - Q_3)^\frac{1}{m} = \pi_3((F(x_3) - F(x_4)) + (F(x_4))(1 - \pi_4))$$

In [12]:
π_3 = (1 - (1 - rho[3])^(1/m)) / (F[3] - F[4] * π_4)

0.00014439156354594202

Next:

$$1 - (1 - Q_2)^\frac{1}{m} = (F(x_2) - F(x_3))(\pi_2) + (F(x_3) - F(x_4))(1 - \pi_3)(\pi_2) + (F(x_4))(1 - \pi_4)(1 - \pi_3)(\pi_2)$$

$$1 - (1 - Q_2)^\frac{1}{m} = \pi_2\big[(F(x_2) - F(x_3)) + (F(x_3) - F(x_4))(1 - \pi_3) + (F(x_4))(1 - \pi_4)(1 - \pi_3)\big]$$

In [13]:
π_2 = (1 - (1 - rho[2])^(1/m)) / (F[2] - F[3]*π_3 + F[4]*π_4*(1-π_3))

2.856177827987146e-5

$$1 - (1 - Q_1)^\frac{1}{m} = \pi_1\big[(F(x_1) - F(x_2)) + (F(x_2) - F(x_3))(1-\pi_2) + (F(x_3) - F(x_4))(1 - \pi_3)(1 - \pi_2) + (F(x_4))(1 - \pi_4)(1 - \pi_3)(1 - \pi_2)\big]$$

In [14]:
π_1 = (1 - (1 - rho[1])^(1/m)) / ((F[1] - F[2]) + (F[2] - F[3])*(1 - π_2) + (F[3] - F[4])*(1 - π_3)*(1 - π_2) + F[4]*(1 - π_4) * (1 - π_3) * (1 - π_2))

1.1532069840717093e-5

This last value is inconsistent with the model, however, because the equilibrium strategies require $\pi_1 = 1$.
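The four back-substitution steps share one pattern, which can be sketched as a single loop. Factoring $\pi_s$ out of the equation for $Q_s$ leaves a weight that satisfies $\text{weight}_s = (F(x_s) - F(x_{s+1})) + (1 - \pi_{s+1}) \cdot \text{weight}_{s+1}$. The helper `solve_pi` below is hypothetical, and the inputs are the values printed by In [10]; like In [14], it returns the unconstrained value for $\pi_1$ rather than imposing $\pi_1 = 1$.

```julia
# Back-substitution for pi_n, ..., pi_1 from Q_s = 1 - (1 - pi_s * weight_s)^m.
function solve_pi(Q, F, m)
    n = length(Q)
    pi_hat = zeros(n)
    weight = 0.0
    for s in n:-1:1
        # Carry over the mass of lower types that skip w_{s+1}.
        carry = s == n ? 0.0 : (1 - pi_hat[s + 1]) * weight
        weight = (F[s] - F[s + 1]) + carry
        pi_hat[s] = (1 - (1 - Q[s])^(1 / m)) / weight
    end
    return pi_hat
end

rho = [0.08842944326815294, 0.11857018308631212,
       0.2305392950554241, 0.06414248349732221]
F = [1.0, 0.5503798729605182, 0.2260555486361938,
     0.056046830240378626, 0.0]
m = 8029

pi_hat = solve_pi(rho, F, m)
println(pi_hat)
```

For very small $Q_s$ or very large $m$, `-expm1(log1p(-Q[s]) / m)` is a numerically gentler way to evaluate `1 - (1 - Q[s])^(1 / m)`; the sketch keeps the notebook's form so the two are easy to compare.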

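Now we check our answers by substituting the computed values, including the computed $\pi_1$, back into the four equations. The target for $Q_s$ is: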

In [15]:
rho

1×4 Matrix{Float64}:
 0.0884294  0.11857  0.230539  0.0641425

So we use this to compare.

In [16]:
1 - (1 - ((F[1] - F[2])*(π_1) + (F[2] - F[3])*(1-π_2)*(π_1) + (F[3] - F[4])*(1 - π_3)*(1 - π_2)*(π_1) + (F[4])*(1 - π_4)*(1 - π_3)*(1 - π_2)*(π_1)))^m

0.08842944326775415

In [17]:
1 - (1 - ((F[2] - F[3])*(π_2) + (F[3] - F[4])*(1 - π_3)*(π_2) + (F[4])*(1 - π_4)*(1 - π_3)*(π_2)))^m

0.1185668456919472

In [18]:
1 - (1 - ((F[3] - F[4])*(π_3) + (F[4])*(1 - π_4)*(π_3)))^m

0.2305392950550832

In [19]:
1 - (1 - ((F[4])*(π_4)))^m

0.06414248349711116

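The checks for $Q_1$, $Q_3$ and $Q_4$ reproduce `rho` to about ten significant figures. The check for $Q_2$ agrees only to four (0.1185668 against 0.1185702), because the denominator in In [13] adds `F[4]*π_4*(1-π_3)` where expanding the bracket gives $F(x_2) - F(x_3)\pi_3 - F(x_4)\pi_4(1 - \pi_3)$; the sign shifts $\pi_2$ by a relative amount of about $3 \times 10^{-5}$. With that caveat, we now have $\pi_s$, so we can solve for the wages. Recall from the reading that the formula for the wages is equation (0.5):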

$$\pi_s = \frac{1}{1 + (\frac{w_s}{w_{s-1}})^\frac{1}{m-1}}$$

We can extract the wages by doing the following sequence of operations:

$$\frac{1}{\pi_s} = 1 + (\frac{w_s}{w_{s-1}})^\frac{1}{m-1}$$

$$(\frac{1}{\pi_s} - 1)^{m-1} = \frac{w_s}{w_{s-1}}$$

This provides us with a sequence of ratios. We first evaluate the base $\frac{1}{\pi_s} - 1$ for each type:

In [20]:
(1/π_4) - 1

6787.173321997799

In [21]:
(1/π_3) - 1

6924.612379575234

In [22]:
(1/π_2) - 1

35010.825601375

In [23]:
(1/1) - 1 # using the required π_1 = 1

0.0

But:

In [24]:
((1/π_4) - 1)^(m-1)

Inf

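The result overflows to `Inf`: a base of magnitude about 6787 raised to the power $m - 1 = 8028$ is on the order of $10^{30761}$, far beyond the largest finite `Float64` (about $1.8 \times 10^{308}$). This is a problem, because the wage ratio cannot be represented directly.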
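One way to at least inspect the size of the ratio is to work with its logarithm. Rearranging (0.5) gives $\frac{w_s}{w_{s-1}} = (\frac{1}{\pi_s} - 1)^{m-1}$, so $\log \frac{w_s}{w_{s-1}} = (m - 1) \log(\frac{1}{\pi_s} - 1)$, which stays finite. The following is a minimal sketch with a hypothetical helper `log_wage_ratio`; it only changes how the number is represented and does not address whether a ratio of this size is economically meaningful.

```julia
# Natural log of w_s / w_{s-1} implied by (0.5); valid for 0 < pi_s < 1.
log_wage_ratio(pi_s, m) = (m - 1) * log(1 / pi_s - 1)

m = 8029
pi_4 = 0.00014731503639711047    # value printed by In [11]

log_ratio = log_wage_ratio(pi_4, m)
log10_ratio = log_ratio / log(10)
println("log(w_4 / w_3)   = ", log_ratio)
println("log10(w_4 / w_3) = ", round(log10_ratio, digits = 1))
```

Exponentiating `log_ratio` overflows again, so any further use of the wages would have to stay in log space or switch to an arbitrary-precision type such as `BigFloat`.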