In [1]:
include("type_allocation_flexible.jl")

using Random, JSON

First we configure the settings:

In [2]:
Random.seed!(0)            # for reproducibility: ensures random results are the same on script restart
YEAR_INTERVAL = 2003:2025  # change this to select the years of data to include in the estimation
NUMBER_OF_TYPES = 4        # change this to select the number of types to classify academic departments into

4

We also need to collect the placement data. For this example we use the comprehensive API:

In [3]:
to_from_by_year_api = SBM_flexible.fetch_data("https://support.econjobmarket.org/api/placement_data")

DataStructures.DefaultDict{Any, Any, UnionAll} with 23 entries:
  "2003" => Dict{Any, Any}("17827"=>Dict{String, Any}("to_shortname"=>"Economic…
  "2005" => Dict{Any, Any}("11382"=>Dict{String, Any}("to_shortname"=>"Economic…
  "2014" => Dict{Any, Any}("56926"=>Dict{String, Any}("to_shortname"=>"Business…
  "2018" => Dict{Any, Any}("37159"=>Dict{String, Any}("to_shortname"=>"Dallas o…
  "2020" => Dict{Any, Any}("45907"=>Dict{String, Any}("to_shortname"=>"Economic…
  "2010" => Dict{Any, Any}("5422"=>Dict{String, Any}("to_shortname"=>"Managemen…
  "2016" => Dict{Any, Any}("1886"=>Dict{String, Any}("to_shortname"=>"Cons Fina…
  "2019" => Dict{Any, Any}("43054"=>Dict{String, Any}("to_shortname"=>"Computer…
  "2004" => Dict{Any, Any}("24943"=>Dict{String, Any}("to_shortname"=>"Economic…
  "2007" => Dict{Any, Any}("24350"=>Dict{String, Any}("to_shortname"=>"Economic…
  "2023" => Dict{Any, Any}("57428"=>Dict{String, Any}("to_shortname"=>"Health P…
  "2017" => Dict{Any, Any}("27626"=>Dict{Stri

`fetch_data` recognizes API endpoints automatically: a source that begins with `http` or `https` is treated as an API URL, and the endpoint is expected to return a raw list of placements, each with a `year` field. Any other source is assumed to be the path to a `json` file.
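As a rough illustration of that rule, the sketch below classifies a source string by its prefix. The helper `classify_source` and the file name `placements.json` are hypothetical and are not part of `SBM_flexible`; the actual check inside `fetch_data` may differ.

```julia
# Hypothetical helper, not part of SBM_flexible: decide how a source string
# would be interpreted under the rule described above.
function classify_source(source::AbstractString)
    if startswith(source, "http://") || startswith(source, "https://")
        return :api        # would be fetched over the network
    else
        return :json_file  # would be read from a local path
    end
end

println(classify_source("https://support.econjobmarket.org/api/placement_data"))
println(classify_source("placements.json"))  # hypothetical local file name
```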

We then sort the raw placements into academic and sink placements and collect some labels:

In [4]:
academic, academic_to, academic_builder, rough_sink_builder, institution_mapping, reverse_mapping = SBM_flexible.get_builders(to_from_by_year_api, YEAR_INTERVAL);

The set of sinks is largely configurable. The one exception is teaching universities, which must always be included:

In [5]:
# sink of teaching universities that do not graduate PhDs
# this must be constructed using academic placements, not pre-defined sink placements

teaching_universities = Set() 
for dept_name in academic_to
    if !(dept_name in academic)
        # the department hired an assistant professor but never graduated anyone
        push!(teaching_universities, dept_name)
    end
end
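The loop above amounts to a set difference: every department that appears as a hiring department but never as a graduating one. A minimal sketch on toy data, assuming both collections hold plain department names (the `toy_` names are made up for illustration):

```julia
# Toy data standing in for the real collections.
toy_academic    = Set(["Dept A", "Dept B"])            # departments that graduated PhDs
toy_academic_to = Set(["Dept A", "Dept C", "Dept D"])  # departments that hired assistant professors

# Departments that hired but never graduated anyone.
toy_teaching = setdiff(toy_academic_to, toy_academic)

println(sort(collect(toy_teaching)))
```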

The remaining sinks are built by a chain of `if`/`elseif` branches on each outcome's `recruiter_type` and `postype`:

In [6]:
public_sector = ("Public Sector", Set())
private_sector = ("Private Sector", Set())
other_groups = ("Other Groups", Set())

postdocs = ("Postdocs", Set())
lecturers = ("Lecturers", Set())
other_academic = ("Other Academic", Set())

for outcome in rough_sink_builder
    if outcome["recruiter_type"] == 5 # government institution
        push!(public_sector[2], (string(outcome["to_name"], " ($(public_sector[1]))"), outcome))
    elseif outcome["recruiter_type"] in [6, 7] # private sector: for and not for profit
        push!(private_sector[2], (string(outcome["to_name"], " ($(private_sector[1]))"), outcome))
    elseif outcome["recruiter_type"] == 8 # international organizations, think tanks, assorted
        push!(other_groups[2], (string(outcome["to_name"], " ($(other_groups[1]))"), outcome))

    # some other examples
    # every example here must also have a corresponding sink Set() above, 
    #     and an entry in sinks_to_include below
   
    elseif outcome["postype"] == 6
        # postdocs that are not in the above (i.e. academic; not public, private, or other)
        # please note that the included JSON does not contain postdocs; use the API
        push!(postdocs[2], (string(outcome["to_name"], " ($(postdocs[1]))"), outcome))
    elseif outcome["postype"] in [5, 7]
        # lecturers that are not in the above
        # please note that the included JSON does not contain lecturers; use the API
        push!(lecturers[2], (string(outcome["to_name"], " ($(lecturers[1]))"), outcome))
    else
        # everything else including terminal academic positions
        # this sink can only be constructed as an "else" statement
        push!(other_academic[2], (string(outcome["to_name"], " ($(other_academic[1]))"), outcome))
    
    end
end

# sort to ensure consistent ordering
academic_list = sort(collect(academic))
teaching_list = sort(collect(teaching_universities))
# to be consistent with the original estimation, we only include these additional sinks:
sinks_to_include = (public_sector, private_sector, other_groups)#, postdocs, lecturers, other_academic)

sink_builder, sinks, sink_labels = SBM_flexible.build_sinks(sinks_to_include, teaching_list)

NUMBER_OF_SINKS = length(sink_labels)
numtotal = NUMBER_OF_TYPES + NUMBER_OF_SINKS
institutions = vcat(academic_list, sinks...)
println("$(length(academic_list)) academic departments, $(length(institutions)) total departments")

Including the following sinks:
 Public Sector
 Private Sector
 Other Groups
 Teaching Universities
Total 4 sinks
870 academic departments, 2041 total departments
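The order of the branches matters: `recruiter_type` is checked before `postype`, so a postdoc at a government institution is counted under Public Sector rather than Postdocs. The sketch below isolates that logic in a hypothetical helper, `sink_label`, and runs it on made-up outcomes (the value `1` for `recruiter_type` and `postype` is only a placeholder for "none of the listed codes").

```julia
# Hypothetical helper mirroring the branch order of the loop above.
function sink_label(outcome::AbstractDict)
    if outcome["recruiter_type"] == 5
        return "Public Sector"
    elseif outcome["recruiter_type"] in (6, 7)
        return "Private Sector"
    elseif outcome["recruiter_type"] == 8
        return "Other Groups"
    elseif outcome["postype"] == 6
        return "Postdocs"
    elseif outcome["postype"] in (5, 7)
        return "Lecturers"
    else
        return "Other Academic"
    end
end

# Toy outcomes; names and field values are made up for illustration.
toy_outcomes = [
    Dict("to_name" => "Agency X", "recruiter_type" => 5, "postype" => 6),
    Dict("to_name" => "Firm Y", "recruiter_type" => 6, "postype" => 1),
    Dict("to_name" => "University Z", "recruiter_type" => 1, "postype" => 6),
]

for o in toy_outcomes
    println(string(o["to_name"], " (", sink_label(o), ")"))
end
```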


Next, we build the adjacency matrix and check that its entries sum to the number of placements:

In [7]:
length(academic_builder) + length(sink_builder)

12149

In [8]:
out = SBM_flexible.get_adjacency(academic_list, institutions, academic_builder, sink_builder);

Total 12149 Placements (found 12149 by sequence counting, 12149 by matrix sum)


In [9]:
sum(out)

12149
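To make the consistency check concrete, here is a toy version of the construction. It is only a sketch: the orientation (rows for hiring institutions, columns for graduating departments) and the `(from, to)` pair format are assumptions, and `get_adjacency` may organize its inputs differently.

```julia
# Toy inputs with made-up names.
toy_academic     = ["Dept A", "Dept B"]            # departments that graduate PhDs
toy_institutions = ["Dept A", "Dept B", "Sink S"]  # academic departments first, then sinks
toy_placements   = [("Dept A", "Dept B"), ("Dept A", "Sink S"),
                    ("Dept B", "Sink S"), ("Dept A", "Dept B")]  # (from, to) pairs

row_of = Dict(name => i for (i, name) in enumerate(toy_institutions))
col_of = Dict(name => j for (j, name) in enumerate(toy_academic))

# Assumed orientation: rows are hiring institutions, columns are graduating departments.
toy_out = zeros(Int32, length(toy_institutions), length(toy_academic))
for (from, to) in toy_placements
    toy_out[row_of[to], col_of[from]] += 1
end

# Every placement lands in exactly one cell, so the matrix sum equals the placement count.
println(sum(toy_out) == length(toy_placements))
```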

We are now ready to run the stochastic block model (SBM) itself:

In [10]:
@time est_obj, est_alloc = SBM_flexible.doit(out, length(academic_list), [length(s) for s in sinks], NUMBER_OF_TYPES, numtotal, 500 * (NUMBER_OF_TYPES-2) + 1000)

191.546873 seconds (389.54 k allocations: 20.313 MiB, 0.11% compilation time)


(35487.31919535547, Int32[4, 3, 2, 4, 4, 3, 4, 3, 4, 2  …  8, 8, 8, 8, 8, 8, 8, 8, 8, 8])

In [11]:
placement_rates, counts, sorted_allocation, full_likelihood = SBM_flexible.get_allocation(est_alloc, out, NUMBER_OF_TYPES, numtotal, institutions)

(Int32[1668 201 27 8; 1967 814 126 31; … ; 314 157 68 20; 827 681 259 135], Int32[2116 6900 11224 19780; 6900 22500 36600 64500; … ; 2116 6900 11224 19780; 32798 106950 173972 306590], Int32[4, 3, 2, 4, 4, 3, 4, 3, 4, 2  …  8, 8, 8, 8, 8, 8, 8, 8, 8, 8], -51068.7722563633)

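We can explore the results. Despite its name, `placement_rates` holds raw placement counts: rows are the hiring types (the tiers, followed by the sinks) and columns are the graduating tiers.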

In [12]:
placement_rates

8×4 Matrix{Int32}:
 1668  201   27    8
 1967  814  126   31
  681  431  229   32
   32   52   57   77
  987  427  130   50
 1154  356  111   40
  314  157   68   20
  827  681  259  135

In [13]:
placement_rates ./ counts # mean placements per pair of institutions in each cell

8×4 Matrix{Float64}:
 0.78828    0.0291304    0.00240556   0.000404449
 0.285072   0.0361778    0.00344262   0.00048062
 0.0606736  0.011776     0.00384641   0.000304994
 0.0016178  0.000806202  0.000543271  0.000416441
 0.125477   0.0166472    0.00311571   0.000679995
 0.104095   0.00984786   0.00188763   0.000385989
 0.148393   0.0227536    0.00605845   0.00101112
 0.025215   0.00636746   0.00148875   0.000440327
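The displayed entries of `counts` are consistent with each cell holding the number of institution pairs, that is, the product of the two group sizes. The tier sizes below are inferred from the output above, not read from the package, so treat this as a plausibility check rather than a description of `get_allocation`.

```julia
# Tier sizes inferred from the displayed `counts` output (an assumption).
tier_sizes = [46, 150, 244, 430]

# They add up to the number of academic departments reported earlier.
println(sum(tier_sizes) == 870)

# First and second rows of `counts` as products of group sizes.
println(46 .* tier_sizes == [2116, 6900, 11224, 19780])
println(150 .* tier_sizes == [6900, 22500, 36600, 64500])

# Top-left mean: Tier 1 to Tier 1 placements per pair of Tier 1 departments.
tier1_mean = 1668 / 2116
println(round(tier1_mean, digits = 5))
```

Under this reading, the top-left value says that a pair of Tier 1 departments exchanged a little under 0.8 placements on average over the sample period.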

In [14]:
full_likelihood

-51068.7722563633

In [15]:
SBM_flexible.nice_table(placement_rates, NUMBER_OF_TYPES, NUMBER_OF_SINKS, sink_labels)

┌───────────────────────┬────────┬────────┬────────┬────────┬────────────┐
│                       │ Tier 1 │ Tier 2 │ Tier 3 │ Tier 4 │ Row Totals │
├───────────────────────┼────────┼────────┼────────┼────────┼────────────┤
│                Tier 1 │   1668 │    201 │     27 │      8 │       1904 │
│                Tier 2 │   1967 │    814 │    126 │     31 │       2938 │
│                Tier 3 │    681 │    431 │    229 │     32 │       1373 │
│                Tier 4 │     32 │     52 │     57 │     77 │        218 │
│         Public Sector │    987 │    427 │    130 │     50 │       1594 │
│        Private Sector │   1154 │    356 │    111 │     40 │       1661 │
│          Other Groups │    314 │    157 │     68 │     20 │        559 │
│ Teaching Universities │    827 │    681 │    259 │    135 │       1902 │
│         Column Totals │   7630 │   3119 │   1007 │   
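The row and column totals can be reproduced directly from the matrix with `sum` along each dimension. The sketch below copies the values from the `placement_rates` output into a toy matrix, so it runs without the estimation step; the name `toy_counts` is made up.

```julia
# Values copied from the `placement_rates` output above.
toy_counts = [1668 201  27   8;
              1967 814 126  31;
               681 431 229  32;
                32  52  57  77;
               987 427 130  50;
              1154 356 111  40;
               314 157  68  20;
               827 681 259 135]

row_totals = vec(sum(toy_counts, dims = 2))  # placements received by each hiring type
col_totals = vec(sum(toy_counts, dims = 1))  # placements produced by each graduating tier

println(row_totals)
println(col_totals)
println(sum(toy_counts))  # should agree with sum(out)
```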

To save the allocation to a file for later exploration, we can do the following:

In [16]:
type_dictionary = []
for (i, alloc) in enumerate(sorted_allocation)
    if alloc in 1:NUMBER_OF_TYPES
        inst_id = reverse_mapping[institutions[i]]
        push!(type_dictionary, Dict("name" => institutions[i], "institution_id" => inst_id, "type" => alloc))
    end
end

In [17]:
open(".estimates/id_to_type_api.json", "w") do f
    write(f, JSON.json(type_dictionary))
end;
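When exploring the saved allocation later, a lookup table keyed by institution id is often more convenient than the list of records. A minimal sketch on toy entries with the same keys as `type_dictionary`; the names and ids are made up, and the real `institution_id` values may be strings rather than integers.

```julia
# Toy entries with the same keys as `type_dictionary`.
toy_types = [
    Dict("name" => "Dept A", "institution_id" => 101, "type" => 1),
    Dict("name" => "Dept B", "institution_id" => 102, "type" => 3),
    Dict("name" => "Dept C", "institution_id" => 103, "type" => 3),
]

# Map institution id to estimated type.
id_to_type = Dict(d["institution_id"] => d["type"] for d in toy_types)

# Count how many institutions fall into each type.
type_counts = Dict{Int, Int}()
for d in toy_types
    type_counts[d["type"]] = get(type_counts, d["type"], 0) + 1
end

println(id_to_type[102])
println(type_counts)
```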

The allocation itself:

In [18]:
for sorted_type in 1:NUMBER_OF_TYPES
    counter = 0
    inst_hold = []
    println("TYPE $sorted_type:")
    for (i, sbm_type) in enumerate(sorted_allocation)
        if sbm_type == sorted_type
            push!(inst_hold, institutions[i])
            counter += 1
        end
    end
    for inst in sort(inst_hold)
        println("  ", inst)
    end
    println("Total Institutions: $counter")
    println()
end

TYPE 1:
  Bocconi University
  Boston College
  Boston University
  Brown University
  Carnegie Mellon University
  Columbia University
  Cornell University
  Duke University
  Harvard University
  Johns Hopkins University
  London School of Economics and Political Science
  Massachusetts Institute of Technology
  Michigan State University
  New York University
  Northwestern University
  Ohio State University
  Pennsylvania State University
  Princeton University
  Purdue University
  Stanford University
  Texas A&M University, College Station
  Tilburg University
  University College London
  University of British Columbia
  University of California Los Angeles (UCLA)
  University of California, Berkeley
  University of California, Davis
  University of Chicago
  University of Illinois at Urbana-Champaign
  University of Mannheim
  University of Maryland
  University of Michigan
  University of Minnesota, Twin Cities
  University of Oxford
  University of Pennsylvania
  University of