# College Admissions: Making Tough Choices

## Remember: the goal of this assignment is for you to explore different algorithms, choose 3 or more to compare, and then argue in favor of one, in reference to the others (and based on the various concepts in the course)

### I am _not_ interested in whether you get "the best" algorithm (think about why...) --- I am interested in how you engage with and discuss the trade-offs between your different proposals.  

### Your grade _will_ be based on how you engage with these, so try to pick some proposals that in fact represent "trade-offs": it's not particularly hard to find "really bad" algorithms.  

### If you're lost, think about why each of the three permissible traits (SAT scores, high school GPA, and STEM major interest) might matter in college admissions: I'm not grading whether these justifications are "correct," but rather looking for you to engage with the "hypothetical real-world algorithmic design challenge."  

#### Put another way: be creative!  At its heart, "data science" is simply a fancier way of saying "creativity in the face of empirical uncertainty."

## Step 1 (Initialization): _Evaluate the cell below and include the resulting number at the beginning the answer you submit on Canvas_. 
### <b>DO THIS <u>ONLY ONCE</u> AT THE BEGINNING OF YOUR ASSIGNMENT</b>

In [1]:
Print["Write this number down in your assignment: ", init=Round[Mod[AbsoluteTime[],1000000]]]

Write this number down in your assignment: 192331


### Step 2 (Code definition): Evaluate the next cell: this contains the code for the simulation

In [13]:
(* This is the code for the simulation *)

acode[satCoeff_, gradeCoeff_, majorCoeff_] := 
  Module[{},
   
   SeedRandom[init];
   
   sw = 0.001;
   gw = 1;
   mw = 1;
   
    (* Outcome function *)
   mu[x1_, x2_, x3_] := sw*x1 + gw*x2 + mw*x3; 

    (* Prediction function *)
   tau[x1_, x2_, x3_] := satCoeff * x1/1600 + gradeCoeff * x2 + majorCoeff * x3;

   
   admitThreshold=0; (* threshold for admits *)
   nLarge = 100; (* Size of larger group *)
   alpha = 1/2; (* smaller group's size as a proportion of nLarge *)
   lambda = 1; (* for bernoulli distribution of y *)
   
   theta = Table[3 - j, {j, 1, 2}];
   sigma = Table[1, {j, 1, 2}];
   n = {nLarge, Round[alpha*nLarge]};
   
   
   (*Distributions*)
   testDist[theta_, sigma_] := 
    TruncatedDistribution[{400, 1600}, 
     NormalDistribution[100*theta + 800, 500*sigma]];
   
   gradeDist[theta_, sigma_] := 
    TruncatedDistribution[{0, 4}, 
     NormalDistribution[3 - 0.1 theta, 1]];
   
   majorDist[theta_] := BernoulliDistribution[(0.1*theta + 1)/4];
   
   outcomesDist[theta_] := BernoulliDistribution[theta];
   
   (* Draw the random data *)
   
   satData = Table[Round[
      RandomVariate[testDist[theta[[j]], sigma[[j]]], n[[j]]]], {j, 1,
       2}];
       
   gradeData = Table[Round[
      RandomVariate[gradeDist[theta[[j]], sigma[[j]]], n[[j]]], 
      0.1], {j, 1, 2}];
   
   majorData = Table[RandomVariate[majorDist[theta[[j]]], n[[j]]] + 1, {j, 1, 2}];
    
    noiseData = 
    Table[
      RandomVariate[NormalDistribution[0,1], n[[j]]], {j, 1,
       2}];

   (* Group-Specific Means *)
   
   satMeans = N[Table[Mean[satData[[j]]], {j, 1, 2}]];
   gradeMeans = N[Table[Mean[gradeData[[j]]], {j, 1, 2}]];
   majorMeans = N[Table[Mean[majorData[[j]]], {j, 1, 2}]];
   (* Overall Means *)
   
   satMean = Mean[Flatten[satData]];
   gradeMean = Mean[Flatten[gradeData]];
   majorMean = Mean[Flatten[majorData]];
   
   outcomesData = 
    Table[RandomVariate[
      outcomesDist[
       Exp[lambda * mu[satData[[j, i]], gradeData[[j, i]], 
          majorData[[j, i]]]]/(Exp[
           lambda* mu[satData[[j, i]], gradeData[[j, i]], 
            majorData[[j, i]]]] + 
          Exp[lambda * mu[satMean, gradeMean, majorMean]])]], {j, 1, 2}, {i, 1,
       n[[j]]}];
   
   outcomesMeans = N[Table[Mean[outcomesData[[j]]], {j, 1, 2}]];
   
   admitScores = 
    Table[tau[satData[[j, i]], gradeData[[j, i]], 
      majorData[[j, i]]], {j, 1, 2}, {i, 1, n[[j]]}];
   
   
   
   admitted = 
    Table[Piecewise[{{0, 
        tau[satData[[j, i]], gradeData[[j, i]], majorData[[j, i]]] + noiseData[[j,i]] < 
         admitThreshold+tau[satMean, gradeMean, majorMean]}, {1, 
        tau[satData[[j, i]], gradeData[[j, i]], majorData[[j, i]]] + noiseData[[j,i]] >= 
         admitThreshold+tau[satMean, gradeMean, majorMean]}}] , {j, 1, 2}, {i, 1, n[[j]]}];
   
   (* Datset labeling & creation *)
   allDataTable = 
    Flatten[Table[{i, j, satData[[j, i]], gradeData[[j, i]], 
       majorData[[j, i]], admitScores[[j, i]], admitted[[j, i]], 
       outcomesData[[j, i]]}, {j, 1, 2}, {i, 1, 
       Length[satData[[j]]]}], 1];
   
   PrependTo[
    allDataTable, {"ID", "Group", "SAT", "GPA", "STEM", "Score", 
     "Admitted", "Outcome"}];
   
   ds = Dataset[
     AssociationThread[First@allDataTable, #] & /@ Rest@allDataTable];
   
   (* Output Variable Storage *)
   
   ppv = Table[-1, {g, 1, 2}];
   npv = Table[-1, {g, 1, 2}];
   tpr = Table[-1, {g, 1, 2}];
   fpr = Table[-1, {g, 1, 2}];
   avgSAT = Table[-1, {g, 1, 2}];
   avgGPA = Table[-1, {g, 1, 2}];
   avgSTEM = Table[-1, {g, 1, 2}];
   num = Table[-1, {g, 1, 2}];
   
   (* Compute Overall Applicant Stats *)
   
   applicantPoolavgSAT = N[Mean[d[All, "SAT"]]];
   applicantPoolavgGPA = N[Mean[d[All, "GPA"]]];
   applicantPoolavgSTEM = N[Mean[d[All, "STEM"]]] - 1;
   
   
   (* Compute Overall Admissions Stats *)
   
   
   d = ds;
   numAdmits = Length[d[Select[#Admitted == 1 &]]];
   numRejects = Length[d[Select[#Admitted == 0 &]]];
   numQualified = Length[d[Select[#Outcome == 1 &]]];
   numUnqualified = Length[d[Select[#Outcome == 0 &]]];
   numQualifiedAdmits = 
    Length[d[Select[#Outcome == 1 && #Admitted == 1 &]]];
   numUnqualifiedAdmits = 
    Length[d[Select[#Outcome == 0 && #Admitted == 1 &]]];
   numQualifiedRejects = 
    Length[d[Select[#Outcome == 1 && #Admitted == 0 &]]];
   numUnqualifiedRejects = 
    Length[d[Select[#Outcome == 0 && #Admitted == 0 &]]];
   
   overallppv = Round[N[numQualifiedAdmits/numAdmits],0.01];
   overallnpv = Round[N[numUnqualifiedRejects/numRejects],0.01];
   overalltpr = Round[N[numQualifiedAdmits/numQualified],0.01];
   overallfpr = Round[N[numUnqualifiedAdmits/numUnqualified],0.01];
   
   overallavgSAT = Round[N[Mean[d[Select[#Admitted == 1 &], "SAT"]]],0.01];
   overallavgGPA = Round[N[Mean[d[Select[#Admitted == 1 &], "GPA"]]],0.01];
   overallavgSTEM = Round[N[Mean[d[Select[#Admitted == 1 &], "STEM"]]],0.01] - 1;
   
   
   
   Do[
    d = ds[Select[#Group == g &]];
    numAdmits = Length[d[Select[#Admitted == 1 &]]];
    num[[g]]=numAdmits;
    numRejects = Length[d[Select[#Admitted == 0 &]]];
    numQualified = Length[d[Select[#Outcome == 1 &]]];
    numUnqualified = Length[d[Select[#Outcome == 0 &]]];
    numQualifiedAdmits = 
     Length[d[Select[#Outcome == 1 && #Admitted == 1 &]]];
    numUnqualifiedAdmits = 
     Length[d[Select[#Outcome == 0 && #Admitted == 1 &]]];
    numQualifiedRejects = 
     Length[d[Select[#Outcome == 1 && #Admitted == 0 &]]];
    numUnqualifiedRejects = 
     Length[d[Select[#Outcome == 0 && #Admitted == 0 &]]];
    
    ppv[[g]] = Round[N[numQualifiedAdmits/numAdmits],0.01];
    npv[[g]] = Round[N[numUnqualifiedRejects/numRejects],0.01];
    tpr[[g]] = Round[N[numQualifiedAdmits/numQualified],0.01];
    fpr[[g]] = Round[N[numUnqualifiedAdmits/numUnqualified],0.01];
    
    avgSAT[[g]] = Round[N[Mean[d[Select[#Admitted == 1 &], "SAT"]]],0.01];
    avgGPA[[g]] = Round[N[Mean[d[Select[#Admitted == 1 &], "GPA"]]],0.01];
    avgSTEM[[g]] = Round[N[Mean[d[Select[#Admitted == 1 &], "STEM"]] - 1],0.01];
    
    
    , {g, 1, 2}];
   

   PrependTo[num,num[[1]]+num[[2]]];
   PrependTo[ppv, Round[overallppv,0.01]];
   PrependTo[npv, Round[overallnpv,0.01]];
   PrependTo[tpr, Round[overalltpr,0.01]];
   PrependTo[fpr, Round[overallfpr,0.01]];
   PrependTo[num,Total[n]];
   PrependTo[ppv,"---"];
   PrependTo[npv,"---"];
   PrependTo[tpr,"---"];
   PrependTo[fpr,"---"];
   PrependTo[avgSAT, overallavgSAT];
   PrependTo[avgGPA, overallavgGPA];
   PrependTo[avgSTEM, overallavgSTEM];
   PrependTo[avgSAT,Round[applicantPoolavgSAT,0.01]];
   PrependTo[avgGPA,Round[applicantPoolavgGPA,0.01]];
   PrependTo[avgSTEM,Round[applicantPoolavgSTEM,0.01]];
   
   summaryOutput=Transpose[Join[{{"Applicants", "All Admits", "Group 1 Admits", 
       "Group 2 Admits"}},{num},{ppv}, {npv}, {tpr}, {fpr}, {avgSAT}, {avgGPA}, {avgSTEM}]];
    
   PrependTo[summaryOutput,{" ","#", "PPV", "NPV", "TPR", "FPR", "Avg SAT", 
       "Avg GPA", "% STEM"}]

   Print[TableForm[summaryOutput]];
       
    Print["\n"];
    
    Print["Emory's Expected Payoff: NumAdmits * (Overall TPR - Overall FPR)/2 = ", num[[2]]*(overalltpr-overallfpr)/2];
    
    Print["\n"];

   
   ]

### Step 3 (Parameter Setting & Evaluation): Set your 3 coefficients for a complete admissions proposal

#### Hint: these values should probably be positive!  I have set the default coefficients at zero.  These are "almost surely not the best," though! 

In [15]:
(* Step 1: SATCofficient is the weight on SAT scores: this can be any real number *)
Print["The SAT test score coefficient is ", SATCoefficient = 0  ];

(* Step 2: GPACofficient is the weight on high school GPA: this can be any real number *)
Print["The high school GPA cofficient is ", GPACoefficient = 0   ];

(* Step 3: STEMCofficient is the weight on interest in a STEM major: this can be any real number *)
Print["The STEM major coefficient is ", STEMCoefficient = 0  ];

Print["Your policy would yield the following summary statistics:"]
Print["\n"]
acode[SATCoefficient,GPACoefficient,STEMCoefficient]






The SAT test score coefficient is 0
The high school GPA cofficient is 0
The STEM major coefficient is 0
Your policy would yield the following summary statistics:


                 #     PPV    NPV    TPR    FPR    Avg SAT   Avg GPA   % STEM

Applicants       150   ---    ---    ---    ---    1044.32   2.59      0.38

All Admits       95    0.52   0.42   0.6    0.67   983.55    2.53      0.32

Group 1 Admits   67    0.51   0.48   0.67   0.67   975.49    2.57      0.33

Group 2 Admits   28    0.54   0.32   0.5    0.65   1002.82   2.41      0.29


Emory's Expected Payoff: NumAdmits * (Overall TPR - Overall FPR)/2 = -3.325




## <b>The values above are key to answering and justifying your answers.</b>

### You should copy these down... 

### Taking a screenshot is useful, but you can repeat the analysis as many times as you want: all of "randomness" in the data was initialized in the first step of the notebook, above.

### When you find a proposal you find interesting --- you will need these to provide rationales for your proposals and why you recommend the one you choose.

#### <b>Note: </b> Emory's expected payoff is _not_ the only criterion you should consider. <b>You need to consider both Emory's goal and other issues, such as</b> _fairness_.

##### (As a side note: If Emory simply didn't admit _anyone_, <b>it would receive a payoff equal to zero</b>. _Why?_)

#### The goal, again, is <b>not</b> to find "the best" coefficients: it is to offer a succinct but thorough justification for three proposals and _which one to choose_ among the three.