## Intro Lab: A First-Pass Hiring Filter

*Lab adapted from an exercise developed by [Evan Peck](https://www.eg.bucknell.edu/~emp017/)*

### Scenario

Imagine you are working for *Moogle*, a well-known tech company that receives tens of thousands of job applications from graduating college seniors every year.

Because the company receives more job applications than HR can individually assess in a reasonable amount of time, you are asked to create an algorithm that analyzes the data in the applications and selects the ones most worth passing on to HR.

[Sound](https://qz.com/1427621/companies-are-on-the-hook-if-their-hiring-algorithms-are-biased/) [familiar](https://mashable.com/article/amazon-sexist-recruiting-algorithm-gender-bias-ai/)?

### Applicant Data

It's difficult to create these first-pass cuts, so *Moogle* designs its application forms to collect some numerical data about applicants' education. Job applicants from QTM must enter the grades they received in 6 core courses, as well as their overall GPA. For your convenience, this will be stored in a Python `list` that you can access. (For more on Python lists, see [this notebook](https://github.com/laurenfklein/QTM340-Fall22/blob/main/notebooks/class3_lists.ipynb).)

For example, a student who received the following scores...

- **QTM 110 - Intro to Scientific Methods:** 100
- **QTM 150 - Intro to Statistical Computing I:** 95
- **QTM 151 - Intro to Statistical Computing II:** 80
- **QTM 210 - Probability and Statistics:** 89
- **QTM 220 - Regression Analysis:** 91
- **QTM 310 - Data Justice:** 75
- **Overall College GPA:** 83

... would result in the following list: `[100, 95, 80, 89, 91, 75, 83]`. 

You can assume that index `0` is *always* Intro to Scientific Methods, `1` is *always* Intro to Statistical Computing I, and so on.

Because you are processing many applications, your program will receive a *list of lists*. For example, this would be the information for 3 applicants:

`[ 
    [100, 95, 80, 89, 91, 75, 83], 
    [75, 80, 85, 90, 85, 88, 90], 
    [85, 70, 99, 100, 81, 82, 91] 
 ]`
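
As a minimal sketch of how this nested structure can be read, the snippet below stores the three applicants above in a variable and pulls out individual values by index. The variable names (`applicants`, `first_applicant`, and so on) are chosen here for illustration only.

```python
# The three example applicants from above; the variable name is illustrative.
applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]

first_applicant = applicants[0]  # the whole record for the first applicant
data_justice = applicants[0][5]  # index 5 is QTM 310 - Data Justice
overall_gpa = applicants[2][6]   # index 6 is the overall GPA (also reachable as [-1])

print(first_applicant)
print(data_justice, overall_gpa)
```

The first index picks the applicant, and the second index picks one value within that applicant's record.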

### Your Task 
Your task is to:
1. Determine how you are going to select the top applicants from QTM to pass on to HR.
2. Given a list of applicant data (a *list of lists*), write a function that returns a new list of worthwhile candidates.
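
One possible shape for such a function is sketched below. The names `select_finalists` and `gpa_at_least_85`, and the cutoff of 85, are hypothetical and used only for illustration; they are not part of the lab's provided code, and your own criteria will likely differ.

```python
def select_finalists(apps, keeps):
    """Return a new list holding every app for which keeps(app) is True."""
    finalists = []
    for app in apps:
        if keeps(app):
            finalists.append(app)
    return finalists


def gpa_at_least_85(app):
    # Hypothetical criterion, for illustration only:
    # overall GPA (index 6) of 85 or more.
    return app[6] >= 85


# The three example applicants from the "Applicant Data" section.
sample = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]

print(select_finalists(sample, gpa_at_least_85))
```

Keeping the criterion in its own small function means you can swap in a different rule without touching the loop. The original list is left unchanged, since the function builds and returns a new one.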

### The Data
We'll be working with two datasets for this task. The first is `example_list`, which we can load just below:

In [26]:
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 69, 76]]

example_list

[[93, 89, 63, 88, 60, 73, 80],
 [100, 63, 57, 96, 58, 71, 78],
 [81, 91, 99, 78, 57, 87, 86],
 [81, 73, 100, 57, 91, 60, 66],
 [86, 89, 64, 81, 69, 93, 92],
 [78, 63, 88, 95, 59, 98, 90],
 [55, 74, 68, 55, 69, 94, 80],
 [64, 77, 75, 92, 77, 72, 83],
 [95, 58, 92, 62, 77, 64, 59],
 [94, 78, 84, 83, 68, 69, 76]]

The second is a larger dataset, which contains a list of ten thousand randomly generated applicants. It's stored in a standalone file, which we'll use once we have something working. We can load it as follows:

In [28]:
%load allApps.py

[[93, 89, 63, 88, 60, 73, 80],
 [100, 63, 57, 96, 58, 71, 78],
 [81, 91, 99, 78, 57, 87, 86],
 [81, 73, 100, 57, 91, 60, 66],
 [86, 89, 64, 81, 69, 93, 92],
 [78, 63, 88, 95, 59, 98, 90],
 [55, 74, 68, 55, 69, 94, 80],
 [64, 77, 75, 92, 77, 72, 83],
 [95, 58, 92, 62, 77, 64, 59],
 [94, 78, 84, 83, 68, 63, 76],
 [82, 96, 79, 89, 87, 93, 61],
 [63, 92, 79, 86, 58, 79, 69],
 [87, 73, 62, 59, 77, 94, 82],
 [92, 60, 81, 85, 61, 58, 81],
 [99, 66, 98, 60, 96, 80, 91],
 [56, 76, 76, 88, 73, 72, 91],
 [77, 75, 83, 60, 95, 75, 95],
 [55, 90, 80, 90, 78, 99, 70],
 [59, 57, 61, 69, 93, 88, 96],
 [80, 76, 91, 71, 89, 78, 59],
 [96, 66, 91, 95, 55, 77, 90],
 [68, 77, 70, 79, 59, 88, 97],
 [93, 78, 78, 71, 58, 92, 72],
 [84, 71, 69, 99, 63, 100, 67],
 [79, 81, 74, 91, 66, 89, 62],
 [80, 88, 80, 60, 81, 72, 66],
 [70, 63, 57, 88, 81, 61, 92],
 [81, 100, 86, 97, 72, 71, 58],
 [79, 99, 70, 72, 76, 66, 70],
 [94, 97, 68, 55, 88, 91, 70],
 [66, 89, 97, 66, 90, 71, 100],
 [61, 55, 76, 56, 59, 73, 71],
 [5

### The Code

We (your instructors) have prepared some code that, given all of the applicant data, returns the most qualified applicants according to a particular criterion.

To begin, let's use this criterion: the applicant has an overall GPA above 80.

For our data, we'll use `example_list` to start out. 

Remember the format of each app:

`[0]` - QTM 110 - Intro to Scientific Methods: 100

`[1]` - QTM 150 - Intro to Statistical Computing I: 95

`[2]` - QTM 151 - Intro to Statistical Computing II: 80

`[3]` - QTM 210 - Probability and Statistics: 89

`[4]` - QTM 220 - Regression Analysis: 91

`[5]` - QTM 310 - Data Justice: 75

`[6]` - Overall College GPA: 83

In [15]:
finalists = list() # create a list to hold the finalists 
                   # that meet our standard

for app in example_list: # this iterates through each of the apps 
                         # in the example_list
    if app[6] > 80: # remember that index 6 (the 7th and last item
                    # in the list) is the overall college GPA;
                    # this looks for a GPA greater than 80
        finalists += [app] # and then, if there's a match,
                           # it adds the app to the finalists list

finalists

[[81, 91, 99, 78, 57, 87, 86],
 [86, 89, 64, 81, 69, 93, 92],
 [78, 63, 88, 95, 59, 98, 90],
 [64, 77, 75, 92, 77, 72, 83]]

So that gives us four applicants that make the first cut. Now let's try a few more methods of winnowing the pack. Below, complete the code to return all applicants that have no grade below 65.

In [29]:
finalists = list() # create a list to hold the finalists 
for app in example_list: # this iterates through each of the apps 
    
    # below is a very clunky way to spell out what we're looking for,
    # but it should be easy logic for you to understand. 
    
    if app[0] >= 65 and app[1] >= 65 and app[2] >= 65 and app[3] >= 65 and app[4] >= 65 and app[5] >= 65:
        finalists += [app]
    
finalists

[[94, 78, 84, 83, 68, 69, 76]]
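
The long chain of `and` conditions can be shortened. As a sketch, the version below slices out the six course grades and checks the lowest one with the built-in `min()`. Like the cell above, it assumes the overall GPA at index `6` is not part of the check; the name `course_grades` is introduced here for illustration.

```python
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78],
                [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66],
                [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90],
                [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83],
                [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 69, 76]]

finalists = []
for app in example_list:
    course_grades = app[:6]  # indexes 0 through 5; leaves out the overall GPA
    if min(course_grades) >= 65:  # the lowest course grade must be 65 or higher
        finalists.append(app)

print(finalists)
```

If the lowest grade clears the bar, every other grade does too, so a single comparison replaces six.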

Let's do one more: filter applicants that have an average grade (including overall GPA) above 80. 

*Hint: `sum()` is a built-in Python function that works on lists just as you would expect. `mean()` does too, but it must first be imported from the standard `statistics` module.*

In [21]:
finalists = list() # create a list to hold the finalists 

for app in example_list: # iterate through each of the apps 

    # your code here! 
    
    # one possible solution: average all seven values
    # (the six course grades plus the overall GPA)
    
    average = sum(app) / len(app)
            
    if average > 80:
        finalists += [app]
            
finalists
    
    

[[81, 91, 99, 78, 57, 87, 86],
 [86, 89, 64, 81, 69, 93, 92],
 [78, 63, 88, 95, 59, 98, 90]]
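
As an alternative sketch, the same filter can be written with `mean()` from Python's standard `statistics` module, which avoids hard-coding the number of values in each record. The list comprehension is a stylistic choice, not a requirement of the lab.

```python
from statistics import mean

example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78],
                [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66],
                [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90],
                [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83],
                [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 69, 76]]

# Keep each app whose average over all seven values is above 80.
finalists = [app for app in example_list if mean(app) > 80]

print(finalists)
```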

## In-class Group Exercise

In your group, discuss the tradeoffs of these three methods. Then, write a filter with your own criteria in the cell below.

First test it on the `example_list` data. When you've got it working, try it again with the `allApps` data. 

In [22]:
finalists = list() # create a list to hold the finalists 

for app in example_list: # replace "example_list" with "allApps" 
                         # when you've got your filter working
    
    # your criteria here
    pass # placeholder so the cell runs; replace with your own filter
            
finalists

## In-Class Group Discussion 


### Discussion Question #1

In a few sentences, please explain the criteria you used to choose your finalists, and how you arrived at those criteria. 

*Your answer here....*

### Discussion Question #2

Here is some code to help you calculate the percentage of applicants that your algorithm passed along to HR as finalists. Run this code and then answer the questions below.

In [None]:
# some code to help you calculate the percentage of finalists your algorithm kept

for finalist in finalists:
    print(finalist)
print("Your algorithm kept", round(len(finalists)/len(allApps)*100), "percent of applicants")
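
Note that the percentage is only meaningful if the denominator is the same dataset you filtered. As a sketch, the hypothetical helper `percent_kept` below takes both lists explicitly, so it works whether you filtered `example_list` or `allApps`. It is demonstrated on `example_list` with the GPA-above-80 filter from earlier.

```python
def percent_kept(finalists, applicants):
    """Percentage of applicants kept as finalists, rounded to a whole number."""
    if not applicants:  # avoid dividing by zero on an empty dataset
        return 0
    return round(len(finalists) / len(applicants) * 100)


example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78],
                [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66],
                [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90],
                [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83],
                [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 69, 76]]

finalists = [app for app in example_list if app[6] > 80]

print("Your algorithm kept", percent_kept(finalists, example_list), "percent of applicants")
```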

What percentage of applicants does your algorithm pass on as finalists? Do you think that percentage is enough? 

*Your answer here....*

### Discussion Question #3

If Moogle asked you to take a more aggressive approach with your algorithm, would there be any tradeoffs? In a few sentences, explain what the tradeoffs might be.
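
One way to explore this question is to watch how the number of finalists changes as a cutoff gets stricter. The sketch below reuses the overall-GPA filter on `example_list`; the cutoffs 70, 80, and 90 are arbitrary values picked for illustration.

```python
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78],
                [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66],
                [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90],
                [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83],
                [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 69, 76]]

counts = {}  # maps each cutoff to the number of applicants it keeps
for cutoff in (70, 80, 90):  # arbitrary cutoffs, for illustration only
    kept = [app for app in example_list if app[6] > cutoff]
    counts[cutoff] = len(kept)
    print("GPA above", cutoff, "keeps", len(kept), "of", len(example_list))
```

Each stricter cutoff passes fewer applications to HR, which also means more applicants are rejected without a human ever reading their file.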



*Your answer here...*

At this point, **save your copy of this notebook!!!**

**The following questions should be answered individually, and submitted via Canvas as your first lab assignment, due February 1st by 10am.** 

# Individual Homework Discussion Questions

Having designed your algorithm and discussed its tradeoffs in your group, and having discussed some additional considerations as a class, please answer the following questions. Your responses should range from a few sentences to a short paragraph. (Some questions may require longer responses than others.)

### Individual Homework Question #1

What systemic advantages/disadvantages is your algorithm likely to amplify?


*Your answer here....*

### Individual Homework Question #2

At this point in the course, what do you think it means to design a fair algorithm?


*Your answer here....*

### Individual Homework Question #3

What is the human cost of efficiency? More permissive algorithms may capture more interesting candidates, but they also mean more costly human work. What is the ideal balance?

*Your answer here....*