## Lab 1: A First-Pass Hiring Filter

*Lab adapted from an exercise developed by [Evan Peck](https://www.eg.bucknell.edu/~emp017/)*

### Scenario

Imagine you are working for *Moogle*, a well-known tech company that receives tens of thousands of job applications from graduating seniors every year.

Since the company receives too many job applications for HR to individually assess in a reasonable amount of time, you are asked to create a program that algorithmically analyzes applications and selects the ones most worth passing on to HR.

[Sound](https://qz.com/1427621/companies-are-on-the-hook-if-their-hiring-algorithms-are-biased/) [familiar](https://mashable.com/article/amazon-sexist-recruiting-algorithm-gender-bias-ai/)?

### Applicant Data

It's difficult to create these first-pass cuts, so *Moogle* designs their application forms to collect some numerical data about their applicants' education. Job applicants must enter the grades they received in 6 core CS courses, as well as their overall GPA. For your convenience, this information will be stored in a Python `list` that you can access. (For more on Python lists, see [this notebook](../lists.ipynb).)

For example, a student who received the following scores...

- **Intro to CS:** 100
- **Data Structures:** 95
- **Software Engineering:** 80
- **Algorithms:** 89
- **Computer Organization:** 91
- **Operating Systems:** 75
- **Overall College GPA:** 83

... would result in the following list: `[100, 95, 80, 89, 91, 75, 83]`. 

You can assume that index `0` is *always* Intro to CS, `1` is *always* Data Structures, and so on.
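Because the positions never change, one option is to give each index a readable name instead of remembering that `6` means GPA. The constant names below (`INTRO_CS`, `GPA`, and so on) are hypothetical, introduced only for illustration, and assume the index order listed above.

```python
# Hypothetical names for the fixed positions in each application list
INTRO_CS, DATA_STRUCTURES, SOFTWARE_ENG, ALGORITHMS, COMP_ORG, OPERATING_SYS, GPA = range(7)

app = [100, 95, 80, 89, 91, 75, 83]  # the example applicant above

print(app[ALGORITHMS])  # 89
print(app[GPA])         # 83
print(app[:GPA])        # slice of the six course grades: [100, 95, 80, 89, 91, 75]
```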

Because you are processing many applications, your program will receive a *list of lists*. For example, this would be the information for 3 applicants:

`[[100, 95, 80, 89, 91, 75, 83], [75, 80, 85, 90, 85, 88, 90], [85, 70, 99, 100, 81, 82, 91]]`
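Each applicant is one inner list, so reaching a single grade takes two indexes: the first picks the applicant, the second picks the grade. A minimal sketch using the three applicants above (the variable name `applicants` is just for illustration):

```python
applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]

print(len(applicants))   # 3 applicants
print(applicants[1][6])  # second applicant's overall GPA: 90

# Loop over the outer list to visit one applicant at a time
for number, app in enumerate(applicants, start=1):
    print("Applicant", number, "has a GPA of", app[6])
```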

### Your Task 
Your task is to:
1. Determine how you are going to select the top applicants to pass on to HR.
2. Given a list of applicant data (a *list of lists*), write a function that returns a new list of worthwhile candidates.
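One way to structure step 2 is to keep the looping logic in one function and pass the selection rule in as a separate function. The names `select_finalists` and `gpa_above_80` are hypothetical, and the GPA rule is simply the starting criterion used later in this lab; treat this as a sketch, not the required design.

```python
def select_finalists(apps, meets_criteria):
    """Return a new list of the apps for which meets_criteria(app) is True."""
    finalists = []
    for app in apps:
        if meets_criteria(app):
            finalists.append(app)
    return finalists


def gpa_above_80(app):
    return app[6] > 80  # index 6 holds the overall college GPA


applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]

print(select_finalists(applicants, gpa_above_80))             # all three pass
print(select_finalists(applicants, lambda app: app[6] > 85))  # only the last two pass
```

Because the rule is a parameter, trying a new criterion only means writing another small function (or `lambda`), and the input list is never modified.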

### The Data
We'll be working with two datasets for this task. The first is `example_list`, which we can load just below:

In [None]:
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

example_list

The second is a larger dataset containing ten thousand randomly generated applicants. It's stored in a standalone file, `allApps.py`, which we'll use once we've gotten something working. We can load it as follows:

In [None]:
# %load pastes the contents of allApps.py into this cell; run the cell again to define allApps
%load allApps.py


In [None]:
# just check to see what it looks like

allApps

### The Code

I've prepared some code that, given all of the applicant data, returns the applications that meet a particular criterion.

To begin, let's use a single criterion: an overall college GPA above 80.

For our data, we'll use `example_list` to start out. 

Remember the format of each app:

- `[0]` Intro to CS: 100
- `[1]` Data Structures: 95
- `[2]` Software Engineering: 80
- `[3]` Algorithms: 89
- `[4]` Computer Organization: 91
- `[5]` Operating Systems: 75
- `[6]` Overall College GPA: 83

In [None]:
finalists = list() # create a list to hold the finalists -- the ones that meet our standard

for app in example_list: # this iterates through each of the apps in the example_list
    if app[6] > 80: # index 6 (the seventh item in the list) is the overall college GPA
        finalists.append(app) # add the app to the finalists list

finalists
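
The same GPA filter can also be written as a list comprehension, which builds the new list in a single expression and keeps exactly the apps the loop keeps. The sketch repeats `example_list` so that it runs on its own.

```python
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

# Keep every app whose overall GPA (index 6) is above 80
finalists = [app for app in example_list if app[6] > 80]

print(len(finalists))  # 4
```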

So that gives us four applicants that make the first cut. Now let's try a few more methods of winnowing the pack. Below, complete the code to return all applicants that have no grade below 65.

In [None]:
finalists = list() # create a list to hold the finalists -- the ones that meet our standard

for app in example_list: # this iterates through each of the apps in the example_list
    if app[0] >= 65 and app[1] >= 65: # ... extend this condition to check the remaining grades
        finalists.append(app)
        
finalists
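
One possible approach to the exercise above, for checking your own work: instead of chaining one comparison per index, Python's built-in `min()` finds the lowest grade directly. This sketch assumes "no grade below 65" applies to all seven numbers, including the overall GPA; use `min(app[:6])` if you only want the six course grades.

```python
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

finalists = []
for app in example_list:
    if min(app) >= 65:  # the lowest of the seven numbers must be at least 65
        finalists.append(app)

print(finalists)
```

On `example_list`, every applicant has at least one value below 65, so this filter keeps no one under either interpretation.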

Now let's try a filter where we accept applicants that have at least 4 grades ABOVE 85.

In [None]:
finalists = list() # create a list to hold the finalists -- the ones that meet our standard

for app in example_list: # this iterates through each of the apps in the example_list

    # your code here!
    # hint: you might want to start with a counter
    pass # placeholder so the cell runs; replace it with your filter
finalists
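
One possible approach using the counter suggested in the hint. Like the previous sketch, it assumes all seven numbers (GPA included) count as grades; counting only `app[:6]` gives a different result.

```python
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

finalists = []
for app in example_list:
    count = 0  # how many of this applicant's grades are above 85
    for grade in app:
        if grade > 85:
            count += 1
    if count >= 4:  # "at least 4" means 4 or more
        finalists.append(app)

print(finalists)
```

Under this assumption three applicants from `example_list` pass; counting only the six course grades, none do.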

Let's do one more: keep applicants whose average grade is above 80.

In [None]:
finalists = list() # create a list to hold the finalists -- the ones that meet our standard

for app in example_list: # this iterates through each of the apps in the example_list

    # your code here!
    pass # placeholder so the cell runs; replace it with your filter
            
finalists
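
One possible approach: `sum(app) / len(app)` gives the mean of the values in a list. This sketch again treats all seven numbers, GPA included, as grades; whether the GPA should be averaged in with the course grades is itself a design choice.

```python
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

finalists = []
for app in example_list:
    average = sum(app) / len(app)  # mean of all seven numbers
    if average > 80:
        finalists.append(app)

print(finalists)
```

On `example_list` this keeps three applicants; averaging only the six course grades (`app[:6]`) happens to keep the same three.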

After writing, testing, and considering the tradeoffs of these four methods, write your own criteria in the cell below.

First test it on the `example_list` data. When you've got it working, try it again with the `allApps` data. 

In [None]:
finalists = list() # create a list to hold the finalists -- the ones that meet our standard

for app in example_list: # replace example_list with allApps when you've got your filter working

    # your criteria here
    pass # placeholder so the cell runs; replace it with your filter
            
finalists
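
If your criterion combines several conditions, one option is to put it in its own function so the same rule can run on `example_list` first and on `allApps` later without copying the loop. The names `my_criterion` and `apply_criterion` and the thresholds below are hypothetical placeholders, not a recommended rule.

```python
example_list = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]


def my_criterion(app):
    # Hypothetical rule: GPA (index 6) above 80 AND no value below 60
    return app[6] > 80 and min(app) >= 60


def apply_criterion(apps):
    # Keep only the apps that satisfy my_criterion
    return [app for app in apps if my_criterion(app)]


finalists = apply_criterion(example_list)  # later: apply_criterion(allApps)
print(finalists)
```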

### Questions to consider

1. What criteria did you choose to select finalists? How did you choose them?


2. Roughly what percentage of applicants does your algorithm pass on as finalists? Is that enough? If _Moogle_ asked you to take a more aggressive approach with your algorithm, are there any tradeoffs?


In [None]:
# some code to help you calculate the percentage of applicants your algorithm kept
# (this assumes finalists was built from allApps, not example_list)

for finalist in finalists:
    print(finalist)
print("Your algorithm kept", round(len(finalists)/len(allApps)*100), "percent of applicants")
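The cell above always divides by `len(allApps)`, so its percentage is only meaningful when `finalists` came from `allApps`. A slightly more general sketch passes in whichever dataset you filtered; the helper name `percent_kept` is hypothetical.

```python
def percent_kept(finalists, apps):
    """Return the percentage of apps that ended up in finalists."""
    if not apps:  # avoid dividing by zero on an empty dataset
        return 0.0
    return len(finalists) / len(apps) * 100


applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]
finalists = [app for app in applicants if app[6] > 85]  # keeps 2 of 3

print("Your algorithm kept", round(percent_kept(finalists, applicants)), "percent of applicants")
```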