# How allocation by lottery sacrifices some quality but gains increased diversity through bias reduction

Here I show how a weighted lottery can be used in the following situation to gain a better outcome than selecting the leading candidate outright:
1. a scarce, monolithic resource is to be allocated to one of two candidates
2. it's a close call between the candidates
3. diversity in the population is valued with respect to some property
4. there is likely to be bias in the measurement of said property
5. that bias cannot be controlled for effectively in the measurement

First, we import the functions that will help us generate random parameters and make random choices.

In [9]:
from random import random, choice, gauss

Next, we define the general experiment parameters. We already said it's a close call between the candidates. That means one performed slightly better on the measurement than the other. We label the first one 'HIGH' and the second one 'LOW'. The measurement is unlikely to be perfect, so we allow the "real" quality of each candidate to vary according to a normal distribution. The two numbers associated with each quality type are the mean and standard deviation of the quality distribution from which a candidate is drawn. The number N is the number of experimental runs: averaging over many values chosen pseudo-randomly from the defined distributions gives results close to expectation.

In [31]:
quality = {
    'LOW': (0.45, 0.1), 
    'HIGH': (0.55, 0.1)
}

N = 1000000
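
As a rough check on why this many runs is enough, the standard error of an average over N independent draws is the standard deviation divided by the square root of N. The snippet below is a minimal sketch of that calculation; the names `sig`, `n_runs` and `standard_error` are introduced here for illustration only.

```python
from math import sqrt

sig = 0.1          # standard deviation used for both quality distributions
n_runs = 1000000   # same value as N above

# Standard error of the mean of n_runs independent draws
standard_error = sig / sqrt(n_runs)
print("standard error: %.4f" % standard_error)
```

With an error of this order, averages reported to three decimal places should rarely change from one run to the next.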

## Simple setup: no systemic bias

Here we generate our candidate pairs for the experiment. For each one of the N pairs we generate two candidates: one 'HIGH' and one 'LOW'. As stated above, we pseudo-randomly select a quality value for each candidate from the corresponding normal distribution.

In [32]:
#without bias
def get_candidates(N):
    res = []
    for i in range(N):
        for q in list(quality.keys()):
            mu, sig = quality[q]
            res.append((q, gauss(mu, sig)))
    return res

First, let's see what happens if we always pick the candidate who appears slightly better. Note they might not always *be* better, because the normal distributions overlap (sometimes a poorer candidate appears more qualified than a better candidate through chance error in the evaluation, though in expectation the better candidate performs better).

In [38]:
#pick top, pairwise
candidates = get_candidates(N)
total = 0
num_high = 0
for i in range(0, N * 2, 2):
    if candidates[i][0] == 'HIGH':
        index = i
    else:
        index = i + 1
    total += candidates[index][1]
    if candidates[index][0] == 'HIGH':
        num_high += 1
print("average q: %.3f" % (total / N))
print("%% high: %.3f" % (num_high / N))

average q: 0.550
% high: 1.000


As you can see above, the average quality across N experiments is very close to the average quality of 'HIGH' candidates, and in 100% of cases we picked the 'HIGH' candidate. As expected!
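
The overlap between the two distributions can also be quantified directly. The sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later) and assumes the two quality draws are independent, as they are in `get_candidates`; `diff` and `p_low_better` are illustrative names.

```python
from math import sqrt
from statistics import NormalDist

quality = {'LOW': (0.45, 0.1), 'HIGH': (0.55, 0.1)}
mu_low, sig_low = quality['LOW']
mu_high, sig_high = quality['HIGH']

# The difference (HIGH draw minus LOW draw) of two independent normals is normal
diff = NormalDist(mu_high - mu_low, sqrt(sig_high ** 2 + sig_low ** 2))

# Probability that the 'LOW' candidate's drawn quality exceeds the 'HIGH' one's
p_low_better = diff.cdf(0)
print("P(LOW draw > HIGH draw): %.3f" % p_low_better)
```

Under these parameters the 'LOW' candidate turns out to be the better one in roughly a quarter of the pairs, which is what makes this a close call.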

Now let's see what happens if we instead use a weighted lottery to pick our candidate. We'll give the better candidate a higher chance of winning (this makes sense), but we still give the lower-performing candidate a chance. The odds are decided based on the relative qualities of the candidates (we use the mean of the normal distribution for each). Because the means sum to 1 we don't need to explicitly divide by the sum of the means - the mean quality for each candidate is their probability of winning the lottery (0.55 and 0.45 for 'HIGH' and 'LOW' respectively).

In [39]:
#weighted lottery, pairwise
candidates = get_candidates(N)
total = 0
num_high = 0
for i in range(0, N * 2, 2):
    toss = random()
    q = candidates[i][0]
    if toss < quality[q][0]:
        index = i
    else:
        index = i + 1
    total += candidates[index][1]
    if candidates[index][0] == 'HIGH': 
        num_high += 1
print("average q: %.3f" % (total / N))
print("%% high: %.3f" % (num_high / N))

average q: 0.505
% high: 0.550


As you can see, we have picked the 'HIGH' candidate around 55% of the time, which is close to the expected value. The average quality of the chosen candidate is a bit above 0.5, which is to be expected. This demonstrates the reason we often pick the best candidate instead of using a lottery (even a weighted one) - over time, a weighted lottery picks lower-quality candidates on average than going with what the evaluation tells us. So why should you consider a weighted lottery anyway?
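
The simulated figures for this simple setup can be cross-checked in closed form. The sketch below normalises the means explicitly, so it would also work if they did not sum to 1, and computes the expected quality under each selection rule. `lottery_odds` is a hypothetical helper written for this check, not part of the experiment code.

```python
quality = {'LOW': (0.45, 0.1), 'HIGH': (0.55, 0.1)}

def lottery_odds(quality):
    """Winning probability per label: each mean divided by the sum of the means."""
    total = sum(mu for mu, _ in quality.values())
    return {q: mu / total for q, (mu, _) in quality.items()}

odds = lottery_odds(quality)

# Always picking 'HIGH' yields the mean quality of the 'HIGH' distribution
expected_top = quality['HIGH'][0]
# The lottery yields each label's mean quality weighted by its odds of winning
expected_lottery = sum(odds[q] * quality[q][0] for q in quality)

print("pick top: %.3f" % expected_top)
print("lottery:  %.3f" % expected_lottery)
```

Both values agree with the simulated averages to three decimal places.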

## Adding diversity and bias

The simple setup above shows that if we only care about quality of candidates we should always pick the top performer in our evaluation, even when the race is close and the test has some random variability. In that setup, however, there was no scope for diversity: candidates only differ from each other based on the quality measured in their evaluation.

Let's make things more interesting, and add a second trait, which is not being measured as part of the evaluation. This could be something tightly connected to candidates' identities (gender, race, age, disability, etc.) or something else entirely (innovativeness vs traditionalism, preference for one method/approach over another, location, etc.).

We are going to make explicit a few assumptions about this trait:
1. The trait is binary: one can only be an 'A' or a 'B' with respect to this trait
1. The population is evenly split along this trait: 50% 'A's and 50% 'B's
1. Quality (the thing our evaluation measures) is not affected by the traits: there are just as many 'HIGH' 'A's as 'HIGH' 'B's, and similarly for 'LOW'
1. We can't or don't want to measure the trait explicitly, or we disregard the trait when making a selection decision

There is, however, a difference in how 'A's and 'B's perform on our test. The test has different false-positive and false-negative rates, depending on whether you're an 'A' or a 'B'. This is parametrised below by two numbers per trait. We will label the trait the test favours (accidentally or by design) the "dominant" or 'DOM' trait (it could be either 'A' or 'B', and in different worlds could end up different). We will label the other trait the "recessive" or 'REC' trait. The two numbers say how frequently the test mislabels the quality of the candidate. The first number says how often a 'HIGH' is misclassified as a 'LOW' (which is unfortunate for the candidate and unfortunate for us making the selection). For simplicity we set this to zero for the 'DOM' population, but non-zero for the 'REC' population (so a 'HIGH DOM' is always measured as 'HIGH', but a 'HIGH REC' is sometimes measured as 'LOW'). The second number tracks the opposite: how often a 'LOW' is misclassified as a 'HIGH' (which is fortunate for the candidate but not for us). Here we do the reverse and set it to zero for the 'REC' population but non-zero for the 'DOM' population, so a 'LOW REC' is always measured as 'LOW', whereas a 'LOW DOM' is sometimes measured as 'HIGH'.


In [35]:
types = {
    'DOM': (0, 0.2),
    'REC': (0.2, 0)
}

We can now track diversity. Since 'DOM' and 'REC' are different traits, and they occur in equal proportions in the population, our selection will be more diverse (independent of quality) if we pick 'DOM's and 'REC's at equal frequencies. Note that this can be easily achieved if we measure the trait and make that part of our selection process (e.g. via quotas or positive discrimination), but we stipulated above that this is not one of those cases.
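
To see how far the biased test moves the measured groups away from the 50/50 target, we can apply the error rates to an evenly split population. This is a minimal sketch under one extra assumption that the text does not state explicitly: true 'HIGH' and true 'LOW' candidates are equally common. `measured_share`, `p_trait` and `p_true_high` are illustrative names.

```python
types = {'DOM': (0, 0.2), 'REC': (0.2, 0)}

p_trait = 0.5      # each trait makes up half of the population
p_true_high = 0.5  # assumption: true 'HIGH' and true 'LOW' are equally common

def measured_share(trait, label):
    """Fraction of the whole population that has `trait` and is measured as `label`."""
    high_as_low, low_as_high = types[trait]
    if label == 'HIGH':
        p = p_true_high * (1 - high_as_low) + (1 - p_true_high) * low_as_high
    else:
        p = p_true_high * high_as_low + (1 - p_true_high) * (1 - low_as_high)
    return p_trait * p

dom_share = {}
for label in ('HIGH', 'LOW'):
    dom = measured_share('DOM', label)
    rec = measured_share('REC', label)
    dom_share[label] = dom / (dom + rec)
    print("share of DOM among measured %s: %.2f" % (label, dom_share[label]))
```

Under these assumptions, 60% of the candidates measured as 'HIGH' are 'DOM', and only 40% of those measured as 'LOW'.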

So how should we select candidates in this case, aiming simultaneously for high quality and high diversity?

First, we need to adjust our candidate generation procedure to include the type of the candidate. This is where we make the bias in the measurement explicit, based on the two numbers introduced above.

In [45]:
#with bias
def get_candidates(N):
    res = []
    for i in range(N):
        for q in list(quality.keys()):
            mu, sig = quality[q]
            toss1 = random()
            toss2 = random()
            if q == 'HIGH':
                if toss1 < 0.5 + 0.5 * types['DOM'][1] - 0.5 * types['REC'][1]:
                    t = 'DOM'
                else:
                    t = 'REC'
                if toss2 < types[t][1]:  # A 'LOW' candidate misclassified as 'HIGH'
                    mu, sig = quality['LOW']
            else:
                if toss1 < 0.5 + 0.5 * types['DOM'][0] - 0.5 * types['REC'][0]:
                    t = 'DOM'
                else:
                    t = 'REC'
                if toss2 < types[t][0]: # A 'HIGH' candidate misclassified as 'LOW'
                    mu, sig = quality['HIGH']
            res.append((q, gauss(mu, sig), t))
    return res

What happens in this case when we always pick the top candidate?

In [46]:
#pick top, pairwise
candidates = get_candidates(N)
total = 0
num_high = 0
num_dom = 0
for i in range(0, N * 2, 2):
    if candidates[i][0] == 'HIGH':
        index = i
    else:
        index = i + 1
        
    total += candidates[index][1]
    if candidates[index][0] == 'HIGH':
        num_high += 1
    if candidates[index][2] == 'DOM':
        num_dom += 1
print("average q: %.3f" % (total / N))
print("%% high: %.3f" % (num_high / N))
print("%% dom: %.3f" % (num_dom / N))

average q: 0.538
% high: 1.000
% dom: 0.600


We always pick the 'HIGH' candidate, but now this is sometimes a 'LOW DOM' misclassified as 'HIGH', so the average quality is lower than before. We're also not doing so well on diversity: 'DOM's are picked 60% of the time, even though they are only 50% of the population.
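
These two figures can be reproduced by hand. The sketch below mirrors the generative steps in `get_candidates` rather than re-deriving the error model, so it inherits the same simplifications; `p_dom`, `p_swapped` and `expected_q` are names introduced for this check.

```python
quality = {'LOW': (0.45, 0.1), 'HIGH': (0.55, 0.1)}
types = {'DOM': (0, 0.2), 'REC': (0.2, 0)}

# Share of 'DOM' among candidates measured as 'HIGH' (same expression as in get_candidates)
p_dom = 0.5 + 0.5 * types['DOM'][1] - 0.5 * types['REC'][1]

# Chance that a measured 'HIGH' is drawn from the 'LOW' distribution instead
p_swapped = p_dom * types['DOM'][1] + (1 - p_dom) * types['REC'][1]

# Expected quality of a measured 'HIGH' candidate
expected_q = (1 - p_swapped) * quality['HIGH'][0] + p_swapped * quality['LOW'][0]

print("expected average q: %.3f" % expected_q)
print("expected share dom: %.3f" % p_dom)
```

In this model, 12% of the measured 'HIGH' candidates are drawn from the 'LOW' distribution, which accounts for the drop from 0.550 to 0.538.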

In [49]:
#weighted lottery, pairwise
candidates = get_candidates(N)
total = 0
num_high = 0
num_dom = 0
for i in range(0, N * 2, 2):
    toss = random()
    q = candidates[i][0]
    if toss < quality[q][0]:
        index = i
    else:
        index = i + 1
    total += candidates[index][1]
    if candidates[index][0] == 'HIGH': 
        num_high += 1
    if candidates[index][2] == 'DOM':
        num_dom += 1
print("average q: %.3f" % (total / N))
print("%% high: %.3f" % (num_high / N))
print("%% dom: %.3f" % (num_dom / N))

average q: 0.504
% high: 0.550
% dom: 0.510


We pick the top candidate 55% of the time. The average quality (0.504) is still lower than if we always pick the top candidate (0.538), but the lottery loses far less to the bias: its average is almost unchanged from the unbiased case (0.505), whereas always picking the top candidate dropped from 0.550 to 0.538. In return, our selection is less biased and more diverse - the weighted lottery still favours the 'DOM's (through the bias in the measurement), but they are now 51% of our chosen population, much closer to the desired level of diversity than the 60% we get if we always pick the top candidate.
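
The trade-off can be explored for other lottery weights as well. The following is a sketch only: it mirrors the generative steps of `get_candidates` in closed form, and `group_stats`, `lottery_outcome` and `p_high` (the probability that the candidate measured as 'HIGH' wins) are hypothetical names, not part of the experiment code.

```python
quality = {'LOW': (0.45, 0.1), 'HIGH': (0.55, 0.1)}
types = {'DOM': (0, 0.2), 'REC': (0.2, 0)}

def group_stats(label):
    """Expected quality and 'DOM' share for candidates measured as `label`."""
    k = 1 if label == 'HIGH' else 0              # which error rate applies
    other = 'LOW' if label == 'HIGH' else 'HIGH'
    p_dom = 0.5 + 0.5 * types['DOM'][k] - 0.5 * types['REC'][k]
    p_swapped = p_dom * types['DOM'][k] + (1 - p_dom) * types['REC'][k]
    mean_q = (1 - p_swapped) * quality[label][0] + p_swapped * quality[other][0]
    return mean_q, p_dom

def lottery_outcome(p_high):
    """Expected quality and 'DOM' share when measured 'HIGH' wins with probability p_high."""
    q_high, dom_high = group_stats('HIGH')
    q_low, dom_low = group_stats('LOW')
    avg_q = p_high * q_high + (1 - p_high) * q_low
    share_dom = p_high * dom_high + (1 - p_high) * dom_low
    return avg_q, share_dom

for p_high in (1.0, 0.55, 0.5):
    avg_q, share_dom = lottery_outcome(p_high)
    print("p_high=%.2f  average q: %.3f  share dom: %.3f" % (p_high, avg_q, share_dom))
```

Here `p_high = 1.0` corresponds to always picking the top candidate and `p_high = 0.55` to the weighted lottery above, while `p_high = 0.5` would be a fair coin toss. In this model both quantities move linearly with `p_high`, so each step towards an even split costs a fixed amount of average quality.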