# Lab 1: How accurate is testing?

### The parameters of testing

There are a few important numbers that describe medical tests. The first is called *sensitivity*. It tells what fraction of sick people do in fact test positive. Of course, we would like our test to detect all sick people (i.e., sensitivity = 1), but that’s rarely possible in real life. When our sensitivity is less than one, that means we get some false negatives – i.e., the test gives a negative result but it’s mistaken.

The second important number is the *specificity*. It tells what fraction of healthy people test negative. Of course, we want to have specificity = 1, but again that’s unlikely in real life. When specificity < 1, we get the occasional false positive – a healthy person winds up mistakenly testing positive.
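
As a minimal sketch of these two definitions, the helpers below compute each fraction from raw counts. The function names and the counts passed to them are hypothetical, chosen only for illustration; they do not come from any real test.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of sick people who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of healthy people who test negative."""
    return true_neg / (true_neg + false_pos)


# Made-up counts: 100 sick people (90 detected), 100 healthy people (80 cleared).
example_sensitivity = sensitivity(true_pos=90, false_neg=10)
example_specificity = specificity(true_neg=80, false_pos=20)
print(example_sensitivity, example_specificity)
```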

A well-run site using a high-quality Covid test (like we have at Tufts) may easily have sensitivity and specificity of .95 or higher. Does that mean our results are quite accurate? That’s what this lab will tell us.

### How the statistics work
Let’s take an example. Assume that there are 10,000 people in our testing pool and 1% of them (100 people) are infected. Assume that we use a test that has 98% sensitivity and 97% specificity. Then consider the following table:

<table>
<tr><th></th><th>Healthy</th><th>Infected</th></tr>
    <tr><td>Test positive</td><td>297</td><td>98</td></tr>
    <tr><td>Test negative</td><td>9603</td><td>2</td></tr>
</table>
    
Of the 100 infected people, our 98% sensitivity says that 98 people will correctly test positive and 2 will mistakenly test negative (i.e., a false negative). Of our 9900 healthy people, the 97% specificity says that 9603 will correctly test negative and 297 will result in a false positive.


Looking at these numbers, we see that 98+297=395 people test positive. Of these, 98 are true positives (i.e., are actually infected). Thus, if you receive a positive test, then your chance of being infected is 98/395, or roughly 25%. Surprisingly low! (Of course, if, in addition to receiving a positive test, you also feel lousy, are running a fever, and have lost your sense of smell, then your chance of being infected is probably a whole lot higher.)
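
The arithmetic in this worked example can be sketched in a few lines of Python. This is only an illustration using the example's own values (10,000 people, 1% infected); the lowercase variable names are assumptions made for this sketch and are not the constants you will define below.

```python
# Values from the worked example above (illustrative names).
n_people = 10_000
prevalence = 0.01   # fraction of the pool that is infected
sens = 0.98         # sensitivity
spec = 0.97         # specificity

infected = n_people * prevalence      # 100 people
healthy = n_people - infected         # 9900 people

true_pos = infected * sens            # infected and test positive
false_neg = infected - true_pos       # infected but test negative
true_neg = healthy * spec             # healthy and test negative
false_pos = healthy - true_neg        # healthy but test positive

# Chance that a person with a positive test is actually infected.
fraction_true = true_pos / (true_pos + false_pos)

print(round(true_pos), round(false_neg), round(true_neg), round(false_pos))
print(round(fraction_true, 3))
```

The four rounded counts should match the four cells of the table, and the final fraction is 98/395.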

But enough talking; let's do these calculations with Python.

Write code to define variables for the three input values.  It is a common coding practice to use `ALL_CAPS` for variable names which are *constants*, i.e., "magic numbers" that are used as inputs to the program and should not be modified.  Use the following values:

* `N_PEOPLE` : 200000
* `SENSITIVITY` : 0.98
* `SPECIFICITY` : 0.97

Assume that the fraction of the population that is infected remains at 1%.

In [None]:
# Your code here...



Now perform calculations similar to the ones described above, but using the variables `SENSITIVITY` and `SPECIFICITY` rather than the actual numbers .98 and .97. After those computations, print out a line like the following (your numbers will be different):

> Of the 395 total positives, 25% are true and 75% are false.

You may find the string `format` method useful: it replaces the curly braces `{}` with the numbers (or variables) you specify in the parentheses of `format()`.  Experiment with the code cell below to see how it works.

In [None]:
print("the lucky numbers are {} and {}".format(12, N_PEOPLE))
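
Curly braces can also carry a format specification, which is one possible way to print a fraction as a rounded percentage. The sketch below uses the fraction 98/395 from the worked example; the variable names are illustrative only.

```python
fraction = 98 / 395

# ".0%" multiplies by 100, rounds to zero decimal places, and appends "%".
as_percent = "{:.0%}".format(fraction)

# ".1f" keeps one digit after the decimal point.
one_decimal = "{:.1f}".format(120.78)

print("about {} of positives are true".format(as_percent))
print("{} people".format(one_decimal))
```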

In [None]:
# Your code here...



If somebody tells you to spend two weeks of your life quarantining based on having a 25% chance of being infected, you might reasonably push back. Perhaps you should ask for a retest?

Using our example numbers from earlier, let's assume that all of the 395 people who test positive ask for a retest. In this new sub-pool, we have 395 people, of whom 98 are actually infected. We can use this data to make a new table, assuming that the sensitivity and specificity numbers are unchanged.

<table>
<tr><th></th><th>Healthy (297)</th><th>Infected (98)</th></tr>
    <tr><td>Test positive</td><td>9</td><td>96</td></tr>
    <tr><td>Test negative</td><td>288</td><td>2</td></tr>
</table>
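
The retest table can be checked with the same arithmetic, applied to the smaller pool. The sketch below is one way to do this with a hypothetical helper function, `fraction_of_positives_true` (a name invented here), using the example's 98 infected and 297 healthy people. The exact counts are fractional (96.04 and 8.91), which the table rounds to whole people.

```python
def fraction_of_positives_true(n_infected, n_healthy, sens, spec):
    """Return (true positives, false positives, fraction of positives that are true)."""
    true_pos = n_infected * sens
    false_pos = n_healthy * (1 - spec)
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


# Sub-pool from the example: everyone who tested positive the first time.
retest_true, retest_false, retest_fraction = fraction_of_positives_true(
    n_infected=98, n_healthy=297, sens=0.98, spec=0.97
)

print(round(retest_true), round(retest_false))
print("{:.0%}".format(retest_fraction))
```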

Write code to do these calculations, and print out the results like this (your numbers will be different, of course):

> Retest results: of the 105 total positives, 92% are true and 8% are false.

In [None]:
# Your code here...



### Analysis questions

1. Suppose you are able to increase the sensitivity of the test to 99.8%, but doing so would cause the specificity to drop to 95%.  Is this a good idea?  *You should play with different numbers in the code until you have an intuitive sense of what would happen!*
2. What about going the other way --- increasing the *specificity* to 99.8%, but causing the sensitivity to drop to 95%?
3. For a while, Tufts was doing pooled testing, where multiple swabs were combined for a single test, and anyone whose swab-group tested positive had to return for a re-test.  What advantages and disadvantages does this have in terms of accuracy?


Write your answers to the above questions in this cell (double-click to edit, just like a code cell).


### Challenge problems
Each lab will include a few ungraded challenge problems intended to stretch your coding and analytical skills.  If you found this lab easy, take some time to work on these!  Feel free to add more code cells below this one to solve them.

1. Given the sensitivity and specificity defined above, how many tests do you need in order to be 99.99% sure that you don’t have Covid?

2. Suppose you get tested every two days, and the test has 98% sensitivity.  After a few negative tests, you’ll be very sure you don’t have Covid.  But suppose that each day you also have a 1% chance of getting infected without knowing it.  What is your limit of certainty?  That is, on any given day, how sure can you be that you do not have the illness? You can solve this analytically or with code; we’d love to see both!

3. Conveniently, the numbers in this simulation worked out so that a whole number of people tested positive and negative.  But suppose that our initial sample had some other number, giving a fractional number of positive tests.  Clearly it doesn’t make any sense for 120.78 people to test positive, so what is the right thing to do with these numbers?  How would you modify the simulation so that it reflects the reality that individuals test either positive or negative?