<div style="text-align: right">CSCI E-7 Introduction to Python Programming for Life Sciences</div>
<div style="text-align: right">Dino Konstantopoulos, 3 February 2019 assigned Homework, which is due 10 February</div>


# Homework

All your results should be evaluated in cells in this notebook. Please upload your completed notebook to Canvas. All computation cells should be accompanied by at least one markdown cell clearly explaining your logic. Correct computation cells with no associated reasoning markdown will not get you a top grade.

*All your solutions should be in the form of python **list** or **set comprehensions***. No loops!

All results should be given in terms of python `Fraction`s, no decimal numbers please! Please use the probability function we defined in class:
```python
from fractions import Fraction
def p(event, space): 
    "The probability of an event, given a sample space of equiprobable outcomes."
    return Fraction(len(event & space), 
                    len(space))
```

I give a lot of code hints in markdown cells below. Cut and paste judiciously.
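
As a quick sanity check of `p`, here is a minimal sketch on a toy sample space that is not part of the homework: one roll of a fair six-sided die. The names `die` and `even` are made up for this illustration.

```python
from fractions import Fraction

def p(event, space):
    "The probability of an event, given a sample space of equiprobable outcomes."
    return Fraction(len(event & space), len(space))

# Hypothetical toy example: one roll of a fair six-sided die.
die = {1, 2, 3, 4, 5, 6}
even = {n for n in die if n % 2 == 0}   # set comprehension, no loop

print(p(even, die))      # 1/2
print(p({7, 8}, die))    # 0, outcomes outside the sample space never count
```

Note that `event & space` is a set intersection, so anything in the event that is not in the sample space is ignored.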

In [1]:
from fractions import Fraction
def p(event, space): 
    "The probability of an event, given a sample space of equiprobable outcomes."
    return Fraction(len(event & space), 
                    len(space))

# Dangerous pill combinations

<br />
<center>
    <img src="ipynb.images/pills.jpg" width=600 />
</center>

The danger of mixing drugs, supplements, and/or alcohol is very real. At least 1.5 million people in the U.S. are harmed annually by medication errors, according to a report issued in July 2006 by the Institute of Medicine.

Reducing your risk, experts agree, is often a matter of using common sense and asking your doctor or pharmacist the right questions. 

For example, the blood thinner Coumadin, taken by people with blood clots or heart valve conditions, shouldn't be mixed with ginseng or aspirin. "It's an additive effect," experts say of the Coumadin-aspirin/ginseng combination. "It increases your chance of internal bleeding or, if you get a cut on your finger, the blood won't clot as quickly." Indeed, there are many drugs and supplements that are off-limits when you are taking [Coumadin](https://www.webmd.com/drugs/2/drug-4069/coumadin-oral/details).

> A very strange pill box indeed contains 23 pills: 8 Coumadin pills, 6 aspirin pills, and 9 ginseng pills. It comes into the possession of a patient who thinks it contains just Coumadin pills. The patient selects six pills at random from the box (each possible selection is equally likely). What is the probability of each of these possible outcomes:
>
> 1. All pills are Coumadin pills
> 2. A deadly combination of 3 Coumadin pills, 2 aspirin pills, and 1 ginseng pill
> 3. A dangerous but non-fatal combination that contains exactly 4 Coumadin pills

So, outcome = set of 6 pills, sample space = set of all possible 6 pill combinations. 

We'll mark our pills `'C1'` through `'C8'`, `'A1'` through `'A6'`, and `'G1'` through `'G9'`, `C` for Coumadin, `A` for Aspirin, and `G` for Ginseng.

To note:
- An outcome is a *set* of pills, where order doesn't matter, not a *sequence*, where order matters. When order **matters**, the possible outcomes are called **permutations**. When order **does not matter**, they are called **combinations**.

The number of *combinations* of pills is the number of *permutations* divided by `c!` (c [factorial](https://en.wikipedia.org/wiki/Factorial)), where *c* is the number of pills chosen. So there are fewer combinations than permutations. If I want to choose 2 Coumadin pills from the 8 available, there are 8 ways to choose a first Coumadin pill and 7 ways to choose a second (because the first one has been picked and is no longer available in the sample space), and therefore 8 &times; 7 = 56 permutations of two Coumadin pills. But there are only 56 / 2 = 28 combinations, because `(C1, C2)` is the same combination as `(C2, C1)`.
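
The 56-versus-28 count can be reproduced directly with `itertools`. This is a minimal sketch; the list name `coumadin` is made up for this illustration.

```python
import itertools
from math import factorial

coumadin = ['C' + d for d in '12345678']   # the 8 Coumadin labels

ordered = list(itertools.permutations(coumadin, 2))     # order matters
unordered = list(itertools.combinations(coumadin, 2))   # order does not matter

print(len(ordered))     # 56
print(len(unordered))   # 28
print(len(ordered) // factorial(2) == len(unordered))   # True
```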

We'll start by defining the contents of the pill box as a python `set` (an unordered collection with no duplicate elements), since every pill is distinct and order doesn't matter. Each label is built from a letter and a digit: since both are strings, the `+` operator concatenates them.

```python
def cross(A, B):
    """The set of ways of concatenating one item from collection A with one from B."""
    return {a + b 
            for a ... for b ...}  # fill in the ...

pillbox = cross('C', '12345678') | cross('A', ...) | cross('G', ...)
pillbox
```

In [4]:
def cross(A, B):
    """The set of ways of concatenating one item from collection A with one from B."""
    return {a + b 
            for a in A for b in B}

pillbox = cross('C', '12345678') | cross('A', '123456') | cross('G', '123456789')
pillbox

{'A1',
 'A2',
 'A3',
 'A4',
 'A5',
 'A6',
 'C1',
 'C2',
 'C3',
 'C4',
 'C5',
 'C6',
 'C7',
 'C8',
 'G1',
 'G2',
 'G3',
 'G4',
 'G5',
 'G6',
 'G7',
 'G8',
 'G9'}
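
Before building on `pillbox`, it is worth confirming that it matches the problem statement. A minimal sketch, assuming the labels start with `C`, `A`, or `G` as above (the name `tally` is made up for this illustration):

```python
from collections import Counter

def cross(A, B):
    """The set of ways of concatenating one item from collection A with one from B."""
    return {a + b for a in A for b in B}

pillbox = cross('C', '12345678') | cross('A', '123456') | cross('G', '123456789')

# Tally pills by their first letter (C, A, or G).
tally = Counter(pill[0] for pill in pillbox)

print(len(pillbox))                          # 23
print(tally['C'], tally['A'], tally['G'])    # 8 6 9
```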

Now let's define the sample space, `U6`, as the set of all 6-pill combinations. Know what? Physicists define all forces in nature in terms of similar sample spaces, where the number of samples is the number of symmetries in the behavior of the objects that the forces act upon. Check it out [here](https://arxiv.org/pdf/hep-th/9712154.pdf) for a good introduction to symmetry groups in physics. A lot of that research happened right here, at MIT and Harvard.

We will use python's `itertools.combinations` function to generate the combinations, and then join each combination into a string:

```python
import itertools

def combinations(items, n):
    "All combinations of n items; each combination as a concatenated str."
    return {' '.join(combo) 
            for combo in itertools.combinations(items, n)}

U6 = combinations(pillbox, 6)
len(U6)
```

In [11]:
import itertools

def combinations(items, n):
    "All combinations of n items; each combination as a concatenated str."
    return {' '.join(combo) 
            for combo in itertools.combinations(items, n)}

U6 = combinations(pillbox, 6)
len(U6)

100947
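
To see what the `combinations` helper produces before trusting it on all 23 pills, here is a small sketch on a hypothetical 4-pill box (`tiny_box` is made up for this illustration). The box is sorted first so the output is reproducible, since set iteration order is otherwise arbitrary.

```python
import itertools

def combinations(items, n):
    "All combinations of n items; each combination as a concatenated str."
    return {' '.join(combo)
            for combo in itertools.combinations(items, n)}

tiny_box = sorted({'C1', 'C2', 'A1', 'G1'})   # hypothetical 4-pill box
pairs = combinations(tiny_box, 2)

print(sorted(pairs))
# ['A1 C1', 'A1 C2', 'A1 G1', 'C1 C2', 'C1 G1', 'C2 G1']
print(len(pairs))   # 6, i.e. 4 * 3 / 2
```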

You should find that there are 100,947 members in our pillbox sample space `U6`. To take a peek at 10 random 6-pill samples of them (you should always take a peek at datasets in Data Science. *Always*):

```python
import random
# random.sample needs a sequence; passing a set raises TypeError since Python 3.11
random.sample(sorted(U6), 10)
```

In [13]:
import random
# random.sample needs a sequence; passing a set raises TypeError since Python 3.11
random.sample(sorted(U6), 10)

['G6 G5 C4 A6 C6 C5',
 'A3 A5 G9 G2 A2 C5',
 'A3 C7 C8 C4 A6 A2',
 'G6 G8 C8 G2 A2 C5',
 'G6 A5 G8 C8 A6 C6',
 'G7 G1 C2 C8 A6 C6',
 'G8 C1 A6 C6 A2 C5',
 'G6 C7 G8 A4 A6 G4',
 'C3 C8 G2 A6 C6 A2',
 'A3 G1 A5 G8 C4 G2']

We can pick any of 23 pills for the first item, any of 22 for the second, ..., and any of 18 for the sixth. But since we don't care about the ordering of the six pills, we divide the product by 6! (the number of possible orderings, i.e. permutations, of 6 things) and thus:

$$23 ~\mbox{choose}~ 6 = \frac{23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 \cdot 18}{6!} = 100947$$

But since $23 \cdot 22 \cdot 21 \cdot 20 \cdot 19 \cdot 18 = 23! \;/\; 17!$, we can write:

$$n ~\mbox{choose}~ c = \frac{n!}{(n - c)! \cdot c!}$$
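
Both steps of this derivation can be checked numerically. A minimal sketch, assuming Python 3.8 or later for `math.prod` (the name `falling` is made up for this illustration):

```python
from math import factorial, prod

n, c = 23, 6
falling = prod(range(n, n - c, -1))   # 23 * 22 * 21 * 20 * 19 * 18

print(falling == factorial(n) // factorial(n - c))   # True
print(falling // factorial(c))                       # 100947
```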

To translate that to code, use the following, and note that
* Python has two division operators: a single slash `/` for *true* division, which in Python 3 always returns a float (`7 / 2` is `3.5`), and a double slash `//` for *floor* division, which rounds down to the nearest whole number (`7 // 2` is `3`). In Python 2, `/` between two integers performed floor division instead. We use `//` here so the result stays an exact integer.

```python
from math import factorial

def choose(n, c):
    """Number of ways to choose c items from a list of n items."""
    return factorial(n) // (factorial(n - c) * factorial(c))
choose(23, 6)
```


In [14]:
from math import factorial

def choose(n, c):
    """Number of ways to choose c items from a list of n items."""
    return factorial(n) // (factorial(n - c) * factorial(c))
choose(23, 6)


100947
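
The choice of `//` over `/` in `choose` matters for the type of the result. A minimal sketch, assuming Python 3 (the names `true_div` and `floor_div` are made up for this illustration):

```python
from math import factorial

n, c = 23, 6
true_div = factorial(n) / (factorial(n - c) * factorial(c))    # always a float
floor_div = factorial(n) // (factorial(n - c) * factorial(c))  # stays an int

print(type(true_div).__name__, type(floor_div).__name__)   # float int
print(floor_div)        # 100947
print(7 / 2, 7 // 2)    # 3.5 3
```

Keeping counts as integers is what lets them be passed to `Fraction` without any rounding.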

To note:
* `count()` is the python method on strings, lists, and tuples that returns how many times its argument occurs in them. True statement: ```'foobar'.count('o') == 2```. 
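
Applied to one of our space-joined pill strings, `count()` tells us how many pills of each kind a pick contains. A minimal sketch on a hypothetical pick (`combo` is made up for this illustration):

```python
combo = 'C1 C2 A3 G4 C5 A6'   # hypothetical 6-pill pick

print(combo.count('C'))   # 3
print(combo.count('A'))   # 2
print(combo.count('G'))   # 1
print(['C', 'A', 'C'].count('C'))   # 2, lists have count() too
```

This works because each label contains exactly one letter, so counting letters is the same as counting pills.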

Now we're ready to answer the 3 problems: 

### Pillbox Problem 1: What's the probability of the benign pick of 6 Coumadin pills? 

Use a python set comprehension, then leverage the `p` function above.
```python
coumadin6 = {b for b ...}  # fill in the ...
print(coumadin6)
p(coumadin6, U6)
```

Go ahead, cut and paste below and replace `...` with the right answer. Then verify your answer by running the code below and ensuring that it equals the number of ways of picking 6 pills from the 8 Coumadin pills in an unordered fashion, divided by the size of the sample space, by leveraging the `choose` function referenced above:
```python
p(coumadin6, U6) == Fraction(choose(...), len(U6))
```

In [63]:
coumadin6 = {b for b in U6 if 'A' not in b and 'G' not in b}
print(coumadin6)
p(coumadin6, U6)

{'C7 C2 C3 C8 C1 C6', 'C3 C8 C4 C1 C6 C5', 'C2 C3 C8 C4 C6 C5', 'C2 C3 C4 C1 C6 C5', 'C7 C2 C3 C4 C6 C5', 'C7 C3 C8 C4 C1 C6', 'C7 C2 C3 C4 C1 C6', 'C7 C2 C3 C8 C6 C5', 'C7 C2 C3 C8 C4 C6', 'C7 C2 C3 C8 C1 C5', 'C7 C2 C8 C4 C1 C5', 'C7 C8 C4 C1 C6 C5', 'C2 C8 C4 C1 C6 C5', 'C2 C3 C8 C1 C6 C5', 'C7 C2 C3 C1 C6 C5', 'C7 C2 C8 C1 C6 C5', 'C7 C2 C8 C4 C1 C6', 'C2 C3 C8 C4 C1 C6', 'C7 C2 C3 C8 C4 C1', 'C7 C3 C4 C1 C6 C5', 'C2 C3 C8 C4 C1 C5', 'C7 C3 C8 C1 C6 C5', 'C7 C2 C3 C4 C1 C5', 'C7 C3 C8 C4 C6 C5', 'C7 C2 C8 C4 C6 C5', 'C7 C2 C3 C8 C4 C5', 'C7 C3 C8 C4 C1 C5', 'C7 C2 C4 C1 C6 C5'}


Fraction(4, 14421)

In [67]:
len(coumadin6) == (Fraction(choose(8, 6)))

True
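
The same probability can be cross-checked by counting alone, without enumerating `U6`. A sketch, assuming the pill counts from the problem statement (8 Coumadin among 23 pills); the names `all_coumadin` and `all_picks` are made up for this illustration.

```python
from fractions import Fraction
from math import factorial

def choose(n, c):
    """Number of ways to choose c items from a list of n items."""
    return factorial(n) // (factorial(n - c) * factorial(c))

all_coumadin = choose(8, 6)   # ways to pick 6 of the 8 Coumadin pills
all_picks = choose(23, 6)     # size of the sample space

print(all_coumadin, all_picks)             # 28 100947
print(Fraction(all_coumadin, all_picks))   # 4/14421
```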

### Pillbox Problem 2: What is the probability of the *lethal* cocktail of exactly 3 Coumadin, 2 Aspirin, and 1 Ginseng pills?

Use a python set comprehension, then evaluate a probability using the `p` function.
```python
c3a2g1 = {s for s ...}
p(c3a2g1, U6)
```


Then verify that it's equal to the number of ways of picking 3 Coumadin out of 8, 2 Aspirin out of 6, and 1 Ginseng out of 9, divided by the size of the sample space, by leveraging the `choose` function.

You can also reason that there are 8 ways to pick the first Coumadin, 7 ways to pick the second Coumadin, and 6 ways to pick the remaining Coumadin. Then 6 ways to pick the first Aspirin and 5 to pick the second. Then 9 ways to pick a Ginseng pill. But the order 'C1, C2, C3' should count as the same as 'C2, C3, C1' and all the other orderings; so divide by 3! to account for the permutations of Coumadin pills, by 2! to account for the permutations of Aspirin pills, and finally by the length of `U6` to get a probability.

In [69]:
c3a2g1 = {s for s in U6 if s.count('A') == 2 and s.count('C') == 3 and s.count('G') == 1}
p(c3a2g1, U6)

Fraction(360, 4807)

In [87]:
Fraction((8*7*6) * (6*5) * 9, factorial(3) * factorial(2) * len(U6))

Fraction(360, 4807)
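
The `choose`-based verification can be sketched as follows, assuming the pill counts from the problem statement (8 Coumadin, 6 Aspirin, 9 Ginseng). The name `ways` is made up for this illustration.

```python
from fractions import Fraction
from math import factorial

def choose(n, c):
    """Number of ways to choose c items from a list of n items."""
    return factorial(n) // (factorial(n - c) * factorial(c))

# Independent choices multiply: 3 of 8 Coumadin, 2 of 6 Aspirin, 1 of 9 Ginseng.
ways = choose(8, 3) * choose(6, 2) * choose(9, 1)   # 56 * 15 * 9

print(ways)                             # 7560
print(Fraction(ways, choose(23, 6)))    # 360/4807
```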

### Pillbox Problem 3: What is the probability of the non-lethal cocktail of exactly 4 Coumadin pills?

Use a python set comprehension, then evaluate a probability using the `p` function.

Then verify that it equals the number of ways to choose 4 out of the 8 Coumadin pills and 2 out of the 15 non-Coumadin pills, divided by the size of the sample space, by leveraging the `choose` function.


In [88]:
c4 = {c for c in U6 if c.count('C') == 4}
p(c4, U6)

Fraction(350, 4807)

In [89]:
choose(8,4)*choose(15,2)

7350

In [90]:
Fraction(7350, 100947)

Fraction(350, 4807)
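
When checking a probability by hand, how the `Fraction` is built matters. Passing two integers gives the exact reduced fraction, while passing the result of `/` gives the exact value of a floating-point approximation, which has a huge denominator. A minimal sketch using the counts 7350 and 100947 from this problem (the names `exact` and `from_float` are made up for this illustration):

```python
from fractions import Fraction

exact = Fraction(7350, 100947)          # two integers: exact, reduced automatically
from_float = Fraction(7350 / 100947)    # one float: exact value of the binary approximation

print(exact)                  # 350/4807
print(from_float == exact)    # False
print(from_float.limit_denominator(10000) == exact)   # True
```

`limit_denominator` can recover the intended fraction here, but building the `Fraction` from integers avoids the problem altogether.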