# Tools and Methods of Data Analysis
## Session 4 - Part 2

Niels Hoppe <niels.hoppe.extern@srh.de>

### Combinatorics in Python

In [20]:
import itertools

cards = itertools.product(
    [ 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King', 'Ace' ],
    [ 'Clubs', 'Hearts', 'Spades', 'Diamonds' ]
)
[c for c in cards]

[(2, 'Clubs'),
 (2, 'Hearts'),
 (2, 'Spades'),
 (2, 'Diamonds'),
 (3, 'Clubs'),
 (3, 'Hearts'),
 (3, 'Spades'),
 (3, 'Diamonds'),
 (4, 'Clubs'),
 (4, 'Hearts'),
 (4, 'Spades'),
 (4, 'Diamonds'),
 (5, 'Clubs'),
 (5, 'Hearts'),
 (5, 'Spades'),
 (5, 'Diamonds'),
 (6, 'Clubs'),
 (6, 'Hearts'),
 (6, 'Spades'),
 (6, 'Diamonds'),
 (7, 'Clubs'),
 (7, 'Hearts'),
 (7, 'Spades'),
 (7, 'Diamonds'),
 (8, 'Clubs'),
 (8, 'Hearts'),
 (8, 'Spades'),
 (8, 'Diamonds'),
 (9, 'Clubs'),
 (9, 'Hearts'),
 (9, 'Spades'),
 (9, 'Diamonds'),
 (10, 'Clubs'),
 (10, 'Hearts'),
 (10, 'Spades'),
 (10, 'Diamonds'),
 ('Jack', 'Clubs'),
 ('Jack', 'Hearts'),
 ('Jack', 'Spades'),
 ('Jack', 'Diamonds'),
 ('Queen', 'Clubs'),
 ('Queen', 'Hearts'),
 ('Queen', 'Spades'),
 ('Queen', 'Diamonds'),
 ('King', 'Clubs'),
 ('King', 'Hearts'),
 ('King', 'Spades'),
 ('King', 'Diamonds'),
 ('Ace', 'Clubs'),
 ('Ace', 'Hearts'),
 ('Ace', 'Spades'),
 ('Ace', 'Diamonds')]

### Combinatorics in Python

In [10]:
import math
import itertools

comb = itertools.combinations(range(1, 7), 2)
ncomb = math.comb(6, 2)  # range(1, 7) has six elements

combr = itertools.combinations_with_replacement(range(1, 7), 2)

perm = itertools.permutations(range(1, 7), 2)
nperm = math.perm(6, 2)

print(ncomb, [c for c in comb])
print('  ', [c for c in combr])
print(nperm, [p for p in perm])

15 [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 3), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
   [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 3), (3, 4), (3, 5), (3, 6), (4, 4), (4, 5), (4, 6), (5, 5), (5, 6), (6, 6)]
30 [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5)]
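
As a cross-check, the closed-form counts can be compared with the lengths of the enumerations. This is a minimal sketch; the names `n`, `k`, `faces`, `counts`, and `formulas` are illustrative and not part of the original notebook.

```python
import itertools
import math

n, k = 6, 2  # six values, choose two
faces = range(1, n + 1)

# Count by brute-force enumeration.
counts = {
    "combinations": len(list(itertools.combinations(faces, k))),
    "combinations_with_replacement": len(
        list(itertools.combinations_with_replacement(faces, k))
    ),
    "permutations": len(list(itertools.permutations(faces, k))),
    "product": len(list(itertools.product(faces, repeat=k))),
}

# Count with the corresponding closed-form expressions.
formulas = {
    "combinations": math.comb(n, k),
    "combinations_with_replacement": math.comb(n + k - 1, k),
    "permutations": math.perm(n, k),
    "product": n ** k,
}

print(counts)
print(counts == formulas)
```

The count passed to `math.comb` and `math.perm` has to be the size of the underlying set, which is 6 for `range(1, 7)`.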


### Combinatorics in Python

In [23]:
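import itertools
import pandas as pd

# Build the deck from a fresh product: an itertools iterator can only be
# consumed once, so one that has already been listed yields no rows.
values = [2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King', 'Ace']
suits = ['Clubs', 'Hearts', 'Spades', 'Diamonds']
deck = pd.DataFrame(list(itertools.product(values, suits)), columns=['value', 'suit'])

spades = deck.loc[deck['suit'] == 'Spades']
facecards = deck.loc[deck['value'].isin(['Jack', 'Queen', 'King', 'Ace'])]
deck.describe()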


### Sampling in Python

In [12]:
import pandas as pd

df = pd.DataFrame(
    ['G', 'G', 'G', 'G', 'G', 'G', 'R', 'R', 'R', 'R']
    ).sample(2, replace=False)
df

|     | 0   |
|-----|-----|
| 3   | G   |
| 9   | R   |
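
The effect of the `replace` argument can be made visible by looking at the row labels of the sample. This is a small sketch under stated assumptions: the names `items`, `without`, and `with_repl` are illustrative, and `random_state=0` is only there to make the draw repeatable.

```python
import pandas as pd

# Same ten items as above: six 'G' and four 'R'.
items = pd.Series(['G'] * 6 + ['R'] * 4)

# Without replacement, every row label can be drawn at most once.
without = items.sample(2, replace=False, random_state=0)

# With replacement, labels may repeat; drawing 12 times from 10 rows
# guarantees at least one repeat.
with_repl = items.sample(12, replace=True, random_state=0)

print(without.index.is_unique)    # True
print(with_repl.index.is_unique)  # False
```

Without replacement, asking for more rows than the data holds raises a `ValueError`, which is why the second draw needs `replace=True`.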


# Exercises

## 1) System

Consider the given system of connected components.
Components I and II are connected in parallel, so that subsystem works if either I or II works.
Components III and IV are connected in series, so that subsystem works only if both III and IV work.
Suppose all components work independently of one another and each component works with probability 0.8. Calculate `P(system works)`.

$$P(I \lor II) = (0.8 + 0.8 - 0.8 \cdot 0.8) = 0.96$$
$$P(III \land IV) = (0.8 \cdot 0.8) = 0.64$$
$$P((I \lor II) \lor (III \land IV)) = 0.96 + 0.64 - 0.96 \cdot 0.64 = 0.9856$$
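
The same calculation can be sketched in code. The helper names `parallel` and `series` are hypothetical, and the sketch assumes, as the last formula above does, that the two subsystems are themselves connected in parallel.

```python
def parallel(*probs):
    """P(at least one of several independent components works)."""
    all_fail = 1.0
    for p in probs:
        all_fail *= 1.0 - p
    return 1.0 - all_fail


def series(*probs):
    """P(all of several independent components work)."""
    all_work = 1.0
    for p in probs:
        all_work *= p
    return all_work


p = 0.8                            # probability that a single component works
p_12 = parallel(p, p)              # subsystem I / II
p_34 = series(p, p)                # subsystem III / IV
p_system = parallel(p_12, p_34)    # assumed: subsystems joined in parallel

print(round(p_12, 4), round(p_34, 4), round(p_system, 4))  # 0.96 0.64 0.9856
```

Computing a parallel block through its complement (all components fail) extends to any number of components without expanding inclusion-exclusion terms.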

## 2) Blood Phenotypes

The proportions of blood phenotypes in the U.S. population are as follows:

| A   | B   | AB  | O   |
|-----|-----|-----|-----|
| 40% | 11% |  4% | 45% |

Assuming that the phenotypes of two randomly selected individuals are independent of one another,
what is the probability that both phenotypes are O?

In [13]:
0.45 * 0.45

0.2025

## 3) Pumps

Two pumps connected in parallel fail independently of one another on any given day.
The probability that only the older pump will fail is `.10`,
and the probability that only the newer pump will fail is `.05`.
What is the probability that the pumping system will fail on any given day (which happens if both pumps fail)?

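Let $A$ be the event that the older pump works and $B$ the event that the newer pump works, and take `.10` and `.05` as the failure probabilities $P(\neg A)$ and $P(\neg B)$ of the two pumps.

$$P(\neg A \land \neg B) = P(\neg A) \cdot P(\neg B) = 0.1 \cdot 0.05 = 0.005$$

Equivalently, via the complement:

    1 - P(A or B) = 1 - ( P(A) + P(B) - P(A) * P(B) )
                  = 1 - (0.9 + 0.95 - 0.9 * 0.95)
                  = 0.005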
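
Read strictly, "only the older pump fails" is the joint event "older fails and newer works", so `.10` and `.05` would not be the pumps' overall failure probabilities. The sketch below works out that stricter reading. It is an alternative interpretation, not the solution given above, and the names `only_old`, `only_new`, and `roots` are illustrative.

```python
import math

only_old = 0.10  # P(older fails and newer works)
only_new = 0.05  # P(newer fails and older works)

# Let x = P(both fail). Then P(older fails) = only_old + x and
# P(newer fails) = only_new + x. Independence requires
#   x = (only_old + x) * (only_new + x),
# which rearranges to x**2 - (1 - only_old - only_new) * x + only_old * only_new = 0.
b = 1 - only_old - only_new
disc = math.sqrt(b * b - 4 * only_old * only_new)
roots = [(b - disc) / 2, (b + disc) / 2]

for x in roots:
    print(f"P(both fail) = {x:.4f}, "
          f"P(older fails) = {only_old + x:.4f}, "
          f"P(newer fails) = {only_new + x:.4f}")
```

Both roots satisfy the equations. The smaller one, roughly 0.0059, corresponds to pumps that rarely fail and is close to the 0.005 obtained above.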

## 4) Impurity

A chemical engineer is interested in determining whether a certain trace impurity is present in a product.
The prior probability of the impurity being present is `0.40`.
A method has a probability of `0.8` of detecting the impurity if it is present.
The probability of not detecting the impurity if it is absent is `0.9`.
What is the posterior probability that the impurity is present, given that the method detects it?

In [14]:
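p = 0.4 # prior probability that the impurity is present
s = 0.8 # sensitivity: P(detected | present)
z = 0.9 # specificity: P(not detected | absent)
ppv = (s * p) / (s * p + (1 - z) * (1 - p)) # Bayes' theorem: P(present | detected)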
ppv

0.8421052631578948

## 5) Disease

One percent of all individuals in a certain population are carriers of a particular disease (event $D$).
A diagnostic test for this disease has a $90\%$ detection rate for carriers (event $+ \mid D$) and a $5\%$ detection rate for non-carriers (event $+ \mid \bar{D}$).
Calculate the following probabilities and interpret each result.

$$ P(D \mid +), \quad P(D \mid -), \quad P(\bar{D} \mid +), \quad P(\bar{D} \mid -) $$


|       | D     | not D  | Total  |
|-------|-------|--------|--------|
| +     | 0.009 | 0.0495 | 0.0585 |
| -     | 0.001 | 0.9405 | 0.9415 |
| Total | 0.01  | 0.99   | 1.0    |


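    P(D|+) = 0.009/0.0585 = 0.1538462
    P(D|-) = 0.001/0.9415 = 0.001062135
    P(D̅|+) = 0.0495/0.0585 = 0.8461538
    P(D̅|-) = 0.9405/0.9415 = 0.9989379

Because carriers are rare, a positive result still leaves only about a 15% chance of being a carrier, while a negative result all but rules it out (about 0.1%).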
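
The joint table and the four conditional probabilities can be recomputed from the three given rates. This is a minimal sketch; the variable names and the dictionary layout are illustrative choices.

```python
prevalence = 0.01      # P(D)
sensitivity = 0.90     # P(+ | D)
false_positive = 0.05  # P(+ | not D)

# Joint probabilities: one entry per cell of the table.
joint = {
    ("+", "D"): sensitivity * prevalence,
    ("-", "D"): (1 - sensitivity) * prevalence,
    ("+", "not D"): false_positive * (1 - prevalence),
    ("-", "not D"): (1 - false_positive) * (1 - prevalence),
}

# Row totals: overall probability of a positive or negative result.
p_pos = joint[("+", "D")] + joint[("+", "not D")]
p_neg = joint[("-", "D")] + joint[("-", "not D")]

# Conditional probabilities: joint cell divided by its row total.
posteriors = {
    "P(D|+)": joint[("+", "D")] / p_pos,
    "P(D|-)": joint[("-", "D")] / p_neg,
    "P(not D|+)": joint[("+", "not D")] / p_pos,
    "P(not D|-)": joint[("-", "not D")] / p_neg,
}

for name, value in posteriors.items():
    print(f"{name} = {value:.7f}")
```

The two probabilities conditioned on the same test result must sum to 1, which is a quick consistency check on hand calculations.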

## 6) Emission Inspection

Seventy percent of all vehicles examined at a certain emissions inspection station pass the inspection.
Assuming that successive vehicles pass or fail independently of one another, calculate the following probabilities:

1. P(all of the next three vehicles inspected pass)
2. P(at least one of the next three inspected fails)
3. P(exactly one of the next three inspected passes)
4. P(at most one of the next three vehicles inspected passes)
5. Given that at least one of the next three vehicles passes inspection, what is the probability that all three pass (a conditional probability)?

In [15]:
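all_pass = 0.7 ** 3                                 # 1. all three pass
at_least_one_fails = 1 - all_pass                   # 2. complement of "all pass"
exactly_one_passes = 3 * 0.7 * 0.3 * 0.3            # 3. three positions for the single pass
at_most_one_passes = 0.3 ** 3 + exactly_one_passes  # 4. none pass, or exactly one passes
all_given_at_least_one = all_pass / (1 - 0.3 ** 3)  # 5. P(all pass | at least one passes)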
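
The five answers can also be read off the binomial distribution with `n = 3` and `p = 0.7`. This is a minimal sketch; `binom_pmf` is a hypothetical helper built on `math.comb`, not a library function.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 3, 0.7
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]  # pmf[k] = P(k vehicles pass)

answers = {
    "1. all pass": pmf[3],
    "2. at least one fails": 1 - pmf[3],
    "3. exactly one passes": pmf[1],
    "4. at most one passes": pmf[0] + pmf[1],
    "5. all pass | at least one passes": pmf[3] / (1 - pmf[0]),
}

for label, value in answers.items():
    print(f"{label}: {value:.4f}")
```

For item 5, the event "all pass" is contained in "at least one passes", so the conditional probability reduces to `pmf[3] / (1 - pmf[0])`.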