# Combinatorics and Probability

## Week 2. Binomial Coefficients

### 2.1 Combinations

*Rule Of Sum*

If there are n objects of the first type and there are k objects of the second type, then there are n+k objects of one of two types.

In [None]:
print(['Alice', 'Bob', 'Charlie']  + [0, 1, 2, 3, 4])

*Rule of Product*

If there are n objects of the first type and there are k objects of the second type, then there are n x k pairs of objects, the first of the first type and the second of the second type.

In [None]:
from itertools import product
for p in product(['a', 'b', 'c'], ['x', 'y']):
    print("".join(p))

*Tuples*

The number of sequences of length k composed out of n symbols is n**k

In [None]:
from itertools import product
for p in product("ab", repeat=3):
    print("".join(p))

*k-Permutations*

The number of sequences of length k with no repetitions composed out of n symbols is  

$n(n-1)...(n-k+1) = \frac {n!}{(n-k)!}$

In [None]:
from itertools import permutations
for p in permutations("abcd", 2):
    print("".join(p))

*Combinations*

A k-combination is a subset of size k of a set S.  
The number of k-combinations of an n-element set is denoted by ${n \choose k}$,  
pronounced "n choose k".

Theorem:

${n \choose k} = \frac{n!}{k!(n-k)!}$

Proof:
We need to count the subsets of size k of a set of size n.  
First count ordered selections. There are:  
n choices for the 1st element  
n-1 choices for the 2nd element  
...  
n-k+1 choices for the kth element  
This gives $n (n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!}$, which is the number of k-permutations.  
For k-combinations the order inside the subset does not matter, so every subset is counted $k!$ times (once per ordering).  
Dividing by $k!$ we get $\frac{n!}{k!(n-k)!}$



In [None]:
from itertools import combinations
for c in combinations("abcd", 2):
    print("".join(c))
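A quick sanity check of the theorem: a minimal sketch that compares the number of subsets listed by `itertools.combinations` with the factorial formula. The helper name `n_choose_k` is made up for this note.

```python
from itertools import combinations
from math import factorial

def n_choose_k(n, k):
    # n! / (k! (n-k)!): k-permutations divided by the k! orderings of each subset
    return factorial(n) // (factorial(k) * factorial(n - k))

# count the subsets by listing them, then compare with the formula
listed = len(list(combinations("abcd", 2)))
print(listed, n_choose_k(4, 2))
```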

#### Quiz. Forming Sport Teams

Q: In how many ways can one select a team of five students out of ten students?  
A: $\binom{10}{5}$


Q: In how many ways can one partition ten students into two teams of size five?  
A: There are $\binom{10}{5}=252$ ways of selecting the first team. When the first team is selected, the second team is uniquely determined. But this way, each pair of teams (A,B) is counted twice: when A is selected as the first team and when B is selected as the first team. Thus, the resulting count should be divided by two. This finally gives $\binom{10}{5}/2=126$ 
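The "counted twice" argument can be checked by brute force. This is only a sketch: students are numbered 0..9, and a partition is stored as an unordered pair of teams, so (A,B) and (B,A) collapse into one entry.

```python
from itertools import combinations

students = range(10)
partitions = set()
for team_a in combinations(students, 5):
    team_b = tuple(s for s in students if s not in team_a)
    # frozenset makes the pair unordered: {A, B} == {B, A}
    partitions.add(frozenset([team_a, team_b]))

print(len(partitions))
```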

In [None]:
for i in range(10):
    print(i)

#### Quiz. Number of Iterations of Nested For Loops
Q: What will the following program print?
```
n = 10
count = 0

for i in range(n):
    for j in range(n):
        for k in range(n):
            if i < j and j < k:
                count += 1

print(count)
```
A: $\binom{10}{3}=120$ 

Q: What will the following program print?
```
n = 1000
count = 0

for i in range(n):
    for j in range(n):
        for k in range(n):
            if i < j and j < k:
                count += 1

print(count)
```
A: Here we are counting all triples of different integers $i,j,k$ such that $0 \le i<j<k \le 999$. It is the same as the number of ways of selecting three different integers out of $1000$ since one can always arrange them in increasing order.  
Thus, the answer is $\binom{1000}{3}=\frac{1000 \times 999 \times 998}{2 \times 3}=166167000$
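For n = 1000 the triple loop needs a billion iterations, so it is easier to trust the formula. A small sketch (the function name `count_triples` is made up here) checks the loop against `math.comb` for small n and then evaluates the formula directly:

```python
from math import comb

def count_triples(n):
    # direct translation of the three nested loops from the quiz
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i < j and j < k:
                    count += 1
    return count

# the loop and the binomial coefficient agree for small n
for n in (5, 10, 20):
    assert count_triples(n) == comb(n, 3)

print(comb(1000, 3))
```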

### 2.2 Pascal's Triangle

Forming a team property:  
forming a team with Alice: $\binom{n-1}{k-1}$  
forming a team without Alice: $\binom{n-1}{k}$  
hence, $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$

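Theorem based on the symmetry of Pascal's triangle:  
$\binom{n}{k} = \binom{n}{n-k}$  
It follows from the factorial formula; more formally, there is a one-to-one correspondence between the k-element subsets and the (n-k)-element subsets (each subset is matched with its complement).

Theorem 2 based on the symmetry of Pascal's triangle:

For n > 0 (not the first row),  
$\sum_{k=0}^{n}(-1)^{k}{n \choose k} = 0$

Combinatorial proof:
${n \choose 1} + {n \choose 3} + \dots = {n \choose 0} + {n \choose 2} + \dots$  
It is enough to establish a one-to-one correspondence between the subsets of odd size and the subsets of even size (for example, fix one element: add it to the subset if it is absent, remove it if it is present).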
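The recurrence and both theorems are easy to check numerically. A minimal sketch, where `pascal_rows` is a helper name invented for this note: it builds the rows using only the "with Alice / without Alice" recurrence, then checks the symmetry and the alternating sum.

```python
def pascal_rows(n_rows):
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # inner entries: C(n, k) = C(n-1, k-1) + C(n-1, k)
        inner = [prev[k - 1] + prev[k] for k in range(1, len(prev))]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(8)
for n, row in enumerate(rows):
    assert row == row[::-1]  # symmetry: C(n, k) == C(n, n-k)
    if n > 0:
        # alternating sum is zero for every row except the first
        assert sum((-1) ** k * c for k, c in enumerate(row)) == 0

print(rows[5])
```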

#### Binomial Theorem

$(a+b)^2 = a^2 + 2ab + b^2$  
$(a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$  
... Based on Pascal's triangle

$(a+b)^n = {n \choose 0}a^n + {n \choose 1}a^{n-1}b + \dots + {n \choose k}a^{n-k} b^k + \dots + {n \choose n} b^n$  
is equal to:  
$(a+b)^n = \sum_{k=0}^n {n \choose k}a^{n-k}b^k$  
$(a+b)^n = (a+b)(a+b)...(a+b)$  
${n \choose k}$ - binomial coefficient
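A numeric check of the binomial theorem, as a sketch with integer inputs only (the name `binomial_expansion` is made up here). It evaluates the sum term by term and compares it with $(a+b)^n$:

```python
from math import comb

def binomial_expansion(a, b, n):
    # sum over k of C(n, k) * a^(n-k) * b^k
    return sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

for a, b, n in [(2, 3, 4), (1, 1, 10), (5, -2, 7)]:
    assert binomial_expansion(a, b, n) == (a + b) ** n
```

Two special cases are worth remembering: a = b = 1 gives $\sum_k {n \choose k} = 2^n$, and a = 1, b = -1 gives the alternating sum, which is 0 for n > 0.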


### 2.3 Practice Counting

Q: What is the number of 5-card hands with two hearts and three spades?  
A: ${13 \choose 2}{13 \choose 3} = 22308$

Q: What is the number of non-negative integers with at most 4 digits, at least one of which is equal to 7?  
A: $10^4 - 9^4 = 3439$ 

Q: What is the number of non-negative integers with at most 4 digits whose digits are increasing?  
A: ${10 \choose 4} = 210$
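Both digit questions are small enough to brute-force. The sketch below makes one assumption that the notes do not state explicitly: numbers are padded with leading zeros to exactly 4 digits, and "increasing" means strictly increasing from left to right.

```python
from math import comb

# numbers 0..9999 that contain the digit 7 at least once
with_seven = sum(1 for x in range(10_000) if "7" in str(x))

def strictly_increasing(digits):
    return all(a < b for a, b in zip(digits, digits[1:]))

# assumption: pad to 4 digits, so 123 is read as "0123"
increasing = sum(1 for x in range(10_000) if strictly_increasing(f"{x:04d}"))

assert with_seven == 10**4 - 9**4
assert increasing == comb(10, 4)
print(with_seven, increasing)
```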

Quiz:

1. Q: what is the number of 6-card hands with 3 hearts and 3 spades  

A: ${13 \choose 3}{13 \choose 3} = 81796$

2. Q:  
What is the number of bit-strings (that is, strings consisting of 0's and 1's) of length 6 where the number of 0's is equal to the number of 1's?
For example, there are two such strings of length two: 01 and 10.  

A:  
We just need to select three (out of six) positions for the 0's.  
${6 \choose 3} = 20$

3. Q:  
What is the number of sequences of six digits where the number of even digits is equal to the number of odd digits?

For example, there are 50 such sequences of length two: 01, 03, 05, 07, 09, 10, 12, 14, 16, 18, ..., 90, 92, 94, 96, 98.  
A: We first select three positions for odd digits. For each of these three positions, we select one of five odd digits. For each of the remaining three positions, we select one of the five even digits.  
Overall, this gives ${6 \choose 3} \cdot 5^3 \cdot 5^3 = 312500$

4. Q: In how many ways can one get from the bottom left cell to the top right cell of a 9×9 grid, if each move is either two cells up or three cells to the right?  

A:  
0 ways: the top right cell is 8 cells to the right, and 8 is not divisible by 3.

5. Q: In how many ways can one get from the bottom left cell to the top right cell of a 13×13 grid, if each move is either two cells up or three cells to the right?  

A:  
The top right cell is 12 cells up and 12 cells to the right, so we need  
4 moves to the right and  
6 moves up, 10 moves in total.  
Choosing which of the 10 moves go up: ${10 \choose 6} = {10 \choose 4} = 210$
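The two grid questions can also be answered by counting paths directly. This is a sketch, not the course's method: cells are indexed from (0, 0) in the bottom left corner, and `count_paths` is a name invented for this note.

```python
from functools import lru_cache

def count_paths(size, up=2, right=3):
    target = size - 1  # index of the top right cell along each axis

    @lru_cache(maxsize=None)
    def ways(row, col):
        if row > target or col > target:
            return 0  # stepped outside the grid
        if row == target and col == target:
            return 1  # reached the top right cell
        # either move `up` cells up or `right` cells to the right
        return ways(row + up, col) + ways(row, col + right)

    return ways(0, 0)

print(count_paths(9), count_paths(13))
```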

_________________



## W.3 Advanced Counting

### 3.1 Combinations with Repetitions

Consider k = 2 (items, "length") and n = 3 (possible options), for example a,b,c

- Ordered with repetitions - 9 possible pairs: (a,a) ... (c,c);  
tuples $n^k$
- Ordered without repetitions - 6 possible pairs: (a,b), (a,c), (b,a), (b,c), (c,a), (c,b);  
k-permutations $\frac{n!}{(n-k)!}$
- Unordered selection without repetitions - 3 options: {a,b}, {a,c}, {b,c}  
combinations ${n \choose k} = \frac{n!}{k!(n-k)!}$
- Unordered selection with repetitions - 6 options: {a,b}, {a,c}, {b,c}, {a,a}, {b,b}, {c,c}  
formula - ${{k + n -1} \choose {n-1}}$; n-1 is the number of "delimiters" placed among the k + n - 1 positions to split the objects into n groups (for example, to split tomatoes and lettuce we need 1 delimiter)
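All four selection types map onto `itertools` functions, so the counts and the formulas can be compared side by side. A minimal sketch for the a, b, c example with k = 2 (the dictionary keys are just labels chosen for this note):

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, factorial

symbols, k = "abc", 2
n = len(symbols)

# count each selection type by listing all selections
counts = {
    "tuples": len(list(product(symbols, repeat=k))),
    "k-permutations": len(list(permutations(symbols, k))),
    "combinations": len(list(combinations(symbols, k))),
    "combinations with repetitions":
        len(list(combinations_with_replacement(symbols, k))),
}
# the matching closed-form formulas
formulas = {
    "tuples": n ** k,
    "k-permutations": factorial(n) // factorial(n - k),
    "combinations": comb(n, k),
    "combinations with repetitions": comb(k + n - 1, n - 1),
}
assert counts == formulas
print(counts)
```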

#### Quiz. Salads

Q:
We have an unlimited supply of tomatoes, bell peppers and lettuce. We want to make a salad out of 4 units among these three ingredients (we do not have to use all ingredients). The order in which we use the ingredients does not matter. How many different salads we can make?

We do not have the formula to answer this question yet, so try to list all the salads first or create a program that will do that for you. Then you can count the number of salads by hand (note the answer to the problem should be the number).

A:  
Here T = 'tomato', B = 'bell pepper', L = 'lettuce':
```
from itertools import combinations_with_replacement
for c in combinations_with_replacement("TBL", 4):
    print("".join(c))
```
15 options

#### Quiz. Combinations with Repetitions

Q:
Twenty people are voting for one of 5 candidates. They have secret ballot, each voter votes for one of 5 candidates. The result of an election is the number of votes for each of the candidate. How many possible results can this vote have (the result of the vote is determined by the number of votes for each candidate)? 

A: k = 20; n = 5; ${24 \choose 4} = 10626$

Q : We have 9 identical candies and we want to distribute them between 3 different sections of our bag. It does not matter which candies go to which section. How many ways do we have to do it?  
A: k = 9 (candies); n = 3 (sections)  ${11 \choose 2} = 55$

### 3.2 Problems in Combinatorics

#### Quiz. Distributing assignments among people

Q: Suppose there are 4 people and 9 different assignments. Each person should receive one assignment. Assignments for different people should be different. How many ways are there to do it?  

A: 
- ppl are different -> selection is ordered 
- no assignment can be given to 2 ppl -> no repetitions
- k-permutations
$\frac{9!}{5!}=3024$

Q: There are 4 people and 9 different assignments. We need to distribute all assignments among people. No assignment should be assigned to two people. Every person can be given arbitrary number of assignments from 0 to 9. How many ways are there to do it?  

A: 
- Each person can receive several assignments
- Hint: look from the assignment POV: each assignment can be given to any of 4 ppl
- $4^9=262144$

#### Quiz. Distributing Candies Among Kids

Q: There are 15 identical candies. How many ways are there to distribute them among 7 kids?  
A: distribution is unordered, repetitions are also possible  
(looks like  salad problem), k = 15, n = 7  
${{k + n -1} \choose {n-1}} = {{21} \choose {6}} = 54264$ 

Q: There are 15 identical candies. How many ways are there to distribute them among 7 kids in such a way that each kid receives at least 1 candy?
A: 
We can just give one candy to each kid. This way we will satisfy the requirement (note that we have to give out these candies). Now for the remaining candies we have exactly the same situation as in the previous problem. With each of 8 remaining candies we pick one of the kids. Order does not matter, repetitions are allowed. Thus we have combinations of size 8 out of 7 options with repetitions.  
k = 15, n = 7  => k = 8, n = 7  
${{k + n -1} \choose {n-1}} = {{14} \choose {6}} = 3003$ 
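Both candy answers can be confirmed by listing the distributions. A sketch under one modelling choice: a distribution is a multiset of kid indices (one index per identical candy), and `distributions` with its `at_least` parameter is a helper invented for this note.

```python
from itertools import combinations_with_replacement
from math import comb

def distributions(candies, kids, at_least=0):
    count = 0
    # each multiset of kid indices says which kid gets each identical candy
    for choice in combinations_with_replacement(range(kids), candies):
        if all(choice.count(kid) >= at_least for kid in range(kids)):
            count += 1
    return count

assert distributions(15, 7) == comb(15 + 7 - 1, 7 - 1)
assert distributions(15, 7, at_least=1) == comb(8 + 7 - 1, 7 - 1)
```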

#### Quiz. Numbers with Fixed Sum of Digits

Q: How many non-negative integer numbers are there below 10000 such that their sum of digits is equal to 9?  
A: It is convenient to look from the point of view of digits in the number. There are four digits in the number and we split the weight 9 among them. We can start with all digits being equal to 0 and then add 1 to one of them nine times in a row. So, each time we have to pick one of four digits. These choices are unordered and the repetitions are allowed. So we are dealing with combinations of size 9 out of 4 options with repetitions  
k = 9, n = 4  
${{k + n -1} \choose {n-1}} = {{12} \choose {3}} = 220$ 

Q: How many non-negative integer numbers are there below 10000 such that their sum of digits is equal to 10?  
A: Like the previous problem, except that we must exclude the cases where all the weight goes to a single digit (a digit cannot be equal to 10):

10 _ _ _  
_ 10 _ _  
_ _ 10 _  
_ _ _ 10  
k = 10, n = 4  
${{k + n -1} \choose {n-1}} - 4 = {{13} \choose {3}} - 4 = 282$ 
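With only 10000 candidates, the digit-sum counts can be verified directly. A small sketch (`count_with_digit_sum` is a made-up helper name):

```python
from math import comb

def count_with_digit_sum(target, limit=10_000):
    # brute force over all non-negative integers below `limit`
    return sum(1 for x in range(limit) if sum(map(int, str(x))) == target)

print(count_with_digit_sum(9), comb(12, 3))
print(count_with_digit_sum(10), comb(13, 3) - 4)
```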

#### Quiz. Problems in Combinatorics Quiz

1. Alice has 7 different textbooks. She would like to lend three books to Bob for a weekend. How many ways does she have to do it?  
A: $ {7 \choose 3} = 35$

2. Alice has 7 textbooks and Bob has 5 textbooks. All textbooks are different. They would like to exchange three books each between each other for a weekend. That is, Alice gives Bob three of her books and Bob gives Alice three of his books. How many ways do they have to do it?  
A: $ {7 \choose 3} * {5 \choose 3} = 350$

3. Five married couples are planning a barbecue. They need to pick three couples who will be responsible for bringing food. How many ways do they have to do it?  
A: $ {5 \choose 3} = 10$

4. Five married couples are planning a barbecue. They need to hold a meeting dedicated to the planning. The meeting should consist of five people, one from each couple. How many possible ways do they have to pick people for the meeting?  
A: $2^5=32$

5. Five married couples gathered for a barbecue. They need to pick three people among them who will be responsible for preparing the table. But they do not want to pick two people from the same couple for this (this would not be fair). How many ways do they have to pick three people satisfying this requirement?  
A: $ \frac{{10 \choose 1}{8 \choose 1}{6 \choose 1}}{3!} = 80$

6. In a 6 number lottery one is trying to guess an unordered subset of 6 numbers among 44 numbers without repetitions. For this one picks 6 numbers out of 44 himself. How many ways are there to do this? You can use wolfram alpha to compute the exact number.

A: $ {44 \choose 6} = 7059052$

7. In a 6 number lottery one is trying to guess an unordered subset of 6 numbers among 44 numbers without repetitions. After the lottery the organisers decided to count how many possible ways are there to guess correctly exactly three numbers. What is the answer to this question? You can use wolfram alpha to compute the exact number.

A: To answer this question we need:
1. to pick 3 numbers from the 6 winning numbers
2. to pick 3 numbers from the 44 - 6 = 38 non-winning numbers
3. order doesn't matter  
${6 \choose 3} \cdot {38 \choose 3} = 20 \cdot 8436 = 168720$
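Two of the quiz answers are checked below as a sketch. The couples question (no two people from the same couple) is brute-forced with people encoded as hypothetical (couple, member) pairs; the lottery counts are too large to list comfortably, so they are computed with `math.comb`.

```python
from itertools import combinations
from math import comb

# question 5: three people, no two from the same couple
people = [(couple, member) for couple in range(5) for member in range(2)]
fair_trios = [
    trio for trio in combinations(people, 3)
    if len({couple for couple, _ in trio}) == 3
]
print(len(fair_trios))

# questions 6 and 7: all tickets, and tickets with exactly three correct numbers
all_tickets = comb(44, 6)
exactly_three = comb(6, 3) * comb(38, 3)
print(all_tickets, exactly_three)
```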


## W.4 Probability

### 4.1 What is Probability?

Galton board (bean machine) - 
a machine that illustrates the normal distribution. It is closely related to coin flips and Pascal's triangle: the fraction of beans in bin k after n layers is ${n \choose k} / 2^n$

#### Quiz. Concentration for Galton Board

For an ideal Galton board, where the beans are divided evenly at every step, each position gets half of the beans from each of the two positions above it, z = (x + y)/2. Written as a fraction:
- the numerator is the entry of Pascal's triangle
- the denominator is $2^n$, n - row number

1. What is the fraction of beans that end near the center (bins 40-60 among 0-100) for the ideal Galton board with 100 layers? (You may try to write a program, or use some scientific computations tool. For such a problem wolfram alpha should be enough.)

A: According to the script below or by formula answer is the following:  
$\sum_{k=40}^{60}\frac{\binom{100}{k}}{2^{100}} \approx 0.9648$


2. What is the fraction of beans that end near the center (bins 400-600 among 0-1000) for the ideal Galton board with 1000 layers? (You may try to write a program, or use some scientific computations tool. For such a problem wolfram alpha should be enough.)

A: $\sum_{k=400}^{600}\frac{1000 \choose k}{2^{1000}} = 0.9999999998$

In [None]:
from math import comb

def galton_board_fraction_counter(left_bin, right_bin, layers):
    # a board with `layers` layers has bins 0..layers;
    # bin k receives comb(layers, k) / 2**layers of the beans
    favorable = sum(comb(layers, k) for k in range(left_bin, right_bin + 1))
    # exact integer arithmetic, a single division at the end
    return favorable / 2**layers

# fraction of beans in bins 400..600 (inclusive) of a board with 1000 layers
galton_board_fraction_counter(400, 600, 1000)

#### Natural sciences and mathematics notes:
"Probability theory says that the coin gives heads and tails equally often" - FALSE, because probability theory is a part of mathematics: it does not care about the coin, only about the implications of a mathematical model.

probability space - all outcomes  
event - some set of (favorable) outcomes  
probability (for equiprobable outcomes) = number of favorable outcomes / number of all outcomes  
equiprobability - all outcomes have the same probability, e.g. when rolling 2 independent dice all 36 outcomes are equally likely

#### Quiz. Computing probabilities for two dice

1. 
Consider again the setting with red and blue dice where all 36 pairs are equiprobable. Consider the event "red and blue numbers are different". How many favourable outcomes do we have for this event (out of 36)  
A: $n^k - 6 = 6^2 - 6 = 30$, subtracting the cases (1,1), (2,2), ..., (6,6)

2. In the same setting with 36 (red, blue) outcomes we consider the sum of two numbers (on two dice). What is the most probable value of this sum?  
A: 7, because it can be obtained in the most ways: 1+6, 2+5, 3+4, 4+3, 5+2, 6+1

3. What is more probable while rolling two dice: to get at least one six, or to have no sixes?  
A: No sixes happen in 25 cases (5 times 5) out of 36, and 25/36 > 1/2 (so we have only 36-25=11 favorable outcomes for the other event)
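All three dice answers come from counting among the 36 equiprobable (red, blue) pairs, which is easy to replay in code. A minimal sketch; the variable names are chosen for this note only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (red, blue) pairs

# question 1: red and blue numbers are different
different = sum(1 for r, b in outcomes if r != b)

# question 2: which sum can be obtained in the most ways
sums = Counter(r + b for r, b in outcomes)
most_common_sum, ways = sums.most_common(1)[0]

# question 3: probability of no sixes versus at least one six
no_six = Fraction(sum(1 for r, b in outcomes if 6 not in (r, b)), len(outcomes))
at_least_one_six = 1 - no_six

print(different, most_common_sum, ways, no_six, at_least_one_six)
```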

Difference between combinatorics and probability theory is that probability theory cares about:
- independence
- non-uniform distributions
- unknown distributions
- continuous distributions

while combinatorics focuses on counting things.

#### Quiz. Computing Probabilities: more examples

1. Six people, including A,B, and C, form a queue in a random order (all 6! orderings are equiprobable). Consider the event "A is the first in the queue". (This event does not mention B, C or other people in the queue: they will appear in the other questions.) What is its probability?  
A: All queues can be split into six classes depending on who is the first. All the classes have the same cardinality, so each is 1/6 of the total space.  
1/6

2. Again six people, including A,B, and C, form a queue in a random order (all 6! orderings are equiprobable). Consider the event "A precedes B in the queue". (Again this event does not mention C or other people in the queue.) What is its probability?  
A: 1/2, A and B equiprobable and A can be in front or behind with equal chance

3. Again six people, including A,B, and C, form a queue in a random order (all 6! orderings are equiprobable). Consider the event "B is between A and C in the queue". What is its probability? (The order of A and C can be arbitrary, but B should be between them)  
A: 1/3

4. Again six people, including A,B, and C, form a queue in a random order (all 6! orderings are equiprobable). Consider the event "A and B are neighbors in the queue". What is its probability?  
A: Here we cannot use the symmetry (at least I don't see how this helps). If the position of A and B (their ordinal numbers in the queue) are fixed, then there are 4! outcomes. To fix the position where A and B are neighbors, we choose between 5*2 positions (there could be 0..4 people before them and there are two orderings of A and B - A may be before B or after). So we get 5⋅2⋅4!/6!=1/3.
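Since there are only 6! = 720 queues, all four probabilities can be computed by listing them. A sketch in which the three people not named in the quiz are called D, E, F (an arbitrary labelling) and `probability` is a helper invented for this note:

```python
from fractions import Fraction
from itertools import permutations

queues = list(permutations("ABCDEF"))  # all 720 equiprobable orderings

def probability(event):
    # fraction of queues for which the event holds
    return Fraction(sum(1 for q in queues if event(q)), len(queues))

a_first = probability(lambda q: q[0] == "A")
a_before_b = probability(lambda q: q.index("A") < q.index("B"))
b_between = probability(
    lambda q: min(q.index("A"), q.index("C"))
    < q.index("B")
    < max(q.index("A"), q.index("C"))
)
neighbors = probability(lambda q: abs(q.index("A") - q.index("B")) == 1)

print(a_first, a_before_b, b_between, neighbors)
```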

### 4.2 Probability: Do's and Don'ts

Unequal probabilities (outcomes that are not equiprobable) are not a problem, as long as the frequency of each outcome stabilizes around some probability p.  
The probabilities of all outcomes should still sum to 1.

event A = {1,2,3}; Pr[A] = p1 + p2 + p3  
event B = {5,6}; Pr[B] = p5 + p6  
A or B = {1,2,3,5,6}  
Pr[A or B] = Pr[A] + Pr[B] (A and B have no common outcomes)

#### Quiz. Fair decisions and imperfect coins

1. Assume that some dice has six outcomes with probabilities.  
$p_1 = 1/10$  
$p_2 = 2/5 = 4/10$  
$p_3 = 1/5 = 2/10$  
$p_4 = 1/10$  
$p_5 = 1/10$  
$p_6 = 1/10$  
You and a friend came to the restaurant and want to roll this dice once to decide who pays, so that both have equal chances. Can this be done? [Here "dice" stands for one cube; it may be better to use "die" instead, sorry]  
A: Yes, $p_2 + p_4 = 1/2$


2. Same dice as above. You and two friends came to the restaurant and want to roll this dice once to decide who pays, so that all three eaters have equal chances. Can this be done?  
A: No. All probabilities are multiples of 1/10, and 10 cannot be divided by 3. The probabilities of individual outcomes are finite decimal fractions, so the probability of any event (their sum) is also a finite decimal fraction, so we cannot get 1/3=0.33333... exactly

3. Now you want to design a (six-face) dice (arbitrary probabilities with sum 1 are allowed) that can be used for two, three or four people (rolling the dice once, we can provide equal chances for all eaters). Is it possible?  
A: Yes. Consider the sum 1/4+1/12+1/6+1/6+1/12+1/4. You may now group it into groups with sum 1/4 (note that 1/12+1/6=1/4), or into groups with sum 1/3 (1/4+1/12=1/6+1/6=1/3). This is enough. (What happens if you cut the same log into three and four pieces not moving the pieces in between?)

4. Now you want to design a (six-face) dice (arbitrary probabilities with sum 1 are allowed) that can be used for two, three, four or five people (rolling the dice once, we can provide equal chances for all eaters). Is it possible?  
A: No. Indeed, if there is a division into five groups, and the total number of outcomes is 6, then there should be at least four isolated outcomes (events with only one outcome) of size 1/5. Then these pieces make a fair division for four impossible, since each of them should go into its own group, and 1/20 is missing in each of the four groups, while we have only two pieces that can be used to fill the gaps.
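Whether a given die allows a fair split can be tested exhaustively, since six outcomes give at most 5^6 assignments to check. The sketch below is a brute-force search written for this note (the function `can_split` and both list names are made up); it tests the two dice discussed above.

```python
from fractions import Fraction as F
from itertools import product

def can_split(probs, groups):
    # try every assignment of the outcomes to `groups` labelled groups
    share = F(1, groups)
    for assignment in product(range(groups), repeat=len(probs)):
        totals = [F(0)] * groups
        for p, g in zip(probs, assignment):
            totals[g] += p
        if all(t == share for t in totals):
            return True
    return False

biased = [F(1, 10), F(2, 5), F(1, 5), F(1, 10), F(1, 10), F(1, 10)]
designed = [F(1, 4), F(1, 12), F(1, 6), F(1, 6), F(1, 12), F(1, 4)]

print([can_split(biased, m) for m in (2, 3)])
print([can_split(designed, m) for m in (2, 3, 4, 5)])
```

Note that this only tests one particular die for five people; the impossibility for every six-face die follows from the argument in the answer, not from the search.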

#### More about finite spaces & questions that do not make any sense:
If A and B are mutually exclusive  
Pr[A or B] = Pr[A] + Pr[B]  
Pr[not A] = 1 - Pr[A]  
"A" and "not A" are mutually exclusive and fill the entire space (you either see a crocodile or you don't)  


The main danger of probability theory is not that you cannot compute something. The danger is that you believe you have computed something, the result is wrong, and you make a wrong decision because you think it is backed by probability theory.

Things to be careful about:
- Probability of something in the past: it makes sense only if it is related to a mathematical probability model
- Example: what is the probability that I have a dollar bill in my pocket? Impossible to answer without any data, because the question has no meaning.
- Large prime numbers are useful for cryptography
- Good algorithm depending on the internal coin give wrong answer with probability < $10^{-2}$, however most of them will be incorrect - OK
- "The number $(2^{5240707} - 1)/75392810903$ is prime with probability > 0.99" - NOT OK, because there is no data behind such a statement

Probability theory extension:
- infinite probability spaces
- market

#### Quiz. Inclusion-Exclusion Formula

1. It is known that two events A and B in some probability space have probabilities 0.7 and 0.8. What is the minimal possible probability of an event "A and B" (the intersection of both events)?  
A: P(A or B) = P(A)+P(B) - P(A and B)  
P(A or B) <= 1;  
P(A and B) = 1.5 - P(A or B)  
P(A and B) >= 0.5  
Minimal P(A and B) = 0.5

2. It is known that two events A and B in some probability space have probabilities 0.7 and 0.8. What is the maximal possible probability of an event "A and B" (the intersection of both events)?  
A: This happens when A is a part of B (and this is the maximal possible probability, since "A and B" is a part of A), and it is 0.7

### 4.3 Conditional Probability

US: age 65 and over: about 14% (wikipedia) ==  
"For a random American the probability to be at least 65 years old is 0.14"  
Random selection: difficult task for pollsters

Example 1

P(Male among ppl > 65 y.o in USA) = 0.43  
P(be at least 65 y.o.) = 0.14  
P(Male > 65 y.o among typical USA citizen) =  
 P(Male among ppl > 65 y.o in USA) x P(be at least 65 y.o.) =  
0.43 x 0.14 = 0.06  
P(A and B) = P(B) * P(A|B)

Example 2  
- Probability space: 6 outcomes 1, 2, 3, 4, 5, 6  
- A: "at least 3" (3, 4, 5, 6)  
- B: "even number" (2, 4, 6)  
- A and B: even number at least 3 (4,6)  
P(A and B) = P(A) x P(B|A)  
P(A and B) = 4/6 x 2/4 = 1/3  
P(B|A): the fraction of B-outcomes among A-outcomes (for equiprobable outcomes)  
P(B|A) = P(A and B)/P(A)
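Example 2 in code, as a minimal sketch for equiprobable outcomes only. The helpers `pr` and `pr_given` are names made up for this note; events are plain Python sets.

```python
from fractions import Fraction

space = {1, 2, 3, 4, 5, 6}
A = {x for x in space if x >= 3}      # "at least 3"
B = {x for x in space if x % 2 == 0}  # "even number"

def pr(event):
    return Fraction(len(event), len(space))

def pr_given(event, condition):
    # fraction of `event` outcomes among the `condition` outcomes
    return Fraction(len(event & condition), len(condition))

# product rule: P(A and B) = P(A) x P(B|A)
assert pr(A & B) == pr(A) * pr_given(B, A)
print(pr(A), pr_given(B, A), pr(A & B))
```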

#### Quiz. Computing Conditional Probabilities

1. We roll a dice with 6 equiprobable outcomes. Compute the conditional probability P(outcome is odd ∣ outcome is even).  
A:  
Outcome_is_odd = (1,3,5)  
Outcome_is_even = (2,4,6)  
P(Outcome_is_odd AND Outcome_is_even) = 0  
P(outcome is odd ∣ outcome is even) = 0/0.5 = 0


2. Six people form a queue, including A and B (all 6! orderings are equiprobable). What is the probability that B is before A under the condition that A is not the first?  
A:
P(A_is_not_first) = (6!-5!)/6! = 5/6  
P(B before A) = 1/2  
P(B before A AND A_is_not_first) = P(B before A) = 1/2,  
because the event "B is before A" is a part of the event "A is not the first"  
P(B before A | A_is_not_first) = P(B before A)/P(A_is_not_first) = $\frac{1/2}{5/6} = 3/5$


3. We roll two dice (all 36 outcomes are equiprobable). What is the conditional probability to have 1 on the first dice under the condition that the sum of two numbers is 6?  
A:  
P(1st_dice_equal_1) = 1/6  
P(sum_equal_6) = 5/36  
P(1st_dice_equal_1 AND sum_equal_6) = 1/36  
P(1st_dice_equal_1 | sum_equal_6) =  
P(1st_dice_equal_1 AND sum_equal_6)/P(sum_equal_6) =  
(1/36) / (5/36) = 1/36 * 36/5 = 1/5  

4. Recall the puzzle about the prisoner and the king. Imagine that the prisoner puts 1 white ball in one box, and 14 white and 15 black balls in the other box. Then the king chooses a random box and then chooses a random ball inside this box. What is the conditional probability of the event "the king chooses the second box" under the condition "the ball chosen by the king was white"?  
A:  
P(chosen_ball_is_white) = 1/2 + 14/29 * 1/2 = 29/58 + 14/58 = 43/58  
P(king_chooses_2nd_box) = 1/2  
P(chosen_ball_is_white AND king_chooses_2nd_box) = "king chooses a white ball from the second box" = (1/2)(14/29) = 14/58  
P(king_chooses_2nd_box | chosen_ball_is_white) =  
P(chosen_ball_is_white AND king_chooses_2nd_box)/P(chosen_ball_is_white) =  
(14/58)/(43/58) = 14/43

#### How reliable is the test?

- disease: 1% of population
- test false negative rate 10% (negative for 1/10 of ppl with the disease)
- test false positive rate 10% (positive for 1/10 of ppl w/o the disease)

Q: fraction of having the disease among the test-positive = ?  
P(D) = 0.01 (probability of the disease);  
P(T|D) = 0.9 (conditional probability of test if I am really having disease);  
P(T|~D) = 0.1 (false positive)  
P(D|T) = ?

P(T and D) = P(D)P(T|D) = 0.01 x 0.9 = 0.009  
P(T and ~D) = P(~D)P(T|~D) = 0.99 x 0.1 = 0.099  
P(T) = P(T and D) + P(T and ~D) = 0.009 + 0.099 = 0.108  
P(D|T) = P(D and T)/P(T) = 0.009/0.108 = 8.3%  
The law of total probability:  
P(A) = P(B) x P(A|B) + (1-P(B)) x P(A|~B)
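The same computation as a small function, using exact fractions. This is a sketch: `posterior` and its parameter names are invented for this note, and it simply applies the law of total probability followed by the definition of conditional probability.

```python
from fractions import Fraction

def posterior(prior, true_positive_rate, false_positive_rate):
    # P(T and D) = P(D) x P(T|D)
    p_t_and_d = prior * true_positive_rate
    # law of total probability: P(T) = P(T and D) + P(T and ~D)
    p_t = p_t_and_d + (1 - prior) * false_positive_rate
    # P(D|T) = P(T and D) / P(T)
    return p_t_and_d / p_t

p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 10))
print(p, float(p))
```

So even with a 90% accurate test, only 1 in 12 test-positive people has the disease, because the disease itself is rare.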

#### Bayes Theorem

Example: email with header "MESSAGE FROM BANK OF AMERICA"  
From: <info4@water.ocn.ne.jp>  
Looks like a scam, but, why? 3 components of Bayes reasoning:
- many scam messages use foreign email address
- foreign email address is rather unusual in general
- scam messages are quite frequent now
---

- H for hypothesis
- E for evidence  
$P(H|E) = \frac{P(H \text{ and } E)}{P(E)} = \frac{P(E|H)P(H)}{P(E)}$  
The meaning is the following:  
We are comparing the probability of H with condition E and without this condition.  
Without condition E the probability of H is P(H).  
With condition E it is multiplied by the factor $\frac{P(E|H)}{P(E)}$, which  
shows how much more probable E becomes if the condition H is added.  

---
Going back to the emails:  
$P(H|E) = \frac{P(E|H)}{P(E)} P(H)$
- H message is a scam
- E message uses foreign address
P(H|E) is high because:
1. P(E|H) is quite high
2. P(E) is low
3. P(H) is not low  
"Foreign address make the scam hypothesis much more probable because it appears in scam messages much more often than in general"

If condition B increases probability of A by factor k, then condition A increases the probability of B by same factor k

#### Quiz. Prisoner, King and Conditional Probabilities
There are two boxes; the first one contains 10 white balls and 5 black balls; the other one contains 10 black balls and 5 white balls. King randomly selects a box (with equal probabilities) and then randomly takes a ball from this box (with equal probabilities). What is the probability that King selected the first box under the condition that the ball he selected is white?

P(1st_box)=1/2  
P(wb)=1/2  
P(wb|1st_box)=2/3

P(1st_box|wb) = P(wb|1st_box) x P(1st_box) / P(wb) (wb = white ball)  
$\frac{2/3 \cdot 1/2}{1/2} = 2/3$

#### Conditional Probability: A paradox

Example: Mary tossed a fair coin twice (probability space). At least one of the outcomes is a tail (condition C). What is the probability that she got two tails (event E)?  
$P(E|C) = \frac{P(E \text{ and } C)}{P(C)} = \frac{1/4}{3/4} = 1/3$
- Information is not the condition!
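The coin example can be replayed by listing the outcomes. A minimal sketch, assuming fair independent tosses so that all H/T sequences are equiprobable; `conditional` is a helper name made up for this note.

```python
from fractions import Fraction
from itertools import product

def conditional(n_tosses, event, condition):
    outcomes = list(product("HT", repeat=n_tosses))
    # keep only the outcomes that satisfy the condition ...
    in_condition = [o for o in outcomes if condition(o)]
    # ... and take the fraction of them that satisfy the event
    return Fraction(sum(1 for o in in_condition if event(o)), len(in_condition))

two_tails = conditional(
    2,
    event=lambda o: o.count("T") == 2,
    condition=lambda o: o.count("T") >= 1,
)
print(two_tails)
```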

#### Quiz. More Conditional Probabilities
1. Mary tosses the coin three times. What is conditional probability of the event "all tails" under the condition "at least two tails"?  
A: 4 options to get at least two tails; 1 option to get all tails: 1/4

2. The conditional probability Pr[B|A] is 4/5; the conditional probability P[B|not A] is 2/5, and the unconditional probability of B is 1/2. What is the probability of A?  
P(B) = P(A) x P(B|A) + (1-P(A)) x P(B|~A)  
1/2 = 4/5 x P(A) + 2/5 - 2/5 x P(A)  
1/2 - 2/5 = 2/5 x P(A)  
1/10 = 2/5 x P(A)  
P(A) = 1/4

####  Past and Future

If you see 999 heads in a row while tossing a coin, what about the next toss?
- The model says "still 1/2":
$2^{-1000}/2^{-999} = 1/2$

From the probability POV: we study the consequences of a given model   
From the statistics POV: we look for a better model (a two-headed coin, for example)
- A crazy statistician brings a bomb on a plane because he thinks 2 bombs on the same plane are quite improbable

#### Independence

- independent events:  
A and B are independent if P(A|B) = P(A)  
then P(A and B) = P(A) x P(B) - definition of independence aka 'product rule'

- dependent events:  
P(A and B) != P(A) x P(B)  
if P(A and B) > P(A) x P(B),  
then P(A|B) > P(A), condition B makes A more probable ->  
then condition A makes B more probable

Mathematical independence is not real life independence!  
aka: correlation != causation  
P(ill|visiting a doctor) > P(ill) - Mathematically true, but be careful with conclusions!

#### Quiz. More About Independence

1. Two events A and B are independent, Pr[A]=p, Pr[B]=q. What is the probability of the event "A and B"?  
A: pq
2. Two events A and B are independent, Pr[A]=p, Pr[B]=q. What is the probability of the event "A or B"?  
A: p+q-pq
3. Two events A and B are independent, Pr[A]=p, Pr[B]=q. What is the probability of the event "neither A nor B"?  
A: 1-(p+q-pq) = (1-p)(1-q)
4. Two events A and B are independent. Does it imply that the events "not A" and "B" are independent?  
A: Yes
5. Two events A and B are independent. Does it imply that the events "not A" and "not B" are independent?  
A: Yes
6. Events A and C are independent. Events B and C are independent. Does it imply that the event (A and B) and the event C are independent?  
A: No, not necessarily
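A concrete counterexample for question 6, as a sketch: the events below are one possible choice (two fair coin tosses), not necessarily the one used in the course. A = "first toss is heads", B = "second toss is heads", C = "both tosses agree".

```python
from fractions import Fraction
from itertools import product

space = list(product("HT", repeat=2))  # two fair coin tosses

def pr(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))

def independent(e1, e2):
    # product rule: P(e1 and e2) == P(e1) x P(e2)
    return pr(lambda o: e1(o) and e2(o)) == pr(e1) * pr(e2)

def A(o): return o[0] == "H"        # first toss is heads
def B(o): return o[1] == "H"        # second toss is heads
def C(o): return o[0] == o[1]       # both tosses agree
def A_and_B(o): return A(o) and B(o)

print(independent(A, C), independent(B, C), independent(A_and_B, C))
```

Here A and C are independent, B and C are independent, but "A and B" forces both tosses to be heads, so C becomes certain.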


### 4.4 Monty Hall Paradox

- Try to think about repetitive experiment to solve some probability theory puzzles

#### Quiz. Monty Hall Gone Crazy

1. Imagine that now the host has the following instructions. Put a prize behind a random door. Let the guest guess a door.  

1) If the guest chooses an incorrect door (with no prize), roll a dice (in such a way that the guest does not see this and does not know whether this happened);  
a) with probability 1/3 (outcomes 1 and 2) open the door (that has no prize behind); the game ends;  
b) with probability 2/3 (outcomes 3,4,5,6) open the other door with no prize and ask the guest whether she wants to change the guess.  



2) if the guest chooses a correct door (with a prize), open one of the two other doors (making a random choice) and ask the guest whether she wants to change the guess.  
What is the probability for the guest to get a prize if she uses "change" strategy (i.e., changes the guess)? We consider the fraction of winning days among all days (when she was given a chance to change or when she was not).

A: In 2/3 of the days the initial guess was wrong. With probability 2/3, i.e., in 4/9 of all the days, the guest will be given a chance to change the guess and will win (assuming the "change" strategy is used). In 1/3 of the days the initial guess is correct, and then the "change" strategy fails.  
4/9

2. The same scenario is used (as in the previous exercise). Put a prize behind a random door. Let the guest guess a door.

1) If the guest chooses an incorrect door (with no prize), roll a dice (in such a way that the guest does not see this and does not know whether this happened);

a) with probability 1/3 (outcomes 1 and 2) open the door (that has no prize behind); the game ends;

b) with probability 2/3 (outcomes 3,4,5,6) open the other door with no prize and ask the guest whether she wants to change the guess.

2) if the guest chooses a correct door (with a prize), open one of the two other doors (making a random choice) and ask the guest whether she wants to change the guess.

What is the probability for the guest to get a prize if she uses "keep" strategy (i.e., does not change the guess)? We consider the fraction of winning days among all days (when she was given a chance to change or when she was not).  
A: Here the reasoning is simpler and only the initial guess matters: the guest wins if and only if the initial guess is correct (which happens in 1/3 of the days).  
1/3
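
Both answers can be checked with a repeated experiment. The following is a minimal simulation sketch of the modified game; the function names `play_crazy_monty` and `win_fraction` and the fixed seed are illustrative assumptions, not part of the quiz.

```python
import random

def play_crazy_monty(change, rng):
    """Play one day of the modified game; return True if the guest wins."""
    prize = rng.randrange(3)
    guess = rng.randrange(3)
    if guess != prize:
        # Wrong initial guess: the host rolls a dice.
        if rng.randint(1, 6) <= 2:
            return False  # outcomes 1 and 2: the game ends, no prize
        # Outcomes 3..6: the other empty door is opened, so changing wins.
        return change
    # Correct initial guess: keeping wins, changing loses.
    return not change

def win_fraction(change, num_days=10**5, seed=0):
    """Fraction of winning days among num_days simulated days."""
    rng = random.Random(seed)
    wins = sum(play_crazy_monty(change, rng) for _ in range(num_days))
    return wins / num_days

print("change:", win_fraction(change=True))   # should be close to 4/9
print("keep:  ", win_fraction(change=False))  # should be close to 1/3
```

The two printed fractions should land near 4/9 ≈ 0.444 and 1/3 ≈ 0.333 respectively.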

## W.5 Random Variables

### 5.1 Random Variables and Expectations

Events can answer only yes or no questions.  
To study numerical characteristics of random outcomes random variables are used. 

- A random variable f is a variable whose value is determined by a random experiment.  
- Consider a probability distribution on a finite set X of k outcomes with probabilities $p_1, p_2, \ldots, p_k$.
- To define f, assign a number $a_i$ to each outcome.
- Then f has value $a_i$ with probability $p_i$.

#### Quiz. Random Variables

1. Suppose we throw a dice with numbers from 1 to 6 on its sides two times in a row. In the following list pick those elements that are random variables (and not events).
    - The sum of numbers on both dice (random variable)
    - The number in the first throw is greater than the number in the second throw (event)
    - The difference between numbers on the first and the second throw (random variable)
    - The difference between numbers in the first and the second throw is positive (event)
    - The product of numbers on both dice is even (event)
    - The number on the first dice (random variable)



2. Suppose we toss a coin two times in a row. Consider the random variable that is equal to the number of heads in these throws. What is the probability of the event that this random variable has value 1?
A: There are four possible outcomes: (heads, heads), (heads, tails), (tails, heads), (tails, tails). And two of them fall in our event: (heads, tails) and (tails, heads). 2/4 = 0.5

#### Average outcome of dice throw

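The expected value (expectation) of a dice throw is calculated the following way:
- n is the number of times the dice has been thrown
- each of the six numbers comes up about $\frac{n}{6}$ times, so the sum of all outcomes is about

$\frac{n}{6} \times 1 + \frac{n}{6} \times 2 + \frac{n}{6} \times 3 + \frac{n}{6} \times 4 + \frac{n}{6} \times 5 +
\frac{n}{6} \times 6 = \frac{n(1+2+3+4+5+6)}{6} = 3.5n$

Dividing this sum by n gives the average outcome of a throw: 3.5.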

The following code simulates an experiment we discussed in the previous video: we simulate a dice throw 100000 times and compute the average value. Run this code and compare the value with the result we obtained in the video.

In [None]:
from random import randint, seed

# seed() with no argument uses the OS entropy source or the current time;
# passing a datetime object raises TypeError on Python 3.11+
seed()

num_rounds = 10**5
sum_of_values = 0

for _ in range(num_rounds):
    sum_of_values += randint(1, 6)
    
print("The average is {}".format(sum_of_values/(num_rounds*1.0)))

#### Quiz. Average

1. Suppose number a is at least 1 and at most 5, and number b is at least 2 and at most 7. What is the smallest possible value for the average of a and b?  
A: 3/2

2. Suppose number a is at least 1 and at most 5, and number b is at least 2 and at most 7. What is the largest possible value for the average of a and b?
A: 6

3. Student got scores 78, 85, 87 and 93 on four tests. What is his average score?  
A: (78+85+87+93)/4 = 85.75

4. Suppose we throw a coin many times. Consider a random variable that is equal to 1 if the outcome of a throw is 'tails' and that is 0 if the outcome is 'heads'. What is the approximate value of an average outcome?  
A: About 1/2 (for a fair coin): roughly half of the throws give 1 and the other half give 0.

#### Expectation

Suppose we have a random variable f on a distribution with 4 outcomes.  
The probabilities of the outcomes are $p_1, p_2, p_3, p_4$.  
The values of f are $a_1, a_2, a_3, a_4$.  
Let's repeat the experiment n times, for a large n: the value $a_i$ occurs about $p_{i}n$ times.

On average we have:  
$Ef = \frac{a_{1}p_{1}n + a_{2}p_{2}n + a_{3}p_{3}n + a_{4}p_{4}n}{n} = a_{1}p_{1} + a_{2}p_{2} + a_{3}p_{3} + a_{4}p_{4}$  
This is an approximation to what we would expect as the average outcome of an experiment repeated many times.
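
The formula $Ef = a_1p_1 + \ldots + a_kp_k$ translates directly into code. Below is a small sketch; the helper name `expected_value` is an assumption made for illustration, and probabilities are passed as `Fraction` objects so that the sum-to-one check and the result are exact.

```python
from fractions import Fraction

def expected_value(values, probabilities):
    """Return the sum of a_i * p_i for a finite distribution."""
    assert len(values) == len(probabilities)
    assert sum(probabilities) == 1  # exact check, hence Fractions
    return sum(a * p for a, p in zip(values, probabilities))

dice_values = [1, 2, 3, 4, 5, 6]
dice_probs = [Fraction(1, 6)] * 6
print(expected_value(dice_values, dice_probs))  # 7/2
```

For a fair dice this reproduces the value 3.5 obtained from the repeated-experiment argument.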



#### Quiz. Expectations

1. Consider a random variable with outcomes 0 and 1 having probabilities 1/3 and 2/3 respectively. What is the expected value of this random variable?  
A: By the formula, the expected value is equal to $0\times \frac 13 + 1 \times \frac 23 = \frac 23$

2. Consider a random variable with outcomes 1, 3 and 4 having probabilities 1/4, 1/2 and 1/4 respectively. What is the expected value of this random variable?  
A: By the formula, the expected value is equal to $1\times \frac 14 + 3 \times \frac 12 + 4 \times \frac 14 = \frac{11}{4}$

### 5.2 Linearity of Expectation

Suppose there are random variables f and g on the same probability space.  
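Then $E(f+g) = Ef + Eg$. This follows from the definition of expectation: on every outcome the value of f+g is the sum of the values of f and g, so the weighted sums add up.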
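
Linearity holds even when f and g depend on each other. The sketch below checks this by exact enumeration over two dice throws; the particular variables (`first_throw` and `product_of_throws`) are chosen here only as an illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two dice throws.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def expectation(variable):
    """Expectation of a random variable given as a function of the outcome."""
    return sum(variable(o) * p for o in outcomes)

def first_throw(o):
    return o[0]

def product_of_throws(o):
    # Depends on the first throw, so f and g are not independent.
    return o[0] * o[1]

lhs = expectation(lambda o: first_throw(o) + product_of_throws(o))
rhs = expectation(first_throw) + expectation(product_of_throws)
print(lhs, rhs)  # both are 63/4
```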

#### Quiz. Linearity of Expectation

1. Suppose we throw a coin 4 times in a row. What is the expected number of tails in these throws?  
A: For each throw we can consider a random variable that is equal to 1 if the outcome is 'tails' and that is equal to 0 if the outcome is 'heads'. Then the number of tails as a random variable is equal to the sum of these random variables over all four throws. The expectation of each of the introduced random variables is equal to 1/2 as computed in the video. So the expectation of the sum is equal to $4 \times \frac 12 = 2$

2. Suppose we throw a dice 4 times in a row. What is the expected number of outcomes 1 in these throws?  
A: For each throw we can consider a random variable that is equal to 1 if the outcome is 'one' and that is equal to 0 for all other outcomes. Then the number of outcomes 1 as a random variable is equal to the sum of these random variables over all four throws. The probability of the value 1 for our random variables is 1/6 and the probability of the value 0 is 5/6. Thus the expectation of each of the introduced random variables is equal to $1\times \frac 16 + 0 \times \frac 56 = \frac 16$. So the expectation of the sum is equal to $4 \times \frac 16 = \frac 23$

#### Quiz. Bob's Party

1. Bob has a birthday and is throwing a party for his friends. He invited 30 people to the party. From the previous experience Bob knows that each of his friends will show up to the party with probability 2/5 independently of others. What is the expected number of guests on Bob’s party?  
A: The number of guests is a random variable. To compute its expectation it is convenient to break this random variable into the sum of 30 simpler random variables. For each friend that Bob invited to the party, introduce a random variable that is equal to 1 if this person showed up at the party and is equal to 0 otherwise. Then the number of guests who showed up at the party is equal to the sum of all 30 of these random variables. For each of the guests the probability that the corresponding value is equal to 1 is 2/5. So the expectation of each of our simple random variables is equal to 2/5. So the expected number of guests at the party is $30\times \frac 25 = 12$

#### Birthday Problem

Consider 28 randomly chosen people. Consider the number of pairs (i,j) such that the i-th person has a birthday on the same day as the j-th person. Show that the expectation of this number is greater than 1.  
- We assume that birthdays are distributed uniformly among 365 days of the year.
- Linearity of expectation will be used. Let's denote the number of pairs of people with the same birthday by f.
- Enumerate people from 1 to 28; consider a random variable $g_{ij}$ that is equal to 1 if persons i and j have a birthday on the same day, and is equal to 0 otherwise.
- Observation: f is equal to the sum of $g_{ij}$ over all unordered pairs of i and j!

Let's compute the expectation of an individual $g_{ij}$:  
$Eg_{ij} = 1 \times \frac{1}{365} + 0 \times \frac{364}{365} = \frac{1}{365}$  
(out of the $365^2$ equally likely pairs of birthdays for persons i and j, 365 have both birthdays on the same day).  
There are 28 people in total, so there are ${28 \choose 2} = 378$ ways to choose an unordered pair among them.  
$Ef = 378 \times \frac {1}{365} \approx 1.0356 > 1$

In [None]:
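import random
from collections import Counter

def random_28_ppl_birthday_sample():
    # Draw 28 birthdays uniformly from days 1..365 (randint includes both ends).
    birthdays = [random.randint(1, 365) for _ in range(28)]
    # A day shared by c people contributes c*(c-1)/2 unordered pairs.
    return sum(c * (c - 1) // 2 for c in Counter(birthdays).values())

def Expectations_to_have_bday_based_on_nsamples(n):
    # Average the number of same-birthday pairs over n independent samples.
    total = sum(random_28_ppl_birthday_sample() for _ in range(n))
    print('ideal = {:.6f}'.format(378 / 365))
    return total / n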

In [None]:
Expectations_to_have_bday_based_on_nsamples(10000)

#### Quiz. More Linearity

A: 19/4. In this problem it gets really hard to compute the expectation directly. But linearity helps.

Some thoughts:  
First, the probability of obtaining a head after a tail is 1/4. But if you toss a coin 20 times, you in fact have to "check" the results 19 times (the first check covers the first and second tosses, the second check covers the second and third, and so on). So there are 19 checks, and the expected value is 19/4. Is my reasoning right?

For each i from 1 to 19 we can consider the random variable that is equal to 1 if we have 'heads' in throw number i and 'tails' in throw number i+1 and is equal to 0 otherwise. Then we can observe that the random variable in the formulation of the problem is the sum of our 19 new random variables. The probability that a new random variable obtains value 1 is 1/4, since we need two coins to give us specific outcomes. The probability of this variable being 0 is thus 3/4. So the expectation of each new variable is $1\times\frac 14 + 0 \times \frac 34 = \frac 14$. Since there are 19 of these random variables the expectation of their sum is $19\times \frac 14 = \frac{19}{4}$
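
The linearity argument predicts an expectation of (n-1)/4 for n tosses. For small n this can be confirmed by brute force. The sketch below enumerates all $2^n$ toss sequences and counts positions with 'heads' in throw i and 'tails' in throw i+1; the function name `expected_heads_then_tails` is a hypothetical helper, and n is kept small because the enumeration grows exponentially.

```python
from fractions import Fraction
from itertools import product

def expected_heads_then_tails(num_tosses):
    """Exact expected number of positions i with H at i and T at i+1."""
    total = 0
    for tosses in product("HT", repeat=num_tosses):
        total += sum(
            1
            for i in range(num_tosses - 1)
            if tosses[i] == "H" and tosses[i + 1] == "T"
        )
    # Every sequence has probability 1 / 2**num_tosses.
    return Fraction(total, 2 ** num_tosses)

for n in (2, 5, 10):
    print(n, expected_heads_then_tails(n), Fraction(n - 1, 4))
```

Each printed row should show the same value twice, matching the (n-1)/4 prediction.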

### 5.3 Expectation is Not All

A dice game:

Alice = [2,2,2,2,3,3]  
Bob = [1,1,1,1,6,6]  
Let's see who has a better expected value of a dice throw.  
Alice has $2 \times \frac{2}{3} + 3 \times \frac{1}{3} = \frac{7}{3}$  
Bob has $1 \times \frac{2}{3} + 6 \times \frac{1}{3} = \frac{8}{3}$  
Who wins more often?  
Note that the winner depends only on Bob's throw: if he throws 1 he definitely loses, if he throws 6 he definitely wins.

In [None]:
from random import randint, seed

# seed() with no argument uses the OS entropy source or the current time;
# passing a datetime object raises TypeError on Python 3.11+
seed()

dice1=[2, 2, 2, 2, 3, 3]
dice2=[1, 1, 1, 1, 6, 6]

num_rounds = 10**5

assert len(dice1) == 6 and len(dice2) == 6

num_dice1_wins = 0
num_dice2_wins = 0

for _ in range(num_rounds):
    dice1_result = dice1[randint(0, 5)]
    dice2_result = dice2[randint(0, 5)]

    if dice1_result > dice2_result:
        num_dice1_wins += 1
    elif dice2_result > dice1_result:
        num_dice2_wins += 1

if num_dice1_wins > num_dice2_wins:
    print("The dice {} is better than {}:\nout of {} rounds it won {} times more".format(dice1, dice2, num_rounds, num_dice1_wins-num_dice2_wins))
elif num_dice2_wins > num_dice1_wins:
    print("The dice {} is better than {}:\nout of {} rounds it won {} times more".format(dice2, dice1, num_rounds, num_dice2_wins-num_dice1_wins))
else:
    print("A tie")
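
The simulation can be cross-checked exactly by enumerating all 36 pairs of sides. This is a sketch under the assumption that both dice are fair; the helper names `mean` and `win_probabilities` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

alice = [2, 2, 2, 2, 3, 3]
bob = [1, 1, 1, 1, 6, 6]

def mean(dice):
    """Expected value of one throw of a fair dice with the given sides."""
    return Fraction(sum(dice), len(dice))

def win_probabilities(first, second):
    """Exact probabilities that the first / the second dice shows more."""
    pairs = list(product(first, second))
    first_wins = sum(1 for a, b in pairs if a > b)
    second_wins = sum(1 for a, b in pairs if b > a)
    return Fraction(first_wins, len(pairs)), Fraction(second_wins, len(pairs))

print("expected values:", mean(alice), mean(bob))            # 7/3 8/3
print("win probabilities:", *win_probabilities(alice, bob))  # 2/3 1/3
```

Bob has the larger expected value, yet Alice wins two throws out of three, which is the point of this section.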

### 5.4 Markov's Inequality

A lottery ticket costs 10 dollars. 40% of the lottery budget goes to prizes. Show that the chances to win 500 dollars or more are less than 1%.

- We will use proof by contradiction
- Assume the contrary: the probability to win 500 dollars or more is at least 0.01.
- Denote the number of tickets sold by n
- Then the budget of the lottery is 10n dollars
- $10n \times 0.4 = 4n$ dollars are spent on prizes
- By our assumption at least $\frac{n}{100}$ tickets win at least 500 dollars
- In total these tickets win at least $\frac{n}{100} \times 500 = 5n$ dollars
- This exceeds the total prize budget of 4n!
- We arrived at a contradiction and the problem is solved -> the chances to win 500 dollars or more are less than 1%.

#### Quiz. Average Income

An internet article claims that 10% of citizens of a certain country earn at least 15 times more money than the average income in this country. Can it be the case?


A: No, this is impossible

Indeed, suppose the statement is true. Denote the number of people in the country by n and the average income by s. Then the total income of all people in the country is $n \times s$. And if we consider the 10% of the population from the formulation of the problem, their total income is at least $\frac{n}{10} \times 15\times s = 1.5 \times n \times s$ (the number of people in this group is $\frac{n}{10}$ and their individual income is at least $15\times s$). This amount is greater than the total income of the population, which gives a contradiction.

#### Markov's Inequality

Suppose that f is a non-negative random variable.  
Then for any number a>0 we have:  
$P(f \geq a) \leq \frac{Ef}{a}$. The probability that f is greater than or equal to a is at most the expectation of f divided by a.
- The inequality allows us to use the expected value to bound the probability of certain events
- For the proof it is convenient to rewrite the inequality:
$a \times P(f \geq a) \leq Ef$
- Consider the following random variable g on the same probability space:
for an outcome such that $f = a_{i}$ we let g = a on this outcome if $a \leq a_{i}$ and g = 0 otherwise; that is: 


g = a if f = $a_i \geq a$,  
g = 0 if f = $a_i < a$

- What is the expectation of g?
- g has only one nonzero value;
- Eg is this value multiplied by the sum of probabilities of all outcomes with this value
- But these outcomes are exactly the outcomes that form the event "$f \geq a$"
- Thus the sum of their probabilities is equal to $P(f \geq a)$
- Thus $Eg = a \times P(f \geq a)$
 
Finally, g never exceeds f on any outcome (g is either 0, and f is non-negative, or g = a on outcomes where $f \geq a$), so $Eg \leq Ef$.  
And $Eg = a \times P(f \geq a)$  
So $Ef \geq Eg = a \times P(f \geq a)$  
We have shown Markov's inequality and connected expectation and probability.
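
The inequality can be checked numerically on any finite non-negative distribution. The sketch below uses a fair dice as an example; `markov_check` is a hypothetical helper name, and exact fractions are used to avoid rounding issues in the comparison.

```python
from fractions import Fraction

def markov_check(values, probabilities, a):
    """Return (P(f >= a), Ef / a) for a non-negative finite random variable."""
    assert a > 0 and all(v >= 0 for v in values)
    expectation = sum(v * p for v, p in zip(values, probabilities))
    tail = sum(p for v, p in zip(values, probabilities) if v >= a)
    return tail, expectation / a

values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
for a in range(1, 7):
    tail, bound = markov_check(values, probs, a)
    # The last column is True whenever Markov's inequality holds.
    print(a, tail, bound, tail <= bound)
```

For small a the bound exceeds 1 and says nothing; it becomes informative only when a is well above the expectation.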

#### Geometric interpretation (skipped)
$Ef \geq a \times P(f \geq a)$  
Suppose f obtains values $a_1, a_2, a_3, a_4$ with probabilities $p_1, p_2, p_3, p_4$

#### Quiz. Bob’s Party Revisited

1. Bob has a birthday and is throwing a party for his friends. He invited 30 people to the party. From the previous experience Bob knows that each of his friends will show up to the party with probability 2/5 independently of others. Recall that we already have computed before the expected number of guests at the party and it is equal to 12. Bob is deciding how much snacks will be at the party and he would like to upper bound the probability that there will be at least 18 people. What upper bound on this probability can he get from Markov’s inequality?

A: We have a random variable f that is the number of people at the party. This is a non-negative random variable and its expectation is $Ef = 12$.  
$P(f \geq a) \leq \frac{Ef}{a}$  
By Markov's inequality $P(f \geq 18) \leq \frac{Ef}{18} = \frac{12}{18}$  
$P(f \geq 18) \leq 2/3$  
So the upper bound is 2/3.
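
Markov's inequality uses only the expectation, so the bound can be far from the true probability. Since each guest shows up independently with probability 2/5, the exact probability is a binomial tail, and the two can be compared. This is an illustrative sketch; `binomial_tail` is a helper name introduced here.

```python
from fractions import Fraction
from math import comb

def binomial_tail(n, p, k):
    """P(at least k successes in n independent trials, each with probability p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = Fraction(2, 5)
markov_bound = (30 * p) / 18        # Ef / a with Ef = 30 * 2/5 = 12
exact = binomial_tail(30, p, 18)

print("Markov bound:", markov_bound)  # 2/3
print("Exact probability:", float(exact))
```

The exact value comes out much smaller than 2/3; the Markov bound is valid but loose here because it ignores independence.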

#### Quiz. Alice's test  
Alice makes 3 mistakes on average on a random test in the course she is taking. What is the best upper bound we can get from Markov’s inequality on the probability that she will make at least 15 mistakes? The answer to the problem should be a number between 0 and 1 (not a percent).

A: The number of mistakes f made by Alice on a random test is a random variable. This is a non-negative random variable and its expectation is 3. So, by Markov’s inequality $\Pr[f \geq 15] \leq Ef/15 = 1/5$.

Note that given the information from the formulation of the problem this is the best bound on the probability of this event we can hope for. Indeed, it might be the case that Alice makes exactly 15 mistakes with probability 1/5 and makes 0 mistakes with probability 4/5. This does not contradict the statement of the problem: the expectation then is exactly 3. And also in this case $\Pr[f \geq 15] = 1/5$. So our bound is tight.



#### Application to Algorithms

Problem. Suppose there is a randomized algorithm that runs on average in time, say, $n^2$, where n is the size of the input. The algorithm always outputs the correct answer.  
Construct another randomized algorithm that always stops in time $cn^2$ for some constant c and makes a mistake with probability at most $10^{-3}$.  
We will apply Markov's inequality.

- Running time of the algorithm is a random variable; denote it by f
- We know that $Ef = n^{2}$
- Here is a new algorithm:
    - Run the original algorithm for $10^{3}n^{2}$ steps
    - If it stops, we also stop
    - If not, stop and output, say, 0

The new algorithm can make a mistake only if the original one does not stop within $10^{3}n^{2}$ steps. The probability of this is at most $P(f \geq 10^{3}n^{2})$.  
By Markov's inequality it is bounded by:  
$P(f \geq 10^{3}n^{2}) \leq \frac{Ef}{10^{3}n^{2}} = \frac{n^{2}}{10^{3}n^{2}} = 10^{-3}$
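
The truncation idea can be sketched on a toy example. The algorithm below is not the $n^2$ algorithm from the problem; it is a hypothetical stand-in (`find_a_one`) that probes random positions of a list until it finds a 1, which takes about 2 probes on average when half of the entries are ones. The wrapper `truncated_find_a_one` cuts it off after 1000 times the expected number of steps, exactly as in the construction above.

```python
import random

def find_a_one(bits, rng, max_steps):
    """Toy randomized algorithm: probe random positions until a 1 is found.

    Returns (index, steps) on success, or (None, max_steps) if the step
    budget ran out first.
    """
    for steps in range(1, max_steps + 1):
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i, steps
    return None, max_steps

def truncated_find_a_one(bits, rng, expected_steps, factor=10**3):
    """Run find_a_one for at most factor * expected_steps probes."""
    budget = factor * expected_steps
    index, steps = find_a_one(bits, rng, budget)
    if index is None:
        index = 0  # budget exhausted: output an arbitrary, possibly wrong, answer
    return index, steps

rng = random.Random(0)
bits = [0, 1] * 8  # half of the entries are ones
print(truncated_find_a_one(bits, rng, expected_steps=2))
```

The wrapper never uses more than `factor * expected_steps` probes, and by Markov's inequality it falls back to the arbitrary answer with probability at most 1/`factor`.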

- Expected value is one of the main characteristics of a random variable
- Expectation carries a lot of information about a random variable 
- Expectation has very convenient mathematical properties

## W.6 Random Variables

### 6.1 Dice Game Problem

- Probability is tricky
- One should be very careful when applying usual intuition to probability

### 6.2 Playing Dice Game

How to play the game
- So how should you play?
- By comparing pairs of dice you should check whether there is a dice that is better than all others
- If there is one, you should pick your dice first and you should pick the dice that achieves the highest possible probability against all others
- If there is no such dice, you should find for each dice the one that is the best against it.
- In this case you pick your dice second and you choose the dice that is the best against your opponent's pick

Project Overview
In this series of three programming tasks, we will implement together a program that will play optimally in a tricky dice game! Your program will be given a list of dice and will decide who chooses the dice first (you or your opponent).

When the dice are chosen, we will simulate 10000 throws. Each time your number is greater, you get \\$1 from your opponent. Conversely, each time your number is smaller, you pay \\$1 to your opponent.

Your ultimate goal is to implement a program that always wins in such a simulation.

#### 6.2.1. Compare Two Dice  

Implement a function that takes two dice as input and computes two values: the first value is the number of times the first dice wins (out of all possible 36 choices), the second value is the number of times the second dice wins. We say that a dice wins if the number on it is greater than the number on the other dice.

In [None]:
#To debug your implementation, use the following test cases:

#Sample 1
#Input: 
s1_dice1 = [1, 2, 3, 4, 5, 6]
s1_dice2 = [1, 2, 3, 4, 5, 6]
#Output: (15, 15)

#Sample 2
#Input: 
s2_dice1 = [1, 1, 6, 6, 8, 8]
s2_dice2 = [2, 2, 4, 4, 9, 9]
#Output: (16, 20)

In [None]:
#%%timeit

def count_wins(dice1, dice2):
    assert len(dice1) == 6 and len(dice2) == 6
    dice1_wins, dice2_wins = 0, 0
    for i in dice1:
        for j in dice2:
            if i>j:
                dice1_wins+=1
            if i<j:
                dice2_wins+=1
    return (dice1_wins, dice2_wins)

count_wins(s2_dice1, s2_dice2)

#### 6.2.2. Is there the Best Dice?

Now, your goal is to check whether among the three given dice there is one that is better than the remaining two dice.  
Implement a function that takes a list of dice and checks whether there is a dice (in this list) that is better than all other dice. We say that a dice is better than another one if it wins more frequently (that is, out of all 36 possibilities, it wins in a cases, while the second one wins in b cases, and a>b). If there is such a dice, return its (0-based) index. Otherwise, return -1.

In [None]:
#Use the following datasets for debugging:

#Sample 1  
s1_dice1 = [1, 1, 6, 6, 8, 8]
s1_dice2 = [2, 2, 4, 4, 9, 9]
s1_dice3 = [3, 3, 5, 5, 7, 7]
#Output: -1

#Sample 2  
s2_dice1 = [1, 1, 2, 4, 5, 7]
s2_dice2 = [1, 2, 2, 3, 4, 7]
s2_dice3 = [1, 2, 3, 4, 5, 6]
#Output: 2

#Sample 3  
s3_dice1 = [3, 3, 3, 3, 3, 3]
s3_dice2 = [6, 6, 2, 2, 2, 2]
s3_dice3 = [4, 4, 4, 4, 0, 0]
s3_dice4 = [5, 5, 5, 1, 1, 1]
#Output: -1

In [None]:
def count_wins(dice1, dice2):
    assert len(dice1) == 6 and len(dice2) == 6
    dice1_wins, dice2_wins = 0, 0
    for i in dice1:
        for j in dice2:
            if i > j:
                dice1_wins += 1
            if i < j:
                dice2_wins += 1
    return (dice1_wins, dice2_wins)

def is_better(dice1, dice2):
    # dice1 is better if it wins strictly more of the 36 pairings than dice2
    wins1, wins2 = count_wins(dice1, dice2)
    return wins1 > wins2

def find_the_best_dice(dices):
    assert all(len(dice) == 6 for dice in dices)
    # No module-level state is used, so repeated calls cannot affect each other.
    for i, candidate in enumerate(dices):
        # the best dice must beat every other dice in the list (a tie is not enough)
        if all(is_better(candidate, other) for j, other in enumerate(dices) if j != i):
            return i
    return -1

find_the_best_dice([s2_dice1, s2_dice2, s2_dice3])

#### 6.2.3. Third Task: Implement a Strategy

You are now ready to play!

Implement a function that takes a list of dice (possibly more than three) and returns a strategy. The strategy is a dictionary:

If, after analyzing the given list of dice, you decide to choose a dice first, set strategy["choose_first"] to True and set strategy["first_dice"] to be the (0-based) index of the dice you would like to choose.

If you would like to be the second one to choose a dice, set strategy["choose_first"] to False. Then, specify, for each dice that your opponent may take, the dice that you would take in return. Namely, for each i from 0 to len(dices)-1, set strategy[i] to an index j of the dice that you would take if the opponent takes the i-th dice first.

In [1]:
#Sample 1
s1_dice1 = [1, 1, 4, 6, 7, 8]
s1_dice2 = [2, 2, 2, 6, 7, 7]
s1_dice3 = [3, 3, 3, 5, 5, 8]
s1_dice4 = [1, 1, 1, 1, 1, 1]
#Output: {'choose_first': False, 0: 1, 1: 2, 2: 0}

#Sample 2
s2_dice1 = [4, 4, 4, 4, 0, 0]
s2_dice2 = [7, 7, 3, 3, 3, 3]
s2_dice3 = [6, 6, 2, 2, 2, 2]
s2_dice4 = [5, 5, 5, 1, 1, 1]
#Output: {'choose_first': True, 'first_dice': 1}

#Sample_3
s3_dice1 = [1, 1, 6, 6, 8, 8]
s3_dice2 = [3, 3, 5, 5, 7, 7]
s3_dice3 = [2, 2, 4, 4, 9, 9]

#Sample_4
s4_dice1 = [4, 4, 4, 4, 0, 0]
s4_dice2 = [3, 3, 3, 3, 3, 3]
s4_dice3 = [6, 6, 2, 2, 2, 2]
s4_dice4 = [5, 5, 5, 1, 1, 1]


In [11]:
import itertools

# counting wins by certain dice
def count_wins(dice1, dice2):
    dice1_wins, dice2_wins = 0, 0
    for i in dice1:
        for j in dice2:
            if i > j:
                dice1_wins += 1
            if i < j:
                dice2_wins += 1
    print(dice1, dice1_wins, dice2, dice2_wins)
    return (dice1_wins, dice2_wins)

# for each dice, list the dice that beat it as (winning margin, index) pairs;
# the dictionary is rebuilt on every call, so no state leaks between calls
def get_dice_counters(dices):
    dices_scores = {k: [] for k in range(len(dices))}
    # all unordered pairs of dice indices, without repetitions
    for a, b in itertools.combinations(range(len(dices)), 2):
        res = count_wins(dices[a], dices[b])
        if res[0] > res[1]:
            dices_scores[b].append((res[0] - res[1], a))
        if res[0] < res[1]:
            dices_scores[a].append((res[1] - res[0], b))
    return dices_scores

# finding a dice that no other dice beats (if it does not exist, moving to get_counter)
def get_best_dice(dices):
    result_dict = get_dice_counters(dices)
    for i in result_dict:
        if len(result_dict[i]) == 0:
            return {'choose_first': True, 'first_dice': i}
    return get_counter(result_dict)

# finding counter dice to dice picked by opponent in a format {'dice picked by opponent': 'counter-dice'}
def get_counter(result_dict):
    counter_dict = {'choose_first': False}
    for i in result_dict:
        # max over (margin, index) pairs picks the counter with the largest winning margin
        counter_dict[i] = max(result_dict[i])[1]
    return counter_dict

def compute_strategy(dices):
    assert all(len(dice) == 6 for dice in dices)
    return get_best_dice(dices)

In [13]:
compute_strategy([s2_dice1, s2_dice2, s2_dice3, s2_dice4])

[4, 4, 4, 4, 0, 0] 16 [7, 7, 3, 3, 3, 3] 20
[4, 4, 4, 4, 0, 0] 16 [6, 6, 2, 2, 2, 2] 20
[4, 4, 4, 4, 0, 0] 12 [5, 5, 5, 1, 1, 1] 24
[7, 7, 3, 3, 3, 3] 28 [6, 6, 2, 2, 2, 2] 8
[7, 7, 3, 3, 3, 3] 24 [5, 5, 5, 1, 1, 1] 12
[6, 6, 2, 2, 2, 2] 24 [5, 5, 5, 1, 1, 1] 12


{'choose_first': True, 'first_dice': 1}