<div style="text-align: right">INFO 6105 Data Science Eng Methods and Tools</div>
<div style="text-align: right">Sanket Khadke (NUID 002968050), Aditya Sawant (NUID 002762104)</div>

# Assignment 4

Let's import the required libraries first.

In [1]:
from fractions import Fraction
import math

Now let's define the `ProbDist` class. It takes a dictionary (or keyword arguments) that maps each outcome to a non-negative weight and normalizes the weights so that they sum to 1, giving a probability distribution over all the outcomes.

In [2]:
class ProbDist(dict):
    def __init__(self, mapping=(), **kwargs):
        self.update(mapping, **kwargs)
        total = sum(self.values())
        for outcome in self:
            self[outcome] = self[outcome] / total
            assert self[outcome] >= 0
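
As a quick illustration of the normalization, here is a minimal sketch. The toy weights are made up for this example and are not part of the assignment data.

```python
class ProbDist(dict):
    """A dict of outcome -> probability, normalized on construction."""
    def __init__(self, mapping=(), **kwargs):
        self.update(mapping, **kwargs)
        total = sum(self.values())
        for outcome in self:
            self[outcome] = self[outcome] / total
            assert self[outcome] >= 0

# Hypothetical toy weights, chosen only for illustration.
toy = ProbDist(a=1, b=1, c=2)
print(toy)                # {'a': 0.25, 'b': 0.25, 'c': 0.5}
print(sum(toy.values()))  # 1.0
```

The weights can be raw counts or points; only their ratios matter.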

We also define the function `such_that`, which takes a predicate (a function that returns `True` or `False` for an outcome) and a sample space, and returns only the outcomes for which the predicate is true. If the space is a `ProbDist`, the result is again a `ProbDist`, so the probabilities of the retained outcomes are renormalized to sum to 1.

In [3]:
def such_that(predicate, space): 
    if isinstance(space, ProbDist):
        return ProbDist({o:space[o] for o in space if predicate(o)})
    else:
        return {o for o in space if predicate(o)}
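
The difference between the two branches of `such_that` can be seen in a small sketch. The die and its weights below are hypothetical and serve only as an illustration.

```python
class ProbDist(dict):
    def __init__(self, mapping=(), **kwargs):
        self.update(mapping, **kwargs)
        total = sum(self.values())
        for outcome in self:
            self[outcome] = self[outcome] / total
            assert self[outcome] >= 0

def such_that(predicate, space):
    if isinstance(space, ProbDist):
        return ProbDist({o: space[o] for o in space if predicate(o)})
    else:
        return {o for o in space if predicate(o)}

def is_even(n):
    return n % 2 == 0

# Plain set: the result is just the subset of even faces.
die_set = {1, 2, 3, 4, 5, 6}
even_faces = such_that(is_even, die_set)
print(even_faces)

# Hypothetical loaded die: face 6 is three times as likely as any other face.
loaded = ProbDist({1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 3})
evens = such_that(is_even, loaded)
print(evens)  # probabilities are renormalized over the even faces only
```

Because the `ProbDist` branch renormalizes, the returned distribution is the conditional distribution given that the predicate holds. This is what makes the conditional-probability calculations in Question 2 work.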

We then define the probability function `p`, which returns the probability of an event given a sample space. The event can be given either as a predicate or as a collection of outcomes; both forms are used in the solutions below.

In [4]:
def p(event, space): 
    if is_predicate(event):
        event = such_that(event, space)
    if isinstance(space, ProbDist):
        return sum(space[o] for o in space if o in event)
    else:
        return Fraction(len(event & space), len(space))

is_predicate = callable
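
The following sketch shows both ways of calling `p`. The die and the biased coin are hypothetical examples, not part of the assignment.

```python
from fractions import Fraction

class ProbDist(dict):
    def __init__(self, mapping=(), **kwargs):
        self.update(mapping, **kwargs)
        total = sum(self.values())
        for outcome in self:
            self[outcome] = self[outcome] / total
            assert self[outcome] >= 0

def such_that(predicate, space):
    if isinstance(space, ProbDist):
        return ProbDist({o: space[o] for o in space if predicate(o)})
    return {o for o in space if predicate(o)}

def p(event, space):
    if callable(event):
        event = such_that(event, space)
    if isinstance(space, ProbDist):
        return sum(space[o] for o in space if o in event)
    return Fraction(len(event & space), len(space))

# Equiprobable sample space given as a set: the result is an exact Fraction.
die = {1, 2, 3, 4, 5, 6}
p_even = p(lambda n: n % 2 == 0, die)   # event given as a predicate
p_high = p({5, 6}, die)                 # event given as a collection
print(p_even, p_high)                   # 1/2 1/3

# Weighted sample space given as a ProbDist: the result is a float.
coin = ProbDist(heads=3, tails=1)       # hypothetical biased coin
p_heads = p({'heads'}, coin)
print(p_heads)                          # 0.75
```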

## Question 1:
The season is over and [Max Verstappen](https://www.formula1.com/en/drivers/max-verstappen.html) won, again! But let's assume the season is *not* over and there are *still* two F1 races coming up: The **Netherlands Grand Prix** coming up this weekend and the **Italian Grand Prix** the weekend after. The 2022 driver standings are given [here](https://www.formula1.com/en/results.html/2022/drivers.html). Given these standings (*please do not use team standings given on the same Web site, use driver standings*), what is the Probability Distribution for each F1 driver to win the Netherlands Grand Prix? What is the Probability Distribution for each F1 driver to win *both* the Netherlands and Italian Grand Prix? What is the probability for Red Bull to win both races? What is the probability for Red Bull to win at least one race? Note that Red Bull, and each other racing team, has two drivers per race.

## Solution
Using the latest driver standings from the F1 website, we first create a dictionary with the drivers' initials as keys and their points as values. We then pass this dictionary to `ProbDist` to obtain the probability distribution of each driver winning the next Grand Prix.

Note that we treat the two races as independent: the probability of each driver winning the Italian Grand Prix does not depend on the result of the Netherlands Grand Prix, so the probability distribution is the same for both races.

In the cells below, we use `NGP` and `IGP` to represent the probability distribution for each driver to win the *Netherlands Grand Prix* and the *Italian Grand Prix* respectively.

In [5]:
NGP = ProbDist(
    MV = 341,
    CL = 237,
    SP = 235,
    GR = 203,
    CS = 202,
    LH = 170,
    LN = 100,
    EO = 66,
    FA = 59,
    VB = 46,
    DR = 29,
    SV = 24,
    PG = 23,
    KM = 22,
    LS = 13,
    MS = 12,
    YT = 11,
    ZG = 6,
    AA = 4,
    NV = 2,
    NL = 0,
    NH = 0)

NGP

{'MV': 0.1889196675900277,
 'CL': 0.13130193905817175,
 'SP': 0.13019390581717452,
 'GR': 0.11246537396121883,
 'CS': 0.11191135734072022,
 'LH': 0.09418282548476455,
 'LN': 0.055401662049861494,
 'EO': 0.03656509695290859,
 'FA': 0.032686980609418284,
 'VB': 0.02548476454293629,
 'DR': 0.016066481994459834,
 'SV': 0.013296398891966758,
 'PG': 0.012742382271468145,
 'KM': 0.01218836565096953,
 'LS': 0.007202216066481994,
 'MS': 0.006648199445983379,
 'YT': 0.006094182825484765,
 'ZG': 0.0033240997229916896,
 'AA': 0.00221606648199446,
 'NV': 0.00110803324099723,
 'NL': 0.0,
 'NH': 0.0}

In [6]:
IGP = ProbDist(
    MV = 341,
    CL = 237,
    SP = 235,
    GR = 203,
    CS = 202,
    LH = 170,
    LN = 100,
    EO = 66,
    FA = 59,
    VB = 46,
    DR = 29,
    SV = 24,
    PG = 23,
    KM = 22,
    LS = 13,
    MS = 12,
    YT = 11,
    ZG = 6,
    AA = 4,
    NV = 2,
    NL = 0,
    NH = 0)

IGP

{'MV': 0.1889196675900277,
 'CL': 0.13130193905817175,
 'SP': 0.13019390581717452,
 'GR': 0.11246537396121883,
 'CS': 0.11191135734072022,
 'LH': 0.09418282548476455,
 'LN': 0.055401662049861494,
 'EO': 0.03656509695290859,
 'FA': 0.032686980609418284,
 'VB': 0.02548476454293629,
 'DR': 0.016066481994459834,
 'SV': 0.013296398891966758,
 'PG': 0.012742382271468145,
 'KM': 0.01218836565096953,
 'LS': 0.007202216066481994,
 'MS': 0.006648199445983379,
 'YT': 0.006094182825484765,
 'ZG': 0.0033240997229916896,
 'AA': 0.00221606648199446,
 'NV': 0.00110803324099723,
 'NL': 0.0,
 'NH': 0.0}

Now let us find the joint probability distribution over pairs of winners of the Netherlands Grand Prix and the Italian Grand Prix. Since there are `22` drivers, there are `22` x `22` possible winning combinations. Because the two races are treated as independent, the probability of each combination is the product of the two drivers' individual winning probabilities.
We define the function `joint_prob` below, which multiplies the winning probability of the first driver in a pair by that of the second. The keys have the format `driver1name driver2name` (the Netherlands winner followed by the Italian winner), and the values are the corresponding joint probabilities.

In [7]:
# Joint probability distribution of two independent distributions
def joint_prob(A, B, sep=''):
    return ProbDist({i + sep + j: A[i] * B[j]
                    for i in A
                    for j in B})

joint_pd = joint_prob(NGP, IGP, ' ')
joint_pd

{'MV MV': 0.03569064080232656,
 'MV CL': 0.024805518680795884,
 'MV SP': 0.024596189409227986,
 'MV GR': 0.02124692106414162,
 'MV CS': 0.02114225642835767,
 'MV LH': 0.01779298808327131,
 'MV LN': 0.010466463578394886,
 'MV EO': 0.0069078659617406255,
 'MV FA': 0.006175213511252983,
 'MV VB': 0.004814573246061648,
 'MV DR': 0.003035274437734517,
 'MV SV': 0.0025119512588147727,
 'MV PG': 0.002407286623030824,
 'MV KM': 0.0023026219872468753,
 'MV LS': 0.0013606402651913352,
 'MV MS': 0.0012559756294073863,
 'MV YT': 0.0011513109936234377,
 'MV ZG': 0.0006279878147036932,
 'MV AA': 0.0004186585431357955,
 'MV NV': 0.00020932927156789774,
 'MV NL': 0.0,
 'MV NH': 0.0,
 'CL MV': 0.024805518680795884,
 'CL CL': 0.01724019920043585,
 'CL SP': 0.017094712287352,
 'CL GR': 0.014766921678010452,
 'CL CS': 0.014694178221468528,
 'CL LH': 0.01236638761212698,
 'CL LN': 0.007274345654192341,
 'CL EO': 0.004801068131766946,
 'CL FA': 0.004291863935973481,
 'CL VB': 0.003346199000928477,
 'CL DR':

As noted above, `joint_pd` has `22` x `22` = `484` entries:

In [8]:
len(joint_pd)

484
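
A joint distribution built this way can be sanity-checked on a small case. The sketch below uses two hypothetical two-driver races (`X` and `Y` are made-up names) to confirm that the number of entries is the product of the sizes, that each entry is the product of the marginals, and that the entries sum to 1.

```python
import math

class ProbDist(dict):
    def __init__(self, mapping=(), **kwargs):
        self.update(mapping, **kwargs)
        total = sum(self.values())
        for outcome in self:
            self[outcome] = self[outcome] / total
            assert self[outcome] >= 0

def joint_prob(A, B, sep=''):
    return ProbDist({i + sep + j: A[i] * B[j] for i in A for j in B})

# Hypothetical toy races with two drivers each.
race1 = ProbDist(X=3, Y=1)
race2 = ProbDist(X=3, Y=1)
joint = joint_prob(race1, race2, ' ')

print(len(joint))                              # 4, i.e. 2 x 2
print(joint['X Y'])                            # 0.1875, i.e. 0.75 * 0.25
print(math.isclose(sum(joint.values()), 1.0))  # True
```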

### Each Driver wins both the races

Next, we need the probability of each driver winning both races. These are the entries of `joint_pd` whose key names the same driver twice.

For instance, the probability of Max Verstappen winning both races is stored under the key `MV MV`.

We accomplish this with a function that uses string comparison: it splits each key with `.split(" ")` and checks whether the two resulting strings are the same. If they are, the driver and the corresponding probability are added to a result dictionary, which the function returns.

In [9]:
# Probability of each F1 driver winning both the Netherlands and Italian Grand Prix
def each_driver_wins_both_races(joint_pd):
    result_pd = {}
    for key in joint_pd:
        first, second = key.split(" ")
        # Keep only the outcomes in which the same driver wins both races
        if first == second:
            result_pd[first] = joint_pd[key]
    return result_pd
            
each_driver_wins_both = each_driver_wins_both_races(joint_pd)
each_driver_wins_both

{'MV': 0.03569064080232656,
 'CL': 0.01724019920043585,
 'SP': 0.01695045311193131,
 'GR': 0.012648460340236799,
 'CS': 0.012524151901842374,
 'LH': 0.008870404616293614,
 'LN': 0.003069344157887063,
 'EO': 0.0013370063151756052,
 'FA': 0.001068438701360487,
 'VB': 0.0006494732238089027,
 'DR': 0.00025813184367830206,
 'SV': 0.00017679422349429485,
 'PG': 0.0001623683059522257,
 'KM': 0.00014855625724173387,
 'LS': 5.1871916268291375e-05,
 'MS': 4.419855587357371e-05,
 'YT': 3.713906431043347e-05,
 'ZG': 1.1049638968393428e-05,
 'AA': 4.910950652619302e-06,
 'NV': 1.2277376631548256e-06,
 'NL': 0.0,
 'NH': 0.0}
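
As a cross-check (not the method used in the notebook), the same numbers follow directly from independence: the probability that a driver wins both races is the square of that driver's single-race probability. The sketch below recomputes this exactly with `Fraction`, using the points from the standings above. The names `points`, `wins_both` and `same_driver` are introduced only for this illustration.

```python
from fractions import Fraction

points = {'MV': 341, 'CL': 237, 'SP': 235, 'GR': 203, 'CS': 202, 'LH': 170,
          'LN': 100, 'EO': 66, 'FA': 59, 'VB': 46, 'DR': 29, 'SV': 24,
          'PG': 23, 'KM': 22, 'LS': 13, 'MS': 12, 'YT': 11, 'ZG': 6,
          'AA': 4, 'NV': 2, 'NL': 0, 'NH': 0}
total = sum(points.values())

# P(driver wins both races) = P(driver wins one race) squared.
wins_both = {d: Fraction(pts, total) ** 2 for d, pts in points.items()}
print(float(wins_both['MV']))   # agrees with the 'MV' entry above

# These values do not sum to 1: the sum is the probability that
# the same driver, whoever it is, wins both races.
same_driver = sum(wins_both.values())
print(float(same_driver))
```

Strictly speaking, the result is therefore a table of probabilities rather than a normalized probability distribution.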

### Red Bull wins both the races

For the next part of the question, we need to find the probability of Red Bull winning both races. The Red Bull team consists of `Max Verstappen` and `Sergio Perez`, denoted by `MV` and `SP` respectively.

**Method 1**

To find the probability that Red Bull wins both races, we define a predicate function `redbull_wins_both_races` and pass it to our probability function `p`.

Here, we check whether the keys of `joint_pd` begin _**and**_ end with either `MV` **or** `SP`. In short, we check for the 4 combinations `MV MV`, `MV SP`, `SP MV` and `SP SP` in `joint_pd`.

In [10]:
def redbull_wins_both_races(outcome):
    return (outcome.startswith("MV") or outcome.startswith("SP")) and (outcome.endswith("MV") or outcome.endswith("SP"))

p(redbull_wins_both_races, joint_pd)

0.10183347273271384

**Method 2**

For the second method, let's first make a list of the two drivers who race for Red Bull (`MV` and `SP`) in `redbull_drivers`.

We then generate all ordered pairs of these two drivers, so that we obtain exactly the keys in which only `MV` and `SP` win the two races. This gives the 4 combinations stored in `redbull_drivers_combo` below.

In [11]:
redbull_drivers = ["MV", "SP"]
redbull_drivers

['MV', 'SP']

In [12]:
redbull_drivers_combo = [a + " " + b for a in redbull_drivers for b in redbull_drivers]
redbull_drivers_combo

['MV MV', 'MV SP', 'SP MV', 'SP SP']

We now write a function that looks up the joint probability of each of the combination keys generated above. It stores these key-value pairs in the dictionary `result_pd` and returns it. We then pass the returned dictionary to the probability function `p` to find the probability of Red Bull winning both races.

In [13]:
def redbull_wins_both_races_v2(driver_combo, joint_pd):
    result_pd = {}
    for x in driver_combo:
        result_pd[x] = joint_pd[x]
    return result_pd

redbull_wins_both = redbull_wins_both_races_v2(redbull_drivers_combo, joint_pd)
p(redbull_wins_both, joint_pd)

0.10183347273271384
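
Both methods can be cross-checked with a closed form. This sketch assumes the independence stated earlier; the name `q` is introduced here only for illustration. If `q` is the probability that a Red Bull driver wins a single race, the probability that Red Bull wins both is `q` squared.

```python
from fractions import Fraction

TOTAL_POINTS = 1805                     # sum of all driver points in the standings above
q = Fraction(341 + 235, TOTAL_POINTS)   # P(MV or SP wins a single race)

both = q ** 2                           # independence: P(race 1) * P(race 2)
print(float(q))
print(float(both))                      # agrees with the value obtained above
```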

### Red Bull wins at least one race

For the next subsection of this question, we need to find the probability of Red Bull winning at least one race. As before, the Red Bull drivers Max Verstappen and Sergio Perez are denoted by `MV` and `SP` respectively.

**Method 1**

To find the probability that Red Bull wins at least one race, we define a predicate function `redbull_wins_atleast_one_race` and pass it to our probability function `p`.
In this function, we check whether the keys of `joint_pd` begin _**or**_ end with either `MV` **or** `SP`. In short, we select every combination in `joint_pd` that contains `MV` or `SP` in at least one position.

In [14]:
def redbull_wins_atleast_one_race(outcome):
    return outcome.startswith("MV") or outcome.startswith("SP") or outcome.endswith("MV") or outcome.endswith("SP")

p(redbull_wins_atleast_one_race, joint_pd)

0.5363936740816907

**Method 2**

To find the probability of Red Bull winning at least one of the two races in a second way, we define a function that checks whether each key of `joint_pd` contains either `MV` or `SP`. If it does, we copy the key-value pair into `result_pd`.

For each key we loop over both drivers, so a key such as `MV SP` matches twice. Because `result_pd` is a dictionary, the second match simply overwrites the first, and every outcome is counted exactly once.

Finally, we pass the resulting dictionary along with `joint_pd` to the function `p` to find the probability of Red Bull winning at least one race.

In [15]:
def redbull_wins_atleast_one_race_v2(driver_list, joint_pd):
    result_pd = {}
    for i in joint_pd:
        for j in driver_list:
            if j in i:
                result_pd[i] = joint_pd[i]
    return result_pd

result_pd = redbull_wins_atleast_one_race_v2(redbull_drivers, joint_pd)
p(result_pd, joint_pd)

0.5363936740816907
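
The complement rule gives an independent check of this value. In this sketch, which again assumes independent races, `q` is a name introduced for the single-race probability of a Red Bull win. Red Bull fails to win any race with probability `(1 - q)` squared, so it wins at least one with probability `1 - (1 - q)**2`.

```python
from fractions import Fraction

q = Fraction(341 + 235, 1805)     # P(MV or SP wins a single race)

none = (1 - q) ** 2               # Red Bull wins neither race
at_least_one = 1 - none           # complement rule

# Inclusion-exclusion gives the same result: P(A) + P(B) - P(A and B).
assert at_least_one == 2 * q - q ** 2
print(float(at_least_one))        # agrees with the value obtained above
```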

## Question 2

If Red Bull wins the first race, what is the probability that Red Bull wins the next one? If Red Bull wins at least one of these two races, what is the probability Red Bull wins both races? How about Ferrari, Mercedes, and Alpine-Renault?

## Solution

### If Red Bull wins the first race, what is the probability that Red Bull wins the next one?

To condition on Red Bull winning the first race, we first define a predicate function `redbull_wins_first_race`, which checks whether a key of the joint probability distribution `joint_pd` starts with either `MV` or `SP`. This indicates that either Max Verstappen or Sergio Perez won the Netherlands Grand Prix.

In [16]:
def redbull_wins_first_race(outcome):
    return outcome.startswith("MV") or outcome.startswith("SP")

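As we can see below, the resulting dictionary contains only outcomes in which `MV` or `SP` won the Netherlands Grand Prix. Since `such_that` returns a `ProbDist`, the probabilities are renormalized and are now conditional on Red Bull winning the first race.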

In [17]:
such_that(redbull_wins_first_race, joint_pd)

{'MV MV': 0.11184306709756849,
 'MV CL': 0.07773257156048016,
 'MV SP': 0.07707660049245924,
 'MV GR': 0.06658106340412435,
 'MV CS': 0.06625307787011388,
 'MV LH': 0.05575754078177902,
 'MV LN': 0.032798553401046475,
 'MV EO': 0.021647045244690678,
 'MV FA': 0.01935114650661742,
 'MV VB': 0.015087334564481381,
 'MV DR': 0.009511580486303479,
 'MV SV': 0.007871652816251155,
 'MV PG': 0.0075436672822406904,
 'MV KM': 0.007215681748230226,
 'MV LS': 0.004263811942136042,
 'MV MS': 0.0039358264081255775,
 'MV YT': 0.003607840874115113,
 'MV ZG': 0.0019679132040627888,
 'MV AA': 0.0013119421360418592,
 'MV NV': 0.0006559710680209296,
 'MV NL': 0.0,
 'MV NH': 0.0,
 'SP MV': 0.07707660049245924,
 'SP CL': 0.0535693674976916,
 'SP SP': 0.0531173053247153,
 'SP GR': 0.04588431055709449,
 'SP CS': 0.04565827947060634,
 'SP LH': 0.03842528470298554,
 'SP LN': 0.022603108648815023,
 'SP EO': 0.014918051708217916,
 'SP FA': 0.013335834102800863,
 'SP VB': 0.01039742997845491,
 'SP DR': 0.006554901

We then define another predicate function `redbull_wins_second_race_if_first_race_won`, which checks whether the keys of the dictionary obtained with `redbull_wins_first_race` end with either `MV` or `SP`. This means that Max Verstappen or Sergio Perez won the second race as well, which in turn signifies that Red Bull won the second race given that it also won the first.

In [18]:
def redbull_wins_second_race_if_first_race_won(outcome):
    return outcome.endswith("MV") or outcome.endswith("SP")

We then use both predicates to calculate the required probability. The predicate `redbull_wins_first_race` is passed through `such_that` to obtain the conditional distribution, which serves as the sample space argument of `p`; the second predicate is the event.

In [19]:
p(redbull_wins_second_race_if_first_race_won, such_that(redbull_wins_first_race, joint_pd))

0.31911357340720226
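
This result equals the unconditional single-race probability of a Red Bull win, which is what independence predicts. A short sketch of the definition of conditional probability makes this explicit; the variable names are introduced here only for illustration.

```python
from fractions import Fraction

q = Fraction(341 + 235, 1805)        # P(Red Bull wins a single race)

p_first = q                          # P(Red Bull wins race 1)
p_both = q * q                       # P(Red Bull wins race 1 and race 2)

# Definition of conditional probability: P(B | A) = P(A and B) / P(A).
p_second_given_first = p_both / p_first
assert p_second_given_first == q     # the first race carries no information
print(float(p_second_given_first))
```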

### If Red Bull wins at least one of these two races, what is the probability Red Bull wins both races?

For the next part of the question, we need the probability that Red Bull wins both races given that it wins at least one of them.

Here, we reuse the predicate function `redbull_wins_atleast_one_race` from Question 1, which checks whether a key of `joint_pd` starts **or** ends with either `MV` **or** `SP`.

As we can see below, the resulting dictionary contains only outcomes in which `MV` or `SP` (or both) win at least one of the races.

In [20]:
such_that(redbull_wins_atleast_one_race, joint_pd)

{'MV MV': 0.06653814637808539,
 'MV CL': 0.04624498736541419,
 'MV SP': 0.045854734307478204,
 'MV GR': 0.03961068538050245,
 'MV CS': 0.03941555885153445,
 'MV LH': 0.0331715099245587,
 'MV LN': 0.019512652896799234,
 'MV EO': 0.012878350911887496,
 'MV FA': 0.011512465209111549,
 'MV VB': 0.008975820332527648,
 'MV DR': 0.005658669340071778,
 'MV SV': 0.004683036695231816,
 'MV PG': 0.004487910166263824,
 'MV KM': 0.004292783637295832,
 'MV LS': 0.0025366448765839003,
 'MV MS': 0.002341518347615908,
 'MV YT': 0.002146391818647916,
 'MV ZG': 0.001170759173807954,
 'MV AA': 0.0007805061158719694,
 'MV NV': 0.0003902530579359847,
 'MV NL': 0.0,
 'MV NH': 0.0,
 'CL MV': 0.04624498736541419,
 'CL SP': 0.03186971270050538,
 'SP MV': 0.045854734307478204,
 'SP CL': 0.03186971270050538,
 'SP SP': 0.031600769977294364,
 'SP GR': 0.02729768640591811,
 'SP CS': 0.027163215044312603,
 'SP LH': 0.02286013147293635,
 'SP LN': 0.013447136160550792,
 'SP EO': 0.008875109865963524,
 'SP FA': 0.007933

Next, we calculate the probability that Red Bull wins both races given that it wins at least one race.

Here, we again use the predicate function `redbull_wins_both_races` defined in Question 1 as the event. This predicate checks whether both winners in a key are Red Bull drivers, that is, whether the key is one of the combinations of `MV` and `SP`.

In [21]:
p(redbull_wins_both_races, such_that(redbull_wins_atleast_one_race, joint_pd))

0.18984838497033615
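
Because winning both races implies winning at least one, this conditional probability is simply P(both) divided by P(at least one). The sketch below, which assumes independent races and uses names introduced only for illustration, shows that the ratio simplifies to `q / (2 - q)`.

```python
from fractions import Fraction

q = Fraction(341 + 235, 1805)            # P(Red Bull wins a single race)

p_both = q ** 2
p_at_least_one = 1 - (1 - q) ** 2        # equals q * (2 - q)

p_both_given_one = p_both / p_at_least_one
assert p_both_given_one == q / (2 - q)   # simplified closed form
print(float(p_both_given_one))           # agrees with the value obtained above
```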

We now repeat the same set of calculations for Ferrari, Mercedes, and Alpine-Renault.

### Ferrari

### Part 1: If Ferrari wins the first race, what is the probability that Ferrari wins the next one?

For this part of the question, we define a similar predicate function `ferrari_wins_first_race` for Ferrari, which checks whether the keys of the dictionary `joint_pd` start with either `CS` or `CL`.

In [22]:
def ferrari_wins_first_race(outcome):
    return outcome.startswith("CS") or outcome.startswith("CL")

We check the resulting dictionary using the `such_that` function.

In [23]:
such_that(ferrari_wins_first_race, joint_pd)

{'CL MV': 0.10199080004290789,
 'CL CL': 0.07088510149609728,
 'CL SP': 0.07028691498558168,
 'CL GR': 0.06071593081733226,
 'CL CS': 0.060416837562074464,
 'CL LH': 0.05084585339382505,
 'CL LN': 0.029909325525779438,
 'CL EO': 0.019740154847014434,
 'CL FA': 0.01764650206020987,
 'CL VB': 0.013758289741858543,
 'CL DR': 0.008673704402476038,
 'CL SV': 0.007178238126187064,
 'CL PG': 0.006879144870929272,
 'CL KM': 0.006580051615671476,
 'CL LS': 0.0038882123183513273,
 'CL MS': 0.003589119063093532,
 'CL YT': 0.003290025807835738,
 'CL ZG': 0.001794559531546766,
 'CL AA': 0.0011963730210311776,
 'CL NV': 0.0005981865105155888,
 'CL NL': 0.0,
 'CL NH': 0.0,
 'CS MV': 0.0869288675471198,
 'CS CL': 0.060416837562074464,
 'CS SP': 0.05990699083159282,
 'CS GR': 0.05174944314388656,
 'CS CS': 0.05149451977864575,
 'CS LH': 0.043336972090939486,
 'CS LN': 0.02549233652408205,
 'CS EO': 0.016824942105894154,
 'CS FA': 0.015040478549208411,
 'CS VB': 0.011726474801077745,
 'CS DR': 0.0073927

We then define another predicate function `ferrari_wins_second_race_if_first_race_won`, which checks whether the keys of the dictionary obtained with `ferrari_wins_first_race` end with either `CS` or `CL`.

In [24]:
def ferrari_wins_second_race_if_first_race_won(outcome):
    return outcome.endswith("CS") or outcome.endswith("CL")

We then use both predicates to calculate the required probability. The predicate `ferrari_wins_first_race` is passed through `such_that` to obtain the conditional distribution, which serves as the sample space argument of `p`.

In [25]:
p(ferrari_wins_second_race_if_first_race_won, such_that(ferrari_wins_first_race, joint_pd))

0.24321329639889197

### Part 2: If Ferrari wins at least one of these two races, what is the probability that Ferrari wins both races?

For this part of the question, we define a similar predicate function `ferrari_wins_atleast_one_race` for **Ferrari**, which checks whether the keys of the dictionary `joint_pd` start **or** end with either `CS` **or** `CL`.

In [26]:
def ferrari_wins_atleast_one_race(outcome):
    return outcome.startswith("CL") or outcome.startswith("CS") or outcome.endswith("CL") or outcome.endswith("CS")

We again check the resultant dictionary using the `such_that` function

In [27]:
such_that(ferrari_wins_atleast_one_race, joint_pd)

{'MV CL': 0.05805531191341808,
 'MV CS': 0.049481742643504016,
 'CL MV': 0.05805531191341808,
 'CL CL': 0.04034929303073339,
 'CL SP': 0.040008792667604834,
 'CL GR': 0.03456078685754801,
 'CL CS': 0.034390536675983734,
 'CL LH': 0.028942530865926904,
 'CL LN': 0.01702501815642759,
 'CL EO': 0.011236511983242211,
 'CL FA': 0.010044760712292277,
 'CL VB': 0.007831508351956692,
 'CL DR': 0.004937255265364001,
 'CL SV': 0.004086004357542621,
 'CL PG': 0.003915754175978346,
 'CL KM': 0.00374550399441407,
 'CL LS': 0.0022132523603355866,
 'CL MS': 0.0020430021787713106,
 'CL YT': 0.001872751997207035,
 'CL ZG': 0.0010215010893856553,
 'CL AA': 0.0006810007262571036,
 'CL NV': 0.0003405003631285518,
 'CL NL': 0.0,
 'CL NH': 0.0,
 'SP CL': 0.040008792667604834,
 'SP CS': 0.03410032117660834,
 'GR CL': 0.03456078685754801,
 'GR CS': 0.029456873186602096,
 'CS MV': 0.049481742643504016,
 'CS CL': 0.034390536675983734,
 'CS SP': 0.03410032117660834,
 'CS GR': 0.029456873186602096,
 'CS CS': 0.02

We then define another predicate function `ferrari_wins_both_the_races`, similar to `redbull_wins_both_races`, which checks whether both winners in a key are Ferrari drivers, that is, whether the key both starts **and** ends with `CS` or `CL`.

In [28]:
def ferrari_wins_both_the_races(outcome):
    return (outcome.startswith("CL") or outcome.startswith("CS")) and (outcome.endswith("CL") or outcome.endswith("CS"))

We now pass both these predicate functions to our probability function `p` using the `such_that` function to calculate the required probability

In [29]:
p(ferrari_wins_both_the_races, such_that(ferrari_wins_atleast_one_race, joint_pd))

0.13844213181961526

### Mercedes

### Part 1: If Mercedes wins the first race, what is the probability that Mercedes wins the next one?

For this part of the question, we define a similar predicate function `mercedes_wins_first_race` for Mercedes, which checks whether the keys of the dictionary `joint_pd` start with either `GR` or `LH`.

In [30]:
def mercedes_wins_first_race(outcome):
    return outcome.startswith("GR") or outcome.startswith("LH")

We again check the resultant dictionary using the `such_that` function

In [31]:
such_that(mercedes_wins_first_race, joint_pd)

{'GR MV': 0.10281687002888909,
 'GR CL': 0.07145923224881734,
 'GR SP': 0.07085620075304673,
 'GR GR': 0.06120769682071695,
 'GR CS': 0.06090618107283165,
 'GR LH': 0.05125767714050189,
 'GR LN': 0.03015157478853052,
 'GR EO': 0.019900039360430145,
 'GR FA': 0.01778942912523301,
 'GR VB': 0.013869724402724042,
 'GR DR': 0.008743956688673851,
 'GR SV': 0.007236377949247324,
 'GR PG': 0.006934862201362021,
 'GR KM': 0.006633346453476715,
 'GR LS': 0.003919704722508967,
 'GR MS': 0.003618188974623662,
 'GR YT': 0.0033166732267383577,
 'GR ZG': 0.001809094487311831,
 'GR AA': 0.001206062991541221,
 'GR NV': 0.0006030314957706105,
 'GR NL': 0.0,
 'GR NH': 0.0,
 'LH MV': 0.08610279756113864,
 'LH CL': 0.059842706809354426,
 'LH SP': 0.059337705064127805,
 'LH GR': 0.05125767714050189,
 'LH CS': 0.05100517626788858,
 'LH LH': 0.04292514834426266,
 'LH LN': 0.02525008726133098,
 'LH EO': 0.01666505759247845,
 'LH FA': 0.014897551484185279,
 'LH VB': 0.011615040140212252,
 'LH DR': 0.0073225253

We then define another predicate function `mercedes_wins_second_race_if_first_race_won`, which checks whether the keys of the dictionary obtained with `mercedes_wins_first_race` end with either `GR` or `LH`.

In [32]:
def mercedes_wins_second_race_if_first_race_won(outcome):
    return outcome.endswith("GR") or outcome.endswith("LH")

We now pass both these predicate functions to our probability function `p` using the `such_that` function to calculate the required probability

In [33]:
p(mercedes_wins_second_race_if_first_race_won, such_that(mercedes_wins_first_race, joint_pd))

0.2066481994459834

### Part 2: If Mercedes wins at least one of these two races, what is the probability that Mercedes wins both races?

Now let's define similar functions `mercedes_wins_atleast_one_race` and `mercedes_wins_both_races` for Mercedes, which check the same conditions as for Red Bull and Ferrari, but for the two drivers `GR` and `LH`.

In [34]:
def mercedes_wins_atleast_one_race(outcome):
    return outcome.startswith("LH") or outcome.startswith("GR") or outcome.endswith("LH") or outcome.endswith("GR")

We check the resulting dictionary using the `such_that` function.

In [35]:
such_that(mercedes_wins_atleast_one_race, joint_pd)

{'MV GR': 0.057332236763096886,
 'MV LH': 0.048012217978948134,
 'CL GR': 0.03984674519898523,
 'CL LH': 0.033369195486834924,
 'SP GR': 0.03951048574582923,
 'SP LH': 0.03308759890044813,
 'GR MV': 0.057332236763096886,
 'GR CL': 0.03984674519898523,
 'GR SP': 0.03951048574582923,
 'GR GR': 0.034130334495333335,
 'GR CS': 0.03396220476875533,
 'GR LH': 0.028582053518259445,
 'GR LN': 0.01681297265779967,
 'GR EO': 0.011096561954147785,
 'GR FA': 0.009919653868101808,
 'GR VB': 0.00773396742258785,
 'GR DR': 0.0048757620707619055,
 'GR SV': 0.004035113437871921,
 'GR PG': 0.003866983711293925,
 'GR KM': 0.0036988539847159284,
 'GR LS': 0.0021856864455139574,
 'GR MS': 0.0020175567189359605,
 'GR YT': 0.0018494269923579642,
 'GR ZG': 0.0010087783594679802,
 'GR AA': 0.0006725189063119869,
 'GR NV': 0.00033625945315599346,
 'GR NL': 0.0,
 'GR NH': 0.0,
 'CS GR': 0.03396220476875533,
 'CS LH': 0.028441255225066048,
 'LH MV': 0.048012217978948134,
 'LH CL': 0.033369195486834924,
 'LH SP': 

We define the predicate function `mercedes_wins_both_races`, which serves the same purpose as `redbull_wins_both_races` and `ferrari_wins_both_the_races` but considers the team drivers `LH` and `GR`.

In [36]:
def mercedes_wins_both_races(outcome):
    return (outcome.startswith("LH") or outcome.startswith("GR")) and (outcome.endswith("LH") or outcome.endswith("GR"))

We now pass the predicate `mercedes_wins_both_races` to our probability function `p`, using `such_that` with `mercedes_wins_atleast_one_race` to generate the sample space argument.

In [37]:
p(mercedes_wins_both_races, such_that(mercedes_wins_atleast_one_race, joint_pd))

0.11523015137472958

### Alpine-Renault

### Part 1: If Alpine-Renault wins the first race, what is the probability that Alpine-Renault wins the next one?

For this part of the question, we define a similar predicate function `alpinerenault_wins_first_race` for Alpine-Renault where we would check first if the keys of the dictionary `joint_pd` start with either `EO` or `FA`

In [38]:
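def alpinerenault_wins_first_race(outcome):
    return outcome.startswith("EO") or outcome.startswith("FA")
aplinerenault_wins_first_race = alpinerenault_wins_first_race  # alias for the misspelled name used in later cells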

We again check the resultant dictionary using the `such_that` function

In [39]:
such_that(aplinerenault_wins_first_race, joint_pd)

{'EO MV': 0.09974958448753464,
 'EO CL': 0.0693274238227147,
 'EO SP': 0.06874238227146816,
 'EO GR': 0.059381717451523555,
 'EO CS': 0.059089196675900284,
 'EO LH': 0.049728531855955695,
 'EO LN': 0.029252077562326877,
 'EO EO': 0.019306371191135742,
 'EO FA': 0.01725872576177286,
 'EO VB': 0.013455955678670365,
 'EO DR': 0.008483102493074796,
 'EO SV': 0.0070204986149584505,
 'EO PG': 0.0067279778393351825,
 'EO KM': 0.006435457063711913,
 'EO LS': 0.003802770083102494,
 'EO MS': 0.0035102493074792252,
 'EO YT': 0.0032177285318559564,
 'EO ZG': 0.0017551246537396126,
 'EO AA': 0.0011700831024930752,
 'EO NV': 0.0005850415512465376,
 'EO NL': 0.0,
 'EO NH': 0.0,
 'FA MV': 0.0891700831024931,
 'FA CL': 0.06197451523545707,
 'FA SP': 0.061451523545706384,
 'FA GR': 0.0530836565096953,
 'FA CS': 0.05282216066481995,
 'FA LH': 0.04445429362880887,
 'FA LN': 0.02614958448753463,
 'FA EO': 0.01725872576177286,
 'FA FA': 0.015428254847645434,
 'FA VB': 0.012028808864265932,
 'FA DR': 0.00758

We then define another predicate function `alpinerenault_wins_second_race_if_first_race_won`, which checks whether the keys of the dictionary obtained with the first-race predicate end with either `EO` or `FA`.

In [40]:
def alpinerenault_wins_second_race_if_first_race_won(outcome):
    return outcome.endswith("EO") or outcome.endswith("FA")

We now pass both these predicate functions to our probability function `p` using the `such_that` function to calculate the required probability

In [41]:
p(alpinerenault_wins_second_race_if_first_race_won, such_that(aplinerenault_wins_first_race, joint_pd))

0.06925207756232689

### Part 2: If Alpine-Renault wins at least one of these two races, what is the probability that Alpine-Renault wins both races?

Let us also define similar functions `alpinerenault_wins_atleast_one_race` and `alpinerenault_wins_both_the_races` for Alpine-Renault, which check the same conditions as for Red Bull, Ferrari and Mercedes, but for the two drivers `EO` and `FA`.

In [42]:
def alpinerenault_wins_atleast_one_race(outcome):
    return outcome.startswith("FA") or outcome.startswith("EO") or outcome.endswith("FA") or outcome.endswith("EO")

Checking the resultant dictionary obtained from the above predicate function by using the `such_that` function

In [43]:
such_that(alpinerenault_wins_atleast_one_race, joint_pd)

{'MV EO': 0.05166370157819224,
 'MV FA': 0.04618421807747488,
 'CL EO': 0.03590703012912482,
 'CL FA': 0.032098708751793395,
 'SP EO': 0.035604017216642754,
 'SP FA': 0.03182783357245336,
 'GR EO': 0.030755810616929693,
 'GR FA': 0.02749383070301291,
 'CS EO': 0.03060430416068866,
 'CS FA': 0.027358393113342892,
 'LH EO': 0.02575609756097561,
 'LH FA': 0.023024390243902435,
 'LN EO': 0.015150645624103298,
 'LN FA': 0.013543758967001432,
 'EO MV': 0.05166370157819224,
 'EO CL': 0.03590703012912482,
 'EO SP': 0.035604017216642754,
 'EO GR': 0.030755810616929693,
 'EO CS': 0.03060430416068866,
 'EO LH': 0.02575609756097561,
 'EO LN': 0.015150645624103298,
 'EO EO': 0.009999426111908178,
 'EO FA': 0.008938880918220947,
 'EO VB': 0.0069692969870875175,
 'EO DR': 0.004393687230989957,
 'EO SV': 0.003636154949784791,
 'EO PG': 0.0034846484935437587,
 'EO KM': 0.0033331420373027255,
 'EO LS': 0.0019695839311334286,
 'EO MS': 0.0018180774748923956,
 'EO YT': 0.0016665710186513627,
 'EO ZG': 0.0

We now define a predicate function `alpinerenault_wins_both_the_races`, similar to those defined for Red Bull, Ferrari and Mercedes, and pass it to `p`.

In [44]:
def alpinerenault_wins_both_the_races(outcome):
    return (outcome.startswith("FA") or outcome.startswith("EO")) and (outcome.endswith("FA") or outcome.endswith("EO"))

In [45]:
p(alpinerenault_wins_both_the_races, such_that(alpinerenault_wins_atleast_one_race, joint_pd))

0.035868005738880916
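The team-specific predicates for Red Bull, Ferrari, Mercedes and Alpine-Renault all follow one pattern, so they could be generated from a list of driver codes. The following is only a minimal sketch of that idea: `make_team_predicates` is a hypothetical helper that is not part of the notebook, and it assumes outcome keys of the form `"<winner 1> <winner 2>"` as in `joint_pd`.

```python
def make_team_predicates(drivers):
    """Build (wins_at_least_one, wins_both) predicates for one team.

    Hypothetical helper; assumes keys look like "MV EO" (two winners, one space).
    """
    drivers = tuple(drivers)

    def wins_at_least_one(outcome):
        first, second = outcome.split(" ")
        return first in drivers or second in drivers

    def wins_both(outcome):
        first, second = outcome.split(" ")
        return first in drivers and second in drivers

    return wins_at_least_one, wins_both


# Alpine-Renault drivers, as used in the cells above
alpine_at_least_one, alpine_both = make_team_predicates(["EO", "FA"])

print(alpine_at_least_one("MV EO"), alpine_both("MV EO"))  # True False
print(alpine_at_least_one("EO FA"), alpine_both("EO FA"))  # True True
print(alpine_at_least_one("MV CL"), alpine_both("MV CL"))  # False False
```

The returned predicates can be passed to `such_that` and `p` in the same way as the hand-written ones.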

## Question 3:

Red Bull wins one of these two races on a rainy day. What is the probability Red Bull wins both races? Assuming races can be held on either rainy, sunny, cloudy, snowy or foggy days? 

<br>Assume that rain, sun, clouds, snow, and fog are the only possible weather conditions on race tracks and there is an equal probability for each one of these weather events.

## Solution:

There are 5 possible weather conditions and each is equally likely. Hence the probability of each weather condition is 1/5 or 20%

In [46]:
weather = ProbDist(rainy = 0.2, sunny = 0.2 , cloudy = 0.2 , snowy = 0.2 , foggy = 0.2)
weather

{'rainy': 0.2, 'sunny': 0.2, 'cloudy': 0.2, 'snowy': 0.2, 'foggy': 0.2}

Now that we have the probability distribution for the weather, let's find the *joint probability distribution* of the *Netherlands Grand Prix and weather* and also of the *Italian Grand Prix and weather*. 
Here we reuse the function `joint_prob` defined in Question 1. Each of these joint distributions has a total of `22`x`5` (drivers x weather events) = 110 pairs
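`joint_prob` is defined in Question 1 and is not repeated in this part. As a reminder of what it is assumed to do here (multiply the probabilities of two independent distributions and join their keys with a separator), the following is a minimal sketch. `joint_prob_sketch` is a hypothetical stand-in, not the notebook's own function, and the toy numbers are made up for illustration.

```python
def joint_prob_sketch(dist_a, dist_b, sep=" "):
    """Joint distribution of two independent distributions given as dicts.

    Hypothetical stand-in for joint_prob: keys are joined with `sep`,
    probabilities are multiplied (independence assumption).
    """
    return {a + sep + b: dist_a[a] * dist_b[b]
            for a in dist_a
            for b in dist_b}


# Toy inputs (made-up numbers, not the real standings)
race = {"MV": 0.6, "CL": 0.4}
toy_weather = {"rainy": 0.5, "sunny": 0.5}

joint = joint_prob_sketch(race, toy_weather)
print(joint)
print(len(joint))  # 2 drivers x 2 weather conditions = 4 pairs
```

With 22 drivers and 5 weather conditions the same construction yields the 110 pairs mentioned above.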

In [47]:
weather_NGP = joint_prob(NGP, weather , ' ')
weather_NGP

{'MV rainy': 0.037783933518005554,
 'MV sunny': 0.037783933518005554,
 'MV cloudy': 0.037783933518005554,
 'MV snowy': 0.037783933518005554,
 'MV foggy': 0.037783933518005554,
 'CL rainy': 0.02626038781163436,
 'CL sunny': 0.02626038781163436,
 'CL cloudy': 0.02626038781163436,
 'CL snowy': 0.02626038781163436,
 'CL foggy': 0.02626038781163436,
 'SP rainy': 0.026038781163434912,
 'SP sunny': 0.026038781163434912,
 'SP cloudy': 0.026038781163434912,
 'SP snowy': 0.026038781163434912,
 'SP foggy': 0.026038781163434912,
 'GR rainy': 0.022493074792243772,
 'GR sunny': 0.022493074792243772,
 'GR cloudy': 0.022493074792243772,
 'GR snowy': 0.022493074792243772,
 'GR foggy': 0.022493074792243772,
 'CS rainy': 0.02238227146814405,
 'CS sunny': 0.02238227146814405,
 'CS cloudy': 0.02238227146814405,
 'CS snowy': 0.02238227146814405,
 'CS foggy': 0.02238227146814405,
 'LH rainy': 0.018836565096952914,
 'LH sunny': 0.018836565096952914,
 'LH cloudy': 0.018836565096952914,
 'LH snowy': 0.018836565

In [48]:
len(weather_NGP)

110

In [49]:
weather_IGP = joint_prob(IGP, weather , ' ')
weather_IGP

{'MV rainy': 0.037783933518005554,
 'MV sunny': 0.037783933518005554,
 'MV cloudy': 0.037783933518005554,
 'MV snowy': 0.037783933518005554,
 'MV foggy': 0.037783933518005554,
 'CL rainy': 0.02626038781163436,
 'CL sunny': 0.02626038781163436,
 'CL cloudy': 0.02626038781163436,
 'CL snowy': 0.02626038781163436,
 'CL foggy': 0.02626038781163436,
 'SP rainy': 0.026038781163434912,
 'SP sunny': 0.026038781163434912,
 'SP cloudy': 0.026038781163434912,
 'SP snowy': 0.026038781163434912,
 'SP foggy': 0.026038781163434912,
 'GR rainy': 0.022493074792243772,
 'GR sunny': 0.022493074792243772,
 'GR cloudy': 0.022493074792243772,
 'GR snowy': 0.022493074792243772,
 'GR foggy': 0.022493074792243772,
 'CS rainy': 0.02238227146814405,
 'CS sunny': 0.02238227146814405,
 'CS cloudy': 0.02238227146814405,
 'CS snowy': 0.02238227146814405,
 'CS foggy': 0.02238227146814405,
 'LH rainy': 0.018836565096952914,
 'LH sunny': 0.018836565096952914,
 'LH cloudy': 0.018836565096952914,
 'LH snowy': 0.018836565

In [50]:
len(weather_IGP)

110

Now that we have the joint distributions of weather with the Netherlands Grand Prix and with the Italian Grand Prix, let us combine the two into a single joint distribution. Each entry then gives the probability of one combination of winner and weather for the first race together with winner and weather for the second race. 

To compute this joint probability distribution, we will use the `joint_prob` function from Question 1 again. Here we will have a total of `110`x`110` = `12100` combinations

In [51]:
weather_NGP_IGP = joint_prob(weather_NGP, weather_IGP , ' ')
weather_NGP_IGP

{'MV rainy MV rainy': 0.0014276256320930258,
 'MV rainy MV sunny': 0.0014276256320930258,
 'MV rainy MV cloudy': 0.0014276256320930258,
 'MV rainy MV snowy': 0.0014276256320930258,
 'MV rainy MV foggy': 0.0014276256320930258,
 'MV rainy CL rainy': 0.0009922207472318097,
 'MV rainy CL sunny': 0.0009922207472318097,
 'MV rainy CL cloudy': 0.0009922207472318097,
 'MV rainy CL snowy': 0.0009922207472318097,
 'MV rainy CL foggy': 0.0009922207472318097,
 'MV rainy SP rainy': 0.000983847576369094,
 'MV rainy SP sunny': 0.000983847576369094,
 'MV rainy SP cloudy': 0.000983847576369094,
 'MV rainy SP snowy': 0.000983847576369094,
 'MV rainy SP foggy': 0.000983847576369094,
 'MV rainy GR rainy': 0.0008498768425656429,
 'MV rainy GR sunny': 0.0008498768425656429,
 'MV rainy GR cloudy': 0.0008498768425656429,
 'MV rainy GR snowy': 0.0008498768425656429,
 'MV rainy GR foggy': 0.0008498768425656429,
 'MV rainy CS rainy': 0.000845690257134285,
 'MV rainy CS sunny': 0.000845690257134285,
 'MV rainy CS

In [52]:
len(weather_NGP_IGP)

12100

Now, let us define a predicate function `redbull_wins_one_of_two_on_rainy` that selects the events where a Red Bull driver, `MV` or `SP`, wins exactly one race under `rainy` conditions. 

For this we check whether a key of the joint probability distribution `weather_NGP_IGP` starts *or* ends with either `MV rainy` **or** `SP rainy`, and that the other race result is neither `MV rainy` nor `SP rainy`. This keeps the events where Red Bull may still win both races, but only one of those wins is on a rainy day, which is the condition specified

This gives the dictionary with all key-value pairs where **a Red Bull driver won on exactly one rainy day**

In [53]:
def redbull_wins_one_of_two_on_rainy(outcome):
    # True when exactly one of the two races is a Red Bull (MV or SP) win on a rainy day
    redbull_rainy = ("MV rainy", "SP rainy")
    first_race = outcome.startswith(redbull_rainy)
    second_race = outcome.endswith(redbull_rainy)
    return first_race != second_race

Using `such_that` to check the resultant dictionary output

In [54]:
such_that(redbull_wins_one_of_two_on_rainy, weather_NGP_IGP)

{'MV rainy MV sunny': 0.011946782821109675,
 'MV rainy MV cloudy': 0.011946782821109675,
 'MV rainy MV snowy': 0.011946782821109675,
 'MV rainy MV foggy': 0.011946782821109675,
 'MV rainy CL rainy': 0.008303189233439862,
 'MV rainy CL sunny': 0.008303189233439862,
 'MV rainy CL cloudy': 0.008303189233439862,
 'MV rainy CL snowy': 0.008303189233439862,
 'MV rainy CL foggy': 0.008303189233439862,
 'MV rainy SP sunny': 0.008233120125984674,
 'MV rainy SP cloudy': 0.008233120125984674,
 'MV rainy SP snowy': 0.008233120125984674,
 'MV rainy SP foggy': 0.008233120125984674,
 'MV rainy GR rainy': 0.007112014406701653,
 'MV rainy GR sunny': 0.007112014406701653,
 'MV rainy GR cloudy': 0.007112014406701653,
 'MV rainy GR snowy': 0.007112014406701653,
 'MV rainy GR foggy': 0.007112014406701653,
 'MV rainy CS rainy': 0.007076979852974058,
 'MV rainy CS sunny': 0.007076979852974058,
 'MV rainy CS cloudy': 0.007076979852974058,
 'MV rainy CS snowy': 0.007076979852974058,
 'MV rainy CS foggy': 0.007

Now we have to find the probability that Red Bull wins both races. We again define a predicate function `redbull_wins_both_race`, which checks that the key/event begins with `MV` or `SP` and that the 3rd element of the key string after `.split(" ")` (the winner of the second race) is `MV` or `SP`. This returns all the events where both races were won by `MV` or `SP`

In [55]:
def redbull_wins_both_race(outcome): 
    return (outcome.startswith("MV") or outcome.startswith("SP")) and (outcome.split(" ")[2]=="MV" or outcome.split(" ")[2]=="SP") 

Using the `such_that` function to view the resultant probability distribution. As you can see below, both races are won by Red Bull, but only one of the two wins is on a rainy day. Because `such_that` returns a new `ProbDist`, the values shown are renormalized over the filtered outcomes

In [56]:
such_that(redbull_wins_both_race, such_that(redbull_wins_one_of_two_on_rainy, weather_NGP_IGP))

{'MV rainy MV sunny': 0.0438100555796682,
 'MV rainy MV cloudy': 0.0438100555796682,
 'MV rainy MV snowy': 0.0438100555796682,
 'MV rainy MV foggy': 0.0438100555796682,
 'MV rainy SP sunny': 0.030191680531442902,
 'MV rainy SP cloudy': 0.030191680531442902,
 'MV rainy SP snowy': 0.030191680531442902,
 'MV rainy SP foggy': 0.030191680531442902,
 'MV sunny MV rainy': 0.0438100555796682,
 'MV sunny SP rainy': 0.030191680531442902,
 'MV cloudy MV rainy': 0.0438100555796682,
 'MV cloudy SP rainy': 0.030191680531442902,
 'MV snowy MV rainy': 0.0438100555796682,
 'MV snowy SP rainy': 0.030191680531442902,
 'MV foggy MV rainy': 0.0438100555796682,
 'MV foggy SP rainy': 0.030191680531442902,
 'SP rainy MV sunny': 0.030191680531442902,
 'SP rainy MV cloudy': 0.030191680531442902,
 'SP rainy MV snowy': 0.030191680531442902,
 'SP rainy MV foggy': 0.030191680531442902,
 'SP rainy SP sunny': 0.020806583357445986,
 'SP rainy SP cloudy': 0.020806583357445986,
 'SP rainy SP snowy': 0.020806583357445986

Now, calculating the probability for this scenario using the `p` function, we obtain the below probability

In [57]:
p(redbull_wins_both_race, such_that(redbull_wins_one_of_two_on_rainy,weather_NGP_IGP))

0.27269499349035414
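As a sanity check, the same conditional probability can be derived in closed form. This is only a sketch under the assumptions already used above: the two races are independent, and the weather is independent of the winner. Let `a` be the probability that Red Bull wins a given race on a rainy day and `r` the probability that Red Bull wins a given race. Then P(exactly one rainy Red Bull win) = 2·a·(1 − a), P(Red Bull wins both and exactly one win is rainy) = 2·a·(r − a), and the ratio simplifies to (r − a)/(1 − a). The two input values are copied from the `weather_NGP` output shown earlier.

```python
# P(driver wins a race AND it rains), copied from the weather_NGP output above
mv_rainy = 0.037783933518005554
sp_rainy = 0.026038781163434912

a = mv_rainy + sp_rainy  # P(Red Bull wins a given race on a rainy day)
r = 5 * a                # P(Red Bull wins a given race); five equally likely weathers

# P(exactly one rainy Red Bull win)              = 2 * a * (1 - a)
# P(Red Bull wins both, exactly one win is rainy) = 2 * a * (r - a)
closed_form = (r - a) / (1 - a)
print(closed_form)
```

The printed value agrees with the result of `p` above to within floating-point rounding (about 0.2727).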

## Question 4:
After the races, there is always a media blitz! Knowing that everyone has a favorite F1 driver, fashion brands pay F1 drivers to be their ambassadors. Let's assume that brands combine to organize one hundred fashion shows every season and that each retired or active F1 driver is under contract to attend exactly one fashion show every season in order to wear a fashion brand's clothes and to represent the brand (e.g. Gucci). For the sake of simplicity let's also assume that there are exactly 100 F1 retired and active drivers. If you go to a fashion show, what is the probability you will see your favorite driver (e.g. Lewis Hamilton)? What is the probability you will see a Formula 1 driver? How many fashion shows do you need to attend per season to have at least a 50% chance to share a cocktail with a Formula 1 driver?

## Solution:

Here we will calculate the solution and build functions for two cases: one that depends on the number of people attending the fashion events and one that does not

In Case 1, we know the total number of people attending each fashion show and the number of those people with whom the driver will share a cocktail.

In Case 2, the probability of sharing a cocktail is independent of the number of people attending and depends purely on whether the driver agrees or disagrees when you ask

### Case 1

In this case, let's consider two scenarios:<br>
Case 1a: Each brand has one brand ambassador (there are a total of 100 brands, each having one ambassador) and one fashion event<br>
Case 1b: Each brand may have more than one brand ambassador (the total number of brands is less than 100) and more than one fashion event

According to **Case 1a**<br>
* Since there is only one favourite driver and he can attend any one of the 100 fashion events, the probability of meeting your favourite Formula 1 driver (retired or active) = 1/100
* Since there are 100 fashion events and each of the 100 brands has one ambassador, every event has exactly one Formula 1 driver attending. Therefore 100 F1 drivers attend the 100 fashion events and the probability of seeing an F1 driver at a given fashion event is 100/100 = 1
* To share a cocktail with an F1 driver, you need to be among the lucky people the driver drinks with. Suppose a total of 30 people attend each event and the driver drinks with only one of them. Your chance of being picked = (number of people the driver drinks with)/(number of people attending) = 1/30. We also need the probability of meeting a Formula 1 driver in the first place, which here = 1. Therefore the joint probability that you meet a driver and share a drink with him at a given show = 1 * 1/30. To estimate the number of shows needed for at least a 50% or 0.5 chance, we divide 0.5 by this per-show probability: 0.5/(1 * 1/30) = **15 shows**. Note that this treats the per-show chances as additive, so it is a linear estimate rather than the exact solution of 1 - (1 - p)^n >= 0.5

According to **Case 1b**<br>
* Here as well, there is only one favourite driver and he can attend any one of the 100 fashion events, so the probability of meeting your favourite Formula 1 driver (retired or active) remains 0.01 or 1/100
* Since there are fewer than 100 fashion brands, each brand has more than one ambassador and more than one fashion event. For instance, say there are only 40 brands, each with multiple ambassadors and multiple shows, while the total counts of shows and ambassadors remain 100 each. It is then possible that all the ambassadors attend only 40 of the 100 shows. In that case only 40 events have F1 drivers in attendance and 60 events have none. Thus the probability becomes (no. of events attended by drivers)/(total number of shows) = 40/100
* Again, to share a cocktail with an F1 driver, you need to be among the lucky people the driver drinks with. Suppose a total of 25 people attend each event and each driver drinks with only 1 of them. Your chance of being picked = (number of people the driver drinks with)/(number of people attending) = 1/25. The probability of meeting a Formula 1 driver in the first place is 40/100 here. Therefore the joint probability that you meet a driver and share a drink with him at a given show = 40/100 * 1/25. Dividing 0.5 by this per-show probability (a linear estimate that treats the per-show chances as additive) gives 0.5/(40/100 * 1/25) = 31.25, which we round up to **32 shows**, since the number of shows must be a whole number

In [58]:
def question4(total_fashion_shows_attended_by_formula1_drivers , total_fashion_shows_held , count_when_driver_drink_with_fans , total_fans_in_season):
    
    # probability you will see your favorite driver (e.g. Lewis Hamilton) assuming you don't know which driver represents which brand
    prob_of_seeing_favorite_driver = 1 /total_fashion_shows_held
    print('Probability of seeing the favorite driver = ', prob_of_seeing_favorite_driver)
    
    #probability of seeing a formula 1 driver
    probability_of_seeing_a_formula1_driver = total_fashion_shows_attended_by_formula1_drivers / total_fashion_shows_held
    print('Probability of seeing a formula 1 driver = ', probability_of_seeing_a_formula1_driver )

    # Per-show probability of sharing a cocktail: a driver is present AND you are one of the people picked
    # (total_fans_in_season is used here as the number of people attending each show)
    cocktail_probability = probability_of_seeing_a_formula1_driver * (count_when_driver_drink_with_fans/total_fans_in_season)

    # How many fashion shows do you need to attend per season to have at least a 50% chance to share a cocktail with a Formula 1 driver?
    # Linear estimate: 0.5 divided by the per-show probability, rounded up to a whole number of shows
    print("Number of shows needed to attend for 50% of the chance sharing a cocktail with the player:" , math.ceil(0.5/cocktail_probability))

Let's check both our scenarios for Case 1 now

Case 1a:

In [59]:
question4(100,100,1,30)

Probability of seeing the favorite driver =  0.01
Probability of seeing a formula 1 driver =  1.0
Number of shows needed to attend for 50% of the chance sharing a cocktail with the player: 15


Case 1b:

In [60]:
question4(40, 100, 1, 25)

Probability of seeing the favorite driver =  0.01
Probability of seeing a formula 1 driver =  0.4
Number of shows needed to attend for 50% of the chance sharing a cocktail with the player: 32


As we can see, the function reproduces the hand calculations for both cases
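The Case 1 calculation divides 0.5 by the per-show probability, which treats the chances from successive shows as additive. For comparison, here is a minimal sketch of the exact calculation, assuming each show is an independent trial with the same success probability `p`: the smallest `n` with 1 − (1 − p)^n ≥ 0.5. The function `shows_needed_exact` is a hypothetical helper introduced only for this comparison.

```python
import math


def shows_needed_exact(per_show_probability, target=0.5):
    """Smallest n with 1 - (1 - p)**n >= target, assuming independent shows."""
    if not 0 < per_show_probability < 1:
        raise ValueError("per_show_probability must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - per_show_probability))


# Per-show probabilities from Case 1a and Case 1b above
cases = [("Case 1a", 1.0 * (1 / 30)), ("Case 1b", (40 / 100) * (1 / 25))]

for label, p_show in cases:
    linear = math.ceil(0.5 / p_show)       # estimate used in question4
    exact = shows_needed_exact(p_show)     # independent-trials calculation
    print(label, "linear estimate:", linear, "exact:", exact)
```

Under these assumptions the exact requirement is 21 shows for Case 1a and 43 shows for Case 1b, so the linear estimates of 15 and 32 are on the low side.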

### Case 2

**Case 2** also has two scenarios:<br>
**Case 2a:** Each brand has one brand ambassador (there are a total of 100 brands, each having one ambassador) and one fashion event<br>
**Case 2b:** Each brand may have more than one brand ambassador (the total number of brands is less than 100) and more than one fashion event

According to **Case 2a**<br>
* Since there is only one favourite driver and he can attend any one of the 100 fashion events, the probability of meeting your favourite Formula 1 driver (retired or active) = 1/100
* Since there are 100 fashion events and each of the 100 brands has one ambassador, every event has exactly one Formula 1 driver attending. Therefore 100 F1 drivers attend the 100 fashion events and the probability of seeing an F1 driver at a given fashion event is 100/100 = 1
* To share a cocktail with an F1 driver, you need to ask him. There are two possibilities: the driver either **agrees** or **disagrees**, with equal probability. Thus the probability of sharing a cocktail is 1/2. Since you meet an F1 driver on Day 1, your probability of sharing a cocktail is at least 0.5 or 50% from **Day 1**

According to **Case 2b**<br>
* Here as well, there is only one favourite driver and he can attend any one of the 100 fashion events, so the probability of meeting your favourite Formula 1 driver (retired or active) remains 0.01 or 1/100
* Since there are fewer than 100 fashion brands, some brands have more than one ambassador and more than one fashion event. For instance, say there are only 80 brands and 20 of these 80 brands have 2 ambassadors and 2 events each. It is then possible that both ambassadors of each such brand attend only one of its two events. In that case only 80 events have F1 drivers in attendance and 20 events have none. Thus the probability becomes (no. of events attended by drivers)/(total number of shows held) = 80/100
* To share a cocktail with an F1 driver, you again need to ask him, and the driver either **agrees** or **disagrees** with equal probability. But to ask, you need at least **one** F1 driver present at the event. From the previous point, only 80 events have drivers in attendance, and in the worst case the 20 events with no driver take place first. A driver would then first be present at the 21st event (100 - 80 + 1, or total events - total events with attendance + 1), where the probability of sharing a cocktail is 1/2 depending on the agreement or disagreement of the driver. Since in this worst-case scenario the first day you meet an F1 driver is Day 21, your probability of sharing a cocktail is at least 0.5 or 50% from **Day 21**

Putting all of these scenarios in a function using formulas specified above, we obtain the below function `question4_v2`

In [61]:
def question4_v2(shows_attended_by_drivers , total_shows_held):
    
    # probability you will see your favorite driver (e.g. Lewis Hamilton) assuming you don't know which driver represents which brand
    prob_of_seeing_favorite_driver = 1/total_shows_held       #since everyone has only *one* favourite driver
    print('Probability of seeing your favorite F1 driver = ', prob_of_seeing_favorite_driver)

    #probability of seeing a formula 1 driver
    probability_of_seeing_a_formula1_driver = shows_attended_by_drivers/total_shows_held
    print('Probability of seeing *a* Formula 1 driver = ', probability_of_seeing_a_formula1_driver )

    # How many fashion shows do you need to attend per season to have at least a 50% chance to share a cocktail with a Formula 1 driver?
    # Here we consider the least number of events you need to attend in order to ask the driver for a cocktail, where the driver would then agree or disagree with equal probability
    # similar to finding (shows with no drivers + 1) to find the first day in a worst case scenario from when the drivers start attending the shows
    shows_required_to_share_cocktail = total_shows_held - shows_attended_by_drivers + 1   
    print("Minimum number of shows needed to attend for 50% of the chance sharing a cocktail with the player:" , shows_required_to_share_cocktail)

Let's execute both our cases below

Case 2a:

In [62]:
question4_v2(100,100)

Probability of seeing your favorite F1 driver =  0.01
Probability of seeing *a* Formula 1 driver =  1.0
Minimum number of shows needed to attend for 50% of the chance sharing a cocktail with the player: 1


Case 2b:

In [63]:
question4_v2(80,100)

Probability of seeing your favorite F1 driver =  0.01
Probability of seeing *a* Formula 1 driver =  0.8
Minimum number of shows needed to attend for 50% of the chance sharing a cocktail with the player: 21


As seen above, the function reproduces the results worked out by hand for both cases