# Introduction to Data Science for Sports

<br />
<center>
    <img src="ipynb.images/f1races.png" width=800 />
</center>

Question 1.1 (20 points) There are a number of F1 races coming up: 
- Singapore GP: Date: Sun, Sep 22, 8:10 AM
- Russian GP: Date: Sun, Sep 29, 7:10 AM
- Japanese GP: Date: Sun, Oct 13, 1:10 AM
- Mexican GP: Date: Sun, Oct 13, 1:10 AM

The Singaporean Grand Prix is this weekend and the Russian Grand Prix is the weekend after, as you can see [here](https://www.formula1.com/en/racing/2019.html).

The 2019 driver standings are given [here](https://www.formula1.com/en/results.html/2019/drivers.html). Given these standings (please do not use the team standings given on the same Web site; use the driver standings), what is the probability distribution for each F1 driver to win the Singaporean Grand Prix? What is the probability distribution for each F1 driver to win *both* the Singaporean and Russian Grand Prix? What is the probability for Mercedes to win both races? What is the probability for Mercedes to win at least one race? Note that Mercedes, like every other racing team, has two drivers per race.

Question 1.2 (30 points) If Mercedes wins the first race, what is the probability that Mercedes wins the next one? If Mercedes wins at least one of these two races, what is the probability Mercedes wins both races? How about Ferrari, Red Bull, and Renault?

Question 1.3 (50 points) Mercedes wins one of these two races on a **rainy** day. What is the probability Mercedes wins both races, assuming races can be held on either rainy, sunny, cloudy, snowy or foggy days? Assume that rain, sun, clouds, snow, and fog are the *only possible weather conditions* on race tracks.

You need to provide *proof* for your answers. `I think it's one in a million because Mercedes sucks and I like Ferrari a lot more` is not a good answer. Leverage the counting framework in this workbook!

## Solutions

In [50]:
class ProbDist(dict):
    """A Probability Distribution; an {outcome: probability} mapping."""
    def __init__(self, mapping=(), **kwargs):
        self.update(mapping, **kwargs)
        # Make probabilities sum to 1.0; assert no negative probabilities
        total = sum(self.values())
        for outcome in self:
            self[outcome] = self[outcome] / total
            assert self[outcome] >= 0

In [51]:
from fractions import Fraction

def p(event, space): 
    """The probability of an event, given a sample space of outcomes. 
    event: a collection of outcomes, or a predicate that is true of outcomes in the event. 
    space: a set of outcomes or a probability distribution of {outcome: frequency} pairs."""
    
    # if event is a predicate, "unroll" it into the collection of outcomes it accepts
    if is_predicate(event):
        event = such_that(event, space)
        
    # if space is not an equiprobable collection (a simple set)
    # but a probability distribution (a dictionary),
    # add up the probabilities of all favorable outcomes
    if isinstance(space, ProbDist):
        return sum(space[o] for o in space if o in event)
    
    # simplest case, as in the previous lesson: equiprobable outcomes,
    # so the probability is favorable outcomes / all outcomes
    else:
        return Fraction(len(event & space), len(space))

is_predicate = callable

# such_that returns a plain set when the outcomes are equiprobable, or a ProbDist
# (renormalized over the kept outcomes) when they are not
def such_that(predicate, space): 
    """The outcomes in the sample space for which the predicate is true.
    If space is a set, return a subset {outcome,...} with outcomes where predicate(element) is true;
    if space is a ProbDist, return a ProbDist {outcome: frequency,...} with outcomes where predicate(element) is true."""
    if isinstance(space, ProbDist):
        return ProbDist({o:space[o] for o in space if predicate(o)})
    else:
        return {o for o in space if predicate(o)}
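Because `such_that` renormalizes when it is given a `ProbDist`, `p(event, such_that(condition, space))` is the conditional probability P(event | condition); every "given that" answer below relies on this. A minimal sketch, using simplified self-contained copies of `ProbDist`, `such_that` and `p` (predicates only) and a hypothetical loaded die, shows the effect:

```python
from fractions import Fraction

class ProbDist(dict):
    """An {outcome: probability} mapping, normalized to sum to 1."""
    def __init__(self, mapping=(), **kwargs):
        self.update(mapping, **kwargs)
        total = sum(self.values())
        for outcome in self:
            self[outcome] = self[outcome] / total

def such_that(predicate, space):
    """Keep the outcomes where predicate holds, then renormalize."""
    return ProbDist({o: space[o] for o in space if predicate(o)})

def p(event, space):
    """Probability of the outcomes in space that satisfy the predicate `event`."""
    return sum(space[o] for o in space if event(o))

# Hypothetical loaded die: a 6 is three times as likely as any other face
weights = {face: Fraction(1) for face in range(1, 6)}
weights[6] = Fraction(3)
die = ProbDist(weights)

def even(o): return o % 2 == 0
def high(o): return o >= 4

p_even = p(even, die)                               # (1 + 1 + 3) / 8
p_even_given_high = p(even, such_that(high, die))   # (1 + 3) / (1 + 1 + 3)
```

Restricting to `high` rescales the faces 4, 5 and 6 so they sum to 1, which is exactly the division by P(high) in the definition of conditional probability.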

The solutions to the questions above follow.
## Q1.1
## a) Probability Distribution for each F1 driver to win the Singaporean Grand Prix

In [52]:
SGP = ProbDist(LH = 284,VB=221,MV = 185,CL=182,SV=169,PG=65,
    CS=58,DR=34,AA=34,DK=33,NH=31,KR=31,SP=27,LN=25,
    LS=19,KM=18,RG=8,AG=3,RK=1,GR=0)
    
SGP

{'AA': 0.023809523809523808,
 'AG': 0.0021008403361344537,
 'CL': 0.12745098039215685,
 'CS': 0.04061624649859944,
 'DK': 0.023109243697478993,
 'DR': 0.023809523809523808,
 'GR': 0.0,
 'KM': 0.012605042016806723,
 'KR': 0.021708683473389355,
 'LH': 0.19887955182072828,
 'LN': 0.01750700280112045,
 'LS': 0.01330532212885154,
 'MV': 0.12955182072829133,
 'NH': 0.021708683473389355,
 'PG': 0.04551820728291316,
 'RG': 0.0056022408963585435,
 'RK': 0.0007002801120448179,
 'SP': 0.018907563025210083,
 'SV': 0.11834733893557423,
 'VB': 0.15476190476190477}
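Only one driver can win a race, so the driver outcomes are mutually exclusive and a team's chance of winning is the sum of its two drivers' probabilities. A self-contained sketch (the `TEAMS` mapping and `team_win_prob` helper are illustrative names, not part of the workbook) derives the Singapore team probabilities from the same points:

```python
from fractions import Fraction

# 2019 driver points used for the Singapore GP distribution above
SGP_POINTS = {'LH': 284, 'VB': 221, 'MV': 185, 'CL': 182, 'SV': 169, 'PG': 65,
              'CS': 58, 'DR': 34, 'AA': 34, 'DK': 33, 'NH': 31, 'KR': 31,
              'SP': 27, 'LN': 25, 'LS': 19, 'KM': 18, 'RG': 8, 'AG': 3,
              'RK': 1, 'GR': 0}

# Hypothetical mapping: team name -> codes of its two drivers
TEAMS = {'Mercedes': ('LH', 'VB'), 'Ferrari': ('CL', 'SV'),
         'Red Bull': ('MV', 'AA'), 'Renault': ('DR', 'NH')}

def team_win_prob(points, drivers):
    """P(team wins): sum of its drivers' shares of all points (exclusive outcomes)."""
    total = sum(points.values())
    return Fraction(sum(points[d] for d in drivers), total)

sgp_team = {team: team_win_prob(SGP_POINTS, drivers) for team, drivers in TEAMS.items()}
```

For Mercedes this is 505/1428, the sum of the `LH` and `VB` entries printed above.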

## b) Probability distribution for each F1 racer winning both Singapore and Russian GPs.

In [53]:
# For the Russian GP (RGP):
# uses the driver standings as updated after the Singapore GP

RGP = ProbDist(LH = 296,VB=231,MV = 200,CL=200,SV=194,PG=69,
    CS=58,DR=34,AA=42,DK=33,NH=33,KR=31,SP=27,LN=31,
    LS=19,KM=18,RG=8,AG=4,RK=1,GR=0)

In [54]:
# Joint probability space of all driver combinations for the Singapore and Russian GPs
def joint(A, B, sep=''):
    """The joint distribution of two independent probability distributions. 
    Result is all entries of the form {a+sep+b: P(a)*P(b)}"""
    return ProbDist({a + sep + b: A[a] * B[b]
                     for a in A
                     for b in B})

SRGP = joint(SGP, RGP, ' ')
# SRGP - Singapore and Russian Joint probability space 
SRGP

{'AA AA': 0.0006540222367560493,
 'AA AG': 6.22878320720047e-05,
 'AA CL': 0.003114391603600235,
 'AA CS': 0.0009031735650440681,
 'AA DK': 0.0005138746145940388,
 'AA DR': 0.00052944657261204,
 'AA GR': 0.0,
 'AA KM': 0.0002802952443240212,
 'AA KR': 0.0004827306985580365,
 'AA LH': 0.004609299573328348,
 'AA LN': 0.0004827306985580365,
 'AA LS': 0.0002958672023420224,
 'AA MV': 0.003114391603600235,
 'AA NH': 0.0005138746145940388,
 'AA PG': 0.0010744651032420813,
 'AA RG': 0.0001245756641440094,
 'AA RK': 1.5571958018001176e-05,
 'AA SP': 0.0004204428664860317,
 'AA SV': 0.003020959855492228,
 'AA VB': 0.0035971223021582714,
 'AG AA': 5.770784441965141e-05,
 'AG AG': 5.495985182823945e-06,
 'AG CL': 0.0002747992591411972,
 'AG CS': 7.969178515094718e-05,
 'AG DK': 4.534187775829754e-05,
 'AG DR': 4.671587405400353e-05,
 'AG GR': 0.0,
 'AG KM': 2.4731933322707752e-05,
 'AG KR': 4.259388516688557e-05,
 'AG LH': 0.0004067029035289719,
 'AG LN': 4.259388516688557e-05,
 'AG LS': 2.610592

In [None]:
# Here is the probability distribution of each racer winning both races  

In [55]:
# Predicate: the same driver wins both races (outcomes look like 'LH VB')
def WinBothRaces(outcome):
    first, second = outcome.split()
    return first == second
# such_that keeps only those outcomes of the Singapore/Russian joint space (SRGP) and renormalizes them,
# so the result is the distribution of the double winner *given that* one driver wins both races
such_that(WinBothRaces,SRGP)

{'AA AA': 0.005554842398851687,
 'AG AG': 4.667934788950998e-05,
 'CL CL': 0.1415940219315136,
 'CS CS': 0.013085777191692632,
 'DK DK': 0.004236150820973031,
 'DR DR': 0.004496777180022795,
 'GR GR': 0.0,
 'KM KM': 0.0012603423930167695,
 'KR KR': 0.003738237776818258,
 'LH LH': 0.32700439174864726,
 'LN LN': 0.0030147078845308532,
 'LS LS': 0.0014042703823427584,
 'MV MV': 0.1439279893259891,
 'NH NH': 0.003979414407580726,
 'PG PG': 0.017446406273704355,
 'RG RG': 0.0002489565220773866,
 'RK RK': 3.889945657459165e-06,
 'SP SP': 0.0028357703842877306,
 'SV SV': 0.1275357583254562,
 'VB VB': 0.19858561575894784}
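The values above sum to 1 because `such_that` renormalizes: each entry is the probability that a driver is the double winner *given that* the same driver wins both races. The unconditional probability that a given driver wins both is the product of their two single-race probabilities (about 0.0385 for LH). A self-contained sketch under the same independence assumption (`normalize` is an illustrative helper):

```python
# Driver points from the SGP and RGP cells above
SGP_POINTS = {'LH': 284, 'VB': 221, 'MV': 185, 'CL': 182, 'SV': 169, 'PG': 65,
              'CS': 58, 'DR': 34, 'AA': 34, 'DK': 33, 'NH': 31, 'KR': 31,
              'SP': 27, 'LN': 25, 'LS': 19, 'KM': 18, 'RG': 8, 'AG': 3,
              'RK': 1, 'GR': 0}
RGP_POINTS = {'LH': 296, 'VB': 231, 'MV': 200, 'CL': 200, 'SV': 194, 'PG': 69,
              'CS': 58, 'DR': 34, 'AA': 42, 'DK': 33, 'NH': 33, 'KR': 31,
              'SP': 27, 'LN': 31, 'LS': 19, 'KM': 18, 'RG': 8, 'AG': 4,
              'RK': 1, 'GR': 0}

def normalize(points):
    """Turn points into probabilities that sum to 1 (the same idea as ProbDist)."""
    total = sum(points.values())
    return {driver: pts / total for driver, pts in points.items()}

sgp, rgp = normalize(SGP_POINTS), normalize(RGP_POINTS)

# Unconditional P(driver d wins both races), assuming independent races
both = {d: sgp[d] * rgp[d] for d in sgp}

# P(some driver wins both races); dividing by it reproduces the renormalized table
p_same_winner = sum(both.values())
conditional = {d: prob / p_same_winner for d, prob in both.items()}
```

Which of the two readings the question intends is a judgment call; the product form answers "how likely is it that driver d wins both", while the renormalized form answers "who is the double winner, if there is one".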

## c) Probability for Mercedes to win both races, that is, each race is won by either Lewis Hamilton or Valtteri Bottas

In [56]:
def BothMercedes(outcome): 
    return (('LH' in outcome and 'VB' in outcome) or 
            (outcome.startswith('LH') and outcome.endswith('LH')) or 
            (outcome.startswith('VB') and outcome.endswith('VB')))
such_that(BothMercedes,SRGP)
# here is the probability 
p(BothMercedes,SRGP)

0.12188950138590421
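Because the two races are modeled as independent, the same number follows from the product rule: Mercedes' share of the Singapore points (505/1428) times its share of the points used for Russia (527/1529). A quick exact check, as a sketch:

```python
from fractions import Fraction

# Mercedes points (LH + VB) over all drivers' points, before each race
p_merc_sgp = Fraction(284 + 221, 1428)   # Singapore standings
p_merc_rgp = Fraction(296 + 231, 1529)   # standings used for Russia

# Independent races: P(Mercedes wins both) = P(wins first) * P(wins second)
p_both = p_merc_sgp * p_merc_rgp
```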

## d) Probability for Mercedes to win at least one race

In [57]:
def Mercedes(outcome): return 'LH' in outcome or 'VB' in outcome
such_that(Mercedes,SRGP)
#probability 
p(Mercedes,SRGP)

0.5764216739671669
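The complement rule gives the same value directly: Mercedes fails to win at least one race only if it loses both, so P(at least one) = 1 - (1 - 505/1428)(1 - 527/1529). A sketch with exact fractions:

```python
from fractions import Fraction

p_merc_sgp = Fraction(505, 1428)   # P(Mercedes wins Singapore)
p_merc_rgp = Fraction(527, 1529)   # P(Mercedes wins Russia)

# P(at least one win) = 1 - P(Mercedes loses both independent races)
p_at_least_one = 1 - (1 - p_merc_sgp) * (1 - p_merc_rgp)
```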

## Q1.2
## a) If Mercedes wins the first race, what is the probability that Mercedes wins the next one? 


In [58]:
def firstMer(outcome): return outcome.startswith('LH') or outcome.startswith('VB')
# firstMer -> Mercedes wins the first race
such_that(firstMer,SRGP) 

def secondMer(outcome): return outcome.endswith('LH') or outcome.endswith('VB')
# secondMer -> Mercedes wins the second race
such_that(secondMer,SRGP)


# Probability of Mercedes winning the second race given that Mercedes won the first race
p(secondMer,such_that(firstMer,SRGP))


0.3446697187704381
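Because `joint` multiplies the two single-race distributions, the races are independent by construction, so knowing that Mercedes won the first race does not change the second: the conditional probability above equals Mercedes' share of the Russian-GP points, 527/1529. A short check via the definition P(B | A) = P(A and B) / P(A), as a sketch:

```python
from fractions import Fraction

p_first = Fraction(505, 1428)    # P(Mercedes wins Singapore)
p_second = Fraction(527, 1529)   # P(Mercedes wins Russia)
p_both = p_first * p_second      # independence, as built into joint()

# Definition of conditional probability
p_second_given_first = p_both / p_first
```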

## b) If Mercedes wins at least one of these two races, what is the probability Mercedes wins both races? 


In [59]:
# such_that(Mercedes,SRGP) is the part of the sample space where Mercedes wins at least one
# of the 2 races (obtained earlier); within it, we compute the probability of the event
# BothMercedes (Mercedes wins both races), also defined above

p(BothMercedes,such_that(Mercedes,SRGP))


0.21145891435173053

## c) How about Ferrari, Red Bull, and Renault? 

The same probability functions apply to the other three teams; each team wins a race when either of its two drivers does:

1. Ferrari: Charles Leclerc (CL) or Sebastian Vettel (SV)
2. Red Bull: Max Verstappen (MV) or Alexander Albon (AA)
3. Renault: Daniel Ricciardo (DR) or Nico Hulkenberg (NH)
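The team-specific cells that follow repeat one pattern per team. A compact alternative, sketched here with the hypothetical names `TEAMS`, `team_share` and `team_summary`, computes both Q1.2 conditional probabilities for every team in closed form under the same independence assumption. Note that "the team wins both races" has to include mixed outcomes such as `CL SV` and `SV CL`: either driver may win each race.

```python
from fractions import Fraction

# Driver points used above for the Singapore (SGP) and Russian (RGP) distributions
SGP_POINTS = {'LH': 284, 'VB': 221, 'MV': 185, 'CL': 182, 'SV': 169, 'PG': 65,
              'CS': 58, 'DR': 34, 'AA': 34, 'DK': 33, 'NH': 31, 'KR': 31,
              'SP': 27, 'LN': 25, 'LS': 19, 'KM': 18, 'RG': 8, 'AG': 3,
              'RK': 1, 'GR': 0}
RGP_POINTS = {'LH': 296, 'VB': 231, 'MV': 200, 'CL': 200, 'SV': 194, 'PG': 69,
              'CS': 58, 'DR': 34, 'AA': 42, 'DK': 33, 'NH': 33, 'KR': 31,
              'SP': 27, 'LN': 31, 'LS': 19, 'KM': 18, 'RG': 8, 'AG': 4,
              'RK': 1, 'GR': 0}

# Hypothetical mapping: team name -> codes of its two drivers
TEAMS = {'Mercedes': ('LH', 'VB'), 'Ferrari': ('CL', 'SV'),
         'Red Bull': ('MV', 'AA'), 'Renault': ('DR', 'NH')}

def team_share(points, drivers):
    """P(team wins a race): its drivers' points divided by all drivers' points."""
    return Fraction(sum(points[d] for d in drivers), sum(points.values()))

def team_summary(drivers):
    """Both conditional probabilities from Q1.2, assuming independent races."""
    a = team_share(SGP_POINTS, drivers)   # P(team wins Singapore)
    b = team_share(RGP_POINTS, drivers)   # P(team wins Russia)
    both = a * b                          # either driver wins each race
    at_least_one = 1 - (1 - a) * (1 - b)  # complement of losing both races
    return {'second_given_first': both / a,
            'both_given_at_least_one': both / at_least_one}

summary = {team: team_summary(drivers) for team, drivers in TEAMS.items()}
for team, s in summary.items():
    print(f"{team:<9} P(2nd | 1st) = {float(s['second_given_first']):.4f}  "
          f"P(both | >= 1) = {float(s['both_given_at_least_one']):.4f}")
```

With this formulation the "both given at least one" values for Ferrari, Red Bull and Renault come out to roughly 0.1439, 0.0845 and 0.0228.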

## c.1) Ferrari

## Probability of Ferrari winning the second race given that the first race is won by Ferrari

In [60]:
# Ferrari
def firstFerr(outcome): return outcome.startswith('CL') or outcome.startswith('SV')
# firstFerr -> Ferrari wins the first race
such_that(firstFerr,SRGP) 

def secondFerr(outcome): return outcome.endswith('CL') or outcome.endswith('SV')
# secondFerr -> Ferrari wins the second race
such_that(secondFerr,SRGP)


# Probability of Ferrari winning the second race given that Ferrari won the first race
p(secondFerr,such_that(firstFerr,SRGP))



0.25768476128188367

## Probability of Ferrari winning both races given that it has won at least one of the 2 races

In [61]:

# Predicate: Ferrari wins both races, i.e. either Ferrari driver wins each race
# (this includes the mixed outcomes 'CL SV' and 'SV CL')
def BothFerrari(outcome): return firstFerr(outcome) and secondFerr(outcome)
such_that(BothFerrari,SRGP)

# Predicate: Ferrari wins at least one race
def Ferrari(outcome): return 'CL' in outcome or 'SV' in outcome
such_that(Ferrari,SRGP)

# Probability of Ferrari winning both races given that it has won at least one of the 2 races (rounded)
round(p(BothFerrari,such_that(Ferrari,SRGP)), 4)

0.1439

## c.2) Red Bull

## Probability of Red Bull winning the second race given that the first race is won by Red Bull

In [62]:

# Red Bull
def firstRB(outcome): return outcome.startswith('MV') or outcome.startswith('AA')
# firstRB -> Red Bull wins the first race
such_that(firstRB,SRGP) 

def secondRB(outcome): return outcome.endswith('MV') or outcome.endswith('AA')
# secondRB -> Red Bull wins the second race
such_that(secondRB,SRGP)


# Probability of Red Bull winning the second race given that Red Bull won the first race
p(secondRB,such_that(firstRB,SRGP))



0.15827338129496407

## Probability of Red Bull winning both races given that it has won at least one of the 2 races

In [63]:

# Predicate: Red Bull wins both races, i.e. either Red Bull driver wins each race
# (this includes the mixed outcomes 'MV AA' and 'AA MV')
def BothRB(outcome): return firstRB(outcome) and secondRB(outcome)
such_that(BothRB,SRGP)

# Predicate: Red Bull wins at least one race
def RedBull(outcome): return 'MV' in outcome or 'AA' in outcome
such_that(RedBull,SRGP)

# Probability of Red Bull winning both races given that it has won at least one of the 2 races (rounded)
round(p(BothRB,such_that(RedBull,SRGP)), 4)

0.0845

## c.3) Renault

## Probability of Renault winning the second race given that the first race is won by Renault

In [64]:
# Renault
def firstRenault(outcome): return outcome.startswith('DR') or outcome.startswith('NH')
# firstRenault -> Renault wins the first race
such_that(firstRenault,SRGP) 

def secondRenault(outcome): return outcome.endswith('DR') or outcome.endswith('NH')
# secondRenault -> Renault wins the second race
such_that(secondRenault,SRGP)


# Probability of Renault winning the second race given that Renault won the first race
p(secondRenault,such_that(firstRenault,SRGP))



0.043819489862655346

## Probability of Renault winning both races given that it has won at least one of the 2 races

In [65]:

# Predicate: Renault wins both races, i.e. either Renault driver wins each race
# (this includes the mixed outcomes 'DR NH' and 'NH DR')
def BothRenault(outcome): return firstRenault(outcome) and secondRenault(outcome)
such_that(BothRenault,SRGP)

# Predicate: Renault wins at least one race
def Renault(outcome): return 'DR' in outcome or 'NH' in outcome
such_that(Renault,SRGP)

# Probability of Renault winning both races given that it has won at least one of the 2 races (rounded)
round(p(BothRenault,such_that(Renault,SRGP)), 4)

0.0228

## Q1.3

Mercedes wins one of these two races on a rainy day. What is the probability Mercedes wins both races, assuming races can be held on either rainy, sunny, cloudy, snowy or foggy days? Assume that rain, sun, clouds, snow, and fog are the only possible weather conditions on race tracks.


In [66]:
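# There are 5 weather conditions: rain, sun, clouds, snow and fog.
# Assumption (the question gives no weather probabilities): each condition is equally likely
# and independent of the race result, so weather is added as an extra dimension of the joint space.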


In [67]:
weather=ProbDist(rain=1,sun=1,clouds=1,snow=1,fog=1)
weather

{'clouds': 0.2, 'fog': 0.2, 'rain': 0.2, 'snow': 0.2, 'sun': 0.2}

In [68]:
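# Joint distribution of both races' winners and both races' weather (all independent)
def joint4(A, B, C, D, sep=''):
    """Joint distribution of four independent distributions.
    Keys have the form a+sep+c+sep+b+sep+d, i.e. 'driver1 weather1 driver2 weather2'."""
    return ProbDist({a + sep + c + sep + b + sep + d: A[a] * B[b] * C[c] * D[d]
                     for a in A
                     for b in B
                     for c in C
                     for d in D})

# SRWC: joint probability space of the Singapore and Russian races including the weather conditions
SRWC = joint4(SGP, RGP, weather, weather, ' ')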
SRWC

{'NH sun SP clouds': 1.533379866007864e-05,
 'KM fog AG snow': 1.3190364438777328e-06,
 'PG clouds KM rain': 2.143434221301315e-05,
 'SP fog KM sun': 8.903495996174692e-06,
 'AA snow SP sun': 1.6817714659441087e-05,
 'KM fog VB sun': 7.617435463393906e-05,
 'GR clouds MV snow': 0.0,
 'RK clouds MV rain': 3.6639901218825906e-06,
 'AG clouds LH rain': 1.6268116141158703e-05,
 'RK fog KM clouds': 3.297591109694331e-07,
 'LS clouds GR fog': 0.0,
 'RK snow GR clouds': 0.0,
 'DK sun LN clouds': 1.874130947342945e-05,
 'LH sun KM sun': 9.3651587515319e-05,
 'LN fog AG snow': 1.8319950609412953e-06,
 'RK clouds MV sun': 3.6639901218825906e-06,
 'RK snow DR clouds': 6.228783207200403e-07,
 'LN sun GR rain': 0.0,
 'CS snow LS snow': 2.0188585571573077e-05,
 'SP snow GR fog': 0.0,
 'KM sun LH clouds': 9.760869684695219e-05,
 'SV clouds LH rain': 0.0009164372092852736,
 'RG rain LN snow': 4.5433477511344126e-06,
 'RK fog VB clouds': 4.2319085907743915e-06,
 'VB rain AG rain': 1.619483633872105e-05
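Adding weather does not change the driver probabilities: weather enters the joint as an independent factor whose probabilities sum to 1, so summing both weather fields out of `SRWC` recovers `SRGP`. A small self-contained sketch with hypothetical two-driver races (the 0.6/0.4 and 0.7/0.3 splits are made up) illustrates the point:

```python
from itertools import product
from math import isclose

# Hypothetical two-driver races, to keep the example small
race1 = {'LH': 0.6, 'VB': 0.4}
race2 = {'LH': 0.7, 'VB': 0.3}
weather = {w: 0.2 for w in ('rain', 'sun', 'clouds', 'snow', 'fog')}

# Same key layout as SRWC: "driver1 weather1 driver2 weather2"
joint4 = {f'{a} {w1} {b} {w2}': race1[a] * weather[w1] * race2[b] * weather[w2]
          for a, w1, b, w2 in product(race1, weather, race2, weather)}

# Sum out both weather fields to get back P(driver1, driver2)
marginal = {}
for key, prob in joint4.items():
    a, _, b, _ = key.split()
    marginal[f'{a} {b}'] = marginal.get(f'{a} {b}', 0.0) + prob
```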

In [69]:
# Mercedes wins at least one of the races on a rainy day (a Mercedes driver paired with 'rain')
def MercedesRain(outcome): return ('LH rain' in outcome or 'VB rain' in outcome)
such_that(MercedesRain,SRWC)

{'AA clouds LH rain': 0.0013678800984590967,
 'AA clouds VB rain': 0.0010675010227839572,
 'AA fog LH rain': 0.0013678800984590967,
 'AA fog VB rain': 0.0010675010227839572,
 'AA rain LH rain': 0.0013678800984590967,
 'AA rain VB rain': 0.0010675010227839572,
 'AA snow LH rain': 0.0013678800984590967,
 'AA snow VB rain': 0.0010675010227839572,
 'AA sun LH rain': 0.0013678800984590967,
 'AA sun VB rain': 0.0010675010227839572,
 'AG clouds LH rain': 0.00012069530280521444,
 'AG clouds VB rain': 9.41912667162315e-05,
 'AG fog LH rain': 0.00012069530280521444,
 'AG fog VB rain': 9.41912667162315e-05,
 'AG rain LH rain': 0.00012069530280521444,
 'AG rain VB rain': 9.41912667162315e-05,
 'AG snow LH rain': 0.00012069530280521444,
 'AG snow VB rain': 9.41912667162315e-05,
 'AG sun LH rain': 0.00012069530280521444,
 'AG sun VB rain': 9.41912667162315e-05,
 'CL clouds LH rain': 0.00732218170351634,
 'CL clouds VB rain': 0.0057142701807847114,
 'CL fog LH rain': 0.00732218170351634,
 'CL fog VB 

In [70]:
# predicate defining that mercedes wins both races taking weather conditions into consideration
def BothMercedesWeather(outcome): 
    return (('LH' in outcome and 'VB' in outcome) or (outcome.count('LH')==2) or (outcome.count('VB')==2))
such_that(BothMercedesWeather,SRWC)


{'LH clouds LH clouds': 0.012634790613786237,
 'LH clouds LH fog': 0.012634790613786237,
 'LH clouds LH rain': 0.012634790613786237,
 'LH clouds LH snow': 0.012634790613786237,
 'LH clouds LH sun': 0.012634790613786237,
 'LH clouds VB clouds': 0.009860258891164257,
 'LH clouds VB fog': 0.009860258891164257,
 'LH clouds VB rain': 0.009860258891164257,
 'LH clouds VB snow': 0.009860258891164257,
 'LH clouds VB sun': 0.009860258891164257,
 'LH fog LH clouds': 0.012634790613786237,
 'LH fog LH fog': 0.012634790613786237,
 'LH fog LH rain': 0.012634790613786237,
 'LH fog LH snow': 0.012634790613786237,
 'LH fog LH sun': 0.012634790613786237,
 'LH fog VB clouds': 0.009860258891164257,
 'LH fog VB fog': 0.009860258891164257,
 'LH fog VB rain': 0.009860258891164257,
 'LH fog VB snow': 0.009860258891164257,
 'LH fog VB sun': 0.009860258891164257,
 'LH rain LH clouds': 0.012634790613786237,
 'LH rain LH fog': 0.012634790613786237,
 'LH rain LH rain': 0.012634790613786237,
 'LH rain LH snow': 0.0

## Probability of Mercedes winning both races given that Mercedes has won one of the races on a rainy day

In [71]:
# Probability of Mercedes winning both races given that Mercedes has won one of the races on a rainy day
p(BothMercedesWeather,such_that(MercedesRain,SRWC))

0.32555315282499075
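The same number follows in closed form. Let a and b be Mercedes' win probabilities in the two races and r = 1/5 the chance of rain (the uniform-weather assumption made above). The conditioning event "Mercedes wins at least one race in the rain" has probability 1 - (1 - ra)(1 - rb). Intersecting it with "Mercedes wins both" requires both wins plus at least one rainy race, with probability ab(1 - (1 - r)^2). A sketch with exact fractions:

```python
from fractions import Fraction

a = Fraction(505, 1428)   # P(Mercedes wins Singapore)
b = Fraction(527, 1529)   # P(Mercedes wins Russia)
r = Fraction(1, 5)        # P(rain), assuming the five conditions are equally likely

# Mercedes wins a given race in the rain with probability r*a (or r*b); at least one such race:
p_condition = 1 - (1 - r * a) * (1 - r * b)
# Mercedes wins both races, and at least one of the two races is rainy:
p_both_and_condition = a * b * (1 - (1 - r) ** 2)

p_answer = p_both_and_condition / p_condition
```

This also shows why the answer (about 0.3256) is higher than the 0.2115 from Q1.2: knowing that a win happened in the rain is more specific information, and it favors outcomes with two Mercedes wins, since those have two chances to include a rainy win.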