# Conditional Probability

We will look at independent and dependent events, 
the concept of conditional probability, permutations, and combinations. 

Independent and dependent events follow from their names. 
Independent events are those whose occurrence does not affect the probability of the other events 
we are calculating probabilities for. 
Dependent events are those for which the likelihood of one event depends on, or is influenced by, the occurrence of the other. 
We express this mathematically below. 

Conditional probability is the probability of seeing some event knowing that some other event has actually occurred. For example, weather forecasting is based on conditional probabilities. 
When the forecast says that there is a 30% chance of rain, 
that probability is based on all the information that the meteorologists know up until that point.

http://ftp.cs.pu.edu.tw/network/CRAN/web/packages/prob/vignettes/prob.pd

### Independent and Dependent Events


**Independent:** Events A and B are said to be independent if

            P(A ∩ B) = P(A).P(B)

Otherwise, the events are said to be **dependent**.
    
                                                   P(A ∩ B)     P(A).P(B)
          when P(B) > 0  we can write,    P(A|B) = --------  =  --------- = P(A)
                                                     P(B)          P(B)
                                                     

When A and B are independent, the numerator of the fraction factors so that P(B) cancels, with the result:
 
            P(A|B) = P(A) when A, B are independent
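
As a quick sanity check of the product rule, the following base-R sketch enumerates two fair coin tosses and compares P(A ∩ B) with P(A).P(B). The events `A` (first toss is H) and `B` (second toss is H), and all variable names, are chosen here for illustration only.

```r
# Enumerate the sample space of two fair coin tosses (4 equally likely outcomes)
space <- expand.grid(toss1 = c("H", "T"), toss2 = c("H", "T"),
                     stringsAsFactors = FALSE)
space$probs <- 1 / nrow(space)

# Illustrative events: A = first toss is H, B = second toss is H
in_A <- space$toss1 == "H"
in_B <- space$toss2 == "H"

p_A  <- sum(space$probs[in_A])
p_B  <- sum(space$probs[in_B])
p_AB <- sum(space$probs[in_A & in_B])

# Independence: P(A and B) equals P(A) * P(B), so P(A|B) equals P(A)
isTRUE(all.equal(p_AB, p_A * p_B))
isTRUE(all.equal(p_AB / p_B, p_A))
```

Both comparisons should evaluate to `TRUE` for these two events.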



Let's consider the example of tossing ten coins to illustrate the nature of independent events. 
What is the probability of observing at least one Head? 

Imagine that we are tossing the coins in such a way that they do not interfere with each other; i.e., the tosses are independent events. 

The only way there will not be at least one Head is if all tosses are Tails. 
Therefore,
         
         P(at least one H) = 1 − P(all T)
                           = 1 − (1/2)^10
                           ≈ 0.999

In [13]:
library(prob)

In [14]:
Space <- tosscoin(10, makespace = TRUE)

In [18]:
# The isrep function in the prob package tests each row of Space to see whether the value "T" appears 10 times 
# and returns TRUE or FALSE for each row it checks. subset() then keeps only the rows 
# for which that logical vector is TRUE.

R <- isrep(Space, vals = "T", nrep = 10)
R

In [20]:
length(R)

In [16]:
A <- subset(Space, R)
A

Unnamed: 0,toss1,toss2,toss3,toss4,toss5,toss6,toss7,toss8,toss9,toss10,probs
1024,T,T,T,T,T,T,T,T,T,T,0.0009765625


In [17]:
1 - Prob(A)
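
The enumeration above can be cross-checked without the `prob` package. This is a minimal base-R sketch of the complement rule; the variable names are illustrative, and `dbinom` is used only as a second route to the same number.

```r
n_tosses <- 10

# Complement rule: P(at least one H) = 1 - P(all T) = 1 - (1/2)^10
p_all_tails <- 0.5^n_tosses
p_at_least_one_head <- 1 - p_all_tails

# Same quantity via the binomial distribution: 1 - P(zero heads in 10 tosses)
p_binom <- 1 - dbinom(0, size = n_tosses, prob = 0.5)

p_all_tails          # 0.0009765625, the probs value of the all-Tails row
p_at_least_one_head  # 0.9990234
```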

#### Repeated Experiments with Independent Events

Experiments are repeated when we want to determine the probabilities of events more reliably. 
Often, a single experiment does not yield sufficient data. 
Therefore, it is common to repeat a certain experiment multiple times under identical 
conditions and in an independent manner. 
Experiments like tossing a coin repeatedly, rolling a die or dice, etc. are repeated experiments.

The `iidspace` function in the `prob` library in R (note `library(prob)` in the code above) implements repeated experiments. 
It takes three arguments: 
`x`, which is a vector of outcomes, 
`ntrials`, which is an integer telling how many times to repeat the experiment, and 
`probs` to specify the probabilities of the outcomes of x in a single trial.

In [22]:
coin <- c("H", "T")
coin

In [24]:
probs <- c(0.5, 0.5)
probs

In [26]:
iidspace(coin, ntrials = 3, probs = probs)

X1,X2,X3,probs
H,H,H,0.125
T,H,H,0.125
H,T,H,0.125
T,T,H,0.125
H,H,T,0.125
T,H,T,0.125
H,T,T,0.125
T,T,T,0.125
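
To see what a repeated-experiment space contains, the sketch below builds the same kind of table by hand in base R: every sequence of outcomes, with a probability equal to the product of the per-trial probabilities. It illustrates the idea and is not the `prob` package's implementation; the biased coin (0.7 / 0.3) is an assumed example.

```r
coin    <- c("H", "T")
probs   <- c(H = 0.7, T = 0.3)   # assumed biased coin, for illustration
ntrials <- 3

# All sequences of length ntrials
space <- expand.grid(rep(list(coin), ntrials), stringsAsFactors = FALSE)
names(space) <- paste0("X", seq_len(ntrials))

# Independence across trials: multiply the single-trial probabilities
space$probs <- apply(space[, seq_len(ntrials)], 1,
                     function(row) prod(probs[row]))

space
sum(space$probs)   # the probabilities of a valid space sum to 1
```

With a fair coin (`probs <- c(H = 0.5, T = 0.5)`), every row gets probability 0.125, as in the table above.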


### Dependent Events / Conditional Probability

Conditional probabilities are most informative for dependent events. 
Consider why: 
If events are independent, then knowing that one occurred tells us nothing about the other, so P(A|B) = P(A). 
If events are *dependent*, then the probability of one changes once we know whether the other occurred. 

Consider drawing cards from a full deck of 52 standard playing cards as an example of *dependent events* and *conditional probability*. 
Select two cards from the deck, in succession. 

    Let A = {first card drawn is an Ace} and B = {second card drawn is an Ace}. 

Since there are four Aces in the deck, it is natural to assign P(A) = 4/52. 
Let's unpack how this probability changes after the first card is drawn,
because after the first card is drawn there are only 51 cards remaining. 

Suppose we looked at the first card. 
What is the probability of B now? 
The answer depends on the value of the first card. 
If the first card is an Ace, then the probability that the second also is an Ace should be 3/51, 
but if the first card is not an Ace, then the probability that the second is an Ace should be 4/51. 

Mathematically, for these two situations we write
    
    P(B|A) = 3/51, P(B|not A) = 4/51
    
The probability that the second card is an Ace, given that the first card was an Ace, is 3/51. 
The probability of A itself is P(A) = 4/52, because no cards have been drawn yet. 
    
Definition: The conditional probability of B given A, denoted P(B|A), is defined by
    
    
              P(A ∩ B)
    P(B|A) =  --------
                P(A)
                
P(A ∩ B) is the probability that A and B both occur. 
Here, that means both the first and the second card are Aces. 
P(A) is the probability that the first card is an Ace. 
Rearranging the definition gives
    
    P(A ∩ B) = P(B|A) * P(A) = (3/51) * (4/52) ≈ 0.0045

In [27]:
(3/51) * (4/52)
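
The same numbers can be recovered by brute force. The following base-R sketch enumerates every ordered draw of two distinct cards and counts the favourable ones; labelling cards 1 to 4 as the Aces is an arbitrary choice made for illustration.

```r
# Label the deck: cards 1-4 are the Aces (arbitrary labelling for illustration)
deck   <- 1:52
is_ace <- deck <= 4

# All ordered draws of two distinct cards (52 * 51 equally likely pairs)
draws <- expand.grid(first = deck, second = deck)
draws <- draws[draws$first != draws$second, ]

A <- is_ace[draws$first]    # first card is an Ace
B <- is_ace[draws$second]   # second card is an Ace

p_A         <- mean(A)            # 4/52
p_A_and_B   <- mean(A & B)        # (4 * 3) / (52 * 51)
p_B_given_A <- p_A_and_B / p_A    # 3/51

c(p_A = p_A, p_A_and_B = p_A_and_B, p_B_given_A = p_B_given_A)
```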

**Example:** Let's work out an example. 
Toss a six-sided die _twice_. 
The sample space consists of all ordered pairs (i, j) of the numbers 1, 2, . . . , 6, that is, S = {(1, 1), (1, 2), . . . ,(6, 6)}. 

Here, "i" is the outcome of the first toss and "j" is the outcome of the second. 

Let A = {outcomes match} and B = {sum of outcomes at least 8}.

The first thing to do is to set up the probability space with the `rolldie` function. 
`S` is the probability space, and 2 is the number of times the die is rolled. 

In [29]:
S <- rolldie(2, makespace = TRUE)
# S contains all 36 possible outcomes
# each outcome has an identical probability of 1/36 (about 0.0278)
head(S, 10)

X1,X2,probs
1,1,0.02777778
2,1,0.02777778
3,1,0.02777778
4,1,0.02777778
5,1,0.02777778
6,1,0.02777778
1,2,0.02777778
2,2,0.02777778
3,2,0.02777778
4,2,0.02777778


In [30]:
# Subsetting sample space S for outcomes matching event A (outcomes match). 
# This results in a set where both rolls show the same value (i and j are equal)
A <- subset(S, X1 == X2)
A

Unnamed: 0,X1,X2,probs
1,1,1,0.02777778
8,2,2,0.02777778
15,3,3,0.02777778
22,4,4,0.02777778
29,5,5,0.02777778
36,6,6,0.02777778


In [31]:
# Subsetting sample space S
# for outcomes matching event B (sum of outcomes at least 8)
# the die total must be 8 or more
B <- subset(S, X1 + X2 >= 8)
B

Unnamed: 0,X1,X2,probs
12,6,2,0.02777778
17,5,3,0.02777778
18,6,3,0.02777778
22,4,4,0.02777778
23,5,4,0.02777778
24,6,4,0.02777778
27,3,5,0.02777778
28,4,5,0.02777778
29,5,5,0.02777778
30,6,5,0.02777778


In [35]:
# When calculating conditional probability, use the "given" argument of the Prob function as shown below.
# A is the event that both outcomes match: (1, 1), (2, 2), (3, 3), ...
# B is the event that the sum of the outcomes is >= 8: (2, 6), (3, 6), (5, 6), ...

paste("P(A|B):", Prob(A, given = B))

paste("P(B|A):", Prob(B, given = A))

In [36]:
# Instead of defining events A and B first, you can compute the conditional probability directly.

paste("P(A|B):", Prob(S, X1 == X2, given = (X1 + X2 >= 8)))

paste("P(B|A):", Prob(S, X1 + X2 >= 8, given = (X1 == X2)))
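
Because all 36 outcomes are equally likely, these conditional probabilities reduce to ratios of counts. The base-R sketch below is a hedged cross-check that does not use the `prob` package; the helper names `in_A` and `in_B` are illustrative.

```r
# All 36 ordered outcomes of two rolls, each with probability 1/36
S <- expand.grid(X1 = 1:6, X2 = 1:6)

in_A <- S$X1 == S$X2        # outcomes match
in_B <- S$X1 + S$X2 >= 8    # sum of outcomes at least 8

# With equally likely outcomes, conditional probability is a ratio of counts
p_A_given_B <- sum(in_A & in_B) / sum(in_B)   # 3 / 15
p_B_given_A <- sum(in_A & in_B) / sum(in_A)   # 3 / 6

c("P(A|B)" = p_A_given_B, "P(B|A)" = p_B_given_A)
```

The two values differ (0.2 versus 0.5), which is a reminder that P(A|B) and P(B|A) are not interchangeable.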

The examples above are simple applications of conditional probability to dice. 
The `prob` package can be extended to multivariate datasets where events can be defined 
as columns and supplied as arguments, like in the previous examples.

### Permutations and Combinations

The main difference between combinations and permutations is that a combination does not take into account the order, whereas a permutation does.

Consider a simple example from [Math is Fun](http://www.mathsisfun.com/combinatorics/combinations-permutations.html). 
When we say "My fruit salad is a **combination** of apples, grapes and bananas", we are not bothered about what order the fruits are in. No matter in which order you mention the fruits, it's the same fruit salad.

But when we say "You need the combination 123 to open the safe", 
we care about the order of numbers. 
No other combination will work to open the safe. 
It has to be exactly 1-2-3. 
This is a **permutation**.

  * When the order doesn't matter, it is a Combination.
	
  * When the order does matter, it is a Permutation.
    

There are many ways you can create permutations and combinations in R. 
We will be using the `combinat` package for this. 

**combn():** `combn()` is used to generate combinations. Its usage is illustrated below. 

`Usage`

    combn(x, m, fun=NULL, simplify=TRUE, ...)


`Arguments`

    x         vector source for combinations, i.e., the vector of elements used to generate the combinations 
    m         number of elements in each combination. If you specify 2 as input, combinations of size two are generated.
    fun       function to be applied to each combination (may be null). It can be any function like sum(), mean() etc.
    simplify  logical, if FALSE, returns a list, otherwise returns vector or array. 
    ...       args to fun

It generates all combinations of the elements of x taken m at a time. 
In the code snippet below, we pass 4 as x and 2 as m. Because x is a single positive integer, it is treated as the sequence 1, 2, 3, 4. 
So, the function returns the combinations of size 2 of the numbers {1,2,3,4}, like {{1,2},{1,3}....}. 

If the argument `fun` is not NULL, the function it names is applied to each combination. 
We will supply sum() as the function. 
If `simplify` is FALSE, it returns a list; otherwise, it returns an array, typically a matrix. 
"..." are passed unchanged to `fun`, if specified.

In [37]:
library(combinat)

In [38]:
# Generate different possible combinations of size 2 using numbers {1, 2, 3, 4}
combn(4, 2)

0,1,2,3,4,5
1,1,1,2,2,3
2,3,4,3,4,4


In [39]:
# sum of elements of each combination
combn(4, 2, sum)
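
The number of combinations can be checked against the binomial coefficient, n! / (m! (n - m)!). The sketch below uses base R's `utils::combn`, which accepts the same `x`, `m`, and function arguments, so that it runs without the `combinat` package; treat it as an illustration rather than a description of `combinat` internals.

```r
# Base R's utils::combn is used here so the snippet runs without combinat
combos <- utils::combn(4, 2)

ncol(combos)    # one column per combination
choose(4, 2)    # binomial coefficient: 4! / (2! * 2!) = 6

# Sum of the elements of each combination
utils::combn(4, 2, FUN = sum)
```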

**permn():** `permn()` is used to generate permutations. 

`Usage`

    permn(x, fun=NULL, ...)
        
        
`Arguments`

    x    vector source for permutations, i.e., the vector of elements used to generate the permutations 
    fun  if non-NULL, applied to each permutation

Generates all permutations of the elements of x. 
In the example below, we pass 3 as the input, which is treated as the vector (1, 2, 3), to generate all of its permutations, like {{1,2,3},{1,3,2},{2,1,3}...} etc. 
If the argument `fun` is not NULL, the function it names is applied to each permutation. 

In [40]:
# generate all possible permutations of the numbers (1, 2, 3)
permn(3)

In [41]:
# generate all possible permutations of the numbers (1, 2, 3) and return the standard deviation of each
permn(3, sd)

In [42]:
length(permn(3))
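
The count of 6 is 3! = 3 * 2 * 1. To make the counting argument concrete, here is a small recursive sketch in base R. `all_perms` is a hypothetical helper written for this illustration; it is not part of `combinat` and is not how `permn()` is implemented.

```r
# Hypothetical helper (not from combinat): build all orderings recursively
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    # Fix v[i] in first position, then permute the remaining elements
    for (rest in all_perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

perms <- all_perms(1:3)
length(perms)   # 6
factorial(3)    # n! orderings of n distinct elements
```

Each of the n choices for the first position leaves (n - 1)! orderings of the rest, which is where the factorial comes from.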

### Extensions of probability to multivariate data

We have seen how conditional probability applies to simple dice events. 
Let's extend our discussion to multivariate data. 
We will work with the motor vehicle thefts dataset. 
The data contains both factor and continuous variables. 
The `table()` command is used extensively when dealing with conditional probability.

In [43]:
vehicle_thefts <- read.csv("datasets/mvt.csv", header = TRUE)

In [44]:
head(vehicle_thefts)

ID,Date,LocationDescription,Arrest,Domestic,Beat,District,CommunityArea,Year,Latitude,Longitude
8951354,12/31/2012 23:15,STREET,False,False,623,6,69,2012,41.75628,-87.62164
8951141,12/31/2012 22:00,STREET,False,False,1213,12,24,2012,41.89879,-87.6613
8952745,12/31/2012 22:00,RESIDENTIAL YARD (FRONT/BACK),False,False,1622,16,11,2012,41.96919,-87.76767
8952223,12/31/2012 22:00,STREET,False,False,724,7,67,2012,41.76933,-87.65773
8951608,12/31/2012 21:30,STREET,False,False,211,2,35,2012,41.83757,-87.62176
8950793,12/31/2012 20:30,STREET,True,False,2521,25,19,2012,41.92856,-87.754


In [45]:
DateConvert <- strptime(vehicle_thefts$Date, "%m/%d/%Y")

In [47]:
DateConvert[1:2]

[1] "2012-12-31 CST" "2012-12-31 CST"

In [48]:
vehicle_thefts$Month <- months(DateConvert)
vehicle_thefts$Month[1:2]

In [49]:
vehicle_thefts$Weekday <- weekdays(DateConvert)
vehicle_thefts$Weekday[1:2]

In [52]:
with(vehicle_thefts, table(Arrest, Domestic))


       Domestic
Arrest   FALSE   TRUE
  FALSE 175755    350
  TRUE   15471     65

In [53]:
# We want the probability of an arrest given that the theft is of the Domestic type. 
# Mathematically, P(Arrest | Domestic) = P(Arrest & Domestic) / P(Domestic), 
# which reduces to a ratio of counts because the total number of thefts cancels.

# Count of (Arrest & Domestic) = 65: the cell where Arrest is TRUE and Domestic is TRUE in the table above.
# Count of Domestic = 350 + 65: the whole Domestic == TRUE column.
65/(350+65)
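
The same conditional probability can be read off a normalised table. The sketch below re-enters the counts from the Arrest by Domestic table above by hand, so that it runs without the dataset, and uses base R's `prop.table`; the object names are illustrative.

```r
# Counts copied from the Arrest x Domestic table above
counts <- matrix(c(175755, 15471, 350, 65), nrow = 2,
                 dimnames = list(Arrest   = c("FALSE", "TRUE"),
                                 Domestic = c("FALSE", "TRUE")))

# Conditioning on Domestic means normalising within each column (margin = 2)
cond_on_domestic <- prop.table(counts, margin = 2)
cond_on_domestic["TRUE", "TRUE"]   # P(Arrest | Domestic) = 65 / 415

# For comparison: the unconditional arrest rate
sum(counts["TRUE", ]) / sum(counts)
```

On the real data frame, `prop.table(with(vehicle_thefts, table(Arrest, Domestic)), margin = 2)` should give the same column-wise proportions.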

* https://www.statmethods.net/stats/withby.html
* https://www.r-bloggers.com/to-attach-or-not-attach-that-is-the-question/

In [54]:
# Cars are stolen from many different locations. We want to subset the data to the top 5 locations by 
# number of thefts, excluding the "Other" category. sort() orders the counts in ascending order, so the top 5 are the last entries of the output.
sort(table(vehicle_thefts$LocationDescription))


    AIRPORT BUILDING NON-TERMINAL - SECURE AREA 
                                              1 
                 AIRPORT EXTERIOR - SECURE AREA 
                                              1 
                                ANIMAL HOSPITAL 
                                              1 
                                APPLIANCE STORE 
                                              1 
                                      CTA TRAIN 
                                              1 
                        JAIL / LOCK-UP FACILITY 
                                              1 
                                      NEWSSTAND 
                                              1 
                                         BRIDGE 
                                              2 
              COLLEGE/UNIVERSITY RESIDENCE HALL 
                                              2 
                              CURRENCY EXCHANGE 
                                              2 
                   

In [55]:

# Keep only the rows whose location is one of the five most frequent ones
top5_locations <- c("STREET", "PARKING LOT/GARAGE(NON.RESID.)", "ALLEY",
                    "DRIVEWAY - RESIDENTIAL", "GAS STATION")
Top5 <- subset(vehicle_thefts, LocationDescription %in% top5_locations)


str(Top5)

'data.frame':	177510 obs. of  13 variables:
 $ ID                 : int  8951354 8951141 8952223 8951608 8950793 8950760 8951611 8951802 8950706 8951585 ...
 $ Date               : Factor w/ 131679 levels "1/1/2001 0:01",..: 42824 42823 42823 42822 42821 42820 42819 42817 42816 42816 ...
 $ LocationDescription: Factor w/ 78 levels "ABANDONED BUILDING",..: 72 72 72 72 72 72 72 72 72 72 ...
 $ Arrest             : logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
 $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ Beat               : int  623 1213 724 211 2521 423 231 1021 1215 1011 ...
 $ District           : int  6 12 7 2 25 4 2 10 12 10 ...
 $ CommunityArea      : int  69 24 67 35 19 48 40 29 24 29 ...
 $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 $ Latitude           : num  41.8 41.9 41.8 41.8 41.9 ...
 $ Longitude          : num  -87.6 -87.7 -87.7 -87.6 -87.8 ...
 $ Month              : chr  "December" "December" "December" "Dec

Take a look at the number of levels of **LocationDescription**. 
Ideally, the new dataframe `Top5` should contain only five locations: 
STREET, PARKING LOT/GARAGE(NON.RESID.), ALLEY, DRIVEWAY - RESIDENTIAL and GAS STATION. 
However, str() says **LocationDescription** has 78 levels.

So, what's going on here?

In [56]:
# A factor remembers all of its levels from the original dataset 'vehicle_thefts', even after subsetting. 
# Re-create the factor so that LocationDescription in Top5 keeps only the levels that are present. If you skip this step, 
# Top5$LocationDescription will still carry all 78 levels found in vehicle_thefts$LocationDescription.
Top5$LocationDescription <- factor(Top5$LocationDescription)

str(Top5)

'data.frame':	177510 obs. of  13 variables:
 $ ID                 : int  8951354 8951141 8952223 8951608 8950793 8950760 8951611 8951802 8950706 8951585 ...
 $ Date               : Factor w/ 131679 levels "1/1/2001 0:01",..: 42824 42823 42823 42822 42821 42820 42819 42817 42816 42816 ...
 $ LocationDescription: Factor w/ 5 levels "ALLEY","DRIVEWAY - RESIDENTIAL",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Arrest             : logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
 $ Domestic           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ Beat               : int  623 1213 724 211 2521 423 231 1021 1215 1011 ...
 $ District           : int  6 12 7 2 25 4 2 10 12 10 ...
 $ CommunityArea      : int  69 24 67 35 19 48 40 29 24 29 ...
 $ Year               : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 $ Latitude           : num  41.8 41.9 41.8 41.8 41.9 ...
 $ Longitude          : num  -87.6 -87.7 -87.7 -87.6 -87.8 ...
 $ Month              : chr  "December" "December" "December" "De
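
The level-retention behaviour is easy to reproduce on a toy factor. In this sketch the values are made up for illustration; `droplevels()` is a base-R function that has the same effect as re-creating the factor.

```r
# Toy factor standing in for LocationDescription (values are illustrative)
loc  <- factor(c("STREET", "ALLEY", "BRIDGE", "STREET"))
kept <- loc[loc %in% c("STREET", "ALLEY")]

nlevels(kept)               # still 3: the unused "BRIDGE" level is retained
nlevels(factor(kept))       # 2: re-creating the factor drops unused levels
nlevels(droplevels(kept))   # 2: droplevels() is an equivalent base-R helper
```

`droplevels()` also accepts a whole data frame, which can be convenient when several factor columns need cleaning after a subset.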

In [57]:
# What is the probability that an arrest is made, given that the location is the street?
with(Top5,table(LocationDescription, Arrest))

                                Arrest
LocationDescription               FALSE   TRUE
  ALLEY                            2059    249
  DRIVEWAY - RESIDENTIAL           1543    132
  GAS STATION                      1672    439
  PARKING LOT/GARAGE(NON.RESID.)  13249   1603
  STREET                         144969  11595

In [58]:
# P(Arrest | LocationDescription == "STREET") = P(Arrest & STREET) / P(STREET)

# Count of (arrest made & location is STREET) = 11595
# Count of STREET = 11595 + 144969
(11595)/(144969+11595)
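
Rather than computing one location at a time, the arrest rate for all five locations can be obtained in one step by normalising each row of the table. The sketch below re-enters the counts from the table above by hand so that it is self-contained; the object names are illustrative.

```r
# Counts copied from the LocationDescription x Arrest table above
counts <- matrix(c(2059, 1543, 1672, 13249, 144969,
                   249, 132, 439, 1603, 11595),
                 ncol = 2,
                 dimnames = list(
                   LocationDescription = c("ALLEY", "DRIVEWAY - RESIDENTIAL",
                                           "GAS STATION",
                                           "PARKING LOT/GARAGE(NON.RESID.)",
                                           "STREET"),
                   Arrest = c("FALSE", "TRUE")))

# Conditioning on location means normalising within each row (margin = 1)
arrest_rate <- prop.table(counts, margin = 1)[, "TRUE"]
round(arrest_rate, 4)
```

The `STREET` entry equals 11595 / (144969 + 11595), the value computed in the cell above.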

In [59]:
# What is the probability that no arrest was made, given that the weekday is Monday?
with(Top5,table(Weekday, Arrest))

           Arrest
Weekday     FALSE  TRUE
  Friday    24929  2149
  Monday    23334  1954
  Saturday  23213  2042
  Sunday    22421  2135
  Thursday  23374  1864
  Tuesday   22917  1880
  Wednesday 23304  1994

In [60]:
# P(!Arrest | Weekday == "Monday") = P(!Arrest & Monday) / P(Monday)

# Count of (no arrest & weekday is Monday) = 23334
# Count of Monday = 23334 + 1954
23334/(23334+1954)