A probability is a fraction of a finite set: the share of members (here, survey respondents) for which a proposition is true.
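As a minimal sketch of this definition, the snippet below uses a made-up list of ten respondents (hypothetical toy data, not the GSS). The probability is the number of members for which the proposition holds, divided by the size of the set.

```python
from fractions import Fraction

# Hypothetical toy population: True means the respondent is a banker.
is_banker = [False, True, False, False, True,
             False, False, False, False, False]

# Probability = (members where the proposition is true) / (size of the set)
p_banker = Fraction(sum(is_banker), len(is_banker))

print(p_banker)         # 1/5
print(float(p_banker))  # 0.2
```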

In [2]:
import pandas as pd

In [3]:
#https://gss.norc.org/
gss = pd.read_csv('./gss_bayes.csv')
gss.head()

Unnamed: 0,caseid,year,age,sex,polviews,partyid,indus10
0,1,1974,21.0,1,4.0,2.0,4970.0
1,2,1974,41.0,1,5.0,0.0,9160.0
2,5,1974,58.0,2,6.0,1.0,2670.0
3,6,1974,30.0,1,5.0,4.0,6870.0
4,7,1974,48.0,1,5.0,4.0,7860.0


In [7]:
# Bankers are defined by indus10 = 6870
banker = gss.indus10 == 6870
banker

0        False
1        False
2        False
3         True
4        False
         ...  
49285    False
49286    False
49287    False
49288    False
49289    False
Name: indus10, Length: 49290, dtype: bool

In [8]:
print(f'Total number of bankers {banker.sum()}')

Total number of bankers 728


In [9]:
print(f'Fraction of bankers {banker.mean()}')

Fraction of bankers 0.014769730168391155


In [11]:
def prob(A):
    # Returns the probability of A, assumes A is a series of true/false values
    return A.mean()

In [12]:
print(f'Fraction of bankers using prob function {prob(banker)}')

Fraction of bankers using prob function 0.014769730168391155
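The reason `A.mean()` gives a fraction is that pandas counts `True` as 1 and `False` as 0, so the mean of a Boolean Series is the number of `True` values divided by its length. A small sketch on a made-up Series (toy data, not the GSS):

```python
import pandas as pd

# Hypothetical Boolean Series with two True values out of four
toy = pd.Series([True, False, False, True])

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

# The sum counts the True values; dividing by the length gives the fraction.
n_true = toy.sum()
fraction = n_true / len(toy)

print(n_true)     # 2
print(prob(toy))  # 0.5
```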


In [13]:
female = gss.sex == 2

In [14]:
print(f'Fraction that are female {prob(female)}')

Fraction that are female 0.5378575776019476


Political Views and Parties

In [15]:
# Political views, 1=Extremely liberal, 7=Extremely conservative, 4=moderate
liberal = gss.polviews <= 3


In [18]:
print(f'Probability of being liberal {prob(liberal)}')

Probability of being liberal 0.27374721038750255


In [20]:
# Party id: 0=strong democrat, 3=independent, 6=strong republican, 7=other
democrat = gss.partyid <= 1 # 0 = strong democrat, 1 = not strong democrat

In [21]:
print(f'Prob of being democrat {prob(democrat)}')

Prob of being democrat 0.3662609048488537


Conjunction
- `AND` operation between two propositions
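With Boolean Series, the conjunction is written with the element-wise `&` operator. The sketch below uses a small made-up DataFrame (toy values, not the GSS) and also shows that inline comparisons need parentheses, because `&` binds more tightly than `==` in Python.

```python
import pandas as pd

# Hypothetical toy data reusing the column names from the GSS file
toy = pd.DataFrame({
    "sex":     [1, 2, 2, 1, 2],
    "indus10": [6870, 6870, 4970, 9160, 6870],
})

female = toy.sex == 2
banker = toy.indus10 == 6870

# Element-wise AND: True only where both propositions are True
both = female & banker

# Written inline, each comparison must be wrapped in parentheses
both_inline = (toy.sex == 2) & (toy.indus10 == 6870)

print(both.tolist())  # [False, True, False, False, True]
print(both.mean())    # 0.4
```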

In [24]:
print(f'The probability of banker {prob(banker)}, probability of democrat {prob(democrat)}')
print(f'Probability of banker AND democrat {prob(banker & democrat)}')

The probability of banker 0.014769730168391155, probability of democrat 0.3662609048488537
Probability of banker AND democrat 0.004686548995739501


In [26]:
print(f'Conjunction is commutative, therefore democrat AND banker should be the same {prob(democrat & banker)}')
print(f'Asserting: {prob(democrat & banker) == prob(banker & democrat)}')

Conjunction is commutative, therefore democrat AND banker should be the same 0.004686548995739501
Asserting: True


Conditional Probability

In [30]:
# Of all the respondents who are liberal, what fraction are democrats?

#1. Get all respondents who are liberal (the liberal variable)
#2. Compute the fraction of the selected respondents who are democrat
selected = democrat[liberal]
print(f'Probability of being a democrat, given you are a liberal {prob(selected)}')

Probability of being a democrat, given you are a liberal 0.5206403320240125


In [31]:
# What is the probability that a respondent is female, given they are a banker
selected = female[banker]
print(f'Probability of being female, given banker {prob(selected)}')

Probability of being female, given banker 0.7706043956043956


In [32]:
def conditional(proposition, given):
    """Probability of proposition, conditioned on given being true"""
    # Boolean indexing keeps only the rows where given is True
    return prob(proposition[given])

In [34]:
print(f'Probability liberal given female {conditional(liberal, given=female)}')

Probability liberal given female 0.27581004111500884


Conditionals are NOT commutative

In [35]:
print(f'Banker given female {conditional(banker, given=female)} != female given banker {conditional(female, given=banker)}')

Banker given female 0.02116102749801969 != female given banker 0.7706043956043956
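The asymmetry comes from the denominator: each conditional divides the same overlap by a different group size. A sketch on six made-up respondents (toy data, not the GSS), with `conditional` redefined so the block runs on its own:

```python
import pandas as pd

# Hypothetical toy data: four women, one of whom is the only banker
female = pd.Series([False, True, True, True, True, False])
banker = pd.Series([False, True, False, False, False, False])

def conditional(proposition, given):
    """Fraction of rows where proposition is True, among rows where given is True."""
    return proposition[given].mean()

# Same overlap (1 female banker), different denominators (4 women vs. 1 banker)
p_banker_given_female = conditional(banker, given=female)
p_female_given_banker = conditional(female, given=banker)

print(p_banker_given_female)  # 0.25
print(p_female_given_banker)  # 1.0
```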


Condition and Conjunction

In [36]:
print(f'Probability female, given liberal and democrat {conditional(female, given=liberal & democrat)}')

Probability female, given liberal and democrat 0.576085409252669


In [37]:
print(f'Liberal female, given banker {conditional(liberal & female, given=banker)}')

Liberal female, given banker 0.17307692307692307


Laws of Probability

Theorem 1: Using a conjunction to compute a conditional probability.

Theorem 2: Using a conditional probability to compute a conjunction.

Theorem 3: Using conditional(A, B) to compute conditional(B, A).

1. `P(A)` is the probability of proposition A
2. `P(A and B)` is the probability of the conjunction A and B, that is, the probability both are true
3. `P(A|B)` is the conditional probability of A given B is true. 

Theorem 1: What fraction of bankers are female

In [39]:
print('First method', female[banker].mean())
print('Second method', conditional(female, given=banker))

#third method
print('Third method', prob(female & banker) / prob(banker))

First method 0.7706043956043956
Second method 0.7706043956043956
Third method 0.7706043956043956


In [None]:
#Third method can be summarized as:
'''
P(A|B) = P(A and B) / P(B) -> Theorem 1
'''

Theorem 2

Take Theorem 1 and multiply both sides by `P(B)`:
`P(A and B) = P(A|B) * P(B)`

In [40]:
#liberal and democrat
prob(liberal & democrat)

np.float64(0.1425238385067965)

In [41]:
prob(democrat) * conditional(liberal, given=democrat)

np.float64(0.1425238385067965)

Theorem 3

Conjunction is commutative: `P(A and B) = P(B and A)`. Therefore, apply Theorem 2 to both sides:

`P(B)P(A|B) = P(A)P(B|A)`. 

Dividing both sides by `P(B)` leaves Theorem 3:
`P(A|B) = P(A)P(B|A)/P(B)`, also known as Bayes's theorem!
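To see the three theorems agree without any floating-point rounding, here is a small sketch using exact fractions. The counts are invented for illustration and are not taken from the GSS.

```python
from fractions import Fraction

# Hypothetical counts: 100 respondents, 30 in A, 20 in B, 12 in both
n_total, n_A, n_B, n_A_and_B = 100, 30, 20, 12

P_A = Fraction(n_A, n_total)
P_B = Fraction(n_B, n_total)
P_A_and_B = Fraction(n_A_and_B, n_total)

# Theorem 1: P(A|B) = P(A and B) / P(B), and likewise for P(B|A)
P_A_given_B = P_A_and_B / P_B
P_B_given_A = P_A_and_B / P_A

# Theorem 2: P(A and B) = P(B) P(A|B) = P(A) P(B|A)
assert P_A_and_B == P_B * P_A_given_B == P_A * P_B_given_A

# Theorem 3 (Bayes's theorem): P(A|B) = P(A) P(B|A) / P(B)
bayes = P_A * P_B_given_A / P_B

print(P_A_given_B)  # 3/5
print(bayes)        # 3/5
```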

In [42]:
conditional(liberal, given=banker)

np.float64(0.2239010989010989)

In [43]:
prob(liberal) * conditional(banker, given=liberal) / prob(banker)

np.float64(0.2239010989010989)

Law of Total Probability

`P(A) = P(B_1 and A) + P(B_2 and A)`

In words, the total probability of `A` is the sum of two possibilities: either `B_1` and `A` are true, or `B_2` and `A` are true. But this law applies only if `B_1` and `B_2` are:

- Mutually exclusive, which means that only one of them can be true, and

- Collectively exhaustive, which means that one of them must be true.
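Both conditions can be checked at once: exactly one of the propositions must be true in every row. The helper `is_partition` below is a hypothetical name introduced for this sketch, and the data is made up (not the GSS).

```python
import pandas as pd

# Hypothetical toy data: sex is 1 or 2, polviews runs from 1 to 7
toy = pd.DataFrame({"sex": [1, 2, 2, 1, 2], "polviews": [1, 4, 5, 7, 3]})

def is_partition(*props):
    """True if exactly one of the Boolean Series is True in every row."""
    counts = sum(p.astype(int) for p in props)
    return bool((counts == 1).all())

male = toy.sex == 1
female = toy.sex == 2
liberal = toy.polviews <= 3
conservative = toy.polviews >= 5

# male/female covers every row exactly once in this toy data
print(is_partition(male, female))           # True

# liberal/conservative misses the moderate (polviews == 4): not exhaustive
print(is_partition(liberal, conservative))  # False
```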


In [44]:
prob(banker)

np.float64(0.014769730168391155)

In [48]:
male = gss.sex == 1

In [49]:
prob(female & banker) + prob(male & banker)

np.float64(0.014769730168391155)

Using Theorem 2, we can rewrite this as
`P(A) = P(B_1) P(A|B_1) + P(B_2) P(A|B_2)`

In [50]:
prob(male) * conditional(banker, given=male) + prob(female) * conditional(banker, given=female)

np.float64(0.014769730168391153)

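This generalizes to any number of propositions `B_i`, provided they are mutually exclusive and collectively exhaustive: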
`P(A) = SUM_i[P(B_i)P(A|B_i)]`
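One possible way to evaluate this sum in pandas, sketched on made-up data (not the GSS), is to get `P(B_i)` from `value_counts(normalize=True)` and `P(A|B_i)` from a grouped mean, then multiply and add. The column names are assumptions chosen to mirror the notebook.

```python
import math
import pandas as pd

# Hypothetical toy data: three political-view groups and a banker flag
toy = pd.DataFrame({
    "polviews": [1, 1, 2, 2, 2, 3],
    "banker":   [True, False, False, True, False, False],
})

# P(B_i): share of respondents in each group
p_B = toy.polviews.value_counts(normalize=True).sort_index()

# P(A|B_i): fraction of bankers within each group
p_A_given_B = toy.groupby("polviews").banker.mean()

# Sum over i of P(B_i) * P(A|B_i); the two Series align on the group label
total = (p_B * p_A_given_B).sum()

# Should match the direct fraction, up to floating-point rounding
print(math.isclose(total, toy.banker.mean()))  # True
```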

In [51]:
# to test the generalization
B = gss.polviews
B.value_counts().sort_index()

polviews
1.0     1442
2.0     5808
3.0     6243
4.0    18943
5.0     7940
6.0     7319
7.0     1595
Name: count, dtype: int64

In [None]:
i = 4
prob(B==i) * conditional(banker, given=B==i) # P(moderate and banker), by Theorem 2

np.float64(0.005822682085615744)

In [None]:
# probability of banker, by using total probability of political views
sum(prob(B ==i) * conditional(banker, given=B==i) for i in range(1, 8)) # matches from above when using male/female

np.float64(0.014769730168391157)
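The three computations of `prob(banker)` above print values that differ in the last digit (`...155`, `...153`, `...157`). That is floating-point rounding, not a failure of the law: the products and sums are rounded at each step. When comparing such results in code, a tolerance-based check such as `math.isclose` is safer than `==`. A short sketch using the three printed values:

```python
import math

# Values copied from the outputs above
direct      = 0.014769730168391155  # prob(banker)
by_sex      = 0.014769730168391153  # total probability over male/female
by_polviews = 0.014769730168391157  # total probability over polviews 1..7

# Exact equality fails because the floats differ in the last bit
print(direct == by_sex)                    # False

# A relative-tolerance comparison treats them as equal
print(math.isclose(direct, by_sex))        # True
print(math.isclose(direct, by_polviews))   # True
```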