In [1]:
#Import important libraries
import pandas as pd

gss = pd.read_csv('GSS1974.csv')
gss_df = gss[['YEAR','AGE','AGE','SEX','POLVIEWS','PARTYID', 'INDUS10']].iloc[1: , :].astype(int)
gss_df.columns = ['YEAR','AGE','AGE1','SEX','POLVIEWS','PARTYID', 'INDUS10']

#create probability function
def prob(A):
    """Computes the probability of a proposition, A"""
    return A.mean()

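#create conditional probability function
def conditional(proposition, given):
    """Computes the probability of a proposition, restricted to rows where `given` is true"""
    return prob(proposition[given])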

#Create a Boolean Series marking the respondents treated as bankers (INDUS10 == 7860)
banker = gss_df['INDUS10'] == 7860
#count them
banker.sum()

#do the same thing with sex
female = (gss_df['SEX'] == 2)

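#once more with political views (POLVIEWS codes 1-3 are the liberal end of the scale)
liberal = (gss_df['POLVIEWS'] <= 3)
#note: this mask filters on POLVIEWS, not PARTYID, so it selects POLVIEWS == 1
democrat = (gss_df['POLVIEWS'] <= 1)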


# Laws of Probability

## Theorems

Theorem 1: Using a conjunction to compute a conditional probability

Theorem 2: Using a conditional probability to compute a conjunction

Theorem 3: Using conditional(A, B) to compute conditional(B, A) (Bayes's theorem)

## Mathematical Notation

P(A) is the probability of proposition A.

P(A and B) is the probability of the conjunction of A and B, that is, the probability that both are true.

P(A|B) is the conditional probability of A given that B is true. The vertical line between A and B is pronounced "given".

# Theorem 1

## P(A|B) = P(A and B) / P(B)
Using a conjunction to compute a conditional probability: the probability of A given B is the probability that both are true, divided by the probability of B.

### What fraction of bankers are female?

In [2]:
#Use the bracket operator to select the bankers
#then use mean to compute the fraction of bankers who are female
female[banker].mean()

0.7217391304347827

In [3]:
conditional(female, given=banker)

0.7217391304347827

We can also compute this conditional probability as the ratio of two probabilities:
1. the fraction of respondents who are female bankers, and
2. the fraction of respondents who are bankers

In [4]:
prob(female & banker) / prob(banker)

0.7217391304347827

In [5]:
#            P(A and B)
# P(A|B) = ---------------
#               P(B)
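
The ratio above divides by P(B), so it is undefined when no respondent satisfies B. A minimal sketch of a guarded variant follows; `conditional_checked` and the toy Series are hypothetical names made up for illustration and are not part of the notebook.

```python
import math
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional_checked(proposition, given):
    """P(proposition | given) via Theorem 1, or NaN when P(given) is zero.

    Hypothetical helper for illustration only.
    """
    if not given.any():
        return math.nan  # P(given) == 0, so the ratio is undefined
    return prob(proposition & given) / prob(given)

# Toy Boolean Series, made up for illustration
toy_A = pd.Series([True, False, True, True])
toy_B = pd.Series([True, True, False, False])
toy_never = pd.Series([False, False, False, False])

print(conditional_checked(toy_A, toy_B))      # 0.5
print(conditional_checked(toy_A, toy_never))  # nan
```

Returning NaN instead of raising is one possible design choice; it lets the result pass through later arithmetic while still flagging that the condition never occurs.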

# Theorem 2

If we start with Theorem 1 and multiply both sides by P(B), we get Theorem 2: P(A and B) = P(B) P(A|B).

This formula suggests a second way to compute a conjunction: instead of using the & operator, we can compute the product of two probabilities.

In [6]:
prob(liberal & democrat)

0.014824797843665768

In [7]:
prob(democrat) * conditional(liberal, given=democrat)

0.014824797843665768

# Theorem 3

We have established that conjunction is commutative. In math notation, that means...

Applying Theorem 2 to both sides, we have...

If we divide both sides by P(B), we get...
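
Written out, the three steps above can be sketched as follows, using the same notation as the rest of the notebook and assuming P(B) > 0 so that the final division is defined:

```latex
\begin{align*}
P(A \text{ and } B) &= P(B \text{ and } A)
  && \text{(conjunction is commutative)} \\
P(B)\,P(A \mid B) &= P(A)\,P(B \mid A)
  && \text{(Theorem 2 applied to each side)} \\
P(A \mid B) &= \frac{P(A)\,P(B \mid A)}{P(B)}
  && \text{(divide both sides by } P(B)\text{)}
\end{align*}
```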

In [8]:
#           P(A)P(B|A)
# P(A|B) = -------------   (Bayes's theorem)
#              P(B)

The fraction of bankers who are liberal

In [9]:
conditional(liberal, given=banker)

0.2956521739130435

In [10]:
#Bayes's theorem

prob(liberal) * conditional(banker, liberal) / prob(banker)

0.2956521739130435

# Law of Total Probability

Bayesian Statistics: Law of total probability expressed in mathematical notation

In [11]:
# P(A) = P(B1 and A) + P(B2 and A)

The total probability of A is the sum of two possibilities: either B1 and A are true, or B2 and A are true.

This is true if B1 and B2 are:

- Mutually exclusive, which means that at most one of them can be true, and
- Collectively exhaustive, which means that at least one of them must be true
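
Both conditions can be checked directly on Boolean Series before relying on the law. The sketch below uses a small made-up DataFrame; the names `toy`, `toy_male`, and `toy_female` are illustrative, and the SEX coding (1 = male, 2 = female) is assumed to match the one used in this notebook.

```python
import pandas as pd

# Toy data, made up for illustration; SEX coded 1 = male, 2 = female
toy = pd.DataFrame({'SEX': [1, 2, 2, 1, 2]})
toy_male = (toy['SEX'] == 1)
toy_female = (toy['SEX'] == 2)

# Mutually exclusive: no row satisfies both propositions
mutually_exclusive = not (toy_male & toy_female).any()

# Collectively exhaustive: every row satisfies at least one proposition
collectively_exhaustive = bool((toy_male | toy_female).all())

print(mutually_exclusive, collectively_exhaustive)  # True True
```

If either check came back False, the two conjunctions would no longer be guaranteed to add up to P(A).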

## Example

In [12]:
# probability that a respondent is a banker

prob(banker)

0.07749326145552561

In [13]:
# create the male respondent variable
male = (gss_df['SEX'] == 1)

### Total probability of banker

In [14]:
prob(male & banker) + prob(female & banker)

0.07749326145552561

In [15]:
gss_df['SEX'].value_counts()

2    793
1    691
Name: SEX, dtype: int64

In [16]:
#Applying Theorem 2 to each term
# P(A) = P(B1)P(A|B1) + P(B2)P(A|B2)

In [17]:
(prob(male) * conditional(banker, given=male) + prob(female) * conditional(banker, given=female))

0.0774932614555256

In [18]:
# P(A) = Σ P(Bi) P(A|Bi), summed over all i

In [19]:
B = gss_df['POLVIEWS']
B.value_counts().sort_index()

1     22
2    201
3    207
4    564
5    221
6    160
7     35
8     70
9      4
Name: POLVIEWS, dtype: int64

In [20]:
#Looking at moderates (POLVIEWS == 4): one term of the sum
i = 4
prob(B==i) * conditional(banker, B==i)

In [21]:
#compute the summation over every POLVIEWS code (1 through 9)

sum(prob(B==i) * conditional(banker, B==i)
    for i in range(1, 10))

0.0774932614555256
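
The same sum can be written without hard-coding the range of codes, by looping over whatever values actually occur. This is a minimal sketch on made-up data; `total_probability` is a hypothetical helper, and it assumes the Series has no missing values.

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """Probability of `proposition` among rows where `given` is true."""
    return prob(proposition[given])

def total_probability(A, B):
    """Sum P(B == b) * P(A | B == b) over every value b that occurs in B.

    Hypothetical helper; assumes B contains no missing values.
    """
    return sum(prob(B == b) * conditional(A, B == b) for b in B.unique())

# Toy data, made up for illustration
toy = pd.DataFrame({
    'POLVIEWS': [1, 4, 4, 7, 9, 4],
    'BANKER': [True, False, True, False, False, False],
})
toy_A = toy['BANKER']
toy_B = toy['POLVIEWS']

# The two numbers agree up to floating-point rounding
print(total_probability(toy_A, toy_B), prob(toy_A))
```

Because the distinct values of a column are mutually exclusive and collectively exhaustive by construction, the result matches `prob(toy_A)`.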

# Exercises

Exercise: Use prob to compute the following probabilities:

The probability that Linda is a female banker,

The probability that Linda is a liberal female banker, and

The probability that Linda is a liberal female banker and a Democrat.

In [22]:
prob(female & banker)

0.05592991913746631

In [23]:
prob(liberal & female & banker)

0.01752021563342318

In [24]:
prob(liberal & female & banker & democrat)

0.0013477088948787063

Exercise: Use conditional to compute the following probabilities:

What is the probability that a respondent is liberal, given that they are a Democrat?

What is the probability that a respondent is a Democrat, given that they are liberal?

In [25]:
conditional(liberal, given=democrat)

1.0

In [26]:
conditional(democrat, given=liberal)

0.05116279069767442

Use prob and conditional to compute the following probabilities.

What is the probability that a randomly chosen respondent is a young liberal?

What is the probability that a young person is liberal?

What fraction of respondents are old conservatives?

What fraction of conservatives are old?

For each statement, think about whether it is expressing a conjunction, a conditional probability, or both.
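
To make the distinction concrete, here is a small sketch on made-up data (the `toy_` names are illustrative). A conjunction is a fraction of all respondents, while a conditional probability is a fraction of a subgroup, so the two generally differ, and the direction of the condition matters.

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """Probability of `proposition` among rows where `given` is true."""
    return prob(proposition[given])

# Toy data, made up for illustration
toy = pd.DataFrame({
    'AGE': [22, 25, 41, 70, 28, 66, 35, 19],
    'POLVIEWS': [2, 5, 3, 6, 1, 7, 4, 4],
})
toy_young = (toy['AGE'] < 30)
toy_liberal = (toy['POLVIEWS'] <= 3)

# Conjunction: fraction of ALL respondents who are young and liberal
print(prob(toy_young & toy_liberal))              # 0.25

# Conditional: fraction of YOUNG respondents who are liberal
print(conditional(toy_liberal, given=toy_young))  # 0.5

# Reversing the condition gives a different number (2 out of 3 here)
print(conditional(toy_young, given=toy_liberal))
```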

In [27]:
# equivalent to gss_df['AGE'] < 30 for integer ages (the accidental duplicate AGE column was renamed AGE1 above)
young = (gss_df['AGE'] <= 29)
prob(young)

0.2560646900269542

In [28]:
old = (gss_df['AGE'] >= 65)
prob(old)

0.1738544474393531

In [29]:
conservative = (gss_df['POLVIEWS'] >= 5)
prob(conservative)

0.330188679245283

Answers to questions

In [30]:
prob(young & liberal)

0.10849056603773585

In [31]:
conditional(liberal, given=young)

0.4236842105263158

In [32]:
prob(old & conservative)

0.06940700808625337

In [33]:
conditional(old, given=conservative)

0.21020408163265306
