In [2]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import sklearn
from mpl_toolkits.mplot3d import Axes3D
from scipy.stats import gaussian_kde
from scipy.stats import norm
from scipy.stats import multivariate_normal

import tensorflow as tf
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Layer, Dropout, BatchNormalization, LeakyReLU, Lambda
from keras.losses import Loss, mse, MeanSquaredError
from keras.optimizers import Optimizer, Adam
from keras.metrics import Mean
from keras.utils import to_categorical, plot_model, load_img, img_to_array
from keras.callbacks import EarlyStopping
from keras.models import load_model

import tensorflow_probability as tfp
from scipy.stats import gamma

np.random.seed(1234)
tf.random.set_seed(1234)

# **Probabilities**

**Features**

- caseid: Respondent id (which is the index of the table).
- year: Year when the respondent was surveyed.
- age: Respondent’s age when surveyed.
- sex: Male or female.
- polviews: Political views on a range from liberal to conservative.
- partyid: Political party affiliation: Democratic, Republican, or independent.
- indus10: Code for the industry the respondent works in.

In [87]:
# General Social Survey data
gss = pd.read_csv('Data/gss_bayes.csv')
gss.head(2)

| Unnamed: 0 | caseid | year | age | sex | polviews | partyid | indus10 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1974 | 21.0 | 1 | 4.0 | 2.0 | 4970.0 |
| 1 | 2 | 1974 | 41.0 | 1 | 5.0 | 0.0 | 9160.0 |


## **Ordinary Probability**

In ordinary probability theory, we consider events and their associated probabilities, which quantify how likely these events are to occur in a random experiment.

A random experiment is a procedure or process that yields an outcome from a given sample space, where the sample space is the set of all possible outcomes. Each outcome is considered an elementary event, and a collection of elementary events forms an event. The probability of an event is a number between 0 and 1 that measures how likely the event is to occur.

A fundamental concept in probability theory is the law of large numbers, which asserts that as the number of trials in a random experiment increases indefinitely, the relative frequency of any given event will converge to the true probability of that event. Mathematically, this can be expressed as:


$$\lim_{N \to \infty} \frac{N(A)}{N} = P(A)$$

where $N$ is the total number of trials, $N(A)$ is the number of times event $A$ occurs, and $P(A)$ is the true probability of event $A$. This law justifies estimating probabilities by the relative frequencies observed in empirical data.
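The convergence can be illustrated with a short simulation. This is a minimal sketch under an assumed setup that is not part of the GSS data: a fair six-sided die rolled with NumPy's `default_rng`, with the event $A$ = "the die shows a 6". It prints the relative frequency $N(A)/N$ for increasing $N$.

```python
import numpy as np

# Simulated experiment (assumption): rolls of a fair six-sided die
rng = np.random.default_rng(1234)
rolls = rng.integers(1, 7, size=100_000)  # integers in {1, ..., 6}

# Event A: the die shows a 6
is_six = (rolls == 6)

# Relative frequency N(A)/N using only the first N trials
for N in [10, 100, 1_000, 10_000, 100_000]:
    freq = is_six[:N].mean()
    print(f"N = {N:>7}: N(A)/N = {freq:.4f}")

print("True probability P(A) =", 1 / 6)
```

In a typical run the printed frequencies move toward $1/6 \approx 0.1667$ as $N$ grows, although any finite run still fluctuates around it.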

Let's consider an example to illustrate the theory:

**Example**

What is the probability of choosing at random a banking worker? The code for “Banking and related activities” is 6870.

In [35]:
def prob(A):
    """Computes the probability of a proposition, A."""
    N = len(A)
    return A.sum()/N # or A.mean()

In [86]:
# Banking and related activities
banker = (gss['indus10'] == 6870)

print("P(banker) = ", prob(banker))

P(banker) =  0.014769730168391155


We can see that about $1.5\%$ of the respondents work in banking, so if we choose a random person from the dataset, the probability they are a banker is about $1.5\%$.

## **Joint (Conjunction) Probability**

Joint probability refers to the probability of two or more events happening at the same time. For two events, $A$ and $B$, the joint probability is mathematically represented as:

$$
P(A \cap B) = P(A \text{ and } B)
$$

In practical terms, it answers questions of the kind: "What is the probability that event $A$ occurs while event $B$ also occurs?" 

Consider a random experiment with a finite number of equally likely outcomes. Let $N$ denote the total number of trials in the experiment. Suppose $N(A)$ and $N(B)$ represent the number of times events $A$ and $B$ occur, respectively, and $N(A \cap B)$ represents the number of times both events $A$ and $B$ happen together (the intersection of the two sets). In this scenario, the joint probability can be expressed as:

$$
P(A \cap B) = \frac{N(A \cap B)}{N}
$$

If we are considering more than two events, say $A$, $B$, and $C$, the joint probability can be generalized further as:

$$
P(A \cap B \cap C) = P(A \text{ and } B \text{ and } C)
$$

This concept is extendable to any number of events, providing a robust tool to evaluate the probability of several interconnected events occurring simultaneously.

**Properties**

- If events $A$ and $B$ are independent, the joint probability is the product of the individual probabilities: $P(A \cap B) = P(A)P(B)$.
  
- If events $A$ and $B$ are mutually exclusive, they cannot occur at the same time, so their joint probability is zero: $P(A \cap B) = 0$.
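Both properties can be checked exactly on a small sample space. The sketch below assumes a single roll of a fair die with equally likely outcomes (not survey data); `P` is a hypothetical helper that counts outcomes with Python's `fractions.Fraction`, so there is no rounding. The events "even" and "shows 1 or 2" happen to be independent, while "even" and "odd" are mutually exclusive.

```python
from fractions import Fraction

# Sample space of one roll of a fair die; every outcome is equally likely
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Exact probability of an event (a subset of omega) under equal likelihood."""
    return Fraction(len(event & omega), len(omega))

even = {2, 4, 6}
low = {1, 2}      # "the die shows 1 or 2"
odd = {1, 3, 5}

# Independent events: P(even ∩ low) equals P(even) * P(low)
print(P(even & low), P(even) * P(low))   # 1/6 1/6

# Mutually exclusive events: the intersection is empty, so the probability is 0
print(P(even & odd))                      # 0
```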

**Example**

What is the probability that a respondent is a banker and a Democrat? Here `partyid` is encoded as:

- 0 Strong democrat
- 1 Not strong democrat
- 2 Independent, near democrat
- 3 Independent
- 4 Independent, near republican
- 5 Not strong republican
- 6 Strong republican
- 7 Other party

As we should expect, `prob(banker & democrat)` ≈ 0.47% is less than `prob(banker)` ≈ 1.5%, because
not all bankers are Democrats. In the code below, only codes 0 and 1 (strong and not strong Democrats) count as Democrats.

In [85]:
# Banking and related activities
banker = (gss['indus10'] == 6870)

# Strong or not strong Democrats (partyid codes 0 and 1)
democrat = (gss['partyid'] <= 1)

print("P(democrat and banker) =", prob(banker & democrat))

P(democrat and banker) = 0.004686548995739501


## **Conditional Probability**

Conditional probability helps us understand the relationship between two events, $A$ and $B$. In particular, it describes the probability of event $A$ given that event $B$ has already occurred. The conditional probability of event $A$ given event $B$ is defined as:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

Here, $P(A \cap B)$ is the joint probability that events $A$ and $B$ occur together, and the definition requires $P(B) > 0$.

Suppose we conduct a random experiment with a finite number of outcomes, all having equal probability. Let $N$ represent the total number of trials, $N(B)$ be the number of trials resulting in event $B$, and $N(A \cap B)$ be the number of trials where both $A$ and $B$ occur together. In this case, the probability of $B$ and the joint probability $A \cap B$ can be expressed as:

$$P(B) = \frac{N(B)}{N}, ~~~~~~P(A \cap B) = \frac{N(A \cap B)}{N},$$

Using these expressions, we can rewrite the conditional probability of $A$ given $B$ as:

$$P(A|B) = \frac{N(A \cap B)}{N(B)}.$$

This formulation clarifies the meaning of conditional probability: it is the proportion of the trials in which $B$ occurs that also have $A$ occurring.
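The cancellation of $N$ can be seen on a tiny example. The table below is a hypothetical eight-row stand-in (columns `A` and `B` are made up for illustration); the sketch computes $P(A|B)$ as a ratio of probabilities, as a ratio of counts, and by restricting to the rows where `B` is True, which is the same boolean-indexing idea used in the notebook's code cells.

```python
import pandas as pd

# Hypothetical toy survey with 8 respondents (illustrative values, not GSS data)
toy = pd.DataFrame({
    "A": [True, False, True, True, False, False, True, False],
    "B": [True, True, True, False, False, True, False, False],
})

N = len(toy)
N_B = toy["B"].sum()                  # trials where B occurs
N_AB = (toy["A"] & toy["B"]).sum()    # trials where A and B occur together

# Ratio of probabilities vs. ratio of counts: the 1/N factors cancel
p_ratio = (N_AB / N) / (N_B / N)
count_ratio = N_AB / N_B

# Same quantity via boolean indexing: fraction of the B-rows where A is True
restricted = toy.loc[toy["B"], "A"].mean()

print(p_ratio, count_ratio, restricted)   # 0.5 0.5 0.5
```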


- If $A$ and $B$ are mutually exclusive events ($A$ and $B$ cannot occur at the same time), then $A \cap B = \emptyset$ and $P(A|B) = 0$.
- If $B$ implies $A$ ($B$ is a subset of $A$: $B \subseteq A$), then $A \cap B = B$ and $P(A|B) = 1$.
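These two properties can be verified on a fair die. The sample space and events here are illustrative assumptions, and `P_given` is a hypothetical helper that implements the definition $P(A|B) = P(A \cap B)/P(B)$ exactly with fractions.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair die, equally likely outcomes

def P(event):
    """Exact probability of a subset of omega."""
    return Fraction(len(event & omega), len(omega))

def P_given(A, B):
    """Conditional probability P(A|B) = P(A ∩ B) / P(B), assuming P(B) > 0."""
    return P(A & B) / P(B)

even, odd = {2, 4, 6}, {1, 3, 5}
six = {6}

print(P_given(even, odd))   # mutually exclusive events: 0
print(P_given(even, six))   # six ⊆ even, so B implies A: 1
```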

**Example**

What is the probability that a respondent is a Democrat, given that they are liberal? Here `polviews` is encoded as:


- 1 Extremely liberal
- 2 Liberal
- 3 Slightly liberal
- 4 Moderate
- 5 Slightly conservative
- 6 Conservative
- 7 Extremely conservative


We can compute this probability in two steps:

1. Select all respondents who are liberal.
2. Compute the fraction of the selected respondents who are Democrats.

In [44]:
# Propositions
# Identify liberal respondents
liberal = (gss['polviews'] <= 3)
# Identify bankers by industry code
banker = (gss['indus10'] == 6870)
# Identify Democrats and strong Democrats
democrat = (gss['partyid'] <= 1)

# Restrict `democrat` to the liberal respondents (boolean indexing)
liberal_democrats = democrat[liberal]

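Boolean indexing with `democrat[liberal]` does not compute an intersection: it restricts the `democrat` Series to the rows where `liberal` is True, yielding one boolean per liberal respondent that is True when that respondent is also a Democrat. The mean of this Series is therefore the fraction of liberals who are Democrats, $P(\text{democrat}|\text{liberal})$.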
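The difference between boolean indexing and `&` is easiest to see on a handful of rows. The two Series below are hypothetical (five made-up respondents), though they reuse the names from the cell above; the sketch shows that `democrat[liberal]` keeps only the liberal rows, while `liberal & democrat` keeps every row.

```python
import pandas as pd

# Hypothetical boolean Series for five respondents (index = respondent id)
ids = [10, 11, 12, 13, 14]
liberal = pd.Series([True, True, False, True, False], index=ids)
democrat = pd.Series([True, False, True, True, False], index=ids)

# Boolean indexing keeps only the liberal respondents (ids 10, 11, 13)
subset = democrat[liberal]
print(subset)

# Its mean is the fraction of liberals who are Democrats: P(democrat | liberal)
print(subset.mean())                  # 2/3 ≈ 0.667

# By contrast, `&` keeps all five rows and marks liberal Democrats as True;
# its mean is the joint probability P(liberal and democrat)
print((liberal & democrat).mean())    # 2/5 = 0.4
```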

In [59]:
def conditional(proposition, given):
    """Computes P(proposition | given): the probability of `proposition`
    among the rows where `given` is True (uses prob() defined above)."""
    return prob(proposition[given])

In [84]:
# Using Boolean series
print("p(democrat|liberal) = ", prob(liberal_democrats))
print("p(democrat|liberal) = ", conditional(democrat, given = liberal))

# Using formula for conditional probability
print("p(democrat|liberal) = ", prob(liberal & democrat)/prob(liberal))

p(democrat|liberal) =  0.5206403320240125
p(democrat|liberal) =  0.5206403320240125
p(democrat|liberal) =  0.5206403320240124


As we can observe, ordinary probability considers the probability of an event occurring over the entire sample space. In contrast, conditional probability focuses on the probability of an event occurring within a reduced sample space, confined by the conditions or information we have at hand. Essentially, conditional probability allows us to update the probability of an event by incorporating new evidence or conditions. We can also combine conditional and joint probabilities; in the cell below, $A$ is `democrat`, $B$ is `liberal`, and $C$ is `banker`:

In [54]:
print("p(A|B and C) = ", conditional(democrat, given = liberal & banker))
print("p(B and C|A) = ", conditional(liberal & banker, given = democrat))

p(A|B and C) =  0.48466257668711654
p(B and C|A) =  0.004376003988256799


## **Addition Law for Probability**

Consider two mutually exclusive events $A_1$ and $A_2$ associated with the outcomes of a random experiment, and let $A = A_1 \bigcup A_2$ be their union. By definition, mutually exclusive events cannot occur simultaneously, so their intersection is empty: $A_1 \bigcap A_2 = \emptyset$. If $A$ occurs in a trial, then either $A_1$ or $A_2$ has occurred, but not both.

The union of the events $A_1 \bigcup A_2$ includes all the outcomes of both events, without any overlap since they are mutually exclusive. So, when counting the number of outcomes in $A$, we are essentially counting the number of outcomes in $A_1$ and $A_2$ separately and then adding them together. Therefore, we can write:

$$
\frac{N(A)}{N} = \frac{N(A_1)}{N} + \frac{N(A_2)}{N}
$$

where $N$ is the total number of trials in the experiment, and $N(A)$, $N(A_1)$, $N(A_2)$ are the total number of trials leading to events $A$, $A_1$, and $A_2$, respectively.

For a sufficiently large number of trials $N$, the relative frequencies $\frac{N(A)}{N}$, $\frac{N(A_1)}{N}$, $\frac{N(A_2)}{N}$ approach the corresponding probabilities $P(A)$, $P(A_1)$, $P(A_2)$ by the law of large numbers. Taking the limit, we get:

$$
P(A) = P(A_1) + P(A_2)
$$

Similarly, if events $A_1$, $A_2$, and $A_3$ are mutually exclusive, no two of them can occur simultaneously, so their pairwise intersections are empty: $A_1 \bigcap A_2 = \emptyset$, $A_2 \bigcap A_3 = \emptyset$, and $A_1 \bigcap A_3 = \emptyset$. As a result, the union of $A_1$ and $A_2$ is also mutually exclusive with $A_3$, since $(A_1 \bigcup A_2) \bigcap A_3 = (A_1 \bigcap A_3) \bigcup (A_2 \bigcap A_3) = \emptyset$. Applying the two-event addition law, this can be expressed as:

$$
P(A_1 \bigcup A_2 \bigcup A_3) = P(A_1 \bigcup A_2) + P(A_3)
$$

Since $A_1$ and $A_2$ are mutually exclusive, we get:

$$
P(A_1 \bigcup A_2 \bigcup A_3) =  P(A_1) + P(A_2) + P(A_3)
$$

More generally, given $n$ mutually exclusive events $A_1$, $A_2$, $\cdots$, $A_n$, we have the formula:

$$
P(\bigcup_{k=1}^{n}A_{k}) = \sum_{k=1}^{n} P(A_{k}) 
$$
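The general addition law can be checked exactly on a die roll. This sketch assumes three illustrative, pairwise disjoint events on a fair die; it first confirms that no two of them overlap and then compares the probability of their union with the sum of their probabilities.

```python
from fractions import Fraction

omega = set(range(1, 7))   # fair die, equally likely outcomes

def P(event):
    """Exact probability of a subset of omega."""
    return Fraction(len(event & omega), len(omega))

# Three pairwise mutually exclusive events (illustrative choice)
A1, A2, A3 = {1}, {2, 3}, {6}
events = [A1, A2, A3]

# Check that every pair of events is disjoint
assert all(not (a & b) for i, a in enumerate(events) for b in events[i + 1:])

union = set().union(*events)
print(P(union), sum(P(a) for a in events))   # 2/3 2/3
```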

**Example**

We are interested in finding the probability that a randomly selected respondent is **either a liberal Democrat or a conservative Republican**. To do this, we will identify and use two mutually exclusive events based on the respondents' political party affiliations (`partyid`) and views (`polviews`).

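For the political views (`polviews`), the liberal and conservative codes are:

- 1 Extremely liberal
- 2 Liberal
- 3 Slightly liberal
- 5 Slightly conservative
- 6 Conservative
- 7 Extremely conservative

For party affiliation (`partyid`), the Democratic and Republican codes are:

- 0 Strong democrat
- 1 Not strong democrat
- 2 Independent, near democrat
- 4 Independent, near republican
- 5 Not strong republican
- 6 Strong republican

Note that the code below uses slightly different thresholds for party affiliation: `partyid <= 1` counts only codes 0 and 1 as Democrats (excluding code 2), while `partyid >= 4` also includes code 7 (Other party) among Republicans.
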
In [91]:
# Political views
liberal = (gss['polviews'] <= 3)        # codes 1-3
conservative = (gss['polviews'] >= 5)   # codes 5-7

# Party affiliations
democrat = (gss['partyid'] <= 1)        # codes 0-1
republican = (gss['partyid'] >= 4)      # codes 4-7 (7 = Other party)

# Intersections of partyid and polviews
liberal_democrat = liberal & democrat
conservative_republican = conservative & republican

# Mutually exclusive events: no respondent should belong to both sets
n_overlap = (liberal_democrat & conservative_republican).sum()
print("liberal democrat Intersection conservative republican = ", n_overlap)

liberal democrat Intersection conservative republican =  0


The probability of randomly selecting either a liberal democrat or a conservative republican is given by:

In [83]:
# Total probability of either a liberal democrat or a conservative republican
p = prob(liberal_democrat) + prob(conservative_republican)
print('Total Probability =', p)

Total Probability = 0.3445120714140799
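As a sanity check, the sum can be compared with the probability of the union computed directly with `|`; on the survey data that would be `prob(liberal_democrat | conservative_republican)`. Because the CSV file is not bundled here, the sketch below uses a randomly generated stand-in DataFrame `df` with the same `polviews` and `partyid` codes (synthetic data, so its numbers say nothing about the survey).

```python
import numpy as np
import pandas as pd

def prob(A):
    """Fraction of True values in a boolean Series."""
    return A.mean()

# Synthetic stand-in for the survey (assumption): random codes, not GSS data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "polviews": rng.integers(1, 8, size=1_000),   # codes 1..7
    "partyid": rng.integers(0, 8, size=1_000),    # codes 0..7
})

liberal_democrat = (df["polviews"] <= 3) & (df["partyid"] <= 1)
conservative_republican = (df["polviews"] >= 5) & (df["partyid"] >= 4)

# The events are mutually exclusive, so P(union) equals the sum
p_sum = prob(liberal_democrat) + prob(conservative_republican)
p_union = prob(liberal_democrat | conservative_republican)
print(p_sum, p_union)
```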


### **Addition Law for Conditional Probability**

If $A_1, \cdots, A_n$ are mutually exclusive events with union $A = \bigcup_{k=1}^{n}A_{k}$, and $B$ is an event with $P(B) > 0$, then the addition law for conditional probability is

$$P(A|B) = \sum_{k=1}^{n} P(A_k|B).$$

$\textbf{proof}:$
  
$$A \cap B = \bigcup_{k=1}^{n}(A_{k}\cap B) $$
that is, $A \cap B$ is the union of the intersections $A_k \cap B$ for $k = 1, 2, \ldots, n$. These intersections are mutually exclusive because the $A_k$ are, so by the **addition law**:

$$P(\bigcup_{k=1}^{n}(A_{k}\cap B)) = \sum_{k=1}^{n} P(A_{k}\cap B) $$

and dividing  by $P(B)$ we get:

$$\frac{P(\bigcup_{k=1}^{n}(A_{k}\cap B))}{P(B)} = \sum_{k=1}^{n} \frac{P(A_{k}\cap B)}{P(B)} $$

then
$$P(A|B) = \sum_{k=1}^{n} P(A_k|B) ~~~~\square$$
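The result can also be checked numerically. The sketch assumes a synthetic DataFrame with random `age` and `partyid` values in place of the survey, and uses the same two age groups as the example that follows: the conditional probability of the union should match the sum of the conditional probabilities.

```python
import numpy as np
import pandas as pd

def conditional(proposition, given):
    """P(proposition | given): mean of proposition over rows where given is True."""
    return proposition[given].mean()

# Synthetic stand-in data (assumption): random ages and party codes
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=2_000),
    "partyid": rng.integers(0, 8, size=2_000),
})

B = df["partyid"] <= 1                        # conditioning event
A1 = df["age"] < 30                           # two mutually exclusive
A2 = (df["age"] >= 30) & (df["age"] <= 40)    # age groups
A = A1 | A2                                   # their union

lhs = conditional(A, given=B)
rhs = conditional(A1, given=B) + conditional(A2, given=B)
print(lhs, rhs)
```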

**Example**

In a survey, respondents were asked about their age and their political party affiliation. To conduct this analysis, we focus on two age groups: those younger than 30 years old, and those between 30 and 40 years old. What is the probability that a respondent falls into one of two age categories, given that they identify as a Democrat?


Here, we take the union of two mutually exclusive events $A_1$ and $A_2$ to create a new event that we call "age group". This event $A$ includes all outcomes in which a respondent's age is less than 30 or between 30 and 40, and it can be formally defined as

$$
A = A_1 \cup A_2
$$

where
- $A_1$: The event where the respondent is younger than 30 years old
- $A_2$: The event where the respondent is between 30 and 40 years old

Using the addition law for conditional probability, where $B$ is the event that the respondent identifies as a Democrat, we get the following result:

In [98]:
# conditional() is defined earlier in the notebook
democrat = (gss['partyid'] <= 1)
less_30 = (gss['age'] < 30)
between_30_40 = (gss['age'] >= 30) & (gss['age'] <= 40)

# Mutually exclusive events: no respondent belongs to both age groups
print("Intersection age sets = {} \n".format((less_30 & between_30_40).sum()))

# Addition law for conditional probability: P(A1 or A2 | B) = P(A1|B) + P(A2|B)
p = conditional(less_30, given = democrat) + conditional(between_30_40, given = democrat)
print("P(age group|democrat) = ", p)

Intersection age sets = 0 

P(age group|democrat) =  0.39638841189829943


## **Law of Total Probability**

Suppose we have a complete set of mutually exclusive and exhaustive events $B_1, \cdots, B_n$, meaning that exactly one of these events occurs in any trial and their union covers the entire sample space. Assuming each $P(B_k) > 0$, we can find the ordinary probability of an event $A$ using the total probability formula:

$$P(A) = \sum_k P(A|B_k)P(B_k)$$

To prove this, let's first define mutually exclusive and exhaustive events more precisely. Mutually exclusive events, as we saw before, are events that cannot occur simultaneously. Exhaustive events, on the other hand, are events that together cover the entire sample space $\Omega$. When a set of events is both mutually exclusive and exhaustive, it covers all possible outcomes without overlapping.

To better understand mutually exclusive and exhaustive events, consider the example of rolling a fair six-sided die. The sample space for this experiment is the set of outcomes $\{1, 2, 3, 4, 5, 6\}$. We can define the events as follows:

- Event $A_1$: The die shows an odd number (outcomes: $\{1, 3, 5\}$)
- Event $A_2$: The die shows an even number (outcomes: $\{2, 4, 6\}$)

The events $A_1$ and $A_2$ are mutually exclusive, because no outcome can be both odd and even, so the intersection of $A_1$ and $A_2$ is the empty set. Furthermore, these events are exhaustive, because together they cover the entire sample space (every possible outcome is either odd or even). Thus, the set $\{A_1, A_2\}$ is both mutually exclusive and exhaustive.
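Whether a set of boolean masks is mutually exclusive and exhaustive can be tested by counting, for each outcome, how many masks contain it: a partition requires exactly one. `is_partition` below is a hypothetical helper written for this sketch; the die outcomes reproduce the odd/even example above, and the same helper could be applied to masks built from a DataFrame column.

```python
import numpy as np

def is_partition(events):
    """True if every outcome belongs to exactly one of the boolean masks in `events`."""
    membership = np.sum([np.asarray(e, dtype=int) for e in events], axis=0)
    return bool(np.all(membership == 1))

# Outcomes of a six-sided die and the two events from the text
outcomes = np.arange(1, 7)
A1 = outcomes % 2 == 1   # odd: {1, 3, 5}
A2 = outcomes % 2 == 0   # even: {2, 4, 6}

print(is_partition([A1, A2]))             # True: exclusive and exhaustive
print(is_partition([A1, outcomes <= 2]))  # False: 1 is in both; 4 and 6 in neither
```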

$\textbf{proof}:$


Let $\Omega$ be the sample space, and let $B_1, \cdots, B_n$ be mutually exclusive and exhaustive events. Since their union covers the entire sample space:

$$\bigcup_k B_k= \Omega$$

Consequently, event $A$ can be expressed as the union of its intersections with each of the mutually exclusive events $B_k$:

$$A = \bigcup_k (A \cap B_k)$$

Since the $B_k$ are mutually exclusive, so are the events $A \cap B_k$, and we can apply the addition rule of probability:
$$P(A) = P(\bigcup_k (A \cap B_k)) = \sum_k P(A \cap B_k)$$

Now we can rewrite the probability of the intersection using conditional probability:

$$P(A) = P(\bigcup_k (A \cap B_k)) =  \sum_k P(A \cap B_k) = \sum_k \frac{P(A \cap B_k)}{P(B_k)} P(B_k) = \sum_k  P(A |B_k)P(B_k)~~~~ \square$$

This proof demonstrates how we can find the probability of an event $A$ occurring by considering its relationship with a set of mutually exclusive and exhaustive events $B_1, \cdots, B_n$.
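The formula translates directly into a small function. In this sketch `total_probability` is a hypothetical helper, and the data are synthetic (a random event `A` and a random three-level grouping variable), so only the agreement between the two printed values matters: the direct estimate of $P(A)$ and the weighted sum over the partition.

```python
import numpy as np
import pandas as pd

def prob(A):
    """Fraction of True values in a boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """P(proposition | given), restricted to rows where given is True."""
    return proposition[given].mean()

def total_probability(A, partition):
    """Law of total probability: sum_k P(A|B_k) P(B_k) over boolean masks B_k."""
    return sum(conditional(A, given=B) * prob(B) for B in partition)

# Synthetic data (assumption): an event A and a 3-level categorical variable
rng = np.random.default_rng(2)
group = pd.Series(rng.integers(0, 3, size=500))
A = pd.Series(rng.random(500) < 0.3)

partition = [group == g for g in range(3)]   # mutually exclusive and exhaustive
print(prob(A), total_probability(A, partition))
```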

**Example**

Compute the probability that a respondent is a banker using the law of total probability.

Utilizing the ordinary probability, we get:

$$P(\text{banker}) = \frac{N(\text{banker})}{N}$$

In [101]:
banker = (gss['indus10'] == 6870)

print("P(banker) = ", prob(banker))

P(banker) =  0.014769730168391155


To apply the law of total probability, we need a set of mutually exclusive and exhaustive events that partitions the sample space. Here, the 'male' and 'female' categories fulfill this requirement, allowing us to express the probability as a sum:

$$P(\text{banker}) = P(\text{banker}|\text{male})P(\text{male}) + P(\text{banker}|\text{female})P(\text{female}) $$

In [103]:
male = (gss['sex'] == 1)
female = (gss['sex'] == 2)

p = conditional(banker, given= male)*prob(male) + conditional(banker, given = female)*prob(female)
print("P(banker) = ", p)

P(banker) =  0.014769730168391153


We can further extend our analysis by partitioning the sample space into more refined categories using the `polviews` feature, which represents the political views of the respondents. This feature contains seven distinct values, each corresponding to a specific political viewpoint.

To implement this, we will compute the total probability of a respondent being a banker by summing over all the conditional probabilities of being a banker given each political view, weighted by the probability of that political view:

$$P(\text{banker}) = \sum^{7}_{i=1}  P(\text{banker} |\text{polviews} = i)P(\text{polviews} = i)$$
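The same weighted sum can also be written without an explicit loop, using pandas `groupby` for $P(\text{banker}|\text{polviews}=i)$ and `value_counts(normalize=True)` for $P(\text{polviews}=i)$. This is an alternative sketch, run here on a synthetic stand-in DataFrame (random codes and a made-up `banker` column), not on the GSS file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (assumption): random political-view codes and banker flags
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "polviews": rng.integers(1, 8, size=5_000),
    "banker": rng.random(5_000) < 0.015,
})

# P(banker | polviews = i) for every i at once, and P(polviews = i)
p_banker_given_view = df.groupby("polviews")["banker"].mean()
p_view = df["polviews"].value_counts(normalize=True)

# Weighted sum over the partition; pandas aligns both Series on the code values
p_total = (p_banker_given_view * p_view).sum()
print(p_total, df["banker"].mean())
```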

In [104]:
polviews = gss['polviews']

polviews.value_counts().sort_index()

polviews
1.0     1442
2.0     5808
3.0     6243
4.0    18943
5.0     7940
6.0     7319
7.0     1595
Name: count, dtype: int64

In [107]:
p = sum( prob(polviews == i)*conditional(banker, given = (polviews == i))
    for i in range(1, 8)) 

print("P(banker) = ", p)

P(banker) =  0.014769730168391157


## **Bayes' Rule**



Given two events A and B, Bayes' theorem relates the conditional probability of A given B ($P(A|B)$) to the conditional probability of B given A ($P(B|A)$), along with the individual probabilities of A ($P(A)$) and B ($P(B)$):

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} = \frac{P(B|A) \cdot P(A)}{\sum_k P(B|A_k)P(A_k)} $$

where:

- $P(A|B)$: is the **posterior probability**. It represents the probability of event A happening, given that event B has occurred. This is what we're trying to find using Bayes' theorem. It reflects our updated belief about event A after taking into account the new information provided by event B.

- $P(B|A)$: is the **likelihood**. It represents the probability of event B happening, given that event A has occurred. This is often a known value or can be estimated from available data. It tells us how likely it is to observe event B when event A is true.

- $P(A)$: is the **prior probability**. It represents the probability of event A happening before taking into account any new information from event B. This is our initial belief about event A and can be based on previous data, expert opinion, or assumptions.

- $P(B)$: is the **marginal probability or evidence**. It represents the overall probability of event B happening, regardless of whether event A happens or not. This value can be calculated using the law of total probability, which takes into account both the probabilities of event A and its complement, event $\neg A$ (i.e., event A not happening). Specifically, if you have a finite set of mutually exclusive and exhaustive events $A_1, \cdots, A_n$, then the probability of event B can be expressed as:
  $$P(B) = \sum_k P(B|A_k)P(A_k)$$
  - **Mutually exclusive:** The events $A_k$ do not occur simultaneously. For any pair of events $A_i$ and $A_j$, if $i \neq j$, then $P(A_i \cap A_j) = 0$.
  - **Exhaustive:** The union of all events $A_k$ covers the entire sample space, meaning that at least one of the events $A_k$ must occur. Mathematically, $\bigcup_k A_k = \Omega$, where $\Omega$ is the sample space.
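Computationally, Bayes' rule over a partition amounts to multiplying priors by likelihoods and normalizing by their sum. The function `posterior` below is a hypothetical helper written for this sketch, and the prior and likelihood values in the usage line are illustrative numbers, not estimates from the survey.

```python
import numpy as np

def posterior(prior, likelihood):
    """Bayes' rule over mutually exclusive, exhaustive events A_1..A_n.

    prior[k] = P(A_k), likelihood[k] = P(B | A_k); returns P(A_k | B) for every k.
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = likelihood * prior     # P(B|A_k) P(A_k) = P(A_k ∩ B)
    evidence = joint.sum()         # P(B), by the law of total probability
    return joint / evidence

# Illustrative numbers (assumption): two events with equal priors
post = posterior(prior=[0.5, 0.5], likelihood=[0.8, 0.3])
print(post)   # approximately [0.727 0.273]
```

Here the joint terms are $0.4$ and $0.15$, the evidence is $P(B) = 0.55$, and the posteriors are $8/11$ and $3/11$.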

The main idea behind Bayes' theorem is to update our belief about event A (the prior probability) using the new information provided by event B (the likelihood). The result is the posterior probability, which reflects our revised belief about event A after considering the occurrence of event B. Note that Bayes' rule itself makes no assumptions about the relationship between A and B, other than requiring $P(A) > 0$ and $P(B) > 0$; it is a general formula applicable to any such pair of events.

$\textbf{proof}:$

For event A given event B, the conditional probability is defined as:

$$P(A|B) = \frac{P(A \bigcap B)}{P(B)}$$

Likewise, the conditional probability of event B given event A is expressed as:
$$P(B|A) = \frac{P(A \bigcap B)}{P(A)}$$

Our objective is to derive Bayes' theorem, which connects $P(A|B)$ and $P(B|A)$. To achieve this, we first isolate $P(A \cap B)$ in the second equation:
$$P(A \bigcap B) = P(B|A) \cdot P(A)$$

Next, substitute this expression for $P(A \cap B)$ into the first equation:
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}~~~~\square$$

This is Bayes' theorem. It links the conditional probability of A given B with the conditional probability of B given A, factoring in the individual probabilities of A and B. The product $P(B|A)P(A)$ in the numerator is the joint probability $P(A \cap B)$. When $A$ is one of the mutually exclusive and exhaustive events $A_1, \cdots, A_n$, the denominator $P(B)$ can be expanded with the law of total probability, which gives the second form of the theorem: the joint probability $P(B|A)P(A)$ divided by the sum of the joint probabilities $P(B|A_k)P(A_k)$ over all events in the partition.
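Bayes' theorem can also be verified empirically with `prob` and `conditional` helpers like those defined earlier in the notebook (redefined here so the block runs on its own). The sketch uses synthetic, deliberately correlated boolean events `A` and `B`; on the survey data the analogous check would compare `conditional(democrat, given=liberal)` with `conditional(liberal, given=democrat) * prob(democrat) / prob(liberal)`.

```python
import numpy as np
import pandas as pd

def prob(A):
    """Fraction of True values in a boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """P(proposition | given), restricted to rows where given is True."""
    return proposition[given].mean()

# Synthetic stand-in (assumption): B is more likely when A is True
rng = np.random.default_rng(4)
A = pd.Series(rng.random(10_000) < 0.4)
B = pd.Series(np.where(A, rng.random(10_000) < 0.7, rng.random(10_000) < 0.2))

# Left side: P(A|B) computed directly; right side: Bayes' rule
direct = conditional(A, given=B)
bayes = conditional(B, given=A) * prob(A) / prob(B)
print(direct, bayes)
```

The two values agree up to floating-point rounding, because both reduce to the same ratio of counts $N(A \cap B)/N(B)$.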