# Probability Theorems

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
df = df.dropna()
df.head()

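|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---------|--------|----------------|---------------|-------------------|-------------|-----|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |
| 5 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | MALE |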


In [3]:
df.shape

(333, 7)

In [4]:
set(df.species), set(df.island), set(df.sex)

({'Adelie', 'Chinstrap', 'Gentoo'},
 {'Biscoe', 'Dream', 'Torgersen'},
 {'FEMALE', 'MALE'})

In [5]:
## Fraction of Adelie species
adelie = (df['species'] == 'Adelie')
adelie.head()

0    True
1    True
2    True
4    True
5    True
Name: species, dtype: bool

In [6]:
adelie.sum()

146

In [7]:
# The mean of a Boolean Series is the fraction of True values,
# so this is the fraction of penguins that are Adelie
adelie.mean()

0.43843843843843844

In [8]:
def prob(A):
    '''Probability of A.

    A: a Boolean Series of True and False values.
    Returns the fraction of True values.
    '''
    return A.mean()

In [9]:
prob(adelie)

0.43843843843843844

## Conjunction

The conjunction `A & B` is true only where both A and B are true.

In [10]:
## Adelie and Female
female = (df['sex'] == 'FEMALE')
female.head()

0    False
1     True
2     True
4     True
5    False
Name: sex, dtype: bool

In [11]:
prob(female)

0.4954954954954955

In [12]:
prob(adelie & female)

0.21921921921921922

In [13]:
prob(female & adelie)

0.21921921921921922

In [14]:
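# If species and sex were independent, P(adelie) * P(female) would equal
# P(adelie & female) = 0.2192. The product is close but not identical,
# so the two are not exactly independent in this sample.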
prob(adelie) * prob(female)

0.21724427129832535
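The same comparison can be sketched on a small hand-made sample. The two Series below are hypothetical toy data, not taken from the penguins dataset, chosen so that the joint probability and the product of the marginals clearly differ.

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

# Hypothetical toy data (an assumption for illustration only).
a = pd.Series([True, True, True, False, False, False, False, False])
b = pd.Series([True, True, False, True, False, False, False, False])

joint = prob(a & b)          # P(A & B): both True in 2 of 8 rows
product = prob(a) * prob(b)  # P(A) P(B): equals the joint only under independence

print(joint, product)
```

When the two numbers differ, as they do here, A and B are not independent in the sample.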

## Conditional probability
What is the probability that a penguin is female, given that it is an Adelie?

In [15]:
female[adelie].head()

0    False
1     True
2     True
4     True
5    False
Name: sex, dtype: bool

In [16]:
adelie[female].head()

1     True
2     True
4     True
6     True
12    True
Name: species, dtype: bool

In [17]:
# number of Adelie
adelie_df = df[df.species == 'Adelie']
adelie_df.shape

(146, 7)

In [18]:
## female in adelie
female_adelie = adelie_df[adelie_df.sex == 'FEMALE']
female_adelie.shape

(73, 7)

In [19]:
female_adelie.shape[0] / adelie_df.shape[0]

0.5

In [20]:
prob(female[adelie])

0.5

In [21]:
prob(adelie[female])

0.44242424242424244

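So the probability of A given B can be computed as `prob(A[B])`: the Boolean Series B selects the rows where B is true, and the mean of A over those rows is the conditional probability. The order matters: `prob(female[adelie])` and `prob(adelie[female])` give different results.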
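One way to package this pattern is a small helper. The name `conditional` and the toy Series below are assumptions made for illustration; they do not appear in the notebook above.

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """P(proposition | given): fraction of True among rows where `given` is True."""
    return prob(proposition[given])

# Hypothetical toy data, not taken from the penguins dataset.
is_adelie = pd.Series([True, True, True, True, False, False, False, False])
is_female = pd.Series([True, True, False, False, True, True, True, False])

p_female_given_adelie = conditional(is_female, given=is_adelie)  # 2 of 4 Adelie rows
p_adelie_given_female = conditional(is_adelie, given=is_female)  # 2 of 5 female rows

print(p_female_given_adelie, p_adelie_given_female)
```

The keyword argument `given` is only there to make the direction of the conditioning harder to mix up.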

## Some laws

$$P(A|B) = \frac{P(A \& B)}{P(B)}$$

In [22]:
prob(female[adelie])

0.5

In [23]:
prob(adelie)

0.43843843843843844

In [24]:
prob(female & adelie)

0.21921921921921922

In [25]:
# Compare floats with a tolerance rather than exact equality
np.isclose(prob(female[adelie]), prob(female & adelie) / prob(adelie))

True

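Multiplying both sides by $P(B)$ gives the product rule:

$$P(A \& B) = P(B) P(A|B)$$

Swapping the roles of A and B gives:

$$P(B \& A) = P(A) P(B|A)$$

Conjunction is commutative:

$$P(A \& B) = P(B \& A)$$

Substituting these into the definition of conditional probability yields Bayes's theorem: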

$$\begin{aligned}  P(A|B) &= \frac{P(A \& B)}{P(B)} \\ &= \frac{P(B \& A)}{P(B)} \\ &= \frac{P(A) P(B|A)}{P(B)}\end{aligned}$$
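The final identity can be checked numerically. This is a minimal sketch on hypothetical toy data (not the penguins dataset); the names `a`, `b`, `lhs`, and `rhs` are introduced here for illustration.

```python
import math
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

# Hypothetical toy data (an assumption for illustration only).
a = pd.Series([True, True, True, True, False, False, False, False])
b = pd.Series([True, True, False, False, True, True, True, False])

lhs = prob(a[b])                      # P(A|B), computed directly
rhs = prob(a) * prob(b[a]) / prob(b)  # P(A) P(B|A) / P(B)

# Compare with a tolerance, since both sides are floating-point values.
print(math.isclose(lhs, rhs))
```

Both sides are computed from the same sample, so they should agree up to floating-point rounding.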