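## Bayesian Network: the Student Model
A student is described by five variables: grade (G), course difficulty (D), intelligence (I), SAT score (S), and recommendation letter (L). A directed acyclic graph (DAG) captures how these variables relate: the grade depends on course difficulty and intelligence, the SAT score depends on intelligence, and the recommendation letter depends on the grade. The corresponding graphical model is shown below: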

![TIM%E6%88%AA%E5%9B%BE20181025143651.png](attachment:TIM%E6%88%AA%E5%9B%BE20181025143651.png)
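
Read off the graph, each variable is conditioned only on its parents, so the joint distribution factorizes as follows (the single-letter names are the ones used in the code cells of this notebook):

$$
P(D, I, G, S, L) = P(D)\,P(I)\,P(G \mid I, D)\,P(S \mid I)\,P(L \mid G)
$$

Each factor on the right-hand side corresponds to one conditional probability table defined for the model.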

In [2]:
# Note: newer pgmpy releases rename BayesianModel to BayesianNetwork
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

In [3]:
# Define the Bayesian network structure by listing its directed edges
model = BayesianModel([('D', 'G'), ('I', 'G'), ('G', 'L'), ('I', 'S')])

# Define the conditional probability distributions (CPDs)
# One row per state: shape (variable_card, 1) for a variable without parents
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6], [0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7], [0.3]])

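# variable: name of the variable the CPD describes
# variable_card: cardinality (number of states) of that variable
# values: probability table, one row per state of the variable and
#         one column per combination of parent states
# evidence: parent variables the CPD is conditioned on
# evidence_card: cardinalities of the parent variables, in the same order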
cpd_g = TabularCPD(variable='G', variable_card=3, 
                   values=[[0.3, 0.05, 0.9,  0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7,  0.02, 0.2]],
                   evidence=['I', 'D'],
                   evidence_card=[2, 2])

cpd_l = TabularCPD(variable='L', variable_card=2, 
                   values=[[0.1, 0.4, 0.99],
                           [0.9, 0.6, 0.01]],
                   evidence=['G'],
                   evidence_card=[3])

cpd_s = TabularCPD(variable='S', variable_card=2,
                   values=[[0.95, 0.2],
                           [0.05, 0.8]],
                   evidence=['I'],
                   evidence_card=[2])


In [4]:
# Attach the CPDs to the directed acyclic graph
model.add_cpds(cpd_d, cpd_i, cpd_g, cpd_l, cpd_s)

In [5]:
# Validate the model: check the network structure against the CPDs and
# verify that every CPD is well defined and sums to 1
model.check_model()

True
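
To make the "sums to 1" part of that check concrete, here is a minimal sketch in plain Python. It only covers column normalization and is not how `check_model` is implemented; the helper name `columns_sum_to_one` is hypothetical, and the tables are copied from the CPD definitions above.

```python
import math

# CPD tables copied from the cells above. Each inner list is one row (one state
# of the variable); each column is one combination of parent states.
cpd_tables = {
    "D": [[0.6], [0.4]],
    "I": [[0.7], [0.3]],
    "G": [[0.3, 0.05, 0.9, 0.5],
          [0.4, 0.25, 0.08, 0.3],
          [0.3, 0.7, 0.02, 0.2]],
    "L": [[0.1, 0.4, 0.99],
          [0.9, 0.6, 0.01]],
    "S": [[0.95, 0.2],
          [0.05, 0.8]],
}


def columns_sum_to_one(table, tol=1e-9):
    """Return True if every column of a CPD table sums to 1 (within tol)."""
    return all(math.isclose(sum(col), 1.0, abs_tol=tol) for col in zip(*table))


# Map each variable name to whether its table is properly normalized.
normalized = {name: columns_sum_to_one(t) for name, t in cpd_tables.items()}
print(normalized)
```

A column that does not sum to 1, such as `[[0.5], [0.4]]`, makes the helper return `False`.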

In [6]:
# List all CPDs attached to the model
model.get_cpds()

[<TabularCPD representing P(D:2) at 0x23cf3a4b470>,
 <TabularCPD representing P(I:2) at 0x23cf3a4b518>,
 <TabularCPD representing P(G:3 | I:2, D:2) at 0x23cf3a4b4e0>,
 <TabularCPD representing P(L:2 | G:3) at 0x23cf1a1d0b8>,
 <TabularCPD representing P(S:2 | I:2) at 0x23cf3a4b438>]

In [7]:
# Conditional probability table of node G
print(model.get_cpds('G'))

╒═════╤═════╤══════╤══════╤═════╕
│ I   │ I_0 │ I_0  │ I_1  │ I_1 │
├─────┼─────┼──────┼──────┼─────┤
│ D   │ D_0 │ D_1  │ D_0  │ D_1 │
├─────┼─────┼──────┼──────┼─────┤
│ G_0 │ 0.3 │ 0.05 │ 0.9  │ 0.5 │
├─────┼─────┼──────┼──────┼─────┤
│ G_1 │ 0.4 │ 0.25 │ 0.08 │ 0.3 │
├─────┼─────┼──────┼──────┼─────┤
│ G_2 │ 0.3 │ 0.7  │ 0.02 │ 0.2 │
╘═════╧═════╧══════╧══════╧═════╛


In [8]:
# Get the local independencies of every node in the Bayesian network
model.local_independencies(['D', 'I', 'S', 'G', 'L'])

(D _|_ S, I, G, L)
(I _|_ S, D, G, L)
(S _|_ D, G, L | I)
(G _|_ S, L | D, I)
(L _|_ S, D, I | G)
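
Under the textbook definition, each node is independent of its non-descendants given its parents. The sketch below computes that directly from the edge list in plain Python; the helper names are hypothetical. For D, I and G it yields smaller sets than the printed output above, which also lists descendants (for example G and L for D). Those extra entries do not hold in this model, since observing D changes the distribution of G. The difference may come from the pgmpy version that produced the output, which is an assumption.

```python
# Edge list and node order copied from the model definition above.
edges = [("D", "G"), ("I", "G"), ("G", "L"), ("I", "S")]
nodes = ["D", "I", "S", "G", "L"]


def parents(node):
    """Direct parents of `node`."""
    return {u for u, v in edges if v == node}


def descendants(node):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for u, v in edges:
            if u == current and v not in found:
                found.add(v)
                stack.append(v)
    return found


def local_independence(node):
    """Return (non-descendants other than the parents, parents) for `node`."""
    pa = parents(node)
    independent_of = set(nodes) - {node} - descendants(node) - pa
    return independent_of, pa


for n in nodes:
    indep, pa = local_independence(n)
    given = f" | {', '.join(sorted(pa))}" if pa else ""
    print(f"({n} _|_ {', '.join(sorted(indep))}{given})")
```

The statements for S and L match the printed output; the ones for D, I and G reduce to `(D _|_ I, S)`, `(I _|_ D)` and `(G _|_ S | D, I)`.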

In [9]:
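# Marginal distribution of G: all other variables are marginalized out
from pgmpy.inference import VariableElimination
infer = VariableElimination(model)
# In the pgmpy version used here, query() returns a dict keyed by variable name;
# newer releases may return the factor directly
print(infer.query(['G'])['G'])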


╒═════╤══════════╕
│ G   │   phi(G) │
╞═════╪══════════╡
│ G_0 │   0.3620 │
├─────┼──────────┤
│ G_1 │   0.2884 │
├─────┼──────────┤
│ G_2 │   0.3496 │
╘═════╧══════════╛
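
The marginal above can be reproduced by hand. Since S and L are not ancestors of G, they sum out to 1, leaving P(G) = Σ over i, d of P(i) P(d) P(G | i, d). The sketch below is plain Python, not pgmpy's algorithm; the function name `marginal_g` is hypothetical and the numbers are copied from the CPDs above.

```python
p_i = [0.7, 0.3]  # P(I)
p_d = [0.6, 0.4]  # P(D)
# P(G | I, D): rows are G_0..G_2; columns are ordered
# (I_0, D_0), (I_0, D_1), (I_1, D_0), (I_1, D_1), as in the printed table.
p_g_given_id = [
    [0.3, 0.05, 0.9, 0.5],
    [0.4, 0.25, 0.08, 0.3],
    [0.3, 0.7, 0.02, 0.2],
]


def marginal_g():
    """Sum P(i) * P(d) * P(g | i, d) over all parent states, for each g."""
    result = []
    for row in p_g_given_id:
        total = 0.0
        for i in range(2):
            for d in range(2):
                total += p_i[i] * p_d[d] * row[i * 2 + d]
        result.append(total)
    return result


p_g = marginal_g()
print([round(p, 4) for p in p_g])  # [0.362, 0.2884, 0.3496]
```

The three values agree with the `phi(G)` column printed by `VariableElimination`.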


In [10]:
# Distribution of grade G for an intelligent student (I=1) in a course with
# difficulty D=0; the SAT score S is not part of the evidence here
print(infer.query(['G'], evidence={'D': 0, 'I': 1})['G'])

╒═════╤══════════╕
│ G   │   phi(G) │
╞═════╪══════════╡
│ G_0 │   0.9000 │
├─────┼──────────┤
│ G_1 │   0.0800 │
├─────┼──────────┤
│ G_2 │   0.0200 │
╘═════╧══════════╛


In [11]:
# Conditional probability table of the recommendation letter L given grade G
print(model.get_cpds('L'))

╒═════╤═════╤═════╤══════╕
│ G   │ G_0 │ G_1 │ G_2  │
├─────┼─────┼─────┼──────┤
│ L_0 │ 0.1 │ 0.4 │ 0.99 │
├─────┼─────┼─────┼──────┤
│ L_1 │ 0.9 │ 0.6 │ 0.01 │
╘═════╧═════╧═════╧══════╛


In [12]:
# Marginal distribution of L with no evidence
print(infer.query(['L'])['L'])

╒═════╤══════════╕
│ L   │   phi(L) │
╞═════╪══════════╡
│ L_0 │   0.4977 │
├─────┼──────────┤
│ L_1 │   0.5023 │
╘═════╧══════════╛


In [15]:
# Distribution of L given D=0 and I=1 (G is unobserved and marginalized out)
print(infer.query(['L'], evidence={'D': 0, 'I': 1})['L'])

╒═════╤══════════╕
│ L   │   phi(L) │
╞═════╪══════════╡
│ L_0 │   0.1418 │
├─────┼──────────┤
│ L_1 │   0.8582 │
╘═════╧══════════╛
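
This result can be checked by hand: L depends on D and I only through G, so P(L | D=0, I=1) = Σ over g of P(g | I=1, D=0) P(L | g). The sketch below is plain Python with values copied from the CPD tables above; it is an illustration of the sum, not pgmpy's implementation.

```python
# P(G | I=1, D=0): the (I_1, D_0) column of G's CPD
p_g_given_evidence = [0.9, 0.08, 0.02]
# P(L | G): rows are L_0, L_1; columns are G_0, G_1, G_2
p_l_given_g = [
    [0.1, 0.4, 0.99],
    [0.9, 0.6, 0.01],
]

# Weight each column of P(L | G) by the probability of the corresponding grade.
p_l = [
    sum(pg * pl for pg, pl in zip(p_g_given_evidence, row))
    for row in p_l_given_g
]
print([round(p, 4) for p in p_l])  # [0.1418, 0.8582]
```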


In [16]:
# With G observed, L no longer depends on D and I:
# the result equals the G_0 column of L's CPD
print(infer.query(['L'], evidence={'D': 0, 'I': 1, 'G': 0})['L'])

╒═════╤══════════╕
│ L   │   phi(L) │
╞═════╪══════════╡
│ L_0 │   0.1000 │
├─────┼──────────┤
│ L_1 │   0.9000 │
╘═════╧══════════╛


In [17]:
# Same evidence on D and I, but G=1: the result equals the G_1 column of L's CPD
print(infer.query(['L'], evidence={'D': 0, 'I': 1, 'G': 1})['L'])

╒═════╤══════════╕
│ L   │   phi(L) │
╞═════╪══════════╡
│ L_0 │   0.4000 │
├─────┼──────────┤
│ L_1 │   0.6000 │
╘═════╧══════════╛
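
The last two queries return columns of L's CPD unchanged, which is the conditional independence (L _|_ D, I | G) at work. A minimal way to confirm it without pgmpy is inference by brute-force enumeration of the joint distribution, sketched below. The names `joint` and `query` are hypothetical, and this exhaustive approach is only practical for very small networks; it is not the variable elimination algorithm used above.

```python
from itertools import product

# CPD values copied from the model definition above.
p_d = [0.6, 0.4]                      # P(D)
p_i = [0.7, 0.3]                      # P(I)
p_g = [[0.3, 0.05, 0.9, 0.5],         # P(G | I, D), indexed [g][i * 2 + d]
       [0.4, 0.25, 0.08, 0.3],
       [0.3, 0.7, 0.02, 0.2]]
p_l = [[0.1, 0.4, 0.99],              # P(L | G), indexed [l][g]
       [0.9, 0.6, 0.01]]
p_s = [[0.95, 0.2],                   # P(S | I), indexed [s][i]
       [0.05, 0.8]]

CARD = {"D": 2, "I": 2, "G": 3, "S": 2, "L": 2}
ORDER = ["D", "I", "G", "S", "L"]


def joint(a):
    """Joint probability of a full assignment, as the product of the CPD entries."""
    d, i, g, s, l = (a[v] for v in ORDER)
    return p_d[d] * p_i[i] * p_g[g][i * 2 + d] * p_s[s][i] * p_l[l][g]


def query(target, evidence):
    """P(target | evidence), summing the joint over all consistent assignments."""
    scores = [0.0] * CARD[target]
    for values in product(*(range(CARD[v]) for v in ORDER)):
        a = dict(zip(ORDER, values))
        if all(a[var] == val for var, val in evidence.items()):
            scores[a[target]] += joint(a)
    total = sum(scores)
    return [s / total for s in scores]


with_parents = query("L", {"D": 0, "I": 1, "G": 1})
grade_only = query("L", {"G": 1})
print([round(p, 4) for p in with_parents])  # [0.4, 0.6]
print([round(p, 4) for p in grade_only])    # [0.4, 0.6]
```

Both calls give the same distribution, so once the grade is known, course difficulty and intelligence carry no further information about the letter.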
