A Bayesian Network is a probabilistic graphical model that represents a set of variables
and their conditional dependencies through a Directed Acyclic Graph (DAG).

Bayesian networks are built from probability distributions and leverage
probability theory for tasks such as prediction, anomaly detection, diagnostics, and
decision-making under uncertainty. A network has two components:

Directed Acyclic Graph (DAG): Represents the structure of dependencies between
variables.

Conditional Probability Tables (CPTs): Specify the quantitative strength of those dependencies.
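
Taken together, the DAG and the CPTs define a joint distribution over all variables. The standard factorization is sketched below; the notation $\mathrm{Pa}(X_i)$ for the parents of $X_i$ in the DAG is introduced here and does not appear in the original text.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Each factor on the right corresponds to one CPT, so the number of parameters grows with the size of the parent sets rather than with the full joint table.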

**Elements of a Bayesian Network**

Nodes (or Vertices): Represent random variables in the domain being modeled. These can be:

* Observable variables (e.g., symptoms)
* Latent or hidden variables (e.g., disease presence)

Edges: Directed links between nodes that represent direct probabilistic dependencies.
An edge A → B makes A a parent of B, so the distribution of B is specified conditionally on A.

Conditional Probability Tables (CPTs): Every node is associated with a CPT that defines the probability distribution of that variable given its parent nodes. For a root node, which has no parents, the table reduces to an unconditional (prior) distribution.

Parent Nodes: Nodes that have edges pointing to a specific node.
A node’s value depends on its parent nodes.

Root Nodes: Nodes with no incoming edges.
They have no parents, so each is described by a prior distribution rather than a conditional one.

Leaf Nodes: Nodes with no outgoing edges.
Often represent outputs or final observed variables.

Network Structure: The overall connection pattern among the nodes.
This structure determines conditional independencies and supports efficient
probabilistic inference.
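
As a small illustration of these definitions, the sketch below derives parents, root nodes, and leaf nodes from a plain edge list. It uses only the standard library, and the edge list is the one from the alarm example used later in this document; the variable names are chosen here for illustration.

```python
# Edge list copied from the alarm example in this document.
edges = [
    ("Burglary", "Alarm"),
    ("Earthquake", "Alarm"),
    ("Alarm", "JohnCalls"),
    ("Alarm", "MaryCalls"),
]

nodes = sorted({node for edge in edges for node in edge})

# parents[n] lists the nodes with an edge pointing into n;
# children[n] lists the nodes that n points to.
parents = {n: [] for n in nodes}
children = {n: [] for n in nodes}
for src, dst in edges:
    parents[dst].append(src)
    children[src].append(dst)

roots = [n for n in nodes if not parents[n]]    # no incoming edges
leaves = [n for n in nodes if not children[n]]  # no outgoing edges

print("parents:", parents)
print("roots:", roots)
print("leaves:", leaves)
```

Here `Burglary` and `Earthquake` come out as roots, `JohnCalls` and `MaryCalls` as leaves, and `Alarm` is the only node with two parents.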

Q: In a home system, the alarm can be triggered by either a burglary or an earthquake. If
the alarm goes off, the system notifies two neighbors, John and Mary, who may call the
homeowner. Given that both John and Mary have called, what is the probability that a
burglary has actually occurred?
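
For reference, the query can be written out using Bayes' rule and the network factorization. The shorthand is introduced here: $b$, $e$, $a$ range over the states of Burglary, Earthquake, and Alarm, while $j$ and $m$ stand for the observed events JohnCalls = True and MaryCalls = True.

```latex
P(b \mid j, m) =
  \frac{P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\, P(j \mid a)\, P(m \mid a)}
       {\sum_{b'} P(b') \sum_{e} P(e) \sum_{a} P(a \mid b', e)\, P(j \mid a)\, P(m \mid a)}
```

The hidden variables Earthquake and Alarm are summed out, and the denominator normalizes over both burglary states.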

In [None]:
pip install pgmpy

In [None]:
from pgmpy.models import DiscreteBayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Step 1: Define the structure of the Bayesian Network (one tuple per directed edge)
model = DiscreteBayesianNetwork([
  ('Burglary', 'Alarm'),
  ('Earthquake', 'Alarm'),
  ('Alarm', 'JohnCalls'),
  ('Alarm', 'MaryCalls')
])

# Step 2: Define the CPDs (Conditional Probability Distributions)

# P(Burglary): state 0 = no burglary, state 1 = burglary
cpd_burglary = TabularCPD(variable='Burglary', variable_card=2,
                          values=[[0.999], [0.001]])

# P(Earthquake): state 0 = no earthquake, state 1 = earthquake
cpd_earthquake = TabularCPD(variable='Earthquake', variable_card=2,
                            values=[[0.998], [0.002]])

# P(Alarm | Burglary, Earthquake)
# Columns follow the evidence order, with Earthquake varying fastest:
# (B=0, E=0), (B=0, E=1), (B=1, E=0), (B=1, E=1)
cpd_alarm = TabularCPD(
  variable='Alarm',
  variable_card=2,
  values=[
    [0.999, 0.71, 0.06, 0.05],  # Alarm = False
    [0.001, 0.29, 0.94, 0.95]   # Alarm = True
  ],
  evidence=['Burglary', 'Earthquake'],
  evidence_card=[2, 2]
)

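# P(JohnCalls | Alarm)
# Columns: Alarm = 0, Alarm = 1.
# As written, John calls with probability 0.7 when Alarm = 0 and 0.1 when Alarm = 1.
cpd_john = TabularCPD(
  variable='JohnCalls',
  variable_card=2,
  values=[
    [0.3, 0.9],  # JohnCalls = False
    [0.7, 0.1]   # JohnCalls = True
  ],
  evidence=['Alarm'],
  evidence_card=[2]
)

# P(MaryCalls | Alarm)
# Columns: Alarm = 0, Alarm = 1.
# As written, Mary calls with probability 0.8 when Alarm = 0 and 0.01 when Alarm = 1.
cpd_mary = TabularCPD(
  variable='MaryCalls',
  variable_card=2,
  values=[
    [0.2, 0.99],  # MaryCalls = False
    [0.8, 0.01]   # MaryCalls = True
  ],
  evidence=['Alarm'],
  evidence_card=[2]
)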

# Step 3: Add CPDs to the model
model.add_cpds(cpd_burglary, cpd_earthquake, cpd_alarm, cpd_john, cpd_mary)

# Step 4: Verify the model
assert model.check_model(), "Model is incorrect"

# Step 5: Perform inference
inference = VariableElimination(model)

# Query: What is the probability of a burglary given that both John and Mary called?
result = inference.query(
  variables=['Burglary'],
  evidence={'JohnCalls': 1, 'MaryCalls': 1}  # state 1 = True
)
print(result)

+-------------+-----------------+
| Burglary    |   phi(Burglary) |
+=============+=================+
| Burglary(0) |          0.9999 |
+-------------+-----------------+
| Burglary(1) |          0.0001 |
+-------------+-----------------+
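
One way to cross-check this result without pgmpy is to sum the joint distribution by brute force. The sketch below is a minimal standard-library version under the assumption that state 1 means True for every variable. The names `bernoulli` and `posterior_burglary` are hypothetical helpers introduced here. The second call uses a hypothetical variant in which the two columns of each call CPD are swapped, purely for comparison.

```python
from itertools import product

# Probability of state 1 (True) for each root variable, as in the notebook.
P_BURGLARY = 0.001
P_EARTHQUAKE = 0.002

# P(Alarm=1 | Burglary, Earthquake), keyed by (burglary, earthquake).
P_ALARM = {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}


def bernoulli(p_true, state):
    """Return P(state) for a binary variable with P(state=1) = p_true."""
    return p_true if state == 1 else 1.0 - p_true


def posterior_burglary(p_john, p_mary):
    """P(Burglary=1 | JohnCalls=1, MaryCalls=1) by summing the joint.

    p_john[a] and p_mary[a] hold P(call=1 | Alarm=a) for a in (0, 1).
    """
    weight = {0: 0.0, 1: 0.0}
    for b, e, a in product((0, 1), repeat=3):
        joint = (
            bernoulli(P_BURGLARY, b)
            * bernoulli(P_EARTHQUAKE, e)
            * bernoulli(P_ALARM[(b, e)], a)
            * p_john[a]
            * p_mary[a]
        )
        weight[b] += joint
    return weight[1] / (weight[0] + weight[1])


# Columns as written in the notebook: index 0 is Alarm=0, index 1 is Alarm=1.
as_written = posterior_burglary(p_john=(0.7, 0.1), p_mary=(0.8, 0.01))

# Hypothetical variant with the two columns of each call CPD swapped.
swapped = posterior_burglary(p_john=(0.1, 0.7), p_mary=(0.01, 0.8))

print(f"as written: {as_written:.6f}")
print(f"swapped:    {swapped:.6f}")
```

With the tables as written, the enumeration gives roughly 0.00006, which rounds to the 0.0001 printed above and is lower than the 0.001 prior. With the columns swapped, the posterior comes out near 0.22, so two calls raise the probability of a burglary instead of lowering it.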


The `cpd_alarm` definition defines the **Conditional Probability Distribution (CPD)** for the variable **`Alarm`** given two parent variables: **`Burglary`** and **`Earthquake`**. This means you're specifying how likely the alarm is to go off (True or False) depending on whether there's a burglary and/or an earthquake.

---

### 📊 Format Explanation:

```python
values=[
    [0.999, 0.71, 0.06, 0.05],  # Alarm = False
    [0.001, 0.29, 0.94, 0.95]   # Alarm = True
],
evidence=['Burglary', 'Earthquake'],
evidence_card=[2, 2]
```

Here’s how to read this:

* Both `Burglary` and `Earthquake` are binary (0 = False, 1 = True)
* There are `2 x 2 = 4` possible combinations of `(Burglary, Earthquake)`, one per column
* Each column lists the probability of `Alarm = False` (first row) and `Alarm = True` (second row) for one combination, with `Earthquake` changing fastest from left to right

---

### 🧮 Breakdown of Each Column:

| Burglary | Earthquake | P(Alarm=False) | P(Alarm=True) |
| -------- | ---------- | -------------- | ------------- |
| 0        | 0          | 0.999          | 0.001         |
| 0        | 1          | 0.71           | 0.29          |
| 1        | 0          | 0.06           | 0.94          |
| 1        | 1          | 0.05           | 0.95          |
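
The column order in the table can be reproduced programmatically. The sketch below assumes the convention the table reflects, namely that the first entry in `evidence` varies slowest and the last varies fastest. `itertools.product` happens to enumerate tuples in the same order. The variable names are chosen here for illustration.

```python
from itertools import product

evidence = ["Burglary", "Earthquake"]
evidence_card = [2, 2]

# Row for Alarm = True, copied from cpd_alarm.
p_alarm_true = [0.001, 0.29, 0.94, 0.95]

# itertools.product varies the last position fastest, matching the table above.
columns = list(product(*(range(card) for card in evidence_card)))

for states, p in zip(columns, p_alarm_true):
    label = ", ".join(f"{name}={s}" for name, s in zip(evidence, states))
    print(f"P(Alarm=1 | {label}) = {p}")
```

Printing the labels next to the values is a quick way to catch a CPD whose columns were typed in a different order than intended.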

---

### 💡 Interpretation:

* If **no burglary and no earthquake**, the alarm almost never goes off (`0.001`)
* If **only earthquake**, alarm sometimes goes off (`0.29`)
* If **only burglary**, alarm often goes off (`0.94`)
* If **both**, alarm almost always goes off (`0.95`)

These probabilities capture the idea that the alarm is most sensitive to **burglary**, somewhat to **earthquake**, and very unlikely to go off randomly.

---


### 🧠 What does `cpd_mary` represent?

The CPD introduced by this comment:

```python
# P(MaryCalls | Alarm)
```

...defines the **probability that Mary will call** based on whether the **alarm went off**.

* `Alarm = 0` → Alarm **did not** go off
* `Alarm = 1` → Alarm **did** go off

---

### 📊 The Table Explained:

```python
values=[
  [0.2, 0.99],  # MaryCalls = False
  [0.8, 0.01]   # MaryCalls = True
],
evidence=['Alarm'],
evidence_card=[2]
```

There are **2 columns**, one for each state of the evidence (`Alarm`):

| Alarm | P(MaryCalls = False) | P(MaryCalls = True) |
| ----- | -------------------- | ------------------- |
| 0     | 0.2                  | 0.8                 |
| 1     | 0.99                 | 0.01                |

---

### 🧯 Interpretation (careful!):

Read column by column, the values are **reversed** relative to the scenario. The first column is `Alarm = 0` and the second is `Alarm = 1`, so:

```python
values=[
  [0.2, 0.99],  # MaryCalls = False
  [0.8, 0.01]   # MaryCalls = True
]
```

means:

| Alarm | **MaryCalls = False**    | **MaryCalls = True**  |
| ----- | ------------------------ | --------------------- |
| 0     | 0.2 (she *doesn't* call) | 0.8 (she *does* call) |
| 1     | 0.99 (*doesn't* call)    | 0.01 (*does* call)    |

That implies Mary is **more likely to call when there is no alarm** (0.8) than when there is one (0.01), which contradicts the scenario.

---

### ✅ Correct Version (more realistic):

To match the logic that Mary is more likely to call **when the alarm goes off**, one fix is to **swap the two rows** of `values`:

```python
cpd_mary = TabularCPD(
  variable='MaryCalls',
  variable_card=2,
  values=[
    [0.8, 0.01],  # MaryCalls = False
    [0.2, 0.99]   # MaryCalls = True
  ],
  evidence=['Alarm'],
  evidence_card=[2]
)
```

| Alarm | P(MaryCalls = False) | P(MaryCalls = True) |
| ----- | -------------------- | ------------------- |
| 0     | 0.8                  | 0.2                 |
| 1     | 0.01                 | 0.99                |

Now it means:

* If **alarm didn't go off**, Mary calls with low probability (0.2)
* If **alarm did go off**, Mary almost certainly calls (0.99)
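
Note that the version above changes the numbers the CPD encodes (for example, P(MaryCalls = True | Alarm = 0) becomes 0.2). An alternative, sketched here under the assumption that the original 0.8 and 0.01 were meant for the opposite alarm states, is to swap the columns instead. The helper names `swap_columns` and `call_is_likelier_with_alarm` are hypothetical and are not part of pgmpy.

```python
def swap_columns(values):
    """Return a copy of a two-column CPD table with its columns exchanged."""
    return [[row[1], row[0]] for row in values]


def call_is_likelier_with_alarm(values):
    """True if P(call=True | Alarm=1) > P(call=True | Alarm=0).

    Row 1 holds the 'True' state; column 0 is Alarm=0 and column 1 is Alarm=1.
    """
    return values[1][1] > values[1][0]


# Tables as written in the notebook.
john_values = [[0.3, 0.9], [0.7, 0.1]]
mary_values = [[0.2, 0.99], [0.8, 0.01]]

for name, values in [("JohnCalls", john_values), ("MaryCalls", mary_values)]:
    fixed = swap_columns(values)
    # Each column must still sum to 1 after the swap.
    assert all(abs(fixed[0][c] + fixed[1][c] - 1.0) < 1e-9 for c in (0, 1))
    print(name, call_is_likelier_with_alarm(values), "->", fixed,
          call_is_likelier_with_alarm(fixed))
```

The swapped tables can be passed as `values` to `TabularCPD` in place of the originals. Which fix is appropriate depends on what the probabilities were meant to say, which the notebook does not state.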

---

### ✅ Conclusion:

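The syntax is valid, but the **values are likely logically inverted**: as written, `MaryCalls = True` is more probable when `Alarm = 0`. `cpd_john` shows the same pattern (0.7 for `Alarm = 0` versus 0.1 for `Alarm = 1`). Correcting both makes the model behave more realistically.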

