```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination
```



```python
# Define the structure
model = BayesianNetwork([('Battery', 'Starter'),
                         ('Starter', 'CarStarts'),
                         ('Ignition', 'CarStarts')])

# Define the Conditional Probability Distributions (CPDs)
# variable_card: the number of states a variable can take (binary variables have two: 0 or 1).
# values: the probability of each state; one row per state, one column per combination of parent states.
cpd_battery = TabularCPD(variable='Battery', variable_card=2, values=[[0.6], [0.4]])
cpd_starter = TabularCPD(variable='Starter', variable_card=2,
                         values=[[0.8, 0.2], [0.2, 0.8]],
                         evidence=['Battery'], evidence_card=[2])
cpd_ignition = TabularCPD(variable='Ignition', variable_card=2, values=[[0.7], [0.3]])
cpd_carstarts = TabularCPD(variable='CarStarts', variable_card=2,
                           values=[[1.0, 0.8, 0.9, 0.0],
                                   [0.0, 0.2, 0.1, 1.0]],
                           evidence=['Starter', 'Ignition'],
                           evidence_card=[2, 2])

# Add CPDs to the model
model.add_cpds(cpd_battery, cpd_starter, cpd_ignition, cpd_carstarts)

# Verify the model
assert model.check_model()
```



```python
# Perform inference
inference = VariableElimination(model)
result = inference.query(variables=['CarStarts'], evidence={'Battery': 1, 'Ignition': 1})
print(result)
```


```
+--------------+------------------+
| CarStarts    |   phi(CarStarts) |
+==============+==================+
| CarStarts(0) |           0.1600 |
+--------------+------------------+
| CarStarts(1) |           0.8400 |
+--------------+------------------+
```


Let’s break down the example, focusing on how `variable_card` and the `values` of each Conditional Probability Distribution (CPD) are chosen.

### 1. **Defining the Structure**

In the code, we define a **Bayesian Network** structure using directed edges:
```python
model = BayesianNetwork([('Battery', 'Starter'),
                         ('Starter', 'CarStarts'),
                         ('Ignition', 'CarStarts')])
```
This structure represents how the variables influence each other:
- `Battery` affects `Starter`.
- `Starter` and `Ignition` both influence whether the car starts (`CarStarts`).
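Because the graph is directed and acyclic, these edges imply that the joint distribution factorizes into one factor per node, each conditioned on that node's parents. The plain-Python sketch below derives that factorization from the same edge list; `parents_of` and `factorization` are hypothetical helpers written for illustration, not part of pgmpy.

```python
# Same edge list passed to BayesianNetwork above: (parent, child) pairs.
EDGES = [('Battery', 'Starter'),
         ('Starter', 'CarStarts'),
         ('Ignition', 'CarStarts')]


def parents_of(edges):
    """Map every node to the list of its parents (hypothetical helper)."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, [])          # roots end up with no parents
        parents.setdefault(child, []).append(parent)
    return parents


def factorization(edges):
    """Render the chain-rule factorization P(X) = prod_i P(X_i | Pa(X_i))."""
    terms = []
    for node, pa in parents_of(edges).items():
        terms.append(f"P({node} | {', '.join(pa)})" if pa else f"P({node})")
    return " * ".join(terms)


print(factorization(EDGES))
```

Each factor in the printed product corresponds to exactly one `TabularCPD` in the model, which is why the network needs four CPDs.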

### 2. **Understanding CPD and `variable_card`**

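A **Conditional Probability Distribution (CPD)** gives the probability of each state of a variable given the states of its parent nodes. In pgmpy's `TabularCPD`:
- `variable_card` (the variable's cardinality) specifies the number of states the variable can take (e.g., a binary variable has two states: 0 or 1).
- `values` is a 2-D table with one row per state of the variable and one column per combination of parent states, so every column must sum to 1.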
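The shape rules for a `TabularCPD` table can be checked in a few lines of plain Python: `variable_card` rows, one column per combination of parent states (the product of `evidence_card`), and each column summing to 1. The function `check_cpd_shape` below is a hypothetical helper for illustration; pgmpy runs its own validation and does not expose this function.

```python
from math import isclose, prod


def check_cpd_shape(values, variable_card, evidence_card=()):
    """Validate a TabularCPD-style table: rows = states, columns = parent configurations."""
    n_cols = prod(evidence_card)  # prod(()) == 1, so a root node has a single column
    if len(values) != variable_card:
        raise ValueError(f"expected {variable_card} rows, got {len(values)}")
    for row in values:
        if len(row) != n_cols:
            raise ValueError(f"expected {n_cols} columns, got {len(row)}")
    # Each column is a probability distribution over the variable's states.
    for col in range(n_cols):
        total = sum(row[col] for row in values)
        if not isclose(total, 1.0):
            raise ValueError(f"column {col} sums to {total}, not 1")
    return variable_card, n_cols


print(check_cpd_shape([[0.6], [0.4]], 2))                                         # Battery
print(check_cpd_shape([[1.0, 0.8, 0.9, 0.0], [0.0, 0.2, 0.1, 1.0]], 2, [2, 2]))   # CarStarts
```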

#### Example 1: Battery
```python
cpd_battery = TabularCPD(variable='Battery', variable_card=2, values=[[0.6], [0.4]])
```
- **Variable**: `Battery`.
- **variable_card=2**: This means the `Battery` variable is binary, with two possible states: `0` (Battery is dead) and `1` (Battery is working).
- **values=[[0.6], [0.4]]**: The probability of each state is provided:
  - \( P(\text{Battery} = 0) = 0.6 \)
  - \( P(\text{Battery} = 1) = 0.4 \)

Since `Battery` has no parents, its CPD is simply a marginal (prior) distribution, stored as a single column.

#### Example 2: Starter
```python
cpd_starter = TabularCPD(variable='Starter', variable_card=2, 
                         values=[[0.8, 0.2], [0.2, 0.8]], 
                         evidence=['Battery'], evidence_card=[2])
```
- **Variable**: `Starter`.
- **variable_card=2**: `Starter` is binary (0: Starter doesn’t work, 1: Starter works).
- **evidence=['Battery']**: `Starter` depends on `Battery` (i.e., whether the battery is working or not).
- **evidence_card=[2]**: `Battery` has two possible states (0 or 1), so `values` needs one column of conditional probabilities for each.

In `values`, each row is a state of `Starter` and each column is a state of `Battery`. Transposed, so that each row is one `Battery` state, the table reads:

| Battery | P(Starter=0) | P(Starter=1) |
|---------|--------------|--------------|
| 0       | 0.8          | 0.2          |
| 1       | 0.2          | 0.8          |

This means:
- If `Battery = 0` (dead), \( P(\text{Starter} = 0) = 0.8 \) and \( P(\text{Starter} = 1) = 0.2 \).
- If `Battery = 1` (working), \( P(\text{Starter} = 0) = 0.2 \) and \( P(\text{Starter} = 1) = 0.8 \).
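Reading a conditional distribution means picking one column of `values`, and summing over the parent weighted by its prior gives the marginal of the child. The sketch below uses the numbers from `cpd_battery` and `cpd_starter`; the names `p_starter_given` and `p_starter_marginal` are hypothetical helpers, not pgmpy functions.

```python
# Values copied from cpd_battery and cpd_starter above.
# Rows are states of Starter; columns are states of the parent (Battery).
P_BATTERY = [0.6, 0.4]
P_STARTER_GIVEN_BATTERY = [[0.8, 0.2],   # Starter = 0
                           [0.2, 0.8]]   # Starter = 1


def p_starter_given(battery):
    """Read one column: the distribution of Starter for a fixed Battery state."""
    return [row[battery] for row in P_STARTER_GIVEN_BATTERY]


def p_starter_marginal():
    """Sum out Battery: P(Starter=s) = sum_b P(Starter=s | Battery=b) * P(Battery=b)."""
    return [sum(P_STARTER_GIVEN_BATTERY[s][b] * P_BATTERY[b] for b in range(2))
            for s in range(2)]


print(p_starter_given(1))    # [0.2, 0.8]
print(p_starter_marginal())  # approximately [0.56, 0.44]
```

For example, \( P(\text{Starter} = 1) = 0.2 \cdot 0.6 + 0.8 \cdot 0.4 = 0.44 \): with no evidence, the starter works less than half the time because the battery is more likely to be dead.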

#### Example 3: Ignition
```python
cpd_ignition = TabularCPD(variable='Ignition', variable_card=2, values=[[0.7], [0.3]])
```
- **Variable**: `Ignition`.
- **variable_card=2**: `Ignition` is binary (0: Ignition is off, 1: Ignition is on).
- **values=[[0.7], [0.3]]**: The probabilities are:
  - \( P(\text{Ignition} = 0) = 0.7 \)
  - \( P(\text{Ignition} = 1) = 0.3 \)

#### Example 4: CarStarts
```python
cpd_carstarts = TabularCPD(variable='CarStarts', variable_card=2, 
                           values=[[1.0, 0.8, 0.9, 0.0], 
                                   [0.0, 0.2, 0.1, 1.0]],
                           evidence=['Starter', 'Ignition'], 
                           evidence_card=[2, 2])
```
- **Variable**: `CarStarts`.
- **variable_card=2**: `CarStarts` is binary (0: Car does not start, 1: Car starts).
- **evidence=['Starter', 'Ignition']**: `CarStarts` depends on both `Starter` and `Ignition`.
- **evidence_card=[2, 2]**: Both `Starter` and `Ignition` are binary variables, meaning there are \( 2 \times 2 = 4 \) possible combinations of their states.

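Each column of `values` corresponds to one combination of `Starter` and `Ignition`, with the last evidence variable (`Ignition`) changing fastest: (0, 0), (0, 1), (1, 0), (1, 1). Transposed, so that each row is one combination, the table reads: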

| Starter | Ignition | P(CarStarts=0) | P(CarStarts=1) |
|---------|----------|----------------|----------------|
| 0       | 0        | 1.0            | 0.0            |
| 0       | 1        | 0.8            | 0.2            |
| 1       | 0        | 0.9            | 0.1            |
| 1       | 1        | 0.0            | 1.0            |

This means:
- If `Starter = 0` and `Ignition = 0`, the car does not start (probability 1).
- If `Starter = 1` and `Ignition = 1`, the car starts with probability 1.
- If only one of them works, the car starts with probability 0.2 (`Starter = 0`, `Ignition = 1`) or 0.1 (`Starter = 1`, `Ignition = 0`).
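The column for a given parent configuration is a mixed-radix index in which the last evidence variable varies fastest, which for two binary parents is `2 * starter + ignition`. The sketch below reproduces the table from the raw `values` of `cpd_carstarts`; `column_index` and `p_carstarts` are hypothetical helpers written for illustration.

```python
# Values copied from cpd_carstarts above: rows are CarStarts states,
# columns are (Starter, Ignition) configurations.
CARSTARTS_VALUES = [[1.0, 0.8, 0.9, 0.0],   # CarStarts = 0
                    [0.0, 0.2, 0.1, 1.0]]   # CarStarts = 1
EVIDENCE_CARD = [2, 2]                       # cardinalities of [Starter, Ignition]


def column_index(states, cards):
    """Mixed-radix index of a parent configuration; the last parent varies fastest."""
    index = 0
    for state, card in zip(states, cards):
        index = index * card + state
    return index


def p_carstarts(starter, ignition):
    """Distribution [P(CarStarts=0), P(CarStarts=1)] for one (Starter, Ignition) pair."""
    col = column_index([starter, ignition], EVIDENCE_CARD)
    return [row[col] for row in CARSTARTS_VALUES]


for s in (0, 1):
    for i in (0, 1):
        print(s, i, p_carstarts(s, i))
```

Each printed line matches one row of the transposed table above.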

### 3. **Inference**

After defining the Bayesian Network and CPDs, we can perform **inference** to calculate probabilities given evidence. In the example:

```python
inference = VariableElimination(model)
result = inference.query(variables=['CarStarts'], evidence={'Battery': 1, 'Ignition': 1})
print(result)
```

Here:
- We compute the distribution of `CarStarts` given that `Battery = 1` (working) and `Ignition = 1` (on).

The inference algorithm (Variable Elimination) sums out the unobserved variable `Starter` and returns the normalized posterior \( P(\text{CarStarts} \mid \text{Battery} = 1, \text{Ignition} = 1) \): the car starts with probability 0.84, matching the printed output.
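On a network this small the posterior can be checked by hand. With the evidence fixed, only `Starter` is summed out: \( P(\text{CarStarts} = 1 \mid \text{Battery} = 1, \text{Ignition} = 1) = 0.2 \cdot 0.2 + 0.8 \cdot 1.0 = 0.84 \). The sketch below performs the same computation by brute-force enumeration over the factorized joint, which is not Variable Elimination itself but must give the same answer; the function names are hypothetical.

```python
# CPD values copied from the model above (index 0/1 = state 0/1).
P_BATTERY = [0.6, 0.4]
P_STARTER_GIVEN_BATTERY = [[0.8, 0.2], [0.2, 0.8]]   # [starter][battery]
P_IGNITION = [0.7, 0.3]
P_CARSTARTS_GIVEN = [[1.0, 0.8, 0.9, 0.0],            # [carstarts][2 * starter + ignition]
                     [0.0, 0.2, 0.1, 1.0]]


def joint(b, s, i, c):
    """P(Battery=b, Starter=s, Ignition=i, CarStarts=c) from the chain-rule factorization."""
    return (P_BATTERY[b] * P_STARTER_GIVEN_BATTERY[s][b]
            * P_IGNITION[i] * P_CARSTARTS_GIVEN[c][2 * s + i])


def query_carstarts(battery, ignition):
    """Posterior over CarStarts: sum out Starter, then normalize."""
    unnormalized = [sum(joint(battery, s, ignition, c) for s in (0, 1)) for c in (0, 1)]
    z = sum(unnormalized)  # equals P(Battery=battery, Ignition=ignition)
    return [u / z for u in unnormalized]


print(query_carstarts(battery=1, ignition=1))  # approximately [0.16, 0.84]
```

Enumeration touches every joint configuration, so its cost grows exponentially with the number of variables; Variable Elimination avoids this by summing out variables one at a time over smaller intermediate factors.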

### Summary of Key Points:
- **`variable_card`**: Specifies how many states a variable can take (e.g., 2 for binary variables).
- **CPD / CPT (Conditional Probability Table)**: Defines the probability distribution of a variable for each combination of its parents' states; `TabularCPD` stores it with one row per state and one column per parent configuration.
- **Evidence**: In `TabularCPD`, `evidence` and `evidence_card` name a node's parents and their cardinalities; in `inference.query`, `evidence` is a dict of observed values that the posterior is conditioned on.
