# pyGMs Introduction: Variables and Factors

`pyGMs` is a simple toolkit for working with (usually probabilistic) graphical models.  A graphical model is a high-dimensional function, for example a probability distribution over many variables, that can be represented as a collection of smaller functions, or "factors", each involving only a few variables at a time.

In this notebook, we'll go through the `pyGMs` notion of random variables, `gm.Var`, and factors, `gm.Factor`.

First, we load the toolkit:

In [2]:
import numpy as np
import pyGMs as gm
import matplotlib.pyplot as plt

## Variables

Every language or toolkit for reasoning about probabilistic models needs some mechanism for specifying the "random variables" of the system and uniquely identifying them by a name or ID of some kind.

In `pyGMs`, this is the `gm.Var` object.  We assume that there are $n$ variables, whose "global" names (identities) are simply $X_0,\ldots,X_{n-1}$, so that any variable can be identified by an integer "unique ID". The Python object `gm.Var` is an explicit reference to one of these variables, consisting of a label (its unique ID) and a number of states $d$ (the possible values that $X_i$ can take, assumed to be $\{0,\ldots,d-1\}$).

In [2]:
foo = gm.Var(0, 3)   # Define variable "X0" taking 3 states; 
                     # "foo" is a python object reference to this variable
X0 = foo             # This python variable will be more reflective of its meaning
X1 = gm.Var(1, 2)    # Define "X1" taking on 2 states to be the python object "X1"
X2 = gm.Var(2, 2)    # Define "X2" taking on 2 states, also

Variables are pretty minimal; we can get their label (ID) and number of states, and that's about it:

In [3]:
print(X0.label)
print(X0.states)

0
3


If we "print" a variable, we just show its label (unique ID):

In [4]:
print(X1)

1


----

## Variable Sets

Variable sets are (sorted) sets of variables:

In [5]:
vs = gm.VarSet([X0,X1])
print(vs)

{0,1}


They are kept sorted internally for efficiency, so the order in which the variables are listed does not matter:

In [6]:
vs2 = gm.VarSet([X1,X0])
print(vs2)

{0,1}


In [7]:
vs==vs2

True

Notionally, a set of variables has a tuple configuration, e.g., $X_0=1$, $X_1=0$ is the same as $(X_0,X_1) = (1,0)$.

----

## Factors

Now let's try creating a factor; first we'll create a univariate one:

In [8]:
Fa = gm.Factor( [X0], 3.)   # 1st argument: scope of the factor; 2nd: how to fill the table

In [9]:
print("Variables: ",Fa.vars)
print("Table: ", Fa.table)

Variables:  {0}
Table:  [3. 3. 3.]


So, we've created a factor of all "3.0" values.  We can access the factor just like an array:

In [10]:
Fa[0] = 1.1
Fa[2] = 2.2
print(Fa.table)

[1.1 3.  2.2]


Factor objects have a number of useful methods; for instance, if we would like to normalize Fa to correspond to a probability, we need to compute the sum over all the values of $X_0$:

In [11]:
Fa.sum()

6.3

and then normalize the table by dividing by this sum:

In [12]:
Fa_norm = Fa / Fa.sum()
print(Fa_norm.table)

[0.17460317 0.47619048 0.34920635]


We can also make factors over several variables:

In [13]:
Fb = gm.Factor( [X1,X2], 0.0)
print(Fb.table)

[[0. 0.]
 [0. 0.]]


We access entries in `Fb` using a tuple, e.g.,

In [14]:
Fb[0,0] = 1.0         # Note: this is the same as, say: Fb[ (0,0) ] = 1.0
Fb[1,0] = 0.5
Fb[1,1] = 1.2
print(Fb.table)

[[1.  0. ]
 [0.5 1.2]]


Internally, `Fb.table` is a `numpy` array, with an axis for each variable (argument of the factor).

In Jupyter, there is a pretty-print function for looking at the whole table, which shows the arguments and table values:

In [15]:
Fb         # equivalent to "display(Fb)"; may not display without re-running the cell

| X1 | X2 | | $f(x)$ |
| :--: | :--: | :--: | :--: |
| 0 | 0 | | 1.0000 |
| 0 | 1 | | 0.0000 |
| 1 | 0 | | 0.5000 |
| 1 | 1 | | 1.2000 |


**NOTE**: often our tables can be rather large, so the default `print(F)` does not display the entire table, just the list of arguments (the "scope" of the factor).  However, for debugging it can be important to know whether two printed factors are the same objects (or more generally, point to the same table in memory), so we also show the table's memory location for disambiguation.

In [15]:
print(Fb)



We can also access the factor table entries using the usual "Variable=Value" style of probability in math, using a Python `dict`:

In [16]:
print( Fb[ {X1:1, X2:0} ] )
# or even just using the ID numbers:
print( Fb[ {1:1, 2:0} ])

0.5
0.5
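
The relationship between the `dict` form and the tuple form can be sketched in plain `numpy`. This is only an illustration of the idea, not how `pyGMs` implements indexing; `dict_to_tuple` is a hypothetical helper introduced here, and the table is a plain array holding the same values as `Fb`.

```python
import numpy as np

# Same values as Fb above; axis 0 indexes X1, axis 1 indexes X2.
fb_table = np.array([[1.0, 0.0],
                     [0.5, 1.2]])

def dict_to_tuple(assignment):
    """Hypothetical helper: order the values by ascending variable label."""
    return tuple(assignment[label] for label in sorted(assignment))

x = dict_to_tuple({2: 0, 1: 1})   # X2=0, X1=1, given in arbitrary key order
print(x)             # (1, 0)
print(fb_table[x])   # 0.5
```

Because the keys carry the variable labels, the `dict` form does not depend on the order in which the values are written down.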


Factors have a number of methods that are useful in the multi-variate case.  For example, we can sum over just one variable, say $X_1$, leaving a function over the other, e.g.:

In [17]:
Fbsum1 = Fb.sum([X1])
print("Vars:",Fbsum1.vars, "Table:", Fbsum1.table)

Vars: {2} Table: [1.5 1.2]


so that the result is now a factor over just $X_2$, whose values are `Fb[0,0]+Fb[1,0]` and `Fb[0,1]+Fb[1,1]`.

Summing out $X_1$ is equivalent to computing the "marginal function" over the remaining variable, $X_2$:

In [18]:
Fb.marginal([2]).table

array([1.5, 1.2])
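
Since the table is an array with one axis per variable, summing out a variable corresponds to a sum along that variable's axis. A minimal `numpy` sketch with the same values as `Fb` (an illustration of the arithmetic, not the library's implementation):

```python
import numpy as np

# Same values as Fb above; axis 0 indexes X1, axis 1 indexes X2.
fb_table = np.array([[1.0, 0.0],
                     [0.5, 1.2]])

# Summing out X1 collapses axis 0, leaving one entry per value of X2.
marg_x2 = fb_table.sum(axis=0)
print(marg_x2)   # [1.5 1.2]
```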

### <span style="color:red"> WARNING </span>
Factors store their arguments in sorted order according to the variable labels (IDs), NOT in the order in which they are listed when the factor is created.  So, for example:

In [19]:
Ftest1 = gm.Factor( [X1,X2], 0.); Ftest1[ {X1:0,X2:1} ] = 3.0
Ftest2 = gm.Factor( [X2,X1], 0.); Ftest2[ {X1:0,X2:1} ] = 3.0
print("Both factors have the same set of arguments: ",Ftest1.vars, '==', Ftest2.vars)
print("By dict:", Ftest1[{1:0,2:1}] )
print("Test 1 (0,1):", Ftest1[0,1])
print("Test 2 (0,1):", Ftest2[0,1])   # SAME ORDER even though declared in reverse

Both factors have the same set of arguments:  {1,2} == {1,2}
By dict: 3.0
Test 1 (0,1): 3.0
Test 2 (0,1): 3.0


This means that if you use the simple and convenient form of "tuple" indexing, you should be very careful about the order in which your variables are labelled (their ID numbers).  To avoid confusion, it's usually a good idea to always create your factors using variables that are already in label-sorted order.

This ambiguity is a necessary byproduct of the convenience of accessing the factor's entries using only a tuple of values -- since many of the functions we will build are created automatically, we need a consistent way of addressing them.
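
To make the convention concrete, here is a small `numpy` sketch of what "store arguments in label-sorted order" means for a table that was written down in a different order. The helper `sort_axes_by_label` is hypothetical and is not part of `pyGMs`; it only illustrates the axis permutation involved.

```python
import numpy as np

def sort_axes_by_label(labels, table):
    """Hypothetical helper: permute axes so they follow ascending label."""
    order = tuple(int(i) for i in np.argsort(labels))
    sorted_labels = [labels[i] for i in order]
    return sorted_labels, np.transpose(table, order)

# A table written with axes (X2, X1), i.e. entries addressed as [x2, x1].
labels = [2, 1]
table_x2_x1 = np.zeros((2, 2))
table_x2_x1[1, 0] = 3.0            # X2=1, X1=0

sorted_labels, table_sorted = sort_axes_by_label(labels, table_x2_x1)
print(sorted_labels)               # [1, 2]
print(table_sorted[0, 1])          # 3.0, now addressed as [x1, x2]
```

After the permutation, the same entry is reached with the tuple `(0, 1)`, matching the `(X1, X2)` order used in the example above.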

----

## Factor operations

Factors associate a (tabular) function with a set of arguments, which allows us to keep track of how they should be combined mathematically.  So, for example, the function
$$ F(X_0,X_1,X_2) = F_b(X_1,X_2) + F_a(X_0) $$
can be directly computed as,

In [20]:
F = Fb + Fa
print("F arguments: ", F.vars)
print("Table shape:", F.table.shape)  # equivalent to F.dims() 
print("Table entries:")
print(F.table)

F arguments:  {0,1,2}
Table shape: (3, 2, 2)
Table entries:
[[[2.1 1.1]
  [1.6 2.3]]

 [[4.  3. ]
  [3.5 4.2]]

 [[3.2 2.2]
  [2.7 3.4]]]


and we can address this function with any joint configuration of $(X_0,X_1,X_2)$ (again, in sorted order!):

In [21]:
print("F[0,0,0] =", F[0,0,0])   # = Fa(0) + Fb(0,0) = 1.1 + 1.
print("F[2,1,0] =", F[2,1,0])   # = Fa(2) + Fb(1,0) = 2.2 + 0.5

F[0,0,0] = 2.1
F[2,1,0] = 2.7
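
One way to picture this combination is through `numpy` broadcasting: each table gets a length-1 axis for every variable it does not depend on, and the sum then expands to the full joint shape. The sketch below uses plain arrays with the values of `Fa` and `Fb`; it is an illustration of the arithmetic and makes no claim about how `pyGMs` performs the operation internally.

```python
import numpy as np

fa = np.array([1.1, 3.0, 2.2])             # Fa(X0)
fb = np.array([[1.0, 0.0],
               [0.5, 1.2]])                # Fb(X1, X2)

# Axes of the result are (X0, X1, X2), in label-sorted order.
f = fa[:, None, None] + fb[None, :, :]

print(f.shape)       # (3, 2, 2)
print(f[2, 1, 0])    # Fa(2) + Fb(1,0) = 2.2 + 0.5
```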


### Extracting sub-tables

We can extract a slice of a tabular function by "conditioning" on a known value for one or more of its arguments, e.g.,

In [22]:
F00 = F.condition({X0:0})           # Find the subtable where X0=0
print("F00 arguments:", F00.vars)
print("F00 shape:",F00.dims())
print("F00 entries:")
print(F00.table)

F00 arguments: {1,2}
F00 shape: (2, 2)
F00 entries:
[[2.1 1.1]
 [1.6 2.3]]
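
In array terms, conditioning on $X_0=0$ amounts to fixing the index of the $X_0$ axis and keeping the remaining axes. A `numpy` sketch under that reading (plain arrays standing in for the factor tables, not the library's code):

```python
import numpy as np

fa = np.array([1.1, 3.0, 2.2])             # Fa(X0)
fb = np.array([[1.0, 0.0],
               [0.5, 1.2]])                # Fb(X1, X2)
f = fa[:, None, None] + fb[None, :, :]     # F(X0, X1, X2)

# Fix X0=0: the result is a table over the remaining axes (X1, X2).
f_x0_is_0 = f[0, :, :]
print(f_x0_is_0.shape)   # (2, 2)
print(f_x0_is_0)
```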


### Elimination Operators

We may want to marginalize over some of the variables, say,
$$ F_0(X_0) = \sum_{x_1} \sum_{x_2} F(X_0,x_1,x_2) $$

In [23]:
F0 = F.sum([X1,X2])  # or equivalently, F.sum([1,2]) 
print(F0.vars)
print(F0.table)      # the table has an entry for each value of X0, equal to the sum of four entries of F:

{0}
[ 7.1 14.7 11.5]


and similarly for maximizing, minimizing, etc.:
$$ G_{02}(X_0,X_2) = \max_{x_1} F(X_0,x_1,X_2) $$

In [24]:
G01 = F.max( [X1] )   # note: despite its name, G01 is a function of (X0,X2)
print(G01.vars)
print(G01.table)    # Table has an entry for each (x0,x2), equal to the largest entry in F for any x1

{0,2}
[[2.1 2.3]
 [4.  4.2]
 [3.2 3.4]]
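
Both eliminations above can be read as reductions along the axes of the eliminated variables. The following `numpy` sketch reproduces the two tables from plain arrays; it is meant as a check of the arithmetic rather than a description of the library's internals.

```python
import numpy as np

fa = np.array([1.1, 3.0, 2.2])             # Fa(X0)
fb = np.array([[1.0, 0.0],
               [0.5, 1.2]])                # Fb(X1, X2)
f = fa[:, None, None] + fb[None, :, :]     # F(X0, X1, X2)

sum_over_x1_x2 = f.sum(axis=(1, 2))        # function of X0 only
max_over_x1 = f.max(axis=1)                # function of (X0, X2)

print(sum_over_x1_x2)
print(max_over_x1)
```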


#### Example: conditional distributions
The conditional distribution $p(X|Y)$ is given by,
$$p(X|Y) = \frac{p(X,Y)}{p(Y)} = \frac{p(X,Y)}{\sum_x p(x,Y)}$$

In [25]:
p01 = G01 / G01.sum()  # just to make a joint probability table from something

p_X0_given_X2 = G01 / G01.sum([0])  # compute p(X0 | X2); recall G01's arguments are (X0,X2)
print(p_X0_given_X2.vars)    # still a function of both X0 and X2
print(p_X0_given_X2.table)   # one column per value of X2

{0,2}
[[0.22580645 0.23232323]
 [0.43010753 0.42424242]
 [0.34408602 0.34343434]]


We can see that each column (index of $X_2$) sums to one over the rows (index of $X_0$), so it defines a conditional distribution of $X_0$ given $X_2$.
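
The column-sum property can be checked directly with a `numpy` sketch, using a plain array with the values of the table over $(X_0,X_2)$ shown above. Dividing by the sum along the $X_0$ axis, with `keepdims=True` so the shapes broadcast, normalizes each column separately.

```python
import numpy as np

g = np.array([[2.1, 2.3],
              [4.0, 4.2],
              [3.2, 3.4]])                 # axes (X0, X2)

# Normalize over X0 separately for each value of X2.
cond = g / g.sum(axis=0, keepdims=True)

print(cond)
print(cond.sum(axis=0))                    # each column sums to one
```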

### Configurations
Configurations of variable sets are typically managed as tuples:

In [26]:
print("G01's maximum value is: ", G01.max())
print("The corresponding configuration is: ", G01.argmax())

G01's maximum value is:  4.2
The corresponding configuration is:  (1, 1)


If a function can be normalized and interpreted as a joint probability, we can sample from it:

In [27]:
print("Normalized probabilities from G01:")
print( (G01/G01.sum()).table )
print()
print("3 random samples drawn:")
print( [G01.sample() for i in range(3)] )  # draw 3 samples

Normalized probabilities from G01:
[[0.109375   0.11979167]
 [0.20833333 0.21875   ]
 [0.16666667 0.17708333]]

3 random samples drawn:
[(0, 1), (0, 1), (0, 0)]
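
For intuition, drawing a configuration from a table can be sketched in `numpy` as: normalize, pick one cell of the flattened table with probability equal to its value, then convert the flat index back to a tuple. This is one possible approach under those assumptions and is not claimed to be what `sample()` does internally; the seed is fixed only to make the sketch repeatable.

```python
import numpy as np

g = np.array([[2.1, 2.3],
              [4.0, 4.2],
              [3.2, 3.4]])                 # axes (X0, X2)
p = g / g.sum()                            # normalized joint table

rng = np.random.default_rng(0)
flat_index = rng.choice(p.size, p=p.ravel())            # pick one cell
x = tuple(int(i) for i in np.unravel_index(flat_index, p.shape))

print(x)       # a configuration (x0, x2)
print(g[x])    # the (unnormalized) value at that configuration
```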


Storing a configuration $x$ as a tuple allows us to easily look up the value `G01[x]`, for example:

In [28]:
print(G01.table,"\n")
x = G01.sample()   # sample x from (normalized version of) G01
print(x)           # sampled configuration
print(G01[x])      # index G using the tuple x

[[2.1 2.3]
 [4.  4.2]
 [3.2 3.4]] 

(0, 1)
2.3


Notice, however, that these tuples are only partial configurations of $X$, corresponding to values `(x0,x2)`, the arguments of `G01`.  The tuple form only tells us a sequence of states, not to which variables those states correspond.  Thus for a more precise representation of a partial configuration, we may prefer the `dict` form, `{X0:x0, X2:x2}`.

### Data Sets
A collection of configurations of $X$ is a data set.  We may want to compute the values of several elements of a data set, or use a data set to compute an empirical probability:

In [29]:
D = [(1, 1, 1), (2, 0, 0), (1, 0, 0), (1, 1, 1), (1, 0, 1)]

Notice that each entry of $D$ is a complete configuration `(x0,x1,x2)`.

If the factor of interest is defined on only a subset of $X$, we must extract the part of each data point $x$ that is relevant to that factor, for example:

In [30]:
list(G01[{v:x[v] for v in G01.vars}] for x in D)

[4.2, 3.2, 4.0, 4.2, 4.2]

using the dict `{Xi:xi}` indexing method, or equivalently using tuple-indexing:

In [31]:
list(G01[ tuple(x[v] for v in G01.vars) ] for x in D)

[4.2, 3.2, 4.0, 4.2, 4.2]

We may want to use a data set to reason about the empirical frequency of different outcomes.  However, we may not want to compute the full empirical joint distribution directly.  The `pyGMs` library has a helper function that works on subsets of variables instead, returning a list of factors that hold the empirical counts (unnormalized, as the output below shows):

In [32]:
phat_list = gm.misc.empirical( [ [X0], [X0,X2] ] , D )  # list of variable sets, then data set

phat_X0 = phat_list[0]                                  # First returned factor is over [X0]
print("Vars:",phat_X0.vars,"Table:",phat_X0.table,"\n")

phat_X0X2 = phat_list[1]                                # Second returned factor is over [X0,X2]
print("Vars:",phat_X0X2.vars,"\nTable:",phat_X0X2.table)

Vars: {0} Table: [0. 4. 1.] 

Vars: {0,2} 
Table: [[0. 0.]
 [1. 3.]
 [1. 0.]]


In [33]:
for f in phat_list: display(f)

| X0 | | $f(x)$ |
| :--: | :--: | :--: |
| 0 | | 0.0000 |
| 1 | | 4.0000 |
| 2 | | 1.0000 |


| X0 | X2 | | $f(x)$ |
| :--: | :--: | :--: | :--: |
| 0 | 0 | | 0.0000 |
| 0 | 1 | | 0.0000 |
| 1 | 0 | | 1.0000 |
| 1 | 1 | | 3.0000 |
| 2 | 0 | | 1.0000 |
| 2 | 1 | | 0.0000 |
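
The counts above can be reproduced with a short `numpy` sketch. The helper `count_table` is hypothetical (it is not the `pyGMs` function) and simply tallies, for each data point, the entry addressed by that point's values on the chosen variables.

```python
import numpy as np

D = [(1, 1, 1), (2, 0, 0), (1, 0, 0), (1, 1, 1), (1, 0, 1)]
dims = (3, 2, 2)                           # number of states of X0, X1, X2

def count_table(scope, data, dims):
    """Hypothetical helper: count occurrences of each configuration of `scope`."""
    counts = np.zeros([dims[v] for v in scope])
    for x in data:
        counts[tuple(x[v] for v in scope)] += 1
    return counts

counts_x0 = count_table((0,), D, dims)
counts_x0x2 = count_table((0, 2), D, dims)

print(counts_x0)                 # [0. 4. 1.]
print(counts_x0x2 / len(D))      # dividing by the data set size gives frequencies
```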


## <span style="color:red"> ADVANCED: MEMORY SHARING </span>
By default, each factor makes a local copy of the table that defines its values.  In some settings, this leads to inefficient copying and duplication.  If necessary, there is a private method to construct a factor using the table as-is.  However, this option should be used with care.  When shared, any in-place operations performed on one copy of the table will also change the values of the other factor:

In [10]:
X = [gm.Var(i,3) for i in range(6)]
tab = np.array([[1,0,0],[0,1,0],[0,0,1]])
f_orig  = gm.Factor(); f_orig._Factor__build([X[0],X[1]],tab)
f_copy  = gm.Factor([X[4],X[5]],tab)

print("Our table starts off as desired:")
print(f_orig.table)
print("But, if we alter the shared table, it changes the original:")
tab += 1
print(f_orig.table)
print("but does not change a factor with the usual copy behavior:")
print(f_copy.table)

Our table starts off as desired:
[[1 0 0]
 [0 1 0]
 [0 0 1]]
But, if we alter the shared table, it changes the original:
[[2 1 1]
 [1 2 1]
 [1 1 2]]
but does not change a factor with the usual copy behavior:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
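
The same distinction exists in plain `numpy`, which may help when reasoning about the behavior above. In this sketch, `shared` refers to the same buffer as `tab`, while `copied` is an independent array (converted to float, similar to the copied factor's table in the output above). It illustrates the general view-versus-copy issue only and does not describe the factor constructor's internals.

```python
import numpy as np

tab = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
shared = tab                               # same array object, no copy
copied = np.array(tab, dtype=float)        # np.array copies by default

tab += 1                                   # in-place update of the shared buffer

print(np.shares_memory(tab, shared))       # True
print(np.shares_memory(tab, copied))       # False
print(shared[0, 0], copied[0, 0])          # 2 1.0
```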
