# Session 2: PyMC and PyTensor

In this session, we'll explore the fundamental components of PyMC: PyTensor and PyMC's variable classes. We'll learn how PyTensor defines and optimizes computational graphs, and how PyMC uses these capabilities to build probabilistic models.


## PyTensor Basics

PyTensor is the computational backend for PyMC. It defines symbolic variables and operations on them, which are compiled into efficient functions that can run on CPUs or GPUs. Let's begin by exploring the basic elements of PyTensor.

In PyTensor, you define a computational graph explicitly. You start with input variables that are essentially placeholders and from these, build intermediate variables by applying operators. These intermediate variables can then be treated as final outputs or as inputs for further computation.

While PyTensor is designed to feel similar to NumPy to ease the learning curve, it's important to remember they are distinct. PyTensor operations build a graph of computations (which are executed lazily) rather than immediately returning values like NumPy.
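
To make the difference between lazy and eager evaluation concrete, here is a toy sketch in plain Python. It is not how PyTensor is implemented; the `Node` class and the `log` helper are hypothetical names invented for this illustration. The point is only that building an expression records operations, and numbers appear when the graph is evaluated with concrete inputs.

```python
import math


class Node:
    """A toy symbolic node: it records an operation instead of running it."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "input", "add", or "log"
        self.inputs = inputs  # parent nodes this node is computed from
        self.name = name

    def __add__(self, other):
        # Adding two nodes creates a new node; no arithmetic happens yet
        return Node("add", (self, other))

    def evaluate(self, values):
        """Walk the graph and compute a number from concrete input values."""
        if self.op == "input":
            return values[self.name]
        args = [parent.evaluate(values) for parent in self.inputs]
        if self.op == "add":
            return args[0] + args[1]
        if self.op == "log":
            return math.log(args[0])
        raise ValueError(f"unknown op: {self.op}")


def log(node):
    return Node("log", (node,))


u = Node("input", name="u")
v = Node("input", name="v")
out = log(u + v)  # only the structure log(add(u, v)) is recorded here

# Values are supplied later, when we actually want a result
result = out.evaluate({"u": 1.0, "v": math.e - 1.0})
print(out.op, out.inputs[0].op, result)
```

Until `evaluate` is called, `out` is just a description of a computation, which is what makes it possible to inspect, optimize, or modify that description before running it.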

### Tensors and Basic Operations

To begin, let's define some PyTensor tensors and show how to perform some basic operations.

A tensor can be a scalar, a vector, a matrix, or an array with any number of dimensions.

Concretely:


In [None]:
import pytensor
import pytensor.tensor as pt
import numpy as np

x = pt.tensor(shape=(), dtype="float64")
y = pt.tensor(shape=(2,), dtype="float64")

print(
    f"""
x type: {x.type}
x shape = {x.type.shape}
---
y type: {y.type}
y shape = {y.type.shape}
"""
)

Now that we have defined the `x` and `y` tensors, we can create a new one by adding them together.


In [None]:
z = x + y
z.name = "x + y"
z

To make the computation a bit more complex let's take the logarithm of the resulting tensor.


In [None]:
w = pt.log(z)
w, type(w)

We did not give `w` a name, so it prints something more descriptive: it is the index-0 output of calling the `log` function. Its type is `TensorVariable`, the class PyTensor uses for symbolic tensor variables.

So PyTensor works something like NumPy, but it builds a graph of operations rather than executing them immediately, more like a symbolic computation library.

We can use the `pytensor.dprint` function to print the computational graph of any given tensor.


In [None]:
pytensor.dprint(w)

This output shows the structure of the computation that PyTensor has built for the variable `w`. Think of it as a recipe (in reverse):

- **`Log [id A]`**: This is the final result, `w`. It's calculated by taking the logarithm (`Log`) of the intermediate value named `'x + y'`. PyTensor assigns it an internal identifier `A`; since we did not name `w`, no name is printed next to it.
- **`Add [id B] 'x + y'`**: This is the input to the `Log` operation. It's an intermediate value named `'x + y'` (which we called `z` in the code), calculated by an addition (`Add`). Its internal ID is `B`.
- **`ExpandDims{axis=0} [id C]`**: This is the first input to the `Add` operation. `ExpandDims` is an operation that changes the shape of a tensor. Here, it gives the scalar `x` a leading dimension so that it can be broadcast against the vector `y`. Its ID is `C`.
  - **`<Scalar(float64, shape=())> [id D]`**: This is the input to `ExpandDims`. It's our original scalar tensor `x` (ID `D`), which holds a single 64-bit floating-point number.
- **`<Vector(float64, shape=(2,))> [id E]`**: This is the second input to the `Add` operation. It's our original vector tensor `y` (ID `E`), which holds two 64-bit floating-point numbers.
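
The role of `ExpandDims` can be mimicked eagerly in NumPy. The sketch below is only an analogy for the graph above: the names `x_val`, `y_val`, and so on are placeholders chosen for this example, and NumPy computes each step immediately instead of building a graph.

```python
import numpy as np

x_val = np.float64(0.0)        # scalar, shape ()
y_val = np.array([1.0, np.e])  # vector, shape (2,)

# Same role as ExpandDims{axis=0}: give the scalar a leading axis
x_expanded = np.expand_dims(x_val, axis=0)  # shape (1,)

# Add: shape (1,) broadcasts against shape (2,)
z_val = x_expanded + y_val

# Log: applied elementwise
w_val = np.log(z_val)

print(x_expanded.shape, z_val.shape, w_val)
```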


### Functions

Note that this graph does not do any computation (yet!). It is simply defining the sequence of steps to be done. We can use `pytensor.function` to define a callable object so that we can push values through the graph.

PyTensor functions are compiled from symbolic expressions into efficient callable functions. The `pytensor.function` constructor takes several key arguments that define how the function will behave:

- The `inputs` argument specifies which PyTensor variables will be provided when calling the function. These become the function's parameters.

- The `outputs` argument defines which symbolic expressions should be evaluated and returned when the function is called.


In [5]:
f = pytensor.function(inputs=[x, y], outputs=w)

Now that the graph is compiled, we can push some concrete values:


In [None]:
f(0, [1, np.e])

> TIP:
> Sometimes we just want to debug a graph. We can use `pytensor.graph.basic.Variable.eval` for that:


In [None]:
w.eval({x: 0, y: [1, np.e]})

You can set intermediate values as well:


In [None]:
w.eval({z: [1, np.e]})

### Graph Optimization

One of the most important features of `pytensor` is that it can automatically **optimize** the mathematical operations inside a graph. Let's consider a simple example:


In [None]:
a = pt.tensor(shape=(), name="a")
b = pt.tensor(shape=(), name="b")

c = a / b
c.name = "a / b"

pytensor.dprint(c)

Now let's multiply `b` by `c`. This should result in simply `a`.


In [None]:
d = b * c
d.name = "b * c"

pytensor.dprint(d)

The graph shows the full computation, but once we compile it the operation becomes the identity on `a` as expected.


In [None]:
g = pytensor.function(inputs=[a, b], outputs=d)

pytensor.dprint(g)

### What is in a PyTensor Graph?

The following diagram shows the basic structure of a `pytensor` graph.

![pytensor graph](images/apply.png)


We can make these concepts more tangible by explicitly indicating them in our earlier example. Let's compute the graph components for the tensor `z`.


In [None]:
print(
    f"""
z type: {z.type}
z name = {z.name}
z owner = {z.owner}
z owner inputs = {z.owner.inputs}
z owner op = {z.owner.op}
z owner output = {z.owner.outputs}
"""
)

### Graph Manipulation

Another interesting feature of PyTensor is the ability to manipulate the computational graph directly, something that is much harder to do in frameworks such as TensorFlow or PyTorch. Here we'll see how to modify an existing graph.


In [None]:
# get input tensors
list(pytensor.graph.graph_inputs(graphs=[w]))

As a simple example, let's add a `pytensor.tensor.exp` before the `pytensor.tensor.log` (to get the identity function).


In [14]:
parent_of_w = w.owner.inputs[0]  # get z tensor
new_parent_of_w = pt.exp(parent_of_w)  # modify the parent of w
new_parent_of_w.name = "exp(x + y)"

Note that the graph of `w` has actually not changed:


In [None]:
pytensor.dprint(w)

To modify the graph we need to use the `pytensor.clone_replace` function, which _returns a copy of the initial subgraph with the corresponding substitutions._


In [None]:
new_w = pytensor.clone_replace(output=[w], replace={parent_of_w: new_parent_of_w})[0]
new_w.name = "log(exp(x + y))"
pytensor.dprint(new_w)

Finally, we can test the modified graph by passing some input to the new graph.


In [None]:
new_w.eval({x: 0, y: [1, np.e]})

As expected, the new graph is just the identity function.


> NOTE:
> Again, note that `pytensor` is clever enough to omit the `exp` and `log` once we compile the function.


In [None]:
f = pytensor.function(inputs=[x, y], outputs=new_w)

pytensor.dprint(f)

In [None]:
f(0, [1, np.e])

This type of manipulation is called a **graph rewrite**. Rewrites bridge the gap between letting users define graphs in any way they please and carrying out the resulting computation in an optimal way, from the standpoint of performance and stability.

While some rewrites are performed automatically, we can invoke them manually as well. For example:


In [None]:
x = pt.tensor('x', shape=(None, ))
y = pt.log(1 + x)
y.dprint(print_type=True)

The expression for `y` can be numerically unstable when `x` is very close to zero, due to floating point precision.

PyTensor's stabilization rewrite replaces such expressions with more numerically stable equivalents, in this case with `pt.log1p(x)`, which calculates `log(1 + x)` accurately even for small `x`.
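
The precision loss that motivates this rewrite is easy to reproduce with ordinary floating-point arithmetic. The short sketch below uses only Python's standard `math` module; the value `1e-18` is an arbitrary choice that is small enough for `1 + tiny` to round to exactly `1.0` in double precision.

```python
import math

tiny = 1e-18

# 1 + 1e-18 is not representable in float64 and rounds to 1.0,
# so the naive expression loses the input entirely
naive = math.log(1 + tiny)

# log1p works on `tiny` directly and keeps its magnitude
stable = math.log1p(tiny)

print(naive, stable)
```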


In [None]:
stable_y = pytensor.graph.rewrite_graph(y, include=("stabilize",))
stable_y.dprint()

### Example: Logistic Regression

Let's try building a simple model in pure PyTensor. We will estimate the parameters of a logistic regression model using a simple version of gradient descent.

Gelman et al. (2003) present an example of an acute toxicity test, commonly performed on animals to estimate the toxicity of various compounds.

In this dataset, `dose` contains 4 dosage levels on the log scale, each administered to `n = 5` rats during the experiment. The response variable is `deaths`, the number of positive responses (deaths) at each dosage.

The number of deaths can be modeled as a binomial response, with the log-odds of death being a linear function of log dose:

$$
\begin{aligned}
\text{logit}(p_i) &= a + b x_i \\
y_i &\sim \text{Bin}(n_i, p_i) \\
\end{aligned}
$$

The common statistic of interest in such experiments is the **LD50**, the dosage at which the probability of death is 50%.
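
Under the model above, the LD50 follows directly from the coefficients: setting $p = 0.5$ gives $\text{logit}(p) = 0$, so $a + b x = 0$ and the LD50 on the log-dose scale is $-a / b$. The sketch below checks this identity numerically. The values of `a_hat` and `b_hat` are illustrative placeholders, not fitted results.

```python
import math


def invlogit(v):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


# Illustrative coefficients only (assumed for this example)
a_hat, b_hat = 0.8, 7.7

# Log dose at which the probability of death is 50%
ld50 = -a_hat / b_hat

# Plugging the LD50 back into the model should give p = 0.5
p_at_ld50 = invlogit(a_hat + b_hat * ld50)
print(ld50, p_at_ld50)
```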


In [22]:
import numpy as np

rng = np.random

dose = np.array([-0.86, -0.3, -0.05, 0.73])
n = 5
deaths = np.array([0, 1, 3, 5])

First, let's declare our symbolic variables.


In [23]:
X = pt.tensor("X", shape=(None,))
Y = pt.tensor("Y", shape=(None,))
a = pt.tensor("a", shape=())
b = pt.tensor("b", shape=())

... then construct the expression graph:


In [24]:
# Probability that target = 1
p_1 = 1 / (1 + pt.exp(-(a + X * b)))
p_1.name = "prob_target_1"

# The prediction threshold
prediction = p_1 > 0.5

# Cross-entropy loss function
xent = -Y * pt.log(p_1) - (n - Y) * pt.log(1 - p_1)
xent.name = "cross-entropy"

# The cost to minimize
cost = xent.mean()

# Perform rewrites
stable_cost = pytensor.graph.rewrite_graph(cost, include=('canonicalize', 'stabilize'))

# Compute the gradient of the cost
ga, gb = pt.grad(stable_cost, [a, b])

# Learning rate
step = pt.tensor("step", shape=())

# Update the parameters
a_new = a - step * ga
b_new = b - step * gb
step_new = step * 0.99
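
The gradients that `pt.grad` derives symbolically can be cross-checked by hand. For the cost above, differentiating with respect to the linear predictor gives $n p_i - y_i$, so the gradient with respect to `a` is the mean of $n p_i - y_i$ and the gradient with respect to `b` is the mean of $(n p_i - y_i) x_i$. The NumPy sketch below is a standalone check of that derivation against central finite differences; the helper names `cost_fn` and `grad_fn` are made up for this example and are not part of the PyTensor graph.

```python
import numpy as np

dose = np.array([-0.86, -0.3, -0.05, 0.73])
deaths = np.array([0.0, 1.0, 3.0, 5.0])
n = 5


def cost_fn(a, b):
    # Mean binomial negative log-likelihood (constant term dropped)
    p = 1.0 / (1.0 + np.exp(-(a + b * dose)))
    return np.mean(-deaths * np.log(p) - (n - deaths) * np.log(1.0 - p))


def grad_fn(a, b):
    # Hand-derived gradient: d cost / d eta_i = n * p_i - y_i
    p = 1.0 / (1.0 + np.exp(-(a + b * dose)))
    resid = n * p - deaths
    return np.mean(resid), np.mean(resid * dose)


a0, b0, eps = 0.0, 1.0, 1e-6
ga, gb = grad_fn(a0, b0)

# Central finite differences as an independent check
ga_fd = (cost_fn(a0 + eps, b0) - cost_fn(a0 - eps, b0)) / (2 * eps)
gb_fd = (cost_fn(a0, b0 + eps) - cost_fn(a0, b0 - eps)) / (2 * eps)

print(ga, ga_fd)
print(gb, gb_fd)
```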

Recall that in order to use these expressions, we need to compile them into functions.

Below we compile two functions: `train()`, which performs a single gradient descent step and returns the updated parameters, and `predict()`, which makes predictions using the current parameter values.


In [25]:
train = pytensor.function(
    inputs=[X, Y, a, b, step],
    outputs=[prediction, xent, a_new, b_new, step_new],
)
predict = pytensor.function(inputs=[X, a, b], outputs=prediction)

Now we can train the model.


In [None]:
alpha, beta, lr = 0.0, 1.0, 10.0
training_steps = 1000
for i in range(training_steps):
    pred, err, alpha, beta, lr = train(dose, deaths, alpha, beta, lr)

print("Final model:", alpha, beta)

In [None]:
import plotly.graph_objects as go


def invlogit(x):
    # Inverse logit (logistic) function: maps log-odds to probabilities
    return 1.0 / (1 + np.exp(-x))


xvals = np.linspace(-1, 1)

go.Figure().add_trace(
    go.Scatter(
        x=xvals, y=invlogit(beta * xvals + alpha), mode="lines", name="Fitted Model"
    )
).add_trace(
    go.Scatter(
        x=dose,
        y=[d / n for d in deaths],
        mode="markers",
        marker=dict(color="red", size=10),
        name="Observed Data",
    )
).update_layout(
    title="Logistic Regression Model",
    xaxis_title="Log Dose",
    yaxis_title="Probability of Death",
    width=600,
).show()

## The PyMC API

Bayesian inference begins with specification of a probability model relating unknown variables to data. PyMC provides the basic building blocks for Bayesian probability models: stochastic random variables, deterministic variables, and factor potentials.

A **stochastic random variable** is a factor whose value is not completely determined by its parents, while the value of a **deterministic random variable** is entirely determined by its parents. Most models can be constructed using only these two variable types. The third quantity, the **factor potential**, is _not_ a variable but simply a log-likelihood term or constraint that is added to the joint log-probability to modify it.


### The Distribution Class

A stochastic variable is represented in PyMC by a `Distribution` class. This structure adds functionality to PyTensor's `pytensor.tensor.random.op.RandomVariable` class, mainly by registering it with an associated PyMC `Model`, so `Distribution` objects are only usable inside of a `Model` context.

`Distribution` subclasses (i.e. implementations of specific statistical distributions) will accept several arguments when constructed:

`name`
: Name for the new model variable. This argument is **required**, and is used as a label and index value for the variable.

`model`
: The PyMC model to which the variable belongs.

`shape`
: The variable's shape.

`total_size`
: The total size of the full dataset, used to rescale the log-probability of an observed variable when it is fit on minibatches of the data.

`dims`
: A tuple of dimension names known to the model.

`transform`
: A transformation to be applied to the distribution when used by the model, especially when the distribution is constrained.

`initval`
: Numeric or symbolic untransformed initial value of matching shape, or one of the following initial value strategies: "moment", "prior". Depending on the sampler's settings, a random jitter may be added to numeric, symbolic or moment-based initial values in the transformed space.

Sometimes we wish to use a particular statistical distribution without using it as a variable in a model; for example, to generate random numbers from the distribution. For this purpose, `Distribution` classes have a class method `dist` that returns a **stateless** probability distribution of that type; that is, one that is not registered as a random variable in any PyMC model.


In [None]:
import pymc as pm
import plotly.express as px

x = pm.Exponential.dist(1)
samples = pm.draw(x, draws=1000)

fig = px.histogram(samples, title="Exponential Distribution Samples")
fig.update_layout(xaxis_title="Value", yaxis_title="Count", showlegend=False)
fig.show();

## Building Models in PyMC

Now that we understand the basic building blocks of PyMC models, let's see how to combine them to build a complete model. We'll use a real-world example of predicting college basketball game outcomes.


### NCAA Basketball Matchup Model

College basketball is an immensely popular sport, and the annual "March Madness" NCAA tournament is a staple of the American sports calendar. Each year, millions of fans fill out NCAA tournament brackets, trying to predict the outcomes of each round of the knockout tournament.

We will use a dataset of NCAA basketball games from the 2017-18 season to build a model that predicts the outcome of a given game based on the strengths of the competing teams. Conceivably, this model could be used to help fill out a bracket, or to make a prediction for the outcome of individual games.

![2018 NCAA Tournament](https://i.turner.ncaa.com/sites/default/files/images/2018/03/11/2018-ncaa-tournament-bracket.jpg)

The dataset consists of two files:

- `ncaa_team_data.parquet`: a table of team statistics for each team in the dataset.
- `ncaa_game_data.parquet`: a table of game results, including the date, location, and outcome of each game.

The team data contains a variety of statistics for each team, including the number of games they won and lost, the number of points they scored and allowed, and a variety of other metrics. We can use these as predictors for game outcomes. The game data will be used to fit the model, as it contains the date, location, and outcome of each game.

Let's load the data and take a look at the first few rows of each table.


In [None]:
import polars as pl

team_data = pl.read_parquet('../data/ncaa_team_data.parquet')
team_data.head()

In [None]:
game_data = pl.read_parquet("../data/ncaa_game_data.parquet")
game_data.head()

As a target variable, we will use the margin of victory for the home team, which is positive if the home team wins and negative if the away team wins.

Let's take a look at the distribution of game margins.


In [None]:
fig = px.histogram(game_data, x="home_margin", nbins=40, title="Distribution of Game Margins")
fig.update_layout(
    xaxis_title="Game Margin (home)",
    yaxis_title="Count",
    bargap=0.1,
    width=600
)

The distribution looks roughly normal, centered slightly above zero. This reflects the "home court advantage" that is common in most sports, particularly basketball.

Thus it would seem reasonable to select a Gaussian likelihood for the model.


In [32]:
y = game_data['home_margin'].to_numpy()

Let's look at the set of potential predictor variables available in the team data.

The predictor variables used in the model are:

- **FG%**: Field Goal Percentage - percentage of field goals made
- **3P%**: Three-Point Percentage - percentage of three-point shots made
- **FT%**: Free Throw Percentage - percentage of free throws made
- **ORB**: Offensive Rebounds - number of rebounds on the offensive end
- **TRB**: Total Rebounds - total number of rebounds (offensive + defensive)
- **AST**: Assists - number of passes leading directly to a made basket
- **STL**: Steals - number of times the ball was taken from the opponent
- **BLK**: Blocks - number of shots blocked
- **TOV**: Turnovers - number of times the ball was lost to the opponent
- **PF**: Personal Fouls - number of fouls committed

Since many of these are on different scales, we will standardize them to have mean zero and unit variance. This will help with numerical stability and convergence of the MCMC algorithm, as well as making it easier to interpret the coefficients.


In [None]:
predictor_cols = ['FG%',
 '3P%',
 'FT%',
 'ORB',
 'TRB',
 'AST',
 'STL',
 'BLK',
 'TOV',
 'PF']

# Standardize the predictor columns in polars
X = team_data.select(
    [(pl.col(c) - pl.col(c).mean()) / pl.col(c).std() for c in predictor_cols]
)

fig = px.scatter_matrix(X, height=1000, width=1000)
fig.show()

The scatter plot is encouraging: there is no strong multicollinearity between the predictors.


### The Matchup Model

The idea behind this model is that the outcome of a game is determined by the relative strengths of the two teams playing. For each game $g$, we want to predict the score differential (home team score minus away team score). This expectation is characterized by the difference in latent variables representing the strengths of the competing teams:
$$ \delta_g = S_{\text{home}_g} - S_{\text{away}_g} $$

Also, recall that there is a home court advantage, so we need to account for the fact that the home team is expected to have a slight advantage over the away team. This is represented by the intercept $H$, which is the average home court advantage across all games. Thus, the expected score differential is given by:
$$ \mu_g = H + \delta_g $$

The team strength parameters $S$ will be calculated as a linear combination of the predictor variables.

$$ S_t = \sum_{p \in \text{predictors}} \beta_p \cdot X_{t,p} $$

We will proceed with the assumption that the observed score differential for each game $y_g$, follows a Normal distribution. The mean of this distribution is the `expected_margin` and its standard deviation is the `observation_error` ($\sigma$):

$$ y_g \sim \text{Normal}(\mu_g, \sigma) $$
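
The way these pieces combine can be traced by hand on a tiny made-up example. The NumPy sketch below uses three hypothetical teams, two standardized predictors, and four games; all of the numbers and the names `X_toy`, `beta_toy`, `home_adv`, `home_idx`, and `away_idx` are invented for illustration and have nothing to do with the NCAA data.

```python
import numpy as np

# Hypothetical standardized predictors: 3 teams x 2 predictors
X_toy = np.array([
    [1.0, 0.5],
    [0.0, -1.0],
    [-1.0, 0.5],
])
beta_toy = np.array([2.0, 4.0])  # one coefficient per predictor
home_adv = 3.0                   # home court advantage H

# Each game is a (home team index, away team index) pair
home_idx = np.array([0, 1, 2, 0])
away_idx = np.array([1, 2, 0, 2])

# S_t = sum_p beta_p * X[t, p]
strength = X_toy @ beta_toy

# mu_g = H + S_home(g) - S_away(g), computed for all games via indexing
mu = home_adv + strength[home_idx] - strength[away_idx]

print(strength)  # one strength per team
print(mu)        # one expected margin per game
```

Indexing the per-team strength vector with integer arrays of team IDs is what lets a single expression produce the expected margin for every game at once.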

**Priors**

Now that we have defined the general structure of the model, we can choose some priors.

For the predictor coefficients $\beta_p$, we will use a Normal distribution with a mean of 0 and a standard deviation of 10:

$$ \beta_p \sim \text{Normal}(0, 10) $$

The mean of 0 suggests no initial bias towards positive or negative impact, and a large standard deviation of 10 indicates a weak prior (allowing data to inform the values).

The home advantage $H$ is a single, global parameter representing the average point advantage for the home team across all games. Since we are confident that the home advantage is positive, we will use a Half-Normal distribution with a standard deviation of 10:

$$ H \sim \text{HalfNormal}(10) $$

Finally, the standard deviation of the game outcomes is the inherent variability or noise not explained by the predictors. It's drawn from a Half-Cauchy distribution, a common choice for scale parameters as it's restricted to positive values and has heavy tails, allowing for occasional large deviations:

$$ \sigma \sim \text{HalfCauchy}(1) $$
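
One way to get a feel for what these priors imply is to draw from them directly and look at the game margins they generate. The sketch below does this in plain NumPy rather than PyMC, using the facts that a Half-Normal draw is the absolute value of a Normal draw and a Half-Cauchy draw is the absolute value of a Cauchy draw. The single hypothetical game, in which the two teams differ by one standard deviation on the first predictor only, is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws, n_predictors = 1000, 10

# Draws from the three priors
beta_draws = rng.normal(0.0, 10.0, size=(n_draws, n_predictors))  # Normal(0, 10)
home_draws = np.abs(rng.normal(0.0, 10.0, size=n_draws))          # HalfNormal(10)
sigma_draws = np.abs(rng.standard_cauchy(size=n_draws))           # HalfCauchy(1)

# Hypothetical game: home team is one standard deviation better on
# the first predictor and identical on all others
x_diff = np.zeros(n_predictors)
x_diff[0] = 1.0

# Expected margin and simulated observed margin under the priors
mu_draws = home_draws + beta_draws @ x_diff
margin_draws = rng.normal(mu_draws, sigma_draws)

print(np.percentile(margin_draws, [5, 50, 95]))
```

If the simulated margins look wildly implausible for basketball, that is a hint that the priors may be wider than necessary, although the data will usually dominate in a dataset of this size.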

Let's implement this model in PyMC.

---

Using named dimensions for the predictor and team variables is not required, but it is recommended. It lets us pass the predictor names as coordinates to the model, so that we can use them later when working with the model output. We will include the college names, since the predictor variables are school-specific, as well as the predictor variable names.


In [34]:
coords = dict(
    predictor=predictor_cols,
    team=team_data["School"].to_list()
)

These are stored as a dictionary, which is passed to the `Model` constructor as the `coords` argument.

The first part of the model to define is the set of priors.


In [35]:
with pm.Model(coords=coords) as ncaa_model:
    
    # Predictor coefficients
    beta = pm.Normal('beta', 0, sigma=10, dims="predictor")

    # Home advantage
    home_advantage = pm.HalfNormal('home_advantage', sigma=10)

    # Observation error
    sigma = pm.HalfCauchy('sigma', 1)

## Parameter Transformation

To support efficient sampling by PyMC's MCMC algorithms, any continuous variables that are constrained to a sub-interval of the real line are automatically transformed so that their support is unconstrained. This frees sampling algorithms from having to deal with boundary constraints.

For example, if we look at the variables we have created in the model so far:

In [None]:
print(ncaa_model.value_vars)

The model's `value_vars` attribute lists the value variables, one per random variable, on which the model's log-probability is actually evaluated.

As the `_log__` suffix in their names suggests, the variables `sigma` and `home_advantage` have been log-transformed, and this is the space over which posterior sampling takes place. When a sample is drawn, the value of the transformed variable is simply back-transformed to recover the original variable.

By default, auto-transformed variables are ignored when summarizing and plotting model output, since they are not generally of interest to the user.
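
A log transform changes the density as well as the variable: if $u = \log x$, the density of $u$ is the density of $x$ evaluated at $e^u$, multiplied by the Jacobian $|dx/du| = e^u$, which adds $u$ on the log scale. The sketch below checks this for a Half-Normal with scale 10 using only the standard library. It is a hand-rolled illustration under those assumptions, not PyMC's internal code, and the function names are made up for the example.

```python
import math


def halfnormal_logpdf(x, sigma=10.0):
    """Log-density of a HalfNormal(sigma) at x >= 0."""
    return 0.5 * math.log(2.0 / math.pi) - math.log(sigma) - 0.5 * (x / sigma) ** 2


def logp_unconstrained(u, sigma=10.0):
    """Log-density of u = log(x) when x ~ HalfNormal(sigma)."""
    x = math.exp(u)                          # back-transform to the positive scale
    return halfnormal_logpdf(x, sigma) + u   # + log|dx/du|, which equals u


# Riemann sum over u in [-30, 6]: a valid density should integrate to about 1
step = 0.001
total = sum(
    math.exp(logp_unconstrained(-30.0 + i * step)) * step for i in range(36001)
)
print(total)
```

Without the `+ u` term the transformed function would no longer integrate to one, which is why the Jacobian adjustment is part of every automatic transform.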


### Deterministic Variables

A deterministic variable is one whose value is **completely determined** by the values of its parents.

In this model, the team strength parameters are defined as a linear combination of the predictor variables. Thus, we employ a `Deterministic` variable to represent them. Two things happen when a variable is created this way:

1. The variable is given a name (passed as the first argument)
2. The variable is appended to the model's list of deterministic variables. This will ensure that the variable's values are automatically stored in the model's trace.

In [34]:
with ncaa_model:
    
    team_strength = pm.Deterministic('team_strength', beta.dot(X.to_numpy().T), dims="team")

The expected margin is then the sum of the home advantage and the difference in team strengths. There is one value for every game in the dataset, so we may not want to store these values in the trace. Hence, we will not wrap them in `Deterministic`; instead they are included as *anonymous* variables that are computed on the fly and discarded at each iteration. This approach is only appropriate for intermediate values for which you do not need posterior estimates alongside the other variables in the model.

In [35]:
with ncaa_model:
    
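    # Integer team indices for the home and away side of every game
    home_idx = game_data['home_team_id'].to_numpy()
    away_idx = game_data['away_team_id'].to_numpy()

    expected_margin = home_advantage + team_strength[home_idx] - team_strength[away_idx]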

### Observed Random Variables

Stochastic random variables whose values are observed are treated differently from unobserved random variables. An observed random variable (called `ObservedRV` here, after the class name used in earlier PyMC releases) is created any time a stochastic variable is specified with data passed as the `observed` argument.

In our model, the observed game outcomes are represented by an `ObservedRV` using a normal random variable.

In [37]:
with ncaa_model:

    pm.Normal('outcome', expected_margin, sigma=sigma, observed=y)

An important responsibility of `ObservedRV` is to automatically handle missing values in the data, if there are any.

Here's the graph representation of the final model.


In [None]:
pm.model_to_graphviz(ncaa_model)

### Factor Potentials

For some applications, we want to be able to modify the joint density by incorporating terms that don't correspond to probabilities of variables conditional on parents. For example, suppose we have a two-group model comparing IQ scores under a drug and a placebo, with group means $\mu_1$ and $\mu_2$, and we want to constrain the difference between the placebo and drug means to be less than 10, so that the joint density becomes:

$$p(y,\nu,\mu_1,\mu_2, \sigma_1, \sigma_2) \propto p(y|\nu,\mu_1,\mu_2, \sigma_1, \sigma_2) p(\nu) p(\mu_1) p(\mu_2) p(\sigma_1) p(\sigma_2) I(|\mu_2-\mu_1| \lt 10)$$

We call such log-probability terms **factor potentials** (Jordan 2004).

A potential can be created via the `Potential` function, in a way very similar to `Deterministic`'s named interface:

```python
# `drug_model`, `treat_mean`, and `placebo_mean` are assumed to be defined elsewhere
with drug_model:

    diff_constraint = pm.Potential('diff_constraint', pm.math.switch(pm.math.abs(treat_mean-placebo_mean)>10, -np.inf, 0))
```

The function takes just a `name` as its first argument and an expression returning the appropriate log-probability as the second argument.
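
The effect of such a term on the joint log-probability can be seen without any PyMC machinery. In the sketch below, `diff_constraint_logp` is a hypothetical plain-Python stand-in for the `switch` expression, and `base_logp` is an arbitrary made-up value for the rest of the joint log-probability.

```python
import math


def diff_constraint_logp(treat_mean, placebo_mean, bound=10.0):
    """Log of the indicator: 0 inside the allowed region, -inf outside."""
    return -math.inf if abs(treat_mean - placebo_mean) > bound else 0.0


# Hypothetical joint log-probability before the constraint is applied
base_logp = -12.5

# A difference of 3 satisfies the constraint: the log-probability is unchanged
allowed = base_logp + diff_constraint_logp(103.0, 100.0)

# A difference of 15 violates it: the point gets zero probability
forbidden = base_logp + diff_constraint_logp(115.0, 100.0)

print(allowed, forbidden)
```

Adding `0` leaves the density untouched, while adding `-inf` rules a parameter value out entirely, which is exactly what multiplying the density by an indicator function does.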

A common use of a factor potential is to represent an observed likelihood, where the **observations are partly a function of model variables**. In the contrived example below, we are representing the error in a linear regression model as a zero-mean normal random variable. Thus, the "data" in this scenario is the residual, which is a function both of the data and the regression parameters.

If we represent this as a standard likelihood function (a `Distribution` with an `observed` keyword argument), we run into problems. This parameterization would not be compatible with an observed stochastic, because the `err` term would become fixed in the likelihood and not be allowed to change during sampling.


In [None]:
y_vals = np.array([15, 10, 16, 11, 9, 11, 10, 18, 11])
x_vals = np.array([1, 2, 4, 5, 6, 8, 19, 18, 12])

with pm.Model() as regression:
    sigma = pm.HalfCauchy("sigma", 5)
    beta = pm.Normal("beta", 0, sigma=2)
    mu = pm.Normal("mu", 0, sigma=10)

    err = y_vals - (mu + beta * x_vals)

    like = pm.Normal("like", 0, sigma=sigma, observed=err)

Instead, we can re-express the likelihood as a factor potential, which is a function of the data and the model parameters.


In [None]:
with pm.Model() as regression:
    sigma = pm.HalfCauchy("sigma", 5)
    beta = pm.Normal("beta", 0, sigma=2)
    mu = pm.Normal("mu", 0, sigma=10)

    err = y_vals - (mu + beta * x_vals)

    like = pm.Potential("like", pm.logp(pm.Normal.dist(0, sigma=sigma), err))

## Exercise: Bioassay Model

Let's return to the logistic regression model that we previously hand-coded in PyTensor.

If you recall, we are trying to predict the number of deaths in an acute toxicity test, which we model as a binomial response $y$, with the probability of death being a linear function of log-dose ($x$):

$$
\begin{aligned}
\text{logit}(p_i) &= a + b x_i \\
y_i &\sim \text{Bin}(n_i, p_i) \\
\end{aligned}
$$

Now construct this model using PyMC:

In [None]:
# Log dose in each group
log_dose = [-0.86, -0.3, -0.05, 0.73]

# Sample size in each group
n = 5

# Outcomes
deaths = [0, 1, 3, 5]

In [None]:
# Write your answer here

---

## References

1. Ching & Chen. 2007. Transitional Markov chain Monte Carlo method for Bayesian model updating, model class selection and model averaging. Journal of Engineering Mechanics.
2. Hoffman MD, Gelman A. 2014. The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. The Journal of Machine Learning Research. 15(1):1593-1623.
3. M.I. Jordan. 2004. Graphical models. Statist. Sci., 19(1):140–155.
4. Neal, R. M. 2003. Slice sampling. The Annals of Statistics, 31(3), 705–767. doi:10.1111/1467-9868.00198


In [None]:
%load_ext watermark
%watermark -n -u -v -iv -w