Permalink
Switch branches/tags
Nothing to show
Find file
Fetching contributors…
Cannot retrieve contributors at this time
595 lines (483 sloc) 20.2 KB
\documentclass[12pt]{article}
\usepackage{mathptmx}
\usepackage{courier} % use courier for typewriter font
%\renewcommand*\ttdefault{cmvtt} % use computer modern as typewriter
\usepackage[T1]{fontenc}
\usepackage{fullpage}
\usepackage{graphicx}
\usepackage[algoruled,linesnumbered]{algorithm2e}
\usepackage{amsmath,amsfonts,amsthm,amssymb}
\usepackage{wasysym}
\usepackage{color} % add more color
\usepackage{longtable}
\usepackage{url}
\usepackage[square,comma]{natbib}
\renewcommand{\cite}{\citep} % use the natbib style of citation
\usepackage[tight]{subfigure}
\usepackage{mdwlist}
%\usepackage[letterpaper,twoside,vscale=.8,hscale=.75,nomarginpar,hmarginratio=1:1]{geometry}
\usepackage[parfill]{parskip}
\usepackage{minted}
%\usemintedstyle{manni}
\usepackage{rotating}
\usepackage[hang, small, bf, margin=20pt, tableposition=top]{caption}
\setlength{\belowcaptionskip}{5pt}
%\usepackage{mathpazo}
\usepackage[colorlinks,pagebackref]{hyperref}
\usepackage[figure,table]{hypcap} % Correct a problem with hyperref
\hypersetup{citecolor={orange},linkcolor={blue}}
\usepackage{wrapfig}
\usepackage{array} % for the use with tabular
\usepackage{xspace}
\usepackage{verbatim}
\usepackage{fancyvrb}
\usepackage{multirow}
\usepackage{appendix}
\input{dfn.tex}
\title{The BLOG language syntax \\ {version 0.4}}
\author{Lei Li\\
Computer Science Division\\
University of California Berkeley\\
\email{leili@cs.berkeley.edu}\\
\and
Stuart Russell\\
Computer Science Division\\
University of California Berkeley\\
\email{russell@cs.berkeley.edu}
}
\date{\today}
%\Year{2011}
\begin{document}
\maketitle
\begin{abstract}
This document introduces the syntax of BLOG, a probabilistic programming language.
BLOG defines probabilistic generative models over first-order structures.
BLOG has the following features:
(a) it employs an open-universe semantics;
(b) it can describe relational uncertainty;
(c) it can handle identity uncertainty;
(d) it is empowered by first-order logic.
The particular syntax covered in this document corresponds to BLOG version 0.4. The current version includes significant redesign and extension to the previous versions of BLOG, based on the principles of usability and implementation efficiency.
\end{abstract}
\tableofcontents
\section{Basic language concepts}
A \bl program consists a list of statements.
Each statement ends with semicolon(;).
Statements include
\begin{enumerate*}
\item Type declaration;
\item Fixed function declaration;
\item Distinct symbol declaration;
\item Random function declaration;
\item Origin function declaration;
\item Number statement;
\item Evidence statement;
\item Query statement.
\end{enumerate*}
Each \bl program defines a set of random variables and their probabilistic dependencies.
A very toy example of defining a random variable in \bl is:
\begin{minted}{blog}
random Real x ~ Gaussian(0, 1);
\end{minted}
which states that $x$ is distributed according to the standard normal distribution.
\section{Declaring types}
\bl is a strongly typed language. To declare a type in \bl:
\begin{minted}{blog}
type typename;
\end{minted}
\section{Fixed functions}
To declare a function with fixed interpretation for all satisfying possible worlds:
\begin{minted}{blog}
fixed type0 funcname(type1) = e;
\end{minted}
The statement defines a function with name \texttt{funcname} with one argument of type, \texttt{type1}, and return type, \texttt{type0} . The function body \texttt{e} must be a fixed/nonrandom expression, which will be introduced later.
\reminder{seems we should disallow this, to make it easier for parsing and closer to java syntax.}
We could declare function and its function body separately as well.
\begin{minted}{blog}
fixed type0 funcname(type1);
funcname(type1 v) = e;
\end{minted}
Functions can have zero or multiple arguments as well. Functions without arguments are constants.
\subsection{Constants}
A {\em constant} is a nonrandom zero-ary function.
\begin{minted}{blog}
fixed type name = nonrandom-expression;
\end{minted}
Where the {\tt nonrandom-expression} will be described in Section~\ref{sec:expression}.
For example:
\begin{minted}{blog}
fixed Real a = 1.0;
fixed Boolean b = true;
\end{minted}
Another example involves a bit more on expression.
\begin{minted}{blog}
fixed Real b = 1.0 + a;
\end{minted}
Such names can be referred anywhere fixed zero-ary function could appear.
For example:
\begin{minted}{blog}
random NaturalNum x ~ Poisson(a);
\end{minted}
\subsection{Distinct ``object'' symbols}
There is a special type of functions, distinct symbols.
Distinct symbols are fixed zero-ary functions without function bodies.
\begin{minted}{blog}
distinct type name1, name2, name3, ...;
\end{minted}
It defines several symbols of name, \texttt{name1}, \texttt{name2}, \texttt{name3} \dots.
These symbols will have fixed interpretation in all satisfying possible worlds. In addition, all these symbols will have \emph{different} interpretations.
We can have multiple \texttt{distinct} statements in one blog, and all those symbols for the same type will have
distinct interpretation in possible worlds. In this sense, these symbols are equivalent to ``objects'' in model structures.
\subsection{Build-in distinct symbols}
There are predefined distinct symbols: all integers, all real numbers, and text strings, e.g. 1, 3.14, ``hello''. These predefined symbols are also called {\em literals}.
\section{Random functions}
Random functions may have different interpretations across possible worlds. Random functions are defined in the similar way as fixed functions, but with \texttt{random} keyword.
To declare a random function:
\begin{minted}{blog}
random type0 funcname(type1 x);
\end{minted}
The statement defines a random function with name \texttt{funcname} with one argument of type, \texttt{type1}, and return type, \texttt{type0} . The function body can be declared separately.
\begin{minted}{blog}
funcname(x) = deterministic-expression;
funcname(x) ~ Distribution;
funcname(x) dependency-expression;
\end{minted}
Each of the type will be described in later sections.
\section{Origin functions}
Origin functions are used in number statement to generate objects in possible worlds.
\begin{minted}{blog}
origin type0 funcname(type1);
\end{minted}
\section{Parameters}
A parameter is only declared but not initialized with a particular value. Its value can be learned in the inference engine. To declare a parameter
\begin{minted}{blog}
param type1 name(type2, ...);
param type1 name(type2, ...) : condition;
\end{minted}
This declares a parameter that should satisfy the optional condition. \texttt{condition} is a nonrandom expression that returns a Boolean value.
Note here a parameter can be a function with argument of \texttt{type2, ...}, with the restriction that \texttt{type2, ...} should only involve types with distinct statements and number statements on nonrandom expression or param.
\reminder{a bit ugly}.
Learnable parameters can be used in complex expressions. For example:
\begin{minted}{blog}
param Real a;
random Real x ~ Gaussian(2 ^ a, 1.0);
random Real y ~ Gaussian(a, 1.0);
\end{minted}
We could specifying the range of a parameter, for example:
\begin{minted}{blog}
param Real a: 0 < a & a < 10;
param Real b: b > 1;
random Real x ~ Gaussian(a, b);
\end{minted}
Commonly used parameters are often without arguments. However, the syntax does not prevent specifying infinite parameters like:
\begin{minted}{blog}
param Real mu(Int);
\end{minted}
%\hide{
\subsection{Declaring number param}
\reminder{Seems this will create problem to the EM algorithm}
A special case of parameter would the number statement. For example,
\begin{minted}{blog}
type Person;
param NaturalNum #Person;
param Real TrueHeight(Person p) : TrueHeight(p) > 0;
random Real MeasuredHeight(Person p) ~ Gaussian(TrueHeight(p), 1.0);
\end{minted}
We will describe the notation for distributions in Section~\ref{sec:distribution}. For the moment, we use Gaussian as a notation for Gaussian distributions.
This example defines a set of parameters, and its size itself is a parameter.
Note that since \texttt{TrueHeight} takes \texttt{Person} as an argument, we should not allow \texttt{\#Person} to be generated from a random distribution.
haha
%}
\section{Expressions}
\label{sec:expression}
An general expression can include both nonrandom and random terms.
It is defined as
\begin{itemize}
\item a \emph{nonrandom expression};
\item a proper reference to an element in {\tt Array} or {\tt List}, with index of general expression\;
\item random function with a \emph{general expression} as arguments {\tt randomfun(p1, p2, \dots)},
where {\tt p1, p2, \dots} are also \emph{general expressions};
\item arithmetic expression on numerical type: \texttt{e1 + e2}, \texttt{e1 - e2}, \texttt{e1 * e2}, \texttt{e1 / e2}, \texttt{e1 \^{} e2}, \texttt{- e1}, \texttt{+ e1}, \texttt{- e1}, \texttt{(e1)}, where \texttt{e1} and \texttt{e2} are also \emph{general expression} of type \texttt{NaturalNum} or \texttt{Real}.
\item logic expression on Boolean type: \texttt{e1 \& e2}, \texttt{e1 | e2}, \texttt{! e1}, \texttt{(e1)} where
\texttt{e1} and \texttt{e2} are also \emph{general expression} of type \texttt{Boolean}.
\item relational expression: \texttt{e1 > e2}, \texttt{e1 >= e2}, \texttt{e1 < e2}, \texttt{e1 <= e2}, where \texttt{e1} and \texttt{e2} should be general expression with value type \texttt{NaturalNum} or \texttt{Real}; \texttt{e1 == e2}, \texttt{e1 != e2} can be any general expression.
\item proper reference to a distribution, see Section~\ref{sec:distribution}.
\end{itemize}
\paragraph{Fixed expression}
Expression without any appearance of random function symbols.
Example:
\begin{minted}{blog}
1.0 + 2.0 * 3.0
a - 2.0
Twice(10.0) * 5.5
\end{minted}
Where \texttt{Twice(\mycdot)} is declared as
\begin{minted}{blog}
fixed Real Twice(Real x) = x * 2;
\end{minted}
\section{Array type}
to declare an array type
\begin{minted}{blog}
type[size]
\end{minted}
where size should be a natural number or an \emph{expression} that returns a natural number.
\subsection{Constant array}
To declare an constant array
\begin{minted}{blog}
fixed type[size] name = Array_Constructor;
\end{minted}
For example, to declare an array with natural numbers:
\begin{minted}{blog}
fixed NaturalNum[10] c = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
\end{minted}
When referring to an element in the array, it could be referred as \texttt{c[0], c[1], c[2]}, etc.
\subsection{Variable size}
The size of an array is usually a constant. However it can also be random value.
For example:
\begin{minted}{blog}
random NaturalNum n ~ Poisson(10.0) + 1;
random Real[n] x;
\end{minted}
\subsection{Multi-dimensional array}
Since \texttt{Array} is a type, itself could be nested in Array declaration, thus yielding multi-dimensional array.
\begin{minted}{blog}
fixed type[size1][size2] table = [...];
\end{minted}
For example, a two dimensional array of int will be
\begin{minted}{blog}
fixed NaturalNum[2][3] table = [[1, 2, 3], [4, 5, 6]];
\end{minted}
The following syntax in short hand is also correct:
\begin{minted}{blog}
fixed NaturalNum[2][3] table = [1, 2, 3; 4, 5, 6];
\end{minted}
For example, a transition matrix in Kalman filters with Newton dynamics can be declared as:
\begin{minted}{blog}
fixed Real[2][3] A = [1, 1, .5; 0, 1, 1; 0, 0, 1];
\end{minted}
An element in such a dimensional array can be referred as \texttt{A[0][0]}.
\hide{By convenience, a shorthand notation could also be used
\texttt{
Array<type>[size1][size2]
}
Or could be {\tt type[size1][size2]}
}
\subsection{Size of an array}
There are two special functions obtain the size of an array.
\texttt{length(\mycdot)} and \texttt{size(\mycdot)}.
\texttt{length(\mycdot)} returns the length of one dimensional array, while \texttt{size(\mycdot)} returns the lengths at all dimensions.
In the above example, \texttt{length(table)} will be 3, and \texttt{size(table)} will be \texttt{[2, 3]}.
\subsection {Array of Parameters}
We could define an array of parameters using
\begin{minted}{blog}
param type[size] name;
\end{minted}
where {\tt size} should be a constant natural number.
\begin{minted}{blog}
param Real[10] theta;
\end{minted}
\section{List}
The declaration of a list:
\begin{minted}{blog}
List
List<type>
\end{minted}
A list is an array with indefinite length (like java).
When constructing list, one could use
\begin{minted}{blog}
fixed List<NaturalNum> a = List(1, 2, 3, 4, 5, 6);
\end{minted}
Or simply using the short notation
\begin{minted}{blog}
fixed List<NaturalNum> a = [1, 2, 3, 4, 5, 6];
\end{minted}
\subsection{Constant list literals}
As we already seen, we could use the square brackets, \texttt{[]} to denote constant list literals.
Elements in a list is separated by a comma (,). Lists can be nested within a list. A shorthand notation is to use semicolon (;) to separated multiple lists within a list.
The following two lists are regarded equivalent.
\begin{minted}{blog}
[1, 2, 3; 4, 5, 6];
[[1, 2, 3], [4, 5, 6]];
\end{minted}
Constant list literals are used to assign values to an array or passing parameters to a function.
\section{Map}
\begin{minted}{blog}
Map<type1, type2>
\end{minted}
internally it is defined as (non)random function.
e.g.
\begin{minted}{blog}
fixed Map<Boolean, Real> map1 = {true -> 0.3, false -> 0.7};
\end{minted}
or equivalently
\begin{minted}{blog}
fixed Map<Boolean, Real> map2 = Map(true -> 0.3, false -> 0.7);
\end{minted}
A Map's key must be some constant, while its value can be evaluated value of a \emph{general expression}, as long as the type matches.
\begin{minted}{blog}
random Real x ~ Uniform(0,1);
random Real y = 1-x;
random Map<Boolean, Real> map2 = Map(true -> x, false -> y);
random Map<Boolean, Real> map3 = {true -> x^2, false -> y/2};
\end{minted}
In addition, \texttt{type2} in a map can be the \texttt{Distribution} type, which will be introduced in Section~\ref{sec:distribution}.
\subsection{Multi-dimensional map}
The type in a map can be an array, which will result in a multi-dimensional map.
For example,
\begin{minted}{blog}
fixed Map<Int[2], Real> map4 =
{[1, 1] -> 0.1, [1, 2] -> 0.3, [2, 1] -> 0.2, [2, 2] -> 0.4};
\end{minted}
This will be useful in creating TabularCPD (see later sections) with multiple parent variables.
\section{Elementary probability distribution and conditional probability distribution}
\label{sec:distribution}
The internal logic is, \texttt{Categorical} and \texttt{TabularCPD} are sub-type of \texttt{Distribution}.
\subsection{Categorical distribution as defined by probability mass table}
\begin{minted}{blog}
Distribution<type> name = Categorical(Map<type, Real>);
\end{minted}
It defines probability mass on type, with probability in the Map.
The first \texttt{Categorical} is type declaration, while the second is constructor.
The probability should sum up to 1.0, otherwise it will by default add an entry null -> residual probability.
For example:
\begin{minted}{blog}
Distribution<Boolean> cpd1 = Categorical({true -> 0.3, false -> 0.7});
\end{minted}
or in short, it can be simplified with
\begin{minted}{blog}
Distribution cpd1 = Categorical({true -> 0.3, false -> 0.7});
\end{minted}
If the probabilities sum to more than 1.0, the BLOG compiler will produce a runtime error.
\texttt{Array} can be viewed as a subtype of \texttt{Map}, therefore the following statement is also allowed.
\begin{minted}{blog}
Distribution<type> name = Categorical(Array<Real>);
\end{minted}
\subsection{Elementary Distribution}
To declare a distribution:
\begin{minted}{blog}
Distribution<type0> name(type1, type2, ...)
\end{minted}
where {\tt type0} is the value type of the random variable, while {\tt type1, type2, \dots} are types of parameters.
For example,
\begin{minted}{blog}
Distribution<Real> Gaussian(Real, Real)
\end{minted}
Distributions in the standard library can be used even without declaration. For example:
\begin{minted}{blog}
random Real x ~ Gaussian(0, 1.0);
\end{minted}
The Bernoulli can be defined as
\begin{minted}{blog}
Distribution<Boolean> Bernoulli(Real p) =
Categorical<Boolean>({true -> p, false -> 1-p});
\end{minted}
However, customer written distribution should be declared first.
When referring a distribution, valid parameters can be \emph{general expressions}.
For example:
\begin{minted}{blog}
random Real mu ~ uniform(0, 1);
random Real x ~ Gaussian(mu, 1.0);
\end{minted}
\subsection{TabularCPD}
To declare and construct a tabular conditional probability distribution,
\begin{minted}{blog}
TabularCPD<type1> name(type2 v) =
TabularCPD(Map<type2, Distribution<type1>>, v);
\end{minted}
Without introducing confusion, it can be declared concisely,
\begin{minted}{blog}
TabularCPD<type1> name = TabularCPD(Map<type2, Distribution<type1>>);
\end{minted}
For example:
\begin{minted}{blog}
Categorical<Boolean> cpd1 = Categorical({true -> 0.3, false -> 0.7});
Categorical<Boolean> cpd2 = Categorical({true -> 0.6, false -> 0.4});
TabularCPD<Boolean> cpd3(Boolean x) =
TabularCPD({true -> cpd1, false -> cpd2}, x);
\end{minted}
It can be referred using
\begin{minted}{blog}
random Boolean y ~ Bernoulli(0.2);
random Boolean x ~ cpd3(y);
\end{minted}
It is equivalent to write:
\begin{minted}{blog}
random Boolean x(Boolean y)
~ TabularCPD({true -> cpd1, false -> cpd2})(y);
\end{minted}
With the comprehension, we could even declare conditional mixture of Gaussian very easily. For example:
\begin{minted}{blog}
TabularCPD<Real> mixGaussian1(Int a) =
TabularCPD({0 -> Gaussian(5, 1.0), 1 -> Gaussian(10, 1.0)}, a);
random Int z ~ Categorical({0 -> 0.4, 1 -> 0.6});
random Real x ~ mixGaussian1(z);
\end{minted}
\subsubsection{Multiple dependent variables}
If a CPD is dependent on several parent variables, a multi-dimensional map will be used
\begin{minted}{blog}
TabularCPD<Real> mixGaussian2(Int x, Int y) =
TabularCPD({[0, 0] -> Gaussian(5, 1.0),
[0, 1] -> Gaussian(10, 1.0),
[1, 0] -> Gaussian(2, 4.0),
[1, 1] -> Gaussian(20, 4.0)}, [x, y]);
\end{minted}
\hide{
\reminder{Alternative design, which one do you like?? dropped...}
\begin{minted}{blog}
TabularCPD<Real> mixGaussian2(Int x, Int y) =
TabularCPD({[0, 0] -> Gaussian(5, 1.0),
[0, 1] -> Gaussian(10, 1.0),
[1, 0] -> Gaussian(2, 4.0),
[1, 1] -> Gaussian(20, 4.0)}, x, y);
\end{minted}
}
\subsubsection{Tree conditionals}
If a CPD is based on a map from key to a TabularCPD, it will be a hierarchical \emph{tree conditional}.
The following example will essentially construct the same distribution as above.
\begin{minted}{blog}
TabularCPD<Real> mixGaussian3(Int x) =
TabularCPD({0 -> Gaussian(2, 4.0), 1 -> Gaussian(20, 4.0)}, x);
TabularCPD<Real> mixGaussian4(Int x, Int y) =
TabularCPD({0 -> mixGaussian1(y), 1 -> mixGaussian3(y)}, x);
\end{minted}
In general there can be multiple hierarchies.
\subsection{Reference to Distribution}
If a distribution is referred in a \emph{general expression}, it is regarded as a random variable and will take a sample as the value.
For example,
\begin{minted}{blog}
random Real x = 1 + Gaussian(0, 1.0);
\end{minted}
It has the same semantics as
\begin{minted}{blog}
random Real y ~ Gaussian(0, 1.0);
random Real x = 1 + y;
\end{minted}
\texttt{\~{}} is used when the right side is a distribution, while \texttt{=} is used when the right side is a general expression.
\reminder{Shaunak suggests using \texttt{=} primarily, and \texttt{\~{}} only in certain cases that are not clear from context.}
\section{Dependency Statement}
\section{Import user libraries}
\subsection{Import blog program}
\begin{minted}{blog}
import path;
\end{minted}
Where \texttt{path} is a java style package description.
For example, to import urn-ball.blog:
\begin{minted}{blog}
import urn-ball;
\end{minted}
\subsection{Referring to external library}
\begin{minted}{blog}
extern blog.distribution.*;
\end{minted}
It will make all implementation of Distribution visible to blog inference engine.
\section{Observations}
%\subsection{Observability}
\section{Query and inference guidance}
\section{Types and Functions for DBLOG}
\begin{itemize}
\item {\tt Timestep}: a type to denote time tick in DBLOG.
\item {\tt Prev(Timestep)}: returns previous time step
\end{itemize}
\section{User defined distribution}
\reminder{TO-DO, how to define pdf and sampling function?}
\end{document}