/** \defgroup Tutorial Getting started
A TMB project consists of an R file (*.R) and a C++ file (*.cpp).
The R file does pre- and post-processing of data in addition to maximizing
the log-likelihood contained in *.cpp. See \ref Examples for more details.
All R functions are documented within the standard help system in R.
This tutorial describes how to write the C++ file, and assumes familiarity with C++ and to some extent with R.
The purpose of the C++ program is to evaluate the objective function, i.e. the negative log-likelihood
of the model. The program is compiled and called from R, where
it can be fed to a function minimizer like nlminb().
The objective function should be of the following C++ type:
\code
template<class Type>
Type objective_function<Type>::operator() ()
{
  // ... your C++ code goes here ...
}
\endcode
Note that the objective function is a templated class where <tt><Type></tt> is
the data type of both the input values and the return value of the objective function.
The reason for allowing different data types is that the same chunk of C++ code
can evaluate both the objective function and its derivatives, by calling
it with different values for "Type". The technical aspects
of this are hidden from the user. There is, however, one aspect
that often surprises new TMB users: when a constant like "1.2" is used
in a calculation that affects the return value, it must be "cast" to Type:
\code
Type nll; // Define variable that holds the return value (neg. log. lik)
nll = Type(1.2); // Assign value 1.2; a cast is needed.
\endcode
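To make the role of "Type" concrete, here is a minimal standalone sketch that does not use TMB. The struct \c Dual and the function \c f are hypothetical names introduced only for illustration; \c Dual is a toy forward-mode derivative type and is not how TMB implements differentiation. The constructor is marked \c explicit so that, as in the text above, a constant has to be written as <tt>Type(1.2)</tt>.

```cpp
// Hypothetical toy type: a value together with its derivative.
// It only illustrates the idea; it is not TMB's AD type.
struct Dual {
  double val;  // function value
  double der;  // derivative with respect to the input
  // 'explicit' means a plain constant must be written as Dual(1.2)
  explicit Dual(double v, double d = 0.0) : val(v), der(d) {}
};

Dual operator+(const Dual& a, const Dual& b) {
  return Dual(a.val + b.val, a.der + b.der);  // sum rule
}

Dual operator*(const Dual& a, const Dual& b) {
  return Dual(a.val * b.val, a.der * b.val + a.val * b.der);  // product rule
}

// Written once, usable with several types: f(x) = 1.2 * x * x + x
template <class Type>
Type f(Type x) {
  return Type(1.2) * x * x + x;  // the constant is cast to Type
}
```

Calling <tt>f(2.0)</tt> evaluates the function with \c double, while <tt>f(Dual(2.0, 1.0))</tt> runs the same code and carries the derivative along: the result has \c val 6.8 and \c der 5.8, matching f'(x) = 2.4 x + 1 at x = 2.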
<h2>Obtaining data and parameter values from R</h2>
We need to pass both data and parameter values to the objective function.
This is done through a set of macros that TMB defines for us. To see
which macros are available, start typing <tt>DATA_</tt> or <tt>PARAMETER_</tt>
in the Doxygen search field of your browser (you may need to refresh the browser window
between searches). A simple example, if you want to read a vector
of numbers (doubles), is the following:
\code
DATA_INTEGER(n); // Length of "x"
DATA_VECTOR(x); // Vector x(0),x(1),...,x(n-1)
\endcode
Note that all vectors and matrices in TMB use a \b zero-based indexing scheme.
It is not necessary to explicitly pass the dimension of "x", but it is often convenient.
The dimension of x is set on the R side when the C++ program is called, and
can be retrieved inside the C++ program, for example with <tt>x.size()</tt>.
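The following standalone sketch shows zero-based indexing combined with a length that is queried from the vector instead of being passed separately. It assumes \c std::vector as a stand-in for the TMB vector type so that it compiles without TMB; the function name \c sum_of_squares is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sum of squares of a zero-based vector. The length comes from the
// container itself, so no separate "n" argument is needed.
template <class Type>
Type sum_of_squares(const std::vector<Type>& x) {
  Type total = Type(0);
  for (std::size_t i = 0; i < x.size(); ++i) {  // valid indices: 0 .. size()-1
    total += x[i] * x[i];
  }
  return total;
}
```

For a vector holding 1, 2 and 3 the function returns 14. The loop bound is the main point: the last valid index is <tt>size() - 1</tt>, not <tt>size()</tt> as it would be in R.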
<h2>An extended C++ language</h2>
TMB extends C++ with functionality that is important for formulating
likelihood functions. You have three toolboxes available:
- Standard C++ used for infrastructure like loops etc.
- Vector, matrix and array library (see \ref matrix_arrays)
- Probability distributions (see \ref Distributions)
In addition to the variables defined through the <tt>DATA_</tt> or <tt>PARAMETER_</tt>
macros there can be "local" variables, for which ordinary C++ scoping rules apply.
There must also be a variable that holds the return value (neg. log. likelihood).
\code
DATA_VECTOR(x); // Vector x(0),x(1),...,x(n-1)
Type tmp = x(1);
Type nll=tmp*tmp;
\endcode
As in ordinary C++, the local variable tmp must be assigned a value before it can enter into
a calculation.
<h2>Statistical modelling</h2>
TMB can handle complex statistical problems with hierarchical structure (latent
random variables) and multiple data sources. Latent random variables must be continuous
(discrete distributions are not handled). The <tt>PARAMETER_</tt> macros are used to pass
two types of parameters.
- \b Parameters: to be estimated by maximum likelihood. These include fixed effects and variance
components in the mixed model literature. They will also correspond to hyperparameters
with non-informative priors in the Bayesian literature.
- \b Latent \b random \b variables: to be integrated out of the likelihood using a Laplace approximation.
Which role each parameter plays is controlled from R, and is not specified in C++. However,
for a latent random variable it is usually necessary to assign a probability distribution,
which is done in the C++ file.
The purpose of the C++ program is to calculate the negative log of the joint density of data and latent
random variables. Each datum and each individual latent random variable contributes
to the log-likelihood, which may be thought of as a "distributional assignment" by users
familiar with software in the BUGS family.
\code
PARAMETER_VECTOR(u); // Latent random variable
Type nll = Type(0); // Return value
nll -= dnorm(u(0), Type(0), Type(1), true); // Distributional assignment: u(0) ~ N(0,1); "true" requests the log-density
\endcode
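As a rough standalone illustration of how contributions accumulate into the negative joint log-density, the sketch below combines one latent variable with a data vector. It does not use TMB: \c log_dnorm and \c joint_nll are hypothetical helpers written for this example, \c std::vector stands in for the TMB vector type, and the model (u ~ N(0,1), y[i] given u ~ N(u, sigma)) is an assumption chosen for simplicity.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a normal log-density (not TMB's dnorm).
template <class Type>
Type log_dnorm(Type x, Type mean, Type sd) {
  using std::log;
  const Type half_log_2pi = Type(0.9189385332046727);  // 0.5 * log(2 * pi)
  Type z = (x - mean) / sd;
  return -half_log_2pi - log(sd) - Type(0.5) * z * z;
}

// Negative joint log-density of data y and a single latent variable u:
//   u ~ N(0, 1)   and   y[i] | u ~ N(u, sigma)
template <class Type>
Type joint_nll(const std::vector<Type>& y, Type u, Type sigma) {
  Type nll = Type(0);
  nll -= log_dnorm(u, Type(0), Type(1));  // contribution of the latent variable
  for (std::size_t i = 0; i < y.size(); ++i) {
    nll -= log_dnorm(y[i], u, sigma);     // contribution of each datum
  }
  return nll;
}
```

Every term is subtracted on the log scale, so the order of the statements does not affect the result. This is why a distributional assignment does not have to precede the first use of the latent variable.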
The following rules apply:
- Distribution assignments do not need to take place before the latent variable
is used in a calculation.
- More complicated distributional assignments are allowed, say u(0)-u(1) ~ N(0,1),
but this requires the user to have a deeper understanding of the probabilistic aspects of the model.
- For latent variables only normal distributions should be used (otherwise
the Laplace approximation will perform poorly).
- The library \ref Distributions contains many probability distributions, especially
multivariate normal distributions. For probability distributions not contained
in the library, the user can use raw C++ code. Due to the above rule that
latent variables should be normally distributed, this is only relevant for the response distribution.
\code
DATA_VECTOR(y); // Data vector
Type nll = Type(0); // Return value
nll -= log(Type(0.5)) - abs(y(0)); // y(0) has a standard Laplace (double exponential) distribution
\endcode
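A hand-written density is often easier to reuse when wrapped in a small templated function. The sketch below generalises the line above to a Laplace distribution with location \c mu and scale \c b; the function name \c log_dlaplace and its argument order are hypothetical and not part of the TMB library. It is written against the standard C++ math functions so that it compiles without TMB.

```cpp
#include <cmath>

// Log-density of the Laplace (double exponential) distribution with
// location mu and scale b > 0:
//   log f(y) = -log(2 b) - |y - mu| / b
template <class Type>
Type log_dlaplace(Type y, Type mu, Type b) {
  using std::fabs;
  using std::log;
  return -log(Type(2) * b) - fabs(y - mu) / b;
}
```

With <tt>mu = 0</tt> and <tt>b = 1</tt> this reduces to <tt>log(0.5) - |y|</tt>, the expression used in the example above, so the contribution of one observation would be subtracted from the negative log-likelihood in the same way.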
See \ref Examples for more examples.
*/
/** \defgroup matrix_arrays Matrices and arrays
\section Relation_R Relationship to R
In R you can apply both matrix multiplication ("%*%")
and elementwise multiplication ("*") to objects of
type "matrix", i.e. it is the operator that determines the operation.
In TMB we instead have different types of objects, while
the multiplication operator "*" is the same:
- \c matrix: linear algebra
- \c array: elementwise operations; () and [] style indexing.
- \c vector: can be used in linear algebra with \c matrix, but at the same
time admits R style element-wise operations.
See \ref matrix_arrays.cpp "matrix_arrays.cpp" for examples of use.
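The two meanings of multiplication can be spelled out in plain C++ for the 2 x 2 case. This sketch is independent of TMB and Eigen: the alias \c Mat2 and the functions \c matrix_product and \c elementwise_product are hypothetical names used only to show which computation each object type selects when "*" is applied.

```cpp
#include <array>

using Mat2 = std::array<std::array<double, 2>, 2>;

// Linear-algebra product, as for objects of type "matrix":
//   C(i,j) = sum over k of A(i,k) * B(k,j)
Mat2 matrix_product(const Mat2& a, const Mat2& b) {
  Mat2 c{};  // value-initialised, so all entries start at zero
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 2; ++k)
        c[i][j] += a[i][k] * b[k][j];
  return c;
}

// Elementwise product, as for objects of type "array":
//   C(i,j) = A(i,j) * B(i,j)
Mat2 elementwise_product(const Mat2& a, const Mat2& b) {
  Mat2 c{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      c[i][j] = a[i][j] * b[i][j];
  return c;
}
```

For A with rows (1, 2), (3, 4) and B with rows (5, 6), (7, 8), the matrix product has rows (19, 22), (43, 50), whereas the elementwise product has rows (5, 12), (21, 32).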
\section Relation_Eigen Relationship to Eigen
The TMB types \c matrix and \c array inherit from the Eigen
types Matrix and Array. The advanced user of TMB will benefit from familiarity with
the <a href="http://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html">Eigen documentation</a>.
*/
/** \defgroup Distributions Probability distributions
\brief TMB contains several classes of probability distributions. These are organized into C++ namespaces
as shown below.
*/
/** \defgroup Examples
For a list of all examples, please click on the "Examples" tab at
the top of the page.
Simple example:
\include simple.cpp
\example linreg.cpp
\example matrix_arrays.cpp
\example nmix.cpp
\example orange_big.cpp
\example rw.cpp
\example rw_sparse.cpp
\example sdv_multi.cpp
\example socatt.cpp
\example spatial.cpp
\example ar1xar1.cpp
\example atomic.cpp
\example randomregression.cpp
\example simple.cpp
\example sumtest.cpp
*/