
Made the first corrections suggested by reviewer 1. Added a table of contents.

git-svn-id: https://pymc.googlecode.com/svn/trunk@1310 15d7aa0b-6f1a-0410-991a-d59f85d14984
commit 005fd0521acfb3a1fd293c7034ec3d69ca60d730 1 parent 97bc070
david.huard authored
2  INSTALL.rst
@@ -220,6 +220,6 @@ the `issue tracker`_. Comments and questions are welcome and should be
addressed to PyMC's `mailing list`_.
-.. _`issue tracker`: http://code.google.com/p/pymc/issues/list .
+.. _`issue tracker`: http://code.google.com/p/pymc/issues/list
.. _`mailing list`: pymc@googlegroups.com
4 docs/INSTALL.tex
@@ -200,7 +200,7 @@ \section{Running the test suite}
In case of failures, messages detailing the nature of these failures will
appear. In case this happens (it shouldn't), please report
-the problems on the \href{http://code.google.com/p/pymc/issues/list.}{issue tracker}, specifying the version you are using and
+the problems on the \href{http://code.google.com/p/pymc/issues/list}{issue tracker}, specifying the version you are using and
the environment.
@@ -211,6 +211,6 @@ \section{Bugs and feature requests}
\label{bugs-and-feature-requests}
Report problems with the installation, bugs in the code or feature request at
-the \href{http://code.google.com/p/pymc/issues/list.}{issue tracker}. Comments and questions are welcome and should be
+the \href{http://code.google.com/p/pymc/issues/list}{issue tracker}. Comments and questions are welcome and should be
addressed to PyMC's \href{mailto:pymc@googlegroups.com}{mailing list}.
3  docs/jss/jss_article.tex
@@ -87,7 +87,8 @@
%\maketitle
-%\tableofcontents
+\tableofcontents
+\cleardoublepage
%\linenumbers
\section[Introduction]{Introduction}
\label{sec:intro}
12 docs/pymc.bib
@@ -214,3 +214,15 @@ @article{Christakos:2002p5506
Journal = {Advances in Water Resources},
Title = {{On the assimilation of uncertain physical knowledge bases: Bayesian and non-Bayesian techniques}},
Year = {2002}}
+
+@book{Lutz:2007,
+ Author = {Mark Lutz},
+ Title = {Learning Python},
+ Publisher = {O'Reilly},
+ Year = {2007}}
+
+@book{Langtangen:2009,
+ Author = {Hans Petter Langtangen},
+ Title = {Python Scripting for Computational Science},
+ Publisher = {Springer-Verlag},
+ Year = {2009}}
22 docs/review.txt
@@ -18,19 +18,35 @@ I think that the PyMC package will provide a valuable tool for statistical model
At 82 pages (59 + Appendices), this user guide is a bit on the long side, so while I recommend that you accept it, I recommend that the authors go through the manuscript again and trim it down as much as possible (especially in the later sections). The following minor comments include the places where I think the paper could be tightened up, and also a few places where I think that _more_ detail could be helpful as well.
-p. v, line -2, -5: more details on where to find the issue tracker would be helpful.
+p. v, line -2, -5: more details on where to find the issue tracker would be helpful.
+
+ DH: The pdf contains an inline http link. Clicking on it should open the issue tracker.
+ There was a typo in the link that is now fixed.
p. vi, line 1: this tutorial assumes some familiarity with python, so it would be good to point uninitiated readers towards some online tutorial or introductory book.
+
+ DH: Added references to two books by Lutz and Langtangen, as well as a link to the official python
+ documentation web page.
p. vii, line 5: maybe include a reference for this Bayesian interpretation of probability, e.g. Jaynes, _Probability Theory: the logic of science_.
+
+ DH: Done
p. vii, line 13: I've also heard "DeterminedByValuesOfParents" called the "Systemic Part" of the model
+
+ DH: Added this terminology in the paragraph.
p. viii, line -16: p(D|e,s,l) = p(D|r) is potentially confusing, I would prefer just p(D|e,s,l)
+
+ DH: Don't know about this. I understand why p(D|r) adds some meaning to the sentence.
p. ix, figure: the label on arc l is very hard to read, you should shift it to the right a little bit
-p. xii, line 16: the description beginning "After a sufficiently large number of iterations..." seems imprecise. I would prefer it to say "As the number of samples tends to infinity, the MCMC distribution converges to the stationary distribution."
+p. xii, line 16: the description beginning "After a sufficiently large number of iterations..." seems imprecise. I would prefer it to say "As the number of samples tends to infinity, the MCMC distribution converges to the stationary distribution."
+
+ DH: Changed to "As the number of samples tends to infinity, the MCMC distribution of $s$, $e$
+ and $l$ converges to the stationary distribution. In other words, their
+ values can be considered as random draws from the posterior $p(s,e,l|D)$. "
p. xiv, line 13: you could mention that excluding rows with missing observations can also lead to biased results
@@ -96,6 +112,8 @@ For an article over 80 pages long the lack of a contents table is
unacceptable. (This might be an editorial detail.) The structure is
relatively clear, and the overall balance between theoretical and
practical parts seems appropriate.
+
+ DH: Table of contents added.
The package is easy to install, with just one library (numpy) required
and a few others optional. I had no problems with installing and
41 docs/tutorial.tex
@@ -1,4 +1,9 @@
-%!TEX root = guide2.0.tex
+
+This tutorial will guide you through a typical \code{pymc} application.
+Familiarity with Python is assumed, so if you are new to Python, books such as
+\citet{Lutz:2007} or \citet{Langtangen:2009} are the place to start. Tons of
+online documentation can also be found on the
+\href{http://www.python.org/doc/}{Python documentation} page.
\section{An example statistical model}
Consider the following dataset, which is a time series of recorded coal mining disasters in the UK from 1851 to 1962 \citep{Jarrett:1979fr}.
@@ -34,9 +39,23 @@ \section{An example statistical model}
\section{Two types of variables}
-At the model-specification stage (before the data are observed), $D$, $s$, $e$, $r$ and $l$ are all random variables. Bayesian `random' variables have not necessarily arisen from a physical random process. The Bayesian interpretation of probability is \emph{epistemic}, meaning random variable $x$'s probability distribution $p(x)$ represents our knowledge and uncertainty about $x$'s value. Candidate values of $x$ for which $p(x)$ is high are relatively more probable, given what we know. Random variables are represented in PyMC by the classes \code{Stochastic} and \code{Deterministic}.
-
-The only \code{Deterministic} in the model is $r$. If we knew the values of $r$'s parents ($s$, $l$ and $e$), we could compute the value of $r$ exactly. A \code{Deterministic} like $r$ is defined by a mathematical function that returns its value given values for its parents. The nomenclature is a bit confusing, because these objects usually represent random variables; since the parents of $r$ are random, $r$ is random also. A more descriptive (though more awkward) name for this class would be \code{DeterminedByValuesOfParents}.
+At the model-specification stage (before the data are observed), $D$, $s$, $e$,
+$r$ and $l$ are all random variables. Bayesian `random' variables have not
+necessarily arisen from a physical random process. The Bayesian interpretation
+of probability is \emph{epistemic}, meaning random variable $x$'s probability
+distribution $p(x)$ represents our knowledge and uncertainty about $x$'s value
+\citep{jaynes}. Candidate values of $x$ for which $p(x)$ is high are
+relatively more probable, given what we know. Random variables are represented
+in PyMC by the classes \code{Stochastic} and \code{Deterministic}.
+
+The only \code{Deterministic} in the model is $r$. If we knew the values of
+$r$'s parents ($s$, $l$ and $e$), we could compute the value of $r$ exactly. A
+\code{Deterministic} like $r$ is defined by a mathematical function that returns
+its value given values for its parents. \code{Deterministic} variables are
+sometimes called the \emph{systemic} part of the model. The nomenclature is a
+bit confusing, because these objects usually represent random variables; since
+the parents of $r$ are random, $r$ is random also. A more descriptive (though
+more awkward) name for this class would be \code{DeterminedByValuesOfParents}.
On the other hand, even if the values of the parents of variables $s$, $D$ (before observing the data), $e$ or $l$ were known, we would still be uncertain of their values. These variables are characterized by probability distributions that express how plausible their candidate values are, given values for their parents. The \code{Stochastic} class represents these variables. A more descriptive name for these objects might be \code{RandomEvenGivenValuesOfParents}.
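[Editor's illustration, not part of this commit: the two variable types described in the hunk above can be written down concretely. This is a minimal sketch assuming the PyMC 2.x API (pymc.DiscreteUniform, pymc.Exponential, the @pymc.deterministic decorator, pymc.Poisson); the disaster counts and the Exponential hyperparameters below are placeholders, not the values used in the guide.]

    import numpy as np
    import pymc

    # Placeholder disaster counts; the real 1851-1962 series appears in the guide.
    disasters_array = np.array([4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6])
    n = len(disasters_array)

    # Stochastic variables: uncertain even given their parents.
    s = pymc.DiscreteUniform('s', lower=0, upper=n - 1)   # switchpoint
    e = pymc.Exponential('e', beta=1.0)                   # early disaster rate
    l = pymc.Exponential('l', beta=1.0)                   # late disaster rate

    # r is the only Deterministic: its value is fully determined by its parents s, e and l.
    @pymc.deterministic
    def r(s=s, e=e, l=l):
        out = np.empty(n)
        out[:s] = e
        out[s:] = l
        return out

    # D is a Stochastic whose value is fixed to the observed data.
    D = pymc.Poisson('D', mu=r, value=disasters_array, observed=True)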
@@ -239,7 +258,19 @@ \subsection{What does it mean to fit a model?}
`Fitting' a model means characterizing its posterior distribution somehow. In this case, we are trying to represent the posterior $p(s,e,l|D)$ by a set of joint samples from it. To produce these samples, the MCMC sampler randomly updates the values of $s$, $e$ and $l$ according to the Metropolis-Hastings algorithm (\cite{gelman}) for \code{iter} iterations.
-After a sufficiently large number of iterations, the current values of $s$, $e$ and $l$ can be considered a sample from the posterior. PyMC assumes that the \code{burn} parameter specifies a `sufficiently large' number of iterations for convergence of the algorithm, so it is up to the user to verify that this is the case (see chapter \ref{chap:modelchecking}). Consecutive values sampled from $s$, $e$ and $l$ are necessarily dependent on the previous sample, since it is a Markov chain. However, MCMC often results in strong autocorrelation among samples that can result in imprecise posterior inference. To circumvent this, it is often effective to thin the sample by only retaining every $k$th sample, where $k$ is an integer value. This thinning interval is passed to the sampler via the \code{thin} argument.
+As the number of samples tends to infinity, the MCMC distribution of $s$, $e$
+and $l$ converges to the stationary distribution. In other words, their
+values can be considered as random draws from the posterior $p(s,e,l|D)$.
+PyMC assumes that the \code{burn} parameter specifies a `sufficiently large'
+number of iterations for convergence of the algorithm, so it is up to the user
+to verify
+that this is the case (see chapter \ref{chap:modelchecking}). Consecutive values
+sampled from $s$, $e$ and $l$ are necessarily dependent on the previous sample,
+since it is a Markov chain. However, MCMC often results in strong
+autocorrelation among samples that can result in imprecise posterior inference.
+To circumvent this, it is often effective to thin the sample by only retaining
+every $k$th sample, where $k$ is an integer value. This thinning interval is
+passed to the sampler via the \code{thin} argument.
If you are not sure ahead of time what values to choose for the \code{burn} and \code{thin} parameters, you may want to retain all the MCMC samples, that is to set \code{burn=0} and \code{thin=1}, and then discard the `burnin period' and thin the samples after examining the traces (the series of samples). See \cite{gelman} for general guidance.
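[Editor's illustration, not part of this commit: a usage note on the iter, burn and thin arguments discussed in the hunk above. A minimal sampling run, assuming PyMC 2.x and the variables s, e, l, r and D from the earlier sketch; the iteration counts are arbitrary and should be checked against the traces.]

    import pymc

    # Collect the variables from the model sketch above into an MCMC sampler.
    M = pymc.MCMC([s, e, l, r, D])

    # iter total iterations; the first `burn` are discarded; every `thin`-th sample is kept.
    M.sample(iter=10000, burn=1000, thin=10)

    # Retained values are (approximate) draws from the posterior p(s, e, l | D).
    s_samples = M.trace('s')[:]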