Commit

Add RStudio project file. Update docs.
hadley committed Jun 29, 2015
1 parent 1cdfdc2 commit 650d732
Showing 33 changed files with 533 additions and 476 deletions.
4 changes: 3 additions & 1 deletion .Rbuildignore
@@ -1,2 +1,4 @@
bench
notes.md
notes.md
^.*\.Rproj$
^\.Rproj\.user$
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
.Rproj.user
.Rhistory
.RData
src/*.o
src/*.so
src/*.dll
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -1,3 +1,5 @@
# Generated by roxygen2 (4.1.1): do not edit by hand

S3method("[",dgrid)
S3method("[<-",ranged)
S3method(Math,condensed)
20 changes: 20 additions & 0 deletions bigvis.Rproj
@@ -0,0 +1,20 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

AutoAppendNewline: Yes

BuildType: Package
PackageUseDevtools: Yes
PackageInstallArgs: --no-multiarch --with-keep.source
PackageRoxygenize: rd,collate,namespace
14 changes: 7 additions & 7 deletions man/autoplot.condensed.Rd
@@ -1,20 +1,20 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/autoplot.r
\name{autoplot.condensed}
\alias{autoplot.condensed}
\title{Autoplot condensed summaries.}
\usage{
\method{autoplot}{condensed} (x,
var = last(summary_vars(x)), ...)
\method{autoplot}{condensed}(x, var = last(summary_vars(x)), ...)
}
\arguments{
\item{x}{a condensed summary}
\item{x}{a condensed summary}

\item{var}{which summary variable to display}
\item{var}{which summary variable to display}

\item{...}{other arguments passed on to individual
methods}
\item{...}{other arguments passed on to individual methods}
}
\description{
Autoplot condensed summaries.
Autoplot condensed summaries.
}
\examples{
if (require("ggplot2")) {
59 changes: 27 additions & 32 deletions man/best_h.Rd
@@ -1,48 +1,44 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/h.r
\name{best_h}
\alias{best_h}
\title{Find "best" smoothing parameter using leave-one-out cross validation.}
\usage{
best_h(x, h_init = NULL, ..., tol = 0.01,
control = list())
best_h(x, h_init = NULL, ..., tol = 0.01, control = list())
}
\arguments{
\item{x}{condensed summary to smooth}
\item{x}{condensed summary to smooth}

\item{h_init}{initial values of bandwidths to start
search out. If not specified defaults to 5 times the
binwidth of each variable.}
\item{h_init}{initial values of bandwidths to start search out. If not
specified defaults to 5 times the binwidth of each variable.}

\item{...}{other arguments (like \code{var}) passed on to
\code{\link{rmse_cv}}}
\item{...}{other arguments (like \code{var}) passed on to
\code{\link{rmse_cv}}}

\item{tol}{numerical tolerance, defaults to 1\%.}
\item{tol}{numerical tolerance, defaults to 1\%.}

\item{control}{additional control parameters passed on to
\code{\link{optim}} The most useful argument is probably
trace, which makes it possible to follow the progress of
the optimisation.}
\item{control}{additional control parameters passed on to \code{\link{optim}}
The most useful argument is probably trace, which makes it possible to
follow the progress of the optimisation.}
}
\value{
a single numeric value representing the bandwidth that
minimises the leave-one-out estimate of rmse. Vector has
attributes \code{evaluations} giving the number of times
the objective function was evaluated. If the optimisation
does not converge, or smoothing is not needed (i.e. the
estimate is on the lower bounds), a warning is thrown.
a single numeric value representing the bandwidth that minimises
the leave-one-out estimate of rmse. Vector has attributes
\code{evaluations} giving the number of times the objective function
was evaluated. If the optimisation does not converge, or smoothing is not
needed (i.e. the estimate is on the lower bounds), a warning is thrown.
}
\description{
Minimises the leave-one-out estimate of root mean-squared
error to find find the "optimal" bandwidth for smoothing.
Minimises the leave-one-out estimate of root mean-squared error to find
find the "optimal" bandwidth for smoothing.
}
\details{
L-BFGS-B optimisation is used to constrain the bandwidths
to be greater than the binwidths: if the bandwidth is
smaller than the binwidth it's impossible to compute the
rmse because no smoothing occurs. The tolerance is set
relatively high for numerical optimisation since the
precise choice of bandwidth makes little difference
visually, and we're unlikely to have sufficient data to
make a statistically significant choice anyway.
L-BFGS-B optimisation is used to constrain the bandwidths to be greater
than the binwidths: if the bandwidth is smaller than the binwidth it's
impossible to compute the rmse because no smoothing occurs. The tolerance
is set relatively high for numerical optimisation since the precise choice
of bandwidth makes little difference visually, and we're unlikely to have
sufficient data to make a statistically significant choice anyway.
}
\examples{
x <- rchallenge(1e4)
@@ -56,8 +52,7 @@ autoplot(smooth(xsum, h))
}
}
\seealso{
Other bandwidth estimation functions:
\code{\link{h_grid}}, \code{\link{rmse_cv}},
\code{\link{rmse_cvs}}
Other bandwidth estimation functions: \code{\link{h_grid}};
\code{\link{rmse_cv}}, \code{\link{rmse_cvs}}
}

4 changes: 3 additions & 1 deletion man/bigvis.Rd
@@ -1,9 +1,11 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/bigvis.r
\docType{package}
\name{bigvis}
\alias{bigvis}
\alias{bigvis-package}
\title{The big vis package.}
\description{
The big vis package.
The big vis package.
}

32 changes: 15 additions & 17 deletions man/bin.Rd
@@ -1,33 +1,31 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/bin.r
\name{bin}
\alias{bin}
\title{Create a binned variable.}
\usage{
bin(x, width = find_width(x),
origin = find_origin(x, width), name = NULL)
bin(x, width = find_width(x), origin = find_origin(x, width), name = NULL)
}
\arguments{
\item{x}{numeric or integer vector}
\item{x}{numeric or integer vector}

\item{width}{bin width. If not specified, about 10,000
bins will be chosen using the algorithim in
\code{\link{find_width}}.}
\item{width}{bin width. If not specified, about 10,000 bins will be chosen
using the algorithim in \code{\link{find_width}}.}

\item{origin}{origin. If not specified, guessed by
\code{\link{find_origin}}.}
\item{origin}{origin. If not specified, guessed by \code{\link{find_origin}}.}

\item{name}{name of original variable. This will be
guessed from the input to \code{group} if not supplied.
Used in the output of \code{\link{condense}} etc.}
\item{name}{name of original variable. This will be guessed from the input to
\code{group} if not supplied. Used in the output of
\code{\link{condense}} etc.}
}
\description{
Create a binned variable.
Create a binned variable.
}
\details{
This function produces an R reference class that wraps
around a C++ function. Generally, you should just treat
this as an opaque object with reference semantics, and
you shouldn't call the methods on it - pass it to
\code{\link{condense}} and friends.
This function produces an R reference class that wraps around a C++ function.
Generally, you should just treat this as an opaque object with reference
semantics, and you shouldn't call the methods on it - pass it to
\code{\link{condense}} and friends.
}
\examples{
x <- runif(1e6)
20 changes: 10 additions & 10 deletions man/breaks.Rd
@@ -1,25 +1,25 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/breaks.r
\name{breaks}
\alias{breaks}
\title{Compute breaks given origin and width.}
\usage{
breaks(x, binwidth, origin = min(x))
breaks(x, binwidth, origin = min(x))
}
\arguments{
\item{x}{numeric vector}
\item{x}{numeric vector}

\item{origin}{bin origin}
\item{binwidth}{bin width}

\item{binwidth}{bin width}
\item{origin}{bin origin}
}
\description{
Breaks are right-open, left-closed [x, y), so if
\code{max(x)} is an integer multiple of binwidth, then we
need one more break. This function only returns the
left-side of the breaks.
Breaks are right-open, left-closed [x, y), so if \code{max(x)} is an integer
multiple of binwidth, then we need one more break. This function only returns
the left-side of the breaks.
}
\details{
The first break is special, because it always contains
missing values.
The first break is special, because it always contains missing values.
}
\examples{
breaks(10, origin = 0, binwidth = 1)
33 changes: 16 additions & 17 deletions man/condense.Rd
@@ -1,32 +1,31 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/condense.r
\name{condense}
\alias{condense}
\title{Efficient binned summaries.}
\usage{
condense(..., z = NULL, summary = NULL, w = NULL,
drop = NULL)
condense(..., z = NULL, summary = NULL, w = NULL, drop = NULL)
}
\arguments{
\item{...}{group objects created by \code{\link{bin}}}
\item{...}{group objects created by \code{\link{bin}}}

\item{z}{a numeric vector to summary for each group.
Optional for some summary statistics.}
\item{z}{a numeric vector to summary for each group. Optional for some
summary statistics.}

\item{summary}{the summary statistic to use. Currently
must be one of count, sum, mean, median or sd. If
\code{NULL}, defaults to mean if y is present, count if
not.}
\item{summary}{the summary statistic to use. Currently must be one of
count, sum, mean, median or sd. If \code{NULL}, defaults to mean if
y is present, count if not.}

\item{w}{a vector of weights. Not currently supported by
all summary functions.}
\item{w}{a vector of weights. Not currently supported by all summary
functions.}

\item{drop}{if \code{TRUE} only locations with data will
be returned. This is more efficient if the data is very
sparse (<1\% of cells filled), and is slightly less
efficient. Defaults to \code{TRUE} if you are condensing
over two or more dimensions, \code{FALSE} for 1d.}
\item{drop}{if \code{TRUE} only locations with data will be returned. This
is more efficient if the data is very sparse (<1\% of cells filled), and
is slightly less efficient. Defaults to \code{TRUE} if you are condensing
over two or more dimensions, \code{FALSE} for 1d.}
}
\description{
Efficient binned summaries.
Efficient binned summaries.
}
\examples{
x <- runif(1e5)
34 changes: 18 additions & 16 deletions man/condensed.Rd
@@ -1,35 +1,37 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/condensed.r
\name{condensed}
\alias{as.condensed}
\alias{condensed}
\alias{is.condensed}
\title{Condensed: an S3 class for condensed summaries.}
\usage{
condensed(groups, grouped, summary)
condensed(groups, grouped, summary)

is.condensed(x)
is.condensed(x)

as.condensed(x)
as.condensed(x)
}
\arguments{
\item{groups}{list of \code{\link{bin}}ed objects}
\item{groups}{list of \code{\link{bin}}ed objects}

\item{grouped,summary}{output from C++ condense function}
\item{grouped,summary}{output from C++ condense function}

\item{x}{object to test or coerce}
\item{x}{object to test or coerce}
}
\description{
This object managed the properties of condensed
(summarised) data frames.
This object managed the properties of condensed (summarised) data frames.
}
\section{S3 methods}{
Mathematical functions with methods for \code{binsum}
object will modify the x column of the data frame and
\code{\link{rebin}} the data, calculating updated summary
statistics.

Currently methods are provided for the \code{Math} group
generic, logical comparison and arithmetic operators, and
\code{\link[plyr]{round_any}}.


Mathematical functions with methods for \code{binsum} object will modify
the x column of the data frame and \code{\link{rebin}} the data, calculating
updated summary statistics.

Currently methods are provided for the \code{Math} group generic,
logical comparison and arithmetic operators, and
\code{\link[plyr]{round_any}}.
}
\examples{
if (require("ggplot2")) {
22 changes: 11 additions & 11 deletions man/dchallenge.Rd
@@ -1,26 +1,26 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/challenge.r
\name{dchallenge}
\alias{dchallenge}
\alias{rchallenge}
\title{Density and random number generation functions for a challenging
distribution.}
\usage{
dchallenge(x)
dchallenge(x)

rchallenge(n)
rchallenge(n)
}
\arguments{
\item{x}{values to evaluate pdf at}
\item{x}{values to evaluate pdf at}

\item{n}{number of random samples to generate}
\item{n}{number of random samples to generate}
}
\description{
This is a 1/3-2/3 mixture of a t-distribution with 2
degrees of freedom centered at 15 and scaled by 2, and a
gamma distribution with shape 2 and rate 1/3. (The
t-distribution is windsorised at 0, but this has
negligible effect.) This distribution is challenging
because it mixes heavy tailed and asymmetric
distributions.
This is a 1/3-2/3 mixture of a t-distribution with 2 degrees of freedom
centered at 15 and scaled by 2, and a gamma distribution with shape 2
and rate 1/3. (The t-distribution is windsorised at 0, but this
has negligible effect.) This distribution is challenging because it
mixes heavy tailed and asymmetric distributions.
}
\examples{
plot(dchallenge, xlim = c(-5, 60), n = 500)