Version 0.5.1
marberts committed Jun 9, 2023
1 parent f25db1b commit 4e103af
Showing 3 changed files with 19 additions and 19 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: sps
Title: Sequential Poisson Sampling
Version: 0.5.0.9002
Version: 0.5.1
Authors@R: c(
person("Steve", "Martin", role = c("aut", "cre", "cph"), email = "stevemartin041@gmail.com", comment = c(ORCID = "0000-0003-2544-9480")),
person("Justin", "Francis", role = "ctb")
2 changes: 1 addition & 1 deletion man/sps-package.Rd
@@ -9,7 +9,7 @@ Sequential Poisson sampling is a variation of Poisson sampling for drawing proba
}

\section{Usage}{
Given a vector of sizes for units in a population (e.g., revenue for sampling businesses) and a desired sample size, a stratified sequential Poisson sample can be drawn with the \code{\link[=sps]{sps()}} function. Allocations are often proportional to size when drawing such samples, and the \code{\link[=prop_allocation]{prop_allocation()}} function provides a variety of methods for generating proportional-to-size allocations. Once the sample is drawn, the design weights for the sample can then be used to generate bootstrap replicate weights with the \code{\link[=sps_repweights]{sps_repweights()}} function.
Given a vector of sizes for units in a population (e.g., revenue for sampling businesses) and a desired sample size, a stratified sequential Poisson sample can be drawn with the \code{\link[=sps]{sps()}} function. Allocations are often proportional to size when drawing such samples, and the \code{\link[=prop_allocation]{prop_allocation()}} function provides a variety of methods for generating proportional-to-size allocations. Once the sample is drawn, the design weights for the sample can then be used to generate bootstrap replicate weights with the \code{\link[=sps_repweights]{sps_repweights()}} function. The vignette gives an extended example of this workflow: \code{vignette("sps")}.

Sequential Poisson sampling is often used to sample data for price indexes. Balk (2008, chapter 5) discusses the construction of price indexes when data are sampled using probability-proportional-to-size methods, and their resulting statistical properties. The CPI manual (2020, chapter 4) describes other methods for sampling price data. Tillé (2020, chapter 5) gives a practical overview of different probability-proportional-to-size sampling methods; compared to existing implementations of several of these methods (e.g., Brewer, Sampford, maximum entropy), however, sequential Poisson sampling is relatively fast for larger frames.
}
34 changes: 17 additions & 17 deletions vignettes/sps.Rmd
@@ -2,7 +2,7 @@
title: "Drawing a Sequential Poisson Sample"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{sps}
%\VignetteIndexEntry{Drawing a Sequential Poisson Sample}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
@@ -14,7 +14,7 @@ knitr::opts_chunk$set(
)
```

Sequential Poisson sampling is a variation of Poisson sampling for drawing probability-proportional-to-size samples with a given number of units. It's a fast and simple method for drawing probability-proportional-to-size samples, and is often used for sampling businesses. The purpose of this vignette is to give a simple of example of how the functions in this package can be used to easily draw a sample using the sequential Poisson method.
Sequential Poisson sampling is a variation of Poisson sampling for drawing probability-proportional-to-size samples with a given number of units. It's a fast, simple, and flexible method for sampling units proportional to their size, and is often used for drawing a sample of businesses. The purpose of this vignette is to give an example of how the functions in this package can be used to easily draw a sample using the sequential Poisson method. More details can be found on the help pages for the functions used in this vignette.

## Drawing a sample of businesses

@@ -32,7 +32,7 @@ frame <- data.frame(
head(frame)
```

Associated with each business is a value for their sales for the current quarter, although these values are not observable for all businesses. The purpose of drawing a sample is to observe sales for a subset of businesses, and extrapolate the value of sales for the sample of business to the entire population. Sales are positively correlated with last year's revenue, and this is the basis for sampling businesses proportional to revenue.
Associated with each business is a value for their sales for the current quarter, although these values are not observable for all businesses. The purpose of drawing a sample is to observe sales for a subset of businesses, and extrapolate the value of sales from the sample of businesses to the entire population. Sales are positively correlated with last year's revenue, and this is the basis for sampling businesses proportional to revenue.

```{r outcome}
sales <- round(frame$revenue * runif(1e3, 0.5, 2))
@@ -48,17 +48,17 @@ allocation
With the sample size for each region in hand, it's now time to draw a sample and observe the value of sales for these businesses. In practice this is usually the result of a survey that's administered to the sampled units.

```{r sample}
spsample <- with(frame, sps(revenue, allocation, region))
sample <- with(frame, sps(revenue, allocation, region))
survey <- cbind(frame[spsample, ], sales = sales[spsample])
survey <- cbind(frame[sample, ], sales = sales[sample])
head(survey)
```

An important piece of information from the sampling process is the design weights, as these enable estimating the value of sales in the population with the usual Horvitz-Thompson estimator.

```{r weights}
survey$weight <- weights(spsample)
survey$weight <- weights(sample)
head(survey)
```
@@ -79,7 +79,7 @@ But in practice it's not possible to determine how far an estimate is from the t
A general approach for estimating the variance of the Horvitz-Thompson estimator is to construct bootstrap replicate weights from the design weights for the sample, compute a collection of estimates for the total based on these replicate weights, and then compute the variance of this collection of estimates.

```{r variance}
repweights <- sps_repweights(weights(spsample), tau = 2)
repweights <- sps_repweights(weights(sample), tau = 2)
var <- attr(repweights, "tau")^2 *
mean((colSums(survey$sales * repweights) - ht)^2)
@@ -121,35 +121,35 @@ Permanent random numbers can be used with methods other than sequential Poisson-
```{r prn samples}
pareto <- order_sampling(\(x) x / (1 - x))
spsample <- with(frame, sps(revenue, allocation, region, prn))
sample <- with(frame, sps(revenue, allocation, region, prn))
parsample <- with(frame, pareto(revenue, allocation, region, (prn - 0.5) %% 1))
length(intersect(spsample, parsample)) / 100
length(intersect(sample, parsample)) / 100
```

Although there is still a meaningful overlap between the units in both samples, this is roughly half of what would be expected without using permanent random numbers.

```{r prn simulation}
replicate(1000, {
s <- with(frame, pareto(revenue, allocation, region))
length(intersect(spsample, s)) / 100
length(intersect(sample, s)) / 100
}) |>
summary()
```
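
As a rough illustration of how permanent random numbers coordinate samples, the sketch below uses a bare-bones sequential Poisson sampler written in base R rather than the package's `sps()` or `order_sampling()` functions. The helper `seq_poisson()`, the toy frame, and the sample size are all assumptions made for this example, and take-all units are ignored. It draws one sample from a set of permanent random numbers, a second from the same numbers, a third from the numbers shifted by 0.5, and a fourth from fresh random numbers, so the overlaps can be compared.

```r
# Hypothetical helper (not part of the sps package): rank units by u / x and
# keep the n smallest, ignoring take-all units for simplicity.
seq_poisson <- function(x, n, u) {
  sort(order(u / x)[seq_len(n)])
}

set.seed(42)
x <- runif(200, 1, 100)  # toy sizes for 200 units
u <- runif(200)          # permanent random numbers

s_first <- seq_poisson(x, 20, u)
s_same  <- seq_poisson(x, 20, u)               # same random numbers
s_shift <- seq_poisson(x, 20, (u - 0.5) %% 1)  # shifted random numbers
s_fresh <- seq_poisson(x, 20, runif(200))      # no coordination

# Share of the first sample that also appears in another sample.
overlap <- function(a, b) length(intersect(a, b)) / length(a)

c(
  same    = overlap(s_first, s_same),
  shifted = overlap(s_first, s_shift),
  fresh   = overlap(s_first, s_fresh)
)
```

Reusing the same random numbers reproduces the first sample exactly; the other two overlaps depend on the frame, the sample size, and the seed.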

## Topping up

The sequential nature of sequential Poisson sampling means that it's easy to grow a sample. Suppose that there is a need to sample 10 more businesses in region 1 after the sample is drawn. Simply adding 10 units to the allocation for region 1 results in a new sample that includes all the previously sampled units, so the extra units can be surveyed without discarding the previously-collected data or affecting the statistical properties of the sample.
The sequential part of sequential Poisson sampling means that it's easy to grow a sample. Suppose that there is a need to sample 10 more businesses in region 1 after the sample is drawn. Simply adding 10 units to the allocation for region 1 results in a new sample that includes all the previously sampled units, so the extra units can be surveyed without discarding the previously-collected data or affecting the statistical properties of the sample.

```{r top up}
set.seed(1234)
spsample <- with(frame, sps(revenue, allocation, region, prn))
sample <- with(frame, sps(revenue, allocation, region))
set.seed(1234)
spsample_tu <- with(frame, sps(revenue, allocation + c(10, 0, 0), region, prn))
sample_tu <- with(frame, sps(revenue, allocation + c(10, 0, 0), region))
frame[setdiff(spsample_tu, spsample), ]
all(sample %in% sample_tu)
```
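
To see why topping up works, it can help to look at a stripped-down version of the method. The sketch below is a base R approximation, not the package's implementation: the helper `seq_poisson()` and the toy frame are hypothetical, and take-all units are ignored. Each unit gets a ranking value equal to its random number divided by its size, and the sample is the `n` units with the smallest values, so a larger sample drawn from the same random numbers simply reaches further down the same ranking.

```r
# Hypothetical helper (not part of the sps package): rank units by u / x and
# keep the n smallest, ignoring take-all units for simplicity.
seq_poisson <- function(x, n, u) {
  sort(order(u / x)[seq_len(n)])
}

set.seed(1234)
x <- runif(50, 1, 100)  # toy sizes for 50 units
u <- runif(50)          # one random number per unit, reused for both draws

s    <- seq_poisson(x, 10, u)  # original sample
s_tu <- seq_poisson(x, 15, u)  # topped-up sample

all(s %in% s_tu)   # every original unit is still in the sample
setdiff(s_tu, s)   # the five extra units to survey
```

The ranking does not depend on the sample size in this simplified version, which is what makes the two samples nested.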

## Bias in the Horvitz-Thompson estimator
@@ -158,9 +158,9 @@ Despite its simplicity, sequential Poisson sampling is only asymptotically prop

```{r ht bias}
sampling_distribution <- replicate(1000, {
spsample <- with(frame, sps(revenue, allocation, region))
sum(sales[spsample] * weights(spsample))
sample <- with(frame, sps(revenue, allocation, region))
sum(sales[sample] * weights(sample))
})
summary(sampling_distribution / sum(sales) - 1)
```
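
The same kind of simulation can be sketched without the package, which makes the pieces of the Horvitz-Thompson estimate explicit. This is a minimal sketch under stated assumptions: the toy frame is made up, there is a single stratum with no take-all units, and the design weights are taken to be the inverse of `n * x / sum(x)`, which only approximates the inclusion probabilities of a sequential Poisson sample. The helper `ht_once()` is hypothetical.

```r
set.seed(123)
x <- runif(100, 1, 10)       # toy sizes for 100 units
y <- x * runif(100, 0.5, 2)  # outcome correlated with size
n <- 10

# Target inclusion probabilities and the corresponding design weights.
p <- n * x / sum(x)
stopifnot(all(p < 1))        # no take-all units in this toy frame
w <- 1 / p

# Draw one sequential Poisson sample (n smallest values of u / x) and
# return the Horvitz-Thompson estimate of the total of y.
ht_once <- function() {
  s <- order(runif(length(x)) / x)[seq_len(n)]
  sum(y[s] * w[s])
}

estimates <- replicate(1000, ht_once())

# Relative bias of the estimator across the simulated samples.
rel_bias <- mean(estimates) / sum(y) - 1
rel_bias
```

A relative bias close to zero here would be consistent with the summary above, but the exact value depends on the frame and the seed.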
