# <font color='red'>Criteria for Designing a Sampling Plan</font> 

A well-designed sampling plan is critical for ensuring that the data collected in a study can be reliably analyzed with appropriate statistical methods. This involves defining clear objectives, selecting suitable sampling methods, and understanding potential sources of error so that results are both valid and meaningful.


## <font color='red'>Defining the Sampling Design</font> 

The first step in creating a sampling plan is to set the study's objectives and choose how to select participants or units from the population. By doing so, it’s possible to:

- Estimate experimental error.
- Determine the best selection procedure for meaningful results.
- Enhance the precision and reliability of the findings.

Good domain knowledge (e.g., about the population’s characteristics or environmental conditions) helps guide the choice of sampling strategy and ensures the sample is representative.


## <font color='red'>Probability Sampling Techniques</font> 

Probability sampling ensures each element of the population has a known, nonzero probability of selection. This framework allows for quantifying sampling error and making valid inferences.

### 1. Simple Random Sampling (SRS)
- Every unit has an equal probability of being chosen; more precisely, every possible sample of size \(n\) is equally likely.
- Straightforward method but may not always be the most efficient.
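As a minimal sketch of simple random sampling without replacement, the snippet below relies on Python's standard `random.sample`. The function name `simple_random_sample`, the unit labels, and the seed are illustrative assumptions, not part of the notes.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the demo reproducible
    return rng.sample(list(units), n)

# Hypothetical sampling frame: unit labels 1..20
population = list(range(1, 21))
sample = simple_random_sample(population, n=5, seed=42)
print(sample)
```

Because the draw is without replacement, no unit can appear twice in the sample.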

### 2. Stratified Sampling
- The population is divided into strata (e.g., by age or income level).
- Independent random samples are drawn from each stratum.
- Improves precision and ensures representation of key subgroups.
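The procedure can be sketched as follows: group the units by stratum, then draw an independent simple random sample inside each group. The helper `stratified_sample`, the age labels, and the allocation are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw an independent simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        label: rng.sample(members, n_per_stratum[label])
        for label, members in strata.items()
    }

# Hypothetical frame of 30 people tagged with an age group (12, 12 and 6 people)
people = [
    (i, "under_30" if i < 12 else "30_to_59" if i < 24 else "60_plus")
    for i in range(30)
]
# Allocation roughly proportional to stratum size (an assumption of this sketch)
allocation = {"under_30": 2, "30_to_59": 2, "60_plus": 1}

sample = stratified_sample(
    people, stratum_of=lambda person: person[1], n_per_stratum=allocation, seed=1
)
for label, members in sample.items():
    print(label, members)
```

Every stratum is guaranteed to appear in the sample, which is what "ensures representation of key subgroups" means in practice.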

### 3. Cluster (Conglomerate) Sampling
- The population is divided into clusters (e.g., neighborhoods, schools).
- A random selection of clusters is chosen, and then all or a subset of elements within those clusters are surveyed.
- Useful for large or geographically dispersed populations.
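A rough sketch of one-stage and two-stage cluster sampling is shown below. The function `cluster_sample`, its `n_within` parameter, and the school data are assumptions introduced for this example.

```python
import random

def cluster_sample(clusters, n_clusters, n_within=None, seed=None):
    """Select clusters at random, then take all members (one-stage)
    or a random subset of n_within members (two-stage) from each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        members = clusters[name]
        if n_within is None or n_within >= len(members):
            result[name] = list(members)  # survey the whole cluster
        else:
            result[name] = rng.sample(members, n_within)  # subsample it
    return result

# Hypothetical clusters: schools and their student ids
clusters = {
    "school_A": [1, 2, 3, 4],
    "school_B": [5, 6, 7],
    "school_C": [8, 9, 10, 11, 12],
    "school_D": [13, 14],
}

one_stage = cluster_sample(clusters, n_clusters=2, seed=7)
two_stage = cluster_sample(clusters, n_clusters=2, n_within=2, seed=7)
print(one_stage)
print(two_stage)
```

Only the selected clusters need to be visited, which is the source of the cost savings for dispersed populations.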

### 4. Systematic Sampling
- Units are selected at regular intervals from an ordered list.
- Often simpler to implement operationally, but care must be taken to avoid hidden patterns that could bias the sample.
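Systematic selection can be sketched as a random start followed by a fixed step. The function name and the frame below are illustrative, and the sketch assumes the list length is a multiple of the sample size so that the interval is a whole number.

```python
import random

def systematic_sample(ordered_units, n, seed=None):
    """Pick a random start in the first interval, then every k-th unit."""
    rng = random.Random(seed)
    k = len(ordered_units) // n  # sampling interval (assumes N is a multiple of n)
    start = rng.randrange(k)     # random start among the first k positions
    return ordered_units[start::k][:n]

# Hypothetical ordered frame of 20 units
ordered_units = list(range(100, 120))
sample = systematic_sample(ordered_units, n=5, seed=3)
print(sample)
```

If the list has a periodic pattern whose period matches the interval `k`, every selected unit falls at the same point of the cycle, which is the hidden-pattern risk mentioned above.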


## <font color='red'>Sampling Error vs. Nonsampling Error</font> 

### Sampling Error
- Occurs because only a portion of the population is measured.
- Can be reduced by increasing the sample size or by choosing a more efficient design (for example, stratifying on a variable related to the outcome).

### Nonsampling Error
- Arises from factors other than the sampling process (e.g., measurement errors, incorrect data, nonresponse).
- Requires careful questionnaire design, data validation, and execution strategies.
- More challenging to quantify and control, but crucial to address for data quality.


## <font color='red'>Unified Sampling Theory Concepts</font> 

Sampling theory can be understood in a unifying framework where each sampling design corresponds to a probability distribution over all possible subsets of the population. By defining a set of rules that determine which samples are more or less likely, we can compare and contrast different sampling plans in a common theoretical language.

- **Foundational Idea:**  
  Any sampling procedure can be viewed as assigning probabilities to every conceivable sample. For example, in Simple Random Sampling, each sample of a given size is equally likely; in Stratified Sampling, the probability structure is more complex due to partitioning the population into strata.

- **Infinite Variety of Designs:**  
  Because there are infinitely many ways to assign probabilities over the space of possible samples, the universe of potential sampling designs is vast. Each design has its own properties, trade-offs, and theoretical implications.

- **Comparing Designs:**  
  By expressing different sampling schemes as probability distributions, researchers can:
  - Evaluate and compare efficiency, bias, and variance of estimators across designs.
  - Identify which design best balances practical constraints (cost, time) with statistical rigor.
  - Develop hybrid or more complex designs that combine desirable features of simpler methods.

Understanding this unified perspective helps researchers move beyond ad-hoc approaches to sampling, enabling them to reason systematically about how changes in the selection process affect the quality and reliability of the resulting estimates.
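To make the idea of a design as a probability distribution over samples concrete, the sketch below writes two designs as dictionaries mapping each possible sample to its probability and compares the sample mean under both. The tiny population, the alternative design, and the helper names are all hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of four labelled units
y = {"a": 2, "b": 4, "c": 6, "d": 8}
mu = Fraction(sum(y.values()), len(y))
n = 2

# Design 1: simple random sampling, every sample of size 2 equally likely
all_samples = list(combinations(sorted(y), n))
srs_design = {s: Fraction(1, len(all_samples)) for s in all_samples}

# Design 2: an alternative plan in which only two samples can occur
alt_design = {("a", "b"): Fraction(1, 2), ("c", "d"): Fraction(1, 2)}

def sample_mean(s):
    return Fraction(sum(y[u] for u in s), len(s))

def expectation(design, estimator):
    return sum(p * estimator(s) for s, p in design.items())

def variance(design, estimator):
    m = expectation(design, estimator)
    return sum(p * (estimator(s) - m) ** 2 for s, p in design.items())

for name, design in [("SRS", srs_design), ("alternative", alt_design)]:
    print(name, expectation(design, sample_mean), variance(design, sample_mean))
```

In this toy case both designs leave the sample mean unbiased, but its variance differs between them, which is the kind of comparison the common framework makes possible.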


## <font color='red'>Parameters, Statistics, and Estimators</font> 

When dealing with statistical inference, parameters describe population-level characteristics, while statistics are computed from sample data. Estimators are the formulas or rules that use these sample-based statistics to infer the unknown parameters.

### Population Parameters

- **Population Mean (μ):**  
  The average value of a characteristic across all \(N\) units in the population.  
  $$
  \mu = \frac{1}{N} \sum_{i=1}^{N} y_i
  $$
  
- **Population Total (T):**  
  The sum of all values in the population.  
  $$
  T = \sum_{i=1}^{N} y_i
  $$

- **Population Proportion (P):**  
  The fraction of the population possessing a certain attribute, where \( z_i \) is an indicator variable (1 if the unit has the attribute, 0 otherwise).  
  $$
  P = \frac{1}{N} \sum_{i=1}^{N} z_i
  $$

- **Population Variance (σ²):**  
  A measure of the spread of values around the mean:  
  $$
  \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2
  $$
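The four formulas above translate directly into code. The values in `y` and the rule defining the indicator `z` are made up for illustration.

```python
# Hypothetical population of N = 5 values
y = [12, 15, 9, 20, 14]
# Indicator z_i: 1 if the unit has the attribute (here, value above 13), else 0
z = [1 if value > 13 else 0 for value in y]

N = len(y)
mu = sum(y) / N                                    # population mean: 14.0
T = sum(y)                                         # population total: 70
P = sum(z) / N                                     # population proportion: 0.6
sigma2 = sum((value - mu) ** 2 for value in y) / N # population variance (divisor N)

print(mu, T, P, sigma2)
```

Note that the population variance divides by \(N\), which corresponds to `statistics.pvariance` in Python's standard library rather than `statistics.variance`.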

### Sample Statistics

When we observe a sample of \(n\) units from the population, we compute analogous quantities:

- **Sample Mean ( \(\hat{\mu}\) or \( \bar{y} \) ):**  
  $$
  \bar{y} = \frac{1}{n} \sum_{j=1}^{n} y_j
  $$
  
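- **Sample Total ( \(t\) ) and Estimated Population Total ( \(\hat{T}\) ):**  
  The sum of the sampled values is
  $$
  t = \sum_{j=1}^{n} y_j
  $$
  Because \(t\) covers only \(n\) of the \(N\) units, under simple random sampling the population total is estimated by expanding it:
  $$
  \hat{T} = \frac{N}{n} \sum_{j=1}^{n} y_j = N\bar{y}
  $$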

- **Sample Proportion ( \(\hat{P}\) ):**  
  $$
  \hat{P} = \frac{1}{n} \sum_{j=1}^{n} z_j
  $$

- **Sample Variance ( \( s^2 \) ):**  
  $$
  s^2 = \frac{1}{n-1} \sum_{j=1}^{n} (y_j - \bar{y})^2
  $$
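A small sketch of the sample quantities follows, using a hypothetical sample of three values from a population assumed to have \(N = 5\) units. The expanded total \(N\bar{y}\) is the usual estimator of the population total under simple random sampling; the raw sum of the sample covers only the sampled units.

```python
import statistics

N = 5                      # assumed population size
sample_y = [12, 20, 14]    # hypothetical sampled values
sample_z = [0, 1, 1]       # indicator: 1 if the sampled value is above 13

n = len(sample_y)
y_bar = sum(sample_y) / n             # sample mean
sample_total = sum(sample_y)          # sum over the sampled units only
p_hat = sum(sample_z) / n             # sample proportion
s2 = statistics.variance(sample_y)    # sample variance, divisor n - 1
expanded_total = N * y_bar            # estimate of the population total under SRS

print(y_bar, sample_total, p_hat, s2, expanded_total)
```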

### Estimators

Estimators are functions of the sample data that provide our “best guess” of the population parameters. For example, \(\bar{y}\) is an estimator of \(\mu\), and \(\hat{P}\) is an estimator of \(P\).

A “good” estimator has desirable theoretical properties:

- **Unbiasedness:** The expected value of the estimator equals the true parameter.
- **Consistency:** The estimator converges in probability to the true parameter as the sample size \( n \) grows.
- **Efficiency:** Among unbiased estimators, the one with the smallest variance is considered better.

### Mean Squared Error (MSE)

The MSE of an estimator \(\hat{\theta}\) for a parameter \(\theta\) is defined as:
$$
\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2
$$

A small MSE requires both low variance and low bias, so the MSE puts biased and unbiased estimators on a single scale for comparison.
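The decomposition can be checked exactly on a toy example by enumerating every simple random sample. The population, the shrunken estimator \(0.8\,\bar{y}\), and the helper `moments` are assumptions chosen only to produce one unbiased and one biased estimator.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population and sample size
y = [2, 4, 6, 8]
n = 2
mu = Fraction(sum(y), len(y))

samples = list(combinations(y, n))  # all equally likely SRS samples
p = Fraction(1, len(samples))

def moments(estimator):
    """Return (bias, variance, mse) of an estimator over all samples."""
    values = [estimator(s) for s in samples]
    expectation = sum(p * v for v in values)
    variance = sum(p * (v - expectation) ** 2 for v in values)
    mse = sum(p * (v - mu) ** 2 for v in values)
    return expectation - mu, variance, mse

def sample_mean(s):
    return Fraction(sum(s), len(s))

def shrunken_mean(s):
    return Fraction(4, 5) * sample_mean(s)  # deliberately biased toward zero

for name, estimator in [("mean", sample_mean), ("shrunken", shrunken_mean)]:
    bias, variance, mse = moments(estimator)
    print(name, bias, variance, mse, mse == variance + bias ** 2)
```

Using `Fraction` keeps the arithmetic exact, so the identity holds with strict equality rather than up to rounding.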

In summary, parameters characterize the unknown aspects of the full population, statistics summarize what we see in a sample, and estimators are the tools that let us use those statistics to infer the underlying population parameters.


### <font color='green'>Exercise 1</font> 

Given a population of size \(N\), the population mean \(\mu\) is defined as:
$$
\mu = \frac{1}{N}\sum_{i=1}^{N}y_i
$$

If we draw a simple random sample of size \(n\) and compute the sample mean:
$$
\bar{y} = \frac{1}{n}\sum_{j=1}^{n}y_j
$$

Which of the following statements is correct regarding \(\bar{y}\) as an estimator of \(\mu\)?

A. $\bar{y}$ is an unbiased estimator of $\mu$.  
B. $\bar{y}$ is always greater than $\mu$.  
C. $\bar{y}$ can never equal $\mu$.  
D. $\bar{y}$ decreases as $n$ increases.


**Answer:** A

**Explanation:**
The sample mean is an unbiased estimator of the population mean:
$$
E(\bar{y}) = \mu.
$$

The other options fail: in any particular sample $\bar{y}$ may fall above, below, or exactly on $\mu$, and it does not systematically decrease as $n$ increases. Unbiasedness holds for every sample size; what changes as $n$ grows is the variance of $\bar{y}$, which shrinks, so individual sample means tend to fall closer to $\mu$.
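As an illustrative check, the sketch below enumerates every simple random sample (without replacement) from a small hypothetical population and reports the expectation and variance of $\bar{y}$ for each sample size.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 5 values
y = [12, 15, 9, 20, 14]
N = len(y)
mu = Fraction(sum(y), N)

def expectation_and_variance_of_ybar(n):
    """Exact E(y_bar) and Var(y_bar) over all equally likely samples of size n."""
    means = [Fraction(sum(s), n) for s in combinations(y, n)]
    expectation = sum(means) / len(means)
    variance = sum((m - expectation) ** 2 for m in means) / len(means)
    return expectation, variance

results = {n: expectation_and_variance_of_ybar(n) for n in range(1, N + 1)}
for n, (expectation, variance) in results.items():
    print(n, expectation, float(variance))
```

The expectation equals $\mu$ at every $n$, while the variance falls as $n$ increases and reaches zero when the whole population is sampled.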


### <font color='green'>Exercise 2</font> 

The population proportion \( P \) is:
$$
P = \frac{1}{N}\sum_{i=1}^{N}z_i
$$

If we take a random sample of size \( n \) and compute:
$$
\hat{P} = \frac{1}{n}\sum_{j=1}^{n} z_j
$$

Which of the following is true?

A. $\hat{P}$ is an unbiased estimator of $P$.  
B. $\hat{P}$ has a positive bias.  
C. $\hat{P}$ is always equal to $P$.  
D. $\hat{P}$ decreases as $n$ increases.


**Answer:** A

**Explanation:**
The expected value of \(\hat{P}\) equals \(P\):
$$
E(\hat{P}) = P.
$$

Thus, $\hat{P}$ is an unbiased estimator. It need not equal $P$ in any particular sample, it has no systematic tendency to fall above or below $P$, and its value does not decrease as $n$ increases.
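One way to see why this holds under simple random sampling without replacement is to rewrite the estimator with inclusion indicators. The symbol \(I_i\) (1 if unit \(i\) is in the sample, 0 otherwise) is notation introduced here for illustration; each unit is included with probability \(n/N\):
$$
\hat{P} = \frac{1}{n}\sum_{i=1}^{N} I_i z_i, \qquad E(I_i) = \frac{n}{N}
$$
$$
E(\hat{P}) = \frac{1}{n}\sum_{i=1}^{N} z_i\,E(I_i) = \frac{1}{n}\cdot\frac{n}{N}\sum_{i=1}^{N} z_i = P.
$$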


### <font color='green'>Exercise 3</font> 

The mean squared error (MSE) of an estimator $\hat{\theta}$ for a parameter $\theta$ is:
$$
\text{MSE}(\hat{\theta}) = E[(\hat{\theta}-\theta)^2] = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2
$$

If an estimator has zero bias, which statement best describes how to minimize its MSE?

A. Increase its bias.  
B. Decrease its variance.  
C. Increase the population size $N$.  
D. Decrease the parameter $\theta$.


**Answer:** B

**Explanation:**
If $\text{Bias}(\hat{\theta})=0$, then:
$$
\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}).
$$

To minimize MSE, we must reduce the variance. Increasing the bias would only add to the MSE, while the population size and the value of $\theta$ are fixed features of the problem rather than properties of the estimator.


### <font color='green'>Exercise 4</font>

Consider two unbiased estimators for the same parameter $\theta$. How would you choose the better one?

A. Pick the estimator that is easier to compute.  
B. Pick the estimator with the lower MSE.  
C. Pick the estimator with fewer decimal places in its result.  
D. Pick the estimator that depends on fewer sample values.


**Answer:** B

**Explanation:**
For unbiased estimators, minimizing MSE reduces to minimizing variance since bias is zero. The estimator with the lower variance (and thus lower MSE) provides more precise and reliable estimates of \(\theta\).
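A hypothetical comparison makes this concrete: under simple random sampling, both the sample mean and the first value drawn are unbiased for the population mean, but their variances differ. The population and the two estimator functions below are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical population and sample size
y = [12, 15, 9, 20, 14]
n = 3
mu = Fraction(sum(y), len(y))

# Ordered draws without replacement; all orderings are equally likely under SRS
ordered_samples = list(permutations(y, n))

def expectation_and_variance(estimator):
    values = [estimator(s) for s in ordered_samples]
    expectation = sum(values) / len(values)
    variance = sum((v - expectation) ** 2 for v in values) / len(values)
    return expectation, variance

def sample_mean(s):
    return Fraction(sum(s), len(s))

def first_draw(s):
    return Fraction(s[0])  # ignores every observation except the first

mean_exp, mean_var = expectation_and_variance(sample_mean)
first_exp, first_var = expectation_and_variance(first_draw)
print("sample mean:", mean_exp, mean_var)
print("first draw: ", first_exp, first_var)
```

Both expectations equal \(\mu\), so the choice comes down to variance, and the sample mean has the smaller one here.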
