# <font color='red'>Chapter 8: Non-Probability Sampling</font>

## <font color='green'>Introduction to Non-Probability Sampling</font>

Non-probability sampling is a technique where samples are drawn from the population without relying on random selection. These methods are often used in exploratory research, pilot studies, or when probability sampling is not feasible due to constraints such as time, cost, or accessibility.

Unlike probability sampling, these methods do not give each unit a known, non-zero chance of selection. As a result, sampling error cannot be calculated precisely, and findings may be biased or unrepresentative. They are nevertheless widely used in market research, qualitative studies, and situations where representativeness is not the primary concern.


## <font color='red'>8.1 Types of Non-Probability Sampling</font>

### <font color='green'>8.1.1 Convenience Sampling</font>
- **Definition**: Selecting individuals who are easiest to access or are readily available.
- **Example**: Surveying people in a shopping mall to understand consumer preferences.
- **Advantages**:
  - Quick and inexpensive.
  - Useful for exploratory research.
- **Disadvantages**:
  - High risk of bias.
  - Findings are not generalizable to the population.

---

### <font color='green'>8.1.2 Quota Sampling</font>
- **Definition**: Dividing the population into categories (quotas) and sampling a specific number of participants from each category.
- **Example**: Ensuring a survey includes 50% men and 50% women, regardless of how they are selected.
- **Advantages**:
  - Ensures representation of key groups.
  - Allows for targeted research.
- **Disadvantages**:
  - Sampling within quotas is not random.
  - Risk of selection bias.
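
The quota-filling procedure can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the function `fill_quotas`, the `stream` of passers-by, and the quota sizes are all hypothetical. Respondents are accepted in the order they are met, which is exactly why selection within each quota is not random.

```python
from collections import Counter

def fill_quotas(respondents, quotas, key):
    """Accept respondents in arrival order until every quota is met."""
    counts = Counter()
    sample = []
    for person in respondents:
        group = key(person)
        # Accept only if this group is tracked and its quota is not yet full.
        if counts[group] < quotas.get(group, 0):
            sample.append(person)
            counts[group] += 1
        # Stop as soon as all quotas are satisfied.
        if all(counts[g] >= q for g, q in quotas.items()):
            break
    return sample

# Hypothetical stream of passers-by: (id, gender), in the order they are met.
stream = [(1, "F"), (2, "F"), (3, "M"), (4, "F"), (5, "M"), (6, "M"), (7, "F")]
sample = fill_quotas(stream, quotas={"M": 2, "F": 2}, key=lambda p: p[1])
print(sample)  # [(1, 'F'), (2, 'F'), (3, 'M'), (5, 'M')]
```

Respondent 4 is turned away because the quota for women is already full, and respondents 6 and 7 are never approached.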

---

### <font color='green'>8.1.3 Snowball Sampling</font>
- **Definition**: Recruiting participants through referrals from initial subjects, often used for hard-to-reach populations.
- **Example**: Identifying participants in a study on underground musicians through personal networks.
- **Advantages**:
  - Effective for niche or hidden populations.
  - Builds trust among participants.
- **Disadvantages**:
  - Highly dependent on initial participants.
  - May not be representative of the broader population.
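
Referral-based recruitment can be sketched as a wave-by-wave walk over a referral network. The function `snowball_sample` and the `referrals` dictionary below are hypothetical and serve only to show how the final sample depends on the chosen seeds.

```python
def snowball_sample(referrals, seeds, waves):
    """Collect the seeds plus everyone reached within `waves` rounds of referral."""
    sampled = list(seeds)
    seen = set(seeds)
    current = list(seeds)
    for _ in range(waves):
        next_wave = []
        for person in current:
            for referred in referrals.get(person, []):
                # Each person joins the sample at most once.
                if referred not in seen:
                    seen.add(referred)
                    next_wave.append(referred)
        sampled.extend(next_wave)
        current = next_wave
    return sampled

# Hypothetical referral network: each key lists the people that person refers.
referrals = {
    "A": ["C", "D"],
    "B": ["D", "E"],
    "C": ["F"],
    "E": ["A", "G"],
}
print(snowball_sample(referrals, seeds=["A", "B"], waves=1))  # ['A', 'B', 'C', 'D', 'E']
print(snowball_sample(referrals, seeds=["A", "B"], waves=2))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

Anyone who is not connected to a seed through referrals can never enter the sample, however many waves are run.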


## <font color='red'>8.2 Applications and Limitations</font>

### <font color='green'>8.2.1 Applications</font>
- **Exploratory Research**: Used when there is little prior information about the subject.
- **Qualitative Studies**: Often employed in studies focusing on understanding behaviors, motivations, and experiences.
- **Limited Resources**: Used when time or budget constraints make probability sampling impractical.

---

### <font color='green'>8.2.2 Limitations</font>
- **Bias**: Non-random selection introduces bias, making it hard to generalize findings.
- **Lack of Representativeness**: Results may not reflect the broader population.
- **No Sampling Error Measurement**: Because selection probabilities are unknown, standard errors and confidence intervals cannot be justified by sampling theory.


## <font color='red'>8.3 Bias and Lack of Representativeness</font>

Non-probability sampling methods are particularly vulnerable to bias. This bias arises from the non-random nature of the selection process, leading to overrepresentation or underrepresentation of certain groups. For example:
- **Convenience Sampling**: May overrepresent people who are more available or willing to participate.
- **Quota Sampling**: Relies heavily on the judgment of the researcher, which can skew results.
- **Snowball Sampling**: Can lead to homogeneity in the sample, as participants refer others similar to themselves.

Mitigating bias involves transparency in sampling methods and acknowledging limitations in study results. Researchers should be cautious about drawing broad conclusions from studies based on non-probability sampling.
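
A small simulation can make the overrepresentation effect concrete. The sketch below rests entirely on made-up assumptions: a synthetic population of 10,000 students, and a rule that students who study more are more likely to be found in the library. It compares a convenience sample taken in the library with a simple random sample of the same size.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly study hours for 10,000 students.
population = [max(0.0, random.gauss(15, 5)) for _ in range(10_000)]

# Assumption: the chance of being in the library grows with study hours.
in_library = [h for h in population if random.random() < h / 30]

convenience = random.sample(in_library, 500)    # whoever is at hand
simple_random = random.sample(population, 500)  # every student equally likely

pop_mean = statistics.mean(population)
conv_mean = statistics.mean(convenience)
srs_mean = statistics.mean(simple_random)

print(f"population mean:         {pop_mean:.2f}")
print(f"convenience sample mean: {conv_mean:.2f}")
print(f"simple random mean:      {srs_mean:.2f}")
```

Under these assumptions the convenience sample mean sits above the population mean, while the simple random sample stays close to it. Increasing the convenience sample size does not remove the gap, because the gap comes from who is reachable, not from how many are reached.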


## <font color='red'>Conclusion</font>

While non-probability sampling lacks the robustness of randomization, it serves an important role in research when resources are limited or when targeting specific populations. By understanding the strengths and limitations of these methods, researchers can use them effectively and appropriately.


# <font color='red'>Examples</font> 

## <font color='green'>Example 1: Convenience Sampling</font>

### <font color='blue'>Problem</font>
A researcher at a university wants to estimate the average number of hours students spend studying per week. They survey 20 students in the library. The results are as follows:  
$ [15, 20, 18, 12, 16, 22, 14, 17, 19, 21, 13, 20, 18, 16, 15, 19, 14, 17, 21, 20] $

### Tasks:
1. Calculate the sample mean.
2. Calculate the sample variance.

---

## <font color='green'>Solution</font>

### 1. Sample Mean
The sample mean is calculated as:  
$$ \bar{y} = \frac{\sum y_i}{n} $$  
Substituting the values:  
$$ \bar{y} = \frac{15 + 20 + 18 + \dots + 20}{20} = \frac{347}{20} = 17.35 $$

---

### 2. Sample Variance
The sample variance is calculated as:  
$$ S^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} $$  
Substituting the values:  
$$ S^2 = \frac{(15 - 17.35)^2 + (20 - 17.35)^2 + \dots + (20 - 17.35)^2}{19} = \frac{160.55}{19} = 8.45 $$
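
The arithmetic for this example can be checked with Python's standard library. This is only a verification sketch; the variable names are illustrative. `statistics.variance` uses the divisor $n - 1$, matching the formula for $S^2$ above.

```python
import statistics

# Weekly study hours reported by the 20 students surveyed in the library.
hours = [15, 20, 18, 12, 16, 22, 14, 17, 19, 21,
         13, 20, 18, 16, 15, 19, 14, 17, 21, 20]

mean = statistics.mean(hours)          # sum of y_i divided by n
variance = statistics.variance(hours)  # sum of squared deviations divided by n - 1

print(f"n = {len(hours)}, sum = {sum(hours)}")
print(f"sample mean = {mean:.2f}")
print(f"sample variance = {variance:.2f}")
```

Because the students were surveyed in the library, these figures describe the sample only and should not be read as estimates for all students.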


## <font color='green'>Example 2: Quota Sampling</font>

### <font color='blue'>Problem</font>
A marketing firm needs to survey 100 people, ensuring equal representation of gender. They decide on a quota of 50 men and 50 women. The results for weekly spending on groceries (in dollars) are:  
- **Men**: $ [60, 65, 70, 55, 50, ..., 75] $ (mean = 62, variance = 20)  
- **Women**: $ [80, 85, 78, 82, 77, ..., 88] $ (mean = 81, variance = 25)

### Tasks:
1. Calculate the overall mean spending.
2. Calculate the overall variance.

---

## <font color='green'>Solution</font>

### 1. Overall Mean
The overall mean is calculated as:  
$$ \bar{y} = \frac{n_1 \cdot \bar{y}_1 + n_2 \cdot \bar{y}_2}{n} $$  
Substituting the values:  
$$ \bar{y} = \frac{50 \cdot 62 + 50 \cdot 81}{100} = 71.5 $$

---

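### 2. Overall Variance
The variance of the combined sample has two parts: the spread within each group and the spread between the group means. Treating $S_1^2$ and $S_2^2$ as sample variances (divisor $n_k - 1$):  
$$ S^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2 + n_1 (\bar{y}_1 - \bar{y})^2 + n_2 (\bar{y}_2 - \bar{y})^2}{n - 1} $$  
Substituting the values:  
$$ S^2 = \frac{49 \cdot 20 + 49 \cdot 25 + 50 \cdot (62 - 71.5)^2 + 50 \cdot (81 - 71.5)^2}{99} = \frac{11230}{99} \approx 113.43 $$  
The size-weighted average of the group variances, $\frac{50 \cdot 20 + 50 \cdot 25}{100} = 22.5$, captures only the within-group part. It understates the overall variance because the group means differ by 19.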
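
Group summaries can be combined programmatically. The helper `combine_groups` below is a hypothetical sketch that assumes the reported variances are sample variances (divisor $n_k - 1$). It returns the overall mean, the size-weighted average of the within-group variances, and the variance of the combined sample, which also includes the spread between the group means.

```python
def combine_groups(sizes, means, variances):
    """Combine per-group summaries into overall statistics.

    `variances` are assumed to be sample variances (divisor n_k - 1).
    """
    n = sum(sizes)
    overall_mean = sum(nk * mk for nk, mk in zip(sizes, means)) / n
    # Size-weighted average of the within-group variances.
    within_avg = sum(nk * vk for nk, vk in zip(sizes, variances)) / n
    # Variance of the combined observations: within-group plus
    # between-group sums of squares, divided by n - 1.
    ss_within = sum((nk - 1) * vk for nk, vk in zip(sizes, variances))
    ss_between = sum(nk * (mk - overall_mean) ** 2 for nk, mk in zip(sizes, means))
    combined_variance = (ss_within + ss_between) / (n - 1)
    return overall_mean, within_avg, combined_variance

mean, within_avg, combined = combine_groups([50, 50], [62, 81], [20, 25])
print(mean, within_avg, round(combined, 2))  # 71.5 22.5 113.43
```

The same function accepts any number of groups, so it can be reused when quotas are set by department or region instead of gender.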


## <font color='green'>Example 3: Snowball Sampling</font>

### <font color='blue'>Problem</font>
A researcher is studying a hidden population of freelancers in a city. They start with 5 known freelancers who each refer 2 others. The researcher records their weekly working hours:  
$ [40, 35, 50, 45, 30, 42, 38, 48, 36, 41, 37, 49, 44, 39, 47] $

### Tasks:
1. Calculate the sample mean.
2. Estimate the population total, assuming 500 freelancers in the city.

---

## <font color='green'>Solution</font>

### 1. Sample Mean
The sample mean is calculated as:  
$$ \bar{y} = \frac{\sum y_i}{n} $$  
Substituting the values:  
$$ \bar{y} = \frac{40 + 35 + \dots + 47}{15} = \frac{621}{15} = 41.4 $$

---

### 2. Population Total
The population total is estimated as:  
$$ T = N \cdot \bar{y} $$  
Substituting the values:  
$$ T = 500 \cdot 41.4 = 20{,}700 $$
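
The calculation for this example can be reproduced with a short script. It is a verification sketch only: `N` is the population size assumed in the problem statement, and the expansion $T = N \cdot \bar{y}$ treats the referral sample as if it were representative, which snowball sampling does not guarantee.

```python
# Weekly working hours for the 15 freelancers (5 seeds plus 10 referrals).
hours = [40, 35, 50, 45, 30, 42, 38, 48, 36, 41, 37, 49, 44, 39, 47]
N = 500  # assumed number of freelancers in the city

sample_mean = sum(hours) / len(hours)
estimated_total = N * sample_mean

print(f"n = {len(hours)}, sum = {sum(hours)}")
print(f"sample mean = {sample_mean:.1f}")
print(f"estimated total = {estimated_total:,.0f}")
```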


# <font color='red'>Exercises</font> 

## <font color='green'>Exercise 1: Convenience Sampling</font>

### <font color='blue'>Problem</font>

A researcher at a shopping mall wants to understand the average amount customers spend in a single visit. They survey 20 customers at random times and record the following values (in dollars):  
$ [25, 30, 45, 50, 20, 35, 40, 55, 60, 50, 30, 40, 35, 45, 60, 20, 25, 50, 40, 30] $

### Tasks:
1. Calculate the sample mean.
2. Calculate the sample variance.

---

## <font color='green'>Exercise 2: Quota Sampling</font>

### <font color='blue'>Problem</font>

A company wants to survey 100 employees, ensuring proportional representation by department. The employee distribution is as follows:
- **HR**: 20 employees (average satisfaction score = 8.2, variance = 0.5)
- **IT**: 50 employees (average satisfaction score = 7.5, variance = 0.7)
- **Sales**: 30 employees (average satisfaction score = 8.8, variance = 0.4)

### Tasks:
1. Calculate the overall average satisfaction score.
2. Calculate the overall variance.


## <font color='green'>Exercise 3: Snowball Sampling</font>

### <font color='blue'>Problem</font>

A researcher is studying a hard-to-reach population of freelance writers. They start with 10 known writers, each referring 2 more writers. The researcher collects weekly working hours from 20 of these writers:  
$ [40, 35, 50, 45, 30, 42, 38, 48, 36, 41, 37, 49, 44, 39, 47, 43, 46, 34, 33, 45] $

### Tasks:
1. Calculate the sample mean.
2. Estimate the total working hours for the entire population, assuming there are 500 freelance writers in the city.
