Simple Random Sampling:
Every individual in the population has an equal chance of being selected. This works best when the population is fairly homogeneous.
Useful when you want unbiased estimates of population parameters, such as the mean.

In [1]:
import numpy as np
import pandas as pd

# Generate a population of 1000 individuals (e.g., ages 18 to 64; randint's upper bound is exclusive)
population = np.random.randint(18, 65, size=1000)

# Simple Random Sampling: randomly select a sample of 100 individuals from the population
sample_size = 100
random_sample = np.random.choice(population, sample_size, replace=False)

# Show the random sample
print("Random Sample:", random_sample)


Random Sample: [56 59 24 26 32 36 28 36 31 26 48 64 61 39 46 25 40 55 40 55 19 50 36 21
 29 40 52 61 64 61 33 31 58 51 57 36 34 51 18 47 28 51 48 48 52 62 59 49
 38 41 18 32 55 48 18 62 47 45 36 19 31 55 46 36 20 47 57 29 33 62 41 28
 46 23 57 40 45 28 22 27 63 46 41 33 54 64 30 48 19 57 38 34 64 61 41 62
 61 50 31 25]
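
To illustrate what "unbiased" means here, the sketch below draws many simple random samples and compares the average of their means with the population mean. The seed, the number of repetitions (2000), and the variable names are assumptions chosen for illustration only.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(42)

# Synthetic population: 1000 ages from 18 to 64 (upper bound of integers() is exclusive)
population = rng.integers(18, 65, size=1000)
population_mean = population.mean()

# Draw 2000 simple random samples of 100 individuals each, without replacement
sample_means = np.array([
    rng.choice(population, size=100, replace=False).mean()
    for _ in range(2000)
])

print(f"Population mean: {population_mean:.2f}")
print(f"Average of sample means: {sample_means.mean():.2f}")
print(f"Std. dev. of sample means: {sample_means.std():.2f}")
```

Individual sample means vary from draw to draw, but their average should land close to the population mean.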


Stratified Sampling:
Use when you need to ensure that specific subgroups, or strata (e.g., age groups, income levels, or geographic regions), are adequately represented in the sample. The population is divided into strata, and a random sample is drawn from each one.

In [2]:
# Create a population with two strata (e.g., Males and Females)
males = np.random.randint(18, 65, size=500)  # 500 males
females = np.random.randint(18, 65, size=500)  # 500 females

# Stratified Sampling: Randomly select 50 males and 50 females
male_sample = np.random.choice(males, 50, replace=False)
female_sample = np.random.choice(females, 50, replace=False)

# Combine the male and female samples into one stratified sample
stratified_sample = np.concatenate((male_sample, female_sample))

# Show the stratified sample
print("Stratified Sample:", stratified_sample)

Stratified Sample: [27 62 60 53 18 42 59 57 36 33 29 58 48 54 37 22 57 58 32 63 43 46 30 58
 22 46 23 42 19 63 32 33 25 18 47 32 62 49 49 30 39 24 31 23 53 22 63 54
 58 54 55 58 38 47 47 64 58 57 23 21 58 39 27 33 57 33 28 19 36 32 21 44
 63 25 53 47 63 58 28 25 50 45 40 45 61 46 60 30 28 27 47 44 41 45 50 43
 39 38 36 46]
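
The example above draws the same number of individuals from two strata of equal size. When strata differ in size, one common choice is proportional allocation, where each stratum contributes the same fraction of its members. The sketch below assumes a hypothetical population with a 700/300 split and a 10% sampling fraction; the column names and group labels are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical population with unequal strata: 700 in group "A", 300 in group "B"
df = pd.DataFrame({
    "group": ["A"] * 700 + ["B"] * 300,
    "age": rng.integers(18, 65, size=1000),
})

# Proportional allocation: sample 10% of the rows within each stratum
stratified = df.groupby("group").sample(frac=0.10, random_state=0)

# Each stratum keeps its share of the population (70 from A, 30 from B)
print(stratified["group"].value_counts())
```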


Systematic Sampling is a type of probability sampling method in which you select every kth member (where k is the sampling interval) from a population list after choosing a random starting point. It is often used when it is difficult or impractical to randomly select each member of the population individually, but you still want to maintain some randomness and representativeness in the sample.

Key Features:
Simple to Implement: Systematic sampling is easy to apply when you have a large, ordered population.

Uniform Spread: This method ensures that samples are spread out across the entire population, making it less prone to clustering compared to simple random sampling.

Fixed Interval: The sampling interval (i.e., how often you select an individual) is fixed and based on the desired sample size.

How Systematic Sampling Works:
Define the population: Create a list or ordered set of all members of the population.

Choose a starting point: Randomly select a starting point between 1 and the sampling interval (k).

Select every kth individual: After the starting point, select every kth individual, where k is the calculated sampling interval.

Continue until the desired sample size is reached.

Sampling Interval (k) = Population size / Sample size

In [4]:
import numpy as np
import pandas as pd

# Create a sample population (e.g., 500 individuals)
population = np.arange(1, 501)  # Population: 500 individuals (1, 2, 3,... 500)

# Define sample size
sample_size = 50

# Calculate the sampling interval (k)
sampling_interval = len(population) // sample_size
print(f"Sampling Interval (k): {sampling_interval}")

# Choose a random starting point between 1 and the sampling interval
starting_point = np.random.randint(1, sampling_interval+1)
print(f"Random Starting Point: {starting_point}")

# Select every k-th individual, beginning at the random starting point (Systematic Sampling)
sample = population[starting_point-1::sampling_interval]

# Show the sample
print(f"Systematic Sample: {sample}")


Sampling Interval (k): 10
Random Starting Point: 1
Systematic Sample: [  1  11  21  31  41  51  61  71  81  91 101 111 121 131 141 151 161 171
 181 191 201 211 221 231 241 251 261 271 281 291 301 311 321 331 341 351
 361 371 381 391 401 411 421 431 441 451 461 471 481 491]
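
The example above works cleanly because 500 is a multiple of 50. When the population size is not a multiple of the sample size, integer division truncates k, and slicing with a fixed step can return more elements than requested. One possible workaround is a fractional interval, sketched below. The function name `systematic_sample` and the population of 503 are assumptions for illustration, not part of the original example.

```python
import numpy as np

def systematic_sample(population, sample_size, rng):
    """Systematic sample using a fractional interval k = N / n."""
    n_pop = len(population)
    k = n_pop / sample_size                  # interval, may be non-integer
    start = rng.uniform(0, k)                # random start in [0, k)
    # Positions start, start + k, start + 2k, ... floored to integer indices
    indices = np.floor(start + k * np.arange(sample_size)).astype(int)
    # Guard against floating-point rounding at the upper edge
    indices = np.minimum(indices, n_pop - 1)
    return population[indices]

rng = np.random.default_rng(1)               # arbitrary seed for reproducibility
population = np.arange(1, 504)               # 503 individuals, not divisible by 50
sample = systematic_sample(population, 50, rng)

print(f"Sample size: {len(sample)}")
print(f"First five selected: {sample[:5]}")
```

Because k is at least 1, consecutive positions always fall on different indices, so the function returns exactly the requested number of distinct individuals.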


Cluster Sampling is a type of probability sampling method where the population is divided into distinct groups or clusters, and a random sample of clusters is selected. Then, either all individuals within the selected clusters are surveyed (single-stage) or a random sample of individuals is chosen from each of the selected clusters (two-stage).

In [5]:
import numpy as np

# Create a population of 1000 individuals (grouped into 10 clusters)
population = np.arange(1, 1001)  # 1000 individuals
clusters = np.split(population, 10)  # Dividing population into 10 clusters

# Number of clusters to sample
num_clusters_to_sample = 3

# Randomly select 3 clusters
selected_clusters = np.random.choice(range(10), num_clusters_to_sample, replace=False)

# Retrieve all individuals from the selected clusters
sample = np.concatenate([clusters[i] for i in selected_clusters])

# Show the selected sample
print(f"Selected Clusters: {selected_clusters}")
print(f"Cluster Sample: {sample}")


Selected Clusters: [2 1 9]
Cluster Sample: [ 201  202  203  204  205  206  207  208  209  210  211  212  213  214
  215  216  217  218  219  220  221  222  223  224  225  226  227  228
  229  230  231  232  233  234  235  236  237  238  239  240  241  242
  243  244  245  246  247  248  249  250  251  252  253  254  255  256
  257  258  259  260  261  262  263  264  265  266  267  268  269  270
  271  272  273  274  275  276  277  278  279  280  281  282  283  284
  285  286  287  288  289  290  291  292  293  294  295  296  297  298
  299  300  101  102  103  104  105  106  107  108  109  110  111  112
  113  114  115  116  117  118  119  120  121  122  123  124  125  126
  127  128  129  130  131  132  133  134  135  136  137  138  139  140
  141  142  143  144  145  146  147  148  149  150  151  152  153  154
  155  156  157  158  159  160  161  162  163  164  165  166  167  168
  169  170  171  172  173  174  175  176  177  178  179  180  181  182
  183  184  185  186  187  188  18
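
The code above is the single-stage variant: every individual in a selected cluster is included. A minimal sketch of the two-stage variant follows, assuming 20 individuals are drawn from each selected cluster; that number and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)   # arbitrary seed for reproducibility

# 1000 individuals split into 10 equal clusters of 100
population = np.arange(1, 1001)
clusters = np.split(population, 10)

# Stage 1: randomly select 3 of the 10 clusters
selected_clusters = rng.choice(len(clusters), size=3, replace=False)

# Stage 2: randomly select 20 individuals within each selected cluster
two_stage_sample = np.concatenate([
    rng.choice(clusters[i], size=20, replace=False)
    for i in selected_clusters
])

print(f"Selected Clusters: {selected_clusters}")
print(f"Two-stage sample size: {len(two_stage_sample)}")
```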

Multistage Sampling

Multistage Sampling is a sampling technique that combines multiple sampling methods in a sequential manner. It is typically used when a population is large, geographically dispersed, or difficult to access, and other sampling techniques are impractical.

Example:
Consider you want to conduct a survey to understand the educational quality of students across a country.

Stage 1: Divide the country into 10 regions.

Stage 2: Randomly select 3 regions out of the 10.

Stage 3: In each selected region, randomly choose 2 cities or districts.

Stage 4: In each city, randomly select 5 schools.

Stage 5: Finally, within each selected school, randomly choose 10 students.

This approach uses a combination of cluster sampling (for regions, cities, and schools) and simple random sampling (for students within schools).
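
As a quick check on the size of this design, the arithmetic below multiplies the number of units selected at each stage listed above. The variable names are illustrative.

```python
# Units selected at each stage of the example (counts taken from the stages above)
regions_selected = 3
cities_per_region = 2
schools_per_city = 5
students_per_school = 10

total_cities = regions_selected * cities_per_region
total_schools = total_cities * schools_per_city
total_students = total_schools * students_per_school

# prints: Cities: 6, Schools: 30, Students: 300
print(f"Cities: {total_cities}, Schools: {total_schools}, Students: {total_students}")
```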

In [6]:
import numpy as np
import random

# Stage 1: Divide the country into regions (5 here; the counts in this code are scaled down from the example above)
regions = ["Region 1", "Region 2", "Region 3", "Region 4", "Region 5"]

# Stage 2: Randomly select 2 regions
selected_regions = random.sample(regions, 2)
print(f"Selected Regions: {selected_regions}")

# Stage 3: For each selected region, randomly select 2 cities
cities_in_regions = {
    "Region 1": ["City 1A", "City 1B", "City 1C"],
    "Region 2": ["City 2A", "City 2B", "City 2C"],
    "Region 3": ["City 3A", "City 3B", "City 3C"],
    "Region 4": ["City 4A", "City 4B", "City 4C"],
    "Region 5": ["City 5A", "City 5B", "City 5C"]
}

selected_cities = {region: random.sample(cities, 2) for region, cities in cities_in_regions.items() if region in selected_regions}
print(f"Selected Cities from each region: {selected_cities}")

# Stage 4: For each selected city, randomly select 2 schools
schools_in_cities = {
    "City 1A": ["School A1", "School A2", "School A3"],
    "City 1B": ["School B1", "School B2", "School B3"],
    "City 1C": ["School C1", "School C2", "School C3"],
    "City 2A": ["School D1", "School D2", "School D3"],
    "City 2B": ["School E1", "School E2", "School E3"],
    "City 2C": ["School F1", "School F2", "School F3"],
    "City 3A": ["School G1", "School G2", "School G3"],
    "City 3B": ["School H1", "School H2", "School H3"],
    "City 3C": ["School I1", "School I2", "School I3"],
    "City 4A": ["School J1", "School J2", "School J3"],
    "City 4B": ["School K1", "School K2", "School K3"],
    "City 4C": ["School L1", "School L2", "School L3"],
    "City 5A": ["School M1", "School M2", "School M3"],
    "City 5B": ["School N1", "School N2", "School N3"],
    "City 5C": ["School O1", "School O2", "School O3"]
}

# Look up the schools of every selected city (schools_in_cities, not cities_in_regions)
selected_schools = {
    city: random.sample(schools_in_cities[city], 2)
    for region in selected_regions
    for city in selected_cities[region]
}
print(f"Selected Schools: {selected_schools}")

# Stage 5: From each selected school, randomly select 3 students
# Placeholder roster: 5 students per school, numbered consecutively across schools
# (School A1 gets Student 1-5, School A2 gets Student 6-10, and so on)
all_schools = [school for schools in schools_in_cities.values() for school in schools]
students_in_schools = {
    school: [f"Student {5 * i + j}" for j in range(1, 6)]
    for i, school in enumerate(all_schools)
}

# selected_schools maps city -> list of schools, so iterate over the lists
final_sample = {
    school: random.sample(students_in_schools[school], 3)
    for schools in selected_schools.values()
    for school in schools
}
print(f"Final Sample of Students: {final_sample}")


Selected Regions: ['Region 4', 'Region 5']
Selected Cities from each region: {'Region 4': ['City 4A', 'City 4B'], 'Region 5': ['City 5A', 'City 5B']}


Convenience Sampling: Overview
Convenience Sampling is a non-probability sampling technique where individuals are selected based on their easy accessibility or convenience. In this method, researchers choose subjects who are easiest to reach or who are readily available, rather than using random selection or a more structured sampling approach. It's the simplest and least expensive method of sampling.

In [7]:
import numpy as np

# Create a population of 100 students
students = [f"Student {i}" for i in range(1, 101)]

# Perform Convenience Sampling by selecting the first 10 students (easy access)
sampled_students = students[:10]

# Show the sample
print("Convenience Sample:", sampled_students)


Convenience Sample: ['Student 1', 'Student 2', 'Student 3', 'Student 4', 'Student 5', 'Student 6', 'Student 7', 'Student 8', 'Student 9', 'Student 10']
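
Because the first entries of a list are not chosen at random, a convenience sample can misrepresent the population whenever list order is related to the quantity being measured. The sketch below uses entirely synthetic scores and assumes, for illustration, that students near the top of the list tend to score higher.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed for reproducibility

# Hypothetical scores for 100 students; list order is assumed to correlate
# with the score (e.g., students listed by seating row)
scores = np.linspace(90, 50, 100) + rng.normal(0, 5, size=100)

# Convenience sample: the first 10 students on the list
convenience_mean = scores[:10].mean()

# Simple random sample of the same size, for comparison
random_mean = rng.choice(scores, size=10, replace=False).mean()

print(f"Population mean:    {scores.mean():.1f}")
print(f"Convenience mean:   {convenience_mean:.1f}")
print(f"Random-sample mean: {random_mean:.1f}")
```

Under this assumption the convenience mean sits well above the population mean, while the random-sample mean is typically much closer to it.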


Judgmental Sampling, also known as Purposive Sampling or Selective Sampling, is a non-probability sampling technique where the researcher uses their judgment to select specific individuals or units from the population that are deemed to be representative or informative. 

In [8]:
# Sample population of employees with their skill sets
employees = [
    {"name": "Alice", "skills": ["Python", "Machine Learning"]},
    {"name": "Bob", "skills": ["Marketing", "Sales"]},
    {"name": "Charlie", "skills": ["Data Science", "Python"]},
    {"name": "David", "skills": ["Python", "AI"]},
    {"name": "Eve", "skills": ["Project Management", "Agile"]},
]

# Judgmental Sampling: Select employees with Python skills for a focus group
selected_employees = [employee for employee in employees if "Python" in employee["skills"]]

# Show selected employees
print("Selected Employees for Focus Group:")
for employee in selected_employees:
    print(f"- {employee['name']}")


Selected Employees for Focus Group:
- Alice
- Charlie
- David


Snowball Sampling:
Instead of randomly selecting participants, the researcher starts by approaching a few known members of the community and asks them to refer others who participate in the same online platform. As more members refer others, the sample grows.

In [10]:
import random

# Seed participants in the community
seed_participants = ["Alice", "Bob", "Charlie", "David", "Eve"]

# Create a function to simulate snowball sampling
def snowball_sampling(seeds, num_referrals=3, max_rounds=2):
    sample = set(seeds)  # To keep track of unique participants
    round_num = 0
    while round_num < max_rounds:
        new_participants = []
        for participant in sample:
            # Simulate that each participant refers 'num_referrals' new participants
            referrals = [f"{participant}_{i}" for i in range(1, num_referrals + 1)]
            new_participants.extend(referrals)
        
        # Add new participants to the sample
        sample.update(new_participants)
        round_num += 1
        
    return sample

# Simulate snowball sampling starting with seed participants
final_sample = snowball_sampling(seed_participants)

# Show the final sample
print("Final Sample of Participants (After Snowball Sampling):")
print(final_sample)


Final Sample of Participants (After Snowball Sampling):
{'David_2_3', 'Alice_3', 'David_3_3', 'Alice_2_1', 'Bob', 'Charlie', 'David_3_1', 'Alice_1_2', 'Alice_3_2', 'Eve_3_2', 'Alice_1_3', 'Alice_2_3', 'Charlie_2_1', 'Bob_1_2', 'Eve_3_1', 'David_2_2', 'David_3_2', 'David_2', 'David_3', 'Alice_3_1', 'Bob_2', 'David_1', 'Charlie_3_2', 'Bob_1_1', 'Eve_2_1', 'Charlie_1_3', 'Eve_3', 'Eve_1_1', 'Eve_2', 'Charlie_3', 'David_2_1', 'David_1_1', 'Charlie_2_2', 'Eve_1_3', 'Bob_1_3', 'Alice_2_2', 'Bob_3_2', 'Charlie_3_1', 'Eve_2_2', 'Bob_3', 'Alice_1_1', 'Charlie_1_2', 'David_1_2', 'Alice_3_3', 'Charlie_3_3', 'David_1_3', 'Eve_3_3', 'Charlie_2_3', 'Bob_2_3', 'Alice', 'Alice_1', 'Eve_2_3', 'Eve', 'Charlie_1', 'Bob_2_2', 'Alice_2', 'Eve_1', 'Charlie_2', 'Bob_2_1', 'Bob_3_3', 'Bob_1', 'Charlie_1_1', 'Bob_3_1', 'David', 'Eve_1_2'}
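
The simulation above invents a fixed number of new names per participant. In practice, referrals often overlap or run out. The sketch below follows referrals through a small, hypothetical network wave by wave; the names, the `referrals` dictionary, and the `snowball_sample` function are all made up for illustration.

```python
# Hypothetical referral network: each person maps to the people they would refer
referrals = {
    "Alice": ["Frank", "Grace"],
    "Bob": ["Grace", "Heidi"],
    "Frank": ["Ivan"],
    "Grace": ["Alice", "Judy"],
    "Heidi": [],
    "Ivan": ["Bob"],
    "Judy": ["Mallory"],
}

def snowball_sample(network, seeds, max_rounds):
    """Collect participants wave by wave, following referrals up to max_rounds."""
    sample = set(seeds)
    current_wave = list(seeds)
    for _ in range(max_rounds):
        next_wave = []
        for person in current_wave:
            for referred in network.get(person, []):
                if referred not in sample:   # skip people already recruited
                    sample.add(referred)
                    next_wave.append(referred)
        if not next_wave:                    # no new referrals: stop early
            break
        current_wave = next_wave
    return sample

result = snowball_sample(referrals, ["Alice", "Bob"], max_rounds=2)
print(sorted(result))
```

With two rounds, the sample reaches Frank, Grace, and Heidi in the first wave and Ivan and Judy in the second; Mallory would only be reached in a third round.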
