1. Random Sampling

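   Random Sampling (more precisely, simple random sampling) means selecting items from a population purely by chance, so each element has an equal chance of being picked. The example below draws without replacement (replace=False), so no element can appear twice in the sample.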

In [1]:
import numpy as np

In [2]:
population = np.arange(1, 101)  # integers 1 to 100

In [3]:
# Draw 10 distinct elements; replace=False means sampling without replacement
sample = np.random.choice(population, size=10, replace=False)

In [4]:
print("Population:", population)
print("Random Sample:", sample)

Population: [  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
  91  92  93  94  95  96  97  98  99 100]
Random Sample: [ 31 100  12  53  67  58  89  97  17  71]
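
   The draw above changes on every run. As a minimal sketch of a repeatable version, the cell below uses NumPy's Generator API with a fixed seed. The seed value 42 is an arbitrary assumption, not something taken from the notebook above.

```python
import numpy as np

# Assumption: the seed 42 is arbitrary; any fixed integer makes the draw repeatable.
rng = np.random.default_rng(42)

population = np.arange(1, 101)  # integers 1 to 100

# replace=False means no element can be drawn twice
sample = rng.choice(population, size=10, replace=False)

print("Random Sample:", sample)
print("All unique:", len(np.unique(sample)) == len(sample))
```

   Passing a seeded generator around, rather than relying on the global random state, keeps each experiment's randomness explicit.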


2. Systematic Sampling

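   In Systematic Sampling, we pick a random starting point and then select every k-th element of the population.
   
   If we have 100 students (N = 100) and need a sample of 10 (n = 10), the step size is
   
   k = N / n = 100 / 10 = 10.
   
   So, after a random start among the first 10 students, we pick every 10th student (for example 3, 13, 23, 33, …).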

In [10]:
import numpy as np

# Create a population
population = np.arange(1, 101)  # 1 to 100

N = len(population)   # Population size
n = 10                # Sample size
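k = N // n            # Step size (integer division)

# Random start index between 0 and k-1 (randint excludes the upper bound)
start = np.random.randint(0, k)

# Select every k-th item; this yields exactly n items only when N is a multiple of n
systematic_sample = population[start::k]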

print("Population:", population)
print("Step size (k):", k)
print("Random start index:", start)
print("Systematic Sample:", systematic_sample)

Population: [  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
  91  92  93  94  95  96  97  98  99 100]
Step size (k): 10
Random start index: 4
Systematic Sample: [ 5 15 25 35 45 55 65 75 85 95]
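
   The slice `population[start::k]` returns exactly n items only when N is a multiple of n. With N = 103 and n = 10, for instance, it can return 11 items. One common way around this is circular systematic sampling, where the start is drawn from the whole population and the steps wrap around the end. The sketch below is an illustration under that assumption; the helper name `circular_systematic_sample` and the seed are hypothetical, not part of the notebook above.

```python
import numpy as np

def circular_systematic_sample(population, n, rng):
    """Return n items via circular systematic sampling (hypothetical helper)."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and len(population)")
    k = N // n                   # step size, rounded down
    start = rng.integers(0, N)   # random start anywhere in the population
    # Step through the population k items at a time, wrapping past the end.
    # Because n * k <= N, the n indices are always distinct.
    indices = (start + k * np.arange(n)) % N
    return population[indices]

# Assumption: the seed 0 is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(0)
population = np.arange(1, 104)   # 103 items, not a multiple of 10
sample = circular_systematic_sample(population, 10, rng)

print("Systematic Sample:", sample)
print("Sample size:", len(sample))
```

   Simply truncating the slice to n items would also fix the size, but the last few elements of the population could then never be selected.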


3. Stratified Sampling

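   In Stratified Sampling, the population is divided into groups (called strata) based on some characteristic (like gender, department, or region), and a random sample is then drawn from each stratum. In the example below, each department contributes to the sample in proportion to its size.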

In [12]:
import pandas as pd
import numpy as np

# Create dataset
np.random.seed(42)
data = pd.DataFrame({
    'Student_ID': range(1, 101),
    'Department': np.random.choice(['CSE', 'ECE', 'EEE'], 100, p=[0.6, 0.25, 0.15]),
    'Marks': np.random.randint(50, 100, 100)
})

print("Department counts:")
print(data['Department'].value_counts())

# Desired total sample size
sample_size = 10

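# Step 1: Proportional allocation: each stratum's share of the population
# multiplied by the total sample size (here 6.3, 2.4 and 1.3)
stratum_counts = data['Department'].value_counts()
stratum_proportions = (stratum_counts / len(data)) * sample_size

# Step 2: Round to nearest integer for each stratum
# Note: the rounded counts need not add up to sample_size (here 6 + 2 + 1 = 9)
stratum_samples = stratum_proportions.round().astype(int)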

# Step 3: Sample from each stratum
samples = []
for dept, n in stratum_samples.items():
    stratum = data[data['Department'] == dept]
    sample = stratum.sample(n=n, random_state=42)
    samples.append(sample)

# Step 4: Combine all samples
final_sample = pd.concat(samples)

print("\nFinal Stratified Sample:")
print(final_sample.sort_values(by='Department'))


Department counts:
Department
CSE    63
ECE    24
EEE    13
Name: count, dtype: int64

Final Stratified Sample:
    Student_ID Department  Marks
98          99        CSE     69
93          94        CSE     58
0            1        CSE     81
65          66        CSE     51
10          11        CSE     64
57          58        CSE     88
38          39        ECE     52
75          76        ECE     93
80          81        EEE     53
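
   The sample above has 9 rows instead of the desired 10, because the exact allocations 6.3, 2.4 and 1.3 each round down. One way to make the allocation add up is the largest-remainder rule: take the floor of each exact allocation, then hand the leftover units to the strata with the largest fractional parts. The cell below is a minimal sketch of that idea; the helper name `proportional_allocation` is hypothetical, and the counts are copied from the output above.

```python
import numpy as np
import pandas as pd

def proportional_allocation(counts, sample_size):
    """Split sample_size across strata with the largest-remainder rule (hypothetical helper)."""
    exact = counts / counts.sum() * sample_size   # e.g. 6.3, 2.4, 1.3
    alloc = np.floor(exact).astype(int)           # round every stratum down first
    shortfall = int(sample_size - alloc.sum())    # units still to be handed out
    # Give one extra unit to each of the strata with the largest fractional parts
    top_up = (exact - alloc).sort_values(ascending=False).index[:shortfall]
    alloc.loc[top_up] += 1
    return alloc

# Department counts taken from the output above
counts = pd.Series({'CSE': 63, 'ECE': 24, 'EEE': 13})
alloc = proportional_allocation(counts, 10)

print(alloc)
print("Total:", alloc.sum())
```

   Here the extra unit goes to ECE, giving 6, 3 and 1. The result could be used in place of `stratum_samples` in the loop above. This sketch assumes no stratum is allocated more rows than it contains.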
