---
title: "Selection Bias & Missing Data Challenge"
subtitle: "Creating a Statistics Meme: Write Your Own Functions"
format:
  html: default
execute:
  echo: false
  eval: true
---

# 🎨 Selection Bias & Missing Data Challenge

In [None]:
#| label: step1-prepare
#| echo: false
#| eval: true
#| include: false

import numpy as np
import matplotlib.pyplot as plt
from step1_prepare_image import prepare_image

# Load and prepare the image
img_path = 'profilepic.jpeg'
gray_image = prepare_image(img_path, max_size=512)

In [None]:
#| label: step2-stipple
#| echo: false
#| eval: true
#| include: false
#| warning: false

from step2_create_stipple import create_stipple

# Create stippled image
stipple_pattern, samples = create_stipple(
    gray_image,
    percentage=0.08,
    sigma=0.9,
    content_bias=0.9,
    noise_scale_factor=0.1,
    extreme_downweight=0.5,
    extreme_threshold_low=0.2,
    extreme_threshold_high=0.8,
    extreme_sigma=0.1
)

In [None]:
#| label: step3-tonal
#| echo: false
#| eval: false
#| include: false
# Optional tonal analysis - not needed for final meme

In [None]:
#| label: step4-block-letter
#| echo: false
#| eval: true
#| include: false

import importlib
import step4_create_block_letter
importlib.reload(step4_create_block_letter)
from step4_create_block_letter import create_block_letter_s

# Get image dimensions
h, w = gray_image.shape

# Create block letter S
block_letter = create_block_letter_s(h, w, letter="S", font_size_ratio=0.9)

In [None]:
#| label: step5-masked
#| echo: false
#| eval: true
#| include: false

from step5_create_masked import create_masked_stipple

# Create masked stippled image
masked_stipple = create_masked_stipple(
    stipple_pattern,
    block_letter,
    threshold=0.5
)
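
The source of `create_masked_stipple` is not shown here, so the following is only a minimal sketch of how a masking step like this could work. It assumes both images are float arrays in the range 0 to 1, that 1.0 is white background, and that the letter is drawn dark (below `threshold`) in the mask. The name `apply_mask_sketch` is hypothetical.

```python
import numpy as np


def apply_mask_sketch(stipple, mask, threshold=0.5):
    """Hypothetical helper: blank out stipple pixels that fall under the mask.

    Assumes 1.0 is white and that mask values below `threshold` mark the letter.
    """
    stipple = np.asarray(stipple, dtype=float)
    mask = np.asarray(mask, dtype=float)
    if stipple.shape != mask.shape:
        raise ValueError("stipple and mask must have the same shape")
    # Pixels under the letter become white, so those samples are "missing"
    return np.where(mask < threshold, 1.0, stipple)


# Tiny demonstration: the top row is covered by the mask, the bottom row is not
demo_stipple = np.array([[0.0, 1.0], [0.0, 0.0]])
demo_mask = np.array([[0.0, 0.0], [1.0, 1.0]])
demo_result = apply_mask_sketch(demo_stipple, demo_mask)
```

Under these assumptions the dots in the covered region disappear entirely, regardless of what the underlying image looked like there.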

In [None]:
#| label: create-final-meme
#| echo: false
#| eval: true
#| include: false

import importlib
import create_meme
importlib.reload(create_meme)
from create_meme import create_statistics_meme

# Create the final meme
create_statistics_meme(
    original_img=gray_image,
    stipple_img=stipple_pattern,
    block_letter_img=block_letter,
    masked_stipple_img=masked_stipple,
    output_path="my_statistics_meme.png",
    dpi=150,
    background_color="white"
)

In [None]:
#| label: final-meme
#| echo: false
#| fig-cap: Statistics meme demonstrating selection bias

# Display the final meme
import matplotlib.image as mpimg
fig, ax = plt.subplots(figsize=(16, 4))
img = mpimg.imread("my_statistics_meme.png")
ax.imshow(img)
ax.axis('off')
plt.tight_layout()
plt.show()

**Explanation:**

This meme illustrates selection bias: systematically missing data distorts our view of reality. The original image (Reality) represents the true population, and the stippled image (Your Model) represents the data we sampled from it. The masked version (Estimate) shows what remains after systematic data loss, represented by the block-letter "S", removes an entire region of the sample. The resulting estimate no longer reflects the true population, just as non-random missing data in research can lead to incorrect conclusions.
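
The same idea can be shown numerically. The sketch below is an illustration only and is not part of the meme pipeline: it uses a simulated population with made-up parameters and compares the sample mean when values go missing at random against the sample mean when larger values are more likely to go missing.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# "Reality": a simulated population (parameters are illustrative assumptions)
population = rng.normal(loc=50.0, scale=10.0, size=100_000)
true_mean = population.mean()

# Missing completely at random: every value has the same 50% chance of being kept
random_keep = rng.random(population.size) < 0.5
random_sample = population[random_keep]

# Systematic missingness: the larger the value, the less likely it is kept
keep_prob = 1.0 / (1.0 + np.exp((population - 50.0) / 5.0))
biased_keep = rng.random(population.size) < keep_prob
biased_sample = population[biased_keep]

random_error = abs(random_sample.mean() - true_mean)
biased_error = abs(biased_sample.mean() - true_mean)

print(f"True mean:             {true_mean:.2f}")
print(f"Random-missing mean:   {random_sample.mean():.2f}")
print(f"Biased-missing mean:   {biased_sample.mean():.2f}")
```

Both samples keep roughly half of the population, but only the random one stays close to the true mean. Collecting more data does not repair the biased estimate, because the missingness depends on the very values being measured.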
