# Chapter 23: Secure Coding Practices

Writing secure code goes beyond cryptography. It requires disciplined input validation,
proper output encoding, safe handling of external data, and following established security
principles. This notebook covers practical secure coding patterns in Python.

## Topics Covered
- **Input validation**: Whitelisting vs blacklisting approaches
- **Common vulnerabilities**: Injection and XSS concepts
- **Output encoding**: `html.escape()` for HTML safety
- **Shell safety**: `shlex.quote()` for command construction
- **SQL injection prevention**: Parameterized queries
- **SSL/TLS basics**: Creating SSL contexts, certificate verification
- **Secrets management**: Environment variables with `os.environ`
- **Secure coding checklist**: Principles and guidelines

## Input Validation: Whitelisting vs Blacklisting

Input validation is the first line of defense. There are two fundamental approaches:

- **Whitelisting** (allow-list): Define exactly what IS allowed and reject everything
  else. This is the preferred approach because it is secure by default.
- **Blacklisting** (deny-list): Define what is NOT allowed and accept everything else.
  This is fragile because attackers can often find inputs you forgot to block.

**Rule**: Always prefer whitelisting. Only use blacklisting as an additional layer.
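
To make the difference concrete, here is a minimal sketch that runs the same payloads through both approaches. The deny-list entries and the helper names `passes_deny_list` and `passes_allow_list` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical deny-list: a handful of known-bad substrings
DENY_SUBSTRINGS: tuple[str, ...] = ("<script", "drop table", "../")

# Allow-list: lowercase letters, digits, underscores, hyphens, 3-20 characters
ALLOWED_USERNAME = re.compile(r"[a-z0-9_-]{3,20}")


def passes_deny_list(value: str) -> bool:
    """Accept anything that does not contain a blocked substring."""
    lowered = value.lower()
    return not any(bad in lowered for bad in DENY_SUBSTRINGS)


def passes_allow_list(value: str) -> bool:
    """Accept only values that match the allowed pattern in full."""
    return ALLOWED_USERNAME.fullmatch(value) is not None


# Payloads the deny-list author did not anticipate
payloads: list[str] = [
    "<img src=x onerror=alert(1)>",  # XSS without a <script> tag
    "..%2fetc%2fpasswd",             # URL-encoded path traversal
    "alice\n",                       # trailing newline
]

for payload in payloads:
    print(f"{payload!r:35s} deny-list ok={passes_deny_list(payload)} "
          f"allow-list ok={passes_allow_list(payload)}")
```

Every payload slips past the deny-list because none contains a blocked substring, while the allow-list rejects all three without having to anticipate them.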

In [None]:
import re


# WHITELIST approach: define exactly what is allowed
def validate_username(username: str) -> bool:
    """Validate a username using a whitelist pattern.

    Only allows lowercase letters, digits, underscores, and hyphens.
    Must be 3-20 characters long.
    """
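    # fullmatch() must consume the whole string; re.match() with "$" would
    # also accept a trailing newline such as "alice\n"
    pattern: str = r"[a-z0-9_-]{3,20}"
    return bool(re.fullmatch(pattern, username))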


def validate_email(email: str) -> bool:
    """Basic email validation using a whitelist pattern.

    Note: For production use, consider the email-validator library
    or actually sending a verification email.
    """
    # fullmatch() avoids the trailing-newline gap of re.match() with "$"
    pattern: str = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    return bool(re.fullmatch(pattern, email)) and len(email) <= 254


def validate_age(age_str: str) -> int | None:
    """Validate and convert age input with strict bounds checking."""
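    # int() alone also accepts "+25", " 25 ", "2_5", and non-ASCII digits,
    # so require plain ASCII digits before converting
    if not (age_str.isascii() and age_str.isdigit()):
        return None
    try:
        age: int = int(age_str)
    except ValueError:
        return None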
    if 0 <= age <= 150:  # Reasonable range
        return age
    return None


# Test username validation
test_usernames: list[tuple[str, bool]] = [
    ("alice", True),
    ("bob_123", True),
    ("my-name", True),
    ("AB", False),         # Too short
    ("Alice", False),      # Uppercase not allowed
    ("user@name", False),  # Special chars not allowed
    ("'; DROP TABLE--", False),  # SQL injection attempt
]

print("Username validation (whitelist):")
for username, expected in test_usernames:
    result: bool = validate_username(username)
    status: str = "PASS" if result == expected else "FAIL"
    print(f"  [{status}] {username!r:25s} -> valid={result}")

# Test email validation
print("\nEmail validation:")
test_emails: list[str] = ["user@example.com", "bad@", "no-at-sign", "a@b.co"]
for email in test_emails:
    print(f"  {email!r:25s} -> valid={validate_email(email)}")

# Test age validation
print("\nAge validation:")
for age_input in ["25", "0", "150", "-1", "200", "abc", "25; DROP TABLE"]:
    print(f"  {age_input!r:20s} -> {validate_age(age_input)}")

In [None]:
from dataclasses import dataclass
import re


@dataclass
class ValidationResult:
    """Result of a validation check."""
    is_valid: bool
    value: str
    errors: list[str]


def validate_and_sanitize(raw_input: str, field_name: str,
                          max_length: int = 100,
                          allowed_pattern: str = r"^[\w\s.,!?-]+$") -> ValidationResult:
    """Validate and sanitize user input with detailed error reporting.

    Steps:
    1. Strip leading/trailing whitespace
    2. Check length constraints
    3. Validate against allowed character pattern
    4. Normalize whitespace
    """
    errors: list[str] = []

    # Step 1: Strip whitespace
    cleaned: str = raw_input.strip()

    # Step 2: Length checks
    if len(cleaned) == 0:
        errors.append(f"{field_name} cannot be empty")
    if len(cleaned) > max_length:
        errors.append(f"{field_name} exceeds maximum length of {max_length}")

    # Step 3: Character whitelist check
    if cleaned and not re.match(allowed_pattern, cleaned):
        errors.append(f"{field_name} contains disallowed characters")

    # Step 4: Normalize internal whitespace
    cleaned = re.sub(r"\s+", " ", cleaned)

    return ValidationResult(
        is_valid=len(errors) == 0,
        value=cleaned,
        errors=errors,
    )


# Test cases
test_inputs: list[str] = [
    "  Hello, World!  ",
    "Normal input text.",
    "",
    "<script>alert('xss')</script>",
    "A" * 150,
    "Line one\n\nLine two",
]

print("Input validation and sanitization:")
for raw in test_inputs:
    result: ValidationResult = validate_and_sanitize(raw, "comment")
    display: str = raw[:40] + "..." if len(raw) > 40 else raw
    print(f"\n  Input:  {display!r}")
    print(f"  Valid:  {result.is_valid}")
    if result.errors:
        print(f"  Errors: {result.errors}")
    else:
        print(f"  Clean:  {result.value!r}")

## Common Vulnerability Types: Injection and XSS

**Injection attacks** occur when untrusted data is sent to an interpreter as part of a
command or query. The two most common types in web applications are:

- **SQL Injection**: Malicious SQL code in user input modifies database queries
- **Cross-Site Scripting (XSS)**: Malicious JavaScript in user input executes in
  other users' browsers

The defense is always the same: **never trust user input**, and always use the appropriate
encoding or parameterization for the output context.
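
As a rough illustration of "appropriate encoding for the output context", the sketch below encodes one untrusted string for three different destinations using only the standard library. The variable names are made up for this example, and the JSON form is meant for JSON consumers, not for pasting into an inline `<script>` block.

```python
import html
import json
from urllib.parse import quote

untrusted: str = 'Tom & "Jerry" <b>'

# HTML text or quoted attribute value: escape markup-significant characters
html_text: str = html.escape(untrusted)

# URL path segment or query value: percent-encode everything reserved
url_part: str = quote(untrusted, safe="")

# JSON document: let the serializer quote and escape the string
json_value: str = json.dumps(untrusted)

print(f"HTML: {html_text}")
print(f"URL:  {url_part}")
print(f"JSON: {json_value}")
```

The same input yields three different encodings, which is why a single "sanitize once" step at input time cannot replace encoding at the point of output.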

In [None]:
# Demonstrating WHY injection is dangerous (conceptual examples)

# BAD: String formatting for SQL (NEVER do this)
def bad_login_query(username: str, password: str) -> str:
    """INSECURE: Demonstrates SQL injection vulnerability.

    WARNING: This is an example of what NOT to do.
    """
    return f"SELECT * FROM users WHERE name='{username}' AND pass='{password}'"


# Normal input
normal_query: str = bad_login_query("alice", "secret123")
print("Normal query:")
print(f"  {normal_query}")

# SQL injection: the attacker's input changes the query structure
injection_query: str = bad_login_query("' OR '1'='1", "' OR '1'='1")
print("\nInjected query (bypasses authentication):")
print(f"  {injection_query}")

# Destructive injection
destructive_query: str = bad_login_query("'; DROP TABLE users;--", "anything")
print("\nDestructive injection:")
print(f"  {destructive_query}")

# XSS example: user input rendered as HTML
def bad_greeting(name: str) -> str:
    """INSECURE: Demonstrates XSS vulnerability."""
    return f"<h1>Welcome, {name}!</h1>"


normal_html: str = bad_greeting("Alice")
print(f"\nNormal HTML: {normal_html}")

xss_html: str = bad_greeting("<script>alert('XSS')</script>")
print(f"XSS HTML:    {xss_html}")
print("  ^ The script tag would execute in a browser!")

## html.escape(): HTML Output Encoding

When rendering user-supplied data in HTML, you must escape special characters to prevent
XSS attacks. `html.escape()` converts `&`, `<`, and `>` (and, by default, `"` and `'`) into
their HTML entity equivalents so they are displayed as text rather than interpreted as HTML.

In [None]:
import html


def safe_greeting(name: str) -> str:
    """SECURE: Escapes user input before embedding in HTML."""
    safe_name: str = html.escape(name)
    return f"<h1>Welcome, {safe_name}!</h1>"


# html.escape() converts special HTML characters to entities
test_strings: list[str] = [
    "Alice",
    "<script>alert('XSS')</script>",
    'Bob" onmouseover="alert(1)',
    "Tom & Jerry <together>",
    "O'Brien",
]

print("html.escape() examples:")
for s in test_strings:
    escaped: str = html.escape(s)
    print(f"\n  Original: {s}")
    print(f"  Escaped:  {escaped}")

# Safe rendering
print("\nSafe HTML output:")
print(f"  {safe_greeting('Alice')}")
print(f"  {safe_greeting('<script>alert(1)</script>')}")

# html.escape() with quote=True (default) escapes both " and '
# quote=False leaves quotes unescaped (rarely what you want)
attr_value: str = 'value" onclick="alert(1)'
print("\nAttribute escaping:")
print(f"  Original:      {attr_value}")
print(f"  quote=True:    {html.escape(attr_value, quote=True)}")
print(f"  quote=False:   {html.escape(attr_value, quote=False)}")

# html.unescape() reverses the process
encoded: str = "&lt;b&gt;Bold&lt;/b&gt; &amp; &quot;quoted&quot;"
print("\nhtml.unescape():")
print(f"  Encoded:   {encoded}")
print(f"  Unescaped: {html.unescape(encoded)}")

## shlex.quote(): Shell Command Safety

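When you must construct shell commands that include user input (which you should avoid
whenever possible), `shlex.quote()` quotes the input so that a POSIX shell treats it as a
single literal argument. It is designed for Unix shells only and is not guaranteed to be
correct for Windows shells such as `cmd.exe` or PowerShell.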

**Best practice**: Use `subprocess.run()` with a **list of arguments** instead of a shell
string. This avoids shell interpretation entirely. Use `shlex.quote()` only when you must
construct a shell command string.
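
The list-of-arguments form can be checked directly. The sketch below passes a hostile string to a child Python process and has the child print what it received; the payload and the one-line child program are assumptions made for this demonstration.

```python
import subprocess
import sys

# Input that would break out of the quotes in a shell string
malicious: str = "'; echo INJECTED; echo '"

# The child simply reports the single argument it was given
child_code: str = "import sys; print(repr(sys.argv[1]))"

completed = subprocess.run(
    [sys.executable, "-c", child_code, malicious],  # list form: no shell involved
    capture_output=True, text=True, check=True,
)

print(f"Child received: {completed.stdout.strip()}")
print(f"Unchanged:      {completed.stdout.strip() == repr(malicious)}")
```

Because no shell parses the command, the quotes and semicolons arrive in the child as ordinary characters inside one argument.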

In [None]:
import shlex
import subprocess


# Dangerous: shell injection through string formatting
def bad_file_search(filename: str) -> str:
    """INSECURE: Demonstrates shell injection vulnerability."""
    return f"find /tmp -name '{filename}'"


# Normal input
print("Shell command construction:")
print(f"  Normal:   {bad_file_search('report.txt')}")

# Shell injection: the attacker breaks out of the quotes
malicious: str = "'; rm -rf /; echo '"
print(f"  Injected: {bad_file_search(malicious)}")
print("  ^ Would execute: rm -rf / (catastrophic!)")

# SAFE: Using shlex.quote()
def safe_file_search_cmd(filename: str) -> str:
    """SECURE: Uses shlex.quote() to prevent shell injection."""
    safe_name: str = shlex.quote(filename)
    return f"find /tmp -name {safe_name}"


print(f"\n  Safe cmd: {safe_file_search_cmd(malicious)}")
print("  ^ Malicious input is safely quoted as a single argument")

# shlex.quote() examples
dangerous_inputs: list[str] = [
    "normal.txt",
    "file with spaces.txt",
    "file;rm -rf /",
    "$(whoami)",
    "`id`",
    "file\nnewline",
]

print("\nshlex.quote() examples:")
for inp in dangerous_inputs:
    print(f"  {inp!r:30s} -> {shlex.quote(inp)}")

# BEST: Use subprocess with a list (no shell at all)
print("\nBest practice: subprocess with list args (no shell):")
result = subprocess.run(
    ["echo", "Hello from", "subprocess!"],
    capture_output=True, text=True, check=False,
)
print(f"  Output: {result.stdout.strip()}")
print("  No shell interpretation = no shell injection possible")

## Parameterized Queries: SQL Injection Prevention

The correct defense against SQL injection is **parameterized queries** (prepared statements
are a closely related mechanism). Instead of formatting values into the SQL string, you pass
them as separate parameters. The driver binds them as data, so they are never parsed as SQL.

Python's `sqlite3` module uses `?` as the parameter placeholder.
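
Two related points are worth a short sketch: `sqlite3` also accepts named placeholders (`:name`) with a mapping, and placeholders can only stand in for values, not for identifiers such as column names. For identifiers, one common option is an allow-list. The table layout, `SORTABLE_COLUMNS`, and `list_users` below are hypothetical names for illustration.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)"
)

# Named placeholders take a mapping instead of a tuple
demo_conn.executemany(
    "INSERT INTO users (username, email) VALUES (:username, :email)",
    [
        {"username": "bob", "email": "bob@example.com"},
        {"username": "alice", "email": "alice@example.com"},
    ],
)

# Column names cannot be bound as parameters, so restrict them to a fixed set
SORTABLE_COLUMNS: frozenset[str] = frozenset({"username", "email"})


def list_users(sort_by: str) -> list[tuple[str, str]]:
    """Return all users ordered by an allow-listed column."""
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError(f"Cannot sort by {sort_by!r}")
    # Safe to interpolate: sort_by is one of our own constants
    query = f"SELECT username, email FROM users ORDER BY {sort_by}"
    return demo_conn.execute(query).fetchall()


rows = demo_conn.execute(
    "SELECT email FROM users WHERE username = :name", {"name": "alice"}
).fetchall()
print(rows)
print(list_users("username"))
```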

In [None]:
import sqlite3

# Create an in-memory database for demonstration
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Set up a users table
cursor.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        email TEXT NOT NULL
    )
""")
cursor.executemany(
    "INSERT INTO users (username, email) VALUES (?, ?)",
    [
        ("alice", "alice@example.com"),
        ("bob", "bob@example.com"),
        ("charlie", "charlie@example.com"),
    ],
)
conn.commit()

# BAD: String formatting (vulnerable to SQL injection)
def bad_lookup(username: str) -> list[tuple[int, str, str]]:
    """INSECURE: Do NOT use string formatting for SQL."""
    query: str = f"SELECT * FROM users WHERE username = '{username}'"
    return cursor.execute(query).fetchall()


# GOOD: Parameterized query (safe)
def safe_lookup(username: str) -> list[tuple[int, str, str]]:
    """SECURE: Uses parameterized query."""
    return cursor.execute(
        "SELECT * FROM users WHERE username = ?", (username,)
    ).fetchall()


# Normal lookup works with both
print("Normal lookup:")
print(f"  Bad:  {bad_lookup('alice')}")
print(f"  Safe: {safe_lookup('alice')}")

# Injection attempt: returns all users with bad method
injection: str = "' OR '1'='1"
print(f"\nInjection attempt with: {injection!r}")
print(f"  Bad (returns all):  {bad_lookup(injection)}")
print(f"  Safe (returns none): {safe_lookup(injection)}")

# The parameterized query treats the entire input as a literal value
# It looks for a user literally named "' OR '1'='1" (which doesn't exist)

conn.close()
print("\nKey takeaway: ALWAYS use parameterized queries (? placeholders)")

## ssl Module Basics: SSL Contexts and Certificate Verification

The `ssl` module provides TLS/SSL support for network connections. An `SSLContext`
encapsulates the configuration for secure connections: protocol version, certificate
verification, cipher suites, etc.

**Always** use `ssl.create_default_context()` unless you have a very specific reason
not to. It enables certificate verification, uses modern protocols, and disables
known-insecure options.
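
If you need to tighten the defaults further, a small sketch like the following keeps verification on and raises the protocol floor. Requiring TLS 1.2 or newer is an assumption about your deployment, not a universal rule, and `strict_ctx` is just an illustrative name.

```python
import ssl

strict_ctx: ssl.SSLContext = ssl.create_default_context()

# Assumption: every server we talk to supports TLS 1.2 or newer
strict_ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# The default context already verifies certificates and hostnames
print(f"Certificates required: {strict_ctx.verify_mode == ssl.CERT_REQUIRED}")
print(f"Hostname checked:      {strict_ctx.check_hostname}")
print(f"Minimum version:       {strict_ctx.minimum_version.name}")
```

For a private certificate authority, `ssl.create_default_context(cafile=...)` accepts the path to a CA bundle, which is usually preferable to switching verification off.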

In [None]:
import ssl

# create_default_context(): the recommended way to create an SSL context
# It enables certificate verification and uses secure defaults
ctx: ssl.SSLContext = ssl.create_default_context()

print("Default SSL context settings:")
print(f"  Protocol:              {ctx.protocol}")
print(f"  Verify mode:           {ctx.verify_mode}")
print(f"  Check hostname:        {ctx.check_hostname}")
print(f"  Minimum TLS version:   {ctx.minimum_version}")
print(f"  Maximum TLS version:   {ctx.maximum_version}")

# The default context loads the system's trusted CA certificates
stats: dict[str, int] = ctx.cert_store_stats()
print("\nCertificate store stats:")
print(f"  x509 certificates:     {stats['x509']}")
print(f"  x509 CA certificates:  {stats['x509_ca']}")
print(f"  CRL entries:           {stats['crl']}")

# List available ciphers (first 5 for brevity)
ciphers: list[dict[str, object]] = ctx.get_ciphers()
print(f"\nAvailable ciphers: {len(ciphers)} total")
for cipher in ciphers[:5]:
    print(f"  {cipher['name']}")
print("  ...")

In [None]:
import ssl
import urllib.request

# SECURE: Using default context (verifies certificates)
print("Secure HTTPS request (certificate verification enabled):")
secure_ctx: ssl.SSLContext = ssl.create_default_context()

try:
    with urllib.request.urlopen("https://www.python.org",
                                context=secure_ctx) as response:
        print(f"  Status: {response.status}")
        print(f"  URL:    {response.url}")
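        # Get the server's certificate info.
        # Caution: this reaches into private attributes (fp.raw._sock) that may
        # change between Python versions; any failure falls through to the except.
        cert: dict | None = response.fp.raw._sock.getpeercert()  # type: ignore[union-attr]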
        if cert:
            subject: tuple = cert.get("subject", ())
            for field in subject:
                for key, value in field:
                    if key == "commonName":
                        print(f"  Cert CN: {value}")
            not_after: str | None = cert.get("notAfter")
            if not_after:
                print(f"  Expires: {not_after}")
except Exception as e:
    print(f"  Connection error: {e}")

# INSECURE: Disabling verification (NEVER do this in production)
print("\nWARNING: Creating an insecure context (for demonstration only):")
insecure_ctx = ssl.create_default_context()
insecure_ctx.check_hostname = False
insecure_ctx.verify_mode = ssl.CERT_NONE
print(f"  Verify mode: {insecure_ctx.verify_mode}")
print(f"  Check hostname: {insecure_ctx.check_hostname}")
print("  NEVER disable certificate verification in production!")

## Environment Variables for Secrets (os.environ)

Secrets like API keys, database passwords, and encryption keys should **never** be
hardcoded in source code. The standard approach is to use environment variables, which
can be set per-deployment without changing code.

Common tools for managing environment variables:
- `.env` files loaded with `python-dotenv` (for development)
- System environment variables (for production)
- Secret management services (AWS Secrets Manager, HashiCorp Vault, etc.)
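
For a feel of what a development-time `.env` loader does, here is a deliberately minimal stand-in written with the standard library. It is not `python-dotenv` and does not reproduce its parsing rules; `load_env_file` and the `DEMO_DOTENV_*` names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Read KEY=VALUE lines into os.environ and return the keys that were set.

    Blank lines, comments, and lines without "=" are skipped. Variables that
    already exist in the real environment are left untouched.
    """
    loaded: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded


# Demo with a throwaway file; a real .env file must stay out of version control
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = Path(tmp_dir) / ".env"
    env_path.write_text(
        "# demo settings\nDEMO_DOTENV_API_KEY=abc123\nDEMO_DOTENV_MODE='dev'\n",
        encoding="utf-8",
    )
    loaded = load_env_file(env_path)

print(f"Loaded keys: {sorted(loaded)}")  # print names only, never values

# Cleanup demo variables
for key in loaded:
    os.environ.pop(key, None)
```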

In [None]:
import os


def get_required_env(name: str) -> str:
    """Get a required environment variable or raise an error.

    Use this for secrets that MUST be set for the application to run.
    """
    value: str | None = os.environ.get(name)
    if value is None:
        raise RuntimeError(
            f"Required environment variable {name!r} is not set. "
            f"Set it before running the application."
        )
    return value


def get_optional_env(name: str, default: str = "") -> str:
    """Get an optional environment variable with a default."""
    return os.environ.get(name, default)


class AppConfig:
    """Application configuration loaded from environment variables.

    Centralizing config access makes it easy to audit which secrets
    your application uses and where they come from.
    """

    def __init__(self) -> None:
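        # Demo only: set sample values so this cell runs on its own. In a real
        # application the deployment environment provides these, never the code.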
        os.environ["APP_SECRET_KEY"] = "demo-secret-key-12345"
        os.environ["APP_DATABASE_URL"] = "postgresql://user:pass@localhost/db"
        os.environ["APP_DEBUG"] = "false"

        self.secret_key: str = get_required_env("APP_SECRET_KEY")
        self.database_url: str = get_required_env("APP_DATABASE_URL")
        self.debug: bool = get_optional_env("APP_DEBUG", "false").lower() == "true"
        self.log_level: str = get_optional_env("APP_LOG_LEVEL", "INFO")

    def __repr__(self) -> str:
        """Repr that masks sensitive values."""
        return (
            f"AppConfig("
            f"secret_key='***', "
            f"database_url='***', "
            f"debug={self.debug}, "
            f"log_level={self.log_level!r})"
        )


# Load configuration
config = AppConfig()
print(f"Config: {config}")
print("  Note: __repr__ masks secrets to prevent accidental logging")

# Missing required variable
print("\nMissing required variable:")
try:
    _ = get_required_env("NONEXISTENT_SECRET")
except RuntimeError as e:
    print(f"  Error: {e}")

# BAD: Hardcoded secrets (NEVER do this)
print("\nAnti-patterns to avoid:")
print('  BAD:  API_KEY = "sk-1234567890abcdef"  # Hardcoded in source')
print('  BAD:  password = "admin123"             # Literal password')
print('  GOOD: API_KEY = os.environ["API_KEY"]   # From environment')
print('  GOOD: password = get_required_env("DB_PASSWORD")  # Validated')

# Cleanup demo variables
for key in ["APP_SECRET_KEY", "APP_DATABASE_URL", "APP_DEBUG"]:
    os.environ.pop(key, None)

## Secure Coding Checklist and Principles

Security is not a feature you add at the end -- it is a practice woven into every
line of code. Below are essential principles to follow.

In [None]:
# Principle 1: Defense in Depth
# Apply multiple layers of security -- never rely on a single check

import re
import html


def process_comment(raw_comment: str) -> str:
    """Process a user comment with multiple security layers.

    Layer 1: Input validation (reject clearly invalid input)
    Layer 2: Sanitization (clean the input)
    Layer 3: Output encoding (escape for the target context)
    """
    # Layer 1: Reject if too long or empty
    if not raw_comment.strip() or len(raw_comment) > 10_000:
        return ""

    # Layer 2: Strip control characters (except newlines)
    cleaned: str = re.sub(r"[\x00-\x09\x0b-\x1f\x7f]", "", raw_comment)
    cleaned = cleaned.strip()

    # Layer 3: HTML-escape for safe rendering
    safe: str = html.escape(cleaned)

    return safe


test_comments: list[str] = [
    "Great article! Thanks for sharing.",
    "<script>document.cookie</script>",
    "Normal text with \x00null\x01bytes",
]

print("Defense in Depth: Processing user comments")
for comment in test_comments:
    result: str = process_comment(comment)
    print(f"  Input:  {comment!r}")
    print(f"  Output: {result!r}\n")

In [None]:
# Principle 2: Least Privilege and Fail Securely

import os
import stat
import tempfile
from pathlib import Path


def create_secure_temp_file(content: str) -> Path:
    """Create a temporary file with restrictive permissions.

    Only the owner can read/write the file (mode 0o600).
    """
    fd, path_str = tempfile.mkstemp(suffix=".secret", prefix="app_")
    path = Path(path_str)
    try:
        # mkstemp() already creates the file with mode 0o600; setting it again
        # before writing makes the intent explicit
        os.chmod(fd, stat.S_IRUSR | stat.S_IWUSR)  # 0o600
        os.write(fd, content.encode("utf-8"))
    finally:
        os.close(fd)
    return path


# Create a secure temp file
secret_file: Path = create_secure_temp_file("secret_api_key=abc123")
file_stat = secret_file.stat()
mode: str = oct(file_stat.st_mode)[-3:]
print(f"Secure temp file: {secret_file}")
print(f"  Permissions: {mode} (owner read/write only)")
print(f"  Contents:    {secret_file.read_text()}")

# Fail securely: never expose internals in error messages
def safe_division(a: float, b: float) -> dict[str, object]:
    """Division that fails securely with a generic error message.

    Internal details are logged, not shown to the user.
    """
    try:
        result: float = a / b
        return {"success": True, "result": result}
    except ZeroDivisionError:
        # Log the full error internally (would use logging in production)
        # logger.error(f"Division by zero: {a}/{b}", exc_info=True)

        # Return a generic message to the user
        return {"success": False, "error": "Invalid operation"}


print("\nFail securely:")
print(f"  10 / 3 = {safe_division(10, 3)}")
print(f"  10 / 0 = {safe_division(10, 0)}")
print("  Note: error message is generic, no internal details exposed")

# Cleanup
secret_file.unlink()

In [None]:
# Principle 3: Sensitive Data Handling
# Reduce accidental exposure of sensitive data in logs and printed output

import secrets


class SensitiveString:
    """A string wrapper that prevents accidental exposure in logs.

    The actual value is only accessible through the .get_secret_value() method.
    __repr__ and __str__ always return a masked version.
    """

    def __init__(self, value: str) -> None:
        self._value: str = value

    def get_secret_value(self) -> str:
        """Explicitly retrieve the secret value."""
        return self._value

    def __repr__(self) -> str:
        return "SensitiveString('***')"

    def __str__(self) -> str:
        return "***"

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SensitiveString):
            # Use constant-time comparison
            import hmac
            return hmac.compare_digest(
                self._value.encode(), other._value.encode()
            )
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self._value)


# Usage
api_key = SensitiveString(secrets.token_hex(16))
db_password = SensitiveString("super-secret-password")

# Safe to log or print
print(f"API Key: {api_key}")          # Prints: ***
print(f"DB Pass: {db_password!r}")    # Prints: SensitiveString('***')

# Must explicitly request the value
print(f"\nActual API Key (explicit): {api_key.get_secret_value()[:4]}...")

# Safe in data structures
config_data: dict[str, object] = {
    "host": "localhost",
    "port": 5432,
    "password": db_password,
}
print(f"\nConfig dict: {config_data}")
print("  Password is masked even in dict repr")

## Summary

### Key Takeaways

| Principle | Tool / Technique | Purpose |
|-----------|------------------|---------|
| **Input validation** | Whitelist patterns, `re.fullmatch()` | Reject invalid input at the boundary |
| **HTML safety** | `html.escape()` | Prevent XSS in rendered HTML |
| **Shell safety** | `shlex.quote()`, `subprocess.run(list)` | Prevent shell injection |
| **SQL safety** | Parameterized queries (`?` placeholders) | Prevent SQL injection |
| **TLS/SSL** | `ssl.create_default_context()` | Secure network connections with cert verification |
| **Secrets management** | `os.environ`, config classes | Keep secrets out of source code |
| **Defense in depth** | Validate + sanitize + encode | Multiple security layers |
| **Fail securely** | Generic error messages | Never expose internals to users |
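
The "fail securely" row can be sketched with the `logging` module: full details go to the log, and the caller gets a generic message plus a reference ID for support. The `handle_request` function, the `quantity` field, and the `error_id` scheme are assumptions made up for this example.

```python
import logging
import uuid

logger = logging.getLogger("app")


def handle_request(payload: dict[str, str]) -> dict[str, object]:
    """Parse a request, logging failures internally and answering generically."""
    try:
        quantity = int(payload["quantity"])
        return {"success": True, "quantity": quantity}
    except (KeyError, ValueError):
        # Short random ID that links the user-facing error to the log entry
        error_id = uuid.uuid4().hex[:8]
        # logger.exception records the traceback; the payload itself is not logged
        logger.exception("Request failed (error_id=%s)", error_id)
        return {"success": False, "error": "Invalid request", "error_id": error_id}


print(handle_request({"quantity": "3"}))
print(handle_request({"quantity": "abc"}))
```

The response never echoes the offending input or the exception text, yet an operator can still find the traceback by searching the logs for the ID.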

### Secure Coding Checklist
- Validate all external input using whitelist patterns
- Use parameterized queries for all database operations
- Encode or escape output for the target context (HTML, shell, etc.)
- Store secrets in environment variables, never in source code
- Use `ssl.create_default_context()` for all TLS connections
- Never disable certificate verification in production
- Mask sensitive data in logs and error messages
- Apply the principle of least privilege to file permissions and access
- Prefer `subprocess.run()` with list arguments over shell strings
- Use constant-time comparison (`hmac.compare_digest()`) for security tokens
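
As a short illustration of the last item, the sketch below compares a supplied token against the expected one with `hmac.compare_digest()` instead of `==`. The `token_matches` helper and the way the token is generated are assumptions for this example.

```python
import hmac
import secrets

# Hypothetical token issued to a client earlier
expected_token: str = secrets.token_urlsafe(32)


def token_matches(supplied: str, expected: str) -> bool:
    """Compare two tokens without leaking where they first differ."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


print(token_matches(expected_token, expected_token))
print(token_matches("guess", expected_token))
```

Encoding to bytes first lets the comparison handle non-ASCII input, which `compare_digest()` rejects when given `str` arguments.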