# Bubble Sort

Bubble sort is one of the simplest sorting algorithms, often used for educational purposes to introduce the concept of sorting. It works by repeatedly iterating through the list, comparing adjacent elements, and swapping them if they are in the wrong order. This process "bubbles" the largest (or smallest, depending on the order) elements to their correct positions. While straightforward, its inefficiency for large datasets makes it a classic example in complexity analysis.

In this analysis, we'll provide a step-by-step pedagogical explanation with mathematical rigor. We'll cover the algorithm's mechanics, time complexity (best, worst, and average cases), space complexity, and some advanced considerations like exact operation counts and probabilistic models for average-case analysis. All derivations will be explicit and grounded in counting comparisons and swaps.

## Step 1: Understanding the Algorithm

### Pseudocode
Here's the standard implementation of bubble sort for sorting an array $ A $ of $ n $ elements in ascending order:

```pseudocode
procedure BubbleSort(A: array of n elements)
    for i from 1 to n-1 do  // Outer loop for passes
        swapped ← false     // Optimization flag
        for j from 1 to n-i do  // Inner loop for comparisons
            if A[j] > A[j+1] then
                swap A[j] and A[j+1]
                swapped ← true
        if not swapped then
            break  // Early termination if no swaps (array is sorted)
end procedure
```

- **Key Insight**: Each pass (outer loop iteration) guarantees that the largest unsorted element bubbles to the end. The optimization (using the `swapped` flag) allows early termination if the array becomes sorted before all passes.

Without the optimization, the algorithm always performs $ n-1 $ passes, but with it, the best case improves.
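The pseudocode can be sketched in Python as follows. This is a minimal, instrumented sketch rather than a reference implementation: the function name `bubble_sort`, the `early_exit` parameter, and the returned counters are illustrative assumptions added here so the later counts can be checked.

```python
def bubble_sort(items, early_exit=True):
    """Sort a copy of `items` ascending.

    Returns (sorted_list, comparisons, swaps, passes).
    `early_exit` toggles the `swapped` flag optimization (hypothetical parameter).
    """
    a = list(items)
    n = len(a)
    comparisons = swaps = passes = 0
    for i in range(1, n):            # passes 1 .. n-1, as in the pseudocode
        passes += 1
        swapped = False
        for j in range(n - i):       # 0-based: compares a[j] with a[j + 1]
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
                swapped = True
        if early_exit and not swapped:
            break                    # no swaps in this pass: already sorted
    return a, comparisons, swaps, passes


print(bubble_sort([5, 1, 4, 2, 8]))  # ([1, 2, 4, 5, 8], 9, 4, 3)
```

The third pass makes no swaps, so the flag stops the loop after 9 comparisons instead of the full 10.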

## Step 2: Time Complexity Analysis

Time complexity measures the number of basic operations (comparisons and swaps) as a function of input size $ n $. We'll analyze comparisons primarily, as they dominate in asymptotic terms, but we'll also count swaps for precision.

### Best-Case Time Complexity
The best case occurs when the array is already sorted.

- **Step-by-Step Derivation**:
  1. In the first pass ($ i = 1 $), the inner loop runs $ n-1 $ times, performing $ n-1 $ comparisons.
  2. Since the array is sorted, no swaps occur, so `swapped` remains false.
  3. The algorithm breaks after the first pass due to the optimization.
  4. Total comparisons: $ n-1 $.
  5. Swaps: 0.

- **Mathematical Expression**: $ T_{\text{best}}(n) = n - 1 $, which is $ \Theta(n) $ in asymptotic notation (linear time).

Without optimization, it would still run all passes, leading to $ \sum_{i=1}^{n-1} (n - i) = \frac{n(n-1)}{2} $ comparisons, or $ \Theta(n^2) $. The flag makes bubble sort adaptive.
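The two best-case counts can be checked with a short sketch. The helper name `count_comparisons` and its `early_exit` parameter are assumptions made for this illustration.

```python
def count_comparisons(items, early_exit=True):
    """Count comparisons made by bubble sort on a copy of `items`."""
    a = list(items)
    n = len(a)
    comparisons = 0
    for i in range(1, n):
        swapped = False
        for j in range(n - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if early_exit and not swapped:
            break
    return comparisons


n = 10
already_sorted = list(range(n))
with_flag = count_comparisons(already_sorted, early_exit=True)
without_flag = count_comparisons(already_sorted, early_exit=False)
print(with_flag, without_flag)  # 9 45, i.e. n - 1 versus n(n-1)/2
```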

### Worst-Case Time Complexity
The worst case occurs when the array is sorted in reverse order (e.g., descending for ascending sort).

- **Step-by-Step Derivation**:
  1. In each pass $ i $, the inner loop runs $ n - i $ times.
  2. Every comparison results in a swap: in a reverse-sorted array, the element carried forward during a pass is larger than every element it is compared with.
  3. Swaps occur in every pass, so no early termination.
  4. Total comparisons: Sum over all passes = $ \sum_{i=1}^{n-1} (n - i) = \sum_{k=1}^{n-1} k = \frac{(n-1)n}{2} $, where $ k = n - i $.
  5. Total swaps: Same as comparisons in the worst case, since every comparison triggers a swap: $ \frac{n(n-1)}{2} $.

- **Mathematical Expression**: $ T_{\text{worst}}(n) = \frac{n(n-1)}{2} $ for both comparisons and swaps, which is $ \Theta(n^2) $ (quadratic time).

This is the upper bound, and bubble sort is notoriously slow for large $ n $ due to this.
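As a sanity check on the worst-case formula, the sketch below counts operations on reverse-sorted inputs. The helper name `count_operations` is an assumption for this example.

```python
def count_operations(items):
    """Run flagged bubble sort on a copy; return (comparisons, swaps)."""
    a = list(items)
    n = len(a)
    comparisons = swaps = 0
    for i in range(1, n):
        swapped = False
        for j in range(n - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
                swapped = True
        if not swapped:
            break
    return comparisons, swaps


for n in (2, 5, 10, 50):
    reverse_sorted = list(range(n, 0, -1))
    comparisons, swaps = count_operations(reverse_sorted)
    expected = n * (n - 1) // 2
    # Both counts should match n(n-1)/2 on reverse-sorted input.
    print(n, comparisons, swaps, expected)
```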

### Average-Case Time Complexity
Average-case analysis assumes a uniform random distribution of input permutations (all $ n! $ permutations equally likely). We count the expected number of operations.

- **Step-by-Step Derivation for Comparisons and Swaps**:
  1. Without optimization, the number of comparisons is fixed at $ \frac{n(n-1)}{2} $ regardless of input, so the expected number of comparisons is also $ \frac{n(n-1)}{2} $.
  2. With optimization, the count depends on when early termination triggers, which depends on the input. We analyze the unoptimized version first, then discuss the impact of the flag.
  3. For swaps: a swap occurs at every comparison where $ A[j] > A[j+1] $.
  4. In the initial array, a fixed adjacent pair $ (j, j+1) $ is out of order with probability $ \frac{1}{2} $ by symmetry. This does not carry over directly to later comparisons, because earlier swaps have already rearranged the array, so counting comparison by comparison is awkward.
  5. A cleaner route is to count inversions, where an inversion is a pair $ (p, q) $ with $ p < q $ but $ A[p] > A[q] $.
  6. Swapping two adjacent out-of-order elements removes exactly one inversion and leaves the relative order of every other pair unchanged. A sorted array has zero inversions, so the total number of swaps equals the number of inversions in the input.
  7. Expected number of inversions in a random permutation: for each pair $ (p, q) $ with $ p < q $, Prob(inversion) = $ \frac{1}{2} $, and there are $ \binom{n}{2} = \frac{n(n-1)}{2} $ pairs. Linearity of expectation lets us add these probabilities.
  8. Thus, expected inversions (and swaps) = $ \frac{1}{2} \cdot \frac{n(n-1)}{2} = \frac{n(n-1)}{4} $.

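- **Impact of Optimization**: Early termination does not change the asymptotic average. An element can move toward the front by at most one position per pass, so if the smallest element starts at position $ p $, at least $ p - 1 $ passes contain a swap. In a random permutation the smallest element starts in the second half of the array with probability about $ \frac{1}{2} $, which forces roughly $ \frac{n}{2} $ passes of at least $ \frac{n}{2} $ comparisons each. The expected number of comparisons is therefore at least about $ \frac{n^2}{8} $ and at most $ \frac{n(n-1)}{2} $, so it remains $ \Theta(n^2) $.

- **Mathematical Expression**: Without the flag, $ T_{\text{avg}}(n) $ consists of exactly $ \frac{n(n-1)}{2} $ comparisons plus an expected $ \frac{n(n-1)}{4} $ swaps. With the flag, the expected number of comparisons is somewhat smaller but still quadratic. Either way, $ T_{\text{avg}}(n) = \Theta(n^2) $.

A finer probabilistic model tracks the position of the last swap in each pass. It tightens the constants but still yields a quadratic average.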
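For small $ n $, the average-case claims can be checked exactly by enumerating all $ n! $ permutations. The sketch below is illustrative only: `bubble_counts`, `inversions`, and `average_counts` are hypothetical helper names, and the enumeration is feasible only for small $ n $.

```python
from fractions import Fraction
from itertools import permutations


def bubble_counts(items):
    """Run flagged bubble sort on a copy; return (comparisons, swaps)."""
    a = list(items)
    n = len(a)
    comparisons = swaps = 0
    for i in range(1, n):
        swapped = False
        for j in range(n - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
                swapped = True
        if not swapped:
            break
    return comparisons, swaps


def inversions(items):
    """Count pairs (p, q) with p < q and items[p] > items[q]."""
    return sum(
        1
        for p in range(len(items))
        for q in range(p + 1, len(items))
        if items[p] > items[q]
    )


def average_counts(n):
    """Exact (avg comparisons, avg swaps) over all permutations of 0..n-1."""
    total_comparisons = total_swaps = count = 0
    for perm in permutations(range(n)):
        comparisons, swaps = bubble_counts(perm)
        assert swaps == inversions(perm)  # swaps equal inversions
        total_comparisons += comparisons
        total_swaps += swaps
        count += 1
    return Fraction(total_comparisons, count), Fraction(total_swaps, count)


for n in (3, 4, 5, 6):
    avg_comparisons, avg_swaps = average_counts(n)
    # Last column is the predicted n(n-1)/4 expected swaps.
    print(n, avg_comparisons, avg_swaps, Fraction(n * (n - 1), 4))
```

The average swap count should match $ \frac{n(n-1)}{4} $ exactly, while the average comparison count with the flag stays below $ \frac{n(n-1)}{2} $.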

## Step 3: Space Complexity

- **Step-by-Step Analysis**:
  1. Bubble sort is an in-place algorithm: It only requires a constant amount of extra space for temporary variables (e.g., for swapping).
  2. No additional data structures like stacks or auxiliary arrays are used.
  3. Variables: Indices $ i, j $, boolean `swapped`, and a temp variable for swap (if not using XOR or arithmetic swap).
  4. Total extra space: $ O(1) $, independent of $ n $.

- **Mathematical Expression**: $ S(n) = O(1) $.

## Step 4: Advanced Considerations

### Exact Operation Counts
- Let $ C(n) $ be the number of comparisons and $ W(n) $ the number of swaps (a separate symbol, since $ S(n) $ already denotes space).
- Worst: $ C(n) = W(n) = \frac{n(n-1)}{2} $.
- Best: $ C(n) = n-1 $, $ W(n) = 0 $.
- Average: $ C(n) = \frac{n(n-1)}{2} $ (unoptimized), $ W(n) = \frac{n(n-1)}{4} $ (expected).

### Why Quadratic? A Deeper Look
Bubble sort's inefficiency stems from its $ O(n) $ passes, each doing $ O(n) $ work. In contrast, efficient sorts like merge sort divide the problem (recursion depth $ O(\log n) $, work per level $ O(n) $), yielding $ O(n \log n) $.

### Limitations and Optimizations
- **Cocktail Shaker Sort**: A bidirectional variant, still $ O(n^2) $ worst-case.
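- **Stability**: The standard implementation is stable (it preserves the order of equal elements) because it swaps only when $ A[j] > A[j+1] $ strictly. Changing the test to $ \geq $ would break stability.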
- **Practical Use**: Only for small $ n $ or nearly sorted data.
- **Lower Bound**: Sorting requires $ \Omega(n \log n) $ comparisons in the worst case (information-theoretic bound), so bubble sort is suboptimal.
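The stability point in the list above can be illustrated with keyed records. This is a sketch under assumptions: `bubble_sort_by_key` and its `strict` parameter are hypothetical names, and records are `(key, label)` tuples compared by key only.

```python
def bubble_sort_by_key(records, strict=True):
    """Bubble sort (key, label) tuples by key only.

    strict=True swaps on `>` (stable); strict=False swaps on `>=` (not stable).
    """
    a = list(records)
    n = len(a)
    for i in range(1, n):
        swapped = False
        for j in range(n - i):
            left, right = a[j][0], a[j + 1][0]
            out_of_order = left > right if strict else left >= right
            if out_of_order:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break
    return a


records = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
stable = bubble_sort_by_key(records)
unstable = bubble_sort_by_key(records, strict=False)
print(stable)    # [(1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]
print(unstable)  # [(1, 'd'), (1, 'b'), (2, 'c'), (2, 'a')]
```

Both outputs are sorted by key, but only the strict comparison keeps equal keys in their original relative order.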

This analysis highlights why bubble sort, despite its simplicity, is rarely used in practice for large datasets. For further rigor, simulations via code can verify these counts empirically.

The claim that sorting requires $\Omega(n \log n)$ comparisons in the worst case, which makes bubble sort suboptimal, follows from the **information-theoretic lower bound** for comparison-based sorting algorithms. The rest of this section explains, step by step, what the bound says, why it applies to comparison-based sorting, and why bubble sort’s $\Theta(n^2)$ complexity falls short of it.

---

## Why Sorting Requires $\Omega(n \log n)$ Comparisons

### Step 1: Understanding Comparison-Based Sorting
Bubble sort, like many sorting algorithms (e.g., quicksort, mergesort, insertion sort), is a **comparison-based sorting algorithm**. This means it sorts by repeatedly comparing pairs of elements and using the results (e.g., $a < b$, $a > b$, or $a = b$) to determine their relative order. The algorithm makes no assumptions about the values beyond their comparability (i.e., it doesn’t use specific properties like numerical ranges or digit distributions).

The key question is: **What is the minimum number of comparisons needed to sort $n$ elements in the worst case?**

### Step 2: Modeling the Sorting Problem
- When sorting $n$ distinct elements, the goal is to determine their correct permutation (order) from all possible permutations.
- There are $n!$ possible permutations of $n$ distinct elements, as each element can occupy any of the $n$ positions, then $n-1$, and so on.
- Each comparison (e.g., $a_i < a_j$) provides at most **1 bit of information** because it has two outcomes: true ($a_i < a_j$) or false ($a_i \geq a_j$).
- To uniquely identify the correct permutation, the algorithm must gather enough information to distinguish one of the $n!$ permutations from all others.

### Step 3: Information-Theoretic Lower Bound
The number of comparisons required is related to the amount of information needed to specify one permutation out of $n!$. This is where information theory comes in.

- **Information Required**: After $k$ comparisons, each with two outcomes, the algorithm can have observed at most $2^k$ distinct outcome sequences. To tell all $n!$ permutations apart it needs $2^k \geq n!$, that is, $k \geq \log_2(n!)$ bits of information. Equivalently, $\log_2(n!)$ is the entropy of a uniformly random permutation.
- **Approximating $\log_2(n!)$**: Using **Stirling’s approximation** for factorials:
  $$
  n! \approx \sqrt{2\pi n} \cdot \left(\frac{n}{e}\right)^n
  $$
  Taking the base-2 logarithm:
  $$
  \log_2(n!) \approx \log_2\left(\sqrt{2\pi n} \cdot \left(\frac{n}{e}\right)^n\right)
  $$
  $$
  = \log_2(\sqrt{2\pi n}) + \log_2\left(\left(\frac{n}{e}\right)^n\right)
  $$
  $$
  = \frac{1}{2}\log_2(2\pi n) + n \log_2\left(\frac{n}{e}\right)
  $$
  $$
  = \frac{1}{2}(\log_2(2\pi) + \log_2 n) + n (\log_2 n - \log_2 e)
  $$
  $$
  \approx n \log_2 n - n \log_2 e + \frac{1}{2}\log_2 n + \text{constant terms}
  $$
  The dominant term is $n \log_2 n$, so:
  $$
  \log_2(n!) \in \Theta(n \log n)
  $$

- **Implication**: Any comparison-based sorting algorithm must perform at least $\lceil \log_2(n!) \rceil$ comparisons in the worst case to gather enough information to identify the correct permutation, and this quantity is $\Theta(n \log n)$.
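A quick numerical check of the approximation is sketched below. It assumes floating-point accuracy is sufficient: `math.lgamma(n + 1)` gives $\ln(n!)$, and the helper names `log2_factorial` and `log2_factorial_stirling` are introduced only for this example.

```python
import math


def log2_factorial(n):
    """log2(n!) computed from the log-gamma function (floating point)."""
    return math.lgamma(n + 1) / math.log(2)


def log2_factorial_stirling(n):
    """Stirling's approximation: 0.5*log2(2*pi*n) + n*log2(n/e)."""
    return 0.5 * math.log2(2 * math.pi * n) + n * math.log2(n / math.e)


for n in (10, 100, 1000):
    reference = log2_factorial(n)
    approx = log2_factorial_stirling(n)
    ratio = reference / (n * math.log2(n))  # tends to 1 slowly as n grows
    print(n, round(reference, 2), round(approx, 2), round(ratio, 3))
```

The two columns should agree closely even for small $n$, while the ratio to $n \log_2 n$ approaches 1 only slowly because of the $-n \log_2 e$ term.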

### Step 4: Decision Tree Model
Another way to understand this is through the **decision tree model** for comparison-based sorting:
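- Each comparison has two outcomes, so the possible executions of the algorithm form a binary decision tree.
- Each of the $n!$ possible input orderings must reach its own leaf. If two different orderings reached the same leaf, the algorithm would rearrange both in the same way and sort at least one of them incorrectly.
- A binary tree of height $h$ has at most $2^h$ leaves, so $2^h \geq n!$ and the tree’s height (the number of comparisons in the worst case) must be at least $\lceil \log_2(n!) \rceil$.
- Thus, the worst-case number of comparisons is $\Omega(n \log n)$.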
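The height bound can be computed exactly with integer arithmetic. In this sketch the function name `decision_tree_height_bound` is an assumption; it relies on the fact that for a positive integer $m$, `(m - 1).bit_length()` is the smallest $h$ with $2^h \geq m$.

```python
import math


def decision_tree_height_bound(n):
    """Smallest h with 2**h >= n!, i.e. ceil(log2(n!)), using exact integers."""
    return (math.factorial(n) - 1).bit_length()


for n in (3, 4, 5, 10):
    lower_bound = decision_tree_height_bound(n)
    bubble_worst = n * (n - 1) // 2
    print(n, lower_bound, bubble_worst)
# 3 3 3
# 4 5 6
# 5 7 10
# 10 22 45
```

This is only a lower bound on worst-case comparisons. It does not by itself show that an algorithm achieving exactly this count exists for every $n$.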

### Step 5: Why Bubble Sort is Suboptimal
Bubble sort’s worst-case time complexity is $\Theta(n^2)$, as derived previously:
- It performs $\frac{n(n-1)}{2} = \Theta(n^2)$ comparisons and swaps in the worst case (e.g., reverse-sorted array).
- This is because bubble sort resolves inversions one at a time, with up to $n-1$ passes, each performing up to $n-1$ comparisons.

Compare this to the information-theoretic lower bound:
- $n^2$ grows much faster than $n \log n$. For example, with $n = 100$:
  - The lower bound is $\lceil \log_2(100!) \rceil = 525$ comparisons (for reference, $n \log_2 n \approx 664$), while bubble sort’s worst case is $\frac{100 \cdot 99}{2} = 4950$ comparisons.
  - Bubble sort performs far more comparisons than the minimum required.

Bubble sort is **suboptimal** because each comparison of adjacent elements rules out only a small part of the remaining possibilities. **Mergesort** achieves $\Theta(n \log n)$ in the worst case, and **quicksort** achieves it on average (its worst case is $\Theta(n^2)$). Both divide the problem, via recursion or partitioning, so that far fewer comparisons are needed overall.
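To make the gap concrete, the sketch below counts comparisons in a simple top-down merge sort on random inputs and sets them beside the lower bound and bubble sort's worst case for $n = 100$. The `merge_sort` function, the fixed seed, and the sample size are illustrative assumptions. `worst_seen` is the largest count over the sampled inputs, not the true worst case.

```python
import math
import random


def merge_sort(items):
    """Top-down merge sort; returns (sorted_list, comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = merge_sort(items[:mid])
    right, right_count = merge_sort(items[mid:])
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:      # `<=` keeps the merge stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


n = 100
rng = random.Random(0)               # fixed seed for repeatability
worst_seen = 0
for _ in range(200):
    data = list(range(n))
    rng.shuffle(data)
    result, comparisons = merge_sort(data)
    assert result == sorted(data)
    worst_seen = max(worst_seen, comparisons)

lower_bound = (math.factorial(n) - 1).bit_length()  # ceil(log2(n!))
bubble_worst = n * (n - 1) // 2
print(lower_bound, worst_seen, bubble_worst)
```

An individual input may need fewer comparisons than the bound, which constrains only the worst case over all inputs.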

### Step 6: Why Can’t Bubble Sort Do Better?
- Bubble sort’s design is inherently local: It only compares and swaps adjacent elements, so an element can move toward the front of the array by at most one position per pass.
- To sort a reverse-ordered array, it needs $n-1$ passes to move the smallest element from the end to the beginning, each pass requiring $O(n)$ comparisons.
- This contrasts with divide-and-conquer algorithms, which make decisions that eliminate larger portions of the permutation space per comparison.
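The one-position-per-pass behaviour can be observed by tracking where the smallest element sits after each pass. This is a sketch: the helper name `position_of_min_after_each_pass` is an assumption, and it expects a non-empty input with a unique minimum.

```python
def position_of_min_after_each_pass(items):
    """Return the 0-based index of the smallest element initially and after each pass."""
    a = list(items)
    n = len(a)
    smallest = min(a)
    positions = [a.index(smallest)]
    for i in range(1, n):
        swapped = False
        for j in range(n - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:
            break                    # nothing moved in this pass
        positions.append(a.index(smallest))
    return positions


print(position_of_min_after_each_pass([5, 4, 3, 2, 1]))  # [4, 3, 2, 1, 0]
```

On the reverse-sorted input the smallest element advances exactly one slot per pass, so all $n-1$ passes are needed.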

### Step 7: Exceptions to the Bound
The $\Omega(n \log n)$ bound applies specifically to **comparison-based sorting algorithms**. Non-comparison-based algorithms (e.g., radix sort, bucket sort) can achieve better complexity (e.g., $O(n)$) under specific conditions (e.g., bounded integer keys). Bubble sort, however, is comparison-based, so the bound applies to it, and its $\Theta(n^2)$ worst case lies far above that bound.
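As an illustration of the bounded-integer-key case, here is a minimal least-significant-digit radix sort. It is a sketch under stated assumptions: keys are non-negative integers below `2**key_bits`, and the names `radix_sort`, `key_bits`, and `digit_bits` are chosen for this example only.

```python
def radix_sort(values, key_bits=32, digit_bits=8):
    """LSD radix sort for integers in [0, 2**key_bits).

    Elements are never compared with each other; each pass distributes
    them into buckets by one `digit_bits`-wide digit.
    """
    a = list(values)
    if any(v < 0 or v >> key_bits for v in a):
        raise ValueError("keys must lie in [0, 2**key_bits)")
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        buckets = [[] for _ in range(1 << digit_bits)]
        for v in a:
            buckets[(v >> shift) & mask].append(v)   # stable within a bucket
        a = [v for bucket in buckets for v in bucket]
    return a


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

With a fixed key width the number of passes is constant, so the running time is linear in $n$. This is possible only because the algorithm inspects the keys' bits instead of comparing elements.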

---

## Summary
- The $\Omega(n \log n)$ lower bound arises because sorting $n$ elements requires distinguishing one of $n!$ permutations, which takes at least $\lceil \log_2(n!) \rceil = \Theta(n \log n)$ comparisons in the worst case.
- Bubble sort’s $\Theta(n^2)$ worst-case complexity is much higher than this bound due to its inefficient, pass-based approach.
- This makes bubble sort suboptimal for general-purpose sorting, especially for large $n$, compared to algorithms like mergesort or heapsort, which achieve the optimal $\Theta(n \log n)$.

This analysis underscores the importance of algorithm design in meeting theoretical limits, and why bubble sort, while pedagogically useful, is impractical for large datasets.