# Bubble Sort

Bubble sort repeatedly scans the list to be sorted, compares each pair of adjacent items, and swaps them if they are in the wrong order.

The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted.

**Time Complexity**

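- Best: $O(n)$. This happens when the array is already sorted: no swaps occur, but we still need to scan the array once to confirm it. This bound relies on the early-exit check (the `swapped` flag in `bubble_sort`); `bubble_sort_optimized` always performs $n-1$ passes, so it stays at $O(n^{2})$ even for sorted input.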

- Worst: $O(n^{2})$

- Average: $O(n^{2})$
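
The best and worst cases can be checked empirically by counting comparisons rather than measuring time. The sketch below is only illustrative: `count_comparisons` is a hypothetical helper (not part of this notebook) that uses the early-exit strategy described above and returns how many comparisons were made.

```python
def count_comparisons(items):
    """Bubble sort a copy of `items` and return the number of comparisons made."""
    arr = list(items)
    comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(arr) - 1):
            comparisons += 1
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True
    return comparisons

n = 10
print(count_comparisons(range(n)))         # already sorted: n - 1 = 9
print(count_comparisons(range(n, 0, -1)))  # reversed: n * (n - 1) = 90
```

For sorted input a single pass of $n-1$ comparisons is enough. For reversed input this variant needs $n$ full passes (the last one only confirms that nothing moved), which gives the quadratic growth.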

**Space Complexity**

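$O(1)$ auxiliary space. Bubble sort is an in-place sorting algorithm: it only needs a temporary variable to swap two elements. Note that the implementations below first copy the input so that the original list is left untouched, which costs $O(n)$ extra space.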
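
A minimal sketch of a truly in-place variant, assuming the caller is happy to have the input list mutated. The name `bubble_sort_in_place` is hypothetical; like `list.sort()`, it returns `None`.

```python
def bubble_sort_in_place(arr):
    """Sort `arr` in place using O(1) extra space. Returns None."""
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(arr) - 1):
            if arr[i] > arr[i + 1]:
                # tuple assignment swaps the two items without a named temp variable
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True

data = [3, 1, 2]
bubble_sort_in_place(data)
print(data)  # [1, 2, 3]
```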

In [171]:
from copy import copy

def bubble_sort(orig_list, debug=False):
    arr = copy(orig_list)
    passage_num = 1
    swapped = True
    # keep scanning until an entire passage makes no swaps (i.e. the array is sorted)
    while swapped:
        swapped = False
        for i in range(len(arr)-1):
            if arr[i] > arr[i+1]:
                arr[i], arr[i+1] = arr[i+1], arr[i]
                swapped = True
        if debug:
            print(f'After passage {passage_num}')
            print(arr)
        passage_num += 1
    
    return arr

In [2]:
from copy import copy

def bubble_sort_optimized(orig_list, debug=False):
    arr = copy(orig_list)
    passage_num = 1
    # outer loop sets where a passage stops: after each passage the largest
    # remaining element is in its final position, so the next one can stop earlier
    for n in range(len(arr)-1, 0, -1):
        # inner loop for scanning (one passage)
        for i in range(n):
            if arr[i] > arr[i+1]:
                arr[i], arr[i+1] = arr[i+1], arr[i]
        if debug:
            print(f'After passage {passage_num} (last comparison at index {n})')
            print(arr)
        passage_num += 1
    
    return arr
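
The two ideas above (stop when a passage makes no swaps, and shrink the scanned range after each passage) can be combined. The following is a sketch under that assumption; `bubble_sort_early_exit` is a hypothetical name, not a function defined elsewhere in this notebook.

```python
def bubble_sort_early_exit(orig_list):
    arr = list(orig_list)  # work on a copy so the input is left untouched
    # n is the index of the last element compared in this passage
    for n in range(len(arr) - 1, 0, -1):
        swapped = False
        for i in range(n):
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True
        if not swapped:
            break  # a passage without swaps means the list is sorted
    return arr

print(bubble_sort_early_exit([5, 1, 3, 10, 4, 8, 7, 6, 2, 9]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

This keeps the $O(n)$ best case of `bubble_sort` while doing fewer comparisons per passage, as in `bubble_sort_optimized`.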

### Test it with some data

In [3]:
original_list = [5, 1, 3, 10, 4, 8, 7, 6, 2, 9]

In [4]:
print(f'Original: {original_list}\n')
sorted_list = bubble_sort(original_list, debug=True)

Original: [5, 1, 3, 10, 4, 8, 7, 6, 2, 9]

After passage 1
[1, 3, 5, 4, 8, 7, 6, 2, 9, 10]
After passage 2
[1, 3, 4, 5, 7, 6, 2, 8, 9, 10]
After passage 3
[1, 3, 4, 5, 6, 2, 7, 8, 9, 10]
After passage 4
[1, 3, 4, 5, 2, 6, 7, 8, 9, 10]
After passage 5
[1, 3, 4, 2, 5, 6, 7, 8, 9, 10]
After passage 6
[1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
After passage 7
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
After passage 8
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [5]:
print(f'Original: {original_list}\n')
sorted_list = bubble_sort_optimized(original_list, debug=True)

Original: [5, 1, 3, 10, 4, 8, 7, 6, 2, 9]

After passage 1 (last comparison at index 9)
[1, 3, 5, 4, 8, 7, 6, 2, 9, 10]
After passage 2 (last comparison at index 8)
[1, 3, 4, 5, 7, 6, 2, 8, 9, 10]
After passage 3 (last comparison at index 7)
[1, 3, 4, 5, 6, 2, 7, 8, 9, 10]
After passage 4 (last comparison at index 6)
[1, 3, 4, 5, 2, 6, 7, 8, 9, 10]
After passage 5 (last comparison at index 5)
[1, 3, 4, 2, 5, 6, 7, 8, 9, 10]
After passage 6 (last comparison at index 4)
[1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
After passage 7 (last comparison at index 3)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
After passage 8 (last comparison at index 2)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
After passage 9 (last comparison at index 1)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [6]:
%%timeit
bubble_sort(original_list)

15.4 µs ± 497 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [7]:
%%timeit
bubble_sort_optimized(original_list)

12.2 µs ± 188 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


### Speed things up with Cython

In [5]:
%load_ext Cython

Define a Cython function.

To make it callable from other cells, we must declare it with `cpdef` (or a plain `def`). A function declared with `cdef` can only be called from C-level code inside the same Cython module, so [it will not be visible to other cells](https://stackoverflow.com/questions/45792727/why-does-jupyter-notebook-forget-cython-from-one-cell-to-the-next).

In [119]:
%%cython --annotate

from copy import copy

cpdef bubble_sort_cython(orig_list, debug=False):
    arr = copy(orig_list)
    cdef int passage_num = 1
    # outer loop to set where to stop scanning in a single passage
    for n in range(len(arr)-1, 0, -1):
        # inner loop for scanning (one passage)
        for i in range(n):
            if arr[i] > arr[i+1]:
                arr[i], arr[i+1] = arr[i+1], arr[i]
        if debug:
            print(f'After passage {passage_num} (last comparison at index {n})')
            print(arr)
        passage_num += 1
    
    return arr

In [47]:
print(f'Original: {original_list}\n')
print(f'Sorted:   {bubble_sort_cython(original_list)}')

Original: [5, 1, 3, 10, 4, 8, 7, 6, 2, 9]

Sorted:   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [120]:
%%timeit
bubble_sort_cython(original_list)

6.27 µs ± 54 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Declare static types (see the [Cython documentation on types](http://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#types)).

In [121]:
%%cython --annotate
import numpy as np

cpdef bubble_sort_cython2(double[:] orig_arr, bint debug=False):
    cdef double[:] arr
    cdef double temp
    cdef int n, i, passage_num
    
    arr = np.copy(orig_arr)
    
    passage_num = 1
    for n in range(len(arr)-1, 0, -1):
        for i in range(n):
            if arr[i] > arr[i+1]:
                # swap within the working copy (read from arr, not orig_arr)
                temp = arr[i]
                arr[i] = arr[i+1]
                arr[i+1] = temp

        if debug:
            print(f'After passage {passage_num} (last comparison at index {n})')
            # a typed memoryview has no readable repr, so view it as an ndarray
            print(np.asarray(arr))
        passage_num += 1
    
    return arr

In [116]:
original_array = np.array(original_list, dtype=np.double)

In [122]:
%%timeit
bubble_sort_cython2(original_array)

7.01 µs ± 307 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


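*Why is it slower?* A likely explanation: with only 10 elements, the fixed per-call costs (acquiring the memoryview buffer, `np.copy`, returning the result) dominate the few dozen loop iterations, so the typed loops have little room to pay off. The benefit of static typing should become visible on larger arrays.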

### Estimate the time complexity class of a function from its execution time

In [123]:
import big_o

In [152]:
# data generator for big_o: a list of n random integers between 10 and 100
def positive_int_generator(n):
    return big_o.datagen.integers(n, min_=10, max_=100)

In [169]:
best, others = big_o.big_o(bubble_sort,
                           positive_int_generator,
                           min_n=10, max_n=1000, n_repeats=10)



In [170]:
print(best)

Quadratic: time = 0.0079 + 1.5E-06*n^2
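
To illustrate the idea behind this kind of estimate, here is a simplified sketch that fits two candidate models, `time = a + b*n` and `time = a + b*n^2`, by least squares and keeps the one with the smaller residual. It is not `big_o`'s actual implementation, and `fit_line`, `best_model` and `measure` are hypothetical helper names introduced only for this example.

```python
import random
import time

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. Returns (a, b, residual sum of squares)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss

def best_model(ns, times):
    """Return 'linear' or 'quadratic', whichever model leaves the smaller residual."""
    features = {
        'linear': [float(n) for n in ns],
        'quadratic': [float(n) ** 2 for n in ns],
    }
    return min(features, key=lambda name: fit_line(features[name], times)[2])

def measure(func, ns, repeats=3):
    """Time func on random integer lists of each size in ns (best of `repeats` runs)."""
    times = []
    for n in ns:
        data = [random.randint(10, 100) for _ in range(n)]
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            runs.append(time.perf_counter() - start)
        times.append(min(runs))
    return times

# Sanity check on synthetic, noise-free data: time = 3 + 2*n^2
ns = [10, 20, 40, 80, 160]
print(best_model(ns, [3 + 2 * n ** 2 for n in ns]))  # quadratic
```

With real measurements, `best_model(ns, measure(bubble_sort, ns))` would be expected to report `quadratic` for random input, although timing noise can blur the result for small `n`.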
