In [21]:
import math

# Code design matters

## Prime numbers

A prime number is an integer greater than 1 that is not a product of two smaller positive integers, i.e. it can only be divided evenly by itself and one. For example, 2 is a prime number, but 4 is not because $4 = 2 \times 2$.

## The prime numbers up to $n$

A naive way to find the prime numbers up to $n$ is **trial division**, where each number $k$ between 2 and $n$ is checked for primality by testing whether any smaller number divides it (actually you only need to test divisors up to $\sqrt{k}$). This works, but it becomes computationally demanding as $n$ grows.

> We won't implement this here as it's just too painfully slow!

A more efficient algorithm design, at least for $n$ < 1 million, is the **Sieve of Eratosthenes**.  It works as follows:

Consider the primes up to 10. First write out a list of the numbers 2 to 10. Here we'll represent this as a Python list called `candidates`. As we work out which numbers are not prime we will cross (sieve) them out. For simplicity of illustration we will store those in a list called `crossed_out`.

```python
candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = []
```
The first candidate number in the list is 2. The Sieve of Eratosthenes algorithm crosses out every 2nd number in the list after 2 by counting up from 2 in increments of 2: 

```python
candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = [4, 6, 8, 10]
```
The next number in `candidates` after 2 **that has not been crossed out** is 3. We now cross out every 3rd number in the list by counting up from 3 in increments of 3. This visits 6 and 9; 6 has already been crossed out, so only 9 is new.

```python
candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = [4, 6, 8, 10, 9]
```

This is the basic process that is repeated in each iteration of the Sieve of Eratosthenes. What might not be obvious is that we do not need to traverse every item in `candidates`, but instead only iterate up to $\sqrt{n}$: any composite number no larger than $n$ has a prime factor no larger than $\sqrt{n}$, so it has already been crossed out by then. At the end of the algorithm the numbers that have not been crossed out are the prime numbers between 2 and $n$.

```python
primes = [i for i in candidates if i not in crossed_out]
```
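
The walk-through above can be traced in code. This is a minimal sketch for $n = 10$ only, working directly on values rather than list positions; the loop variable names `factor` and `multiple` are illustrative choices and not part of the original description.

```python
import math

n = 10
candidates = list(range(2, n + 1))
crossed_out = []

# only factors up to sqrt(n) need to be considered
for factor in range(2, int(math.sqrt(n)) + 1):
    if factor in crossed_out:
        continue  # a composite factor: a smaller prime already covered its multiples
    # count up from the factor in increments of the factor
    for multiple in range(2 * factor, n + 1, factor):
        if multiple not in crossed_out:
            crossed_out.append(multiple)
    print(factor, crossed_out)

primes = [i for i in candidates if i not in crossed_out]
print(primes)  # [2, 3, 5, 7]
```

The two printed lines show `crossed_out` after sieving with 2 and then with 3, matching the lists written out by hand above.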

In [30]:
def prime_sieve(n):
    '''
    Compute the prime numbers from 2 up to and including n.
    Naive implementation of the Sieve of Eratosthenes.
    
    Parameters:
    ----------
    n: int
        The upper limit
    
    Returns:
    -------
    list
        a list of primes
    '''
    # a list of candidate numbers to sieve; candidates[i] == i
    candidates = [i for i in range(n+1)]
    # a list of numbers eliminated from consideration
    crossed_out = []
    # factors above sqrt(n) cannot cross out anything new
    limit = int(math.sqrt(n)) + 1
    
    for factor in range(2, limit): 
        # skip composite factors: their multiples are already crossed out
        if factor not in crossed_out:
            for i in range(factor+factor, len(candidates), factor):              
                if candidates[i] not in crossed_out:
                    crossed_out.append(candidates[i])
    
    # 0 and 1 are not prime, so start from index 2
    return [i for i in candidates[2:] if i not in crossed_out]

In [35]:
def prime_sieve(n=30):
    # there are no primes below 2
    if n < 2:
        return []
    candidates = [i for i in range(2, n+1)]
    crossed_out = []
    # candidates[0] is 2, so the factors 2..sqrt(n) sit at indices 0..sqrt(n)-2
    limit = int(math.sqrt(n)) - 1
    for current_index in range(limit): 
        step = candidates[current_index]
        if step not in crossed_out:
            for i in range(current_index+step, len(candidates), step):              
                if candidates[i] not in crossed_out:
                    crossed_out.append(candidates[i])
    
    return [i for i in candidates if i not in crossed_out]
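
One way to gain confidence in a sieve is to compare it against an independent check. The sketch below uses trial division for that purpose: it is slow, but acceptable as a reference for small $n$. The helper names `is_prime` and `check_sieve` are hypothetical and introduced here for illustration only.

```python
import math

def is_prime(k):
    """Trial division: test divisors from 2 up to sqrt(k)."""
    if k < 2:
        return False
    for d in range(2, int(math.sqrt(k)) + 1):
        if k % d == 0:
            return False
    return True

def check_sieve(sieve, n):
    """Return True if sieve(n) matches the primes found by trial division."""
    expected = [k for k in range(2, n + 1) if is_prime(k)]
    return sieve(n) == expected
```

Assuming the `prime_sieve` cell above has been run, `check_sieve(prime_sieve, 100)` should evaluate to `True`.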

In [42]:
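i = 2
crossed_off = []
# slices work on positions, not values: candidates[0] is 2, so index i*i = 4
# holds the value 6 and the multiple 4 is missed
crossed_off += candidates[i*i::i]
crossed_off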

[6, 8, 10]

In [48]:
candidates

[2, 3, 4, 5, 6, 7, 8, 9, 10]

In [43]:
i = 3
# index i*i = 9 is past the end of the 9-item list, so nothing is added
crossed_off += candidates[i*i::i]
crossed_off

[6, 8, 10]
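
The slices above index by position, while the sieve reasons about values. Because `candidates` starts at 2, the value `v` sits at index `v - 2`. A minimal sketch of the same slicing idea with that shift applied follows; the name `offset` is an assumption introduced for illustration.

```python
candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
offset = candidates[0]  # the value stored at index 0

crossed_off = []
for i in (2, 3):
    # the value i*i sits at index i*i - offset; stepping by i visits its multiples
    crossed_off += candidates[i * i - offset::i]

print(crossed_off)  # [4, 6, 8, 10, 9]
```

Starting each slice at `i*i` is safe because smaller multiples of `i` have a smaller prime factor and are crossed out earlier. For larger lists this approach would add some values more than once (12 is a multiple of both 2 and 3), so a set may be a better container than a list.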

In [36]:
prime_sieve(10)

[2, 3, 5, 7]

In [1]:
candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = [4, 6, 8, 10, 9]


In [13]:
%timeit set(candidates).symmetric_difference(set(crossed_out))

530 ns ± 4.13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [14]:
%timeit [i for i in candidates if i not in crossed_out]

546 ns ± 4.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
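
With only nine candidates the two approaches are hard to tell apart. The sketch below repeats the comparison on a longer list using the standard `timeit` module, since `%timeit` is only available inside IPython. The value `n = 2000` and the helper names `with_list` and `with_set` are arbitrary choices for illustration. Membership tests on a list scan it element by element, whereas a set lookup takes constant time on average, so the gap would be expected to widen as $n$ grows.

```python
import timeit

n = 2000
candidates = list(range(2, n + 1))
# build the composites directly so the comparison isolates the final filtering step
crossed_out = [k for k in candidates
               if any(k % d == 0 for d in range(2, int(k ** 0.5) + 1))]

def with_list():
    # each membership test scans the crossed_out list
    return [i for i in candidates if i not in crossed_out]

def with_set():
    # convert once, then each membership test is a hash lookup
    crossed = set(crossed_out)
    return [i for i in candidates if i not in crossed]

# both approaches must give the same primes
assert with_list() == with_set()

print("list:", timeit.timeit(with_list, number=3))
print("set: ", timeit.timeit(with_set, number=3))
```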
