This problem came by way of a DM consultation. I'm sharing it with everyone because it's an excellent problem for illustrating several key ideas. We're also going to show why you may want to use numpy arrays and vectorization.

#### Problem
Given a list of numbers ``[-1, 2, 3, -4, 5, 8]``, find the sum of all the positive numbers.

In [1]:
raw_numbers = [-1, 2, 3, -4, 5, 8]

One approach would be to go through the list, check entry by entry whether it's greater than 0, and if so add it to a variable holding the running sum. We'll call the variable for the running sum ``total`` and initialize its value to zero.

To go through each entry, we use ``for`` entry ``in`` list. For each entry, check ``if`` the entry is greater than 0. I'll write the solution inside a function so we can easily call it later.

In [2]:
def list_sum(raw_numbers):
    total = 0
    for number in raw_numbers:
        if number > 0:
            total += number   # shorthand for: total = total + number
    return total

In [3]:
list_sum(raw_numbers)

18
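
As an aside, the same loop-and-check logic can be written more compactly with Python's built-in ``sum`` and a generator expression. This is just a sketch of an equivalent pure-Python version; the function name ``list_sum_builtin`` is made up for illustration and isn't part of the original solution.

```python
raw_numbers = [-1, 2, 3, -4, 5, 8]

def list_sum_builtin(numbers):
    # Hypothetical helper: keep only the positive entries and add them up.
    # sum() of an empty sequence is 0, so an all-negative list returns 0.
    return sum(number for number in numbers if number > 0)

print(list_sum_builtin(raw_numbers))  # 18
```

It still visits every entry one at a time in Python, so it's a tidier loop rather than a different approach.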

Now, let's look at how we can do this using a ``numpy.array`` and avoid using a ``for`` loop. This technique is called vectorization (it's worth Googling the term). Let's import ``numpy`` first.

In [4]:
import numpy as np

In [5]:
def array_sum(raw_numbers):
    numbers = np.array(raw_numbers)
    total = np.sum((numbers > 0) * numbers)
    return total

In [6]:
array_sum(raw_numbers)

18

Let's go through each step:
- ``numbers = np.array(raw_numbers)`` converts the list of integers into a numpy array.
- ``(numbers > 0)`` checks whether each entry is greater than zero. The output is an array of Booleans, i.e. ``True`` or ``False`` for each entry.
- Array multiplication multiplies aligned entries. When you multiply an integer by ``True``, that's equivalent to multiplying it by 1, and when you multiply by ``False``, that's equivalent to multiplying by zero. Try it: ``2 * True`` vs. ``2 * False``.
- ``np.sum(array)`` sums all the elements in the array. For arrays with more than one dimension, you can specify that you're summing along a certain ``axis``. Run ``np.sum?`` in a notebook cell to see the documentation.
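
To make the steps above concrete, here's a small sketch that prints each intermediate result. The variable names (``mask``, ``masked``, ``grid``) are mine, chosen for illustration. It also shows Boolean indexing, which is a different but common way to get the same sum, and a tiny 2D example of the ``axis`` argument.

```python
import numpy as np

numbers = np.array([-1, 2, 3, -4, 5, 8])

# Step 1: elementwise comparison gives a Boolean array
mask = numbers > 0
print(mask)            # [False  True  True False  True  True]

# Step 2: multiplying by the mask zeroes out the non-positive entries
masked = mask * numbers
print(masked)          # [0 2 3 0 5 8]

# Step 3: sum what's left
print(np.sum(masked))  # 18

# Alternative (not what array_sum does): Boolean indexing keeps only
# the entries where the mask is True, then sums those
print(numbers[mask].sum())  # 18

# The axis argument on a small 2D array
grid = np.array([[1, 2],
                 [3, 4]])
print(np.sum(grid, axis=0))  # column sums: [4 6]
print(np.sum(grid, axis=1))  # row sums:    [3 7]
```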

The advantage of vectorization over loops becomes clear as your array size increases. Let's make a large array of random integers (``np.random.randint`` returns a numpy array, not a list).

In [7]:
n = 10000000
low_int = -30
high_int = 30
n_numbers = np.random.randint(low=low_int, high=high_int, size=n)
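
A quick sanity check on what ``np.random.randint`` gives back can be useful here. The sketch below uses a much smaller ``size`` and a fixed seed, both of which are my own choices for illustration. Note that ``high`` is exclusive, so with ``low=-30`` and ``high=30`` the values fall between -30 and 29.

```python
import numpy as np

np.random.seed(0)  # assumption: seeded only so the sketch is repeatable
sample = np.random.randint(low=-30, high=30, size=1000)

print(type(sample))   # <class 'numpy.ndarray'>
print(sample.shape)   # (1000,)

# low is inclusive, high is exclusive
print(sample.min() >= -30, sample.max() <= 29)  # True True
```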

In [8]:
%%timeit 
list_sum(n_numbers)

1.75 s ± 6.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [9]:
%%timeit 
array_sum(n_numbers)

48 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
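
The two timings above work out to roughly a 36x speedup (1.75 s versus 48 ms). If you're not in a notebook, ``%%timeit`` isn't available, but the standard-library ``timeit`` module can do a similar comparison. Here's a rough sketch; the smaller array size and the repeat count of 3 are assumptions I picked to keep it quick, so the exact numbers will differ from the ones above and from machine to machine.

```python
import timeit
import numpy as np

def list_sum(raw_numbers):
    total = 0
    for number in raw_numbers:
        if number > 0:
            total += number
    return total

def array_sum(raw_numbers):
    numbers = np.array(raw_numbers)
    return np.sum((numbers > 0) * numbers)

# Smaller than the 10,000,000 used above, so this finishes quickly
n_numbers = np.random.randint(low=-30, high=30, size=100_000)

repeats = 3
loop_time = timeit.timeit(lambda: list_sum(n_numbers), number=repeats) / repeats
array_time = timeit.timeit(lambda: array_sum(n_numbers), number=repeats) / repeats

print(f"loop:       {loop_time:.4f} s per call")
print(f"vectorized: {array_time:.4f} s per call")
print(f"speedup:    {loop_time / array_time:.1f}x")
```

Both functions return the same total for the same input, so the comparison is like for like.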
