# Computation on Arrays

In [1]:
import numpy as np

The beauty of doing computation on NumPy arrays is speed: built-in vectorized operations are much faster than iterating over the array and operating on one element at a time (e.g. with a for loop). A vectorized operation is written once, on the whole array, and NumPy applies it to every element. This pushes the loop into the compiled layer that underlies NumPy, which is where the speedup comes from.

## For Loop vs. Vectorized Operation

Suppose we wanted to add 1 to each element in an array. The straightforward way to do it is to use a for loop.

In [2]:
some_array = np.arange(1, 1000)

In [3]:
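def add_one(some_array):
    # Visit every index and add 1 to that element.
    # Note: this updates some_array in place and returns the same array.
    for i in np.arange(0, len(some_array)):
        some_array[i] = some_array[i] + 1
    return some_array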

In [4]:
# For Loop
%timeit add_one(some_array)

695 µs ± 26.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


If we use the vectorization built into NumPy arrays instead, we get a much faster compute time.

In [5]:
some_array = np.arange(1, 1000)

In [6]:
# Vectorized Operation
%timeit some_array + 1

1.26 µs ± 30.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


```{note}
$1.26 \mu s$ is a lot smaller than $695 \mu s$! Both methods add 1 to every element, so it is preferable to avoid for loops whenever a vectorized operation on the NumPy array is available.
```
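As a quick sanity check, the sketch below confirms that the loop and the vectorized expression produce the same values. The helper `add_one_loop` is a hypothetical variant of `add_one` that works on a copy: `add_one` as written above modifies its argument in place, so calling it repeatedly (as `%timeit` does) keeps incrementing the same array.

```python
import numpy as np

def add_one_loop(values):
    # Hypothetical variant of add_one: work on a copy so the input is unchanged
    result = values.copy()
    for i in range(len(result)):
        result[i] = result[i] + 1
    return result

original = np.arange(1, 1000)
looped = add_one_loop(original)
vectorized = original + 1   # returns a new array; original is untouched

# array_equal is True when shapes match and every element is equal
same = np.array_equal(looped, vectorized)
print(same)
```

The vectorized expression never touches `original`, which makes it easier to reason about than a loop that overwrites its input.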

## Array Arithmetic

Hopefully I've now convinced you to use vectorized operations in NumPy. Fortunately, they are easy to use and feel natural with Python's standard arithmetic operators (addition, subtraction, multiplication, division, and so on). Essentially, an arithmetic operator applied to a NumPy array is applied to each element.

In [7]:
some_array = np.arange(9)
some_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [8]:
print("x + 3 ->", some_array + 3)
print("x - 3 ->", some_array - 3)
print("x * 2 ->", some_array * 2)
print("x / 2 ->", some_array / 2)
print("x // 2 ->", some_array // 2)
print("x ** 2 ->", some_array ** 2)
print("x % 2 ->", some_array % 2)

x + 3 -> [ 3  4  5  6  7  8  9 10 11]
x - 3 -> [-3 -2 -1  0  1  2  3  4  5]
x * 2 -> [ 0  2  4  6  8 10 12 14 16]
x / 2 -> [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4. ]
x // 2 -> [0 0 1 1 2 2 3 3 4]
x ** 2 -> [ 0  1  4  9 16 25 36 49 64]
x % 2 -> [0 1 0 1 0 1 0 1 0]
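The same operators also work between two arrays of the same shape, pairing elements by position. The sketch below uses a second array `y` (a hypothetical example, not part of the data above) and checks the result types: as the output above shows, `/` gives floats while `//` keeps integers.

```python
import numpy as np

x = np.arange(9)
y = np.arange(9, 18)   # hypothetical second array with the same length as x

# Element i of the result combines x[i] with y[i]
total = x + y
product = x * y
print("x + y ->", total)
print("x * y ->", product)

# True division always yields floats; floor division of integers stays integer
print((x / 2).dtype, (x // 2).dtype)
```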


We can also combine these operations to do more complex calculations on all elements of an array. For instance, suppose we wanted to approximate $\pi$ using the series $$\frac{\pi^2}{6} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$$

In [9]:
# We first calculate the RHS elements
rhs = 1 / (np.arange(1, 1000)**2)

In [10]:
# Then we take the sum, multiply by 6, and take the sqrt
(sum(rhs)*6) ** (1/2)

3.14063710098594
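The estimate above uses only the first 999 terms, so it is still slightly below $\pi$. As a rough sketch, the calculation can be wrapped in a function (the name `approximate_pi` is made up for this example) to see how the error shrinks as more terms are added. It uses `np.sum` and `np.sqrt` rather than Python's built-in `sum` and `** (1/2)`, which keeps the whole calculation vectorized.

```python
import numpy as np

def approximate_pi(n_terms):
    # Float terms avoid integer overflow when squaring large values of k
    k = np.arange(1, n_terms + 1, dtype=float)
    # Sum 1/k^2, multiply by 6, then take the square root
    return np.sqrt(6 * np.sum(1 / k**2))

for n_terms in (10, 1_000, 100_000):
    estimate = approximate_pi(n_terms)
    print(n_terms, estimate, abs(np.pi - estimate))
```

The series converges slowly, so each extra digit of accuracy takes roughly ten times as many terms.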

## Wrappers in NumPy

Each of the standard operators (e.g. `+`, `-`, `*`) is simply a convenient wrapper for a specific function built into NumPy. For example, the `+` operator is a wrapper for the `add` function. Equivalently, we can use the syntax:

In [11]:
print("x + 3 ->", np.add(some_array, 3))
print("x - 3 ->", np.subtract(some_array, 3))
print("x * 2 ->", np.multiply(some_array, 2))
print("x / 2 ->", np.divide(some_array, 2))
print("x // 2 ->", np.floor_divide(some_array, 2))
print("x ** 2 ->", np.power(some_array, 2))
print("x % 2 ->", np.mod(some_array, 2))

x + 3 -> [ 3  4  5  6  7  8  9 10 11]
x - 3 -> [-3 -2 -1  0  1  2  3  4  5]
x * 2 -> [ 0  2  4  6  8 10 12 14 16]
x / 2 -> [0.  0.5 1.  1.5 2.  2.5 3.  3.5 4. ]
x // 2 -> [0 0 1 1 2 2 3 3 4]
x ** 2 -> [ 0  1  4  9 16 25 36 49 64]
x % 2 -> [0 1 0 1 0 1 0 1 0]
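One practical reason to reach for the function form is that these functions accept extra arguments that the operators cannot. A minimal sketch, assuming we want to reuse a pre-allocated array (`result` is a name chosen for this example): the `out` argument writes the answer into an existing array instead of creating a new one.

```python
import numpy as np

x = np.arange(9)

# The operator and the function give the same element-wise result
matches = np.array_equal(x + 3, np.add(x, 3))
print(matches)

# Pre-allocate an array with the same shape and dtype as x,
# then have np.multiply write its output there
result = np.empty_like(x)
np.multiply(x, 2, out=result)
print(result)
```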


## Aggregation Functions

Apart from doing arithmetic with NumPy arrays, we can also get summary statistics easily using built-in NumPy operations. When we're analyzing data, it is always useful to compute these to get a good sense of what our data looks like. Here are some of the most common NumPy operations:

| Aggregation | Description        |
|-------------|--------------------|
| `np.mean`   | mean               |
| `np.max`    | maximum            |
| `np.min`    | minimum            |
| `np.median` | median             |
| `np.var`    | variance           |
| `np.std`    | standard deviation |

```{tip}
You can always type `np.` and press `TAB` to see all the operations available!
```

### Example

To illustrate how we can use these operations, it's best to learn through an example. Suppose we collected some data on textbook prices and wanted to do some initial data exploration on this data.

In [12]:
tb = np.array([95,19.95,51.5,128.5,96,48.5,146.75,92,19.5,85.5,16.95,9.95,5.95,58.75,6.5,70.75,
               4.25,115.25,158,6.5,130.5,7,41.25,169.75,71.25,82.25,12.95,127,41.5,31])

In [13]:
print("Minimum price:", np.min(tb))
print("Maximum price:", np.max(tb))
print("Mean price:", np.mean(tb))
print("Median price:", np.median(tb))
print("Variance:", np.var(tb))
print("Standard deviation:", np.std(tb))

Minimum price: 4.25
Maximum price: 169.75
Mean price: 65.01666666666667
Median price: 55.125
Variance: 2555.8738888888884
Standard deviation: 50.55565140406054


```{note}
Pretty straightforward, isn't it? :)
```
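One detail worth knowing when reading the variance and standard deviation: by default, `np.var` and `np.std` divide by $n$ (the population formulas). If the textbook prices are treated as a sample from a larger population, passing `ddof=1` divides by $n - 1$ instead. Whether that is the right choice depends on the analysis, so take this as a sketch rather than a recommendation.

```python
import numpy as np

tb = np.array([95, 19.95, 51.5, 128.5, 96, 48.5, 146.75, 92, 19.5, 85.5,
               16.95, 9.95, 5.95, 58.75, 6.5, 70.75, 4.25, 115.25, 158, 6.5,
               130.5, 7, 41.25, 169.75, 71.25, 82.25, 12.95, 127, 41.5, 31])

n = len(tb)
population_var = np.var(tb)        # default ddof=0: divide by n
sample_var = np.var(tb, ddof=1)    # ddof=1: divide by n - 1
sample_std = np.std(tb, ddof=1)

print("Population variance:", population_var)
print("Sample variance:", sample_var)
print("Sample standard deviation:", sample_std)
```

With 30 observations the two versions differ by a factor of $30/29$, so the gap is small here but grows as the sample gets smaller.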