# NumPy Basics

Numerical Python, or "NumPy" for short, is a foundational package on which many of the most common data science packages are built. NumPy provides high-performance multi-dimensional arrays that we can use as vectors or matrices.

The key features of NumPy are:

- ndarrays: n-dimensional arrays whose elements all share one data type, which makes them fast and space-efficient. Built-in methods (e.g., ``mean``) process the data without explicit Python loops.
- Broadcasting: a set of rules that defines how arithmetic behaves between arrays of different shapes.
- Vectorization: numeric operations apply to whole ndarrays at once, element by element, without explicit Python loops.
- Input/Output: simplifies reading data from and writing data to files.
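
The first three features in the list above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook; the array ``data`` and its values are made up for the example.

```python
import numpy as np

# Hypothetical sample data: 3 rows (samples) by 4 columns (features).
data = np.arange(12, dtype=float).reshape(3, 4)

# ndarray method: column means computed without an explicit loop.
col_means = data.mean(axis=0)      # shape (4,)

# Broadcasting: the (4,) vector is applied to each of the 3 rows.
centered = data - col_means        # shape (3, 4)

# Vectorization: one expression squares every element at once.
squared = centered ** 2

print(col_means)
print(centered)
print(squared.sum())
```

Subtracting a ``(4,)`` array from a ``(3, 4)`` array works because the trailing dimensions match; NumPy treats the smaller array as if it were repeated along the missing leading dimension.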

**Additional Recommended Resources:**
- [NumPy Documentation](https://docs.scipy.org/doc/numpy/reference/)

In this brief tutorial, I will demonstrate some of the common NumPy operations you will see during the rest of the week.

In [1]:
from __future__ import print_function  # __future__ imports must come first in a script
import numpy as np

A common habit is to import NumPy under the ``np`` namespace, since you would otherwise find yourself typing ``numpy`` a lot. Two letters are easier on your fingers.

# Rank 1

In [2]:
np.arange(-1.0, 1.0, 0.1)

array([ -1.00000000e+00,  -9.00000000e-01,  -8.00000000e-01,
        -7.00000000e-01,  -6.00000000e-01,  -5.00000000e-01,
        -4.00000000e-01,  -3.00000000e-01,  -2.00000000e-01,
        -1.00000000e-01,  -2.22044605e-16,   1.00000000e-01,
         2.00000000e-01,   3.00000000e-01,   4.00000000e-01,
         5.00000000e-01,   6.00000000e-01,   7.00000000e-01,
         8.00000000e-01,   9.00000000e-01])
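
The ``-2.22044605e-16`` in the output above, where an exact zero might be expected, is floating-point round-off from the fractional step. When the number of points matters more than the step size, ``np.linspace`` is a common alternative. It is not used elsewhere in this tutorial, so treat the following as an optional sketch; the variable names are illustrative.

```python
import numpy as np

# Step-based: the stop value is excluded, and a fractional step
# can accumulate round-off error.
stepped = np.arange(-1.0, 1.0, 0.1)

# Count-based: ask for 21 evenly spaced points; the stop value
# is included by default.
spaced = np.linspace(-1.0, 1.0, 21)

print(len(spaced), spaced[0], spaced[-1])

# Compare floats with a tolerance rather than with ==.
print(np.isclose(stepped[10], 0.0))
```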

In [3]:
print(np.random.randint(0, 5, size=10))
print(np.ones(10))
print(np.zeros(10))

[0 4 1 2 2 4 4 0 1 1]
[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]


In [4]:
rank1_array = np.array([3, 33, 333])
print(type(rank1_array))
print(rank1_array.shape)
print(rank1_array.size)
print(rank1_array.dtype)
print(rank1_array[0], rank1_array[1], rank1_array[2]) 
print(rank1_array[:], rank1_array[1:], rank1_array[:2])

<type 'numpy.ndarray'>
(3,)
3
int64
3 33 333
[  3  33 333] [ 33 333] [ 3 33]


# Rank 2

In [5]:
print(np.ones((10,2))) # 10 rows, 2 columns
print(np.zeros((2,10))) # 2 rows, 10 columns

[[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]


In [6]:
np.eye(10,10)*3 # diagonal of 1s but multiplied by 3

array([[ 3.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  3.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  3.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  3.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  3.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  3.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  3.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  3.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  3.]])

In [7]:
rank2_array = np.array([[11,12,13],[21,22,23],[31,32,33]])
print(type(rank2_array))
print(rank2_array.shape)
print(rank2_array.size)
print(rank2_array.dtype)
print(rank2_array[0], rank2_array[1], rank2_array[2]) 

<type 'numpy.ndarray'>
(3, 3)
9
int64
[11 12 13] [21 22 23] [31 32 33]


In [8]:
print(rank2_array[:]) # print everything in array

[[11 12 13]
 [21 22 23]
 [31 32 33]]


In [9]:
print(rank2_array[1:]) # slice from 2nd row and on

[[21 22 23]
 [31 32 33]]


In [10]:
print(rank2_array[:,0]) # all rows, 1st column only

[11 21 31]


In [11]:
print(rank2_array[:,1]) # all rows, 2nd column only

[12 22 32]


In [12]:
print(rank2_array[:,2]) # all rows, 3rd column only

[13 23 33]


In [13]:
print(rank2_array[0,1]) # i=0, j=1 of the 3x3 matrix we just made

12


# Rank 3 and beyond!

In [14]:
np.random.randint(0, 5, (2,5,5)) # 2 x 5 x 5 [3D array!]

array([[[3, 4, 0, 3, 3],
        [0, 3, 4, 0, 0],
        [3, 2, 4, 2, 2],
        [0, 0, 4, 3, 1],
        [4, 0, 1, 1, 1]],

       [[0, 3, 1, 1, 0],
        [2, 0, 0, 1, 3],
        [1, 4, 1, 4, 4],
        [2, 0, 2, 2, 3],
        [3, 3, 3, 4, 2]]])

In [15]:
np.random.randint(0, 5, (2,5,5)).shape

(2, 5, 5)

## Reshaping and Slicing Arrays

Oftentimes, we would like to change up the dimensions a bit. One natural way to do this with NumPy is to reshape arrays. Let's start with a 1-dimensional array of 72 elements to help understand how things get re-ordered or changed around.

In [16]:
np.arange(72).reshape(3,24)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
        65, 66, 67, 68, 69, 70, 71]])

In [17]:
np.arange(72).reshape(24,3).T # transpose; this is not the same as above! beware

array([[ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48,
        51, 54, 57, 60, 63, 66, 69],
       [ 1,  4,  7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49,
        52, 55, 58, 61, 64, 67, 70],
       [ 2,  5,  8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50,
        53, 56, 59, 62, 65, 68, 71]])

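Note that the transpose of an array is just its ``.T`` attribute. But remember, things are not always what they seem. The two examples above have exactly the same shape, ``(3, 24)``, yet their contents differ: ``reshape(3, 24)`` fills each row with consecutive elements, while ``reshape(24, 3).T`` fills each column with consecutive elements. Be careful!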
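
A quick way to confirm the difference between the two results is to compare them directly. The names ``a`` and ``b`` below are introduced only for this check.

```python
import numpy as np

a = np.arange(72).reshape(3, 24)      # rows hold consecutive values
b = np.arange(72).reshape(24, 3).T    # columns hold consecutive values

print(a.shape == b.shape)             # shapes match
print(np.array_equal(a, b))           # contents do not
print(a[0, :4], b[0, :4])             # first four entries of each first row
```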

In [18]:
np.arange(72).reshape(3, 2, -1) # -1 means to let NumPy figure out the size of the remaining dimension

array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]],

       [[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
        [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]],

       [[48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]]])

In [19]:
np.arange(72).reshape(3, -1, 12) # -1 means to let NumPy figure out the size of the remaining dimension

array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]],

       [[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
        [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]],

       [[48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]]])

In [20]:
np.arange(36).reshape(6, 6)

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

We can even combine multiple indices with Python slicing!

In [21]:
np.arange(36).reshape(6,6)[2:4,:3]

array([[12, 13, 14],
       [18, 19, 20]])

## Filtering

In [22]:
unfiltered_arr = np.arange(72).reshape(3, -1, 12)
unfiltered_arr

array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]],

       [[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
        [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]],

       [[48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]]])

In [23]:
condition = unfiltered_arr % 3 == 0 # divisible by 3
condition # this is a boolean mask!

array([[[ True, False, False,  True, False, False,  True, False, False,
          True, False, False],
        [ True, False, False,  True, False, False,  True, False, False,
          True, False, False]],

       [[ True, False, False,  True, False, False,  True, False, False,
          True, False, False],
        [ True, False, False,  True, False, False,  True, False, False,
          True, False, False]],

       [[ True, False, False,  True, False, False,  True, False, False,
          True, False, False],
        [ True, False, False,  True, False, False,  True, False, False,
          True, False, False]]], dtype=bool)

In [24]:
unfiltered_arr[condition] # boolean indexing returns a flattened copy of the matching elements, not a view

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48,
       51, 54, 57, 60, 63, 66, 69])
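
Whether an indexing expression returns a view or a copy depends on the kind of index: basic slicing gives a view, while boolean-mask indexing gives a copy. A minimal sketch, using a small made-up array ``arr``, checks this with ``np.shares_memory``:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
mask = arr % 3 == 0

selected = arr[mask]       # boolean (advanced) indexing
window = arr[1:, :2]       # basic slicing

print(np.shares_memory(arr, selected))  # False: independent copy
print(np.shares_memory(arr, window))    # True: view onto arr

selected[0] = -1           # changes only the copy
window[0, 0] = 99          # writes through to arr[1, 0]
print(arr)
```

Assigning through the mask, as in ``arr[mask] = 0``, still modifies the original array in place, because assignment to an indexed expression never creates the intermediate copy.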

In [25]:
unfiltered_arr[condition] = 0 # only change the values matching the condition 
unfiltered_arr

array([[[ 0,  1,  2,  0,  4,  5,  0,  7,  8,  0, 10, 11],
        [ 0, 13, 14,  0, 16, 17,  0, 19, 20,  0, 22, 23]],

       [[ 0, 25, 26,  0, 28, 29,  0, 31, 32,  0, 34, 35],
        [ 0, 37, 38,  0, 40, 41,  0, 43, 44,  0, 46, 47]],

       [[ 0, 49, 50,  0, 52, 53,  0, 55, 56,  0, 58, 59],
        [ 0, 61, 62,  0, 64, 65,  0, 67, 68,  0, 70, 71]]])

In [26]:
unfiltered_arr.reshape(-1) # flatten it back!

array([ 0,  1,  2,  0,  4,  5,  0,  7,  8,  0, 10, 11,  0, 13, 14,  0, 16,
       17,  0, 19, 20,  0, 22, 23,  0, 25, 26,  0, 28, 29,  0, 31, 32,  0,
       34, 35,  0, 37, 38,  0, 40, 41,  0, 43, 44,  0, 46, 47,  0, 49, 50,
        0, 52, 53,  0, 55, 56,  0, 58, 59,  0, 61, 62,  0, 64, 65,  0, 67,
       68,  0, 70, 71])