In [2]:
import numpy as np


# Creating ndarrays

The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion:


In [3]:
x = [[1, 2, 3], [4, 5, 6]]
x = np.array(object=x)
x


array([[1, 2, 3],
       [4, 5, 6]])

In [4]:
x = [[[1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3]],
     [[1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3]]]
x = np.array(object=x)
x


array([[[1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3]]])

In [5]:
np.shape(a=x)


(4, 2, 3)

In [47]:
np.ndim(a=x)


3

In addition to np.array, there are a number of other functions for creating new arrays. As examples, zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape. empty creates an array without initializing its values to any particular value. To create a higher dimensional array with these methods, pass a tuple for the shape:


In [48]:
x = np.zeros(shape=10)
x


array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [49]:
x = np.ones(shape=(2, 3))
x


array([[1., 1., 1.],
       [1., 1., 1.]])

In [50]:
x = np.identity(n=4)
x


array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [51]:
x = np.full(shape=(2, 3), fill_value=3)
x


array([[3, 3, 3],
       [3, 3, 3]])

In [52]:
x = np.arange(start=40, stop=55)
x


array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54])

In [53]:
x = np.linspace(start=0, stop=10, num=10)
x


array([ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444,
        5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])

In [54]:
x = np.random.rand(3, 2)
x


array([[0.19846928, 0.70634456],
       [0.54556805, 0.75085976],
       [0.24423252, 0.05171883]])

In [55]:
x = np.random.randn(3, 4)
x


array([[-0.22748978, -0.65013224,  0.78508591,  0.26825463],
       [-1.42862199,  2.1157815 , -0.44232938, -1.0079761 ],
       [ 0.22387529,  1.19560086,  0.19095163,  0.51373562]])

`ones_like`, `zeros_like` and `full_like` each take another array and produce a new array of the same shape and dtype, filled with ones, zeros, or a given value, respectively.
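
As a minimal sketch of the `*_like` functions, the cell below builds a small template array (the name `template` and the fill value 7 are chosen only for illustration) and derives three new arrays from it:

```python
import numpy as np

# Template array whose shape and dtype the *_like functions reuse
template = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

zeros = np.zeros_like(template)                 # same shape and dtype, all 0
ones = np.ones_like(template)                   # same shape and dtype, all 1
sevens = np.full_like(template, fill_value=7)   # same shape and dtype, all 7

print(zeros.shape, zeros.dtype)
print(sevens)
```

The data in `template` is ignored; only its shape and dtype carry over.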


# Data Types for ndarrays

The data type, or `dtype`, is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data:


In [56]:
x = np.array(object=x, dtype=np.float64)
x.dtype


dtype('float64')

Don’t worry about memorizing the NumPy dtypes, especially if you’re a new user. It’s often only necessary to care about the general kind of data you’re dealing with, whether floating point, complex, integer, boolean, string, or general Python object. When you need more control over how data are stored in memory and on disk, especially large datasets, it is good to know that you have control over the storage type.


You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method:


In [57]:
x = x.astype(dtype=np.float64)
x.dtype


dtype('float64')
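
The cast above goes from float64 to float64, so nothing visibly changes. A short sketch with made-up values shows two casts that do change the data: floating point to integer, which drops the fractional part, and numeric strings to floating point:

```python
import numpy as np

# Float to integer: the fractional part is discarded (truncation toward zero)
floats = np.array([3.7, -1.2, 0.5])
ints = floats.astype(np.int32)
print(ints)

# Strings that represent numbers can be parsed into floats
numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64)
print(parsed)
```

Note that `astype` always returns a new array, even when the requested dtype matches the existing one.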

It’s important to be cautious when using NumPy's string types (numpy.string\_, which is named numpy.bytes\_ in NumPy 2.0 and later), as string data in NumPy is fixed size and may truncate input without warning. pandas has more intuitive out-of-the-box behavior on non-numeric data.
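
The silent truncation can be seen in a small sketch. It uses a fixed-width Unicode dtype (`"U3"`, at most three characters per element) as a stand-in for the byte-string type, and the names are made up for illustration:

```python
import numpy as np

# Each element of a "U3" array can hold at most three characters
short = np.array(["Bob", "Joe"], dtype="U3")

# Assigning a longer string does not raise; the value is cut to fit
short[0] = "Roberta"
print(short)
```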


# Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays applies the operation element-wise:


In [58]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
x


array([[1., 2., 3.],
       [4., 5., 6.]])

In [59]:
x * x


array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [60]:
x - x


array([[0., 0., 0.],
       [0., 0., 0.]])
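
To make the "no for loops" point concrete, here is a rough sketch that computes the same element-wise sum twice, once with explicit Python loops and once with a vectorized expression. The arrays `a` and `b` are illustrative values only:

```python
import numpy as np

a = np.array([[1., 2., 3.], [4., 5., 6.]])
b = np.array([[10., 20., 30.], [40., 50., 60.]])

# Explicit loops: visit every (row, column) position by hand
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] + b[i, j]

# Vectorized: one expression, applied element-wise by NumPy
vectorized = a + b

print(np.array_equal(looped, vectorized))
```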

Arithmetic operations with scalars propagate the scalar argument to each element in the array:


In [61]:
x * 2


array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

In [62]:
1 / x


array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

Comparisons between arrays of the same size yield boolean arrays:


In [63]:
y = x * 2
x > y


array([[False, False, False],
       [False, False, False]])

# Basic Indexing and Slicing

NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:


In [64]:
x = np.arange(start=1, stop=10)
x


array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [65]:
x[4]


5

In [66]:
x[5:8]


array([6, 7, 8])

In [67]:
x[5:8] = 12
x


array([ 1,  2,  3,  4,  5, 12, 12, 12,  9])

In [68]:
x = np.random.randn(3, 3).astype(dtype=np.int32)
x


array([[-1,  0, -1],
       [ 0,  1,  1],
       [ 0,  0,  0]])

In [69]:
x[0]


array([-1,  0, -1])

In [70]:
x[0][2] == x[0, 2]


True

In [71]:
x[0] = 43
x


array([[43, 43, 43],
       [ 0,  1,  1],
       [ 0,  0,  0]])

In [72]:
x[:2, :1]


array([[43],
       [ 0]])

# Boolean Indexing

Let’s consider an example where we have some data in an array and an array of names with duplicates. Here I use the randn function in numpy.random to generate some normally distributed random data, cast to integers to keep the output compact:


In [73]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 4).astype(dtype=np.int32)
data


array([[-1,  0, -1, -2],
       [ 0, -1,  0, -1],
       [-1,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0, -1,  0,  0]])

Suppose each name corresponds to a row in the data array and we wanted to select all the rows with corresponding name 'Bob'. Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string 'Bob' yields a boolean array:


In [74]:
names == 'Bob'


array([ True, False, False,  True, False, False, False])

This boolean array can be passed when indexing the array:


In [75]:
data[names == 'Bob']


array([[-1,  0, -1, -2],
       [ 0,  0,  0,  0]])

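The boolean array must be the same length as the array axis it is indexing. Older NumPy versions did not always fail on a mismatch; current versions raise an IndexError. A boolean array can also be negated with ~ to select everything except 'Bob':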


In [76]:
data[~(names == 'Bob')]


array([[ 0, -1,  0, -1],
       [-1,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0, -1,  0,  0]])

The Python keywords and and or do not work with boolean arrays. Use & (and) and | (or) instead.
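
A brief sketch of combining conditions follows. The `values` array is a hypothetical stand-in for one column of data; each comparison needs its own parentheses because `&` and `|` bind more tightly than `==` and `>`:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
values = np.arange(7)  # hypothetical data, one value per name

# Element-wise OR: rows where the name is Bob or Will
bob_or_will = (names == 'Bob') | (names == 'Will')
print(values[bob_or_will])

# Element-wise AND: rows where the name is Joe and the value exceeds 1
joe_and_large = (names == 'Joe') & (values > 1)
print(values[joe_and_large])

# The keyword form asks for a single truth value of a whole array and fails
keyword_failed = False
try:
    (names == 'Bob') or (names == 'Will')
except ValueError:
    keyword_failed = True
print(keyword_failed)
```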


In [77]:
data[data < 0] = 0
data


array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

# Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.


In [78]:
x = np.empty(shape=(8, 3))
for i in range(8):
    x[i] = i
x


array([[0., 0., 0.],
       [1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.],
       [4., 4., 4.],
       [5., 5., 5.],
       [6., 6., 6.],
       [7., 7., 7.]])

To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:


In [79]:
x[[4, 3, 0, 6]]


array([[4., 4., 4.],
       [3., 3., 3.],
       [0., 0., 0.],
       [6., 6., 6.]])

Passing multiple index arrays does something slightly different; it selects a one-dimensional array of elements corresponding to each tuple of indices:


In [80]:
x = np.arange(start=0, stop=32)
x = np.reshape(x, (8, 4))  # positional shape; the newshape keyword is deprecated in recent NumPy
x


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [81]:
x[[1, 5, 7, 2], [0, 3, 1, 2]]


array([ 4, 23, 29, 10])

Here the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected. Regardless of how many dimensions the array has (here, only 2), the result of fancy indexing with as many integer arrays as there are axes is always one-dimensional.


The behavior of fancy indexing in this case is a bit different from what some users might have expected (myself included), which is the rectangular region formed by selecting a subset of the matrix’s rows and columns. Here is one way to get that:


In [82]:
x[[1, 5, 7, 2]][:, [0, 3, 1, 2]]


array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
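
Another way to get the same rectangular region, sketched here as an alternative rather than as the approach the text prescribes, is `np.ix_`, which turns the row and column lists into index arrays that broadcast against each other:

```python
import numpy as np

x = np.arange(32).reshape(8, 4)
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# np.ix_ builds an open mesh, so every row is paired with every column
block = x[np.ix_(rows, cols)]

# Same result as indexing the rows first and then the columns
chained = x[rows][:, cols]

print(block)
print(np.array_equal(block, chained))
```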

Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
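
The difference between a view and a copy can be checked directly. In this small sketch (variable names are illustrative), writing through a slice changes the original array, while writing to a fancy-indexed result does not:

```python
import numpy as np

x = np.arange(10)

view = x[2:5]        # basic slice: shares memory with x
copy = x[[2, 3, 4]]  # fancy indexing: a separate array

view[:] = 0          # modifies x in place
copy[:] = 99         # leaves x untouched

print(x)
print(np.shares_memory(x, view), np.shares_memory(x, copy))
```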


# Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute:


In [83]:
x = np.arange(start=0, stop=15)
x = np.reshape(x, (3, 5))  # positional shape; the newshape keyword is deprecated in recent NumPy
x


array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [84]:
x.T


array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes (for extra mind bending):


In [85]:
x = np.arange(12)
x = np.reshape(x, (2, 3, 2))  # positional shape; the newshape keyword is deprecated in recent NumPy
x


array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]])

In [86]:
x = np.transpose(a=x, axes=(1, 0, 2))
x


array([[[ 0,  1],
        [ 6,  7]],

       [[ 2,  3],
        [ 8,  9]],

       [[ 4,  5],
        [10, 11]]])

In [87]:
x = np.swapaxes(a=x, axis1=0, axis2=1)
x


array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]])
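
One way to make sense of the axis permutation is to check it against plain indexing. The sketch below, using a fresh array so it does not depend on the cells above, confirms that with `axes=(1, 0, 2)` the element at `t[i, j, k]` is `x[j, i, k]`, that swapping axes 0 and 1 gives the same result, and that no data is copied:

```python
import numpy as np

x = np.arange(12).reshape(2, 3, 2)

# Permute the axes: new axis 0 is old axis 1, new axis 1 is old axis 0
t = np.transpose(x, axes=(1, 0, 2))
print(x.shape, t.shape)

# Every element moves from x[j, i, k] to t[i, j, k]
matches = all(
    t[i, j, k] == x[j, i, k]
    for i in range(3) for j in range(2) for k in range(2)
)
print(matches)

# swapaxes(0, 1) is the same permutation; both results are views of x
swapped = np.swapaxes(x, 0, 1)
print(np.array_equal(t, swapped), np.shares_memory(x, t))
```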