# NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is the fundamental package required for high
performance scientific computing and data analysis. It is the foundation on which
nearly all of the higher-level tools in this book are built. Here are some of the things it
provides:

* ndarray, a fast and space-efficient multidimensional array providing vectorized
arithmetic operations and sophisticated broadcasting capabilities

* Standard mathematical functions for fast operations on entire arrays of data
without having to write loops

* Tools for reading / writing array data to disk and working with memory-mapped
files

* Linear algebra, random number generation, and Fourier transform capabilities

* Tools for integrating code written in C, C++, and Fortran

The last bullet point is also one of the most important ones from an ecosystem point
of view. Because NumPy provides an easy-to-use C API, it is very easy to pass data to
external libraries written in a low-level language and also for external libraries to return
data to Python as NumPy arrays. This feature has made Python a language of choice
for wrapping legacy C/C++/Fortran codebases and giving them a dynamic and easy-to-use
interface.

While NumPy by itself does not provide very much high-level data analytical functionality,
having an understanding of NumPy arrays and array-oriented computing will
help you use tools like pandas much more effectively. If you’re new to Python and just
looking to get your hands dirty working with data using pandas, feel free to give this
chapter a skim. For more on advanced NumPy features like broadcasting, see Chapter
12.

For most data analysis applications, the main areas of functionality I’ll focus on are:

* Fast vectorized array operations for data munging and cleaning, subsetting and
filtering, transformation, and any other kinds of computations

* Common array algorithms like sorting, unique, and set operations

* Efficient descriptive statistics and aggregating/summarizing data

* Data alignment and relational data manipulations for merging and joining together
heterogeneous data sets

* Expressing conditional logic as array expressions instead of loops with if-elif-else
branches

* Group-wise data manipulations (aggregation, transformation, function application).
Much more on this in Chapter 5.

While NumPy provides the computational foundation for these operations, you will
likely want to use pandas as your basis for most kinds of data analysis (especially for
structured or tabular data) as it provides a rich, high-level interface making most common
data tasks very concise and simple. pandas also provides some more domain-specific
functionality like time series manipulation, which is not present in NumPy.

## The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which
is a fast, flexible container for large data sets in Python. Arrays enable you to perform
mathematical operations on whole blocks of data using similar syntax to the equivalent
operations between scalar elements:

In [4]:
import numpy as np
import pandas as pd

In [13]:
data = pd.read_csv('/Users/Kevin/Desktop/Books/Python4DataScience/ch04/array_ex.txt', header=None, sep=',', engine='python')

# convert to an array 
data = np.array(data)

In [14]:
data.shape


(6, 4)

In [15]:
data

array([[ 0.580052,  0.18673 ,  1.040717,  1.134411],
       [ 0.194163, -0.636917, -0.938659,  0.124094],
       [-0.12641 ,  0.268607, -0.695724,  0.047428],
       [-1.484413,  0.004176, -0.744203,  0.005487],
       [ 2.302869,  0.200131,  1.670238, -1.88109 ],
       [-0.19323 ,  1.047233,  0.482803,  0.960334]])
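
As a minimal sketch of that kind of whole-array arithmetic, here is a small hand-written array (the name `sample` and its values are purely illustrative, not the file loaded above) being scaled and added to itself without any loops:

```python
import numpy as np

# A small 2 x 3 array standing in for real data (illustrative values)
sample = np.array([[1.5, -0.5, 2.0],
                   [0.0, 3.0, -1.0]])

scaled = sample * 10       # every element is multiplied by 10
doubled = sample + sample  # elementwise addition of two equal-shape arrays

print(scaled)
print(doubled)
```

The syntax is the same as it would be for two scalars; NumPy applies it to every element.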

An ndarray is a generic multidimensional container for homogeneous data; that is, all
of the elements must be the same type. Every array has a shape, a tuple indicating the
size of each dimension, and a dtype, an object describing the data type of the array.
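
A quick sketch of inspecting these attributes, again on an illustrative array named `sample` rather than the data loaded from disk:

```python
import numpy as np

sample = np.array([[1.5, -0.5, 2.0],
                   [0.0, 3.0, -1.0]])

print(sample.shape)  # (2, 3): two rows, three columns
print(sample.dtype)  # float64: every element shares this type
```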

This chapter will introduce you to the basics of using NumPy arrays, and should be
sufficient for following along with the rest of the book. While it’s not necessary to have
a deep understanding of NumPy for many data analytical applications, becoming proficient
in array-oriented programming and thinking is a key step along the way to becoming
a scientific Python guru.

Whenever you see “array”, “NumPy array”, or “ndarray” in the text,
with few exceptions they all refer to the same thing: the ndarray object.

## Creating ndarrays

The easiest way to create an array is to use the array function. This accepts any sequence-like
object (including other arrays) and produces a new NumPy array containing
the passed data. For example, a list is a good candidate for conversion:

In [16]:
data1 = [6, 7.5, 8, 0 , 1]

In [17]:
arr1 = np.array(data1)

In [26]:
arr1

array([ 6. ,  7.5,  8. ,  0. ,  1. ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional
array:

In [19]:
data2 = [[1,2,3,4], [5,6,7,8]]

In [20]:
arr2 = np.array(data2)

In [21]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [23]:
arr2.ndim

2

In [24]:
arr2.shape

(2, 4)

Unless explicitly specified (more on this later), np.array tries to infer a good data type
for the array that it creates. The data type is stored in a special dtype object; for example,
in the above two examples we have:

In [28]:
arr1.dtype

dtype('float64')

In [29]:
arr2.dtype

dtype('int64')

In addition to np.array, there are a number of other functions for creating new arrays.
As examples, zeros and ones create arrays of 0’s or 1’s, respectively, with a given length
or shape. empty creates an array without initializing its values to any particular value.
To create a higher dimensional array with these methods, pass a tuple for the shape:

In [30]:
np.zeros(10)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [31]:
np.zeros((3,6))

array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In [32]:
np.empty((2,3,2))

array([[[ -3.10503618e+231,   2.00389732e+000],
        [  2.96439388e-323,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000]],

       [[  0.00000000e+000,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000],
        [  0.00000000e+000,   8.34402697e-309]]])

arange is an array-valued version of the built-in Python range function:

In [33]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

See Table 4-1 for a short list of standard array creation functions. Since NumPy is
focused on numerical computing, the data type, if not specified, will in many cases be
float64 (floating point).

| Function          | Description                                                                                                                                                                  |
|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| array             | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default. |
| asarray           | Convert input to ndarray, but do not copy if the input is already an ndarray                                                                                                 |
| arange            | Like the built-in range but returns an ndarray instead of a list.                                                                                                            |
| ones, ones_like   | Produce an array of all 1’s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype.                             |
| zeros, zeros_like | Like ones and ones_like but producing arrays of 0’s instead                                                                                                                  |
| empty, empty_like | Create new arrays by allocating new memory, but do not populate with any values like ones and zeros                                                                          |
| eye, identity     | Create a square N x N identity matrix (1’s on the diagonal and 0’s elsewhere)                                                                                                |
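
To make the copy-versus-no-copy distinction in the table concrete, here is a small sketch. The variable names are my own, and it assumes the input is already an ndarray of the requested dtype, which is the case where asarray can hand back its input untouched:

```python
import numpy as np

base = np.array([[0., 1., 2.],
                 [3., 4., 5.]])

copied = np.array(base)     # np.array copies its input by default
aliased = np.asarray(base)  # np.asarray returns the same object for an existing ndarray

print(copied is base)   # False
print(aliased is base)  # True

ones = np.ones_like(base)  # same shape and dtype as base, filled with 1's
ident = np.eye(3)          # 3 x 3 identity matrix
print(ones.shape, ones.dtype)
print(ident)
```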

## Data Types for ndarrays

The data type or dtype is a special object containing the information the ndarray needs
to interpret a chunk of memory as a particular type of data:

In [34]:
arr1 = np.array([1,2,3], dtype=np.float64)

In [35]:
arr2 = np.array([1,2,3], dtype=np.int32)

In [37]:
arr1.dtype


dtype('float64')

In [38]:

arr2.dtype

dtype('int32')

Dtypes are part of what makes NumPy so powerful and flexible. In most cases they map
directly onto an underlying machine representation, which makes it easy to read and
write binary streams of data to disk and also to connect to code written in a low-level
language like C or Fortran. The numerical dtypes are named the same way: a type name,
like float or int, followed by a number indicating the number of bits per element. A
standard double-precision floating point value (what’s used under the hood in Python’s
float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as
float64. See Table 4-2 for a full listing of NumPy’s supported data types.
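
The naming rule can be checked directly. This sketch uses the dtype's itemsize attribute (the size of one element in bytes) and multiplies by 8 to recover the bit count in each name; the handful of dtypes chosen is arbitrary:

```python
import numpy as np

bits = {}
for name in ("int8", "int32", "float32", "float64"):
    dt = np.dtype(name)
    # itemsize is in bytes; 8 bits per byte gives the number in the dtype name
    bits[name] = dt.itemsize * 8

print(bits)
```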

You can explicitly convert or cast an array from one dtype to another using ndarray’s
astype method:

In [40]:
arr = np.array([1,2,3,4,5])

In [41]:
arr.dtype

dtype('int64')

In [42]:
float_arr = arr.astype(np.float64)

In [44]:
float_arr.dtype


dtype('float64')

In [45]:
float_arr

array([ 1.,  2.,  3.,  4.,  5.])

In this example, integers were cast to floating point. If I cast some floating point numbers
to be of integer dtype, the decimal part will be truncated:

In [46]:
arr = np.array([3.7, -1.2, -2.5, 0.5, 12.9, 10.1])

In [47]:
arr

array([  3.7,  -1.2,  -2.5,   0.5,  12.9,  10.1])

In [48]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10], dtype=int32)

Should you have an array of strings representing numbers, you can use astype to convert
them to numeric form:

In [52]:
# np.bytes_ is the byte-string dtype (np.string_ was its alias before NumPy 2.0)
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
numeric_strings

array([b'1.25', b'-9.6', b'42'], 
      dtype='|S4')

In [53]:
numeric_strings.astype(float)

array([  1.25,  -9.6 ,  42.  ])

If casting fails for some reason (like a string that cannot be converted to
float64), a ValueError will be raised. Notice that I was a bit lazy and wrote float instead of
np.float64; NumPy is smart enough to alias the Python types to the equivalent dtypes.
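
Here is a small sketch of a cast that fails. The array `bad_strings` is made up for illustration; on the NumPy versions I am aware of, the exception raised for an unparseable string is a ValueError:

```python
import numpy as np

bad_strings = np.array(['1.25', 'not a number'])

try:
    bad_strings.astype(np.float64)
    failed = False
except ValueError as exc:
    # the second element cannot be parsed as a floating point number
    failed = True
    print("cast failed:", exc)
```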

You can also use another array’s dtype attribute:

In [55]:
int_array = np.arange(10)

In [56]:
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)

In [57]:
int_array.astype(calibers.dtype)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

There are shorthand type code strings you can also use to refer to a dtype:

In [58]:
empty_uint32 = np.empty(8, dtype='u4')

In [59]:
empty_uint32

array([0, 0, 1, 0, 2, 0, 3, 0], dtype=uint32)
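
The type codes are just another spelling of the same dtypes: a letter for the kind followed by the size in bytes. A short sketch comparing a few codes to their full names:

```python
import numpy as np

# 'u4' = unsigned integer, 4 bytes; 'f8' = float, 8 bytes; 'i2' = signed integer, 2 bytes
checks = [
    np.dtype('u4') == np.uint32,
    np.dtype('f8') == np.float64,
    np.dtype('i2') == np.int16,
]
print(checks)
```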

Calling astype always creates a new array (a copy of the data), even if
the new dtype is the same as the old dtype.
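
A minimal sketch of that copying behavior, using np.shares_memory to check whether two arrays overlap in memory (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)
same_dtype = arr.astype(np.int64)  # same dtype requested, but still a new array

same_dtype[0] = 99  # modifying the result does not touch the original
print(arr[0])
print(np.shares_memory(arr, same_dtype))
```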

## NOTE 
It’s worth keeping in mind that floating point numbers, such as those
in float64 and float32 arrays, are only capable of approximating fractional
quantities. In complex computations, you may accrue some
floating point error, making comparisons only valid up to a certain number
of decimal places.
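
As a small illustration of why exact comparisons can mislead, the classic 0.1 + 0.2 case is sketched below. np.isclose is one way to compare within a tolerance; the default tolerances are an assumption you may want to tune for your data:

```python
import numpy as np

values = np.array([0.1, 0.2, 0.3])
total = values[0] + values[1]  # 0.1 + 0.2 computed in float64

exactly_equal = bool(total == values[2])              # exact comparison
close_enough = bool(np.isclose(total, values[2]))     # comparison within a tolerance

print(exactly_equal, close_enough)
```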

## Operations between Arrays and Scalars
Arrays are important because they enable you to express batch operations on data
without writing any for loops. This is usually called vectorization. Any arithmetic operation
between equal-size arrays applies the operation elementwise:

In [60]:
arr = np.array([[1., 2., 3.], [4., 5., 6. ]])

In [61]:
arr

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [62]:
arr * arr

array([[  1.,   4.,   9.],
       [ 16.,  25.,  36.]])

In [63]:
arr - arr

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [64]:
arr

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

Arithmetic operations with scalars are as you would expect, propagating the value to
each element:

In [65]:
1/arr

array([[ 1.        ,  0.5       ,  0.33333333],
       [ 0.25      ,  0.2       ,  0.16666667]])

In [66]:
arr ** 0.5

array([[ 1.        ,  1.41421356,  1.73205081],
       [ 2.        ,  2.23606798,  2.44948974]])

Operations between differently sized arrays are called broadcasting and will be discussed
in more detail in Chapter 12. Having a deep understanding of broadcasting is not necessary
for most of this book.
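
Just to give a taste of what broadcasting looks like, here is a minimal sketch in which a length-3 array (the name `row` is illustrative) is added to each row of a 2 × 3 array:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])  # shape (2, 3)
row = np.array([10., 20., 30.])                # shape (3,)

result = arr + row  # row is added to each row of arr
print(result)
```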

## Basic Indexing and Slicing

NumPy array indexing is a rich topic, as there are many ways you may want to select
a subset of your data or individual elements. One-dimensional arrays are simple; on
the surface they act similarly to Python lists:

In [67]:
arr = np.arange(10)

In [68]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [72]:
arr[5]

5

In [73]:
arr[5:8]

array([5, 6, 7])

In [74]:
arr[5:8] = 12

In [75]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

As you can see, if you assign a scalar value to a slice, as in arr[5:8] = 12, the value is
propagated (or broadcasted henceforth) to the entire selection. An important first distinction
from lists is that array slices are views on the original array. This means that
the data is not copied, and any modifications to the view will be reflected in the source
array:

In [76]:
arr_slice = arr[5:8]

In [78]:
arr_slice[1] = 12345

In [79]:
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])

In [80]:
arr_slice[:] = 64

In [81]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

If you are new to NumPy, you might be surprised by this, especially if you have used
other array programming languages which copy data more zealously. As NumPy has
been designed with large data use cases in mind, you could imagine performance and
memory problems if NumPy insisted on copying data left and right.

If you want a copy of a slice of an ndarray instead of a view, you will
need to explicitly copy the array; for example arr[5:8].copy().
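
The difference between a view and an explicit copy can be sketched as follows (the names `view` and `copied` are illustrative):

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]           # a view: shares memory with arr
copied = arr[5:8].copy()  # an independent copy

copied[:] = -1  # leaves arr untouched
view[:] = 64    # writes through to arr

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, copied))
```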

With higher dimensional arrays, you have many more options. In a two-dimensional
array, the elements at each index are no longer scalars but rather one-dimensional
arrays:

In [82]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [83]:
arr2d[2]

array([7, 8, 9])

Thus, individual elements can be accessed recursively. But that is a bit too much work,
so you can pass a comma-separated list of indices to select individual elements. So these
are equivalent:

In [85]:
arr2d[0][2]

3

In [86]:
arr2d[0,2]

3

See Figure 4-1 for an illustration of indexing on a 2D array.

In multidimensional arrays, if you omit later indices, the returned object will be a lower-dimensional
ndarray consisting of all the data along the higher dimensions. So in the
2 × 2 × 3 array arr3d:

In [87]:
arr3d = np.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]])

In [88]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

arr3d[0] is a 2 × 3 array:

In [89]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

Both scalar values and arrays can be assigned to arr3d[0]:

In [90]:
old_values = arr3d[0].copy()

In [91]:
arr3d[0] = 42

In [92]:
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [93]:
arr3d[0] = old_values

In [94]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming
a 1-dimensional array:

In [96]:
arr3d[1,0]

array([7, 8, 9])

Note that in all of these cases where subsections of the array have been selected, the
returned arrays are views.

## Indexing with slices

Like one-dimensional objects such as Python lists, ndarrays can be sliced using the
familiar syntax:

In [97]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

Higher dimensional objects give you more options as you can slice one or more axes
and also mix integers. Consider the 2D array above, arr2d. Slicing this array is a bit
different:

In [98]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [99]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

As you can see, it has sliced along axis 0, the first axis. A slice, therefore, selects a range
of elements along an axis. You can pass multiple slices just like you can pass multiple
indexes:

In [100]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [101]:
arr2d[1, :2]

array([4, 5])

In [102]:
arr2d[2, :1]

array([7])

See Figure 4-2 for an illustration. Note that a colon by itself means to take the entire
axis, so you can slice only higher dimensional axes by doing:

In [103]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

Of course, assigning to a slice expression assigns to the whole selection:

In [104]:
arr2d[:2, 1:] = 0 
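
One detail worth seeing in code is how mixing integers and slices affects dimensionality: an integer index drops its axis, while a slice keeps it. The sketch below rebuilds the same 3 × 3 array and also shows the array after the slice assignment:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

int_and_slice = arr2d[1, :2]  # integer index on axis 0 drops that axis
two_slices = arr2d[1:2, :2]   # slicing both axes keeps the result 2D
print(int_and_slice.shape, two_slices.shape)

arr2d[:2, 1:] = 0  # the scalar is assigned to the whole selection
print(arr2d)
```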

### Boolean Indexing
Let’s consider an example where we have some data in an array and an array of names
with duplicates. I’m going to reuse the 6 × 4 data array loaded earlier, with one name
per row:

In [108]:
names = np.array(['Bob', 'Joe', 'Will', 'Joe', 'Joe', 'Bob'])

In [106]:
data

array([[ 0.580052,  0.18673 ,  1.040717,  1.134411],
       [ 0.194163, -0.636917, -0.938659,  0.124094],
       [-0.12641 ,  0.268607, -0.695724,  0.047428],
       [-1.484413,  0.004176, -0.744203,  0.005487],
       [ 2.302869,  0.200131,  1.670238, -1.88109 ],
       [-0.19323 ,  1.047233,  0.482803,  0.960334]])

Suppose each name corresponds to a row in the data array and we wanted to select all
the rows with corresponding name 'Bob'. Like arithmetic operations, comparisons
(such as ==) with arrays are also vectorized. Thus, comparing names with the string
'Bob' yields a boolean array:

In [109]:
names == 'Bob'

array([ True, False, False, False, False,  True], dtype=bool)

This boolean array can be passed when indexing the array:

In [111]:
data[names == 'Bob']

array([[ 0.580052,  0.18673 ,  1.040717,  1.134411],
       [-0.19323 ,  1.047233,  0.482803,  0.960334]])

The boolean array must be of the same length as the axis it’s indexing. You can even
mix and match boolean arrays with slices or integers (or sequences of integers, more
on this later):

In [112]:
data[names == 'Bob', 2]

array([ 1.040717,  0.482803])

In [113]:
data[names == 'Bob', 3]

array([ 1.134411,  0.960334])

To select everything but 'Bob', you can either use != or negate the condition using ~:

In [115]:
names != 'Bob'

array([False,  True,  True,  True,  True, False], dtype=bool)

In [116]:
data[~(names == 'Bob')]

array([[ 0.194163, -0.636917, -0.938659,  0.124094],
       [-0.12641 ,  0.268607, -0.695724,  0.047428],
       [-1.484413,  0.004176, -0.744203,  0.005487],
       [ 2.302869,  0.200131,  1.670238, -1.88109 ]])

To select two of the three names by combining multiple boolean conditions, use boolean
arithmetic operators like & (and) and | (or):

In [117]:
mask = (names == 'Bob') | (names == 'Will')

In [118]:
mask

array([ True, False,  True, False, False,  True], dtype=bool)

In [119]:
data[mask]

array([[ 0.580052,  0.18673 ,  1.040717,  1.134411],
       [-0.12641 ,  0.268607, -0.695724,  0.047428],
       [-0.19323 ,  1.047233,  0.482803,  0.960334]])

Selecting data from an array by boolean indexing always creates a copy of the data,
even if the returned array is unchanged.

The Python keywords and and or do not work with boolean arrays.

Setting values with boolean arrays works in a common-sense way. To set all of the
negative values in data to 0 we need only do:

In [120]:
data[data < 0] = 0

In [121]:
data

array([[ 0.580052,  0.18673 ,  1.040717,  1.134411],
       [ 0.194163,  0.      ,  0.      ,  0.124094],
       [ 0.      ,  0.268607,  0.      ,  0.047428],
       [ 0.      ,  0.004176,  0.      ,  0.005487],
       [ 2.302869,  0.200131,  1.670238,  0.      ],
       [ 0.      ,  1.047233,  0.482803,  0.960334]])

Setting whole rows or columns using a 1D boolean array is also easy:

In [122]:
data[names != 'Joe'] = 7

In [123]:
data

array([[  7.00000000e+00,   7.00000000e+00,   7.00000000e+00,
          7.00000000e+00],
       [  1.94163000e-01,   0.00000000e+00,   0.00000000e+00,
          1.24094000e-01],
       [  7.00000000e+00,   7.00000000e+00,   7.00000000e+00,
          7.00000000e+00],
       [  0.00000000e+00,   4.17600000e-03,   0.00000000e+00,
          5.48700000e-03],
       [  2.30286900e+00,   2.00131000e-01,   1.67023800e+00,
          0.00000000e+00],
       [  7.00000000e+00,   7.00000000e+00,   7.00000000e+00,
          7.00000000e+00]])
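
To see why the Python keywords and and or cannot stand in for & and |, here is a small self-contained sketch (the three-element `names` array is illustrative). The keywords ask for a single truth value for the whole array, which NumPy refuses to guess for arrays with more than one element:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will'])
is_bob = names == 'Bob'
is_will = names == 'Will'

combined = is_bob | is_will  # elementwise "or" works as expected
print(combined)

try:
    is_bob or is_will  # the keyword needs one truth value for the whole array
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```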

## Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
Suppose we had an 8 × 4 array:

In [124]:
arr = np.empty((8, 4))

In [125]:
arr

array([[ -3.10503618e+231,  -3.10503618e+231,   2.12355000e-314,
          1.48219694e-323],
       [  6.45059033e+094,   2.13609541e-314,   2.20025118e-314,
         -3.23206337e-134],
       [  0.00000000e+000,   0.00000000e+000,   3.65378930e-038,
          0.00000000e+000],
       [  0.00000000e+000,   2.12668643e-314,   0.00000000e+000,
          0.00000000e+000],
       [ -1.15699947e+144,   2.14059053e-314,   2.20304490e-314,
          9.56451277e-243],
       [  0.00000000e+000,   0.00000000e+000,  -3.00935218e+242,
          0.00000000e+000],
       [  0.00000000e+000,   2.12668601e-314,   0.00000000e+000,
          0.00000000e+000],
       [ -3.10503618e+231,  -3.10503618e+231,   2.12355000e-314,
          2.47032823e-323]])

In [126]:
for i in range(8):
    arr[i] = i

In [127]:
arr

array([[ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.]])

To select out a subset of the rows in a particular order, you can simply pass a list or
ndarray of integers specifying the desired order:

In [128]:
arr[[4,3,0,6]]

array([[ 4.,  4.,  4.,  4.],
       [ 3.,  3.,  3.,  3.],
       [ 0.,  0.,  0.,  0.],
       [ 6.,  6.,  6.,  6.]])

Hopefully this code did what you expected! Using negative indices selects rows from
the end:

In [129]:
arr[[-3, -5, -7]]

array([[ 5.,  5.,  5.,  5.],
       [ 3.,  3.,  3.,  3.],
       [ 1.,  1.,  1.,  1.]])

Passing multiple index arrays does something slightly different; it selects a 1D array of
elements corresponding to each tuple of indices:

In [131]:
# more on reshape in chapter 12
arr = np.arange(32).reshape((8,4))

In [132]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [133]:
arr[[1,5,7,2], [0,3,1,2]]

array([ 4, 23, 29, 10])

Take a moment to understand what just happened: the elements (1, 0), (5, 3), (7,
1), and (2, 2) were selected. The behavior of fancy indexing in this case is a bit different
from what some users might have expected (myself included), which is the rectangular
region formed by selecting a subset of the matrix’s rows and columns. Here is one way
to get that:

In [134]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Another way is to use the np.ix_ function, which converts two 1D integer arrays to an
indexer that selects the square region:

In [135]:
arr[np.ix_([1,5,7,2], [0,3,1,2])]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
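
If you are curious what np.ix_ actually produces, the sketch below unpacks its result. It returns one array per input, shaped so that the two broadcast against each other into the full grid of row/column pairs; the names `rows`, `cols`, and `block` are mine:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

rows, cols = np.ix_([1, 5, 7, 2], [0, 3, 1, 2])
print(rows.shape, cols.shape)  # a column of row indices and a row of column indices

block = arr[rows, cols]  # same result as arr[np.ix_(...)]
print(block)
```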

Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
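
A minimal sketch contrasting the two behaviors, with np.shares_memory used to check for overlap (variable names are illustrative):

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[4, 3, 0, 6]]  # fancy indexing: data is copied into a new array
sliced = arr[2:4]           # slicing: a view on arr

picked[:] = -1  # does not affect arr
print(arr[4])
print(np.shares_memory(arr, picked), np.shares_memory(arr, sliced))
```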

### Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping which similarly returns a view on the underlying
data without copying anything. Arrays have the transpose method and also
the special T attribute:

In [136]:
arr = np.arange(15).reshape((3,5))

In [137]:
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [138]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

When doing matrix computations, you will do this very often, for example when computing
the inner matrix product XᵀX using np.dot:

In [139]:
arr = np.random.randn(6, 3)

In [140]:
np.dot(arr.T, arr)

array([[ 8.71631597, -0.94258049,  2.66123094],
       [-0.94258049,  3.96490088, -1.03423874],
       [ 2.66123094, -1.03423874,  9.02295543]])
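
Since the random example above is different on every run, here is a deterministic sketch of the same computation on a small made-up matrix `X`. It also checks that the result is symmetric and that the @ operator (available in Python 3.5 and later) gives the same product:

```python
import numpy as np

X = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])  # shape (3, 2)

gram = np.dot(X.T, X)  # shape (2, 2)
print(gram)

print(np.array_equal(gram, gram.T))   # the product of X.T and X is symmetric
print(np.array_equal(gram, X.T @ X))  # @ is an alternative spelling of the matrix product
```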

For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute
the axes (for extra mind bending):

In [141]:
arr = np.arange(16).reshape((2,2,4))

In [142]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [143]:
arr.transpose((1,0,2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
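
The permutation is easier to follow when the axes have different lengths. In this sketch (the shape 2 × 3 × 4 is chosen only for illustration), the tuple (1, 0, 2) means the new axis 0 is the old axis 1 and the new axis 1 is the old axis 0, while axis 2 stays put:

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))
permuted = arr.transpose((1, 0, 2))

print(arr.shape, permuted.shape)

# element [i, j, k] of the result is element [j, i, k] of the original
print(permuted[2, 1, 3], arr[1, 2, 3])
```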

Simple transposing with .T is just a special case of swapping axes. ndarray has the
method swapaxes which takes a pair of axis numbers:

In [144]:
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [145]:
arr.swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

swapaxes similarly returns a view on the data without making a copy.
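
A short sketch confirming the view behavior: writing through the swapped array changes the original, and for a 2D array swapaxes(0, 1) matches .T (the name `mat` is illustrative):

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.swapaxes(1, 2)  # shape (2, 4, 2), no data copied

swapped[0, 0, 0] = -1  # writes through to arr
print(arr[0, 0, 0])
print(np.shares_memory(arr, swapped))

mat = np.arange(15).reshape((3, 5))
print(np.array_equal(mat.T, mat.swapaxes(0, 1)))
```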

### Universal Functions: Fast Element-wise Array Functions

A universal function, or ufunc, is a function that performs elementwise operations on
data in ndarrays. You can think of them as fast vectorized wrappers for simple functions
that take one or more scalar values and produce one or more scalar results.
Many ufuncs are simple elementwise transformations, like sqrt or exp:

In [146]:
arr = np.arange(10)

In [147]:
np.sqrt(arr)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])

In [148]:
np.exp(arr)

array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03])

These are referred to as unary ufuncs. Others, such as add or maximum, take 2 arrays
(thus, binary ufuncs) and return a single array as the result:

In [153]:
x = np.random.randn(8)

In [154]:
y = np.random.randn(8)

In [155]:
x

array([-0.69589628, -0.91914113,  1.31490981,  1.05201425, -0.32736247,
        0.37848823,  0.10974476, -0.32588143])

In [156]:
y

array([ 1.47778236,  0.18937067,  0.06745715, -2.58180732,  0.78026212,
       -0.36311702, -0.06417011,  0.95603191])

In [157]:
# element-wise maximum 
np.maximum(x,y) 

array([ 1.47778236,  0.18937067,  1.31490981,  1.05201425,  0.78026212,
        0.37848823,  0.10974476,  0.95603191])
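
Because the random arrays above change from run to run, here is a deterministic sketch of binary ufuncs on two small made-up arrays. It also shows np.modf, a ufunc that returns two arrays (the fractional and integral parts), as an example of a function producing more than one result per element:

```python
import numpy as np

x = np.array([1.5, -2.25, 3.0])
y = np.array([0.5, 4.0, -1.0])

summed = np.add(x, y)      # same as x + y
larger = np.maximum(x, y)  # elementwise maximum
print(summed)
print(larger)

frac, whole = np.modf(x)   # one ufunc call, two result arrays
print(frac)
print(whole)
```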