In [2]:
import numpy as np


# Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y. Suppose we wanted to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr.


In [3]:
xarr = np.arange(start=1.1, step=0.1, stop=1.6)
yarr = np.arange(start=2.1, step=0.1, stop=2.6)
cond = np.array([True, False, True, True, False])
xarr, yarr, cond


(array([1.1, 1.2, 1.3, 1.4, 1.5]),
 array([2.1, 2.2, 2.3, 2.4, 2.5]),
 array([ True, False,  True,  True, False]))

With np.where you can write this very concisely:


In [4]:
result = np.where(cond, xarr, yarr)
result


array([1.1, 2.2, 1.3, 1.4, 2.5])
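
For comparison, here is a minimal sketch of the same selection written first as a pure-Python list comprehension and then with np.where. The arrays are redefined so the block runs on its own; the names `result_loop` and `result_vec` are illustrative only.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure-Python version: one ternary expression evaluated per element
result_loop = [x if c else y for x, y, c in zip(xarr, yarr, cond)]

# Vectorized version: the same selection in a single call
result_vec = np.where(cond, xarr, yarr)

print(result_loop)
print(result_vec)
```

The list comprehension gives the same values, but it runs in interpreted Python for every element and does not extend naturally to multidimensional arrays.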

The second and third arguments to np.where don’t need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with –2. This is very easy to do with np.where:


In [5]:
m = np.random.randn(4, 4)
m


array([[ 0.43147352, -0.24293988,  1.00034004,  0.17874913],
       [-1.25347415, -0.69294511, -0.27002237,  0.26531803],
       [-0.02968785,  0.45212987, -0.67668495,  0.53517712],
       [ 0.05113981,  2.74311302, -0.55341651, -0.93805042]])

In [6]:
result = np.where(m > 0, 2, -2)
result


array([[ 2, -2,  2,  2],
       [-2, -2, -2,  2],
       [-2,  2, -2,  2],
       [ 2,  2, -2, -2]])

You can combine scalars and arrays when using np.where. For example, I can replace all positive values in m with the constant 2 like so:


In [7]:
result = np.where(m > 0, 2, m)
result


array([[ 2.        , -0.24293988,  2.        ,  2.        ],
       [-1.25347415, -0.69294511, -0.27002237,  2.        ],
       [-0.02968785,  2.        , -0.67668495,  2.        ],
       [ 2.        ,  2.        , -0.55341651, -0.93805042]])
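
When more than two outcomes are needed, one possible approach is to nest np.where calls. The sketch below uses a small hand-written array and hypothetical thresholds of 1 and -1; it is an illustration, not the only way to express this.

```python
import numpy as np

m = np.array([[0.5, -1.5],
              [2.0, -0.2]])

# The inner call handles values below -1; the outer call handles values above 1.
# Everything in between is passed through unchanged.
clipped = np.where(m > 1, 2.0, np.where(m < -1, -2.0, m))

print(clipped)
```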

# Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class. You can use aggregations (often called reductions) like sum, mean, and std (standard deviation) either by calling the array instance method or using the top-level NumPy function.

Here I generate some normally distributed random data and compute some aggregate statistics:


In [8]:
arr = np.random.randn(5, 4)
arr


array([[-1.46302906, -0.80702783, -0.40808779, -1.24174822],
       [ 1.7239681 , -1.03997388, -1.18737087, -0.25670151],
       [ 0.43588169,  0.094249  , -1.51841435,  0.36112125],
       [ 1.63713034, -0.78435896, -0.86198267, -0.71655086],
       [ 1.86229504,  0.16348912, -0.41143861,  0.29950148]])

In [9]:
total = arr.sum()
total


-4.1190485772668355

In [10]:
sum_a = arr.sum(axis=0)
sum_b = np.sum(a=arr, axis=0)
sum_a, sum_b


(array([ 4.19624611, -2.37362255, -4.38729428, -1.55437786]),
 array([ 4.19624611, -2.37362255, -4.38729428, -1.55437786]))

In [11]:
mean_a = arr.mean(axis=1)
mean_b = np.mean(a=arr, axis=1)
mean_a, mean_b


(array([-0.97997322, -0.19001954, -0.1567906 , -0.18144054,  0.47846176]),
 array([-0.97997322, -0.19001954, -0.1567906 , -0.18144054,  0.47846176]))
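
The effect of the axis argument is easier to see on a small array with known values. In this sketch (the array `small` is made up for illustration), axis=0 collapses the rows and leaves one value per column, while axis=1 collapses the columns and leaves one value per row.

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_sums = small.sum(axis=0)    # reduce over rows: one value per column
row_means = small.mean(axis=1)  # reduce over columns: one value per row

print(col_sums)   # [5 7 9]
print(row_means)  # [2. 5.]
```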

Other methods like cumsum and cumprod do not aggregate, instead producing an array of the intermediate results:


In [12]:
arr = np.arange(start=0, stop=8)
arr


array([0, 1, 2, 3, 4, 5, 6, 7])

In [13]:
acum = arr.cumsum()
acum


array([ 0,  1,  3,  6, 10, 15, 21, 28])
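
On multidimensional arrays, cumsum and cumprod also accept an axis argument and return an array of the same shape. A small sketch, using an illustrative 3 x 3 array named `grid`:

```python
import numpy as np

grid = np.array([[0, 1, 2],
                 [3, 4, 5],
                 [6, 7, 8]])

down_rows = grid.cumsum(axis=0)     # running total down each column
across_cols = grid.cumprod(axis=1)  # running product along each row

print(down_rows)
print(across_cols)
```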

| Method           | Description                                                                                                          |
| :--------------- | :------------------------------------------------------------------------------------------------------------------- |
| `sum`            | Sum of all the elements in the array or along an axis; zero-length arrays have sum 0                                 |
| `mean`           | Arithmetic mean; zero-length arrays have `NaN` mean                                                                  |
| `std, var`       | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator `n`) |
| `min, max`       | Minimum and maximum                                                                                                  |
| `argmin, argmax` | Indices of minimum and maximum elements, respectively                                                                |
| `cumsum`         | Cumulative sum of elements starting from 0                                                                           |
| `cumprod`        | Cumulative product of elements starting from 1                                                                       |
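
Two entries in the table are worth a quick sketch: the degrees of freedom adjustment for std and var (the ddof parameter), and the index-returning argmin and argmax. The data values below are made up for illustration.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_std = data.std()           # default ddof=0, denominator n
sample_std = data.std(ddof=1)  # denominator n - 1

idx_min = data.argmin()        # position of the smallest element
idx_max = data.argmax()        # position of the largest element

print(pop_std, sample_std)
print(idx_min, idx_max)
```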


# Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False) in the preceding methods. Thus, sum is often used as a means of counting True values in a boolean array:


In [14]:
arr = np.random.randn(10)
arr


array([-2.08535515, -0.065388  , -0.03900853, -1.10027577, -0.47926534,
       -0.53330293,  1.13014956, -0.36744403,  0.92482734, -0.00957533])

In [15]:
count_a = (arr > 0).sum()
count_b = np.sum(a=arr > 0)
count_a, count_b


(2, 2)

There are two additional methods, any and all, useful especially for boolean arrays. any tests whether one or more values in an array is True, while all checks if every value is True:


In [16]:
bools = arr > 0
bools


array([False, False, False, False, False, False,  True, False,  True,
       False])

In [17]:
np.any(arr > 0)


True

In [18]:
np.all(arr > 0)


False
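
any and all are also available as array methods, and they work on non-boolean arrays too, where nonzero elements count as True. The sketch below uses a small fixed array so the results are reproducible; it also shows mean as a way to get the fraction of True values.

```python
import numpy as np

values = np.array([-2.0, -0.1, 1.1, 0.9, 0.0])
bools = values > 0

print(bools.any())   # True
print(bools.all())   # False
print(bools.mean())  # 0.4

# Non-boolean input: nonzero elements are treated as True
counts = np.array([0, 3, 0])
has_nonzero = counts.any()
all_nonzero = counts.all()
```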

# Sorting

Like Python's built-in list type, NumPy arrays can be sorted. np.sort returns a sorted copy of the array, and passing an axis number sorts each one-dimensional section of values along that axis:


In [19]:
arr = np.random.randn(5, 3)
arr


array([[ 0.40024523, -1.16659724,  0.75001878],
       [ 1.22973523,  0.75573638, -1.22773601],
       [-1.54664885,  0.10042932,  0.76790885],
       [ 1.61941165, -0.22804117,  0.35300874],
       [ 1.87648378,  0.36577111, -1.56228005]])

In [20]:
arr = np.sort(a=arr, axis=1)
arr


array([[-1.16659724,  0.40024523,  0.75001878],
       [-1.22773601,  0.75573638,  1.22973523],
       [-1.54664885,  0.10042932,  0.76790885],
       [-0.22804117,  0.35300874,  1.61941165],
       [-1.56228005,  0.36577111,  1.87648378]])
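
The cell above rebinds arr to the copy returned by np.sort. The sort method on the array itself works in place instead. A short sketch of the difference, with an illustrative array named `original`:

```python
import numpy as np

original = np.array([[3, 1, 2],
                     [9, 7, 8]])

# np.sort returns a sorted copy and leaves its input untouched
sorted_copy = np.sort(original, axis=1)

# The sort method modifies the array in place and returns None
in_place = original.copy()
returned = in_place.sort(axis=1)

# Reversing with a slice gives descending order along each row
descending = sorted_copy[:, ::-1]

print(sorted_copy)
print(descending)
```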

# Unique and Other Set Logic

NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is np.unique, which returns the sorted unique values in an array:


In [21]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names


array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [22]:
names = np.unique(ar=names)
names


array(['Bob', 'Joe', 'Will'], dtype='<U4')
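
For reference, np.unique does roughly what sorted(set(...)) does in pure Python. The sketch below compares the two and also uses the return_counts option to count how often each value appears.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# Pure-Python alternative
pure_python = sorted(set(names.tolist()))

# NumPy version, also returning the number of occurrences of each value
values, counts = np.unique(names, return_counts=True)

print(pure_python)
print(values, counts)
```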

Another function, np.in1d, tests membership of the values in one array in another, returning a boolean array (recent NumPy releases recommend the equivalent np.isin instead):


In [23]:
arr = np.array(object=[6, 0, 0, 3, 2, 5, 6])
arr


array([6, 0, 0, 3, 2, 5, 6])

In [24]:
arr = np.in1d(ar1=arr, ar2=[2, 3, 6])
arr


array([ True, False, False,  True,  True, False,  True])

| Method              | Description                                                                        |
| :------------------ | :--------------------------------------------------------------------------------- |
| `unique(x)`         | Compute the sorted, unique elements in `x`                                         |
| `intersect1d(x, y)` | Compute the sorted, common elements in `x` and `y`                                 |
| `union1d(x, y)`     | Compute the sorted union of elements                                               |
| `in1d(x, y)`        | Compute a boolean array indicating whether each element of `x` is contained in `y` |
| `setdiff1d(x, y)`   | Set difference, elements in `x` that are not in `y`                                |
| `setxor1d(x, y)`    | Set symmetric differences; elements that are in either of the arrays, but not both |
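
A quick sketch of the remaining set functions from the table, using two small integer arrays chosen for illustration. Membership is tested here with np.isin, which assumes a NumPy version that provides it.

```python
import numpy as np

x = np.array([6, 0, 0, 3, 2, 5, 6])
y = np.array([2, 3, 6, 7])

common = np.intersect1d(x, y)  # values present in both arrays
either = np.union1d(x, y)      # values present in at least one array
only_x = np.setdiff1d(x, y)    # values in x that are not in y
not_both = np.setxor1d(x, y)   # values in exactly one of the arrays
member = np.isin(x, y)         # elementwise membership test, same shape as x

print(common, either, only_x, not_both)
print(member)
```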


# Linear Algebra

Unlike in some languages such as MATLAB, multiplying two two-dimensional arrays with \* gives an element-wise product rather than a matrix dot product. For matrix multiplication there is a function dot, available both as an array method and as a function in the numpy namespace:


In [26]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x, y, np.shape(a=x), np.shape(a=y)


(array([[1., 2., 3.],
        [4., 5., 6.]]),
 array([[ 6., 23.],
        [-1.,  7.],
        [ 8.,  9.]]),
 (2, 3),
 (3, 2))

In [29]:
res = np.dot(a=x, b=y)
res


array([[ 28.,  64.],
       [ 67., 181.]])

The @ symbol (as of Python 3.5) also works as an infix operator that performs matrix multiplication:


In [30]:
res = x @ y
res


array([[ 28.,  64.],
       [ 67., 181.]])
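
To make the contrast concrete, the sketch below puts the element-wise product, the method form of dot, and a matrix-vector product side by side. The arrays match the ones used above; the other names are illustrative.

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1., 7.], [8., 9.]])

elementwise = x * x     # element-wise product, same shape as x
method_form = x.dot(y)  # array method, equivalent to np.dot(x, y)
vec = x @ np.ones(3)    # 2D array times 1D array gives a 1D result

print(elementwise)
print(method_form)
print(vec)
```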

numpy.linalg has a standard set of matrix decompositions and things like inverse and determinant. These are implemented under the hood via the same industry-standard linear algebra libraries used in other languages like MATLAB and R, such as BLAS, LAPACK, or possibly (depending on your NumPy build) the proprietary Intel MKL (Math Kernel Library):


In [48]:
X = np.random.randn(5, 5)
X


array([[-1.28509582,  1.98107986, -1.33629767, -1.00164732, -1.66440623],
       [ 0.94985483, -0.79587457, -0.97215701,  0.14036755,  0.68050897],
       [-1.84507951, -0.95719089, -1.85382065, -0.33537497, -0.76609293],
       [-1.56024073, -1.49149303, -0.08863089, -1.13232226, -0.07151287],
       [-0.87481565,  0.48963704, -0.68547723,  0.23272887,  0.49606477]])

In [49]:
mat = np.dot(a=X.T, b=X)
mat


array([[ 9.15766741,  0.36299658,  4.95226076,  3.6024355 ,  3.87642064],
       [ 0.36299658,  7.93860403, -0.30257982,  0.027763  , -2.75607164],
       [ 4.95226076, -0.30257982,  6.64516621,  1.76459311,  2.64907665],
       [ 3.6024355 ,  0.027763  ,  1.76459311,  2.47179318,  2.216022  ],
       [ 3.87642064, -2.75607164,  2.64907665,  2.216022  ,  4.07143326]])

In [50]:
mat_inverse = np.linalg.inv(a=mat)
mat_inverse


array([[ 0.36915969, -0.06672067, -0.14116197, -0.31880488, -0.13127507],
       [-0.06672067,  0.26343916, -0.03168244, -0.23125509,  0.38833749],
       [-0.14116197, -0.03168244,  0.27363575,  0.13493938, -0.13853269],
       [-0.31880488, -0.23125509,  0.13493938,  1.41082865, -0.70870028],
       [-0.13127507,  0.38833749, -0.13853269, -0.70870028,  1.1093495 ]])

In [51]:
mat_id = mat @ mat_inverse
mat_id


array([[ 1.00000000e+00,  5.23112635e-17, -3.14452795e-17,
         7.63939021e-17, -3.46793372e-16],
       [-2.02424801e-17,  1.00000000e+00, -1.80972107e-17,
        -3.52912802e-16,  1.90032722e-16],
       [ 1.50525585e-16, -6.84120725e-17,  1.00000000e+00,
         3.91540663e-16, -5.50444470e-16],
       [-9.52332529e-17, -8.11702344e-17,  5.31810414e-17,
         1.00000000e+00, -1.25813677e-16],
       [ 2.05521509e-16,  2.53505276e-16, -2.05783203e-17,
        -1.82260892e-17,  1.00000000e+00]])

In [52]:
q, r = np.linalg.qr(a=mat)
q, r


(array([[-0.78375592, -0.09011992,  0.37777654,  0.47530518, -0.0947361 ],
        [-0.03106694, -0.94674052, -0.02746135, -0.1530495 ,  0.28024802],
        [-0.42383759,  0.01064367, -0.9001399 , -0.00135784, -0.09997364],
        [-0.30831325, -0.02175974,  0.20288745, -0.77571508, -0.51144134],
        [-0.33176217,  0.30817853,  0.07152345, -0.38590816,  0.80057426]]),
 array([[-11.68433584,   0.50291644,  -8.11134555,  -5.06947241,
          -6.10929965],
        [  0.        ,  -8.40169814,   0.68888769,   0.29699103,
           3.49464616],
        [  0.        ,   0.        ,  -3.55493714,   0.43176592,
          -0.10362734],
        [  0.        ,   0.        ,   0.        ,  -1.06697713,
          -1.02949985],
        [  0.        ,   0.        ,   0.        ,   0.        ,
           0.72166099]]))
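
Because floating-point results are rarely exact, as the near-zero off-diagonal entries above show, a common way to check these computations is np.allclose. The sketch below uses a small hand-picked symmetric matrix (an assumption made so the example is reproducible) instead of random data.

```python
import numpy as np

# Small symmetric, invertible matrix chosen for illustration
mat = np.array([[4., 1., 2.],
                [1., 3., 0.],
                [2., 0., 5.]])

inv = np.linalg.inv(mat)
q, r = np.linalg.qr(mat)

# mat times its inverse should be the identity, up to rounding error
inverse_ok = np.allclose(mat @ inv, np.eye(3))

# q times r should reconstruct mat; q is orthogonal and r is upper triangular
qr_ok = np.allclose(q @ r, mat)
q_orthogonal = np.allclose(q.T @ q, np.eye(3))
r_upper = np.allclose(r, np.triu(r))

print(inverse_ok, qr_ok, q_orthogonal, r_upper)
```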

| Function | Description                                                                                                                                                |
| :------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `diag`   | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
| `dot`    | Matrix multiplication                                                                                                                                      |
| `trace`  | Compute the sum of the diagonal elements                                                                                                                   |
| `det`    | Compute the matrix determinant                                                                                                                             |
| `eig`    | Compute the eigenvalues and eigenvectors of a square matrix                                                                                                |
| `inv`    | Compute the inverse of a square matrix                                                                                                                     |
| `pinv`   | Compute the Moore-Penrose pseudo-inverse of a matrix                                                                                                       |
| `qr`     | Compute the QR decomposition                                                                                                                               |
| `svd`    | Compute the singular value decomposition (SVD)                                                                                                             |
| `solve`  | Solve the linear system Ax = b for x, where A is a square matrix                                                                                           |
| `lstsq`  | Compute the least-squares solution to `Ax = b`                                                                                                             |


# Pseudorandom Number Generation

The numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. Some of the functions available in numpy.random:


| Function      | Description                                                                                          |
| :------------ | :--------------------------------------------------------------------------------------------------- |
| `seed`        | Seed the random number generator                                                                     |
| `permutation` | Return a random permutation of a sequence, or return a permuted range                                |
| `shuffle`     | Randomly permute a sequence in-place                                                                 |
| `rand`        | Draw samples from a uniform distribution                                                             |
| `randint`     | Draw random integers from a given low-to-high range                                                  |
| `randn`       | Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface) |
| `binomial`    | Draw samples from a binomial distribution                                                            |
| `normal`      | Draw samples from a normal (Gaussian) distribution                                                   |
| `beta`        | Draw samples from a beta distribution                                                                |
| `chisquare`   | Draw samples from a chi-square distribution                                                          |
| `gamma`       | Draw samples from a gamma distribution                                                               |
| `uniform`     | Draw samples from a uniform \[0, 1\) distribution                                                    |
