# NumPy Basics: Arrays and Vectorized Computation

> NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. Many computational packages providing scientific functionality use NumPy's array objects as one of the standard interfaces (a lingua franca) for data exchange. NumPy provides:



* ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.

* Mathematical functions for fast operations on entire arrays of data without having to write loops.

* Tools for reading/writing array data to disk and working with memory-mapped files.

* Linear algebra, random number generation, and Fourier transform capabilities.

* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

## What will be covered?

For most data analysis applications, the main areas of functionality I’ll focus on are:

 * Fast array-based operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations

 * Common array algorithms like sorting, unique, and set operations

 * Efficient descriptive statistics and aggregating/summarizing data

 * Data alignment and relational data manipulations for merging and joining together heterogeneous datasets

 * Expressing conditional logic as array expressions instead of loops with if-elif-else branches

 * Group-wise data manipulations (aggregation, transformation, function application)



> While NumPy provides a computational foundation for general numerical data processing, many readers will want to use pandas as the basis for most kinds of statistics or analytics, especially on tabular data. pandas also provides some more domain-specific functionality like time series manipulation, which is not present in NumPy.

### Why Is NumPy So Important?



One of the reasons NumPy is so important for numerical computations in Python is because it is designed for efficiency on large arrays of data. There are a number of reasons for this:

* NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy's library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.

* NumPy operations perform complex computations on entire arrays without the need for Python for loops, which can be slow for large sequences. NumPy is faster than regular Python code because its C-based algorithms avoid overhead present with regular interpreted Python code.
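The memory claim in the first point can be checked roughly. The sketch below compares the bytes held by a Python list of integers (the list object plus every int object it points to) with the data buffer of an equivalent NumPy array. The size `n = 1000` and the `int64` dtype are arbitrary choices for illustration, and the exact list size varies by Python version.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers; each element is a separate Python int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores the raw values back to back: 8 bytes per int64.
arr_bytes = np_arr.nbytes

print(list_bytes, arr_bytes)
```

The array's buffer is exactly `n * 8` bytes, while the list pays for a pointer and a full Python object per element.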


## Let's Compare the Performance of NumPy and Python

In [None]:
import numpy as np
np.random.seed(12345)
import matplotlib.pyplot as plt
plt.rc('figure', figsize=(10, 6))
np.set_printoptions(precision=4, suppress=True)

Consider a NumPy array of one million integers, and the equivalent Python list:

In [None]:
list(range(5))

In [None]:
import numpy as np
my_arr = np.arange(1000000) # numpy array
my_list = list(range(1000000)) # python list

Now let's multiply each sequence by 2:

In [None]:
# %time runs each statement once; %timeit (available on every platform) averages repeated runs
%time for _ in range(10): my_arr2 = my_arr * 2
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

> NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.
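The `%time` magic only works inside IPython or Jupyter. As a rough sketch, the same comparison can be run in plain Python with the standard-library `timeit` module. The variable names `numpy_time` and `list_time` are illustrative, and the measured ratio depends on the machine.

```python
import timeit
import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Total seconds for 10 repetitions of each operation.
numpy_time = timeit.timeit(lambda: my_arr * 2, number=10)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy: {numpy_time:.4f} s, list: {list_time:.4f} s")
```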

## The NumPy ndarray: A Multidimensional Array Object

- One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python. 

- Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.

Let us see how NumPy enables batch computations:

In [None]:
import numpy as np
# Generate some random data
data = np.random.randn(2, 3)
data

In [None]:
data * 10

In [None]:
data + data

In the first example, all of the elements have been multiplied by 10. In the second, the corresponding values in each "cell" in the array have been added to each other.

> An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array:

In [None]:
data

In [None]:
data.shape

In [None]:
data.dtype

>  Whenever you see “array,” “NumPy array,” or “ndarray” in the book text, in most cases they all refer to the ndarray object.


### Creating ndarrays

> The simplest way to create an array is to use the array function:

In [None]:
import numpy as np

In [None]:
x1 = np.array([[5, 10, 15],[3, 4, 5]])
x2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
x3 = np.array([1, 2, 3])

In [None]:
x1

In [None]:
x1
x1.shape

In [None]:
x2.shape

In [None]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

In [None]:
arr2.ndim

In [None]:
arr2.shape

In [None]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

In [None]:
arr1.ndim

In [None]:
arr1.shape

In [None]:
arr1.dtype
arr2.dtype

In [None]:
np.array((1, 2, 3))

> Note: To create an ndarray, we can pass a list, tuple, or any array-like object to the array() function, and it will be converted into an ndarray.

Changing Dimensions

In [None]:
p = np.array([1, 2, 3, 4, 5])
print(p.shape)

In [None]:
p = np.array([1, 2, 3, 4, 5], ndmin = 2)

print(p.shape)


In [None]:
p = np.array([[1, 2, 3, 4, 5]])

print(p.shape)


### Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).

- 0-D arrays, or scalars, are the elements in an array. Each value in an array is a 0-D array.

- An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array (often used to represent a vector, or 1st order tensor).

- An array that has 1-D arrays as its elements is called a 2-D array (often used to represent a matrix, or 2nd order tensor).

- An array that has 2-D arrays (matrices) as its elements is called a 3-D array (often used to represent a 3rd order tensor).

In [None]:
import numpy as np

a = np.array(42)                      # 0-D
b = np.array([1, 2, 3, 4, 5])         # 1-D
c = np.array([[1, 2, 3], [4, 5, 6]])  # 2-D
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])  # 3-D

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

Other functions for creating an ndarray are:

In [None]:
# np.zeros(10)
# np.zeros((3, 6))
np.empty((2, 3, 2))

> It’s not safe to assume that np.empty will return an array of all zeros. This function returns uninitialized memory and thus may contain non-zero "garbage" values.
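When initialized values are needed, a minimal sketch of the safer alternatives looks like this. `np.zeros`, `np.ones`, and `np.full` are standard NumPy creation functions; the names `sevens` and `buf` are only for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))          # all 0.0
ones = np.ones((2, 3))            # all 1.0
sevens = np.full((2, 3), 7.0)     # all 7.0

# If np.empty is used for speed, assign every element before reading it.
buf = np.empty((2, 3))
buf[:] = 0.0

print(zeros, ones, sevens, buf, sep="\n")
```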


arange is an array-valued version of the built-in Python range function:

In [None]:
np.arange(15)

<center>

![](./images/important_function_creationdy.png)

</center>

### Data Types for ndarrays

> The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data:

In [None]:
a = 3
type(a)

In [None]:
array = np.arange(10)
array.dtype

In [None]:
type(array)

> Since NumPy is focused on numerical computing, the data type, if not specified, will in many cases be float64 (floating point).
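A short sketch of how the default dtype is chosen when none is given. The exact integer width is platform-dependent, so only the float results are stated as certain here.

```python
import numpy as np

ints = np.array([1, 2, 3])         # an integer dtype (width depends on platform)
floats = np.array([1.0, 2, 3])     # one float promotes the whole array to float64
zeros = np.zeros(3)                # creation functions default to float64

print(ints.dtype, floats.dtype, zeros.dtype)
```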

In [None]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

In [None]:
arr1.dtype

In [None]:
arr2.dtype

In [None]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method:

In [None]:
float_arr = arr.astype(np.float64)
float_arr.dtype

In [None]:
a = 3.50
int(a)

In [None]:
float_arr

In [None]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr
arr.astype(np.int32)
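Casting floating-point values to an integer dtype drops the fractional part rather than rounding. The sketch below contrasts that with rounding first via `np.rint`, which is one possible alternative and not something the original text prescribes.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

truncated = arr.astype(np.int32)           # drops the fractional part
rounded = np.rint(arr).astype(np.int32)    # rounds to the nearest integer first

print(truncated)  # [ 3 -1 -2  0 12 10]
print(rounded)    # [ 4 -1 -3  0 13 10]
```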

If you have an array of strings representing numbers, you can use astype to convert them to numeric form:

In [None]:
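# np.bytes_ is the fixed-width byte-string dtype (np.string_ was an older alias, removed in NumPy 2.0)
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
numeric_strings.astype(float) # note we use float here, not float64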

> NumPy aliases the Python types to its own equivalent dtypes, so float here means np.float64.

You can also use another array’s dtype attribute:

In [None]:
int_array = np.arange(10)
int_array

In [None]:
int_array.dtype

In [None]:
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)

There are shorthand type code strings you can also use to refer to a dtype:

In [None]:
empty_uint32 = np.empty(8, dtype='u4')
empty_uint32

<center>

![](./images/numpy_data_type.png)

</center>


### Arithmetic with NumPy Arrays

> Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays applies the operation element-wise:

In [None]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

In [None]:
arr * arr

In [None]:
arr - arr

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [None]:
1 / arr

In [None]:
arr ** 0.5

Comparisons between arrays of the same size yield boolean arrays:

In [None]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2
arr2 > arr

### Basic Indexing 

There are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:

In [346]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [347]:
arr[5]

5

In [349]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [348]:
arr[5:8]

array([5, 6, 7])

In [350]:
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

If you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or broadcast, as we will say from here on) to the entire selection.

> An important first distinction from Python's built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array

In [352]:
a = [1,3,5,7,9]

b = a[1:3]

In [353]:
b

[3, 5]

In [357]:
b[1] =100

In [358]:
b

[3, 100]

In [359]:
a

[1, 3, 5, 7, 9]

In [360]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [361]:
arr_slice = arr[5:8]
arr_slice

array([12, 12, 12])

In [362]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [363]:
arr_slice

array([12, 12, 12])

In [364]:
arr_slice[0] = 12345
arr_slice

array([12345,    12,    12])

In [365]:
arr

array([    0,     1,     2,     3,     4, 12345,    12,    12,     8,
           9])

> If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array—for example, arr[5:8].copy().


In [None]:
arr = np.arange(10)
arr_slice = arr[5:8].copy()
arr_slice[1] = 12345
arr
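Whether a given result is a view or a copy can also be checked directly. This sketch uses `np.shares_memory` and the `base` attribute, both standard NumPy; the names `view` and `copy` are illustrative.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # slice: a view on arr
copy = arr[5:8].copy()   # explicit copy: independent data

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False
print(view.base is arr)             # True
```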

Slicing higher dimensions is also possible:

In [367]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[0]

array([1, 2, 3])

Individual elements can be accessed recursively.

In [368]:
arr2d[0][0]

1

 But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent:

In [369]:
arr2d[0, 0]

1

In multidimensional arrays, if you omit later indices, the returned object will be a lower dimensional ndarray consisting of all the data along the higher dimensions. So in the 2 × 2 × 3 array arr3d:

In [375]:
arr3d = np.array([
                  [[1, 2, 3], [4, 5, 6]], 
                  [[7, 8, 9], [10, 11, 12]]
                  
                 ])
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [371]:
arr3d[0] #is a 2 × 3 array:

array([[1, 2, 3],
       [4, 5, 6]])

In [372]:
arr3d[1] #is a 2 × 3 array:

array([[ 7,  8,  9],
       [10, 11, 12]])

Both scalar values and arrays can be assigned to arr3d[0]:

In [373]:
old_values = arr3d[0].copy()
old_values

array([[1, 2, 3],
       [4, 5, 6]])

In [376]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [374]:
arr3d[0] = 42
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [377]:
arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

> Similarly, arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:

In [378]:
arr3d[1, 0]

array([7, 8, 9])

In [None]:
arr3d[1, 1]

In [None]:
arr3d[0, 1]

Note that in all of these cases where subsections of the array have been selected, the returned arrays are views.

> The expression arr3d[1, 0] is the same as though we had indexed in two steps:

In [None]:
arr3d = np.array([
                  [[1, 2, 3], [4, 5, 6]], 
                  [[7, 8, 9], [10, 11, 12]]
                  
                 ])
arr3d

In [None]:
x = arr3d[1]
x
x[0]

In [None]:
x[0]

#### Indexing with slices

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [379]:
arr = np.array([0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
arr[1:6]

array([ 1,  2,  3,  4, 64])

Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different:

In [380]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [381]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

You can pass multiple slices just like you can pass multiple indexes:

In [382]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

When slicing like this, you always obtain array views of the same number of dimensions. By mixing integer indexes and slices, you get lower dimensional slices.



In [383]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

For example, I can select the second row but only the first two columns like so:



In [384]:
arr2d[1, :2]

array([4, 5])

Similarly, we can select the third column but only the first two rows like so:



In [None]:
arr2d[:2, 2]
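The difference between an integer index and a slice shows up in the shape of the result. A minimal sketch on the same 3 × 3 array (the variable names are illustrative):

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_as_1d = arr2d[1, :2]     # integer index drops the row axis
row_as_2d = arr2d[1:2, :2]   # slice keeps it, with length 1

print(row_as_1d.shape)  # (2,)
print(row_as_2d.shape)  # (1, 2)
```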

Note that a colon by itself means to take the entire axis, so you can slice only the later axes like this:

In [385]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [386]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

Assigning to a slice expression assigns to the whole selection:



In [387]:
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

### Boolean Indexing

Let’s consider an example where we have some data in an array and an array of names with duplicates. 


In [388]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names


array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [389]:
data = np.random.randn(7, 4)
data

array([[ 1.2525,  1.2563, -1.2063, -0.89  ],
       [-0.6116, -1.0631,  1.6946, -0.0862],
       [-2.1832,  0.2069,  0.9488,  1.418 ],
       [-1.0427, -0.0559, -0.7509,  1.0754],
       [-0.7558,  0.63  , -0.7485, -0.1017],
       [ 0.2334, -1.3075,  1.8284, -1.7304],
       [ 0.1441,  0.5168,  0.6293, -2.089 ]])

> Suppose each name corresponds to a row in the data array and we wanted to select all the rows with corresponding name 'Bob'. Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string 'Bob' yields a boolean array:

In [390]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

This boolean array can be passed when indexing the array. The boolean array must be of the same length as the array axis it's indexing.



In [391]:
data[names == 'Bob']

array([[ 1.2525,  1.2563, -1.2063, -0.89  ],
       [-1.0427, -0.0559, -0.7509,  1.0754]])

 You can even mix and match boolean arrays with slices or integers (or sequences of integers; more on this later).

> Below, we select from the rows where names == 'Bob' and index the columns, too:

In [392]:
data[names == 'Bob', 2:]


array([[-1.2063, -0.89  ],
       [-0.7509,  1.0754]])

In [393]:
data[names == 'Bob', 3]

array([-0.89  ,  1.0754])

> To select everything but 'Bob', you can either use != or negate the condition using ~:



In [394]:
names != 'Bob'


array([False,  True,  True, False,  True,  True,  True])

In [396]:
~(names == 'Bob')

array([False,  True,  True, False,  True,  True,  True])

In [397]:
data[~(names == 'Bob')]

array([[-0.6116, -1.0631,  1.6946, -0.0862],
       [-2.1832,  0.2069,  0.9488,  1.418 ],
       [-0.7558,  0.63  , -0.7485, -0.1017],
       [ 0.2334, -1.3075,  1.8284, -1.7304],
       [ 0.1441,  0.5168,  0.6293, -2.089 ]])

In [None]:
cond = names == 'Bob'
data[~cond]

To select two of the three names by combining multiple boolean conditions, use boolean arithmetic operators like & (and) and | (or):



In [398]:
mask = (names == 'Bob') | (names == 'Will')
mask


array([ True, False,  True,  True,  True, False, False])

The Python keywords **and** and **or** do not work with boolean arrays. Use & (and) and | (or) instead.



In [399]:
data[mask]

array([[ 1.2525,  1.2563, -1.2063, -0.89  ],
       [-2.1832,  0.2069,  0.9488,  1.418 ],
       [-1.0427, -0.0559, -0.7509,  1.0754],
       [-0.7558,  0.63  , -0.7485, -0.1017]])
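The restriction on the `and` and `or` keywords can be seen directly. In this sketch the keyword form raises a `ValueError`, because Python tries to reduce a whole array to a single truth value, while the `|` operator works element-wise. The parentheses around each comparison matter, since `|` and `&` bind more tightly than `==`.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

try:
    mask = (names == 'Bob') or (names == 'Will')
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Element-wise "or" with the | operator.
mask = (names == 'Bob') | (names == 'Will')
print(mask)
```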

> Selecting data from an array by boolean indexing and assigning the result to a new variable always creates a copy of the data, even if the returned array is unchanged.
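A small sketch of that copy behavior, using an illustrative 3 × 4 array rather than the random data above: changing the selected rows leaves the source untouched.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
selected = data[np.array([True, False, True])]   # boolean indexing: a copy

selected[:] = -1   # modifies only the copy

print(np.shares_memory(data, selected))  # False
print(data)                              # still 0 through 11
```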



Setting values with boolean arrays works by substituting the value or values on the right hand side into the locations where the boolean array's values are True. To set all of the negative values in data to 0 we need only do:



In [None]:
data[data < 0] = 0
data

You can also set whole rows or columns using a one-dimensional boolean array:



In [None]:
data[names != 'Joe'] = 7
data

> These types of operations on two-dimensional data are convenient to do with pandas.

### Fancy Indexing

In [400]:
x = np.array([11,3,5])

In [401]:
x[1]

3

In [403]:
mask = x > 3
mask

array([ True, False,  True])

In [405]:
x

array([11,  3,  5])

In [406]:
mask

array([ True, False,  True])

In [407]:
x[mask]

array([11,  5])

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays. Suppose we had an 8 × 4 array:



In [408]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:



In [409]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

 Using negative indices selects rows from the end:



In [410]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

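Passing multiple index arrays, as in arr[[1, 5, 7, 2], [0, 3, 1, 2]], does something slightly different: it selects a one-dimensional array of elements corresponding to each tuple of indices, here (1, 0), (5, 3), (7, 1), and (2, 2). With one one-dimensional integer array per axis, the result is always one-dimensional. Consider this 8 × 4 array and a selection of its rows: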



In [411]:
arr = np.arange(32).reshape((8, 4))
arr


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [412]:
arr[[1, 5, 7, 2]]

array([[ 4,  5,  6,  7],
       [20, 21, 22, 23],
       [28, 29, 30, 31],
       [ 8,  9, 10, 11]])
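To see what passing multiple index arrays does, here is a minimal sketch on the same 8 × 4 array. One index array per axis picks the elements at positions (1, 0), (5, 3), (7, 1), and (2, 2); the name `picked` is illustrative.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Row indices and column indices are paired up element by element.
picked = arr[[1, 5, 7, 2], [0, 3, 1, 2]]

print(picked)        # [ 4 23 29 10]
print(picked.shape)  # (4,)
```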

The behavior of fancy indexing with multiple index arrays is a bit different from what some users might have expected, which is the rectangular region formed by selecting a subset of the matrix's rows and columns. Here is one way to get that:



In [413]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
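Another way to select the same rectangular region, offered here as an alternative sketch, is `np.ix_`, which builds index arrays that broadcast against each other. The names `rows`, `cols`, `chained`, and `with_ix` are illustrative.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

chained = arr[rows][:, cols]        # two indexing steps, as above
with_ix = arr[np.ix_(rows, cols)]   # one step using an open mesh of indices

print(np.array_equal(chained, with_ix))  # True
```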

> Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.

### Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute:



In [416]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [417]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

When doing matrix computations, you may do this very often—for example, when computing the inner matrix product using np.dot:



In [418]:
arr = np.random.randn(6, 3)
arr


array([[-0.0939, -1.4428,  0.0988],
       [-0.1116,  3.6406,  1.0246],
       [ 1.1633, -0.4258, -0.764 ],
       [-1.0624, -0.1659, -1.1122],
       [ 0.6888, -0.7401, -0.2246],
       [ 0.2968,  0.218 , -0.232 ]])

In [419]:
arr.shape

(6, 3)

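Can we take the matrix product of arr with itself? Both operands have shape (6, 3), so the inner dimensions (3 and 6) do not match and np.dot raises an error, even though the element-wise product arr * arr is fine: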

In [420]:
np.dot(arr, arr)

ValueError: shapes (6,3) and (6,3) not aligned: 3 (dim 1) != 6 (dim 0)

In [421]:
arr.T

array([[-0.0939, -0.1116,  1.1633, -1.0624,  0.6888,  0.2968],
       [-1.4428,  3.6406, -0.4258, -0.1659, -0.7401,  0.218 ],
       [ 0.0988,  1.0246, -0.764 , -1.1122, -0.2246, -0.232 ]])

In [426]:
arr.T.shape

(3, 6)

In [427]:
np.dot(arr.T, arr)

array([[ 3.0658, -1.0348, -0.0542],
       [-1.0348, 16.1397,  4.2128],
       [-0.0542,  4.2128,  2.9844]])

The @ infix operator is another way to do matrix multiplication:



In [428]:
arr.T @ arr

array([[ 3.0658, -1.0348, -0.0542],
       [-1.0348, 16.1397,  4.2128],
       [-0.0542,  4.2128,  2.9844]])

For higher dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes:



In [429]:
arr = np.arange(16).reshape((2, 2, 4))
arr


array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [430]:
arr.transpose((1, 0, 2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

Here, the axes have been reordered with the second axis first, the first axis second, and the last axis unchanged. While it can be difficult to visualize a multidimensional transposition, it is a "reorientation" of the array which does not result in any data being copied or moved around.

> Simple transposing with .T is a special case of swapping axes. ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:



In [None]:
arr

In [None]:
arr.shape

In [None]:
arr.swapaxes(1, 2)

In [None]:
arr.swapaxes(1, 2).shape

> swapaxes similarly returns a view on the data without making a copy.
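The view behavior can be verified with a short sketch: writing through the swapped array changes the original, because both refer to the same memory. The name `swapped` is illustrative.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.swapaxes(1, 2)

print(swapped.shape)                    # (2, 4, 2)
print(np.shares_memory(arr, swapped))   # True

swapped[0, 3, 1] = -1   # same element as arr[0, 1, 3]
print(arr[0, 1, 3])                     # -1
```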



## Universal Functions: Fast Element-Wise Array Functions

A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.



In [431]:
arr = np.arange(10)
arr


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [432]:
np.sqrt(arr)

array([0.    , 1.    , 1.4142, 1.7321, 2.    , 2.2361, 2.4495, 2.6458,
       2.8284, 3.    ])

In [433]:
np.exp(arr)

array([   1.    ,    2.7183,    7.3891,   20.0855,   54.5982,  148.4132,
        403.4288, 1096.6332, 2980.958 , 8103.0839])

> These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:



In [434]:
x = np.random.randn(8)
y = np.random.randn(8)


In [435]:
x


array([ 3.0089,  1.6861,  0.4341,  1.2239, -1.9554,  0.5037, -0.1961,
        1.6686])

In [436]:
y


array([-0.4837, -0.0579, -1.3437,  0.4177, -1.1981,  0.7666, -0.3486,
       -0.3737])

In [437]:
np.maximum(x, y)

array([ 3.0089,  1.6861,  0.4341,  1.2239, -1.1981,  0.7666, -0.1961,
        1.6686])

Here, numpy.maximum computed the element-wise maximum of the elements in x and y.



> While not common, a ufunc can return multiple arrays. modf is one example, a vectorized version of Python's math.modf; it returns the fractional and integral parts of a floating-point array:



In [438]:
arr = np.random.randn(7) * 5
arr


array([ 3.9905,  6.2159, 10.4706, -2.9525, -1.1402,  3.2345, -1.1191])

In [439]:
remainder, whole_part = np.modf(arr)


In [440]:
remainder


array([ 0.9905,  0.2159,  0.4706, -0.9525, -0.1402,  0.2345, -0.1191])

In [441]:
whole_part

array([ 3.,  6., 10., -2., -1.,  3., -1.])

Ufuncs accept an optional out argument that allows them to assign their results into an existing array rather than creating a new one:



In [None]:
arr


In [None]:
out = np.zeros_like(arr)
out

In [442]:
np.add(arr, 1)

array([ 4.9905,  7.2159, 11.4706, -1.9525, -0.1402,  4.2345, -0.1191])

In [None]:
np.add(arr, 1, out=out)

## Array-Oriented Programming with Arrays

In [None]:
points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
xs, ys = np.meshgrid(points, points)
ys

In [None]:
z = np.sqrt(xs ** 2 + ys ** 2)
z
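With 1000 points the grids are too large to read, so here is the same computation on three illustrative points. `np.meshgrid` takes two one-dimensional arrays and returns two two-dimensional arrays covering every (x, y) pair: `xs` varies along the columns and `ys` along the rows.

```python
import numpy as np

points = np.array([0.0, 3.0, 4.0])
xs, ys = np.meshgrid(points, points)

print(xs)   # each row is [0. 3. 4.]
print(ys)   # each column is [0. 3. 4.]

# Evaluate sqrt(x^2 + y^2) at every grid point in one expression.
z = np.sqrt(xs ** 2 + ys ** 2)
print(z[1, 2])   # x = 4, y = 3, so 5.0
```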

In [None]:
import matplotlib.pyplot as plt
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")  # raw string keeps \s from being read as an escape

In [None]:
plt.draw()

In [None]:
plt.close('all')

### Expressing Conditional Logic as Array Operations

In [None]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

In [None]:
result = [(x if c else y)
          for x, y, c in zip(xarr, yarr, cond)]
result

In [None]:
result = np.where(cond, xarr, yarr)
result

In [None]:
arr = np.random.randn(4, 4)
arr
arr > 0
np.where(arr > 0, 2, -2)

In [None]:
np.where(arr > 0, 2, arr) # set only positive values to 2

### Mathematical and Statistical Methods

In [None]:
arr = np.random.randn(5, 4)
arr
arr.mean()
np.mean(arr)
arr.sum()

In [None]:
arr.mean(axis=1)
arr.sum(axis=0)
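The `axis` argument names the axis that is collapsed by the aggregation. A small sketch with an illustrative 2 × 3 array makes the direction easier to see than with random data:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5]])   # shape (2, 3)

col_sums = arr.sum(axis=0)     # collapses the rows: one value per column
row_means = arr.mean(axis=1)   # collapses the columns: one value per row

print(col_sums)    # [3 5 7]
print(row_means)   # [1. 4.]
```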

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])
arr.cumsum()

In [None]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr
arr.cumsum(axis=0)
arr.cumprod(axis=1)

### Methods for Boolean Arrays

In [None]:
arr = np.random.randn(100)
(arr > 0).sum() # Number of positive values

In [None]:
bools = np.array([False, False, True, False])
bools.any()
bools.all()

### Sorting

In [None]:
arr = np.random.randn(6)
arr
arr.sort()
arr

In [None]:
arr = np.random.randn(5, 3)
arr
arr.sort(1)
arr
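The `sort` method used above sorts the array in place. As a brief sketch of the difference, the top-level `np.sort` function returns a sorted copy and leaves its input alone; the variable names are illustrative.

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)    # new array; arr is unchanged
unchanged = arr.tolist()      # [3, 1, 2]

result = arr.sort()           # sorts arr in place and returns None
print(unchanged, sorted_copy, arr, result)
```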

In [None]:
large_arr = np.random.randn(1000)
large_arr.sort()
large_arr[int(0.05 * len(large_arr))] # 5% quantile

### Unique and Other Set Logic

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

In [None]:
sorted(set(names))

In [None]:
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6]) # np.in1d is the older spelling, deprecated in NumPy 2.0

## File Input and Output with Arrays

In [None]:
arr = np.arange(10)
np.save('some_array', arr)

In [None]:
np.load('some_array.npy')

In [None]:
np.savez('array_archive.npz', a=arr, b=arr)

In [None]:
arch = np.load('array_archive.npz')
arch['b']

In [None]:
np.savez_compressed('arrays_compressed.npz', a=arr, b=arr)

In [None]:
!rm some_array.npy
!rm array_archive.npz
!rm arrays_compressed.npz
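The `!rm` lines rely on a Unix shell. As a portable sketch of the same save and load round trip, the files can be written to a temporary directory that Python cleans up on its own. The use of `tempfile` and the path names are assumptions for illustration, not part of the original notebook.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    npy_path = os.path.join(tmpdir, 'some_array.npy')
    npz_path = os.path.join(tmpdir, 'array_archive.npz')

    np.save(npy_path, arr)
    loaded = np.load(npy_path)

    np.savez(npz_path, a=arr, b=arr)
    with np.load(npz_path) as arch:   # closes the archive file on exit
        loaded_b = arch['b']

# The directory and both files have been removed at this point.
print(loaded, loaded_b)
```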

## Linear Algebra

In [None]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y
x.dot(y)

In [None]:
np.dot(x, y)

In [None]:
np.dot(x, np.ones(3))

In [None]:
x @ np.ones(3)

In [None]:
from numpy.linalg import inv, qr
X = np.random.randn(5, 5)
mat = X.T.dot(X)
inv(mat)
mat.dot(inv(mat))
q, r = qr(mat)
r
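Because the cell above uses random input, its printed numbers are hard to check by eye. The sketch below runs the same operations on a small fixed matrix (chosen only for illustration) and verifies the results numerically with `np.allclose`.

```python
import numpy as np
from numpy.linalg import inv, qr

X = np.array([[1., 2.], [3., 4.], [5., 6.]])
mat = X.T @ X                      # [[35, 44], [44, 56]]

# A matrix times its inverse should be the identity, up to rounding.
identity_ok = np.allclose(mat @ inv(mat), np.eye(2))

# QR decomposition: q @ r reconstructs mat, and r is upper triangular.
q, r = qr(mat)
qr_ok = np.allclose(q @ r, mat)
r_is_upper = np.allclose(r, np.triu(r))

print(identity_ok, qr_ok, r_is_upper)
```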

## Pseudorandom Number Generation

In [None]:
samples = np.random.normal(size=(4, 4))
samples

In [None]:
from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]
%timeit np.random.normal(size=N)

In [None]:
np.random.seed(1234)

In [None]:
rng = np.random.RandomState(1234)
rng.randn(10)
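`np.random.RandomState` still works, but NumPy's documentation describes it as a legacy interface. A hedged sketch of the same idea with the newer `np.random.default_rng` generator: two generators built from the same seed produce the same draws, independent of any global state.

```python
import numpy as np

rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)

draws_a = rng_a.standard_normal(10)
draws_b = rng_b.standard_normal(10)

print(np.array_equal(draws_a, draws_b))  # True
```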

## Example: Random Walks

In [None]:
import random
position = 0
walk = [position]
steps = 1000
for i in range(steps):
    step = 1 if random.randint(0, 1) else -1
    position += step
    walk.append(position)

In [None]:
plt.figure()

In [None]:
plt.plot(walk[:100])

In [None]:
np.random.seed(12345)

In [None]:
nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()

In [None]:
walk.min()
walk.max()

In [None]:
(np.abs(walk) >= 10).argmax()
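Calling `argmax` on a boolean array returns the index of the first `True`, which is how the first crossing time is found. A small deterministic sketch with illustrative steps also shows the edge case: if the threshold is never reached, `argmax` returns 0, so it is worth checking with `any` first.

```python
import numpy as np

steps = np.array([1, 1, -1, 1, 1, 1])
walk = steps.cumsum()                 # [1 2 1 2 3 4]

first_hit = (np.abs(walk) >= 3).argmax()     # 4: first index where |walk| >= 3
never_hit = (np.abs(walk) >= 10).argmax()    # 0, although no element is True
reached = (np.abs(walk) >= 10).any()         # False

print(first_hit, never_hit, reached)
```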

### Simulating Many Random Walks at Once

In [None]:
nwalks = 5000
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps)) # 0 or 1
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)
walks

In [None]:
walks.max()
walks.min()

In [None]:
hits30 = (np.abs(walks) >= 30).any(1)
hits30
hits30.sum() # Number that hit 30 or -30

In [None]:
crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)
crossing_times.mean()

In [None]:
steps = np.random.normal(loc=0, scale=0.25,
                         size=(nwalks, nsteps))

## Conclusion