# Introduction to NumPy


<a id='index-1'></a>

### References

- NumPy documentation: [http://docs.scipy.org/doc/numpy/reference/](http://docs.scipy.org/doc/numpy/reference/)  

### Importing the numpy library

In [1]:
import numpy as np   # 'np' is the conventional alias; any (usually shorter) name would work

## NumPy Arrays


<a id='index-3'></a>
The most important thing that NumPy defines is an array data type formally called a [numpy.ndarray](http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html)

To create a NumPy array containing only zeros we use [np.zeros](http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros)

In [3]:
a = np.zeros(3)
a

array([0., 0., 0.])

In [4]:
type(a)

numpy.ndarray

NumPy arrays are somewhat like native Python lists, except that

- Data *must be homogeneous* (all elements of the same type)  
- These types must be one of the data types (`dtypes`) provided by NumPy  


The most important of these dtypes are:

- float64: 64 bit floating point number  
- int32: 32 bit integer  
- bool:  8 bit True or False  

On modern machines, the default dtype for arrays is `float64`
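The dtype of an existing array is stored in its `dtype` attribute. The short sketch below also shows homogeneity in practice: when the input mixes booleans, integers and floats, NumPy converts everything to one common type (here `float64`) instead of storing mixed types. The variable names are illustrative only.

```python
import numpy as np

a = np.zeros(3)
print(a.dtype)          # float64, the default for np.zeros

# Mixed bool/int/float input is upcast to a single common dtype
mixed = np.array([True, 2, 3.5])
print(mixed.dtype)      # float64
print(mixed[0])         # True has been stored as 1.0
```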

In [4]:
a = np.zeros(3)
type(a[0])

numpy.float64

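If we want to use integers we can specify as follows (the integer type you get is platform dependent: `int32` on Windows with NumPy 1.x, `int64` on most other platforms):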

In [5]:
a = np.zeros(3, dtype=int)
type(a[0])

numpy.int32
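If the exact integer width matters, one option, sketched below, is to request it explicitly (for example `np.int64`) rather than passing the generic `int`. The `astype` method converts an existing array to another dtype and returns a new array; converting floats to integers truncates toward zero. The array values are arbitrary examples.

```python
import numpy as np

# Request a specific integer width so the dtype does not depend on the platform
a = np.zeros(3, dtype=np.int64)
print(a.dtype)                 # int64

# astype returns a converted copy; float-to-int conversion truncates toward zero
f = np.array([1.7, -2.3])
i = f.astype(np.int64)
print(i)                       # 1 and -2
```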


<a id='numpy-shape-dim'></a>

### Shape and Dimension


<a id='index-4'></a>
Consider the following assignment

In [6]:
z = np.zeros(10)
z

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Here `z` is a *flat* (one-dimensional) array, neither a row nor a column vector

The dimension is recorded in the `shape` attribute, which is a tuple

In [7]:
z.shape

(10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one element end with a comma)

To turn it into a two-dimensional array, we can assign a new tuple to the `shape` attribute

In [9]:
z.shape = (2,5)
z

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [9]:
z = np.zeros(4)
z.shape = (2, 2)
z

array([[0., 0.],
       [0., 0.]])

In the last case, to make the 2 by 2 array, we could also pass a tuple to the `zeros()` function, as
in `z = np.zeros((2, 2))`


<a id='creating-arrays'></a>

In [10]:
np.zeros((2,2))

array([[0., 0.],
       [0., 0.]])
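Assigning to `shape` modifies the array itself. A common alternative, sketched here, is the `reshape` method, which returns a new array object with the requested shape (typically a view of the same data) and leaves the original shape untouched. One dimension may be given as `-1`, in which case NumPy infers it. The `ndim` and `size` attributes report the number of dimensions and the total number of elements. `np.arange` is used only to get distinguishable values.

```python
import numpy as np

z = np.arange(10)          # flat array holding 0, 1, ..., 9
w = z.reshape(2, 5)        # new array object with shape (2, 5)

print(z.shape, z.ndim)     # (10,) 1, the original is unchanged
print(w.shape, w.ndim)     # (2, 5) 2
print(w.size)              # 10 elements in total

v = z.reshape(5, -1)       # -1 lets NumPy infer the second dimension
print(v.shape)             # (5, 2)
```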

### Creating Arrays


<a id='index-5'></a>
As we’ve seen, the `np.zeros` function creates an array of zeros

Related is `np.empty`, which allocates an array without initializing its entries, so that it can later be populated with data

In [10]:
z = np.empty(3)
z

array([0., 0., 0.])

The numbers you see here are garbage values
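Because `np.empty` does not initialize anything, its entries should be written before they are read. A minimal sketch of two ways to do that: fill an empty array in place with the `fill` method, or create and fill in one step with `np.full`. The fill values are arbitrary.

```python
import numpy as np

# np.empty skips initialization, so fill it before reading it
z = np.empty(3)
z.fill(0.5)
print(z)                   # every entry is now 0.5

# np.full creates and fills an array in one step
f = np.full((2, 3), 7.0)
print(f.shape)             # (2, 3)
```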

Let's see what `np.ones` creates

In [11]:
np.ones((6,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

To set up a grid of evenly spaced numbers use `np.linspace`

In [14]:
z = np.linspace(2, 10, 5)  # From 2 to 10, with 5 elements
z

array([ 2.,  4.,  6.,  8., 10.])
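`np.linspace` fixes the *number* of points and includes the stop value by default. The sketch below contrasts it with `endpoint=False` and with `np.arange`, which instead fixes the *step* and excludes the stop value.

```python
import numpy as np

# linspace: 5 points from 2 to 10, stop value included
a = np.linspace(2, 10, 5)
print(a)                   # 2, 4, 6, 8, 10 (floats)

# endpoint=False drops the stop value
b = np.linspace(2, 10, 4, endpoint=False)
print(b)                   # 2, 4, 6, 8 (floats)

# arange: step of 2 from 2 up to (but excluding) 10
c = np.arange(2, 10, 2)
print(c)                   # 2, 4, 6, 8 (integers)
```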

To create an identity matrix use either `np.identity` or `np.eye`

In [14]:
z = np.identity(3)
z

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
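The two functions agree for square identity matrices, but `np.eye` is more general: it also accepts a number of columns and a diagonal offset `k`. A small sketch:

```python
import numpy as np

# For a square identity matrix the two functions agree
print(np.array_equal(np.identity(3), np.eye(3)))   # True

# eye also allows non-square shapes and an offset diagonal
e = np.eye(2, 3, k=1)      # 2 x 3 array with ones on the first superdiagonal
print(e)
```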

In addition, NumPy arrays can be created from Python lists, tuples, etc. using `np.array`

In [15]:
z = np.array([10, 20])                 # ndarray from Python list
z

array([10, 20])

In [16]:
type(z)

numpy.ndarray

In [17]:
z = np.array((10, 20), dtype=float)    # Here 'float' is equivalent to 'np.float64'
z

array([10., 20.])

In [15]:
z = np.array([[1, 2], [3, 4], [5,6]])         # 2D array from a list of lists
z

array([[1, 2],
       [3, 4],
       [5, 6]])

See also `np.asarray`, which performs a similar function, but does not make
a distinct copy of data already in a NumPy array

In [24]:
na = np.linspace(10, 20, 2)
na is np.asarray(na)   # Does not copy NumPy arrays

True

In [21]:
na is np.array(na)     # Does make a new copy, perhaps unnecessarily

False
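The practical consequence, sketched below with illustrative names, is that writing through the result of `np.asarray` changes the original array, while the copy made by `np.array` is independent. When the input is a Python list there is no existing array to reuse, so `np.asarray` builds a new one.

```python
import numpy as np

na = np.linspace(10, 20, 2)

same = np.asarray(na)      # no copy: the very same object as na
dup = np.array(na)         # independent copy

same[0] = -1.0
print(na[0])               # -1.0, because `same` is na
print(dup[0])              # 10.0, the copy is unaffected

# From a list, np.asarray has to create a new array
from_list = np.asarray([1, 2, 3])
print(type(from_list))     # numpy.ndarray
```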

### Array Indexing


<a id='index-6'></a>
For a flat array, indexing is the same as Python sequences:

In [21]:
z = np.linspace(1, 2, 5)
z

array([1.  , 1.25, 1.5 , 1.75, 2.  ])

In [22]:
z[0]

1.0

In [23]:
z[0:2]  # Two elements, starting at element 0

array([1.  , 1.25])

In [24]:
z[-1]

2.0
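Slices also accept a step, as with Python sequences. One difference from lists is worth knowing: a basic slice of a NumPy array is a *view* of the same data, not a copy, so writing into the slice changes the original array. A short sketch:

```python
import numpy as np

z = np.linspace(1, 2, 5)   # 1.0, 1.25, 1.5, 1.75, 2.0
print(z[::2])              # step of 2: elements 0, 2 and 4

s = z[1:3]                 # a basic slice is a view, not a copy
s[:] = 0                   # writing through the view...
print(z)                   # ...also changes z: 1.0, 0.0, 0.0, 1.75, 2.0
```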

For 2D arrays the index syntax is as follows:

In [25]:
z = np.array([[1, 2], [3, 4]])
z

array([[1, 2],
       [3, 4]])

In [26]:
z[0, 0]

1

In [27]:
z[0, 1]

2

And so on

Note that indices are still zero-based, to maintain compatibility with Python sequences

Columns and rows can be extracted as follows

In [28]:
z[0, :]

array([1, 2])

In [29]:
z[:, 1]

array([2, 4])
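Note that indexing a column with a single integer returns a flat array. If a genuine two-dimensional column (shape `(2, 1)`) is needed, one option, sketched here, is to use a slice of length one instead:

```python
import numpy as np

z = np.array([[1, 2], [3, 4]])

col = z[:, 1]              # an integer index drops a dimension
print(col.shape)           # (2,)

col2d = z[:, 1:2]          # a length-1 slice keeps it: a 2 x 1 column
print(col2d.shape)         # (2, 1)
```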

NumPy arrays of integers can also be used to extract elements

In [30]:
z = np.linspace(2, 4, 5)
z

array([2. , 2.5, 3. , 3.5, 4. ])

In [31]:
indices = np.array((0, 2, 3))
z[indices]

array([2. , 3. , 3.5])
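Unlike basic slicing, indexing with an integer array returns a *copy*, so modifying the result leaves the original untouched. The indices may also repeat or appear in any order. A brief sketch with illustrative names:

```python
import numpy as np

z = np.linspace(2, 4, 5)             # 2.0, 2.5, 3.0, 3.5, 4.0
picked = z[np.array((0, 2, 3))]

picked[0] = -1.0                     # integer-array indexing returns a copy...
print(z[0])                          # ...so z is unchanged: 2.0

print(z[[4, 4, 0]])                  # repeated, reordered indices: 4.0, 4.0, 2.0
```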

Finally, an array with `dtype=bool` can be used to extract elements

In [32]:
z

array([2. , 2.5, 3. , 3.5, 4. ])

In [33]:
d = np.array([0, 1, 1, 0, 0], dtype=bool)
d

array([False,  True,  True, False, False])

In [34]:
z[d]

array([2.5, 3. ])

We’ll see why this is useful below
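One typical use, sketched here: a comparison applied to an array returns a boolean array of the same shape, which can be used directly as a mask. Conditions can be combined with `&` (and) or `|` (or); the parentheses are required because these operators bind more tightly than comparisons in Python.

```python
import numpy as np

z = np.linspace(2, 4, 5)             # 2.0, 2.5, 3.0, 3.5, 4.0

mask = z > 3                         # False for the first three, True for the last two
print(z[mask])                       # elements above 3: 3.5, 4.0

# Parentheses are needed: & binds more tightly than > and <
inner = z[(z > 2) & (z < 4)]
print(inner)                         # 2.5, 3.0, 3.5
```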

An aside: all elements of an array can be set equal to one number using slice notation

In [35]:
z = np.empty(3)
z

array([2. , 3. , 3.5])

In [36]:
z[:] = 42
z

array([42., 42., 42.])
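The slice matters here. `z[:] = 42` writes into the existing array, whereas `z = 42` would simply rebind the name `z` to an integer and leave the array alone. The sketch below makes this visible through a second, illustrative name `alias` for the same array.

```python
import numpy as np

z = np.zeros(3)
alias = z                  # a second name for the same array

z[:] = 42                  # slice assignment writes into the existing array
print(alias)               # alias sees the change: 42.0 everywhere

z = 42                     # plain assignment only rebinds the name z to an int
print(type(z))             # int
print(alias)               # the array itself is untouched
```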

### Array Methods


<a id='index-7'></a>
Arrays have useful methods, all of which are carefully optimized

In [37]:
a = np.array((52, 38, 36, 75))
a

array([52, 38, 36, 75])

In [38]:
a.sort()              # Sorts a in place, default is ascending order
a

array([36, 38, 52, 75])

In [39]:
a[::-1]               # Reverses the (already sorted) array, giving descending order

array([75, 52, 38, 36])

In [40]:
a.sum()               # Sum

201

In [41]:
a.mean()              # Mean

50.25

In [42]:
a.max()               # Max

75

In [43]:
a.argmax()            # Returns the index of the maximal element

3

In [44]:
a.cumsum()            # Cumulative sum of the elements of a

array([ 36,  74, 126, 201], dtype=int32)

In [45]:
a.cumprod()           # Cumulative product of the elements of a

array([     36,    1368,   71136, 5335200], dtype=int32)

In [46]:
a.var()               # Variance

242.1875

In [47]:
a.std()               # Standard deviation

15.562374497485916

In [48]:
a.shape = (2, 2)
a.T                   # Equivalent to a.transpose()

array([[36, 52],
       [38, 75]])
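For multidimensional arrays, methods such as `sum`, `mean` and `max` also accept an `axis` argument: `axis=0` reduces down the rows (one result per column) and `axis=1` reduces across the columns (one result per row). A sketch using the same 2 by 2 array as above:

```python
import numpy as np

a = np.array([[36, 38], [52, 75]])

print(a.sum())             # 201, over all elements
print(a.sum(axis=0))       # column sums: 88 and 113
print(a.sum(axis=1))       # row sums: 74 and 127
print(a.max(axis=0))       # column maxima: 52 and 75
```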

Many of the methods discussed above have equivalent functions in the NumPy namespace

In [49]:
a = np.array((4, 3, 2, 1))

In [50]:
np.sum(a)

10

In [51]:
np.mean(a)

2.5
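The function and method versions do not always behave identically. For sorting, a sketch of the difference: `np.sort(a)` returns a new sorted array and leaves `a` alone, while `a.sort()` sorts in place and returns `None`.

```python
import numpy as np

a = np.array((4, 3, 2, 1))

s = np.sort(a)             # function: returns a sorted copy
print(s)                   # 1, 2, 3, 4
print(a)                   # still 4, 3, 2, 1

a.sort()                   # method: sorts a in place
print(a)                   # 1, 2, 3, 4
```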

## Operations on Arrays


<a id='index-8'></a>

### Arithmetic Operations

The operators `+`, `-`, `*`, `/` and `**` all act *elementwise* on arrays

In [52]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
a + b

array([ 6,  8, 10, 12])

In [53]:
a * b

array([ 5, 12, 21, 32])

We can add a scalar to each element as follows

In [54]:
a + 10

array([11, 12, 13, 14])

Scalar multiplication is similar

In [55]:
a * 10

array([10, 20, 30, 40])
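The remaining operators work the same way. One detail worth noting, shown in this sketch: `/` always performs true division and returns floats even for integer arrays, while `//` performs floor division and keeps integers.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

print(a / 2)     # true division gives floats: 0.5, 1.0, 1.5, 2.0
print(a // 2)    # floor division keeps integers: 0, 1, 1, 2
print(a ** 2)    # elementwise powers: 1, 4, 9, 16
print(a - 1)     # 0, 1, 2, 3
```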

Two-dimensional arrays follow the same general rules

In [56]:
A = np.ones((2, 2))
B = np.ones((2, 2))
A + B

array([[2., 2.],
       [2., 2.]])

In [57]:
A + 10

array([[11., 11.],
       [11., 11.]])

In [58]:
A * B

array([[1., 1.],
       [1., 1.]])


In particular, `A * B` is *not* the matrix product: it is an element-wise product


<a id='numpy-matrix-multiplication'></a>

### Matrix Multiplication


<a id='index-9'></a>
With Python 3.5 or later and NumPy 1.10 or later,
one can use the `@` symbol for matrix multiplication, as follows:

In [59]:
A = np.ones((2, 2))
B = np.ones((2, 2))
A @ B

array([[2., 2.],
       [2., 2.]])

(For older versions of Python and NumPy you need to use the [np.dot](http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html) function)
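With all-ones matrices the two kinds of product are easy to confuse, so here is a sketch with less symmetric matrices (values chosen arbitrarily for illustration). It contrasts `*` with `@` and checks that `np.dot` and `np.matmul` agree with `@` for 2D arrays.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)              # element-wise: rows (5, 12) and (21, 32)
print(A @ B)              # matrix product: rows (19, 22) and (43, 50)

# For 2D arrays, np.dot and np.matmul give the same result as @
print(np.array_equal(A @ B, np.dot(A, B)))      # True
print(np.array_equal(A @ B, np.matmul(A, B)))   # True
```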

We can also use `@` to take the inner product of two flat arrays

In [60]:
A = np.array((1, 2))
B = np.array((10, 20))
A @ B

50

In fact, we can use `@` when one operand is a Python list or tuple

In [61]:
A = np.array(((1, 2), (3, 4)))
A

array([[1, 2],
       [3, 4]])

In [62]:
A @ (0, 1)

array([2, 4])

Since we are postmultiplying, the tuple is treated as a column vector
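Conversely, when the tuple comes first (premultiplying), it is treated as a row vector. A small sketch: postmultiplying by `(0, 1)` picks out the second column of `A`, while premultiplying picks out the second row.

```python
import numpy as np

A = np.array(((1, 2), (3, 4)))

post = A @ (0, 1)          # column vector on the right: second column of A
pre = (0, 1) @ A           # row vector on the left: second row of A
print(post)                # 2, 4
print(pre)                 # 3, 4
```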

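### Mutability and Copying of NumPy Arrays

NumPy arrays are mutable, like Python lists: their contents can be changed in place after the array is created

In [63]:
a[-1] = 0  # Change last element to 0
a

array([1, 2, 3, 0])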

In [64]:
a = np.random.randn(3)
a

array([-0.29865287,  0.14049209,  0.87386953])

In [65]:
b = a
b[0] = 0.0
a

array([0.        , 0.14049209, 0.87386953])

What’s happened is that we have changed `a` by changing `b`.

The name `b` is bound to the same array object as `a`, so it is just another reference to that array, with equal rights to make changes to it.

This is in fact the most sensible default behavior! It means that we pass around only pointers to data, rather than making copies. Making copies is expensive in terms of both speed and memory.

#### Making Copies

It is of course possible to make `b` an independent copy of `a` when required

This can be done using `np.copy`

In [66]:
a = np.random.randn(3)
a

array([0.3370952 , 0.69381906, 0.07745963])

In [67]:
b = np.copy(a)
b

array([0.3370952 , 0.69381906, 0.07745963])

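Now `b` is an independent copy with its own data (often called a *deep copy*, although for arrays of Python objects `np.copy` does not copy the objects themselves)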

In [68]:
b[:] = 1
b

array([1., 1., 1.])

In [69]:
a

array([0.3370952 , 0.69381906, 0.07745963])

Note that the change to `b` has not affected `a`
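The same copy can be made with the array method `a.copy()`. To check whether two names refer to overlapping data, `np.shares_memory` is one convenient tool; the sketch below (with an illustrative alias `c`) compares a true copy with a second reference.

```python
import numpy as np

a = np.random.randn(3)

b = a.copy()                    # method form, equivalent to np.copy(a)
c = a                           # just another name for a

print(np.shares_memory(a, b))   # False: b has its own data
print(np.shares_memory(a, c))   # True: c is a itself
```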

## Additional Functionality

Let’s look at some other useful things we can do with NumPy

### Vectorized Functions


<a id='index-10'></a>
NumPy provides versions of the standard functions `log`, `exp`, `sin`, etc. that act *element-wise* on arrays

In [70]:
z = np.array([1, 2, 3])
np.sin(z)

array([0.84147098, 0.90929743, 0.14112001])

This eliminates the need for explicit element-by-element loops

In [71]:
z

array([1, 2, 3])

In [72]:
np.exp(-0.5 * z**2)

array([0.60653066, 0.13533528, 0.011109  ])

### Subpackages

NumPy provides some additional functionality related to scientific programming
through its subpackages

We’ve already seen how we can generate random variables using `np.random`

In [73]:
z = np.random.randn(10000)  # Generate standard normals
y = np.random.binomial(10, 0.5, size=1000)    # 1,000 draws from Bin(10, 0.5)
y.mean()

5.004
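NumPy 1.17 and later also provide a generator-based interface through `np.random.default_rng`. The sketch below redoes the draws above with a seeded generator, so that the results can be reproduced; the seed value `1234` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=1234)   # arbitrary seed, for reproducibility
z = rng.standard_normal(10000)           # standard normals
y = rng.binomial(10, 0.5, size=1000)     # 1,000 draws from Bin(10, 0.5)
print(y.mean())                          # close to 10 * 0.5 = 5

# Re-creating the generator with the same seed reproduces the draws
rng2 = np.random.default_rng(seed=1234)
same = np.array_equal(z, rng2.standard_normal(10000))
print(same)                              # True
```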

Another commonly used subpackage is `np.linalg`

In [74]:
A = np.array([[1, 2], [3, 4]])

np.linalg.det(A)           # Compute the determinant

-2.0000000000000004

In [75]:
np.linalg.inv(A)           # Compute the inverse

array([[-2. ,  1. ],
       [ 1.5, -0.5]])
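When the goal is to solve a linear system A x = b, it is usually faster and numerically more accurate to call `np.linalg.solve` than to form the inverse explicitly. A sketch with the same `A` and an arbitrary right-hand side `b`:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])

# Solve A x = b directly instead of computing np.linalg.inv(A) @ b
x = np.linalg.solve(A, b)
print(x)                          # approximately -4.0 and 4.5
print(np.allclose(A @ x, b))      # True
```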