# NumPy Basics

NumPy is short for *Numerical Python* and is the fundamental package for high-performance scientific computing in `Python`.

NumPy provides the following:

- `ndarray` - a fast and memory-efficient multidimensional array providing vectorized operations and *broadcasting* capabilities.
- Standard mathematical functions for fast operations on entire arrays of data without the need for loops (i.e., *vectorized* operations).
- Tools for reading and writing array data on disk and for working with memory-mapped files.
- Linear algebra, pseudo-random number generators, FFTs, etc.
- Tools for linking `Python` to very efficient low-level code written in `C`, `C++`, and `Fortran`.
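
To make the term *vectorized* from the list above concrete, here is a minimal sketch that compares a Python-level loop with a single NumPy call. The sample values and variable names are assumptions chosen only for illustration.

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])  # hypothetical sample data

# Loop-based version: one Python-level function call per element.
roots_loop = [math.sqrt(v) for v in values]

# Vectorized version: a single call operates on the whole array at once.
roots_vec = np.sqrt(values)

print(roots_loop)
print(roots_vec)
```

Both versions compute the same values; the vectorized one pushes the loop into compiled code, which is where the speed advantage on large arrays comes from.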

## The NumPy `ndarray`: A Multidimensional Array Object

The key feature of NumPy is its N-dimensional array object, `ndarray`. It is a fast, flexible container for scientific data sets that lets us perform mathematical operations on whole blocks of data efficiently.

Here is a simple example:

In [2]:
import numpy as np

data = np.ones((4,4))

In [2]:
whos

Variable   Type       Data/Info
-------------------------------
data       ndarray    4x4: 16 elems, type `float64`, 128 bytes
np         module     <module 'numpy' from '//a<...>kages/numpy/__init__.py'>


In [3]:
data

array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

In [3]:
data.dtype

dtype('float64')

In [5]:
data.shape

(4, 4)

## Creating `ndarrays`

The most straightforward way to create an `ndarray` is to pass a `list` to the `array` function:

In [6]:
arr = np.array([1,2,4,6.5])

In [5]:
arr

array([ 1. ,  2. ,  4. ,  6.5])

In [7]:
arr.dtype

dtype('float64')

The `array` function also accepts a previously defined list, including a nested list, which produces a two-dimensional array (a matrix):

In [9]:
dat = [[1,2], [3,4]]
arr2 = np.array(dat)

In [10]:
arr2

array([[1, 2],
       [3, 4]])

In [11]:
arr2.shape

(2, 2)
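
Besides `np.array`, this document also relies on a few other creation functions (`np.ones`, `np.empty`, `np.arange`). The sketch below gathers them in one place, together with `np.zeros` and `np.eye`; the shapes and variable names are assumptions chosen for illustration.

```python
import numpy as np

zeros = np.zeros(5)             # 1D array of five 0.0 values
ones = np.ones((2, 3))          # 2x3 array of 1.0 values
empty = np.empty((2, 2))        # uninitialized memory; contents are arbitrary
counts = np.arange(0, 10, 2)    # array analogue of range(): 0, 2, 4, 6, 8
identity = np.eye(3)            # 3x3 identity matrix

print(zeros.shape, ones.shape, empty.shape)
print(counts)
print(identity)
```

Note that `np.empty` does not zero its memory, so its contents should be overwritten before they are read.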

## Data Types

In [11]:
arr3 = np.array([1,2,3], dtype=np.float64)

In [12]:
arr4 = np.array([1,2,3], dtype=np.int32)

In [13]:
arr3.dtype

dtype('float64')

In [14]:
arr4.dtype

dtype('int32')

In [15]:
whos

Variable   Type       Data/Info
-------------------------------
arr        ndarray    4: 4 elems, type `float64`, 32 bytes
arr2       ndarray    2x2: 4 elems, type `int32`, 16 bytes
arr3       ndarray    3: 3 elems, type `float64`, 24 bytes
arr4       ndarray    3: 3 elems, type `int32`, 12 bytes
dat        list       n=2
data       ndarray    4x4: 16 elems, type `float64`, 128 bytes
np         module     <module 'numpy' from 'C:\<...>ges\\numpy\\__init__.py'>


There are other data types, but these will be the most commonly used. You can also make explicit conversions between types:

In [16]:
arr5 = np.array([1,2,3,4,5])

In [17]:
arr5.dtype

dtype('int32')

In [18]:
float_arr = arr5.astype(np.float64)

In [19]:
float_arr.dtype

dtype('float64')
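
As a further sketch of explicit conversion with `astype`, the example below casts floats to integers and numeric strings to floats. The input values are hypothetical and chosen only to make the behavior visible.

```python
import numpy as np

# Casting floats to integers drops the fractional part (truncation toward zero).
float_vals = np.array([3.7, -1.2, 0.5, 9.9])
int_vals = float_vals.astype(np.int32)       # values 3, -1, 0, 9

# Strings that represent numbers can be converted to a numeric dtype.
numeric_strings = np.array(['1.25', '-9.6', '42'])
parsed = numeric_strings.astype(np.float64)

# astype returns a new array; the original keeps its dtype and values.
print(int_vals, int_vals.dtype)
print(parsed, parsed.dtype)
print(float_vals.dtype)
```

The `int32` reported for `arr5` above depends on the platform and NumPy version; many installations default to `int64` for integer input.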

## Operations on Arrays

In [20]:
arr = np.array([[1.,2.,3.], [4.,5.,6.]])

In [21]:
arr

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [22]:
arr * arr

array([[  1.,   4.,   9.],
       [ 16.,  25.,  36.]])

In [23]:
arr - arr

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [24]:
1 / arr

array([[ 1.        ,  0.5       ,  0.33333333],
       [ 0.25      ,  0.2       ,  0.16666667]])

In [25]:
arr ** 0.5

array([[ 1.        ,  1.41421356,  1.73205081],
       [ 2.        ,  2.23606798,  2.44948974]])

In [26]:
2. * arr

array([[  2.,   4.,   6.],
       [  8.,  10.,  12.]])

In [27]:
np.pi * arr

array([[  3.14159265,   6.28318531,   9.42477796],
       [ 12.56637061,  15.70796327,  18.84955592]])
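
The scalar operations above are the simplest case of *broadcasting*. As a hedged sketch of the more general case, the example below combines a 2D array with a 1D row and with a column; `row` and `col` are hypothetical arrays introduced only for this illustration.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 20., 30.])                # shape (3,)
col = np.array([[100.], [200.]])               # shape (2, 1)

# The row is broadcast across both rows of arr.
plus_row = arr + row
# The column is broadcast across all three columns of arr.
plus_col = arr + col

print(plus_row)
print(plus_col)
```

In both cases NumPy behaves as if the smaller operand were repeated to match the shape `(2, 3)`, without actually copying it.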

## Basic Indexing and Slicing

In [28]:
arr = np.arange(10)

In [29]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [31]:
arr[5]

5

In [32]:
arr[5:8]

array([5, 6, 7])

In [33]:
arr[5:8] = 12

In [34]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [35]:
arr_slice = arr[5:8]

In [36]:
arr_slice[1] = 12345

In [37]:
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])

In [38]:
arr_slice[:] = 64

In [39]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
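
The cells above show that a slice is a *view*: writing to `arr_slice` changes `arr` itself. A minimal sketch of the difference between a view and an explicit copy follows; the names `view` and `copy` are assumptions used only here.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]          # a view: shares memory with arr
copy = arr[5:8].copy()   # an independent copy of the same elements

view[0] = -1             # also changes arr[5]
copy[1] = 99             # leaves arr untouched

print(arr)
print(np.shares_memory(arr, view))
print(np.shares_memory(arr, copy))
```

Use `.copy()` whenever a slice must be modified without affecting the original array.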

In [40]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [42]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [43]:
arr2d[2]

array([7, 8, 9])

In [44]:
arr2d[0][2]

3

In [45]:
arr2d[0,2]

3

In [46]:
arr3d = np.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]])

In [47]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [48]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

Both scalar values and arrays can be assigned to `arr3d[0]`:

In [49]:
old_values = arr3d[0].copy()

In [52]:
arr3d[0] = 42

In [53]:
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [54]:
arr3d[0] = old_values

In [55]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [56]:
arr3d[1,0]

array([7, 8, 9])

In [57]:
arr3d[1,0,0]

7

## Indexing with Slices

`ndarrays` can be sliced just like `Python` `lists`:

In [58]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

In [59]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [60]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [61]:
arr2d[:2,1:]

array([[2, 3],
       [5, 6]])

In [62]:
arr2d[1,:2]

array([4, 5])

In [63]:
arr2d[2,:1]

array([7])

In [64]:
arr2d[:,:1]

array([[1],
       [4],
       [7]])

And of course you can assign using slices as well:

In [65]:
arr2d[:2,1:] = 0

In [66]:
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

## Boolean Indexing

In [82]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [83]:
data = np.random.randn(7, 4)

In [84]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], 
      dtype='<U4')

In [85]:
data

array([[ 0.31719309,  0.62859804,  0.66692604, -0.06389443],
       [-0.2172365 , -0.98592543,  0.25246468,  0.17360294],
       [ 0.35442825, -0.51230109,  0.286875  , -0.66091488],
       [-0.3281151 ,  0.16946649,  0.24582368, -1.20395648],
       [-0.34418001, -0.39128756, -0.91611485, -0.74140819],
       [-0.87389575,  0.69674545,  1.71544446,  0.09597526],
       [-0.46867744, -0.33476343,  0.27533921,  0.65320697]])

In [86]:
names == 'Bob'

array([ True, False, False,  True, False, False, False], dtype=bool)

In [87]:
data[names == 'Bob']

array([[ 0.31719309,  0.62859804,  0.66692604, -0.06389443],
       [-0.3281151 ,  0.16946649,  0.24582368, -1.20395648]])

In [88]:
data[names == 'Bob', 2:]

array([[ 0.66692604, -0.06389443],
       [ 0.24582368, -1.20395648]])

In [89]:
data[names == 'Bob', 3]

array([-0.06389443, -1.20395648])

In [90]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True], dtype=bool)

In [93]:
data[~(names == 'Bob')]

array([[-0.2172365 , -0.98592543,  0.25246468,  0.17360294],
       [ 0.35442825, -0.51230109,  0.286875  , -0.66091488],
       [-0.34418001, -0.39128756, -0.91611485, -0.74140819],
       [-0.87389575,  0.69674545,  1.71544446,  0.09597526],
       [-0.46867744, -0.33476343,  0.27533921,  0.65320697]])

In [94]:
~(names == 'Bob')

array([False,  True,  True, False,  True,  True,  True], dtype=bool)

In [95]:
mask = (names == 'Bob') | (names == 'Will')

In [96]:
mask

array([ True, False,  True,  True,  True, False, False], dtype=bool)

In [97]:
data[mask]

array([[ 0.31719309,  0.62859804,  0.66692604, -0.06389443],
       [ 0.35442825, -0.51230109,  0.286875  , -0.66091488],
       [-0.3281151 ,  0.16946649,  0.24582368, -1.20395648],
       [-0.34418001, -0.39128756, -0.91611485, -0.74140819]])

In [98]:
data[data < 0]

array([-0.06389443, -0.2172365 , -0.98592543, -0.51230109, -0.66091488,
       -0.3281151 , -1.20395648, -0.34418001, -0.39128756, -0.91611485,
       -0.74140819, -0.87389575, -0.46867744, -0.33476343])

In [99]:
data[data < 0] = 0

In [100]:
data

array([[ 0.31719309,  0.62859804,  0.66692604,  0.        ],
       [ 0.        ,  0.        ,  0.25246468,  0.17360294],
       [ 0.35442825,  0.        ,  0.286875  ,  0.        ],
       [ 0.        ,  0.16946649,  0.24582368,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.69674545,  1.71544446,  0.09597526],
       [ 0.        ,  0.        ,  0.27533921,  0.65320697]])

In [101]:
data[names != 'Joe'] = 7

In [102]:
data

array([[ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 0.        ,  0.        ,  0.25246468,  0.17360294],
       [ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 0.        ,  0.69674545,  1.71544446,  0.09597526],
       [ 0.        ,  0.        ,  0.27533921,  0.65320697]])
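
The following sketch summarizes the boolean-indexing patterns used above on deterministic data, so the results are reproducible. The `data` values here are hypothetical; the text above uses random numbers.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# Hypothetical deterministic data: 7 rows of 4 values, from -14 to 13.
data = np.arange(28, dtype=np.float64).reshape(7, 4) - 14

# Combine conditions with & (and), | (or), and ~ (not). The Python keywords
# `and` / `or` raise a ValueError on arrays with more than one element.
bob_rows = data[names == 'Bob']
joe_rows = data[~((names == 'Bob') | (names == 'Will'))]

# Selecting with a boolean mask returns a copy, so this leaves data unchanged.
bob_rows[:] = 0

print(bob_rows.shape, joe_rows.shape)
print(data[0])
```

Assigning through a mask on the left-hand side, as in `data[data < 0] = 0`, does modify `data` in place; only the selected result on the right-hand side is a copy.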

## Fancy Indexing

In [103]:
arr = np.empty((8, 4))

In [104]:
for i in range(8):
    arr[i] = i

In [105]:
arr

array([[ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.]])

In [106]:
arr[[4,3,0,6]]

array([[ 4.,  4.,  4.,  4.],
       [ 3.,  3.,  3.,  3.],
       [ 0.,  0.,  0.,  0.],
       [ 6.,  6.,  6.,  6.]])

Using negative indices selects rows from the end:

In [107]:
arr[[-3,-5,-7]]

array([[ 5.,  5.,  5.,  5.],
       [ 3.,  3.,  3.,  3.],
       [ 1.,  1.,  1.,  1.]])

In [108]:
arr = np.arange(32).reshape((8, 4))

In [109]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [110]:
arr[[1,5,7,2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

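What just happened? The elements at positions `(1, 0)`, `(5, 3)`, `(7, 1)`, and `(2, 2)` were selected, so the result is a 1D array with one value per index pair. To select a rectangular block of rows and columns instead, index in two steps: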

In [111]:
arr[[1,5,7,2]][:, [0,3,1,2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Another way to index is to use the `np.ix_` function, which converts two 1D integer arrays into an indexer that selects the rectangular region formed by every combination of the given rows and columns:

In [112]:
arr[np.ix_([1,5,7,2], [0, 3, 1, 2])]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
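
A short sketch contrasting the three fancy-indexing forms used in this section, and showing that fancy indexing copies data rather than returning a view. The names `rows`, `cols`, `pairs`, `block_a`, and `block_b` are assumptions introduced for the illustration.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]              # one element per (row, col) pair
block_a = arr[rows][:, cols]         # two-step selection of a 4x4 block
block_b = arr[np.ix_(rows, cols)]    # the same block in one step

same_block = np.array_equal(block_a, block_b)

# Fancy indexing copies the data, so this does not modify arr.
block_b[0, 0] = -1

print(pairs.shape, block_a.shape)
print(same_block)
print(arr[1, 0])
```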

## Transposing Arrays and Swapping Axes

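Transposing is a special form of reshaping that returns a view on the underlying data without copying anything. Arrays provide the `T` attribute and the `transpose` and `swapaxes` methods for this.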
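
As a hedged illustration of this section's topic, the sketch below uses the `T` attribute and the `transpose` and `swapaxes` methods of `ndarray`. The array shapes and variable names are assumptions chosen for the example.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

transposed = arr.T                      # shape (5, 3)
gram = np.dot(arr.T, arr)               # matrix product, shape (5, 5)

arr3 = np.arange(24).reshape((2, 3, 4))
swapped = arr3.swapaxes(1, 2)           # swap the last two axes: (2, 4, 3)
permuted = arr3.transpose((1, 0, 2))    # reorder all axes: (3, 2, 4)

# A transpose is a view, so writing through it changes the original array.
transposed[0, 1] = -1

print(transposed.shape, gram.shape, swapped.shape, permuted.shape)
print(arr[1, 0])
```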

## Universal Functions: Fast Element-wise Array Functions

NumPy defines the concept of a universal function, or *ufunc*, which is a function that performs elementwise operations on the data stored in `ndarrays`. This gives us very fast vectorized functions for arrays of data.

Take, for example, the functions `sqrt` and `exp`:

In [3]:
arr = np.arange(10)

In [4]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [5]:
np.sqrt(arr)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])

In [6]:
np.exp(arr)

array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03])

These are so-called *unary* ufuncs, i.e. they operate on a single `ndarray`. There are also *binary* ufuncs, which take two `ndarrays` as arguments and return a single array as the result.

In [7]:
x = np.random.randn(8)

In [8]:
y = np.random.randn(8)

In [9]:
x

array([ 0.55332496, -0.06680552, -0.14941974, -0.33648688,  2.08059321,
        0.63476202,  0.13393462, -1.02676545])

In [10]:
y

array([ 0.54576799,  0.92717782,  1.77901872,  2.61299574,  0.11990621,
       -0.07689426, -1.19110191, -1.40046383])

In [11]:
np.maximum(x, y) # element-wise maximum

array([ 0.55332496,  0.92717782,  1.77901872,  2.61299574,  2.08059321,
        0.63476202,  0.13393462, -1.02676545])

It is uncommon, but a ufunc can return multiple arrays as output. `modf` is one such example. It returns the fractional and integer parts of floating point numbers:

In [12]:
arr = np.random.randn(7) * 5

In [13]:
np.modf(arr)

(array([-0.71175935,  0.68842164,  0.91485913, -0.26202952, -0.66770609,
        -0.37118542, -0.24982886]), array([-1.,  0.,  3., -7., -1., -2., -2.]))
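
Because `modf` returns a tuple of two arrays, the result can be unpacked directly. A minimal sketch with fixed inputs (hypothetical values, chosen so the result is reproducible) follows.

```python
import numpy as np

arr = np.array([-3.75, 0.5, 2.25, 7.0])  # hypothetical fixed inputs

# Unpack the two outputs: fractional parts first, integer parts second.
frac, whole = np.modf(arr)

# np.add is a binary ufunc; adding the two parts recovers the input.
total = np.add(frac, whole)

print(frac)
print(whole)
print(np.array_equal(total, arr))
```

Both parts carry the sign of the input, so `-3.75` splits into `-0.75` and `-3.0`.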

The following is a snapshot of Table 4-3 and Table 4-4 from the book, which gives a listing of the different ufuncs available:

![Tables 4-3 and 4-4](Table4-3-4.jpg)

## Data Processing Using Arrays

Using NumPy `ndarray`s allows us to write very compact code for complex data processing tasks that would otherwise require verbose syntax involving many loops.

As a simple example, say we want to evaluate the function `sqrt(x^2 + y^2)` across a regular grid of values. The `np.meshgrid` function takes two 1D arrays and produces two 2D arrays that together contain every `(x, y)` pair drawn from the two inputs:

In [14]:
points = np.arange(-5, 5, 0.01)  # 1000 equally spaced points

In [15]:
xs, ys = np.meshgrid(points, points)

In [16]:
ys

array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ..., 
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])

In [17]:
z = np.sqrt(xs ** 2 + ys ** 2)

In [18]:
z

array([[ 7.07106781,  7.06400028,  7.05693985, ...,  7.04988652,
         7.05693985,  7.06400028],
       [ 7.06400028,  7.05692568,  7.04985815, ...,  7.04279774,
         7.04985815,  7.05692568],
       [ 7.05693985,  7.04985815,  7.04278354, ...,  7.03571603,
         7.04278354,  7.04985815],
       ..., 
       [ 7.04988652,  7.04279774,  7.03571603, ...,  7.0286414 ,
         7.03571603,  7.04279774],
       [ 7.05693985,  7.04985815,  7.04278354, ...,  7.03571603,
         7.04278354,  7.04985815],
       [ 7.06400028,  7.05692568,  7.04985815, ...,  7.04279774,
         7.04985815,  7.05692568]])
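
To make the layout of the `np.meshgrid` outputs easier to see, here is a small sketch on a 3-point grid, with an explicit double loop for comparison. The 3-point `points` array and the name `z_loop` are assumptions used only for this illustration.

```python
import numpy as np

points = np.array([-1.0, 0.0, 1.0])
xs, ys = np.meshgrid(points, points)

# xs repeats the points along each row; ys repeats them down each column.
z = np.sqrt(xs ** 2 + ys ** 2)

# Equivalent explicit loops, shown only for comparison.
z_loop = np.empty((3, 3))
for i, y in enumerate(points):
    for j, x in enumerate(points):
        z_loop[i, j] = np.sqrt(x ** 2 + y ** 2)

print(xs)
print(ys)
print(np.allclose(z, z_loop))
```

The vectorized expression replaces the nested loops with a single line and scales to the 1000 x 1000 grid above without any Python-level iteration.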

In [19]:
import matplotlib.pyplot as plt

In [21]:
plt.imshow(z, cmap=plt.cm.gray)
plt.colorbar()

<matplotlib.colorbar.Colorbar at 0x15442632a90>

In [22]:
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")  # raw string keeps \s literal

<matplotlib.text.Text at 0x15442594ac8>

![*Plot of function evaluated on grid*](figure_1.jpeg)