## Ndarray basics



-   We looked at a number of ways of creating arrays
    -   From lists
    -   Using NumPy's `random` submodule
-   There are [many more ways](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#array-creation-routines) to create NumPy arrays



### empty, zeros, ones, full



`np.ones`



In [2]:
import numpy as np
a = np.ones((2, 3, 4))
a

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

The `dtype` keyword sets the data type



In [3]:
np.zeros(2, dtype=bool)

array([False, False])

which we can query



In [4]:
a = np.empty((2, 3), dtype=np.float32)
a.dtype

dtype('float32')

What is an empty array?



In [5]:
a

array([[2.3694278e-38, 2.3694278e-38, 2.3694278e-38],
       [2.3694278e-38, 2.3694278e-38, 2.3694278e-38]], dtype=float32)

Why do we want to use `np.empty` instead of `np.zeros`?



In [7]:
%timeit np.empty(2000)

565 ns ± 4.18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [8]:
%timeit np.zeros(2000)

808 ns ± 7.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
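
`np.empty` is faster because it skips the initialisation step, which also means its contents are arbitrary until written. A minimal sketch of the usual pattern, assuming every element is overwritten before it is read (the name `out` is only illustrative):

```python
import numpy as np

# np.empty only allocates memory; the initial contents are arbitrary.
out = np.empty((2, 3), dtype=np.float32)

# Overwrite every element before reading anything back.
for i in range(out.shape[0]):
    out[i, :] = i + 1

print(out)
```

If some elements might be left unwritten, `np.zeros` or `np.full` is the safer choice.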


Instead of zeros you might want to initialize with `nan`



In [1]:
np.full((2, 3), np.nan)
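
A short sketch of why `nan` can be a useful fill value: it marks elements as "not yet computed", and it can be detected with `np.isnan`. An ordinary equality test does not work, because `nan` never compares equal to itself.

```python
import numpy as np

a = np.full((2, 3), np.nan)
print(a)

# nan never compares equal to anything, including itself.
print(a == a)        # every element is False

# Use np.isnan to find the elements that are still unset.
print(np.isnan(a))   # every element is True
```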

### Copying ndarrays



In [9]:
a = np.zeros(3)
b = a
b[1] = 1
a

array([0., 1., 0.])

When a variable is assigned to another variable in Python, it creates a new reference, **NOT** a copy



In [10]:
a = np.zeros(3)
b = np.copy(a)
b[1] = 1
a

array([0., 0., 0.])
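
One way to check whether two names refer to the same data is `np.shares_memory`. The sketch below compares a plain assignment, a slice and an explicit copy (the variable names are only illustrative):

```python
import numpy as np

a = np.zeros(3)
alias = a        # same object, nothing is copied
view = a[:]      # new array object that looks at the same buffer
dup = a.copy()   # independent buffer, equivalent to np.copy(a)

print(alias is a)                  # True
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, dup))    # False

dup[1] = 1       # does not affect a
view[2] = 5      # does affect a
print(a)         # [0. 0. 5.]
```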

### Useful ndarray sequences



`np.arange`



In [11]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [13]:
np.arange(3, 15.001, 3)

array([ 3.,  6.,  9., 12., 15.])

`np.linspace`



In [14]:
np.linspace(0, 1, 11)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
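
The `15.001` stop value used with `np.arange` above is a workaround for `np.arange` excluding its end point. The sketch below contrasts that with `np.linspace`, which includes the end point unless `endpoint=False` is passed:

```python
import numpy as np

# arange works like range: the stop value is excluded.
evens = np.arange(0, 10, 2)
print(evens)                 # [0 2 4 6 8]

# linspace takes the number of points and includes the stop value by default.
pts, step = np.linspace(0, 1, 11, retstep=True)
print(len(pts), pts[-1])     # 11 1.0

# endpoint=False gives a half-open interval, like arange.
half_open = np.linspace(0, 1, 10, endpoint=False)
print(len(half_open))        # 10
```

For non-integer steps `np.linspace` is usually the safer choice, since the number of points is stated explicitly instead of depending on floating-point rounding.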

## ndarray memory layout



In [15]:
a = np.array([[1, 2, 3], 
              [4, 5, 6]])
a

array([[1, 2, 3],
       [4, 5, 6]])

What is the internal representation?



In [16]:
a.flatten()

array([1, 2, 3, 4, 5, 6])

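NumPy stores arrays in row-major (C) order by default, so the elements of a row are adjacent in memory. That means that reading an array row is generally faster than reading a column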



In [20]:
x = np.zeros((1000, 1000))
%timeit x[:, 0]


153 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
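
The layout can also be inspected directly, without timing anything. A small sketch using the `flags` and `strides` attributes, where `strides` gives the number of bytes to step to reach the next element along each axis:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

# Row-major (C order) is the default layout.
print(a.flags['C_CONTIGUOUS'])                     # True
# Next row: 3 items further; next column: 1 item further.
print(a.strides == (3 * a.itemsize, a.itemsize))   # True

x = np.zeros((1000, 1000))
row = x[0, :]
col = x[:, 0]
print(row.flags['C_CONTIGUOUS'])   # True: the row is one contiguous block
print(col.flags['C_CONTIGUOUS'])   # False: elements are a full row apart
print(col.strides)                 # (8000,) for float64
```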


## ndarray indexing



-   There are [many ways](https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.indexing.html) to index NumPy ndarrays
-   Basic indexing with slices returns a **view** of the array rather than a copy (increasing efficiency)



In [21]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
arr[5]

5

In [23]:
arr[5:8]

array([5, 6, 7])

Can we assign to an array slice?



In [24]:
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [27]:
arr[:] = 11
arr

array([11, 11, 11, 11, 11, 11, 11, 11, 11, 11])

This is an example of **broadcasting**
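
A brief sketch tying the two ideas together: a slice is a view, so writing through it changes the original array, and the value being assigned is broadcast to the shape of the target (the names `window` and `grid` are only illustrative):

```python
import numpy as np

arr = np.arange(10)
window = arr[5:8]     # basic slice: a view, not a copy
window[:] = 12        # the scalar is broadcast to all three elements
print(arr)            # [ 0  1  2  3  4 12 12 12  8  9]

grid = np.zeros((2, 3))
grid[:] = [1, 2, 3]   # a length-3 row is broadcast to both rows
print(grid)
```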



### Multidimensional array indexing



![img](images/2dindexing.png)



In [28]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [29]:
print(arr2d[0][2])
print(arr2d[0, 2])

3
3


Accessing sub-arrays



In [32]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [35]:
arr3d[0, :, :]

array([[1, 2, 3],
       [4, 5, 6]])

### Indexing with slices



In [37]:
print(arr2d)
arr2d[1:3, 0:2]

[[1 2 3]
 [4 5 6]
 [7 8 9]]


array([[4, 5],
       [7, 8]])

An alternative way of writing the same



In [38]:
arr2d[1:, :2]

array([[4, 5],
       [7, 8]])

Assignments work as expected



In [42]:
arr2d[:2, 1:] = 0  # assign to the top-right 2x2 block
print(arr2d)
arr2d[0, 1:-1]

[[1 0 0]
 [4 0 0]
 [7 8 9]]


array([0])

![img](images/2dslicing.png)
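
The shape of the result depends on whether an axis is indexed with an integer or with a slice. A small sketch on a fresh copy of the 3x3 array:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# An integer index removes that axis; a slice keeps it.
print(arr2d[1, :2].shape)     # (2,)
print(arr2d[1:2, :2].shape)   # (1, 2)

# A bare colon selects the whole axis.
first_col = arr2d[:, 0]
print(first_col)              # [1 4 7]

# Slices also accept a step, including a negative one.
flipped = arr2d[::2, ::-1]
print(flipped)                # rows 0 and 2 with the columns reversed
```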



### Boolean indexing



Instead of integer indexing, conditions can be used to index arrays



In [43]:
a = np.array([1, 2, 3, 4])
a[[True,  False, True, False]]

array([1, 3])

Most often we will be using this form of boolean indexing



In [44]:
x = np.array([[ 0,  1,  2],
              [ 3,  4,  5],
              [ 6,  7,  8],
              [ 9, 10, 11]])
x[x >= 8]

array([ 8,  9, 10, 11])

What is the shape of *x >= 8*?

Another way of indexing is using the `np.nonzero` function



In [49]:
print(x)
nz = np.nonzero(x >= 8)
nz2 = x >= 8

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


In [50]:
x[nz]

array([ 8,  9, 10, 11])
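
To see why both forms select the same elements, it helps to look at what each index object contains. In this sketch the mask has the same shape as `x`, while `np.nonzero` returns one integer index array per axis:

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

mask = x >= 8
print(mask.shape)       # (4, 3), the same shape as x

# One array of row indices and one of column indices for the True entries.
rows, cols = np.nonzero(mask)
print(rows)             # [2 3 3 3]
print(cols)             # [2 0 1 2]

print(x[rows, cols])    # [ 8  9 10 11]
print(x[mask])          # [ 8  9 10 11]
```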

We can also use a different array to index our array



In [51]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
print(names)
data = np.random.randn(7, 4)
print(data)

['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']
[[-0.24526879  0.63887201 -0.67074893 -0.46846567]
 [-0.17936784 -0.91099043  1.08908517 -1.32868238]
 [ 1.06375673  0.80840053  0.27571748  1.13193294]
 [ 1.02860539 -0.92769402  0.68950305 -0.06788286]
 [-0.60320196  0.707905   -0.35689771  0.16409144]
 [ 1.25695429  0.21639375  0.71467795  1.03315775]
 [ 0.82421754 -0.73057367 -1.24084656  2.02295732]]


Suppose each name corresponds to a row in `data`. Let's select the rows for Bob



In [53]:
print(names)
names == 'Bob'


['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']


array([ True, False, False,  True, False, False, False])

In [55]:
data[names == 'Bob']
data[_53]  # _53 is IPython's cached output of In [53], i.e. the same boolean mask

array([[-0.24526879,  0.63887201, -0.67074893, -0.46846567],
       [ 1.02860539, -0.92769402,  0.68950305, -0.06788286]])

We can combine these to start building powerful expressions



In [56]:
data[names == 'Bob', 2:]

array([[-0.67074893, -0.46846567],
       [ 0.68950305, -0.06788286]])

To select everything but Bob, we can negate the condition



In [57]:
data[~(names == 'Bob')]

array([[-0.17936784, -0.91099043,  1.08908517, -1.32868238],
       [ 1.06375673,  0.80840053,  0.27571748,  1.13193294],
       [-0.60320196,  0.707905  , -0.35689771,  0.16409144],
       [ 1.25695429,  0.21639375,  0.71467795,  1.03315775],
       [ 0.82421754, -0.73057367, -1.24084656,  2.02295732]])

Selecting multiple names can be done by combining conditions



In [59]:
print(names)
mask = (names == 'Bob') | (names == 'Will')
mask

['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']


array([ True, False,  True,  True,  True, False, False])

In [60]:
data[mask]

array([[-0.24526879,  0.63887201, -0.67074893, -0.46846567],
       [ 1.06375673,  0.80840053,  0.27571748,  1.13193294],
       [ 1.02860539, -0.92769402,  0.68950305, -0.06788286],
       [-0.60320196,  0.707905  , -0.35689771,  0.16409144]])

    The Python keywords "and" and "or" do not work with boolean arrays.
    Use & (and) and | (or) instead.
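
A short sketch of what goes wrong with the keywords. `and` asks for a single truth value for the whole array, which NumPy refuses to guess. The parentheses around each comparison are needed because `|` and `&` bind more tightly than `==`. `np.isin` is shown as one possible alternative when matching several values.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

try:
    (names == 'Bob') and (names == 'Will')
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Element-wise "or" with parentheses around each comparison.
mask = (names == 'Bob') | (names == 'Will')

# np.isin builds the same mask from a list of values.
same = np.isin(names, ['Bob', 'Will'])
print(np.array_equal(mask, same))   # True
```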



### Fancy indexing



The term is used to describe indexing using integer arrays



In [62]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

To select a subset of the rows



In [63]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [64]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

## Exercise



Let's create a random array:

-   Print only values that are greater than 0.2
-   Add 1.2 to the values of the array that are less than 0.3
-   Calculate the square root (`np.sqrt`) of the non-negative values in the array that are also greater than 0.1



In [81]:
x = np.random.randn(3, 4)
np.sqrt(x[(x > 0.1) & (x >= 0)])
np.array([np.sqrt(y) for y in x.flatten() if y > 0.1 and y >= 0])

array([0.94102521, 0.90692064, 0.69017259, 0.60110894, 0.47803809,
       0.8059896 , 0.4153957 , 0.93487955])
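
One possible way to work through all three parts of the exercise is sketched below. The seeded generator `rng` and the copy `y` are assumptions made here so that the result is repeatable and the original array is left untouched; they are not part of the exercise statement.

```python
import numpy as np

rng = np.random.default_rng(0)     # seeded only for repeatability (assumption)
x = rng.standard_normal((3, 4))

# 1. Print only the values greater than 0.2.
print(x[x > 0.2])

# 2. Add 1.2 to the values less than 0.3, working on a copy.
y = x.copy()
y[y < 0.3] += 1.2

# 3. Square root of the values greater than 0.1.
#    Anything greater than 0.1 is already non-negative.
roots = np.sqrt(x[x > 0.1])
print(roots)
```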