# NB: NumPy Indexing

## Indexing

NumPy can create arrays in N dimensions.

Here is a 2D array initialized from a list of lists.

In [63]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

Indexing along an axis with an integer returns a lower-dimensional array, or a scalar once every axis has been indexed.

In [64]:
arr2d[2]

array([7, 8, 9])

In [65]:
arr2d[0][2]

3

**Simplified notation:** NumPy offers an elegant way to specify multidimensional indices and slices.

Instead of `x[a][b][c]` you can write `x[a,b,c]`.

In [66]:
arr2d[0, 2]

3

In [67]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

In [68]:
arr3d.shape

(2, 2, 3)

In [69]:
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

I find NumPy's way of showing the data a bit difficult to parse visually.

**Here is a way to visualize 3 and higher dimensional data:**

```python
[ # AXIS 0                     AXIS 0 CONTAINS 2 ELEMENTS (2D arrays)
    [ # AXIS 1                 EACH MEMBER OF AXIS 0 CONTAINS 2 ELEMENTS (1D arrays)
        [1, 2, 3], # AXIS 2    EACH MEMBER OF AXIS 1 CONTAINS 3 ELEMENTS (integers)
        [4, 5, 6]  # AXIS 2
    ],  
    [ # AXIS 1
        [7, 8, 9], 
        [10, 11, 12]
    ]
]
```
Each axis is a level in the nested hierarchy, which forms a tree (a simple kind of directed acyclic graph, or DAG).

* Each axis is a container.
* There is only one top container.
* Only the bottom containers have data.
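
One way to check this picture against NumPy itself is to look at `ndim` and `shape` while indexing one level at a time. The sketch below uses an illustrative array named `cube` (not part of the notebook) holding the same values as `arr3d`.

```python
import numpy as np

# `cube` is an illustrative name; it holds the same values as arr3d above.
cube = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# One axis per nesting level; shape gives the length of each axis.
print(cube.ndim, cube.shape)   # 3 (2, 2, 3)

# Each integer index descends one level and removes one axis.
level1 = cube[0]        # a 2D array, shape (2, 3)
level2 = cube[0, 1]     # a 1D array, shape (3,)
leaf = cube[0, 1, 2]    # a single integer, 6
print(level1.shape, level2.shape, leaf)
```

Only the last index reaches actual data; every earlier index just picks a container.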

**Omitting later indices**

In multidimensional arrays, if you omit later indices, the returned object will be a **lower-dimensional ndarray** consisting of all the data along the omitted (later) axes.

So in the 2 × 2 × 3 array `arr3d`:

In [70]:
arr3d[0] # The first element along axis 0: a 2 × 3 array

array([[1, 2, 3],
       [4, 5, 6]])

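Saving data before modifying an array:

You can work with these lower-dimensional arrays through views and copies. Basic indexing such as `arr3d[0]` returns a view onto the original data, while `.copy()` makes an independent copy.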

In [71]:
old_values = arr3d[0].copy() # Make a copy
arr3d[0] = 42                # Use a view to alter the original
arr3d                        # See result

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Putting the data back.

In [72]:
arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
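
To make the view/copy distinction concrete, the following sketch checks memory sharing with `np.shares_memory`. The array `block` and the other names are illustrative stand-ins, not variables from the notebook.

```python
import numpy as np

block = np.arange(12).reshape(2, 2, 3)  # illustrative stand-in for arr3d

view = block[0]          # basic indexing returns a view
saved = block[0].copy()  # .copy() allocates independent storage

print(np.shares_memory(block, view))   # True
print(np.shares_memory(block, saved))  # False

view[:] = 42             # writes through to `block`
print(block[0, 0, 0])    # 42
print(saved[0, 0])       # 0, the copy is unaffected
```

This is why the copy has to be taken before the assignment: once the view is overwritten, the original values exist only in `saved`.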

Similarly, `arr3d[1, 0]` gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:

In [73]:
arr3d[1, 0]

array([7, 8, 9])

In [74]:
x = arr3d[1]
x

array([[ 7,  8,  9],
       [10, 11, 12]])

In [75]:
x[0]

array([7, 8, 9])

## Slicing

Slicing extends indexing: instead of a single position, you select a range of positions along an axis. In 2D arrays, each axis can take its own slice.

A slice is a subset of an array.

In [76]:
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [77]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

In [78]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [79]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [80]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [81]:
arr2d[1, :2]

array([4, 5])

In [82]:
arr2d[:2, 2]

array([3, 6])

In [83]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

In [84]:
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])
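
The cells above mix integer indices and slices, and the two behave differently with respect to dimensionality. As a small sketch (using an illustrative array `grid`, so the notebook's `arr2d` is left alone): an integer index removes an axis, while a slice keeps it even when it selects a single row.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # illustrative name

row_as_1d = grid[1, :2]     # integer index drops axis 0
row_as_2d = grid[1:2, :2]   # slice of length 1 keeps axis 0

print(row_as_1d.shape)  # (2,)
print(row_as_2d.shape)  # (1, 2)
```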

## More Indexing and Slicing

**Example 1**

In [40]:
foo = np.random.randn(3,5)

In [41]:
foo

array([[ 0.62543484, -0.96229943,  0.95196573, -0.51305197, -0.1115699 ],
       [ 0.11019503, -0.10980058,  0.28150877, -0.51208741, -0.75206119],
       [ 1.69514313, -1.08493512,  0.1422037 , -0.8920906 , -0.71471137]])

In [42]:
foo.shape

(3, 5)

In [43]:
foo[1:, :2]

array([[ 0.11019503, -0.10980058],
       [ 1.69514313, -1.08493512]])

In [44]:
foo[1:, :2].shape

(2, 2)

Why is this different?

In [45]:
foo[1:][:2]

array([[ 0.11019503, -0.10980058,  0.28150877, -0.51208741, -0.75206119],
       [ 1.69514313, -1.08493512,  0.1422037 , -0.8920906 , -0.71471137]])

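Because the two indexing operations are applied in sequence, not simultaneously: `foo[1:]` first drops row 0, and then `[:2]` slices the rows of that result again, so the columns are never touched.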

In [46]:
a = foo[1:]
a

array([[ 0.11019503, -0.10980058,  0.28150877, -0.51208741, -0.75206119],
       [ 1.69514313, -1.08493512,  0.1422037 , -0.8920906 , -0.71471137]])

In [47]:
a[:2]

array([[ 0.11019503, -0.10980058,  0.28150877, -0.51208741, -0.75206119],
       [ 1.69514313, -1.08493512,  0.1422037 , -0.8920906 , -0.71471137]])
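
Since `foo` holds random values, the same comparison may be easier to follow on a deterministic array. The sketch below uses an illustrative array `table` and also shows one way to get the column selection back when chaining.

```python
import numpy as np

table = np.arange(15).reshape(3, 5)  # deterministic stand-in for foo

together = table[1:, :2]   # rows 1 onward AND columns 0 and 1
chained = table[1:][:2]    # rows 1 onward, then the first 2 rows of that result

print(together.shape)  # (2, 2)
print(chained.shape)   # (2, 5)

# When chaining, the second step needs an explicit ":" for the row axis.
fixed = table[1:][:, :2]
print(np.array_equal(fixed, together))  # True
```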

**Example 2**

In [48]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [49]:
arr[5]

5

In [50]:
arr[5:8]

array([5, 6, 7])

Slices can be used to set values as well.

In [51]:
arr[5:8] = 12

In [52]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])
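
Assignment through a slice accepts more than a scalar. As a brief sketch on an illustrative array `seq`: a scalar is broadcast to every selected position, while a sequence is written element by element and so has to match the slice length.

```python
import numpy as np

seq = np.arange(10)  # illustrative name

seq[5:8] = 12                 # scalar: broadcast to all three positions
seq[0:3] = [100, 200, 300]    # sequence: one value per position

print(seq.tolist())  # [100, 200, 300, 3, 4, 12, 12, 12, 8, 9]
```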

## Summary 

A nice visual of a 2D array

<img src="https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781449323592/files/httpatomoreillycomsourceoreillyimages2172112.png" height="50%" width="50%"/>

**Two-Dimensional Array Slicing**

<img src="https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781449323592/files/httpatomoreillycomsourceoreillyimages2172114.png" height="50%" width="50%"/>

**3D arrays**

## Boolean Indexing

This is a crucial topic; the same idea applies in Pandas and R.

You can pass a boolean array (often called a mask) to the array indexer (i.e. the `[]` suffix)
and it will return only those elements whose mask value is `True`.

Let's assume that we have two related arrays:
* `names` which holds the names associated with the data in each row, or **observations**, of a table.
* `data` which holds the data associated with each **feature** of a table.

There are $7$ observations and $4$ features.

In [85]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [86]:
data = np.random.randn(7, 4)
data

array([[-0.02296517,  0.13474624, -1.68950219,  0.38049711],
       [ 0.26840002,  1.04588115, -0.27682064,  1.34172096],
       [ 0.5600871 ,  0.40689673, -1.01778694, -1.26397703],
       [-0.30857368,  0.13819812, -0.51737801, -0.25058465],
       [-0.41433972, -0.31894006,  0.13259035, -0.78374819],
       [ 0.25037425, -0.09460502, -0.08853394, -1.96423597],
       [ 1.20040478, -0.03586405, -0.85179925,  0.02612226]])

In [87]:
names.shape, data.shape

((7,), (7, 4))

A comparison operation for an array returns an array of booleans.

Let's see which names are `'Bob'`:

In [88]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

Now, this boolean array can be passed to the indexer of `data`. Its length must match the axis it indexes (here, the 7 rows):

In [89]:
data[names == 'Bob']

array([[-0.02296517,  0.13474624, -1.68950219,  0.38049711],
       [-0.30857368,  0.13819812, -0.51737801, -0.25058465]])
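
A mask is an ordinary array, so it can be inspected like any other. The sketch below uses small illustrative arrays (`labels` and `values`, not the notebook's `names` and `data`) and shows that selecting with a mask is equivalent to selecting the integer positions where the mask is `True`, here obtained with `np.flatnonzero`.

```python
import numpy as np

labels = np.array(['Bob', 'Joe', 'Will', 'Bob'])   # illustrative data
values = np.arange(8).reshape(4, 2)

mask = labels == 'Bob'
print(mask.dtype, mask.shape)      # bool (4,)
print(mask.sum())                  # 2, the number of selected rows

rows = np.flatnonzero(mask)        # integer positions where mask is True
print(rows.tolist())               # [0, 3]
print(np.array_equal(values[mask], values[rows]))  # True
```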

Along the second axis, we can use a slice to select data.

In [90]:
data[names == 'Bob', 2:]

array([[-1.68950219,  0.38049711],
       [-0.51737801, -0.25058465]])

In [91]:
data[names == 'Bob', 3]

array([ 0.38049711, -0.25058465])

If you know SQL, the row-and-column selection `data[names == 'Bob', 2:]` is like the query:

```sql
SELECT col3, col4 FROM data WHERE name = 'Bob'
```

## Negation

Here are some examples of negated boolean operations being applied.

In [92]:
bix = names != 'Bob'
bix

array([False,  True,  True, False,  True,  True,  True])

In [93]:
data[bix]

array([[ 0.26840002,  1.04588115, -0.27682064,  1.34172096],
       [ 0.5600871 ,  0.40689673, -1.01778694, -1.26397703],
       [-0.41433972, -0.31894006,  0.13259035, -0.78374819],
       [ 0.25037425, -0.09460502, -0.08853394, -1.96423597],
       [ 1.20040478, -0.03586405, -0.85179925,  0.02612226]])

In [94]:
data[~bix] # Back to Bob

array([[-0.02296517,  0.13474624, -1.68950219,  0.38049711],
       [-0.30857368,  0.13819812, -0.51737801, -0.25058465]])

In [95]:
data[~(names == 'Bob')]

array([[ 0.26840002,  1.04588115, -0.27682064,  1.34172096],
       [ 0.5600871 ,  0.40689673, -1.01778694, -1.26397703],
       [-0.41433972, -0.31894006,  0.13259035, -0.78374819],
       [ 0.25037425, -0.09460502, -0.08853394, -1.96423597],
       [ 1.20040478, -0.03586405, -0.85179925,  0.02612226]])

Note that we don't use `not` but instead the tilde `~` sign to negate (flip) a value.

Nor do we use `and` and `or`; instead we use `&` and `|`.

Also, expressions joined by these operators need to be in parentheses, because `&` and `|` bind more tightly than comparisons such as `==`.

In [96]:
mask = (names == 'Bob') | (names == 'Will')
mask
data[mask]

array([[-0.02296517,  0.13474624, -1.68950219,  0.38049711],
       [ 0.5600871 ,  0.40689673, -1.01778694, -1.26397703],
       [-0.30857368,  0.13819812, -0.51737801, -0.25058465],
       [-0.41433972, -0.31894006,  0.13259035, -0.78374819]])
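
To see why `and` and `or` are not used, here is a small sketch on an illustrative `labels` array. The keyword operators ask for a single truth value for the whole array, which NumPy refuses to provide for arrays with more than one element.

```python
import numpy as np

labels = np.array(['Bob', 'Joe', 'Will', 'Bob'])  # illustrative data

# Parenthesized comparisons combined element-wise.
either = (labels == 'Bob') | (labels == 'Will')
print(either.tolist())  # [True, False, True, True]

# `or` needs one truth value for the whole array, so this raises ValueError.
try:
    (labels == 'Bob') or (labels == 'Will')
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```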

In [97]:
data[data < 0] = 0
data

array([[0.        , 0.13474624, 0.        , 0.38049711],
       [0.26840002, 1.04588115, 0.        , 1.34172096],
       [0.5600871 , 0.40689673, 0.        , 0.        ],
       [0.        , 0.13819812, 0.        , 0.        ],
       [0.        , 0.        , 0.13259035, 0.        ],
       [0.25037425, 0.        , 0.        , 0.        ],
       [1.20040478, 0.        , 0.        , 0.02612226]])

In [98]:
data[names != 'Joe'] = 7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.26840002, 1.04588115, 0.        , 1.34172096],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.25037425, 0.        , 0.        , 0.        ],
       [1.20040478, 0.        , 0.        , 0.02612226]])
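
The two assignments above modify `data` in place. If the original values should be kept, one alternative (a sketch, not something the notebook itself uses) is `np.where`, which builds a new array instead. The array `samples` below is illustrative.

```python
import numpy as np

samples = np.array([[-1.5, 0.5], [2.0, -0.25]])  # illustrative data

# np.where(condition, x, y) picks x where condition is True, else y.
clipped = np.where(samples < 0, 0, samples)  # new array; `samples` untouched
print(clipped.tolist())   # [[0.0, 0.5], [2.0, 0.0]]
print(samples[0, 0])      # -1.5

samples[samples < 0] = 0  # in-place version, as in the cells above
print(np.array_equal(samples, clipped))  # True
```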

## Fancy Indexing

In so-called fancy indexing, we use lists or arrays of integer indices to access data.

This can be used to sub-select and re-order data from an array.

We pass a `list` of item numbers, instead of an integer or integer range with `:`, to the indexer.

In [99]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

The following says _Select rows 4, 3, 0, and 6, in that order._

In [100]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

Negative indices select rows counting from the end.

In [101]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])
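
One difference from slicing is worth checking: fancy indexing returns a copy of the selected data, whereas a basic slice returns a view. The sketch below demonstrates this on an illustrative array `base`.

```python
import numpy as np

base = np.arange(12).reshape(4, 3)  # illustrative name

picked = base[[2, 0]]   # fancy indexing: rows 2 and 0, copied
sliced = base[0:2]      # basic slicing: a view of rows 0 and 1

picked[:] = -1           # does not touch `base`
print(base[2].tolist())  # [6, 7, 8]

sliced[:] = -1           # writes through to `base`
print(base[0].tolist())  # [-1, -1, -1]
```

Assigning directly through a fancy index, as in `base[[2, 0]] = -1`, still modifies `base`; it is only the array returned by a fancy-indexing expression that is independent.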

We can use lists to perform some complex indexing.

In [102]:
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [103]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]  # Pair the lists up: elements (1, 0), (5, 3), (7, 1), (2, 2)

array([ 4, 23, 29, 10])

In [104]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]] # Grab rows, then reorder columns 

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
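
The chained form above can also be written in a single step with `np.ix_`, which builds an open mesh from the row and column lists. This is offered as an alternative sketch rather than the notebook's own approach; `matrix`, `rows`, and `cols` are illustrative names.

```python
import numpy as np

matrix = np.arange(32).reshape(8, 4)  # same values as arr above
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

paired = matrix[rows, cols]           # one element per (row, col) pair
block = matrix[np.ix_(rows, cols)]    # every listed row crossed with every listed column
chained = matrix[rows][:, cols]       # the two-step form from the cell above

print(paired.shape, block.shape)       # (4,) (4, 4)
print(np.array_equal(block, chained))  # True
```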