# Learning Outcomes

1. Associating and comparing fancy indexing with other indexing methods
1. Experimenting with fancy indexing for random sampling
1. Explaining the merits of fancy indexing

Previously, we saw how to access and modify portions of arrays using simple indices (e.g., `x[0]`), slices (e.g., `x[:5]`), and Boolean masks (e.g., `x[x > 0]`). In this section, we will look at another style of array indexing, known as *fancy indexing*. Fancy indexing is like simple indexing, except that we pass arrays of indices in place of single scalars. This lets us quickly access and modify complicated subsets of an array's values.

In [None]:
x[0]  # simple indexing: one scalar index returns one element (x is defined in the next section)


# Exploring Fancy Indexing

In [4]:
import numpy as np
rand = np.random.RandomState(42)

x = rand.randint(100, size=10)
print(x)

[51 92 14 71 60 20 82 86 74 74]


In [None]:
# Life without fancy indexing is miserable...
[x[3], x[7], x[2]]

# Life with fancy indexing is so fancy!
idx = np.array([3,7,2])
x[idx].shape

**Remark:** With fancy indexing, the shape of the result reflects the shape of the index array rather than the shape of the array being indexed.

In [2]:
idx = np.array([[3,7],
                [4,5]])
x[idx]

array([[71, 86],
       [60, 20]])

Fancy indexing also works in multiple dimensions. Consider the following array:

In [21]:
X = np.arange(16).reshape((4,4))
X

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [8]:
row = np.array([0,1,2])
col = np.array([2,1,3])

# Take a guess: what will this return?
X[row,col]

X[0,2]               # simple indexing: a single element
X[[0,1,2],[2,1,3]]   # fancy indexing: the pairs (0,2), (1,1), (2,3)

array([ 2,  5, 11])
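The index arrays are paired element by element: the first row index goes with the first column index, and so on. A minimal sketch that checks this against an explicit Python loop (the names `fancy` and `looped` are only for illustration):

```python
import numpy as np

X = np.arange(16).reshape((4, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Fancy indexing pairs row[k] with col[k] element by element
fancy = X[row, col]

# The same selection written as an explicit loop over (row, col) pairs
looped = np.array([X[r, c] for r, c in zip(row, col)])

print(fancy)                           # [ 2  5 11]
print(np.array_equal(fancy, looped))   # True
```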

In [9]:
row[:,np.newaxis] * col[np.newaxis, :] 

array([[0, 0, 0],
       [2, 1, 3],
       [4, 2, 6]])

In [12]:
X[row[:,np.newaxis], col[np.newaxis, :]]

array([[ 2,  1,  3],
       [ 6,  5,  7],
       [10,  9, 11]])
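The index arrays follow the same broadcasting rules as arithmetic: a `(3, 1)` column of row indices and a `(3,)` vector of column indices broadcast to a `(3, 3)` grid of index pairs. A short sketch that makes the broadcast explicit with `np.broadcast_arrays` (variable names are illustrative):

```python
import numpy as np

X = np.arange(16).reshape((4, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# A (3, 1) column of row indices and a (3,) vector of column indices
# broadcast together to a (3, 3) grid of (row, col) pairs
rows_b, cols_b = np.broadcast_arrays(row[:, np.newaxis], col)
print(rows_b.shape, cols_b.shape)  # (3, 3) (3, 3)

# Indexing with the explicitly broadcast arrays gives the same result
grid = X[row[:, np.newaxis], col]
print(np.array_equal(grid, X[rows_b, cols_b]))  # True

# Each output entry grid[i, j] is X[row[i], col[j]]
print(grid[1, 2] == X[row[1], col[2]])  # True
```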

In [None]:
X[[1,2],[1,3]]

**Question:** How can we use fancy indexing with two index vectors to copy the entire array?

In [22]:
X[np.arange(X.shape[0])[:, np.newaxis], np.arange(X.shape[1])[np.newaxis, :]]

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
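NumPy also provides `np.ix_`, which builds this open mesh of index arrays (the `[:, np.newaxis]` / `[np.newaxis, :]` pair) for you. A minimal sketch; note that the result is a new array rather than a view of `X`:

```python
import numpy as np

X = np.arange(16).reshape((4, 4))

# np.ix_ returns index arrays of shape (4, 1) and (1, 4)
r, c = np.ix_(np.arange(X.shape[0]), np.arange(X.shape[1]))
print(r.shape, c.shape)  # (4, 1) (1, 4)

full_copy = X[r, c]
print(np.array_equal(full_copy, X))   # True

# Fancy indexing returns a copy, so it does not share memory with X
print(np.shares_memory(full_copy, X))  # False
```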

**Question:** Using fancy indexing, select the even-indexed rows and odd-indexed columns from `X`.

In [24]:
X[np.arange(0, X.shape[0], 2)[:, np.newaxis], np.arange(1, X.shape[1], 2)[np.newaxis,:]]

array([[ 1,  3],
       [ 9, 11]])
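For a regular pattern like this one, plain slicing gives the same values. The difference is that slicing returns a view of `X`, while fancy indexing returns a copy. A short sketch comparing the two:

```python
import numpy as np

X = np.arange(16).reshape((4, 4))

# Fancy-indexing version from above
fancy = X[np.arange(0, X.shape[0], 2)[:, np.newaxis],
          np.arange(1, X.shape[1], 2)[np.newaxis, :]]

# Equivalent slicing version: start:stop:step on each axis
sliced = X[::2, 1::2]

print(np.array_equal(fancy, sliced))  # True

# Slicing returns a view (shares memory with X); fancy indexing returns a copy
print(np.shares_memory(sliced, X))  # True
print(np.shares_memory(fancy, X))   # False
```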

# Combined Indexing

In [None]:
print(X)

In [None]:
# Combine fancy and simple indices
X[2, [2,0,1]]

In [None]:
# Combine fancy indexing with slicing
X[1:, [2,0,1]]
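When a scalar index is combined with a fancy index, the scalar is broadcast against the index array. A minimal sketch showing that `X[2, [2,0,1]]` is the same as spelling out the row index for every element:

```python
import numpy as np

X = np.arange(16).reshape((4, 4))

# The scalar 2 is broadcast against [2, 0, 1],
# so this selects the pairs (2, 2), (2, 0), (2, 1)
combined = X[2, [2, 0, 1]]
explicit = X[[2, 2, 2], [2, 0, 1]]
print(combined)                            # [10  8  9]
print(np.array_equal(combined, explicit))  # True

# With a slice, the fancy index reorders the columns of every selected row
print(X[1:, [2, 0, 1]])
```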

In [None]:
# Combine fancy indexing with masking
mask = np.array([1,0,1,0], dtype=bool)   # keep columns 0 and 2
print(row)                               # row indices [0 1 2]
y = X[row[:, np.newaxis], mask]
y
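A Boolean mask used along one axis selects the positions where it is `True`, which are exactly the integer indices returned by `np.nonzero(mask)`. A sketch (with illustrative names `y_mask` and `y_int`) showing the two forms agree:

```python
import numpy as np

X = np.arange(16).reshape((4, 4))
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# The positions where the mask is True, as an integer index array
col_idx = np.nonzero(mask)[0]
print(col_idx)  # [0 2]

y_mask = X[row[:, np.newaxis], mask]
y_int = X[row[:, np.newaxis], col_idx]
print(np.array_equal(y_mask, y_int))  # True
print(y_mask)
```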

# Example: Selecting Random Points

In [None]:
mean = [0,0]
cov = [[1,2], 
       [2,5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()

plt.scatter(X[:,0], X[:,1])

We will use fancy indexing to select 20 random points. We do so by first choosing 20 random indices with no repeats, and then using these indices to select a portion of the original array.

In [None]:
indices = np.random.choice(X.shape[0], 20, replace=False)

In [None]:
indices

In [None]:
selection = X[indices,:]
selection.shape

Now, to see which points were selected, let's over-plot large circles at the locations of the selected points:

In [None]:
plt.scatter(X[:,0], X[:,1], alpha=0.3)
plt.scatter(selection[:,0], selection[:,1], facecolors='none', edgecolors='b', s=80);

This kind of strategy is often used to quickly partition datasets, as is needed in train/test splitting for model validation.
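A minimal sketch of such a train/test split, assuming a hypothetical toy dataset `data` of 100 two-dimensional points and an assumed test-set size of 20. It uses NumPy's `np.random.default_rng` generator rather than the `RandomState` object used above; shuffling all row indices once with `permutation` guarantees the two index sets do not overlap:

```python
import numpy as np

# Hypothetical toy dataset: 100 points with 2 features (same shape as X above)
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))

# Shuffle all row indices once, then carve them into two disjoint groups
perm = rng.permutation(data.shape[0])
n_test = 20                                # assumed test-set size
test_idx, train_idx = perm[:n_test], perm[n_test:]

# Fancy indexing pulls out the corresponding rows
test, train = data[test_idx], data[train_idx]
print(train.shape, test.shape)  # (80, 2) (20, 2)

# Every row lands in exactly one of the two sets
print(np.intersect1d(test_idx, train_idx).size)  # 0
```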

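# Modifying Arrays with Fancy Indexing

Just as fancy indexing can be used to access parts of an array, it can also be used to modify parts of an array. For example, we can set a chosen set of entries to a new value: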

In [None]:
x = np.arange(10)
i = np.array([2,1,8,4])
x[i] = 99
print(x)

We can use any assignment-type operator for this. For example:

In [None]:
x[i] -= 10
print(x)

Notice, though, that repeated indices with these operators can cause some potentially unexpected results. Consider the following:

In [None]:
x = np.zeros(10)
x[[0,0]] = [4,6]

# where did 4 go?
print(x)

In [None]:
# what would you expect x to be after increment?
i = [2,3,3,4,4,4]
x[i] += 1
x

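You might expect `x[3]` to end up as 2 and `x[4]` as 3, but both end up as 1. This is because `x[i] += 1` is shorthand for `x[i] = x[i] + 1`: the right-hand side `x[i] + 1` is evaluated once, and the result is then assigned to the indexed positions. Repeated indices are therefore assigned the same value several times rather than incremented several times.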
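A short sketch that spells out the two steps of `x[i] += 1` and checks that they give the same result as the augmented assignment (`x1`, `x2`, and `rhs` are illustrative names):

```python
import numpy as np

i = [2, 3, 3, 4, 4, 4]

# Augmented assignment with repeated indices
x1 = np.zeros(10)
x1[i] += 1

# The same thing written as two explicit steps
x2 = np.zeros(10)
rhs = x2[i] + 1   # evaluated once: six values, all equal to 1
x2[i] = rhs       # index 3 is assigned 1 twice, index 4 three times

print(np.array_equal(x1, x2))  # True
print(x1[:6])                  # [0. 0. 1. 1. 1. 0.]
```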

If you want repeated indices to accumulate, use the `at()` method of a ufunc, which applies the operation in place and unbuffered, once for every index:

In [None]:
x = np.zeros(10)
np.add.at(x, i, 1)
x
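With `np.add.at`, index 3 is incremented twice and index 4 three times. For the special case of counting how often each non-negative integer occurs, `np.bincount` produces the same counts in a single call. A minimal sketch comparing the two:

```python
import numpy as np

i = [2, 3, 3, 4, 4, 4]

# np.add.at performs an unbuffered add, so repeated indices accumulate
x = np.zeros(10)
np.add.at(x, i, 1)
print(x)  # [0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]

# Counting occurrences of non-negative integers directly
counts = np.bincount(i, minlength=10)
print(np.array_equal(x, counts))  # True
```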

# Example: Binning Data

You can also use these ideas to efficiently bin data by hand to create a histogram.

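## np.searchsorted

`np.searchsorted(a, v)` returns, for each value in `v`, the index at which it would have to be inserted into the sorted array `a` to keep `a` sorted. Applied to an array of bin edges, this tells us which bin each data point falls into.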

In [None]:
bins = np.arange(0,11,2)
x = np.array([1,3,5,6])
np.searchsorted(bins, x)
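The `side` parameter decides what happens when a value equals an edge exactly. With the default `side='left'`, the value 6 is placed before the edge 6; with `side='right'` it is placed after it. A short sketch:

```python
import numpy as np

bins = np.arange(0, 11, 2)   # [ 0  2  4  6  8 10]
x = np.array([1, 3, 5, 6])

# side='left' (the default): first index i with bins[i] >= value
print(np.searchsorted(bins, x))                # [1 2 3 3]

# side='right': first index i with bins[i] > value
print(np.searchsorted(bins, x, side='right'))  # [1 2 3 4]
```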

In [None]:
np.random.seed(42)
x = np.random.randn(100)

bins = np.linspace(-5,5,20)
counts = np.zeros_like(bins)

# find the appropriate bin for each x
i = np.searchsorted(bins,x)

# add 1 to each of these bins (unbuffered, so repeated bins accumulate)
np.add.at(counts, i, 1)

In [None]:
counts

In [None]:
# plot the results
plt.plot(bins, counts, linestyle=':')

Of course, it would be inconvenient to do this every time we want to plot a histogram. Matplotlib provides `plt.hist()`, which does the binning and plotting in one line:

In [None]:
plt.hist(x, bins)
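To check that the manual approach agrees with NumPy's own binning, here is a sketch under a few assumptions: 21 evenly spaced edges (20 bins) on `[-5, 5]`, fresh standard-normal data from `np.random.default_rng`, and every value lying inside the edges. Using `side='right'` and subtracting 1 maps each value to the half-open bin `[edges[k], edges[k+1])`, which is how `np.histogram` assigns values to all bins except the last:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.standard_normal(100)

# Assumed setup: 21 evenly spaced edges define 20 bins on [-5, 5]
edges = np.linspace(-5, 5, 21)

# searchsorted(..., side='right') - 1 gives k with edges[k] <= value < edges[k+1]
# (assumes every value lies inside [-5, 5), so no index falls out of range)
idx = np.searchsorted(edges, data, side='right') - 1
manual = np.zeros(len(edges) - 1, dtype=int)
np.add.at(manual, idx, 1)

expected, _ = np.histogram(data, edges)
print(np.array_equal(manual, expected))  # True
```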

In [None]:
print("NumPy routine:")
%timeit counts, edges = np.histogram(x,bins)

print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins,x),1)

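On a small input like this, our own routine can be faster than `np.histogram` (exact timings depend on your machine and NumPy version). What happens when the data array gets much larger?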

In [None]:
x = np.random.randn(1000000)
print("NumPy Routine:")
%timeit counts, edges = np.histogram(x, bins)

print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins,x),1)


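Questions of algorithmic efficiency are rarely simple: an algorithm that is efficient for large datasets is not always the best choice for small ones, and vice versa. Reasoning about this calls for **Big-O notation**, which we will discuss shortly.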