# Fancy Indexing in NumPy

Fancy indexing is a NumPy technique for accessing and modifying multiple array elements at once by passing arrays of integer indices. Simple indexing (e.g., `arr[0]`) selects a single element, slicing (`arr[:5]`) selects an evenly spaced range, and a Boolean mask (`arr[arr > 0]`) keeps the matching elements in their original order. Fancy indexing, by contrast, selects an arbitrary set of positions, in any order and with repeats allowed.

Notably, fancy indexing always returns a copy of the data, not a view, so changes to the resulting array do not affect the original array.

However, when a fancy index appears on the left-hand side of `=`, the operation becomes an in-place *set*: the values on the right-hand side are written directly into the original array.
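
A minimal sketch of both behaviors on a small illustrative array (the array and values are arbitrary choices). The last lines contrast fancy indexing with a basic slice, which returns a view:

```python
import numpy as np

x = np.arange(5)          # [0 1 2 3 4]

# Fancy indexing on the right-hand side returns a copy
y = x[[0, 2]]
y[0] = 100                # modifies only the copy
print(x)                  # [0 1 2 3 4]  (original unchanged)

# Fancy indexing on the left-hand side writes into the original array
x[[0, 2]] = 100
print(x)                  # [100   1 100   3   4]

# Contrast: a basic slice returns a view, so writes through it propagate
v = x[1:3]
v[:] = -1
print(x)                  # [100  -1  -1   3   4]
```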

NumPy distinguishes fancy indexing from accessing a single element of a multi-dimensional array:
- Accessing a single element: a tuple of integers, e.g. `arr[1, 2]`
- Fancy indexing: an array (or list) of integers, e.g. `arr[[1, 2]]`
    - The two ideas combine: passing one integer list per axis, as in `arr[row_list, col_list]`, selects the elements at `(row_list[0], col_list[0])`, `(row_list[1], col_list[1])`, and so on.
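
The difference between a tuple of integers and per-axis index lists can be seen on a small 3x4 array (chosen for illustration; it matches the `X` used later in this section):

```python
import numpy as np

X = np.arange(12).reshape((3, 4))

# A tuple of integers selects a single element; X[1, 2] is the same as X[(1, 2)]
print(X[1, 2])        # 6

# A single list is one integer array along axis 0, so it selects whole rows
print(X[[1, 2]])      # [[ 4  5  6  7]
                      #  [ 8  9 10 11]]

# One index list per axis pairs the entries up: elements (0, 2) and (1, 3)
rows = [0, 1]
cols = [2, 3]
print(X[rows, cols])  # [2 7]

# Wrapping each list in extra brackets adds a dimension to the result
print(X[[rows], [cols]].shape)  # (1, 2)
```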

## Exploring Fancy Indexing

Fancy indexing involves passing an array (or list) of indices to select multiple elements. The result takes the shape of the index array, not the shape of the array being indexed.


```python
import numpy as np

rand = np.random.RandomState(42)
x = rand.randint(100, size=10)
print(x)  # Output: [51 92 14 71 60 20 82 86 74 74]

# Accessing specific elements using a list of indices
ind = [3, 7, 4]
print(x[ind])  # Output: [71 86 60]

# The shape of the result matches the shape of the index array
ind = np.array([[3, 7],
                [4, 5]])
print(x[ind])
# Output: [[71 86]
#          [60 20]]
```

```
[51 92 14 71 60 20 82 86 74 74]
[71 86 60]
[[71 86]
 [60 20]]
```

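**Fancy Indexing in Multiple Dimensions:**
With a multi-dimensional array, you can pass one index array per dimension. The index arrays are broadcast against each other, and each position in the broadcast result selects one element of the original array.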


```python
X = np.arange(12).reshape((3, 4))
print(X)
# Output: [[ 0  1  2  3]
#          [ 4  5  6  7]
#          [ 8  9 10 11]]

row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Selects X[0, 2], X[1, 1], X[2, 3]
print(X[row, col])
# Output: [ 2  5 11]

# row[:, np.newaxis] has shape (3, 1) and col has shape (3,).
# Broadcasting the two gives a (3, 3) grid of indices, so the result
# is a copy of shape (3, 3) with result[i, j] == X[row[i], col[j]].
print(X[row[:, np.newaxis], col])
# Output: [[ 2  1  3]
#          [ 6  5  7]
#          [10  9 11]]
```

```
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[ 2  5 11]
[[ 2  1  3]
 [ 6  5  7]
 [10  9 11]]
```

The return value always reflects the broadcasted shape of the indices, not the shape of the array being indexed.
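
NumPy's `np.ix_` helper builds the same kind of broadcastable index grid from 1-D index arrays. A short sketch, reusing the `X`, `row`, and `col` values from the example above, showing that the result shape is exactly the broadcast shape of the indices (`np.broadcast_shapes` requires NumPy 1.20 or later):

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# np.ix_ reshapes the 1-D index arrays into an "open mesh": shapes (3, 1) and (1, 3)
r, c = np.ix_(row, col)
print(r.shape, c.shape)   # (3, 1) (1, 3)

# Indexing with the mesh matches X[row[:, np.newaxis], col]
grid = X[np.ix_(row, col)]
print(np.array_equal(grid, X[row[:, np.newaxis], col]))  # True

# The result shape is the broadcast shape of the indices, not X.shape
print(np.broadcast_shapes(r.shape, c.shape))             # (3, 3)
```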

## Combined Indexing

Fancy indexing can be combined with other indexing schemes for more complex selections:


```python
print(X)
# Output: [[ 0  1  2  3]
#          [ 4  5  6  7]
#          [ 8  9 10 11]]

# Combined with simple indexing (select row 2, then fancy index columns)
print(X[2, [2, 0, 1]])
# Output: [10  8  9]

# Combined with slicing (select rows from index 1 onwards, then fancy index columns)
print(X[1:, [2, 0, 1]])
# Output: [[ 6  4  5]
#          [10  8  9]]

# Combined with masking (select rows using fancy indexing, then columns using a Boolean mask)
# Reuses `row` from the previous example
mask = np.array([1, 0, 1, 0], dtype=bool)
print(X[row[:, np.newaxis], mask])
# Output: [[ 0  2]
#          [ 4  6]
#          [ 8 10]]
```

```
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[10  8  9]
[[ 6  4  5]
 [10  8  9]]
[[ 0  2]
 [ 4  6]
 [ 8 10]]
```
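
The NumPy indexing documentation describes a Boolean index used alongside integer arrays as equivalent to inserting `mask.nonzero()` in its place. A small check of that equivalence for the mask above (the arrays are redefined so the snippet runs on its own):

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# Integer positions of the True entries in the mask
cols = np.nonzero(mask)[0]
print(cols)                    # [0 2]

a = X[row[:, np.newaxis], mask]   # Boolean mask for the columns
b = X[row[:, np.newaxis], cols]   # equivalent integer index array
print(np.array_equal(a, b))    # True
```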


## Example: Selecting Random Points

Fancy indexing is a convenient way to select a subset of rows from a matrix, a pattern common in data sampling and in splitting data into separate sets.


```python
# Example: selecting 20 random points from a 100x2 dataset
mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X_data = rand.multivariate_normal(mean, cov, 100)
print(X_data.shape)  # (100, 2)

# Choose 20 random row indices without replacement.
# Note: np.random.choice draws from NumPy's global generator (not from `rand`),
# which is not seeded here, so the printed indices differ from run to run.
indices = np.random.choice(X_data.shape[0], 20, replace=False)
print(indices)

# Use fancy indexing to select the corresponding rows
selection = X_data[indices]
print(selection.shape)  # (20, 2)
```

```
(100, 2)
[13 70 78 15 26  0 25 59 79 52 14 90  2 99 27 34  9  6 48 58]
(20, 2)
```
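
The same pattern extends to splitting a dataset. A minimal sketch, assuming a simple random split with NumPy's `Generator` API; the seed, the stand-in dataset, and the test-set size of 20 are illustrative choices, not part of the original example:

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded Generator (illustrative seed)
X_data = rng.normal(size=(100, 2))   # stand-in dataset: 100 rows x 2 columns

# Shuffle all row indices once, then slice the permutation into two disjoint parts
perm = rng.permutation(X_data.shape[0])
n_test = 20                          # hypothetical test-set size
test_idx, train_idx = perm[:n_test], perm[n_test:]

# Fancy indexing copies the selected rows into new arrays
X_test = X_data[test_idx]
X_train = X_data[train_idx]
print(X_train.shape, X_test.shape)   # (80, 2) (20, 2)
```

Slicing a single permutation guarantees that no row lands in both parts.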


## Modifying Values with Fancy Indexing

Fancy indexing can also be used to modify parts of an array.


```python
x = np.arange(10)
i = np.array([2, 1, 8, 4])

# Assign a value to elements at specified indices
x[i] = 99
print(x)
# Output: [ 0 99 99  3 99  5  6  7 99  9]

# Use assignment-type operators
x[i] -= 10
print(x)
# Output: [ 0 89 89  3 89  5  6  7 89  9]
```

```
[ 0 99 99  3 99  5  6  7 99  9]
[ 0 89 89  3 89  5  6  7 89  9]
```
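
The right-hand side does not have to be a single scalar. A minimal sketch, with arbitrary illustrative values, assigning one value per index (values are matched to the indices in order):

```python
import numpy as np

x = np.arange(10)
i = np.array([2, 1, 8, 4])

# The right-hand side is broadcast to the shape of x[i]; here it supplies one value per index
x[i] = [20, 10, 80, 40]
print(x)   # [ 0 10 20  3 40  5  6  7 80  9]
```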


**Important Note on Repeated Indices:**
When using repeated indices for assignment, only the last assignment for a given index will persist.

```python
x = np.zeros(10)
x[[0, 0]] = [4, 6]
print(x)
# Output: [6. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
# x[0] = 4 is overwritten by x[0] = 6.
```

```
[6. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
```

The same pitfall affects augmented assignment with repeated indices. `x[i] += 1` is evaluated as `x[i] = x[i] + 1`: NumPy first gathers `x[i]` into a temporary copy (one entry per index, repeats included), adds 1 to every entry of that copy, and then writes the results back into `x`. A repeated index is simply written the same value more than once, so the increments do not accumulate.

For cumulative operations (e.g., adding to elements at repeated indices), use the `at()` method of ufuncs (available since NumPy 1.8). It applies the operation unbuffered and in place, once for every occurrence of an index.
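
The gather, modify, and write-back sequence described above can be spelled out step by step. This is a conceptual sketch of what the augmented assignment amounts to for the index list `[2, 3, 3, 4, 4, 4]`, not NumPy's internal implementation:

```python
import numpy as np

x = np.zeros(10)
i = [2, 3, 3, 4, 4, 4]

# Roughly what `x[i] += 1` does, written as separate steps
tmp = x[i]      # gather: a copy with one entry per index, repeats included
tmp = tmp + 1   # modify the copy: every entry becomes 1.0
x[i] = tmp      # write back: x[3] and x[4] are overwritten with 1.0 several times
print(x)        # [0. 0. 1. 1. 1. 0. 0. 0. 0. 0.]
```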

```python
x = np.zeros(10)
i = [2, 3, 3, 4, 4, 4]

# Incorrect cumulative behavior with augmented assignment
x[i] += 1
print(x)
# Output: [0. 0. 1. 1. 1. 0. 0. 0. 0. 0.]
# x[3] is 1, not 2; x[4] is 1, not 3. This is because x[i] = x[i] + 1 is evaluated, then assigned.

# Correct cumulative behavior using np.add.at()
x = np.zeros(10)
np.add.at(x, i, 1)
print(x)
# Output: [0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]
# Now x[3] is 2 (added twice) and x[4] is 3 (added three times).
```

```
[0. 0. 1. 1. 1. 0. 0. 0. 0. 0.]
[0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]
```
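
`np.add.at` also accepts one value per index occurrence, and other ufuncs provide the same `at()` method. A sketch with illustrative weights; `np.bincount` appears only as a cross-check, since for 1-D non-negative integer indices it computes the same weighted sums:

```python
import numpy as np

i = [2, 3, 3, 4, 4, 4]
w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # one illustrative value per index occurrence

# np.add.at adds w[k] to x[i[k]] for every k, so repeated indices accumulate
x = np.zeros(10)
np.add.at(x, i, w)
print(x)                       # [ 0.  0.  1.  5. 15.  0.  0.  0.  0.  0.]

# Cross-check: np.bincount sums the weights that share an index
y = np.bincount(i, weights=w, minlength=10)
print(np.array_equal(x, y))    # True

# Another ufunc with .at(): keep the largest value seen at each index
m = np.zeros(10)
np.maximum.at(m, i, w)
print(m)                       # [0. 0. 1. 3. 6. 0. 0. 0. 0. 0.]
```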


## Example: Binning Data

`np.add.at()` can be used to bin data, effectively building a histogram by hand.


```python
np.random.seed(42)
x_data = np.random.randn(100)

# Define 20 evenly spaced bin edges from -5 to 5
bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)
print(counts)

# Find the appropriate bin for each x value
i = np.searchsorted(bins, x_data)

# Add 1 to each corresponding bin count; repeated bin indices accumulate.
# In np.add.at(a, indices, b), `b` must be broadcastable to the shape of a[indices].
np.add.at(counts, i, 1)
print(counts)

# `counts` now holds a histogram of x_data
# (plotting it as a step plot shows the familiar histogram shape)
```

```
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0.  0.  0.  0.  0.  1.  3.  7.  9. 23. 22. 17. 10.  7.  1.  0.  0.  0.
  0.  0.]
```

While NumPy provides optimized functions like `np.histogram` (which `matplotlib.pyplot.hist` uses internally), understanding the underlying mechanism with `np.add.at` allows for more flexible, custom data manipulation. For small datasets, the manual `np.add.at` approach can sometimes be faster than `np.histogram`, while for large datasets `np.histogram` generally scales better. Which one wins depends on the data size, so it is worth timing both on representative data: an approach that is faster on small inputs is not necessarily faster on large ones.
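
One practical difference when comparing the two is how bins are defined: `np.histogram` treats the 20 values in `bins` as the edges of 19 half-open bins, while the `np.searchsorted` approach above produces 20 counts. A sketch, recomputing the example data, that lines the two up:

```python
import numpy as np

np.random.seed(42)
x_data = np.random.randn(100)
bins = np.linspace(-5, 5, 20)

# Manual binning, as in the example above
counts = np.zeros_like(bins)
np.add.at(counts, np.searchsorted(bins, x_data), 1)

# np.histogram treats `bins` as 20 edges, i.e. 19 bins [edge_k, edge_k+1)
hist, edges = np.histogram(x_data, bins=bins)
print(len(hist), len(counts))             # 19 20

# searchsorted (side='left') maps a value strictly between edge_k and edge_k+1
# to index k + 1, so counts[1:] lines up with hist when no value sits exactly on an edge
print(np.array_equal(hist, counts[1:]))   # True
```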
