# NumPy Notes

## NumPy Introduction

First things first: we import the NumPy library with an ``import`` statement. By convention, ``np`` is used as the alias for NumPy. In the same cell we also check the installed NumPy version.

In [3]:
import numpy as np
np.__version__

'1.18.1'

In IPython or Jupyter, you can always display NumPy's built-in documentation via:

In [4]:
np?

### Fixed Type Arrays in Python

In [12]:
import array

L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here ``'i'`` is a type code indicating the contents are integers.

Much more useful, however, is the ``ndarray`` object of the NumPy package.

While Python's ``array`` object provides efficient storage of array-based data, NumPy adds to this efficient *operations* on that data.
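
As a small illustration of that difference (a sketch only; the variable names are chosen just for this example), multiplying a built-in ``array`` by an integer repeats its contents the way a list does, while the same expression on an ``ndarray`` is applied to each element:

```python
import array
import numpy as np

values = list(range(5))
py_array = array.array('i', values)   # fixed-type array from the standard library
np_array = np.array(values)           # NumPy ndarray

# For array.array, * means sequence repetition, as with lists
repeated = py_array * 2
# For ndarray, * is element-wise multiplication
doubled = np_array * 2

print(repeated)  # array('i', [0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
print(doubled)   # [0 2 4 6 8]
```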

### Creating Arrays

#### Creating Array From Python List

In [13]:
my_list = [-1,0,1]
my_list, type(my_list)

([-1, 0, 1], list)

In [14]:
my_array = np.array(my_list)
my_array, type(my_array)

(array([-1,  0,  1]), numpy.ndarray)

If we want to create a 2D array:

In [15]:
my_matrix = [[1,2,3], [4,5,6], [7,8,9]]
my_matrix, type(my_matrix)

([[1, 2, 3], [4, 5, 6], [7, 8, 9]], list)

In [17]:
my_2darray = np.array(my_matrix)
my_2darray, type(my_2darray)

(array([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]),
 numpy.ndarray)

We can also set the data type while creating the array from a Python list. In a Jupyter notebook, you can always press Shift + Tab to see the signature and usage of a function.

In [18]:
np.array(object = [1,2,3,4], dtype ='float32')

array([1., 2., 3., 4.], dtype=float32)
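
All elements of a NumPy array share a single type. As a brief sketch of what that implies (the variable names here are illustrative), if the input mixes integers and floats and no ``dtype`` is given, NumPy upcasts to floating point, and an existing array can be converted explicitly with ``astype``:

```python
import numpy as np

# One float among integers: the whole array becomes floating point
mixed = np.array([3.14, 4, 2, 3])
print(mixed.dtype)      # float64

# Explicit conversion; astype returns a new array and leaves ints unchanged
ints = np.array([1, 2, 3, 4])
as_float = ints.astype('float32')
print(as_float.dtype)   # float32
```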

Let's create a nested (two-dimensional) array from a Python list comprehension:

In [19]:
np.array([range(i, i+3) for i in [2,4,6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

We can also create an array from a tuple:

In [20]:
my_tuple = (-1,0,1)

my_array_t = np.array(my_tuple)
my_array_t, type(my_array_t)

(array([-1,  0,  1]), numpy.ndarray)

### Creating Array with Built-In Functions

In [21]:
# Create a length-20 integer array filled with zeros
np.zeros(shape = 20, dtype = int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [22]:
# Create a 3x5 integer array filled with ones

np.ones(shape = (3,5), dtype = int)

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [23]:
# Create a 3x5 array filled with 3.14

np.full(shape = (3,5), fill_value = 3.14, dtype = float)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [24]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)

np.arange(start = 0, stop = 20, step = 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [25]:
# Create an array of ten values evenly spaced between 0 and 1

np.linspace(start = 0, stop = 1, num = 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [27]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random(size = (3,3))

array([[0.94528738, 0.89470454, 0.40127737],
       [0.07306535, 0.62175985, 0.1752419 ],
       [0.07365304, 0.80166764, 0.10683773]])

In [28]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1

np.random.normal(loc = 0, scale = 1, size = (3,3))

array([[ 0.37489027, -0.40931694, -1.33786761],
       [-0.84980752, -0.36384524, -1.33641663],
       [-0.29050975, -0.37910734, -1.26390567]])

In [29]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(low = 0, high = 10, size = (3,3))

array([[5, 3, 5],
       [7, 6, 5],
       [4, 1, 5]])
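
The random outputs above change on every run. If reproducible values are needed, one option is a seeded generator. The sketch below assumes ``np.random.RandomState`` and an arbitrary seed of 0; two generators created with the same seed yield the same values:

```python
import numpy as np

# Two generators created with the same (arbitrary) seed
rng_a = np.random.RandomState(0)
rng_b = np.random.RandomState(0)

first = rng_a.randint(low=0, high=10, size=(3, 3))
second = rng_b.randint(low=0, high=10, size=(3, 3))

# Same seed, same sequence of random integers
print(np.array_equal(first, second))  # True
```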

In [31]:
# Create a 3x3 identity matrix
np.eye(N = 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [36]:
# changing the index of the diagonal in the identity matrix

# k = 1 shifts the diagonal one step up: it starts at column number 1
print(np.eye(N = 4, M = 4, k = 1))
print("-----------")
# k = 2 shifts the diagonal two steps up: it starts at column number 2
print(np.eye(N = 4, M = 4, k = 2 ))

[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 0. 0.]]
-----------
[[0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


## NumPy Array Attributes

Each array has attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total size of the array):

In [55]:
x1 = np.random.randint(low = 0, high = 10, size = 6) #one-dimensional
x2 = np.random.randint(low = 0, high = 10, size= (3,4)) #2-dimensional
x3 = np.random.randint(low = 0, high = 10, size = (3,4,5) ) #3-dimensional

In [56]:
print("x3 ndim:", x3.ndim)
print("x2 ndim:", x2.ndim)
print("x1 ndim:", x1.ndim)

x3 ndim: 3
x2 ndim: 2
x1 ndim: 1


Another useful attribute is ``dtype``, the data type of the array:

In [57]:
print("x3 dtype:", x3.dtype)
print("x2 dtype:", x2.dtype)
print("x1 dtype:", x1.dtype)

x3 dtype: int64
x2 dtype: int64
x1 dtype: int64
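
The cells above print only ``ndim`` and ``dtype``. As a short sketch of the other attributes, here is a zero-filled array of the same shape as ``x3`` (used instead of random values so the numbers are predictable); ``itemsize`` and ``nbytes`` are two further attributes that report the bytes per element and the total bytes:

```python
import numpy as np

# Same shape as x3 above, but filled with zeros so the output is deterministic
x3 = np.zeros(shape=(3, 4, 5), dtype=np.int64)

print("x3 shape:", x3.shape)        # (3, 4, 5)
print("x3 size:", x3.size)          # 60 elements in total
print("x3 itemsize:", x3.itemsize)  # 8 bytes per int64 element
print("x3 nbytes:", x3.nbytes)      # 480 bytes = size * itemsize
```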


### Reshaping Arrays

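We can change the shape of an array using ``reshape``. Please note that it does not modify the array in place: it returns a new array object, and ``x1`` itself keeps its original shape.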

In [67]:
# reshape into 2 rows and 3 columns
x1.reshape(2,3)

array([[1, 0, 9],
       [6, 0, 3]])

In [69]:
# same result, passing the shape as a tuple
x1.reshape((2,3))

array([[1, 0, 9],
       [6, 0, 3]])

In [65]:
x1.shape

(6,)

In [66]:
x1.reshape(2,3).shape

(2, 3)
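
Two details of ``reshape`` are worth a quick sketch (using a small ``np.arange`` array as an illustrative stand-in for ``x1``): one dimension can be given as ``-1`` so NumPy infers it, and the total number of elements must stay the same:

```python
import numpy as np

x = np.arange(6)

# -1 asks NumPy to infer that dimension from the array size
y = x.reshape(2, -1)
print(y.shape)  # (2, 3)
print(x.shape)  # (6,)  the original array is unchanged

# The new shape must hold exactly the same number of elements
reshape_failed = False
try:
    x.reshape(4, 2)
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```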

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.
This can be done with the ``reshape`` method, or more easily done by making use of the ``newaxis`` keyword within a slice operation:

In [70]:
# row vector via newaxis
x1[np.newaxis, :]

array([[1, 0, 9, 6, 0, 3]])

In [71]:
# column vector via newaxis
x1[:, np.newaxis]

array([[1],
       [0],
       [9],
       [6],
       [0],
       [3]])

## Array Slicing

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.

In [75]:
x = np.arange(start = 10, stop = 100, step = 9)
x

array([10, 19, 28, 37, 46, 55, 64, 73, 82, 91])

In [77]:
# first five elements
x[:5]

array([10, 19, 28, 37, 46])

In [78]:
x[4:7]  # middle sub-array

array([46, 55, 64])

In [79]:
x[1::2]  # every other element, starting at index 1

array([19, 37, 55, 73, 91])

In [80]:
x[::-1]  # all elements, reversed

array([91, 82, 73, 64, 55, 46, 37, 28, 19, 10])

In [81]:
x[5::-2]  # reversed every other from index 5

array([55, 37, 19])

### Multi-Dimensional Subarrays

In [83]:
x2 = np.random.randint(low = 10, high = 100, size = (3,4))
x2

array([[25, 25, 85, 20],
       [78, 91, 98, 84],
       [87, 77, 20, 66]])

In [84]:
x2[:2, :3]  # two rows, three columns

array([[25, 25, 85],
       [78, 91, 98]])

In [85]:
x2[::-1, ::-1] #subarray dimensions can even be reversed together:

array([[66, 20, 77, 87],
       [84, 98, 91, 78],
       [20, 85, 25, 25]])

#### Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array.
This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

In [86]:
print(x2[:,0])

[25 78 87]


In [87]:
x2

array([[25, 25, 85, 20],
       [78, 91, 98, 84],
       [87, 77, 20, 66]])
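
The same pattern works for rows. A minimal sketch, using a small deterministic array named ``grid`` (an illustrative name) instead of the random ``x2``:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

first_column = grid[:, 0]   # every row, column 0
first_row = grid[0, :]      # row 0, every column
same_row = grid[0]          # for rows, the trailing empty slice can be omitted

print(first_column)  # [0 4 8]
print(first_row)     # [0 1 2 3]
print(same_row)      # [0 1 2 3]
```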

### Subarrays as no-copy views

One important, and extremely useful, thing to know about array slices is that they return *views* rather than *copies* of the array data.
This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.
Consider our two-dimensional array from before:

In [89]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[25 25]
 [78 91]]


Now if we modify this subarray, we'll see that the original array is changed! Observe:

In [90]:
x2_sub [0,0] = 99
x2_sub

array([[99, 25],
       [78, 91]])

In [91]:
x2

array([[99, 25, 85, 20],
       [78, 91, 98, 84],
       [87, 77, 20, 66]])

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
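
To make the contrast with Python lists concrete, here is a small side-by-side sketch (variable names are illustrative). ``np.shares_memory`` is used only to confirm that the slice and the original array use the same buffer:

```python
import numpy as np

# List slicing copies: changing the slice leaves the list alone
py_list = [1, 2, 3, 4]
list_slice = py_list[:2]
list_slice[0] = 99
print(py_list)   # [1, 2, 3, 4]

# Array slicing returns a view: changing the slice changes the array
arr = np.array([1, 2, 3, 4])
arr_slice = arr[:2]
arr_slice[0] = 99
print(arr)       # [99  2  3  4]

print(np.shares_memory(arr, arr_slice))  # True
```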

### Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the ``copy()`` method:

In [92]:
x2_sub_copy = x2[:2, :2].copy()
x2_sub_copy

array([[99, 25],
       [78, 91]])

If we now modify this subarray, the original array is not touched:

In [93]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42 25]
 [78 91]]


In [94]:
x2

array([[99, 25, 85, 20],
       [78, 91, 98, 84],
       [87, 77, 20, 66]])

## Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.

In [95]:
x = np.array([1,2,3])
y = np.array([3,2,1])
np.concatenate([x,y])

array([1, 2, 3, 3, 2, 1])

In [96]:
z = [99, 99, 99]

In [97]:
np.concatenate([x,y,z])

array([ 1,  2,  3,  3,  2,  1, 99, 99, 99])

In [103]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [99]:
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [102]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

In [104]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [105]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
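
As a sketch of why the stacking helpers are convenient for mixed dimensions: ``np.concatenate`` refuses a one-dimensional and a two-dimensional input, so the one-dimensional array first has to be turned into a row, which ``np.vstack`` does for us:

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# concatenate requires all inputs to have the same number of dimensions
concat_failed = False
try:
    np.concatenate([x, grid])
except ValueError as err:
    concat_failed = True
    print("concatenate failed:", err)

# Adding a leading axis by hand gives the same result as vstack
stacked = np.concatenate([x[np.newaxis, :], grid])
print(np.array_equal(stacked, np.vstack([x, grid])))  # True
```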

### Splitting of Arrays

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.  For each of these, we can pass a list of indices giving the split points:

In [109]:
grid = np.arange(16).reshape(4,4)
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [110]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [111]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
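
The cells above use ``np.vsplit`` and ``np.hsplit``. For completeness, a minimal sketch of plain ``np.split`` on a one-dimensional input (the values are illustrative): *N* split points produce *N + 1* subarrays:

```python
import numpy as np

x = [1, 2, 3, 99, 99, 3, 2, 1]

# Two split points (indices 3 and 5) give three subarrays
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)  # [1 2 3] [99 99] [3 2 1]
```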


## Universal Functions (UFuncs)

For many types of operations, NumPy provides a convenient interface to statically typed, compiled routines. These are known as *vectorized* operations.
A vectorized operation is performed by simply applying an operation to the array, which is then applied to each element.
This vectorized approach pushes the loop into the compiled layer that underlies NumPy, leading to much faster execution.

In [114]:
np.arange(start= 0, stop = 100, step = 5) / np.arange(start = 1, stop = 21)

array([0.        , 2.5       , 3.33333333, 3.75      , 4.        ,
       4.16666667, 4.28571429, 4.375     , 4.44444444, 4.5       ,
       4.54545455, 4.58333333, 4.61538462, 4.64285714, 4.66666667,
       4.6875    , 4.70588235, 4.72222222, 4.73684211, 4.75      ])
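
To show what "pushing the loop into the compiled layer" replaces, here is a rough sketch that compares an explicit Python loop with the equivalent vectorized expression. The helper ``reciprocals_loop`` is a hypothetical function written only for this comparison; both approaches give the same values, and only the place where the loop runs differs:

```python
import numpy as np

def reciprocals_loop(values):
    """Compute 1/x for each element with an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.arange(1, 6)

looped = reciprocals_loop(values)   # loop runs in the Python interpreter
vectorized = 1.0 / values           # loop runs inside NumPy's compiled code

print(np.allclose(looped, vectorized))  # True
```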

Ufunc operations are not limited to one-dimensional arrays; they can act on multi-dimensional arrays as well:

In [119]:
x = np.arange(9).reshape((3,3))
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])

Arithmetic operations are simply convenient wrappers around specific functions built into NumPy; for example, the ``+`` operator is a wrapper for the ``add`` function:

In [121]:
x = np.arange(4)

x

array([0, 1, 2, 3])

In [122]:
np.add(x, 2)

array([2, 3, 4, 5])

The following table lists the arithmetic operators implemented in NumPy:

| Operator      | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|
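
As a quick check of the table (a sketch with an illustrative array), each operator gives the same result as its ufunc, and operators can be strung together in ordinary expressions:

```python
import numpy as np

x = np.arange(4)

# Each operator and its ufunc produce identical arrays
print(np.array_equal(x + 5, np.add(x, 5)))            # True
print(np.array_equal(x // 2, np.floor_divide(x, 2)))  # True
print(np.array_equal(x % 2, np.mod(x, 2)))            # True
print(np.array_equal(-x, np.negative(x)))             # True

# Operators follow the usual Python precedence: ** binds tighter than unary -
combined = -(0.5 * x + 1) ** 2
print(combined)  # values: -1, -2.25, -4, -6.25
```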

### Absolute value & logarithm

Just as NumPy understands Python's built-in arithmetic operators, it also understands Python's built-in absolute value function:

In [123]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

array([2, 1, 0, 1, 2])

The basic ``np.log`` gives the natural logarithm; if you prefer to compute the base-2 logarithm or the base-10 logarithm, these are available as well:

In [124]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


### Min, Max, Sum and More

In [125]:
L = np.random.random(100)
L

array([0.38705754, 0.93944477, 0.27134891, 0.63255501, 0.04628518,
       0.14973923, 0.48507855, 0.76775681, 0.90570727, 0.41731767,
       0.05944041, 0.46144975, 0.58221084, 0.16348261, 0.23780907,
       0.19655713, 0.46642361, 0.57046642, 0.77880885, 0.55141938,
       0.89592287, 0.69722859, 0.3431028 , 0.53122593, 0.21296182,
       0.56168328, 0.69441964, 0.64849308, 0.25851099, 0.32164705,
       0.09728801, 0.51791848, 0.01697   , 0.4631113 , 0.48379616,
       0.72458402, 0.71971628, 0.85408064, 0.5197291 , 0.91128806,
       0.46675743, 0.52490857, 0.69673455, 0.1528841 , 0.42194489,
       0.1370476 , 0.92495295, 0.87361389, 0.13762251, 0.50310214,
       0.85560112, 0.57322443, 0.23589781, 0.2090864 , 0.78067446,
       0.20309565, 0.2388742 , 0.45796451, 0.6154324 , 0.09993897,
       0.56628973, 0.86910354, 0.8720503 , 0.92554922, 0.65472271,
       0.91161756, 0.37111821, 0.99813347, 0.17832976, 0.37775509,
       0.06975963, 0.74837432, 0.92039893, 0.99192146, 0.88488

In [126]:
L.sum()

52.315550066882906

In [127]:
L.mean()

0.5231555006688291

In [128]:
L.std()

0.28303723551380644

In [129]:
L.argmin() # find index of min. value

32

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
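
A short sketch of two points from the table, using small illustrative arrays: the NaN-safe versions ignore missing values that would otherwise propagate, and most aggregates accept an ``axis`` argument for multi-dimensional arrays:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

print(np.sum(data))      # nan, because NaN propagates through the sum
print(np.nansum(data))   # 4.0, the NaN entry is ignored
print(np.nanmean(data))  # 2.0

# Aggregating along an axis of a 2x3 array
m = np.arange(6).reshape(2, 3)
column_sums = m.sum(axis=0)  # collapse the rows: one value per column
row_maxima = m.max(axis=1)   # collapse the columns: one value per row
print(column_sums)  # [3 5 7]
print(row_maxima)   # [2 5]
```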

## Broadcasting

Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes.

![alt text](broadcasting.png "Title")


### Rules of Broadcasting

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

- Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is *padded* with ones on its leading (left) side.
- Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

To make these rules clear, let's consider a few examples in detail.

In [136]:
x1 = np.arange(10)
x1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [137]:
x1[0:5] = 500
x1

array([500, 500, 500, 500, 500,   5,   6,   7,   8,   9])

In [140]:
x2 = np.ones(shape = (4,4))
x2

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [142]:
x2[0] = 300
x2

array([[300., 300., 300., 300.],
       [  1.,   1.,   1.,   1.],
       [  1.,   1.,   1.,   1.],
       [  1.,   1.,   1.,   1.]])

In [145]:
x3 = np.arange(4)
x3

array([0, 1, 2, 3])

In [146]:
x2 + x3

array([[300., 301., 302., 303.],
       [  1.,   2.,   3.,   4.],
       [  1.,   2.,   3.,   4.],
       [  1.,   2.,   3.,   4.]])
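
The three rules can be traced on two more small cases. This is a sketch with illustrative arrays: in the first, both operands are stretched; in the second, Rule 3 applies and NumPy raises an error:

```python
import numpy as np

a = np.arange(3).reshape((3, 1))  # shape (3, 1)
b = np.arange(3)                  # shape (3,)

# Rule 1: b is padded to shape (1, 3)
# Rule 2: a is stretched to (3, 3) and b is stretched to (3, 3)
result = a + b
print(result.shape)  # (3, 3)

M = np.ones((3, 2))               # shape (3, 2)
# Rule 1: b is padded to (1, 3); Rule 2: b is stretched to (3, 3)
# Rule 3: the last dimensions are 2 and 3, neither is 1, so this fails
broadcast_failed = False
try:
    M + b
except ValueError as err:
    broadcast_failed = True
    print("incompatible shapes:", err)
```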

### Broadcasting in Action: Centering An Array

One commonly seen example is when centering an array of data.
Imagine you have an array of 10 observations, each of which consists of 3 values.
Using the standard convention, we'll store this in a $10 \times 3$ array:

In [148]:
X = np.random.random((10,3))
X

array([[0.21620833, 0.90074824, 0.6937325 ],
       [0.03065879, 0.47423699, 0.31591495],
       [0.12147804, 0.37163465, 0.00689897],
       [0.20945451, 0.70615502, 0.8753056 ],
       [0.57024796, 0.56538188, 0.47283459],
       [0.15356004, 0.06012537, 0.2125714 ],
       [0.47733279, 0.32383456, 0.65645204],
       [0.64224768, 0.23556631, 0.94860206],
       [0.66913403, 0.30285205, 0.41827594],
       [0.65049663, 0.80109063, 0.92125324]])

We can compute the mean of each feature using the ``mean`` aggregate across the first dimension:

In [153]:
Xmean = X.mean(axis = 0)
Xmean

array([0.37408188, 0.47416257, 0.55218413])

And now we can center the ``X`` array by subtracting the mean (this is a broadcasting operation):

In [154]:
X_centered = X - Xmean 
X_centered

array([[-1.57873553e-01,  4.26585674e-01,  1.41548373e-01],
       [-3.43423093e-01,  7.44198832e-05, -2.36269181e-01],
       [-2.52603843e-01, -1.02527921e-01, -5.45285160e-01],
       [-1.64627365e-01,  2.31992448e-01,  3.23121469e-01],
       [ 1.96166080e-01,  9.12193142e-02, -7.93495395e-02],
       [-2.20521844e-01, -4.14037196e-01, -3.39612730e-01],
       [ 1.03250914e-01, -1.50328014e-01,  1.04267913e-01],
       [ 2.68165802e-01, -2.38596263e-01,  3.96417934e-01],
       [ 2.95052149e-01, -1.71310518e-01, -1.33908190e-01],
       [ 2.76414753e-01,  3.26928056e-01,  3.69069111e-01]])

In [155]:
X_centered.mean(0)

array([-6.66133815e-17,  8.88178420e-17,  7.77156117e-17])
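
The centered means are zero to within floating-point precision, which ``np.allclose`` can confirm. The sketch below also shows one way to center each *row* instead, assuming ``keepdims=True`` to keep the row means as a column of shape ``(10, 1)`` so that broadcasting lines up (the seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))

# Center each column: (10, 3) minus (3,) broadcasts across the rows
col_centered = X - X.mean(axis=0)
print(np.allclose(col_centered.mean(axis=0), 0))  # True

# Center each row: keepdims=True gives shape (10, 1) instead of (10,)
row_means = X.mean(axis=1, keepdims=True)
row_centered = X - row_means
print(np.allclose(row_centered.mean(axis=1), 0))  # True
```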

## Boolean Masks

Comparison operators applied to an array return Boolean arrays, which we can count, aggregate, or use as masks. To combine several conditions, for example to find all values greater than one and less than four, we use Python's bitwise logic operators ``&``, ``|``, ``^``, and ``~``. As with the standard arithmetic operators, NumPy overloads these as ufuncs that work element-wise on (usually Boolean) arrays.

As in the case of arithmetic operators, the comparison operators are implemented as ufuncs in NumPy; for example, when you write ``x < 3``, internally NumPy uses ``np.less(x, 3)``.
A summary of the comparison operators and their equivalent ufunc is shown here:

| Operator      | Equivalent ufunc    | Operator      | Equivalent ufunc    |
|---------------|---------------------|---------------|---------------------|
|``==``         |``np.equal``         |``!=``         |``np.not_equal``     |
|``<``          |``np.less``          |``<=``         |``np.less_equal``    |
|``>``          |``np.greater``       |``>=``         |``np.greater_equal`` |

In [156]:
rng = np.random.RandomState(0)
x = rng.randint(10, size = (3,4))
x

array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

In [159]:
x < 6

array([[ True,  True,  True,  True],
       [False, False,  True,  True],
       [ True,  True, False, False]])

In [158]:
np.count_nonzero(x < 6)

8

In [160]:
np.sum(x<6)

8

In [161]:
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)

array([4, 2, 2])

In [162]:
# are there any values greater than 8?
np.any(x > 8)

True
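
The bitwise operators and the masking step can be sketched on the same values as ``x`` above (typed in literally here so the example does not depend on the random generator). Note the parentheses: ``&`` binds more tightly than the comparisons:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

# Combine two conditions element-wise with &
in_range = (x > 1) & (x < 6)
print(np.sum(in_range))  # 7 values are greater than 1 and less than 6

# Use a Boolean array as a mask to select the matching values
print(x[x < 6])          # [5 0 3 3 3 5 2 4]

# ~ negates a mask
print(np.sum(~(x < 6)))  # 4 values are 6 or larger
```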

## Fancy Indexing

In the previous sections, we saw how to access and modify portions of arrays using simple indices (e.g., ``arr[0]``), slices (e.g., ``arr[:5]``), and Boolean masks (e.g., ``arr[arr > 0]``).

In this section, we'll look at another style of array indexing, known as *fancy indexing*.

Fancy indexing is like the simple indexing we've already seen, but we pass arrays of indices in place of single scalars.

This allows us to very quickly access and modify complicated subsets of an array's values.

In [163]:
array_2d = np.zeros((5,5))  # create a 5x5 matrix of zeros
# shape[0] is the number of rows; use it with range() to loop over the rows
for i in range(array_2d.shape[0]):
    array_2d[i] = i         # fill row i with the value i

array_2d

array([[0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])

In [164]:
array_2d[[1,2,3,]]

array([[1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.]])

In [165]:
#order is important
array_2d[[3,0,1]]

array([[3., 3., 3., 3., 3.],
       [0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.]])

In [166]:
rand = np.random.RandomState(42)

x = rand.randint(100, size=10)
print(x)

[51 92 14 71 60 20 82 86 74 74]


In [167]:
[x[3], x[7], x[2]]

[71, 86, 14]

In [168]:
ind = [3, 7, 4]
x[ind]

array([71, 86, 60])

In [169]:
ind = np.array([[3, 7],
                [4, 5]])
x[ind]

array([[71, 86],
       [60, 20]])

In [170]:
X = np.arange(12).reshape((3, 4))
X

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [171]:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

array([ 2,  5, 11])
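
Two further uses of fancy indexing, sketched with the same ``X``, ``row``, and ``col`` as above: index arrays follow the broadcasting rules, so a column of row indices combined with a row of column indices selects a whole block, and fancy indexing can also be used to modify values:

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# (3, 1) row indices broadcast against (3,) column indices -> (3, 3) result
block = X[row[:, np.newaxis], col]
print(block)
# [[ 2  1  3]
#  [ 6  5  7]
#  [10  9 11]]

# Assigning through fancy indexing changes the selected positions
x = np.arange(10)
x[[8, 1, 4]] = 99
print(x)  # [ 0 99  2  3 99  5  6  7 99  9]
```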

## Array Sorting

This section covers algorithms related to sorting values in NumPy arrays.
These algorithms are a favorite topic in introductory computer science courses: if you've ever taken one, you probably have had dreams (or, depending on your temperament, nightmares) about *insertion sorts*, *selection sorts*, *merge sorts*, *quick sorts*, *bubble sorts*, and many, many more.
All are means of accomplishing a similar task: sorting the values in a list or array.

For example, a simple *selection sort* repeatedly finds the minimum value from a list, and makes swaps until the list is sorted. We can code this in just a few lines of Python:

In [172]:
def selection_sort(x):
    
    for i in range(len(x)):
        swap = i + np.argmin(x[i:])
        (x[i], x[swap]) = (x[swap], x[i])
        
    return x

In [173]:
x = np.array([2,1,4,3,5])
selection_sort(x)

array([1, 2, 3, 4, 5])

As any first-year computer science major will tell you, the selection sort is useful for its simplicity, but is much too slow to be useful for larger arrays.
For a list of $N$ values, it requires $N$ loops, each of which does on order $\sim N$ comparisons to find the swap value.
In terms of the "big-O" notation often used to characterize these algorithms, selection sort averages $\mathcal{O}[N^2]$: if you double the number of items in the list, the execution time will go up by about a factor of four.

Even selection sort, though, is much better than my all-time favorite sorting algorithm, the *bogosort*:

In [174]:
def bogosort(x):
    while np.any(x[:-1] > x[1:]):
        np.random.shuffle(x)
    return x

In [175]:
x = np.array([2, 1, 4, 3, 5])
bogosort(x)

array([1, 2, 3, 4, 5])

This silly sorting method relies on pure chance: it repeatedly applies a random shuffling of the array until the result happens to be sorted.
With an average scaling of $\mathcal{O}[N \times N!]$ (that's *N* times *N* factorial), this should, quite obviously, never be used for any real computation.

Fortunately, Python contains built-in sorting algorithms that are *much* more efficient than either of the simplistic algorithms just shown. Here we'll take a look at the routines included in NumPy and optimized for NumPy arrays.

## Fast Sorting in NumPy: ``np.sort`` and ``np.argsort``

Although Python has built-in ``sort`` and ``sorted`` functions to work with lists, we won't discuss them here because NumPy's ``np.sort`` function turns out to be much more efficient and useful for our purposes.
By default ``np.sort`` uses an $\mathcal{O}[N\log N]$ *quicksort* algorithm, though *mergesort* and *heapsort* are also available. For most applications, the default quicksort is more than sufficient.

To return a sorted version of the array without modifying the input, you can use ``np.sort``:

In [176]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)

array([1, 2, 3, 4, 5])

To sort the array in place instead, use the ``sort`` method of the array:

In [177]:
x.sort()
print(x)

[1 2 3 4 5]
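
The section title also mentions ``np.argsort``, which returns the *indices* of the sorted elements rather than the sorted values. A minimal sketch with the same input array:

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

# i[0] is the index of the smallest element, i[1] of the second smallest, ...
i = np.argsort(x)
print(i)     # [1 0 3 2 4]

# The indices can be used (via fancy indexing) to build the sorted array
print(x[i])  # [1 2 3 4 5]
```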


A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the ``axis`` argument. For example:

In [178]:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)

[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]


In [179]:
# sort each column of X
np.sort(X, axis=0)

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

In [180]:
# sort each row of X
np.sort(X, axis=1)

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])

## Partial Sorts: Partitioning

Sometimes we're not interested in sorting the entire array, but simply want to find the *K* smallest values in the array. NumPy provides this in the ``np.partition`` function. ``np.partition`` takes an array and a number *K*; the result is a new array with the smallest *K* values to the left of the partition, and the remaining values to the right, in arbitrary order:

In [181]:
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)

array([2, 1, 3, 4, 6, 5, 7])

Note that the first three values in the resulting array are the three smallest in the array, and the remaining array positions contain the remaining values.
Within the two partitions, the elements have arbitrary order.

Similarly to sorting, we can partition along an arbitrary axis of a multidimensional array:

In [182]:
np.partition(X, 2, axis=1)

array([[3, 4, 6, 7, 6, 9],
       [2, 3, 4, 7, 6, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 9, 5]])

The result is an array where the first two slots in each row contain the smallest values from that row, with the remaining values filling the remaining slots.

Finally, just as there is a ``np.argsort`` that computes indices of the sort, there is a ``np.argpartition`` that computes indices of the partition.
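
A brief sketch of ``np.argpartition`` on the same array as the first partition example. Because the order within each partition is arbitrary, the selected values are sorted here only to make the printed result predictable:

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])

# Indices that would partition x around its 3 smallest values
idx = np.argpartition(x, 3)

# The first three indices point at the three smallest values (in some order)
smallest = x[idx[:3]]
print(sorted(smallest.tolist()))  # [1, 2, 3]
```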

### barbaros.blog
### December, 2020