##### Sourced from Aman.Ai
## Introduction
* NumPy is the core library for scientific computing in Python. It is informally known as the Swiss Army knife of the data scientist.
* It provides a high-performance multidimensional array object numpy.ndarray, and tools for operating on these arrays.

## Arrays
* A NumPy array is a grid of values, all of the same data type, and is indexed by a tuple of non-negative integers.
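* The __rank__ of an array is the number of dimensions it contains (exposed as ndarray.ndim); a matrix, for example, has rank 2. Note that this is not the linear-algebra rank of a matrix.
* The shape of an array is a tuple of integers giving the size of the array along each dimension; for a matrix, this is (rows, columns).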
* The size of an array is the number of elements it contains (which is equivalent to np.prod(<ndarray>.shape), i.e., the product of the array’s dimensions).
* We can initialize NumPy arrays from (nested) lists and tuples, and access elements using square brackets as array subscripts (similar to lists in Python).

* The concept of rows and columns applies when you have a 2D array. However, the array numpy.array([1,2,3,4]) is a 1D array and so has only one dimension; its shape is therefore a single-element tuple, here (4,).

In [9]:
import numpy as np

a = np.array([1, 2, 3])              # Define a rank 1 array using a list
print(type(a))                       # Prints <class 'numpy.ndarray'>
print(a.shape)                       # Prints (3,)
print(a.ndim)                        # Prints 1 (the rank of the array); equivalent to "len(a.shape)"
print(a.size)                        # Prints 3; equivalent to "np.prod(a.shape)"
print(a[0], a[1], a[2])              # Prints 1 2 3
a[0] = 5                             # Change an element of the array
print(a)                             # Prints [5 2 3]

b = np.array([[1, 2, 3]])            # Define a rank 2 array (vector) using a nested list
print(b.shape)                       # Prints (1, 3)
print(b.size)                        # Prints 3
print(b.ndim)                        # Prints 2

c = np.array([[1, 2, 3], [4, 5, 6]]) # Define a rank 2 array (matrix) using a nested list
print(c.shape)                       # Prints (2, 3)
print(c.size)                        # Prints 6
print(c[0, 0], c[0, 1], c[1, 0])     # Prints 1 2 4

d = np.array((1, 2, 3))              # Define a rank 1 array using a tuple
print(d)                             # Prints [1 2 3]
print(d.shape)                       # Prints (3,)

e = np.array(((1, 2, 3), (4, 5, 6))) # Define a rank 2 array using a nested tuple
print(e)                             # Prints [[1 2 3]
                                     #         [4 5 6]]

# Rows of unequal length are no longer accepted: NumPy >= 1.24 raises a ValueError
# f = np.array([[1, 2, 3], [4, 5]])    # Older versions built an object array of lists
# print(f)                             # Older versions printed [list([1, 2, 3]) list([4, 5])]

# NumPy arrays can be initialized using other NumPy arrays or lists
# but note that the resulting matrix is always of type NumPy ndarray
l = [1, 2, 3]                        # Define a python list
g = np.array([l, l, l])              # Matrix initialized with lists
a = np.array([1, 2, 3])              # Define a NumPy array by passing in a list
h = np.array([a, a, a])              # Matrix initialized with NumPy arrays
# i = np.array([a, [1, 2, 3], g])      # Matrix initialized with both types

# All the below statements print [[1 2 3]
#                                 [1 2 3]
#                                 [1 2 3]]
print(g)
print(h)
# print(i)

<class 'numpy.ndarray'>
(3,)
1
3
1 2 3
[5 2 3]
(1, 3)
3
2
(2, 3)
6
1 2 4
[1 2 3]
(3,)
[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [1 2 3]
 [1 2 3]]
[[1 2 3]
 [1 2 3]
 [1 2 3]]


Note the difference between a Python list and a NumPy array. NumPy arrays are designed for numerical (vector/matrix) operations, while lists are for more general purposes.

In [10]:
import numpy as np

l = [1, 2, 3]           # Define a python list
a = np.array([1, 2, 3]) # Define a numpy array by passing in a list
print(l)                # Prints [1, 2, 3]
print(a)                # Prints [1 2 3]

print(type(l))          # Prints <class 'list'>
print(type(a))          # Prints <class 'numpy.ndarray'>

[1, 2, 3]
[1 2 3]
<class 'list'>
<class 'numpy.ndarray'>
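To make the difference concrete, the short sketch below applies the same operators to a list and to an array. The variable names are illustrative only.

```python
import numpy as np

l = [1, 2, 3]                 # Python list
a = np.array([1, 2, 3])       # NumPy array

list_times_two = l * 2        # Repeats the list: [1, 2, 3, 1, 2, 3]
array_times_two = a * 2       # Multiplies elementwise: [2 4 6]

list_plus_list = l + l        # Concatenates: [1, 2, 3, 1, 2, 3]
array_plus_array = a + a      # Adds elementwise: [2 4 6]

print(list_times_two, array_times_two)
print(list_plus_list, array_plus_array)
```

The list operators treat the list as a generic sequence, while the array operators act on each numeric element.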


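* Note that when defining an array, be sure that all the rows contain the same number of columns/elements. Older NumPy versions silently built an array of Python lists from such malformed input, so algebraic operations gave unexpected results; recent versions (1.24 and later) raise a ValueError instead: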

In [11]:
import numpy as np

a = np.array([[1, 2], [3, 4]]) # Define a 2x2 matrix

# Print a scaled version of 'a', more on this in the section on "scaling and translating arrays" below
print(a * 2)                   # Prints [[2 4]
                               #         [6 8]]

# Define a malformed matrix. Note the third row contains 3 elements, while the other rows contain 2.
# NumPy >= 1.24 raises a ValueError on this line (see the output below); older versions
# silently built a 1D object array of Python lists instead.
b = np.array([[1, 2], [3, 4], [5, 6, 7]])

# The lines below are only reached on older NumPy versions
print(b)                       # Older NumPy printed [list([1, 2]) list([3, 4]) list([5, 6, 7])]

# Supposed to scale the whole matrix but does *not*: each list is repeated instead
print(b * 2)                   # Older NumPy printed [list([1, 2, 1, 2]) list([3, 4, 3, 4]) list([5, 6, 7, 5, 6, 7])]

[[2 4]
 [6 8]]


ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
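One way to guard against the error above is to check the row lengths before building the array. The sketch below assumes a hypothetical helper, is_rectangular(), which is not part of NumPy. It also shows that passing dtype=object explicitly still allows a ragged array, with the list semantics that caused the surprising results on older versions.

```python
import numpy as np

def is_rectangular(rows):
    """Hypothetical helper: True if every row has the same length."""
    return len({len(row) for row in rows}) == 1

good_rows = [[1, 2], [3, 4]]
bad_rows = [[1, 2], [3, 4], [5, 6, 7]]

print(is_rectangular(good_rows))   # True
print(is_rectangular(bad_rows))    # False

# Explicitly requesting dtype=object builds a 1D array whose elements are lists
ragged = np.array(bad_rows, dtype=object)
print(ragged.shape)                # (3,)
print(ragged.dtype)                # object

# Arithmetic falls back to list semantics: each list is repeated, not scaled
doubled = ragged * 2
print(doubled[0])                  # [1, 2, 1, 2]
```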

NumPy also provides many functions to create arrays:

In [12]:
import numpy as np

a = np.array([])                            # Define an empty array
print(a)                                    # Prints [] (the dtype defaults to float64)
print(a.shape)                              # Prints (0,)

b = np.zeros((2, 2))                        # Define an array of all zeros
print(b)                                    # Prints [[ 0.  0.]
                                            #         [ 0.  0.]]

c = np.ones((1, 2))                         # Define an array of all ones
print(c)                                    # Prints [[ 1.  1.]]

d = np.full((2, 2), 7)                      # Define a constant array (integer, since 7 is an int)
print(d)                                    # Prints [[7 7]
                                            #         [7 7]]

e = np.eye(2)                               # Define a 2x2 identity matrix
print(e)                                    # Prints [[ 1.  0.]
                                            #         [ 0.  1.]]
                                            
f = np.empty((2, 2))                        # Define a float array without initializing entries
print(f)                                    # Prints arbitrary values: whatever happened to be
                                            # in the allocated memory

g = np.empty((2, 2), dtype=int)             # Define an int array without initializing entries
print(g)                                    # Also prints arbitrary values

h = np.random.random((2, 2))                # Define a 2x2 matrix from the uniform distribution [0, 1)
print(h)                                    # Prints a 2x2 matrix of random values 

i = 5 * np.random.random_sample((2, 2)) - 5 # Sample 2x2 matrix from Unif[-5, 0)
                                            # Sample from Unif[a, b), b > a: (b - a) * random_sample() + a
print(i)                                    # Prints a 2x2 matrix of random values 

j = np.random.randn(2, 2)                   # Sample a 2x2 matrix from the standard normal distribution
print(j)                                    # Prints a 2x2 matrix of random values 

k = 2.5 * np.random.randn(2, 2) + 3         # Sample 2x2 matrix from N(mean=3, var=6.25)
                                            # General form: stddev * np.random.randn(...) + mean
print(k)                                    # Prints a 2x2 matrix of random values 

[]
(0,)
[[0. 0.]
 [0. 0.]]
[[1. 1.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[1. 0.]
 [0. 1.]]
[[4607182418800017408                   0]
 [                  0 4607182418800017408]]
[[0.43144633 0.46314612]
 [0.70299926 0.97130623]]
[[-0.87075935 -1.05631056]
 [-2.86631501 -0.89623609]]
[[-1.74472418 -0.77521363]
 [ 0.65003109 -1.37293359]]
[[ 2.69753526 -0.73534643]
 [-1.50194462  2.10914972]]


* Note that np.random.randn() takes the length of each dimension of the output array as a separate argument. On the other hand, np.random.random() accepts its shape argument as a single tuple containing all dimensions. More on this in the section on standard normal.
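The two calling conventions can be compared side by side. The sketch below also uses NumPy's newer Generator interface (np.random.default_rng()), which the text above does not cover; it is included only as a point of comparison, and the seed value is arbitrary.

```python
import numpy as np

# Legacy functions: dimensions as separate arguments vs. a single shape tuple
normal_legacy = np.random.randn(2, 3)          # Shape passed as 2, 3
uniform_legacy = np.random.random((2, 3))      # Shape passed as (2, 3)

# Generator interface: every method takes the shape as one tuple
rng = np.random.default_rng(seed=0)            # Seed chosen arbitrarily
normal_new = rng.standard_normal((2, 3))
uniform_new = rng.random((2, 3))
scaled_uniform = rng.uniform(-5, 0, size=(2, 3))            # Unif[-5, 0)
scaled_normal = rng.normal(loc=3, scale=2.5, size=(2, 3))   # N(mean=3, var=6.25)

print(normal_legacy.shape, uniform_legacy.shape)   # (2, 3) (2, 3)
print(normal_new.shape, uniform_new.shape)         # (2, 3) (2, 3)
```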

* To create a new array with the same shape and type as a given array, NumPy offers the following functions:

In [15]:
import numpy as np

a = ([1, 2, 3], [4, 5, 6]) # Python tuple of lists
# print(a.shape) ### AttributeError: 'tuple' object has no attribute 'shape'

b = np.empty_like(a)
# Uninitialized array
# array([[-1073741821, -1073741821,           3],
#        [          0,           0, -1073741821]])
print(b.shape)             # Prints (2, 3)

c = np.array([[1., 2., 3.], [4., 5., 6.]])
d = np.empty_like(c)
# Uninitialized array
# array([[ -2.00000715e+000,   1.48219694e-323,  -2.00000572e+000], # uninitialized
#        [  4.38791518e-305,  -2.00000715e+000,   4.17269252e-309]])
print(d.shape)             # Prints (2, 3)

# Note the difference between np.ones() and np.ones_like() below.
# np.ones():      Return a new array of given shape and type, filled with ones.
# np.ones_like(): Return a new array with the same shape and type as a given array, filled with ones.
e = np.ones((1, 2, 3))
f = np.ones_like(e)
# array([[[ 1.,  1.,  1.],
#         [ 1.,  1.,  1.]]])
print(e.shape) # Prints (1, 2, 3)
print(f.shape) # Prints (1, 2, 3)
print(e)
print(f)

(2, 3)
(2, 3)
(1, 2, 3)
(1, 2, 3)
[[[1. 1. 1.]
  [1. 1. 1.]]]
[[[1. 1. 1.]
  [1. 1. 1.]]]
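The same pattern applies to np.zeros_like() and np.full_like(). A small sketch, with illustrative variable names, showing that the dtype is inherited from the template array unless it is overridden:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])          # Integer array of shape (2, 3)

zeros = np.zeros_like(x)                       # Same shape and integer dtype as x
sevens = np.full_like(x, 7)                    # Same shape and dtype, filled with 7
truncated = np.full_like(x, 2.7)               # 2.7 is cast to x's integer dtype, giving 2
as_float = np.full_like(x, 2.7, dtype=float)   # Override the dtype to keep 2.7

print(zeros.dtype == x.dtype)                  # True
print(truncated)                               # [[2 2 2]
                                               #  [2 2 2]]
print(as_float.dtype)                          # float64
```

Because the fill value is cast to the inherited dtype, passing dtype explicitly is the safer choice whenever the template array and the fill value have different types.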


## Indexing
NumPy arrays can be indexed by integers, slices, tuples of integers, boolean arrays, or integer arrays.

### Integer Indexing
To “select” a particular row of an array, NumPy offers the same syntax as Python lists; selecting a column additionally uses : for the row dimension:

In [7]:
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Select a row
print(a[0])    # Prints [1 2]
print(a[1])    # Prints [3 4]

# Select a column
print(a[:, 0]) # Prints [1 3]
print(a[:, 1]) # Prints [2 4]

[1 2]
[3 4]
[1 3]
[2 4]

Note that : implies that the entire dimension is selected (as opposed to a particular element or a range of elements within a dimension). Also, if : is the trailing/last element in the index subscript, it can be skipped.

In [17]:
import numpy as np

a = np.ones((1, 2, 3))
a

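# Index into the first dimension (only the last expression is displayed by the notebook)
a[0].shape       # Evaluates to (2, 3)
a[0,].shape      # Evaluates to (2, 3)
a[0, :].shape    # Evaluates to (2, 3)

# Index into the first and second dimensions
a[0, 1].shape    # Evaluates to (3,)
a[0, 1, :].shape # Evaluates to (3,)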

(3,)

### Slicing
* Similar to Python lists, NumPy arrays can be sliced.
* Since arrays may be multidimensional, you can specify a slice for each dimension of the array; any trailing dimensions you omit are selected in full:

In [21]:
import numpy as np

# Define the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3] # Slice ends are exclusive: rows 0-1 and columns 1-2

# A slice of an array is a "view" into the same data, so modifying it
# will modify the original array
print(a[0, 1])    # Prints 2
print(a[(0, 1)])  # Also prints 2
b[0, 0] = 77      # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])    # Prints 77

2
2
77
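If you need an independent subarray rather than a view, take an explicit copy. A minimal sketch using np.shares_memory() to tell the two apart (variable names are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                  # Slice: shares memory with a
copy = a[:2, 1:3].copy()           # Explicit copy: owns its data

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False

copy[0, 0] = 77                    # Changes only the copy
print(a[0, 1])                     # Still 2

view[0, 0] = 77                    # Changes a as well
print(a[0, 1])                     # Now 77
```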


You can also mix integer indexing with slice indexing. Doing so yields an array of lower rank than the original array, while using only slices yields an array of the same rank as the original.

In [22]:
import numpy as np

# Define the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Basic slicing
a[0:3]                                # Select rows 0, 1 and 2, all columns
a[0:2, 1]                             # Select rows 0 and 1, column 1
a[:1]                                 # Select row 0, all columns (same as a[0:1, :])
a[1:2, :]                             # Select row 1, all columns

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array
row_r1 = a[1, :]                      # Rank 1 view of the second row of a
row_r2 = a[1:2, :]                    # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)           # Prints [5 6 7 8] (4,)
print(row_r2, row_r2.shape)           # Prints [[5 6 7 8]] (1, 4)

# We can make the same distinction when accessing columns of an array
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)           # Prints [ 2  6 10] (3,)
print(col_r2, col_r2.shape)           # Prints [[ 2]
                                      #         [ 6]
                                      #         [10]] (3, 1)

# Mix row and column slicing to print the first 2 rows and alternate 
# columns
arr_row_col = a[:2, ::2]
print(arr_row_col, arr_row_col.shape) # Prints [[1 3]
                                      #         [5 7]] (2, 2)

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[ 2  6 10] (3,)
[[ 2]
 [ 6]
 [10]] (3, 1)
[[1 3]
 [5 7]] (2, 2)
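When an integer index has already dropped a dimension, np.newaxis can put it back. The sketch below is one possible way to do this, and checks that the result matches the slice-only form shown above:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row = a[1, :]                      # Rank 1, shape (4,)
row_2d = a[1, :][np.newaxis, :]    # Rank 2 again, shape (1, 4)
col_2d = a[:, 1][:, np.newaxis]    # Rank 2 column, shape (3, 1)

print(row.shape, row_2d.shape, col_2d.shape)   # (4,) (1, 4) (3, 1)
print(np.array_equal(row_2d, a[1:2, :]))       # True
print(np.array_equal(col_2d, a[:, 1:2]))       # True
```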


### Integer Array Indexing
* When you index into NumPy arrays using slicing, the resulting array view will always be a subarray of the original array.
* In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array.
* Here is an example:

In [24]:
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# An example of integer array indexing
# The returned array has shape (3,)
rows = [0, 1, 2]
cols = [0, 1, 0]
print(a[rows, cols])                         # Prints [1 4 5]

# or using direct indexing
print(a[[0, 1, 2], [0, 1, 0]])               # Prints [1 4 5]

# The above example of integer array indexing is equivalent to 
# the following:
print(np.array([a[0, 0], a[1, 1], a[2, 0]])) # Prints [1 4 5]

# Note that this doesn't work and results in
# IndexError: too many indices for array: array is 2-dimensional, 
# but 3 were indexed
# print(a[(0, 0), (1, 1), (2, 0)])

# When using integer array indexing, you can reuse the same
# element from the source array
print(a[[0, 0], [1, 1]])                     # Prints [2 2]

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))          # Prints [2 2]

[1 4 5]
[1 4 5]
[1 4 5]
[2 2]
[2 2]
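Unlike a slice, the result of integer array indexing is a new array, not a view. A short sketch that checks this (variable names are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

picked = a[[0, 1, 2], [0, 1, 0]]     # Integer array indexing: new array [1 4 5]
picked[0] = 100                      # Modifies only the new array

print(picked)                        # [100   4   5]
print(a[0, 0])                       # Still 1
print(np.shares_memory(a, picked))   # False
```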


You can use np.arange() to select the rows/columns of an array. 

In [25]:
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Return the entire array
a[np.arange(2), :] # Prints [[1 2]
                   #         [3 4]]

# Return the first row
a[np.arange(1), :] # Prints [[1 2]]

array([[1, 2]])

Along with np.arange(), you can use an “index array” that contains indices of rows or columns to index into another array. This is a very common use-case in NumPy-based projects.

In [26]:
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Selecting columns using an index array
b = [0, 0]         # Select the first column for both rows (see below)
a[np.arange(1), b] # Prints [1 1] (same as a[0, [0, 0]])
a[np.arange(2), b] # Prints [1 3] (same as a[[0, 1], [0, 0]])

a[:, b]            # Prints [[1 1]
                   #         [3 3]]

# Selecting rows using an index array
b = [0, 0]         # Select the first row for both columns (see below)
a[b, np.arange(1)] # Prints [1 1] (same as a[[0, 0], 0])
a[b, np.arange(2)] # Prints [1 2] (same as a[[0, 0], [0, 1]])        

a[b, :]            # Prints [[1 2]
                   #         [1 2]]

array([[1, 2],
       [1, 2]])

One useful trick with np.arange() and index array is selecting or mutating one element from each row of a matrix:

In [28]:
import numpy as np

# Define a new array from which we will select elements
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

print(a)                   # Prints [[ 1  2  3]
                           #         [ 4  5  6]
                           #         [ 7  8  9]
                           #         [10 11 12]]

# Define an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints [ 1  6  7 11]

# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)                   # Prints [[11  2  3]
                           #         [ 4  5 16]
                           #         [17  8  9]
                           #         [10 21 12]]

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[ 1  6  7 11]
[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]
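One common application of this trick is building a one-hot matrix from class indices. The sketch below assumes hypothetical labels and a hypothetical class count, and is only one of several ways to do this:

```python
import numpy as np

labels = np.array([0, 2, 0, 1])   # Hypothetical class index for each of 4 rows
num_classes = 3                   # Hypothetical number of classes

one_hot = np.zeros((labels.size, num_classes), dtype=int)
one_hot[np.arange(labels.size), labels] = 1   # Set one element per row

print(one_hot)                    # Prints [[1 0 0]
                                  #         [0 0 1]
                                  #         [1 0 0]
                                  #         [0 1 0]]
```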


### Boolean Array Indexing
* Boolean array indexing lets you pick out arbitrary elements of an array.
* This type of indexing is frequently used to select the elements of an array that satisfy some condition.
Here is an example:

In [30]:
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(a[True])     # Not the same as print(a): a scalar True adds a new
                   # leading axis of length 1, giving shape (1, 2, 2)
                   # Prints [[[1 2]
                   #          [3 4]]]

bool_idx = (a > 2) # Find the elements of a that are bigger than 2;
                   # this returns a NumPy array of Booleans of the same
                   # shape as a, where each slot of bool_idx tells
                   # whether that element of a is > 2

print(bool_idx)    # Prints [[False False]
                   #         [ True  True]]

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx]) # Prints [3 4]

# We can do all of the above in a single concise statement:
print(a[a > 2])    # Prints [3 4]


[[[1 2]
  [3 4]]]
[[False False]
 [ True  True]]
[3 4]
[3 4]
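A boolean mask can also be used on the left-hand side of an assignment, and np.where() builds a new array from a condition. A minimal sketch with illustrative variable names:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

clipped = a.copy()
clipped[clipped > 2] = 0             # Assign to the selected elements in place
print(clipped)                       # Prints [[1 2]
                                     #         [0 0]]

replaced = np.where(a > 2, -1, a)    # New array: -1 where a > 2, else the value from a
print(replaced)                      # Prints [[ 1  2]
                                     #         [-1 -1]]

print(np.count_nonzero(a > 2))       # Prints 2, the number of elements that satisfy the condition
```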


As an extension of the above concept, you can select elements of one array based on a condition evaluated on another array of the same shape:

In [31]:
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array(['f','o','o','b','a','r'])

# Python's "and" keyword cannot combine arrays elementwise (it raises a ValueError),
# so np.logical_and() is used in the below example; the & operator on
# parenthesized boolean arrays gives the same result
print(b[np.logical_and((a > 1), (a < 5))]) # Prints ['o' 'o' 'b']

# Another way to accomplish this is using np.all(), 
# which is explained in the section on "all" below.
print(b[np.all([a > 1, a < 5], axis=0)])   # Prints ['o' 'o' 'b']

['o' 'o' 'b']
['o' 'o' 'b']
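The operator forms of these logical functions are often used instead. In the sketch below, note the parentheses around each comparison: & and | bind more tightly than < and >, so omitting them changes the meaning.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array(['f', 'o', 'o', 'b', 'a', 'r'])

both = b[(a > 1) & (a < 5)]      # Elementwise AND; same as np.logical_and
either = b[(a < 2) | (a > 5)]    # Elementwise OR;  same as np.logical_or
negated = b[~(a > 1)]            # Elementwise NOT; same as np.logical_not

print(both)      # Prints ['o' 'o' 'b']
print(either)    # Prints ['f' 'r']
print(negated)   # Prints ['f']
```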


## Datatypes
* Every NumPy array is a grid of elements of the same type.
* NumPy provides a large set of numeric datatypes that you can use to construct arrays.
* NumPy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype.
As an example:

In [32]:
import numpy as np

x = np.array([1, 2])                 # Let NumPy choose the datatype
print(x.dtype)                       # Prints int64

x = np.array([1.0, 2.0])             # Let NumPy choose the datatype
print(x.dtype)                       # Prints float64

x = np.array([1, 2], dtype=np.int64) # Force a particular datatype
print(x.dtype)                       # Prints int64

int64
float64
int64


Note that with NumPy, the default float datatype is float64 (double precision), while that with PyTorch is float32 (single precision). The default int datatype is int64 for PyTorch and, on most platforms, for NumPy as well (NumPy versions before 2.0 default to int32 on Windows).
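The sketch below illustrates how NumPy picks and promotes datatypes on its own side (the PyTorch behavior is not shown). It avoids asserting the default integer type, since that depends on the platform.

```python
import numpy as np

mixed = np.array([1, 2.0])                        # An int and a float: upcast to float64
print(mixed.dtype)                                # Prints float64

single = np.array([1.0, 2.0], dtype=np.float32)   # Request single precision explicitly
print(single.dtype)                               # Prints float32
print(single.itemsize, mixed.itemsize)            # Prints 4 8 (bytes per element)

# Combining arrays of different dtypes promotes to a common type
total = single + mixed
print(total.dtype)                                # Prints float64
```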

## Changing Datatypes
The astype() method of np.ndarray returns a copy of the array converted to the requested datatype; the original array is left unchanged.

In [33]:
import numpy as np

a = np.array([1, 2, 3])
print(a)             # Prints [1 2 3]
print(a.dtype)       # Prints int64

a_float = a.astype(np.float32)
print(a_float)       # Prints [1. 2. 3.]
print(a_float.dtype) # Prints float32

# 'a' remains unchanged.
print(a)              # Prints [1 2 3]
print(a.dtype)        # Prints int64

a_str = a.astype('str')
print(a_str)          # Prints ['1' '2' '3']
print(a_str.dtype)    # Prints <U21

[1 2 3]
int64
[1. 2. 3.]
float32
[1 2 3]
int64
['1' '2' '3']
<U21
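Conversions with astype() can lose information or parse values, depending on the direction. A small sketch with illustrative inputs:

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype(np.int64)        # Fractional part is dropped (truncation toward zero)
print(ints)                           # Prints [ 1 -1  2]

strings = np.array(['1.5', '2', '3.25'])
parsed = strings.astype(np.float64)   # Numeric strings are parsed to floats
print(parsed.tolist())                # Prints [1.5, 2.0, 3.25]

same = ints.astype(np.int64)          # A copy by default, even when the dtype is unchanged
print(same is ints)                   # Prints False
print(np.shares_memory(same, ints))   # Prints False
```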
