### Numpy

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

To use Numpy, we first need to import the `numpy` package:

In [1]:
import numpy as np

##### Checking Python and Numpy versions.

In [2]:
import platform
print('Python version: ' + platform.python_version())
print('Numpy version: ' + np.__version__)

Python version: 3.12.12
Numpy version: 2.0.2


## 0. Numpy Datatypes

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

##### List of Numpy DATA TYPES

In [3]:
# List of Numpy Data Types
import pandas as pd
dtypes = pd.DataFrame(
    {
        'Type': ['int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32', 'int64', 'uint64', 'float16', 'float32', 'float64', 'float128', 'complex64', 'complex128', 'bool', 'object', 'string_', 'unicode_'],
        'Type Code': ['i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f2', 'f4 or f', 'f8 or d', 'f16 or g', 'c8', 'c16', '?', 'O', 'S', 'U']
    }
)

dtypes

Unnamed: 0,Type,Type Code
0,int8,i1
1,uint8,u1
2,int16,i2
3,uint16,u2
4,int32,i4
5,uint32,u4
6,int64,i8
7,uint64,u8
8,float16,f2
9,float32,f4 or f


Let's go through each of the NumPy data types listed above in more detail:

*   **`int8`**: Signed 8-bit integer type. It can store integer values from -128 to 127.
*   **`uint8`**: Unsigned 8-bit integer type. It can store integer values from 0 to 255.
*   **`int16`**: Signed 16-bit integer type. It can store integer values from -32768 to 32767.
*   **`uint16`**: Unsigned 16-bit integer type. It can store integer values from 0 to 65535.
*   **`int32`**: Signed 32-bit integer type. It can store integer values from -2147483648 to 2147483647.
*   **`uint32`**: Unsigned 32-bit integer type. It can store integer values from 0 to 4294967295.
*   **`int64`**: Signed 64-bit integer type. It can store integer values from -9223372036854775808 to 9223372036854775807.
*   **`uint64`**: Unsigned 64-bit integer type. It can store integer values from 0 to 18446744073709551615.
*   **`float16`**: Half-precision floating-point type. It has a limited range and precision but uses less memory.
*   **`float32`**: Single-precision floating-point type. It's a commonly used type for floating-point numbers.
*   **`float64`**: Double-precision floating-point type. It offers higher precision and a wider range than `float32`. This is the default floating-point type in NumPy.
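*   **`float128`**: Extended-precision floating-point type. Its actual precision is platform-dependent: on most x86-64 Linux systems it is 80-bit extended precision padded to 128 bits rather than true IEEE quad precision, and it is not available on all systems (for example, Windows builds).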
*   **`complex64`**: Complex number type represented by two 32-bit floating-point numbers (real and imaginary parts).
*   **`complex128`**: Complex number type represented by two 64-bit floating-point numbers (real and imaginary parts). This is the default complex type in NumPy.
*   **`bool`**: Boolean type, storing `True` or `False` values.
*   **`object`**: Python object type. This allows storing arbitrary Python objects in a NumPy array, but it sacrifices performance compared to using native NumPy types.
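*   **`string_`**: Fixed-length byte string type (type code `S`). NumPy 2.0 removed the `np.string_` alias, so use `np.bytes_` instead. If no length is given, it is set by the longest string in the array when the array is created.
*   **`unicode_`**: Fixed-length Unicode string type (type code `U`). NumPy 2.0 removed the `np.unicode_` alias, so use `np.str_` instead. If no length is given, it is set by the longest string in the array when the array is created.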
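
The ranges listed above do not need to be memorized. As a quick sketch, `np.iinfo` and `np.finfo` report the limits of an integer or floating-point dtype; the particular dtypes queried below are just illustrative choices.

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype
for int_type in (np.int8, np.uint8, np.int16, np.uint16):
    info = np.iinfo(int_type)
    print(np.dtype(int_type).name, info.min, info.max)

# np.finfo reports limits and precision of a floating-point dtype
f32 = np.finfo(np.float32)
f64 = np.finfo(np.float64)
print(f32.precision, f64.precision)  # approximate decimal digits: 6 and 15
```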

##### Create an array with a specified data type

In [4]:
# create an array with a specified data type
arr = np.array([1,2,3], dtype='f4')
print(arr)
print(arr.dtype)

arr = np.array([1+2j, 3-4j], dtype=np.complex64)
print(arr)
print(arr.dtype)

arr = np.array([0, 1, 1], dtype=np.bool_)  # np.bool_ works on both NumPy 1.x and 2.x
print(arr)
print(arr.dtype)

[1. 2. 3.]
float32
[1.+2.j 3.-4.j]
complex64
[False  True  True]
bool


**Letting NumPy choose the data type vs. forcing a particular data type**



In [5]:
x = np.array([1, 2])  # Let numpy choose the datatype
y = np.array([1.0, 2.0])  # Let numpy choose the datatype
z = np.array([1, 2], dtype=np.int64)  # Force a particular datatype

print(x.dtype, y.dtype, z.dtype)

int64 float64 int64


#### String Data Type in Numpy

* set the max length of the string using S + some number, such as 'S3'
* any string longer than the max length will be truncated

In [6]:
s = np.array(['abc', 'defg'], dtype='S3')
print(s)
print(s.dtype)

[b'abc' b'def']
|S3


*   `np.bytes_`: Fixed-length **byte strings** (like `b'hello'`). Unless specified, the length is determined by the longest string. Elements print with a `b` prefix, and the dtype prints as `|S<n>`.
*   `np.str_`: Fixed-length **Unicode strings** (like `'hello'`). Unless specified, the length is determined by the longest string. Can represent a wider range of characters. The dtype prints as `<U<n>` on little-endian machines.

`np.bytes_` is for binary data, `np.str_` is for text data.

In [7]:
arr = np.array(['a', 'ab', 'abc'], dtype=np.bytes_)
print(arr.dtype)

arr = np.array(['a', 'ab', 'abc'], dtype=np.str_)
print(arr.dtype)

|S3
<U3


What do `|` and `<` mean above?
*   These symbols are **byte order indicators**. They describe how the bytes that make up each element are arranged in memory.
    *   `|`: Indicates **not applicable**. Each character of an `'S'` (bytes) string occupies a single byte, so there is no byte order to record and you see `|`.
    *   `<`: Indicates **little-endian** byte order: the least significant byte is stored at the lowest memory address. Each character of a `'U'` (Unicode) string occupies 4 bytes, so byte order does matter there, which is why the dtype prints as `<U3`.
    *   `>` and `=` (not shown above) indicate big-endian byte order and the native byte order of the machine, respectively.

You can learn more about NumPy's data types and byte order in the [documentation](https://numpy.org/doc/stable/reference/arrays.dtypes.html) and on [Wikipedia](https://en.wikipedia.org/wiki/Endianness).
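
To make the byte-order prefixes concrete, the sketch below stores the same integer with explicit little-endian (`<`) and big-endian (`>`) dtypes and compares the raw bytes. The variable names are illustrative only.

```python
import numpy as np

little = np.array([1], dtype='<i4')  # least significant byte first
big = np.array([1], dtype='>i4')     # most significant byte first

print(little.tobytes())      # b'\x01\x00\x00\x00'
print(big.tobytes())         # b'\x00\x00\x00\x01'
print(little[0] == big[0])   # True: same value, different memory layout

# Byte strings use one byte per character, so byte order is not applicable
print(np.dtype('S3').byteorder)   # |
# Unicode strings use 4 bytes per character
print(np.dtype('U3').itemsize)    # 12
```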

Convert between the `np.bytes_` and `np.str_` data types:

In [8]:
import numpy as np

# Create a numpy array with bytes_ data type
bytes_array = np.array([b'hello', b'world'], dtype=np.bytes_)
print("Bytes array:", bytes_array)
print("Bytes array dtype:", bytes_array.dtype)

# Convert bytes_ array to str_ array
str_array = bytes_array.astype(np.str_)
print("\nString array:", str_array)
print("String array dtype:", str_array.dtype)

# Create a numpy array with str_ data type
str_array_2 = np.array(['numpy', 'arrays'], dtype=np.str_)
print("\nString array 2:", str_array_2)
print("String array 2 dtype:", str_array_2.dtype)

# Convert str_ array to bytes_ array
bytes_array_2 = str_array_2.astype(np.bytes_)
print("\nBytes array 2:", bytes_array_2)
print("Bytes array 2 dtype:", bytes_array_2.dtype)

Bytes array: [b'hello' b'world']
Bytes array dtype: |S5

String array: ['hello' 'world']
String array dtype: <U5

String array 2: ['numpy' 'arrays']
String array 2 dtype: <U6

Bytes array 2: [b'numpy' b'arrays']
Bytes array 2 dtype: |S6


You can read all about numpy datatypes in the [documentation](https://numpy.org/doc/stable/reference/arrays.dtypes.html).

## 1. Arrays

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [9]:
a = np.array([1, 2, 3])  # Create a rank 1 array
print(type(a), a.shape, a[0], a[1], a[2])
a[0] = 5                 # Change an element of the array
print(a)

<class 'numpy.ndarray'> (3,) 1 2 3
[5 2 3]


By rank 1 we mean a 1-dimensional array; similarly, by rank 2 we mean a 2-dimensional array.

In [10]:
b = np.array([[1,2,3],[4,5,6]])   # Create a rank 2 array
print(b)

[[1 2 3]
 [4 5 6]]


In [11]:
# Create a NumPy array from a list
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)

Array: [1 2 3 4 5]


In [12]:
# Create a 2D array (matrix) from a list of lists
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print("Matrix:\n", matrix)

Matrix:
 [[1 2 3]
 [4 5 6]]


**How a NumPy array differs from a Python list:**

*   **Data Type**: NumPy arrays hold elements of the **same data type**, unlike Python lists which can hold elements of different types.
*   **Performance**: NumPy operations are significantly **faster** for numerical tasks, especially on large datasets, due to optimized implementations.
*   **Functionality**: NumPy offers a rich set of **mathematical functions and operations** specifically designed for arrays.
*   **Memory Usage**: NumPy arrays are generally **more memory-efficient** for numerical data.

In essence, NumPy arrays are specialized for efficient numerical computation, while Python lists are more flexible for general-purpose data storage.
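
A small sketch of two of the differences listed above: the same operator behaves differently on a list and on an array, and an array converts mixed inputs to one common type. The variable names are illustrative.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

print(py_list * 2)   # list repetition: [1, 2, 3, 1, 2, 3]
print(np_arr * 2)    # elementwise arithmetic: [2 4 6]

# Mixed int/float input is upcast to a single dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```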

In [13]:
# create an array in a specified data type
arr = np.array([[1,2,3], [4,5,6]], dtype='i2')
print(arr)

[[1 2 3]
 [4 5 6]]


In [14]:
print(b.shape)
print(b[0, 0], b[0, 1], b[1, 0])

(2, 3)
1 2 4


### create an array of evenly spaced values with a given step size

In [15]:
# np.arange(start, stop, step)

arr = np.arange(0, 20, 2)
print(arr)

[ 0  2  4  6  8 10 12 14 16 18]


### create an array of a given number of evenly spaced values over an interval

In [16]:
# np.linspace(start, stop, num_of_elements, endpoint=True, retstep=False)
arr = np.linspace(0, 10, 20)
print(arr)

[ 0.          0.52631579  1.05263158  1.57894737  2.10526316  2.63157895
  3.15789474  3.68421053  4.21052632  4.73684211  5.26315789  5.78947368
  6.31578947  6.84210526  7.36842105  7.89473684  8.42105263  8.94736842
  9.47368421 10.        ]


In [17]:
# exclude the endpoint and return the step size
arr, step = np.linspace(0, 10, 20, endpoint=False, retstep=True)
print(arr)
print(step)

[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5.  5.5 6.  6.5 7.  7.5 8.  8.5
 9.  9.5]
0.5
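
The step sizes seen above follow from simple arithmetic: with the endpoint included the interval is divided into `num - 1` gaps, and with the endpoint excluded into `num` gaps. The following sketch checks this for the same values used in the cells above.

```python
import numpy as np

start, stop, num = 0, 10, 20

# endpoint=True (default): num points span num - 1 gaps
_, step_incl = np.linspace(start, stop, num, retstep=True)
# endpoint=False: num points span num gaps, stop itself is left out
_, step_excl = np.linspace(start, stop, num, endpoint=False, retstep=True)

print(np.isclose(step_incl, (stop - start) / (num - 1)))  # True
print(np.isclose(step_excl, (stop - start) / num))        # True
```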


### create an array of random values in a given shape

In [18]:
arr = np.random.rand(3, 3)
print(arr)

[[0.32009153 0.5693842  0.31518809]
 [0.93458393 0.27504968 0.66666314]
 [0.89260821 0.22926537 0.00544594]]


In [19]:
e = np.random.random((2,2)) # Create an array filled with random values
print(e)

[[0.87746914 0.01415113]
 [0.76025016 0.71303047]]


### create an array of zeros in a given shape

In [20]:
zeros = np.zeros((2,3), dtype='i4')
print(zeros)

[[0 0 0]
 [0 0 0]]


### create an array of zeros with the same shape and data type as a given array

In [25]:
arr=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
print(arr,end="\n\n")

zeros = np.zeros_like(arr)
print(zeros)  # same shape (4, 3) and dtype as arr

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

[[0 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 0]]


### create an array of ones in a given shape

In [27]:
# np.ones((rows,columns))

ones = np.ones((2,3))
print(ones)

[[1. 1. 1.]
 [1. 1. 1.]]


### create an array of ones with the same shape and data type as a given array

In [29]:
# similar to np.zeros_like
arr=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
ones = np.ones_like(arr)
print(ones)

[[1 1 1]
 [1 1 1]
 [1 1 1]
 [1 1 1]]


### create an array of arbitrary values in a given shape

In [34]:
empty = np.empty((2,3))
print(empty)

[[1. 1. 1.]
 [1. 1. 1.]]


The code `empty = np.empty((2,3))` creates a new NumPy array with 2 rows and 3 columns. The key thing about `np.empty` is that it does not initialize the elements: the values are whatever happens to be in that memory at the time of creation, which is why the output can look arbitrary (the ones printed above are leftover memory contents, not a guarantee). Because it skips initialization, it can be slightly faster than `np.zeros` or `np.ones`, which is only worthwhile if you overwrite every element shortly after creation.
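
A minimal sketch of the intended usage pattern: allocate with `np.empty`, then write every element before reading any of them. The array name and the values stored are illustrative.

```python
import numpy as np

n = 5
squares = np.empty(n, dtype=np.int64)  # contents are undefined at this point
for i in range(n):
    squares[i] = i * i                 # every slot is written before it is read

print(squares)  # [ 0  1  4  9 16]
```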

### create an array of arbitrary values with the same shape and data type as a given array

In [33]:
# similar to np.empty, but takes its shape and dtype from a reference array
arr=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])

empty = np.empty_like(arr)
print(empty)

[[          963562620                   0                   0]
 [                  0     135463268515840 8319683848551211643]
 [3180222411935070754 4189017755886035488 7308535291872901920]
 [4189017755886051700 8027140907415206688 7018332503360695925]]


### create an array of constant values in a given shape  

In [29]:
# np.full((rows,columns), constant_value)

p = np.full((2,3), 5)
print(p)

[[5 5 5]
 [5 5 5]]


### create an array of constant values with the same shape and data type as a given array

In [30]:
# here arr is a 3x3 float array, so the result is 3x3 float64 as well
arr = np.random.rand(3, 3)
p = np.full_like(arr, 5)
print(p)

[[5. 5. 5.]
 [5. 5. 5.]
 [5. 5. 5.]]


### create an array by repetition

In [35]:
# repeat each element of an array by a specified number of times
# np.repeat(iterable, reps, axis=None)


arr = [0, 1, 2]
print(np.repeat(arr, 3))    # or np.repeat(range(3), 3)

[0 0 0 1 1 1 2 2 2]


In [36]:
# repeat along a specified axis with specified number of repetitions
arr = [[1,2], [3,4]]
print(np.repeat(arr, [1,2], axis=0))

[[1 2]
 [3 4]
 [3 4]]


In [37]:
arr = [[1,2], [3,4]]
print(np.repeat(arr, [4,2], axis=0))

[[1 2]
 [1 2]
 [1 2]
 [1 2]
 [3 4]
 [3 4]]


In [38]:
# repeat an array by a specified number of times
arr = [0, 1, 2]
print(np.tile(arr, 3))

[0 1 2 0 1 2 0 1 2]


In [39]:
# repeat along specified axes
arr = [0, 1, 2]
print(np.tile(arr, (2,2)))    # reps=(2,2): repeat 2 times along axis 0 and 2 times along axis 1

[[0 1 2 0 1 2]
 [0 1 2 0 1 2]]


In [40]:
print(np.tile(arr, (4,3)))

[[0 1 2 0 1 2 0 1 2]
 [0 1 2 0 1 2 0 1 2]
 [0 1 2 0 1 2 0 1 2]
 [0 1 2 0 1 2 0 1 2]]


### create an identity matrix with a given diagonal size

In [41]:
identity_matrix = np.eye(3)
print(identity_matrix)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [42]:
identity_matrix = np.identity(3)
print(identity_matrix)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


Both `np.eye(N)` and `np.identity(N)` create an N x N identity matrix (a square matrix with ones on the main diagonal and zeros elsewhere).

The key difference is that `np.eye` is more general. It can create matrices with ones on off-diagonals as well, using the `k` parameter.
*   `np.eye(N)`: Creates an N x N identity matrix.
*   `np.eye(N, M=None, k=0)`: Creates an N x M matrix where the `k`-th diagonal is all ones and everything else is zeros.
*   `np.identity(N)`: Creates an N x N identity matrix (equivalent to `np.eye(N)` or `np.eye(N, N, k=0)`).

So, `np.identity` is a special case of `np.eye` when you only need the main diagonal and a square matrix.

### create an identity matrix with a diagonal offset

In [44]:
identity_matrix = np.eye(3,5)     # by default k = 0
print(identity_matrix)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]]


In [43]:
# k=1: the diagonal of ones starts at column 1 instead of column 0
# a positive k shifts the diagonal upward (above the main diagonal)

identity_matrix = np.eye(3,5,1)
print(identity_matrix)

[[0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]]


In [46]:
identity_matrix = np.eye(3,5,2)   # k=2: the diagonal of ones starts at column 2
print(identity_matrix)

[[0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


In [47]:
identity_matrix = np.eye(5, k=-2)   # negative number shifts the diagonal downward
print(identity_matrix)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]]


### extract the diagonal array / create a diagonal array

In [49]:
# a 1-D input builds a 2-D array with those values on the diagonal
arr= np.array([4,5,6])
print(np.diag(arr))

[[4 0 0]
 [0 5 0]
 [0 0 6]]


In [50]:
# create a matrix with a specified diagonal array
arr = np.diag([1,2,3,4,5])
print(arr)

[[1 0 0 0 0]
 [0 2 0 0 0]
 [0 0 3 0 0]
 [0 0 0 4 0]
 [0 0 0 0 5]]
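
Both cells above build a matrix from a 1-D input. For the other direction, here is a short sketch: passing a 2-D array to `np.diag` extracts a diagonal, and the optional `k` argument selects an off-diagonal. The matrix `m` is an illustrative example.

```python
import numpy as np

m = np.arange(9).reshape(3, 3)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

print(np.diag(m))        # main diagonal: [0 4 8]
print(np.diag(m, k=1))   # first diagonal above the main one: [1 5]
print(np.diag(m, k=-1))  # first diagonal below the main one: [3 7]
```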


In [51]:
# Reshape an array
reshaped = np.arange(12).reshape(3, 4)
print("Reshaped array:\n", reshaped)

Reshaped array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


## 2. Inspect Arrays

In [52]:
arr = np.array([[1,2,3], [4,5,6]], dtype=np.int64)

### inspect general information of an array

In [53]:
np.info(arr)    # prints the summary itself and returns None

class:  ndarray
shape:  (2, 3)
strides:  (24, 8)
itemsize:  8
aligned:  True
contiguous:  True
fortran:  False
data pointer: 0x3978cf70
byteorder:  little
byteswap:  False
type: int64


### inspect the data type of an array

In [55]:
print(arr.dtype)

int64


### inspect the shape of an array

In [56]:
print(arr.shape)

(2, 3)


### inspect the length of an array (size of its first axis)

In [57]:
print(len(arr))

2


### inspect the number of dimensions of an array

In [58]:
print(arr.ndim)

2


### inspect the number of elements in an array

In [59]:
print(arr.size)

6


### inspect the number of bytes of each element in an array

In [60]:
print(arr.itemsize)

8


### inspect the memory size of an array (in bytes)

In [61]:
# arr.nbytes = arr.size * arr.itemsize
print(arr.nbytes)

48
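
As a small sketch tying the inspection attributes together, the following checks the `nbytes = size * itemsize` relation and shows how a smaller dtype shrinks the footprint. The `int16` conversion is an illustrative choice.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(arr.size, arr.itemsize, arr.nbytes)  # 6 8 48
print(arr.strides)  # (24, 8): bytes to step to the next row / next column

# The same values stored as 16-bit integers need a quarter of the memory
small = arr.astype(np.int16)
print(small.nbytes)  # 12
```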


## 3. Combine & Split an Array

In [227]:
arr1 = np.array([[1,2,3,4], [1,2,3,4]])
arr2 = np.array([[5,6,7,8], [5,6,7,8]])

### ```np.concatenate((a, b), axis=0)```

In [228]:
# concatenate along axis 0 (the rows of arr2 are appended below arr1)
cat = np.concatenate((arr1, arr2), axis=0)
print(cat)

[[1 2 3 4]
 [1 2 3 4]
 [5 6 7 8]
 [5 6 7 8]]


In [229]:
# concatenate along axis 1 (the columns of arr2 are appended to the right of arr1)
cat = np.concatenate((arr1, arr2), axis=1)
print(cat)

[[1 2 3 4 5 6 7 8]
 [1 2 3 4 5 6 7 8]]


### ```np.vstack((a, b))```
### ```np.r_[a, b]```

In [230]:
# stack arrays vertically
cat = np.vstack((arr1, arr2))
print(cat)

[[1 2 3 4]
 [1 2 3 4]
 [5 6 7 8]
 [5 6 7 8]]


In [231]:
# stack arrays vertically
cat = np.r_[arr1, arr2]
print(cat)

[[1 2 3 4]
 [1 2 3 4]
 [5 6 7 8]
 [5 6 7 8]]


### ```np.hstack((a, b))```
### ```np.c_[a, b]```

In [232]:
# stack arrays horizontally
cat = np.hstack((arr1, arr2))
print(cat)

[[1 2 3 4 5 6 7 8]
 [1 2 3 4 5 6 7 8]]


In [None]:
# stack arrays horizontally
cat = np.c_[arr1, arr2]
print(cat)

[[1 2 3 4 5 6 7 8]
 [1 2 3 4 5 6 7 8]]


### split an array

In [233]:
arr = np.random.rand(6,6)

In [234]:
# split the array vertically (along axis 0) into n equal-sized chunks
arr1 = np.vsplit(arr, 2)
print(arr1)

[array([[0.60469135, 0.26591121, 0.881347  , 0.58404612, 0.47746048,
        0.17151502],
       [0.05662   , 0.15703523, 0.3386175 , 0.44064297, 0.68384312,
        0.20587172],
       [0.6164337 , 0.92447161, 0.07962825, 0.48539683, 0.69856363,
        0.08156848]]), array([[0.86910236, 0.43310568, 0.49433006, 0.49574047, 0.37072208,
        0.92212866],
       [0.84127822, 0.65552592, 0.46268505, 0.56501131, 0.42391551,
        0.28821195],
       [0.26398072, 0.69073823, 0.83283653, 0.59239158, 0.46652258,
        0.67365999]])]


In [235]:
# split the array horizontally (along axis 1) into n equal-sized chunks
arr2 = np.hsplit(arr, 2)
print(arr2)

[array([[0.60469135, 0.26591121, 0.881347  ],
       [0.05662   , 0.15703523, 0.3386175 ],
       [0.6164337 , 0.92447161, 0.07962825],
       [0.86910236, 0.43310568, 0.49433006],
       [0.84127822, 0.65552592, 0.46268505],
       [0.26398072, 0.69073823, 0.83283653]]), array([[0.58404612, 0.47746048, 0.17151502],
       [0.44064297, 0.68384312, 0.20587172],
       [0.48539683, 0.69856363, 0.08156848],
       [0.49574047, 0.37072208, 0.92212866],
       [0.56501131, 0.42391551, 0.28821195],
       [0.59239158, 0.46652258, 0.67365999]])]
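
The splits above require the axis length to divide evenly. As a hedged sketch of two related options, `np.array_split` tolerates uneven chunks, and `np.split` also accepts a list of cut positions. The 1-D array here is an illustrative example.

```python
import numpy as np

arr = np.arange(10)

# array_split allows chunks of unequal size (here 4, 3 and 3 elements)
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])

# split at explicit positions: arr[:2], arr[2:7], arr[7:]
first, middle, last = np.split(arr, [2, 7])
print(first, middle, last)
```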


## 4. Manipulate an Array

### transpose an array

In [209]:
arr = np.arange(16).reshape((4,4))
print(arr, end ="\n\n")
print("Transpose result is : \n\n")

# the following all return a view of the same data, not a copy
print(arr.T)
# or
print(np.transpose(arr))
# or
print(arr.transpose())

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

Transpose result is : 


[[ 0  4  8 12]
 [ 1  5  9 13]
 [ 2  6 10 14]
 [ 3  7 11 15]]
[[ 0  4  8 12]
 [ 1  5  9 13]
 [ 2  6 10 14]
 [ 3  7 11 15]]
[[ 0  4  8 12]
 [ 1  5  9 13]
 [ 2  6 10 14]
 [ 3  7 11 15]]
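
A quick way to check whether a transposed array shares its data with the original is to write through it and look at the original. The small array below is an illustrative example.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
t = arr.T

t[0, 1] = 99                      # t[0, 1] refers to the same element as arr[1, 0]
print(arr[1, 0])                  # 99
print(np.shares_memory(arr, t))   # True
```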


### transpose of a high dimensional array with specified order of axes

In [210]:
arr1 = np.arange(16).reshape((2,2,4))
print(arr1)

# transpose returns a new view; arr1 itself is not modified
print(arr1.transpose((1,0,2)))

[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]]
[[[ 0  1  2  3]
  [ 8  9 10 11]]

 [[ 4  5  6  7]
  [12 13 14 15]]]


### swap axes

In [211]:
arr1 = np.arange(16).reshape((2,2,4))
print(arr1.swapaxes(1,2))

[[[ 0  4]
  [ 1  5]
  [ 2  6]
  [ 3  7]]

 [[ 8 12]
  [ 9 13]
  [10 14]
  [11 15]]]


### change the shape of an array

In [213]:
# return a reshaped array (a view when possible); arr itself keeps its shape
arr.reshape((2,8))

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15]])

In [215]:
# change the shape of an array in place
arr.resize((8,2))

### flatten an array

In [216]:
# return a copy
arr.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [217]:
# return a view when possible (a copy otherwise)
# changing any element of the view changes the original array
arr.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
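
The copy-versus-view difference between `flatten()` and `ravel()` can be seen with a small sketch like the following; the array and the values written are illustrative.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

flat_copy = arr.flatten()   # always an independent copy
flat_view = arr.ravel()     # a view here, because arr is contiguous

flat_copy[0] = -1           # does not touch arr
flat_view[1] = 42           # writes through to arr

print(arr)
# [[ 0 42  2]
#  [ 3  4  5]]
```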

### append elements to an array

In [218]:
arr = np.array([1,2,3])

In [None]:
# append a scalar and return a copy
arr1 = np.append(arr, 4)
print(arr1)

[1 2 3 4]


In [219]:
# append an array and return a copy
arr2 = np.append(arr, [4,5,6])
print(arr2)

[1 2 3 4 5 6]


### insert elements into an array

In [220]:
# np.insert(array, position, element)

# insert a scalar at a certain position
arr3 = np.insert(arr, 0, 100)
print(arr3)

[100   1   2   3]


In [221]:
# insert multiple values at a certain position
arr3 = np.insert(arr, 0, [1,2,3])
print(arr3)

[1 2 3 1 2 3]


### delete elements from an array

In [222]:
# remove the element at position 0
arr4 = np.delete(arr, 0)
print(arr4)

[2 3]


In [223]:
# remove the elements at multiple positions
arr4 = np.delete(arr, [0,2])
print(arr4)

[2]


### copy an array

In [224]:
arr = np.array([1,2,3])

In [225]:
# the following all create an independent copy of the data
# (for object arrays, the contained Python objects are still shared)
arr1 = np.copy(arr)
# or
arr1 = arr.copy()
# or
arr1 = np.array(arr, copy=True)
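
To see why an explicit copy matters, the sketch below contrasts plain assignment, which only creates a second name for the same array, with `copy()`. The names `alias` and `dup` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3])
alias = arr          # no copy: both names refer to the same array
dup = arr.copy()     # independent copy of the data

alias[0] = 100       # visible through arr
dup[1] = -5          # not visible through arr

print(arr)  # [100   2   3]
print(dup)  # [ 1 -5  3]
```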

## 5. Array indexing

Numpy offers several ways to index into arrays.

Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

### select an element by row and column indices

In [68]:
# arr = np.random.rand(6,6)
arr = np.array(range(36)).reshape(6,6)
print(arr, end="\n\n")
print(arr[5][5])
# or more concisely
print(arr[5,5])

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 31 32 33 34 35]]

35
35


### indexing with slicing

In [69]:
print(arr[1:3, 4:6])

[[10 11]
 [16 17]]


In [70]:
# ellipsis slicing: auto-complete the dimensions
arr = np.array(range(16)).reshape(2,2,2,2)
# equivalent to arr[0,:,:,:]
print(arr[0, ...])

[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]


In [71]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]

b = a[:2, 1:3]
print(b)

[[2 3]
 [6 7]]


A slice of an array is a view into the same data, so modifying it will modify the original array.

In [72]:
print(a[0, 1])
b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])

2
77
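
If the original array should stay untouched, one option is to copy the slice explicitly. A minimal sketch, reusing the same shape of example as above:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

b = a[:2, 1:3].copy()   # independent copy instead of a view
b[0, 0] = 77

print(a[0, 1])  # 2: the original is unchanged
print(b[0, 0])  # 77
```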


You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. Note that this is quite different from the way that MATLAB handles array slicing:

In [73]:
# Create the following rank 2 array with shape (3, 4)

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


Two ways of accessing the data in the middle row of the array.
Mixing integer indexing with slices yields an array of lower rank,
while using only slices yields an array of the same rank as the
original array:

In [74]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
row_r3 = a[[1], :]  # Rank 2 copy of the second row of a (integer array indexing copies)
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)
print(row_r3, row_r3.shape)

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)
[[5 6 7 8]] (1, 4)


In [75]:
# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)
print()
print(col_r2, col_r2.shape)

[ 2  6 10] (3,)

[[ 2]
 [ 6]
 [10]] (3, 1)


**Integer array indexing:** When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:

In [76]:
a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array will have shape (3,)
print(a[[0, 1, 2], [0, 1, 0]])

# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))

[1 4 5]
[1 4 5]


In [77]:
# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))

[2 2]
[2 2]


One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:

In [78]:
# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [79]:
# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"

[ 1  6  7 11]


In [80]:
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10
print(a)

[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]
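
One caveat worth sketching: when the index arrays contain repeated positions, `+=` applies the update only once per position. `np.add.at` is one way to accumulate every occurrence. The arrays below are illustrative.

```python
import numpy as np

idx = np.array([0, 0, 2])   # position 0 appears twice

buffered = np.zeros(3, dtype=np.int64)
buffered[idx] += 1          # repeated index is applied only once
print(buffered)             # [1 0 1]

accumulated = np.zeros(3, dtype=np.int64)
np.add.at(accumulated, idx, 1)   # every occurrence is counted
print(accumulated)               # [2 0 1]
```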


**Boolean array indexing:** Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [81]:
a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)  # Find the elements of a that are bigger than 2;
                    # this returns a numpy array of Booleans of the same
                    # shape as a, where each slot of bool_idx tells
                    # whether that element of a is > 2.

print(bool_idx)

[[False False]
 [ True  True]
 [ True  True]]


In [82]:
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])

# We can do all of the above in a single concise statement:
print(a[a > 2])

[3 4 5 6]
[3 4 5 6]


In [83]:
arr1 = np.arange(25).reshape((5,5))
bools = np.array([True, True, False, True, False])
print(arr1[bools])

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [15 16 17 18 19]]


In [84]:
# negate the condition
print(arr1[~bools])

[[10 11 12 13 14]
 [20 21 22 23 24]]


In [85]:
arr2 = np.array([1,2,3,4,5])
# multiple conditions
print(arr1[(arr2<2) | (arr2>4)])

[[ 0  1  2  3  4]
 [20 21 22 23 24]]
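
Conditions can also be combined with `&` (and), and a boolean mask can be used on the left-hand side of an assignment. A small sketch with illustrative values; note the parentheses around each comparison, which are needed because `&` and `|` bind more tightly than comparisons.

```python
import numpy as np

arr = np.arange(10)

mask = (arr >= 3) & (arr < 7)   # elementwise "and"
print(arr[mask])                # [3 4 5 6]

arr[mask] = 0                   # assign through the mask
print(arr)                      # [0 1 2 0 0 0 0 7 8 9]
```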


### fancy indexing

In [89]:
arr = np.array(range(100)).reshape(10,10)
arr

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

In [90]:
# select arr[3,3], arr[1,2], arr[2,1]
print(arr[[3,1,2], [3,2,1]])

[33 12 21]


In [91]:
# select rows 3,1,2 and columns 6,4,8
print(arr[[3,1,2]][:, [6,4,8]])

[[36 34 38]
 [16 14 18]
 [26 24 28]]
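
The chained indexing above first copies the selected rows and then picks columns from that copy. As an alternative sketch, `np.ix_` builds index arrays that select the same row/column cross product in a single step.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)

rows = [3, 1, 2]
cols = [6, 4, 8]

sub = arr[np.ix_(rows, cols)]   # every combination of rows and cols
print(sub)
# [[36 34 38]
#  [16 14 18]
#  [26 24 28]]
```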


In [92]:
# dimension inference using any negative number (usually -1)
arr = np.array(range(16)).reshape((4,-1))
print(arr.shape)

(4, 4)


### find elements/indices by conditions

In [93]:
arr = np.arange(16).reshape(4,4)

In [94]:
# find the elements greater than 5 and return a flattened array
print(arr[arr>5])    # or arr[np.where(arr>5)]

[ 6  7  8  9 10 11 12 13 14 15]


In [95]:
# return values based on conditions
# np.where(condition, true_return, false_return)
print(np.where(arr>5, -1, 10))

[[10 10 10 10]
 [10 10 -1 -1]
 [-1 -1 -1 -1]
 [-1 -1 -1 -1]]


In [96]:
# find the indices of the elements on conditions
print(np.argwhere(arr>5))

[[1 2]
 [1 3]
 [2 0]
 [2 1]
 [2 2]
 [2 3]
 [3 0]
 [3 1]
 [3 2]
 [3 3]]
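
For comparison, calling `np.where` with only a condition returns one index array per dimension, while `np.argwhere` returns the same positions as rows of coordinates. A brief sketch with a stricter, illustrative threshold:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

rows, cols = np.where(arr > 12)   # one index array per axis
print(rows, cols)                 # [3 3 3] [1 2 3]
print(arr[rows, cols])            # [13 14 15]

pairs = np.argwhere(arr > 12)     # one (row, col) pair per match
print(pairs.tolist())             # [[3, 1], [3, 2], [3, 3]]
```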


## 6. Sampling Methods

### set seed

#####  Seed in Random Number Generation

In NumPy, a **seed** is an integer used to initialize the internal state of a pseudo-random number generator. Pseudo-random number generators produce sequences of numbers that appear random but are actually determined by an initial value (the seed).

*   **Why use a seed?** Setting a seed ensures that the sequence of random numbers generated is the same every time you run your code. This is crucial for **reproducibility**. If you need to get the same "random" results consistently, you should set a seed before generating random numbers.

*   **`np.random.seed(value)`**: This sets the seed for NumPy's global random number generator.
*   **`np.random.RandomState(value)`**: This creates a separate, independent random number generator object with its own seed. This is useful if you want different parts of your code to use independent sequences of random numbers, or if you want to manage multiple reproducible sequences.
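
The reproducibility claim can be checked directly. The sketch below re-seeds the global generator, compares two `RandomState` objects, and also shows `np.random.default_rng`, the newer generator interface that NumPy recommends for new code; it is not used elsewhere in this notebook, and the seed values are arbitrary.

```python
import numpy as np

# Re-seeding the global generator replays the same sequence
np.random.seed(123)
first = np.random.rand(3)
np.random.seed(123)
second = np.random.rand(3)
print(np.array_equal(first, second))   # True

# Two independent RandomState objects with the same seed also agree
rs_a = np.random.RandomState(321)
rs_b = np.random.RandomState(321)
same_state = np.array_equal(rs_a.rand(3), rs_b.rand(3))
print(same_state)                      # True

# Newer alternative: a Generator object created by default_rng
rng = np.random.default_rng(123)
sample = rng.random(3)                 # floats in [0, 1)
ints = rng.integers(1, 10, size=5)     # integers in [1, 10)
print(sample, ints)
```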

In [172]:
np.random.seed(123)

### set random state which is independent from the global seed

In [173]:
rs = np.random.RandomState(321)
rs.rand(10)

array([0.88594794, 0.07791236, 0.97964616, 0.24767146, 0.75288472,
       0.52667564, 0.90755375, 0.8840703 , 0.08926896, 0.5173446 ])

### generate a random sample from interval [0, 1) in a given shape

In [174]:
# generate a random scalar
print(np.random.rand())

0.6964691855978616


In [175]:
# generate a 1-D array
print(np.random.rand(3))

[0.28613933 0.22685145 0.55131477]


In [176]:
# generate a 2-D array
print(np.random.rand(3,3))

[[0.71946897 0.42310646 0.9807642 ]
 [0.68482974 0.4809319  0.39211752]
 [0.34317802 0.72904971 0.43857224]]


### generate a sample from the standard normal distribution (mean = 0, var = 1)

In [177]:
print(np.random.randn(3,3))

[[-0.14337247 -0.6191909  -0.76943347]
 [ 0.57674602  0.12652592 -1.30148897]
 [ 2.20742744  0.52274247  0.46564476]]


### generate an array of random integers in a given interval [low, high)

In [178]:
# np.random.randint(low, high, size, dtype)
print(np.random.randint(1, 10, 3, 'i8'))

[5 7 2]


### generate an array of random floating-point numbers in the interval [0.0, 1.0)

In [179]:
# the following functions draw from the same distribution as np.random.rand(),
# but take the output shape as a single argument
print(np.random.random_sample(10))
print(np.random.random(10))
print(np.random.ranf(10))
print(np.random.sample(10))

[0.65472131 0.37380143 0.23451288 0.98799529 0.76599595 0.77700444
 0.02798196 0.17390652 0.15408224 0.07708648]
[0.8898657  0.7503787  0.69340324 0.51176338 0.46426806 0.56843069
 0.30254945 0.49730879 0.68326291 0.91669867]
[0.10892895 0.49549179 0.23283593 0.43686066 0.75154299 0.48089213
 0.79772841 0.28270293 0.43341824 0.00975735]
[0.34079598 0.68927201 0.86936929 0.26780382 0.45674792 0.26828131
 0.8370528  0.27051466 0.53006201 0.17537266]


### generate a random sample from a given 1-D array

In [180]:
# np.random.choice(iterable_or_int, size, replace=True, p=weights)
print(np.random.choice(range(3), 10, replace=True, p=[0.1, 0.8, 0.1]))

[1 1 1 1 1 1 1 2 2 1]


In [181]:
print(np.random.choice(3, 10))

[1 0 1 2 2 0 1 1 1 0]


In [182]:
print(np.random.choice([1,2,3], 10))

[2 2 1 3 2 3 1 2 1 3]


### shuffle an array in place

In [183]:
arr = np.array(range(10))
print(arr)

[0 1 2 3 4 5 6 7 8 9]


In [184]:
np.random.shuffle(arr)
print(arr)

[1 2 8 5 4 0 6 7 9 3]


### generate a permutation of an array

A **permutation** of a set of elements is an arrangement of those elements into a sequence or order. In the context of an array, a permutation is a reordering of its elements.

NumPy's `np.random.permutation(x)` function randomly shuffles the elements of an array `x`.

*   If `x` is an integer, `np.random.permutation(x)` returns a randomly shuffled array of integers from 0 up to (but not including) `x`.
*   If `x` is an array, `np.random.permutation(x)` returns a randomly shuffled **copy** of the array `x`. The original array `x` is not modified.

This is similar to `np.random.shuffle(x)`, but `np.random.shuffle()` shuffles the array *in place* (modifying the original array), while `np.random.permutation()` returns a new array. For multidimensional arrays, both functions shuffle only along the first axis: the sub-arrays are reordered, but their contents stay intact. `np.random.permutation()` takes no `axis` argument; the newer `Generator` API (`np.random.default_rng().permutation`) accepts one.
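
As a sketch of the newer `Generator` interface, which the NumPy documentation recommends for new code, the example below draws a shuffled copy and then permutes along a chosen axis. The seed value and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed is arbitrary, used only for repeatability

arr = np.arange(10)
shuffled_copy = rng.permutation(arr)  # returns a shuffled copy
print(arr)                            # the original is left untouched
print(shuffled_copy)

grid = np.arange(12).reshape(3, 4)
# axis=1 reorders whole columns; each row keeps the same set of values
cols_shuffled = rng.permutation(grid, axis=1)
print(cols_shuffled)
```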

In [185]:
# similar to np.random.shuffle(), but it returns a copy rather than making changes in place
arr = np.array(range(10))
print('The initial array: ', arr)
print('A permutation of the array: ', np.random.permutation(arr))

The initial array:  [0 1 2 3 4 5 6 7 8 9]
A permutation of the array:  [3 6 2 4 5 9 1 8 0 7]


## 7. Sort an Array

In [186]:
arr = np.random.rand(5,5)

### sort an array along a specified axis

In [187]:
# sort along axis 0 (each column is sorted top to bottom) and return a copy
print(np.sort(arr, axis=0))

[[0.46023842 0.02942373 0.2936925  0.2402475  0.05893816]
 [0.478459   0.19317033 0.54109404 0.309199   0.28232096]
 [0.6553475  0.34924138 0.55204372 0.41799246 0.55948738]
 [0.67792545 0.52115943 0.58063202 0.4490534  0.88480501]
 [0.87350227 0.80912668 0.86857215 0.81792751 0.91988263]]


In [188]:
# sort along axis 0 (each column is sorted) in place
arr.sort(axis=0)
print(arr)

[[0.46023842 0.02942373 0.2936925  0.2402475  0.05893816]
 [0.478459   0.19317033 0.54109404 0.309199   0.28232096]
 [0.6553475  0.34924138 0.55204372 0.41799246 0.55948738]
 [0.67792545 0.52115943 0.58063202 0.4490534  0.88480501]
 [0.87350227 0.80912668 0.86857215 0.81792751 0.91988263]]


In [189]:
# sort along axis 1 (each row is sorted left to right) and return a copy
print(np.sort(arr, axis=1))

[[0.02942373 0.05893816 0.2402475  0.2936925  0.46023842]
 [0.19317033 0.28232096 0.309199   0.478459   0.54109404]
 [0.34924138 0.41799246 0.55204372 0.55948738 0.6553475 ]
 [0.4490534  0.52115943 0.58063202 0.67792545 0.88480501]
 [0.80912668 0.81792751 0.86857215 0.87350227 0.91988263]]


In [190]:
# sort along axis 1 (each row is sorted) in place
arr.sort(axis=1)
print(arr)

[[0.02942373 0.05893816 0.2402475  0.2936925  0.46023842]
 [0.19317033 0.28232096 0.309199   0.478459   0.54109404]
 [0.34924138 0.41799246 0.55204372 0.55948738 0.6553475 ]
 [0.4490534  0.52115943 0.58063202 0.67792545 0.88480501]
 [0.80912668 0.81792751 0.86857215 0.87350227 0.91988263]]


### compute the indices that would sort an array along a specified axis

In [191]:
arr = np.random.rand(5,5)

In [192]:
# along axis 0 (indices that would sort each column)
print(np.argsort(arr, axis=0))

[[4 3 2 3 3]
 [3 2 1 4 4]
 [0 4 4 0 1]
 [1 0 0 2 0]
 [2 1 3 1 2]]


In [193]:
# along axis 1 (indices that would sort each row)
print(np.argsort(arr, axis=1))

[[0 4 3 1 2]
 [2 4 0 1 3]
 [2 1 0 3 4]
 [1 4 0 3 2]
 [0 4 3 1 2]]


In [194]:
# if axis=None, return the indices of a flattened array
print(np.argsort(arr, axis=None))

[12 16 19 11 20  7 15 18 24  0  9  4 23 21 22  3  1  5  2 17 10  6 13 14
  8]
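
The indices returned by `np.argsort` can be applied back to the array. The sketch below uses `np.take_along_axis` to reproduce `np.sort` and to get a descending order; the `scores` array is made up for illustration.

```python
import numpy as np

scores = np.array([[0.3, 0.9, 0.1],
                   [0.7, 0.2, 0.5]])

order = np.argsort(scores, axis=1)
# applying the indices row by row gives the same result as np.sort(scores, axis=1)
sorted_rows = np.take_along_axis(scores, order, axis=1)
print(sorted_rows)

# reversing the index order along axis 1 gives a descending sort
descending = np.take_along_axis(scores, order[:, ::-1], axis=1)
print(descending)
```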


In [195]:
arr = np.random.rand(3,4)

## 8. Array Math

Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

#### Basic Operations

In [101]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
print(x + y)
print("\n")
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]


[[ 6.  8.]
 [10. 12.]]


In [100]:
# Elementwise difference; both produce the array
print(x - y)
print("\n")
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]


[[-4. -4.]
 [-4. -4.]]


In [102]:
# Elementwise product; both produce the array
print(x * y)
print("\n")
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]


[[ 5. 12.]
 [21. 32.]]


In [103]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]


print(x / y)
print("\n")
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]


[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [104]:
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]


print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


**The NumPy functions above accept an `out` argument, which can save memory: the result is written into an existing array instead of a newly allocated one.**

In [105]:
# pass out= to write each result back into arr1 instead of allocating a new array

arr1 = np.array([1,2,3])

np.add(arr1, [8,9,10], out=arr1)
print(arr1)

np.subtract(arr1, [8,9,10], out=arr1)
print(arr1)

np.multiply(arr1, [1,2,3], out=arr1)
print(arr1)

[ 9 11 13]
[1 2 3]
[1 4 9]
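
One caveat with `out`: the result type must fit the output array. A small sketch (array names are illustrative) showing that the function returns the very array passed as `out`, and that storing a float result in an integer array is rejected:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
result = np.multiply(a, 2, out=a)
print(result is a)  # the function returns the array that was passed as out
print(a)

ints = np.array([1, 2, 3])
try:
    # true division yields floats, which cannot be stored in an integer array
    np.divide(ints, 2, out=ints)
except TypeError as exc:
    print("TypeError:", exc)
```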


#### Dot Products

Note that, unlike in MATLAB, `*` is elementwise multiplication, not matrix multiplication. We instead use the `dot` function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. `dot` is available both as a function in the numpy module and as an instance method of array objects:

In [106]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

219
219


You can also use the `@` operator, which gives the same result as `dot` for 1-D and 2-D arrays.

In [107]:
print(v @ w)

219


In [108]:
# Matrix / vector product; all three produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))
print(x @ v)

[29 67]
[29 67]
[29 67]


In [109]:
# Matrix / matrix product; all three produce the rank 2 array
# [[19 22]
#  [43 50]]


print(x.dot(y))
print(np.dot(x, y))
print(x @ y)

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
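
For arrays with more than two dimensions, `@` and `np.dot` stop agreeing. A minimal sketch with arbitrary shapes:

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# @ (np.matmul) treats the leading axis as a batch of matrices
print((a @ b).shape)

# np.dot pairs every matrix in a with every matrix in b
print(np.dot(a, b).shape)
```

The first result has shape `(2, 3, 5)` and the second `(2, 3, 2, 5)`, so `@` is usually the one to reach for with stacked matrices.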


Numpy provides many useful functions for performing computations on arrays; one of the most useful is `sum`:

In [110]:
x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]
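
Summing along an axis drops that axis from the result. When the sums are meant to be combined with the original array again, `keepdims=True` keeps the axis with length 1. A sketch that normalizes each row so it sums to 1:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

row_sums = np.sum(x, axis=1, keepdims=True)  # shape (2, 1) instead of (2,)
print(row_sums.shape)

# each row is divided by its own sum
normalized = x / row_sums
print(normalized)
```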


### element-wise exponentiation

In [115]:
arr = np.arange(1,7).reshape(2,3)
print(arr, end="\n\n")
print(np.exp(arr))

[[1 2 3]
 [4 5 6]]

[[  2.71828183   7.3890561   20.08553692]
 [ 54.59815003 148.4131591  403.42879349]]


### element-wise logarithm

In [116]:
# natural log
print(np.log(arr))

[[0.         0.69314718 1.09861229]
 [1.38629436 1.60943791 1.79175947]]


In [117]:
# base 2
print(np.log2(arr))

[[0.         1.         1.5849625 ]
 [2.         2.32192809 2.5849625 ]]


In [118]:
# base 10
print(np.log10(arr))

[[0.         0.30103    0.47712125]
 [0.60205999 0.69897    0.77815125]]


### element-wise square root

In [119]:
print(np.sqrt(arr))

[[1.         1.41421356 1.73205081]
 [2.         2.23606798 2.44948974]]


### element-wise sine and cosine

In [120]:
print(np.sin(arr))

[[ 0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155 ]]


In [121]:
print(np.cos(arr))

[[ 0.54030231 -0.41614684 -0.9899925 ]
 [-0.65364362  0.28366219  0.96017029]]


### sum along a specified axis

In [122]:
# sum along axis 0 (one sum per column)
print(np.sum(arr, axis=0))

[5 7 9]


In [123]:
# sum along axis 1 (one sum per row)
print(np.sum(arr, axis=1))

[ 6 15]


### compute the min and max along a specified axis

In [124]:
# calculate min along axis 0 (one value per column)
print(np.min(arr, axis=0))

[1 2 3]


In [125]:
# calculate max along axis 1 (one value per row)
print(np.max(arr, axis=1))

[3 6]


In [126]:
# if axis not specified, calculate the max/min value of all elements
print(np.max(arr))
print(np.min(arr))

6
1


### compute the indices of the min and max along a specified axis

In [131]:
arr = np.arange(1,37).reshape(6,6)
arr

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18],
       [19, 20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36]])

In [132]:
# along axis 0 (row index of the min/max within each column)
print(np.argmin(arr, axis=0))
print(np.argmax(arr, axis=0))

[0 0 0 0 0 0]
[5 5 5 5 5 5]


In [133]:
# along axis 1 (column index of the min/max within each row)
print(np.argmin(arr, axis=1))
print(np.argmax(arr, axis=1))

[0 0 0 0 0 0]
[5 5 5 5 5 5]


In [134]:
# if axis not specified, return the index of the flattened array
print(np.argmin(arr))
print(np.argmax(arr))

0
35


### compute element-wise min and max of two arrays

In [135]:
arr1 = np.array([1, 3, 5, 7, 9])
arr2 = np.array([0, 4, 3, 8, 7])
print(np.maximum(arr1, arr2))
print(np.minimum(arr1, arr2))

[1 4 5 8 9]
[0 3 3 7 7]


### split fractional and integral parts of a floating-point array

In [136]:
arr1 = np.random.rand(10) * 10
re, intg = np.modf(arr1)
print('fractional: ', re)
print('integral: ', intg)

fractional:  [0.99737441 0.4987993  0.26852454 0.16444342 0.51937122 0.95821634
 0.21206234 0.20722194 0.65317224 0.29573126]
integral:  [8. 7. 4. 8. 7. 6. 7. 4. 7. 1.]


#### Compute the Mean

The **mean** (or average) of a dataset is calculated by summing all the values and dividing by the number of values. In NumPy, the `np.mean()` function allows you to easily compute the mean of an array.

*   **Overall Mean**: When `axis` is not specified, `np.mean()` computes the mean of all elements in the array, treating it as a single flattened sequence.
*   **Mean along an Axis**: You can specify the `axis` parameter to compute the mean along a specific dimension.
    *   `axis=0`: Computes the mean of each column.
    *   `axis=1`: Computes the mean of each row.

This is useful for understanding the central tendency of your data, either as a whole or along specific dimensions.
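
To make the axis convention concrete, here is a small sketch on a 2x3 array (the values are made up), including the same column means computed by hand:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

col_means = np.mean(data, axis=0)  # one mean per column -> 3 values
row_means = np.mean(data, axis=1)  # one mean per row    -> 2 values
print(col_means, row_means)

# the same column means by hand: sum down axis 0, divide by the number of rows
manual = data.sum(axis=0) / data.shape[0]
print(manual)
```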

In [137]:
# compute the overall mean
print(np.mean(arr))

18.5


In [138]:
# compute the mean along axis 0 (one mean per column)
print(np.mean(arr, axis=0))

[16. 17. 18. 19. 20. 21.]


In [139]:
# compute the mean along axis 1 (one mean per row)
print(np.mean(arr, axis=1))

[ 3.5  9.5 15.5 21.5 27.5 33.5]


### compute the median

The **median** is the middle value in a dataset that is ordered from least to greatest. If there is an even number of observations, the median is the average of the two central values. In NumPy, you can compute the median using the `np.median()` function.

*   **Overall Median**: When `axis` is not specified, `np.median()` computes the median of all elements in the array, treating it as a single flattened sequence.
*   **Median along an Axis**: You can specify the `axis` parameter to compute the median along a specific dimension.
    *   `axis=0`: Computes the median of each column.
    *   `axis=1`: Computes the median of each row.

The median is often used as a measure of central tendency when the data might be skewed or contain outliers, as it is less affected by extreme values than the mean.
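
A quick illustration of that robustness, using a made-up sample with one extreme value:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 100])

# the single outlier pulls the mean far from the bulk of the data
print(np.mean(values))
# the median stays at the middle value
print(np.median(values))
```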

In [150]:
# compute the overall median
print(np.median(arr))

18.5


In [151]:
# compute the median along axis 0 (one median per column)
print(np.median(arr, axis=0))

[16. 17. 18. 19. 20. 21.]


In [152]:
# compute the median along axis 1 (one median per row)
print(np.median(arr, axis=1))

[ 3.5  9.5 15.5 21.5 27.5 33.5]


### compute the percentile

A **percentile** is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value below which 20% of the observations may be found.

In NumPy, you can compute the q-th percentile of the data along the specified axis using the `np.percentile(a, q, axis=None)` function.

*   `a`: The input array.
*   `q`: The percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.
*   `axis`: The axis along which the percentiles are computed. If `None`, the percentile of the flattened array is computed.

Percentiles are useful for understanding the distribution of your data and identifying values at specific points in the distribution.
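
With the default (linear) method, `np.percentile` interpolates between neighbouring sorted values. The sketch below redoes that interpolation by hand for one percentile; the data and the helper names (`pos`, `lower`, `frac`) are illustrative, and the manual formula assumes sorted input and a position that is not the last index.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])  # already sorted
q = 30

# the q-th percentile sits at fractional index q/100 * (n - 1)
pos = q / 100 * (len(data) - 1)
lower = int(np.floor(pos))
frac = pos - lower

# interpolate linearly between the two neighbouring values
manual = data[lower] + frac * (data[lower + 1] - data[lower])

# both values are 22 up to floating-point rounding
print(manual, np.percentile(data, q))
```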

In [156]:
#arr1 = np.random.rand(100)


arr1 = np.arange(1,10)
# compute 5, 65, and 95 percentiles of the array
print(np.percentile(arr1, [5, 65, 95]))

[1.4 6.2 8.6]


### compute the standard deviation & variance

**Standard Deviation** and **Variance** are both measures of the dispersion or spread of data points in a dataset. They indicate how much the data varies from the mean.

*   **Variance**: The average of the squared differences from the Mean. It gives a measure of how spread out the data is from its average value. In NumPy, you compute the variance using `np.var()`.
*   **Standard Deviation**: The square root of the variance. It is often preferred over variance because it is in the same units as the original data, making it easier to interpret. In NumPy, you compute the standard deviation using `np.std()`.

Both `np.var()` and `np.std()` accept an `axis` parameter, similar to `np.mean()` and `np.median()`, to compute the variance or standard deviation along a specific dimension (rows or columns) or for the entire flattened array if `axis=None`.

These measures are crucial for understanding the variability within your data. A low standard deviation or variance indicates that the data points tend to be close to the mean, while a high value indicates that the data points are more spread out.
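
By default `np.var()` and `np.std()` divide by the number of values N (the population formulas). Passing `ddof=1` divides by N - 1 instead, giving the sample estimates. A small sketch with made-up data:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # mean is 5

pop_var = np.var(data)             # divides by N
pop_std = np.std(data)
sample_var = np.var(data, ddof=1)  # divides by N - 1

print(pop_var, pop_std, sample_var)

# variance by hand: mean of the squared deviations from the mean
manual_var = np.mean((data - data.mean()) ** 2)
print(manual_var)
```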

In [157]:
# compute the overall standard deviation
print(np.std(arr))

10.388294694831615


In [158]:
# compute the standard deviation along axis 0 (one value per column)
print(np.std(arr, axis=0))

[10.24695077 10.24695077 10.24695077 10.24695077 10.24695077 10.24695077]


In [159]:
# compute the standard deviation along axis 1 (one value per row)
print(np.std(arr, axis=1))

[1.70782513 1.70782513 1.70782513 1.70782513 1.70782513 1.70782513]


In [160]:
# compute the overall variance
print(np.var(arr))

107.91666666666667


In [161]:
# compute the variance along axis 0 (one value per column)
print(np.var(arr, axis=0))

[105. 105. 105. 105. 105. 105.]


In [162]:
# compute the variance along axis 1 (one value per row)
print(np.var(arr, axis=1))

[2.91666667 2.91666667 2.91666667 2.91666667 2.91666667 2.91666667]


### compute the covariance & correlation

**Covariance** and **Correlation** are statistical measures that describe the relationship between two variables.

*   **Covariance**: Measures the extent to which two variables change together.
    *   A positive covariance indicates that as one variable increases, the other tends to increase as well.
    *   A negative covariance indicates that as one variable increases, the other tends to decrease.
    *   A covariance close to zero suggests little to no linear relationship between the variables.
    In NumPy, you can compute the covariance matrix of an array using `np.cov()`. By default each row of the input is treated as a variable and each column as an observation (pass `rowvar=False` for the opposite layout), so the element at row i, column j of the output is the covariance between the i-th and j-th variables.

*   **Correlation**: Measures the strength and direction of the linear relationship between two variables. Unlike covariance, correlation is standardized, so its value is always between -1 and +1.
    *   A correlation of +1 indicates a perfect positive linear relationship.
    *   A correlation of -1 indicates a perfect negative linear relationship.
    *   A correlation of 0 indicates no linear relationship.
    In NumPy, you can compute the correlation coefficient between two arrays (or between rows/columns of an array) using `np.corrcoef()`.

Both covariance and correlation are useful for understanding how different parts of your data are related.
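
The two measures are directly linked: dividing each covariance by the product of the two standard deviations gives the correlation. A sketch that checks this against `np.corrcoef`, using random data with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
data = rng.random((3, 50))      # 3 variables (rows), 50 observations each

cov = np.cov(data)              # shape (3, 3); rows are variables by default
std = np.sqrt(np.diag(cov))     # the diagonal holds each variable's variance
corr_manual = cov / np.outer(std, std)

print(np.allclose(corr_manual, np.corrcoef(data)))

# if the variables are stored in columns instead, pass rowvar=False
cov_cols = np.cov(data.T, rowvar=False)
print(np.allclose(cov_cols, cov))
```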

In [163]:
arr = np.random.rand(5,8)

In [164]:
print(np.cov(arr))

[[ 0.03974513 -0.00848707  0.00697338  0.0073406  -0.00718959]
 [-0.00848707  0.06512394 -0.0148455  -0.03567443  0.02162814]
 [ 0.00697338 -0.0148455   0.04740731 -0.00290729 -0.01840968]
 [ 0.0073406  -0.03567443 -0.00290729  0.10694938 -0.05034722]
 [-0.00718959  0.02162814 -0.01840968 -0.05034722  0.04020358]]


In [165]:
print(np.corrcoef(arr[:,0], arr[:,1]))

[[ 1.        -0.2175232]
 [-0.2175232  1.       ]]


### compute cumulative sum & product

*   **Cumulative Sum**: The cumulative sum, also known as the running total, is a sequence where each element is the sum of the current element and all preceding elements in the sequence. In NumPy, you can compute the cumulative sum using `np.cumsum()`.
*   **Cumulative Product**: The cumulative product is a sequence where each element is the product of the current element and all preceding elements in the sequence. In NumPy, you can compute the cumulative product using `np.cumprod()`.

Both `np.cumsum()` and `np.cumprod()` accept an `axis` parameter to compute the cumulative sum or product along a specific dimension (rows or columns) or for the entire flattened array if `axis=None`. These are useful for analyzing trends and accumulations in your data.
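
On a small integer array the running totals are easy to follow by eye. A minimal sketch (values chosen for illustration):

```python
import numpy as np

values = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.cumsum(values))           # flattened: running total over all six elements
print(np.cumsum(values, axis=0))   # running total down each column
print(np.cumsum(values, axis=1))   # running total across each row
print(np.cumprod(values, axis=1))  # running product across each row
```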

In [166]:
# calculate the cumulative sums along axis 0 (running total down each column)
print(np.cumsum(arr, axis=0))

[[0.35373568 0.73571481 0.46267927 0.38472441 0.62237965 0.3492623
  0.84762725 0.74335428]
 [0.47873478 1.11033135 1.36710114 0.94303758 1.40563971 0.89787495
  1.20489545 1.07117677]
 [1.21024174 1.58650188 1.65811608 1.39995156 2.30900294 1.3610427
  1.64053036 1.89772579]
 [2.08451392 1.9075043  2.44795462 1.7945999  2.38170094 1.62941885
  2.35968422 2.86445118]
 [2.57305704 2.88259496 3.15617021 2.70173396 3.13695273 2.37710054
  2.90798016 3.26388082]]


In [167]:
# calculate the cumulative sums along axis 1 (running total across each row)
print(np.cumsum(arr, axis=1))

[[0.35373568 1.08945049 1.55212977 1.93685417 2.55923382 2.90849612
  3.75612337 4.49947765]
 [0.1249991  0.49961564 1.4040375  1.96235067 2.74561074 3.29422339
  3.65149159 3.97931408]
 [0.73150696 1.20767748 1.49869242 1.95560641 2.85896964 3.32213739
  3.75777229 4.58432131]
 [0.87427218 1.19527461 1.98511316 2.37976149 2.45245949 2.72083564
  3.4399895  4.40671489]
 [0.48854312 1.46363378 2.17184936 3.07898342 3.83423521 4.5819169
  5.13021284 5.52964249]]


In [168]:
# calculate the cumulative product along axis 0 (running product down each column)
print(np.cumprod(arr, axis=0))

[[0.35373568 0.73571481 0.46267927 0.38472441 0.62237965 0.3492623
  0.84762725 0.74335428]
 [0.04421664 0.27561094 0.41845725 0.2147967  0.48748513 0.19160972
  0.30283026 0.24368825]
 [0.03234478 0.1312378  0.12177731 0.09814362 0.44037614 0.08874744
  0.13192343 0.20142029]
 [0.02827814 0.04212765 0.09618441 0.03873222 0.03201446 0.0238177
  0.09487325 0.1947181 ]
 [0.01381509 0.04107828 0.0681193  0.03513531 0.02417898 0.01780806
  0.05201862 0.07777618]]


In [169]:
# calculate the cumulative product along axis 1 (running product across each row)
print(np.cumprod(arr, axis=1))

[[0.35373568 0.26024858 0.12041162 0.04632529 0.02883192 0.0100699
  0.00853552 0.00634492]
 [0.1249991  0.04682673 0.04235112 0.02364519 0.01852033 0.01016049
  0.00363002 0.00119   ]
 [0.73150696 0.34832205 0.10136692 0.04631596 0.04184014 0.019379
  0.00844217 0.00697787]
 [0.87427218 0.28064349 0.22166305 0.08747895 0.00635954 0.00170675
  0.00122742 0.00118657]
 [0.48854312 0.47637383 0.33737537 0.30604469 0.2311408  0.17281974
  0.09475636 0.0378485 ]]


### element-wise comparison

In [None]:
arr1 = np.array([1,2,3,4,5])
arr2 = np.array([5,4,3,2,1])

In [None]:
# return an array of bools
print(arr1 == arr2)
print(arr1 < 3)

[False False  True False False]
[ True  True False False False]
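
The boolean arrays produced by a comparison are commonly used as masks or reduced to a single answer. A short sketch reusing the same two arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([5, 4, 3, 2, 1])

mask = arr1 < arr2
print(arr1[mask])                        # keep only elements where the mask is True

print(np.any(arr1 == arr2))              # is at least one pair equal?
print(np.all(arr1 == arr2))              # are all pairs equal?
print(np.array_equal(arr1, arr2[::-1]))  # same shape and same elements?
```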


You can find the full list of mathematical functions provided by numpy in the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:

In [170]:
print(x)
print("transpose\n", x.T)

[[1 2]
 [3 4]]
transpose
 [[1 3]
 [2 4]]


In [171]:
v = np.array([[1,2,3]])
print(v )
print("transpose\n", v.T)

[[1 2 3]]
transpose
 [[1]
 [2]
 [3]]
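
Note that the example above works because `v` was created with two dimensions. Transposing a rank 1 array does nothing, so a column has to be created explicitly. A minimal sketch:

```python
import numpy as np

v = np.array([1, 2, 3])  # rank 1, shape (3,)
print(v.T.shape)         # unchanged: there is only one axis to "swap"

column = v[:, np.newaxis]  # shape (3, 1)
row = v[np.newaxis, :]     # shape (1, 3)
print(column.shape, row.shape)
```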


## 9. Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [196]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


This works; however when the matrix `x` is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix `x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically, then performing elementwise summation of `x` and `vv`. We could implement this approach like this:

In [197]:
vv = np.tile(v, (4, 1))  # Stack 4 copies of v on top of each other
print(vv)                # Prints "[[1 0 1]
                         #          [1 0 1]
                         #          [1 0 1]
                         #          [1 0 1]]"

[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]


In [198]:
y = x + vv  # Add x and vv elementwise
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [199]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting; this line works as if v actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.

If this explanation does not make sense, try reading the explanation from the [documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) or this [explanation](http://wiki.scipy.org/EricsBroadcastingDoc).
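
The rules can also be checked programmatically. The sketch below assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available, and uses `np.broadcast_to` to show that no copies of the smaller array are made.

```python
import numpy as np

# (3,) is treated as (1, 3), which is compatible with (4, 3)
print(np.broadcast_shapes((4, 3), (3,)))
# (2,) is treated as (1, 2); the size-1 dimensions stretch to give (3, 2)
print(np.broadcast_shapes((3, 1), (2,)))

try:
    # trailing sizes 3 and 4 differ and neither is 1
    np.broadcast_shapes((4, 3), (4,))
except ValueError as exc:
    print("ValueError:", exc)

# broadcast_to returns a read-only view; a stride of 0 along axis 0
# means every "row" points at the same three values in memory
v = np.array([1, 0, 1])
view = np.broadcast_to(v, (4, 3))
print(view.strides[0], view.flags.writeable)
```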

Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the [documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).

Here are some applications of broadcasting:

In [200]:
# Compute outer product of vectors
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:

print(np.reshape(v, (3, 1)) * w)

[[ 4  5]
 [ 8 10]
 [12 15]]


In [201]:
# Add a vector to each row of a matrix
x = np.array([[1,2,3], [4,5,6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:

print(x + v)

[[2 4 6]
 [5 7 9]]


In [202]:
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:

print((x.T + w).T)

[[ 5  6  7]
 [ 9 10 11]]


In [203]:
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))

[[ 5  6  7]
 [ 9 10 11]]


In [204]:
# Multiply a matrix by a constant:
# x has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
print(x * 2)

[[ 2  4  6]
 [ 8 10 12]]


#### assign a scalar to a slice by broadcasting

In [205]:
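arr = np.random.rand(3, 4)  # a 3x4 array to assign into
arr[1:3,:] = 100    # or simply arr[1:3]
arr[:,8:] = 100     # no effect here: the array has only 4 columns, so this slice is empty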
print(arr)

[[  0.38151211   0.80738444   0.81103914   0.68618686]
 [100.         100.         100.         100.        ]
 [100.         100.         100.         100.        ]]


Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.

This brief overview has touched on many of the important things that you need to know about numpy, but is far from complete. Check out the [numpy reference](http://docs.scipy.org/doc/numpy/reference/) to find out much more about numpy.

## 10. Set Operations

### select the unique elements from an array

In [236]:
arr = np.array([1,1,2,2,3,3,4,5,6])
print(np.unique(arr))

[1 2 3 4 5 6]


In [237]:
# return the number of times each unique item appears
arr = np.array([1,1,2,2,3,3,4,5,6])
uniques, counts = np.unique(arr, return_counts=True)
print(uniques)
print(counts)

[1 2 3 4 5 6]
[2 2 2 1 1 1]
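
`np.unique` can also report where each original element landed among the unique values. A small sketch with made-up data, showing that these indices rebuild the input:

```python
import numpy as np

arr = np.array([3, 1, 2, 3, 1])
uniques, inverse = np.unique(arr, return_inverse=True)

print(uniques)           # sorted unique values
print(inverse)           # position of each original element within uniques
print(uniques[inverse])  # indexing with inverse reconstructs the original array
```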


### compute the intersection & union of two arrays

In [238]:
arr1 = np.array([1,2,3,4,5])
arr2 = np.array([3,4,5,6,7])

In [None]:
# intersection
print(np.intersect1d(arr1, arr2))

[3 4 5]


In [239]:
# union
print(np.union1d(arr1, arr2))

[1 2 3 4 5 6 7]


### compute whether each element of an array is contained in another

In [240]:
# np.in1d is deprecated as of NumPy 2.0; np.isin (next cell) is the recommended replacement
print(np.in1d(arr1, arr2))

[False False  True  True  True]



In [241]:
# preserve the shape of the array in the output, if the array is of higher dimensions
print(np.isin(arr1, arr2))

[False False  True  True  True]
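
A sketch of the shape-preserving behaviour on a 2-D input, together with the `invert` flag for selecting the elements that are *not* in the second collection (the `grid` and `allowed` names are illustrative):

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])
allowed = [2, 3]

mask = np.isin(grid, allowed)
print(mask)  # same shape as grid

# invert=True flips the test, selecting elements not found in allowed
print(grid[np.isin(grid, allowed, invert=True)])
```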


### compute the elements in an array that are not in another

In [242]:
print(np.setdiff1d(arr1, arr2))

[1 2]


### compute the elements in either of two arrays, but not both

In [243]:
print(np.setxor1d(arr1, arr2))

[1 2 6 7]


## 11. Linear Algebra

In [245]:
arr1 = np.random.rand(5,5)
arr2 = np.random.rand(5,5)

### matrix multiplication

In [246]:
print(arr1.dot(arr2))
# or
print(np.dot(arr1, arr2))
# or
print(arr1 @ arr2)

[[1.10409595 0.74598991 1.54982469 0.94835571 0.90139923]
 [1.7440141  1.21514718 2.73426432 1.82372854 1.97764235]
 [1.79746896 0.73717412 2.18846082 1.74474151 1.62744871]
 [1.52885737 0.91960591 2.33354627 1.56466564 1.57974619]
 [1.77472597 1.11313814 2.3796251  1.56366946 1.55763806]]
[[1.10409595 0.74598991 1.54982469 0.94835571 0.90139923]
 [1.7440141  1.21514718 2.73426432 1.82372854 1.97764235]
 [1.79746896 0.73717412 2.18846082 1.74474151 1.62744871]
 [1.52885737 0.91960591 2.33354627 1.56466564 1.57974619]
 [1.77472597 1.11313814 2.3796251  1.56366946 1.55763806]]
[[1.10409595 0.74598991 1.54982469 0.94835571 0.90139923]
 [1.7440141  1.21514718 2.73426432 1.82372854 1.97764235]
 [1.79746896 0.73717412 2.18846082 1.74474151 1.62744871]
 [1.52885737 0.91960591 2.33354627 1.56466564 1.57974619]
 [1.77472597 1.11313814 2.3796251  1.56366946 1.55763806]]


### QR factorization

**QR factorization** (also known as QR decomposition) is a decomposition of a matrix A into a product of two matrices, Q and R.

*   **Q**: An orthogonal matrix, meaning its transpose is equal to its inverse ($Q^T Q = I$).
*   **R**: An upper triangular matrix.

So, $A = QR$.

QR factorization is a fundamental operation in linear algebra and has many applications, including solving linear least squares problems, finding eigenvalues, and performing singular value decomposition (SVD).

In NumPy, you can compute the QR factorization of a matrix using the `np.linalg.qr(a)` function, where `a` is the input matrix. It returns the orthogonal matrix Q and the upper triangular matrix R.

In [247]:
arr = np.random.rand(5,5)

q, r = np.linalg.qr(arr)
print(q)
print(r)

[[-0.11375442  0.63639254 -0.12395333  0.30013251  0.69037708]
 [-0.44084955 -0.02722395  0.44681994  0.72489892 -0.28246068]
 [-0.55676252  0.11278461 -0.75333103  0.00087532 -0.33134079]
 [-0.06034101  0.72349201  0.34706031 -0.39585467 -0.44245508]
 [-0.69216043 -0.24104423  0.31149482 -0.47722149  0.37154036]]
[[-1.28467125 -0.89378123 -1.14920241 -0.76471119 -0.91817004]
 [ 0.          1.15847005  0.14980009  0.63311411 -0.14431689]
 [ 0.          0.         -0.41254553  0.28672483  0.24022043]
 [ 0.          0.          0.          0.2835184  -0.14297812]
 [ 0.          0.          0.          0.          0.20353741]]


In [257]:
# Verify if Q @ R equals the original array
# Note: Due to floating-point precision, the result might not be exactly equal.
# We can use np.allclose to check if the elements are close within a tolerance.

reconstructed_arr = q @ r

print("Original array:\n", arr)
print("\nReconstructed array (Q @ R):\n", reconstructed_arr)

# Check if the reconstructed array is close to the original array
is_close = np.allclose(arr, reconstructed_arr)
print("\nIs the reconstructed array close to the original array?", is_close)

Original array:
 [[0.70682921 0.31981183 0.98050237 0.15655755 0.2042632 ]
 [0.4837251  0.47783535 0.06774467 0.52412791 0.82371056]
 [0.42333087 0.58119017 0.75228461 0.39554438 0.46583576]
 [0.79078103 0.80486584 0.16677372 0.77011669 0.4693528 ]
 [0.92542108 0.33079165 0.30494205 0.4842225  0.92490669]]

Reconstructed array (Q @ R):
 [[1.46137037e-01 8.38913274e-01 2.77194910e-01 5.39450971e-01
  8.04327727e-02]
 [5.66346735e-01 3.62484911e-01 3.18213641e-01 6.53523262e-01
  3.54903002e-01]
 [7.15256807e-01 6.28281491e-01 9.67511330e-01 2.81417524e-01
  2.46395046e-01]
 [7.75183545e-02 8.92075490e-01 3.45450188e-02 4.91475169e-01
  9.04559704e-04]
 [8.89198605e-01 3.39397471e-01 6.30818196e-01 3.30706548e-01
  8.88989739e-01]]

Is the reconstructed array close to the original array? False


```np.allclose``` returned ```False``` here, but floating-point rounding is not the cause. ```np.allclose``` compares elements within a tolerance (by default ```rtol=1e-05``` and ```atol=1e-08```) precisely so that tiny rounding errors are ignored, and the rounding error in a 5x5 matrix product is many orders of magnitude smaller than that.

The printed values show what happened instead: the reconstructed array differs from the "original" in the first decimal place (for example 0.146 versus 0.707 in the top-left entry). The execution counters point to the likely reason: the factorization ran as cell 247, while this check ran as cell 257, after another cell had reassigned ```arr``` to a new random matrix. The check therefore compared ```q @ r``` against a different array than the one that was factorized.

When ```arr```, ```q``` and ```r``` all come from the same run, ```q @ r``` matches ```arr``` up to rounding error and ```np.allclose``` returns ```True```.
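
As a self-contained check, the sketch below factorizes a matrix and verifies the result immediately, before anything can reassign the input. The seed and the name `a` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
a = rng.random((5, 5))

q, r = np.linalg.qr(a)

print(np.allclose(q @ r, a))            # Q @ R reproduces the input
print(np.allclose(q.T @ q, np.eye(5)))  # Q is orthogonal
print(np.allclose(r, np.triu(r)))       # R is upper triangular
```

All three checks print `True` when the factors and the input come from the same run.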

### singular value decomposition (SVD)

**Singular Value Decomposition (SVD)** is a powerful matrix factorization technique that decomposes any matrix into three other matrices:

$A = U \Sigma V^T$

*   **U**: An orthogonal matrix whose columns are the left singular vectors of A.
*   **$\Sigma$ (Sigma)**: A diagonal matrix containing the singular values of A. The singular values are the square roots of the eigenvalues of $A^T A$ (or $A A^T$) and are typically arranged in descending order.
*   **$V^T$**: The transpose of an orthogonal matrix V, where the columns of V are the right singular vectors of A.

SVD has numerous applications in various fields, including:

*   **Dimensionality Reduction**: Used in techniques like Principal Component Analysis (PCA).
*   **Noise Reduction**: Can be used to denoise data.
*   **Recommender Systems**: Used in collaborative filtering.
*   **Image Compression**: Can approximate images with fewer values.
*   **Solving Linear Least Squares Problems**: Provides a stable way to find solutions.

In NumPy, you can compute the SVD of a matrix using `np.linalg.svd(a)`, which returns the matrices U, the singular values (as a 1D array), and the matrix V transpose ($V^T$).

In [248]:
arr = np.random.rand(5,5)

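# svd returns U, the singular values as a 1D array, and V^T (not V)
u, s, vt = np.linalg.svd(arr)
print(u)     # left singular vectors (columns)
print(s)     # singular values, in descending order
print(vt)    # right singular vectors (rows of V^T)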

[[-0.49312949 -0.6789751   0.49953648  0.16224149 -0.14126969]
 [-0.45407436 -0.1543861  -0.76923219 -0.0248086  -0.42148283]
 [-0.53434012  0.25560585  0.13902699 -0.74547763  0.27217804]
 [-0.34744765  0.04011857 -0.23491649  0.4961579   0.7591523 ]
 [-0.38000074  0.66948778  0.29022557  0.41370603 -0.38987433]]
[2.66356529 0.78733805 0.60507145 0.35699155 0.30203217]
[[-0.51092717 -0.44532832 -0.28972894 -0.53443287 -0.41361185]
 [-0.35914736  0.87757913 -0.09101391 -0.30021345 -0.04956192]
 [-0.51370834 -0.05543708  0.80336943  0.23800546 -0.17601543]
 [-0.0598256  -0.14497825  0.28511107 -0.56396382  0.7589853 ]
 [-0.58522752 -0.0862878  -0.4255383   0.4995514   0.4684322 ]]
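
Because the singular values come back as a 1D array, rebuilding the matrix takes one extra step. The sketch below is illustrative only: the seeded generator and the names `a`, `sigma`, `a_rebuilt`, and `a_rank2` are assumptions. It checks $A = U \Sigma V^T$ and then forms a rank-2 approximation, which is the idea behind the compression and dimensionality-reduction uses listed above.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator, assumed for reproducibility
a = rng.random((5, 5))

u, s, vt = np.linalg.svd(a)

# s is 1D, so place it on the diagonal before multiplying
sigma = np.diag(s)
a_rebuilt = u @ sigma @ vt
print(np.allclose(a_rebuilt, a))             # True

# U is orthogonal: U^T U is the identity
print(np.allclose(u.T @ u, np.eye(5)))       # True

# Keep only the k largest singular values for a low-rank approximation
k = 2
a_rank2 = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
print(np.linalg.matrix_rank(a_rank2))        # 2
```

For a non-square input, passing `full_matrices=False` to `np.linalg.svd` returns shapes that multiply together in the same way.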


### compute eigenvalues

In linear algebra, **eigenvalues** are special scalar values associated with a linear transformation (like multiplying by a matrix) that describe how a vector is stretched or compressed by the transformation. For a square matrix A, a scalar $\lambda$ is an eigenvalue if there exists a non-zero vector $v$ (called the eigenvector) such that:

$Av = \lambda v$

This equation means that when the matrix A multiplies the eigenvector v, the result is simply a scaled version of the same vector, with the scaling factor being the eigenvalue $\lambda$.

Eigenvalues are important in many areas, including:

*   **Stability analysis** of systems
*   **Principal Component Analysis (PCA)** for dimensionality reduction
*   **Solving systems of linear differential equations**
*   **Quantum mechanics**

In NumPy, you can compute the eigenvalues of a square matrix using `np.linalg.eigvals(a)`.

In [249]:
arr = np.random.rand(5,5)
print(np.linalg.eigvals(arr))

[ 2.71799309+0.j         -0.38111642+0.j         -0.1923111 +0.17961095j
 -0.1923111 -0.17961095j -0.04206512+0.j        ]
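
The complex entries in the output above are expected: a real matrix that is not symmetric can have complex eigenvalues, and they come in conjugate pairs. The following sketch uses two small hand-picked matrices (`rotation` and `symmetric` are names assumed for this example) to contrast that case with a symmetric matrix, whose eigenvalues are always real.

```python
import numpy as np

# A 90-degree rotation leaves no real direction unchanged,
# so its eigenvalues are the conjugate pair +1j and -1j
rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])
rot_vals = np.linalg.eigvals(rotation)
print(rot_vals)

# A symmetric matrix has real eigenvalues; eigvalsh exploits the symmetry
# and returns them as real numbers in ascending order
symmetric = np.array([[2.0, 1.0],
                      [1.0, 2.0]])
sym_vals = np.linalg.eigvalsh(symmetric)
print(sym_vals)                      # [1. 3.]
```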


### eigenvalue decomposition

Eigenvalue decomposition factors a square matrix into its eigenvectors and eigenvalues. For a diagonalizable square matrix A (one with a full set of linearly independent eigenvectors), the decomposition is:

$A = V \Lambda V^{-1}$

*   **V**: A matrix whose columns are the eigenvectors of A.
*   **$\Lambda$ (Lambda)**: A diagonal matrix where the diagonal elements are the corresponding eigenvalues.
*   **$V^{-1}$**: The inverse of the matrix V.

This decomposition is useful for understanding the properties of the matrix and for solving various linear algebra problems.

In NumPy, you can compute the eigenvalue decomposition of a square matrix using `np.linalg.eig(a)`, where `a` is the input matrix. It returns two outputs:
*   The eigenvalues (as a 1D array).
*   The eigenvectors (as a matrix where each column is an eigenvector).

In [250]:
arr = np.random.rand(5,5)

w, v = np.linalg.eig(arr)
print(w)    # eigenvalues
print(v)    # eigenvectors (one per column)

[ 1.93939308+0.j          0.71534386+0.j         -0.40792043+0.17018514j
 -0.40792043-0.17018514j  0.08494769+0.j        ]
[[-0.40674833+0.j         -0.24569904+0.j         -0.06657798-0.38282898j
  -0.06657798+0.38282898j -0.28599678+0.j        ]
 [-0.59256114+0.j         -0.69398076+0.j         -0.29778906+0.2309838j
  -0.29778906-0.2309838j   0.64676561+0.j        ]
 [-0.49338132+0.j         -0.03081094+0.j         -0.04405564+0.18710669j
  -0.04405564-0.18710669j -0.56382205+0.j        ]
 [-0.22083971+0.j          0.63192996+0.j         -0.28328216-0.06879262j
  -0.28328216+0.06879262j  0.30686231+0.j        ]
 [-0.43730057+0.j          0.24028712+0.j          0.76488332+0.j
   0.76488332-0.j         -0.29637864+0.j        ]]
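
A quick way to check the output of `np.linalg.eig` is to test the defining equation $Av = \lambda v$ for every column at once and then rebuild the matrix from $V \Lambda V^{-1}$. This is a sketch under assumptions: the seeded generator and the names `a` and `a_rebuilt` are introduced here, and a random matrix is assumed to be diagonalizable (which holds with probability 1).

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator, assumed for reproducibility
a = rng.random((5, 5))

w, v = np.linalg.eig(a)

# Broadcasting v * w scales column i of v by w[i],
# so this tests A v_i = lambda_i v_i for all i together
print(np.allclose(a @ v, v * w))

# Rebuild A from V, diag(w), and the inverse of V
a_rebuilt = v @ np.diag(w) @ np.linalg.inv(v)
print(np.allclose(a_rebuilt, a))
```

Even when `w` and `v` are complex, the imaginary parts cancel in the product, so `a_rebuilt` matches the real matrix `a` up to rounding.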


### compute the trace & determinant

*   **Trace**: The trace of a square matrix is the sum of the elements on its main diagonal (from the upper left to the lower right). In NumPy, you can compute the trace using `np.trace()`.
*   **Determinant**: The determinant of a square matrix is a scalar value computed from the elements of the matrix. It tells you whether the matrix is invertible (a matrix is invertible if and only if its determinant is non-zero), and its absolute value is the factor by which the corresponding linear transformation scales area or volume. In NumPy, you can compute the determinant using `np.linalg.det()`.

In [251]:
# np.trace is a top-level NumPy function; NumPy 2.0 also exposes np.linalg.trace
print(np.trace(arr))

1.9238437783455455


In [252]:
print(np.linalg.det(arr))

0.023023560596183268
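
Trace and determinant are tied to the eigenvalues: the trace equals their sum and the determinant equals their product. The sketch below checks both identities on a random matrix; the seeded generator and the names `a` and `w` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator, assumed for reproducibility
a = rng.random((5, 5))
w = np.linalg.eigvals(a)            # may be complex; conjugate pairs cancel in sum and product

# The trace is the sum of the main diagonal
print(np.isclose(np.trace(a), np.sum(np.diag(a))))

# Trace equals the sum of the eigenvalues
print(np.isclose(np.trace(a), np.sum(w).real))

# Determinant equals the product of the eigenvalues
print(np.isclose(np.linalg.det(a), np.prod(w).real))
```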


### calculate the inverse/pseudo-inverse of a matrix

*   **Inverse**: For a square matrix A, its inverse, denoted by $A^{-1}$, is a matrix such that when multiplied by A, it yields the identity matrix ($A A^{-1} = A^{-1} A = I$). A matrix has an inverse if and only if its determinant is non-zero (i.e., it is a non-singular matrix). In NumPy, you can compute the inverse using `np.linalg.inv(a)`.

*   **Pseudo-inverse (Moore-Penrose inverse)**: The pseudo-inverse, denoted by $A^+$, is a generalization of the inverse to non-square matrices or singular square matrices. It is particularly useful for solving linear least squares problems. In NumPy, you can compute the pseudo-inverse using `np.linalg.pinv(a)`.

The inverse is used for solving systems of linear equations and other operations where a unique solution exists. The pseudo-inverse is used when dealing with systems that may not have a unique solution or when the matrix is not invertible. For an invertible square matrix the two coincide, which is why both cells below print the same values.

In [258]:
arr = np.random.rand(5,5)

In [259]:
# compute the inverse of a matrix
print(np.linalg.inv(arr))

[[-1.25566857e-01  1.80087707e+00  3.20243801e-03 -1.28693035e+00
   6.93026352e-01]
 [ 3.90968432e-01  3.92376178e+00 -2.64044968e+00 -1.54386159e+00
  -7.04241538e-02]
 [ 5.17252362e-02 -3.21774325e+00  2.39373895e+00  2.01523771e+00
  -9.20733435e-01]
 [ 8.83028741e-01 -2.52588571e-01 -1.71796316e-01 -5.89508299e-01
   7.60084223e-01]
 [-6.19968961e-01 -1.31347899e+00  1.65603200e-01  1.38596522e+00
   3.40965122e-01]]


In [260]:
# compute the pseudo-inverse of a matrix
print(np.linalg.pinv(arr))

[[-1.25566857e-01  1.80087707e+00  3.20243801e-03 -1.28693035e+00
   6.93026352e-01]
 [ 3.90968432e-01  3.92376178e+00 -2.64044968e+00 -1.54386159e+00
  -7.04241538e-02]
 [ 5.17252362e-02 -3.21774325e+00  2.39373895e+00  2.01523771e+00
  -9.20733435e-01]
 [ 8.83028741e-01 -2.52588571e-01 -1.71796316e-01 -5.89508299e-01
   7.60084223e-01]
 [-6.19968961e-01 -1.31347899e+00  1.65603200e-01  1.38596522e+00
   3.40965122e-01]]
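
The identical outputs above can be checked directly, and the pseudo-inverse becomes more interesting once the matrix is not square. The following is a sketch, with the seeded generator and the names `a`, `a_inv`, `tall`, and `tall_pinv` assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator, assumed for reproducibility
a = rng.random((5, 5))

# For an invertible matrix, A @ inv(A) is the identity and pinv agrees with inv
a_inv = np.linalg.inv(a)
print(np.allclose(a @ a_inv, np.eye(5)))
print(np.allclose(np.linalg.pinv(a), a_inv))

# A 5x3 matrix has no inverse (np.linalg.inv would raise LinAlgError),
# but its pseudo-inverse exists and has the transposed shape
tall = rng.random((5, 3))
tall_pinv = np.linalg.pinv(tall)
print(tall_pinv.shape)                               # (3, 5)

# One of the Moore-Penrose conditions: A @ A+ @ A == A
print(np.allclose(tall @ tall_pinv @ tall, tall))
```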


### solve a linear system

You can solve a linear system of equations in the form $Ax = b$, where A is a matrix, x is the vector of unknowns, and b is a vector of constants.

*   **Using `np.linalg.solve(a, b)`**: This function solves the linear system $Ax = b$ for x. It is generally preferred for square, non-singular matrices as it is more efficient and accurate than computing the inverse and multiplying.
*   **Using `np.linalg.lstsq(a, b)`**: This function computes the least-squares solution to a linear system. It is particularly useful when the system is overdetermined (more equations than unknowns) or underdetermined (fewer equations than unknowns), or when the matrix A is singular. It returns the solution vector, residuals, rank of the matrix, and singular values.

In NumPy, these functions provide efficient ways to find solutions to linear systems, which are fundamental in many scientific and engineering applications.

In [261]:
# solve the square system arr @ x = y directly (LU factorization, no explicit inverse)
y = [1,2,3,4,5]
print(np.linalg.solve(arr, y))

[ 1.80320497 -6.21042417  4.25473927  1.30485057  4.49856915]


In [262]:
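# calculate the least-squares solution of a linear system
y = [1,2,3,4,5]
solution, residuals, rank, singular = np.linalg.lstsq(arr, y)
print(solution)     # matches np.linalg.solve because arr is square and full rank
print(residuals)    # empty here: only filled when there are more equations than unknowns
print(rank)         # rank of arr
print(singular)     # singular values of arr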

[ 1.80320497 -6.21042417  4.25473927  1.30485057  4.49856915]
[]
5
[2.51904553 0.88050901 0.79109005 0.6058955  0.1393422 ]
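
The two solvers differ most visibly on an overdetermined system. The sketch below first confirms that `np.linalg.solve` satisfies $Ax = b$, then fits a straight line through four points with `np.linalg.lstsq`. The data points, the seeded generator, and the names `a`, `b`, `t`, `design`, and `coeffs` are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator, assumed for reproducibility
a = rng.random((5, 5))
b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Square system: the solution satisfies A @ x = b
x = np.linalg.solve(a, b)
print(np.allclose(a @ x, b))

# Overdetermined system: fit y = m * t + c to four points (4 equations, 2 unknowns)
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([-1.0, 0.2, 0.9, 2.1])
design = np.column_stack([t, np.ones_like(t)])   # columns: t and a constant term

coeffs, residuals, rank, sing = np.linalg.lstsq(design, y, rcond=None)
print(coeffs)       # slope 1.0 and intercept -0.95 (up to rounding)
print(residuals)    # one value: the sum of squared residuals, about 0.05
print(rank)         # 2
```

Here `residuals` is no longer empty, because the system has more equations than unknowns and the design matrix has full column rank.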
