# Important Numpy Methods and Tricks

---
---|||
### Fancy (Sub) Indexing

This provides a mechanism for selecting *particular rows* of an ndarray, in any order. Initially, I had thought that passing a second index list would also return the specified columns of the selected rows, but the code behaved differently.

For example, say we have the following 2D array (shown row by row):

- [10, 11, 12, 13]
- [20, 21, 22, 23]
- [30, 31, 32, 33]
- [40, 41, 42, 43]

and we want to **select the 4th, 1st, and 3rd row in that order**, i.e.
- [40, 41, 42, 43]
- [10, 11, 12, 13]
- [30, 31, 32, 33]

then we use the following sub-indexing

- **array[[3, 0, 2]]** returns an array containing those rows, in the order specified.

Now, one might be tempted to also select the **first and last columns** of the returned rows by writing **array[[3, 0, 2], [0, 3]]**, expecting:

- [40, 43]
- [10, 13]
- [30, 33]

*But an error is raised instead. With two index lists, NumPy does not apply the second list to the rows returned by the first. It pairs the two lists **element by element**, and each (row, column) pair selects a **single element**. The lists [3, 0, 2] and [0, 3] have lengths 3 and 2, so they cannot be paired (or broadcast together), and NumPy raises an IndexError.*

> Note that the second index list must have the same length as the first, or be broadcastable to it.

As an example, apply the following to the preceding array:

- **array[[3, 0, 2], [0]]**: the single-element list [0] is broadcast, so this is equivalent to
  - **array[[3, 0, 2], [0, 0, 0]]**

The index pairs are (3, 0), (0, 0) and (2, 0), so the result:

- picks the **4th, 1st, and 3rd rows of the source array**
- and takes the **first element (index 0)** of each of those rows

This returns [40, 10, 30].


> The takeaway: several index lists are zipped into coordinate tuples, and each tuple selects one element; they do not select a sub-block. To get a sub-block, index in two steps (array[[3, 0, 2]][:, [0, 3]]).

> Note: if the first index is a slice (a range such as 1:3) rather than a list, the rows stay together as a block, and a list in the second position then selects those columns from every row of the block.
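
The behaviour described above can be checked with a minimal sketch on the same 4 x 4 array. The two-step form and np.ix_ (a NumPy helper not used elsewhere in these notes) are the usual ways to get a sub-block rather than paired elements; treat this as an illustration, not the only approach.

```python
import numpy as np

arr = np.array([[10, 11, 12, 13],
                [20, 21, 22, 23],
                [30, 31, 32, 33],
                [40, 41, 42, 43]])

# one index list: whole rows, in the order given
rows = arr[[3, 0, 2]]

# two index lists are paired element-wise: (3, 0), (0, 0), (2, 0)
firsts = arr[[3, 0, 2], [0]]
print(firsts)  # [40 10 30]

# lists of lengths 3 and 2 cannot be paired or broadcast
try:
    arr[[3, 0, 2], [0, 3]]
    raised = False
except IndexError:
    raised = True

# a sub-block: select the rows first, then the columns of that result ...
block = arr[[3, 0, 2]][:, [0, 3]]
# ... or let np.ix_ build row and column indices that broadcast into a grid
block_ix = arr[np.ix_([3, 0, 2], [0, 3])]
print(block)

# with a slice on the first axis, the list picks columns from every row of the slice
sliced = arr[1:3, [0, 3]]
print(sliced)
```
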

---

---
---|||

### **nonzero** method

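Found accidentally while trying to **find the indices of the non-zero elements in an array**: np.nonzero(arr) (or arr.nonzero()) returns exactly that.

> *It also works for higher dimensions: it returns a tuple with one index array per axis, so the i-th non-zero element is located at (rows[i], cols[i], ...).*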
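
A small sketch of np.nonzero on 1D and 2D inputs (the arrays are made up for illustration). For the 2D array it returns one index array per axis:

```python
import numpy as np

a = np.array([0, 3, 0, 5])
idx = np.nonzero(a)            # a tuple with one index array per axis
print(idx[0])                  # [1 3]

b = np.array([[0, 7, 0],
              [4, 0, 9]])
rows, cols = np.nonzero(b)     # row indices and column indices of the non-zeros
print(rows, cols)              # [0 1 1] [1 0 2]
print(b[rows, cols])           # the non-zero values themselves: [7 4 9]
```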
---

---|||
### **eye** and **identity** method

Curiously, they are similar, except that *eye* is more configurable than *identity*. *identity(n)* always returns an n x n **square** identity matrix, whereas *eye(N, M, k)* can return a **non-square matrix** (N rows, M columns) and can put the ones on a diagonal other than the main one (offset k).
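
A quick comparison, using the default float dtype of both functions:

```python
import numpy as np

ident = np.identity(3)          # always square: 3 x 3
square = np.eye(3)              # same as np.identity(3)
wide = np.eye(2, 4)             # N=2 rows, M=4 columns: non-square
below = np.eye(4, k=-1)         # ones on the diagonal just below the main one

print(ident, wide, below, sep="\n\n")
```
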

---

---|||
### **randint**, **random**, **randn**, and **rand** methods

They initialise an ndarray of a given size with random values

##### **randint(low, high, size)**: 

Used for creating random integers between a **low** (inclusive) value and a **high** (exclusive) value. The **size argument is optional; it defaults to None, which returns a single value** rather than an array.

Note: the size can be specified as the shape of the ndarray to be returned, and the array will be populated with the random integer values

---

##### **random**: 

This is used for creating random values between [0.0, 1.0)

- **the values are floats**, not integers.
- unlike randint, its only argument is the size: the shape of the output array, given as an int or a tuple

---

##### **randn**: Normally Distributed Random Values (N(mean = 0, std = 1))

This is similar to **randint**, except that:

- **the values are floats**, not integers.
- the returned values are **normally distributed**: they are drawn from a normal distribution with **mean = 0** and **standard deviation = 1**.
- the shape is passed as separate integers, e.g. randn(2, 3), not as a tuple.

---

##### **rand**: Uniformly Distributed Random Values (U(a = 0, b = 1))

This is similar to **randint**, except that:

- the values lie in [0, 1) (1 is excluded)
- **the values are floats**, not integers.
- the returned values are **uniformly distributed** on [0, 1), i.e. **a = 0** and **b = 1**.
- like randn, the shape is passed as separate integers, e.g. rand(2, 3); this gives the same kind of result as random((2, 3)).
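
A sketch of the four legacy np.random functions side by side; the seed value 0 is arbitrary and only there to make reruns repeatable. Note the different ways the shape is passed:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so reruns give the same numbers

single = np.random.randint(1, 7)               # size omitted: one int in [1, 7)
ints = np.random.randint(1, 7, size=(2, 3))    # 2 x 3 array of ints in [1, 7)
unif = np.random.random((2, 3))                # size as ONE tuple; floats in [0, 1)
unif2 = np.random.rand(2, 3)                   # same distribution, shape as separate ints
norm = np.random.randn(2, 3)                   # standard normal, shape as separate ints

print(single, ints, unif, unif2, norm, sep="\n\n")
```
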

---
---|||

### **diagonal** and **diag** methods

They are related but not identical, and neither of them modifies an array in place:

- **np.diag(v, k=0)**: if v is 2D, it extracts the k-th diagonal; if v is 1D, it constructs a square matrix with the data of v on the k-th diagonal and zeros elsewhere (for n values, the matrix is (n + |k|) x (n + |k|)).
- **np.diagonal(a, offset=0)**: only extracts a diagonal (a must be at least 2D) and returns a read-only view.

**The value of k (called offset in diagonal; default 0) determines which diagonal is used**;

- if k > 0: above the main diagonal
- if k < 0: below the main diagonal

I had to construct a (5, 5) matrix and replace diagonal elements just below the main diagonal with new data. I concocted a rather convoluted solution using the *eye* method by:

- making the diagonal k = -1, just below the main diagonal
- replacing the elements there by the given data using sub-indexing

It turns out I could do this in a single step with the diag method: np.diag(data, k=-1) builds the (5, 5) matrix directly from four data values.
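
A sketch of the single-step version, assuming four hypothetical data values so that the result is 5 x 5, compared with the longer eye-plus-sub-indexing route:

```python
import numpy as np

data = np.array([1, 2, 3, 4])   # hypothetical replacement values

# 1D input: build a (4 + 1) x (4 + 1) matrix with data on the diagonal below the main one
m = np.diag(data, k=-1)
print(m)

# 2D input: diag extracts the diagonal instead; diagonal() can only extract
print(np.diag(m, k=-1))            # [1 2 3 4]
view = np.diagonal(m, offset=-1)   # a read-only view, not an in-place operation

# the longer route: eye for the positions, then sub-indexing to write the data
m2 = np.eye(5, k=-1, dtype=int)
r, c = np.nonzero(m2)
m2[r, c] = data
```
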

---



In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import scipy
from matplotlib import pyplot as plt

### NDARRAY: Multidimensional Array Object


In [None]:
# generate random data filling 2 rows and three columns
data = np.random.randn(2, 3)
data

In [None]:
# multiply all the elements in the array by 100
data * 100

In [None]:
# get the shape of the array
data.shape

In [None]:
# get the type of data in the array
data.dtype

## Creating ndarray

In [None]:
# Creating ndarrays
data1 = [1, 2, 3, 4, 5.1, 6.2]
arr1 = np.array(data1)

In [None]:
arr1

In [None]:
data2 = ['c', 'b', 'a']
arr2 = np.array(data2)

In [None]:
arr2

In [None]:
data3 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr3 = np.array(data3)

In [None]:
print(arr3.shape)
print(arr3.dtype)
arr3

In [None]:
# get the dimension of the array
print(arr1.ndim)
print(arr3.ndim)

In [None]:
# create arrays filled with zeros, given a specific length or shape 

# an array of 10 zeroes in length (10,)
a = np.zeros(10)

# an array of a given shape (3, 4), containing zeroes
b = np.zeros((3, 4))

print('a', a)
print()
print('b' , b)

In [None]:
# creating an array of ones
a = np.ones(10)
b = np.ones((3, 4))

print('a', a)
print()
print('b', b)
b.ndim

In [None]:
# using empty to initialise an array
np.empty((3, 4,))

### NOTE
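np.empty allocates the array without initialising it, so its contents are whatever values happened to be in that memory (garbage). They may look random, but they are not random numbers; only use empty when every element will be overwritten afterwards.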

---

## Arange 

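Creates an ndarray of evenly spaced values from start up to, but excluding, stop (with an optional step), just like Python's range function, except that it returns an array and also accepts floats.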

In [None]:
# create an array of elements starting from 10 to 100, in steps of 2
np.arange(10, 101, 2,)

## Linspace 

Works similarly to *arange*, but the third parameter is the number of elements to return between *start* and *stop*, with both ends included (by default).

> The elements returned are equally spaced
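
A minimal comparison of the two (values chosen so the floats are exact): the main difference is whether the stop value is included and whether you give a step or a count.

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)                # step given, stop excluded: 0, 0.25, 0.5, 0.75
counted = np.linspace(0, 1, 5)                 # count given, stop included: 0, 0.25, ..., 1
no_end = np.linspace(0, 1, 4, endpoint=False)  # drop the stop value: matches arange above
spaced = np.linspace(1, 100, 10)               # spacing is (100 - 1) / (10 - 1) = 11

print(stepped, counted, no_end, spaced, sep="\n")
```
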

In [None]:
np.linspace(1, 100, 10)

In [None]:
np.asarray(np.arange(10))

## ones_like and zeros_like

The *ones_like* and *zeros_like* methods take an array (or an array-like, such as a list) and return a **new** array with the same shape and dtype, filled with ones (ones_like) or zeros (zeros_like). The input is not modified.

In [None]:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# create an array from the data 
arr1 = np.array(data)

# using the asarray method. note that arr1 == arr2
arr2 = np.asarray(data)
print('arr1', arr1)
print("")
print('arr2', arr2)

# a new array of ones with the same shape and dtype as arr1
a = np.ones_like(arr1)
print('arr1 ones_like: ', a)

# a new array of zeros with the same shape and dtype as arr2
b = np.zeros_like(arr2)
print('arr2 zeros_like: ', b)

# instead of converting to an ndarray before passing to the method,
# we pass the list directly
np.ones_like(data)

## full and full_like method

#### full
Takes a shape and a fill value, and returns a new array of that shape filled with the value (the value can also be an array that broadcasts to the shape).

#### full_like
Takes another array and returns a new array of the same shape filled with the specified value; note that the dtype of the passed array is kept, so e.g. full_like(int_array, 0.7) gives zeros.

In [None]:
 
# the fill value [2, 4, 5] is broadcast to every row of the (2, 3) array
a = np.full((2, 3), [2, 4, 5])

a

In [None]:
np.full_like(a, 100)

In [None]:
np.full_like(a, [30, 90, 270])

## Type Casting

The *astype* method converts (if possible) an ndarray to another dtype. It returns a new array (a copy); the original array is unchanged.

In [None]:
# create a 3 x 3 ndarray of type int32 and convert it to a unicode string array
# (np.str_; the old alias np.unicode_ was removed in NumPy 2.0)
np.full((3, 3), 4, dtype='int32').astype(np.str_)

In [None]:
# create a random 3 x 3 ndarray of float32

arr1 = np.random.random((3, 3)).astype('float32')
print(arr1.dtype)
print('arr1: ', arr1)

# cast to an int type; the fractional part is truncated, so values in [0, 1) all become 0
arr1.astype('int32')

---


# Arithmetic with ndarrays

Arithmetic operations are performed on ndarrays of equal size without writing a for loop. This is known as **batch processing** or **vectorization**

With a single array (and a scalar), the operation is applied to every element; with two arrays of the same shape, it is applied element-wise to each pair of corresponding elements.

In [None]:
# array
# arange
# asarray
# ones, ones_like
# zeros, zeros_like
# linspace
# astype
# full, full_like

In [None]:
arr1 = np.asarray([[1, 2, 3], [4, 5, 6]], dtype='float64')
arr1

In [None]:
arr2 = np.asarray([[1, 5, 3], [-5, 4, 1]])
arr2

In [None]:
(1/ arr2) ** 3


# Broadcasting

In plain vectorization, the arrays must have the same shape. With broadcasting, however, arrays of different (but compatible) shapes can be combined.


Broadcasting Operations

- multiplication of an ndarray with a scalar value


More on this at the bottom

---
---

# Slicing and Indexing an Ndarray



In [None]:
# create an ndarray with elements 0 - 10

arr = np.arange(11)
arr

In [None]:
# select the 5th element
print(arr[4])

# select the 2nd, 4th, and 6th element
print(arr[1:7:2])

# select the 4th to 10th elements (indices 3 to 9)
print(arr[3:10])

In [None]:
# replacing through broadcasting
arr

In [None]:
arr[4:9] = 12
arr

> ### Any modification on slices will be reflected on the source array, because slices are views on the source array

The result of slicing an ndarray is a view on the source array and not a copy of data from the source array

In [None]:
arr

In [None]:
# create a slice of the 5th to 10th elements (indices 4 to 9)
arr_slice = arr[4:10]
print('slice', arr_slice)

# modify all the element in the slice to 101
arr_slice[:] = 101
print('modified slice: ', arr_slice)

# we notice immediately that the source ndarray is modified simultaneously
print('source ndarray: ', arr)

print("")
print("")
# copy a slice which upon modification will not reflect on the source
arr_slice_copied = arr[1:6].copy()
print('slice copied', arr_slice_copied)

# modify the copied slice
arr_slice_copied[:] = 230
print('modified copied slice', arr_slice_copied)

# check the source that it is not modified
print('source: ', arr)

# Slicing and Indexing in Multidimensional Arrays

Indexing an array of two or more dimensions with a single integer returns an array with one dimension fewer; e.g. indexing a 2D array returns a 1D row.

In [None]:
arr = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('source ndarray: ')
print(arr)

In [None]:
# display the source array
arr

In [None]:
arr[0:3,2]

## Indexing using comma-separated values for >= 2D ndarrays

The indices are applied from the *outside* (axis 0) --> *inside* (the last axis): arr[i, j] selects the same element as arr[i][j]

In [None]:
arr[1] = [4, 5, 6]
arr

In [None]:
# select all rows first, then the third column (index 2) of each
arr[:, 2]

In [None]:
# working with three dimensional data

arr = np.asarray([ [[1, 2, 3, 4], [4, 5, 6, 40], [10, 10, 10, 10]] , [[2, 2, 2, 2], [7, 8, 9, 160], [10, 11, 12, 524]] ])

print('dimension: ', arr.ndim)
print('shape: ', arr.shape)

print("")
arr

In [None]:
# from every outer block, take the first row and its first three elements
arr[:, 0, :3]

---

# Boolean Indexing



In [None]:
# names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7, 3, 4)
data

## Note: a multi-dimensional array is indexed from the outside to the inner elements

In [None]:
# select only the outer (3, 4) blocks whose top-left element (row 0, column 0) is less than 0

# first we create the boolean mask: one value per outer block, shape (7,)
cond = data[:, 0, 0] < 0

# keep the selected blocks, take their first column, then the first entry of that column
# (i.e. the negative top-left elements themselves)
data[cond, :, 0][:, 0]


In [None]:
# select, for each outer block, the elements of its first row that are less than 0

# Note: each element of the (7, 3, 4) array is a 2D array of 3 rows and 4 columns

# data[:, 0] takes the first row of each of the 7 blocks ==> a (7, 4) array

# comparing it with 0 gives a (7, 4) boolean mask, True where the element is negative

cond2 = data[:, 0] < 0
cond2

In [None]:
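# first method: loop over the blocks and apply each row of the mask separately
for i, cond in enumerate(cond2):
    print(data[i, 0][cond])
    print()

# second method: apply the whole (7, 4) mask at once; this returns the same
# elements, concatenated into a single 1D array
data[:, 0][cond2]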


## !NOTE

When indexing with a boolean mask, the mask's shape must match the leading dimensions of the array (or slice) it is applied to; otherwise NumPy raises an IndexError.
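
A sketch of that rule on a small (2, 3, 4) array (values made up for illustration): the mask's shape has to line up with the first axes of whatever it indexes.

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)

mask_outer = np.array([True, False])     # shape (2,): matches axis 0
print(data[mask_outer].shape)            # (1, 3, 4)

mask_rows = data[:, :, 0] > 10           # shape (2, 3): matches axes 0 and 1
print(data[mask_rows].shape)             # (3, 4): one row of 4 per True entry

mask_all = data % 5 == 0                 # same shape as data
print(data[mask_all])                    # 1D result: 0, 5, 10, 15, 20

try:
    data[np.array([True, False, True])]  # length 3 vs axis 0 of length 2
    raised = False
except IndexError:
    raised = True
```
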

--- 

In [None]:
arr = np.empty((4, 4))
arr

In [None]:
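# the index lists pair up into (0, 0), (1, 1), (2, 2), (3, 3): the main diagonal
# (its values are whatever np.empty left in memory)
arr[[0, 1, 2, 3], [0, 1, 2, 3]]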

In [None]:
arr = np.arange(45).reshape(3, 3, 5)
arr

# Fancy Indexing

This allows the selection of the rows of an ndarray, in a specific order, by passing a list with the index of the rows to be selected.

> However this indexing does not apply *in the same manner* to the next inner indexing

If multiple index lists are passed in, they are paired element-wise into tuples of indices, and the result is a one-dimensional array containing the element at each tuple.

In [None]:
# three index lists, one per axis of the (3, 3, 5) ndarray, select three single elements

# the lists are paired into the index tuples (0, 1, 4), (2, 0, 3), (1, 2, 1) ==>
# 	select the first(0) outer block, second(1) row, and 5th(4) element
# 	select the third(2) outer block, first(0) row, and 4th(3) element
# 	select the second(1) outer block, third(2) row, and 2nd(1) element
# result: [9, 33, 26]
arr[[0, 2, 1], [1, 0, 2], [4, 3, 1]]

In [None]:
# with the 2d-array
arr = np.arange(20).reshape(4, 5)
arr

In [None]:
# select rows 0, 2 and 1; the single column index [1] is broadcast to [1, 1, 1],
# so the second element of each of those rows is returned: [1, 11, 6]
arr[[0, 2, 1], [1]]

In [None]:
# 4d-array

arr = np.arange(80).reshape(2, 2, 4, 5)
arr


In [None]:
# four index lists (one per axis) are needed to select single elements of a 4D array;
# here they pair up into (0, 1, 1, 3) and (1, 0, 3, 4) ==> [28, 59]
arr[[0, 1], [1, 0], [1, 3], [3, 4]]

---

# Transposing Arrays and Swapping Axes

This returns a view of the underlying data without copying the source array

In [None]:
# create a (3, 5) array
arr = np.arange(15).reshape((3, 5))

arr

In [None]:
# transpose it to a (5, 3) array

arr_T = arr.T # or arr.transpose()

arr_T

In [None]:
# we modify the transposed array and confirm that the source array is also modified
# thus indicating that the transposed array is a view

arr_T[2] = 1000

print('modified arr_T')
print(arr_T)

print("source array:")
print(arr)

# Matrix multiplication and the Transpose of an ndarray

> To multiply two matrices, the number of columns of the first must match the number of rows of the second. Hence a non-square matrix (m x n, where m != n) cannot be multiplied by itself; instead, we can multiply it by its transpose (n x m), e.g. X.T times X gives an n x n matrix.

> [!NOTE]
> Element-wise multiplication uses the multiplication operator; matrix multiplication (not element-wise) is done with the *dot* method (or the @ operator).
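
A minimal sketch of the shape rules, using a small made-up 2 x 3 matrix:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # a 2 x 3, non-square matrix

gram = A.T @ A                   # (3 x 2) @ (2 x 3) -> 3 x 3
outer = np.dot(A, A.T)           # (2 x 3) @ (3 x 2) -> 2 x 2
elementwise = A * A              # same shape as A: squares each element

try:
    A @ A                        # (2 x 3) @ (2 x 3): inner sizes 3 and 2 differ
    raised = False
except ValueError:
    raised = True

print(gram.shape, outer.shape, elementwise.shape)
```
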



In [None]:
arr = np.random.randn(10, 5)
arr

In [None]:
arr_mul = np.dot(arr.T, arr)
arr_mul

### Transpose and Higher Dimensional Array

This is similar to re-ordering the axes of the higher dimensional array

In [None]:
arr = np.arange(16).reshape((2, 2, 4))

arr

In [None]:
# make the second(1) axis first, the first(0) axis second, and leave the third axis unchanged
arr.transpose((1, 0, 2))
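
The section title also mentions swapping axes. As a sketch, swapaxes gives the same result as the transpose above when only two axes trade places, and both return views. A non-symmetric shape (2, 3, 4) is used here so the reordering is visible:

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

t = arr.transpose((1, 0, 2))   # new order: old axis 1, old axis 0, old axis 2
s = arr.swapaxes(0, 1)         # swap axes 0 and 1 directly

print(t.shape)                 # (3, 2, 4)
print(t[2, 1, 3], arr[1, 2, 3])  # the first two indices trade places
```
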

---
# Array Functions

A universal function (ufunc) is a function that performs element-wise operations on ndarrays, and returns a scalar or an array of scalar values as its result

- unary ufuncs (one input array)
	- sqrt, exp

- binary ufuncs (two input arrays)
	- add, maximum

- n-ary ufuncs (n input arrays)

In [None]:
arr1d = np.arange(4, 20)
arr1d

In [None]:
# square root of element
np.sqrt(arr1d)

In [None]:
arr2d = np.arange(16).reshape((4, 4))
arr2d

In [None]:
# square root of the 2d elements

np.sqrt(arr2d)

In [None]:
# e raised to the power of each element
print("e^(each array element): ")
np.exp(arr2d)


In [None]:
arr2d

In [None]:
# broadcasting
arr2d + 2

In [None]:
# vectorization
arr2d + arr2d

In [None]:
np.add(arr2d, arr2d)

In [None]:
print(arr1d)
print()
print(arr2d)

In [None]:
arr1d_rshp = arr1d.reshape((4, 4))
arr1d_rshp

#### compare two arrays element-by-element and keep the larger value of each pair

In [None]:
np.maximum(arr2d.T, arr1d_rshp)

In [None]:
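# np.modf takes ONE input array and returns two outputs: the fractional and the
# integral parts of each element (a second positional argument would be treated as
# an output buffer, not as a second input). Dividing by 3 gives non-zero fractions.
remainder, whole = np.modf(arr2d / 3)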

In [None]:
remainder

In [None]:
whole

## !NOTE

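> ufuncs accept an optional out argument (keyword out=, or positionally after the inputs) that tells them where to put the result of the computation, but the out array must:

- have the same shape as the result (for a unary ufunc, the shape of the input),
- have a dtype the result can be cast to under the default casting='same_kind' rule; e.g. the float64 result of sqrt cannot be written into an integer array.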
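
A sketch of both rules using the out keyword (the arrays are illustrative). The wording of the error message varies between NumPy versions, so only the exception type is relied on:

```python
import numpy as np

x = np.arange(1, 5)                    # integer input
out = np.empty(4, dtype=np.float64)    # same shape as the result, float dtype

result = np.sqrt(x, out=out)           # the result is written into out ...
print(result is out)                   # ... and out itself is returned: True

bad = np.empty(4, dtype=np.int64)
try:
    np.sqrt(x, out=bad)                # float64 results cannot be cast to int64
    raised = False                     # under the default casting='same_kind'
except TypeError:
    raised = True
```
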

In [None]:
# create two array of the same size
arr1 = np.arange(1, 13)
arr2 = np.arange(30 ,42).astype('float64')
print(arr1.size == arr2.size)

In [None]:
arr1

In [None]:
arr2

In [None]:
# take the sqrt of arr1 and store the result in arr2

# the result of sqrt on arr1 is float64, so arr2 must be able to hold float64 values.
# Calling arr2.astype('float64') on its own would not work, because astype returns a
# copy instead of modifying arr2 in place; that is why arr2 is created as float64 above
np.sqrt(arr1, out=arr2)

In [None]:
# check the elements of arr2 and see that it has been modified
arr2

In [None]:
# arr1 remains the same
arr1


In [None]:
# pass arr1 as the output as well, so the square root is computed in place

# arr1 holds ints, so we first rebind it to a float64 copy (astype returns a new array)
arr1 = arr1.astype('float64')
np.sqrt(arr1, out=arr1)

In [None]:
arr1

In [None]:
arr = np.arange(20, 26)
arr

In [None]:
# natural logarithm of (1 + x) for each element; log1p is more accurate than np.log(1 + x) when x is very small
np.log1p(arr)

# Array-Oriented Programming

In this style, the objects we manipulate, process, and store are NumPy arrays. The advantage of NumPy arrays over Python's native lists here is batch processing (vectorization and broadcasting), which lets us avoid writing explicit for loops.

Besides saving us from writing for loops, vectorized code usually runs much faster than the equivalent Python loop, because the looping happens in compiled code.
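
A minimal sketch contrasting the two styles on made-up data; both produce the same values. Timings depend on the machine, so rather than quoting numbers, compare them yourself with %timeit in a notebook cell:

```python
import numpy as np

xs = np.linspace(-1, 1, 1000)

# explicit Python loop, one element at a time
looped = np.empty_like(xs)
for i, x in enumerate(xs):
    looped[i] = np.sqrt(x ** 2 + 1)

# array-oriented version: one expression, the loop runs inside NumPy
vectorized = np.sqrt(xs ** 2 + 1)

print(np.allclose(looped, vectorized))  # True
```
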

In [None]:
arr = np.arange(-1, 1, 0.001)
arr

#### Meshgrid Function

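This takes two 1D arrays and returns two 2D arrays (X, Y) that together enumerate all the pairs: (X[i, j], Y[i, j]) is one (x, y) pair.

With the default indexing='xy', each of the two matrices has shape (len(second_array), len(first_array)), so its size = len(first_array) * len(second_array)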
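
A sketch with small made-up inputs, showing the shapes and how the pairs line up; indexing='ij' (a meshgrid option not used in these notes) swaps the roles of rows and columns:

```python
import numpy as np

xs = np.array([1, 2, 3])
ys = np.array([10, 20])

X, Y = np.meshgrid(xs, ys)   # default indexing='xy'
print(X.shape)               # (2, 3): len(ys) rows, len(xs) columns
print(X)                     # every row is a copy of xs
print(Y)                     # every column is a copy of ys

# reading X and Y together position by position gives all the (x, y) pairs
pairs = np.stack([X.ravel(), Y.ravel()], axis=1)
print(pairs)

Xij, Yij = np.meshgrid(xs, ys, indexing="ij")
print(Xij.shape)             # (3, 2)
```
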

In [None]:
arr1 = np.arange(1, 10)
arr2 = np.arange(10, 13)

x1, y1 = np.meshgrid(arr1, arr2)

In [None]:
x1

In [None]:
y1

In [None]:
z1 = np.sqrt(x1 **2 + y1 **2)

In [None]:
x2, y2 = np.meshgrid(arr, arr)

In [None]:
z2 = np.sqrt(x2 ** 2 + y2 **2)

In [None]:
plt.imshow(z1, cmap=plt.cm.gray)
plt.colorbar()

In [None]:
plt.imshow(z2, cmap=plt.colormaps['Reds_r'])
plt.colorbar()

## Where: Conditional Logic as Array Expressions

The where method is similar to Python's ternary expression *x if condition else y*, applied element by element:

```python
np.where(cond, first_array, second_array)
```

That is, choose the element of first_array where the condition is true, otherwise choose the element of second_array.

> - Note:
> - the second and third arguments need not be arrays, as they can be scalar values
> - the result has the broadcast shape of the three arguments (so it has the shape of the condition when the other two broadcast to it)

---
My first line of reasoning was that all the arguments to the *where* method must have the same dimensions.

Turns out this is false. My next guess was that **the *size* of the three arguments must be the same**, because the where method, intuitively I supposed, flattens out all the arrays passed as arguments and then does its comparison (i.e. ternary operation) element-wise.

> Turns out that this is also not true; it depends on **broadcasting**: the three arguments must have broadcast-compatible shapes, and arrays of the same size but incompatible shapes raise an error.

> The order of the conditional elements matters, hence one must be careful when setting up the conditional argument
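
A sketch of broadcasting inside np.where (arrays made up for illustration): a 1D condition is applied to every row, while arrays of the same size but incompatible shapes are rejected:

```python
import numpy as np

x = np.arange(6).reshape(3, 2)    # shape (3, 2)
cond = np.array([True, False])    # shape (2,): broadcast across the 3 rows

picked = np.where(cond, x, -1)    # keep column 0, put -1 in column 1
print(picked.tolist())            # [[0, -1], [2, -1], [4, -1]]

# same size (8 elements each) but incompatible shapes: (4, 2) vs (2, 2, 2)
try:
    np.where(np.ones((4, 2), dtype=bool), np.zeros((2, 2, 2)), 0)
    raised = False
except ValueError:
    raised = True
```
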

In [None]:
arr1 = np.arange(1.1, 1.9, .1)
arr2 = np.arange(2.1, 2.9, .1)

cond = np.array([True, False, True, False, False, True, True, False])
arr1, arr2

In [None]:
arr1 = arr1.reshape(2, 2, 2)
# arr1.flatten()
arr1

In [None]:
arr2 = arr2.reshape((4, 2))
# arr2.flatten()
arr2

In [None]:
# reshape cond to (4, 2) so that it matches arr2; the scalar 2 is broadcast to every position
cond = cond.reshape(4, 2)
result = np.where(cond, 2, arr2)
result

## Mathematical and Statistical Methods

NumPy comes with predefined statistical functions

Many of these functions take an optional axis argument that determines the axis along which the operation is carried out (the axis that gets collapsed)
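
A sketch of what the axis argument collapses, on a small array whose sums are easy to check by hand:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

col_sums = arr.sum(axis=0)    # rows collapsed: one value per column -> 5, 7, 9
row_sums = arr.sum(axis=1)    # columns collapsed: one value per row -> 6, 15
row_means = arr.mean(axis=1)  # 2.0, 5.0
total = arr.sum()             # no axis: every element -> 21

print(col_sums, row_sums, row_means, total)
```
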

In [None]:
arr = np.random.randn(5, 4)
arr

In [None]:
arr.mean(axis=1)

In [None]:
arr.mean(axis=0)

In [None]:
# means of the first two columns; these match the first two entries of arr.mean(axis=0)
arr[:, 0].mean(), arr[:, 1].mean()

In [None]:
# same as setting axis=1
arr.std(1)

### cumsum and cumprod

Respectively compute the cumulative (running) sum or product along the specified axis. Without an axis, the array is flattened first.
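
A small sketch with values that are easy to verify by hand:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
running_sum = a.cumsum()      # 1, 3, 6, 10
running_prod = a.cumprod()    # 1, 2, 6, 24

m = np.array([[1, 2, 3],
              [4, 5, 6]])
down = m.cumsum(axis=0)       # down each column: [[1, 2, 3], [5, 7, 9]]
across = m.cumsum(axis=1)     # along each row:   [[1, 3, 6], [4, 9, 15]]
flat = m.cumsum()             # no axis: flattened first -> 1, 3, 6, 10, 15, 21

print(running_sum, running_prod, down, across, flat, sep="\n")
```
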


In [None]:
arr = np.arange(48).reshape((4 , 4, 3))
arr

In [None]:
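# only the last expression of a cell is displayed, so print the cumulative sum explicitly
print(arr.cumsum(axis=2))

# index of the largest element along the last axis
arr.argmax(2)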

### any and all

These test the truthiness of the elements, typically of a boolean array produced by a condition (non-zero values count as True):

- any: returns True if at least one element is True; if none is, it returns False

- all: returns False if at least one element is False; otherwise it returns True

> We can also check this along a specified axis, which gives one result per remaining position
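
A minimal sketch on a made-up array; the comparison produces the boolean array that any and all then test:

```python
import numpy as np

arr = np.array([[1, -2, 3],
                [4, 5, 6]])

has_negative = (arr < 0).any()        # True: at least one element is negative
all_negative = (arr < 0).all()        # False: not every element is negative
rows_all_pos = (arr > 0).all(axis=1)  # one result per row: False, True
cols_any_pos = (arr > 0).any(axis=0)  # one result per column: True, True, True

# on non-boolean arrays, non-zero values count as True
nonzero_any = np.array([0, 0, 2]).any()   # True

print(has_negative, all_negative, rows_all_pos, cols_any_pos, nonzero_any)
```
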

---

### sort

##### NOTE!!!

- np.sort(array): This returns a copy

- array.sort(): this sorts the array in-place

> we can pass an optional axis parameter to indicate the axis along which the sorting occurs (default: the last axis, axis=-1)
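
A sketch showing the copy versus in-place difference (the array is made up):

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8]])
original = arr.copy()

s = np.sort(arr)          # sorted copy; default axis=-1 sorts each row
print(np.array_equal(arr, original))   # True: arr is untouched

cols = np.sort(np.array([[3, 1], [1, 2]]), axis=0)   # sort down each column

ret = arr.sort()          # sorts arr itself and returns None
print(ret, arr.tolist())  # None [[1, 2, 3], [7, 8, 9]]
```
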


In [None]:
cond = np.array([True, True, False, True, False, True, False, False])
cond

In [None]:
x = cond.reshape((2, 2, 2))
x

In [None]:
x.all(axis=2)

---

# File Input and Output with Arrays

This involves loading (and saving) data from (and to) disk either in text or binary format

In [None]:
arr = np.arange(10)
arr

#### using save and load for a single ndarray

The file extension here is ***.npy***

In [None]:
# save it to disk
np.save("array", arr)

In [None]:
x = np.load('array.npy')

In [None]:
x

#### using the **savez** to store multiple arrays on disk

Unlike its **save** counterpart, its extension is ***.npz***

Also, when it is loaded from disk, instead of getting an array, a dictionary-like object is returned. Its keys are the keyword names used when saving; arrays passed positionally get the keys arr_0, arr_1, and so on, numbered in the order they were passed in.

> the index starts from 0
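
A round-trip sketch that writes into a temporary directory (tempfile is used here only so the example leaves no files behind; the file names are hypothetical). np.load returns an NpzFile, which can be used as a context manager so the file is closed afterwards:

```python
import os
import tempfile

import numpy as np

arr1 = np.arange(12).reshape(3, 4)
arr2 = np.arange(5)

with tempfile.TemporaryDirectory() as tmp:
    # keyword arguments become the keys
    named_path = os.path.join(tmp, "named.npz")
    np.savez(named_path, a=arr1, c=arr2)

    # positional arguments get the default keys arr_0, arr_1, ...
    plain_path = os.path.join(tmp, "plain.npz")
    np.savez(plain_path, arr1, arr2)

    with np.load(named_path) as named:
        named_keys = sorted(named.files)
        a_back = named["a"]

    with np.load(plain_path) as plain:
        plain_keys = sorted(plain.files)
        first_back = plain["arr_0"]

print(named_keys)   # ['a', 'c']
print(plain_keys)   # ['arr_0', 'arr_1']
```
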

In [None]:
arr1 = np.arange(48).reshape((4, 4, 3))
arr2 = np.arange(10)

In [None]:
# save the two arrays

# here the keys are explicitly passed in as (a, c)

# np.savez('multiple_array', a = arr1, c = arr2)

In [None]:
# arr = np.load('multiple_array.npz' )
arr

In [None]:
# arr['a']

In [None]:
# arr['c']

In [None]:
# save the same two arrays without passing in the keys

# np.savez('2_array', arr1, arr2)

In [None]:
# load the newly saved arrays

# np.load('2_array.npz')

In [None]:
# save as a  compressed data

# np.savez_compressed('2_array_comp', j=arr1, k=arr2)

In [None]:
# arr = np.load("2_array_comp.npz")

In [None]:
# arr['j']

---

# Linear Algebra Operations

1. Matrix Multiplication
2. Matrix Decompositions
3. Determinants, etc.
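
A sketch of a few np.linalg routines on a small made-up 2 x 2 system whose answers can be checked by hand (det = 2*3 - 1*1 = 5; x = 0.8, y = 1.4 solves 2x + y = 3 and x + 3y = 5):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

prod = A @ np.array([1.0, 1.0])   # matrix-vector product: 3, 4
det = np.linalg.det(A)            # 2*3 - 1*1 = 5, up to rounding
x = np.linalg.solve(A, b)         # solves A @ x = b: approximately 0.8, 1.4

q, r = np.linalg.qr(A)            # decomposition A = q @ r
print(prod, det, x, sep="\n")
print(np.allclose(q @ r, A))            # True
print(np.allclose(q.T @ q, np.eye(2)))  # True: q has orthonormal columns
```
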

In [None]:
x = np.ones(4)  # shape (4,)
y = np.array([[1, 2, 3, 5], [3, 4, 5, 100], [4, 5, 9, 1], [10, 11, 12, 0]])  # shape (4, 4)

In [None]:
np.dot(y, x) == y @ x
# x @ y

In [None]:
X = np.random.randn(5, 5)
X

In [None]:
mat = X.T.dot(X)
mat

In [None]:
# a matrix multiplied by its inverse gives (approximately, due to rounding) the identity matrix
mat.dot(np.linalg.inv(mat))

In [None]:
q, r = np.linalg.qr(mat) 

In [None]:
q

In [None]:
r

### Pseudorandom Number Generation



In [None]:
rng = np.random.RandomState(1234)

In [None]:
# use the seeded generator created above rather than the global np.random state
rng.permutation([1, 2, 3])
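
A sketch of why the seed matters: two generators created with the same seed produce the same sequence. It uses np.random.default_rng (the newer Generator API, which these notes do not otherwise use) next to the RandomState from the cell above; the seed 1234 is simply reused from there.

```python
import numpy as np

# newer Generator API: same seed -> same sequence
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)
perm_a = rng_a.permutation([1, 2, 3])
perm_b = rng_b.permutation([1, 2, 3])

# the legacy RandomState behaves the same way
legacy_a = np.random.RandomState(1234)
legacy_b = np.random.RandomState(1234)
draws_a = legacy_a.randn(3)
draws_b = legacy_b.randn(3)

print(perm_a, perm_b)
print(draws_a, draws_b)
```
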

In [None]:
# is the sum method faster than cumsum??

# 100_000_000 values: with 8-byte integers that is roughly 800 MB (cumsum allocates as much again)
arr = np.arange(100_000_000).reshape((100, 100, 10, 10, 100))

In [None]:
%timeit arr.sum()

In [None]:
%timeit arr.cumsum()[-1]

## Broadcasting

Vectorization means performing the same single operation on all the data of two given arrays at once (in the spirit of **SIMD: Single Instruction, Multiple Data**).

This requires that the arrays have the same shape, so that the operation can be mapped element-by-element.


However, it is possible for the arrays to be of different shapes and still perform an operation in a vectorized manner. E.g.

let **p** be an array of shape (l, m, n). Let the operation be an addition of 2,
i.e.

- **p + 2**

This operation adds 2 to every element of p in a vectorized way. This is possible because 2 is sent (broadcast) to all the elements in the array: conceptually, **2** is transformed into an **(l, m, n)** array (same shape as p) containing 2 as every element. NumPy does not actually allocate that array.

**This transformation is known as *broadcasting***


> **Broadcasting is not vectorization, but can be used to achieve vectorization**
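
A sketch of the p + 2 example with a made-up (2, 3, 4) array. np.broadcast_to (not used elsewhere here) exposes the "stretched" scalar, which is a view with zero strides rather than a real copy:

```python
import numpy as np

p = np.arange(24).reshape(2, 3, 4)     # an (l, m, n) = (2, 3, 4) array

# conceptually, 2 is stretched to an array of 2s with p's shape
explicit = p + np.full(p.shape, 2)
print(np.array_equal(p + 2, explicit))   # True

# broadcast_to shows that "stretched" array: a view with stride 0 on every axis,
# so no (2, 3, 4) block of 2s is actually stored
virtual = np.broadcast_to(np.array(2), p.shape)
print(virtual.shape, virtual.strides)    # (2, 3, 4) (0, 0, 0)
```
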

#### Mechanism of Array Transformation: Broadcasting

In [None]:
a = np.arange(24).reshape(2, 3, 4)
# b's trailing dimension (4) must match a's last axis; a (1, 2) array would raise a ValueError
b = np.arange(4).reshape(1, 4)

In [None]:
a + b

In [None]:
help(np.newaxis)


---|||
# Advanced Numpy Section



---|||
### Internals of the ndarray Object

Internally consists of:

- a pointer to a block of data, in RAM or in a **memory-mapped file**

- a data type (dtype): provides information about the data stored in the cells of the memory. In NumPy, each dtype has a fixed size, and a block of data (ndarray) can only contain values of the same type.

    - the consequence of this is that the total size of the memory block needed to store the data is known and fixed

- a tuple containing the shape of the ndarray

- a tuple of strides: integers indicating the number of bytes to step in order to advance to the next element along each dimension
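
A quick way to inspect these pieces on a small array (a (3, 4) float64 array chosen for illustration):

```python
import numpy as np

arr = np.zeros((3, 4), dtype=np.float64)

print(arr.dtype, arr.itemsize)   # float64 8: every cell is 8 bytes
print(arr.shape)                 # (3, 4)
print(arr.strides)               # (32, 8): 4 * 8 bytes to the next row, 8 to the next column
print(arr.nbytes)                # 96 = 3 * 4 * 8, known from shape and dtype alone
```
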


---|||
### Strides

Strides provide a means to know the memory location of the next element in a block of memory. That is, the question "how far should I move to get to the next element along a particular axis?" is answered by the stride along that axis.

Thus, a stride is the number of bytes the pointer must "move" (more accurately, the number of bytes to add to the pointer's address) to get to the next element along a particular axis in the memory block.

**Generally, the larger the stride along an axis, the more expensive it is to step along that axis, because consecutive elements are further apart in memory (less cache-friendly)**

###### Examples

Note: we have assumed C order; if it were an F-ordered array, the strides would be different.

Let's say we have a (7, 3, 2) array of float64 ==> each value in the 3D array is 8 bytes (64 bits).

- Along the first axis = 0 ; we have **seven** (3, 2) arrays. Each of the arrays has a total **data size of 6** (3 *2), and a **byte size of 48** (8byte each * 6 data values). 

    - This means that to move from one of the **seven arrays** to the next along axis = 0, we have to move **48** bytes. More precisely, if we are in a **cell** (say *m*) of, say, the **first** of the seven arrays, then to go to the corresponding cell (say *n*) in the next array (i.e. the second of the seven), **we have to add 48 bytes to the address of cell m**.

    - Hence the stride value in the first axis (0) is == 48

- Along the second axis = 1, we have **three** rows, each containing **two** data columns. Each row has a total **data size of 2** and a **byte size of 16** (8 bytes each * 2 data values).

    - This means that to move from one of the **three rows** to the next along the second axis (1), **we have to move 16 bytes**.

        - Precisely, if we are at the Mth row and Pth column, then to go to the (M + 1)th row, Pth column, we must add 16 bytes to the address. Going back to the (M - 1)th row, we subtract 16 bytes.

    - Hence the stride value in the second axis (1) is == 16

- Along the last axis = 2, **we have 2 columns** of data, each with a byte size of 8. Hence advancing to the next element is simply moving the pointer 8 bytes.

    - Hence the stride value in the last axis (2) is == 8

Therefore the strides are (48, 16, 8). Following the rule of thumb above, stepping along the first axis is generally the most expensive and stepping along the last axis the cheapest, since consecutive elements along the last axis are adjacent in memory.
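
A sketch confirming the numbers for the (7, 3, 2) float64 example, plus the Fortran-order strides for comparison:

```python
import numpy as np

c_arr = np.zeros((7, 3, 2), dtype=np.float64)             # C (row-major) order
f_arr = np.zeros((7, 3, 2), dtype=np.float64, order="F")  # Fortran (column-major) order

print(c_arr.strides)   # (48, 16, 8)
print(f_arr.strides)   # (8, 56, 168): now the first axis is the contiguous one
```
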


#### **as_strided** from the stride_tricks module

This creates a new **view** onto the memory of an existing array, using a given shape and strides (in bytes). No data is copied, and there is no "fill order": each element of the result is simply read from the source array's buffer at a byte offset computed from the strides.

Say we have a sequence of **length l**, **strides of (m, n)** bytes, and a **shape of (r, c)**:

- element [i, j] is read from the position i * m + j * n bytes after the start of the sequence.

- as_strided does **no bounds checking**: if, for some [i, j], that position lies beyond the l elements of the sequence, the element contains whatever happens to be in memory there (garbage values), or the program may crash. It must be used with care.
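
A sketch of as_strided building overlapping windows over a small made-up int64 array. Both strides are one element (8 bytes), so each row starts one element after the previous one. sliding_window_view (NumPy 1.20+, not mentioned in these notes) gives the same view with bounds checking, which is the safer choice:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

x = np.arange(6, dtype=np.int64)    # 6 values, 8 bytes each
step = x.strides[0]                 # 8 bytes

# element [i, j] is read from byte offset i*step + j*step, i.e. x[i + j]
windows = as_strided(x, shape=(4, 3), strides=(step, step))
print(windows.tolist())             # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]

# the last element read is x[3 + 2] = x[5], still inside the buffer;
# shape=(5, 3) would read x[6], past the end (garbage, or a crash)

safe = sliding_window_view(x, 3)    # same windows, with bounds checking
print(np.array_equal(windows, safe))  # True
```
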

---|||
### Reshaping Arrays

Passing negative one (-1) as one of the dimensions in the reshape method lets reshape infer that dimension from the array's size and the other dimensions.

An array can be laid out in two orders, C (row-major) and Fortran (column-major):

- C order: elements fill the first row, then the second row, etc., until all the rows are filled

- F order: elements fill the first column, then the second column, the third column, etc.
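
A sketch of -1 inference and the two orders on arange(6):

```python
import numpy as np

arr = np.arange(6)

inferred = arr.reshape((2, -1))           # -1 is worked out as 6 / 2 = 3
c_order = arr.reshape((2, 3), order="C")  # fill row by row
f_order = arr.reshape((2, 3), order="F")  # fill column by column

print(inferred.shape)     # (2, 3)
print(c_order.tolist())   # [[0, 1, 2], [3, 4, 5]]
print(f_order.tolist())   # [[0, 2, 4], [1, 3, 5]]
```
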


---|||
### Flattening 

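This is the opposite of reshaping to higher dimensions: it turns an array into a one-dimensional array. Neither method changes the shape of the original array.

This can be achieved in two ways:

- **ravel** function: returns a view of the original data when possible (no copy is made, so writing to the result changes the source array); it only copies when it has to, e.g. for a transposed array.

- **flatten** function: always returns a copy.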
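
A sketch showing the difference (np.shares_memory is used to check whether two arrays overlap in memory):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

r = arr.ravel()      # a view when possible: shares arr's memory
f = arr.flatten()    # always a new copy

r[0] = 100           # writes through to arr
f[1] = 200           # leaves arr alone
print(arr.tolist())  # [[100, 1, 2], [3, 4, 5]]
print(arr.shape)     # still (2, 3): neither call reshapes arr itself

# a transposed array is not contiguous in C order, so ravel has to copy here
t_flat = arr.T.ravel()
print(np.shares_memory(r, arr), np.shares_memory(f, arr), np.shares_memory(t_flat, arr))
# True False False
```
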

---|||
### Concatenating Arrays

Concatenation combines arrays with the same number of dimensions into one, along a specified axis (the other dimensions must match). This is achieved using the **concatenate** function.

- **vstack** ==> along axis=0 (concatenate vertically)
- **hstack** ==> along axis=1 for 2D arrays (concatenate horizontally); for 1D arrays it joins them along axis=0
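
A sketch with two made-up 2 x 2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v = np.concatenate([a, b], axis=0)   # (4, 2): b goes under a
h = np.concatenate([a, b], axis=1)   # (2, 4): b goes to the right of a

print(np.array_equal(v, np.vstack([a, b])))   # True
print(np.array_equal(h, np.hstack([a, b])))   # True

# for 1D inputs hstack joins along axis 0, while vstack stacks them as rows
joined = np.hstack([np.array([1, 2]), np.array([3])])
rows = np.vstack([np.array([1, 2]), np.array([3, 4])])
print(joined, rows.shape)            # [1 2 3] (2, 2)
```
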

---|||
### Splitting Arrays

Split an array with the split function, by passing the indices where the splits should occur.

For example, np.split(array, [2, 5]) splits the array before indices 2 and 5, giving three pieces: array[:2], array[2:5], and array[5:].

Care must be taken with the number of arrays returned when tuple unpacking is used: n split indices give n + 1 pieces.
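
A sketch of the example above on arange(8):

```python
import numpy as np

arr = np.arange(8)

first, middle, last = np.split(arr, [2, 5])   # 2 split indices -> 3 pieces
print(first, middle, last)                    # [0 1] [2 3 4] [5 6 7]

halves = np.split(arr, 2)                     # an integer means "this many equal pieces"

try:
    np.split(arr, 3)                          # 8 elements cannot be split into 3 equal parts
    raised = False
except ValueError:
    raised = True
```
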


---|||
### Repeat and Tiling

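- Repeat

Repeats each element a certain number of times along the specified axis (without an axis, the array is flattened first). The count can also be a list, giving a different number of repeats per element.

- Tile

Repeats copies of the whole array (instead of the individual elements) a given number of times; the reps argument gives the number of copies along each axis.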
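
A sketch of both on a small 2 x 2 array:

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4]])

flat_rep = np.repeat(arr, 2)              # no axis: flattened, each element twice
row_rep = np.repeat(arr, 2, axis=0)       # each row twice
col_rep = np.repeat(arr, [1, 3], axis=1)  # column 0 once, column 1 three times

side = np.tile(arr, 2)                    # whole array twice, side by side
stacked = np.tile(arr, (2, 1))            # whole array twice, one copy under the other

print(flat_rep, row_rep, col_rep, side, stacked, sep="\n\n")
```
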


---|||

### Take and Put

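- Take

Take is a function that mimics fancy (sub-)indexing: it takes a sequence containing the indices of the elements to return, optionally along a given axis.


- Put

Put, on the other hand, takes a sequence of indices and the values to write there, and replaces the elements at those positions **in place**. Put has no axis argument; like take without an axis, it treats the array as flattened.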
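
A sketch of both functions on made-up arrays:

```python
import numpy as np

arr = np.arange(10) * 10          # 0, 10, 20, ..., 90

picked = arr.take([7, 1, 2])      # same as arr[[7, 1, 2]] -> 70, 10, 20

m = np.arange(6).reshape(2, 3)
cols = np.take(m, [2, 0], axis=1) # columns 2 and 0 of every row
flat_pick = m.take(4)             # no axis: index into the flattened array -> 4

ret = arr.put([0, 2], [-1, -2])   # writes in place and returns None
print(picked, cols, flat_pick, arr[:3], sep="\n")
```
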



### Broadcasting Deep-Dive

This describes how arithmetic works between arrays of different shapes. The dimensions are compared starting from the trailing (last) dimension and moving towards the front.

#### The Broadcasting Rule

let (a, b, c) be the shape of A
let (m, n) be the shape of B

Since A and B have different shapes, the only way to perform any vectorized operation on both of them is to use broadcasting.

- Now, A is 3D and B is 2D. The number of trailing dimensions to compare is set by the lower-dimensional array, in this case B. So, given B is 2D (m, n), the trailing dimensions of A are its last two axes (b, c); B is treated as if it had shape (1, m, n).

For A and B to be broadcastable, each pair of trailing dimensions must be compatible, i.e. equal, or one of them equal to 1:

- (b == m) or (b == 1) or (m == 1), AND
- (c == n) or (c == 1) or (n == 1)

So (b, c) == (m, n) and (m, n) == (1, 1) both work, but so do (m, n) == (b, 1) or (1, c). The result has shape (a, max(b, m), max(c, n)).

This logic also applies in higher dimensions
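
A sketch of the rule using np.broadcast_shapes (available in NumPy 1.20 and later, an assumption about your installed version), which computes the result shape without creating any arrays:

```python
import numpy as np

# shapes are compared from the right; each pair must be equal or contain a 1
same_tail = np.broadcast_shapes((2, 3, 4), (3, 4))   # (2, 3, 4)
one_col = np.broadcast_shapes((2, 3, 4), (3, 1))     # (2, 3, 4): the 1 stretches to 4
one_row = np.broadcast_shapes((2, 3, 4), (1, 4))     # (2, 3, 4): the 1 stretches to 3
both = np.broadcast_shapes((2, 1, 4), (3, 1))        # (2, 3, 4): 1s on either side

try:
    np.broadcast_shapes((2, 3, 4), (2, 4))           # 3 vs 2: neither is 1
    raised = False
except ValueError:
    raised = True

print(same_tail, one_col, one_row, both, raised)
```
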

### **Adding a new axis to satisfy broadcasting**

This is achieved by indexing the array that needs to be broadcast with np.newaxis at the position where a new length-1 axis makes broadcasting possible; e.g. col[:, np.newaxis] turns a (4,) array into a (4, 1) column that broadcasts against a (4, 3) array.

In [None]:
arr = np.arange(120).reshape(2, 3, 4, 5)
a = np.arange(20).reshape(4, 5)
b = np.arange(1).reshape(1, 1)
c = np.arange(60).reshape(3, 4, 5)
d = np.arange(1).reshape(1, 1, 1)
e = np.arange(12).reshape(3, 4)

In [507]:
col = np.array([1.28, -0.42, 0.44, 1.6])
col

array([ 1.28, -0.42,  0.44,  1.6 ])

In [512]:
arr = np.arange(12).reshape(4, 3).astype('float64')
arr

array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]])

In [499]:
col[:, np.newaxis]

array([[ 1.28],
       [-0.42],
       [ 0.44],
       [ 1.6 ]])

In [513]:
arr[:] = col[:, np.newaxis]

In [514]:
arr

array([[ 1.28,  1.28,  1.28],
       [-0.42, -0.42, -0.42],
       [ 0.44,  0.44,  0.44],
       [ 1.6 ,  1.6 ,  1.6 ]])