## NumPy

NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.

!pip install numpy  # or, in a conda environment: conda install numpy

import numpy as np

NumPy is the most basic yet powerful package for working with data in Python. If you are going to work on data analysis or machine learning projects, a solid understanding of NumPy is nearly mandatory.

This is because other data analysis packages (such as pandas) are built on top of NumPy, and scikit-learn, which is used to build machine learning applications, works heavily with NumPy as well.


So what does NumPy provide? At its core, NumPy provides the ndarray object, short for n-dimensional array.

In an ‘ndarray’ object, aka ‘array’, you can store multiple items of the same data type. It is the facilities around the array object that make NumPy so convenient for performing math and data manipulations.


### ndarray object 
At the core of the NumPy package is the ndarray object. It encapsulates n-dimensional arrays of **homogeneous** data types, with many operations being performed in compiled code for performance.

### Create ndarray from scratch
* np.zeros(shape), np.zeros_like(array)
* np.ones(shape), np.ones_like(array)
* np.full(shape,value)
* np.arange(begin,end,step)
* np.linspace(begin,end,num)
* np.random  (generate random samples)
* np.eye
* np.empty

In [None]:
a = np.zeros(5)      #Create an array of zeros
print(a)
a = np.ones(shape= (2,2))      #Create an array of ones
print(a)

In [None]:
b = np.zeros_like(a)
print(b)

In [None]:
b = np.ones((2,3,4),dtype=np.int16)      #Create an array of ones
print(b)

In [None]:
c = np.arange(10,30,5)  #Create an array of evenly spaced values (step value)
print(c)

c.shape

In [None]:
c = np.arange(1,3,0.2)  #Create an array of evenly spaced values (step value)
print(c)
d = np.linspace(1,3,11) #Create an array of evenly spaced values (number of samples)
print(d)

In [None]:
e = np.full((2,3),9)      #Create a constant array
print(e)

In [None]:
f = np.eye(5,dtype=int)             #Create a 5X5 identity matrix
print(f)
0.5*np.eye(5,dtype=int)+(1-0.5)*np.ones((5,5))

In [None]:
### Create an array with random uniform(0,1)
g = np.random.random((2,2))    
print(g)
###Create an array with random uniform(10,20)
g = np.random.uniform(low = 10,high = 20,size = (2,3))
print(g)

In [None]:
### Create an array with random normal N(mu,sigma^2)
g = np.random.normal(loc = 1, scale = 1,size = (2,2))      
print(g)

In [None]:
### Create an array with random integers from 1-9
np.random.randint(low = 1,high = 10,size = (1,5))

In [None]:
### Random Sampling
np.random.choice(a = (1,2,3,4,5,6), size = (3,7), replace =True, p = [1/2,0,0,1/6,1/6,1/6])

# if a is an integer, it means np.arange(a)

In [None]:
a = np.array([1,2,3])
b = np.array([(1.5,2,3), (4,5,6)], dtype = float)
c = np.array([[(1.5,2,3), (4,5,6)], [(3,2,1), (4,5,6)]], dtype = float)


Differences between an ndarray and a list with numerical items:
- A list can hold items of multiple data types, while all items in an ndarray must be of the same data type.
- The key difference is that arrays are designed for vectorized operations while a Python list is not. That means that if you apply a function to an array, it is performed on every item in the array rather than on the array object as a whole.
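
The two differences above can be seen in a small sketch. The variable names here are illustrative only and are not used elsewhere in this notebook.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

# Multiplying a list by an integer repeats the list.
repeated = values_list * 2          # [1, 2, 3, 1, 2, 3]

# Multiplying an array by an integer multiplies each element.
doubled = values_array * 2          # array([2, 4, 6])

# A list can mix types; an array converts everything to one common dtype.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)  # all elements become float64

print(repeated)
print(doubled)
print(mixed_array.dtype)
```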

list to array: np.array(list)

array to list: array.tolist()
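
As a quick check of the two conversions, the following sketch round-trips a nested list through an array. The names `nested`, `arr`, and `back` are illustrative.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]
arr = np.array(nested)      # list -> array, shape (2, 3)
back = arr.tolist()         # array -> nested list of plain Python ints

print(arr.shape)
print(back == nested)
print(type(back[0][0]))
```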

How do you represent missing values and infinity?

Use np.nan and np.inf.

You can use np.isnan and np.isinf to check for NaN or infinite values.

In [None]:
x = np.array([[1,np.nan],[np.inf,2]])

In [None]:
np.isnan(x)

In [None]:
np.isinf(x)
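
A common follow-up is to filter out or replace the non-finite entries. The sketch below shows one possible way using np.isfinite and np.where; replacing with 0.0 is an arbitrary choice made only for illustration.

```python
import numpy as np

x = np.array([[1, np.nan], [np.inf, 2]])

# NaN is not equal to itself, so == cannot be used to find it.
print(np.nan == np.nan)            # False

# np.isfinite is True only where a value is neither NaN nor infinite.
finite_mask = np.isfinite(x)
print(finite_mask)

# Keep only the finite entries (the result is 1-D) ...
finite_values = x[finite_mask]
# ... or build a new array with the other entries replaced by 0.0.
cleaned = np.where(finite_mask, x, 0.0)
print(finite_values)
print(cleaned)
```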

## Attributes of ndarray
array.ndim, array.shape, array.size, array.dtype, array.itemsize, array.nbytes

In [None]:
np.random.seed(0)
a = np.random.random((10,5,5))
print("a ndim: ", a.ndim)
print("a shape:", a.shape)
print("a size: ", a.size)

In [None]:
print("a dtype:", a.dtype)
print("a itemsize:", a.itemsize, "bytes")
print("a nbytes:", a.nbytes, "bytes")

#### Datatype of ndarray
common types: int8, int16, float16, float64, complex64, complex128, <U** (Unicode string), object

In [None]:
### you can specify the datatype when you create an array:
a = np.array([[1,2],[3,4]],dtype ='float')
a

In [None]:
a = np.array([[1,2],['a','b']])  # mixed input: every item is converted to a string
a

**note**: If you are uncertain about what datatype your array will hold or if you want to hold characters and numbers in the same array, you can set the dtype as 'object'.
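
A minimal sketch of the note above, comparing the default conversion with dtype=object. The exact string dtype printed (for example <U21) may vary by platform.

```python
import numpy as np

# Without dtype=object, NumPy converts everything to strings.
as_strings = np.array([[1, 2], ['a', 'b']])
print(as_strings.dtype)     # a Unicode string dtype

# With dtype=object, each element keeps its original Python type.
as_objects = np.array([[1, 2], ['a', 'b']], dtype=object)
print(as_objects.dtype)     # object
print(type(as_objects[0, 0]), type(as_objects[1, 0]))
```

Object arrays are flexible, but they store references to Python objects, so they give up most of the speed and memory benefits of numeric dtypes.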


## Methods of ndarray
slices, reshape

access and modify by slices

In [None]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [None]:
a_list = [[1,2,3],[4,5,6],[7,8,9]]
a_list[1,1]  # raises TypeError: a nested list needs a_list[1][1] instead

In [None]:
### access item with index
print(a[0])
print(a[0][1])
print(a[1,1])

In [None]:
### access subarray using slices a[begin_index:end_index:step]
print(a[0::2])
print(a[1:,1:])

The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an array with 5 axes, then

- x[1, 2, ...] is equivalent to x[1, 2, :, :, :],
- x[..., 3] to x[:, :, :, :, 3] and
- x[4, ..., 5, :] to x[4, :, :, 5, :].
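
The equivalences above can be checked directly. This sketch uses a small array with 5 axes whose shape (2, 3, 4, 5, 6) is chosen only for illustration, so the index values differ from those in the bullets.

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)   # 5 axes

same_leading = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_trailing = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_middle = np.array_equal(x[1, ..., 4, :], x[1, :, :, 4, :])
print(same_leading, same_trailing, same_middle)

# Axes 0 and 3 are indexed away, leaving axes 1, 2 and 4.
print(x[1, ..., 4, :].shape)
```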

In [None]:
a = np.arange(1000).reshape(10,10,10)
a[1,...]

In [None]:
### use a boolean mask: the output is a one-dimensional array
a
a[a>3]

*Fancy Indexing*

We can pass arrays of indices in place of single scalars.

In [None]:
a = np.arange(1,10)
index = np.array([[2,1],[0,1]])
a[index]

In [None]:
a = np.arange(1,10).reshape(3,3)
a[index]

In [None]:
print(a)
row = [2,1]
col = [0,1]
a[row,col]

In [None]:
## change values
a[:,1] = np.array([1,2,3])
### assignment follows the broadcasting rules, which we will introduce later

In [None]:
a[:,2] = ['a','b','c']  # raises ValueError: a has an integer dtype and cannot store these strings


In [None]:
dt = np.dtype(np.int32)
print(dt)
b = a.astype('<U32')

In [None]:
b[:,1]=['a','b','c']
b

### reshape()

a.reshape(-1,3)

In [None]:
a.flatten()

In [None]:
a.ravel()

In [None]:
b = a.flatten()
b[np.newaxis,:]
b.reshape(1,-1)
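
flatten and ravel return the same values, but they differ in whether the result shares memory with the original. The sketch below checks this with np.shares_memory; it assumes a contiguous array, for which ravel can return a view.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

flat_copy = a.flatten()   # always a new array
flat_view = a.ravel()     # a view when the memory layout allows it

print(np.shares_memory(a, flat_copy))   # False
print(np.shares_memory(a, flat_view))   # True for this contiguous array

flat_view[0] = -1         # also changes a[0, 0]
print(a[0, 0])

# -1 asks reshape to infer that dimension from the total size.
print(a.reshape(-1, 1).shape)
```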

### Array Concatenation and Splitting

Concatenation: np.concatenate

In [None]:
a = np.array([[1,2],[3,4]])
b = np.array([[5,7],[9,11]])
c = np.concatenate([a,b],1)
print(c)

In [None]:
np.hstack([a,b])

In [None]:
np.vstack([a,b])

In [None]:
a = np.array([
[[1,2],[3,4]],
[[5,6],[7,8]],
[[9,10],[11,12]],
])
b = np.array([
[[1,1],[1,1]],
[[1,1],[1,1]],
[[1,1],[1,1]],
])

In [None]:
np.hstack([a,b])


In [None]:
import numpy as np

#### The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit.

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

In [None]:
a = np.floor(10*np.random.random((3, 10)))
print(a)

In [None]:
np.vsplit(a,[1,3])

In [None]:
upper, lower = np.vsplit(a, [2])
print(upper)
print(lower)

In [None]:
left, right = np.hsplit(a, [7])
print(left)
print(right)

### Computation on NumPy Arrays: Universal Functions

In NumPy, when you apply a universal function (*ufunc*) to an array, it is applied to each element. This *vectorized* approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.

In [None]:
def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

In [None]:
import time

In [None]:
time_start = time.time()
for _ in range(5):
    big_array = np.random.randint(1, 100, size=1000000)
    compute_reciprocals(big_array)
time_end = time.time()
print(f'average time is {(time_end-time_start)/5:.2f}s')

In [None]:
time_start = time.time()
for _ in range(5):
    big_array = np.random.randint(1, 100, size=1000000)
    1.0/big_array
time_end = time.time()
print(f'average time is {(time_end-time_start)/5:.2f}s')

### Basic NumPy’s UFuncs
Ufuncs exist in two flavors: unary ufuncs, which operate on a single input, and binary
ufuncs, which operate on two inputs.
- Array arithmetic
- Absolute value (np.absolute, or its alias np.abs; Python's built-in abs also works on arrays)
- Trigonometric functions
- Exponents and logarithms
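
One way to tell the two flavors apart is the nin attribute of a ufunc, which gives its number of inputs. A short sketch:

```python
import numpy as np

print(np.negative.nin)   # 1 -> unary ufunc
print(np.add.nin)        # 2 -> binary ufunc

x = np.array([1, -2, 3])
negated = np.negative(x)     # same as -x
summed = np.add(x, x)        # same as x + x
print(negated)
print(summed)
print(np.array_equal(x + x, summed))
```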

#### Array arithmetic


In [None]:
x = np.arange(4)
print("x =", x)
print("x + 3 =", x + 3)
print("x - 3 =", x - 3)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2) 
print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)

In [None]:
print("x =", x)
print("x + 3 =", np.add(x, 3))
print("x - 3 =", np.subtract(x, 3))
print("x * 2 =", np.multiply(x, 2))
print("x / 2 =", np.divide(x, 2))
print("x // 2 =", np.floor_divide(x, 2))
print("x ** 2 = ", np.power(x, 2))
print("x % 2 = ", np.mod(x, 2))

#### Absolute values
**note**: for complex values, it returns the magnitude: np.abs(a+bi) = np.sqrt(a**2 + b**2)

In [None]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)

#### Trigonometric functions
- sin, cos, tan
- arcsin, arccos, arctan

In [None]:
theta = np.linspace(0, np.pi, 3)

In [None]:
print("theta = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

In [None]:
x = [-1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))


#### Exponents and logarithms
- exp, exp2, power
- log (natural logarithm), log2, log10

In [None]:
x = np.linspace(1,5,5)
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))

In [None]:
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))

### Specialized ufuncs
- hyperbolic trig functions: sinh, cosh, tanh, arcsinh, arccosh, arctanh
- bitwise arithmetic: bitwise_and (&), bitwise_or (|), bitwise_xor (^), invert (~), left_shift (<<), right_shift (>>)
- comparison operators: ==, >=, <=, !=, >, <
- conversions between degrees and radians (deg2rad, rad2deg)
- rounding

In [None]:
### hyperbolic trig functions
x = np.linspace(1,5,5)
print("sinh(x) =", np.sinh(x))
print("cosh(x) =", np.cosh(x))
print("tanh(x) =", np.tanh(x))



In [None]:
### bitwise arithmetic
x = np.arange(1,6)

In [None]:
~x 

In [None]:
### comparison operators
x >1

In [None]:
x = np.array([30,60,90])
np.deg2rad(x)/np.pi

In [None]:
x = np.random.random((3,2))
print("x = ", x)
print("round(x) =" , np.round(x))

### Another excellent source for more specialized and obscure ufuncs is the submodule scipy.special.
- gamma
- beta
- erf (error function, the integral of the Gaussian)

In [None]:
from scipy import special

### Advanced UFuncs
- Specifying output
- Aggregates: reduce and accumulate


In [None]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

In [None]:
x = np.arange(1, 6)
np.multiply.reduce(x)

In [None]:
np.multiply.accumulate(x)
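
reduce and accumulate work with any binary ufunc, not only multiply. As a sketch, the following compares them with the more familiar sum, cumsum, and prod.

```python
import numpy as np

x = np.arange(1, 6)   # [1 2 3 4 5]

total = np.add.reduce(x)            # same as x.sum()
running = np.add.accumulate(x)      # same as np.cumsum(x)
product = np.multiply.reduce(x)     # same as x.prod()

print(total, x.sum())
print(running, np.cumsum(x))
print(product, x.prod())
```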

###  Summary statistics: Min, Max, and Everything in Between
sum, mean, median, max, min, quantile

**note**: you can specify the axis along which these operations are applied

In [None]:
x = np.arange(12).reshape(3,4)
x

In [None]:
print(x.sum())
print(np.sum(x))


In [None]:
print(x.sum(1))
print(np.sum(x,1))

In [None]:
print(x.argmax(1))
print(np.argmax(x,1))
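
The same axis argument applies to the other summary statistics listed above. A small sketch on the same 3x4 array, where axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

col_means = x.mean(axis=0)               # one value per column
row_medians = np.median(x, axis=1)       # one value per row
row_q25 = np.quantile(x, 0.25, axis=1)   # 25% quantile of each row

# keepdims=True keeps the reduced axis with length 1.
row_sums_2d = x.sum(axis=1, keepdims=True)

print(col_means)
print(row_medians)
print(row_q25)
print(row_sums_2d.shape)
```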

### Sorting, searching, and counting: 
sort(), argsort(), argmin(), argmax(), where()

In [None]:
x=np.array([3,5,6,2,3,1,1])
np.take(x,[1,3,4])

In [None]:
print(np.sort(x))
print(np.argsort(x))

In [None]:
print(np.min(x))
print(np.argmin(x))

In [None]:
print(np.max(x))
print(np.argmax(x))

In [None]:
x==np.min(x)

In [None]:
print(np.where(x==np.min(x)))
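
Note that argmin reports only the first minimum, while where reports every match. The sketch below also shows the three-argument form of np.where, which picks between two values element by element; capping values at 3 is an arbitrary example.

```python
import numpy as np

x = np.array([3, 5, 6, 2, 3, 1, 1])

first_min = np.argmin(x)                 # index of the first minimum only
all_min = np.where(x == x.min())[0]      # indices of every minimum
capped = np.where(x > 3, 3, x)           # 3 where x > 3, otherwise x

print(first_min)
print(all_min)
print(capped)
```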

Iterating over multidimensional arrays is done with respect to the first axis.

In [None]:
x = np.array([[1,2,3],[4,5,6]])


In [None]:
for item in x:
    print(item)

## Broadcasting rules

- Rule 1: If two arrays have different numbers of dimensions, the shape of the one with fewer dimensions is padded with 1s on its leading (left) side.
- Rule 2: If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other shape.
- Rule 3: If the sizes disagree in any dimension and neither is equal to 1, an exception is raised.
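
The three rules can be traced on small arrays. The shapes in this sketch are chosen only to trigger each rule in turn.

```python
import numpy as np

a = np.ones((2, 3))
b = np.arange(3)            # shape (3,)

# Rule 1: b is treated as shape (1, 3). Rule 2: it is stretched to (2, 3).
print((a + b).shape)

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4)
# Both arrays are stretched along their size-1 axis, giving (3, 4).
print((col + row).shape)

# Rule 3: (3, 2) against (3,), treated as (1, 3): last axis is 2 vs 3.
m = np.ones((3, 2))
v = np.arange(3)
try:
    m + v
    failed = False
except ValueError as err:
    failed = True
    print("broadcast error:", err)
```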


In [None]:
a = np.array([[1,2,3],[4,5,6]])
a

### Copies and Views
When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion.

In [None]:
a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
b = a  #no new object is created
print(id(a),id(b))

In [None]:
b[1,1] = 1000
print(a)

Different array objects can share the same data. The "view" method creates a new array object that looks at the same data.

In [None]:
x = a.view()
print(x.base is a)
print(x)
print(id(a),id(b),id(x))

In [None]:
x[2,2] = 199999 # a's data changes
print(a)

In [None]:
x = x.reshape((2, 6))  # a's shape doesn't change
print(x)
print(a)

The "copy" method makes a complete copy of the array and its data.

In [None]:
d = a.copy()  # a new array object with new data is created
d = np.array(a)  # np.array also copies its input by default

In [None]:
d is a

In [None]:
d.base is a  # d doesn't share anything with a

In [None]:
d[0, 0] = 99
print(d)
print(a)
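
Indexing is a frequent place where the view/copy distinction matters. As a sketch, np.shares_memory can be used to check which kind of result an operation produced: basic slicing gives a view, while fancy and boolean indexing give copies.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

row_slice = a[1, :]          # basic slicing -> view
picked = a[[0, 2], :]        # fancy indexing -> copy
masked = a[a > 5]            # boolean indexing -> copy

print(np.shares_memory(a, row_slice))   # True
print(np.shares_memory(a, picked))      # False
print(np.shares_memory(a, masked))      # False

row_slice[0] = -1            # writes through to a
print(a[1, 0])
```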