#  Introduction to Numpy

Numpy, short for Numerical Python, is a Python library for numerical computation. Its basic data structure is a **multi-dimensional array object called ndarray**, and it provides a suite of functions that manipulate the elements of an ndarray **efficiently**.

- To see the reference manual
    - Help $\Rightarrow$ Numpy Reference
    - or https://numpy.org/doc/stable/index.html#
- We will introduce basic building blocks of numpy.
- To learn how a specific function works, search the web or call help() on it.

## Creating ndarray

An ndarray can be created from a list or tuple object.

- numpy.array is just a convenience function to create an ndarray
- numpy.ndarray is a class

*Note*:
- A tensor is an object that can be represented as a multi-dimensional array.
- TensorFlow is Google's library for deep learning.
- Writing TensorFlow code requires fluency with numpy n-dimensional arrays.

In [None]:
import numpy

In [None]:
import numpy as np

oneDim = np.array([1.0,2,3,4,5])   # a 1-dimensional array (vector)
print(oneDim)
print("#Dimensions =", oneDim.ndim)
print("Dimension =", oneDim.shape)
print("Size =", oneDim.size)
print("Array type =", oneDim.dtype)

[1. 2. 3. 4. 5.]
#Dimensions = 1
Dimension = (5,)
Size = 5
Array type = float64


In [None]:
twoDim = np.array([[1,2],[3,4],[5,6],[7,8]])  # a two-dimensional array (matrix)
print(twoDim)
print("#Dimensions =", twoDim.ndim)
print("Dimension =", twoDim.shape)
print("Size =", twoDim.size)
print("Array type =", twoDim.dtype)

[[1 2]
 [3 4]
 [5 6]
 [7 8]]
#Dimensions = 2
Dimension = (4, 2)
Size = 8
Array type = int32


In [None]:
arrFromTuple = np.array([(1,'a',3.0),(2,'b',3.5)])  # create ndarray from a list of tuples; mixed types are coerced to strings
print(arrFromTuple)
print("#Dimensions =", arrFromTuple.ndim)
print("Dimension =", arrFromTuple.shape)
print("Size =", arrFromTuple.size)
print("Array type =", arrFromTuple.dtype)

[['1' 'a' '3.0']
 ['2' 'b' '3.5']]
#Dimensions = 2
Dimension = (2, 3)
Size = 6
Array type = <U11


In [None]:
# Guess what is printed
print(np.array([1]).shape)
print(np.array([1,2]).shape)
print(np.array([[1],[2]]).shape)
print(np.array([[[1,2,3],[1,2,3]]]).shape)
print(np.array([[[[]]]]).shape)

(1,)
(2,)
(2, 1)
(1, 2, 3)
(1, 1, 1, 0)


Numpy also has several built-in functions for creating ndarrays:

In [None]:
print(np.random.rand(5))      # random numbers from a uniform distribution on [0, 1)
print(np.random.randn(5))     # random numbers from a normal distribution
print(np.arange(-10,10,2))    # similar to range, but returns ndarray instead of list
print(np.arange(12).reshape(3,4))  # reshape to a matrix
print(np.linspace(0,1,10))    # split interval [0,1] into 10 equally separated values
print(np.logspace(-3,3,7))    # create ndarray with values from 10^-3 to 10^3
                              # logspace returns numbers spaced evenly on a log scale.

[0.39549071 0.83200035 0.21630632 0.92803293 0.71053543]
[0.89132678 0.0393841  0.36250086 0.04464502 1.31660902]
[-10  -8  -6  -4  -2   0   2   4   6   8]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[0.         0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
 0.66666667 0.77777778 0.88888889 1.        ]
[1.e-03 1.e-02 1.e-01 1.e+00 1.e+01 1.e+02 1.e+03]


In [None]:
print(np.zeros((2,3)))        # a matrix of zeros
print(np.ones((3,2)))         # a matrix of ones
print(np.eye(3))              # a 3 x 3 identity matrix

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


## Element-wise Operations

You can apply standard operators such as addition and multiplication on each element of the ndarray.

In [None]:
x = np.array([1,2,3,4,5])

print(x + 1)      # addition
print(x - 1)      # subtraction
print(x * 2)      # multiplication
print(x // 2)     # integer division
print(x ** 2)     # square
print(x % 2)      # modulo
print(1 / x)      # division

[2 3 4 5 6]
[0 1 2 3 4]
[ 2  4  6  8 10]
[0 1 1 2 2]
[ 1  4  9 16 25]
[1 0 1 0 1]
[1.         0.5        0.33333333 0.25       0.2       ]


## Why Numpy?

In [None]:
import time
start = time.time()

# iterative sum
total = 0
# iterating through 1.5 Million numbers
for item in range(0, 1500000):
    total = total + item

print('sum is:' + str(total))
end = time.time()
print(end - start)

sum is:1124999250000
0.1862163543701172


In [None]:
x = np.array([2,4,6,8,10])
y = np.array([1,2,3,4,5])

print(x + y)
print(x - y)
print(x * y)
print(x / y)
print(x // y)
print(x ** y)

[ 3  6  9 12 15]
[1 2 3 4 5]
[ 2  8 18 32 50]
[2. 2. 2. 2. 2.]
[2 2 2 2 2]
[     2     16    216   4096 100000]


In [None]:
import numpy as np

start = time.time()

# vectorized sum - using numpy for vectorization
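# np.arange creates the sequence of numbers from 0 to 1499999
# dtype=np.int64 keeps the sum from overflowing where the default integer is 32-bit
print(np.sum(np.arange(1500000, dtype=np.int64)))

end = time.time()
print(end - start)

1124999250000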
0.005002021789550781
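
The vectorized result is only trustworthy when the accumulator type is wide enough. The exact sum, 1124999250000, does not fit in a signed 32-bit integer, so a 32-bit accumulator silently wraps around to a meaningless value. The sketch below forces both accumulator widths through the `dtype` argument of `np.sum`; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

n = 1_500_000
values = np.arange(n, dtype=np.int32)      # every element fits in 32 bits

# 64-bit accumulator: wide enough to hold the exact total
exact = np.sum(values, dtype=np.int64)

# 32-bit accumulator: the running total exceeds 2**31 - 1 and wraps around
wrapped = np.sum(values, dtype=np.int32)

print(exact)                               # agrees with the pure-Python loop
print(wrapped)                             # wrapped value, not the true sum
print(int(exact) == n * (n - 1) // 2)      # closed-form check of the exact sum
```

Numpy raises no error on integer overflow in reductions, so choosing the dtype explicitly is the safest habit when totals can grow large.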


In [None]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 50, size=(5000000, 4)), columns=('a','b', 'c', 'd'))
# df.shape

df.head()

    a   b   c   d
0  33  31   6  23
1  40  30  35  17
2  42  25  10  45
3  10  38  15  45
4   5  28  23  39


In [None]:
import time
start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    # creating a new column one cell at a time (c can be 0, so expect inf/NaN values and runtime warnings)
    df.at[idx,'ratio'] = 100 * (row["d"] / row["c"])

end = time.time()
print(end - start)

KeyboardInterrupt: 

In [None]:
start = time.time()
df["ratio"] = 100 * (df["d"] / df["c"])
end = time.time()
print(end - start)

0.12962055206298828


## Indexing and Slicing

There are various ways to select certain elements with an ndarray.

In [None]:
x = np.arange(-5,5)
print(x)

y = x[3:5]     # y is a slice, i.e., a view onto part of x, not a copy
print(y)
y[:] = 1000    # modifying the value of y will change x
print(y)
print(x)

z = x[3:5].copy()   # makes a copy of the subarray
print(z)
z[:] = 500          # modifying the value of z will not affect x
print(z)
print(x)

[-5 -4 -3 -2 -1  0  1  2  3  4]
[-2 -1]
[1000 1000]
[  -5   -4   -3 1000 1000    0    1    2    3    4]
[1000 1000]
[500 500]
[  -5   -4   -3 1000 1000    0    1    2    3    4]
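
Whether two arrays share the same underlying buffer can also be checked directly instead of inferring it from side effects. The following is a minimal sketch using `np.shares_memory` and the `base` attribute; the names `view` and `copy` are illustrative.

```python
import numpy as np

x = np.arange(-5, 5)
view = x[3:5]            # basic slicing: a view onto x's buffer
copy = x[3:5].copy()     # explicit copy: an independent buffer

print(np.shares_memory(x, view))   # the slice shares memory with x
print(np.shares_memory(x, copy))   # the copy does not
print(view.base is x)              # a view records the array it was taken from
print(copy.base is None)           # a copy owns its own data
```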


#### Remark: slicing a list makes a copy of the sublist, but slicing a numpy array does not.

In [None]:
# Remark slicing a list makes a copy of the sublist
x = list(range(-5,5))
print(x)

y = x[3:5]     # y is a new list (a copy), not a pointer into x
print(y)
y[1] = 1000    # modifying the value of y does not change x
print(y)
print(x)

[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
[-2, -1]
[-2, 1000]
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]


In [None]:
my2dlist = [[1,2,3,4],[5,6,7,8],[9,10,11,12]]   # a 2-dim list
print(my2dlist)
print(my2dlist[2])        # access the third sublist
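print(my2dlist[:][2])     # my2dlist[:] copies the outer list, so this is the third sublist again, not the third element of each sublist
# print(my2dlist[:,2])    # raises TypeError: list indices must be integers or slices, not tuple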

my2darr = np.array(my2dlist)
print(my2darr)
print(my2darr[2][:])      # access the third row
print(my2darr[2,:])       # access the third row
print(my2darr[:][2])      # access the third row (similar to 2d list); Don't do this
print(my2darr[:,2])       # access the third column
print(my2darr[:2,2:])     # access the first two rows & last two columns
print(my2darr[::2,2:])    # every other row (step of 2) & last two columns

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
[9, 10, 11, 12]
[9, 10, 11, 12]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[ 9 10 11 12]
[ 9 10 11 12]
[ 9 10 11 12]
[ 3  7 11]
[[3 4]
 [7 8]]
[[ 3  4]
 [11 12]]


### Remark again: slicing gives a **view**, not a copy

In [None]:
my2darr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])   # a 2-dim array
print(my2darr)

print()
sliced = my2darr[::2,2:]   # every other row (step of 2) & last two columns
print(sliced)
print(type(sliced))

print()
sliced[:,:] = 1000
print(my2darr)

print()
sliced[0,0] = 2000
print(my2darr)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

[[ 3  4]
 [11 12]]
<class 'numpy.ndarray'>

[[   1    2 1000 1000]
 [   5    6    7    8]
 [   9   10 1000 1000]]

[[   1    2 2000 1000]
 [   5    6    7    8]
 [   9   10 1000 1000]]


### ndarray also supports boolean indexing (also called masking).

In [None]:
# slicing vs masking vs integer array indexing
x = np.array([1,2,3])
print(x[1:])                    # slicing
print(x[1:][0])                    # slicing

print()
print(x[[True, False, True]])   # boolean masking

print()
print(x[[2,1]])                 # integer array indexing
print(x[[2,1,1,1,0]])                 # integer array indexing

print()
x[[2,1,1,1,0]] = 0
print(x)

[2 3]
2

[1 3]

[3 2]
[3 2 2 2 1]

[0 0 0]


In [None]:
y = np.arange(35).reshape(5,7)
b = y > 20
print(b)

print()
t = y[b]   # boolean filtering always returns a one-dimensional result; it is a copy, not a view
print(t)

print()
t[:3] = 1000
print(t)
print(y)

[[False False False False False False False]
 [False False False False False False False]
 [False False False False False False False]
 [ True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True]]

[21 22 23 24 25 26 27 28 29 30 31 32 33 34]

[1000 1000 1000   24   25   26   27   28   29   30   31   32   33   34]
[[ 0  1  2  3  4  5  6]
 [ 7  8  9 10 11 12 13]
 [14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27]
 [28 29 30 31 32 33 34]]


**LAB:** set every even value in a 2-d array to zero

In [None]:
import numpy as np

M = np.arange(35).reshape(5,7)
M

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])

In [None]:
M[M % 2 == 0] = 0
M

array([[ 0,  1,  0,  3,  0,  5,  0],
       [ 7,  0,  9,  0, 11,  0, 13],
       [ 0, 15,  0, 17,  0, 19,  0],
       [21,  0, 23,  0, 25,  0, 27],
       [ 0, 29,  0, 31,  0, 33,  0]])

**LAB:** negate every even value in a 2-d array

In [None]:
N = np.arange(35).reshape(5,7)   # fresh array: the evens in M were already set to zero above
N[N % 2 == 0] = -N[N % 2 == 0]
N

array([[  0,   1,  -2,   3,  -4,   5,  -6],
       [  7,  -8,   9, -10,  11, -12,  13],
       [-14,  15, -16,  17, -18,  19, -20],
       [ 21, -22,  23, -24,  25, -26,  27],
       [-28,  29, -30,  31, -32,  33, -34]])

In [None]:
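# np.where(mask, a, b) takes a where the mask is True and b elsewhere:
# even entries are kept as they are, odd entries are negated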

np.where(M % 2 == 0, M, -M)

array([[  0,  -1,   0,  -3,   0,  -5,   0],
       [ -7,   0,  -9,   0, -11,   0, -13],
       [  0, -15,   0, -17,   0, -19,   0],
       [-21,   0, -23,   0, -25,   0, -27],
       [  0, -29,   0, -31,   0, -33,   0]])

More indexing examples: **Integer array indexing**

In [None]:
my2darr = np.arange(1,13,1).reshape(4,3)
print(my2darr)

indices = [2,1,0,3]    # selected row indices
print(my2darr[indices,:])

rowIndex = [0,0,1,2,3]     # row index into my2darr
columnIndex = [0,2,0,1,2]  # column index into my2darr
print(my2darr[rowIndex,columnIndex])  # element-wise

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[ 7  8  9]
 [ 4  5  6]
 [ 1  2  3]
 [10 11 12]]
[ 1  3  4  8 12]


## Numpy Arithmetic and Statistical Functions

There are many built-in mathematical functions available for manipulating the elements of an ndarray.

In [None]:
y = np.array([-1.4, 0.4, -3.2, 2.5, 3.4])    # a fixed example vector
print(y)
print()

print(np.abs(y))          # convert to absolute values
print(np.sqrt(np.abs(y)))    # apply square root to each element
print(np.sign(y))         # get the sign of each element
print(np.exp(y))          # apply exponentiation: e^y
print(np.sort(y))         # sort array; return a sorted copy of an array
print(y)                  # y does not change

[-1.4  0.4 -3.2  2.5  3.4]

[1.4 0.4 3.2 2.5 3.4]
[1.18321596 0.63245553 1.78885438 1.58113883 1.84390889]
[-1.  1. -1.  1.  1.]
[ 0.24659696  1.4918247   0.0407622  12.18249396 29.96410005]
[-3.2 -1.4  0.4  2.5  3.4]
[-1.4  0.4 -3.2  2.5  3.4]


In [None]:
x = np.arange(-2,3)
y = np.random.randn(5)
print(x)
print(y)

print(np.add(x,y))           # element-wise addition       x + y
print(np.subtract(x,y))      # element-wise subtraction    x - y
print(np.multiply(x,y))      # element-wise multiplication x * y
print(np.divide(x,y))        # element-wise division       x / y
print(np.maximum(x,y))       # element-wise maximum        max(x,y)

[-2 -1  0  1  2]
[-0.5915882   0.86652443 -0.76206247 -0.18749133 -0.65986152]
[-2.5915882  -0.13347557 -0.76206247  0.81250867  1.34013848]
[-1.4084118  -1.86652443  0.76206247  1.18749133  2.65986152]
[ 1.1831764  -0.86652443 -0.         -0.18749133 -1.31972304]
[ 3.38073005 -1.15403555 -0.         -5.33357984 -3.03093897]
[-0.5915882   0.86652443  0.          1.          2.        ]


In [None]:
y = np.array([-3.2, -1.4, 0.4, 2.5, 3.4])    # a fixed example vector
print(y)

print("Min =", np.min(y))             # min
print("Max =", np.max(y))             # max
print("Average =", np.mean(y))        # mean/average
print("Std deviation =", np.std(y))   # standard deviation
print("Sum =", np.sum(y))             # sum

[-3.2 -1.4  0.4  2.5  3.4]
Min = -3.2
Max = 3.4
Average = 0.34000000000000014
Std deviation = 2.432776191925595
Sum = 1.7000000000000006


### More on filtering

In [None]:
M = np.arange(25).reshape(5,5)
print(M)
print(M[M%2==1])                        # filtering in general
print(np.argwhere(M >= 20))             # indexes satisfying condition
print(np.where(M % 2 == 1, M, 0))       # keep odd values, else 0 (the scalar 0 is broadcast)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
[ 1  3  5  7  9 11 13 15 17 19 21 23]
[[4 0]
 [4 1]
 [4 2]
 [4 3]
 [4 4]]
[[ 0  1  0  3  0]
 [ 5  0  7  0  9]
 [ 0 11  0 13  0]
 [15  0 17  0 19]
 [ 0 21  0 23  0]]


### New axis

- np.newaxis inserts a new axis of length 1, increasing the dimension of an existing array by one
- Shape change, for example: n x (newaxis) x m $\Rightarrow$ n x 1 x m

In [None]:
t = np.array([1,2,3])
print(t.shape)
x = t[:, np.newaxis]
print(x.shape)
y = t[np.newaxis, :]
print(y.shape)

(3,)
(3, 1)
(1, 3)


In [None]:
print(t)
print(x)
print(y)

[1 2 3]
[[1]
 [2]
 [3]]
[[1 2 3]]


**x and y are broadcast** against each other (broadcasting is explained below)

In [None]:
x + y

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

In [None]:
t = np.array([[1,2,3],[4,5,6]])
print(t[:, :, np.newaxis].shape)
print(t[:, :, np.newaxis])
print()

print(t[np.newaxis, :, :].shape)
print(t[np.newaxis, :, :])

print()
print(t[:, np.newaxis, :])

(2, 3, 1)
[[[1]
  [2]
  [3]]

 [[4]
  [5]
  [6]]]

(1, 2, 3)
[[[1 2 3]
  [4 5 6]]]

[[[1 2 3]]

 [[4 5 6]]]


### n-d array axis view
<img src="https://www.oreilly.com/library/view/elegant-scipy/9781491922927/assets/elsp_0105.png" width=700>

## Broadcasting

Broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

Examples:
```
A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5
```
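
The shape table above can be checked mechanically by adding zero-filled arrays of the listed shapes and inspecting the shape of the result. This is a small sketch; the `cases` and `results` names are illustrative, and the last case is an extra counter-example that is not in the table.

```python
import numpy as np

# (shape of A, shape of B, expected result shape), taken from the table above
cases = [
    ((5, 4), (1,), (5, 4)),
    ((5, 4), (4,), (5, 4)),
    ((15, 3, 5), (15, 1, 5), (15, 3, 5)),
    ((15, 3, 5), (3, 5), (15, 3, 5)),
    ((15, 3, 5), (3, 1), (15, 3, 5)),
]

results = []
for shape_a, shape_b, expected in cases:
    result = (np.zeros(shape_a) + np.zeros(shape_b)).shape
    results.append(result == expected)
    print(shape_a, "+", shape_b, "->", result)

# Counter-example: trailing dimensions 4 and 5 differ and neither is 1
try:
    np.zeros((5, 4)) + np.zeros((5,))
    failed = False
except ValueError as err:
    failed = True
    print("not broadcastable:", err)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.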

In [None]:
np.array([[1,2],[3,4]]) + np.array([[10]])

array([[11, 12],
       [13, 14]])

In [None]:
np.array([[1,2],[3,4]]) + np.array([[10,100]])

array([[ 11, 102],
       [ 13, 104]])

In [None]:
A = np.array([[1,2]])
B = np.array([[10],[100]])
print(A.shape, B.shape)
C = A + B
C

(1, 2) (2, 1)


array([[ 11,  12],
       [101, 102]])

In [None]:
X = np.array([[1]]*3) + np.array([[0]*10])
X

array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

In [None]:
a = np.array([[1],[2],[3]])
b = a.T
a + b

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

### Meshgrid

Build a D x N mesh grid for vectorized evaluation.

In [None]:
v = np.array([10,20,30])   # N
w = np.array([5,6])        # D
X, Y = np.meshgrid(v, w)
X + Y

array([[15, 25, 35],
       [16, 26, 36]])
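
To see what `np.meshgrid` actually returns, it helps to print the two grids and compare the result with plain broadcasting. A minimal sketch, reusing `v` and `w` from the cell above; `same` is an illustrative name.

```python
import numpy as np

v = np.array([10, 20, 30])   # N = 3 values
w = np.array([5, 6])         # D = 2 values

X, Y = np.meshgrid(v, w)     # default indexing='xy': both grids have shape (D, N)
print(X.shape, Y.shape)
print(X)                     # v repeated on every row
print(Y)                     # w repeated down every column

# The same sum via broadcasting, without materializing the two grids
same = w[:, np.newaxis] + v[np.newaxis, :]
print(np.array_equal(X + Y, same))
```

Broadcasting avoids allocating the full grids, while meshgrid is convenient when the grids themselves are needed, for example for plotting.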

### Axis ordering

- By definition, the axis number of the dimension is the index of that dimension within the array's shape. It is also the position used to access that dimension during indexing.

- For example, if a 2D array a has shape (5,6), then you can access a[0,0] up to a[4,5]. Axis 0 is thus the first dimension (the "rows"), and axis 1 is the second dimension (the "columns"). In higher dimensions, where "row" and "column" stop really making sense, try to think of the axes in terms of the shapes and indices involved.

- If you do np.sum(axis=n), for example, then dimension n is collapsed and deleted, with each value in the new matrix equal to the sum of the corresponding collapsed values. For example, if b has shape (5,6,7,8), and you do c = b.sum(axis=2), then axis 2 (dimension with size 7) is collapsed, and the result has shape (5,6,8). Furthermore, c[x,y,z] is equal to the sum of all elements b[x,y,:,z].
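
The (5,6,7,8) example can be verified directly. The sketch below builds such an array, collapses axis 2, and checks one entry against the explicit slice; the indices `x, y, z` are arbitrary, and `keepdims=True` is shown as an optional variation that is not discussed above.

```python
import numpy as np

b = np.arange(5 * 6 * 7 * 8).reshape(5, 6, 7, 8)

c = b.sum(axis=2)                        # axis 2 (size 7) is collapsed
print(c.shape)

x, y, z = 1, 2, 3                        # arbitrary indices into the result
print(c[x, y, z] == b[x, y, :, z].sum())

kept = b.sum(axis=2, keepdims=True)      # keeps the collapsed axis with length 1
print(kept.shape)
```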

In [None]:
X = np.array([[0,0,0], [1,1,1]])
X.shape

# axis 0 is row; axis 1 is column

(2, 3)

In [None]:
X.sum(axis=0)   # dimension 0 is collapsed and deleted; or aggregated over dimension 0

array([1, 1, 1])

In [None]:
X.sum(axis=1)   # dimension 1 is collapsed and deleted; or aggregated over dimension 1

array([0, 3])

In [None]:
X = np.array(range(1,24+1)).reshape(2,3,4)

In [None]:
X.shape

(2, 3, 4)

In [None]:
X.sum(axis=0)

array([[14, 16, 18, 20],
       [22, 24, 26, 28],
       [30, 32, 34, 36]])

In [None]:
X.sum(axis=1)

array([[15, 18, 21, 24],
       [51, 54, 57, 60]])

In [None]:
X.sum(axis=2)

array([[10, 26, 42],
       [58, 74, 90]])

In [None]:
X.sum(axis=(1,2))

array([ 78, 222])

In [None]:
X.sum(axis=(0,1,2))

300

In [None]:
# Compute the Euclidean distance between two 3-d points X and Y
X = np.array([0,0,0])
Y = np.array([1,1,1])

np.sqrt(np.sum((X - Y)**2))

1.7320508075688772

In [None]:
import numpy as np
X = np.array(np.arange(2*3, 0, -1).reshape(2,3))
print(X)
print()
print("axis=0\n", np.sort(X, axis=0))
print()
print("axis=-1\n", np.sort(X, axis=-1))
print()
print("default is -1\n", np.sort(X))
print()
print("axis=None\n", np.sort(X, axis=None))

[[6 5 4]
 [3 2 1]]

axis=0
 [[3 2 1]
 [6 5 4]]

axis=-1
 [[4 5 6]
 [1 2 3]]

default is -1
 [[4 5 6]
 [1 2 3]]

axis=None
 [1 2 3 4 5 6]


#### sort vs argsort vs partition vs argpartition

- argmin, argmax, ...

In [None]:
X = np.array([4,10,1,20,45,100,2,1])
print('X =\n', X)
print('sorted =\n', np.sort(X))
print('argsorted =\n', np.argsort(X))
print('partitioned first 3 =\n', np.partition(X, 3))
print('argpartitioned first 3 =\n', np.argpartition(X, 3))
print('partitioned last 3=\n', np.partition(X, -3))
print('argpartitioned last 3=\n', np.argpartition(X, -3))

X =
 [  4  10   1  20  45 100   2   1]
sorted =
 [  1   1   2   4  10  20  45 100]
argsorted =
 [2 7 6 0 1 3 4 5]
partitioned first 3 =
 [  2   1   1   4  45 100  10  20]
argpartitioned first 3 =
 [6 7 2 0 4 5 1 3]
partitioned last 3=
 [  2   1   1   4  10  20  45 100]
argpartitioned last 3=
 [6 7 2 0 1 3 4 5]
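
A common use of `np.argpartition` is selecting the k largest values without sorting the whole array. Partitioning only guarantees which elements land in the last k positions, not their order, so the sketch below sorts just those k candidates afterwards. The names `k` and `top_idx` are illustrative.

```python
import numpy as np

X = np.array([4, 10, 1, 20, 45, 100, 2, 1])
k = 3

# After partitioning at -k, the last k positions hold the k largest values (unordered)
top_idx = np.argpartition(X, -k)[-k:]

# Sort only those k candidates, largest first
top_idx = top_idx[np.argsort(X[top_idx])[::-1]]

print(top_idx)       # indices of the k largest values
print(X[top_idx])    # the values themselves, in descending order
```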


#### LAB
Sort the rows of a 2-d array T (along axis 0), using the sum of each row (along axis 1) as the sort key.

In [None]:
T = np.array([[2,2],[-1,10],[0,1]])
I = np.argsort(np.sum(T, axis=1))
T[I,:]

array([[ 0,  1],
       [ 2,  2],
       [-1, 10]])

## Vectorized function
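- np.vectorize is similar to the built-in map function: it applies a Python function element-wise over arrays
- It is a convenience, not a speed-up: the function is still called once per element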


In [None]:
import math

max(1,2,3,4,5,-100, key=lambda x: math.fabs(x))

-100

In [None]:
from functools import partial

mymax = partial(max, key=lambda x: math.fabs(x))

In [None]:
list(map(max, [10,2,3], [4,5,6]))

[10, 5, 6]

In [None]:
list(map(mymax, [-10,2,3], [4,5,-6]))

[-10, 5, -6]

In [None]:
u = np.array([100,2,3,4])
v = np.array([1,2,3,4])
w = np.array([4,3,2,1])
np.vectorize(max)(u, v, w)

array([100,   3,   3,   4])

In [None]:
dist = np.vectorize(lambda x, y: np.sqrt(x**2 + y**2))
dist(v, w)

array([4.12310563, 3.60555128, 3.60555128, 4.12310563])
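
When a native numpy function already exists, it is usually preferable to `np.vectorize`, which still calls the Python function once per element. The sketch below reproduces the two results above with `np.maximum.reduce` and `np.hypot`; it is one possible replacement, not the only one.

```python
import numpy as np

u = np.array([100, 2, 3, 4])
v = np.array([1, 2, 3, 4])
w = np.array([4, 3, 2, 1])

# Element-wise maximum across several arrays with the np.maximum ufunc
native_max = np.maximum.reduce([u, v, w])
print(native_max)

# np.hypot(x, y) computes sqrt(x**2 + y**2) element-wise
native_dist = np.hypot(v, w)
print(native_dist)

# Both agree with the np.vectorize versions
print(np.array_equal(native_max, np.vectorize(max)(u, v, w)))
```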

In [None]:
# Exercise: write a vectorized function that computes the Euclidean distance of 3-d points from the origin
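
One possible solution to this exercise, offered as a sketch rather than the intended answer: store the points as rows of an (n, 3) array and reduce over axis 1. The function name `dist_from_origin` and the sample points are made up for illustration.

```python
import numpy as np

def dist_from_origin(points):
    """Euclidean distance from the origin for each row of an (n, 3) array."""
    points = np.asarray(points, dtype=float)
    return np.sqrt(np.sum(points ** 2, axis=1))   # sum the squared coordinates per point

pts3d = np.array([[1, 2, 2], [0, 0, 0], [3, 4, 12]])
distances = dist_from_origin(pts3d)
print(distances)
```

No np.vectorize is needed here, because every operation involved already works element-wise on whole arrays.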


## Numpy linear algebra

Numpy provides many functions to support linear algebra operations.

In [None]:
import numpy as np

In [None]:
X = np.random.randn(2,3)    # create a 2 x 3 random matrix
print(X)
print(X.T)             # matrix transpose operation X^T

y = np.random.randn(3) # random vector
print(y)
print(X.dot(y))        # matrix-vector multiplication  X * y
print(np.dot(X,y))
print(X.dot(X.T))      # matrix-matrix multiplication  X * X^T
print(X.T.dot(X))      # matrix-matrix multiplication  X^T * X

[[-0.67521745 -0.25112232 -0.53902013]
 [-0.31444559  0.26792464 -0.91960302]]
[[-0.67521745 -0.31444559]
 [-0.25112232  0.26792464]
 [-0.53902013 -0.91960302]]
[-0.27857094  0.11301957 -0.0460988 ]
[0.1845624  0.16026873]
[0.1845624  0.16026873]
[[0.80952372 0.64072183]
 [0.64072183 1.01632936]]
[[ 0.55479463  0.08531445  0.65312091]
 [ 0.08531445  0.13484603 -0.11102433]
 [ 0.65312091 -0.11102433  1.13621241]]


In [None]:
y.dot(y)

0.09250029216474008

In [None]:
X = np.random.randn(5,3)
print(X)

C = X.T.dot(X)               # C = X^T * X is a square matrix

invC = np.linalg.inv(C)      # inverse of a square matrix
print(invC)
detC = np.linalg.det(C)      # determinant of a square matrix
print(detC)
S, U = np.linalg.eig(C)      # eigenvalues S and eigenvectors U (one per column) of a square matrix
print(S)
print(U)

[[-0.90775374  0.32656282  0.90444691]
 [ 1.40745745 -1.22468534 -0.88857523]
 [-0.82120479  0.43541336 -0.04238659]
 [-1.11534927 -0.17738222 -0.40865646]
 [ 0.4123681   0.898053   -1.278201  ]]
[[0.38888635 0.24298312 0.21976118]
 [0.24298312 0.53504544 0.10475245]
 [0.21976118 0.10475245 0.42019325]]
22.882373750364096
[7.02125181 1.19959342 2.71676736]
[[ 0.79300197 -0.59119484  0.14709362]
 [-0.35955504 -0.64908597 -0.67037868]
 [-0.49180082 -0.47872336  0.72729354]]


In [None]:
L = np.array([[2,0],[0,1]])
S, U = np.linalg.eig(L)
S, U

(array([2., 1.]),
 array([[1., 0.],
        [0., 1.]]))

In [None]:
v = np.array([1,1])

In [None]:
for _ in range(10):   # apply L ten times; the first component grows as 2**10
    v = L.dot(v)
v

array([1024,    1])
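
Repeated multiplication by L stretches a vector along the eigenvector with the largest eigenvalue, which is why the first component reaches 1024 = 2**10 after ten applications while the second stays at 1. Normalizing after every step turns this into power iteration. The sketch below is a minimal version under the assumption that L has a single dominant eigenvalue; the iteration count of 50 is an arbitrary choice.

```python
import numpy as np

L = np.array([[2.0, 0.0], [0.0, 1.0]])
v = np.array([1.0, 1.0])

for _ in range(50):
    v = L.dot(v)
    v = v / np.linalg.norm(v)    # renormalize so the entries do not grow without bound

eigenvalue = v.dot(L.dot(v))     # Rayleigh quotient of a unit vector
print(v)                         # approaches the dominant eigenvector
print(eigenvalue)                # approaches the dominant eigenvalue
```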

### The Frobenius norm

https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html

In [None]:
X = np.array([1,2])
print(np.linalg.norm(X))          # 2-norm
print(np.linalg.norm(X, ord=1))   # 1-norm
print(np.linalg.norm(X, ord=np.inf))          # inf-norm
print(np.linalg.norm(X, ord=-np.inf))          # -inf-norm

2.23606797749979
3.0
2.0
1.0


In [None]:
import math
x = np.array([1,0])
y = np.array([0,1])
print("cosine =", x.dot(y) / (math.sqrt((x.dot(x))) * math.sqrt((y.dot(y)))))
print("cosine =", x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y)))

cosine = 0.0
cosine = 0.0


### LAB: distance matrix

**In case of 1-d points**

In [None]:
pts = np.array([1.,2,3,4,5])
u = pts[:, np.newaxis]
v = pts.T[np.newaxis, :]
np.abs(u - v)

array([[0., 1., 2., 3., 4.],
       [1., 0., 1., 2., 3.],
       [2., 1., 0., 1., 2.],
       [3., 2., 1., 0., 1.],
       [4., 3., 2., 1., 0.]])

In [None]:
pts.T

array([1., 2., 3., 4., 5.])

**In case of n-d points**

In [None]:
pts = np.array([[1,0], [1,1], [0,1]])   # 2-d points
print(pts.shape)
u = pts[:, :, np.newaxis]
v = pts.T[np.newaxis, :, :]

(3, 2)


In [None]:
pts2 = np.array([[0,0], [1,1], [0,0]])   # a second point set; a new name keeps pts, u and v from the previous cell intact
print(pts2.shape)

(3, 2)


In [None]:
np.linalg.norm(pts2)   # no axis given: the Frobenius norm of the whole 2-d array

1.4142135623730951

In [None]:
print(v.shape)
print(u.shape)

(1, 2, 3)
(3, 2, 1)


In [None]:
np.sqrt(np.sum((u - v)**2, axis=1))

array([[0.        , 1.        , 1.41421356],
       [1.        , 0.        , 1.        ],
       [1.41421356, 1.        , 0.        ]])

In [None]:
np.linalg.norm(u - v, axis=1)

array([[0.        , 1.        , 1.41421356],
       [1.        , 0.        , 1.        ],
       [1.41421356, 1.        , 0.        ]])
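
An alternative layout, shown here only as a sketch, keeps the coordinate axis last so that `diff[i, j]` is the difference vector between point i and point j. The same `diff` array then yields several distance matrices; the names `diff`, `euclid`, `manhattan` and `chebyshev` are illustrative.

```python
import numpy as np

pts = np.array([[1, 0], [1, 1], [0, 1]])        # n = 3 points in 2-d

# diff[i, j, :] = pts[i] - pts[j]; shapes (3, 1, 2) and (1, 3, 2) broadcast to (3, 3, 2)
diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :]

euclid = np.sqrt(np.sum(diff ** 2, axis=-1))    # 2-norm distance
manhattan = np.sum(np.abs(diff), axis=-1)       # 1-norm distance
chebyshev = np.max(np.abs(diff), axis=-1)       # inf-norm distance

print(euclid)
print(manhattan)
print(chebyshev)
```

The intermediate array has n x n x d entries, so this approach suits small point sets; memory grows quadratically with the number of points.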

In [None]:
help(np.linalg.norm)

Help on function norm in module numpy.linalg:

norm(x, ord=None, axis=None, keepdims=False)
    Matrix or vector norm.
    
    This function is able to return one of eight different matrix norms,
    or one of an infinite number of vector norms (described below), depending
    on the value of the ``ord`` parameter.
    
    Parameters
    ----------
    x : array_like
        Input array.  If `axis` is None, `x` must be 1-D or 2-D, unless `ord`
        is None. If both `axis` and `ord` are None, the 2-norm of
        ``x.ravel`` will be returned.
    ord : {non-zero int, inf, -inf, 'fro', 'nuc'}, optional
        Order of the norm (see table under ``Notes``). inf means numpy's
        `inf` object. The default is None.
    axis : {None, int, 2-tuple of ints}, optional.
        If `axis` is an integer, it specifies the axis of `x` along which to
        compute the vector norms.  If `axis` is a 2-tuple, it specifies the
        axes that hold 2-D matrices, and the matrix norms of these

- In a real application, use **sklearn.metrics.pairwise** to compute pairwise distances

In [None]:
from sklearn.metrics.pairwise import euclidean_distances, manhattan_distances, cosine_similarity

print(euclidean_distances(pts))
print(manhattan_distances(pts))
print(cosine_similarity(pts))

[[0.         1.         1.41421356]
 [1.         0.         1.        ]
 [1.41421356 1.         0.        ]]
[[0. 1. 2.]
 [1. 0. 1.]
 [2. 1. 0.]]
[[1.         0.70710678 0.        ]
 [0.70710678 1.         0.70710678]
 [0.         0.70710678 1.        ]]


- To use your own distance function, pass it as the metric argument of pairwise_distances

In [None]:
from sklearn.metrics.pairwise import pairwise_distances

inf_dist = lambda x, y : np.max(np.abs(x - y))
print(pairwise_distances(pts, metric=inf_dist))

[[0. 1. 1.]
 [1. 0. 1.]
 [1. 1. 0.]]
