# NumPy
- NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. 

- A NumPy **array** is a grid of values that all share one data type (dtype). Values of different types can be stored only by falling back to `dtype=object`, which makes most array operations slower and less predictable. 
    - The number of dimensions is the **rank** of the array
    - The **shape** of an array is a tuple of integers giving the size of the array along each dimension
      * e.g. array([5, 2, 3]) has shape (3,) 
      * e.g. array([[5, 2, 3], [1,2,3]]) has shape (2,3)
    - NumPy arrays can be initialized from nested Python lists, and their elements are accessed with square brackets
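
As a small sketch of the points above (the variable names are illustrative and not part of the original notebook), the following shows rank, shape, element access, and what happens to the dtype when the inputs are mixed:

```python
import numpy as np

# A nested list becomes a 2-D array; ndim is the rank, shape the size per axis.
grid = np.array([[5, 2, 3], [1, 2, 3]])
print(grid.ndim, grid.shape)   # 2 (2, 3)
print(grid[1, 0])              # element in row 1, column 0 -> 1

# Mixed ints and floats are upcast to one common type.
mixed_numbers = np.array([1, 2.5])
print(mixed_numbers.dtype)     # float64

# Truly heterogeneous values need dtype=object; each element is then a
# plain Python object and most fast vectorized operations are lost.
mixed_objects = np.array([1, "two", 3.0], dtype=object)
print(mixed_objects.dtype)     # object
```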

![](https://raw.githubusercontent.com/devin19940107/My-Python-Notebook/master/supporting%20files/Images/matrix.jpg)

![](https://raw.githubusercontent.com/devin19940107/My-Python-Notebook/master/supporting%20files/Images/scalar-vector-matrix-tensor.jpeg)

![](https://raw.githubusercontent.com/devin19940107/My-Python-Notebook/master/supporting%20files/Images/dimension.png)

In [6]:
import numpy as np # always import numpy 1st

# Creating an array

In [13]:
# 0-D Array
arr = np.array(42)

print(arr)

42


In [14]:
# 1-D Array
data1 = [1,2,3,4,5,6,7,8,9]
arr1 = np.array(data1)
arr1

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [15]:
# 2D Array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [16]:
#3D array
arr3 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr3)

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]


# Dimension

In [18]:
print(arr.ndim)
print(arr1.ndim)
print(arr2.ndim)
print(arr3.ndim)

0
1
2
3


# Shape

* The shape is a tuple giving the size of the array along each dimension; its length equals the number of dimensions (the rank) reported by `ndim` above

In [35]:
print('42 shape is--->',arr.shape)
print('[1,2,3,4,5,6,7,8,9] shape is------>',arr1.shape)
print('[[1, 2, 3, 4], [5, 6, 7, 8]] shape is ----->',arr2.shape)
print('[[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]] shape is ----->',arr3.shape)


42 shape is---> ()
[1,2,3,4,5,6,7,8,9] shape is------> (9,)
[[1, 2, 3, 4], [5, 6, 7, 8]] shape is -----> (2, 4)
[[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]] shape is -----> (2, 2, 3)


In [135]:
# an array can also be reshaped, as long as the total number of elements stays the same
ary = np.array([[1,2,3],
                [3,2,1]])  # shape is 2x3

ary.reshape(3,2) # reshape to 3x2

array([[1, 2],
       [3, 3],
       [2, 1]])
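
Two details of `reshape` are easy to miss, so here is a minimal sketch (the names `tall` and `flat` are made up for this example): passing `-1` lets NumPy infer one dimension, and for a contiguous array such as this one the result is a view on the same data rather than a copy.

```python
import numpy as np

ary = np.array([[1, 2, 3],
                [3, 2, 1]])          # shape (2, 3)

# -1 asks NumPy to infer that dimension from the total number of elements.
tall = ary.reshape(-1, 2)            # inferred shape (3, 2)
flat = ary.reshape(-1)               # shape (6,)
print(tall.shape, flat.shape)        # (3, 2) (6,)

# For a contiguous array like this one, reshape returns a view, so writing
# through the reshaped array also changes the original.
tall[0, 0] = 99
print(ary[0, 0])                     # 99
```

If an independent array is needed, calling `.copy()` on the reshaped result avoids this aliasing.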

# Type & Dtype

In [56]:
print('type of arr:',type(arr))
print('dtype of arr:',arr.dtype)
print('----------------------------------')
print('type of arr1:',type(arr1))
print('dtype of arr1:',arr1.dtype)
print('----------------------------------')
print('type of arr2:',type(arr2))
print('dtype of arr2:',arr2.dtype)
print('----------------------------------')
print('type of arr3:',type(arr3))
print('dtype of arr3:',arr3.dtype)

type of arr: <class 'numpy.ndarray'>
dtype of arr: int64
----------------------------------
type of arr1: <class 'numpy.ndarray'>
dtype of arr1: int64
----------------------------------
type of arr2: <class 'numpy.ndarray'>
dtype of arr2: int64
----------------------------------
type of arr3: <class 'numpy.ndarray'>
dtype of arr3: int64


In [54]:
arr4 = np.array([[1,2,3,4,5,6,7,8.5]]) # if any element is a float, every element is upcast to float
print('type of arr4:',type(arr4))
print('dtype of arr4:',arr4.dtype)

type of arr4: <class 'numpy.ndarray'>
dtype of arr4: float64
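
The dtype can also be chosen explicitly instead of being inferred. The sketch below uses illustrative names; note as well that the `int64` shown above is the usual default on 64-bit Linux and macOS, while other platforms or older NumPy versions may typically report `int32`.

```python
import numpy as np

# Request a dtype explicitly instead of relying on inference.
small_floats = np.array([1, 2, 3], dtype=np.float32)
print(small_floats.dtype)            # float32

# astype returns a converted copy; float -> int truncates toward zero.
values = np.array([1.9, -1.9, 8.5])
as_ints = values.astype(np.int64)
print(as_ints)                       # [ 1 -1  8]
print(values.dtype)                  # float64 (the original is unchanged)
```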


# Special Matrix / Array
#### *zeros*


<font color = green>numpy.zeros(shape, dtype=float, order='C')</font>  
    
* shape : Shape of the new array
* dtype : Data-type of the returned array. [optional]
* order: 'C' - row-major (C-style) memory layout; 'F' - column-major (Fortran-style) memory layout. [optional]

In [61]:
np.zeros(5) # np.zeros(shape)

array([0., 0., 0., 0., 0.])

In [62]:
np.zeros((3,2)) 

array([[0., 0.],
       [0., 0.],
       [0., 0.]])
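
To make the `order` parameter concrete, here is a short sketch (variable names are illustrative). With `reshape`, the order decides how values are laid out; with constructors such as `np.zeros`, it only changes the memory layout.

```python
import numpy as np

values = np.arange(6)                           # [0 1 2 3 4 5]

row_major = values.reshape(2, 3, order="C")     # fill row by row
col_major = values.reshape(2, 3, order="F")     # fill column by column
print(row_major)   # [[0 1 2] [3 4 5]]
print(col_major)   # [[0 2 4] [1 3 5]]

# For constructors such as np.zeros, order only affects the memory layout,
# not the values or the shape.
z_c = np.zeros((3, 2), order="C")
z_f = np.zeros((3, 2), order="F")
print(z_c.flags["C_CONTIGUOUS"], z_f.flags["F_CONTIGUOUS"])   # True True
print(np.array_equal(z_c, z_f))                               # True
```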

#### *ones*


<font color = green>numpy.ones(shape, dtype=None, order='C')</font>  
    
* shape : Shape of the new array
* dtype : Data-type of the returned array. [optional]
* order: 'C' - row-major (C-style) memory layout; 'F' - column-major (Fortran-style) memory layout. [optional]

In [59]:
np.ones(5)# np.ones(shape)

array([1., 1., 1., 1., 1.])

In [63]:
np.ones((3,2)) 

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

#### *empty*

<font color = green>numpy.empty(shape, dtype=float, order='C')</font>  
    
* shape : Shape of the new array
* dtype : Data-type of the returned array. [optional]
* order: 'C' - row-major (C-style) memory layout; 'F' - column-major (Fortran-style) memory layout. [optional]

In [77]:
np.empty(5) # the values are uninitialized (whatever was in memory), not guaranteed to be zero

array([0., 0., 0., 0., 0.])

In [78]:
np.empty((3,3))

array([[ 2.31584178e+077, -2.68678219e+154,  3.95252517e-323],
       [ 0.00000000e+000,  0.00000000e+000,  0.00000000e+000],
       [ 0.00000000e+000,  0.00000000e+000,  0.00000000e+000]])

#### *constant*

<font color = green>numpy.full(shape, fill_value, dtype=None, order='C')</font>  
    
* shape : Shape of the new array
* fill_value : value to be filled
* dtype : Data-type of the returned array. [optional]
* order: 'C' - row-major (C-style) memory layout; 'F' - column-major (Fortran-style) memory layout. [optional]

In [80]:
np.full(5, 3) # np.full(shape, number)

array([3, 3, 3, 3, 3])

In [74]:
np.full((2,2), 7)

array([[7, 7],
       [7, 7]])

#### *identity matrix*
<font color = green>numpy.eye(N, M=None, k=0, dtype=float, order='C')</font>  
    
* N : Number of rows in the output.
* M : Number of columns in the output; defaults to N. [optional]
* k : Index of the diagonal: 0 is the main diagonal, positive values are above it, negative values below it. [optional]
* dtype : Data-type of the returned array. [optional]
* order: 'C' - row-major (C-style) memory layout; 'F' - column-major (Fortran-style) memory layout. [optional]

In [89]:
np.eye(2) # passing a single number generates a square matrix

array([[1., 0.],
       [0., 1.]])

In [86]:
np.eye(3,2) # passing N and M generates an N-by-M matrix

array([[1., 0.],
       [0., 1.],
       [0., 0.]])

In [88]:
np.eye(3,k=1)

array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])

# Operator

### Numeric Operator

In [111]:
[1,2,3]+[1,2,3] # list

[1, 2, 3, 1, 2, 3]

In [91]:
arr = np.array([[2., 4., 6.], [8., 10., 12.]])
arr # array

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

In [93]:
arr + arr

array([[ 4.,  8., 12.],
       [16., 20., 24.]])

In [105]:
arr+100

array([[102., 104., 106.],
       [108., 110., 112.]])

In [94]:
arr * arr

array([[  4.,  16.,  36.],
       [ 64., 100., 144.]])

In [106]:
arr*100

array([[ 200.,  400.,  600.],
       [ 800., 1000., 1200.]])

In [95]:
arr / arr

array([[1., 1., 1.],
       [1., 1., 1.]])

In [107]:
arr/100

array([[0.02, 0.04, 0.06],
       [0.08, 0.1 , 0.12]])

In [96]:
arr // arr

array([[1., 1., 1.],
       [1., 1., 1.]])

In [108]:
arr//100

array([[0., 0., 0.],
       [0., 0., 0.]])

In [97]:
arr % arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [110]:
arr % 3

array([[2., 1., 0.],
       [2., 1., 0.]])

In [104]:
arr ** arr

array([[4.00000000e+00, 2.56000000e+02, 4.66560000e+04],
       [1.67772160e+07, 1.00000000e+10, 8.91610045e+12]])

In [103]:
arr**2

array([[  4.,  16.,  36.],
       [ 64., 100., 144.]])

In [99]:
arr | arr # bitwise operators are not defined for float arrays (they work on integer and boolean arrays)
arr ^ arr

TypeError: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

### Array Operator

#### Transpose

In [112]:
arr.T # Transpose

array([[ 2.,  8.],
       [ 4., 10.],
       [ 6., 12.]])

#### Sum

arr.sum() # sum of all elements in the array

#### Multiplication

##### np.dot documentation 

    * If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).

    * If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.

    * If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.

    * If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.

    * If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b: dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
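
The cases listed above can be checked one by one. This is only a sketch with small made-up operands (`v`, `w`, `M`, `a`, `b` are not from the original notebook):

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
M = np.array([[1, 2, 3],
              [4, 5, 6]])            # shape (2, 3)

print(np.dot(v, w))       # 1-D with 1-D: inner product -> 32
print(np.dot(M, M.T))     # 2-D with 2-D: matrix product, same as M @ M.T
print(np.dot(3, v))       # scalar operand: same as 3 * v
print(np.dot(M, v))       # N-D with 1-D: sum over the last axis of M -> [14 32]

# N-D with M-D: last axis of a against the second-to-last axis of b.
a = np.ones((2, 3, 4))
b = np.ones((5, 4, 6))
print(np.dot(a, b).shape)  # (2, 3, 5, 6)
```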

In [120]:
np.matmul(arr,arr.T)

array([[ 56., 128.],
       [128., 308.]])

In [121]:
np.dot(arr, arr.T)

array([[ 56., 128.],
       [128., 308.]])

In [131]:
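# note: arr1 (shape (9,)) and arr2 (shape (2, 4)) as defined earlier cannot be multiplied;
# the output below comes from different 2-D operands whose definitions are not shown
np.dot(arr1, arr2)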

array([[35, 16, 30],
       [37, 20, 33]])

In [132]:
arr @ arr.T

array([[ 56., 128.],
       [128., 308.]])

#### concatenate

In [6]:
arr = np.random.randint(0,10,size=(3,3))
arr

array([[1, 9, 3],
       [1, 0, 6],
       [9, 6, 8]])

In [8]:
np.concatenate((arr,arr),axis = 0)

array([[1, 9, 3],
       [1, 0, 6],
       [9, 6, 8],
       [1, 9, 3],
       [1, 0, 6],
       [9, 6, 8]])

In [9]:
np.concatenate((arr,arr),axis = 1)

array([[1, 9, 3, 1, 9, 3],
       [1, 0, 6, 1, 0, 6],
       [9, 6, 8, 9, 6, 8]])
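
For 2-D input, `np.vstack` and `np.hstack` are shorthand for the two `concatenate` calls above, while `np.stack` adds a new axis. A minimal sketch, using a fixed array instead of the random one so the result is repeatable:

```python
import numpy as np

arr = np.array([[1, 9, 3],
                [1, 0, 6],
                [9, 6, 8]])

stacked_rows = np.concatenate((arr, arr), axis=0)   # shape (6, 3)
stacked_cols = np.concatenate((arr, arr), axis=1)   # shape (3, 6)

# vstack / hstack are shorthand for the two cases above on 2-D input.
print(np.array_equal(stacked_rows, np.vstack((arr, arr))))   # True
print(np.array_equal(stacked_cols, np.hstack((arr, arr))))   # True

# np.stack instead adds a new axis rather than extending an existing one.
print(np.stack((arr, arr)).shape)                            # (2, 3, 3)
```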

# Slicing / Indexing

In [169]:
# set ast_node_interactivity to "all" so that Jupyter displays the value of every expression in a cell, without the need for print statements
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### 1D array

In [184]:
arr = np.arange(20)
arr

arr = np.arange(1,20)
arr

arr = np.arange(1,20,2) # start,stop,step
arr


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

In [182]:
arr[1] # indexing
arr[2:8] # slicing
arr[2:8:2] # stepwise slicing (start,stop,step)
for idx,val in enumerate(arr): # we can also use a loop
    print(idx,val)

3

array([ 5,  7,  9, 11, 13, 15])

array([ 5,  9, 13])

0 1
1 3
2 5
3 7
4 9
5 11
6 13
7 15
8 17
9 19


### 2D array

In [190]:

a = np.array([[1,2,3,4], 
              [5,6,7,8], 
              [9,10,11,12]])


# get the first row
a[0,:]
a[0,]



# get the first column
a[:,0]
# a[,0] is a syntax error

# you can also index with lists (fancy indexing): this picks elements (0,0) and (2,1)
a[[0,2], [0,1]]


# shape of the first row
a[0].shape


# subarray
a[1][2] # 2nd row 3rd element, same as a[1,2]
a[:,1][2] # 2nd column 3rd element, same as a[2,1]


# loop through all rows
for idx,row in enumerate(a):
    print(idx, row)

array([1, 2, 3, 4])

array([1, 2, 3, 4])

array([1, 5, 9])

array([ 1, 10])

(4,)

7

10

0 [1 2 3 4]
1 [5 6 7 8]
2 [ 9 10 11 12]


In [191]:
a = np.array([[1,2,3,4], 
              [5,6,7,8], 
              [9,10,11,12]])
a

# last two rows and last two columns
a[-2:,-2:]


# reverse the column order with a negative step
a[:,::-1]

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

array([[ 7,  8],
       [11, 12]])

array([[ 4,  3,  2,  1],
       [ 8,  7,  6,  5],
       [12, 11, 10,  9]])

### 3D array

In [216]:

arr3 = np.array([[[1, 2, 3], 
                  [4, 5, 6]], 
                 [[7, 8, 9], 
                  [5, 6, 7]]])

arr3[0]

arr3[0][0]
arr3[0][0][0]

array([[1, 2, 3],
       [4, 5, 6]])

array([1, 2, 3])

1

In [228]:
arr3
arr3[1]
arr3[1][:,-2:]
arr3[1][:,-2:][0]
arr3[1][:,-2:][0][1]

array([[[1, 2, 3],
        [4, 5, 6]],

       [[7, 8, 9],
        [5, 6, 7]]])

array([[7, 8, 9],
       [5, 6, 7]])

array([[8, 9],
       [6, 7]])

array([8, 9])

9

### Boolean indexing

In [241]:
# 1D array
a = np.arange(10)
a
# Find the elements of "a" that are bigger than 2
bool_idx = (a > 2)  
bool_idx

# these two are identical
a[bool_idx]
a[a>2]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

array([3, 4, 5, 6, 7, 8, 9])

array([3, 4, 5, 6, 7, 8, 9])

In [242]:
# 2D array
a = np.array([[1,2],
              [3, 4], 
              [5, 6]])
a

# Find the elements of "a" that are bigger than 2;
bool_idx = (a > 2)  
bool_idx

# these two are identical
a[bool_idx]
a[a>2]

array([[1, 2],
       [3, 4],
       [5, 6]])

array([[False, False],
       [ True,  True],
       [ True,  True]])

array([3, 4, 5, 6])

array([3, 4, 5, 6])

In [244]:
# 3D array
a = np.array([[[1,2],
              [3, 4]], 
              [[5, 6],
               [7,8]]])
a


a[a>2]   # boolean indexing with a full-shape mask always returns a 1D array

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

array([3, 4, 5, 6, 7, 8])

In [246]:
a = np.array([[1,2],
              [3, 4], 
              [5, 6]])
a

# with only a condition, np.where returns a tuple of index arrays (one per axis): axis 0 holds the row indices and axis 1 the column indices
np.where(a>2) 

a[np.where(a>2)]

array([[1, 2],
       [3, 4],
       [5, 6]])

(array([1, 1, 2, 2]), array([0, 1, 0, 1]))

array([3, 4, 5, 6])

In [253]:
a = np.array([[4,2], 
              [3, 4], 
              [5, 0]])
print("before change:\n", a)

# if a value >3, set it to 3
a[np.where(a>3)]=3
print("after change:\n", a)



# if a value>3, set it to 1; otherwise, 0
a = np.array([[4,2], [3, 4], [5, 0]])
print("Binarized array:\n", np.where(a>3, 1, 0))

before change:
 [[4 2]
 [3 4]
 [5 0]]
after change:
 [[3 2]
 [3 3]
 [3 0]]
Binarized array:
 [[1 0]
 [0 1]
 [1 0]]


In [317]:
names = np.array(['python', 'java', 'excel', 'python', 'excel', 'java', 'java'])
names
data = np.arange(49).reshape(7,7)
data

array(['python', 'java', 'excel', 'python', 'excel', 'java', 'java'],
      dtype='<U6')

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41],
       [42, 43, 44, 45, 46, 47, 48]])

In [318]:
names == 'python' 
# the string itself does not matter; what we need is the boolean array the comparison produces
# array([ True, False, False,  True, False, False, False])


data[names == 'python'] # rowwise (if True, select this row)
data[0:4:3] # the same rows (0 and 3) selected with a slice


data[:,names == 'python'] # columnwise (if True, select this column)
data[:,0:4:3] # the same columns (0 and 3) selected with a slice


data[names == 'python', 2:]

array([ True, False, False,  True, False, False, False])

array([[ 0,  1,  2,  3,  4,  5,  6],
       [21, 22, 23, 24, 25, 26, 27]])

array([[ 0,  1,  2,  3,  4,  5,  6],
       [21, 22, 23, 24, 25, 26, 27]])

array([[ 0,  3],
       [ 7, 10],
       [14, 17],
       [21, 24],
       [28, 31],
       [35, 38],
       [42, 45]])

array([[ 0,  3],
       [ 7, 10],
       [14, 17],
       [21, 24],
       [28, 31],
       [35, 38],
       [42, 45]])

array([[ 2,  3,  4,  5,  6],
       [23, 24, 25, 26, 27]])

In [319]:
names != 'python'

# not / not equal 
data[~(names == 'python')] # not 
data[names != 'python'] # not equal


# or
OR = (names == 'python') | (names == 'java') # elementwise or
OR

data[OR]
data[:,OR]

array([False,  True,  True, False,  True,  True,  True])

array([[ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41],
       [42, 43, 44, 45, 46, 47, 48]])

array([[ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41],
       [42, 43, 44, 45, 46, 47, 48]])

array([ True,  True, False,  True, False,  True,  True])

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [21, 22, 23, 24, 25, 26, 27],
       [35, 36, 37, 38, 39, 40, 41],
       [42, 43, 44, 45, 46, 47, 48]])

array([[ 0,  1,  3,  5,  6],
       [ 7,  8, 10, 12, 13],
       [14, 15, 17, 19, 20],
       [21, 22, 24, 26, 27],
       [28, 29, 31, 33, 34],
       [35, 36, 38, 40, 41],
       [42, 43, 45, 47, 48]])

In [320]:
data[data < 21] =0
data

array([[ 0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41],
       [42, 43, 44, 45, 46, 47, 48]])

In [321]:
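# two boolean masks in one index are paired element by element,
# so only the positions (1,1), (2,2), (4,4), (5,5), (6,6) are set
data[names != 'python', names != 'python'] = 999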
data

array([[  0,   0,   0,   0,   0,   0,   0],
       [  0, 999,   0,   0,   0,   0,   0],
       [  0,   0, 999,   0,   0,   0,   0],
       [ 21,  22,  23,  24,  25,  26,  27],
       [ 28,  29,  30,  31, 999,  33,  34],
       [ 35,  36,  37,  38,  39, 999,  41],
       [ 42,  43,  44,  45,  46,  47, 999]])
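
The assignment above touches only five diagonal positions because the two masks are paired up like coordinates. If the intent is every combination of the selected rows and columns, `np.ix_` is one way to get it. A minimal sketch on a fresh copy of the data (the names `mask`, `paired`, and `block` are made up here):

```python
import numpy as np

names = np.array(['python', 'java', 'excel', 'python', 'excel', 'java', 'java'])
data = np.arange(49).reshape(7, 7)
mask = names != 'python'             # 5 True values

# Two masks in one index are paired up like coordinates: 5 elements.
paired = data[mask, mask]
print(paired)                        # [ 8 16 32 40 48]

# np.ix_ builds an open mesh, selecting the full 5x5 block instead.
block = data[np.ix_(mask, mask)]
print(block.shape)                   # (5, 5)
```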

In [379]:
# arrays can also be indexed with lists of integers (fancy indexing)

arr = np.arange(64,dtype = int).reshape(8,8)

arr

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

In [382]:
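arr[[4, 3, 0, 6]] # rowwise: rows 4, 3, 0, 6 in that order
arr[[-3, -5, -7]] # rowwise: negative indices count from the last row
arr[:,[4, 3, 0, 6]] # columnwise: columns 4, 3, 0, 6 in that order
arr[[1, 5, 7, 2], [0, 3, 1, 2]] # paired row and column indices: elements (1,0), (5,3), (7,1), (2,2)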

array([[32, 33, 34, 35, 36, 37, 38, 39],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [ 0,  1,  2,  3,  4,  5,  6,  7],
       [48, 49, 50, 51, 52, 53, 54, 55]])

array([[40, 41, 42, 43, 44, 45, 46, 47],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [ 8,  9, 10, 11, 12, 13, 14, 15]])

array([[ 4,  3,  0,  6],
       [12, 11,  8, 14],
       [20, 19, 16, 22],
       [28, 27, 24, 30],
       [36, 35, 32, 38],
       [44, 43, 40, 46],
       [52, 51, 48, 54],
       [60, 59, 56, 62]])

array([ 8, 43, 57, 18])
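
One practical difference between the indexing styles in this section is whether the result shares memory with the original array. Basic slicing returns a view, while integer-list and boolean indexing return copies. A small sketch (names are illustrative):

```python
import numpy as np

arr = np.arange(64).reshape(8, 8)

row_slice = arr[0, :]                # basic slicing -> view
row_slice[0] = -1
print(arr[0, 0])                     # -1, the original changed

picked = arr[[4, 3, 0, 6]]           # integer-list (fancy) indexing -> copy
picked[0, 0] = -100
print(arr[4, 0])                     # 32, the original is untouched

masked = arr[arr > 60]               # boolean indexing -> copy as well
print(np.shares_memory(arr, row_slice),
      np.shares_memory(arr, picked),
      np.shares_memory(arr, masked)) # True False False
```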

# Broadcasting

![](https://raw.githubusercontent.com/devin19940107/Python_Basic/master/supporting%20files/Images/numpybroadcasting.png)

1. Assume two arrays A and B. 
   - <font color='blue'>For example, A has shape (10000, 4) and B has shape (4,). A has rank 2, and B has rank 1</font>
2. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
    - <font color='blue'>Padding 1 to the left of the shape of B. So B's shape becomes (1,4)</font>
3. The two arrays are said to be **compatible** in a dimension **if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension**. If compatible, continue to Step 4; <font color='red'>otherwise stop and raise an error</font>
    
    - Compare the shapes of A and B in each dimension: 
        * <font color='blue'>Dimension 1: A is 10000  and B is 1 => compatible </font>   
        * <font color='blue'> Dimension 2: A is 4 and B is 4 => compatible</font> 
    - <font color='blue'> So, A and B are compatible in every dimension</font>
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
    - <font color='blue'> After broadcasting, B's shape => (10000,4)</font>
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension
    - <font color='blue'> Suppose B=([1, 0, 1, 0]). After broadcasting, B => [ [1, 0, 1, 0],[1, 0, 1, 0],... ]</font>
6. Apply array math using B after broadcasting
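
The steps above can be sketched as a small function that works on shape tuples only. `broadcast_shape` is a hypothetical helper written for this note, not a NumPy function; the last lines cross-check it against what NumPy actually produces for shapes (10000, 4) and (4,).

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shape tuples."""
    # Step 2: left-pad the shorter shape with 1s.
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        # Step 3: sizes must match, or one of them must be 1.
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
        # Step 4: where one size is 1, the other size wins.
        result.append(size_b if size_a == 1 else size_a)
    return tuple(result)

print(broadcast_shape((10000, 4), (4,)))     # (10000, 4)
print(broadcast_shape((3, 1), (1, 3)))       # (3, 3)

# Cross-check against what NumPy itself produces.
A = np.zeros((10000, 4))
B = np.array([1, 0, 1, 0])
print((A + B).shape)                         # (10000, 4)
```

Calling `broadcast_shape((4, 2), (4, 3))` raises a `ValueError`, which mirrors the error NumPy raises for incompatible operands.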

e.g.
- Normalization:
    - subtract the feature mean vector from every sample (row)
    - divide every sample (row) by the feature standard deviation vector (a 1-dimensional array)
    - a Python for loop over the rows is much less efficient than broadcasting
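
A minimal sketch of that normalization, assuming a made-up data matrix `X` with 1000 samples and 4 features (the seed and the distribution parameters are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed, chosen arbitrarily
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 4))   # 1000 samples, 4 features

mean = X.mean(axis=0)                       # shape (4,)
std = X.std(axis=0)                         # shape (4,)

# (1000, 4) against (4,): mean and std are broadcast across all rows.
X_norm = (X - mean) / std

# Equivalent explicit loop, shown only for comparison.
X_loop = np.empty_like(X)
for i in range(X.shape[0]):
    X_loop[i] = (X[i] - mean) / std

print(np.allclose(X_norm, X_loop))          # True
print(np.allclose(X_norm.mean(axis=0), 0))  # True: every feature has mean ~0
print(np.allclose(X_norm.std(axis=0), 1))   # True: every feature has std ~1
```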

### Row wise broadcasting

In [30]:

import time
A = np.array([0,1,2])

B = np.array([5,5,5])

C = 5


start=time.time()   # get starting time
A+B
print("A+B time used: %.4f s"%(time.time()-start))   # time.time() measures seconds


start=time.time()   # get starting time
A+C
print("A+C time used: %.4f s"%(time.time()-start))




array([5, 6, 7])

A+B time used: 0.0025 s


array([5, 6, 7])

A+C time used: 0.0025 s


In [32]:
# the elementwise product is also available
A*B
A*C

array([ 0,  5, 10])

array([ 0,  5, 10])

### Column wise broadcasting

In [27]:
A = np.ones((3,3)) #shape is 3 by 3
A

B = np.array([[0,1,2],
              [0,1,2],
              [0,1,2]])
B


C = np.arange(3)
C


start=time.time()   # get starting time
A+B
print("A+B time used: %.4f s"%(time.time()-start))   # time.time() measures seconds


start=time.time()   # get starting time
A+C
print("A+C time used: %.4f s"%(time.time()-start))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])

array([0, 1, 2])

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

A+B time used: 0.0034 s


array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

A+C time used: 0.0023 s


### Both rowwise and columnwise broadcasting

In [48]:
A = np.arange(3).reshape(3,1)
A
np.shape(A)

array([[0],
       [1],
       [2]])

(3, 1)

In [47]:
B = np.transpose(A)
B
np.shape(B)

array([[0, 1, 2]])

(1, 3)

In [50]:
C = A+B
C

C = A*B
C

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

array([[0, 0, 0],
       [0, 1, 2],
       [0, 2, 4]])

#  Sparse Matrix
- Why sparse matrix
    * In some matrices (e.g. a document-term matrix), most of the elements are zero. These matrices are called **sparse matrices**, while the ones that have mostly non-zero elements are called **dense matrices**.
    * These matrices are usually very large, and a dense representation spends memory on every element, including all the zeros
- Sparse matrix format: a representation that **only stores the non-zero elements** (and their positions). 
- The SciPy package provides different types of sparse matrix. Commonly used types:
    * csc_matrix: Compressed Sparse Column format
    * csr_matrix: Compressed Sparse Row format
- Sparse matrices can be manipulated in almost the same way as dense matrices. Check https://docs.scipy.org/doc/scipy/reference/sparse.html for functions for sparse matrices.

![](https://raw.githubusercontent.com/devin19940107/Python_Basic/master/supporting%20files/Images/sparse_dense.gif)

In [64]:
from scipy.sparse import csr_matrix
A = csr_matrix([[1, 0, 0],
                [0, 0, 3], 
                [4, 0, 5]])
print(A)
print('-----------')
print(A[2,1])
print('-----------')
A.shape

  (0, 0)	1
  (1, 2)	3
  (2, 0)	4
  (2, 2)	5
-----------
0
-----------


(3, 3)
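
To see what "only stores non-zero elements" means for the CSR format, the stored arrays can be inspected directly. This is a sketch using the same 3x3 example; `dense` and `sparse` are names chosen here for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 0],
                  [0, 0, 3],
                  [4, 0, 5]])
sparse = csr_matrix(dense)

print(sparse.nnz)        # number of stored entries -> 4
print(sparse.data)       # the non-zero values, row by row -> [1 3 4 5]
print(sparse.indices)    # column index of each stored value -> [0 2 0 2]
print(sparse.indptr)     # where each row starts in data -> [0 1 2 4]

# Converting back gives the original dense array.
print(np.array_equal(sparse.toarray(), dense))   # True
```

Row `i` owns the entries `data[indptr[i]:indptr[i+1]]`, which is why row slicing and matrix-vector products are efficient in this format.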

In [67]:
A = csr_matrix([[1, 0, 0],
                [0, 0, 3], 
                [4, 0, 5]])
v = np.array([1, 0, -1])


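A.dot(v) # matrix-vector product
A*v # for a csr_matrix, * is also matrix multiplication, not elementwise multiplication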

array([ 1, -3, -1], dtype=int64)

array([ 1, -3, -1], dtype=int64)

# Application

### Data processing

In [55]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

<font color = green>numpy.where(condition, x, y)</font>  
    
* Where condition is True, yield x, otherwise yield y.


In [59]:
# for loop
start=time.time()   # get starting time
result = []
for x, y, c in zip(xarr, yarr, cond):
    if c:
        result.append(x)
    else:
        result.append(y)

result
print("for loop time used: %.4f s"%(time.time()-start))   # time.time() measures seconds




# numpy
start=time.time()
result = np.where(cond, xarr, yarr)
result
print("numpy time used: %.4f s"%(time.time()-start))

[1.1, 2.2, 1.3, 1.4, 2.5]

for loop time used: 0.0024 s


array([1.1, 2.2, 1.3, 1.4, 2.5])

numpy time used: 0.0024 s


In [70]:
arr = np.random.randn(4, 4)
arr
np.where(arr > 0, 2, -2) # 2 where positive, -2 elsewhere
np.where(arr > 0, 'OUTLIER', arr) # mixing strings and floats converts every element to a string

array([[ 0.85564781, -0.79985871, -1.01019584, -0.74392529],
       [-0.71751125, -1.97516522,  1.08398595, -0.4256654 ],
       [ 0.42949003,  1.79076539,  0.58889255, -1.02190624],
       [-0.02777351, -0.88764049,  1.11867144, -0.47678757]])

array([[ 2, -2, -2, -2],
       [-2, -2,  2, -2],
       [ 2,  2,  2, -2],
       [-2, -2,  2, -2]])

array([['OUTLIER', '-0.7998587059229658', '-1.0101958449535045',
        '-0.7439252882396051'],
       ['-0.7175112488238211', '-1.9751652209907944', 'OUTLIER',
        '-0.4256653996961279'],
       ['OUTLIER', 'OUTLIER', 'OUTLIER', '-1.0219062355442414'],
       ['-0.02777350654262473', '-0.8876404938587006', 'OUTLIER',
        '-0.4767875666613019']], dtype='<U32')

### sorting

In [44]:
# Slow sorting: pure-Python nested loops (sorts in descending order, in place)
arr = np.random.randn(1000)


def sortn (nd):
    for i in range(nd.size):
        for j in range(i,nd.size):
            if nd[j]>nd[i]:
                nd[i],nd[j] = nd[j],nd[i]
    return nd



start = time.time()
sortn(arr)
print("nested loop time used: %.4f s"%(time.time()-start))

nested loop time used: 0.2658 s


In [45]:
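# faster: selection sort where np.argmin replaces the inner Python loop
# (still O(n^2) comparisons, but the inner scan runs in compiled code; sorts ascending, in place)
arr = np.random.randn(1000)


def sortn(nd):
    for i in range(nd.size):
        min_index = np.argmin(nd[i:])+i
        nd[i],nd[min_index] = nd[min_index],nd[i]
    return nd




start = time.time()
sortn(arr)
print("argmin sort time used: %.4f s"%(time.time()-start))

argmin sort time used: 0.0042 s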


In [48]:
# np.sort

arr = np.random.randn(1000)
start = time.time()
np.sort(arr)
print("np.sort() time used: %.4f s"%(time.time()-start))
print(arr) # np.sort returns a sorted copy; the original arr is unchanged

np.sort() time used: 0.0001 s
[-3.35597300e-01 -4.05781506e-01  2.68832211e-01 -1.27208281e+00
 -2.45088762e+00  1.49787080e+00  1.19133038e+00  9.32494071e-01
  3.67306030e-01  6.90398016e-01  1.26275768e+00 -1.22543283e+00
 -1.54236117e+00 -2.83564475e-01 -5.34455473e-01 -1.69529712e+00
  8.23191386e-01  6.06211353e-01 -1.01311499e+00 -2.15938689e+00
 -2.09206414e+00  2.35013379e-01 -5.67358743e-01 -5.37833129e-01
  1.66210698e+00 -1.02590482e+00  1.64577550e+00  3.72441218e-01
 -1.22367700e+00 -2.94434635e-03  2.20152241e-01 -4.09081872e-01
 -1.77392513e+00 -4.24814316e-01  6.80458280e-02 -5.65401452e-01
  4.09950950e-01  1.06274862e+00 -8.12913621e-01 -7.23046354e-01
  2.85352931e-01 -5.44018456e-02 -2.45923609e+00  8.87966625e-01
  1.14385906e+00  1.57797385e-01 -2.48292731e+00 -8.86146107e-02
  1.85870170e+00  3.13082414e-01 -3.46648046e-01  3.61419247e-01
  8.62328857e-01  8.88765491e-02  6.19158261e-02 -8.03776649e-01
  3.22212241e-02  1.02325288e-01 -1.80246787e-01  5.0469549

In [49]:
# ndarray.sort()
arr = np.random.randn(1000)


start = time.time()
arr.sort()
print("arr.sort() time used: %.4f s"%(time.time()-start))


arr # the original arr is now sorted in place


arr.sort() time used: 0.0002 s


array([-3.27515294e+00, -2.90001730e+00, -2.85068949e+00, -2.79338044e+00,
       -2.74380365e+00, -2.72647989e+00, -2.64221537e+00, -2.47752038e+00,
       -2.40113032e+00, -2.32354629e+00, -2.29923642e+00, -2.24576198e+00,
       -2.19938580e+00, -2.17231046e+00, -2.14476048e+00, -2.12918557e+00,
       -2.08169269e+00, -2.06245955e+00, -2.03763474e+00, -2.03347448e+00,
       -2.02831463e+00, -2.01311040e+00, -1.99378048e+00, -1.98103067e+00,
       -1.93667774e+00, -1.90911463e+00, -1.89210686e+00, -1.88162184e+00,
       -1.85891114e+00, -1.85481758e+00, -1.83093842e+00, -1.78441871e+00,
       -1.73975816e+00, -1.73662597e+00, -1.73484759e+00, -1.70302398e+00,
       -1.70166868e+00, -1.69188340e+00, -1.68361687e+00, -1.66503873e+00,
       -1.66302060e+00, -1.64907216e+00, -1.64699467e+00, -1.62341627e+00,
       -1.61613368e+00, -1.61411202e+00, -1.59763237e+00, -1.59441921e+00,
       -1.59307231e+00, -1.59279989e+00, -1.59027331e+00, -1.58554563e+00,
       -1.57833415e+00, -
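
Two related tools are worth knowing next to `np.sort` and `ndarray.sort`: reversing the result for a descending order, and `np.argsort` for the indices that would sort the array. A small sketch with a made-up `scores` array:

```python
import numpy as np

scores = np.array([0.3, -1.2, 2.5, 0.0])

ascending = np.sort(scores)          # new sorted array; scores is unchanged
descending = np.sort(scores)[::-1]   # reverse the ascending result
order = np.argsort(scores)           # indices that would sort the array

print(ascending)        # [-1.2  0.   0.3  2.5]
print(descending)       # [ 2.5  0.3  0.  -1.2]
print(order)            # [1 3 0 2]
print(scores[order])    # same values as ascending
```

`argsort` is useful when a second array (for example, labels) has to be reordered in the same way as the sorted values.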

### unique

In [77]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

### Null

In [54]:
values = np.array([np.nan,None,None,np.nan]) # mixing None and np.nan gives an array with dtype=object
values
None == np.nan # None and np.nan are different objects, so this is False

False
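
Because NaN does not compare equal to anything, including itself, missing values in a float array are normally detected with `np.isnan` rather than `==`. A minimal sketch on a purely numeric array (for an object array containing `None`, `np.isnan` is not applicable):

```python
import numpy as np

print(np.nan == np.nan)                    # False: NaN never equals anything
values = np.array([1.0, np.nan, 3.0, np.nan])

missing = np.isnan(values)                 # boolean mask of missing entries
print(missing)                             # [False  True False  True]
print(values[~missing])                    # [1. 3.]

print(np.mean(values))                     # nan, because NaN propagates
print(np.nanmean(values))                  # 2.0, NaN entries are skipped
```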

### math

In [98]:
arr = np.arange(10).reshape(2,5)
arr
np.sqrt(arr)
np.exp(arr)
np.mean(arr) # mean over all elements; same as arr.mean()
arr.mean(axis=1) # mean of each row
np.sum(arr) # same as arr.sum()
arr.cumsum(1) # cumulative sum along axis 1 (within each row)
np.argmax(np.random.rand(10)) # index of the largest of 10 random values

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

array([[0.        , 1.        , 1.41421356, 1.73205081, 2.        ],
       [2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ]])

array([[1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
        5.45981500e+01],
       [1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
        8.10308393e+03]])

4.5

array([2., 7.])

45

array([[ 0,  1,  3,  6, 10],
       [ 5, 11, 18, 26, 35]])

9

### random generator

In [89]:
np.random.randn(8)
np.random.uniform(0,1,100) # low, high ,size

array([-1.24592081, -1.14838323, -0.34101015, -1.31555185, -1.99078395,
        0.01089737,  0.58850068, -2.64025053])

array([7.01012383e-01, 7.70919627e-01, 1.84452677e-01, 2.11112089e-01,
       8.96553575e-01, 2.81167515e-01, 1.83889362e-01, 9.39859061e-01,
       6.56034379e-01, 7.07878676e-01, 3.36663752e-01, 8.75451993e-01,
       2.60505984e-01, 9.60481589e-01, 4.07986477e-01, 5.85377708e-01,
       1.75011917e-01, 3.71385828e-01, 8.21248735e-01, 2.07461010e-01,
       1.08928815e-01, 8.67280308e-01, 6.30097555e-01, 4.30170450e-01,
       5.59282646e-01, 2.59286115e-01, 2.44111419e-01, 6.28525722e-01,
       3.38503111e-01, 4.11258784e-01, 9.57953809e-01, 8.14345551e-01,
       8.49534567e-01, 6.74070106e-01, 3.53817566e-01, 7.57475522e-01,
       4.10147135e-01, 2.21146657e-02, 8.51900331e-01, 9.97095619e-01,
       5.23066365e-01, 7.82072128e-01, 1.35703375e-01, 4.07725137e-01,
       9.16621790e-01, 1.48011236e-01, 8.57530466e-01, 4.82218179e-01,
       3.06408140e-01, 9.60214736e-01, 4.94124147e-01, 6.88466110e-01,
       7.65750455e-01, 5.53201991e-01, 9.09107923e-01, 5.35225508e-01,
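
The outputs above change on every run. When repeatable numbers are needed, one option is a seeded `Generator` from `np.random.default_rng` (available in NumPy 1.17 and later). The seed value 42 and the variable names below are arbitrary choices for this sketch.

```python
import numpy as np

rng_a = np.random.default_rng(42)          # 42 is an arbitrary seed
rng_b = np.random.default_rng(42)

normal_a = rng_a.standard_normal(8)        # like np.random.randn(8)
normal_b = rng_b.standard_normal(8)
print(np.array_equal(normal_a, normal_b))  # True: same seed, same numbers

uniform = rng_a.uniform(0, 1, 100)         # low, high, size
print(uniform.shape)                       # (100,)
print(uniform.min() >= 0, uniform.max() < 1)   # True True

ints = rng_a.integers(0, 10, size=(3, 3))  # like np.random.randint(0, 10, ...)
print(ints.shape)                          # (3, 3)
```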
      

### linear algebra

In [99]:
x = np.arange(1,13).reshape(3,4,order='f')
print(x,np.shape(x))

[[ 1  4  7 10]
 [ 2  5  8 11]
 [ 3  6  9 12]] (3, 4)


In [100]:
theta = [[11],
         [111],
         [1111],
         [11111]]
print(theta,'.....shape is:',np.shape(theta))

[[11], [111], [1111], [11111]] .....shape is: (4, 1)


#### Broadcast

![](https://raw.githubusercontent.com/devin19940107/My-Python-Notebook/master/supporting%20files/Images/multiply.png)

In [104]:
elementwise = theta*np.transpose(x)

elementwise


h = sum(elementwise) # the built-in sum adds the rows together, same as elementwise.sum(axis=0)
print(h,'.....shape is:',np.shape(h))

h[0]

array([[    11,     22,     33],
       [   444,    555,    666],
       [  7777,   8888,   9999],
       [111110, 122221, 133332]])

[119342 131686 144030] .....shape is: (3,)


119342

In [107]:
theta2 = [[11,22],[111,222],[1111,2222],[11111,22222]]
print(theta2,'.....shape is:',np.shape(theta2))


elementwise = theta2*np.transpose(x)

elementwise

[[11, 22], [111, 222], [1111, 2222], [11111, 22222]] .....shape is: (4, 2)


ValueError: operands could not be broadcast together with shapes (4,2) (4,3) 

#### Dot multiplication

![](https://raw.githubusercontent.com/devin19940107/My-Python-Notebook/master/supporting%20files/Images/dot%20multiply.png)


In [109]:
new_h = np.dot(x,theta)
print(new_h,'.....shape is:',np.shape(new_h))

new_h[0]

[[119342]
 [131686]
 [144030]] .....shape is: (3, 1)


array([119342])
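
The broadcast-then-sum result and the dot product hold the same numbers, only with different shapes. A short check, rebuilding `x` and `theta` as above (`theta` is wrapped in `np.array` here for clarity):

```python
import numpy as np

x = np.arange(1, 13).reshape(3, 4, order='F')
theta = np.array([[11], [111], [1111], [11111]])    # shape (4, 1)

h_broadcast = (theta * x.T).sum(axis=0)             # shape (3,)
h_dot = np.dot(x, theta)                            # shape (3, 1)

print(h_broadcast)                 # [119342 131686 144030]
print(h_dot.ravel())               # same three values
print(np.array_equal(h_broadcast, h_dot.ravel()))   # True
```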

#### Inverse matrix

![](https://raw.githubusercontent.com/devin19940107/Python_Basic/master/supporting%20files/Images/inverse-matrix.png)

In [115]:
from numpy.linalg import inv
X = np.random.randn(5, 5)
X
inv(X)

X.dot(inv(X)) # the identity matrix, up to floating-point rounding error

array([[ 0.09930957,  0.92230259,  0.29155861, -1.08421838,  0.9159068 ],
       [-1.29092659, -0.49192642, -0.43489701, -0.09778525,  1.07121039],
       [-0.13443588, -0.59111017, -0.35507519, -0.48991899, -0.09717638],
       [ 0.54965085, -0.45946465,  1.54141626, -1.27617439, -1.01256013],
       [-0.09565208, -0.76534384, -0.07218208,  1.71808316, -2.18743042]])

array([[-1.25741043, -0.81933074, -0.07379034, -0.05247913, -0.90016002],
       [ 1.50590684, -0.05108177, -0.0370584 , -0.27343079,  0.73374554],
       [-0.54682972,  0.3408631 , -1.22557549,  0.55368137, -0.26389303],
       [-0.85271525,  0.02681703, -0.95101377, -0.06297627, -0.27251046],
       [-1.1236138 ,  0.06351539, -0.69032392,  0.03022906, -0.87985095]])

array([[ 1.00000000e+00,  1.52457721e-17, -1.44650070e-16,
         3.70171992e-17, -2.14714476e-16],
       [ 6.73428352e-17,  1.00000000e+00,  2.16272108e-17,
        -9.51112733e-18,  1.60893767e-16],
       [ 9.46372928e-17,  2.38641621e-17,  1.00000000e+00,
        -1.86710510e-17,  1.09442650e-16],
       [-1.30356194e-16, -7.82139114e-18,  1.31159288e-16,
         1.00000000e+00, -6.30892508e-17],
       [-9.89392718e-17, -1.99101986e-17,  9.71846467e-17,
        -4.11674137e-17,  1.00000000e+00]])
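
Since the product above equals the identity only up to rounding error, comparisons are better done with `np.allclose` than with `==`. The sketch below also shows `np.linalg.solve`, which is usually preferred over forming the inverse when the goal is to solve a linear system; the seed and the names `b`, `w_solve`, `w_inv` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)             # arbitrary seed for repeatability
X = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

# X @ inv(X) equals the identity only up to rounding error.
identity_check = X @ np.linalg.inv(X)
print(np.allclose(identity_check, np.eye(5)))

# To solve X w = b, np.linalg.solve avoids forming the inverse explicitly.
w_solve = np.linalg.solve(X, b)
w_inv = np.linalg.inv(X) @ b
print(np.allclose(w_solve, w_inv))
print(np.allclose(X @ w_solve, b))
```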

