# A. Basics of NumPy: creating and manipulating numerical data

## A.1 Declaring NumPy arrays and their properties

### Import the NumPy library under the alias `np`

In [1]:
import numpy as np

### Creating arrays manually

#### 1D array

In [2]:
a = np.array([0, 1, 2, 3])
print('a =\n', a)
print('a.ndim =', a.ndim)
print('a.shape =', a.shape)
print('len(a) =', len(a))


a =
 [0 1 2 3]
a.ndim = 1
a.shape = (4,)
len(a) = 4


#### 2D array

In [3]:
b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
print('b =\n', b)
print('b.ndim =', b.ndim)
print('b.shape =', b.shape)
print('len(b) =', len(b))

b =
 [[0 1 2]
 [3 4 5]]
b.ndim = 2
b.shape = (2, 3)
len(b) = 2


#### 3D array

In [4]:
c = np.array([[[1], [2]], [[3], [4]]])
print('c =\n', c)
print('c.ndim =', c.ndim)
print('c.shape =', c.shape)
print('len(c) =', len(c))

c =
 [[[1]
  [2]]

 [[3]
  [4]]]
c.ndim = 3
c.shape = (2, 2, 1)
len(c) = 2
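Besides `ndim`, `shape`, and `len`, every array also has a `size` attribute (the total number of elements). The following minimal sketch checks how these properties relate for the 3D array `c` above. The relationships hold for any array with at least one dimension.

```python
import numpy as np

c = np.array([[[1], [2]], [[3], [4]]])

# size is the product of all entries of shape
print('c.size =', c.size)
print('np.prod(c.shape) =', np.prod(c.shape))

# len() only looks at the first axis
print('len(c) == c.shape[0]:', len(c) == c.shape[0])

# ndim is the length of the shape tuple
print('c.ndim == len(c.shape):', c.ndim == len(c.shape))
```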


### Functions for creating arrays

#### Evenly spaced: `arange` with `start, end (exclusive), step`

In [5]:
a = np.arange(10) # 0 .. n-1  (!)
print(a)

[0 1 2 3 4 5 6 7 8 9]


In [6]:
b = np.arange(1, 9, 2) # start, end (exclusive), step
print(b)

[1 3 5 7]


#### By number of points: `linspace` with `start, end, num-points`

In [7]:
c = np.linspace(0, 1, 6)   # start, end, num-points
print(c)

[0.  0.2 0.4 0.6 0.8 1. ]


In [8]:
d = np.linspace(0, 1, 5, endpoint=False)
print(d)

[0.  0.2 0.4 0.6 0.8]
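`linspace` can also return the spacing it used if you pass `retstep=True`. With the default `endpoint=True`, the step is `(end - start) / (num - 1)`. With `endpoint=False`, it is `(end - start) / num`. Here is a short sketch using the same arguments as above:

```python
import numpy as np

# endpoint=True (default): 6 points -> 5 intervals of width (1 - 0) / 5
c, step_c = np.linspace(0, 1, 6, retstep=True)
print(c, step_c)

# endpoint=False: 5 points, width (1 - 0) / 5, end value excluded
d, step_d = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(d, step_d)

# With an integer step, arange and linspace can describe the same values
print(np.array_equal(np.arange(0, 10, 2), np.linspace(0, 8, 5)))
```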


#### Special arrays

* `ones`: all values are 1
* `zeros`: all values are 0
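* `eye`: identity matrix, i.e. ones on the main diagonal and zeros elsewhere (`np.eye(N, M)` also allows non-square shapes)
* `diag`: given a 1D array, builds a square matrix with those values on the diagonal (given a 2D array, it extracts the diagonal instead)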

In [9]:
a = np.ones((3, 3)) # reminder: (3, 3) is a tuple
print(a)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [10]:
b = np.zeros((2, 2))
print(b)

[[0. 0.]
 [0. 0.]]


In [11]:
c = np.eye(3)
print(c)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [12]:
d = np.diag(np.array([1, 2, 3, 4]))
print(d)

[[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]
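The following sketch shows two more options of these functions. `np.eye` takes an optional second size for non-square matrices and a `k` offset for off-diagonals. `np.diag` works in both directions: a 1D input builds a matrix, and a 2D input returns its diagonal.

```python
import numpy as np

# Non-square "identity": ones on the main diagonal of a 2 x 3 matrix
e = np.eye(2, 3)
print(e)

# k=1 moves the ones to the first diagonal above the main one
print(np.eye(3, k=1))

# 1D input -> square matrix with these values on the diagonal
d = np.diag(np.array([1, 2, 3, 4]))

# 2D input -> 1D array holding its diagonal
print(np.diag(d))
```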


#### Random arrays

The `np.random` module provides several functions for creating arrays of random values:

![Random array functions in NumPy](http://drive.google.com/uc?export=view&id=1U4z3B2S-d_s-SSXk0uPtyPZz2byyD0VI)

* `randint` with `low, high (exclusive), size`: random integers
* `rand` with the output dimensions (e.g. `rand(4)` or `rand(2, 3)`): floats drawn uniformly from [0, 1)
* `randn` with the output dimensions: samples from the standard normal distribution (mean 0, variance 1)

In [13]:
np.random.seed(1234)    # Setting the random seed

In [14]:
a = np.random.randint(1,10,size=[3,5])
print(a)

[[4 7 6 5 9]
 [2 8 7 9 1]
 [6 1 7 3 1]]


In [15]:
b = np.random.rand(4)   # uniform in [0, 1)
print(b)

[0.44266265 0.02214391 0.29072855 0.24639444]


In [16]:
c = np.random.randn(4)  # Gaussian
print(c)

[ 0.5153869   0.31552321 -0.9821457  -2.20691316]
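The functions above use NumPy's legacy global random state. Since NumPy 1.17, the documentation recommends creating an explicit `Generator` with `np.random.default_rng`. Its methods `integers`, `random`, and `standard_normal` roughly correspond to `randint`, `rand`, and `randn`. A `Generator` seeded with 1234 produces a different stream from `np.random.seed(1234)`, so this sketch will not reproduce the numbers printed above.

```python
import numpy as np

rng = np.random.default_rng(1234)      # independent, explicitly seeded generator

a = rng.integers(1, 10, size=(3, 5))   # ints in [1, 10), like randint
b = rng.random(4)                      # floats uniform in [0, 1), like rand
c = rng.standard_normal(4)             # standard normal samples, like randn

print(a)
print(b)
print(c)
```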


### Data types

Commonly used NumPy data types (`np.dtype`) include the following; see the [NumPy documentation](https://numpy.org/devdocs/user/basics.types.html) for the full list:
* `np.int32`
* `np.int64`
* `np.float32`
* `np.float64`
* `np.complex128`
* `np.bool_` (or plain `bool`; the alias `np.bool` raises an error in NumPy 1.24 to 1.26)
* `np.uint32`
* `np.uint64`
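You can request a dtype when creating an array with the `dtype` argument, and convert it afterwards with `astype`, which returns a new array. The following minimal sketch shows both. Note that converting floats to integers truncates toward zero.

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.float32)     # request a dtype at creation
print(x.dtype)

y = np.array([1.7, -1.7])                     # Python floats -> float64
z = y.astype(np.int32)                        # new array; fractional part dropped
print(z, z.dtype)

flags = np.array([0, 1, 2]).astype(np.bool_)  # zero -> False, non-zero -> True
print(flags)
```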

### Indexing and slicing

#### Indexing

In [17]:
a = np.arange(10)
print('a=\n',a)
print('a[0]=%d, a[2]=%d, a[-1]=%d'%(a[0], a[2], a[-1]))

a=
 [0 1 2 3 4 5 6 7 8 9]
a[0]=0, a[2]=2, a[-1]=9


In [18]:
# Reverse a np array
a[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [19]:
a = np.diag(np.arange(3))
print('a=\n',a)
print('a[1, 1] =',a[1, 1])
print('a[1] =',a[1])

a=
 [[0 0 0]
 [0 1 0]
 [0 0 2]]
a[1, 1] = 1
a[1] = [0 1 0]


#### Slicing

In [20]:
a = np.arange(10)
print('a =\n',a)

print('a[2:9:3] =',a[2:9:3])        # [start:end:step]
print('a[:4] =',a[:4])                        # Note that the last index is not included!

# None of the three slice components is required:
# by default, start is 0, end is the length of the array (so the last element is included) and step is 1:
print('a[1:3] =', a[1:3])
print('a[::2] =', a[::2])
print('a[3:] =', a[3:])

a =
 [0 1 2 3 4 5 6 7 8 9]
a[2:9:3] = [2 5 8]
a[:4] = [0 1 2 3]
a[1:3] = [1 2]
a[::2] = [0 2 4 6 8]
a[3:] = [3 4 5 6 7 8 9]


An illustration of indexing and slicing a NumPy array:

![](https://scipy-lectures.org/_images/numpy_indexing.png)
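The selections in the figure can be reproduced in code. This sketch assumes the figure's 6 x 6 array holds `10*row + column`, and builds it with the same broadcasting construction used in the `np.newaxis` section.

```python
import numpy as np

# 6 x 6 array whose entry in row i, column j is 10*i + j
a = np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]

print(a[0, 3:5])      # row 0, columns 3 and 4
print(a[4:, 4:])      # bottom-right 2 x 2 block
print(a[:, 2])        # the whole of column 2
print(a[2::2, ::2])   # every other row starting at row 2, every other column
```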


#### Working with `np.newaxis`

`np.newaxis` inserts a new axis of length 1 at the position where it appears in the index.

In [21]:
a = np.arange(6)                            # 1D array, shape (6,); broadcasts like a row of shape (1, 6)
print('a =\n',a)
print('a.shape =',a.shape)

b = np.arange(0, 51, 10)[:,np.newaxis]      # column vector, shape (6, 1)
print('b =\n',b)
print('b.shape =',b.shape)

print('a+b =\n',a+b)
print('(a+b).shape =',(a+b).shape)


a =
 [0 1 2 3 4 5]
a.shape = (6,)
b =
 [[ 0]
 [10]
 [20]
 [30]
 [40]
 [50]]
b.shape = (6, 1)
a+b =
 [[ 0  1  2  3  4  5]
 [10 11 12 13 14 15]
 [20 21 22 23 24 25]
 [30 31 32 33 34 35]
 [40 41 42 43 44 45]
 [50 51 52 53 54 55]]
(a+b).shape = (6, 6)


#### Fancy indexing
NumPy arrays can be indexed with slices, but also with boolean or integer arrays (masks). This is called fancy indexing, and it returns copies, not views.

*Note*: When you index with a boolean (`True`/`False`) mask of the same shape as the array, the result is a 1D array. Elements where the mask is `False` do not appear in it. The selected elements are collected in row-major order, as if the array had been flattened first.


In [22]:
a = np.random.randint(0,21,15)
print('a =\n',a)

a_div_3 = a[a%3 == 0]
print('a_div_3 =\n',a_div_3)

a =
 [11 17 14 19  7 10 11 14 17 13 20  0 12  5 17]
a_div_3 =
 [ 0 12]


In [23]:
a = np.random.randint(0,100,[3,5,7])

print(a)
print(a[a%3 == 0])

[[[95 31 89 84 45 16 41]
  [72 56 70 56 86 44 90]
  [83 47 49 18 85 46 98]
  [37 38  7 67  5 47 47]
  [15 34 10 28  4 82 89]]

 [[55 78 23 50 62 55 84]
  [ 0 98 90 33 21 71 68]
  [81 52 64 85 41  1 14]
  [ 3 30 12 73 19 26 96]
  [68 64 22 56 84  8 44]]

 [[24 94 15 72  2 16  2]
  [79 67 46 98 57 55 36]
  [88 33 42  2 87 84 35]
  [18 76 69 81 80  8 75]
  [15 20 16 64 61 96 83]]]
[84 45 72 90 18 15 78 84  0 90 33 21 81  3 30 12 96 84 24 15 72 57 36 33
 42 87 84 18 69 81 75 15 96]
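Masks can combine several conditions with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. Reading with a mask returns a copy, but assigning through a mask modifies the original array in place. A small sketch on illustrative values:

```python
import numpy as np

a = np.arange(15)

mask = (a % 3 == 0) & (a > 5)      # parentheses are required around each condition
print(a[mask])

selected = a[mask]                 # a copy: changing it leaves `a` untouched
selected[:] = -1
print(a[mask])

a[mask] = 0                        # assignment through the mask writes into `a`
print(a)
```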


When you create a new array by indexing with an array of integers, the result has the shape of the index array. That shape may differ from the shape of the array being indexed:

In [24]:
a = np.arange(10)
idx = np.array([[3,4],[9,7]])
a[idx]

array([[3, 4],
       [9, 7]])

An illustration of fancy indexing

![](https://scipy-lectures.org/_images/numpy_fancy_indexing.png)
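An integer index array may repeat positions, and you can also use it on the left-hand side of an assignment. On a 2D array, two index arrays are paired element by element. A sketch with illustrative values:

```python
import numpy as np

a = np.arange(0, 100, 10)          # [0, 10, 20, ..., 90]

print(a[[2, 3, 2, 4, 2]])          # repeated indices repeat the values

a[[9, 7]] = -100                   # write to positions 9 and 7
print(a)

# Two index arrays are paired: picks b[0, 1], b[1, 2], b[2, 3], b[3, 4]
b = np.arange(36).reshape(6, 6)
print(b[[0, 1, 2, 3], [1, 2, 3, 4]])
```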

**PRACTICE**: Create the following array:

```
[[0., 0., 0., 0., 0.],
 [2., 0., 0., 0., 0.],
 [0., 3., 0., 0., 0.],
 [0., 0., 4., 0., 0.],
 [0., 0., 0., 5., 0.],
 [0., 0., 0., 0., 6.]]
```

In [25]:
A = np.zeros([6, 5])
A[np.arange(1,6),np.arange(5)] = np.arange(5)+2
print(A)

[[0. 0. 0. 0. 0.]
 [2. 0. 0. 0. 0.]
 [0. 3. 0. 0. 0.]
 [0. 0. 4. 0. 0.]
 [0. 0. 0. 5. 0.]
 [0. 0. 0. 0. 6.]]


**PRACTICE**: Create the following array:
```
[[4, 3, 4, 3, 4, 3],
 [2, 1, 2, 1, 2, 1],
 [4, 3, 4, 3, 4, 3],
 [2, 1, 2, 1, 2, 1]]
```

In [26]:
A = np.arange(4,0,-1).reshape([2,2])
A = np.tile(A, [2,3])

print(A)

[[4 3 4 3 4 3]
 [2 1 2 1 2 1]
 [4 3 4 3 4 3]
 [2 1 2 1 2 1]]


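### Copies and views of NumPy arrays
Basic slicing returns a *view* that shares memory with the original array, whereas `.copy()` creates an independent array. You can use `np.may_share_memory()` to check whether two arrays might share the same memory block. It only compares memory bounds, so it can report false positives.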

In [27]:
a = np.arange(10)
b = a[::2]
np.may_share_memory(a, b)

True

In [28]:
a = np.arange(10)
c = a[::2].copy()
np.may_share_memory(a, c)

False
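Because `a[::2]` is a view, writing into it also changes `a`, while a copy stays independent. `np.may_share_memory` only compares memory bounds. `np.shares_memory` performs an exact check, which this sketch uses to tell apart two interleaved slices that never share an element.

```python
import numpy as np

a = np.arange(10)
b = a[::2]            # view: every other element of a
c = a[::2].copy()     # independent copy

b[0] = 100            # also modifies a[0]
c[1] = -1             # does not touch a
print(a)

# a[::2] and a[1::2] lie in the same memory range but have no element in common
even, odd = a[::2], a[1::2]
print(np.may_share_memory(even, odd))   # bounds check only
print(np.shares_memory(even, odd))      # exact check
```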

**PRACTICE:** Create a boolean array `isPrime` of shape `[N]` such that `isPrime[i]` is `True` exactly when `i` is prime.

In [29]:
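N = 100
isPrime = np.ones([N], dtype=bool)      # np.bool is not available in NumPy 1.24-1.26

isPrime[:2] = False                     # 0 and 1 are not prime
isPrime[2*2::2] = False                 # multiples of 2, except 2 itself
isPrime[3*2::3] = False                 # multiples of 3, except 3 itself

# Every prime >= 5 has the form 6k - 1 or 6k + 1, i.e. i or i + 2 for i = 5, 11, 17, ...
for i in range(5,N,6):
    isPrime[i*2::i] = False
    isPrime[(i+2)*2::i+2] = False

print(np.nonzero(isPrime)[0])           # indices (values) of the primes below N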
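For comparison, here is a sketch of the classic sieve of Eratosthenes built on the same boolean-array idea. It only crosses out multiples of each remaining prime `p` up to `sqrt(N)`, and starts from `p*p`, because smaller multiples were already removed by smaller primes. The function name `sieve` is hypothetical, used only for illustration.

```python
import numpy as np

def sieve(n):
    """Hypothetical helper: boolean array of shape [n], True at prime indices."""
    is_prime = np.ones(n, dtype=bool)
    is_prime[:2] = False                      # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = False        # cross out p*p, p*p + p, ...
    return is_prime

primes = np.nonzero(sieve(30))[0]             # indices where the mask is True
print(primes)
```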

## A.2 Numerical operations on NumPy arrays

### Basic arithmetic: element-wise operators
The operators `+ - * / **` are applied element by element. This works both between an array and a scalar and between two arrays of the same shape. The example below uses two arrays:

In [30]:
a = np.random.randint(10,100,size=[2,5])
print('a =\n',a,'\n')

b = np.random.randint(10,100,size=[2,5])
print('b =\n',b,'\n')

print('a+b =\n',a+b,'\n')
print('a-b =\n',a-b,'\n')
print('a*b =\n',a*b,'\n')
print('a/b =\n',a/b,'\n')
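# Caution: the true values of a**b far exceed the int64 range, so the results below silently overflow (wrap around)
print('a**b =\n',a**b,'\n')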

a =
 [[67 59 72 67 93]
 [37 97 45 46 94]] 

b =
 [[71 38 11 21 72]
 [27 40 48 35 66]] 

a+b =
 [[138  97  83  88 165]
 [ 64 137  93  81 160]] 

a-b =
 [[-4 21 61 46 21]
 [10 57 -3 11 28]] 

a*b =
 [[4757 2242  792 1407 6696]
 [ 999 3880 2160 1610 6204]] 

a/b =
 [[0.94366197 1.55263158 6.54545455 3.19047619 1.29166667]
 [1.37037037 2.425      0.9375     1.31428571 1.42424242]] 

a**b =
 [[-8897445478090611893 -5911657519374999287 -7139911636680179712
   2331194912028978163  3546589703323008417]
 [ 5756122546718335997 -1386813847010951423  7801277409425119937
   4293887816099692544                    0]] 
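The same operators work between an array and a scalar, where the scalar is applied to every element. One way to avoid the integer overflow seen in `a**b` above is to convert to floating point first. Large results then become rounded approximations instead of wrapping around. A sketch with small illustrative values:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a + 10)     # add 10 to every element
print(a * 2)      # double every element
print(2 ** a)     # scalar base, array exponent

# Float arithmetic: a huge power is rounded instead of wrapping around
big = np.array([67.0]) ** 71
print(big)
```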



### Types of matrix multiplication
+ **Element-wise multiplication**: `a*b`
+ **Matrix multiplication**: `a.dot(b)` or `a@b`


In [31]:
a = np.random.randint(10,99,[4,4])
b = np.random.randint(10,99,[4,4])

print('Element-wise multiplication')
print('a*b =\n',a*b,'\n')

print('Matrix multiplication')
print('a.dot(b) =\n',a.dot(b),'\n')
print('a@b =\n',a@b,'\n')

Element-wise multiplication
a*b =
 [[1666 1518 7912  846]
 [5529 4656  630 1984]
 [ 864 2805 1653 1479]
 [3021 2484 2205 1404]] 

Matrix multiplication
a.dot(b) =
 [[ 7076  8760  9236  6436]
 [14790 12048 10380 11514]
 [10389  9081  7882  8517]
 [10009 10098 10080  8448]] 

a@b =
 [[ 7076  8760  9236  6436]
 [14790 12048 10380 11514]
 [10389  9081  7882  8517]
 [10009 10098 10080  8448]] 
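Matrix multiplication also works for non-square operands, as long as the inner dimensions match. An `(m, k)` array times a `(k, n)` array gives an `(m, n)` array, and each entry is a row-times-column sum. This sketch checks one entry against that definition, using illustrative shapes.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)     # shape (2, 3)
b = np.arange(12).reshape(3, 4)    # shape (3, 4)

c = a @ b                          # shape (2, 4)
print(c.shape)

# Entry (i, j) is the sum over k of a[i, k] * b[k, j]
i, j = 1, 2
manual = sum(a[i, k] * b[k, j] for k in range(a.shape[1]))
print(c[i, j], manual)

# Element-wise a * b would fail here: shapes (2, 3) and (3, 4) do not broadcast
```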



### Transcendental functions
Functions such as `np.sin`, `np.cos`, `np.tan`, `np.log`, and `np.exp` are applied element-wise to a NumPy array:

In [32]:
a = np.random.randint(10, size=4)

print('a =',a)
print('np.sin(a) =',np.sin(a))
print('np.cos(a) =',np.cos(a))
print('np.tan(a) =',np.tan(a))
print('np.exp(a) =',np.exp(a))

a = [4 9 7 2]
np.sin(a) = [-0.7568025   0.41211849  0.6569866   0.90929743]
np.cos(a) = [-0.65364362 -0.91113026  0.75390225 -0.41614684]
np.tan(a) = [ 1.15782128 -0.45231566  0.87144798 -2.18503986]
np.exp(a) = [5.45981500e+01 8.10308393e+03 1.09663316e+03 7.38905610e+00]
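`np.log` (the natural logarithm) and `np.exp` undo each other up to floating-point rounding. For that reason, compare such results with a tolerance, e.g. `np.allclose`, rather than with exact equality. The trigonometric functions expect radians. A short sketch:

```python
import numpy as np

a = np.array([4, 9, 7, 2])

roundtrip = np.log(np.exp(a))           # natural log undoes exp
print(roundtrip)
print(np.allclose(roundtrip, a))        # equal within a small tolerance

# np.deg2rad converts degrees to the radians expected by np.sin
print(np.sin(np.deg2rad([0, 30, 90])))
```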


### Logical operators

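The comparison operators `==`, `>`, `<`, `>=`, `<=` work element-wise and return boolean arrays. `np.logical_and` and `np.logical_or` combine boolean arrays element-wise. `np.array_equal(a, b)` returns a single `True`/`False` that tells whether two arrays have the same shape and elements.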

In [33]:
a = np.random.randint(0,10,5)
b = np.random.randint(0,10,5)
c = a

print('a =',a)
print('b =',b)
print('c =',c)
print('a > b =', a > b)
print('np.array_equal(a, c) =',np.array_equal(a,c))

a = [8 1 9 4 3]
b = [1 0 7 8 0]
c = [8 1 9 4 3]
a > b = [ True  True  True False  True]
np.array_equal(a, c) = True


In [34]:
a = np.random.randint(2, size=5, dtype=bool)   # np.bool fails in NumPy 1.24-1.26
b = np.random.randint(2, size=5, dtype=bool)
print('a =',a)
print('b =',b)

print('np.logical_and(a, b) =\t',np.logical_and(a, b))
print('np.logical_or(a, b) =\t',np.logical_or(a, b))

a = [ True False False  True False]
b = [False  True False  True  True]
np.logical_and(a, b) =	 [False False False  True False]
np.logical_or(a, b) =	 [ True  True False  True  True]
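For boolean arrays, the operators `&`, `|`, and `~` give the same results as `np.logical_and`, `np.logical_or`, and `np.logical_not`. Python's `and`/`or` keywords do not work element-wise on arrays. They raise a `ValueError` for arrays with more than one element. A sketch using the boolean values printed above:

```python
import numpy as np

a = np.array([True, False, False, True, False])
b = np.array([False, True, False, True, True])

print(np.array_equal(a & b, np.logical_and(a, b)))
print(np.array_equal(a | b, np.logical_or(a, b)))
print(np.array_equal(~a, np.logical_not(a)))

try:
    a and b        # truth value of a multi-element array is ambiguous
except ValueError as err:
    print('ValueError:', err)
```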


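### Reduction functions
NumPy provides several reduction functions, which collapse an array (or one of its axes) into fewer values:
* Arithmetic:
    + `sum`
    + `max`, `argmax`
    + `min`, `argmin`
* Logical:
    + `any`, `all`
* Statistical:
    + `mean`, `std`, `np.median`

Two related helpers used below are `np.cumsum` (cumulative sum, same shape as the input) and `np.unique` (sorted distinct values).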

More commonly used functions are summarized in the following figure:
![](http://drive.google.com/uc?export=view&id=1OHKKpiAhvBY61SXdJa8MfF3urVOwYeN9)

In [35]:
x = np.random.randint(11,100,size=[6])

print('x =',x)
print('x.sum() =',x.sum())
print('x.min() =',x.min())
print('x.max() =',x.max())
print('x.argmin() =',x.argmin())
print('x.argmax() =',x.argmax())


x = [46 68 71 94 20 71]
x.sum() = 370
x.min() = 20
x.max() = 94
x.argmin() = 4
x.argmax() = 3


In [36]:
a = np.random.randint(0,10,5)
b = np.random.randint(0,10,5)

print('a =',a)
print('b =',b)

print('np.all(a>b) =',np.all(a>b))
print('np.any(a>b) =',np.any(a>b))

a = [2 3 1 7 1]
b = [4 7 3 8 4]
np.all(a>b) = False
np.any(a>b) = False


In [37]:
x = np.random.randint(11,100,size=[6])

print('x =',x)
print('x.mean() =',x.mean())
print('x.std() =',x.std())
print('np.median(x) =',np.median(x))
print('np.cumsum(x) =',np.cumsum(x))
print('np.unique(x) =',np.unique(x),'\n')

y = np.random.randint(1,4,size=[3,5])

print('y =\n',y)
print('y.mean(axis=1) =',y.mean(axis=1))
print('y.std(axis=1) =', y.std(axis=1))
print('np.median(y, axis=1) =',np.median(y, axis=1))
print('np.cumsum(y, axis=1) =\n',np.cumsum(y, axis=1))
print('np.unique(y) =',np.unique(y))

x = [16 94 83 83 74 67]
x.mean() = 69.5
x.std() = 25.342651794948374
np.median(x) = 78.5
np.cumsum(x) = [ 16 110 193 276 350 417]
np.unique(x) = [16 67 74 83 94] 

y =
 [[3 2 1 3 1]
 [2 2 2 1 2]
 [1 2 3 2 2]]
y.mean(axis=1) = [2.  1.8 2. ]
y.std(axis=1) = [0.89442719 0.4        0.63245553]
np.median(y, axis=1) = [2. 2. 2.]
np.cumsum(y, axis=1) =
 [[ 3  5  6  9 10]
 [ 2  4  6  7  9]
 [ 1  3  6  8 10]]
np.unique(y) = [1 2 3]
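Every reduction accepts an `axis` argument. `axis=0` collapses the rows (one result per column), and `axis=1` collapses the columns (one result per row). With `keepdims=True`, the reduced axis is kept with length 1, so the result broadcasts back against the original array. A sketch using the `y` values printed above:

```python
import numpy as np

y = np.array([[3, 2, 1, 3, 1],
              [2, 2, 2, 1, 2],
              [1, 2, 3, 2, 2]])

print(y.sum(axis=0))        # one sum per column
print(y.sum(axis=1))        # one sum per row
print(y.argmax(axis=1))     # column index of the (first) max in each row

row_means = y.mean(axis=1, keepdims=True)   # shape (3, 1) instead of (3,)
centered = y - row_means                    # broadcasts across each row
print(centered.mean(axis=1))
```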


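### Broadcasting
***Broadcasting:*** NumPy can apply element-wise operations to arrays of different shapes, provided the shapes are compatible. NumPy compares dimensions from the last one backwards, and each pair must be equal or one of them must be 1 (missing leading dimensions count as 1). Size-1 dimensions are then stretched, without copying data, to match the other array.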

![](https://scipy-lectures.org/_images/numpy_broadcasting.png)
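The following sketch reproduces the cases in the figure. It assumes the figure's arrays are a (4, 3) array whose rows are 0, 10, 20, 30 and a row `[0, 1, 2]`. It also includes one pair of shapes that cannot be broadcast.

```python
import numpy as np

a = np.tile(np.arange(0, 40, 10), (3, 1)).T   # shape (4, 3): rows of 0, 10, 20, 30
b = np.array([0, 1, 2])                       # shape (3,) -> treated as (1, 3)
print(a + b)

col = np.arange(0, 40, 10)[:, np.newaxis]     # shape (4, 1)
print((col + b).shape)                        # (4, 1) with (3,) -> (4, 3)

try:
    np.ones((4, 3)) + np.ones(4)              # trailing dims 3 vs 4: incompatible
except ValueError as err:
    print('ValueError:', err)
```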



### Array shape manipulation

#### Flattening with `ravel`

In [38]:
x = np.arange(1,16,1).reshape([3,5])
print('x =\n',x)

print('x.ravel() =',x.ravel())

x =
 [[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
x.ravel() = [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
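`ravel` returns a view of the original data whenever it can, for example for a C-contiguous array like `x`. `flatten`, by contrast, always returns a copy, so writing into the two results behaves differently. A sketch:

```python
import numpy as np

x = np.arange(1, 16).reshape(3, 5)

r = x.ravel()      # view here, because x is C-contiguous
f = x.flatten()    # always a new copy

r[0] = 100         # also changes x[0, 0]
f[1] = -1          # x is untouched
print(x[0, :2])

# On a transposed (non-contiguous) array, ravel has to copy
print(np.shares_memory(x.T.ravel(), x))
```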


#### Reshaping

In [39]:
x = np.arange(1,13,1).reshape([3,4])
print('x =\n',x)
print('x.shape =',x.shape,'\n')

print('x.reshape([2,6]) =\n',x.reshape([2,6]))
print('x.reshape([2,6]).shape =',x.reshape([2,6]).shape)

x =
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
x.shape = (3, 4) 

x.reshape([2,6]) =
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
x.reshape([2,6]).shape = (2, 6)
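One of the dimensions passed to `reshape` may be `-1`, in which case NumPy infers it from the total number of elements. The total must stay the same; otherwise a `ValueError` is raised. A sketch:

```python
import numpy as np

x = np.arange(1, 13).reshape(3, 4)

print(x.reshape(2, -1).shape)    # 12 elements / 2 rows -> 6 columns
print(x.reshape(-1).shape)       # flatten to 1D

try:
    x.reshape(5, -1)             # 12 is not divisible by 5
except ValueError as err:
    print('ValueError:', err)
```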


#### Adding a dimension

Indexing with the `np.newaxis` object adds an axis to an array (we already used it in the `np.newaxis` example above, where it enabled broadcasting).  
`np.expand_dims` does the same thing through a function call.

We observe that, for a 1D array `x`:

> `x[np.newaxis]` is equivalent to `x[np.newaxis,:]`  
> `np.expand_dims(x,0)` is equivalent to `x[np.newaxis,:]` or `x[np.newaxis]`  
> `np.expand_dims(x,-1)` is equivalent to `x[:,np.newaxis]`

In [40]:
z = np.array([1, 2, 3])
print('z =\n',z)
print('z.shape =',z.shape,'\n')

print('y = z[:, np.newaxis]')
y = z[:, np.newaxis]
print('y =\n',y)
print('y.shape =',y.shape,'\n')

print('y = z[np.newaxis, :]')
y = z[np.newaxis, :]
print('y =\n',y)
print('y.shape =',y.shape,'\n')

print('y = z[np.newaxis]')
y = z[np.newaxis]
print('y =\n',y)
print('y.shape =',y.shape)

z =
 [1 2 3]
z.shape = (3,) 

y = z[:, np.newaxis]
y =
 [[1]
 [2]
 [3]]
y.shape = (3, 1) 

y = z[np.newaxis, :]
y =
 [[1 2 3]]
y.shape = (1, 3) 

y = z[np.newaxis]
y =
 [[1 2 3]]
y.shape = (1, 3)


In [41]:
z = np.array([1, 2, 3])
print('z.shape =',z.shape,'\n')

y = np.expand_dims(z,axis=0)
print('y = np.expand_dims(z,axis=0)')
print('y.shape =',y.shape)
print('y =\n',y,'\n')

y = np.expand_dims(z,axis=1)
print('y = np.expand_dims(z,axis=1)')
print('y.shape =',y.shape)
print('y =\n',y,'\n')

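# The lines below would fail if uncommented: axis=2 is out of range for the 2D result, so NumPy 1.18+ raises an AxisError
# y = np.expand_dims(z,axis=2)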
# print('y = np.expand_dims(z,axis=2)')
# print('y.shape =',y.shape)
# print('y =\n',y,'\n')

z.shape = (3,) 

y = np.expand_dims(z,axis=0)
y.shape = (1, 3)
y =
 [[1 2 3]] 

y = np.expand_dims(z,axis=1)
y.shape = (3, 1)
y =
 [[1]
 [2]
 [3]] 



#### Array stacking

`np.stack` joins a sequence of arrays of the same shape along a new axis.

In [42]:
# Create a list of arrays with the same size
number_of_arrays = 2
arrays = [(np.arange(1,13)+(i+1)*100).reshape([3,4]) for i in range(number_of_arrays)]
print('len(arrays) =',len(arrays))
print('arrays[0].shape =',arrays[0].shape,'\n')

print('arrays[0] =\n',arrays[0])
print('arrays[1] =\n',arrays[1],'\n')

x = np.stack(arrays, axis = 0)
print('x.shape =',x.shape, ' (stack along axis = 0)')
print('x =\n',x,'\n')

x = np.stack(arrays, axis = 1)
print('x.shape =',x.shape, ' (stack along axis = 1)')
print('x =\n',x,'\n')

x = np.stack(arrays, axis = 2)
print('x.shape =',x.shape, ' (stack along axis = 2)')
print('x =\n',x,'\n')

len(arrays) = 2
arrays[0].shape = (3, 4) 

arrays[0] =
 [[101 102 103 104]
 [105 106 107 108]
 [109 110 111 112]]
arrays[1] =
 [[201 202 203 204]
 [205 206 207 208]
 [209 210 211 212]] 

x.shape = (2, 3, 4)  (stack along axis = 0)
x =
 [[[101 102 103 104]
  [105 106 107 108]
  [109 110 111 112]]

 [[201 202 203 204]
  [205 206 207 208]
  [209 210 211 212]]] 

x.shape = (3, 2, 4)  (stack along axis = 1)
x =
 [[[101 102 103 104]
  [201 202 203 204]]

 [[105 106 107 108]
  [205 206 207 208]]

 [[109 110 111 112]
  [209 210 211 212]]] 

x.shape = (3, 4, 2)  (stack along axis = 2)
x =
 [[[101 201]
  [102 202]
  [103 203]
  [104 204]]

 [[105 205]
  [106 206]
  [107 207]
  [108 208]]

 [[109 209]
  [110 210]
  [111 211]
  [112 212]]] 
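`np.stack` always creates a new axis. To join arrays along an axis they already have, you can use `np.concatenate` instead; the shapes then only need to agree on the other axes. A sketch with the same two (3, 4) arrays:

```python
import numpy as np

arrays = [(np.arange(1, 13) + (i + 1) * 100).reshape(3, 4) for i in range(2)]

print(np.stack(arrays, axis=0).shape)         # new axis: (2, 3, 4)
print(np.concatenate(arrays, axis=0).shape)   # existing axis 0: (6, 4)
print(np.concatenate(arrays, axis=1).shape)   # existing axis 1: (3, 8)
```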



#### Dimension shuffling

> `transpose` returns a view of the existing array, not a copy. `a.transpose(1, 2, 0)` places the old axis 1 first, then the old axis 2, then the old axis 0.

In [43]:
a = np.arange(4*3*2).reshape(4, 3, 2)
print('a.shape =',a.shape)

b = a.transpose(1, 2, 0)
print('b.shape =',b.shape)



a.shape = (4, 3, 2)
b.shape = (3, 2, 4)
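With `b = a.transpose(1, 2, 0)`, axis `k` of `b` is axis `axes[k]` of `a`, so `b[i, j, k] == a[k, i, j]`. Because no data is copied, `b` shares memory with `a`. A sketch that checks both statements:

```python
import numpy as np

a = np.arange(4 * 3 * 2).reshape(4, 3, 2)
b = a.transpose(1, 2, 0)                   # shape (3, 2, 4)

i, j, k = 2, 1, 3
print(b[i, j, k] == a[k, i, j])            # element mapping
print(np.shares_memory(a, b))              # a view, not a copy

b[0, 0, 0] = -1                            # writing through the view...
print(a[0, 0, 0])                          # ...changes the original
```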


### Sorting data

In [44]:
a = np.random.randint(6,size=[20])
print('a =\n',a)

print('np.argmax(a) =',np.argmax(a))
print('np.argmin(a) =',np.argmin(a))

idx = np.argsort(a)                 # indices that would sort a
print('a[idx] =\n',a[idx])

a.sort(axis=0)                      # sorts a in place
print('a =\n',a)

a =
 [2 3 5 5 4 2 5 5 5 3 1 5 1 4 0 0 2 1 4 5]
np.argmax(a) = 2
np.argmin(a) = 14
a[idx] =
 [0 0 1 1 1 2 2 2 3 3 4 4 4 5 5 5 5 5 5 5]
a =
 [0 0 1 1 1 2 2 2 3 3 4 4 4 5 5 5 5 5 5 5]
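`a.sort()` sorts in place and returns `None`, whereas `np.sort(a)` returns a sorted copy. For descending order, one common idiom is to reverse the sorted result with `[::-1]`. On 2D arrays, the `axis` argument chooses the direction (the default is the last axis). A sketch with illustrative values:

```python
import numpy as np

a = np.array([2, 3, 5, 1, 4])

s = np.sort(a)             # sorted copy; a is unchanged
print(a, s)
print(s[::-1])             # descending order

m = np.array([[3, 1, 2],
              [9, 7, 8]])
print(np.sort(m, axis=1))  # sort within each row
print(np.sort(m, axis=0))  # sort within each column
```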


### Linear-algebra-related functions
More linear-algebra functions are documented in the [`numpy.linalg` reference](https://numpy.org/doc/stable/reference/routines.linalg.html).
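The following sketch shows a few commonly used `numpy.linalg` routines on a small illustrative system `A x = b`. Solving with `np.linalg.solve` is generally preferred over computing `np.linalg.inv(A) @ b`, since it is cheaper and numerically more accurate.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)        # solution of A @ x = b
print(x)
print(np.allclose(A @ x, b))

print(np.linalg.det(A))          # determinant: 3*2 - 1*1
print(np.linalg.inv(A) @ A)      # approximately the identity matrix
print(np.linalg.norm(b))         # Euclidean norm sqrt(9**2 + 8**2)
```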