In [1]:
import numpy as np

# NumPy
- [NumPy](#NumPy)
- [Introduction to NumPy](#Introduction-to-NumPy)
- [Creating Arrays](#Creating-Arrays)
    - [Nested Lists](#Nested-Lists)
    - [Array-Generating Functions](#Array-Generating-Functions)
        - [Empty Arrays](#Empty-Arrays)
        - [Ranges](#Ranges)
        - [Random Data](#Random-Data)
        - [Matrix Creation](#Matrix-Creation)
        - [Data Types](#Data-Types)
        - [Shapes](#Shapes)
    - [Exercises](#Exercises)
        - [Exercise 1](#Exercise-1)
        - [Exercise 2](#Exercise-2)
        - [Exercise 3](#Exercise-3)
        - [Exercise 4](#Exercise-4)
        - [Exercise 5](#Exercise-5)
- [Manipulating arrays](#Manipulating-arrays)
    - [Indexing](#Indexing)
    - [Slicing](#Slicing)
    - [Boolean Mask](#Boolean-Mask)
    - [Assigning Values to Subarrays](#Assigning-Values-to-Subarrays)
    - [Exercises](#Exercises)
        - [Exercise 1](#Exercise-1)
        - [Exercise 2](#Exercise-2)
        - [Exercise 3](#Exercise-3)
        - [Exercise 4](#Exercise-4)
        - [Exercise 5](#Exercise-5)
- [Array Operations](#Array-Operations)
    - [Logical Operations](#Logical-Operations)
    - [Arithmetic](#Arithmetic)
    - [Aggregative Functions](#Aggregative-Functions)
    - [Vectorization](#Vectorization)
    - [Broadcasting](#Broadcasting)
        - [Rule 1](#Rule-1)
        - [Rule 2](#Rule-2)
        - [Rule 3](#Rule-3)
    - [Further Reading](#Further-Reading)
    - [Exercises](#Excercises)
        - [Exercise 1](#Exercise-1)
        - [Exercise 2](#Exercise-2)
        - [Exercise 3](#Exercise-3)
        - [Exercise 4](#Exercise-4)
        - [Exercise 5](#Exercise-5)
- [Advanced Manipulation](#Advanced-Manipulation)
    - [Reshaping and Transposing](#Reshaping-and-Transposing)
    - [Adding a new dimension with `newaxis`](#Adding-a-new-dimension-with-`newaxis`)
    - [Concatenation and Splitting](#Concatenation-and-Splitting)
    - [Exercises](#Exercises)
        - [Exercise 1](#Exercise-1)
        - [Exercise 2](#Exercise-2)
        - [Exercise 3](#Exercise-3)
        - [Exercise 4](#Exercise-4)
- [Additional Resources:](#Additional-Resources:)


## Introduction to NumPy

Datasets can include collections of documents, images, sound clips, numerical measurements, or really anything else. Despite this heterogeneity, it helps to think of all data fundamentally as arrays of numbers.

| Data type	    | Arrays of Numbers? |
|---------------|-------------|
|Images | Pixel brightness across different channels|
|Videos | Pixel brightness across different channels for each frame | 
|Sound | Intensity over time |
|Numbers | No need for transformation | 
|Tables | Mapping from strings to numbers |


Therefore, the efficient storage and manipulation of large arrays of numbers is fundamental to the process of doing data science. NumPy is a library specially designed to handle arrays of numerical data.

[NumPy](http://www.numpy.org/) is short for _numerical Python_. It provides functions that are especially useful when working with large arrays and matrices of numeric data, for example matrix multiplication.

The array object class is the foundation of NumPy, and NumPy arrays are much like nested lists in base Python. However, NumPy supports _vectorization_: an operation is applied to a whole array at once instead of element by element in a Python loop. Many of these operations are written and compiled in C rather than Python, which makes them much faster, as we will see.
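
To make the idea of vectorization concrete, here is a minimal sketch that computes the same squares once with an explicit Python loop and once with a single array expression. The variable names are chosen for this illustration only, and no timing is shown because speed depends on the machine.

```python
import numpy as np

values = list(range(1_000))

# Pure-Python approach: visit every element explicitly
squares_loop = [x * x for x in values]

# Vectorized approach: one expression, the loop runs inside NumPy's compiled code
arr = np.array(values)
squares_vec = arr * arr

# Both approaches produce the same numbers
print(np.array_equal(squares_vec, np.array(squares_loop)))  # True
```

In a notebook, the `%timeit` magic can be put in front of either expression to compare their running times.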

## Creating Arrays

### Nested Lists

Arrays can be created from nested lists (lists inside a list). The nesting determines the dimensions of the resulting array.

In [3]:
# Create array from lists:
lis = [[1,2,3,4,5],[6,7,8,9,10]] # the sublists must all have the same length
ary = np.array(lis) # save the list as array (numpy object)
print(ary) # now a matrix (2-D array) with 2 rows and 5 columns
print(lis)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]


In [8]:
lis = [1,2,3,4,5]
ary1 = np.array(lis)
print(ary1)
ary1.ndim # number of dimensions

[1 2 3 4 5]


1

In [9]:
ary1.shape # 5 elements

(5,)

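Note that dimensions must be consistent. If the nested lists do not have the same lengths, older NumPy versions (such as the one used to produce the output below) create a 1-D array whose elements are the sublists, usually accompanied by a deprecation warning. Newer releases (1.24 and later, as far as we are aware) raise a `ValueError` instead unless `dtype=object` is passed explicitly.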

In [10]:
ary2 = np.array([[1,2,3,4,5],[6,7,8,9]]) # the sublists have different lengths




In [11]:
ary2
# no error is raised here:
# NumPy falls back to a 1-D array that holds the 2 lists

array([list([1, 2, 3, 4, 5]), list([6, 7, 8, 9])], dtype=object)
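
Because the handling of ragged input has changed between NumPy versions, a version-independent way to get an array of lists is to request `dtype=object` explicitly. The following is a small sketch under that assumption; the names `ragged` and `ary_obj` are illustrative.

```python
import numpy as np

ragged = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]

# Ask for an object array explicitly: each element is one of the Python sublists
ary_obj = np.array(ragged, dtype=object)

print(ary_obj.shape)     # (2,)  one dimension with two elements
print(ary_obj.dtype)     # object
print(type(ary_obj[0]))  # <class 'list'>
```

Such an array loses most of NumPy's speed advantages, since its elements are ordinary Python objects rather than numbers stored in one block of memory.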

The most important attributes of an array are its shape and the number of dimensions.

In [4]:
ary.shape # an attribute, not a function, so no parentheses are needed

(2, 5)

In [None]:
ary.ndim

Less important but worth mentioning is the dtype of an array indicating what kind of data it contains.

In [12]:
ary.dtype # the data type of the elements in the array

dtype('int32')

### Array-Generating Functions
For larger arrays it is impractical to initialize the data manually. Instead we can use one of the many NumPy functions that generate arrays of different forms. Some of the more common ones are:

#### Empty Arrays
When the intended shape of an array is known in advance but its values are not, we can use various functions to generate empty arrays.

In [15]:
np.zeros((2, 3)) # the elements are floats by default
# fill the array with zeros

array([[0., 0., 0.],
       [0., 0., 0.]])

In [18]:
np.ones((3, 4), dtype=np.int8) # set the dtype to int8 (integers from -128 to 127)
# fill the array with ones

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int8)

In [19]:
np.full((3, 5), 3.14)
# fill array with value 3.14

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

A special case is the function `np.empty`, which does not initialize any values. It reserves memory for the array but keeps whatever values are already stored there without resetting them. This can be a useful speed optimization when creating extremely large arrays.

In [21]:
#print(np.empty((2, 3)))
print(np.empty((7, 10))) # no values are assigned (unlike np.zeros)
# only reserves memory to be filled in later

[[9.80977918e-312 1.43279037e-322 9.80977918e-312 5.58788245e-321
  9.80977918e-312 1.43279037e-322 9.80977918e-312 5.62740771e-321
  9.80977918e-312 1.63041663e-322]
 [0.00000000e+000 1.18575755e-322 9.80959907e-312 4.94065646e-324
  0.00000000e+000 1.27319747e-313 1.69759663e-313 1.18575755e-322
  9.80956943e-312 4.94065646e-324]
 [0.00000000e+000 1.90979621e-313 2.97079411e-313 1.03753786e-322
  9.80977918e-312 9.80956943e-312 4.94065646e-324 1.27319747e-313
  2.97079411e-313 1.18575755e-322]
 [9.80961144e-312 4.94065646e-324 0.00000000e+000 3.18299369e-313
  3.81959242e-313 5.40013751e-321 9.80977918e-312 1.72922976e-322
  0.00000000e+000 5.40507817e-321]
 [9.80977918e-312 1.72922976e-322 9.80977918e-312 5.41001882e-321
  9.80977918e-312 1.72922976e-322 9.80977918e-312 5.41495948e-321
  9.80977918e-312 1.72922976e-322]
 [9.80977918e-312 5.41990013e-321 9.80977918e-312 1.72922976e-322
  9.80977918e-312 5.42484079e-321 9.80977918e-312 1.72922976e-322
  9.80977918e-312 5.44460342e-321
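
Since the contents of an `np.empty` array are arbitrary, the usual pattern is to overwrite every entry before reading anything back. A minimal sketch of that pattern follows; the sizes and the fill rule are made up for illustration, and it uses the row assignment syntax covered later under "Assigning Values to Subarrays".

```python
import numpy as np

n_rows, n_cols = 3, 4

# Reserve memory without initializing it; the contents are arbitrary at this point
table = np.empty((n_rows, n_cols))

# Overwrite every entry before reading anything back
for i in range(n_rows):
    table[i, :] = i * 10 + np.arange(n_cols)

print(table)
```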

#### Ranges
NumPy also has a number of functions for creating number ranges, such as:

In [25]:
list(range(0,10))
# Python's built-in range, converted to a list of values

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [22]:
# Define endpoints and step size
np.arange(start=0, stop=10, step=1)
# arange returns an array of values
# the stop value 10 is not included

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [26]:
np.arange(start=6, stop=15, step=2)

array([ 6,  8, 10, 12, 14])

In [27]:
# 'step' defaults to 1 and 'start' defaults to 0
np.arange(8)

array([0, 1, 2, 3, 4, 5, 6, 7])

In [28]:
# Define endpoints and the number of elements
np.linspace(start=1, stop=10, num=15)
# num = how many elements we want in the array
# the array runs from 1 to 10 (10 is included)

array([ 1.        ,  1.64285714,  2.28571429,  2.92857143,  3.57142857,
        4.21428571,  4.85714286,  5.5       ,  6.14285714,  6.78571429,
        7.42857143,  8.07142857,  8.71428571,  9.35714286, 10.        ])

In [29]:
# linspace includes the endpoint by default (non-standard Python behavior!)
np.linspace(start=1, stop=10, num=15, endpoint=False)
# endpoint=False --> 10 is not included

array([1. , 1.6, 2.2, 2.8, 3.4, 4. , 4.6, 5.2, 5.8, 6.4, 7. , 7.6, 8.2,
       8.8, 9.4])
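
The two `linspace` outputs above differ in their spacing because the interval is divided into `num - 1` pieces when the endpoint is included and into `num` pieces when it is not. The sketch below checks this with `np.diff`, which returns the differences between neighbouring elements; the variable names are illustrative.

```python
import numpy as np

start, stop, num = 1, 10, 15

with_end = np.linspace(start, stop, num)                     # endpoint=True is the default
without_end = np.linspace(start, stop, num, endpoint=False)

# Spacing between neighbours: the interval is cut into num - 1 or num pieces
step_with = (stop - start) / (num - 1)
step_without = (stop - start) / num

print(np.allclose(np.diff(with_end), step_with))        # True
print(np.allclose(np.diff(without_end), step_without))  # True
```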

#### Random Data
Arrays can also be initialized with random values. NumPy supports many different probability distributions, and the `size` argument sets the shape, and therefore the number of dimensions, of the resulting array.

In [30]:
# Uniform distribution, i.e. all values equally likely, 
# between low (inclusive) and high (exclusive) --> `high` itself is never returned
np.random.uniform(low=0, high=1, size=(3, 3))

array([[0.60902062, 0.08915808, 0.63284059],
       [0.73548696, 0.3982739 , 0.95689553],
       [0.55455155, 0.21679713, 0.62246505]])

In [31]:
# Equivalent to np.random.uniform(low=0, high=1, ...)
np.random.random(size=(5, 5)) # values lie between 0 (inclusive) and 1 (exclusive)

array([[0.65484602, 0.49450302, 0.67071672, 0.73997248, 0.88364187],
       [0.55430233, 0.08329398, 0.68997199, 0.48578474, 0.61837647],
       [0.8274119 , 0.43041427, 0.4400758 , 0.17214074, 0.30636522],
       [0.19284897, 0.62928375, 0.62105847, 0.67150855, 0.7119026 ],
       [0.38399466, 0.6347551 , 0.02978973, 0.47061071, 0.4015863 ]])

In [32]:
# Normal (Gaussian) distribution centered around 'loc' (mean)
# with a standard deviation of 'scale'
np.random.normal(loc=5, scale=2, size=(3, 3))

array([[2.59703077, 2.75748789, 2.72023193],
       [3.92766523, 2.56574505, 2.65790914],
       [6.07913749, 5.35145337, 6.12513012]])

In [34]:
np.random.normal(loc=5, scale=2, size=(3, 2, 5)) # 3d array

array([[[ 5.73642611,  6.37113816,  8.34608022,  3.49278814,
          8.98304962],
        [ 7.53447759,  4.77016538,  5.25346539,  6.76150727,
          5.59503586]],

       [[ 4.0436329 ,  3.54521884,  2.30688505,  8.12699199,
         10.6070926 ],
        [ 5.18952535,  2.46911961,  6.77647391,  6.02699392,
          7.54578628]],

       [[ 8.77794666,  4.01076677,  4.45456635,  3.36296886,
          4.98272574],
        [ 6.60610837,  8.78571426,  4.68733686,  7.24059523,
         -0.21153116]]])

Beyond floating-point distributions, NumPy also lets us generate random integers. As with `uniform`, `low` is inclusive and `high` is exclusive.

In [33]:
np.random.randint(low=1, high=100, size=(4, 4))

array([[17, 28, 22, 97],
       [18, 53, 64, 51],
       [84, 77, 66,  4],
       [83, 69, 28, 55]])

In [35]:
np.random.randint(low=1, high=100, size=(4, 2, 3))
# 4 sets, 2 rows, 3 columns --> arrays can have any number of dimensions

array([[[23, 12, 28],
        [91, 53, 42]],

       [[83, 99, 42],
        [42, 11, 43]],

       [[10, 95, 82],
        [24, 85, 84]],

       [[15, 99, 46],
        [80, 27, 45]]])

In [36]:
np.random.randint(low=1, high=100, size=(4, 2, 2, 3))
# 4 sets, 2 tables in each set, 2 rows, 3 columns
# --> arrays can have any number of dimensions

array([[[[87, 56, 27],
         [57, 93, 15]],

        [[ 3, 56, 95],
         [96, 28, 47]]],


       [[[48, 31, 69],
         [77, 48, 61]],

        [[52,  6,  7],
         [40, 47, 40]]],


       [[[56, 63, 63],
         [31, 59, 25]],

        [[29, 49, 41],
         [14, 62, 66]]],


       [[[64, 68, 92],
         [75, 11, 40]],

        [[29, 43, 73],
         [48, 63, 94]]]])
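
The random outputs in this section change every time the cells are run. When repeatable results are needed, the generator can be seeded first, as the exercises further down do with `np.random.seed`. A minimal sketch, in which the seed value 42 is an arbitrary choice:

```python
import numpy as np

np.random.seed(42)  # 42 is an arbitrary seed chosen for this sketch
first = np.random.randint(low=1, high=100, size=(2, 3))

np.random.seed(42)  # resetting the seed restarts the same pseudo-random sequence
second = np.random.randint(low=1, high=100, size=(2, 3))

print(np.array_equal(first, second))  # True
```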

#### Matrix Creation
Because matrices are so common, NumPy also has several generating functions for 2-D arrays (matrices).

In [37]:
# Create an NxM matrix with 1 along the main diagonal and 0 elsewhere
# (an identity matrix when N == M)
np.eye(N=3, M=5) # N = rows, M = columns
# k defaults to 0

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.]])

In [38]:
# Offset the diagonal
np.eye(N=4, M=4, k=1)
# k=1 --> shift the diagonal 1 unit to the right

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])

In [39]:
# Offset the diagonal
np.eye(N=4, M=4, k=2)
# k=2 --> shift the diagonal 2 units to the right

array([[0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [40]:
# Offset the diagonal
np.eye(N=4, M=4, k=-1)
# k=-1 --> shift the diagonal 1 unit to the left

array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

In [42]:
# a diagonal matrix with custom diagonal values
np.diag([1,2,3])
# use custom values on the diagonal instead of the 1s of an identity matrix

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [43]:
# put the values on the offset diagonal of degree k
# NumPy automatically generates a matrix of the necessary size
np.diag([1,2,3], k=2)
# k=2 --> move the diagonal 2 units to the right

array([[0, 0, 1, 0, 0],
       [0, 0, 0, 2, 0],
       [0, 0, 0, 0, 3],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [45]:
# A matrix with 1's on the diagonal and all lower offset diagonals
# Can also be offset with argument k=...
np.tri(N=4, M=4)

array([[1., 0., 0., 0.],
       [1., 1., 0., 0.],
       [1., 1., 1., 0.],
       [1., 1., 1., 1.]])
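
The comment in the cell above mentions that `np.tri` also accepts an offset `k`. As a brief sketch of what that does, with illustrative variable names:

```python
import numpy as np

# k=-1 drops the main diagonal, keeping only the entries strictly below it
strictly_lower = np.tri(N=4, M=4, k=-1)
print(strictly_lower)

# k=1 additionally fills the first diagonal above the main one
with_upper = np.tri(N=4, M=4, k=1)
print(with_upper)
```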

#### Data Types
Most, if not all, of these functions let us set the data type with the `dtype` argument, e.g.

In [None]:
np.zeros((2, 3), dtype=np.int16)

Some of the most common supported data types are

| Data Type | Description |
| --------- | ----------- |
| `np.bool_` | Boolean (True or False) stored as a byte |
| `np.int8` | Byte (-128 to 127) |
| `np.int16` | Integer (-32768 to 32767) |
| `np.int32` | Integer (-2147483648 to 2147483647) |
| `np.int64` | Integer (-9223372036854775808 to 9223372036854775807) |
| `np.int_` | Default integer type (normally either int64 or int32, depending on platform and NumPy version) |
| `np.uint8` | Unsigned integer (0 to 255) |
| `np.uint16` | Unsigned integer (0 to 65535) |
| `np.uint32` | Unsigned integer (0 to 4294967295) |
| `np.uint64` | Unsigned integer (0 to 18446744073709551615) |
| `np.float16` | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| `np.float32` | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| `np.float64` | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| `np.float_` | Alias for `np.float64` in NumPy 1.x (removed in NumPy 2.0) |

The plain aliases `np.int` and `np.float` found in older material were deprecated in NumPy 1.20 and removed in 1.24; use the names above instead.
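
The ranges in the table do not have to be memorized. As a small sketch, `np.iinfo` and `np.finfo` report the limits of integer and floating-point types, and the `itemsize` attribute gives the number of bytes per element:

```python
import numpy as np

# Integer types: np.iinfo reports the representable range
for dt in (np.int8, np.uint8, np.int16):
    info = np.iinfo(dt)
    print(dt.__name__, info.min, info.max)

# Floating-point types: np.finfo reports, among other things, the total bit width
print(np.finfo(np.float32).bits, np.finfo(np.float64).bits)  # 32 64

# itemsize is the number of bytes used per element
print(np.zeros((2, 3), dtype=np.int16).itemsize)  # 2
```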

#### Shapes
Until now we have mostly created 1-D or 2-D arrays, but NumPy is in no way limited to this. Any time a function takes a shape or size parameter, we can create an array with as many dimensions as we like, e.g.

In [46]:
# Three 4x5 arrays stacked into a 3-D cube
np.random.randint(low=1, high=10, size=(3, 4, 5))

array([[[7, 4, 8, 7, 7],
        [6, 5, 7, 4, 4],
        [6, 8, 9, 6, 3],
        [7, 9, 2, 8, 3]],

       [[6, 5, 8, 5, 8],
        [3, 8, 9, 4, 8],
        [3, 6, 5, 7, 7],
        [4, 9, 5, 8, 9]],

       [[2, 4, 7, 7, 7],
        [7, 5, 2, 8, 5],
        [6, 7, 4, 1, 8],
        [8, 1, 2, 4, 8]]])

In [47]:
# Two sets of three 4x5 arrays stacked into 3-D cubes. 
np.random.randint(low=1, high=10, size=(2, 3, 4, 5))

array([[[[1, 4, 2, 1, 9],
         [4, 3, 4, 4, 9],
         [2, 7, 6, 9, 2],
         [4, 5, 1, 2, 2]],

        [[6, 3, 7, 4, 2],
         [6, 9, 4, 5, 5],
         [3, 4, 6, 5, 6],
         [5, 7, 3, 6, 6]],

        [[3, 5, 4, 4, 6],
         [1, 6, 1, 5, 1],
         [8, 2, 3, 2, 1],
         [1, 2, 7, 1, 3]]],


       [[[6, 8, 9, 1, 9],
         [6, 7, 4, 1, 6],
         [2, 8, 4, 9, 3],
         [4, 1, 2, 6, 4]],

        [[5, 8, 1, 7, 8],
         [5, 7, 5, 6, 6],
         [3, 5, 4, 8, 3],
         [5, 7, 4, 4, 3]],

        [[1, 9, 3, 7, 6],
         [1, 4, 9, 1, 6],
         [5, 4, 8, 1, 2],
         [8, 1, 9, 5, 2]]]])

### Exercises

#### Exercise 1
Create a new 2x2 array without initializing entries.

In [None]:
### your code here

In [48]:
# Haseena
print(np.empty((2,2)))

[[2.12199579e-314 4.67296746e-307]
 [4.68374232e-321 6.36598775e-314]]


#### Exercise 2
Create a new 3x2x4 array of ones and make sure they're floating point numbers.

In [None]:
### your code here

In [49]:
# haseena
print(np.ones((3,2,4)))

[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]]]


#### Exercise 3
Create a 1-D array of 20 evenly spaced elements between 3. (inclusive) and 10. (exclusive).

In [None]:
### your code here

In [51]:
# Haseena
np.linspace(start=3, stop=10, num=20, endpoint=False)

array([3.  , 3.35, 3.7 , 4.05, 4.4 , 4.75, 5.1 , 5.45, 5.8 , 6.15, 6.5 ,
       6.85, 7.2 , 7.55, 7.9 , 8.25, 8.6 , 8.95, 9.3 , 9.65])

#### Exercise 4
Create a matrix with the values (2, 4, 9) on the third offset diagonal and 0 everywhere else.

In [None]:
### your code here

In [52]:
# Haseena
np.diag([2,4,9], k=3)

array([[0, 0, 0, 2, 0, 0],
       [0, 0, 0, 0, 4, 0],
       [0, 0, 0, 0, 0, 9],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

#### Exercise 5
You want to simulate a coin toss with a binomial distribution (`np.random.binomial`). If this is a fair coin, then the probability of getting heads is `p=0.5`. If you toss the coin `n=100` times, how often does your simulation toss heads?

In [None]:
### your code here

In [53]:
np.random.binomial(n=100,p=0.5)

38

## Manipulating arrays

### Indexing
We can index elements in an array using square brackets and indices:

In [54]:
# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
print(v)
print(v[0]) # indexing

[1 2 3 4]
1


In [57]:
M = np.random.randint(low=1, high=10, size=[3,3])
print(M)
# M is a matrix, or a 2 dimensional array, taking two indices 
print(M[1,1]) # row and column indices both start at 0
# the output changes on every run because the numbers are random
print(M[0,1])


[[7 4 8]
 [7 6 8]
 [2 3 3]]
6
4


In [58]:
print(M[0,2])

8


In [59]:
M = np.random.randint(low=1, high=10, size=[2,3,3])
print(M)
print(M[0, 2, 1]) # which set, row, column

[[[9 2 1]
  [4 1 8]
  [9 8 4]]

 [[8 2 8]
  [4 3 2]
  [9 8 9]]]
8


### Slicing
Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```

Here `start` is inclusive and `stop` is exclusive, following the usual Python convention. If any of these are unspecified, they default to the values ``start=0``, ``stop=<size_of_dimension>``, ``step=1``. 

In [60]:
v = np.arange(10) # 1d array
print(v) # print all
print(v[3:7])
print(v[5:]) # from index 5 to the end of the array
print(v[:6]) # from the start up to (not including) index 6
print(v[1:10:3])
print(v[::2])

[0 1 2 3 4 5 6 7 8 9]
[3 4 5 6]
[5 6 7 8 9]
[0 1 2 3 4 5]
[1 4 7]
[0 2 4 6 8]


In [63]:
(v[3:8]) # slicing returns the selected part as an array

array([3, 4, 5, 6, 7])

The second `:` is unnecessary if no step is specified.

In [None]:
print(v[2:5]) # step defaults to 1
print(v[2:5:1])
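
As with Python lists, slice bounds may be negative (counted from the end) and the step may be negative (walking backwards). The following short sketch illustrates both on the same kind of 1-D array used above:

```python
import numpy as np

v = np.arange(10)

print(v[-3:])     # [7 8 9]  last three elements
print(v[:-2])     # [0 1 2 3 4 5 6 7]  everything except the last two
print(v[::-1])    # [9 8 7 6 5 4 3 2 1 0]  negative step: reverse order
print(v[8:2:-2])  # [8 6 4]  from index 8 down to (not including) index 2
```

Note that with a negative step the defaults for `start` and `stop` swap ends, which is why `v[::-1]` covers the whole array.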

Like before, we can index multidimensional arrays by using slices for each dimension.

In [66]:
M = np.random.randint(low=1, high=10, size=(5, 5))
print(M)
print()
print(M[0:2, 3:5]) # first the rows to slice, then the columns to slice
print()
print(M[::2, 0:2]) # every second row, first two columns
print()
print(M[:2, 0:2]) # rows up to (not including) index 2, first two columns

[[9 7 2 4 8]
 [8 5 3 6 8]
 [8 4 9 5 5]
 [3 8 4 7 9]
 [8 3 6 3 5]]

[[4 8]
 [6 8]]

[[9 7]
 [8 4]
 [8 3]]

[[9 7]
 [8 5]]


If we omit an index of a multidimensional array, it assumes all of the following dimensions should be indexed fully. For example, indexing a 2-D matrix with only one index slice will return all columns of the specified rows.

In [None]:
print(M[3]) # for a 2-D array this already selects the whole row
print(M[3, :]) # equivalent, but makes it clearer that M is 2-D
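
The same rule applies to arrays with more than two dimensions: every axis that is left out is taken in full. Here is a small sketch with a 3-D array, where the name `T` and the shape are chosen for illustration:

```python
import numpy as np

T = np.random.randint(low=1, high=10, size=(2, 3, 4))  # 2 blocks of 3 rows and 4 columns

print(T[1].shape)     # (3, 4)  only the first axis indexed: one full block
print(T[1, 2].shape)  # (4,)    first two axes indexed: one full row
print(np.array_equal(T[1], T[1, :, :]))  # True: the explicit form selects the same data
```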

### Boolean Mask
Lastly, we can use boolean masks to select specific values. Masks must have the same shape as the array itself. Note that NumPy automatically converts base Python sequences into NumPy arrays, so a mask can be anything that can be converted into an array, e.g. a (nested) list.

In [67]:
v = np.linspace(start=1, stop=10, num=4, endpoint=True)
# endpoint=True --> the stop value is included
print(v)
print()
print(v[[True, False, True, True]]) # choose which elements to print
# elements marked False are left out

[ 1.  4.  7. 10.]

[ 1.  7. 10.]


Example: select all elements with a value greater than 5.

In [68]:
v>5 # True where the element is greater than 5

array([False, False,  True,  True])

In [69]:
v[v>5] # select the elements greater than 5

array([ 7., 10.])

In [70]:
v[v>-1]

array([ 1.,  4.,  7., 10.])

Indexing with boolean masks will always flatten arrays, i.e. all shape information will be lost. The result is always a 1-D array.

In [71]:
mask = np.array([
    [False, False, False, False, False], 
    [False, False, False, False, False], 
    [True,  False, True, False, False], 
    [True,  True,  False, False, False], 
    [False, False, False, False, False]])
print(M)
print()
print(M[mask])

[[9 7 2 4 8]
 [8 5 3 6 8]
 [8 4 9 5 5]
 [3 8 4 7 9]
 [8 3 6 3 5]]

[8 9 3 8]


In [72]:
M[M>6] # all elements greater than 6

array([9, 7, 8, 8, 8, 8, 9, 8, 7, 9, 8])

In [74]:
M[~(M>6)] # all elements that are not greater than 6

array([2, 4, 5, 3, 6, 4, 5, 5, 3, 4, 3, 6, 3, 5])

We can negate boolean NumPy arrays with `~`:

In [73]:
v = np.arange(5)
mask = np.array([True, True, False, False, False])
print(v[mask])
print(v[~mask]) # selects the elements where mask is False

[0 1]
[2 3 4]
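
Besides negation with `~`, boolean arrays can be combined element-wise with `&` (and) and `|` (or). A minimal sketch, with illustrative variable names:

```python
import numpy as np

v = np.arange(10)

# Combine element-wise conditions with & (and) and | (or).
# The parentheses are required because & and | bind more tightly than comparisons.
between = v[(v > 2) & (v < 7)]
outside = v[(v <= 2) | (v >= 7)]

print(between)  # [3 4 5 6]
print(outside)  # [0 1 2 7 8 9]
```

Python's `and` and `or` keywords do not work here, because they try to reduce each whole array to a single truth value.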


### Assigning Values to Subarrays

We can assign new values to elements in an array using any of the indexing methods shown above.

In [77]:
M = np.zeros((5, 5), dtype=np.int8)
M[0,0] = 1 # replace the 0 at this specific position with 1
# changes a single element of the array
M[4,3] = 1
print(M)

[[1 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 1 0]]


In [78]:
# also works for rows and columns
M[1,:] = 2
M[:,2] = 3
print(M)

[[1 0 3 0 0]
 [2 2 3 2 2]
 [0 0 3 0 0]
 [0 0 3 0 0]
 [0 0 3 1 0]]


In [79]:
# simultaneous assignment of subarray
# assign to several rows and columns simultaneously
M[3:5, 2:5] = 4
print(M)

[[1 0 3 0 0]
 [2 2 3 2 2]
 [0 0 3 0 0]
 [0 0 4 4 4]
 [0 0 4 4 4]]


Even though boolean masks flatten outputs when used for selection, they can be used to assign values while retaining the shape.

In [80]:
mask = np.array([
    [False, False, False, False, False], 
    [False, False, False, False, False], 
    [True,  False, True,  False, False], 
    [True,  True,  False, False, False], 
    [False, False, False, False, False]])
M[mask] = 5
print(M) # every position where the mask is True is now 5

[[1 0 3 0 0]
 [2 2 3 2 2]
 [5 0 5 0 0]
 [5 5 4 4 4]
 [0 0 4 4 4]]


Assigned values are broadcast to the necessary shape according to the broadcasting rules covered in the Broadcasting section. For assignment with indices or slices, this means they are broadcast to the shape of the subarray.

In [81]:
M[0:2, 3:5] = np.array([[-1, -2], [-3, -4]])
print(M) # replace that portion with the new subarray
# the shapes have to match

[[ 1  0  3 -1 -2]
 [ 2  2  3 -3 -4]
 [ 5  0  5  0  0]
 [ 5  5  4  4  4]
 [ 0  0  4  4  4]]


For boolean masks, the assigned values must be either a scalar, i.e. a 0-D array, or a 1-D array with one entry for each `True` in the mask. Note that after assignment, the original shape of the array is retained.

In [82]:
mask = np.array([
    [True,  False, True,  False, True ], 
    [False, True,  False, True,  False], 
    [False, False, False, False, False], 
    [False, False, False, False, False], 
    [False, False, False, False, False]]) # a hand-written mask
M[mask] = [10, 11, 12, 13, 14] 
print(M) # the True positions are filled with the new elements

[[10  0 11 -1 12]
 [ 2 13  3 14 -4]
 [ 5  0  5  0  0]
 [ 5  5  4  4  4]
 [ 0  0  4  4  4]]

In [83]:
np.where(mask) # find the position of every True in the mask
# first array: row indices, second array: column indices

(array([0, 0, 0, 1, 1], dtype=int64), array([0, 2, 4, 1, 3], dtype=int64))
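
Assigning a 1-D sequence through a boolean mask only works when the number of values equals the number of `True` entries in the mask. The sketch below, using a small made-up array and mask, shows how to count the required values and what happens on a mismatch:

```python
import numpy as np

M = np.zeros((3, 3), dtype=np.int8)
mask = np.array([
    [True,  False, False],
    [False, True,  False],
    [False, False, True ]])

print(mask.sum())        # 3: number of True entries, i.e. how many values are needed

M[mask] = [7, 8, 9]      # exactly one value per True entry, filled in row-major order
print(M)

try:
    M[mask] = [1, 2]     # too few values for three True entries
except ValueError as err:
    print("ValueError:", err)
```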


### Exercises
Unless otherwise stated, the exercises below are based on the following array. Keep in mind, with regard to the phrasing, that Python begins indexing at 0, i.e. the 'first' element is the element with index 0.

In [None]:
np.random.seed(100)
M = np.random.randint(low=-5, high=5, size=(5, 5))
print(M)

#### Exercise 1
Extract the third column of the matrix `M`

In [None]:
### your code here

In [85]:
# Haseena
M[:,2]

array([11,  3,  5,  4,  4], dtype=int8)

#### Exercise 2
Extract only the odd-indexed rows and columns, i.e. those with indices 1 and 3, of `M`

In [None]:
### your code here

In [88]:
M

array([[10,  0, 11, -1, 12],
       [ 2, 13,  3, 14, -4],
       [ 5,  0,  5,  0,  0],
       [ 5,  5,  4,  4,  4],
       [ 0,  0,  4,  4,  4]], dtype=int8)

In [97]:
M[1::2,1::2]

array([[13, 14],
       [ 5,  4]], dtype=int8)

#### Exercise 3
Extract the positive values of the matrix `M`

In [None]:
### your code here

In [98]:
M[M>0]

array([10, 11, 12,  2, 13,  3, 14,  5,  5,  5,  5,  4,  4,  4,  4,  4,  4],
      dtype=int8)

#### Exercise 4
Replace all negative values of matrix `M` with 0

In [None]:
### your code here

In [99]:
M[M<0] = 0

In [100]:
M

array([[10,  0, 11,  0, 12],
       [ 2, 13,  3, 14,  0],
       [ 5,  0,  5,  0,  0],
       [ 5,  5,  4,  4,  4],
       [ 0,  0,  4,  4,  4]], dtype=int8)

How to create a boolean mask:

In [102]:
# convert the array into a boolean mask
# elements equal to 0 become True, all others False
mask2 = M==0

In [103]:
mask2

array([[False,  True, False,  True, False],
       [False, False, False, False,  True],
       [False,  True, False,  True,  True],
       [False, False, False, False, False],
       [ True,  True, False, False, False]])

In [104]:
mask3 = M!=0

In [105]:
mask3

array([[ True, False,  True, False,  True],
       [ True,  True,  True,  True, False],
       [ True, False,  True, False, False],
       [ True,  True,  True,  True,  True],
       [False, False,  True,  True,  True]])

#### Exercise 5
We can use arrays to represent images. The function `create_stick_figure()` returns an array representing a grayscale image. The function `show_image(arr)` displays the array `arr` as an image. Use the array manipulation techniques we've learned so far to perform the following tasks.

**a)** Remove unnecessary (black) background pixels on any side of the stick figure. <br/>
**b)** Subset the trimmed image into three parts: one containing only the head, one containing the torso and arms, and one containing only the legs.<br/>
**c)** Remove the arms in the original image by setting all pixels (array entries) corresponding to arms to black.

In [101]:
import matplotlib.pyplot as plt  # needed by show_image below

def create_stick_figure():
    # Set grayscale values. 1 == white and 0 == black
    head_val = 1
    body_val = 0.25
    arms_val = 0.5
    legs_val = 0.75
    # Create array
    arr = np.zeros((21, 21))
    # head
    arr[(1, 5), 9:12] = head_val
    arr[2:5, (8, 12)] = head_val 
    # body
    arr[6:14, 10] = body_val
    # arms
    arr[7, 9:12] = arms_val
    arr[8, (8, 12)] = arms_val
    arr[9:12, (7, 13)] = arms_val
    # legs
    arr[14, (9, 11)] = legs_val
    arr[15:20, (8, 12)] = legs_val
    return arr

def show_image(arr):
    # Display the array as a grayscale image with a tick every second pixel
    plt.imshow(arr, cmap='gray', vmin=0, vmax=1)
    plt.xticks(np.arange(arr.shape[1], step=2))
    plt.yticks(np.arange(arr.shape[0], step=2))
    
# Demo code
image = create_stick_figure()
show_image(image)

In [None]:
### your code here
# a)

In [None]:
### your code here
# b)

In [None]:
### your code here
# c)

## Array Operations
Apart from manipulating array contents directly, we can also perform operations on them, such as logical, arithmetic, or aggregative operations.

### Logical Operations
Logical operations on NumPy arrays evaluate a condition on every individual entry and return boolean arrays of the same shape as the original array.

In [106]:
M = np.random.randint(low=-10, high=10, size=(5, 5))
print(M)
print()
print(M >= 0) # non-negative values = True, negative values = False

[[ -6  -4   7  -5  -6]
 [ -2 -10   8 -10  -8]
 [ -1  -2   3   1   4]
 [  1   4  -5   6  -2]
 [ -6  -5  -9  -9  -7]]

[[False False  True False False]
 [False False  True False False]
 [False False  True  True  True]
 [ True  True False  True False]
 [False False False False False]]


We can, of course, use the resulting boolean array as a selection mask:

In [107]:
print(M[M >= 0]) # result of mask is always 1D array

[7 8 3 1 4 1 4 6]


and to assign new values to the elements greater than 0:

In [108]:
M[M > 0] = 20
print(M)

[[ -6  -4  20  -5  -6]
 [ -2 -10  20 -10  -8]
 [ -1  -2  20  20  20]
 [ 20  20  -5  20  -2]
 [ -6  -5  -9  -9  -7]]


In [110]:
M > 0

array([[False, False,  True, False, False],
       [False, False,  True, False, False],
       [False, False,  True,  True,  True],
       [ True,  True, False,  True, False],
       [False, False, False, False, False]])

In [111]:
(M>5).any() # is any value > 5? True if yes, otherwise False

True

In [112]:
(M>5).all() # are all values > 5? True if yes, otherwise False

False

In [109]:
if M >5:
    print('yes')
else:
    print('No') # a plain if/else does not work on an array; use any/all

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [115]:
if (M >5).any():
    print('yes')
else:
    print('No') # works once any/all is added

yes


When using boolean arrays in conditions, for example `if` statements and other boolean expressions, we need to reduce them to a single value with `any` or `all`, which check whether at least one or all of the elements in the array evaluate to `True`:

In [116]:
M = np.array([[ 1,  4],[ 9, 16]])
print(M)
print()
print((M > 5).any())
print()
print((M > 5).all())

[[ 1  4]
 [ 9 16]]

True

False


Base Python doesn't play well with boolean arrays consisting of multiple values.

In [None]:
# Uncomment to run and see Exception
# if M > 5:
#     print("Hello World")

In [117]:
#any
if (M > 5).any():
    print("At least one element in M is larger than 5")
else:
    print("No element in M is larger than 5")

At least one element in M is larger than 5


In [118]:
#all
if (M > 5).all():
    print("All elements in M are larger than 5")
else:
    print("Not all elements in M are larger than 5")

Not all elements in M are larger than 5
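
The methods `.any()` and `.all()` also exist as the functions `np.any` and `np.all`, and because `True` counts as 1, summing a boolean array tells us how many elements satisfy a condition. A short sketch using the same small matrix as above:

```python
import numpy as np

M = np.array([[1, 4], [9, 16]])
condition = M > 5

# Function form, equivalent to condition.any() and condition.all()
print(np.any(condition))  # True
print(np.all(condition))  # False

# Counting how many elements satisfy the condition: True counts as 1
print(condition.sum())    # 2
```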


### Arithmetic
Arithmetic operations on NumPy arrays are performed on an element-by-element basis. We can either perform this arithmetic between an array and a scalar, i.e. a single number, or between two arrays.

In the case of a scalar, the identical operation is applied to every single array entry.

In [119]:
v1 = np.arange(0, 5)
v1

array([0, 1, 2, 3, 4])

In [120]:
v1 * 2 # scalar so applied on every element in the array

array([0, 2, 4, 6, 8])

In [121]:
v1 + 2

array([2, 3, 4, 5, 6])

In [122]:
A = np.random.randint(low=-5, high=5, size=(3, 3))
print(A)
print()
print(A * 2)
print()
print(A ** 2)

[[-2  1 -3]
 [-2  4 -1]
 [ 1 -2 -3]]

[[-4  2 -6]
 [-4  8 -2]
 [ 2 -4 -6]]

[[ 4  1  9]
 [ 4 16  1]
 [ 1  4  9]]


When we add, subtract, multiply and divide arrays with each other, the default behaviour is element-wise operations:

In [123]:
v1 = np.arange(start=5, stop=10)
v2 = np.arange(start=0, stop=5)
print(v1)
print(v2)
print(v1 + v2) # applied element by element
print(v1 ** v2)

[5 6 7 8 9]
[0 1 2 3 4]
[ 5  7  9 11 13]
[   1    6   49  512 6561]
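
Because `*` works element by element, it is not the matrix product from linear algebra. For 2-D arrays the matrix product is available through the `@` operator. The following sketch contrasts the two on small hand-picked matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

print(A * B)   # element-wise product: each entry multiplied with its counterpart
# [[0 2]
#  [3 0]]

print(A @ B)   # matrix product: rows of A combined with columns of B
# [[2 1]
#  [4 3]]
```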


### Aggregative Functions
We can also aggregate over arrays using several built-in functions, such as sum, mean, min, and max. For example,

In [124]:
A = np.random.randint(low=0, high=10, size=(2, 2))
print(A)
print()
print(np.sum(A)) # sum of all elements in the array

[[6 0]
 [3 2]]

11


NumPy provides many aggregation functions, but we won't discuss them in detail here.
Additionally, most aggregates have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value.
Some of these ``NaN``-safe functions were not added until NumPy 1.8, so they will not be available in older NumPy versions.

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |

In [125]:
3 + np.nan + 6 # any calculation that involves NaN returns NaN

nan

In [127]:
A = np.array([[1, 2], [3, np.nan]])
print(A)

[[ 1.  2.]
 [ 3. nan]]


In [128]:
print(np.sum(A))  # a single NaN makes the whole sum NaN

nan


"`NaN`-safe" means that the function ignores any missing values, e.g.

In [129]:
print(np.nansum(A)) # carries out the operation and ignores the missing value

6.0
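
The other `NaN`-safe functions from the table behave the same way. A short sketch, using an illustrative array `data` with one missing value:

```python
import numpy as np

data = np.array([[1.0, 2.0], [3.0, np.nan]])

print(np.mean(data))     # nan: the regular aggregate propagates NaN
print(np.nanmean(data))  # 2.0: mean of the three valid values 1, 2, 3
print(np.nanmax(data))   # 3.0
print(np.nansum(data, axis=0))  # [4. 2.]: column sums, NaN treated as 0
```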


In [139]:
A = np.random.randint(low=0, high=10, size=(2,2))
print(A)
print()
print(np.max(A)) # give maximum value

[[5 0]
 [2 6]]

6


We can apply these functions either to entire arrays or along individual axes. To understand how the `axis` parameter works, it helps to stop thinking of arrays as rows and columns and to think of them as nested lists instead. `axis=0` performs the operation along the outer-most dimension, e.g. if

$$A = \begin{bmatrix} 1 & 5 \\ 2 & 2 \end{bmatrix}$$

then the two inner arrays (1, 5) and (2, 2) are added together element-wise, resulting in (3, 7). For `axis=1`, the elements within each inner array are added together, i.e. (1 + 5, 2 + 2) = (6, 4).

In [136]:
A = np.array([[1, 5], [2, 2]])
print(A)
print()
print(np.sum(A, axis=0)) # axis=0--> sum of every column
print()
print(np.sum(A, axis=1)) # axis=1--> sum of every row

[[1 5]
 [2 2]]

[3 7]

[6 4]


In [138]:
np.mean(A, axis = 1) # average of every row

array([3., 2.])
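
The nested-list view of `axis` carries over to more than two dimensions. As a sketch with an illustrative 3D array `T`, aggregating along an axis removes that axis from the shape of the result:

```python
import numpy as np

# Illustrative 3D array: 2 blocks, each with 3 rows and 4 columns
T = np.arange(24).reshape(2, 3, 4)

# Aggregating along an axis removes that axis from the result's shape
print(np.sum(T, axis=0).shape)  # (3, 4)
print(np.sum(T, axis=1).shape)  # (2, 4)
print(np.sum(T, axis=2).shape)  # (2, 3)

# axis=0 adds the two outer-most blocks together element-wise
print(np.array_equal(np.sum(T, axis=0), T[0] + T[1]))  # True
```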

### Vectorization

Vectorization in NumPy refers to the implementation of mathematical operations in compiled C code rather than interpreted Python code. This provides a substantial performance boost. Furthermore, due to NumPy's more intuitive treatment of array arithmetic, as much of a program's math as possible should be formulated in terms of NumPy operations. Many packages, like Pandas, SciPy, and Scikit-Learn, make use of this vectorization.

In [144]:
# More intuitive treatment of arrays versus lists
lis = [1,2,3,4,5]
print(lis + lis) # lists are just concatenated

ary = np.array(lis)
print(ary + ary) # arrays are added element by element

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
[ 2  4  6  8 10]


Achieving the same result in base Python requires a loop over the list, e.g. a `for` loop or a list comprehension.

In [145]:
lis

[1, 2, 3, 4, 5]

In [146]:
[x+x for x in lis]

[2, 4, 6, 8, 10]

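This takes substantially longer. In the timing below, the vectorized NumPy addition is roughly 90 times faster than the list comprehension for 10,000 elements; the exact factor depends on the machine and on the size of the data.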

In [147]:
lis = range(10000)
ary = np.array(lis)
# %timeit gives running time of each command
%timeit [x+x for x in lis]
%timeit ary + ary # vectorization takes shorter time

2.65 ms ± 531 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
29.5 µs ± 5.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


We call operations on NumPy arrays **vectorized**. This feature is the reason NumPy sits at the base of so many numerical and scientific Python libraries, e.g. SciPy, Scikit-Learn, and Pandas.
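
Vectorization applies to mathematical functions as well as to operators. The sketch below uses illustrative data and compares an explicit Python loop with the equivalent NumPy expression; both give the same values.

```python
import math
import numpy as np

values = [0.0, 1.0, 4.0, 9.0, 16.0]

# Pure-Python version: an explicit loop over every element
looped = [math.sqrt(v) + 1 for v in values]

# Vectorized version: one expression, the loop runs inside NumPy's compiled code
arr = np.array(values)
vectorized = np.sqrt(arr) + 1

print(looped)      # [1.0, 2.0, 3.0, 4.0, 5.0]
print(vectorized)  # [1. 2. 3. 4. 5.]
```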

### Broadcasting: how NumPy performs arithmetic between arrays of different shapes
If we can add arrays together element-wise, then we also need rules for the case where their shapes don't match. For example, we want `np.array([1,2,3]) * 2` to equal `np.array([2,4,6])`. That means the scalar 2 needs to be *broadcast* to the same shape as the array. NumPy defines three rules for broadcasting that determine how binary operations, e.g. addition, subtraction, multiplication, division, or exponentiation, are performed on arrays of different shapes.

![Broadcasting](../images/broadcasting.png)

- **Rule 1:** If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is *padded* with ones on its leading (left) side.
- **Rule 2:** If in any dimensions the sizes disagree and one of the arrays has a size of 1 in that dimension then that array is stretched to match the other shape.
- **Rule 3:** If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
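
The three rules can be sketched as a small function that works on shapes alone. `broadcast_shape` is a hypothetical helper written for illustration, not a NumPy function; the last line cross-checks one result against the shape NumPy itself produces.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        if size_a == size_b:
            result.append(size_a)
        elif size_a == 1 or size_b == 1:
            # Rule 2: the dimension of size 1 is stretched to the other size
            result.append(size_b if size_a == 1 else size_a)
        else:
            # Rule 3: the sizes disagree and neither is 1
            raise ValueError(
                "shapes {} and {} cannot be broadcast".format(shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((3, 1), (3,)))       # (3, 3)
print(broadcast_shape((1, 3), (3, 3)))     # (3, 3)
print(broadcast_shape((2, 1, 5), (4, 1)))  # (2, 4, 5)

# Cross-check against the shape NumPy itself produces
print((np.ones((2, 1, 5)) + np.ones((4, 1))).shape)  # (2, 4, 5)
```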

#### Rule 1
Let's take a moment to highlight a very important distinction between 1D arrays and 2D arrays with a single row. These may look and behave similarly but are, in fact, quite different.

In [148]:
a1 = np.ones(5)
print(a1)
print("a1 shape: {}".format(a1.shape)) # 1d array
print()
a2 = np.ones((1, 5))
print(a2)
print("a2 shape: {}".format(a2.shape)) # 2D array, even though it has only one row

[1. 1. 1. 1. 1.]
a1 shape: (5,)

[[1. 1. 1. 1. 1.]]
a2 shape: (1, 5)


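In [None]:
# Rule 1:
# a 2D array and a 1D array cannot be added as they are;
# the 1D array is first treated as a 2D array, then the addition can proceed

In [None]:
# pad a 1 onto the left side of the shape:
# (5,) --> (1, 5)
# after padding, the 1D array has the same layout as the 2D array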

The same applies to higher-dimensional arrays with more "padded dimensions", e.g.

In [None]:
a3 = np.ones((1, 1, 1, 5)) # extra dimensions of size 1 on the left side
print(a3) # the values are the same, only the number of dimensions changes
print("a3 shape: {}".format(a3.shape))

We will see later in this chapter that they behave differently with regard to indexing and stacking. For now, simply keep in mind that they have different dimensions despite containing the same data arranged in the same way, i.e. a single row.

With this in mind, the following example highlights rule 1. Array `b` has a single dimension, so its shape must first be padded with a leading dimension of size 1 before it is added, element-wise, to array `a`.

In [153]:
a = np.ones((1, 3), dtype=int)
b = np.arange(1, 4) # arange fills the array with the values 1 to 3
# to check the shape, count the square brackets
print(a) # 2D array --> 1 row, 3 columns
print("")
print(b) # 1D array --> 3 elements, (3,) --> (1, 3) [padded on the left]
print("")
print(a + b) # NumPy applies the broadcasting automatically

[[1 1 1]]

[1 2 3]

[[2 3 4]]


In [152]:
print(a.shape)
print(b.shape)

(1, 3)
(3,)


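Scalars are a special case of this: a scalar has no dimensions at all, so its shape is padded with ones and then stretched to match the array.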

In [None]:
M = np.ones((3, 3))
print(M)
print()
print(M + 5)

#### Rule 2
Both arrays have the same number of dimensions, but while `a` has 3 rows, `b` has only one. Therefore, `b` is stretched to have 3 rows and the two resulting matrices are added together element-wise.

If the shapes differ in a dimension, the array with size 1 in that dimension has to be "stretched" to match.

In [155]:
a = np.zeros((3, 3), dtype=np.int8)
b = np.array([[1, 2, 3]])

print(a)
print("a shape: {}".format(a.shape))
print("")
print(b) # (1, 3) is stretched so that it becomes (3, 3) as well
print("b shape: {}".format(b.shape))
print("")
print(a + b)

[[0 0 0]
 [0 0 0]
 [0 0 0]]
a shape: (3, 3)

[[1 2 3]]
b shape: (1, 3)

[[1 2 3]
 [1 2 3]
 [1 2 3]]


In this example, `a` and `b` don't match in either dimension, but for each one, at least one of the arrays has a size of 1 so that they can be stretched accordingly. The result is a 3x3 matrix.

In [156]:
a = np.array([[1], [2], [3]])
b = np.array([[1, 2, 3]])

print(a) # stretch by column
print("a shape: {}".format(a.shape))
print("")
print(b) # stretch by row
print("b shape: {}".format(b.shape))
print("")
print(a + b)

[[1]
 [2]
 [3]]
a shape: (3, 1)

[[1 2 3]]
b shape: (1, 3)

[[2 3 4]
 [3 4 5]
 [4 5 6]]


Conceptually, NumPy expands this to the following (without actually copying the data):

    1 1 1     1 2 3     2 3 4
    2 2 2  +  1 2 3  =  3 4 5
    3 3 3     1 2 3     4 5 6
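
To look at the stretched operands directly, we can spell out the expansion with `np.broadcast_to`. This is only a sketch for inspection; ordinary arithmetic does this implicitly, and `np.broadcast_to` returns read-only views rather than copies.

```python
import numpy as np

a = np.array([[1], [2], [3]])  # shape (3, 1)
b = np.array([[1, 2, 3]])      # shape (1, 3)

# Spell out the stretched operands explicitly
a_stretched = np.broadcast_to(a, (3, 3))
b_stretched = np.broadcast_to(b, (3, 3))

print(a_stretched)
print(b_stretched)
print(np.array_equal(a_stretched + b_stretched, a + b))  # True
```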

#### Rule 3
Here, the second dimension doesn't match, and neither of the arrays has a size of 1 in it. NumPy has no way to resolve this and raises an exception.

In [158]:
# Rule 3 --> neither size is 1, so the arrays cannot be added
a = np.ones((3, 2))
b = np.random.randint(low=1, high=10, size=(3, 3))

print(a)
print("a shape: {}".format(a.shape))
print("")
print(b)
print("b shape: {}".format(b.shape))
print("")
# The next line raises a ValueError
print(a + b)

[[1. 1.]
 [1. 1.]
 [1. 1.]]
a shape: (3, 2)

[[7 1 6]
 [9 3 7]
 [8 1 2]]
b shape: (3, 3)



ValueError: operands could not be broadcast together with shapes (3,2) (3,3) 
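
A common way to run into rule 3 is to combine a matrix with a 1D array that holds one value per row. The sketch below uses illustrative names `M` and `offsets`; it catches the error and then shows one possible fix, reshaping the 1D array so that the mismatching dimension has size 1.

```python
import numpy as np

M = np.ones((3, 2))
offsets = np.array([10, 20, 30])  # shape (3,): one value per row of M

try:
    M + offsets  # (3, 2) vs (3,) padded to (1, 3): sizes 2 and 3 disagree
except ValueError as err:
    print("ValueError:", err)

# Reshaping to (3, 1) makes the mismatching dimension equal to 1,
# so rule 2 can stretch it across the two columns
result = M + offsets.reshape(3, 1)
print(result)
```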

### Further Reading
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html

### Exercises

#### Exercise 1
Create two 8x8 arrays of random integers. The first should have only negative numbers between -10 and -1 (inclusive) and the second should have only positive numbers between 1 and 10 (inclusive). Add them together and save the result as a variable `A`.

In [None]:
### Your code here

In [167]:
# Haseena
b = np.random.randint(low=-10, high=0, size=(8, 8))
c = np.random.randint(low=1, high=11, size=(8, 8))
a = b + c
print(a)



[[ 9 -7  4  8 -4 -4  1  0]
 [-2  5  1 -1 -2 -4  0 -3]
 [ 5 -4 -1  2 -4  0  3 -8]
 [-1  5 -5  3  1 -5  1 -1]
 [-2  2 -1  7  3  5 -1 -5]
 [-1  6 -9 -4 -1 -8  5  5]
 [-7  3  7  0  0  6  3  0]
 [-8 -4  4  1 -4 -3  3 -9]]


#### Exercise 2
Calculate the mean of the entire matrix `A`.

In [None]:
### Your code here

In [169]:
# Haseena
np.mean(a)

-0.234375

#### Exercise 3
How many of the entries of the resulting matrix `A` are positive, negative, and zero?

In [173]:
np.sum([True,False,True,False,False]) # counts how many values are True

2

In [174]:
np.sum([a>0]) # 27 elements are greater than 0

27

In [175]:
len(a[a>0]) # another way to count them

27

In [178]:
# Haseena
print('number of positive entries:',np.sum([a>0]),
      'number of negative entries:',np.sum([a<0]),
      'number of zero:',np.sum([a==0]))

number of positive entries: 27 number of negative entries: 31 number of zero: 6


#### Exercise 4
Calculate the mean of every row and every column of the matrix `A`.

In [179]:
a

array([[ 9, -7,  4,  8, -4, -4,  1,  0],
       [-2,  5,  1, -1, -2, -4,  0, -3],
       [ 5, -4, -1,  2, -4,  0,  3, -8],
       [-1,  5, -5,  3,  1, -5,  1, -1],
       [-2,  2, -1,  7,  3,  5, -1, -5],
       [-1,  6, -9, -4, -1, -8,  5,  5],
       [-7,  3,  7,  0,  0,  6,  3,  0],
       [-8, -4,  4,  1, -4, -3,  3, -9]])

In [181]:
# Haseena
print('mean of each column:',np.mean(a, axis = 0))
print('mean of each row:',np.mean(a, axis = 1))

mean of each column: [-0.875  0.75   0.     2.    -1.375 -1.625  1.875 -2.625]
mean of each row: [ 0.875 -0.75  -0.875 -0.25   1.    -0.875  1.5   -2.5  ]


#### Exercise 5
Make use of broadcasting rules to multiply the first row of the following array by 2, the second row by 3 and the third row by 4.

In [None]:
### Your code here
A = np.array([
    [1, 2, 3], 
    [1, 2, 3], 
    [1, 2, 3]])

## Advanced Manipulation 

### Reshaping and Transposing
In memory, a NumPy array stores its values separately from its shape. That means we can usually change the shape of an array very quickly, regardless of its actual size, because only the shape information changes and the values are not copied. The array dimensions must match, i.e. the new shape must have space for exactly as many elements as the old shape.

Elements are reshaped in an "inside-out" fashion. That means the inner-most dimensions are filled with values first and then combined in the outer dimensions. In the context of 2D matrices, this means that values are filled in row by row.

In [183]:
a = np.arange(12)
print(a)
print()
print(a.reshape(3, 4)) # 3 * 4 = 12 elements
print()
print(a.reshape(6, 2)) # 6 * 2 = 12 elements

[ 0  1  2  3  4  5  6  7  8  9 10 11]

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]]


In [184]:
# the product of the new shape's dimensions must also be 12
a.reshape(2,2,3) # 2 blocks, each with 2 rows and 3 columns

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
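
Two details of `reshape` are worth a quick sketch: a dimension given as `-1` is inferred from the number of elements, and for a contiguous array like the one here the reshaped result is a view that shares its values with the original rather than a copy.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of elements
b = a.reshape(3, -1)
print(b.shape)  # (3, 4)

# For a contiguous array like this one, reshape returns a view, not a copy
print(np.shares_memory(a, b))  # True
b[0, 0] = 99
print(a[0])  # 99: the change is visible through the original array

# A shape that cannot hold exactly 12 elements is rejected
try:
    a.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```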

Alternatively, we can also transpose matrices. Transposing means that the order of the dimensions is reversed, i.e. the first dimension becomes the last, the last becomes the first, etc. Consequently, **transposing 1D arrays has no effect**.

In [185]:
print(a)
print(a.transpose()) # transposing a 1D array gives the same result

[ 0  1  2  3  4  5  6  7  8  9 10 11]
[ 0  1  2  3  4  5  6  7  8  9 10 11]


Transposing 2D arrays means that rows become columns and columns become rows.

In [188]:
A = np.arange(15).reshape(3, 5)
print(A)
print(A.shape)
print()
print(A.transpose()) # columns become rows
print(A.transpose().shape)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
(3, 5)

[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]
(5, 3)


For higher-dimensional arrays, the order of the dimensions reverses. The values follow their axes: in a 3D array, for example, the element at index (i, j, k) of the original ends up at index (k, j, i) of the transposed array.

In [191]:
A = np.random.randint(low=-5, high=5, size=(2, 3, 4, 5, 6))
print(A.shape)
print()
print(A.transpose().shape)
# transpose reverse order of shape

(2, 3, 4, 5, 6)

(6, 5, 4, 3, 2)


Alternatively, we can also define how we want to reorder the dimensions.

In [None]:
print(A.shape) # for a high-dimensional array
print()
print(A.transpose((0, 1, 2, 4, 3)).shape)
# transpose in the order we specify

To help you understand what is happening here, it is easiest to picture this as creating an empty array with a specified shape and then filling it with the values of the original array, even though this isn't actually what happens "under the hood".
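
A small sketch makes the index bookkeeping concrete. The array below is illustrative; the axis order `(0, 2, 1)` swaps only the last two axes.

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
B = A.transpose()           # same as A.T: axes reversed, shape (4, 3, 2)
C = A.transpose((0, 2, 1))  # swap only the last two axes, shape (2, 4, 3)

print(B.shape, C.shape)  # (4, 3, 2) (2, 4, 3)

# The element at index (i, j, k) of A sits at (k, j, i) in B and (i, k, j) in C
i, j, k = 1, 2, 3
print(A[i, j, k], B[k, j, i], C[i, k, j])  # 23 23 23
```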

### Adding a new dimension with `newaxis`

With `np.newaxis`, we can insert new dimensions into an array, for example to convert a vector into a column or row matrix.

In [193]:
v = np.arange(5)
v.reshape(1,5) # with reshape, we have to specify the full shape we want

array([[0, 1, 2, 3, 4]])

In [192]:
v = np.arange(5)
print(v)
print(v.shape)
print()
v2 = v[np.newaxis, :] # new axis in the row position; the columns keep the original values
print(v2)
print(v2.shape)
print()
v3 = v[:, np.newaxis] # new axis in the column position
print(v3)
print(v3.shape)
print()

[0 1 2 3 4]
(5,)

[[0 1 2 3 4]]
(1, 5)

[[0]
 [1]
 [2]
 [3]
 [4]]
(5, 1)



This is essentially shorthand for reshaping an array and becomes useful when we don't want to explicitly list the old dimensions of the array, e.g.

In [194]:
# goal: create an array of shape (3, 1, 4)
A = np.random.randint(low=-5, high=5, size=(3, 4))
print(A)
print(A.shape)
print()
A2 = A[:, np.newaxis, :] # no need to specify the existing dimensions
print(A2)
print(A2.shape)
print()
# With reshape we have to explicitly list the old dimensions; the result is the same
A3 = A.reshape(3, 1, 4)
print(A3)
print(A3.shape)

[[ 3 -2  4 -2]
 [-5  2  4  4]
 [ 3 -2 -1  1]]
(3, 4)

[[[ 3 -2  4 -2]]

 [[-5  2  4  4]]

 [[ 3 -2 -1  1]]]
(3, 1, 4)

[[[ 3 -2  4 -2]]

 [[-5  2  4  4]]

 [[ 3 -2 -1  1]]]
(3, 1, 4)
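
`np.newaxis` pairs naturally with broadcasting. As a sketch with an illustrative vector, turning it into a column and a row and adding the two produces a full table of pairwise sums:

```python
import numpy as np

v = np.arange(3)

column = v[:, np.newaxis]  # shape (3, 1)
row = v[np.newaxis, :]     # shape (1, 3)

# Broadcasting stretches both operands to (3, 3)
table = column + row
print(table)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# np.expand_dims is an equivalent, function-style spelling
print(np.array_equal(np.expand_dims(v, axis=1), column))  # True
```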


### Concatenation and Splitting
We can concatenate arrays along a given axis. Concatenation does not increase the number of dimensions. In order to be concatenated, the arrays must have the same number of dimensions, and their shapes must match in every dimension except the one along which they are joined.

In [195]:

A = np.arange(10)
B = np.arange(20, 40).reshape((2,10))
print('A')
print(A)
print(A.shape)
print()
print('B')
print(B)
print(B.shape)
print()

A
[0 1 2 3 4 5 6 7 8 9]
(10,)

B
[[20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]]
(2, 10)



In [196]:
print(np.concatenate((A, A))) # concatenate side by side
print()
print(np.concatenate((B, B))) # by default, axis = 0


[0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9]

[[20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]]


By default, `np.concatenate` will combine arrays along `axis=0`. We can specify the axis along which to concatenate, however.

In [197]:
print(np.concatenate((B, B), axis=1))


[[20 21 22 23 24 25 26 27 28 29 20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39]]


While `np.concatenate` joins arrays along existing axes, `np.stack` combines them along a new axis, so the result has one more dimension than the inputs.

In [198]:
print(A)
print(A.shape)
print()
A2 = np.stack((A, A), axis=0) # new axis in front (default): each input becomes a row
print(A2)
print(A2.shape)
print()
A3 = np.stack((A, A), axis=1) # new axis in second position: each input becomes a column
print(A3)
print(A3.shape)

[0 1 2 3 4 5 6 7 8 9]
(10,)

[[0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]]
(2, 10)

[[0 0]
 [1 1]
 [2 2]
 [3 3]
 [4 4]
 [5 5]
 [6 6]
 [7 7]
 [8 8]
 [9 9]]
(10, 2)
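
One way to see the relationship between the two functions is to rebuild a stack by hand. This sketch assumes the same 1D array `A` as above: each input first gets a new leading axis, and the results are then concatenated along it.

```python
import numpy as np

A = np.arange(10)

stacked = np.stack((A, A), axis=0)  # shape (2, 10)

# Same result by hand: give each input a new leading axis, then concatenate
by_hand = np.concatenate((A[np.newaxis, :], A[np.newaxis, :]), axis=0)

print(stacked.shape, by_hand.shape)      # (2, 10) (2, 10)
print(np.array_equal(stacked, by_hand))  # True
```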


We can also split arrays into sections along a certain axis. We can either specify how many equally sized parts the array should be split into, or specify the exact indices at which to split it.

In [201]:
A = np.arange(12)
print(A)
print()
# Split into 3 equally sized parts
print(np.split(A, 3)) # break the array into sub-arrays
print()
# Split at specific indices:
# split before index 2, again before index 3, and again before index 8
print(np.split(A, (2, 3, 8)))
# Splitting into 5 equally sized parts fails because 12 is not divisible by 5
print(np.split(A, 5)) # raises ValueError
print()

[ 0  1  2  3  4  5  6  7  8  9 10 11]

[array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])]

[array([0, 1]), array([2]), array([3, 4, 5, 6, 7]), array([ 8,  9, 10, 11])]


ValueError: array split does not result in an equal division

Note that NumPy will raise an exception if equally sized parts cannot be created, e.g. an array with 10 numbers cannot be split into 4 equally sized parts.

In [202]:
# The next line raises a ValueError
np.split(np.arange(10), 4) # 10 not divisible by 4

ValueError: array split does not result in an equal division
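
If parts of unequal size are acceptable, `np.array_split` is a related function that tolerates sizes which do not divide evenly. A minimal sketch with the same 10-element array:

```python
import numpy as np

x = np.arange(10)

# np.array_split tolerates sizes that do not divide evenly
parts = np.array_split(x, 4)
print([len(p) for p in parts])  # [3, 3, 2, 2]

# Concatenating the parts restores the original array
print(np.array_equal(np.concatenate(parts), x))  # True
```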

The `axis` argument allows us to determine along which axis to split the array.

In [204]:
print(B)
print()
# split array B into 2 equal parts
print('axis=0',np.split(B, 2, axis=0)) 
print() # axis=0 splits into sub-arrays by row
print('axis=1',np.split(B, 2, axis=1))
# axis=1 splits into sub-arrays by column

[[20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]]

axis=0 [array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]]), array([[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])]

axis=1 [array([[20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]]), array([[25, 26, 27, 28, 29],
       [35, 36, 37, 38, 39]])]


### Exercises

#### Exercise 1
Let x be the array
    
    [[1, 2, 3], 
     [4, 5, 6]].

Convert it to 
    
    [[1 4 2 5 3 6]]

In [208]:
x = np.array([[1,2,3],[4,5,6]])
x.shape

(2, 3)

In [209]:
x.T.reshape(1, 6) # transpose first so that the values are read column by column

array([[1, 4, 2, 5, 3, 6]])

#### Exercise 2
Let x be an array

    [[1, 2, 3]
     [4, 5, 6]]

and y be an array

    [[ 7,  8,  9]
     [10, 11, 12]]

Concatenate x and y so that a new array looks like

    [[1, 2, 3,  7,  8,  9]
     [4, 5, 6, 10, 11, 12]]

In [210]:
y = np.array([[7,8,9],[10,11,12]])
y

array([[ 7,  8,  9],
       [10, 11, 12]])

In [212]:
np.concatenate((x, y), axis=1)

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

#### Exercise 3
Let x be an array [1, 2, 3, ..., 9]. Split x into 3 arrays, each of which has 4, 2, and 3 elements in the original order.

In [213]:
x = np.arange(1,10)
x

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [215]:
print(np.split(x, (4, 6))) # split before index 4 and before index 6

[array([1, 2, 3, 4]), array([5, 6]), array([7, 8, 9])]


#### Exercise 4
Let x be an array [0, 1, 2]. Convert it to

    [[0, 1, 2, 0, 1, 2]
     [0, 1, 2, 0, 1, 2]]

In [None]:
### Your code here

In [216]:
x = np.array([0,1,2])
x

array([0, 1, 2])

In [218]:
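row = np.concatenate((x, x)) # [0, 1, 2, 0, 1, 2]
np.stack((row, row)) # stack two copies of the row along a new first axis

array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])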

## Additional Resources:  
- [NumPy Quickstart Guide](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)  
- [Rahul Dave's CS109 lab1 content at Harvard](https://github.com/cs109/2015lab1)  
- [The Data Incubator](https://www.thedataincubator.com)  
- [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook)