# Welcome to WQD7003 Data Analytics Lab
This notebook was created for the WQD7003 module.

Created by Shier Nee Saw

Reference: Python for Data Analysis (O'Reilly)

# Numpy

NumPy, short for Numerical Python, is the fundamental package for high-performance scientific computing and data analysis in Python. It is the foundation on which nearly all of the higher-level tools, such as pandas, are built.

While NumPy by itself does not provide much high-level data analysis functionality, understanding NumPy arrays and array-oriented computing will help you use tools like pandas much more effectively. If you're new to Python and mainly want to get your hands dirty working with data in pandas, you can skim this section first.

## The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large data sets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.

The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion.

In [None]:
data1 = [6, 7.5, 8, 0, 1]  # data1 is a Python list; lists are written with square brackets []

# convert the list to an ndarray with NumPy's np.array function
import numpy as np

arr1 = np.array(data1)

print(data1)
print(type(data1))
print(arr1)
print(type(arr1))

[6, 7.5, 8, 0, 1]
<class 'list'>
[6.  7.5 8.  0.  1. ]
<class 'numpy.ndarray'>


In [None]:
# Nested sequences, like a list of equal-length lists, are converted into a multidimensional array
data2 = [[1, 2, 3, 4],
         [5, 6, 7, 8]]

arr2 = np.array(data2)

print(data2)
print(type(data2))
print(np.shape(data2))
print('----')
print(arr2)
print(type(arr2))
print(np.shape(arr2))

[[1, 2, 3, 4], [5, 6, 7, 8]]
<class 'list'>
(2, 4)
----
[[1 2 3 4]
 [5 6 7 8]]
<class 'numpy.ndarray'>
(2, 4)
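Besides `shape`, every ndarray also carries `ndim` (number of dimensions) and `dtype` (element type). The minimal sketch below rebuilds `arr1` and `arr2` from the cells above to show how `np.array` infers these; it assumes a 64-bit platform only in the sense that the exact integer width may differ on yours.

```python
import numpy as np

# np.array infers a dtype that can hold every element:
# mixing integers with 7.5 produces float64
arr1 = np.array([6, 7.5, 8, 0, 1])
print(arr1.dtype)       # float64
print(arr1.ndim)        # 1
print(arr1.shape)       # (5,)

# a list of equal-length lists becomes a 2D array
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr2.ndim)        # 2
print(arr2.shape)       # (2, 4)
# 'i' means signed integer; the exact width (int32 or int64) depends on the platform
print(arr2.dtype.kind)  # i
```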


In [None]:
# create arrays filled with zeros (np.zeros) or ones (np.ones)
# create a 1D array of 10 zeros

print(np.zeros(10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [None]:
# create a 2D array of zeros with 3 rows and 6 columns

print(np.zeros((3, 6)))

[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
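The comment above mentions arrays of ones, but only `np.zeros` is shown. A short sketch of `np.ones`, plus `np.zeros_like`, which copies the shape and dtype of an existing array; the variable names are illustrative only.

```python
import numpy as np

# np.ones works like np.zeros, but fills the array with 1.0
ones = np.ones((2, 3))
print(ones.dtype)    # float64 (the default)

# pass dtype to get integer ones instead
int_ones = np.ones(4, dtype=np.int32)
print(int_ones)      # [1 1 1 1]

# np.zeros_like builds a zero array with the same shape and dtype as another array
template = np.arange(6).reshape((2, 3))
z = np.zeros_like(template)
print(z.shape)       # (2, 3)
```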


In [None]:
# create a 3D array with a shape of (2, 3, 2) using np.empty
# np.empty allocates memory without initializing it, so the values are whatever was already in memory (often, but not always, zeros)

np.empty((2,3,2))

array([[[4.92877158e-310, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]],

       [[0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]]])
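Because `np.empty` leaves its contents uninitialized, it is only safe when every element is written before being read. A minimal sketch of that pattern, with `np.full` shown as the alternative when a known starting value is needed; `buf` and `filled` are illustrative names.

```python
import numpy as np

# np.empty is only safe when every element is overwritten before it is read
buf = np.empty(5)
buf[:] = np.arange(5) ** 2   # fill every slot, replacing the uninitialized values
print(buf)

# when a known starting value is needed, np.zeros or np.full is the safer choice
filled = np.full((2, 2), 7.0)
print(filled)
```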

In [None]:
# create an array with evenly spaced values within a given interval.
# https://numpy.org/doc/stable/reference/generated/numpy.arange.html

# create an array from 0 to 14 (the stop value 15 is excluded)
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [None]:
# create an array from 2 to 9 (start=2 is included, stop=10 is excluded)
np.arange(2, 10)

array([2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
# create an array from 2 to 9 with a step of 2

np.arange(2, 10, 2)

array([2, 4, 6, 8])

## Data Types for ndarrays

The data type or dtype is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular type of data:

In [None]:
# create array with float64 dtype

np.array([1,2,3], dtype=np.float64)

array([1., 2., 3.])

In [None]:
# create array with int32 dtype

np.array([1,2,3], dtype=np.int32)

array([1, 2, 3], dtype=int32)
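To see what "interpreting a chunk of memory" means in practice, the dtype records how many bytes each element occupies. A short sketch comparing the two arrays created above:

```python
import numpy as np

f64 = np.array([1, 2, 3], dtype=np.float64)
i32 = np.array([1, 2, 3], dtype=np.int32)

# itemsize: how many bytes one element occupies
print(f64.dtype.itemsize)  # 8
print(i32.dtype.itemsize)  # 4

# nbytes: total memory of the data = number of elements * itemsize
print(f64.nbytes)          # 24
print(i32.nbytes)          # 12
```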

In [None]:
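# check the data type (dtype) of an array
# the default integer dtype is platform dependent; int64 is typical on 64-bit Linux and macOS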

arr = np.array([1,2,3,4,5])
arr.dtype

dtype('int64')

In [None]:
# convert it to float64; astype always returns a new array
float_arr = arr.astype(np.float64)

float_arr.dtype

dtype('float64')

In [None]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

# casting floating-point numbers to an integer dtype truncates them toward zero (the decimal part is dropped)
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10], dtype=int32)
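If truncation is not what you want, one option is to round before casting. This sketch also confirms that `astype` leaves the original array untouched. Note that `np.round` rounds exact halves to the nearest even value, so `0.5` becomes `0`.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

# astype returns a new array; arr itself still holds the original floats
as_int = arr.astype(np.int32)
print(as_int)
print(arr)

# round to the nearest integer first, then cast
# (np.round rounds halves to the nearest even value, so 0.5 becomes 0)
rounded = np.round(arr).astype(np.int32)
print(rounded)   # [ 4 -1 -3  0 13 10]
```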

Should you have an array of strings representing numbers, you can use astype to convert
them to numeric form.

In [None]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0; np.bytes_ works in all versions
numeric_strings.astype(float)


array([ 1.25, -9.6 , 42.  ])
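The conversion only succeeds when every string is a valid number. A minimal sketch of what happens otherwise: `astype` raises a `ValueError`, which you can catch. The array `bad` is a made-up example.

```python
import numpy as np

# a string that does not represent a number cannot be converted
bad = np.array(['1.25', 'abc'])
failed = False
try:
    bad.astype(float)
except ValueError as err:
    failed = True
    print('conversion failed:', err)
```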

## Operations between Arrays and Scalars

Arrays are important because they let you express batch operations on data without writing any for loops. This is usually called vectorization. Any arithmetic operation between equal-size arrays is applied elementwise, and arithmetic with a scalar propagates the scalar to every element:

In [None]:
arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])

print(arr)

[[1. 2. 3.]
 [4. 5. 6.]]


In [None]:
arr*arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [None]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [None]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [None]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])
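To make "vectorization" concrete, here is a minimal sketch comparing an explicit double loop with the equivalent array expression `arr * arr`. The names `looped` and `vectorized` are illustrative only; the point is that both give the same result, but the array version needs no hand-written loop.

```python
import numpy as np

arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])

# explicit Python loop: square each element one at a time
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# vectorized version: one expression, no loop written by hand
vectorized = arr * arr

print(np.array_equal(looped, vectorized))  # True
```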

## Basic Indexing and Slicing

NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:

Indexing in Python starts at 0, which means that the first element in a sequence has an index of 0, the second element has an index of 1, and so on.

In Python, when you slice a sequence such as a list or a string using the *start:stop* notation, the element at the *start* index is included, but the element at the *stop* index is **excluded**. This means that the slice goes up to, but does not include, the element at the stop index.
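A small sketch of the *start:stop* rule, using the same slice on a list and on an array. Because the stop index is excluded, a slice `start:stop` always contains `stop - start` elements (when both indices are within bounds).

```python
import numpy as np

seq = [10, 20, 30, 40, 50]
arr = np.array(seq)

# start is included, stop is excluded: indices 1, 2, 3
print(seq[1:4])   # [20, 30, 40]
print(arr[1:4])   # [20 30 40]

# so the slice 1:4 has 4 - 1 = 3 elements
print(len(arr[1:4]))  # 3
```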

In [None]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
# access all of the elements

arr[:]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
# indexing always starts at 0
# the following code accesses the sixth element

arr[5]

5

In [None]:
# access the sixth to eighth elements (indices 5, 6 and 7)

arr[5:8]

array([5, 6, 7])

In [None]:
# assign the value 12 to the sixth through eighth elements; the scalar is propagated to the whole slice

arr[5:8] = 12
print(arr)

[ 0  1  2  3  4 12 12 12  8  9]
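An important difference from Python lists: a basic NumPy slice is a *view* on the original array, not a copy, so modifying the slice modifies the source. A minimal sketch, with `.copy()` shown as the way to get an independent array; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)

# a basic slice is a view: it shares memory with the original array
arr_slice = arr[5:8]
arr_slice[1] = 12345
print(arr)   # the change shows up in arr as well: arr[6] is now 12345

# assigning to [:] changes every element of the view, and therefore of arr
arr_slice[:] = 64
print(arr)   # [ 0  1  2  3  4 64 64 64  8  9]

# use .copy() when you want an independent array
independent = arr[5:8].copy()
independent[:] = 0
print(arr[5:8])   # still [64 64 64]
```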


### Two-Dimensional Arrays

In [None]:
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
# access the third row

arr2d[2]

array([7, 8, 9])

In [None]:
# element in the third row, third column (two chained index operations)
arr2d[2][2]

9

In [None]:
# the same element, using a single comma-separated index
arr2d[2, 2]

9

In [None]:
# access the first and second rows (the stop index 2 is excluded)

arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
# access the second row through the last row

arr2d[1:]

array([[4, 5, 6],
       [7, 8, 9]])

In [None]:
# access the first two rows (rows 0-1) and the columns from the second to the last (columns 1 onward)

arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

#### Try changing the numbers in the index and see what happens

In [None]:
# In multidimensional arrays, if you omit later indices, the returned object is a lower-dimensional
# ndarray consisting of all the data along the higher dimensions.

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [None]:
# arr3d[0] is a 2x3 array

arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
# assign 42 to every element of arr3d[0] (the scalar fills the whole 2x3 subarray)

arr3d[0] = 42

arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [None]:
# access [7, 8, 9] in arr3d: block 1, row 0

arr3d[1, 0]

array([7, 8, 9])
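Since assigning to `arr3d[0]` overwrites the original data, a common pattern is to save a copy first and restore it afterwards. A minimal sketch of that pattern (`old_values` is an illustrative name), plus a third index to reach a single element:

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# save a copy of arr3d[0] before overwriting it
old_values = arr3d[0].copy()
arr3d[0] = 42

# restore the original values from the saved copy
arr3d[0] = old_values
print(arr3d[0])

# arr3d[1, 0] selects a 1D row; adding a third index selects one element
print(arr3d[1, 0, 2])   # 9
```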

### Boolean Indexing

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# create an array of standard normal random numbers with 7 rows and 4 columns
# (the values differ on every run unless a random seed is set)
data = np.random.randn(7, 4)

print(names)
print(data)

['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']
[[ 1.24341084  0.79903366  1.27172582 -0.31370933]
 [-0.48076604 -1.63026165  0.25192941 -0.80619438]
 [ 0.2770762  -0.55287602 -0.76008945 -2.18356506]
 [-0.29473529  0.54166965 -0.31415767  0.24913856]
 [ 0.27952021  0.23732492 -0.92491976 -0.02048626]
 [-1.1895586   0.71996981 -0.69663628  1.9649108 ]
 [ 0.30603383  1.89724641  0.47380906 -0.11777111]]


In [None]:
# check whether each element of names equals 'Bob'
# the first and fourth elements return True; the rest return False

names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [None]:
# only the first and fourth rows of data are selected

data[names == 'Bob']

array([[ 1.24341084,  0.79903366,  1.27172582, -0.31370933],
       [-0.29473529,  0.54166965, -0.31415767,  0.24913856]])

In [None]:
# mix a boolean array with a slice: rows where names == 'Bob', columns from index 2 onward

data[names == 'Bob', 2:]

array([[ 1.27172582, -0.31370933],
       [-0.31415767,  0.24913856]])

In [None]:
# rows where names == 'Bob', column index 3 only
data[names == 'Bob', 3]

array([-0.31370933,  0.24913856])

In [None]:
# to select everything except 'Bob', use != or negate the condition with ~

names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

In [None]:
data[names != 'Bob']

array([[-0.48076604, -1.63026165,  0.25192941, -0.80619438],
       [ 0.2770762 , -0.55287602, -0.76008945, -2.18356506],
       [ 0.27952021,  0.23732492, -0.92491976, -0.02048626],
       [-1.1895586 ,  0.71996981, -0.69663628,  1.9649108 ],
       [ 0.30603383,  1.89724641,  0.47380906, -0.11777111]])

In [None]:
data[~(names == 'Bob')]

array([[-0.48076604, -1.63026165,  0.25192941, -0.80619438],
       [ 0.2770762 , -0.55287602, -0.76008945, -2.18356506],
       [ 0.27952021,  0.23732492, -0.92491976, -0.02048626],
       [-1.1895586 ,  0.71996981, -0.69663628,  1.9649108 ],
       [ 0.30603383,  1.89724641,  0.47380906, -0.11777111]])

In [None]:
# combine multiple boolean conditions with the element-wise operators & (and) and | (or)

# True where names equals 'Bob' or 'Will'
mask = (names == 'Bob') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False])
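A common mistake is to use Python's `and` / `or` keywords instead of `&` / `|`. Those keywords need a single True/False value, so applying them to a multi-element boolean array raises a `ValueError`. A minimal sketch; the flag `failed` exists only to record the outcome.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# `or` tries to turn the whole array into one bool, which is ambiguous
try:
    (names == 'Bob') or (names == 'Will')
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# the element-wise operator | works; the parentheses are required
# because & and | bind more tightly than ==
mask = (names == 'Bob') | (names == 'Will')
print(mask)
```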

In [None]:
# select only the rows where mask is True

print(data)
print('----')
print(data[mask])

[[ 1.24341084  0.79903366  1.27172582 -0.31370933]
 [-0.48076604 -1.63026165  0.25192941 -0.80619438]
 [ 0.2770762  -0.55287602 -0.76008945 -2.18356506]
 [-0.29473529  0.54166965 -0.31415767  0.24913856]
 [ 0.27952021  0.23732492 -0.92491976 -0.02048626]
 [-1.1895586   0.71996981 -0.69663628  1.9649108 ]
 [ 0.30603383  1.89724641  0.47380906 -0.11777111]]
----
[[ 1.24341084  0.79903366  1.27172582 -0.31370933]
 [ 0.2770762  -0.55287602 -0.76008945 -2.18356506]
 [-0.29473529  0.54166965 -0.31415767  0.24913856]
 [ 0.27952021  0.23732492 -0.92491976 -0.02048626]]


**Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.**
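A minimal sketch of the statement above, using a small made-up array instead of random data so the result is predictable: modifying the result of a boolean selection leaves the source alone, while assigning *through* a boolean index (as in the cells below) does modify it in place.

```python
import numpy as np

data = np.arange(8).reshape((4, 2))
mask = np.array([True, False, True, False])

# boolean indexing returns a copy, so changing `selected` leaves data unchanged
selected = data[mask]
selected[:] = -1
print(data)

# assigning through a boolean index modifies data in place
data[mask] = -1
print(data)
```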

In [None]:
# select the elements that satisfy a condition (here, the negative values)

data[data < 0]

array([-0.31370933, -0.48076604, -1.63026165, -0.80619438, -0.55287602,
       -0.76008945, -2.18356506, -0.29473529, -0.31415767, -0.92491976,
       -0.02048626, -1.1895586 , -0.69663628, -0.11777111])

In [None]:
# set all negative values to zero
data[data < 0] = 0
data

array([[1.24341084, 0.79903366, 1.27172582, 0.        ],
       [0.        , 0.        , 0.25192941, 0.        ],
       [0.2770762 , 0.        , 0.        , 0.        ],
       [0.        , 0.54166965, 0.        , 0.24913856],
       [0.27952021, 0.23732492, 0.        , 0.        ],
       [0.        , 0.71996981, 0.        , 1.9649108 ],
       [0.30603383, 1.89724641, 0.47380906, 0.        ]])

In [None]:
# set whole rows at once: every row where names is not 'Joe' is set to 7

print(names != 'Joe')

data[names != 'Joe'] = 7
data

[ True False  True  True  True False False]


array([[7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.        , 0.25192941, 0.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.71996981, 0.        , 1.9649108 ],
       [0.30603383, 1.89724641, 0.47380906, 0.        ]])

## Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute:

In [None]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [None]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
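A short sketch checking the claims above: `.T` and `.transpose()` agree, and the transpose shares memory with the original (it is a view). The section title also mentions swapping axes, which `swapaxes` does for a chosen pair of axes. `np.shares_memory` assumes NumPy 1.11 or later; `arr3` is an illustrative name.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

# .T and .transpose() give the same result
print(np.array_equal(arr.T, arr.transpose()))   # True

# the transpose is a view: it shares memory with arr
print(np.shares_memory(arr, arr.T))             # True

# for higher-dimensional arrays, swapaxes exchanges two chosen axes
arr3 = np.arange(24).reshape((2, 3, 4))
print(arr3.swapaxes(1, 2).shape)                # (2, 4, 3)
```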

Transposes come up very often in matrix computations, for example when computing the inner matrix product $X^T X$ with np.dot:

In [None]:
arr = np.random.randn(6, 3)

np.dot(arr.T, arr)

array([[ 3.78605455,  0.87551319, -1.52245127],
       [ 0.87551319, 11.07474965, -2.46268363],
       [-1.52245127, -2.46268363,  7.39486993]])
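Two properties worth checking in the result above: the `@` operator performs the same matrix multiplication as `np.dot` for 2D arrays, and $X^T X$ is always symmetric, which matches the mirrored off-diagonal values in the output. This sketch assumes NumPy 1.17 or later for `np.random.default_rng`, seeded so that it is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the result is reproducible
arr = rng.standard_normal((6, 3))

# X^T X: (3, 6) times (6, 3) gives (3, 3)
gram = np.dot(arr.T, arr)

# the @ operator performs the same matrix multiplication
print(np.allclose(gram, arr.T @ arr))   # True

# X^T X is symmetric
print(np.allclose(gram, gram.T))        # True
print(gram.shape)                       # (3, 3)
```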

## Universal Functions: Fast Element-wise Array Functions

A universal function, or ufunc, performs element-wise operations on the data in ndarrays. Many ufuncs are simple element-wise transformations, like sqrt or exp:

In [None]:
arr = np.arange(10)

np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [None]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

Others, such as add or maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:

In [None]:
x = np.random.randn(8)
y = np.random.randn(8)
print(x)
print(y)

[-0.10588567  0.76283509  1.04893057 -0.00734578  1.03786761 -1.85664786
 -1.81038683 -1.54672765]
[-0.95316148  1.04126029  0.36256998  1.30286669  0.49104896  0.75949074
  0.40297734  0.7963052 ]


In [None]:
np.maximum(x, y)  # element-wise maximum of x and y


array([-0.10588567,  1.04126029,  1.04893057,  1.30286669,  1.03786761,
        0.75949074,  0.40297734,  0.7963052 ])
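A sketch with small fixed arrays (made-up values, so the output is predictable). `np.add` is the ufunc behind the `+` operator, and `np.maximum`, which compares two arrays pair by pair, should not be confused with `np.max`, which reduces a single array to its largest value.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])

# np.add is the ufunc behind the + operator
print(np.array_equal(np.add(x, y), x + y))   # True

# np.maximum compares pairwise and returns an array
print(np.maximum(x, y))

# np.max reduces one array to a single value
print(np.max(x))
```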

### For other math functions, please refer to https://numpy.org/doc/stable/reference/ufuncs.html

## Exercise

1. Create a 3x3 NumPy array filled with zeros.
2. Create two random arrays and perform a matrix multiplication.
3. Reshape a 1D NumPy array of length 12 into a 2D array with 3 rows and 4 columns.
4. Create a random array with 4 rows and 3 columns. Access the element in the last row and last column.
5. Perform an element-wise addition of two arrays.


In [None]:
# Your answer here

### Submission: File > Print > As PDF > Submit in ODL Platform
### Make sure your answers are visible in the PDF.
### Deadline: 1 week after today's class.