# Lab 5: Handling Quantitative Data: Introduction to NumPy

- What is NumPy?
- The data type of N-dimensional array: ndarray. 
- How to create ndarray objects?
- How to access and manipulate data in ndarray?


## What is NumPy?

- **NumPy** (standing for Numerical Python) is a popular library in Python that can efficiently process **multidimensional data arrays**. It is also the foundation of other advanced Python libraries such as **Pandas**.

**Examples of 1-dimensional arrays**: all elements must have the same data type



In [None]:
[2,7,10,6] # an array of integers, shape is (4,)
[73.5, 67.0, 75.5, 58.5, 92.0] # an array of student grades or stock prices, shape is (5,)
['Sam', 'John', 'Zoe'] # an array of names, shape is (3,)

**Example of a 2-dimensional array**: an array of arrays with the same shape and data type

In [None]:
[[3,6,4,8],
 [2,7,5,9],
 [4,8,6,1]] # an array with 3 elements, each element is an array with 4 integers, shape is (3,4)

## List vs. Array

- Arrays must be created explicitly (e.g., by importing NumPy and calling one of its functions), while lists are built into Python.
- Lists are the more commonly used of the two, and they work fine most of the time.
- If you are going to perform arithmetic on your data, you should use arrays instead.
- Arrays store data more compactly and process it more efficiently.
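The difference shows up as soon as arithmetic is involved. The following is a minimal sketch (the names `prices_list` and `prices_arr` are illustrative only) comparing how `+` and `*` behave on a list and on a NumPy array:

```python
import numpy as np

prices_list = [10, 20, 30]              # a plain Python list
prices_arr = np.array(prices_list)      # the same data as a NumPy array

# On a list, + concatenates and * repeats the list
list_sum = prices_list + prices_list    # [10, 20, 30, 10, 20, 30]
list_twice = prices_list * 2            # [10, 20, 30, 10, 20, 30]

# On an array, + and * work element by element
arr_sum = prices_arr + prices_arr       # array([20, 40, 60])
arr_twice = prices_arr * 2              # array([20, 40, 60])

print(list_sum)
print(list_twice)
print(arr_sum)
print(arr_twice)
```

With a list, the same elementwise result would need a loop or a list comprehension.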


## The major features of NumPy
- Easily generate and store data in memory in the form of multidimensional arrays
- Easily load and store data on disk in **binary**, **text**, or **CSV format**
- Support **efficient** operations on data arrays, including basic arithmetic and logical operations, shape manipulation, data sorting, data slicing, linear algebra, statistical operations, discrete Fourier transform, etc.
- Vectorised computation: simple syntax for elementwise operations without using loops (e.g., **a** = **b** + **c** where **a**, **b**, and **c** are three multidimensional arrays with the same shape).
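As a rough illustration of vectorised computation, the sketch below computes **a** = **b** + **c** for two small 2-d arrays and compares the result with explicit nested loops. The array values are made up for the example, and the loop version is included only for comparison.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
c = np.array([[10, 20, 30], [40, 50, 60]])

# Vectorised: one expression adds every pair of elements
a = b + c

# Equivalent explicit loops, shown only for comparison
a_loop = np.zeros_like(b)
for i in range(b.shape[0]):
    for j in range(b.shape[1]):
        a_loop[i, j] = b[i, j] + c[i, j]

print(a)                          # rows: [11 22 33] and [44 55 66]
print(np.array_equal(a, a_loop))  # True
```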

## How to use NumPy?

- In order to use NumPy, we need to import the module **numpy** first.
- A widely used convention is to use **np** as a short name of **numpy**.

In [None]:
import numpy as np

- In this labsheet, when you see **np.xxx**, it is the same as **numpy.xxx**.
- Remark: the module name of NumPy library is **numpy** (i.e., lowercase)

## The data type of N-dimensional array: _ndarray_

- The core of NumPy is the N-dimensional array data type **ndarray**. It can store a collection of data items ***with the same type***, i.e., the array is **homogeneous**.
- This makes it very different from a list. A list is more **flexible**, but **less efficient**.
- **ndarray** can only store items of the same type, but its **performance** is much better than that of a list (i.e., it takes a shorter time to process the same amount of data).
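To see what "homogeneous" means in practice, here is a small sketch (the variable names are illustrative). When a list with mixed types is turned into an ndarray, NumPy converts all items to one common type:

```python
import numpy as np

mixed_list = [1, 2.5, 3]          # a list may hold an int next to a float
arr = np.array(mixed_list)        # NumPy converts everything to one common type

print(type(mixed_list[0]), type(mixed_list[1]))  # int and float
print(arr)                        # [1.  2.5 3. ]
print(arr.dtype)                  # float64

# Mixing numbers and text gives an array of strings
arr_text = np.array([1, 'a'])
print(arr_text)                   # ['1' 'a']
print(arr_text.dtype.kind)        # U (a Unicode string type)
```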



## An ndarray object has the following properties
|Property|Description|
|----|----|
|**ndarray.ndim**|The number of dimensions of the array (i.e., 1 or 2 or 3 …)|
|**ndarray.shape**|The dimensions of the array (i.e., number of elements in each dimension)|
|**ndarray.size**|The total number of elements of the array|
|**ndarray.dtype**|The data type of the array|
|**ndarray.itemsize**|The number of bytes of each data element|
|**ndarray.data**|The buffer that stores the data elements of the array |
    

## Data types supported by NumPy

NumPy supports the following popular data types:

- Integers with different sizes: *int8  int16  int32  int64  uint8  uint16  uint32  uint64*
- Real numbers with different sizes: *float16  float32  float64  float128* (*float128* is available only on some platforms)
- Complex numbers with different sizes: *complex64   complex128   complex256* (*complex256* is available only on some platforms)
- Fixed-length byte string (one byte per character, e.g., ASCII text): *S10* means a string of up to 10 characters
- Fixed-length Unicode string: *U10* means a string of up to 10 Unicode characters
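A data type can be requested when an array is created, and an existing array can be converted with `astype()`. The sketch below uses made-up values and only a few of the types listed above:

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)             # 1 byte per element
big = np.array([1, 2, 3], dtype=np.float64)            # 8 bytes per element
names = np.array(['Sam', 'John', 'Zoe'], dtype='U10')  # up to 10 Unicode characters each

print(small.dtype, small.itemsize)   # int8 1
print(big.dtype, big.itemsize)       # float64 8
print(names.dtype.kind)              # U (a Unicode string type)

# astype() returns a converted copy; the original array is unchanged
as_float = small.astype(np.float32)
print(as_float.dtype)                # float32
print(small.dtype)                   # int8
```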

## Example 1

Try the following statements about a 2-d array

In [None]:
a = np.arange(20) # generate a 1-d array first
print(a) # [0 1 2 3 ... 18 19]

a = a.reshape(4,5) # generate a 2-d array from a, and assign it to a
print(a) 
print(type(a)) # numpy.ndarray
print(a.ndim) # 2
print(a.shape) # (4,5)
print(a.dtype) # int64 on most platforms (int32 on some, e.g., Windows with older NumPy versions)
print(a.itemsize) # 8 for int64 (4 for int32)
print(a.size) # 20

## Exercise 1

Use **numpy.arange()** to generate a 1-dimensional array with 100 odd numbers 1, 3, 5, …, 199. Then use **numpy.reshape()** to generate a two-dimensional array with shape (10, 10). Print out the shape, size, and data of the two-dimensional array.

In [None]:
# write your code here



# :D

## How to create ndarray objects?

**Four** different approaches to create ndarray objects:
1. Use the **numpy.array()** function to generate an ndarray object from any sequence-like object (e.g., list and tuple)
2. Use the **built-in functions** to generate some special ndarray objects. Use help( ) to find out the details of each function. 
3. Generate ndarray with random numbers (**random sampling**). The **numpy.random** module provides functions to generate arrays of sample values from popular probability distributions.  
4. Save ndarray to disk file, and read ndarray from disk file 

- Use the **numpy.array()** function to generate an ndarray object from any sequence-like object (e.g., list and tuple)

**Example 2**: Try the following statements about ndarray

In [None]:
# generate a 1-d array from a sequence of data
data1 = [1,2,3,4,5,6]
arr1 = np.array(data1)
print(arr1)
print(arr1.ndim)
print(arr1.shape)

# generate a 2-d array from a sequence of data
data2 = [[1,2,3,4], [5,6,7,8]]
arr2 = np.array(data2)
print(arr2)
print(arr2.ndim)
print(arr2.shape)

# generate a 3-d array from a sequence of data
data3 = [[[1,2,3,4], [5,6,7,8]],
         [[9,10,11,12], [13,14,15,16]],
         [[17,18,19,20], [21,22,23,24]]]
arr3 = np.array(data3)
print(arr3)
print(arr3.ndim)
print(arr3.shape)


- Use the following functions to generate some special ndarray objects. Use **help()** to find out the details of each function.

|Function|Example|Description|
|--------|-------|-----------|
|**arange**|arr = np.arange(20)|Return evenly spaced values within a given<br>interval, similar to the built-in range( ) function.|
|**ones**|arr = np.ones(10)|An array of all 1’s with the given shape|
|**zeros**|arr = np.zeros( (2, 3) )|An array of all 0’s with the given shape|
|**full**|arr = np.full( (3, 4), 1.2)|An array with the given shape, filled with the specified value|
|**empty**|arr = np.empty( (2, 5) )|An array with the given shape whose contents are not initialised (arbitrary values)|
|**eye**|arr = np.eye(6)|A square NxN identity matrix|

**Example 3**: Try the following statements to learn different array generating functions

In [None]:
arr = np.ones(10)
print(arr)
arr = np.zeros((2,3))
print(arr)
arr = np.full((3,4), 1.2)
print(arr)
arr = np.empty((2,5))
print(arr)
arr = np.eye(6)
print(arr)

- Generate ndarray with random numbers (**random sampling**)

The **numpy.random** module provides functions to generate arrays of sample values from popular probability distributions.

**Reference**: https://numpy.org/doc/stable/reference/random/index.html

**Example 4**: Try the following statements to generate different random arrays. Use help( ) to understand the functions **random()**, **randint()**, **randn()**, and **uniform()** in numpy.random module. 

In [None]:
help(np.random.random)

arr = np.random.random((2,3)) # return 2x3 random floats in half-open interval [0.0, 1.0)
print(arr)
arr = np.random.randint(10, 100, 10) # return 10 random integers in half-open interval [10, 100)
print(arr)
arr = np.random.randn(6,3) # return 6x3 samples drawn from standard normal distribution
print(arr)
arr = np.random.uniform(-1, 1, 10) # return 10 samples drawn from a uniform distribution over [-1, 1)
print(arr)
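The reference page above also describes a newer interface based on a `Generator` object created with `np.random.default_rng()`. The sketch below repeats the four draws with that interface. The seed value 42 is an arbitrary choice for the example. It makes the results repeatable from run to run.

```python
import numpy as np

rng = np.random.default_rng(42)              # 42 is an arbitrary seed

floats = rng.random((2, 3))                  # 2x3 floats in [0.0, 1.0)
ints = rng.integers(10, 100, size=10)        # 10 integers in [10, 100)
normal = rng.standard_normal((6, 3))         # 6x3 standard normal samples
uniform = rng.uniform(-1, 1, size=10)        # 10 uniform samples in [-1, 1)

print(floats)
print(ints)
print(normal)
print(uniform)
```

The functions used in Example 4 still work. The generator version is mainly useful when you want reproducible results or separate random streams.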

- Save ndarray to disk file, and load ndarray from disk file

**Example 5**: Try the following statements to save ndarray as **binary file** and load ndarray from binary file (which was previously created by **numpy.save()** function). 

a. Binary format (which is not suitable for humans to read) 

In [None]:
arr1 = np.arange(2, 100, 2) # [2 4 6 ... 98], the stop value 100 is not included
np.save('even.npy', arr1) # save ndarray to file named <even.npy>
# try: open the file in a text editor and see what's inside :Q

arr2 = np.load('even.npy') # load ndarray from file <even.npy>
print(arr2) # [2 4 6 ... 98]

**Example 6**: Try the following statements to save ndarray as txt file and load ndarray from txt file. 

b. Text format (which is suitable for humans to read)

In [None]:
arr1 = np.arange(0.0, 10.0, 0.5) # [0. 0.5 1. ... 9.5], the stop value 10.0 is not included
np.savetxt('half.txt', arr1, fmt='%.6f') # save numbers with 6 decimal places into a txt file
# try: open the file in a text editor again, and see :P

arr2 = np.loadtxt('half.txt')
print(arr2)
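The same two functions can also handle CSV files by setting the `delimiter` argument. The following is a minimal sketch for a 2-d array. The file name `table.csv` is hypothetical, and the file is written to a temporary folder so that the example leaves nothing behind.

```python
import os
import tempfile
import numpy as np

table = np.arange(12).reshape(3, 4)   # a 3x4 array of the integers 0..11

# Write into a temporary folder so the example leaves no files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'table.csv')           # hypothetical file name
    np.savetxt(path, table, delimiter=',', fmt='%d')   # one row per line, comma separated
    with open(path) as f:
        text = f.read()                                # the raw file content
    loaded = np.loadtxt(path, delimiter=',', dtype=int)

print(text)     # first line is 0,1,2,3
print(loaded)   # the same 3x4 array as table
```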

## Exercise 2

Create the following ndarray objects: 
- Create an ndarray of shape (8, 8) and all data are 2.5  
- Create an ndarray of shape (4, 4) whose values range from 0 to 15 
- Create a 6 × 6 identity matrix 
- Create a random array of size 20 with standard normal distribution and find its mean value 
- Create a random array of shape (3, 6) with random integers in the range of `[1, 50]`.  
- Create a random array of shape (4, 5) with uniform distribution in the range of `[0, 10]`. Find its maximum and minimum values and the mean value. 

In [None]:
# write your code here



# :^)

## How to access and manipulate data in ndarray?

- Accessing data in a 1-d ndarray is similar to the case of a **list**
- **Indexing** and **Slicing**

**Array indexing**: use square brackets [] to index array values. To access a single data element in a 2-d ndarray, you need to specify the coordinates of the element (i.e., the indices on the two axes)

If you index a multidimensional array with fewer indices than dimensions, you will get an array with fewer dimensions (e.g., one index on a 2-d array returns a 1-d row)

**Example 7**: Try the following statements to access items in ndarray

In [None]:
arr2d = np.array([[1,2,3], [4,5,6], [7,8,9]])
arr1d = arr2d[2]
print(arr1d) # [7 8 9]
print(arr2d[1]) # [4 5 6]
arr2d[1][2] = 10 # we can change the values in ndarray
print(arr2d[1]) # [ 4  5 10]
print(arr2d[0][2]) # 3
print(arr2d[0,2]) # 3

**Slicing**: Slicing an ndarray is similar to slicing a sequence, but it is more complicated for 2-d or 3-d arrays

In [None]:
arr1d = np.arange(20)
print(arr1d) 
arr1d[:10] = 20 # change the first 10 values to 20
print(arr1d)
arr1d[10:15] = -1 # change the next 5 values to -1
print(arr1d)
arr1d[-5:] = 0 # change the last 5 values to 0
print(arr1d)
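For a 2-d array, one slice can be given per axis, separated by a comma. The sketch below uses a made-up 4x5 array. It also shows that a slice is a view of the original data, so `copy()` is needed when an independent array is wanted.

```python
import numpy as np

arr2d = np.arange(20).reshape(4, 5)   # rows 0..3, columns 0..4

first_two_rows = arr2d[:2]            # rows 0 and 1, all columns
last_column = arr2d[:, -1]            # every row, last column only
block = arr2d[1:3, 2:4]               # rows 1-2, columns 2-3

print(first_two_rows)
print(last_column)                    # [ 4  9 14 19]
print(block)                          # rows: [7 8] and [12 13]

# A slice is a view: changing it also changes the original array
block[:] = 0
print(arr2d[1, 2])                    # 0 (was 7)

# Use copy() when an independent array is needed
safe = arr2d[:2].copy()
safe[:] = -1
print(arr2d[0, 1])                    # 1 (unchanged)
```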

## More data processing in ndarray

1. Universal functions: perform *elementwise* operations on data in ndarrays

Mathematical functions: https://docs.scipy.org/doc/numpy/reference/routines.math.html

**Example 8**: Try some vectorised operations and universal functions on ndarrays:

In [None]:
x = np.array([1,2,3,4])
y = np.array([5,6,7,8])
print(x+y) # [6 8 10 12]
print(x*y) # [5 12 21 32]
arr = np.arange(10)
print(arr) # [0 1 2 ... 9]
print(np.sqrt(arr)) #[0. 1. 1.414 ...]
print(np.exp(arr)) #[1.0e+00 2.718e+00 ...]

2. Statistics

Reference: https://docs.scipy.org/doc/numpy/reference/routines.statistics.html

**Example 9**: Try some statistical methods of ndarrays 

In [None]:
arr = np.random.randn(20,5)
print(arr)
print("Mean is ", arr.mean())
print("Standard Deviation is ", arr.std())
print("Max and Min are ", arr.max(), arr.min())
# argmin() and argmax() return positions in the flattened (1-d) version of the array
print("The index of the min is {} and the index of max is {}".format(arr.argmin(), arr.argmax()))
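The statistical methods also accept an `axis` argument, and a flattened index from `argmax()` can be converted back to a (row, column) pair with `np.unravel_index()`. The sketch below uses a small fixed array (the name `scores` and its values are made up) so that the results are easy to check by hand:

```python
import numpy as np

scores = np.array([[3, 6, 4, 8],
                   [2, 7, 5, 9],
                   [4, 8, 6, 1]])

print(scores.mean())          # 5.25, the mean of all 12 values
print(scores.mean(axis=0))    # one mean per column: [3. 7. 5. 6.]
print(scores.max(axis=1))     # one maximum per row: [8 9 8]

flat_index = scores.argmax()  # position in the flattened array
row, col = np.unravel_index(flat_index, scores.shape)
print(flat_index)             # 7
print(row, col)               # 1 3
```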

## Additional Resources

If you want to learn more about NumPy, try the following series of tutorials:
https://www.tutorialspoint.com/numpy/index.htm