# Tutorial 3: NumPy Basics

## 3.1 Agenda
This tutorial covers the most important NumPy functions and array methods, and provides practice exercises for each.

## 3.2 NumPy fancy indexing and slicing

### Introduction to NumPy arrays
- NumPy arrays are mutable collections of elements of a single data type, optimized for efficient data handling and numerical operations.
- NumPy arrays support vectorized operations that typically run much faster than iterating over a list in Python, often by one to two orders of magnitude on large arrays.
- NumPy arrays can be multi-dimensional. A 2D array, for example, can be created from a list of lists.
- The following example shows fancy indexing on a NumPy array, where a list of indices selects several elements at once:

In [2]:
import numpy as np

array1 = np.array(range(1,9))
print(array1)

# select a single element
simple_indexing = array1[3]

print("Simple Indexing:",simple_indexing)   # 4

# select multiple elements
fancy_indexing = array1[[1, 2, 5, 7]]

print("Fancy Indexing:",fancy_indexing)   # [2 3 6 8]

[1 2 3 4 5 6 7 8]
Simple Indexing: 4
Fancy Indexing: [2 3 6 8]
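
The claim that vectorized operations outrun plain Python loops can be checked with a rough timing sketch. The array size and the squaring task below are arbitrary choices for illustration, and the measured times depend on the machine, so no specific speedup is asserted here.

```python
import time
import numpy as np

# One million integers; int64 is set explicitly so the squares cannot overflow
values = np.arange(1_000_000, dtype=np.int64)

# Plain Python: square each element one at a time
start = time.perf_counter()
squares_list = [v * v for v in values.tolist()]
loop_seconds = time.perf_counter() - start

# NumPy: one vectorized expression over the whole array
start = time.perf_counter()
squares_array = values * values
vector_seconds = time.perf_counter() - start

print(f"list comprehension: {loop_seconds:.4f} s")
print(f"vectorized:         {vector_seconds:.4f} s")
```

Both approaches produce the same numbers; only the time taken differs.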


- The `np.argsort()` function takes an array and returns the array of indices that would sort it in **ascending** order. Using that result as a fancy index on the original array yields the sorted array.

In [None]:
array2 = np.array([3, 2, 6, 1, 8, 5, 7, 4])
print("Original array:", array2)
print("For sorting in ascending order:")
# sort array2 using fancy indexing
sorted_array = array2[np.argsort(array2)]

print("np.argsort() function returns: ",np.argsort(array2))

print("The sorted array is:", sorted_array)

print("For sorting in descending order:")
# sort array2 using fancy indexing in descending order
sorted_array = array2[np.argsort(-array2)]

print("np.argsort() function returns: ",np.argsort(-array2))

print("The sorted array is:", sorted_array)

# Output: [8 7 6 5 4 3 2 1]
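
As a complementary sketch, a descending sort can also be obtained by reversing an ascending result instead of negating the array. This avoids negation, which wraps around for unsigned integer dtypes. The variable names here are illustrative only.

```python
import numpy as np

scores = np.array([3, 2, 6, 1, 8, 5, 7, 4])

# np.sort returns a sorted copy directly, without going through indices
ascending = np.sort(scores)

# Reverse the ascending result with a step of -1 to get descending order
descending = np.sort(scores)[::-1]

# The same reversal works on the index array from np.argsort
descending_idx = np.argsort(scores)[::-1]

print(ascending)       # [1 2 3 4 5 6 7 8]
print(descending)      # [8 7 6 5 4 3 2 1]
print(descending_idx)  # [4 6 2 5 7 0 1 3]
```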

- The following example shows slicing of a 2D array. The syntax is similar to list slicing, with one slice per dimension separated by a comma:

In [None]:
# create a 2D array
array1 = np.array([[1, 3, 5, 7],
                   [9, 11, 13, 15],
                   [2, 4, 6, 8]])


# slice the array to get the first two rows and columns
subarray1 = array1[:2, :2]

# slice the array to get the last two rows and columns
subarray2 = array1[1:3, 2:4]

# print the subarrays
print("First Two Rows and Columns: \n",subarray1)
print("Last two Rows and Columns: \n",subarray2)
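
One practical difference between the two indexing styles is worth a short sketch: a basic slice returns a view that shares memory with the original array, while fancy indexing returns an independent copy. The names `grid`, `window`, and `picked` are made up for this illustration.

```python
import numpy as np

grid = np.array([[1, 3, 5, 7],
                 [9, 11, 13, 15],
                 [2, 4, 6, 8]])

window = grid[:2, :2]   # basic slice: a view onto grid
picked = grid[[0, 1]]   # fancy indexing: a copy of rows 0 and 1

# Writing through the view changes the original array
window[0, 0] = 100
print(grid[0, 0])   # 100

# Writing to the copy leaves the original untouched
picked[0, 1] = -1
print(grid[0, 1])   # 3

print(np.shares_memory(grid, window))  # True
print(np.shares_memory(grid, picked))  # False
```

Calling `.copy()` on a slice is the usual way to get an independent array when the original must not change.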

## 3.3 NumPy array operations
### Element-wise operations
- NumPy arrays support element-wise arithmetic operations that have simple syntax and run faster than looping over two lists in plain Python.

In [None]:
first_array = np.array([1, 3, 5, 7])
second_array = np.array([2, 4, 6, 8])

# using the * operator
result1 = first_array * second_array
print("Using the * operator:",result1) 

# using the multiply() function
result2 = np.multiply(first_array, second_array)
print("Using the multiply() function:",result2) 
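
The other arithmetic operators behave the same way. The sketch below reuses the values from the cell above under new, illustrative names, and contrasts `*` with the `@` operator, which for two 1D arrays computes the dot product rather than an element-wise product.

```python
import numpy as np

a = np.array([1, 3, 5, 7])
b = np.array([2, 4, 6, 8])

print(a + b)    # [ 3  7 11 15]
print(a ** 2)   # [ 1  9 25 49]
print(a * b)    # [ 2 12 30 56]

# @ is not element-wise: for 1D arrays it sums the element-wise products
print(a @ b)    # 100
```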

### Array broadcasting
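- Broadcasting lets arrays of different but compatible shapes take part in the same arithmetic operation. A 2D array with $m$ rows and $n$ columns (dimension $m \times n$) can be combined with a row array of $n$ elements or a column array of $m$ elements (dimension $m \times 1$); the smaller array is stretched along the missing dimension and the result is a new $m \times n$ array.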

In [None]:
array3 = np.arange(1, 13).reshape(4, 3)  # shape (4, 3)
print("first array: \n",array3)

In [None]:
array4 = np.array([5, 7, 9])
print("Second array:",array4)
print("Summation results: \n",array3 + array4)

In [None]:
array5 = np.array([[1], [2], [3],[4]])
print("Second array: \n",array5)
print("Multiplication results: \n", array3 * array5)

In [None]:
array6 = np.array([7 ,8 ,9])
print("first array:",array6)
print("Second array: \n",array5)

In [None]:
print("Addition results: \n", array5 + array6)
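
The shapes involved in the cells above can be summarized in a short sketch. It rebuilds the same arrays under illustrative names, checks the result shapes, and shows that incompatible shapes raise a `ValueError` instead of broadcasting.

```python
import numpy as np

grid = np.arange(1, 13).reshape(4, 3)   # shape (4, 3)
row = np.array([5, 7, 9])               # shape (3,)
col = np.array([[1], [2], [3], [4]])    # shape (4, 1)

print((grid + row).shape)   # (4, 3): row is repeated down the 4 rows
print((grid * col).shape)   # (4, 3): col is repeated across the 3 columns
print((col + row).shape)    # (4, 3): both arrays are stretched

# A 1D array of 4 elements does not match the last dimension (3)
error_raised = False
try:
    grid + np.array([1, 2, 3, 4])
except ValueError as err:
    error_raised = True
    print("Broadcast failed:", err)
```

Broadcasting compares shapes from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.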

## 3.4 Random number generation
- `np.random.randint(low, high=None, size=None)` returns an array of random integers from `low` (inclusive) to `high` (exclusive). The output shape is given by `size`, an integer or a tuple.
- `np.random.rand(d0, d1, ..., dn)` returns an array of floats drawn uniformly from the interval $[0, 1)$. The output shape is given by the arguments.
- `np.random.randn(d0, d1, ..., dn)` returns an array of floats drawn from the standard normal distribution. The output shape is given by the arguments.

In [None]:
# generate 1D array of 5 random integers between 0 and 9
integer_array = np.random.randint(0, 10, 5)

print("1D Random Integer Array:\n",integer_array)

# generate 1D array of 5 random numbers between 0 and 1
float_array = np.random.rand(5)

print("\n1D Random Float Array:\n",float_array)

# generate 2D array of shape (3, 4) with random integers
result = np.random.randint(0, 10, (3,4))

print("\n2D Random Integer Array:\n",result)

# generate 2D array of shape (3, 4) following standard normal distribution
result = np.random.randn(3, 4)
print("\n2D Standard Normal Array:\n",result)
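
The cells above give different numbers on every run. When repeatable results are needed, one option is NumPy's `Generator` interface created with `np.random.default_rng` and a fixed seed. The sketch below is an alternative to the functions used in this tutorial, not a replacement for them, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

# A seeded generator produces the same sequence every time it is recreated
rng = np.random.default_rng(seed=42)

ints = rng.integers(0, 10, size=5)       # like np.random.randint(0, 10, 5)
floats = rng.random(5)                   # like np.random.rand(5)
normals = rng.standard_normal((3, 4))    # like np.random.randn(3, 4)

# Recreating the generator with the same seed repeats the first draw
rng_again = np.random.default_rng(seed=42)
ints_again = rng_again.integers(0, 10, size=5)

print(np.array_equal(ints, ints_again))  # True
```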

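- The `.round(decimals)` array method rounds each element to the stated number of decimal places.
- The following example generates a $2 \times 4$ array of uniformly distributed numbers between 10000 and 20000, rounded to 2 decimal places. Multiplying the output of `rand` by 10000 scales the range to $[0, 10000)$, and adding 10000 shifts it to $[10000, 20000)$.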

In [None]:
sample = np.random.rand(2, 4)*10000 + 10000
print(sample.round(2)) #round to 2 decimal places
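
The scale-and-shift step can also be written with `np.random.uniform(low, high, size)`, which takes the range directly. This is a minimal sketch of an equivalent approach; the variable name is illustrative.

```python
import numpy as np

# Draw a 2 x 4 array uniformly from [10000, 20000) and round to 2 decimals
uniform_sample = np.random.uniform(10000, 20000, size=(2, 4)).round(2)

print(uniform_sample)
```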

### Practice Question
1. Randomly generate a transcript for 20 students. The transcript should contain scores for 4 subjects: Math, Economics, Finance, and Science. Scores range from 40 to 100 and are rounded to one decimal place.
2. Show the highest mark for each subject.
3. List the row indices of the top 5 students based on average score.

In [5]:
#Generate transcript
transcript = (np.random.rand(20, 4) * 60 + 40).round(1)
transcript

array([[79.4, 85.6, 88.7, 85.4],
       [80.4, 48.9, 67. , 54.8],
       [94.5, 78.9, 91.7, 61.8],
       [58.6, 64.6, 44.4, 43.4],
       [74.8, 70.4, 41.2, 41.3],
       [56.2, 98.3, 44.5, 48.7],
       [47.5, 68.1, 51.4, 74.4],
       [88.5, 78. , 93.5, 88.4],
       [88. , 67.1, 87.7, 60.6],
       [44.6, 91.1, 49.9, 98.6],
       [40.7, 60.8, 81.8, 74.2],
       [69.7, 74.9, 79.7, 54.3],
       [59. , 99.8, 52.9, 85.3],
       [88.1, 62.3, 94.9, 96.8],
       [85.7, 40.1, 59.7, 44.2],
       [98.5, 91.9, 69. , 76.8],
       [98. , 49.3, 79.6, 78.3],
       [89.2, 60.5, 47.4, 61.7],
       [88.4, 99.6, 87.6, 80.3],
       [42.5, 97.7, 62.7, 82.8]])

In [8]:
highest_mark = transcript.max(axis = 0)
highest_mark

array([98.5, 99.8, 94.9, 98.6])

In [13]:
average_score = transcript.mean(axis = 1)
average_score

array([84.775, 62.775, 81.725, 52.75 , 56.925, 61.925, 60.35 , 87.1  ,
       75.85 , 71.05 , 64.375, 69.65 , 74.25 , 85.525, 57.425, 84.05 ,
       76.3  , 64.7  , 88.975, 71.425])

In [12]:
# argpartition moves the 5 largest averages into the last 5 positions, in no guaranteed order
row_idx = np.argpartition(average_score, -5)[-5:]
row_idx

array([15,  7, 13, 18,  0])
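
The indices above identify the top 5 students but are not ranked among themselves. If a ranking from best to worst is wanted, one possible approach is to sort all averages with `np.argsort` and take the first five after reversing. The sketch below copies the average scores printed earlier so that it runs on its own.

```python
import numpy as np

# Average scores copied from the output shown above
average_score = np.array([84.775, 62.775, 81.725, 52.75, 56.925, 61.925,
                          60.35, 87.1, 75.85, 71.05, 64.375, 69.65,
                          74.25, 85.525, 57.425, 84.05, 76.3, 64.7,
                          88.975, 71.425])

# Sort ascending, reverse to descending, keep the first five indices
ranked_idx = np.argsort(average_score)[::-1][:5]

print(ranked_idx)                  # [18  7 13  0 15]
print(average_score[ranked_idx])   # averages of the top 5, highest first
```

For 20 students a full sort costs next to nothing; `np.argpartition` pays off mainly on large arrays where only the top few entries are needed.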