# B. Numpy Basics



**Numpy** is one of Python's most widely used packages for data handling. It lets you manipulate data efficiently by providing high-performance multidimensional array objects along with tools for operating on them.

A `Numpy array` is **an n-dimensional data structure** that allows you to process data much faster and more efficiently than lists.

In this section, we will be looking at how to process data using Numpy.



### _Objective_
1. **Importing Numpy** : Learning how to import Numpy and use it for data processing.
2. **Comparing Numpy Arrays with Lists** : Understanding, through examples, how Numpy arrays and lists differ in indexing, aggregate operations, and element-wise operations.

# \[1. Importing Numpy\]

Since `Numpy` is not part of Python's standard library, it must be installed separately. If you don't have Numpy on your computer, install it with the `pip` command (`pip install numpy`) and then import it into your current working file.
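If you are not sure whether Numpy is already available, a quick check like the following may help. This is only a minimal sketch: the variable name `numpy_spec` is made up for illustration, and the printed version depends on your environment.

```python
import importlib.util

# Look up the package without raising an ImportError if it is missing.
numpy_spec = importlib.util.find_spec("numpy")

if numpy_spec is None:
    print("Numpy is not installed. Run `pip install numpy` in a terminal first.")
else:
    import numpy as np
    print("Numpy version:", np.__version__)
```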



In [1]:
import numpy as np

Once Numpy is imported into the current file, create an array using **`np.array()`**. 

In [2]:
np_array = np.array([10, 12])
print(type(np_array))

<class 'numpy.ndarray'>
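Beyond its type, an array carries a few attributes that describe its structure. The sketch below inspects the same two-element array; the exact integer `dtype` may vary by platform, so no specific value is shown for it.

```python
import numpy as np

np_array = np.array([10, 12])

print(np_array.ndim)   # number of dimensions: 1
print(np_array.shape)  # size along each dimension: (2,)
print(np_array.dtype)  # data type shared by all elements (an integer type)
```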


Now you know the basic concept of Numpy and its array. The rest of this section works through a number of examples based on the following example data.


Numpy arrays and Python lists are similar in some respects and different in others. Therefore, we'll be looking at how the same tasks are performed in each.<br>

### Example Data) Grade reports of 6 students in Science, English and Math



| StudentID | Science | English | Math | 
|  ----   | --- |---| --- |
|0 |80 |92 |70 |
|1 |91 |75 |90|
|2 |86 |96 |42 |
|3 |77 |92 |52 |
|4 |75 |85 |85 | 
|5 |96 |90 |95 |

First off, you need to create a list and a Numpy array for the example data. Let's do it as follows.

In [3]:
# The scores of 6 students in a list

scores_list = [
    [80, 92, 70],
    [91, 75, 90],
    [86, 96, 42],
    [77, 92, 52],
    [75, 85, 85],
    [96, 90, 95]
]

# The scores of 6 students in a Numpy array.

scores_np = np.array([
    [80, 92, 70],
    [91, 75, 90],
    [86, 96, 42],
    [77, 92, 52],
    [75, 85, 85],
    [96, 90, 95]
])
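As a quick sanity check, you can confirm that both structures hold 6 rows (students) and 3 columns (subjects). This sketch rebuilds the same data so that it runs on its own; passing the nested list directly to `np.array()` is just a shortcut used here.

```python
import numpy as np

scores_list = [
    [80, 92, 70],
    [91, 75, 90],
    [86, 96, 42],
    [77, 92, 52],
    [75, 85, 85],
    [96, 90, 95]
]
scores_np = np.array(scores_list)  # a nested list converts directly to a 2-D array

# A list only knows its own length, so rows and columns are checked separately.
print(len(scores_list), len(scores_list[0]))  # 6 3

# A Numpy array reports all of its dimensions at once.
print(scores_np.shape)  # (6, 3)
```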

<hr>

# \[2. Numpy Array vs Python List\]

At first glance, Numpy arrays look quite similar to Python lists.
So, we will be looking at how each data structure handles
- **indexing**
- **aggregate operation**
- **elementwise operation**

 

## 1. Indexing
- Let's compare Numpy array indexing with list indexing and see the difference in performance.

### (1) Performance of student 0

#### `list`

In [5]:
scores_list[0]

[80, 92, 70]

#### `np.array`

In [6]:
scores_np[0]

array([80, 92, 70])

### (2) Performance of students 1, 2, and 3

#### `list`

In [7]:
scores_list[1:4] # accessing a subset of the list

[[91, 75, 90], [86, 96, 42], [77, 92, 52]]

#### `np.array`

In [8]:
scores_np[1:4] # accessing a subset of the array

array([[91, 75, 90],
       [86, 96, 42],
       [77, 92, 52]])

So far, the difference between Numpy arrays and Python lists is barely visible.
We've seen that the syntax of list indexing and Numpy array indexing is essentially the same.

Now, let's dig deeper into indexing to get the scores of particular students in arbitrary order.

### (3) Performance of students 0, 4 and 1

#### `list`

In [4]:
scores_list[[0,4,1]] # TypeError

TypeError: list indices must be integers or slices, not list

You cannot get the desired scores this way, since lists do not support accessing multiple **non-sequential** rows at once. Instead, each student's scores must be accessed separately and then collected into a new list object as shown below.

In [5]:
[scores_list[0],scores_list[4],scores_list[1]]

[[80, 92, 70], [75, 85, 85], [91, 75, 90]]

#### `np.array`
On the other hand, you can perform the same task on a Numpy array simply by passing a list of row indices inside the index brackets **`[]`**.

In [None]:
scores_np[[0,4,1]] 

array([[80, 92, 70],
       [75, 85, 85],
       [91, 75, 90]])
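To see that the two approaches really return the same rows, the following sketch places them side by side. The name `wanted` is introduced here only for illustration, and the list version uses a list comprehension as one possible way to avoid writing each index by hand.

```python
import numpy as np

scores_list = [
    [80, 92, 70],
    [91, 75, 90],
    [86, 96, 42],
    [77, 92, 52],
    [75, 85, 85],
    [96, 90, 95]
]
scores_np = np.array(scores_list)

wanted = [0, 4, 1]  # student IDs in the order we want them back

# List: look up each row one at a time and collect the results.
from_list = [scores_list[i] for i in wanted]

# Numpy array: pass the whole index list at once.
from_array = scores_np[wanted]

print(from_list)            # [[80, 92, 70], [75, 85, 85], [91, 75, 90]]
print(from_array.tolist())  # [[80, 92, 70], [75, 85, 85], [91, 75, 90]]
```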

Now, let's take a look at how to select scores according to **subject names.**<br>
First, we'll get all the scores achieved in science. 

### (4) Science scores of all students

#### `list`

When selecting data from a Python list, you first have to access rows for each student and then select the first element in each row to get the science score of each student.
So, let's create an empty list `science_scores` first to collect science scores using a `for` loop.

In [6]:
science_scores = [] # creating an empty list to collect science grades

for score in scores_list: # each `score` is one student's row of scores
    science_scores.append(score[0]) # the first element is the science score

print('Achievement in Science =', science_scores)

Achievement in Science = [80, 91, 86, 77, 75, 96]


#### `np.array`

In a 2-dimensional Numpy array, you can select data along the rows (axis = 0), along the columns (axis = 1), or along both at once. The syntax for Numpy array indexing is **`np.ndarray[row_index, column_index]`**. <br>Note that the indexing is zero-based.

Since we're reviewing all students' science scores, we'll access **every row** by using a colon(**`:`**) for row_index.<br>
Then, for column_index, we only need the column of science scores, which is the first column. So, we'll enter 0(**zero-based**) for the column_index.<br>
Putting it together, the code for viewing all students' science scores is **`scores_np[:, 0]`**, which means the first column of every row.

In [None]:
scores_np[:,0] 

array([80, 91, 86, 77, 75, 96])
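The same `[row_index, column_index]` pattern covers other selections as well. The sketch below shows a few variations on the example data; the variable names on the left-hand side are chosen here for illustration only.

```python
import numpy as np

scores_np = np.array([
    [80, 92, 70],
    [91, 75, 90],
    [86, 96, 42],
    [77, 92, 52],
    [75, 85, 85],
    [96, 90, 95]
])

# A single element: row 2 (student 2), column 1 (English).
english_of_student_2 = scores_np[2, 1]
print(english_of_student_2)  # 96

# Every row, columns 1 onward: English and Math for all students.
english_and_math = scores_np[:, 1:]
print(english_and_math.shape)  # (6, 2)

# Selected rows, one column: Math scores of students 0 and 4.
math_of_0_and_4 = scores_np[[0, 4], 2]
print(math_of_0_and_4)  # [70 85]
```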

## 2. Aggregate Operation
- Let's compare aggregate operations on lists and Numpy arrays.

### (1) Total score for each student

#### `list`

With a list, you first need to create an empty list, `total_scores` in this case. Then, access each student's row, sum its elements to get that student's total score, and append the result to `total_scores`. Since there are 6 students, a `for` loop repeats these steps and leaves you with a list of 6 total scores.

In [None]:
total_scores = [] # creating an empty list to collect the total score of each student

for score in scores_list:
    total_score = sum(score) # calculating the total score of each student
    total_scores.append(total_score) 
    
total_scores

[242, 256, 224, 221, 245, 281]

#### `np.array`
On Numpy arrays, however, you only need a single line of code to perform the same task.

In [7]:
scores_np.sum(axis=1) # calculating each student's total score using a single line of code with Numpy 

array([242, 256, 224, 221, 245, 281])
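The `axis` argument decides the direction of the aggregation: `axis=1` collapses the columns and gives one value per student, while `axis=0` collapses the rows and gives one value per subject. As a hedged illustration on the same data, the sketch below also uses `max()` and `mean()`, which accept `axis` in the same way.

```python
import numpy as np

scores_np = np.array([
    [80, 92, 70],
    [91, 75, 90],
    [86, 96, 42],
    [77, 92, 52],
    [75, 85, 85],
    [96, 90, 95]
])

# axis=0: aggregate down the rows, one result per subject (Science, English, Math).
subject_totals = scores_np.sum(axis=0)
print(subject_totals)  # [505 530 434]

subject_best = scores_np.max(axis=0)
print(subject_best)  # [96 96 95]

# axis=1: aggregate across the columns, one result per student.
student_means = scores_np.mean(axis=1)
print(student_means.shape)  # (6,)
```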

## 3. Element-wise Operations on Numpy Arrays and Python Lists

+ An `element-wise operation` applies the same operation to each element independently, one element at a time. Numpy arrays support such operations directly.


### (1) Grade deduction table by student 

#### `list`
Let's assume that you want to calculate how many points each student missed in each exam by subtracting the student's scores from the highest possible score, which is 100.<br>
When using a list, you have to create an empty list `miss_scores`, access `scores_list` by row and then by element to perform the subtraction. Then you append each result to `miss_scores`.

In [1]:
miss_scores = []

for scores in scores_list:
    miss_score = []
    for score in scores:
        miss = 100-score # subtracting `score` from 100 to calculate how many points are missed
        miss_score.append(miss)
    miss_scores.append(miss_score)
miss_scores

[[20, 8, 30], [9, 25, 10], [14, 4, 58], [23, 8, 48], [25, 15, 15], [4, 10, 5]]

#### `np.array`
Unlike with Python lists, you again need only a single line of code to perform the same task on a Numpy array: **subtract the array as a whole from 100**, which is written as **`100 - scores_np`**. The subtraction is automatically applied to every array element.

In [None]:
100 - scores_np

array([[20,  8, 30],
       [ 9, 25, 10],
       [14,  4, 58],
       [23,  8, 48],
       [25, 15, 15],
       [ 4, 10,  5]])
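For comparison, the list version can be shortened with a nested list comprehension, although it still has to visit every element explicitly. The sketch below checks that both approaches agree; the names `missed_list` and `missed_np` are introduced here for illustration.

```python
import numpy as np

scores_list = [
    [80, 92, 70],
    [91, 75, 90],
    [86, 96, 42],
    [77, 92, 52],
    [75, 85, 85],
    [96, 90, 95]
]
scores_np = np.array(scores_list)

# List: loop over rows, then over the scores inside each row.
missed_list = [[100 - score for score in scores] for scores in scores_list]

# Numpy array: the subtraction is applied to every element at once.
missed_np = 100 - scores_np

print(missed_list == missed_np.tolist())  # True
```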

Even though Numpy arrays look similar to Python lists, they support many operations with much simpler code. Not only does this save time and effort when writing code, but the operations themselves also typically run much faster, which is why Numpy has become so widely used for dealing with large data sets.
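If you would like to observe the speed difference yourself, a rough timing comparison can be sketched with the standard `timeit` module. The array size `n` and the repeat count are arbitrary choices made for this illustration, and the measured times will differ from machine to machine, so no specific numbers are shown.

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size, large enough for the difference to show
values_list = list(range(n))
values_np = np.arange(n)

# Time the same element-wise subtraction on a list and on a Numpy array.
list_time = timeit.timeit(lambda: [100 - v for v in values_list], number=20)
np_time = timeit.timeit(lambda: 100 - values_np, number=20)

print(f"list:  {list_time:.4f} s")
print(f"numpy: {np_time:.4f} s")
```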