# Efficient Computing with NumPy

**Numeric Computing** is, roughly speaking, computing with large amounts of numbers that are stored in vectors and matrices. Numeric Computing is at the core of many applications in scientific computing.

Most of these tasks can be done with standard Python libraries (or packages). However, there are specialised libraries for most data science purposes, which are efficiently implemented and easy to use. Furthermore, extensive online documentation with examples is available.

One of the most important libraries is [**NumPy**](http://www.numpy.org), which provides powerful numerical array objects and routines to manipulate them. A widely used and more advanced library for scientific computing is [**SciPy**](https://www.scipy.org/), which is built on NumPy.

## Import of libraries

To use functions, classes or data types from a library, they have to be imported into the respective Python script or Jupyter notebook. The following example shows how to import the standard library for mathematical operations, [**math**](https://docs.python.org/3.0/library/math.html).

In [None]:
import math

All functions of the library can now be called by using `math.` as a prefix. For example, the square root is computed with the `sqrt()` function:

In [None]:
math.sqrt(4)

It is possible to introduce an abbreviation for a single library to reduce programming effort. 
In addition, it is possible to import only single functions, which are directly callable without the leading library name.

In [None]:
import math as m
m.sqrt(4)

In [None]:
from math import sqrt
sqrt(4)

It is not recommended to import all functions of a library by

In [None]:
from math import *

because later in the program it becomes unclear which library a name comes from, and imported names can silently overwrite names that were already assigned.
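
The following is a minimal sketch of how such a name clash can look. It assumes that the built-in function `pow` is in use before the wildcard import; `exec()` and the dictionary `namespace` are used here only to run the import in an isolated namespace and are not part of the original notebook.

```python
# Hypothetical demonstration: exec() runs the statements in a separate
# namespace so that the wildcard import does not affect this notebook.
namespace = {}

exec("before = pow(2, 3)", namespace)   # built-in pow returns an int
exec("from math import *", namespace)   # math.pow now shadows the built-in
exec("after = pow(2, 3)", namespace)    # the same call now returns a float

print(type(namespace["before"]), namespace["before"])
print(type(namespace["after"]), namespace["after"])
```

The call `pow(2, 3)` looks identical in both cases, but after the wildcard import it resolves to `math.pow`, which is hard to see when reading the code later.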

## NumPy

The [**NumPy**](http://www.numpy.org/) library includes different data types and functions for efficient handling of vectors, matrices and multidimensional arrays. This efficiency is made possible by precompiled code that NumPy uses internally.

The most important data type is the **numpy array**. Like a list, it is a collection of objects. These objects are, however, all of the same data type so that they can be stored and processed more efficiently in memory. Some examples are given below.

In [None]:
# NumPy is imported with its common abbreviation
import numpy as np

In [None]:
# Array created from a list; the data type is called numpy.ndarray
np.array([1, 2, 3, 4, 5])  

In [None]:
# Checking for type of variable
x = [0, 1, 2, 3]
y = np.array(x)

print(f'x is of type: {type(x)}')
print(f'y is of type: {type(y)}')
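
That all elements of an array share one data type can be checked with the `dtype` attribute. The sketch below uses the illustrative names `ints`, `mixed` and `forced`, which are not part of the original notebook; the exact integer width depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])                    # integers are converted to floats
forced = np.array([1, 2, 3], dtype=np.float64)   # data type chosen explicitly

print(ints.dtype)    # an integer type; the width depends on the platform
print(mixed.dtype)   # float64
print(forced.dtype)  # float64
```

If the input contains values of different types, NumPy picks one common type that can represent all of them.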

In [None]:
# Basic operations are NOT applied elementwise in case of a list
print(x + x)
print(x * 3)

In [None]:
# Basic operations are applied elementwise in case of a numpy-array
print(y + y)
print(y * 3)

In [None]:
# Initializes an array of the ten integers from 0 to 9
np.arange(10) 

A 1D-`numpy.ndarray` can be interpreted as a simple vector. With NumPy you can set up data with as many dimensions as you want, but let's stay with at most three dimensions. Some NumPy functions take an _axis_ argument that specifies along which dimension an operation, for example a sum, is performed.

<img src="graphics/numpy_arrays.png" style="width: 800px;"/>

In [None]:
# Reshape the given array into a (3x5)-array
np.arange(15).reshape(3, 5)
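
To illustrate the role of the _axis_, here is a small sketch with a three-dimensional array. The name `cube` and its shape are chosen for illustration only. Summing along an axis removes that axis from the shape of the result.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

print(cube.ndim)               # 3
print(cube.shape)              # (2, 3, 4)
print(cube.sum(axis=0).shape)  # (3, 4)
print(cube.sum(axis=1).shape)  # (2, 4)
print(cube.sum(axis=2).shape)  # (2, 3)
```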

## Try out: Different ways to create a NumPy array

Try different array creation methods.

In [None]:
np.arange(12)
# np.ones(3)
# np.zeros(5)
# np.empty(7)

# np.ones((3,2))
# np.zeros((3, 2))

### Performance _collections_: Python lists vs. NumPy array

Many things that are possible with NumPy arrays can also be achieved with Python's built-in collections - like lists. The difference lies in the performance. When doing scientific computing and data analysis on large amounts of data, you don't want the overhead that comes with the great flexibility and ease of use of these data types. 

Let's measure the difference: _IPython_ comes with some small [**built-in magic commands**](https://ipython.readthedocs.io/en/stable/interactive/magics.html) that you can use within a cell. The `%timeit` magic, based on the [**timeit**](https://docs.python.org/2/library/timeit.html) module, measures the execution time of small code snippets. We measure how long it takes to calculate the squares of the one million integers from 0 to 999,999.

In [None]:
numbers = range(int(1e6))

In [None]:
%timeit -n 5 [number**2 for number in numbers]

In [None]:
array = np.arange(int(1e6))
%timeit -n 5 array**2

The example above shows how we can achieve the same result much faster by not only replacing a list with a numpy array, but also by avoiding an explicit loop in Python. What happens under the hood is that numpy performs the calculation not in the Python interpreter, but with much faster compiled code written in C. So numpy lets you use highly optimized code without having to leave Python for C - and trust us, you probably don't want to do that.
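
Outside of IPython, a similar measurement can be sketched with the `timeit` module from the standard library. The smaller problem size `n` and the variable names are assumptions made to keep the sketch quick; the measured times depend on the machine, so no values are shown.

```python
import timeit

import numpy as np

n = 100_000
numbers = range(n)
array = np.arange(n, dtype=np.int64)  # 64-bit integers to avoid overflow

# Both approaches produce the same values
squares_list = [number**2 for number in numbers]
squares_array = array**2
print(squares_list == squares_array.tolist())

# Total time for five runs each, as with `%timeit -n 5`
runs = 5
t_list = timeit.timeit(lambda: [number**2 for number in numbers], number=runs)
t_array = timeit.timeit(lambda: array**2, number=runs)

print(f"list comprehension: {t_list / runs:.6f} s per run")
print(f"numpy array:        {t_array / runs:.6f} s per run")
```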

### Why is computing with NumPy arrays fast?

The reason has to do with the layout of data in the computer memory. Take a look at how data is organized differently in a list and an array.
- a list object contains an array of pointers to objects that can be at any place in the memory
- a numpy array object contains the array of the numbers itself, compactly located in a contiguous block of memory

![](graphics/array_vs_list.png) [Source: Stackoverflow](https://stackoverflow.com/questions/47576775/getting-list-indexed-memory-locations-not-value-locations-but-list-index-locati)
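
The difference in memory layout can be made visible with a rough sketch. It assumes CPython, where `sys.getsizeof` reports the size of the list's pointer array and of each integer object separately; the exact byte counts for the list vary between Python versions. The names `py_list` and `np_array` are illustrative.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# List: one array of pointers plus a separate object for every number
list_pointers = sys.getsizeof(py_list)
list_objects = sum(sys.getsizeof(value) for value in py_list)
print(f"list:  {list_pointers} bytes of pointers + {list_objects} bytes of objects")

# Array: the numbers themselves, stored in one contiguous block
print(np_array.itemsize)               # 8 bytes per element
print(np_array.nbytes)                 # 8000 bytes for the data
print(np_array.flags["C_CONTIGUOUS"])  # True
```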

### Selecting Elements

Elements in an array can be selected efficiently by their position (or index) in the array. This is possible with single elements, ranges or applying Boolean operations.

In [None]:
num_1d = np.array([0,1,2,3,4,5])

In [None]:
print(num_1d)
print(num_1d[0])  # indexing

In [None]:
print(num_1d[0:3])  # slicing

In [None]:
print(num_1d[-1])
print(num_1d[1:])
print(num_1d[:3])
print(num_1d[0::3])  # slicing with step three
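
Selection by position works the same way in more than one dimension, with one index or slice per axis separated by commas. The following sketch uses a hypothetical array `num_2d` that is not part of the original notebook.

```python
import numpy as np

num_2d = np.arange(15).reshape(3, 5)
print(num_2d)

print(num_2d[1, 2])      # single element in row 1, column 2: 7
print(num_2d[:, 0])      # first column: 0, 5, 10
print(num_2d[0])         # first row: 0, 1, 2, 3, 4
print(num_2d[0:2, 1:3])  # rows 0-1 and columns 1-2: [[1, 2], [6, 7]]
```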

### Boolean indexing

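Instead of providing the index positions, it is also possible to select array content with [**boolean indexing**](https://numpy.org/devdocs/reference/arrays.indexing.html#boolean-array-indexing). A comparison such as `num_1d > 3` is evaluated elementwise (the scalar is broadcast against the array) and yields a Boolean array that serves as a mask.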

In [None]:
num_1d

In [None]:
high = num_1d > 3
print(high)

The Boolean array (in our case: `high`) must have the same length as the array you want to filter.

In [None]:
print(num_1d[high])

In [None]:
# Boolean mask created inline (1D)
print(num_1d[num_1d > 3])
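
Several conditions can be combined into one mask. A minimal sketch, with the mask names `middle` and `outer` chosen for illustration: the elementwise operators `&` (and), `|` (or) and `~` (not) are used instead of the Python keywords, and each comparison needs its own parentheses.

```python
import numpy as np

num_1d = np.array([0, 1, 2, 3, 4, 5])

middle = (num_1d > 1) & (num_1d < 5)
print(num_1d[middle])   # elements 2, 3, 4

outer = (num_1d < 1) | (num_1d > 4)
print(num_1d[outer])    # elements 0, 5

print(num_1d[~middle])  # elements 0, 1, 5
print(middle.sum())     # number of True entries: 3
```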

### NumPy functions

The full power of numpy arrays is achieved in combination with NumPy functions. Again, they perform operations in compiled code to be very efficient.

In [None]:
C = np.arange(15).reshape(3, 5)
print(C)

In [None]:
# Sum of all elements
C.sum(axis=None)

In [None]:
# Sum per column (axis=0) or row (axis=1)
C.sum(axis=0)

In [None]:
# Minimum (min) or maximum (max) per column or row
C.min(axis=0)

In [None]:
# Cumulative sum
C.cumsum()
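
A few more reductions on the same array `C` can be sketched as follows. The selection of functions is only an illustration, not a complete overview of what NumPy offers.

```python
import numpy as np

C = np.arange(15).reshape(3, 5)

print(C.max(axis=1))     # maximum of each row: 4, 9, 14
print(C.mean(axis=0))    # mean of each column: 5.0, 6.0, 7.0, 8.0, 9.0
print(C.cumsum(axis=1))  # cumulative sum within each row
print(np.sqrt(C[0]))     # elementwise function applied to the first row
```

Without the `axis` argument, `cumsum()` works on the flattened array, whereas `axis=1` restarts the running sum in every row.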

### Performance _functions_: built-in vs. NumPy

In [None]:
# list comprehension 
%timeit -n 5 values = [i for i in range(int(1e7))]; sum(values)

In [None]:
# precompiled function
%timeit -n 5 np.arange(int(1e7)).sum()

In [None]:
np.array([1, 2, 3])

### Array and matrix Operations

The following example uses two 2x2 matrices, i.e. (2, 2)-arrays, to show different kinds of multiplication.

In [None]:
# Initialize
A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])
print(A)
print(B)

In [None]:
# Element-wise multiplication
A * B

In [None]:
# Matrix product
A.dot(B)
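
The matrix product can also be written with the `@` operator, which is equivalent to `A.dot(B)` for 2D arrays. The sketch below repeats the definitions of `A` and `B` so that it runs on its own, and it also shows that the order of the factors matters for the matrix product.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

print(A * B)  # element-wise: [[2, 0], [0, 4]]
print(A @ B)  # matrix product: [[5, 4], [3, 4]]
print(B @ A)  # different result: [[2, 2], [3, 7]]

print(np.array_equal(A @ B, A.dot(B)))  # True
```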

# Exercises
These exercises show you the NumPy functions and tricks most commonly used in data science. Typically, your entry point for working with data from other sources will be pandas, but let's already get some insights with NumPy.

* Check whether none of the elements of the given arrays (`a_wo_zero`, `a_w_zero`) is zero.

In [None]:
a_wo_zero = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
a_w_zero = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Your code here




* Is there an easier/shorter way to create the given arrays (`a_wo_zero`, `a_w_zero`)? Give it a try!

Maybe some of the code will be familiar to you if we look at data visualization in another notebook.

In [None]:
# Your code here





* Generate five random numbers from the normal distribution. 

There are a lot of applications in a data science project where one can benefit from generated data. Being able to generate data points according to a distribution can be helpful if the available data is insufficient, for training or testing a Machine Learning model, or for modelling an explicit problem and gaining insights.

In [None]:
# Your code here





---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_