# More Vectorisation

Vectorisation is a powerful concept used in NumPy that can greatly increase the performance of your code. You should have encountered it already in the 2nd year course. This notebook goes into a little more detail, and will hopefully explain how performance improvements are possible (but not guaranteed!).

To start with, it's useful to understand a key difference between two types of computer language: _compiled_ and _interpreted_.

With _compiled_ languages like C, C++ and Fortran, going from code (text) to a program is a two-step process. First, a dedicated _compiler_ program parses (reads) the high-level code and converts it into instructions that can be understood by the CPU. The compiler outputs an executable binary file, which you then have to run in a second step. Compilation can potentially take a long time, but this allows the executable to be highly optimised and efficient. On the other hand, when the executable is highly optimised, it bears little resemblance to the original code, and debugging can become more difficult.

With _interpreted_ languages, like Python, there is no separate compiler, and no executable. The code you write is parsed, compiled, and executed by an _interpreter_ at the time you run the program. There is no separate compilation step for you to perform, which simplifies the process. However, the interpreter's ability to optimise the code is reduced and, in general, interpreted languages are not nearly as fast as compiled languages.
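As a small illustration of "compiled at the time you run the program": assuming you are using CPython (the standard Python interpreter), each function is turned into bytecode, which the interpreter then carries out one instruction at a time. The standard library `dis` module lets you look at it. The function name `square_one` is just an illustrative choice, and the exact instructions printed will vary between Python versions.

```python
import dis

def square_one(x):
    # A trivial function whose bytecode we want to inspect
    return x ** 2

# Each instruction is one step that the interpreter carries out at run time
instructions = list(dis.get_instructions(square_one))
for ins in instructions:
    print(ins.opname, ins.argrepr)
```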

However, in practice, real-world programming with Python does not rely on interpreted code alone. A lot of the libraries used in NumPy and SciPy are actually Python _wrappers_ around code that was written in C or Fortran, then compiled and optimised. When these libraries are used efficiently, we can get close to the performance of the compiled language.

## For-Loop example

Let's look at this by considering a function that squares each element in a long list. First, we use a for loop.

In [1]:
old = list(range(1000))
new = [0] * 1000

def forloop():
    for i in range(1000):
        new[i] = old[i]**2
    return new

print(forloop())

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000, 10201, 10404, 10609, 10816, 11025, 11236, 11449, 11664, 11881, 12100, 12321, 12544, 12769, 12996, 13225, 13456, 13689, 13924, 14161, 14400, 14641, 14884, 15129, 15376, 15625, 15876, 16129, 16384, 16641, 16900, 17161, 17424, 17689, 17956, 18225, 18496, 18769, 19044, 19321, 19600, 19881, 20164, 20449, 20736, 21025, 21316, 21609, 21904, 22201, 22500, 22801, 23104, 23409, 23716, 24025, 24336, 24649, 24964, 25281, 25600, 25921, 26244, 2656

Now we'll measure the speed of this function using Timer from the timeit module. Timer takes a function as an argument. Its repeat() method runs the function repeatedly (_number_ times) and records the total time taken. The _repeat_ argument repeats this measurement a specified number of times, so the result is a list of timings.

Note that, in general, we use the shortest time as a measure of performance. Can you think of a reason why this is preferable to, say, the mean?
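To explore this question, it can help to look at all the individual measurements rather than a single summary number. The following is a minimal sketch, using a hypothetical stand-in function `work` and smaller values of _number_ and _repeat_ so that it runs quickly. Try running it a few times and compare how much each quantity changes.

```python
from statistics import mean
from timeit import Timer

def work():
    # Stand-in workload (hypothetical): square the first 1000 integers
    return [i**2 for i in range(1000)]

# repeat() returns a list with one total time (in seconds) per repetition
times = Timer(work).repeat(number=100, repeat=5)

print("all runs:", times)
print("min     :", min(times))
print("mean    :", mean(times))
```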

In [2]:
from timeit import Timer

print(min(Timer(forloop).repeat(number=1000, repeat=10)))

0.4148266540000005


## NumPy Vector Routines

Next we will do the same thing in a more efficient way using a "vectorised" routine provided by NumPy.

NumPy contains a vast array of vectorised routines that give you access to the fast underlying C/Fortran code. Make sure you can navigate the reference manual here, to understand how to use a given routine:
https://docs.scipy.org/doc/numpy/reference/routines.html

Note that:
   * these routines don't all follow the simple example given here, of repeating a single operation on every element in a NumPy array. But they all give you access to the fast code underneath without writing inefficient for loops.
   * for linear algebra, you are recommended to use scipy.linalg rather than numpy.matlib or numpy.linalg
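As an illustration of the first point, here is a short sketch of two routines that are not simple element-by-element operations: `np.sum` reduces a whole array to one number, and `np.cumsum` produces a running total. The explicit loop is included only for comparison, and the variable names are illustrative.

```python
import numpy as np

a = np.arange(1, 6)        # array([1, 2, 3, 4, 5])

# Reduction: many elements in, a single value out
total = np.sum(a)

# Running total: each output element depends on all earlier inputs
running = np.cumsum(a)

# The same running total written as an explicit Python loop, for comparison
running_loop = []
acc = 0
for value in a:
    acc += value
    running_loop.append(acc)

print(total)          # 15
print(running)        # [ 1  3  6 10 15]
```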

Now we will use numpy.power to square every element of a NumPy array holding the same values as our list.

In [3]:
import numpy as np

np_old = np.arange(1000)

def nploop():
    return np.power(np_old, 2)

print(nploop())

[     0      1      4      9     16     25     36     49     64     81
    100    121    144    169    196    225    256    289    324    361
    400    441    484    529    576    625    676    729    784    841
    900    961   1024   1089   1156   1225   1296   1369   1444   1521
   1600   1681   1764   1849   1936   2025   2116   2209   2304   2401
   2500   2601   2704   2809   2916   3025   3136   3249   3364   3481
   3600   3721   3844   3969   4096   4225   4356   4489   4624   4761
   4900   5041   5184   5329   5476   5625   5776   5929   6084   6241
   6400   6561   6724   6889   7056   7225   7396   7569   7744   7921
   8100   8281   8464   8649   8836   9025   9216   9409   9604   9801
  10000  10201  10404  10609  10816  11025  11236  11449  11664  11881
  12100  12321  12544  12769  12996  13225  13456  13689  13924  14161
  14400  14641  14884  15129  15376  15625  15876  16129  16384  16641
  16900  17161  17424  17689  17956  18225  18496  18769  19044  19321
  1960

In [4]:
from timeit import Timer

print(min(Timer(nploop).repeat(number=1000, repeat=10)))

0.004811326000002225


You should see a substantial improvement in computation time.
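numpy.power is not the only way to write this operation. As a brief sketch, the `**` operator applied to an array and the `np.square` routine give the same values. Which of them is fastest may depend on your NumPy version and hardware, so it is worth timing them yourself with Timer.

```python
import numpy as np

values = np.arange(1000)

a = np.power(values, 2)   # general power routine
b = values ** 2           # the ** operator, applied element by element
c = np.square(values)     # a dedicated squaring routine

print(np.array_equal(a, b) and np.array_equal(a, c))  # True
```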

## numpy.vectorize

NumPy provides a way to "vectorise" a regular function:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

However, you should note that this is more for convenience than performance. It provides a NumPy-like interface to your function, but the function itself is still interpreted Python.

We can see this in the example below.

In [5]:
def square(n):
    return n**2

vsquare = np.vectorize(square)

np_old = np.arange(1000)

def nvloop():
    return vsquare(np_old)

print(nvloop())

[     0      1      4      9     16     25     36     49     64     81
    100    121    144    169    196    225    256    289    324    361
    400    441    484    529    576    625    676    729    784    841
    900    961   1024   1089   1156   1225   1296   1369   1444   1521
   1600   1681   1764   1849   1936   2025   2116   2209   2304   2401
   2500   2601   2704   2809   2916   3025   3136   3249   3364   3481
   3600   3721   3844   3969   4096   4225   4356   4489   4624   4761
   4900   5041   5184   5329   5476   5625   5776   5929   6084   6241
   6400   6561   6724   6889   7056   7225   7396   7569   7744   7921
   8100   8281   8464   8649   8836   9025   9216   9409   9604   9801
  10000  10201  10404  10609  10816  11025  11236  11449  11664  11881
  12100  12321  12544  12769  12996  13225  13456  13689  13924  14161
  14400  14641  14884  15129  15376  15625  15876  16129  16384  16641
  16900  17161  17424  17689  17956  18225  18496  18769  19044  19321
  1960

In [6]:
from timeit import Timer

print(min(Timer(nvloop).repeat(number=1000, repeat=10)))

0.4608300320000005


You should see results comparable with those from the straight for-loop. Nonetheless, numpy.vectorize is useful when a uniform NumPy-like interface is required.
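One way to convince yourself that the vectorised function is still interpreted Python underneath is to record every call made to it. The sketch below uses a hypothetical helper `counted_square` and a list `call_log` for this purpose. You should find that the Python function is called at least once per element.

```python
import numpy as np

call_log = []

def counted_square(n):
    # Plain Python function that records each call it receives
    call_log.append(n)
    return n**2

vcounted = np.vectorize(counted_square)
result = vcounted(np.arange(1000))

# Expect at least 1000: one call per element (NumPy may make an extra
# call to work out the output type)
print(len(call_log))
```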

## Numba

https://numba.pydata.org/numba-doc/dev/index.html

Numba is a library that will _compile_ Python code. From the manual:

_"Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops. The most common way to use Numba is through its collection of decorators that can be applied to your functions to instruct Numba to compile them. When a call is made to a Numba decorated function it is compiled to machine code “just-in-time” for execution and all or part of your code can subsequently run at native machine code speed!"_

So Numba's vectorize decorator does something similar to numpy.vectorize, but with an added performance improvement.

In [7]:
import numba

@numba.vectorize([numba.int64(numba.int64)])  # see note below
def nbsquare(n):
    return n**2

np_old = np.arange(1000)

def nbloop():
    return nbsquare(np_old)

print(nbloop())

[     0      1      4      9     16     25     36     49     64     81
    100    121    144    169    196    225    256    289    324    361
    400    441    484    529    576    625    676    729    784    841
    900    961   1024   1089   1156   1225   1296   1369   1444   1521
   1600   1681   1764   1849   1936   2025   2116   2209   2304   2401
   2500   2601   2704   2809   2916   3025   3136   3249   3364   3481
   3600   3721   3844   3969   4096   4225   4356   4489   4624   4761
   4900   5041   5184   5329   5476   5625   5776   5929   6084   6241
   6400   6561   6724   6889   7056   7225   7396   7569   7744   7921
   8100   8281   8464   8649   8836   9025   9216   9409   9604   9801
  10000  10201  10404  10609  10816  11025  11236  11449  11664  11881
  12100  12321  12544  12769  12996  13225  13456  13689  13924  14161
  14400  14641  14884  15129  15376  15625  15876  16129  16384  16641
  16900  17161  17424  17689  17956  18225  18496  18769  19044  19321
  1960

In [8]:
from timeit import Timer

print(min(Timer(nbloop).repeat(number=1000, repeat=10)))

0.0010428719999993064


Note the special line in the code above:

```
@numba.vectorize([numba.int64(numba.int64)])
```

This is an example of a _decorator_. Decoration is a common software design pattern, in which an individual function (or object) is given some added functionality.  The @ symbol in Python is used to apply decoration to the object or function that follows. In this case, the "decoration" is to parse and compile the Python code that follows.
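To make the idea concrete, here is a minimal sketch of a hand-written decorator in plain Python, unrelated to Numba. The names `announce`, `add_one` and `double` are illustrative. It also shows that the @ syntax is shorthand for passing a function to the decorator and rebinding the name to the result.

```python
import functools

def announce(func):
    # A decorator takes a function and returns a replacement for it
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@announce
def add_one(n):
    return n + 1

# Equivalent without the @ syntax
def double(n):
    return 2 * n
double = announce(double)

print(add_one(3))   # prints "calling add_one", then 4
print(double(2))    # prints "calling double", then 4
```

Here the added functionality is just a printed message. With numba.vectorize, the added functionality is compilation of the function body.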


Note that here we have to tell the compiler the _type_ of the function argument and return value. If we wanted to compile this code for different types, e.g. floating point numbers, we could add more signatures to the list, e.g.:

```
@numba.vectorize([numba.int32(numba.int32),
                  numba.int64(numba.int64),
                  numba.float32(numba.float32),
                  numba.float64(numba.float64)])
```