# Introduction
<hr style="border:2px solid black"> </hr>


**What?** Vectorisation



# What is vectorisation?
<hr style="border:2px solid black"> </hr>


- Python is not the fastest programming language, so when you need to process a large amount of homogeneous data quickly, the usual advice is to rely on **vectorisation**.

- Say we have a few million numbers in a list or array and want to apply the same mathematical operation to each of them. Because every element is a number and the operation is identical for all of them, we can **vectorise** the operation, i.e. take advantage of this **homogeneity** of data and operation.

- In Python, this means the batch operation is implemented in a fast language such as C, rather than as a Python-level loop over individual elements.
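
As a rough illustration of the idea above, the sketch below computes the same result twice: once with a Python-level loop and once as a single batch operation. It assumes NumPy is installed. NumPy is used here only as one common example of a library whose array operations are implemented in C, and the names `values`, `looped`, and `batched` are illustrative.

```python
import numpy as np

# A small homogeneous dataset: every element is a 64-bit float.
values = np.arange(10, dtype=np.float64)

# Python-level loop: the interpreter handles one element at a time.
looped = [v * 2.0 + 1.0 for v in values]

# Vectorised form: one expression applies the same operation to the
# whole array, and the per-element work runs in compiled code.
batched = values * 2.0 + 1.0

# Both approaches produce the same numbers.
assert np.allclose(looped, batched)
print(batched)
```

The loop and the batch expression describe the same arithmetic; what changes is where the per-element work is executed.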
    


In [14]:
from contextlib import contextmanager
from subprocess import Popen
from os import getpid
from signal import SIGINT
from time import sleep, time
from resource import getrusage, RUSAGE_SELF

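# Hardware events to record. Some of these (for example
# "avx_insts.all") may not exist on every CPU; `perf list`
# shows the events available on the current machine.
events = [
    "instructions",
    "cache-references",
    "cache-misses",
    "avx_insts.all",
]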

@contextmanager
def perf():
    """Benchmark this process with Linux's perf util.
    
    Example usage:

        with perf():
            x = run_some_code()
            more_code(x)
            all_this_code_will_be_measured()
    """
    p = Popen([
            # Run perf stat
            "perf", "stat",
            # for the current Python process
            "-p", str(getpid()),
            # record the list of events mentioned above
            "-e", ",".join(events)])
    # Ensure perf has started before running more
    # Python code. This will add ~0.1 seconds to the
    # elapsed time reported by perf, so we also track
    # elapsed time separately.
    sleep(0.1)
    start = time()
    try:
        yield
    finally:
        print(f"Elapsed (seconds): {time() - start}")
        # ru_maxrss is reported in KiB on Linux.
        print("Peak memory (MiB):",
            int(getrusage(RUSAGE_SELF).ru_maxrss / 1024))
        # SIGINT makes perf stop and print its report;
        # wait for it so the report is not cut off.
        p.send_signal(SIGINT)
        p.wait()

In [9]:
from random import random
DATA = [random() for _ in range(30_000_000)]

In [None]:
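with perf():
    # Compute the mean, then subtract it from every element,
    # one element at a time in a Python-level loop.
    mean = sum(DATA) / len(DATA)
    result = [DATA[i] - mean for i in range(len(DATA))]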
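
For comparison, a vectorised version of the same computation might look like the sketch below. It assumes NumPy is available and uses a much smaller, hypothetical dataset (`small_data`) with `time.perf_counter` instead of the `perf()` helper, so that it runs on any platform. It illustrates the technique and is not a benchmark result.

```python
from random import random
from time import perf_counter

import numpy as np

# Hypothetical, smaller dataset so the sketch runs quickly.
small_data = [random() for _ in range(100_000)]

# Pure-Python version of the same computation as the cell above.
start = perf_counter()
mean = sum(small_data) / len(small_data)
result = [x - mean for x in small_data]
loop_seconds = perf_counter() - start

# Vectorised version: the data lives in a contiguous float64 array and
# both the mean and the subtraction are single batch operations.
array_data = np.array(small_data, dtype=np.float64)
start = perf_counter()
array_result = array_data - array_data.mean()
vector_seconds = perf_counter() - start

# The two versions agree up to floating-point rounding.
assert np.allclose(result, array_result)
print(f"Loop: {loop_seconds:.4f}s, vectorised: {vector_seconds:.4f}s")
```

Timings depend on the machine and on the size of the data, so none are quoted here.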

# References
<hr style="border:2px solid black"> </hr>


- https://pythonspeed.com/articles/vectorization-python/

