
In [0]:
import numpy as np

In [6]:
# Create an array of 20 random integers in the range [0, 50)
arr1 = np.random.randint(0, 50, 20); arr1

array([44, 39, 28, 49, 41, 49,  2,  1, 14,  5, 39, 37, 41,  7, 25,  7, 27,
       24, 17, 37])
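Because `np.random.randint` draws a new array on every run, the numbers above will differ each time the notebook is executed. The sketch below makes the array reproducible with NumPy's `Generator` API. The seed `42` is an arbitrary value chosen for illustration and is not part of the original notebook.

```python
import numpy as np

# A seeded Generator returns the same "random" array on every run;
# the seed 42 is an arbitrary, illustrative choice
rng = np.random.default_rng(42)

# integers(low, high, size) draws from [low, high), like randint(0, 50, 20)
arr1 = rng.integers(0, 50, 20)
print(arr1)
```

Re-creating the generator with the same seed gives back exactly the same 20 values.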

In [0]:
# Define a function: cube values below 10, square all others
def calc_func(x):
  if x < 10:
    return x**3
  else:
    return x**2

In [8]:
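# Calling the function directly on the array fails: `x < 10` returns an array of booleans,
# and `if` cannot reduce it to a single True/False, so NumPy raises a ValueError.
# To apply the function element by element we can use map(), a list comprehension, or a vectorized function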
calc_func(arr1)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
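To see where the error comes from, the sketch below evaluates the comparison on its own and then calls the function on scalars and on a small array. The three-element array is made up for illustration.

```python
import numpy as np

def calc_func(x):
    if x < 10:
        return x**3
    else:
        return x**2

# Scalars work: a single comparison gives a single bool
print(calc_func(3), calc_func(12))  # 27 144

arr = np.array([44, 2, 14])

# Comparing an array with a scalar gives one boolean per element
mask = arr < 10
print(mask)  # [False  True False]

# `if` needs exactly one True/False, so NumPy refuses to guess
message = None
try:
    calc_func(arr)
except ValueError as err:
    message = str(err)
print("ValueError:", message)
```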

In [0]:
# Show the help for np.vectorize, which defines a vectorized function that takes a nested sequence
# of objects or NumPy arrays as inputs and returns a single NumPy array or a tuple of NumPy arrays
?np.vectorize

In [0]:
# Vectorizing the function calc_func()
vec_calc_func = np.vectorize(calc_func)
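`np.vectorize` also accepts an `otypes` argument and can be applied as a decorator. The sketch below shows both forms. Fixing `otypes` to `np.int64` is an illustrative choice. According to the NumPy documentation, when `otypes` is omitted the output type is found by calling the function on the first element.

```python
import numpy as np

def calc_func(x):
    # Cube values below 10, square the rest (same rule as above)
    if x < 10:
        return x**3
    return x**2

# otypes fixes the output dtype up front, so vectorize does not need to
# call calc_func on the first element just to discover it
vec_calc_int = np.vectorize(calc_func, otypes=[np.int64])

# Decorator form: the decorated name is already the vectorized version
@np.vectorize
def vec_calc_deco(x):
    return x**3 if x < 10 else x**2

arr = np.array([44, 2, 14, 5])
print(vec_calc_int(arr))   # [1936    8  196  125]
print(vec_calc_deco(arr))
```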

In [11]:
# Check the type
type(vec_calc_func)

numpy.vectorize

In [12]:
# Apply the vectorized function over the array, returning an array
vec_calc_func(arr1)

array([1936, 1521,  784, 2401, 1681, 2401,    8,    1,  196,  125, 1521,
       1369, 1681,  343,  625,  343,  729,  576,  289, 1369])

In [13]:
# Do the same with the built-in map(), which returns an iterator; list() turns it into a list of values
list(map(calc_func, arr1))

[1936, 1521, 784, 2401, 1681, 2401, 8, 1, 196, 125, 1521, 1369, 1681, 343,
 625, 343, 729, 576, 289, 1369]
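When an array is needed at the end anyway, `map()`'s lazy iterator can be fed directly to `np.fromiter`, which skips the intermediate Python list. This is a sketch using a short, made-up input array. `np.int64` as the output dtype is an assumption that fits these integer results.

```python
import numpy as np

def calc_func(x):
    if x < 10:
        return x**3
    return x**2

arr1 = np.array([44, 39, 28, 2, 1, 5])

# map() is lazy: it yields one result at a time
results = map(calc_func, arr1)

# np.fromiter consumes the iterator straight into an array;
# count lets NumPy allocate the output once
out = np.fromiter(results, dtype=np.int64, count=len(arr1))
print(out)
```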

In [14]:
# Same result with a list comprehension, which also returns a list
# For each item x in arr1, apply calc_func to it
[calc_func(x) for x in arr1]

[1936, 1521, 784, 2401, 1681, 2401, 8, 1, 196, 125, 1521, 1369, 1681, 343,
 625, 343, 729, 576, 289, 1369]
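The three approaches return the same numbers and differ only in the container type. A quick check with `np.array_equal` confirms this on a fresh random array:

```python
import numpy as np

def calc_func(x):
    if x < 10:
        return x**3
    return x**2

arr1 = np.random.randint(0, 50, 20)

vec_result = np.vectorize(calc_func)(arr1)       # ndarray
map_result = list(map(calc_func, arr1))          # list
comp_result = [calc_func(x) for x in arr1]       # list

# Same values regardless of the method used
all_equal = np.array_equal(vec_result, map_result) and np.array_equal(vec_result, comp_result)
print(type(vec_result).__name__, type(map_result).__name__, type(comp_result).__name__)
print(all_equal)
```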

# Comparing the performance of the three methods


In [15]:
# Vectorized function
%timeit vec_calc_func(arr1)

# List Comprehension
%timeit [calc_func(x) for x in arr1]

# map() function
%timeit list(map(calc_func,arr1))

The slowest run took 16.18 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 19.4 µs per loop
100000 loops, best of 3: 12.7 µs per loop
100000 loops, best of 3: 12.5 µs per loop
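`%timeit` is an IPython magic command, so it only works inside a notebook or IPython shell. In a plain Python script, the standard-library `timeit` module offers a comparable measurement. The sketch below takes the best of 3 repeats of 1,000 calls each. Those counts are arbitrary choices, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

def calc_func(x):
    if x < 10:
        return x**3
    return x**2

vec_calc_func = np.vectorize(calc_func)
arr1 = np.random.randint(0, 50, 20)

candidates = {
    "vectorize": lambda: vec_calc_func(arr1),
    "list comprehension": lambda: [calc_func(x) for x in arr1],
    "map": lambda: list(map(calc_func, arr1)),
}

# Best of 3 repeats of 1,000 calls each, reported per call in microseconds
timings = {}
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, repeat=3, number=1000))
    timings[name] = best / 1000 * 1e6
    print(f"{name}: {timings[name]:.1f} µs per call")
```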


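On this small array the vectorized function is the slowest of the three (19.4 µs against roughly 12.5 µs for the others), because np.vectorize adds a fixed overhead to every call. With more data, the picture changes...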

In [16]:
# Create a larger array: 20,000 random integers in the range [0, 100)
arr2 = np.random.randint(0, 100, 20 * 1000); arr2

array([44, 97, 95, ..., 96,  8, 18])

In [17]:
# Vectorized function
%timeit vec_calc_func(arr2)

# List Comprehension
%timeit [calc_func(x) for x in arr2]

# map() function
%timeit list(map(calc_func,arr2))

100 loops, best of 3: 7.22 ms per loop
100 loops, best of 3: 12.1 ms per loop
100 loops, best of 3: 11.6 ms per loop


This time the vectorized function was the fastest of the three: 7.22 ms per loop, against 12.1 ms for the list comprehension and 11.6 ms for map().
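The NumPy documentation states that `np.vectorize` is provided mainly for convenience rather than performance, since it is essentially a Python-level loop. For a simple rule like this one, a whole-array expression is an alternative worth knowing. The sketch below uses `np.where`, which is a substitute for `calc_func` and not part of the original comparison. It computes both branches for every element, but does so in compiled code.

```python
import numpy as np

def calc_func(x):
    if x < 10:
        return x**3
    return x**2

arr2 = np.random.randint(0, 100, 20 * 1000)

# Take arr2**3 where the condition holds and arr2**2 everywhere else
fast = np.where(arr2 < 10, arr2**3, arr2**2)

# Same values as the np.vectorize version
slow = np.vectorize(calc_func)(arr2)
print(np.array_equal(fast, slow))
```

Timing `fast` with `%timeit` in the same notebook is the way to check whether the speedup holds on a given machine. No result for it is reported here.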