NumPy Arrays
NumPy's basic data structure is an indexable, n-dimensional array containing elements of the same type (dtype). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector; here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays.

1-D array, shape (n,): n elements indexed [0] through [n-1]
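
To make the two senses of 'dimension' concrete, here is a minimal sketch that builds a 1-D array and inspects it. The variable name `v` and its values are illustrative and not part of the lab.

```python
import numpy as np

# A vector with 4 elements, stored as a 1-D NumPy array
v = np.array([10, 20, 30, 40])

print(v.ndim)                   # number of indexes (array dimensions): 1
print(v.shape)                  # (4,): one axis holding 4 elements
print(v[0], v[v.shape[0] - 1])  # first and last elements: 10 40
```

The vector has four elements, yet `ndim` is 1 because a single index is enough to address any element.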



Vector Creation
Data creation routines in NumPy generally take the shape of the object as their first parameter. This can be either a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines.

In [2]:
import numpy as np
import time
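# NumPy routines which allocate memory and fill arrays with values
a = np.zeros(4)            # shape given as a single integer
print(f"{a} , {a.shape}")
a = np.zeros((4,))         # shape given as a tuple
print(f"{a} , {a.shape}")
a = np.arange(4)           # integers 0 through 3
print(f"{a} , {a.shape}")
a = np.random.rand(4)      # uniform random samples in [0, 1)
print(f"{a} , {a.shape}")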

[0. 0. 0. 0.] , (4,)
[0. 0. 0. 0.] , (4,)
[0 1 2 3] , (4,)
[0.60198895 0.46376834 0.74242341 0.59581136] , (4,)
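
As a small illustration of the shape parameter, the sketch below passes both an integer and a tuple to `np.zeros`. Not every routine follows the shape-first convention: `np.arange` takes start/stop/step values, and `np.random.rand` takes each dimension as a separate argument instead of a tuple. The variable names are illustrative only.

```python
import numpy as np

v = np.zeros(3)           # integer shape: 1-D array with 3 elements
m = np.zeros((2, 3))      # tuple shape: 2-D array with 2 rows and 3 columns
r = np.random.rand(2, 3)  # rand takes the dimensions as separate arguments

print(v.shape)  # (3,)
print(m.shape)  # (2, 3)
print(r.shape)  # (2, 3)
```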


Creating a vector using NumPy

In [3]:
arr = np.array([1,3,45,5])
arr2 = np.array([5,6,7,8,9])

Arithmetic operators and NumPy functions applied to these arrays are vectorized: they act on all elements at once in compiled code instead of looping in Python.
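
A minimal sketch of what vectorized operations on such an array look like. The names `doubled`, `shifted`, and `total` are illustrative and not part of the lab.

```python
import numpy as np

arr = np.array([1, 3, 45, 5])

doubled = arr * 2    # every element multiplied by 2: [2 6 90 10]
shifted = arr + 1    # 1 added to every element: [2 4 46 6]
total = np.sum(arr)  # reduction over all elements: 54

print(doubled, shifted, total)
```

Element-wise operations between two arrays require compatible shapes, so combining a 4-element array with a 5-element array (for example `arr + arr2`) raises a `ValueError`.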

The Need for Speed: vector vs for loop
We use the NumPy library because it improves speed and memory efficiency. Let's demonstrate:

In [4]:
def my_dot(a, b):
    """
    Compute the dot product of two vectors

    Args:
      a (ndarray (n,)):  input vector
      b (ndarray (n,)):  input vector with same dimension as a

    Returns:
      x (scalar): dot product of a and b
    """
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

In [5]:
np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del a, b  # remove these big arrays from memory

np.dot(a, b) =  2501072.5817
Vectorized version duration: 6.2211 ms 
my_dot(a, b) =  2501072.5817
loop version duration: 948.6630 ms 
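
A single `time.time()` measurement can be noisy, and the exact numbers depend on the machine. As a hedged alternative sketch, the snippet below repeats the comparison with `time.perf_counter()`, a clock intended for measuring short durations, and also checks that both approaches return the same value. `loop_dot` is a hypothetical stand-in for `my_dot` so that the block is self-contained, and the array size is reduced so the loop finishes quickly.

```python
import time
import numpy as np

def loop_dot(a, b):
    """Dot product with an explicit Python loop (hypothetical stand-in for my_dot)."""
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total

rng = np.random.default_rng(1)  # seeded generator for repeatable inputs
a = rng.random(100_000)         # smaller than the lab's arrays
b = rng.random(100_000)

start = time.perf_counter()
vec_result = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - start)

start = time.perf_counter()
loop_result = loop_dot(a, b)
loop_ms = 1000 * (time.perf_counter() - start)

# Both approaches should agree up to floating-point rounding
print(np.isclose(vec_result, loop_result))
print(f"vectorized: {vec_ms:.4f} ms, loop: {loop_ms:.4f} ms")
```

The absolute timings will differ from run to run, so no expected durations are shown; the point is the relative gap between the two versions.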
