# Essentials of NumPy

Today we will discuss NumPy, which is the premier Python library for doing numerical work. We will introduce the main objects in NumPy - `ndarray`s - as well as look at how we can perform mathematical operations on them. We will also look at how we can use numbers to analyze seemingly non-number-like types of data (such as images and sounds).

## Importing NumPy

Because NumPy is not part of the standard Python language (it is a separate package), we need to import it before we can use it. We commonly abbreviate the package name as `np`, which keeps code shorter and easier to read.

In [22]:
#STUDENT CODE HERE

## Creating NumPy Arrays

The central object in NumPy is the `ndarray`. This array simply stores a sequence of numbers, much like a list. However, NumPy arrays hold a few advantages over Python lists, especially when working with numbers:

1. NumPy arrays allow us to easily access data along multiple dimensions.
2. NumPy arrays let us perform fast math operations on all the elements in the array (or over patterned subsequences of its elements) using speedy C code; this process is known as vectorization and is really what makes NumPy so powerful.

We will see both of these in more detail, but first let's create some simple NumPy arrays.

In [23]:
# Let's create a NumPy array called `x` with four elements: 0, 1, 2, 3

#STUDENT CODE HERE

In [24]:
# Now you do it! Create a NumPy array called `y` with the first six even numbers (0 not included)

#STUDENT CODE HERE

In [25]:
# We learned about different kinds of data types last week
print(type([1, 2, 3]))
print(type("ABC"))

# But what is the type of a NumPy array?

#STUDENT CODE HERE

<class 'list'>
<class 'str'>


Just like lists, NumPy arrays can also have multiple dimensions.

In [26]:
# Let's now make a two-dimensional array:

#STUDENT CODE HERE

Note that in 2D NumPy arrays, each sub-array must be the same length, forming orderly "rows and columns".
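As a minimal sketch of this "rows and columns" idea (the name `grid` is hypothetical and is not used elsewhere in this notebook), the `ndim` and `shape` attributes report how many dimensions an array has and how long each one is:

```python
import numpy as np

# A hypothetical 2-D array: 2 rows, each holding 3 numbers
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.ndim)   # number of dimensions
print(grid.shape)  # (number of rows, number of columns)
```

Here every row has the same length, which is what lets NumPy describe the whole array with a single shape.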

## Reshaping Arrays

We don't have to start off by making two-dimensional arrays! We can take a one-dimensional array and reshape it to be two-dimensional.

In [27]:
# Let's change y into a 3 by 2 array

#STUDENT CODE HERE

In [28]:
# We did not assign the reshaped array back to y! What would print(y) show?

#STUDENT CODE HERE

In [29]:
# Let's set y equal to the reshaped array now

#STUDENT CODE HERE

In [30]:
# Now you try! Turn x into a 2 by 2 array

#STUDENT CODE HERE

Reshaping arrays allows us to manipulate our data easily and perform certain array-shape-dependent math operations, which we will see soon.
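To recap how reshaping behaves, here is a small sketch using the illustrative names `flat` and `table` (they are assumptions, chosen so as not to overwrite `x` or `y`). It shows that `reshape` hands back a new arrangement while the original array keeps its shape, and that the new shape has to account for every element:

```python
import numpy as np

# `flat` and `table` are illustrative names only
flat = np.arange(6)          # 1-D array holding 0, 1, 2, 3, 4, 5
table = flat.reshape(2, 3)   # the same six numbers arranged as 2 rows x 3 columns

print(flat.shape)    # the original array keeps its 1-D shape
print(table.shape)

# Six numbers cannot fill a 4 x 2 array, so NumPy refuses
try:
    flat.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```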

## Vectorized Functions

Say we had a Python list of numbers, and we wanted to square every number in the list. We would need to write a for loop that would one-by-one go through the list and square each value:

In [31]:
# Square the numbers in this list
l = [0,1,2,3]

#STUDENT CODE HERE

Python is a fairly slow programming language on its own: every pass through the loop carries interpreter overhead, so as the number of elements in the list grows, the time this for loop takes to finish adds up quickly.

However, with NumPy, we can use **vectorized functions** to quickly perform mathematical operations on all the elements of an array. "Vectorization" here means that NumPy will, under the hood, call very fast C code to perform the math so that it doesn't get bogged down in Python loops. We'll get to see how much faster vectorization is later.
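To make the contrast concrete, here is a hedged sketch that takes square roots instead of squares. The names `values`, `roots_list`, and `roots_array` are hypothetical, and `np.sqrt` is just one of many vectorized NumPy functions:

```python
import math
import numpy as np

values = [1.0, 4.0, 9.0]   # hypothetical data

# Plain Python: visit each element in turn
roots_list = [math.sqrt(v) for v in values]

# NumPy: a single call processes every element of the array
roots_array = np.sqrt(np.array(values))

print(roots_list)
print(roots_array)
```

Both approaches give the same numbers; the difference is that the NumPy call performs its loop in compiled code rather than in Python.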

In [32]:
# Let's square all the values in x:

#STUDENT CODE HERE

In [33]:
# We can also use the shorthand `**` for exponentiation:

#STUDENT CODE HERE

In [34]:
# Now you try exponentiating every element in y to the third power

#STUDENT CODE HERE

In [35]:
# Can anyone guess how we could multiply each element in y by 5, then add 1?
# (i.e. compute 5 * y + 1)

#STUDENT CODE HERE

## Indexing in NumPy

Indexing in NumPy looks a lot like indexing in Python lists or tuples. However, since we now have multiple dimensions, we can index into each dimension!
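As a rough illustration on a separate array (the name `m` and its values are assumptions made for this sketch), indices are given in the order row, then column:

```python
import numpy as np

# A hypothetical 3 x 2 array
m = np.array([[10, 20],
              [30, 40],
              [50, 60]])

row = m[1]         # a single index selects a whole row
item = m[1, 0]     # row 1, column 0
last = m[-1, -1]   # negative indices count from the end of each dimension

print(row, item, last)
```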

In [36]:
# Let's try this out a bit on y (which recall is a 3 x 2 array)
# First let's see what happens if we provide only a single index

#STUDENT CODE HERE

In [37]:
# What happens with two indices?

#STUDENT CODE HERE

In [38]:
# What if we do more than two indices?

#STUDENT CODE HERE

In [39]:
# We can also do negative indexing
# (remember: negative indexing picks out elements from the back)

#STUDENT CODE HERE

In [40]:
# Now you try: pick out the element 3 from x (which remember is a 2 x 2 array)

#STUDENT CODE HERE

## Slicing in NumPy

Slicing in NumPy is quite similar to that of Python lists: we use the same colon notation and everything. However, since NumPy arrays have multiple dimensions, we can now slice individually across each dimension.
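A small sketch of per-dimension slicing, using a hypothetical array named `board` (not one of the notebook's variables). A bare `:` means "everything along this dimension":

```python
import numpy as np

# A hypothetical 3 x 4 array holding 0 through 11
board = np.arange(12).reshape(3, 4)

first_two_rows = board[:2]      # rows 0 and 1, all columns
middle_cols = board[:, 1:3]     # all rows, columns 1 and 2
corner = board[:2, 1:3]         # rows 0 and 1, columns 1 and 2

print(first_two_rows.shape)
print(middle_cols.shape)
print(corner)
```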

In [41]:
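# NOTE: `z` is not created earlier in this notebook; as an assumption,
# we define a small 3 x 3 array here so that the cell can run
z = np.arange(9).reshape(3, 3)
print(z)

# Let's try slicing the first dimension of z:

#STUDENT CODE HERE

# See that this picked out only the first row of z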

In [None]:
# Now the second dimension:

#STUDENT CODE HERE

# See that this removed the first column of z

In [None]:
# What happens if we put these together? Can anyone predict the output?

#STUDENT CODE HERE

In [None]:
# Now you try: using slicing, remove the last row of y

#STUDENT CODE HERE

In [None]:
# As we saw last week, we can use slicing to reverse the order of lists
# We can do the same with NumPy arrays:

#STUDENT CODE HERE

In [None]:
# We can also have higher dimensions of arrays:
# a 3D array, shape-(2, 2, 2)
d3_array = np.array([[[0, 1],
                      [2, 3]],

                     [[4, 5],
                      [6, 7]]])

# Indexing a 3D array means giving it [sheet number, row number, column number]

#STUDENT CODE HERE

## Convenient Ways to Create Arrays

It can be a bit tedious to type out all the numbers that we want in our array. But as we saw last week with constructions like `list(range(10))`, there are convenient ways to create arrays with lots of numbers.
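For reference, here is a hedged overview of a few such helpers. The variable names are hypothetical, and `np.random.rand` is only one of several ways NumPy offers to draw random numbers:

```python
import numpy as np

ones = np.ones((2, 3))           # 2 x 3 array filled with 1.0
zeros = np.zeros(4)              # 1-D array of four 0.0 values
counts = np.arange(5)            # integers 0 through 4 (the stop value is excluded)
spaced = np.linspace(0, 1, 5)    # 5 evenly spaced values from 0 to 1, endpoints included
noise = np.random.rand(2, 2)     # 2 x 2 array of random floats in [0, 1)

print(ones.shape, zeros.shape)
print(counts)
print(spaced)
```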

In [None]:
# We can create an array that has a bunch of ones with a specified shape

#STUDENT CODE HERE

In [None]:
# We can do the same with 0's:

#STUDENT CODE HERE

In [None]:
# We can fill an array with random numbers:

#STUDENT CODE HERE

In [None]:
# Or we can create an array with all the integers from 0 up to (but not including) a specified value,
# like we did with list and range!

#STUDENT CODE HERE

In [None]:
# Note that `arange` will always give a 1-D array, but we can reshape it if we want:

#STUDENT CODE HERE

In [None]:
# We can also make an array of evenly spaced numbers:

#STUDENT CODE HERE

In [None]:
# Now you try to put some of these ideas (namely, creating arrays and vectorization) together:
# create a 1-D NumPy array containing the squares of the first 5 integers
# i.e. a NumPy array with 0, 1, 4, 9, 16

#STUDENT CODE HERE

## Operations with Multiple Arrays

So far we have seen how we can multiply each number in an array by a single value, or how we can raise every number in an array to a specified power. But what if we want to combine two arrays in an operation (say we want to add the elements in the two arrays)?

Well, we can do that as well!

In [None]:
# Let's add two (2 x 2) arrays, namely `x` and a new (2 x 2) array containing only 1's:

#STUDENT CODE HERE

In [None]:
# Let's divide the two arrays (with `x` in the numerator!):

#STUDENT CODE HERE

Notice that operations between arrays are performed **element-wise**. That is, the elements in corresponding positions in the two arrays are the ones that "interact".
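A minimal sketch of what "element-wise" means, with hypothetical arrays `a` and `b` of the same shape:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[10, 20],
              [30, 40]])

total = a + b      # each element of a is added to the element of b in the same position
product = a * b    # element-wise multiplication, not matrix multiplication

print(total)
print(product)
```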

In [None]:
# Now you try! Raise each element in x to its own power
# i.e. compute 0 ** 0, 1 ** 1, 2 ** 2, and 3 ** 3

#STUDENT CODE HERE

This is a super powerful tool for doing computations, but there are some limitations. Let's see what happens when we try to do a computation on arrays with different shapes:

In [None]:
# Try multiplying `x` (a 2 x 2 array) with `y` (a 3 x 2 array):

#STUDENT CODE HERE

We get an error! This error is telling us that NumPy doesn't know how to multiply a 2x2 and a 3x2 array element-wise. After all, how can an element-wise operation be done if one array has more elements than the other?
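If you want to look at the error without stopping the notebook, one option is to catch it. This is only a sketch: the arrays `small` and `tall` are hypothetical stand-ins with the same shapes as above.

```python
import numpy as np

small = np.ones((2, 2))   # stand-in for a 2 x 2 array
tall = np.ones((3, 2))    # stand-in for a 3 x 2 array

try:
    small * tall
    failed = False
except ValueError as err:
    failed = True
    print("NumPy raised:", err)
```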

**We will generally only be able to perform element-wise operations between arrays with the same shape**. There is an exception to this rule, which is called 'broadcasting', which we will now very briefly discuss.

## Array Broadcasting

Earlier we saw that arrays with different shapes give an error when we try to perform an element-wise operation between them. The exception is when the mismatched dimension has size `1` in one of the arrays:

In [None]:
# While a (2 x 2) and (3 x 2) array cannot be multiplied,
# a (2 x 2) and (2 x 1) array can:

#STUDENT CODE HERE

The idea here is that NumPy will under-the-hood stretch out any `1` dimensions into the appropriate size such that the two arrays will have the same shape:

<img src="broadcasting_example.png">

We won't worry about this all too much, and when necessary we will point out that broadcasting is happening. However, it is an important piece of NumPy's behaviour and allows us to do computations much more efficiently (we can often save on memory by not storing the 'fully expanded' versions of arrays and relying on broadcasting to do element-wise operations when we need to).
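As a hedged sketch of this stretching (the arrays `square` and `column` are hypothetical), `np.broadcast_to` can be used to display the expanded version that NumPy conceptually works with:

```python
import numpy as np

square = np.array([[1, 2],
                   [3, 4]])        # shape (2, 2)
column = np.array([[10],
                   [100]])         # shape (2, 1)

# The size-1 dimension of `column` is stretched to match `square`
result = square * column

# The "fully expanded" view of `column`, shown here only for illustration
stretched = np.broadcast_to(column, (2, 2))

print(result)
print(stretched)
```

Multiplying `square` by `stretched` gives the same answer as multiplying by `column` directly.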

## How Fast is Vectorization?

Let's add two large 1D arrays 500 times, just to see how much faster vectorization is compared to an explicit Python for-loop. For this, we will use the `default_timer` function from the `timeit` library.

In [44]:
from timeit import default_timer as timer

arr1 = np.linspace(0,1,10000) # create a size 10,000 array of evenly spaced numbers between 0 and 1

arr2 = np.arange(10000) # create a size 10,000 array of integers between 0 and 9,999

def slow(a, b):
    # Given 1D NumPy arrays a and b of the same size, add them element by element with a for-loop and return the sum
    arr_out = np.zeros(a.shape)
    for idx in range(a.size):
        arr_out[idx] = a[idx] + b[idx]
    return arr_out

def fast(a,b):
    # Given numpy arrays a and b, add them using a vectorized NumPy operation, and return the sum
    return np.add(a,b)


In [None]:
start = timer() # This line starts the timer for the slow loop

# Add the two arrays 500 times, the slow way
for i in range(500):
    slow(arr1, arr2)

time_slow = timer() - start # This line records the time for the slow loop
print("For-loop time:", time_slow, "seconds")


start = timer() # This line starts the timer for the fast loop

# Add the two arrays 500 times, using vectorized addition
for i in range(500):
    fast(arr1, arr2)

time_fast = timer() - start # This line records the time for the fast loop


print("NumPy Vectorized time:", time_fast, "seconds")
print("The vectorized addition is " + str(time_slow/time_fast) + " times faster!")

-------

# What is NumPy Good For?

So far, we've done quite a bit looking at how NumPy can be used to work with numbers. But we really want to do computer vision and NLP, which involve images and words, right? It isn't necessarily clear how these tasks can be translated into numbers, so how is NumPy at all useful for machine learning work in these areas?

The key idea is that **observed data are just numbers**. Consider the image of a cat below.

<img src="cat_pixels.png">

This picture is stored on the computer as a rectangular array (with shape 594 x 580) of numbers. Each number tells the computer how bright it should make the corresponding pixel on the screen in order to render the image accurately; the larger the number, the brighter the pixel. In the case of a colored image, each pixel consists of three numbers instead of one, and they tell each pixel how much red, green, and blue color should be present in the pixel, respectively.
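To make this concrete, here is a tiny sketch with made-up pixel values. The names `gray` and `color` are hypothetical, and the 0-to-1 brightness scale is an assumption, since the actual range depends on the file format and on how the image is loaded:

```python
import numpy as np

# A hypothetical 2 x 3 grayscale image: 0.0 is black, 1.0 is white
gray = np.array([[0.0, 0.5, 1.0],
                 [1.0, 0.5, 0.0]])

# A color image of the same size: one (red, green, blue) triple per pixel
color = np.zeros((2, 3, 3))
color[0, 0] = [1.0, 0.0, 0.0]   # make the top-left pixel pure red

print(gray.shape)    # (rows, columns)
print(color.shape)   # (rows, columns, color channels)
```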

Thus doing mathematical analysis and manipulations on an image might be simpler than we would have assumed, were we to think of the picture as some impenetrable “image-format file” on our computer. When handed a png, we can easily load that image as a NumPy array and proceed from there with our analysis. While we might not yet know precisely what our mathematical approach will be to gleaning useful information from the image, we certainly are no longer in unfamiliar territory – we can definitely do “mathy” stuff to an array of numbers.

Working with text doesn’t give us quite as clean of a story as does imagery — there is no text equivalent to a pixel. Rather, a major challenge in the field of natural language processing is establishing a cogent numerical representation for text. We will discuss this matter in some considerable depth in subsequent weeks, but rest assured, we will devise ways of representing text as numbers, with which we can do "mathy" things to model our data.

The takeaway here is: no matter how exotic a machine learning problem may appear to be, we will always find a way to map our observations to numbers that we can do math on. This is true for jpegs, videos, pdfs, “tweets”, audio recordings, etc.

----

# Playing With NumPy

In the spirit of the previous discussion, let's play with a few actual representations of data. Work through the following exercises using the NumPy techniques we discussed above, and try to have some fun messing around with the examples on your own!

Don't worry too much about the extra code that is here that may be unfamiliar to you! Just focus on manipulating the data being stored in the NumPy arrays.

In [21]:
# Run the following cell to load in the trumpet audio data (stored as text in trumpet.txt) and play it!
# You may need to update `trumpet_path` to point to the correct file on your computer
trumpet_path = r"./trumpet.txt"

from IPython.display import Audio

with open(trumpet_path, 'r') as R:
    trumpet = np.asarray([int(i) for i in R])

# USE THE FOLLOWING LINE OF CODE TO PLAY THE AUDIO ENCODED IN THE `TRUMPET` ARRAY
Audio(trumpet, rate=44100)

# Notice how the audio data is being stored simply as a NumPy array,
# so we can use all the numerical/slicing/etc. operations we looked at earlier!


In [None]:
# Now, try the following exercises using the `trumpet` array:
# (1) Reverse the audio clip and play the result

reversed_trumpet = # reverse the trumpet data
Audio(reversed_trumpet, rate=44100)

In [None]:
# (2) Sample every fourth element in the array and play the result

every_fourth_trumpet = # select every fourth element to play in the signal
Audio(every_fourth_trumpet, rate=44100)

In [None]:
# (3) Square the values in the array and play the result

squared_trumpet = # square each element in the `trumpet` array
Audio(squared_trumpet, rate=44100)

In [None]:
# Run the following cell to load in a picture of your choice!
# You will need to set the variable `img_path` to a path to a png or jpg file on your computer:
# (if you don't have an image, you can use "./cat_pixels.png")
img_path = r"./cat_pixels.png"

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

%matplotlib notebook

img = mpimg.imread(img_path)

fig, ax = plt.subplots()
ax.imshow(img);

In [None]:
# Print the shape of img and try to interpret what each of the dimensions represents
# (i.e. which dimension corresponds to the height of the image, width of the image, 
# number of color channels in the image (RGB vs greyscale))

# STUDENT CODE HERE

In [None]:
# Using slicing, reverse the color channels of your image. What does the result look like?

color_flipped_img = # reverse the order of the color channels in your image (likely the size 3 dimension!)

fig, ax = plt.subplots()
ax.imshow(color_flipped_img);

In [None]:
# Try to crop your image to only include part of the top left corner. Try cropping to other regions too!

cropped_img = # crop your image to only include the top left corner 

fig, ax = plt.subplots()
ax.imshow(cropped_img);