# Important note!

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your GT login and the GT logins of any of your collaborators below. (The GT logins are worth 1 point per notebook, so don't miss the opportunity to get a free point!)

In [None]:
YOUR_ID = "" # Please enter your GT login, e.g., "rvuduc3" or "gtg911x"
COLLABORATORS = [] # list of strings of your collaborators' IDs

In [None]:
import re

RE_CHECK_ID = re.compile (r'''[a-zA-Z]+\d+|[gG][tT][gG]\d+[a-zA-Z]''')
assert RE_CHECK_ID.match (YOUR_ID) is not None

collab_check = [RE_CHECK_ID.match (i) is not None for i in COLLABORATORS]
assert all (collab_check)

del collab_check
del RE_CHECK_ID
del re

**Jupyter / IPython version check.** The following code cell verifies that you are using the correct version of Jupyter/IPython.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

# Intro to NumPy/SciPy [12 points]

There is a nice package, called [SciPy](http://scipy.github.io), to support numerical computations in Python, including matrix computations. Partly for historical reasons, it builds on a base package called [NumPy](http://www.numpy.org), which provides basic support for multidimensional arrays that we can use to store and operate on vectors and matrices. Matrix computations using these packages will usually be much faster than the equivalent computations written with Python's native list, array, and dictionary types.

Some of the material from this lesson is lifted from the following more comprehensive tutorial: [link](http://www.scipy-lectures.org/intro/numpy/index.html)

**Quick demo.** The recommended importing idiom is:

In [None]:
import numpy as np

NumPy provides some natural types and operations on arrays. For instance:

In [None]:
a_1d = np.array ([0, 1, 2, 3]) # a vector
print (a_1d)

b_1d = np.array ([4, 5, 6, 7]) # another vector
print (b_1d)

print (a_1d + b_1d)

In [None]:
print (5*a_1d)

In [None]:
print (a_1d**2)

**Getting help.** By the way, if you need help getting documentation from within this notebook, here are some handy shortcuts.

In [None]:
# Append '?' to get help on a specific routine
np.array?

In [None]:
# Wildcard search
np.con*?

In [None]:
# Search for key text (note: `np.lookfor` was removed in NumPy 2.0, so this cell needs an older NumPy)
np.lookfor ("creating array")

## Why bother with NumPy? A motivating example

We already have lists and dictionary types, which are pretty easy to use and very flexible. So why bother with this special type?

One reason to consider NumPy is that it "can be much faster," as noted above. But how much faster is that?

In [None]:
n = 1000000

In [None]:
L = range (n)
%timeit [i**2 for i in L]

In [None]:
np.arange (10) # Moral equivalent to `range`

In [None]:
A = np.arange (n)
%timeit A**2
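
The `%timeit` magic only works inside IPython/Jupyter. As a rough sketch of the same comparison in plain Python, the standard `timeit` module can be used instead. The smaller problem size and the repetition count below are arbitrary choices for illustration, not values from this notebook, and the timings will vary by machine.

```python
import timeit
import numpy as np

n_small = 10_000          # illustrative size, smaller than the notebook's n
L_small = range(n_small)
A_small = np.arange(n_small)

# Time 100 repetitions of each approach
t_list = timeit.timeit(lambda: [i**2 for i in L_small], number=100)
t_numpy = timeit.timeit(lambda: A_small**2, number=100)

print("list comprehension: %.4f s" % t_list)
print("NumPy array:        %.4f s" % t_numpy)

# Both approaches compute the same values
same_values = bool(np.array_equal(np.array([i**2 for i in L_small]), A_small**2))
print("same values:", same_values)
```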

**Exercise 1** (1 point). Recall the definition of the _2-norm_ of an $n$-vector (or _Euclidean length_ of a vector with $n$ components):

$$
  \|x\|_2 = \left( \sum_{i=0}^{n-1} |x_i|^2 \right)^{\frac{1}{2}}
$$
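
As a quick sanity check on the formula, here is a worked instance for an illustrative vector $x = (3, -4)$ (chosen for convenience, not taken from the exercise), so that $n = 2$:

$$
  \|x\|_2 = \left( |3|^2 + |-4|^2 \right)^{\frac{1}{2}} = \left( 9 + 16 \right)^{\frac{1}{2}} = 5
$$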

> Background reading: For a quick review of important concepts in linear algebra, see [these notes](https://t-square.gatech.edu/access/content/group/gtc-239f-fc11-5690-9dae-2dc96b59f372/Kuang-2014-linalg-notes-2.pdf) by former Georgia Tech PhD student, [Da Kuang](http://math.ucla.edu/~dakuang/).

Look for a NumPy routine to compute the 2-norm. Compare its speed on a NumPy array against the native Python list computation below.

In [None]:
from random import gauss # Generates random numbers from a Gaussian
from math import sqrt # Computes the square root of a number

n = 1000000
X_py = [gauss (0, 1) for i in range (n)]
X_np = np.array (X_py)

print ("==> Native Python lists:")
%timeit sqrt (sum ([x**2 for x in X_py]))

print ("\n==> Numpy:")
# YOUR CODE HERE
raise NotImplementedError()

## Creating multidimensional arrays

Beyond one-dimensional arrays, NumPy supports multidimensional arrays. To create an array with more than one dimension, call `numpy.array()` with nested lists, using one level of nesting per dimension.

Huh? Let's look at some examples.

In [None]:
# Create a two-dimensional array of size 3 rows x 4 columns:
B = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

print (B)
print (B.ndim) # What does this do?
print (B.shape) # What does this do?
print (len (B)) # What does this do?

Here is a 3-D array example.

In [None]:
C1 = [[0, 1, 2, 3],
      [4, 5, 6, 7],
      [8, 9, 10, 11]]

C2 = [[12, 13, 14, 15],
      [16, 17, 18, 19],
      [20, 21, 22, 23]]

C = np.array ([C1, C2])

print (C)
print (C.ndim)
print (C.shape)
print (len (C))

Besides `arange()`, you can also define an interval and a number of points using `linspace()`. What does the following code do?

In [None]:
print (np.linspace (0, 1, 10))

In [None]:
print (np.linspace (0, 1, 10, endpoint=False))
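
To see how the spacing depends on `endpoint`, the sketch below asks `linspace()` to also return the step it used, via its `retstep=True` option. The choice of 5 points is only for illustration. With both endpoints included, the step over $[a, b]$ with $n$ points is $(b-a)/(n-1)$; with the right endpoint excluded it is $(b-a)/n$.

```python
import numpy as np

# Five points from 0 to 1, including both endpoints
pts_closed, step_closed = np.linspace(0, 1, 5, retstep=True)
print(pts_closed, step_closed)

# Five points starting at 0, excluding the right endpoint
pts_open, step_open = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(pts_open, step_open)
```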

There are routines for creating various kinds of structured matrices as well, which are similar to those found in [MATLAB](http://www.mathworks.com/products/matlab/) and [Octave](https://www.gnu.org/software/octave/).

In [None]:
print (np.ones ((3, 4)))

In [None]:
print (np.zeros ((3, 4)))

In [None]:
print (np.eye (3))

In [None]:
print (np.diag ([1, 2, 3]))

**Exercise 2** (1 point). The following code creates an identity matrix in two different ways, which are found to be equal according to the assertion. But in fact there is a subtle difference between the `I` and `I_u` matrices created below; can you spot it? Explain that difference in the `YOUR ANSWER HERE` (Markdown) cell that follows the code.

In [None]:
n = 3
I = np.eye (n)

print ("==> I = eye(n):")
print (I)

u = [1] * n
I_u = np.diag (u)

print ("\n==> u:\n", u)
print ("==> I_u = diag (u):\n", I_u)

assert np.all (I_u == I)

YOUR ANSWER HERE

You can also create empty (uninitialized) arrays. What does the following produce?

In [None]:
A = np.empty ((3, 4)) # An "empty" 3 x 4 matrix
print (A)

## Indexing and slicing

The usual 0-based slicing and indexing notation you know and love from lists is also supported for NumPy arrays, including its natural multidimensional analogues, in which the index ranges for each dimension are separated by commas.

In [None]:
# Recall: C
print (C)

What part of C will the following slice extract? Run the code to find out.

In [None]:
print (C[0, 2, :])

What will the following slice return? Run the code to find out.

In [None]:
print (C[1, 0, ::-1])
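
The same comma-separated notation works in two dimensions. Here is a small sketch on a 3 x 4 matrix; the matrix `M` and the variable names are made up for this illustration.

```python
import numpy as np

# A small 3 x 4 matrix, used only for illustration
M = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

row_1 = M[1, :]          # the entire second row
col_2 = M[:, 2]          # the entire third column
block = M[0:2, 1:3]      # rows 0-1 and columns 1-2
strided = M[::2, ::-1]   # every other row, with the columns reversed

print(row_1)
print(col_2)
print(block)
print(strided)
```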

**Exercise 3** (5 points). Consider the following $6 \times 6$ matrix, which has 4 different subsets highlighted.

<img src="slicing-exercise.png" alt="Exercise: Extract these slices" width="240">

Write some code to generate this matrix, named `Z`. Then, for each subset illustrated above, write an indexing or slicing expression that extracts the subset. Store the result of each slice into `Z_green`, `Z_red`, `Z_orange`, and `Z_cyan`.

In [None]:
# Hint: What do the following do?
#   np.arange (0, 51, 10)
#   np.arange (0, 51, 10)[:, np.newaxis]

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
print ("==> Z:\n", Z)
assert (Z == np.array ([np.arange (0, 6),
                        np.arange (10, 16),
                        np.arange (20, 26),
                        np.arange (30, 36),
                        np.arange (40, 46),
                        np.arange (50, 56)])).all ()

print ("\n==> Orange slice:\n", Z_orange)
assert (Z_orange == np.array ([3, 4])).all ()

print ("\n==> Red slice:\n", Z_red)
assert (Z_red == np.array ([2, 12, 22, 32, 42, 52])).all ()

print ("\n==> Cyan slice:\n", Z_cyan)
assert (Z_cyan == np.array ([[44, 45], [54, 55]])).all ()

print ("\n==> Green slice:\n", Z_green)
assert (Z_green == np.array ([[20, 22, 24], [40, 42, 44]])).all ()

## Slices are views

To help save memory, when you slice a NumPy array, you are actually creating a _view_ into that array. That means modifications through the view will modify the original array.

In [None]:
print ("==> Recall C: %s" % str (C.shape))
print (C)

In [None]:
C_view = C[1, 0::2, 1::2] # Question: What does this produce?
print ("==> C_view: %s" % str (C_view.shape))
print (C_view)

In [None]:
C_view[:, :] = -C_view[::-1, ::-1] # Question: What does this do?
print (C_view)

In [None]:
print (C)

You can force a copy using the `.copy()` method:

In [None]:
C_copy = C[1, 0::2, 1::2].copy ()
C_copy[:, :] = -C_copy[::-1, ::-1]

print ("==> C_view:")
print (C_view)

print ("\n==> C_copy:")
print (C_copy)

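And to check whether two NumPy arrays might refer to the same underlying memory, you can use the `numpy.may_share_memory()` function. As its name suggests, the check is conservative: it can report `True` for arrays whose memory ranges overlap even when they have no elements in common.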

In [None]:
print ("C and C_view share memory: %s" % np.may_share_memory (C, C_view))
print ("C and C_copy share memory: %s" % np.may_share_memory (C, C_copy))
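
Two related tools can complement this check. The sketch below, on a made-up matrix `M`, uses the `.base` attribute to see whether an array borrows its data from another array, and `numpy.shares_memory()`, which performs an exact (and potentially slower) overlap test. The two columns at the end illustrate a case where the bounds-based `may_share_memory()` and the exact test disagree.

```python
import numpy as np

# Illustrative matrix that owns its own data
M = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

M_view = M[::2, 1:]           # basic slicing produces a view
M_copy = M[::2, 1:].copy()    # an explicit copy owns separate data

print(M_view.base is M)       # does the view borrow M's data?
print(M_copy.base is None)    # does the copy own its data?

# Two different columns interleave in memory but have no common elements
col_0, col_1 = M[:, 0], M[:, 1]
print(np.may_share_memory(col_0, col_1))  # bounds-based, conservative check
print(np.shares_memory(col_0, col_1))     # exact check
```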

**Exercise 4** (3 points). Complete the prime number sieve algorithm, which is illustrated below.

<img src="prime-sieve.png" alt="Illustration of the prime number sieve" width="480">

That is, given a positive integer $n$, the algorithm iterates over $i \in \{2, 3, 4, \ldots, \left\lfloor\sqrt{n}\right\rfloor\}$, repeatedly "crossing out" values that are strict multiples of $i$ (that is, $2i, 3i, \ldots$, but not $i$ itself). "Crossing out" means maintaining an array of, say, booleans, and setting the entries whose indices are strict multiples of $i$ to `False`.

In [None]:
from math import sqrt

def sieve (n):
    """
    Returns the prime number 'sieve' shown above.
    
    That is, this function returns an array `X[0:n+1]`
    such that `X[i]` is true if and only if `i` is prime.
    """
    is_prime = np.empty (n+1, dtype=bool) # the "sieve"

    # Initial values
    is_prime[0:2] = False # {0, 1} are _not_ considered prime
    is_prime[2:] = True # All other values might be prime

    # Implement the sieving loop
    # YOUR CODE HERE
    raise NotImplementedError()
    
    return is_prime

# Prints your primes
print ("==> Primes through 20:\n", np.nonzero (sieve (20))[0])

In [None]:
is_prime = sieve (20)
assert len (is_prime) == 21
assert (is_prime == np.array ([False, False, True, True, False, True, False, True, False, False, False, True, False, True, False, False, False, True, False, True, False])).all ()

## Indirect addressing

Two other common ways to index a NumPy array are to use a boolean mask or to use a set of integer indices.

In [None]:
np.random.seed(3)
x = np.random.randint(0, 20, 15) # 15 random ints in [0, 20)
print (x)

Before looking at how to use a boolean mask for indexing, let's create one.

**Exercise 5** (1 point). Given the input array, `x[:]`, above, create an array, `mask_mult_3[:]`, such that `mask_mult_3[i]` is true if and only if `x[i]` is a positive multiple of 3.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
print ("x:", x)
print ("mask_mult_3:", mask_mult_3)
print ("==> x[mask_mult_3]:", x[mask_mult_3])

inv_mask_mult_3 = np.invert (mask_mult_3)
assert ((x[mask_mult_3] % 3) == np.zeros (sum (mask_mult_3))).all ()
assert (((x[inv_mask_mult_3] % 3) != np.zeros (sum (inv_mask_mult_3))) | (x[inv_mask_mult_3] == 0)).all ()

In [None]:
# Pull out an arbitrary subset of elements
inds = np.array ([3, 7, 8, 12])
print (x[inds])
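
Unlike basic slices, indexing with a boolean mask or an integer index array returns a copy rather than a view, although assigning through such an index still writes into the original array. The sketch below illustrates both behaviors on a made-up array `y` with an arbitrary threshold condition.

```python
import numpy as np

y = np.array([5, 12, 7, 20, 3, 18])   # illustrative data

mask_big = y > 10        # boolean mask built from a comparison
picked = y[mask_big]     # mask indexing returns a copy
picked[0] = -1           # changes the copy only
y_after_copy_edit = y.copy()

print(picked)
print(y)                 # unchanged by the edit to `picked`

idx = np.array([0, 2, 4])
y[idx] = 0               # assignment through integer indices writes into y
y[mask_big] = 99         # assignment through a mask writes into y as well
print(y)
```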