# Linear Algebra in NumPy

In class, you have been learning about vectors and matrices. Last lab, we referred to 1-D and 2-D *arrays*. When we view these as linear algebra objects, rather than just lists or boxes of numbers, we will refer to them as *vectors* and *matrices*, respectively. Specifically, you have seen the following in class:

* The *dot product* (or *inner product*) between two vectors;
* Matrix-vector multiplication;
* Matrix-matrix multiplication.

As a particular instance of matrix-matrix multiplication, you may have defined the *outer product* of two vectors.

For many applications, the size of the matrices and vectors required makes hand calculations impractical. In the first part of this lab, we will see how to use NumPy to carry out these basic linear algebra operations.

> ## Make a copy of this notebook (File menu -> Make a Copy...)

## 1-D Arrays: Vectors, 2-D Arrays: Matrices!

Enter the same 1-D and 2-D arrays as we started lab with last time:
$$v=\begin{bmatrix}5\\ 3 \\ -2\end{bmatrix}
\mbox{, }
w=\begin{bmatrix}1\\ 5 \\ -1\end{bmatrix}
\mbox{, }
A=\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}
\mbox{, and }
 B=\begin{bmatrix}
3 & 1 & 1 \\
2 & 2 & 4 \\
5 & 7 & 1
\end{bmatrix}$$

From here on, we will refer to 1-D arrays as *vectors* and to 2-D arrays as *matrices*.

**Recall:** NumPy indexing starts from 0, not from 1. Therefore, the first entry in a matrix $A$ is `A[0,0]`. The second row of $A$ is `A[1]`. The fourth column of $A$ is `A[:,3]`.

In class, it is important to understand vectors as *column vectors*. NumPy doesn't distinguish between column and row vectors. It is possible to force it to, but we will rarely need to do so. This approach has some disadvantages, but it makes our lives easier in many ways.
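
As a quick illustration of that point, the sketch below (the names `u`, `u_col`, and `u_row` are ours, not part of the lab) shows that a 1-D array has a single axis, so transposing it changes nothing, and how `reshape` can force an explicit column or row when one is really needed.

```python
import numpy as np

u = np.array([5, 3, -2])       # a 1-D array: NumPy's idea of a "vector"
print(u.shape)                 # (3,)  one axis only, neither row nor column
print(np.array_equal(u, u.T))  # True: transposing a 1-D array does nothing

# Forcing an orientation produces a 2-D array (a matrix with one column or one row).
u_col = u.reshape(3, 1)        # 3x1 column
u_row = u.reshape(1, 3)        # 1x3 row
print(u_col.shape, u_row.shape)
```

Once a vector has been reshaped this way, NumPy treats it as a matrix, so the usual shape rules for matrix products apply to it.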

In [2]:
import numpy as np
v = np.array([[5],[3],[-2]])
w = np.array([[1],[5],[-1]])
A = np.arange(1,10).reshape((3,3))
B = np.array([[3,1,1],[2,2,4],[5,7,1]])                          
print(v)
print(w)
print(A)
print(B)

[[ 5]
 [ 3]
 [-2]]
[[ 1]
 [ 5]
 [-1]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[3 1 1]
 [2 2 4]
 [5 7 1]]


## Basic Vector and Matrix Operations and Properties

### Vectors

**Question 1** Carry out the following commands and describe in words what each of them does:
```python
print(v+w)
print(w-v)
print(v*w)
print(v@w)
```

In [3]:
print(v)
print(w)
print()
print(v+w) #this adds values of the vectors pointwise
print(w-v) #this subtracts values of the vectors pointwise
print(v*w) #multiplies values of the vectors pointwise
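print(v@w) # intended as a dot product; raises the ValueError shown below because v and w were entered as 3x1 arrays rather than 1-D arrays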

[[ 5]
 [ 3]
 [-2]]
[[ 1]
 [ 5]
 [-1]]

[[ 6]
 [ 8]
 [-3]]
[[-4]
 [ 2]
 [ 1]]
[[ 5]
 [15]
 [ 2]]


ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 1)

### Important Note! 

>When *v* and *w* are vectors, the command `v@w` carries out a dot product. The operation `v*w` is not a linear algebra operation! There is no notion of pointwise vector-vector multiplication in linear algebra! 
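
A minimal sketch of the difference, assuming the vectors are entered as 1-D arrays (named `v1` and `w1` here so that they do not overwrite the lab's `v` and `w`). If they are entered as $3\times 1$ arrays instead, `@` treats them as matrices and the inner dimensions no longer match.

```python
import numpy as np

v1 = np.array([5, 3, -2])
w1 = np.array([1, 5, -1])

dot = v1 @ w1          # 5*1 + 3*5 + (-2)*(-1) = 22, a single number
pointwise = v1 * w1    # entry-by-entry products, still a length-3 array

print(dot)                     # 22
print(pointwise)               # [ 5 15  2]
print(dot == pointwise.sum())  # True: the dot product sums the pointwise products
```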

**Question 2** Try out the following commands with the matrices and vectors you entered above, then answer the questions below:

```python
print(A@v)
print(5*A[:,0]+3*A[:,1]-2*A[:,2])
print(np.array([A[0]@v,A[1]@v,A[2]@v]))
```

In [4]:
print(A)
print(v)

print(A@v) #A@v is the matrix-vector product: each entry is the dot product of a row of A with v (the dimensions must line up)
print(5*A[:,0]+3*A[:,1]-2*A[:,2])
print(np.array([A[0]@v,A[1]@v,A[2]@v]))

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[ 5]
 [ 3]
 [-2]]
[[ 5]
 [23]
 [41]]
[ 5 23 41]
[[ 5]
 [23]
 [41]]


1. Describe what each of the commands does. Specifically, what does `A@v` do when *A* is a matrix and *v* is a vector? This is an example of *overloading*: the same operator (`@`) carries out a different operation depending on its inputs.<br><br>

1.  1. Explain why the first two commands give the same result. This is important! The *column picture* of matrix-vector multiplication is critical for the class and lab!<br><br>
  1. Explain why the third command also gives the same result. (Hint: What kind of object is, say ``A[0]``?)

### Matrices
**Question 3** Try out the following commands with the matrices you entered above, then answer the questions below. Be sure to examine the commands carefully rather than just copying and pasting.

```python
print(A@B)

print(np.array([A@B[:,0], A@B[:,1], A@B[:,2]]).T)

c1 = B[0,0]*A[:,0] + B[1,0]*A[:,1] + B[2,0]*A[:,2]
c2 = B[0,1]*A[:,0] + B[1,1]*A[:,1] + B[2,1]*A[:,2]
c3 = B[0,2]*A[:,0] + B[1,2]*A[:,1] + B[2,2]*A[:,2]
print(np.array([ c1,c2,c3 ]).T)
```

In [5]:
print(A)
print(B)
print()
print(A@B)

print(np.array([A@B[:,0], A@B[:,1], A@B[:,2]]).T)

c1 = B[0,0]*A[:,0] + B[1,0]*A[:,1] + B[2,0]*A[:,2]
c2 = B[0,1]*A[:,0] + B[1,1]*A[:,1] + B[2,1]*A[:,2]
c3 = B[0,2]*A[:,0] + B[1,2]*A[:,1] + B[2,2]*A[:,2]
print(np.array([ c1,c2,c3 ]).T)#because numpy thinks these are all row vectors

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[3 1 1]
 [2 2 4]
 [5 7 1]]

[[22 26 12]
 [52 56 30]
 [82 86 48]]
[[22 26 12]
 [52 56 30]
 [82 86 48]]
[[22 26 12]
 [52 56 30]
 [82 86 48]]


1. Compute $AB$ by hand and check that you get the same result as above.<br><br>
1. When *A* and *B* are matrices, what does the command `A@B` do? This is yet more overloading of the `@` operator!<br><br>
1. 1. Fill in the blanks: One way to do matrix multiplication is to multiply $\underline{\hspace{0.5in}}$ by each column of $\underline{\hspace{0.5in}}$ individually.<br><br>
   1. Explain why the second command gives the same results as the first. Why do we need to transpose at the end? (That's a NumPy question, not a linear algebra one!)<br><br>
1.  1. Fill in the blanks: each column of $AB$ is a linear combination of columns of $\underline{\hspace{0.5in}}$. The coefficients of the columns from $\underline{\hspace{0.5in}}$ that form the first column of $AB$ are given by the first column of $\underline{\hspace{0.5in}}$.<br><br>
  1. Explain why the third set of commands above gives the same result as the first. Again, why the transpose?<br><br>
  
**Question 4** 
1. Verify using NumPy that $AB\neq BA$. What is this property of matrices called?<br><br>
1. What is another way to write $(AB)^T$? Verify your answer using NumPy.<br><br>
1. Carry out the commands `A@v` and `v@A`. You should get two different results. Explain what's going on here carefully.<br><br>
1. Enter another $3\times 3$ matrix $C$ and verify that $(AB)C=A(BC)$. What is this property called?<br><br>

In [30]:
print(A@B)
print(B@A)
print((A@B).T)
print((B.T)@A.T)
print(A@v)
#print(v@A)
C = np.arange(3,12).reshape((3,3))
print(C)
print((A@B)@C)
print(A@(B@C))

[[22 26 12]
 [52 56 30]
 [82 86 48]]
[[14 19 24]
 [38 46 54]
 [40 53 66]]
[[22 52 82]
 [26 56 86]
 [12 30 48]]
[[22 52 82]
 [26 56 86]
 [12 30 48]]
[[ 5]
 [23]
 [41]]
[[ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[ 330  390  450]
 [ 762  900 1038]
 [1194 1410 1626]]
[[ 330  390  450]
 [ 762  900 1038]
 [1194 1410 1626]]


### Inner and Outer Products

Since NumPy does not distinguish between row and column vectors, the matrix-multiplication view of dot products (or *inner products*): 

$$v\cdot w = v^Tw$$

isn't really applicable in NumPy. When we take the transpose of a vector, `v.T`, we just get the same vector back. When we want a dot product, we use `v@w`. Likewise, the *outer product* of two vectors, $vw^T$, cannot be computed using transposes in NumPy. Instead, we use `np.outer(v,w)` if we need an outer product.
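
The following sketch illustrates both products on 1-D arrays (the names `a` and `b` and their values are made up for illustration). It also shows one way to recover the textbook formula $vw^T$, by explicitly reshaping into a column and a row first; this is only meant to show why `np.outer` is the more convenient route.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

inner = a @ b             # 1*4 + 2*5 + 3*6 = 32
outer = np.outer(a, b)    # 3x3 matrix with entries a[i]*b[j]

# The textbook v w^T needs explicit 2-D shapes: (3x1) @ (1x3) -> 3x3.
outer_by_hand = a.reshape(3, 1) @ b.reshape(1, 3)

print(inner)                                 # 32
print(outer)
print(np.array_equal(outer, outer_by_hand))  # True
```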

**Question 5** Using the vectors *v* and *w* from previous questions, compute their inner and outer products, both by hand and using NumPy.

In [31]:
print(np.outer(v,w))

[[  5  25  -5]
 [  3  15  -3]
 [ -2 -10   2]]


### Reminder!
> **Most of the time, we will want floating point matrices. We can either enter them using at least one decimal point, or use the command *A = A.astype(float)*. The latter will convert a matrix into floating point. If your results at any point in any lab aren't what you expect, check that your matrices are floats!**
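
One way this can go wrong, as a small sketch (the arrays `M_int` and `M_float` are made up for illustration): assigning a non-integer value into an integer array silently truncates it, whereas the same assignment into a float array keeps the decimal part.

```python
import numpy as np

M_int = np.array([[1, 2], [3, 4]])   # no decimal points, so NumPy picks an integer dtype
M_float = M_int.astype(float)        # a floating point copy of the same matrix

M_int[0, 0] = 2.7                    # silently truncated to fit the integer dtype
M_float[0, 0] = 2.7                  # stored as given

print(M_int[0, 0])                   # 2
print(M_float[0, 0])                 # 2.7
```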

### Avoiding Expensive Calculations
Sometimes, we don't need an entire matrix resulting from some operation (like a product of two matrices), but rather, we only want one or two entries in that matrix. Suppose we have two matrices $A$ and $B$, and we only want the entry in the fourth column, second row. We could multiply the two matrices, then look for that entry, but this is very wasteful: we spend a lot of time computing a whole product, when all we want is one number.

**Question 6** 
1. Write down Python commands that take two matrices $A$ and $B$, multiply them to get a new matrix $C$, then print out the entry in the third column, second row. (Note: remember that NumPy indexes matrices from zero, not one!)<br><br>
1. Using your knowledge of how matrix multiplication works, write down a Python command that only computes the entry above, without computing the whole matrix product (Hint: think of dot products).<br><br>
3. Test out your commands below on your matrices $A$ and $B$ from the beginning of the lab.


In [14]:
print(A@B)
print((A[1]@B[:,2]))

[[22 26 12]
 [52 56 30]
 [82 86 48]]
30
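
To see why the single-entry shortcut is worth it, here is a hedged sketch on larger matrices (the sizes, the random seed, and the names `P`, `Q`, `i`, `j` are all illustrative choices). Entry $(i,j)$ of a product is the dot product of row $i$ of the first matrix with column $j$ of the second, so it needs only one row and one column rather than the whole product.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is repeatable
P = rng.random((200, 300))
Q = rng.random((300, 150))

i, j = 1, 3                      # second row, fourth column (zero-based indices)

full = (P @ Q)[i, j]             # computes all 200*150 entries, keeps one
single = P[i] @ Q[:, j]          # one dot product of length 300

print(np.isclose(full, single))  # True (equal up to floating point rounding)
```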


# Functions and Loops in Python

### Functions

We will most often want to write reusable routines that we will be able to call with general inputs. These are called *functions* in Python. Many of you will have seen this in prior classes, but since 218L does not assume any previous coding experience, this section will introduce these notions.

The basic syntax for Python functions can be seen from the following example:
```python
def myadd(a,b):
    result = a+b
    return result
```

This function takes two inputs, *a* and *b*, and returns their sum. As long as the `+` operator is defined, the function doesn't care whether its inputs are scalars, matrices, etc. For example:

* If `a=5` and `b=8`, the code `myadd(a,b)` returns 13.
* If *a* is the $3\times 3$ identity matrix, and *b* is 8, `myadd(a,b)` returns a $3\times 3$ matrix with 9's along the diagonal, and 8's everywhere else.
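
These two cases can be checked directly. The second one relies on NumPy adding a scalar to every entry of an array; the names `I3` and `S` are ours.

```python
import numpy as np

def myadd(a, b):
    result = a + b
    return result

print(myadd(5, 8))   # 13

I3 = np.eye(3)       # 3x3 identity matrix (floating point)
S = myadd(I3, 8)     # the scalar 8 is added to every entry
print(S)             # 9's on the diagonal, 8's everywhere else
```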

**Question 7** If *A* is a $4\times 3$ matrix, which of the following types of object *b* will make `myadd(A,b)` execute correctly? Show examples for each type in the code box below, including an example of *b* that gives an error.

* *b* is a $4\times 3$ matrix;
* *b* is a vector of length 4;
* *b* is a vector of length 3;
* *b* is a scalar (a number).

In [32]:
def myadd(a,b):
    result = a+b
    return result



### Loops

Loops are often best avoided in NumPy. Array operations like the ones we saw in Lab 1 are generally cleaner and faster. Nonetheless, there are quite a few places where they cannot be avoided, especially when implementing algorithms like we'll be doing starting in the next lab.

We will often find it useful to loop over an array. Double *for* loops are ideal for this:

```python
def loopoverarray(A):
    rows,cols = A.shape   # Get the number of rows and columns in A
    
    for row in range(rows):
        for col in range(cols):
            A[row,col] += row * col # Do something to the current array entry
            
    return A
```
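
For comparison, this particular update can also be written without loops. The sketch below is one possible loop-free version (the function name `noloop` and the test matrix `X` are hypothetical): `np.outer` applied to the row indices and column indices builds the matrix whose entry at position (row, col) is `row * col`, which is then added in one step.

```python
import numpy as np

def loopoverarray(A):
    rows, cols = A.shape
    for row in range(rows):
        for col in range(cols):
            A[row, col] += row * col
    return A

def noloop(A):
    rows, cols = A.shape
    # Entry (row, col) of this outer product is row * col.
    return A + np.outer(np.arange(rows), np.arange(cols))

X = np.ones((3, 4), dtype=int)
looped = loopoverarray(X.copy())   # copy, because the loop version modifies its input
vectorized = noloop(X)             # returns a new array and leaves X unchanged
print(np.array_equal(looped, vectorized))  # True
```

Note one design difference: `loopoverarray` changes its argument in place, while this `noloop` sketch returns a new array.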

**Question 8** Explain why the following code is not good NumPy. What is the correct way to do this?
```python
rows,cols = A.shape
for row in range(rows):
    for col in range(cols):
        A[row,col] += 3
```

**Question 9** Explain what the following code does. Be careful! You will need the notion of negative indices from the first lab! Make up a matrix to test your hypothesis out by hand and check it by running the code on it in the code box below.
```python
rows,cols = A.shape
for row in range(rows):
    A[row-1] += A[row]
```

# Membership Matrices

Suppose we have a list of people and a list of organizations. We create a *membership* matrix: its rows are people, and its columns organizations. We put a $1$ at position $(i,j)$ if person $i$ belongs to organization $j$, and a $0$ there otherwise. In this section, we will explore the power of linear algebra on membership matrices.
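
As a small sketch of the definition, using made-up people and organizations (none of these names, nor the `belongs` dictionary, come from the lab), a membership matrix can be built like this:

```python
import numpy as np

people = ["Ana", "Ben", "Cy"]
orgs = ["Chess", "Drama"]
belongs = {"Ana": ["Chess"], "Ben": ["Chess", "Drama"], "Cy": []}

# Rows are people, columns are organizations.
M = np.zeros((len(people), len(orgs)), dtype=int)
for i, person in enumerate(people):
    for j, org in enumerate(orgs):
        if org in belongs[person]:
            M[i, j] = 1   # person i belongs to organization j

print(M)
```

Here the row for Ben contains two 1's because Ben belongs to both organizations, and the row for Cy is all zeros.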

## Working by Hand

To start with, we will only consider eight people in three organizations. Complete this section by hand on paper. We'll get back to the computer when we get back to larger matrices.

We have the following people: Alan, Bartholemew, Catherine, DeDreana, Eimid, Felipe, Galia, and Homer; and the following organizations: Bearcats, Cowdogs, and Skunkpossums.

Suppose also that:


* Alan is in Bearcats;
* Bartholemew is in Cowdogs and Skunkpossums;
* Catherine is in Cowdogs;
* DeDreana is in Bearcats and Skunkpossums;
* Eimid is in all three organizations;
* Felipe is in Bearcats and Cowdogs;
* Galia is in Skunkpossums;
* Homer is in Bearcats and Skunkpossums.

**Question 10** 
1. Write down a membership matrix for the above information.<br><br>      

1. The following questions are about the matrix $A^TA$:<br><br>
  
  1. What size is this matrix?<br><br>

  1. What do the rows and columns of $A^TA$ represent? Organizations? People? Something else?<br><br>

  1. Compute the matrix by hand.<br><br>

  1. Consider the diagonal of your matrix. What does the entry $(i,i)$ tell you? By thinking about how you computed that entry, explain your answer.<br><br>
  
  1. Suppose that $i\neq j$, and consider the entry $(i,j)$. By thinking about how you computed this entry, explain what it represents and why.<br><br>
  
  1. What can you say about entries $(i,j)$ and $(j,i)$? Explain your answer.<br><br>

1. Next, answer all the above questions about the matrix $AA^T$.<br><br>

1. Use NumPy to verify your answers.<br><br>

1. Consider the matrix $AA^T$.<br><br>

  1. What is the significance of the number of nonzero, nondiagonal entries in a specific row?<br><br>
  
  1. Can you write a line of code to compute this number for each of our eight people? (Hint: recall that a command like 'A>0' will give a matrix of True/False values, and note that True=1 and False=0.)

In [5]:
A = np.array([[1,0,0],[0,1,1],[0,1,0],[1,0,1],[1,1,1],[1,1,0],[0,0,1],[1,0,1]])
print(A)
print(A.T @ A)
#the size of this matrix is 3x3; rows and columns represent the different organizations. 
#C= no
#D = the ith value on the diagonal tells us the number of people who are in that organization. To compute this entry, we essentially tallied up the ones in each column (each column represents an organization)
#E if i is not equal to j, then (i,j) represents the number of people who are in both the ith and the jth organization. This makes sense because ...
s = A@A.T
print(s)
#5A A nonzero nondiagonal entry (i,j) is the number of organizations that the ith row person shares with the jth column person.
#5B Compute the number for each of the 8 people:
s = s>0
print(s)
print((s.sum(axis=0))-1)

[[1 0 0]
 [0 1 1]
 [0 1 0]
 [1 0 1]
 [1 1 1]
 [1 1 0]
 [0 0 1]
 [1 0 1]]
[[5 2 3]
 [2 4 2]
 [3 2 5]]
[[1 0 0 1 1 1 0 1]
 [0 2 1 1 2 1 1 1]
 [0 1 1 0 1 1 0 0]
 [1 1 0 2 2 1 1 2]
 [1 2 1 2 3 2 1 2]
 [1 1 1 1 2 2 0 1]
 [0 1 0 1 1 0 1 1]
 [1 1 0 2 2 1 1 2]]
[[ True False False  True  True  True False  True]
 [False  True  True  True  True  True  True  True]
 [False  True  True False  True  True False False]
 [ True  True False  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True False  True]
 [False  True False  True  True False  True  True]
 [ True  True False  True  True  True  True  True]]
[4 6 3 6 7 6 4 6]


## Undermining the Revolution

In the 1770s, militias in Boston and elsewhere were planning a revolution. The British did not understand who the central actors were. If they had, they could quite easily have disrupted the network. What was known was the membership of many of the militias. In this section, we show that, using modern linear algebra, it would have been rather straightforward to use the information gained from spying to identify key militia members. Had the British had the tools you just discovered in the previous section, the United States might never have existed!

We will consider seven militias: Saint Andrew's Lodge, The Loyal Nine, The North Caucus, The Long Room Club, The Tea Party, The Boston Committee, and The London Enemies. We have information about the membership of 254 militiamen.

* Open the file *bostonmilitias.csv* by clicking [here](./data/bostonmilitias.csv). You will see it is essentially a membership matrix for militias. Scroll through the file and see if you find familiar names. Remember that the British had no idea who all these people were!

**Question 11**
  1. Suppose you wanted to figure out which militias were closely tied to each other. How could you use matrix multiplication to do that?<br><br>
  1. What would you want to know in order to identify central actors? How could you use matrix multiplication to identify which of the 254 militia members are or are not important to the network?<br><br>
  1. Once we perform this analysis (demo in class): who are some of the central actors key to the Revolution? Do you know these names?

In [24]:
# Open the file, read the lines, strip the \n from the right,
# and split based on comma delimiters.
with open('./bostonmilitias.csv') as file:  # the file is closed automatically
    lines = file.readlines()
lines = [line.rstrip() for line in lines]
lines = [line.split(',') for line in lines]

# Grab the list of militias.
militias = lines[0][1:]
# print(militias)

# Grab the list of names.
names = [line[0].split('.') for line in lines[1:]]
names = [name[1]+' '+name[0] for name in names]
# print(names)

# Extract the membership matrix.
A = np.array([line[1:] for line in lines[1:]],dtype='float')
# print(A)

# Print membership numbers for each militia.
print(militias)
# print(A.T@A)
print(A.sum(axis=0),'\n')

# For each person, compute the number of connected people.
B = A@A.T
print(B)
C = (B>0).sum(axis=1)-1
# print(C)
count = sorted(zip(C,names),reverse=True)
for c in count:
    print(c)


['StAndrewsLodge', 'LoyalNine', 'NorthCaucus', 'LongRoomClub', 'TeaParty', 'BostonCommittee', 'LondonEnemies']
[53. 10. 59. 17. 97. 21. 62.] 

[[2. 2. 1. ... 1. 0. 1.]
 [2. 4. 1. ... 2. 0. 2.]
 [1. 1. 1. ... 1. 0. 1.]
 ...
 [1. 2. 1. ... 2. 0. 1.]
 [0. 0. 0. ... 0. 1. 1.]
 [1. 2. 1. ... 1. 1. 3.]]
(245, 'Paul Revere')
(193, 'Thomas Chase')
(193, 'Nathaniel Barber')
(193, 'Henry Bass')
(191, 'Thomas Urann')
(187, 'Moses Grant')
(187, 'JamesFoster Condy')
(187, 'Edward Proctor')
(168, 'Joseph Warren')
(154, 'William Molineux')
(154, 'Thomas Young')
(150, 'Joseph Eayres')
(146, 'Samuel Peck')
(142, 'James Swan')
(142, 'Elisha Story')
(142, 'Adam Collson')
(119, 'Samuel Adams')
(119, 'Benjamin Church')
(111, 'Samuel Cooper')
(111, 'Samuel Barrett')
(110, 'Joseph Greenleaf')
(104, 'John Winthrop')
(104, 'John Pulling')
(104, 'Ezekiel Cheever')
(104, 'Elias Parkman')
(104, 'Abiel Ruddock')
(96, 'William Russell')
(96, 'William Pierce')
(96, 'William Hendley')
(96, 'William Etheridge')
(96, '