Instructions

**When taking this exam, you agree to follow the Duke Honor Code.**

- This is a closed book exam. You can use the `help()` function and the `?` prefix or suffix, but are restricted to a SINGLE browser tab.
- All necessary imports are provided. You should not need to import any other packages.
- Answer all 5 questions.

In [53]:
import numpy as np
import scipy.linalg as la
import math
from collections import Counter
import pandas as pd

**1**. (20 points)

- Find the matrix $A$ that rotates the standard vectors in $\mathbb{R}^2$ by 30 degrees counter-clockwise, stretches $e_1$ by a factor of 3, and contracts $e_2$ by a factor of $0.5$.
- What is the inverse of this matrix? How you find the inverse should reflect your understanding.

The effects of the matrix $A$ and $A^{-1}$ are shown in the figure below:

![image](vecs.png)

In [12]:
Radians90 = math.radians(90)
sin90 = math.sin(Radians90)
sin90

1.0

In [9]:
np.sin(90)  # np.sin expects radians: this is sin(90 rad), not sin(90 degrees)

0.8939966636005579

In [5]:
np.cos(30)  # radians again: this is cos(30 rad), not cos(30 degrees)

0.15425144988758405

In [18]:
# Convert 30 degrees to radians (math.sin and math.cos expect radians)
Radians30 = math.radians(30)

# Define the standard basis (columns of I2) and A = R S, where
# S = diag(3, 0.5) scales e1 and e2 and R rotates by 30 degrees
I2 = np.array([[1, 0], [0, 1]])
A = np.array([[3 * math.cos(Radians30), -0.5 * math.sin(Radians30)],
              [3 * math.sin(Radians30), 0.5 * math.cos(Radians30)]])

# The columns of A @ I2 are the images of e1 and e2 shown in the figure above
A @ I2

array([[ 2.59807621, -0.25      ],
       [ 1.5       ,  0.4330127 ]])
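A minimal sketch of the same construction, assuming the intended order is "scale, then rotate" (which is what the matrix above encodes): build the scaling $S = \mathrm{diag}(3, 0.5)$ and the counter-clockwise rotation $R$ separately and compose them as $A = RS$. The names `R`, `S`, `e1` and `e2` are illustrative. Using `np.deg2rad` avoids the degrees/radians mix-up seen in the `np.sin(90)` cell.

```python
import numpy as np

theta = np.deg2rad(30)                           # np.sin/np.cos expect radians
R = np.array([[np.cos(theta), -np.sin(theta)],   # counter-clockwise rotation
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([3.0, 0.5])                          # stretch e1 by 3, contract e2 by 0.5

A = R @ S                                        # scale first, then rotate

# The columns of A are the images of e1 and e2
e1, e2 = np.eye(2)
print(A @ e1, np.linalg.norm(A @ e1))            # length 3
print(A @ e2, np.linalg.norm(A @ e2))            # length 0.5
```

Composing in the other order, `S @ R`, rotates first and then scales along the original axes, which gives a different matrix; the columns of `R @ S` match the `A @ I2` output above.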

*Anything else I should be doing to find this inverse?*

In [25]:
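# 2x2 inverse via the adjugate: A^{-1} = (1 / det A) [[d, -b], [-c, a]],
# where det A = 3 * 0.5 * (cos^2 + sin^2) = 1.5
A_inv = 1 / 1.5 * np.array([[0.5 * math.cos(Radians30), 0.5 * math.sin(Radians30)],
                            [-3 * math.sin(Radians30), 3 * math.cos(Radians30)]])
A_inv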

array([[ 0.28867513,  0.16666667],
       [-1.        ,  1.73205081]])

In [23]:
np.linalg.inv(A)

array([[ 0.28867513,  0.16666667],
       [-1.        ,  1.73205081]])

*Unclear what I'm supposed to do differently to get the inverse*
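One way to make the inverse reflect the geometry, sketched here under the same $A = RS$ reading: undo the operations in reverse order. Because $R$ is a rotation it is orthogonal, so $R^{-1} = R^T$ (a 30 degree clockwise rotation), and the diagonal scaling inverts entrywise, $S^{-1} = \mathrm{diag}(1/3, 2)$. Hence $A^{-1} = (RS)^{-1} = S^{-1}R^{T}$, with no general-purpose inversion needed.

```python
import numpy as np

theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([3.0, 0.5])
A = R @ S

# (RS)^{-1} = S^{-1} R^{-1}: rotate back by 30 degrees (R^T), then undo the scaling
S_inv = np.diag(1 / np.diag(S))     # diag(1/3, 2)
A_inv = S_inv @ R.T

print(A_inv)
print(np.allclose(A_inv @ A, np.eye(2)))   # True
```

The result agrees with both the adjugate formula and `np.linalg.inv(A)` above.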

**2**. (20 points)

- Given the DNA sequence below, create a $4 \times 4$ transition matrix $A$ where $A[i,j]$ is the probability of the base $j$ appearing immediately after base $i$. Note that a *base* is one of the four letters `a`, `c`, `t` or `g`. The letters below should be treated as a single sequence, broken into separate lines just for formatting purposes. You should check that row probabilities sum to 1. 
- Find the steady-state distribution of the 4 bases from the row-stochastic transition matrix, that is, the values of $x$ for which $x^T A = x^T$ (you can solve this as a set of linear equations). Hint: you need to add a constraint on the values of $x$.

```
gggttgtatgtcacttgagcctgtgcggacgagtgacacttgggacgtgaacagcggcggccgatacgttctctaagatc
ctctcccatgggcctggtctgtatggctttcttgttgtgggggcggagaggcagcgagtgggtgtacattaagcatggcc
accaccatgtggagcgtggcgtggtcgcggagttggcagggtttttgggggtggggagccggttcaggtattccctccgc
gtttctgtcgggtaggggggcttctcgtaagggattgctgcggccgggttctctgggccgtgatgactgcaggtgccatg
gaggcggtttggggggcccccggaagtctagcgggatcgggcttcgtttgtggaggagggggcgagtgcggaggtgttct
```

In [87]:
# Define sequence
seq = """gggttgtatgtcacttgagcctgtgcggacgagtgacacttgggacgtgaacagcggcggccgatacgttctctaagatc \
ctctcccatgggcctggtctgtatggctttcttgttgtgggggcggagaggcagcgagtgggtgtacattaagcatggcc \
accaccatgtggagcgtggcgtggtcgcggagttggcagggtttttgggggtggggagccggttcaggtattccctccgc \
gtttctgtcgggtaggggggcttctcgtaagggattgctgcggccgggttctctgggccgtgatgactgcaggtgccatg \
gaggcggtttggggggcccccggaagtctagcgggatcgggcttcgtttgtggaggagggggcgagtgcggaggtgttct"""

# Remove spaces
seq = seq.replace(' ', '')

# Create pairs of consecutive bases
pairs = list(zip(seq, seq[1:]))

# Count each (current, next) pair and store the counts in a dataframe
bases = ['a', 'c', 't', 'g']
df = pd.DataFrame(data = np.zeros(shape = (4, 4)), index = bases, columns = bases)
for (i, j), n in Counter(pairs).items():
    df.loc[i, j] = n

# Normalize each row to sum to 1
df_norm = df.div(df.sum(axis = 1), axis = 0)
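The same counting can be written without a `DataFrame`. The sketch below uses a hypothetical helper `transition_matrix` (not part of any library) that maps each base to an integer code, accumulates pair counts with `np.add.at` (which handles repeated index pairs correctly, unlike fancy-index `+=`), and divides each row by its total. It runs on a short made-up sequence so the counts can be checked by hand, and it performs the row-sum check the question asks for.

```python
import numpy as np

def transition_matrix(seq, bases="actg"):
    """Row-stochastic P with P[i, j] = Pr(next base = bases[j] | current base = bases[i]).

    Hypothetical helper for illustration; bases are ordered as in the notebook (a, c, t, g).
    """
    code = {base: k for k, base in enumerate(bases)}
    idx = np.array([code[ch] for ch in seq])
    counts = np.zeros((len(bases), len(bases)))
    # np.add.at adds 1 for every occurrence of each (current, next) index pair,
    # including repeats, which plain fancy-index `+=` would collapse
    np.add.at(counts, (idx[:-1], idx[1:]), 1)
    # Divide each row by its total; a base with no successor would give a zero row (nan)
    return counts / counts.sum(axis=1, keepdims=True)


# Short made-up sequence, small enough to check by hand
P = transition_matrix("aacgtga")
print(P)
print(np.allclose(P.sum(axis=1), 1))   # each row must sum to 1
```

For `"aacgtga"` the pairs are `aa, ac, cg, gt, tg, ga`, so row `a` splits evenly between `a` and `c`, rows `c` and `t` go entirely to `g`, and row `g` splits evenly between `t` and `a`.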

In [108]:
df_norm

|   | a | c | t | g |
|---|---|---|---|---|
| **a** | 0.09434 | 0.207547 | 0.245283 | 0.45283 |
| **c** | 0.166667 | 0.238095 | 0.261905 | 0.333333 |
| **t** | 0.102041 | 0.22449 | 0.27551 | 0.397959 |
| **g** | 0.146341 | 0.189024 | 0.22561 | 0.439024 |


In [106]:
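# Find steady state solution: x^T A = x^T  <=>  (A^T - I) x = 0.
# These equations are linearly dependent, so replace the last one with a
# constraint on the sum of x. Here the sum is fixed at the total number of
# transitions, df.sum().sum() (= 399), so res is in counts; res / res.sum()
# gives the steady-state probabilities.
df_solve = df_norm.T - np.eye(4)
df_solve.iloc[-1] = 1
b = np.array([0, 0, 0, df.sum().sum()])
res = np.linalg.solve(df_solve, b)
res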

array([ 52.9563961 ,  84.03842607,  99.05308632, 162.95209151])

In [107]:
# Confirm that we have found steady state solution
df_norm.T @ res

a     52.956396
c     84.038426
t     99.053086
g    162.952092
dtype: float64
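Because `b` above fixes the sum of the solution at the total number of transitions, `res` is expressed in counts (it sums to 399); dividing by `res.sum()` turns it into the steady-state distribution. As a cross-check, the sketch below uses a different technique from the linear-equation approach the question suggests: since $x^T A = x^T$ is equivalent to $A^T x = x$, $x$ is an eigenvector of $A^T$ for eigenvalue 1, rescaled to sum to 1. The matrix is copied from the rounded `df_norm` output above, so agreement is only to about six decimal places.

```python
import numpy as np

# Transition matrix copied from the (rounded) df_norm output above;
# rows and columns are ordered a, c, t, g
P = np.array([[0.094340, 0.207547, 0.245283, 0.452830],
              [0.166667, 0.238095, 0.261905, 0.333333],
              [0.102041, 0.224490, 0.275510, 0.397959],
              [0.146341, 0.189024, 0.225610, 0.439024]])

# x^T P = x^T  <=>  P^T x = x, so x is an eigenvector of P^T for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1))    # eigenvalue closest to 1
x = np.real(vecs[:, k])
x = x / x.sum()                    # rescale so the entries sum to 1

print(dict(zip("actg", x.round(4))))
print(np.allclose(x @ P, x, atol=1e-6))
```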

**Ask Bo: Do we need to know gradient descent with RMSProp?**

**3**. (20 points) 

We observe some data points $(x_i, y_i)$, and believe that an appropriate model for the data is that

$$
f(x) = ax^2 + bx^3 + c\sin{x}
$$

with some added noise. Find optimal values of the parameters $\beta = (a, b, c)$ that minimize $\Vert y - f(x) \Vert^2$ using gradient descent with RMSProp (no bias correction), starting with an initial value of $\beta = \begin{bmatrix}1 & 1 & 1\end{bmatrix}$. Use a learning rate of 0.01 and 10,000 iterations. This should take a few seconds to complete. (25 points)

Plot the data and fitted curve using `matplotlib`.

Data
```
x = array([ 3.4027718 ,  4.29209002,  5.88176277,  6.3465969 ,  7.21397852,
        8.26972154, 10.27244608, 10.44703778, 10.79203455, 14.71146298])
y = array([ 25.54026428,  29.4558919 ,  58.50315846,  70.24957254,
        90.55155435, 100.56372833,  91.83189927,  90.41536733,
        90.43103028,  23.0719842 ])
```
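A minimal sketch of the requested fit, under stated assumptions: the model is linear in $\beta$, so with the design matrix $X = [x^2, x^3, \sin x]$ the loss is $\Vert y - X\beta\Vert^2$ and its gradient is $-2X^T(y - X\beta)$. The RMSProp decay rate `rho = 0.9` and `eps = 1e-8` are common defaults assumed here, since the question does not specify them; the helper names `loss`, `grad` and `rmsprop` are hypothetical.

```python
import numpy as np

# Data from the problem statement
x = np.array([3.4027718, 4.29209002, 5.88176277, 6.3465969, 7.21397852,
              8.26972154, 10.27244608, 10.44703778, 10.79203455, 14.71146298])
y = np.array([25.54026428, 29.4558919, 58.50315846, 70.24957254,
              90.55155435, 100.56372833, 91.83189927, 90.41536733,
              90.43103028, 23.0719842])

# f(x) = a x^2 + b x^3 + c sin(x) is linear in beta = (a, b, c)
X = np.column_stack([x**2, x**3, np.sin(x)])

def loss(beta):
    r = y - X @ beta
    return r @ r

def grad(beta):
    # gradient of ||y - X beta||^2 with respect to beta
    return -2 * X.T @ (y - X @ beta)

def rmsprop(beta0, lr=0.01, n_iter=10_000, rho=0.9, eps=1e-8):
    beta = beta0.astype(float)            # astype returns a copy
    s = np.zeros_like(beta)               # running average of squared gradients
    for _ in range(n_iter):
        g = grad(beta)
        s = rho * s + (1 - rho) * g**2
        beta -= lr * g / (np.sqrt(s) + eps)   # per-coordinate scaled step
    return beta

beta0 = np.ones(3)
beta = rmsprop(beta0)
print(beta, loss(beta))
```

With a fixed learning rate, RMSProp takes steps of roughly `lr` per coordinate, so it may settle near the minimum rather than exactly on it; `np.linalg.lstsq(X, y, rcond=None)[0]` gives the exact least-squares $\beta$ for comparison. To plot, evaluate the same three columns on a fine grid of $x$ values, multiply by `beta`, and draw the curve over a scatter of the data in `matplotlib`.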

**4**. (20 points)

Consider the following system of equations:

$$\begin{align*}
2x_1 - x_2 + x_3 &= 6\\
-x_1 + 2x_2 - x_3 &= 2\\
x_1 - x_2 + x_3 &= 1
\end{align*}$$

1. Write the system in matrix form $Ax=b$ and define these in numpy or scipy.
2. Show that $A$ is positive-definite
3. Use the appropriate matrix decomposition function in numpy and back-substitution to solve the system. Remember to use the structure of the problem to determine the appropriate decomposition.
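A sketch of all three steps, assuming the first equation is $2x_1 - x_2 + x_3 = 6$ (a repeated $x_1$ in that row would be a typo for $x_3$; with this reading $A$ is symmetric and equals the matrix in Question 5). A symmetric matrix whose eigenvalues are all positive is positive-definite, which makes the Cholesky decomposition $A = LL^T$ the appropriate choice. Solving then takes one forward substitution ($Ly = b$) and one back substitution ($L^T x = y$), done here with `scipy.linalg.solve_triangular`.

```python
import numpy as np
import scipy.linalg as la

# 1. Matrix form Ax = b
A = np.array([[ 2, -1,  1],
              [-1,  2, -1],
              [ 1, -1,  1]], dtype=float)
b = np.array([6, 2, 1], dtype=float)

# 2. Positive-definite: symmetric with strictly positive eigenvalues
print(np.allclose(A, A.T))
eigvals = np.linalg.eigvalsh(A)      # eigvalsh assumes a symmetric matrix
print(eigvals, np.all(eigvals > 0))

# 3. Cholesky factor (lower triangular), then two triangular solves
L = np.linalg.cholesky(A)                       # A = L @ L.T
y = la.solve_triangular(L, b, lower=True)       # forward substitution
x = la.solve_triangular(L.T, y, lower=False)    # back substitution
print(x)   # expected: x = (5, 3, -1)
```

An equivalent positive-definiteness check is that the leading principal minors $2$, $3$ and $1$ are all positive; `np.linalg.cholesky` itself raises `LinAlgError` when the matrix is not positive-definite.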

**5**. (20 points)

Let

$A = \left(\begin{matrix}2 & -1 &1\\-1& 2& -1 \\1&-1& 1
\end{matrix}\right) \;\;\;\;\;\;\textrm{ and }\;\;\;\;\;\; v = \left(\begin{matrix}1 \\ 1 \\2\end{matrix}\right)$

Find $w$ such that $w$ is conjugate to $v$ under $A$. You may use *basic* linear algebra in scipy or numpy, i.e., matrix products.
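Two vectors are conjugate with respect to a symmetric positive-definite $A$ when $w^T A v = 0$. One way to construct such a $w$ using only matrix products, sketched here as an illustration, is a single Gram-Schmidt step in the $A$-inner product: take any vector $u$ that is not a multiple of $v$ and subtract its $A$-projection onto $v$, $w = u - \frac{u^T A v}{v^T A v} v$. The starting vector $u = e_1$ is an arbitrary choice; any other non-multiple of $v$ gives a different, equally valid $w$. Here $Av = (3, -1, 2)^T$, $v^T A v = 6$ and $u^T A v = 3$.

```python
import numpy as np

A = np.array([[ 2, -1,  1],
              [-1,  2, -1],
              [ 1, -1,  1]], dtype=float)
v = np.array([1, 1, 2], dtype=float)

# Arbitrary starting vector (an assumption; any vector not parallel to v works)
u = np.array([1, 0, 0], dtype=float)

# Remove the A-projection of u onto v so that w^T A v = 0
w = u - (u @ A @ v) / (v @ A @ v) * v

print(w)          # expected: w = (0.5, -0.5, -1)
print(w @ A @ v)  # should be 0 (up to rounding)
```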