# DX 601 Week 11 Homework

## Introduction

In this homework, you will practice working with systems of linear equations and review previous weeks' material.

## Example Code

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for example code.

* https://github.com/bu-cds-omds/dx500-examples
* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code into your homework answers.

## Shared Imports

Do not install or use any additional modules.
Installing additional modules may cause the autograder to fail, resulting in zero points for some or all problems.

In [1]:
import math
import sys

In [53]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats
import sklearn.linear_model

In [24]:
# Here are some imports which will be used in the code in the rest of the lab  

# Imports used for the code in CS 237

import numpy as np                # arrays and functions which operate on arrays

import matplotlib.pyplot as plt   # normal plotting

import warnings


# NOTE: You may not use any other libraries than those listed here without explicit permission.
# Given an augmented matrix A, do Gaussian elimination and print out the steps
# At the end, report whether there is no solution, one solution, or multiple solutions

# number of digits of precision to print out
prec = 4

########################################################################################################

# Just calculates the REF and reports if no solution
# Appropriate for problems Ax=b, create augmented matrix
# A:b and input it as parameter

def GaussianElimination(Ab,traceRowOps=False):
    
    print("\nRunning Gaussian Elimination on:\n")
    print(np.round(Ab, decimals=4))
    
    B = forwardElimination(Ab,traceRowOps)
    print("\nEchelon Form:\n")
    B1 = np.round(B, decimals=4) + np.zeros(B.shape)                  # To prevent printing -0.0 after rounding
    print(np.round(B1, decimals=4),"\n")
    
    if (noSolution(B)):
        print("No solution!")

########################################################################################################

# Same, but continues on to do back substitution
def GaussJordan(Ab,traceRowOps=False):            
    print("\nRunning Gauss-Jordan on:\n")
    print(np.round(Ab, decimals=4))
    
    B = forwardElimination(Ab,traceRowOps)
    print("\nEchelon Form:\n")
    B1 = np.round(B, decimals=4) + np.zeros(B.shape) 
    print(B1,"\n")
    
    if (noSolution(B)):
        print("No solution!")
    else:
        C = backwardSubstitute(B,traceRowOps)
        print("Reduced Echelon Form:\n")
        C1 = np.round(C, decimals=4) + np.zeros(C.shape)                      # adding -0.0 + 0 gives 0.0
        print(C1,"\n")

########################################################################################################

# Performs GaussJordan on array which is NOT augmented;
# Used in dependency tests, determining rank, etc. 

def reduce2RREF(A,traceRowOps1=False):
    
    print("\nReducing to RREF:\n")
    print(np.round(A, decimals=4))
    
    B = forwardElimination(A,traceRowOps1)
    print("\nEchelon Form:\n")
    B1 = np.round(B, decimals=4) + np.zeros(B.shape) 
    print(B1,"\n")
    

    C = backwardSubstitute(B,augmented=False, traceRowOps=traceRowOps1)
    print("Reduced Echelon Form:\n")
    C1 = np.round(C, decimals=4) + np.zeros(C.shape) 
    print(C1,"\n")
        

########################################################################################################

def forwardElimination(A,traceRowOps=False):
    
    A = (np.copy(A)).astype(float)
    
    if (traceRowOps):
        print("\nRunning Forward Elimination on:\n")
        print(np.round(A, decimals=4),"\n")
        print()
    
    if (traceRowOps):
        print("Creating Echelon Form...\n")
    
    (numRows,numCols) = A.shape
    
    # r = row we are currently working on         pivot value is A[r][c]
    r = 0            
    for c in range(numCols):     # solve for variable in column c 
        # find row in column c with first non-zero element, and exchange with row r                  
        r1 = r
        while(r1 < numRows):
            if (not np.isclose(A[r1][c],0.0)):   # A[r1][c] is first non-zero element at or below r in column c
                break
            r1 += 1
        
        if(r1 == numRows):    # all zeros below r in this column
          #  print('continue')
            continue          # go on to the next column, but still working on same row   
         
        if(r1 != r):
            exchangeRows(A, r1, r) 
            if (traceRowOps): 
                print("Exchange R" + str(r1+1) + " and R" + str(r+1) + "\n") 
                print(np.round(A, decimals=4))                
                print() 

        # now use pivot A[r][c] to eliminate all vars in this column below row r
        for r2 in range(r+1,numRows):
            eliminateVariable(A,r,c,r2,traceRowOps)
            
        r += 1  
        if (r >= numRows):
            break
            
    return A

# for pivot A[r][c], eliminate variable in location A[r2][c] in row r2 using row operations
def eliminateVariable(A,r,c,r2,traceRowOps=False):

    if(not np.isclose(A[r2][c],0.0)):

        factor = -A[r2][c]/A[r][c] 

        addRowMult( A, r2, r, factor )
            
        if(traceRowOps):
            print("R" + str(r2+1) + " += " + str(np.around(factor,prec)) + "*R" + str(r+1) + "\n")  
            print(np.round(A, decimals=4))
            print()


def backwardSubstitute(A,augmented=True,traceRowOps=False): 
    
    numRows,numCols = A.shape
    
    if (A.dtype != 'float64'):
        A = A.astype(float)

    # now scale each pivot to 1 and eliminate the entries above it, working from the top row down
    if (traceRowOps):
        print("Back Substituting....\n") 

    for r in range(numRows):

        # find variable in this row
        for c in range(numCols):
            if(not np.isclose(A[r][c],0.0)):
                break 
       
        #print(numCols, r, c)
        if (augmented and c >= numCols-1):        # inconsistent or redundant row
                continue
        elif (c >= numCols):                      # inconsistent or redundant row
                continue  
            
            
        # A[r][c] is variable to eliminate
        
        factor = A[r][c]
        
        if (np.isclose(factor,0.0)):
            continue
        
        if(not np.isclose(factor,1.0)):  
            multRowByScalar(A, r, 1/factor)
            if (traceRowOps):
                print("R" + str(r+1) + " = R" + str(r+1) + "/" + str(np.around(factor,prec)) + "\n")  
                print(np.round(A, decimals=4))
                print()

        for r2 in range(r): 
            eliminateVariable(A,r,c,r2,traceRowOps)
        


    return A 


def exchangeRows(A, r1, r2):
    tmp = A[r1].copy()
    A[r1] = A[r2]
    A[r2] = tmp

def multRowByScalar(A, r, s): 
    A[r] *= s

def addRowMult(A, r1, r2, s): 
    A[r1] += s * A[r2]
    
# try to find row of all zeros except for last column, in augmented matrix. 

def noSolution(A):
    numRows,numCols = A.shape
    for r in range(numRows-1,-1,-1):         # start from bottom, since inconsistent rows end up there
        for c in range(numCols):
            if(not np.isclose(A[r][c],0.0)):  # found first non-0 in this row
                if(c == numCols-1):
                    return True
                else:
                    break
    return False 

# determines if matrix is in form of augmented identity matrix

def uniqueSolution(A):
    numRows,numCols = A.shape          # numRows was previously undefined in this function
    # test shape first
    if(numRows != numCols - 1):
        return False 
    # then test if diagonal is all 1's and rest is all 0's
    for r in range(numRows):
        for c in range(numRows):
            if(r == c and not np.isclose(A[r][c],1.0)):
                return False
            elif(r != c and not np.isclose(A[r][c],0.0)):
                return False
    return True   


## Shared Data

### Vineyard Data

This data set attempts to predict yields for a small vineyard in Lake Erie in 1991 based on the yields in the previous years.
Each row of the data set represents the yields of a row of the vineyard.
See https://github.com/EpistasisLab/pmlb/blob/master/datasets/192_vineyard/metadata.yaml for more information.

In [3]:
vineyard = pd.read_csv("https://github.com/EpistasisLab/pmlb/raw/refs/heads/master/datasets/192_vineyard/192_vineyard.tsv.gz", sep="\t")
vineyard.head()

|  | lugs_1989 | lugs_1990 | target |
|---:|---:|---:|---:|
| 0 | 1.0 | 5.0 | 9.5 |
| 1 | 3.0 | 8.0 | 17.5 |
| 2 | 3.0 | 11.0 | 18.0 |
| 3 | 3.0 | 9.0 | 20.0 |
| 4 | 5.0 | 9.5 | 20.5 |


In [4]:
vineyard_inputs = vineyard[["lugs_1989", "lugs_1990"]]
vineyard_inputs.head()

|  | lugs_1989 | lugs_1990 |
|---:|---:|---:|
| 0 | 1.0 | 5.0 |
| 1 | 3.0 | 8.0 |
| 2 | 3.0 | 11.0 |
| 3 | 3.0 | 9.0 |
| 4 | 5.0 | 9.5 |


In [5]:
vineyard_target = vineyard["target"]

## Problems

### Problem 1

Set `p1` to the value of $x$ after solving the following system of linear equations.

\begin{array}{rcl}
3x & = & 4.2 \\
\end{array}


In [6]:
# YOUR CHANGES HERE

p1 = 4.2/3

In [7]:
p1

1.4000000000000001

### Problem 2

Set `p2` to be a tuple of `(x, y)` where $x$ and $y$ are the solution to the following system of linear equations.

\begin{array}{rcl}
3x + 2y & = & 8.6 \\
2x + 5y & = & 13.8 \\
\end{array}


Hint: Just do this by hand.

In [13]:
# YOUR CHANGES HERE
p2 = (1.4,2.2)

In [14]:
p2

(1.4, 2.2)
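
A hand-computed answer like this can be cross-checked numerically. The sketch below assumes the system is written as $\mathbf{A}\mathbf{v} = \mathbf{b}$ with a square, invertible $\mathbf{A}$ and uses `np.linalg.solve`. The names `A`, `b`, and `solution` are illustrative and not part of the assignment.

```python
import numpy as np

# Coefficient matrix and right-hand side of the Problem 2 system
A = np.array([[3.0, 2.0],
              [2.0, 5.0]])
b = np.array([8.6, 13.8])

# np.linalg.solve requires a square, non-singular coefficient matrix
x, y = np.linalg.solve(A, b)
solution = (float(x), float(y))

# Compare against the hand-computed answer with a floating-point tolerance
print(np.allclose(solution, (1.4, 2.2)))
```

Comparing with `np.allclose` rather than `==` avoids false mismatches caused by floating-point rounding.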

### Problem 3

Set `p3` to be the $x$-intercept of the following equation.

\begin{array}{rcl}
4x + 2y + 3z & = & 12 \\
\end{array}

In [15]:
# YOUR CHANGES HERE

p3 = 3

In [16]:
p3

3

### Problem 4

Set `p4` to be the sum of the 5 axis intercepts of the following equation.

\begin{array}{rcl}
9a + 4b + 27c + 6d + 3e & = & 36 \\
\end{array}

In [18]:
# YOUR CHANGES HERE

p4 = 4+9+36/27+6+12

In [19]:
p4

32.333333333333336
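
Each axis intercept is found by setting every other variable to zero, which leaves a single coefficient times a single variable equal to the constant. A minimal sketch of that arithmetic for the Problem 4 equation follows; the array names are illustrative.

```python
import numpy as np

# Coefficients of a, b, c, d, e and the right-hand side constant
coefficients = np.array([9.0, 4.0, 27.0, 6.0, 3.0])
constant = 36.0

# Intercept on each axis: constant / coefficient (all other variables are 0)
intercepts = constant / coefficients
intercept_sum = float(intercepts.sum())

print(intercepts)
print(intercept_sum)
```

The same division gives the single intercept asked for in Problem 3.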

### Problem 5

Set `p5` to the augmented matrix of the following system of linear equations.

\begin{array}{rcl}
3x + 2y + 13z = 10 \\
7x + 2y - 13z = 23 \\
\end{array}

In [20]:
# YOUR CHANGES HERE
A = np.array([[3,2,13],[7,2,-13]])
b = np.array([[10,23]]).T
p5 = np.hstack([A,b])

In [21]:
p5

array([[  3,   2,  13,  10],
       [  7,   2, -13,  23]])

### Problem 6

Set `p6` to the rank of the following system of linear equations.

\begin{array}{rcl}
3x + 2y + 0z = 3 \\
2x + 3y + 1z = 5 \\
5x + 5y + 5z = 20 \\
\end{array}

In [None]:
# YOUR CHANGES HERE

p6 = 3

In [None]:
p6
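
The rank can also be checked numerically. This sketch assumes that "rank of the system" refers to the rank of the coefficient matrix, and it computes the rank of the augmented matrix as well so the two can be compared. It relies on `np.linalg.matrix_rank`, which estimates rank from singular values with a tolerance.

```python
import numpy as np

# Coefficient matrix and right-hand side of the Problem 6 system
A = np.array([[3.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [5.0, 5.0, 5.0]])
b = np.array([3.0, 5.0, 20.0])

# Augmented matrix [A | b]
Ab = np.column_stack([A, b])

rank_A = int(np.linalg.matrix_rank(A))
rank_Ab = int(np.linalg.matrix_rank(Ab))

# Equal ranks mean the system is consistent
print(rank_A, rank_Ab)
```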

### Problem 7

Consider the following system of linear equations.

\begin{array}{rcl}
3x + 2y + 0z = 3 \\
2x + 3y + 1z = 5 \\
5x + 5y + 5z = 20 \\
\end{array}

This system could be rewritten as 
\begin{array}{rcl}
\mathbf{A}
\begin{bmatrix}
x \\ y \\ z \\
\end{bmatrix}
& = &
\begin{bmatrix}
3 \\ 5 \\ 20 \\
\end{bmatrix}
\end{array}

Set `p7` to $\mathbf{A}$.

In [22]:
# YOUR CHANGES HERE

p7 = np.array([[3,2,0],[2,3,1],[5,5,5]])

In [23]:
p7

array([[3, 2, 0],
       [2, 3, 1],
       [5, 5, 5]])

### Problem 8

Set `p8` to the number of free variables in the following system of linear equations.

\begin{array}{rcl}
x + 3y + 4z = 3 \\
0x + 0y + 1z = 2 \\
x + 3y + 5z = 5 \\
\end{array}

In [None]:
# YOUR CHANGES HERE

p8 = 1

In [None]:
p8
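
For a consistent system, the number of free variables equals the number of unknowns minus the rank of the coefficient matrix. A short sketch of that count for the Problem 8 system follows; the variable names are illustrative.

```python
import numpy as np

# Coefficient matrix and right-hand side of the Problem 8 system
A = np.array([[1.0, 3.0, 4.0],
              [0.0, 0.0, 1.0],
              [1.0, 3.0, 5.0]])
b = np.array([3.0, 2.0, 5.0])
Ab = np.column_stack([A, b])

rank_A = int(np.linalg.matrix_rank(A))
rank_Ab = int(np.linalg.matrix_rank(Ab))

# The count below is only meaningful when the system is consistent
is_consistent = rank_A == rank_Ab
num_unknowns = A.shape[1]
num_free = num_unknowns - rank_A

print(is_consistent, num_free)
```

Here the third equation is the sum of the first two, so it adds no new constraint.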

### Problem 9

Set `p9` to any solution `(x, y, z)` to the following system of linear equations.

\begin{array}{rcl}
2x + 4y + 0z = 16 \\
1x + 3y + 1z = 16 \\
3x + 0y + 0z = 6 \\
\end{array}

In [None]:
# YOUR CHANGES HERE
A = np.array([[2,4,0],[1,3,1],[3,0,0]])
b = np.array([[16,16,6]]).T
Ab = np.hstack([A,b])

GaussJordan(Ab)
p9 = (2,3,5)


Running Gauss-Jordan on:

[[ 2  4  0 16]
 [ 1  3  1 16]
 [ 3  0  0  6]]

Echelon Form:

[[ 2.  4.  0. 16.]
 [ 0.  1.  1.  8.]
 [ 0.  0.  6. 30.]] 

Reduced Echelon Form:

[[1. 0. 0. 2.]
 [0. 1. 0. 3.]
 [0. 0. 1. 5.]] 



In [None]:
p9

### Problem 10

Set `p10` to any solution `(x, y, z)` to the following system of linear equations.

\begin{array}{rcl}
x + 3y + 0z = 3 \\
0x + 0y + 1z = 2 \\
\end{array}

Hint: these equations are in reduced row echelon form, so there are shortcuts to picking solutions.

In [None]:
# YOUR CHANGES HERE

p10 = (3,0,2)

In [None]:
p10
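
Because $y$ has no pivot in this system, it is a free variable, and every choice of $y = t$ gives a valid solution $(3 - 3t,\ t,\ 2)$. The sketch below checks a few such choices; `solution_for` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of the Problem 10 system
A = np.array([[1.0, 3.0, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([3.0, 2.0])

def solution_for(t):
    """Return the solution obtained by setting the free variable y to t."""
    return np.array([3.0 - 3.0 * t, t, 2.0])

# Every candidate should satisfy A @ v == b up to rounding
candidates = [solution_for(t) for t in (0.0, 1.0, -2.5)]
all_valid = all(np.allclose(A @ v, b) for v in candidates)

print(all_valid)
```

Setting $t = 0$ is the shortcut suggested by the hint: free variables are set to zero and the pivot variables are read off the right-hand side.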

### Problem 11

Set `p11` to be a tuple or list of the average yields in the vineyard data set for 1989, 1990, and 1991 in that order.

In [34]:
# YOUR CHANGES HERE
p11 = (
    vineyard["lugs_1989"].mean(),
    vineyard["lugs_1990"].mean(),
    vineyard["target"].mean()
)


In [35]:
p11

(3.2788461538461537, 9.653846153846153, 18.08653846153846)

### Problem 12

Set `p12` to the 95th percentile of the data in `q12`.

In [37]:
# DO NOT CHANGE

q12 = np.array([3.44857705, 2.09151799, 4.98803337, 3.8649001 , 1.20265499,
       3.89903439, 3.05276698, 0.92826333, 3.20371215, 1.81124845,
       3.53150155, 2.32418747, 1.81826697, 3.50670706, 1.37181554,
       2.95770001, 3.80008758, 2.65923837, 2.83248683, 2.91306525,
       2.18314379, 2.17931002, 2.9086665 , 3.26098354, 3.24755896,
       1.01129371, 4.56540725, 3.05517241, 2.32079938, 3.39392893,
       3.3886077 , 3.38112083, 3.88523072, 3.13214221, 3.73298754,
       4.11129171, 2.74133096, 2.4825709 , 3.21885293, 4.08327916,
       2.82768517, 2.1188981 , 3.45886466, 4.20440619, 2.25038228,
       1.59150786, 2.24486543, 3.49914959, 3.72254599, 1.84068517])

In [38]:
# YOUR ANSWER HERE

p12 = np.percentile(q12, 95)

In [39]:
p12

4.162504674

### Problem 13

Set `p13` to the average $L_1$ loss using the average of 1989 and 1990 vineyard yields per row to predict 1991 yields per row.

In [49]:
# YOUR CHANGES HERE
pred = (vineyard["lugs_1989"] + vineyard["lugs_1990"])/2
p13 = (pred - vineyard["target"]).abs().mean()

In [48]:
p13

11.620192307692308
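
The average $L_1$ loss is the mean of the absolute differences between predictions and targets. The sketch below applies that definition to only the five rows shown by `vineyard.head()`, so its result differs from the full-data value. The array names are illustrative.

```python
import numpy as np

# First five rows of the vineyard data, copied from the head() output
lugs_1989 = np.array([1.0, 3.0, 3.0, 3.0, 5.0])
lugs_1990 = np.array([5.0, 8.0, 11.0, 9.0, 9.5])
target = np.array([9.5, 17.5, 18.0, 20.0, 20.5])

# Prediction: per-row average of the two previous years
prediction = (lugs_1989 + lugs_1990) / 2

# Average L1 loss: mean absolute error
l1_loss = float(np.mean(np.abs(prediction - target)))

print(l1_loss)
```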

### Problem 14

Build a linear regression trained with `vineyard_inputs` as its input and `vineyard_target` as its target output. Set `p14` as the output of that regression with `vineyard_inputs` as its input.

In [56]:
# YOUR CHANGES HERE
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(vineyard_inputs, vineyard_target)

p14 = model.predict(vineyard_inputs)

In [57]:
p14

array([12.46642263, 16.68370318, 18.66216936, 17.34319191, 19.91175065,
       18.56238423, 19.32165808, 19.68179142, 21.2307281 , 20.24149501,
       22.02039093, 20.24149501, 19.45183218, 21.3609022 , 20.90098374,
       22.58009452, 20.11132091, 22.67987965, 19.65140244, 23.33936837,
       20.90098374, 24.25920531, 20.34128014, 21.100554  , 19.78157655,
       23.89907197, 20.44106527, 22.87944991, 18.00268063, 22.77966478,
       16.68370318, 15.23455163, 15.56429599, 15.56429599, 14.90480727,
       18.00268063, 14.90480727, 16.88327344, 15.56429599, 16.88327344,
       16.22378472, 16.32356985, 15.0045924 , 16.98305857, 15.76386625,
       13.78540008, 12.46642263, 13.78540008, 13.88518521, 12.56620776,
       12.99573725,  9.69829362])

### Problem 15

Given the following data, set `p15` to the weighted variance of the scores, using the probabilities as weights.

| Color | Shape | Score | Probability |
|---|---|---|---:|
| red | square | 3 | 0.250 |
| blue | circle | 4 | 0.125 |
| purple | line | 2 | 0.125 |
| purple | diamond | 5 | 0.250 |
| blue | triangle | 3 | 0.250 |

In [58]:
# YOUR CHANGES HERE

p15 = 1

In [None]:
p15
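
A probability-weighted variance is the weighted mean of the squared deviations from the weighted mean. The sketch below assumes the quantity of interest is the Score column weighted by the Probability column, and it uses `np.average` with its `weights` argument.

```python
import numpy as np

# Score and Probability columns from the table
scores = np.array([3.0, 4.0, 2.0, 5.0, 3.0])
probabilities = np.array([0.25, 0.125, 0.125, 0.25, 0.25])

# Weighted mean, then weighted mean of squared deviations
weighted_mean = float(np.average(scores, weights=probabilities))
weighted_variance = float(
    np.average((scores - weighted_mean) ** 2, weights=probabilities)
)

print(weighted_mean, weighted_variance)
```

Since the probabilities sum to 1, this matches $E[X^2] - E[X]^2 = 13.25 - 3.5^2$.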

### Problem 16

Set `p16` to be the correlation between the 1989 and 1990 yields in the vineyard data set.

In [60]:
# YOUR CHANGES HERE

p16 = vineyard["lugs_1989"].corr(vineyard["lugs_1990"])

In [61]:
p16

0.7224792530209385

### Problem 17

Compute the sample mean and variance of the 1990 vineyard yields.
Assuming that the yields follow a normal distribution with your computed parameters, what would the one-sided p-value of a yield of 13 lugs be?

Hint: use the [SciPy stats module](https://docs.scipy.org/doc/scipy/reference/stats.html) to calculate the p-values from the distribution.

In [75]:
# YOUR CHANGES HERE
from scipy.stats import norm
mu = vineyard["lugs_1990"].mean()
sigma = vineyard["lugs_1990"].std()


p17 = 1 - norm.cdf(13, loc=mu, scale=sigma)

In [76]:
p17

0.07618255849601963
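
The upper-tail probability `1 - norm.cdf(x)` is the survival function of the normal distribution, which SciPy also exposes as `norm.sf`. The sketch below computes the same quantity with only the standard library so the formula is visible. The mean and standard deviation are hypothetical placeholders, not the vineyard values, and `normal_upper_tail` is an illustrative helper.

```python
import math

def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for X ~ Normal(mu, sigma^2), via the complementary error function."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical parameters chosen only for illustration
mu_example = 10.0
sigma_example = 2.0

# One-sided p-value of observing 13 or more (z = 1.5 here)
p_upper = normal_upper_tail(13.0, mu_example, sigma_example)

print(p_upper)
```

Note that pandas `Series.std()` defaults to the sample standard deviation (`ddof=1`), which matches the "sample variance" wording of the problem.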

### Problem 18

Set `p18` to be the $2 \times 3$ matrix full of question marks below, filled in with the following information.
1. Each serving of noodles requires 1/2 cup of flour.
2. Each serving of noodles requires 1/8 cup of water.
3. Noodles do not need sugar.
4. Each serving of cake requires 1/4 cup of flour.
5. Each serving of cake requires 1/4 cup of sugar.
6. Cake does not need water.

\begin{array}{rcl}
\begin{bmatrix}
\text{servings of noodles} & \text{servings of cake} \\
\end{bmatrix}
\begin{bmatrix}
\text{??} & \text{??} & \text{??} \\
\text{??} & \text{??} & \text{??} \\
\end{bmatrix}
& = &
\begin{bmatrix}
\text{flour needed} & \text{sugar needed} & \text{water needed} \\
\end{bmatrix}
\end{array}

In [62]:
# YOUR CHANGES HERE

# rows: noodles, cake; columns: flour, sugar, water
p18 = np.array([[1/2, 0, 1/8], [1/4, 1/4, 0]])

In [None]:
p18
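
The column order of the matrix has to match the order of the result vector: flour, sugar, water. The sketch below builds the matrix from the six statements and multiplies it by a hypothetical order of 2 servings of noodles and 4 servings of cake to show how the row vector of servings maps to ingredient totals.

```python
import numpy as np

# Rows: noodles, cake. Columns: flour, sugar, water (cups per serving).
requirements = np.array([[1 / 2, 0.0, 1 / 8],
                         [1 / 4, 1 / 4, 0.0]])

# Hypothetical order: 2 servings of noodles, 4 servings of cake
servings = np.array([2.0, 4.0])

# Row vector times matrix gives [flour, sugar, water] needed
needed = servings @ requirements

print(needed)
```

Noodles contribute no sugar and cake contributes no water, so those are the two zero entries.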

### Problem 19

Set `p19` to be the cosine similarity of the vectors `x19` and `y19`.


In [64]:
# DO NOT CHANGE

x19 = [0.4, 0.2, -0.5]
x19

[0.4, 0.2, -0.5]

In [65]:
# DO NOT CHANGE

y19 = [-0.3, -0.2, 0.4]
y19

[-0.3, -0.2, 0.4]

In [66]:
# YOUR CHANGES HERE

# The denominator must be parenthesized so the dot product is divided by both norms
p19 = np.dot(x19, y19) / (np.linalg.norm(x19) * np.linalg.norm(y19))

In [67]:
p19

-0.9965457582
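
Cosine similarity is the dot product divided by the product of both vector norms, so the result always lies in $[-1, 1]$. Because `/` and `*` have equal precedence and evaluate left to right in Python, the denominator needs its own parentheses. A minimal sketch follows, with `cosine_similarity` as an illustrative helper name.

```python
import numpy as np

def cosine_similarity(u, v):
    """Return dot(u, v) / (||u|| * ||v||) for two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

x = [0.4, 0.2, -0.5]
y = [-0.3, -0.2, 0.4]

similarity = cosine_similarity(x, y)

# Without parentheses the expression divides by ||x|| and then multiplies by ||y||
unparenthesized = float(np.dot(x, y) / np.linalg.norm(x) * np.linalg.norm(y))

print(similarity, unparenthesized)
```

The two vectors point in nearly opposite directions, which is why the similarity is close to $-1$.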

### Problem 20

Set `p20` to the reduced row echelon form of `q20`.


In [69]:
# DO NOT CHANGE

q20 = np.array([[2., 5., -3., 2.0],
                [-2, 1, 3, -2],
                [ 4.,  1.,  0., 16.]])

In [72]:
# YOUR CHANGES HERE

GaussJordan(q20)   # prints the reduction steps and returns None, so p20 is set from the printed result
p20 = np.array([
    [1,0,0,4],
    [0,1,0,0],
    [0,0,1,2]
])


Running Gauss-Jordan on:

[[ 2.  5. -3.  2.]
 [-2.  1.  3. -2.]
 [ 4.  1.  0. 16.]]

Echelon Form:

[[ 2.  5. -3.  2.]
 [ 0.  6.  0.  0.]
 [ 0.  0.  6. 12.]] 

Reduced Echelon Form:

[[1. 0. 0. 4.]
 [0. 1. 0. 0.]
 [0. 0. 1. 2.]] 



In [73]:
p20

array([[1, 0, 0, 4],
       [0, 1, 0, 0],
       [0, 0, 1, 2]])
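
When the left block of an augmented matrix is square and non-singular, its reduced row echelon form is the identity matrix next to the unique solution. The sketch below uses that fact as an independent check with `np.linalg.solve`. It is not a general RREF routine and would fail for singular or non-square coefficient blocks.

```python
import numpy as np

q = np.array([[2.0, 5.0, -3.0, 2.0],
              [-2.0, 1.0, 3.0, -2.0],
              [4.0, 1.0, 0.0, 16.0]])

# Split the augmented matrix into coefficients and right-hand side
A = q[:, :3]
b = q[:, 3]

# Unique solution of A v = b (assumes A is invertible)
v = np.linalg.solve(A, b)

# RREF in this special case: [I | v]
rref_candidate = np.hstack([np.eye(3), v.reshape(-1, 1)])

print(np.round(rref_candidate, 4))
```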

### Generative AI Usage

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the [generative AI policy](https://www.bu.edu/cds-faculty/culture-community/gaia-policy/).
If you did not use any generative AI tools, simply write NONE below.

YOUR CHANGES HERE