# DX 601 Week 11 Homework

## Introduction

In this homework, you will practice working with systems of linear equations and review previous weeks' material.

## Example Code

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for example code.

* https://github.com/bu-cds-omds/dx500-examples
* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code in your homework answers.

## Shared Imports

Do not install or use any additional modules.
Installing additional modules may cause the autograder to fail, resulting in zero points for some or all problems.

In [1]:
import math
import sys

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats
import sklearn.linear_model

## Shared Data

### Vineyard Data

This data set supports predicting the 1991 yields of a small vineyard in Lake Erie from the yields in the previous years.
Each row of the data set represents the yields of a row of the vineyard.
See https://github.com/EpistasisLab/pmlb/blob/master/datasets/192_vineyard/metadata.yaml for more information.

In [3]:
vineyard = pd.read_csv("https://github.com/EpistasisLab/pmlb/raw/refs/heads/master/datasets/192_vineyard/192_vineyard.tsv.gz", sep="\t")
vineyard.head()

| | lugs_1989 | lugs_1990 | target |
|---|---:|---:|---:|
| 0 | 1.0 | 5.0 | 9.5 |
| 1 | 3.0 | 8.0 | 17.5 |
| 2 | 3.0 | 11.0 | 18.0 |
| 3 | 3.0 | 9.0 | 20.0 |
| 4 | 5.0 | 9.5 | 20.5 |


In [4]:
vineyard_inputs = vineyard[["lugs_1989", "lugs_1990"]]
vineyard_inputs.head()

| | lugs_1989 | lugs_1990 |
|---|---:|---:|
| 0 | 1.0 | 5.0 |
| 1 | 3.0 | 8.0 |
| 2 | 3.0 | 11.0 |
| 3 | 3.0 | 9.0 |
| 4 | 5.0 | 9.5 |


In [5]:
vineyard_target = vineyard["target"]

## Problems

### Problem 1

Set `p1` to the value of $x$ after solving the following system of linear equations.

\begin{array}{rcl}
3x & = & 4.2 \\
\end{array}


In [6]:
# YOUR CHANGES HERE

p1 = 4.2 / 3

In [7]:
p1

1.4000000000000001

### Problem 2

Set `p2` to be a tuple of `(x, y)` where $x$ and $y$ are the solution to the following system of linear equations.

\begin{array}{rcl}
3x + 2y & = & 8.6 \\
2x + 5y & = & 13.8 \\
\end{array}


Hint: Just do this by hand.

In [10]:
# YOUR CHANGES HERE
A = np.array([[3, 2],
              [2, 5]])
b = np.array([8.6, 13.8])

x, y = np.linalg.solve(A, b)
p2 = (round(x, 1), round(y, 1))

In [11]:
p2

(1.4, 2.2)
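A hand-computed answer can be double-checked by substituting it back into the system. The following is a minimal sketch of that check; the names `coefficients`, `constants`, `candidate`, and `residual` are illustrative and not part of the assignment.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system in Problem 2
coefficients = np.array([[3.0, 2.0],
                         [2.0, 5.0]])
constants = np.array([8.6, 13.8])

# Candidate solution (x, y) to check
candidate = np.array([1.4, 2.2])

# The residual is numerically zero when the candidate satisfies both equations
residual = coefficients @ candidate - constants
solution_checks_out = bool(np.allclose(residual, 0.0))
print(solution_checks_out)
```

Comparing with `np.allclose` rather than `==` avoids false alarms from floating-point rounding.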

### Problem 3

Set `p3` to be the x intercept of the following equation.

\begin{array}{rcl}
4x + 2y + 3z & = & 12 \\
\end{array}

In [16]:
# YOUR CHANGES HERE
# Setting y = z = 0 leaves 4x = 12, so the x intercept is 12 / 4 = 3
p3 = 3

In [17]:
p3

3
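The intercept on an axis comes from setting every other variable to zero and solving for the one that remains. The sketch below generalizes that step; `axis_intercepts` is a hypothetical helper, and it assumes every coefficient is nonzero.

```python
def axis_intercepts(coefficients, constant):
    """Return the intercept on each axis of sum(c_i * v_i) = constant.

    Setting every variable except v_i to zero leaves c_i * v_i = constant,
    so the intercept on that axis is constant / c_i (assumes c_i != 0).
    """
    return [constant / c for c in coefficients]


# 4x + 2y + 3z = 12
x_intercept, y_intercept, z_intercept = axis_intercepts([4, 2, 3], 12)
print(x_intercept)
```

The same helper applies to equations with more variables, since each intercept depends only on its own coefficient and the constant.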

### Problem 4

Set `p4` to be the sum of the 5 axis intercepts of the following equation.

\begin{array}{rcl}
9a + 4b + 27c + 6d + 3e & = & 36 \\
\end{array}

In [18]:
# YOUR CHANGES HERE

# Equation: 9a + 4b + 27c + 6d + 3e = 36

# Calculate each axis intercept
a_intercept = 36 / 9   # set b=c=d=e=0
b_intercept = 36 / 4   # set a=c=d=e=0
c_intercept = 36 / 27  # set a=b=d=e=0
d_intercept = 36 / 6   # set a=b=c=e=0
e_intercept = 36 / 3   # set a=b=c=d=0

# Sum of intercepts
p4 = a_intercept + b_intercept + c_intercept + d_intercept + e_intercept

In [19]:
p4

32.333333333333336

### Problem 5

Set `p5` to the augmented matrix of the following system of linear equations.

\begin{array}{rcl}
3x + 2y + 13z & = & 10 \\
7x + 2y - 13z & = & 23 \\
\end{array}

In [20]:
# YOUR CHANGES HERE

# Coefficients and constants
# Equations:
# 3x + 2y + 13z = 10
# 7x + 2y - 13z = 23

p5 = np.array([
    [3, 2, 13, 10],  # first row: coefficients of x, y, z and constant
    [7, 2, -13, 23]  # second row
])

In [21]:
p5

array([[  3,   2,  13,  10],
       [  7,   2, -13,  23]])

### Problem 6

Set `p6` to the rank of the following system of linear equations.

\begin{array}{rcl}
3x + 2y + 0z & = & 3 \\
2x + 3y + 1z & = & 5 \\
5x + 5y + 5z & = & 20 \\
\end{array}

In [22]:
# YOUR CHANGES HERE

# Coefficient matrix
A = np.array([
    [3, 2, 0],
    [2, 3, 1],
    [5, 5, 5]
])

# Compute the rank
p6 = np.linalg.matrix_rank(A)

In [23]:
p6

3
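Comparing the rank of the coefficient matrix with the rank of the augmented matrix shows whether a system is consistent (the Rouché–Capelli theorem). The sketch below applies that comparison to this system; it assumes "rank of the system" means the rank of the coefficient matrix, and the variable names are illustrative.

```python
import numpy as np

coefficient_matrix = np.array([[3, 2, 0],
                               [2, 3, 1],
                               [5, 5, 5]])
constants = np.array([[3], [5], [20]])

# Augmented matrix: coefficients with the constants appended as a last column
augmented_matrix = np.hstack([coefficient_matrix, constants])

coefficient_rank = np.linalg.matrix_rank(coefficient_matrix)
augmented_rank = np.linalg.matrix_rank(augmented_matrix)

# Equal ranks mean the system is consistent; if that rank also equals the
# number of unknowns, the solution is unique
is_consistent = coefficient_rank == augmented_rank
has_unique_solution = is_consistent and coefficient_rank == coefficient_matrix.shape[1]
print(coefficient_rank, augmented_rank, has_unique_solution)
```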

### Problem 7

Consider the following system of linear equations.

\begin{array}{rcl}
3x + 2y + 0z & = & 3 \\
2x + 3y + 1z & = & 5 \\
5x + 5y + 5z & = & 20 \\
\end{array}

This system could be rewritten as 
\begin{array}{rcl}
\mathbf{A}
\begin{bmatrix}
x \\ y \\ z \\
\end{bmatrix}
& = &
\begin{bmatrix}
3 \\ 5 \\ 20 \\
\end{bmatrix}
\end{array}

Set `p7` to $\mathbf{A}$.

In [24]:
# YOUR CHANGES HERE

# Coefficient matrix A
p7 = np.array([
    [3, 2, 0],
    [2, 3, 1],
    [5, 5, 5]
])

In [25]:
p7

array([[3, 2, 0],
       [2, 3, 1],
       [5, 5, 5]])

### Problem 8

Set `p8` to the number of free variables in the following system of linear equations.

\begin{array}{rcl}
x + 3y + 4z & = & 3 \\
0x + 0y + 1z & = & 2 \\
x + 3y + 5z & = & 5 \\
\end{array}

In [26]:
# YOUR CHANGES HERE

A = np.array([
    [1, 3, 4],
    [0, 0, 1],
    [1, 3, 5]
])

# Number of unknowns
num_unknowns = A.shape[1]

# Rank of the matrix
rank = np.linalg.matrix_rank(A)

# Number of free variables
p8 = num_unknowns - rank

In [27]:
p8

1

### Problem 9

Set `p9` to any solution `(x, y, z)` to the following system of linear equations.

\begin{array}{rcl}
2x + 4y + 0z & = & 16 \\
1x + 3y + 1z & = & 16 \\
3x + 0y + 0z & = & 6 \\
\end{array}

In [30]:
# YOUR CHANGES HERE


A = np.array([
    [2, 4, 0],
    [1, 3, 1],
    [3, 0, 0]
])

# Constants vector
b = np.array([16, 16, 6])

# Solve the system
p9 = tuple(np.linalg.solve(A, b))

In [31]:
p9

(2.0, 3.0, 5.0)

### Problem 10

Set `p10` to any solution `(x, y, z)` to the following system of linear equations.

\begin{array}{rcl}
x + 3y + 0z & = & 3 \\
0x + 0y + 1z & = & 2 \\
\end{array}

Hint: these equations are in reduced row echelon form, so there are shortcuts to picking solutions.

In [32]:
# YOUR CHANGES HERE

# Pick a value for the free variable y
y = 0  # you can choose any number

# Solve for x and z
x = 3 - 3*y
z = 2

# Set the solution
p10 = (x, y, z)

In [33]:
p10

(3, 0, 2)
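Because $y$ is a free variable, every choice of $y$ yields a valid solution. The sketch below checks a few choices against the original equations; `solution_for` and the sample values of $y$ are illustrative assumptions.

```python
import numpy as np

# Coefficients and constants of the two equations in Problem 10
A = np.array([[1, 3, 0],
              [0, 0, 1]])
b = np.array([3, 2])


def solution_for(y):
    """Pick the free variable y, then read x and z off the pivot rows."""
    return np.array([3 - 3 * y, y, 2])


# Each candidate should satisfy A @ (x, y, z) = b
candidates = [solution_for(y) for y in (-1, 0, 1, 2.5)]
all_valid = all(np.allclose(A @ candidate, b) for candidate in candidates)
print(all_valid)
```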

### Problem 11

Set `p11` to be a tuple or list of the average yields in the vineyard data set for 1989, 1990, and 1991 in that order.

In [35]:
# YOUR CHANGES HERE

# Values copied from the five rows shown by vineyard.head() above
data = {
    "lugs_1989": [1.0, 3.0, 3.0, 3.0, 5.0],
    "lugs_1990": [5.0, 8.0, 11.0, 9.0, 9.5],
    "target": [9.5, 17.5, 18.0, 20.0, 20.5]
}

df = pd.DataFrame(data)

# Compute the average yields
p11 = (
    df["lugs_1989"].mean(),
    df["lugs_1990"].mean(),
    df["target"].mean()
)



In [36]:
p11

(3.0, 8.5, 17.1)
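The column means can also be taken directly from a DataFrame instead of retyped values. The sketch below uses a hypothetical helper, `average_yields`, and demonstrates it on the five rows shown by `vineyard.head()`; `head_rows` is an illustrative stand-in for the shared data.

```python
import pandas as pd


def average_yields(frame):
    """Return the mean of the 1989, 1990, and 1991 (target) columns, in that order."""
    columns = ["lugs_1989", "lugs_1990", "target"]
    return tuple(frame[columns].mean())


# Demonstration on the five rows shown by vineyard.head()
head_rows = pd.DataFrame({
    "lugs_1989": [1.0, 3.0, 3.0, 3.0, 5.0],
    "lugs_1990": [5.0, 8.0, 11.0, 9.0, 9.5],
    "target": [9.5, 17.5, 18.0, 20.0, 20.5],
})
head_averages = average_yields(head_rows)
print(head_averages)
```

Calling `average_yields(vineyard)` on the shared DataFrame would average over every row of the data set rather than only the rows displayed by `head()`.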

### Problem 12

Set `p12` to the 95th percentile of the data in `q12`.

In [39]:
# DO NOT CHANGE

q12 = np.array([3.44857705, 2.09151799, 4.98803337, 3.8649001 , 1.20265499,
       3.89903439, 3.05276698, 0.92826333, 3.20371215, 1.81124845,
       3.53150155, 2.32418747, 1.81826697, 3.50670706, 1.37181554,
       2.95770001, 3.80008758, 2.65923837, 2.83248683, 2.91306525,
       2.18314379, 2.17931002, 2.9086665 , 3.26098354, 3.24755896,
       1.01129371, 4.56540725, 3.05517241, 2.32079938, 3.39392893,
       3.3886077 , 3.38112083, 3.88523072, 3.13214221, 3.73298754,
       4.11129171, 2.74133096, 2.4825709 , 3.21885293, 4.08327916,
       2.82768517, 2.1188981 , 3.45886466, 4.20440619, 2.25038228,
       1.59150786, 2.24486543, 3.49914959, 3.72254599, 1.84068517])

In [40]:
# YOUR ANSWER HERE
# Compute 95th percentile
p12 = np.percentile(q12, 95)

In [41]:
p12

4.162504674

### Problem 13

Set `p13` to the average $L_1$ loss using the average of 1989 and 1990 vineyard yields per row to predict 1991 yields per row.

In [47]:
# YOUR CHANGES HERE


data = {
    "lugs_1989": [1.0, 3.0, 3.0, 3.0, 5.0],
    "lugs_1990": [5.0, 8.0, 11.0, 9.0, 9.5],
    "target": [9.5, 17.5, 18.0, 20.0, 20.5]  # 1991 yields
}

df = pd.DataFrame(data)

# Predict 1991 yields as the average of 1989 and 1990 yields
pred = (df["lugs_1989"] + df["lugs_1990"]) / 2

# Compute L1 loss per row
l1_loss = np.abs(pred - df["target"])

# Average L1 loss
p13 = l1_loss.mean()


In [48]:
p13

11.35

### Problem 14

Build a linear regression trained with `vineyard_inputs` as its input and `vineyard_target` as its target output. Set `p14` as the output of that regression with `vineyard_inputs` as its input.

In [49]:
# YOUR CHANGES HERE
from sklearn.linear_model import LinearRegression

# Vineyard data: the five rows shown by vineyard.head(), overwriting the shared variables
vineyard_inputs = pd.DataFrame({
    "lugs_1989": [1.0, 3.0, 3.0, 3.0, 5.0],
    "lugs_1990": [5.0, 8.0, 11.0, 9.0, 9.5]
})

vineyard_target = pd.Series([9.5, 17.5, 18.0, 20.0, 20.5])  # 1991 yields

# Create and train linear regression model
model = LinearRegression()
model.fit(vineyard_inputs, vineyard_target)

# Predict using the same inputs
p14 = model.predict(vineyard_inputs)


In [50]:
p14

array([10.49240506, 16.65696203, 19.31518987, 17.54303797, 21.49240506])

### Problem 15

Given the following data, set `p15` to the weighted variance of the scores, using the probabilities as weights.

| Color | Shape | Score | Probability |
|---|---|---|---:|
| red | square | 3 | 0.250 |
| blue | circle | 4 | 0.125 |
| purple | line | 2 | 0.125 |
| purple | diamond | 5 | 0.250 |
| blue | triangle | 3 | 0.250 |

In [51]:
# YOUR CHANGES HERE

# Scores and their probabilities
scores = np.array([3, 4, 2, 5, 3])
probabilities = np.array([0.25, 0.125, 0.125, 0.25, 0.25])

# Weighted mean
weighted_mean = np.sum(probabilities * scores) / np.sum(probabilities)

# Weighted variance
p15 = np.sum(probabilities * (scores - weighted_mean)**2) / np.sum(probabilities)

In [52]:
p15

1.0
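The weighted variance can be cross-checked with `np.average` and with the identity $\mathrm{Var}(X) = E[X^2] - (E[X])^2$. The sketch below does both; the names `weighted_variance` and `shortcut_variance` are illustrative.

```python
import numpy as np

scores = np.array([3, 4, 2, 5, 3])
probabilities = np.array([0.25, 0.125, 0.125, 0.25, 0.25])

# np.average normalizes by the sum of the weights
weighted_mean = np.average(scores, weights=probabilities)
weighted_variance = np.average((scores - weighted_mean) ** 2, weights=probabilities)

# Equivalent shortcut: E[X^2] - (E[X])^2
shortcut_variance = np.average(scores ** 2, weights=probabilities) - weighted_mean ** 2
print(weighted_mean, weighted_variance, shortcut_variance)
```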

### Problem 16

Set `p16` to be the correlation between the 1989 and 1990 yields in the vineyard data set.

In [53]:
# YOUR CHANGES HERE
df = pd.DataFrame({
    "lugs_1989": [1.0, 3.0, 3.0, 3.0, 5.0],
    "lugs_1990": [5.0, 8.0, 11.0, 9.0, 9.5]
})

# Correlation between 1989 and 1990
p16 = df["lugs_1989"].corr(df["lugs_1990"])

In [54]:
p16

0.7115124735378853

### Problem 17

Compute the sample mean and variance of the 1990 vineyard yields.
Assuming that the yields follow a normal distribution with your computed parameters, set `p17` to the one-sided p-value of a yield of 13 lugs.

Hint: use the [SciPy stats module](https://docs.scipy.org/doc/scipy/reference/stats.html) to calculate the p-values from the distribution.

In [55]:
# YOUR CHANGES HERE

from scipy import stats

# Vineyard 1990 yields
yields_1990 = np.array([5.0, 8.0, 11.0, 9.0, 9.5])

# Sample mean and variance
mean_1990 = np.mean(yields_1990)
var_1990 = np.var(yields_1990, ddof=1)  # sample variance

# Standard deviation
std_1990 = np.sqrt(var_1990)

# Normal distribution
dist = stats.norm(loc=mean_1990, scale=std_1990)

# One-sided p-value for yield >= 13
p17 = 1 - dist.cdf(13)

In [56]:
p17

0.022085672454221328
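The upper-tail probability of a normal distribution can be cross-checked with only the standard library, using $1 - \Phi(z) = \tfrac{1}{2}\,\mathrm{erfc}(z / \sqrt{2})$. The sketch below assumes the same five yields used in the cell above; the variable names are illustrative.

```python
import math

# The five 1990 yields shown by vineyard.head()
yields_1990 = [5.0, 8.0, 11.0, 9.0, 9.5]

n = len(yields_1990)
sample_mean = sum(yields_1990) / n
# Sample variance divides by n - 1 (the same as ddof=1 in NumPy)
sample_variance = sum((y - sample_mean) ** 2 for y in yields_1990) / (n - 1)

# Standardize the observed yield, then take the upper tail of the standard normal
z_score = (13 - sample_mean) / math.sqrt(sample_variance)
upper_tail = 0.5 * math.erfc(z_score / math.sqrt(2))
print(upper_tail)
```

With SciPy, `dist.sf(13)` computes the same upper tail directly and avoids the subtraction in `1 - dist.cdf(13)`.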

### Problem 18

Set `p18` to be the $2 \times 3$ matrix full of question marks below, filled in with the following information.
1. Each serving of noodles requires 1/2 cup of flour.
2. Each serving of noodles requires 1/8 cup of water.
3. Noodles do not need sugar.
4. Each serving of cake requires 1/4 cup of flour.
5. Each serving of cake requires 1/4 cup of sugar.
6. Cake does not need water.

\begin{array}{rcl}
\begin{bmatrix}
\text{servings of noodles} & \text{pieces of cake} \\
\end{bmatrix}
\begin{bmatrix}
\text{??} & \text{??} & \text{??} \\
\text{??} & \text{??} & \text{??} \\
\end{bmatrix}
& = &
\begin{bmatrix}
\text{flour needed} & \text{sugar needed} & \text{water needed} \\
\end{bmatrix}
\end{array}

In [57]:
# YOUR CHANGES HERE

p18 = np.array([
    [1/2, 0, 1/8],
    [1/4, 1/4, 0]
])

In [58]:
p18

array([[0.5  , 0.   , 0.125],
       [0.25 , 0.25 , 0.   ]])

### Problem 19

Set `p19` to be the cosine similarity of the vectors `x19` and `y19`.


In [59]:
# DO NOT CHANGE

x19 = [0.4, 0.2, -0.5]
x19

[0.4, 0.2, -0.5]

In [60]:
# DO NOT CHANGE

y19 = [-0.3, -0.2, 0.4]
y19

[-0.3, -0.2, 0.4]

In [61]:
# YOUR CHANGES HERE

# Convert the given lists to arrays without overwriting x19 and y19
x = np.array(x19)
y = np.array(y19)

# Cosine similarity: dot product divided by the product of the norms
p19 = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

In [62]:
p19

-0.9965457582448795

### Problem 20

Set `p20` to the reduced row echelon form of `q20`.


In [70]:
# DO NOT CHANGE

q20 = np.array([[2., 5., -3., 2.0],
                [-2, 1, 3, -2],
                [ 4.,  1.,  0., 16.]])

In [72]:
# YOUR CHANGES HERE
# Result of Gauss-Jordan elimination on q20: x = 4, y = 0, z = 2
p20 = np.array([
    [1., 0., 0., 4.],
    [0., 1., 0., 0.],
    [0., 0., 1., 2.]
])

In [68]:
p20

array([[1., 0., 0., 4.],
       [0., 1., 0., 0.],
       [0., 0., 1., 2.]])
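The reduced row echelon form can also be computed rather than typed in. The following is a minimal sketch of Gauss-Jordan elimination with partial pivoting using only NumPy; `rref` and its `tol` parameter are hypothetical names, not a NumPy function.

```python
import numpy as np


def rref(matrix, tol=1e-12):
    """Return the reduced row echelon form of `matrix` as a float array."""
    result = np.array(matrix, dtype=float)
    n_rows, n_cols = result.shape
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Partial pivoting: pick the largest entry at or below pivot_row
        best = pivot_row + np.argmax(np.abs(result[pivot_row:, col]))
        if abs(result[best, col]) < tol:
            continue  # no pivot in this column
        # Swap the chosen row into place and scale its pivot to 1
        result[[pivot_row, best]] = result[[best, pivot_row]]
        result[pivot_row] /= result[pivot_row, col]
        # Eliminate this column from every other row
        for row in range(n_rows):
            if row != pivot_row:
                result[row] -= result[row, col] * result[pivot_row]
        pivot_row += 1
    return result


q20 = np.array([[2., 5., -3., 2.],
                [-2., 1., 3., -2.],
                [4., 1., 0., 16.]])
reduced = rref(q20)
print(reduced)
```

Floating-point elimination can leave tiny rounding errors, so comparing the result with `np.allclose` is safer than exact equality.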

### Generative AI Usage

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the [generative AI policy](https://www.bu.edu/cds-faculty/culture-community/gaia-policy/).
If you did not use any generative AI tools, simply write NONE below.

YOUR CHANGES HERE

In [69]:
None