<div class="alert alert-success">

#### Homework 7 Supplemental Notebook
    
# Projections; Regression with Linear Algebra

### EECS 245, Fall 2025 at the University of Michigan
    
</div>

### Instructions

Most homeworks will have Jupyter Notebooks, like this one, designed to supplement the theoretical problems. 

To write and run code in this notebook, you have two options:

1. **Use the EECS 245 DataHub.** To do this, click the link provided in the Homework 7 PDF. Before doing so, read the instructions on the [**Tech Support**](https://eecs245.org/tech-support/#option-1-using-the-eecs-245-datahub) page on how to use the DataHub.
1. **Set up a Jupyter Notebook environment locally, and use `git` to clone our course repository.** For instructions on how to do this, see the [**Tech Support**](https://eecs245.org/tech-support) page of the course website.

There are two homework problems mentioned in this notebook:
- Problem 3 is contained **entirely** in this notebook, and is **entirely autograded**. To receive credit for Problem 3, submit your completed notebook to the Homework 7, Problem 3 Code autograder on Gradescope. Your submission time for Homework 7 is the **later** of your PDF and code submission times. Remember that homework problems have hidden test cases. The public test cases in your notebook only verify that your answer is in the correct format and on the right track; your results on the hidden tests will be available to you on Gradescope after we release grades.
- Problem 4e) is the only part of Problem 4 that exists in the notebook, but it is **not autograded**: instead, to get credit for it, you'll need to screenshot your implementation of the required function and include it in your Homework 7 PDF.

In [None]:
# Run this cell.
import numpy as np
import pandas as pd
import time

pd.options.plotting.backend = "plotly"

import plotly.express as px
import plotly.io as pio
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

# Set default layout for all plotly figures
import plotly.graph_objs as go

custom_template = go.layout.Template(pio.templates["plotly_white"])
custom_template.layout.plot_bgcolor = "white"
custom_template.layout.paper_bgcolor = "white"
custom_template.layout.margin = dict(l=60, r=60, t=60, b=60)
custom_template.layout.width = 700
custom_template.layout.font = dict(
    family="Palatino Linotype, Palatino, serif",
    color="black"
)

pio.templates["custom"] = custom_template
pio.templates.default = "custom"

## Problem 3: Billy the Waiter 🧑‍🍳 (14 pts)

---

<div class="alert alert-success">
    <b>Note: In order for this problem to make the most sense, finish Problems 1 and 2 first.</b>
</div>

Run the cell below to load in a dataset containing information about the tips Billy received over the last month as a waiter at Mani Osteria.

In [None]:
tips = px.data.tips().rename(columns={'size': 'table_size'}).replace('Fri', 'Thur')
tips

Each row corresponds to a single table that he served. Throughout this question, our goal will be to predict `tip` using some or all of the other features in the DataFrame. We will do so by implementing all aspects of the linear regression model-building process **manually using `numpy`, i.e. WITHOUT using `sklearn` or other machine learning packages**. In Homework 8, we'll look at how to use `sklearn` to build more advanced linear models.

Let's start by just using `total_bill` to predict `tip`. Here's a scatter plot showing the relationship between the two variables:

In [None]:
fig = tips.plot(kind='scatter', x='total_bill', y='tip', title='Total Bill vs. Tip')
# fig.show(renderer='notebook')

We knew how to build a simple linear regression model that uses `total_bill` to predict `tip` over a month ago, well before we learned about linear algebra.

In [None]:
def optimal_slope(x, y):
    return np.corrcoef(x, y)[0, 1] * np.std(y) / np.std(x)

def optimal_intercept(x, y):
    return np.mean(y) - optimal_slope(x, y) * np.mean(x)

w1_star = optimal_slope(tips['total_bill'], tips['tip'])
w0_star = optimal_intercept(tips['total_bill'], tips['tip'])
print('w1_star: ', w1_star, '\nw0_star: ', w0_star)

In [None]:
# Predicted tip for a total bill of $15.
w0_star + w1_star * 15

But, equipped with our understanding of projections, we can implement the same formulas using linear algebra. That's exactly what we've done for you below.

In [None]:
def solve_normal_equations(X, y):
    '''
    Finds w*, as defined in Problem 1 of Homework 7.
    Equivalent to returning w* = np.linalg.inv(X.T @ X) @ X.T @ y
    when X.T @ X is invertible, but more efficient and numerically stable.
    '''
    return np.linalg.solve(X.T @ X, X.T @ y)

def compute_mse(X, y, w):
    '''
    Returns the mean squared error of the predictions of the model
    predicted y = X @ w.
    Often used in conjunction with solve_normal_equations, which finds the best w*.
    '''
    return np.mean((y - X @ w) ** 2)

def create_design_matrix(df, columns):
    '''
    Creates a design matrix X, whose first column is 1, 1, ..., 1,
    whose second column contains the first feature, third column contains the second feature, etc.
    '''
    df = df.copy()
    df['1'] = 1
    return df[['1'] + columns].to_numpy()

How do we use `solve_normal_equations` and `create_design_matrix`?

In [None]:
X_one_feature = create_design_matrix(tips, ['total_bill'])
y = tips['tip']

# Notice that X_one_feature has two columns.
X_one_feature

The vector $\vec w^*$, found below, contains both `w0_star` and `w1_star`, and we get it with just a single function call.

In [None]:
# Finding w*.
w_one_feature = solve_normal_equations(X_one_feature, y)
w_one_feature

<div class="alert alert-success">
    <b>Before proceeding, make sure you notice the fact that <code>w_one_feature</code>, found using linear algebra, contains both <code>w1_star</code> and <code>w0_star</code>, which we found using calculus at the start of the semester! This equivalence is what you proved in Problem 2 of this homework.</b>
</div>
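
The equivalence highlighted above can also be checked on data other than `tips`. The sketch below uses a small made-up dataset (`x_toy` and `y_toy` are hypothetical values) and compares the correlation-based formulas for the slope and intercept with the solution of the normal equations.

```python
import numpy as np

# Hypothetical toy data, not taken from the tips dataset.
x_toy = np.array([10.0, 15.0, 20.0, 25.0, 40.0])
y_toy = np.array([1.5, 2.0, 3.5, 3.0, 6.0])

# Calculus-based formulas for simple linear regression.
r = np.corrcoef(x_toy, y_toy)[0, 1]
slope = r * np.std(y_toy) / np.std(x_toy)
intercept = np.mean(y_toy) - slope * np.mean(x_toy)

# Linear algebra: solve the normal equations for the same data.
X_toy = np.column_stack([np.ones_like(x_toy), x_toy])
w_toy = np.linalg.solve(X_toy.T @ X_toy, X_toy.T @ y_toy)

# The first entry of w_toy is the intercept; the second is the slope.
print(w_toy, intercept, slope)
print(np.allclose(w_toy, [intercept, slope]))
```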

We can now use this hypothesis function to make predictions.

In [None]:
# Equivalent to 1 * w0_star + 15 * w1_star.
np.array([1, 15]) @ w_one_feature

In [None]:
px.scatter(tips, x='total_bill', y='tip', title='Tip vs. Total Bill')

x_range = np.linspace(0, 60)

fig = go.Figure()
fig.add_trace(go.Scatter(x=tips['total_bill'], y=y, mode='markers', name='actual'))
fig.add_trace(go.Scatter(x=x_range, 
                         y=w_one_feature[0] + w_one_feature[1] * x_range, 
                         name='Simple Linear Regression Model', 
                         line=dict(color='orange')))

fig.update_layout(xaxis_title='Total Bill', yaxis_title='Tip')
# fig.show(renderer='notebook')

The mean squared error of this hypothesis function is as follows:

In [None]:
mse_one_feature = compute_mse(X_one_feature, y, w_one_feature)
mse_one_feature

We'll define the DataFrame `hypothesis_functions` solely to keep track of the hypothesis functions we've used so far along with their MSEs. (We'll update this DataFrame for you.)

In [None]:
hypothesis_functions = pd.DataFrame(index=['total_bill'], columns=['MSE'])
hypothesis_functions.loc['total_bill'] = mse_one_feature
hypothesis_functions

### Problem 3a) (2 pts)

Let's suppose Billy works for a day as a waiter at the [Gandy Dancer](https://www.gandydancerrestaurant.com/), a fancy restaurant. He waits a table whose total bill is \$350. He decides to use the above trained model to predict the tip that he will receive.

1. What tip would the above single-feature model predict for a total bill of \$350? In the cell below, assign the answer to the variable `prediction_for_350`. (Try and use the `@` symbol as part of your answer!)
1. Is this prediction likely to be accurate? If so, in the cell below, assign the variable `is_accurate` to `True`, otherwise, assign it to `False`. Before assigning `is_accurate` to either `True` or `False`, you should think about what makes a prediction about the future likely to be accurate vs. not.

**You should not round any numbers at any point in this question**!

In [None]:
prediction_for_350 = ...
is_accurate = ...

# Don't change the line below.
print(f'The predicted tip for a total bill of $350 is ${round(prediction_for_350, 2)}, and we {"do" if is_accurate else "do not"} think this prediction is likely to be accurate.')

In [None]:
grader.check("p03a")

### Problem 3b) (2 pts)

Now, let's suppose we want to use `total_bill` **and** `table_size` to predict `tip`. This amounts to creating an $n \times \mathbf{3}$ design matrix, whose first two columns are the same as in `X_one_feature`, but with a third column containing table sizes.

$$X = \begin{bmatrix} 1 & \text{total bill}_1 & \text{table size}_1 \\ 1 & \text{total bill}_2 & \text{table size}_2 \\ \vdots & \vdots & \vdots \\ 1 & \text{total bill}_n & \text{table size}_n \end{bmatrix}$$

Below, complete the following tasks:

1. Assign `X_two_features` to the design matrix for this new hypothesis function.
1. Assign `w_two_features` to the optimal parameter vector for this new hypothesis function.
1. Assign `mse_two_features` to the mean squared error of this hypothesis function.
1. Did adding `table_size` as a feature make our hypothesis function significantly more accurate as compared to the hypothesis function that used just `total_bill`? If so, assign `much_more_accurate` to `True`, otherwise assign it to `False`.

Tasks 1, 2, and 3 should each only take one line; remember to use the helper functions we've already defined.

In [None]:
X_two_features = ...
w_two_features = ...
mse_two_features = ...
much_more_accurate = ...

# Don't change the lines below.
print('first five rows of design matrix:\n', X_two_features[:5])
print('\noptimal parameter vector:', w_two_features)
print('MSE:', mse_two_features)
print('much more accurate:', 'yes' if much_more_accurate else 'no')

In [None]:
grader.check("p03b")

If you completed Problem 3b) correctly, you should see a 3D scatter plot of the original data points and your hypothesis function below.

In [None]:
XX, YY = np.mgrid[0:60:2, 0:8:2]
Z = w_two_features[0] + w_two_features[1] * XX + w_two_features[2] * YY
plane = go.Surface(x=XX, y=YY, z=Z, colorscale='Oranges')

fig = go.Figure(data=[plane])
fig.add_trace(go.Scatter3d(x=tips['total_bill'], 
                           y=tips['table_size'], 
                           z=tips['tip'], mode='markers', marker = {'color': '#3d81f6'}))

fig.update_layout(scene = dict(
    xaxis_title='Total Bill',
    yaxis_title='Table Size',
    zaxis_title='Tip'), title='Tip vs. Total Bill')

# fig.show(renderer='notebook')

Don't change this cell, just run it.

In [None]:
hypothesis_functions.loc['total_bill and table_size'] = mse_two_features
hypothesis_functions

### Problem 3c) (2 pts)

Which feature is more important in predicting tip – `total_bill` or `table_size`?

Assuming you answered Problem 3b) correctly, run the cell below to create a **standardized** design matrix, where the two columns for `total_bill` and `table_size` are standardized to have mean 0 and standard deviation 1 (i.e. converted to $z$-scores).

In [None]:
X_two_features_standardized = X_two_features.copy()
X_two_features_standardized[:, 1:] = (X_two_features[:, 1:] - np.mean(X_two_features[:, 1:], axis=0)) / X_two_features[:, 1:].std(axis=0, ddof=0)
X_two_features_standardized[:5]

Below,

1. Assign `w_two_features_standardized` to an array containing the standardized regression coefficients for our two-feature hypothesis function.
1. Assign `more_important` to either `'total_bill'` or `'table_size'`, depending on which of the two features you think is more important in predicting `tip`.

_Hint: We haven't talked about standardized regression coefficients in class yet. We want you to think about what standardizing the features actually does._

In [None]:
w_two_features_standardized = ...
more_important = ...
w_two_features_standardized, more_important

In [None]:
grader.check("p03c")

Don't change this cell, just run it.

In [None]:
hypothesis_functions.loc['total_bill and table_size std'] = compute_mse(X_two_features_standardized, y, w_two_features_standardized)
hypothesis_functions

The MSEs of the last two models were the same! The only difference is that when we standardized the features in creating the most recent hypothesis function, we were able to compare the coefficients directly.
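
To see why standardizing changes the coefficients but not the MSE, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the seed, the names `feature_a` and `feature_b`, and the helper `fit_and_mse`. Standardizing only shifts and rescales the feature columns, so the columns of the design matrix span the same subspace and the projection of the target onto that subspace is unchanged.

```python
import numpy as np

rng = np.random.default_rng(245)  # arbitrary seed, used only for reproducibility
n = 50

# Two synthetic features on very different scales.
feature_a = rng.uniform(0, 50, size=n)
feature_b = rng.integers(1, 7, size=n).astype(float)
target = 1.0 + 0.1 * feature_a + 0.2 * feature_b + rng.normal(0, 0.5, size=n)

def fit_and_mse(X, y):
    # Hypothetical helper: solves the normal equations and returns (w, MSE).
    w = np.linalg.solve(X.T @ X, X.T @ y)
    return w, np.mean((y - X @ w) ** 2)

X_raw = np.column_stack([np.ones(n), feature_a, feature_b])
features = X_raw[:, 1:]
sds = features.std(axis=0)
X_std = np.column_stack([np.ones(n), (features - features.mean(axis=0)) / sds])

w_raw, mse_raw = fit_and_mse(X_raw, target)
w_std, mse_std = fit_and_mse(X_std, target)

print(np.isclose(mse_raw, mse_std))              # same MSE
print(np.allclose(w_std[1:], w_raw[1:] * sds))   # each coefficient is rescaled by its feature's SD
print(np.isclose(w_std[0], target.mean()))       # the intercept becomes the mean of the target
```

Each standardized coefficient is the original coefficient multiplied by that feature's standard deviation, which is what puts the coefficients on a common scale.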

Now, let's revisit the scatter plot of `'tip'` vs. `'total_bill'`:

In [None]:
fig = px.scatter(tips, x='total_bill', y='tip', title='Tip vs. Total Bill')
fig.update_layout(xaxis_title='Total Bill', yaxis_title='Tip')
# fig.show(renderer='notebook')

Let's see if using higher-degree polynomial features yields a better hypothesis function. Specifically, let's try and create a degree 4 polynomial hypothesis function, using the features `total_bill`, `total_bill^2`, `total_bill^3`, and `total_bill^4`.

Again, we recognize that we haven't seen this idea in lecture yet; this part of the problem is meant to give you a taste of how polynomial regression works.

In [None]:
# Making a copy of the tips DataFrame so that we don't modify the original data.
tips_with_poly_features = tips.copy()

In [None]:
# Computing total_bill^2.
tips_with_poly_features['total_bill^2'] = tips_with_poly_features['total_bill'] ** 2
tips_with_poly_features.head()

### Problem 3d) (3 pts)

Below,

1. Add columns `total_bill^3` and `total_bill^4` to the DataFrame `tips_with_poly_features`.
1. Define `X_poly`, `w_poly`, and `mse_poly` to be the design matrix, optimal parameter vector, and mean squared error of our new 4th degree polynomial hypothesis function. Note that this hypothesis function should be of the form:

    $$h(x_i) = w_0 + w_1 x_i + w_2 x_i^2 + w_3 x_i^3 + w_4 x_i^4$$

    where $x_i$ is the `total_bill` of table $i$.

Again, this subpart should only take a few minutes.

In [None]:
tips_with_poly_features = ...
X_poly = ...
w_poly = ...
mse_poly = ...

# Don't change the lines below.
print('first five rows of design matrix:\n', X_poly[:5])
print('\noptimal parameter vector:', w_poly)
print('MSE:', mse_poly)

In [None]:
grader.check("p03d")

Don't change this cell, just run it.

In [None]:
hypothesis_functions.loc['total_bill 4th degree poly'] = mse_poly
hypothesis_functions

Assuming you completed Problem 3d) correctly, run the following cell to see a visualization of our 4th degree polynomial hypothesis function.

In [None]:
x_range = np.linspace(0, 50)

fig = go.Figure()
fig.add_trace(go.Scatter(x=tips['total_bill'], y=tips['tip'], mode='markers', name='actual'))
fig.add_trace(go.Scatter(x=x_range, 
                         y=w_poly[0] + w_poly[1] * (x_range) + w_poly[2] * (x_range**2) + \
                             w_poly[3] * (x_range**3) + w_poly[4] * (x_range**4),
                         name='4th Degree Polynomial Model', 
                         line=dict(color='orange', width=5)))

fig.update_layout(xaxis_title='Total Bill', yaxis_title='Tip', title='Tip vs. Total Bill')
# fig.show(renderer='notebook')

The 4th degree polynomial hypothesis function seems to fit the data the best so far, since its MSE is the lowest.

In [None]:
hypothesis_functions

But let's see what happens when we "zoom out" and look at how this hypothesis function behaves.

In [None]:
x_range = np.linspace(-20, 70)

fig = go.Figure()
fig.add_trace(go.Scatter(x=tips['total_bill'], y=tips['tip'], mode='markers', name='actual'))
fig.add_trace(go.Scatter(x=x_range, 
                         y=w_poly[0] + w_poly[1] * (x_range) + w_poly[2] * (x_range**2) + \
                             w_poly[3] * (x_range**3) + w_poly[4] * (x_range**4),
                         name='4th Degree Polynomial Model', 
                         line=dict(color='orange', width=5)))

fig.update_layout(xaxis_title='Total Bill', yaxis_title='Tip', title='Tip vs. Total Bill')
# fig.show(renderer='notebook')

If we keep increasing the degrees of the polynomial features we use, our hypothesis function will look more and more like a polynomial that passes through every single data point. **Think** about **why** a model with a lower MSE is not necessarily better than a model with a higher MSE. You don't need to write your answer anywhere, but discuss it with someone (either a peer or an IA/instructor) before submitting this homework.
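
The sketch below illustrates this on synthetic data, not on `tips`. The seed, the arrays `x_toy` and `y_toy`, and the choice of degrees 1 through 8 are all assumptions made for illustration. It uses `np.vander` to build the polynomial design matrices and `np.linalg.lstsq` in place of the normal equations, since high-degree polynomial design matrices tend to be badly conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility

# Hypothetical data: a truly linear relationship plus noise.
x_toy = np.linspace(0, 1, 12)
y_toy = 2 + 3 * x_toy + rng.normal(0, 0.3, size=x_toy.size)

degrees = range(1, 9)
train_mses = []
predictions_outside = []
for degree in degrees:
    # Columns are 1, x, x^2, ..., x^degree.
    X_deg = np.vander(x_toy, degree + 1, increasing=True)
    w_deg, *_ = np.linalg.lstsq(X_deg, y_toy, rcond=None)
    train_mses.append(np.mean((y_toy - X_deg @ w_deg) ** 2))

    # Prediction at x = 2, which is outside the range of the training data.
    x_new = np.vander(np.array([2.0]), degree + 1, increasing=True)
    predictions_outside.append((x_new @ w_deg)[0])

for degree, mse, pred in zip(degrees, train_mses, predictions_outside):
    print(f'degree {degree}: training MSE = {mse:.4f}, prediction at x = 2: {pred:.2f}')
```

Because each polynomial model contains the previous one as a special case, the MSE on the training data can never go up as the degree increases. The predictions outside the range of the data are a separate matter, and are worth comparing against the true value of $2 + 3 \cdot 2 = 8$.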

### Problem 3e) (2 pts)

Let's again suppose Billy works for a day as a waiter at [The Gandy Dancer](https://www.gandydancerrestaurant.com/). He waits a table whose total bill is \$350. He decides to use the above 4th degree polynomial hypothesis function to predict the tip that he will receive.

What tip would the above polynomial model predict for a total bill of \$350? In the cell below, assign the answer to the variable `poly_prediction_for_350`.

In [None]:
poly_prediction_for_350 = ...

# Don't change the line below.
print(f'The predicted tip for a total bill of $350 is ${round(poly_prediction_for_350, 2)}.')

In [None]:
grader.check("p03e")

There was another column in our original DataFrame, `tips`, that we haven't yet looked at: `day`.

In [None]:
tips.head()

In [None]:
fig = px.bar(tips['day'].value_counts().loc[['Thur', 'Sat', 'Sun']])
# fig.show(renderer='notebook')

Note that unlike `total_bill` and `table_size`, `day` is **categorical**. This means there's no easy way to put it in our design matrix or find the best hypothesis function.

A naïve solution would be to encode `'Thur'` as 1, `'Sat'` as 2, and `'Sun'` as 3, but this would make it seem like Sunday is "more" than Saturday or Thursday in some regard, which it is not – these are all just different days of the week.

A more robust and common solution is called **one hot encoding** (OHE). To show you how it works, we'll first get it working on a toy example. Let's pretend we have a DataFrame with just 5 rows and 2 columns, `total_bill` and `day`. Call it `mini_tips`.

In [None]:
mini_tips = pd.DataFrame()
mini_tips['total_bill'] = tips['total_bill'].iloc[:5]
mini_tips['day'] = ['Sat', 'Sun', 'Sun', 'Thur', 'Sat']
mini_tips

When we **one hot encode** a categorical variable, we create a new column for each unique value of that categorical variable. In this case, we'd create three new columns, one each for `'Thur'`, `'Sat'`, and `'Sun'`.

Each of these new columns is binary, meaning they only contain the values 1 and 0. 
- The new column for `'Thur'`, which we'll call `is_thur`, will contain a 1 for rows where the value of `'day'` is `'Thur'`, and 0 for all other rows. 
- Similarly, the new column for `'Sun'`, which we'll call `is_sun`, will contain a 1 for rows where the value of `'day'` is `'Sun'`, and 0 for all other rows.

In [None]:
(mini_tips['day'] == 'Thur')

Repeating this for all three days:

In [None]:
mini_tips['is_thur'] = (mini_tips['day'] == 'Thur').astype(int)
mini_tips['is_sat'] = (mini_tips['day'] == 'Sat').astype(int)
mini_tips['is_sun'] = (mini_tips['day'] == 'Sun').astype(int)

# Dropping the 'day' column. We've encoded it numerically, we don't need it anymore.
mini_tips = mini_tips.drop(columns=['day'])
mini_tips

Now we've converted a categorical feature into three numerical features, so we're good to go!
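
As an aside, the same encoding can be produced in a single step with `numpy` broadcasting. This is just a sketch of an alternative, not the approach the rest of the notebook uses; the arrays `days` and `categories` are defined here only for illustration.

```python
import numpy as np

# The same five days as in mini_tips.
days = np.array(['Sat', 'Sun', 'Sun', 'Thur', 'Sat'])
categories = np.array(['Thur', 'Sat', 'Sun'])

# Compare every day (as a column) against every category (as a row).
# Entry (i, j) is 1 when days[i] equals categories[j], and 0 otherwise.
one_hot = (days[:, None] == categories[None, :]).astype(int)

print(one_hot)
# [[0 1 0]
#  [0 0 1]
#  [0 0 1]
#  [1 0 0]
#  [0 1 0]]
```

The columns of `one_hot` correspond to `is_thur`, `is_sat`, and `is_sun`, in that order, and each row contains exactly one 1.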

**There's just one more thing.** Since we're used to fitting linear hypothesis functions with an intercept term, our design matrix generally has a column of all 1s in it. In the case of `mini_tips`, which contains three binary columns, this would look like:

In [None]:
create_design_matrix(mini_tips, list(mini_tips.columns))

This design matrix contains redundant information! Specifically, we can recreate the column of all 1s by adding together the three one hot encoded columns.

In [None]:
# Note that the 0, 1, 2, 3, 4 that you see is the index of this Series, which is irrelevant for our purposes.
mini_tips['is_thur'] + mini_tips['is_sat'] + mini_tips['is_sun']

What this means is $X$'s columns aren't linearly independent, so $X^TX$ (which has the same rank as $X$) is not **full rank** and hence is **not invertible**.

This means that a unique minimizer

$$\vec w^* = (X^TX)^{-1}X^T \vec y$$

doesn't exist, and instead, there are infinitely many optimal $\vec w^*$'s that minimize mean squared error. These all satisfy the **normal equations**,

$$X^TX \vec w^* = X^T \vec y$$

Again, we'll address this idea in lectures to come, so don't worry if this is a bit confusing. This is more meant to be a preview.
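
For a concrete look at what goes wrong, here is a small sketch. The bill amounts and the parameter vectors `w_a` and `w_b` are hypothetical values chosen for illustration. It checks the rank of a design matrix that contains the column of 1s along with all three one hot encoded columns, and shows two different parameter vectors that make identical predictions.

```python
import numpy as np

# Hypothetical bills; the day indicators match the mini_tips example.
bills = np.array([17.0, 10.0, 21.0, 24.0, 25.0])
is_thur = np.array([0, 0, 0, 1, 0])
is_sat = np.array([1, 0, 0, 0, 1])
is_sun = np.array([0, 1, 1, 0, 0])

# Column of 1s, the bill, and ALL three one hot encoded columns.
X_full = np.column_stack([np.ones(5), bills, is_thur, is_sat, is_sun])

# 5 columns, but only 4 of them are linearly independent.
print(X_full.shape[1], np.linalg.matrix_rank(X_full))

# Adding c to the intercept and subtracting c from every day's coefficient
# leaves every prediction unchanged, since exactly one indicator is 1 in each row.
w_a = np.array([1.0, 0.1, 0.0, -0.2, 0.3])
c = 5.0
w_b = w_a + c * np.array([1.0, 0.0, -1.0, -1.0, -1.0])

print(np.allclose(X_full @ w_a, X_full @ w_b))
```

Since the two vectors make the same predictions, they have the same MSE, and the same is true for any other value of `c`. That is where the infinitely many minimizers come from.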

**For now, know this – the way to avoid this problem is to drop one of the one hot encoded columns.** That way, there is no redundant information in the design matrix, and we don't run into any issues. This is not "getting rid" of any information, so it will not impact our predictions – if we know it is not Saturday or Sunday, it must be Thursday.
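
Here is a minimal sketch of the claim that dropping a column does not impact predictions, using made-up data. The arrays `bills`, `days`, and `tips_toy` and the helper `fitted_values` are hypothetical. It fits the model three times, dropping a different day's column each time, and compares the resulting predictions.

```python
import numpy as np

# Hypothetical toy data: 7 tables across three days.
bills = np.array([17.0, 10.0, 21.0, 24.0, 25.0, 9.0, 30.0])
days = np.array(['Sat', 'Sun', 'Sun', 'Thur', 'Sat', 'Thur', 'Sun'])
tips_toy = np.array([1.0, 1.7, 3.5, 3.3, 3.6, 2.0, 4.1])

def fitted_values(dropped_day):
    # Hypothetical helper: one hot encodes every day except dropped_day,
    # solves the normal equations, and returns the model's predictions.
    kept = [d for d in ['Thur', 'Sat', 'Sun'] if d != dropped_day]
    indicators = [(days == d).astype(float) for d in kept]
    X = np.column_stack([np.ones(len(bills)), bills] + indicators)
    w = np.linalg.solve(X.T @ X, X.T @ tips_toy)
    return X @ w

preds = {d: fitted_values(d) for d in ['Thur', 'Sat', 'Sun']}

print(np.allclose(preds['Thur'], preds['Sat']))
print(np.allclose(preds['Thur'], preds['Sun']))
```

The individual coefficients differ depending on which column is dropped, since the dropped day becomes the baseline that the other days are measured against, but the predictions themselves do not.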

In [None]:
# We've arbitrarily chosen to drop 'is_thur', but it would make no difference if we instead dropped 'is_sat' or 'is_sun'.
mini_tips = mini_tips.drop(columns=['is_thur'])
mini_tips

In [None]:
create_design_matrix(mini_tips, list(mini_tips.columns))

Now we have a design matrix that is ready to go. Let's replicate this process on our full dataset.

In [None]:
# Run this cell.
tips_ohe = tips.copy()
tips_ohe['is_sat'] = (tips_ohe['day'] == 'Sat').astype(int)
tips_ohe['is_sun'] = (tips_ohe['day'] == 'Sun').astype(int)

# Design matrix with two one-hot encoded columns.
X_ohe = create_design_matrix(tips_ohe, ['total_bill', 'is_sat', 'is_sun'])
print('first five rows of design matrix:\n', X_ohe[:5])

In [None]:
w_ohe = solve_normal_equations(X_ohe, y)
w_ohe

Let's now plot the resulting hypothesis function. We've zoomed into the region where the `total_bill`s are less than \$30 and `tip`s are less than \$4 to make the hypothesis function clearer.

In [None]:
x_range = np.linspace(0, 30)

under_30 = tips[(tips['total_bill'] < 30) & (tips['tip'] < 4)]

fig = go.Figure()
fig.add_trace(go.Scatter(x=under_30['total_bill'], y=under_30['tip'], mode='markers', name='actual'))

# Line for Thursday.
fig.add_trace(go.Scatter(x=x_range, 
                         y=w_ohe[0] + w_ohe[1] * x_range, 
                         name='Thursday', 
                         line=dict(color='orange', width=4)))

# Line for Saturday.
fig.add_trace(go.Scatter(x=x_range, 
                         y=w_ohe[0] + w_ohe[2] + w_ohe[1] * x_range, 
                         name='Saturday', 
                         line=dict(color='purple', width=4)))

# Line for Sunday.
fig.add_trace(go.Scatter(x=x_range, 
                         y=w_ohe[0] + w_ohe[3] + w_ohe[1] * x_range, 
                         name='Sunday', 
                         line=dict(color='green', width=4)))

fig.update_layout(xaxis_title='Total Bill', yaxis_title='Tip', title='Tip vs. Total Bill')
# fig.show(renderer='notebook')

It looks like the model is actually three separate lines, each of which has the same slope but a different intercept!

Let's try and understand why this is the case.

In [None]:
w_ohe

Our hypothesis function is of the following form (approximately, since the coefficients are rounded):

$$\text{predicted tip}_i = 0.925 + 0.105 (\text{total bill}_i) - 0.072 (\text{is saturday}_i) + 0.089 (\text{is sunday}_i)$$

### Problem 3f) (3 pts)

Below, assign `intercept_thur`, `intercept_sat`, and `intercept_sun` to the **$y$-intercepts** of the three lines above, corresponding to when the `'day'` is Thursday, Saturday, or Sunday. You should do this using code, pulling values from `w_ohe`, but you should think conceptually about where each of the three intercepts comes from.

In [None]:
intercept_thur = ...
intercept_sat = ...
intercept_sun = ...

# Don't change the lines below.
print('Intercept for Thursday:', intercept_thur)
print('Intercept for Saturday:', intercept_sat)
print('Intercept for Sunday:', intercept_sun)

In [None]:
grader.check("p03f")

Just for completeness, we'll also compute the MSE of this hypothesis function:

In [None]:
mse_ohe = compute_mse(X_ohe, y, w_ohe)
hypothesis_functions.loc['total_bill + OHE day'] = mse_ohe
hypothesis_functions

This new hypothesis function didn't have a much lower MSE than the hypothesis function that used `total_bill` only. That's not all that surprising, since the three lines above look quite similar.

That's all for this problem! Remember, this problem is entirely autograded, and has some hidden tests. Make sure to submit it to the Homework 7, Problem 3 autograder on Gradescope and verify all public tests pass there. 

<div class="alert alert-success">
    <b>Make sure you also understand how one hot encoding works, since it's likely to be a topic that appears on Midterm 2.</b>
</div>

## Problem 4: Orthogonalization

---

The entirety of Problem 4 is manually graded; all you need to do here is implement the function `orthogonalize` below and include screenshots of your implementation and its outputs. As usual, there are <span style="color: orange; font-weight: bold">orange lines</span> around what you need to screenshot.

First, allow us to give you some hints on how to work with arrays. Consider the following examples.

In [None]:
X = np.array([[3, 5],
              [1, 2],
              [4, 0],
              [0, 3],
              [9, -2]])

X.shape

In [None]:
X[:, 0]

In [None]:
X[:, 1].reshape(-1, 1)

In [None]:
np.dot(X[:, 0], X[:, 1])

In [None]:
np.zeros_like(X)

In [None]:
np.eye(4)

In [None]:
# Coupled with the above, you can use np.eye and np.allclose to
# verify that you implemented orthogonalize correctly!
np.allclose([[3, 4]], [[2.9999999999, 4]])
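
To make the hint above concrete, here is a sketch of how such a check might look. The matrices `Q_rot` and `M` are hypothetical examples, not outputs of `orthogonalize`. A matrix $Q$ with $d$ columns has orthonormal columns exactly when $Q^TQ$ is the $d \times d$ identity matrix.

```python
import numpy as np

# A 2D rotation matrix, whose columns are known to be orthonormal.
theta = np.pi / 6
Q_rot = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

# Entry (i, j) of Q.T @ Q is the dot product of columns i and j.
print(np.allclose(Q_rot.T @ Q_rot, np.eye(2)))  # True

# A matrix whose columns are linearly independent but NOT orthonormal.
M = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(np.allclose(M.T @ M, np.eye(2)))  # False
```

We use `np.allclose` rather than `==` because floating point arithmetic rarely produces exact 0s and 1s.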

### Problem 4e)

Complete the implementation of the function `orthogonalize`, which takes in an $n \times d$ matrix $V$ (stored as a 2D array) whose columns are linearly independent, and returns an $n \times d$ matrix $Q$ (also stored as a 2D array) whose columns are orthonormal and span the same subspace of $\mathbb{R}^n$ that $V$'s columns do.

Example behavior is given below.

```python
# This is the example we walked through in Problem 4.
>>> orthogonalize(np.array([[1.0,  1.0, 1.0],
                            [-1.0, 0.0, 1.0],
                            [1.0,  1.0, 2.0]]))
array([[ 5.77350269e-01,  4.08248290e-01, -7.07106781e-01],
       [-5.77350269e-01,  8.16496581e-01, -3.14018492e-16],
       [ 5.77350269e-01,  4.08248290e-01,  7.07106781e-01]])
```

Feel free to implement helper functions if you'd like (we did).

<hr style="border: 0; height: 4px; background: orange;">

In [None]:
def orthogonalize(V):
    """
    Runs the Gram-Schmidt process on the columns of V.
    The shape of Q should be the same as the shape of V.
    Assume V has at least 1 column and 1 row.
    """
    V = V.copy() # Ensures that you don't make any modifications to the underlying array.
    ...
    
# Feel free to test out your implementation below.
# Recent versions of numpy will error if an in-place operation (e.g. -= or /=) tries to
# store floats in an array of ints, which is why we've initialized the array with floats.
orthogonalize(np.array([[4.0, 1.0], 
                        [3.0, 5.0]]))

In [None]:
# DO NOT CHANGE THIS CELL;
# include a screenshot of the output of this cell in your submitted PDF.
V_test_1 = np.array([[1.0,  1.0, 1.0],
                     [-1.0, 0.0, 1.0],
                     [1.0,  1.0, 2.0]])
Q_test_1 = orthogonalize(V_test_1)
Q_test_1

In [None]:
# DO NOT CHANGE THIS CELL;
# include a screenshot of the output of this cell in your submitted PDF.
V_test_2 = np.array([[3, 5, 4, 0],
                     [1, 0, 9, -1],
                     [5, 1, 3, 2],
                     [0, 8, 0, 8],
                     [3, 2, -1.5, 4],
                     [0, 0, 0, 1]])
Q_test_2 = orthogonalize(V_test_2)
Q_test_2

<hr style="border: 0; height: 4px; background: orange;">

In your submission for Problem 4e), make sure to include screenshots of:
- Your implementations of `orthogonalize` **and** any helper functions you defined.
- The inputs and outputs for the cells involving `V_test_1` and `V_test_2`.

Feel free to experiment beyond these test cases, but you don't have to screenshot any additional code.

### Problem 4f)

This is not a programming problem, but you might want to use your implementation of `orthogonalize` to figure out how $A = QR$ works. A pretty big hint is below.

In [None]:
A = np.array([[2.0,  1, -2],
              [1,    4, -5],
              [-4,   3,  6],
              [1,    0,  9]])

A

In [None]:
Q = orthogonalize(A)
Q

In [None]:
Q @ Q.T @ A

No part of the code above needs to be included in your notebook for Problem 4f).

## Finish Line 🏁

Remember:
1. To get credit for Problem 3, submit this notebook to Gradescope.
    1. Select `Kernel -> Restart & Run All` to ensure that you have executed all cells, including the test cells.
    2. Read through the notebook to make sure everything is fine and all public tests passed.
    3. Run the cell below to run all tests, and make sure that they all pass.
    4. Download your notebook using `File -> Download`, then upload your notebook to Gradescope under "Homework 7, Problem 3 Code".
    5. Stick around for a few minutes while the Gradescope autograder grades your work. Make sure you see that all **public tests** have passed on Gradescope. **Remember that homeworks have hidden tests!**
2. To get credit for Problem 4e), include a screenshot of your implementation of `orthogonalize` and the outputs of the two provided tests in your Homework 7 PDF.