# Pset-2

## Problem 1

Consider a linear regression problem in which your input data $x$ lies within the narrow domain $[100000, 100001]$. You want to fit a model of the form:

$$
y \approx \beta_0 + \beta_1 x.
$$

You can represent this model in two different bases:
1. The **standard monomial basis**: $\{1, x\}$,
2. A **shifted basis**: $\{1, x - 100000.5\}$.

For $N$ data points $\{(x_i, y_i)\}_{i=1}^N$, the mean squared error (MSE) loss in the standard and shifted bases is, respectively:

$$
L(\beta_0, \beta_1) = \frac{1}{N} \sum_{i=1}^N \bigl(\beta_0 + \beta_1 x_i - y_i\bigr)^2,
$$

and

$$
\tilde{L}(\tilde{\beta}_0, \tilde{\beta}_1) = \frac{1}{N} \sum_{i=1}^N \Bigl(\tilde{\beta}_0 + \tilde{\beta}_1 (x_i - 100000.5) - y_i\Bigr)^2.
$$

#### Tasks:
- Compute the gradient of each loss function with respect to its parameters.
- Suppose you perform **one step** of gradient descent for each basis, using the **same** learning rate and **same** initial parameter guesses. How much does the loss decrease in each case? Explain any observed differences.
- A useful diagnostic in gradient-based methods is to examine how well the local gradient direction aligns with the true direction toward the optimum. Let $\boldsymbol{\beta}^*$ be the optimal parameter vector that minimizes the loss, and let $\nabla L(\boldsymbol{\beta})$ denote the gradient of the loss at some iterate $\boldsymbol{\beta}$.

  - **Compute or estimate** the angle between $\boldsymbol{\beta}^*$ and $\nabla L(\boldsymbol{\beta})$ for each of the two bases:

    $$
       \theta = \cos^{-1}\!\Biggl(
         \frac{\bigl\langle \boldsymbol{\beta}^*, \nabla L(\boldsymbol{\beta}) \bigr\rangle}
         {\|\boldsymbol{\beta}^*\|\;\|\nabla L(\boldsymbol{\beta})\|}
       \Biggr).
    $$

  - How large is the angle in each case, and what does this imply about the alignment of the gradient direction toward the optimal solution? Relate your observations to the conditioning of each basis and the implications for gradient-descent convergence.
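
As a starting point for the one-step experiment, here is a minimal sketch using only the Python standard library. The synthetic data, the initial guess $(0, 0)$, and the learning rate `lr = 0.1` are all assumptions chosen for illustration, not values prescribed by the problem. The gradient used is $\partial L/\partial \beta_0 = \frac{2}{N}\sum_i r_i$ and $\partial L/\partial \beta_1 = \frac{2}{N}\sum_i r_i \phi_i$, where $r_i$ is the residual and $\phi_i$ is the second basis function evaluated at $x_i$.

```python
import random

def mse_and_grad(phi, y, b0, b1):
    """Return the MSE loss and its gradient for the model b0 + b1 * phi_i."""
    n = len(phi)
    r = [b0 + b1 * p - yi for p, yi in zip(phi, y)]      # residuals
    loss = sum(ri * ri for ri in r) / n
    g0 = 2.0 / n * sum(r)                                # dL/db0
    g1 = 2.0 / n * sum(ri * p for ri, p in zip(r, phi))  # dL/db1
    return loss, (g0, g1)

def one_step(phi, y, b0, b1, lr):
    """Loss before and after a single gradient-descent step."""
    loss_before, (g0, g1) = mse_and_grad(phi, y, b0, b1)
    loss_after, _ = mse_and_grad(phi, y, b0 - lr * g0, b1 - lr * g1)
    return loss_before, loss_after

# Synthetic data (an assumption): x in [100000, 100001], noisy linear y.
rng = random.Random(0)
N = 200
x = [100000.0 + rng.random() for _ in range(N)]
y = [1.0 + 2.0 * (xi - 100000.5) + 0.01 * rng.gauss(0.0, 1.0) for xi in x]

lr = 0.1  # assumed learning rate, shared by both bases
raw = one_step(x, y, 0.0, 0.0, lr)
shifted = one_step([xi - 100000.5 for xi in x], y, 0.0, 0.0, lr)

print("standard basis: loss before/after =", raw)
print("shifted basis:  loss before/after =", shifted)
```

Varying `lr` and the initial guess shows how sensitive each basis is to the step size, which is a useful lead-in to the angle diagnostic.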

---

## Problem 2

We know that if $f(x)$ is analytic in an open real interval containing $[-1,1]$, then there exist constants $C > 0$ and $K > 1$ such that:

$$
    \max_{x \in [-1,1]} |f(x) - p(x)| \leq C K^{-n},
$$

where $p$ is the unique polynomial of degree $n$ or less defined by interpolation on $n + 1$ Chebyshev second-kind points. But now consider the family of functions $f_m(x) = |x|^m$.

#### Tasks:
- **Differentiability:** How many continuous derivatives over $[-1,1]$ does $f_m$ possess?
- **Polynomial Interpolation:** Compute the polynomial interpolant of degree at most $n$ using $n + 1$ second-kind Chebyshev nodes in $[-1,1]$ for $n = 10, 20, 30, \dots, 100$. At each value of $n$, compute the max-norm error:

  $$
      \max |p(x) - f_m(x)|
  $$

  evaluated for at least 41000 values of $x$. Using a single log-log graph, plot the error as a function of $n$ for all six values $m = 1, 3, 5, 7, 9, 11$.

- **Asymptotic Hypothesis:** Based on the results of the two tasks above, form a hypothesis about the asymptotic behavior of the error for fixed $m$ as $n \to \infty$.
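
One possible way to set up the interpolation experiment is sketched below with the standard library only. It assumes the second-kind Chebyshev points $x_k = \cos(k\pi/n)$, $k = 0, \dots, n$, and evaluates the interpolant with the barycentric formula, whose weights for these points are $(-1)^k$, halved at the two endpoints. Here `n` is taken to be the polynomial degree, so the sketch uses $n + 1$ nodes as in the error bound above; adjust this if you prefer to count nodes. The function names and the uniform test grid are illustrative choices, and plotting is left out.

```python
import math

def cheb2_nodes(n):
    """The n + 1 second-kind Chebyshev points cos(k*pi/n), k = 0..n."""
    return [math.cos(k * math.pi / n) for k in range(n + 1)]

def bary_interp(nodes, fvals, x):
    """Evaluate the interpolant at x using the barycentric formula."""
    n = len(nodes) - 1
    num = den = 0.0
    for k, (xk, fk) in enumerate(zip(nodes, fvals)):
        if x == xk:                      # exactly on a node: return the data
            return fk
        w = -1.0 if k % 2 else 1.0       # weights (-1)^k ...
        if k == 0 or k == n:
            w *= 0.5                     # ... halved at the endpoints
        t = w / (x - xk)
        num += t * fk
        den += t
    return num / den

def max_error(f, n, n_test=41000):
    """Max-norm error of the degree-n interpolant on a uniform test grid."""
    nodes = cheb2_nodes(n)
    fvals = [f(xk) for xk in nodes]
    grid = [-1.0 + 2.0 * j / (n_test - 1) for j in range(n_test)]
    return max(abs(bary_interp(nodes, fvals, x) - f(x)) for x in grid)

# Example: errors for m = 1 at a few values of n (smaller grid for speed).
for n in (10, 20, 40):
    print(n, max_error(lambda x: abs(x) ** 1, n, n_test=4001))
```

Looping this over all required $m$ and $n$ and plotting the results on log-log axes gives the graph the task asks for. Pure Python is slow at the full grid size, so a vectorized version may be preferable.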

---

## Problem 3

Kepler found that the orbital period $\tau$ of a planet depends on its mean distance $R$ from the sun according to:

$$
    \tau = c R^{\alpha}
$$

for a simple rational number $\alpha$. Perform a linear least-squares fit from the following table in order to determine the most likely simple rational value of $\alpha$.

| Planet   | Distance from Sun (Mkm) | Orbital Period (days) |
|----------|-------------------------|-----------------------|
| Mercury  | 57.59                   | 87.99                 |
| Venus    | 108.11                  | 224.7                 |
| Earth    | 149.57                  | 365.26                |
| Mars     | 227.84                  | 686.98                |
| Jupiter  | 778.14                  | 4332.4                |
| Saturn   | 1427                    | 10759                 |
| Uranus   | 2870.3                  | 30684                 |
| Neptune  | 4499.9                  | 60188                 |
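
Taking logarithms turns the power law into a linear model, $\log \tau = \log c + \alpha \log R$, so $\alpha$ is the slope of a straight-line fit in log-log coordinates. The sketch below uses the closed-form solution for a one-variable least-squares line and only the standard library. The helper name `fit_line` is hypothetical, and natural logarithms are an arbitrary choice, since any base gives the same slope.

```python
import math

# Data copied from the table above (distance in Mkm, period in days).
R = [57.59, 108.11, 149.57, 227.84, 778.14, 1427, 2870.3, 4499.9]
tau = [87.99, 224.7, 365.26, 686.98, 4332.4, 10759, 30684, 60188]

def fit_line(u, v):
    """Least-squares fit v ~ a + b*u; returns (a, b)."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suu = sum((ui - ubar) ** 2 for ui in u)
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    b = suv / suu
    return vbar - b * ubar, b

log_c, alpha = fit_line([math.log(r) for r in R],
                        [math.log(t) for t in tau])
print("estimated alpha:", alpha)
print("estimated c:", math.exp(log_c))
```

Compare the printed slope with nearby simple fractions to choose the rational value of $\alpha$.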

---

## Problem 4: Condition Number

1. Let $f(x) = c^T x$ where $x$ is an $n$-component vector. What is the relative **condition number** of $f$ (depending on $x$ and $c$) in the $L_1$ norm? In the $L_2$ norm?

2. Suppose that you have two data points $(x_1, y_1)$ and $(x_2, y_2)$ and you linearly interpolate $y$ at a point $x$ ($x_1 \leq x \leq x_2$). If we think of $y(y_1, y_2)$ as a function of the input function values (keeping $x_1$ and $x_2$ fixed), show that the absolute **condition number** (in the $L_2$ norm) is bounded but the relative **condition number** can be infinite. Why does the absolute **condition number** make sense in this case?

3. If $Q$ is a square matrix with orthonormal columns (so that $Q^T Q = I$), explain why its induced norm (defined in class) and **condition number** are both 1 (in the $L_2$ norm). Such "orthogonal" (or "unitary") matrices are the best case for solving linear systems!
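
A quick numerical sanity check for the last item, as a sketch rather than a proof: for a $2 \times 2$ matrix $A$, sample unit vectors $x$ on the circle and record $\|Ax\|_2$. The largest value approximates the induced $L_2$ norm, and the ratio of largest to smallest approximates the condition number. The rotation angle, the sample count, and the comparison matrix `A` are arbitrary choices made for illustration.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def stretch_range(A, samples=3600):
    """Sampled min and max of ||A x||_2 over unit vectors x in R^2."""
    values = []
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        values.append(norm2(matvec(A, [math.cos(t), math.sin(t)])))
    return min(values), max(values)

theta = 0.7                                   # arbitrary rotation angle
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]     # orthonormal columns
A = [[1.0, 1.0],
     [0.0, 1.0]]                              # not orthogonal, for contrast

for name, M in (("Q", Q), ("A", A)):
    lo, hi = stretch_range(M)
    print(name, "norm ~", hi, " condition number ~", hi / lo)
```

The rotation leaves every sampled length unchanged, while the shear matrix stretches some directions and shrinks others. Your written explanation should say why this holds for any $Q$ with $Q^T Q = I$.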
