# NumPy Interview Exercise: Low-Rank Adapted Linear Layer

## Part 1: Single Low-Rank Adapter

### Inputs

You are given the following NumPy arrays:

- `x`: input activations, shape `(batch, d_in)`
- `W`: base weight matrix, shape `(d_out, d_in)`
- `A`: low-rank down-projection factor (applied to `x` first), shape `(r, d_in)`
- `B`: low-rank up-projection factor (applied second), shape `(d_out, r)`
- `alpha`: scalar scaling factor (float, default = 1.0)

### Output

Return an array `y` of shape `(batch, d_out)` defined as:

$$
y = x W^T + \alpha \cdot x A^T B^T
$$
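The formula can be evaluated in two orders, and the exercise is about choosing the cheaper one. The sketch below is only an illustration: the sizes and the `macs_*` names are assumptions, not part of the exercise. It checks that both orders agree and counts multiply-adds for each.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out, r = 8, 512, 256, 4  # illustrative sizes (assumed)

x = rng.standard_normal((batch, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))

# Order 1: project down to rank r first, then up to d_out.
# The only intermediate has shape (batch, r).
low_rank_first = (x @ A.T) @ B.T

# Order 2: build the dense update first (what the exercise forbids).
# The intermediate has shape (d_out, d_in).
dense_first = x @ (B @ A).T

# Both orders give the same result up to rounding.
same = np.allclose(low_rank_first, dense_first)

# Multiply-add counts for each order.
macs_low_rank = batch * r * d_in + batch * r * d_out
macs_dense = d_out * r * d_in + batch * d_out * d_in
print(same, macs_low_rank, macs_dense)
```

With `r` much smaller than `d_in` and `d_out`, the low-rank-first order does far less work and never allocates a `(d_out, d_in)` array.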

---

### Constraints

1. **Do not explicitly construct**:
   - `W_eff = W + alpha * (B @ A)`, or
   - `deltaW = B @ A`
2. You may use:
   - `@`, `np.matmul`, `np.dot`, or `np.einsum`
3. The function should raise a clear `ValueError` if shapes are incompatible.
4. Assume inputs are `float32` or `float64`; dtype promotion may follow NumPy defaults.
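As a rough illustration of constraint 3, shape checking could be factored into a small helper. The name `check_adapter_shapes` and the exact error messages are hypothetical; the exercise only asks that a clear `ValueError` is raised.

```python
import numpy as np

def check_adapter_shapes(x, W, A, B):
    """Hypothetical helper: validate shapes, return (batch, d_in, d_out, r)."""
    for name, arr in (("x", x), ("W", W), ("A", A), ("B", B)):
        if arr.ndim != 2:
            raise ValueError(f"{name} must be 2-D, got shape {arr.shape}")
    batch, d_in = x.shape
    if W.shape[1] != d_in:
        raise ValueError(f"W must have shape (d_out, {d_in}), got {W.shape}")
    d_out = W.shape[0]
    if A.shape[1] != d_in:
        raise ValueError(f"A must have shape (r, {d_in}), got {A.shape}")
    r = A.shape[0]
    if B.shape != (d_out, r):
        raise ValueError(f"B must have shape {(d_out, r)}, got {B.shape}")
    return batch, d_in, d_out, r

# Usage: a valid set of shapes, then an A with the wrong inner dimension.
x = np.zeros((4, 6), dtype=np.float32)
W = np.zeros((5, 6), dtype=np.float32)
A = np.zeros((2, 6), dtype=np.float32)
B = np.zeros((5, 2), dtype=np.float32)
dims = check_adapter_shapes(x, W, A, B)

try:
    check_adapter_shapes(x, W, np.zeros((2, 7), dtype=np.float32), B)
    message = None
except ValueError as err:
    message = str(err)
```

Naming the offending array and both the expected and actual shapes in the message is one reasonable reading of "clear"; other wordings would satisfy the constraint equally well.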

---

In [None]:
import numpy as np

def adapted_linear(x: np.ndarray,
                   W: np.ndarray,
                   A: np.ndarray,
                   B: np.ndarray,
                   alpha: float = 1.0) -> np.ndarray:
    ...



np.random.seed(0)
batch, d_in, d_out, r = 4, 6, 5, 2
x = np.random.randn(batch, d_in).astype(np.float32)
W = np.random.randn(d_out, d_in).astype(np.float32)
A = np.random.randn(r, d_in).astype(np.float32)
B = np.random.randn(d_out, r).astype(np.float32)
alpha = 0.3

y = adapted_linear(x, W, A, B, alpha)

# Reference: building W_eff explicitly is allowed in the test only.
W_eff = W + alpha * (B @ A)
y_ref = x @ W_eff.T

print(np.max(np.abs(y - y_ref)))
# expect roughly 1e-7 to 1e-6 for float32 (rounding error only)
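
For reference, here is one possible way to fill in the stub and turn the printed error into an assertion. This is a sketch, not the expected answer: the name `adapted_linear_sketch` and the tolerances are assumptions, and shape validation is left out to keep the arithmetic visible.

```python
import numpy as np

def adapted_linear_sketch(x, W, A, B, alpha=1.0):
    """One possible solution; never forms B @ A. Shape checks omitted."""
    base = x @ W.T          # (batch, d_out)
    h = x @ A.T             # (batch, r), the only adapter intermediate
    return base + alpha * (h @ B.T)

rng = np.random.default_rng(0)
batch, d_in, d_out, r = 4, 6, 5, 2
x = rng.standard_normal((batch, d_in)).astype(np.float32)
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
A = rng.standard_normal((r, d_in)).astype(np.float32)
B = rng.standard_normal((d_out, r)).astype(np.float32)

y = adapted_linear_sketch(x, W, A, B, alpha=0.3)

# Dense reference, allowed in the test only.
y_ref = x @ (W + 0.3 * (B @ A)).T

# Tolerances are assumed values that are loose enough for float32 rounding.
np.testing.assert_allclose(y, y_ref, rtol=1e-4, atol=1e-5)
```

The two results differ only in the order of floating-point operations, so a tolerance-based comparison is more robust than checking for exact equality.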

## Part 2: Multiple Low-Rank Adapters

In this extension, the linear layer is adapted by **multiple low-rank terms**, each with its own coefficient.

### Additional Inputs

- `As`: stacked low-rank down-projection factors (each applied to `x` first), shape `(n_adapters, r, d_in)`
- `Bs`: stacked low-rank up-projection factors (each applied second), shape `(n_adapters, d_out, r)`
- `coeffs`: mixing coefficients, shape `(n_adapters,)`

### Output Definition

The output is defined as:

$$
y = x W^T + \alpha \sum_{i=1}^{n_{\text{adapters}}} \text{coeffs}_i \, x A_i^T B_i^T
$$

### Constraints

- **Do not construct any full `(d_out, d_in)` matrix**, including:
  - `B_i @ A_i`
  - `W_eff = W + alpha * sum_i coeffs[i] * (B_i @ A_i)`
- Avoid creating large intermediate tensors, such as one of shape `(n_adapters, d_out, d_in)`.
- Broadcasting, reshaping, and transposing are allowed and expected.
- Raise a clear `ValueError` if input shapes are incompatible.
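
To make the memory constraint concrete, the adapter term can be computed with two `np.einsum` contractions whose largest intermediate has shape `(n_adapters, batch, r)`. This is a minimal sketch under the shapes listed above; the variable names `h` and `delta` are illustrative, and it covers only the adapter sum, not the base term or `alpha`.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out, r, n = 4, 6, 5, 2, 3
x = rng.standard_normal((batch, d_in))
As = rng.standard_normal((n, r, d_in))
Bs = rng.standard_normal((n, d_out, r))
coeffs = rng.standard_normal(n)

# Down-project with every adapter at once: result has shape (n, batch, r).
h = np.einsum("bi,nri->nbr", x, As)

# Apply each adapter's coefficient while the tensor is still rank-sized.
h = h * coeffs[:, None, None]

# Up-project and sum over adapters in one contraction: (batch, d_out).
delta = np.einsum("nbr,nor->bo", h, Bs)

# Cross-check against a plain loop that also avoids B_i @ A_i.
delta_loop = sum(coeffs[i] * (x @ As[i].T) @ Bs[i].T for i in range(n))
print(np.allclose(delta, delta_loop))
```

Scaling `h` rather than `Bs` keeps the extra work proportional to `n_adapters * batch * r`.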

In [None]:
def adapted_linear_multi(x: np.ndarray,
                          W: np.ndarray,
                          As: np.ndarray,
                          Bs: np.ndarray,
                          coeffs: np.ndarray,
                          alpha: float = 1.0) -> np.ndarray:
    ...


np.random.seed(0)
b, d_in, d_out, r, n = 4, 6, 5, 2, 3

x = np.random.randn(b, d_in).astype(np.float32)
W = np.random.randn(d_out, d_in).astype(np.float32)
As = np.random.randn(n, r, d_in).astype(np.float32)
Bs = np.random.randn(n, d_out, r).astype(np.float32)
coeffs = np.random.randn(n).astype(np.float32)
alpha = 0.3

y = adapted_linear_multi(x, W, As, Bs, coeffs, alpha)

# Reference: building W_eff explicitly is allowed in the test only.
W_eff = W.copy()
for i in range(n):
    W_eff += alpha * coeffs[i] * (Bs[i] @ As[i])

y_ref = x @ W_eff.T
print(np.max(np.abs(y - y_ref)))
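
One possible way to fill in the Part 2 stub is to fold all adapters into a single pair of factors of rank `n_adapters * r`, so the whole update costs two matrix products. The sketch below is an assumption about a reasonable solution, not the only acceptable one; `adapted_linear_multi_sketch`, `A_cat`, `B_cat`, and `scale` are illustrative names.

```python
import numpy as np

def adapted_linear_multi_sketch(x, W, As, Bs, coeffs, alpha=1.0):
    """One possible solution: treat all adapters as one rank-(n*r) adapter."""
    if x.ndim != 2 or W.ndim != 2 or As.ndim != 3 or Bs.ndim != 3 or coeffs.ndim != 1:
        raise ValueError("expected x and W 2-D, As and Bs 3-D, coeffs 1-D")
    n, r, d_in = As.shape
    d_out = W.shape[0]
    if x.shape[1] != d_in or W.shape[1] != d_in:
        raise ValueError(f"d_in mismatch: x {x.shape}, W {W.shape}, As {As.shape}")
    if Bs.shape != (n, d_out, r):
        raise ValueError(f"Bs must have shape {(n, d_out, r)}, got {Bs.shape}")
    if coeffs.shape != (n,):
        raise ValueError(f"coeffs must have shape {(n,)}, got {coeffs.shape}")

    # Stack the rows of every A_i: (n*r, d_in).
    A_cat = As.reshape(n * r, d_in)
    # Place the columns of every B_i side by side: (d_out, n*r).
    B_cat = Bs.transpose(1, 0, 2).reshape(d_out, n * r)
    # One coefficient per stacked rank index: [c_0]*r, [c_1]*r, ...
    scale = np.repeat(coeffs, r)

    h = (x @ A_cat.T) * scale            # (batch, n*r)
    return x @ W.T + alpha * (h @ B_cat.T)

rng = np.random.default_rng(0)
b, d_in, d_out, r, n = 4, 6, 5, 2, 3
x = rng.standard_normal((b, d_in)).astype(np.float32)
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
As = rng.standard_normal((n, r, d_in)).astype(np.float32)
Bs = rng.standard_normal((n, d_out, r)).astype(np.float32)
coeffs = rng.standard_normal(n).astype(np.float32)

y = adapted_linear_multi_sketch(x, W, As, Bs, coeffs, alpha=0.3)

# Dense reference, allowed in the test only.
W_eff = W.copy()
for i in range(n):
    W_eff += 0.3 * coeffs[i] * (Bs[i] @ As[i])
y_ref = x @ W_eff.T
```

The largest temporaries here are `B_cat` with shape `(d_out, n*r)` and `h` with shape `(batch, n*r)`, so nothing of size `(d_out, d_in)` or `(n_adapters, d_out, d_in)` is created beyond `W` itself.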