## JAX and Scientific Computing
The main prerequisite for this section is multivariable calculus.

### Multivariable Calculus: Review 
Multivariable functions can be scalar-valued or vector-valued, depending on whether their output is a scalar or a vector.

#### 1. Scalar-valued functions:
A scalar-valued function is used when you want a single number that measures something. In machine learning, this usually means a quantity you want to evaluate, compare, or optimize. The most important use of scalar-valued functions in machine learning is as loss (or cost) functions. A loss function takes many inputs (model parameters, predictions, and true labels) but outputs one number that tells you how poorly the model fits the data.
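
As a minimal sketch of a scalar-valued loss in JAX, the example below uses mean squared error on a linear model. The function name `mse_loss`, the linear model, and the toy data are illustrative assumptions, not anything prescribed by the text above.

```python
import jax.numpy as jnp

def mse_loss(params, x, y_true):
    """Scalar-valued loss: many inputs in, one number out."""
    w, b = params
    y_pred = x @ w + b                       # predictions, shape (n_samples,)
    return jnp.mean((y_pred - y_true) ** 2)  # a single scalar

# Hypothetical toy data: 3 samples with 2 features each.
x = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_true = jnp.array([1.0, 2.0, 3.0])
params = (jnp.array([0.5, -0.5]), 0.0)

loss = mse_loss(params, x, y_true)
print(loss.shape)  # () : the output is a scalar, however many inputs there are
```

Whatever the shapes of the parameters and data, the result has shape `()`, which is what makes it something you can compare or minimize.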

#### 2. Vector-valued functions:
A vector-valued function is used when the output naturally has multiple components. In machine learning, this usually means predictions, transformations, or feature mappings. Many models produce structured outputs rather than a single number, for example one score per class, and those outputs are vectors.
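
A vector-valued model output might look like the sketch below, where a small linear layer followed by a softmax maps a 2-dimensional input to 3 class probabilities. The name `predict_probs`, the layer sizes, and the weights are hypothetical choices made for illustration.

```python
import jax
import jax.numpy as jnp

def predict_probs(params, x):
    """Vector-valued function: maps an input vector to a vector of probabilities."""
    W, b = params
    logits = W @ x + b             # one score per class, shape (3,)
    return jax.nn.softmax(logits)  # non-negative entries that sum to 1

# Hypothetical weights: 3 classes, 2 input features.
W = jnp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = jnp.zeros(3)
x = jnp.array([1.0, -1.0])

probs = predict_probs((W, b), x)
print(probs.shape)  # (3,) : the output has one component per class
```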

Vector-valued functions are also used for:
- Word embeddings (mapping a word to a vector)
- Hidden layers in neural networks
- Image feature representations

In calculus, the standard example is a vector-valued function of a single real variable $t$:

$$
\mathbf{r}(t) = \langle x(t),\, y(t),\, z(t) \rangle
$$

Here, $x(t)$, $y(t)$, and $z(t)$ are ordinary real-valued functions, called the component functions. Together, they describe how each coordinate of the vector changes as $t$ changes.

You can think of a vector-valued function as describing motion: as $t$ varies, the tip of the vector moves through space, tracing out a curve. The function tells you both magnitude and direction, which is why vector-valued functions are fundamental in physics and engineering.

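**Example:**
$$
\mathbf{r}(t) = \langle t,\, t^2 \rangle
$$

Here the component functions are $x(t) = t$ and $y(t) = t^2$, so as $t$ varies the tip of the vector traces the parabola $y = x^2$ in the plane.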
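
The curve $\mathbf{r}(t) = \langle t,\, t^2 \rangle$ can be sketched in JAX as follows. Using `jax.vmap` to evaluate several values of $t$ at once, and `jax.jacfwd` to get the componentwise derivative with respect to $t$, are choices made for this illustration; the sample points are arbitrary.

```python
import jax
import jax.numpy as jnp

def r(t):
    """Vector-valued function r(t) = <t, t^2>."""
    return jnp.array([t, t ** 2])

# Evaluate r at several values of t to trace points along the curve.
ts = jnp.linspace(-2.0, 2.0, 5)
curve = jax.vmap(r)(ts)            # shape (5, 2), one (x, y) point per t

# Differentiate each component with respect to t: r'(t) = <1, 2t>.
velocity = jax.jacfwd(r)(2.0)      # shape (2,)

print(curve)
print(velocity)
```

Every row of `curve` satisfies $y = x^2$, which matches the picture of the vector's tip tracing a parabola.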

### Partial Derivatives: A New Perspective

You may already be familiar with partial derivatives:

$$
\frac{\partial f}{\partial x}, \quad \frac{\partial f}{\partial y}
$$

Each of these means:

> "What happens if I change one coordinate and freeze the others?"

But that is a very special kind of change.
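
Before generalizing, the "change one coordinate, freeze the others" view can be sketched with `jax.grad`, whose `argnums` argument selects which input to differentiate with respect to. The function $f(x, y) = x^2 y + \sin y$ is a hypothetical example chosen for illustration.

```python
import jax
import jax.numpy as jnp

def f(x, y):
    """Hypothetical example function f(x, y) = x^2 * y + sin(y)."""
    return x ** 2 * y + jnp.sin(y)

# argnums picks the coordinate that varies; the other input is held fixed.
df_dx = jax.grad(f, argnums=0)   # analytically 2 * x * y
df_dy = jax.grad(f, argnums=1)   # analytically x^2 + cos(y)

x0, y0 = 2.0, 3.0
print(df_dx(x0, y0))
print(df_dy(x0, y0))
```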

Now ask a more general question:

> "What if I move the input in some arbitrary direction?"

Let:

$$
f: \mathbb{R}^n \to \mathbb{R}
$$

Pick a direction vector:

$$
\mathbf{v} \in \mathbb{R}^n
$$

The directional derivative of $f$ at $\mathbf{x}$ in direction $\mathbf{v}$ is:

$$
D_{\mathbf{v}} f(\mathbf{x}) = \lim_{\epsilon \to 0} \frac{f(\mathbf{x} + \epsilon \mathbf{v}) - f(\mathbf{x})}{\epsilon}
$$

**Interpretation:**

> "If I nudge the input slightly in direction $\mathbf{v}$, how fast does the output change?"

This is the most general first-derivative question you can ask.
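
In JAX, a directional derivative corresponds to a Jacobian-vector product, available as `jax.jvp`. The sketch below compares it with the finite-difference quotient from the limit definition above. The function `f`, the point `x`, the direction `v`, and the step size `eps` are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def f(x):
    """Hypothetical scalar-valued function on R^3."""
    return jnp.sum(x ** 2) + x[0] * x[1]

x = jnp.array([1.0, 2.0, 3.0])    # point at which we differentiate
v = jnp.array([1.0, 0.0, -1.0])   # direction of the nudge

# jax.jvp returns (f(x), directional derivative of f at x along v).
f_x, dir_deriv = jax.jvp(f, (x,), (v,))

# Finite-difference quotient from the definition, with a small but finite eps.
eps = 1e-3
finite_diff = (f(x + eps * v) - f(x)) / eps

print(dir_deriv)     # exact up to floating-point rounding
print(finite_diff)   # approximation; the error shrinks with eps until rounding dominates
```

The two numbers agree only approximately, because the finite difference uses a nonzero `eps` and single-precision arithmetic, whereas `jax.jvp` differentiates the program itself.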

If $f$ is differentiable at $\mathbf{x}$, then every directional derivative is the dot product of the direction $\mathbf{v}$ with one fixed vector, the gradient $\nabla f(\mathbf{x})$:

$$
D_{\mathbf{v}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}
$$

Among all unit vectors $\mathbf{v}$, this dot product is largest when $\mathbf{v}$ points in the same direction as $\nabla f(\mathbf{x})$ (assuming the gradient is nonzero), so the gradient gives the direction in which $f$ increases fastest.
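
The identity $D_{\mathbf{v}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v}$ can be checked numerically. The sketch below computes the gradient once with `jax.grad`, compares the dot product against `jax.jvp`, and then tests the gradient direction against randomly sampled unit directions. The function `f`, the point `x`, and the number of random samples are hypothetical choices.

```python
import jax
import jax.numpy as jnp

def f(x):
    """Hypothetical scalar-valued function on R^3."""
    return jnp.sum(x ** 2) + x[0] * x[1]

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([1.0, 0.0, -1.0])

g = jax.grad(f)(x)                     # the gradient: one fixed vector at x
via_dot = jnp.dot(g, v)                # grad f(x) . v
_, via_jvp = jax.jvp(f, (x,), (v,))    # directional derivative computed directly

# Compare the unit gradient direction with random unit directions.
u_star = g / jnp.linalg.norm(g)
key = jax.random.PRNGKey(0)
candidates = jax.random.normal(key, (1000, 3))
candidates = candidates / jnp.linalg.norm(candidates, axis=1, keepdims=True)

steepest = jnp.dot(g, u_star)          # equals the norm of the gradient
best_random = jnp.max(candidates @ g)  # best directional derivative among the samples

print(via_dot, via_jvp)
print(steepest, best_random)
```

No sampled unit direction should beat the gradient direction, and the rate of change along it equals the norm of the gradient.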

