# Convolution Rant

I should be doing homework right now but I got sidetracked thinking about how to think about convolution better, went on a run to think about it more deeply, came back, and now I wanna write this down before I get hungry and forget.

Say we have two functions, $x(t)$ and $h(t)$. We want a quantity that measures the overlap between $x$ and $h$: wherever $x$ and $h$ have the same sign (both positive or both negative) we get a positive contribution to our overlap, and wherever they have opposite signs we get a negative contribution.

We can give one such expression for this as follows:

$$
R = \int_{-\infty}^\infty x(t)h(t)dt
$$

This will capture all overlap between the functions. We can also add an offset between the two functions to parameterize R as a function of that offset.

$$
R(\tau) = \int_{-\infty}^\infty x(t)h(t + \tau)dt
$$

We could also define it as follows:

$$
R(\tau) = \int_{-\infty}^\infty x(t + \tau)h(t)dt
$$

We could also flip the sign of $\tau$ to get yet another expression. None of these specifics matter because we are encoding the exact same information, and it's all convention anyway. The important thing is that we are encoding overlap as a function of offset.
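Here is a tiny discrete sketch of the two conventions in Python. The sample values, the zero-padding helper `at`, and the names `R_shift_h` / `R_shift_x` are made up for illustration, and sums over integer lags stand in for the integrals. It checks that shifting $h$ by $\tau$ gives the same numbers as shifting $x$ by $-\tau$, i.e. the two conventions carry the same information with $\tau$ flipped.

```python
# Hypothetical short discrete signals standing in for x(t) and h(t).
x = [0.0, 1.0, 2.0, -1.0, 0.5]
h = [1.0, -0.5, 0.0, 2.0, 1.0]

def at(sig, n):
    # Treat the signal as zero outside its stored samples.
    return sig[n] if 0 <= n < len(sig) else 0.0

def R_shift_h(tau):
    # Discrete version of: sum over t of x(t) h(t + tau)
    return sum(at(x, n) * at(h, n + tau) for n in range(len(x)))

def R_shift_x(tau):
    # Discrete version of: sum over t of x(t + tau) h(t)
    return sum(at(x, n + tau) * at(h, n) for n in range(len(h)))

lags = range(-(len(x) - 1), len(x))
for tau in lags:
    # The two conventions differ only by flipping the sign of tau.
    print(tau, R_shift_h(tau), R_shift_x(-tau))
```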

If the kernel itself depends on absolute time (say the shape of $h$ changes depending on when we look), then $R$ depends on both absolute time $t$ and the offset $\tau$, i.e. $R(t, \tau)$. This means our correlation is time-variant. If it doesn't, then it's time-invariant.

Say we want to find $\frac{dR}{d\tau}$. The first thing we can do to simplify our problem is give the integrand of our inner product its own name:
$$
A(t, \tau) = x(t)h(t+\tau)
$$

Geometrically, you can think of any point along $A(t, \tau)$ as a rectangle where one side is the value of $x(t)$ and the other side is the value of $h(t + \tau)$. We can then approximate the integral by breaking the time axis into $m$ points via a point function $P: n \mapsto t$, so that the $k$th sample is $A(P(k), \tau)$. From here, we can build a Riemann sum where the $k$th rectangle's height is $A(P(k), \tau)$ and its width is $\Delta P = P(k+1) - P(k)$, which we assume is constant. As $m \to \infty$ and $\Delta P \to 0$, this sum converges to the area under $A(t, \tau)$, which is exactly $R(\tau)$. If we were to change $\tau$ slightly, we could approximate how $R$ is affected by summing the area changes across all rectangles. From here, it's intuitive that if we define a new Riemann sum where the height of each rectangle is $\frac{\partial A(P(k), \tau)}{\partial \tau}$, this sum converges to $\frac{dR(\tau)}{d\tau}$ (this is differentiation under the integral sign, which is valid for well-behaved $x$ and $h$).
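A minimal numerical sketch of this Riemann-sum picture, in Python with NumPy. The Gaussian bumps used for $x$ and $h$, the cut-off of the time axis at $[-10, 10)$, and all helper names are assumptions made for illustration. It samples evenly spaced points $P(k)$, sums rectangles of height $A(P(k), \tau)$ to get $R(\tau)$, and checks that summing rectangles of height $\partial A / \partial \tau$ matches what you get by nudging $\tau$ and differencing the total area.

```python
import numpy as np

# Hypothetical example signals (not from the text): two Gaussian bumps.
def x(t):
    return np.exp(-t**2)

def h(t):
    return np.exp(-(t - 1.0) ** 2)

def dh_dt(t):
    # Analytic derivative of h.
    return -2.0 * (t - 1.0) * np.exp(-(t - 1.0) ** 2)

def sample_points(dt, T=10.0):
    # P(k): evenly spaced points with constant width dt, truncated to [-T, T).
    return np.arange(-T, T, dt)

def riemann_R(tau, dt):
    # Rectangles of height A(P(k), tau) = x(P(k)) h(P(k) + tau) and width dt.
    t = sample_points(dt)
    return np.sum(x(t) * h(t + tau)) * dt

def riemann_dR(tau, dt):
    # Same rectangles, but each height is dA/dtau = x(P(k)) h'(P(k) + tau).
    t = sample_points(dt)
    return np.sum(x(t) * dh_dt(t + tau)) * dt

def exact_R(tau):
    # Closed form for these two Gaussians.
    return np.sqrt(np.pi / 2) * np.exp(-((tau - 1.0) ** 2) / 2)

tau, dt, eps = 0.3, 0.01, 1e-4
# Nudge tau slightly and difference the total areas.
finite_diff = (riemann_R(tau + eps, dt) - riemann_R(tau - eps, dt)) / (2 * eps)
print(riemann_R(tau, dt), exact_R(tau))
print(riemann_dR(tau, dt), finite_diff)
```

For these particular functions $R(\tau) = \sqrt{\pi/2}\,e^{-(\tau - 1)^2/2}$, which is what `exact_R` uses as a reference.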

We now focus on $\frac{dA}{d\tau}$. At any given time $t$, $A(t, \tau)$ is really just the product of $x(t)$ and $h(t + \tau)$. Forget how the offset $\tau$ is represented for the moment. Forget you know how to do algebra too, for good measure. Imagine you have two functions overlapping. You should be able to convince yourself that moving one function to the right changes the overall 'rectangle area' by the same amount as moving the other function to the left by the same distance (remember this directional dichotomy). This is because in both cases the same pairs of values end up multiplied together; the pairs just sit at positions shifted by the offset, so individual rectangles move around but the total area is identical. We can then ask "how does this area change when I freeze one function in place and slide the other?". From here, it's intuitive that the instantaneous rate of area change $\frac{dA}{d\tau}$ is equal to the value of the 'fixed' function $x(t)$ multiplied by $\frac{dh(t + \tau)}{d\tau}$. Picture the graph of $h$ as a hill that can slide underneath you. Standing still at point $t$ and sliding the hill $\tau$ steps to the left under you changes your height by exactly as much as leaving the hill in place and walking $\tau$ steps to the right along it; sliding it $\tau$ steps to the right instead is the same as walking $\tau$ steps back. Thus we can say $\frac{dh(t + \tau)}{d\tau} = \frac{dh(t + \tau)}{dt}$. By the directional dichotomy, we could instead hold $h$ fixed and slide $x$ the other way: the total area changes at the same rate, but the derivative now lands on $x$ and picks up a minus sign. This only holds for the total area (after integrating over $t$), not rectangle by rectangle. If we instead slide both functions by the same amount in the same direction, both effects hit every rectangle at once and add up:

$$\frac{d}{d\tau}\big[x(t+\tau)h(t+\tau)\big] = \frac{dh(t+\tau)}{d\tau}x(t+\tau) + \frac{dx(t+\tau)}{d\tau}h(t+\tau) = \frac{dh(t+\tau)}{dt}x(t+\tau) + \frac{dx(t+\tau)}{dt}h(t+\tau)$$

This provides some nice geometric intuition for the product rule. And since sliding both functions together never changes how they overlap, these two contributions must cancel once we integrate over all $t$ (assuming $x$ and $h$ die off at $\pm\infty$). That cancellation is the directional dichotomy again; it's integration by parts in disguise.
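A small check of the slide-both-together picture, sketched in Python with NumPy. The example functions `x`, `h` and their hand-written derivatives `dx`, `dh` are illustrative choices, not from the text. It compares a central-difference estimate of how each rectangle changes when both functions slide together against the product-rule sum, then adds those contributions up over a long grid to see them cancel.

```python
import numpy as np

# Hypothetical smooth test pair (any differentiable, decaying pair works).
def x(t):
    return np.sin(t) * np.exp(-t**2 / 4)

def dx(t):
    return (np.cos(t) - (t / 2) * np.sin(t)) * np.exp(-t**2 / 4)

def h(t):
    return np.exp(-(t - 1.0) ** 2)

def dh(t):
    return -2.0 * (t - 1.0) * np.exp(-(t - 1.0) ** 2)

def slide_both(t, tau):
    # Rectangle height at t when x and h are both slid by tau in the same direction.
    return x(t + tau) * h(t + tau)

t = np.linspace(-5.0, 5.0, 11)
eps = 1e-5
# Rate of change of each rectangle, via a central difference at tau = 0 ...
numeric = (slide_both(t, eps) - slide_both(t, -eps)) / (2 * eps)
# ... compared with the sum of the two single-slide contributions.
product_rule = dh(t) * x(t) + dx(t) * h(t)
max_err = np.max(np.abs(numeric - product_rule))

# Sliding both together leaves the total overlap unchanged, so summed
# over all rectangles the two contributions cancel.
dt = 1e-3
grid = np.arange(-20.0, 20.0, dt)
total_change = np.sum(dh(grid) * x(grid) + dx(grid) * h(grid)) * dt
print(max_err, total_change)
```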


This is all to say that:

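$$
\frac{dR(\tau)}{d\tau} = \int_{-\infty}^\infty x(t)\frac{dh(t + \tau)}{dt}dt = -\int_{-\infty}^\infty \frac{dx(t - \tau)}{dt}h(t)dt
$$


Here, the second expression comes from substituting $t \rightarrow t - \tau$, which rewrites $R(\tau)$ as $\int_{-\infty}^\infty x(t - \tau)h(t)dt$, so that $x$ is the function being slid (to the right) instead of $h$ (to the left). Its negative sign is the directional dichotomy (TM) described above.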
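The two forms of $\frac{dR}{d\tau}$ can be compared numerically. This is a sketch in Python with NumPy; the example functions, the grid on $[-20, 20)$ with spacing $10^{-3}$, and the function names are all assumptions. `dR_slide_h` uses the first form, `dR_slide_x` uses the second (note the $t - \tau$ and the minus sign), and both are checked against a finite difference of $R$ itself.

```python
import numpy as np

# Hypothetical example pair and grid (assumptions, not from the text).
def x(t):
    return np.sin(t) * np.exp(-t**2 / 4)

def dx(t):
    return (np.cos(t) - (t / 2) * np.sin(t)) * np.exp(-t**2 / 4)

def h(t):
    return np.exp(-(t - 1.0) ** 2)

def dh(t):
    return -2.0 * (t - 1.0) * np.exp(-(t - 1.0) ** 2)

dt = 1e-3
t = np.arange(-20.0, 20.0, dt)

def R(tau):
    # R(tau) = integral of x(t) h(t + tau) dt, as a Riemann sum.
    return np.sum(x(t) * h(t + tau)) * dt

def dR_slide_h(tau):
    # First form: x fixed, h slid (integrand x(t) h'(t + tau)).
    return np.sum(x(t) * dh(t + tau)) * dt

def dR_slide_x(tau):
    # Second form: h fixed, x slid the other way (t - tau and a minus sign).
    return -np.sum(dx(t - tau) * h(t)) * dt

tau, eps = 0.4, 1e-4
finite_diff = (R(tau + eps) - R(tau - eps)) / (2 * eps)
print(dR_slide_h(tau), dR_slide_x(tau), finite_diff)
```

Swapping `t - tau` for `t + tau` in `dR_slide_x` generally breaks the agreement, which is a quick way to convince yourself that the argument of $x$ matters.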



Going off on another tangent briefly, it is highly likely that we are all familiar with the standard limit expression of a derivative:

$$
\frac{df}{dt} = \lim_{a\rightarrow0} \frac{f(t + a) - f(t)}{a}
$$

For any differentiable function $f$. This is relatively obvious and most ten year olds could figure it out if they actually tried. The key idea is that taking this limit is a clearly defined algebraic manipulation that transforms our function into its derivative, which just measures how the function's output changes with respect to time. The corollary is that we can also run this backwards and ask which function we'd need to differentiate to get $f$; that function is called the antiderivative.
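A quick way to watch this limit converge, sketched in Python. Using $f = \sin$ at $t = 1$ is an arbitrary example (its derivative is $\cos$), and `difference_quotient` is a made-up helper name.

```python
import math

def difference_quotient(f, t, a):
    # The expression inside the limit: (f(t + a) - f(t)) / a.
    return (f(t + a) - f(t)) / a

t = 1.0
for a in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(math.sin, t, a)
    # d/dt sin(t) = cos(t), so the error should shrink roughly in proportion to a.
    print(f"a={a:g}  quotient={approx:.8f}  error={abs(approx - math.cos(t)):.2e}")
```

In floating point, don't push $a$ all the way down: below roughly $10^{-8}$ the subtraction $f(t + a) - f(t)$ cancels most of its significant digits and the error starts growing again.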

Back to the rectangle (Riemann sum) picture from earlier. Imagine you have a differentiable function $y(t)$, and that we know its derivative $y'(t)$. We can get that derivative either through an algebra shortcut or chadly limit convergence; it really doesn't matter. We can then approximate $y'(t)$ as a series of rectangles, each of width $p$, where the height of each rectangle is the value of $y'(t)$ at its left edge. If you were to graph the function whose rate of change is this staircase, it would look like a bunch of straight line segments bent into the general shape of $y(t)$. As we make $p$ smaller, this function becomes smoother and smoother until it matches $y(t)$ exactly (once we pin down the starting value). In other words:

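$$
y'_a(t) = \sum_{n=-\infty}^{\infty} y'(n p)\, \Pi\!\left(\frac{t - (n + \tfrac{1}{2})p}{p}\right)
$$
$$
\lim_{p\rightarrow 0}y'_a(t) = y'(t)
$$

where $\Pi$ is the unit rectangle function ($\Pi(u) = 1$ for $|u| < \tfrac{1}{2}$ and $0$ otherwise), so the $n$th box runs from $np$ to $(n+1)p$ and takes its height from its left edge.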
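Here is a sketch of the staircase idea in Python with NumPy, assuming $y(t) = \sin t$ purely as an example. `staircase` evaluates $y'_a(t)$ with left-edge heights, and `polyline_end` (a hypothetical helper) starts at $y(0)$ and adds up rectangle areas to trace the bent-straight-line curve out to $t = 3$. Both errors should shrink roughly in proportion to $p$.

```python
import numpy as np

# Hypothetical example: y(t) = sin(t), so y'(t) = cos(t).
def y(t):
    return np.sin(t)

def dy(t):
    return np.cos(t)

def staircase(t, p):
    # y'_a(t): box n covers [np, (n+1)p) and takes its height from its left edge.
    n = np.floor(t / p)
    return dy(n * p)

def polyline_end(t_end, p, t_start=0.0):
    # Start at y(t_start) and add up rectangle areas: the straight-segment
    # curve whose slope is the staircase, evaluated at t_end.
    n = np.arange(round(t_start / p), round(t_end / p))
    return y(t_start) + np.sum(dy(n * p) * p)

t = np.linspace(0.0, 3.0, 301)
errors = {}
for p in (0.5, 0.1, 0.01):
    stair_err = np.max(np.abs(staircase(t, p) - dy(t)))
    poly_err = abs(polyline_end(3.0, p) - y(3.0))
    errors[p] = (stair_err, poly_err)
    print(p, stair_err, poly_err)
```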

The interesting part is that the difference between the function's values at any two times, say $t_1$ and $t_2$, is roughly equal to the sum of the rectangle areas (the deltas) between them. Once again, as $p$ goes to zero and the number of delta steps goes to infinity, this sum converges to the actual difference between the two function values. In other words:

$$
y(t_2) - y(t_1) = \lim_{p\rightarrow 0} \sum_{n=n_{t_1}}^{n_{t_2}-1} y'(np)\,p
$$

where $n_t = \lfloor t/p \rfloor$ is the index of the rectangle containing $t$.

In the context of an integral, we'd like to find the area under a curve. We know that one way to do this is by approximating the function as a bunch of rectangles, each with a height dictated by the function's value at the left-hand side of that rectangle and a width of $p$. We can then sum the areas of all these rectangles to approximate the area under the function. That is all to say:

$$
A = \lim_{p\rightarrow 0} \sum_{n=-\infty}^{\infty} y(np)\,p
$$

To find the area under the curve between two points in time, we just take the subset of the summation that we care about, i.e. the area accumulated up to $t_2$ minus the area accumulated up to $t_1$:

$$
A(t_1, t_2) = A(-\infty, t_2) - A(-\infty, t_1) = \lim_{p\rightarrow 0} \sum_{n=n_{t_1}}^{n_{t_2}-1} y(np)\,p
$$

$y(t)$ is just a function, and if it's continuous it is the derivative of some function $Y(t)$ (its antiderivative) with respect to time. Applying what we found above to $Y$ instead of $y$, we know that:
$$
Y(t_2) - Y(t_1) = \lim_{p\rightarrow 0} \sum_{n=n_{t_1}}^{n_{t_2}-1} y(np)\,p
$$

But this is also equal to the area difference we found above, so we can say:
$$
A(t_1, t_2) = Y(t_2) - Y(t_1) = \int_{t_1}^{t_2}y(t)dt
$$

This provides some nice geometric intuition behind the Fundamental Theorem of Calculus.
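A numerical sketch of the same statement in Python with NumPy. The integrand $y(t) = e^{-t^2}$ is an arbitrary example, chosen because its antiderivative $Y(t) = \frac{\sqrt{\pi}}{2}\operatorname{erf}(t)$ is available via `math.erf`; the cut-off $t_{\min} = -10$ standing in for $-\infty$ and the helper `area_between` are assumptions. It builds running left-rectangle sums, takes $A(-\infty, t_2) - A(-\infty, t_1)$, and compares with $Y(t_2) - Y(t_1)$.

```python
import math
import numpy as np

# Hypothetical example: y(t) = exp(-t^2), whose antiderivative is
# Y(t) = (sqrt(pi) / 2) * erf(t).
def y(t):
    return np.exp(-t**2)

def Y(t):
    return math.sqrt(math.pi) / 2 * math.erf(t)

def area_between(t1, t2, p, t_min=-10.0):
    # Running left-Riemann sums stand in for A(-inf, t); t_min = -10 is an
    # assumed cut-off where y is already negligible.
    n_edges = round((t2 - t_min) / p)
    left_edges = t_min + p * np.arange(n_edges)
    running = np.concatenate(([0.0], np.cumsum(y(left_edges) * p)))
    i1 = round((t1 - t_min) / p)
    i2 = round((t2 - t_min) / p)
    # A(t1, t2) = A(-inf, t2) - A(-inf, t1)
    return running[i2] - running[i1]

t1, t2 = -0.5, 1.5
exact = Y(t2) - Y(t1)
for p in (0.1, 0.01, 0.001):
    approx = area_between(t1, t2, p)
    print(p, approx, exact, abs(approx - exact))
```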

We can think of the function $A(t)$ (here with the offset dropped, so $A(t) = x(t)h(t)$) as being approximately composed of a bunch of rectangles, each with height $x(P(t))h(P(t))$ and width $p$, where $p$ is very small and $P(t)$ now snaps $t$ to the nearest grid point at or to the left of $t$. Let $B(t) = x(P(t))h(P(t))$. Now we would like to grab the value of $A(t)$ at some particular $t = \tau$. We want to find $A(\tau)$, but the best thing we have is a rectangle of height $B(\tau) = x(P(\tau))h(P(\tau))$ and width $p$, whose left edge $P(\tau)$ is the nearest grid point to the left of $\tau$. As $p$ decreases, the number of rectangles increases, and (for continuous $x$ and $h$) $B(t)$ gets closer and closer to $A(t)$ such that:
$$
\lim_{p\rightarrow 0}B(\tau) = A(\tau)
$$

But for now we will assume that p is just very small.
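To make the snapping concrete, here is a short sketch in Python with NumPy. The choices $x(t) = \cos t$ and $h(t) = e^{-t^2/2}$ are hypothetical, and `P` is written here as `floor(t / p) * p`, one way to snap $t$ to the nearest grid point at or to the left of it. The printed $B(\tau)$ should approach $A(\tau)$ as $p$ shrinks.

```python
import numpy as np

# Hypothetical example functions; A(t) = x(t) h(t) with the offset dropped.
def x(t):
    return np.cos(t)

def h(t):
    return np.exp(-t**2 / 2)

def A(t):
    return x(t) * h(t)

def P(t, p):
    # Snap t to the nearest grid point at or to the left of it.
    return np.floor(t / p) * p

def B(t, p):
    # Height of the rectangle that contains t.
    return x(P(t, p)) * h(P(t, p))

tau = 0.7
for p in (0.5, 0.1, 0.01, 0.001):
    print(p, B(tau, p), A(tau))
```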

We naively think to ourselves that one way we could get $B(\tau)$ for some time $\tau$ is to take a function with a value of $\frac{1}{p}$ right at $t = \tau$ (spread over that one rectangle of width $p$, so it has unit area) and zero everywhere else, then multiply $B(t)$ by that function and integrate. A reasonable way to approximate this would be the following:
$$
\delta_p(t-\tau) =
\begin{cases}
\dfrac{1}{p} \exp\!\left(-\dfrac{t-\tau}{p}\right), & t \ge \tau, \\[8pt]
0, & t < \tau.
\end{cases}
$$

Basically just a highly aggressive decaying exponential with unit area, offset to the right so that it starts at $t = \tau$. Then:

$$
A(\tau) = \lim_{p\rightarrow 0}\int_{-\infty}^{\infty}B(t)\delta_p(t-\tau)dt = \lim_{p\rightarrow 0}\int_{\tau}^{\infty}B(t)\delta_p(t-\tau)dt 
$$

At first it seems we don't care about anything after $\tau + p$, because at that point we're on a new rectangle. Taking $\tau$ to sit at the left edge of its rectangle, the part of the integral over $[\tau, \tau + p]$ equals $B(\tau)\left[1 - e^{-1}\right] \approx 0.632\,B(\tau)$ for any $p$, so we could just divide this constant out. But we don't even need to: $\delta_p$ has unit area, and because the whole curve gets steeper and steeper as $p$ shrinks, the remaining $e^{-1} \approx 37\%$ of its mass that overflows into the following rectangles lands only a few multiples of $p$ away from our target, i.e. infinitesimally close to it. So 63% of our 'mass' lies in our target rectangle and the other 37% lies infinitesimally close to it, which I'd count as a W. The integral is a weighted average of rectangle heights right around $\tau$, so as $p$ approaches zero it converges to the same thing $B(\tau)$ does, namely $A(\tau)$. Thus:

$$
A(\tau) = \lim_{p\rightarrow 0}\int_{-\infty}^{\infty}B(t)\delta_p(t-\tau)dt
$$
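Finally, a numerical sketch of this limit in Python with NumPy. The example $A(t)$, the grid-snapping definition of $B$, the midpoint-rule integration, and the cut-off after $40p$ are all assumptions for illustration. `sift` approximates $\int B(t)\,\delta_p(t - \tau)\,dt$ for shrinking $p$, and the last few lines check that the share of $\delta_p$'s mass in the first rectangle is $1 - e^{-1}$.

```python
import numpy as np

# Hypothetical example; A(t) = x(t) h(t) with the offset dropped.
def A(t):
    return np.cos(t) * np.exp(-t**2 / 2)

def B(t, p):
    # Rectangle heights: A sampled at the nearest grid point at or left of t.
    return A(np.floor(t / p) * p)

def delta_p(s, p):
    # One-sided exponential with unit area: (1/p) exp(-s/p) for s >= 0, else 0.
    return np.where(s >= 0, np.exp(-np.maximum(s, 0.0) / p) / p, 0.0)

def sift(tau, p, per_p=400, span=40):
    # Midpoint-rule integral of B(t) delta_p(t - tau) over [tau, tau + span * p];
    # the mass of delta_p beyond that is about e^-40, i.e. negligible.
    dt = p / per_p
    t = tau + dt * (np.arange(span * per_p) + 0.5)
    return np.sum(B(t, p) * delta_p(t - tau, p)) * dt

tau = 0.7
for p in (0.1, 0.01, 0.001):
    print(p, sift(tau, p), A(tau))

# Share of delta_p's mass inside the first rectangle (s from 0 to p).
p0 = 0.01
ds = p0 / 1000
s = ds * (np.arange(1000) + 0.5)
first_rect_mass = np.sum(delta_p(s, p0)) * ds
print(first_rect_mass, 1 - np.exp(-1.0))
```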

