# Introduction

We analytically calculate the local volume obtained through the random projection method for the region bounded by the curves $\left(x, \frac{a \pm w}{x}\right)$, as seen from a point located at $(b, a/b)$ on the curve $xy = a$, with $a > w > 0$ and $b > 0$. This is a toy model of scale invariance in a multi-layer neural network.

## Local Volume Calculation

The volume estimated via random projections from a point $P = (b, a/b)$ is the volume of the set reachable by straight lines from $P$. We're interested in the set bounded by the curves $\left(x, \frac{a \pm w}{x}\right)$, with $P$ lying on the curve $xy = a$ between them. The random projection approach amounts to finding $r(\theta)$, the minimum positive distance such that

$$
\left(b + r \sin \theta, a/b + r \cos \theta \right) = \left(x, \frac{a \pm w}{x} \right)
$$

Then the volume given $r (\theta)$ is 

$$
V = V_{ball, n} \times \int r^n (\theta) d\theta
$$

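where $n$ is the dimension of the problem, $V_{ball, n}$ is the volume of the unit ball in $n$ dimensions, and the integral is understood as an average over directions (in two dimensions, $\frac{1}{2\pi}\int_0^{2\pi} r^2(\theta)\, d\theta$). Due to the radial integrals, this volume is difficult to calculate.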
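To make the estimator concrete in two dimensions, where $V_{ball, 2} = \pi$ and the estimate reduces to $\pi$ times the direction-average of $r^2$, here is a minimal sketch. It assumes the starting point is $(b, a/b)$ on the curve $xy = a$. The names `ray_distance` and `random_projection_area`, the number of directions and the seed are illustrative choices, not part of the original notes.

```python
import numpy as np

def ray_distance(theta, a, w, b):
    """Smallest positive r at which the ray from P meets x*y = a - w or x*y = a + w."""
    x0, y0 = b, a / b                        # assumed starting point on x*y = a
    dx, dy = np.sin(theta), np.cos(theta)    # same angle convention as the text
    # Substituting the ray into x*y = k gives A r^2 + B r + (a - k) = 0.
    A = dx * dy
    B = x0 * dy + y0 * dx
    best = np.full_like(theta, np.inf)
    with np.errstate(divide="ignore", invalid="ignore"):
        for C in (w, -w):                    # C = a - k for k = a - w and k = a + w
            disc = B**2 - 4.0 * A * C
            root = np.sqrt(np.maximum(disc, 0.0))
            # Numerically stable form of the quadratic roots.
            q = -0.5 * (B + np.where(B >= 0, 1.0, -1.0) * root)
            for r in (q / A, C / q):
                ok = (disc >= 0) & np.isfinite(r) & (r > 0)
                best = np.where(ok & (r < best), r, best)
    return best

def random_projection_area(a, w, b, n_dirs=200_000, seed=0):
    """Monte Carlo estimate: pi times the mean of r(theta)^2 over random directions."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_dirs)
    r = ray_distance(theta, a, w, b)
    return np.pi * np.mean(r**2)

if __name__ == "__main__":
    print(random_projection_area(a=1.0, w=0.5, b=1.0))
```

The estimate is noisy because a few directions reach far into the thin tails of the set, so the number of directions needed grows as the set becomes more elongated.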

Instead, since the volume from random projections is equal to the volume of the set reachable by straight lines, we compute this volume directly by obtaining the bounding equations for this set and integrating.

## Symmetries

Note that the volume at any point inside the boundaries is unchanged under flips about the $x$ and $y$ axes (where both the point and the boundaries are flipped), corresponding to the signs of the minimum-defining parameter $a$ and the scaling parameter $b$. Therefore, without loss of generality, we can calculate the volume assuming $a, b > 0$, working entirely in the first quadrant.

## a > w

The problem is qualitatively different for $a < w$, as the boundaries of the set are no longer determined by a pair of lines in $x, y > 0$ alone, but also involve extra lines in the other quadrants with their own boundary conditions. For now, we consider $a > w$.

## Set Boundaries

Without loss of generality (due to our symmetries), the boundary is primarily dominated by the lower boundary $\left(x, \frac{a - w}{x} \right)$. This changes when a straight line from the point intersects instead with the top boundary. When this occurs, two things happen:

- The slope of the upper boundary at some critical value $x_{c}$, given by $\frac{d}{dx} \frac{a + w}{x} = - \frac{a + w}{x^2}$, is equal to the slope of the line from the point on the upper boundary to the original point
- The line from $x_{c}$ to the original point, in addition to touching the upper boundary, also meets the lower boundary at some $x_{i}$ or $x_{f}$, which is the smallest or largest $x$ in the set. These values are also the smallest or largest possible values of $x$ for which a line drawn from them to the original point meets the upper boundary, and does so exactly once.

Note that there are two critical values.

Our set is defined as follows:

- The lower boundary of our set is given by $(x, \frac{a - w}{x})$
- At the far left, there is an $x_i$ such that a line towards $(x_i, \frac{a - w}{x_i})$ touches a point on the upper boundary $(x_{c, 1}, \frac{a + w}{x_{c, 1}})$. For $x$ values $x_{c, 1} > x > x_{i}$, the upper edge of the set is given by this straight line.
- For $x_{c, 2} > x > x_{c, 1}$, the upper edge of the set is given by the upper boundary $(x, \frac{a + w}{x})$. $x_{c, 2}$ is a critical point similar to $x_{c, 1}$, but for the larger $x$ values.
- At the far right, there is an $x_f$ that plays a similar role to $x_i$.

### Critical Values

For the critical value, the slope of the upper boundary curve at $x_{c}$

$$
\frac{d}{dx} \frac{a + w}{x} = - \frac{a + w}{x^2}
$$

is equal to the slope of the line from the point on the upper boundary curve to the original point, denoted by $(x_2, y_2) = (b, a/b)$ so that it lies on the curve $xy = a$ between the two boundaries:

$$
\frac{y_2 - y_1}{x_2 - x_1} = \frac{\frac{a}{b} - \frac{a + w}{x_c}}{b - x_c}
$$

Setting these equal yields

$$
\begin{split}
-\frac{a + w}{x_c^2} = \frac{\frac{a}{b} - \frac{a + w}{x_c}}{b - x_c} \implies -\big(a + w\big)\big(b - x_c\big) &= \frac{a}{b}x_c^2 - (a + w)x_c\\
\implies 0 &= a x_c^2 - 2 b (a + w) x_c + b^2 (a + w)\\
\implies x_c &= \frac{b(a + w)}{a} \left(1 \pm \sqrt{\frac{w}{a+w}} \right)
\end{split}
$$

where $\pm$ selects which of the two critical points is of interest. Recall that $x_{c, 1} < b$ denotes the critical point to the left of the original point's $x$ value (the negative sign), and $x_{c, 2} > b$ the critical point to the right (the positive sign).
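The tangency condition can be checked numerically. The sketch below assumes the starting point is $(b, a/b)$ and evaluates $x_c = \frac{b(a + w)}{a}\left(1 \pm \sqrt{\frac{w}{a + w}}\right)$ for illustrative parameter values, comparing the slope of the upper boundary at $x_c$ with the slope of the chord to the point. The function name `critical_points` and the chosen numbers are hypothetical.

```python
import math

def critical_points(a, w, b):
    """Return (x_c1, x_c2), the tangent points on y = (a + w)/x seen from (b, a/b)."""
    s = math.sqrt(w / (a + w))
    scale = b * (a + w) / a
    return scale * (1 - s), scale * (1 + s)

a, w, b = 1.0, 0.5, 2.0                      # illustrative values, a > w > 0, b > 0
for x_c in critical_points(a, w, b):
    y_c = (a + w) / x_c                      # point on the upper boundary
    tangent_slope = -(a + w) / x_c**2        # derivative of (a + w)/x at x_c
    chord_slope = (a / b - y_c) / (b - x_c)  # slope of the line to the original point
    print(f"x_c={x_c:.6f}  tangent={tangent_slope:.6f}  chord={chord_slope:.6f}")
```

The two slopes should agree at both critical points, with one critical point on each side of $x = b$.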

## Boundary Lines

We need the slope and y-intercept of the lines defining our boundary. We compute these here. The slope is given by the slope at our critical point, meaning

$$
\boxed{m_{i/f} = - \frac{a + w}{x_{c}^2} = -\frac{a^2}{b^2(a + w)}\frac{1}{\left(1 \pm \sqrt{\frac{w}{a + w}}\right)^2}}
$$

For the y-intercepts, we know that $m_{\pm} x_c + d_{\pm} = y_c = \frac{a + w}{x_c}$ (the $-$ and $+$ signs correspond to the subscripts $i$ and $f$), implying

$$
\begin{split}
d_{\pm} &= \frac{a + w}{x_c} - m_{\pm} x_c 
\\&= \frac{a + w}{x_c} + \frac{a + w}{x_c}
\\&= \frac{a}{b} \frac{2}{1 \pm \sqrt{\frac{w}{a + w}}}
\end{split}
$$

These are remarkably simple straight lines.

## Largest / Smallest X Values

The largest and smallest x values of our set are given by intercepts between our lines and the lower boundary.

$$
\begin{split}
m x + d &= \frac{a - w}{x}
\\ &\implies - \frac{a + w}{x_c^2} x + 2\frac{a + w}{x_c} = \frac{a - w}{x}
\\ &\implies - x^2 + 2 x x_c - \frac{a - w}{a + w} x_c^2 = 0
\end{split}
$$

The solution is given by

$$
x = x_c \left(1 \pm \sqrt{\frac{2w}{a + w}} \right)
$$

Note that there are two possibilities for $x_c$, and two solutions here for $x$. For $x_{c, 1}$, on the left, the two solutions correspond to the two intersections of the straight line with the lower boundary, and we take the smaller one as $x_i$. Similarly, for $x_{c, 2}$, on the right, we take the larger solution as $x_f$, the upper bound of our set. Both are positive because $a > w$ implies $\sqrt{\frac{2w}{a + w}} < 1$.

$$
x_i = x_{c, 1} \left(1 - \sqrt{\frac{2w}{a + w}} \right), \quad x_f = x_{c, 2} \left(1 + \sqrt{\frac{2w}{a + w}} \right)
$$
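As a sanity check on the boundary lines and end points, the sketch below verifies two things for illustrative parameter values. First, each tangent line $y = m x + d$ passes through the original point, assumed here to be $(b, a/b)$. Second, it meets the lower boundary $y = \frac{a - w}{x}$ at $x_i$ or $x_f$. The variable names and numbers are hypothetical.

```python
import math

a, w, b = 1.0, 0.5, 2.0                      # illustrative values, a > w > 0, b > 0
s = math.sqrt(w / (a + w))
t = math.sqrt(2 * w / (a + w))

checks = []
for sign in (-1, +1):                        # -1: left tangent (i), +1: right tangent (f)
    x_c = b * (a + w) / a * (1 + sign * s)   # critical point on the upper boundary
    m = -(a + w) / x_c**2                    # slope of the tangent line
    d = 2 * (a + w) / x_c                    # its y-intercept
    x_end = x_c * (1 + sign * t)             # x_i for sign = -1, x_f for sign = +1
    through_point = m * b + d - a / b        # zero if the line passes through the point
    on_lower = m * x_end + d - (a - w) / x_end  # zero if it meets y = (a - w)/x there
    checks.append((x_end, through_point, on_lower))
    print(f"x_end={x_end:.6f}  residuals=({through_point:.2e}, {on_lower:.2e})")
```

Both residuals should vanish up to rounding, and the two end points should bracket $x = b$.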

## Local Volume Integral

The full integral is given by

$$
V = \int_{x_i}^{x_{c, 1}}\bigg( m_i x + d_i - \frac{a - w}{x} \bigg) dx + \int_{x_{c, 1}}^{x_{c, 2}}\bigg(\frac{2 w}{x} \bigg) dx + \int_{x_{c, 2}}^{x_f}\bigg( m_f x + d_f - \frac{a - w}{x} \bigg) dx
$$

Carrying out the integrals yields

$$
\begin{split}
V &= \bigg[m_i \frac{x^2}{2} + d_i x - (a - w) \log x \bigg]_{x_i}^{x_{c, 1}} + 2w \bigg[\log x \bigg]_{x_{c, 1}}^{x_{c, 2}} + \bigg[m_f \frac{x^2}{2} + d_f x - (a - w) \log x \bigg]_{x_{c, 2}}^{x_{f}}
\\&= \bigg[m_i \frac{x^2}{2} + d_i x \bigg]_{x_i}^{x_{c, 1}} + \bigg[m_f \frac{x^2}{2} + d_f x\bigg]_{x_{c, 2}}^{x_{f}} + 2w \bigg[\log x \bigg]_{x_{c, 1}}^{x_{c, 2}} - (a - w)\bigg(\bigg[\log x \bigg]_{x_i}^{x_{c, 1}} + \bigg[\log x \bigg]_{x_{c, 2}}^{x_{f}}\bigg)
\end{split}
$$

We proceed by evaluating the logarithmic and polynomial parts separately.

### Log Terms

Expanding the log terms yields

$$
\begin{split}
V_{log} &= 2w\log x_{c,2} - 2w\log x_{c,1} - (a - w)\log x_{c,1} + (a - w)\log x_i - (a - w)\log x_f + (a - w)\log x_{c,2}
\\ &= (a + w)\log \frac{x_{c,2}}{x_{c,1}} + (a - w)\log \frac{x_i}{x_f}
\\ &= (a + w)\log \frac{x_{c,2}}{x_{c,1}} - (a - w)\log \frac{x_{c,2}\left(1 + \sqrt{\frac{2w}{a + w}} \right)}{x_{c,1}\left(1 - \sqrt{\frac{2w}{a + w}} \right)}
\\ &= 2w \log \frac{\left(1 + \sqrt{\frac{w}{a+w}} \right)}{\left(1 - \sqrt{\frac{w}{a+w}} \right)} - (a - w)\log \frac{\left(1 + \sqrt{\frac{2w}{a + w}} \right)}{\left(1 - \sqrt{\frac{2w}{a + w}} \right)}
\end{split}
$$

where we've substituted the explicit expressions for $x_{c,1}$, $x_{c,2}$, $x_i$ and $x_f$.

### Polynomial Terms

For the polynomial terms, we compute the following

$$
\begin{split}
x_{c,1}^2 - x_i^2 &= x_{c,1}^2 \bigg( 1 - \left(1 - \sqrt{\frac{2w}{a + w}} \right)^2 \bigg) 
\\& = 2 x_{c,1}^2 \bigg(\sqrt{\frac{2w}{a + w}} - \frac{w}{a + w} \bigg)
\end{split}
$$

$$
\begin{split}
x_{c,1} - x_i &= x_{c,1} \bigg( 1 - \left(1 - \sqrt{\frac{2w}{a + w}} \right) \bigg)
\\&= x_{c,1} \sqrt{\frac{2w}{a + w}}
\end{split}
$$

$$
\begin{split}
x_{f}^2 - x_{c, 2}^2 &= -x_{c,2}^2 \bigg( 1 - \left(1 + \sqrt{\frac{2w}{a + w}} \right)^2 \bigg) 
\\& = 2 x_{c,2}^2 \bigg(\sqrt{\frac{2w}{a + w}} + \frac{w}{a + w} \bigg)
\end{split}
$$

$$
\begin{split}
x_{f} - x_{c, 2} &= -x_{c,2} \bigg( 1 - \left(1 + \sqrt{\frac{2w}{a + w}} \right) \bigg)
\\&= x_{c,2} \sqrt{\frac{2w}{a + w}}
\end{split}
$$
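Note that these are all positive, as they should be. The only one that needs checking is the first, which contains the difference $\sqrt{\frac{2w}{a + w}} - \frac{w}{a + w}$; since $a > w$ gives $\frac{w}{a + w} < \frac{1}{2}$, the square root exceeds $\frac{w}{a + w}$ and the difference is positive.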

Then the coefficients

$$
\begin{split}
\frac{m_i}{2} \big( x_{c,1}^2 - x_i^2\big) &= -\frac{a + w}{x_{c, 1}^2} x_{c,1}^2 \bigg(\sqrt{\frac{2w}{a + w}} - \frac{w}{a + w} \bigg)
\\& = -(a + w) \bigg(\sqrt{\frac{2w}{a + w}} - \frac{w}{a + w} \bigg)
\\& = w - \sqrt{2w (a + w)}
\end{split}
$$

$$
\begin{split}
d_i \big( x_{c,1} - x_i \big) &= \frac{2 (a + w)}{x_{c, 1}} x_{c,1} \sqrt{\frac{2w}{a + w}} 
\\&= 2 \sqrt{2w (a + w)} 
\end{split}
$$

$$
\begin{split}
\frac{m_f}{2} \big( x_{f}^2 - x_{c, 2}^2 \big) &= -\frac{a + w}{x_{c, 2}^2}x_{c,2}^2 \bigg(\sqrt{\frac{2w}{a + w}} + \frac{w}{a + w} \bigg)
\\& = -w - \sqrt{2w (a + w)}
\end{split}
$$

$$
\begin{split}
d_f \big( x_{f} - x_{c, 2} \big) &= \frac{2 (a + w)}{x_{c, 2}} x_{c,2} \sqrt{\frac{2w}{a + w}}
\\&= 2 \sqrt{2w (a + w)} 
\end{split}
$$
Note that the slope terms are always negative, while the constants are positive.

Our polynomial volume term is

$$
\begin{split}
V_{polynomial} &= \bigg[m_i \frac{x^2}{2} + d_i x \bigg]_{x_i}^{x_{c, 1}} + \bigg[m_f \frac{x^2}{2} + d_f x\bigg]_{x_{c, 2}}^{x_{f}} 
\\&= w - \sqrt{2w (a + w)} + 2 \sqrt{2w (a + w)} - w - \sqrt{2w (a + w)} + 2 \sqrt{2w (a + w)} 
\\&= 2\sqrt{2w (a + w)}
\end{split}
$$

## Total Volume

Adding up the two contributions, we find

$$
\begin{split}
V &= 2\sqrt{2w (a + w)} + 2w \log \frac{\left(1 + \sqrt{\frac{w}{a+w}} \right)}{\left(1 - \sqrt{\frac{w}{a+w}} \right)} - (a - w)\log \frac{\left(1 + \sqrt{\frac{2w}{a + w}} \right)}{\left(1 - \sqrt{\frac{2w}{a + w}} \right)}
\end{split}
$$

Surprisingly, there is no dependence on the scale $b$ in this. All the dependence on the scale cancels out, meaning the local volume estimates are scale invariant (!!!!).
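The cancellation of $b$ can be checked numerically. The sketch below evaluates the three-piece integral with Gauss-Legendre quadrature for several values of $b$ and compares it with the closed form. It assumes $x_c = \frac{b(a + w)}{a}\left(1 \pm \sqrt{\frac{w}{a + w}}\right)$; the function names and the quadrature order are illustrative. It tests only the algebra between the integral and the closed form, not the geometric construction of the set.

```python
import numpy as np

def closed_form_volume(a, w):
    """Closed-form local volume; note that b does not appear."""
    s = np.sqrt(w / (a + w))
    t = np.sqrt(2 * w / (a + w))
    return (2 * np.sqrt(2 * w * (a + w))
            + 2 * w * np.log((1 + s) / (1 - s))
            - (a - w) * np.log((1 + t) / (1 - t)))

def quadrature_volume(a, w, b, order=64):
    """Numerically integrate the three pieces of the volume integral."""
    s = np.sqrt(w / (a + w))
    t = np.sqrt(2 * w / (a + w))
    xc1, xc2 = (b * (a + w) / a) * np.array([1 - s, 1 + s])
    xi, xf = xc1 * (1 - t), xc2 * (1 + t)
    nodes, weights = np.polynomial.legendre.leggauss(order)

    def integrate(f, lo, hi):
        # Map the Gauss-Legendre nodes from [-1, 1] to [lo, hi].
        x = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)
        return 0.5 * (hi - lo) * np.sum(weights * f(x))

    def line_gap(xc):
        # Tangent line at xc minus the lower boundary (a - w)/x.
        m, d = -(a + w) / xc**2, 2 * (a + w) / xc
        return lambda x: m * x + d - (a - w) / x

    return (integrate(line_gap(xc1), xi, xc1)
            + integrate(lambda x: 2 * w / x, xc1, xc2)
            + integrate(line_gap(xc2), xc2, xf))

if __name__ == "__main__":
    for b in (0.3, 1.0, 5.0):
        print(b, quadrature_volume(1.0, 0.5, b), closed_form_volume(1.0, 0.5))
```

The quadrature values should coincide with the closed form for every $b$, up to rounding.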

As we would like, they depend on the width of the original minimum $w$. However, they also depend on the value $a$: evaluating the expression at fixed $w = 0.5$ gives $V \approx 2.62$ for $a = 1$ and $V \approx 1.89$ for $a = 2$, so the volume shrinks as $a$ grows at fixed $w$. This suggests that minima with a given $w$ but larger $a$ are less star-convex.

The independence of the scale might make sense because our number of parameters happens to be exactly equal to the number of scale invariances. If we consider more general systems with more parameters but no scale invariance along all sides, this may no longer be true.

We'll need good numeric algorithms to confirm this result for the simple cases where parameters = scale, and then try in 3d. We might see a scale dependence in the 3D case with two parameters in the first dimension.

We also first need to confirm that this algorithm really works, testing it with better code and a good random sampling implementation. They should give the same predictions. How many directions and how much scaling will we need?