# Ray Samplers

## Overview

Once we have a set of cameras, we want to cast camera rays associated with each pixel. 

Along these rays we will sample the _field_ and aggregate the samples to predict the pixel's value (i.e., color). The parameterization of the samples is described [here](./visualize_samples.ipynb); however, we must decide where to place these samples along a ray. For this task we will use a [`Sampler`](../../reference/api/model_components/ray_sampler.rst).

In the ideal world we would compute many dense samples along a ray. Unfortunately, each additional sample adds a computation cost to the system as it needs to be processed by the _field_ which is often a neural network.

As a result it is common for NeRF methods to use on the order of 100 samples. Therefore, we want to optimize where those samples are placed in the scene.

For example, if the scene can be bounded by a box and the objects are all of similar scale, uniform sampling along the ray may be a good option. On the other hand, if the scene is unbounded (potentially extending as far as the eye can see), uniform sampling does not make sense as the samples would be very sparse for close objects. In this case a different sampling strategy like _Uniform in Disparity_ may perform better.

```{image} imgs/samplers_type-light.png
:align: center
:class: only-light
```

```{image} imgs/samplers_type-dark.png
:align: center
:class: only-dark
```
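As a rough illustration of the difference, the sketch below computes bin edges that are evenly spaced in distance and evenly spaced in disparity (1 / distance) between a near and far plane. It is plain Python for readability; the helper names `uniform_spacing` and `disparity_spacing` are hypothetical and are not part of the nerfstudio API.

```python
def uniform_spacing(near, far, num_samples):
    """Bin edges spaced evenly in distance between near and far."""
    return [near + (far - near) * i / num_samples for i in range(num_samples + 1)]


def disparity_spacing(near, far, num_samples):
    """Bin edges spaced evenly in disparity (1 / distance)."""
    edges = []
    for i in range(num_samples + 1):
        t = i / num_samples
        # Interpolate linearly in 1 / distance, then map back to distance.
        disparity = (1 - t) / near + t / far
        edges.append(1.0 / disparity)
    return edges


uniform_edges = uniform_spacing(2.0, 5.0, 4)
disparity_edges = disparity_spacing(2.0, 5.0, 4)
print(uniform_edges)
print(disparity_edges)
```

With disparity spacing the bins close to the camera are narrow and the far bins are wide, which is why it tends to suit unbounded scenes better.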

### Stratified Sampling

Most samplers have the option to _stratify_ the samples. When stratified, each sample is randomly perturbed. 

The magnitude of the perturbation is such that the sample ordering remains consistent and the overall distribution statistics are not changed. Using stratified samples during training generally improves the reconstructions as it helps prevent overfitting.

During inference, stratified sampling should be disabled (nerfstudio samplers will do this) as it can cause noisy artifacts when the camera moves.

```{image} imgs/samplers_stratified-light.png
:align: center
:class: only-light
```

```{image} imgs/samplers_stratified-dark.png
:align: center
:class: only-dark
```
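A minimal sketch of the idea, assuming the perturbation is implemented by drawing one sample uniformly inside each bin. The helpers `stratified_samples` and `deterministic_samples` are hypothetical names, and using bin centers when stratification is off is just one simple choice made for this example, not necessarily what nerfstudio does.

```python
import random


def stratified_samples(edges, rng):
    """Draw one sample uniformly inside each bin [edges[i], edges[i + 1])."""
    return [lo + (hi - lo) * rng.random() for lo, hi in zip(edges[:-1], edges[1:])]


def deterministic_samples(edges):
    """Use bin centers, a simple choice when stratification is disabled."""
    return [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]


edges = [2.0 + 3.0 * i / 8 for i in range(9)]  # 8 uniform bins between 2 and 5
rng = random.Random(0)
jittered = stratified_samples(edges, rng)
centers = deterministic_samples(edges)
print(jittered)
print(centers)
```

Because every sample stays inside its own bin, the ordering along the ray is preserved no matter how the random values fall.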

### Hierarchical Sampling

It is important to sample the scene where it has content; otherwise, the reconstruction quality will be reduced. 

One trick that is often employed in NeRF methods is to do multiple rounds of sampling. The first round can use a predefined sampler (e.g., Uniform) to generate an image. Once the space is sampled, we have an idea of which samples contributed to the final color. 

We can use this information to sample more around those regions using a `PDFSampler`. The PDF sampler is described in more detail below.
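To make "which samples contributed to the final color" concrete, here is a small sketch that assumes the usual NeRF volume rendering weights (transmittance times opacity). The function `render_weights` and the example densities are hypothetical and only for illustration.

```python
import math


def render_weights(densities, deltas):
    """Contribution of each sample to the pixel: transmittance * opacity."""
    weights = []
    transmittance = 1.0  # fraction of light that reaches the current sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # light left after passing this segment
    return weights


densities = [0.0, 0.0, 5.0, 20.0, 0.5, 0.0]  # made-up coarse densities
deltas = [0.5] * len(densities)  # segment lengths along the ray
weights = render_weights(densities, deltas)
print(weights)
```

Samples in empty space get zero weight, and samples behind an opaque surface get very little, so the weights highlight the regions worth resampling.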

## Spaced Samplers

These are the most basic samplers; they space samples based on a predefined function. These samplers all have a starting and ending distance (also known as near/far planes). The plots below are histograms of points sampled from some predefined samplers. 


```python
# COLLAPSED
import plotly.graph_objects as go
import torch
from plotly.subplots import make_subplots

from nerfstudio.cameras.rays import RayBundle
from nerfstudio.model_components import ray_samplers as ray_sampler

num_samples = 1000
near = 2
far = 5
train_stratified = False

samplers = [
    ray_sampler.UniformSampler,
    ray_sampler.LinearDisparitySampler,
    ray_sampler.SqrtSampler,
    ray_sampler.LogSampler,
]

fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=("Uniform", "Linear in Disparity", "Square Root", "Log Sampler"),
    shared_xaxes=True,
    shared_yaxes=True,
    vertical_spacing=0.1,
)

for i, Sampler in enumerate(samplers):
    sampler = Sampler(num_samples=num_samples, train_stratified=train_stratified)

    ray_bundle = RayBundle(
        origins=torch.ones([1, 3]),
        directions=torch.ones([1, 3]),
        pixel_area=torch.ones([1, 1]),
        nears=torch.ones([1, 1]) * near,
        fars=torch.ones([1, 1]) * far,
    )

    samples = sampler.generate_ray_samples(ray_bundle)

    trace = go.Histogram(x=samples.frustums.starts[0, :, 0], nbinsx=50)
    fig.append_trace(trace, i // 2 + 1, i % 2 + 1)

fig.update_yaxes(title_text="# Samples", row=1, col=1)
fig.update_yaxes(title_text="# Samples", row=2, col=1)
fig.update_xaxes(title_text="Distance", row=2, col=1)
fig.update_xaxes(title_text="Distance", row=2, col=2)

# Overlay both histograms
fig.update_layout(height=700, hovermode=False, showlegend=False, margin=dict(l=20, r=20, t=50, b=20))
fig.update_yaxes(range=[0, 80])
fig.update_traces(opacity=0.7)
fig.show()
```

## PDF Sampler

The Probability Distribution Function (PDF) Sampler generates samples that match a given distribution. 

In the example below we first create a `UniformSampler` to generate a set of initial samples. We then assign weights to each of these samples to define the PDF (here it is an arbitrary function, but usually you would use the predicted weights from the field). The left plot shows the target PDF; on the right we plot a histogram of samples generated from the `PDFSampler`.

```python
# COLLAPSED
import plotly.graph_objects as go
import torch
from plotly.subplots import make_subplots

from nerfstudio.cameras.rays import RayBundle
from nerfstudio.model_components import ray_samplers as ray_sampler

num_coarse_samples = 20
num_samples = 1000
near = 2
far = 5
train_stratified = False

fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=("PDF", "Samples"),
)

uniform_sampler = ray_sampler.UniformSampler(num_samples=num_coarse_samples, train_stratified=train_stratified)
pdf_sampler = ray_sampler.PDFSampler(num_samples=num_samples, train_stratified=train_stratified, include_original=False)

ray_bundle = RayBundle(
    origins=torch.ones([1, 3]),
    directions=torch.ones([1, 3]),
    pixel_area=torch.ones([1, 1]),
    nears=torch.ones([1, 1]) * near,
    fars=torch.ones([1, 1]) * far,
)

coarse_ray_samples = uniform_sampler(ray_bundle)

# Generate arbitrary PDF
weights = torch.ones(num_coarse_samples)
weights += torch.sin(torch.linspace(0, 3 * torch.pi, num_coarse_samples))
weights += torch.sin(torch.linspace(0, 0.5 * torch.pi, num_coarse_samples))
weights -= torch.min(weights)
weights /= torch.sum(weights)

samples = pdf_sampler.generate_ray_samples(ray_bundle, coarse_ray_samples, weights[None, :, None], num_samples)

# Plotting stuff
x = torch.ones((num_coarse_samples * 2))
x[::2] = coarse_ray_samples.frustums.starts[0, :, 0]
x[1::2] = coarse_ray_samples.frustums.ends[0, :, 0]

y = torch.ones((num_coarse_samples * 2))
y[::2] = weights
y[1::2] = weights

pdf_trace = go.Scatter(x=x, y=y)
fig.append_trace(pdf_trace, 1, 1)

samples_trace = go.Histogram(x=samples.frustums.starts[0, :, 0], nbinsx=100)
fig.append_trace(samples_trace, 1, 2)

fig.update_yaxes(title_text="# Samples", row=1, col=2)
fig.update_xaxes(title_text="Distance", row=1, col=1)
fig.update_xaxes(title_text="Distance", row=1, col=2)

# Overlay both histograms
fig.update_layout(height=400, hovermode=False, showlegend=False, margin=dict(l=20, r=20, t=50, b=20))
fig.update_traces(opacity=0.7)
fig.show()
```

### The math behind

:::{note}
In this section we try to explain everything in the context of NeRFs. We try our best to be thorough, but these concepts usually have a lot more to explore. Also, being formal is not always the best way to explain something, so forgive us some simplifications.
:::

We previously mentioned that a PDF is a Probability Distribution Function (more commonly called a probability density function). That is, a mathematical function that describes the probability of different possible values of a variable. In more human terms, a PDF is a function that describes the probability of a variable (a statistical variable) having a certain value. So with a formula like:

$$
    PDF(X=2) = 0.32
$$

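We are saying that the probability of a variable $X$ being equal to 2 is 32%. It's like asking the function what the probability is of getting 2 as a value for a certain distribution described by a PDF. Strictly speaking, for a continuous variable the PDF gives a density rather than a probability, and probabilities come from integrating it over an interval, but this simplified reading is enough for our purposes.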

To understand the sampler we also need to understand what a Cumulative Distribution Function or CDF is. A CDF is a function that describes the probability of a variable (again, a statistical variable) being less than or equal to a certain value $x$. You can see this as a fancy way to describe the sum of probabilities up to a point $x$. If you ever studied calculus, this idea actually makes a lot of sense once we take a look at the mathematical formulation of a CDF:

$$
    CDF(x) = P(X \leq x) = \int_{-\infty}^x f_X(t)   dt
$$

But how does that relate to sampling points in NeRFs? Well, if we take the density from our first sampling (usually called coarse sampling) and normalize those values, we can interpret the result as a piece-wise constant PDF. You can see this as a way of figuring out where we should sample next: the denser a region is, the higher the probability and the more points we want to sample from that region.

```{image} imgs/h_sampling_type-light.png
:align: center
:class: only-light
```

```{image} imgs/h_sampling_type-dark.png
:align: center
:class: only-dark
```

To achieve this we use a technique called **Inverse Transform Sampling**. Inverse transform sampling is usually used to generate pseudo-random samples that follow a PDF, given its CDF. For our purposes that PDF would be the one described by the normalized densities obtained from the coarse sampling. 

Without digging too deep into the statistics (although [here](https://youtu.be/9ixzzPQWuAY) is a good video explaining it), the idea is to take the inverse of a CDF and, starting from points drawn from a uniform distribution, rearrange them in a way that the points will match the distribution we want. This way we are able to sample more points where we need them and fewer where we don't.

#### Let's dig a bit deeper


First, let's define our PDF. For our use case the PDF is given by the normalized densities obtained from the coarse sampling at regular intervals. The math would be:

$$
    PDF(x) = 
      \begin{cases} 
         \delta_1 &  0 \leq x < p_1  \\
         \delta_2 &  p_1 \leq x < p_2  \\
         \delta_3 &  p_2 \leq x < p_3  \\
         \vdots & \\
         \delta_{n-1} &  p_{n-2} \leq x < p_{n-1}  \\
         \delta_{n} &  p_{n-1} \leq x \\
      \end{cases}
$$

where $\delta_i$ is the normalized density sampled at $p_i$.

To keep things simple, let's build the $CDF^{-1}$ in a very practical way that will help us understand what is going on and how it's actually implemented in the source code.

When we look at the plot of a simple PDF we notice that it is built from a set of rectangles that are bounded by our coarse sample points and the normalized densities.

```{image} imgs/pdf-light.png
:align: center
:class: only-light
```

```{image} imgs/pdf-dark.png
:align: center
:class: only-dark
```

From here it is easy to see that the CDF can be built by simply adding the rectangles along our sampling area, yielding something like this:

```{image} imgs/cdf-light.png
:align: center
:class: only-light
```

```{image} imgs/cdf-dark.png
:align: center
:class: only-dark
```

Under this notion, we can compute the values of our CDF by simply adding the values from our PDF accordingly. Something like this:

$$
    CDF_{p_i} = CDF_{p_{i-1}} + PDF_{p_i} 
$$
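The running sum above can be sketched in a few lines of plain Python. This is only an illustration: `build_cdf` is a hypothetical helper, the weights are made up, and the sketch assumes the weights sum to something greater than zero.

```python
def build_cdf(weights):
    """Normalize weights into a piecewise-constant PDF and accumulate its CDF."""
    total = sum(weights)  # assumed to be > 0
    pdf = [w / total for w in weights]
    cdf = [0.0]  # CDF value at the start of the first bin
    for p in pdf:
        cdf.append(cdf[-1] + p)  # CDF_{p_i} = CDF_{p_{i-1}} + PDF_{p_i}
    return pdf, cdf


pdf, cdf = build_cdf([1.0, 3.0, 4.0, 2.0])
print(pdf)
print(cdf)
```

Note that the CDF has one more entry than the PDF: it is defined at the bin edges, starting at 0 and ending at (approximately) 1.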

Now, our goal is to go from a uniform distribution to our PDF. We accomplish this by inverting our CDF. If we look at the previous image, we can see that the $Y$-axis represents the accumulated probability (the CDF value at a point $x$) while the $X$-axis represents the range of our random variable (the statistical one), in our case the distance represented by the $p_i$ points.

The values of the CDF lie in $[0,1]$, just like samples from a uniform distribution on $[0,1]$. So, the idea is to take a Y-value within the range $[0,1]$ and figure out the X-value it comes from, a.k.a. the inverse of the function. In the image below we can see why this produces values that match our PDF: the higher the density (and consequently the probability), the more values will be "trapped" within the boundaries of that density.  

```{image} imgs/inverse-light.png
:align: center
:class: only-light
```

```{image} imgs/inverse-dark.png
:align: center
:class: only-dark
```

Okay, but how do we do that? First, let's flip our plot so the values of the CDF match our X-axis (just like what happens when we invert a function), and from now on let's think about this axis as the U-axis. 

```{image} imgs/flip-light.png
:align: center
:class: only-light
```

```{image} imgs/flip-dark.png
:align: center
:class: only-dark
```

Let's focus on the area where our point $u$ "hits" the CDF. What we want is to first figure out which rectangle or "bin" our point falls into. From here, we can trace a line from the last point of the previous bin to the last point of our current bin. With this line we can easily map from the uniform values to the distance by linear interpolation. The formula would look something like this:


```{image} imgs/interpolation-light.png
:align: center
:class: only-light
```

```{image} imgs/interpolation-dark.png
:align: center
:class: only-dark
```

$$
   x = p_{i-1} + ( u - CDF_{p_{i-1}} ) \frac{ p_i - p_{i-1} }{ CDF_{p_i} - CDF_{p_{i-1}} }
$$

With this simple formula we can generate as many fine samples as we want, matching the distribution of our PDF, which ultimately matches the density of our coarse samples.
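The bin lookup and linear interpolation described above can be sketched as follows. This is a simplified, framework-free illustration rather than the nerfstudio implementation: `sample_pdf` is a hypothetical helper, the bin edges and weights are made up, and the sketch skips the padding and numerical safeguards a real sampler needs (for example, it assumes the last bin has a non-zero weight).

```python
import bisect
import random


def sample_pdf(edges, weights, num_samples, rng):
    """Draw samples from the piecewise-constant PDF defined by bins and weights."""
    total = sum(weights)
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)

    samples = []
    for _ in range(num_samples):
        u = rng.random()  # uniform value in [0, 1)
        # Index i of the bin whose CDF range [cdf[i], cdf[i + 1]) contains u.
        i = bisect.bisect_right(cdf, u) - 1
        i = min(max(i, 0), len(weights) - 1)  # guard against rounding at the ends
        lo, hi = edges[i], edges[i + 1]
        # Linear interpolation inside the bin, as in the formula above.
        x = lo + (u - cdf[i]) * (hi - lo) / (cdf[i + 1] - cdf[i])
        samples.append(x)
    return samples


edges = [2.0, 2.75, 3.5, 4.25, 5.0]  # coarse bin edges between near and far
weights = [1.0, 3.0, 4.0, 2.0]  # made-up coarse weights, one per bin
fine = sample_pdf(edges, weights, 10000, random.Random(0))
in_third_bin = sum(1 for x in fine if 3.5 <= x < 4.25) / len(fine)
print(in_third_bin)
```

The third bin holds 40% of the total weight, so roughly 40% of the fine samples should land in it.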

#### Demonstration

We are not pulling the formulas out of nowhere; here is a formal-ish explanation:

Recalling our PDF:

$$
    PDF(x) = 
      \begin{cases} 
         \delta_1 &  0 \leq x < p_1  \\
         \delta_2 &  p_1 \leq x < p_2  \\
         \delta_3 &  p_2 \leq x < p_3  \\
         \vdots & \\
         \delta_{n-1} &  p_{n-2} \leq x < p_{n-1}  \\
         \delta_{n} &  p_{n-1} \leq x \\
      \end{cases}
$$

Starting from the formal definition of the CDF:

$$ 
\begin{aligned} 
    CDF(x) &= \int_{-\infty}^x PDF(t) \, dt
\end{aligned}
$$

We can see that the goal is to compute the area under the curve of our PDF. Thankfully, for our **piece-wise constant** function we don't actually need complex calculations to achieve this, but a clever way to visualize it will help us. 

As before, we are building the CDF by adding the rectangles of the PDF:


```{image} imgs/pdf-cdf-light.png
:align: center
:class: only-light
```

```{image} imgs/pdf-cdf-dark.png
:align: center
:class: only-dark
```

However, if we look closely at the progress of the area, we don't jump from one value to another; we actually have a linear progression while we traverse the function. Something more like this:

```{image} imgs/integral-light.png
:align: center
:class: only-light
```

```{image} imgs/integral-dark.png
:align: center
:class: only-dark
```

With this notion we see that our CDF can be thought of as a series of linear functions defined by our coarse samples. This allows us to express our CDF with a simple two-point line equation: 

```{image} imgs/line-eq-light.png
:align: center
:class: only-light
```

```{image} imgs/line-eq-dark.png
:align: center
:class: only-dark
```

$$
\begin{aligned} 
   CDF(x) &= CDF_{p_{i-1}} + ( x - p_{i-1} )( \frac{CDF_{p_i} - CDF_{p_{i-1}} }{ p_i - p_{i-1} } )
\end{aligned}
$$


From here it is trivial to set $CDF(x) = u$ and solve for $x$, giving us the formula to generate our fine samples.

$$
   x = p_{i-1} + ( u - CDF_{p_{i-1}} ) \frac{ p_i - p_{i-1} }{ CDF_{p_i} - CDF_{p_{i-1}} }
$$

As you can see, this is the exact same formula that we derived from our more practical notion.
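For completeness, the intermediate algebra can be written out step by step. Here $u$ denotes the value of the CDF at the point $x$ we are looking for, and we assume $CDF_{p_i} \neq CDF_{p_{i-1}}$ (the bin has non-zero probability) so the division is valid:

$$
\begin{aligned}
   u &= CDF_{p_{i-1}} + ( x - p_{i-1} ) \frac{ CDF_{p_i} - CDF_{p_{i-1}} }{ p_i - p_{i-1} } \\
   u - CDF_{p_{i-1}} &= ( x - p_{i-1} ) \frac{ CDF_{p_i} - CDF_{p_{i-1}} }{ p_i - p_{i-1} } \\
   x - p_{i-1} &= ( u - CDF_{p_{i-1}} ) \frac{ p_i - p_{i-1} }{ CDF_{p_i} - CDF_{p_{i-1}} } \\
   x &= p_{i-1} + ( u - CDF_{p_{i-1}} ) \frac{ p_i - p_{i-1} }{ CDF_{p_i} - CDF_{p_{i-1}} }
\end{aligned}
$$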

#### Wrap things up

The process to generate our fine samples using inverse transform sampling is:

```{image} imgs/h_sampling_ex_type-light.png
:align: center
:class: only-light
```

```{image} imgs/h_sampling_ex_type-dark.png
:align: center
:class: only-dark
```

This is the basic idea, but if you look at the [source code](https://github.com/MashiPe/nerfstudio/blob/main/nerfstudio/model_components/ray_samplers.py#L251) you will notice extra implementation details. We are confident that you will figure them out.