#### Neural Radiance Field in 3D computer vision

#### My paper implementation can be found [here](../src/NeRf/paper_implementation.py)

##### [Survey](https://arxiv.org/pdf/2210.00379)

##### Volume Rendering

Volume rendering is a technique for creating 2D images from 3D volumetric data. Here "volumetric data" means that properties such as *color* and *density* are defined at every point inside the space, not just on surfaces.

The core idea is:
- Shoot a ray from the camera into the scene.
- As you move along the ray, sample points in the volume.
- At each sample:
    - Query the color and density.
    - Accumulate (integrate) these along the ray to compute the final pixel color.

The color $C(r)$ accumulated along the ray $r$ is given by:
$$C(r) = \int_{t_n}^{t_f} T(t) \sigma(r(t)) \mathbf{c}(r(t), \mathbf{d}) \, dt$$

Where:
- $\mathbf{c}(r(t), \mathbf{d})$ = RGB color emitted at point $r(t)$ (in the direction $\mathbf{d}$),
- $\sigma(r(t))$ = density at point $r(t)$ (how much it blocks or emits light),
- $T(t)$ = transmittance: how much light from $r(t)$ reaches the camera without getting absorbed,
- $t_n$, $t_f$ = near and far bounds along the ray.
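
The steps above can be sketched with a toy scene. This is only an illustration under stated assumptions: `toy_field` is a hypothetical stand-in for a real scene (a unit sphere of uniform red fog), `render_ray` is a name introduced here, and the transmittance is updated with the usual exponential attenuation $\exp(-\sigma \, dt)$ per step. The integral is approximated with a crude fixed-step Riemann sum.

```python
import math

def toy_field(point, direction):
    """Hypothetical scene: a unit sphere at the origin filled with uniform red fog."""
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0
    sigma = 2.0 if inside else 0.0   # density, assumed constant inside the sphere
    color = (1.0, 0.0, 0.0)          # RGB; this toy ignores the viewing direction
    return color, sigma

def render_ray(origin, direction, t_near, t_far, n_steps=1000):
    """Approximate C(r) with a fixed-step Riemann sum over [t_near, t_far]."""
    dt = (t_far - t_near) / n_steps
    transmittance = 1.0              # T(t_near) = 1: nothing has been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for i in range(n_steps):
        t = t_near + (i + 0.5) * dt  # sample at the middle of each step
        point = tuple(o + t * d for o, d in zip(origin, direction))
        color, sigma = toy_field(point, direction)
        # Contribution T(t) * sigma * c * dt of this small segment.
        for k in range(3):
            pixel[k] += transmittance * sigma * color[k] * dt
        # Light from anything behind this segment is attenuated by it.
        transmittance *= math.exp(-sigma * dt)
    return tuple(pixel)

# A ray starting at z = -3 and looking down +z crosses the sphere for t in [2, 4].
pixel = render_ray(origin=(0.0, 0.0, -3.0), direction=(0.0, 0.0, 1.0),
                   t_near=0.0, t_far=6.0)
print(pixel)
```

For this ray the red channel should come out close to $1 - e^{-4}$ (the fog is almost opaque), while green and blue stay at zero. Increasing `n_steps` tightens the approximation.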



----
##### From the survey:

Broadly speaking, novel view synthesis using a trained NeRF model is as follows.

- For each pixel in the image being synthesized, send camera rays through the scene and generate a set of sampling points.
- For each sampling point, use the viewing direction and sampling location to compute local color and density using the NeRF MLP(s).
- Use volume rendering to produce the image from these colors and densities.

Given the volume density and color functions of the scene being rendered, volume rendering [21] is used to obtain the color $C(\mathbf{r})$ of any camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$, with camera position $\mathbf{o}$ and viewing direction $\mathbf{d}$, using

$$C(\mathbf{r}) = \int_{t_1}^{t_2} T(t) \cdot \sigma(\mathbf{r}(t)) \cdot \mathbf{c}(\mathbf{r}(t), \mathbf{d}) \, dt,$$

where $\sigma(\mathbf{r}(t))$ and $\mathbf{c}(\mathbf{r}(t), \mathbf{d})$ represent the volume density and color at point $\mathbf{r}(t)$ along the camera ray with viewing direction $\mathbf{d}$, and $dt$ represents the differential distance traveled by the ray at each integration step. $T(t)$ is the accumulated transmittance, representing the probability that the ray travels from $t_1$ to $t$ without being intercepted, given by

$$T(t) = \exp\left(-\int_{t_1}^{t} \sigma(\mathbf{r}(u)) \, du\right).$$
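
As a quick sanity check (a worked special case added here, not taken from the survey), assume a homogeneous medium: the density $\sigma$ and the color $\mathbf{c}$ are constant along the ray. Both integrals then have closed forms:

$$T(t) = \exp\left(-\sigma (t - t_1)\right), \qquad C(\mathbf{r}) = \mathbf{c} \int_{t_1}^{t_2} \sigma \, e^{-\sigma (t - t_1)} \, dt = \mathbf{c} \left(1 - e^{-\sigma (t_2 - t_1)}\right).$$

The pixel color approaches $\mathbf{c}$ as the medium becomes denser or thicker, and goes to zero as $\sigma \to 0$.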

Novel views are rendered by tracing a camera ray $\mathbf{r}$ through each pixel of the to-be-synthesized image and evaluating $C(\mathbf{r})$. This integral can be computed numerically. The original implementation [1] and most subsequent methods use a non-deterministic stratified sampling approach: the ray is divided into $N$ equally spaced bins, and one sample is drawn uniformly from each bin. The integral for $C(\mathbf{r})$ can then be approximated as

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} \alpha_i T_i \mathbf{c}_i, \quad \text{where} \quad T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right),$$

where $\delta_i$ is the distance from sample $i$ to sample $i+1$, and $(\sigma_i, \mathbf{c}_i)$ are the density and color evaluated at sample point $i$ along the ray, as computed by the NeRF MLP(s). $\alpha_i$, the opacity from alpha compositing at sample point $i$, is given by

$$\alpha_i = 1 - \exp(-\sigma_i \delta_i).$$
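
The stratified sampling and the discrete sum can be sketched in plain Python. This is a minimal sketch, not the original implementation: the function names `stratified_samples` and `composite` are introduced here, the densities and colors are hard-coded instead of coming from an MLP, and the last $\delta_i$ is taken as the distance to the far bound, which is one possible convention.

```python
import math
import random

def stratified_samples(t_near, t_far, n_bins, rng):
    """Split [t_near, t_far] into n_bins equal bins and draw one uniform sample per bin."""
    width = (t_far - t_near) / n_bins
    return [t_near + (i + rng.random()) * width for i in range(n_bins)]

def composite(t_vals, sigmas, colors, t_far):
    """Evaluate C_hat(r) = sum_i alpha_i * T_i * c_i for a single ray."""
    pixel = [0.0, 0.0, 0.0]
    weights = []
    optical_depth = 0.0  # running sum of sigma_j * delta_j over samples j < i
    for i, (sigma, color) in enumerate(zip(sigmas, colors)):
        # delta_i: distance to the next sample (far bound for the last one; an assumption).
        t_next = t_vals[i + 1] if i + 1 < len(t_vals) else t_far
        delta = t_next - t_vals[i]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        T = math.exp(-optical_depth)            # transmittance up to sample i
        weight = alpha * T
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        optical_depth += sigma * delta
    return tuple(pixel), weights

# Toy inputs: constant density and color along the ray (placeholders for MLP outputs).
rng = random.Random(0)
t_vals = stratified_samples(2.0, 6.0, 64, rng)
sigmas = [1.5] * len(t_vals)
colors = [(0.2, 0.6, 0.9)] * len(t_vals)
pixel, weights = composite(t_vals, sigmas, colors, t_far=6.0)
print(pixel, sum(weights))
```

The per-sample weights $\alpha_i T_i$ sum to at most 1. With a constant density the sum telescopes to $1 - \exp(-\sigma (t_f - t_1'))$, where $t_1'$ is the first sample position, which gives a simple check on the implementation.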

An expected depth can be calculated for the ray using the accumulated transmittance as

$$d(\mathbf{r}) = \int_{t_1}^{t_2} T(t) \cdot \sigma(\mathbf{r}(t)) \cdot t \, dt.$$
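
Applying the same quadrature as for the color gives a discrete estimate of the depth. This is a sketch by analogy with $\hat{C}(\mathbf{r})$, where $t_i$ (a symbol introduced here) denotes the distance of sample $i$ along the ray:

$$\hat{d}(\mathbf{r}) = \sum_{i=1}^{N} \alpha_i T_i \, t_i.$$

The weights $\alpha_i T_i$ are the same ones used for compositing colors, so depth can be computed in the same pass at almost no extra cost.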

----

<img src=./images/NeRF.png>