# Assignment 3: Neural Volume Rendering and Surface Rendering

#### Submitted by: Shahram Najam Syed
#### Andrew-ID: snsyed
#### Date: 11th March, 2025
#### Late days used: 0

## A. Neural Volume Rendering (80 points)

### 0. Transmittance Calculation (10 points)

<img src="./output/figure1.png">

Since the transmittance satisfies
$$
\frac{dT}{dy} = -\sigma(y)T
$$

integrating gives the base equation for transmittance:
$$
T = e^{-\int \sigma(y) dy}
$$

So,
$$
T(y_1, y_2) = e^{-\int_{y_1}^{y_2} \sigma(y) dy} = e^{-2}
$$

$$
T(y_2, y_4) = e^{-\int_{y_2}^{y_3} \sigma(y) dy} \times e^{-\int_{y_3}^{y_4} \sigma(y) dy} = e^{-30.5}
$$

$$
T(x, y_4) = T(x, y_2) \times T(y_2, y_4) = T(x, y_1) \times T(y_1, y_2) \times T(y_2, y_4) = e^{-32.5}
$$

$$
T(x, y_3) = T(x, y_1) \times T(y_1, y_2) \times T(y_2, y_3) = e^{-2.5}
$$
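
The arithmetic above can be checked with a short sketch. The per-segment optical depths below are read off the results in this section rather than from the figure itself: $2$ for $y_1 \to y_2$, $0.5$ for $y_2 \to y_3$, $30$ for $y_3 \to y_4$, and an inferred $0$ for $x \to y_1$ (which is what the products above imply). The names `optical_depth`, `order`, and `transmittance` are illustrative only.

```python
import math

# Optical depth (integral of sigma) per segment, as implied by the results above.
# The x -> y1 value of 0 is an inference: the products above require T(x, y1) = 1.
optical_depth = {
    ("x", "y1"): 0.0,
    ("y1", "y2"): 2.0,
    ("y2", "y3"): 0.5,
    ("y3", "y4"): 30.0,
}
order = ["x", "y1", "y2", "y3", "y4"]


def transmittance(start, end):
    """Transmittance between two points: exp(-sum of segment optical depths)."""
    i, j = order.index(start), order.index(end)
    total = sum(optical_depth[(order[k], order[k + 1])] for k in range(i, j))
    return math.exp(-total)


t_x_y4 = transmittance("x", "y4")    # exp(-32.5)
t_y2_y4 = transmittance("y2", "y4")  # exp(-30.5)
t_x_y3 = transmittance("x", "y3")    # exp(-2.5)
```

Because transmittance is multiplicative across segments, summing optical depths and exponentiating once is equivalent to multiplying the per-segment terms.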


---

### 1. Differentiable Volume Rendering

#### 1.3. Ray sampling (5 points)

<table>
<tr>
<th>Grid Visualization</th>
<th>Ray Visualization</th>
</tr>
<tr>
<td><img src="./images/1_3_xygrid.png"></td>
<td><img src="./images/1_3_rays.png"></td>
</tr>
</table>

---

#### 1.4. Point sampling (5 points)

<center><img src="./images/1_4_pts_sampled.png"></center>

---

#### 1.5. Volume rendering (20 points)

<table>
<tr>
<th>Color Visualization</th>
<th>Depth Visualization</th>
</tr>
<tr>
<td><img src="./images/part_1.gif"></td>
<td><img src="./images/1_5_depth.png"></td>
</tr>
</table>

---

### 2. Optimizing a basic implicit volume

#### 2.1. Random ray sampling (5 points)

Implemented in `ray_utils.py`.

---

#### 2.2. Loss and training (5 points)

* Box side length: (2.02, 1.50, 1.50)
* Center: (0.25, 0.25, 0.00)

---

#### 2.3. Visualization

<center><img src="./images/part_2.gif"></center>

---

### 3. Optimizing a Neural Radiance Field (NeRF) (20 points)

<table>
<tr>
<th>Epochs=10</th>
<th>Epochs=50</th>
<th>Epochs=100</th>
<th>Epochs=250</th>
</tr>
<tr>
<td><img src="./images/part_epoch_10.gif"></td>
<td><img src="./images/part_epoch_50.gif"></td>
<td><img src="./images/part_epoch_100.gif"></td>
<td><img src="./images/part_epoch_250.gif"></td>
</tr>
</table>

---

### 4. NeRF Extras

#### 4.1 View Dependence (10 points)

<table>
<tr>
<th>Low-res with no view dependence</th>
<th>Low-res with view dependence</th>
</tr>
<tr>
<td><img src="./images/part_4_1_lr_nvd.gif"></td>
<td><img src="./images/part_4_1_lr_vd.gif"></td>
</tr>
</table>

<table>
<tr>
<th>High-res with no view dependence</th>
<th>High-res with view dependence</th>
</tr>
<tr>
<td><img src="./images/part_4_1_hr_nvd.gif"></td>
<td><img src="./images/part_4_1_hr_vd.gif"></td>
</tr>
</table>

**Observation:** Comparing the low-resolution renders with and without view dependence, I don't observe a marked difference in subtle lighting variations and specular highlights, which makes sense given the limited resolution. In the high-resolution renders, by contrast, these subtle variations are much more evident.

**Trade-offs between view dependence and generalization quality:**
* View-dependent methods capture intricate scene details by leveraging viewpoint-specific information, yielding photorealistic renderings that generalized approaches may lack.
* Given ample resolution (as observed above), view-dependent methods produce crisper, higher-fidelity, and more defined results because they can adapt the rendering to specific viewing angles.
* However, view-dependent models can struggle with novel, unseen viewpoints, increasing the risk of overfitting to the training data. In contrast, generalized methods prioritize robustness across diverse angles.
* The added detail comes at a cost: greater model complexity leads to higher computational and memory expense compared to lightweight, parameter-efficient generalized approaches.

---

#### 4.2 Coarse/Fine Sampling (10 points)

<table>
<tr>
<th>Scene</th>
<th>Before Coarse/Fine Sampling</th>
<th>After Coarse/Fine Sampling</th>
</tr>
<tr>
<td>Materials</td>
<td><img src="./images/4_2_wo_cfs.gif"></td>
<td><img src="./images/4_2_w_cfs.gif"></td>
</tr>
<tr>
<td>Lego</td>
<td><img src="./images/4_2_wo_cfs_lego.gif"></td>
<td><img src="./images/4_2_w_cfs_lego.gif"></td>
</tr>
</table>

**Trade-offs (Speed vs. Quality):**

* Quality Improvement: The coarse pass avoids "wasting" samples on empty regions, and the fine pass concentrates samples where the coarse pass found density. This captures high-frequency details (e.g., textures, thin structures) and yields more photorealistic renders.

* Speed Cost: The two-pass scheme queries the network twice per ray (once for the coarse samples, once for the fine samples), so training and inference time increase. When the total sample budget per ray is held fixed (e.g., 128 total = 64 coarse + 64 fine), the overhead comes from the extra pass and the importance-sampling step rather than from additional samples.
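
As a rough illustration of the fine pass, the sketch below draws fine depths by inverse-transform sampling from the piecewise-constant distribution defined by the coarse weights. It is a minimal pure-Python sketch, not the code used for the renders above; the function name `sample_fine`, the padding constant `eps`, and the example bin edges and weights are all assumptions made for illustration.

```python
import bisect
import random


def sample_fine(bin_edges, weights, n_fine, rng=random):
    """Draw n_fine depths from the piecewise-constant PDF given by coarse weights.

    bin_edges: len(weights) + 1 increasing depths bounding each coarse bin.
    weights:   non-negative per-bin rendering weights from the coarse pass.
    """
    eps = 1e-5  # keeps every bin selectable and avoids dividing by zero
    padded = [w + eps for w in weights]
    total = sum(padded)
    # Cumulative distribution over bins: cdf[0] = 0, cdf[-1] = 1.
    cdf = [0.0]
    for w in padded:
        cdf.append(cdf[-1] + w / total)
    cdf[-1] = 1.0
    samples = []
    for _ in range(n_fine):
        u = rng.random()  # uniform in [0, 1)
        # Index of the bin whose CDF interval contains u.
        i = min(bisect.bisect_right(cdf, u) - 1, len(weights) - 1)
        # Linear interpolation inside the selected bin.
        frac = (u - cdf[i]) / (cdf[i + 1] - cdf[i])
        samples.append(bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i]))
    return sorted(samples)


# Hypothetical coarse result: most of the weight sits in the bin [4, 5].
edges = [2.0, 3.0, 4.0, 5.0, 6.0]
coarse_weights = [0.0, 0.05, 0.9, 0.05]
fine_depths = sample_fine(edges, coarse_weights, n_fine=64, rng=random.Random(0))
```

Most of the 64 fine samples land in the high-weight bin, which is the behaviour the quality argument above relies on.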

---

## B. Neural Surface Rendering (50 points)

### 5. Sphere Tracing (10 points)

<center><img src="./images/part_5.gif"></center>

---

### 6. Optimizing a Neural SDF (15 points)

For the Neural SDF implementation, I designed an MLP that effectively learns to predict signed distance values for any input point in 3D space. Here's a breakdown of the architecture:

* Positional Encoding
    - The input 3D coordinates are transformed using a harmonic embedding with 4 frequencies.
    - This transformation helps the network capture high-frequency details, which is crucial for representing fine surface details in the SDF.

* Skip Connection Network
    - A deep MLP with 6 layers and 128 neurons per hidden layer.
    - Skip connections directly feed the input encoding to intermediate layers, significantly improving gradient flow during training and helping the network learn complex surfaces.

* Final Output Layer
    - Unlike density fields in NeRF that require non-negative outputs, SDF values can be positive (outside the surface), negative (inside the surface), or zero (exactly on the surface).
    - The final layer produces direct SDF values without any activation function.
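
To make the positional encoding step concrete, here is a minimal pure-Python sketch of a harmonic embedding with 4 frequencies. It assumes power-of-two frequencies and that the raw coordinates are not appended to the output; the report does not state either detail, so treat both as assumptions. Under them, a 3D point maps to 3 × 2 × 4 = 24 features.

```python
import math


def harmonic_embedding(point, n_frequencies=4):
    """Map (x, y, z) to sin/cos features at frequencies 2^0 .. 2^(n_frequencies - 1).

    Assumption: power-of-two frequencies, raw input not appended.
    """
    features = []
    for k in range(n_frequencies):
        freq = 2.0 ** k
        features.extend(math.sin(freq * c) for c in point)
        features.extend(math.cos(freq * c) for c in point)
    return features


embedded = harmonic_embedding((0.1, -0.4, 0.25))
embedding_dim = len(embedded)  # input width of the first MLP layer under these assumptions
```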


The key insight behind training an effective Neural SDF is the use of eikonal regularization, which is based on a fundamental property of SDFs:

> The gradient of a proper signed distance function should have unit norm almost everywhere in space.

To enforce this constraint, an eikonal loss function is implemented to penalize deviations from a unit gradient norm:

```python
import torch

def eikonal_loss(gradients):
    # Penalize deviation of each point's SDF gradient norm from 1 (eikonal constraint).
    gradient_norms = torch.norm(gradients, dim=-1)
    return torch.mean(torch.square(gradient_norms - 1.0))
```
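
As a sanity check on why this penalty is reasonable, the following pure-Python sketch evaluates the same residual on an analytic sphere SDF using central finite differences instead of autograd. The helper names (`sphere_sdf`, `gradient_norm`, `eikonal_residual`) and the sample points are illustrative assumptions, not part of the assignment code.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Exact signed distance to a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def gradient_norm(sdf, p, h=1e-4):
    """Norm of the central-difference gradient of sdf at point p."""
    grad = []
    for axis in range(3):
        hi = list(p)
        lo = list(p)
        hi[axis] += h
        lo[axis] -= h
        grad.append((sdf(hi) - sdf(lo)) / (2 * h))
    return math.sqrt(sum(g * g for g in grad))


def eikonal_residual(sdf, points):
    """Mean squared deviation of the gradient norm from 1."""
    return sum((gradient_norm(sdf, p) - 1.0) ** 2 for p in points) / len(points)


points = [(0.5, 0.2, -0.1), (1.5, -0.3, 0.8), (-2.0, 1.0, 0.4)]
exact = eikonal_residual(sphere_sdf, points)                       # close to 0
scaled = eikonal_residual(lambda p: 2.0 * sphere_sdf(p), points)   # close to 1
```

A true distance field gives a residual near zero, while a field with the same zero level-set but the wrong scale is penalized. That is the property the regularizer exploits.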


The model was trained for 5000 epochs with an initial learning rate of 0.0001, decayed by a step scheduler (gamma = 0.8, step size = 50) to allow fine-tuning of the surface representation.
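
For reference, the learning rate implied by these settings can be sketched as below. This assumes step-decay semantics (multiply by gamma every `step_size` scheduler steps) and that the scheduler is stepped once per epoch; the report states neither explicitly, so both are assumptions.

```python
def step_lr(epoch, base_lr=1e-4, gamma=0.8, step_size=50):
    """Learning rate after `epoch` scheduler steps under step decay."""
    return base_lr * gamma ** (epoch // step_size)


lr_start = step_lr(0)        # initial rate
lr_first_decay = step_lr(50) # after the first decay
lr_end = step_lr(5000)       # after 100 decays
```

Under these assumptions the rate has shrunk by a factor of $0.8^{100}$ by epoch 5000, which may be one reason that longer runs add little.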

**Hyperparameter experiments:**
* **No. of epochs:** Beyond 5000 epochs, further training yielded only marginal gains.
* **Eikonal loss weight:** Increasing the eikonal loss weight distorts the reconstruction, while smaller weights tend to lose parts of the reconstruction.

<table>
<tr>
<th>Input pointcloud</th>
<th>Epochs=100</th>
<th>Epochs=500</th>
<th>Epochs=1000</th>
<th>Epochs=5000</th>
<th>Epochs=10000</th>
<th>Epochs=15000</th>    
</tr>
<tr>
<td><img src="./images/part_6_input.gif"></td>
<td><img src="./images/part_6_100.gif"></td>
<td><img src="./images/part_6_500.gif"></td>
<td><img src="./images/part_6_1000.gif"></td>
<td><img src="./images/part_6_5000.gif"></td>
<td><img src="./images/part_6_10000.gif"></td>
<td><img src="./images/part_6_15000.gif"></td>
</tr>
</table>

<table>
<tr>
<th>Input pointcloud</th>
<th>w=0.025</th>
<th>w=0.1</th>
<th>w=0.5</th>
<th>w=1.0</th>
<th>w=5.0</th>
</tr>
<tr>
<td><img src="./images/part_6_input.gif"></td>
<td><img src="./images/part_6_iw_1.gif"></td>
<td><img src="./images/part_6_iw_2.gif"></td>
<td><img src="./images/part_6_iw_3.gif"></td>
<td><img src="./images/part_6_iw_4.gif"></td>
<td><img src="./images/part_6_iw_5.gif"></td>
</tr>
</table>

---

### 7. VolSDF (15 points)

The following is an ablation study over varying values of $\alpha$ and $\beta$.

<table>
<tr>
<th>Alpha</th>
<th>Beta</th>
<th>Geometry</th>
<th>Color</th>
</tr>
<tr>
<td>10.0 (default)</td>
<td>0.05 (default)</td>
<td><img src="./images/part_7_geometry_a_10_b_0.05.gif"></td>
<td><img src="./images/part_7_a_10_b_0.05.gif"></td>
</tr>
<tr>
<td>1.0</td>
<td>0.05</td>
<td><img src="./images/part_7_geometry_a_1_b_0.05.gif"></td>
<td><img src="./images/part_7_a_1_b_0.05.gif"></td>
</tr>
<tr>
<td>100.0</td>
<td>0.05</td>
<td><img src="./images/part_7_geometry_a_100_b_0.05.gif"></td>
<td><img src="./images/part_7_a_100_b_0.05.gif"></td>
</tr>
<tr>
<td>10.0</td>
<td>0.1</td>
<td><img src="./images/part_7_geometry_a_10_b_0.1.gif"></td>
<td><img src="./images/part_7_a_10_b_0.1.gif"></td>
</tr>
<tr>
<td>10.0</td>
<td>0.5</td>
<td><img src="./images/part_7_geometry_a_10_b_0.5.gif"></td>
<td><img src="./images/part_7_a_10_b_0.5.gif"></td>
</tr>
</table>

#### 1. How does high beta bias your learned SDF? What about low beta?

- **High beta** creates a smoother density function around the surface boundary, leading to:
  - Less precise surface boundaries
  - Smoother overall representation
  - More volumetric-like appearance with gradual falloff

- **Low beta** creates a sharp transition in density near the zero level-set of the SDF, biasing the model to learn:
  - Crisp, well-defined surface boundaries
  - Higher precision in surface localization
  - More binary-like distinctions between inside and outside

Mathematically, as beta approaches zero, the density function approaches a step function at the surface boundary.

#### 2. Would an SDF be easier to train with volume rendering and low beta or high beta? Why?

An SDF would be easier to train with volume rendering using a **higher beta** value because:

- Higher beta creates smoother gradients throughout the volume space.
- These smoother gradients provide more meaningful learning signals during backpropagation.
- The optimization landscape becomes less steep and easier to navigate.
- Training is more numerically stable since density changes gradually.

With very **low beta** values, training often becomes unstable because:

- The density function approaches a step function with near-zero gradients everywhere except right at the surface.
- This creates **vanishing gradient** problems during optimization.
- Small errors in the SDF can lead to large changes in rendered appearance.
- The loss landscape becomes more rugged with many local minima.
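
The vanishing-gradient argument can be made concrete with the derivative of the VolSDF density (the form given in Section 8.3) with respect to the signed distance $d$, which works out to $-\frac{\alpha}{2\beta} e^{-|d|/\beta}$ on both sides of the surface. The sketch below evaluates it; the function name `density_slope` and the probe distance of 0.2 are illustrative choices.

```python
import math


def density_slope(d, alpha=10.0, beta=0.05):
    """Derivative of the VolSDF density w.r.t. the signed distance d.

    Equals -(alpha / (2 * beta)) * exp(-|d| / beta) for both d <= 0 and d > 0.
    """
    return -(alpha / (2.0 * beta)) * math.exp(-abs(d) / beta)


# Slope at the surface and at a point 0.2 away, for a large and a very small beta.
at_surface_small_beta = density_slope(0.0, beta=0.005)
away_small_beta = density_slope(0.2, beta=0.005)
away_large_beta = density_slope(0.2, beta=0.5)
```

With a very small beta the slope is large exactly at the surface and effectively zero a short distance away, whereas a larger beta keeps a usable signal at the same distance.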

#### 3. Would you be more likely to learn an accurate surface with high beta or low beta? Why?

We would likely learn a more accurate surface with a **lower beta** value (though not too low) because:

- Lower beta encourages the model to precisely localize the surface boundary.
- The sharper transition creates stronger incentives for the network to accurately model the zero level-set.
- Fine details and sharp features are better preserved.
- The rendered output more closely resembles the true surface geometry.

##### Trade-offs:
- **High beta** leads to overly smooth surfaces that lose detail.
- **Too low a beta** makes training unstable and prone to getting stuck in poor local minima.

The optimal approach is often a **curriculum strategy**:
1. Start with a **higher beta** for stable initial training.
2. Gradually **reduce beta** to refine surface details as training progresses.
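
One possible way to express such a curriculum is a geometric interpolation between two beta values, sketched below. This schedule was not part of the experiments in this report; the function name `beta_schedule` and its shape are hypothetical, and the endpoints 0.5 and 0.05 are simply borrowed from the ablation table above.

```python
def beta_schedule(step, total_steps, beta_start=0.5, beta_end=0.05):
    """Geometric interpolation from beta_start to beta_end over total_steps.

    Hypothetical schedule for illustration; steps outside [0, total_steps] are clamped.
    """
    t = min(max(step / total_steps, 0.0), 1.0)
    return beta_start * (beta_end / beta_start) ** t


betas = [beta_schedule(s, 1000) for s in (0, 250, 500, 750, 1000)]
```

A geometric rather than linear decay spends more of the training run at small beta values, which is where surface refinement happens. Whether that helps in practice would need to be tested.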

---

### 8. Neural Surface Extras

#### 8.1. Render a Large Scene with Sphere Tracing (10 points)

---

#### 8.2 Fewer Training Views (10 points)

##### Comparing VolSDF and NeRF with Fewer Training Views

When reducing the number of training views from **100** to fewer views (like 10), several interesting differences emerge between **VolSDF** and **NeRF** approaches.

* **VolSDF Performance**
    - **With 20 views**: VolSDF maintains relatively good geometry but shows some artifacts in areas with limited observation.
    - **With 10 views**: The geometric structure is still recognizable, but there's notable deterioration in surface quality with shadows and distortions appearing.

* **NeRF Performance**
    - **With 20 views**: NeRF produces visually pleasing renderings but with less geometric consistency.
    - **With 10 views**: NeRF struggles significantly, with blurry results and phantom geometry in unobserved regions.

##### Analysis of the Differences
The key advantage of **VolSDF** in low-view settings comes from its **explicit surface representation** through the **SDF constraint**. The **eikonal regularization** enforces a proper distance field, which provides strong **geometric prior knowledge** even in regions with sparse observations. However, **NeRF** can sometimes produce **more visually appealing results** in terms of texture and color blending, even when the underlying geometry is incorrect. This is because NeRF focuses purely on **appearance matching** without geometric constraints.

Overall, **VolSDF is better suited for preserving geometry in low-view scenarios**, while **NeRF can produce more visually appealing but less geometrically accurate results**.


<table>
<tr>
<th>Training Views</th>
<th>VolSDF (Geometry)</th>
<th>VolSDF (Color)</th>
<th>NeRF</th>
</tr>
<tr>
<td>10</td>
<td><img src="./images/part_8_2_geometry_10_views.gif"></td>
<td><img src="./images/part_8_2_10_views.gif"></td>
<td><img src="./images/part_8_2_nerf_10_views.gif"></td>
</tr>
<tr>
<td>20</td>
<td><img src="./images/part_8_2_geometry_50_views.gif"></td>
<td><img src="./images/part_8_2_50_views.gif"></td>
<td><img src="./images/part_8_2_nerf_50_views.gif"></td>
</tr>
<tr>
<td>All views</td>
<td><img src="./images/part_8_2_geometry_all_views.gif"></td>
<td><img src="./images/part_8_2_all_views.gif"></td>
<td><img src="./images/part_8_2_nerf_all_views.gif"></td>
</tr>
</table>

---

#### 8.3 Alternate SDF to Density Conversions (10 points)

##### VolSDF Method
The **VolSDF** approach uses this function:

$$
\sigma(x) = \alpha \cdot 
\begin{cases} 
1 - \frac{1}{2} e^{d(x)/\beta}, & \text{if } d(x) \leq 0 \\ 
\frac{1}{2} e^{-d(x)/\beta}, & \text{if } d(x) > 0 
\end{cases}
$$

Where:
- $d(x)$ is the **SDF value**.
- $\alpha$ controls the **overall density magnitude**.
- $\beta$ controls the **transition sharpness**.
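
The piecewise definition above translates directly into code. The sketch below is a scalar pure-Python version for illustration (a training implementation would operate on tensors); the default values are the $\alpha = 10$ and $\beta = 0.05$ defaults from Section 7.

```python
import math


def volsdf_density(d, alpha=10.0, beta=0.05):
    """VolSDF density: alpha times the Laplace CDF evaluated at -d."""
    if d <= 0:
        return alpha * (1.0 - 0.5 * math.exp(d / beta))
    return alpha * 0.5 * math.exp(-d / beta)


on_surface = volsdf_density(0.0)    # alpha / 2
deep_inside = volsdf_density(-1.0)  # approaches alpha
far_outside = volsdf_density(1.0)   # approaches 0
```

The density rises from 0 far outside the surface to $\alpha$ deep inside, passing through $\alpha / 2$ at the zero level-set. Shrinking $\beta$ compresses that transition toward a step.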

##### NeuS Method
The **NeuS** "naive" approach uses:

$$
\sigma(x) = s \cdot \frac{e^{-s \cdot d(x)}}{(1 + e^{-s \cdot d(x)})^2}
$$

Where:
- $s$ is a parameter controlling **transition sharpness**.
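
The expression above is the logistic density, which equals $s \cdot \Phi(s\,d)\,(1 - \Phi(s\,d))$ with $\Phi$ the sigmoid. The scalar sketch below uses that identity with a numerically stable sigmoid so that large $s$ values do not overflow; the helper names and the default $s = 5$ (one of the values tried below) are illustrative.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def neus_density(d, s=5.0):
    """Logistic density s * e^{-s d} / (1 + e^{-s d})^2, written via the sigmoid."""
    phi = sigmoid(s * d)
    return s * phi * (1.0 - phi)


peak = neus_density(0.0)     # s / 4 at the zero level-set
inside = neus_density(-0.3)
outside = neus_density(0.3)  # same value as `inside`: the density is symmetric in d
```

Unlike the VolSDF density, this one peaks at the surface and falls off on both sides, so points deep inside the object receive almost no density.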

##### Key Differences

* **Characteristics:**
    - **VolSDF**: Asymmetric inside/outside behavior.
    - **NeuS**: Symmetric around the zero level-set.

* **Results:**
    - **VolSDF** produced **smoother, more stable** results.
    - **NeuS** created **sharper surfaces** with high $s$ values but was **less stable**.
    - **NeuS** was **highly sensitive** to the choice of $s$.

* **Trade-offs:**
    - **Higher $s$ in NeuS** → **Sharper surfaces** but increased **training instability**.
    - **VolSDF** offered **better control** through separate $\alpha$ and $\beta$ parameters.

##### Conclusion
The **NeuS** approach is **mathematically elegant** (being the derivative of the sigmoid function) but requires **careful parameter tuning**, while **VolSDF proved more robust in practice**.


<table>
  <tr>
    <th></th>
    <th>Geometric</th>
    <th>Color</th>
  </tr>
  <tr>
    <td>VolSDF</td>
    <td><img src="./images/part_7_geometry_a_10_b_0.05.gif" width="128" height="128"></td>
    <td><img src="./images/part_7_a_10_b_0.05.gif" width="128" height="128"></td>
  </tr>
  <tr>
    <td>s_val = 1</td>
    <td><img src="./images/part_8_3_geometry_s_1.gif" width="128" height="128"></td>
    <td><img src="./images/part_8_3_s_1.gif" width="128" height="128"></td>
  </tr>
  <tr>
    <td>s_val = 5</td>
    <td><img src="./images/part_8_3_geometry_s_5.gif" width="128" height="128"></td>
    <td><img src="./images/part_8_3_s_5.gif" width="128" height="128"></td>
  </tr>
  <tr>
    <td>s_val = 50</td>
    <td><img src="./images/part_8_3_geometry_s_50.gif" width="128" height="128"></td>
    <td><img src="./images/part_8_3_s_50.gif" width="128" height="128"></td>
  </tr>
  <tr>
    <td>s_val = 100</td>
    <td><img src="./images/part_8_3_geometry_s_100.gif" width="128" height="128"></td>
    <td><img src="./images/part_8_3_s_100.gif" width="128" height="128"></td>
  </tr>
</table>


---