In [41]:
from elastic_nerf.utils import wandb_utils as wu
from pathlib import Path
import pandas as pd
from IPython.display import display
from collections import defaultdict

sweep_mappings = {
    # "2uxektzo": "ngp_occ-mipnerf360-baseline",
    "kebumdc0": "ngp_occ-mipnerf360-baseline",
    # "xxjsfkbw": "ngp_prop-mipnerf360-baseline",
    "8w0wks0x": "ngp_prop-mipnerf360-baseline",
    # "qfkjdvv2": "ngp_occ-mipnerf360-sampling_single",
    # "hy03dx0e": "ngp_occ-mipnerf360-sampling",
    # "wsxh6gjo": "ngp_prop-mipnerf360-sampling",
    # "8ishbvau": "ngp_prop-mipnerf360-sampling_single",
    # "b674pjcs": "ngp_occ-mipnerf360-baseline_head_depth1",
    # "58hgroe5": "ngp_prop-mipnerf360-baseline_head_depth1",
    # "c6g1mc5g": "ngp_occ-mipnerf360-baseline-mup",
    # "ccrwhsr5": "ngp_prop-mipnerf360-baseline-mup",
}

# TR1b_Baseline
This experiment is similar to `tr1a_baseline_fused`: it benchmarks the baseline Nerfacc NGP Occ and Nerfacc Prop models on all scenes of the Mip-NeRF 360 dataset. This time, however, all benchmarking uses the elastic PyTorch implementations instead of the fused tiny-cuda-nn implementation. Results should be similar for widths 64, 32, and 16, but this setup should more accurately reflect the capabilities of the width-8 model. The native PyTorch implementation actually uses the number of parameters corresponding to a width of 8, whereas tiny-cuda-nn pads it up to its block size.

Similar to MatFormer, we sample exponentially spaced widths $d \in \{64, 32, 16, 8\}$ (with $d=64$ being the full-width baseline). We then evaluate the performance of both the Nerfacc Occ and Nerfacc Prop models after naively shrinking every linear layer to these widths. The goal is to quantify how much performance drops when training a much smaller model. Note that models at all widths are trained with the same hyperparameters (batch size, learning rate, etc.) as the full-width baseline. This will not yield optimally tuned small-width models. However, our overarching goal is to train models of multiple widths optimally and simultaneously, and before doing that we need to establish baselines.
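The widths and the padding argument above can be made concrete with a little arithmetic. This is a minimal sketch. The layer sizes (`IN_DIM`, `OUT_DIM`, `NUM_HIDDEN`) are hypothetical and not taken from the Nerfacc models. The padding rule also assumes a block size of 16, which is an illustrative assumption about how a fused implementation rounds widths up, not a statement of tiny-cuda-nn's internals.

```python
from typing import List


def elastic_widths(full_width: int = 64, num_widths: int = 4) -> List[int]:
    """Exponentially spaced widths: full_width, full_width / 2, full_width / 4, ..."""
    return [full_width // (2**i) for i in range(num_widths)]


def padded_width(width: int, block_size: int = 16) -> int:
    """Round `width` up to the next multiple of `block_size` (assumed padding rule)."""
    return -(-width // block_size) * block_size


def mlp_weight_count(in_dim: int, width: int, out_dim: int, num_hidden: int) -> int:
    """Weights (biases ignored) of an MLP with `num_hidden` hidden layers of size `width`."""
    first = in_dim * width                     # input -> first hidden layer
    middle = (num_hidden - 1) * width * width  # hidden -> hidden layers
    last = width * out_dim                     # last hidden layer -> output
    return first + middle + last


# Hypothetical layer sizes, chosen only for illustration.
IN_DIM, OUT_DIM, NUM_HIDDEN = 32, 16, 2

for w in elastic_widths():
    native = mlp_weight_count(IN_DIM, w, OUT_DIM, NUM_HIDDEN)
    padded = mlp_weight_count(IN_DIM, padded_width(w), OUT_DIM, NUM_HIDDEN)
    print(f"width {w:2d}: {native:5d} weights native, {padded:5d} weights padded")
```

With these made-up sizes, width 8 has 448 weights natively but 1024 once padded to 16. That is exactly the same as width 16, which is why a padded width-8 model says little about a true width-8 model.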

In [32]:
tables = ["EvalResultsSummarytable"]
sweeps = sweep_mappings.keys()
results_cache_dir = Path("/home/user/shared/results/elastic-nerf")
sweep_results = {}

for sweep in sweeps:
    sweep_results[sweep] = wu.fetch_sweep_results(
        sweep=sweep,
        refresh_cache=False,
        download_history=True,
        tables=tables,
        results_cache_dir=results_cache_dir,
    )
all_history = []
# Create a dataframe with all the results
for sweep_name in sweep_results:
    for run in sweep_results[sweep_name]:
        # Flatten the config
        flat_config = wu.flatten_dict(run.config, sep="-")
        # Concatenate the config with each row of the history results
        # Note that history results are already a dataframe
        history = run.history
        history["sweep_id"] = sweep_name
        history["run_id"] = run.run_id
        history["model_type"] = (
            "ngp_prop" if "prop" in sweep_mappings[sweep_name] else "ngp_occ"
        )
        history["sweep_name"] = sweep_mappings[sweep_name]
        for key in flat_config:
            try:
                history[key] = str(flat_config[key])
            except Exception:
                # Report which config entry could not be stored, then re-raise
                print(f"Failed to add {key} to history with value {flat_config[key]}")
                raise
        all_history.append(history)

# %%
# Concatenate all the history results into a single dataframe
final_df = pd.concat(all_history, ignore_index=True)


# %%
fp = "results_tr1b_baseline.csv"
final_df.to_csv(fp, index=False)
print(f"Saved results to {fp}")
df = pd.read_csv(fp)



Saved results to results_tr1b_baseline.csv
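`wu.flatten_dict` comes from the project's `elastic_nerf.utils.wandb_utils`, and its source is not shown here. Assuming it joins nested config keys with `sep`, which is consistent with the `radiance_field-head_depth` column queried later, a minimal stand-in looks like the sketch below. Note that the loop above stores every config value as a string. Re-reading the CSV with `pd.read_csv` lets pandas re-infer numeric dtypes, which is presumably why numeric queries such as `hidden_dim == 64` still work afterwards.

```python
from typing import Any, Dict


def flatten_dict(d: Dict[str, Any], sep: str = "-", prefix: str = "") -> Dict[str, Any]:
    """Recursively flatten nested dicts, joining parent and child keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Descend into the nested dict, carrying the joined key as the new prefix
            flat.update(flatten_dict(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


config = {"hidden_dim": 8, "scene": "garden", "radiance_field": {"head_depth": 2}}
print(flatten_dict(config))
# {'hidden_dim': 8, 'scene': 'garden', 'radiance_field-head_depth': 2}
```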


# Results
Similar to `tr1a_baseline_fused`, smaller-width architectures generally perform worse than larger-width ones across scenes and models. A priori, one might expect shrinking Nerfacc Prop to width 8 to cost more average PSNR than shrinking Nerfacc Occ. Nerfacc Prop also shrinks its 2 proposal networks, whereas Nerfacc Occ relies on the NGP occupancy grid estimator (which we do not reduce in size) and could therefore be more robust to downstream width reductions. With the default head depth of 2, however, the opposite holds. At width 8 the average PSNR drop is about 13.1% for Nerfacc Occ versus 4.8% for Nerfacc Prop, largely because of the Garden and Stump scenes (see Issues below).
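The per-model averages can be checked directly from the width-8 percentages in the head-depth-2 tables. This is a small sketch; the helper name `mean_change` and the choice of Garden and Stump as "outliers" are illustrative.

```python
from statistics import mean

# Width-8 percentage changes for head depth 2, copied from the NGP Occ and NGP Prop tables.
scenes = ["Bicycle", "Bonsai", "Counter", "Garden", "Kitchen", "Room", "Stump"]
occ_width8 = [-3.65, -6.82, -4.49, -40.45, -8.25, -3.70, -24.10]
prop_width8 = [-2.44, -7.33, -6.26, -3.08, -8.36, -2.51, -3.43]

outliers = {"Garden", "Stump"}


def mean_change(values, exclude=frozenset()):
    """Mean % change over scenes, optionally excluding some scenes by name."""
    return mean(v for s, v in zip(scenes, values) if s not in exclude)


for name, values in [("NGP Occ", occ_width8), ("NGP Prop", prop_width8)]:
    print(
        f"{name}: mean {mean_change(values):.1f}% over all scenes, "
        f"{mean_change(values, outliers):.1f}% without Garden/Stump"
    )
```

Across all seven scenes the width-8 drop averages about 13.1% for NGP Occ and 4.8% for NGP Prop. Without Garden and Stump, both come out at about 5.4%, so the gap between the models is driven by those two scenes.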

## Head Depth 2
Across the board, decreasing the width of the model decreases performance. Surprisingly, however, the drop is extremely severe for certain scenes. For example, on the Garden scene with `Nerfacc Occ`, PSNR drops by $38.7$% and $40.4$% at widths 16 and 8 respectively. Similarly, on the Stump scene with `Nerfacc Occ`, the drop is $11.8$% and $24.1$%. These drops are large, their cause is unclear, and they differ markedly from the previous benchmarking with `tiny-cuda-nn`. _Additional benchmarking across multiple random seeds may be required to understand this behavior._
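The percentages follow the `pc_diff` computation in the table code, i.e. the relative change from the width-64 baseline. In the notation introduced here for illustration, $\Delta_d$ is the change at width $d$. Worked through with the displayed (rounded) Garden PSNRs for `Nerfacc Occ` at width 16:

$$
\Delta_d = 100 \times \frac{\mathrm{PSNR}_d - \mathrm{PSNR}_{64}}{\mathrm{PSNR}_{64}}
\qquad\Rightarrow\qquad
\Delta_{16}^{\text{Garden}} = 100 \times \frac{15.06 - 24.58}{24.58} \approx -38.73\%
$$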

## Head Depth 1
Similar to the model with `head_depth == 2`, decreasing the width generally reduces performance. Surprisingly, however, for certain scenes width 16 performs worse than width 8. For example, for the `Stump` scene trained with the `Nerfacc Occ` model, the drop is $20.5$% at width 16 but only $0.4$% at width 8. These percentages are relative to the default head-depth-2, width-64 model. For the same scene with `Nerfacc Prop`, the drop is $3.3$% and $3.2$% respectively. The `Nerfacc Prop` difference is small enough that it could be due to run-to-run randomness. It is still worth noting that width 8 is not clearly worse than width 16 here, nor for `Counter` with `Nerfacc Prop` (24.90 at width 16 vs. 25.11 at width 8), unlike in most other scenes.
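These non-monotonic cases can be found mechanically. The minimal sketch below flags scenes where a narrower width scores a higher PSNR than the next wider width. The function name and the `tol` threshold (in dB) are illustrative, not part of the project.

```python
from typing import Dict, List


def non_monotonic_scenes(
    psnr_by_scene: Dict[str, List[float]], tol: float = 0.0
) -> List[str]:
    """Scenes where a narrower width beats the next wider width by more than `tol` dB.

    Each list holds PSNRs ordered from widest to narrowest (width 64, 32, 16, 8).
    """
    flagged = []
    for scene, psnrs in psnr_by_scene.items():
        # Compare each width with the next narrower one
        if any(narrow - wide > tol for wide, narrow in zip(psnrs, psnrs[1:])):
            flagged.append(scene)
    return flagged


# Head depth 1 PSNRs (width 64, 32, 16, 8) copied from the tables in this notebook.
occ_hd1 = {
    "Bicycle": [22.45, 22.27, 21.93, 21.70],
    "Bonsai": [29.59, 29.32, 28.85, 27.55],
    "Counter": [26.67, 26.55, 26.05, 25.52],
    "Garden": [24.54, 24.30, 24.17, 23.69],
    "Kitchen": [27.61, 27.05, 26.62, 25.92],
    "Room": [30.34, 30.05, 29.91, 29.45],
    "Stump": [23.37, 23.21, 18.20, 22.80],
}
prop_hd1 = {
    "Bicycle": [23.12, 23.03, 22.73, 22.63],
    "Bonsai": [30.09, 29.42, 29.16, 28.10],
    "Counter": [26.45, 25.86, 24.90, 25.11],
    "Garden": [25.30, 25.08, 24.89, 24.68],
    "Kitchen": [30.33, 29.83, 29.04, 27.95],
    "Room": [30.87, 30.67, 30.30, 29.97],
    "Stump": [25.06, 24.95, 24.41, 24.42],
}

print(non_monotonic_scenes(occ_hd1))
print(non_monotonic_scenes(prop_hd1))
print(non_monotonic_scenes(prop_hd1, tol=0.05))
```

Without a tolerance, NGP Occ flags only Stump and NGP Prop flags Counter and Stump. A 0.05 dB tolerance drops the Prop Stump case, whose 0.01 dB gap is most likely noise.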

## Issues
Overall, it's not clear why the performance drop for Nerfacc Occ is so much more severe than for Nerfacc Prop. We would expect the opposite, since the MLPs in the proposal networks are also being shrunk whereas the Occ grid estimator is left untouched.

In [40]:
table_cols = ["Scene", "Width 64", "Width 32", "Width 16", "Width 8"]
for model_type, model_group in df.groupby(by="model_type"):
    # Keep only the final evaluation after 20k training steps
    model_group = model_group.query("_step == 20000")
    # Pretty model name, e.g. "ngp_occ" -> "NGP Occ"
    model_name_split = [m.capitalize() for m in model_type.split("_")]
    model_name_split[0] = model_name_split[0].upper()
    model_type_name = " ".join(model_name_split)
    for head_depth, head_depth_group in model_group.groupby(
        by="radiance_field-head_depth"
    ):
        table_data = []
        for scene, scene_group in head_depth_group.groupby(by="scene"):
            # Baseline: width 64 with the default Radiance Field head depth of 2
            base_psnr = model_group.query(
                "hidden_dim == 64 and scene == @scene and `radiance_field-head_depth` == 2"
            )["Eval Results Summary/psnr_avg/elastic_64"].iloc[0]
            table_row = {"Scene": scene.capitalize()}
            for dim, dim_group in scene_group.groupby(by="hidden_dim"):
                psnr_col = f"Eval Results Summary/psnr_avg/elastic_{dim}"
                psnr_avg = dim_group[psnr_col].iloc[0]
                pc_diff = 100 * (psnr_avg - base_psnr) / base_psnr
                if dim == 64 and head_depth == 2:
                    table_row.update({f"Width {dim}": f"{psnr_avg:.2f}"})
                else:
                    table_row.update(
                        {f"Width {dim}": f"{psnr_avg:.2f} ({pc_diff:.2f}%)"}
                    )
            table_data.append(table_row)

        table_data = pd.DataFrame(table_data, columns=table_cols)
        caption = (
            f"PSNR values after 20k steps of training for {model_type_name} model (with Radiance Field head MLP depth = {head_depth}) at different widths across scenes from the MipNeRF-360 dataset."
            f"  Values in brackets are the percentage difference compared to the baseline PSNR for each model at full-size (width 64) and Radiance Field Head MLP depth 2 (default model)."
        )
        table_data = table_data.style.set_caption(caption)
        display(table_data)
    # print(
    #     table_data.to_latex(
    #         index=False,
    #         caption=f"Baseline performance (PSNR) after 20k steps of training for {model_type_name} model at different widths across scenes from the MipNeRF-360 dataset",
    #         label=f"tab:baseline_{model_type_name.replace(' ', '_')}",
    #         position="h",
    #         column_format="lcccccc",
    #         escape=True,
    #         bold_rows=True,
    #     )
    # )

NGP Occ, Radiance Field head depth 1: PSNR after 20k steps. Values in brackets are the % difference from NGP Occ at width 64 with head depth 2 (the default model).

| Scene | Width 64 | Width 32 | Width 16 | Width 8 |
|---|---|---|---|---|
| Bicycle | 22.45 (0.26%) | 22.27 (-0.54%) | 21.93 (-2.07%) | 21.70 (-3.08%) |
| Bonsai | 29.59 (-1.68%) | 29.32 (-2.55%) | 28.85 (-4.15%) | 27.55 (-8.45%) |
| Counter | 26.67 (-0.41%) | 26.55 (-0.87%) | 26.05 (-2.71%) | 25.52 (-4.71%) |
| Garden | 24.54 (-0.18%) | 24.30 (-1.15%) | 24.17 (-1.66%) | 23.69 (-3.60%) |
| Kitchen | 27.61 (-1.79%) | 27.05 (-3.76%) | 26.62 (-5.30%) | 25.92 (-7.78%) |
| Room | 30.34 (-0.87%) | 30.05 (-1.79%) | 29.91 (-2.27%) | 29.45 (-3.78%) |
| Stump | 23.37 (2.09%) | 23.21 (1.36%) | 18.20 (-20.50%) | 22.80 (-0.41%) |


NGP Occ, Radiance Field head depth 2 (default): PSNR after 20k steps. Values in brackets are the % difference from width 64.

| Scene | Width 64 | Width 32 | Width 16 | Width 8 |
|---|---|---|---|---|
| Bicycle | 22.39 | 22.28 (-0.51%) | 22.08 (-1.39%) | 21.57 (-3.65%) |
| Bonsai | 30.09 | 29.57 (-1.73%) | 28.52 (-5.22%) | 28.04 (-6.82%) |
| Counter | 26.78 | 26.67 (-0.40%) | 26.19 (-2.21%) | 25.58 (-4.49%) |
| Garden | 24.58 | 24.45 (-0.53%) | 15.06 (-38.73%) | 14.64 (-40.45%) |
| Kitchen | 28.11 | 27.31 (-2.83%) | 26.66 (-5.17%) | 25.79 (-8.25%) |
| Room | 30.60 | 30.24 (-1.17%) | 29.94 (-2.17%) | 29.47 (-3.70%) |
| Stump | 22.89 | 22.81 (-0.35%) | 20.20 (-11.78%) | 17.38 (-24.10%) |


NGP Prop, Radiance Field head depth 1: PSNR after 20k steps. Values in brackets are the % difference from NGP Prop at width 64 with head depth 2 (the default model).

| Scene | Width 64 | Width 32 | Width 16 | Width 8 |
|---|---|---|---|---|
| Bicycle | 23.12 (-0.37%) | 23.03 (-0.80%) | 22.73 (-2.06%) | 22.63 (-2.51%) |
| Bonsai | 30.09 (-1.93%) | 29.42 (-4.12%) | 29.16 (-4.95%) | 28.10 (-8.41%) |
| Counter | 26.45 (-1.22%) | 25.86 (-3.44%) | 24.90 (-7.02%) | 25.11 (-6.22%) |
| Garden | 25.30 (-0.36%) | 25.08 (-1.25%) | 24.89 (-1.98%) | 24.68 (-2.81%) |
| Kitchen | 30.33 (-1.52%) | 29.83 (-3.17%) | 29.04 (-5.71%) | 27.95 (-9.26%) |
| Room | 30.87 (-0.24%) | 30.67 (-0.87%) | 30.30 (-2.08%) | 29.97 (-3.15%) |
| Stump | 25.06 (-0.70%) | 24.95 (-1.15%) | 24.41 (-3.28%) | 24.42 (-3.23%) |


NGP Prop, Radiance Field head depth 2 (default): PSNR after 20k steps. Values in brackets are the % difference from width 64.

| Scene | Width 64 | Width 32 | Width 16 | Width 8 |
|---|---|---|---|---|
| Bicycle | 23.21 | 23.05 (-0.70%) | 22.73 (-2.05%) | 22.64 (-2.44%) |
| Bonsai | 30.68 | 29.72 (-3.13%) | 29.31 (-4.46%) | 28.43 (-7.33%) |
| Counter | 26.78 | 26.22 (-2.10%) | 25.34 (-5.36%) | 25.10 (-6.26%) |
| Garden | 25.39 | 25.26 (-0.51%) | 24.96 (-1.73%) | 24.61 (-3.08%) |
| Kitchen | 30.80 | 29.99 (-2.64%) | 29.34 (-4.75%) | 28.23 (-8.36%) |
| Room | 30.94 | 30.73 (-0.68%) | 30.48 (-1.48%) | 30.17 (-2.51%) |
| Stump | 25.24 | 24.83 (-1.61%) | 24.53 (-2.78%) | 24.37 (-3.43%) |


# LaTeX Table for Head Depth 2

In [44]:
header_row1 = [
    # Leading empty cell so each \multicolumn spans the four width columns of its model
    " & \\multicolumn{4}{c}{NGP Occ} & \\multicolumn{4}{c}{NGP Prop} \\\\",
]
header_row2 = [
    "\\textbf{Scene}",
    "Width 64",
    "Width 32",
    "Width 16",
    "Width 8",
    "Width 64",
    "Width 32",
    "Width 16",
    "Width 8",
]
header = (
    " & ".join(header_row1)
    + " \\midrule \n"
    + " & ".join(header_row2)
    + " \\\\ \\midrule"
)

table_data = []

base_psnrs = defaultdict(dict)
for j, (scene, scene_group) in enumerate(df.groupby(by="scene")):
    table_row = ["\\textbf{" + scene.capitalize() + "}"]  # Scene names in bold
    for i, (model_type, model_group) in enumerate(scene_group.groupby(by="model_type")):
        model_group = model_group.query("_step == 20000")
        model_group = model_group[model_group["radiance_field-head_depth"] == 2]
        base_psnr = model_group.query("hidden_dim == 64")[
            "Eval Results Summary/psnr_avg/elastic_64"
        ].iloc[0]
        base_psnrs[scene][model_type] = base_psnr
        for dim in [64, 32, 16, 8]:
            dim_group = model_group.query(f"hidden_dim == {dim}")
            psnr_col = f"Eval Results Summary/psnr_avg/elastic_{dim}"
            psnr_avg = dim_group[psnr_col].iloc[0]
            pc_diff = 100 * (psnr_avg - base_psnr) / base_psnr
            if dim == 64:
                table_row.append(f"{psnr_avg:.2f}")
            else:
                table_row.append(f"{psnr_avg:.2f} ({pc_diff:.1f}\\%)")
    table_data.append(table_row)

# Converting the Python list of lists to a LaTeX table
table_body = " \\\\\n".join([" & ".join(row) for row in table_data]) + " \\\\"

# Combining the header and body to form the final table
final_table = (
    "\\begin{table*}[h]\n\\centering\n\\small\n"
    "\\caption{Combined Baseline PSNR after 20k steps of training for NGP Occ and NGP Prop models (with depth of 2 for the Radiance Field head MLP) at different widths across scenes from the MipNeRF-360 dataset}\n"
    "\\label{tab:combined_baseline_NGP}\n"
    "\\begin{tabular}{lcccc|cccc}\n\\toprule\n"
    + header
    + "\n"
    + table_body
    + "\n\\bottomrule\n"
    "\\end{tabular}\n\\end{table*}"
)
print(final_table)

\begin{table*}[h]
\centering
\small
\caption{Combined Baseline PSNR after 20k steps of training for NGP Occ and NGP Prop models (with depth of 2 for the Radiance Field head MLP) at different widths across scenes from the MipNeRF-360 dataset}
\label{tab:combined_baseline_NGP}
\begin{tabular}{lcccc|cccc}
\toprule
\multicolumn{4}{c}{NGP Occ} & \multicolumn{4}{c}{NGP Prop} \\ \midrule 
\textbf{Scene} & Width 64 & Width 32 & Width 16 & Width 8 & Width 64 & Width 32 & Width 16 & Width 8 \\ \midrule
\textbf{Bicycle} & 22.39 & 22.28 (-0.5\%) & 22.08 (-1.4\%) & 21.57 (-3.7\%) & 23.21 & 23.05 (-0.7\%) & 22.73 (-2.1\%) & 22.64 (-2.4\%) \\
\textbf{Bonsai} & 30.09 & 29.57 (-1.7\%) & 28.52 (-5.2\%) & 28.04 (-6.8\%) & 30.68 & 29.72 (-3.1\%) & 29.31 (-4.5\%) & 28.43 (-7.3\%) \\
\textbf{Counter} & 26.78 & 26.67 (-0.4\%) & 26.19 (-2.2\%) & 25.58 (-4.5\%) & 26.78 & 26.22 (-2.1\%) & 25.34 (-5.4\%) & 25.10 (-6.3\%) \\
\textbf{Garden} & 24.58 & 24.45 (-0.5\%) & 15.06 (-38.7\%) & 14.64 (-40.4\%) & 25.39 & 25

# LaTeX Table for Head Depth 1

In [45]:
header_row1 = [
    # Leading empty cell so each \multicolumn spans the four width columns of its model
    " & \\multicolumn{4}{c}{NGP Occ} & \\multicolumn{4}{c}{NGP Prop} \\\\",
]
header_row2 = [
    "\\textbf{Scene}",
    "Width 64",
    "Width 32",
    "Width 16",
    "Width 8",
    "Width 64",
    "Width 32",
    "Width 16",
    "Width 8",
]
header = (
    " & ".join(header_row1)
    + " \\midrule \n"
    + " & ".join(header_row2)
    + " \\\\ \\midrule"
)

table_data = []

base_psnrs = defaultdict(dict)
for j, (scene, scene_group) in enumerate(df.groupby(by="scene")):
    table_row = ["\\textbf{" + scene.capitalize() + "}"]  # Scene names in bold
    for i, (model_type, model_group) in enumerate(scene_group.groupby(by="model_type")):
        model_group = model_group.query("_step == 20000")
        model_group = model_group[model_group["radiance_field-head_depth"] == 1]
        base_psnr = model_group.query("hidden_dim == 64")[
            "Eval Results Summary/psnr_avg/elastic_64"
        ].iloc[0]
        base_psnrs[scene][model_type] = base_psnr
        for dim in [64, 32, 16, 8]:
            dim_group = model_group.query(f"hidden_dim == {dim}")
            psnr_col = f"Eval Results Summary/psnr_avg/elastic_{dim}"
            psnr_avg = dim_group[psnr_col].iloc[0]
            pc_diff = 100 * (psnr_avg - base_psnr) / base_psnr
            if dim == 64:
                table_row.append(f"{psnr_avg:.2f}")
            else:
                table_row.append(f"{psnr_avg:.2f} ({pc_diff:.1f}\\%)")
    table_data.append(table_row)

# Converting the Python list of lists to a LaTeX table
table_body = " \\\\\n".join([" & ".join(row) for row in table_data]) + " \\\\"

# Combining the header and body to form the final table
final_table = (
    "\\begin{table*}[h]\n\\centering\n\\small\n"
    "\\caption{Combined Baseline PSNR after 20k steps of training for NGP Occ and NGP Prop models (with depth of 1 for the Radiance Field head MLP) at different widths across scenes from the MipNeRF-360 dataset. Percentages are relative to the width-64 model with the same head depth.}\n"
    "\\label{tab:combined_baseline_NGP_head_depth1}\n"
    "\\begin{tabular}{lcccc|cccc}\n\\toprule\n"
    + header
    + "\n"
    + table_body
    + "\n\\bottomrule\n"
    "\\end{tabular}\n\\end{table*}"
)
print(final_table)

\begin{table*}[h]
\centering
\small
\caption{Combined Baseline PSNR after 20k steps of training for NGP Occ and NGP Prop models (with depth of 2 for the Radiance Field head MLP) at different widths across scenes from the MipNeRF-360 dataset}
\label{tab:combined_baseline_NGP}
\begin{tabular}{lcccc|cccc}
\toprule
\multicolumn{4}{c}{NGP Occ} & \multicolumn{4}{c}{NGP Prop} \\ \midrule 
\textbf{Scene} & Width 64 & Width 32 & Width 16 & Width 8 & Width 64 & Width 32 & Width 16 & Width 8 \\ \midrule
\textbf{Bicycle} & 22.45 & 22.27 (-0.8\%) & 21.93 (-2.3\%) & 21.70 (-3.3\%) & 23.12 & 23.03 (-0.4\%) & 22.73 (-1.7\%) & 22.63 (-2.1\%) \\
\textbf{Bonsai} & 29.59 & 29.32 (-0.9\%) & 28.85 (-2.5\%) & 27.55 (-6.9\%) & 30.09 & 29.42 (-2.2\%) & 29.16 (-3.1\%) & 28.10 (-6.6\%) \\
\textbf{Counter} & 26.67 & 26.55 (-0.5\%) & 26.05 (-2.3\%) & 25.52 (-4.3\%) & 26.45 & 25.86 (-2.2\%) & 24.90 (-5.9\%) & 25.11 (-5.1\%) \\
\textbf{Garden} & 24.54 & 24.30 (-1.0\%) & 24.17 (-1.5\%) & 23.69 (-3.4\%) & 25.30 & 25.0
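The two LaTeX cells above differ only in the head-depth filter and caption. A minimal sketch of a shared helper, assuming the PSNRs have already been collected into a nested dict (scene → model → width → PSNR), could look like this. The names `format_row` and `make_table` and the `tab:combined_baseline_NGP_hd{head_depth}` label scheme are hypothetical. Because the example feeds in the rounded PSNRs shown in the tables, the last digit of a percentage can differ slightly from the printed output above, which was computed from unrounded values.

```python
from typing import Dict

WIDTHS = [64, 32, 16, 8]
MODELS = ["ngp_occ", "ngp_prop"]  # column-group order in the table


def format_row(scene: str, psnr: Dict[str, Dict[int, float]]) -> str:
    """One LaTeX row: bold scene name, then PSNR (and % change vs. width 64) per model."""
    cells = ["\\textbf{" + scene + "}"]
    for model in MODELS:
        base = psnr[model][64]
        for width in WIDTHS:
            value = psnr[model][width]
            if width == 64:
                cells.append(f"{value:.2f}")
            else:
                pc_diff = 100 * (value - base) / base
                cells.append(f"{value:.2f} ({pc_diff:.1f}\\%)")
    return " & ".join(cells) + " \\\\"


def make_table(rows: Dict[str, Dict[str, Dict[int, float]]], head_depth: int) -> str:
    """Full table* environment; caption and (hypothetical) label depend on head_depth."""
    header = (
        # Leading empty cell so each \multicolumn sits above its four width columns
        " & \\multicolumn{4}{c}{NGP Occ} & \\multicolumn{4}{c}{NGP Prop} \\\\ \\midrule\n"
        "\\textbf{Scene} & "
        + " & ".join(f"Width {w}" for w in WIDTHS * 2)
        + " \\\\ \\midrule"
    )
    body = "\n".join(format_row(scene, psnr) for scene, psnr in rows.items())
    caption = (
        "PSNR after 20k steps of training for NGP Occ and NGP Prop models "
        f"(Radiance Field head MLP depth {head_depth}) at different widths across "
        "scenes from the MipNeRF-360 dataset"
    )
    return "\n".join([
        "\\begin{table*}[h]",
        "\\centering",
        "\\small",
        "\\caption{" + caption + "}",
        f"\\label{{tab:combined_baseline_NGP_hd{head_depth}}}",
        "\\begin{tabular}{lcccc|cccc}",
        "\\toprule",
        header,
        body,
        "\\bottomrule",
        "\\end{tabular}",
        "\\end{table*}",
    ])


rows = {
    "Bicycle": {
        "ngp_occ": {64: 22.39, 32: 22.28, 16: 22.08, 8: 21.57},
        "ngp_prop": {64: 23.21, 32: 23.05, 16: 22.73, 8: 22.64},
    },
}
print(make_table(rows, head_depth=2))
```

Passing `head_depth` explicitly keeps the caption and label in sync with the data being tabulated. This avoids the copy-paste mismatch that a duplicated cell invites.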