Evaluation set used in paper? #3

Closed
wangjksjtu opened this issue Apr 22, 2022 · 10 comments

Comments

@wangjksjtu

@jasonyzhang Thanks for the awesome work and public code! In the paper, 20 actors from the MVMC dataset are used for quantitative evaluation. Could you please share the actor IDs in the evaluation set for easier comparison?

Thanks for your great help!

@wangjksjtu
Author

@jasonyzhang Sorry for the follow-up message! Could you also share more details about the quantitative evaluation (e.g., which view is held out)? That would be super helpful for reproduction and comparison, thanks!

@jasonyzhang
Owner

Hi, I'm working on releasing it asap! Will post the code, models, and splits to recreate the numbers, hopefully by end of the week.

@wangjksjtu
Author

Thanks so much for your help! Really appreciate it ;)

@jasonyzhang
Owner

Hi,

Sorry for the delay! I've now posted all the data for evaluation, which includes the off-the-shelf cameras (pre-processed to minimize the re-projection error between the template car mesh and the mask) and the optimized cameras (which have also been processed with some manual input).

The data also includes the rendered views from NeRS in the NVS evaluation protocol. I show how to replicate the numbers using the rendered views as well.

Please let me know if you encounter any issues!

@wangjksjtu
Author

Hi @jasonyzhang,

Thanks for the update! Really appreciate it! I could reproduce the numbers using the provided evaluation protocol. However, I noticed that if we use clean-fid to compute the FID scores, the numbers are inconsistent with the paper:

```
Name            MSE   PSNR   SSIM  LPIPS   clean-FID
ners_fixed   0.0254   16.5  0.720  0.172   113.
```

I guess you are using pytorch-fid to compute the FID scores in the paper. Would you mind sharing the clean-FID scores for all the baseline models? Thanks a lot!
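For reference, I computed the clean-FID roughly like this (a minimal sketch, assuming the `clean-fid` package; the directory names are placeholders, not paths from this repo):

```python
# Minimal clean-FID sketch: compare rendered NeRS views against the
# corresponding ground-truth images. Directory names are hypothetical.
from cleanfid import fid

score = fid.compute_fid("eval/ners_fixed_renders", "eval/gt_images")
print(f"clean-FID: {score:.1f}")
```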

@wangjksjtu
Author

Sorry, another question: from the eval code, it seems that the evaluation is done on all views (both training views and the held-out view). Is that the correct setting? I thought we should only evaluate on the novel views.

@wangjksjtu
Author

Hi @jasonyzhang, another question: the retrained results obtained by running train_evaluation_model.py are blurrier than the dumped results in data/evaluation. Here is one example:

re-trained model:
render_00

dumped results:
ners_00_fixed

Is this due to different hyperparameters? Thanks a lot in advance for your great help!

@jasonyzhang
Owner

Hi,

Re: FID
I computed FID over all of the generated outputs (i.e., every image generated for every instance) rather than averaging the FID per instance, as is done for the other metrics. I've posted the code for this now, and here are the numbers I get:

```
Name            MSE   PSNR   SSIM  LPIPS    FID
ners_fixed   0.0254   16.5  0.720  0.172   60.4
```

The FID for ners_fixed in the paper was 60.9, so only slightly off.
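For clarity, here's an illustrative sketch of the difference between the two aggregation strategies; it uses clean-fid only as an example backend, and the directory layout is hypothetical:

```python
# Sketch of the two FID aggregation strategies. Directory layout is made up
# for illustration: one sub-folder of predictions/ground truth per instance.
import os
from cleanfid import fid

instances = sorted(os.listdir("renders"))  # one sub-folder per car instance

# Per-instance FID, then averaged (how the other metrics are aggregated):
per_instance = [
    fid.compute_fid(f"renders/{inst}/pred", f"renders/{inst}/gt")
    for inst in instances
]
avg_fid = sum(per_instance) / len(per_instance)

# Pooled FID (how the paper number is computed): every rendered image from
# every instance goes into one set, compared against the pooled real images.
pooled_fid = fid.compute_fid("renders_all/pred", "renders_all/gt")
```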

Re: Evaluation protocol.
In the evaluation training code, each image/camera pair is independently treated as a target image/camera. For example, if an instance has 10 images, we train 10 models, each holding out one of the target views. For each model, we render from its held-out view for evaluation. Thus, we end up evaluating on all of the input images, but each one only ever as a held-out view.
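A minimal sketch of that leave-one-out loop, with placeholder helpers (`train_ners`, `render_view`, `compute_metrics`) standing in for the actual functions in this repo:

```python
# Leave-one-out NVS evaluation sketch. The helper names are placeholders,
# not the repo's API.
def evaluate_instance(images, cameras):
    """Train one model per held-out view; evaluate each model on its held-out view."""
    all_metrics = []
    for held_out in range(len(images)):
        train_ids = [i for i in range(len(images)) if i != held_out]
        model = train_ners(
            [images[i] for i in train_ids],
            [cameras[i] for i in train_ids],
        )
        rendered = render_view(model, cameras[held_out])
        all_metrics.append(compute_metrics(rendered, images[held_out]))
    # Every input image is evaluated exactly once, always as a held-out view.
    return all_metrics
```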

Re: blurry results.
I was trying to train a smaller model to save time, but it looks like the performance is much worse. The evaluation code was training an 8-layer texnet for 1000 iterations, whereas the demo code trains a 12-layer texnet for 3000 iterations. I've switched back to the latter set of hyperparameters and am currently re-running it as well.
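For reference, the two settings side by side (key names are illustrative, not the actual config fields in the repo):

```python
# Hyperparameter settings mentioned above; keys are illustrative placeholders.
EVAL_HPARAMS = {"texnet_layers": 8,  "iterations": 1000}  # faster, but blurrier
DEMO_HPARAMS = {"texnet_layers": 12, "iterations": 3000}  # matches the paper renders
```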

@wangjksjtu
Author

I see, thanks a lot for the detailed reply! Really appreciate it ;)

@jasonyzhang
Owner

jasonyzhang commented Apr 29, 2022

Ah, actually the blurry results are because the number of Fourier bases in the default config is too low. The default is 6, but 10 seems to work much better.

Rendering used for evaluation in main paper
render_submission

8-layer tex net, 1k training iterations, L=6
render_8_layer

12-layer tex net, 3k training iterations, L=6
render_12_layer

12-layer tex net, 3k training iterations, L=10
render_12_layer_L10

I have updated the code so that evaluation defaults to L=10. This was already the default for the demo script.
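For context, here is a rough sketch of what L controls, assuming the texture network uses a NeRF-style Fourier feature encoding of its input coordinates (an assumption for illustration, not the exact implementation here). Higher L lets the network represent higher-frequency texture detail, which is why L=10 renders sharper than L=6:

```python
# Illustrative NeRF-style Fourier feature encoding; not the repo's exact code.
import math
import torch

def fourier_encode(x: torch.Tensor, num_bases: int = 10) -> torch.Tensor:
    """Map coordinates x of shape (..., D) to Fourier features of shape (..., D * 2 * num_bases)."""
    freqs = 2.0 ** torch.arange(num_bases, dtype=x.dtype, device=x.device)
    angles = x[..., None] * freqs * math.pi                             # (..., D, L)
    feats = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)   # (..., D, 2L)
    return feats.flatten(start_dim=-2)                                  # (..., D * 2L)
```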
