DeepFovea

This repository provides the materials related to the DeepFovea project from Facebook Reality Labs. DeepFovea is a network for foveated rendering that reconstructs a plausible periphery from a small number of pixels.

If you use any materials from this repository, please cite the publication: Anton Kaplanyan, Anton Sochenov, Thomas Leimkuehler, Mikhail Okunev, Todd Goodall, Gizem Rufo, "DeepFovea: Neural Reconstruction for Foveated Rendering and Video Compression using Learned Statistics of Natural Videos", SIGGRAPH Asia 2019.

At the moment we are releasing only the full graph structure, without the weights. We plan to release the demo binaries and the final weights soon.

The input_graph.pb file contains the network graph for both the generator and the discriminator.
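
As a minimal sketch of how to inspect the released graph (assuming the TensorFlow 1.x frozen-graph format, accessed via tf.compat.v1 under TensorFlow 2), it can be parsed and its operations listed like this:

```python
import tensorflow as tf

# Parse the released protobuf graph definition (generator + discriminator).
with tf.io.gfile.GFile("input_graph.pb", "rb") as f:
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

# Import it into a fresh graph so the node names can be listed.
graph = tf.Graph()
with graph.as_default():
    tf.compat.v1.import_graph_def(graph_def, name="")

# Print the operations to locate the generator and discriminator subgraphs.
for op in graph.get_operations():
    print(op.name, op.type)
```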

Model

Generator

The generator model is a U-Net with recurrent decoder blocks.

Each encoder block consists of two convolutions, with the second convolution having a stride of 2. We use the ELU activation function. There are 5 encoder and 5 decoder blocks in total, with 32-64-128-128-128 features respectively. The model has about 3.2M parameters.
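
The released graph is a TensorFlow protobuf, but for illustration, a minimal PyTorch sketch of one encoder block as described above could look like the following (the 3x3 kernel size, the padding, and the 3-channel input are assumptions):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One U-Net encoder block: two convolutions with ELU activations,
    the second with stride 2 for downsampling (kernel size assumed)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, stride=2)
        self.act = nn.ELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.conv1(x))
        return self.act(self.conv2(x))

# Five encoder stages with 32-64-128-128-128 features, as described above.
widths = [32, 64, 128, 128, 128]
encoders = nn.ModuleList(
    EncoderBlock(i, o) for i, o in zip([3] + widths[:-1], widths)
)
```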

Each decoder block consists of an upsampling layer, a convolution, and a recurrent convolution layer. The recurrent convolutional layer feeds its output back as an additional input at the next time step.
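
Continuing the illustrative PyTorch sketch (the upsampling mode, kernel sizes, and the exact recurrent wiring are assumptions, not the released graph), a decoder block with a recurrent convolution could be written as:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: upsample, convolution, then a recurrent
    convolution whose previous output is concatenated to its input."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # The recurrent conv sees its own previous output as extra channels.
        self.rec_conv = nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1)
        self.act = nn.ELU()

    def forward(self, x, state):
        x = self.act(self.conv(self.up(x)))
        if state is None:  # first frame: no history yet
            state = torch.zeros_like(x)
        out = self.act(self.rec_conv(torch.cat([x, state], dim=1)))
        return out, out  # the output doubles as the next frame's state
```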

The network is trained on 128x128 videos.

Losses

We use multiple losses to make the reconstruction plausible:

  • Adversarial loss (following the WGAN framework)

    The discriminator model consists of several 3D convolutions with residual connections. Each convolution operates on a whole video of 32 frames, which allows the discriminator to reason not only about spatial details but also about temporal artifacts. We use spectral normalization in the discriminator for regularization (a sketch of such a residual block appears after this list). There is also a complementary discriminator with a similar architecture, trained on the FFT representation of the input video.

  • LPIPS loss to improve the reconstruction of spatial details

  • Optical flow loss to reduce peripheral flicker (a sketch combining all three losses follows this list)
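
As with the generator, here is a minimal PyTorch sketch of one residual critic block (the channel counts, kernel sizes, and exact residual wiring are assumptions):

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Residual3DBlock(nn.Module):
    """Residual block of spectrally normalized 3D convolutions, operating
    on a whole clip tensor of shape (N, C, T, H, W) with T = 32 frames."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = spectral_norm(nn.Conv3d(channels, channels, 3, padding=1))
        self.conv2 = spectral_norm(nn.Conv3d(channels, channels, 3, padding=1))
        self.act = nn.ELU()

    def forward(self, x):
        h = self.act(self.conv1(x))
        h = self.conv2(h)
        return self.act(x + h)  # residual connection over the whole clip
```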
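And a hedged sketch of how the three generator-side losses might be combined during training (the lpips package, the loss weights, and the frame-difference proxy for the optical-flow term are assumptions; the actual loss uses estimated optical flow):

```python
import torch
import lpips  # pip install lpips; assumed here for the LPIPS term

lpips_fn = lpips.LPIPS(net="vgg")  # backbone choice is an assumption

def generator_loss(critic, fake_clip, real_clip,
                   w_adv=1.0, w_lpips=1.0, w_flow=1.0):
    """Combine the three losses above; the weights are hypothetical.

    fake_clip, real_clip: (N, C, T, H, W) video tensors in [-1, 1].
    """
    # WGAN generator term: maximize the critic score on reconstructions.
    adv = -critic(fake_clip).mean()

    # LPIPS on each frame to sharpen spatial detail.
    n, c, t, h, w = fake_clip.shape
    frames_fake = fake_clip.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
    frames_real = real_clip.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
    perceptual = lpips_fn(frames_fake, frames_real).mean()

    # Proxy for the optical-flow term: penalize temporal changes in the
    # reconstruction that differ from the reference. This is a
    # simplification; the actual loss compares estimated motion.
    dt_fake = fake_clip[:, :, 1:] - fake_clip[:, :, :-1]
    dt_real = real_clip[:, :, 1:] - real_clip[:, :, :-1]
    temporal = (dt_fake - dt_real).abs().mean()

    return w_adv * adv + w_lpips * perceptual + w_flow * temporal
```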

Pretrained model weights and binaries

Coming soon.

License

DeepFovea is CC-BY-NC 4.0 (FAIR License) licensed, as found in the LICENSE file.
