
AOG (Attentions on Gaussians)

Project Status

I am currently in the process of cleaning up the codebase. In the meantime, if you're interested, you can check out the project report here: πŸ“„ Project Report

Results

πŸŽ₯ Video Demonstration

Click to Watch Video


Overview

AOG is a novel approach to 3D text-guided editing that enhances multi-view consistency when editing images using diffusion models like Stable Diffusion. The core idea behind AOG is that a set of multi-view images represents a single 3D environment and should not be edited independently.

Our approach follows a similar methodology to Instruct-GS2GS, where we first build a Gaussian Splatting Model from a set of multi-view images. We then use InstructPix2Pix, a guided text-editing diffusion model, to edit the original images.

Our approach is inspired by Prompt2Prompt, but differs in that it leverages the geometry obtained from the Gaussian model to model cross-attention maps during image editing. As each image is edited, we back-project its cross-attention maps onto the 3D geometry and efficiently render them for the next camera view. The rendered maps are then injected into the UNet of the diffusion model, enforcing better 3D consistency across the edited images.
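As a concrete sketch of the injection step, the attention map rendered from the 3D model can be blended into the map the UNet computed for the current view. The function name and the linear blending form below are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

def inject_attention(unet_attn: np.ndarray,
                     rendered_attn: np.ndarray,
                     weight: float = 0.6) -> np.ndarray:
    """Blend a cross-attention map rendered from the 3D Gaussian model
    into the map the diffusion UNet computed for the current view.

    weight=1.0 fully overrides the UNet's attention with the rendered,
    3D-consistent map; weight=0.0 leaves the UNet's attention untouched.
    """
    blended = weight * rendered_attn + (1.0 - weight) * unet_attn
    # Re-normalize so each row still sums to 1, as attention maps should.
    return blended / blended.sum(axis=-1, keepdims=True)
```

This is the same kind of blend used again in the editing paradigm below, where the 0.6 weight trades off the UNet's own attention against the 3D-consistent rendered maps.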

To achieve this, we introduced several changes to the Gaussian Splatting implementation:

  • Added extra attributes to each Gaussian to store cross-attention values πŸ”.
  • Utilized the fast rasterizer for rendering attention maps more efficiently ⚑.
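A minimal sketch of what the extended Gaussian representation might look like: per-Gaussian attention values stored alongside the usual geometric attributes. All names and shapes here are illustrative assumptions, not the repository's actual data structures:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class AttentionGaussians:
    """Gaussian cloud extended with per-Gaussian cross-attention values,
    one value per text token, stored next to the standard attributes."""
    means: np.ndarray      # (N, 3) Gaussian centers
    opacities: np.ndarray  # (N,)  opacity per Gaussian
    attn: np.ndarray       # (N, T) cross-attention value per text token

    def update_attention(self, idx: np.ndarray, new_attn: np.ndarray,
                         lr: float = 0.5) -> None:
        # Blend attention back-projected from a newly edited view into
        # the stored per-Gaussian values (running average; lr=1 replaces).
        self.attn[idx] = (1.0 - lr) * self.attn[idx] + lr * new_attn
```

Storing attention as extra per-Gaussian attributes is what lets the fast rasterizer render attention maps the same way it renders color.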

AOG overview

Editing Paradigm

Maintaining a 3D cross-attention model from tens or hundreds of images is computationally expensive πŸ–₯️. Instead, we select a set of key frames to build the 3D cross-attention model.

The key frames are edited sequentially by:

  1. Rendering the latest cross-attention model from the current camera view 🎭.
  2. Injecting these attention maps into the diffusion model with a weight of 0.6, allowing it to propagate new edits to previously unseen areas. See the project repo for more on the importance of the injection weights πŸ”„.
  3. After editing each key frame, extracting its cross-attention maps and updating the 3D attention model for improved consistency πŸ“Œ.
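The three steps above can be sketched as a sequential loop over the key frames. Every callable here is a placeholder standing in for one of the repository's components, not its real API:

```python
def edit_key_frames(key_frames, render_attn, edit_frame, backproject,
                    weight=0.6):
    """Sequentially edit key frames against a shared 3D attention model.

    key_frames : list of (camera_view, image) pairs
    render_attn(view)             -> attention maps for that view (step 1)
    edit_frame(img, attn, weight) -> (edited_img, new_attn); runs the
        diffusion edit with the rendered maps injected (step 2)
    backproject(view, attn)       -> updates the 3D attention model (step 3)
    """
    edited = []
    for view, image in key_frames:
        attn = render_attn(view)                               # step 1
        new_image, new_attn = edit_frame(image, attn, weight)  # step 2
        backproject(view, new_attn)                            # step 3
        edited.append(new_image)
    return edited
```

Because each iteration updates the 3D attention model before the next frame is rendered, later key frames see the edits made to earlier ones, which is what enforces the multi-view consistency.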

AOG paradigm
