An implementation of the DiffEdit algorithm described in the paper *DiffEdit: Diffusion-based semantic image editing with mask guidance*.
DiffEdit performs semantic image editing through a two-step process:
- Mask Generation:
  - Noises the image latent to the halfway timestep, then denoises it once conditioned on the reference text prompt and once on the query text prompt
  - Averages the difference between the two denoised latents over several runs to identify edit regions
- Guided Denoising:
  - Uses the generated mask to selectively apply different prompts during denoising
  - Regions within the mask are denoised using the query text prompt
  - Regions outside the mask are denoised using an empty prompt
The implementation uses the following components:
- Scheduler: Controls how noise is added to and removed from latent representations across timesteps
- VAE Encoder/Decoder: Converts between image and latent space
- Text Encoder: Converts text prompts to embeddings
- UNet: Performs noise prediction during the diffusion process
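To make this concrete, here is a minimal sketch of loading these components with the Hugging Face diffusers and transformers libraries (the Stable Diffusion v1.5 checkpoint is an assumption; substitute whichever models the implementation actually uses):

```python
from diffusers import AutoencoderKL, DDIMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint

# Scheduler: defines the noise schedule used for noising and denoising
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")

# VAE: encodes images into latents and decodes latents back into images
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")

# Text encoder + tokenizer: convert prompts into embeddings for the UNet
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

# UNet: predicts the noise residual at each timestep
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
```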
Mask generation proceeds as follows:
- Load the image from a URL or path
- Convert the image to a latent representation using the VAE encoder
- Add Gaussian noise up to the halfway timestep (50% of the schedule) using the scheduler
- Clamp the noise to remove outlier values
- Denoise the latent using both the reference and query prompts
- Average the differences over multiple denoising runs
- Generate a binary mask by thresholding the averaged difference (see the sketch after this list)
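A rough sketch of that mask computation (the function name, `n_runs`, and `threshold` default are hypothetical; it assumes the components loaded above and that `scheduler.set_timesteps()` has already been called):

```python
import torch

@torch.no_grad()
def compute_edit_mask(latents, ref_emb, query_emb, scheduler, unet,
                      n_runs=10, threshold=0.5):
    """Estimate the edit mask from noise-prediction differences (sketch)."""
    # Timestep at 50% of the schedule, matching the noise level described above
    half_t = scheduler.timesteps[len(scheduler.timesteps) // 2]
    diffs = []
    for _ in range(n_runs):
        # Noise the clean latent to the halfway timestep, clamping outliers
        noise = torch.randn_like(latents).clamp(-3, 3)
        noisy = scheduler.add_noise(latents, noise, half_t)
        # Predict noise once per prompt
        eps_ref = unet(noisy, half_t, encoder_hidden_states=ref_emb).sample
        eps_query = unet(noisy, half_t, encoder_hidden_states=query_emb).sample
        diffs.append((eps_query - eps_ref).abs().mean(dim=1, keepdim=True))
    # Average over runs, normalize to [0, 1], and threshold into a binary mask
    diff = torch.stack(diffs).mean(dim=0)
    diff = (diff - diff.min()) / (diff.max() - diff.min())
    return (diff > threshold).float()
```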
Final denoising then proceeds as follows:
- Take the latent representation of the original image
- Run two parallel denoising processes:
  a. Unconditional path (empty prompt):
     - Denoise the latent with the empty text prompt
     - Store intermediate results at each timestep
  b. Conditional path (query prompt):
     - Denoise the latent with the query text prompt
     - At each timestep after the halfway point:
       - Use query results for regions inside the mask
       - Use unconditional results for regions outside the mask
- Decode the final latent to obtain the edited image (a sketch of this loop follows)
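The guided denoising loop could be sketched as follows (again with hypothetical names; the key line is the final blend, which keeps query-conditioned content inside the mask and unconditional content outside):

```python
import torch

@torch.no_grad()
def guided_denoise(latents, mask, query_emb, uncond_emb, scheduler, unet):
    """Mask-guided denoising: query prompt inside the mask, empty prompt
    outside. Sketch only; starts at the halfway timestep to match the 50%
    noise level used during mask generation."""
    timesteps = scheduler.timesteps[len(scheduler.timesteps) // 2:]
    noise = torch.randn_like(latents)
    cond = scheduler.add_noise(latents, noise, timesteps[0])
    uncond = cond.clone()
    for t in timesteps:
        # Two parallel noise predictions: conditional and unconditional
        eps_cond = unet(cond, t, encoder_hidden_states=query_emb).sample
        eps_uncond = unet(uncond, t, encoder_hidden_states=uncond_emb).sample
        cond = scheduler.step(eps_cond, t, cond).prev_sample
        uncond = scheduler.step(eps_uncond, t, uncond).prev_sample
        # Blend: keep the edit inside the mask, the original content outside
        cond = mask * cond + (1 - mask) * uncond
    return cond
```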
Areas for future improvement:
- Improve the quality of the mask computation
- Experiment with different schedulers, UNets, and VAEs
- Investigate alternative models (e.g., Flux) for better mask generation
Here's an example of using DiffEdit to transform strawberries into apples:
| Original Image | Generated Mask | Result |
|---|---|---|
| ![]() | ![]() | ![]() |
Prompts used:
- Reference: "strawberries"
- Query: "apples"
The mask (middle image) shows the regions identified for editing, where the model transforms strawberries into apples while preserving the bowl and overall composition.
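Putting the pieces together, an end-to-end run with the example prompts might look roughly like this (sketch only: `encode`, `image_tensor`, and the 50-step schedule are assumptions, not the repo's actual interface):

```python
import torch

def encode(prompt):
    # Hypothetical helper: prompt -> CLIP embeddings for the UNet
    tokens = tokenizer(prompt, padding="max_length",
                       max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt")
    return text_encoder(tokens.input_ids)[0]

scheduler.set_timesteps(50)

# image_tensor: assumed preprocessed image, shape [1, 3, 512, 512], range [-1, 1]
with torch.no_grad():
    latents = vae.encode(image_tensor).latent_dist.sample()
    latents = latents * vae.config.scaling_factor

mask = compute_edit_mask(latents, encode("strawberries"), encode("apples"),
                         scheduler, unet)
edited = guided_denoise(latents, mask, encode("apples"), encode(""),
                        scheduler, unet)

with torch.no_grad():
    result = vae.decode(edited / vae.config.scaling_factor).sample
```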