# [AAAI 2025] Click2Mask: Local Editing with Dynamic Mask Generation

Official Colab demo for ["Click2Mask: Local Editing with Dynamic Mask Generation"](https://omeregev.github.io/click2mask/) (AAAI 2025)

**Paper by:** [Omer Regev](https://www.linkedin.com/in/omeregev/), [Omri Avrahami](https://omriavrahami.com/), [Dani Lischinski](https://www.cs.huji.ac.il/~danix/)

[![Website](https://img.shields.io/badge/Website-blue?style=flat&logo=github)](https://omeregev.github.io/click2mask/)
[![GitHub Code](https://img.shields.io/badge/GitHub-Code-blue?style=flat&logo=github)](https://github.com/omeregev/click2mask)
[![Hugging Face Demo](https://img.shields.io/badge/🤗%20Hugging%20Face-Demo-yellow?style=flat)](https://huggingface.co/spaces/omeregev/click2mask)
[![arXiv](https://img.shields.io/badge/arXiv-2409.08272-b31b1b?style=flat&logo=arxiv)](https://arxiv.org/abs/2409.08272)
[![Paper PDF](https://img.shields.io/badge/Paper-PDF-red?style=flat&logo=adobe)](https://omeregev.github.io/click2mask/static/paper/Click2Mask.pdf)
[![YouTube Video](https://img.shields.io/badge/Video-YouTube-red?style=flat&logo=youtube)](https://youtu.be/A0ZEVTm9SLw?si=_coDIWRXa8Wo-2na)

Given an image, a <span style="white-space: nowrap;">
    <b>Click</b> <img src="https://raw.githubusercontent.com/omeregev/click2mask/main/imgs/point.png" alt="click point" width="10" style="margin-right: 2px;">
</span>, and a prompt for an added object, a **Mask** is generated dynamically,
simultaneously with the object generation throughout the diffusion process.

Current methods rely on existing objects/segments, or user effort (masks/detailed text),
to localize object additions. Our approach enables free-form editing,
where the manipulated area is not well-defined, using just a  <span style="white-space: nowrap;">
    <b>Click</b> <img src="https://raw.githubusercontent.com/omeregev/click2mask/main/imgs/point.png" alt="click point" width="10" style="margin-right: 2px;">
</span> for localization.

## About This Colab

This notebook provides two ways to run Click2Mask:

1. **Gradio Interface** - Interactive web interface with point-and-click editing
2. **Command Line Interface** - For batch processing and programmatic access

## Getting Started

1. **Enable GPU** (required): in the Colab menu, open *Runtime > Change runtime type* and select a GPU hardware accelerator
2. **Run the setup cells** to download and install dependencies
3. **Choose your interface** - Run either the Gradio or CLI cells below
<br><br>
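
Before running the setup cells, it can help to confirm that the runtime actually has a GPU attached. The following is a minimal sketch and not part of the official notebook: `gpu_available` is a hypothetical helper that uses PyTorch's `torch.cuda.is_available()` when PyTorch can be imported, and otherwise falls back to checking whether the `nvidia-smi` tool is on the `PATH`.

```python
import shutil


def gpu_available() -> bool:
    """Return True if a CUDA GPU appears to be usable in this runtime."""
    try:
        # PyTorch is typically preinstalled on Colab; use its own check if present.
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        # Fallback heuristic (an assumption): nvidia-smi is usually only
        # available on runtimes with an NVIDIA GPU attached.
        return shutil.which("nvidia-smi") is not None


if gpu_available():
    print("GPU detected.")
else:
    print("No GPU detected. Select a GPU under Runtime > Change runtime type.")
```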

**📌 See generated examples, comparisons, citation, and acknowledgements below.**

---
# ⚙️ Setup

In [None]:
!git clone https://github.com/omeregev/click2mask.git

In [None]:
# Install packages
!pip install accelerate diffusers transformers pytorch-lightning kornia gradio loralib
!pip install git+https://github.com/SunzeY/AlphaCLIP.git@3457474356108988bed7f0354a5cfeeeb8322aeb

# Install scikit-fmm
# June 2025: temporary workaround for a broken scikit-fmm build (install from source).
# Once the scikit-fmm build is fixed, the two lines below can be replaced with: !pip install scikit-fmm
!apt-get install -y build-essential
!pip install git+https://github.com/scikit-fmm/scikit-fmm.git

In [None]:
%cd /content/click2mask

In [None]:
# -p avoids an error if the cell is re-run and the directory already exists
!mkdir -p checkpoints
!wget -P checkpoints https://download.openxlab.org.cn/models/SunzeY/AlphaCLIP/weight/clip_l14_336_grit1m_fultune_8xe.pth

# If the above link is broken, you can use this Google Drive mirror: https://drive.google.com/file/d/1DeNbUv0lraDxJZItb7shTlvGW6z_Z9Si/view?usp=drive_link
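
A failed or partial download may only surface later as a model-loading error, so a quick check after the download cell can save time. This is a minimal sketch under stated assumptions: `checkpoint_ready` is a hypothetical helper, the file name is taken from the download URL above, and the 1 MB size threshold is an arbitrary sanity bound rather than the real file size.

```python
from pathlib import Path

# File name taken from the download URL above.
CHECKPOINT = Path("checkpoints") / "clip_l14_336_grit1m_fultune_8xe.pth"


def checkpoint_ready(path: Path, min_bytes: int = 1_000_000) -> bool:
    """Return True if `path` is an existing file of at least `min_bytes` bytes."""
    # min_bytes is an arbitrary lower bound (assumption), not the true file size.
    return path.is_file() and path.stat().st_size >= min_bytes


if checkpoint_ready(CHECKPOINT):
    print(f"Found checkpoint: {CHECKPOINT}")
else:
    print(f"Checkpoint missing or suspiciously small: {CHECKPOINT}")
```

If the check fails, re-run the download cell or fetch the file from the Google Drive mirror and place it in `checkpoints`.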

---
# 🌐 Gradio Interface

> After running the Gradio cell, **you'll get a public URL**. Open it, then:
1. **Upload an image**, or load an example with the button below
2. **Click on the image** where you want to add an object
3. **Enter a text prompt** describing what you want to add
4. **Click Generate**

> **Note:** The first run takes longer because of initial downloads.

In [None]:
!python app.py

---
# ⌨️ Command Line Interface


> **Note:** The first run takes longer because of initial downloads.

### ► Single run example


> Ensure a click file **{filename}_click.jpg** (or **.png**) exists in the input directory, as shown in "examples/colab".

In [None]:
!python scripts/text_editing_click2mask.py \
    --image_path "examples/colab/snow.jpg" \
    --prompt "a hut" \
    --output_dir "outputs"
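
The single run above expects a click file alongside the input image. A small pre-flight check can confirm it is there before the script is launched. This is a sketch, not part of the repository: `find_click_file` is a hypothetical helper, and it assumes that `{filename}` means the image name without its extension and that the click file sits in the same directory as the image.

```python
from pathlib import Path
from typing import Optional


def find_click_file(image_path: str) -> Optional[Path]:
    """Return the click file next to `image_path`, or None if there is none."""
    image = Path(image_path)
    for ext in (".jpg", ".png"):
        # e.g. examples/colab/snow.jpg -> examples/colab/snow_click.jpg
        candidate = image.with_name(f"{image.stem}_click{ext}")
        if candidate.is_file():
            return candidate
    return None


click = find_click_file("examples/colab/snow.jpg")
print(click if click else "No click file found; create one before running the script.")
```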

### ►► Batch run example

> Ensure click files **{filename}_click.jpg** (or **.png**) exist for each input image, as shown in "examples/colab".

> Instead of the loop shown below, you can also modify the script itself for native batch processing.

In [None]:
# Input setup
image_filenames = ["dogs.jpg", "sea.jpg", "snow.jpg"]
prompts = ["bonfire", "a big ship", "a hut"]
input_dir = "examples/colab"  # no trailing slash: the loop adds the separator
out_dir = "outputs/batch"

# Loop
for filename, prompt in zip(image_filenames, prompts):
    img_path = f"{input_dir}/{filename}"

    print(f"Processing {img_path}")

    !python scripts/text_editing_click2mask.py \
        --image_path "{img_path}" \
        --prompt "{prompt}" \
        --output_dir "{out_dir}"
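
The loop above relies on the notebook-only `!` syntax. Outside a notebook, roughly the same batch can be driven with `subprocess`. The sketch below is an illustration, not part of the repository: `build_commands` and `run_batch` are hypothetical helpers, and only the script path and the `--image_path`, `--prompt`, and `--output_dir` flags come from the cells above. It also raises an error when the two input lists differ in length, since `zip` would otherwise silently drop the unmatched items.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "scripts/text_editing_click2mask.py"


def build_commands(image_filenames, prompts, input_dir, out_dir):
    """Build one argument list per (image, prompt) pair."""
    if len(image_filenames) != len(prompts):
        # zip() would silently ignore the extra items, so fail loudly instead.
        raise ValueError("image_filenames and prompts must have the same length")
    commands = []
    for filename, prompt in zip(image_filenames, prompts):
        img_path = Path(input_dir) / filename
        commands.append([
            sys.executable, SCRIPT,
            "--image_path", str(img_path),
            "--prompt", prompt,
            "--output_dir", out_dir,
        ])
    return commands


def run_batch(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


# Usage, from the repository root:
# run_batch(build_commands(["dogs.jpg", "sea.jpg"], ["bonfire", "a big ship"],
#                          "examples/colab", "outputs/batch"))
```

Passing arguments as a list avoids shell quoting problems when a prompt contains spaces or quotes.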

<br><br>

---
# Output Examples and References

## Generated Examples

<img src="https://raw.githubusercontent.com/omeregev/click2mask/main/imgs/results.jpg" alt="Click2Mask Results" width="700"/>

## Comparison with SOTA Methods
<img src="https://raw.githubusercontent.com/omeregev/click2mask/main/imgs/compare.png" alt="Comparison" width="700"/>

## Citation

```bibtex
@misc{regev2024click2masklocaleditingdynamic,
      title={Click2Mask: Local Editing with Dynamic Mask Generation},
      author={Omer Regev and Omri Avrahami and Dani Lischinski},
      year={2024},
      eprint={2409.08272},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.08272},
}
```

## Acknowledgements

This code is based on
[Blended Latent Diffusion](https://github.com/omriav/blended-latent-diffusion/tree/master)
and on [Stable Diffusion](https://github.com/CompVis/stable-diffusion).