[ICCV 2025] PixTalk: Controlling Photorealistic Image Processing and Editing with Language

mv-lab/pixtalk

Marcos V. Conde, Zihao Lu, Radu Timofte

Computer Vision Lab, University of Wuerzburg

Paper | Hugging Face | Replicate (demos upcoming)

TL;DR: quickstart

Text-guided image generation and editing is emerging as a fundamental problem in computer vision. However, most approaches lack control, and the generated results fall far short of professional photography quality standards. In this work, we propose the first approach that introduces language and explicit control into the image processing and editing pipeline. PixTalk is a vision-language multi-task image processing model guided by text instructions. Our method can perform over 40 transformations, covering the most popular techniques in photography, and delivers results comparable to professional photo editing software. Our model can process 12MP images on consumer GPUs in real time (under 1 second). As part of this effort, we propose a novel dataset and benchmark for new research on multi-modal image processing and editing.


Other relevant stuff

Repo updates

  • HuggingFace demo using Gradio
  • Full code release
  • Dataset request form

(We are currently in Hawaii at ICCV 2025, getting feedback before the big release.)

Contacts

For any inquiries contact Marcos V. Conde: marcos.conde[at]uni-wuerzburg.de

This work has been patented worldwide at the European Patent Office (EPO). If you would like to use this work in commercial applications, please contact us :) There are no limitations for open non-profit research and academic research.

Citation

If you find our work interesting, or if you use our ideas or dataset, please cite our work properly.

@InProceedings{Conde_2025_ICCV,
    author    = {Conde, Marcos V. and Lu, Zihao and Timofte, Radu},
    title     = {PixTalk: Controlling Photorealistic Image Processing and Editing with Language},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {19269-19279}
}

PixTalk is inspired by InstructIR (ECCV 2024).

@inproceedings{conde2024high,
  title={InstructIR: High-Quality Image Restoration Following Human Instructions},
  author={Conde, Marcos V and Geigle, Gregor and Timofte, Radu},
  booktitle    = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2024}
}
