Marcos V. Conde, Zihao Lu, Radu Timofte
Computer Vision Lab, University of Wuerzburg
Text-guided image generation and editing is emerging as a fundamental problem in computer vision. However, most approaches lack control, and the generated results fall far short of professional photography quality standards. In this work, we propose the first approach that introduces language and explicit control into the image processing and editing pipeline. PixTalk is a vision-language multi-task image processing model guided by text instructions. Our method can perform over 40 transformations, covering the most popular techniques in photography, and delivers results comparable to professional photo editing software. Our model can process 12MP images on consumer GPUs in real time (under 1 second). As part of this effort, we propose a novel dataset and benchmark for new research on multi-modal image processing and editing.
- A Tour Through AI-powered Photography and Imaging, ICCV 2025 Tutorial
- InstructIR: High-Quality Image Restoration Following Human Instructions (ECCV 2024)
- Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures (ICCV 2025)
- Towards Unified Image Deblurring using a Mixture-of-Experts Decoder (2025)
- HuggingFace demo using Gradio
- Full code release
- Dataset request form
(We are currently in Hawaii at ICCV 2025, getting feedback before the big release.)
For any inquiries, contact Marcos V. Conde: marcos.conde[at]uni-wuerzburg.de
This work has been patented worldwide at the European Patent Office (EPO). If you would like to use this work in commercial applications, please contact us :) There are no limitations for open non-profit research and academic research.
If you find our work interesting, or you use our ideas or dataset, please cite our work properly.
@InProceedings{Conde_2025_ICCV,
author = {Conde, Marcos V. and Lu, Zihao and Timofte, Radu},
title = {PixTalk: Controlling Photorealistic Image Processing and Editing with Language},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {19269-19279}
}
PixTalk is inspired by InstructIR (ECCV 2024).
@inproceedings{conde2024high,
title={InstructIR: High-Quality Image Restoration Following Human Instructions},
author={Conde, Marcos V and Geigle, Gregor and Timofte, Radu},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year={2024}
}
