scheshmi/Text-Based-Image-Editing

Text-Based-Image-Editing

This project combines three computer vision foundation models, Grounding DINO, Segment Anything Model (SAM), and Stable Diffusion, to edit and manipulate images from text. First, Grounding DINO performs zero-shot object detection driven by the textual input. SAM then extracts masks from the detected bounding boxes. Finally, these masks guide Stable Diffusion to replace the masked areas with contextually appropriate content derived from the text prompt, so the whole edit is controlled by text.
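One detail of the detection-to-segmentation handoff is the box format. The sketch below assumes the detector returns boxes as normalized (cx, cy, w, h) values, as Grounding DINO's inference utilities do, and that the segmenter takes box prompts as pixel (x0, y0, x1, y1) corners, as SAM's predictor does. The helper name `cxcywh_norm_to_xyxy_pixels` is hypothetical and is not taken from this repository.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_norm_to_xyxy_pixels(box: Box, width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1).

    Hypothetical helper: assumes the detector output is normalized to [0, 1]
    and that the downstream mask predictor expects absolute pixel corners.
    """
    cx, cy, w, h = box

    # Corners in pixel units: center minus/plus half the extent, scaled by image size.
    x0 = (cx - w / 2) * width
    y0 = (cy - h / 2) * height
    x1 = (cx + w / 2) * width
    y1 = (cy + h / 2) * height

    # Clamp to the image bounds so a box that spills over the edge stays valid.
    def clamp(value: float, upper: int) -> float:
        return max(0.0, min(float(upper), value))

    return (clamp(x0, width), clamp(y0, height), clamp(x1, width), clamp(y1, height))
```

For a 640x480 image, a box centered in the frame that covers half of each dimension maps to the corners (160, 120) and (480, 360).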

Install Requirements

First, install the requirements and Grounding DINO.

pip install -r requirements.txt

Grounding DINO

git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .

Run download_files.py to download the pre-trained models:

python download_files.py

Run & Usage

Run main.py with the following command:

python main.py --img_path="input image" --selected_object="your selected object" --prompt="your prompt" --output_path="output path"
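The four flags in the command above could be parsed along the following lines. This is a minimal sketch using `argparse`, not the repository's actual main.py: the flag names come from the command, while the help texts, the `required=True` choice, and the function names `build_parser` and `parse_args` are assumptions made for illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four flags shown in the usage command."""
    parser = argparse.ArgumentParser(
        description="Text-based image editing (illustrative CLI sketch)."
    )
    # Flag names match the README command; everything else is an assumption.
    parser.add_argument("--img_path", required=True,
                        help="Path to the input image.")
    parser.add_argument("--selected_object", required=True,
                        help="Text describing the object to detect and mask.")
    parser.add_argument("--prompt", required=True,
                        help="Text describing what should replace the masked area.")
    parser.add_argument("--output_path", required=True,
                        help="Where to write the edited image.")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

Called as `parse_args(["--img_path=cars.png", "--selected_object=car", "--prompt=a red sports car", "--output_path=out.png"])`, this returns a namespace whose attributes carry the four values as strings.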

Example

Example 1

cars.png

Example 2

flowers.png

More Details

Grounding DINO

Segment Anything Model

Stable Diffusion
