Text-Based-Image-Editing

This project combines three computer vision foundation models, Grounding DINO, Segment Anything Model (SAM), and Stable Diffusion, to edit and manipulate images from text. First, Grounding DINO performs zero-shot object detection driven by a textual description of the target object. Next, SAM extracts segmentation masks from the detected bounding boxes. Finally, these masks guide Stable Diffusion, which replaces the masked regions with contextually appropriate content derived from the text prompt, resulting in a cohesive text-based image editing process.
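The hand-offs between the three models usually need two small glue steps. This is a minimal sketch, assuming main.py follows the common Grounded-SAM pattern (not confirmed by this README). Grounding DINO's inference helper returns boxes as normalized (cx, cy, w, h); SAM's predictor takes a box in absolute pixel (x0, y0, x1, y1) format; and the diffusers Stable Diffusion inpainting pipeline repaints white mask pixels and keeps black ones. The helper names `cxcywh_norm_to_xyxy_pixels` and `union_masks` are hypothetical, and plain lists stand in for the tensors/arrays the real models use.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_norm_to_xyxy_pixels(box: Sequence[float], width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box into absolute (x0, y0, x1, y1) pixels.

    Hypothetical helper: bridges a Grounding DINO style box to a SAM style box prompt.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2) * width
    y0 = (cy - h / 2) * height
    x1 = (cx + w / 2) * width
    y1 = (cy + h / 2) * height
    return (x0, y0, x1, y1)


def union_masks(masks: List[List[List[bool]]]) -> List[List[int]]:
    """Merge per-box boolean masks into one inpainting mask.

    Hypothetical helper: 255 (white) marks pixels to repaint, 0 (black) pixels to keep.
    All masks are assumed to share the same height and width.
    """
    if not masks:
        raise ValueError("no masks to merge")
    height, width = len(masks[0]), len(masks[0][0])
    merged = [[0] * width for _ in range(height)]
    for mask in masks:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    merged[y][x] = 255
    return merged


if __name__ == "__main__":
    # A box centred in a 640x480 image, a quarter of the width and half of the height.
    print(cxcywh_norm_to_xyxy_pixels((0.5, 0.5, 0.25, 0.5), 640, 480))
    print(union_masks([[[True, False], [False, False]], [[False, False], [False, True]]]))
```

Taking the union means that when several boxes match the selected object, every detected instance is repainted in a single inpainting pass.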

Install Requirements

First, install the requirements and Grounding DINO:

pip install -r requirements.txt

Grounding DINO

git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .
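After installation, a quick import check can confirm that the environment is ready before downloading any weights. This is a sketch only: the module names below are assumptions (`groundingdino` is the package installed by `pip install -e .` above, while `torch`, `segment_anything`, and `diffusers` are guesses at what requirements.txt provides), so adjust the list to match the actual file. `check_installed` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, Iterable


def check_installed(modules: Iterable[str]) -> Dict[str, bool]:
    """Report, for each top-level module name, whether it can be imported.

    find_spec returns None for a missing top-level module instead of raising.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Assumed module names; not taken from requirements.txt.
    status = check_installed(["torch", "groundingdino", "segment_anything", "diffusers"])
    for name, ok in status.items():
        print(f"{name:20s} {'ok' if ok else 'MISSING'}")
```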

Run download_files.py to download the pre-trained model checkpoints:

python download_files.py
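The README does not say which checkpoints download_files.py fetches or where it stores them. The sketch below shows one common pattern for such a script, assuming only standard-library HTTP downloads: skip files that already exist, and write to a temporary `.part` file so an interrupted download never leaves a truncated checkpoint behind. The function name, URL, and paths are hypothetical placeholders, not the script's actual contents.

```python
import os
import urllib.request


def download_if_missing(url: str, dest: str) -> bool:
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if dest was already present.
    """
    if os.path.exists(dest):
        return False
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    tmp = dest + ".part"
    urllib.request.urlretrieve(url, tmp)
    # Rename only after the transfer finishes, so a partial file never looks complete.
    os.replace(tmp, dest)
    return True


if __name__ == "__main__":
    # Placeholder URL and path; substitute the real checkpoint locations.
    download_if_missing("https://example.com/model.pth", "weights/model.pth")
```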

Run & Usage

Run main.py with the following command, replacing each placeholder value:

python main.py --img_path="input image" --selected_object="your selected object" --prompt="your prompt" --output_path="output path"
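The four flags map naturally onto the pipeline stages: `--selected_object` is the text Grounding DINO searches for, `--prompt` drives Stable Diffusion's replacement, and the two paths are the input and output images. The argparse sketch below mirrors those flag names as documented above; whether main.py marks them as required, or adds further options, is an assumption.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command-line flags documented in this README (sketch)."""
    parser = argparse.ArgumentParser(description="Text-based image editing")
    # Treating every flag as required is an assumption about main.py.
    parser.add_argument("--img_path", required=True, help="path to the input image")
    parser.add_argument("--selected_object", required=True,
                        help="text naming the object to detect and replace")
    parser.add_argument("--prompt", required=True,
                        help="text describing the content to paint into the masked region")
    parser.add_argument("--output_path", required=True, help="where to save the edited image")
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_args())
```

With this parser, `--selected_object="dog" --prompt="a cat"` (hypothetical values) would ask the pipeline to find the dog in the input image and repaint that region as a cat.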

Example

Example 1

Example 2

More Details

Grounding DINO

Segment Anything Model

Stable Diffusion

scheshmi/Text-Based-Image-Editing

Folders and files

Latest commit

History

Repository files navigation

Text-Based-Image-Editing

Install Requirements

Run & Usage

Example

Example 1

Example 2

More Details

About

Topics

Resources

Stars

Watchers

Forks

Languages