This repository contains the final project for Interdisciplinary Data Science 705: Principles of Machine Learning.
This study explores reversing the usual text-to-image relationship, that is, predicting the prompt behind a generated image, using advanced Vision-Language Pre-training (VLP) frameworks such as the BLIP and ViT models. We examined a dataset of 7,000 images randomly sampled from DiffusionDB, which features a wide range of prompt-image associations. By leveraging transfer learning and pre-training techniques, we fine-tuned the models and assessed their performance using cosine similarity as the evaluation metric. Our findings show that the fine-tuned BLIP model significantly surpasses the zero-shot baseline, with a 25% improvement in average cosine similarity on the test set. While the model effectively describes image content, it struggles with context, object relationships, and the interpretation of idioms or metaphors. We also acknowledge the limitations of cosine similarity as an evaluation metric, since it does not consider whether a generated phrase is sensible. Future work should investigate alternative metrics and additional fine-tuning methods to improve performance on complex and abstract image-text relationships.
## Install Required Libraries

```bash
pip install -r requirements.txt
```
## Run BLIP Model

All code used to train the BLIP model is in the 10_code/BLIP_training.ipynb notebook.
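For orientation, below is a minimal sketch of what a single fine-tuning step on an image-prompt pair could look like with the Hugging Face `transformers` BLIP captioning model. The checkpoint name, single-example batching, and training loop are illustrative assumptions, not the project's exact setup; see the notebook for the actual training code.

```python
import torch
from transformers import BlipProcessor, BlipForConditionalGeneration

# Illustrative checkpoint; the notebook may use a different one.
checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(image, prompt):
    """One gradient step on a single (PIL image, prompt string) pair."""
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    # The language-modeling labels are the tokenized prompt itself.
    outputs = model(**inputs, labels=inputs.input_ids)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```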
## Run ViT Model

All code used to train the ViT model is in the 10_code/VIT_Model_Training.ipynb notebook.
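Since ViT on its own is an image encoder, generating prompt text requires pairing it with a text decoder. The sketch below uses Hugging Face's `VisionEncoderDecoderModel` for this; the specific encoder and decoder checkpoints are assumptions for illustration, and the notebook contains the actual configuration.

```python
import torch
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Illustrative choice: ViT encoder paired with a GPT-2 decoder.
encoder_ckpt = "google/vit-base-patch16-224-in21k"
decoder_ckpt = "gpt2"
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(encoder_ckpt, decoder_ckpt)
image_processor = ViTImageProcessor.from_pretrained(encoder_ckpt)
tokenizer = AutoTokenizer.from_pretrained(decoder_ckpt)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# The decoder needs start and pad token ids for generation and loss masking.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(image, prompt):
    """One gradient step: ViT encodes the image, the decoder learns to emit the prompt."""
    pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
    labels = tokenizer(prompt, return_tensors="pt").input_ids
    loss = model(pixel_values=pixel_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```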
## Results

| Model | Avg. Cosine Similarity |
|-------|------------------------|
| BLIP  | 0.39                   |
| ViT   | 0.46                   |
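The scores above are average cosine similarities between generated prompts and the ground-truth DiffusionDB prompts. A minimal sketch of how such a score could be computed with sentence embeddings is shown below; the `sentence-transformers` model used here is an assumption for illustration, and the actual evaluation code may embed the text differently.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative embedding model; the project may use a different encoder.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def avg_cosine_similarity(generated_prompts, true_prompts):
    """Mean cosine similarity between paired generated and ground-truth prompts."""
    gen_emb = embedder.encode(generated_prompts, convert_to_tensor=True)
    true_emb = embedder.encode(true_prompts, convert_to_tensor=True)
    # util.cos_sim returns an NxN matrix; the diagonal holds the paired scores.
    pairwise = util.cos_sim(gen_emb, true_emb).diagonal()
    return pairwise.mean().item()

# Toy usage example
score = avg_cosine_similarity(
    ["a castle on a hill at sunset"],
    ["gothic castle on a hilltop, dramatic sunset lighting"],
)
print(f"avg cosine similarity: {score:.2f}")
```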