Transformer-based-Image-Retrieval

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a wide variety of (image, text) pairs. In this project, I use the pretrained CLIP model (https://huggingface.co/openai/clip-vit-large-patch14) for the downstream task of visual search (content-based image retrieval).

Usage

Required packages:

from pathlib import Path
import sys

# Make the parent directory importable so that the src package can be found
sys.path.append(str(Path('.').absolute().parent))

import glob

from transformers import CLIPVisionModel, RobertaModel, AutoTokenizer, CLIPConfig
from src import SNAPDemo

Downloading pretrained models

config = CLIPConfig.from_pretrained("openai/clip-vit-large-patch14")
# Load only the vision tower of CLIP, using the vision part of the full config
vision_encoder = CLIPVisionModel.from_pretrained('openai/clip-vit-large-patch14', config=config.vision_config)
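
Only the vision encoder is loaded here because the search below compares images against images; the RobertaModel and AutoTokenizer imports would only come into play if text queries were added.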

Generate the image embeddings

imgDirectory = '/path/to/image_directory'
# Collect image paths recursively (adjust the pattern to match your file types)
image_paths = glob.glob(imgDirectory + '/**/*.jpg', recursive=True)
demo = SNAPDemo(vision_encoder)
demo.compute_image_embeddings(image_paths)
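
What compute_image_embeddings does internally is not shown in this README; as a rough idea, an embedding for a single image could be produced with the same vision encoder roughly as below. This is a minimal sketch assuming the CLIP image processor and the model's pooled output, not necessarily SNAPDemo's exact implementation.

import torch
from PIL import Image
from transformers import CLIPImageProcessor

processor = CLIPImageProcessor.from_pretrained('openai/clip-vit-large-patch14')

def embed_image(path):
    # Load and preprocess the image the way CLIP expects
    image = Image.open(path).convert('RGB')
    inputs = processor(images=image, return_tensors='pt')
    with torch.no_grad():
        outputs = vision_encoder(**inputs)
    # Pooled representation, L2-normalized so cosine similarity becomes a dot product
    emb = outputs.pooler_output
    return emb / emb.norm(dim=-1, keepdim=True)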

Run Image Search

imgPath = 'path/to/search_image.jpeg'
# Retrieve the 10 most similar images to the query image
demo.image_search(imgPath, 10)
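
image_search presumably ranks the stored gallery embeddings by similarity to the query embedding and returns the top matches. Below is a minimal sketch of such a top-k lookup, assuming gallery_embs is a tensor of L2-normalized embeddings stacked from the previous step; this is an illustration, not SNAPDemo's actual API.

import torch

def top_k_similar(query_emb, gallery_embs, k=10):
    # For normalized vectors, cosine similarity is just a dot product
    scores = gallery_embs @ query_emb.squeeze(0)
    values, indices = torch.topk(scores, k)
    return indices.tolist(), values.tolist()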

