GitHub - monatis/turkish-clip: Embed texts in Turkish to be used with OpenAI's CLIP

Acknowledgement

Google supported this work by providing Google Cloud credits. Thank you, Google, for supporting open source! 🎉

What is this?

This work makes it possible to use the ViT-B/32 image encoder of OpenAI's CLIP with a Turkish text encoder. It consists of a base model and a CLIP head model. The base model is a fine-tuned version of dbmdz/distilbert-base-turkish-cased, published on the Hugging Face Model Hub. It should be used together with clip_head.h5 from this repo.

Installation

First, install CLIP and its requirements by following the instructions in its repo. Then clone this repo and install the remaining requirements from requirements.txt:

git clone https://github.com/monatis/turkish-clip.git
cd turkish-clip
pip install -r requirements.txt

Usage

Once you have cloned the repo and installed the requirements, run the inference.py script for a quick inference demo:

python inference.py

This script loads the base model from the Hugging Face Model Hub and the CLIP head from this repo. It correctly classifies two sample images in a zero-shot setting.
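The zero-shot step can be illustrated roughly as follows. This is a minimal NumPy sketch, not the contents of inference.py: the function names, the random stand-in embeddings, and the temperature of 100 (chosen to resemble CLIP's logit scale) are all assumptions made for illustration. In real use, the image embeddings would come from CLIP's ViT-B/32 image encoder and the text embeddings from the Turkish text encoder plus the CLIP head.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def zero_shot_classify(image_embeddings, text_embeddings, temperature=100.0):
    """Return a (num_images, num_labels) matrix of label probabilities.

    `temperature` is an assumed scaling factor, not a value taken from this repo.
    """
    img = l2_normalize(np.asarray(image_embeddings, dtype=np.float64))
    txt = l2_normalize(np.asarray(text_embeddings, dtype=np.float64))
    logits = temperature * (img @ txt.T)
    # Numerically stable softmax over the candidate labels.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Hypothetical stand-ins: two label embeddings and two noisy "image" embeddings
# in a 512-dimensional space (the embedding size of CLIP ViT-B/32).
rng = np.random.default_rng(0)
text_embeddings = rng.normal(size=(2, 512))
image_embeddings = text_embeddings + 0.1 * rng.normal(size=(2, 512))

probs = zero_shot_classify(image_embeddings, text_embeddings)
predicted = probs.argmax(axis=1)  # index of the best-matching label per image
```

Each image is assigned the label whose text embedding has the highest cosine similarity to the image embedding, so no classifier is trained on the target classes.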

How it works

The encode_text() function aggregates the per-token hidden states output by the DistilBERT model into a single vector per sequence. The clip_head.h5 model then projects this vector into the same vector space as CLIP's text encoder with a single dense layer.

Training proceeded in two stages. First, all DistilBERT layers were frozen and only the dense head was trained for a few epochs. Then the DistilBERT layers were unfrozen and trained together with the dense layer for a few more epochs. I created the dataset by machine-translating COCO captions into Turkish. During training, the vector representations of the English captions produced by the original CLIP text encoder were used as target values, and the MSE between these vectors and the clip_head.h5 outputs was minimized.
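The pipeline described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code in this repo: the aggregation is assumed to be a masked mean over tokens (the text only says the hidden states are aggregated), the dense layer is assumed to be linear with a bias, and all helper names and random tensors are hypothetical. The dimensions 768 and 512 correspond to the DistilBERT base hidden size and the CLIP ViT-B/32 embedding size.

```python
import numpy as np

def masked_mean_pool(hidden_states, attention_mask):
    """Average per-token hidden states, ignoring padding tokens (assumed aggregation)."""
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.maximum(mask.sum(axis=1), 1.0)  # avoid division by zero
    return summed / counts

def dense_projection(pooled, weights, bias):
    """Single dense layer mapping pooled vectors into CLIP's embedding space."""
    return pooled @ weights + bias

def mse_loss(predicted, target):
    """Mean squared error between projected vectors and CLIP target vectors."""
    return float(np.mean((predicted - target) ** 2))

rng = np.random.default_rng(0)
batch, seq_len, hidden_dim, clip_dim = 2, 5, 768, 512

# Hypothetical stand-ins for DistilBERT outputs on Turkish captions.
hidden_states = rng.normal(size=(batch, seq_len, hidden_dim))
attention_mask = np.array([[1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1]])

# Hypothetical, randomly initialized head parameters.
weights = rng.normal(scale=0.02, size=(hidden_dim, clip_dim))
bias = np.zeros(clip_dim)

pooled = masked_mean_pool(hidden_states, attention_mask)
turkish_vectors = dense_projection(pooled, weights, bias)

# Stand-in for CLIP text-encoder outputs on the original English captions.
english_targets = rng.normal(size=(batch, clip_dim))
loss = mse_loss(turkish_vectors, english_targets)
```

In the two-stage schedule, only `weights` and `bias` would receive gradient updates during the first stage, and the DistilBERT parameters would be updated as well during the second.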

Future work

The dataset and the training notebook will be released soon. I may also consider releasing bigger models finetuned with better datasets as well as more usage examples if the community finds this work useful. This model will also be added to my ai-aas project.

