Customized Embedding Hub - Examples, Datasets, Pre-Trained Matrices #18

Glavin001 · 2023-02-23T22:57:27Z

Problem

The default embeddings (e.g. Ada-002 from OpenAI) are great generalists. However, they are not tailored to your specific use case.

Proposed Solution

🎉 Customizing Embeddings!

ℹ️ See my tutorial / lessons learned if you're interested in learning more, step-by-step, with screenshots and tips.

🎯 Specifically, Langchain Hub would provide a collection of pre-trained custom embeddings.

Similar to https://huggingface.co/models except focused on semantic embeddings.
List the known tasks so developers can search the available custom embeddings for each:

Hub provides a set of Tasks, each with:

- Modality (e.g. text, image)
- Embedding engine to use and number of dimensions (text => ada-002 with 1536 dimensions, image => CLIP...)
- Expected prompt formats for documents and/or queries (i.e. what the data should look like before being sent to the embedding model)
  - e.g. Documents should look like X. Short-form queries look like Y. Topic or objective is Z.
- Pre-made Datasets for training on your own
  - Data preparation scripts
- Pre-trained Matrices
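
To make the Task listing above more concrete, here is a rough sketch of what one hub entry might look like as a Python data structure. Nothing here is an existing Langchain Hub API: `EmbeddingTask`, `find_tasks`, every field name, and the example task, dataset, and matrix names are all hypothetical and only mirror the bullet points above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EmbeddingTask:
    """Hypothetical hub entry; all field names are assumptions."""
    name: str
    modality: str          # e.g. "text" or "image"
    engine: str            # base embedding model the matrices were trained against
    dimensions: int        # output size of the base model
    document_format: str   # template applied to documents before embedding
    query_format: str      # template applied to queries before embedding
    datasets: List[str] = field(default_factory=list)   # pre-made training datasets
    matrices: List[str] = field(default_factory=list)   # pre-trained matrices

    def format_document(self, text: str) -> str:
        return self.document_format.format(text=text)

    def format_query(self, text: str) -> str:
        return self.query_format.format(text=text)


def find_tasks(tasks: List[EmbeddingTask],
               modality: Optional[str] = None,
               engine: Optional[str] = None) -> List[EmbeddingTask]:
    """Return the entries matching the given modality and/or engine."""
    return [
        t for t in tasks
        if (modality is None or t.modality == modality)
        and (engine is None or t.engine == engine)
    ]


# Made-up entries, for illustration only.
tasks = [
    EmbeddingTask(
        name="faq-search", modality="text", engine="ada-002", dimensions=1536,
        document_format="Document: {text}", query_format="Query: {text}",
        datasets=["faq-pairs"], matrices=["faq-search-v1"],
    ),
    EmbeddingTask(
        name="product-photos", modality="image", engine="CLIP",
        dimensions=512,  # assumed; depends on which CLIP variant is used
        document_format="{text}", query_format="{text}",
    ),
]

text_tasks = find_tasks(tasks, modality="text")
print([t.name for t in text_tasks])
print(text_tasks[0].format_query("how do I reset my password"))
```

Keeping the prompt templates on the entry itself means a matrix is always applied to inputs formatted the same way as during training, which is the point of the "expected prompt formats" bullet.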

Leverage Langchain's helpers to train and use the custom embedding matrix.
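
As a rough illustration of what training and using such a matrix could involve, the sketch below treats the "custom embedding matrix" as a learned linear map applied to the base embeddings. This is one possible reading, not a Langchain helper: `train_matrix` and `customize` are hypothetical names, the objective is a plain squared error on dot-product similarity (chosen to keep the gradient short), and the 8-dimensional random vectors stand in for real 1536-dimensional embeddings.

```python
import numpy as np


def train_matrix(a, b, labels, steps=300, lr=0.05):
    """Fit M so that (a @ M) . (b @ M) approaches each pair's label.

    a, b:   (n, d) arrays of base embeddings for the two sides of each pair
    labels: (n,) array, +1 for similar pairs and -1 for dissimilar pairs
    Returns the learned (d, d) matrix and the loss recorded at each step.
    """
    n, d = a.shape
    m = np.eye(d)  # start from the identity, i.e. the unmodified embeddings
    losses = []
    for _ in range(steps):
        p, q = a @ m, b @ m
        err = np.sum(p * q, axis=1) - labels
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error with respect to M.
        grad = (2.0 / n) * (a.T @ (err[:, None] * q) + b.T @ (err[:, None] * p))
        m -= lr * grad
    return m, losses


def customize(embeddings, m):
    """Apply the learned matrix to base embeddings before similarity search."""
    return embeddings @ m


def unit(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Synthetic stand-in data: only the first two dimensions decide the label,
# so a useful matrix should learn to emphasize them.
rng = np.random.default_rng(0)
a = unit(rng.normal(size=(256, 8)))
b = unit(rng.normal(size=(256, 8)))
labels = np.sign(np.sum(a[:, :2] * b[:, :2], axis=1))

m, losses = train_matrix(a, b, labels)
print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")

custom_docs = customize(a, m)  # same number of rows, now in the customized space
```

At query time, both the stored document embeddings and each incoming query embedding would be passed through `customize` with the same matrix before computing similarity.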

Glavin001 mentioned this issue Feb 23, 2023: Task Fine-Tuning - Datasets, Examples, etc #19 (Open, 6 tasks)
