Embeddings
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
State-of-the-Art Text Embeddings
Retrieval and Retrieval-augmented LLMs
Open-source search and retrieval database for AI applications.
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.
Milvus is a high-performance, cloud-native vector database built for scalable approximate nearest neighbor (ANN) vector search.
LlamaIndex is the leading document agent and OCR platform
A library for efficient similarity search and clustering of dense vectors.
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
A blazing-fast inference solution for text embedding models.
The AI-native database built for LLM applications, providing incredibly fast hybrid search across dense vectors, sparse vectors, tensors (multi-vector), and full text.
Netease Youdao's open-source embedding and reranker models for RAG products.
Vald: a highly scalable distributed vector search engine.
Open-source vector similarity search for Postgres
MTEB: Massive Text Embedding Benchmark
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali.
A lightweight, lightning-fast, in-process vector database