---
sidebar_label: Baseten
---


# BasetenEmbeddings

This will help you get started with Baseten embedding models using LangChain. For detailed documentation on `BasetenEmbeddings` features and configuration options, please refer to the [API reference](https://python.langchain.com/api_reference/baseten/embeddings/langchain_baseten.embeddings.BasetenEmbeddings.html).

## Overview

### Integration details

import { ItemTable } from "@theme/FeatureTables";

<ItemTable category="text_embedding" item="Baseten" />

## Setup

To access Baseten embedding models you'll need to create a Baseten account, get an API key, deploy an embedding model, and install the `langchain-baseten` integration package.

### Credentials

Head to [baseten.co](https://baseten.co/) to sign up for Baseten and generate an API key. Once you've done this, set the `BASETEN_API_KEY` environment variable:


In [None]:
import getpass
import os

if not os.getenv("BASETEN_API_KEY"):
    os.environ["BASETEN_API_KEY"] = getpass.getpass("Enter your Baseten API key: ")


If you want automated tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:


In [None]:
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")


### Installation

The LangChain Baseten integration lives in the `langchain-baseten` package:


In [None]:
%pip install --upgrade --quiet langchain-baseten


## Instantiation

Now we can instantiate our embedding model. You'll need to deploy an embedding model on Baseten first and get the model URL from your dashboard:


In [None]:
from langchain_baseten import BasetenEmbeddings

embeddings = BasetenEmbeddings(
    model="your-embedding-model",
    model_url="https://model-<id>.api.baseten.co/environments/production/sync",
    # api_key="...",  # or set BASETEN_API_KEY
)
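
The `model_url` above follows the pattern `https://model-<id>.api.baseten.co/environments/production/sync`, where `<id>` is your deployment's model ID. If you switch between deployments often, a small helper can fill in the ID for you. The `baseten_sync_url` function below is hypothetical (it is not part of `langchain-baseten`), and its check that IDs are alphanumeric is an assumption about Baseten's ID format.

```python
import re


def baseten_sync_url(model_id: str) -> str:
    """Fill a Baseten model ID into the production sync URL pattern used in this guide.

    Hypothetical helper, not part of langchain-baseten. The alphanumeric check
    below is an assumption about the format of Baseten model IDs.
    """
    if not re.fullmatch(r"[A-Za-z0-9]+", model_id):
        raise ValueError(f"unexpected model ID format: {model_id!r}")
    return f"https://model-{model_id}.api.baseten.co/environments/production/sync"
```

You would then pass `model_url=baseten_sync_url(...)`, with the ID copied from your dashboard, when constructing `BasetenEmbeddings`.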


## Indexing and Retrieval

Embedding models are often used in retrieval augmented generation (RAG) flows, both as part of indexing data as well as later retrieving it. For more detailed instructions, please see our RAG tutorials under the [working with external knowledge tutorials](/docs/tutorials/#working-with-external-knowledge).

Below, see how to index and retrieve data using the `embeddings` object we initialized above. In this example, we will index and retrieve a sample document in the `InMemoryVectorStore`.


In [None]:
# Create a vector store with a sample text
from langchain_core.vectorstores import InMemoryVectorStore

text = "LangChain is the framework for building context-aware reasoning applications"

vectorstore = InMemoryVectorStore.from_texts(
    [text],
    embedding=embeddings,
)

# Use the vectorstore as a retriever
retriever = vectorstore.as_retriever()

# Retrieve the most similar text
retrieved_documents = retriever.invoke("What is LangChain?")

# show the retrieved document's content
retrieved_documents[0].page_content


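## Direct Usage

Under the hood, the vector store and retriever call `embeddings.embed_documents(...)` and `embeddings.embed_query(...)` to create embeddings for the texts passed to `from_texts` and for the query passed to `invoke`, respectively. You can also call these methods directly.

### Embed multiple texts

Use `embed_documents` to embed a list of texts in one call:


In [None]: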
# Embed multiple documents
texts = [
    "Machine learning is a subset of artificial intelligence",
    "Natural language processing helps computers understand text",
    "Vector embeddings represent text as numerical arrays"
]

vectors = embeddings.embed_documents(texts)
print(f"Generated {len(vectors)} embeddings")
print(f"Each embedding has {len(vectors[0])} dimensions")
print(f"First embedding sample: {vectors[0][:3]}")


### Embed a single text

Use `embed_query` to embed a single query string:


In [None]:
# Embed single query
query = "What is artificial intelligence?"
query_vector = embeddings.embed_query(query)
print(f"Query embedding dimension: {len(query_vector)}")
print(f"Query embedding sample: {query_vector[:3]}")
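
The retriever ranks stored texts by how close their embeddings are to the query embedding, and `InMemoryVectorStore` uses cosine similarity for that comparison. As a minimal plain-Python sketch (not the vector store's internal code), you can rank the document vectors against the query vector yourself. The names `cosine_similarity` and `rank_by_similarity` are hypothetical helpers defined here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar document vectors, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With the cells above, `rank_by_similarity(query_vector, vectors)` returns indices into `texts` paired with their scores, highest first.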


## Performance Client features

Baseten embeddings use Baseten's Performance Client to maximize throughput when embedding many texts. It handles the following for you:

- Automatic batching (`batch_size=32`)
- Concurrent requests (`max_concurrent_requests=128`)
- Request sizing by character count (`max_chars_per_request=8000`)


In [None]:
# Embed a large list of texts; batching and concurrency are handled automatically
large_text_list = [f"Document {i} about various topics" for i in range(100)]
large_vectors = embeddings.embed_documents(large_text_list)

print(f"Processed {len(large_vectors)} embeddings with automatic batching")
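
To make the `batch_size`, `max_concurrent_requests`, and `max_chars_per_request` settings mentioned above more concrete, here is a rough pure-Python sketch of batching plus concurrent requests. It is an illustration under assumptions, not the Performance Client's actual algorithm: `make_batches` and `embed_concurrently` are hypothetical names, batches are capped by both item count and total characters, and a single oversized text gets a batch of its own rather than being split.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Vector = List[float]


def make_batches(texts: List[str], batch_size: int, max_chars: int) -> List[List[str]]:
    """Group texts into batches of at most `batch_size` items and `max_chars` total characters.

    A text longer than `max_chars` is placed in its own batch (not truncated).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches: List[List[str]] = []
    current: List[str] = []
    current_chars = 0
    for text in texts:
        # Start a new batch if adding this text would exceed either limit
        if current and (len(current) >= batch_size or current_chars + len(text) > max_chars):
            batches.append(current)
            current, current_chars = [], 0
        current.append(text)
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches


def embed_concurrently(
    embed_batch: Callable[[List[str]], List[Vector]],
    texts: List[str],
    batch_size: int = 32,
    max_concurrent_requests: int = 128,
    max_chars_per_request: int = 8000,
) -> List[Vector]:
    """Embed texts batch by batch on a thread pool, returning vectors in input order."""
    batches = make_batches(texts, batch_size, max_chars_per_request)
    with ThreadPoolExecutor(max_workers=max_concurrent_requests) as pool:
        # pool.map yields results in submission order, so vectors line up with texts
        results = pool.map(embed_batch, batches)
        return [vector for batch_vectors in results for vector in batch_vectors]
```

For example, `embed_concurrently(embeddings.embed_documents, large_text_list)` would run, but `BasetenEmbeddings` already batches internally, so the sketch is only meant to show how the limits interact.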
