# Snowflake Vector Store

This notebook shows how to use the Snowflake vector store integration with LangChain.

[Snowflake](https://www.snowflake.com/) is a cloud-based data warehousing platform with native support for vector data types and similarity search functions, which lets you store and query embeddings next to the rest of your warehouse data.

## Features

- 🏔️ **Native Snowflake Integration**: Uses Snowflake's built-in vector capabilities
- 🔍 **Semantic Search**: Powered by the `VECTOR_COSINE_SIMILARITY` function
- 📊 **Scalable**: Leverages Snowflake's cloud-native architecture
- 🔒 **Secure**: Enterprise-grade security and compliance
- 🚀 **High Performance**: Optimized for large-scale vector operations

## Setup

First, install the required packages:

In [None]:
# Uncomment the next line to install the required packages
# %pip install langchain-snowflake-vectorstore snowflake-connector-python langchain-openai

## Credentials

You'll need Snowflake credentials. The cell below reads them from environment variables and falls back to placeholder strings, which you should replace with real values:

In [None]:
import os

# Read Snowflake credentials from environment variables.
# The second argument to os.getenv is only a placeholder; replace it or set the variable.
SNOWFLAKE_ACCOUNT = os.getenv("SNOWFLAKE_ACCOUNT", "your-account")
SNOWFLAKE_USER = os.getenv("SNOWFLAKE_USER", "your-username")
SNOWFLAKE_PASSWORD = os.getenv("SNOWFLAKE_PASSWORD", "your-password")
SNOWFLAKE_DATABASE = os.getenv("SNOWFLAKE_DATABASE", "your-database")
SNOWFLAKE_SCHEMA = os.getenv("SNOWFLAKE_SCHEMA", "your-schema")
SNOWFLAKE_WAREHOUSE = os.getenv("SNOWFLAKE_WAREHOUSE", "your-warehouse")
SNOWFLAKE_ROLE = os.getenv("SNOWFLAKE_ROLE", "your-role")
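
The placeholder defaults keep the notebook runnable, but a forgotten variable then only shows up later as a login failure. As a minimal sketch, assuming a hypothetical helper named `load_snowflake_config` (it is not part of `langchain-snowflake-vectorstore`), the same variables could be collected into a dict and checked up front:

```python
import os

# Connection settings read from SNOWFLAKE_<NAME> environment variables.
# "role" is treated as optional, as in the configuration parameter table.
REQUIRED = ("account", "user", "password", "database", "schema", "warehouse")
OPTIONAL = ("role",)


def load_snowflake_config(env=None):
    """Return connection keyword arguments, raising if a required one is unset.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    config = {}
    missing = []
    for key in REQUIRED:
        name = f"SNOWFLAKE_{key.upper()}"
        value = env.get(name)
        if value:
            config[key] = value
        else:
            missing.append(name)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    for key in OPTIONAL:
        value = env.get(f"SNOWFLAKE_{key.upper()}")
        if value:
            config[key] = value
    return config
```

The returned dict can be unpacked into the constructor used in this notebook, for example `SnowflakeVectorStore(**config, table_name="vector_documents", embedding_function=embeddings, embedding_dimension=1536)`.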

## Quick Start

Here's a quick example to get you started:

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_snowflake_vectorstore import SnowflakeVectorStore

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create vector store
vector_store = SnowflakeVectorStore(
    account=SNOWFLAKE_ACCOUNT,
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    warehouse=SNOWFLAKE_WAREHOUSE,
    role=SNOWFLAKE_ROLE,
    table_name="vector_documents",
    embedding_function=embeddings,
    embedding_dimension=1536,  # must equal the embedding model's output size (1536 for the default OpenAI model)
)

# Add documents
texts = [
    "LangChain is a framework for developing applications powered by language models.",
    "Snowflake is a cloud-based data warehousing platform.",
    "Vector databases enable semantic search and similarity matching.",
]

ids = vector_store.add_texts(texts)
print(f"Added {len(ids)} documents with IDs: {ids}")

# Search for similar documents
results = vector_store.similarity_search("What is LangChain?", k=2)
for doc in results:
    print(f"Found: {doc.page_content}")

## Initialize the Vector Store

Create a Snowflake vector store instance. The remaining sections reuse this `vector_store`, which writes to the `langchain_vector_store` table:

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_snowflake_vectorstore import SnowflakeVectorStore

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create the vector store
vector_store = SnowflakeVectorStore(
    account=SNOWFLAKE_ACCOUNT,
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    warehouse=SNOWFLAKE_WAREHOUSE,
    role=SNOWFLAKE_ROLE,
    table_name="langchain_vector_store",
    embedding_function=embeddings,
    embedding_dimension=1536,  # must equal the embedding model's output size (1536 for the default OpenAI model)
)

print("Vector store initialized successfully!")

## Add Documents

Add some sample documents to the vector store:

In [None]:
from langchain_core.documents import Document

# Sample documents
documents = [
    Document(
        page_content="Snowflake is a cloud-based data warehousing platform.",
        metadata={"source": "snowflake_info", "category": "technology"},
    ),
    Document(
        page_content="LangChain is a framework for developing applications powered by language models.",
        metadata={"source": "langchain_info", "category": "technology"},
    ),
    Document(
        page_content="Vector databases are specialized databases for storing and querying high-dimensional vectors.",
        metadata={"source": "vector_db_info", "category": "database"},
    ),
    Document(
        page_content="Machine learning models can generate embeddings that represent semantic meaning.",
        metadata={"source": "ml_info", "category": "machine_learning"},
    ),
]

# Add documents to the vector store
ids = vector_store.add_documents(documents)
print(f"Added {len(ids)} documents to the vector store")
print(f"Document IDs: {ids}")

## Similarity Search

Perform similarity search to find relevant documents:

In [None]:
# Perform similarity search
query = "What is a cloud data warehouse?"
results = vector_store.similarity_search(query, k=3)

print(f"Query: {query}")
print(f"Found {len(results)} similar documents:\n")

for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
    print(f"   Metadata: {doc.metadata}\n")

## Similarity Search with Scores

Get similarity scores along with the results:

In [None]:
# Perform similarity search with scores
query = "machine learning embeddings"
results_with_scores = vector_store.similarity_search_with_score(query, k=3)

print(f"Query: {query}")
print(f"Found {len(results_with_scores)} similar documents with scores:\n")

for i, (doc, score) in enumerate(results_with_scores, 1):
    print(f"{i}. Score: {score:.4f}")
    print(f"   Content: {doc.page_content}")
    print(f"   Metadata: {doc.metadata}\n")
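
The scores come from comparing the query embedding with each stored embedding by cosine similarity. The sketch below illustrates what that measure computes in plain Python; it is not the library's or Snowflake's implementation, and this notebook does not state whether the returned score is the raw similarity or a value derived from it, so check that before applying a threshold:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal: 0.0
```

Because the measure depends only on direction, scaling a vector does not change its similarity to another vector.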

## Adding Texts with Metadata

To attach metadata to raw texts, pass a `metadatas` list with one dict per text:

In [None]:
# Add texts with metadata
texts = ["Document 1", "Document 2"]
metadatas = [{"source": "file1.txt"}, {"source": "file2.txt"}]
ids = vector_store.add_texts(texts, metadatas=metadatas)

print(f"Added {len(ids)} texts with metadata")
print(f"Text IDs: {ids}")

## Using as a Retriever

You can use the vector store as a retriever in LangChain chains:

In [None]:
# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# Use the retriever
query = "What is Snowflake?"
retrieved_docs = retriever.invoke(query)

print(f"Query: {query}")
print(f"Retrieved {len(retrieved_docs)} documents:\n")

for i, doc in enumerate(retrieved_docs, 1):
    print(f"{i}. {doc.page_content}")
    print(f"   Metadata: {doc.metadata}\n")

## Batch Operations

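Create a vector store and populate it in one step with the `from_texts` class method. Note that `from_texts` takes the embeddings object as `embedding`, whereas the constructor takes it as `embedding_function`: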

In [None]:
# Create from texts (class method)
texts = [
    "Snowflake supports VECTOR data types for efficient similarity search.",
    "VECTOR_COSINE_SIMILARITY is a built-in function in Snowflake.",
    "Cloud data warehouses provide scalable analytics capabilities.",
]

vector_store_from_texts = SnowflakeVectorStore.from_texts(
    texts=texts,
    embedding=embeddings,
    account=SNOWFLAKE_ACCOUNT,
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    warehouse=SNOWFLAKE_WAREHOUSE,
    role=SNOWFLAKE_ROLE,
    table_name="batch_vector_store",
    embedding_dimension=1536,
)

print(f"Created vector store from {len(texts)} texts")
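
For larger corpora it may help to insert in fixed-size chunks, so that a failure affects one chunk rather than the whole load. The following is a sketch under assumptions: `add_texts_in_batches` is a hypothetical helper, the batch size of 100 is arbitrary, and the only store method it relies on is the `add_texts(texts, metadatas=...)` call already used in this notebook.

```python
def add_texts_in_batches(store, texts, metadatas=None, batch_size=100):
    """Call store.add_texts on consecutive slices and return all IDs in order.

    Hypothetical helper for illustration only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError("texts and metadatas must have the same length")
    all_ids = []
    for start in range(0, len(texts), batch_size):
        end = start + batch_size
        chunk = texts[start:end]
        chunk_meta = None if metadatas is None else metadatas[start:end]
        # Each call embeds and inserts one chunk.
        all_ids.extend(store.add_texts(chunk, metadatas=chunk_meta))
    return all_ids
```

A call such as `add_texts_in_batches(vector_store, texts, batch_size=50)` would then replace a single large `add_texts` call.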

## Advanced Features

### Custom Table Management

To drop and recreate the table, call `recreate_table()`. This deletes all existing data, so the call is commented out here:

In [None]:
# Recreate the table (this will delete all existing data)
# vector_store.recreate_table()
# print("Table recreated successfully")

### Configuration Parameters

The constructor accepts the following parameters:

| Parameter | Description | Required |
|-----------|-------------|----------|
| account | Snowflake account identifier | Yes |
| user | Username for authentication | Yes |
| password | Password for authentication | Yes |
| database | Database name | Yes |
| schema | Schema name | Yes |
| warehouse | Warehouse name | Yes |
| role | Role name | No |
| table_name | Table name for storing vectors | Yes |
| embedding_function | Embeddings object used to generate vectors (for example, `OpenAIEmbeddings`) | Yes |
| embedding_dimension | Dimension of embedding vectors | Yes |

## Key Features

The Snowflake Vector Store provides:

1. **Native Vector Support**: Uses Snowflake's VECTOR data type for efficient storage
2. **Cosine Similarity**: Uses the `VECTOR_COSINE_SIMILARITY` function for similarity search
3. **Scalability**: Benefits from Snowflake's cloud-native architecture
4. **Security**: Enterprise-grade security features
5. **Integration**: Seamless integration with existing Snowflake workflows
6. **Metadata Support**: Store and query document metadata alongside vectors
7. **Batch Operations**: Efficient bulk insert and query operations

## Requirements

- Python 3.8+
- Snowflake account with vector support
- LangChain Core
- Snowflake Connector for Python
- SQLAlchemy

## Snowflake Setup

Your Snowflake account must support the `VECTOR` data type and the `VECTOR_COSINE_SIMILARITY` function. Both are available in recent Snowflake releases; if you are unsure about your account, check the Snowflake documentation.
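
One way to check this from Python is to run a tiny probe query. The sketch below assumes a hypothetical helper `supports_vector` that takes any DB-API cursor, such as one created from a `snowflake.connector.connect` connection; the probe casts two array literals to `VECTOR(FLOAT, 2)` and compares them.

```python
# Probe query: builds two 2-dimensional vectors and compares them.
PROBE_SQL = (
    "SELECT VECTOR_COSINE_SIMILARITY("
    "[1, 0]::VECTOR(FLOAT, 2), [1, 0]::VECTOR(FLOAT, 2))"
)


def supports_vector(cursor):
    """Return True if the probe query runs and yields a row, else False.

    Hypothetical helper for illustration only. Any exception counts as
    "not supported", which also covers connection or permission errors.
    """
    try:
        cursor.execute(PROBE_SQL)
        row = cursor.fetchone()
    except Exception:
        return False
    return row is not None
```

Because the helper swallows every exception, a `False` result is a prompt to look at the underlying error rather than proof that the feature is missing.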

### Required Permissions

Ensure your Snowflake role has the following permissions:

- CREATE TABLE on the target schema
- INSERT, SELECT, UPDATE, DELETE on the vector table
- USAGE on the database, schema, and warehouse
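
The privileges listed above can be written as `GRANT` statements. The sketch below only builds the SQL strings; `grant_statements` is a hypothetical helper, it assumes plain unquoted identifiers, and the table-level grant only makes sense once the vector table exists. An administrator would review and run the statements separately.

```python
import re

# Unquoted Snowflake identifiers: a letter or underscore, then letters,
# digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def grant_statements(role, database, schema, warehouse, table):
    """Return GRANT statements covering the privileges listed above.

    Hypothetical helper for illustration only.
    """
    for name in (role, database, schema, warehouse, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    fq_schema = f"{database}.{schema}"
    return [
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role}",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO ROLE {role}",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        f"GRANT CREATE TABLE ON SCHEMA {fq_schema} TO ROLE {role}",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE {fq_schema}.{table} TO ROLE {role}",
    ]


for statement in grant_statements("my_role", "my_db", "my_schema", "my_wh", "langchain_vector_store"):
    print(statement + ";")
```

The role, database, schema, and warehouse names in the example call are placeholders.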

## Best Practices

- Use appropriate embedding dimensions for your use case
- Consider indexing strategies for large datasets
- Leverage Snowflake's clustering keys for better performance
- Monitor warehouse usage and scale as needed
- Use metadata effectively for filtering and organization
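
As an illustration of the metadata point: this notebook does not show a server-side filter argument, so one option that makes no assumptions about the store is to fetch a few extra results and filter them on the client. `filter_by_metadata` below is a hypothetical helper that works on any objects with a `metadata` dict, such as LangChain `Document` instances.

```python
def filter_by_metadata(docs, **required):
    """Keep documents whose metadata contains every key/value pair given.

    Hypothetical helper for illustration only.
    """
    return [
        doc
        for doc in docs
        if all(
            key in doc.metadata and doc.metadata[key] == value
            for key, value in required.items()
        )
    ]
```

For example, `filter_by_metadata(vector_store.similarity_search(query, k=10), category="technology")` keeps only the results tagged with that category. Filtering after retrieval can return fewer than `k` documents, so over-fetch accordingly.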

## Troubleshooting

Common issues and solutions:

1. **Connection Issues**: Verify your Snowflake credentials and network connectivity
2. **Permission Errors**: Ensure your role has necessary privileges on the database and schema
3. **Table Creation**: Make sure the warehouse is running and has sufficient resources
4. **Embedding Dimension Mismatch**: Ensure `embedding_dimension` equals the length of the vectors your embedding model returns (1536 in the examples above)
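
A dimension mismatch can be caught before the store is created by embedding a short probe string and comparing lengths. This is a sketch, assuming a hypothetical helper `check_embedding_dimension`; it relies only on the `embed_query` method of LangChain embeddings objects.

```python
def check_embedding_dimension(embedding_function, expected_dimension):
    """Embed a probe string and raise if the vector length is unexpected.

    Hypothetical helper for illustration only. With a hosted model such as
    OpenAIEmbeddings, this makes one API request.
    """
    vector = embedding_function.embed_query("dimension check")
    actual = len(vector)
    if actual != expected_dimension:
        raise ValueError(
            f"embedding model returns {actual} dimensions, "
            f"but embedding_dimension is {expected_dimension}"
        )
    return actual
```

Calling `check_embedding_dimension(embeddings, 1536)` before constructing `SnowflakeVectorStore` surfaces the problem as a clear Python error instead of a failed insert.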

For more information, refer to the [Snowflake documentation](https://docs.snowflake.com/) and [LangChain documentation](https://python.langchain.com/).