# Snowflake Vector Store

This notebook shows how to use the Snowflake Vector Store functionality within LangChain.

[Snowflake](https://www.snowflake.com/) is a cloud-based data warehousing platform with native support for vector data types and similarity search functions, so embeddings can be stored and queried alongside the rest of your data.

## Features

- 🏔️ **Native Snowflake Integration**: Uses Snowflake's built-in vector capabilities
- 🔍 **Semantic Search**: Powered by the `VECTOR_COSINE_SIMILARITY` function
- 📊 **Scalable**: Leverages Snowflake's cloud-native architecture
- 🔒 **Secure**: Enterprise-grade security and compliance
- 🚀 **High Performance**: Optimized for large-scale vector operations

## Setup

First, install the required packages:

In [None]:
# Install required packages
# %pip install langchain-snowflake-vectorstore snowflake-connector-python langchain-openai

## Credentials

You'll need to set up your Snowflake credentials. You can do this via environment variables:

In [None]:
import os

# Read Snowflake credentials from environment variables.
# The second argument is a placeholder used only when the variable is unset.
SNOWFLAKE_ACCOUNT = os.getenv("SNOWFLAKE_ACCOUNT", "your-account")
SNOWFLAKE_USER = os.getenv("SNOWFLAKE_USER", "your-username")
SNOWFLAKE_PASSWORD = os.getenv("SNOWFLAKE_PASSWORD", "your-password")
SNOWFLAKE_DATABASE = os.getenv("SNOWFLAKE_DATABASE", "your-database")
SNOWFLAKE_SCHEMA = os.getenv("SNOWFLAKE_SCHEMA", "your-schema")
SNOWFLAKE_WAREHOUSE = os.getenv("SNOWFLAKE_WAREHOUSE", "your-warehouse")
SNOWFLAKE_ROLE = os.getenv("SNOWFLAKE_ROLE", "your-role")
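
The placeholder defaults above keep the cell runnable, but they can hide a missing variable until the connection attempt fails. Here is a minimal sketch of a stricter loader that fails early instead. `load_snowflake_settings` and `REQUIRED_VARS` are hypothetical names used for illustration and are not part of `langchain-snowflake-vectorstore`.

```python
import os

# The seven variables read in the cell above.
REQUIRED_VARS = (
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_ROLE",
)


def load_snowflake_settings(environ=None):
    """Return the settings as a dict, or raise if any variable is unset or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Snowflake settings: " + ", ".join(missing))
    # Strip the prefix and lowercase the rest: SNOWFLAKE_USER -> "user".
    prefix = "SNOWFLAKE_"
    return {name[len(prefix):].lower(): env[name] for name in REQUIRED_VARS}
```

The returned keys (`account`, `user`, `password`, and so on) match the keyword names passed to `SnowflakeVectorStore` in the Initialization section, so the dict could be unpacked into that call.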

## Initialization

Create a Snowflake vector store instance with your configuration:

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_snowflake_vectorstore import SnowflakeVectorStore

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Create the vector store
vector_store = SnowflakeVectorStore(
    account=SNOWFLAKE_ACCOUNT,
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    warehouse=SNOWFLAKE_WAREHOUSE,
    role=SNOWFLAKE_ROLE,
    table_name="langchain_vector_store",
    embedding_function=embeddings,
    embedding_dimension=1536,  # Must match the embedding model's output size (1536 for the default OpenAI model)
)

print("Vector store initialized successfully!")
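
Because `embedding_dimension` has to agree with the vectors the embedding model produces, it can help to verify the two before writing any data. The following is a minimal sketch under that assumption. `check_embedding_dimension` is a hypothetical helper, and `fake_embed_query` is a stand-in so the snippet runs without network access.

```python
def check_embedding_dimension(embed_query, expected_dimension):
    """Embed a probe string and compare the vector length to the expected dimension."""
    vector = embed_query("dimension probe")
    actual = len(vector)
    if actual != expected_dimension:
        raise ValueError(
            f"Embedding model returns {actual}-dimensional vectors, "
            f"but the store expects {expected_dimension}."
        )
    return actual


# Stand-in embedder for illustration only; a real model would be used in practice.
def fake_embed_query(text):
    return [0.0] * 1536


print(check_embedding_dimension(fake_embed_query, 1536))
```

With the objects created above, the equivalent call would be `check_embedding_dimension(embeddings.embed_query, 1536)`, which issues one embedding request.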

## Manage vector store

### Add documents

Add documents to the vector store:

In [None]:
from langchain_core.documents import Document

# Sample documents
documents = [
    Document(
        page_content="Snowflake is a cloud-based data warehousing platform.",
        metadata={"source": "snowflake_info", "category": "technology"},
    ),
    Document(
        page_content="LangChain is a framework for developing applications powered by language models.",
        metadata={"source": "langchain_info", "category": "technology"},
    ),
    Document(
        page_content="Vector databases are specialized databases for storing and querying high-dimensional vectors.",
        metadata={"source": "vector_db_info", "category": "database"},
    ),
    Document(
        page_content="Machine learning models can generate embeddings that represent semantic meaning.",
        metadata={"source": "ml_info", "category": "machine_learning"},
    ),
]

# Add documents to the vector store
ids = vector_store.add_documents(documents)
print(f"Added {len(ids)} documents to the vector store")
print(f"Document IDs: {ids}")

### Add texts

You can also add texts directly:

In [None]:
# Add texts directly
texts = [
    "Artificial intelligence is transforming various industries.",
    "Natural language processing enables computers to understand human language.",
    "Deep learning models can process complex patterns in data.",
]

metadatas = [
    {"source": "ai_info", "category": "artificial_intelligence"},
    {"source": "nlp_info", "category": "natural_language_processing"},
    {"source": "dl_info", "category": "deep_learning"},
]

text_ids = vector_store.add_texts(texts, metadatas=metadatas)
print(f"Added {len(text_ids)} texts with IDs: {text_ids}")
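
For larger collections you may want to add texts in fixed-size batches so that each call embeds and writes a bounded number of rows. Here is a minimal sketch, assuming you manage batching yourself. `iter_batches` is a hypothetical helper, and whether the library batches internally is not covered here.

```python
def iter_batches(texts, metadatas, batch_size):
    """Yield (texts, metadatas) slices of at most batch_size items each."""
    if len(texts) != len(metadatas):
        raise ValueError("texts and metadatas must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        end = start + batch_size
        yield texts[start:end], metadatas[start:end]


# Intended use with the store created earlier (left commented so the
# snippet runs without a Snowflake connection):
# all_ids = []
# for text_batch, metadata_batch in iter_batches(texts, metadatas, batch_size=100):
#     all_ids.extend(vector_store.add_texts(text_batch, metadatas=metadata_batch))
```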

### Delete documents

Remove documents from the vector store:

In [None]:
# Delete specific documents by ID
if text_ids:
    # Delete the last added text as an example
    deleted = vector_store.delete([text_ids[-1]])
    print(f"Deleted ID {text_ids[-1]}; delete() returned: {deleted}")

## Query vector store

### Similarity search

Find documents similar to a query:

In [None]:
# Perform similarity search
query = "What is LangChain?"
results = vector_store.similarity_search(query, k=3)

print(f"Query: {query}")
print(f"Found {len(results)} similar documents:")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
    print(f"   Metadata: {doc.metadata}")
    print()

### Similarity search with scores

Get similarity scores along with the results:

In [None]:
# Similarity search with scores
query = "machine learning and AI"
results_with_scores = vector_store.similarity_search_with_score(query, k=3)

print(f"Query: {query}")
print("Results with similarity scores:")
for doc, score in results_with_scores:
    print(f"Score: {score:.4f}")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print()
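
To help interpret the scores, here is a small pure-Python illustration of cosine similarity and top-k ranking. It assumes the returned scores are cosine similarities, where higher means more similar, as the `VECTOR_COSINE_SIMILARITY` function named in the Features list suggests. It is not the library's implementation, since the real computation runs inside Snowflake. `cosine_similarity` and `top_k` are illustrative names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k):
    """Return the k highest-scoring (label, score) pairs, best first."""
    scored = [
        (label, cosine_similarity(query_vector, vector))
        for label, vector in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy two-dimensional "embeddings" keyed by a descriptive label.
stored = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

for label, score in top_k([1.0, 0.0], stored, k=2):
    print(f"{label}: {score:.4f}")
```

Vector length does not affect the score: `[2.0, 0.0]` scores 1.0 against the query `[1.0, 0.0]` because only direction matters.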

### Filtered search

Search with metadata filters:

In [None]:
# Search with metadata filter
query = "technology"
filter_dict = {"category": "technology"}
filtered_results = vector_store.similarity_search(query, k=5, filter=filter_dict)

print(f"Query: {query}")
print(f"Filter: {filter_dict}")
print(f"Found {len(filtered_results)} filtered results:")
for doc in filtered_results:
    print(f"- {doc.page_content}")
    print(f"  Category: {doc.metadata.get('category')}")
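
The filter above can be read as "keep only documents whose metadata contains these key-value pairs". The sketch below spells out that reading in plain Python. Exact equality on every key is an assumption, and the library's actual filter semantics may differ or support more operators. `matches_filter` is a hypothetical helper.

```python
def matches_filter(metadata, filter_dict):
    """True when every key in filter_dict is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in filter_dict.items()
    )


# Metadata taken from the sample documents added earlier.
sample_metadata = [
    {"source": "snowflake_info", "category": "technology"},
    {"source": "vector_db_info", "category": "database"},
]

kept = [m for m in sample_metadata if matches_filter(m, {"category": "technology"})]
print(kept)
```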

## Usage for retrieval-augmented generation

Use the vector store as a retriever for RAG applications:

In [None]:
# Create a retriever from the vector store
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# Test the retriever
query = "How does Snowflake work with vectors?"
retrieved_docs = retriever.invoke(query)

print(f"Query: {query}")
print(f"Retrieved {len(retrieved_docs)} documents:")
for doc in retrieved_docs:
    print(f"- {doc.page_content}")

### RAG Chain Example

Create a complete RAG chain using the Snowflake vector store:

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Create a RAG chain
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)


def format_docs(docs):
    """Format documents for RAG context."""
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Test the RAG chain
question = "What is LangChain and how does it relate to language models?"
answer = rag_chain.invoke(question)

print(f"Question: {question}")
print(f"Answer: {answer}")
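
To see what the model actually receives, the first two stages of the chain can be reproduced without any services: format the retrieved documents, then fill the template. This is a minimal sketch using a stand-in `Doc` class in place of LangChain's `Document`. `Doc` and `build_prompt` are illustrative names, and the real chain wraps the text in a chat message rather than a plain string.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document, for illustration only."""

    page_content: str
    metadata: dict = field(default_factory=dict)


# Same template text as in the chain above.
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def format_docs(docs):
    """Join document contents with blank lines, as in the chain above."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(question, docs):
    """Fill the template with the formatted context and the question."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


docs = [
    Doc("LangChain is a framework for developing applications powered by language models."),
    Doc("Snowflake is a cloud-based data warehousing platform."),
]
print(build_prompt("What is LangChain?", docs))
```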

## API reference

For detailed documentation of all features and configuration options, please refer to the API reference for `SnowflakeVectorStore`.

### Key Parameters

- **account**: Your Snowflake account identifier
- **user**: Snowflake username
- **password**: Snowflake password
- **database**: Target database name
- **schema**: Target schema name
- **warehouse**: Snowflake warehouse to use
- **role**: Snowflake role for permissions
- **table_name**: Name of the table to store vectors
- **embedding_function**: Function to generate embeddings
- **embedding_dimension**: Dimension of the embedding vectors

### Key Methods

- `add_documents(documents)`: Add Document objects to the store
- `add_texts(texts, metadatas)`: Add text strings with optional metadata
- `similarity_search(query, k)`: Find the `k` documents most similar to the query
- `similarity_search_with_score(query, k)`: Run the same search and return `(document, score)` pairs
- `delete(ids)`: Delete documents by their IDs
- `as_retriever()`: Convert to a LangChain retriever

For more information, visit the [langchain-snowflake-vectorstore documentation](https://pypi.org/project/langchain-snowflake-vectorstore/).