<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/DashvectorIndexDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Zvec Vector Store

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
%pip install llama-index-vector-stores-zvec

In [None]:
%pip install llama-index

In [None]:
import logging
import sys
import os

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

import openai

# os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.zvec import ZvecVectorStore

#### Download Data

In [None]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

#### Load documents, build the ZvecVectorStore and VectorStoreIndex

In [None]:
from llama_index.core import SimpleDirectoryReader
from IPython.display import Markdown, display

# load documents
documents = SimpleDirectoryReader("./data/paul_graham").load_data()

In [None]:
from llama_index.core import StorageContext, VectorStoreIndex

# embed_dim must match the output dimension of the embedding model in use
vector_store = ZvecVectorStore(
    path="zvec_demo.zvec", collection_name="zvec_demo", embed_dim=384
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

#### Query Index

In [None]:
vector_retriever = index.as_retriever()
# Retrieve the nodes most similar to the query
source_nodes = vector_retriever.retrieve("What did the author do growing up?")
# Print each retrieved node with its similarity score
for node in source_nodes:
    print("---------------------------------------------")
    print("Search Test")
    print("---------------------------------------------")
    print(f"Score: {node.score:.3f}")
    print(node.get_content())
    print("---------------------------------------------")

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")

In [None]:
display(Markdown(f"<b>{response}</b>"))

### Metadata filter example

You can narrow the search space by filtering on metadata. The following cells show this in practice with a small set of documents.
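
Conceptually, an exact-match filter keeps only the records whose metadata value for a given key equals the requested value. The plain-Python sketch below illustrates that idea on a few dictionaries. `matches_exact` and the sample records are hypothetical and exist only for illustration; this is not how Zvec evaluates filters internally.

```python
from typing import Any, Dict, List


def matches_exact(metadata: Dict[str, Any], required: Dict[str, Any]) -> bool:
    """Return True when every required key is present with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in required.items()
    )


# Illustrative records (hypothetical, not read from a vector store)
records: List[Dict[str, Any]] = [
    {"id": "ai_intro_001", "category": "Technology", "year": 2023},
    {"id": "ml_basics_002", "category": "Technology", "year": 2024},
    {"id": "big_data_003", "category": "Business", "year": 2023},
    {"id": "blockchain_004", "category": "Finance", "year": 2024},
]

# Keep only the records whose category is exactly "Technology"
kept_ids = [r["id"] for r in records if matches_exact(r, {"category": "Technology"})]
print(kept_ids)
```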

In [None]:
from llama_index.core.schema import Document
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.zvec import ZvecVectorStore

test_documents = [
    Document(
        text="Artificial intelligence is a branch of computer science that aims to create software or machines that exhibit human-like intelligence.",
        metadata={
            "title": "Introduction to Artificial Intelligence",
            "author": "Dr. Smith",
            "category": "Technology",
            "year": 2023,
        },
        id_="ai_intro_001",
    ),
    Document(
        text="Machine learning is a method of data analysis that automates analytical model building.",
        metadata={
            "title": "Understanding Machine Learning",
            "author": "Prof. Johnson",
            "category": "Technology",
            "year": 2024,
        },
        id_="ml_basics_002",
    ),
    Document(
        text="Big data refers to extremely large datasets that can be analyzed computationally to reveal patterns and trends.",
        metadata={
            "title": "Big Data Concepts",
            "author": "Analyst Wilson",
            "category": "Business",
            "year": 2023,
        },
        id_="big_data_003",
    ),
    Document(
        text="Blockchain is a system of recording information in a way that makes it difficult to change or hack.",
        metadata={
            "title": "Blockchain Technology Overview",
            "author": "Expert Taylor",
            "category": "Finance",
            "year": 2024,
        },
        id_="blockchain_004",
    ),
]

metadata = {
    "title": "str",
    "author": "str",
    "category": "str",
    "year": "int",
}

vector_store = ZvecVectorStore(
    path="zvec_filter_demo.zvec",
    collection_name="zvec_filter_demo",
    embed_dim=384,
    collection_metadata=metadata,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    test_documents, storage_context=storage_context
)

Define the metadata filters.

In [None]:
from llama_index.core.vector_stores import (
    ExactMatchFilter,
    MetadataFilters,
)

filters = MetadataFilters(
    filters=[ExactMatchFilter(key="category", value="Technology")]
)

Pass the filters when creating the retriever so that they are applied at query time.

In [None]:
retriever = index.as_retriever(filters=filters)
retriever.retrieve("What is computationally about?")

### Query Index with Hybrid Search

Hybrid search combines BM25 keyword scoring with dense vector search.  
The `alpha` parameter sets the weighting between the two: `alpha=0` uses BM25 only, and `alpha=1` uses vector search only.  
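
One common way to read this weighting is as a convex combination of the two scores: `alpha * vector_score + (1 - alpha) * bm25_score`. The sketch below shows only that arithmetic. `blend_scores` is a hypothetical helper, the input scores are made up and assumed to be normalized to the same range, and the fusion Zvec actually performs may differ.

```python
def blend_scores(vector_score: float, bm25_score: float, alpha: float) -> float:
    """Weighted average of a dense (vector) score and a sparse (BM25) score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * bm25_score


# Made-up scores for a single document, assumed to share the same scale
vector_score, bm25_score = 0.8, 0.4

# alpha=0.0 reproduces the BM25 score, alpha=1.0 reproduces the vector score
for alpha in (0.0, 0.7, 1.0):
    print(alpha, round(blend_scores(vector_score, bm25_score, alpha), 3))
```

With `alpha=0.7`, the vector score contributes 70% of the blended value and the BM25 score the remaining 30%.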

In [None]:
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.zvec import ZvecVectorStore

vector_store = ZvecVectorStore(
    path="zvec_hybrid_demo.zvec",
    collection_name="zvec_hybrid_demo",
    embed_dim=384,
    support_sparse_vector=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

In [None]:
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid", alpha=0.7
)
response = query_engine.query(
    "What did the author do growing up?",
)

In [None]:
display(Markdown(f"<b>{response}</b>"))