# TiDB Vector

> [TiDB](https://github.com/pingcap/tidb) is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics.

Recent versions of TiDB introduce support for vector search. This notebook provides a detailed guide to using TiDB vector search in LlamaIndex.

## Setting up the environment

In [None]:
%pip install llama-index
%pip install tidbvec

In [None]:
import textwrap
import openai

from llama_index import SimpleDirectoryReader, StorageContext
from llama_index.indices.vector_store import VectorStoreIndex
from llama_index.vector_stores.tidb_vector import TiDBVector

Configure the OpenAI API key and the TiDB connection settings that you will need.

In [None]:
# Here we use getpass so that secrets are not echoed into the notebook
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
tidb_connection_url = getpass.getpass(
    "TiDB connection URL (format - mysql+pymysql://root@127.0.0.1:4000/test): "
)
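
Because the connection URL is typed in by hand, it can help to sanity-check its shape before passing it to the vector store. The following is a minimal sketch using only the standard library; `describe_connection_url` is a hypothetical helper (not part of LlamaIndex or TiDB), and the sample URL is the one shown in the prompt above.

```python
from urllib.parse import urlsplit


def describe_connection_url(url: str) -> dict:
    """Split a SQLAlchemy-style URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(
            "expected a URL like mysql+pymysql://user@host:port/database"
        )
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Example URL taken from the prompt text; replace it with your own.
info = describe_connection_url("mysql+pymysql://root@127.0.0.1:4000/test")
print(info)
```

A check like this only validates the format; it does not confirm that the TiDB server is reachable or that the credentials are accepted.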

Prepare the data used in this showcase.

In [None]:
%pip install pymysql
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

In [None]:
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)
# Tag every document so the metadata filter examples below have a value to match
for document in documents:
    document.metadata = {"book": "paul_graham"}

Document ID: 0408448e-010e-422f-b380-9a0bf218a667


## Create TiDB Vector Store

The code snippet below creates a table in TiDB, optimized for vector search, whose name is the value of `COLLECTION_NAME` (here `paul_graham_test`). After this code runs successfully, you can view and access that table directly in your TiDB database.

In [None]:
COLLECTION_NAME = "paul_graham_test"
tidbvec = TiDBVector(
    connection_string=tidb_connection_url,
    collection_name=COLLECTION_NAME,
    pre_delete_collection=False,
)

Build an index backed by the TiDB vector store. The query engines in the following sections are created from this index.

In [None]:
storage_context = StorageContext.from_defaults(vector_store=tidbvec)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00,  5.46it/s]
Generating embeddings: 100%|██████████| 21/21 [00:02<00:00,  7.46it/s]


## Semantic similarity search

This section focuses on vector search basics and on refining results with metadata filters. Please note that TiDB Vector only supports the default `VectorStoreQueryMode`.
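
To make the idea of a similarity search concrete, here is a minimal sketch in plain Python that ranks stored texts by cosine similarity to a query vector. It is an illustration only: the embeddings are made-up 3-dimensional toy values, `top_k` is a hypothetical helper, and the choice of cosine similarity is an assumption, since this notebook does not state which distance metric TiDB applies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """rows is a list of (text, embedding); return the k most similar texts."""
    scored = sorted(
        rows, key=lambda row: cosine_similarity(query, row[1]), reverse=True
    )
    return [text for text, _ in scored[:k]]


# Toy embeddings, invented for illustration (real ones have many more dimensions).
rows = [
    ("essay about programming", [0.9, 0.1, 0.0]),
    ("essay about painting", [0.1, 0.9, 0.0]),
    ("essay about startups", [0.6, 0.2, 0.6]),
]
nearest = top_k([1.0, 0.0, 0.0], rows, k=2)
print(nearest)
```

In the notebook, the query engine embeds the question, retrieves the nearest chunks from the vector store, and passes them to the LLM to write the answer.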

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do?")
print(textwrap.fill(str(response), 100))

The author worked on various things, including writing, programming, and painting. They wrote short
stories and later started programming on microcomputers. They also studied philosophy in college but
switched to AI. The author published essays online and realized the potential of the web as a medium
for publishing. They wrote essays on different topics and eventually had a collection of essays
published as a book. Additionally, the author worked on spam filters, did some painting, and hosted
dinners for friends. They also bought a building in Cambridge to use as an office. The author met
someone named Jessica Livingston at a party and asked her out.


### Filter with metadata

Perform searches with metadata filters to retrieve a specific number of nearest-neighbor results that match the applied filters.

In [None]:
from llama_index.vector_stores.types import MetadataFilter, MetadataFilters

query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="book", value="paul_graham", operator="!="),
        ]
    ),
    similarity_top_k=2,
)
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))

Empty Response
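
The empty response is expected: every document was tagged with `{"book": "paul_graham"}` earlier, so a `!=` filter on that value excludes all of them. The sketch below reproduces that logic in plain Python. It is not how TiDB evaluates filters; `apply_filter` and the `nodes` list are hypothetical stand-ins used only to show why one operator matches nothing and the other matches everything.

```python
import operator

# Map the operator strings used in the filter to Python comparisons.
OPERATORS = {"==": operator.eq, "!=": operator.ne}


def apply_filter(nodes, key, value, op):
    """Keep nodes whose metadata[key] satisfies the comparison (illustrative)."""
    compare = OPERATORS[op]
    return [n for n in nodes if compare(n["metadata"].get(key), value)]


# Stand-ins for the stored chunks: each one carries the same metadata.
nodes = [
    {"text": f"chunk {i}", "metadata": {"book": "paul_graham"}}
    for i in range(3)
]

not_equal = apply_filter(nodes, "book", "paul_graham", "!=")
equal = apply_filter(nodes, "book", "paul_graham", "==")
print(len(not_equal), len(equal))  # prints: 0 3
```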


Query again, this time with the `==` operator so that the filter matches the stored metadata.

In [None]:
from llama_index.vector_stores.types import MetadataFilter, MetadataFilters

query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="book", value="paul_graham", operator="=="),
        ]
    ),
    similarity_top_k=2,
)
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))

The author learned how to program on the IBM 1401 in 9th grade using an early version of Fortran.
They also learned about microcomputers and started programming on a TRS-80. In college, the author
initially planned to study philosophy but found it boring and switched to AI. Additionally, the
author applied to art schools and ended up attending the Accademia di Belli Arti in Florence. While
there, they learned about painting and drawing, but also discovered that the faculty did not teach
much and the students did not have a strong desire to learn. The author also started painting still
lives during their time at the Accademia.


## Delete documents

In [None]:
tidbvec.delete(documents[0].doc_id)

Check whether the documents have been deleted.

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))

Empty Response
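
The essay was split into 21 chunks when the index was built, yet a single `delete` call with the document ID leaves nothing to retrieve. A plausible reading is that each stored chunk keeps a reference to its source document and deletion removes every chunk with that reference. The toy class below sketches this under that assumption; `InMemoryStore` is hypothetical and is not the `TiDBVector` implementation.

```python
class InMemoryStore:
    """Toy stand-in for a vector store, used only to illustrate deletion."""

    def __init__(self):
        self.rows = []  # each row: (node_id, ref_doc_id, text)

    def add(self, node_id, ref_doc_id, text):
        self.rows.append((node_id, ref_doc_id, text))

    def delete(self, ref_doc_id):
        # Drop every chunk that came from the given source document.
        self.rows = [row for row in self.rows if row[1] != ref_doc_id]


store = InMemoryStore()
for i in range(3):
    store.add(f"node-{i}", "doc-a", f"chunk {i} of document a")
store.add("node-3", "doc-b", "only chunk of document b")

store.delete("doc-a")
remaining = [node_id for node_id, _, _ in store.rows]
print(remaining)  # prints: ['node-3']
```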
