# Test: creating an index of local Rust docs stored in a vector DB

See: [Talk to your Text files in Vector Databases with GPT-4 and ChromaDB](https://medium.com/@rubentak/unleashing-the-power-of-intelligent-chatbots-with-gpt-4-and-vector-databases-a-step-by-step-8027e2ce9e78)

Import libraries

In [1]:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import ChatOpenAI
import os
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader

Load the docs

In [2]:
# Load the Markdown files from the directory and count the loaded documents
loader = DirectoryLoader('D:/src/pareidolia/book', glob="./*.md")
doc = loader.load()
len(doc)

105
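
The pattern passed as `glob` decides which files are loaded. The sketch below uses only `pathlib` on a throwaway directory to show the difference between a top-level pattern and a recursive one. It assumes the loader applies the pattern the way `pathlib` does; the file names are made up for illustration.

```python
import tempfile
from pathlib import Path

# Build a small, hypothetical directory tree to try the patterns on.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "intro.md").write_text("# Intro")
    (root / "notes.txt").write_text("not markdown")
    (root / "ch01").mkdir()
    (root / "ch01" / "ownership.md").write_text("# Ownership")

    # "*.md" only matches Markdown files directly inside the directory.
    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.md"))
    # "**/*.md" also descends into subdirectories.
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.md"))

print(top_level)
print(recursive)
```

If the book kept chapters in subdirectories, a top-level pattern like the one in this notebook would presumably miss them, so the document count is worth checking against the directory contents.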

Splitting the text into chunks

In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(doc)

# Count the number of chunks
len(texts)

1032
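
To make the two parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally: the real splitter first tries to break on separators such as paragraph and line breaks, so its chunks are usually shorter than `chunk_size` and rarely cut mid-word. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence that
    straddles a boundary still appears whole in at least one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# A 2500-character sample made of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(sample)
print(len(chunks))
print([len(c) for c in chunks])
print(chunks[0][-200:] == chunks[1][:200])
```

With these settings each window advances by 800 characters, which is why the number of chunks is larger than the text length divided by `chunk_size`.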

## Database creation with ChromaDB
Now let’s embed the text chunks and store the embeddings in a local Chroma vector database. Supplying a `persist_directory` stores the embeddings on disk, so they can be reused without embedding the documents again.

In [4]:
persist_directory = 'db'

# OpenAI embeddings (the API key is read from the OPENAI_API_KEY environment variable)
embedding = OpenAIEmbeddings()

vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embedding,
                                 persist_directory=persist_directory)
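
Conceptually, the cell above turns every chunk into a vector and stores the vector next to its text, so that a later query can be matched against the nearest vectors. The sketch below imitates that with the standard library only. Everything in it is a stand-in: `toy_embed` is a hypothetical bag-of-words replacement for `OpenAIEmbeddings`, `ToyVectorStore` is not Chroma, and cosine similarity is used for illustration (the metric Chroma uses depends on its configuration).

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Keeps (vector, text) pairs in memory and ranks them against a query."""

    def __init__(self):
        self.rows = []

    def add(self, texts):
        for text in texts:
            self.rows.append((toy_embed(text), text))

    def search(self, query, k=1):
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([
    "ownership rules decide when a value is dropped",
    "cargo builds and tests a rust package",
    "traits define shared behaviour across types",
])
best = store.search("how does cargo build a package", k=1)
print(best)
```

A real embedding model places texts with similar meaning close together even when they share no words, which word counts cannot do; the storage and nearest-neighbour lookup follow the same idea.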

Persist the database to disk

In [5]:
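# Write the collection to persist_directory
vectordb.persist()
# Drop the reference; the data is reloaded from disk when it is needed again
vectordb = None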


Use the notebook `notebook_rust_docs_read.ipynb` to query the local files stored in the vector database.
