<a href="https://colab.research.google.com/github/John-Rood/Philosophy/blob/main/Philosophy_ChatGPT_LangChain_and_Milvus_Vectordb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to install a vector database locally and then fill it with useful information to query with an LLM like ChatGPT.

We use LangChain for easy integration with LLMs in general, and Chroma, an open-source vector database. LangChain has already done the fundamental work you would need to do to start working with LLMs, so using it is a shortcut that saves you from writing a lot of code yourself.
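
To make the idea of a vector database concrete, here is a minimal sketch of what happens at query time: the stored texts and the query are both mapped to vectors, and the stored texts whose vectors lie closest to the query vector are returned. The `toy_embed` function and its vocabulary are hypothetical stand-ins (simple word counts), not how OpenAI embeddings or Chroma work internally; only the nearest-neighbor step carries over.

```python
import math

# Hypothetical fixed vocabulary; a real embedding model produces learned dense vectors instead.
VOCAB = ["virtue", "justice", "soul", "fable", "fox", "time"]

def toy_embed(text):
    """Map text to a vector of word counts over VOCAB (illustrative stand-in only)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest(query, documents, k=1):
    """Return the k documents whose vectors are most similar to the query vector."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

documents = [
    "the fox in the fable flatters the crow",
    "justice is the virtue of the soul",
    "the traveller moves through time",
]
print(nearest("what is justice and virtue", documents))
```

In the rest of the notebook, OpenAI embeddings play the role of `toy_embed` and Chroma handles storage and the similarity search.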

**Step 1:** Install deps

In [None]:
!pip install chromadb langchain openai tiktoken requests 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting chromadb
  Downloading chromadb-0.3.21-py3-none-any.whl (46 kB)
Collecting langchain
  Downloading langchain-0.0.146-py3-none-any.whl (600 kB)
Collecting openai
  Downloading openai-0.27.4-py3-none-any.whl (70 kB)
Collecting tiktoken
  Downloading tiktoken-0.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Collecting hnswlib>=0.7
  Downloading hnswlib-0.7.0.tar.gz (33 

**Step 2:** Import and Initialize

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Replace the placeholder with your own key; never commit a real API key to a shared notebook.
%env OPENAI_API_KEY=<your-openai-api-key>

env: OPENAI_API_KEY=<your-openai-api-key>


**Step 3:** Vectorize the Data with Embeddings & Store to the VDB

In [None]:
import re 
import os
import requests

GITHUB_API_URL = "https://api.github.com/repos/John-Rood/Philosophy/contents/books"

def download_file(file_url):
    response = requests.get(file_url)
    response.raise_for_status()
    return response.text

def process_book(file_url):
    text = download_file(file_url)

    # Extract metadata
    title = re.search(r"Title:\s*(.+)", text)
    author = re.search(r"Author:\s*(.+)", text)

    title = title.group(1).strip() if title else "Unknown Title"
    author = author.group(1).strip() if author else "Unknown Author"

    texts_and_chapters = split_text(text.strip())

    embeddings = OpenAIEmbeddings()
    total = len(texts_and_chapters)
    # One metadata dict per chunk, in the same order as the chunk texts.
    metadatas = [
        {
            "source": f"Text chunk {i} of {total}",
            "title": title,
            "author": author,
            "chapter": chapter,
        }
        for i, (_, chapter) in enumerate(texts_and_chapters)
    ]
    docsearch = Chroma.from_texts(
        [chunk for chunk, _ in texts_and_chapters],
        embeddings,
        metadatas=metadatas,
        persist_directory="db",
    )
    docsearch.persist()  # write the collection to the "db" directory
    docsearch = None


def split_text(text, min_threshold=1000):
    segments = []
    sentence_spans = list(re.finditer(r"(?<=[.!?])\s+", text))

    current_segment = []
    current_length = 0
    current_chapter = "Unknown Chapter"

    chapters = list(re.finditer(r"(CHAPTER|BOOK)\s*[\w\d]*\s*[:.\n]\s*(.+)", text, re.IGNORECASE))
    chapter_index = -1  # index of the last heading already passed; -1 means none yet

    sentence_start = 0
    for sentence_span in sentence_spans:
        sentence = text[sentence_start:sentence_span.end()]

        # Advance past every heading that starts at or before this sentence,
        # so the first heading is not skipped.
        while (chapter_index < len(chapters) - 1 and
                sentence_start >= chapters[chapter_index + 1].start()):
            chapter_index += 1
            # group(0) is the whole matched heading; group(1) is only the keyword "CHAPTER" or "BOOK".
            current_chapter = " ".join(chapters[chapter_index].group(0).split())

        current_segment.append(sentence)
        current_length += len(sentence)

        if current_length >= min_threshold:
            segments.append((" ".join(current_segment), current_chapter))
            current_segment = []
            current_length = 0

        sentence_start = sentence_span.end()

    last_sentence = text[sentence_start:]
    if last_sentence:
        current_segment.append(last_sentence)
        
    if current_segment:
        segments.append((" ".join(current_segment), current_chapter))

    return segments


def get_book_files():
    response = requests.get(GITHUB_API_URL)
    response.raise_for_status()
    files = response.json()
    return [file["download_url"] for file in files if file["name"].endswith(".txt")]

book_urls = get_book_files()

for index, file_url in enumerate(book_urls):
    print(f"Processing {file_url}")
    process_book(file_url)


Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/A%20Treatise%20of%20Human%20Nature%20-%20David%20Hume.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Aesop%E2%80%99s%20Fables%20-%20Aesop.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/An%20Enquiry%20Concerning%20Human%20Understanding%20-%20David%20Hume.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Apology%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Crito%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Ethics%20-%20Aristotle.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Euthyphro%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Gorgias%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Hidden%20Treasures%20-%20Harry%20A.%20Lewis.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Laws%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Meditations%20-%20Marcus%20Aurelius.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Nature%20-%20Ralph%20Waldo%20Emerson.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Phaedo%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Phaedrus%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Poetics%20-%20Aristotle.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Primitive%20culture%2C%20vol.%20I%20(of%202)%20-%20Edward%20B.%20Tylor.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/Symposium%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Categories%20-%20Aristotle.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Consolation%20of%20Philosophy%20-%20Boethius.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Dhammapada%20-%20Unknown.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Enchiridion%20-%20Epictetus.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Five%20Great%20Philosophies%20of%20Life%20-%20William%20de%20Witt%20Hyde.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Prince%20-%20Nicolo%20Machiavelli.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Republic%20-%20Plato.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Time%20Machine%20-%20H.%20G.%20(Herbert%20George)%20Wells.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20World%20As%20Will%20And%20Idea%20(Vol.%201%20of%203)%20-%20Arthur%20Schopenhauer.txt
Processing https://raw.githubusercontent.com/John-Rood/Philosophy/main/books/The%20Zen%20Experience%20-%20Thomas%20Hoover.txt
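
The chunking idea from Step 3 is easier to see in isolation on a small input. The following is a simplified sketch of the same approach (accumulate whole sentences until a minimum character count is reached), leaving out chapter tracking. The name `chunk_sentences`, the `min_chars` parameter, and the sample text are illustrative assumptions, not part of the notebook's code.

```python
import re

def chunk_sentences(text, min_chars=40):
    """Group consecutive sentences into chunks of at least min_chars characters."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, length = [], [], 0
    for sentence in sentences:
        current.append(sentence)
        length += len(sentence)
        if length >= min_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
    if current:  # keep trailing sentences that never reached the threshold
        chunks.append(" ".join(current))
    return chunks

sample = "Know thyself. Nothing in excess. The unexamined life is not worth living. Be kind."
for i, chunk in enumerate(chunk_sentences(sample)):
    print(i, repr(chunk))
```

Because chunks only close at sentence boundaries, a chunk can run well past the threshold when a long sentence arrives late, and the final chunk can be much shorter than the threshold.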

**Step 4:** Query the VDB with an LLM
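
Before the full cell, it may help to see what the "stuff" chain type amounts to: the retrieved chunks are concatenated and substituted into the `{context}` slot of the prompt, next to the question. The sketch below does this with plain string formatting. The function `build_stuff_prompt`, the shortened template, and the sample chunks are illustrative assumptions, and LangChain's own implementation may differ in details such as the separator.

```python
TEMPLATE = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""

def build_stuff_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunks into one context block and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)

# Placeholder chunks standing in for whatever the retriever returns.
retrieved = ["First retrieved passage.", "Second retrieved passage."]
prompt = build_stuff_prompt(retrieved, "What is the meaning of life?")
print(prompt)
```

Since every retrieved chunk goes into a single prompt, the number and size of chunks must fit within the model's context window.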

In [None]:
from textwrap import wrap
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

prompt_template = """Use the following pieces of context to answer the question at the end. 
Answer as if you were the modern voice of the context. Make sure to not just repeat what is referenced. Don't preface, don't mention the context, and at the end, don't give any warnings.

{context}

Question: {question}

(answer the question directly. Most importantly, make your answer interesting, engaging, and helpful) 
Answer:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

embeddings = OpenAIEmbeddings()

docsearch = Chroma(persist_directory="db", embedding_function=embeddings)

chain_type_kwargs = {"prompt": PROMPT}

qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(), chain_type="stuff", retriever=docsearch.as_retriever(), chain_type_kwargs=chain_type_kwargs, return_source_documents=True)

llm = OpenAI(model_name="gpt-3.5-turbo")

user_input = input("Your question: ")

iq = f"be direct and short. Question: {user_input} \n The intent of this question is to: "
intent_expansion = llm(iq)
kq = f"be general, direct, and short. Don't give an answer, only topics this question falls under to this question: {user_input}"
knowledge_expansion = llm(kq)

final_input = f'question_intent: {intent_expansion} | {knowledge_expansion}\n\
Question: {user_input}'
print(final_input + "\n\n")

result = qa({"query": final_input}, return_only_outputs=True)

#### code to wrap the output text to be more readable
long_str = result["result"]
lines = wrap(long_str, 80) 
####

print("\n")
print("Answer: " +"\n".join(lines))
print("\n")
for document in result['source_documents']:
    doc = document.metadata
    title = doc['title']
    author = doc['author']
    print(f"Author: {author}", f"\nTitle: {title}", f"\nChapter: {doc['chapter']}")
    doc_long_str = document.page_content
    doc_lines = wrap(doc_long_str, 80) 
    print("Document:" + "\n".join(doc_lines))
    print('---') 




Your question: What is the meaning of life?
question_intent: seek an explanation or understanding of the purpose or significance of human existence. | Philosophy, spirituality, religion, existentialism, psychology.
Question: What is the meaning of life?




Answer: The meaning of life is a question that has been pondered by philosophers,
spiritualists, and psychologists alike for centuries. While there may not be a
definitive answer, many believe that the purpose of human existence is to find
one's own unique path and fulfill it to the best of their ability. This can
involve a variety of pursuits, such as personal growth, helping others, or
seeking spiritual enlightenment. Ultimately, the meaning of life is subjective
and varies from individual to individual. What's important is that we each find
our own purpose and strive to live a fulfilling life that brings us joy and
meaning.


Author: Ralph Waldo Emerson 
Title: Nature 
Chapter: CHAPTER
Document:Every man's condition is a solution