# NoteBot prototyping

Let's have a conversation about your (Markdown) notes.

## Setup

Notes:
* Use FAISS instead of Chroma because of a [sqlite3 compatibility issue](https://docs.trychroma.com/troubleshooting#sqlite).
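
Chroma's troubleshooting page asks for SQLite 3.35 or newer. To see whether the SQLite library bundled with your Python build would hit that issue, a small standard-library check like the one below should do. The helper name `sqlite_ok_for_chroma` is hypothetical, and the `(3, 35, 0)` cutoff is our reading of that page:

```python
import sqlite3
from typing import Optional, Tuple

# Assumed minimum version, based on Chroma's troubleshooting docs.
MIN_SQLITE_VERSION = (3, 35, 0)


def sqlite_ok_for_chroma(version: Optional[Tuple[int, int, int]] = None) -> bool:
    """Return True if the given (or bundled) SQLite version meets the assumed minimum."""
    if version is None:
        # Version of the SQLite C library that Python's sqlite3 module is linked against.
        version = sqlite3.sqlite_version_info
    return tuple(version) >= MIN_SQLITE_VERSION


print(sqlite3.sqlite_version, sqlite_ok_for_chroma())
```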

In [None]:
from langchain.document_loaders import GitLoader
from langchain.text_splitter import MarkdownTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

from notebot.constants import NOTE_REPO_URL, NOTES_PATH, DB_PATH
from dotenv import load_dotenv
import textwrap
from typing import Union

In [None]:
load_dotenv()  # load environment variables (e.g. OPENAI_API_KEY) from a .env file
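
`load_dotenv()` (from python-dotenv) reads `KEY=VALUE` lines from a `.env` file into the process environment, which is how `OpenAIEmbeddings` and `ChatOpenAI` can pick up `OPENAI_API_KEY` without the key appearing in the notebook. By default it does not overwrite variables that are already set. The function below is a simplified, hypothetical stand-in to show the idea; it skips features of the real library such as variable expansion:

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str, override: bool = False) -> Dict[str, str]:
    """Hypothetical, simplified stand-in for python-dotenv's load_dotenv().

    Reads KEY=VALUE lines, skips blanks and comments, strips matching quotes,
    and by default leaves variables that are already set untouched.
    Returns the variables it actually set.
    """
    loaded: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, e.g. "sk-..." or 'sk-...'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```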

## Load, split and ingest notes into vector store

Todo:
* [x] Persist database
* [ ] Improve splitting of documents

In [None]:
def filter_notes(file_path: str) -> bool:
    """Keep Markdown notes, but skip README files."""
    return file_path.endswith(".md") and not file_path.endswith("README.md")
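
`filter_notes` compares path suffixes, so any file whose name merely ends in `README.md` (say `NOT_README.md`) is skipped too, and `.MD` files are never picked up. If that matters for your notes, a stricter variant could compare the file name itself. The name `filter_notes_strict` and its case-insensitive matching are hypothetical choices, shown next to the original for comparison:

```python
from pathlib import PurePath


def filter_notes(file_path: str) -> bool:
    # Original filter: keep Markdown files, drop anything ending in "README.md".
    return file_path.endswith(".md") and not file_path.endswith("README.md")


def filter_notes_strict(file_path: str) -> bool:
    # Hypothetical variant: look only at the file name, case-insensitively.
    name = PurePath(file_path).name.lower()
    return name.endswith(".md") and name != "readme.md"


for path in ["ml/fastai.md", "README.md", "ml/NOT_README.md", "ml/Whisper.MD"]:
    print(f"{path:20} original={filter_notes(path)!s:5} strict={filter_notes_strict(path)}")
```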

In [None]:
loader = GitLoader(repo_path=str(NOTES_PATH), clone_url=NOTE_REPO_URL, file_filter=filter_notes)
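
Roughly, `GitLoader` clones `NOTE_REPO_URL` into `NOTES_PATH` if it is not there yet, walks the repository's files, keeps those for which `file_filter` returns `True`, and turns each one into a `Document` with the file path in its metadata. It also takes a `branch` argument, which as far as we know defaults to `"main"`, so a notes repo on `master` would need `branch="master"`. The function below is a hypothetical, local-only sketch of the walk-and-filter step (no cloning, no git), returning plain dicts instead of `Document` objects:

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional


def load_markdown_notes(
    repo_path: str, file_filter: Optional[Callable[[str], bool]] = None
) -> List[Dict[str, str]]:
    """Hypothetical, local-only stand-in for GitLoader's file walk.

    Walks repo_path, applies file_filter to each file path, and returns one
    dict per kept file with its text and a few metadata fields.
    """
    root = Path(repo_path)
    docs: List[Dict[str, str]] = []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the .git folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        if file_filter is not None and not file_filter(str(path)):
            continue
        docs.append(
            {
                "page_content": path.read_text(encoding="utf-8"),
                "source": str(path.relative_to(root)),
                "file_name": path.name,
            }
        )
    return docs
```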

In [None]:
raw_docs = loader.load()

In [None]:
splitter = MarkdownTextSplitter()

In [None]:
docs = splitter.split_documents(raw_docs)
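
As far as we can tell, `MarkdownTextSplitter` is a recursive character splitter with Markdown-flavoured separators (headings, code fences, horizontal rules, blank lines, then lines and words), applied until chunks fit its `chunk_size`. A chunk can therefore lose the heading it belongs to. For the "Improve splitting of documents" todo, one option is to split on headings and keep the heading path with each chunk; LangChain ships a `MarkdownHeaderTextSplitter` along those lines. A dependency-free sketch of the idea follows; the function name `split_by_headings` and its output format are hypothetical:

```python
import re
from typing import Dict, List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(text: str) -> List[Dict[str, object]]:
    """Split Markdown into sections, one per heading, keeping the heading path."""
    sections: List[Dict[str, object]] = []
    path: List[str] = []  # current heading path, e.g. ["fastai", "Embeddings"]
    lines: List[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the text collected so far under the current heading path.
        body = "\n".join(lines).strip()
        if body:
            sections.append({"headings": list(path), "content": body})
        lines.clear()

    for line in text.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level, then push this one.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            lines.append(line)
    flush()
    return sections
```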

In [None]:
raw_docs[-1].page_content

In [None]:
docs[-1].page_content

In [None]:
db = FAISS.from_documents(documents=docs, embedding=OpenAIEmbeddings())
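
`FAISS.from_documents` embeds every chunk with `OpenAIEmbeddings` and stores the vectors in a FAISS index; as far as we know, LangChain's wrapper defaults to an exact (flat) index with Euclidean (L2) distance, and `db.as_retriever()` returns the top 4 chunks per query by default. Conceptually, a query is the brute-force search below. The three-dimensional vectors and chunk labels are made-up toy values, not real OpenAI embeddings, which are much longer:

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: Sequence[float], vectors: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Exact k-nearest-neighbour search: (index, distance) pairs, closest first."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy "embeddings" for three note chunks (made-up values).
chunks = ["fastai: embeddings", "LangChain: Whisper loader", "Cooking: pasta"]
vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.9]]
query = [0.8, 0.2, 0.0]  # pretend embedding of "What is an embedding?"

for i, dist in nearest(query, vectors):
    print(f"{dist:.3f}  {chunks[i]}")
```

FAISS does the same job with optimized (and optionally approximate) indexes, which matters once the notes grow to many thousands of chunks.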

In [None]:
db.save_local(folder_path=str(DB_PATH))
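
Because the index is persisted, later sessions can skip the (paid) embedding step and reload it with `FAISS.load_local(str(DB_PATH), OpenAIEmbeddings())`. As far as we know, `save_local` writes two files named after its `index_name` argument (default `"index"`): `index.faiss` and `index.pkl`. A small, hypothetical helper to decide whether the ingestion cells need to run at all, assuming that file layout:

```python
from pathlib import Path


def faiss_index_exists(folder_path: str, index_name: str = "index") -> bool:
    """Hypothetical helper: True if both files written by save_local are present.

    Assumes save_local's layout of <index_name>.faiss + <index_name>.pkl.
    """
    folder = Path(folder_path)
    return (folder / f"{index_name}.faiss").is_file() and (folder / f"{index_name}.pkl").is_file()
```

In the notebook this could gate the load/split/embed cells, e.g. only calling `FAISS.from_documents` when `faiss_index_exists(str(DB_PATH))` is `False`.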

## Configure Chain

Todo:
* [ ] Use an open-source model hosted on the Hugging Face Hub
* [ ] Tweak the prompt
* [ ] [Include sources in the response](https://python.langchain.com/docs/use_cases/question_answering/how_to/chat_vector_db#conversationalretrievalchain-with-question-answering-with-sources)
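
For the sources item, a simpler route than the linked guide is, as far as we know, passing `return_source_documents=True` to `ConversationalRetrievalChain.from_llm`, which adds a `source_documents` entry to the response. With `ConversationBufferMemory` the memory then also needs `output_key="answer"`, so it knows which of the two outputs to store. Each returned document carries the loader's metadata, so listing sources could look like the sketch below. `Doc` is a hypothetical stand-in for LangChain's `Document`, and reading the path from a `source` metadata key is an assumption about what `GitLoader` records:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved LangChain Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_sources(docs: List[Doc]) -> str:
    """List each distinct 'source' once, in retrieval order."""
    seen: List[str] = []
    for doc in docs:
        source = doc.metadata.get("source", "<unknown>")
        if source not in seen:
            seen.append(source)
    return "\n".join(f"- {s}" for s in seen)


retrieved = [
    Doc("An embedding is ...", {"source": "ml/fastai.md"}),
    Doc("Embeddings in fastai ...", {"source": "ml/fastai.md"}),
    Doc("Whisper transcribes ...", {"source": "tools/langchain.md"}),
]
print("Sources:\n" + format_sources(retrieved))
```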

In [None]:
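llm = ChatOpenAI(temperature=0)  # temperature 0: keep answers as deterministic as possible
# Keep the whole conversation so follow-up questions can refer back to earlier turns
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=db.as_retriever(), memory=memory)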
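
Roughly, `ConversationalRetrievalChain` works in three steps: if there is chat history, it asks the LLM to condense the history plus the new question into a standalone question; it retrieves chunks for that question; then it asks the LLM to answer from those chunks. `ConversationBufferMemory` supplies the history under the `chat_history` key. The sketch below mimics that control flow with plain Python callables so it runs without an API key. `ToyConversationalRetrieval`, `condense`, `retrieve` and `answer` are hypothetical placeholders for the LLM and retriever calls, not LangChain code:

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (question, answer) pairs


class ToyConversationalRetrieval:
    """Hypothetical sketch of a conversational retrieval loop (not LangChain code)."""

    def __init__(
        self,
        condense: Callable[[History, str], str],
        retrieve: Callable[[str], List[str]],
        answer: Callable[[str, List[str]], str],
    ) -> None:
        self.condense = condense
        self.retrieve = retrieve
        self.answer = answer
        self.chat_history: History = []  # plays the role of ConversationBufferMemory

    def __call__(self, inputs: Dict[str, str]) -> Dict[str, object]:
        question = inputs["question"]
        history_before = list(self.chat_history)
        # Only rephrase when there is history that follow-ups like "it" could refer to.
        standalone = self.condense(history_before, question) if history_before else question
        context = self.retrieve(standalone)
        result = self.answer(standalone, context)
        self.chat_history.append((question, result))
        return {"question": question, "chat_history": history_before, "answer": result}


toy_notes = {
    "embedding": "fastai: an embedding is a learned lookup table.",
    "whisper": "Whisper: a speech-to-text model.",
}
toy_chain = ToyConversationalRetrieval(
    condense=lambda history, q: f"{q} (about: {history[-1][0]})",
    retrieve=lambda q: [text for key, text in toy_notes.items() if key in q.lower()],
    answer=lambda q, ctx: ctx[0] if ctx else "I don't know.",
)
print(toy_chain({"question": "What is an embedding?"})["answer"])
print(toy_chain({"question": "Who uses it?"})["answer"])  # resolved via the condensed question
```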

## Chat with NoteBot

In [None]:
class NoteBot:
    """The AI assistant chatting about your notes."""

    def __init__(self, chain) -> None:
        self.chain = chain

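    def chat(
        self, question: str, print_result: bool = True, return_response: bool = False
    ) -> Union[None, dict]:
        """Ask a question; print the wrapped answer and/or return the full response dict."""
        response = self.chain({"question": question})
        if print_result:
            # Wrap long answers at 88 characters for readability in the notebook
            print(textwrap.fill(response["answer"], 88))
        if return_response:
            return response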

In [None]:
notebot = NoteBot(chain=chain)

In [None]:
response = notebot.chat("What is an embedding according to fastai?", return_response=True)

In [None]:
response.keys()

In [None]:
notebot.chat("What can I do with the Whisper model from OpenAI when using LangChain?")