# NoteBot prototyping

Let's have a conversation about your Markdown notes.

## Setup

Notes:
* Use FAISS instead of Chroma as the vector store because Chroma has a [sqlite3 compatibility issue](https://docs.trychroma.com/troubleshooting#sqlite)

In [None]:
import textwrap
from typing import Union

import gradio as gr
from dotenv import load_dotenv
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import GitLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.text_splitter import MarkdownTextSplitter
from langchain.vectorstores import FAISS

from notebot.constants import DB_PATH, NOTE_REPO_URL, NOTES_PATH

In [None]:
load_dotenv()

## Load, split and ingest notes into vector store

Todo:
* [x] Persist database
* [ ] Improve splitting of documents
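
One possible direction for the open splitting item is to cut each note at its headings, so that a chunk never spans two sections and carries its heading trail as metadata. The following is a minimal pure-Python sketch of that idea, not the notebook's method: `Chunk` and `split_by_headings` are hypothetical names, and only ATX headings (`#` to `######`) outside fenced code blocks are treated as boundaries.

```python
import re
from dataclasses import dataclass, field

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    """A piece of a note plus the heading trail it was found under."""

    text: str
    headings: list = field(default_factory=list)


def split_by_headings(markdown: str) -> list:
    """Split markdown into chunks, starting a new chunk at every heading."""
    chunks = []
    trail = []  # stack of (level, title) for the current section path
    lines = []  # body lines collected for the current chunk

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            chunks.append(Chunk(text=body, headings=[title for _, title in trail]))
        lines.clear()

    in_fence = False
    for line in markdown.splitlines():
        # Lines starting with ``` toggle fence state; '#' inside a fence is code.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Leave any section at the same or a deeper level, then enter the new one.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, match.group(2).strip()))
        else:
            lines.append(line)
    flush()
    return chunks


sample = "# A\nintro\n## B\nbody b\n```\n# not a heading\n```\n## C\nbody c\n"
for chunk in split_by_headings(sample):
    print(chunk.headings, repr(chunk.text))
```

The heading trail could be stored in each document's metadata before embedding. LangChain also ships a `MarkdownHeaderTextSplitter`, which may be worth evaluating before maintaining a custom splitter.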

In [None]:
def filter_notes(file_path: str) -> bool:
    """Keep Markdown files, but skip README files."""
    return file_path.endswith(".md") and not file_path.endswith("README.md")

In [None]:
loader = GitLoader(repo_path=str(NOTES_PATH), clone_url=NOTE_REPO_URL, file_filter=filter_notes)

In [None]:
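# Documents are only needed to build the index, so check for the index
# (DB_PATH) rather than the notes checkout (NOTES_PATH). This keeps `docs`
# defined whenever the next cell has to create the vector store.
if not DB_PATH.exists():
    raw_docs = loader.load()
    splitter = MarkdownTextSplitter()
    docs = splitter.split_documents(raw_docs)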

In [None]:
if not DB_PATH.exists():
    db = FAISS.from_documents(documents=docs, embedding=OpenAIEmbeddings())
    db.save_local(folder_path=str(DB_PATH))
else:
    db = FAISS.load_local(folder_path=str(DB_PATH), embeddings=OpenAIEmbeddings())

## Configure Chain

Todo:
* Use open source model hosted on HuggingFace hub
* Customize prompt to _mainly_ return information from notes
* [Include sources in response](https://python.langchain.com/docs/use_cases/question_answering/how_to/chat_vector_db#conversationalretrievalchain-with-question-answering-with-sources)
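
The last two items could be approached roughly as sketched here. Everything in this block is an assumption made for illustration: the prompt wording is made up, `format_sources` is a hypothetical helper, `FakeDocument` only stands in for a LangChain document, and the example answer and file path are invented. The helper assumes the chain response contains a `source_documents` list whose items expose a `metadata["source"]` entry.

```python
from dataclasses import dataclass, field

# Hypothetical prompt text. The {context} and {question} placeholders are
# assumed to match what the question-answering step of the chain expects.
NOTES_PROMPT = (
    "Answer the question using only the notes below. "
    "If the notes do not contain the answer, say so.\n\n"
    "Notes:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document, used only to exercise the helper."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_sources(response: dict) -> str:
    """Return the answer followed by a de-duplicated list of source paths."""
    sources = []
    for doc in response.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        if source not in sources:
            sources.append(source)
    if not sources:
        return response["answer"]
    listing = "\n".join(f"- {source}" for source in sources)
    return f"{response['answer']}\n\nSources:\n{listing}"


# Invented response, shaped like the dictionary the helper assumes.
response = {
    "answer": "An embedding maps items to vectors.",
    "source_documents": [
        FakeDocument("...", {"source": "fastai/embeddings.md"}),
        FakeDocument("...", {"source": "fastai/embeddings.md"}),
    ],
}
print(format_sources(response))
```

To try this with the real chain, the template would likely be wrapped in a `PromptTemplate` and passed through `combine_docs_chain_kwargs`, with `return_source_documents=True` set on the chain and `output_key="answer"` set on the memory. These parameter names should be checked against the installed LangChain version.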

In [None]:
llm = ChatOpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=db.as_retriever(), memory=memory)

## Chat with NoteBot

In [None]:
class NoteBot:
    """The AI assistant chatting about your notes."""

    def __init__(self, chain) -> None:
        self.chain = chain

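    def chat(
        self, question: str, print_result: bool = True, return_response: bool = False
    ) -> Union[None, dict]:
        """Ask the chain a question; optionally print the answer and return the response."""
        response = self.chain({"question": question})
        if print_result:
            # Wrap the answer at 88 characters for readable notebook output.
            print(textwrap.fill(response["answer"], 88))
        if return_response:
            return response
        return None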

In [None]:
notebot = NoteBot(chain=chain)

In [None]:
answer = notebot.chat("What is an embedding according to fastai?", return_response=True)

In [None]:
answer.keys()

In [None]:
notebot.chat("What can I do with the Whisper model from OpenAI when using LangChain?")

## Create user interface

Todo:
* Use API to query chatbot via CLI
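
As a starting point for this item, the command-line side can be sketched as a small loop around any callable that maps a question to an answer. This is only a sketch under assumptions: `run_cli` and its `ask` parameter are hypothetical names, and the HTTP API that would sit between the CLI and the chatbot is left out entirely. In this notebook, `ask` could wrap `notebot.chat(..., print_result=False, return_response=True)["answer"]`.

```python
import sys
from typing import Callable, Iterable


def run_cli(ask: Callable[[str], str], lines: Iterable[str] = sys.stdin) -> int:
    """Read questions line by line and print the answers.

    Blank lines are skipped; 'exit' or 'quit' (or end of input) stops the loop.
    Returns the number of questions answered.
    """
    answered = 0
    for raw in lines:
        question = raw.strip()
        if not question:
            continue
        if question.lower() in {"exit", "quit"}:
            break
        print(ask(question))
        answered += 1
    return answered


# Demo with a placeholder backend and scripted input instead of stdin.
answered = run_cli(
    lambda question: f"You asked: {question}",
    lines=["What is FAISS?", "", "exit", "never reached"],
)
```

Passing the input as an iterable keeps the loop easy to exercise without a terminal; leaving `lines` at its default reads from standard input.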

In [None]:
def chat(message: str, history: list) -> str:
    """Gradio callback; `history` is unused because the chain keeps its own memory."""
    response = notebot.chat(message, print_result=False, return_response=True)

    return response["answer"]


gr.ChatInterface(
    fn=chat,
    title="NoteBot",
    description="### Let's have a chat about your notes",
    examples=["List the notes I can ask you about"],
).launch()