### This notebook shows an example of RAG using LangChain, the OpenAI API, and ChromaDB

For this example there are two test files in `docs/test`. They show how to print and how to define a function in SimpleLang, a made-up programming language used here to test the RAG pipeline.

* To run this in Codespaces, I had to install these system packages:
    ```bash
    sudo apt-get install -y libgl1-mesa-dev libglib2.0-0
    ```

* Then install pip packages:

    ```bash
    pip install -U langchain openai chromadb langchainhub python-dotenv pysqlite3-binary "unstructured[all-docs]" tiktoken
    ```

* Then prepend this to the file `~/.python/current/lib/python3.10/site-packages/chromadb/__init__.py` (adjust the path to match your Python installation):
    ```python
    __import__('pysqlite3')
    import sys
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
    ```
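An alternative that avoids editing an installed package is to run the same swap in the first notebook cell, before `chromadb` is imported for the first time. The following is a minimal sketch, assuming `pysqlite3-binary` is installed as in the `pip` step above. The `try`/`except` fallback is an addition for illustration, so the cell still runs where that package is missing:

```python
import sys

try:
    # Load the bundled SQLite build and register it under the standard
    # module name, so every later `import sqlite3` picks it up.
    __import__("pysqlite3")
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
except ImportError:
    # pysqlite3-binary is not installed: keep the standard-library sqlite3.
    pass

import sqlite3

# Show which SQLite version chromadb will see after the swap.
print(sqlite3.sqlite_version)
```

The swap only helps if it runs before anything imports `chromadb`, so it belongs at the very top of the notebook.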

In [13]:
import os

from dotenv import load_dotenv

# Load environment variables (such as the API key) from .env in the current working directory
cwd = os.getcwd()
env_path = os.path.join(cwd, '.env')
load_dotenv(dotenv_path=env_path)

True
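The `True` above only says that `load_dotenv` found and read a `.env` file, not that the file defines the variables the notebook needs. A small check can fail early with a clear message instead. This is a sketch: `require_env` is a hypothetical helper, and `OPENAI_API_KEY` is assumed to be the variable name used in `.env` (it is the name the OpenAI client reads by default).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it in the shell")
    return value


# Usage in the notebook (commented out so the sketch runs without a key):
# require_env("OPENAI_API_KEY")
```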

In [None]:
from langchain import hub
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

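# Directory containing the SimpleLang test documents
directory = '/workspaces/BALSA/docs/test'

# Load every file found in the directory as a document
loader = DirectoryLoader(directory)
docs = loader.load()
print(f"Documents loaded from path: {len(docs)}")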

In [15]:
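# Split the documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed the chunks and store them in a Chroma collection persisted under ./chroma_db
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever()

# Pull a ready-made RAG prompt from the LangChain Hub; temperature 0 gives more repeatable answers
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

def format_docs(docs):
    """Join the retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)

# Retrieve context for the question, fill the prompt, call the model, and return plain text
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)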
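To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); it is a plain fixed-width sliding window, and `sliding_chunks` is a hypothetical name used only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows where neighbours share `chunk_overlap` characters."""
    if chunk_overlap < 0 or chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

With the notebook's settings (1000 and 200), each chunk repeats up to the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.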

In [16]:
rag_chain.invoke("How to print hello world using SimpleLang?")

'To print "Hello, World!" using SimpleLang, you can use the p() function and pass "Hello, World!" as an argument. The code would be: p("Hello, World!").'
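The call above sends the question through four stages: retrieve matching chunks, join them into a context string, fill the prompt, and ask the model. The sketch below mirrors that data flow with stand-ins so it runs without an API key. Everything in it is hypothetical: `CHUNKS`, `toy_retrieve`, `join_chunks`, `build_prompt`, `fake_llm`, and `run_rag` are illustrative names, the retriever matches on shared words rather than embeddings, and the prompt text is not the actual `rlm/rag-prompt` template.

```python
from typing import Callable, List

# Hypothetical stand-in for the contents of the vector store.
CHUNKS = [
    "Notes on printing values.",
    "Notes on defining functions.",
]


def toy_retrieve(question: str) -> List[str]:
    """Toy retriever: keep chunks that share at least one word with the question."""
    words = set(question.lower().replace("?", "").split())
    return [c for c in CHUNKS if words & set(c.lower().rstrip(".").split())]


def join_chunks(chunks: List[str]) -> str:
    """Join retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(chunks)


def build_prompt(context: str, question: str) -> str:
    """Fill a simple illustrative prompt template."""
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def fake_llm(prompt_text: str) -> str:
    """Stand-in for the chat model: reports the prompt size instead of answering."""
    return f"(stub answer from a {len(prompt_text)}-character prompt)"


def run_rag(
    question: str,
    retrieve_fn: Callable[[str], List[str]] = toy_retrieve,
    generate_fn: Callable[[str], str] = fake_llm,
) -> str:
    """Retrieve, format, prompt, generate: the same order as the chain in the notebook."""
    context = join_chunks(retrieve_fn(question))
    prompt_text = build_prompt(context, question)
    return generate_fn(prompt_text)


print(run_rag("How to print values?"))
```

Passing `generate_fn=lambda p: p` returns the filled prompt itself, which is a quick way to inspect what context the model would have received.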

In [18]:
rag_chain.invoke("How to write a simple function that will calculate and print the number to the power of 2 using SimpleLang?. show example.")

"To write a simple function in SimpleLang that calculates and prints the number to the power of 2, you can define a function with a parameter for the number and use the ** operator to calculate the power. Here's an example:\n\nf powerOfTwo(num) { result = num ** 2 p(result) }\n\nThis function takes a parameter num, calculates the square of num using the ** operator, and then prints the result using p(result)."