### This notebook shows an example of RAG (retrieval-augmented generation) using LangChain, the OpenAI API, and ChromaDB

For this example there are two test files, `func.txt` and `print.md` in `/workspaces/BALSA/data/rags`, with examples of how to print and how to define a function in SimpleLang, a made-up programming language used to test the RAG pipeline.

* To run this in Codespaces, I had to install these system packages:
    ```bash
    sudo apt-get update
    sudo apt-get install -y libgl1-mesa-dev libglib2.0-0
    ```

* Then install the pip packages:

    ```bash
    pip install -U langchain openai chromadb langchainhub python-dotenv pysqlite3-binary "unstructured[all-docs]" tiktoken
    ```

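* Then prepend the following to the top of `/workspaces/BALSA/.venv/lib/python3.10/site-packages/chromadb/__init__.py`, so that Chroma uses the newer SQLite bundled with `pysqlite3-binary` instead of the system one (Chroma needs SQLite 3.35 or newer, which the system Python may lack):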

    ```python
    __import__('pysqlite3')
    import sys
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
    ```
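Edits inside `site-packages` are lost whenever chromadb is reinstalled or upgraded. A possible alternative, sketched here on the assumption that the swap only has to happen before chromadb is first imported in the kernel, is to run the same lines in the very first notebook cell. The `try`/`except` fallback is an addition of this sketch, so the cell still runs where `pysqlite3-binary` is not installed.

```python
import sys

# Assumption: this cell runs before the first `import chromadb` in this kernel.
try:
    # pysqlite3-binary ships a newer SQLite build than many system Pythons
    __import__("pysqlite3")
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
except ImportError:
    # pysqlite3-binary is not installed: keep the standard-library sqlite3
    pass

import sqlite3

print("SQLite version in use:", sqlite3.sqlite_version)
```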

In [1]:
from dotenv import load_dotenv
import os 
# Get the current working directory and load .env
cwd = os.getcwd()
env_path = os.path.join(cwd, '.env')
load_dotenv(dotenv_path=env_path)

True
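`ChatOpenAI` and `OpenAIEmbeddings`, created in a later cell, read the key from the `OPENAI_API_KEY` environment variable, so a missing or misspelled `.env` entry only surfaces there. A small guard right after `load_dotenv(...)` can fail earlier with a clearer message; `require_env` is a hypothetical helper, not part of python-dotenv or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return env var `name`, or raise with a readable message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it in the shell")
    return value
```

In the notebook it would be called as `require_env("OPENAI_API_KEY")` directly after loading `.env`.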

In [2]:
from langchain import hub
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# directory path
directory = '/workspaces/BALSA/data/rags'

loader = DirectoryLoader(directory)
docs = loader.load()
print("docs",docs)
print(f"files in path: {len(docs)} ")

docs [Document(page_content='Examples in SimpleLang:\n\nWriting a Function.\n\nTo write a simple function that adds two numbers:\n\nf addNumbers(a, b) { result = a + b p(result) }\n\nIn this example, f addNumbers(a, b) { ... } defines a new function named addNumbers with two parameters a and b. Inside the function, result is calculated as the sum of a and b, and then it is printed using p(result). The curly braces {} enclose the function body.', metadata={'source': '/workspaces/BALSA/data/rags/func.txt'}), Document(page_content='Examples in SimpleLang:\n\nPrinting "Hello, World!"\n\np("Hello, World!")\n\nThis line will output "Hello, World!" to the console. The p() function is a built-in function in SimpleLang for printing.', metadata={'source': '/workspaces/BALSA/data/rags/print.md'})]
files in path: 2 
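`DirectoryLoader` produced one `Document` per file, with the file text in `page_content` and the path under `metadata['source']`. The sketch below reproduces that shape with the standard library only. `SimpleDoc` and `load_directory` are hypothetical names, and unlike the real loader (which by default parses files through `unstructured`, hence the `unstructured[all-docs]` install) it simply reads each file as UTF-8 text.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_directory(directory: str) -> list[SimpleDoc]:
    """Hypothetical loader: read every non-hidden file under `directory` as one document."""
    docs = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and not path.name.startswith("."):
            text = path.read_text(encoding="utf-8")
            docs.append(SimpleDoc(page_content=text, metadata={"source": str(path)}))
    return docs
```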


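The next cell splits the documents with `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)`: it aims for chunks of at most 1,000 characters, and neighbouring chunks may share up to 200. Both test files are shorter than 1,000 characters, so each stays a single chunk, which is why Chroma later reports only 2 elements in the index. The real splitter first tries to cut on separators (paragraphs, then lines, then words); the hypothetical `split_with_overlap` below is a simpler fixed-window splitter, shown only to make the size/overlap arithmetic concrete: each window starts `chunk_size - chunk_overlap` characters after the previous one.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical fixed-window splitter (not LangChain's recursive algorithm)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For a 2,500-character text, windows start at 0, 800 and 1,600, giving three chunks of 1,000, 1,000 and 900 characters; a file shorter than 1,000 characters comes back as a single chunk.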
In [3]:
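# Split documents into chunks of up to 1000 characters, overlapping by up to 200
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed the chunks with OpenAI and store them in a Chroma collection persisted to ./chroma_db
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings(), persist_directory="./chroma_db")
retriever = vectorstore.as_retriever()

# Pull a ready-made RAG prompt from LangChain Hub; temperature=0 minimizes randomness
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

def format_docs(docs):
    # Join the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

# question -> retrieve and format context -> fill prompt -> LLM -> plain string
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)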
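Each `rag_chain.invoke(question)` call runs four steps: the dict at the head of the chain gives the same question to both branches (the retriever, whose documents `format_docs` joins into one `context` string, and `RunnablePassthrough`, which forwards it unchanged as `question`); the prompt template is filled with both; the chat model answers; and `StrOutputParser` returns the reply as a plain string. The sketch below mirrors that data flow in plain Python. `fake_retriever`, `fake_llm` and the `PROMPT` template are hypothetical stand-ins: `PROMPT` is not the `rlm/rag-prompt` text, and no API is called.

```python
from typing import Callable

# Hypothetical template; the notebook pulls its real prompt from LangChain Hub.
PROMPT = "Use the context to answer the question.\nContext:\n{context}\nQuestion: {question}\nAnswer:"


def format_docs(docs: list[str]) -> str:
    # Same joining rule as the notebook's format_docs, applied to plain strings
    return "\n\n".join(docs)


def run_rag(question: str,
            retriever: Callable[[str], list[str]],
            llm: Callable[[str], str]) -> str:
    # Step 1: both branches of the input dict receive the same question
    inputs = {"context": format_docs(retriever(question)), "question": question}
    # Step 2: fill the prompt template
    prompt_text = PROMPT.format(**inputs)
    # Steps 3-4: call the model and return its reply as a string
    return llm(prompt_text)


def fake_retriever(question: str) -> list[str]:
    # Hypothetical: always returns the print example from print.md
    return ['p("Hello, World!")']


def fake_llm(prompt_text: str) -> str:
    # Hypothetical: echoes the last context line instead of calling OpenAI
    context = prompt_text.split("Context:\n", 1)[1].split("\nQuestion:", 1)[0]
    return context.splitlines()[-1]
```

With these stand-ins, `run_rag("How to print hello world using SimpleLang?", fake_retriever, fake_llm)` returns `'p("Hello, World!")'`; swapping in a real retriever and model gives the shape of the LangChain chain above.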

In [4]:
rag_chain.invoke("How to print hello world using SimpleLang?")

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


'To print "Hello, World!" using SimpleLang, you can use the p() function and pass "Hello, World!" as an argument. The code would be: p("Hello, World!").'
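The log line before the answer comes from Chroma: the retriever asked for 4 results, but the index only holds 2 chunks, so Chroma lowered `n_results` to 2. It is harmless here; passing `search_kwargs={"k": 2}` to `vectorstore.as_retriever(...)` would request exactly what exists. The clamping, and top-k retrieval in general, can be sketched in plain Python. `cosine` and `top_k` are hypothetical helpers, the toy 2-dimensional vectors stand in for OpenAI embeddings, and cosine similarity is an assumption made for simplicity (it is not necessarily Chroma's default distance metric).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Hypothetical retriever: the k most similar texts, with k clamped to the index size."""
    k = min(k, len(index))  # same idea as Chroma lowering n_results from 4 to 2
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```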

In [5]:
rag_chain.invoke("How to write a simple function that will calculate and print the number to the power of 2 using SimpleLang?. show example.")

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2


"To write a simple function in SimpleLang that calculates and prints the number to the power of 2, you can define a function with a parameter for the number and use the ** operator to calculate the power. Then, use the p() function to print the result. Here's an example:\n\nf powerOfTwo(num) {\n  result = num ** 2\n  p(result)\n}"