# Leveraging RAG for Prebuilt Code Access

In this Jupyter notebook, we explore how to use Retrieval-Augmented Generation (RAG) to search a directory of prebuilt code and retrieve the files relevant to a question. We then show how the retrieved code can be passed to a Large Language Model (LLM), so that we can ask questions about the code directly. The goal is to streamline development by reusing prebuilt resources through interactive querying, a practical application of RAG to code generation and knowledge extraction.

In [1]:
# Import required modules
# Standard library imports
import warnings
from pprint import pprint
import re

# Third-party library imports
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import AIMessage, HumanMessage

# Suppress warnings
warnings.filterwarnings("ignore")

In [2]:
# Insert API Key
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()
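
If the key may already be exported in your shell, a small variation is to prompt only when it is missing. The sketch below is illustrative: `ensure_env_secret` is a hypothetical helper name, not part of any library, and the `prompt_fn` and `env` parameters exist only so the function can be exercised without an interactive prompt.

```python
import getpass
import os


def ensure_env_secret(name, prompt_fn=getpass.getpass, env=os.environ):
    """Set env[name] from a prompt only if it is missing or empty.

    `ensure_env_secret` is a hypothetical helper written for this notebook.
    """
    if not env.get(name):
        # Ask interactively; getpass hides the typed value.
        env[name] = prompt_fn(f"{name}: ")
    return env[name]
```

In the notebook this would be called as `ensure_env_secret("OPENAI_API_KEY")` in place of the unconditional `getpass.getpass()` assignment.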

In [3]:
# Load the JavaScript source files from the local repository
loader = GenericLoader.from_filesystem(
    r"C:\Users\lberm\OneDrive\Desktop\GitHub_Repository\educational_code_gen\stable_properties",
    glob="*",
    suffixes=[".js"],
    parser=LanguageParser(language=Language.JS)
)
docs = loader.load()
print("Length of docs is " , len(docs))

# View Metadata
for document in docs:
    pprint(document.metadata)
    
# text_splitter = RecursiveCharacterTextSplitter.from_language(language=Language.JS, chunk_size=4000, chunk_overlap=200)
# texts = text_splitter.split_documents(docs)
# print(len(texts))


Length of docs is  3
{'content_type': 'simplified_code',
 'language': <Language.JS: 'js'>,
 'source': 'C:\\Users\\lberm\\OneDrive\\Desktop\\GitHub_Repository\\educational_code_gen\\stable_properties\\fluid_and_mechanical_properties.js'}
{'content_type': 'functions_classes',
 'language': <Language.JS: 'js'>,
 'source': 'C:\\Users\\lberm\\OneDrive\\Desktop\\GitHub_Repository\\educational_code_gen\\stable_properties\\UnitConverter.js'}
{'content_type': 'simplified_code',
 'language': <Language.JS: 'js'>,
 'source': 'C:\\Users\\lberm\\OneDrive\\Desktop\\GitHub_Repository\\educational_code_gen\\stable_properties\\UnitConverter.js'}


In [9]:
# text_splitter = RecursiveCharacterTextSplitter.from_language(language=Language.JS, chunk_size=2000, chunk_overlap=200)
# texts = text_splitter.split_documents(docs)
# print(len(texts))

11


In [4]:
# Print out content from the code
print("\n\n--8<--\n\n".join([document.page_content for document in docs]))

const math = require('mathjs');
/**
 * Object containing properties of various fluids.
 * Each fluid is represented as a key with an object containing its properties:
 * - sg: Specific gravity
 * - mu: Dynamic viscosity (Pa.s)
 * - Cp: Specific heat capacity (kJ/kg.K)
 * - Tfreeze: Freezing temperature (°C)
 * - Tboil: Boiling temperature (°C)
 */
const fluidProperties = {
    "water": {
        "sg": 1,
        "mu": 0.001,
        "Cp": 1,
        "Tfreeze": 0,
        "Tboil": 100
    },
    "gasoline": {
        "sg": 0.72,
        "mu": 0.00029,
        "Cp": 2.22,
        "Tfreeze": -50,
        "Tboil": 150
    },
    "diesel": {
        "sg": 0.8,
        "mu": 0.0022,
        "Cp": 2.05,
        "Tfreeze": -60,
        "Tboil": 300
    },
    "benzene": {
        "sg": 0.88,
        "mu": 0.0006,
        "Cp": 1.19,
        "Tfreeze": 5.5,
        "Tboil": 80
    },
    "ethanol": {
        "sg": 0.79,
        "mu": 0.0012,
        "Cp": 2.4,
        "Tfreeze": -114,
        "

## Index Store
We need to index our data so that we can search over it. The most common approach is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform a "similarity" search to identify the stored splits whose embeddings are most similar to the query embedding. The simplest similarity measure is cosine similarity: we measure the cosine of the angle between each pair of embeddings (which are high-dimensional vectors).

(Adapted from the LangChain documentation.)

In [6]:
vectorstore = Chroma.from_documents(documents=docs, embedding=OpenAIEmbeddings())
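
To make the similarity search concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and the names in `toy_index` are made up for illustration; real embeddings from `OpenAIEmbeddings` have far more dimensions, and the vector store performs this comparison internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" standing in for stored document splits.
toy_index = {
    "fluid_properties": [0.9, 0.1, 0.0],
    "unit_converter": [0.1, 0.9, 0.2],
}
query_vec = [0.8, 0.2, 0.1]  # hypothetical embedding of a search query

# Rank stored entries from most to least similar to the query.
ranked = sorted(
    toy_index,
    key=lambda name: cosine_similarity(query_vec, toy_index[name]),
    reverse=True,
)
print(ranked)
```

The entry whose vector points in nearly the same direction as the query ranks first, regardless of vector length.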

## Retrieval and Generation: Retrieve
Now we want to create a simple application that takes in a question and searches the vector store for the documents most relevant to it.

In [7]:
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 8})
retrieved_docs = retriever.invoke("Do I have a thermodynamics property code?")

Number of requested results 20 is greater than number of elements in index 3, updating n_results = 3
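
The retriever above uses `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against redundancy with documents already selected. The following is a minimal sketch of that idea in plain Python, not LangChain's own implementation; `mmr_select`, the toy vectors, and the default `lambda_mult=0.5` are assumptions made for illustration. The warning in the output most likely comes from the MMR candidate pool being larger than the three documents in the index, although that is an inference from the message rather than something the notebook confirms.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical 2-D embeddings: documents 0 and 1 are near-duplicates.
doc_vecs = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]
query = [1.0, 0.3]

plain_top2 = sorted(
    range(len(doc_vecs)), key=lambda i: cosine(query, doc_vecs[i]), reverse=True
)[:2]
print("plain similarity:", plain_top2)
print("MMR:", mmr_select(query, doc_vecs, k=2))
```

With these toy vectors, plain similarity ranking returns the two near-duplicates (documents 1 and 0), whereas MMR returns documents 1 and 2. Asking for more results than the index holds, as the notebook does with `k=8` over three documents, simply returns every document in MMR order.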


Now we can ask any question about the directory and see what we retrieve.

In [8]:
print(retrieved_docs)
print(retrieved_docs[0].page_content)

[Document(page_content='const math = require(\'mathjs\');\n/**\n * Object containing properties of various fluids.\n * Each fluid is represented as a key with an object containing its properties:\n * - sg: Specific gravity\n * - mu: Dynamic viscosity (Pa.s)\n * - Cp: Specific heat capacity (kJ/kg.K)\n * - Tfreeze: Freezing temperature (°C)\n * - Tboil: Boiling temperature (°C)\n */\nconst fluidProperties = {\n    "water": {\n        "sg": 1,\n        "mu": 0.001,\n        "Cp": 1,\n        "Tfreeze": 0,\n        "Tboil": 100\n    },\n    "gasoline": {\n        "sg": 0.72,\n        "mu": 0.00029,\n        "Cp": 2.22,\n        "Tfreeze": -50,\n        "Tboil": 150\n    },\n    "diesel": {\n        "sg": 0.8,\n        "mu": 0.0022,\n        "Cp": 2.05,\n        "Tfreeze": -60,\n        "Tboil": 300\n    },\n    "benzene": {\n        "sg": 0.88,\n        "mu": 0.0006,\n        "Cp": 1.19,\n        "Tfreeze": 5.5,\n        "Tboil": 80\n    },\n    "ethanol": {\n        "sg": 0.79,\n        

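For this question, the first document returned is the fluid properties file, the closest match in the repository to a thermodynamics property code. Note that the index holds only three documents, so every query returns all of them and only the ranking changes. We can retry this process by asking a different question.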

In [9]:
retrieved_docs = retriever.invoke("Do I have a code that has Fluid Properties")
print(retrieved_docs)
print(retrieved_docs[0].page_content)

Number of requested results 20 is greater than number of elements in index 3, updating n_results = 3


[Document(page_content='const math = require(\'mathjs\');\n/**\n * Object containing properties of various fluids.\n * Each fluid is represented as a key with an object containing its properties:\n * - sg: Specific gravity\n * - mu: Dynamic viscosity (Pa.s)\n * - Cp: Specific heat capacity (kJ/kg.K)\n * - Tfreeze: Freezing temperature (°C)\n * - Tboil: Boiling temperature (°C)\n */\nconst fluidProperties = {\n    "water": {\n        "sg": 1,\n        "mu": 0.001,\n        "Cp": 1,\n        "Tfreeze": 0,\n        "Tboil": 100\n    },\n    "gasoline": {\n        "sg": 0.72,\n        "mu": 0.00029,\n        "Cp": 2.22,\n        "Tfreeze": -50,\n        "Tboil": 150\n    },\n    "diesel": {\n        "sg": 0.8,\n        "mu": 0.0022,\n        "Cp": 2.05,\n        "Tfreeze": -60,\n        "Tboil": 300\n    },\n    "benzene": {\n        "sg": 0.88,\n        "mu": 0.0006,\n        "Cp": 1.19,\n        "Tfreeze": 5.5,\n        "Tboil": 80\n    },\n    "ethanol": {\n        "sg": 0.79,\n        

## Retrieval and Generation: Generate
We can now put it all together into a chain that takes in a question, retrieves the relevant documents, constructs a prompt, passes it to a model, and parses the output. The following is a simple Q&A over our repository.

In [10]:
llm = ChatOpenAI(model_name="gpt-3.5-turbo-0125", temperature=0)
prompt = hub.pull("rlm/rag-prompt")  # Reference prompt from the hub; the chain below uses custom_rag_prompt instead

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


template = """ Access the code database to find information relevant to answering the question provided. 
If the answer is unknown, please state so without speculating. 
{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain =  (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)
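
The chain above can be read as four plain function calls in sequence. The sketch below mirrors that data flow without LangChain or an API key: `stub_retriever` and `stub_llm` are hypothetical stand-ins for `retriever` and `llm`, documents are plain strings rather than `Document` objects, and `TEMPLATE` is a shortened version of the notebook's prompt.

```python
def format_docs(docs):
    """Join retrieved snippets into one context string (plain strings here)."""
    return "\n\n".join(docs)


TEMPLATE = "{context}\n\nQuestion: {question}\n\nHelpful Answer:"


def stub_retriever(question):
    # Hypothetical stand-in for retriever.invoke: always returns fixed snippets.
    return ["class UnitConverter { /* ... */ }", "const fluidProperties = { /* ... */ };"]


def stub_llm(prompt):
    # Hypothetical stand-in for the chat model: answers only from the prompt text.
    if "UnitConverter" in prompt:
        return "Yes: found UnitConverter in the context."
    return "I don't know."


def run_chain(question, retriever, llm):
    context = format_docs(retriever(question))                    # retriever | format_docs
    prompt = TEMPLATE.format(context=context, question=question)  # custom_rag_prompt
    return str(llm(prompt))                                       # llm | StrOutputParser()


answer = run_chain("Do I have unit conversion code?", stub_retriever, stub_llm)
print(answer)
```

The dictionary at the head of the real chain does the same job as the first two lines of `run_chain`: the question is sent to the retriever to build `context`, while `RunnablePassthrough()` forwards it unchanged as `question`.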


In [11]:
questions = [
    "I want to incorporate different units and their conversion into a javascript module do I have any code that I can use",
    "I want to use Fluid Dynamic Properties, do I have any code?",
    "Give me a summary of all the main modules in the database",
    "Do I have a module for thermodynamic properties such as superheated vapor if I do how can I use the module correctly to get the values needed such as pressure in a given temperature"
]

for question in questions:
    print("Question: ", question, "\n")
    print("Response: ", rag_chain.invoke(question))
    print("\n--------------------------\n")

Number of requested results 20 is greater than number of elements in index 3, updating n_results = 3


Question:  I want to incorporate different units and their conversion into a javascript module do I have any code that I can use 



Number of requested results 20 is greater than number of elements in index 3, updating n_results = 3


Response:  Yes, you can use the `UnitConverter` class provided in the code database to incorporate different units and their conversion into a JavaScript module. The `UnitConverter` class supports conversion between various units in categories such as length, weight, speed, time, pressure, force, and angles. You can utilize the `convert` method of the `UnitConverter` class to convert values from one unit to another within the supported categories. Additionally, the class also includes a method `generateRandomValueForUnit` to generate random values for specified units. 

You can instantiate the `UnitConverter` class and use its methods to handle unit conversions and generate random values within specified ranges. The class is designed to support a wide range of units across different measurement categories.

--------------------------

Question:  I want to use Fluid Dynamic Properties, do I have any code? 



Number of requested results 20 is greater than number of elements in index 3, updating n_results = 3


Response:  Yes, you have code available to retrieve the properties of various fluids, including specific gravity, dynamic viscosity, specific heat capacity, freezing temperature, and boiling temperature. The `getFluidProperties` function allows you to access these properties for a specified fluid and adjust units if necessary.

--------------------------

Question:  Give me a summary of all the main modules in the database 



Number of requested results 20 is greater than number of elements in index 3, updating n_results = 3


Response:  1. fluidProperties: Contains properties of various fluids such as specific gravity, dynamic viscosity, specific heat capacity, freezing temperature, and boiling temperature.
2. materialProperties: Contains properties of various materials including specific gravity, elastic modulus range, ultimate tensile strength range, linear expansion coefficient, Poisson's ratio, and specific heat capacity.
3. getFluidProperties: Function to retrieve properties of a specified fluid and adjust units if necessary.
4. getMaterialProperties: Function to retrieve properties of a specified material, including randomly selected elastic modulus and ultimate tensile strength within provided ranges, and adjust units if necessary.
5. UnitConverter Class: Provides functionality for converting units across various measurement systems including length, weight, speed, time, pressure, force, and angles. It also supports generating random values within specified or custom ranges for these units.

--------