# RAG with AIConfig
This notebook demonstrates how to use a vector database (Chroma) with AIConfig for Retrieval-Augmented Generation (RAG). We create a Chroma collection that holds the curricula for several courses and use approximate nearest neighbor (ANN) search to find the curriculum most relevant to a student's question. We then use AIConfig to define a prompt named get_courses, which takes the student's question and the retrieved curriculum as parameters. Running this prompt identifies the courses in the curriculum that address the student's question. Read more about [AIConfig for prompt and model management](https://github.com/lastmile-ai/aiconfig).
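To make the retrieval step concrete, here is a minimal sketch of nearest-neighbor search over toy vectors. It performs an exact brute-force search with squared Euclidean distance; an ANN index approximates this result without scanning every vector. The three-dimensional vectors and the `squared_l2` and `nearest` helpers are made up for illustration. Real embeddings come from an embedding model and have hundreds of dimensions.

```python
# Toy illustration of nearest-neighbor retrieval (hypothetical vectors, not real embeddings).

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(query, vectors, n_results=1):
    """Return the ids of the n_results vectors closest to the query (exact search)."""
    ranked = sorted(vectors, key=lambda doc_id: squared_l2(query, vectors[doc_id]))
    return ranked[:n_results]

toy_vectors = {
    "id1": [0.9, 0.1, 0.0],  # stands in for the math curriculum
    "id2": [0.1, 0.9, 0.0],  # stands in for the physics curriculum
    "id3": [0.0, 0.2, 0.9],  # stands in for the chemistry curriculum
}
toy_query = [0.8, 0.2, 0.1]  # stands in for the embedded student question

print(nearest(toy_query, toy_vectors))
```

A smaller distance means a closer match, which is also how the `distances` field in a Chroma query result should be read.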




In [None]:
!pip install python-aiconfig
!pip install chromadb

In [2]:
from google.colab import userdata

import openai

# Read the OpenAI API key from Colab's secrets (stored under the name 'openai_key')
openai.api_key = userdata.get('openai_key')

In [3]:
# import ChromaDB and set up the Chroma client
import chromadb
chroma_client = chromadb.Client()

# Create a collection of curricula
collection = chroma_client.create_collection(name="curriculum")

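# Add documents for a math curriculum, a physics curriculum, and a chemistry curriculum.
# The short subject name is the document that gets embedded; the full curriculum text is stored as metadata.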
collection.add(
    documents=["math", "physics", "chemistry"],
    metadatas=[
        {
            "source": """Subject: Mathematics

            Course 1: Number Sense and Operations
            - Place value
            - Number patterns and relationships
            - Addition, subtraction, multiplication, and division
            - Problem-solving techniques

            Course 2: Algebra
            - Variables and expressions
            - Equations and inequalities
            - Patterns and functions"""
        },
        {
            "source": """Subject: Physics

            Course 1: Introduction to Physics
            - Scientific method and inquiry skills
            - Measurement and scientific notation
            - Physical quantities and units

            Course 2: Motion and Forces
            - Position, distance, and displacement
            - Speed, velocity, and acceleration
            - Newton's laws of motion"""
        },
        {
            "source": """Subject: Chemistry

            Course 1: Introduction to Chemistry
            - Matter and its properties
            - Atoms, elements, and compounds
            - Chemical symbols and formulas"""
        }
    ],
    ids=["id1", "id2", "id3"]
)

# Query the vector database for the closest context to the student question
context = collection.query(
    query_texts=["What is the sum of the first 103 numbers"],
    n_results=1
)

print(context)


{'ids': [['id1']], 'distances': [[1.5048770904541016]], 'metadatas': [[{'source': 'Subject: Mathematics\n\n            Course 1: Number Sense and Operations\n            - Place value\n            - Number patterns and relationships\n            - Addition, subtraction, multiplication, and division\n            - Problem-solving techniques\n\n            Course 2: Algebra\n            - Variables and expressions\n            - Equations and inequalities\n            - Patterns and functions'}]], 'embeddings': None, 'documents': [['math']], 'uris': None, 'data': None}
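The query result nests each field as one list per query text, then one entry per match. Below is a minimal sketch of pulling out just the curriculum text from that structure. The helper name `first_source` is hypothetical, and `sample_result` is a shortened stand-in for the output above.

```python
def first_source(result):
    """Return the 'source' metadata of the top match for the first query text."""
    top_metadata = result["metadatas"][0][0]  # [query index][match index]
    lines = top_metadata["source"].splitlines()
    # Strip the indentation left over from the triple-quoted strings.
    return "\n".join(line.strip() for line in lines)

# Shortened stand-in for the query result printed above.
sample_result = {
    "ids": [["id1"]],
    "metadatas": [[{"source": "Subject: Mathematics\n\n            Course 2: Algebra\n            - Patterns and functions"}]],
    "documents": [["math"]],
}

print(first_source(sample_result))
```

Passing a string like this as the curriculum parameter, rather than `str(context.get('metadatas'))`, would give the model plain text instead of the Python repr of a nested list.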


In [6]:
from aiconfig import AIConfigRuntime, Prompt, CallbackManager

# Create get_courses prompt
get_courses = Prompt(
    name="get_courses",
    input="""
        Student's question: {{student_question}},
        The middle school curriculum: {{curriculum}}
    """,
    metadata={
        "model": {
            "name": "gpt-4",
            "settings": {
                "model": "gpt-4",
                "system_prompt": """
                    You are a very good middle school teacher.

                    Output: You are to identify which course in the curriculum covers the student's question
                    and concisely respond with where in the curriculum the question will be answered.
                    If the question is not covered in the curriculum, you are to answer with 'Unfortunately, we don't cover that in our curriculum'
                """
            }
        },
    }
)

# Create new AIConfig with get_courses prompt
rag_aiconfig = AIConfigRuntime.create()
rag_aiconfig.callback_manager = CallbackManager([])
rag_aiconfig.add_prompt("get_courses", get_courses)

# Build the prompt parameters: the context retrieved from Chroma and the student's question
params = {
    "curriculum": str(context.get('metadatas')),
    "student_question": "What is the sum of the first 103 numbers?"
}

# Execute the prompt (top-level await works in notebook cells), then read its text output
completion = await rag_aiconfig.run("get_courses", params)
course_result = rag_aiconfig.get_output_text("get_courses")

print(course_result)

This question will be covered in Course 1: Number Sense and Operations, particularly in the topic of 'Problem-solving techniques'.
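The `{{student_question}}` and `{{curriculum}}` placeholders in the prompt input are filled from the params dictionary when the prompt runs. The sketch below mimics that substitution with a regular expression so the resolved text can be inspected. It is an illustration only, not AIConfig's actual template engine, and `fill_template` is a hypothetical helper.

```python
import re

def fill_template(template, params):
    """Replace each {{name}} placeholder with params[name]; leave unknown names untouched."""
    def substitute(match):
        name = match.group(1)
        return str(params.get(name, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

template = "Student's question: {{student_question}}, The middle school curriculum: {{curriculum}}"
resolved = fill_template(template, {
    "student_question": "What is the sum of the first 103 numbers?",
    "curriculum": "Course 2: Algebra",
})
print(resolved)
```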


In [7]:
# Save the AIConfig to disk and serialize the outputs from the model run
rag_aiconfig.save('rag_aiconfig.json', include_outputs=True)

You should now see rag_aiconfig.json in the Colab 'Files' pane. You can also upload the config to AI Workbooks to easily edit the prompts and model parameters.
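Because the saved file is JSON, it can be inspected with the standard library. The sketch below assumes the file has a top-level `prompts` list whose entries carry a `name` and, when saved with `include_outputs=True`, an `outputs` list; check these key names against your own file. `summarize_config` is a hypothetical helper.

```python
import json
import os

def summarize_config(config):
    """Map each prompt name to the number of serialized outputs it carries."""
    return {
        prompt.get("name"): len(prompt.get("outputs", []))
        for prompt in config.get("prompts", [])
    }

path = "rag_aiconfig.json"  # file written by the save() call above
if os.path.exists(path):
    with open(path) as f:
        print(summarize_config(json.load(f)))
```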