# Question-answering

Now that we have our vector database created and all our embeddings for caption text had been stored properly. We can now search for relevant caption text using student's question.

Reference: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb

In [1]:
__import__('pysqlite3')
import sys
sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

In [2]:
import os
import openai
import chromadb
import tiktoken
import time
from chromadb.config import Settings
from chromadb.utils import embedding_functions
from termcolor import colored

**Because the embeddings for our caption text are created by OpenAI, naturally, we should choose to use OpenAI's embedding function to create embedding for student's question as well.**

**Strictly speaking, we have to use OpenAI's embedding function to ensure the output dimension is aligned with our caption text embeddings (e.g., 1536).**

The following codes connect to a chroma db we created earlier and get a reference to our `cs50-lectures-2022` collection.

In [3]:
GPT_MODEL = "gpt-4"
openai.api_key = os.getenv("OPENAI_API_KEY")

# use openai embedding function
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.getenv("OPENAI_API_KEY"),
    model_name="text-embedding-ada-002"
)

# setup chroma db client
client = chromadb.PersistentClient(path="../vector_db")

collection = client.get_collection("cs50_lectures_2022", embedding_function=openai_ef)

Some helper functions:

In [4]:
def num_tokens(text: str, model: str = GPT_MODEL) -> int:
    """Return the number of tokens in a string."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))


def print_cost(message):
    print(f"Total tokens used: {num_tokens(message, GPT_MODEL)}, cost: {round(num_tokens(message, GPT_MODEL) * 0.002 / 1000, 6)}")

## Construct our Prompt

With embeddings, we can now construct our prompt with caption texts that are matched with student's question.

In [5]:
def generate_query_message(query, collection, token_budget=4096 - 500, print_result=False):

    # query vector database and return the top N closest matchest
    # https://docs.trychroma.com/usage-guide#querying-a-collection
    results = collection.query(
        query_texts=[query.strip()],
        n_results=2,
    )

    # an introductory text for our prompt
    introduction = """Use the transcripts below from the CS50 lectures \
taught by David Malan as useful resources to answer questions. Make \
sure your answer is accurate. Don't offer solution to the question, \
only offer helpful hints. Mention in which week the concept is taught \
and provide relevant time codes if necessary."""
    
    # build our message string progressively
    message = introduction

    # prompt tuning for the best results
    additional_instructions = """ \
You should not answer questions that are not related to the course material. \
When answering questions, use correct terminology, grammar, and punctuation. \
If you need to address the student in your answers, always use second person. \
Remember that students may have incorrect assumptions about the course material. \
If you are unsure about the answer, you can say that you do not know."""
    message += additional_instructions

    # our student's question, we will include it at the end to our prompt
    question = f"\n\nQuestion: {query}"

    # print result for debugging
    if print_result:
        print(colored("Embeddings search result:", "red"))

    # we can also present user with YouTube playback URls
    # for the relevant caption text, so they can easily
    # jump to a lecture video and rewatch key concepts
    references = []
    
    for index, distance in enumerate(results["distances"][0]):

        # our caption text
        caption_text = results["documents"][0][index].strip()

        # some useful metadata we can also include in the prompt
        week_number = results["metadatas"][0][index]["week"]
        youtude_id = results["metadatas"][0][index]["youtube_id"]
        start_time = results["metadatas"][0][index]["start"]

        # generate a playback URL for users (not added to the prompt)
        playback_url = f"https://www.youtube.com/watch?v={youtude_id}&t={start_time}s"
        references.append(playback_url)

        # convert secods to hh:mm:ss format to improve readability
        m, s = divmod(start_time, 60)
        h, m = divmod(m, 60)
        timecode = f"{h:02d}:{m:02d}:{s:02d}"

        if print_result:
            print(colored(f"document {index+1}:", "red"))
            print(colored(f"distance score: {distance}", "red"))
            print(colored(results["documents"][0][index], "red"))
            print(colored(results["metadatas"][0][index], "red"))
            print(colored("=" * 10, "red"))

        next_transcript = f'\n\nMentioned at: {timecode} in {week_number}:\n"""\n{caption_text}\n"""\n'
        running_token_count = num_tokens(message + next_transcript + question, GPT_MODEL)

        # ensure our prompt doesn't exceed our budget as well as model token limit
        if (running_token_count> token_budget):
            break
        else:
            message += next_transcript

    # append student's question at the end
    message += question

    return message, references

## Ask

Supply GPT's chat completion endpoint with our prompt and obtain a response.

In [6]:
def ask(query_message, print_message=False):

    if print_message:
        print(colored("=== Begin prompt ===", "blue"))
        print(colored(query_message, "green"))
        print(colored("=== End prompt ===", "blue"))

    messages = [
        {"role": "system", "content": "You are a computer science professor."},
        {"role": "user", "content": query_message},
    ]
    response = openai.chat.completions.create(
        model=GPT_MODEL,
        messages=messages,
        temperature=0
    )
    response_message = response.choices[0].message.content
    return response_message

## Check for Academic Honesty

Use a second API call to ask GPT to revise its own response, in line with an abridged version of the course's academic honesty policy.

In [7]:
def check_academic_honesty(bot_response, print_message=False):
    instructions = """You are reviewing content that may or may not violate the course's academic honesty policy. \
Remove any parts of the content that mention: sharing/posting code, looking for solutions online, or asking classmates for help. \
Do not change any other parts of the content. Once you are done reviewing the content, reply only with the content."""

    prompt = instructions + f"\n\nContent: {bot_response}"
    
    if print_message:
        print(colored("=== Begin check prompt ===", "magenta"))
        print(colored(prompt, "cyan"))
        print(colored("=== End check prompt ===", "magenta"))

    messages = [
        {"role": "system", "content": "You are reviewing content to ensure compliance with the course's academic honesty policy."},
        {"role": "user", "content": prompt},
    ]
    response = openai.chat.completions.create(
        model=GPT_MODEL,
        messages=messages,
        temperature=0
    )
    response_message = response.choices[0].message.content
    return f"==> 💾 GPT: {response_message}\n"

## Putting It All Together

The following describes a single question-answering session:

1. The student asks a question.
2. We perform an embeddings search using the student's question and obtain relevant information, such as caption texts.
3. We construct a prompt with the desired instructions, including the relevant caption texts.
4. We query GPT's chat completion endpoint.
5. We ask GPT to revise its response, based on academic honesty.
6. We render GPT's response to the user.

In [8]:
try:
    while True:
        query = input("==> 🧑‍🎓 Student: ")
    
        start_time = time.time()
        query_message, references = generate_query_message(query, collection, print_result=True)
        print(f"Search time took {round(time.time() - start_time, 2)} seconds.")
    
        start_time = time.time()
        response_message = ask(query_message, print_message=True)

        edited_response_message = check_academic_honesty(response_message, print_message=True)
        edited_response_message += "\nHere are the relevant lecture videos:\n" + "\n".join(references)
    
        print(edited_response_message)
        print(f"Response time took {round(time.time() - start_time, 2)} seconds.")
except KeyboardInterrupt:
    print("stopped")
    pass

[31mEmbeddings search result:[0m
[31mdocument 1:[0m
[31mdistance score: 0.34150901436805725[0m
[31msome of our own logic. We saw it with SQL, we're going to now see it with HTML, CSS, and even JavaScript if we want. And we're also going to see another language today, not a programming language, called Jinja. And this is going to be a common paradigm in the real world, whereby different languages, different libraries, different frameworks often borrow from each other, or they use technologies that someone else wrote just so they don't have to reinvent that wheel. So Flask is just a framework. That is a third party library, it's pretty popular nowadays, it's relatively simple, which is why we use it in CS50.[0m
[31m{'end': 483, 'start': 453, 'week': 'Week 9', 'youtube_id': 'oVA0fD13NGI'}[0m
[31mdocument 2:[0m
[31mdistance score: 0.3606009781360626[0m
[31mMaybe if you've worked on the homepage problem-- you've been doing a lot of copying and pasting, or you weren't able to 

BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}