## Operating a RAG pipeline using the OpenAI GPT-4 API

In this notebook, we set up the RAG architecture shown below on the NY Times comment set. A local ChromaDB vector database stores the vector embeddings of the comments, and GPT-4 serves as our Large Language Model (LLM). OpenAI has not published GPT-4's size, but it is widely reported to be far too large to host on local hardware. GPT-4 is also not an open model, so it can only be accessed through an API provided by OpenAI, at a cost.

![Simple RAG Architecture](../rag_architecture.jpg) 

To use this workflow, you need access to an OpenAI API endpoint and a token for it. The setup cell below reads both from environment variables (`OPENAI_BASE_URL` and `LLM_TOKEN`), which `load_dotenv()` loads from a `.env` file if one is present. See the [OpenAI website](https://openai.com) for more information. This workflow also assumes that you have already created the Chroma vector database (see the `chromadb_prep` folder for instructions).
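A missing token tends to surface only as an authentication error at request time, so it can help to check the environment up front. The following is a minimal sketch, not part of the original workflow: the helper name `missing_env_vars` is hypothetical, and it assumes only `LLM_TOKEN` is strictly required, since the `openai` client falls back to OpenAI's default endpoint when no base URL is given.

```python
import os

# Variable name taken from the setup cell; OPENAI_BASE_URL is treated as
# optional here (assumption: the default OpenAI endpoint is acceptable).
REQUIRED_VARS = ("LLM_TOKEN",)

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Run this after load_dotenv() so that values from a .env file are visible.
missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```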

In [1]:
# packages
from dotenv import load_dotenv
import pandas as pd
import chromadb
import openai
import os

# load env variables
load_dotenv()
openai_base_url = os.getenv('OPENAI_BASE_URL')
openai_api_key = os.getenv("LLM_TOKEN")

# chromadb location
CHROMA_DATA_PATH = "../chroma_data/"
collection_db="article_comments"

# Initialize an OpenAI client
client = openai.OpenAI(api_key=openai_api_key, base_url=openai_base_url)

# initialize a chroma client
chroma_client = chromadb.PersistentClient(path=CHROMA_DATA_PATH)

# helper function for prompt construction
def construct_prompt(docs: dict, question: str) -> str:
    # convert the docs into a numbered list of comments
    results_df = pd.DataFrame(docs['documents']).transpose()
    results_df.columns = ['Comment']
    results_df['ComNum'] = [str(i) for i in range(1, len(results_df) + 1)]
    results_df['Numbered Comments'] = results_df['ComNum'] + '. ' + results_df['Comment']

    # Collect the results in a context
    context = "\n".join(results_df['Numbered Comments'])

    # construct prompt
    prompt = f"""
        Answer the following question: {question}.  
        Refer only to the following numbered list of comments from NY Times readers when answering: {context}.
        Check each numbered comment very carefully and ignore it if it does not contain language that is a close match to the original question.
        Provide as much information as possible in the summary, subject to the conditions already given.
        Begin your answer with 'Based on the responses from selected NY Times readers', and try to give a sense of majority and minority opinions on the topic, but only if there is an identifiable majority opinion.
        If there is not enough information provided to give a summarized opinion, indicate that this is the case.
        """

    return prompt
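To make the expected input of `construct_prompt` concrete: for a single query text, a Chroma query result holds one inner list of matching documents under the `'documents'` key. The sketch below uses a hand-written, hypothetical stand-in for such a result (`sample_docs`) and a hypothetical helper (`numbered_context`) that builds the same numbered list without pandas. It is an illustration of the numbering step only, not a replacement for the function above.

```python
# Hypothetical stand-in for a Chroma query result (not real data).
# With one query text, 'documents' contains a single inner list of matches.
sample_docs = {"documents": [["First reader comment.", "Second reader comment."]]}

def numbered_context(docs: dict) -> str:
    """Build a '1. ...' numbered list from the matches for the first query."""
    comments = docs["documents"][0]
    return "\n".join(f"{i}. {c}" for i, c in enumerate(comments, start=1))

print(numbered_context(sample_docs))
# 1. First reader comment.
# 2. Second reader comment.
```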

In [2]:
# function to execute RAG pipeline using GPT-4
def ask_question_openai(question: str, client: openai.OpenAI = client,
                        collection_name: str = collection_db,
                        n_docs: int = 30, filters: dict = None) -> None:
    # filters=None means no metadata filter is applied

    # Find close documents in chromadb
    collection = chroma_client.get_collection(collection_name)
    results = collection.query(
       query_texts=[question],
       n_results=n_docs,
       where=filters
    )
    
    prompt = construct_prompt(results, question)
    
    # send prompt to GPT-4
    chat_completion = client.chat.completions.create(
        messages=[{
                "role": "user",
                "content": prompt,
        }],
        model="gpt-4",
    )
    
    # display response
    print(chat_completion.choices[0].message.content)

In [4]:
# test function
ask_question_openai("What do readers think about US foreign policy towards North Korea?", n_docs = 50)

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 9316 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
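
The error above shows that 50 retrieved comments produced a prompt of 9316 tokens, which exceeds the 8192-token context window of this model. The simplest fix is to lower `n_docs` (the default of 30 is a more cautious choice). An alternative is to trim the retrieved comments to a token budget before building the prompt. The sketch below is one way to do that under stated assumptions: the helper `trim_to_token_budget` is hypothetical, and the figure of about 4 characters per token is only a rough heuristic for English text. A real tokenizer such as `tiktoken` would give more accurate counts.

```python
def trim_to_token_budget(comments, max_tokens, chars_per_token=4):
    """Keep comments in ranked order until the estimated token budget is used up."""
    kept, used = [], 0
    for comment in comments:
        # Rough estimate only: ceil(len / chars_per_token), at least 1 token
        cost = max(1, -(-len(comment) // chars_per_token))
        if used + cost > max_tokens:
            break
        kept.append(comment)
        used += cost
    return kept

comments = ["a" * 40, "b" * 40, "c" * 40]  # about 10 estimated tokens each
print(len(trim_to_token_budget(comments, max_tokens=25)))  # 2
```

Whatever budget is chosen should sit well below the model's limit, because the instructions in the prompt and the model's own reply count towards the same context window.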