# RAG Pipeline

The AI RAG Pipeline is a powerful tool for building and deploying AI models. It provides a streamlined workflow for data preprocessing, model training, and model deployment, making it easier to develop and deploy AI solutions.

- In this work, a RAG Pipeline will be built to make interactions with pdf files.
- For model in this work OpenAI GPT 3.5 and a local LLAMA3:8B model will be used.


In [1]:
# Imports
import os
from dotenv import load_dotenv
from operator import itemgetter

from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import ChatPromptTemplate

## Setting Up the OpenAI Model

To set up the OpenAI model, you need to instantiate the `ChatOpenAI` class with the OpenAI API key and the desired model name.

In [3]:
# Load environment variables
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [21]:
# Setup the model
openai_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo-0125")

# Setup the embeddings
openai_embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-ada-002")

# Test the model
openai_model.invoke("Hey are you there?")

AIMessage(content="Yes, I'm here. How can I help you?", response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 12, 'total_tokens': 24}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-859d139c-98eb-4841-9c4c-ae88e5fb7607-0', usage_metadata={'input_tokens': 12, 'output_tokens': 12, 'total_tokens': 24})

## Setting up the Local LLAMA Model

To set up the LLAMA model, you need to instantiate the `Ollama` class with the desired model name.

In [5]:
# Setup the model
llama_model = Ollama(model="llama2:7b")

# Setup the embeddings
llama_embeddings = OllamaEmbeddings(model="llama2:7b")

# Test the model
llama_model.invoke("Hey are you there?")

"<h1>Hey! I'm here and ready to help! What's up?</h1>"

## Parser

Direct response is not usable for interactions.Content needs to be extracted from the AIMesagge. Langchain parser can help to achieve that

In [6]:
parser = StrOutputParser()

## Prompt Template

A langchain "prompt template" is a predefined structure or format that provides guidance for constructing prompts in langchain workflows. It helps users organize and structure their input to generate desired outputs.

In [7]:
prompt_template = """
Give a brief answer to the question below based on the given context.
Answer will be just one sentence.
Answer of the question will just include the information asked in the question not the whole context.
If you don't know the answer or don't have any information for the question, answer with "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(prompt_template)
prompt.format(
    context="Rahman has reached diamond rank in League of Legends in 2015.",
    question="When did Rahman reach diamond rank in League of Legends?"
    )

'Human: \nGive a brief answer to the question below based on the given context.\nAnswer will be just one sentence.\nAnswer of the question will just include the information asked in the question not the whole context.\nIf you don\'t know the answer or don\'t have any information for the question, answer with "I don\'t know".\n\nContext: Rahman has reached diamond rank in League of Legends in 2015.\n\nQuestion: When did Rahman reach diamond rank in League of Legends?\n'

## Simple Testing

- Let's build a simple test with prompt, model and parser.
- A chain can be build for this task.
- Models will generate the response according to the given prompt and the response will be parsed to just get the string output.

In [8]:
# Chain the prompt with the openai model and parser
chain = prompt | openai_model | parser

# Invoke the chain
chain.invoke({
    "context": "Rahman reached diamond rank in League of Legends in 2015.",
    "question": "When did Rahman reach diamond rank in League of Legends?"
})

'Rahman reached diamond rank in League of Legends in 2015.'

In [9]:
# Chain the prompt with the llama2 model and parser
chain = prompt | llama_model | parser

# Invoke the chain
chain.invoke({
    "context": "Rahman reached diamond rank in League of Legends in 2015.",
    "question": "When did Rahman reach diamond rank in League of Legends?"
})

'Rahman reached diamond rank in League of Legends in 2015.'

They're both working properly. But response generation from local model takes higher amount of time compared to the openai model. That's because response generation is happenning on PC's hardware and it's depends on various reasons:

- GPU availability / capacity
- CPU capacity
- Input length
- Output length
- Model size

So, if we want to create responses locally & fast we need to consider these features.

## Simple RAG Pipeline

We're now ready for simple rag pipeline. I will use my own conference paper answer-response. To do that the following steps needs to be taken:

- Importing the pdf
- Creating embeddings & vector database
- Adding input data to the chain
- Triggering the chain and asking questions related to input file

In [11]:
# Loading pdf
loader = PyPDFLoader("docs/20230923_ELECO_Data-Driven-PHEV-Model.pdf")
pages = loader.load_and_split()
pages

[Document(page_content="Data -Driven  Model ling of a Plug -In Hybrid Electric Vehicle  \n \nAdnan Furkan Y ildiz1, Rahman Sahinler1, Cansu Ozturk1 \n \n1AVL Research and Engineering  Turkey , Istanbul, Turkey  \nfurkan. yildiz@avl.com , rahman .sahinler@avl.com , cansu.ozturk@a vl.com  \n \nAbstract  \n  \nGrowing  need for environmentally friendly  transportation \nhas led to the rise of hybrid electric vehicles (HEVs).  Plug -in \nhybrid electric vehicles (PHEVs ), whic h are a sub -class of \nHEVs , provide lower fuel consumption and exhaust gas \nemissions thanks to their e xternal charging and higher range \nof electric drive capabilities . Therefore,  they are  increasingly \npopular in the automotive i ndustry  nowada ys. To make a  \nrapid assessment of th e cont rol or the ene rgy management \nstrate gies applied to a PHEV , the generation of a  vehicle \nmodel has a crucial  role.  This paper propose s a rapid -\nresponsive, data -driven, kinematic, and map -based PHEV  \nmo

In [26]:
# Vector databases from pdf
vector_db_openai = DocArrayInMemorySearch.from_documents(pages, embedding=openai_embeddings)
vector_db_llama = DocArrayInMemorySearch.from_documents(pages, embedding=llama_embeddings)

In [27]:
# Retrieve from the openai vector database based on relation
retriever = vector_db_openai.as_retriever()
retriever.invoke("Python")

[Document(page_content='Fig. 1. Block diagra m of the vehicle model \nOur proposed  data-driven  vehicle model  aims to create \nrepresentativeness of  a PHEV in various real-world driving \nconditions.  To that attempt , we used a kine matic  approach \nwhere th e input v ariables are the speed of the vehicle  in km/h  \nand the grad e angle of the road  in % . Unlike most of the studies \nmentioned above, our vehicle model was developed within a \nPython environment using object -oriented program ming. W e \nchose to use the Python environment for two main reasons: its \nobject -oriented capabilities provide a rapid response, a nd it \noffers powerful frameworks for machine learning projects that \nwill be the basis of our future work. In particular, our model ca n \nevaluate 2000 seconds of veh icle data in just 3 -5 seconds. This \nefficiency surpasses many models in the field, including rule-\nbased approaches as given by [2].  \nReal-world maneuvers with urban, rural, and highway

In [28]:
# Retrieve from the openai vector database based on relation
retriever = vector_db_llama.as_retriever()
retriever.invoke("Python")

[Document(page_content='addition, t he wheel speed  was s imply calculated from the \nvehicl e speed and tire diameter . While SoC at the previous  step \nand vehicle speed as well as  acceleration va lues at the curren t \nstep are used in the energy managem ent strategy , the w heel \nspeed is used in the gear selection algorithm  to obtain the speed  \nof ICE and EM.  Then, the power di stribution s of ICE and  EM \nare obtained  using the generated maps from the data . Finally, \nwhile  ICE power and speed are  used for fuel consumption \ncalculation  using a fuel consumption map , the EM power and \nspeed are used for the calculation of  SoC at the next  step using a \nbattery current map and co ulomb  count ing approach . \nTraction power is calculated based  on the equation o f motion \nsuch a s given in ( 1), where  V is vehicle speed in km/h,  F0, F1, F2 \nare road load coefficients in N, N/(km /h), N/(km/h)2, \nrespectivel y, m is vehicle mass in kg , α is road  gradi ent in 

In [31]:
# Let's create a chain to retrieve the document based on the query
context_question_dict = {
    "context": itemgetter("question") | retriever,
    "question": itemgetter("question")
}

openai_chain = context_question_dict | prompt | openai_model | parser
llama_chain = context_question_dict | prompt | llama_model | parser

In [32]:
# Questions
questions = [
    "Who are the authors of the paper?",
    "What is the main contribution of the paper?",
    "What is the main cotribution of Python?",
    "What is the modeling strategy used in the paper?",
    "What is the future work?"
]

In [33]:
# Answering the questions with openai model
for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {openai_chain.invoke({'question': question})}")

Question: Who are the authors of the paper?
Answer: H. Lee and S. W. Cha.
Question: What is the main contribution of the paper?
Answer: The main contribution of the paper is the creation of a fast-responding vehicle model within the Python environment that can be used in future machine-learning applications.
Question: What is the main cotribution of Python?
Answer: The main contribution of Python is generating a fast-responding vehicle model in a Python environment successfully developed.
Question: What is the modeling strategy used in the paper?
Answer: The modeling strategy used in the paper is a data-oriented vehicle model approach.
Question: What is the future work?
Answer: The future work involves using powerful frameworks for machine learning projects to enhance the efficiency of the model.


In [34]:
# Answering the questions with llama model
for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {llama_chain.invoke({'question': question})}")

Question: Who are the authors of the paper?
Answer: The authors of the paper are:

1. S. Onori
2. L. Serrao
3. G. Rizzoni
Question: What is the main contribution of the paper?
Answer: The main contribution of the paper is the development of a data-driven vehicle model for plug-in hybrid electric vehicles (PHEVs) using real-world driving data. The proposed model is designed to be fast-responding and can represent the state of charge (SoC) and fuel consumption behaviors in various driving conditions, including city driving (CD), suburban driving (SD), and highway driving (HD). Unlike traditional modeling approaches that rely on standard driving cycles, the authors' approach uses real-world data to create a map-based vehicle model that can adapt easily for different hybrid vehicles with various powertrain architectures when wide-ranging operational data of the vehicle is available. The paper also presents a novel energy management strategy and shows the modeling results in section 5, demo

## Results

Time: Required time was really high to generate one response from the local model.
Answer accuracy: Answers was considerably accurate. Both models understand the concept of the works but extracting direct information was inaccurate. (At least they should give my name within the writers :D)