In [1]:
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

In [None]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=['file']
).load_data()

In [None]:
# Optional: inspect the loaded documents
# print(type(documents), "\n")
# print(len(documents), "\n")
# print(type(documents[0]))
# print(documents[0])

## Basic RAG Pipeline

### The pipeline includes three components: Ingestion -> Retrieval -> Synthesis

##### **Ingestion**: Documents are divided into chunks -> chunks are embedded using an embedding model -> the embeddings are stored in a Vector Store Index
##### **Retrieval**: The user's query is embedded and compared with the chunk embeddings in the index -> the top K chunks most similar to the query are returned
##### **Synthesis**: The K chunks from the retrieval step are combined with the user's query -> the combined text (not embeddings) is sent to the LLM, which generates the response
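
The three stages above can be sketched without any external library. This is a toy illustration only: word counts stand in for a real embedding model, the sample text is made up, and the function names (`build_index`, `retrieve`, `build_prompt`) are hypothetical and are not part of LlamaIndex.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Ingestion, step 1: split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(text, chunk_size=8):
    """Ingestion, steps 2 and 3: embed each chunk and store (chunk, embedding) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunk_text(text, chunk_size)]


def retrieve(index, query, top_k=2):
    """Retrieval: rank chunks by similarity to the query and keep the top K."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(query, chunks):
    """Synthesis input: the retrieved chunk text plus the query, as plain text."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up sample text, used only to exercise the sketch
toy_text = (
    "The service is packaged as a Docker image. "
    "The image is pushed to a registry and run on a virtual machine. "
    "Unit tests cover the data loading code. "
    "The dataset contains labelled text records."
)
toy_query = "How is the Docker image run?"
toy_index = build_index(toy_text, chunk_size=8)
top_chunks = retrieve(toy_index, toy_query, top_k=2)
prompt = build_prompt(toy_query, top_chunks)
print(prompt)  # this text is what would be sent to the LLM
```

In a real pipeline the word-count vectors would be replaced by dense embeddings and the prompt would be passed to an LLM; the control flow stays the same.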

In [4]:
# Merge the text of all loaded documents into a single Document
from llama_index import Document

document = Document(text='\n\n'.join([doc.text for doc in documents]))

### Ingestion

In [5]:
# VectorStoreIndex stores the chunks (split from the documents), each chunk's text, and the corresponding embeddings
# ServiceContext overrides the default LLM and embedding model used by LlamaIndex (it uses OpenAI's LLM and embedding model by default)
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI

llm = OpenAI(model='gpt-3.5-turbo',temperature=0.1)
service_context = ServiceContext.from_defaults(llm=llm, embed_model='local:BAAI/bge-small-en-v1.5')

index = VectorStoreIndex([document], service_context=service_context)

Initialize a query engine from the index above. It retrieves the chunks relevant to a user's query and passes them, together with the query, to the **synthesis** component.

In [6]:
query_engine = index.as_query_engine()

In [7]:
response = query_engine.query('How is the project deployed')
print(str(response))

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


The project is deployed on AWS EC2 through AWS ECR (Dockerized Container).
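
The `huggingface/tokenizers` warning in the output above suggests setting the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal sketch, assuming that disabling tokenizer parallelism is acceptable for this notebook; it should run in the first cell, before any library that loads `tokenizers` is imported:

```python
import os

# Assumption: parallel tokenization is not needed here, so turn it off.
# Setting the variable explicitly is what the warning message recommends.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```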


### Evaluation setup using TruLens

##### TruLens is a software tool that helps you to objectively measure the quality and effectiveness of your LLM-based applications using feedback functions.
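
To make the idea of a feedback function concrete: it can be thought of as a function that takes text from a record (for example the question and the answer) and returns a score. The sketch below is a deliberately crude, hypothetical example based on word overlap. It is not a TruLens API; feedback functions used in practice usually rely on an LLM or another model as the judge.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_feedback(question, answer):
    """Hypothetical feedback function: the fraction of question words
    that reappear in the answer, as a score between 0.0 and 1.0."""
    question_words = tokenize(question)
    if not question_words:
        return 0.0
    return len(question_words & tokenize(answer)) / len(question_words)


# Question and answer copied from the query cell earlier in this notebook
score = overlap_feedback(
    "How is the project deployed",
    "The project is deployed on AWS EC2 through AWS ECR (Dockerized Container).",
)
print(f"{score:.2f}")
```

A score like this says nothing about correctness; it only illustrates the shape of the interface (text in, number out).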

In [8]:
eval_questions = []
with open('eval_questions.txt','r') as file:
    for line in file:
        # Strip the trailing newline and any surrounding whitespace
        item = line.strip()
        print(item)
        eval_questions.append(item)

What are the technologies used in the project?
What is the installation process?
Can you talk about the dataset used in the project?
Can you explain the usage of the project?
What is the process of deployment used in the project?
What are the policies that needs to be attached to the IAM User?
What are the packages that needs to be installed on EC2 isntance?
What are the keys that needs to be initialized?
Who is the author of the project?
What license does the project use?
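
The loading loop above keeps every line, including blank ones. A slightly more defensive variant is sketched below; `load_questions` is a hypothetical helper name, and skipping empty lines is an assumption about the desired behaviour.

```python
from pathlib import Path


def load_questions(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Usage (assumes the file exists next to the notebook):
# eval_questions = load_questions("eval_questions.txt")
```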


In [11]:
from trulens_eval import Tru, TruLlama
tru = Tru()

tru.reset_database()

In [12]:
tru_recorder = TruLlama(query_engine, app_id='Direct Query Engine')

In [13]:
with tru_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)

In [14]:
records, feedback = tru.get_records_and_feedback(app_ids=[])

In [15]:
records.head()

| | app_id | app_json | type | record_id | input | output | tags | record_json | cost_json | perf_json | ts | latency | total_tokens | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Direct Query Engine | {"app_id": "Direct Query Engine", "tags": "-",... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_a8d5d37c03de07b74e4ac073f683fb9a | "What are the technologies used in the project?" | "The technologies used in the project include:... | - | {"record_id": "record_hash_a8d5d37c03de07b74e4... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2023-11-30T01:01:22.535105", "... | 2023-11-30T01:01:27.735625 | 5 | 1045 | 0.001582 |
| 1 | Direct Query Engine | {"app_id": "Direct Query Engine", "tags": "-",... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_b33498f62e6e5884ea63b21c69461f0e | "What is the installation process?" | "To install the required packages for the Medi... | - | {"record_id": "record_hash_b33498f62e6e5884ea6... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2023-11-30T01:01:27.787209", "... | 2023-11-30T01:01:37.866469 | 10 | 1113 | 0.00172 |
| 2 | Direct Query Engine | {"app_id": "Direct Query Engine", "tags": "-",... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_211d0a15b287212c3ec5df33edb93270 | "Can you talk about the dataset used in the pr... | "The dataset used in the project consists of a... | - | {"record_id": "record_hash_211d0a15b287212c3ec... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2023-11-30T01:01:37.908117", "... | 2023-11-30T01:01:44.118351 | 6 | 1073 | 0.001637 |
| 3 | Direct Query Engine | {"app_id": "Direct Query Engine", "tags": "-",... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_8e7ecf8cf73bcbc6cc529dfe0c8a323b | "Can you explain the usage of the project?" | "The usage of the project involves several ste... | - | {"record_id": "record_hash_8e7ecf8cf73bcbc6cc5... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2023-11-30T01:01:44.169589", "... | 2023-11-30T01:02:01.555708 | 17 | 1173 | 0.001838 |
| 4 | Direct Query Engine | {"app_id": "Direct Query Engine", "tags": "-",... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_65808ba1c89a5102d1009e33727191b9 | "What is the process of deployment used in the... | "The process of deployment used in the project... | - | {"record_id": "record_hash_65808ba1c89a5102d10... | {"n_requests": 1, "n_successful_requests": 1, ... | {"start_time": "2023-11-30T01:02:01.602333", "... | 2023-11-30T01:02:22.927264 | 21 | 1193 | 0.001877 |
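
The `latency`, `total_tokens`, and `total_cost` columns are the easiest ones to summarise. The sketch below uses only the five rows shown above, copied by hand, so it does not cover all ten questions. Reading `latency` as seconds matches the timestamps in the table; treating `total_cost` as US dollars is an assumption.

```python
from statistics import mean

# (latency in seconds, total_tokens, total_cost) for the five rows shown above
rows = [
    (5, 1045, 0.001582),
    (10, 1113, 0.00172),
    (6, 1073, 0.001637),
    (17, 1173, 0.001838),
    (21, 1193, 0.001877),
]

latencies, tokens, costs = zip(*rows)
summary = {
    "mean_latency": mean(latencies),
    "total_tokens": sum(tokens),
    "total_cost": sum(costs),
}
print(summary)
```

On the actual DataFrame, `records[["latency", "total_tokens", "total_cost"]].describe()` gives comparable statistics for every recorded query.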


In [16]:
# launches on http://localhost:8501/
tru.run_dashboard()

Starting dashboard ...





Dashboard started at http://192.168.4.34:8501 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>