# Lesson 1: Advanced RAG Pipeline

In [1]:
import utils
# Import the course's utils module, which provides the helper functions used in this notebook (for example, get_openai_api_key).

import os
import openai
# Import the os module for interacting with the operating system and the openai module for OpenAI API interactions.

openai.api_key = utils.get_openai_api_key()
# Set the OpenAI API key using a function from the utils module to retrieve the API key.


✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .


In [2]:
from llama_index import SimpleDirectoryReader
# Import the SimpleDirectoryReader class from the llama_index module, 
# which is used to read and process documents.

documents = SimpleDirectoryReader(
    input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
# Initialize the SimpleDirectoryReader with a list of input files (in this case, a PDF).
# Call the load_data() method to load the data from the specified PDF file into the documents variable.


In [3]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])

<class 'list'> 

41 

<class 'llama_index.schema.Document'>
Doc ID: 3a75e44c-6cac-4f7d-9bdf-9254a4069580
Text: PAGE 1Founder, DeepLearning.AICollected Insights from Andrew Ng
How to  Build Your Career in AIA Simple Guide


## Basic RAG pipeline

In [4]:
from llama_index import Document
# Import the Document class from the llama_index module.

document = Document(text="\n\n".join([doc.text for doc in documents]))
# Create a new Document instance by joining the text of all loaded documents.
# The list comprehension extracts the text from each doc in the documents list.
# The texts are joined with double newline characters ("\n\n") to separate them in the final document.


In [5]:
from llama_index import VectorStoreIndex
# Import the VectorStoreIndex class from the llama_index module, which is used for creating a vector-based index.

from llama_index import ServiceContext
# Import the ServiceContext class from the llama_index module, which is used to configure the context in which services (like LLMs and embedding models) operate.

from llama_index.llms import OpenAI
# Import the OpenAI class from the llama_index.llms submodule, used to interact with OpenAI's language models.

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
# Create an instance of the OpenAI class, specifying the model to use (gpt-3.5-turbo) and setting the temperature, 
# which controls the randomness of the model's output.

service_context = ServiceContext.from_defaults(
    llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
)
# Create a ServiceContext instance with default settings, overriding the default LLM with the specified OpenAI model, 
# and specifying a local embedding model ("BAAI/bge-small-en-v1.5").

index = VectorStoreIndex.from_documents([document], service_context=service_context)
# Create a VectorStoreIndex from the provided document, using the specified service context, 
# which includes the OpenAI model and the local embedding model.



[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.
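
To make the retrieval step behind a vector index more concrete, here is a toy sketch of embedding-based top-k retrieval. It is not how llama_index implements `VectorStoreIndex`: the bag-of-words `embed` function, the `cosine` and `top_k` helpers, and the sample chunks are all hypothetical stand-ins for the real embedding model and index.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks standing in for pieces of the indexed document.
chunks = [
    "work on projects to build experience",
    "networking helps you find a job",
    "learn foundational technical skills",
]
best = top_k("how do I find projects to build experience", chunks, k=1)
print(best)
```

In a real pipeline the retrieved chunks are then passed to the LLM as context for answering the query.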


In [6]:
query_engine = index.as_query_engine()
# Convert the VectorStoreIndex into a QueryEngine.
# This allows you to perform natural language queries against the indexed documents.


In [7]:
response = query_engine.query(
    "What are steps to take when finding projects to build your experience?"
)
# Query the indexed documents for advice on finding projects to build experience.
# The query is executed by the query engine, which searches the indexed content for relevant information.

print(str(response))
# Convert the response object to a string and print it.
# This typically outputs the text response generated by the query engine based on the indexed document(s).


Develop a side hustle, ensure the project will help you grow technically, collaborate with good teammates, and consider if the project can be a stepping stone to larger projects.


## Evaluation setup using TruLens

In [8]:
eval_questions = []
# Initialize an empty list to store evaluation questions.

with open('eval_questions.txt', 'r') as file:
    # Open the file 'eval_questions.txt' in read mode.
    
    for line in file:
        # Iterate through each line in the file.
        
        item = line.strip()
        # Remove any leading/trailing whitespace, including the newline character.
        
        print(item)
        # Print the cleaned item to the console.
        
        eval_questions.append(item)
        # Append the cleaned item to the 'eval_questions' list.


What are the keys to building a career in AI?
How can teamwork contribute to success in AI?
What is the importance of networking in AI?
What are some good habits to develop for a successful career?
How can altruism be beneficial in building a career?
What is imposter syndrome and how does it relate to AI?
Who are some accomplished individuals who have experienced imposter syndrome?
What is the first step to becoming good at AI?
What are some common challenges in AI?
Is it normal to find parts of AI challenging?


In [9]:
# You can try your own question:
new_question = "What is the right AI job for me?"
eval_questions.append(new_question)

In [10]:
print(eval_questions)

['What are the keys to building a career in AI?', 'How can teamwork contribute to success in AI?', 'What is the importance of networking in AI?', 'What are some good habits to develop for a successful career?', 'How can altruism be beneficial in building a career?', 'What is imposter syndrome and how does it relate to AI?', 'Who are some accomplished individuals who have experienced imposter syndrome?', 'What is the first step to becoming good at AI?', 'What are some common challenges in AI?', 'Is it normal to find parts of AI challenging?', 'What is the right AI job for me?']


In [11]:
from trulens_eval import Tru
# Import the Tru class from the trulens_eval module, which is used for managing evaluation processes.

tru = Tru()
# Create an instance of the Tru class. This instance will be used to interact with the evaluation framework.

tru.reset_database()
# Reset the evaluation database, clearing any previously recorded evaluation data.
# This gives each new set of evaluations a clean slate.


🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


For the classroom, we've written some of the code in helper functions inside a utils.py file.  
- You can view the utils.py file in the file directory by clicking on the "Jupyter" logo at the top of the notebook.
- In later lessons, you'll get to work directly with the code that's currently wrapped inside these helper functions, to give you more options to customize your RAG pipeline.

In [12]:
from utils import get_prebuilt_trulens_recorder
# Import the get_prebuilt_trulens_recorder helper from the utils module.
# It returns a TruLens recorder preconfigured with the Answer Relevance,
# Context Relevance, and Groundedness feedback functions.

tru_recorder = get_prebuilt_trulens_recorder(query_engine,
                                             app_id="Direct Query Engine")
# Wrap query_engine in a recorder registered under the app_id "Direct Query Engine".
# The app_id labels this app's records in the TruLens database and dashboard.
# Queries run inside the tru_recorder context are logged and evaluated.


In [13]:
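# Run every evaluation question through the query engine inside the recorder
# context so that each call is logged and scored.
with tru_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)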

In [14]:
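# Retrieve the recorded queries and their feedback results; an empty app_ids list selects all apps.
records, feedback = tru.get_records_and_feedback(app_ids=[])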

In [15]:
records.head()

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,Answer Relevance,Context Relevance,Groundedness,Answer Relevance_calls,Context Relevance_calls,Groundedness_calls,latency,total_tokens,total_cost
0,Direct Query Engine,"{""app_id"": ""Direct Query Engine"", ""tags"": ""-"",...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_189c7b9be368144164083508bb1cbce5,"""What are the keys to building a career in AI?""","""Learning foundational technical skills, worki...",-,"{""record_id"": ""record_hash_189c7b9be3681441640...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-08-12T06:15:47.667747"", ""...",2024-08-12T06:15:48.917157,1.0,1.0,1.0,[{'args': {'prompt': 'What are the keys to bui...,[{'args': {'prompt': 'What are the keys to bui...,"[{'args': {'source': 'PAGE 1Founder, DeepLearn...",1,2066,0.003123
1,Direct Query Engine,"{""app_id"": ""Direct Query Engine"", ""tags"": ""-"",...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_c96da9ba408bd9a1ad8fb3bf4eb8adc4,"""How can teamwork contribute to success in AI?""","""Teamwork can contribute to success in AI by a...",-,"{""record_id"": ""record_hash_c96da9ba408bd9a1ad8...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-08-12T06:15:49.043014"", ""...",2024-08-12T06:15:51.418627,1.0,0.5,1.0,[{'args': {'prompt': 'How can teamwork contrib...,[{'args': {'prompt': 'How can teamwork contrib...,[{'args': {'source': 'Hopefully the previous c...,2,1698,0.002583
2,Direct Query Engine,"{""app_id"": ""Direct Query Engine"", ""tags"": ""-"",...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_31df7eb1500504cae15da448623f4596,"""What is the importance of networking in AI?""","""Networking is crucial in AI as it helps indiv...",-,"{""record_id"": ""record_hash_31df7eb1500504cae15...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-08-12T06:15:51.538566"", ""...",2024-08-12T06:15:52.787601,1.0,0.55,0.666667,[{'args': {'prompt': 'What is the importance o...,[{'args': {'prompt': 'What is the importance o...,[{'args': {'source': 'Hopefully the previous c...,1,1692,0.002572
3,Direct Query Engine,"{""app_id"": ""Direct Query Engine"", ""tags"": ""-"",...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_1c5b2f51874c6ed5deb525a0c094c52c,"""What are some good habits to develop for a su...","""Developing good habits in areas such as eatin...",-,"{""record_id"": ""record_hash_1c5b2f51874c6ed5deb...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-08-12T06:15:52.908594"", ""...",2024-08-12T06:15:53.838583,1.0,0.95,1.0,[{'args': {'prompt': 'What are some good habit...,[{'args': {'prompt': 'What are some good habit...,[{'args': {'source': 'Hopefully the previous c...,0,1631,0.002465
4,Direct Query Engine,"{""app_id"": ""Direct Query Engine"", ""tags"": ""-"",...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_4fec7eef8ceb2e2d9a2940a182a1b754,"""How can altruism be beneficial in building a ...","""Helping others during your career journey can...",-,"{""record_id"": ""record_hash_4fec7eef8ceb2e2d9a2...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-08-12T06:15:53.963949"", ""...",2024-08-12T06:15:54.723846,1.0,0.55,0.875,[{'args': {'prompt': 'How can altruism be bene...,[{'args': {'prompt': 'How can altruism be bene...,[{'args': {'source': 'Hopefully the previous c...,0,1609,0.002421
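
As a quick way to read the table above, the feedback scores can be averaged per metric. The sketch below copies the scores from the five rows shown and uses only the standard library; treating the per-metric mean as the summary statistic is an assumption here, chosen because it is the natural aggregate for scores in the 0 to 1 range.

```python
from statistics import mean

# Scores copied from the five rows of records.head() shown above.
scores = {
    "Answer Relevance": [1.0, 1.0, 1.0, 1.0, 1.0],
    "Context Relevance": [1.0, 0.5, 0.55, 0.95, 0.55],
    "Groundedness": [1.0, 1.0, 0.666667, 1.0, 0.875],
}

# Average each metric and round for readability.
averages = {name: round(mean(values), 3) for name, values in scores.items()}
print(averages)
```

With the actual `records` DataFrame, the same summary could be obtained by selecting those three columns and calling `.mean()` on them.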


In [17]:
# launches on http://localhost:8501/
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
Dashboard already running at path: https://s172-30-153-57p38560.lab-aws-production.deeplearning.ai/


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

## Advanced RAG pipeline

### 1. Sentence Window retrieval

In [18]:
from llama_index.llms import OpenAI

# Initialize the OpenAI language model with specific parameters
llm = OpenAI(
    model="gpt-3.5-turbo",  # Specifies the GPT-3.5-turbo model to use
    temperature=0.1        # Sets the temperature to 0.1 for more deterministic and focused responses
)


In [19]:
from utils import build_sentence_window_index

# Build a sentence window index for a given document
sentence_index = build_sentence_window_index(
    document,  # The document to index
    llm,       # The language model used for processing
    embed_model="local:BAAI/bge-small-en-v1.5",  # Embedding model for sentence representations
    save_dir="sentence_index"  # Directory to save the generated index
)
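
The idea behind sentence window retrieval can be sketched in a few lines: each sentence is indexed on its own, but it carries a wider window of neighbouring sentences that is handed to the LLM once the sentence is retrieved. The code below is a toy illustration under that assumption, not the implementation inside `build_sentence_window_index`; `split_sentences`, `build_windows`, and the sample text are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (a real pipeline would use a proper tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_windows(sentences, window_size=1):
    """Pair each sentence with the text of its neighbours within window_size."""
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            {"sentence": sentence, "window": " ".join(sentences[start:end])}
        )
    return nodes


text = "Pick a small project. Scope it carefully. Ship it. Then grow the scope."
nodes = build_windows(split_sentences(text), window_size=1)

# Retrieval would match on node["sentence"]; the LLM would see node["window"].
print(nodes[1]["sentence"])
print(nodes[1]["window"])
```

Matching on single sentences keeps the embeddings focused, while the window restores enough surrounding context for the LLM to write a useful answer.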


In [20]:
from utils import get_sentence_window_query_engine

# Create a query engine based on the sentence window index
sentence_window_engine = get_sentence_window_query_engine(sentence_index)  # Pass the previously built sentence index



In [21]:
window_response = sentence_window_engine.query(
    "how do I get started on a personal project in AI?"
)
print(str(window_response))

You can begin a personal project in AI by following the steps outlined in the chapters provided. Start by identifying and scoping AI projects that align with your career goals. Ensure that the projects you choose are responsible, ethical, and beneficial to people. As you progress, aim to work on projects that increase in scope, complexity, and impact over time. Building a portfolio of projects that demonstrate skill progression is also essential. Finally, consider using a simple framework for starting your AI job search to further develop your expertise in the field.


In [22]:
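# Clear the earlier results so the leaderboard below shows only this engine.
tru.reset_database()

# Wrap the sentence window engine in a recorder with its own app_id.
tru_recorder_sentence_window = get_prebuilt_trulens_recorder(
    sentence_window_engine,
    app_id="Sentence Window Query Engine"
)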

In [23]:
for question in eval_questions:
    with tru_recorder_sentence_window as recording:
        response = sentence_window_engine.query(question)
        print(question)
        print(str(response))

What are the keys to building a career in AI?
Learning foundational technical skills, working on projects, finding a job, and being part of a supportive community are the keys to building a career in AI.
How can teamwork contribute to success in AI?
Teammates play a crucial role in the success of AI projects. Working collaboratively with colleagues who are dedicated, continuously learning, and focused on using AI to benefit all people can significantly impact the outcome of a project. The ability to work effectively with teammates, share knowledge, and leverage each other's strengths can lead to improved project outcomes and overall success in the field of AI.
What is the importance of networking in AI?
Networking in AI is crucial as it can provide valuable insights, guidance, and opportunities for individuals looking to advance in the field. By connecting with professionals who have experience in AI, individuals can gain knowledge about the industry, potential career paths, and curren

In [24]:
tru.get_leaderboard(app_ids=[])

app_id,Context Relevance,Answer Relevance,Groundedness,latency,total_cost
Sentence Window Query Engine,0.51,1.0,0.603333,1.363636,0.000814


In [25]:
# launches on http://localhost:8501/
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
Dashboard already running at path: https://s172-30-153-57p38560.lab-aws-production.deeplearning.ai/


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

### 2. Auto-merging retrieval

In [26]:
from utils import build_automerging_index

automerging_index = build_automerging_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index"
)

In [27]:
from utils import get_automerging_query_engine

automerging_query_engine = get_automerging_query_engine(
    automerging_index,
)

In [28]:
auto_merging_response = automerging_query_engine.query(
    "How do I build a portfolio of AI projects?"
)
print(str(auto_merging_response))

> Merging 1 nodes into parent node.
> Parent node id: 4fe462d2-56fc-4e77-b30d-7389a70812bf.
> Parent node text: PAGE 21Building a Portfolio of 
Projects that Shows 
Skill Progression CHAPTER 6
PROJECTS

> Merging 1 nodes into parent node.
> Parent node id: 0042a514-8853-4c9b-9b4f-87ee70a75e9e.
> Parent node text: PAGE 21Building a Portfolio of 
Projects that Shows 
Skill Progression CHAPTER 6
PROJECTS

Building a portfolio of AI projects involves showcasing progress from simple to complex undertakings over time. It is important to be able to communicate your thinking effectively to demonstrate the value of your work and gain trust from others. Identifying worthwhile ideas to work on is a crucial skill for an AI architect, and gaining experience through working on projects in various industries can help in building a strong portfolio.
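
The `> Merging n nodes into parent node` lines above come from the merging step. A toy sketch of that idea follows: small child chunks are retrieved first, and when enough children of the same parent are retrieved they are replaced by the larger parent chunk. This illustrates the concept only and is not the code behind `get_automerging_query_engine`; the `auto_merge` function, the `threshold` value, and the node names are hypothetical.

```python
from collections import defaultdict


def auto_merge(retrieved, parent_of, children_of, threshold=0.5):
    """Replace retrieved children with their parent when enough siblings were retrieved."""
    hits = defaultdict(list)
    for child in retrieved:
        hits[parent_of[child]].append(child)

    merged = []
    for parent, kids in hits.items():
        ratio = len(kids) / len(children_of[parent])
        if ratio > threshold:
            merged.append(parent)  # enough siblings: return the larger parent chunk
        else:
            merged.extend(kids)    # too few: keep the small chunks as they are
    return merged


# Hypothetical two-level hierarchy: parents P1 and P2 with their child chunks.
children_of = {"P1": ["a", "b", "c"], "P2": ["d", "e", "f", "g"]}
parent_of = {c: p for p, kids in children_of.items() for c in kids}

result = auto_merge(["a", "b", "d"], parent_of, children_of)
print(result)
```

Here two of the three children of `P1` are retrieved, so they are merged into `P1`, while the single hit under `P2` stays as a child chunk.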


In [29]:
tru.reset_database()

tru_recorder_automerging = get_prebuilt_trulens_recorder(automerging_query_engine,
                                                         app_id="Automerging Query Engine")

In [30]:
for question in eval_questions:
    with tru_recorder_automerging as recording:
        response = automerging_query_engine.query(question)
        print(question)
        print(response)

> Merging 2 nodes into parent node.
> Parent node id: 2e6be5e1-6b85-436d-9089-10cf53f59ffa.
> Parent node text: PAGE 3Table of 
ContentsIntroduction: Coding AI is the New Literacy.
Chapter 1: Three Steps to Ca...

> Merging 1 nodes into parent node.
> Parent node id: 1b32c60e-4f5c-46b7-9359-da900d991908.
> Parent node text: PAGE 3Table of 
ContentsIntroduction: Coding AI is the New Literacy.
Chapter 1: Three Steps to Ca...

What are the keys to building a career in AI?
Learning foundational technical skills, working on projects, finding a job, and being part of a community are keys to building a career in AI. Additionally, collaborating in teams, influencing others, and being influenced by others are critical for success in AI.
How can teamwork contribute to success in AI?
Teamwork can contribute to success in AI by allowing individuals to work together effectively on large projects. Collaborating in teams enables individuals to leverage each other's strengths, share knowledge, and col

Is it normal to find parts of AI challenging?
It is normal to find parts of AI challenging.
> Merging 1 nodes into parent node.
> Parent node id: abf9d727-4ed9-4094-8133-5ff496edf42c.
> Parent node text: PAGE 31Finding the Right 
AI Job for YouCHAPTER 9
JOBS

> Merging 1 nodes into parent node.
> Parent node id: 89b7a096-a741-476f-bfe3-e762e30e2446.
> Parent node text: If you’re leaving 
a job, exit gracefully. Give your employer ample notice, give your full effort...

> Merging 1 nodes into parent node.
> Parent node id: 4f4a07d7-18e3-4d43-b64b-8491fd942cc1.
> Parent node text: PAGE 28Using Informational 
Interviews to Find 
the Right JobCHAPTER 8
JOBS

> Merging 1 nodes into parent node.
> Parent node id: 5a9dd671-bebf-4839-8ffe-5b89f5aece98.
> Parent node text: PAGE 31Finding the Right 
AI Job for YouCHAPTER 9
JOBS

> Merging 1 nodes into parent node.
> Parent node id: e44f65cc-4073-4ed7-a087-06541c26238b.
> Parent node text: PAGE 28Using Informational 
Interviews to Find 
the Right

In [31]:
tru.get_leaderboard(app_ids=[])

app_id,Context Relevance,Answer Relevance,Groundedness,latency,total_cost
Automerging Query Engine,0.85,1.0,1.0,2.454545,0.000867


In [32]:
# launches on http://localhost:8501/
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
Dashboard already running at path: https://s172-30-153-57p38560.lab-aws-production.deeplearning.ai/


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>