## Setup and Import Libraries

In [2]:
import os
from llama_index.core.schema import Document
from llama_index.llms.openai import OpenAI
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from trulens_eval import Tru
from utils import (
    get_prebuilt_trulens_recorder, build_sentence_window_index, 
    get_sentence_window_query_engine, build_automerging_index,
    get_automerging_query_engine
)
from dotenv import load_dotenv

In [3]:
load_dotenv()

True

In [4]:
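# load_dotenv() has already populated os.environ; fail early if a key is missing.
for key in ("OPENAI_API_KEY", "HUGGINGFACE_API_KEY"):
    if not os.getenv(key):
        raise EnvironmentError(f"{key} is not set; add it to your .env file")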

In [5]:
documents = SimpleDirectoryReader(
    input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()

In [6]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])

<class 'list'> 

41 

<class 'llama_index.core.schema.Document'>
Doc ID: ebad54c2-3753-4060-af26-2a7a96c59ad3
Text: PAGE 1 Founder, DeepLearning.AI Collected Insights from Andrew
Ng How to  Build Your Career in AI A Simple Guide


## Basic RAG Pipeline

In [7]:
document = Document(text="\n\n".join([doc.text for doc in documents]))

In [8]:
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

In [9]:
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

In [11]:
index = VectorStoreIndex.from_documents(
    [document], embed_model=Settings.embed_model)

In [12]:
query_engine = index.as_query_engine()

In [13]:
response = query_engine.query(
    "What are steps to take when finding projects to build your experience?"
)
print(str(response))

Develop a side hustle, ensure the project helps you grow technically, collaborate with good teammates, and consider if the project can serve as a stepping stone to larger projects.
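
The query engine above hides the retrieve-then-generate loop. As a rough illustration only, the sketch below mimics that flow with a toy bag-of-words "embedding" and cosine similarity. The functions `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names invented for this example; they are not LlamaIndex APIs, and the real pipeline uses the `BAAI/bge-small-en-v1.5` model and an LLM call instead.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up chunks for illustration; the notebook indexes the PDF text instead.
chunks = [
    "Pick projects that help you grow technically.",
    "Networking helps you find mentors and jobs.",
    "Good teammates make projects more successful.",
]
top_chunks = retrieve("Which projects help me grow?", chunks, top_k=1)
prompt = build_prompt("Which projects help me grow?", top_chunks)
print(prompt)
```

Only the retrieved chunk reaches the prompt, which is why retrieval quality (measured later as Context Relevance) matters as much as the LLM itself.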


### Evaluation setup using TruLens

In [14]:
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        # Strip surrounding whitespace, including the trailing newline
        item = line.strip()
        print(item)
        eval_questions.append(item)

What are the keys to building a career in AI?
How can teamwork contribute to success in AI?
What is the importance of networking in AI?
What are some good habits to develop for a successful career?
How can altruism be beneficial in building a career?
What is imposter syndrome and how does it relate to AI?
Who are some accomplished individuals who have experienced imposter syndrome?
What is the first step to becoming good at AI?
What are some common challenges in AI?
Is it normal to find parts of AI challenging?
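
The loading loop above appends every line, so a blank line in `eval_questions.txt` would become an empty question. A minimal sketch of a more defensive loader follows; `load_questions` is a hypothetical helper name, and the temporary file only exists so the example runs on its own.

```python
import tempfile
from pathlib import Path


def load_questions(path):
    """Read one question per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as file:
        return [line.strip() for line in file if line.strip()]


# Demonstrate on a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "eval_questions.txt"
    sample.write_text(
        "What are the keys to building a career in AI?\n"
        "\n"
        "Is it normal to find parts of AI challenging?\n",
        encoding="utf-8",
    )
    questions = load_questions(sample)

print(questions)
```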


In [15]:
new_question = "What is the right AI job for me?"
eval_questions.append(new_question)

In [16]:
print(eval_questions)

['What are the keys to building a career in AI?', 'How can teamwork contribute to success in AI?', 'What is the importance of networking in AI?', 'What are some good habits to develop for a successful career?', 'How can altruism be beneficial in building a career?', 'What is imposter syndrome and how does it relate to AI?', 'Who are some accomplished individuals who have experienced imposter syndrome?', 'What is the first step to becoming good at AI?', 'What are some common challenges in AI?', 'Is it normal to find parts of AI challenging?', 'What is the right AI job for me?']


In [17]:
tru = Tru()

tru.reset_database()

🦑 Initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `TruSession` to prevent this.




In [18]:
tru_recorder = get_prebuilt_trulens_recorder(
    query_engine, app_id="Direct Query Engine"
)

instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.embeddings.multi_modal_base.MultiModalEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.base.embeddings.base.BaseEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.TransformComponent'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.BaseComponent'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'pydantic.main.BaseModel'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base

In [20]:
with tru_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)

In [21]:
records, feedback = tru.get_records_and_feedback(app_ids=[])

In [22]:
records.head()

Unnamed: 0,app_name,app_version,app_id,app_json,type,record_id,input,output,tags,record_json,...,Answer Relevance,Answer Relevance_calls,Answer Relevance feedback cost in USD,Context Relevance,Context Relevance_calls,Context Relevance feedback cost in USD,latency,total_tokens,total_cost,cost_currency
0,Direct Query Engine,base,app_hash_6e8221fde876d15698298cea8c0d1bd6,"{'tru_class_info': {'name': 'TruLlama', 'modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_cae7d2bf891610b0ee22ce0712f99e73,What is the right AI job for me?,The right AI job for you would be one that ali...,-,{'record_id': 'record_hash_cae7d2bf891610b0ee2...,...,,,,,,,1.794142,2210,0.003363,USD
1,Direct Query Engine,base,app_hash_6e8221fde876d15698298cea8c0d1bd6,"{'tru_class_info': {'name': 'TruLlama', 'modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_104864c0cd0d64303a64fa34d24f68a4,Is it normal to find parts of AI challenging?,It is common to find parts of AI challenging.,-,{'record_id': 'record_hash_104864c0cd0d64303a6...,...,,,,,,,0.922265,2129,0.003199,USD
2,Direct Query Engine,base,app_hash_6e8221fde876d15698298cea8c0d1bd6,"{'tru_class_info': {'name': 'TruLlama', 'modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_d0afdf8e8210781c2aeff914e445da63,What are some common challenges in AI?,Common challenges in AI include understanding ...,-,{'record_id': 'record_hash_d0afdf8e8210781c2ae...,...,1.0,[{'args': {'prompt': 'What are some common cha...,0.0,,,,1.386312,2132,0.003216,USD
3,Direct Query Engine,base,app_hash_6e8221fde876d15698298cea8c0d1bd6,"{'tru_class_info': {'name': 'TruLlama', 'modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_b301896aa3b58e1a5e3b53b2434c39a6,What is the first step to becoming good at AI?,Learning foundational technical skills.,-,{'record_id': 'record_hash_b301896aa3b58e1a5e3...,...,1.0,[{'args': {'prompt': 'What is the first step t...,0.0,,,,0.713381,1727,0.002593,USD
4,Direct Query Engine,base,app_hash_6e8221fde876d15698298cea8c0d1bd6,"{'tru_class_info': {'name': 'TruLlama', 'modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_f8ded726ccbfb017e0026c861263597a,Who are some accomplished individuals who have...,"Former Facebook COO Sheryl Sandberg, U.S. firs...",-,{'record_id': 'record_hash_f8ded726ccbfb017e00...,...,1.0,[{'args': {'prompt': 'Who are some accomplishe...,0.0,0.5,[{'args': {'prompt': 'Who are some accomplishe...,0.0,0.998477,2143,0.003237,USD


In [24]:
# tru.run_dashboard()

## Advanced RAG Pipeline

### 1. Sentence Window retrieval

In [25]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

In [26]:
sentence_index = build_sentence_window_index(
    document,
    llm,
    embed_model_name="BAAI/bge-small-en-v1.5",
    save_dir="sentence_index"
)

Loading llama_index.core.storage.kvstore.simple_kvstore from sentence_index\docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from sentence_index\index_store.json.


In [27]:
sentence_window_engine = get_sentence_window_query_engine(sentence_index)

In [29]:
window_response = sentence_window_engine.query(
    "how do I get started on a personal project in AI?"
)
print(str(window_response))

Start by selecting a project that interests you and aligns with your current skill level. It's beneficial to begin with a simple project that allows you to learn and practice new AI techniques. Clearly define the goals and objectives of your project, and consider how it can demonstrate your skills and growth over time. Communication is key - be able to explain the value of your project to others and seek feedback from colleagues, mentors, or managers. As you progress, consider collaborating with stakeholders who may not have expertise in AI to broaden your perspective. Remember that each project is a step in your journey, so don't worry about starting small and focus on continuous learning and improvement.
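
The idea behind sentence window retrieval is to match the query against single sentences but hand the LLM a wider window of surrounding sentences. The sketch below illustrates that idea in plain Python with a toy word-overlap score. The names `split_sentences`, `overlap_score`, and `sentence_window_retrieve` are assumptions made for this example and are not the helpers from `utils` or any LlamaIndex API.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (illustration only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_score(query, sentence):
    """Toy relevance score: number of shared lowercase words."""
    q = set(re.findall(r"[a-z']+", query.lower()))
    s = set(re.findall(r"[a-z']+", sentence.lower()))
    return len(q & s)


def sentence_window_retrieve(query, text, window_size=1):
    """Match on a single sentence, then return it with its neighbours."""
    sentences = split_sentences(text)
    best = max(
        range(len(sentences)),
        key=lambda i: overlap_score(query, sentences[i]),
    )
    start = max(0, best - window_size)
    end = min(len(sentences), best + window_size + 1)
    return sentences[best], " ".join(sentences[start:end])


# Made-up text for illustration.
text = (
    "Start with a small project. "
    "Choose a project that interests you. "
    "Explain its value to others. "
    "Networking is also important."
)
matched, window = sentence_window_retrieve("project that interests me", text)
print(matched)
print(window)
```

Matching on one sentence keeps the similarity search precise, while the window gives the LLM enough context to write a fuller answer. That extra context is one plausible reason for the longer responses and higher latency seen with this engine.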


In [30]:
tru_recorder_sentence_window = get_prebuilt_trulens_recorder(
    sentence_window_engine,
    app_id="Sentence Window Query Engine"
)


In [32]:
for question in eval_questions:
    with tru_recorder_sentence_window as recording:
        response = sentence_window_engine.query(question)
        print(question)
        print(str(response))

What are the keys to building a career in AI?
Teamwork, networking, job search, personal discipline, and altruism are the keys to building a career in AI.
How can teamwork contribute to success in AI?
Teamwork can contribute to success in AI by enabling individuals to collaborate effectively, influence and be influenced by others, and work together towards a common goal. The ability to work in teams allows for the sharing of diverse perspectives, expertise, and ideas, leading to more innovative solutions and better outcomes in AI projects.
What is the importance of networking in AI?
Networking is crucial in AI as it helps individuals build a strong professional network that can provide support, guidance, and opportunities. By connecting with others in the field, individuals can gain valuable insights, collaborate on projects, and stay updated on industry trends. Additionally, networking can lead to mentorship opportunities, potential job referrals, and access to a community of like-min

In [33]:
tru.get_leaderboard(app_ids=[])

app_name,app_version,Answer Relevance,Context Relevance,latency,total_cost
Direct Query Engine,base,0.893939,0.5,1.201726,0.002939
Sentence Window Query Engine,base,0.877193,0.578431,2.382052,0.002939


In [35]:
# tru.run_dashboard()

### 2. Auto-merging retrieval

In [38]:
automerging_index = build_automerging_index(
    documents,
    llm,
    embed_model_name="BAAI/bge-small-en-v1.5",
    save_dir="merging_index"
)

Loading llama_index.core.storage.kvstore.simple_kvstore from merging_index\docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from merging_index\index_store.json.


In [39]:
automerging_query_engine = get_automerging_query_engine(
    automerging_index,
)

In [41]:
auto_merging_response = automerging_query_engine.query(
    "How do I build a portfolio of AI projects?"
)
print(str(auto_merging_response))

> Merging 1 nodes into parent node.
> Parent node id: 0892dd37-f8d1-4eef-871e-16ddb6ed324d.
> Parent node text: PAGE 21
Building a Portfolio of 
Projects that Shows 
Skill Progression 
CHAPTER 6
PROJECTS

> Merging 1 nodes into parent node.
> Parent node id: da2efc27-af1b-474b-af11-e465be5aa5bd.
> Parent node text: PAGE 17
Finding Projects that 
Complement Your 
Career Goals
CHAPTER 5
PROJECTS

> Merging 1 nodes into parent node.
> Parent node id: bb3eeaa7-26cb-43a9-a59b-34d4320ad040.
> Parent node text: PAGE 14
Scoping Successful 
AI Projects
CHAPTER 4
PROJECTS

> Merging 1 nodes into parent node.
> Parent node id: 24b1595e-cccb-4963-b044-56a3e32af551.
> Parent node text: PAGE 21
Building a Portfolio of 
Projects that Shows 
Skill Progression 
CHAPTER 6
PROJECTS

> Merging 1 nodes into parent node.
> Parent node id: 314b6a51-01a8-4374-b568-50e78c89f162.
> Parent node text: PAGE 17
Finding Projects that 
Complement Your 
Career Goals
CHAPTER 5
PROJECTS

> Merging 1 nodes into parent no
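
The "Merging N nodes into parent node" lines come from the auto-merging step: when enough leaf chunks under the same parent are retrieved, they are swapped for the larger parent chunk. The sketch below shows that rule in plain Python. The function `auto_merge`, the node ids, and the `threshold=0.5` value are all assumptions chosen for illustration, not the retriever's actual code.

```python
from collections import defaultdict


def auto_merge(retrieved_ids, parent_of, children_of, threshold=0.5):
    """Replace retrieved leaves with their parent when enough siblings were hit.

    A parent is substituted when the fraction of its children that were
    retrieved is strictly greater than `threshold`.
    """
    hits = defaultdict(list)
    for node_id in retrieved_ids:
        hits[parent_of[node_id]].append(node_id)

    merged = []
    for node_id in retrieved_ids:
        parent = parent_of[node_id]
        ratio = len(hits[parent]) / len(children_of[parent])
        result = parent if ratio > threshold else node_id
        if result not in merged:  # keep order, drop duplicates
            merged.append(result)
    return merged


# Hypothetical two-level hierarchy: each parent chunk owns several leaf chunks.
children_of = {"P1": ["a", "b", "c"], "P2": ["d", "e", "f", "g"]}
parent_of = {leaf: p for p, leaves in children_of.items() for leaf in leaves}

result = auto_merge(["a", "b", "d"], parent_of, children_of)
print(result)
```

Here two of the three children of `P1` were retrieved, so they are replaced by `P1`, while the single hit under `P2` stays a leaf. Under this rule a parent with only one child always merges, which would be one way to produce "Merging 1 nodes" lines like those in the log above.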

In [43]:
tru_recorder_automerging = get_prebuilt_trulens_recorder(
    automerging_query_engine, app_id="Automerging Query Engine"
)


In [45]:
for question in eval_questions:
    with tru_recorder_automerging as recording:
        response = automerging_query_engine.query(question)
        print(question)
        print(response)

> Merging 2 nodes into parent node.
> Parent node id: c135ee70-03d3-4326-a096-a9dc646fd43c.
> Parent node text: PAGE 3
Table of 
Contents
Introduction: Coding AI is the New Literacy.
Chapter 1: Three Steps to ...

> Merging 1 nodes into parent node.
> Parent node id: c7b5af52-aa08-4f62-ad14-a3f2818908e9.
> Parent node text: PAGE 3
Table of 
Contents
Introduction: Coding AI is the New Literacy.
Chapter 1: Three Steps to ...

What are the keys to building a career in AI?
Learning foundational technical skills, working on projects, finding a job, and being part of a community are key steps to building a career in AI. Additionally, collaborating with others, influencing, and being influenced by others are critical for success in tackling large projects in AI.
How can teamwork contribute to success in AI?
Teamwork can contribute to success in AI by enhancing the ability to collaborate effectively with others, influence team members, and be influenced by them. This collaboration allows for a

In [46]:
tru.get_leaderboard(app_ids=[])

app_name,app_version,Answer Relevance,Context Relevance,latency,total_cost
Direct Query Engine,base,0.893939,0.5,1.201726,0.002939
Automerging Query Engine,base,0.883333,0.5,2.473891,0.00086
Sentence Window Query Engine,base,0.863636,0.545455,2.382052,0.002939
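
A leaderboard like this can be read as a per-app average of each metric over the recorded queries. The sketch below is a plain-Python illustration of that kind of aggregation, assuming that records with missing feedback are skipped; the `leaderboard` function and the sample records are made up, and this is not TruLens code.

```python
from collections import defaultdict
from statistics import mean


def leaderboard(records, metrics):
    """Average each metric per app, ignoring records where it is missing."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["app_name"]].append(record)

    table = {}
    for app, rows in grouped.items():
        table[app] = {}
        for metric in metrics:
            values = [r[metric] for r in rows if r.get(metric) is not None]
            table[app][metric] = mean(values) if values else None
    return table


# Made-up records for illustration; None marks feedback not yet computed.
records = [
    {"app_name": "A", "Answer Relevance": 1.0, "latency": 1.0},
    {"app_name": "A", "Answer Relevance": None, "latency": 3.0},
    {"app_name": "B", "Answer Relevance": 0.5, "latency": 2.0},
]
board = leaderboard(records, ["Answer Relevance", "latency"])
print(board)
```

If averages are taken only over feedback that has already been computed, the scores can shift between calls. That may explain why the Sentence Window values differ between the two leaderboards in this notebook.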


In [48]:
# tru.run_dashboard()