### Sub Question Engine ###

In [1]:
from config import set_environment
set_environment()

In [2]:
import logging
import sys

# basicConfig already attaches a stdout handler to the root logger; adding a
# second StreamHandler on top of it would print every log record twice.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

In [3]:
import nest_asyncio

nest_asyncio.apply()

In [4]:
from llama_index.core import Settings
from llama_index.core.response.notebook_utils import display_response, display_source_node

from llama_index.core import SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor, KeywordNodePostprocessor

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

from llama_index.core import get_response_synthesizer

INFO:numexpr.utils:Note: NumExpr detected 20 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.


In [5]:
# Various Test Docs

paul_graham = "data/test/paul_graham_essay.txt"

aetna_policy = "data/benefits_qa_store/AetnaSBC.pdf"
company_policy = "data/benefits_qa_store/Company - SPD.pdf"


In [6]:
# Node Parser
chunk_size = 256
chunk_overlap = 20

# Retriever Settings
similarity_top_k = 2

# Context Post Processor Settings
required_key_words = []  # leave empty for no keyword filtering (KeywordNodePostprocessor is commented out below)
excluded_key_words = []
similarity_cutoff = 0.2

# Response Settings
response_mode = "refine"  # other options include "compact" and "tree_summarize"

# Source Node Display Length
source_length = 200

# Documents to index
document_list = [company_policy]

# Query

query = "what coverage do you offer for life, dental, and vision"
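The `chunk_size` and `chunk_overlap` settings control how `SentenceSplitter` cuts each document into nodes: chunks of up to 256 tokens, with neighboring chunks sharing about 20 tokens so that text near a chunk boundary keeps some of its surrounding context. The sketch below is a simplified, hypothetical illustration of that overlap arithmetic only. It slides a fixed window over a pre-tokenized list of integers, whereas the real splitter counts tokens with a tokenizer and prefers to break on sentence boundaries, so its actual chunk edges will differ.

```python
from typing import List, Sequence


def sliding_window_chunks(tokens: Sequence[int], chunk_size: int, chunk_overlap: int) -> List[List[int]]:
    """Split tokens into windows of at most chunk_size items, where each
    window repeats the last chunk_overlap items of the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # this window reached the end
            break
    return chunks


# Hypothetical document of 600 "tokens", split with this notebook's settings.
chunks = sliding_window_chunks(list(range(600)), chunk_size=256, chunk_overlap=20)
```

Each window advances by `chunk_size - chunk_overlap` = 236 tokens, so the 600-token example yields three chunks, and the last one is shorter because it simply runs to the end of the input.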

In [7]:
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large", dimensions=512)
# Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
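Passing `dimensions=512` asks the OpenAI API for a shortened `text-embedding-3-large` embedding instead of the model's full 3072-dimensional vector, which reduces index storage and the cost of each similarity computation. OpenAI's embeddings guide describes the manual equivalent as keeping the leading components and re-normalizing the result to unit length. The function below is a minimal sketch of that manual approach on plain Python lists; it is not claimed to be bit-for-bit identical to what the API returns, and the toy vector is hypothetical.

```python
import math
from typing import List, Sequence


def shorten_embedding(vector: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale them to unit L2 norm."""
    if not 1 <= dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


# Toy 3-d vector standing in for a full-size text-embedding-3-large output.
short = shorten_embedding([3.0, 4.0, 12.0], dims=2)
```

Re-normalizing matters because cosine and dot-product scores are only comparable across vectors that share the same length.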

In [8]:
Settings.llm = OpenAI(temperature=0, model="gpt-4")

In [9]:
documents = SimpleDirectoryReader(input_files=document_list).load_data()

In [10]:
node_parser = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
nodes = node_parser.get_nodes_from_documents(documents)
# Replace the default random node IDs with deterministic ones so that
# retrieval results (e.g. "node-57") are stable across re-runs
for idx, node in enumerate(nodes):
    node.id_ = f"node-{idx}"

In [11]:
index = VectorStoreIndex(nodes, embed_model=Settings.embed_model, show_progress=True)

Generating embeddings:   0%|          | 0/691 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.open

In [12]:
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=similarity_top_k
)
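`VectorIndexRetriever` embeds the query with the same embedding model used for the nodes and returns the `similarity_top_k` nodes whose embeddings score highest against it. The sketch below shows that ranking step using cosine similarity, which is assumed here to match the default behavior of the in-memory vector store behind `VectorStoreIndex`. The node IDs and 2-d vectors are hypothetical; the real embeddings in this notebook have 512 dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float], node_vecs: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every node against the query and return the k best (node_id, score) pairs."""
    scored = [(node_id, cosine_similarity(query_vec, vec)) for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical 2-d embeddings for three nodes.
node_vecs = {"node-0": [1.0, 0.0], "node-1": [0.0, 1.0], "node-2": [1.0, 1.0]}
top = retrieve_top_k([1.0, 0.1], node_vecs, k=2)
```

This brute-force scan is fine for a few hundred nodes such as the 691 indexed here; dedicated vector databases replace it with approximate nearest-neighbor search at larger scale.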

In [13]:
node_postprocessors = [
    #KeywordNodePostprocessor(
    #   required_keywords=required_key_words, exclude_keywords=excluded_key_words
    #),
    SimilarityPostprocessor(similarity_cutoff=similarity_cutoff) 
]
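`SimilarityPostprocessor` runs after retrieval and drops any node whose similarity score falls below `similarity_cutoff`. With a cutoff of 0.2 and only two nodes retrieved, both nodes displayed later for this query (scores of about 0.61 and 0.59) pass, so in this run the filter does not remove anything. The sketch below mimics that filtering on `(node_id, score)` pairs; `node-999` is a hypothetical weak match added only to show a node being dropped, and keeping scores exactly equal to the cutoff is this sketch's choice.

```python
from typing import List, Tuple


def apply_similarity_cutoff(scored_nodes: List[Tuple[str, float]], cutoff: float) -> List[Tuple[str, float]]:
    """Keep only (node_id, score) pairs whose score is at least `cutoff`."""
    return [(node_id, score) for node_id, score in scored_nodes if score >= cutoff]


# The first two scores are the ones displayed for this query; node-999 is hypothetical.
retrieved = [
    ("node-57", 0.6059803326223837),
    ("node-686", 0.5883440210815001),
    ("node-999", 0.15),
]
kept = apply_similarity_cutoff(retrieved, cutoff=0.2)
```

Raising the cutoff (for example toward 0.6) would start discarding the second node here, which trades recall for less irrelevant context in the prompt.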

In [14]:
response_synthesizer = get_response_synthesizer(response_mode=response_mode)
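With `response_mode="refine"`, the synthesizer answers from the first retrieved chunk and then makes one further LLM call per remaining chunk, asking the model to refine its existing answer in light of the new context. Two nodes survive post-processing here, which lines up with the two `chat/completions` requests logged when the query runs. The sketch below shows that control flow with a stub LLM; the prompt wording is hypothetical, since LlamaIndex ships its own QA and refine templates.

```python
from typing import Callable, List


def refine_synthesize(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Answer from the first chunk, then refine that answer once per remaining chunk."""
    if not chunks:
        raise ValueError("refine needs at least one context chunk")
    # Hypothetical prompt wording, not LlamaIndex's actual templates.
    answer = llm(f"Context:\n{chunks[0]}\n\nQuestion: {query}\nAnswer:")
    for chunk in chunks[1:]:
        answer = llm(
            f"Question: {query}\nExisting answer: {answer}\n"
            f"New context:\n{chunk}\n\nRefine the existing answer if the new context helps:"
        )
    return answer


# Stub LLM that records each prompt so the call pattern can be inspected.
prompts: List[str] = []


def stub_llm(prompt: str) -> str:
    prompts.append(prompt)
    return f"draft-{len(prompts)}"


final = refine_synthesize("what coverage do you offer?", ["chunk A", "chunk B"], stub_llm)
```

`tree_summarize` instead summarizes chunks in groups and recursively combines the summaries, which tends to suit broad summarization questions better than incremental refinement.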

In [15]:
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=node_postprocessors
)

In [16]:
query = "what coverage do you offer for life, dental, and vision"

In [17]:
response = query_engine.query(query)
display_response(response)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


**`Final Response:`** The company offers Group Term Life Insurance and Accidental Death & Dismemberment (AD&D) insured by MetLife. Dental and Vision plans are also available. The Dental plan was effective from February 1, 1983, and the Life Insurance from June 1, 1983. The Vision plans became effective on February 1, 1991.

In [18]:
retrievals = retriever.retrieve(query)
for n in retrievals:
    display_source_node(n, source_length=source_length)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


**Node ID:** node-57
**Similarity:** 0.6059803326223837
**Text:** Default Coverage Includes :  
 UnitedHealthcare HSA Medical Plan (Employee Only)  
 Life Insurance: $10,000 Pre -Tax 
 Acciden tal Death & Dismemberment (AD&D) Insurance: $10,000  
 Long Term D...

**Node ID:** node-686
**Similarity:** 0.5883440210815001
**Text:** 2017 
(Original Dental Effective: Date 
February 1, 1983)  
Group Term Life Insurance & 
Accidental Death & Dismemberment 
(AD&D)  Insured -  
Insurer  MetLife  
1 Madison Avenue New York, 
NY 1001...

In [19]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

In [20]:
# Using the LlamaDebugHandler to print the trace of the sub questions
# captured by the SUB_QUESTION callback event type
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

Settings.callback_manager = callback_manager

In [21]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="benefits questions",
            description="Answers questions about employee benefits coverage in the company plan document",
        ),
    ),
]

subquestion_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    use_async=True,
)
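`SubQuestionQueryEngine` asks the LLM to break the original query into sub-questions, each tagged with the name of a tool from `query_engine_tools` (all three target `benefits questions`, as the cell 22 output shows). It runs every sub-question against that tool's query engine and then synthesizes a final answer from the collected sub-answers. With `use_async=True` the sub-queries are issued concurrently, which is why `nest_asyncio` was applied earlier: Jupyter already runs an event loop. The sketch below stubs out both LLM steps and shows only the routing, concurrent execution, and collection of answers; the stub tool and the "Sub question / Response" formatting are assumptions for illustration, not a copy of LlamaIndex internals.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# (tool_name, sub_question) pairs; in the real engine an LLM generates these.
SubQuestion = Tuple[str, str]
Tool = Callable[[str], Awaitable[str]]


async def answer_sub_questions(sub_questions: List[SubQuestion], tools: Dict[str, Tool]) -> List[Tuple[str, str]]:
    """Route each sub-question to the tool it names and await all answers concurrently."""
    coros = [tools[tool_name](question) for tool_name, question in sub_questions]
    answers = await asyncio.gather(*coros)  # results come back in input order
    return [(question, answer) for (_, question), answer in zip(sub_questions, answers)]


def build_synthesis_context(qa_pairs: List[Tuple[str, str]]) -> str:
    """Flatten the sub-answers into one context string for the final synthesis step."""
    return "\n\n".join(f"Sub question: {q}\nResponse: {a}" for q, a in qa_pairs)


async def stub_benefits_tool(question: str) -> str:
    # Hypothetical stand-in for querying the benefits query engine asynchronously.
    await asyncio.sleep(0)
    return f"stub answer to: {question}"


subs = [
    ("benefits questions", "What is the life insurance coverage offered?"),
    ("benefits questions", "What is the dental insurance coverage offered?"),
    ("benefits questions", "What is the vision insurance coverage offered?"),
]
qa_pairs = asyncio.run(answer_sub_questions(subs, {"benefits questions": stub_benefits_tool}))
context = build_synthesis_context(qa_pairs)
```

Because every sub-question gets its own retrieval pass, the final answer can draw on more than the two nodes a single top-2 query retrieves, which is visible in the more detailed response from the sub-question engine.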

In [22]:
response = subquestion_query_engine.query(query)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Generated 3 sub questions.
[benefits questions] Q: What is the life insurance coverage offered?
[benefits questions] Q: What is the dental insurance coverage offered?
[benefits questions] Q: What is the vision insurance coverage offered?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https:

In [23]:
display_response(response)

**`Final Response:`** For life insurance, two options are offered: a pre-tax option of $10,000 or an after-tax option equal to one times your Annual Benefits Compensation. If no initial life insurance coverage election is made, you will automatically be enrolled in the Default Coverage which is the pre-tax coverage amount of $10,000.

For dental insurance, preventive services are covered at 100% of PDP (Network) or R&C (Non-Network). This includes oral exams and prophylaxis/cleaning twice per year, sealants once every 36 months for children up to age 19, topical application of sodium or stannous fluoride twice per year for children up to age 19, and X-rays for diagnosis. Other X-rays should not exceed one full-mouth series in a 36-month period and one set of bitewings twice per year. X-rays must be performed by the dentist or a licensed dental hygienist under the dentist’s supervision to be covered.

For vision insurance, two plans are offered. Vision Plan I allows you to acquire glasses (including frame and lenses) or contact lenses once per year. Vision Plan II allows you to get glasses (frame and lenses) or contacts twice per year and offers a greater coverage level for frames or elective contact lenses. However, you would be responsible for the costs of certain elective services and supplies such as cosmetic lenses, frames exceeding the plan allowance, certain low vision care limitations, and any supplemental tests or fittings associated with contact lenses, among others.