In [5]:
pip install python-dotenv

Collecting python-dotenv
  Using cached python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Using cached python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1
Note: you may need to restart the kernel to use updated packages.


In [7]:
pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [6]:
pip freeze > req.txt

Note: you may need to restart the kernel to use updated packages.


In [40]:
from helpers.helper import get_openai_api_key
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool, FunctionTool
# from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
# from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner
from typing import List
import nest_asyncio

In [28]:
nest_asyncio.apply()
OPENAI_API_KEY = get_openai_api_key()
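
The `get_openai_api_key` helper is not shown in this notebook. As a rough sketch of what such a helper might do, assuming the key is exposed as an environment variable (for example after a `.env` file has been loaded), a lookup could look like the following. The function name `read_api_key_from_env` and the variable `DEMO_API_KEY` are hypothetical and are not part of `helpers.helper`.

```python
import os


def read_api_key_from_env(var_name="OPENAI_API_KEY"):
    """Return the value of an environment variable, failing loudly if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early rather than letting a later API call fail with a vaguer error
        raise RuntimeError(f"{var_name} is not set in the environment")
    return key


# Demo with a placeholder variable so that no real key is needed
os.environ.setdefault("DEMO_API_KEY", "sk-demo-not-a-real-key")
demo_key = read_api_key_from_env("DEMO_API_KEY")
print(len(demo_key) > 0)
```

Raising on a missing key surfaces configuration problems at the top of the notebook instead of at the first LLM call.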

In [29]:
# Load the documents
documents = SimpleDirectoryReader(input_files=['./data/NIPS-2017-attention-is-all-you-need-Paper.pdf']).load_data()

In [30]:
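# Split the documents into nodes based on chunk size.
# Note that chunk_overlap equals chunk_size here; overlap is normally set well below the chunk size.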
splitter = SentenceSplitter(chunk_size=500, chunk_overlap=500)
nodes = splitter.get_nodes_from_documents(documents)
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: NIPS-2017-attention-is-all-you-need-Paper.pdf
file_path: data/NIPS-2017-attention-is-all-you-need-Paper.pdf
file_type: application/pdf
file_size: 569417
creation_date: 2024-09-28
last_modified_date: 2024-09-28

Attention Is All You Need
Ashish Vaswani∗
Google Brain
avaswani@google.comNoam Shazeer∗
Google Brain
noam@google.comNiki Parmar∗
Google Research
nikip@google.comJakob Uszkoreit∗
Google Research
usz@google.com
Llion Jones∗
Google Research
llion@google.comAidan N. Gomez∗†
University of Toronto
aidan@cs.toronto.eduŁukasz Kaiser∗
Google Brain
lukaszkaiser@google.com
Illia Polosukhin∗‡
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, di
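
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch over a list of tokens. It is not how `SentenceSplitter` works internally (that splitter also respects sentence boundaries and counts tokens with a tokenizer); the function `chunk_with_overlap` is a hypothetical illustration that assumes the overlap is strictly smaller than the chunk size.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap items."""
    if not 0 <= chunk_overlap < chunk_size:
        # A stride of zero or less would never advance through the input
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = list(range(10))
no_overlap = chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=0)
with_overlap = chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=2)
print(len(no_overlap), len(with_overlap))
```

A larger overlap shrinks the stride, so the same text yields more chunks and more nodes to embed.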

In [31]:
# Set the LLM and the embedding model in the global settings
Settings.llm = OpenAI(temperature=0, model="gpt-4o")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

In [32]:
# Define the summary index and the vector index.
# The summary index returns all nodes; the vector index returns the nodes whose embeddings are most similar to the query.
summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
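
The difference between the two retrieval styles can be sketched without any library. The snippet below is a toy illustration, not LlamaIndex code: the two-dimensional "embeddings" are made up, and `summary_retrieve` and `vector_retrieve` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summary_retrieve(nodes):
    """Summary-style retrieval: every node is passed on to the LLM."""
    return list(nodes)


def vector_retrieve(nodes, query_vec, top_k=2):
    """Vector-style retrieval: keep only the top_k nodes most similar to the query."""
    ranked = sorted(nodes, key=lambda n: cosine(n["embedding"], query_vec), reverse=True)
    return ranked[:top_k]


# Made-up embeddings, for illustration only
toy_nodes = [
    {"text": "optimizer", "embedding": [1.0, 0.0]},
    {"text": "attention", "embedding": [0.0, 1.0]},
    {"text": "training", "embedding": [0.8, 0.6]},
]
query_vec = [1.0, 0.1]

all_nodes = summary_retrieve(toy_nodes)
top_nodes = vector_retrieve(toy_nodes, query_vec, top_k=2)
print(len(all_nodes), [n["text"] for n in top_nodes])
```

This is why a summary query touches every chunk of the document, while a vector query with `similarity_top_k=2` only ever sees two.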

In [41]:
# Define query engines, which are abstractions over the indexes
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
def vector_query(query: str, page_numbers: List[str]):
    """
    Perform a vector search over an index.
    :param query: string query to be embedded
    :param page_numbers: Filter by set of pages. Pass an empty list to perform vector search over all pages.
    :return: response object from the query engine (the LLM answer plus its source nodes)
    """
    # One exact-match filter per requested page; they are combined with OR below
    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]
    vector_query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        # The keyword argument is `filters`, not `filter`
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )

    response_from_llm = vector_query_engine.query(query)
    return response_from_llm
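
The OR condition over `page_label` filters can be illustrated with plain dictionaries. This is a simplified sketch rather than the `MetadataFilters` implementation: `matches_filters` is a hypothetical helper, and treating an empty filter list as "match everything" is an assumption made for the sketch.

```python
def matches_filters(metadata, filters):
    """Return True if the metadata satisfies at least one exact-match filter (OR)."""
    if not filters:
        return True  # assumption for this sketch: no filters means no restriction
    return any(metadata.get(f["key"]) == f["value"] for f in filters)


# Toy node metadata, for illustration only
toy_metadata = [
    {"page_label": "4"},
    {"page_label": "5"},
    {"page_label": "6"},
]
page_filters = [{"key": "page_label", "value": p} for p in ["5", "6"]]

kept = [m["page_label"] for m in toy_metadata if matches_filters(m, page_filters)]
print(kept)
```

With an AND condition the same two filters could never both match a single node, which is why OR is the natural choice when filtering by several pages.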

In [42]:
# Define tools, which are abstractions over query engines and plain functions
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description='Summarizing the documents'
)

vector_tool = FunctionTool.from_defaults(
    name="vector_tool",
    description="Retrieve specific context from the documents",
    fn=vector_query
)

In [36]:
# Define the router using the built-in selectors from LlamaIndex
# query_engine = RouterQueryEngine(
#     selector=LLMSingleSelector.from_defaults(),
#     query_engine_tools=[summary_tool],
#     verbose=True
# )

In [37]:
# Get a summarized response from the document
# summary_response = query_engine.query("What is the summary of this document?")
# print(str(summary_response))

Selecting query engine 0: The question directly asks for a summary of the document, and the provided choice explicitly mentions summarizing the documents..
The document presents the Transformer, a novel network architecture for sequence transduction tasks, which relies entirely on attention mechanisms, eliminating the need for recurrent or convolutional neural networks. The Transformer model demonstrates superior performance in machine translation tasks, achieving state-of-the-art BLEU scores on the WMT 2014 English-to-German and English-to-French translation tasks. The architecture allows for significant parallelization, reducing training time and computational costs. The document details the model's architecture, including multi-head attention, position-wise feed-forward networks, and positional encodings. It also discusses the training regimen, including data batching, hardware setup, optimizer settings, and regularization techniques. The results show that the Tra

In [38]:
# The number of source nodes equals the number of chunks the documents were split into
# print(len(summary_response.source_nodes))

131


In [14]:
# Get a specific query response from the document
# vector_response = query_engine.query("Which optimizer is used to train the model for this paper?")
# print(str(vector_response))

Selecting query engine 1: The question asks for specific information about the optimizer used to train the model, which requires retrieving specific context from the documents..
The optimizer used to train the model for this paper is the Adam optimizer.


In [15]:
# Only a few nodes are used, since the vector engine retrieves only the relevant nodes rather than every node in the document store
# print(len(vector_response.source_nodes))

2


In [45]:
# Let the LLM choose between the summary tool and the vector tool (with page-number filtering) via tool calling
response = Settings.llm.predict_and_call(
    [summary_tool, vector_tool],
    "What are the input and output layer dimensionality for feed forward networks as described on page 5?",
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "input and output layer dimensionality for feed forward networks", "page_numbers": ["5"]}
=== Function Output ===
The input and output layer dimensionality for the feed-forward networks is 512.
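
The log above shows the model emitting a tool name and JSON arguments, which are then used to call the function. A minimal sketch of that dispatch step, independent of LlamaIndex, might look like this. `vector_tool_stub`, `TOOLS`, and `dispatch` are hypothetical names, and the stub only echoes its arguments instead of querying an index.

```python
import json


def vector_tool_stub(query, page_numbers):
    """Hypothetical stand-in for vector_query: returns its arguments unchanged."""
    return {"query": query, "pages": page_numbers}


# Registry mapping the tool names the model may emit to Python callables
TOOLS = {"vector_tool": vector_tool_stub}


def dispatch(tool_name, raw_args):
    """Look up a tool by name and call it with JSON-encoded keyword arguments."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    kwargs = json.loads(raw_args)
    return TOOLS[tool_name](**kwargs)


# Arguments copied from the "Calling function" line in the output above
raw_args = '{"query": "input and output layer dimensionality for feed forward networks", "page_numbers": ["5"]}'
result = dispatch("vector_tool", raw_args)
print(result["pages"])
```

The model never runs code itself; it only proposes a name and arguments, and the caller decides whether and how to execute them.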


In [46]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '5', 'file_name': 'NIPS-2017-attention-is-all-you-need-Paper.pdf', 'file_path': 'data/NIPS-2017-attention-is-all-you-need-Paper.pdf', 'file_type': 'application/pdf', 'file_size': 569417, 'creation_date': '2024-09-28', 'last_modified_date': '2024-09-28'}
{'page_label': '5', 'file_name': 'NIPS-2017-attention-is-all-you-need-Paper.pdf', 'file_path': 'data/NIPS-2017-attention-is-all-you-need-Paper.pdf', 'file_type': 'application/pdf', 'file_size': 569417, 'creation_date': '2024-09-28', 'last_modified_date': '2024-09-28'}
