In [None]:
%%capture
!pip install llama-index==0.10.37 cohere==5.5.0 openai==1.30.1 llama-index-embeddings-openai==0.1.9 llama-index-llms-cohere==0.2.0 qdrant-client==1.9.1 llama-index-vector-stores-qdrant==0.2.8 

In [1]:
import os

from getpass import getpass
import nest_asyncio

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv()

True

In [2]:
# .get() returns None when the variable is unset, so the getpass fallback can run
CO_API_KEY = os.environ.get('CO_API_KEY') or getpass("Enter your Cohere API key: ")

In [3]:
# .get() returns None when the variable is unset, so the getpass fallback can run
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY') or getpass("Enter your OpenAI API key: ")

In [4]:
# .get() returns None when the variable is unset, so the getpass fallback can run
QDRANT_URL = os.environ.get('QDRANT_URL') or getpass("Enter your Qdrant URL: ")

In [5]:
# .get() returns None when the variable is unset, so the getpass fallback can run
QDRANT_API_KEY = os.environ.get('QDRANT_API_KEY') or getpass("Enter your Qdrant API Key: ")

# Querying

- 📊 Now that you've loaded your data and built an index, it's time to focus on the core of an LLM application: querying.

- 🤖 Querying at its simplest involves making a prompt call to an LLM - this could be asking a question, requesting a summary, or giving more complex instructions.

- 🔗 For more advanced uses, querying can include repeated or chained prompt calls to an LLM, or even a reasoning loop across multiple components.

Let's first instantiate the `qdrant` vector store.

In [6]:
import qdrant_client
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext

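# `model` is the documented parameter name; pass the key explicitly in case it was typed in via getpass
embed_model = OpenAIEmbedding(model="text-embedding-3-small", api_key=OPENAI_API_KEY)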

# initialize qdrant client
client = qdrant_client.QdrantClient(
    url=QDRANT_URL, 
    api_key=QDRANT_API_KEY,
)

vector_store = QdrantVectorStore(
    client=client, 
    collection_name="it_can_be_done",
    embed_model=embed_model,
)

# assign qdrant vector store to storage context
storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    )

# load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, 
    embed_model=embed_model,
    storage_context=storage_context
)

# 🧐 The `QueryEngine`

A `QueryEngine` is a higher-level construct that uses an `Index` (and by extension, a `Retriever`) to answer queries.

It uses the `Retriever` to fetch the relevant data, then applies additional logic to turn that data into a response to the query.

Here's what happens under the hood:

- 📚 **Retrieval**: Find and return the most relevant documents from the `Index` using strategies like "top-k" semantic retrieval.

- 🔧 **Postprocessing**: Optionally rerank, transform, or filter retrieved Nodes, often based on specific metadata like keywords.

- 🔄 **Response Synthesis**: Combine the query, relevant data, and prompt to generate a response from your LLM.
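
The three stages above can be sketched in plain Python. This is a toy illustration under stated assumptions, not LlamaIndex's implementation: the two-dimensional vectors, the `fake_llm` stub, and every function name below are hypothetical stand-ins for an embedding model, an LLM, and the library's internals.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, nodes, top_k=2):
    """Retrieval: score every node and keep the top_k most similar."""
    scored = [(cosine(query_vec, vec), text) for text, vec in nodes]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

def postprocess(scored_nodes, cutoff=0.5):
    """Postprocessing: drop nodes whose score falls below the cutoff."""
    return [(score, text) for score, text in scored_nodes if score >= cutoff]

def synthesize(question, scored_nodes, llm):
    """Response synthesis: combine the query, context, and prompt for the LLM."""
    context = "\n".join(text for _, text in scored_nodes)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)

# Hypothetical toy data: (text, embedding) pairs.
nodes = [
    ("If, by Rudyard Kipling", [1.0, 0.0]),
    ("The Pulley, by George Herbert", [0.0, 1.0]),
    ("A poem about perseverance", [0.8, 0.6]),
]

def fake_llm(prompt):
    # Stub LLM: reports how many context lines it was given.
    context = prompt.split("Context:\n")[1].split("\n\nQuestion:")[0]
    return f"answered from {len(context.splitlines())} node(s)"

query_vec = [1.0, 0.0]
hits = postprocess(retrieve(query_vec, nodes, top_k=2), cutoff=0.5)
answer = synthesize("Which poems are by Kipling?", hits, fake_llm)
print(answer)
```

A real query engine swaps each stub for a configurable component, which is why the retriever, postprocessors, and response synthesizer can be customized independently.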

Note that there is [a wide variety of Query Engines](https://github.com/run-llama/llama_index/tree/main/llama-index-core/llama_index/core/query_engine) available in LlamaIndex. We won't touch on all of them in this course, but I encourage you to explore what's available and think about how you might use them.


In [7]:
from llama_index.llms.cohere import Cohere

# pass the key explicitly so it also works when it was typed in via getpass
llm = Cohere(model="command-r-plus", api_key=CO_API_KEY)

query_engine = index.as_query_engine(llm=llm, streaming=True)

response = query_engine.query(
    "What do the Sikh Stoics believe?"
)

response.print_response_stream()

Sorry, I can't find any information about what the Sikh Stoics believe. Can you clarify what you're looking for?

In [11]:
response.source_nodes

[NodeWithScore(node=TextNode(id_='b9e49bc9-1074-4521-b359-b5a07da3f779', embedding=None, metadata={'file_path': 'data/pg10763.txt', 'file_name': 'pg10763.txt', 'file_type': 'text/plain', 'file_size': 405159, 'creation_date': '2024-05-21', 'last_modified_date': '2024-05-05'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='data/pg10763.txt', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': 'data/pg10763.txt', 'file_name': 'pg10763.txt', 'file_type': 'text/plain', 'file_size': 405159, 'creation_date': '2024-05-21', 'last_modified_date': '2024-05-05'}, hash='ce8443a693c1546f39c6bb0336c4d2f7929dd66c58288850e8b2df75ffa28edb'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='723274c5-5a11-4

In [12]:
response.source_nodes[0].get_text()

'"One needs seven souls for this long requiting,"\r\n    He said at length.\r\n\r\n  "Six times have I come where my first hope jeered me\r\n    And laughed me to scorn;\r\n  But now I fear as I never feared me\r\n    To fall forsworn.\r\n\r\n  "God! when they fight upright and at me\r\n    I give them back\r\n  Even such blows as theirs that combat me;\r\n    But now, alack!\r\n\r\n  "They fight with the wiles of fiends escaping\r\n    And underhand.\r\n  Six times, O God, and my wounds are gaping!\r\n    I--reel to stand.\r\n\r\n  "Six battles\' span! By this gasping breath\r\n    No pantomime.\r\n  Tis all that I can. I am sick unto death.\r\n    And--a seventh time?\r\n\r\n  "This is beyond all battles\' soreness!"\r\n    Then his wonder cried;\r\n  For Laughter, with shield and steely harness,\r\n    Stood up at his side!\r\n\r\n\r\n_William Rose Benét,_\r\n\r\nFrom "Merchants from Cathay."\r\n\r\n\r\n\r\n\r\nSUBMISSION\r\n\r\n\r\nThere are times when the right thing to do is to s

In [13]:
response.source_nodes[1].get_text()

'When God at first made Man,\r\n  Having a glass of blessings standing by;\r\n  Let us (said He) pour on him all we can:\r\n  Let the world\'s riches, which disperséd lie,\r\n    Contract into a span.\r\n\r\n    So strength first made a way;\r\n  Then beauty flow\'d, then wisdom, honor, pleasure\r\n  When almost all was out, God made a stay,\r\n  Perceiving that alone, of all His treasure,\r\n    Rest in the bottom lay.\r\n\r\n    For if I should (said He)\r\n  Bestow this jewel also on My creature,\r\n  He would adore My gifts instead of Me,\r\n  And rest in Nature, not the God of Nature.\r\n    So both should losers be.\r\n\r\n    Yet let him keep the rest,\r\n  But keep them with repining restlessness:\r\n  Let him be rich and weary, that at least,\r\n  If goodness lead him not, yet weariness\r\n    May toss him to My breast.\r\n\r\n\r\n_George Herbert._\r\n\r\n\r\n\r\n\r\nA PHILOSOPHER\r\n\r\n\r\n"The web of our life is of mingled yarn, good and ill together," says\r\nShakespeare. 

### Streaming response

In [14]:
response = query_engine.query(
    "What poems by Rudyard Kipling are in this book?"
)

response.print_response_stream()

The poems by Rudyard Kipling that are in the book are "If" and "When Earth's Last Picture Is Painted".
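
Streaming means the answer is consumed piece by piece as the LLM produces it, rather than after generation has finished. The sketch below imitates that with a plain Python generator. `fake_token_stream` and `print_stream` are hypothetical helpers written for this illustration, not LlamaIndex functions, and the streamed text is made up.

```python
import sys
import time

def fake_token_stream(text, delay=0.0):
    """Hypothetical stand-in for an LLM that yields its answer word by word."""
    for word in text.split(" "):
        time.sleep(delay)  # simulate generation latency
        yield word + " "

def print_stream(token_gen, out=sys.stdout):
    """Write each token as soon as it arrives, then return the full text."""
    pieces = []
    for token in token_gen:
        out.write(token)
        out.flush()  # show partial output immediately
        pieces.append(token)
    out.write("\n")
    return "".join(pieces).strip()

full_text = print_stream(fake_token_stream("tokens arrive one at a time"))
```

Because the consumer only iterates, the same loop could forward tokens to a web socket or UI widget instead of standard output.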

### 💬 Chat Engine

In [15]:
chat_engine = index.as_chat_engine(llm=llm)

chat_engine.streaming_chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

Assistant:  Rudyard Kipling's book of poems includes topics such as the military, travel, and perseverance.



### Chat modes

#### Simple

Chat with the LLM directly, without using a knowledge base. To use this mode, set `chat_mode="simple"`.

Corresponds to [`SimpleChatEngine`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/simple.py). 

#### Condense question

Generate a standalone question from the conversation context and the last message, then ask the query engine for a response. To use this mode, set `chat_mode="condense_question"`.

Corresponds to [`CondenseQuestionChatEngine`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/condense_question.py).
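
As a rough sketch of the condense-question flow, assuming the LLM and the query engine can each be modelled as a plain function: `condense_question`, `chat_turn`, `fake_llm`, and `fake_query_engine` are hypothetical names, and the prompt wording is illustrative rather than the one LlamaIndex uses.

```python
def condense_question(history, last_message, llm):
    """Ask the LLM to rewrite the last message as a standalone question."""
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = (
        "Given the conversation below, rewrite the follow-up message as a "
        "standalone question.\n\n"
        f"{transcript}\n\nFollow-up: {last_message}\nStandalone question:"
    )
    return llm(prompt).strip()

def chat_turn(history, last_message, llm, query_engine):
    """One chat turn: condense, query, then record both sides in the history."""
    standalone = condense_question(history, last_message, llm)
    answer = query_engine(standalone)
    history.append(("user", last_message))
    history.append(("assistant", answer))
    return standalone, answer

def fake_llm(prompt):
    # Hypothetical LLM stub: always returns the same rewritten question.
    return " Which poems by Rudyard Kipling are in this book? "

def fake_query_engine(question):
    # Hypothetical query engine stub: echoes the question it received.
    return f"[answer to: {question}]"

history = [
    ("user", "Tell me about Rudyard Kipling."),
    ("assistant", "He is one of the authors in this book."),
]
standalone, answer = chat_turn(
    history, "Which of his poems are in this book?", fake_llm, fake_query_engine
)
print(standalone)
print(answer)
```

The query engine only ever sees the rewritten question, so a follow-up such as "his poems" is resolved before retrieval happens.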

#### Context 

Retrieve text from the index based on the user's message, then use that text as context when formulating a response. To use this mode, set `chat_mode="context"`.

Corresponds to [`ContextChatEngine`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/context.py).
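
One way to picture context mode is as message assembly: the retrieved text is placed alongside the system prompt before the user's message is sent to the LLM. The sketch below assumes a chat-style message list. `build_context_messages` and `fake_retrieve` are hypothetical, and the exact template LlamaIndex uses may differ.

```python
def build_context_messages(system_prompt, retrieved_texts, history, user_message):
    """Assemble the message list sent to the LLM for one chat turn."""
    context = "\n\n".join(retrieved_texts)
    system_content = f"{system_prompt}\n\nContext information:\n{context}"
    return (
        [{"role": "system", "content": system_content}]
        + list(history)
        + [{"role": "user", "content": user_message}]
    )

def fake_retrieve(message):
    # Hypothetical retriever: returns canned passages regardless of the message.
    return ["Passage one from the book.", "Passage two from the book."]

user_message = "What is the book about?"
messages = build_context_messages(
    system_prompt="You are a helpful chatbot.",
    retrieved_texts=fake_retrieve(user_message),
    history=[],
    user_message=user_message,
)
for message in messages:
    print(message["role"], "->", message["content"][:40])
```

Unlike condense-question mode, the user's message is used for retrieval as written, so no extra LLM call is needed to rewrite it.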

#### Condense plus context

Condense the conversation and the latest user message into a standalone question. Then build a context for the standalone question from a retriever. Finally, pass the context, along with the prompt and the user message, to the LLM to generate a response. To use this mode, set `chat_mode="condense_plus_context"`.

Corresponds to [`CondensePlusContextChatEngine`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/chat_engine/condense_plus_context.py).

#### ReAct

Use a ReAct agent loop with query engine tools. To use this mode, set `chat_mode="react"`.

Corresponds to [`ReActAgent`](https://github.com/run-llama/llama_index/blob/37c95965426bddae82cec1ad49d3aa82e8bfe819/llama-index-core/llama_index/core/agent/react/base.py#L36).

#### Best

Select the best chat engine based on the current LLM. To use this mode, set `chat_mode="best"`.

Corresponds to `OpenAIAgent` if you are using an OpenAI model that supports the function calling API; otherwise, corresponds to `ReActAgent`.

In [None]:
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    llm=llm,
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about a book of poems called 'It Can Be Done'."
    ),
)

chat_engine.streaming_chat_repl()
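
The `token_limit` above caps how much chat history is sent back to the LLM on each turn. A rough sketch of that idea follows. It is not `ChatMemoryBuffer`'s actual algorithm: `count_tokens` and `trim_history` are hypothetical helpers, and whitespace-separated words stand in for real tokenizer tokens.

```python
def count_tokens(text):
    # Assumption: whitespace-separated words approximate tokenizer tokens.
    return len(text.split())

def trim_history(messages, token_limit):
    """Return the most recent messages whose total token count fits the limit."""
    kept = []
    total = 0
    # Walk backwards so the newest messages are kept first.
    for role, text in reversed(messages):
        cost = count_tokens(text)
        if total + cost > token_limit:
            break
        kept.append((role, text))
        total += cost
    return list(reversed(kept))

history = [
    ("user", "Tell me about the poem If"),
    ("assistant", "It is a poem about composure"),
    ("user", "Who wrote it"),
]
trimmed = trim_history(history, token_limit=10)
print(trimmed)
```

With a limit of 10, the oldest message no longer fits and is dropped, so a small limit trades long-range recall for a shorter prompt.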

# Customizing Querying

- 🔧 **Customizing Retrieval**: Use LlamaIndex's low-level composition API to set the retriever's `similarity_top_k` value, which controls how many nodes are retrieved per query.

- 📈 **Adding Post-Processing**: Add a step that keeps only nodes meeting a minimum similarity score, trading breadth of context against relevance.

- 🎚️ **SimilarityPostprocessor**: Sets the similarity score threshold. It relies on similarity scores, so it is compatible only with embedding-based retrievers.
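
To see why a similarity threshold only makes sense with embedding-based retrieval, consider this minimal sketch. `ScoredNode` and `apply_similarity_cutoff` are hypothetical names and the scores are invented. Dropping nodes that have no score is an assumption of this sketch, not a statement about how the library treats them.

```python
from typing import List, NamedTuple, Optional

class ScoredNode(NamedTuple):
    text: str
    score: Optional[float]  # None when the retriever produces no similarity score

def apply_similarity_cutoff(nodes: List[ScoredNode], cutoff: float) -> List[ScoredNode]:
    """Keep only nodes that carry a score at or above the cutoff."""
    return [n for n in nodes if n.score is not None and n.score >= cutoff]

# Hypothetical results, as if a retriever had returned its top 5 candidates.
retrieved = [
    ScoredNode("stanza A", 0.61),
    ScoredNode("stanza B", 0.47),
    ScoredNode("stanza C", 0.42),
    ScoredNode("stanza D", 0.39),
    ScoredNode("keyword hit", None),
]

kept = apply_similarity_cutoff(retrieved, cutoff=0.42)
print([n.text for n in kept])
```

A larger top-k widens the candidate pool and the cutoff then trims it, so the two settings are best tuned together. If the cutoff is too strict, no nodes survive and the LLM receives no context.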

In [None]:
from llama_index.core import get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# configure a retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

# configure a post processor
similarity_processor = SimilarityPostprocessor(similarity_cutoff=0.42)

# configure a response synthesizer
response_synthesizer = get_response_synthesizer(llm=llm)

# create a query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[similarity_processor],
)

In [None]:
query_engine.query("Compare the portrayal of internal versus external battles in the narratives and poems")

In [16]:
client.close()