# Trailhead


In [1]:
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
import nest_asyncio
nest_asyncio.apply()
from loguru import logger

In [2]:
import os

QDRANT_API_KEY = os.environ.get("QDRANT_API_KEY", "")
QDRANT_HOSTED_URL = os.environ.get("QDRANT_HOSTED_URL", "")
# Environment values are strings, so cast to int to keep the type consistent with the default
QDRANT_PORT = int(os.environ.get("QDRANT_PORT", 6333))

len(QDRANT_API_KEY), QDRANT_API_KEY[:10], len(QDRANT_HOSTED_URL), QDRANT_HOSTED_URL[:10], QDRANT_PORT, type(QDRANT_PORT)

(54, 'PM4CDFLF-s', 75, 'https://94', 6333, int)

In [3]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

llm = OpenAI(
    model="gpt-4o",
)
embedding_model = OpenAIEmbedding(
    model="text-embedding-3-large",
)

In [4]:
from llama_index.core import VectorStoreIndex
from llama_index.core.chat_engine.types import ChatMode
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

COLLECTION_NAME = "trailhead"
q_client = qdrant_client.QdrantClient(
    url=QDRANT_HOSTED_URL,
    port=QDRANT_PORT,
    api_key=QDRANT_API_KEY,
)

In [5]:
trailhead_vector_store = QdrantVectorStore(
    collection_name=COLLECTION_NAME,
    client=q_client,
    prefer_grpc=True,
    parallel=8,
    enable_hybrid=False
)

trailhead_index = VectorStoreIndex.from_vector_store(
    vector_store=trailhead_vector_store,
    embed_model=embedding_model,
    show_progress=True
)

In [6]:
trailhead_vector_store._collection_exists(COLLECTION_NAME)

True

In [7]:
trailhead_retriever = trailhead_index.as_retriever(
    similarity_top_k=8,
    vector_store_query_mode=VectorStoreQueryMode.MMR,
    vector_store_kwargs={
        "mmr_prefetch_k": 16,
    },
)

In [8]:
nodes = trailhead_retriever.retrieve("What AI products are there for advertisers?")
len(nodes)

8
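The retriever above asks for `VectorStoreQueryMode.MMR` with `similarity_top_k=8` and `mmr_prefetch_k=16`. As a rough illustration of what maximal marginal relevance does, here is a minimal pure-Python sketch. It assumes the store first fetches a pool of candidates by similarity and then greedily picks results that are relevant to the query but not redundant with what has already been picked. The names `cosine`, `mmr_select`, and `lambda_mult` are hypothetical and are not taken from LlamaIndex or Qdrant; the real implementation may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidates, top_k, lambda_mult=0.5):
    """Greedy MMR over a prefetched pool (hypothetical helper).

    candidates maps an id to its embedding. lambda_mult=1.0 ranks purely by
    relevance; lower values penalise similarity to already selected items.
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < top_k:

        def score(item):
            _, vec = item
            relevance = cosine(query_vec, vec)
            redundancy = max(
                (cosine(vec, candidates[s]) for s in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best_id, _ = max(remaining.items(), key=score)
        selected.append(best_id)
        del remaining[best_id]
    return selected


# Toy pool: "a_dup" is nearly identical to "a", "b" points elsewhere
query = [1.0, 0.0]
pool = {"a": [1.0, 0.0], "a_dup": [0.99, 0.01], "b": [0.6, 0.8]}

diverse = mmr_select(query, pool, top_k=2, lambda_mult=0.3)
relevance_only = mmr_select(query, pool, top_k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With a low `lambda_mult` the near-duplicate is skipped in favour of a less similar candidate, which is the behaviour MMR is usually chosen for when a corpus contains many overlapping chunks.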

In [9]:
from llama_index.core.response_synthesizers import get_response_synthesizer, ResponseMode
trailhead_engine = trailhead_index.as_chat_engine(
    chat_mode=ChatMode.BEST,
    llm=llm,
    similarity_top_k=8,
    vector_store_query_mode=VectorStoreQueryMode.MMR,
    vector_store_kwargs={
        "mmr_prefetch_k": 16,
    },
    response_synthesizer=get_response_synthesizer(
        response_mode=ResponseMode.TREE_SUMMARIZE,
    )
)

In [23]:
response = trailhead_engine.chat("Where should I be using batch Apex?")
type(response), vars(response)

(llama_index.core.chat_engine.types.AgentChatResponse,
 {'response': "Batch Apex in Salesforce is ideal for scenarios where you need to process large volumes of data asynchronously. Here are some common use cases:\n\n1. **Data Cleansing**: Cleaning and updating large datasets to ensure data quality and consistency.\n\n2. **Data Migration**: Moving large amounts of data from one system to another, or within Salesforce, without hitting governor limits.\n\n3. **Complex Calculations**: Performing calculations on large datasets that would otherwise exceed normal processing limits.\n\n4. **Scheduled Jobs**: Running regular maintenance tasks or updates on large datasets at scheduled intervals.\n\n5. **Data Archiving**: Archiving old records to maintain system performance while keeping historical data accessible.\n\nBatch Apex is particularly useful because it processes data in smaller chunks, helping to manage and optimize resource usage while adhering to Salesforce's governor limits.",
  'so

In [24]:
from IPython.display import Markdown, display

In [25]:
display(Markdown(response.response)), response.source_nodes[0].score if len(response.source_nodes) > 0 else ""

Batch Apex in Salesforce is ideal for scenarios where you need to process large volumes of data asynchronously. Here are some common use cases:

1. **Data Cleansing**: Cleaning and updating large datasets to ensure data quality and consistency.

2. **Data Migration**: Moving large amounts of data from one system to another, or within Salesforce, without hitting governor limits.

3. **Complex Calculations**: Performing calculations on large datasets that would otherwise exceed normal processing limits.

4. **Scheduled Jobs**: Running regular maintenance tasks or updates on large datasets at scheduled intervals.

5. **Data Archiving**: Archiving old records to maintain system performance while keeping historical data accessible.

Batch Apex is particularly useful because it processes data in smaller chunks, helping to manage and optimize resource usage while adhering to Salesforce's governor limits.

(None, '')

In [22]:

response = trailhead_engine.chat("What is Future Apex used for?  Please give a concrete example.")
display(Markdown(response.response)), len(response.source_nodes)

Future Apex in Salesforce is used for executing operations asynchronously, which allows them to run in the background without blocking the user interface. This is particularly useful for tasks that involve callouts to external systems or operations that are resource-intensive and do not need to be completed immediately.

### Concrete Example:

**Scenario**: Suppose you have a Salesforce application that needs to update a third-party inventory management system whenever a product's stock level changes in Salesforce. This update involves making an HTTP callout to the external system's API.

**Using Future Apex**:
- **Problem**: Making a callout to an external system cannot be done in a synchronous transaction in Salesforce because it might take too long and affect the user experience.
- **Solution**: Use Future Apex to handle the callout asynchronously.

```apex
public class InventoryService {
    @future(callout=true)
    public static void updateInventory(String productId, Integer newStockLevel) {
        // Construct the HTTP request
        HttpRequest req = new HttpRequest();
        req.setEndpoint('https://api.inventorysystem.com/updateStock');
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setBody('{"productId": "' + productId + '", "stockLevel": ' + newStockLevel + '}');
        
        // Send the HTTP request
        Http http = new Http();
        HttpResponse res = http.send(req);
        
        // Handle the response from the inventory system
        if (res.getStatusCode() == 200) {
            // Log success or update Salesforce records if needed
        } else {
            // Handle errors, possibly logging them for later review
        }
    }
}
```

**How it Works**:
- When a product's stock level changes, the `updateInventory` method is called.
- The method is annotated with `@future(callout=true)`, allowing it to run asynchronously and make HTTP callouts.
- The callout is made to the external inventory system to update the stock level.
- The user can continue interacting with the Salesforce application without waiting for the callout to complete.

This approach ensures that the integration with the external system does not disrupt the user experience and adheres to Salesforce's governor limits.

(None, 0)

## Sentence Window Node Parsing

We use the `SentenceWindowNodeParser` to parse documents into single sentences per node. Each node also contains a "window" with the sentences on either side of the node sentence.

After retrieval, and before the retrieved sentences are passed to the LLM, the `MetadataReplacementPostProcessor` replaces each single sentence with its window of surrounding sentences.

This is most useful for large documents/indexes, as it helps to retrieve more fine-grained details.

In this notebook the sentence window is set to 3 sentences on either side of the original sentence (`window_size=3`).

In this case, the chunk size settings are ignored in favor of the window settings.
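To make the mechanism concrete, here is a minimal pure-Python sketch of the idea described above. It is not the LlamaIndex implementation: the helpers `build_sentence_windows` and `replace_with_window` are hypothetical, the regex sentence splitter is deliberately naive, and joining the window with single spaces is an assumption. Only the metadata keys `window` and `original_text` are taken from the configuration used in this notebook.

```python
import re


def build_sentence_windows(text, window_size=3):
    """Create one node per sentence and store its neighbours as metadata."""
    # Naive split on whitespace that follows ., ! or ? (illustration only)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            {
                "text": sentence,  # this is what gets embedded and retrieved
                "metadata": {
                    "original_text": sentence,
                    "window": " ".join(sentences[start:end]),
                },
            }
        )
    return nodes


def replace_with_window(node, key="window"):
    """After retrieval, swap the node text for its stored window, if any."""
    return {**node, "text": node["metadata"].get(key, node["text"])}


doc = (
    "Batch Apex handles large jobs. Future methods run later. "
    "Queueable Apex chains jobs. Scheduled Apex runs on a timer."
)
nodes = build_sentence_windows(doc, window_size=1)
expanded = replace_with_window(nodes[1])

print(nodes[1]["text"])
print(expanded["text"])
```

The retrieval step matches against the short single-sentence text, while the LLM receives the wider window, which is why nodes that were ingested without a `window` key are passed through unchanged in this sketch.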


In [27]:
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.node_parser import SentenceSplitter

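# Sentence-window parser: one sentence per node, neighbours stored in metadata
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# Baseline chunk-based splitter, kept under its own name so it does not
# overwrite the sentence-window parser defined above
text_splitter = SentenceSplitter.from_defaults(
    chunk_size=512,
    chunk_overlap=32,
)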

In [29]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embedding_model
Settings.text_splitter = node_parser
Settings.node_parser = node_parser

In [35]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

wider_query_engine = trailhead_index.as_query_engine(
    similarity_top_k=8,
    vector_store_query_mode=VectorStoreQueryMode.MMR,
    vector_store_kwargs={
        "mmr_prefetch_k": 16,
    },
    # node_postprocessors expects a list of postprocessors
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)


In [37]:
response = wider_query_engine.query("What is Future Apex used for?  Please give a concrete example.")
display(Markdown(response.response)), len(response.source_nodes)


Future Apex is used to run processes asynchronously in Salesforce, allowing for operations that can be executed at a later time when system resources are available. This is particularly useful for long-running operations or when you need to call an external web service without holding up the main execution thread.

A concrete example of using Future Apex could be updating a large number of records in Salesforce. Instead of updating all records synchronously, which could hit governor limits, you can use a Future method to perform the updates asynchronously. This way, the updates are processed in the background, freeing up the main thread to continue executing other operations.

(None, 8)

Let's check the original sentence that was retrieved for each node, as well as the actual window of sentences that was sent to the LLM.


In [38]:
window = response.source_nodes[0].node.metadata
window


{'file_path': '/var/folders/9v/753h998178382xkzg96v6mwr0000gn/T/tmpizt689yk.zip',
 'file_name': 'tmpizt689yk.zip',
 'file_type': 'application/zip',
 'file_size': 2098792,
 'creation_date': '2025-03-22',
 'last_modified_date': '2025-03-22',
 'questions_this_excerpt_can_answer': '1. What is the file path of the temporary zip file mentioned in the context?\n2. What is the name of the directory within the zip file that contains the file `async_apex.md`?\n3. What is the specific file path for the `debugging_diagnostics.md` file within the zip archive?\n4. What is the content or purpose of the file `async_apex.md` as indicated by its location in the directory structure?\n5. What metadata or additional information is provided about the `async_apex.md` file in the context?',
 'document_title': '"Comprehensive Guide to Advanced Technologies: Salesforce, AWS, AI, Automation, Cybersecurity, and Nonprofit Cloud Management"',
 'excerpt_keywords': 'Salesforce, AWS, AI, Automation, Cybersecurity'}

In [None]:
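metadata = response.source_nodes[0].node.metadata
# Both keys exist only on nodes that were ingested with SentenceWindowNodeParser
original_text = metadata.get("original_text", "(not present in metadata)")
window_text = metadata.get("window", "(not present in metadata)")

display(Markdown(f"**Original Sentence:**\n{original_text}"))
display(Markdown(f"**Window:**\n{window_text}"))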