# Part 2: Multiple Knowledge Sources

In Part 1, you queried a single knowledge source. In Part 2, you'll scale up to query multiple knowledge sources (HR and health docs) from a single knowledge base. You'll learn how to control which sources get queried and how to guide the knowledge base's agentic behavior with natural-language instructions.

## Step 1: Load Environment Variables

Run the cell below to load the configuration for your Azure resources. When prompted for a kernel, choose the **.venv(3.12.1)** environment that was created for you.

Note that this time, the knowledge base name reflects that we're working with multiple sources: `hr-and-health-docs-knowledge-base`.

> **⚠️ Troubleshooting**
>
> If code cells get stuck and keep spinning, select **Restart** from the notebook toolbar at the top. If the issue persists after a couple of tries, close VS Code completely and reopen it.

In [None]:
import os

from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv(override=True) # take environment variables from .env.

# Azure AI Search configuration
endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
credential = DefaultAzureCredential()

# Knowledge base name
knowledge_base_name = "hr-and-health-docs-knowledge-base"

# Azure OpenAI configuration
azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
azure_openai_chatgpt_deployment = os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "gpt-4.1")
azure_openai_chatgpt_model_name = os.getenv("AZURE_OPENAI_CHATGPT_MODEL_NAME", "gpt-4.1")

print("Environment variables loaded")
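The cell above raises a `KeyError` on the first missing variable. If you'd rather see every missing setting at once, a small helper like the one below could run right after `load_dotenv`. It's a minimal sketch: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and the list only covers the two variables this notebook reads with `os.environ[...]`.

```python
import os

# Hypothetical list: the variables this notebook reads without a default.
REQUIRED_VARS = ("AZURE_SEARCH_SERVICE_ENDPOINT", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names from `required` that are unset or blank in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

In the notebook you could call it as `if missing := missing_env_vars(): raise RuntimeError(f"Missing settings: {missing}")`. Passing `env` explicitly keeps the helper easy to test without touching the real environment.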

## Step 2: Create Two Knowledge Sources

You'll create two knowledge sources:
- **healthdocs-knowledge-source**: Points to the `healthdocs` index (334 document chunks about health benefits and insurance)
- **hrdocs-knowledge-source**: Points to the `hrdocs` index (50 document chunks about HR policies)

Both sources connect to different indexes but use the same field configuration (`blob_path` for citations, `snippet` for content).

By creating multiple knowledge sources, you enable the knowledge base to intelligently decide which data to query based on the user's question.

In [None]:
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import SearchIndexFieldReference, SearchIndexKnowledgeSource, SearchIndexKnowledgeSourceParameters

index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

ks = SearchIndexKnowledgeSource(
    name="healthdocs-knowledge-source",
    description="Zava health documents: health benefits and insurance plan information",
    search_index_parameters=SearchIndexKnowledgeSourceParameters(
        search_index_name="healthdocs",
        source_data_fields=[SearchIndexFieldReference(name="blob_path"), SearchIndexFieldReference(name="snippet")],
        search_fields=[SearchIndexFieldReference(name="snippet")]
    ),
)
index_client.create_or_update_knowledge_source(knowledge_source=ks)
print(f"Knowledge source '{ks.name}' created or updated successfully.")

ks = SearchIndexKnowledgeSource(
    name="hrdocs-knowledge-source",
    description="Zava HR documents: company policies, job roles, workplace guidelines, and general benefits",
    search_index_parameters=SearchIndexKnowledgeSourceParameters(
        search_index_name="hrdocs",
        source_data_fields=[SearchIndexFieldReference(name="blob_path"), SearchIndexFieldReference(name="snippet")],
        search_fields=[SearchIndexFieldReference(name="snippet")]
    )
)

index_client.create_or_update_knowledge_source(knowledge_source=ks)
print(f"Knowledge source '{ks.name}' created or updated successfully.")

## Step 3: Create Combined Knowledge Base

Now create a knowledge base that references both knowledge sources. Notice how the `knowledge_sources` parameter takes a list of references.

This single knowledge base can now query both HR and health documents. When you send a query, the knowledge base analyzes the question and determines which sources are relevant; depending on the question, it can query one source or both.

In [None]:
from azure.search.documents.indexes.models import AzureOpenAIVectorizerParameters, KnowledgeBase, KnowledgeBaseAzureOpenAIModel, KnowledgeRetrievalOutputMode, KnowledgeSourceReference

aoai_params = AzureOpenAIVectorizerParameters(
    resource_url=azure_openai_endpoint,
    deployment_name=azure_openai_chatgpt_deployment,
    model_name=azure_openai_chatgpt_model_name
)

knowledge_base = KnowledgeBase(
    name=knowledge_base_name,
    models=[KnowledgeBaseAzureOpenAIModel(azure_open_ai_parameters=aoai_params)],
    knowledge_sources=[
        KnowledgeSourceReference(name="healthdocs-knowledge-source"),
        KnowledgeSourceReference(name="hrdocs-knowledge-source")
    ],
    output_mode=KnowledgeRetrievalOutputMode.ANSWER_SYNTHESIS
)

index_client.create_or_update_knowledge_base(knowledge_base)
print(f"Knowledge base '{knowledge_base_name}' created or updated successfully.")

## Step 4: Query Multiple Sources

Let's ask two questions in one query:
- "What is the responsibility of the Zava CEO?" (HR-related)
- "What health plan would you recommend if they wanted the best coverage for mental health services?" (Health-related)

When you run this query, the knowledge base uses agentic retrieval:
1. Analyzes the query to understand you're asking about two different topics
2. Decomposes the query into focused subqueries (one for each sub-topic in the query)
3. Determines which knowledge sources are relevant for each subquery
4. Runs searches concurrently against the selected sources
5. Uses the semantic ranker to rerank and filter results
6. Synthesizes a coherent answer from both sources

By default, the knowledge base decides which sources to query and may skip sources it considers irrelevant. You'll verify which sources it actually used in the next step.
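To make the decompose-then-route idea concrete, here is a deliberately simplified toy. It is not how the service works: the real knowledge base uses an LLM for query planning and the semantic ranker for reranking, whereas this sketch splits on question marks and routes by keyword overlap. `SOURCE_KEYWORDS`, `decompose`, `route`, and `plan` are all hypothetical, and the keyword sets are invented stand-ins for the knowledge source descriptions.

```python
import re

# Hypothetical keyword sets standing in for each knowledge source's description.
SOURCE_KEYWORDS = {
    "healthdocs-knowledge-source": {"health", "insurance", "plan", "coverage", "mental"},
    "hrdocs-knowledge-source": {"ceo", "policy", "role", "responsibility", "workplace"},
}


def decompose(query):
    """Split a multi-question prompt into one subquery per question (toy stand-in for LLM planning)."""
    parts = re.split(r"(?<=\?)\s+|\n+", query.strip())
    return [part.strip() for part in parts if part.strip()]


def route(subquery):
    """Return every source whose keywords overlap the subquery's words (may be empty)."""
    words = set(re.findall(r"[a-z]+", subquery.lower()))
    return [name for name, keywords in SOURCE_KEYWORDS.items() if words & keywords]


def plan(query):
    """Map each subquery to the sources it would be sent to."""
    return {subquery: route(subquery) for subquery in decompose(query)}


print(plan("What is the responsibility of the Zava CEO?\n"
           "What health plan would you recommend if they wanted the best coverage for mental health services?"))
```

Each subquery ends up routed to a single source here, which mirrors the behavior you'd hope to see from the real planner on the two-part question, but the actual selection logic is the service's and may differ.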

In [None]:
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import KnowledgeBaseMessage, KnowledgeBaseMessageTextContent, KnowledgeBaseRetrievalRequest, SearchIndexKnowledgeSourceParams
from IPython.display import display, Markdown

knowledge_base_client = KnowledgeBaseRetrievalClient(endpoint=endpoint, knowledge_base_name=knowledge_base_name, credential=credential)

healthdocs_ks_params = SearchIndexKnowledgeSourceParams(
    knowledge_source_name="healthdocs-knowledge-source",
    include_references=True,
    include_reference_source_data=True
)
hrdocs_ks_params = SearchIndexKnowledgeSourceParams(
    knowledge_source_name="hrdocs-knowledge-source",
    include_references=True,
    include_reference_source_data=True
)
req = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="""
            What is the responsibility of the Zava CEO?
            What health plan would you recommend if they wanted the best coverage for mental health services?
        """)])
    ],
    knowledge_source_params=[
        healthdocs_ks_params,
        hrdocs_ks_params
    ],
    include_activity=True
)

result = knowledge_base_client.retrieve(retrieval_request=req)
display(Markdown(result.response[0].content[0].text))

## Step 5: Map Citations to Knowledge Sources

This helper function traces each citation in the answer back to its original knowledge source. It shows you whether information came from `hrdocs` or `healthdocs`.

You can use this function to see which sources the knowledge base chose, verify information origins, and check if it's picking the right sources for different question types.

Run the cell below to see which knowledge source each citation came from.

In [None]:
import re


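def find_source_of_reference(reference_id):
    # Note: reads the global `result` from the most recent retrieve() call.
    # First, find the activity record that produced this reference...
    activity_id = None
    for reference in result.references:
        if reference.id == reference_id:
            activity_id = reference.activity_source
            break
    # ...then return the knowledge source that activity queried.
    for activity in result.activity:
        if activity.id == activity_id:
            return activity.knowledge_source_name
    return None

def cite_sources(text):
    # Collect the [ref_id:N] citation markers from the answer, in numeric order.
    references = re.findall(r'\[ref_id:(\d+)\]', text)
    references.sort(key=int)
    # Map each cited reference ID to the name of its knowledge source.
    sources = {}
    for ref_id in references:
        source_info = find_source_of_reference(ref_id)
        if source_info:
            sources[ref_id] = source_info

    return sources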

sources = cite_sources(result.response[0].content[0].text)
print("Cited sources:", sources)
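The helpers above read the global `result` and rescan both lists for every citation. A variant that takes the retrieval result as an argument and builds each lookup table once might look like this. It's a sketch: `map_citations_to_sources` is a hypothetical name, and it relies only on the attributes the cell above already uses (`references[].id`, `references[].activity_source`, `activity[].id`, `activity[].knowledge_source_name`). The `SimpleNamespace` objects are stand-ins for a real result so the function can be exercised offline.

```python
import re
from types import SimpleNamespace


def map_citations_to_sources(result):
    """Map each [ref_id:N] marker in the answer to the knowledge source that produced it."""
    text = result.response[0].content[0].text
    # reference id -> id of the activity record that produced it
    ref_to_activity = {str(ref.id): ref.activity_source for ref in result.references}
    # activity id -> knowledge source name (skip activities, e.g. planning, that have none)
    activity_to_source = {
        act.id: act.knowledge_source_name
        for act in result.activity
        if getattr(act, "knowledge_source_name", None)
    }
    sources = {}
    for ref_id in sorted(set(re.findall(r"\[ref_id:(\d+)\]", text)), key=int):
        source = activity_to_source.get(ref_to_activity.get(ref_id))
        if source:
            sources[ref_id] = source
    return sources


# Offline stand-in shaped like a retrieval result (illustrative data only).
fake_result = SimpleNamespace(
    response=[SimpleNamespace(content=[SimpleNamespace(text="A [ref_id:1] B [ref_id:0] C [ref_id:1]")])],
    references=[SimpleNamespace(id="0", activity_source=1), SimpleNamespace(id="1", activity_source=2)],
    activity=[
        SimpleNamespace(id=0, type="modelQueryPlanning"),
        SimpleNamespace(id=1, knowledge_source_name="hrdocs-knowledge-source"),
        SimpleNamespace(id=2, knowledge_source_name="healthdocs-knowledge-source"),
    ],
)
print(map_citations_to_sources(fake_result))
```

With a real response you would call `map_citations_to_sources(result)`. Passing `result` explicitly avoids accidentally mapping citations against an older response.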


## Step 6: Force Querying All Sources

By setting `always_query_source=True` on both knowledge source parameters, you can force the knowledge base to query both sources regardless of whether it thinks they're relevant.

The code below runs a single, simpler question. This time, both sources are queried because `always_query_source=True` overrides the knowledge base's own source selection. Compared with Step 5, you may see citations from both sources even when one isn't strictly necessary, answers that draw on both sources, and a different mapping of citations to sources.

This is useful when you want comprehensive coverage across all your data, but it comes at the cost of higher latency and token usage.

Try it with different questions and see how the results vary!

In [None]:
healthdocs_ks_params.always_query_source = True
hrdocs_ks_params.always_query_source = True

req = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="What health benefits are there?")])
    ],
    knowledge_source_params=[
        healthdocs_ks_params,
        hrdocs_ks_params
    ],
    include_activity=True
)

result = knowledge_base_client.retrieve(retrieval_request=req)
display(Markdown(result.response[0].content[0].text))

sources_cited = cite_sources(result.response[0].content[0].text)
print("Cited sources:", sources_cited)

sources_queried = [activity.knowledge_source_name for activity in result.activity if activity.type == "searchIndex"]
print("Queried sources:", sources_queried)
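Querying a source doesn't guarantee that it contributes to the answer. To see which forced queries produced nothing the answer cited, you could compare the two lists printed above. `uncited_sources` is a hypothetical helper that takes `sources_queried` (a list of names) and `sources_cited` (the dict returned by `cite_sources`).

```python
def uncited_sources(sources_queried, sources_cited):
    """Return sources that were searched but never cited, in query order, without duplicates."""
    cited = set(sources_cited.values())
    uncited = []
    for name in sources_queried:
        if name not in cited and name not in uncited:
            uncited.append(name)
    return uncited
```

In the notebook: `print("Queried but not cited:", uncited_sources(sources_queried, sources_cited))`. A non-empty list is a hint that `always_query_source=True` is paying extra latency and tokens for that source on this kind of question.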

## Step 7: Guide Source Selection with Retrieval Instructions

**Retrieval instructions** let you guide the knowledge base's source selection using natural language instead of forcing it to query everything.

You can set `retrieval_instructions` to provide guidance like "Use healthdocs for health questions, hrdocs for HR questions." The knowledge base still makes the final decision, but now with your guidance. Unlike `always_query_source=True`, this lets it skip irrelevant sources for better efficiency, and it uses the context you provide to better understand the question's intent.

The first cell below updates the knowledge base with retrieval instructions. 

The second cell queries with `always_query_source=False` to see how the instructions guide source selection.

In [None]:
knowledge_base.retrieval_instructions = "If the question is about health benefits or insurance specifically, use healthdocs. Otherwise, use hrdocs for other HR-related questions."

index_client.create_or_update_knowledge_base(knowledge_base)
print(f"Knowledge base '{knowledge_base_name}' created or updated successfully.")
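As you add more sources, hand-writing the routing sentence can get unwieldy. One option is to generate the instruction text from a small routing table. `build_retrieval_instructions` is a hypothetical helper, not part of the Azure SDK; the example call reproduces the exact string set in the cell above.

```python
from typing import Optional


def build_retrieval_instructions(routes, fallback: Optional[str] = None):
    """Compose a natural-language routing hint from a {source name: topics} mapping."""
    sentences = [f"If the question is about {topics}, use {source}." for source, topics in routes.items()]
    if fallback:
        sentences.append(f"Otherwise, use {fallback}.")
    return " ".join(sentences)


instructions = build_retrieval_instructions(
    {"healthdocs": "health benefits or insurance specifically"},
    fallback="hrdocs for other HR-related questions",
)
print(instructions)
```

You would then assign the result with `knowledge_base.retrieval_instructions = instructions` before calling `create_or_update_knowledge_base`. The instructions remain plain text, so the knowledge base treats them as guidance, not hard rules.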

In [None]:
healthdocs_ks_params.always_query_source = False
hrdocs_ks_params.always_query_source = False

req = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="What health benefits are there?")])
    ],
    knowledge_source_params=[
        healthdocs_ks_params,
        hrdocs_ks_params
    ],
    include_activity=True
)


result = knowledge_base_client.retrieve(retrieval_request=req)
display(Markdown(result.response[0].content[0].text))

sources_cited = cite_sources(result.response[0].content[0].text)
print("Cited sources:", sources_cited)

sources_queried = [activity.knowledge_source_name for activity in result.activity if activity.type == "searchIndex"]
print("Queried sources:", sources_queried)

## Step 8: Format Answers with Answer Instructions

**Answer instructions** control how the knowledge base formats its response. They don't change which sources are queried or what information is retrieved; they affect only how the answer is presented.

The code below uses `answer_instructions` to instruct the knowledge base to format answers as bulleted lists. You'll want to customize this setting whenever you need application-specific formatting.

The first cell below updates the knowledge base with the new `answer_instructions`. The second cell runs the same query to show the formatted results.

In [None]:
knowledge_base.answer_instructions = "Always use a bulleted list format when providing answers. Each bullet should be on a separate line."

index_client.create_or_update_knowledge_base(knowledge_base)
print(f"Knowledge base '{knowledge_base_name}' created or updated successfully.")


In [None]:
result = knowledge_base_client.retrieve(retrieval_request=req)
display(Markdown(result.response[0].content[0].text))
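Because answer instructions are guidance to a language model, it's worth checking that the output actually follows them, for example in an automated test. Here is a minimal sketch with a hypothetical `is_bulleted_list` helper that treats the answer as Markdown and requires every non-blank line to be a `-`, `*`, or `+` list item.

```python
import re

# A Markdown bullet: optional indentation, a bullet marker, then whitespace.
BULLET = re.compile(r"^\s*[-*+]\s+")


def is_bulleted_list(markdown):
    """Return True if the text is non-empty and every non-blank line is a bullet item."""
    lines = [line for line in markdown.splitlines() if line.strip()]
    return bool(lines) and all(BULLET.match(line) for line in lines)
```

In the notebook: `print("Bulleted:", is_bulleted_list(result.response[0].content[0].text))`. The check is intentionally strict; an answer that opens with an introductory sentence before the bullets would return `False`.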

In [None]:
import json

references = json.dumps([ref.as_dict() for ref in result.references], indent=2)
print(references)
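The raw JSON can be long. To get a quick view of which documents the answer drew on, you could count references per `blob_path`, the citation field configured on both knowledge sources. This is a sketch: `count_references_by_document` is a hypothetical helper, and because the serialized key for the source data may vary by SDK version, it assumes it could be either `source_data` or `sourceData`.

```python
from collections import Counter


def count_references_by_document(reference_dicts):
    """Count how many retrieved chunks came from each source document (by blob_path)."""
    counts = Counter()
    for ref in reference_dicts:
        # Assumption: the source data key may be snake_case or camelCase depending on SDK version.
        source_data = ref.get("source_data") or ref.get("sourceData") or {}
        counts[source_data.get("blob_path", "<unknown>")] += 1
    return counts
```

In the notebook: `print(count_references_by_document([ref.as_dict() for ref in result.references]).most_common())`. References without source data are grouped under `<unknown>`.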

## Summary

You've now worked with multiple knowledge sources in a single knowledge base and learned how to control which ones get queried.

**Key concepts to remember:**
- A single knowledge base can reference multiple knowledge sources
- The knowledge base intelligently selects which sources to query based on the question
- `always_query_source=True` forces a knowledge source to be queried regardless of relevance; set it on every source to query them all
- `retrieval_instructions` guide source selection with natural language
- `answer_instructions` control response formatting without changing content

### What's Next?

➡️ Continue to [Part 3: SharePoint Knowledge Source](part3-sharepoint-knowledge-source.ipynb) to learn how to connect your knowledge base directly to SharePoint document libraries.