# Part 3: SharePoint Knowledge Source

In Parts 1 and 2, you worked with search indexes. In Part 3, you'll connect directly to SharePoint documents using `RemoteSharePointKnowledgeSource`. This lets you query live SharePoint content without indexing it first.

## Step 1: Load Environment Variables

Run the cell below to load the configuration for your Azure resources. Choose the **.venv(3.11.9)** environment that was created for you.

> **⚠️ Troubleshooting**
>
> If code cells get stuck and keep spinning, select **Restart** from the notebook toolbar at the top. If the issue persists after a couple of tries, close VS Code completely and reopen it.

In [48]:
import os

from azure.core.credentials import AzureKeyCredential
from dotenv import load_dotenv

load_dotenv(override=True) # take environment variables from .env.

# Azure AI Search configuration
endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
credential = AzureKeyCredential(os.environ["AZURE_SEARCH_ADMIN_KEY"])

# Knowledge base name
knowledge_base_name = "sharepoint-knowledge-base"

# Azure OpenAI configuration
azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
azure_openai_key = os.environ["AZURE_OPENAI_KEY"]
azure_openai_chatgpt_deployment = os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "gpt-4.1")
azure_openai_chatgpt_model_name = os.getenv("AZURE_OPENAI_CHATGPT_MODEL_NAME", "gpt-4.1")

print("Environment variables loaded")

Environment variables loaded
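
Because the cell above reads four variables with `os.environ[...]`, a missing entry in `.env` raises a `KeyError` on the first missing name only. As a minimal sketch, the helper below reports every missing name at once. The variable names are taken from the cell above; the `find_missing_vars` helper itself is a hypothetical addition, not part of the lab.

```python
import os

# Names copied from the cell above; the two optional deployment
# variables are left out because they have defaults.
REQUIRED_VARS = [
    "AZURE_SEARCH_SERVICE_ENDPOINT",
    "AZURE_SEARCH_ADMIN_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
]


def find_missing_vars(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]


missing = find_missing_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set")
```

Running this right after `load_dotenv` gives a single, readable message instead of a stack trace.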


## Step 2: Create Remote SharePoint Knowledge Source

A **RemoteSharePointKnowledgeSource** connects directly to SharePoint documents without requiring you to index them first. This is different from the `SearchIndexKnowledgeSource` you used in Parts 1 and 2.

The code below creates a SharePoint knowledge source with `RemoteSharePointKnowledgeSourceParameters()`. The service handles authentication and document access automatically.

In [49]:
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import RemoteSharePointKnowledgeSource, RemoteSharePointKnowledgeSourceParameters

index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

ks = RemoteSharePointKnowledgeSource(
    name="sharepoint-knowledge-source",
    description="Knowledge source for SharePoint Documents",
    remote_share_point_parameters=RemoteSharePointKnowledgeSourceParameters()
)
index_client.create_or_update_knowledge_source(knowledge_source=ks)
print(f"Knowledge source '{ks.name}' created or updated successfully.")

Knowledge source 'sharepoint-knowledge-source' created or updated successfully.


## Step 3: Create SharePoint Knowledge Base

In this step, you'll create a knowledge base that references the SharePoint knowledge source. The setup is identical to Parts 1 and 2: you configure the Azure OpenAI model parameters, add a reference to your knowledge source, and set `output_mode=ANSWER_SYNTHESIS` to enable AI-powered answer generation.

The only difference is the knowledge source type: instead of pointing to a search index, this knowledge base queries SharePoint directly.

In [50]:
from azure.search.documents.indexes.models import AzureOpenAIVectorizerParameters, KnowledgeBase, KnowledgeBaseAzureOpenAIModel, KnowledgeRetrievalOutputMode, KnowledgeSourceReference

aoai_params = AzureOpenAIVectorizerParameters(
    resource_url=azure_openai_endpoint,
    deployment_name=azure_openai_chatgpt_deployment,
    model_name=azure_openai_chatgpt_model_name,
    api_key=azure_openai_key
)

knowledge_base = KnowledgeBase(
    name=knowledge_base_name,
    models=[KnowledgeBaseAzureOpenAIModel(azure_open_ai_parameters=aoai_params)],
    knowledge_sources=[
        KnowledgeSourceReference(name=ks.name)
    ],
    output_mode=KnowledgeRetrievalOutputMode.ANSWER_SYNTHESIS
)

index_client.create_or_update_knowledge_base(knowledge_base)
print(f"Knowledge base '{knowledge_base_name}' created or updated successfully.")

Knowledge base 'sharepoint-knowledge-base' created or updated successfully.


## Step 4: Query SharePoint Documents

Now you can query your SharePoint documents using the knowledge base you created.

There's one key difference when querying SharePoint: you need to provide an identity token (`x_ms_query_source_authorization`) so the knowledge base can access SharePoint on behalf of the token owner. The cell below gets this token using your Azure credentials. In a production application, that token would come from the user logged into your app.

The second cell asks a question that can be answered only from SharePoint documents shared with your test account. To see those documents, navigate to the [Lab SharePoint](https://lodsprodmca.sharepoint.com/sites/lab511) site and open the Documents folder.

When you run the second cell, the knowledge base analyzes your question, decomposes it into focused subqueries, searches SharePoint, uses semantic ranking to filter results, and synthesizes a grounded answer with citations.

In [51]:
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

identity_token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://search.azure.com/.default")
token = identity_token_provider()
print("Authentication completed")


Authentication completed
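
When a SharePoint query fails, it can help to check which identity and audience the token actually carries. The sketch below decodes the payload of a token without verifying its signature. It assumes the token is a JWT, which Microsoft Entra access tokens typically are. The `decode_jwt_claims` helper and the sample token are hypothetical and for illustration only.

```python
import base64
import json


def decode_jwt_claims(jwt_token):
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = jwt_token.split(".")
    if len(parts) != 3:
        raise ValueError("Expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url segments omit padding; restore it before decoding
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data):
    """Encode a dict as an unpadded base64url JSON segment (demo only)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# A made-up, unsigned token so the example runs without Azure access
sample_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"aud": "https://search.azure.com", "exp": 1765101676}),
    "signature",
])

claims = decode_jwt_claims(sample_token)
print("Audience:", claims["aud"])
print("Expires (epoch seconds):", claims["exp"])
```

In the notebook you would call `decode_jwt_claims(token)` on the value from the cell above. Use this for inspection only; it does not validate the token.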


In [52]:
# Check the current knowledge source configuration
ks_retrieved = index_client.get_knowledge_source("sharepoint-knowledge-source")
print("Current SharePoint Knowledge Source Configuration:")
print(f"Name: {ks_retrieved.name}")
print(f"Type: {type(ks_retrieved).__name__}")
if hasattr(ks_retrieved, 'remote_share_point_parameters') and ks_retrieved.remote_share_point_parameters:
    params = ks_retrieved.remote_share_point_parameters
    print(f"SharePoint Parameters: {params.as_dict() if hasattr(params, 'as_dict') else params}")
else:
    print("SharePoint Parameters: Not configured")
    
print("\n" + "="*60)
print("IMPORTANT: RemoteSharePointKnowledgeSourceParameters needs configuration!")
print("="*60)

Current SharePoint Knowledge Source Configuration:
Name: sharepoint-knowledge-source
Type: RemoteSharePointKnowledgeSource
SharePoint Parameters: {'resource_metadata': []}

IMPORTANT: RemoteSharePointKnowledgeSourceParameters needs configuration!


### Fix: Configure Knowledge Source with Filter Expression

The issue was that `RemoteSharePointKnowledgeSourceParameters()` was empty. It needs a **`filter_expression`** parameter that uses SharePoint KQL (Keyword Query Language) to specify which SharePoint site to search.

The `Path:` filter tells Azure AI Search which SharePoint site URL to query.

In [53]:
# Recreate the knowledge source with proper filter expression
from azure.search.documents.indexes.models import RemoteSharePointKnowledgeSource, RemoteSharePointKnowledgeSourceParameters

# Configure SharePoint parameters with filter expression
# The filter expression uses SharePoint KQL to scope the search
# Path:https://... specifies which SharePoint site to search
sp_params = RemoteSharePointKnowledgeSourceParameters(
    filter_expression='Path:"https://mngenvmcap338326.sharepoint.com/sites/lab511-demo"',
    resource_metadata=["Author", "FileName", "LastModifiedTime"]  # Optional: metadata to retrieve
)

# Recreate the knowledge source
ks_fixed = RemoteSharePointKnowledgeSource(
    name="sharepoint-knowledge-source",
    description="Knowledge source for SharePoint Documents - lab511-demo site",
    remote_share_point_parameters=sp_params
)

index_client.create_or_update_knowledge_source(knowledge_source=ks_fixed)
print(f"Knowledge source '{ks_fixed.name}' recreated successfully!")
print(f"Configured to search: https://mngenvmcap338326.sharepoint.com/sites/lab511-demo")
print("Filter expression:", sp_params.filter_expression)

Knowledge source 'sharepoint-knowledge-source' recreated successfully!
Configured to search: https://mngenvmcap338326.sharepoint.com/sites/lab511-demo
Filter expression: Path:"https://mngenvmcap338326.sharepoint.com/sites/lab511-demo"
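
Filter expressions are plain KQL strings, so small helpers can keep the quoting consistent. The sketch below builds a `Path:` restriction and joins clauses with KQL's uppercase `AND` operator. Both helper names are hypothetical, and whether a combined expression returns results through this knowledge source is an assumption you would need to test.

```python
def kql_path_filter(site_url):
    """Build a KQL Path restriction for one SharePoint site URL."""
    cleaned = site_url.strip().rstrip("/")
    if not cleaned.startswith("https://"):
        raise ValueError("Expected an https:// SharePoint URL")
    if '"' in cleaned:
        raise ValueError("URL must not contain double quotes")
    return f'Path:"{cleaned}"'


def kql_and(*clauses):
    """Join the non-empty KQL clauses with the AND operator."""
    return " AND ".join(clause for clause in clauses if clause)


expr = kql_and(
    kql_path_filter("https://mngenvmcap338326.sharepoint.com/sites/lab511-demo/"),
    "filetype:docx",
)
print(expr)
```

The resulting string can be passed as `filter_expression` in the same way as the hard-coded value above.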


### Test: Query After Fixing Configuration

Now let's test if the query works with the properly configured knowledge source.

In [54]:
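# Imports and client needed for retrieval (the same lines appear in a later cell of this notebook)
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import KnowledgeBaseMessage, KnowledgeBaseMessageTextContent, KnowledgeBaseRetrievalRequest, RemoteSharePointKnowledgeSourceParams

knowledge_base_client = KnowledgeBaseRetrievalClient(endpoint=endpoint, knowledge_base_name=knowledge_base_name, credential=credential)

# Test query after fixing the configuration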
test_req_fixed = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="zava")])
    ],
    knowledge_source_params=[
        RemoteSharePointKnowledgeSourceParams(
            knowledge_source_name="sharepoint-knowledge-source",
            include_references=True,
            include_reference_source_data=True
        )
    ],
    include_activity=True
)

print("Testing with simple query: 'zava'")
result_fixed = knowledge_base_client.retrieve(retrieval_request=test_req_fixed, x_ms_query_source_authorization=token)

print("\n=== Response ===")
print(result_fixed.response[0].content[0].text)

print("\n=== References ===")
if result_fixed.references:
    for ref in result_fixed.references:
        print(f"- {ref.source if hasattr(ref, 'source') else ref}")
else:
    print("No references found")

print("\n=== Activity Summary ===")
if hasattr(result_fixed, 'activity') and result_fixed.activity:
    for activity in result_fixed.activity:
        print(f"- {activity.type}")
else:
    print("No activity log available")

Testing with simple query: 'zava'

=== Response ===
No relevant content was found for your query.

=== References ===
No references found

=== Activity Summary ===
- modelQueryPlanning
- remoteSharePoint
- agenticReasoning
- modelAnswerSynthesis



### Troubleshooting: Try Different Filter Expressions

The filter might be too specific. Let's try without the Path filter to search all SharePoint content accessible to your account.

In [55]:
# Option 1: Try without any filter (searches all SharePoint content you have access to)
print("=== Testing Option 1: No filter (all accessible SharePoint content) ===")
sp_params_no_filter = RemoteSharePointKnowledgeSourceParameters()

ks_no_filter = RemoteSharePointKnowledgeSource(
    name="sharepoint-knowledge-source",
    description="Knowledge source for all accessible SharePoint Documents",
    remote_share_point_parameters=sp_params_no_filter
)

index_client.create_or_update_knowledge_source(knowledge_source=ks_no_filter)
print("Knowledge source updated to search ALL accessible SharePoint content")

# Test query
test_req_no_filter = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="zava")])
    ],
    knowledge_source_params=[
        RemoteSharePointKnowledgeSourceParams(
            knowledge_source_name="sharepoint-knowledge-source",
            include_references=True,
            include_reference_source_data=True
        )
    ],
    include_activity=True
)

result_no_filter = knowledge_base_client.retrieve(retrieval_request=test_req_no_filter, x_ms_query_source_authorization=token)

print("\n=== Response ===")
print(result_no_filter.response[0].content[0].text)

print("\n=== References ===")
if result_no_filter.references:
    for ref in result_no_filter.references:
        print(f"- {ref.source if hasattr(ref, 'source') else ref}")
else:
    print("No references found")

=== Testing Option 1: No filter (all accessible SharePoint content) ===
Knowledge source updated to search ALL accessible SharePoint content

=== Response ===
No relevant content was found for your query.

=== References ===
No references found



### Option 2: Try with File Type Filter

If Option 1 returns results, filter by file type (`.docx`) to narrow them down.

In [56]:
# Option 2: Filter by file type only
print("=== Testing Option 2: Filter by .docx files only ===")
sp_params_filetype = RemoteSharePointKnowledgeSourceParameters(
    filter_expression='filetype:docx'
)

ks_filetype = RemoteSharePointKnowledgeSource(
    name="sharepoint-knowledge-source",
    description="Knowledge source for Word documents",
    remote_share_point_parameters=sp_params_filetype
)

index_client.create_or_update_knowledge_source(knowledge_source=ks_filetype)
print("Knowledge source updated to search .docx files")
print("Filter expression:", sp_params_filetype.filter_expression)

# Test query
test_req_filetype = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="zava health")])
    ],
    knowledge_source_params=[
        RemoteSharePointKnowledgeSourceParams(
            knowledge_source_name="sharepoint-knowledge-source",
            include_references=True,
            include_reference_source_data=True
        )
    ],
    include_activity=True
)

result_filetype = knowledge_base_client.retrieve(retrieval_request=test_req_filetype, x_ms_query_source_authorization=token)

print("\n=== Response ===")
print(result_filetype.response[0].content[0].text)

print("\n=== References ===")
if result_filetype.references:
    for ref in result_filetype.references:
        print(f"- {ref.source if hasattr(ref, 'source') else ref}")
else:
    print("No references found")

=== Testing Option 2: Filter by .docx files only ===
Knowledge source updated to search .docx files
Filter expression: filetype:docx

=== Response ===
No relevant content was found for your query.

=== References ===
No references found



### Important Note: RemoteSharePoint Relies on Microsoft 365 Search

The `RemoteSharePointKnowledgeSource` uses the **Microsoft 365 Copilot Retrieval API**, which has some limitations:

1. **Only indexed content**: The document must already be in the Microsoft 365 search index. In this walkthrough, SharePoint search already finds it.
2. **Account access**: The search returns only content that the authenticated user can access.
3. **Search latency**: Even when SharePoint search finds a document, the Copilot Retrieval API might use a separate index that updates with additional delay.
4. **Limited file types**: Hybrid queries only support specific file types (.doc, .docx, .pptx, .pdf, .aspx, .one)
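
To rule out the file-type limitation quickly, you can check a file name against the extensions listed above. This is a minimal sketch: the extension set is copied from the list, while the `is_supported_file` helper and the sample file names are hypothetical.

```python
from pathlib import PurePosixPath

# Extensions listed above as supported for hybrid queries
SUPPORTED_EXTENSIONS = {".doc", ".docx", ".pptx", ".pdf", ".aspx", ".one"}


def is_supported_file(filename):
    """Return True if the file extension is in the supported set (case-insensitive)."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


# Hypothetical file names, for illustration only
for name in ["zava-health.docx", "notes.txt", "Deck.PPTX"]:
    print(f"{name}: {'supported' if is_supported_file(name) else 'not supported'}")
```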

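**Next Steps:**
- If Option 1 (no filter) works, your account has access but the Path filter was too restrictive
- If nothing works, inspect the activity log for errors before assuming an indexing delay. In the run recorded in Step 5, the `remoteSharePoint` steps failed with `403 Forbidden` and the message "User does not have valid license", which points to a licensing problem for the querying user rather than a filter problem
- If the activity log shows no errors, there might be a delay in the Copilot Retrieval API indexing (separate from SharePoint search)
- Alternative: Use the lab's original SharePoint site if accessible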

In [40]:
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient
from azure.search.documents.knowledgebases.models import KnowledgeBaseMessage, KnowledgeBaseMessageTextContent, KnowledgeBaseRetrievalRequest, RemoteSharePointKnowledgeSourceParams
from IPython.display import display, Markdown

knowledge_base_client = KnowledgeBaseRetrievalClient(endpoint=endpoint, knowledge_base_name=knowledge_base_name, credential=credential)

sharepoint_ks_params = RemoteSharePointKnowledgeSourceParams(
    knowledge_source_name="sharepoint-knowledge-source",
    include_references=True,
    include_reference_source_data=True
)
req = KnowledgeBaseRetrievalRequest(
    messages=[
        #KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="What are major announcements and innovations in Zava Smart Wearables & Clothing Industry in 2025?")])
        KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="What documents are available?")])
    ],
    knowledge_source_params=[
        sharepoint_ks_params
    ],
    include_activity=True
)


result = knowledge_base_client.retrieve(retrieval_request=req, x_ms_query_source_authorization=token)
display(Markdown(result.response[0].content[0].text))

No relevant content was found for your query.

In [38]:
# Simple test query to verify the file is accessible
test_req = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(role="user", content=[KnowledgeBaseMessageTextContent(text="What content is in the Zava demo health document?")])
    ],
    knowledge_source_params=[
        RemoteSharePointKnowledgeSourceParams(
            knowledge_source_name="sharepoint-knowledge-source",
            include_references=True,
            include_reference_source_data=True
        )
    ],
    include_activity=True
)

test_result = knowledge_base_client.retrieve(retrieval_request=test_req, x_ms_query_source_authorization=token)

print("=== Response ===")
print(test_result.response[0].content[0].text)
print("\n=== References Found ===")
if test_result.references:
    for ref in test_result.references:
        print(f"- {ref.source if hasattr(ref, 'source') else 'Source unavailable'}")
else:
    print("No references found - the file may not be accessible or indexed yet")

=== Response ===
No relevant content was found for your query.

=== References Found ===
No references found - the file may not be accessible or indexed yet


### Test: Verify File Access

The simple test query above (`test_req`) checks whether Azure AI Search can see your uploaded Word document.

## Step 5: Review Response, References, and Activity

The cells below show the citations and activity log from the SharePoint query.

The references reveal which SharePoint documents were used. Each citation includes the file path and the specific text snippet that contributed to the answer.

The activity log reveals what happened behind the scenes: how the knowledge base accessed SharePoint, what searches it performed, and how it ranked the results.
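
When a query returns "No relevant content was found", the activity log is the first place to look, because failed steps carry an `error` entry. The sketch below pulls those entries out of a list of activity dictionaries such as the one produced by `[a.as_dict() for a in result.activity]`. The `summarize_activity_errors` helper is hypothetical, and the sample records are abbreviated from the log output shown in this step.

```python
import re


def summarize_activity_errors(activity_records):
    """Return one summary dict per activity record that carries an error."""
    summaries = []
    for record in activity_records:
        error = record.get("error")
        if not error:
            continue
        message = error.get("message", "")
        # The message embeds the upstream status, e.g. "status code = Forbidden"
        match = re.search(r"status code = (\w+)", message)
        summaries.append({
            "id": record.get("id"),
            "type": record.get("type"),
            "status": match.group(1) if match else None,
            "message": message,
        })
    return summaries


# Abbreviated sample shaped like the activity log in this notebook
sample_activity = [
    {"id": 0, "type": "modelQueryPlanning", "elapsed_ms": 952},
    {
        "id": 1,
        "type": "remoteSharePoint",
        "error": {
            "message": "Copilot retrieval call failed with status code = Forbidden, "
                       "Path = /beta/copilot/retrieval."
        },
        "count": 0,
    },
]

for item in summarize_activity_errors(sample_activity):
    print(f"Step {item['id']} ({item['type']}) failed with status: {item['status']}")
```

An empty result means no step reported an error, so the cause is more likely the filter, permissions, or indexing delay.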

In [41]:
import json

references = json.dumps([ref.as_dict() for ref in result.references], indent=2)
print(references)

[]


In [26]:
import pandas as pd

activity_types = [{"type": a.type} for a in result.activity]

df = pd.DataFrame(activity_types)

print("Activity Log Steps")
df

Activity Log Steps


|   | type |
|---|------|
| 0 | modelQueryPlanning |
| 1 | remoteSharePoint |
| 2 | remoteSharePoint |
| 3 | agenticReasoning |
| 4 | modelAnswerSynthesis |


In [27]:
activity_content = json.dumps([a.as_dict() for a in result.activity], indent=2)
print("Activity Details")
print(activity_content)

Activity Details
[
  {
    "id": 0,
    "type": "modelQueryPlanning",
    "elapsed_ms": 952,
    "input_tokens": 1465,
    "output_tokens": 82
  },
  {
    "id": 1,
    "type": "remoteSharePoint",
    "error": {
      "message": "Copilot retrieval call failed with status code = Forbidden, Path = /beta/copilot/retrieval. \nError - {\"error\":{\"code\":\"Forbidden\",\"message\":\"Authorization Failed - User does not have valid license\",\"httpCode\":403}}"
    },
    "query_time": "2025-12-07T10:01:16.699Z",
    "count": 0,
    "remote_share_point_arguments": {
      "search": "Zava Smart Wearables & Clothing Industry major announcements 2025"
    }
  },
  {
    "id": 2,
    "type": "remoteSharePoint",
    "error": {
      "message": "Copilot retrieval call failed with status code = Forbidden, Path = /beta/copilot/retrieval. \nError - {\"error\":{\"code\":\"Forbidden\",\"message\":\"Authorization Failed - User does not have valid license\",\"httpCode\":403}}"
    },
    "query_time": "20

## Summary

In this part, you learned how to connect directly to SharePoint documents using `RemoteSharePointKnowledgeSource`. This allows you to query live SharePoint content without needing to index it first.

**Key concepts to remember:**
- `RemoteSharePointKnowledgeSource` queries SharePoint documents in real time
- SharePoint queries require an identity token (`x_ms_query_source_authorization`)
- The knowledge base handles authentication and document access automatically
- Citations include SharePoint file paths, not index references

### What's Next?

➡️ Continue to [Part 4: Web Knowledge Source](part4-web-knowledge-source.ipynb) to learn how to query public web URLs alongside your internal data sources.