# Getting Started: SharePoint Remote Knowledge Source

This notebook demonstrates how to query SharePoint Online documents at runtime without ingesting them. This is ideal for scenarios where you want real-time access to SharePoint content without data duplication.

## What You'll Learn

- Configure SharePoint Online authentication
- Create a knowledge base with existing resources
- Query SharePoint documents at runtime (remote)
- Apply filters and search scopes

## Prerequisites

- Azure subscription
- SharePoint Online site with documents
- Azure CLI installed and logged in (`az login`)
- Existing Azure AI Foundry project (see notebook 01 for deployment)
- Existing Azure AI Search service (see notebook 01 for deployment)

## Architecture Overview

```
SharePoint Online → [Runtime Query] → Knowledge Base → Retrieval API
                         ↓
          Azure AI Search (reranking)
```

**Note:** Remote SharePoint sources do NOT ingest or index documents. They query SharePoint in real time during retrieval.

## Step 1: Configure SharePoint Authentication

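To query SharePoint, you need an authorization token for the signed-in user. The cells below obtain one with the Azure CLI and later pass it in the `x-ms-query-source-authorization` request header.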

In [None]:
# Configuration - Update these values
SUBSCRIPTION_ID = "<your-subscription-id>"

# Existing resources (from notebook 01 or your own)
EXISTING_SEARCH_ENDPOINT = "https://<your-search-service>.search.windows.net"
EXISTING_SEARCH_API_KEY = "<your-search-api-key>"
EXISTING_FOUNDRY_ENDPOINT = "https://<your-foundry-project>.services.ai.azure.com/api/projects/<project-name>"
EXISTING_AZURE_OPENAI_KEY = "<your-api-key>"
EXISTING_CHAT_DEPLOYMENT = "gpt-4o-mini"

# SharePoint configuration
SHAREPOINT_SITE_URL = "https://yourtenant.sharepoint.com/sites/yoursite"  # Your SharePoint site URL

# API version
API_VERSION = "2025-11-01-preview"

In [None]:
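# Get SharePoint authorization token
import subprocess

# check=True raises CalledProcessError if the Azure CLI call fails (e.g., not logged in)
result = subprocess.run(
    ["az", "account", "get-access-token",
     "--resource", "https://search.azure.com",
     "--query", "accessToken",
     "-o", "tsv"],
    capture_output=True,
    text=True,
    check=True
)

SHAREPOINT_AUTH_TOKEN = result.stdout.strip()
if not SHAREPOINT_AUTH_TOKEN:
    raise RuntimeError("Azure CLI returned an empty token; run `az login` and retry.")
# Avoid echoing token contents into notebook output; report only its length
print(f"SharePoint auth token obtained ({len(SHAREPOINT_AUTH_TOKEN)} characters)")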

## Step 2: Create Knowledge Base

Create a knowledge base that will use SharePoint remote sources at runtime. Note that we don't create a knowledge source upfront for remote SharePoint.

In [None]:
import requests
import json

KNOWLEDGE_BASE_NAME = "sharepoint-remote-kb"

url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}?api-version={API_VERSION}"

headers = {
    "api-key": EXISTING_SEARCH_API_KEY,
    "Content-Type": "application/json"
}

body = {
    "name": KNOWLEDGE_BASE_NAME,
    "description": "Knowledge base with SharePoint remote access",
    "knowledgeSources": [],  # No pre-defined sources for remote
    "models": [
        {
            "kind": "azureOpenAI",
            "azureOpenAIParameters": {
                "resourceUri": EXISTING_FOUNDRY_ENDPOINT,
                "deploymentId": EXISTING_CHAT_DEPLOYMENT,
                "modelName": EXISTING_CHAT_DEPLOYMENT,
                "apiKey": EXISTING_AZURE_OPENAI_KEY
            }
        }
    ],
    "outputMode": "answerSynthesis",
    "retrievalInstructions": "Retrieve relevant information from SharePoint documents.",
    "answerInstructions": "Provide clear answers with citations from SharePoint documents."
}

response = requests.put(url, headers=headers, json=body)
print(f"Status: {response.status_code}")
# Some responses (for example, 204 No Content) carry no JSON body
if response.content:
    print(json.dumps(response.json(), indent=2))

## Step 3: Query SharePoint at Runtime

The remote SharePoint source is specified at query time, not during knowledge base creation.

In [None]:
# Simple query with remote SharePoint
url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}/retrieve?api-version={API_VERSION}"

headers = {
    "api-key": EXISTING_SEARCH_API_KEY,
    "Content-Type": "application/json",
    "x-ms-query-source-authorization": SHAREPOINT_AUTH_TOKEN  # Required for SharePoint access
}

query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are the latest project updates?"
                }
            ]
        }
    ],
    "includeActivity": True,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "remote-sharepoint",  # Runtime name
            "kind": "remoteSharePoint",
            "includeReferences": True,
            "includeReferenceSourceData": True,
            "alwaysQuerySource": True,
            "rerankerThreshold": 0.45
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)
response.raise_for_status()  # Surface HTTP errors before parsing the body
result = response.json()

print("Answer:")
print(result["choices"][0]["message"]["content"])
print("\nReferences:")
for ref in result.get("activity", {}).get("references", []):
    print(f"- {ref.get('title', 'Unknown')}: {ref.get('url', 'No URL')}")
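The cell above indexes straight into `result["choices"][0]["message"]["content"]`, which raises a `KeyError` if the preview API returns a different shape. The sketch below is a defensive reader, not the service's documented contract: `extract_answer` and `extract_references` are hypothetical helpers, and the alternative `response[0].content[0].text` layout with top-level `references` is an assumption about how some API versions may structure the payload.

```python
from typing import Any


def extract_answer(result: dict[str, Any]) -> str:
    """Return the answer text from a retrieval response, or "" if none is found."""
    # Shape used by the cells in this notebook: choices[0].message.content
    choices = result.get("choices") or []
    if choices:
        content = choices[0].get("message", {}).get("content")
        if isinstance(content, str):
            return content

    # Assumed alternative shape: response[0].content[0].text
    messages = result.get("response") or []
    if messages:
        parts = messages[0].get("content") or []
        texts = [p.get("text", "") for p in parts if isinstance(p, dict)]
        return "\n".join(t for t in texts if t)

    return ""


def extract_references(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Return reference entries from the top level or from under `activity`."""
    refs = result.get("references")
    if isinstance(refs, list):
        return refs
    activity = result.get("activity")
    if isinstance(activity, dict):
        return activity.get("references", [])
    return []
```

With these in place, `print(extract_answer(result))` degrades to an empty string instead of raising when the payload does not match.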

## Step 4: Query with Filters

You can narrow results by passing a KQL (Keyword Query Language) expression in the `filterExpressionAddOn` parameter of the knowledge source.

In [None]:
# Query with content type filter
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Find meeting notes from the last month"
                }
            ]
        }
    ],
    "includeActivity": True,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "remote-sharepoint",
            "kind": "remoteSharePoint",
            "includeReferences": True,
            "includeReferenceSourceData": True,
            "alwaysQuerySource": True,
            "rerankerThreshold": 0.45,
            # KQL filter: property restrictions use Property:"value"; dates are unquoted
            "filterExpressionAddOn": 'ContentType:"Document" AND LastModifiedTime>2024-01-01'
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)
response.raise_for_status()  # Surface HTTP errors before parsing the body
result = response.json()

print("Answer:")
print(result["choices"][0]["message"]["content"])
print("\nReferences:")
for ref in result.get("activity", {}).get("references", []):
    print(f"- {ref.get('title', 'Unknown')}: Modified {ref.get('lastModifiedDateTime', 'N/A')}")
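The request bodies in Steps 3 and 4 differ only in the question text and the optional filter, so they can be produced by one function. This is a minimal sketch using only the fields that already appear in this notebook; `build_query_body` and its parameter names are hypothetical, not part of any SDK.

```python
from typing import Any, Optional


def build_query_body(
    question: str,
    filter_expression: Optional[str] = None,
    reranker_threshold: float = 0.45,
    source_name: str = "remote-sharepoint",
) -> dict[str, Any]:
    """Assemble a retrieve request body for a remote SharePoint source."""
    source_params: dict[str, Any] = {
        "knowledgeSourceName": source_name,
        "kind": "remoteSharePoint",
        "includeReferences": True,
        "includeReferenceSourceData": True,
        "alwaysQuerySource": True,
        "rerankerThreshold": reranker_threshold,
    }
    # Only send the filter key when a filter was actually supplied
    if filter_expression:
        source_params["filterExpressionAddOn"] = filter_expression

    return {
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ],
        "includeActivity": True,
        "knowledgeSourceParams": [source_params],
    }
```

A call such as `requests.post(url, headers=headers, json=build_query_body("Find meeting notes from the last month"))` then replaces the hand-written dictionary.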

## Step 5: Advanced Query Parameters

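Fine-tune retrieval with additional parameters: `maxRuntimeInSeconds` and `maxOutputSize` bound the request, `retrievalReasoningEffort` controls how much effort goes into retrieval reasoning, and `rerankerThreshold` sets how selective the results are.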

In [None]:
# Advanced query with reasoning effort
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is our company policy on remote work?"
                }
            ]
        }
    ],
    "maxRuntimeInSeconds": 90,
    "maxOutputSize": 5000,
    "retrievalReasoningEffort": {
        "kind": "medium"  # Options: minimal, low, medium
    },
    "includeActivity": True,
    "outputMode": "answerSynthesis",
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": "remote-sharepoint",
            "kind": "remoteSharePoint",
            "includeReferences": True,
            "includeReferenceSourceData": True,
            "alwaysQuerySource": True,
            "rerankerThreshold": 0.5  # Higher threshold = more selective results
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)
response.raise_for_status()  # Surface HTTP errors before parsing the body
result = response.json()

print("Answer:")
print(result["choices"][0]["message"]["content"])

if "activity" in result:
    print(f"\nActivity:")
    print(f"- Total references: {len(result['activity'].get('references', []))}")
    print(f"- Reasoning steps: {len(result['activity'].get('steps', []))}")
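`maxRuntimeInSeconds` limits the work on the service side, but `requests.post` itself waits indefinitely unless a `timeout` is passed. One way to keep the two aligned is to derive the client timeout from the server limit. The helper below is a hypothetical sketch; the 10-second margin and 5-second connect timeout are arbitrary assumptions to tune for your network.

```python
def client_timeout(
    max_runtime_seconds: int,
    margin_seconds: int = 10,
    connect_seconds: float = 5.0,
) -> tuple[float, float]:
    """Return a (connect, read) timeout pair for the `timeout` argument of requests."""
    if max_runtime_seconds <= 0:
        raise ValueError("max_runtime_seconds must be positive")
    # Give the service its full runtime budget plus a margin for network overhead
    read_seconds = float(max_runtime_seconds + margin_seconds)
    return (connect_seconds, read_seconds)
```

For the query above this would be used as `requests.post(url, headers=headers, json=query_body, timeout=client_timeout(90))`.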

## Common KQL Filter Examples

Here are some useful KQL filters for SharePoint queries:

```python
# KQL property restrictions take the form Property:"value"; dates are unquoted

# Filter by content type
'ContentType:"Document"'

# Filter by modified date
"LastModifiedTime>2024-01-01"

# Filter by file extension
'FileExtension:"docx" OR FileExtension:"pdf"'

# Filter by folder path (the Path property holds the full URL)
'Path:"https://yourtenant.sharepoint.com/sites/yoursite/Shared Documents/Projects"'

# Combine multiple conditions
'ContentType:"Document" AND LastModifiedTime>2024-01-01 AND FileExtension:"pdf"'

# Filter by author
'Author:"John Doe"'

# Filter by title
'Title:"Project Plan"'
```
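Filter strings like these are easy to get subtly wrong when assembled by hand, especially once `OR` groups are mixed with `AND`. The sketch below composes them from small pieces. All four function names are hypothetical, and it assumes your site's search schema exposes the managed properties you pass in (for example `FileExtension` and `LastModifiedTime`).

```python
from datetime import date


def kql_term(prop: str, value: str) -> str:
    """Build a property restriction such as FileExtension:"pdf"."""
    # Reject embedded quotes rather than guess at an escaping rule
    if '"' in value:
        raise ValueError("value must not contain double quotes")
    return f'{prop}:"{value}"'


def kql_any(prop: str, values: list[str]) -> str:
    """OR several values for one property, wrapped in parentheses."""
    return "(" + " OR ".join(kql_term(prop, v) for v in values) + ")"


def kql_modified_since(day: date) -> str:
    """Restrict to items modified on or after the given day (YYYY-MM-DD)."""
    return f"LastModifiedTime>={day.isoformat()}"


def kql_all(*clauses: str) -> str:
    """AND the non-empty clauses together."""
    return " AND ".join(c for c in clauses if c)


filter_expression = kql_all(
    kql_any("FileExtension", ["docx", "pdf"]),
    kql_modified_since(date(2024, 1, 1)),
)
print(filter_expression)
```

The parentheses around the `OR` group keep its meaning unambiguous when it is combined with the date clause.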

## Refreshing the Auth Token

The SharePoint auth token expires after about 1 hour. Run this cell to refresh it.

In [None]:
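# Refresh SharePoint auth token
result = subprocess.run(
    ["az", "account", "get-access-token",
     "--resource", "https://search.azure.com",
     "--query", "accessToken",
     "-o", "tsv"],
    capture_output=True,
    text=True,
    check=True  # Raise if the Azure CLI call fails
)

SHAREPOINT_AUTH_TOKEN = result.stdout.strip()
# The headers dict still holds the old token; update it so later queries use the new one
headers["x-ms-query-source-authorization"] = SHAREPOINT_AUTH_TOKEN
print("SharePoint auth token refreshed!")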
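Rather than refreshing on a fixed schedule, you can check how long the current token has left. The sketch below assumes the access token is a JWT whose payload carries a numeric `exp` claim (seconds since the Unix epoch). It reads the claim without verifying the signature, so it is suitable only for deciding when to refresh, never for trusting the token. `token_seconds_remaining` and `needs_refresh` are hypothetical helper names, and the 300-second skew is an arbitrary choice.

```python
import base64
import json
import time
from typing import Optional


def token_seconds_remaining(token: str, now: Optional[float] = None) -> float:
    """Return seconds until the JWT's `exp` claim; negative if already expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token is not a three-part JWT")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore padding before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return float(claims["exp"]) - current


def needs_refresh(
    token: str, skew_seconds: float = 300, now: Optional[float] = None
) -> bool:
    """True when the token expires within `skew_seconds`."""
    return token_seconds_remaining(token, now) <= skew_seconds
```

Calling `needs_refresh(SHAREPOINT_AUTH_TOKEN)` before each query lets you rerun the refresh cell only when it is actually needed.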

## Cleanup

Clean up the knowledge base when done.

In [None]:
# Delete knowledge base
url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}?api-version={API_VERSION}"
headers = {
    "api-key": EXISTING_SEARCH_API_KEY,
    "Content-Type": "application/json"
}
response = requests.delete(url, headers=headers)
print(f"Delete knowledge base: {response.status_code}")

## Summary

In this notebook, you learned how to:

1. Configure SharePoint Online authentication
2. Create a knowledge base for runtime queries
3. Query SharePoint documents at runtime (without ingestion)
4. Apply KQL filters to narrow search scope
5. Use advanced retrieval parameters
6. Refresh authentication tokens

## Key Differences: Remote vs. Indexed SharePoint

| Feature | Remote SharePoint | Indexed SharePoint |
|---------|-------------------|--------------------|
| **Data Location** | Queries SharePoint directly | Ingests to Azure AI Search |
| **Latency** | Higher (real-time query) | Lower (pre-indexed) |
| **Freshness** | Always current | Depends on ingestion schedule |
| **Cost** | Query-time costs | Storage + indexing costs |
| **Setup** | No ingestion required | Requires App Registration |
| **Best For** | Frequently changing docs | Static or scheduled updates |

## Next Steps

- Explore indexed SharePoint (notebook 03) for pre-ingested scenarios
- Combine remote SharePoint with other knowledge sources
- Implement caching strategies for frequently accessed documents
- Set up monitoring for SharePoint query performance