# Getting Started: MCP Tool Knowledge Source (GitHub)

This notebook demonstrates how to create a knowledge source using Model Context Protocol (MCP) tools to query external services. In this example, we'll use GitHub's MCP server to search issues and pull requests.

## What You'll Learn

- Understand MCP Tool knowledge sources
- Create a knowledge source pointing to GitHub's MCP server
- Configure authentication headers for MCP tools
- Query GitHub issues and PRs through the knowledge base
- Use MCP tools at runtime for dynamic external data

## Prerequisites

- Azure subscription
- Existing Azure AI Foundry project (see foundry-knowledge-blob-storage.ipynb for deployment)
- Existing Azure AI Search service
- GitHub Personal Access Token (for accessing GitHub API)
- Python environment with the `requests` library

## Architecture Overview

```
GitHub MCP Server → [Runtime Query] → Knowledge Base → Retrieval API
        ↓
   search_issues tool
        ↓
  GitHub Issues API
```

**Note:** MCP Tool sources query external services at runtime. No data is ingested or stored in Azure AI Search.

## What is MCP (Model Context Protocol)?

MCP is a standardized protocol for connecting AI models to external tools and data sources. MCP Tool knowledge sources allow you to:

- Query external APIs during retrieval
- Access real-time data (GitHub issues, web search, etc.)
- Extend knowledge bases with dynamic content
- Skip data ingestion and storage entirely
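MCP runs over JSON-RPC 2.0. A client invokes a server tool by sending a `tools/call` request that names the tool and passes its arguments. The sketch below builds such a message for the `search_issues` tool used in this notebook. Two things are assumptions: the `query` argument name (the real schema comes from the server's `tools/list` response), and the idea that the knowledge base issues a call of this shape, because this notebook does not document the service's internals.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC matches each response to its request by id
_request_ids = itertools.count(1)

def build_tools_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical arguments: the "query" key is assumed, not taken from the server's schema
request = build_tools_call(
    "search_issues",
    {"query": "repo:microsoft/TypeScript is:issue type inference"},
)
print(json.dumps(request, indent=2))
```

A full MCP client would first perform the `initialize` handshake and call `tools/list` to discover each tool's input schema before sending calls like this one.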

**Status:** Private Preview (as of API version 2025-11-01-preview)

## Step 1: Get GitHub Personal Access Token

To query GitHub's API, you need a Personal Access Token (PAT).
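Before you wire the token into a query, check that it carries the `repo` or `public_repo` scope. For classic PATs, GitHub's REST API reports the granted scopes in the `X-OAuth-Scopes` response header. Fine-grained tokens do not send this header, so the check only applies to classic tokens. The sketch below uses only the standard library. The helper names are illustrative. `fetch_token_scopes` makes a live call to `https://api.github.com/user`, while the other two helpers are pure functions.

```python
import urllib.request

def parse_oauth_scopes(header_value: str) -> set:
    """Split GitHub's comma-separated X-OAuth-Scopes header into a set of scope names."""
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}

def token_can_read_repos(scopes: set) -> bool:
    """'repo' covers private and public repositories; 'public_repo' covers public ones only."""
    return "repo" in scopes or "public_repo" in scopes

def fetch_token_scopes(token: str) -> set:
    """Call GET /user with the token and return the scopes GitHub reports (classic PATs only).

    urlopen raises urllib.error.HTTPError (for example 401) if the token is invalid.
    """
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_oauth_scopes(response.headers.get("X-OAuth-Scopes", ""))

# Usage (makes a network call):
# scopes = fetch_token_scopes(GITHUB_TOKEN)
# print(scopes, token_can_read_repos(scopes))
```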

In [None]:
# How to get a GitHub Personal Access Token:
# 1. Go to https://github.com/settings/tokens
# 2. Click "Generate new token" -> "Generate new token (classic)"
# 3. Give it a descriptive name (e.g., "Azure AI Knowledge Base")
# 4. Select scopes:
#    - repo (private and public repositories)
#    - public_repo (for public repositories only)
# 5. Click "Generate token"
# 6. Copy the token (you won't see it again!)

print("GitHub Personal Access Token Setup:")
print("1. Visit: https://github.com/settings/tokens")
print("2. Generate a new classic token")
print("3. Select scopes: 'repo' or 'public_repo'")
print("4. Copy the token and paste it below")
print("\nToken format: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

In [None]:
# GitHub configuration
import os
GITHUB_TOKEN = os.environ.get("GITHUB_TOKEN", "ghp_your_github_personal_access_token_here")  # Set GITHUB_TOKEN or replace the placeholder

# GitHub MCP server endpoint
GITHUB_MCP_SERVER_URL = "https://api.githubcopilot.com/mcp/"  # GitHub's hosted (remote) MCP server; verify against GitHub's docs
GITHUB_MCP_TOOL_NAME = "search_issues"  # MCP tool name

print(f"GitHub MCP configured for tool: {GITHUB_MCP_TOOL_NAME}")

## Step 2: Configure Azure Resources

In [None]:
import requests
import json

# Existing resources (from foundry-knowledge-blob-storage.ipynb or your own)
EXISTING_SEARCH_ENDPOINT = "https://<your-search-service>.search.windows.net"
EXISTING_SEARCH_API_KEY = "<your-search-api-key>"
EXISTING_FOUNDRY_ENDPOINT = "https://<your-foundry-project>.services.ai.azure.com/api/projects/<project-name>"
EXISTING_AZURE_OPENAI_KEY = "<your-api-key>"
EXISTING_CHAT_DEPLOYMENT = "gpt-4o-mini"

# API version
API_VERSION = "2025-11-01-preview"

print("Azure resources configured")

## Step 3: Create MCP Tool Knowledge Source

Create a knowledge source that points to GitHub's MCP server.

**Note:** MCP Tool knowledge sources are created but not ingested. They're used at runtime during queries.
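Knowledge sources and knowledge bases are both created with the same create-or-update pattern: a PUT to `/{collection}/{name}?api-version=...`. A small request builder captures that pattern. The sketch below uses the standard library only, and the helper names are illustrative. The status mapping (201 created, 200 or 204 updated) is an assumption carried over from how other Azure AI Search resources behave. It is not confirmed for this preview API.

```python
import json
import urllib.request

def build_put_request(endpoint: str, collection: str, name: str, body: dict,
                      api_key: str, api_version: str) -> urllib.request.Request:
    """Build a create-or-update PUT for /{collection}/{name} on a search service."""
    url = f"{endpoint.rstrip('/')}/{collection}/{name}?api-version={api_version}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="PUT",
    )

def interpret_put_status(status_code: int) -> str:
    """Map a PUT status code to an outcome (assumed Azure AI Search convention)."""
    if status_code == 201:
        return "created"
    if status_code in (200, 204):
        return "updated"  # a 204 carries no body, so don't parse JSON from it
    return "failed"

# Usage (sends the request):
# req = build_put_request(EXISTING_SEARCH_ENDPOINT, "knowledgeSources",
#                         KNOWLEDGE_SOURCE_NAME, body, EXISTING_SEARCH_API_KEY, API_VERSION)
# with urllib.request.urlopen(req) as resp:
#     print(interpret_put_status(resp.status))
```

A PUT is idempotent, so you can rerun the cell after changing the definition without deleting the source first.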

In [None]:
KNOWLEDGE_SOURCE_NAME = "github-issues-mcp"

url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeSources/{KNOWLEDGE_SOURCE_NAME}?api-version={API_VERSION}"

headers = {
    "api-key": EXISTING_SEARCH_API_KEY,
    "Content-Type": "application/json"
}

body = {
    "name": KNOWLEDGE_SOURCE_NAME,
    "kind": "mcpTool",
    "description": "Search GitHub issues and pull requests using MCP",
    "mcpToolParameters": {
        "serverURL": GITHUB_MCP_SERVER_URL,
        "toolName": GITHUB_MCP_TOOL_NAME
    }
}

response = requests.put(url, headers=headers, json=body)
print(f"Status: {response.status_code}")

# 201 = created, 200/204 = updated; a 204 has no body, so only parse JSON when one is present
if response.status_code in (200, 201, 204):
    if response.content:
        print(json.dumps(response.json(), indent=2))
    print("\n✅ MCP Tool knowledge source created or updated")
else:
    print(f"\n❌ Failed to create knowledge source: {response.text}")

## Step 4: Create Knowledge Base

Create a knowledge base that includes the MCP tool source.
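A knowledge base lists its sources by name only. The same body shape therefore serves this single-source base and the hybrid example later in the notebook. The builder below is a sketch. The function name is illustrative, and the field names are copied from the request body in the next cell. Leaving out the instruction fields when they are unset is an assumption about which fields the API treats as optional.

```python
def make_knowledge_base_body(name, description, source_names, model_parameters,
                             retrieval_instructions=None, answer_instructions=None):
    """Assemble a knowledge base definition in the shape used by this notebook (illustrative helper)."""
    body = {
        "name": name,
        "description": description,
        # Sources are referenced by name; their definitions live under /knowledgeSources
        "knowledgeSources": [{"name": source_name} for source_name in source_names],
        "models": [{"kind": "azureOpenAI", "azureOpenAIParameters": dict(model_parameters)}],
        "outputMode": "answerSynthesis",
    }
    if retrieval_instructions:
        body["retrievalInstructions"] = retrieval_instructions
    if answer_instructions:
        body["answerInstructions"] = answer_instructions
    return body

# Placeholder model settings mirroring the Step 2 variables
model_parameters = {
    "resourceUri": "https://<your-foundry-project>.services.ai.azure.com/api/projects/<project-name>",
    "deploymentId": "gpt-4o-mini",
    "modelName": "gpt-4o-mini",
    "apiKey": "<your-api-key>",
}
single = make_knowledge_base_body(
    "github-kb", "Knowledge base with GitHub MCP tool integration",
    ["github-issues-mcp"], model_parameters,
    retrieval_instructions="Search GitHub for relevant issues and pull requests.",
)
hybrid = make_knowledge_base_body(
    "hybrid-docs-github-kb", "Combine internal docs with live GitHub issues",
    ["internal-docs-blob", "github-issues-mcp"], model_parameters,
)
```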

In [None]:
KNOWLEDGE_BASE_NAME = "github-kb"

url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}?api-version={API_VERSION}"

body = {
    "name": KNOWLEDGE_BASE_NAME,
    "description": "Knowledge base with GitHub MCP tool integration",
    "knowledgeSources": [
        {
            "name": KNOWLEDGE_SOURCE_NAME
        }
    ],
    "models": [
        {
            "kind": "azureOpenAI",
            "azureOpenAIParameters": {
                "resourceUri": EXISTING_FOUNDRY_ENDPOINT,
                "deploymentId": EXISTING_CHAT_DEPLOYMENT,
                "modelName": EXISTING_CHAT_DEPLOYMENT,
                "apiKey": EXISTING_AZURE_OPENAI_KEY
            }
        }
    ],
    "outputMode": "answerSynthesis",
    "retrievalInstructions": "Search GitHub for relevant issues and pull requests.",
    "answerInstructions": "Provide helpful summaries of GitHub issues with links."
}

response = requests.put(url, headers=headers, json=body)
print(f"Status: {response.status_code}")

# 201 = created, 200/204 = updated; a 204 has no body, so only parse JSON when one is present
if response.status_code in (200, 201, 204):
    if response.content:
        print(json.dumps(response.json(), indent=2))
    print("\n✅ Knowledge base created or updated")
else:
    print(f"\n❌ Failed to create knowledge base: {response.text}")

## Step 5: Query GitHub Issues via MCP Tool

Query the knowledge base with MCP tool authentication headers.
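Steps 5 to 7 send nearly identical request bodies. They differ only in the question and a few tuning fields. The GitHub token travels with each request in `knowledgeSourceParams[].headers` and is not stored in the knowledge source definition. The sketch below factors that shared shape into one function. The function name and keyword defaults are illustrative, and the field names are taken from the query cells in this notebook.

```python
def build_retrieve_body(question, source_name, auth_headers, reasoning_effort=None,
                        max_runtime_seconds=None, reranker_threshold=None):
    """Build a retrieve request body for one MCP tool source (illustrative helper)."""
    source_params = {
        "knowledgeSourceName": source_name,
        "kind": "mcpTool",
        "headers": dict(auth_headers),  # copy, so later edits to the caller's dict don't leak in
        "includeReferences": True,
        "includeReferenceSourceData": True,
        "alwaysQuerySource": True,
    }
    if reranker_threshold is not None:
        source_params["rerankerThreshold"] = reranker_threshold
    body = {
        "messages": [{"role": "user", "content": [{"type": "text", "text": question}]}],
        "includeActivity": True,
        "knowledgeSourceParams": [source_params],
    }
    # Optional tuning fields are only sent when explicitly set
    if reasoning_effort is not None:
        body["retrievalReasoningEffort"] = {"kind": reasoning_effort}
    if max_runtime_seconds is not None:
        body["maxRuntimeInSeconds"] = max_runtime_seconds
    return body

body = build_retrieve_body(
    "Find recent issues in the microsoft/TypeScript repository related to type inference",
    "github-issues-mcp",
    {"Authorization": "Bearer ghp_example"},  # placeholder token
    reasoning_effort="medium",
    max_runtime_seconds=45,
    reranker_threshold=0.3,
)
```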

In [None]:
# Query for GitHub issues
url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}/retrieve?api-version={API_VERSION}"

query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Find recent issues in the microsoft/TypeScript repository related to type inference"
                }
            ]
        }
    ],
    "includeActivity": True,
    "retrievalReasoningEffort": {
        "kind": "medium"
    },
    "maxRuntimeInSeconds": 45,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": KNOWLEDGE_SOURCE_NAME,
            "kind": "mcpTool",
            "headers": {
                "Authorization": f"Bearer {GITHUB_TOKEN}"  # GitHub auth
            },
            "includeReferences": True,
            "includeReferenceSourceData": True,
            "alwaysQuerySource": True,
            "rerankerThreshold": 0.3
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)

if response.status_code == 200:
    result = response.json()
    print("✅ Query successful!\n")
    print("Answer:")
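    # The synthesized answer is the first text part of the first "response" message
    print(result["response"][0]["content"][0]["text"])
    
    print("\nReferences:")
    # "activity" is a list of activity records (see Step 7); references are a separate top-level list
    for ref in result.get("references", []):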
        print(f"- {ref.get('title', 'Unknown')}: {ref.get('url', 'No URL')}")
else:
    print(f"❌ Query failed: {response.text}")

## Step 6: Query with Different Repositories

Search issues across different repositories.
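The answer-printing code in these cells depends on the exact response shape, and that shape may change during the preview. A defensive extractor avoids a `KeyError` if the shape differs. The sketch below tries a `response` message list first, which is the shape assumed here for the retrieve API. It then falls back to an OpenAI-style `choices` list. It also assumes that references arrive as a top-level list with optional `title` and `url` fields. The helper names are illustrative.

```python
def extract_answer(result: dict) -> str:
    """Return the synthesized answer text, or "" if none is found (illustrative helper)."""
    # Assumed retrieve shape: {"response": [{"content": [{"type": "text", "text": ...}]}]}
    for message in result.get("response", []):
        for part in message.get("content", []):
            if part.get("type") == "text":
                return part.get("text", "")
    # Fallback: OpenAI-style chat completion shape
    choices = result.get("choices", [])
    if choices:
        return choices[0].get("message", {}).get("content", "")
    return ""

def extract_references(result: dict) -> list:
    """Return (title, url) pairs, assuming references are a top-level list."""
    return [
        (ref.get("title", "Unknown"), ref.get("url", "No URL"))
        for ref in result.get("references", [])
    ]

# Usage after a successful query:
# print(extract_answer(response.json()))
# for title, link in extract_references(response.json()):
#     print(f"- {title}: {link}")
```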

In [None]:
# Query for Azure SDK issues
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are the open issues in Azure/azure-sdk-for-python related to authentication?"
                }
            ]
        }
    ],
    "includeActivity": True,
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": KNOWLEDGE_SOURCE_NAME,
            "kind": "mcpTool",
            "headers": {
                "Authorization": f"Bearer {GITHUB_TOKEN}"
            },
            "includeReferences": True,
            "includeReferenceSourceData": True,
            "alwaysQuerySource": True
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)

if response.status_code == 200:
    result = response.json()
    print("Answer:")
    print(result["response"][0]["content"][0]["text"])
else:
    print(f"Query failed: {response.text}")

## Step 7: Query Pull Requests

Search for pull requests instead of issues.
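When `includeActivity` is enabled, the response carries a list of activity records. Totaling elapsed time per record type shows whether the MCP call or answer synthesis dominates latency. This matters more with the more thorough `high` reasoning effort used below. The sketch assumes each record has `type` and `elapsedMs` fields, as the next cell reads. Other fields are not relied on, and the helper name is illustrative.

```python
from collections import defaultdict

def summarize_activity(activity: list) -> dict:
    """Total elapsed milliseconds per activity type (assumes 'type' and 'elapsedMs' fields)."""
    totals = defaultdict(int)
    for record in activity:
        # Missing or null elapsedMs counts as 0
        totals[record.get("type", "unknown")] += record.get("elapsedMs") or 0
    return dict(totals)

# Usage after a successful query:
# for kind, ms in sorted(summarize_activity(result.get("activity", [])).items()):
#     print(f"{kind}: {ms} ms")
```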

In [None]:
# Query for recent PRs
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Find recently merged pull requests in the facebook/react repository that improved performance"
                }
            ]
        }
    ],
    "includeActivity": True,
    "retrievalReasoningEffort": {
        "kind": "high"  # More thorough search
    },
    "knowledgeSourceParams": [
        {
            "knowledgeSourceName": KNOWLEDGE_SOURCE_NAME,
            "kind": "mcpTool",
            "headers": {
                "Authorization": f"Bearer {GITHUB_TOKEN}"
            },
            "includeReferences": True,
            "includeReferenceSourceData": True,
            "alwaysQuerySource": True
        }
    ]
}

response = requests.post(url, headers=headers, json=query_body)

if response.status_code == 200:
    result = response.json()
    print("Answer:")
    print(result["response"][0]["content"][0]["text"])
    
    # Show activity details
    if "activity" in result:
        print("\nActivity:")
        for activity in result["activity"]:
            if activity.get("type") == "mcpTool":
                print(f"- MCP Tool called: {activity.get('toolName', 'unknown')}")
                print(f"  Elapsed: {activity.get('elapsedMs', 0)}ms")
else:
    print(f"Query failed: {response.text}")

## Alternative MCP Authentication Methods

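Authentication headers are supplied with each query in `knowledgeSourceParams`, so the right pattern depends on what each MCP server expects. The source names `exa-search-mcp` and `custom-mcp` below are placeholders: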
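Tokens and API keys end up inside the request body, so avoid printing those bodies verbatim while debugging. The sketch below pairs a small header factory for the two common schemes with a redaction helper. Both names are illustrative. The set of sensitive header names is an assumption and should be extended for any custom auth headers a server uses.

```python
SENSITIVE_HEADERS = {"authorization", "x-api-key", "api-key"}  # assumed list; extend as needed

def make_auth_headers(scheme: str, secret: str) -> dict:
    """Return MCP request headers for a supported auth scheme (illustrative helper)."""
    if scheme == "bearer":
        return {"Authorization": f"Bearer {secret}"}
    if scheme == "api_key":
        return {"x-api-key": secret}
    raise ValueError(f"Unsupported auth scheme: {scheme}")

def redact_headers(headers: dict) -> dict:
    """Return a copy that is safe to print, with sensitive header values masked."""
    return {
        name: ("***" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }

# Usage: log what is being sent without leaking the token
# print(redact_headers(make_auth_headers("bearer", GITHUB_TOKEN)))
```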

In [None]:
# Example 1: API Key authentication (like Exa)
mcp_params_api_key = {
    "knowledgeSourceName": "exa-search-mcp",
    "kind": "mcpTool",
    "headers": {
        "x-api-key": "your-exa-api-key"
    },
    "includeReferences": True
}

# Example 2: Bearer token authentication (like GitHub)
mcp_params_bearer = {
    "knowledgeSourceName": "github-issues-mcp",
    "kind": "mcpTool",
    "headers": {
        "Authorization": f"Bearer {GITHUB_TOKEN}"
    },
    "includeReferences": True
}

# Example 3: Custom headers
mcp_params_custom = {
    "knowledgeSourceName": "custom-mcp",
    "kind": "mcpTool",
    "headers": {
        "x-custom-header": "custom-value",
        "x-tenant-id": "tenant-123"
    },
    "includeReferences": True
}

print("MCP authentication patterns:")
print("1. API Key: x-api-key header")
print("2. Bearer Token: Authorization header")
print("3. Custom: Any custom headers needed")

## MCP Tool Knowledge Source Use Cases

MCP tools are ideal for:

### 1. Real-Time External Data
- GitHub issues and PRs
- Jira tickets
- ServiceNow incidents
- Live API data

### 2. Web Search Integration
- Exa AI search
- Bing search API
- Custom search engines

### 3. Dynamic Knowledge Sources
- Stock prices
- Weather data
- News feeds
- Social media

### 4. Custom Business Systems
- CRM data
- ERP systems
- Internal APIs
- Legacy databases

## Combining MCP Tools with Other Sources

You can combine MCP tools with ingested sources for hybrid knowledge bases.
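A hybrid knowledge base only works if every source it names already exists. A pre-flight check like the sketch below can catch a typo, or a missing `internal-docs-blob` source, before the PUT. The helper name is illustrative. How you get the list of existing source names is left open because this notebook does not show the response shape for listing `/knowledgeSources`.

```python
def missing_sources(kb_body: dict, existing_source_names) -> list:
    """Return source names referenced by a knowledge base body that don't exist yet."""
    existing = set(existing_source_names)
    return [
        entry["name"]
        for entry in kb_body.get("knowledgeSources", [])
        if entry["name"] not in existing
    ]

hybrid_sources = {"knowledgeSources": [{"name": "internal-docs-blob"}, {"name": "github-issues-mcp"}]}
print(missing_sources(hybrid_sources, ["github-issues-mcp"]))
```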

In [None]:
# Example: Hybrid knowledge base with blob storage + GitHub MCP
hybrid_kb_body = {
    "name": "hybrid-docs-github-kb",
    "description": "Combine internal docs with live GitHub issues",
    "knowledgeSources": [
        {
            "name": "internal-docs-blob"  # Ingested blob storage source (assumed to already exist)
        },
        {
            "name": KNOWLEDGE_SOURCE_NAME  # MCP tool (GitHub)
        }
    ],
    "models": [
        {
            "kind": "azureOpenAI",
            "azureOpenAIParameters": {
                "resourceUri": EXISTING_FOUNDRY_ENDPOINT,
                "deploymentId": EXISTING_CHAT_DEPLOYMENT,
                "modelName": EXISTING_CHAT_DEPLOYMENT,
                "apiKey": EXISTING_AZURE_OPENAI_KEY
            }
        }
    ],
    "outputMode": "answerSynthesis",
    "retrievalInstructions": "Search internal docs first, then check GitHub for related issues.",
    "answerInstructions": "Provide answers from docs and cite relevant GitHub issues."
}

print("Hybrid Knowledge Base Example:")
print("- Source 1: Internal documentation (blob storage)")
print("- Source 2: GitHub issues (MCP tool)")
print("\nBenefit: Combine static docs with live external data")

## Cleanup

Clean up test resources.
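The cells below delete the knowledge base before the knowledge source. Presumably this avoids removing a source that a base still references. The sketch below keeps that order for any number of resources and treats "already gone" as success, so cleanup can be rerun safely. The `delete` callable and the helper name are hypothetical. The status conventions (204 deleted, 404 not found) are assumed from other Azure AI Search resources.

```python
def cleanup(delete, knowledge_bases, knowledge_sources):
    """Delete knowledge bases first, then the knowledge sources they reference.

    `delete(collection, name)` is a hypothetical callable that sends the DELETE
    and returns the HTTP status code.
    """
    results = {}
    for collection, names in (("knowledgeBases", knowledge_bases),
                              ("knowledgeSources", knowledge_sources)):
        for name in names:
            status = delete(collection, name)
            # Assumed convention: 204 = deleted, 404 = already gone; both count as clean
            ok = status in (200, 204, 404)
            results[f"{collection}/{name}"] = "ok" if ok else f"failed ({status})"
    return results

# Usage with the notebook's variables (sends real DELETE requests):
# results = cleanup(
#     lambda collection, name: requests.delete(
#         f"{EXISTING_SEARCH_ENDPOINT}/{collection}/{name}?api-version={API_VERSION}",
#         headers=headers,
#     ).status_code,
#     [KNOWLEDGE_BASE_NAME],
#     [KNOWLEDGE_SOURCE_NAME],
# )
```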

In [None]:
# Delete knowledge base
url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}?api-version={API_VERSION}"
response = requests.delete(url, headers=headers)
print(f"Delete knowledge base: {response.status_code}")

In [None]:
# Delete knowledge source
url = f"{EXISTING_SEARCH_ENDPOINT}/knowledgeSources/{KNOWLEDGE_SOURCE_NAME}?api-version={API_VERSION}"
response = requests.delete(url, headers=headers)
print(f"Delete knowledge source: {response.status_code}")

## Summary

In this notebook, you learned how to:

1. Create a GitHub Personal Access Token for API access
2. Create an MCP Tool knowledge source pointing to GitHub
3. Configure authentication headers for MCP tools
4. Query GitHub issues and pull requests via the knowledge base
5. Use different authentication patterns (API key, Bearer token, custom headers)
6. Combine MCP tools with other knowledge sources

## Key Differences: MCP Tool vs. Other Sources

| Feature | MCP Tool | Ingested Sources |
|---------|----------|------------------|
| **Data Location** | External API | Azure AI Search |
| **Query Latency** | Higher (external call at query time) | Lower (pre-indexed) |
| **Data Freshness** | Real-time | Scheduled updates |
| **Storage Cost** | None | Yes |
| **Setup** | Minimal | Ingestion required |
| **Best For** | Live data, external APIs | Static content |

## MCP Tool Knowledge Source Parameters

When querying with MCP tools, you can configure:

```python
mcp_source_params = {
    "knowledgeSourceName": "your-mcp-source",
    "kind": "mcpTool",
    "headers": {  # Authentication headers (required by most services); send only what the server expects
        "Authorization": "Bearer <token>",  # or
        "x-api-key": "<api-key>",  # or
        "custom-header": "<value>",
    },
    "includeReferences": True,
    "includeReferenceSourceData": True,
    "alwaysQuerySource": True,  # Always call the MCP tool
    "rerankerThreshold": 0.3,  # Minimum reranker score for a result to be kept
}
```

## Important Notes

**Private Preview Status:**
- MCP Tool knowledge sources are in private preview
- API may change before general availability
- Contact Azure support for access

**Authentication:**
- Most MCP tools require authentication
- Use appropriate headers for each service
- Secure your tokens/API keys

**Performance:**
- MCP tool queries are typically slower than ingested sources, because each query calls the external service
- External API rate limits apply
- Use `maxRuntimeInSeconds` to control timeout
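Each MCP-backed query ultimately hits GitHub's API, so a burst of queries can run into rate limits. These are assumed to surface as HTTP 429 (or 503) from whichever layer is throttling. Client-side retry with exponential backoff and jitter is a common mitigation. The sketch below is generic and is not a feature of the service. The retryable status set and the helper name are assumptions. If a `Retry-After` header is present, honoring it would be a sensible refinement.

```python
import random
import time

RETRYABLE_STATUSES = {429, 503}  # assumed set of transient failures

def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `send()` (returns an object with .status_code), retrying transient failures.

    Waits roughly base_delay * 2**attempt plus up to base_delay of random jitter
    between attempts, and returns the last response once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        last_attempt = attempt == max_attempts - 1
        if response.status_code not in RETRYABLE_STATUSES or last_attempt:
            return response
        sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# Usage with the notebook's variables:
# response = call_with_backoff(lambda: requests.post(url, headers=headers, json=query_body))
```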

## Next Steps

- Explore other MCP tools (Exa AI, custom APIs)
- Build hybrid knowledge bases (MCP + ingested)
- Implement caching for frequently accessed MCP data
- Create custom MCP servers for your business logic
- Set up monitoring for MCP tool performance
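One option for caching frequently accessed MCP data is a minimal in-memory TTL cache keyed by question text. It trades freshness for latency, so keep the TTL short for live data such as open issues. The class and function names below are illustrative. `retrieve` stands for whatever function calls the retrieve endpoint. Keying only on the question assumes that a single caller and a single token use the cache.

```python
import time

class TTLCache:
    """In-memory cache whose entries expire `ttl_seconds` after they are stored."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for testing
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # stale entry: drop it and report a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

def cached_retrieve(cache: TTLCache, question: str, retrieve):
    """Return a cached answer for `question`, calling `retrieve(question)` on a miss."""
    answer = cache.get(question)
    if answer is None:
        answer = retrieve(question)
        cache.put(question, answer)
    return answer
```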