# Getting Started: Azure Blob Storage Knowledge Base

This notebook demonstrates a complete end-to-end workflow for creating a Knowledge Base using Azure Blob Storage as the data source.

## What You'll Learn

- Deploy Azure resources (Storage Account, AI Foundry Project, AI Search)
- Upload documents to Azure Blob Storage
- Create a Knowledge Source from Blob Storage
- Build a Knowledge Base
- Query the Knowledge Base for intelligent retrieval
- Use existing resources (alternative approach)

## Prerequisites

- Azure subscription
- Azure CLI installed and logged in (`az login`), with the `ml` extension (`az extension add --name ml`)
- Python environment (for example Jupyter) with the `requests` library

## Architecture Overview

```
Azure Blob Storage (Documents) → Knowledge Source → Knowledge Base → Retrieval API
                                        ↓
                    AI Foundry (Embedding + Chat Models)
```

## Step 1: Deploy Azure Resources

We'll deploy:
1. **Azure Blob Storage Account** (westus2) - to store documents
2. **Azure AI Foundry Project** (westus2) - for embedding and chat models
3. **Azure AI Search** (westcentralus) - for knowledge base operations

In [None]:
# Configuration - Update these values
SUBSCRIPTION_ID = "<your-subscription-id>"
RESOURCE_GROUP = "rg-knowledge-demo"
LOCATION_FOUNDRY = "westus2"  # For Foundry and Storage
LOCATION_SEARCH = "westcentralus"  # For AI Search

# Resource names
STORAGE_ACCOUNT = "stkbdemo" + "<unique-suffix>"  # Must be globally unique, 3-24 characters, lowercase letters and digits only
STORAGE_CONTAINER = "documents"
FOUNDRY_PROJECT = "foundry-kb-demo"
SEARCH_SERVICE = "srch-kb-demo" + "<unique-suffix>"  # Must be globally unique; lowercase letters, digits, and hyphens

# Model deployments
EMBEDDING_DEPLOYMENT = "text-embedding-3-small"
CHAT_DEPLOYMENT = "gpt-4o-mini"
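Storage account names are the easiest value to get wrong here. A minimal local check (the helper name `is_valid_storage_account_name` is hypothetical) catches a leftover `<unique-suffix>` placeholder or uppercase letters before the CLI call fails. It only checks the naming rules; global uniqueness still has to be confirmed with `az storage account check-name --name <name>`.

```python
import re

# Storage account naming rules: 3-24 characters, lowercase letters and digits only.
STORAGE_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` follows the storage account naming rules.

    Does not check global uniqueness; Azure does that at creation time.
    """
    return STORAGE_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_storage_account_name("stkbdemo01"))               # True
print(is_valid_storage_account_name("stkbdemo<unique-suffix>"))  # False: placeholder left in
```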

### 1.1 Create Resource Group

In [None]:
!az group create \
  --name {RESOURCE_GROUP} \
  --location {LOCATION_FOUNDRY}

### 1.2 Deploy Azure Blob Storage Account

In [None]:
# Create storage account
!az storage account create \
  --name {STORAGE_ACCOUNT} \
  --resource-group {RESOURCE_GROUP} \
  --location {LOCATION_FOUNDRY} \
  --sku Standard_LRS \
  --kind StorageV2 \
  --allow-blob-public-access false

In [None]:
# Get storage account connection string
import subprocess
import json

result = subprocess.run(
    ["az", "storage", "account", "show-connection-string",
     "--name", STORAGE_ACCOUNT,
     "--resource-group", RESOURCE_GROUP,
     "--output", "json"],
    capture_output=True,
    text=True,
    check=True  # raise CalledProcessError instead of failing later in json.loads
)
BLOB_CONNECTION_STRING = json.loads(result.stdout)["connectionString"]
print(f"Storage connection string obtained: {BLOB_CONNECTION_STRING[:50]}...")
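Each `subprocess.run` plus `json.loads` pair in this notebook fails with an unhelpful `JSONDecodeError` when the `az` command errors, because stdout is then empty. A small wrapper (the name `run_json` is hypothetical, not part of any Azure SDK) surfaces the CLI's own error message instead. This is a sketch; it assumes the command prints JSON to stdout on success.

```python
import json
import subprocess
from typing import Any


def run_json(cmd: list) -> Any:
    """Run a command that prints JSON to stdout and return the parsed result.

    Raises RuntimeError carrying the command's stderr when it exits non-zero.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{' '.join(cmd)} failed:\n{result.stderr.strip()}")
    return json.loads(result.stdout)
```

For example, the cell above reduces to `BLOB_CONNECTION_STRING = run_json(["az", "storage", "account", "show-connection-string", "--name", STORAGE_ACCOUNT, "--resource-group", RESOURCE_GROUP, "--output", "json"])["connectionString"]`, and the same pattern applies to the key lookups later in Step 1.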

In [None]:
# Create blob container
!az storage container create \
  --name {STORAGE_CONTAINER} \
  --account-name {STORAGE_ACCOUNT} \
  --connection-string "{BLOB_CONNECTION_STRING}"

### 1.3 Deploy Azure AI Foundry Project with Models

In [None]:
# Create AI Foundry hub
HUB_NAME = f"hub-{FOUNDRY_PROJECT}"

!az ml workspace create \
  --kind hub \
  --resource-group {RESOURCE_GROUP} \
  --name {HUB_NAME} \
  --location {LOCATION_FOUNDRY}

In [None]:
# Create AI Foundry project
!az ml workspace create \
  --kind project \
  --resource-group {RESOURCE_GROUP} \
  --name {FOUNDRY_PROJECT} \
  --location {LOCATION_FOUNDRY} \
  --hub-id /subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.MachineLearningServices/workspaces/{HUB_NAME}

In [None]:
# Azure OpenAI models (text-embedding-3-small, gpt-4o-mini) are not hosted on the hub
# workspace itself; they are deployed to an Azure AI Services (Foundry) resource.
AI_SERVICES_ACCOUNT = "ais-kb-demo" + "<unique-suffix>"  # Must be globally unique; also used as the custom subdomain

!az cognitiveservices account create \
  --name {AI_SERVICES_ACCOUNT} \
  --resource-group {RESOURCE_GROUP} \
  --location {LOCATION_FOUNDRY} \
  --kind AIServices \
  --sku S0 \
  --custom-domain {AI_SERVICES_ACCOUNT} \
  --yes

# Azure OpenAI endpoint that AI Search calls for embeddings and chat completions
FOUNDRY_ENDPOINT = f"https://{AI_SERVICES_ACCOUNT}.openai.azure.com"
print(f"Foundry endpoint: {FOUNDRY_ENDPOINT}")

In [None]:
# Deploy embedding model (text-embedding-3-small)
# Note: adjust model version, SKU, and capacity to what your region and quota support
!az cognitiveservices account deployment create \
  --name {AI_SERVICES_ACCOUNT} \
  --resource-group {RESOURCE_GROUP} \
  --deployment-name {EMBEDDING_DEPLOYMENT} \
  --model-name text-embedding-3-small \
  --model-version 1 \
  --model-format OpenAI \
  --sku-name GlobalStandard \
  --sku-capacity 10

In [None]:
# Deploy chat model (gpt-4o-mini)
!az cognitiveservices account deployment create \
  --name {AI_SERVICES_ACCOUNT} \
  --resource-group {RESOURCE_GROUP} \
  --deployment-name {CHAT_DEPLOYMENT} \
  --model-name gpt-4o-mini \
  --model-version 2024-07-18 \
  --model-format OpenAI \
  --sku-name GlobalStandard \
  --sku-capacity 10

In [None]:
# Get the API key for the AI Services resource that hosts the models
result = subprocess.run(
    ["az", "cognitiveservices", "account", "keys", "list",
     "--name", AI_SERVICES_ACCOUNT,
     "--resource-group", RESOURCE_GROUP,
     "--output", "json"],
    capture_output=True,
    text=True,
    check=True
)
keys = json.loads(result.stdout)
AZURE_OPENAI_KEY = keys["key1"]
print(f"API Key obtained: {AZURE_OPENAI_KEY[:20]}...")

### 1.4 Deploy Azure AI Search Service

In [None]:
# Create Azure AI Search service
!az search service create \
  --name {SEARCH_SERVICE} \
  --resource-group {RESOURCE_GROUP} \
  --location {LOCATION_SEARCH} \
  --sku basic

In [None]:
# Get search service endpoint and key
result = subprocess.run(
    ["az", "search", "service", "show",
     "--name", SEARCH_SERVICE,
     "--resource-group", RESOURCE_GROUP,
     "--output", "json"],
    capture_output=True,
    text=True,
    check=True  # raise CalledProcessError instead of failing later in json.loads
)
search_info = json.loads(result.stdout)
SEARCH_ENDPOINT = f"https://{search_info['name']}.search.windows.net"

result = subprocess.run(
    ["az", "search", "admin-key", "show",
     "--service-name", SEARCH_SERVICE,
     "--resource-group", RESOURCE_GROUP,
     "--output", "json"],
    capture_output=True,
    text=True,
    check=True  # raise CalledProcessError instead of failing later in json.loads
)
SEARCH_API_KEY = json.loads(result.stdout)["primaryKey"]

print(f"Search endpoint: {SEARCH_ENDPOINT}")
print(f"Search API key: {SEARCH_API_KEY[:20]}...")
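The cells above print a prefix of each key to confirm it was retrieved. Notebook output is often saved and shared, so printing 20 characters of an admin key is more than needed. A hypothetical `mask_secret` helper shows only the last few characters, which is enough to tell keys apart:

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Return a printable form of `secret` that reveals only its last `visible` characters."""
    if len(secret) <= visible:
        return "****"
    return "****" + secret[-visible:]


print(mask_secret("abcdefghijklmnop"))  # ****mnop
```

Usage in this notebook would be `print(f"Search API key: {mask_secret(SEARCH_API_KEY)}")`.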

## Step 2: Upload Sample Documents to Blob Storage

Let's create and upload some sample documents to our blob container.

In [None]:
# Create sample documents
import os

os.makedirs("sample_docs", exist_ok=True)

# Product documentation
with open("sample_docs/product_guide.txt", "w") as f:
    f.write("""
Product Guide - CloudMax Platform

CloudMax is a comprehensive cloud platform that provides scalable infrastructure,
managed services, and advanced analytics capabilities.

Key Features:
- Auto-scaling compute resources
- Managed database services (SQL, NoSQL)
- Built-in AI/ML capabilities
- Enterprise-grade security
- 99.99% uptime SLA

Getting Started:
1. Create a CloudMax account
2. Set up your first project
3. Deploy your application
4. Monitor and scale as needed
""")

# FAQ document
with open("sample_docs/faq.txt", "w") as f:
    f.write("""
Frequently Asked Questions

Q: How do I reset my password?
A: Click on 'Forgot Password' on the login page and follow the instructions sent to your email.

Q: What payment methods are accepted?
A: We accept all major credit cards, PayPal, and wire transfers for enterprise customers.

Q: Is there a free trial available?
A: Yes, we offer a 30-day free trial with $200 in credits.

Q: How do I contact support?
A: You can reach our support team at support@cloudmax.com or via the chat widget in your dashboard.
""")

# Pricing document
with open("sample_docs/pricing.txt", "w") as f:
    f.write("""
CloudMax Pricing

Starter Plan: $29/month
- 2 vCPUs
- 4GB RAM
- 50GB Storage
- Email support

Professional Plan: $99/month
- 8 vCPUs
- 16GB RAM
- 500GB Storage
- 24/7 chat support

Enterprise Plan: Custom pricing
- Unlimited resources
- Dedicated support team
- SLA guarantees
- Custom integrations
""")

print("Sample documents created successfully!")

In [None]:
# Upload documents to blob storage
!az storage blob upload-batch \
  --destination {STORAGE_CONTAINER} \
  --source ./sample_docs \
  --account-name {STORAGE_ACCOUNT} \
  --connection-string "{BLOB_CONNECTION_STRING}"

print("Documents uploaded to blob storage!")
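The `print` above runs whether or not the upload succeeded. One way to confirm it is to list the container with `az storage blob list --container-name documents --connection-string "..." --output json` (a list of objects with a `name` field) and compare against the local folder. The helper name `missing_uploads` is hypothetical, and the comparison assumes `upload-batch` names blobs by their path relative to `./sample_docs`, as it does for this flat folder.

```python
from pathlib import Path


def missing_uploads(local_dir: str, blob_listing: list) -> list:
    """Return names of files in `local_dir` that are absent from `blob_listing`.

    `blob_listing` is the parsed JSON from `az storage blob list --output json`.
    """
    uploaded = {blob["name"] for blob in blob_listing}
    return sorted(
        path.name
        for path in Path(local_dir).iterdir()
        if path.is_file() and path.name not in uploaded
    )
```

An empty result means every local file has a matching blob.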

## Step 3: Create Knowledge Source from Blob Storage

Now we'll create a knowledge source that ingests and chunks the documents from blob storage.
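Chunking runs inside the search service, and this notebook does not set its parameters. To make the idea concrete, here is a generic fixed-size splitter with overlap (hypothetical `chunk_text`). It is an illustration of the technique, not the service's own algorithm or chunk sizes.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list:
    """Split `text` into windows of `size` characters, each overlapping the previous by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary intact in at least one chunk, at the cost of embedding some text twice.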

In [None]:
import json
import requests

API_VERSION = "2025-11-01-preview"
KNOWLEDGE_SOURCE_NAME = "blob-docs-source"

# Create knowledge source
url = f"{SEARCH_ENDPOINT}/knowledgeSources/{KNOWLEDGE_SOURCE_NAME}?api-version={API_VERSION}"

headers = {
    "api-key": SEARCH_API_KEY,
    "Content-Type": "application/json"
}

body = {
    "name": KNOWLEDGE_SOURCE_NAME,
    "kind": "azureBlob",
    "description": "Product documentation from blob storage",
    "azureBlobParameters": {
        "connectionString": BLOB_CONNECTION_STRING,
        "containerName": STORAGE_CONTAINER,
        "folderPath": "",
        "isADLSGen2": False,
        "ingestionParameters": {
            "identity": None,
            "embeddingModel": {
                "kind": "azureOpenAI",
                "azureOpenAIParameters": {
                    "resourceUri": FOUNDRY_ENDPOINT,
                    "deploymentId": EMBEDDING_DEPLOYMENT,
                    "modelName": EMBEDDING_DEPLOYMENT,
                    "apiKey": AZURE_OPENAI_KEY
                }
            },
            "chatCompletionModel": {
                "kind": "azureOpenAI",
                "azureOpenAIParameters": {
                    "resourceUri": FOUNDRY_ENDPOINT,
                    "deploymentId": CHAT_DEPLOYMENT,
                    "modelName": CHAT_DEPLOYMENT,
                    "apiKey": AZURE_OPENAI_KEY
                }
            },
            "disableImageVerbalization": False,
            "contentExtractionMode": "minimal"
        }
    }
}

response = requests.put(url, headers=headers, json=body)
print(f"Status: {response.status_code}")
print(response.json())
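The same `azureOpenAI` model block appears twice in this request body and again in the knowledge base definition in Step 4. A small builder (hypothetical name `azure_openai_model`) keeps the copies consistent. It reuses exactly the field names this notebook already sends and, like the cells here, defaults `modelName` to the deployment name, which only holds when deployments are named after their models.

```python
from typing import Optional


def azure_openai_model(resource_uri: str, deployment: str, api_key: str,
                       model_name: Optional[str] = None) -> dict:
    """Build the {"kind": "azureOpenAI", ...} block used in the request bodies."""
    return {
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
            "resourceUri": resource_uri,
            "deploymentId": deployment,
            # Falls back to the deployment name, matching the notebook's convention
            "modelName": model_name or deployment,
            "apiKey": api_key,
        },
    }
```

For example, `"embeddingModel": azure_openai_model(FOUNDRY_ENDPOINT, EMBEDDING_DEPLOYMENT, AZURE_OPENAI_KEY)` produces the same block as the literal above.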

In [None]:
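# Check knowledge source status
import time

status_url = f"{SEARCH_ENDPOINT}/knowledgeSources/{KNOWLEDGE_SOURCE_NAME}/status?api-version={API_VERSION}"
TIMEOUT_SECONDS = 30 * 60  # give up after 30 minutes instead of polling forever
deadline = time.monotonic() + TIMEOUT_SECONDS

print("Waiting for ingestion to complete...")
while True:
    response = requests.get(status_url, headers=headers)
    response.raise_for_status()
    status = response.json()

    if status.get("status") == "succeeded":
        print("Ingestion completed successfully!")
        print(json.dumps(status, indent=2))
        break
    elif status.get("status") == "failed":
        print("Ingestion failed!")
        print(json.dumps(status, indent=2))
        break
    elif time.monotonic() > deadline:
        # Status field names can differ between API versions;
        # inspect the payload if the loop never reports success or failure.
        print("Timed out waiting for ingestion. Last status payload:")
        print(json.dumps(status, indent=2))
        break
    else:
        print(f"Status: {status.get('status', 'unknown')}")
        time.sleep(10)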

## Step 4: Create Knowledge Base

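Add the knowledge source to a knowledge base. The knowledge base uses the chat deployment to synthesize answers (`outputMode: answerSynthesis`) from the retrieved content.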

In [None]:
KNOWLEDGE_BASE_NAME = "product-kb"

url = f"{SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}?api-version={API_VERSION}"

body = {
    "name": KNOWLEDGE_BASE_NAME,
    "description": "Product documentation knowledge base",
    "knowledgeSources": [
        {
            "name": KNOWLEDGE_SOURCE_NAME
        }
    ],
    "models": [
        {
            "kind": "azureOpenAI",
            "azureOpenAIParameters": {
                "resourceUri": FOUNDRY_ENDPOINT,
                "deploymentId": CHAT_DEPLOYMENT,
                "modelName": CHAT_DEPLOYMENT,
                "apiKey": AZURE_OPENAI_KEY
            }
        }
    ],
    "outputMode": "answerSynthesis",
    "retrievalInstructions": "Provide accurate information from the product documentation.",
    "answerInstructions": "Provide clear, concise answers with relevant citations."
}

response = requests.put(url, headers=headers, json=body)
print(f"Status: {response.status_code}")
print(json.dumps(response.json(), indent=2))

## Step 5: Query the Knowledge Base

Test the knowledge base with sample queries.
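Each query below sends a single user message. The `messages` array can also carry earlier turns of a conversation; this sketch assumes the retrieve endpoint accepts `assistant` messages alongside `user` ones as chat history, which is worth checking against the API reference for the version in use. The helper name `build_retrieve_body` is hypothetical.

```python
def build_retrieve_body(turns: list, include_activity: bool = True) -> dict:
    """Build a retrieve request body from (role, text) pairs, oldest turn first."""
    return {
        "messages": [
            {"role": role, "content": [{"type": "text", "text": text}]}
            for role, text in turns
        ],
        "includeActivity": include_activity,
    }


followup_body = build_retrieve_body([
    ("user", "What's included in the Professional plan?"),
    ("assistant", "8 vCPUs, 16GB RAM, 500GB storage, and 24/7 chat support."),
    ("user", "How does that compare to the Starter plan?"),
])
print(len(followup_body["messages"]))  # 3
```

A single-turn call like the ones below is `build_retrieve_body([("user", "What are the key features of CloudMax?")])`.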

In [None]:
# Simple query
url = f"{SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}/retrieve?api-version={API_VERSION}"

query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are the key features of CloudMax?"
                }
            ]
        }
    ],
    "includeActivity": True
}

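response = requests.post(url, headers=headers, json=query_body)
response.raise_for_status()
result = response.json()

# The synthesized answer is the text content of the first message in result["response"]
print("Answer:")
print(result["response"][0]["content"][0]["text"])
print("\nReferences:")
# "references" is a top-level list; "activity" is a list of retrieval steps, not a dict
for ref in result.get("references", []):
    print(f"- {ref.get('type', 'unknown')} {ref.get('id', '')}")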

In [None]:
# Another query about pricing
query_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's included in the Professional plan?"
                }
            ]
        }
    ],
    "includeActivity": True
}

response = requests.post(url, headers=headers, json=query_body)
response.raise_for_status()
result = response.json()

print("Answer:")
print(result["response"][0]["content"][0]["text"])

## Alternative: Using Existing Resources

If you already have Azure resources deployed, skip Step 1, run the cell below with your own values, and then continue from Step 2.

In [None]:
# Use this cell if you have existing resources
# Run it instead of the Step 1 cells, then continue from Step 2

# Existing Storage Account
EXISTING_STORAGE_ACCOUNT = "<your-storage-account-name>"
EXISTING_BLOB_CONNECTION_STRING = "<your-connection-string>"
EXISTING_CONTAINER = "<your-container-name>"

# Existing AI Foundry Project
EXISTING_FOUNDRY_ENDPOINT = "https://<your-ai-services-resource>.openai.azure.com"  # Azure OpenAI endpoint of the resource hosting your model deployments
EXISTING_AZURE_OPENAI_KEY = "<your-api-key>"
EXISTING_EMBEDDING_DEPLOYMENT = "text-embedding-3-small"
EXISTING_CHAT_DEPLOYMENT = "gpt-4o-mini"

# Existing AI Search Service
EXISTING_SEARCH_ENDPOINT = "https://<your-search-service>.search.windows.net"
EXISTING_SEARCH_API_KEY = "<your-search-api-key>"

# Assign to main variables
STORAGE_ACCOUNT = EXISTING_STORAGE_ACCOUNT  # used by the upload cell in Step 2
BLOB_CONNECTION_STRING = EXISTING_BLOB_CONNECTION_STRING
STORAGE_CONTAINER = EXISTING_CONTAINER
FOUNDRY_ENDPOINT = EXISTING_FOUNDRY_ENDPOINT
AZURE_OPENAI_KEY = EXISTING_AZURE_OPENAI_KEY
EMBEDDING_DEPLOYMENT = EXISTING_EMBEDDING_DEPLOYMENT
CHAT_DEPLOYMENT = EXISTING_CHAT_DEPLOYMENT
SEARCH_ENDPOINT = EXISTING_SEARCH_ENDPOINT
SEARCH_API_KEY = EXISTING_SEARCH_API_KEY

print("Using existing resources!")
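Forgetting to replace one of the `<...>` placeholders in the cell above produces confusing errors much later, for example an invalid URL in Step 3. A quick check (hypothetical helper `find_placeholders`) lists any setting that still contains an angle-bracket placeholder:

```python
import re

# Matches template placeholders such as "<your-search-api-key>"
PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_placeholders(settings: dict) -> list:
    """Return the names of settings whose string values still contain a <placeholder>."""
    return [
        name
        for name, value in settings.items()
        if isinstance(value, str) and PLACEHOLDER.search(value)
    ]


print(find_placeholders({
    "SEARCH_ENDPOINT": "https://<your-search-service>.search.windows.net",
    "CHAT_DEPLOYMENT": "gpt-4o-mini",
}))  # ['SEARCH_ENDPOINT']
```

In the notebook, pass the real variables, e.g. `find_placeholders({"SEARCH_ENDPOINT": SEARCH_ENDPOINT, "SEARCH_API_KEY": SEARCH_API_KEY, "FOUNDRY_ENDPOINT": FOUNDRY_ENDPOINT})`, and expect an empty list before moving on.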

## Cleanup

When you're done, clean up the resources to avoid incurring costs.

In [None]:
# Delete knowledge base
url = f"{SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE_NAME}?api-version={API_VERSION}"
response = requests.delete(url, headers=headers)
print(f"Delete knowledge base: {response.status_code}")

In [None]:
# Delete knowledge source
url = f"{SEARCH_ENDPOINT}/knowledgeSources/{KNOWLEDGE_SOURCE_NAME}?api-version={API_VERSION}"
response = requests.delete(url, headers=headers)
print(f"Delete knowledge source: {response.status_code}")

In [None]:
# Optional: Delete the entire resource group (if you created new resources)
# WARNING: This will delete ALL resources in the resource group
# !az group delete --name {RESOURCE_GROUP} --yes --no-wait

## Summary

In this notebook, you learned how to:

1. Deploy Azure infrastructure (Storage, AI Foundry, AI Search)
2. Upload documents to Azure Blob Storage
3. Create a knowledge source from blob storage with automatic chunking and embedding
4. Build a knowledge base that combines knowledge sources
5. Query the knowledge base for intelligent retrieval with citations
6. Use existing Azure resources as an alternative

## Next Steps

- Explore other knowledge source types (SharePoint, OneLake, Search Index)
- Customize chunking and embedding parameters
- Add multiple knowledge sources to a single knowledge base
- Integrate with Azure AI Foundry agents for conversational experiences