# 5. DMS Upload Demo

This notebook demonstrates how to upload a document into the local DMS mock using our `src/dms` service layer.

> **📖 For background**: We attach to the services started via `compose.yml` (PostgreSQL and Azurite) and perform a simple end‑to‑end upload and verification.

## What you will do
- **Initialize environment**: Resolve project root, import libraries, and create clients
- **Ensure schema**: Apply minimal tables if missing
- **Upload**: Push a sample PDF to blob storage via the DMS service
- **Verify**: Read back metadata and download the file

**Estimated time:** 2–3 minutes


## 1. Prerequisites

- Docker Desktop running
- `uv` installed and project dependencies synced with `uv sync`
- Services started with `docker compose up -d` from the project root
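Before running the notebook, you can quickly confirm that the compose services accept connections by probing their ports. This sketch assumes the default ports used later in the configuration cell (5432 for PostgreSQL, 10000 for Azurite Blob). `port_is_open` is a hypothetical helper and is not part of `src/dms`. An open port only shows that something is listening. It does not prove the service has finished initializing.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds (hypothetical helper)."""
    try:
        # create_connection resolves the host and opens a TCP socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts
        return False


# Ports assumed from the compose setup used in this notebook
for name, port in [("PostgreSQL", 5432), ("Azurite Blob", 10000)]:
    status = "listening" if port_is_open("localhost", port) else "NOT reachable"
    print(f"{name} on localhost:{port}: {status}")
```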


## 2. Environment configuration

### 2.1 Imports and project root detection

We reuse the root detection approach from Notebook 01 and add the project root to `sys.path`, so that `src.*` modules can be imported from inside this notebook.


In [9]:
# Standard-library and third-party imports
from pathlib import Path
import os
import sys
import psycopg2
from azure.storage.blob import BlobServiceClient

# Project root detection (same as Notebook 01)
current_directory = Path.cwd()
if current_directory.name == "5-dms-upload" and current_directory.parent.name == "notebooks":
    project_root_directory = current_directory.parent.parent
elif (current_directory / "compose.yml").exists():
    project_root_directory = current_directory
else:
    project_root_directory = None

if not project_root_directory or not (project_root_directory / "compose.yml").exists():
    raise RuntimeError("Cannot find project root (compose.yml not found). Run notebook from repo or notebooks folder.")

if str(project_root_directory) not in sys.path:
    sys.path.insert(0, str(project_root_directory))

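# Adapters and service (reload so edits under src/dms take effect without restarting the kernel)
import importlib
import src.dms.adapters as dms_adapters
import src.dms.service as dms_service_module
importlib.reload(dms_adapters)
# Reload the service after the adapters so it binds to the refreshed adapter classes
importlib.reload(dms_service_module)
from src.dms.adapters import AzureBlobStorageClient, PostgresMetadataRepository
from src.dms.service import DmsService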

print(f"Project root: {project_root_directory}")


Project root: /Users/markuskuehnle/Documents/projects/credit-ocr-system
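The cell above only recognizes two launch locations. A more general variant walks up from the current directory until it finds the `compose.yml` marker, so the notebook also works when started from other subfolders. The sketch below implements that as a hypothetical `find_project_root` helper. It is not taken from Notebook 01.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "compose.yml") -> Path:
    """Return the first directory at or above `start` that contains `marker` (hypothetical helper)."""
    start = start.resolve()
    # Path.parents yields ancestors from the nearest one up to the filesystem root
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise RuntimeError(f"Cannot find project root ({marker} not found above {start})")


# Assumed usage: project_root_directory = find_project_root(Path.cwd())
```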


### 2.2 Service clients

We connect to the running PostgreSQL and Azurite services defined in `compose.yml`, wrap each client in its adapter (`AzureBlobStorageClient`, `PostgresMetadataRepository`), and inject both adapters into `DmsService`.


In [10]:
# Config for compose-based services
POSTGRES_HOST: str = "localhost"
POSTGRES_PORT: int = 5432
POSTGRES_DBNAME: str = "dms_meta"
POSTGRES_USER: str = "dms"
POSTGRES_PASSWORD: str = "dms"

AZURITE_BLOB_PORT: int = 10000
AZURITE_ACCOUNT_NAME: str = "devstoreaccount1"
AZURITE_ACCOUNT_KEY: str = (
    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/"
    "K1SZFPTOtr/KBHBeksoGMGw=="
)
CONTAINER_NAME: str = "documents"

# Optionally reuse an existing connection string if set
existing_conn_str: str | None = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")
if existing_conn_str:
    connection_string: str = existing_conn_str
else:
    connection_string = (
        "DefaultEndpointsProtocol=http;"
        f"AccountName={AZURITE_ACCOUNT_NAME};"
        f"AccountKey={AZURITE_ACCOUNT_KEY};"
        f"BlobEndpoint=http://localhost:{AZURITE_BLOB_PORT}/{AZURITE_ACCOUNT_NAME};"
    )

# Initialize clients
from azure.core.exceptions import ResourceExistsError

blob_service_client: BlobServiceClient = BlobServiceClient.from_connection_string(connection_string)
container_client = blob_service_client.get_container_client(CONTAINER_NAME)
try:
    container_client.create_container()
except ResourceExistsError:
    # The container persists between runs; only "already exists" is expected here
    pass

pg_conn = psycopg2.connect(
    host=POSTGRES_HOST,
    port=POSTGRES_PORT,
    database=POSTGRES_DBNAME,
    user=POSTGRES_USER,
    password=POSTGRES_PASSWORD,
)

storage_client = AzureBlobStorageClient(blob_service_client)
metadata_repo = PostgresMetadataRepository(pg_conn)

dms_service = DmsService(storage_client=storage_client, metadata_repository=metadata_repo)

print("Environment ready: connected to Postgres and Azurite")


Environment ready: connected to Postgres and Azurite
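`DmsService` receives its storage and metadata adapters through the constructor, so you can swap the Azurite and PostgreSQL implementations for other backends without changing the service. The sketch below illustrates this with in-memory stand-ins. The `BlobStore` and `MetadataStore` protocols, the in-memory classes and `SketchDmsService` are all hypothetical. They only mirror what this notebook's outputs show: the `raw/<document_type>/<id>.pdf` blob path and some of the metadata keys. The real `src/dms` interfaces may differ.

```python
from __future__ import annotations

import hashlib
import uuid
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical storage port (the role AzureBlobStorageClient plays above)."""

    def put(self, path: str, data: bytes) -> None: ...

    def get(self, path: str) -> bytes: ...


class MetadataStore(Protocol):
    """Hypothetical metadata port (the role PostgresMetadataRepository plays above)."""

    def save(self, record: dict) -> None: ...

    def load(self, document_id: str) -> dict | None: ...


class InMemoryBlobStore:
    """Fake storage adapter: keeps blobs in a dict instead of Azurite."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def get(self, path: str) -> bytes:
        return self._blobs[path]


class InMemoryMetadataStore:
    """Fake metadata adapter: keeps rows in a dict instead of PostgreSQL."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def save(self, record: dict) -> None:
        self._rows[record["id"]] = record

    def load(self, document_id: str) -> dict | None:
        return self._rows.get(document_id)


class SketchDmsService:
    """Depends only on the two ports, never on Azurite or PostgreSQL directly."""

    def __init__(self, storage: BlobStore, metadata: MetadataStore) -> None:
        self.storage = storage
        self.metadata = metadata

    def upload(self, data: bytes, document_type: str, source_filename: str) -> str:
        document_id = str(uuid.uuid4())
        # Layout modeled on the blob_path printed in this notebook: raw/<type>/<id><ext>
        blob_path = f"raw/{document_type}/{document_id}{Path(source_filename).suffix}"
        self.storage.put(blob_path, data)
        self.metadata.save({
            "id": document_id,
            "blob_path": blob_path,
            "document_type": document_type,
            "source_filename": source_filename,
            "file_size": len(data),
            "hash_sha256": hashlib.sha256(data).hexdigest(),
        })
        return document_id

    def download(self, document_id: str) -> bytes | None:
        record = self.metadata.load(document_id)
        return self.storage.get(record["blob_path"]) if record else None


# The same service logic runs unchanged against the in-memory fakes
service = SketchDmsService(InMemoryBlobStore(), InMemoryMetadataStore())
doc_id = service.upload(b"%PDF-1.4 demo", "loan-application", "demo.pdf")
print(service.metadata.load(doc_id)["blob_path"])
```

Because the service only talks to the two ports, unit tests can use the fakes while the notebook passes in the Azurite and PostgreSQL adapters.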


## 3. Database schema

We apply the minimal schema from `database/schemas/schema.sql` so that the `documents` and `ocr_results` tables exist. The schema is idempotent, so this cell is safe to re-run.


In [11]:
schema_path = project_root_directory / "database" / "schemas" / "schema.sql"
if not schema_path.exists():
    raise RuntimeError(f"Schema file not found at {schema_path}")

with pg_conn.cursor() as cur:
    with open(schema_path, "r", encoding="utf-8") as f:
        cur.execute(f.read())
# psycopg2 opens a transaction implicitly; commit so the DDL persists
pg_conn.commit()

print("Schema ensured (documents, ocr_results)")


Schema ensured (documents, ocr_results)
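Here, idempotent means that running the schema a second time changes nothing and raises no error. `CREATE TABLE IF NOT EXISTS` provides exactly that behavior. The sketch below demonstrates the pattern using Python's built-in `sqlite3` as a stand-in for PostgreSQL, so it runs without the compose services. The two table definitions are simplified and hypothetical. They are not the contents of `schema.sql`.

```python
import sqlite3

# Hypothetical, simplified DDL; the real schema.sql targets PostgreSQL
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS documents (
    id TEXT PRIMARY KEY,
    blob_path TEXT NOT NULL,
    document_type TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS ocr_results (
    document_id TEXT REFERENCES documents(id),
    payload TEXT
);
"""


def apply_schema(conn: sqlite3.Connection) -> None:
    # executescript runs every statement in the script
    conn.executescript(SCHEMA_SQL)
    conn.commit()


demo_conn = sqlite3.connect(":memory:")
apply_schema(demo_conn)
apply_schema(demo_conn)  # second run is a no-op thanks to IF NOT EXISTS
tables = sorted(row[0] for row in demo_conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
))
print(tables)  # ['documents', 'ocr_results']
```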


## 4. Upload workflow

We upload the sample PDF through `DmsService`, then read back its metadata and download the blob. A byte count that matches the local file is a quick sign that the upload round-trips.


In [12]:
from datetime import datetime

# Resolve sample file
sample_pdf_path = project_root_directory / "data" / "loan_application.pdf"
if not sample_pdf_path.exists():
    raise FileNotFoundError(f"Sample PDF not found at {sample_pdf_path}")

DOCUMENT_TYPE: str = "loan-application"

# Upload
document_id: str = dms_service.upload_document(
    file_path=sample_pdf_path,
    document_type=DOCUMENT_TYPE,
    source_filename=sample_pdf_path.name,
    linked_entity="CREDIT_APPLICATION",
    linked_entity_id="CA-" + datetime.now().strftime("%Y%m%d%H%M%S"),
)

print("Uploaded:", document_id)

# Metadata
metadata = dms_service.get_document(document_id)
print("Metadata keys:", sorted(list(metadata.keys())) if metadata else None)

# Download bytes
downloaded = dms_service.download_document(document_id)
print("Downloaded bytes:", len(downloaded) if downloaded else None)


Uploaded: 7a385202-712d-461b-abbf-cfeb4de4b809
Metadata keys: ['blob_path', 'document_type', 'file_size', 'hash_sha256', 'id', 'linked_entity', 'linked_entity_id', 'mime_type', 'source_filename', 'textextraction_status', 'uploaded_at']
Downloaded bytes: 147568
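Comparing byte counts catches truncated downloads but misses corrupted content. The metadata includes a `hash_sha256` field, so a stronger check recomputes the SHA-256 digest of both the local file and the downloaded bytes and compares them with the stored value. This sketch assumes that `hash_sha256` holds a hex digest of the raw file bytes, which we have not confirmed against `src/dms`. `verify_roundtrip` is a hypothetical helper.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_roundtrip(local_path: Path, downloaded: bytes, stored_sha256: str | None) -> dict[str, bool]:
    """Compare the local file, the downloaded blob and the stored digest (hypothetical helper)."""
    local_digest = sha256_hex(local_path.read_bytes())
    return {
        "download_matches_local": sha256_hex(downloaded) == local_digest,
        # Case-insensitive in case the stored digest uses uppercase hex
        "stored_hash_matches": stored_sha256 is not None and stored_sha256.lower() == local_digest,
    }


# Assumed usage in this notebook:
# verify_roundtrip(sample_pdf_path, downloaded, metadata.get("hash_sha256"))
```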


## 5. Results summary

A compact summary of what we uploaded and what is stored.


In [13]:
summary = {
    "document_id": document_id,
    "blob_path": metadata.get("blob_path") if metadata else None,
    "downloaded_bytes": len(downloaded) if downloaded else 0,
}
summary

{'document_id': '7a385202-712d-461b-abbf-cfeb4de4b809',
 'blob_path': 'raw/loan-application/7a385202-712d-461b-abbf-cfeb4de4b809.pdf',
 'downloaded_bytes': 147568}

## 6. Cleanup

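Close the PostgreSQL connection and the Blob Storage client. The uploaded blob and its metadata row remain for inspection unless removed manually.

In [14]:
try:
    pg_conn.close()
    print("Closed PostgreSQL connection")
except Exception:
    pass

# Release the HTTP sockets held by the Blob Storage client
blob_service_client.close()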

Closed PostgreSQL connection
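If you prefer not to keep the demo artifacts, you can remove the blob and its metadata row explicitly. The sketch below is hypothetical and rests on several assumptions:

- The metadata lives in a `documents` table keyed by `id`, consistent with the metadata keys printed earlier.
- No `ocr_results` rows reference the document yet.
- You run it before the cleanup cell closes the connection.

We do not know whether `DmsService` offers a delete operation of its own. `ContainerClient.delete_blob` is part of the Azure Storage Blob SDK. The `container` and `conn` arguments are duck-typed, so the function can be exercised with fakes.

```python
from typing import Any


def delete_demo_document(container: Any, conn: Any, document_id: str, blob_path: str) -> None:
    """Remove one uploaded blob and its metadata row (hypothetical cleanup helper).

    `container` should behave like azure.storage.blob.ContainerClient and
    `conn` like a psycopg2 connection. Table and column names are assumptions.
    """
    # The Azure SDK raises ResourceNotFoundError if the blob is already gone
    container.delete_blob(blob_path)
    with conn.cursor() as cur:
        # Parameterized query; psycopg2 uses %s placeholders
        cur.execute("DELETE FROM documents WHERE id = %s", (document_id,))
    conn.commit()


# Assumed usage, run before pg_conn.close():
# delete_demo_document(container_client, pg_conn, document_id, metadata["blob_path"])
```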


## 7. Summary

You have successfully uploaded a document to the local DMS mock.

**What we did:**
- Connected to PostgreSQL (`dms_meta`) and Azurite (Blob Storage)
- Ensured the minimal database schema is present
- Uploaded `loan_application.pdf` through `DmsService`
- Verified metadata and downloaded the stored blob

**You're ready for:**
- Uploading additional documents
- Linking uploads to downstream OCR and LLM pipelines
- Extending the schema for richer document metadata and processing statuses