# Google Workflows Integration (process + rawDocument)

This notebook shows how to call the **Document AI `:process`** endpoint from **Google Workflows** with a **rawDocument (base64)** payload,
and how to read `document.text` from the response.


Flow:
1) Download the PDF from GCS → base64-encode it  
2) Generate the workflow YAML (args.pdf_b64 + args.mime_type)  
3) Deploy/Update  
4) Execute → `document.text` preview


## 0) Setup & Prerequisites

- Workflows API and Document AI API enabled
- Permissions:
  - `roles/workflows.admin` (deploy)
  - `roles/workflows.invoker` (run)
  - `roles/documentai.apiUser`
- Auth: `gcloud auth application-default login`


In [None]:
%pip -q install google-cloud-storage google-cloud-workflows google-auth

# A kernel restart may be required after installation.


ERROR: Could not find a version that satisfies the requirement google-cloud-workflows-executions (from versions: none)

[notice] A new release of pip is available: 25.1.1 -> 26.0
[notice] To update, run: pip install --upgrade pip
ERROR: No matching distribution found for google-cloud-workflows-executions
Note: you may need to restart the kernel to use updated packages.


## 1) Configuration

In [33]:
import json, base64
from google.cloud import storage
from google.cloud import workflows_v1
from google.cloud.workflows import executions_v1
from google.api_core.exceptions import AlreadyExists

project_id = "vertextraining-486212"

workflow_region = "europe-west2"
workflow_name = "docai-process-rawdocument"

docai_region = "eu"      # must match the processor's region
processor_id = "dda63aa0d93c03aa" # e.g. f0bd8dcffc752533
mime_type = "application/pdf"

gcs_uri = "gs://my-vertex-training-bucket/ornek_fatura.pdf"

gemini_region = "europe-west2"
gemini_model = "gemini-2.5-flash"

print("Config loaded.")


Config loaded.


## 2) Download the PDF from GCS and convert it to base64

In [29]:
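def download_gcs_bytes(gcs_uri: str) -> bytes:
    """Download the object at a gs://bucket/path URI and return its bytes."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_uri!r}")
    # Split "bucket/path/to/object" into the bucket name and the object path.
    bucket_name, _, blob_name = gcs_uri[len("gs://"):].partition("/")
    if not bucket_name or not blob_name:
        raise ValueError(f"URI must include a bucket and an object path: {gcs_uri!r}")

    client = storage.Client(project=project_id)
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    return blob.download_as_bytes()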

pdf_bytes = download_gcs_bytes(gcs_uri)
pdf_b64 = base64.b64encode(pdf_bytes).decode("utf-8")

print("✅ Downloaded bytes:", len(pdf_bytes))
print("✅ Base64 length:", len(pdf_b64))
print("Base64 preview:", pdf_b64[:80] + "...")


✅ Downloaded bytes: 57019
✅ Base64 length: 76028
Base64 preview: JVBERi0xLjcNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFuZyhl...
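
The sizes printed above follow from the base64 rule that every 3 input bytes become 4 output characters (with padding). Because the encoded PDF is passed as a workflow argument, it can be useful to estimate the encoded size before sending it. The sketch below is a minimal illustration; `base64_length` and `fits_argument_limit` are hypothetical helper names, and `limit_bytes` and `overhead_bytes` are caller-supplied assumptions rather than values taken from this notebook or from the Workflows quota documentation.

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


def fits_argument_limit(n_bytes: int, limit_bytes: int, overhead_bytes: int = 256) -> bool:
    """True if the encoded payload plus some JSON overhead stays within limit_bytes.

    Both limit_bytes and overhead_bytes are assumptions supplied by the caller.
    """
    return base64_length(n_bytes) + overhead_bytes <= limit_bytes


# Cross-check the formula against the real encoder on a small payload.
sample = b"x" * 10
assert base64_length(len(sample)) == len(base64.b64encode(sample))

# The 57019-byte PDF from the output above.
print(base64_length(57019))  # 76028
```

The printed value matches the `Base64 length` reported by the cell, so the same helper can be used to reject oversized files before an execution is created.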


## 3) Workflow YAML (Document AI :process + rawDocument)

Endpoint:
`https://{region}-documentai.googleapis.com/v1/projects/{project}/locations/{region}/processors/{processor}:process`
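
As a quick check of the template above, the same URL can be assembled in Python before it is embedded in the workflow. This is a minimal sketch; `docai_process_url` is a hypothetical helper name, and the sample values are the ones from the configuration cell.

```python
def docai_process_url(project: str, location: str, processor: str) -> str:
    """Build the Document AI :process URL from the endpoint template."""
    if not (project and location and processor):
        raise ValueError("project, location and processor must be non-empty")
    # The location appears twice: as the host prefix and in the resource path.
    host = f"{location}-documentai.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project}/locations/{location}"
        f"/processors/{processor}:process"
    )


url = docai_process_url("vertextraining-486212", "eu", "dda63aa0d93c03aa")
print(url)
```

Printing the URL makes a region mismatch between the host prefix and the `locations/` segment easy to spot.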


In [None]:
workflow_yaml = f"""main:
  params: [args]
  steps:
    - init:
        assign:
          - project: "{project_id}"
          - docai_region: "{docai_region}"
          - processor: "{processor_id}"
          - pdf_b64: ${{args.pdf_b64}}
          - mime_type: ${{default(args.mime_type, "{mime_type}")}}
          - gemini_region: "{gemini_region}"
          - gemini_model: "{gemini_model}"

    - docai_process:
        call: http.post
        args:
          url: ${{"https://" + docai_region + "-documentai.googleapis.com/v1/projects/" + project + "/locations/" + docai_region + "/processors/" + processor + ":process"}}
          auth:
            type: OAuth2
          headers:
            Content-Type: application/json
          body:
            rawDocument:
              content: ${{pdf_b64}}
              mimeType: ${{mime_type}}
        result: docai_resp

    - extract_text:
        assign:
          - doc_text: ${{default(docai_resp.body.document.text, "")}}
          - doc_text_short: ${{text.substring(doc_text, 0, 1200)}}

    - gemini_summarize:
        call: http.post
        args:
          url: ${{"https://" + gemini_region + "-aiplatform.googleapis.com/v1/projects/" + project + "/locations/" + gemini_region + "/publishers/google/models/" + gemini_model + ":generateContent"}}
          auth:
            type: OAuth2
          headers:
            Content-Type: application/json
          body:
            contents:
              - role: user
                parts:
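                  # Prompt (Turkish): "Summarize the text below in 5 bullet points. Keep technical terms, do not add invented information."
                  - text: '${{"Aşağıdaki metni 5 maddede özetle. Teknik terimleri koru, uydurma bilgi ekleme." + "\\n\\n" + doc_text_short}}'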
            generationConfig:
              maxOutputTokens: 512
              temperature: 0
        result: gemini_resp

    - done:
        return:
          text_preview: ${{text.substring(doc_text, 0, 800)}}
          summary: ${{gemini_resp.body.candidates[0].content.parts[0].text}}
"""

print(workflow_yaml[:1200])


main:
  params: [args]
  steps:
    - init:
        assign:
          - project: "vertextraining-486212"
          - docai_region: "eu"
          - processor: "dda63aa0d93c03aa"
          - pdf_b64: ${args.pdf_b64}
          - mime_type: ${default(args.mime_type, "application/pdf")}
          - gemini_region: "europe-west2"
          - gemini_model: "gemini-2.5-flash"

    - docai_process:
        call: http.post
        args:
          url: ${"https://" + docai_region + "-documentai.googleapis.com/v1/projects/" + project + "/locations/" + docai_region + "/processors/" + processor + ":process"}
          auth:
            type: OAuth2
          headers:
            Content-Type: application/json
          body:
            rawDocument:
              content: ${pdf_b64}
              mimeType: ${mime_type}
        result: docai_resp

    - extract_text:
        assign:
          - doc_text: ${default(docai_resp.body.document.text, "")}
          - doc_text_short: ${text.substring(doc_te
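
The workflow's `done` step indexes `gemini_resp.body.candidates[0].content.parts[0].text` directly, which raises an error if the model returns no candidates or no text part. When the same response body is handled in Python, a defensive lookup avoids that. The sketch below is illustrative only: `extract_summary` is a hypothetical helper, it assumes the response has the `candidates → content → parts → text` shape the workflow reads, and unlike the workflow it joins all text parts rather than taking only the first.

```python
from typing import Any, Optional


def extract_summary(body: dict[str, Any]) -> Optional[str]:
    """Return the text of the first candidate, or None if it is missing."""
    candidates = body.get("candidates") or []
    if not candidates:
        return None
    parts = (candidates[0].get("content") or {}).get("parts") or []
    # Keep only parts that actually carry text.
    texts = [p["text"] for p in parts if isinstance(p, dict) and "text" in p]
    return "".join(texts) if texts else None


sample_body = {"candidates": [{"content": {"parts": [{"text": "a"}, {"text": "b"}]}}]}
print(extract_summary(sample_body))
print(extract_summary({}))
```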

## 4) Deploy / Update Workflow

In [None]:
wf_client = workflows_v1.WorkflowsClient()

parent = f"projects/{project_id}/locations/{workflow_region}"
wf_path = wf_client.workflow_path(project_id, workflow_region, workflow_name)

workflow = workflows_v1.Workflow(
    name=wf_path,
    description="Document AI process (rawDocument base64) -> return document.text preview",
    source_contents=workflow_yaml,
)

try:
    op = wf_client.create_workflow(parent=parent, workflow=workflow, workflow_id=workflow_name)
    created = op.result()
    print("✅ Created:", created.name)
except AlreadyExists:
    op = wf_client.update_workflow(workflow=workflow, update_mask={"paths": ["source_contents", "description"]})
    updated = op.result()
    print("✅ Updated:", updated.name)


InvalidArgument: 400 main.yaml: parse error: Unterminated expression: ${"Aşağıdaki metni 5 maddede özetle. Teknik terimleri koru, uydurma bilgi ekleme.. Ensure the closing brace is present and wrap with single quotes (e.g. '${...}').
 3: main.yaml: parse error: Unterminated expression: ${"Aşağıdaki metni 5 maddede özetle. Teknik terimleri koru, uydurma bilgi ekleme.. Ensure the closing brace is present and wrap with single quotes (e.g. '${...}').
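
The parse error above is the kind of failure that appears when a `${...}` expression in the rendered YAML is split across lines, for example when `"\n\n"` inside a non-raw Python f-string turns into real line breaks. Two f-string details matter here: `${{...}}` renders as `${...}`, and `\\n` renders as the two characters `\n`. The following is a minimal sketch of a pre-deploy check; `find_unterminated_expressions` is a hypothetical helper, it only looks for expressions that do not close on their own line, and it is not a substitute for the Workflows parser.

```python
def find_unterminated_expressions(yaml_text: str) -> list[tuple[int, str]]:
    """Return (line_number, snippet) for each ${...} that does not close on its line."""
    problems = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        start = line.find("${")
        while start != -1:
            depth, quote, end = 0, None, None
            i = start + 1  # index of the opening "{"
            while i < len(line):
                ch = line[i]
                if quote:
                    if ch == "\\":
                        i += 1  # skip the escaped character
                    elif ch == quote:
                        quote = None
                elif ch in "\"'":
                    quote = ch
                elif ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0:
                        end = i
                        break
                i += 1
            if end is None:
                problems.append((lineno, line[start:start + 40]))
                break
            start = line.find("${", end + 1)
    return problems


# "\n\n" becomes two real line breaks and splits the expression.
bad = f"""- text: ${{"Summarize." + "\n\n" + doc_text_short}}"""
# "\\n\\n" stays on one line as the characters \n\n.
good = f"""- text: '${{"Summarize." + "\\n\\n" + doc_text_short}}'"""

print(find_unterminated_expressions(bad))
print(find_unterminated_expressions(good))
```

Running the check on `workflow_yaml` before calling `create_workflow` would surface this class of problem without a round trip to the API.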


## 5) Execute (run the workflow)

In [30]:
import time

exec_client = executions_v1.ExecutionsClient()
wf_full = f"projects/{project_id}/locations/{workflow_region}/workflows/{workflow_name}"

input_args = {
    "pdf_b64": pdf_b64,
    "mime_type": mime_type
}

# The execution argument is passed as a JSON-encoded string.
execution = executions_v1.Execution(argument=json.dumps(input_args))
op = exec_client.create_execution(parent=wf_full, execution=execution)
print("✅ Execution started:", op.name)

# Poll every 2 seconds until the execution reaches a terminal state.
while True:
    ex = exec_client.get_execution(name=op.name)
    state = ex.state.name
    if state in ["SUCCEEDED", "FAILED", "CANCELLED"]:
        print("State:", state)
        print("Result:", ex.result[:2000] if ex.result else None)
        print("Error:", ex.error)
        break
    time.sleep(2)


✅ Execution started: projects/540407658224/locations/europe-west2/workflows/docai-process-rawdocument/executions/b3659335-fe70-45e2-ba16-3ef73dfefce8
State: SUCCEEDED
Result: {"text_preview":"ÖRNEK FATURA\nDocument Al form key/val çıkarımı için etiketli örnek pdf.\nSatıcı\nAlıcı\nByteFlow Bilişim LTD\nVergi No: 123456789\nAdres: Ataşehir, İstanbul, TR\nE-posta: billing@byteflow.tr\nACME Corp\nVergi No: 987654321\nAdres: Levent, İstanbul, TR\nE-posta: finance@acme.corp\nFatura Bilgileri\nFatura No\nINV-2026-0001\nFatura Tarihi\n02.02.2026\nVade Tarihi\n16.02.2026\nSipariş No\nPO-1111\nPara Birimi\nTRY\nKalemler\nItem#\nAdet\nBirim Fiyat\nTutar\n1\nAçıklama\nRAG Danışmanlık\nVertex Eğitim\n1\n1.000,00\n1.000,00\n2\n1\n2.000,00\n2.000,00\n3\nDestek\n2\n250,00\n500,00\nToplam\nAra Toplam\n3.500,00\nKDV Oranı\n%20\nKDV Tutarı\n700,00\nGenel Toplam\n4.200,00\nNot: Ödeme açıklaması: INV-2026-0001. IBAN: TR00 0000 0000 0000 0000 0000 00\n"}
Error: 
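
The polling loop in the cell above runs until a terminal state appears and prints the result as a raw JSON string. A bounded variant with a decoded result might look like the sketch below. `wait_for_execution` and `parse_result` are hypothetical helpers, the timeout and interval defaults are arbitrary assumptions, and the state getter is injected so the logic can be exercised without a live execution.

```python
import json
import time
from typing import Any, Callable

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_execution(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until a terminal state is seen or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still {state} after {timeout_s} s")
        sleep(interval_s)


def parse_result(result_json: str) -> dict[str, Any]:
    """Decode the execution result, which is a JSON string, into a dict."""
    data = json.loads(result_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object as the workflow result")
    return data


# Simulated execution that becomes SUCCEEDED on the third poll.
states = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
final_state = wait_for_execution(lambda: next(states), sleep=lambda s: None)
preview = parse_result('{"text_preview": "sample text"}')["text_preview"]
print(final_state, preview)
```

With the notebook's client, the getter could be `lambda: exec_client.get_execution(name=op.name).state.name`, and `parse_result(ex.result)["text_preview"]` would give the extracted text as a Python string.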
