# Azure ML v2 — Batch Endpoint (ModelBatchDeployment)

This notebook creates a **"better", simpler variant** of the batch endpoint, i.e. a **ModelBatchDeployment** (no parallel job / `amlbi_main.py`, no `azureml-core` requirement).

What it does:
1) Connects to the Azure ML workspace (`MLClient`)
2) Creates **3 AmlCompute clusters** (PL/EN/DE) with `min_instances=0` → **no compute cost when no jobs are running**
3) Creates **1 Batch Endpoint**
4) Creates **3 ModelBatchDeployments** (PL/EN/DE), each pinned to a different cluster
5) (Optional) Runs 3 batch jobs in parallel (one per language)

## Why this is "better" in your case
- The `driver/amlbi_main.py` driver is not run
- You don't need the `azureml-core` (SDK v1) packages in the env
- Resources are isolated per language through separate clusters, still with `min=0`


## 0) Installation (if needed)
If you run this locally, uncomment the line below.

In [None]:
# %pip install -U azure-ai-ml azure-identity


## 1) Configuration
Fill in the workspace details and the model/env names. In production, pin specific model and environment versions.

In [None]:
SUBSCRIPTION_ID = "<SUBSCRIPTION_ID>"
RESOURCE_GROUP  = "<RESOURCE_GROUP>"
WORKSPACE_NAME  = "<WORKSPACE_NAME>"

# Batch endpoint
BATCH_ENDPOINT_NAME = "doc-classifier-batch"

# Environment (registered in AML)
ENV_NAME    = "model-x-env"
ENV_VERSION = "5"

# 3 models (registered in AML)
MODELS = {
    "pl": {"name": "model-x-pl", "version": "1"},
    "en": {"name": "model-x-en", "version": "1"},
    "de": {"name": "model-x-de", "version": "1"},
}

# Code + scoring
CODE_DIR = "./src"
SCORING_SCRIPT = "score.py"  # batch scoring script: must define init() and run(mini_batch)

# Compute
COMPUTE_SIZE = "Standard_DS3_v2"  # adjust to your workload
MIN_NODES = 0  # 0 -> no compute cost when no jobs are running
MAX_NODES_PER_LANG = {"pl": 4, "en": 4, "de": 4}

# How quickly idle VMs are released after jobs finish (seconds)
IDLE_TIME_BEFORE_SCALE_DOWN = 120

# Parallel scoring workers per VM
MAX_CONCURRENCY_PER_INSTANCE = 2
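Before creating any resources, a quick sanity check on the configuration can catch a missing language or an unpinned version early. A minimal sketch, assuming the dictionaries keep the shape used in the cell above; `validate_config` is a hypothetical helper, not part of the Azure ML SDK.

```python
from typing import Dict


def validate_config(
    models: Dict[str, Dict[str, str]],
    max_nodes: Dict[str, int],
    min_nodes: int,
    max_concurrency: int,
) -> None:
    """Raise ValueError if the per-language config is inconsistent (hypothetical helper)."""
    # Every language with a model needs a node limit, and vice versa
    if set(models) != set(max_nodes):
        raise ValueError(f"Language mismatch: models={sorted(models)}, max_nodes={sorted(max_nodes)}")
    for lang, spec in models.items():
        # Pin an explicit version instead of relying on "latest"
        version = spec.get("version", "")
        if not version or version == "latest":
            raise ValueError(f"{lang}: pin an explicit model version (got {version!r})")
        # A cluster must allow at least one node and at least min_nodes
        if max_nodes[lang] < max(min_nodes, 1):
            raise ValueError(f"{lang}: max nodes must be >= max(min_nodes, 1)")
    if max_concurrency < 1:
        raise ValueError("max_concurrency_per_instance must be >= 1")


# Example with the same shape as the configuration cell (passes silently)
validate_config(
    {"pl": {"name": "model-x-pl", "version": "1"}, "en": {"name": "model-x-en", "version": "1"}},
    {"pl": 4, "en": 4},
    min_nodes=0,
    max_concurrency=2,
)
```

In the notebook itself you would call `validate_config(MODELS, MAX_NODES_PER_LANG, MIN_NODES, MAX_CONCURRENCY_PER_INSTANCE)`.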


## 2) MLClient

In [None]:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id=SUBSCRIPTION_ID,
    resource_group_name=RESOURCE_GROUP,
    workspace_name=WORKSPACE_NAME,
)

print("Workspace:", ml_client.workspaces.get(WORKSPACE_NAME).name)


## 3) Create 3 AmlCompute clusters (min=0)
A separate cluster per language means DE jobs cannot take EN's nodes. With `min_instances=0` there is no cost while idle, apart from the `idle_time_before_scale_down` window after each job.

In [None]:
from azure.ai.ml.entities import AmlCompute

compute_names = {lang: f"cpu-batch-{lang}" for lang in MODELS.keys()}

for lang, cname in compute_names.items():
    compute = AmlCompute(
        name=cname,
        size=COMPUTE_SIZE,
        min_instances=MIN_NODES,
        max_instances=MAX_NODES_PER_LANG[lang],
        idle_time_before_scale_down=IDLE_TIME_BEFORE_SCALE_DOWN,
    )
    print(f"Creating/updating compute: {cname} (min={MIN_NODES}, max={MAX_NODES_PER_LANG[lang]})")
    ml_client.compute.begin_create_or_update(compute).result()

print("Compute ready:", compute_names)
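`min_instances=0` removes the cost of a permanently running node, but each node still stays allocated for `idle_time_before_scale_down` seconds after the last job. A rough sketch of that arithmetic, assuming billing is proportional to allocated node-seconds and ignoring node provisioning time; `billed_node_seconds` is a hypothetical helper.

```python
def billed_node_seconds(job_seconds: float, nodes: int, idle_time_before_scale_down: float) -> float:
    """Approximate allocated node-seconds for one job on a min=0 cluster.

    Assumes every node is allocated for the whole job plus the idle window
    before scale-down; provisioning/startup time is ignored.
    """
    return nodes * (job_seconds + idle_time_before_scale_down)


# One 10-minute job on 1 node with a 120 s scale-down window
print(billed_node_seconds(600, nodes=1, idle_time_before_scale_down=120))  # 720
```

A shorter `IDLE_TIME_BEFORE_SCALE_DOWN` trims this tail, at the price that jobs submitted a few minutes apart may each wait for a fresh node.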


## 4) Get the Environment

In [None]:
env = ml_client.environments.get(name=ENV_NAME, version=ENV_VERSION)
print("Env:", env.name, env.version)


## 5) Create 1 Batch Endpoint

In [None]:
from azure.ai.ml.entities import BatchEndpoint

endpoint = BatchEndpoint(
    name=BATCH_ENDPOINT_NAME,
    description="ModelBatchDeployment batch endpoint for PL/EN/DE with separate compute clusters (min=0).",
)

ml_client.batch_endpoints.begin_create_or_update(endpoint).result()
print("Batch endpoint ready:", BATCH_ENDPOINT_NAME)


## 6) Create 3 ModelBatchDeployments (this is the key step)
This path does NOT use the parallel driver (`amlbi_main.py`).

Each deployment:
- uses a different model (PL/EN/DE)
- uses the same `score.py`
- points to a different compute cluster
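For orientation, a minimal `score.py` sketch for a model batch deployment with folder input: `init()` runs once per worker process, and `run(mini_batch)` receives a list of input file paths and returns one result per file. `AZUREML_MODEL_DIR` is the environment variable Azure ML sets to the registered model's location; everything else here is an assumption, in particular the keyword rule, which is a hypothetical stand-in for the real PL/EN/DE model.

```python
import os
from pathlib import Path
from typing import List

model = None  # populated once per worker by init()


def init() -> None:
    """Called once per worker process before any mini-batch is scored."""
    global model
    # Azure ML exposes the registered model under AZUREML_MODEL_DIR.
    model_dir = os.environ.get("AZUREML_MODEL_DIR", ".")
    # Placeholder: replace with real loading (e.g. a pickled or transformers model from model_dir).
    model = {"model_dir": model_dir, "keyword": "invoice"}  # hypothetical stand-in


def run(mini_batch: List[str]) -> List[str]:
    """Score one mini-batch (a list of file paths); return one result row per file."""
    results = []
    for file_path in mini_batch:
        text = Path(file_path).read_text(encoding="utf-8")
        # Hypothetical keyword rule standing in for real model inference
        label = "invoice" if model["keyword"] in text.lower() else "other"
        results.append(f"{Path(file_path).name},{label}")
    return results
```

Loading the model in `init()` rather than in `run()` keeps the expensive step to once per worker instead of once per mini-batch.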


In [None]:
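from azure.ai.ml.entities import (
    CodeConfiguration,
    ModelBatchDeployment,
    ModelBatchDeploymentSettings,
)

deployment_names = {lang: f"deploy-{lang}" for lang in MODELS.keys()}

for lang, spec in MODELS.items():
    model = ml_client.models.get(name=spec["name"], version=spec["version"])

    dep = ModelBatchDeployment(
        name=deployment_names[lang],
        endpoint_name=BATCH_ENDPOINT_NAME,
        model=model,
        environment=env,
        # code_configuration takes a CodeConfiguration object, not a plain dict
        code_configuration=CodeConfiguration(code=CODE_DIR, scoring_script=SCORING_SCRIPT),
        compute=compute_names[lang],
        # Job-level settings (nodes per job, workers per node) go into ModelBatchDeploymentSettings
        settings=ModelBatchDeploymentSettings(
            instance_count=1,
            max_concurrency_per_instance=MAX_CONCURRENCY_PER_INSTANCE,
        ),
    )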

    print(f"Creating/updating deployment {dep.name}: model={model.name}:{model.version}, compute={compute_names[lang]}")
    ml_client.batch_deployments.begin_create_or_update(dep).result()

print("Deployments ready:", deployment_names)
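How these settings translate into parallelism: for a folder input, the files are split into mini-batches of `mini_batch_size` files (one `run()` call each), and a job runs up to `instance_count × max_concurrency_per_instance` `run()` calls at once. The deployment above leaves `mini_batch_size` at its default. A sketch of the arithmetic only; `plan_batches` is a hypothetical helper that ignores scheduling details.

```python
import math


def plan_batches(n_files: int, mini_batch_size: int, instance_count: int, max_concurrency: int) -> dict:
    """Rough parallelism plan for one batch job (hypothetical helper)."""
    mini_batches = math.ceil(n_files / mini_batch_size)  # each mini-batch = one run() call
    workers = instance_count * max_concurrency  # run() calls that can execute simultaneously
    waves = math.ceil(mini_batches / workers)  # rounds of run() calls, if mini-batches take equal time
    return {"mini_batches": mini_batches, "workers": workers, "waves": waves}


# 1,000 files, 10 files per mini-batch, 1 node, 2 workers per node
print(plan_batches(1000, mini_batch_size=10, instance_count=1, max_concurrency=2))
# {'mini_batches': 100, 'workers': 2, 'waves': 50}
```

With `instance_count=1`, each job uses a single node even though the cluster allows up to `MAX_NODES_PER_LANG[lang]`; the spare nodes only serve additional concurrent jobs for the same language. Raising `instance_count` spreads one job across more nodes.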


## 7) (Optional) Default deployment
If you sometimes submit jobs without `deployment_name`, set a default. When routing by language you usually pass `deployment_name` anyway.

In [None]:
endpoint = ml_client.batch_endpoints.get(BATCH_ENDPOINT_NAME)
# The default deployment is set through endpoint.defaults
endpoint.defaults.deployment_name = deployment_names["en"]
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()
print("Default deployment:", endpoint.defaults.deployment_name)
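When routing by language, a small lookup can choose the deployment explicitly and fall back to the endpoint default only when you allow it. A sketch; `pick_deployment` is a hypothetical helper, and returning `None` stands for omitting `deployment_name`, in which case the endpoint's default deployment handles the job.

```python
from typing import Dict, Optional


def pick_deployment(lang: str, deployments: Dict[str, str], allow_default: bool = False) -> Optional[str]:
    """Return the deployment name for a language.

    None means "omit deployment_name" so the endpoint default is used;
    this only happens when allow_default=True.
    """
    key = lang.strip().lower()
    if key in deployments:
        return deployments[key]
    if allow_default:
        return None
    raise KeyError(f"No deployment for language {lang!r}; known: {sorted(deployments)}")


deployments = {"pl": "deploy-pl", "en": "deploy-en", "de": "deploy-de"}
print(pick_deployment("DE", deployments))                      # deploy-de
print(pick_deployment("fr", deployments, allow_default=True))  # None
```

Failing loudly by default avoids silently sending, say, German documents to the EN model through the default deployment.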


## 8) (Optional) Run 3 jobs in parallel (PL+EN+DE)
This submits 3 independent batch jobs. Because each deployment uses its own cluster, the jobs don't compete for the same nodes.

Set the input/output paths in the datastore.
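The paths below follow the `azureml://datastores/<datastore>/paths/<path>` URI format. A small helper can build them per language instead of hard-coding each string; `datastore_uri` is hypothetical, and the `in/<lang>/` and `out/<lang>/` layout simply mirrors the example paths.

```python
def datastore_uri(datastore: str, *parts: str, folder: bool = True) -> str:
    """Build an azureml:// datastore URI from path segments (hypothetical helper)."""
    # Drop stray slashes and empty segments so "in/", "/pl" join cleanly
    path = "/".join(p.strip("/") for p in parts if p.strip("/"))
    uri = f"azureml://datastores/{datastore}/paths/{path}"
    return uri + "/" if folder else uri


langs = ["pl", "en", "de"]
inputs = {lang: datastore_uri("workspaceblobstore", "in", lang) for lang in langs}
print(inputs["pl"])  # azureml://datastores/workspaceblobstore/paths/in/pl/
print(datastore_uri("workspaceblobstore", "out", "pl", "predictions.csv", folder=False))
# azureml://datastores/workspaceblobstore/paths/out/pl/predictions.csv
```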


In [None]:
from azure.ai.ml import Input, Output
from azure.ai.ml.constants import AssetTypes

INPUTS = {
    "pl": "azureml://datastores/workspaceblobstore/paths/in/pl/",
    "en": "azureml://datastores/workspaceblobstore/paths/in/en/",
    "de": "azureml://datastores/workspaceblobstore/paths/in/de/",
}

# A model deployment writes its results to a single file (predictions.csv by default),
# so its one output, named "score", points at a file path
OUTPUTS = {
    "pl": "azureml://datastores/workspaceblobstore/paths/out/pl/predictions.csv",
    "en": "azureml://datastores/workspaceblobstore/paths/out/en/predictions.csv",
    "de": "azureml://datastores/workspaceblobstore/paths/out/de/predictions.csv",
}

submitted = {}

for lang in MODELS.keys():
    # invoke() submits the batch job and returns without waiting for it to finish;
    # the job name is generated by the service
    job = ml_client.batch_endpoints.invoke(
        endpoint_name=BATCH_ENDPOINT_NAME,
        deployment_name=deployment_names[lang],
        inputs={"input_data": Input(type=AssetTypes.URI_FOLDER, path=INPUTS[lang])},
        outputs={"score": Output(type=AssetTypes.URI_FILE, path=OUTPUTS[lang])},
    )
    submitted[lang] = job.name
    print("Submitted:", lang, job.name)

print("All jobs submitted:", submitted)


## 9) Job status
After the jobs finish, nodes are released automatically once `idle_time_before_scale_down` elapses (because `min=0`).

In [None]:
def show_job(job_name: str):
    # Batch endpoint jobs are regular Azure ML jobs, so they are read through ml_client.jobs
    j = ml_client.jobs.get(job_name)
    print(f"{job_name}: status={j.status}")

# for lang, job_name in submitted.items():
#     show_job(job_name)
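To wait for all three jobs instead of checking them by hand, a polling loop over the job status is enough. A sketch: `wait_for_jobs` is a hypothetical helper that takes a status function so it can be tried without Azure, and the terminal set (`Completed`, `Failed`, `Canceled`) is assumed to cover the usual final states.

```python
import time
from typing import Callable, Dict, Iterable

TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}  # assumed terminal job states


def wait_for_jobs(
    get_status: Callable[[str], str],
    job_names: Iterable[str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> Dict[str, str]:
    """Poll until every job reaches a terminal status; return {job_name: final_status}."""
    pending = set(job_names)
    final: Dict[str, str] = {}
    deadline = time.monotonic() + timeout_seconds
    while pending:
        # sorted() copies the set, so discarding inside the loop is safe
        for name in sorted(pending):
            status = get_status(name)
            if status in TERMINAL_STATUSES:
                final[name] = status
                pending.discard(name)
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still running after {timeout_seconds}s: {sorted(pending)}")
        time.sleep(poll_seconds)
    return final
```

In the notebook: `final = wait_for_jobs(lambda n: ml_client.jobs.get(n).status, submitted.values())`.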
