# Demo of Translation Use Case with Lumigator SDK

### Pre-requisites
1. To use this notebook, you'll need to have Lumigator running. In a terminal, run:
```
git clone git@github.com:mozilla-ai/lumigator.git
cd lumigator
make start-lumigator-build
```

2. Follow the [README.md](https://github.com/mozilla-ai/lumigator/tree/main/notebooks#readme) to set up your Python environment. Then attach this notebook to the corresponding kernel.

### Lumigator Client
Check that the Lumigator backend is up and reachable from the client.

In [1]:
from lumigator_sdk.lumigator import LumigatorClient

In [2]:
LUMI_HOST = "localhost:8000"
client = LumigatorClient(api_host=LUMI_HOST)
print(f"Connection is: {client.health.healthcheck().status}")

Connection is: OK
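
If Lumigator is still starting up, the first health check may fail. The following is a minimal sketch of a retry wrapper, assuming the check raises an exception while the backend is unreachable. `wait_until_healthy` is a hypothetical helper written for this notebook, not part of the Lumigator SDK.

```python
import time


def wait_until_healthy(check, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call `check` until it returns without raising, or give up.

    `check` is any zero-argument callable, for example
    `lambda: client.health.healthcheck()` (assumption: it raises on failure).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except Exception as error:  # assumption: an unreachable backend raises
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # wait before the next attempt
    raise RuntimeError(f"Backend not healthy after {attempts} attempts") from last_error
```

With the client from the cell above, this could be called as `wait_until_healthy(lambda: client.health.healthcheck())`.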


### Dataset
This demo uses the `google/wmt24pp` dataset from Hugging Face Datasets, released as part of [WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects](https://arxiv.org/abs/2502.12404v1).

In [3]:
import pandas as pd
from datasets import load_dataset

pd.set_option("display.max_colwidth", None)

In [4]:
source_language = "English"
target_language = "German"
language_pair = "en-de_DE"
dataset = load_dataset("google/wmt24pp", language_pair, split="train")

In [5]:
df_translation = pd.DataFrame(dataset[1:])[["source", "target"]]
print(f"Loaded {len(df_translation)} sentence pairs")
df_translation.head()

Loaded 997 sentence pairs


Unnamed: 0,source,target
0,"Siso's depictions of land, water center new gallery exhibition",Sisos Darstellungen von Land und Wasser in neuer Ausstellung
1,"""People Swimming in the Swimming Pool"" from 2022 is one Vicente Siso artwork that will display at Tierra del Sol Gallery beginning Jan. 13. (photo courtesy of Vicente Siso)","„People Swimming in the Swimming Pool“ aus dem Jahr 2022 ist ein Kunstwerk von Vicente Siso, das ab dem 13. Januar in der Galerie „Tierra del Sol“ ausgestellt wird (Foto bereitgestellt von Vicente Siso)"
2,"Tierra del Sol is pleased to present ""Vicente Siso: Memories of the Land and Water"" at the new gallery location in West Hollywood. Siso has been an artist in the Studio Arts Program since 2012, this marks his debut solo exhibition. Siso was born 1962 in Madrid and raised between Venezuela, Trinidad and Miami; he moved with his family to Southern California in his early 20s.","„Vicente Siso: Memories of the Land and Water“ – das ist der Titel der Ausstellung im neuen Standort der Galerie „Tierra del Sol“ in West Hollywood. Siso ist seit 2012 als Künstler im Studio Arts Progam dabei und zeigt jetzt seine erste Solo-Ausstellung. Siso wurde 1962 in Madrid geboren und ist in Venezuela, Trinidad und Miami aufgewachsen. Anfang zwanzig zog er mit seiner Familie nach Südkalifornien."
3,"Masterfully working across subject matter, Siso has generated a prolific series of landscapes, portraits, and still-life works rendered in either acrylic, pastel, pencil or watercolor. Drawing from family portraits, his own reference photographs, and recollection, his colorful compositions demonstrate his range of interests and skill across media. Siso's tropical landscapes and seascapes reflect the geographies of his past, employing rich patterns and incorporating people to make meaningful connections between culture, memory and the environment. Siso titles his artworks in a mix of Spanish and English, signifying the celebrated and integral complexities of his life in Los Angeles County. ""Vicente Siso: Memories of the Land and Water"" opens on Saturday, Jan. 13, with a reception from 6-8 p.m. The exhibition is on view through Sunday, March 3.","In zahlreichen Serien von Landschaften, Porträts und Stillleben in Acryl, Pastell, Bleistift oder Aquarell zeigt Siso sein meisterhaftes Können. Inspiriert von Familienporträts, seinen eigenen Referenzfotografien und durch Erinnerungen, beweisen seine farbenfreudigen Kompositionen die ganze Bandbreite seiner Interessen und seines Könnens in unterschiedlichen Medien. Seine tropischen Landschaften und Seestücke zeigen die Orte seiner Vergangenheit in detaillierten Mustern. Auch Personen werden einbezogen, um bedeutungsvolle Verbindungen zwischen Kultur, Erinnerung und Natur zu kreieren. Siso betitelt seine Kunstwerke mit einer Mischung aus Spanisch und Englisch – ein Zeichen für die gefeierten und so wichtigen Komplexitäten seines Lebens im Los Angeles County. „Vicente Siso: Memories of the Land and Water“ wird am Samstag, dem 13. Januar, mit einem Empfang von 18 bis 20 Uhr eröffnet. Die Ausstellung wird bis Sonntag, den 3. März, gezeigt."
4,"The Tierra del Sol Gallery is located at 7414 Santa Monica Blvd. For information, visit tierradelsolgallery.org.","Galerie „Tierra del Sol“, 7414 Santa Monica Blvd. Weitere Informationen finden Sie unter tierradelsolgallery.org."


In [6]:
# Rename the columns to examples and ground_truth
df_translation = df_translation.rename(columns={"source": "examples", "target": "ground_truth"})
filename = "translation_eval_dataset.csv"
df_translation = df_translation.head(10)
df_translation.to_csv(filename, index=False)  # Save the file locally with 10 samples
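
Before uploading, it can help to confirm that the saved file really has the renamed columns. The sketch below checks the CSV header with the standard library only. It assumes Lumigator expects the `examples` and `ground_truth` columns, as the rename above suggests, and `missing_columns` is a hypothetical helper, not an SDK function.

```python
import csv
import io

# Column names taken from the rename step above
REQUIRED_COLUMNS = ("examples", "ground_truth")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in required if name not in header]
```

For example, `missing_columns(Path(filename).read_text())` should return an empty list for the file written above.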

### Upload Dataset

In [7]:
from pathlib import Path

from lumigator_schemas.datasets import DatasetFormat

# Upload the file that we created earlier
with Path(filename).open("r") as file:
    data = file.read()
dataset_response = client.datasets.create_dataset(dataset=data, format=DatasetFormat.JOB)
dataset_id = dataset_response.id
print(f"Dataset uploaded and has ID: {dataset_id}")

Dataset uploaded and has ID: 6c82b36a-64b2-42ff-9e7f-cf1e6768bc44


### Create Experiment

In [8]:
from lumigator_schemas.experiments import ExperimentCreate

# Task as translation with necessary params
task_definition = {
    "task": "translation",
    "source_language": source_language,
    "target_language": target_language,
}
max_samples = 5

request = ExperimentCreate(
    name=f"WMT24++ Demo Experiment {source_language} to {target_language}",
    description="Experiment for demo",
    dataset=dataset_id,
    task_definition=task_definition,
    max_samples=max_samples,
)

experiment_response = client.experiments.create_experiment(request)
experiment_id = experiment_response.id
print(f"Experiment created and has ID: {experiment_id}")

Experiment created and has ID: 34


### Create and Run Workflows
- One with the API-based OpenAI GPT-4o-mini
- One with a locally hosted Ollama model (Pre-requisite: Execute `ollama run qwen2.5` in a terminal window)
- One with a multilingual model on Hugging Face
- One with a bilingual Opus-MT model on Hugging Face

In [9]:
from lumigator_schemas.workflows import WorkflowCreateRequest

custom_system_prompt = f"""
You are an expert in {source_language} and {target_language}. 
Please provide a high-quality translation of the following text from {source_language} to {target_language}.
Only generate the translated text. No additional text or explanation needed.
"""
batch_size = 1
metrics = ["bleu", "meteor", "comet"]
configurations = [
    # OpenAI GPT-4o-mini without an explicit system prompt; Lumigator's default prompt is used under the hood
    {
        "name": "Translation with gpt-4o-mini",
        "model": "gpt-4o-mini",
        "provider": "openai",
        "secret_key_name": "openai_api_key",
    },
    # Locally hosted Ollama model with custom system prompt
    {
        "name": "Translation with ollama qwen2.5",
        "model": "qwen2.5",
        "provider": "openai",
        "base_url": "http://localhost:11434/v1",
        "system_prompt": custom_system_prompt,
        "secret_key_name": "openai_api_key",
    },
    # HuggingFace multi-lingual model
    {
        "name": "Translation with HF m2m100",
        "model": "facebook/m2m100_418M",
        "provider": "hf",
    },
    # HuggingFace Opus-MT bi-lingual model
    {
        "name": "Translation with Helsinki-NLP/opus-mt",
        "model": "Helsinki-NLP/opus-mt-en-de",
        "provider": "hf",
    },
]

workflow_ids = []
for config in configurations:
    params = {
        "name": config["name"],
        "model": config["model"],
        "experiment_id": experiment_id,
        "batch_size": batch_size,
        "metrics": metrics,
    }
    params.update(config)
    request = WorkflowCreateRequest(**params)
    created_workflow = client.workflows.create_workflow(request)
    print(f"Created workflow {created_workflow.name} with ID {created_workflow.id} for model {created_workflow.model}")
    workflow_ids.append(created_workflow.id)

Created workflow Translation with gpt-4o-mini with ID 6b7fb6995e93451b89d63a6af6b5c1f8 for model gpt-4o-mini
Created workflow Translation with ollama qwen2.5 with ID b08d8b4b45e34742bfab155a6b5a0eb6 for model qwen2.5
Created workflow Translation with HF m2m100 with ID d999309f328c4169b108fcb5c8cc6a5a for model facebook/m2m100_418M
Created workflow Translation with Helsinki-NLP/opus-mt with ID 92ee08d58b9e4a4eb69ae56727101a0d for model Helsinki-NLP/opus-mt-en-de
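
The loop above starts from shared settings and then calls `params.update(config)`, so optional keys such as `base_url` or `system_prompt` are only sent for the configurations that define them, and any key present in both places takes the per-model value. A minimal sketch of that merge with plain dictionaries follows; `build_params` is a hypothetical helper used only for illustration.

```python
def build_params(config, experiment_id, batch_size, metrics):
    """Merge shared workflow settings with one per-model configuration."""
    params = {
        "experiment_id": experiment_id,
        "batch_size": batch_size,
        "metrics": list(metrics),  # copy so that workflows do not share one list
    }
    # Per-model keys are added last and win on any key collision
    params.update(config)
    return params
```

The result can be passed on as `WorkflowCreateRequest(**build_params(config, experiment_id, batch_size, metrics))`, mirroring the loop above.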


### View Results
Check http://localhost/experiments for status. 

__Once the experiment status is `SUCCEEDED`, continue with the cells below.__

In [11]:
from time import sleep

from lumigator_schemas.workflows import WorkflowStatus

still_running = True
while still_running:
    experiment_details = client.experiments.get_experiment(experiment_id)
    still_running_workflows = [
        workflow.name
        for workflow in experiment_details.workflows
        if workflow.status not in [WorkflowStatus.SUCCEEDED, WorkflowStatus.FAILED]
    ]
    still_running = bool(still_running_workflows)
    if still_running:
        print("Workflows still running ...")
        sleep(60)

workflow_results = [client.workflows.get_workflow(workflow_id) for workflow_id in workflow_ids]
df_metrics = pd.concat([pd.DataFrame.from_dict({wr.model: wr.metrics}).T for wr in workflow_results])
df_metrics[["meteor_meteor_mean", "bleu_bleu_mean", "comet_mean_score"]]

Workflows still running ...
Workflows still running ...
Workflows still running ...
Workflows still running ...
Workflows still running ...
Workflows still running ...
Workflows still running ...
Workflows still running ...
Workflows still running ...
Workflows still running ...
Workflows still running ...


| model | meteor_meteor_mean | bleu_bleu_mean | comet_mean_score |
|---|---|---|---|
| gpt-4o-mini | 0.644 | 0.37 | 0.647 |
| qwen2.5 | 0.567 | 0.309 | 0.607 |
| facebook/m2m100_418M | 0.532 | 0.259 | 0.455 |
| Helsinki-NLP/opus-mt-en-de | 0.602 | 0.223 | 0.585 |
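
The polling loop in this cell waits indefinitely, which can hang the notebook if a workflow never reaches a terminal status. Below is a sketch of a bounded variant. `poll_until_done` is a hypothetical helper, and `get_statuses` stands in for whatever call returns the current workflow statuses (in this notebook, the statuses read from `client.experiments.get_experiment(experiment_id).workflows`).

```python
import time


def poll_until_done(
    get_statuses,
    terminal_statuses,
    interval_seconds=60.0,
    timeout_seconds=3600.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll until every status is terminal; raise TimeoutError past the deadline."""
    deadline = clock() + timeout_seconds
    while True:
        statuses = get_statuses()
        pending = [status for status in statuses if status not in terminal_statuses]
        if not pending:
            return statuses
        if clock() >= deadline:
            raise TimeoutError(f"{len(pending)} workflow(s) still running at timeout")
        sleep(interval_seconds)  # wait before asking again
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real waiting.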


In [13]:
output_texts_per_model = {
    wr.model: pd.read_json(wr.artifacts_download_url)["artifacts"][
        ["examples", "ground_truth", "predictions"]
    ].to_dict()
    for wr in workflow_results
}
for model_name, df_texts in output_texts_per_model.items():
    print(f"Model: {model_name}")
    display(pd.DataFrame(df_texts).head(2))

Model: gpt-4o-mini


Unnamed: 0,examples,ground_truth,predictions
0,"Siso's depictions of land, water center new gallery exhibition",Sisos Darstellungen von Land und Wasser in neuer Ausstellung,Sisos Darstellungen von Land und Wasser stehen im Mittelpunkt der neuen Galerieausstellung.
1,"""People Swimming in the Swimming Pool"" from 2022 is one Vicente Siso artwork that will display at Tierra del Sol Gallery beginning Jan. 13. (photo courtesy of Vicente Siso)","„People Swimming in the Swimming Pool“ aus dem Jahr 2022 ist ein Kunstwerk von Vicente Siso, das ab dem 13. Januar in der Galerie „Tierra del Sol“ ausgestellt wird (Foto bereitgestellt von Vicente Siso)","""Menschen, die im Schwimmbad schwimmen"" aus dem Jahr 2022 ist ein Kunstwerk von Vicente Siso, das ab dem 13. Januar in der Tierra del Sol Gallery ausgestellt wird. (Foto mit freundlicher Genehmigung von Vicente Siso)"


Model: qwen2.5


Unnamed: 0,examples,ground_truth,predictions
0,"Siso's depictions of land, water center new gallery exhibition",Sisos Darstellungen von Land und Wasser in neuer Ausstellung,Sisos Darstellungen von Land und Wasser bilden den Mittelpunkt der neuen Galerieausstellung
1,"""People Swimming in the Swimming Pool"" from 2022 is one Vicente Siso artwork that will display at Tierra del Sol Gallery beginning Jan. 13. (photo courtesy of Vicente Siso)","„People Swimming in the Swimming Pool“ aus dem Jahr 2022 ist ein Kunstwerk von Vicente Siso, das ab dem 13. Januar in der Galerie „Tierra del Sol“ ausgestellt wird (Foto bereitgestellt von Vicente Siso)","""Leute, die im Schwimmbad schwimmen,"" von 2022 ist ein Werk des Künstlers Vicente Siso, das ab dem 13. Januar im Tierra del Sol Gallery ausgestellt sein wird. (Foto courtoisie von Vicente Siso)"


Model: facebook/m2m100_418M


Unnamed: 0,examples,ground_truth,predictions
0,"Siso's depictions of land, water center new gallery exhibition",Sisos Darstellungen von Land und Wasser in neuer Ausstellung,"Siso's Abbildungen von Land, Wasserzentrum neue Galerie Ausstellung"
1,"""People Swimming in the Swimming Pool"" from 2022 is one Vicente Siso artwork that will display at Tierra del Sol Gallery beginning Jan. 13. (photo courtesy of Vicente Siso)","„People Swimming in the Swimming Pool“ aus dem Jahr 2022 ist ein Kunstwerk von Vicente Siso, das ab dem 13. Januar in der Galerie „Tierra del Sol“ ausgestellt wird (Foto bereitgestellt von Vicente Siso)","""People Swimming in the Swimming Pool"" von 2022 ist ein Kunstwerk von Vicente Siso, das am 13. Januar in der Tierra del Sol Gallery ausgestellt wird (Foto von Vicente Siso)"


Model: Helsinki-NLP/opus-mt-en-de


Unnamed: 0,examples,ground_truth,predictions
0,"Siso's depictions of land, water center new gallery exhibition",Sisos Darstellungen von Land und Wasser in neuer Ausstellung,"Siso's Darstellungen von Land, Wasserzentrum neue Galerie Ausstellung"
1,"""People Swimming in the Swimming Pool"" from 2022 is one Vicente Siso artwork that will display at Tierra del Sol Gallery beginning Jan. 13. (photo courtesy of Vicente Siso)","„People Swimming in the Swimming Pool“ aus dem Jahr 2022 ist ein Kunstwerk von Vicente Siso, das ab dem 13. Januar in der Galerie „Tierra del Sol“ ausgestellt wird (Foto bereitgestellt von Vicente Siso)","""People Swimming in the Swimming Pool"" von 2022 ist ein Vicente Siso Kunstwerk, das in der Tierra del Sol Gallery ab dem 13. Januar ausgestellt wird. (Foto mit freundlicher Genehmigung von Vicente Siso)"


### Models Endpoint
- List the models supported for one or more tasks

In [14]:
# Get the list of models supported for translation
models_response = client.models.get_suggested_models(tasks=["translation"])
for model_config in models_response.items:
    print(model_config.model)

facebook/m2m100_418M
facebook/m2m100_1.2B
Helsinki-NLP/opus-mt
bigscience/mt0-base
bigscience/mt0-large
bigscience/mt0-xl
gpt-4o-mini
gpt-4o
deepseek-reasoner
deepseek-chat
ministral-8b-latest
mistralai/Mistral-7B-Instruct-v0.2
