# Translation API `v0.0.1` Examples

This notebook uses the `requests` library to interact with a previously deployed translation API service. It demonstrates how to send text for translation and receive the translated output.

This exploration can also be done in a web browser through the automatically generated Swagger UI documentation, available at the `/docs` endpoint of the deployed API.

## Table of Contents

1. [Root Endpoint](#1.-Root-Endpoint)
2. [Health Endpoint](#2.-Health-Endpoint)
3. [Models Endpoint](#3.-Models-Endpoint)
   - [3.1 Basic response](#3.1-Basic-response)
   - [3.2 Detailed response](#3.2-Detailed-response)
4. [Prediction Endpoint](#4.-Prediction-Endpoint)
   - [4.1 Single Translation with pre-loaded model and basic parameters](#4.1-Single-Translation-with-pre-loaded-model-and-basic-parameters)
   - [4.2 Single Translation with model downloading and basic parameters](#4.2-Single-Translation-with-model-downloading-and-basic-parameters)
   - [4.3 Sending Multiple Items at once](#4.3-Sending-Multiple-Items-at-once)
   - [4.4 Single Translation with advanced parameters](#4.4-Single-Translation-with-advanced-parameters)
   - [4.5 Endpoint failure case](#4.5-Endpoint-failure-case)

In [14]:
# get required libraries
import json
import time
import requests

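# define constants
# base URL of the deployed API service; adjust host and port to match your deployment
# (if connecting to 0.0.0.0 fails on your platform, try 'http://localhost:8000/')
base_api_url = 'http://0.0.0.0:8000/'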

## 1. Root Endpoint

The root endpoint (`/`) provides basic information about the API.

In [16]:
start_time = time.time()
response = requests.get(base_api_url)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.006
Response status code: 200
Response content: {
    "name": "Translation API",
    "version": "v0.0.1",
    "description": "API for text translation using pre-trained Transformer models."
}
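Every cell in this notebook repeats the same time, send, and print pattern. As an optional convenience, that pattern could be factored into a small helper like the sketch below. The names `format_body`, `timed`, and `report` are hypothetical and not part of the API. The sketch uses `time.perf_counter()`, which is better suited to timing short intervals than `time.time()`.

```python
import json
import time


def format_body(content: bytes) -> str:
    """Pretty-print a JSON body, falling back to the raw bytes if it isn't JSON."""
    try:
        return "Response content: " + json.dumps(json.loads(content), indent=4)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return f"Response content is not valid JSON: {content!r}"


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result


def report(fn, *args, **kwargs):
    """Time a request function (e.g. requests.get) and print the same summary as the cells."""
    elapsed, response = timed(fn, *args, **kwargs)
    print("Request time (s):", round(elapsed, 3))
    print("Response status code:", response.status_code)
    print(format_body(response.content))
    return response
```

With this in place, a cell could shrink to `report(requests.get, base_api_url)` or `report(requests.post, prediction_url, json=payload)`.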


## 2. Health Endpoint

The health endpoint (`/health`) returns a simple response that confirms the API is up and running.

In [17]:
health_url = base_api_url + 'health'

start_time = time.time()
response = requests.get(health_url)
end_time = time.time()
print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.006
Response status code: 200
Response content: {
    "status": "ok"
}
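Right after the service starts, `/health` may not respond yet. A minimal polling sketch follows, assuming you pass in a `probe` callable (a hypothetical name) that performs the `/health` request and returns `True` when the service is healthy. Keeping the HTTP call outside the function means it works with `requests` or any other client.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    Exceptions raised by the probe (such as connection errors while the
    service is still starting) are treated as "not healthy yet".
    Example probe, using the notebook's `health_url`:
        probe = lambda: requests.get(health_url, timeout=2).json() == {"status": "ok"}
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # service not reachable yet; keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```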


## 3. Models Endpoint

From here on, things get more interesting. The `/models` endpoint lists the translation models currently available (downloaded and enabled) in the API, one per translation pair.

This endpoint accepts an optional query parameter, `return_model_config`, which, when set to `true`, also returns additional metadata about each model from its configuration file.
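The cells below build the `/models` URL by string concatenation. A slightly more robust sketch uses the standard library's `urllib.parse` to join paths and encode query parameters; `endpoint_url` is a hypothetical helper, and lowercasing booleans is an assumption chosen to match the `?return_model_config=true` form used here.

```python
from urllib.parse import urlencode, urljoin


def endpoint_url(base: str, path: str, **params) -> str:
    """Join `path` onto `base` and append any query parameters.

    Boolean values are rendered as lowercase "true"/"false", matching the
    `?return_model_config=true` form used in this notebook.
    """
    url = urljoin(base, path)
    query = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    return f"{url}?{urlencode(query)}" if query else url


print(endpoint_url("http://0.0.0.0:8000/", "models", return_model_config=True))
```

Alternatively, `requests.get` accepts a `params` dictionary and builds the query string itself.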

### 3.1 Basic response

In [18]:
return_model_config = False
models_url = base_api_url + 'models' + ('?return_model_config=true' if return_model_config else '')

start_time = time.time()
response = requests.get(models_url)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.009
Response status code: 200
Response content: {
    "models": {
        "en-fr": {
            "model_name": "Helsinki-NLP/opus-mt-en-fr",
            "file_type": "ONNX",
            "config": null
        },
        "en-es": {
            "model_name": "Helsinki-NLP/opus-mt-en-es",
            "file_type": "ONNX",
            "config": null
        }
    }
}


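For each translation pair, the basic response returns:
* the Hugging Face model name (`model_name`)
* the storage file type (`file_type`), which is always `"ONNX"`
* a `config` field, which is `null` unless `return_model_config=true` is set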
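To work with this response programmatically, a short sketch that maps each translation pair to its Hugging Face model name is shown below. `model_names` is a hypothetical helper; the sample dictionary is copied from the basic response above.

```python
from typing import Dict


def model_names(models_response: dict) -> Dict[str, str]:
    """Map each translation pair (e.g. "en-fr") to its Hugging Face model name."""
    return {pair: info["model_name"] for pair, info in models_response["models"].items()}


# Sample taken from the basic /models response shown above
sample = {
    "models": {
        "en-fr": {"model_name": "Helsinki-NLP/opus-mt-en-fr", "file_type": "ONNX", "config": None},
        "en-es": {"model_name": "Helsinki-NLP/opus-mt-en-es", "file_type": "ONNX", "config": None},
    }
}
print(model_names(sample))
# {'en-fr': 'Helsinki-NLP/opus-mt-en-fr', 'en-es': 'Helsinki-NLP/opus-mt-en-es'}
```

With a live service, `sample` would be replaced by `requests.get(models_url).json()`.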

### 3.2 Detailed response

In [19]:
return_model_config = True
models_url = base_api_url + 'models' + ('?return_model_config=true' if return_model_config else '')

start_time = time.time()
response = requests.get(models_url)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.008
Response status code: 200
Response content: {
    "models": {
        "en-fr": {
            "model_name": "Helsinki-NLP/opus-mt-en-fr",
            "file_type": "ONNX",
            "config": {
                "_num_labels": 3,
                "activation_dropout": 0.0,
                "activation_function": "swish",
                "add_bias_logits": false,
                "add_final_layer_norm": false,
                "architectures": [
                    "MarianMTModel"
                ],
                "attention_dropout": 0.0,
                "bos_token_id": 0,
                "classif_dropout": 0.0,
                "classifier_dropout": 0.0,
                "d_model": 512,
                "decoder_attention_heads": 8,
                "decoder_ffn_dim": 2048,
                "decoder_layerdrop": 0.0,
                "decoder_layers": 6,
                "decoder_start_token_id": 59513,
                "decoder_vocab_size": 59514,
                "dropout":

In addition to the fields in the basic response, the detailed response populates the `config` key with the full model configuration, as specified in the model's configuration file.
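The full configuration is long, so it can help to surface just a few architecture fields. The sketch below does that; `SUMMARY_KEYS` and `summarize_configs` are hypothetical, and the chosen keys are simply ones that appear in the `en-fr` configuration above.

```python
from typing import Any, Dict

# Hypothetical selection of fields worth surfacing; all of them appear in the
# detailed en-fr response shown above.
SUMMARY_KEYS = ("architectures", "d_model", "decoder_layers", "decoder_attention_heads", "decoder_ffn_dim")


def summarize_configs(models_response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Pick a few architecture fields out of each model's `config`.

    Pairs whose config is null (the request omitted `return_model_config=true`)
    map to an empty dict; keys missing from a config are skipped.
    """
    summary = {}
    for pair, info in models_response["models"].items():
        config = info.get("config") or {}
        summary[pair] = {key: config[key] for key in SUMMARY_KEYS if key in config}
    return summary
```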

## 4. Prediction Endpoint

The prediction endpoint (`/predict`) is the core of the API. It accepts text for translation and returns the translated output.

Unlike the previous endpoints, this one expects a POST request with a JSON payload containing an `items` list. Each item specifies the text to translate and the desired translation pair, optionally along with generation parameters.
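The payload shape used throughout this section can be captured with a dataclass. This is a hypothetical client-side sketch, not a schema published by the API: the field names come from the examples in this notebook, and leaving unset optional fields out of the JSON assumes the server then applies its own defaults, as it appears to in the basic examples.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class TranslationItem:
    """One entry of the `items` list sent to /predict (hypothetical client type)."""
    source: str
    target: str
    text: str
    max_length: Optional[int] = None
    num_beams: Optional[int] = None
    early_stopping: Optional[bool] = None


def build_payload(items: List[TranslationItem]) -> dict:
    """Build the JSON body for a POST to /predict, dropping unset optional fields."""
    return {"items": [{k: v for k, v in asdict(item).items() if v is not None} for item in items]}


print(build_payload([TranslationItem("en", "fr", "Keep the fire going.")]))
```

The result can be passed directly as `requests.post(prediction_url, json=build_payload([...]))`. Filtering on `is not None` (rather than truthiness) keeps an explicit `early_stopping=False`.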

By default, some (but not all) of the available translation models are downloaded before the API service starts, so that they are ready to use immediately while conserving storage volume space. The number of models downloaded at startup is controlled by the `STARTUP_MODEL_LOADING_LIMIT` variable in `config.py`.

In this case, the limit was set to 2, so the models for the first two translation pairs in `AVAILABLE_TRANSLATIONS`, `en-fr` and `en-es`, were downloaded before the API service started.
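The selection rule described above can be illustrated with a small sketch. The real contents and format of `config.py` are not shown in this notebook: only the first two entries (`en-fr`, `en-es`) and the limit of 2 come from the text, while the remaining list entries, their order, and the helper functions are assumptions for illustration.

```python
# Hypothetical stand-ins for the settings in config.py. Only "en-fr", "en-es"
# and the limit of 2 come from this notebook; the rest is assumed.
AVAILABLE_TRANSLATIONS = ["en-fr", "en-es", "en-de", "fr-es"]
STARTUP_MODEL_LOADING_LIMIT = 2


def pairs_to_preload(available, limit):
    """Return the translation pairs whose models are downloaded before startup."""
    return list(available[:limit])


def pairs_loaded_lazily(available, limit):
    """Return the pairs that are only downloaded on their first /predict request."""
    return list(available[limit:])


print(pairs_to_preload(AVAILABLE_TRANSLATIONS, STARTUP_MODEL_LOADING_LIMIT))  # ['en-fr', 'en-es']
```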

### 4.1 Single Translation with pre-loaded model and basic parameters

This first example performs a single translation with a pre-loaded model, using only the required parameters: `source`, `target`, and `text`.

The first request should take longer than subsequent ones: although the model is already downloaded, it still has to be loaded into memory before it can run inference.
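The server's loading code isn't shown in this notebook. Purely to illustrate why the first request is slower, here is a minimal in-memory lazy-loading sketch built on `functools.lru_cache`. `get_model`, `translate`, and the simulated load time are hypothetical.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def get_model(pair: str) -> str:
    """Simulate loading an already-downloaded model into memory.

    lru_cache stores the result, so only the first call per pair pays the cost.
    """
    time.sleep(0.2)  # hypothetical stand-in for reading model files and building a session
    return f"session-for-{pair}"


def translate(pair: str, text: str) -> str:
    """Hypothetical request handler: fetch the (cached) model, then 'translate'."""
    model = get_model(pair)  # slow on the first call for this pair, fast afterwards
    return f"[{model}] {text}"


for attempt in ("first", "second"):
    start = time.perf_counter()
    translate("en-fr", "Keep the fire going.")
    print(attempt, "request (s):", round(time.perf_counter() - start, 3))
```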

First request:

In [24]:
prediction_url = base_api_url + 'predict'

payload = {
    "items": [{
        "source": "en",
        "target": "fr",
        "text": "The boy with fair hair lowered himself down in the last few feet of rocky ground and began to pick his way toward the lagoon."
    }]
}

start_time = time.time()
response = requests.post(prediction_url, json=payload)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 1.56
Response status code: 200
Response content: {
    "results": [
        {
            "position": 0,
            "result": "Le gar\u00e7on aux cheveux justes s'est abaiss\u00e9 dans les derniers pieds de terre rocheuse et a commenc\u00e9 \u00e0 prendre son chemin vers la lagune."
        }
    ]
}


Second request:

In [25]:
prediction_url = base_api_url + 'predict'

payload = {
    "items": [{
        "source": "en",
        "target": "fr",
        "text": "Ralph paddled backwards down the slope, immersed his mouth, and blew a jet of water into the air."
    }]
}

start_time = time.time()
response = requests.post(prediction_url, json=payload)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.413
Response status code: 200
Response content: {
    "results": [
        {
            "position": 0,
            "result": "Ralph pagaie \u00e0 l'envers sur la pente, plonge sa bouche et souffle un jet d'eau dans l'air."
        }
    ]
}


### 4.2 Single Translation with model downloading and basic parameters

In this example, the request targets a model that hasn't been downloaded yet (`en-de`). The first request should have very high latency, since the model must be downloaded and then loaded into memory for inference; subsequent requests should be much faster.
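Extending the idea from the previous example, the three latency cases seen in this notebook (download and load, load only, already in memory) can be modeled with two caches. This is a hypothetical sketch of the stages, not the API's actual implementation; `downloaded`, `in_memory`, and `ensure_ready` are invented names.

```python
from typing import Dict, List, Set

# Hypothetical state: pairs already on disk (preloaded at startup) and pairs loaded in memory.
downloaded: Set[str] = {"en-fr", "en-es"}
in_memory: Dict[str, str] = {}


def ensure_ready(pair: str) -> List[str]:
    """Make `pair` usable and return which slow stages had to run."""
    stages = []
    if pair not in downloaded:
        stages.append("download")  # e.g. fetch and store the model files
        downloaded.add(pair)
    if pair not in in_memory:
        stages.append("load")  # read the files and create an inference session
        in_memory[pair] = f"model-{pair}"
    return stages


print(ensure_ready("en-de"))  # ['download', 'load']
print(ensure_ready("en-de"))  # []
```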

First request:

In [28]:
payload = {
    "items": [{
        "source": "en",
        "target": "de",
        "text": "A storm of laughter arose and even the tiniest child joined in."
    }]
}

start_time = time.time()
response = requests.post(prediction_url, json=payload)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 20.938
Response status code: 200
Response content: {
    "results": [
        {
            "position": 0,
            "result": "Es entstand ein Sturm des Lachens und sogar das kleinste Kind schloss sich an."
        }
    ]
}


Second request:

In [29]:
payload = {
    "items": [{
        "source": "en",
        "target": "de",
        "text": "You're no good on a job like this."
    }]
}

start_time = time.time()
response = requests.post(prediction_url, json=payload)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.322
Response status code: 200
Response content: {
    "results": [
        {
            "position": 0,
            "result": "Du bist in so einem Job nicht gut."
        }
    ]
}


### 4.3 Sending Multiple Items at once

The API also supports translating several items, across different language pairs, in a single request. To do so, include multiple entries in the `items` list of the JSON payload.

In [32]:
payload = {
    "items": [
        {
            "source": "fr",
            "target": "es",
            "text": "Le premier rythme auquel ils s'habituèrent fut le lent balancement de l'aube au crépuscule rapide."
        },
        {
            "source": "en",
            "target": "es",
            "text": "Keep the fire going."
        },
        {
            "source": "es",
            "target": "de",
            "text": "¡No existe la bestia!"
        }
    ]
}

start_time = time.time()
response = requests.post(prediction_url, json=payload)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.512
Response status code: 200
Response content: {
    "results": [
        {
            "position": 0,
            "result": "El primer ritmo al que se acostumbraron fue el lento balanceo del amanecer al atardecer r\u00e1pido."
        },
        {
            "position": 1,
            "result": "Mantenga el fuego encendido."
        }
    ]
}


The translation pair for the third item, `es-de`, isn't included in the `AVAILABLE_TRANSLATIONS` list, so that item failed and was left out of the response. The other two items were still translated, and the request as a whole returned a 200 status code.
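Because failed items are simply omitted, a result's index in `results` may not match its index in `items`. The `position` field appears to be the index of the corresponding submitted item, so a client can realign the two lists. `align_results` is a hypothetical helper; the test data mirrors the request above.

```python
from typing import List, Optional


def align_results(items: List[dict], results: List[dict]) -> List[Optional[str]]:
    """Return one translation per submitted item, or None where that item failed.

    Assumes each result's `position` is the index of its item in the request.
    Usage with the cell above: align_results(payload["items"], content["results"])
    """
    aligned: List[Optional[str]] = [None] * len(items)
    for result in results:
        aligned[result["position"]] = result["result"]
    return aligned
```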

### 4.4 Single Translation with advanced parameters

The API also supports advanced parameters that are passed on to the translation model, which is ultimately a Transformer neural network. These parameters are:
* `max_length`: Maximum length of the generated translation (in tokens).
* `num_beams`: Number of beams for beam search, i.e., the number of candidate sequences (hypotheses) kept in parallel during decoding. At the end, the highest-scoring sequence is selected as the final output.
* `early_stopping`: Whether to stop the beam search as soon as at least `num_beams` sentences are finished per batch item.
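To make `num_beams`, `max_length`, and `early_stopping` concrete, here is a toy beam search over a hand-written next-token table. It is a simplified sketch, not the decoding code the API's models actually run: there is no length penalty, finished hypotheses leave the beam instead of being replaced, and `max_length` counts generated tokens. The probabilities in `NEXT` are made up so that greedy decoding (`num_beams=1`) and beam search disagree.

```python
import math
from typing import Dict, List, Tuple

EOS = "</s>"

# Toy "model": next-token probabilities given the previous token (hypothetical numbers).
NEXT: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"c": 0.4, "d": 0.3, EOS: 0.3},
    "b": {EOS: 0.9, "c": 0.1},
    "c": {EOS: 1.0},
    "d": {EOS: 1.0},
}


def beam_search(num_beams: int, max_length: int, early_stopping: bool) -> Tuple[List[str], float]:
    """Return the best (tokens, log-probability) found by a simplified beam search."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]  # start-of-sentence is implicit
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_length):
        # Expand every live beam by every possible next token.
        candidates = []
        for tokens, score in beams:
            prev = tokens[-1] if tokens else "<s>"
            for tok, p in NEXT[prev].items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the num_beams most probable sequences; completed ones leave the beam.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            (finished if tokens[-1] == EOS else beams).append((tokens, score))
        if not beams or (early_stopping and len(finished) >= num_beams):
            break
    # Prefer completed hypotheses; fall back to truncated beams if max_length cut them off.
    pool = finished or beams
    return max(pool, key=lambda c: c[1])


print(beam_search(num_beams=1, max_length=5, early_stopping=True))  # greedy: a, c, </s>
print(beam_search(num_beams=2, max_length=5, early_stopping=True))  # beam: b, </s>
```

Greedy decoding commits to `a` (0.6) and ends with a sequence of probability 0.24, while two beams keep `b` alive and find `b </s>` with probability 0.36.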

In [33]:
payload = {
    "items": [{
        "source": "en",
        "target": "es",
        "text": "Maybe there is a beast, what I mean is... maybe it's only us.",
        "max_length": 200,
        "num_beams": 10,
        "early_stopping": True
    }]
}

start_time = time.time()
response = requests.post(prediction_url, json=payload)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.51
Response status code: 200
Response content: {
    "results": [
        {
            "position": 0,
            "result": "Tal vez hay una bestia, lo que quiero decir es... tal vez s\u00f3lo somos nosotros."
        }
    ]
}


### 4.5 Endpoint failure case

If every translation item in a request fails, the API handles the failures without crashing and returns a 500 error. Here, the request fails because the translation pair `en-pt` is not included in the `AVAILABLE_TRANSLATIONS` list.
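On the client side, it can be convenient to turn this case into an exception. Below is a minimal sketch, assuming only the two outcomes seen in this notebook (200 with a `results` list, 500 with a `detail` message); `TranslationFailed` and `extract_results` are hypothetical names.

```python
class TranslationFailed(RuntimeError):
    """Raised when /predict does not return translation results."""


def extract_results(status_code: int, body: dict) -> list:
    """Return the `results` list, or raise with the server's `detail` message.

    Usage: extract_results(response.status_code, response.json())
    """
    if status_code == 200:
        return body["results"]
    detail = body.get("detail", "no detail provided")
    raise TranslationFailed(f"HTTP {status_code}: {detail}")
```

Note that a 200 response can still contain fewer results than submitted items, as the multi-item example showed.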

In [35]:
payload = {
    "items": [{
        "source": "en",
        "target": "pt",
        "text": "Maybe there is a beast, what I mean is... maybe it's only us."
    }]
}

start_time = time.time()
response = requests.post(prediction_url, json=payload)
end_time = time.time()

print("Request time (s):", round(end_time - start_time, 3))
print("Response status code:", response.status_code)
try:
    content = json.loads(response.content)
    print("Response content:", json.dumps(content, indent=4))
except json.JSONDecodeError:
    print("Response content is not valid JSON:", response.content)

Request time (s): 0.012
Response status code: 500
Response content: {
    "detail": "All translation attempts failed."
}
