# Translation with OpenAI's API
This notebook demonstrates how to use OpenAI's API to translate text from one language to another. The API can translate between any pair of languages supported by the model. Here, we translate English source text into ten target languages and save the translations as JSONL files.

## References
- [OpenAI API](https://openai.com/index/openai-api/)

## Requirements
- OpenAI Python package
- [OpenAI API key](https://platform.openai.com/docs/guides/authentication)

## [Optional] Use a virtual environment
We recommend using a virtual environment to manage your project's dependencies. You can create one with `venv` or `conda`. Here is an example using `conda`:
```bash
# Create a new virtual environment
conda create -n openai python=3.10

# Activate the virtual environment
conda activate openai

# Install the required packages
pip install openai
```

Otherwise, you can install the required packages globally using `pip`:
```bash
pip install openai
```

In [1]:
# Importing the required libraries

import json
import os
import tqdm
from openai import OpenAI

In [2]:
# NOTE: The API key is required to access the OpenAI API
# You can get the API key from the OpenAI dashboard
# Remember to keep your API key secret!
API_KEY = None

# If the API key is not defined, then we will try to get it from the environment variables
if not API_KEY:
    API_KEY = os.getenv("OPENAI_API_KEY")

# If the API key is still not defined, then we raise an exception that explains how to fix it
if not API_KEY:
    raise ValueError(
        "API key is not defined. Set API_KEY or the OPENAI_API_KEY environment variable."
    )

In [3]:
# Model to use for the translation
SYSTEM_NAME = "gpt-4o-mini-2024-07-18"

# If you want to use a more powerful (and more expensive) model, then you can use the following model
# SYSTEM_NAME = "gpt-4o-2024-08-06"

# Source language (language to translate from); default is English
SOURCE_LANGUAGE = "English"

# Target languages (languages to translate to); by default, every language in LANGUAGES except the source
TARGET_LANGUAGES = [
    "Arabic",
    "Chinese (Traditional)",
    "French",
    "German",
    "Italian",
    "Japanese",
    "Korean",
    "Spanish",
    "Thai",
    "Turkish",
]

# Data directory
DATA_DIR = "../data"
SPLIT = "validation"

# Mapping language to language code
LANGUAGES = {
    "Arabic": "ar_AE",
    "English": "en_US",
    "French": "fr_FR",
    "German": "de_DE",
    "Italian": "it_IT",
    "Japanese": "ja_JP",
    "Korean": "ko_KR",
    "Thai": "th_TH",
    "Turkish": "tr_TR",
    "Spanish": "es_ES",
    "Chinese (Traditional)": "zh_TW",
}
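
Each entry in `TARGET_LANGUAGES` is later used as a key into `LANGUAGES` to build file paths, so a language without a code would only surface as a `KeyError` partway through a run. The following is a minimal sketch of an upfront check. The helper `missing_language_codes` and the example values are hypothetical and not part of the notebook.

```python
def missing_language_codes(target_languages, language_codes):
    """Return the target languages that have no entry in the code mapping."""
    return [lang for lang in target_languages if lang not in language_codes]


# Small stand-ins for TARGET_LANGUAGES and LANGUAGES (illustrative values only)
example_targets = ["French", "German", "Klingon"]
example_codes = {"French": "fr_FR", "German": "de_DE"}

missing = missing_language_codes(example_targets, example_codes)
print(f"Languages without a code: {missing}")
```

Running the same check on the real `TARGET_LANGUAGES` and `LANGUAGES` before the translation loop would catch a typo in a language name early.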

In [4]:
def load_references(path: str) -> dict:
    """
    Load the text to translate from the given path.

    Args:
        path (str): The path to the file containing the text to translate.

    Returns:
        references (dict): A dictionary containing the text to translate.
    """
    references = {}

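    # Read as UTF-8 explicitly; the platform default encoding may not handle non-ASCII text
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            # Skip blank lines (for example, a trailing newline at the end of the file)
            if not line.strip():
                continue
            data = json.loads(line)
            references[data["id"]] = data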

    print(f"Loaded {len(references)} references from {path}")
    return references
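
The loader expects a JSON Lines file: one JSON object per line, each with an `id` field. The translation loop further down also reads a `source` field. The sketch below writes a tiny file in that shape to a temporary directory and reads it back the same way `load_references` does. The records are hypothetical, and any fields beyond `id` and `source` in the real data are unknown here.

```python
import json
import os
import tempfile

# Hypothetical records: only the "id" and "source" keys are known from the notebook
sample_records = [
    {"id": "ex-1", "source": "Good morning."},
    {"id": "ex-2", "source": "Where is the station?"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.jsonl")

    # Write one JSON object per line
    with open(sample_path, "w", encoding="utf-8") as f:
        for record in sample_records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    # Read the file back keyed by "id", mirroring load_references
    loaded = {}
    with open(sample_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            loaded[record["id"]] = record

print(f"Loaded ids: {sorted(loaded)}")
```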

In [5]:
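def save_predictions(predictions: dict, path: str) -> None:
    """
    Save the predictions to the given path, one JSON object per line.

    Args:
        predictions (dict): The predictions to save, keyed by id.
        path (str): The path to save the predictions.
    """
    # Create the parent directory only if the path has one
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)

    # Write UTF-8 explicitly because ensure_ascii=False emits non-ASCII characters
    with open(path, "w", encoding="utf-8") as f:
        for prediction in predictions.values():
            f.write(json.dumps(prediction, ensure_ascii=False) + "\n")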
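
`save_predictions` passes `ensure_ascii=False` so that translations are stored as readable text rather than `\uXXXX` escape sequences. A short sketch of the difference, using a hypothetical record:

```python
import json

# Hypothetical prediction record containing non-ASCII text
record = {"id": "ex-1", "prediction": "こんにちは"}

escaped = json.dumps(record)                       # default: non-ASCII becomes \uXXXX escapes
readable = json.dumps(record, ensure_ascii=False)  # characters are kept as-is

print(escaped)
print(readable)
```

Both strings parse back to the same object, so the choice only affects how readable and how large the file is. Because the readable form contains non-ASCII characters, the file should be opened with a UTF-8 encoding when it is written.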

In [6]:
from typing import Optional


def translate_text(
    client: OpenAI,
    text: str,
    source_language: str,
    target_language: str,
    max_retries: int = 3,
) -> Optional[str]:
    """
    Translate the given text from the source language to the target language.

    Args:
        client (OpenAI): The OpenAI client.
        text (str): The text to translate.
        source_language (str): The source language.
        target_language (str): The target language.
        max_retries (int): The maximum number of attempts before giving up.

    Returns:
        translation (Optional[str]): The translated text, or None if every attempt failed.
    """
    retries = 0
    while retries < max_retries:
        try:
            response = client.chat.completions.create(
                model=SYSTEM_NAME,
                messages=[
                    {
                        "role": "system",
                        "content": f"You are an expert translator. Translate from {source_language} to {target_language}. Only provide the translation without explanations.",
                    },
                    {"role": "user", "content": text},
                ],
            )
            return response.choices[0].message.content.strip()

        except Exception as e:
            retries += 1
            if retries == max_retries:
                print(f"Failed to translate text after {max_retries} attempts: {e}")
                return None
            print(f"Attempt {retries} failed: {e}. Retrying...")
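
The retry loop above retries immediately after a failure. When failures come from rate limits or transient network errors, waiting between attempts often helps. The following is a minimal sketch of exponential backoff, not the notebook's method. The helper `call_with_backoff`, its parameters, and the fake `flaky_call` are hypothetical, and the `sleep` argument exists only so the demonstration can run without actually waiting.

```python
import time


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); after a failure, wait base_delay * 2**attempt seconds and retry.

    Returns the result of fn(), or None once max_retries attempts have failed.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if attempt == max_retries - 1:
                print(f"Giving up after {max_retries} attempts: {e}")
                return None
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
            sleep(delay)
    return None  # only reached when max_retries <= 0


# Demonstration with a fake call that fails twice and then succeeds
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


# Record the delays instead of sleeping so the example finishes instantly
recorded_delays = []
result = call_with_backoff(flaky_call, sleep=recorded_delays.append)
print(result, recorded_delays)
```

In the notebook, the API request could be wrapped in a zero-argument function and passed as `fn`. Note that the OpenAI client may also perform its own retries, so the two mechanisms can stack.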

In [7]:
# Initialize the OpenAI client
client = OpenAI(api_key=API_KEY)

In [8]:
# Translate the text to each target language
for target_language in TARGET_LANGUAGES:
    print(f"Translating to {target_language}...")

    # The path to the references is formatted as follows:
    # {DATA_DIR}/references/{split}/{language_code}.jsonl
    path_to_references = os.path.join(
        DATA_DIR,
        "references",
        SPLIT,
        f"{LANGUAGES[target_language]}.jsonl",
    )

    # The path to the predictions is formatted as follows:
    # {DATA_DIR}/predictions/{system_name}/{split}/{language_code}.jsonl
    path_to_predictions = os.path.join(
        DATA_DIR,
        "predictions",
        SYSTEM_NAME,
        SPLIT,
        f"{LANGUAGES[target_language]}.jsonl",
    )

    # Load the references
    references = load_references(path_to_references)

    # Translate the text
    predictions = {}

    # Use reference_id rather than id to avoid shadowing the built-in id()
    for reference_id, reference in tqdm.tqdm(references.items()):
        source_text = reference["source"]
        prediction = translate_text(
            client, source_text, SOURCE_LANGUAGE, target_language
        )

        if prediction is None:
            print(f"Failed to translate text for id {reference_id}. Skipping...")
            continue

        predictions[reference_id] = {
            "id": reference_id,
            "source_language": SOURCE_LANGUAGE,
            "target_language": target_language,
            "text": source_text,
            "prediction": prediction,
        }

    # Save the predictions
    save_predictions(predictions, path_to_predictions)
    print(f"Saved {len(predictions)} predictions to {path_to_predictions}")

Translating to Arabic...
Loaded 722 references from ../data/references/validation/ar_AE.jsonl


100%|██████████| 722/722 [07:29<00:00,  1.61it/s]


Saved 722 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/ar_AE.jsonl
Translating to Chinese (Traditional)...
Loaded 722 references from ../data/references/validation/zh_TW.jsonl


100%|██████████| 722/722 [07:50<00:00,  1.53it/s]


Saved 722 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/zh_TW.jsonl
Translating to French...
Loaded 724 references from ../data/references/validation/fr_FR.jsonl


100%|██████████| 724/724 [07:24<00:00,  1.63it/s]


Saved 724 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/fr_FR.jsonl
Translating to German...
Loaded 731 references from ../data/references/validation/de_DE.jsonl


100%|██████████| 731/731 [07:51<00:00,  1.55it/s]


Saved 731 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/de_DE.jsonl
Translating to Italian...
Loaded 730 references from ../data/references/validation/it_IT.jsonl


100%|██████████| 730/730 [07:41<00:00,  1.58it/s]


Saved 730 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/it_IT.jsonl
Translating to Japanese...
Loaded 723 references from ../data/references/validation/ja_JP.jsonl


100%|██████████| 723/723 [08:28<00:00,  1.42it/s]


Saved 723 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/ja_JP.jsonl
Translating to Korean...
Loaded 745 references from ../data/references/validation/ko_KR.jsonl


100%|██████████| 745/745 [08:28<00:00,  1.47it/s]


Saved 745 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/ko_KR.jsonl
Translating to Spanish...
Loaded 739 references from ../data/references/validation/es_ES.jsonl


100%|██████████| 739/739 [08:13<00:00,  1.50it/s]


Saved 739 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/es_ES.jsonl
Translating to Thai...
Loaded 710 references from ../data/references/validation/th_TH.jsonl


100%|██████████| 710/710 [15:08<00:00,  1.28s/it]    


Saved 710 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/th_TH.jsonl
Translating to Turkish...
Loaded 732 references from ../data/references/validation/tr_TR.jsonl


100%|██████████| 732/732 [07:57<00:00,  1.53it/s]

Saved 732 predictions to ../data/predictions/gpt-4o-2024-08-06/validation/tr_TR.jsonl
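
The translation loop skips any id whose translation failed, so a saved file can contain fewer records than its reference file. Comparing the two counts in the output is one check. The sketch below lists the exact ids that were skipped. The helper `find_missing_ids` and the example dictionaries are hypothetical.

```python
def find_missing_ids(references, predictions):
    """Return the reference ids that have no prediction, in sorted order."""
    return sorted(set(references) - set(predictions))


# Hypothetical stand-ins for the references and predictions dictionaries
example_references = {
    "ex-1": {"id": "ex-1", "source": "Good morning."},
    "ex-2": {"id": "ex-2", "source": "Where is the station?"},
    "ex-3": {"id": "ex-3", "source": "Thank you."},
}
example_predictions = {
    "ex-1": {"id": "ex-1", "prediction": "Bonjour."},
    "ex-3": {"id": "ex-3", "prediction": "Merci."},
}

skipped = find_missing_ids(example_references, example_predictions)
print(f"Ids without a prediction: {skipped}")
```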



