# Transliterating "the" and "a"
---
This notebook handles the transliteration of the words "the" and "a" separately, because their pronunciation varies with the word that follows them in a sentence. The transliterations of these words are prepared in advance for the target languages. Running this code requires access to the OpenAI API. Visit [OpenAI's website](https://openai.com/index/openai-api/) to purchase the required quota. Once you have your API credentials, enter your API key (`api_key`) and API base URL (`api_base`) in the following cell.
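
As an alternative to pasting credentials into the notebook, they could be read from environment variables. The sketch below is only an illustration: the helper `load_credentials` and the variable names `OPENAI_API_KEY` and `OPENAI_API_BASE` are assumptions, not part of this project.

```python
import os

def load_credentials(env=os.environ):
    """Return (api_key, api_base) from a mapping of environment variables.

    Hypothetical helper; the variable names are assumptions for illustration.
    """
    api_key = env.get("OPENAI_API_KEY", "")
    # Treat a missing or empty base URL as "not set"
    api_base = env.get("OPENAI_API_BASE") or None
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before running the notebook.")
    return api_key, api_base

# Example with an explicit mapping instead of the real environment
api_key, api_base = load_credentials(
    {"OPENAI_API_KEY": "sk_...", "OPENAI_API_BASE": "https://example.invalid/v1"}
)
```

The returned values can then be assigned to `api_key` and `api_base` in the next cell, which keeps the key out of the saved notebook.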

---

In [None]:
#####################################################
################# API Key of OpenAI #################
#####################################################

# Uncomment and fill in both values; the OpenAI client created below requires them.
# api_key = "sk_..."
# api_base = ""

#####################################################
#####################################################
#####################################################

import warnings
warnings.filterwarnings("ignore")

from phonemizer import phonemize
import pandas as pd
import collections
import numpy as np
from tqdm import tqdm
import glob
import os
from openai import OpenAI

import sys
sys.path.append('../pyfiles/')
from normalizer import EnglishTextNormalizer
from postprocessing import get_json_result, CheckResultValidity, PostprocessTransliteration, GetResult
from gpt import gpt_api_no_stream, GetLLMPrompt


normalizer = EnglishTextNormalizer()
client = OpenAI(api_key=api_key, base_url=api_base)

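# Each entry maps a key (used in the output file names) to
# [English text, IPA phonemes for each word of that text].
adds = {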
    "zhi": ["the", ["ðɪ"]],
    "za": ["the pineapple", ["ðə", "pˈaɪnæpəl"]],
    "ah": ["a little awkward", ["ɐ","lˈɪɾəl","ˈɔːkwɚd"]],
}

---
# Transliterate Multiple Texts
---

In this example, we will transliterate multiple English sentences using a GPT model. To improve the reliability of the results, the code generates several transliteration responses for each sentence. Adjust the following variables as needed:

- `language`: A string specifying the target language for transliteration. The supported options are "Hindi", "Korean", and "Japanese".
- `gptmodel`: A string that indicates which GPT model to use. The available options include "gpt-3.5", "gpt-4omini", "gpt-4o", and "gpt-o1mini". You can add or modify the list of models by editing the file `MacST-project-page/pyfiles/gpt.py`.
- `savedir`: A string that specifies the directory where all transliteration responses will be saved.
- `repeatnum`: An integer that sets the number of responses (transliterations) to generate for each sentence.
- `reset_response`: A boolean that determines whether to re-generate the transliteration responses, even if previous responses exist in `savedir`.
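
How `savedir`, `repeatnum`, and `reset_response` interact can be sketched as follows. The helper `pending_runs` is hypothetical and only mirrors the file-naming scheme `postprocessing_{key}_{r}.npy` used by the cell below; it makes no API calls.

```python
import os
import tempfile

def pending_runs(savedir, key, repeatnum, reset_response):
    """List the run indices whose response file still has to be generated."""
    pending = []
    for r in range(repeatnum):
        savepath = os.path.join(savedir, f"postprocessing_{key}_{r}.npy")
        # Regenerate everything on reset, otherwise only the missing files
        if reset_response or not os.path.exists(savepath):
            pending.append(r)
    return pending

# Usage with a temporary directory that already holds one response
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "postprocessing_zhi_0.npy"), "wb").close()
    print(pending_runs(tmp, "zhi", 3, reset_response=False))  # [1, 2]
    print(pending_runs(tmp, "zhi", 3, reset_response=True))   # [0, 1, 2]
```

With `reset_response = False`, raising `repeatnum` later only generates the additional responses, so earlier API calls are not repeated.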

---

In [None]:
###########################################
########## Adjustable Parameters ##########
###########################################

language = "Hindi"
gptmodel = "gpt-4omini"
savedir = f"./responses_the_a/{language}/"
repeatnum = 5 # Increase this number for more reliable transliteration
reset_response = False

###########################################
###########################################
###########################################

sentence_list = {key: adds[key][0] for key in adds}

# Generate and save the valid responses, skipping those already on disk
for key in sentence_list:
    print(key)
    exist_length = len(glob.glob(os.path.join(savedir, f"postprocessing_{key}_*.npy")))
    if not reset_response and exist_length >= repeatnum:
        continue  # enough saved responses for this entry
    sentence = sentence_list[key]
    inputtext = normalizer(sentence)
    prompt = GetLLMPrompt(inputtext, language, phonemized=adds[key][1])

    for r in tqdm(range(repeatnum)):
        savepath = os.path.join(savedir, f"postprocessing_{key}_{r}.npy")
        if not reset_response and os.path.exists(savepath):
            continue  # keep the existing response
        result = GetResult(client, prompt, gptmodel, inputtext, normalizer, display_print=False)
        os.makedirs(os.path.dirname(savepath), exist_ok=True)
        # If `result` is not a plain array, reload it with np.load(..., allow_pickle=True)
        np.save(savepath, result)