### Voice Translation of Audio Files

Ever wanted to translate a podcast into your native language? Translating and dubbing podcasts can make content accessible to a global audience. This guide walks through converting an English audio file into Hindi using OpenAI APIs. We cover transcription of the audio, translation into the target script, text-to-speech conversion into the target language, and benchmarking of the translation quality.

A note on how this Cookbook uses the terms **Language** and **Script**. The two are often used interchangeably, but the distinction matters for the task at hand.
- **Language** refers to the spoken or written system of communication. For instance, Hindi and Marathi are different languages, but both use the Devanagari script. English and French are different languages, but both are written in the Latin script.
- **Script** refers to the set of characters or symbols used to write a language. For example, Serbian is traditionally written in the Cyrillic script but is also written in the Latin script.
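
The distinction can be made concrete with a small sketch that reports which scripts a string is written in. It relies on the convention that Unicode character names begin with the script name (for example `DEVANAGARI LETTER NA`), which is a heuristic rather than a full script lookup. The helper name `scripts_used` and the sample words are illustrative assumptions, not part of the original notebook.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Return the scripts found in `text`, inferred from Unicode character names."""
    scripts = set()
    for ch in text:
        # Only letters and combining marks carry script information here;
        # digits, punctuation, and whitespace are skipped.
        if not unicodedata.category(ch).startswith(("L", "M")):
            continue
        name = unicodedata.name(ch, "")
        if name:
            # Character names start with the script, e.g. "LATIN SMALL LETTER A".
            scripts.add(name.split()[0])
    return scripts


print(scripts_used("नमस्ते"))    # Hindi greeting, Devanagari script
print(scripts_used("नमस्कार"))   # Marathi greeting, also Devanagari script
print(scripts_used("bonjour"))   # French word, Latin script
print(scripts_used("Београд"), scripts_used("Beograd"))  # Serbian in both scripts
```

Two different languages can therefore yield the same script, and one language can yield two scripts, which is why the translation step below names both the target language and the target script.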


In this cookbook we will walk through the steps to dub an audio podcast from English to Hindi. The overall steps are:

1. **Transcribe** the audio file into text with Whisper.
2. **Translate** the English text into Hindi in Devanagari script.
3. **Text-to-speech** conversion of the Devanagari text into spoken Hindi.
4. **Translation benchmarking** (BLEU or ROUGE).


**Step 1. Transcribe the audio file into text with Whisper**

Whisper is an open-source automatic speech recognition (ASR) system developed by OpenAI. It can transcribe audio files with high accuracy across multiple languages. 


![Speech-to-text](./images/Whisper.png)

[Learn more about Whisper here](https://platform.openai.com/docs/guides/speech-to-text) 


In [12]:
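from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

audio_path = "../gpt4o/data/keynote_recap.mp3"

# Open the audio in binary mode; the context manager closes the file afterwards.
with open(audio_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcription.text)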


Welcome to our first-ever OpenAI Dev Day. Today, we are launching a new model, GPT-4 Turbo. GPT-4 Turbo supports up to 128,000 tokens of context. We have a new feature called JSON mode, which ensures that the model will respond with valid JSON. You can now call many functions at once, and it'll do better at following instructions in general. You want these models to be able to access better knowledge about the world. So do we. So we're launching retrieval in the platform. You can bring knowledge from outside documents or databases into whatever you're building. GPT-4 Turbo has knowledge about the world up to April of 2023, and we will continue to improve that over time. Dolly 3, GPT-4 Turbo with Vision, and the new Text-to-Speech model are all going into the API today. Today, we're launching a new program called Custom Models. With Custom Models, our researchers will work closely with the company to help them make a great custom model, especially for them and their use case using our t

Note that the model transcribed "Dall-e" as "Dolly". This is a common issue with similar-sounding words. It can be remedied by passing the correct spelling in the `prompt` parameter.

In [15]:
audio_path = "../gpt4o/data/keynote_recap.mp3"

# The prompt gives Whisper the expected spelling of an ambiguous word.
with open(audio_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        prompt="Dall-e",
        file=audio_file,
    )
print(transcription.text)

Welcome to our first ever OpenAI Dev Day. Today, we are launching a new model, GPT-4 Turbo. GPT-4 Turbo supports up to 128,000 tokens of context. We have a new feature called JSON mode, which ensures that the model will respond with valid JSON. You can now call many functions at once, and it will do better at following instructions in general. You want these models to be able to access better knowledge about the world. So do we. So we're launching retrieval in the platform. You can bring knowledge from outside documents or databases into whatever you're building. GPT-4 Turbo has knowledge about the world up to April of 2023, and we will continue to improve that over time. Dall-e 3, GPT-4 Turbo with Vision, and the new Text-to-Speech model are all going into the API today. Today, we're launching a new program called Custom Models. With Custom Models, our researchers will work closely with a company to help them make a great custom model, especially for them and their use case using our 

The model now transcribes the word "Dall-e" correctly.


**Step 2. Translate the English language text to Hindi in Devanagari script**

The next step is to translate the text from the source language into the target language and script. In this case we prompt the gpt-4o model to translate the text from English (written in Latin script) to Hindi (written in Devanagari script).

For some newer words, the target language may have no established translation. In such cases, we prompt the model to retain the words in the original language. We can also provide examples of words to keep in the original language and script (e.g., English in Latin script).
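
When the list of retained words changes from one podcast to the next, it may be convenient to assemble the system prompt from a list instead of hard-coding it. The sketch below is one way to do that; the function name `build_translation_prompt` and its parameters are hypothetical, and the wording simply mirrors the prompt used in this notebook.

```python
def build_translation_prompt(
    source_language: str, target_language: str, keep_terms: list[str]
) -> str:
    """Compose a system prompt that lists terms to leave untranslated."""
    prompt = (
        f"You are an assistant that translates text from {source_language} "
        f"to {target_language}. User will provide content for you to translate. "
        f"You may keep certain words in {source_language} for which a direct "
        "translation doesn't exist."
    )
    if keep_terms:
        # Append the glossary only when there is something to list.
        prompt += f" Some words to keep in {source_language} include - "
        prompt += ", ".join(keep_terms)
    return prompt


system_prompt = build_translation_prompt(
    "English", "Hindi", ["Turbo", "OpenAI", "token", "GPT", "Dall-e", "Python"]
)
print(system_prompt)
```

The resulting string can be passed as the `content` of the `system` message in the chat completion request.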


In [16]:
response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "system", "content": "You are an assistant that translates text from English to Hindi. User will provide content for you to translate. You may keep certain words in English for which a direct translation doesn't exist. Some words to keep in English include - Turbo, OpenAI, token, GPT, Dall-e, Python"},
    {"role": "user", "content": transcription.text},
  ]
)

message = response.choices[0].message.content

print(message)

हमारे पहले OpenAI Dev Day में आपका स्वागत है। आज, हम एक नया मॉडल, GPT-4 Turbo लॉन्च कर रहे हैं। GPT-4 Turbo 128,000 tokens तक के संदर्भ को सपोर्ट करता है। हमारे पास एक नया फीचर है जिसे JSON mode कहा जाता है, जो सुनिश्चित करता है कि मॉडल वैध JSON के साथ प्रतिक्रिया करेगा। आप अब कई functions को एक साथ कॉल कर सकते हैं, और यह सामान्य तौर पर निर्देशों का बेहतर पालन करेगा। आप चाहते हैं कि ये मॉडल विश्व के बारे में बेहतर ज्ञान तक पहुंच सकें। हम भी ऐसा ही चाहते हैं। इसलिए हम प्लेटफॉर्म में retrieval लॉन्च कर रहे हैं। आप बाहरी दस्तावेजों या डेटाबेस से ज्ञान को अपने जो भी निर्माण कर रहे हैं, उसमें ला सकते हैं। GPT-4 Turbo के पास अप्रैल 2023 तक दुनिया के बारे में ज्ञान है, और हम इसे समय के साथ सुधारते रहेंगे। Dall-e 3, GPT-4 Turbo with Vision, और नया Text-to-Speech मॉडल आज API में जा रहे हैं। आज, हम एक नया प्रोग्राम लॉन्च कर रहे हैं जिसे Custom Models कहा जाता है। Custom Models के साथ, हमारे शोधकर्ता कंपनी के साथ मिलकर काम करेंगे ताकि वे उनके लिए और उनके उपयोग के मामले के लिए एक शानदार कस्टम मॉडल

The translated text is a combination of Hindi and English, each written in its own script: Devanagari and Latin, respectively.
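
A quick way to confirm that the intended words stayed in English is to pull the Latin-script tokens out of the translated text and compare them with the glossary. This is a minimal sketch under stated assumptions: the helper names `latin_terms` and `missing_terms` are hypothetical, the regular expression only covers unaccented ASCII letters, and the sample sentence is taken from the output above.

```python
import re


def latin_terms(text: str) -> list[str]:
    """Return Latin-script tokens (letters, digits, hyphens) in order of appearance."""
    return re.findall(r"[A-Za-z][A-Za-z0-9-]*", text)


def missing_terms(text: str, keep_terms: list[str]) -> list[str]:
    """Return the glossary terms that do not appear verbatim in `text`."""
    return [term for term in keep_terms if term not in text]


sample = "GPT-4 Turbo 128,000 tokens तक के संदर्भ को सपोर्ट करता है।"
print(latin_terms(sample))
print(missing_terms(sample, ["Turbo", "token", "Python"]))
```

A term reported as missing is not necessarily an error, since it may simply not occur in the source text; the check is most useful for spotting terms that were transliterated into Devanagari against the prompt's instructions.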

**Step 3. Text-to-speech conversion of the Devanagari script into spoken Hindi language**

The OpenAI TTS model takes the translated text as input and produces spoken output in native-sounding Hindi, intermingled with native-sounding English words where appropriate.

![Text-to-speech](./images/Text-to-speech.png)
 

[Learn more about the TTS model here](https://platform.openai.com/docs/guides/text-to-speech)
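
A full podcast transcript can be longer than the speech endpoint accepts in a single `input`. The limit is documented as 4096 characters at the time of writing; treat the exact number as an assumption and check the API reference. The sketch below splits text at sentence boundaries, including the Devanagari danda (`।`), so that each chunk can be synthesized separately. The function name `chunk_text` and its `max_chars` default are hypothetical and not part of the original notebook.

```python
import re


def chunk_text(text: str, max_chars: int = 4096) -> list[str]:
    """Group whole sentences into chunks of at most `max_chars` characters."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[।.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        # A single sentence longer than max_chars is kept whole, not cut.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


demo = "पहला वाक्य। दूसरा वाक्य। Third sentence."
for chunk in chunk_text(demo, max_chars=30):
    print(len(chunk), chunk)
```

Each chunk would then be sent to the speech endpoint in turn and the resulting audio segments concatenated; that last step is left out here because it depends on the audio tooling available.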

In [17]:
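output_file_path = "./sounds/output.mp3"

# Stream the synthesized speech to disk; the ./sounds directory must already exist.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input=message,
) as response:
    response.stream_to_file(output_file_path)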



In [18]:
from playsound import playsound

# Play the file written by the text-to-speech step.
playsound(output_file_path)

**Step 4. Translation benchmarking (e.g., BLEU or ROUGE)** 

