# Audio Transcription and Translation Project

In this project, I have built an automatic voice message translation system that helps bridge language barriers. It uses OpenAI's Whisper API for speech-to-text and the ChatCompletion API for text translation, giving an end-to-end pipeline that translates a voice message into a chosen language. I have chosen audio from PM Modi's "Mann Ki Baat" program since it is familiar to most people.

## APIs used

- **Whisper API for Speech Recognition**: converts speech from voice messages into text.
- **ChatCompletion API for Translation**: translates the transcribed text into the desired language.



# 2. Library imports

First, we will install the `openai` and `python-dotenv` libraries: `openai` for accessing the OpenAI APIs and `python-dotenv` for reading the OpenAI API secret key from a `.env` file.

In [None]:
!pip install openai
!pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1


In [None]:
import os
import openai
import random

from openai import OpenAI
from dotenv import load_dotenv

# 3. Sending a first request to OpenAI API


### 3.1 Setting up API Key

We will check whether the OpenAI API key has been read. The key is kept in a `.env` file.

In [None]:
# os.environ["OPENAI_API_KEY"] = "sk-XXXXXXXXXXXXX"
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")

client = OpenAI()

API key looks good so far
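
The check above can be wrapped in a small helper so that it is easy to reuse. This is only a minimal sketch: the function name `looks_like_openai_key`, the default `sk-` prefix, and the minimum length are assumptions for illustration. Passing the check only means the key has a plausible shape, not that the API will accept it.

```python
def looks_like_openai_key(key, prefixes=("sk-",), min_length=10):
    """Return True if `key` has a plausible shape for an OpenAI secret key.

    The prefixes and minimum length are illustrative assumptions,
    not an official format specification.
    """
    if not key:
        return False
    key = key.strip()
    return key.startswith(tuple(prefixes)) and len(key) > min_length


# Placeholder strings only; never hard-code a real key in a notebook.
print(looks_like_openai_key("sk-proj-example-not-a-real-key"))
print(looks_like_openai_key(None))
```

In the notebook, this could be called as `looks_like_openai_key(os.getenv("OPENAI_API_KEY"), prefixes=("sk-proj-",))` to keep the stricter prefix used above.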


# 4. Processing Audio files with Whisper

Now, we will open the desired audio file in binary mode and pass it to the Whisper transcription API to generate the transcript. The output format is set to `vtt` (WebVTT) because it also provides start and end timestamps for each segment of speech.

In [None]:
# Open the file in binary mode; the context manager closes it after the request.
with open("Mann_Ki_Baat_January_2025.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="vtt"
    )

In [None]:
print(transcription)

WEBVTT

00:00:00.000 --> 00:00:10.000
मेरे प्यारे देशवासियों, नमस्कार, आज 2025 की पहली मन की बात हो रही है।

00:00:10.000 --> 00:00:20.000
आप लोगों ने एक बात जरूर नोटिस की होगी, हर बार मन की बात महिने के आखरी रविवार को होती है।

00:00:20.000 --> 00:00:28.000
लेकिन इस बार हम एक सप्ता पहले चोथे रविवार के बेजाए तीसरे रविवार को ही मिल रहे हैं।

00:00:28.000 --> 00:00:35.000
क्योंकि अगले सप्ता रविवार के दिन ही गणतंत्र दिवस है।

00:00:35.000 --> 00:00:41.000
मैं सभी देशवासियों को गणतंत्र दिवस की अग्रिम शुब्कामनाय देता हूँ।

00:00:42.000 --> 00:00:47.000
साथियों, इस बार का गणतंत्र दिवस बहुत विशेश है।

00:00:47.000 --> 00:00:52.000
ये भारतिय गणतंत्र की 75 वी वर्षगांठ है।

00:00:52.000 --> 00:00:58.000
इस वर्ष सम्विधान लागु होने के 75 साल हो रहे हैं।

00:00:58.000 --> 00:01:04.000
मैं सम्विधान सभा के उन सभी महान व्यक्तितों को नमन करता हूं,

00:01:04.019 --> 00:01:09.000
जिनोंने हमें हमारा पवितर सम्वधान दिया।

00:01:09.000 --> 00:01:16.000
सम्विधान सभा के दोरान अनेक विशें पर लम्बी-लम्बी चर्चायें
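
The timestamps in the VTT output become easier to work with once they are parsed into numbers. The sketch below is one possible way to do that; the names `parse_vtt` and `CUE_TIMING` are hypothetical, and it assumes every cue uses the `hh:mm:ss.mmm --> hh:mm:ss.mmm` layout seen in the output above (the WebVTT format also allows other layouts that this sketch does not handle).

```python
import re

# Matches cue timing lines such as "00:00:10.000 --> 00:00:20.000".
CUE_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(vtt_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by blank lines; the header block has no timing line.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if match:
                fields = match.groups()
                start = _to_seconds(*fields[:4])
                end = _to_seconds(*fields[4:])
                # Everything after the timing line is the cue text.
                cues.append((start, end, " ".join(lines[i + 1:])))
                break
    return cues


# The first two cues from the transcript above.
sample = """WEBVTT

00:00:00.000 --> 00:00:10.000
मेरे प्यारे देशवासियों, नमस्कार, आज 2025 की पहली मन की बात हो रही है।

00:00:10.000 --> 00:00:20.000
आप लोगों ने एक बात जरूर नोटिस की होगी, हर बार मन की बात महिने के आखरी रविवार को होती है।
"""

for start, end, text in parse_vtt(sample):
    print(f"{start:6.1f}s to {end:6.1f}s: {text}")
```

Assuming the `transcription` value printed above is a plain string, `parse_vtt(transcription)` would return one tuple per cue.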

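## Audio translation

Here, the same audio file is sent to Whisper's translation endpoint, which translates the Hindi speech directly into English text. The result is stored in the same `transcription` variable, so from this point on it holds the English translation.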

In [None]:
# Reopen the file; the context manager closes it after the request.
with open("Mann_Ki_Baat_January_2025.mp3", "rb") as audio_file:
    transcription = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )

In [None]:
transcription.text

"My dear countrymen, Namaskar. Today is the first Man Ki Baat of 2025. You must have noticed that every time Man Ki Baat happens on the last Sunday of the month. But this time, we are meeting on the third Sunday instead of the fourth Sunday a week ago. Because next week, on Sunday, is the day of the Gantantra. I wish all the countrymen a very happy Gantantra day. Friends, this year's Gantantra day is very special. This is the 75th anniversary of the Indian Gantantra. This year is the 75th anniversary of the implementation of the Constitution. I pay homage to all the great personalities of the Constitutional Council who gave us our sacred Constitution. During the Constitutional Council, there were long discussions on many issues. The discussions, the thoughts of the members of the Constitutional Council, their speeches are our great heritage. Today, in Man Ki Baat, I am trying to give you the original voice of some great leaders. Friends, when the Constitutional Council started its work

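The transcription and translation cells differ only in which endpoint they call, so both can be expressed through one helper. This is a rough sketch rather than a fixed recipe: the function name `run_whisper` and its `task` parameter are hypothetical, and it assumes `client` is the `OpenAI()` instance created earlier.

```python
def run_whisper(client, audio_path, task="transcribe", response_format="json"):
    """Send an audio file to Whisper and return the API response.

    task="transcribe" keeps the spoken language;
    task="translate" returns English text.
    """
    endpoints = {
        "transcribe": client.audio.transcriptions,
        "translate": client.audio.translations,
    }
    if task not in endpoints:
        raise ValueError(f"task must be one of {sorted(endpoints)}")
    # The context manager closes the file even if the request fails.
    with open(audio_path, "rb") as audio_file:
        return endpoints[task].create(
            model="whisper-1",
            file=audio_file,
            response_format=response_format,
        )
```

With this helper, the two steps above would read `run_whisper(client, "Mann_Ki_Baat_January_2025.mp3", response_format="vtt")` and `run_whisper(client, "Mann_Ki_Baat_January_2025.mp3", task="translate")`.
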
## Translating to any language using ChatGPT and Whisper



Here, we will use OpenAI's chat completion API to translate the English text into another language (French). The temperature and the token limit have been kept on the higher side because translations are rarely one-to-one: the output can be longer than the input, and there is usually more than one acceptable wording.

In [None]:
target_language = 'French'
messages = [{"role": "system", "content": """I want you to act as an algorithm for translation to language {}. System will provide you with a text, and your only task is to translate it to {}. Never break character.""".format(target_language, target_language)}]
messages.append({"role": "user", "content": transcription.text})


In [None]:
translation_response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    max_tokens=2000,
    temperature=0.9
)


In [None]:
translation_response.choices[0].message.content

'Chers compatriotes, Namaskar. Aujourd\'hui est le premier Man Ki Baat de 2025. Vous avez sûrement remarqué que chaque fois que Man Ki Baat a lieu le dernier dimanche du mois. Mais cette fois, nous nous retrouvons le troisième dimanche au lieu du quatrième dimanche il y a une semaine. Car la semaine prochaine, c\'est le jour du Gantantra. Je souhaite à tous les compatriotes un très joyeux jour de Gantantra. Amis, le jour de Gantantra de cette année est très spécial. Il s\'agit du 75ème anniversaire du Gantantra indien. Cette année marque le 75ème anniversaire de la mise en œuvre de la Constitution. J\'hommage à toutes les grandes personnalités du Conseil constitutionnel qui nous ont donné notre Constitution sacrée. Lors du Conseil constitutionnel, de longues discussions ont eu lieu sur de nombreux sujets. Les discussions, les pensées des membres du Conseil constitutionnel, leurs discours sont notre grand patrimoine. Aujourd\'hui, dans Man Ki Baat, j\'essaie de vous transmettre la voix 
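
To translate into other languages without editing several cells, the prompt construction and the API call above can be collected into two small functions. This is a sketch under assumptions: the names `build_translation_messages` and `translate_text` are hypothetical, the prompt text and default parameters simply mirror the cells above, and `client` is assumed to be the `OpenAI()` instance created earlier.

```python
def build_translation_messages(text, target_language):
    """Build the system and user messages for a translation request."""
    system_prompt = (
        "I want you to act as an algorithm for translation to language "
        f"{target_language}. System will provide you with a text, and your "
        f"only task is to translate it to {target_language}. "
        "Never break character."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]


def translate_text(client, text, target_language, model="gpt-3.5-turbo",
                   max_tokens=2000, temperature=0.9):
    """Translate `text` into `target_language` and return the translated string."""
    response = client.chat.completions.create(
        model=model,
        messages=build_translation_messages(text, target_language),
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return response.choices[0].message.content
```

For example, `translate_text(client, transcription.text, "German")` would request a German translation of the English text. Passing a lower `temperature` is an option worth trying if more literal, repeatable output is preferred.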